APT Advanced Processor Technologies Research Group

Efficient Parallel Implementation of Multilayer Backpropagation Network on Torus-connected CMPs

X. Jin, M. Lujan, M.M. Khan, L.A. Plana, A.D. Rast, S.R. Welbourne, S.B.

Abstract

This paper presents an efficient implementation and performance analysis of mapping multi-layer perceptron networks with the backpropagation learning rule onto a new system based on a torus-connected CMP topology. A new algorithm, the pipelined checker-boarding partitioning scheme, is proposed for efficient mapping. The algorithm relies on a checker-board partitioning scheme (also called block-block matrix partitioning), but its key advantage comes from introducing a pipelined mode. The six-stage pipelined mode captures the parallelism within each partition of the weight matrix, allowing communication and computation to overlap. Not only does the proposed mapping localize communication, it can also hide part or even all of the communication. The mapping scheme is evaluated on SpiNNaker, a massively parallel architecture dedicated to neural network simulation. Each SpiNNaker node is a bespoke multi-core chip with an on-chip router, and the nodes are interconnected through a two-dimensional torus mesh. The results with SpiNNaker configurations of up to 1000 nodes (20000 cores) show that the pipelined model is more efficient than the traditional non-pipelined model when training large-scale recurrent neural networks.
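
As a rough illustration of the checker-board (block-block) partitioning idea mentioned in the abstract, the sketch below splits a weight matrix into a grid of blocks and computes a layer's weighted sums block by block, then accumulates the partial sums. It is a minimal sketch under stated assumptions, not the paper's implementation: the function names `checkerboard_blocks` and `partitioned_matvec` are hypothetical, each block stands in for one node of the grid, and neither the six-stage pipeline nor the SpiNNaker communication fabric is modelled.

```python
def checkerboard_blocks(n_rows, n_cols, grid_rows, grid_cols):
    """Map each grid position (p, q) to the row and column ranges of its block.

    Hypothetical helper: rows and columns are split as evenly as possible.
    """
    def split(n, parts):
        base, extra = divmod(n, parts)
        ranges, start = [], 0
        for i in range(parts):
            size = base + (1 if i < extra else 0)
            ranges.append(range(start, start + size))
            start += size
        return ranges

    row_parts = split(n_rows, grid_rows)
    col_parts = split(n_cols, grid_cols)
    return {
        (p, q): (row_parts[p], col_parts[q])
        for p in range(grid_rows)
        for q in range(grid_cols)
    }


def partitioned_matvec(weights, x, grid_rows, grid_cols):
    """Compute y = W x by summing per-block partial results.

    Each (p, q) block plays the role of one node holding a sub-block of W
    (an assumption made for illustration only).
    """
    n_rows, n_cols = len(weights), len(x)
    blocks = checkerboard_blocks(n_rows, n_cols, grid_rows, grid_cols)
    y = [0.0] * n_rows
    for (p, q), (rows, cols) in blocks.items():
        # Local work: the block only needs its own slice of x.
        partial = [sum(weights[i][j] * x[j] for j in cols) for i in rows]
        # Accumulation: partial sums from blocks in the same grid row are added.
        for i, value in zip(rows, partial):
            y[i] += value
    return y


if __name__ == "__main__":
    W = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    x = [1, 0, -1]
    print(partitioned_matvec(W, x, grid_rows=2, grid_cols=2))
```

In this sketch each block touches only a slice of the input vector and contributes to only a slice of the output, which is the sense in which such a partitioning keeps data exchange confined to rows and columns of the grid.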
