Publication Entry

Talks and Poster Presentations (with Proceedings-Entry):

M. Forsell, J. Roivainen, V. Leppänen, J. Träff:
"Implementation of Multioperations in Thick Control Flow Processors";
Talk: 20th Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2018) in conjunction with IPDPS 2018, Vancouver, British Columbia, Canada; 2018-05-21 - 2018-05-25; in: "Proceedings of the IEEE 32nd International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2018)", IEEE, (2018), ISBN: 978-1-5386-5556-6; 744 - 752.

English abstract:

Multioperations are primitives of parallel computation in which processors reduce values provided by multiple threads into a single value, e.g., by addition, in a constant number of steps. Algorithmically, multioperations can speed up execution by a logarithmic factor over their single-operation counterparts. In this paper, we propose an architectural technique for realizing multioperations in thick control flow processors. Thick control flows (TCFs) are computational constructs that simplify parallel programming by bundling a number of homogeneous threads following the same control path into universalized vector-like entities. The elements of TCFs are called fibers to distinguish them from ordinary threads, which have their own individual control. Processors designed for executing TCFs feature a unique frontend-backend structure to provide low-latency processing of TCF-common computations and high-throughput execution of data-parallel fibers. Our proposal relies on step caches and equally sized multioperation scratchpads, while on the memory side it uses active memory modules. The idea is to compute partial results in the backend units to reduce the traffic to the referenced shared memory location. The final result is then computed in the active memory unit of the target memory module. According to an evaluation of our TCF-aware processor equipped with multioperation scratchpads and active memory units, it indeed executes certain algorithms on N data elements log N times faster than the baseline processor. The cost of the implementation is evaluated preliminarily.
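
The two-level reduction described in the abstract can be illustrated with a small software sketch. It is only a sequential model written for this entry, not the hardware design from the paper: the function names, the dictionary standing in for a scratchpad, and the choice of 4 backend units with 8 fibers each are all assumptions. It shows how combining values per backend unit first reduces the number of messages that reach the target memory location.

```python
from collections import defaultdict


def backend_partial_sums(requests):
    """Combine one backend unit's (address, value) requests per address.

    The dictionary is a hypothetical stand-in for a multioperation scratchpad.
    """
    scratchpad = defaultdict(int)
    for address, value in requests:
        scratchpad[address] += value
    return dict(scratchpad)


def active_memory_multiadd(memory, partials_per_backend):
    """Add each backend's partial sums into the target locations.

    Stands in for the active memory unit; returns the number of
    messages that reached memory.
    """
    messages = 0
    for partials in partials_per_backend:
        for address, partial in partials.items():
            memory[address] = memory.get(address, 0) + partial
            messages += 1
    return messages


# Assumed configuration: 4 backend units, 8 fibers each, and every fiber
# adds its own index into shared address 0.
fibers_per_backend = 8
backends = [
    [(0, b * fibers_per_backend + f) for f in range(fibers_per_backend)]
    for b in range(4)
]

memory = {0: 0}
partials = [backend_partial_sums(reqs) for reqs in backends]
messages = active_memory_multiadd(memory, partials)

print(memory[0], messages)
```

In this model, 32 fiber contributions reach the shared location as 4 messages instead of 32. The sketch does not reproduce the constant-step timing claimed for the hardware.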

Keywords:

parallel computing, processor architecture, multioperations, reductions, TCF

"Official" electronic version of the publication (accessed through its Digital Object Identifier - DOI)

http://dx.doi.org/10.1109/IPDPSW.2018.00121

Created from the Publication Database of the Vienna University of Technology.