Publication entry


Talks and poster presentations (with proceedings entry):

M. Forsell, J. Roivainen, V. Leppänen, J. Träff:
"Implementation of Multioperations in Thick Control Flow Processors";
Talk: 20th Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2018) in conjunction with IPDPS 2018, Vancouver, British Columbia, Canada; 21.05.2018 - 25.05.2018; in: "Proceedings of the IEEE 32nd International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2018)", IEEE, (2018), ISBN: 978-1-5386-5556-6; pp. 744 - 752.

English abstract:

Multioperations are primitives of parallel computation for which processors perform a reduction, e.g. addition, on values provided by multiple threads into a single value in a constant number of steps. Algorithmically, multioperations can speed up execution by a logarithmic factor over their single operation counterparts. In this paper, we propose an architectural technique for realizing multioperations in thick control flow processors. Thick control flows (TCF) are computational constructs that simplify parallel programming by bundling a number of homogeneous threads following the same control path into universalized vector-like entities. The elements of TCFs are called fibers to distinguish them from ordinary threads having their own individual control. Processors designed for executing TCFs feature a unique frontend-backend structure to provide low-latency processing of TCF-common computations and high-throughput execution of data parallel fibers. Our proposal relies on step caches and equally sized multioperation scratchpads, while on the memory side, we make use of active memory modules. The idea is to compute partial results in backend units to reduce the traffic to the referred shared memory location. The final result is then computed in the active memory unit of the target memory module. According to the evaluation made with our TCF-aware processor equipped with multioperation scratchpads and active memory units, it indeed executes certain N data element-algorithms log N times faster than the baseline processor. The cost of the implementation is preliminarily evaluated.
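The two-level combining idea described in the abstract (partial results in backend units, final result at the target memory module) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the function name `multioperation`, the round-robin assignment of fibers to backend units, and the use of Python dictionaries as stand-ins for scratchpads and memory modules are all hypothetical. It models traffic reduction only, not step counts or timing, and it assumes the combining operator is associative and commutative.

```python
from operator import add


def multioperation(contributions, num_backends, op=add):
    """Software model of a two-level multioperation (illustrative only).

    contributions: list of (address, value) pairs, one per fiber.
    num_backends:  number of backend units (hypothetical parameter).
    op:            associative, commutative binary operator.

    Returns (memory, messages), where memory maps each address to the
    combined value and messages counts the partial results sent from
    backend units to the memory side.
    """
    # One scratchpad per backend unit, modelled as a dict: address -> partial.
    scratchpads = [dict() for _ in range(num_backends)]

    # Level 1: each backend unit combines the values of its own fibers.
    # Assumption: fiber i is assigned to backend unit i % num_backends.
    for fiber_id, (addr, value) in enumerate(contributions):
        pad = scratchpads[fiber_id % num_backends]
        pad[addr] = op(pad[addr], value) if addr in pad else value

    # Level 2: the memory side combines one partial result per
    # (backend unit, address) pair into the final value.
    memory = {}
    messages = 0
    for pad in scratchpads:
        for addr, partial in pad.items():
            messages += 1
            memory[addr] = op(memory[addr], partial) if addr in memory else partial

    return memory, messages


if __name__ == "__main__":
    # 16 fibers all add their index to the same shared location 0.
    fibers = [(0, i) for i in range(16)]
    result, sent = multioperation(fibers, num_backends=4)
    print(result, sent)  # {0: 120} 4
```

In this model, 16 fibers targeting one location produce 4 messages to the memory side instead of 16, which is the traffic reduction the abstract refers to.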

Keywords:

parallel computing, processor architecture, multioperations, reductions, TCF

"Official" electronic version of the publication (according to its Digital Object Identifier - DOI)

http://dx.doi.org/10.1109/IPDPSW.2018.00121

Created from the publication database of the Technische Universität Wien.