Talks and Poster Presentations (with Proceedings-Entry):

M. Forsell, J. Roivainen, J. Träff:
"Optimizing Memory Access in TCF Processors with Compute-Update Operations";
Talk: 22nd Workshop on Advances in Parallel and Distributed Computational Models (APDCM 2020) in conjunction with IPDPS 2020 - Online Conference, New Orleans, Louisiana, USA; 2020-05-18 - 2020-05-22; in: "Proceedings of the IEEE 34th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2020)", IEEE, (2020), ISBN: 978-1-7281-7457-0; 577 - 586.

English abstract:
The thick control flow (TCF) model is a data parallel abstraction of the thread model. It merges homogeneous threads (called fibers) flowing through the same control path into entities (called TCFs) with a single control flow and multiple data flows. Fibers of a TCF are executed synchronously with respect to each other, and their number can be altered dynamically at runtime. Multiple TCFs can be executed in parallel to support control parallelism. In our previous work, we outlined a special architecture, TPA (Thick control flow Processor Architecture), for executing TCF programs efficiently, and showed that designing algorithms with the TCF model often leads to increased performance and simpler programs due to a higher level of abstraction and the elimination of loops and redundant program elements.

Compute-update memory operations, such as multioperations and atomic instructions, are known to speed up parallel algorithms performing reductions and synchronizations. In this paper, we propose special compute-update memory operations for TCF processors to optimize iterative exclusive inter-fiber memory access patterns. Acceleration is achieved, e.g., in matrix addition and log-prefix style patterns, in which multiple target locations can interchange data without the reloads between instructions that slow down execution. Our solution is based on modified active memory units and special memory operations that can send their reply value to a fiber other than the one initiating the access. We implement these operations in our TPA processor at minimal hardware cost and show that the expected speedups are achieved. Programming examples are given.
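
The two access patterns named in the abstract can be illustrated with a minimal sketch in plain Python. This is not TPA code and does not reproduce the operations proposed in the paper. The function names `log_prefix_sum` and `multiprefix_add` are hypothetical, and ordinary sequential loops stand in for synchronously executing fibers. The first function shows a log-prefix style pattern in which each fiber combines its value with that of a fiber a fixed distance away; the second shows a toy compute-update (multiprefix-style) operation in which all fibers update one memory location in a single conceptual step and each receives a reply value.

```python
def log_prefix_sum(values):
    """Inclusive prefix sum in ceil(log2(n)) lockstep steps.

    regs[i] models the register of fiber i (an assumption of this sketch).
    """
    regs = list(values)
    steps = 0
    d = 1
    while d < len(regs):
        # Snapshot the old registers so that all fibers read before any
        # fiber writes, mimicking synchronous execution.
        prev = regs[:]
        for i in range(d, len(regs)):
            regs[i] = prev[i - d] + prev[i]
        d *= 2
        steps += 1
    return regs, steps


def multiprefix_add(memory, addr, contributions):
    """Toy compute-update operation on a single memory location.

    Every fiber adds its contribution to memory[addr]. Fiber i receives
    the running value just before its own contribution is applied.
    """
    replies = []
    running = memory[addr]
    for c in contributions:
        replies.append(running)
        running += c
    memory[addr] = running
    return replies


if __name__ == "__main__":
    print(log_prefix_sum([1, 2, 3, 4, 5]))
    mem = {0: 10}
    print(multiprefix_add(mem, 0, [1, 2, 3]), mem[0])
```

In the first function every step rereads the neighbouring values from the snapshot; this per-step reload is the kind of cost that the abstract says the proposed operations avoid by routing a reply directly to a different fiber. How that routing is realized in hardware is described in the paper itself and is not modelled here.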

Keywords:

Parallel computing, processor architecture, TCF, active memory, compute-update memory operations

Created from the Publication Database of the Vienna University of Technology.