Talks and Poster Presentations (with Proceedings-Entry):
J. Träff, S. Hunold:
"Decomposing MPI Collectives for Exploiting Multi-lane Communication";
Talk: IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference,
- 2020-09-17; in: "Proceedings of the IEEE International Conference on Cluster Computing (IEEE Cluster 2020)",
Many modern, high-performance systems increase the cumulative node bandwidth by offering more than a single communication network and/or by having multiple connections to the network, such that a single processor core cannot by itself saturate the off-node bandwidth. Efficient algorithms and implementations for collective operations as found in, e.g., MPI, must be explicitly designed to exploit such multi-lane capabilities. We are interested in gauging to what extent this is the case. We systematically decompose the MPI collectives into similar operations that can execute concurrently on, and thereby exploit, multiple network lanes. Our decomposition is applicable to all standard MPI collectives (broadcast, gather, scatter, allgather, reduce, allreduce, reduce-scatter, scan, alltoall), and the performance of our implementations can be readily compared to the native collectives of any given MPI library. Contrary to expectation, our full-lane, performance guideline implementations in many cases show surprising performance improvements with different MPI libraries on a dual-socket, dual-network Intel OmniPath cluster, indicating a large potential for improving the performance of native MPI library implementations. Our full-lane implementations are in many cases faster by large factors than the corresponding MPI collectives. We see similar results on a larger, dual-rail Intel InfiniBand cluster. The results indicate considerable room for improvement of the MPI collectives in current MPI libraries, including a more efficient use of multi-lane capabilities.
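As a rough illustration of what decomposing a collective over lanes can look like, the sketch below simulates a broadcast in plain Python without any MPI library. The three-step structure (node-local scatter, one independent broadcast per lane, node-local allgather) is an assumption made for illustration and is not spelled out in the abstract; the function name `full_lane_bcast` and its parameters are hypothetical.

```python
def full_lane_bcast(data, nodes, lanes):
    """Simulate a lane-decomposed broadcast of `data` (a list).

    Assumed model (hypothetical, for illustration only): `nodes` compute
    nodes with `lanes` processes each, where process i on every node is
    attached to network lane i. Returns buffers[node][lane], the payload
    held by each process after the broadcast.
    """
    # Step 1 (node-local scatter on the root node): split the payload
    # into one block per lane, using ceiling division for the block size.
    block = -(-len(data) // lanes)
    blocks = [data[i * block:(i + 1) * block] for i in range(lanes)]

    # Step 2 (one broadcast per lane): lane i delivers block i to
    # process i on every node. The broadcasts are independent of each
    # other, which is what would let them run concurrently on real lanes.
    received = [[blocks[lane] for lane in range(lanes)] for _ in range(nodes)]

    # Step 3 (node-local allgather): every process on a node collects
    # all blocks that arrived on that node and reassembles the payload.
    buffers = []
    for node in range(nodes):
        full = [item for blk in received[node] for item in blk]
        buffers.append([list(full) for _ in range(lanes)])
    return buffers


payload = list(range(10))
result = full_lane_bcast(payload, nodes=3, lanes=4)
```

In this model each lane carries only about 1/`lanes` of the payload between nodes, which is the intuition behind using several lanes at once; a real implementation would express the three steps with MPI communicators and collectives.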