"k-ported vs. k-lane Broadcast, Scatter, and Alltoall Algorithms";
Report for CoRR - Computing Research Repository;
Report No. arXiv:2008.12144,
In k-ported message-passing systems, a processor can simultaneously receive k different messages from k other processors and send k different messages to k other processors, which may or may not be the processors from which it receives. Modern clustered systems may not have such capabilities. Instead, compute nodes consisting of n processors can simultaneously send k messages to and receive k messages from other nodes by letting k processors on each node concurrently send and receive at most one message each. We pose the question of how to design good algorithms for this k-lane model, possibly by adapting algorithms devised for the traditional k-ported model. We discuss and compare a number of (non-optimal) k-lane algorithms for the broadcast, scatter and alltoall collective operations (as found in, e.g., MPI), and experimentally evaluate these on a small 36×32-node cluster with a dual OmniPath network (corresponding to k=2). Results are preliminary.
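As an illustration of the k-ported model described in the abstract (this sketch is not taken from the report), the following assumes a fully connected network in which every processor that already holds the broadcast message can forward it to k new processors per communication round. Under that assumption the number of informed processors grows by a factor of at most k+1 per round. The function name `kported_broadcast_rounds` is hypothetical, and the sketch says nothing about the k-lane algorithms the report evaluates.

```python
def kported_broadcast_rounds(p: int, k: int) -> int:
    """Count rounds needed to inform p processors in a k-ported broadcast.

    Assumption (not from the report): fully connected network, and each
    informed processor sends to k uninformed processors per round.
    """
    if p < 1 or k < 1:
        raise ValueError("p and k must be positive integers")
    informed = 1  # only the root holds the message initially
    rounds = 0
    while informed < p:
        # Each informed processor informs k others, so the count
        # multiplies by (k + 1) in one round.
        informed *= k + 1
        rounds += 1
    return rounds


if __name__ == "__main__":
    for p, k in [(8, 1), (9, 2), (36, 2)]:
        print(p, k, kported_broadcast_rounds(p, k))
```

With k=1 the count reduces to the familiar binomial-tree behaviour, where the informed set doubles every round; larger k shortens the round count accordingly.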
Created from the Publication Database of the Vienna University of Technology.