

Scientific Reports:

J. Träff, A. Carpen-Amarie, S. Hunold, A. Rougier:
"Message-Combining Algorithms for Isomorphic, Sparse Collective Communication";
Report for CoRR - Computing Research Repository; Report No. arXiv:1606.07676, 2016; 12 pages.



English abstract:
Isomorphic (sparse) collective communication is a form of collective communication in which all involved processes communicate in small, identically structured neighborhoods of other processes. Isomorphic neighborhoods are defined via an embedding of the processes in a regularly structured topology, e.g., a d-dimensional torus, which may correspond to the physical communication network of the underlying system. Isomorphic collective communication is useful for implementing stencil and other regular, sparse distributed computations, where the assumption that all processes behave (almost) symmetrically is justified.
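To make the notion of an isomorphic neighborhood concrete, the following sketch (illustrative only; all names are our own, not from the paper) computes, by purely local arithmetic, the rank of the process at a given relative offset on a d-dimensional torus with wraparound. Applying the same set of offsets at every rank yields identically structured neighborhoods:

```c
#include <assert.h>

#define D 2  /* torus dimensionality; 2-D for illustration */

/* Row-major conversion from a rank to torus coordinates. */
static void rank_to_coord(int rank, const int dims[D], int coord[D]) {
    for (int i = D - 1; i >= 0; i--) {
        coord[i] = rank % dims[i];
        rank /= dims[i];
    }
}

/* Row-major conversion from torus coordinates (any integers) back to a
 * rank, wrapping around in every dimension. */
static int coord_to_rank(const int coord[D], const int dims[D]) {
    int rank = 0;
    for (int i = 0; i < D; i++) {
        int c = ((coord[i] % dims[i]) + dims[i]) % dims[i];
        rank = rank * dims[i] + c;
    }
    return rank;
}

/* Rank of the process at relative offset `off` from `rank`. The same
 * offset applied at every rank yields an isomorphic neighborhood. */
static int neighbor(int rank, const int off[D], const int dims[D]) {
    int coord[D];
    rank_to_coord(rank, dims, coord);
    for (int i = 0; i < D; i++) coord[i] += off[i];
    return coord_to_rank(coord, dims);
}
```

On a 4x4 torus, for instance, the neighbor of rank 0 at offset (1, 0) is rank 4, and the offset (-1, -1) wraps around to rank 15; no global information is needed for this computation.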
In this paper, we show how efficient message-combining communication schedules for isomorphic, sparse collective communication can easily and efficiently be computed by purely local computations. We give schemes for isomorphic alltoall and allgather communication that reduce the number of communication rounds, and thereby the communication latency, from s to at most Nd for neighborhoods consisting of s processes, with the (small) factor N depending on the structure of the neighborhood and the capabilities of the communication system. Using these schedules, we give zero-copy implementations of the isomorphic collectives using MPI and its derived datatypes to eliminate explicit, process-local copy operations. By benchmarking the collective communication algorithms against straightforward implementations and against the corresponding MPI neighborhood collectives, we document significant latency improvements of our implementations for block sizes of up to a few kilobytes. We discuss further optimizations for computing even better schedules, some of which have been implemented and benchmarked.
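The scale of the claimed round reduction can be illustrated with a simple count (this is back-of-the-envelope arithmetic under an assumed full stencil neighborhood, not the paper's actual schedule construction): a full (2r+1)^d stencil requires one message per neighbor in a naive schedule, while dimension-ordered message combining, which forwards previously received blocks along one torus dimension at a time, needs at most r rounds in each of the 2d directions:

```c
#include <assert.h>

/* Naive schedule: one message per neighbor in a full (2r+1)^d stencil
 * (excluding the process itself), i.e., s = (2r+1)^d - 1 rounds. */
static long naive_rounds(int d, int r) {
    long s = 1;
    for (int i = 0; i < d; i++) s *= 2L * r + 1;
    return s - 1;
}

/* Dimension-ordered combining: r rounds in each of the 2d directions,
 * forwarding combined payloads, i.e., at most 2rd rounds. */
static long combined_rounds(int d, int r) {
    return 2L * r * d;
}
```

For a radius-1 stencil in two dimensions this is 8 rounds versus 4, and in three dimensions 26 versus 6, consistent with the latency reduction from s to a small multiple of d claimed above.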
The proposed message-combining schedules are difficult to incorporate as implementations for the MPI neighborhood collectives, since MPI processes lack the necessary information about the global structure of the communication pattern. If this information is externally asserted, our algorithms can be used to improve the performance of the MPI neighborhood collectives for isomorphic communication patterns for latency-sensitive problem sizes.