Talks and Poster Presentations (without Proceedings-Entry):
"Accurately Measuring MPI Collectives with Synchronized Clocks";
Talk: Dagstuhl Seminar 15281: Algorithms and Scheduling Techniques to Manage Resilience and Power Consumption in Distributed Systems,
Schloss Dagstuhl, Wadern, Germany (invited);
We consider the problem of accurately measuring the time to complete an MPI collective operation, as the result strongly depends on how the time is measured.
Our goal is to develop an experimental method that allows for reproducible measurements of MPI collectives. When executing large parallel codes, MPI processes are often skewed in time when they enter a collective operation. For the sake of reproducibility, however, it is common to synchronize all processes before they call the MPI collective operation. We therefore take a closer look at two commonly used process synchronization schemes: (1) relying on MPI_Barrier and (2) applying a window-based scheme that uses a common global time. We analyze both schemes experimentally and show the pros and cons of each approach. As window-based schemes require a notion of global time, we thoroughly evaluate different clock synchronization algorithms in various experiments. We also propose a novel clock synchronization algorithm that combines two advantages of known algorithms: (1) it takes the inherent clock drift into account, and (2) it uses a tree-based synchronization scheme to reduce the synchronization duration.
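The idea of a drift-aware global time feeding a window-based scheme can be illustrated with a minimal sketch. It is not the algorithm proposed in the talk: it assumes that the offset between a process's local clock and a reference clock grows linearly with local time, fits that line by ordinary least squares, and omits both the message exchange that would produce the offset samples and the tree-based propagation. All names (LinearClockModel, fit_linear_model, window_start_times) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LinearClockModel:
    """Assumed model: offset(local) = slope * local + intercept,
    where offset = local clock reading - reference clock reading."""
    slope: float
    intercept: float

    def to_global(self, local_time: float) -> float:
        # Remove the predicted offset to estimate the reference (global) time.
        return local_time - (self.slope * local_time + self.intercept)

    def to_local(self, global_time: float) -> float:
        # Invert to_global: local * (1 - slope) - intercept = global.
        return (global_time + self.intercept) / (1.0 - self.slope)


def fit_linear_model(samples: List[Tuple[float, float]]) -> LinearClockModel:
    """Least-squares fit over (local_time, measured_offset) pairs."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to estimate drift")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0.0:
        raise ValueError("samples must span more than one point in time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return LinearClockModel(slope, mean_y - slope * mean_x)


def window_start_times(first_start: float, window: float, reps: int) -> List[float]:
    """Global start time of each measurement window (one per repetition)."""
    return [first_start + k * window for k in range(reps)]
```

In a window-based measurement, each process would convert every global start time to its own clock with to_local, wait until that local time, and only then call the collective. If a process reaches a window late, the corresponding measurement is usually discarded, which is one reason the window length has to be chosen with care.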
Created from the Publication Database of the Vienna University of Technology.