Talks and Poster Presentations (with Proceedings-Entry):

S. Hunold, A. Bhatele, G. Bosilca, P. Knees:
"Predicting MPI Collective Communication Performance Using Machine Learning";
Talk: IEEE International Conference on Cluster Computing (IEEE Cluster 2020) - Online Conference, Kobe, Japan; 2020-09-14 - 2020-09-17; in: "Proceedings of the IEEE International Conference on Cluster Computing (IEEE Cluster 2020)", IEEE, (2020), ISBN: 978-1-7281-6678-0; 259 - 269.

English abstract:
The Message Passing Interface (MPI) defines the semantics of data communication operations, while the implementing libraries provide several parameterized algorithms for each operation. Each algorithm of an MPI collective operation may work best on a particular system and may depend on the specific communication problem. Internally, MPI libraries employ heuristics to select the best algorithm for a given communication problem when called by an MPI application. The majority of MPI libraries allow users to override the default algorithm selection, enabling the tuning of this selection process. The problem then becomes how to select the best possible algorithm for a specific case automatically. In this paper, we address the algorithm selection problem for MPI collective communication operations. To solve this problem, we propose an auto-tuning framework for collective MPI operations based on machine-learning techniques. First, we execute a set of benchmarks of an MPI library and its entire set of collective algorithms. Second, for each algorithm, we fit a performance model by applying regression learners. Last, we use the regression models to predict the best possible (fastest) algorithm for an unseen communication problem. We evaluate our approach for different MPI libraries and several parallel machines. The experimental results show that our approach outperforms the standard algorithm selection heuristics, which are hard-coded into the MPI libraries, by a significant margin.
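
The three steps named in the abstract (benchmark every algorithm, fit one regression model per algorithm, pick the algorithm with the lowest predicted runtime) can be illustrated with a minimal sketch. This is not the authors' framework: the algorithm labels `binomial_tree` and `ring`, the cost formulas in `synthetic_runtime`, the log-scaled features, and the hand-written `KNNRegressor` are all assumptions made for illustration, and the timings are synthetic rather than measured.

```python
import math

# Hypothetical algorithm labels for one collective operation.
ALGORITHMS = ["binomial_tree", "ring"]


def synthetic_runtime(algorithm, procs, msg_bytes):
    """Made-up cost formulas (microseconds) standing in for real measurements."""
    if algorithm == "binomial_tree":
        return 5.0 * math.log2(procs) + 0.010 * msg_bytes * math.log2(procs)
    if algorithm == "ring":
        return 5.0 * (procs - 1) + 0.002 * msg_bytes
    raise ValueError(f"unknown algorithm: {algorithm}")


def run_benchmarks():
    """Step 1: collect (problem, runtime) samples for every algorithm."""
    data = {}
    for algorithm in ALGORITHMS:
        samples = []
        for procs in (4, 8, 16, 32, 64):
            for msg_bytes in (64, 1024, 16384, 262144, 4194304):
                runtime = synthetic_runtime(algorithm, procs, msg_bytes)
                samples.append(((procs, msg_bytes), runtime))
        data[algorithm] = samples
    return data


def features(procs, msg_bytes):
    """Log-scale both inputs so distances are comparable across magnitudes."""
    return (math.log2(procs), math.log2(msg_bytes))


class KNNRegressor:
    """Plain k-nearest-neighbor regression: average the k closest samples."""

    def __init__(self, k=3):
        self.k = k
        self.points = []

    def fit(self, samples):
        self.points = [(features(*problem), runtime) for problem, runtime in samples]
        return self

    def predict(self, procs, msg_bytes):
        query = features(procs, msg_bytes)
        nearest = sorted(self.points, key=lambda p: math.dist(p[0], query))[: self.k]
        return sum(runtime for _, runtime in nearest) / len(nearest)


def fit_models(data, k=3):
    """Step 2: fit one performance model per algorithm."""
    return {algorithm: KNNRegressor(k).fit(samples) for algorithm, samples in data.items()}


def select_algorithm(models, procs, msg_bytes):
    """Step 3: predict each algorithm's runtime and return the fastest one."""
    predicted = {a: m.predict(procs, msg_bytes) for a, m in models.items()}
    return min(predicted, key=predicted.get), predicted


if __name__ == "__main__":
    models = fit_models(run_benchmarks(), k=3)
    # Neither problem below appears in the benchmark grid.
    for procs, msg_bytes in [(6, 128), (6, 2 ** 21)]:
        best, predicted = select_algorithm(models, procs, msg_bytes)
        print(procs, msg_bytes, best, predicted)
```

Keeping one model per algorithm, rather than a single classifier over algorithms, mirrors the per-algorithm performance models described in the abstract; the selection step then reduces to taking the minimum over the predicted runtimes.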

Keywords:
Message Passing Interface, Performance Prediction, Auto-tuning, Machine Learning, GAM, XGBoost, KNN

Created from the Publication Database of the Vienna University of Technology.