Talks and Poster Presentations (with Proceedings Entry):
T. Rausch, W. Hummer, V. Muthusamy:
"An Experimentation and Analytics Framework for Large-Scale AI Operations Platforms";
Talk: 2020 USENIX Conference on Operational Machine Learning (OpML 2020), Online Conference,
Berkeley, CA, USA;
2020-08-07; in: "Proceedings of the 2020 USENIX Conference on Operational Machine Learning (OpML 2020)".
This paper presents a trace-driven experimentation and analytics framework that allows researchers
and engineers to devise and evaluate operational strategies for large-scale AI workflow systems.
Analytics data from a production-grade AI platform developed at IBM are used to build a
comprehensive system and simulation model. Synthetic traces are made available for ad-hoc
exploration as well as statistical analysis of experiments to test and examine pipeline scheduling,
cluster resource allocation, and similar operational mechanisms.
Created from the Publication Database of the Vienna University of Technology.