Data Challenge Track
Data is the foundation of many important decision-making processes in performance engineering tasks of modern systems. Data can tell us about the past and present of a system’s performance, and it can help us predict the system’s future performance. Therefore, ICPE 2022 will host a data challenge track for the first time, inspired by several other conferences, such as MSR and PROMISE.
In this track, an industrial performance dataset will be provided. Participants are invited to come up with research questions about the dataset and to study them. The challenge is open-ended: participants can choose the research questions that they find most interesting. The proposed approaches and/or tools and their findings are discussed in short papers and presented at the main conference.
How to participate in the challenge
Read the data description
Think of something cool to do with the data. This can be anything you want, including a visualization, analysis, approach or tool
Implement your idea, evaluate it, and write down your idea and the results in a short paper
This year, the challenge dataset is provided by MongoDB.
MongoDB runs performance tests in its continuous integration system. Hundreds of tests run in multiple system configurations generating thousands of performance results. Those results are analyzed using change point detection to identify changes in performance, which are then manually triaged and assigned to developers to fix. MongoDB has previously discussed this process in papers presented at ICPE 2020 and ICPE 2021, and released the underlying source code. Now it has opened up the underlying dataset. The dataset includes the performance results from multiple years. It also includes the related data, such as the computed change points, triage decisions, and tickets opened to address the issues.
The use of change point detection greatly improved the ability to detect when performance changed. However, the current algorithm still identifies many changes that are not actionable, either because they are due to system noise or because they are too small to act on. This imposes a large load on the people who triage the results and leads to a risk of missing some changes.
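To make the idea of change point detection concrete, the following is a minimal sketch of finding a single mean shift in a series of performance results. It is an illustrative simplification, not the algorithm MongoDB uses; the function name `best_split`, the `min_size` parameter, and the sample numbers are all hypothetical.

```python
from statistics import mean


def best_split(series, min_size=3):
    """Return (index, gain) for the split that best separates two means.

    `gain` is the reduction in squared error obtained by fitting one mean
    per side instead of a single mean for the whole series. The index is
    the position of the first result after the change. Returns (None, 0.0)
    if the series is too short or no split improves the fit.
    Hypothetical helper for illustration only.
    """
    n = len(series)
    if n < 2 * min_size:
        return None, 0.0

    overall = mean(series)
    total_sse = sum((x - overall) ** 2 for x in series)

    best_idx, best_gain = None, 0.0
    for i in range(min_size, n - min_size + 1):
        left, right = series[:i], series[i:]
        left_mean, right_mean = mean(left), mean(right)
        sse = sum((x - left_mean) ** 2 for x in left)
        sse += sum((x - right_mean) ** 2 for x in right)
        gain = total_sse - sse
        if gain > best_gain:
            best_idx, best_gain = i, gain
    return best_idx, best_gain


# Made-up throughput numbers with a shift after the sixth result.
results = [100, 101, 99, 100, 102, 98, 120, 121, 119, 122, 120, 118]
index, gain = best_split(results)
print(index, gain)
```

A sketch like this reports a best split for almost any noisy series, which mirrors the problem described above: a practical detector also needs a significance test or a minimum effect size to suppress changes that are not actionable.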
Possible high-level ideas for participants include, but are not limited to:
Improve on the existing change point detection algorithm (improve sensitivity and precision)
Develop algorithms that automatically triage the change points produced by the system
Explore correlations between the performance of different tests, configurations, and commits
Suggest improved scheduling algorithms that reduce the total number of test executions or the total detection time without hurting accuracy
Develop algorithms or visualization techniques that compare performance over time (e.g., year over year) to determine which changes over that period are statistically significant (as opposed to comparing only individual points)
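As one illustration of the last idea, the sketch below compares two periods of results with a permutation test on the difference of means, rather than comparing two individual points. It is only a starting point under stated assumptions: the function name `permutation_p_value`, its parameters, and the sample data are hypothetical and are not taken from the dataset.

```python
import random
from statistics import mean


def permutation_p_value(before, after, rounds=2000, seed=0):
    """Estimate a two-sided p-value for a difference in means.

    Repeatedly shuffles the pooled results and counts how often a random
    relabelling produces a mean difference at least as large as the one
    observed. Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(mean(after) - mean(before))
    pooled = list(before) + list(after)
    n = len(before)

    hits = 0
    for _ in range(rounds):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[n:]) - mean(pooled[:n]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (rounds + 1)


# Made-up results for the same test in two different periods.
last_year = [100, 101, 99, 100, 102, 98]
this_year = [120, 121, 119, 122, 120, 118]
print(permutation_p_value(last_year, this_year))
```

A permutation test makes no assumption about the distribution of the results, which can be convenient for noisy performance data, but it treats results as independent; series with trends or autocorrelation would need a different approach.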
The submission deadline aligns with those of the other early-year tracks (poster, tutorial, demo, WiP/vision, and workshops) and can be found here.
A challenge paper should contain the following elements:
A description of the problem that you are studying, and an explanation of why the problem is important
A description of the solution that you are proposing
An evaluation of the solution
A discussion of the implications of your solution and results
We strongly encourage authors to include the source code of the solution with the submission (e.g., in a GitHub repository), but this is not mandatory for acceptance of a data challenge paper.
The page limit for challenge papers is 4 pages (including all figures and tables) + 1 page for references. Challenge papers will be published in the companion to the ICPE 2022 proceedings. All challenge papers will be reviewed by at least two program committee members. Note that submissions to this track are double-blind: for details, see the Double Blind FAQ page. The track chairs and the program committee members will give an award to the best data challenge paper.
Submissions are made via the ICPE EasyChair (https://easychair.org/conferences/?conf=icpe2022) by selecting the respective track.
Data Challenge Chairs
- Cor-Paul Bezemer (University of Alberta)
- David Daly (MongoDB)
- Weiyi Shang (Concordia University)