About DataPerf

We, researchers from Coactive.AI, ETH Zurich, Google, Harvard University, Landing.AI, Meta, Stanford University, and TU Eindhoven, are announcing DataPerf, a new benchmark suite for machine learning datasets and data-centric algorithms. We presented DataPerf at the NeurIPS Data-centric AI Workshop. Going forward, we invite you to join us in defining and developing the benchmark suite in the DataPerf Working Group hosted by the MLCommons® Association. If you are interested in using the DataPerf benchmarks or participating in leaderboards and challenges based on DataPerf in 2023, please sign up for DataPerf-announce (click the link, then "Ask to Join").

Introduction. DataPerf is a benchmark suite for ML datasets and data-centric algorithms. Historically, ML research has focused primarily on models and has simply used the largest available dataset for a given task, without considering that dataset’s breadth, difficulty, or fidelity to the underlying problem. This under-emphasis on data has led to a range of issues, from data cascades in real applications to the saturation of existing dataset-driven benchmarks for model quality, which impedes research progress. To catalyze increased research focus on data quality and foster data excellence, we created DataPerf: a suite of benchmarks that evaluate the quality of training and test data, as well as the algorithms for constructing or optimizing such datasets (e.g., core-set selection or labeling-error debugging), across a range of common ML tasks such as image classification. We plan to leverage the DataPerf benchmarks through challenges and leaderboards.
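To make the idea of a data-centric benchmark concrete, here is a minimal, hypothetical sketch of how a training-data algorithm such as core-set selection could be scored: the submission chooses a fixed-size subset of a training pool, a fixed model is trained on that subset alone, and the resulting accuracy on a held-out set serves as the metric. The toy dataset, fixed model, and function names (select_coreset, score_selection) are illustrative assumptions, not part of the official DataPerf harness.

```python
# Hypothetical sketch (not the official DataPerf harness): score a
# training-set selection algorithm by training a fixed model on the
# selected subset and measuring held-out accuracy.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def select_coreset(X_pool, y_pool, budget, rng):
    """Placeholder selection algorithm: a random subset of `budget` examples.
    A real submission would replace this with a smarter strategy
    (e.g., clustering- or uncertainty-based selection)."""
    return rng.choice(len(X_pool), size=budget, replace=False)


def score_selection(X_pool, y_pool, X_test, y_test, idx):
    """Fixed evaluation: train a fixed model on the selected subset only."""
    model = LogisticRegression(max_iter=1000)
    model.fit(X_pool[idx], y_pool[idx])
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    # Toy stand-in for a benchmark task: digit classification.
    X, y = load_digits(return_X_y=True)
    X_pool, X_test, y_pool, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )
    rng = np.random.default_rng(0)
    idx = select_coreset(X_pool, y_pool, budget=200, rng=rng)
    acc = score_selection(X_pool, y_pool, X_test, y_test, idx)
    print(f"accuracy with 200 selected training examples: {acc:.3f}")
```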

Inspiration. We are motivated by a number of prior efforts, including efforts to develop adversarial data such as Cats4ML and Dynabench, efforts to develop specific benchmarks or similar suites such as the DCAI competition and DCBench, and the MLPerf™ benchmarks for ML speed. We aim to provide clear evaluation criteria and to encourage the kind of rapid innovation showcased at venues such as the NeurIPS Datasets and Benchmarks track. As with the MLPerf effort, we have brought together the leaders of these motivating efforts to build DataPerf.


Goals. DataPerf aims to evaluate the quality of training and test datasets, to evaluate the data-centric algorithms used to construct and optimize those datasets, and, more broadly, to catalyze increased research focus on data quality and foster data excellence across common ML tasks.

General Approach. Our general approach is to define benchmark types for training sets, test sets, and a range of data-centric algorithms, and then to define specific benchmarks by applying a benchmark type to a common ML task such as image classification or spoken keyword identification. In this way, the DataPerf benchmark suite is defined as the cross product { benchmark types } × { ML tasks }, as the sketch below illustrates.
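The following snippet simply enumerates that cross product from placeholder lists; the specific entries are examples drawn from this page, not the official DataPerf roster of benchmarks.

```python
# Illustrative only: the suite is the cross product
# { benchmark types } x { ML tasks }.
from itertools import product

benchmark_types = [
    "training-set benchmark",
    "test-set benchmark",
    "selection-algorithm benchmark",
    "debugging-algorithm benchmark",
]
ml_tasks = ["image classification", "spoken keyword identification"]

for bench_type, task in product(benchmark_types, ml_tasks):
    print(f"{bench_type} for {task}")
```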

Benchmark Types and Metrics. The DataPerf suite includes benchmark types covering training sets, test sets, and data-centric algorithms such as core-set selection and labeling-error debugging. Each benchmark type uses a different metric, though in principle every metric measures either the efficacy of a training set or the breadth and difficulty of a test set.
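As a purely illustrative example of a metric in the second family, one could score a candidate test set by the error rate of a fixed reference model on it, so that broader or more difficult test sets score higher. The function below is an assumption for illustration, not an official DataPerf metric.

```python
# Hypothetical test-set metric: fraction of submitted test examples that a
# fixed reference model misclassifies (higher = harder test set).
from typing import Any, Callable, Sequence


def test_set_difficulty(reference_predict: Callable[[Any], Any],
                        examples: Sequence[Any],
                        labels: Sequence[Any]) -> float:
    """Error rate of the fixed reference model on the submitted test set."""
    wrong = sum(reference_predict(x) != y for x, y in zip(examples, labels))
    return wrong / len(labels)


if __name__ == "__main__":
    # Toy usage with a trivial reference "model" that always predicts class 0.
    examples = [0.1, 0.9, 0.4, 0.8]   # placeholder inputs
    labels = [0, 1, 0, 1]             # ground-truth labels
    print(test_set_difficulty(lambda x: 0, examples, labels))  # -> 0.5
```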


Leaderboards and Challenges. In 2023, we will launch leaderboards and challenges based on the DataPerf benchmarks to encourage constructive competition, identify best-of-breed ideas, and inspire the next generation of concepts for building and optimizing datasets. We will operate the leaderboards and challenges using a platform based on Dynabench.

Organization. The DataPerf benchmarks, leaderboards, challenges, and platform will be hosted by the MLCommons Association, which also hosts the MLPerf Benchmarks. The MLCommons Association is a non-profit engineering consortium with over 50 members including large tech companies, startups, and academics. The MLCommons Association’s mission is to make ML better for everyone through benchmarks, public datasets, best practices, and research.


Learn more. You can read a full description of DataPerf in the whitepaper.