Each of us continuously generates a stream of data. When we buy a coffee, watch a romcom or action movie, or visit the gym or the doctor’s office (tracked by our phones), we hand over our data to companies that hope to make money from that information – either by using it to train an AI system to predict our future behavior or by selling it to others.
But what is that data worth?
“There’s a lot of interest in thinking about the value of data,” says James Zou, assistant professor of biomedical data science at Stanford University, member of the Stanford Institute for Human-Centered Artificial Intelligence, and faculty lead of a new HAI executive education program on the subject. How should companies set prices for data they buy and sell? How much does any given dataset contribute to a company’s bottom line? Should each of us receive a data dividend when companies use our data?
Motivated by these questions, Zou and graduate student Amirata Ghorbani have developed a new, principled approach to calculating the value of data used to train AI models. Their approach, detailed in a paper presented at the International Conference on Machine Learning and summarized for a slightly less technical audience on arXiv, is based on a Nobel Prize-winning economics method and improves upon existing methods for determining the worth of individual datapoints or datasets. In addition, it can help AI system designers identify low-value data that should be excluded from AI training sets as well as high-value data worth acquiring. It can even be used to reduce bias in AI systems.
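To make the idea concrete, here is a minimal sketch. It assumes the economics method in question is the Shapley value from cooperative game theory, which credits each "player" (here, each training datapoint) with its average marginal contribution across every possible ordering of the players. The toy dataset, the `accuracy_1nn` utility, and the choice to score an empty training set as 0.0 are all hypothetical, chosen only for illustration. This is not the authors' implementation, and exact enumeration like this is only feasible for a handful of points.

```python
from itertools import permutations

def shapley_values(points, utility):
    """Exact Shapley value of each point: its average marginal
    contribution to `utility` over every ordering of the points."""
    n = len(points)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        subset = []
        prev = utility(subset)
        for i in order:
            subset.append(points[i])
            cur = utility(subset)
            totals[i] += cur - prev  # marginal gain from adding point i
            prev = cur
    return [t / len(orderings) for t in totals]

# Hypothetical toy data: (feature, label) pairs.
# The third training point sits near 0 but carries label 1 (mislabeled).
train = [(0.0, 0), (1.0, 1), (0.1, 1)]
validation = [(0.02, 0), (0.2, 0), (0.9, 1), (1.1, 1)]

def accuracy_1nn(subset):
    """Validation accuracy of a 1-nearest-neighbor classifier trained
    on `subset`. Assumption: an empty training set scores 0.0."""
    if not subset:
        return 0.0
    correct = 0
    for x, y in validation:
        _, predicted = min(subset, key=lambda p: abs(p[0] - x))
        correct += (predicted == y)
    return correct / len(validation)

values = shapley_values(train, accuracy_1nn)
for point, value in zip(train, values):
    print(point, round(value, 3))
```

In this toy setup the mislabeled point receives the lowest value of the three, which is the kind of signal a designer could use to flag data for removal. The values also add up to the accuracy obtained with the full training set, so the total "worth" is split among the points with nothing left over.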
Study co-author Amirata Ghorbani is a 2016 SGF Fellow.