
A New Approach to the Data-Deletion Conundrum

[Illustration: people being erased by a pencil]
Sep 24, 2021

Rising consumer concern over data privacy has led to a rush of “right to be forgotten” laws around the world that allow individuals to request that their personal data be expunged from the massive databases cataloging our increasingly online lives. Researchers in artificial intelligence have observed that user data does not exist only in its raw form in a database; it is also implicitly contained in models trained on that data. So far, they have struggled to find methods for deleting these “traces” of users efficiently, and the more complex the model, the more challenging deletion becomes.

“The exact deletion of data — the ideal — is hard to do in real time,” says James Zou, a professor of biomedical data science at Stanford University and an expert in artificial intelligence. “In training our machine learning models, bits and pieces of data can get embedded in the model in complicated ways. That makes it hard for us to guarantee a user has truly been forgotten without altering our models substantially.”

Zou is senior author of a paper recently presented at the International Conference on Artificial Intelligence and Statistics (AISTATS) that may provide an answer to the data-deletion problem that works for privacy-conscious individuals and artificial intelligence experts alike. They call it approximate deletion.

“Approximate deletion, as the name suggests, allows us to remove most of the users’ implicit data from the model. They are ‘forgotten,’ but in such a way that we can do the retraining of our models at a later, more opportune time,” says Zach Izzo, a graduate student in mathematics and the first author of the AISTATS paper.

Approximate deletion is especially useful for quickly removing sensitive information, or features unique to a given individual that could later be used for identification, while postponing the computationally intensive full retraining of the model to times of lower demand. Under certain assumptions, Zou says, approximate deletion even achieves the holy grail of exact deletion of a user’s implicit data from the trained model.
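To make that workflow concrete, here is a minimal NumPy sketch of deleting one user’s point from a trained linear model. It illustrates the general strategy rather than the exact algorithm from the AISTATS paper, and every name in it is hypothetical: for plain linear regression the rank-one “downdate” below happens to recover the fully retrained model, while for more complex models the analogous Newton- or influence-style updates are only approximate, which is where approximate deletion gets its name.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1_000, 5
X = rng.normal(size=(n, d))                 # each row is one user's data
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

lam = 1e-3                                  # small ridge term for stability
A = X.T @ X + lam * np.eye(d)               # Hessian of the squared loss
b = X.T @ y
w = np.linalg.solve(A, b)                   # the trained model

# "Delete" user 0 without touching the other n - 1 rows: a rank-one
# downdate of the Hessian, costing O(d^2) instead of the O(n * d^2)
# needed to rebuild X.T @ X from scratch.
x0, y0 = X[0], y[0]
A_del = A - np.outer(x0, x0)
b_del = b - y0 * x0
w_del = np.linalg.solve(A_del, b_del)

# Deferred exact retraining on the remaining data, for comparison.
A_exact = X[1:].T @ X[1:] + lam * np.eye(d)
w_exact = np.linalg.solve(A_exact, X[1:].T @ y[1:])
print(np.allclose(w_del, w_exact))          # True: no trace of user 0 remains
```

The appeal is that the per-deletion cost does not grow with the number of training points, so a service could honor deletion requests immediately and batch the expensive full retrain for a quieter time, the trade-off described above.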

Study co-author Zachary Izzo is a 2020 SIGF Fellow.
