
Why AI Struggles To Recognize Toxic Speech on Social Media

Jul 13 2021

Facebook says its artificial intelligence models identified and pulled down 27 million pieces of hate speech in the final three months of 2020. In 97 percent of the cases, the systems took action before humans had even flagged the posts.

That’s a huge advance, and all the other major social media platforms are using AI-powered systems in similar ways. Given that people post hundreds of millions of items every day, from comments and memes to articles, there’s no real alternative. No army of human moderators could keep up on its own.

But a team of human-computer interaction and AI researchers at Stanford sheds new light on why automated speech police can score with high accuracy on technical tests yet provoke a lot of dissatisfaction from humans with their decisions. The main problem: There is a huge difference between evaluating more traditional AI tasks, like recognizing spoken language, and the much messier task of identifying hate speech, harassment, or misinformation — especially in today’s polarized environment.

Read the study: The Disagreement Deconvolution: Bringing Machine Learning Performance Metrics In Line With Reality

“It appears as if the models are getting almost perfect scores, so some people think they can use them as a sort of black box to test for toxicity,’’ says Mitchell Gordon, a PhD candidate in computer science who worked on the project. “But that’s not the case. They’re evaluating these models with approaches that work well when the answers are fairly clear, like recognizing whether ‘java’ means coffee or the computer language, but these are tasks where the answers are not clear.”
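
To make that evaluation gap concrete, here is a minimal sketch using small, entirely hypothetical annotation data (not drawn from the study): a classifier can look perfect when scored against majority-vote labels while still disagreeing with a meaningful share of the individual people who rated the same posts.

```python
# Minimal sketch (hypothetical data): why high accuracy against majority-vote
# labels can hide disagreement with individual annotators.

# Five hypothetical posts, each labeled "toxic" / "not toxic" by three annotators.
annotations = [
    ["toxic", "toxic", "not toxic"],
    ["toxic", "not toxic", "not toxic"],
    ["not toxic", "not toxic", "not toxic"],
    ["toxic", "toxic", "toxic"],
    ["not toxic", "toxic", "not toxic"],
]

# A hypothetical model's predictions for the same five posts.
predictions = ["toxic", "not toxic", "not toxic", "toxic", "not toxic"]

# Standard evaluation: compare the model against the majority vote per post.
majority = [max(set(labels), key=labels.count) for labels in annotations]
accuracy = sum(p == m for p, m in zip(predictions, majority)) / len(predictions)

# Alternative view: how often does the model agree with each individual rating?
agreements = [
    p == label
    for p, labels in zip(predictions, annotations)
    for label in labels
]
per_rating_agreement = sum(agreements) / len(agreements)

print(f"Accuracy vs. majority vote:        {accuracy:.0%}")              # 100%
print(f"Agreement with individual ratings: {per_rating_agreement:.0%}")  # 80%
```

On this toy data the model is "perfect" by the usual metric, yet one in five individual ratings disagrees with it — the kind of gap the study's disagreement-aware evaluation is designed to surface.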

The team hopes their study will illuminate the gulf between what developers think they’re achieving and the reality — and perhaps help them develop systems that grapple more thoughtfully with the inherent disagreements around toxic speech.

Co-authors include Mitchell Gordon, a 2016 SGF Fellow, and Kaitlyn Zhou, a 2019 SGF Fellow.

Read the full article