Stanford AI team aims to fix social media content moderation
Last spring, researchers at the Stanford Institute for Human-Centered Artificial Intelligence (HAI) proposed a jury-learning algorithm that aims to improve online content moderation. They hope to deploy their system on platforms like Reddit and Discord in the near future.
The HAI team believes its system can create more inclusive online spaces by representing a greater diversity of voices when identifying toxic comments, which the researchers defined as unwanted and offensive speech. By modeling individual annotators (the people who provide the human answers, or ground truths, that the machine learns to imitate), jury learning allows users to mix and match different identities to populate a desired “jury” composition.
“We wanted to empower people who deploy machine learning models to make explicit choices about the voices their models reflect,” lead researcher Mitchell Gordon ’16, a fourth-year computer science Ph.D. student, wrote in a statement to the Daily.
Currently, human moderators define the line between appropriate content and toxic content. A veteran Reddit moderator who works with over 50 communities wrote that the biggest challenge in moderating content is ensuring that “no one… [feels] afraid to comment for fear of being attacked, yelled at or harassed.” The moderator requested anonymity for fear of reprisals.
However, moderators often struggle to come up with a consistent definition of what constitutes an “attack,” especially since different demographics, experiences and values will lead to different reactions, they wrote in a statement to the Daily.
Gordon and assistant professor of computer science Michael Bernstein recognized the challenge of consistent evaluation when they noticed the extent of disagreement among human annotators in toxic comment datasets. For example, members of the LGBTQ+ community may perceive gender-related comments differently than people who aren’t personally affected by them, Gordon said.
“If we simulated re-collecting this dataset with a different set of randomly chosen annotators, something like 40% of the ‘ground truth’ labels would flip (from toxic to non-toxic, or vice versa),” Gordon wrote.
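The kind of re-collection experiment Gordon describes can be illustrated with a small Python sketch. It is only a toy under stated assumptions, not the procedure the researchers used to arrive at their figure: the function names, the panel size of five and the majority-vote rule are all hypothetical, and the vote pools are made-up inputs.

```python
import random

def majority_label(votes):
    """Return 1 (toxic) if a strict majority of votes are 1, else 0."""
    return int(sum(votes) * 2 > len(votes))

def flip_rate(vote_pools, panel_size=5, seed=0):
    """Estimate how often a comment's majority label changes when a
    second, independently drawn panel of annotators labels it.

    vote_pools: one list of 0/1 votes per comment (hypothetical data).
    panel_size: annotators drawn per panel (assumed odd to avoid ties).
    """
    rng = random.Random(seed)
    flips = 0
    for pool in vote_pools:
        first = majority_label(rng.sample(pool, panel_size))
        second = majority_label(rng.sample(pool, panel_size))
        flips += first != second
    return flips / len(vote_pools)
```

In this toy, comments on which annotators agree unanimously never flip, while comments that split annotators evenly flip often, which is the instability the quote points to.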
This observation inspired the jury learning algorithm, which aims to focus on marginalized voices, including LGBTQ+ people and racial minorities, who are disproportionately affected by toxic content.
Current machine learning classification approaches, used by platforms such as Facebook and YouTube, do not explicitly resolve disagreements among annotators and instead assign the most popular ground-truth label. However, the researchers believe that the majority opinion does not always lead to a fair decision. “Ignoring people who disagree can be really problematic because voice matters,” Gordon wrote.
Gordon and the research team asked whose perspectives a machine learning classifier should represent when determining the toxicity of a comment.
The HAI research team, which includes Gordon, Bernstein, second-year computer science Ph.D. student Michelle Lam ’18, third-year computer science Ph.D. student Joon Sung Park, Kayur Patel MS ’05, professor of communication Jeffrey T. Hancock, and professor of computer science Tatsunori Hashimoto, proposed jury learning as a way to identify which perspectives to weigh when assessing the toxicity of a comment. Given a dataset of comments with corresponding ground-truth labels from multiple annotators, the algorithm models each annotator individually based on their labeling decisions and the demographic data they provided, such as their race, gender, or political identity.
Practitioners then decide on the distribution of the jury by specifying the demographics to be represented and in what proportion. The system randomly selects a sample of jurors from the training dataset that matches the specified distribution, and the model predicts the individual juror responses. This is repeated 100 times to create 100 parallel panels, and the final toxicity classification is produced by calculating the median of the means across all sampled groups. The system also generates individual juror decisions and presents a counterfactual jury – a jury distribution that would reverse the classifier’s prediction.
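The sampling and aggregation steps described above can be sketched in Python. This is a minimal illustration, not the HAI team's implementation: the annotator pool, the group names, the fixed scores in `SCORES` (standing in for the learned per-annotator model) and the function names are all hypothetical.

```python
import random
from statistics import mean, median

# Hypothetical annotator pool: annotator id -> demographic group.
ANNOTATORS = {
    "a1": "group_x", "a2": "group_x", "a3": "group_x",
    "a4": "group_y", "a5": "group_y", "a6": "group_y",
}

# Made-up toxicity scores in [0, 1]; a real system would get these
# from a model trained on each annotator's past labels.
SCORES = {"a1": 0.9, "a2": 0.8, "a3": 0.7, "a4": 0.2, "a5": 0.3, "a6": 0.4}

def predict_toxicity(annotator_id, comment):
    """Stand-in for the per-annotator model; it ignores the comment text."""
    return SCORES[annotator_id]

def sample_jury(composition, rng):
    """Randomly pick jurors so each group fills its requested seats.

    Assumes sampling without replacement, so a group cannot be given
    more seats than it has annotators in the pool.
    """
    jury = []
    for group, seats in composition.items():
        pool = [a for a, g in ANNOTATORS.items() if g == group]
        jury.extend(rng.sample(pool, seats))
    return jury

def jury_verdict(comment, composition, trials=100, seed=0):
    """Median, over repeated juries, of each jury's mean predicted score."""
    rng = random.Random(seed)
    jury_means = []
    for _ in range(trials):
        jury = sample_jury(composition, rng)
        jury_means.append(mean([predict_toxicity(j, comment) for j in jury]))
    return median(jury_means)

if __name__ == "__main__":
    print(jury_verdict("example comment", {"group_x": 2, "group_y": 2}))
```

Changing the seat counts in `composition` shifts the verdict. A composition that pushes the result across a chosen toxicity threshold is the idea behind a counterfactual jury.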
While the team’s current work focuses on content moderation, the researchers said jury learning could apply to other socially contested issues, where disagreements over ground-truth labels are common. Bernstein said jury learning also applies to creative tasks, where judging the value of a design can be deeply controversial depending on an annotator’s artistic background and style. He said medical scenarios were another example: experts from different specialties and backgrounds can provide different diagnoses for a particular patient. The researchers hope to further explore jury learning in various contexts, Bernstein said.
“Our hope is that we can inspire both industry and civil society to consider this kind of architecture as a flexible way to build more prescriptive algorithms appropriate in contested scenarios,” Bernstein said.
The HAI research team also hopes to further develop an ethical framework to guide practitioners in selecting the jury composition that best represents competing viewpoints on socially contested issues.