Assistant professor’s two NSF grants aim to better sort social media content and identify online trolls
Social media discussions, both healthy and unhealthy, fuel much of the public discourse and media coverage in our 21st-century world. Some people use platforms like Facebook, Twitter, and Reddit to make positive connections, but others prefer to spread misinformation and hatred.
Given the popularity of these platforms and others like them, which see millions of posts every day, it can be difficult for researchers to understand what is being shared and how it affects our opinions on political and social topics.
Assistant Professor Jeremy Blackburn – a faculty member in the Computer Science Department at Binghamton University’s Thomas J. Watson College of Engineering and Applied Science since 2019 – is developing ways to make it easier to collect and sort content online, especially from emerging social media platforms.
Blackburn recently received a National Science Foundation CAREER Award of $517,484 over five years for his project “Towards a Data-Driven Understanding of Online Sentiment.” The CAREER Award supports early-career faculty who have the potential to serve as academic role models in research and education.
The project has four objectives:
- Create a cross-platform social media dataset by developing tools that draw on previous experience in large-scale data collection to continuously identify and collect multimedia data from emerging social media platforms.
- Develop data-driven techniques for understanding the coded language used in social media, both text and images.
- Develop a new system to gauge the sentiment of content by comparing items rather than looking at them individually.
- Explore online sentiment modeling at the user and community level.
“The focus is on the pictures,” Blackburn said. “Can we deduce the underlying feeling or meaning of an image? Images are used almost as much as text on the internet, and it’s hard to understand what people are talking about if you can’t understand the visual language they are using.”
Current algorithms assess the sentiment of an image by rating it in isolation and assigning it an independent score, he said. For example, one tweet might score a 0.4 on a predetermined “happiness scale,” while another might score a 0.5 – but what does this incremental difference mean to humans?
Instead, by showing two pieces of content and asking which one is more positive, Blackburn hopes to have a better measure of the emotion behind it. What complicates this effort, however, is how the pictures become memes among certain subsets of online commentators.
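To make the pairwise idea concrete, here is a minimal sketch of how "which of these two is more positive?" judgments could be turned into relative scores. It uses the Bradley-Terry model, a standard technique for ranking from pairwise comparisons; the article does not say which method Blackburn's project uses, so the model choice, the function name `bradley_terry`, the smoothing value `prior`, and the item labels are all assumptions made for illustration.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=100, prior=0.1):
    """Estimate a relative positivity score per item.

    comparisons: list of (more_positive, less_positive) judgments.
    prior: small pseudo-win given to every item (an assumption of this
           sketch) so that items that never win keep a nonzero score.
    """
    wins = defaultdict(float)        # times an item was judged more positive
    pair_counts = defaultdict(int)   # times each unordered pair was compared
    for winner, loser in comparisons:
        if winner == loser:
            continue                 # a self-comparison carries no information
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    items = sorted({item for pair in pair_counts for item in pair})
    scores = {item: 1.0 for item in items}

    for _ in range(iterations):
        updated = {}
        for i in items:
            # Sum over every opponent j that item i was actually compared with.
            denom = sum(
                pair_counts[frozenset((i, j))] / (scores[i] + scores[j])
                for j in items
                if j != i and frozenset((i, j)) in pair_counts
            )
            updated[i] = (wins[i] + prior) / denom
        total = sum(updated.values())
        scores = {item: value / total for item, value in updated.items()}
    return scores


# Hypothetical judgments: each tuple reads "first item is more positive".
judgments = [
    ("meme_a", "meme_b"),
    ("meme_a", "meme_c"),
    ("meme_b", "meme_c"),
    ("meme_a", "meme_b"),
]
ranking = sorted(bradley_terry(judgments).items(), key=lambda kv: -kv[1])
print(ranking)
```

The scores only have meaning relative to one another, which is the point of the comparison-based approach: no item needs an absolute position on a "happiness scale."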
“We’re not interested in just saying what’s in the image – we’re interested in saying how it’s used,” he said. “We take the adage ‘a picture is worth 1,000 words’ and treat it like a piece of vocabulary. We have ways to capture how it looks, but we’re also going to treat it as a word like we do in a language model and place it where it’s been used.
“For example, if you tweet an image you can also include words, and if we have enough of these samples, we can now understand that someone is upset or sad or whatever the underlying meaning is, and translate it into regular words.”
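A rough sketch of what treating an image "as a word" might look like: each recurring image is replaced by a token, and the words that most often appear alongside that token hint at how the image is used. This is a simple counting stand-in, not the project's actual method, which the article describes only as a language-model approach. The `<img:…>` token format, the function name `describe_image_token`, and the sample posts are hypothetical.

```python
from collections import Counter


def describe_image_token(posts, image_token, top_k=2):
    """Return the words that most often accompany image_token.

    posts: list of token lists, where an image is represented by a
           placeholder token such as "<img:42>" (hypothetical format).
    """
    neighbours = Counter()
    for tokens in posts:
        if image_token in tokens:
            # Count each word once per post; skip image placeholders.
            neighbours.update(
                t for t in set(tokens) if not t.startswith("<img:")
            )
    return [word for word, _ in neighbours.most_common(top_k)]


# Hypothetical posts in which image 42 keeps appearing with sad wording.
posts = [
    ["so", "sad", "today", "<img:42>"],
    ["<img:42>", "sad", "news"],
    ["sad", "and", "upset", "<img:42>"],
    ["upset", "<img:42>"],
    ["great", "day", "<img:7>"],
]
print(describe_image_token(posts, "<img:42>"))
```

With enough samples, the surrounding words act as a plain-language gloss for the image, even when the image's content alone would not reveal how a community uses it.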
While this new technology for monitoring sentiment online could have many uses, such as in politics and business, Blackburn has one specific goal that he hopes to achieve.
“We could better understand violent content or hate speech online that is heavily coded, or we could identify disinformation so people cannot hide this type of behavior using only pictures,” he said. “It’s my personal passion and the reason why I develop it.”
Another recently awarded NSF project aims to better detect so-called “troll” accounts spreading false information as part of broader social media influencing campaigns.
The two-year, $220,000 grant – a collaboration with Assistant Professor Gianluca Stringhini of Boston University – will collect information on troll accounts that Twitter and Reddit have identified as belonging to disinformation campaigns carried out by countries adversarial to the United States.
These malicious users are different from “bot” accounts which automatically post the same message in multiple places. They are coordinated to interact with each other and take multiple sides of the same argument just to sow discord among all who watch them.
One example, Blackburn said, is that of two troll accounts “arguing” over “Black Lives Matter” versus “All Lives Matter” not on principle, but simply to spark drama among other users.
“Over time, the same troll account can take different positions on the same issue, because at the end of the day they don’t have a particular opinion – they just want to cause trouble,” he said. “They have to convince people to get involved.”
The data collected for this project will be used to train machine learning algorithms to identify troll accounts by codifying patterns of interactions that are rare in real accounts. Social media platforms would then be able to stop the trolling without needing someone to moderate every questionable post.
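As one illustration of what "codifying a pattern of interaction" could mean, the sketch below computes how often a single account has argued more than one side of the same topic, the behavior Blackburn describes above. The article does not list the features the project will use, so this feature, the function name `stance_flip_rate`, and the topic and stance labels are assumptions; a real system would feed many such signals into a trained classifier.

```python
from collections import defaultdict


def stance_flip_rate(posts):
    """Fraction of topics on which one account argued more than one side.

    posts: iterable of (topic, stance) pairs for a single account, where
           the stance labels are assumed to come from an upstream step.
    """
    stances_by_topic = defaultdict(set)
    for topic, stance in posts:
        stances_by_topic[topic].add(stance)
    if not stances_by_topic:
        return 0.0
    flipped = sum(
        1 for stances in stances_by_topic.values() if len(stances) > 1
    )
    return flipped / len(stances_by_topic)


# Hypothetical account history: both sides of one topic, one side of another.
history = [("policing", "pro"), ("policing", "anti"), ("sports", "pro")]
print(stance_flip_rate(history))
```

A high value on its own would not prove an account is a troll, since people do change their minds, which is why such a number would serve as one input among many rather than a verdict.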
“Towards a Data-Driven Understanding of Online Sentiment” is NSF Award #2046590. “Detection of Accounts Involved in Social Media Influencer Campaigns” is NSF Award #2114411.