Expert’s Corner with NLP and Misinformation Expert Preslav Nakov

Preslav Nakov has established himself as one of the leading experts on the use of AI against propaganda and disinformation. He has been highly influential in natural language processing and text mining, publishing hundreds of peer-reviewed research papers. He spoke to us about his work on the ongoing problem of online misinformation.

1. What do you think about the ongoing infodemic? Given your extensive work on fake news, do you think there will be a point where we see a decrease in such content?

Indeed, the global COVID-19 pandemic has also brought us the first global social media infodemic. Early in the pandemic, the World Health Organization already recognized the importance of the problem and ranked fighting the infodemic second on its list of top-5 priorities. The infodemic represents an interesting blend of political and medical misinformation and disinformation. Now, a year and a half later, both the pandemic and the infodemic persist. Yet, I am an optimist. What fueled the infodemic initially was that so little was known about COVID-19, and there was a lot of void to be filled. Later on, with the emergence of the vaccines, the infodemic got a new boost from the re-emerging anti-vaxxer movement, which has grown much more powerful than before. However, the severity of the pandemic has now started to decrease, to a large extent thanks to the vaccines: we see full stadiums at EURO 2020 with no masks and little social distancing. I therefore expect that the infodemic will soon follow a similar downward trajectory. Yet, it will not die out completely; it will just decrease.

2. What drove you to pursue research in the fake news and misinformation domain?

As part of a collaboration between the Qatar Computing Research Institute (QCRI) and MIT, I was working on question answering in community forums, where the goal was to detect which answers in the forum were good, in the sense of trying to answer the question directly, as opposed to giving indirectly related information, discussing other topics, or talking to other users. We developed a strong system, which we deployed in production in a local forum, Qatar Living, where it remains operational to date. However, we soon realized that not all good answers were factually true. This got me interested in the factuality of user-generated content. Then along came the 2016 US Presidential election, and fake news and factuality became a global concern. Thus, I started the Tanbih mega-project, which we are developing at QCRI in collaboration with MIT and other partners. The aim of the project is to help fight fake news, propaganda, and media bias by making users aware of what they are reading, thus promoting media literacy and critical thinking. At Checkstep, we are currently building AI-first tools to tackle hate speech, spam, and misinformation.

3. What do you think about the upcoming regulations, the EU’s Digital Services Act (DSA) and the UK’s Online Safety Bill (OSB)?

These upcoming EU and U.K. regulations (and related proposals being discussed in the USA and other countries) have the potential to become transformative in the way GDPR was. Platforms would suddenly become responsible for their content and would have a legal obligation to enforce their own terms of service as well as to comply with legislation on certain kinds of malicious content. They would also be obliged to explain their moderation decisions to their users as well as to external regulatory authorities. I see this as a hugely positive development.

Legislators should be careful though to keep a good balance between trying to limit the spread of malicious content and protecting free speech. Moreover, we all should be cautious and remember that fake news and hate speech are complex problems and that legislation is only part of the overall solution. We would still need human moderators, research and development of tools that can help automate the process of content moderation at scale, fact-checking initiatives, high-quality journalism, teaching media literacy, and cooperation with platforms where user-generated content spreads.

4. How should platforms better prepare themselves?

Big Tech companies are already taking this seriously and have been developing in-house solutions for years. However, complying with the new legislation would be a challenge for small and mid-size companies (though it is also true that it affects them less), as well as for large ones for which user-generated content is important but is not their core business. For example, a small fitness club that also hosts a forum on its website could not afford to hire and train its own content moderators. Such companies face two main options: (a) shut down their fora to avoid any issues, or (b) try to outsource content moderation, partially or completely. When it comes to content moderation at scale, there is a clear need for automation, which can take care of a large number of easy cases, but the final decision in hard cases should be taken by humans, not machines.

5. Any recent talks or research you’d like to tell us about? Feel free to mention upcoming talks as well.

Fighting the infodemic is typically thought of in terms of factuality, but it is a much broader problem. In February 2020, MIT Technology Review had an article that pointed out certain characteristics of the infodemic that go beyond factuality, such as fueling panic and racism.

Indeed, if the 2016 U.S. Presidential election gave us the term “fake news”, the 2020 one got the USA and the world concerned about a range of other types of malicious content online. The infodemic has demonstrated that this is part of the same problem, with dangers ranging from promoting fake cures, rumors, and conspiracy theories to spreading racism, xenophobia, and panic. Addressing these issues requires solving a number of challenging problems such as identifying messages making claims, determining their check-worthiness and factuality, and their potential to do harm as well as the nature of that harm, to mention just a few. Thus, as part of Tanbih, we have been working on a system that can analyze user-generated content in Arabic, Bulgarian, English, and Dutch, which covers all these aspects and combines the perspectives and the interests of journalists, fact-checkers, social media platforms, policy makers, and society as a whole. A preliminary version of this work appeared in ICWSM-2021 last week.
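The analysis steps listed above (claim identification, check-worthiness, harm assessment) form a pipeline. The sketch below is a toy illustration of that pipeline shape only: the keyword cues and the `ClaimAnalysis` structure are my own illustrative assumptions, standing in for the trained multilingual models that a system like Tanbih would actually use.

```python
from dataclasses import dataclass

@dataclass
class ClaimAnalysis:
    is_claim: bool       # does the message make a factual claim?
    check_worthy: bool   # is it worth a fact-checker's time?
    harmful: bool        # could it do harm (e.g., fake cures)?

# Illustrative keyword cues; a real system would use learned classifiers.
CLAIM_CUES = ("cures", "causes", "proven", "according to")
HARM_CUES = ("cure", "miracle", "deadly", "hoax")

def analyze(text: str) -> ClaimAnalysis:
    """Run the toy pipeline stages in order over one message."""
    t = text.lower()
    is_claim = any(cue in t for cue in CLAIM_CUES)
    # Only claims with some substance are flagged for fact-checkers.
    check_worthy = is_claim and len(t.split()) > 5
    harmful = is_claim and any(cue in t for cue in HARM_CUES)
    return ClaimAnalysis(is_claim, check_worthy, harmful)
```

In practice each stage would be a separate model whose output feeds the next, which is why the stages here are computed in sequence rather than independently.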

We have also been looking into supporting fact-checkers and journalists by developing tools for predicting which claims are check-worthy and which ones have been previously fact-checked. We have an upcoming paper at IJCAI-2021, which surveys the AI technology that can help fact-checkers. This has been the mission of the CLEF CheckThat! lab, which we have been organizing for four years now; see also our recent ECIR-2021 paper about the lab.

Another research line I was involved in aims to detect the use of propaganda techniques in memes. Memes are very important, as a large fraction of propaganda in social media is multimodal, mixing textual with visual content. Moreover, by focusing on the specific techniques (e.g., name calling, loaded language, flag-waving, whataboutism, black & white fallacy, etc.), we can train people to recognize how they are being manipulated. Recognizing twenty-two such techniques in memes has been the subject of a recent SemEval-2021 shared task; there is also an upcoming paper at ACL-2021.

In terms of content moderation, we recently wrote a survey that studied the dichotomy between what types of abusive language online platforms seek to curb vs. what research efforts there are to automatically detect abusive language.

6. Any personal anecdotes where you fell prey to fake news?

I have fallen prey to fake news many times, and I keep being fooled from time to time. Many friends and relatives send me articles asking me: is this fake news? In most cases, it is easy to tell: for example, the article may be just two to three sentences long and give little support to the claim in the title, the website may be a known fake news or satirical one, a simple reverse image search may reveal that the photo in the article is from a different event, or the claim may have been previously fact-checked and known to be true or false. Yet, in many cases this is very hard, and my answer is: I am sorry, but I do not have a crystal ball. In fact, several studies in different countries have shown the same thing: most people cannot distinguish fake from real news; in the EU, this is true for 75% of young people.
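The easy-case signals described above can be thought of as a quick triage step. The sketch below is a minimal illustration of that idea; the satire-domain list and the length threshold are illustrative assumptions of mine, not data or logic from any real fact-checking system.

```python
# Known satirical outlets; one real example, list is illustrative only.
KNOWN_SATIRE = {"theonion.com"}

def quick_signals(domain: str, body: str) -> list[str]:
    """Return easy red flags; an empty list means 'hard case, ask a human'."""
    flags = []
    if domain in KNOWN_SATIRE:
        flags.append("known satirical site")
    # Roughly two to three sentences' worth of words (assumed threshold).
    if len(body.split()) < 60:
        flags.append("article too short to support the claim in its title")
    return flags
```

The key design point is the fallback: when no cheap signal fires, the case is hard and should go to a human or to fact-checkers rather than being decided automatically.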

Yet, with proper training, people can improve quite a bit. Indeed, two years ago Finland declared that it had won the war on fake news thanks to a massive media literacy program targeting all levels of society, but primarily the schools. It took them five years, which shows that real results are achievable within a realistic time frame. We should be careful when setting our expectations, though: the goal should not be to eradicate all fake news online; it should rather be to limit its impact, thus making it irrelevant. This has already happened to spam, which is still around but is not the same kind of problem it used to be some 15–20 years ago; now Finland has shown that we can achieve the same with fake news. Thus, while the short-term solution should focus on content moderation and on limiting the spread of malicious content, the mid-term and long-term solution would do better to look at explainability and at training users: this is fake news because …, this is hateful/offensive language because …, etc.

An edited version of this story originally appeared in The Checkstep Round-up newsletter https://checkstep.substack.com/p/calls-for-more-transparency-and-safety
