
Content Moderation Using ChatGPT


In 10 minutes, you’ll learn how to use ChatGPT for content moderation tasks such as spam and hate speech detection.

Who is this for?

If you are in a technical role and work at a company that handles user-generated content (UGC), then read on. We will show you how to easily create content moderation models that scan for unwanted content.

ChatGPT to Moderate Spam, Hate Speech, and More

Generative AI systems, such as ChatGPT, are game-changers in content moderation thanks to their innate ability to understand context and language variations. In particular, you are no longer forced to rely on generic policies: you can tailor them to your specific platform. Why not embrace today’s AI era and say goodbye to your keyword filters?

In the upcoming sections, we share our first-hand experience implementing a precise and tailored AI scanning system. The system is highly versatile and can effectively address various forms of abuse, including bullying, hate speech, and sexual content. To illustrate our approach, we have selected spam as the focus of our experimentation.

How to build a custom AI content moderation classifier

As an example, we are going to show how to build a custom spam classifier. It took us only a few minutes to run, and the same approach worked for several other policies, such as hate speech. Here is what we used:

  • Access to a prompt-based Generative AI. ChatGPT worked well for us;
  • 200 examples of texts breaching and not breaching your target policy;
  • Google Sheet with a Generative AI plugin installed. Here is one you can copy; it is the one used throughout this tutorial, with all the formulas and examples. Note for coders: a Python terminal works as well.

Step 1: The first prompt

For the spam classifier, we started with the following prompt:


You are a content moderator. Classify whether a given text contains spam. Spam refers to content that is made to mislead and dupe users for the sole purpose of gaining something out of it. Do not give any explanation, just answer with 1 if the text contains spam, and 0 if not.


This prompt, when sent to the Generative AI, will most of the time – but not always – give you a ‘0’ or ‘1’ as instructed. Now if you submit any text along with this prompt, you will get back a classification.
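For instance, an exchange might look like this (illustrative inputs, not drawn from our evaluation set):

Input:  Congratulations, you have won a $1,000 gift card! Click here to claim your prize now!
Answer: 1

Input:  Hey, are we still on for lunch tomorrow?
Answer: 0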


Step 2: Check accuracy

Next, we ran this prompt over our full set of evaluation texts; this takes 30 seconds or more, depending on the size of your data. We then counted the correct and incorrect classifications. For this first version of the spam classifier, we obtained the following metrics:

Precision  58.67%
Recall     89.80%
F1 score   70.97%
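If you export the Generative AI’s 0/1 answers alongside your ground-truth labels, these metrics can be computed in a few lines of Python with scikit-learn. A minimal sketch (the two label lists are placeholders for your roughly 200 evaluation examples):

from sklearn.metrics import precision_score, recall_score, f1_score

# 1 = spam, 0 = not spam
y_true = [1, 0, 1, 1, 0]  # ground-truth labels for your evaluation texts
y_pred = [1, 0, 1, 0, 1]  # the model's answers, parsed from its '0'/'1' replies

print(f"Precision: {precision_score(y_true, y_pred):.2%}")
print(f"Recall:    {recall_score(y_true, y_pred):.2%}")
print(f"F1 score:  {f1_score(y_true, y_pred):.2%}")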

Great start, but not perfect! 

Our first prompt defined spam without any additional explanation. After reviewing the results, we identified where the model made mistakes and improved the prompt by adding more specific rules about adjacent policies and about trigger words that were causing false-positive detections.

Step 3: Update your prompt

The updated prompt:


You are a content moderator. Classify whether a given text contains spam. Spam refers to content that is made to mislead and dupe users for the sole purpose of gaining something out of it. You must detect only these types of spam: financial spam, unrealistic promises, misleading content, sexual/healthy clickbaits, impersonation. We allow different types of advertisement on our platform, so you must answer 0, when you see a default advertisement that doesn't violate our policy. Do not give any explanation, just answer with 1 if the text contains spam, and 0 if not.


We can check on a few examples that this works better. Using the full dataset with the updated prompt, we got the following metrics for this classifier:

           Prompt 1   Prompt 2
Precision  58.67%     68.00%
Recall     89.80%     86.73%
F1 score   70.97%     76.23%

Significant improvement! The F1 score is up by more than five points compared with the first prompt.

We can do even better by “nudging” the generative AI classifier toward good answers so it grasps subtleties in our interpretation of the policy for some sub-categories of spam. To reduce false-positive detections, we added samples that are close to policy-violating content but do not actually breach the defined policy, leading to the final prompt:


You are a content moderator. Classify whether a given text contains spam. Spam refers to content that is made to mislead and dupe users for the sole purpose of gaining something out of it. You must detect only that types of spam: financial spam, unrealistic promises, misleading content, sexual/healthy clickbaits, impersonation. 

We allow different type of advertisement on our platform, so you must answer 0, when you see default advertisement that doesn't violate our policy. Do not give any explanation, just answer with 1 if the text contains spam, and 0 if not. Following examples are not spam(label 0), be carefull to not label as spam similar cases: 

Examples: 

*however , since our previous attempts to speak to you have failed , this will be our last notice to close for you the lower rate.

*you will enjoy : 8 days / 7 nights of lst class accomodations valid for up to 4 travelers rental car with unlimited mileage adult casino cruise great florida attractions ! much much more . . . click here ! ( limited availability ) to no longer receive this or any other offer from us , click here to unsubscribe . [ bjk 9 ^ " : )

*Subject: norton systemworks 2002 final clearance 1093 norton systemworks 2002 software suite professional edition 6 feature - packed utilities , 1 great price a $ 300 . 00 + combined retail value for only $ 29 . 99 !

*Check out our newest selection of content, Games, Tones, Gossip, babes and sport, Keep your mobile fit and funky text WAP to 82468.

*winchester , ma 01890 this e - mail message is an advertisement and / or solicitation.

*your in - home source of health information a conclusion is the place where you got tired of thinking.

*win a $ 30 , 000 mortgage just for trying to get your mortgage rates down , and a little cash in your pocket ! ! know who is telling you the truth !

*we don ' t want anybody to receive or mailing who does not wish to receive them.

*to opt - out from our mailing list click here . . . .

*FREE RINGTONE text FIRST to 87131 for a poly or text GET to 87131 for a true tone!


You can check that the classifier now correctly labels the examples added to the prompt as not spam.
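As a quick sanity check, you can re-run the classifier on those embedded examples. A sketch, assuming the classify(text) helper built in Step 4 below (it returns the model’s ‘0’/‘1’ answer as a string):

# The few-shot examples added to the prompt should now come back as 0 (not spam).
not_spam_examples = [
    "to opt - out from our mailing list click here . . . .",
    "FREE RINGTONE text FIRST to 87131 for a poly or text GET to 87131 for a true tone!",
]
for text in not_spam_examples:
    assert classify(text) == "0", f"False positive on: {text!r}"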

For the full dataset, we got the following metrics for this classifier:

           Prompt 1   Prompt 2   Prompt 3
Precision  58.67%     68.00%     70.59%
Recall     89.80%     86.73%     89.36%
F1 score   70.97%     76.23%     78.87%

For comparison, our previous approach to building classifiers used a sentence-transformer model for embeddings and logistic regression for the classification task, trained on 1,000+ manually gathered samples. On the evaluation dataset, that model produces exactly the same results as the ChatGPT approach with our final prompt.
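For reference, such a baseline takes only a few lines to sketch, assuming the sentence-transformers and scikit-learn packages (the model name and the tiny inline dataset are illustrative; the real model was trained on 1,000+ samples):

from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Embed texts with a sentence-transformer, then fit a logistic regression.
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # example model choice

train_texts = [
    "win a free cruise now, click here!",       # spam
    "congratulations, claim your $500 reward",  # spam
    "the meeting has been moved to 3pm",        # not spam
    "can you review my pull request today?",    # not spam
]
train_labels = [1, 1, 0, 0]

clf = LogisticRegression(max_iter=1000)
clf.fit(encoder.encode(train_texts), train_labels)

# Classify new text: returns an array like [1] for spam, [0] for not spam.
print(clf.predict(encoder.encode(["free prize inside, act fast!"])))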

Step 4: Deploy your updated model

The beauty of prompt engineering is that there is nothing else you need to do. 

If you know how to call a web service, you have a readily available endpoint that you can use directly in your application, as summarised in the small Python snippet below.

To integrate the model, you need to decide when to make the decision; for example, as soon as new user content is submitted, you can simply call the web service. If you use Python, the post function from the requests package can be used:

response = requests.post(endpoint_url, headers=headers, json=data) 

where 

  • endpoint_url is the URL "https://api.openai.com/v1/chat/completions";
  • headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"} lets you specify the API key linked to your OpenAI account. Replace api_key with the actual API key provided by OpenAI;
  • data = {"model": "gpt-3.5-turbo", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": prompt}, {"role": "user", "content": input_text}]} gives ChatGPT the prompt and the text that needs to be verified. Note that the system message comes first, followed by the moderation prompt and the text to classify.
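Putting the pieces together, a minimal end-to-end sketch might look as follows (api_key, prompt, and the example text are placeholders; the response parsing follows the standard Chat Completions response format):

import requests

api_key = "YOUR_OPENAI_API_KEY"              # placeholder: your OpenAI API key
prompt = "You are a content moderator. ..."  # the final prompt from Step 3
endpoint_url = "https://api.openai.com/v1/chat/completions"
headers = {"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"}

def classify(input_text):
    """Return the model's answer for a text: '1' = spam, '0' = not spam."""
    data = {
        "model": "gpt-3.5-turbo",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
            {"role": "user", "content": input_text},
        ],
    }
    response = requests.post(endpoint_url, headers=headers, json=data)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"].strip()

print(classify("Congratulations, you have won! Click here to claim your prize."))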

Cost and limitations of Generative AI vs classical approaches

While Generative AI, like ChatGPT, offers impressive accuracy and context understanding, it can sometimes incur higher costs due to its computational demands and usage-based pricing models. 

Additionally, Generative AI might require continuous fine-tuning to maintain its effectiveness, which can add to the overall expenses. 

Classical approaches, on the other hand, may have limitations in adapting to evolving language trends and understanding nuanced context. They might rely heavily on manual rule-setting and keyword matching, making them less agile in handling complex moderation needs. Striking the right balance between accuracy, cost-efficiency, and adaptability is crucial when deciding between these two approaches for content moderation.

Cost of Generative AI for Moderation

When evaluating the adoption of Generative AI against classical approaches, an essential factor to consider is cost-effectiveness. Generative AI, such as ChatGPT, offers an impressive level of accuracy and the ability to comprehend context, making it a strong contender for content moderation. However, this advanced technology comes with higher computational requirements than simpler but less effective approaches such as keyword flagging. These requirements translate into higher operational costs, and because costs grow almost linearly with volume, they can become prohibitive at scale. The initial investment in setting up and maintaining the necessary infrastructure for Generative AI can be substantial, and ongoing costs may include periodic fine-tuning to ensure sustained accuracy.
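As a rough, illustrative back-of-the-envelope calculation (the token count and per-token rate are assumptions for illustration, not quoted prices): if each moderation call consumes about 500 tokens (prompt plus user text) at $0.002 per 1,000 tokens, one check costs about $0.001. Moderating 1 million items per month would then cost on the order of $1,000, and 10 million items roughly $10,000; the bill scales almost linearly with volume.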

Classical approaches to content moderation, while potentially more budget-friendly upfront, may reveal their limitations in the long run. These methods often rely on predefined rules, keyword matching, and heuristics to identify inappropriate content. However, this approach can struggle with understanding nuanced context and adapting to rapidly evolving language trends. As a result, false positives and negatives might occur frequently, leading to increased human intervention for verification. Moreover, classical methods might necessitate constant manual updates to keep up with emerging patterns of misuse, which can be resource-intensive and time-consuming.

Striking the right balance between the accuracy and adaptability of Generative AI and the potential lower costs of classical approaches is a complex decision. Organizations must weigh the benefits of highly accurate content analysis with the associated expenses, considering factors like their platform’s scale, user base, and regulatory requirements. It’s crucial to factor in the long-term implications, including ongoing maintenance, to ensure that the chosen content moderation solution aligns with the organization’s goals and resources.

Is ChatGPT better than Google Bard?

In the realm of AI-powered language models, the question of whether ChatGPT is superior to Google Bard naturally arises. Both models have generated considerable interest and attention due to their language generation capabilities, but they have distinctive features that set them apart.

ChatGPT, developed by OpenAI, has showcased its prowess in generating coherent and contextually relevant responses. Its underlying transformer architecture enables it to engage in detailed conversations across a wide array of topics. While ChatGPT is highly proficient at mimicking human-like language, it can sometimes generate responses that are factually incorrect or contextually nonsensical. OpenAI’s iterative training approach has improved the model’s performance over time, but it still encounters challenges in maintaining logical consistency in longer interactions.

Google Bard, on the other hand, has also made strides in the language generation domain. Leveraging Google’s extensive resources and expertise, Bard incorporates a more structured approach to conversation. It introduces a system of “prompts” to guide the AI’s responses, allowing users to tailor the generated content to their specific needs. Google Bard emphasizes generating creative content, such as poetry and song lyrics, making it an interesting option for artistic endeavors. However, Bard’s responses can occasionally lack the natural flow found in human conversations, and users might need to experiment with prompts to achieve the desired outcome.

Ultimately, the comparison between ChatGPT and Google Bard depends on the intended use case. While both models excel in generating text, ChatGPT’s focus on dynamic conversations and Google Bard’s emphasis on creative expression indicate that their strengths lie in different areas. Users should carefully evaluate their requirements and preferences to determine which model aligns better with their specific needs.

Conclusion

In just a few simple steps and with the help of a prompt-based Generative AI system, we have successfully built a highly accurate and custom content moderation system. Content moderation is a crucial aspect of any platform or app with social features, and with the growing volume of user-generated content, it has become even more vital. Embracing the power of AI and leaving behind traditional keyword filters, we have harnessed the capabilities of Generative AI to understand context and language variations, making our moderation process much more effective.

Through this journey, we started with a basic prompt and iteratively improved it to achieve outstanding results. Our AI system now demonstrates remarkable precision, recall, and F1 macro scores, effectively identifying and classifying spam. The approach we adopted outperformed even traditional methods like the sentence-transformer model combined with logistic regression.

With this easy-to-deploy and cost-effective AI content moderator, we can ensure a safer and more positive user experience on our platform. As the world of AI continues to evolve, we can continue refining and enhancing our system, adapting it to new challenges and regulations that may arise in the future.

In this instance, leveraging Generative AI for content moderation has proven to be a game-changing strategy, enabling us to effectively handle the ever-increasing content on our platform and maintain a secure and welcoming environment for all users. By staying at the forefront of the AI era, we are continuously improving Checkstep’s content moderation practices and creating a more inclusive and respectful online community.

So, why not take the plunge into the AI-driven future and say farewell to outdated content moderation techniques? You can embrace the possibilities that AI offers and build your own Generative AI content moderator today. Get in touch with us at Checkstep if you have any questions about managing your user content and policies at scale.
