The Ultimate Guide to GenAI Moderation x Sightengine

Map your GenAI risks and craft “AI-resilient” policies [Part 1]

GenAI presents a significant challenge for platforms and the Trust & Safety field. As we head into 2025, AI-generated content and detection advancements are poised to take center stage. This post is part of a two-part blog series, co-authored with our partner Sightengine, exploring innovative strategies and solutions to address emerging threats.

Part 1 explores the foundational principles of understanding risk vectors, transparency, and policy alignment to mitigate AI-generated content risks, while Part 2 delves into advanced detection and moderation techniques using tools like Sightengine and Checkstep to enhance platform safety.

Introduction

What if the image that briefly sent the U.S. stock market into a nosedive wasn’t real? In May 2023, a highly convincing fake image of an explosion near the Pentagon spread rapidly across verified accounts on X (formerly Twitter). The result: panic, misinformation, and a temporary market dip. The incident underscores a pressing reality: generative AI content can look so real that even the sharpest eyes are fooled.

Generative AI is everywhere, shaping how we create and consume content. On platforms hosting user-generated content (UGC), this means encountering a wide range of AI-generated or AI-manipulated material daily. While some uses are harmless, like airbrushed vacation photos, others are more sinister, such as deepfakes, misinformation, or scams.
For platforms in the Trust & Safety space, the stakes are higher than ever. Beyond potential reputational and operational risks, there are real human consequences, as highlighted by the tragic case of a lawsuit against Character.AI. A grieving mother alleges that the platform’s chatbot influenced her son’s suicide.

This landscape makes it critical for platforms to anticipate and mitigate the worst-case scenarios associated with AI-generated content. Not only does this safeguard users, but it also protects your platform’s integrity and future growth.

The first step? Mapping your risks. Rank each risk from 1 to 5 based on its likelihood and potential scale to establish priorities. The sections below will guide you through identifying relevant risks, assessing their potential impact, and laying the groundwork for AI-resilient policies.
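To make this concrete, here is a minimal sketch (in Python, with purely illustrative risk names and scores) of how a likelihood-times-impact scoring pass can turn a list of GenAI risks into a ranked priority map:

```python
# Minimal sketch: rank GenAI risks by likelihood x impact.
# Risk names and scores are illustrative placeholders, not real data.

risks = [
    # (risk vector, likelihood 1-5, impact 1-5)
    ("Deepfake impersonation of users", 4, 5),
    ("AI-generated misinformation posts", 5, 4),
    ("AI-assisted spam campaigns", 5, 2),
    ("Fraudulent AI-generated claim photos", 2, 4),
]

# Priority = likelihood x impact (max 25); highest first.
for name, likelihood, impact in sorted(risks, key=lambda r: r[1] * r[2], reverse=True):
    print(f"priority={likelihood * impact:>2}  {name}")
```

However you score them, the point is the same: a shared, explicit ranking lets your team agree on what to tackle first.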

1. Misinformation, deepfakes and creative spam: map your GenAI risks first

Introduction

AI-generated content isn’t inherently harmful. Most of it is innocuous or even beneficial, enhancing user experience by making creative tools accessible. However, risks emerge when generative AI is used in ways that deceive or exploit users. Notably, misinformation and impersonation create reputational and operational risks for any platform hosting UGC. Deepfakes and creative spam, for instance, can undermine user trust if left unchecked.

To build effective policies, platforms need a clear understanding of these risk vectors and their potential impact. Platforms should start by mapping risk vectors, including falsification and volume risks, to build a tailored moderation strategy. Knowing the types of risks specific to your user demographics and content types is foundational for a scalable response system.

Below, we dive into three critical categories of risks and explore how platforms can address them.

1. Misinformation, disinformation and GenAI

Misinformation and disinformation did not appear with the rise of new technologies, but these technologies created new and accelerated means for bad actors to generate such content. The Integrity Institute published an interesting paper in 2023 which states: “The increased quantity and quality of mis- and disinformation stands at the forefront of concerns given the accessibility of LLMs, there is the prospect of highly scalable influence campaigns (…) It is also possible that the low barrier of entry expands access to a greater number of actors.” Anyone can now create, in seconds and at no cost, speech, voiceovers, or text in any language.
 
This is one of the most common risks for social media platforms. Misleading content, like fabricated news stories or doctored images, is not only very convincing but also very difficult to detect. A prime example is the fake image showing an explosion near the Pentagon that was shared by several verified X accounts in May 2023. The image looked so real that it initially evaded moderation, causing turmoil and a brief dip in the stock markets.

2. Deepfakes in the GenAI era

GenAI Deepfakes

Source: Instagram, Taylor Swift

A deepfake is an artificial image or video generated by a special kind of machine learning called “deep” learning. Deepfakes represent a unique and rapidly growing challenge. Unlike misinformation, which exploits existing content, deepfakes create entirely new realities. These AI-generated images and videos can depict people doing or saying things they never did, with a realism that can fool even the sharpest eyes. One recent public deepfake controversy involved pop singer Taylor Swift, after then-candidate Donald Trump posted on Truth Social a series of AI-generated images falsely showing the singer endorsing his campaign.

The technology behind deepfakes, generative adversarial networks (GANs), has been around since 2014. As face-swapping technologies gained popularity starting in 2017, the growth of deepfakes began to take off. Countries like the United States are exploring regulatory measures to curb the spread of deepfakes, but so far none have been compelling enough to stop it at a macro level.

What started as an academic experiment has become a commercialized threat. Today, deepfake tools are accessible to anyone with an internet connection. A 2023 report by Security Hero revealed:

  • The number of deepfake videos online has increased by 550% since 2019, reaching over 95,000.
  • 98% of all deepfake videos are pornographic, with 99% targeting women.
  • There are now 42 user-friendly deepfake tools, collectively searched over 10 million times monthly.

Deepfakes pose particular risks for platforms that rely on user profiles, photos, or videos. Social media, dating apps, and even adult-content platforms face heightened vulnerability. For example, a social media platform may see deepfakes used for impersonation, while a dating app could encounter fraudulent profiles created to exploit users emotionally or financially.

Deepfakes GenAI Risk Map

3. Fraud, creative scams, spam and GenAI

GenAI Insurance Scams

Source: Auto Express, UK

Generative AI has significantly lowered the barrier to creating convincing fabricated content, opening the door to a new wave of fraud, creative scams, and spam. Historically, scams have relied on manual or basic digital manipulation, but with AI, the scale and sophistication of these threats have reached unprecedented levels.

AI has enabled scams to evolve from rudimentary fraud into sophisticated schemes, increasing their effectiveness and reach. Key characteristics of this new wave include:

  • Realism: AI-generated photos, videos, and text are often indistinguishable from authentic content, making scams more believable.
  • Speed: AI can produce convincing fake evidence in minutes, allowing fraudsters to operate on a much larger scale.
  • Cost-Effectiveness: Many AI tools are free or low-cost, making them accessible to even the most novice scammers.
  • Automation: Scripts and bots powered by AI can distribute scams or spam across platforms without human intervention.

For which kinds of platforms is this a higher risk?

  • Scam or Impersonation Risks on Dating Platforms: Fraudsters can use AI to create deepfake profiles that mimic real users or create entirely fictitious but convincing personas. These profiles can deceive users into sharing sensitive information, money, or other valuables.
    • Platforms at risk: Dating apps such as Hinge, Bumble, or Tinder, which heavily rely on user profiles and images.
  • Insurance Scams: Fraudsters are increasingly using AI-generated images to create fake incidents and submit fraudulent insurance claims. For instance, a recent case involved a van driver who doctored photos to fabricate damage to their vehicle and supported the claim with a forged repair invoice. AI enables the quick and realistic creation of such fake evidence.
    • Platforms at risk: Insurance platforms, or any digital-first claims systems that rely heavily on photographic evidence, are particularly vulnerable.
  • Marketplace Scams: AI-generated photos of fake goods or altered images of damaged products can be used to manipulate buyers or sellers on e-commerce platforms.
    • Platforms at risk: Image-based transaction platforms like eBay, Craigslist, or Facebook Marketplace.

Scams GenAI Risk Map

4. Build your GenAI risk map – TEMPLATE

GenAI Moderation Guide

Download our GenAI Guide to access a simple, easy-to-reproduce table template that platforms can use to build their risk map and categorize and prioritize risks effectively.

2. Build a robust set of “GenAI-resilient” policies 

Introduction

No two platforms are alike, so why should their content moderation policies be? Whether you’re moderating a bustling social media site, a niche dating platform, or an expansive e-commerce marketplace, your policies must be tailored to fit the unique dynamics of your platform. Crafting context-driven, “AI-resilient” policies isn’t just a safeguard; it’s a strategy for building trust, protecting your users, and ensuring your platform’s integrity.

1. Why GenAI-resilient policies matter

Generative AI is rewriting the rules of user-generated content (UGC). While AI tools can empower users to create exciting and innovative content, they also introduce new risks, such as impersonation, scams, and misinformation. Without a robust set of policies, platforms may struggle to manage these risks, leading to confusion among users and potential harm to their community and reputation.

By establishing clear, context-specific guidelines, platforms can:

  • Set expectations for acceptable behavior.
  • Provide users with clarity on what’s permissible.
  • Mitigate risks associated with AI-generated content.

The result? A safer, more transparent ecosystem where users understand their responsibilities, and platforms can efficiently manage emerging challenges.

2. The power of context-specific guidelines

Policies must align with the nature of the platform and the behavior of its user base. A one-size-fits-all approach simply won’t cut it. Here are some examples of how policies can vary based on platform type:

  • Social media platforms: These platforms often deal with large-scale content sharing, making them particularly vulnerable to misinformation and deepfakes. Policies might include:
    • Banning AI-generated impersonations of public figures.
    • Requiring AI-generated media to be labeled as such.
    • Implementing strict penalties for misleading or harmful content.

  • Dating apps: For dating apps, user trust is paramount. Policies should prioritize:
    • Prohibiting AI-generated or manipulated profile photos.
    • Mandating authenticity checks for user-generated images and text.
    • Establishing guidelines against the use of chatbots to deceive users.

  • E-commerce marketplaces: Marketplaces face a unique set of risks tied to counterfeit goods and fraudulent listings. Effective policies might include:
    • Requiring verification for listings suspected to contain AI-generated content.
    • Banning the sale of AI-generated counterfeit goods.
    • Instituting strict measures against fraudulent claims, such as fake product images.

This context-driven approach ensures policies are not only relevant but also actionable, addressing the specific challenges each platform faces.
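One way to keep context-specific rules actionable is to encode them as machine-readable policy configuration that moderation tooling loads per platform type. The sketch below is a hypothetical structure, not a prescribed schema; the platform types, rule names, and actions are illustrative only:

```python
# Hypothetical sketch: context-specific GenAI policies as configuration.
# Platform types, rule names, and actions are illustrative, not a standard.

GENAI_POLICIES = {
    "social_media": [
        {"rule": "ai_impersonation_public_figure", "action": "remove"},
        {"rule": "unlabeled_ai_media", "action": "require_label"},
    ],
    "dating_app": [
        {"rule": "ai_generated_profile_photo", "action": "remove"},
        {"rule": "chatbot_deception", "action": "suspend_account"},
    ],
    "marketplace": [
        {"rule": "ai_counterfeit_listing", "action": "remove"},
        {"rule": "suspected_ai_listing_content", "action": "require_verification"},
    ],
}

def actions_for(platform_type: str, triggered_rules: list[str]) -> list[str]:
    """Return the configured actions for the rules a piece of content triggered."""
    return [
        entry["action"]
        for entry in GENAI_POLICIES.get(platform_type, [])
        if entry["rule"] in triggered_rules
    ]

print(actions_for("dating_app", ["ai_generated_profile_photo"]))  # ['remove']
```

Keeping policy in configuration rather than code also makes updates auditable as your rules evolve.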

3. Clear user guidelines and responsibilities

When it comes to content moderation, ambiguity is your enemy. Users should never be left guessing about what’s acceptable on your platform. By providing clear, easy-to-understand guidelines, you can foster a community that respects and adheres to your rules.

What good policies look like:

  • Proactive clarity: Clearly outline what types of AI-generated content are allowed and under what conditions. For example, an art-sharing platform might permit AI creations but require proper attribution or labeling.
  • Consistent enforcement: Policies are only as good as their enforcement. Ensure moderation teams have the tools and training to apply guidelines uniformly.
  • User education: Go beyond rules and help users understand why certain policies are in place. For instance, explain that banning AI-generated deepfake profiles on a dating platform protects user trust and safety.

Platforms that communicate these rules effectively create a sense of accountability and collaboration with their users.

4. Examples of policy differentiation

To illustrate how AI-resilient policies adapt to different platforms, consider the following scenarios:
  • Social media: A policy could specify that AI-generated political content must include disclaimers, helping users distinguish between human-made and AI-created posts.
  • Art communities: A platform like DeviantArt might allow AI-generated artwork but require that creators disclose the tools used, preserving transparency in the creative process.
  • Marketplace transactions: An e-commerce platform like Etsy might implement rules against AI-generated counterfeit products, requiring sellers to provide proof of authenticity for high-risk items.

Creating AI-resilient policies doesn’t happen overnight, but the effort pays dividends in safeguarding your platform. Start by understanding your user base and the specific risks AI-generated content poses.

Then, build policies that are:

  • Tailored: Align rules with your platform’s purpose and audience.
  • Transparent: Ensure users understand both the guidelines and their rationale.
  • Adaptable: Stay prepared to update policies as AI technology evolves.

By taking these steps, platforms can foster trust, reduce risks, and maintain the integrity of their UGC ecosystems in the age of generative AI.

3. Educate your users and embrace transparency with context-driven operations

Introduction

Moderating content is just one piece of the puzzle in managing generative AI on your platform. To truly foster trust and safety, platforms must go beyond moderation. Educating users, prioritizing transparency, and understanding the nuances of context and impact are just as vital. These steps not only help mitigate risks but also empower users to engage responsibly with AI-generated content.

1. Transparency: let users see the bigger picture about GenAI

The question isn’t just whether content should be moderated: it’s whether platforms should make it clear when content is AI-generated.

The answer is increasingly a resounding “yes.” Transparency about AI origins is rapidly becoming an industry standard, driven in part by legislation. The EU Artificial Intelligence (AI) Act is paving the way for global regulation, requiring platforms to label AI-generated content so users understand its origins. This approach isn’t just about compliance; it’s about trust.

Big players like Meta have already embraced this model. On platforms like Facebook and Instagram, AI-generated content comes with labels and additional descriptions to inform users about its creation. By adopting similar practices, platforms can reassure users and foster a culture of openness.
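In practice, disclosure can start as simply as attaching provenance metadata to each content record and surfacing a user-facing label when AI origin is detected or self-declared. Here is a minimal sketch; the field names and label text are assumptions for illustration, not any platform’s actual schema:

```python
# Minimal sketch: attach an AI-origin disclosure label to content metadata.
# Field names and the label text are hypothetical.

from dataclasses import dataclass, field

@dataclass
class ContentItem:
    content_id: str
    ai_generated: bool = False  # self-declared by the uploader or set by a detector
    metadata: dict = field(default_factory=dict)

def apply_disclosure_label(item: ContentItem) -> ContentItem:
    """Add a user-facing transparency label when content is AI-generated."""
    if item.ai_generated:
        item.metadata["disclosure_label"] = "Made with AI"
    return item

post = apply_disclosure_label(ContentItem("post-123", ai_generated=True))
print(post.metadata)  # {'disclosure_label': 'Made with AI'}
```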

2. Context, impact, and virality: beyond black & white moderation

Generative AI is blurring the lines between what’s real and what’s not. Yet one question remains constant: is there potential for harm, regardless of whether the content is AI-generated or human-made? This is where context becomes indispensable. Without understanding these nuances, moderation decisions can be inconsistent or unfair.

Not all AI-generated content requires removal. In fact, much of it can remain untouched as long as its context and impact are understood. Consider these nuances:

  • Context matters: A harmless AI-generated joke might be fine on one platform but problematic on another due to its audience or setting.
  • Impact is key: Even seemingly innocuous content can have outsized consequences. A joke may go viral, sparking waves of harassment or amplifying misinformation. Policies must account for not only the nature of the content but also its potential to cause harm at scale.
  • Virality amplifies risk: Generative AI allows content to spread faster and further than ever before. Something intended as a small joke or creative experiment can become a global phenomenon overnight, potentially causing harm. Platforms must monitor the virality of AI-generated content and assess whether its impact aligns with their policies, as the sketch below illustrates.
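One way to operationalize that virality point is to escalate review priority as a piece of content’s share velocity grows, even when its standalone harm score is low. A minimal sketch, assuming a hypothetical 0-1 harm score, a shares-per-hour metric, and made-up thresholds:

```python
# Minimal sketch: escalate review priority by share velocity.
# The thresholds and queue names are hypothetical, for illustration only.

def review_tier(shares_last_hour: int, harm_score: float) -> str:
    """Combine a harm score (0-1) with virality to pick a review queue."""
    if harm_score >= 0.9:
        return "immediate_human_review"
    if shares_last_hour > 10_000:  # going viral: review even low-harm content
        return "priority_human_review"
    if harm_score >= 0.5 or shares_last_hour > 1_000:
        return "standard_human_review"
    return "monitor_only"

print(review_tier(shares_last_hour=25_000, harm_score=0.2))  # priority_human_review
```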

3. The role of policies in balancing transparency and moderation for GenAI

Content Moderation Policies template

Transparent operations and context-sensitive evaluations are crucial for managing generative AI, but they work best when paired with robust, actionable policies.

Policies should:

  • Provide guidelines on how AI-generated content is flagged, labeled, and moderated.
  • Equip moderators and classifiers with tools to assess content for both harmful intent and unintended consequences.
  • Strike a balance between removing harmful content and allowing harmless, creative uses of generative AI.

For example, a platform might decide not to ban an AI-generated meme outright but to intervene if it’s weaponized for harassment. Similarly, AI classifiers should flag potentially harmful content for human review, ensuring moderation decisions are fair and consistent.
If you need a template to write or rewrite your content moderation policies, we’ve got you covered.
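
To make that flag-for-review flow concrete, here is a minimal sketch of a classifier-plus-human pipeline; the thresholds and the classify() stub are assumptions, not a reference implementation:

```python
# Minimal sketch: route classifier output to remove / human review / allow.
# The thresholds and the classify() stub are hypothetical.

def classify(content: str) -> float:
    """Stand-in for a real GenAI-harm classifier returning a 0-1 score."""
    return 0.0  # placeholder

def moderate(content: str) -> str:
    score = classify(content)
    if score >= 0.95:
        return "auto_remove"   # clear-cut harm: act immediately
    if score >= 0.60:
        return "human_review"  # uncertain: let a moderator decide
    return "allow"             # likely harmless, including creative GenAI uses
```

Routing the uncertain middle band to humans is what keeps decisions fair and consistent as classifiers evolve.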

Conclusion

By understanding risk vectors and incorporating transparency and policy alignment, platforms can significantly reduce the potential harms posed by AI-generated content. Having mapped these foundational elements, Part 2 dives into advanced detection and moderation techniques using tools like Sightengine and Checkstep. These technologies help automate complex moderation tasks, adding the adaptability needed to keep up with GenAI content risks.

To learn more about GenAI moderation and get started with easy-to-use templates, download our unified guide below.
