The Ultimate Guide to GenAI Moderation x Sightengine

Map your GenAI risks and craft “AI-resilient” policies [Part 1]

GenAI presents a significant challenge for platforms and the Trust & Safety field. As we head into 2025, AI-generated content and detection advancements are poised to take center stage. This post is part of a two-part blog series, co-authored with our partner Sightengine, exploring innovative strategies and solutions to address emerging threats.

Part 1 explores the foundational principles of understanding risk vectors, transparency, and policy alignment to mitigate AI-generated content risks, while Part 2 delves into advanced detection and moderation techniques using tools like Sightengine and Checkstep to enhance platform safety.

Introduction

What if the image that briefly sent the U.S. stock market into a nosedive wasn’t real? In May 2023, a highly convincing fake image of an explosion near the Pentagon spread rapidly across verified accounts on X (formerly Twitter). The result: panic, misinformation, and a temporary market dip. The incident underscores a pressing reality: generative AI content can look so real that even the sharpest eyes are fooled.

Generative AI is everywhere, shaping how we create and consume content. On platforms hosting user-generated content (UGC), this means encountering a wide range of AI-generated or AI-manipulated material daily. While some uses are harmless, like airbrushed vacation photos, others are more sinister, such as deepfakes, misinformation, or scams.
For platforms in the Trust & Safety space, the stakes are higher than ever. Beyond potential reputational and operational risks, there are real human consequences, as highlighted by the tragic case of a lawsuit against Character.AI. A grieving mother alleges that the platform’s chatbot influenced her son’s suicide.

This landscape makes it critical for platforms to anticipate and mitigate the worst-case scenarios associated with AI-generated content. Not only does this safeguard users, but it also protects your platform’s integrity and future growth.

The first step? Mapping your risks. Rank each risk from 1 to 5 based on its likelihood and potential scale to establish priorities. The sections below will guide you through identifying relevant risks, assessing their potential impact, and laying the groundwork for AI-resilient policies.
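To make this concrete, here is a minimal sketch (in Python, with purely illustrative risk names and scores) of how a likelihood-times-impact scoring pass can turn a list of GenAI risks into a ranked priority map:

```python
# Minimal sketch: rank GenAI risks by likelihood x impact.
# Risk names and scores are illustrative placeholders, not real data.

risks = [
    # (risk vector, likelihood 1-5, impact 1-5)
    ("Deepfake impersonation of users", 4, 5),
    ("AI-generated misinformation posts", 5, 4),
    ("AI-assisted spam campaigns", 5, 2),
    ("Fraudulent AI-generated claim photos", 2, 4),
]

# Priority = likelihood x impact (max 25); highest first.
for name, likelihood, impact in sorted(risks, key=lambda r: r[1] * r[2], reverse=True):
    print(f"priority={likelihood * impact:>2}  {name}")
```

However you score them, the point is the same: a shared, explicit ranking lets your team agree on what to tackle first.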

1. Misinformation, deepfakes and creative spam: map your GenAI risks first

Introduction

AI-generated content isn’t inherently harmful. Most of it is innocuous or even beneficial, enhancing user experience by making creative tools accessible. However, risks emerge when generative AI is used in ways that deceive or exploit users. Notably, misinformation and impersonation create reputational and operational risks for any platform hosting UGC. Deepfakes and creative spam, for instance, can undermine user trust if left unchecked.

To build effective policies, platforms need a clear understanding of these risk vectors and their potential impact. Platforms should start by mapping risk vectors, including falsification and volume risks, to build a tailored moderation strategy. Knowing the types of risks specific to your user demographics and content types is foundational for a scalable response system.

Below, we dive into three critical categories of risks and explore how platforms can address them.

1. Misinformation, disinformation and GenAI

Misinformation and disinformation did not appear with the rise of new technologies, but these technologies created new and accelerated means for bad actors to generate such content. The Integrity Institute published an interesting paper in 2023 which states: “The increased quantity and quality of mis- and disinformation stands at the forefront of concerns given the accessibility of LLMs, there is the prospect of highly scalable influence campaigns (…) It is also possible that the low barrier of entry expands access to a greater number of actors.” Anyone can now create, in seconds and at no cost, speech, voiceovers, or text in any language.
 
This is one of the most common risks for social media platforms. Misleading content, like fabricated news stories or doctored images, is not only very convincing but also very difficult to detect. A prime example is the fake image showing an explosion near the Pentagon that was shared by several verified X accounts in May 2023. The image looked so real that it initially evaded moderation, causing turmoil and a brief dip in the stock markets.

2. Deepfakes in the GenAI era

GenAI Deepfakes

Source: Instagram, Taylor Swift

A deepfake is an artificial image or video generated by a special kind of machine learning called “deep” learning. Deepfakes represent a unique and rapidly growing challenge. Unlike misinformation, which exploits existing content, deepfakes create entirely new realities. These AI-generated images and videos can depict people doing or saying things they never did, with a realism that can fool even the sharpest eyes. One recent public deepfake controversy involved pop singer Taylor Swift, after then-candidate Donald Trump posted on Truth Social a series of AI-generated images falsely showing the singer endorsing his campaign.

The technology behind deepfakes, generative adversarial networks (GANs), has been around since 2014. As face-swapping technologies gained popularity starting in 2017, the growth of deepfakes began to take off. Countries like the United States are exploring regulatory measures to curb the spread of deepfakes, but so far none have been compelling enough to stop it at a macro level.

What started as an academic experiment has become a commercialized threat. Today, deepfake tools are accessible to anyone with an internet connection. A 2023 report by Security Hero revealed:

  • The number of deepfake videos online has increased by 550% since 2019, reaching over 95,000.
  • 98% of all deepfake videos are pornographic, with 99% targeting women.
  • There are now 42 user-friendly deepfake tools, collectively searched over 10 million times monthly.

Deepfakes pose particular risks for platforms that rely on user profiles, photos, or videos. Social media, dating apps, and even adult-content platforms face heightened vulnerability. For example, a social media platform may see deepfakes used for impersonation, while a dating app could encounter fraudulent profiles created to exploit users emotionally or financially.

Deepfakes GenAI Risk Map

3. Fraud, creative scams, spam and GenAI

GenAI Insurance Scams

Source: Auto Express, UK

Generative AI has significantly lowered the barrier to creating convincing fabricated content, opening the door to a new wave of fraud, creative scams, and spam. Historically, scams have relied on manual or basic digital manipulation, but with AI, the scale and sophistication of these threats have reached unprecedented levels.

AI has enabled scams to evolve from rudimentary fraud into sophisticated schemes, increasing their effectiveness and reach. Key characteristics of this new wave include:

  • Realism: AI-generated photos, videos, and text are often indistinguishable from authentic content, making scams more believable.
  • Speed: AI can produce convincing fake evidence in minutes, allowing fraudsters to operate on a much larger scale.
  • Cost-Effectiveness: Many AI tools are free or low-cost, making them accessible to even the most novice scammers.
  • Automation: Scripts and bots powered by AI can distribute scams or spam across platforms without human intervention.

For which kinds of platforms is this a higher risk?

  • Scam or Impersonation Risks on Dating Platforms: Fraudsters can use AI to create deepfake profiles that mimic real users or create entirely fictitious but convincing personas. These profiles can deceive users into sharing sensitive information, money, or other valuables.
    • Platforms at risk: Dating apps such as Hinge, Bumble, or Tinder, which heavily rely on user profiles and images.
  • Insurance Scams: Fraudsters are increasingly using AI-generated images to create fake incidents and submit fraudulent insurance claims. For instance, a recent case involved a van driver who doctored photos to fabricate damage to their vehicle and supported the claim with a forged repair invoice. AI enables the quick and realistic creation of such fake evidence.
    • Platforms at risk: Insurance platforms, or any digital-first claims systems that rely heavily on photographic evidence, are particularly vulnerable.
  • Marketplace Scams: AI-generated photos of fake goods or altered images of damaged products can be used to manipulate buyers or sellers on e-commerce platforms.
    • Platforms at risk: Image-based transaction platforms like eBay, Craigslist, or Facebook Marketplace.

Scams GenAI Risk Map

4. Build your GenAI risk map – TEMPLATE

GenAI Moderation Guide

Download our GenAI Guide to access a simple, easy-to-reproduce table template that platforms can use to build their risk map and categorize and prioritize risks effectively.

2. Build a robust set of “GenAI-resilient” policies 

Introduction

No two platforms are alike, so why should their content moderation policies be? Whether you’re moderating a bustling social media site, a niche dating platform, or an expansive e-commerce marketplace, your policies must be tailored to fit the unique dynamics of your platform. Crafting context-driven, “AI-resilient” policies isn’t just a safeguard; it’s a strategy for building trust, protecting your users, and ensuring your platform’s integrity.

1. Why GenAI-resilient policies matter

Generative AI is rewriting the rules of user-generated content (UGC). While AI tools can empower users to create exciting and innovative content, they also introduce new risks, such as impersonation, scams, and misinformation. Without a robust set of policies, platforms may struggle to manage these risks, leading to confusion among users and potential harm to their community and reputation.

By establishing clear, context-specific guidelines, platforms can:

  • Set expectations for acceptable behavior.
  • Provide users with clarity on what’s permissible.
  • Mitigate risks associated with AI-generated content.

The result? A safer, more transparent ecosystem where users understand their responsibilities, and platforms can efficiently manage emerging challenges.

2. The power of context-specific guidelines

Policies must align with the nature of the platform and the behavior of its user base. A one-size-fits-all approach simply won’t cut it. Here are some examples of how policies can vary based on platform type:

  • Social media platforms: These platforms often deal with large-scale content sharing, making them particularly vulnerable to misinformation and deepfakes. Policies might include:
    • Banning AI-generated impersonations of public figures.
    • Requiring AI-generated media to be labeled as such.
    • Implementing strict penalties for misleading or harmful content.

  • Dating apps: For dating apps, user trust is paramount. Policies should prioritize:
    • Prohibiting AI-generated or manipulated profile photos.
    • Mandating authenticity checks for user-generated images and text.
    • Establishing guidelines against the use of chatbots to deceive users.

  • E-commerce marketplaces: Marketplaces face a unique set of risks tied to counterfeit goods and fraudulent listings. Effective policies might include:
    • Requiring verification for listings suspected to contain AI-generated content.
    • Banning the sale of AI-generated counterfeit goods.
    • Instituting strict measures against fraudulent claims, such as fake product images.

This context-driven approach ensures policies are not only relevant but also actionable, addressing the specific challenges each platform faces.
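One way to keep context-specific rules actionable is to encode them as machine-readable policy configuration that moderation tooling loads per platform type. The sketch below is a hypothetical structure, not a prescribed schema; the platform types, rule names, and actions are illustrative only:

```python
# Hypothetical sketch: context-specific GenAI policies as configuration.
# Platform types, rule names, and actions are illustrative, not a standard.

GENAI_POLICIES = {
    "social_media": [
        {"rule": "ai_impersonation_public_figure", "action": "remove"},
        {"rule": "unlabeled_ai_media", "action": "require_label"},
    ],
    "dating_app": [
        {"rule": "ai_generated_profile_photo", "action": "remove"},
        {"rule": "chatbot_deception", "action": "suspend_account"},
    ],
    "marketplace": [
        {"rule": "ai_counterfeit_listing", "action": "remove"},
        {"rule": "suspected_ai_listing_content", "action": "require_verification"},
    ],
}

def actions_for(platform_type: str, triggered_rules: list[str]) -> list[str]:
    """Return the configured actions for the rules a piece of content triggered."""
    return [
        entry["action"]
        for entry in GENAI_POLICIES.get(platform_type, [])
        if entry["rule"] in triggered_rules
    ]

print(actions_for("dating_app", ["ai_generated_profile_photo"]))  # ['remove']
```

Keeping policy in configuration rather than code also makes updates auditable as your rules evolve.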

3. Clear user guidelines and responsibilities

When it comes to content moderation, ambiguity is your enemy. Users should never be left guessing about what’s acceptable on your platform. By providing clear, easy-to-understand guidelines, you can foster a community that respects and adheres to your rules.

What good policies look like:

  • Proactive clarity: Clearly outline what types of AI-generated content are allowed and under what conditions. For example, an art-sharing platform might permit AI creations but require proper attribution or labeling.
  • Consistent enforcement: Policies are only as good as their enforcement. Ensure moderation teams have the tools and training to apply guidelines uniformly.
  • User education: Go beyond rules and help users understand why certain policies are in place. For instance, explain that banning AI-generated deepfake profiles on a dating platform protects user trust and safety.

Platforms that communicate these rules effectively create a sense of accountability and collaboration with their users.

4. Examples of policy differentiation

To illustrate how AI-resilient policies adapt to different platforms, consider the following scenarios:
  • Social media: A policy could specify that AI-generated political content must include disclaimers, helping users distinguish between human-made and AI-created posts.
  • Art communities: A platform like DeviantArt might allow AI-generated artwork but require that creators disclose the tools used, preserving transparency in the creative process.
  • Marketplace transactions: An e-commerce platform like Etsy might implement rules against AI-generated counterfeit products, requiring sellers to provide proof of authenticity for high-risk items.

Creating AI-resilient policies doesn’t happen overnight, but the effort pays dividends in safeguarding your platform. Start by understanding your user base and the specific risks AI-generated content poses.

Then, build policies that are:

  • Tailored: Align rules with your platform’s purpose and audience.
  • Transparent: Ensure users understand both the guidelines and their rationale.
  • Adaptable: Stay prepared to update policies as AI technology evolves.

By taking these steps, platforms can foster trust, reduce risks, and maintain the integrity of their UGC ecosystems in the age of generative AI.

3. Educate your users and embrace transparency with context-driven operations

Introduction

Moderating content is just one piece of the puzzle in managing generative AI on your platform. To truly foster trust and safety, platforms must go beyond moderation. Educating users, prioritizing transparency, and understanding the nuances of context and impact are just as vital. These steps not only help mitigate risks but also empower users to engage responsibly with AI-generated content.

1. Transparency: let users see the bigger picture about GenAI

The question isn’t just whether content should be moderated: it’s whether platforms should make it clear when content is AI-generated.

The answer is increasingly a resounding “yes.” Transparency about AI origins is rapidly becoming an industry standard, driven in part by legislation. The EU Artificial Intelligence (AI) Act is paving the way for global regulation, requiring platforms to label AI-generated content so users understand its origins. This approach isn’t just about compliance; it’s about trust.

Big players like Meta have already embraced this model. On platforms like Facebook and Instagram, AI-generated content comes with labels and additional descriptions to inform users about its creation. By adopting similar practices, platforms can reassure users and foster a culture of openness.
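In practice, disclosure can start as simply as attaching provenance metadata to each content record and surfacing a user-facing label when AI origin is detected or self-declared. Here is a minimal sketch; the field names and label text are assumptions for illustration, not any platform’s actual schema:

```python
# Minimal sketch: attach an AI-origin disclosure label to content metadata.
# Field names and the label text are hypothetical.

from dataclasses import dataclass, field

@dataclass
class ContentItem:
    content_id: str
    ai_generated: bool = False  # self-declared by the uploader or set by a detector
    metadata: dict = field(default_factory=dict)

def apply_disclosure_label(item: ContentItem) -> ContentItem:
    """Add a user-facing transparency label when content is AI-generated."""
    if item.ai_generated:
        item.metadata["disclosure_label"] = "Made with AI"
    return item

post = apply_disclosure_label(ContentItem("post-123", ai_generated=True))
print(post.metadata)  # {'disclosure_label': 'Made with AI'}
```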

2. Context, impact, and virality: beyond black & white moderation

Generative AI is blurring the lines between what’s real and what’s not. Yet one question remains constant: is there potential for harm, regardless of whether the content is AI-generated or human-made? This is where context becomes indispensable. Without understanding these nuances, moderation decisions can be inconsistent or unfair.

Not all AI-generated content requires removal. In fact, much of it can remain untouched as long as its context and impact are understood. Consider these nuances:

  • Context matters: A harmless AI-generated joke might be fine on one platform but problematic on another due to its audience or setting.
  • Impact is key: Even seemingly innocuous content can have outsized consequences. A joke may go viral, sparking waves of harassment or amplifying misinformation. Policies must account for not only the nature of the content but also its potential to cause harm at scale.
  • Virality amplifies risk: Generative AI allows content to spread faster and further than ever before. Something intended as a small joke or creative experiment can become a global phenomenon overnight, potentially causing harm. Platforms must monitor the virality of AI-generated content and assess whether its impact aligns with their policies, as the sketch below illustrates.
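One way to operationalize that virality point is to escalate review priority as a piece of content’s share velocity grows, even when its standalone harm score is low. A minimal sketch, assuming a hypothetical 0-1 harm score, a shares-per-hour metric, and made-up thresholds:

```python
# Minimal sketch: escalate review priority by share velocity.
# The thresholds and queue names are hypothetical, for illustration only.

def review_tier(shares_last_hour: int, harm_score: float) -> str:
    """Combine a harm score (0-1) with virality to pick a review queue."""
    if harm_score >= 0.9:
        return "immediate_human_review"
    if shares_last_hour > 10_000:  # going viral: review even low-harm content
        return "priority_human_review"
    if harm_score >= 0.5 or shares_last_hour > 1_000:
        return "standard_human_review"
    return "monitor_only"

print(review_tier(shares_last_hour=25_000, harm_score=0.2))  # priority_human_review
```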

3. The role of policies in balancing transparency and moderation for GenAI

Content Moderation Policies template

Transparent operations and context-sensitive evaluations are crucial for managing generative AI, but they work best when paired with robust, actionable policies.

Policies should:

  • Provide guidelines on how AI-generated content is flagged, labeled, and moderated.
  • Equip moderators and classifiers with tools to assess content for both harmful intent and unintended consequences.
  • Strike a balance between removing harmful content and allowing harmless, creative uses of generative AI.

For example, a platform might decide not to ban an AI-generated meme outright but to intervene if it’s weaponized for harassment. Similarly, AI classifiers should flag potentially harmful content for human review, ensuring moderation decisions are fair and consistent.
If you need a template to write or rewrite your content moderation policies, we’ve got you covered.
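
To make that flag-for-review flow concrete, here is a minimal sketch of a classifier-plus-human pipeline; the thresholds and the classify() stub are assumptions, not a reference implementation:

```python
# Minimal sketch: route classifier output to remove / human review / allow.
# The thresholds and the classify() stub are hypothetical.

def classify(content: str) -> float:
    """Stand-in for a real GenAI-harm classifier returning a 0-1 score."""
    return 0.0  # placeholder

def moderate(content: str) -> str:
    score = classify(content)
    if score >= 0.95:
        return "auto_remove"   # clear-cut harm: act immediately
    if score >= 0.60:
        return "human_review"  # uncertain: let a moderator decide
    return "allow"             # likely harmless, including creative GenAI uses
```

Routing the uncertain middle band to humans is what keeps decisions fair and consistent as classifiers evolve.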

Conclusion

By understanding risk vectors and incorporating transparency and policy alignment, platforms can significantly reduce the potential harms posed by AI-generated content. Having mapped these foundational elements, Part 2 dives into advanced detection and moderation techniques using tools like Sightengine and Checkstep. These technologies help automate complex moderation tasks, adding the adaptability needed to keep up with GenAI content risks.

To learn more about GenAI moderation and get started with easy-to-use templates, download our unified guide below.
