Emerging trends can severely threaten user retention
Recent events like the UK far-right riots have highlighted how unchecked hate speech and misinformation can severely threaten user retention. When harmful content spreads without effective moderation, it not only erodes trust in the platform’s community but can also drive users away. Addressing these risks requires a robust moderation approach that quickly detects and manages such content, reinforcing user confidence and platform safety.
By leveraging AI-driven moderation tools, platforms can help prevent harmful content from spreading and create a safer online environment. Unlike basic keyword detection, these AI systems are built to cross-reference content with verified sources, such as reputable news outlets, fact-checkers, and official databases. By flagging content that contradicts credible information, AI can detect misleading posts, especially in moments of crisis when false narratives proliferate. While AI alone doesn’t provide absolute truth, it serves as an early alert system, identifying potentially harmful content for further review by human moderators. This dual approach reduces the spread of misinformation while enhancing accuracy.
Rather than relying solely on keywords to track emerging trends, large language models (LLMs) offer far greater flexibility, allowing platforms to update prompts and categorize new content types in seconds. That’s why Checkstep has made it easier to respond quickly to emerging content trends: within seconds, you can add or edit the labels your LLM tags in order to scan for new topics. Flexible LLM prompts are much more powerful than keyword scanning; adding an LLM label lets you quickly identify content themes or policy violations, which is critical as new trends emerge.
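To make the mechanics concrete, here’s a minimal sketch (purely illustrative, not Checkstep’s implementation) of how label definitions can drive LLM classification: the prompt is built from the current labels, so editing a label’s description immediately changes what the model looks for. The `call_llm` helper is a stand-in for whichever model provider you use.

```python
# Illustrative sketch only; not Checkstep's implementation.
# The prompt is assembled from editable label definitions, so changing a
# label's description changes what the model scans for.

def build_prompt(message: str, labels: dict[str, str]) -> str:
    """Compose a classification prompt from the current label definitions."""
    label_lines = "\n".join(f"- {name}: {definition}" for name, definition in labels.items())
    return (
        "You are a content moderation classifier.\n"
        "Tag the message with every label whose definition it matches.\n"
        f"Labels:\n{label_lines}\n\n"
        f"Message: {message}\n"
        "Answer with a comma-separated list of label names, or 'none'."
    )

def call_llm(prompt: str) -> str:
    """Placeholder: wire this up to your own LLM provider or local model."""
    raise NotImplementedError

def classify(message: str, labels: dict[str, str]) -> list[str]:
    """Return the label names the LLM assigned to the message."""
    answer = call_llm(build_prompt(message, labels)).strip().lower()
    return [] if answer == "none" else [name.strip() for name in answer.split(",")]
```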
Why flexible LLM Prompts give unparalleled flexibility for Content Classification
At Checkstep, many of our customers use large language model (LLM) content scanning (from our AI marketplace) to classify content and make decisions on it. Instead of relying only on pre-trained classifiers for potentially harmful themes, Checkstep customers build their own prompts based on their unique policy requirements, creating customized content moderation scanning that’s perfectly matched to their business.
What does that really mean?
Imagine that the London Design Group runs a community message board for its network of freelance designers in London. It wants to keep its main message board focused on events, industry news, and design trends, so under its policy, messages such as job listings and advertisements for coworking spaces aren’t allowed unless they’re posted in boards dedicated to those topics. Using an LLM from Checkstep’s marketplace, the group creates a ‘job listing’ label and an ‘advertisement’ label based on its policy, like this:
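The exact configuration lives in Checkstep’s dashboard; as a hypothetical illustration, the two label definitions might look like this as data fed to a classifier such as the sketch above:

```python
# Hypothetical label definitions for the London Design Group example
# (illustrative only; the real labels are configured in the Checkstep dashboard).
london_design_labels = {
    "job-listing": (
        "the message advertises a job, contract, or freelance opportunity, "
        "including requests to hire or to be hired"
    ),
    "advertisement": (
        "the message promotes a paid product or service, such as a coworking "
        "space, tool, or course, rather than discussing events, industry news, "
        "or design trends"
    ),
}
```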
With these LLM labels, they can create rules in Checkstep that keep job listings (which are otherwise fine to post) out of their main message board, preserving the posting rules for their community and using AI to enforce their custom policies.
Reacting and adapting your Content Moderation with LLM prompts
LLMs don’t just help you create custom policies; Checkstep also lets you change your labels, and the definitions you provide to your LLM, in seconds while scanning continues. If you discover a new content category you want to tag, or want to add nuance to what qualifies as ‘advertising’, you can simply add or change the label and its description, or add exceptions to exclude certain types of content.
For example, let’s say you run a messaging product aimed at teenagers. As part of your terms, you don’t allow any sharing of location information, and you have a ‘location’ label defined as ‘messages that mention location information including the country, state, city, region, or neighborhood of the sender’ to tag these messages.
You’ve received complaints that you’re taking down too many posts. Looking at the chats, you see that your teen users are using the term ‘Ohio’ in a new way (meaning weird, cringey, or odd). You can respond to this Gen Alpha slang quickly by adding an exception to your ‘location’ label: “Exception: slang uses of place names with other meanings, such as Gen Alpha slang using Ohio as an adjective (e.g. ‘Ohio rizz’ or ‘Sounds pretty Ohio’).”
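In the same illustrative form (the wording is hypothetical), the label before and after the exception might look like this:

```python
# Hypothetical 'location' label before and after the exception is added
# (illustrative only; the real definition is edited in the Checkstep dashboard).
location_label_before = (
    "messages that mention location information including the country, state, "
    "city, region, or neighborhood of the sender"
)

location_label_after = (
    "messages that mention location information including the country, state, "
    "city, region, or neighborhood of the sender. "
    "Exception: slang uses of place names with other meanings, such as Gen Alpha "
    "slang using Ohio as an adjective (e.g. 'Ohio rizz' or 'Sounds pretty Ohio')."
)
```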
New trend response: the P. Diddy case example
You can also quickly respond to new topics or trends that may violate your policies. Recently, a Checkstep customer saw new inappropriate comments referencing Sean Combs’ trial. Identifying that their users were discussing ‘diddy parties’ as a reference to topics not allowed on their platform, they quickly created a label to flag comments related to this breaking news story.
Step 1: Create a new label for P. Diddy-themed content
You just need to add a label, here “SXS-pdiddy”, and a description of the references, here “the message includes references to p diddy, diddy, sean combs, puff daddy, or any of his associated music or films”.
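As a rough, hypothetical illustration of that configuration in data form (the real setup happens in the dashboard):

```python
# Hypothetical representation of the new label (illustrative only).
pdiddy_label = {
    "SXS-pdiddy": (
        "the message includes references to p diddy, diddy, sean combs, "
        "puff daddy, or any of his associated music or films"
    ),
}
```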
Step 2: Add the new label to your policy rules
You must then add the label to your policy rules. Here you can specify the Queue Content Threshold and the Automatic Enforcement Threshold; when a piece of content reaches the latter, enforcement is applied automatically.
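To illustrate how the two thresholds could interact (the values and routing logic below are hypothetical; the actual behavior is configured in Checkstep’s rules):

```python
# Illustrative sketch of threshold-based routing (hypothetical values and logic).
QUEUE_THRESHOLD = 0.70         # at or above this, send the content to a review queue
AUTO_ENFORCE_THRESHOLD = 0.95  # at or above this, enforce automatically

def route(label_score: float) -> str:
    """Decide what happens to a piece of content based on its label score."""
    if label_score >= AUTO_ENFORCE_THRESHOLD:
        return "enforce"   # e.g. remove the content automatically
    if label_score >= QUEUE_THRESHOLD:
        return "queue"     # send to human moderators for review
    return "allow"
```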
Step 3: See new types of content
Within a minute, this customer had a new label running on its LLM, flagging content that referenced the trial. As the trial concludes and leaves the public eye, the label may become unnecessary, and it’s just as easy to turn it off in the future.
If you’re curious how Checkstep’s flexible LLM prompt approach can help you adapt your Trust and Safety operation more quickly and efficiently than ever before, get in touch with us for a demo.