Safety vs Privacy: Direct Messages

Platforms that allow users to message each other directly may want to moderate those interactions for safety purposes. However, most types of private message scanning are illegal in Europe under the e-Privacy Directive and, in some cases, are made technically impossible by encryption. This can pose a risk to your users and your platform, especially as private messaging channels are often used for more severe types of harm, e.g. sharing of illegal media, bullying, extortion, grooming and radicalisation.

There are different ways for platforms to still protect their communities and brand. Which one fits depends on the platform’s overall Trust & Safety strategy, which in turn relies on the following factors:

  1. Current platform Trust & Safety risk and impact assessment
  2. Strategic, regulatory and financial constraints
  3. Safety, security and privacy trade-off framework

Once a platform has established its overall T&S strategy, the following strategies can help mitigate the risk from private messaging:

  1. Prevention by targeting malicious actors before they get a chance to create victims
  2. Enhanced reporting, allowing users to report harmful content and behaviour themselves
  3. Multiple account detection of users who have more than one account and/or are returning to a platform with another account after being banned (recidivism)

Every strategy has its pros and cons: steep reductions in harm prevalence can go hand-in-hand with processing large amounts of data, operational overhead or even the risk of over-enforcement. The best solution will likely be a combination of all of the above strategies.

Checkstep, a Trust & Safety software platform designed to manage end-to-end operations and compliance with privacy and online safety regulations, offers tools and services to help with a detailed risk assessment and implementation strategy, tailored to each platform’s needs and with built-in compliance features.

[Figure: Trust & Safety strategy overview, covering prevention, enhanced reporting, and multiple account detection]

Scanning users’ private messages for Trust and Safety purposes? This is where privacy conflicts with trust and safety, creating confusion for platforms on how to proceed. If you are in Europe, this type of scanning is likely illegal under the e-Privacy Directive (ePD, Directive 2002/58/EC), which broadly prohibits monitoring or interception of messages and associated metadata without the users’ consent (Article 5 of ePD). Risk is often contained in private communication, so what can you do about it?

First, let’s discuss how bad actors often operate. Most operate in the same way: they use a public platform service to identify potential victims, either by targeting victims who fit a specific profile or by targeting as many users as possible. Once they have their victim, they need to move them into a 1–1 messaging system, either one provided by the platform or one on another platform, where they can continue their potentially harmful behaviour. There are limited opportunities to disrupt this flow: messaging can be encrypted, it might be illegal to scan, and bad actors are becoming increasingly good at evading detection. You can find forums in dark corners of the internet where all kinds of bad actors discuss how they evade detection on most platforms. Detecting badness and risk is a very adversarial problem, which means detection is always changing. As soon as something is put in place, the bad actors change their modus operandi to avoid it: this is a costly game of “whack-a-mole”.

The costs vary in tangibility. You may need ML/AI and engineering teams dedicated to the specific issue, as well as researchers and governance professionals to help manage it (more tangible). The malicious actors can also cause user churn and reputational damage to your brand (less tangible). These damaging experiences span the spectrum: depending on your platform’s offering, you may see harms that range from bullying to threats to life. How platforms solve the issue depends on several factors:


Assessing risks within your platform/service. You likely have this intelligence already, but codifying it into a risk assessment process is good practice and positions you better for the later steps. Think of this as a data protection impact assessment but for trust and safety.

Questions to ask yourself:

  1. What risks are you likely to face?
  2. What are the impacts of those risks?
  3. Are your users minors? Are there specific vulnerable users that may face a disproportionate impact from certain problem types?


Assessing financial, platform/service risk appetite and the governance strategy of your platform/service.

Questions to ask yourself:

  1. How much risk are you willing to accept?
  2. Is there a calculated approach to limiting harm that balances enforcement with platform growth?
  3. What is the platform/service’s legal tolerance overall?


Once you know what the risks are and the environment you are operating in, it is time to make some hard decisions.

Questions to ask yourself:

  1. What is the budget, and where is it best spent?
  2. Are you trading off privacy for safety, or security for privacy?

This forms your strategy for trust and safety.

[Figure: Points to consider when building your Trust & Safety strategy (responsibility, adherence to legislation, transparency, remediation, fairness)]

The optimal approach to solving problems is the one that works best for you. Most platform services want to keep to a “middle of the pack” risk model, whereby they lean more in one direction, but not so much that they completely disregard the other. A good example would be lowered thresholds for detecting potential scammers. This is potentially more intrusive on privacy, but that risk can be mitigated by following principles such as minimal data scanning and almost immediate deletion of benign content. An example of going too far in one direction is scanning private messages, which in Europe is likely an outright violation of the law.

There are different strategies that can help overcome the restriction of proactive detection in messaging. The layers of a good overall strategy are:

Strategy 1: Prevention

There is a very limited window of opportunity for action just before (or as) the bad actor approaches a victim and moves them from a public platform to a private one. Malicious actors often share traits in their profiles and behaviour. These similarities reveal patterns that can be detected by AI and actioned before a bad interaction even takes place. One possible concern with this approach is that it presumes a user is “guilty until proven innocent”. The mitigation here is a defensible, logical and transparent account of how that decision was made. This approach forms the bedrock of this defence.


Benefits:

  1. Prevalence of bad interactions on messaging decreases
  2. Can also result in a drop in prevalence of “bad” content on all surfaces
  3. Bad interactions are prevented, rather than remediated


Limitations:

  1. “Guilty until proven innocent” is a big concern here
  2. Large amounts of data processed
  3. Could easily lead to over-enforcement (without active countermeasures)

Strategy 2: Enhanced reporting

The restrictions on scanning message content under the ePD don’t apply if the user reports the content themselves. The issue is that getting users to report is notoriously difficult, but there are ways to make it more likely. Prominence, ease of use, and education are the three pillars that underpin a good reporting strategy. Again, these pillars have to be balanced against the overall user experience of the product. However, through A/B testing and iterative processes, a suitable balance can be found.

For instance, every time a new message thread is started, a banner could remind users about reporting bad behaviour. This banner can have varying messages, mixing action-based prompts together with education-based ones. It’s a chance to remind users that we rely on them to report bad actors — this is their community to moderate too!
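As a rough illustration of the rotating banner idea, the sketch below assigns each user a stable banner variant by hashing their user ID, which is one common way to split users for an A/B test. The banner copy, the experiment name and the `banner_for` function are all hypothetical and would need adapting to your own product.

```python
import hashlib

# Hypothetical banner copy: one action-based prompt, one education-based prompt.
BANNERS = [
    ("action", "See something harmful? Tap the flag icon to report it."),
    ("education", "Reports help keep this community safe for everyone."),
]


def banner_for(user_id: str, experiment: str = "report-banner-v1") -> tuple[str, str]:
    """Pick a banner variant that stays the same for a given user and experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # The first byte of the hash spreads users roughly evenly across variants.
    return BANNERS[digest[0] % len(BANNERS)]


variant, text = banner_for("user-123")
print(variant, "->", text)
```

Because the assignment depends only on the experiment name and the user ID, a user sees a consistent variant across sessions, and changing the experiment name reshuffles everyone for the next test.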

[Figure: Enhanced Reporting steps (prominence, ease of use, education), with benefits and limitations]

Strategy 3: Multiple account detection

Bad actors often operate and maintain multiple accounts, knowing that sooner or later one of their accounts will be placed under a restriction. At this point, they might switch to a mirror account. Users may also spin up new accounts on an ad-hoc basis if they are banned permanently or suspended temporarily. Returning to a platform in this way after a sanction is known as recidivism. Restricting accounts by suspension or banning is pointless if the bad actor can just use another account.

The measures taken to protect the community need to be effective, and not just deal with an isolated problem. Repeat offenders have to be targeted directly; otherwise they will only get swept up when they re-offend. Without analysis and strategies in place, a small number of bad actors can be responsible for a large proportion of issues.

[Figure: Multiple Account Detection strategies, benefits and limitations]

How does this work in practice?

Basic strategy — single datapoint based:

  • Blacklist email addresses when users are suspended
  • Blacklist sign-up information (telephone numbers, specific IP addresses, payment card information, PayPal accounts) so that if a bad actor uses some credential from a banned account, it is flagged or blocked
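A minimal sketch of the single-datapoint approach might look like the following. It stores a hash of each normalised identifier rather than the raw value, in the spirit of minimal data processing. The `SignupBlocklist` class and the identifier kinds are illustrative assumptions, and a plain hash is not anonymisation (phone numbers in particular are easy to guess), so a production system would likely need a keyed hash and its own legal review.

```python
import hashlib


def fingerprint(kind: str, value: str) -> str:
    """Normalise an identifier and hash it so the raw value is not stored."""
    normalised = value.strip().lower()
    return hashlib.sha256(f"{kind}:{normalised}".encode("utf-8")).hexdigest()


class SignupBlocklist:
    """Hypothetical in-memory blocklist of identifiers from banned accounts."""

    def __init__(self) -> None:
        self._blocked: set[str] = set()

    def block_account(self, identifiers: dict[str, str]) -> None:
        """Record every identifier of an account that has just been banned."""
        for kind, value in identifiers.items():
            self._blocked.add(fingerprint(kind, value))

    def matches(self, identifiers: dict[str, str]) -> list[str]:
        """Return the kinds of sign-up identifiers that hit the blocklist."""
        return sorted(
            kind
            for kind, value in identifiers.items()
            if fingerprint(kind, value) in self._blocked
        )


blocklist = SignupBlocklist()
blocklist.block_account({"email": "Bad.Actor@example.com", "phone": "+441234567890"})

signup = {"email": "new.person@example.com", "phone": "+441234567890"}
print(blocklist.matches(signup))  # ['phone']
```

Returning the matching kinds, rather than a plain yes/no, lets you decide per signal whether a hit should block the sign-up outright or only flag it for review.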

Intermediate strategy — heuristic based:

  • Use the above information not only to block/flag new sign-ups, but also to run a fan-out search when a user is blocked, looking for other accounts that share information with them
  • Use more nuanced details, such as profile pictures, bio descriptions, contacts in a heuristic model
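The fan-out search can be sketched as a breadth-first walk over accounts that share identifiers with the blocked one. The account IDs, the identifier strings and the `fan_out` function below are made up for illustration, and the sketch assumes you already hold a mapping from each account to its identifiers.

```python
from collections import defaultdict, deque


def build_index(accounts: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each identifier to the set of accounts that have used it."""
    index: dict[str, set[str]] = defaultdict(set)
    for account_id, identifiers in accounts.items():
        for identifier in identifiers:
            index[identifier].add(account_id)
    return index


def fan_out(start: str, accounts: dict[str, set[str]], max_hops: int = 2) -> dict[str, int]:
    """Return accounts linked to `start` and how many hops away each one is."""
    index = build_index(accounts)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for identifier in accounts[current]:
            for neighbour in index[identifier]:
                if neighbour not in distance:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
    del distance[start]
    return distance


accounts = {
    "acct_banned": {"device:abc", "card:1111"},
    "acct_mirror": {"device:abc", "ip:10.0.0.5"},
    "acct_second": {"ip:10.0.0.5"},
    "acct_other": {"device:zzz"},
}
print(fan_out("acct_banned", accounts))  # {'acct_mirror': 1, 'acct_second': 2}
```

The hop count matters: a direct link through a shared device is a much stronger signal than a second-hop link through a shared IP address, so more distant matches are usually better treated as review flags than as automatic bans.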

Advanced strategy — ML based

  • Use graph data to create a risk model
  • Risk model can take action on proactive and retroactive basis
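To make the idea of a graph-derived risk model concrete, here is a deliberately simplified sketch. It scores an account from the number of links it has to banned accounts, by signal type, and maps the score to an action. The weights and thresholds are hand-picked assumptions standing in for what a trained model would learn from labelled data, so treat this as a shape to start from and not as a real ML model.

```python
import math

# Hypothetical weights: how strongly each shared signal suggests the same person.
SIGNAL_WEIGHTS = {"card": 3.0, "device": 2.0, "ip": 0.5}


def risk_score(links_to_banned: dict[str, int]) -> float:
    """Squash the weighted number of links to banned accounts into the range 0..1."""
    total = sum(
        SIGNAL_WEIGHTS.get(kind, 0.0) * count
        for kind, count in links_to_banned.items()
    )
    return 1.0 - math.exp(-total)


def decide(score: float, block_at: float = 0.9, review_at: float = 0.5) -> str:
    """Map a score to an action using configurable thresholds."""
    if score >= block_at:
        return "block"
    if score >= review_at:
        return "review"
    return "allow"


print(decide(risk_score({"ip": 1})))                 # allow
print(decide(risk_score({"device": 1})))             # review
print(decide(risk_score({"card": 1, "device": 1})))  # block
```

The same scoring function can be run proactively at sign-up and retroactively over existing accounts whenever a linked account is banned. Giving shared IP addresses a low weight is one way to limit false positives from shared households or public computers.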

Of course, this all has to be applied to your particular product with its associated nuances and the assets you have available (for instance, you may not have a graph). There will also need to be an appeal process to help remediate any false positives and restore accounts.


Benefits:

  1. Makes sanctions against users more meaningful
  2. Prevents recidivism
  3. Can help identify hacked accounts by identifying shared credentials, and can also help accurately represent active users for reporting purposes


Limitations:

  1. False positives or over-enforcement can creep up, especially if an ML-based approach becomes too reliant on location signals or device IDs (student houses, family members, public computers etc.)
  2. Disposable contact points are all too easy to obtain, although you can blacklist the common domains


The best solution is likely a combination of all three strategies at varying levels. Ultimately, your approach to enforcement is dependent on the resources available, your tech stack, and your unique risk matrix. Each strategy has its own considerations, and a holistic data governance strategy must be in place to underpin any data processing.

How does Checkstep help?

Preventing an event from happening is possible. AI can recognize unique “badness” signals specific to your platform and take action before the harm happens. As ever with our service, everything is configurable. The AI is tuned to your specific service, and you have control over the thresholds so the risk model of your service can be accurately represented.

Preventing recidivism should also be high on your agenda, and we can develop bespoke strategies to help you prevent known bad actors using multiple accounts. We have an experienced in-house team who have worked on all kinds of problem types at scale for top tech companies. Recidivism is usually seen as the low-hanging fruit of content moderation. With the appropriate strategies in place, the prevalence of recidivist behaviour can see a notable decrease.

Reporting is a trial and error issue, but once the reports are coming into the platform the key to success is accuracy and automation. The models can make a first pass at reports raised by users, and can even take automated actions to suspend account access. The thresholds at which this happens are completely customisable and in your control. At the very least, the reports can be sorted into different queues and prioritised based on custom preferences so that human reviewers’ time can be utilised effectively. The automated approach can also protect your human reviewers from the worst harms by taking automated actions and so limiting their exposure to distressing content.
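The first-pass triage described above can be pictured roughly as follows. This is a generic sketch and not Checkstep’s API: the report fields, the severity ranking and the `triage` function are assumptions made for illustration.

```python
# Hypothetical severity ranking per harm category (higher is reviewed first).
SEVERITY = {"threat_to_life": 3, "grooming": 3, "extortion": 2, "bullying": 1, "spam": 0}


def triage(reports: list[dict], auto_action_at: float = 0.95) -> tuple[list[str], list[str]]:
    """Split user reports into automated actions and a prioritised human queue."""
    automated = [r["id"] for r in reports if r["model_score"] >= auto_action_at]
    remaining = [r for r in reports if r["model_score"] < auto_action_at]
    # Most severe category first, then highest model confidence.
    remaining.sort(key=lambda r: (-SEVERITY.get(r["category"], 0), -r["model_score"]))
    return automated, [r["id"] for r in remaining]


reports = [
    {"id": "r1", "category": "spam", "model_score": 0.99},
    {"id": "r2", "category": "bullying", "model_score": 0.70},
    {"id": "r3", "category": "grooming", "model_score": 0.60},
    {"id": "r4", "category": "extortion", "model_score": 0.80},
]
automated, human_queue = triage(reports)
print(automated)    # ['r1']
print(human_queue)  # ['r3', 'r4', 'r2']
```

Raising `auto_action_at` sends more reports to human reviewers, while lowering it automates more decisions at the cost of a higher false-positive risk, which is why any automated action needs an appeal route.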

All of the actions on each strategy are customisable too. An account can be blocked, taken down, or placed into read-only mode automatically. You can even combine human and AI review to make a decision.

Once this is done, a notification is automatically generated explaining to the user what action has been taken, why it was taken, and how to appeal the decision. Again, notices are customisable so you can balance transparency with safety and protect your brand identity. Ensuring a user appeal process is in place complies with legislation and reduces the risk of false positives, while also making users feel valued and heard.

Of course, the need for an appeal process applies to automated detection, as required by the EU, but we also know that users often don’t report incidents regardless of how frictionless the process is. Despite its name, proactive detection requires a triggering event to happen, which means the damage is done by the time any action is taken. Every strategy needs an appeal process to ensure fairness, remediate false positives, and give users a good experience.

This also keeps your brand protected on multiple legislative fronts, covering obligations under existing legislation such as the e-Privacy Directive and GDPR Article 22 (automated decision making) and upcoming legislation such as the Digital Services Act (DSA).

The regulatory landscape is moving swiftly, in the EU especially, but North America and APAC are also moving fast. Checkstep even has built-in automated transparency reporting that complies with upcoming DSA requirements. We also know you need trust and safety systems to work in your wider governance operation, so we invest in holistic data governance practices and security as well.

About the author

Kieron Maddison has worked in compliance for more than five years, including at OneTrust, Trainline and Meta (Facebook). At Meta, Kieron was a Program Manager, acting as the link between trust and safety, privacy, security and product/engineering. In that role he was instrumental in defining new approaches for program management and worked to spin up new initiatives, from support to the Oversight Board. Kieron is passionate about protecting people’s rights, particularly those of children and young people, and is highly analytical in balancing this with business needs.
