Safety vs Privacy: Direct Messages

Platforms that allow users to message each other directly may want to moderate those interactions for safety purposes. However, most types of private message scanning are illegal in Europe under the e-Privacy Directive, and in some cases scanning is made technically impossible by encryption. This can pose a risk to your users and your platform, especially as private messaging channels are often used for more severe types of harm, e.g. sharing of illegal media, bullying, extortion, grooming and radicalisation.

There are different ways for platforms to still protect their communities and brand. Which ones fit depends on the platform’s overall Trust & Safety strategy, which in turn relies on the following factors:

  1. Current platform Trust & Safety risk and impact assessment
  2. Strategic, regulatory and financial constraints
  3. Safety, security and privacy trade-off framework

Once a platform has established its overall T&S strategy, several approaches can help mitigate risk from private messaging:

  1. Prevention by targeting malicious actors before they get a chance to create victims
  2. Enhanced reporting, allowing users to report harmful content and behaviour themselves
  3. Multiple account detection of users who have more than one account and/or are returning to a platform with another account after being banned (recidivism)

Every strategy has its pros and cons: steep reductions in harm prevalence can go hand-in-hand with processing large amounts of data, operational overhead or even the risk of over-enforcement. The best solution will likely be a combination of all the above strategies.

Checkstep, a Trust & Safety software platform designed to manage end-to-end operations and compliance with privacy and online safety regulations, offers tools and services to help with a detailed risk assessment and implementation strategy, tailored to each platform’s needs and with built-in compliance features.

[Figure: Overview of Trust & Safety strategies, covering prevention, enhanced reporting, and multiple account detection]

Scanning users’ private messages for Trust and Safety purposes? This is where privacy conflicts with trust and safety, creating confusion for platforms on how to proceed. If you are in Europe, this type of scanning is likely illegal under the e-Privacy Directive (ePD, Directive 2002/58/EC), which prohibits any type of monitoring or intercepting of messages or associated metadata (Article 5 of the ePD). Risk is often contained in private communication, so what can you do about it?

First, let’s discuss how bad actors often operate. Most follow the same pattern: they use a public platform service to identify potential victims, either by targeting victims who fit a specific profile or by targeting as many users as possible. Once they have their victim, they need to move them into a 1–1 messaging system, either one provided by the platform or one on another platform, where they can continue their potentially harmful behaviour. There are limited opportunities to disrupt this flow: messaging can be encrypted, it might be illegal to scan, and bad actors are becoming increasingly good at evading detection. You can find forums in dark corners of the internet where all kinds of bad actors discuss how they evade detection on most platforms. Detecting badness and risk is a highly adversarial problem, which means detection is always changing. As soon as something is put in place, the bad actors change their modus operandi to avoid it: this is a costly game of “whack-a-mole”.

The costs vary in tangibility. You may need ML/AI and engineering teams dedicated to the specific issue, as well as researchers and governance professionals to help manage it (more tangible). The malicious actors can also cause user churn and reputational damage to your brand (less tangible). These damaging experiences span a wide spectrum: depending on your platform’s offering, you may see harms that range from bullying to threats to life. How platforms solve the issue depends on several factors:


Assessing risks within your platform/service. You likely have this intelligence already, but codifying it into a risk assessment process is good practice and positions you better for the later steps. Think of this as a data protection impact assessment but for trust and safety.

Questions to ask yourself:

  1. What risks are you likely to face?
  2. What are the impacts of those risks?
  3. Are your users minors? Are there specific vulnerable users that may face a disproportionate impact from certain problem types?


Assessing the financial constraints, risk appetite and governance strategy of your platform/service.

Questions to ask yourself:

  1. How much risk are you willing to accept?
  2. Is there a calculated approach to limiting harm that balances enforcement with platform growth?
  3. What is the platform/service’s legal tolerance overall?


Once you know what the risks are and the environment you are operating in, it is time to make some hard decisions.

Questions to ask yourself:

  1. What is the budget, and where is it best spent?
  2. Are you trading off privacy for safety, or security for privacy?

This forms your strategy for trust and safety.

[Figure: Points to consider when building your Trust & Safety strategy: Responsibility, Adherence to Legislation, Transparency, Remediation, Fairness]

The optimal approach is the one that works best for you. Most platforms and services want to keep to a “middle of the pack” risk model, whereby they lean more in one direction, but not so much that they completely disregard the other. A good example would be lowered thresholds for detecting potential scammers. This is potentially more intrusive on privacy, but that risk can be mitigated by following principles such as minimal data scanning and almost immediate deletion of benign content. An example of going too far in one direction is scanning private messages, which in Europe is likely an outright violation of the law.

There are different strategies that can help overcome the restriction of proactive detection in messaging. The layers of a good overall strategy are:

Strategy 1: Prevention

There is a very limited window of opportunity for action just before (or as) the bad actor moves a victim from a public surface to a private one. Malicious actors often share traits in their profiles and behaviour. These similarities reveal patterns that can be detected by AI and actioned before a bad interaction even takes place. One possible concern with this approach is that it presumes a user is “guilty until proven innocent”. The mitigation here is a defensible, logical and transparent account of how each decision was made. This approach forms the bedrock of the defence.


Benefits:

  1. Prevalence of bad interactions on messaging decreases
  2. Can also result in a drop in prevalence of “bad” content on all surfaces
  3. Bad interactions are prevented, rather than remediated


Limitations:

  1. “Guilty until proven innocent” is a big concern here
  2. Large amounts of data processed
  3. Could easily lead to over-enforcement (without active countermeasures)
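
As a rough illustration of the prevention approach, the sketch below scores an account on a handful of public profile and behaviour signals and maps the score to an action, while recording which signals fired so the decision stays explainable. The signal names, point values and thresholds are all hypothetical placeholders, and a hand-written point system stands in here for whatever AI model a platform would actually use.

```python
from dataclasses import dataclass

# Hypothetical signals and point values; a real system would learn or tune
# these per platform and would only use data it is allowed to process.
SIGNAL_POINTS = {
    "account_age_under_24h": 30,
    "many_new_threads_started": 40,
    "external_contact_in_bio": 20,
    "profile_matches_known_pattern": 30,
}

# Hypothetical thresholds, meant to be configurable per platform.
REVIEW_THRESHOLD = 50     # route to a human reviewer
RESTRICT_THRESHOLD = 80   # restrict messaging pending review


@dataclass
class Decision:
    action: str
    score: int
    reasons: list  # signals that contributed, kept for transparency and appeals


def assess(signals):
    """Score an account from the set of signals observed for it."""
    reasons = sorted(s for s in signals if s in SIGNAL_POINTS)
    score = sum(SIGNAL_POINTS[s] for s in reasons)
    if score >= RESTRICT_THRESHOLD:
        action = "restrict_messaging"
    elif score >= REVIEW_THRESHOLD:
        action = "human_review"
    else:
        action = "allow"
    return Decision(action, score, reasons)
```

Keeping the list of reasons alongside the action is one way to support the “defensible, logical and transparent” requirement: the same record can feed a user notice or an appeal review.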

Strategy 2: Enhanced reporting

The restrictions on scanning message content under the ePD don’t apply if the user reports the content themselves. The issue is that getting users to report is notoriously difficult, but there are ways to make it more likely. Prominence, ease of use, and education are the three pillars that underpin a good reporting strategy. Again, these pillars have to be balanced against the overall user experience of the product. However, through A/B testing and iterative processes, a suitable balance can be found.

For instance, every time a new message thread is started, a banner could remind users about reporting bad behaviour. This banner can have varying messages, mixing action-based prompts together with education-based ones. It’s a chance to remind users that we rely on them to report bad actors — this is their community to moderate too!
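
One way to rotate banner messages while keeping A/B test results analysable is to pick the variant deterministically from the user and thread identifiers. The sketch below assumes string IDs and uses made-up banner copy; it is only an illustration of the idea, not a description of any particular product.

```python
import hashlib

# Hypothetical banner copy mixing action-based and education-based prompts.
BANNERS = [
    "See something harmful? Tap Report at the top of this chat.",
    "Reports are confidential. The other person is not told who reported them.",
    "This is your community too. Reporting bad behaviour helps keep it safe.",
]


def pick_banner(user_id: str, thread_id: str) -> str:
    """Pick a banner variant deterministically for a given user and thread.

    Hashing the IDs gives a stable, roughly uniform assignment, so the same
    thread always shows the same variant and report rates can be compared
    per variant afterwards.
    """
    digest = hashlib.sha256(f"{user_id}:{thread_id}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(BANNERS)
    return BANNERS[index]
```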

[Figure: The steps of Enhanced Reporting: Prominence, Ease of Use, Education, Benefits & Limitations]

Strategy 3: Multiple account detection

Bad actors often operate and maintain multiple accounts, knowing that sooner or later their account will be placed under a restriction. At this point, they might switch to a mirror account and carry on, a behaviour known as recidivism. Users may also spin up new accounts on an ad-hoc basis if they are banned permanently or suspended temporarily. Restricting accounts by suspension or banning is pointless if the bad actor can just use another account.

The measures taken to protect the community need to be effective, and not just deal with an isolated problem. Repeat offenders have to be targeted directly; otherwise they will only get swept up when they re-offend. Without analysis and strategies in place, a small number of bad actors can be responsible for a large proportion of issues.

[Figure: Multiple Account Detection strategies, benefits and limitations]

How does this work in practice?

Basic strategy — single datapoint based:

  • Blacklist email addresses when users are suspended
  • Blacklist sign-up information (telephone numbers, specific IP addresses, payment card information, PayPal accounts) so that if a bad actor reuses a credential from a banned account, the sign-up is flagged or blocked
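
A minimal sketch of the single-datapoint approach might look like the following. The class and field names are hypothetical. Identifiers are normalised and hashed before storage so the list does not hold raw values, although hashing on its own should not be treated as anonymisation.

```python
import hashlib


def fingerprint(kind: str, value: str) -> str:
    """Normalise an identifier and hash it so raw values are not stored."""
    normalised = value.strip().lower()
    return hashlib.sha256(f"{kind}:{normalised}".encode("utf-8")).hexdigest()


class SignupBlocklist:
    """Hypothetical in-memory blocklist keyed on hashed sign-up identifiers."""

    def __init__(self):
        self._blocked = set()

    def ban(self, identifiers: dict) -> None:
        """Record every identifier of a suspended or banned account."""
        for kind, value in identifiers.items():
            self._blocked.add(fingerprint(kind, value))

    def check(self, identifiers: dict) -> list:
        """Return the kinds of identifier that match a banned account."""
        return sorted(
            kind
            for kind, value in identifiers.items()
            if fingerprint(kind, value) in self._blocked
        )
```

A non-empty result from `check` could either block the sign-up outright or simply flag it for review, depending on how much over-enforcement risk the platform is willing to accept.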

Intermediate strategy — heuristic based:

  • Use the above information not only to block or flag new sign-ups, but also to run a fan-out search when a user is blocked, looking for other accounts that share the same information
  • Use more nuanced details, such as profile pictures, bio descriptions, contacts in a heuristic model
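
The fan-out search can be sketched as an inverted index from each attribute value to the accounts that use it. When an account is blocked, every other account sharing an attribute with it gets a weighted score. The attribute kinds and weights below are illustrative assumptions, not recommended values.

```python
from collections import defaultdict

# Hypothetical weights: stronger identifiers count for more than weak ones.
WEIGHTS = {"payment_card": 5, "device_id": 3, "bio_hash": 2, "ip": 1}


def build_index(accounts: dict) -> dict:
    """Map each (kind, value) attribute to the set of accounts using it."""
    index = defaultdict(set)
    for account_id, attrs in accounts.items():
        for kind, value in attrs.items():
            index[(kind, value)].add(account_id)
    return index


def fan_out(blocked_id: str, accounts: dict, index: dict) -> list:
    """Rank other accounts by how strongly they overlap with a blocked one."""
    scores = defaultdict(int)
    for kind, value in accounts[blocked_id].items():
        for other in index.get((kind, value), ()):
            if other != blocked_id:
                scores[other] += WEIGHTS.get(kind, 1)
    # Highest score first; ties broken by account ID for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Weak signals such as a shared IP address are deliberately given a low weight here, since shared households and public networks make them prone to false positives.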

Advanced strategy — ML based

  • Use graph data to create a risk model
  • Risk model can take action on proactive and retroactive basis
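
To give a feel for what graph data adds, the sketch below links accounts to the identifiers they use and spreads risk outwards from banned accounts, halving it at each account-to-account hop. This is a hand-written propagation rule standing in for a learned model, and the decay factor and hop limit are assumptions chosen for illustration.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Build a bipartite graph from (account, identifier) pairs."""
    graph = defaultdict(set)
    for account, identifier in edges:
        graph[("account", account)].add(("id", identifier))
        graph[("id", identifier)].add(("account", account))
    return graph


def propagate_risk(graph, banned, decay=0.5, max_hops=2):
    """Breadth-first spread of risk from banned accounts.

    An account reached through `n` shared identifiers in a chain gets a
    risk of `decay ** n`; accounts further than `max_hops` away get none.
    """
    risk = {}
    seen = {("account", b) for b in banned}
    queue = deque((node, 0) for node in seen)
    while queue:
        node, distance = queue.popleft()
        if node[0] == "account":
            hops = distance // 2  # account -> identifier -> account is one hop
            risk[node[1]] = decay ** hops
            if hops == max_hops:
                continue
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return risk
```

The same scores could be used proactively (at sign-up, when a new account connects to a risky cluster) or retroactively (when a ban lands and nearby accounts are re-evaluated).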

Of course, this all has to be applied to your particular product with its associated nuances and the assets you have available (for instance, you may not have a graph). There will also need to be an appeal process to help remediate any false positives and restore accounts.


Benefits:

  1. Makes sanctions against users more meaningful
  2. Prevents recidivism
  3. Can help identify hacked accounts by identifying shared credentials
  4. Can help accurately represent active users for reporting purposes


Limitations:

  1. False positives or over-enforcement can creep up, especially if an ML-based model becomes too reliant on location signals or device IDs (student houses, family members, public computers etc.)
  2. Disposable contact points (such as throwaway email addresses) are all too easy to obtain, although you can blacklist the common domains


The best solution is likely a combination of all three strategies at varying levels. Ultimately, your approach to enforcement is dependent on the resources available, your tech stack, and your unique risk matrix. Each strategy has its own considerations, and a holistic data governance strategy must be in place to underpin any data processing.

How does Checkstep help?

Preventing an event from happening is possible. AI can recognize unique “badness” signals specific to your platform and take action before the harm happens. As ever with our service, everything is configurable. The AI is tuned to your specific service, and you have control over the thresholds so the risk model of your service can be accurately represented.

Preventing recidivism should also be high on your agenda, and we can develop bespoke strategies to help you prevent known bad actors using multiple accounts. We have an experienced in-house team who have worked on all kinds of problem types at scale for top tech companies. Recidivism is usually seen as the low-hanging fruit of content moderation. With the appropriate strategies in place, the prevalence of recidivist behaviour can see a notable decrease.

Reporting is a matter of trial and error, but once the reports are coming into the platform the key to success is accuracy and automation. The models can make a first pass at reports raised by users, and can even take automated actions to suspend account access. The thresholds at which this happens are completely customisable and in your control. At the very least, the reports can be sorted into different queues and prioritised based on custom preferences so that human reviewers’ time can be utilised effectively. The automated approach can also protect your human reviewers from the worst harms by taking automated actions and so limiting their exposure to distressing content.
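
The first-pass triage described above could be sketched roughly as follows. The categories, severity ordering and auto-action threshold are hypothetical and would be configured per platform; this is not a description of Checkstep’s actual implementation.

```python
import heapq
from dataclasses import dataclass, field

# Hypothetical severity ranking: lower number means reviewed sooner.
SEVERITY = {"threat_to_life": 0, "grooming": 0, "extortion": 1, "bullying": 2, "spam": 3}

# Hypothetical threshold above which the model's confidence triggers an
# automated suspension instead of a human review.
AUTO_ACTION_THRESHOLD = 0.95


@dataclass(order=True)
class QueuedReport:
    priority: int
    sequence: int  # preserves first-in, first-out order within a priority
    report_id: str = field(compare=False)


class ReportTriage:
    def __init__(self):
        self._heap = []
        self._counter = 0
        self.auto_actioned = []

    def submit(self, report_id: str, category: str, model_score: float) -> str:
        """Auto-action high-confidence reports; queue the rest by severity."""
        if model_score >= AUTO_ACTION_THRESHOLD:
            self.auto_actioned.append(report_id)
            return "auto_suspend"
        priority = SEVERITY.get(category, max(SEVERITY.values()))
        heapq.heappush(self._heap, QueuedReport(priority, self._counter, report_id))
        self._counter += 1
        return "queued"

    def next_for_review(self):
        """Return the most urgent queued report ID, or None if empty."""
        return heapq.heappop(self._heap).report_id if self._heap else None
```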

All of the actions on each strategy are customisable too. An account can be blocked, taken down, or placed into read-only mode automatically. You can even combine human and AI review to make a decision.

Once this is done, a notification is automatically generated explaining to the user what action has been taken, why it was taken, and how to appeal the decision. Again, notices are customisable so you can balance transparency with safety and protect your brand identity. Ensuring a user appeal process is in place helps you comply with legislation and reduces the risk of false positives, while also making users feel valued and heard.
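
As a simple illustration of a customisable notice, the sketch below fills a template with the action, the reason and the appeal route, and lets the platform choose whether to name the specific policy. The template text, function name and example URL are all hypothetical.

```python
from string import Template

# Hypothetical default wording; a platform would supply its own template.
DEFAULT_TEMPLATE = Template(
    "Your account has been $action.\n"
    "Reason: $reason\n"
    "If you think this is a mistake, you can appeal here: $appeal_url"
)

GENERIC_REASON = "a violation of our community guidelines"


def build_notice(action, policy, appeal_url, *, disclose_policy=True,
                 template=DEFAULT_TEMPLATE):
    """Build a user notice stating what happened, why, and how to appeal.

    Setting `disclose_policy` to False swaps in a generic reason, for cases
    where naming the exact policy would help bad actors evade detection.
    """
    if disclose_policy:
        reason = f"a violation of our {policy} policy"
    else:
        reason = GENERIC_REASON
    return template.substitute(action=action, reason=reason, appeal_url=appeal_url)
```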

Of course, this applies to automated detection, as required by the EU, but we also know that users often don’t report incidents, regardless of how frictionless the process is. Despite its name, proactive detection requires a triggering event to happen, which means the damage is done by the time any action is taken. Every strategy needs an appeal process to ensure fairness, remediate false positives, and give users a good experience.

This also keeps your brand protected on multiple legislative fronts, covering obligations under existing legislation such as the e-Privacy Directive and GDPR Article 22 (automated decision making), and upcoming legislation such as the Digital Services Act (DSA).

The regulatory landscape is moving swiftly, in the EU especially, but North America and APAC are also moving fast. Checkstep even has built-in automated transparency reporting that complies with upcoming DSA requirements. We also know you need trust and safety systems to work within your wider governance operation, so we invest in holistic data governance practices and security as well.

About the author

Kieron Maddison has worked in compliance for more than five years, including at OneTrust, Trainline and Meta (Facebook). At Meta, Kieron was a Program Manager acting as the link between trust and safety, privacy, security and product/engineering. In that role he was instrumental in defining new approaches for program management and worked to spin up new initiatives, from support to the oversight board. Kieron is passionate about protecting people’s rights, particularly those of children and young people, and is highly analytical in balancing this with business needs.
