Safety vs Privacy: Direct Messages

Platforms which allow users to message other users directly might want to moderate those interactions for safety purposes. However, most types of private message scanning is illegal in Europe under the e-Privacy Directive and in some cases is directly prohibited by encryption. This can pose a risk to your users and your platform — especially as private messaging channels are often used for more severe types of harm, i.e. sharing of illegal media, bullying, extortion, grooming and radicalisation.

There are different ways for platforms to still protect their communities and brand which depends on their overall Trust & Safety strategy which relies upon the following factors:

Current platform Trust & Safety risk and impact assessment
Strategic, regulatory and financial constraints
Safety, security and privacy trade-off framework

Once a platform has established their overall T&S strategy, different strategies can help with mitigating risk from private messaging:

Prevention by targeting malicious actors before they get a chance to create victims
Enhanced reporting, allowing users to report harmful content and behaviour themselves
Multiple account detection of users who have more than one account and/or are returning to a platform with another account after being banned (recidivism)

Every strategy has their pros and cons, steep reductions of harm prevalence can go hand-in-hand with processing large amounts of data, operational overhead or even risking over-enforcement. The best solution will likely be a combination of all the above strategies.

Checkstep, a Trust & Safety Software designed to manage end-to-end operations and privacy and online safety regulations, offers tools and services to help with a detailed risk assessment and implementation strategy — tailored to each platform’s needs and with built-in compliance features.

Scanning users private messages for Trust and Safety purposes? This is where privacy conflicts with trust and safety, creating confusion for platforms on how to proceed. If you are in Europe, this type of scanning is likely illegal under the e-Privacy Directive (ePD, Directive 2002/58/EC), which prohibits any type of monitoring or intercepting of messages or associated metadata (Article 5 of ePD). Often risk is contained in private communication, so what can you do about it?

First, let’s discuss how bad actors often operate. Most bad actors operate in the same way; they use a public platform service to identify potential victims; either by targeting victims who fit a specific profile, or by targeting as many users as possible. Once they have their victim, they need to move them into a 1–1 messaging system, either provided by the platform or another platform where they can continue their potentially harmful behaviour. There are limited opportunities to disrupt this flow, as messaging can be encrypted and/or it might be illegal to scan, in addition to bad actors becoming increasingly good at evading detection. You can find forums in dark corners of the internet where all kinds of bad actors discuss how they evade detection on most platforms. Detecting badness and risk is a very adversarial issue. The adversarial nature means detection is always changing. As soon as something is put in place, the bad actors change their Modus Operandi to avoid it: this is a costly game of “whack-a-mole”.

The costs vary in tangibility.. You may need ML/AI and engineering teams dedicated to the specific issue, as well as researchers and governance professionals to help manage it (more tangible).The malicious actors can also cause user churn and reputational damage to your brand (less tangible).These damaging experiences span the spectrum. Depending on your platform’s offering, you may see harms that range from bullying to threats to life . How platforms solve the issue depends on several factors:

Risks

Assessing risks within your platform/service. You likely have this intelligence already, but codifying it into a risk assessment process is good practice and positions you better for the later steps. Think of this as a data protection impact assessment but for trust and safety.

Questions to ask yourself:

What risks are you likely to face?
What are the impacts of those risks?
Are your users minors? Are there specific vulnerable users that may face a disproportionate impact from certain problem types?

Constraints

Assessing financial, platform/service risk appetite and the governance strategy of your platform/service.

Questions to ask yourself:

How much risk are you willing to accept?
Is there a calculated approach to limiting harm that balances enforcement with platform growth?
What is the platform/service’s legal tolerance overall?

Trade-offs

Once you know what the risks are and the environment you are operating in, it is time to make some hard decisions.

Questions to ask yourself:

What is the budget, and where is it best spent?
Are you trading off privacy for safety, or security for privacy?

This forms your strategy for trust and safety.

The optimal approach to solve problems is the one that works best for you. Most platform services want to keep to a “middle of the pack” risk model, whereby they lean more to one direction, but not so much that they completely disregard the other direction. A good example would be lowered thresholds for detecting potential scammers. This is potentially more intrusive on privacy , but that risk can be mitigated by following principles such as minimal data scanning, and almost immediate deletion for benign content. An example of going too far in one direction is scanning private messages — this is an outright violation of the law.

There are different strategies that can help overcome the restriction of proactive detection in messaging. The layers of a good overall strategy are:

Strategy 1: Prevention

There is a very limited window of opportunity for action just before (or as) the bad actor approaches a victim transitioning from a public platform to a private one. Malicious actors often share traits in their profiles and behaviour. These similarities reveal patterns that can be detected by AI and actioned before a bad interaction even takes place. One possible concern with this approach is that it presumes a user is “guilty until proven innocent”. The mitigation here is a defensible, logical and transparent strategy to how that decision was made.This approach forms the bedrock of this defence.

Benefits

Prevalence of bad interactions on messaging decreases
Can also result in a drop in prevalence of “bad” content on all surfaces
Bad interactions are prevented, rather than remediated

Limitations/disadvantages

“Guilty until proven innocent” is a big concern here
Large amounts of data processed
Could easily lead to over-enforcement (without active counter measures)

Strategy 2: Enhanced reporting

The restrictions on scanning message content under the e-PD don’t apply if the user reports the content themselves. The issue is that getting users to report is notoriously difficult to do, but there are ways we can make it more likely. Prominence, ease of use, and education are the three pillars that underpin a good reporting strategy. Again, these pillars have to balance the overall user experience of the product. However, through A/B testing and iterative processes, a suitable balance can be found.

For instance, every time a new message thread is started, a banner could remind users about reporting bad behaviour. This banner can have varying messages, mixing action-based prompts together with education-based ones. It’s a chance to remind users that we rely on them to report bad actors — this is their community to moderate too!

Strategy 3: Multiple account detection

Bad actors often operate and maintain multiple accounts, knowing that sooner or later their account will be placed under a restriction. At this point, they might switch to a mirror account, also known as recidivism. Users may also spin up new accounts on an ad-hoc basis if they are banned permanently or suspended temporarily. Restricting accounts by suspension or banning is pointless if the bad actor can just use another account.

The measures taken to protect the community need to be effective, and not just dealing with an isolated problem. Repeat offenders have to be targeted and will only get swept up when they re-offend. Without analysis and strategies in place, a small number of bad actors can be responsible for a large proportion of issues.

How does this work in practice?

Basic strategy — single datapoint based:

Blacklist email addresses when users are suspended
Blacklist sign up information (telephone numbers, specific IP addresses, payment card information, paypal accounts) so that if a bad actor uses some credential from a banned account, it is flagged or blocked

Intermediate strategy — heuristic based:

Use above information to not only block/flag new sign ups, but also fan-out search when a user is blocked to look at other information
Use more nuanced details, such as profile pictures, bio descriptions, contacts in a heuristic model

Advanced strategy — ML based

Use graph data to create a risk model
Risk model can take action on proactive and retroactive basis

Of course, this all has to be applied to your particular product with its associated nuances and the assets you have available (for instance, you may not have a graph). There will also need to be an appeal process to help remediate any false positives and restore accounts.

Benefits:

Makes sanctions against users more meaningful
Prevents recidivism
Can help identify hacked accounts by identifying shared credentials, can also help accurately represent active users for reporting purposes

Limitations/disadvantages:

False positives or over enforcement can creep up, especially if ML based becomes too reliant on location signals or device IDs (student houses, family members, public computers etc.)
Disposable contact points are all too easy, although you can blacklist the common domains

Summary:

The best solution is likely a combination of all three strategies at varying levels. Ultimately, your approach to enforcement is dependent on the resources available, your tech stack, and your unique risk matrix. Each strategy has its own considerations, and a holistic data governance strategy must be in place to underpin any data processing.

How does Checkstep help?

Preventing an event from happening is possible. AI can recognize unique “badness” signals specific to your platform and take action before the harm happens. As ever with our service, everything is configurable. The AI is tuned to your specific service, and you have control over the thresholds so the risk model of your service can be accurately represented.

Preventing recidivism should also be high on your agenda, and we can develop bespoke strategies to help you prevent known bad actors using multiple accounts. We have an experienced in-house team who have worked on all kinds of problem types at scale for top tech companies. Recidivism is usually seen as the low-hanging fruit of content moderation. With the appropriate strategies in place, the prevalence of recidivist behaviour can see a notable decrease.

Reporting is a trial and error issue, but once the reports are coming into the platform the key to success is accuracy and automation. The models can make a first pass at reports raised by users, and can even take automated actions to suspend account access.The thresholds at which this happens are completely customisable and in your control. At the very least, the reports can be sorted into different queues and prioritised based on custom preferences so that human reviewers’ time can be utilised effectively. The automated approach can also protect your human reviewers from the worst harms by taking automated actions while limiting them to distressing content.

All of the actions on each strategy are customisable too. An account can be blocked, taken down, placed into read-only mode automatically. You can even combine human and AI review to make a decision..

Once this is done, a notification is automatically generated explaining to the user what action has been taken, why it was taken, and how to appeal this decision.. Again, notices are customisable so you can balance transparency with safety and protect your brand identity. Ensuring a user appeal process is in place complies with legislations and reduces the risk of false positives while also making users feel valued and heard

Of course this applies to automated detection as required by the EU, but we also know that users often don’t report incidents regardless of how frictionless the process is. Despite its name, proactive detection requires a triggering event to happen, which means the damage is done by the time any action is taken. Every strategy needs an appeal process to ensure fairness, remediate false positives, and give users a good experience.

This also keeps your brand protected on multiple legislative fronts, covering obligations under existing legislation such as the e-Privacy Directive, GDPR Article 22 (automated decision making) and upcoming legislation such as the Digital Services Act (DSA).

The regulatory landscape is moving swiftly, in the EU especially, but North America and APAC are also moving fast. Checkstep even has built-in automated transparency reporting that complies with upcoming DSA requirements. We also know you need trust and safety systems to work in your wider governance operation, so we invest in holistic data governance practices and security as well

About the author

Kieron Maddison has worked in compliance for more than five years, including OneTrust, Trainline and Meta (Facebook). Kieron’s role at Meta was a Program Manager where he was the link between trust and safety, privacy, security and product/engineering. In his role at Meta he was instrumental in defining new approaches for program management and worked to spin up new initiatives from support to the oversight board. Kieron is passionate about protecting people’s rights, particularly children and young people, and is highly analytical in balancing this with business needs.

Safety vs Privacy: Direct Messages

Share

Share

Risks

Questions to ask yourself:

Questions to ask yourself:

Questions to ask yourself:

Benefits

Limitations/disadvantages

Basic strategy — single datapoint based:

Intermediate strategy — heuristic based:

Advanced strategy — ML based

Benefits:

Limitations/disadvantages:

How does Checkstep help?

Previous

European Parliament Approves the AI Act

Next

Expert’s Corner with T&S Expert Manpreet Singh

Want to see our AI content moderation platform for yourself?