This month’s expert is Alexandra Koptyaeva. She wears many hats, one of which is Trust and Safety Lead at Heyday.
Alexandra has been working in Trust and Safety for over three and a half years, starting in content moderation, investigations, and quality assurance, and later leading a team and managing company processes. She began as a content moderator in 2018 with a focus on fraud in e-commerce; back then, T&S was not as widely known as it is now. That first role sparked her interest in investigations: she enjoyed the thrill of detecting fake brands from the smallest details (e.g., an obscured brand logo). Over the years she grew professionally, and in September 2022 she was hired as Trust and Safety Lead at Heyday, a new social media app. The role draws on all the experience and knowledge she has accumulated: she is now responsible for everything from content moderation and policy management to handling escalations to NCMEC and Crisis Response services. She is excited by how her work can shape T&S at Heyday, looks forward to creating new processes from scratch, and is especially looking forward to having alpha and beta users on the platform.
1. How can social media platforms better detect CSAM?
There are extensive resources available to prevent CSAM from spreading and to detect it in a timely manner. For example, partnering with the right Content Moderation solution would be the first step, along with ensuring that the AI filters cover this topic. At Heyday, we’re also going to implement the CSAM Keyword Hub as an extra precaution. It’s open source, which is especially handy for platforms with a limited budget for T&S.
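To make the keyword-list idea concrete, here is a minimal sketch of how a vetted term list could be wired into an upload or messaging pipeline as a review signal. The example terms, the function name, and the flow are hypothetical, and nothing here reflects the actual contents or API of the CSAM Keyword Hub; matches would route to trained human reviewers rather than trigger automated action.

```python
import re

# Hypothetical term list; in practice this would be loaded from the vetted
# keyword resource (e.g., a CSV export of the hub), never hard-coded.
KEYWORDS = {"example-term-1", "example-term-2"}

def flag_text(text: str) -> list[str]:
    """Return any watch-list keywords found in user-generated text.

    A match is only a signal for human review, not an automated
    removal decision on its own.
    """
    tokens = set(re.findall(r"[\w-]+", text.lower()))
    return sorted(tokens & KEYWORDS)

if __name__ == "__main__":
    hits = flag_text("user caption containing example-term-1")
    if hits:
        print("Route to the moderation queue; matched:", hits)
```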
When it comes to actual detection, CSAM is certainly not the easiest thing to catch. Users are very creative, and certain abbreviations, emojis, or combinations of signals might indicate suspicious behavior. Personally, I try to read whitepapers, follow updates, and attend webinars on this topic to stay informed. I also have a profile on the Heyday app so I can see what’s happening from the user’s perspective.
Having previously managed teams of Content Moderators, I’d say that CSAM detection and timely action are a vital part of human moderation as well. That means keeping the team updated, organizing regular team meetings and calibrations, and presenting recent trends should all be on the agenda. The specialists should know how to recognize CSAM, what to pay attention to, what its signs are, and why it’s important to report it to senior managers and escalate it to law enforcement. If teams don’t understand its implications, no algorithm will help prevent it when a moderator approves a profile as a “false positive” and misses the alarming signs.
In parallel with the internal processes, [CSAM prevention] should also be reflected in community guidelines. It’s essential to be transparent and let users know how the platform positions itself and what actions will be taken against someone who might attempt to spread CSAM.
2. Nudity and social media platforms have often had some overlap. How can platforms better position themselves in terms of moderation to avoid instances of prostitution, flashing, and possible sextortion or blackmailing?
Educating users is the key. As a policy manager, I can create the best community guidelines, but how can one be sure that they were read, understood, and accepted?
From the internal perspective, it’s helpful to provide clear guidelines with enforcement actions for each violation category. For instance, ban a user for a set duration if they upload inappropriate content, or restrict their activity for some time. On the user’s end, I do think it’s important that they receive a message with a brief explanation as to why their content was removed. It’s a small step, but at least it might make a user think twice before doing it again.
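As an illustration of what “clear guidelines with enforcement actions for each violation category” can look like inside a moderation tool, here is a minimal sketch of an enforcement matrix that pairs each category with a restriction, a duration, and the brief explanation shown to the user. The category names, durations, and messages are invented placeholders, not Heyday’s actual policy.

```python
from dataclasses import dataclass

@dataclass
class EnforcementAction:
    restriction: str            # e.g., "upload_ban", "activity_restriction"
    duration_hours: int | None  # None means permanent
    user_message: str           # brief explanation shown to the user

# Hypothetical mapping of violation categories to enforcement actions.
ENFORCEMENT_MATRIX = {
    "inappropriate_content": EnforcementAction(
        restriction="upload_ban",
        duration_hours=72,
        user_message=(
            "Your content was removed because it violates our guidelines "
            "on inappropriate content. Uploads are disabled for 72 hours."
        ),
    ),
    "repeat_inappropriate_content": EnforcementAction(
        restriction="activity_restriction",
        duration_hours=168,
        user_message=(
            "Repeated violations of our content guidelines: your activity "
            "is restricted for 7 days."
        ),
    ),
}

def enforcement_for(category: str) -> EnforcementAction:
    """Look up the action for a confirmed violation category."""
    return ENFORCEMENT_MATRIX[category]

if __name__ == "__main__":
    action = enforcement_for("inappropriate_content")
    print(action.restriction, action.user_message)
```

Keeping the matrix as data rather than scattered if-statements also makes it easier to show the same explanation to the user that the moderator sees internally.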
Depending on the targeted age group, I’d also suggest organizing monthly campaigns to educate users and simply explain to them why it’s not okay to share this content, and how to avoid sextortion or blackmail if it comes to that. There’s no need to scare the users away; based on my previous experience in moderation, I feel like some users (especially younger ones) share nudes with others due to a lack of awareness. The person on the other end gained their trust, and they think that nothing bad will happen because “they’re friends”. It was tough for me to read user reports and complaints when they were later bullied or blackmailed. Creating pop-ups that ask “Are you sure you want to share this?” might save someone’s sanity and dignity later.
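Here is a rough sketch of that pop-up idea: a share only goes through after a confirmation step when an upstream signal suggests the media is sensitive. The `is_likely_nude` flag, the prompt wording, and the flow are assumptions for illustration, not a description of any specific platform’s implementation.

```python
def pre_share_check(is_likely_nude: bool, confirm) -> bool:
    """Decide whether a share should go through.

    `is_likely_nude` is assumed to come from an upstream classifier;
    `confirm` is a callback that renders the pop-up and returns the
    user's choice (True to proceed anyway).
    """
    if not is_likely_nude:
        return True  # nothing sensitive detected, share normally
    # Gentle, non-judgmental friction before the content leaves the device.
    return confirm(
        "This photo looks sensitive. Once you share it, you can't control "
        "where it goes. Are you sure you want to send it?"
    )

if __name__ == "__main__":
    # Console stand-in for the real pop-up UI.
    proceed = pre_share_check(
        is_likely_nude=True,
        confirm=lambda message: input(f"{message} (y/n) ").strip().lower() == "y",
    )
    print("Shared" if proceed else "Share cancelled")
```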
If a platform is targeting 18+ users, it can be less strict about its guidelines. However, I’d still recommend having some protection in place to prevent flashing and prostitution. It can be done with the help of AI detection, effective moderation, and enforced community guidelines.
At Heyday, for example, we’ll be permanently banning users who are confirmed to be engaging in prostitution, and we’ll be introducing ban durations for uploading or spreading inappropriate content.
3. How important is data privacy while moderating content?
From the platform’s perspective, I believe that protecting users’ privacy is a must. Apart from the formal documents where it’s reflected (e.g., the Privacy Policy for users and NDAs for the staff), it should also be reinforced at the company level.
As a manager, I consider it one of my responsibilities to explain to the team the consequences of exposing someone’s information by any means, or of engaging with users from the moderator’s account. If I notice that either of these instructions was not followed, I’ll immediately follow up, which might lead to contract termination.
Let me give you two hypothetical scenarios:
(1) A content moderator decided to take a photo of their screen and share it on their social media. A username or a profile picture was visible, so someone might have identified this person. Not only does this put the end user at risk, but it also jeopardizes the platform’s integrity and undermines trust in it. I don’t think anyone would want to be in this situation as a user of any platform.
(2) A user sent a personal message or a friend request through the platform to a content moderator (where applicable), or a content moderator themselves decided to engage directly with a user. In either case, I regularly remind my team never to engage, as they represent the company when using their work profiles.
I understand that there might be stressful situations when one feels like they have to do something to help a user (e.g., when working on time-sensitive cases), but that’s why each platform should have internal policies in place that are regularly reinforced, so the moderators know what to do.
4. There’s often a thin line between over-moderation and censorship. How do you distinguish between the two?
I’ll answer this question with an example: when I began working closely with content moderators from different countries, I noticed that they tended to be more or less strict with content depending on their origin or place of birth. Although the platform had the same internal guidelines for everyone, they were interpreted differently by specialists from Latin America, Southeast Asia, and Europe. Some would remove any content that even slightly violated the rules, while others let it pass because it was a “cultural norm” for them and they didn’t find it suspicious.
To make sure that everyone not only understands but also enforces policies correctly across the company, I would suggest the following:
(1) Creating very detailed internal guidelines with as few grey areas as possible, so there’s no room for self-interpretation. If anything is not clear or a specialist is in doubt, [moderators] should escalate it to their team leaders and ask for a second opinion.
(2) Organizing regular calibrations with the team: collect several random examples from different language markets and either ask everyone to reply in a form (e.g., Google Forms) or bring them up during a team meeting and ask specialists to vote in the chat (see the sketch after this list for one way to tally the results). I’ve participated in weekly team calibrations before, and they were always fun and very engaging.
(3) Doing Quality Assurance and passive shadowing with specialists; this way, a manager can identify inconsistencies and help improve the overall quality of moderation.
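As a minimal sketch of how calibration votes could be tallied to surface the inconsistencies mentioned above, the snippet below computes the majority decision per example and lists who diverged from it. The moderator names, examples, and decisions are all made up for illustration.

```python
from collections import Counter

# Hypothetical calibration round: each example maps moderator names to the
# decision they gave on the same piece of content.
calibration_votes = {
    "example_1": {"Ana": "remove", "Bao": "remove", "Carla": "keep"},
    "example_2": {"Ana": "keep", "Bao": "keep", "Carla": "keep"},
}

def agreement_report(votes: dict[str, dict[str, str]]) -> None:
    """Print the majority decision per example and who diverged from it."""
    for example, decisions in votes.items():
        counts = Counter(decisions.values())
        majority, majority_count = counts.most_common(1)[0]
        dissenters = [name for name, d in decisions.items() if d != majority]
        rate = majority_count / len(decisions)
        print(f"{example}: majority={majority} ({rate:.0%} agreement), "
              f"diverging: {', '.join(dissenters) or 'none'}")

if __name__ == "__main__":
    agreement_report(calibration_votes)
```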
On a user level, listen to your community and what they’re saying about your company. I’d imagine that customer support might receive emails or Zendesk requests asking why certain content was removed. In a way, that could also be a good resource for a policy manager and content moderators to analyze what could be improved and maybe implement some changes.
5. When thinking about online safety, how much responsibility is on the end-user and how much do you believe is the platform’s responsibility?
Depending on the targeted location, there might be regulations in place when it comes to online safety, meaning that whether one wants it or not, a platform will be deemed responsible. I can’t express my opinion regarding some legislation, as it’s currently less strict in the US, especially for platforms targeting the 18+ community. But I do think it becomes a question of liability and possible negligence if a platform knows what’s happening with its users, has all the means to prevent it, and yet doesn’t act for whatever reason.
Talking about the end user, it depends on the situation. For instance, if a user joined a live stream with their camera on and got bullied by the community for no reason, then it’s hardly their fault. On the other hand, if a platform has policies in place to ensure that a user knows about the possible consequences of sharing their personal information, for example, and the user still does it, then I wouldn’t say it’s entirely the platform’s fault.
6. How can platforms better train end-users about the possible types of online harm without coming across as controlling?
a. How to ensure they’re not over-moderated?
b. Being a supporter rather than forcing communities to say what’s right?
I’d say it depends on the type of platform, the age of the targeted users, and the possible threats it may face. Apart from the Community Guidelines, which usually mention some online risks, some platforms also show pop-up notifications to double-check whether a person wants to send certain information, or they blur some content and inform the user that it might be harmful or inappropriate.
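As a sketch of the blur-and-warn mechanism, the snippet below maps a harmful-content classifier score to a display decision. The threshold, the score semantics, and the warning text are assumptions; a real platform would tune these against its own policies and models.

```python
from dataclasses import dataclass

@dataclass
class DisplayDecision:
    blur: bool
    warning: str | None

# Hypothetical threshold above which a classifier score triggers a blur.
HARM_SCORE_THRESHOLD = 0.8

def display_policy(harm_score: float) -> DisplayDecision:
    """Decide whether to blur a piece of content behind a warning label.

    `harm_score` is assumed to come from whatever harmful-content classifier
    the platform already runs (0.0 = benign, 1.0 = almost certainly harmful).
    """
    if harm_score >= HARM_SCORE_THRESHOLD:
        return DisplayDecision(
            blur=True,
            warning="This content may be harmful or inappropriate. Tap to view.",
        )
    return DisplayDecision(blur=False, warning=None)

if __name__ == "__main__":
    print(display_policy(0.93))  # blurred, with a warning label
    print(display_policy(0.12))  # shown normally
```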