
Preventing Extremist Violence Using Existing Content Moderation Tools

28th March 2024 Kris McGuffie
In Gaming, Insights

Kris McGuffie is a member of the Extremism and Gaming Research Network (EGRN). The EGRN works together to uncover how malign actors exploit gaming, to build resilience in gaming communities to online harms, and to discover new ways to use gaming for good.

Accurate content moderation can save lives in a number of ways, including by acting as a warning system about the risks of offline extremist violence and removing the fuel that incites extremist violence. Disrupting, removing, and reporting content that portends extremist violence can be done with current detection tools and infrastructure. A benefit of the explosion in automated content moderation approaches is that any number of widely accessible automated detection tools can be used on known violent extremist user generated content (UGC) to improve a given platform’s detection methods through fine-tuning and customisation. Preventing extremist violence by detecting precise online behavioural patterns with content moderation tools requires a willingness to collect, clean, and share better data. Data- and resource-sharing between industry, civil society, and government is essential for keeping pace with extremist bad actors and those who support them or share their content unwittingly.

This Insight outlines a way forward with detecting the online UGC of people who commit extremist violence. This piece is written for an audience of practitioners who make decisions about which automated content moderation tools are used and how those tools are deployed. 

Existing Tools Provide Early Detection & Removal 

Established automated detection can be used to identify and address users who may commit acts of extremist violence. Automated content moderation employs anything from basic algorithms to much more complex systems built from various forms of AI. Content moderation tools have become more widespread and accessible, particularly with the release of large language models. These models can make creating and deploying detection tools more fluid by assisting with tasks like editing code and augmenting data used for model training. Basic testing can validate the specific combination of current models and tools most likely to act as an early warning system for offline extremist violence risk. 

Analysing the digital trail of violent extremist offenders to select precisely which set of detection models will flag ‘low-prevalence, high-harm’ content is straightforward, but it requires prioritising time and resources. There are multiple ways of signalling extremist violence potential online: through the content itself and through metadata such as posting patterns and proximity to users generating similar high-risk content that broadcasts violence potential. The language used in violent extremist manifestos and on forums includes high-risk linguistic signals, such as expressions of an urgent need to commit out-group violence to ensure the survival of an in-group. Detection tools can be directed towards such content, which platforms may garner through data- and resource-sharing partnerships. 

Recent research into online posting and offline extremist violence helps to narrow the potential pool of concerning online behaviours, notably citing linguistic kinship signals combined with outgroup hate and/or condoning violence, as well as posting quantity over time, as important indicators of offline extremist violence. Kinship in language signifies that the audience being addressed is family, through terms such as ‘brother’, ‘sister’, or related familial words and phrases indicating family-like bonds. Such insights can be used together with detection tools on known violent extremist content to adjust detection and mitigation systems to address the characteristics most strongly associated with extremist violence potential.

Detection tools designed around common policy categories like hate speech and violence will naturally provide coverage for some portion of an AI-based violent extremist threat detection system. In contrast, prosocial detection approaches may be especially helpful in identifying the type of kinship signals that help cement relationships established for both positive and odious purposes. While it’s common to think about content moderation occurring on the level of individual comments, contemporary tooling enables larger patterns to be identified across conversations or even through specific areas of a platform, such as a user group, game, thread, or channel. Detection at the comment, conversation, and community levels saves time and resources in sorting out neutral or prosocial content from nefarious content. In this way, an otherwise neutral or healthy conversation with only a few problematic posts can be separated from a group of users who consistently post high-risk content. Similarly, a single user posting content indicating a high risk for extremist violence will stand out with an approach that identifies individual, conversation, and community-level patterns. 
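
The multi-level view described above can be illustrated with a minimal sketch. It assumes each post already carries a risk score from some upstream detection model; the field names (`user`, `conversation`, `community`, `risk`) and the sample scores are hypothetical and are not drawn from any particular platform or tool.

```python
from collections import defaultdict
from statistics import mean


def aggregate_risk(posts, level):
    """Average per-post risk scores grouped by one level.

    `level` is a key present in every post, for example
    'user', 'conversation', or 'community' (hypothetical field names).
    """
    grouped = defaultdict(list)
    for post in posts:
        grouped[post[level]].append(post["risk"])
    return {key: mean(scores) for key, scores in grouped.items()}


# Illustrative data only: the risk scores are assumed to come from an
# upstream model and are invented for this example.
posts = [
    {"user": "u1", "conversation": "c1", "community": "g1", "risk": 0.1},
    {"user": "u2", "conversation": "c1", "community": "g1", "risk": 0.9},
    {"user": "u2", "conversation": "c2", "community": "g1", "risk": 0.9},
    {"user": "u3", "conversation": "c2", "community": "g1", "risk": 0.1},
]

by_user = aggregate_risk(posts, "user")
by_conversation = aggregate_risk(posts, "conversation")
by_community = aggregate_risk(posts, "community")
```

In this toy data the conversations and the community look moderate on average, while one user consistently scores high, which is the kind of pattern that comment-level review alone could miss. A simple mean is only one possible aggregate; a real system would likely weigh volume and recency as well.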

Multiple AI-based approaches to detection can identify users, conversations, and communities that signal a high likelihood of extremist violence. Ultimately, sequencing multiple approaches to determine what is most useful for a given behaviour, community, and platform will likely be the most efficient and effective tactic in the long term. Probabilistic, deterministic, and symbolic approaches can each help identify and manage both the type of high-risk violent extremist content that is anticipated and new patterns of behaviour that indicate violence risk.

Neuro-symbolic approaches are being used in the content detection space, and published research on this in relation to behavioural detection is promising. While best when layered with other methods, deterministic rules-based approaches enable companies to cut through the noise of prolific online communities to ensure that only high-confidence content and users are flagged. Consider a rule such as: if kinship signals are present and outgroup hate or condoning of violence appears, then check pre-specified metadata signals to screen out lower-risk content and flag what remains as medium or high risk for extremist violence. When confidence weighting is applied at the right level for a given model use case, such approaches enable platforms to systematically distinguish nuisance behaviours from severely harmful dynamics.
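
The example rule above might be sketched as follows. This is a minimal illustration rather than a production design: the `PostSignals` fields, the 0.8 confidence threshold, and the use of weekly posting volume as the metadata signal are all assumptions made for the example, and the scores are presumed to come from separate upstream detection models.

```python
from dataclasses import dataclass


@dataclass
class PostSignals:
    """Hypothetical per-post model scores in [0, 1], plus one metadata signal."""
    kinship: float
    outgroup_hate: float
    condones_violence: float
    posts_last_7_days: int  # assumed metadata signal: posting quantity over time


def assess_risk(signals: PostSignals,
                threshold: float = 0.8,
                medium_volume: int = 10,
                high_volume: int = 50) -> str:
    """Apply the rule: kinship AND (outgroup hate OR condoning violence),
    then consult metadata to screen out lower-risk content.

    All thresholds are illustrative assumptions, not recommended values.
    """
    has_kinship = signals.kinship >= threshold
    has_harm = max(signals.outgroup_hate, signals.condones_violence) >= threshold
    if not (has_kinship and has_harm):
        return "screened_out"
    # Metadata is only consulted once the content-level conditions hold.
    if signals.posts_last_7_days >= high_volume:
        return "high"
    if signals.posts_last_7_days >= medium_volume:
        return "medium"
    return "screened_out"


print(assess_risk(PostSignals(0.9, 0.85, 0.1, 60)))
```

Keeping the thresholds as parameters reflects the point about confidence weighting: each platform would need to tune them for its own models and use case, and flagged items would still go to human review.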

Benefits of Improved Early Warning Systems

With better real-time detection, platforms can be empowered to break up harmful and criminal communities, helping to damage online influence processes that lead to extremist violence. The precedent of reporting only explicit threats and providing law enforcement with access to user data after a user has committed a violent extremist crime can more often be supplanted by an early warning system that destroys the pipeline of violence. We are moving beyond considering only physical threat frameworks and terrorist group affiliation requirements that left us insufficiently responsive to genocide, insurrection, and various forms of violent extremism planned or incited online.

Improved early warning systems help platforms to enforce policy and remain compliant with their legal and ethical obligations to users, society, and shareholders. We have seen early warning systems in the post-9/11 era become synonymous with racial and xenophobic profiling, which is why such systems must be used with analytical and ethical rigour, along with guardrails that ensure that both conventional and digital due process standards are in place.

Data- and Resource-Sharing are Critical

Data sharing is essential to detecting violent extremist threats. Platforms, law enforcement, and researchers have access to high-risk data that is directly connected to extremist violence. Platforms also have access to their specific online traffic, significant computing power, and technology staff who can make use of often messy data. Online data in its raw form must be cleaned and annotated so that automated systems can be fine-tuned for better detection and data privacy standards can be upheld. Data sharing related to other online harms, such as child sexual exploitation and abuse, is already established, including through the Tech Coalition’s Lantern program. Lantern enables the sharing of both user-generated content and metadata of individuals who have violated policies related to child sexual exploitation and abuse.

The Digital Services Act also outlines certain types of data-sharing requirements for “very large online platforms.” Ethical data sharing is best managed within systems that recognise international human rights standards, comply with the most stringent data privacy requirements, and protect the individuals managing, reviewing, and analysing harmful data. The norm of data- and resource-sharing is being created under the pressure of increasingly clear connections between online behaviours and offline harms. The recent notoriety of online sexual extortionists (sextortionists) whose criminal behaviours have led to the suicide of minors is contributing to this painful but crucial understanding of online-to-offline harms.

There are myriad ways that data- and resource-sharing might take shape. The following examples reinforce that we have everything required to better identify extremist violent offenders before they commit vicious acts of violence:

  • Researchers can share high-risk, violent extremist content with colleagues at tech platforms who demonstrate willingness to use the data for testing. Test results would be shared across partnerships for the public good, comprising one facet of a whole-of-society approach. Proper safeguarding of such high-risk information is essential.
  • Tech companies can provide anonymised data sets for proper testing and analysis by researchers, sharing only platform type and any other metadata required to complete analysis. Open-source research products produced by researchers help industry and society equally.
  • Service providers (such as Amazon Web Services, Google Cloud, and Microsoft Azure) can offset the prohibitive costs of computing power used to build and deploy content moderation tools by offering discounts for researchers and small companies who prove that the work they are doing will be shared openly and benefit the public good. Levelling the playing field with cloud computing access enables greater objectivity in the research itself, disentangling it from funding sources that may restrict research scope and ensuring that research that benefits global society can be shared among stakeholders.
  • Generative AI companies can offer reduced-cost access to groups who can prove their projects will result in information and insights that can be shared to protect society and prevent harm.
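
As one illustration of the data set preparation mentioned in the list above, the sketch below replaces user identifiers with keyed hashes and keeps only the fields needed for analysis. It rests on several assumptions: the record fields are hypothetical, the secret key is held by the platform and never shared, and keyed hashing is strictly pseudonymisation rather than full anonymisation, so free-text content may still need separate review for identifying details.

```python
import hashlib
import hmac


def pseudonymise(user_id: str, secret_key: bytes) -> str:
    """Return a stable keyed hash (HMAC-SHA256) of a user identifier."""
    return hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_record(record: dict, secret_key: bytes) -> dict:
    """Keep only the fields needed for analysis (hypothetical field names).

    Anything not listed here, such as email or IP address, is dropped.
    """
    return {
        "user_token": pseudonymise(record["user_id"], secret_key),
        "platform_type": record["platform_type"],
        "text": record["text"],
    }


# Illustrative values only; a real key would be generated and stored securely.
key = b"example-key-held-by-the-platform"
raw = {
    "user_id": "alice",
    "email": "alice@example.com",
    "platform_type": "gaming",
    "text": "sample post",
}
shared = prepare_record(raw, key)
```

Because the same identifier always maps to the same token under a given key, researchers could still study posting patterns per user without seeing the underlying account.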

With all resource- and data-sharing efforts, the fruit of essential analysis needs to be shared for society to benefit. The era of black box systems designed only for profit will end if we all take greater responsibility for the online world we have created collectively. 

If the last year of AI hype and tumult has taught us anything, it is that automated detection methods need to match those of our adversaries, and would-be violent extremist offenders are hiding in plain sight on platforms used by all of us. It is reasonable and straightforward to fulfil the life-saving mission of content moderation by ensuring that current technology is used to disrupt online behaviours that result in deaths and societal harm. By making better use of existing tools and sharing resources and insight, we can prevent more of the senseless loss of lives in places of worship, classrooms, grocery stores, concerts, and other public places where communities gather.