
AI-led Content Moderation Tools: Are They The Answer To Combatting Online Extremism in Canada?

9th March 2026 Arunita Das

Social media has proven to be a significant medium for fomenting violent extremism, circulating and exchanging online hate, and organising extremist networks. In the last decade, Canada has witnessed a surge of extremists who post and organise online.

To tackle the growth of extremist content online, several global companies are competing to develop accurate Artificial Intelligence (AI)-led technologies that detect violent extremist speech, training their models on large volumes of data.

This Insight synthesises the advantages and disadvantages of AI technologies used to screen materials for Canadian extremist content. Unlike that of the United States, Canada’s constitutional framework protects freedom of expression subject to “reasonable limits,” as reflected in proposed legislation such as Bill C-63, the Online Harms Act, which is designed to combat extremist material online. Given this approach, the Insight focuses on AI-led moderation within the Canadian context.

The Insight argues that social media platforms struggle to consistently detect violent extremist content, particularly when the content itself is constantly evolving and intentionally crafted to evade moderation systems. AI-powered content moderation tools, such as the innovative Multi-Modal Discussion Transformer (mDT) developed by Canadian researchers, can prove useful in detecting extremist speech when these technologies are continually refined with the expertise of human moderators. The Insight concludes by addressing how to incentivise social media companies to use AI-led tools like the mDT.

The Good and Bad: AI-led Extremist Content Moderation 

With millions of users interacting online, AI-led content moderation can be a valuable tool in detecting violent speech. These tools range from machine-learning classifiers trained on known extremist language and symbols to newer large language models (LLMs) that provide secondary assessments of content flagged as potentially violent or extremist. Although platforms like Facebook, Instagram, and TikTok do not disclose exactly which models they use for moderation, they have begun integrating LLMs into their moderation workflows for such secondary assessments. However, violent extremist content continues to exist online, threatening the safety of targeted demographics. The challenge lies in detecting violent extremist content that uses ambiguous language and evolving slang.
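To make that workflow concrete, below is a minimal, hypothetical sketch of such a two-stage pipeline in Python. The thresholds, the keyword scorer, and the llm_second_opinion stub are all invented for illustration; no platform’s actual system is represented here.

```python
# Hypothetical two-stage moderation pipeline (illustration only).
# Stage 1: a fast classifier scores every post.
# Stage 2: borderline posts are escalated to an LLM for a
# secondary assessment; only near-certain cases are auto-actioned.

FLAG_THRESHOLD = 0.5      # escalate for secondary review above this score
REMOVE_THRESHOLD = 0.95   # auto-remove only when the classifier is near-certain

def score_post(text: str) -> float:
    """Stand-in for a trained classifier returning P(violative).
    A real system would use a model trained on labelled extremist
    language and symbols, not a keyword list."""
    violent_markers = ("shoot", "exterminate", "hang them")
    hits = sum(marker in text.lower() for marker in violent_markers)
    return min(1.0, 0.4 * hits)

def llm_second_opinion(text: str) -> str:
    """Stand-in for an LLM secondary assessment of context and
    intent; stubbed here to defer to a human moderator."""
    return "needs_human_review"

def moderate(text: str) -> str:
    score = score_post(text)
    if score >= REMOVE_THRESHOLD:
        return "remove"
    if score >= FLAG_THRESHOLD:
        return llm_second_opinion(text)
    return "allow"

# An implicitly violent post slips past the keyword stage entirely:
print(moderate("when can we start shooting our enemies?"))  # -> "allow"
```

Note how the implicit example sails through the first stage without ever reaching the LLM; this is precisely the detection gap discussed in the sections that follow.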

While AI-led moderation has notable strengths, these systems are not perfect. Research shows that Canadian far-right extremists, such as members of Diagolon, a Canadian white supremacist movement, openly post violent, racist, sexist, xenophobic, antisemitic, and anti-LGBTQ+ content and maintain popular profiles on Facebook, Instagram, and YouTube. Although Meta’s public abuse and “Hateful Conduct” policies prohibit dehumanising and exclusionary language targeting protected groups, Diagolon’s content remains online.

There is currently a rising tide of vitriol directed at South Asians in Canada, often stemming from extremist actors associated with Diagolon. Research by the Institute for Strategic Dialogue (ISD) has documented a significant escalation in anti-South Asian abuse online: posts containing anti-South Asian slurs on platforms including Facebook, Instagram, and X increased by more than 1,350 per cent between 2023 and 2024.

This is alarming, as violent extremist posts online have evidently manifested into violence offline. In 2023, Statistics Canada reported that police-reported hate crimes targeting South Asians grew by over 227 per cent. As of 2025, there has been a steady increase in South Asians reporting slurs and threats of physical violence.

Natural Language Processing (NLP) researcher Dr Christine de Kock and her team’s research on AI moderation helps explain why this type of content remains online. Automated systems often disagree when assessing extremist statements that imply violence towards a targeted group rather than stating it explicitly. This includes, for example, the Islamophobic post in Figure 1 below, in which a commenter asks when “we can start shooting our enemies.” De Kock et al.’s research (2025) demonstrates that LLMs cannot reliably decode extremist content without specialised learning models. Such posts frequently evade detection because they stop short of explicit threats, falling outside the reach of automated moderation despite clearly endorsing violence.

Figure 1: A screenshot of a 2025 post from an Instagram profile that heavily posts Islamophobic and xenophobic content. Another user who regularly interacts with Canadian extremist content asks when they can shoot “enemies.”

As humans, we recognise the intent behind these posts. Automated systems, however, are more likely to classify them as political commentary, which does not always violate the hateful conduct policies of popular social media platforms.
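As a toy illustration of that disagreement, the snippet below uses invented verdicts standing in for three independent moderation models; the numbers are illustrative only, not measurements from any real system.

```python
# Toy illustration of inter-system disagreement on implicit threats.
# Verdicts (1 = violative, 0 = allowed) are invented stand-ins for
# three independent moderation models.

verdicts = {
    "explicit threat": [1, 1, 1],   # explicit violence: models tend to agree
    "implicit threat": [1, 0, 0],   # implied violence: models split
}

def pairwise_agreement(votes: list[int]) -> float:
    """Fraction of model pairs that return the same verdict."""
    pairs = [(a, b) for i, a in enumerate(votes) for b in votes[i + 1:]]
    return sum(a == b for a, b in pairs) / len(pairs)

for label, votes in verdicts.items():
    print(label, round(pairwise_agreement(votes), 2))
# explicit threat 1.0
# implicit threat 0.33 -- a majority vote would allow the post,
# matching the gap de Kock et al. describe.
```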

Introducing the Multi-Modal Discussion Transformer (mDT)

A growing body of research highlights the importance of context-aware, graph-based models, including multi-modal large language models, in AI-led content moderation. Rather than classifying posts in isolation, these systems model entire conversation threads and user interactions to better detect extremist content, as the sketch below illustrates.
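As a rough sketch of what modelling a conversation means in practice, the snippet below builds a reply graph with the networkx library (an assumption for illustration, not the researchers’ tooling); a context-aware model would run message passing or attention over this structure rather than scoring each comment alone.

```python
# Sketch: represent a discussion thread as a graph so that replies
# can be interpreted in the context of what they respond to.
import networkx as nx  # assumed dependency, used only for illustration

thread = nx.DiGraph()
thread.add_node("post_1", text="Look at what 'they' are doing to our country.")
thread.add_node("reply_1", text="when can we start shooting our enemies?")
thread.add_node("reply_2", text="soon, brother")
thread.add_edge("post_1", "reply_1")   # reply_1 responds to post_1
thread.add_edge("reply_1", "reply_2")  # reply_2 responds to reply_1

# Scored alone, "soon, brother" looks harmless; its ancestors in the
# reply graph supply the context that changes its meaning.
print(sorted(nx.ancestors(thread, "reply_2")))  # ['post_1', 'reply_1']
```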

This Insight does not reject AI-led moderation outright, but rather warns against treating it as a complete solution. To address these problems, Liam Hebert and his team of Canadian researchers at the University of Waterloo developed the Multi-Modal Discussion Transformer (mDT). Unlike older, one-size-fits-all models, the mDT is trained on large amounts of data and leverages deep neural networks to capture the nuance of speech as extremist content online becomes more oblique, ambiguous, and adaptive. Rather than treating posts as discrete units, the mDT integrates text, images, and discussion threads through graph transformers, allowing comments to be interpreted in relation to their surrounding context. This is a significant intervention, given how quickly extremist speech evolves and adapts. Research by Célia Nouri (2025) suggests that mapping social media discussions as graphs and processing them with graph transformers significantly improves the detection of extremist content. By grounding multi-modal representations in discussion graphs, the model outperforms text-only systems, reducing “mistakes” made in flagging extremist language.
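To give a flavour of the underlying idea, here is a conceptual PyTorch sketch, emphatically not the authors’ published mDT code: each comment node carries a fused text-and-image embedding, and attention across the thread stands in for the graph-transformer step that shares context between comments. All dimensions and layer choices are invented for illustration.

```python
# Conceptual sketch of multi-modal, context-aware classification.
# Dimensions and architecture are illustrative, not the mDT's.
import torch
import torch.nn as nn

class FusedNodeEncoder(nn.Module):
    def __init__(self, text_dim=768, image_dim=512, hidden=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden)
        self.image_proj = nn.Linear(image_dim, hidden)
        # Self-attention over all comments in a thread stands in for
        # the graph-transformer step that propagates context.
        self.context = nn.MultiheadAttention(hidden, num_heads=4, batch_first=True)
        self.classify = nn.Linear(hidden, 2)  # violative vs. not

    def forward(self, text_emb, image_emb):
        # text_emb: (n_comments, text_dim); image_emb: (n_comments, image_dim)
        h = self.text_proj(text_emb) + self.image_proj(image_emb)  # fuse modalities
        h, _ = self.context(h.unsqueeze(0), h.unsqueeze(0), h.unsqueeze(0))
        return self.classify(h.squeeze(0))  # one logit pair per comment

# Example: a three-comment thread with precomputed embeddings.
logits = FusedNodeEncoder()(torch.randn(3, 768), torch.randn(3, 512))
print(logits.shape)  # torch.Size([3, 2])
```

The published model grounds these representations in the actual discussion graph rather than full attention over the thread, but the principle is the same: each comment is classified in light of its neighbours.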

The mDT currently reports an accuracy rate of 88 per cent, an improvement over existing models. By contrast, social media platforms still rely on text-based transformer models like BERT, even though many extremists post photos, captioned memes, and videos, formats that are especially popular on TikTok and YouTube.

Although the mDT is still in the research phase, it demonstrates how AI-led technologies can more accurately reflect the complexities of online extremism. These findings highlight the importance of continued research into holistic, context-sensitive approaches to content moderation. For the sake of knowledge mobilisation, Hebert and his team published their code publicly on GitHub, along with the data used in their experiments. By enabling others to build on the model, the goal is to support ongoing research toward an increasingly precise content moderation tool.

How can we incentivise social media companies to implement AI tools for content moderation?

With routine auditing and updating, we can continue to build precise AI-led tools to complement content moderation that targets extremism. The next challenge, however, is encouraging social media companies to adopt this moderation infrastructure, and recent momentum points the other way: Meta CEO Mark Zuckerberg announced in early 2025 that the company was stepping back from using automated systems because they were generating “too many mistakes and too much censorship.” While these changes apply in the United States for now, Meta has suggested that they will eventually apply internationally.

As legal scholar Dr Natalie Alkiviadou explains (2025), social media platforms vehemently emphasise free speech rights. Users who promote those rights often treat any suspension or other penalty imposed by a platform as censorship and a violation of their constitutional right to freedom of speech, particularly in the US. Because social media platforms thrive on viral engagement of any kind, they adhere to limited content regulation.

Social media platforms largely take a reactive approach to violent extremist content, enforcing measures only after it has been flagged in an ongoing investigation by authorities or in response to public pressure.

The goal is not to deplatform or censor users. Rather, AI moderation with human oversight, clearer frameworks around extremist conduct, and increased accountability can help address the root problems associated with the spread of violent extremism.

In an era of evolving content moderation, there are some ways social media companies may be incentivised to moderate their content:

Moderation Can Help Improve Their Reputation as a Safe Social Media Platform

From a corporate perspective, the financial losses associated with reputational damage and advertiser boycotts can outweigh the marginal gains from viral hate content. Sociologist Thomas Davidson (2025) notes that allowing abuse to proliferate, “ranging from spam and misinformation to graphic violence and pornography,” can discourage valuable participation from everyday members, thereby “diminishing the user engagement that is central to the business model of social media” (p.1). Moderating content can foster healthier, more inclusive environments in which every user can participate freely without fear of targeted abuse and harassment from extremist actors.

Emphasise that Harms Outweigh the Benefits of Viral Engagement

Moreover, online abuse stemming from extremist content has deleterious effects on individual, community, and social well-being. Research by the Canadian Practitioners Network for the Prevention of Extremist Violence (2025) found that social media serves far-right extremist groups by enabling them to disseminate extremist speech, reach mainstream audiences, appeal to and recruit supporters, and develop a community (pp. 23-24). Psychologist Dr Pablo Madriaza et al. (2025) find that individuals exposed to extremist content online express a wide range of negative emotions, including anger, sadness, fear, helplessness, guilt, and distrust.

In serious cases, exemplified by the senseless misogynist and Islamophobic violence committed by Alexandre Bissonnette, Alek Minassian, and Nathaniel Veltman, extremism spread online can manifest physically and violently offline. In 2017, Bissonnette killed six worshippers at a Quebec City mosque after years of absorbing extreme Islamophobic and far-right online content. In 2018, Minassian murdered ten people in Toronto after engaging with violent misogynistic communities online. In 2021, Veltman carried out a targeted attack against a Muslim family in London, Ontario; Superior Court Justice Renee Pomerance said Veltman drew “much of his rage from internet sources.” These cases show that giving extremism space to thrive garners an audience and produces images and stereotypes about targeted groups that are then internalised and normalised. Extremism online can thus lead to violent crimes offline, threatening the security, rights, and protection of targeted groups.

Government and Policy Intervention

While social media companies can be incentivised to improve AI-moderation tools, clear legal frameworks directed towards combatting online extremism can hold these companies accountable. In late 2024, the Government of Canada launched Canada’s Action Plan on Combatting Hate, which included a pledge to eliminate violent extremist content online. This plan introduced Bill C-63, the “Online Harms Act,” designed to combat extremist material through amendments to the Criminal Code, the Canadian Human Rights Act, and An Act respecting the mandatory reporting of Internet child pornography. The legislation is currently tabled, and it is not yet clear what the government’s vision is for Bill C-63. It will be interesting to see how the proposed amendments are applied in the coming years.

As an example, Germany’s Network Enforcement Act (Netzwerkdurchsetzungsgesetz, or NetzDG) requires social media platforms to remove or block “illegal content” posted by users. The law was designed to combat extremism by requiring social media companies to restrict speech rather than censoring speakers directly.

While Canada’s constitutional framework differs from Germany’s, a similar law could help by providing clear guidelines on what is permissible, relieving social media companies of defining those limits themselves while still directing platforms to moderate their content and holding them accountable overall.

These are some of the considerations social media companies might weigh when investing in content moderation. Technological advancement will continue, and online extremism is alive and well in Canada. We therefore need to keep examining the efficacy of AI-led content moderation tools and the place they hold as we negotiate the intersection between regulating extremist content and balancing our rights to freedom of expression as Canadians.

Arunita Das (she/her) is a PhD Candidate in the Socio-Legal Studies program at York University. Through her teaching, graduate studies, and work with non-profit charitable organisations, Das has developed research in hate speech, hate crime, and violent extremism for over eight years. Her current research examines the relationship between far-right women, online hate, and free expression laws in Canada. She has published her research in the Osgoode Hall Law Journal, Social Science and Humanities Open, and with Bloomsbury (formerly Rowman and Littlefield).

Cole Hennig (he/him) is an automation and systems strategist. Through hands-on development, platform integrations, and continual auditing and updating of process design, he has built scalable solutions that streamline compliance, marketing, and CRM workflows for Canadian real estate companies. His current work focuses on AI-assisted document processing, cloud-based automation, and improving accuracy, efficiency, and organisational resilience.
