
Introducing BR-ECHO: A Tool to Fight Online Extremism in Brazilian Portuguese

11th August 2025 Ricardo Cabral Penteado

This Insight contributes to GNET’s PhD Researcher Series, highlighting emerging academic voices in the field of countering violent extremism and terrorism online.

Underrepresented Languages in Extremist Content Moderation

Despite growing advances in technological responses to online extremism — particularly in English and Arabic — many widely spoken languages remain overlooked in both research and content moderation infrastructure. Portuguese, spoken by over 250 million people globally, is one such language. In contexts like Brazil, this absence is particularly critical, as current multilingual moderation systems often fail to detect culturally specific forms of violent discourse. This Insight introduces BR-ECHO (the Brazilian Extremist Content Hashing Observatory), a modular pipeline created to fill this gap. By integrating explainability, adaptability, and traceability, the platform aims to support the prevention and countering of violent extremism (P/CVE) through linguistically and operationally contextualised tools designed for under-resourced environments.

Brazil’s Recent Extremist Background 

Brazil’s extremist ecosystem reveals a troubling convergence of digital subcultures, far-right extremist ideology, and real-world violence. The National Socialist Black Metal (NSBM) scene, active across Telegram, Spotify, and other mainstream platforms, has become a vector for white supremacist propaganda and recruitment. Parallel to this, Terrorgram channels in Portuguese play a key role in disseminating neo-Nazi, accelerationist, and militant content, fostering the radicalisation of youth and contributing to a wave of school attacks — 42 between 2001 and 2024, with over 64% (27 cases) occurring between March 2022 and December 2024.

These digital dynamics intersect with broader political and criminal networks. The far-right movement in Brazil has harnessed online radicalisation by blending religious nationalism, authoritarian discourse, and conspiracy theories aligned with global narratives such as QAnon. The 8 January 2023 insurrection in Brasília demonstrates how online mobilisation can trigger coordinated physical action, often wrapped in coded language and militaristic symbols. The 2024 Antioch school attacker’s reference to Brazilian extremist manifestos — alongside other non-English-speaking perpetrators — exposes how content in underrepresented languages fuels a global repository of hate and tactical knowledge, highlighting the limitations of moderation frameworks designed primarily for English content.

Developing BR-ECHO

In response to these multilayered challenges, BR-ECHO was developed as a research-driven initiative rooted in Brazil’s language, political and social context, and cultural realities. The platform is one of the primary outputs of my doctoral research in Linguistics at the University of São Paulo (USP), focusing on Natural Language Processing (NLP). It offers a modular and explainable system for analysing extremist discourse in Brazilian Portuguese, integrating risk classification, semantic justification, and hash-based traceability.

As extremist discourse becomes increasingly transnational and multimodal, addressing the linguistic gaps in content moderation infrastructure has become critical. BR-ECHO contributes to this global challenge by focusing on Brazilian Portuguese, offering a unique combination of explainability, contextual sensitivity, and operational flexibility. By equipping analysts and policymakers with interpretable outputs and culturally attuned classifications, the platform promotes more equitable and scalable approaches to moderation — particularly in underrepresented linguistic contexts where traditional systems underperform. The following sections provide a detailed overview of the platform’s technical architecture and methodological foundations.

BR-ECHO: A Modular Pipeline for Detection and Explanation of Extremist Content in Brazilian Portuguese

At its core, BR-ECHO is built upon four integrated layers: data ingestion, risk classification, semantic explainability, and traceability via content hashing. The pipeline was specifically designed to analyse extremist content in Brazilian Portuguese with cultural sensitivity and computational rigour.
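To make the four-layer architecture concrete, the flow can be sketched as a chain of Python functions. This is a deliberately simplified illustration, not BR-ECHO's actual implementation: the function names are hypothetical, and the keyword check stands in for the fine-tuned classifiers described below.

```python
import hashlib

def ingest(raw_texts):
    # Layer 1: data ingestion – normalise whitespace, drop empty entries
    return [t.strip() for t in raw_texts if t.strip()]

def classify(text):
    # Layer 2: risk classification – a trivial keyword stand-in for the
    # trained models (illustrative only, not the real classifier)
    risky = any(w in text.lower() for w in ("attack", "manifesto"))
    return {"extremist": risky, "risk_tier": 3 if risky else 0}

def explain(text, result):
    # Layer 3: semantic explainability – placeholder justification
    if result["extremist"]:
        return f"Flagged: matched instructional cue in '{text[:30]}'"
    return "No extremist markers detected."

def fingerprint(text):
    # Layer 4: traceability – SHA-256 digital fingerprint of the fragment
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def run_pipeline(raw_texts):
    # Chain the four layers, producing one structured row per input
    rows = []
    for text in ingest(raw_texts):
        result = classify(text)
        rows.append({
            "text": text,
            **result,
            "justification": explain(text, result),
            "hash": fingerprint(text),
        })
    return rows
```

Each row mirrors the structured table the platform reports: original text, classification result, risk tier, justification, and hash.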

During the corpus construction phase, BR-ECHO aggregates and curates textual materials from multiple sources to ensure both representativeness and contextual specificity. A key component in this process is the use of curated repositories, such as the Repository of Extremist-Aligned Documents (READ), developed by the Centre for Statecraft and National Security (CSNS) at King’s College London. READ provides a controlled environment for accessing far-right and ideologically motivated violent extremist (IMVE) materials, collected with academic rigour and grounded in digital ethnographic methodologies. BR-ECHO utilises this repository to obtain validated samples of instructional texts, manifestos, and other materials frequently linked to radicalisation and mobilisation. These documents are automatically translated and subsequently processed through a rigorous multi-stage annotation workflow. Initial classification is conducted using three large language models (LLMs), followed by expert manual review to ensure both linguistic accuracy and contextual depth.

In addition to curated archives, BR-ECHO’s corpus construction incorporates data obtained through targeted scraping of Brazilian extremist platforms, including 55chan and Portuguese-language Telegram channels affiliated with far-right extremist and accelerationist ideologies. Where thematic or linguistic similarities are observed, relevant international corpora are also integrated to enrich comparative analysis. To ensure balanced class distribution for training and evaluation, the dataset has also been supplemented with non-extremist content, including samples from corpora such as HateBR. This hybrid construction strategy ensures that the final corpus captures both the localised nuances of the Brazilian extremist ecosystem and its intersections with transnational networks — providing a rigorous foundation for fair and robust classification tasks.

The classification component is structured in two phases to ensure both breadth and granularity of analysis. Drawing upon the annotated corpus described above, BR-ECHO employs two independently trained models: a binary classifier that distinguishes extremist from non-extremist content, and a multi-label classifier that assigns specific risk categories. In the data ingestion stage, when users upload new raw data via the platform interface, both the binary and multi-label classification models are automatically activated, enabling real-time analysis informed by the semantic and structural patterns embedded in the training corpus. As output, the system generates a structured table containing the original text, the binary classification result, the assigned risk category or categories, and any glossary-triggered terms or expressions detected during processing.

Figure 1 – Interface overview of the BR-ECHO platform. The screenshot displays the main dashboard, including glossary filters (by term type and ideological affiliation), CSV batch classification module, and access to core functionalities such as semantic justification (RAG), hash generation, and manual review.

The first stage functions as a binary filter, distinguishing between extremist and non-extremist content. This step relies on fine-tuned versions of BERTimbau, a transformer-based model pre-trained on Brazilian Portuguese. The localisation of these models enhances their sensitivity to culturally specific markers that multilingual or English-centric moderation systems might otherwise miss. This initial layer ensures that only relevant texts proceed to more resource-intensive evaluations, thereby optimising performance and minimising noise.

The second stage of BR-ECHO’s classification pipeline applies a five-tier risk taxonomy conceptually inspired by the TCAP Tiered Alert System, developed by Tech Against Terrorism. While tailored to the Brazilian context, this taxonomy draws upon international standards for assessing terrorist and violent extremist content. The classification schema comprises five categories: (0) Non-extremist; (1) Glorification of Violence; (2) Incitement to Extremism; (3) Instructional Material; and (4) Imminent Threat. This schema allows for a more nuanced understanding of the semantic and rhetorical features embedded within extremist discourse. For example, label 1 identifies content that celebrates violence through martyrdom narratives or heroic portrayals; label 2 captures explicit calls for radicalisation or mobilisation; label 3 flags tactical or didactic material such as manifestos and attack manuals; and label 4 highlights content that signals immediate threat, including direct incitement to violence or operational planning. By organising content along this spectrum, BR-ECHO not only improves classification accuracy but also computationally assesses, through linguistic patterns and escalation indicators, whether a given fragment may represent a trajectory of radicalisation — thereby supporting more proactive and pre-emptive moderation strategies.
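The tier schema lends itself to a simple representation. The sketch below encodes the five categories as a lookup and adds a hypothetical escalation check; in the real system the tiers come from a trained multi-label classifier, and the `escalating` heuristic here is purely illustrative.

```python
# The five-tier schema described in the text, as a lookup table.
RISK_TIERS = {
    0: "Non-extremist",
    1: "Glorification of Violence",
    2: "Incitement to Extremism",
    3: "Instructional Material",
    4: "Imminent Threat",
}

def highest_tier(labels):
    """Return the most severe tier among a fragment's assigned labels."""
    return max(labels) if labels else 0

def escalating(tier_sequence):
    """Hypothetical escalation indicator: flags a possible radicalisation
    trajectory when successive fragments move to strictly higher tiers."""
    return len(tier_sequence) > 1 and all(
        b > a for a, b in zip(tier_sequence, tier_sequence[1:])
    )
```

For instance, a user whose posts are classified 1, then 2, then 3 over time would trip the escalation flag, supporting the kind of pre-emptive review the text describes.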

Preliminary evaluations using the experimental corpus have produced promising early results. Dozens of machine learning and deep learning models were tested, with fine-tuned BERTimbau models (base and large) delivering the best performance, particularly in the binary classification task. For the multi-label classification, results have also been encouraging, with some tests achieving F1-scores above 0.90, driven by the strategic application of data augmentation methods, which contributed to improved robustness across categories. The F1-score, which balances precision and recall, is particularly relevant in high-stakes contexts such as extremist content detection, where both false positives and false negatives carry significant risks. Nevertheless, initial experiments revealed challenges in consistently differentiating between “Glorification of Violence” (label 1) and “Incitement to Extremism” (label 2), highlighting the semantic overlap between passive heroification and active mobilisation. These findings emphasise the need for further corpus expansion and refinement, as well as additional training and validation cycles to improve category distinction and model generalisability.
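The role of the F1-score is worth making explicit. Computed from confusion counts, it is the harmonic mean of precision and recall, so a classifier cannot score well by optimising one at the expense of the other; a brief stdlib illustration:

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A model with balanced errors (10 false positives, 10 false negatives)
# scores higher than one with perfect precision but poor recall,
# even though the second never raises a false alarm.
balanced = f1_score(tp=90, fp=10, fn=10)   # precision 0.9, recall 0.9
cautious = f1_score(tp=50, fp=0, fn=50)    # precision 1.0, recall 0.5
```

In moderation settings this matters because the "cautious" profile above corresponds to a system that silently misses half of the extremist content it should flag.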

A detailed ideological glossary complements the classification process by identifying terms and expressions typical of Brazilian extremist subcultures. Inspired by frameworks such as the Terrorist Content Analytics Platform (TCAP) developed by Tech Against Terrorism, the glossary was curated through a combination of corpus linguistics, discourse analysis, and manual annotation by subject-matter experts. It captures both overt and coded language — including memes, slang, abbreviations, and symbolic references — often used to evade detection or signal in-group affiliation. Unlike static keyword lists, the glossary is context-sensitive and continuously updated to reflect the evolving nature of radical discourse, thereby maintaining alignment with emergent rhetorical strategies and linguistic patterns.
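Mechanically, glossary matching can be as simple as whole-token lookup with metadata attached to each entry. The sketch below is illustrative only: the two placeholder entries do not come from BR-ECHO's curated glossary, and the real system is context-sensitive in ways a regex lookup is not.

```python
import re

# Placeholder glossary entries for illustration; real entries are
# curated by subject-matter experts and continuously updated.
GLOSSARY = {
    "dia D": {"type": "coded reference", "ideology": "accelerationist"},
    "1488": {"type": "numeric symbol", "ideology": "neo-Nazi"},
}

def glossary_hits(text):
    """Return glossary entries whose term appears as a whole token,
    so '1488' is flagged but a number merely containing it is not."""
    hits = []
    for term, meta in GLOSSARY.items():
        if re.search(rf"(?<!\w){re.escape(term)}(?!\w)", text, re.IGNORECASE):
            hits.append({"term": term, **meta})
    return hits
```

The word-boundary lookarounds are the key detail: they prevent false positives on substrings, which matters for short numeric symbols.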

One of BR-ECHO’s most distinctive innovations is the integration of semantic explainability through Retrieval-Augmented Generation (RAG). After the classification models assign a risk level, the system produces natural language justifications on a sentence-by-sentence basis, referencing ideological matches and discursive cues. These explanations — along with all other core functionalities — are made accessible through a unified, Streamlit-based interface, enabling analysts, moderators, and researchers to audit the rationale behind each decision. This layer of transparency not only builds trust but also helps ensure alignment with ethical standards for AI deployment, serving as a model for interpretable machine learning in high-risk digital environments.
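The retrieval half of a RAG justification step can be sketched in a few lines. The version below uses bag-of-words cosine similarity over a toy knowledge base; BR-ECHO's actual retriever and generator are not shown here, and `retrieve_justification` and its sample passages are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Bag-of-words cosine similarity between two strings."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in set(ca) & set(cb))
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
           math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0

def retrieve_justification(sentence, knowledge_base, k=1):
    """Return the k knowledge-base passages most similar to the sentence;
    a generator model would then cite these in its natural-language
    explanation of the assigned risk level."""
    ranked = sorted(knowledge_base, key=lambda p: cosine(sentence, p),
                    reverse=True)
    return ranked[:k]
```

In a production RAG setup the bag-of-words similarity would be replaced by dense embeddings, but the retrieve-then-generate structure is the same.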

The final layer of the pipeline supports long-term monitoring and traceability by generating digital fingerprints (hashes) of all classified content. Each textual fragment receives a unique hash signature using algorithms such as SHA256, BLAKE3, or Simhash, depending on content characteristics and monitoring objectives. These hashes are stored in a local database structured after the GIFCT’s Hash-Sharing Database (HSDB), allowing the detection of repeated or repurposed extremist content across platforms. While BR-ECHO currently operates as an independent research tool, its architecture was designed with future interoperability in mind, supporting integration with regional and international moderation frameworks.
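The difference between the hash families mentioned above is worth illustrating. Cryptographic hashes such as SHA-256 change completely under any edit, which makes them ideal for exact-match traceability, while locality-sensitive schemes such as Simhash keep near-duplicates close in Hamming distance. The Simhash below is a minimal stdlib sketch, not the production algorithm:

```python
import hashlib

def sha256_fingerprint(text):
    """Exact fingerprint: any edit yields a completely different hash."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def simhash(text, bits=64):
    """Locality-sensitive fingerprint: similar texts get similar bit
    patterns, enabling detection of lightly edited reposts."""
    weights = [0] * bits
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16)
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if weights[i] > 0)

def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Comparing fingerprints rather than raw text is what makes a hash-sharing database workable: platforms can flag repurposed content without exchanging the content itself.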

Beyond its technical capabilities, BR-ECHO is designed as a practical and accessible resource for a broad range of stakeholders, including civil society organisations, investigative journalists, academic researchers, and platform moderators. These users can benefit from its transparent classification mechanisms, detailed semantic justifications, and hash-based tracking system to better identify, interpret, and counter the spread of violent extremism online. While still in its development and testing phase, BR-ECHO already represents a concrete response to growing demands for more equitable, explainable, and linguistically inclusive moderation tools. By foregrounding the importance of localised linguistic knowledge — particularly through its focus on Brazilian Portuguese — the system addresses the global challenge of underrepresented languages in content moderation. 

Designed as a cyclical architecture, it also enables the generation of high-quality annotated datasets via human-in-the-loop review and semantic justification modules. These datasets, in turn, support the retraining and fine-tuning of classification models, enhancing their ability to capture the nuanced granularity of extremist discourse. Although still in development, BR-ECHO’s architecture already offers actionable insights and contributes to a shared ethical standard for content governance.

Conclusion

Efforts to counter violent extremism online must transcend dominant languages and geopolitical centres in order to become genuinely inclusive and globally effective. While extremist actors swiftly adapt their discourse across digital platforms, moderation technologies often lag behind — especially in contexts where linguistic resources are limited or neglected. In this scenario, BR-ECHO illustrates how computational linguistics can serve not only as a technical field, but as a public safety framework rooted in cultural and linguistic specificity.

By focusing on Brazilian Portuguese — a non-hegemonic language spoken by over 200 million people — the project demonstrates how localised models, ideological glossaries, and explanatory classification tools can strengthen digital resilience and improve detection of online extremism. Although tailoring moderation systems to specific languages is often seen as resource-intensive, BR-ECHO offers a concrete, low-cost example of how modular, open-source pipelines can be adapted for high-risk yet under-resourced environments.

Crucially, initiatives like BR-ECHO show that the Global South need not remain at the periphery of digital security innovation. Instead, it can lead the way in designing scalable, context-sensitive technologies that address its own realities. GIFCT member platforms are strongly encouraged to examine such models closely, not only to foster greater inclusion but also to explore their potential for replication across other linguistic contexts. Collaboration opportunities — including co-development, early-stage testing, and piloting — are warmly welcomed.

Ricardo Cabral Penteado is a Ph.D. candidate in Computational Linguistics at the University of São Paulo (USP). He specialises in deep learning and natural language processing (NLP), focusing on the intersection of violent extremism and technology within the Brazilian and Latin American context. Ricardo is a 2024-2025 GNET Fellow.
