One of the most persistent policy concerns of recent years is the notion that social media platforms’ recommendation systems restrict the flow of information to users. These algorithms curate content that is personalised to each user based on previous online interactions and other tracked data points. The concern is that they create ‘filter bubbles’ – a term coined by Eli Pariser – which may foster ideological homophily by serving users ever more content that fits their pre-existing worldviews, potentially creating a false sense of reality. In the context of extremism, the EU Counter-Terrorism Coordinator has argued that the amplification of legal but harmful content may be conducive to radicalisation and violence because it normalises such content and exacerbates polarisation in society. In the last few years, I have been part of a team that has conducted empirical experiments to assess whether recommendation systems can promote extreme content, and has taken stock of the range of regulatory responses employed by governments. Most recently, I surveyed the existing empirical literature on this topic for the Global Internet Forum to Counter Terrorism. Below, I map out what we know about recommendation systems and extremism and, importantly, where we may be lacking key knowledge.
Can Social Media Recommendation Systems Promote Extremist Content?
Put simply, there is good reason to think that social media recommendation systems can promote extreme content. In the literature review mentioned above, 10 of the 15 studies demonstrated some kind of positive effect. Take, for example, the study by Schmitt and colleagues, who looked at counter-messaging on YouTube and found a high crossover in potential recommendations between these videos and extremist ones, particularly when there was keyword similarity. Research by Papadamou et al. finds that Incel videos on YouTube are relatively unlikely to be recommended in a vacuum, but when an automated user begins to watch such content in a ‘random walker’ simulation (see the sketch below), the system recommends such videos with increasing frequency. Taking a qualitative, interview-based approach to the media diets of convicted and former Islamists, Baugut and Neumann find that many stated that they began with a basic interest in Islam or in news media outside the mainstream. These individuals then followed platforms’ algorithmically influenced recommendations to where they encountered radical propaganda.
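To make the ‘random walker’ design concrete, the sketch below shows the general shape of such a simulation. It is purely illustrative and not the authors’ code: get_recommendations and is_flagged are hypothetical callables standing in for a platform API query and a content classifier that a real study would have to supply.

```python
import random

def random_walk(seed_video, get_recommendations, is_flagged, steps=20):
    """Simulate an automated user who starts at a seed video and repeatedly
    clicks a random recommendation, logging how often flagged (e.g.
    Incel-related) videos appear in the recommendation pool at each step.

    `get_recommendations(video_id)` -> list of recommended video IDs
    `is_flagged(video_id)` -> True if the video is coded as flagged content
    Both are placeholders supplied by the researcher.
    """
    current = seed_video
    flagged_share_per_step = []
    for _ in range(steps):
        recs = get_recommendations(current)
        if not recs:
            break
        # Record what share of the recommendation pool is flagged content.
        flagged_share_per_step.append(sum(is_flagged(v) for v in recs) / len(recs))
        # The walker behaves like a non-discriminating user: it picks its
        # next video uniformly at random from the recommendations.
        current = random.choice(recs)
    return flagged_share_per_step
```

Repeating many such walks from different seeds, and comparing walks seeded with Incel content against walks seeded with neutral content, is roughly how studies of this type estimate whether the share of problematic recommendations grows as watching continues.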
However, we should be cautious before concluding that recommendation systems promote extreme content. Several studies reported minimal or null findings. Research by Hosseinmardi and colleagues finds that there is a diversity of pathways to far-right content and that only a fraction of visits can be attributed to recommendations. A longitudinal study of Twitter by Huszár et al. found no evidence that far-right or far-left political groups were amplified beyond the reach of centrist ones. One piece of research by Ledwich and Zaitsev even suggests that YouTube’s recommendation system actively steers users away from extremist content in favour of mainstream videos, although the paper’s methods and limitations have attracted critique and debate. Although the majority of papers suggest positive effects, we should pay attention to the sizeable minority that go against the grain, particularly as many of these were published more recently (more on this below).
Are People ‘Radicalising by Algorithm’?
Given that many studies demonstrate that recommendation systems can promote extreme content, one may be inclined to vindicate the EU Counter-Terrorism Coordinator, who suggested that they may be conducive to violent radicalisation, or the numerous journalistic pieces which speak of YouTube as a radicaliser or detail extremist ‘rabbit holes’. However, there is little evidence to support this idea. We still know little about the effects of extremist content in the process of radicalisation; it is reasonable to assume it can play some role, but recent studies have demonstrated that being persuaded by propaganda is a complex phenomenon which is often moderated by other factors such as personality traits or subversive online activity. Furthermore, we know that for almost all individuals who do radicalise, the Internet plays a facilitative, rather than a driving, role; those who engage in terrorist activities still tend to rely on offline interactions and local social networks. Put simply, radicalisation is a complex social phenomenon, and we do not know enough to assume that an increase in radical content will necessarily lead to a greater chance of violence.
Looking at the corpus of studies on recommendation algorithms bears out this complexity. As a rule, the more factors that studies include as independent variables, the less clear the case for ‘radicalisation by algorithm’ becomes. For example, a study by Chen and colleagues for the Anti-Defamation League had human participants install a browser extension which tracked their behaviour on YouTube. They find that the platform’s recommendation system can expose individuals to extreme content, but that this exposure was almost exclusively accounted for by participants who self-reported high levels of racial resentment. This finding mirrors the wider ‘filter bubble’ research, which frequently shows that recommendation systems can promote extreme content, but that algorithmic curation generally plays a smaller role than individuals’ own choices.
Wolfowicz, Weisburd, and Hasisi conducted an experiment in which the treatment group suppressed Twitter’s algorithm by rejecting all of the platform’s automated recommendations (a control group accepted them). They then sought to test the relationship between ‘filter bubbles’, network ideological homophily (i.e. echo chambers), and the justification of violence. They find no direct link between suppressing Twitter’s algorithm and justifying violence, but when they include the network variables, they find an interactive relationship (illustrated in the sketch below). This demonstrates the complexity of the radicalisation process, in which peer-to-peer communication plays a key role in shaping how algorithmically driven content affects its audience.
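The ‘interactive relationship’ here is a statistical interaction: the effect of the filter-bubble treatment on justification of violence depends on the level of network homophily. The sketch below is a minimal, hypothetical illustration of how such a model is specified; the variable names and synthetic data are mine, not the study’s.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data standing in for survey responses (hypothetical variables):
#   algo_suppressed  - 1 if the participant rejected all recommendations
#   homophily        - network ideological homophily (echo-chamber measure)
#   justify_violence - outcome scale
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "algo_suppressed": rng.integers(0, 2, n),
    "homophily": rng.normal(0, 1, n),
})
# Build an outcome whose treatment effect depends on homophily
# (i.e. a genuine interaction), plus noise.
df["justify_violence"] = (
    0.1 * df["algo_suppressed"]
    + 0.2 * df["homophily"]
    + 0.4 * df["algo_suppressed"] * df["homophily"]
    + rng.normal(0, 1, n)
)

# 'algo_suppressed * homophily' expands to both main effects plus their
# interaction term; a significant interaction coefficient is what
# 'an interactive relationship' refers to.
model = smf.ols("justify_violence ~ algo_suppressed * homophily", data=df).fit()
print(model.summary())
```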
What Don’t We Know?
Looking at the corpus of studies in the literature review, some clear trends emerge which point to knowledge gaps. To begin, YouTube is by far the most studied platform; 10 of the 15 pieces of research were either primarily or partially based on the site (this includes Baugut and Neumann’s interview-based research, in which the findings explicitly referenced it). While it is reasonable to study one of the largest platforms on the Internet, which has often been singled out as hosting problematic content, this emphasis may also be due to the fact that it has one of the most researcher-friendly Application Programming Interfaces (APIs) of the large platforms. In other words, it may be the most studied simply because researchers can access it, while being restricted from other platforms. Only five platforms were studied in the whole corpus (YouTube, Twitter, Facebook, Reddit, and Gab), which leaves out hundreds of others – although, more recently, studies not included in this review have analysed TikTok and Amazon.
There is also a heavy Western focus within the corpus. Seven studies use English-language ‘seed’ accounts for their investigations, with two others using German (or a mix of German and English). By comparison, only two used Arabic-language content, and three used a mixed set of languages including English, Japanese, French, Spanish, German, Turkish, Arabic, and Mandarin. Language matters here: in a study by Murthy analysing ISIS videos in YouTube recommendations, Arabic-language videos were more likely to be recommended than English-language ones. It is plausible that platforms’ countermeasures are most finely attuned to English-language content, and that other languages do not receive the same scale of resources dedicated to countering problematic content in recommendations.
One gap that will be difficult to remedy stems from the speed of academic publishing, particularly when compared to the breakneck pace of change in the tech world. Studies in this corpus collected data between 2013 and 2020, and it often takes one to two years from data collection to publication. Social media platforms, on the other hand, update their algorithms daily and can make several policy changes per year. To see how far Twitter’s position has shifted over the last decade, consider that in 2012 its General Manager stated that the company would “remain neutral as to the content…we are the free speech wing of the free speech party,” whereas today it employs much more rigorous content removal, as well as down-ranking tweets or removing them from amplification in timelines. Other platforms, such as YouTube, Facebook, and Reddit, also employ methods of down-ranking content or removing it from recommendations. When we look back at the findings of a study whose data were collected even a year ago, the platform in question has likely undergone several technological and policy changes since.
There are also methodological limitations within the corpus of studies. Thirteen of the 15 rely on some kind of ‘black box’ design, meaning that they do not manipulate platforms’ recommendation systems, but rather feed in inputs (for example, by beginning to watch extreme content) and observe what the platform recommends. Knott and colleagues argue that this is a problem because it cannot test causal hypotheses. Rather, such designs simply collect data from an opaque system in which hidden factors may be clouding why the system chooses to recommend particular content. Another limitation is that most of the studies that collect data on YouTube do so by accessing the platform’s API and collecting the Related Videos, to establish whether the pool of potential recommendations contains extreme content (a minimal sketch of this kind of collection is given below). However, this does not mimic the user–platform relationship, in which users continually interact with content, causing the algorithm to learn and tailor recommendations accordingly. In essence, these studies provide a snapshot of potential recommendations, but cannot account for personalisation, which is the very heart of the ‘filter bubble’ concern.
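To illustrate the gap between these snapshot designs and genuine personalisation, here is a minimal sketch of the kind of collection most of these studies performed. It assumes a YouTube Data API v3 key, a researcher-supplied list of flagged channel IDs standing in for whatever coding of ‘extreme’ a study uses, and the relatedToVideoId parameter of the search endpoint that much of this research relied on (and which has since been deprecated).

```python
from googleapiclient.discovery import build  # pip install google-api-python-client

def related_video_snapshot(api_key, seed_video_id, flagged_channel_ids, max_results=25):
    """One-off 'snapshot' collection: fetch the pool of related videos for a
    seed video and count how many come from channels coded as extreme.
    No account or watch history is involved, so nothing is personalised."""
    youtube = build("youtube", "v3", developerKey=api_key)
    response = youtube.search().list(
        part="snippet",
        relatedToVideoId=seed_video_id,  # parameter used historically; since deprecated
        type="video",
        maxResults=max_results,
    ).execute()
    items = response.get("items", [])
    flagged = [item for item in items
               if item["snippet"]["channelId"] in flagged_channel_ids]
    return len(flagged), len(items)
```

Because the query is stateless and logged out, it can only measure the pool of potential recommendations; an algorithm adapting to a signed-in user’s watch history could behave quite differently, which is exactly the personalisation these designs cannot capture.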
Where Do We Go From Here?
This Insight has shown that, although recommendation systems can amplify extreme content, research paints a complex picture in which there are many moving variables. There is certainly insufficient evidence to support claims of ‘radicalisation by algorithm’; recommendation systems are one small piece of a puzzle that occupies an inflated level of concern in policy circles. To move towards a better understanding in future, it is vital that researchers and tech companies work together to share internal data and move beyond ‘black box’ studies, and that data collection broadens beyond YouTube and English-language content. Future research should also bring human participants into the mix, both by tracking user data online and by employing experimental media-effects research in which participants experience a radical ‘filter bubble.’ Overall, recent years have brought more robust empirical research that can act as a springboard for the next generation of scholars to apply novel methods to the problem.