Project 4: Platform Audits

Elizabeth Huang, Anneliese Peerbolte, Brooke Bao, Elizabeth Yan

     The year is 2008. A year earlier, the first iPhone had been released, putting access to the World Wide Web in the pockets of the masses for the first time. The world is on the cusp of an expansion of expression and community unlike anything seen before. A thirty-year-old Richard Spencer had just dropped out of his PhD program at Duke to work for “The American Conservative” magazine; he was fired within the year for views deemed too extreme (ABC 13 Houston, 2017). In 2010, he started “The Alternative Right” webzine. “Alt-right,” a term popularized by Spencer, describes the far-right, white nationalist ideology, and the “alt-right pipeline” is a theory describing how individuals are radicalized toward the alt-right, especially online, into communities like the one Spencer created. Many theories ask whether individuals in certain communities, or with certain traits, are more likely to enter this pipeline, and the online gaming community, specifically gaming content creators on YouTube, is at the forefront of these discussions. It has been observed that gaming communities have developed a violent ideology associated with the American alt-right, and the YouTube algorithm often receives a central share of the blame for this radicalization of young men. Misinformation, in particular, seems to play a central role in how people are pulled into the alt-right pipeline. Safiya Noble suggests that a rapid shift toward commercial interests has led to deregulated media; even more so, the internet spreads this information quickly and with few checks (Noble, 171-172). YouTube’s ad-driven revenue model is a perfect example of this vulnerability, and thus a breeding ground for alt-right content.

     2014’s “Gamergate” marked the start of the political split within the gaming community. The movement began as an anti-feminist heckling campaign, building off the misogyny rampant in many gaming communities; much of it manifested as doxxing, death threats, threats of sexual assault, and general harassment of female journalists. However, Gamergate had a much deeper and more insidious root beyond sexism: online recruitment by neo-Nazi white supremacist groups. Much of this radicalization happened on platforms like 4chan, 8chan, and Voat, which were largely unregulated and fostered a number of extremist ideologies. 4chan’s /pol/ board appealed to many young men specifically, Milo Yiannopoulos argued, because of the shock value and comedy they found in racist trolling. This conservative radicalization, largely of young white men, carried through clearly in the “trolling” behavior of many Trump supporters in the 2016 election, an event marking a distinct political shift in the country (Romano). Klint Finley writes that Gamergate was so successful because “It tells [young white men] that they are the natural rulers of the world, but that they are simultaneously being oppressed by a secret religious order” (Matthews). The left’s push toward racial and gender equality feels like a direct attack on these young white men, making them prime targets of the alt-right movement.

     We are interested in how platforms like YouTube allow this kind of morally dubious content to occupy such a central space. Specifically, we are interested in how gaming content contains alt-right language that leads viewers to engage with increasingly radical alt-right material. This is both an epistemic harm to the larger community, as misinformation spreads unchecked on these platforms, and a tangible harm to marginalized communities. This project is of great importance to us because of the harm this algorithmic injustice is doing to our community, specifically in radicalizing many young men. Through the power of data analysis, we wanted to dive deep into what the YouTube algorithm is really doing, and to see whether we could fall down the rabbit hole ourselves.

    Our study applies data auditing to examine how YouTube’s recommendation algorithm might influence exposure to far-right ideologies within gaming communities. We investigate two questions: (1) Does engagement with gaming content on YouTube correlate with exposure to far-right ideologies? (2) Does YouTube’s recommendation algorithm reinforce connections between gaming content and far-right ideology? Through these questions, we aim to determine whether the algorithmic prioritization of engagement amplifies exposure to extremist content. We hypothesize that YouTube’s algorithm, by favoring high-engagement material, amplifies user exposure to far-right content within gaming.

    Rumi Khan’s The Alt-Right as Counterculture examines how far-right groups co-opt symbols like “Deus Vult” and “remove kebab” from gaming communities, repurposing them as covert ideological symbols cloaked in irony. Khan argues that these symbols, often perceived as humorous, align with far-right ideologies and appeal to gamers predisposed to engaging with content built on subcultural humor. This perspective suggests that YouTube’s algorithm, which elevates engaging content, might prioritize these ill-intended memes without recognizing their underlying messages. Thus, engagement with gaming content could serve as an entry point to far-right ideas if these memes remain unchecked by content moderation algorithms.

    Similarly, Gallagher et al.’s Pulling Back the Curtain reveals that YouTube’s recommendation algorithm often leads teenage users with a casual interest in gaming toward violent, misogynistic, or extremist content. Gallagher et al. found that YouTube’s recommendations frequently blur distinctions between harmless gaming interests and content that includes extremist undertones. They observed that even gamers engaging with non-political content were nudged toward harmful ideologies due to the recommendation algorithm’s reliance on engagement metrics. Together with Khan’s findings, this may suggest that engagement with gaming content correlates with greater exposure to far-right ideologies, especially as algorithms interpret engagement rather than content intent.

    Understanding our second research question—whether YouTube’s algorithm reinforces connections between gaming content and far-right ideology—can be guided by Hazem Ibrahim et al.’s YouTube’s Recommendation Algorithm is Left-Leaning in the United States. Their research shows that YouTube’s algorithm aims to moderate extremist content by gradually pulling users away from ideological extremes. However, Ibrahim et al. also found that the algorithm sustains echo chambers within certain niches. The effect is particularly significant for gaming audiences, where ironic humor and far-right symbols often overlap, potentially creating a reinforcing cycle of ideological exposure. The study implies that YouTube’s recommendation pathways, though generally effective at discouraging extremism, may still promote sustained far-right engagement within gaming, as these ideologies align with the community’s subcultural language and engagement-focused metrics.

    Gallagher et al.’s findings further support this concern; their study demonstrates how teenage gaming users are repeatedly exposed to ideologically charged content under YouTube’s engagement-oriented recommendation system. This repetition, often driven by the algorithm’s emphasis on similar engagement profiles, suggests that viewers in gaming communities may remain within echo chambers that reinforce their exposure to far-right ideas. In their data auditing approach, Gallagher’s team mapped the pathways of exposure and documented recurring themes in recommended content, showing how YouTube’s algorithm, while aiming to keep users engaged, can unintentionally create feedback loops in specific communities. In gaming, where humor and symbols carrying subtle ideological messages are prevalent, this algorithmic feedback loop risks continuously exposing users to far-right content, sustaining engagement by surfacing material that aligns with previous interactions, even if subtly extremist.

    Overall, these studies suggest that YouTube’s engagement-based algorithm may foster pathways toward far-right ideologies in gaming, making data auditing crucial to understanding the potential “alt-right pipeline” effect in its community.

3.1 High-Level Idea

     This data collection process aims to analyze the presence of alt-right language in gaming-related content on YouTube, exploring possible connections in the content via video transcriptions. The main objective is to understand the overlap of alt-right language usage within these spaces.

3.2 Team Repository

     This section provides an overview of our team repository, including key information and resources hosted on GitHub.

3.3 Raw Data Profile

    Before beginning data collection, our search term queries and relevant keywords were sourced from 19 sites: 11 for alt-right and 8 for gaming. This provided us with a list of 214 alt-right keywords, 486 gaming keywords, and 112 skip words (commonly used words within these articles that did not significantly contribute to the meaning of a sentence).
In our data collection, we queried a total of 5,860 unique videos within the gaming category. The bulk of the alt-right content came from the video transcriptions of the top search result for each of the 93 queries drawn from our alt-right keyword list.
The relevant data stored from these search results (for both categories of videos, political and gaming) included: videoID, channelID, title, description, liveBroadcastContent, channelTitle, publishTime, alt_right_similarity_score, and predicted_category.
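
For concreteness, here is a minimal sketch of how these records can be laid out, assuming pandas (the column names mirror the fields listed above; the empty-frame setup is illustrative, not our exact code):

```python
import pandas as pd

# Fields persisted for every search result, in both video categories.
COLUMNS = [
    "videoID", "channelID", "title", "description",
    "liveBroadcastContent", "channelTitle", "publishTime",
    "alt_right_similarity_score", "predicted_category",
]

# An empty frame with this schema; rows are appended as results come in.
videos = pd.DataFrame(columns=COLUMNS)
```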

3.4 Methodology

    We chose to observe the presence of alt-right content in gaming communities on YouTube primarily through search results. An alternative method we explored was probing YouTube’s recommendation system via “Up-Next” videos. A similar study of political bias on YouTube, for instance, seeds recommendations with the top 3 search results of political queries, on the grounds that “a majority of YouTube users only click on the top 3 results” (Lutz, 2021). For these results and their recommended videos, the paper uses BERT to perform sentiment analysis on each video transcript and identify the political affiliation of a given video. With this in mind, we used a similar strategy to observe the gaming community and measure how much overlap might exist between sub-categories of gaming videos and alt-right media (the top search result of each query). Since the algorithm may recommend increasingly extreme content with each video, our data collection first seeks to establish how much alt-right content already exists within videos in the gaming category. The idea is that high overlap between the two spheres of media indicates a strong correlation between the two types of videos, which in turn might make a recommendation of an alt-right video more likely, while very little overlap means a gaming video is less likely to lead directly to an alt-right video.

3.5 Implementation Details

    Using the words sourced from various articles, we first queried 93 alt-right terms in YouTube’s search. We then took just the top search result of each query and stored these results in an alt-right video data frame. To ensure that our search queries yielded legitimately alt-right/extreme content, we manually checked the videos on YouTube (see Screenshots). Our initial plan for finding gaming videos was similar: retrieving the top search results for gaming-related queries. However, the gaming category provided through the YouTube Data API gave us a more comprehensive list of relevant gaming videos. Thus, our data collection mainly consisted of querying the YouTube Data API for the top search results in the gaming category (with no search term provided). Each search request yielded roughly 300-700 videos, all of which were ultimately stored (with duplicates removed, so the frame contains only unique videos) in a larger data frame of all gaming videos found.
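
A minimal sketch of this collection step, assuming the google-api-python-client package (the API key placeholder, page count, and helper name are our own illustrative choices, not the exact production code):

```python
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"  # placeholder; supply a real YouTube Data API v3 key
youtube = build("youtube", "v3", developerKey=API_KEY)

def fetch_gaming_videos(max_pages=10):
    """Page through top search results restricted to the Gaming category
    (videoCategoryId 20), with no search term supplied."""
    items, page_token = [], None
    for _ in range(max_pages):
        params = dict(
            part="snippet",
            type="video",          # required when filtering by category
            videoCategoryId="20",  # YouTube's Gaming category
            maxResults=50,         # API maximum per page
        )
        if page_token:
            params["pageToken"] = page_token
        response = youtube.search().list(**params).execute()
        items.extend(response.get("items", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break
    return items
```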

    The biggest limitation of our methodology was the quota limit on API requests, which reduced the size of our data collection pool. To work around this, the code ensures that all data is written to a .csv file even when not all search results can be requested, providing a save point right before the quota limit is reached. Additionally, to reduce quota usage, we replaced our initial approach of retrieving video captions through the same API with a Python package for transcriptions.
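
A sketch of both workarounds, assuming the youtube-transcript-api package for caption retrieval (older releases expose a get_transcript classmethod; the function names and CSV path here are hypothetical):

```python
import pandas as pd
from youtube_transcript_api import YouTubeTranscriptApi

def transcript_text(video_id: str) -> str:
    # The transcript package scrapes caption tracks directly,
    # so caption downloads cost no Data API quota.
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(seg["text"] for seg in segments)

def collect_transcripts(video_ids, out_path="transcripts.csv"):
    rows = []
    try:
        for vid in video_ids:
            try:
                rows.append({"videoID": vid, "transcript": transcript_text(vid)})
            except Exception:
                continue  # e.g., captions disabled; skip and keep going
    finally:
        # Save point: whatever was gathered is written to disk even if
        # the run is cut short (for instance, by an upstream quota error).
        pd.DataFrame(rows).to_csv(out_path, index=False)
```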


Figure 1: Distribution of Alt-Right Similarity Scores Across All Gaming Videos
The red line indicates the average similarity score across all gaming videos.

Figure 2: Gaming Video Sub-Categories vs Alt-Right Similarity Scores (Violin Plot)
The highest average similarity score, 4.354%, is associated with Strategy Game Tips (highlighted in yellow).

Figure 3: Top 10 Gaming Sub-Categories vs Alt-Right Similarity Scores (Boxplot)
The highest single-video similarity score, 11.666%, is associated with Mobile Game Updates (outlined in red).

Figure 4: Gaming Video Sub-Categories vs Alt-Right Similarity Scores (Boxplot)
The highest single-video similarity score, 11.666%, is associated with Mobile Game Updates (highlighted in yellow).

Figure 5: Distribution of Gaming Video Sub-Categories (Count of Videos per Sub-Category)
Top 5 sub-categories by video volume (highest to lowest): Other, Minecraft Hacks, Game Updates, League of Legends, Game Mechanics.

Figure 6: Overall Gaming Alt-Right Similarity Score From 2009-2024 (by Year)
Similarity drops from over 4% to under 2% following 2019.

Table 1: Top 5 Videos with Highest Alt-Right Similarity Scores

Title | Alt-Right Similarity Score (0-1) | Video Link
7 Times Games Were More Realistic Than We Were Expecting | 0.11666 | Watch Video
7 Cheaty Bosses Who Didn’t Fight Fair | 0.106907 | Watch Video
7 Times Stealth Made You Rage: Commenter Edition | 0.106665 | Watch Video
6 Military Baddies in Serious Need of a Career Change | 0.097498 | Watch Video
7 Huge Videogame Phenomena No One Saw Coming | 0.097096 | Watch Video

4.1 Summary of Findings (Qualitative & Quantitative Analysis)

    Our analysis finds that gaming videos have an average similarity of 0.584% to alt-right media (see Figure 1). Unsurprisingly, the 5 videos with the highest alt-right similarity scores contained the most mentions of gun violence, rage, and death (Table 1). The sub-category with the highest average similarity score is Strategy Game Tips, at 4.354% (see Figure 2). While the amount of overlap is not particularly high, the potential consequences of propagating extreme content in the gaming sphere can be especially severe because so much of the gaming community’s audience is made up of teens.

    For instance, in the highest-similarity category, Strategy Game Tips, videos cover games such as Teamfight Tactics and Warhammer 3, both rated for ages 13 and up. Additionally, the single highest similarity score among all videos was found in the Mobile Game Updates sub-category, where one video reached 11.666% similarity (see Figures 3, 4). Beyond these high-similarity sub-categories, Minecraft Hacks stands out as the highest-volume category in the gaming segment, with 532 videos making up 9.078% of the entire dataset (see Figure 5).

    Between 2019 and 2020, we observed a significant decrease in alt-right similarity scores (Figure 6), which may be attributed to changes in YouTube’s search and recommendation algorithms over roughly the same period. According to YouTube’s Official Blog team, “... in 2019 [YouTube] first began demoting borderline content in recommendations, resulting in a 70% drop in watchtime on non-subscribed, recommended borderline content in the U.S. Today, consumption of borderline content that comes from our recommendations is significantly below 1%” (Goodrow, 2021). These algorithm changes likely influenced gaming content as well, which would account for the drop in similarity scores we observe after 2019.

4.2 Methodology

     Our data analysis process involved several stages aimed at quantifying the similarity between gaming videos and alt-right political media. We begin with a small set of purely political video transcripts, preprocessed to remove extraneous content (words like “and,” “or,” and “the,” which reveal little about the content of a text). The process for selecting these alt-right videos is described above in our data collection methodology. To measure the similarity between alt-right content and the gaming video transcripts, we calculate a similarity score using word vectorization between each gaming video and the alt-right baseline. These scores provide a quantitative measure of the degree of alignment between gaming content and alt-right themes. Additionally, to understand how specific sub-categories within gaming may correlate with alt-right content, we use a classification approach to determine the categories of the gaming videos.
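
As an illustration, a preprocessing pass along these lines could look like the following (a sketch assuming spaCy’s small English model; the exact filtering rules in our pipeline may differ):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # assumes this model is installed

# Skip words sourced from our keyword articles (see Raw Data Profile).
with open("skip_words.txt") as f:
    SKIP_WORDS = {line.strip().lower() for line in f if line.strip()}

def preprocess(text: str) -> str:
    """Keep only content-bearing tokens: drop stop words ("and", "or",
    "the"), punctuation, and our sourced skip words."""
    doc = nlp(text.lower())
    kept = [
        tok.text for tok in doc
        if not tok.is_stop and not tok.is_punct and tok.text not in SKIP_WORDS
    ]
    return " ".join(kept)
```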

4.3 Implementation

     Using the videoID information from the alt-right data frame, we retrieved video transcripts for all search results and combined them into a single document. We then used spaCy, an NLP library, to extract all words and remove any irrelevant entities as specified in the skip_words.txt file. The result was combined with all the keywords we sourced from articles to form a master list of keywords serving as our baseline for alt-right content. For each gaming video transcript, the similarity between the alt-right master list and the transcription of a given video was measured through cosine similarity. We implemented this in our compute_cosine_similarity function, which calculates the similarity between the transcript of a YouTube video and the predefined keyword list. Using scikit-learn’s TfidfVectorizer, the function converts both the alt-right keywords and the gaming video transcript into term frequency-inverse document frequency (TF-IDF) vectors, which represent the importance of each word in the documents. The function then calculates the cosine similarity between the vectorized transcript and the combined keyword list, yielding a similarity score: 0.0 means there is no similarity between a gaming video and alt-right media, while the maximum of 1.0 (100%) means the content is entirely the same. Finally, to understand how sub-categories within gaming might correlate with alt-right content, we used a keyword classification method to predict the categories of all retrieved gaming videos. The categorization keywords behind these predictions were generated by ChatGPT, which summarized patterns in video titles from the gaming video dataset we collected.
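
A sketch of these two scoring steps, assuming scikit-learn (compute_cosine_similarity is named above; the predict_category helper and its fallback to “Other” are simplified stand-ins for our actual keyword rules):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def compute_cosine_similarity(transcript: str, keywords: list[str]) -> float:
    """TF-IDF cosine similarity between one transcript and the alt-right
    keyword master list: 0.0 means no overlap, 1.0 identical content."""
    keyword_doc = " ".join(keywords)
    tfidf = TfidfVectorizer().fit_transform([transcript, keyword_doc])
    return float(cosine_similarity(tfidf[0:1], tfidf[1:2])[0, 0])

def predict_category(title: str, category_keywords: dict[str, list[str]]) -> str:
    """Assign the sub-category whose keyword list shares the most words
    with the video title; default to 'Other' when nothing matches."""
    title_words = set(title.lower().split())
    best, best_hits = "Other", 0
    for category, words in category_keywords.items():
        hits = len(title_words & {w.lower() for w in words})
        if hits > best_hits:
            best, best_hits = category, hits
    return best
```

In this setup, compute_cosine_similarity(transcript, master_keywords) produces the per-video alt_right_similarity_score, and predict_category fills the predicted_category field described in the raw data profile.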

    Our observation of the content overlap between gaming videos and alt-right media on YouTube calls attention to the intersection of algorithmic bias and social influence, themes central to both Data Feminism by Catherine D’Ignazio and Lauren F. Klein and Algorithms of Oppression by Safiya Noble. More specifically, our analysis examines the power held by the algorithms YouTube designs, which shape the content users watch.

    One principle from Data Feminism that strongly relates to our research topic is examining power. D’Ignazio and Klein argue for a critical examination of how data systems reflect and reinforce existing power dynamics (Chapter 1), which is reflected in our observation of YouTube’s recommendation system and the way it nudges users toward uninformed viewing decisions on the platform. A secondary principle at work in our project is pluralism (Chapter 5), as our objective was to thoroughly probe the platform for content that may crowd out more neutral perspectives.

    Similarly, Noble highlights how search engines and recommendation systems can amplify harmful stereotypes and contribute to the marginalization of certain groups by promoting biased content under the guise of “neutral” or “objective” recommendations; the decisions algorithms ultimately make are based solely on their designs, which translate human biases into automated, digital processes (Noble, pp. 59-60). Applying these perspectives in our methodology helped uncover that YouTube’s algorithm may inadvertently favor or demote certain types of content based on the extremity of its language (as captured in each video transcript).

    When we examined the content more closely, we found instances of aggressive behavior and wording within these video transcripts, supporting the validity of our similarity measure. The decrease in similarity scores after 2019 is also consistent with the claim that the algorithm previously favored alt-right content: we can see the impact of deliberate intervention in the post-2019 results, demonstrating that intentional algorithmic adjustments can mitigate extreme content and bias.

    Ultimately, both Data Feminism and Algorithms of Oppression underscore the importance of designing algorithms with an awareness of social impact, particularly for communities vulnerable to bias. This study’s findings reinforce that unchecked algorithmic processes can inadvertently lead to the propagation of harmful content. Therefore, a feminist and justice-oriented approach to data analysis and recommendation algorithms is essential for fostering inclusivity and equity within digital spaces.

     To address the overlap between alt-right content and other ideologically driven spaces on platforms like YouTube, a focused reevaluation of both the platform’s recommendation algorithms and its monetization policies is essential. Our literature review and data observations suggest that videos with alt-right or alt-lite language are often promoted by YouTube’s algorithm because they drive high engagement, especially among niche, ideologically aligned audiences. One specific recommendation for reducing the prominence of such content is to adjust the monetization model: videos featuring harmful rhetoric should be penalized or demonetized to limit the financial rewards of spreading extremist ideologies. If creators cannot profit from videos that exploit or promote harmful content, the incentive to produce and recommend this material shrinks.

     This change would also help shift the algorithm’s focus toward content that fosters more constructive, ethical discussions, rather than engagement metrics, which escalate divisive controversy. While this adjustment may face resistance from creators and stakeholders who prioritize profit, it aligns the platform's incentive structure with societal responsibility. By no longer rewarding extremist content financially, YouTube could encourage creators to produce more inclusive, informative, and responsible videos, which would ultimately benefit the platform's long-term credibility by improving the safety and satisfaction of users in the online community.

     Additionally, actions to drive greater platform accountability are critical in addressing this issue. Independent, data-driven audits of YouTube’s recommendation algorithms could provide transparency in how content is promoted (both in recommendations and in the visibility of top search results) and help identify areas where harmful ideologies are unintentionally amplified. By conducting regular external audits, YouTube would ensure its content recommendation system adheres to responsible standards, promoting positive content while minimizing the influence of divisive or extremist material. These audits would also offer actionable insights, allowing the platform to make necessary adjustments to its algorithms and content moderation practices.

     Ultimately, the goal is to strengthen the autonomy and knowledge of users, empowering them to make informed decisions without the invisible influence of algorithms pushed onto their community spaces. Platforms like YouTube must prioritize user safety and autonomy, fostering environments where users are aware of their digital decisions and encouraged to engage with content that aligns with collective ethical standards. Through the combination of adjusting monetization policies and ensuring stronger platform accountability, YouTube and other platforms can take meaningful steps to improve the moral soundness of popular digital media and promote healthier, more constructive dialogue across diverse categories of content.

General References

Sources for Key Terms

Gaming Related References