"Thought I'd Share First": An Analysis of COVID-19 Conspiracy Theories and Misinformation Spread on Twitter.


The COVID-19 outbreak has left many people isolated within their homes and turning to social media for news and social connection, which leaves them vulnerable to believing and sharing misinformation. Health-related misinformation threatens adherence to public health messaging, and monitoring its spread on social media is critical to understanding the evolution of ideas that have potentially negative public health impacts.

We use Twitter data to explore methods that characterize and classify four COVID-19 conspiracy theories, and provide context for each conspiracy theory through the first five months of the pandemic.

We begin with a corpus of COVID-19 tweets (N ~ 120 million) spanning late January to early May 2020. We first filter tweets using regular expressions (N = 1.8 million) and use random forest classification models to identify tweets that belong to four conspiracy theories. Our classified datasets are then used in downstream sentiment analysis and dynamic topic modeling to characterize linguistic features of COVID-19 conspiracy theories as they evolve over time.
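The regular-expression filtering step can be illustrated with a minimal sketch. The pattern names and keywords below are hypothetical placeholders, not the expressions used in the study, and the helper functions are introduced only for illustration.

```python
import re

# Hypothetical keyword patterns for illustration only; the actual
# expressions used in the study are not given in this abstract.
THEORY_PATTERNS = {
    "5g": re.compile(r"\b5g\b", re.IGNORECASE),
    "lab_origin": re.compile(r"\b(bio-?weapon|lab[- ](made|leak))\b", re.IGNORECASE),
}


def matching_theories(text):
    """Return the sorted names of all patterns that match the tweet text."""
    return sorted(
        name for name, pattern in THEORY_PATTERNS.items() if pattern.search(text)
    )


def filter_tweets(tweets):
    """Keep only tweets matching at least one pattern, paired with the matches."""
    kept = []
    for text in tweets:
        matches = matching_theories(text)
        if matches:
            kept.append((text, matches))
    return kept
```

In a pipeline of this shape, the filtered tweets would then be hand-labeled in part and passed to a classifier such as a random forest; that stage is omitted here.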

Analysis using model-labeled data was beneficial for increasing the proportion of data matching misinformation indicators. Random forest classifier metrics varied across the four conspiracy theories considered (F1 scores between 0.347 and 0.857), with performance increasing the more narrowly a given conspiracy theory was defined. We show that misinformation tweets demonstrate more negative sentiment than non-misinformation tweets and that theories evolve over time, incorporating details from unrelated conspiracy theories as well as real-world events.
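For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from confusion-matrix counts; the counts in the usage note are hypothetical and are not results from the study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Compute F1 as the harmonic mean of precision and recall.

    Returns 0.0 when precision and recall are both undefined or zero.
    """
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, with hypothetical counts of 8 true positives, 2 false positives, and 2 false negatives, precision and recall are both 0.8, so `f1_score(8, 2, 2)` is 0.8.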

Though we focus here on health-related misinformation, this combination of approaches is not specific to public health and is valuable for characterizing misinformation in general, which is an important first step in creating targeted messaging to counteract such spread. Initial messaging should aim to preempt generalized misinformation before it becomes widespread, while later messaging will need to target evolving conspiracy theories and the new facets of each as they become incorporated.
