Sentiment Analysis of Presidential Town Halls Using Twitter + ML

How did Americans feel about each candidate during the dueling town halls on October 15, 2020? Twitter users had plenty to say throughout the night about both Joe Biden and President Trump. Here’s what I found:

I used Tweepy’s streaming functionality to collect the tweets into a separate pandas DataFrame for each candidate, then preprocessed them to remove URLs, punctuation and stopwords. I also lemmatized the tweets, reducing each word to its dictionary form for more accurate analysis, and removed retweets and @ replies so that each tweet was original and nothing was counted twice. Additionally, I limited the collection to tweets written in English and posted by users with more than 500 followers to try to limit spam posts. In the end, 148,711 tweets about Joe Biden and 144,401 tweets about President Trump were collected and analyzed.
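The filtering and cleaning steps above can be sketched roughly as follows. This is a minimal, dependency-free illustration and not the code used for the project: the tweet dictionary keys ("text", "lang", "followers"), the function names and the tiny stopword set are all assumptions made for the example, and lemmatization is left out (a library such as NLTK would normally handle it).

```python
import re
import string

# Tiny illustrative stopword subset (an assumption); a real run would use a full list.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "to", "of",
             "and", "in", "on", "for", "at", "it", "this", "that"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def keep_tweet(tweet):
    """Return True if a tweet passes the filters described in the post.

    `tweet` is a hypothetical dict with keys "text", "lang" and "followers".
    """
    text = tweet["text"].lstrip()
    if text.startswith("RT @"):   # retweet
        return False
    if text.startswith("@"):      # @ reply
        return False
    if tweet["lang"] != "en":     # English only
        return False
    return tweet["followers"] > 500  # strictly more than 500 followers


def clean_tweet(text):
    """Remove URLs, punctuation and stopwords, and lowercase what is left."""
    text = URL_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE).lower()
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    # Lemmatization would go here; it is omitted to keep the sketch self-contained.
    return " ".join(tokens)


sample = {"text": "Biden is great at the town hall! https://t.co/abc",
          "lang": "en", "followers": 1200}
if keep_tweet(sample):
    print(clean_tweet(sample["text"]))  # prints "biden great town hall"
```

Filtering before cleaning matters here, because the retweet and reply checks rely on the "RT @" and "@" prefixes that punctuation removal would destroy.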

To determine the sentiment score for each tweet, I trained a Naive Bayes model on the Sentiment140 dataset of 1.6 million tweets using scikit-learn and, after vectorizing the data, applied the model to each DataFrame. I then added the predicted sentiment for each tweet as a new column in both original DataFrames, calculated the rolling average sentiment score over a 1,000-tweet window and plotted each rolling average point using Matplotlib.
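As a rough sketch of that pipeline, the snippet below trains a Naive Bayes classifier on a handful of made-up sentences and computes a rolling average of its predictions. Several things here are assumptions rather than details from the project: the choice of CountVectorizer with MultinomialNB, the 0/1 label encoding, the column names and the toy texts, which only stand in for Sentiment140 and the collected tweets. The window is 2 so the result is visible on four rows; the post uses 1,000.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for Sentiment140 (0 = negative, 1 = positive); not real data.
train_texts = ["love great win", "good strong answer",
               "terrible awful lie", "bad weak answer"]
train_labels = [1, 1, 0, 0]

# Vectorize the text into word counts, then fit Naive Bayes on those counts.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# Hypothetical cleaned tweets for one candidate, in the order they were collected.
tweets = pd.DataFrame(
    {"clean_text": ["great answer", "awful lie", "strong win", "bad lie"]}
)

# Add the predicted sentiment as a new column.
tweets["sentiment"] = model.predict(tweets["clean_text"])

# Rolling mean of the predictions; the first WINDOW - 1 rows are NaN.
WINDOW = 2  # the post uses a 1,000-tweet window
tweets["rolling_sentiment"] = tweets["sentiment"].rolling(WINDOW).mean()

print(tweets)
```

With 0/1 labels, the rolling mean is simply the share of positive tweets in each window, so the resulting column can be plotted directly against tweet order or timestamp.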

During the 90 minutes, tweets about Biden were, on average, slightly more positive than those about President Trump. These results seem to align with the current polling that shows Biden in a strong position against President Trump with 18 days until Election Day.