Early Flu Detection Using Social Media (Twitter Feed)

In this project, we explored whether social media could be used to detect a spike in influenza activity that occurs earlier or later than what would normally be deemed flu season. View the GitHub repository here.

The general idea was to use tweets from previous years as a baseline to determine whether there is unusually high chatter about the flu on Twitter. Once the tweets are collected, they are analyzed for sentiment: positive, neutral, or negative.
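The baseline idea can be sketched as follows. This is a minimal illustration, not the project's code. It assumes flu-related tweets have already been counted per calendar week, and it flags the current week when its count is more than a chosen number of standard deviations above the same week in earlier years. The z-score approach, the 2.0 threshold, and the function names `is_unusual_chatter` and `label_sentiment` are all hypothetical. For the sentiment step, `label_sentiment` buckets a VADER-style compound score using the commonly cited ±0.05 cutoffs. In nltk that score would come from `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`, which requires downloading the `vader_lexicon` resource. The call is left out so the sketch runs without downloads, and the source does not say which nltk model the project used.

```python
from statistics import mean, stdev


def is_unusual_chatter(current_count: int, historical_counts: list[int],
                       threshold: float = 2.0) -> bool:
    """Return True if current_count is more than `threshold` standard
    deviations above the mean of historical_counts.

    historical_counts: flu-related tweet counts for the same calendar week
    in previous years (hypothetical aggregation).
    """
    if len(historical_counts) < 2:
        raise ValueError("need at least two historical values for a baseline")
    baseline = mean(historical_counts)
    spread = stdev(historical_counts)
    if spread == 0:
        # Flat history: any increase over the baseline counts as unusual.
        return current_count > baseline
    z_score = (current_count - baseline) / spread
    return z_score > threshold


def label_sentiment(compound: float) -> str:
    """Bucket a VADER-style compound score in [-1, 1] into three classes
    using the commonly cited +/-0.05 cutoffs (an assumption here)."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"
```

For example, with historical counts `[400, 450, 420, 480]` (mean 437.5, standard deviation 35), a current count of 900 is flagged and a count of 460 is not.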

After gathering and analyzing the historical data, we captured a live stream of tweets, which we then exported and analyzed for significant differences from previous years. Ideally, the system would be implemented as a dashboard that is updated daily.
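One way to check the live stream for significant differences from previous years is a chi-square test of independence on the sentiment counts. The source does not say which test, if any, the project used, so treat this as an assumed approach. The sketch assumes tweets have already been labelled positive, neutral, or negative and counted for each period. The function names and the 0.05 significance level are illustrative choices.

```python
# Sentiment classes assumed to come from the labelling step.
LABELS = ("positive", "neutral", "negative")

# Chi-square critical value for (2 - 1) * (3 - 1) = 2 degrees of freedom
# at a 0.05 significance level, taken from standard tables.
CHI2_CRITICAL_DF2_ALPHA05 = 5.991


def chi_square_statistic(historical: dict[str, int], live: dict[str, int]) -> float:
    """Pearson chi-square statistic for the 2 x 3 table whose rows are the
    historical and live samples and whose columns are sentiment labels."""
    rows = [[counts.get(label, 0) for label in LABELS] for counts in (historical, live)]
    row_totals = [sum(row) for row in rows]
    col_totals = [rows[0][j] + rows[1][j] for j in range(len(LABELS))]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each sample and each sentiment class needs at least one tweet")
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            # Expected count if sentiment were independent of the sample period.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def sentiment_shift_is_significant(historical: dict[str, int], live: dict[str, int]) -> bool:
    """True if the live sentiment mix differs from the historical mix at alpha = 0.05."""
    return chi_square_statistic(historical, live) > CHI2_CRITICAL_DF2_ALPHA05


if __name__ == "__main__":
    historical = {"positive": 300, "neutral": 500, "negative": 200}
    live = {"positive": 100, "neutral": 200, "negative": 200}
    print(round(chi_square_statistic(historical, live), 2))  # 69.64
    print(sentiment_shift_is_significant(historical, live))  # True
```

The test only says that the sentiment mix has changed, not in which direction. If SciPy is available, `scipy.stats.chi2_contingency` computes the same statistic along with a p-value.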

Our findings indicate that the usefulness of a tool such as this is very limited. First, the accuracy of the sentiment analysis is a major driver of the results. As expected, relying on the nltk library's default rather than a better corpus and training data does not yield satisfactory results. Second, social media posts are not a reliable source of truth, as most tweets and similar posts follow trends. However, this chatter could potentially serve as a symptom marker or trigger for checking whether more reliable sources (hospitals, CDC reports, etc.) show an actual increase in reported influenza cases.

Figure: Comparison of Historical Data and New Streams