There’s been a lot of buzz over the past couple of years about predicting the outcome of events from Twitter data. Easy access to the thoughts of millions of people worldwide, and the chance to tap into the stream of short, cryptic and mostly useless tweets and make some sense of them, has attracted a lot of curious people.
Jessica Chung and Erik Tjong Kim Sang tried to predict the outcome of political elections. Johan Bollen found a correlation between Twitter and the stock market. Xiaofeng Wang tried to predict crime based on tweets.
When it comes to predicting Oscar winners, Liviu Lica and the guys from uberVU used overall sentiment, which worked in 2011 but failed in 2012. At the uberVU hackathon, I tried another approach, focused on adjectives, which (lucky me) seemed to work.
But a new study showed that Twitter messages are not useful when it comes to movie predictions. And I agree with them: all of the ideas above are flawed. People are noisy sensors. Aggregating over noisy sensors does not result in the right answer, just in an estimate of it (along with an uncertainty level).
But there is one way to reduce the uncertainty level down to a negligible value: use tweets from psychics. The problem with this approach is identifying “psychic” tweets. Obviously, there are very few psychics in the world, so identifying their tweets is not trivial.
I used a simple rule-based filtering approach: I picked only tweets that don’t contain a question (no ‘?’) and in which the author expresses certainty about who the winner will be (the phrase ‘will win’ appears in the tweet, but neither ‘think’ nor ‘hope’ does).
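The rule above can be sketched in a few lines of Python. This is only a minimal sketch, not the code I actually ran: the function name `is_psychic_tweet`, the case-insensitive matching and the plain substring checks (so ‘thinking’ or ‘hopefully’ also disqualify a tweet) are assumptions, and the sample tweets are made up for illustration.

```python
def is_psychic_tweet(text: str) -> bool:
    """Return True if a tweet passes the rule-based "psychic" filter.

    Assumptions (not from the original data pipeline): matching is
    case-insensitive and uses plain substring checks.
    """
    lowered = text.lower()
    # Rule 1: the tweet must not contain a question.
    if "?" in lowered:
        return False
    # Rule 2: the author must state who "will win".
    if "will win" not in lowered:
        return False
    # Rule 3: no hedging words such as "think" or "hope".
    return not any(word in lowered for word in ("think", "hope"))


# Made-up sample tweets, for illustration only.
sample_tweets = [
    "The Artist will win Best Picture",
    "Who will win tonight?",
    "I think Hugo will win",
    "I hope The Help will win",
]

psychic_tweets = [t for t in sample_tweets if is_psychic_tweet(t)]
print(psychic_tweets)
```

Substring matching keeps the filter strict: it would rather throw away a real psychic than let an ordinary hopeful fan through.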
For the proof of concept, I used the corpus from the previous hackathon – 62000 tweets recorded in the week before the Oscars, each tweet assigned to the movie it refers to. The movie “The Tree of Life” has just 2100 tweets, while “The Artist” goes up to 19200. Out of the 62000 tweets, only 98 remain after filtering. Let’s see how they are distributed:
So there you have it – the power of psychic tweets, predicting the Oscar winner!
Disclaimer: While the data and results are real, I hope you enjoyed this April 1st prank 🙂