Can you really use Twitter to work out the fortunes of the X Factor contestants and predict who is beating whom? I'm no expert on Twitter, but thanks to Toby at http://anlytk.wordpress.com there's a set of figures available for positive and negative sentiment for each contestant each week. There are two obvious problems in turning these into predictions, though:
1 - How do you deal with negative comments? The phone vote picks up favourite acts, but it's clear that some of the acts in trouble had many positives but also plenty of negatives.
2 - How do you factor in that some acts will have a big fanbase on Twitter, while others who are doing fine on the phone vote won't have much support there?
For 1, the fairest way seems to be to subtract a proportion of the negatives from the positives. So score = positive comments - (k * negative comments), where k is between 0 and 1. What should this k be, though? k = 0 ignores negative comments; k = 1 assumes each negative cancels out a positive. Somewhere in between is probably right.
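As a minimal sketch of that scoring rule in Python (the function name and the counts are made up for illustration; they are not real anlytk figures):

```python
def twitter_score(positive, negative, k):
    """Positive comments minus a fraction k of the negative comments."""
    if not 0 <= k <= 1:
        raise ValueError("k should lie between 0 and 1")
    return positive - k * negative


# Made-up counts for one act in one week (not real anlytk data).
positive, negative = 1000, 400

# Show how the score moves as k goes from "ignore negatives" to
# "each negative cancels a positive".
for k in (0.0, 0.5, 1.0):
    print(k, twitter_score(positive, negative, k))
```

The larger k is, the harder an act with lots of negative chatter gets marked down.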
Question 2 is easier; although the raw numbers don't tell you very much, the ratios and changes from week to week give big clues. If Act A has 1000 positive comments in weeks 1 and 2, and Act B has 300 in week 1 and 800 in week 2, that tells me Act B has done far better relative to Act A in the second week. I also assume that acts don't change fanbase too much during the competition; this means that a single weighting factor per act can be used to multiply that act's Twitter score up to give a fair comparison.
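Here is a rough sketch of that idea using the Act A / Act B numbers above. The weighting factors are hypothetical, and turning the weighted scores into shares of the total is my assumption about how to compare them, purely for illustration:

```python
def vote_shares(scores, weights):
    """Scale each act's score by its weighting factor, then normalise to shares."""
    weighted = {act: weights[act] * score for act, score in scores.items()}
    total = sum(weighted.values())
    return {act: value / total for act, value in weighted.items()}


# Positive-comment counts from the example in the text.
week1 = {"Act A": 1000, "Act B": 300}
week2 = {"Act A": 1000, "Act B": 800}

# Hypothetical weighting factors: Act B's fans are assumed to be
# under-represented on Twitter, so its score is multiplied up.
weights = {"Act A": 1.0, "Act B": 2.0}

for label, week in (("week 1", week1), ("week 2", week2)):
    ratio = week["Act B"] / week["Act A"]
    print(label, "B/A ratio:", ratio, "shares:", vote_shares(week, weights))
```

Because the weights stay fixed from week to week, any change in an act's share comes from the change in its Twitter numbers, not from its underlying fanbase.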
There are plenty of constraints, since the bottom 2 acts are announced and there are credible newspaper reports on how the voting is going. Together these give:
So can we find a value of k, and weighting factors for each act, which take the Twitter stats and give the results above? After plenty of trial and error, here's the closest I can get:
Weeks 1, 2 and 5 match all the constraints.
Week 3 is exactly right except for The Risk scoring, er, 0% in the vote. This was the week when Ashley morphed into Ashford, so a strange week for them overall - I can't come up with a decent explanation. I correctly have Misha and Sophie as the next two behind them, though.
Week 4 has the correct top 2, and if I could swap my bottom 2 of Kitty/The Risk, and the next two of Frankie/Johnny, it would be exactly right.
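For anyone who wants to automate the trial and error, here is a rough brute-force sketch. It is not how the figures above were produced: the acts, counts, grids and the "known bottom two" below are all invented, and the only constraint checked is whether the predicted bottom two match.

```python
from itertools import product


def bottom_two(pos, neg, weights, k):
    """Return the two acts with the lowest weighted score for one week."""
    scores = {act: weights[act] * (pos[act] - k * neg[act]) for act in pos}
    return set(sorted(scores, key=scores.get)[:2])


def search(weeks, acts, k_grid, weight_grid):
    """Try every (k, weights) combination; keep the one matching most weeks."""
    best = None
    for k in k_grid:
        for combo in product(weight_grid, repeat=len(acts)):
            weights = dict(zip(acts, combo))
            matched = sum(
                bottom_two(pos, neg, weights, k) == known
                for pos, neg, known in weeks
            )
            if best is None or matched > best[0]:
                best = (matched, k, weights)
    return best


# Invented data: (positive counts, negative counts, known bottom two) per week.
acts = ["A", "B", "C"]
weeks = [
    ({"A": 1000, "B": 300, "C": 500}, {"A": 200, "B": 100, "C": 400}, {"B", "C"}),
    ({"A": 1000, "B": 800, "C": 600}, {"A": 600, "B": 100, "C": 100}, {"A", "C"}),
]

matched, k, weights = search(
    weeks, acts, k_grid=(0.0, 0.25, 0.5, 0.75, 1.0), weight_grid=(0.5, 1.0, 2.0)
)
print("weeks matched:", matched, "of", len(weeks), "k =", k, "weights =", weights)
```

With more acts the number of combinations grows quickly, and many different settings can satisfy the same constraints, so a fit like this shows what is consistent with the results rather than proving the weights are right.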
How confident am I about this? This is all new and unproven, but I am amazed at how closely it's possible to map the Twitter figures onto the results. I'm much less confident about Janet; I've no upper bound for her as she's allegedly won every week so far, so I've chosen to put her just ahead of Marcus in week 4; she could be much higher. Amelia is also an unknown; we just don't have enough data for her, but just to get her on the graph I've given her the same weighting as Little Mix, as they may be fishing in the same area for votes.
Up to you whether you think this is accurate or not, but if so:
So... what do you think? I may be setting myself up for a kicking at the end of the series if this is way off, but there does seem to be a strong pattern over the weeks. I've tried something similar based on YouTube views, but the patterns seem far weaker (and contradict each other in places).
I'll aim to predict next week's voting percentages when the anlytk stats are released. Comments/criticisms/suggestions welcome; I post as tpfkar at sofabet.com, and email@example.com gets me too.
Thanks for reading
Update November 21st: See here for the first live attempt at using this for a vote prediction.