Can Twitter predict X Factor voting?

Can you really use Twitter to work out the fortunes of the X Factor contestants and predict who is beating whom? I’m no expert on Twitter, but thanks to Toby at http://anlytk.wordpress.com there’s a set of figures available for positive and negative sentiment for each contestant each week. There are two obvious problems in turning these into predictions, though:

1 – how do you deal with negative comments? The phone vote picks up favourite acts, but it’s clear that some of the acts in trouble had many positives but also plenty of negatives.
2 – how do you factor in that some acts will have a big fanbase on Twitter, while others who are doing fine on the phone vote won’t have much support there?

For 1, the fairest way seems to be to subtract a proportion of the negatives from the positives. So score = positive comments – (k*negative comments), where k is between 0 and 1. What should k be, though? k = 0 ignores negative comments; k = 1 assumes each negative cancels out a positive. Somewhere in between is probably right.
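To make the formula concrete, here is a minimal Python sketch of it. The function name, the default k of 0.5 and the comment counts are all made up for illustration; they are not values from the anlytk figures.

```python
def sentiment_score(positive, negative, k=0.5):
    """Positive comments minus a fraction k of the negative comments."""
    if not 0 <= k <= 1:
        raise ValueError("k should be between 0 and 1")
    return positive - k * negative


# Hypothetical act with 1000 positive and 400 negative comments.
ignore_negatives = sentiment_score(1000, 400, k=0)    # negatives ignored
half_weight = sentiment_score(1000, 400, k=0.5)       # each negative costs half a positive
full_weight = sentiment_score(1000, 400, k=1)         # each negative cancels a positive
print(ignore_negatives, half_weight, full_weight)
```

The higher k goes, the more an act with lots of chatter of both kinds gets pulled down relative to an act with fewer but mostly positive comments.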

Question 2 is easier; although the raw numbers don’t tell you very much, the ratios and changes from week to week give big clues. If Act A has 1000 positive comments in weeks 1 and 2, and Act B has 300 in week 1 and 800 in week 2, then Act B has gone from 30% of Act A’s count to 80%, which tells me Act B has done far better relative to Act A in the second week. I also assume that acts don’t change fanbase too much during the competition; this means that a single weighting factor per act can be used to multiply that act’s Twitter score up to give a fair comparison.
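As a rough sketch of how a fixed weighting per act could turn scores into vote shares, the Python below multiplies each act's score by its weight and normalises to 100%. The weights (1.0 and 2.0) are invented for the example, and clamping negative scores to zero is my own assumption here, not something taken from the figures.

```python
def vote_shares(scores, weights):
    """Turn per-act scores into percentage shares using one fixed weight per act."""
    # Assumption: a negative score is treated as zero support.
    weighted = {act: max(scores[act], 0.0) * weights[act] for act in scores}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no positive scores to share out")
    return {act: 100 * value / total for act, value in weighted.items()}


# Hypothetical weights: Act B's fans are assumed to tweet half as much as Act A's.
weights = {"Act A": 1.0, "Act B": 2.0}

week1 = vote_shares({"Act A": 1000, "Act B": 300}, weights)
week2 = vote_shares({"Act A": 1000, "Act B": 800}, weights)
print(week1)
print(week2)
```

Because the weights stay the same from week to week, any change in an act's share comes only from the change in its Twitter score.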

How can I tell how the remaining acts are doing?

There are plenty of constraints, since the bottom 2 acts are announced each week, and there are credible newspaper reports on how the voting is going. Together these give:

So can we find a value of k, and weighting factors for each act, which take the Twitter stats and give the results above? After plenty of trial and error, here’s the closest I can get:
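For anyone wanting to try this sort of trial and error in code, here is a small Python sketch. It is not the method behind my numbers, just an illustration: it holds the weights fixed, tries a handful of k values and counts how many weeks the predicted bottom 2 match the announced bottom 2. The acts, comment counts, weights and bottom 2 are all made up.

```python
def bottom_two(scores):
    """Return the two acts with the lowest scores."""
    return set(sorted(scores, key=scores.get)[:2])


def fit_k(weeks, weights, known_bottom_two, candidates):
    """Pick the candidate k whose predicted bottom 2 matches the most weeks."""
    best_k, best_hits = None, -1
    for k in candidates:
        hits = 0
        for week, counts in weeks.items():
            # counts maps act -> (positive comments, negative comments)
            weighted = {
                act: max(pos - k * neg, 0.0) * weights[act]
                for act, (pos, neg) in counts.items()
            }
            if bottom_two(weighted) == known_bottom_two[week]:
                hits += 1
        if hits > best_hits:  # keep the first k that reaches the best match count
            best_k, best_hits = k, hits
    return best_k, best_hits


# Entirely hypothetical data for a single week.
weeks = {1: {"A": (1000, 200), "B": (300, 50), "C": (500, 600), "D": (400, 100)}}
weights = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
known_bottom_two = {1: {"B", "C"}}

result = fit_k(weeks, weights, known_bottom_two, [0, 0.25, 0.5, 0.75, 1.0])
print(result)
```

In this made-up week, ignoring negatives (k = 0) puts B and D at the bottom, while any k from 0.25 upwards drags C down into the bottom 2 instead. A fuller search would vary the weights as well as k.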

How accurate is this against published results?

Weeks 1, 2 and 5 match all the constraints.
Week 3 is exactly right except for The Risk scoring, er, 0% in the vote. This was the week when Ashley morphed into Ashford, so a strange week for them overall – I can’t come up with a decent explanation. I correctly have Misha and Sophie as the next two behind them, though.
Week 4 has the correct top 2, and if I could swap my bottom 2 of Kitty/The Risk, and the next two of Frankie/Johnny, it would be exactly right.

How confident am I about this? This is all new and unproven, but I am amazed how closely it’s possible to map the Twitter figures onto the results. I’m much less confident about Janet; I’ve no upper bound for her as she’s allegedly won every week so far, so I’ve chosen to put her just ahead of Marcus in week 4; she could be much higher. Amelia is also an unknown; we just don’t have enough data for her, but just to get her on the graph I’ve given her the same weighting as Little Mix, as they may be fishing in the same area for votes.

What does this tell us for the final few weeks?

Up to you whether you think this is accurate or not, but if so:

So…what do you think? I may be setting myself up for a kicking at the end of the series if this is way off, but there does seem to be a strong pattern over the weeks. I’ve tried something similar based on YouTube views, but the patterns seem far weaker (and contradict each other in places).

I’ll aim to predict next week’s voting percentages when the anlytk stats are released. Comments/criticisms/suggestions welcome; I post as tpfkar at sofabet.com, and thesameusername@hotmail.com gets me too.

Thanks for reading

Update November 21st: See here for the first live attempt at using this for a vote prediction.