Can you really use Twitter to work out the fortunes of the X Factor contestants and predict who is beating whom? I’m no expert on Twitter, but thanks to Toby at http://anlytk.wordpress.com there’s a set of figures available for positive and negative sentiment for each contestant each week. There are two obvious problems in turning these into predictions, though:

1 – How do you deal with negative comments? The phone vote picks up favourite acts, but it’s clear that some of the acts in trouble had many positives and also plenty of negatives.

2 – How do you factor in that some acts will have a big fanbase on Twitter, while others who are doing fine on the phone vote won’t have much support there?

For 1, the fairest way seems to be to subtract a proportion of the negatives from the positives. So score = positive comments – (k × negative comments), where k is between 0 and 1. What should this k be, though? k = 0 ignores negative comments altogether; k = 1 assumes each negative cancels out a positive. Somewhere in between is probably right.
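To make the formula concrete, here is a minimal Python sketch. The function name `sentiment_score`, the default k = 0.5 and the comment counts are all my own placeholders for illustration, not figures from the anlytk data.

```python
def sentiment_score(positive: int, negative: int, k: float = 0.5) -> float:
    """Net Twitter score: positives minus a fraction k of the negatives."""
    if not 0.0 <= k <= 1.0:
        raise ValueError("k is expected to lie between 0 and 1")
    return positive - k * negative


# Hypothetical counts, purely to show how k changes the score.
for k in (0.0, 0.5, 1.0):
    print(k, sentiment_score(1000, 400, k))
```

With these made-up counts the score falls from 1000 (negatives ignored) to 600 (each negative cancels a positive), so the choice of k matters most for acts that attract a lot of negative comment.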

Question 2 is easier: although the raw numbers don’t tell you very much, the ratios and changes from week to week give big clues. If Act A has 1000 positive comments in each of weeks 1 and 2, and Act B has 300 in week 1 and 800 in week 2, that tells me Act B has done far better relative to Act A in the second week. I also assume that an act’s fanbase doesn’t change too much during the competition; this means that a single weighting factor per act can be used to multiply that act’s Twitter score up to give a fair comparison.
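Here is a rough sketch of that idea in Python, using the Act A / Act B numbers from the example. The helper names and the weights (1.0 and 2.0) are invented for illustration; the real weights are whatever trial and error turns up.

```python
def relative_change(act_counts, rival_counts):
    """Week-by-week ratio of one act's score to a rival's."""
    return [a / r for a, r in zip(act_counts, rival_counts)]


def vote_shares(scores, weights):
    """Multiply each act's score by its fixed weight, then convert to percentages."""
    weighted = {act: scores[act] * weights[act] for act in scores}
    total = sum(weighted.values())
    return {act: 100.0 * value / total for act, value in weighted.items()}


act_a = [1000, 1000]  # positive comments in weeks 1 and 2
act_b = [300, 800]

# B goes from 0.3 of A's total to 0.8 of it: a big relative gain in week 2.
print(relative_change(act_b, act_a))

# Hypothetical weights: B's Twitter following under-represents its phone vote.
weights = {"A": 1.0, "B": 2.0}
for week in range(2):
    print(week + 1, vote_shares({"A": act_a[week], "B": act_b[week]}, weights))
```

The weights stay the same every week, so any movement in the estimated shares comes only from the change in Twitter scores.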

There are plenty of constraints to work with: the bottom 2 acts are announced each week, and there are credible newspaper reports on how the voting is going. Together these give:

- (voting) week 1, Frankie is bottom (Nu Vibe ignored), Kitty is near the bottom, The Risk 2nd, Janet 1st
- week 2, Kitty/Sami bottom 2, Johnny 2nd, Janet 1st
- week 3, Misha/Sophie bottom 2, Johnny 2nd, Janet 1st
- week 4, The Risk bottom, Kitty/Johnny next, Marcus 2nd, Janet 1st
- week 5, Kitty/Misha bottom 2
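Constraints like these can be checked mechanically against a set of estimated vote shares. The sketch below is one possible way to do it in Python; the function names are mine, and the percentages are invented placeholders rather than real estimates.

```python
def ranking(shares):
    """Acts ordered from highest estimated vote share to lowest."""
    return sorted(shares, key=shares.get, reverse=True)


def matches_week(shares, top=(), bottom=()):
    """True if the leading acts equal `top` in order and the trailing
    acts are exactly those in `bottom` (in any order)."""
    order = ranking(shares)
    top_ok = tuple(order[:len(top)]) == tuple(top)
    bottom_ok = set(order[len(order) - len(bottom):]) == set(bottom)
    return top_ok and bottom_ok


# Placeholder shares (NOT real figures), tested against the week 2 constraint:
# Kitty/Sami bottom 2, Johnny 2nd, Janet 1st.
example = {"Janet": 20.0, "Johnny": 15.0, "Marcus": 12.0, "Kitty": 5.0, "Sami": 4.0}
print(matches_week(example, top=("Janet", "Johnny"), bottom=("Kitty", "Sami")))
```

A check like this takes the eyeballing out of the trial and error: change k or a weight, recompute the shares, and see which weeks still pass.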

So can we find a value of k, and weighting factors for each act, which take the Twitter stats and give the results above? After plenty of trial and error, here’s the closest I can get:

Weeks 1, 2 and 5 match all the constraints.

Week 3 is exactly right except for The Risk scoring, er, 0% in the vote. This was the week when Ashley morphed into Ashford, so a strange week for them overall, and I can’t come up with a decent explanation. I correctly have Misha and Sophie as the next two behind them, though.

Week 4 has the correct top 2, and if I could swap my bottom 2 of Kitty/The Risk, and the next two of Frankie/Johnny, it would be exactly right.

How confident am I about this? This is all new and unproven, but I am amazed how closely it’s possible to map the Twitter figures onto the results. I’m much less confident about Janet: I’ve no upper bound for her as she’s allegedly won every week so far, so I’ve chosen to put her just ahead of Marcus in week 4, but she could be much higher. Amelia is also an unknown, as we just don’t have enough data for her. Just to get her on the graph I’ve given her the same weighting as Little Mix, as they may be fishing in the same area for votes.

Up to you whether you think this is accurate or not, but if so:

- Janet has been drawn back to the pack but still looks comfortable
- Marcus has increased vote share every week.
- After a strong week 3 (Set Fire to the Rain) Craig is on a fast downward spiral.
- Johnny was polling very strongly, but producer influence is so great that they got him from 2nd to out in a week.
- Little Mix are polling very steadily after a shaky start.
- Sympathy bounces are substantial; after being in the bottom 2, Frankie shows as 3rd in week 2, Kitty is up to 6th and Misha 4th. Or maybe Twitter just exaggerates how much these acts are being talked about.
- Janet and Marcus are credible candidates for the win, possibly with Amelia joining them.
- Craig and Misha likely to be the bottom 2 this week, if Little Mix are given a high-profile vocal arrangement.

So… what do you think? I may be setting myself up for a kicking at the end of the series if this is way off, but there does seem to be a strong pattern over the weeks. I’ve tried something similar based on YouTube views, but the patterns seem far weaker (and contradict each other in places).

I’ll aim to predict next week’s voting percentages when the anlytk stats are released. Comments, criticisms and suggestions are welcome. I post as tpfkar at sofabet.com, and thesameusername@hotmail.com gets me too.

Thanks for reading

Update November 21st: See here for the first live attempt at using this for a vote prediction.