Sentiment analysis and the problem with computational analysis

by Jed

Sentiment analysis is, for me, one of the most annoying phrases in the world (as you may have seen me tweet to Matt the other day). Whenever I hear someone mention it I picture two guys in cheap suits speaking to a group of board members at a big brand, explaining complicated graphs, pointing at a smiley face, an indifferent face and, finally, a sad face, and then telling the brand that 65% of the world thinks its product is ace and says so online. These suits then unveil a slide with a picture of a slick-looking program that 'looks at the whole of the web and computes how people feel about you'. Good stuff.

I’m probably not that far from the truth with my little image. Some of those cheap suits might even be reading this post now. Cool.

To be fair, my issue is less to do with sentiment analysis itself and more to do with how certain companies sell 'automated' sentiment analysis.

The issues with computational sentiment analysis are pretty well documented, and mostly documented by folk much more intelligent than me (that's you, Katie and Jason). The main problem is that computers aren't smart enough to figure out the nuances of human language.

After sifting through literally hundreds of different services that monitor, track and analyse the web (most of which offer sentiment), I found that the typical error rate for computational sentiment analysis is around 60-70%. So, at best, you're getting valid data 40% of the time.

Yep, at least six out of ten pieces of information aren't accurate (i.e. they're pointless). So if the information is only right 40% of the time, and you don't know which 40%, how can you even begin to understand how people feel about your brand/product/CEO? You can't, that's how.
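
To put some entirely made-up numbers on that, here's a quick back-of-an-envelope simulation in Python. It assumes a three-way positive/neutral/negative split and a tool that's right 40% of the time, and shows how far the headline 'X% of mentions are positive' figure drifts from reality:

```python
# A toy simulation, not real data: what does 40% accuracy do to the headline
# number a brand actually cares about ("what % of mentions are positive")?
# Every figure below is an assumption for the sake of illustration.
import random

random.seed(1)
LABELS = ["positive", "neutral", "negative"]

# Assume the "true" split of 10,000 mentions is 50/30/20.
true_labels = ["positive"] * 5000 + ["neutral"] * 3000 + ["negative"] * 2000

def noisy_classifier(true_label, accuracy=0.4):
    """Right 40% of the time; otherwise picks one of the wrong labels at random."""
    if random.random() < accuracy:
        return true_label
    return random.choice([label for label in LABELS if label != true_label])

predicted = [noisy_classifier(t) for t in true_labels]

true_share = true_labels.count("positive") / len(true_labels)
reported_share = predicted.count("positive") / len(predicted)

print(f"True positive share:     {true_share:.0%}")      # 50%
print(f"Reported positive share: {reported_share:.0%}")  # roughly 35%
```

With those assumed numbers the tool reports roughly 35% of mentions as positive when the real figure is 50%, and nothing in the report tells you which individual mentions it got wrong.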

Even with Bayesian classification (the same trick that's usually used to filter spam; it can sort of be applied to sentiment, though it's still in its infancy), Markov chains (a very complicated method you can use to help computers learn; beware, I had to dig out my old A Level Maths textbooks), or even support vector machines (how many, many expensive monitoring companies 'teach' their computers to classify text), there are still massive flaws in how computers try to understand what we mean when we write.
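
For the curious, here's roughly what the Bayesian bit looks like in practice. This is a minimal sketch using scikit-learn's naive Bayes classifier on a tiny made-up training set, not anybody's actual product, and the last line shows the classic failure: sarcasm that a machine counting words reads as glowing praise.

```python
# A minimal sketch of the Bayesian approach mentioned above, using
# scikit-learn's naive Bayes classifier. The training set is made up and
# absurdly small; a real tool would need many thousands of labelled mentions
# and would still trip over sarcasm, slang and context.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "I love this product, it's brilliant",
    "absolutely fantastic service, well done",
    "this is rubbish, total waste of money",
    "terrible experience, never again",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Bag-of-words counts fed into a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# The classic failure case: the words look positive, the meaning isn't.
print(model.predict(["oh great, it broke on day one. brilliant."]))  # ['positive']
```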

Plus, when I say 'I love Washington', do I mean the location, the actor (Denzel) or the cake? Or a friend of mine? Or a TV programme? These sorts of issues are why Google recently bought Metaweb (to make search smarter).
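
Here's a toy illustration of the problem (hypothetical code, not any vendor's actual pipeline): a keyword-plus-lexicon tool sees exactly the same bag of words whichever Washington I meant, so every one of those meanings gets logged as a positive mention of 'Washington'.

```python
# A deliberately naive keyword-and-lexicon scorer. The word lists and the
# brand keyword are made up for illustration.
POSITIVE_WORDS = {"love", "great", "brilliant", "amazing"}
NEGATIVE_WORDS = {"hate", "awful", "rubbish", "terrible"}

def naive_sentiment(text, brand_keyword="washington"):
    """Flags a mention of the tracked keyword and scores it on word counts alone."""
    words = text.lower().replace(",", "").replace(".", "").split()
    if brand_keyword not in words:
        return None  # not a mention at all, as far as this tool can tell
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

# Five different meanings, one identical verdict: the tool has no idea
# which Washington got the love.
mentions = [
    "I love Washington, the views out there are great",    # the place
    "I love Washington in every film he's in",              # Denzel
    "I love Washington cake with my tea",                   # the cake
    "I love Washington, he's been a good mate for years",   # a friend
    "I love Washington, best thing on TV right now",        # a programme
]
for m in mentions:
    print(naive_sentiment(m), "-", m)  # prints "positive" five times
```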

The other problem is that our opinions and sentiment are transient. We're allowed to change our minds, and we frequently do (some of us more than others), so if our content is going to be analysed, how do we deal with that? Do we create aggregate compound sentiment scores, or do we display individual mentions? There are too many complications for a computer, or most humans, to understand.
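
To make the two options concrete, here's a toy example (invented mentions, and the individual scores are assumed to be correct, which is generous given everything above). The aggregate compound score comes out dead neutral, while the individual mentions show someone who loved the product and has just walked away.

```python
# One way to see the "transient opinion" problem: the same person's mentions
# over time, scored +1 / -1, then squashed into a single aggregate number.
# The mentions and dates are invented for illustration.
from datetime import date

mentions = [
    (date(2010, 3, 1),  +1, "this phone is brilliant"),
    (date(2010, 4, 15), +1, "still loving the phone"),
    (date(2010, 6, 2),  -1, "screen cracked, support were useless"),
    (date(2010, 6, 10), -1, "done with this phone, switching"),
]

# Option 1: an aggregate compound score. It looks neutral and hides the story.
aggregate = sum(score for _, score, _ in mentions) / len(mentions)
print(f"Aggregate sentiment: {aggregate:+.2f}")  # +0.00

# Option 2: the individual mentions in order. They show a customer you just lost.
for when, score, text in sorted(mentions):
    print(when, "+1" if score > 0 else "-1", text)
```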

The bottom line is that the best way to analyse sentiment online is to do it yourself.

I’ll see you in six months when you’ve finished…