‘Sentiment Analysis’ – How to deal with Systemic Rebellion?
April 5, 2011
“Sentiment Analysis” is the fancy technical name for summarizing a group's subjective opinion about something into, usually, a quantitative rating. For example, how can we tell from Yelp reviews whether people love “French Laundry” more than “Chef Liu”? Obvious sources for such analysis include social networks, Amazon/Netflix/Yelp reviews, TechCrunch comments, and so on.
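To make “summarizing opinion into a quantitative rating” concrete, here is a minimal Python sketch of lexicon-based scoring, which is only one simple (and crude) approach among many. The word list, the `score_review` and `aggregate` helpers, and the sample reviews are all hypothetical; a real system would use a much larger lexicon or a trained model.

```python
import re
from statistics import mean

# Hypothetical toy lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "love": 1, "great": 1, "delicious": 1, "amazing": 1,
    "bland": -1, "rude": -1, "overpriced": -1, "terrible": -1,
}

def score_review(text: str) -> float:
    """Mean lexicon score of the words in `text`; 0.0 if no word matches."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [LEXICON[w] for w in words if w in LEXICON]
    return float(mean(hits)) if hits else 0.0

def aggregate(reviews: list[str]) -> float:
    """Summarize a group of reviews into one number in [-1, 1]."""
    return float(mean(score_review(r) for r in reviews)) if reviews else 0.0

# Hypothetical sample reviews for two unnamed restaurants.
restaurant_a = ["Amazing tasting menu, I love it", "Great service but overpriced"]
restaurant_b = ["Delicious dumplings", "Rude staff and bland soup"]

print(aggregate(restaurant_a), aggregate(restaurant_b))  # 0.5 0.0
```

Averaging per review first (rather than pooling all words) keeps one long review from dominating the group score.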
A while ago, after analyzing 1000+ book reviews sorted by date, I came to a realization: books tend to get higher, more positive reviews near their release date. The better known the author, the better the immediate reviews seem to be. This makes intuitive sense. Most early readers have waited a long time to read, say, the newest Harry Potter. Some of them also fall under the spell of the “sunk cost fallacy”, somewhat like how you start noticing more Fords on the street right after you buy one. The investment of time, money, and emotion often leads these readers to overlook weaknesses in a popular author's new book.
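One way to test the “early reviews are kinder” observation on your own data is to split reviews by their age relative to the release date and compare average ratings. The sketch below assumes each review is a `(date, stars)` pair; the 14-day window, the `early_vs_late` helper, and the sample data are hypothetical choices for illustration, not the analysis described above.

```python
from datetime import date
from statistics import mean

def early_vs_late(reviews, release, window_days=14):
    """Return (mean stars within `window_days` of release, mean stars afterwards).

    `reviews` is an iterable of (review_date, stars) pairs. Reviews dated
    before the release (negative age) count as early. A side with no
    reviews yields None.
    """
    reviews = list(reviews)  # allow a single-pass iterator as input
    early = [s for d, s in reviews if (d - release).days < window_days]
    late = [s for d, s in reviews if (d - release).days >= window_days]
    return (float(mean(early)) if early else None,
            float(mean(late)) if late else None)

# Hypothetical reviews for a book released on 2011-01-01.
sample = [
    (date(2011, 1, 2), 5), (date(2011, 1, 5), 5), (date(2011, 1, 10), 4),
    (date(2011, 2, 1), 3), (date(2011, 3, 1), 4), (date(2011, 3, 15), 2),
]
early, late = early_vs_late(sample, release=date(2011, 1, 1))
print(f"early: {early:.2f}, later: {late:.2f}")  # early: 4.67, later: 3.00
```

A fixed window is the simplest split; with enough data, plotting a rolling mean of ratings against days since release would show the decay more clearly.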
However, this model of mine was just turned upside down. I had been eagerly waiting for the new book by one of my favorite fiction authors. It was released today, and my copy should arrive from Amazon any moment. I casually checked the Amazon reviews, which are typically 5 stars in the first week (the same goes for James Patterson!). Except this time, the reviews were scathing: 35 of 44 reviewers gave it ONE star, none of them for the story or the writing. It seems Amazon was charging more for the Kindle version than for the hardcover. Users came together in the “Discussion Forum” and decided to send a clear and consistent message by drubbing the book. Some reviewers even apologized to the author for their bad reviews, posted without having read the book.
So, how could I handle this challenge in ‘Sentiment Analysis’? One possible way is to normalize “extreme reviews” that share a consistently similar tone (in this case, a rant against Amazon rather than about the book). Another is to account for the timing of reviews and feedback: are they unnaturally clustered within a short time frame? Yet another is to link the same identities across other ‘networks’ (in this case, the forum data). A very interesting challenge indeed.
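The three ideas above can be combined into a rough sketch that down-weights suspect reviews instead of discarding them. Everything here is a hypothetical illustration: the `Review` record, the keyword list (a crude stand-in for real tone similarity), the one-star burst threshold, the forum username set, the 0.1 weight, and the sample reviews are assumptions, not a tested method.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass
class Review:
    reviewer: str  # reviewer identity (username)
    day: date      # date the review was posted
    stars: int     # 1..5 star rating
    text: str

# Hypothetical phrases marking a complaint about the seller, not the book.
OFF_TOPIC_PHRASES = ("kindle price", "boycott")

def is_off_topic(r: Review) -> bool:
    """Heuristic 1: the review's tone targets pricing rather than content."""
    text = r.text.lower()
    return any(p in text for p in OFF_TOPIC_PHRASES)

def burst_days(reviews: list[Review], threshold: int) -> set[date]:
    """Heuristic 2: days with at least `threshold` one-star reviews."""
    counts = Counter(r.day for r in reviews if r.stars == 1)
    return {d for d, n in counts.items() if n >= threshold}

def adjusted_mean(reviews: list[Review], forum_members: set[str],
                  threshold: int = 5, flagged_weight: float = 0.1) -> float:
    """Weighted mean rating; reviews flagged by any heuristic get `flagged_weight`.

    Heuristic 3: reviewers whose identity also appears in `forum_members`
    (e.g. usernames from the discussion thread) are treated as campaign members.
    """
    if not reviews:
        raise ValueError("no reviews to average")
    bursts = burst_days(reviews, threshold)
    total = weight_sum = 0.0
    for r in reviews:
        flagged = (
            is_off_topic(r)
            or r.reviewer in forum_members
            or (r.stars == 1 and r.day in bursts)
        )
        w = flagged_weight if flagged else 1.0
        total += w * r.stars
        weight_sum += w
    return total / weight_sum

# Hypothetical reviews on and just after release day.
release = date(2011, 4, 5)
reviews = [
    Review("a", release, 1, "Boycott! The Kindle price is absurd."),
    Review("b", release, 1, "Sorry to the author, but the Kindle price is too high."),
    Review("c", release, 1, "One star until the price comes down."),
    Review("d", release, 5, "Loved the story."),
    Review("e", date(2011, 4, 6), 4, "Solid plot, slow middle."),
]
naive = sum(r.stars for r in reviews) / len(reviews)
adjusted = adjusted_mean(reviews, forum_members={"c"}, threshold=3)
print(f"naive: {naive:.2f}, adjusted: {adjusted:.2f}")  # naive: 2.40, adjusted: 4.04
```

Down-weighting rather than dropping keeps a trace of the protest in the score, which matters because some one-star bursts on release day may be genuine.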
At the very least, from now on I will be more careful before stating in public that ‘earlier book reviews are always more forgiving’!