State of Data Last Week – Oct 02
October 3, 2010
<Cool Numbers> Data can ban grunting in tennis; are female IMDB users far less tolerant of bad movies?; the $ value of a 0.1% increase in employee engagement.
- Female IMDB users are far less tolerant of bad movies than male users. For “Top 50” movies, the male:female rater ratio is about 5:1. For “Bottom 10” movies, the male:female rater ratio drops to 3:2.
- Opponents of tennis players who grunt during the serve (e.g., Maria Sharapova – 100 decibels) respond significantly slower (by 21-33 ms). In professional tennis, this delay translates to the ball travelling 2 extra feet before the opponent can respond.
- Talent Analytics – Starbucks and Best Buy can apparently identify the value of a 0.1% increase in engagement among employees at a particular store. At Best Buy, for example, that value is more than $100,000 in the store’s annual operating income.
- Measured by unique visitors, PayPal was the #1 US FI (Financial Institution) in August. It had roughly 5 million more unique visitors than the #2 (Chase).
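A quick back-of-the-envelope check on the first two numbers above, as a minimal Python sketch. The ratios and the 21-33 ms / 2 feet figures are the ones quoted in the bullets; the helper names are made up for illustration, and the ball-speed reading assumes the 2 extra feet is simply the distance the ball covers during the delay.

```python
def female_share(male: float, female: float) -> float:
    """Fraction of raters who are female, given a male:female ratio."""
    return female / (male + female)


def implied_ball_speed_fps(extra_feet: float, delay_ms: float) -> float:
    """Ball speed in feet per second implied by extra travel during a delay."""
    return extra_feet / (delay_ms / 1000.0)


FPS_TO_MPH = 3600.0 / 5280.0  # feet per second -> miles per hour

top50_share = female_share(5, 1)      # 5:1 ratio for "Top 50" movies
bottom10_share = female_share(3, 2)   # 3:2 ratio for "Bottom 10" movies

speed_at_21ms = implied_ball_speed_fps(2, 21)
speed_at_33ms = implied_ball_speed_fps(2, 33)

print(f"Female share of raters, Top 50:    {top50_share:.1%}")
print(f"Female share of raters, Bottom 10: {bottom10_share:.1%}")
print(f"Implied ball speed at 21 ms: {speed_at_21ms * FPS_TO_MPH:.0f} mph")
print(f"Implied ball speed at 33 ms: {speed_at_33ms * FPS_TO_MPH:.0f} mph")
```

Read this way, a 5:1 ratio means about one rater in six is female, while 3:2 means two in five. Note that the ratios show women are over-represented among raters of the worst movies; they do not show the ratings those women gave.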
<Analysis> How to identify and stay clear of “Faux Marketing Metrics” that serve no real purpose (including, maybe, the ones in this newsletter).
<Strategy/Arch> The worst metaphor in Cloud Computing is apparently “Cloud in a box” (Thanks Oracle!). NoSQL is #9, and “Cloud Computing” itself is at #15.
<Big Data> LinkedIn analytics team looks under the hood of Signal (heavily using Lucene) – social search for LinkedIn-Twitter accounts.
<Schema> How Google did incremental real-time search indexing with “Percolator” (on BigTable) and reduced the average age of documents by 50% – excellent white paper by Googlers from USENIX 2010. Percolator is shown to be 1000x faster than traditional MapReduce.
<DBMS> How Oracle RAC handles contention when one node tries to read a block that another node is writing at the same time – the “gc buffer busy waits” and the tactics to address them are explained really well.
A very sub-optimal way to convert an IP address to an integer in MySQL (published in a book, no less!) is shown here.
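For reference, the conversion itself is just four octets packed into 32 bits, which is what MySQL's built-in INET_ATON() function already does. The sketch below shows the arithmetic in Python rather than SQL; the function name `ipv4_to_int` is made up for illustration, and it is not the code from the linked post or the book.

```python
import ipaddress


def ipv4_to_int(addr: str) -> int:
    """Pack a dotted-quad IPv4 address into a single 32-bit integer."""
    parts = addr.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four octets, got {addr!r}")
    value = 0
    for part in parts:
        octet = int(part)
        if not 0 <= octet <= 255:
            raise ValueError(f"octet out of range in {addr!r}")
        # Shift the running value left by one byte and append this octet.
        value = (value << 8) | octet
    return value


example = ipv4_to_int("192.168.1.10")
# Cross-check against the standard library's own conversion.
stdlib_value = int(ipaddress.IPv4Address("192.168.1.10"))
print(example, stdlib_value)
```

In practice the standard-library call (or INET_ATON() inside MySQL) is the better choice; the hand-rolled loop is only there to show what the conversion does.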
<Visualization> Crappy iPhone signal is not a problem anymore. Bring Chuck Norris closer to the iPhone, all bars will light up instantly!
<Cocktail party cheat-sheet> Thinking of adding indexes defensively? 31 indexes on a 340-column table could slow down inserts 8x.
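The effect is easy to try on a small scale. The sketch below is a rough experiment, not a reproduction of the 8x figure: it uses an in-memory SQLite database (an assumption, since the database behind the quoted number is not named here), a 40-column table instead of 340 columns, and made-up names such as `time_inserts`. The measured ratio will vary by machine and engine.

```python
import sqlite3
import time


def time_inserts(num_columns: int, num_indexes: int, num_rows: int) -> float:
    """Return seconds taken to bulk-insert rows into a table with N indexes."""
    conn = sqlite3.connect(":memory:")
    cols = [f"c{i}" for i in range(num_columns)]
    conn.execute(f"CREATE TABLE t ({', '.join(c + ' INTEGER' for c in cols)})")
    # One single-column index per indexed column.
    for i in range(num_indexes):
        conn.execute(f"CREATE INDEX idx_{i} ON t ({cols[i]})")

    placeholders = ", ".join("?" for _ in cols)
    rows = [
        tuple((r * 31 + c) % 1000 for c in range(num_columns))
        for r in range(num_rows)
    ]

    start = time.perf_counter()
    with conn:  # commit once at the end of the batch
        conn.executemany(f"INSERT INTO t VALUES ({placeholders})", rows)
    elapsed = time.perf_counter() - start

    inserted = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    if inserted != num_rows:
        raise RuntimeError("row count mismatch")
    return elapsed


no_index_secs = time_inserts(num_columns=40, num_indexes=0, num_rows=2000)
indexed_secs = time_inserts(num_columns=40, num_indexes=31, num_rows=2000)
print(f"0 indexes:  {no_index_secs:.4f} s")
print(f"31 indexes: {indexed_secs:.4f} s")
print(f"slowdown:   {indexed_secs / no_index_secs:.1f}x")
```

Each index is a separate structure that has to be updated on every insert, so the cost grows with the number of indexes even when none of them is ever used by a query.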