State of Data #71

#analysis How to (or not to) predict Nobel Prize winners, year after year

Is cycling unhealthier than driving? Statisticians argue.

Comparison between Terracotta and Memcache (disclaimer: from Terracotta, but mostly in line)

Could teaching MapReduce be good for beginning undergraduates? 

The basic issue is that Google’s narrow MapReduce API conflates logical semantics (define a function over all items in a collection) with an expensive physical implementation (utilize a parallel barrier). As it happens, many common cluster-wide operations over a collection of items do not require a barrier even though they may require all-to-all communication. But there’s no way to tell the API whether a particular Reduce method has that property, so the runtime always does the most expensive thing imaginable in distributed coordination: global synchronization.
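To make the quoted point concrete, here is a minimal single-process Python sketch. It is an illustration only, not Google's or Hadoop's API; the names `map_words`, `reduce_with_barrier` and `reduce_streaming` are hypothetical. It assumes a word-count job, where the reduce operation (addition) is associative and commutative, so partial results can be folded in as they arrive instead of waiting for every map output.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Pair = Tuple[str, int]


def map_words(line: str) -> Iterator[Pair]:
    """Map step: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1


def reduce_with_barrier(lines: Iterable[str]) -> Dict[str, int]:
    """Barrier style: group ALL map output first, then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)
    # Stand-in for the barrier: no reduce runs until every map output is grouped.
    return {key: sum(values) for key, values in groups.items()}


def reduce_streaming(
    lines: Iterable[str],
    combine: Callable[[int, int], int] = lambda a, b: a + b,
) -> Dict[str, int]:
    """Barrier-free style: fold each pair into a running total as it arrives.

    Assumption: `combine` is associative and commutative, so arrival
    order does not change the result.
    """
    totals: Dict[str, int] = {}
    for line in lines:
        for key, value in map_words(line):
            totals[key] = combine(totals[key], value) if key in totals else value
    return totals


if __name__ == "__main__":
    sample = ["the cat sat", "The mat"]
    print(reduce_with_barrier(sample))
    print(reduce_streaming(sample))
```

Both functions return the same counts. The difference is that the streaming version never needs to know that the map phase has finished, which is the property the quote says the MapReduce API gives you no way to declare.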

At PASS 2011, Microsoft announced a partnership with Hortonworks to integrate with Hadoop

#Data_Science – Pandas is a new data analysis toolkit in Python (and, on that excuse, a moment of cuteness)


#Finally, we got a Bayesian Nobel recipient.

#DBMS Do you secure the data or the software? Security expert Pete Finnigan argues that it is always about securing the data (PDF), in a lecture given at, of all places, Bletchley Park.

Why the ‘form factor’ of data should evolve into ‘of courseness’

The old players, much like the old Facebook, presented information in a way the filesystems or databases viewed it. Apple refactored this data back into a familiar interface; one more similar to the ways humans interacted with the same language prior to its digital translation.

Must-read new book on Safari – ‘Big Data Glossary’ – where else could we learn about Hypertable, MapR, OpenNLP, Fusion Tables and BSON all in the same place!

#visualization Do word clouds add genuine value, or are they merely “mullets of the Internet”?


  • Could someone predict what TV channel you’re watching from your smart meter data? This paper (originally in German) shows how they did it.
  • What TV shows is everyone talking about? Now, what TV shows are Rihanna fans or Diet Coke drinkers talking about? The Social Data/TV Leaderboard reveals it all!
  • Data + Viz + Social Awareness + Gamification – Living within means with
  • How data is helping babies (and their parents!) to sleep better

