State of Data #71
October 21, 2011
#analysis – How to (or how not to) predict Nobel Prize winners, year after year
Is cycling unhealthier than driving? Statisticians argue.
#architecture – Comparison between Terracotta and Memcache (disclaimer: from Terracotta, but mostly in line)
#big_data – Could teaching MapReduce be good for beginning undergraduates?
The basic issue is that Google’s narrow MapReduce API conflates logical semantics (define a function over all items in a collection) with an expensive physical implementation (utilize a parallel barrier). As it happens, many common cluster-wide operations over a collection of items do not require a barrier even though they may require all-to-all communication. But there’s no way to tell the API whether a particular Reduce method has that property, so the runtime always does the most expensive thing imaginable in distributed coordination: global synchronization.
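To make the quoted point concrete, here is a minimal single-process sketch in Python. It does not model a cluster or any real MapReduce API; the function names (`map_words`, `reduce_with_barrier`, `reduce_streaming`) are hypothetical. It assumes, for illustration only, that the reduce operation is associative and commutative (word counting with addition), which is one case where partial results can be folded in as they arrive instead of waiting for all map output.

```python
from collections import defaultdict
from functools import reduce
from operator import add


def map_words(line):
    """Map step: emit a (word, 1) pair for each word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_with_barrier(lines, combine):
    """Barrier style: group all map output first, then reduce each key."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_words(line):
            groups[key].append(value)
    # No key is reduced until every mapped value has been collected.
    return {key: reduce(combine, values) for key, values in groups.items()}


def reduce_streaming(lines, combine):
    """Streaming style: fold each value into a running result on arrival."""
    totals = {}
    for line in lines:
        for key, value in map_words(line):
            totals[key] = combine(totals[key], value) if key in totals else value
    return totals


lines = ["big data big ideas", "data beats ideas"]
barrier_counts = reduce_with_barrier(lines, add)
streaming_counts = reduce_streaming(lines, add)
```

Both functions produce the same counts here because addition does not care about grouping or order. An API that cannot be told this has to assume the barrier version is always needed.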
#conference – In PASS 2011, Microsoft declares partnership with Hortonworks to integrate with Hadoop
#Data_Science – pandas is a new data analysis toolkit for Python (on that note, excuse a moment of cuteness)
Finally, we got a Bayesian Nobel recipient.
#DBMS – Do you secure the data or the software? Security expert Pete Finnigan argues that it is always about securing the data (PDF), in a lecture given, of all places, at Bletchley Park.
#idea – Why ‘form factor’ of data should evolve into ‘of courseness’
The old players, much like the old Facebook, presented information in a way the filesystems or databases viewed it. Apple refactored this data back into a familiar interface; one more similar to the ways humans interacted with the same language prior to its digital translation.
#learning – Must-read new book on Safari – ‘Big Data Glossary’ – where else could we learn about Hypertable, MapR, OpenNLP, Fusion Tables and BSON all in the same place!
#visualization – Do word clouds add genuine value, or are they merely the “mullets of the Internet”?
#etc
- Could someone predict what TV channel you’re watching from your smart meter data? This paper (originally in German) shows how they did it.
- Which TV shows is everyone talking about? And which TV shows are Rihanna fans or Diet Coke drinkers talking about? Social Data/TV Leaderboard reveals it all!
- Data + Viz + Social Awareness + Gamification – Living within means with Playspent.org
- How data is helping babies (and their parents!) to sleep better
