State of Data #87
February 24, 2012 2 Comments
#analysis – Why retailers like J.C. Penney and GameStop are shutting down their Facebook stores
#architecture – MapReduce Patterns, Algorithms and Use Cases
#big_data – How companies are using big data, the latest round trip –
- Target (yeah, that pregnancy meme)
- Pulse – to drive rich user features
- Facebook – ‘weblining – when you may be refused health insurance based on your Google search about a medical condition’
- Heroku – send SQL query results as a URL
#Data_Science – The term ‘Data Science’ existed 10 years ago – Data Science paper (pdf) from William Cleveland (2001) indicates six areas to master, most important being ‘Multidisciplinary Investigations’.
#DBMS – Latest MySQL version claims a significant performance improvement, especially with NoSQL patterns (1B queries/minute)
#idea – ‘GOOD with numbers? Fascinated by data? The sound you hear is opportunity knocking.’
#learning – **** ‘Machine Learning for Hackers’ is now available on Safari and on Amazon. Great book!
#visualization – THREE Data Visualization Books – from decades ago –
#etc
- Data viz of 10,000 taxis in Manhattan
- iPad reading speed is 6.2% slower, and Kindle 10.7% slower, than a printed book. The good news is that the gap is shrinking rapidly
- Guilt free. Finally!! The concept of ‘Statistical Significance’ would never have existed without beer
- Alan Turing’s Library List, and his report cards – (Mathematics) Not very good.



“… – Data Science paper (pdf) from William Cleveland (1981) indicates …” I’ve been tracking this doc, and it appears to have been published in 2001:
http://stat.bell-labs.com/wsc/webpapers.html
http://www.stat.purdue.edu/~wsc/papers.html
Is there something I’m missing?
You are right. My bad. Fixed.