State of Data – #45
April 22, 2011
#analysis – Data analysis in action: ‘Yahoo Search Revenue Disaster’
#architecture – ‘Metrics Everywhere’ (entertaining PDF) from the Yammer folks, on how to make better decisions using numbers. Ways to measure: Gauges (# of cities), Counters (# of open connections), Meters (# of req/sec), Histograms (# of cities returned, as percentiles) and Timers (# of ms to respond); plus Vitter’s reservoir-sampling algorithm.
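A histogram cannot keep every value it has ever seen, and that is where reservoir sampling comes in. The sketch below shows one way a histogram could keep a bounded, uniformly random sample using Algorithm R (the classic reservoir-sampling method associated with Vitter) and then report percentiles from it. The class name `ReservoirHistogram`, its methods, the default reservoir size and the nearest-rank percentile rule are all hypothetical choices for illustration. This is not the Yammer or Metrics library API.

```python
import math
import random


class ReservoirHistogram:
    """Hypothetical histogram that keeps a uniform random sample of all
    values seen so far, using reservoir sampling (Algorithm R)."""

    def __init__(self, size=1028, seed=None):
        self.size = size                 # maximum number of values retained
        self.count = 0                   # total number of values ever recorded
        self.sample = []                 # the reservoir itself
        self._rng = random.Random(seed)  # seedable for reproducible runs

    def update(self, value):
        self.count += 1
        if len(self.sample) < self.size:
            # Fill the reservoir until it reaches capacity.
            self.sample.append(value)
        else:
            # Pick a slot in [0, count). It lands inside the reservoir with
            # probability size / count, so every value seen so far is
            # equally likely to be retained.
            j = self._rng.randrange(self.count)
            if j < self.size:
                self.sample[j] = value

    def percentile(self, p):
        """Return the p-th percentile (0-100) of the sampled values,
        using the nearest-rank method."""
        if not self.sample:
            raise ValueError("no values recorded")
        ordered = sorted(self.sample)
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        return ordered[rank - 1]


# Usage: record 10,000 values while keeping only 100 of them in memory.
h = ReservoirHistogram(size=100, seed=42)
for n in range(1, 10_001):
    h.update(n)
print(h.count, len(h.sample))  # 10000 100
print(h.percentile(50))        # approximate median of 1..10000
```

Memory stays fixed at `size` values no matter how many updates arrive. The cost is that percentiles are estimates drawn from the sample, not exact figures.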
#big_data – IEEE VAST 2011 challenge: three mini-challenges on Epidemic Spread, Cybersecurity and Text Analytics, building up to a ‘Grand Challenge’ that combines all the data sets.
#DBMS – Here’s a pretty good NoSQL “book”, or compendium (PDF), from folks at Stuttgart University, Germany. It covers everything from basic concepts to a detailed comparison of MongoDB, CouchDB, Dynamo, Cassandra, etc.
#learning – The ‘Split-Apply-Combine’ strategy for data analysis (PDF), ‘where you break up a big problem into manageable pieces, operate on each piece independently and then put all the pieces back together’. [Journal of Statistical Software]
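The three steps in that quote map directly onto code. Below is a minimal Python sketch of the pattern; the paper itself works in R. The helper `split_apply_combine` is hypothetical, and the flight records are toy values invented purely for illustration.

```python
from collections import defaultdict


def split_apply_combine(rows, key, func):
    """Hypothetical helper: split rows into groups by key(row), apply func
    to each group independently, and combine the results into one dict."""
    # Split: bucket each row under its group key.
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row)
    # Apply + combine: each group is processed on its own (so this step
    # could in principle run in parallel), then results are gathered.
    return {k: func(g) for k, g in groups.items()}


# Toy data, invented for illustration: mean delay per airline.
flights = [
    {"airline": "AA", "delay": 10},
    {"airline": "UA", "delay": 5},
    {"airline": "AA", "delay": 20},
    {"airline": "UA", "delay": 15},
]
mean_delay = split_apply_combine(
    flights,
    key=lambda r: r["airline"],
    func=lambda g: sum(r["delay"] for r in g) / len(g),
)
print(mean_delay)  # {'AA': 15.0, 'UA': 10.0}
```

Because the apply step sees only one group at a time, the same shape scales from an in-memory dict like this one to grouped operations in a data-frame library or a distributed job.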
#visualization – How Switzerland’s Federal Statistical Office changed the game for the census: ‘only a small population is now surveyed by phone or in person’
- See the winning apps in the World Bank Data Apps challenge (mentioned in this space before)
- Sort Algorithms explained by Hungarian folk dancers. Beautiful!
- A goofy way to look at all 49,571 symbols humanity uses to exchange information (the Unicode character set): at 24 fps it takes 33 minutes
- Why do humans love pie charts so much? (PDF) Stephen Few explains their ‘irresistible fascination’ in the latest from Perceptual Edge