State of Data #82
January 20, 2012
#analysis – The rapidly changing landscape of mobile data – a nice aggregate of the latest figures for each trend
#architecture – Zero to Hadoop in 5 minutes
#big_data – Extract from ‘Too Big to Know’ – a new book on Big Data and its impact on our brains – ‘designed the Eureqa computer program to find equations that make sense of large quantities of data that have stumped mere humans, including cellular signaling and the effect of cocaine on white blood cells. Eureqa looks for possible equations that explain the relation of some likely pieces of data, and then tweaks and tests those equations to see if the results more accurately fit the data. It keeps iterating until it has an equation that works.’
#Data_Science – In Defense of Online Anonymity – Disqus data shows pseudonymous commenters are the best
#DBMS – Why RAID is so important for databases – A Primer
#idea – A statistician is building an algorithm to forecast when someone will go back to committing a crime – ‘algorithm that forecasts a particular outcome—someone committing murder, for example—Berk applied a subset of the data to “train” the computer on which qualities are associated with that outcome. “If I could use sun spots or shoe size or the size of the wristband on their wrist, I would,”’
#learning – 40 years of boxplots (pdf) – ‘Boxplots use robust summary statistics that are always located at actual data points, are quickly computable (originally by hand), and have no tuning parameters. They are particularly useful for comparing distributions across groups.’
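To make the boxplot excerpt concrete, here is a minimal sketch of the summary statistics behind a boxplot. It assumes Tukey-style hinges (medians of the lower and upper halves) and the conventional 1.5 × IQR fence for whiskers; neither detail comes from the excerpt, and the function name boxplot_stats is made up for illustration.

```python
from statistics import median

def boxplot_stats(values):
    """Return the boxplot summary of a sample, plus any points beyond the fences."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("need at least one value")
    half = n // 2
    # Hinges: medians of the lower and upper halves.
    # When n is odd, the middle value is included in both halves.
    lower = data[: half + (n % 2)]
    upper = data[half:]
    q1, med, q3 = median(lower), median(data), median(upper)
    iqr = q3 - q1
    # Assumption: the usual 1.5 * IQR fences decide what counts as an outlier.
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    return {
        "whisker_low": inside[0],    # most extreme data point inside the lower fence
        "q1": q1,
        "median": med,
        "q3": q3,
        "whisker_high": inside[-1],  # most extreme data point inside the upper fence
        "outliers": [x for x in data if x < lo_fence or x > hi_fence],
    }

stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(stats)
```

On this sample the whiskers stop at 1 and 9, both actual data points, and 100 is reported separately as an outlier rather than stretching the box.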
#etc
- When compassion trumps data – ‘Doctors don’t really have a clue how to predict how long a patient will live.’ Actual paper (PDF) – ‘A patient is eligible for hospice care if they have an estimated life expectancy of six months or less. ... the actual length of stay is usually less than six weeks’
- ‘Top 1%’ is really mostly about ‘Top 0.1%’ – the growth of the top 1% is mostly sustained by the top 0.1%
- Twitterati ‘#fail Chrome sees the words “flume” and “hadoop” in an email, then suggests that the page is in Spanish and should be translated to English’
- CES 2012 Notable Gadget for Data Junkies – Basis – ‘24-hr wristwatch monitor of heart rate, calories burned, level of activity, and duration of sleep. The data is collected and summarized on a beautifully rendered web dashboard that serves as a kind of biometric diary.’
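
Going back to the Eureqa excerpt under #big_data: the loop it describes (propose an equation, test it against the data, keep the better fit) can be sketched in a few lines. This is a drastically simplified stand-in and not Eureqa's actual method. The four candidate forms, the least-squares fit, and the names CANDIDATES, fit_linear and best_equation are all assumptions made for illustration. It also assumes positive x values (for the sqrt and log forms) and at least two distinct ones.

```python
import math

# Hypothetical candidate forms y = a * f(x) + b.
# A real equation-search tool explores a far richer space than this fixed list.
CANDIDATES = {
    "a*x + b": lambda x: x,
    "a*x^2 + b": lambda x: x * x,
    "a*sqrt(x) + b": math.sqrt,
    "a*log(x) + b": math.log,
}

def fit_linear(fs, ys):
    """Least-squares a and b for y = a*f + b, plus the sum of squared errors."""
    n = len(fs)
    mean_f, mean_y = sum(fs) / n, sum(ys) / n
    sff = sum((f - mean_f) ** 2 for f in fs)
    a = sum((f - mean_f) * (y - mean_y) for f, y in zip(fs, ys)) / sff
    b = mean_y - a * mean_f
    sse = sum((a * f + b - y) ** 2 for f, y in zip(fs, ys))
    return a, b, sse

def best_equation(xs, ys):
    """Fit every candidate form and keep the one with the smallest error."""
    results = []
    for name, f in CANDIDATES.items():
        a, b, sse = fit_linear([f(x) for x in xs], ys)
        results.append((sse, name, a, b))
    return min(results)  # tuples compare on the error first

xs = [1, 2, 3, 4, 5]
ys = [3 * x * x + 2 for x in xs]  # data secretly generated by y = 3x^2 + 2
sse, name, a, b = best_equation(xs, ys)
print(name, round(a, 3), round(b, 3))
```

The sketch only picks from a fixed menu, whereas the excerpt describes a program that keeps tweaking and re-testing its equations until one works.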


This is utterly nice, very indepth, thank you.