State of Data #62
August 18, 2011
#analysis – Hotmail product usage data analysis and how it influences the design –
“three types based on their behavior—Filers, Pilers, and Deleters...
Deleters generally delete email after it arrives. Deleters receive an average of 211 email messages each week and end up deleting almost 80% of them... The mantra for these people is, “My kitchen has to be clean before I start cooking.”
Filers put nearly half of their email (44%) into folders immediately after it arrives.
Pilers receive the least amount of email each week (174 messages). But that means they still receive an average of 9,048 email messages per year. Because most of those messages (57%) never leave the Piler’s inbox, their email starts to pile up”
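A quick check of the arithmetic in the quote, as a minimal sketch. The weekly counts and percentages come from the quote itself; the 52-week year and the helper name `per_year` are assumptions made here for illustration, and "almost 80%" is treated as exactly 80%.

```python
WEEKS_PER_YEAR = 52  # assumption: a plain 52-week year


def per_year(per_week):
    """Scale a weekly message count to a yearly one."""
    return per_week * WEEKS_PER_YEAR


# Pilers: 174 messages a week, 57% of which never leave the inbox.
piler_year = per_year(174)
piler_left_in_inbox = round(piler_year * 0.57)

# Deleters: 211 messages a week, almost 80% deleted (treated as 80% here).
deleter_deleted_per_week = round(211 * 0.80)

print("Piler messages per year:", piler_year)
print("Piler messages left in the inbox per year:", piler_left_in_inbox)
print("Deleter messages deleted per week:", deleter_deleted_per_week)
```

The yearly figure matches the 9,048 quoted above, and the 57% share works out to roughly 5,000 messages a year staying in a Piler's inbox.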
#analysis – Google has started certification on Analytics with detailed “Analytics IQ Lessons” culminating in an exam
#big_data – Whole controversy around KissMetrics Data Collection practices and their official response to the allegations
#conference – ACM Data Mining Camp, October 2011 – “local, cheap, and high-quality learning opportunity”
#Data_Science – Verifying Benford’s Law on Tweets - it works!
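For anyone who wants to try a Benford check themselves, here is a minimal sketch. Benford's Law says the leading digit d should show up with probability log10(1 + 1/d). The tweet data from the linked post is not reproduced here, so the sample below is a stand-in (the first 1000 powers of 2, a sequence known to follow the law), and the function names are made up for this illustration.

```python
import math
from collections import Counter


def benford_expected(d):
    """Benford's Law probability that the leading digit is d (1..9)."""
    return math.log10(1 + 1 / d)


def leading_digit_freq(values):
    """Observed share of each leading digit 1..9 among non-zero integers."""
    digits = [int(str(abs(v))[0]) for v in values if v != 0]
    counts = Counter(digits)
    total = len(digits)
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Stand-in data (assumption): powers of 2 instead of tweet statistics.
sample = [2 ** k for k in range(1, 1001)]
observed = leading_digit_freq(sample)

for d in range(1, 10):
    print(d, round(observed[d], 3), round(benford_expected(d), 3))
```

To test real data, replace `sample` with any list of positive integers, such as follower counts or tweet counts per user.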
#DBMS – Most Big Data engineers mention ‘performance’ as the #1 priority. The ‘3-minute test: What do you know about SQL Performance’ lets you gauge your strengths: choose between MySQL, Oracle, PostgreSQL, and SQL Server and hammer it out.
#idea – Are we becoming too analytical? Some serious introspection on a possible ‘bandwagon effect’ around ‘big data’ and ‘analytics’ –
“But the biggest reason I believe these two products have not taken off is their reliance on the belief that simply giving people their data and letting them analyze it is the way to improve behavior (both for health and for the environment)
One of the first things we teach in introductory human-computer interaction (HCI) is that “you are not your user” and “beware designer ego bias.” Google seemed to have fallen into this well-known trap in their design and testing for Google PowerMeter (and perhaps Google Health).”
#learning – Stanford University courses on Data – FREE for Fall 2011; each course requires about 10 hrs of work a week, and classes begin on October 10 –
- Introduction to Databases
- Introduction to Artificial Intelligence (Peter Norvig is the instructor!)
- Introduction to Machine Learning
#math/stat – How likely is it for a telephone number (w/o area code) to be prime? About 6%. With area code it may be somewhere around 4%.
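Those percentages are roughly what the prime number theorem predicts: near N, about 1 in ln(N) integers is prime. The sketch below compares that estimate with an exact sieve count. It assumes a phone number is simply a uniformly random 7-digit or 10-digit integer, leading zeros allowed, which real numbering plans do not quite honour; the function names are made up for this illustration.

```python
import math


def pnt_density(n):
    """Prime number theorem estimate of the share of primes near n."""
    return 1 / math.log(n)


def count_primes_below(n):
    """Exact count of primes p < n using a sieve of Eratosthenes."""
    if n < 3:
        return 0
    sieve = bytearray([1]) * n
    sieve[0] = sieve[1] = 0
    for p in range(2, math.isqrt(n - 1) + 1):
        if sieve[p]:
            # Cross out every multiple of p starting at p*p.
            sieve[p * p::p] = bytes(len(range(p * p, n, p)))
    return sum(sieve)


seven_digits = 10 ** 7   # numbers without area code (assumed 0000000..9999999)
ten_digits = 10 ** 10    # numbers with a 3-digit area code

exact_share = count_primes_below(seven_digits) / seven_digits
print("7 digits, exact share:", round(exact_share, 4))
print("7 digits, estimate:   ", round(pnt_density(seven_digits), 4))
print("10 digits, estimate:  ", round(pnt_density(ten_digits), 4))
```

The exact count is only done for the 7-digit case, since sieving up to 10^10 is too heavy for a quick script; for the 10-digit case the 1/ln(N) estimate has to do.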
#visualization – Dichotomy or Difference? Statistical Graphics vs. Information Visualization – two crisp articles in most recent ‘Statistical Computing and Graphics Newsletter’ (PDF) discuss it from POVs of Computer Science and Statistics. Follow-up from Andrew Gelman is interesting too.
#etc
- R.I.P – Statistician who saved millions of lives dies at 87
- Enough? The world has 3 chickens per person, according to UN statistics
- And Precision! – “SQLite is not designed to replace Oracle. It is designed to replace fopen()”
- Cum hoc ergo propter hoc – The most fundamental law of analysis: correlation does not imply causation – Cancer causes cell phones

