State of Data #65
September 9, 2011
A great series on A/B testing from 37signals – Part 1, Part 2 and Part 3, which concludes: “Big photos of smiling customers work”
#architecture – Analytic Data Management at Zynga (5 TB/day) and LinkedIn – Data is divided into two parts. One part has a pretty ordinary schema; the other is just stored as a huge list of name-value pairs. (This is much like eBay’s approach with its Teradata-based Singularity, except that eBay puts the name-value pairs into long character strings.) About half the data is in each part, but I don’t think that’s by deliberate choice.
#big_data – Forrester defines ‘Big Data’ – ‘techniques and technologies that make handling data at extreme scale economical’
#Data_Science – How facial recognition can uncover the first 5 digits of your SSN
#DBMS – Neat trick if you want to quickly gain performance on a long-running batch job (e.g., ETL) – reduce the number of commits with just a parameter.
#idea – ‘Shower of Data’ from Seth Godin – “A new generation, one that grew up with a data surplus, is coming along…what always happens when something goes from scarce to surplus. First we bathe in it, then we waste it.”
#learning – ‘Big Data Now’ from O’Reilly is now available free on Amazon Kindle
#visualization – Mapping email closing lines
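On the #DBMS tip above: the linked trick changes commit behaviour through a database parameter, which isn't reproduced here. As a rough stand-in, here is a minimal sketch of the same idea done on the application side, committing once per batch instead of once per row. It uses Python's built-in sqlite3 with an in-memory database; the `events` table, the `load_rows` helper and the `commit_every` argument are all made up for illustration.

```python
import sqlite3


def make_conn():
    # In-memory database with a hypothetical target table for the batch job.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    return conn


def load_rows(conn, rows, commit_every=1):
    """Insert rows, committing once per `commit_every` rows.

    Returns the number of commits issued, so the two strategies can be compared.
    """
    commits = 0
    cur = conn.cursor()
    for i, row in enumerate(rows, start=1):
        cur.execute("INSERT INTO events (id, payload) VALUES (?, ?)", row)
        if i % commit_every == 0:
            conn.commit()
            commits += 1
    # Commit whatever is left over from a final, partial batch.
    if len(rows) % commit_every != 0:
        conn.commit()
        commits += 1
    return commits


rows = [(i, f"row-{i}") for i in range(1, 1001)]

per_row_conn = make_conn()
per_row = load_rows(per_row_conn, rows, commit_every=1)

batched_conn = make_conn()
batched = load_rows(batched_conn, rows, commit_every=250)

print(per_row, batched)
```

Both runs load the same 1,000 rows; only the number of commits differs. The trade-off is that a failure mid-batch loses the uncommitted rows of that batch, so the batch size should match what the job can afford to redo.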
#etc
- ‘The Theory That Would Not Die’ – A History of Bayes’ Theorem – ‘Alan Turing used it to decode the German Enigma cipher; U.S. Navy to search for a missing H-bomb; to assess the likelihood of a nuclear accident; and .. used to verify the authorship of the Federalist Papers’
- Half-life of a link – “The mean half life of a link on twitter is 2.8 hours, on facebook it’s 3.2 hours and via ‘direct’ sources (like email or IM clients) it’s 3.4 hours. So you can expect, on average, an extra 24 minutes of attention if you post on facebook than if you post on twitter”
- Alternative Leading Indicators – Big Mac index is so 2010. ‘A reader from the pharmaceutical industry recommends tracking suppositories. “Financial worries and austerity changes in diet cause intestinal disorders,” he says, and sales of suppositories therefore rise as the economy goes down the pan.’
- Which ‘p’ is which in statistics – “You just get used to it and figure out which p is which from context. It reminds me of George Foreman naming all five of his sons George”


