State of Data #65

#analysis Draw the correlation curve, *then* see what trend your line maps to.

A great series on A/B testing from 37signals – Part 1, Part 2, Part 3 – which concludes: “Big photos of smiling customers work.”


#architecture Analytic Data Management at Zynga (5 TB/day) and LinkedIn – “Data is divided into two parts. One part has a pretty ordinary schema; the other is just stored as a huge list of name-value pairs. (This is much like eBay’s approach with its Teradata-based Singularity, except that eBay puts the name-value pairs into long character strings.) About half the data is in each part, but I don’t think that’s by deliberate choice.”

#big_data Forrester defines ‘Big Data’ – ‘techniques and technologies that make handling data at extreme scale economical’


#Data_Science How facial recognition can uncover the first 5 digits of your SSN

#DBMS Neat trick if you want a quick performance gain on a long-running batch job (e.g., ETL) – reduce the number of commits with just a parameter.
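The parameter used in the linked post is not named here, so the following is only a minimal sketch of the general idea (fewer, larger commits) done in application code instead. It uses Python's built-in `sqlite3` module; the `staging` table, the `load_rows` function and its `commit_every` argument are hypothetical names chosen for illustration.

```python
import sqlite3


def load_rows(conn, rows, commit_every=1000):
    """Insert rows, committing once per `commit_every` rows instead of once per row.

    Returns the number of commits issued (hypothetical helper for illustration).
    """
    cur = conn.cursor()
    commits = 0
    for i, row in enumerate(rows, start=1):
        cur.execute("INSERT INTO staging (id, payload) VALUES (?, ?)", row)
        if i % commit_every == 0:
            conn.commit()
            commits += 1
    conn.commit()  # flush whatever is left in the final partial batch
    return commits + 1


# Assumed demo setup: an in-memory database and 2,500 synthetic rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, payload TEXT)")
rows = [(i, f"row-{i}") for i in range(1, 2501)]
total_commits = load_rows(conn, rows, commit_every=1000)
row_count = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
```

The trade-off is the usual one: a larger `commit_every` means less commit overhead, but more work to redo if the job fails partway through a batch.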

#idea ‘Shower of Data’ from Seth Godin – “new generation, one that grew up with a data surplus, is coming along… what always happens when something goes from scarce to surplus. First we bathe in it, then we waste it.”

‘Big Data Now’ from O’Reilly is now available free on Amazon Kindle

#visualization Mapping email closing lines



  • ‘The Theory That Would Not Die’ – A History of Bayes’ Theorem – ‘Alan Turing used it to decode the German Enigma cipher; U.S. Navy to search for a missing H-bomb; to assess the likelihood of a nuclear accident; and .. used to verify the authorship of the Federalist Papers’

  • Half-life of a link – “The mean half life of a link on twitter is 2.8 hours, on facebook it’s 3.2 hours and via ‘direct’ sources (like email or IM clients) it’s 3.4 hours. So you can expect, on average, an extra 24 minutes of attention if you post on facebook than if you post on twitter.”

  • Alternative Leading Indicators – Big Mac index is so 2010. ‘A reader from the pharmaceutical industry recommends tracking suppositories. “Financial worries and austerity changes in diet cause intestinal disorders,” he says, and sales of suppositories therefore rise as the economy goes down the pan.’

  • Which ‘p’ is which in statistics – “You just get used to it and figure out which p is which from context. It reminds me of George Foreman naming all five of his sons George.”
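
The “extra 24 minutes” in the half-life item above is just the gap between the quoted figures converted to minutes. A quick sketch of that arithmetic, using only the three numbers from the quote (the dictionary and function names are made up for illustration):

```python
# Mean link half-lives in hours, as quoted in the half-life item above.
HALF_LIFE_HOURS = {"twitter": 2.8, "facebook": 3.2, "direct": 3.4}


def extra_minutes(source, baseline="twitter"):
    """Extra minutes of half-life for `source` relative to `baseline`."""
    diff_hours = HALF_LIFE_HOURS[source] - HALF_LIFE_HOURS[baseline]
    return round(diff_hours * 60)


facebook_gain = extra_minutes("facebook")  # 0.4 hours -> 24 minutes
direct_gain = extra_minutes("direct")      # 0.6 hours -> 36 minutes
```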

About Nilendu Misra
I love to learn, create and coach. Things that I do well are - Communicating ideas - verbally or through words and diagrams; Problem Solving - Logical or Abstract; Very Large Scale Systems; think about 'Frighteningly Simple' approach first. Things that I intend to do better are - Establishing Stringent Process; Exchanging Tough Feedback; Keeping up with my reading or To-Do list to be able to completely relax.
