State of Data #65

#analysis Draw the correlation curve, *then* see what trend your line maps to.

A great series on A/B testing from 37signals (Part 1, Part 2 and Part 3), which concludes: “Big photos of smiling customers work.”
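The series reports results rather than the statistics behind them. As a minimal sketch, assuming a simple conversion-rate comparison between two page variants, a two-proportion z-test is one standard way to check whether a difference like "big photo vs. no photo" is more than noise. The function name and the visitor and conversion counts below are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both variants share one conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate, which is the best estimate under the null hypothesis
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: A = original page, B = page with a big customer photo
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these made-up counts z comes out at about 2.9, so the p-value is well under 0.01. A real test also needs a sample size fixed in advance; peeking at the numbers and stopping early inflates false positives.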


#architecture Analytic Data Management at Zynga (5 TB/day) and LinkedIn: data is divided into two parts. One part has a pretty ordinary schema; the other is just stored as a huge list of name-value pairs. (This is much like eBay’s approach with its Teradata-based Singularity, except that eBay puts the name-value pairs into long character strings.) About half the data is in each part, but I don’t think that’s by deliberate choice.
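To make the two-part layout concrete, here is a minimal sketch with hypothetical field names: one event with a few fixed, typed columns plus a free-form attribute map, stored either as tall (id, name, value) rows or, in the spirit of the eBay variant, packed into one long character string. The `k=v;k=v` encoding is an assumption for illustration; the source does not say how eBay formats its strings.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    # "Ordinary schema" part: fixed, typed columns (hypothetical names)
    user_id: int
    event_type: str
    ts: int
    # Name-value part: attributes that vary from event to event
    attrs: dict[str, str] = field(default_factory=dict)


def to_name_value_rows(event_id: int, e: Event) -> list[tuple[int, str, str]]:
    """One (event_id, name, value) row per attribute, as in a tall name-value table."""
    return [(event_id, k, v) for k, v in sorted(e.attrs.items())]


def to_packed_string(e: Event) -> str:
    """All attributes in one long string, e.g. 'k1=v1;k2=v2' (assumed format, no escaping)."""
    return ";".join(f"{k}={v}" for k, v in sorted(e.attrs.items()))


def from_packed_string(s: str) -> dict[str, str]:
    """Parse a packed string; assumes keys contain no '=' and nothing contains ';'."""
    if not s:
        return {}
    return dict(pair.split("=", 1) for pair in s.split(";"))


e = Event(user_id=42, event_type="level_up", ts=1_300_000_000,
          attrs={"level": "7", "game": "farm"})
print(to_name_value_rows(1, e))
print(to_packed_string(e))
```

The trade-off in either form is the same: new attributes need no schema change, but queries on them must filter or parse name-value data instead of reading a typed column.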

#big_data Forrester defines ‘Big Data’ as ‘techniques and technologies that make handling data at extreme scale economical’.


#Data_Science How facial recognition can uncover the first 5 digits of your SSN

#DBMS A neat trick if you want a quick performance gain on a long-running batch job (e.g., ETL): reduce the number of commits by changing just one parameter.
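The linked trick is a database parameter that is not named here, so as a stand-in, here is the same idea at the application level using Python's built-in `sqlite3` module: commit once every `batch_size` rows instead of after every row. The `load_rows` helper, the `staging` table and the batch size are hypothetical; the right batch size depends on the database, its logging, and how much work you can afford to redo after a failure.

```python
import sqlite3
from collections.abc import Iterable


def load_rows(conn: sqlite3.Connection, rows: Iterable[tuple[int, str]],
              batch_size: int = 10_000) -> int:
    """Insert rows into the hypothetical staging table, committing every batch_size rows."""
    cur = conn.cursor()
    pending = 0
    total = 0
    for row in rows:
        cur.execute("INSERT INTO staging (id, value) VALUES (?, ?)", row)
        pending += 1
        total += 1
        if pending >= batch_size:
            # One commit (and, for an on-disk database, one durable sync) per batch
            conn.commit()
            pending = 0
    conn.commit()  # commit the final partial batch
    return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staging (id INTEGER PRIMARY KEY, value TEXT)")
n = load_rows(conn, ((i, f"v{i}") for i in range(25_000)), batch_size=10_000)
print(n, conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0])
```

Fewer commits means fewer forced log writes, which is where a per-row commit loop spends most of its time; the cost is that a crash rolls back up to one uncommitted batch.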


#idea ‘Shower of Data’ from Seth Godin: “…new generation, one that grew up with a data surplus, is coming along… what always happens when something goes from scarce to surplus. First we bathe in it, then we waste it.”


#learning ‘Big Data Now’ from O’Reilly is now available free on Amazon Kindle.

#visualization Mapping email closing lines


#etc

  • ‘The Theory That Would Not Die’, a history of Bayes’ theorem: ‘Alan Turing used it to decode the German Enigma cipher; [the] U.S. Navy, to search for a missing H-bomb; to assess the likelihood of a nuclear accident; and … used to verify the authorship of the Federalist Papers.’

  • Half-life of a link: “The mean half life of a link on twitter is 2.8 hours, on facebook it’s 3.2 hours and via ‘direct’ sources (like email or IM clients) it’s 3.4 hours. So you can expect, on average, an extra 24 minutes of attention if you post on facebook than if you post on twitter.”

  • Alternative leading indicators: the Big Mac index is so 2010. “A reader from the pharmaceutical industry recommends tracking suppositories. ‘Financial worries and austerity changes in diet cause intestinal disorders,’ he says, and sales of suppositories therefore rise as the economy goes down the pan.”
  • Which ‘p’ is which in statistics: “You just get used to it and figure out which p is which from context. It reminds me of George Foreman naming all five of his sons George.”
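Bayes’ theorem, the subject of the book in the first bullet, fits in a few lines. As a minimal sketch, here it is applied to a search problem in the spirit of the H-bomb hunt: H is "the object is in this search cell" and E is "we searched the cell and found nothing". The prior and the detection probability are made-up numbers, and `posterior` is a hypothetical helper name.

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)."""
    # Total probability of the evidence, summed over H and not-H
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


# Hypothetical numbers: 30% prior that the object is in this cell,
# and an 80% chance that searching the cell finds it if it is there.
prior = 0.3
p_detect = 0.8

# E = "searched this cell, found nothing"
# P(E | H) = 1 - p_detect; P(E | not H) = 1 (there is nothing to find)
post = posterior(prior, 1 - p_detect, 1.0)
print(round(post, 3))  # 0.079
```

A failed search lowers the probability for that cell from 0.3 to about 0.079 but does not drive it to zero; the remaining probability shifts to the cells not yet searched, which is the core idea of Bayesian search.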

About Nilendu Misra
I love to learn, create and coach. Things that I do well: communicating ideas, verbally or through words and diagrams; problem solving, logical or abstract; very large scale systems; thinking about the ‘frighteningly simple’ approach first. Things that I intend to do better: establishing stringent process; exchanging tough feedback; keeping up with my reading or to-do list so that I can completely relax.
