State of Data #61

#analysis What does that “Register” button cost you? It cost one e-tailer $300M a year, because the “fastest way to alienate those customers and scare away that free money is to make its owner establish a relationship with you before s/he can make a purchase”.

Consumer:Creator ratio – from 1M:1 (50 years ago) to 100:1 (the Etsy era)

Very detailed tabular comparison of the top 6 “cloud computing” services (PDF), including AWS, GAE, Azure, Rackspace and GoGrid


#big_data (Greenplum + SAS) vs. ($5K hardware + R Enterprise) – the latter ran a logistic regression on 1 billion records in 75 seconds, “just as fast, and at less than 1% of the hardware cost”
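
For readers who haven't met it, logistic regression models the probability of a yes/no outcome as the logistic (sigmoid) function of a weighted sum of features. Below is a minimal sketch in plain Python that fits one by batch gradient descent on a tiny made-up dataset. It says nothing about how R Enterprise or SAS work at billion-row scale (production tools commonly use Newton-type methods such as iteratively reweighted least squares), and the function names are illustrative only.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: list[float], x: list[float]) -> float:
    """P(y = 1 | x); w[0] is the intercept, w[1:] are the feature weights."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(xs: list[list[float]], ys: list[int],
                 lr: float = 0.1, epochs: int = 2000) -> list[float]:
    """Batch gradient descent on the mean log-loss (no regularisation)."""
    n = len(xs)
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = predict_proba(w, x) - y  # derivative of log-loss w.r.t. the linear score
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Hypothetical toy data: one feature, labels only loosely increasing with x.
    xs = [[0.5], [1.0], [1.5], [2.0], [3.0], [3.5], [4.0], [4.5]]
    ys = [0, 0, 1, 0, 1, 0, 1, 1]
    w = fit_logistic(xs, ys)
    print("weights:", [round(v, 3) for v in w])
    print("P(y=1 | x=0.5):", round(predict_proba(w, [0.5]), 3))
    print("P(y=1 | x=4.5):", round(predict_proba(w, [4.5]), 3))
```

Each epoch touches every record once, which is why fitting on a billion rows is a hardware and I/O problem as much as a statistics one.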

#Data_Science – Machine Learning on Big Data: Lessons Learned from Google Projects. For example, how does Google arrive at the ‘best guess’ it shows for a search?


#DBMS Mythbusters: Stored Procedures Edition – agree or disagree, it's worth a read.



#learning The bell curve (or normal distribution) is not just a math thing; it turns up everywhere in nature. Watch for it in door wear patterns. Why would the wear distribution on the left door sit above the one on the right? This editor has a theory. Hint: in which hand do most people carry their goods when leaving a store?
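
One textbook reason the bell curve turns up so often is the central limit theorem: a total built from many small, independent effects tends toward a normal distribution, whatever the shape of the individual effects. The sketch below is illustrative only. It models each effect as a hypothetical uniform "nudge" (an assumption, not a model of real door traffic), draws a histogram, and checks the familiar 68%/95% rule.

```python
import random
import statistics


def sum_of_nudges(n_effects: int, rng: random.Random) -> float:
    """One outcome: the sum of n_effects independent uniform nudges in [-1, 1]."""
    return sum(rng.uniform(-1.0, 1.0) for _ in range(n_effects))


def simulate(n_samples: int = 20_000, n_effects: int = 30, seed: int = 42) -> list[float]:
    """Draw n_samples outcomes with a seeded RNG so runs are repeatable."""
    rng = random.Random(seed)
    return [sum_of_nudges(n_effects, rng) for _ in range(n_samples)]


def share_within(samples: list[float], k: float) -> float:
    """Fraction of samples lying within k standard deviations of the mean."""
    mu = statistics.fmean(samples)
    sd = statistics.pstdev(samples)
    return sum(abs(x - mu) <= k * sd for x in samples) / len(samples)


def ascii_histogram(samples: list[float], bins: int = 15, width: int = 40) -> list[str]:
    """Text histogram: one row per equal-width bin, bar length scaled to the tallest bin."""
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / bins
    counts = [0] * bins
    for x in samples:
        counts[min(int((x - lo) / step), bins - 1)] += 1
    peak = max(counts)
    return [f"{lo + i * step:7.2f} | " + "#" * round(c / peak * width)
            for i, c in enumerate(counts)]


if __name__ == "__main__":
    samples = simulate()
    print("\n".join(ascii_histogram(samples)))
    # For a normal distribution, roughly 68% fall within 1 SD and 95% within 2 SD.
    print(f"within 1 SD: {share_within(samples, 1):.3f}")
    print(f"within 2 SD: {share_within(samples, 2):.3f}")
```

No single nudge is bell-shaped (each is flat), yet their sum is, which is the point of the theorem.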


#visualization Ever wondered what the real color of summer is? Or of Thursday? “using simple algorithms on data originating from subjective human perceptions — system created to find out the colour of anything, by querying and aggregating image data from Flickr”
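
The project's exact algorithm isn't described here, so the following is only a guess at the simplest version of the idea: average each image's pixels, then take a per-channel median across all images returned for a query. Fetching images from Flickr is left out; the pixel data and function names below are hypothetical.

```python
from statistics import median

RGB = tuple[int, int, int]


def mean_colour(pixels: list[RGB]) -> tuple[float, float, float]:
    """Average red, green and blue over all pixels of one image."""
    n = len(pixels)
    r = sum(p[0] for p in pixels) / n
    g = sum(p[1] for p in pixels) / n
    b = sum(p[2] for p in pixels) / n
    return (r, g, b)


def aggregate_colour(images: list[list[RGB]]) -> RGB:
    """Per-channel median of the images' mean colours (empty images are skipped)."""
    means = [mean_colour(img) for img in images if img]
    return (round(median(m[0] for m in means)),
            round(median(m[1] for m in means)),
            round(median(m[2] for m in means)))


def to_hex(rgb: RGB) -> str:
    """Format an (r, g, b) triple as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


if __name__ == "__main__":
    # Made-up pixels standing in for three photos tagged "summer".
    summer = [
        [(250, 200, 60), (240, 190, 70)],   # sunny yellow, mean (245, 195, 65)
        [(80, 160, 220), (90, 170, 230)],   # sky blue, mean (85, 165, 225)
        [(230, 180, 80), (220, 170, 90)],   # sand, mean (225, 175, 85)
    ]
    colour = aggregate_colour(summer)
    print(colour, to_hex(colour))  # (225, 175, 85) #e1af55
```

A median rather than a mean keeps one odd photo (here, the sky) from dragging the answer toward blue; a fuller version might also aggregate in a perceptual colour space rather than raw RGB.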


  • Would you pick a different number if asked for your ‘favorite number’ rather than a ‘random number’? Most people, it seems, are intrinsically drawn to prime numbers. Help uncover the world’s favorite number.

  • Backup 1: Chaos 0 – Make Data Immortal: a startup claims its DVD form-factor storage disc is so durable “you can dip it in liquid nitrogen and then boiling water without harming it”.
  • Backup 1: Chaos 1 – ‘Hard Disk Crusher’, a ‘new spin on destruction’. The Economist writes: “A baseball bat might have been more liberating, but the hydraulic crusher’s surgical precision nonetheless holds a certain charm.”


