State of Data #107

Top Read

‘Nobody ever got fired for using Hadoop on a cluster’ (Microsoft Research)

“We analyzed 174,000 jobs submitted to a production analytics cluster in Microsoft in a single month in 2011 and found that the median job input data set size was less than 14GB… Facebook jobs follow a power-law distribution with small jobs dominating; from their graphs it appears that at least 90% of the jobs have input sizes under 100 GB. We therefore believe that there are many jobs run on these clusters which are smaller than the memory of a single server.”


Highly insightful paper from Microsoft – how product measurement (e.g., A/B testing) often deceives

“Bing, Microsoft’s search engine, had a bug in an experiment, which resulted in very poor search results being shown to users. Two key organizational metrics that Bing measures progress by are share and revenue, and both improved significantly: distinct queries per user went up over 10%, and revenue per user went up over 30%!”

Big Data

7 Startups trying to solve your Big Data Problems

Data Science

‘Introduction to Data Science’ (free) Book




: Automatic SQL Injection tool


How the natural attractiveness of the Normal Distribution leads people to build elusive models for random, ‘Black Swan’ events. Or, why we made large-scale ‘financial crises’ unavoidable.

“Now for an abnormal question: to what extent is normality actually a good statistical description of real-world behaviour?  Evidence against has been mounting for well over a century.

In the 1870s, the German statistician Wilhelm Lexis began to develop the first statistical tests for normality.  Strikingly, the only series Lexis could find which closely matched the Gaussian distribution was birth rates. The natural world suddenly began to feel a little less normal.”


Everything you wanted to know about Machine Learning in under 30 minutes – a talk from Hilary Mason

‘The talk is geared toward engineers with no prior knowledge of machine learning, and it’s designed to lay out the basic vocabulary and way that we think about the world to provide an amusing foundation. This talk is not an in-depth tutorial.’


The Blue Economy – Visualizing Fishing, Transport, Energy & Cities



About Nilendu Misra
I love to learn, create and coach. Things that I do well are - Communicating ideas - verbally or through words and diagrams; Problem Solving - Logical or Abstract; Very Large Scale Systems; think about 'Frighteningly Simple' approach first. Things that I intend to do better are - Establishing Stringent Process; Exchanging Tough Feedback; Keeping up with my reading or To-Do list to be able to completely relax.
