State of Data #78

#analysis – “Data Informed, Not Data Driven” – How analytics play a critical role in Facebook design decisions.

Why photo upload “success rate” is so important on Sundays (people do ‘important’ stuff on weekends and upload 150% more pics on Sundays);

‘..very difficult for a set of metrics to fully represent what you value’

#architecture – How to build a data mining web app –

“Well, here is a source code that deploys your app with one command on Google App Engine. You just need to focus on where to get the data (ETL), what to do with it (DM), and how to display it (VISUALIZATION). The source code has example that you can swap with an idea of your own.”

 – How 2012 will be for Big Data – Predictions –

a) Technology – 5 Big Data Predictions for 2012 – Streaming Data Processing

b) Volume – From IDC – “Big Data will earn its place as the next “must have” competency in 2012 as the volume of digital content grows to 2.7 zettabytes (ZB), up 48% from 2011”

c) Effectiveness – From Gartner – “Through 2015, more than 85 percent of Fortune 500 organizations will fail to effectively exploit big data for competitive advantage.”
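
For scale, the IDC figure in (b) implies a 2011 baseline that the quote does not state directly. A quick back-of-the-envelope check in Python, assuming "up 48%" means the 2012 volume is 1.48 times the 2011 volume (the variable names are just for illustration):

```python
# Figures taken from the IDC quote; the 2011 baseline is derived, not quoted.
volume_2012_zb = 2.7   # projected digital content in 2012, in zettabytes
growth = 0.48          # "up 48% from 2011"

# If 2012 = 2011 * (1 + growth), then 2011 = 2012 / (1 + growth).
volume_2011_zb = volume_2012_zb / (1 + growth)

print(round(volume_2011_zb, 2))  # about 1.82 ZB
```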

#Data_Science – Analyzing email to find close friends

‘..took all the e-mail data from an international firm and for one of its offices asked employees to list the people in their social network, dividing the list into friends, colleagues, and acquaintances.

Then Uzzi and Wuchty scanned the workers’ e-mails, for each recording sender and receiver, the time it was sent, and the time it took for the receiver to respond.

The researchers found that both methods—the volume threshold and the response criterion—did a fair job of approximating the social networks the employees had reported themselves.

But then Uzzi and Wuchty tried something new. Instead of looking at the absolute values for volume and response, they looked at the response time.. The new method predicted who was in different employees’ social networks with an accuracy that is several percent higher than the other methods’
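
To make the response-time idea concrete, here is a minimal Python sketch. The excerpt does not spell out how Uzzi and Wuchty actually transformed response time, so ranking contacts by median reply delay is only an illustrative stand-in, not their method. The log format, the names, and the numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import median

# Hypothetical log: (sender, receiver, hours until the receiver replied).
EMAILS = [
    ("ann", "bob", 0.5), ("ann", "bob", 1.0), ("ann", "bob", 0.2),
    ("ann", "cat", 6.0), ("ann", "cat", 30.0),
    ("ann", "dan", 2.0), ("ann", "dan", 3.0), ("ann", "dan", 4.0),
]

def rank_contacts(emails, sender):
    """Return the sender's contacts ordered by median reply delay, fastest first."""
    delays = defaultdict(list)
    for src, dst, hours in emails:
        if src == sender:
            delays[dst].append(hours)
    # Median is used so that one very slow reply does not dominate the ranking.
    return sorted(delays, key=lambda contact: median(delays[contact]))

print(rank_contacts(EMAILS, "ann"))  # ['bob', 'dan', 'cat']
```

In this toy data the fastest responders come out on top, which is the intuition behind treating reply speed as a signal of closeness. A real analysis would also have to handle unanswered mail, time zones and working hours.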

#DBMS – Cloud Storage Benchmark (PDF)


#idea – Management by Statistics

“Lots of folks play fast and loose with statistics to make political points. If I told you the United States has lost most of its manufacturing jobs, is that a problem? What if I told you the United States manufactures the most in the world, but manages to do so with the fewest number of people? (Much like how the U.S. produces the most agricultural goods, but uses very few people to do so) Would you still think that is a problem? You could argue this either way, of course, but the point is that the same observable reality can be presented in various ways, thereby slanting the story.”


#learning – How to ‘cook’ with Data – “Same data, same map, different stories”

“As you can see, for each definition of class limits you get a different message. Most people just use equal intervals, but that’s lazy, IMHO. Using equal intervals in a choropleth map is like sorting a bar chart alphabetically. The only thing that is worse than equal intervals is equal intervals plus round numbers.”
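
The point about class limits is easy to reproduce. Below is a small Python sketch that bins the same made-up, skewed values two ways, with equal intervals and with quantiles, and prints the class each value lands in. The data and the function names are assumptions for illustration, and quantiles are just one alternative; the linked article may compare other schemes.

```python
from statistics import quantiles

# Hypothetical skewed values, e.g. one rate per region on the map.
VALUES = [1, 2, 2, 3, 3, 4, 5, 8, 13, 40]

def equal_interval_breaks(values, k):
    """Upper class limits that split the value range into k equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]

def quantile_breaks(values, k):
    """Upper class limits that put roughly the same number of values in each bin."""
    return quantiles(values, n=k, method="inclusive")

def classify(values, breaks):
    """Class index per value: the number of class limits the value exceeds."""
    return [sum(v > b for b in breaks) for v in values]

eq = classify(VALUES, equal_interval_breaks(VALUES, 4))
qt = classify(VALUES, quantile_breaks(VALUES, 4))
print("equal intervals:", eq)  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 3]
print("quantiles:      ", qt)  # [0, 0, 0, 1, 1, 2, 2, 3, 3, 3]
```

With equal intervals, eight of the ten regions share one colour and the map says "nothing to see except one outlier". With quantiles, the same numbers spread across all four classes and the map tells a story of gradual variation.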


#visualization – What the world searched for in 2011 – Casey Anthony to Fukushima




About Nilendu Misra
I love to learn, create and coach. Things that I do well are - Communicating ideas - verbally or through words and diagrams; Problem Solving - Logical or Abstract; Very Large Scale Systems; think about 'Frighteningly Simple' approach first. Things that I intend to do better are - Establishing Stringent Process; Exchanging Tough Feedback; Keeping up with my reading or To-Do list to be able to completely relax.
