State of Data #116

Top Read

Reddit’s Database has only Two tables

“they use two tables for each “thing”, so a thing/data pair for accounts, a thing/data pair for links, etc.”
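To make the thing/data idea concrete, here is a minimal sketch in Python with SQLite. The quote only says that each kind of object gets a thing/data table pair; the column names, the key/value layout of the data table, and the helper names `create_pair`, `add_thing`, and `load_thing` are assumptions made for illustration, not Reddit's actual schema.

```python
import sqlite3


def create_pair(conn, kind):
    # Hypothetical layout: a "thing" table for the few fixed columns and a
    # "data" table holding every other attribute as a name/value row.
    # `kind` is interpolated into SQL, so it must be a trusted identifier.
    conn.execute(
        f"CREATE TABLE {kind}_thing (id INTEGER PRIMARY KEY, created_at TEXT)"
    )
    conn.execute(
        f"CREATE TABLE {kind}_data ("
        f"thing_id INTEGER NOT NULL REFERENCES {kind}_thing(id), "
        f"name TEXT NOT NULL, value TEXT, PRIMARY KEY (thing_id, name))"
    )


def add_thing(conn, kind, created_at, **attrs):
    # Insert the fixed part first, then one data row per attribute.
    cur = conn.execute(
        f"INSERT INTO {kind}_thing (created_at) VALUES (?)", (created_at,)
    )
    thing_id = cur.lastrowid
    conn.executemany(
        f"INSERT INTO {kind}_data (thing_id, name, value) VALUES (?, ?, ?)",
        [(thing_id, name, str(value)) for name, value in attrs.items()],
    )
    return thing_id


def load_thing(conn, kind, thing_id):
    # Reassemble the attributes of one thing into a plain dict.
    rows = conn.execute(
        f"SELECT name, value FROM {kind}_data WHERE thing_id = ?", (thing_id,)
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
for kind in ("account", "link"):
    create_pair(conn, kind)

link_id = add_thing(
    conn, "link", "2012-09-01", title="State of Data", url="http://example.com"
)
link = load_thing(conn, "link", link_id)
```

With a layout like this, adding a new attribute to links needs no schema change, only another row in `link_data`. The cost is that reading one object means collecting several rows instead of one.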


Gregor Mendel’s Suspicious Data

“He [Mendel] was most anxious to have his results replicated and expanded, for even self-possessed people (and he wasn’t) entertain occasional misgivings about the accuracy, originality, and significance of their work.

To achieve these goals, his work had to be understood. In comparison to his theories, of whose validity he was sure, the data were of no significance whatsoever.”

Big Data

Cool Algorithms: How to Estimate Cardinality of Large Dataset

Data Science

Ads and the City: Considering Geographic Distance in Recommendations (pdf)

“[…] human mobility, we learn two insights: 1) there are special individuals who visit many places; and 2) individuals go to a venue not only because they like it but also because they are closeby.

We model these insights into two simple models and learn that: 1) simply recommending power users works better than random but is far from producing the best recommendations; 2) an item-based recommender system produces accurate recommendations; and 3) recommending places that are closest to a user’s geographic center of interest produces recommendations that are as accurate as item-based recommender’s”
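The third finding in the excerpt, recommending the places closest to a user's geographic center of interest, could be sketched roughly as follows. The excerpt does not define that center, so this sketch assumes it is the plain average of the coordinates the user has visited; the function names and the sample coordinates are hypothetical.

```python
import math


def haversine_km(a, b):
    # Great-circle distance in kilometres between two (lat, lon) points.
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def center_of_interest(visits):
    # Assumption: the center is the mean latitude and longitude of past
    # visits, which is a reasonable approximation at city scale.
    lat = sum(p[0] for p in visits) / len(visits)
    lon = sum(p[1] for p in visits) / len(visits)
    return (lat, lon)


def recommend_nearest(visits, venues, k=3):
    # Rank venues the user has not visited by distance from the center.
    center = center_of_interest(visits)
    visited = set(visits)
    candidates = [
        (name, loc) for name, loc in venues.items() if loc not in visited
    ]
    candidates.sort(key=lambda item: haversine_km(center, item[1]))
    return [name for name, _ in candidates[:k]]


# Made-up sample data for illustration only.
visits = [(51.50, -0.12), (51.52, -0.10)]
venues = {
    "cafe": (51.51, -0.11),
    "museum": (51.50, -0.12),  # already visited, so it is excluded
    "park": (51.60, -0.30),
    "pub": (51.515, -0.105),
}
top = recommend_nearest(visits, venues, k=2)
```

Note that this baseline uses no preference data at all, only location, which is what makes its reported accuracy relative to the item-based recommender interesting.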


Tom Kyte hands over the ‘keys to Oracle’ – and it is free


HBR’s ‘Big Data’ Insight Center


How Google builds Maps and provides Directions

“I came away convinced that the geographic data Google has assembled is not likely to be matched by any other company. The secret to this success isn’t, as you might expect, Google’s facility with data, but rather its willingness to commit humans to combining and cleaning data about the physical world.”


Some great Data Visualization Tutorials from Flowing Data


About Nilendu Misra
I love to learn, create and coach. Things that I do well are - Communicating ideas - verbally or through words and diagrams; Problem Solving - Logical or Abstract; Very Large Scale Systems; think about 'Frighteningly Simple' approach first. Things that I intend to do better are - Establishing Stringent Process; Exchanging Tough Feedback; Keeping up with my reading or To-Do list to be able to completely relax.
