State of Data #116

Top Read

Reddit’s Database has only Two tables

“they use two tables for each “thing”, so a thing/data pair for accounts, a thing/data pair for links, etc.”
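
One way to picture the thing/data pairing quoted above is an entity table plus a key/value attribute table per kind of object. The sketch below uses SQLite only for illustration. The table and column names (`link_thing`, `link_data`, `created`, `key`, `value`) and the helper functions are assumptions made for this example, not Reddit's actual schema.

```python
import sqlite3


def make_thing_tables(conn, kind):
    """Create the hypothetical <kind>_thing / <kind>_data pair.

    `kind` must be a trusted identifier chosen in code (e.g. "link"),
    because it is interpolated into the SQL text.
    """
    conn.execute(
        f"CREATE TABLE {kind}_thing ("
        "id INTEGER PRIMARY KEY, created TEXT)"
    )
    conn.execute(
        f"CREATE TABLE {kind}_data ("
        f"thing_id INTEGER REFERENCES {kind}_thing(id), "
        "key TEXT, value TEXT, "
        "PRIMARY KEY (thing_id, key))"
    )


def insert_thing(conn, kind, created, attrs):
    """Insert one thing row, then one data row per attribute."""
    cur = conn.execute(
        f"INSERT INTO {kind}_thing (created) VALUES (?)", (created,)
    )
    thing_id = cur.lastrowid
    conn.executemany(
        f"INSERT INTO {kind}_data VALUES (?, ?, ?)",
        [(thing_id, k, str(v)) for k, v in attrs.items()],
    )
    return thing_id


def load_thing(conn, kind, thing_id):
    """Reassemble a thing's attributes into a dict (values come back as text)."""
    rows = conn.execute(
        f"SELECT key, value FROM {kind}_data WHERE thing_id = ?",
        (thing_id,),
    ).fetchall()
    return dict(rows)
```

In this layout a new attribute is just another row in the data table, so adding one needs no schema change. The trade-off is that every value is stored as text and has to be reassembled on read.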

Analysis

Gregor Mendel’s Suspicious Data

“He [Mendel] was most anxious to have his results replicated and expanded, for even self-possessed people (and he wasn’t) entertain occasional misgivings about the accuracy, originality, and significance of their work.

To achieve these goals, his work had to be understood. In comparison to his theories, of whose validity he was sure, the data were of no significance whatsoever.”

Big Data

Cool Algorithms: How to Estimate Cardinality of Large Dataset

Data Science

Ads and the City: Considering Geographic Distance in Recommendations (pdf)

“...in human mobility, we learn two insights: 1) there are special individuals who visit many places; and 2) individuals go to a venue not only because they like it but also because they are close by.

We model these insights into two simple models and learn that: 1) simply recommending power users works better than random but is far from producing the best recommendations; 2) an item-based recommender system produces accurate recommendations; and 3) recommending places that are closest to a user’s geographic center of interest produces recommendations that are as accurate as item-based recommender’s”
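
The third finding in the quote above, recommending the places closest to a user's geographic center of interest, can be sketched roughly as follows. This is a minimal illustration and not the paper's method: it assumes the center is the plain average of the coordinates of venues the user has visited, and the function names and inputs are hypothetical.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def center_of_interest(visited):
    """Assumed definition: arithmetic mean of visited (lat, lon) pairs.

    Adequate for city-scale data; it breaks down near the antimeridian.
    """
    lats = [p[0] for p in visited]
    lons = [p[1] for p in visited]
    return (sum(lats) / len(lats), sum(lons) / len(lons))


def recommend_nearest(visited, candidates, k=3):
    """Return the names of the k candidate venues closest to the user's center.

    `candidates` maps venue name -> (lat, lon) and is assumed to contain
    only venues the user has not visited yet.
    """
    center = center_of_interest(visited)
    ranked = sorted(
        candidates.items(), key=lambda kv: haversine_km(center, kv[1])
    )
    return [name for name, _ in ranked[:k]]
```

For example, a user whose visits cluster in central London would be shown another central London venue before one in Oxford or Paris, whatever their stated preferences.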

DBMS

Tom Kyte hands over the ‘keys to Oracle’ – and it is free

Idea

HBR’s ‘Big Data’ Insight Center

Learning

How Google builds Maps and provides Directions

“I came away convinced that the geographic data Google has assembled is not likely to be matched by any other company. The secret to this success isn’t, as you might expect, Google’s facility with data, but rather its willingness to commit humans to combining and cleaning data about the physical world.”

Visualization

Some great Data Visualization Tutorials from Flowing Data

etc

About Nilendu Misra
I love to learn, create and coach. Things that I do well are - Communicating ideas - verbally or through words and diagrams; Problem Solving - Logical or Abstract; Very Large Scale Systems; think about 'Frighteningly Simple' approach first. Things that I intend to do better are - Establishing Stringent Process; Exchanging Tough Feedback; Keeping up with my reading or To-Do list to be able to completely relax.
