State of Data #116
September 14, 2012
Top Read
Reddit’s Database has only Two tables
“they use two tables for each ‘thing’, so a thing/data pair for accounts, a thing/data pair for links, etc.”
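As a rough illustration of the thing/data pairing described in the quote, here is a minimal sketch using SQLite. The column names (`created`, `key`, `value`) and the helper functions are hypothetical, chosen only for illustration; the quote itself says nothing more than that each kind of object gets its own thing/data pair.

```python
import sqlite3

def make_thing_tables(conn, kind):
    """Create one thing/data table pair for a kind of object (e.g. 'link')."""
    # Hypothetical columns: the source only states that a pair exists per kind.
    conn.execute(f"CREATE TABLE thing_{kind} (id INTEGER PRIMARY KEY, created TEXT)")
    conn.execute(
        f"CREATE TABLE data_{kind} ("
        f"thing_id INTEGER REFERENCES thing_{kind}(id), "
        "key TEXT, value TEXT, PRIMARY KEY (thing_id, key))"
    )

def save_thing(conn, kind, created, attrs):
    """Insert one row in the thing table and one data row per attribute."""
    cur = conn.execute(f"INSERT INTO thing_{kind} (created) VALUES (?)", (created,))
    thing_id = cur.lastrowid
    conn.executemany(
        f"INSERT INTO data_{kind} (thing_id, key, value) VALUES (?, ?, ?)",
        [(thing_id, k, str(v)) for k, v in attrs.items()],
    )
    return thing_id

def load_thing(conn, kind, thing_id):
    """Rebuild the attribute dict for one thing from its key/value rows."""
    rows = conn.execute(
        f"SELECT key, value FROM data_{kind} WHERE thing_id = ?", (thing_id,)
    )
    return dict(rows.fetchall())
```

In this sketch, adding a new attribute to a link means inserting another key/value row rather than altering a table, which is the usual appeal of this kind of layout.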
Analysis
Gregor Mendel’s Suspicious Data
“He [Mendel] was most anxious to have his results replicated and expanded, for even self-possessed people (and he wasn’t) entertain occasional misgivings about the accuracy, originality, and significance of their work.
To achieve these goals, his work had to be understood. In comparison to his theories, of whose validity he was sure, the data were of no significance whatsoever.”
Big Data
Cool Algorithms: How to Estimate Cardinality of Large Datasets
Data Science
Ads and the City: Considering Geographic Distance in Recommendations (pdf)
“..in human mobility, we learn two insights: 1) there are special individuals who visit many places; and 2) individuals go to a venue not only because they like it but also because they are closeby.
We model these insights into two simple models and learn that: 1) simply recommending power users works better than random but is far from producing the best recommendations; 2) an item-based recommender system produces accurate recommendations; and 3) recommending places that are closest to a user’s geographic center of interest produces recommendations that are as accurate as item-based recommender’s”
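The third finding, recommending the places closest to a user's geographic center of interest, might be sketched as below. This is not the paper's implementation: it assumes the center of interest is the plain mean of the coordinates the user has visited (a crude approximation that only holds for points in the same region), and the function names `haversine_km` and `recommend_nearby` are made up for this example.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def recommend_nearby(visited, candidates, k=3):
    """Return the names of the k candidate venues closest to the user's center.

    visited: list of (lat, lon) pairs the user has been to.
    candidates: dict mapping venue name -> (lat, lon).
    """
    # Assumption: center of interest = mean of the visited coordinates.
    center = (
        sum(lat for lat, _ in visited) / len(visited),
        sum(lon for _, lon in visited) / len(visited),
    )
    ranked = sorted(candidates.items(), key=lambda item: haversine_km(center, item[1]))
    return [name for name, _ in ranked[:k]]
```

Note that this baseline uses no ratings or co-visit data at all, only coordinates, which is what makes the quoted result notable.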
DBMS
Tom Kyte hands over the ‘keys to Oracle’ – and it is free
Idea
HBR’s ‘Big Data’ Insight Center
Learning
How Google builds Maps and provides Directions
“I came away convinced that the geographic data Google has assembled is not likely to be matched by any other company. The secret to this success isn’t, as you might expect, Google’s facility with data, but rather its willingness to commit humans to combining and cleaning data about the physical world.”
Visualization
Some great Data Visualization Tutorials from Flowing Data
etc
- Is 108 the most beautiful number?
- Visualizing the Drake Equation: How many alien civilizations exist?
- Best chess openings from over 2 million matches
- Data Werewolves from LinkedIn