State of Data Last Week – #41
March 25, 2011
#analysis – Pete Warden launches an amazing “Data Science Toolkit” landing page – a collection of the most useful open-source tools and data sets “wrapped in REST/JSON interface” that could be used for, say, “extracting main text from a news story”.
#architecture – Tired of “superficiality” around the NoSQL/RDBMS decision? Want to read some solid math to see what each benefits from and how, together, they can evolve into something far more powerful? ACM publishes what this aggregator feels is the “Paper of the Year” – “A co-relational Model of Data for Large Shared Data Banks”.
#big_data – Google UK’s new quarterly online magazine dedicates its inaugural issue to data – contributors include Hans Rosling and Hal Varian. “We used to be data poor, now the problem is data obesity”.
#DBMS – Presentation (PDF) on how Yelp uses MySQL and the InnoDB engine – even though the 102-slide “deep technical” deck starts with “We are not really MySQL or InnoDB experts”, this is as good as it gets.
#learning – ThinkStats: An introduction to Probability & Statistics for (Python) Programmers
#visualization – How glow.mozilla.org visualizes real-time downloads of Firefox 4
- HA/DR is indeed rocket science. NASA satellite ‘Kepler’, too, suffered a 144-hr “outage” due to a ‘network glitch’.
- Google maps 300TB of real-world Internet speed data
- So useful. Why didn’t we think of this before? See how many times a URL has been shared on Facebook – it will give a pretty good idea of “social buzz”. Next wish – seeing who is being advised to add me as a friend.
- Top 100 socially networked cities in the U.S. Men’s Health magazine ran the analysis?!