State of Data Last Week – July 6
August 9, 2010
Cool Numbers – Microsoft by Numbers
- Microsoft sells 7 copies of Windows every second
- Largest 25 US dailies have 16M subscribers
- Netflix alone has 14M members. Xbox has 23M!!
Hadoop Summit 2010 presentations are now available. My favorite sessions were:
- Hadoop and Pig at Twitter
- Data Applications and Infrastructure at LinkedIn
- Integration Patterns & Practices for Hadoop
Good old (R)DBMS is still alive – why did Quora choose MySQL as its data store rather than one of the NoSQLs? The elevator pitch: if your app runs fine on partitioned data (i.e., a request never has to touch more than 1 shard / partition), it will be fine with just about any data store. I also liked the use of Donald Knuth’s “Premature optimization is the root of all evil” there.
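To make the "never more than 1 shard" point concrete, here is a minimal sketch in Python. It is not Quora's actual scheme: the shard count, the function names, and the in-memory dicts standing in for MySQL instances are all assumptions made for illustration.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example

def shard_for(user_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.md5(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Plain dicts stand in for separate database servers (an assumption).
shards = [dict() for _ in range(NUM_SHARDS)]

def save_question(user_id: str, question: str) -> None:
    """Write goes to exactly one shard, picked by the user's key."""
    shards[shard_for(user_id)].setdefault(user_id, []).append(question)

def questions_by(user_id: str) -> list:
    """Read touches exactly one shard; no cross-shard join is needed."""
    return shards[shard_for(user_id)].get(user_id, [])
```

As long as every request can be answered from the one shard its key maps to, the store behind each shard matters much less, which is the gist of the pitch.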
OK, enough of dinosaurs! What about Facebook, now home to the world’s largest Hadoop cluster – 21PB of data in a single HDFS cluster, with 12TB of disk and 32GB of RAM per server. Yahoo loses again with a “meager” 12PB!
And Twitter’s? Avi Bryant of the Twitter Analytics team speaks this week about how DabbleDB could improve Twitter ad efficiency.
Here’s a piece of trivia to impress folks – SQL is “Turing complete” (a fancy way of saying it can express IF-ELSE and GO TO style logic). If you want to brush up on it, SQLZoo is a fantastic FREE resource. Pass it along to all the new interns and developers – they will thank you after a year! It is also database agnostic.
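For a taste of the IF-ELSE and looping side of SQL, here is a small sketch run through Python's built-in sqlite3 module. The choice of SQLite and the toy queries are assumptions for illustration; the looping part relies on recursive common table expressions, which not every database of this era supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# IF-ELSE: a CASE expression picks a branch based on a condition.
branch = conn.execute(
    "SELECT CASE WHEN 7 % 2 = 0 THEN 'even' ELSE 'odd' END"
).fetchone()[0]

# Looping: a recursive CTE repeats a step until its WHERE clause stops it.
# This one builds up the factorial of 5 one row at a time.
factorial = conn.execute(
    """
    WITH RECURSIVE fact(n, value) AS (
        SELECT 1, 1
        UNION ALL
        SELECT n + 1, value * (n + 1) FROM fact WHERE n < 5
    )
    SELECT value FROM fact WHERE n = 5
    """
).fetchone()[0]

print(branch, factorial)  # odd 120
conn.close()
```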
On the mobile data API front – it’s either an early Christmas or a very bad hurricane. Oracle joined the SQLite Consortium in late June.
Lastly, how expensive was Amazon’s 3-hr “outage” last week? About $5.25M (their hourly revenue is roughly $1.75M).
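The back-of-the-envelope math behind that figure, as a tiny Python sketch. It assumes revenue is lost evenly across the outage and none of it is recovered afterwards; the variable names are made up for the example.

```python
HOURLY_REVENUE_USD = 1.75e6  # hourly revenue figure quoted above
OUTAGE_HOURS = 3             # length of the outage

# Simple linear estimate: lost revenue = hourly revenue x hours down.
estimated_loss = HOURLY_REVENUE_USD * OUTAGE_HOURS

print(f"${estimated_loss / 1e6:.2f}M")  # $5.25M
```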