State of Data Last Week – Aug 01
August 9, 2010
Cool Numbers – Change flows through the free WI-FI, finally!
- On “World Cup of Data Sorting” – 1TB of data is now sorted in < 60 sec! (52 nodes * [8 cores; 24GB RAM; Cisco Nexus 5020 switch])
- VM becomes even more overrated? Economics of PB scale storage pods – 67TB 4U server at $7,867! 24x cheaper than Amazon S3.
- Google owns 98.29% of Mobile search share.
- American (AT&T) iPad users pay 25x per GB compared to Singapore (Singtel) ones.
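For a feel of the storage pod number above, here is a quick back-of-the-envelope sketch of the per-TB price. It only restates the quoted hardware figures ($7,867 for 67TB); the decimal TB-to-GB conversion is an assumption, and it is not a comparison against S3 pricing.

```python
# Figures quoted in the storage pod item above.
POD_COST_USD = 7867
POD_CAPACITY_TB = 67

cost_per_tb = POD_COST_USD / POD_CAPACITY_TB
# Assumption: decimal units, 1 TB = 1000 GB.
cost_per_gb = cost_per_tb / 1000

print(f"${cost_per_tb:.2f} per TB, ${cost_per_gb:.3f} per GB")
# prints "$117.42 per TB, $0.117 per GB"
```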
noRestart? Last Monday, Twitter’s database took 12 hrs to restart, making the whole ecosystem “unusable”.
Free (legal!) book on Probability and Statistics with R. (Note – this is an introduction to Statistics and Probability using R, NOT an introduction to R using Statistics and Probability.)
37Signals’ database elevator pitch – (1) Use Solid State Disks; (2) Delay sharding as long as you can.
New version of the database SQL Anywhere 12 is now publicly available. Developer free edition can be downloaded here.
Business of Data – Google and CIA co-funding the same data mining startup – Recorded Future.
Next Data Startup idea? Build a common data format for sharing proteomics data. It’s in a huge mess today because everyone speaks a different “language”.
Data Visualization – 30-minute history. The first pie-chart was published in 1801.
Quote of the week – “3NF is typically a selfless model used by Enterprise data warehouse, which is used by the whole company. A star schema is a selfish model, used by a department, because it’s already got aggregation in it.” (Forrester)
Cocktail party cheat-sheet – MySQL cannot do hash joins.
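In case the cocktail party asks what a hash join actually is, here is a minimal sketch of the general build/probe idea in Python. It is an illustration of the technique only, not how any particular database implements it; the function name, the `users`/`orders` rows, and the column names are all made up for the example.

```python
from collections import defaultdict


def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dicts using a build phase and a probe phase."""
    # Build phase: index one input (ideally the smaller one) by its join key.
    buckets = defaultdict(list)
    for row in build_rows:
        buckets[row[build_key]].append(row)

    # Probe phase: one hash lookup per row of the other input.
    joined = []
    for row in probe_rows:
        for match in buckets.get(row[probe_key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical sample data.
users = [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}]
orders = [
    {"order_id": 10, "user_id": 1},
    {"order_id": 11, "user_id": 1},
    {"order_id": 12, "user_id": 3},
]

result = hash_join(users, orders, "user_id", "user_id")
for row in result:
    print(row)
```

Each input is scanned once, so the work grows roughly with the sum of the two input sizes, where a plain nested-loop join without an index compares every pair of rows.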