State of Data #60
August 4, 2011
#architecture – Build ‘Just in time, not Just in case’ – Twitter’s ‘Data Architecture 1.0’ did not contain many of the intuitively obvious strategies (sharding, primary key partitioning, etc.) – “Big Data in Real-Time at Twitter”. Had it not come from Twitter, a tenured technologist might even have scoffed at it – but, hey, they got it done eventually.
#big_data – Facebook, 37Signals, Twitter all did it. But eBay’s swift and successful deployment of 100TB of SSD storage shows that a “singularity” has been reached for Big Data.
Past and present ‘Big Data’ strategies are based on the paradigm of ‘disk access is expensive, avoid it if you can’. Businesses will adapt, revisit and often even ignore that paradigm as disk access becomes 10x faster in about 18 months at today’s price-reliability ratio.
After replacing 100TB of storage in a year, eBay saw a 50% reduction in standard storage rack space, a 78% drop in power consumption and a five-fold boost in I/O performance. That speed boost now allows eBay to bring a new VM online in five minutes, compared to 45 minutes previously.
#DBMS – Save your B, C, D (Business, Customers, Data) from A (Anon attacks) – an excellent pocket reference on ‘SQL Injection’ for MySQL, Oracle and MSSQL
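For readers who want the gist before opening the reference, here is a minimal sketch of the problem and the usual fix. It is not taken from the pocket reference: it uses Python's built-in sqlite3 module as a stand-in for MySQL, Oracle or MSSQL, and the `users` table, the two helper functions and the payload string are all hypothetical names made up for illustration.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is concatenated straight into the SQL text,
    # so quotes inside `name` can change the meaning of the query.
    query = "SELECT id, name FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn, name):
    # Parameterized: the driver passes `name` as data, never as SQL.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# Hypothetical in-memory table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# A classic injection payload: closes the string and adds an always-true clause.
payload = "nobody' OR '1'='1"

unsafe_rows = find_user_unsafe(conn, payload)  # matches every row in the table
safe_rows = find_user_safe(conn, payload)      # matches nothing: no such name

print(len(unsafe_rows), len(safe_rows))
```

The same idea carries over to the other databases, although the placeholder syntax differs by driver (`?`, `%s`, `:name` and so on), so check the driver documentation rather than assuming the form shown here.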
#learning – What is the answer to every question in the world?
(b) It Depends
‘Indexing the WWW – The Journey so Far’ is a must-read for understanding the nuanced trade-offs between supposedly obvious strategies (say, memory-based indexing) and those that rarely win a coffee-table vote (say, disk-based indexing). At some boundary, every solution stops working as advertised. The trick is to find out how far that boundary lies from present business needs.
#visualization – Who uses more storage? Manufacturing wins hands down (Hat tip – Sharat Israni)
- Economics of Keywords in AdWords – Going by Google’s keyword cost analysis, the web looks like a giant engine mainly used to insure (car, health, cord blood), claim and borrow money.
- Trivial Pursuit of Happiness – How the light-hearted, hastily conceived ‘Big Mac Index’ captures the economic pulse better than many far more carefully designed indicators.
- 27,000? How many English words do you know? Does it vary between native and non-native speakers? Climate? Statistically, it is easy to find out if you spend about 6 minutes here.
- Simplicity Wins – Eventually, longer words get obliterated. Simple ‘X-ray’ killed ‘Roentgenogram’. What do we learn from the word-usage histogram after getting access to 4% of books ever printed? Does the mobile spell checker nudge people to use ‘canonical and shorter’ words?
- Google uses 0.01% of the world’s electricity, thanks to its 900,000+ server fleet