State of Data #63

#analysis – 46-page Internet Marketing Strategy “briefing looking at customer centricity, channel diversification, data, social media and content strategy. This is their usual high grade quality and worth a look”

Disdain Data Diving – “Today’s Big Data heavy-lifting machines and software systems were built back in the day when millions of customers made millions of phone calls and each one had to be captured, stored, and found in a heartbeat. Banking and credit card transactions by the billions had to be put into safekeeping somewhere they could be added up, averaged, and recalled if need be.”

#architecture – MongoDB loves BSON (Binary JSON) for Data Exchange:

“Fast scan-ability. For very large JSON documents, scanning can be slow. To skip a nested document or array we have to scan through the intervening field completely. In addition as we go we must count nestings of braces, brackets, and quotation marks. In BSON, the size of these elements is at the beginning of the field’s value, which makes skipping an element easy.”
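To make the "size at the beginning" point concrete, here is a minimal sketch in Python. It hand-builds a tiny BSON document using only the standard `struct` module and then looks up a top-level field, jumping over a nested document by reading its length prefix instead of parsing it. The helper names (`bson_int32`, `bson_document`, `find_top_level` and so on) are made up for this illustration and are not part of any MongoDB driver. Only three element types are handled (int32, string, embedded document/array), so treat it as a sketch rather than a real parser.

```python
import struct

# Hypothetical helpers for illustration only; not a MongoDB driver API.

def bson_int32(name, value):
    # Type 0x10, NUL-terminated field name, little-endian int32 value.
    return b"\x10" + name.encode() + b"\x00" + struct.pack("<i", value)

def bson_string(name, value):
    # Type 0x02, name, int32 byte length (including the trailing NUL), bytes, NUL.
    data = value.encode() + b"\x00"
    return b"\x02" + name.encode() + b"\x00" + struct.pack("<i", len(data)) + data

def bson_document(*elements):
    # A document is: int32 total size, the elements, then a terminating NUL.
    body = b"".join(elements) + b"\x00"
    return struct.pack("<i", len(body) + 4) + body

def bson_embedded(name, document):
    # Type 0x03, name, then a complete document (which carries its own size).
    return b"\x03" + name.encode() + b"\x00" + document

def find_top_level(doc, wanted):
    """Return (type_byte, value_offset) of a top-level field, or None.

    Non-matching fields are skipped by size; nested documents are never parsed.
    """
    (total,) = struct.unpack_from("<i", doc, 0)
    pos, end = 4, total - 1  # the last byte is the document terminator
    while pos < end:
        type_byte = doc[pos]
        name_end = doc.index(b"\x00", pos + 1)
        name = doc[pos + 1:name_end].decode()
        value_pos = name_end + 1
        if type_byte == 0x10:
            size = 4
        elif type_byte == 0x02:
            (length,) = struct.unpack_from("<i", doc, value_pos)
            size = 4 + length
        elif type_byte in (0x03, 0x04):
            # The length prefix covers the whole nested document or array.
            (size,) = struct.unpack_from("<i", doc, value_pos)
        else:
            raise ValueError(f"type 0x{type_byte:02x} not handled in this sketch")
        if name == wanted:
            return type_byte, value_pos
        pos = value_pos + size  # one jump, regardless of nesting depth
    return None

# A large nested document followed by the field we actually want.
nested = bson_document(*(bson_string(f"k{i}", "x" * 50) for i in range(1000)))
doc = bson_document(bson_embedded("big", nested), bson_int32("answer", 42))

type_byte, offset = find_top_level(doc, "answer")
answer = struct.unpack_from("<i", doc, offset)[0]
print(answer)
```

The lookup touches the `big` field once, reads four bytes of length, and moves on. The equivalent JSON scan would have to walk every character of the nested object while tracking braces and quotes.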

#big_pig_data – Angry Birds is played 1.4B minutes a week. Now, they have tied up with a predictive analytics solution provider to help forecast pig smashing abilities.

#Data_Science –   Multiple packages in R to read online datasets 

 – A phenomenal paper from NoCOUG on ‘NFS Tuning for Oracle’ (PDF) by Kyle Hailey. 

 #idea – Facebook engineer suggests reducing disk RPM to reduce data center power cost


Item | Value
Normal Speed | 7200 RPM
Reduced Speed | 3600 RPM
State Transition (triggered by an OS command) | 15 seconds
Normal Idle Power | 7W
Reduced Speed Idle Power | 3W
Normal Bandwidth | ~100 MB/s
Reduced Speed Bandwidth | >10 MB/s
Normal Latency | ~10 ms
Reduced Speed Latency | <100 ms
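As a rough back-of-the-envelope check on what those idle power figures could mean, here is a small Python sketch. Only the 7W and 3W values come from the table; the fleet size and the fraction of time spent at reduced speed are assumptions picked purely for illustration, and the cost of the 15-second state transitions is ignored.

```python
# Idle power figures from the table above.
NORMAL_IDLE_W = 7.0
REDUCED_IDLE_W = 3.0
HOURS_PER_YEAR = 24 * 365

def idle_savings_kwh_per_year(drives, fraction_reduced):
    """Energy saved per year (kWh) when `drives` disks spend
    `fraction_reduced` of their time idling at the reduced speed."""
    saved_watts = (NORMAL_IDLE_W - REDUCED_IDLE_W) * drives * fraction_reduced
    return saved_watts * HOURS_PER_YEAR / 1000.0

# One drive idling at reduced speed all year.
per_drive = idle_savings_kwh_per_year(1, 1.0)

# Assumed fleet: 10,000 drives at reduced speed half the time (illustrative only).
fleet = idle_savings_kwh_per_year(10_000, 0.5)

print(f"{per_drive:.2f} kWh per drive per year")
print(f"{fleet:,.0f} kWh per year for the assumed fleet")
```

The per-drive saving is 4W, a bit over half of the normal idle draw. Whether that is worth the lower bandwidth and higher latency depends on how cold the data on those drives really is.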



 #learning – What every Data Programmer Needs to know about Disks (PPT; from OSCON 2011) – very highly recommended especially for ‘Why EC2 I/O is Slow and Unpredictable’ –

Newer Intel chips have the northbridge controller on-die. Southbridge bandwidth is usually <= 10GB/sec, and you are sharing this with other customers’ network and disk I/O. That, and you may be sharing drive spindles.



#visualization – Stanford’s ‘Republic of Letters’ visualization – “on database of thousands of letters exchanged between prominent intellectuals in the 17th and 18th centuries” – is built with HTML5. It offers connection, volume and flow views of over 55,000 letters exchanged among 6,400 correspondents.



About Nilendu Misra
I love to learn, create and coach. Things that I do well are - Communicating ideas - verbally or through words and diagrams; Problem Solving - Logical or Abstract; Very Large Scale Systems; think about 'Frighteningly Simple' approach first. Things that I intend to do better are - Establishing Stringent Process; Exchanging Tough Feedback; Keeping up with my reading or To-Do list to be able to completely relax.
