State of Data #58

#analysis – Machine Learning Fairy Dust – “Machine learning as a meme is very similar to ‘social’ five to ten years ago: you took an okay-ish concept, added some crowdsourcing, folksonomies and social networking, and there it was, your wonderful Web 2.0 brainchild …”


#architecture – my/new/no/sql – Amazon CTO Werner Vogels and a Facebook DB engineer shred Stonebraker’s tall claim (‘Facebook trapped in MySQL – fate worse than death’; see SoD #56).

Vogels tweeted –

Ouch!! “If you have never developed anything of that scale, you cannot be taken serious if you call for the reengineering of facebook’s data store,”
No troll left behind – “Scaling data systems in real life has humbled me. I would not dare criticize an architecture that holds social graphs of 750M and works.”

Facebook DB engineer – “What happens in real world if one gets 2x efficiency gain? Twice more data can be stored, twice more data intensive products can be launched.
What happens in academia of in-memory databases, if one gets 2x efficiency gain? A paper.”


#big_data – How to avoid Hadoop’s ‘tremendous inefficiency’? Daniel Abadi ruminates –


“The problem with Hadoop is that its strength is also its weakness. Hadoop gives the user tremendous flexibility and power to scale all kinds of different data management problems. This is obviously great. But it is this same flexibility that allows the user to perform incredibly inefficient things and not care …”

#learning – What REALLY kills transactional app performance – if every developer watched this 2-minute video snippet, most applications could be significantly faster.

The ‘Nested Select’ or ‘N+1 problem’ is firing many SQL queries (one to fetch a list, then one more for each item in it) to get what is essentially a single set of underlying data. The metaphor for understanding this anti-pattern is FANTASTIC:


“Would we do this for grocery? Then why would we use this pattern to get data out?

1. Drive to the supermarket
2. Locate what’s needed (e.g., milk)
3. Pay
4. Store the item in the car
5. Drive back home
6. Store the item (e.g., fridge)
7. Then start again for the next item on the shopping list (e.g., corn flakes).”


#visualization – An 80-ft-wide visualization display driven by Space-Time Insight’s analytics



- Top 3 names used as passwords: Maggie, Michael, Jennifer
- 14% of passwords are purely numeric
- The most popular keyboard-pattern password is – drum roll – qwerty


About Nilendu Misra
I love to learn, create and coach. Things I do well: communicating ideas, verbally or through words and diagrams; problem solving, logical or abstract; very large scale systems; thinking about the 'Frighteningly Simple' approach first. Things I intend to do better: establishing stringent process; exchanging tough feedback; keeping up with my reading or to-do list so that I can completely relax.

One Response to State of Data #58

  1. Mohan Arun says:

    On the FB/MySQL debacle, I just remembered CUBRID, positioned as an alternative to high-traffic MySQL installations (> 2mn records).
