State of Data #58
July 22, 2011
#architecture – my/new/no/sql - Amazon CTO Werner Vogels and a Facebook DB engineer shred Stonebraker’s tall claim (‘Facebook trapped in MySQL – fate worse than death’; see SoD #56).
Vogels tweeted -
Ouch!! “If you have never developed anything of that scale, you cannot be taken serious if you call for the reengineering of facebook’s data store,”
no troll left behind – “Scaling data systems in real life has humbled me. I would not dare criticize an architecture that the holds social graphs of 750M and works”.
Facebook DB engineer – “What happens in real world if one gets 2x efficiency gain? Twice more data can be stored, twice more data intensive products can be launched.
What happens in academia of in-memory databases, if one gets 2x efficiency gain? A paper.”
#big_data – How to avoid Hadoop’s ‘tremendous inefficiency’? Daniel Abadi ruminates –
“The problem with Hadoop is that its strength is also its weakness. Hadoop gives the user tremendous flexibility and power to scale all kinds of different data management problems. This is obviously great. But it is this same flexibility that allows the user to perform incredibly inefficient things and not care …”
#learning – What REALLY kills transactional app performance - if all developers watched this 2-minute video snippet, most applications could be significantly faster.
The ‘Nested Select’ or ‘N+1 problem’ is firing many separate SQL statements, one per row, to fetch what is essentially a single set of underlying data. The metaphor to understand this anti-pattern is FANTASTIC –
“Would we do this for grocery? Then why would we use this pattern to get data out?
1. Drive to the super market
2. Locate what’s needed (e.g., milk)
3. Pay
4. Store the item in the car
5. Drive back home
6. Store the item (e.g., fridge)
7. Then start again for the next item on the shopping list (e.g., corn flakes).”
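To make the grocery metaphor concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The tables (shopping_list, shelf), their columns and the sample rows are made up for illustration and do not come from the video. The first function makes one "trip" per item (N+1 queries); the second gets the same result with a single JOIN.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE shopping_list (id INTEGER PRIMARY KEY, item TEXT);
        CREATE TABLE shelf (item TEXT PRIMARY KEY, aisle INTEGER);
        INSERT INTO shopping_list (item) VALUES ('milk'), ('corn flakes'), ('eggs');
        INSERT INTO shelf (item, aisle) VALUES ('milk', 3), ('corn flakes', 7), ('eggs', 3);
        """
    )
    return conn


def fetch_n_plus_one(conn):
    """Anti-pattern: one query for the list, then one extra query per item."""
    queries = 1
    items = [row[0] for row in conn.execute("SELECT item FROM shopping_list ORDER BY id")]
    result = []
    for item in items:
        # A separate round trip to the database for every single item.
        aisle = conn.execute("SELECT aisle FROM shelf WHERE item = ?", (item,)).fetchone()[0]
        queries += 1
        result.append((item, aisle))
    return result, queries


def fetch_single_join(conn):
    """Same data in one round trip: let the database join the two tables."""
    rows = conn.execute(
        "SELECT l.item, s.aisle "
        "FROM shopping_list AS l JOIN shelf AS s ON s.item = l.item "
        "ORDER BY l.id"
    ).fetchall()
    return rows, 1


if __name__ == "__main__":
    conn = build_db()
    print(fetch_n_plus_one(conn))
    print(fetch_single_join(conn))
```

Both functions return the same rows; only the number of round trips differs (N+1 versus 1). With an in-memory database the gap is barely visible, but over a network each extra query adds latency, which is where the pattern tends to hurt.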
#visualization – ‘80-ft wide visualization display driven by Space-Time Insight’s analytics’
#etc
- Num3ers – Santa Cruz police apply predictive analysis to forecast the next crime with 71% accuracy. [ed. Could 911 calls one day be reversed? If one’s house is about to be broken into, or a heart attack is imminent, 911 calls us.]
- Real time Data Drug raids – Wow!
- Why do hotel guests leave disproportionately high positive reviews? Because of more human contact?
- Password entropy – wonderful analysis of recently leaked passwords.
§ Top 3 names as password – Maggie, Michael, Jennifer
§ 14% of passwords are purely numeric
§ Most popular keyboard pattern password is – drum roll – qwerty
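For a rough feel of the arithmetic behind numbers like these, here is a small Python sketch. It is not the method used in the linked analysis. It uses a naive, commonly quoted estimate (password length times log2 of the character-set size) and a helper that measures the share of purely numeric passwords. The function names and the sample list are hypothetical.

```python
import math
import string


def charset_size(password):
    """Size of the character pool the password appears to draw from (naive guess)."""
    size = 0
    if any(c in string.ascii_lowercase for c in password):
        size += 26
    if any(c in string.ascii_uppercase for c in password):
        size += 26
    if any(c in string.digits for c in password):
        size += 10
    if any(c in string.punctuation for c in password):
        size += len(string.punctuation)
    return size


def naive_entropy_bits(password):
    """Naive estimate: length * log2(pool size). Assumes random, independent characters."""
    size = charset_size(password)
    return len(password) * math.log2(size) if size else 0.0


def numeric_share(passwords):
    """Fraction of passwords made up of digits only."""
    if not passwords:
        return 0.0
    return sum(p.isdigit() for p in passwords) / len(passwords)


if __name__ == "__main__":
    sample = ["maggie", "qwerty", "123456", "111111"]  # made-up sample, not the leaked data
    for p in sample:
        print(p, round(naive_entropy_bits(p), 1))
    print(numeric_share(sample))
```

The naive figure badly overstates the strength of names and keyboard patterns, since an attacker tries those first; treat it as an upper bound at best.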
On the FB/MySQL debacle, I just remembered CUBRID, positioned as an alternative to high-traffic MySQL installations (> 2mn records).
http://www.netmagazine.com/news/cubrid-takes-mysql