State of Data Last Week – #33
January 30, 2011
#analysis – Daniel Huffman filtered 1.5 million tweets from March and April 2010 and mapped the rate of profanity across America (12 MB PDF map).
#architecture – Ron Bodkin, founder of ‘Think Big Analytics’, discusses big data architecture. He raises some interesting patterns for analyzing large data sets spread across data centers, where fitting most of the data in RAM is not feasible. He also argues that a ‘Data Scientist kind of guy’ needs to ‘notice anomaly’ rather than just be an expert in ‘statistical abstract reasoning’. Even better: ‘Small and big data are really a continuum’, so small data can use the same practices and tools and benefit tremendously as well.
#big_data – Datamarket.com goes live with 13,000 data sets, 100M time series, and 600M facts, including data from the UN, World Bank, Eurostat, Gapminder, etc.
#career – How to write a ‘noSQL CV’
#DBMS – The latest ACM issue ruminates about “System Administration Soft Skills”; all of it applies just as well to other ‘backend’ technical jobs (DBAs, etc.).
#learning – Very nice SQL to Pig (Hadoop) reference cheat-sheet
#visualization – Intuit releases “Small Business By The Numbers” visualization (larger version). Does it really take a day and $109 to open a start-up in New Zealand?
#etc
- Not just games anymore – humans have collectively spent almost 6 million years playing “World of Warcraft”
- Why do buses come in threes? A nifty data app ‘shows not just what bus to take..but also which of the approaching buses that will take you there have any seats left’
- A millisecond of speed boost could cost $80,000. Speaking of network latency, this is how Netflix streaming APIs perform on top ISP networks: HD streams run at about 4,800 Kbits/sec, so CableVision is either really fast, or most of its customers just watch a lot of recent (i.e., HD) movies.
- United States of Surname visualization – ‘not just Smiths and Johnsons – but also of Garcias and Nguyens’