State of Data Last Week – #38
March 4, 2011
#analysis – 133M blog posts, 231M social media feeds. 3TB data set collected between Jan 13 and Feb 14, 2011. Data Challenge – culminating in ICWSM @ Barcelona this summer – ‘locate significant posts in the collection which are relevant to the revolutions in Tunisia and Egypt’.
#architecture – Stack Overflow architecture lowdown – how it deals with 800 HTTP requests/second. “Some raw SQL” in the data access layer.
#big_data – MongoDB – apparently works great when the entire data set fits in memory; otherwise it could take ‘up to 17 sec for 30,000 reads’
#DBMS – Big Data is Big Business – Teradata buys Aster Data for $263M
#learning – Why ‘most benchmarks are seriously broken’: ‘complexity and performance model quality are inversely related’ – a great talk on ‘Performance Anxiety’ at Devoxx 2010.
#visualization – RStudio – new IDE for R – got rave reviews and many endorsements from the community
- Happy Birthday, irrationality – Today, March 4, is the 250th anniversary of the proof that Pi is irrational
- API-nomics – 3 calls per day per living human on earth??!! – Google Maps gets 5 billion calls a day; Salesforce gets 50% of its transactions through APIs; Twitter 75%
- Best Questions for a First Date – (some NSFW language warning) analyzing ‘OkCupid’s database of 275,294 match questions—probably the biggest collection of relationship concerns on earth—and the 776 million answers people have given us’
- It’s SO right!! Meta-horoscope ‘made from most common words in 4,000 star sign predictions’ by Visualization Guru David McCandless
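
For a sense of scale, the rates quoted in this week's items (800 HTTP requests/second, 30,000 reads in 17 seconds, 5 billion API calls a day) can be put on a common footing. This is only a back-of-the-envelope sketch: the helper names are made up for illustration, and it assumes each quoted figure is a steady average rather than a peak.

```python
# Back-of-the-envelope conversions for the rates quoted above.
# Assumption: every figure is treated as a steady average, not a peak.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_day(rate_per_second):
    """Convert a per-second rate into a per-day count."""
    return rate_per_second * SECONDS_PER_DAY


def per_second(count_per_day):
    """Convert a per-day count into a per-second rate."""
    return count_per_day / SECONDS_PER_DAY


# Stack Overflow: 800 HTTP requests/second, expressed per day.
stack_overflow_per_day = per_day(800)

# Google Maps: 5 billion API calls a day, expressed per second.
maps_per_second = per_second(5_000_000_000)

# MongoDB worst case quoted: 30,000 reads in 17 seconds.
mongo_reads_per_second = 30_000 / 17
mongo_ms_per_read = 17 / 30_000 * 1000

print(f"Stack Overflow: {stack_overflow_per_day:,} requests/day")
print(f"Google Maps:    {maps_per_second:,.0f} calls/second")
print(f"MongoDB:        {mongo_reads_per_second:,.0f} reads/second "
      f"({mongo_ms_per_read:.2f} ms per read)")
```

On those assumptions, 800 requests/second works out to about 69 million requests a day, and 5 billion calls a day to roughly 58,000 calls a second.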