State of Data #76
December 2, 2011
#analysis – Analytics – The Widening Divide (MIT-IBM Study) – companies fall into three groups:
- Aspirational companies just thinking about analytics,
- Experienced companies with some solid progress on analytics,
- Transformed companies, those with advanced capabilities and significant results.
#architecture – Realtime Data Mining at 120,000 tweets per second –
“Data is extracted out of the firehose and normalized. Twitter data is highly dimensional, it has 30 plus attributes and you get access to them all. These attributes include geolocation, name, profile data, the tweet itself, timestamp, number of followers, number of retweets, verified user status, client type, etc.”
“All males would be 50% of the firehose or 125 million tweets as there’s a 50-50 male female split on Twitter. Creating a filter for all tweets made by males would not be bright. It would be very expensive. What you want to do is look at use cases. Are you a bank, are you a pharma, and figure out what you are interested in specifically.”
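The arithmetic behind that advice can be sketched in a few lines. This is only an illustration built on the numbers in the quote (50% of the firehose being 125 million tweets); the function names and the tweet fields `text` and `verified` are hypothetical stand-ins for the normalized attributes mentioned above, not the actual schema of the system described.

```python
# Figures taken from the quote: males are 50% of the firehose, or 125 million tweets.
MALE_SHARE = 0.50
MALE_TWEETS = 125_000_000

# Total firehose volume implied by the quote's own numbers.
FIREHOSE_TOTAL = MALE_TWEETS / MALE_SHARE


def matched_volume(selectivity, total=FIREHOSE_TOTAL):
    """Number of tweets a filter lets through, given the fraction it matches."""
    if not 0.0 <= selectivity <= 1.0:
        raise ValueError("selectivity must be between 0 and 1")
    return selectivity * total


def matches(tweet, keywords, verified_only=False):
    """Narrow, use-case-driven filter on a normalized tweet (hypothetical fields)."""
    if verified_only and not tweet.get("verified", False):
        return False
    words = set(tweet.get("text", "").lower().split())
    return bool(words & keywords)


# A broad demographic filter passes half the stream...
broad = matched_volume(MALE_SHARE)
# ...while a hypothetical keyword filter matching 0.1% of tweets passes far less.
narrow = matched_volume(0.001)
```

The broad filter still has to deliver and process half of everything, which is the expense the quote warns about; a filter scoped to a specific use case (a bank watching for a handful of terms, say) cuts the volume by orders of magnitude.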
#big_data – Drowning in Data (video; 2’21”) – ‘bandwidth’ of Tweet creation is 46MBPS; a big source of big data is now wireless sensors; do we need the biggest unit of measurement (yotta; a 1 followed by 24 zeros)?
#conference – Most interesting papers from InfoVis conference 2011
#Data_Science – ‘Teaching Statistics’ (PDF) from Andrew Gelman is a must for a quick view. E.g., why are the counties with the highest kidney cancer death rates mostly in the center-west? And how do you then explain that the center-west region also has the lowest kidney cancer death rates?
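The post leaves the puzzle open. The usual explanation for this kind of pattern is small-sample variability: sparsely populated counties produce both the highest and the lowest observed rates even when the underlying rate is the same everywhere. Below is a minimal simulation of that idea, not taken from the PDF. The rate, the county sizes and the county counts are all made-up assumptions.

```python
import math
import random

TRUE_RATE = 1e-4  # assumed: identical underlying death rate in every county


def poisson(lam, rng):
    """Draw a Poisson-distributed count (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate(populations, rng):
    """Return (population, observed_rate) for each county."""
    return [(pop, poisson(pop * TRUE_RATE, rng) / pop) for pop in populations]


rng = random.Random(0)
# Hypothetical mix: 200 small counties and 200 large ones.
populations = [1_000] * 200 + [100_000] * 200
counties = simulate(populations, rng)

highest = max(counties, key=lambda c: c[1])
small_rates = [rate for pop, rate in counties if pop == 1_000]
large_rates = [rate for pop, rate in counties if pop == 100_000]

print("highest observed rate is in a county of size", highest[0])
print("small counties: min %.5f, max %.5f" % (min(small_rates), max(small_rates)))
print("large counties: min %.5f, max %.5f" % (min(large_rates), max(large_rates)))
```

In a small county a single death (or none) swings the observed rate far above or below the true value, so both extremes of the ranking end up dominated by small counties.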
#DBMS – A Two-Year Case Study on NoSQL
#idea – Data is the new .com?
“..mere presence of data is not itself an indicator of having deep and relevant data DNA. “Hey, our business generates a lot of data, BIG DATA” is a phrase I hear frequently which I assume is supposed to get me excited. It doesn’t. “Hey, we’ve got this thesis that as our business scales we’re going to build a monster data asset that can better help us attract, retain and monetize happy customers. It will help us create competitive barriers and we’re planning for this from Day 1. We’ve shared some early data with a data-hacker buddy and feel this is a promising avenue for building company value.” Hey now, NOW you’ve got my attention.”
#learning – How to do heat map like ‘Color Scales’ in Excel
#visualization – Bloomberg is DATA – the “About” page of a company transformation
- Music of Math – Google engineer Alexander Chen created baroque.me – a “playable visualization of the famous first prelude from Bach’s cello suites.” “Using the mathematics behind string length and pitch,” Chen explains, “it came from a simple idea: what if all the notes were drawn as strings?”
- Spot data ‘overfitting’ in analysis
- #censusData How did the recession impact state-to-state migration?
- Wealth of Data != Data of Wealth (Hat tip: Siddharth Ram)