State of Data Last Week – #30
January 8, 2011
#analysis – Social Data Mining to forecast economic crisis (PDF) – was 2008 really the first ‘financial crisis sparked by Big Data’?
#architecture – The difference between machine-generated and human-generated data is simple – machine-generated data scales linearly with computing power
#big_data – Billion Prices Project by MIT tracks price fluctuations of >5M items sold by 300 retailers in more than 70 countries
#DBMS – Historical Perspective of ORM and Alternatives (caveat – as he mentions, the author ranks pretty high on ‘orm bad’ Google searches ;-) )
#learning – Hans Rosling’s ‘Joy of Stats’ – the whole 59 minutes – is now online.
#outage – ‘100% Data Recovery with unfortunate exception’ (!) from the Dec 31 Hotmail outage
#visualization – Interactive Map of Census data block by block ‘including indicators such as ethnic groups, income, housing, families and education’.
- X and Z preservation medicine – Why are so many prescription drugs loaded with X’s and Z’s? Think Scrabble – ‘x and z count for 8 and 10 points’; i.e., those letters are infrequent, so words containing them ‘stand out’ in running text. Z makes up about 0.07% and X about 0.15% of letters in English text.
- Bad Benchmark – 15% of web users may not receive compressed (gzipped) content even if they have ‘modern’ browsers, thanks to ‘proxies and security software’ mangling HTTP headers
- Sometimes numbers are just numbers, with no significance – The creator of the ‘answer to life, the universe and everything’ put the rumors behind ‘42’ to rest way back in 1993
- Do people living in areas with a higher number of mobile phone towers have more children? The difference between correlation and causation is typically illustrated by ‘US Highway Fatality Rate goes up linearly with Fresh Lemons imported from Mexico’
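
The towers-and-children item can be illustrated with a small sketch. All numbers below are made up, and the choice of population as the hidden variable is an assumption for illustration only, not something taken from the linked article. The point is just that two quantities driven by the same third factor can show a near-perfect correlation without either one causing the other.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: population (in thousands) for six invented areas.
# This is the assumed hidden variable behind both series.
population = [10, 20, 35, 50, 80, 120]

# Both series are derived from population only; neither depends on the other.
towers = [p // 10 + 1 for p in population]   # invented rule: ~1 tower per 10k people
births = [p * 12 + 5 for p in population]    # invented rule: ~12 births per 1k people

r = pearson(towers, births)
print(f"correlation between towers and births: {r:.3f}")
```

The coefficient comes out close to 1 even though the tower count never enters the births calculation, which is the same trap as the lemons-and-highway-fatalities chart.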