State of Data #81
January 13, 2012
‘studied 51,854 reviews contributed to Amazon, covering 858 books from 2000 to early 2004. We found that the order in which reviews are written matters a great deal: Some newly posted reviews tend to disagree with existing reviews, instead of only focusing on the book.’
#architecture – IBM’s Architecture for Astronomical Big Data
‘A main design challenge is how to process one Exabyte of raw data per day. This is the data amount anticipated when the SKA system as the world’s largest and most sensitive radio telescope will be ready; its construction will start in 2016. IBM claims that this data amount exceeds the entire daily Internet traffic. The amount would suffice to fill over 15 million 64 GB iPods.’
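The iPod comparison is easy to sanity-check. Here is a quick back-of-the-envelope sketch, assuming decimal units (1 EB = 10^18 bytes, 1 GB = 10^9 bytes); the variable names are just for illustration:

```python
# Assumption: decimal (SI) units, not binary exbibytes/gibibytes.
EXABYTE = 10**18
GIGABYTE = 10**9

ipod_capacity = 64 * GIGABYTE

# How many 64 GB iPods does one Exabyte of raw data fill?
ipods_per_day = EXABYTE // ipod_capacity

print(f"{ipods_per_day / 1e6:.3f} million iPods per day")
```

Under those assumptions the result is a little over 15.6 million devices per day, which matches the 'over 15 million' in the quote.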
- Before the eighties, the danceability of a song was not very relevant to its hit potential. From then on, danceable songs were more likely to become hits. Also, the average danceability of all songs on the charts suddenly increased in the late seventies.
- In the eighties, slower musical styles (tempo 70-89 beats per minute), such as ballads, were more likely to become hits.
#Data_Science – PageRank algorithm to find the ‘Best Cricket Team’ (pdf) and ‘Best Captains’ in different formats of the game
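To give a feel for how PageRank can rank teams, here is a minimal sketch. It is not taken from the linked paper: the graph construction (each loss is a link from the loser to the winner), the damping factor of 0.85 and the match results are all assumptions made up for illustration.

```python
def pagerank(results, damping=0.85, iterations=100):
    """Rank teams from (loser, winner) pairs using plain power iteration.

    Assumption: every loss acts as a link from the loser to the winner,
    so a team gains rank by beating teams that are themselves highly ranked.
    """
    teams = sorted({team for pair in results for team in pair})
    out_links = {team: [] for team in teams}
    for loser, winner in results:
        out_links[loser].append(winner)

    n = len(teams)
    rank = {team: 1.0 / n for team in teams}
    for _ in range(iterations):
        # Every team starts each round with the "random jump" share.
        new_rank = {team: (1.0 - damping) / n for team in teams}
        for team in teams:
            # An undefeated team has no outgoing links; spread its rank evenly.
            targets = out_links[team] or teams
            share = damping * rank[team] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical results: B lost to A, C lost to A, C lost to B.
results = [("B", "A"), ("C", "A"), ("C", "B")]
ranks = pagerank(results)
ordering = sorted(ranks, key=ranks.get, reverse=True)
print(ordering)
```

With these made-up results the undefeated team A comes out on top and the winless team C at the bottom. A real analysis would likely weight links by margin of victory or recency, which this sketch ignores.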
#idea – Next time you go to a doctor for a physical, your data collection may be ‘gamified’ and a whole lot more fun thanks to TonicHealth
#learning – ‘Modeling with Data – Tools and Techniques for Scientific Computing’ – the full book is now available from the author.
‘When I talk to a statistician, a model means a probability distribution over elements, and that’s about it. I’d start talking to a statistician about modeling subject-specific knowledge about the interaction of elements, and giant question marks would appear over his head. Which is not to say that the person is a moron, but just that his understanding of the meaning of the word model is much more narrowly focused than mine.’
#visualization – Visualize CPU Utilization in a Large Data Center – models and approaches
- ‘Millions of hours of supercomputing’ prove that the minimum number of clues needed for a uniquely solvable 9×9 Sudoku grid is 17 (most newspaper puzzles have 25 clues)
- Place this gadget, called SnapShot, under your car dashboard for 30 days and it will generate a report of how much you drive, at what time of day, and how many sudden stops you make.
- Map of the world where countries are weighted by the number of languages they produced
- Why do online merchants want you to drink? Simple: their sales peak when you do!