State of Data #81
January 13, 2012
‘studied 51,854 reviews contributed to Amazon, covering 858 books from 2000 to early 2004. We found that the order in which reviews are written matters a great deal: Some newly posted reviews tend to disagree with existing reviews, instead of only focusing on the book.’
#architecture – IBM’s Architecture for Astronomical Big Data
‘A main design challenge is how to process one Exabyte of raw data per day. This is the data amount anticipated when the SKA system as the world’s largest and most sensitive radio telescope will be ready; its construction will start in 2016. IBM claims that this data amount exceeds the entire daily Internet traffic. The amount would suffice to fill over 15 million 64 GB iPods.’
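The iPod comparison is easy to sanity-check. A quick sketch, assuming decimal units (1 EB = 10^18 bytes, 1 GB = 10^9 bytes); the function and constant names are made up for illustration:

```python
# Assumption: decimal (SI) units, 1 EB = 10**18 bytes and 1 GB = 10**9 bytes.
EXABYTE = 10**18
IPOD_CAPACITY = 64 * 10**9  # one 64 GB iPod


def devices_filled(total_bytes: int, device_bytes: int) -> int:
    """Number of devices that total_bytes would fill completely."""
    return total_bytes // device_bytes


ipods = devices_filled(EXABYTE, IPOD_CAPACITY)
print(f"{ipods:,} iPods per day")  # 15,625,000 iPods per day
```

So one Exabyte per day works out to roughly 15.6 million 64 GB devices, which agrees with the 'over 15 million' in the quote.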
#big_data – Can Data Science predict Hit songs? Hey ya! They say you can ‘score’ your own song real soon. Insights –
- Before the eighties, the danceability of a song was not very relevant to its hit potential. From then on, danceable songs were more likely to become a hit. Also the average danceability of all songs on the charts suddenly increased in the late seventies.
- In the eighties slower musical styles (tempo 70-89 beats per minute), such as ballads, were more likely to become a hit.
#Data_Science – PageRank algorithm to find the ‘Best Cricket Team’ (pdf) and ‘Best Captains’ in different formats of the game
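For a feel of how PageRank can rank teams, here is a minimal sketch. It is not the method from the linked paper: it assumes each match is an edge from loser to winner (a loss passes credit to the winner), uses plain power iteration with the usual 0.85 damping factor, and runs on three hypothetical teams A, B and C with invented results.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over (loser, winner) pairs.

    Assumption for this sketch: every loss is a 'vote' for the winner.
    """
    nodes = sorted({team for edge in edges for team in edge})
    out_links = {team: [] for team in nodes}
    for loser, winner in edges:
        out_links[loser].append(winner)

    n = len(nodes)
    rank = {team: 1.0 / n for team in nodes}
    for _ in range(iterations):
        # Rank held by teams with no losses (no out-links) is spread evenly.
        dangling = sum(rank[t] for t in nodes if not out_links[t])
        new_rank = {t: (1 - damping) / n + damping * dangling / n for t in nodes}
        for team in nodes:
            for winner in out_links[team]:
                new_rank[winner] += damping * rank[team] / len(out_links[team])
        rank = new_rank
    return rank


# Hypothetical results: A beat B and C, B beat C.
matches = [("B", "A"), ("C", "A"), ("C", "B")]
scores = pagerank(matches)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Beating a strong team is worth more than beating a weak one, because the credit a loser passes on scales with its own score. That is the appeal of PageRank over a plain win count.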
#DBMS – Jonathan Lewis’ ‘Oracle Core: Essential Internals’ has already been dubbed ‘likely be the best Oracle internals book out there for the coming 10 years’ by folks at the top of the trade.
#idea – Next time you go to a doctor for a physical, your data collection may be ‘gamified’ and a whole lot more fun, thanks to TonicHealth
#learning – ‘Modeling with Data – Tools and Techniques for Scientific Computing’ – the full book is now available from the author.
‘When I talk to a statistician, a model means a probability distribution over elements, and that’s about it. I’d start talking to a statistician about modeling subject-specific knowledge about the interaction of elements, and giant question marks would appear over his head. Which is not to say that the person is a moron, but just that his understanding of the meaning of the word model is much more narrowly focused than mine.’
#visualization – Visualize CPU Utilization in a Large Data Center – models and approaches
#etc
- ‘Millions of hours of supercomputing’ proves that the minimum number of clues needed to complete a 9×9 Sudoku grid is 17 (most newspaper puzzles have 25 clues)
- Place this gadget, called SnapShot, under your car dashboard for 30 days and it will generate a report of how much you drive, at what time of day, and how many sudden stops you make.
- Map of the World where countries are weighted by the number of languages they produced
- Why do online merchants want you to drink? Simple: their sales peak when you do!