State of Data Last Week – Nov 06
November 7, 2010
________________________________________________________________
<Inside Intuit> Database Engineer (10+ years of experience) position open in EMS (Reno, Nevada)
Director, Data Offerings position open in BIO (Mountain View, CA)
_________________________________________________________
<Analysis> Expedia removed an optional field (Company) from its “Buy Now” page. Keeping the field had been costing it $12M in profit a year.
<Architecture> Behind the reviews of that pastry shop we like: Yelp uses Amazon Elastic MapReduce (EMR) to analyze about 100GB of data a day with mrjob, a Python framework for writing MapReduce jobs. Yelp took down its in-house Hadoop clusters in May 2010.
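In mrjob, a job is written as `mapper` and `reducer` methods on a subclass of `MRJob`. The sketch below does not use mrjob. It simulates the map, shuffle and reduce phases in plain Python on an in-memory list, to show the data flow a simple word-count job goes through. The function names (`mapper`, `reducer`, `run_job`) and the sample log lines are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token in one input line.
    for word in line.lower().split():
        yield word, 1


def reducer(word: str, counts: Iterable[int]) -> Tuple[str, int]:
    # Sum all the partial counts collected for one key.
    return word, sum(counts)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    # Map phase: apply the mapper to every record.
    # Shuffle phase: group the emitted values by key (a cluster does this across machines).
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return dict(reducer(key, values) for key, values in groups.items())


if __name__ == "__main__":
    # Hypothetical log lines, for illustration only.
    log = ["GET /biz/pastry-shop", "GET /search", "GET /biz/pastry-shop"]
    print(run_job(log))  # {'get': 3, '/biz/pastry-shop': 2, '/search': 1}
```

On EMR the same mapper/reducer split lets the framework run many mapper and reducer instances in parallel; here everything runs in one process.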
<Big Data> A list of references on mining streaming data. MapReduce is not a great fit for streaming (non-stored) data, because the user does not know in advance what data, or how much of it, will need to be analyzed. Yahoo’s S4 is quickly gaining popularity as a distributed stream computing platform.
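One classic building block for mining unbounded streams is reservoir sampling (Algorithm R). It keeps a uniform random sample of fixed size k in a single pass, without knowing the stream length in advance. This is a generic sketch, not part of S4 or any of the referenced papers; the function name and the `seed` parameter are illustrative.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream of unknown length.

    One pass, O(k) memory; the stream size never needs to be known up front.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1),
            # replacing a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive bounds: i + 1 possible values
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    # A generator stands in for an unbounded stream; it is consumed lazily.
    events = (n * n for n in range(1_000_000))
    print(reservoir_sample(events, k=5, seed=42))
```

If the stream ends before k items arrive, the function simply returns everything it saw.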
<DBMS> InnoDB, the faster storage engine for MySQL, is no longer available for free in “Classic MySQL”.
<Learning> Adrian Cockcroft (Netflix Performance Architect; ex-Sun) wrote an excellent set of articles comparing NoSQL availability models and describing what Netflix needs from NoSQL.
<Visualization> The best graphical analysis of the 2010 election, in 10 visuals, comes from The New York Times.
<Numbers>
- How long will search be king? Consumers aged 18–29 are twice as likely as consumers of all ages to discover a product or service through a social network (Facebook).
- Facebook profile photo angled at 15°? Fun-loving. 16°? Uh-oh, risky business. Fast Company analyzes.
- Can Google’s mod_pagespeed really speed up your site 2x? Here’s a quick way to find out, using another proxy.
- Stats on P2P file sharing: Largest torrent: 746GB (all 2010 World Cup soccer matches; ~6GB per 45 min); Oldest: The Matrix in ASCII; Most data transferred by a single torrent: 15.77PB (StarCraft 2)
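A quick sanity check of the World Cup figure, assuming the torrent held all 64 matches of the 2010 tournament at two 45-minute halves each (extra time and penalty shoot-outs are ignored, which is an assumption):

```python
# Assumption: 64 matches (48 group-stage + 16 knockout), regulation time only.
MATCHES = 64
HALVES_PER_MATCH = 2
TORRENT_GB = 746  # size of the largest torrent, from the stats above

halves = MATCHES * HALVES_PER_MATCH
gb_per_half = TORRENT_GB / halves
print(f"{halves} halves, about {gb_per_half:.1f} GB per 45 minutes")
# prints "128 halves, about 5.8 GB per 45 minutes"
```

That lands close to the quoted ~6GB per 45 minutes.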