State of Data Last Week – Oct 23

<Analysis> Have diminishing returns set in for investments in higher education? A somewhat counter-intuitive analysis of BLS data suggests so.

Nasdaq releases on-demand historical stock data, a nice way to test trading algorithms against accurate bulk data.

<Architecture> Michael Stonebraker clarifies his position: the CAP theorem should not be used as justification for giving up on ACID. Also, “rm -rf”-type errors lie outside the failures the CAP theorem addresses, so no CAP trade-off helps you recover from them.

<Big Data> Showing off? The average Hadoop cluster has 66 nodes and 114 TB of data; eBay's has 8,500 nodes and 2 PB.
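Dividing each quoted total by its node count gives a rough sense of how much data each node holds. This is a back-of-the-envelope sketch. It assumes decimal units (1 PB = 1,000 TB) and treats the quoted figures as raw stored data, ignoring HDFS replication.

```python
# Back-of-the-envelope: average data stored per node, using the figures
# quoted above. Assumes decimal units (1 PB = 1,000 TB) and ignores replication.
def tb_per_node(total_tb: float, nodes: int) -> float:
    """Return terabytes of data per node."""
    return total_tb / nodes

average_cluster = tb_per_node(114, 66)      # about 1.73 TB per node
ebay_cluster = tb_per_node(2 * 1000, 8500)  # about 0.24 TB per node

print(f"average cluster: {average_cluster:.2f} TB/node")
print(f"eBay cluster:    {ebay_cluster:.2f} TB/node")
```

On these figures, eBay stores roughly a seventh as much data per node as the average cluster.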

<DBMS> MySQL Cluster delivers 180K primary-key SELECTs per second and 120K UPDATEs per second on just 2 nodes.
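Splitting those throughput numbers per node shows how small each node's time budget per operation is. This is a rough sketch that assumes the load was spread evenly across the 2 nodes. The "if serial" figure is only an upper bound on per-operation time, because a real node serves many requests concurrently.

```python
# Rough per-node arithmetic for the quoted MySQL Cluster benchmark,
# assuming the load was split evenly across the 2 nodes.
NODES = 2
SELECTS_PER_SEC = 180_000
UPDATES_PER_SEC = 120_000

def per_node(rate_per_sec: int, nodes: int = NODES) -> float:
    """Operations per second handled by each node under an even split."""
    return rate_per_sec / nodes

def microseconds_per_op(rate_per_sec: float) -> float:
    """Average time budget per operation if a node served requests one at a time."""
    return 1_000_000 / rate_per_sec

select_rate = per_node(SELECTS_PER_SEC)  # 90,000 SELECTs/s per node
update_rate = per_node(UPDATES_PER_SEC)  # 60,000 UPDATEs/s per node
print(f"SELECT: {select_rate:,.0f}/s per node, ~{microseconds_per_op(select_rate):.1f} us each if serial")
print(f"UPDATE: {update_rate:,.0f}/s per node, ~{microseconds_per_op(update_rate):.1f} us each if serial")
```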

Cary Millsap continues his “Thinking Clearly about Performance” series for ACM: should you open the window (a global fix) or take off your heavy sweater (a local one)?

<Visualization> Hipmunk's visuals for airline reservations: available flights are aggregated into a single chart and sorted by default by “agony”, a combination of price, duration, layovers and red-eye flights.
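An “agony” sort boils down to folding several criteria into a single score and sorting by it. The sketch below shows that idea only. Hipmunk's actual formula is not described here, so the weights, the `Flight` fields and the sample flights are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Flight:
    name: str
    price_usd: float
    duration_hours: float
    layovers: int
    red_eye: bool

# Hypothetical weights: "agony points" charged per unit of each criterion.
# These are illustrative assumptions, not Hipmunk's real formula.
PRICE_WEIGHT = 1.0      # per dollar
DURATION_WEIGHT = 20.0  # per hour in transit
LAYOVER_WEIGHT = 50.0   # per stop
RED_EYE_PENALTY = 75.0  # flat penalty for an overnight flight

def agony(f: Flight) -> float:
    """Combine price, duration, layovers and red-eye into one score (lower is better)."""
    return (PRICE_WEIGHT * f.price_usd
            + DURATION_WEIGHT * f.duration_hours
            + LAYOVER_WEIGHT * f.layovers
            + (RED_EYE_PENALTY if f.red_eye else 0.0))

flights = [
    Flight("cheap, two stops", 250, 9.0, 2, False),
    Flight("nonstop red-eye", 320, 5.5, 0, True),
    Flight("nonstop daytime", 360, 5.5, 0, False),
]

# Default view: least agonizing flight first.
for f in sorted(flights, key=agony):
    print(f"{agony(f):7.1f}  {f.name}")
```

With these weights, the cheapest fare ranks last, because its two layovers and longer trip cost more agony points than the price difference saves.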


  • Two sides of reference: Wikipedia puts ZERO tracking files on your computer, while another reference site puts 234 cookies / beacons on it.
  • REST is steadily winning the API war over SOAP: 74% of the 2,000 most popular Web APIs now use REST.
  • Why a house numbered below 31 (or with digits adding up to 6) could sell sooner: many buyers are unconsciously ‘nudged’ toward a house whose number matches a birthday or wedding anniversary.
  • Using auto-increment / serial numbers for entities is standard practice in data modeling. It can also give away a lot of secrets: outsiders can use the numbers to estimate iPhone sales, and statisticians have used them to help win (real) wars.
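The “(real) wars” item refers to what is usually called the German tank problem. In World War II, Allied statisticians estimated German tank production from the serial numbers on captured tanks. The sketch below uses the classic estimator m + m/k − 1, where m is the largest serial seen and k is the sample size. It assumes serials run 1, 2, …, N without gaps and that the sample was drawn uniformly at random. The sample serials are hypothetical.

```python
def estimate_total(serials: list[int]) -> float:
    """Estimate how many items exist from a random sample of their serial numbers.

    Uses the German tank problem estimator m + m/k - 1, assuming serials run
    1, 2, ..., N with no gaps and the sample is drawn uniformly at random
    without replacement.
    """
    if not serials:
        raise ValueError("need at least one serial number")
    if len(set(serials)) != len(serials):
        raise ValueError("serial numbers in the sample must be distinct")
    k = len(serials)  # how many items we observed
    m = max(serials)  # largest serial number observed
    return m + m / k - 1

observed = [19, 40, 42, 60]  # hypothetical serials seen in the wild
print(estimate_total(observed))  # 74.0
```

One common mitigation is to keep sequential keys internal and expose random or opaque identifiers externally.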

