State of Data Last Week – July 19

Cool Numbers – “Are we there yet?”

  • To become a top 1,000 website you need at least 4.1 million visitors per month. That is about 3 visitors every 2 seconds!
  • To become a top 500 website you need at least 7.4 million visitors per month.
  • To become a top 100 website you need at least 22 million visitors per month.
  • To become a top 50 website you need at least 41 million visitors per month.
  • To become a top 10 website you need at least 230 million visitors per month.
  • To become the number 1 website in the world, you need more than 540 million visitors per month, i.e., about 210 visitors every second!
  • Your chance of a mid-air collision rose roughly 35% in just the last year! “‘serious’ airspace incursions in the United States rose from 2.44 per million flights to 3.28 per million flights.”
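
The per-second figures and the percentage above are easy to check. Here is a minimal sketch, assuming a 30-day month (the post does not say which month length was used); the function names are made up for illustration.

```python
# Assumption: a "month" is 30 days; the post does not state the convention.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def visitors_per_second(monthly_visitors: float) -> float:
    """Convert a monthly visitor count into an average per-second rate."""
    return monthly_visitors / SECONDS_PER_MONTH


def percent_increase(old: float, new: float) -> float:
    """Relative change from old to new, expressed in percent."""
    return (new - old) / old * 100


# Monthly visitor thresholds quoted in the list above.
thresholds = {
    "top 1,000": 4_100_000,
    "top 500": 7_400_000,
    "top 100": 22_000_000,
    "top 50": 41_000_000,
    "top 10": 230_000_000,
    "number 1": 540_000_000,
}

for rank, monthly in thresholds.items():
    print(f"{rank}: {visitors_per_second(monthly):.2f} visitors/second")

# Airspace incursions per million flights, from the quote above.
print(f"incursion rate change: {percent_increase(2.44, 3.28):.1f}%")
```

Under that assumption, 4.1 million per month works out to a little over 1.5 visitors per second (about 3 every 2 seconds), 540 million per month to a bit under 210 per second, and the incursion rate rose by a little over 34%, which rounds to the 35% quoted.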

CouchDB becomes the first production-ready NoSQL database.

Very large-scale data analytics using Google’s Dremel (PDF from the 36th International VLDB Conference, 2010), or how to use SQL to query tables containing 1 trillion+ records. The tables are stored as nested columnar data, *not* as relational records.

Interesting takeaways –

  1. MapReduce took 5,000 seconds to sort 85 billion records; Dremel took about 15 seconds!
  2. MapReduce benefits from columnar storage just as a typical RDBMS does.
  3. Dremel can achieve a throughput of up to 100 billion records per second on a shared cluster.
  4. Trade accuracy for speed (finish faster with sampling!).
  5. The bulk of the data scan is fast; getting through the last few percent reasonably fast is the challenge (Pareto principle).
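
Takeaway 4 can be illustrated with a toy example. The sketch below is a generic random-sampling illustration on synthetic data, not Dremel's actual mechanism; the function names, data, and the 1% sampling fraction are all assumptions made up for the example.

```python
import random


def exact_mean(values):
    """Full scan: touches every record."""
    return sum(values) / len(values)


def sampled_mean(values, fraction, seed=0):
    """Approximate mean from a random sample covering `fraction` of the records."""
    rng = random.Random(seed)
    k = max(1, int(len(values) * fraction))
    sample = rng.sample(values, k)
    return sum(sample) / k


# Synthetic data (hypothetical): 500,000 integers between 0 and 1000.
data_rng = random.Random(42)
data = [data_rng.randint(0, 1000) for _ in range(500_000)]

exact = exact_mean(data)
approx = sampled_mean(data, fraction=0.01)

print(f"exact mean:  {exact:.2f}")
print(f"sampled (1%): {approx:.2f}")
print(f"abs error:   {abs(exact - approx):.2f}")
```

The sampled version reads only 1% of the records, so it does far less work, at the cost of a small error in the answer. How much error is acceptable depends on the question being asked.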

Data Mining lecture notes from MIT. Great case study!

Data thought leader – Future of Databases – a rumination on SQL, Ruby, C#, and evolution.

Data Visualization –

Ten Maps that changed the world.

Visual History of Data Visualization for the past 500 years.

50 ways of visualizing the BP oil spill.


