State of Data Last Week – Sep 26

<Cool Numbers> Why do performance, caching (and maybe color!) matter 5x more today than in 2003, even for the environment?

  • “From 2003 to 2009 the average web page grew from 93.7K to over 507K” (5x)
  • “top 500 home page goes from 507K and 64.7 requests upon initial cache-cleared load to 98.5K and 16.1 requests. On average, caching saves 81 percent of bytes, and 75 percent of the requests.”
  • And how green is it? Viewing a simple web page generates ~20 milligrams of CO2 per second. That rises to 300 mg of CO2 per second for a page with video.
  • A black version of Google would save about 750 megawatt-hours – enough to power 1000 houses for a year.
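The "5x", "81 percent" and "75 percent" figures can be re-derived from the numbers quoted in the bullets. A quick sanity check, as a minimal sketch (the variable and function names are mine, not from the quoted sources):

```python
# Figures quoted in the bullets above (sizes in KB, requests per page load).
page_kb_2003, page_kb_2009 = 93.7, 507.0
cold_kb, warm_kb = 507.0, 98.5            # cache-cleared load vs. cached load
cold_requests, warm_requests = 64.7, 16.1


def percent_saved(before, after):
    """Share of `before` that is avoided, as a percentage."""
    return 100.0 * (before - after) / before


growth = page_kb_2009 / page_kb_2003
bytes_saved = percent_saved(cold_kb, warm_kb)
requests_saved = percent_saved(cold_requests, warm_requests)

print(f"page growth 2003-2009: {growth:.1f}x")
print(f"bytes saved by caching: {bytes_saved:.0f}%")
print(f"requests saved by caching: {requests_saved:.0f}%")
```

The quoted savings hold up: the byte saving rounds to 81 percent and the request saving to 75 percent, and the page growth is a bit over 5x.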

<Analysis> If you’re 25, you probably listen to a lot of …Primus? An analysis of listening data shows how preferences vary with age and gender.

Why you should look beyond the typical “Insights from Top 10” and adopt “Weighted Sort”. Avinash is also the author of two of the best-selling (and ultra great) books on Web Analytics.
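To see why a weighted sort beats a plain "Top 10", here is a minimal sketch. It is not the formula any analytics product actually uses: it simply pulls each row's bounce rate toward the site-wide rate, with low-traffic rows pulled hardest. The keyword data, the `Row` class and the `prior_visits` knob are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Row:
    keyword: str
    visits: int
    bounces: int

    @property
    def bounce_rate(self) -> float:
        return self.bounces / self.visits


def weighted_sort(rows, prior_visits=50):
    """Sort by bounce rate shrunk toward the site-wide rate.

    `prior_visits` is an assumed tuning knob: it acts like that many
    extra visits at the site-wide bounce rate added to every row.
    """
    site_rate = sum(r.bounces for r in rows) / sum(r.visits for r in rows)

    def score(r):
        return (r.bounces + prior_visits * site_rate) / (r.visits + prior_visits)

    return sorted(rows, key=score, reverse=True)


# Hypothetical data: two tiny rows with 100% bounce, two high-traffic rows.
rows = [
    Row("one-off typo", 1, 1),
    Row("brand name", 5000, 1500),
    Row("pricing page", 2000, 1400),
    Row("rare phrase", 2, 2),
]

plain = sorted(rows, key=lambda r: r.bounce_rate, reverse=True)
weighted = weighted_sort(rows)

print("plain:   ", [r.keyword for r in plain])
print("weighted:", [r.keyword for r in weighted])
```

The plain sort leads with keywords that had one or two visits. The weighted version puts the high-traffic row with a genuinely bad bounce rate first, which is the one worth acting on.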

<Data Outage> Persistent cache-to-database lookups under a race condition caused a Facebook outage for 1/10th of the known universe this week.

<Strategy/Arch> Your browsing data could be ever-persistent, thanks to HTML5: cookies that would never, ever go away.

<Learning> Finally! Powerful OCR-enabled search within video lectures. TalkMiner runs OCR whenever slides are shown in a video talk, then adds the slide text and the time it appears to the search metadata.
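The idea of indexing slide text together with the time it appears can be sketched as a tiny inverted index. This is only an illustration of the data structure, not TalkMiner's implementation: the OCR step is assumed to have already produced the text, and the video ids, timestamps and function names are hypothetical.

```python
from collections import defaultdict


def build_index(slides):
    """Map each word to the (video_id, seconds) positions where it appears.

    `slides` is an iterable of (video_id, seconds, ocr_text) tuples; the
    OCR itself is assumed to have happened upstream.
    """
    index = defaultdict(list)
    for video_id, seconds, text in slides:
        for word in set(text.lower().split()):
            index[word].append((video_id, seconds))
    return index


def search(index, word):
    """Return the sorted positions where `word` is shown on a slide."""
    return sorted(index.get(word.lower(), []))


# Hypothetical OCR output for two talks.
slides = [
    ("talk-1", 95, "Caching saves bytes"),
    ("talk-1", 310, "Race conditions"),
    ("talk-2", 42, "Caching layers"),
]
index = build_index(slides)
print(search(index, "caching"))
```

A hit then carries enough information to jump straight to the moment in the video where the slide is on screen.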

<Big Data> Facebook’s data center bill is about $50M/year. Here’s a great analysis of the cost, with an Excel-based cost model: 57% of the cost comes from servers (amortized over 3 years). Should someone be brave enough to buck the trend and run fewer servers, then?

<Visualization> After burglars (see last week’s edition), folks buying weed are joining the data revolution. What is marijuana really worth? Apparently there is a real-time mash-up of price data.

<Cocktail party cheat-sheet> Yes, it IS a legal requirement to generate invoice numbers without gaps in the UK (for VAT-registered businesses). That is, your data store can never “lose” a sequence or auto-generated number. Why? It is more difficult to hide revenue from tax authorities when no invoice can go “missing”.


About Nilendu Misra
I love to learn, create and coach. Things that I do well are - Communicating ideas - verbally or through words and diagrams; Problem Solving - Logical or Abstract; Very Large Scale Systems; think about 'Frighteningly Simple' approach first. Things that I intend to do better are - Establishing Stringent Process; Exchanging Tough Feedback; Keeping up with my reading or To-Do list to be able to completely relax.
