State of Data Last Week – Sep 26
September 25, 2010
<Cool Numbers> Why performance, caching (and maybe color!) matter 5x more today than in 2003 – even for the environment?
- “From 2003 to 2009 the average web page grew from 93.7K to over 507K” (5x)
- “top 500 home page goes from 507K and 64.7 requests upon initial cache-cleared load to 98.5K and 16.1 requests. On average, caching saves 81 percent of bytes, and 75 percent of the requests.”
- And how green is it? Viewing a simple web page generates ~20 milligrams of CO2 per second. That rises to 300 mg of CO2 per second for a page with video.
- A black version of Google would save about 750 megawatt-hours – enough to power 1000 houses for a year.
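As a quick sanity check on the page-weight and caching figures quoted above, the arithmetic can be sketched as follows. The variable names and the `percent_saved` helper are illustrative, not taken from the original study; only the numbers come from the quotes.

```python
# Figures quoted above (sizes in kilobytes, requests per page load).
avg_page_2003_kb = 93.7
avg_page_2009_kb = 507.0
cold_kb, warm_kb = 507.0, 98.5              # cache-cleared load vs. cached load
cold_requests, warm_requests = 64.7, 16.1


def percent_saved(before, after):
    """Return the percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100


growth = avg_page_2009_kb / avg_page_2003_kb
bytes_saved = percent_saved(cold_kb, warm_kb)
requests_saved = percent_saved(cold_requests, warm_requests)

print(f"growth: {growth:.1f}x")
print(f"bytes saved: {bytes_saved:.0f}%")
print(f"requests saved: {requests_saved:.0f}%")
```

The results round to roughly 5.4x growth, 81 percent of bytes saved and 75 percent of requests saved, which matches the quoted claims.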
<Analysis> If you’re 25, you probably listen to a lot of …Primus? Last.fm analyzed its data to learn how listening preferences vary with age and gender.
<Data Outage> Persistent cache-database lookups under a race condition caused a Facebook outage for 1/10th of the known universe this week.
<Strategy/Arch> Your browsing data could be ever-persistent, thanks to HTML5: cookies that would never, ever go away.
<Learning> Finally! Powerful OCR-enabled search within video lectures. TalkMiner runs OCR whenever slides are shown in a video talk, and adds the slide content and the time it appears to the search metadata.
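To make the idea of "content plus the time it appears" concrete, here is a minimal sketch of a time-stamped inverted index. It is not TalkMiner's actual implementation: the `build_index` function, the tuple layout and the sample slides are all hypothetical, and the OCR step is assumed to have already produced the slide text.

```python
from collections import defaultdict


def build_index(slides):
    """Map each lowercased word to the (video_id, seconds) pairs where it appears.

    `slides` is an iterable of (video_id, seconds, ocr_text) tuples; the OCR
    text is assumed to have been extracted already.
    """
    index = defaultdict(list)
    for video_id, seconds, text in slides:
        # Use a set so a word repeated on one slide is recorded only once.
        for word in set(text.lower().split()):
            index[word].append((video_id, seconds))
    return index


# Hypothetical sample data: (video id, offset in seconds, text found on the slide).
slides = [
    ("talk-1", 95, "Caching saves bytes"),
    ("talk-1", 310, "Race conditions in caching"),
    ("talk-2", 42, "Gapless invoice numbers"),
]

index = build_index(slides)
hits = sorted(index["caching"])
print(hits)
```

A search for "caching" then returns both the talk and the offsets to jump to, here 95 and 310 seconds into `talk-1`.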
<Big Data> Facebook’s data center bill is about $50M/year. Here’s a great analysis of the cost and an Excel-based cost model – 57% of the cost is from servers (amortized over 3 years). Should someone be brave enough to buck the trend and have fewer servers, then?
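A back-of-the-envelope version of that question can be sketched from the three figures above. This is not the linked Excel model: it assumes straight-line amortization with no cost of capital, and it assumes (unrealistically) that non-server costs such as power and cooling stay fixed when the fleet shrinks. The `annual_bill` helper is hypothetical.

```python
ANNUAL_BILL_USD = 50_000_000   # figure quoted above
SERVER_SHARE = 0.57            # figure quoted above
AMORTIZATION_YEARS = 3         # figure quoted above

server_cost_per_year = ANNUAL_BILL_USD * SERVER_SHARE
other_cost_per_year = ANNUAL_BILL_USD - server_cost_per_year

# Assumption: straight-line amortization, ignoring cost of capital.
implied_server_capex = server_cost_per_year * AMORTIZATION_YEARS


def annual_bill(server_cut):
    """Annual bill if the server fleet shrinks by `server_cut` (0.0 to 1.0).

    Assumption: non-server costs do not change with the number of servers.
    """
    return other_cost_per_year + server_cost_per_year * (1 - server_cut)


print(f"servers per year: ${server_cost_per_year:,.0f}")
print(f"implied server capex: ${implied_server_capex:,.0f}")
print(f"bill with 10% fewer servers: ${annual_bill(0.10):,.0f}")
```

Under these assumptions, every 10% of servers removed trims a little under 6% from the yearly bill, which is why the server line is the obvious place to look for savings.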
<Visualization> After burglars (see last week’s), folks buying weed are joining the data revolution. What is Marijuana really worth – apparently a real-time mash-up of price data.
<Cocktail party cheat-sheet> Yes, it IS a legal requirement in the UK (for VAT-registered businesses) to generate invoice numbers without gaps, i.e., your data store can never “lose” a sequence / auto-generated number. Why? It is more difficult to hide revenue from tax authorities when no invoice can go “missing”.
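One common way to get gapless numbers is to allocate the number inside the same transaction that writes the invoice, so a failed write gives the number back. The sketch below uses Python's standard `sqlite3` module; the table names, the `issue_invoice` function and the sample customers are hypothetical, and a real system would also need to think about concurrent writers and deleted or voided invoices.

```python
import sqlite3


def create_schema(conn):
    # Single-row counter table; the CHECK keeps it to exactly one row.
    conn.execute(
        "CREATE TABLE invoice_counter ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " last_number INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO invoice_counter (id, last_number) VALUES (1, 0)")
    conn.execute(
        "CREATE TABLE invoices ("
        " number INTEGER PRIMARY KEY,"
        " customer TEXT NOT NULL,"
        " amount_pence INTEGER NOT NULL CHECK (amount_pence > 0))"
    )
    conn.commit()


def issue_invoice(conn, customer, amount_pence):
    """Allocate the next number and store the invoice in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "UPDATE invoice_counter SET last_number = last_number + 1 WHERE id = 1"
        )
        (number,) = conn.execute(
            "SELECT last_number FROM invoice_counter WHERE id = 1"
        ).fetchone()
        conn.execute(
            "INSERT INTO invoices (number, customer, amount_pence) VALUES (?, ?, ?)",
            (number, customer, amount_pence),
        )
    return number


conn = sqlite3.connect(":memory:")
create_schema(conn)

first = issue_invoice(conn, "Acme", 1000)
try:
    issue_invoice(conn, "Bad", -5)  # violates the CHECK, so the whole transaction rolls back
except sqlite3.IntegrityError:
    pass
second = issue_invoice(conn, "Globex", 2500)

numbers = [row[0] for row in conn.execute("SELECT number FROM invoices ORDER BY number")]
print(numbers)
```

The failed invoice does not consume a number, so the two stored invoices are numbered 1 and 2. By contrast, many built-in sequence generators (PostgreSQL sequences, for example) do not roll back, which is exactly how gaps creep in.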