Double Edition: State of Data #48, 49 (Break for the Next 3 Weeks)

#analysis – Felix Salmon brilliantly analyzes “Grouponomics”: if diners do not buy wine, the restaurant loses. “Diners paid $15 for their Groupon, which gave them $30 of food... So even after knocking $22.50 off the bill (remember that Giorgio’s kept $7.50 of the proceeds of Groupon), the restaurant would often still make money.”
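
The quoted arithmetic is easy to check. Below is a small Python sketch. The $15 price, the $30 voucher value and the 50/50 split come from the quote. The flat 35% cost ratio and the bill totals are purely hypothetical assumptions, chosen only to show why a table that spends exactly the voucher value can lose money while a table that adds drinks comes out ahead.

```python
# Sketch of the Groupon arithmetic from Salmon's example.
# From the quote: $15 voucher price, $30 face value, 50% split with Groupon.
# HYPOTHETICAL: the flat 35% cost ratio and the bill totals used below.

def restaurant_take(voucher_price: float, split: float = 0.5) -> float:
    """Cash the restaurant receives from Groupon per voucher."""
    return voucher_price * split

def effective_discount(voucher_price: float, face_value: float, split: float = 0.5) -> float:
    """Amount knocked off the bill, net of the restaurant's cut."""
    return face_value - restaurant_take(voucher_price, split)

def table_profit(bill: float, voucher_price: float, face_value: float,
                 cost_ratio: float, split: float = 0.5) -> float:
    """Profit on one voucher table, assuming costs are a flat share of the menu bill."""
    revenue = bill - effective_discount(voucher_price, face_value, split)
    return revenue - bill * cost_ratio

if __name__ == "__main__":
    print(effective_discount(15, 30))                 # 22.5
    # Diners spend only the voucher value: the restaurant loses money.
    print(round(table_profit(30, 15, 30, 0.35), 2))   # -3.0
    # Diners add $30 of wine: the restaurant comes out ahead.
    print(round(table_profit(60, 15, 30, 0.35), 2))   # 16.5
```

Real restaurant costs are not a flat ratio (margins differ between food and drinks), so treat these numbers as illustrative only.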


Design of Large Scale Log Analysis (PDF) from Microsoft – if you ever need to dig into web server logs or behavioral logs, or want to see ‘what logs cannot tell us’, this is a good resource to consult. This editor’s favorite analysis fallacy (Simpson’s Paradox) is mentioned as well.
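
For readers who have not run into Simpson’s Paradox, here is a minimal Python illustration using made-up log counts (the variant names, segments and numbers are all hypothetical). Variant A has the higher conversion rate in every segment, yet variant B wins once the segments are pooled, because most of A’s traffic sits in the segment that converts worse.

```python
# Simpson's paradox in HYPOTHETICAL log data: variant A converts better
# in every segment, yet variant B looks better once segments are pooled.
from fractions import Fraction

# (conversions, sessions) per variant and segment; made-up counts.
LOGS = {
    "A": {"mobile": (81, 87), "desktop": (192, 263)},
    "B": {"mobile": (234, 270), "desktop": (55, 80)},
}

def rate(conversions: int, sessions: int) -> Fraction:
    """Exact conversion rate, avoiding float rounding in comparisons."""
    return Fraction(conversions, sessions)

def segment_rates(logs):
    """Conversion rate for each (variant, segment) pair."""
    return {(v, s): rate(*counts) for v, segs in logs.items() for s, counts in segs.items()}

def pooled_rates(logs):
    """Conversion rate per variant after summing over all segments."""
    pooled = {}
    for v, segs in logs.items():
        conv = sum(c for c, _ in segs.values())
        sess = sum(n for _, n in segs.values())
        pooled[v] = rate(conv, sess)
    return pooled

if __name__ == "__main__":
    seg = segment_rates(LOGS)
    for s in ("mobile", "desktop"):
        print(s, "A wins" if seg[("A", s)] > seg[("B", s)] else "B wins")  # A wins (both)
    pooled = pooled_rates(LOGS)
    print("pooled", "A wins" if pooled["A"] > pooled["B"] else "B wins")   # B wins
```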

#architecture – Why the Guardian chose MongoDB
One of the most scalable, high-performing and challenging “integration” problems ever, solved with dated infrastructure: “The Incredible delivery system of India’s Dabbawallahs”. There are SO MANY patterns to learn here about (data) movement as well.


#big_data – Here’s to the huge potential hidden within Google Maps directions logs: “massive logs of people asking for directions from A to B,… And, it appears this data may be as or more useful than user reviews of businesses and maybe GPS trails for local search ranking, recommending nearby places, and perhaps local and personalized deals and advertising.”

The paper referenced there is a good read too: “at least 20% of web queries have local intent”, and “time-aware scoring” describes how results can differ depending on whether a search for ‘beer’ was made at 10AM on a Monday or at 10PM on a Friday.

#DBMS – How Stack Overflow made pages 100x faster by... SQL tuning

Speaking of tuning, a full “Oracle Performance Tuning” video course is now available on Safari.

#learning – Two good tools for text analysis: Word Frequency Lists and SentiWordNet
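
To give a flavor of what these tools are used for, here is a standard-library Python sketch of a word frequency list plus lexicon-based sentiment scoring. The four-word LEXICON is a hypothetical stand-in with made-up scores. SentiWordNet itself assigns positivity, negativity and objectivity scores to WordNet synsets rather than to bare words, so a real pipeline would need word-sense handling on top of this.

```python
# Word frequency list + lexicon-based sentiment, standard library only.
# HYPOTHETICAL: LEXICON is a tiny stand-in for SentiWordNet with made-up scores.
import re
from collections import Counter

LEXICON = {  # word -> (positive score, negative score)
    "great": (0.75, 0.0),
    "good": (0.625, 0.0),
    "slow": (0.0, 0.5),
    "broken": (0.0, 0.75),
}

def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(text: str, top: int = 5) -> list[tuple[str, int]]:
    """Most common words with their counts (ties keep first-seen order)."""
    return Counter(tokenize(text)).most_common(top)

def sentiment(text: str) -> float:
    """Sum of (positive - negative) scores for words found in the lexicon."""
    return sum(LEXICON[w][0] - LEXICON[w][1] for w in tokenize(text) if w in LEXICON)

if __name__ == "__main__":
    review = "Great food, good wine, but slow service. Good value overall."
    print(frequency_list(review, top=2))  # [('good', 2), ('great', 1)]
    print(sentiment(review))              # 1.5
```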

Presentation on Drizzle by Brian Aker, who led MySQL until Oracle acquired Sun. Interesting observations not only on databases but also on best practices and prevalent approaches in the industry (replication, virtualization, etc.).

#visualization – “How Quick Can We Be – Current Data Visualization Techniques for Front-end Engineers” shows some neat tricks with OpenHeatMap, Fusion Tables and Google Charts; slide deck from JSConf 2011 (full conference slides available here)

How to solve problems with Visual Analytics (PDF; 25 MB) – free ebook from Vismaster, a European consortium for data visualization



About Nilendu Misra
I love to learn, create and coach. Things that I do well: communicating ideas, verbally or through words and diagrams; problem solving, logical or abstract; very large scale systems; thinking about the 'Frighteningly Simple' approach first. Things that I intend to do better: establishing stringent process; exchanging tough feedback; keeping up with my reading or to-do list so that I can completely relax.
