State of Data #55

#analysis – ‘Web Analytics Career Guide – From Zero to Hero in Five Steps’. “I’ve said repeatedly that if I look into the next xx number of years Analyst is essentially a recession-proof job”. 


#architecture – Safari now has the complete video tutorial of ‘An Introduction to Machine Learning with Web Data’ from Hilary Mason, Data Scientist. It runs for about 3 hrs and covers four key areas. Highly recommended.


#big_data – The cost of Cloud Computing keeps falling – an announcement from Amazon Web Services on 06/29 made “Data Transfer In” FREE, with a massive reduction in “Data Transfer Out” rates.

#DBMS – Ever wondered how complicated a simple two-table join could be? Jonathan Lewis, one of the best scientific thinkers out there, recently presented on that topic at the Turkish Oracle User Group – the video of the session runs for 55 minutes.

#learning – James Hamilton’s keynote at SIGMOD in mid-June is an absolute must-see to understand where relational models failed. Video (choose June 14, then James as speaker), Presentation link.

Key takeaways –

  • ‘mid-90s through to around 2005, the database world went through dark ages’
  • ‘The pace of innovation was glacial  – “polishing the round ball”’
  • ‘plunging cost of computing is fueling database size growth at a super-Moore pace’
  • ‘disk is tape, flash is disk, ram locality is king’, ‘crossed storage chasm’


#visualization – A good play on visualizing statistical open data — ‘Peoplemovin is an experimental project in data visualization by Carlo Zapponi. The main purpose of this project is to create a flow chart visualization framework based on HTML5 technologies’


  • Museum of Me – Visualize your life, friends and consciousness with help from cool robotic arms. Good interplay of Social Data and 3D visualization from Intel. Perhaps subtly narcissistic.
  • At a rate of 20% YOY growth, when does the data quadruple? In about 7 years. – ‘The math behind the rule of 72 is easy to extend to triplings (rule of 110), quadrupling (rule of 140), quintupling (rule of 160)’
  • von Neumann’s Elephant can indeed be drawn with four parameters if they are complex numbers – with the sample Python code. Brilliant!
  • SQL to JFK – San Carlos Airport, next to Larry Ellison’s pod, has IATA code ‘SQL’
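
The rule-of-thumb item above can be checked with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the "exact" figure assumes growth compounds once a year.

```python
import math

def years_to_multiply(factor, annual_growth_pct):
    """Exact years for a quantity to grow by `factor` at a compound annual rate."""
    return math.log(factor) / math.log(1 + annual_growth_pct / 100)

def rule_of_thumb_years(rule_constant, annual_growth_pct):
    """Shortcut estimate: divide the rule constant (72, 110, 140, 160) by the rate."""
    return rule_constant / annual_growth_pct

if __name__ == "__main__":
    # Compare the exact answer with each rule at 20% yearly growth
    for factor, rule in [(2, 72), (3, 110), (4, 140), (5, 160)]:
        exact = years_to_multiply(factor, 20)
        approx = rule_of_thumb_years(rule, 20)
        print(f"x{factor}: exact {exact:.1f} years, rule of {rule} gives {approx:.1f}")
```

At 20% the rule of 140 says 7 years for a quadrupling, while the compound-growth formula gives roughly 7.6. The shortcuts are tuned for lower rates, so they run a little optimistic here.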

State of Technology #14


·     Tired of the friend who interrupts often in an otherwise civilized conversation? There’s an app for that – Talk-o-Meter.

·     Want to join the ‘Quantified Self’ gang? There’s a new shoe for that. It analyzes your gait live and uploads the data for later analysis.

·     Everyone is talking about Lytro – the new revolution in photography. Read an analysis from Ben Horowitz, an investor who usually (a) is right and (b) makes huge money on technology bets.

#architecture –   Do cores or threads provide the “true CPU power” in your system? Yes, there’s a real difference.

If cores are what is providing the true CPU power, ... when a process completes, a core becomes available and the next process begins. This perfect elapsed time sequencing assumes the OS makes no optimizations. If threads are what is providing the true CPU power, ... when a processing thread completes, a thread becomes available and the next process thread begins. Again, this perfect elapsed time sequencing assumes the OS makes no optimizations.
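
The "perfect elapsed time sequencing" in that excerpt can be sketched as a tiny simulation. It assumes, as the excerpt does, that the OS makes no optimizations: each job simply starts on whichever processing unit frees up first. The function name and the job durations are invented for illustration.

```python
import heapq

def elapsed_time(durations, units):
    """Total elapsed time when each job starts as soon as any unit becomes free."""
    if units < 1:
        raise ValueError("need at least one processing unit")
    free_at = [0.0] * units          # time at which each unit next becomes available
    heapq.heapify(free_at)
    finish = 0.0
    for duration in durations:
        start = heapq.heappop(free_at)   # earliest available unit
        end = start + duration
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish

if __name__ == "__main__":
    jobs = [10.0] * 8                # eight identical 10-second jobs (made up)
    print(elapsed_time(jobs, 4))     # prints 20.0: four units, two waves of jobs
    print(elapsed_time(jobs, 8))     # prints 10.0: eight units, one wave
```

If a measured batch finishes closer to the 4-unit figure than the 8-unit one on a 4-core, 8-thread box, that hints that cores rather than threads are the real unit of CPU power on that system.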

#code – Highlights from Google IO – from HA to Performance to Full Text Search

#design – 36 BRILLIANT Sunrise photographs.


#essay – A visual (animated) essay this time – The Internet in Society

Highlights from Velocity 2011 – Slides available now. Notables –

1. What makes client side slower

2. iOS vs. Android vs. Blackberry – Bakeoff

3. Next Gen YSlow

#social – There’s a game for that app. How/why gamification could help anything – “Games are the only force in the known universe that can get people to take actions against their self-interest” – a brilliant, detailed presentation.

#tweaks n’ hacks – Is someone finally close to cracking the Zodiac cipher, or is it just another tall claim?


#parting_thought –
It was my experience that people approached an online purchase of six dollars with the same deliberation and thoughtfulness they might bring to bear when buying a new car. Prospective users would hand-wring for weeks on Twitter and send us closely-worded, punctilious lists of questions before creating an account

– Maciej Ceglowski


State of Data #54

#analysis – “Data Takes its Rightful Place as a Platform” – Accenture Technology Vision 2011. ‘400 hypotheses based on input from scientists, architects and engineers’ – mentions “data” 178 times (Full PDF)



#architecture – Detailed insight into Amazon Web Services infrastructure (pdf) – from James Hamilton, VP/Distinguished Engr., AWS

#big_data –
Data Scientist Summit 2011 videos are now available (requires registration) – notables – ‘Art and Science of Storytelling’; ‘Stories Beyond the Last click’ (last click before purchase gets paid and it’s often wrong)

#conference – Last call! Hadoop Summit 2011, June 29 at Santa Clara Convention Center

#DBMS – Expensive! Did you know that ‘Oracle maintenance costs 22% of annual license price and it goes up to 27% after two years’?


#learning –
How a polymath helped build ‘n-grams viewer to chart the frequency of phrases across a corpus of 500 billion words’ (‘chide’ is the fastest changing verb ever)


#visualization –
David McCandless visualizes vanishing fish stocks – shocking depletion in the last 100 years.


State of Technology #13

 #at_other_places –

#architecture – 
In the vein of Khan Academy, ‘Quantum Computing for the Determined’ is an excellent series of 22 short videos.

#code – 
Functional Programming – HowTo 

#design – 
99% Conference 2011 – Key Insights into Idea Execution – “Technology is part of every problem, and every solution.” “If you’re not being told you’re crazy, you’re not thinking big enough.”

#essay – 
Larry Sanger, who co-founded Wikipedia, asks whether there is a ‘new geek anti-intellectualism’. Have people started thinking that literature, philosophy, articulation, or formal education is not necessary?

#mobile – Best technical analysis of Android, ever – ‘ABS: The Guts of Android’

#saas – Classification of HTTP-based APIs and effect on performance, cost, simplicity

#tool –  All you ever want to know about DNS, from a UX magazine

#tweaks n’ hacks –  
‘Why do C++ folks make things so complicated?’


  • Missing the ‘old internet’? TeleHack will help


#parting_thought – ‘Some great ideas work spectacularly the first time around, handsomely rewarding the original entrepreneurs. Others fail or flounder initially, sometimes multiple times, before a combination of the right entrepreneur and the right market and technology conditions unlocks their true potential.’

– John O’Farrell, General Partner at Andreessen Horowitz

State of Data #53

#analysis – What are the most common iPhone passcodes? Yes, 1234 still leads!

#architecture – SQL Injection? More like topical rub – How CITI Account information was stolen – “if the URL was something like, all you had to do was change it to and you had access to all of their account information”
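
The flaw described in that quote (edit the URL, see someone else's account) is usually closed with a server-side ownership check. The sketch below is not how Citi's systems work. Every name in it (`Session`, `ACCOUNTS`, `get_account`) and all the data are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Session:
    """Hypothetical logged-in session; user_id is set by the server at login."""
    user_id: str

# Made-up account store keyed by the identifier that would appear in a URL
ACCOUNTS = {
    "1001": {"owner": "alice", "balance": 250},
    "1002": {"owner": "bob", "balance": 990},
}

class Forbidden(Exception):
    """Raised when the session may not see the requested account."""

def get_account(session, account_id):
    """Return the account only if it belongs to the logged-in user."""
    account = ACCOUNTS.get(account_id)
    # Same error for "missing" and "not yours", so IDs cannot be probed
    if account is None or account["owner"] != session.user_id:
        raise Forbidden("account not available for this session")
    return account

if __name__ == "__main__":
    alice = Session(user_id="alice")
    print(get_account(alice, "1001")["balance"])   # her own account is returned
    try:
        get_account(alice, "1002")                 # editing the ID does not help
    except Forbidden as exc:
        print("blocked:", exc)
```

The point is that the identifier in the URL is treated as untrusted input. The session, not the URL, decides whose data comes back.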

 #big_data – Biggest players in Web 2.0 Data layer

Invasion of Body Hackers – how many people are using data about the ‘quantified self’ to change their biology, or even their consciousness


#book – ‘Principles of Uncertainty’ – from Monty Hall to St. Petersburg Paradox (legal; PDF)

#DBMS – Strong Opinion – “ORM is an anti-pattern” – ‘another problem of ORM: inefficiency. When you fetch an object, which of its properties (columns in the table) do you need? ORM can’t know, so it gets all of them’
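
The inefficiency claimed in that quote is easy to see with plain SQL. The sketch below uses Python's built-in sqlite3 with a made-up `users` table. The `SELECT *` stands in for what a naive "fetch the whole object" call tends to issue, and it is not taken from any particular ORM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT, bio TEXT)"
)
# One made-up row with a deliberately large column
conn.execute(
    "INSERT INTO users (name, email, bio) VALUES (?, ?, ?)",
    ("Ada", "ada@example.com", "x" * 10_000),
)

# Fetching the whole object: every column comes back, including the 10 KB bio
full_row = conn.execute("SELECT * FROM users WHERE id = ?", (1,)).fetchone()

# Hand-written query: ask only for the column the caller actually needs
name_only = conn.execute("SELECT name FROM users WHERE id = ?", (1,)).fetchone()

print(len(full_row), len(name_only))   # prints: 4 1
```

Many ORMs do offer ways to load only selected columns, so this illustrates the default behaviour the quote complains about, not a verdict on ORMs as a whole.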


#learning – Percona 2011 slides are now available – recommended ones are ‘Optimizing MySQL for Solid State’, ‘Business Intelligence for Big Data’, ‘TCP Performance’, and ‘NoSQL + MySQL’


#visualization – Great Data Portal from African Development Bank gives access to Country Dashboards and interactive (and fast!) data queries



State of Technology #12

#at_other_places – 


  • June 8 was World IPv6 day. IPv6 will “ultimately replace the current IPv4 addressing system, which has nearly run out of addresses for networks to assign to computers and other devices”
  • Everything you always wanted to know about – The Hidden Empire. Fascinating presentation.

     “In 2000, Jeff Bezos discovered it took 15 minutes to pack a best-selling $25 chair, which obliterated the margin. He then negotiated with the manufacturer, who agreed to send it pre-packaged for $0.25”


 #architecture –  
The Architecture of Open Source Applications – New Book made freely available 


 #code –  
Great Analysis of stolen Sony Passwords – how many people actually use upper case etc.


 #design – 
WebP is 40% faster than JPEG. Could you tell the difference in quality?


 #essay – “How I Failed, Failed, Failed and Finally Succeeded at Learning How to Code”




#mobile – 10 Tips for Android UI Design




 #saas – 
How page load time affects your bottom line, especially on mobile

#social – Bitcoin seems to be all the rage these past few weeks. Here’s a relatively unknown, but good analysis of Bitcoin


#tool –  MicroJs – “Fantastic Micro-Frameworks and Micro-Libraries for Fun and Profit”




#tweaks n’ hacks – Missed it? Everything Steve Jobs said at WWDC, in 60 sec








#parting_thought – “Privacy regimes should follow the insider-trading model. It’s not possession of secret information that is criminalised, but misuse of that information to take advantage of others” – Tim O’Reilly, on Facebook


State of Data #52

#analysis – McKinsey publishes a much-talked-about Big Data report (PDF). It projects “140,000-190,000 more deep technical talent positions” and 1.5 Million “more data savvy managers”. Tech bubble, anyone?

#architecture – Brilliant Machine Learning Demos (verified the install works at least on XP)

#big_data – This editor’s favorite rant – many mean median when they say mean.
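
A quick illustration of why the rant matters, using Python's standard statistics module. The salary figures are invented purely to show how one outlier drags the mean away from what most people would call "typical".

```python
from statistics import mean, median

# Made-up salaries: five ordinary values and one extreme outlier
salaries = [40_000, 42_000, 45_000, 48_000, 50_000, 1_000_000]

avg = mean(salaries)       # pulled far upward by the single outlier
typical = median(salaries) # midpoint of the sorted values, barely affected

print(f"mean:   {avg:,.0f}")
print(f"median: {typical:,.0f}")
```

Here the mean lands above 200,000 while the median stays at 46,500, which is much closer to what five of the six people actually earn.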

#contest – DataVis contest from David McCandless


#DBMS – We get this question a lot. Curt Monash cogently analyzes ‘When it’s still best to use a relational DBMS’

#learning – Lately, ACM has been publishing some fantastic articles on data. The latest issue has ‘How do large-scale sites remain SQL-based’.

#visualization – Insight into the man and his ideas – Edward Tufte, The Information Sage. Why does he think the Minard Map is the best statistical graphic ever made?

Tufte pointed to the far left of the map, where the tan and black lines intersect. “And it is there,” he said, “at the beginning and at the end of the campaign, where we have a small but poignant example of the first grand principle of analytical design”: above all else, always show comparisons.