State of Data #87

#analysis Why retailers like J.C. Penney and GameStop are shutting down their Facebook stores

#architecture MapReduce Patterns, Algorithms and Use Cases

#big_data How companies are using big data, the latest roundup –

  1. Target (yeah, that pregnancy meme)
  2. Pulse – to drive rich user features
  3. Facebook – ‘weblining – when you may be refused health insurance based on your Google search about a medical condition’
  4. Heroku – send SQL query results as a URL

The term ‘Data Science’ already existed 10 years ago – a Data Science paper (pdf) by William Cleveland (2001) lists six areas to master, the most important being ‘Multidisciplinary Investigations’.

#DBMS The latest MySQL version claims a significant performance improvement, especially with NoSQL patterns (1B queries/minute)

#idea ‘GOOD with numbers? Fascinated by data? The sound you hear is opportunity knocking.’

#learning – **** ‘Machine Learning for Hackers’ is now available on Safari and on Amazon. Great book!

THREE Data Visualization Books – from decades ago –

  1. Graphic Methods for Presenting Facts

  2. Graphic Presentation

  3. Handbook of Graphic Presentation


State of Technology #46


#architecture ***** Find out how your mobile site performs vs. a default group’s or your competitors’ mobile sites across different mobile browsers

#code A great, useful JavaScript Pattern and Anti-pattern collection

Nice Abstraction – 10 Ways New York Times Tells Stories through Reader Content –

  1. Reaction Grid
  2. Photo Galleries
  3. One-word submission etc.

#essay What Every CEO Needs to Know About the Cloud (HBR; pdf)

#mobile Ten Things to Think about before building iPad Apps

#saas Splunk is a great tool that takes seemingly irrelevant lines of text from your log files and offers amazing insights. Its recent 4.3 version has a lot of ‘oh, it did not have that before?’ features.

A great overview of the new Splunk 4.3 release is here – it now accepts JSON-formatted data too, and can make sense of it.

#social HTML5 Games vs. Flash Games – what’s the big deal? (Infographic)


#tool Zebra stripes are just elegant insect repellent

#tweaks n’ hacks Here is a way to make your site secure – don’t try to keep malevolent attackers out; if you find out one is about to attack, just start wasting their time



#parting_thought “The word sustainable is unsustainable” – XKCD

State of Data #86

#architecture Excel Cloud Data Analytics – enjoy the bandwidth of the cloud, in the privacy of your spreadsheet

#big_data Generating Twitter WordClouds in R

#Data_Science How many state license plates do you expect to see on a road trip? It depends on (a) how many miles you drive, and (b) the number of licensed drivers per state

#DBMS SQL Injection Redux from Tom Kyte


#idea ***** Markov text generator of PubMed papers (using Python)
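
For the curious, here is a minimal sketch of how a word-level Markov text generator works. It is not the linked project's code: the function names `build_chain` and `generate` are made up for illustration, and the sample text is invented filler, not real PubMed abstracts.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Walk the chain: pick a start state, then keep sampling a follower word."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Hypothetical sample text (invented, not taken from any paper).
SAMPLE = (
    "the study shows the effect of the drug on the patients and "
    "the patients show the effect of the study on the drug"
)

if __name__ == "__main__":
    print(generate(build_chain(SAMPLE, order=1), length=15, seed=42))
```

Raising `order` makes the output read more like the source text, at the cost of copying longer runs of it verbatim.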

#learning csvkit – a very useful suite of tools to munge and process CSV files (e.g., auto-import them into a database; join, clean and grep files, etc.)


#visualization Visualizing Decision-tree for Hosting


LinkedIn is winning the Platform Game

(This paragraph is opinion, reject if you will) I’ve always believed LinkedIn will be the most profitable platform play out there (not the biggest, but most profitable). There are three things that bring people together – Health, Wealth and Children. LinkedIn caters to the second part (via career).

(Now, this part is all data, not opinion) Yesterday, LinkedIn came up with a blockbuster quarter. Here are some nuggets from Jeff Weiner (CEO) on yesterday’s earnings call:

  • Growth rate – “For Q4, overall revenues grew 105% to a record $168 million, marking sixth straight quarter in which our revenues at least doubled over the prior year.”
  • Data ‘bragging’ – “we continue to add more than two members every second”
  • 20% of Facebook – ‘end of January we surpassed the 150 million member milestone’ (Now, according to Metcalfe’s law, the value of a network grows with the square of its size, so a network 5x bigger is worth 25 times as much. So, everything else remaining the same, the intrinsic value of the Facebook network is significantly higher, as it has 845M ‘nodes’)
  • And they keep coming back – ‘unique member visit for LinkedIn in Q4 grew 67% year-over-year, even faster than the rate of membership growth ‘
  • Platform Play #1 = Global – ‘Today 60% of LinkedIn members reside outside the United States. We launched eight local languages last year’
  • Platform Play #2 = Social – In-bound, Targeted Content – ‘we introduced our professionally focused social news product LinkedIn Today, which quickly became an integral way for our members to get the insights they need to be successful. Since the end of Q3, we’ve seen nearly 60% increase in the number of members customizing their LinkedIn Today experience.’
  • Platform Play #3 = Social – Out-bound – ‘significant traffic driver to publishers all over the web. Q4 saw an increase in referral traffic of more than 45% over the previous quarter ‘
  • Platform Play #4 = ‘Share’ (catching up with ‘Like’) – ‘with more than 300,000 unique domains using the LinkedIn share button nearly double the number since our last call’
  • Platform Play #5 = Developers – ‘April, we opened up full access to LinkedIn’s platform as of now there are more than 50,000 developers using LinkedIn APIs to help build and empower the professional web.’
  • Platform Play #6 = Mobile – ‘we completely revamped our mobile experience, introducing new apps for iOS and Android, as well as a new mobile website. Mobile visits now account for more than 15% of total unique number visits’
  • Platform Play #7 = Integration – ‘..introduced some new products that integrate LinkedIn data into the fabric of the enterprise ..Sales Navigator a premium subscription that integrates with CRM platforms like Salesforce..’
  • Platform Play #8 = Enterprise base growing more than individual – ‘We ended the year with more than 9,200 corporate customers, up 139%’
  • Platform Play #9 = Ad Platform – ‘Cisco turned to LinkedIn to personally reach 140,000 C Level executives with an innovative first of its kind video message with their larger Cisco story campaign. Cathay Pacific is using LinkedIn to engage with the valuable frequent business traveler’
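
The Metcalfe’s law comparison in the ‘20% of Facebook’ bullet above can be worked through with the actual member counts quoted there (150M and 845M). This is a back-of-the-envelope sketch that assumes value is simply proportional to the square of the number of members; the helper name `metcalfe_value_ratio` is made up for illustration.

```python
def metcalfe_value_ratio(nodes_a, nodes_b):
    """Ratio of network values, assuming value is proportional to nodes squared."""
    return (nodes_a / nodes_b) ** 2


facebook_members = 845_000_000  # figure quoted in the bullet above
linkedin_members = 150_000_000  # figure quoted in the bullet above

size_ratio = facebook_members / linkedin_members
value_ratio = metcalfe_value_ratio(facebook_members, linkedin_members)

if __name__ == "__main__":
    print(f"size ratio: {size_ratio:.2f}x, value ratio: {value_ratio:.1f}x")
```

With these inputs the size ratio is about 5.6x, so the squared ratio comes out near 32x rather than the rounded 25x.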

State of Data #85

#analysis Figuring out Popularity vs. Critical acclaim – ‘Golden Age’ of television with episode data


#architecture SQL Fiddle can help you prototype a model, or an idea for a query, across multiple data sources

#big_data Revealing Serial Killers’ Pattern of Murders

#conference ‘Big Data Essentials’ – Online Forum, led by Tom Kyte of Oracle, on February 16

#Data_Science Would you try out Data-driven Medical Diagnosis?


For transactional apps, only part of the result set needs to be displayed. Very useful tips on how to (a) run Top-N queries, and (b) paginate through a large result set
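
To make the two ideas concrete, here is a small sketch using Python's built-in `sqlite3` module. It is an illustration only: the `orders` table, its columns and the helper names `top_n` and `page_after` are made up, and the linked tips may well target a different database with different syntax (SQLite uses `LIMIT`).

```python
import sqlite3

# Hypothetical demo table: 100 orders with amounts 10, 20, ..., 1000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(i, i * 10) for i in range(1, 101)],
)


def top_n(conn, n):
    """Top-N query: sort by amount, then keep only the first n rows."""
    return conn.execute(
        "SELECT id, amount FROM orders ORDER BY amount DESC, id DESC LIMIT ?",
        (n,),
    ).fetchall()


def page_after(conn, last_id, page_size):
    """Keyset pagination: fetch the rows that follow the last id already shown."""
    return conn.execute(
        "SELECT id, amount FROM orders WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()


first_page = page_after(conn, 0, 10)
second_page = page_after(conn, first_page[-1][0], 10)

if __name__ == "__main__":
    print(top_n(conn, 3))
    print(second_page[0], second_page[-1])
```

Paging by "last id seen" rather than by a growing offset means the database does not have to re-read and discard all the earlier rows on every page.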


#idea Malcolm Gladwell’s ‘Blink’ was said to be inspired by Gerd Gigerenzer’s seminal work on the power of intuition. A while ago, Gerd wrote a great paper ‘Mindless Statistics’ (pdf) that is highly impressive. The core of this paper is equally applicable to enterprises – “Statistical rituals largely eliminate statistical thinking in the social sciences.”


#learning – D3 is truly an amazing tool to visualize data-on-the-web. Scott Murray has an excellent self-contained tutorial on it.


#visualization Plotting a Giraffe Line Chart with Data




State of Technology #44


  • Useful innovation at Hotmail – Identifying and Separating email Newsletters (aka Graymail – like this one). 1.5B newsletters are sent a day!!
  • YouTube hits 4B views a day; 60 hours of uploaded video per minute

#architecture Google’s Latest Issue of ‘Think Quarterly’ is about Speed –

‘In a world of increasingly compressed feedback loops – where news of Beyoncé’s pregnancy can generate 8,868 tweets in a single second – word travels fast, with or without the early adopter’s vocal approval.’

#code Great Introduction to Java Garbage Collection @JavaCodeGeeks

Steve Souders on ‘High Performance HTML5’ (the presentation is itself a lesson in ‘how to communicate’)

Saving 2 seconds on a landing page’s load time could increase conversion by as much as 15%!

The best sub-2,000-word summary of the 600-page Steve Jobs biography, so far


#mobile iPad Survey, January 2012 (pdf) – “In South America, 27% of IT Professionals have completely replaced their laptop with iPad. In Europe, 23%.”

A continuous journey of Customer Support Optimization ‘with 123 themes and 123 theme docs, and other support resources including a knowledgebase, FAQs, a support forum, tutorials and a video library catering for over 130,000 users’

Key data-driven insights in Login/Password realm –


#tool 25 Very Useful Chrome Extensions for Designers and Presenters (Special Rec.: Awesome Screenshot)

#tweaks n’ hacks
Why button-less elevators may save up to 30% time –

“The idea is that rather than having people crowd into an elevator and then request their floors, the destination elevators do the math before hand, and group people going to the same floors. This leads to fewer stops and time savings.”
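
The grouping idea in that quote can be sketched in a few lines. This is a toy model, not how any real elevator controller works: the function names, the passenger list and the "one stop per distinct floor" cost are all assumptions made for illustration, and it says nothing about the 30% figure.

```python
from collections import defaultdict


def naive_cars(requests, num_cars):
    """Baseline: passengers fill cars in arrival order, regardless of floor."""
    cars = [defaultdict(list) for _ in range(num_cars)]
    for idx, (passenger, floor) in enumerate(requests):
        cars[idx % num_cars][floor].append(passenger)
    return [dict(car) for car in cars]


def assign_cars(requests, num_cars):
    """Destination grouping: collect passengers per floor, then give each car
    a contiguous block of floors so same-floor riders share a car."""
    by_floor = defaultdict(list)
    for passenger, floor in requests:
        by_floor[floor].append(passenger)
    floors = sorted(by_floor)
    per_car = -(-len(floors) // num_cars)  # ceiling division
    cars = [dict() for _ in range(num_cars)]
    for idx, floor in enumerate(floors):
        cars[idx // per_car][floor] = by_floor[floor]
    return cars


def total_stops(cars):
    """Each car stops once per distinct floor it serves."""
    return sum(len(car) for car in cars)


# Hypothetical lobby queue: (passenger, destination floor).
REQUESTS = [("ann", 3), ("bob", 7), ("cat", 7), ("dan", 3), ("eve", 3), ("fay", 7)]

if __name__ == "__main__":
    print("naive stops:  ", total_stops(naive_cars(REQUESTS, 2)))
    print("grouped stops:", total_stops(assign_cars(REQUESTS, 2)))
```

In this made-up queue the arrival-order baseline makes both cars stop at both floors, while the grouped assignment sends each car to a single floor.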


“As we all know from our Blackberries, work invades leisure; but as we also all know from our iPhone, leisure invades work” – Paul Mason

State of Data #84

#analysis How to cheat with Time Series


#architecture Should you use SQL or Hadoop? A flowchart is there to help.



#big_data ‘Why Big Data Won’t Make You Smart, Rich, or Pretty’ –

“Essential issues with Big data –

  1. Overconfidence
  2. Ever-changing assumptions
  3. Complexity
  4. Feedback Loops
  5. Lack of Theory
  6. Confirmation Bias
  7. Motives
  8. Acting on the Model” 

Hadoop Summit, June 13-14, San Jose: call for papers due Feb 22


#Data_Science – How to use analytics to build better claims handling

#DBMS Does noSQL mean noDBA? (Hat tip: Denise McInerney)

#idea Why (naïve) Probability is more or less bunk (pdf) – Nassim N. Taleb, best-selling author of ‘Fooled by Randomness’, writes in a new paper:

‘The owner of the ski resort, deploring lack of snow, deposited at a shrine of the Virgin Mary a $100 wishing for snow. Snow came, with such abundance, and avalanches, with people stuck in the cars, and the resort was forced to close, prompting the owner to quip “I should have only given $25”. What the owner did is discover the notion of nonlinear exposure and extreme events.’

#learning Data in Davos – World Economic Forum, 2012 brings out its POV (pdf) on how Big Data could help


#visualization The 2011 ‘Philip Meyer Award’ goes to the ‘Murder Mysteries’ series for ‘Data Journalism’ – 37% of homicides go unsolved!