State of Data #92

#1Read_this_Week – Locals and Tourists – how do they photograph the same city differently


#analysis – How do privacy breaches map to increased data volume?

‘Big Data Imperative – Driving Big Action’ – 6 principles, including –

‘For every $100 you have available to invest in making smart decisions, invest $10 in tools and vendor services, and invest $90 in big brains (aka people, aka analysis ninjas, aka you!).’

#big_data – ‘Bizarre insights from Big Data’

‘A few years ago I was speaking with the founder of an African mobile phone company, called CellTel. He told me that his company realized that they could predict the location of impending massacres in the Congo, because there were spikes in the sale of prepaid phone cards.’

#Data_Science – ‘Effective exploration of structured datasets – rank-aware interval based clustering’

‘For example, a dating website user who is looking for a partner between 20 and 40 years old, and who sorts the matches by income from higher to lower, will see a large number of matches in their late 30s who hold an MBA degree and work in the financial industry, before seeing any matches in different age groups and walks of life’

#DBMS – Data Vendors and Market Analysis


#idea – Death of a Data Haven (or, of the world’s ‘smallest nation’)


#learning – How to proof yourself against sensationalized stats

‘Chris Grayling cited a 35 percent increase in “violent” crime starting in 2002 as evidence of failed liberal law enforcement policies. But 2002 was the year civilians and not police were given the right to designate a crime “violent,” and many chose to see violence where the police might not have. The “35 percent increase” was the difference between apples and oranges.’

#visualization – ‘Wind Map of US will Blow You Away’




State of Data #91

#1Read_this_Week – A new article in ACM dissects the taxonomy of Visualization techniques


#analysis – Every Groupon deal means half-a-star less in Yelp (pdf) for restaurants
“Our analysis shows that while the number of reviews increases significantly due to daily deals, average rating scores from reviewers who mention daily deals are 10% lower than scores of their peers on average.”

– How Netflix’s PaaS (Platform as a Service) scales to a million writes per second (part 3 of ‘Cloud Architecture Tutorial’ from Adrian Cockcroft)


#big_data – The Large-scale Parallelized R Forecasting that Google uses –
‘Our technique cut total run time by a factor of 300. Distributing the computation across many machines permits analysts to focus on statistical issues while answering questions that would be intractable without significant parallel computational infrastructure.’


#contest – Nielsen Data Visualization, 2012 Contest

Optimizing Neural networks on CPU (pdf; NIPS, 2011)
‘..machine learning applications such as speech recognition, computational complexity is fast becoming a limiting factor in their adoption. We show how to best leverage modern CPU architectures to significantly speed-up their inference.’


#DBMS – Different ways of pagination in SQL, including performance comparison

#idea – Cassandra and Solid-state Drives – apparently, Walmart Labs has run it that way for 2 years

#learning – Graphs in a Database – just in case the Hadoop cluster is offline

#visualization – A repeat (after 2009), but extremely useful – ‘Periodic Table of Visualization’ (mouse-over the cells for illustrations)



State of Technology #50

#at_other_places –

#architecture – JavaScript performance MythBusters (from SXSW 2012) – a four-person walkthrough

#code – ‘Database Changes Done Right’ – Full process of Software Engineering w.r.t. Databases (change, build, deploy)

#design – Google’s new “Report a Bug” or “Feedback Tool” now lets you capture a screenshot of a selected area in the browser (Hat tip: Aalap Sharma)

 – Memes of the fortnight –

            Why I left Goldman Sachs (One word: Muppet); Why I left Google (One symbol: +); Why I left McDonald’s etc.

#mobile – Faux G – what on earth does that ‘4G’ in your iPhone 4S mean
“AT&T and Verizon spent tens of billions of dollars to acquire new 700MHz licenses at auction and from existing license holders in a headlong rush for 4G. That left AT&T with the bragging rights of having a faster 3G network and a plan for a faster 4G one. In response, Verizon moved aggressively into 4G, pushing handset makers to deliver early kit (first laptop adapters and later phones), and lighting up LTE networks at a more rapid clip than AT&T. (Both now claim most of their 3G coverage will be LTE-ready by the end of 2013.)”

#saas – Amazon ‘cloud’ has roughly 450,000 servers across 7 data centers


#service – How large is your web page, and why does it matter? The Google home page is 0.5MB while a single tweet is 2MB!

#social – This music from an emerging rapper, released on Facebook, can ONLY be heard by one person at a time.

#tool – This man came up with the idea of iPad – 15 years ago!

#tweaks n’ hacks – London Bus stop now shows different ads to men and women


#parting_thought – 
“So much complexity in software comes from trying to make one thing do two things.” – Ryan Singer

State of Data #90

#analysis – How much does it cost to manufacture an iPhone?
Square’s Growth Curve (and a detailed article from Fast Company)


#architecture – Angelina – AI that could create interesting games by itself –  “Designed by Michael Cook, a PhD student at Imperial College London, Angelina designs different aspects of games — level layout, enemy behavior, bonuses, and the like — and assembles them randomly. The system then simulates playing the level, determining what the most effective variants are, and repeats the process around 400 times, pushing the most successful elements forward in a sort of game-design Darwinism.”                                                                   
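
The loop described in that quote (assemble designs at random, simulate play, carry the best forward, repeat about 400 times) can be sketched roughly as below. This is only an illustration under stated assumptions, not Angelina's actual code: the function `evolve`, the `simulate` scoring callback, the population size of 20 and the keep-half rule are all hypothetical.

```python
import random


def evolve(candidates, simulate, generations=400, population_size=20, seed=0):
    """Hypothetical sketch: random assembly plus survival of the best designs.

    candidates maps each game aspect to its list of options;
    simulate is an assumed callback that scores one assembled design.
    """
    rng = random.Random(seed)

    def assemble():
        # Pick one option per aspect at random (level layout, enemies, ...).
        return {aspect: rng.choice(options) for aspect, options in candidates.items()}

    population = [assemble() for _ in range(population_size)]
    for _ in range(generations):
        # Rank designs by simulated play and keep the better half.
        ranked = sorted(population, key=simulate, reverse=True)
        survivors = ranked[: max(1, len(ranked) // 2)]
        # Refill the population with fresh random designs.
        fresh = [assemble() for _ in range(population_size - len(survivors))]
        population = survivors + fresh
    return max(population, key=simulate)


# Toy usage: each option has a made-up "fun" score and a design scores the sum.
FUN = {"maze": 3, "open": 1, "chaser": 2, "patrol": 1, "coins": 1, "shield": 2}
ASPECTS = {
    "layout": ["maze", "open"],
    "enemy": ["chaser", "patrol"],
    "bonus": ["coins", "shield"],
}
best = evolve(ASPECTS, lambda design: sum(FUN[v] for v in design.values()))
print(best)
```

A real system would also recombine or mutate the surviving designs rather than only adding fresh random ones; the sketch keeps just the select-and-repeat skeleton.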


#big_data – School for Quants


#Data_Science – Scalable Machine Learning, full UCB 2012 course, with videos, online

#DBMS – ‘Taking a step back from ORMs’ – ‘But are there simpler ways to avoid boilerplate? It seems like we should be able to do so without something as invasive as an ORM. For the sake of brevity, I’ll be using hashes rather than objects, but the principle is the same.’

#idea – Is ‘Big Data’ the only tech business model left?

Patient of the Future – how Quantified Self is moving the decision-making to the patient

#learning – *** Most detailed articulation on NoSQL Data Modeling Techniques

 – Stephen Wolfram analyzes his entire life


State of Data #89

#analysis – The average person looks at their phone 150 times a day


#architecture – ‘Open Data Handbook’ discusses the legal, social and technical aspects of open data


#big_data – ‘The Data Dividend’ (pdf; from a UK based think-tank)


#Data_Science – ‘How many medals will Great Britain win in the 2012 Olympics?’

#DBMS – Application vs. Database Programming


#idea – ‘What Data Science should be doing’

“I recently read this New York Times article about a company that figures out how to get the best deal when you rent a car.
The company is called AutoSlash and the idea is you book with them and they keep looking for good deals, coupons, or free
offers every day until you actually need the car.”

#learning – How NOT to draw bubble charts


#visualization – Nice ‘live’ visualization by Yahoo, of Yahoo’s News Delivery, and another one on Y! mail


State of Technology #48

 #at_other_places –

#architecture – Detailed Memory Usage comparison of Java Application servers

#code – Great guide to Real-time web Technologies

#design – Solar Company’s Annual Report Powered by Sun (the star)

#essay – Definitive Guide to Landing Page Optimization


#mobile – Which businesses use SMS the most? Where is it growing like crazy?

#SaaS – All you need to know about the ‘Net Promoter System’


#social – ‘40% of your customer service emails could be answered by your FAQ’

#tool – Optimize your caffeine intake – there’s an app for that


#tweaks n’ hacks – How to ‘speechjam’ – ‘can be used to disturb people’s speech. In general, human speech is jammed by giving back to the speakers their own utterances at a delay of a few hundred milliseconds.’
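
The delayed playback that the quote describes boils down to a fixed-length delay line. Below is a minimal sketch on a plain list of audio samples, with no real microphone or speaker I/O; the function name `delayed_feedback` and the 200 ms default are assumptions for illustration, since the quote only says "a few hundred milliseconds".

```python
from collections import deque


def delayed_feedback(samples, sample_rate_hz, delay_ms=200):
    """Return the input signal delayed by delay_ms, starting with silence.

    Hypothetical helper: delay_ms=200 is an assumed value, not taken
    from the linked article.
    """
    delay_samples = int(sample_rate_hz * delay_ms / 1000)
    # Pre-fill the delay line with silence so early output is quiet.
    line = deque([0.0] * delay_samples)
    played = []
    for sample in samples:
        line.append(sample)            # what the speaker just said
        played.append(line.popleft())  # what they hear: their voice, delayed
    return played


# 2 ms delay at 1 kHz is 2 samples of lag.
print(delayed_feedback([1.0, 2.0, 3.0, 4.0], sample_rate_hz=1000, delay_ms=2))
```

The output has the same length as the input, so the last `delay_samples` values of the input are never played back in this simplified version.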




#parting_thought – (On Complex Systems) “We can study cars and their physical relationships, and know exactly how they work. It in no way prepares us to understand traffic when they all get together and start interacting” – Michael Gazzaniga

State of Data #88

#analysis – Mobile Analytics & Babies


#architecture – Gartner Magic Quadrant 2012 for BI Platforms – detailed analysis


#big_data – Data from 700+ GE Power Turbines and Data from 100K+ medical scans on GE’s CT and MRI machines


#Data_Science – *** What’s your site’s privacy score? Privacyscore can now estimate it based on personal and tracking data

#DBMS – Do you really need Exadata?

‘What if modern servers can actually handle data at extreme rates of throughput from storage, over PCI and into the processor cores without offloading the lower level I/O and filtration? Well, the answer to that comes down to how many processor cores are involved with the functionality that is offloaded to Exadata.’


#idea – Curious case of the inverted chart (and a very interesting metaphor)

WhySQL – from Evernote Architecture


#visualization – 100 Incredible Infographic Tools & Resources (Categorized)