State of Data #123

Top Read

Presenter’s Paradox – The ‘average’ mistake you do not know you’re making

“When buyers were presented with an iPod Touch package that contained either an iPod, cover, and one free song download, or just an iPod and cover, they were willing to pay an average of $177 for the package with the download, and $242 for the one without the download. So the addition of the low-value free song download brought down the perceived value of the package by a whopping $65!”

 

Analysis

AWS Costs Cheat Sheet

 

 

Big Data

Prineville, OR has about 10,000 people, but hundreds of thousands of servers. Why Oregon hosts the cloud

 

Data Science

What is the optimal way to find a parking spot: lowest total walk time vs. finding a space fastest?

 

DBMS

CAP theorem is about to enter its teenage years. How the rules have changed in 12 years.

 

Idea

 

Do machines get more confused with Lots of Data too?


“The winning algorithm was a very complex ensemble of many different approaches — so complex that it was never implemented by Netflix. With three years of effort by some of the world’s best data mining scientists, the average prediction of how a viewer would rate a film improved by less than 0.1 star.”

 

Learning

Implement SQL with Common Unix Utilities

 

Visualization

How a viral photo spreads through Facebook

 

etc

 

State of Technology #80

elsewhere

Architecture

Scalable JavaScript Design Patterns

Code

Peter Norvig on How to Code a Spelling Corrector

Design

Design Principles behind Firefox OS UX

Essay

Marissa Mayer, her leadership principles, and her Macs and Madonna Theory

Mobile

Choice between native app and responsive website is not really a budgetary one

SaaS

Russian Novel Programming

Alexei Fyodorovich Karamazov is also called Alyosha, Alyoshka, Alyoshenka, Alyoshechka, Alexeichik, Lyosha, and Lyoshenka. Russian novel programming is the anti-pattern of one thing having many names. 
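
A minimal sketch of the anti-pattern and one way out of it: several names for the same field are mapped onto a single canonical name. The field names (custId, customer_id, cid, customerId) and the canonicalize helper are hypothetical, invented here for illustration only.

```javascript
// Hypothetical aliases that all mean "customer identifier".
const ALIASES = new Map([
  ['custId', 'customerId'],
  ['customer_id', 'customerId'],
  ['cid', 'customerId'],
]);

// Return a copy of `record` whose keys use one canonical name per concept.
// Throws if two aliases of the same concept carry conflicting values.
function canonicalize(record) {
  const out = {};
  for (const [key, value] of Object.entries(record)) {
    const name = ALIASES.get(key) ?? key;
    const alreadySet = Object.prototype.hasOwnProperty.call(out, name);
    if (alreadySet && out[name] !== value) {
      throw new Error(`Conflicting values for "${name}"`);
    }
    out[name] = value;
  }
  return out;
}

// Usage: the short alias `cid` comes back under the canonical name.
console.log(canonicalize({ cid: 42, total: 9.5 }));
```

Normalizing at the boundary keeps the rest of the code on one name per concept, so nobody has to remember that Alyosha and Lyoshenka are the same person.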

Service

Really crisp articulation of Lean Startup in about 30 super-light slides

Social

Slashdot turned 15 – a tour of topics

 

Tool

Business Jargon Directory

 

Hack

Dentist Drill Plays Music

 

etc

 

parting thought

“Procrastination is finding the most difficult way of doing something, is jumping from one idea to another to another, is checking your emails.” – John Kelly

State of Data #120

Top Read


Big Data Cube – Volume, Velocity, Variety
(and associated paper)

 

Analysis

 

Statwing for Small Data Analysis

 

Big Data

 

Foursquare data accurately reveals where you live 78% of the time (paper)

 

Data Science


Detect Ballot Stuffing with Data

 

 

DBMS

 

Fully graphic Dashboard using SQL*Plus?!

 

Idea

 

Data centers waste 90% or more of the energy they pull

“McKinsey & Company analyzed energy use by data centers and found that, on average, they were using only 6 percent to 12 percent of the electricity powering their servers to perform computations. The rest was essentially used to keep servers idling and ready in case of a surge in activity that could slow or crash their operations”

 

Learning

Simpson’s Paradox – or, how could non-smokers have higher mortality rates than smokers?
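
A small worked example shows how the reversal can arise. The counts below are hypothetical, invented for illustration and not taken from the linked article: within each age group smokers die at a higher rate, yet the pooled rate is higher for non-smokers because the non-smoking group skews older.

```javascript
// Hypothetical counts, invented for illustration only.
const groups = [
  { age: 'younger', smokers: { people: 100, deaths: 5 }, nonSmokers: { people: 400, deaths: 12 } },
  { age: 'older', smokers: { people: 50, deaths: 25 }, nonSmokers: { people: 400, deaths: 180 } },
];

// Mortality rate within a single cell.
const rate = ({ people, deaths }) => deaths / people;

// Pool one cohort ('smokers' or 'nonSmokers') across all age groups.
function pooledRate(cohort) {
  const people = groups.reduce((sum, g) => sum + g[cohort].people, 0);
  const deaths = groups.reduce((sum, g) => sum + g[cohort].deaths, 0);
  return deaths / people;
}

for (const g of groups) {
  console.log(g.age, 'smokers:', rate(g.smokers), 'non-smokers:', rate(g.nonSmokers));
}
console.log('pooled', 'smokers:', pooledRate('smokers'), 'non-smokers:', pooledRate('nonSmokers'));
```

Per group the smokers' rates are 5% vs. 3% and 50% vs. 45%, while the pooled rates are 20% for smokers and 24% for non-smokers.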

 

Visualization

 

How to make XKCD style graphs in R?

 

etc

 

State of Data #119

Top Read

 

“Demystifying Big Data: A Practical Guide To Transforming The Business of Government” – the report (pdf) is filled with some key insights –

  • While Big Data is transformative, the journey towards becoming Big Data “capable” will be iterative and cyclical, versus revolutionary
  • Big Data is often characterized by three factors: volume, velocity, and variety
  • Scalable analytics using software frameworks can be combined with storage designs that support massive growth for cost-effectiveness and reliability

Analysis

How ‘correlation does not imply causation’ became an almost irritating meme

 

Big Data

Life of Data at Facebook

Data Science

What do Real-life Hadoop workloads look like?

 

DBMS

Not sure whether indexes will be used, or how? You can always write a small program like this to check the different execution paths

 

Idea

There are two types of data visuals – (a) Story Visualizations and (b) Answer Visualizations.

Learning

SQL Injection in 60 Seconds

 

Visualization

‘Information is Beautiful’ 2012 Award Winning Visualizations

 

etc

State of Technology #78

elsewhere

 

Architecture


20 Controversial Programming Opinions


Unit testing won’t help you write good code.

“The only reason to have Unit tests is to make sure that code that already works doesn’t break. Writing tests first, or writing code to the tests is ridiculous. If you write to the tests before the code, you won’t even know what the edge cases are. You could have code that passes the tests but still fails in unforeseen circumstances. And furthermore, good developers will keep cohesion low, which will make the addition of new code unlikely to cause problems with existing stuff.”

 

Design

 

 

Revolutionary User Interfaces over past 2300 years – timeline built using Timeline.js 

 

Essay

Do not have time to read the massive “Steve Jobs” book? The author distilled everything in these 8 pages.

 

Mobile


Taming the Mobile Beast
– from Velocity 2012

 

SaaS

 

Octane – JavaScript Benchmark Suite with real-world code from Google

 

Service

 

“If you access data once or more every 5 minutes, cache it”. How does this rule survive 20 years after it was coined by Jim Gray?
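
The rule comes from a break-even calculation: the cost of keeping a page in RAM against the cost of the disk arm needed to re-read it. Below is a rough sketch of that formula in the form usually attributed to Gray and Putzolu. The function name and the input figures are assumptions (illustrative late-1990s values with 8 KB pages), not numbers taken from the linked article.

```javascript
// Break-even interval in seconds: a page re-read more often than this
// is cheaper to keep in RAM than to fetch from disk each time.
function breakEvenSeconds({ pagesPerMbRam, accessesPerSecPerDisk, pricePerDisk, pricePerMbRam }) {
  return (pagesPerMbRam / accessesPerSecPerDisk) * (pricePerDisk / pricePerMbRam);
}

// Assumed, illustrative inputs; swap in current prices to re-test the rule.
const seconds = breakEvenSeconds({
  pagesPerMbRam: 128,        // 8 KB pages
  accessesPerSecPerDisk: 64, // random reads per second per drive
  pricePerDisk: 2000,        // dollars per drive
  pricePerMbRam: 15,         // dollars per MB of RAM
});
console.log(`${(seconds / 60).toFixed(1)} minutes`);
```

With these inputs the interval comes out at roughly four and a half minutes. Cheaper RAM lengthens the interval, and faster or cheaper drives shorten it.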

 

Social


Facebook now follows you offline?

 

Tool


Six Songs of Me

 

Hack


SmartType
– could reduce eye-neck strain

 

etc

 

 

parting thought

 

“Nearly all men can stand adversity, but if you want to test a man’s character, give him power.” – Why Power Corrupts (Lincoln)

State of Data #118

Top Read

Data Scientist: The Sexiest Job of the 21st Century

 

Analysis

Overkill Analytics Paradigm – “CPU over IQ”

 

Big Data

Redesigning the Data Center – From Sweater to Shorts

“If you had walked into the average data center 10 years ago, you would have needed a sweater. Google found that it could avoid relying on giant air-conditioning units, as did other companies. The most efficient data centers now hover at temperatures closer to 80 degrees Fahrenheit, and instead of sweaters, the technicians walk around in shorts.”

 

Data Science

State of Data #117

Top Read

Most popular PINs

Analysis

GPS vs. WiFi: Battle for Location Accuracy Using Yelp Check-Ins

 

 

Big Data

Soccer Embraces Big Data

 

Data Science

Probability & Statistics Cookbook

 

DBMS

CAP and Cloud Data Management

 

Idea

Dr. Atul Gawande on why data is a big hope for health care

 

Learning

Gephi – Open Graph Viz Platform

 

Visualization

What are the most interesting charts with more than 3 variables?

 

 

etc

 

State of Technology #76

elsewhere


Architecture

What will ‘Phone’ of 2022 look like?

“One big problem is that if you want to move something on a touchscreen from point A to point B, you actually have to drag it all the way there, i.e. there is a 1:1 relationship between your movement and the movement “in” the device. To move something 100 pixels, you have to move your fingers 100 pixels.”

 


Code

Learning jQuery/JavaScript – Lessons with built-in editors


Design

22 Beautiful and Useful Sites

Essay

My Life as a TaskRabbit

Mobile

Turn any site into a Responsive Site

SaaS

What are clouds made of, and what does it mean for developers?

Service

Rise and Fall of Wintel Economy – in one image

Social

Great Idea for Directory listing or ‘About Me’? Listing the number of unread emails

Tool

Nature is the Ultimate Architect – Google Earth Fractals

 

Hack

Cosmo – Hacker ‘God’ – ‘turns 16 next March, and he may very well do so inside a prison cell’

“I called Netflix and it was so easy,” he chuckles. “They said, ‘What’s your name?’ and I said, ‘Todd [Redacted],’ gave them his e-mail, and they said, ‘Alright your password is 12345,’ and I was signed in. I saw the last four digits of his credit card. That’s when I filled out the Windows Live password-reset form, which just required the first name and last name of the credit card holder, the last four digits, and the expiration date.”

etc

parting thought

‘You can’t learn less.’ – Buckminster Fuller

State of Data #116

Top Read

Reddit’s Database has only Two tables

“they use two tables for each “thing”, so a thing/data pair for accounts, a thing/data pair for links, etc.”
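
The thing/data idea can be sketched with in-memory arrays standing in for the two tables. This is only an illustration of the pattern the quote describes: the column names (id, createdAt, thingId, key, value) and the createThingDataPair helper are assumptions, not Reddit's actual schema.

```javascript
// One thing/data pair; create one pair per kind of object (accounts, links, ...).
function createThingDataPair() {
  const thingTable = []; // one row per object: { id, createdAt }
  const dataTable = [];  // one row per attribute: { thingId, key, value }
  let nextId = 1;

  return {
    // Insert a thing row plus one data row per attribute; returns the new id.
    insert(attrs) {
      const id = nextId++;
      thingTable.push({ id, createdAt: Date.now() });
      for (const [key, value] of Object.entries(attrs)) {
        dataTable.push({ thingId: id, key, value });
      }
      return id;
    },
    // Reassemble an object by joining its thing row with its data rows.
    load(id) {
      const thing = thingTable.find((row) => row.id === id);
      if (!thing) return undefined;
      const attrs = {};
      for (const row of dataTable) {
        if (row.thingId === id) attrs[row.key] = row.value;
      }
      return { ...thing, attrs };
    },
  };
}

// Usage: a pair for accounts and a separate pair for links.
const accounts = createThingDataPair();
const links = createThingDataPair();
const linkId = links.insert({ title: 'Example', url: 'https://example.com' });
console.log(links.load(linkId));
```

A new attribute is just another row in the data table, so no schema change is needed. The cost is that every read has to reassemble the object from its rows.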

Analysis

Gregor Mendel’s Suspicious Data

“He [Mendel] was most anxious to have his results replicated and expanded, for even self-possessed people (and he wasn’t) entertain occasional misgivings about the accuracy, originality, and significance of their work.

To achieve these goals, his work had to be understood. In comparison to his theories, of whose validity he was sure, the data were of no significance whatsoever.”

Big Data

Cool Algorithms: How to Estimate Cardinality of a Large Dataset
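
For a feel of how such estimators work, here is a rough sketch of one simple technique, the k-minimum-values (KMV) estimator. It is not necessarily the algorithm the linked article describes. The function names, the hash construction and the default k are assumptions chosen for illustration.

```javascript
// 32-bit FNV-1a over UTF-16 code units, followed by a MurmurHash3-style
// finalizer to spread the bits; returns an unsigned 32-bit integer.
function hash32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Estimate the number of distinct items while remembering only k hashes.
function estimateCardinality(items, k = 256) {
  const mins = []; // the k smallest distinct hashes seen so far, ascending
  for (const item of items) {
    const h = hash32(String(item));
    if (mins.length === k && h >= mins[k - 1]) continue;
    // Binary search for the insertion point.
    let lo = 0;
    let hi = mins.length;
    while (lo < hi) {
      const mid = (lo + hi) >>> 1;
      if (mins[mid] < h) lo = mid + 1;
      else hi = mid;
    }
    if (mins[lo] === h) continue; // repeated item (or hash collision)
    mins.splice(lo, 0, h);
    if (mins.length > k) mins.pop();
  }
  if (mins.length < k) return mins.length; // fewer than k distinct hashes: exact count
  // The k-th smallest hash, as a fraction of the hash space, shrinks as
  // the number of distinct items grows.
  const kthFraction = (mins[k - 1] + 1) / 2 ** 32;
  return Math.round((k - 1) / kthFraction);
}

// Usage: three distinct values among five items.
console.log(estimateCardinality(['a', 'b', 'a', 'c', 'b']));
```

Memory stays bounded at k hashes however large the input is. The relative error shrinks roughly with the square root of k, so a larger k buys accuracy at the cost of space.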

Data Science

Ads and the City: Considering Geographic Distance in Recommendations (pdf)

“..in human mobility, we learn two insights: 1) there are special individuals who visit many places; and 2) individuals go to a venue not only because they like it but also because they are closeby.

We model these insights into two simple models and learn that: 1) simply recommending power users works better than random but is far from producing the best recommendations; 2) an item-based recommender system produces accurate recommendations; and 3) recommending places that are closest to a user’s geographic center of interest produces recommendations that are as accurate as item-based recommender’s”

DBMS

Tom Kyte hands over the ‘keys to Oracle’ – and it is free

Idea

HBR’s ‘Big Data’ Insight Center

Learning

How Google builds Maps and provides Directions

“I came away convinced that the geographic data Google has assembled is not likely to be matched by any other company. The secret to this success isn’t, as you might expect, Google’s facility with data, but rather its willingness to commit humans to combining and cleaning data about the physical world.”

Visualization

Some great Data Visualization Tutorials from Flowing Data

etc

State of Data #115

Top Read

Alex Pentland, one of the most-cited computer scientists, on Reinventing Society in the Wake of Big Data

“With Big Data traditional methods of system building are of limited use. The data is so big that any question you ask about it will usually have a statistically significant answer. This means, strangely, that the scientific method as we normally use it no longer works, because almost everything is significant!” 

Analysis

What Data Can’t Tell You About Customers

 

e.g., on data showing that virtually every family in urban Brazil owned a television –

This was not a huge surprise — any report can tell you the rising percentage of technology ownership among families in emerging markets. But when we dug deeper, we learned that the TVs were not status symbols or signs of increasing wealth; they were safeguards. Because of the violence prevalent in the favelas where these families lived, parents feared their children going out at night. What these parents really wanted was a way to make the living room more entertaining than the streets.

Big Data

Windows Azure condenses stored data using “lazy erasure coding” (deck)

Data Science

Is QWERTY the optimal keyboard layout for touchscreens? The WORST layout, based on data, is –

DBMS

Sometimes we all ‘see’ (or, write!) code like this

var srcData = data;
if (data.data && data.data.data) {
    data = data.data.data;
} else if (data.data) {
    data = data.data;
}
if (!data) {
    return;
}


Idea

Does every company need a “Data Dictator”?

“Aetna is a good example. In 2002, Ron Williams is president, and he says, “Okay, we lost about $270 million last year. Let’s figure out what went wrong.” He brings in all of his senior execs, and he says, “Tell me about your part of the business.” And he said that every single line of business showed data showing they were making a profit.”

Learning

How NASA makes scientific data beautiful

 

Visualization

‘..compared rates of crime with rates of belief in heaven and hell in 67 countries’

etc