State of Data #114

Top Read

Milton Friedman’s Thermostat (or, why we need to be very careful of correlations)

‘[Imagine] you were a passenger in a car, watching the driver try to keep a constant speed on a hilly road. You would see the gas pedal going up and down. You would see the car going downhill and uphill. But if the driver were skilled, and the car powerful enough, you would see the speed stay constant.

So, if you were simply looking at this particular “data generating process”, you could easily conclude: “Look! The position of the gas pedal has no effect on the speed!”’
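
The driver example is easy to reproduce with a toy simulation. The sketch below is only an illustration under made-up assumptions: the 1:1 effect of pedal on speed, the noise levels, and the names `simulate` and `pearson` are all hypothetical, not taken from the linked post.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=5000, seed=0, target=60.0):
    """Toy model (assumed): speed = pedal - slope + small noise."""
    rng = random.Random(seed)
    slopes, pedals, speeds = [], [], []
    for _ in range(n):
        slope = rng.gauss(0, 5)            # positive means uphill
        pedal = target + slope             # skilled driver offsets the hill exactly
        speed = pedal - slope + rng.gauss(0, 0.5)  # pedal really does drive speed
        slopes.append(slope)
        pedals.append(pedal)
        speeds.append(speed)
    return slopes, pedals, speeds


slopes, pedals, speeds = simulate()

# Naive view: pedal position looks unrelated to speed.
print("corr(pedal, speed):", round(pearson(pedals, speeds), 3))

# Accounting for the hill: add the slope back and the effect reappears.
adjusted = [s + h for s, h in zip(speeds, slopes)]
print("corr(pedal, speed + slope):", round(pearson(pedals, adjusted), 3))
```

The first correlation comes out near zero even though, by construction, the pedal moves the speed one for one. Only once the hill is accounted for does the relationship show up.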

 

Analysis

Facebook has about 180K servers, Google about 1M. How to estimate the number of servers from a company’s energy consumption data

 

Big Data

Changing Panorama of Data (Martin Fowler)

“big data” is when the size of the data itself becomes part of the problem

 

Data Science


ggplot2 basics

 

DBMS

Paper on ‘FunSQL: It is time to make SQL functional’

 

Idea

Amazon has changed 1B people’s purchases; Google has changed 1B people’s information access; Facebook has changed 1B people’s identity.

If you get access to ALL the data in the world, what would you do? “Dinner with Data” (courtesy: The Stanford Alumni Club of Shanghai) discusses.

Learning

Slides from a great talk on Data Quality from the Deep Web (AT&T Research)

 

Visualization

Interaction Design for Data Visualization

 

etc

State of Data #113

Top Read

What Makes Paris Look Like Paris?

‘an algorithm that uses images pulled from Google’s Street View to find the small details that appear frequently in Paris and, crucially, do not appear in other cities. E.g., blue and green street signs, tall double-paned windows, balconies enclosed by iron filigree, and a particular lamppost style’

Analysis

Data on Multi-Device Users

Big Data

An Inside Look at Facebook’s Outside-Air-Cooled, Sasquatch-Guarded Oregon Data Center


Data Science

Patterns for Machine Learning

DBMS

How eBay uses Non-/Relational Technologies in ‘Buy It Now’

Idea

The Dangers of Data

‘Take a room with a furnace that’s regulated by a really good thermostat. Your data is going to show that the amount of fuel burned by the furnace is uncorrelated with the temperature in the room. Thus you’ll discover that burning fossil fuels doesn’t cause heat.’


Learning

Measuring your Heart Rate using R and Ruby

 

Visualization

The Evolution of the Web – done with HTML5, SVG and Canvas

etc

State of Data #111

Top Read

How can you integrate ‘top-down’ hypothesis-driven and ‘bottom-up’ data-driven modeling approaches into one unified mathematical framework? 

Analysis

Visualizations of the oDesk “oConomy”

Big Data

Intimacy of Data – Do we need a new model for cutting-edge insights?

Data Science

DBMS

SQL myth that refuses to die

Idea

The good and bad of the Median

“consider a database with missing values for ‘number of children’, which (one would expect!) would always be an integer. Substituting the mean for missings may result in observations sporting ‘1.2’ children. This is not a problem with the median.”
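
A quick way to see the point is to impute a small column both ways. This is a minimal sketch with made-up data; the `children` list and the `impute` helper are hypothetical. One caveat worth noting: with an even number of observations the ordinary median can itself land between two values (for example 1.5), so the sketch uses `statistics.median_low`, which always returns a value that actually occurs in the data.

```python
from statistics import mean, median_low


def impute(values, fill):
    """Replace missing entries (None) with the given fill value."""
    return [fill if v is None else v for v in values]


# Hypothetical "number of children" column with two missing values.
children = [0, 1, 1, 2, 2, None, 0, 3, None, 1]
observed = [v for v in children if v is not None]

mean_fill = mean(observed)          # 10 / 8 = 1.25, not a possible count
median_fill = median_low(observed)  # 1, a value that occurs in the data

print("mean-imputed:  ", impute(children, mean_fill))
print("median-imputed:", impute(children, median_fill))
```

The mean-imputed column ends up with households of 1.25 children, while the median-imputed column stays in whole numbers.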

Learning

Linked Open Data – five-star format in Publishing data

Visualization

etc

State of Technology #70

elsewhere

 

Architecture

*** Modern Web Development – Part 1. Great Primer

 

Code

13 JavaScript Performance Tips from Google I/O 2012

 

Design

The Shape of Design – a Design Book that’s Designed Well


Essay

How to Write – 11 Simple Rules

 

Mobile

Unlock your Phone with a Look – if you have Android

 

SaaS

Captchas are becoming Really Ridiculous (outlook.com has bad ones too)

 

Service

Improving UX Through Front-End Experience – “Users expect 2 sec; after 3 sec, 40% will abandon your site”

 

Social

Worst Tweets of All Time – Be Really Careful When Tweeting from your Business Account

 

Tool

Foot Powered Washing Machine could change millions of lives

 

Hack

All Security Anti-Patterns in a single site – Tesco

 

etc

 

parting thought

‘I started to feel a bit disconnected from our San Francisco office, so we got two big screens with cameras there and here in New York. They’re on all day long, so you just walk by and say: “Hey, Pete, what’s up? Can you get Ben?” It works so well’ – Dennis Crowley, CEO, Foursquare

State of Data #110

Top Read

How would a race between Usain Bolt and the 1896 100m gold medalist look? Slate does a great visualization ranking top athletes’ performances on a single page. The best thing about the visual is that the time axis is missing, giving it a clean one-dimensional look.

 

Analysis


Analytics for Books

 

Big Data

Soccer’s Big Data Revolution

Starting next year, every MLS soccer kit will be fitted with a small chip from Adidas. Located between a player’s shoulder blades, the chip will transmit 200 data records a second from the player to a local information system. From there onward, the data can be transmitted to the coaches’ laptops or tablets to provide clear overviews of the physical and physiological situations of the players on the pitch. It is not far-fetched to imagine coaches viewing a Football Manager game-like interface which showcases individual players’ level of fatigue, concentration, etcetera, varying in real time.

 

Data Science

Spreadsheet Data Manipulation using Excel (Microsoft Research) – with really useful examples of formatting phone numbers and dates

 

DBMS

‘Hard Parsing is Bad! ..But how Bad?’ shows why app developers should really care about coding best practices (in this case, using bind variables)
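
To make the bind-variable point concrete, here is a minimal sketch. It assumes Python’s built-in `sqlite3` module as a stand-in (the linked article concerns a different database, and SQLite’s parsing costs are not comparable); the `users` table and the `distinct_statements` helper are hypothetical. The idea it illustrates is general, though: when literals are pasted into the SQL text, every value produces a new statement for the database to parse, while a bound parameter keeps the text identical across calls.

```python
import sqlite3


def distinct_statements(user_ids, use_binds):
    """Run one lookup per id and count how many distinct SQL texts were sent."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [(uid, f"user{uid}") for uid in user_ids],
    )
    seen = set()
    for uid in user_ids:
        if use_binds:
            # Same statement text every time; the value travels separately.
            sql = "SELECT name FROM users WHERE id = ?"
            row = conn.execute(sql, (uid,)).fetchone()
        else:
            # A new statement text per value (and an injection risk besides).
            sql = f"SELECT name FROM users WHERE id = {uid}"
            row = conn.execute(sql).fetchone()
        assert row[0] == f"user{uid}"
        seen.add(sql)
    conn.close()
    return len(seen)


ids = list(range(100))
print("with literals:", distinct_statements(ids, use_binds=False))
print("with binds:   ", distinct_statements(ids, use_binds=True))
```

With literals, the database sees as many different statements as there are ids; with a bind variable it sees one, which is what lets a statement cache do its job.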

 

Idea

Dark Side of Data

“Jonas’s concept of ‘privacy by design’ is an important attempt to address privacy issues in big data. Jonas envisions a day when ‘I have more privacy features than you’ is a marketing advantage. It’s certainly a claim I’d like to see Facebook make.”

 

Learning

Accuracy vs. Precision in 72 seconds

 

Visualization

A GE Data Visualization Project – Real-time Twitter Conversation on Breast Cancer

 

etc