State of Technology #18

#at_other_places

 

#architecture – How would you design your service API to serve 3 end-points – Chrome, iPad and Android? How about 100? This is how Netflix re-designed its API, with billions of API requests reaching more than 100 devices.

 

 

#code – What is your Favorite (or, most hated) Programming Mistake?


#design – What happens inside Google’s User Experience Lab? Would you remember the following better because of the font?

“A while back, there was this report that came out – from Princeton University, I think—that said using Comic Sans will make your material easier to remember. The harder your typeface is to read, the better the chance that people will remember what you wrote.”

#essay – ‘You Are the Ad’ – how a social network could turn its members into “ads”, from Technology Review

 

 

#mobile – Want to check how your website looks on a particular Android device, iPad2 or across different devices? Screenfly now lets you do it from your browser. #ReallyUseful

 

 

#saas – In some cities, on some nights, more people now rent from homeowners using AirBnB than use hotels. What’s the worst that could happen when the rental algorithm follows eBay’s “exchange human data as late as possible”?

#social – Games as a Service – 10 Game Design Lessons

 

 

#tweaks n’ hacks –  History of @ Symbol Part 1; Part 2.5 of 2

#etc

 

#parting_thought – “The three golden rules to ensure computer security are: do not own a computer; do not power it on; and do not use it.” – Robert Morris

 

 

 

 

 

State of Data #59

#analysis – What metrics should you track for web marketing effectiveness (say, for a newsletter)? How many receive the email (Delivery Rate); how many view it (Open Rate); and why that differs from how many click through (Click-to-deliver Rate), etc. Avinash Kaushik describes how to measure effectiveness across three dimensions: Acquisition, Behavior, Outcomes.
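A minimal sketch of the three rates named above, in Python. The formulas are the commonly used definitions and are an assumption here, not taken from the linked article: delivered = sent minus bounced, and both the open rate and the click-to-deliver rate are divided by delivered emails. The `Campaign` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Campaign:
    """Raw counts for one newsletter send (hypothetical field names)."""
    sent: int
    bounced: int
    opened: int
    clicked: int


def email_metrics(c: Campaign) -> dict:
    """Return delivery, open and click-to-deliver rates as fractions."""
    delivered = c.sent - c.bounced
    if c.sent <= 0 or delivered <= 0:
        raise ValueError("need at least one sent and one delivered email")
    return {
        # share of sent emails that did not bounce
        "delivery_rate": delivered / c.sent,
        # share of delivered emails that were viewed
        "open_rate": c.opened / delivered,
        # share of delivered emails that led to a click
        "click_to_deliver_rate": c.clicked / delivered,
    }
```

Dividing clicks by delivered (rather than by opened) is what keeps the click-to-deliver rate comparable across campaigns even when open tracking is unreliable.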

 


#architecture
It’s hard to avoid articles titled – ‘Is NoSQL Lady Gaga of Database world?’. In context, replace Lady Gaga with NoSQL below –

 

“You know, there’s a difference between not liking someone’s music and not recognizing their talent. If you can’t recognize the fact that Lady GaGa is, in fact, extremely talented in many ways, then you may want to try to look at her with less of a bias. There’s plenty of artists I can’t stand, but still respect their talent.”



#big_data
How a Cornell team dug out “Fake Reviews” with non-human “Classifiers” (PDF) and beat humans at it handily. The basic premise was that Truth = Informative writing; Deception = Imaginative Writing.

Now, if only we had an API to run it on the restaurant with thousands of five-star ratings on Yelp…

Speaking of Yelp, it now has 20M reviews (a cool visualization from them). Bi-Rite Creamery in San Francisco is the business with the most reviews (3,903 as of writing).



#conference –   
What is Self-service BI; what infrastructure is needed; how to take your organization towards it – Focus Roundtable on August 9, 9:30-10:30 AM
 


#DBMS
Coming to a Server near you soon — No more Reboot after a systems update – Oracle acquires Ksplice

#learning – Read between the lines – Lymbix offers a sentiment-analysis (of, say, your boss’s email feedback) API returning JSON or XML scoring attributes like sadness, humiliation, dominant_emotion, affection etc. Amusement Quotient: 100!

 

 

#visualization – Compelling metaphor to illustrate the difference between Data and Information

 

#etc

 

  • Go SQLite

  • Mobile is now 2% of global GDP. “worldwide mobile industry should bring in $1.3 trillion in 2011 and will represent about 2 percent of global gross domestic product”

  • Cloud coming home to roost – Microsoft suggests ‘Data Furnaces’ to heat your home. 400 CPUs can heat a single-family home. The full paper (PDF) is an interesting read too.

  • United States of Netflix – Visualization of the month does not cost $6 more to see.

  • Metalog – Catalog of Data Catalogs from Governments (and some spam taking advantage of openness) across the world – datacatalogs.org

State of Data #58

#analysis – Machine Learning Fairy Dust – “Machine learning as a meme is very similar to “social” five to ten years ago: you took an okay-ish concept, added some crowdsourcing, folksonomies and social networking, and there it was, your wonderful Web 2.0 brainchild”

 

#architecture – my/new/no/sql – Amazon CTO Werner Vogels and a Facebook DB engineer shred Stonebraker’s tall claim (‘Facebook trapped in MySQL – fate worse than death’; see SoD #56).

Vogels tweeted –

Ouch!! “If you have never developed anything of that scale, you cannot be taken serious if you call for the reengineering of facebook’s data store,”
no troll left behind –  “Scaling data systems in real life has humbled me. I would not dare criticize an architecture that the holds social graphs of 750M and works”.

Facebook DB engineer – “What happens in real world if one gets 2x efficiency gain? Twice more data can be stored, twice more data intensive products can be launched.
What happens in academia of in-memory databases, if one gets 2x efficiency gain? A paper.”

 

#big_data – How to avoid Hadoop’s ‘tremendous inefficiency’? Daniel Abadi ruminates –

 

“The problem with Hadoop is that its strength is also its weakness. Hadoop gives the user tremendous flexibility and power to scale all kinds of different data management problems. This is obviously great. But it is this same flexibility that allows the user to perform incredibly inefficient things and not care …”

#learning – What REALLY kills transactional app performance – if all developers watched this 2-minute video snippet, most applications could be significantly faster.

The ‘Nested Select’ or ‘N+1 problem’ is firing many SQL statements to fetch what is essentially one set of underlying data. The metaphor to understand this anti-pattern is FANTASTIC –

 

“Would we do this for grocery? Then why would we use this pattern to get data out?

1. Drive to the super market
2. Locate what’s needed (e.g., milk)
3. Pay
4. Store the item in the car
5. Drive back home
6. Store the item (e.g., fridge)
7. Then start again for the next item on the shopping list (e.g., corn flakes).”
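The grocery-run metaphor maps directly onto code. Below is a small sketch using Python’s built-in sqlite3 module; the `orders` and `items` tables and all function names are made up for illustration and are not from the video. The first function makes one trip per order (N+1 queries); the second fetches everything in a single JOIN.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
        CREATE TABLE items (id INTEGER PRIMARY KEY, order_id INTEGER, name TEXT);
        INSERT INTO orders VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
        INSERT INTO items VALUES (1, 1, 'milk'), (2, 1, 'corn flakes'),
                                 (3, 2, 'eggs'), (4, 3, 'bread');
    """)
    return conn


def items_n_plus_1(conn):
    """Anti-pattern: one query for the orders, then one more per order."""
    queries = 1
    orders = conn.execute(
        "SELECT id, customer FROM orders ORDER BY id").fetchall()
    result = {}
    for order_id, customer in orders:
        rows = conn.execute(
            "SELECT name FROM items WHERE order_id = ? ORDER BY id",
            (order_id,)).fetchall()
        queries += 1  # a separate "drive to the supermarket" for each order
        result[customer] = [name for (name,) in rows]
    return result, queries


def items_single_join(conn):
    """Same data in one round trip using a JOIN."""
    rows = conn.execute("""
        SELECT o.customer, i.name
        FROM orders o LEFT JOIN items i ON i.order_id = o.id
        ORDER BY o.id, i.id
    """).fetchall()
    result = {}
    for customer, name in rows:
        result.setdefault(customer, [])
        if name is not None:  # orders without items yield a NULL name
            result[customer].append(name)
    return result, 1
```

With three orders the first version issues four queries and the second issues one; against a real networked database each extra query also pays a round-trip latency cost.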

 

#visualization – 80-ft wide visualization display driven by Space-Time Insight’s analytics

#etc

 

  • Top 3 names as password – Maggie, Michael, Jennifer

  • 14% of passwords are purely numeric

  • Most popular keyboard pattern password is – drum roll – qwerty


Is your app just slow, or are you killing people?

Technology Review had this great article on ‘The Slow-Motion Internet’ – it is a write-up from Google’s POV, which is helpful, as they have gotten the performance part right.

 

 

 

 

 

Key Insights –

 

 

 

  1. Demand for perceived performance is getting steeper. People could tolerate 8 sec for a page to load in 2000; but they would leave after 3 sec in 2009. In 2012, they would probably go away after 1 sec. Apart from organic improvement in performance, ‘minimized mobile interfaces’ caused the expectations to increase.

 

 

 

  2. Mobile requires us to up the game 100x – 50% of customers want NO performance difference between mobile and web sites. Now, it may sound like saying – “I expect the plane to fly at the same speed on the ground and at 30K feet” – but the customer is always right, so we have to deliver.

    (Roughly) Average mobile bandwidth is about 20% as fast as typical broadband; mobile processors are about 50% as powerful; and the screen is about 10% as big as that of a non-mobile surfing device.

    This requires an app to be 100x (5*2*10) more efficient on mobile than on the web.

    Not all of the 100x will come from “squeezing real estate” or minimization alone. There will always be stuff running in the “cloud”, and that will have to contribute significantly as well.

  3. Page’s Law – Software gets slower faster than hardware gets faster. That is, software follows the reverse of Moore’s Law – doing nothing will make software 2x slower every 18 months. It is an arms race.

  4. To some degree, slowness is like killing people (Google Philosophy of Performance) – Humans live, on average, about 2B seconds. If we’re serving just 0.5B transactions a day 0.1 sec slower than expected, that equals wasting 9 full lifetimes a year.

  5. Browsing web pages “should be like changing channel on the TV”
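The two back-of-the-envelope numbers in the list above (100x and 9 lifetimes) can be re-derived in a few lines of Python. This is only a sketch of the arithmetic; the function names are made up, the defaults are the rough figures quoted above, and a 365-day year is assumed.

```python
def mobile_efficiency_gap(bandwidth_ratio=0.20, cpu_ratio=0.50, screen_ratio=0.10):
    """Multiply the three handicaps: 1/0.2 * 1/0.5 * 1/0.1 = 5 * 2 * 10."""
    return (1 / bandwidth_ratio) * (1 / cpu_ratio) * (1 / screen_ratio)


def lifetimes_wasted_per_year(transactions_per_day=0.5e9,
                              delay_seconds=0.1,
                              lifetime_seconds=2e9,
                              days_per_year=365):
    """Total seconds lost to the delay in a year, in units of human lifetimes."""
    seconds_lost = transactions_per_day * delay_seconds * days_per_year
    return seconds_lost / lifetime_seconds


if __name__ == "__main__":
    print(mobile_efficiency_gap())       # about 100
    print(lifetimes_wasted_per_year())   # about 9.1
```

The second figure works out to 0.5B × 0.1 s × 365 ≈ 18.25B seconds a year, or roughly 9 lifetimes of 2B seconds each.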

 

 

 

Data vs. Context – James Cameron and David Ogilvy Way

In this series of posts, I will try to illustrate what I have learned in the data world.

  1. Don’t just rely on data. Powerful context dwarfs data. In fact, with a good context data serves as a subtext at best, as a distraction at worst.
     

    An old, blind man is said to have begged outside the Ogilvy and Mather office. On an unusually sunny and bright day, David Ogilvy noticed the old man standing with a sign: “I am blind. Please help”. Ogilvy stopped by, took his sign and added a few words: “It’s spring out there and I am blind”. The poor man had collected a huge sum by the evening.

  2. Raw data is the cheapest commodity. The world produces more bits and bytes a day than the total number of people who have ever lived. Thus, data by default is useless. If raw data is your only weapon to convince, try to tell a better story or supply a meaningful context.

    After the ship sinks in ‘Titanic’, James Cameron opted to show a starry sky. His rationale was perhaps to introduce enough fuzzy light for viewers. Cameron researched some really fine details and got them right. For example, only three of the four funnels on the actual ship vented smoke from the boilers, and the director correctly showed smoke coming out of only three stacks in the movie. However, his team got the entire sky wrong!

    The constellation shown was wrong to the point of silliness. Its left side was a mirror image of the stars on the right side. The sky has never looked like that from any point on Earth in recorded history.

    A scientist got an opportunity to chat with Cameron and complained about this detail. Cameron thanked the scientist for noticing and added, sarcastically, that had he gotten the night sky “right”, the movie might have grossed another $200M. Who knows?

To be continued.

State of Technology #17

 #at_other_places –


#architecture – 
Google+ technical lead (ex-CTO, Plaxo) explains the underlying architecture –

“…use Java servlets for our server code and JavaScript for the browser-side of the UI, largely built with the (open-source) Closure framework, including Closure’s JavaScript compiler and template system.
Our backends are built mostly on top of BigTable and Colossus/GFS, and we use a lot of other common Google technologies such as MapReduce”


#code – 
A handy cheat-sheet on Time – from time zones in code to how to use ntpd to change system time

#design – Lesser known “cool” features of HTML5 –

  • how to use speech input;
  • generating pseudorandom numbers;
  • capturing performance on timing/navigation/memory;
  • determining whether your app is visible, etc.

#essay – Libraries vs. Frameworks –

Libraries are useful collections of code that you can call to do bits of work. In the case of a web app, there are things that are common to pretty much all of them: Receiving HTTP connections, URL routing, database connections, HTML generation or constructing SQL statements from fragments. There are libraries that do all of these for you.
Frameworks aim to solve all your problems for you in one fell swoop. They collect these libraries together in one big package and you generally have very little choice in how your problems are solved.

 


#mobile – 
Beautiful, intuitive and easy on the eyes – Nokia’s N9 UX Guidelines


#saas – Want to know the underlying technology of Facebook? SalesForce? Now, just type in the name and you get instant, live analysis in your browser. Brilliant underthesite.com

#social – How some people spend $75,000 in Zynga, and does it help the IPO?


#tool – One of the top 5 tools used by engineers ever, PuTTY, released a new version (0.61) after four years of development.

#etc 

 

 

#parting_thought – “If we can’t win on quality, we shouldn’t win at all.” – Larry Page 

State of Data #57

#analysis – ‘Business of Big Data’ – how venture fund analysts look at the taxonomy of Big Data.

 

#architecture – Set the politics aside for a while: this “garbled” tweet analysis is possibly the best UTF-8 / encoding tutorial ever.
Why didn’t I just say “The software read in a UTF-8 encoded JSON stream of tweets and displayed it with an ANSI Windows Code Page 1252.” Because that wouldn’t be nearly as fun.
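The garbling described in that quote is easy to reproduce. Here is a small Python sketch (function names are made up for illustration) that encodes text as UTF-8 and then wrongly decodes the bytes as Windows code page 1252, plus the reverse step that repairs it.

```python
def garble(text: str) -> str:
    """Simulate reading UTF-8 bytes as if they were Windows-1252."""
    # Note: a few byte values (e.g. 0x9D) are undefined in Python's cp1252
    # codec, so some inputs raise UnicodeDecodeError instead of garbling.
    return text.encode("utf-8").decode("cp1252")


def repair(garbled: str) -> str:
    """Undo the mistake: turn the wrong characters back into bytes, decode as UTF-8."""
    return garbled.encode("cp1252").decode("utf-8")


if __name__ == "__main__":
    for sample in ["café", "it’s"]:
        broken = garble(sample)
        print(sample, "->", broken, "->", repair(broken))
```

Each non-ASCII character takes two or three bytes in UTF-8, and every one of those bytes is shown as its own Windows-1252 character, which is why a single “é” turns into two characters and a curly apostrophe into three.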

 


#big_data – Love this thought experiment – reproducing YouTube with Oracle-driven architecture would cost ~$0.5B in hardware and software licenses

#conference –   First MongoDB Meetup in Bay Area, July 19

2011 Joint Statistical Meetings, Miami, July 31-Aug 3 – heavy emphasis on using R with Predictive Analysis 


#DBMS – Expert Oracle GoldenGate (book) is now available in Safari. GoldenGate could be used either for DR or heterogeneous data integration with/without transformation


#learning –
What is the typical range of Hadoop compression factors? “6-10X compression is common for “curated” Hadoop data”.

For low-value machine-generated data, “lot of it would be repetitive “I’m fine; nothing to report” kinds of events”.

The compression-factor write-up also reverse-engineered Yahoo’s recent “standard Hadoop server” config –

  • 8-12 cores
  • 48 gigabytes of RAM
  • 12 disks of 2 or 3 TB each

 

#visualization – Superb analysis of Slopegraphs, Edward Tufte’s less popular idea. [ed. It probably did not catch on because it is like reading a book while skiing – lots of jarring vertical eye movements to read text for horizontal script readers]

 

#etc

 

  • Skiing Data Eye Candy – Serious skiers can see data on speed, vertical rate of descent, etc. in the corner of these goggles. Data can later be uploaded to a computer.
     
  • No workload increase in a decade – “If you add up all the hours worked in the economy in June 2011 they are equal to all the hours worked in February of 1999”. Interesting data on supply-side economy.
     
  • Watch out for correlation attack – What is the relation between Wimbledon and the washing machine repair business?
     
  • Exa-iting – A single telescope aims to generate more data a day in 2020 than the entire internet generates today – an Exabyte a day. But the universe has only about 4% actual ‘matter’ – so the data should compress well 😉


 

State of Technology #16

#at_other_places –

  • Time flies – Apple sold its 15 billionth app – developers made $2.5B, Apple pocketed $1B. The 1 billionth app sale seems to have happened just yesterday!
  • Augmented Reality for Mindreading in (paper) Comic Book – Berg printed this awesome comic with a third ink that is only visible with a UV light source

#architecture – node.js is the “in thing” – this is probably the best introduction to node.js, from O’Reilly this week.
“Node.js (or, as it’s more briefly called by many, simply “Node”) is a server-side solution for JavaScript, and in particular, for receiving and responding to HTTP requests.

Node brings a different approach to the party: it seeks to move you and your web applications to an evented model, or if you like, a “small event” model. In other words, instead of sending a few requests with lots of data, you should be sending tons of requests, on lots of events, with tiny bits of data, or requests that need a response with only a tiny bit of data”


#code – 2011’s Razzies for Software were out last week – “Top 25 Most Dangerous Software Errors” of the year. Ouch! Top 3 –

  1. SQL Injection
  2. OS Command Injection
  3. Buffer Overflow
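For the #1 entry on that list, here is a minimal sketch of the bug and the usual fix, using Python’s built-in sqlite3 module. The `users` table, the function names and the payload are all made up for illustration. The unsafe version glues user input into the SQL text; the safe version passes it as a bound parameter.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Tiny in-memory table of users and their secrets (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [("alice", "a-secret"), ("bob", "b-secret")])
    return conn


def lookup_unsafe(conn, name):
    """VULNERABLE: the input becomes part of the SQL statement itself."""
    query = "SELECT secret FROM users WHERE name = '" + name + "'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    """Parameterized query: the input is only ever treated as a value."""
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]


if __name__ == "__main__":
    conn = make_db()
    payload = "nobody' OR '1'='1"
    print(lookup_unsafe(conn, payload))  # leaks every secret
    print(lookup_safe(conn, payload))    # no user has that literal name
```

The payload closes the quoted string early and appends a condition that is always true, so the unsafe query returns every row.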


#design – 
About 200 new fonts for your new app – new WebFonts from Google 

#essay – ‘Inside Google+’ from Wired – Steven Levy got the privilege of working with the team as they developed the social tool. His new book ‘In the Plex’ is a mesmerizing read detailing Google’s people, products and processes. A very highly recommended summer read.

‘The massive wave symbolizes the ways Google views the increasingly prominent social aspect of the web — as a possible tsunami poised to engulf it, or a maverick surge that it will ride to glory. Bierstadt’s turbulent vision is the perfect illustration. “We needed a code name that captured the fact that either there was a great opportunity to sail to new horizons and new things, or that we were going to drown by this wave,” Gundotra said last August, when Google first showed me a prototype’

 


#mobile – 
‘Designing for Android’ – a must-read resource from Smashing Magazine

 


#saas – 
‘How we made Hotmail 10x faster’ (No! Not by losing users and thereby reducing load!). THREE main ideas resulting in a phenomenal gain of 22x in ‘Composing Message’ –

  1. Caching
  2. Pre-loading
  3. Asynchronous Operations

 

#social – Pope tweets for the first time

 

#tool – Button Basics – how to design buttons with CSS vs. images; how different browsers render them differently – excellent primer.

 

#tweaks n’ hacks – DIY DNA – Want to play with your own DNA? Decipher the genome code? A PCR machine now sells for less than an iPad (BTW, the genome data in a human gamete is roughly 37MB)


#etc

 

#parting_thought – ‘You’ve found market price when buyers complain but still pay’ – Paul Graham

 

State of Data #56

#analysis – Mining Twitter for consumer attitudes towards airlines (using R) – (a) “search twitter in 1 line of code”; (b) estimate sentiment from ‘opinion lexicon’ (how to analyze sarcasm); (c) score/compare/rinse/repeat
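The linked analysis uses R; as a rough Python sketch of step (b), lexicon-based scoring boils down to counting positive words minus negative words. The two tiny word sets and the function name below are made up for illustration and stand in for a real opinion lexicon.

```python
import re

# Toy stand-ins for a real opinion lexicon (hypothetical word lists).
POSITIVE = {"great", "good", "love", "friendly", "comfortable"}
NEGATIVE = {"late", "lost", "rude", "delayed", "terrible"}


def score(text: str) -> int:
    """Sentiment score = number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(1 for w in words if w in POSITIVE)
    negatives = sum(1 for w in words if w in NEGATIVE)
    return positives - negatives


if __name__ == "__main__":
    for tweet in ["Great crew, friendly and on time",
                  "Flight delayed again and they lost my bag. Terrible.",
                  "Oh great, delayed again"]:
        print(score(tweet), tweet)
```

The last sample shows the sarcasm problem: “great” cancels “delayed” and the tweet scores as neutral, which a plain word count cannot fix on its own.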

 

 

#architecture –  Facebook has a “serious MySQL problem”?

1,800 servers dedicated to MySQL and 805 servers dedicated to memcached

“…it has so much user data, and because every user clicking “Like,” updating his status, joining a new group or otherwise interacting with the site constitutes a transaction its MySQL database has to process. Every second a user has to wait while a Facebook service calls the database is time that user might spend wondering if it’s worth the wait.”

 

#big_data –  Patriot Act vs. Data Protection Acts in Europe – what happens when they conflict – very pertinent for that ‘cloud’ thing

 

#conference – Another TDWI Summit – “Deep Analytics for Big Data”, San Diego, Sept 25-27

 

#competition – WikiMedia announces ‘a data modeling competition to develop an algorithm that predicts future editing activity on Wikipedia’


#DBMS – Counterintuitive Fact #2 – A good hardware upgrade could kill the performance of your application. “Daily WTF” analyzes one of the many “whys” –

“Prior to the upgrade, at Wal*Mart waddling speed, the application trickled through the database table, and that meant very little happened in any given second. But after the upgrade, a number of order lines processed quickly, and suddenly the fact that some orders had the same item on two lines meant that the transaction exploded. Roughly 50% of the time that an order had duplicate lines, it now failed.”

 

#visualization – Real Estate Data viz. from Trulia. ‘When does crime happen in big cities’. San Francisco, beware of 9PM!

Skyscraper of Mobile Phone Call Data – Data or Abstract Art? (from New York Times)


#etc

  • Twitter acquires BackType for Social Analytics
  • Nordstrom Rack – the only winner in Groupon war? This amazing data visualization from Harvard Business Review shows so.
  • #math – Celebrate a truly odd day this Saturday. Next one is on 9-11-13
  • Data Analysis could lead to Meatless Mondays? – “83% of the average U.S. household’s carbon footprint for food comes from growing and producing it. Transportation is only 11%” “one day per week’s worth of calories from red meat and dairy products.. achieves more GHG reduction than buying all locally sourced food”

State of Technology #15

 #at_other_places –


#architecture – 
A great tutorial on JVM heap analysis, GC and tuning it (isn’t Prezi cooler than powerpoint?)


#code – 
appdone claims to build “any web app in 7 days” or money back!


#design – 
Raising the barcode – some of the most innovative bar code designs

#mobile – Why do mobile apps behave badly when you’re mobile, and what can you do about it as a mobile developer? It’s all because of TCP-over-TCP – “On our mobile link, with an underlying reliable data connection but with highly variable delay, there is little/no loss and no congestion, but of course TCP doesn’t know that and backs off making the connection our apps see next-to-useless”

#social – How to manage crowdsourced human computation – when Alan Turing mentioned computers, he mostly meant humans; and how Amazon Mechanical Turk could classify Arabic into different dialects

 

#tool –  Area startup turning windows into solar panels


#tweaks n’ hacks –  Run Android on your PC

#etc

 

 

#parting_thought – “The most popular software for writing fiction isn’t Word. It’s Excel.”

– Brian Alvey, the New Shelton wet/dry