State of Technology #18

#at_other_places

 

#architecture – How would you design your service API to serve 3 end-points – Chrome, iPad and Android? How about 100? This is how Netflix redesigned its API to handle billions of API requests from more than 100 kinds of devices

 

 

#code – What is your Favorite (or, most hated) Programming Mistake?


#design – What happens inside Google’s User Experience Lab? Would you remember the following better because of the font?

“A while back, there was this report that came out – from Princeton University, I think—that said using Comic Sans will make your material easier to remember. The harder your typeface is to read, the better the chance that people will remember what you wrote.”

#essay – ‘You Are the Ad’ – how a social network could turn its members into “ads”, from Technology Review

 

 

#mobile – Want to check how your website looks on a particular Android device, an iPad 2, or across different devices? Screenfly now lets you do it from the browser. #ReallyUseful

 

 

#saas – In some cities, on some nights, more people now rent from homeowners using AirBnB than stay in hotels. What’s the worst that could happen when the rental algorithm follows eBay’s “exchange human data as late as possible”?

#social – Games as a Service – 10 Game Design Lessons

 

 

#tweaks n’ hacks –  History of @ Symbol Part 1; Part 2.5 of 2

#etc

 

#parting_thought – “The three golden rules to ensure computer security are: do not own a computer; do not power it on; and do not use it.” – Robert Morris

 

 

 

 

 


State of Data #59

#analysis – What metrics should you track for web marketing effectiveness (say, for a newsletter)? How many receive the email (Delivery Rate), how many view it (Open Rate), and why that differs from how many click through (Click-to-Deliver Rate), etc. Avinash Kaushik describes how to measure effectiveness across three dimensions: Acquisition, Behavior, Outcomes.
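A minimal sketch of how these three rates could be computed. The definitions are the commonly used ones and are an assumption here, not taken from Kaushik's post: delivered means sent minus bounces, and both the open rate and the click-to-deliver rate are measured against delivered emails. The function name `email_rates` and the campaign numbers are made up for illustration.

```python
def email_rates(sent, bounced, opened, clicked):
    """Return delivery, open and click-to-deliver rates as fractions.

    Assumed definitions (not taken from the linked article):
      delivery rate         = delivered / sent
      open rate             = opened / delivered
      click-to-deliver rate = clicked / delivered
    """
    if sent <= 0:
        raise ValueError("sent must be positive")
    delivered = sent - bounced
    if delivered <= 0:
        raise ValueError("no emails were delivered")
    return {
        "delivery_rate": delivered / sent,
        "open_rate": opened / delivered,
        "click_to_deliver_rate": clicked / delivered,
    }


# Hypothetical newsletter campaign numbers.
rates = email_rates(sent=10_000, bounced=500, opened=1_900, clicked=380)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

Measuring opens and clicks against delivered emails rather than sent emails keeps a bad mailing list (lots of bounces) from hiding how engaging the content itself was.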

 


#architecture
It’s hard to avoid articles titled – ‘Is NoSQL Lady Gaga of Database world?’. In context, replace Lady Gaga with NoSQL below –

 

“You know, there’s a difference between not liking someone’s music and not recognizing their talent. If you can’t recognize the fact that Lady GaGa is, in fact, extremely talented in many ways, then you may want to try to look at her with less of a bias. There’s plenty of artists I can’t stand, but still respect their talent.”



#big_data
How a Cornell team dug out “Fake Reviews” with non-human “Classifiers” (PDF) and beat humans at it handily. The basic premise was that Truth = Informative writing; Deception = Imaginative Writing.

Now, if only we could have an API to run it on that restaurant with thousands of five-star ratings on Yelp…

Speaking of Yelp, it now has 20M reviews (a cool visualization from them). Bi-Rite Creamery in San Francisco is the business with the most reviews (3,903 as of writing)



#conference –   
What is Self-service BI; what infrastructure is needed; how to take your organization towards it – Focus Roundtable on August 9, 9:30-10:30 AM
 


#DBMS
Coming to a Server near you soon — No more Reboot after a systems update – Oracle acquires Ksplice

#learning – Read between the lines – Lymbix offers a sentiment-analysis (of, say, your boss’s email feedback) API returning JSON or XML scoring attributes like sadness, humiliation, dominant_emotion, affection etc. Amusement Quotient: 100!

 

 

#visualization – A compelling metaphor to illustrate the difference between Data and Information

 

#etc

 

  • Go SQLite

  • Mobile is now 2% of Global GDP. “worldwide mobile industry should bring in $1.3 trillion in 2011 and will represent about 2 percent of global gross domestic product”

  • Cloud coming home to roost – Microsoft suggests ‘Data Furnaces’ to heat your home. 400 CPUs can heat a single-family home. The full paper (PDF) is an interesting read too.

  • United States of Netflix – Visualization of the month does not cost $6 more to see.

  • Metalog – Catalog of Data Catalogs from Governments (and some spam taking advantage of openness) across the world – datacatalogs.org

State of Data #58

#analysis – Machine Learning Fairy Dust – “Machine learning as a meme is very similar to “social” five to ten years ago: you took an okay-ish concept, added some crowdsourcing, folksonomies and social networking, and there it was, your wonderful Web 2.0 brainchild”

 

#architecture – my/new/no/sql – Amazon CTO Werner Vogels and a Facebook DB engineer shred Stonebraker’s tall claim (‘Facebook trapped in MySQL – fate worse than death’; see SoD #56).

Vogels tweeted –

Ouch!! “If you have never developed anything of that scale, you cannot be taken serious if you call for the reengineering of facebook’s data store,”
no troll left behind –  “Scaling data systems in real life has humbled me. I would not dare criticize an architecture that the holds social graphs of 750M and works”.

Facebook DB engineer – “What happens in real world if one gets 2x efficiency gain? Twice more data can be stored, twice more data intensive products can be launched.
What happens in academia of in-memory databases, if one gets 2x efficiency gain? A paper.”

 

#big_data – How to avoid Hadoop’s ‘tremendous inefficiency’? Daniel Abadi ruminates –

 

“The problem with Hadoop is that its strength is also its weakness. Hadoop gives the user tremendous flexibility and power to scale all kinds of different data management problems. This is obviously great. But it is this same flexibility that allows the user to perform incredibly inefficient things and not care …”

#learning – What REALLY kills transactional app performance – if all developers watched this 2-minute video snippet, most applications could be significantly faster.

The ‘Nested Select’ or ‘N+1 problem’ means firing many SQL statements to fetch what is essentially one set of underlying data. The metaphor for understanding this anti-pattern is FANTASTIC –

 

“Would we do this for grocery? Then why would we use this pattern to get data out?

1. Drive to the super market
2. Locate what’s needed (e.g., milk)
3. Pay
4. Store the item in the car
5. Drive back home
6. Store the item (e.g., fridge)
7. Then start again for the next item on the shopping list (e.g., corn flakes).”
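The shopping-trip metaphor maps directly onto code. Here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and their contents are hypothetical, invented only to show the shape of the problem: one query per customer ("one trip per item") versus a single JOIN ("one trip with the whole list").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES
        (1, 1, 'milk'), (2, 1, 'corn flakes'), (3, 2, 'bread'), (4, 3, 'eggs');
""")


def orders_n_plus_1(conn):
    """N+1: one query for the customers, then one more query per customer."""
    queries = 1
    result = {}
    customers = conn.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    for cust_id, name in customers:
        rows = conn.execute(
            "SELECT item FROM orders WHERE customer_id = ? ORDER BY id", (cust_id,)
        ).fetchall()
        queries += 1
        result[name] = [item for (item,) in rows]
    return result, queries


def orders_single_join(conn):
    """One trip to the store: a single JOIN returns the same data."""
    result = {}
    rows = conn.execute(
        "SELECT c.name, o.item FROM customers c "
        "JOIN orders o ON o.customer_id = c.id ORDER BY c.id, o.id"
    )
    for name, item in rows:
        result.setdefault(name, []).append(item)
    return result, 1


slow, slow_queries = orders_n_plus_1(conn)
fast, fast_queries = orders_single_join(conn)
print(slow == fast, slow_queries, fast_queries)  # True 4 1
```

With three customers the difference is 4 queries versus 1. With 10,000 customers it is 10,001 round trips versus 1, and against a database on another machine each round trip pays the network latency again.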

 

#visualization – 80-ft wide visualization display driven by Space-Time Insight’s analytics

#etc

 

  • Top 3 names as password – Maggie, Michael, Jennifer

  • 14% of passwords are purely numeric

  • Most popular keyboard pattern password is – drum roll – qwerty


Is your app just slow, or are you killing people?

Technology Review had this great article, ‘The Slow-Motion Internet’. It is written from Google’s point of view, which is helpful since they have gotten the performance part right.

 

 

 

 

 

Key Insights –

 

 

 

  1. Demand for perceived performance is getting steeper. People could tolerate 8 sec for a page to load in 2000, but they would leave after 3 sec in 2009. In 2012, they would probably leave after 1 sec. Apart from organic improvement in performance, ‘minimized mobile interfaces’ have raised expectations.

 

 

 

  2. Mobile requires us to up the game 100x – 50% of customers want NO performance difference between mobile and web sites. Now, it may sound like saying “I expect the plane to fly at the same speed on the ground and at 30K feet”, but the customer is always right, so we have to deliver.

    (Roughly) Average mobile bandwidth is about 20% as fast as typical broadband; mobile processors are about 50% as powerful; and the screen is about 10% as big as that of a non-mobile device.

    This requires an app to be 100x (5*2*10) more efficient on mobile than on the web.

    Not all of the 100x will come from “squeezing real estate” or minimization. There will always be stuff running in the “cloud”, and that will have to contribute significantly as well.

  3. Page’s Law – Software gets slower faster than hardware gets faster, i.e., software follows the reverse of Moore’s Law: doing nothing will make software 2x slower every 18 months. It is an arms race.

  4. To some degree, slowness is like killing people (Google Philosophy of Performance) – Humans live, on average, about 2B seconds. If we’re serving just 0.5B transactions a day 0.1 sec slower than expected, that equals wasting about 9 full lifetimes a year.

  5. Browsing web pages “should be like changing channel on the TV”
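The two back-of-the-envelope figures in this list (the 100x mobile factor and the lifetimes wasted per year) can be checked in a few lines. This is only a sketch of the arithmetic using the rough numbers quoted above. Multiplying the three handicaps together treats them as independent, which is the post's simplification rather than a measured result, and the variable names are illustrative.

```python
# Mobile handicap: each ratio is "mobile as a fraction of desktop",
# using the rough figures quoted in the list above.
bandwidth_ratio = 0.20   # mobile bandwidth is ~20% of broadband
cpu_ratio = 0.50         # mobile processor is ~50% as powerful
screen_ratio = 0.10      # mobile screen is ~10% as big

# Treating the three handicaps as independent multipliers: 5 * 2 * 10.
efficiency_factor = (1 / bandwidth_ratio) * (1 / cpu_ratio) * (1 / screen_ratio)
print(f"required efficiency gain: {efficiency_factor:.0f}x")

# Lifetimes lost to slowness.
LIFETIME_SECONDS = 2e9          # average human life, in seconds
transactions_per_day = 0.5e9
delay_seconds = 0.1             # extra wait per transaction

wasted_seconds_per_year = transactions_per_day * delay_seconds * 365
lifetimes_per_year = wasted_seconds_per_year / LIFETIME_SECONDS
print(f"lifetimes wasted per year: {lifetimes_per_year:.1f}")
```

The second figure works out to roughly 9.1 lifetimes, which matches the "9 full lifetimes a year" claim.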

 

 

 

Data vs. Context – James Cameron and David Ogilvy Way

In this series of posts, I will try to illustrate what I have learned in the data world.

  1. Don’t just rely on data. Powerful context dwarfs data. In fact, with a good context, data serves as subtext at best and as a distraction at worst.

    An old, blind man is said to have begged outside the Ogilvy & Mather office. On an unusually sunny and bright day, David Ogilvy noticed the old man standing with a sign: “I am blind. Please help”. Ogilvy stopped by, took the sign and added a few words: “It’s spring out there and I am blind”. The poor man had collected a huge sum by the evening.

  2. Raw data is the cheapest commodity. The world produces more bits and bytes a day than the total number of people who have ever lived. Thus, data by default is useless. If raw data is your only weapon to convince, try to tell a better story or provide a meaningful context.

    After the ship sinks in ‘Titanic’, James Cameron opted to show a starry sky. His rationale was perhaps to introduce enough fuzzy light for viewers. Cameron researched some really fine details and nailed them accurately, e.g., only three of the four funnels were functional on the actual ship, and the director correctly showed only three stacks of smoke in the movie. However, his team got the entire sky wrong!

    The constellation shown was wrong to the point of being silly. The left side of it was the mirror image of the stars on the right side. The sky has never looked like that from any point on earth in recorded history.

    A scientist got an opportunity to chat with Cameron and complained about this detail. Cameron thanked the scientist for noticing and added, sarcastically, that had he gotten the night sky “right”, the movie might have grossed another $200M. Who knows?

To be continued.

State of Technology #17

 #at_other_places –


#architecture – 
Google+ technical lead (ex-CTO, Plaxo) explains the underlying architecture –

“We use Java servlets for our server code and JavaScript for the browser-side of the UI, largely built with the (open-source) Closure framework, including Closure’s JavaScript compiler and template system.
Our backends are built mostly on top of BigTable and Colossus/GFS, and we use a lot of other common Google technologies such as MapReduce”


#code – 
A handy cheat-sheet on Time – from time zones in code to how to use ntpd to change system time

#design – Lesser known “cool” features of HTML5 –

  • how to use speech input;
  • generating pseudorandom numbers;
  • capturing performance data on timing/navigation/memory;
  • determining whether your app is visible, etc.

#essay – Libraries vs. Frameworks –

Libraries are useful collections of code that you can call to do bits of work. In the case of a web app, there are things that are common to pretty much all of them: Receiving HTTP connections, URL routing, database connections, HTML generation or constructing SQL statements from fragments. There are libraries that do all of these for you.
Frameworks aim to solve all your problems for you in one fell swoop. They collect these libraries together in one big package and you generally have very little choice in how your problems are solved.

 


#mobile – 
Beautiful, intuitive and easy on eyes – Nokia’s N9 UX Guidelines


#saas – Want to know the underlying technology of Facebook? SalesForce? Now, just type in the name and you get instant, live analysis in the browser. Brilliant underthesite.com

#social – How some people spend $75,000 on Zynga games, and whether it helps the IPO


#tool – One of the top 5 tools used by engineers ever, PuTTY, released a new version (0.61) after four years of development.

#etc 

 

 

#parting_thought – “If we can’t win on quality, we shouldn’t win at all.” – Larry Page 

State of Data #57

#analysis – ‘Business of Big Data’ – how venture fund analysts look at the taxonomy of Big Data.

 

#architecture – Take the politics away for a while; this “garbled” tweet analysis is possibly the best UTF-8 / encoding tutorial ever.
Why didn’t I just say “The software read in a UTF-8 encoded JSON stream of tweets and displayed it with an ANSI Windows Code Page 1252.” Because that wouldn’t be nearly as fun.
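The failure described in that quote is easy to reproduce. Here is a minimal sketch in Python: text is encoded as UTF-8 and the resulting bytes are then wrongly decoded as Windows code page 1252. The sample string is made up for illustration and is not from the tweets in the article.

```python
original = "it’s a naïve café"

# What the sender did: encode the text as UTF-8.
utf8_bytes = original.encode("utf-8")

# What the display software did: interpret those bytes as Windows-1252.
# Each multi-byte UTF-8 sequence turns into two or three unrelated characters.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # itâ€™s a naÃ¯ve cafÃ©

# As long as no bytes were lost, the damage is reversible:
# re-encode with the wrong codec, then decode with the right one.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)  # True
```

The repair step only works when every byte happens to be defined in code page 1252. A few byte values (0x9D, for example, which appears in the UTF-8 encoding of the right curly double quote) are unassigned there, and Python raises `UnicodeDecodeError` on them instead of garbling silently.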

 


#big_data – Love this thought experiment – reproducing YouTube with Oracle-driven architecture would cost ~$0.5B in hardware and software license

#conference –   First MongoDB Meetup in Bay Area, July 19

2011 Joint Statistical Meetings, Miami, July 31-Aug 3 – heavy emphasis on using R with Predictive Analysis 


#DBMS – Expert Oracle GoldenGate (book) is now available in Safari. GoldenGate could be used either for DR or heterogeneous data integration with/without transformation


#learning –
What is the range of Hadoop compression factors? “6-10X compression is common for “curated” Hadoop data”.

For low-value machine generated data, “lot of it would be repetitive “I’m fine; nothing to report” kinds of events”.

Also reverse-engineered: Yahoo’s recent “standard Hadoop server” config –

  • 8-12 cores
  • 48 gigabytes of RAM
  • 12 disks of 2 or 3 TB each
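Putting the compression factor and the server config together gives a rough sense of scale. This is only a sketch of the arithmetic under loud simplifying assumptions: it ignores HDFS replication, temporary space and the operating system, none of which are quantified in the text above, so the real usable capacity would be lower. The variable names are illustrative.

```python
disks_per_server = 12
disk_tb_range = (2, 3)        # TB per disk, from the config above
compression_range = (6, 10)   # "6-10X" quoted for curated data

# Raw disk capacity per server.
raw_low = disks_per_server * disk_tb_range[0]
raw_high = disks_per_server * disk_tb_range[1]

# Uncompressed ("logical") data that would fit if everything compressed
# at the quoted ratios and nothing else used the disks (an assumption).
logical_low = raw_low * compression_range[0]
logical_high = raw_high * compression_range[1]

print(f"raw disk per server: {raw_low}-{raw_high} TB")
print(f"uncompressed data that could fit: {logical_low}-{logical_high} TB")
```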

 

#visualization – Superb analysis of Slopegraphs, Edward Tufte’s less popular idea. [ed. It probably did not catch on because it is like reading a book while skiing – lots of jarring vertical eye movements to read text for readers of horizontal scripts]

 

#etc

 

  • Skiing Data Eye Candy – Serious skiers can see data on speed, vertical rate of descent, etc. in the corner of these goggles. Data can later be uploaded to a computer.
     
  • No workload increase in a decade – “If you add up all the hours worked in the economy in June 2011 they are equal to all the hours worked in February of 1999”. Interesting data on supply-side economy.
     
  • Watch out for correlation attack – What is the relation between Wimbledon and the washing machine repair business?
     
  • Exa-iting – A single telescope aims to generate more data a day in 2020 than the entire internet generates today – an Exabyte a day. But the universe has only about 4% actual ‘matter’, so the data should compress well 😉