State of Technology Last Week – #4

#at_other_places –

  • Is it a good bargain? CamelCamelCamel tracks Amazon product prices for you
  • NY Times’ Paywall – $40M to build, 4 lines of JavaScript to break?!
  • CodeSchool launches; cheesy, but it aims to be the hands-on center for learning better programming.

#architecture –
Adrian Cockcroft of Netflix – after Amazon EBS was blamed for a Reddit outage – explains Elastic Block Store and its difference from EC2.

#code –
Speech-to-text transcription on Chrome using HTML5 speech-input API from Google

#design –
‘If I had more time, I would write a shorter letter’ — 40 great examples of minimalist photography

#essay –
Ben Horowitz writes about “Titles and Promotions” in the context of technology people, and how Andreessen and Zuckerberg try to fix the ‘problem’ from two orthogonal perspectives

#mobile – A great overview of algorithms to extract text from HTML documents (like Instapaper and Readability)

#saas –
Ever wondered what tools new startups use? Here’s a single-page repository of the tools they rely on, from ‘Feedback and FAQ’ to ‘Contractor Management’ to ‘CDNs’.

#social – Influencers – roughly half the tweets consumed are generated by just about 20K elite “producers”. Yahoo research analyzing about 5B tweets shows ‘only about 15% of tweets are received directly from traditional media’, and tweets about video, music and books are the longest lived.

#tool – Snappy is Google’s Open Source Compression Library emphasizing speed over volume reduction

#tweaks n’ hacks – Some ‘Popular Security Architectures’ – Street Lamp (e.g., HTTPS), Gilded Cage, Barn Door, Blame the Victim – Marc Stiegler from HP Labs offers metaphorical teaching


#parting_thought –
‘There are people who actually like programming. I don’t understand why they like programming.’ – Rasmus Lerdorf, Creator of PHP

State of Data Last Week – #41

#analysis – Pete Warden launches an amazing “Data Science Toolkit” landing page – a “collection of most useful open-source tools and data sets wrapped in REST/JSON interface” that could be used for, say, “extracting main text from a news story”

#architecture –
Tired of “superficiality” around the NoSQL/RDBMS decision? Want to read some solid math to see where each one shines and how together they can evolve into something far more powerful? ACM publishes what this aggregator feels is the “Paper of the Year” – “A co-relational Model of Data for Large Shared Data Banks”

#big_data – Google UK’s new quarterly online magazine dedicates its inaugural issue to data – contributors include Hans Rosling, Hal Varian. “We used to be data poor, now the problem is data obesity”

#DBMS – A presentation (PDF) on how Yelp uses MySQL and the InnoDB engine – even though the 102-slide “deep technical” presentation starts with “We are not really MySQL or InnoDB experts”, this is as good as it gets.

#learning – ThinkStats:
An introduction to Probability & Statistics for (Python) Programmers

#visualization –
A real-time visualization of Firefox 4 downloads


State of Technology Last Week – #3

#at_other_places –

#architecture –
Lessons learned building a 4096-core Cloud HPC Supercomputer for $418/hr


#code – On Pi Day – estimate pi in Python in ~10 lines of code

#design – ‘HTML5 – Moving from Hacks to Solutions’ – compelling metaphor: HTML5 is advanced because it has a bigger “vocabulary”

#mobile – Ever bumped your iPhone against another to exchange information, as in the LinkedIn iPhone app? A great introduction to near-field communication on Ars Technica is worth reading.
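
The Pi Day item under #code above is easy to try at home. A minimal sketch, assuming the classic Monte Carlo approach (not necessarily the method the linked post uses; the function name `estimate_pi` and the fixed seed are illustrative choices): throw random points into the unit square and count how many land inside the quarter circle.

```python
import random


def estimate_pi(samples: int, seed: int = 314) -> float:
    """Estimate pi from the share of random points inside a quarter circle."""
    rng = random.Random(seed)  # fixed seed (an assumption) keeps runs repeatable
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls within radius 1 of the origin
            inside += 1
    # Quarter-circle area / square area = pi / 4
    return 4.0 * inside / samples


if __name__ == "__main__":
    print(estimate_pi(1_000_000))
```

More samples tighten the estimate only slowly: the error shrinks roughly with the square root of the sample count.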

#saas –
‘Authentication Design Best Practices’ with OpenID, OAuth, Facebook Connect – SXSW talk from last week

#thoughts – Does ‘Twitterization’ make our brain ‘freeze’ when making decisions?

#tool – Coolest Chrome extension that may spare you annoying ads and save precious CPU cycles – FlashBlock – blocks all Flash on websites, unless you click on it.

#tweaks n’ hacks – Junkyard Jumbotron from MIT Lab – “take a bunch of random displays and instantly stitch them together in a large, virtual display”



#parting_thought – ‘I spent a year at home with my wife and four young children. But all I learned about work-life balance from that year was that I found it quite easy to balance work and life when I didn’t have any work.’ – from ‘How to Make Work-Life Balance Work’


State of Data Last Week – #40

#analysis – How to use statistics to find out if the art you purchased on eBay is fake or real (PDF; from Significance Magazine, March 2011)

#finance –
Wonga and Klarna are using unconventional data and algorithms to provide financial services. E.g., “Consumers who shop online at 3am may get rejected by Klarna. Having a mobile phone with a contract helps to get money from Wonga”

#architecture – ‘HOWTO for organizations to open up data’ covers everything from ‘why open data’ to the legalities and ‘technical openness’ (e.g., bulk APIs). The site is not yet fully developed, and some sections (e.g., FAQ) may lack content.

#big_data –
Google’s Ads Preferences believes I’m a guy interested in politics, Asian food, perfume, celebrity gossip, animated movies and crime but who doesn’t care about “books & literature” or “people & society.” Joel Stein’s latest Time Cover Story – “Data Mining: How Companies Now Know Everything about You”

Ever had those SQL queries where two columns always appear together as a filter (e.g., TRANSACTION_TYPE and ZIP)? And those two columns are skewed (think of California ZIP codes vs. Rhode Island’s). A very cool “extended statistics” collection feature in Oracle now gives the optimizer more smarts to evaluate such filters.
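
Why would an optimizer need the extra help? A toy sketch, assuming an optimizer that multiplies per-column selectivities as if the two columns were independent. This is not Oracle's implementation; the sample rows, ZIP codes and function names are made up for illustration.

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (transaction_type, zip_code), hypothetical sample data

# Skewed and correlated: refunds are rare overall but dominate the small ZIP.
ROWS: List[Row] = (
    [("SALE", "94105")] * 70
    + [("REFUND", "94105")] * 10
    + [("SALE", "02903")] * 2
    + [("REFUND", "02903")] * 18
)


def selectivity(rows: List[Row], column: int, value: str) -> float:
    """Fraction of rows whose given column equals value."""
    return sum(1 for r in rows if r[column] == value) / len(rows)


def independent_estimate(rows: List[Row], txn_type: str, zip_code: str) -> float:
    """Row-count guess that treats the two filters as unrelated."""
    return len(rows) * selectivity(rows, 0, txn_type) * selectivity(rows, 1, zip_code)


def actual_count(rows: List[Row], txn_type: str, zip_code: str) -> int:
    """True number of rows matching both filters together."""
    return sum(1 for r in rows if r == (txn_type, zip_code))


if __name__ == "__main__":
    print("estimated:", independent_estimate(ROWS, "REFUND", "02903"))
    print("actual:   ", actual_count(ROWS, "REFUND", "02903"))
```

On this sample the independence guess is about 5.6 rows while 18 rows actually match. Statistics gathered on the column pair as a group can close that kind of gap.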

#learning –
For those using GoldenGate as a data replication tool, the “Oracle GoldenGate 11g Implementer’s Guide” book was published this week. At first glance, the coverage looks quite extensive. The ebook version can be purchased directly from the publisher.

#visualization –
How ‘The New York Times uses R for Data Visualization’ – a 60 minute presentation from Amanda Cox


  • Revealing emails – Gmailers are thinner? Hunch – a recommendation engine – analyzes data
  • Call for papers opened for Oracle OpenWorld 2011
  • Numbers revealed on Twitter’s 5th birthday – It takes a week to create 1B tweets; 6,939 tweets per sec is the maximum throughput so far (New Year’s 2011)
  • Separate Hype from Reality with solid data – ‘One in five divorces linked to Facebook’ – except, the originator of the idea acknowledges ‘this may not be representative of all divorces’
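
The Twitter numbers in the list above can be put side by side with a little arithmetic. A minimal sketch, assuming "a week to create 1B tweets" means exactly 1,000,000,000 tweets in 7 days (the constant and function names are illustrative):

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def average_rate(total_tweets: int, seconds: int) -> float:
    """Average tweets per second over the given time span."""
    return total_tweets / seconds


AVERAGE_TPS = average_rate(1_000_000_000, SECONDS_PER_WEEK)
PEAK_TPS = 6_939  # the New Year's 2011 maximum quoted above
PEAK_TO_AVERAGE = PEAK_TPS / AVERAGE_TPS

if __name__ == "__main__":
    print(f"average: {AVERAGE_TPS:.0f} tweets/sec")
    print(f"peak is {PEAK_TO_AVERAGE:.1f}x the average")
```

Under that assumption the average works out to roughly 1,650 tweets per second, so the record peak is a bit over four times the everyday load.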

State of Data Last Week – #39

#analysis – Winner of the 2010 Turing Award – Leslie Valiant, for Machine Learning – “introduced the ‘probably approximately correct’ (PAC) model of machine learning that has helped the field of computational learning theory grow, and the concept of holographic algorithms. His earlier work in automata theory includes an algorithm for context-free parsing, which is (as of 2010) still the asymptotically fastest known” (from Wikipedia)

#api – Scraper is a Chrome extension that works well for ‘getting data out of web pages and into spreadsheets’

#architecture – Why the legacy BI ‘Best Practice Architecture’ is rapidly becoming obsolete – a lot to do with innovations in hardware, appliances and data storage over the last decade or so. E.g., the price of 1GB of storage was $300K in 1981, $0.10 in 2010. (Metaphor/picture on ‘column-based’ storage is from the first article)

#big_data (and a little boy) –
‘MIT Scientist Deb Roy captures 90,000 Hours of Video of his son’s first words, graphs it’ – 200 TB of data was captured to track ‘the emergence and refinement of specific words in Roy’s son’s vocabulary’. Deb’s TED 2011 talk is now available

#DBMS – Jay Turner came out with a nifty way to easily capture SQL statements that limit scalability in transactional Oracle databases by being parsed over and over

#learning –
FREE – ‘Linked Data: Evolving the Web into a Global Data Space’, a book authored by Tom Heath. However, purchase is recommended to encourage more such publications.

#thoughtfulness –
Scott Adams continues his boundaryless thinking on a ‘future with no data privacy’ – “Privacy has its benefits, but you’re giving up a lot of cool apps”

#visualization – enables ‘to easily compare the size of all sorts of artefacts, like objects (e.g. iPad vs. iPad 2), persons (e.g. Obama vs. Sarkozy)’



State of Technology Last Week – #2

#at_other_places –

#architecture –
‘The MIT 150 exhibition’ – ‘unique exhibition made up of stories and objects that members of the MIT community helped to select, collect and make available to the public, many for the first time’ – ‘Virus Battery’ is cool, but ‘MIT-Harvard Merger Petition’ was revolutionary.

#code – ‘New Interview Questions for Senior Software Engineers’ – even though the author clarifies – “I think we all agree (or at least we should) that if you go into an interview tomorrow and you look across the table and the interviewer has simply printed out this list and is reading from it, that you should excuse yourself and run”, the list is well compiled and the comments are entertaining.

#design –
Why ‘Angry Birds’ is so popular – a ‘cognitive teardown of the user experience’ – amazing analysis ranging from ‘short-term memory management’ to the ‘faster is better’ paradigm that resulted in 200 million minutes a day of engagement

#longread –
One of the rare profiles of Jack Dorsey, in Vanity Fair – apparently he got the Twitter idea from the ‘haiku of Taxicab communication – the way drivers and dispatchers succinctly convey locations by radio’

#mobile – Mobile UI Patterns – from Check-in screens to Sign-up flows to Splash Screens

#saas –
Zero to 1M users – What lessons did Dropbox and Xobni learn?


#parting_thought –
‘Maybe you won’t be the next Donald Knuth, but that’s not what it takes, all you need to be is a little bit better than you were yesterday and to keep doing that for a long time.’ – from ‘The Need to Code’


State of Data Last Week – #38

#analysis – 133M blog posts, 231M social media feeds. 3TB data set collected between Jan 13 and Feb 14, 2011. Data Challenge – culminating in ICWSM @ Barcelona this summer – ‘locate significant posts in the collection which are relevant to the revolutions in Tunisia and Egypt’.

#api – Google App Engine to support SQL

#architecture – StackOverflow Architecture lowdown – how it deals with 800 HTTP requests/second. “Some raw SQL” in data access layer.

#big_data – MongoDB – apparently works great when the entire data set fits in memory; otherwise it could be ‘up to 17 sec for 30,000 reads’

Big Data is Big Business – Teradata buys Aster Data for $263M

#learning – Why ‘most benchmarks are seriously broken’ because ‘complexity and performance model quality are inversely related’  – a great talk on ‘Performance Anxiety’ at Devoxx 2010.

#visualization – RStudio – new IDE for R – got rave reviews and many endorsements from the community