State of Technology #22

#at_other_places – 

·  2011 Gartner “Hype Cycle” for technology is out -

IN – Group Buying; Mobile Robots; Video Analytics for customer service; Image Recognition; Big Data
OUT – Cloud Platform; Hosted Desktop; E-readers; In-memory Database


· Bootstrap, a toolkit from Twitter for quickly building apps and pages

· jQuery overtakes Flash on the most popular websites

#architecture – How to “close” candidates: persuade them to join your team once interviewed –

1) Figure out what the candidate’s dream is.
2) Determine if job and candidate are the right fit.
3) Communicate your own passion.

#code – Java Persistence API – A Quick Intro


#design – Steve Jobs’s Best Quotes – compiled by the Wall Street Journal

“Design is a funny word. Some people think design means how it looks. But of course, if you dig deeper, it’s really how it works. The design of the Mac wasn’t what it looked like, although that was part of it. Primarily, it was how it worked. To design something really well, you have to get it. You have to really grok what it’s all about. It takes a passionate commitment to really thoroughly understand something, chew it up, not just quickly swallow it. Most people don’t take the time to do that.”

#essay – “Why Software Is Eating the World” – from Marc Andreessen


“First of all, every new company today is being built in the face of massive economic headwinds, making the challenge far greater than it was in the relatively benign ’90s. The good news about building a company during times like this is that the companies that do succeed are going to be extremely strong and resilient. And when the economy finally stabilizes, look out—the best of the new companies will grow even faster.

#mobile – “8 Best Practices for Deploying a Top Mobile App” – it is (mostly) about the first two weeks!



#saas – How to use UTF-8 throughout your web stack

UTF-8 is extremely sane. Well, as sane as an encoding can be that features backwards-compatibility with ASCII. Everything you care about supports UTF-8. Trust me: you want it everywhere.
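A quick way to see the ASCII backwards-compatibility is to encode a few strings and look at the bytes. This is only an illustrative sketch; the helper name `utf8_lengths` and the sample strings are made up for the example.

```python
# ASCII-only text produces exactly the same bytes under ASCII and UTF-8.
ascii_text = "Hello, web stack!"
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")


def utf8_lengths(text):
    """Return (character, number of UTF-8 bytes) for each character."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


# Characters outside ASCII take 2 to 4 bytes each in UTF-8.
sample = utf8_lengths("a\u00e9\u20ac\U0001F600")  # a, e-acute, euro sign, emoji
```

Because every byte of a multi-byte sequence has its high bit set, ASCII-only tools can pass UTF-8 text through without misreading it.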


#social – Innovation in developer/customer support – Facebook hooks up with Stack Overflow

#tool – A great tutorial / write-up on ‘How Browsers Work’ (from HTML5Rocks) –



#tweaks n’ hacks – SSH can do THAT? Very useful productivity tips for working with remote servers.

#etc


#parting_thought
“It’s more fun to be a pirate than to join the navy.” – Steve Jobs

State of Data #63

#analysis – A 46-page Internet Marketing Strategy “briefing looking at customer centricity, channel diversification, data, social media and content strategy. This is their usual high grade quality and worth a look”
 

Disdain Data Diving – “Today’s Big Data heavy-lifting machines and software systems were built back in the day when millions of customers made millions of phone calls and each one had to be captured, stored, and found in a heartbeat. Banking and credit card transactions by the billions had to be put into safekeeping somewhere they could be added up, averaged, and recalled if need be.”

#architecture – MongoDB loves BSON (Binary JSON) for Data Exchange –

“Fast scan-ability. For very large JSON documents, scanning can be slow. To skip a nested document or array we have to scan through the intervening field completely. In addition as we go we must count nestings of braces, brackets, and quotation marks. In BSON, the size of these elements is at the beginning of the field’s value, which makes skipping an element easy.”
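The size-prefix idea in the quote can be sketched in a few lines of Python. This uses a simplified, BSON-like layout (a little-endian int32 total size, a payload, then a terminating zero byte); it is not a full BSON parser, and `make_doc` and `skip_doc` are hypothetical helper names.

```python
import struct


def make_doc(payload: bytes) -> bytes:
    """Wrap payload as: int32 total size (little-endian) + payload + 0x00."""
    total = 4 + len(payload) + 1  # the size field counts itself and the terminator
    return struct.pack("<i", total) + payload + b"\x00"


def skip_doc(buf: bytes, offset: int) -> int:
    """Return the offset just past the document that starts at `offset`."""
    (size,) = struct.unpack_from("<i", buf, offset)
    return offset + size  # one read and one addition, however large the payload


nested = make_doc(b"x" * 1000)   # stand-in for a large nested document
buf = nested + b"NEXT"           # whatever follows the nested document
end = skip_doc(buf, 0)           # jump straight to b"NEXT"
```

With text JSON the scanner would have to walk all 1,000 payload bytes while counting braces and quotes; here it reads four bytes and jumps.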


#big_pig_data – Angry Birds is played 1.4B minutes a week. Now, they have tied up with a predictive analytics solution provider to help forecast pig smashing abilities.

#Data_Science – Multiple packages in R to read online datasets


#DBMS – A phenomenal paper from NoCOUG on ‘NFS Tuning for Oracle’ (PDF) by Kyle Hailey.

 #idea – Facebook engineer suggests reducing disk RPM to reduce data center power cost

 

Item                                             Value
Normal Speed                                     7200 RPM
Reduced Speed                                    3600 RPM
State Transition (triggered by an OS command)    15 seconds
Normal Idle Power                                7W
Reduced Speed Idle Power                         3W
Normal Bandwidth                                 ~100 MB/s
Reduced Speed Bandwidth                          >10 MB/s
Normal Latency                                   ~10 ms
Reduced Speed Latency                            <100 ms

 

 

 #learning – What every Data Programmer Needs to know about Disks (PPT; from OSCON 2011) – very highly recommended especially for ‘Why EC2 I/O is Slow and Unpredictable’ –

Newer Intel chips have the northbridge controller on-die. Southbridge bandwidth is usually <= 10GB/sec, and you are sharing this with other customers’ network and disk I/O. That, and you may be sharing drive spindles.

 

 

#visualization – Stanford’s ‘Republic of Letters’ visualization – “on database of thousands of letters exchanged between prominent intellectuals in the 17th and 18th centuries” – is built in HTML5. It has connections, volume and flow views of over 55,000 letters exchanged among 6,400 correspondents.

 


#etc
 

State of Technology #21

 #at_other_places –

o         Dear Photograph;
o         Proust;
o         Join.me (free ‘webex’ with no registration);
o         FreeRice (You learn; hungry people get to eat)

#architecture – How to retire a great interview problem – the “word break” problem, described as –


Given an input string and a dictionary of words, segment the input string into a space-separated sequence of dictionary words if possible. For example, if the input string is “applepie” and dictionary contains a standard set of English words, then we would return the string “apple pie” as output.
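For readers who want to try it, here is one common way to solve the problem with dynamic programming. It is a minimal sketch, not necessarily the solution the linked article arrives at; the function name `word_break` and the tiny dictionary are made up for illustration.

```python
def word_break(s, dictionary):
    """Return s split into dictionary words, or None if no split exists."""
    # best[i] holds one valid segmentation of s[:i] as a list of words,
    # or None if the prefix s[:i] cannot be segmented.
    best = [None] * (len(s) + 1)
    best[0] = []  # the empty prefix is trivially segmented
    for i in range(1, len(s) + 1):
        for j in range(i):
            # Extend a segmentable prefix s[:j] with the word s[j:i].
            if best[j] is not None and s[j:i] in dictionary:
                best[i] = best[j] + [s[j:i]]
                break
    return " ".join(best[-1]) if best[-1] is not None else None


words = {"app", "apple", "pie"}
result = word_break("applepie", words)
```

Remembering which prefixes can be segmented avoids the exponential blow-up of naive backtracking; the nested loops run in quadratic time, not counting the cost of slicing.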

#code – Learning JavaScript on the fly with Codecademy is a really, really effective and smart way to learn. No registration required.


#design – The man who designed ‘Like’

Some of Facebook’s look was inspired by the videogame look of the 1980s. “Back then, the aesthetic had a very limited color palette relative to videogames today. Everything is a bit blocky, without smooth surfaces,” he said. Yet, “there is a level of artistry in videogames that is unparalleled.”

#essay – The 8 things Susan Wojcicki learned about innovation as employee #16 – among other principles – “Never_fail_to_fail” and “Spark_with_imagination, fuel_with_data” –


“.. technology for driverless cars to reduce the number of lives lost to roadside accidents each year. These cars, still in development, have logged 140,000 hands-free miles driving down San Francisco’s famously twisty Lombard Street, across the Golden Gate Bridge and up the Pacific Coast Highway without a single accident.”

P.S. No longer accident-free!


#mobile – iPhone component cost is $178 – Samsung alone gets about $45 of it; Apple’s slice is $378 

#saas – A sign of things to come in SaaS delivery – Firefox removes version number

#social – Drug companies lose special protection…on Facebook 


#tool – Step-by-step guide to find JavaScript memory leaks; including actual memory leak problem analysis from Facebook. 


#tweaks n’ hacks – Data Sandals probably won’t rock the fashion scene any time soon. But…

 


#etc 


#parting_thought – “When you’re young, you look at television and think, There’s a conspiracy. The networks have conspired to dumb us down. But when you get a little older, you realize that’s not true. The networks are in business to give people exactly what they want. That’s a far more depressing thought. Conspiracy is optimistic! You can shoot the bastards! We can have a revolution! But the networks are really in business to give people what they want. It’s the truth.”

— Steve Jobs



State of Data #62

#analysis – Hotmail product usage data analysis and how it influences the design –

“three types based on their behavior—Filers, Pilers, and Deleters..
Deleters generally delete email after it arrives. Deleters receive an average of 211 email messages each week and end up deleting almost 80% of them. The mantra for these people is, “My kitchen has to be clean before I start cooking.”

Filers put nearly half of their email (44%) into folders immediately after it arrives.

Pilers receive the least amount of email each week (174 messages). But that means they still receive an average of 9,048 email messages per year. Because most of those messages (57%) never leave the Piler’s inbox, their email starts to pile up.”

 


#analysis – Google has started certification on Analytics, with detailed “Analytics IQ Lessons” culminating in an exam



#big_data – The whole controversy around KissMetrics’ data collection practices, and their official response to the allegations

#conference – ACM Data Mining Camp, October 2011 – “local, cheap, and high-quality learning opportunity”

#Data_Science – Verifying Benford’s Law on Tweets – it works!
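For reference, Benford’s Law says a leading digit d shows up with probability log10(1 + 1/d). The sketch below compares that expectation with the observed leading digits of any list of positive integers. The linked post’s actual data and method are not reproduced here; the function names and the toy input are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1..9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(n):
    """First decimal digit of a non-zero integer."""
    return int(str(abs(n))[0])


def observed_shares(numbers):
    """Observed share of each leading digit among the given integers."""
    counts = Counter(leading_digit(n) for n in numbers)
    return {d: counts[d] / len(numbers) for d in sorted(counts)}


expected = benford_expected()
toy = observed_shares([1, 10, 19, 2, 25])  # toy data, not real tweet counts
```

On a real dataset you would compare `observed_shares(...)` against `expected` digit by digit; roughly 30% of values should start with 1 if the law holds.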

#DBMS – Most Big Data engineers mention ‘performance’ as the #1 priority. The ‘3-minute test: What do you know about SQL Performance’ lets you gauge your strengths: choose among MySQL, Oracle, PostgreSQL, and SQL Server, and hammer away.

 


#idea – Are we becoming too analytical? A serious piece of introspection on the possible ‘bandwagon effect’ of ‘big data’ and ‘analytics’ –

“But the biggest reason I believe these two products have not taken off is their reliance on the belief that simply giving people their data and letting them analyze it is the way to improve behavior (both for health and for the environment)

One of the first things we teach in introductory human-computer interaction (HCI) is that “you are not your user” and “beware designer ego bias.” Google seemed to have fallen into this well-known trap in their design and testing for Google PowerMeter (and perhaps Google Health).”

 


#learning – Stanford University courses on Data – FREE for Fall 2011; each requires about 10 hrs of work a week; classes begin on October 10 –

 

#math/stat – How likely is it for a telephone number (w/o area code) to be prime? About 6%. With area code it may be somewhere around 4%.
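Both figures are consistent with the prime number theorem, which puts the density of primes near N at about 1/ln(N). Here is a back-of-the-envelope check, assuming 7-digit local numbers and 10-digit numbers with area code (the North American format, which the post does not state explicitly); `prime_density` is a made-up helper name.

```python
import math


def prime_density(digits):
    """Approximate chance that a number of about `digits` digits is prime."""
    return 1 / math.log(10 ** digits)  # prime number theorem: ~1 / ln(N)


local = prime_density(7)        # number without area code
with_area = prime_density(10)   # number with area code
```

This is a rough estimate only: it treats phone numbers as uniformly spread integers of that size and ignores numbering-plan restrictions.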

 

#visualization – Dichotomy or Difference? Statistical Graphics vs. Information Visualization – two crisp articles in the most recent ‘Statistical Computing and Graphics Newsletter’ (PDF) discuss it from the POVs of Computer Science and Statistics. The follow-up from Andrew Gelman is interesting too.


#etc

 

 



 

 

State of Technology #20

 

 #at_other_places – 

·         Wolfram released CDF (Computable Document Format) based on ever popular Mathematica

·         Browser is the new tablet – Amazon’s Kindle-on-Cloud reads GREAT

·         Not AAA rating, it’s AAPL

#architecture – Is ‘Open Office Layout’ bad for brain and good for bugs?
Very interesting debate forming out there -

Jordon quoted Joel Spolsky when he mentioned that open-office layouts and the similar concept of war rooms are the places where bugs are bred. According to him, in such settings, no-one can concentrate for long due to constant interruptions and distractions.

#code – Absurdity of some software patents –

         

  1. ‘someone’ patented Linked list
  2. and patented ‘Electronic shipping notifications’ too


#design – The best ever education on password strength


#essay – GPS is changing our brains faster than we think –

 

“There is an idea popular in technophilia, dating back at least to Marshall McLuhan, that some technologies may be considered an “extension” of our own minds or selves. Scott Adams, sounding not unlike the drones who spin corporate techno-jargon in his comic strip Dilbert, has said just such a thing about GPS devices, claiming that they are part of our “exobrain” (and that this means that “technically, you’re already a cyborg”). It seems a rosy picture with a rosy appeal: GPS gives us additional abilities in physical space; therefore it extends our abilities into space; therefore it is an extension of us, or of our minds or brains. More precisely, as Adams puts it, “your regular brain uses your exobrain to outsource part of its memory, and perform other functions.””


#mobile – Not disk capacity, not RAM capacity, and of course not the number of transistors – battery charging time has improved the LEAST (PDF) over the last few decades. An iPhone can only accept ~2.5 watts while charging, while a human generates 100 watts running on a treadmill – can we hook the iPhone up to ourselves and charge it?

 

#social – Analytics to replace counseling?! 

 

 

 
#tool – ‘Electronic tattoo’ – otherwise known as ‘Epidermal Electronics’ – could potentially be dangerous for privacy. Right now, it’s good for science.

 

#tweaks n’ hacks – Useful ‘how-to’ algorithms to enhance images – how to ‘beautify’ a face; change B&W to Color


#etc

  • Why are restaurant websites so awful?

    Restaurant sites are the product of restaurant culture. These nightmarish websites were spawned by restaurateurs who mistakenly believe they can control the online world the same way they lord over a restaurant. “In restaurants, the expertise is in the kitchen and in hospitality in general,” says Eng San Kho, a partner at the New York design firm Love and War, which has created several unusually great restaurant sites (more on those in a bit). “People in restaurants have a sense that they want to create an entertainment experience online—that’s why disco music starts, that’s why Flash slideshows open. They think they can still play the host even here online.”

     

#parting_thought – “Some very considerable part of the gestural language of public places, that had once belonged to cigarettes, now belonged to phones” – William Gibson in ‘Zero History’ 

 

 

State of Data #61

#analysis – What does that “Register” button cost you? It cost one e-tailer $300M/year, because the “fastest way to alienate those customers and scare away that free money is to make its owner establish a relationship with you before s/he can make a purchase”.

Consumer:Creator ratio – 1M:1 (50 years ago) to 100:1 (Etsy era) 


#architecture – Very detailed tabular comparison of the top 6 “Cloud Computing” services (PDF) – AWS; GAE; Azure; Force.com; RackSpace and GoGrid

 

#big_data – (Greenplum + SAS) vs. ($5K hardware + R Enterprise) – the latter ran logistic regression on 1 billion records in 75 seconds – “just as fast, and at less than 1% of the hardware cost”


#Data_Science –   Machine Learning on Big Data – Lessons Learned from Google Projects. E.g., how do they render the ‘best guess’ in the following search?

 

#DBMS – Mythbusters: Stored Procedures Edition – agree or disagree, worth a read.

 

 

#learning – The bell curve (or normal distribution) is not just a math thing; it is naturally ubiquitous. Watch out for it in door wear patterns (why would the left door’s wear distribution sit above the right door’s? This editor has a theory. Hint: in which hand would most people carry goods when leaving a store?)

 

#visualization – Ever wonder what the real color of summer would be? Or of Thursday? “using simple algorithms on data originating from subjective human perceptions – system created to find out the colour of anything, by querying and aggregating image data from Flickr”

#etc

  • Would you choose a different number if asked for a ‘favorite number’ than for a ‘random number’? Most people intrinsically like prime numbers. Help uncover the world’s most ‘favorite number’

  • Backup 1: Chaos 0 – Make Data Immortal: a startup claims a DVD form-factor storage medium that “you can dip it in liquid nitrogen and then boiling water without harming it”
  • Backup 1: Chaos 1 – ‘Hard Disk Crusher’ – a ‘new spin on destruction’. The Economist writes – “A baseball bat might have been more liberating, but the hydraulic crusher’s surgical precision nonetheless holds a certain charm.”



Drive or Fly from SFO to LAX – DBMS or noSQL for your Transactional App?

Clarke’s 1st Law – “When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.”

“The NoSQL movement is a lot like the Ron Paul campaign – it consists of people who are dissatisfied with the status quo, whose dissatisfaction has a lot to do with insufficient liberty and/or excessive expenditure, and who otherwise don’t have a whole lot in common with each other.” – Curt Monash

“The computer industry is the only industry that is more fashion-driven than fashion. Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?” – Larry Ellison

Technologists have it really tough. We cannot bury our heads in the sand to avoid the slew of new (often very meaningful) technologies. Neither can we be Don Quixote and embrace change purely for the sake of change.

 

If I had saved $1 for every time I was asked “Should my app move on to noSQL?”, my brutally bruised 401(k) would surely look just about OK now. In the end, the way to decide for or against immediate change is simple and mathematical.

Change if (the cost of change) < (the cost of doing nothing).

Based on that, here is some brief guidance that should, hopefully, help you make an informed judgment when choosing the data management strategy for your particular application. I have made things simpler than they really are to underscore the pattern.

A typical 3-tier transactional system spends about 80% of its total processing time within the database.

Within the database, time is typically divided as follows: 80% of DB time goes to what we could call “connection” overhead – multiple SQL statements fired over JDBC etc.; N+1-type mapping issues; un-optimized OR-patterns etc. The remaining 20% is spread, somewhat evenly, across four categories of latency plus the actual work, irrespective of RDBMS vendor. It is easy to get rid of the “connection” overhead (almost) entirely through hand-coded optimizations or, in extreme cases, by putting things in stored procedures.

Assuming (a) a stored proc runs the SQL, (b) the data is already cached in memory, and (c) the access path has been determined in previous runs, a typical query returning a few records should really take about 40ns to run. But it takes about 1~2 ms to run, and that time breaks down as –

  1. Logging (mainly systemic logging) – 18%
  2. Locking (mostly to ensure “A” of ACID) – 20%
  3. Latching (a low-level locking, to ensure “C”) – 20%
  4. Buffer Management – 35%
  5. Actual Work – 7%


Roughly, if your application is creating 10M invoices a day @ 0.5 sec end-user-latency/invoice, that works out to –

 

  • 5M sec of total latency/day
  • 4M sec of DB latency/day
  • 3.2M sec of latency/day attributed to “connections” (firing more SQLs than needed; bad mapping; unavoidable mapping; denormalization etc)
  • 744,000 sec of “overhead” to pay for (mainly) ACID; vendor features in form of Locking; Logging; Latching and Buffer Management
  • 56,000 sec of *actual DB processing*
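The arithmetic behind these bullets can be reproduced in a few lines. The shares below are the ones stated in this post (80% of latency in the DB, 80% of DB time in “connections”, 7% of the remainder as actual work); the function and key names are made up for illustration.

```python
def latency_breakdown(invoices_per_day, latency_sec,
                      db_share=0.80, connection_share=0.80,
                      actual_work_share=0.07):
    """Split total daily latency (seconds) using the shares from the post."""
    total = invoices_per_day * latency_sec
    db = total * db_share                    # time spent inside the database
    connections = db * connection_share      # "connection" overhead
    remainder = db - connections             # what is left after connections
    actual = remainder * actual_work_share   # real DB processing
    overhead = remainder - actual            # locking, logging, latching, buffers
    return {
        "total": round(total),
        "db": round(db),
        "connections": round(connections),
        "overhead": round(overhead),
        "actual": round(actual),
    }


breakdown = latency_breakdown(10_000_000, 0.5)
```

Changing the shares shows how the picture moves: the overhead here is about 15% of total latency, which is the “typical” figure used in the trade-off discussion.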

Thus, the “trade-off” analysis for handing over to “noSQL” arises if –

  1. The “overhead” to pay is the major component. Say, rather than a typical 15%, the “overhead” is 50%
  2. The reasons for “overhead” no longer apply much (the system can ‘execute and forget’ – no need for locking, logging, multi-user access, security, auditing etc.)
  3. “Connection” (i.e., JDBC, ODBC, Data Transfer, Amount of Code executing) has been taken care of either via
    1. Hand-coded optimizations
    2. Stored-proc like “centralized” modular processing

It is a lot like deciding whether to fly from SFO to Los Angeles or drive. The pure flying time (“actual processing work”) is about 90 min, but (home to SFO airport) + (LAX to hotel) + security could be 3 hrs. A company executive could charter a flight and get rid of the security check-ins etc. (“lock, latch, log”) – but for general purposes we commoners just bear the “overhead”, hoping the security scanners will do their work and make us safer in the end.


 

 
