Drive or Fly from SFO to LAX – DBMS or noSQL for your Transactional App?
August 10, 2011
Clarke’s 1st Law – “When a distinguished but elderly scientist states that something is possible, he is almost certainly right. When he states that something is impossible, he is very probably wrong.”
“The NoSQL movement is a lot like the Ron Paul campaign – it consists of people who are dissatisfied with the status quo, whose dissatisfaction has a lot to do with insufficient liberty and/or excessive expenditure, and who otherwise don’t have a whole lot in common with each other.” – Curt Monash
“The computer industry is the only industry that is more fashion-driven than fashion. Maybe I’m an idiot, but I have no idea what anyone is talking about. What is it? It’s complete gibberish. It’s insane. When is this idiocy going to stop?” – Larry Ellison
Technologists have it real tough. We cannot bury our heads in the sand and ignore the slew of new (often very meaningful) technologies. Neither can we be Don Quixote and embrace change purely for the sake of change.
If I had saved $1 for every time I was asked “Should my app move to noSQL?”, my brutally bruised 401(k) would look just about OK by now. In the end, the easy way to decide for or against immediate change is simple and mathematical.
Change if (the cost of change) < (the cost of doing nothing).
Based on that, here is brief guidance that should, hopefully, help you make an informed judgment in choosing the data management strategy for your particular application. I have made things simpler than they really are in order to underscore the pattern.
A typical 3-tier transactional system spends about 80% of its total processing time within the database.
Within the database, the time typically divides as follows – about 80% is spent on what we could call “connection” overhead: multiple SQLs fired over JDBC etc.; N+1-type mapping issues; un-optimized OR-mapping patterns, and so on. The remaining 20% is roughly equally distributed across four categories of latency, irrespective of the RDBMS vendor. It is easy to (almost) fully get rid of the “connection” overhead by hand-coded optimizations, or – in extreme cases – by moving the logic into stored procedures.
Assuming (a) a stored proc runs the SQL, (b) the data is already cached in memory, and (c) the access path has been determined in previous runs, a typical query returning a few records should really take only about 40ns to run. But it takes about 1~2 ms because of —
- Logging (mainly systemic logging) – 18%
- Locking (mostly to ensure the “I”, isolation, of ACID) – 20%
- Latching (low-level locking that keeps internal structures consistent) – 20%
- Buffer Management – 35%
- Actual Work – 7%
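Applying those percentages to a single query’s elapsed time shows how little of it is “actual work”. A minimal sketch (the dictionary and helper are my own illustration of the article’s numbers, not a benchmark):

```python
# Cost breakdown of a cached, pre-planned OLTP query,
# using the percentages from the text above.
BREAKDOWN = {
    "logging": 0.18,
    "locking": 0.20,
    "latching": 0.20,
    "buffer management": 0.35,
    "actual work": 0.07,
}

def split_latency(total_ms):
    """Split a query's elapsed time across the five categories."""
    return {category: total_ms * share for category, share in BREAKDOWN.items()}

costs = split_latency(1.0)  # a 1 ms query: only ~0.07 ms is actual work
```

For a 1 ms query, roughly 0.93 ms is overhead the engine spends on durability, concurrency, and buffering rather than on answering the question.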
Putting illustrative numbers on it, consider an app that accumulates:
- 5M sec of total latency/day
- 4M sec of DB latency/day
- 3.2M sec of latency/day attributed to “connections” (firing more SQLs than needed; bad mapping; unavoidable mapping; denormalization etc)
- 744,000 sec of “overhead” to pay for (mainly) ACID; vendor features in form of Locking; Logging; Latching and Buffer Management
- 56,000 sec of *actual DB processing*
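The figures above follow from chaining the earlier percentages; a quick sketch of the arithmetic (variable names are mine):

```python
total = 5_000_000              # sec of total latency/day (from the example)
db = total * 0.80              # 80% of processing is in the database
connections = db * 0.80        # 80% of DB time is "connection" overhead
remainder = db - connections   # what the engine itself actually spends
overhead = remainder * 0.93    # locking + logging + latching + buffers
actual = remainder * 0.07      # actual DB work
```

That is 4,000,000 s in the DB, 3,200,000 s of connection overhead, 744,000 s of engine overhead, and just 56,000 s of actual work – matching the list above.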
Thus, the trade-off analysis that favors handing the workload over to “noSQL” arises if –
- The “overhead” to pay is the major component. Say, rather than a typical 15%, the “overhead” is 50%
- The reasons for “overhead” no longer apply much (the system can ‘execute and forget’ – no need to lock, log, multi-user access, security, auditing etc)
- “Connection” overhead (i.e., JDBC, ODBC, data transfer, amount of code executing) has already been taken care of, either via
- Hand-coded optimizations
- Stored-proc like “centralized” modular processing
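The checklist above can be restated as a tiny decision sketch (the function name, parameters, and 50%/15% thresholds all come from the article’s example; the code itself is my own illustration, not a formal methodology):

```python
def worth_moving_to_nosql(overhead_fraction, needs_acid, connections_optimized):
    """Restate the article's trade-off checklist as a boolean.

    overhead_fraction     -- share of DB time paying for lock/log/latch/buffers
                             (~0.15 is typical; the article's trigger is ~0.50)
    needs_acid            -- False only for 'execute and forget' workloads with
                             no locking, logging, multi-user, or audit needs
    connections_optimized -- True once JDBC/mapping overhead has been removed
                             by hand-tuning or stored-proc-style centralization
    """
    return (overhead_fraction >= 0.50
            and not needs_acid
            and connections_optimized)

# e.g. worth_moving_to_nosql(0.60, needs_acid=False, connections_optimized=True)
```

The point of the sketch is the ordering: until the “connection” slice is fixed and the ACID guarantees are genuinely unneeded, a high overhead fraction alone does not justify the move.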
It is a lot like deciding whether to fly or drive from SFO to Los Angeles. The pure flying time (“actual processing work”) is about 90 minutes, but (home to SFO airport) + (LAX to hotel) + security could add 3 hours. An executive could charter a flight and skip the security check-ins etc. (“lock, latch, log”) – but for general purposes we commoners just bear the “overhead”, hoping the security scanners will do the work of making us safer in the end.