[Note from Pinal]: This is a 94th episode of Notes from the Field series. When I read the title of this article – I was extremely intrigued with it – Rewriting Database History! When I was a kid, history was my favorite subject and till today when I have to deal with history, I always jump first to read and enjoy it. When I see this article from Kevin, I was so delighted. I started to read it immediately, and I did not stop reading it till it was finished. It was 20 minutes well spent. The journey starts from ORM and end at the ORM, however, when it comes to full circle there are so many things in between. Trust me Kevin can be a great historical novel writer.
In this episode of the Notes from the Field series database expert Kevin Hazzard explains the interesting subject of rewriting database history. Read the experience of Kevin in his own words.
Way back in June of 2006, my friend Ted Neward wrote a very detailed piece called The Vietnam of Computer Science. Juxtaposing America’s involvement in Southeast Asia in the last century with the battles we fight in modern, business-driven enterprises, that article examined the unwinnable war that Object-Relational Mapping (ORM) tools have become. The title of the article is probably a bit politically incorrect these days given that Vietnam’s world standing and economy have improved vastly over the last decade. Moreover, lots of database and application developers working today don’t even remember the war in Vietnam so the allegory may fail for them. But for those of us who grew up in the 1960s and 1970s, the writing evokes plenty of painful memories that paint ORMs as technological quagmires.
ORM tools have evolved quite a bit since 2006 but they’re still generally pretty awful. The metadata-driven code generators buried within tools like Hibernate and Microsoft’s Entity Framework do a much better job of writing decently-performing queries than they did in the early days. But the cruft, wacky idioms and generally non-sensible constraints one must suffer to use these tools is overwhelming for all but a handful of gladiators who gain some odd sense of purpose in buttressing their companies for long-running conflicts. Nowhere is this more apparent than in the Ruby on Rails space where some sort of addiction to total database ignorance seems to have infected the vast majority of that community. If you want to understand why Ruby on Rails is in decline in recent years, look no further than the chatty, miserably-performing queries that the Active Record ORM generates when accessing complex, relational data structures. Any DBA or database developer who has been enlisted to debug performance problems in those sorts of melees knows there’s a better way to win hearts and minds.
For most developers who have shunned traditional ORMs, I often hear them speak about how unwieldy, brittle or inflexible they found the tools to be. The relational abstractions that they provide is sometimes too weak, making common business operations difficult to express in languages like C# and Java. To solve that problem, the abstraction may become leaky, exposing details that create dependencies which complicate deployments or lock you into vendor or version-specific features.
For database people, ORM aversion is typically related to the naiveté of machine-generated queries. Chattiness is a big problem with full-service ORMs, for example. Mature ORMs do a better job today emitting correlated sub-queries or common table expressions than they did in the past. But that often depends on good mappings which depend in turn on good metadata. Unfortunately, perfect database metadata is quite rare. Anyone who watches the tracing of a database server suffering under the garrulous assault of an ORM-driven application understands just how little the ORM really knows about the database.
If the protestors can convince management that the ORM war is unwinnable, the response can move in several potential directions. One might rationalize that the relational representation of data is the real foe, i.e. that because data is rarely used in third normal form within applications, it shouldn’t be stored that way on disks. The pundits of the NoSQL movement say that relational storage models are a holdover from an age when storage and bandwidth were much more expensive than they are today. That is a massive oversimplification for sure, but there are some cases where a document-oriented databases make a lot of sense.
For example, I worked in a shop long ago that used a UniVerse database to manage large caches of medical claims data. As a multi-valued database much like today’s NoSQL databases, UniVerse stores data with all the data that it references in a single document. So a patient’s demographic information can be stored in line with the related medical claims, diagnostic history and lab results. For that system, there was only one index on the database which was used to find individual, semi-structured blobs of patient records. The rules engine which sat atop the database always needed all of a single patient’s information whenever it ran a rule so storing the entire document describing a patient as a single database entity was both practical and highly efficient.
In those days, UniVerse was an IBM product that competed against their own DB2 relational database. For some reason, IBM was keen on getting our company to move from UniVerse to DB2 so we invited them in for a proof of concept to build a relational model of our medical database in DB2. When they were done, they loaded several terabytes of data into the new relational database and aimed the rules engine there. After weeks of tweaking, rewriting and hand-wringing, they never could make the relational database outperform UniVerse. Moreover, it wasn’t even a close race. The UniVerse database running on much older hardware could fetch whole patient documents in a few milliseconds. The best we could get DB2 to do using its normalized implementation was measured in hundreds of milliseconds. The difference between five milliseconds and half a second may not sound like a major problem, but when you’re handling millions of queries per day and trying to satisfy a three second, end-to-end Service Level Agreement with your Point of Sale providers, those delays add up very quickly.
Of course, document-oriented databases like Mongo and multi-valued databases like UniVerse are not always viable solutions to bridging the gulf between on-disk and in-memory data representations. For 20 years, I’ve made the claim that databases are an unfortunate consequence of history. Bear with me because I don’t mean to disparage database technologies at all. Quite the opposite, I love and respect modern database systems of many types. However, it’s very true that if you could travel back in time to 1943 and give the developers of the ENIAC computer at the University of Pennsylvania a large scale memory device to attach to their invention, databases as we know them would never have evolved. Instead, all of the rich query tools, and consistency features of modern transactional databases would have been developed inside of our applications instead.
In that sense, document-oriented databases and ORMs are both attempts to undo history. The former solution means to eliminate the mismatch between applications and databases altogether by making the data storage model match the common data utilization pattern. The problem is that by coercing the database to adapt to the application’s model, many of the features that evolved in transactional systems simply aren’t available. So-called eventual consistency scares developers who have come to depend on databases that offer transactions which are Atomic, Consistent, Isolated and Durable (ACID). This is most unfortunate because had the ENIAC developers had unlimited in-memory storage available to them, I believe that in-memory data handling semantics would have evolved to be ACID-compliant just as they are in modern relational databases. Document-oriented databases redo history in a way that doesn’t honor so many of the good choices that were made in real history.
Rather than attempting to undo history per se, ORMs try to hide history from us by making it seem that the underlying database’s storage and query semantics exist inside the application’s space. While this is a better way to undo history because it preserves the best parts of it, this is perhaps the hardest problem in all of computer science today. The reasons that it is so difficult are myriad. Incomplete metadata is often the culprit. The cardinality between entities, for example, is nearly impossible to get right simply by inspecting a database’s available foreign keys. Moreover, the relationships between entities in separate databases or in separate but related data integration processes often has no metadata to tie them together. The Master Data Management (MDM) structures and processes used to manage code assignments can also have a huge impact on an ORM’s query generation. Languages like C# and Java lack even the most basic Design by Contract (DbC) features so it’s no wonder that their handling of database constraints is also quite weak. For example, imagine a simple database constraint that limits an integer value to multiples of ten. Now imagine an ORM trying to convey and enforce that constraint in a language like C#. You get the picture. This is a very difficult, effectively impossible, problem to solve.
So what’s the solution? Should ORMs be used at all? Should we abandon relational databases and move entirely over to document-oriented databases? Yes and no, not necessarily in that order. ORMs have taught us a lot over the past 20 years. The number one thing I’ve learned from them is to be pragmatic about my approach to data. I use stored procedures whenever there’s a way to reduce the chattiness and excessive round trip calls that ORMs often induce. That keeps my DBAs happy. For simple, table accesses that are indexed well for my common query predicates, I’ll use an ORM just to save me some time.
Nowadays, I rely on so-called micro-ORMs and auto-mapper tools more than traditional ORMs. Micro-ORMs are an admission that full-service ORMs are often too complex or that they unnecessarily constrain my use of data. Using less ambitious micro-ORMs like Dapper gives me the benefits I desire without all the weight and ceremony of a full-featured ORM. Lastly, when document-oriented storage seems warranted, I evaluate the criteria objectively and openly admit when NoSQL is the correct choice. However, this is less common than avid NoSQL proponents would have you believe. Relational databases are still the best choices for storage and data handling in the vast majority of cases, mostly because of their maturity, rich feature sets, high reliability and great support.
Nine years later and my friend Ted is still mostly correct. ORMs are an unwinnable war. But they’ve taught us a lot and there are other options for solving what is perhaps the hardest problem out there today: undoing history and redoing it as we think it might have been done under different circumstances. No matter which way the market goes, the technical equivalent of sober diplomacy should triumph over conflict.
If you want to get started with SQL Server with the help of experts, read more over at Fix Your SQL Server.
Reference: Pinal Dave (http://blog.sqlauthority.com)