Are relational databases on the way out?
For decades, relational databases have been the default choice for storing application data. Recently, however, we see a lot of alternative approaches gaining widespread exposure (I am not sure about acceptance yet), especially as part of Web 2.0 platforms. Think Amazon's SimpleDB, Google's BigTable, or Apache CouchDB.
Cluster architecture: RDBMS have traditionally been client-server oriented, meaning that multiple clients can access the same database concurrently over a network. This alone is an enormous improvement over file-based storage, and it is also useful for three-tier web applications, as it allows you to scale out the number of application servers. To keep the single database server from becoming a bottleneck and a single point of failure, you will eventually want to spread its functionality over a cluster of machines. Most RDBMS have added this more advanced option in one form or another, but these new web databases seem to have been designed from the start to run on distributed nodes.
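Spreading data over a cluster usually starts with partitioning (sharding): deciding which node owns which key. The minimal sketch below illustrates one common scheme, hash partitioning. It is not a description of how SimpleDB, BigTable or CouchDB actually distribute their data. `ShardedStore` is a hypothetical name, each "node" is just an in-memory dict standing in for a separate machine, and replication and failover are left out.

```python
import hashlib


class ShardedStore:
    """Hypothetical toy store that spreads keys over several 'nodes'.

    Each node is a plain dict standing in for a separate machine.
    """

    def __init__(self, num_nodes: int) -> None:
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> dict:
        # Use a stable hash: Python's built-in hash() of a str is salted
        # per process, so it would route keys differently after a restart.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key: str, value: object) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str, default: object = None) -> object:
        return self._node_for(key).get(key, default)


store = ShardedStore(num_nodes=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

print(store.get("user:42"))  # {'id': 42}
print([len(n) for n in store.nodes])  # keys are spread across all four nodes
```

Modulo hashing is the simplest choice, but changing the number of nodes remaps most keys; consistent hashing is the usual refinement when nodes join and leave the cluster.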
Schema-free: RDBMS rely on data schema definitions (tables with typed columns) and have great difficulty handling unstructured documents. In particular, a relational system offers no way to query data other than by column value, and makes it very difficult to query data across tables. Again, most RDBMS now have non-relational extensions such as XML query capabilities or full-text search. In contrast, the newcomers appear to be very document-centric: every document can have its own set of attributes.
One could argue that a data schema is part of the data integrity validation that a database system should perform. On the other hand, most people seem happy to do that in the application instead, and in any case it seems like it should be an optional feature. One could also argue that a fixed schema makes for more efficient storage and access paths. In that case, the schema is seen more as a necessary evil, and one would happily give it up if the performance problems could be avoided some other way.
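The contrast can be illustrated with Python's built-in `sqlite3` module standing in for an RDBMS and a plain list of dicts standing in for a document store. This is a minimal sketch, not the API of any of the systems mentioned above; `find` and `validate` are hypothetical helpers. It shows a fixed schema rejecting an attribute it does not know about, documents with differing attributes being queried by any of them, and schema-like validation moved into the application as an optional step.

```python
import sqlite3

# Fixed schema: every row must fit the declared columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
conn.execute("INSERT INTO products (name, price) VALUES (?, ?)", ("Book", 12.5))
try:
    # An attribute the schema does not know about is rejected outright.
    conn.execute(
        "INSERT INTO products (name, price, author) VALUES (?, ?, ?)",
        ("Novel", 9.99, "A. Writer"),
    )
    schema_rejected = False
except sqlite3.OperationalError:
    schema_rejected = True

# Schema-free: each document carries its own set of attributes.
documents = [
    {"id": 1, "type": "book", "name": "Novel", "author": "A. Writer"},
    {"id": 2, "type": "shirt", "name": "T-Shirt", "size": "M", "colour": "blue"},
    {"id": 3, "type": "book", "name": "Manual", "pages": 120},
]


def find(docs, **criteria):
    """Return the documents whose attributes match all given key/value pairs."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def validate(doc, required=("id", "type", "name")):
    """Optional application-level check standing in for a schema."""
    missing = [field for field in required if field not in doc]
    if missing:
        raise ValueError(f"document {doc.get('id')} lacks {missing}")


for doc in documents:
    validate(doc)

print(schema_rejected)  # True
print([d["name"] for d in find(documents, type="book")])  # ['Novel', 'Manual']
print(find(documents, colour="blue")[0]["name"])  # T-Shirt
```

Note that `find` has to scan every document; a real document store would need indexes on the attributes it queries to keep this fast.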
Impedance mismatch: A big complication when using an RDBMS to store application data is that everything has to be broken down and mapped to tables and columns using only the rather primitive (scalar) data types of the RDBMS. This gets complex very quickly, both conceptually and with regard to how the resulting data is stored, retrieved and queried. Multi-table joins are not easy to understand, and not especially fast to execute either.
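A minimal sketch of the mismatch, again using Python's built-in `sqlite3` as a stand-in RDBMS: a hypothetical order object with a nested list of items has to be split across two tables and reassembled with a join, whereas serialising it to JSON lets it round-trip as a single value. Storing JSON text in a column is only an approximation of a document database; the SQL engine cannot index or query inside it unless you use extra features such as JSON functions.

```python
import json
import sqlite3

# An application object: an order with a nested list of line items.
order = {
    "id": 1,
    "customer": "alice",
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
}

conn = sqlite3.connect(":memory:")

# Relational mapping: the nested list has to become its own table.
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (
        order_id INTEGER REFERENCES orders(id),
        sku TEXT,
        qty INTEGER
    );
""")
conn.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
)

# Reading it back needs a join, then code to rebuild the nested structure.
rows = conn.execute("""
    SELECT o.id, o.customer, i.sku, i.qty
    FROM orders o JOIN order_items i ON i.order_id = o.id
    WHERE o.id = ?
    ORDER BY i.rowid
""", (1,)).fetchall()
rebuilt = {
    "id": rows[0][0],
    "customer": rows[0][1],
    "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
}

# Document-style storage: the whole object round-trips as one value.
conn.execute("CREATE TABLE order_docs (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO order_docs VALUES (?, ?)", (order["id"], json.dumps(order)))
(body,) = conn.execute("SELECT body FROM order_docs WHERE id = ?", (1,)).fetchone()
as_document = json.loads(body)

print(rebuilt == order, as_document == order)  # True True
```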
Transactions: Probably the main selling point of an RDBMS is that it passes the famous ACID test: Atomicity (all or nothing: no incomplete updates), Consistency (every transaction takes the database from one valid state to another, so its integrity constraints are never violated), Isolation (no one can see the results of a transaction before it is committed), Durability (no committed update can be lost, even in the presence of crashes). These properties are essential for many applications, but they come at a cost. In particular, they make it difficult to replicate or distribute the system efficiently. The newer non-relational databases tend to relax these constraints considerably, which makes them unusable when you really need a transactional database. But if you don't ...
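Atomicity, the A in ACID, is easy to see with Python's built-in `sqlite3` module. In this minimal sketch (the `accounts` table and the `transfer` and `balances` helpers are hypothetical), a transfer credits one account and debits another inside a single transaction; when the debit violates a constraint, the credit that was already applied is rolled back too. It says nothing about isolation, durability, or distribution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
with conn:
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])


def transfer(conn, src, dst, amount):
    """Move money between accounts: both updates happen or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


transfer(conn, "alice", "bob", 30)
print(balances(conn))  # {'alice': 70, 'bob': 80}

try:
    # The credit to bob succeeds, then the debit violates the CHECK
    # constraint; the rollback undoes the credit as well.
    transfer(conn, "alice", "bob", 500)
except sqlite3.IntegrityError:
    pass
print(balances(conn))  # {'alice': 70, 'bob': 80}
```

Keeping this guarantee is cheap on a single node; doing the same when the two accounts live on different machines requires a distributed commit protocol, which is exactly the cost the newer systems try to avoid.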
Performance: One would assume that RDBMS, with all their compacted and normalised storage schemes and their indices, are the fastest way to go. And I guess they do offer the fastest possible way to sort fifty million records, but how often do you really need to do that? Especially if even the fastest possible sort of those fifty million records is still too slow for an interactive application, you start looking at alternative approaches such as an intelligent hierarchy of pre-computed aggregated data. In the RDBMS world, this is called data warehousing. Once you get used to the idea that ad-hoc queries are impossible anyway, and that anticipated queries can be satisfied using clever indexing (which may not even need to be completely up to date), the speed advantage of an RDBMS on operations that you can avoid altogether becomes much less important.
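The idea of pre-computed aggregates can be sketched in a few lines: every write also updates running totals for the queries you anticipate, so answering them becomes a lookup rather than a scan. This is a hypothetical in-memory illustration (`Sale`, `SalesRollup` and `scan_total_for_day` are made-up names, and amounts are integer cents), not how any particular data warehouse works.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sale:
    day: str      # e.g. "2008-05-01"
    region: str
    cents: int    # integer amounts avoid floating-point rounding


class SalesRollup:
    """Maintains pre-computed totals for the anticipated queries.

    Each write updates the aggregates, so those queries become simple
    lookups instead of scans over every raw record.
    """

    def __init__(self) -> None:
        self.raw = []  # full detail, kept for the rare ad-hoc question
        self.by_day = defaultdict(int)
        self.by_day_region = defaultdict(int)

    def record(self, sale: Sale) -> None:
        self.raw.append(sale)
        self.by_day[sale.day] += sale.cents
        self.by_day_region[(sale.day, sale.region)] += sale.cents

    def total_for_day(self, day: str) -> int:
        # .get() avoids inserting empty entries into the defaultdict
        return self.by_day.get(day, 0)

    def total_for_day_region(self, day: str, region: str) -> int:
        return self.by_day_region.get((day, region), 0)


def scan_total_for_day(sales, day: str) -> int:
    """The ad-hoc alternative: touch every raw record."""
    return sum(s.cents for s in sales if s.day == day)


rollup = SalesRollup()
for sale in [
    Sale("2008-05-01", "EU", 1200),
    Sale("2008-05-01", "US", 800),
    Sale("2008-05-02", "EU", 500),
    Sale("2008-05-01", "EU", 300),
]:
    rollup.record(sale)

print(rollup.total_for_day("2008-05-01"))  # 2300
print(rollup.total_for_day_region("2008-05-01", "EU"))  # 1500
print(scan_total_for_day(rollup.raw, "2008-05-01"))  # 2300
```

Updating aggregates on every write trades write cost for read speed. Since the aggregates need not be perfectly current, a real system might instead rebuild them in periodic batches.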
So, in summary, I think that these new databases are obviously not able to replace an RDBMS in its traditional field of operation (record processing where consistent reads and writes, transaction isolation, and atomic updates are critical), but they may very well take a sizable chunk of the huge market where RDBMS are currently used solely because there has been no other choice. There may be no need for an RDBMS in the usual web application stack after all.



