For decades, the default choice when it comes to
storing application data has been the relational database.
Recently, however, we have seen a lot of alternative approaches
gaining widespread exposure (not sure about acceptance yet),
especially as part of Web 2.0 platforms.
Think Amazon's SimpleDB, Google's Bigtable, or Apache CouchDB.
Cluster architecture: RDBMS have traditionally
been client-server oriented, meaning that you
can have multiple clients access the same database
concurrently over a network. This alone is an enormous
improvement over file-based storage, and it is also
useful for three-tier web applications, as it allows you to
scale out the number of application servers. In order not
to have the single database server as a bottleneck and
single point of failure, you eventually will want to
spread its functionality over a cluster of machines.
This is a more advanced option that most RDBMS have added
in one form or another, but it seems these new web databases
were designed specifically to run on distributed nodes.
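
To give an idea of what "designed to run on distributed nodes" can mean, here is a minimal Python sketch that spreads keys over several nodes by hashing each key. It is my own illustration and not how SimpleDB, Bigtable, or CouchDB actually work: the node names, the `node_for` function and the `Cluster` class are made up, and plain dictionaries stand in for servers.

```python
import hashlib

# Hypothetical node names; a real cluster would hold network addresses.
NODES = ["node-a", "node-b", "node-c"]


def node_for(key, nodes=NODES):
    """Pick the node responsible for a key by hashing the key."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


class Cluster:
    """Toy cluster: one in-memory dict per node stands in for a server."""

    def __init__(self, nodes=NODES):
        self.nodes = list(nodes)
        self.stores = {name: {} for name in self.nodes}

    def put(self, key, value):
        # Only the node that owns the key is touched.
        self.stores[node_for(key, self.nodes)][key] = value

    def get(self, key):
        # The same hash leads back to the same node.
        return self.stores[node_for(key, self.nodes)].get(key)


cluster = Cluster()
for i in range(100):
    cluster.put(f"user:{i}", {"id": i})
```

No single node sees every request, which is the point. With this naive modulo scheme, though, adding or removing a node changes where most keys live, which is one reason real systems tend to use more elaborate partitioning.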
Schema-free: RDBMS rely on data schema definitions
(tables with typed columns) and have great difficulty
handling unstructured documents. In particular, a relational
system offers no way to query data other than by column value,
and makes it very difficult to query data across tables.
Again, most RDBMS now have non-relational extensions like XML query
capabilities or full text search.
In contrast, the newcomers appear to be very document-centric,
where every document can have its own set of attributes.
One could argue that a data schema is part of the data integrity validation
that a database system should perform. On the other hand,
most people seem happy with doing that in the application instead,
and in any case, it seems like it should be an optional feature.
One could also argue that a fixed schema makes for more efficient
storage and access paths. In this case, the schema is seen
more as a necessary evil, and one would be happy to give it up
if the performance problems could be avoided some other way.
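
As a rough illustration of the document-centric idea, here is a small Python sketch in which every document carries its own attributes and validation is an optional step done by the application. The sample documents and the `find`, `having` and `validate_book` helpers are invented for this example and do not correspond to the API of any particular product.

```python
# Three documents in one collection, each with a different set of attributes.
documents = [
    {"_id": "1", "type": "book", "title": "Dune", "author": "Frank Herbert"},
    {"_id": "2", "type": "photo", "title": "Sunset", "width": 1024, "height": 768},
    {"_id": "3", "type": "note", "text": "Call back on Monday"},
]

_MISSING = object()  # sentinel so that an absent attribute never matches


def find(docs, **criteria):
    """Return the documents whose attributes equal all given criteria."""
    return [
        d for d in docs
        if all(d.get(k, _MISSING) == v for k, v in criteria.items())
    ]


def having(docs, attribute):
    """Return the documents that define the attribute at all."""
    return [d for d in docs if attribute in d]


def validate_book(doc):
    """Optional, application-side check: list the problems found in a book."""
    problems = []
    for attribute in ("title", "author"):
        if not isinstance(doc.get(attribute), str):
            problems.append(f"missing or non-text attribute: {attribute}")
    return problems
```

Calling `find(documents, type="photo")` returns only the photo, while `having(documents, "title")` returns the book and the photo. Nothing forces the note to have a title, and the application decides whether to run `validate_book` before storing anything.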
Impedance mismatch: A big complication when using
an RDBMS for storing application data is that everything has
to be broken down and mapped to tables and columns using only
the rather primitive (scalar) data types of the RDBMS. This gets
complex very quickly, both conceptually and with regard to
how the resulting data will be stored, retrieved
and queried. Multi-table joins are not easy to understand, and
also not especially fast to execute.
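
To make the mismatch concrete, here is a small sketch using Python's built-in sqlite3 and json modules. It stores the same nested object twice: once broken down into two tables and reassembled with a join, and once as a single document. The order data, the table layout, the `load_order` helper and the `doc_store` dictionary are all invented for the example.

```python
import json
import sqlite3

# A nested application object: one order with a list of line items.
order = {
    "id": 1,
    "customer": "Alice",
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")

# Relational route: the object is split into scalar columns in two tables.
conn.execute("INSERT INTO orders VALUES (?, ?)", (order["id"], order["customer"]))
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(order["id"], item["sku"], item["qty"]) for item in order["items"]],
)


def load_order(conn, order_id):
    """Reassemble the nested object from the joined rows."""
    rows = conn.execute(
        "SELECT o.id, o.customer, i.sku, i.qty "
        "FROM orders o JOIN order_items i ON i.order_id = o.id "
        "WHERE o.id = ? ORDER BY i.rowid",
        (order_id,),
    ).fetchall()
    if not rows:
        # Sketch only: an order without items also ends up here (inner join).
        return None
    return {
        "id": rows[0][0],
        "customer": rows[0][1],
        "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
    }


# Document route: the object is stored and read back in one piece.
doc_store = {}
doc_store[order["id"]] = json.dumps(order)
```

Both `load_order(conn, 1)` and `json.loads(doc_store[1])` give back the original object, but only the first needs a table design, a join and hand-written reassembly code.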
Transactions: Probably the main selling point for an RDBMS
is that it passes the famous ACID test: Atomicity (all or nothing:
no incomplete updates),
Consistency (the state of the database does not get corrupted at
any time, even in the presence of crashes), Isolation (no one can
see the results of a transaction before it is committed), Durability
(no committed update can be lost).
These properties are essential for many applications, but they
come at a cost. In particular, they make it difficult to efficiently
replicate or distribute the system.
The newer non-relational databases tend to relax these constraints considerably,
which makes them unusable when you really need a transactional database.
But if you don't ...
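
For the case where you do need it, here is a small sketch of what atomicity buys you, using Python's built-in sqlite3 module. The accounts table, the names, the amounts and the `transfer` and `balances` helpers are invented for the example. The credit is deliberately applied before the debit so that a failed debit has something to undo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two accounts: both updates happen, or neither."""
    try:
        # As a context manager the connection commits if the block succeeds
        # and rolls back if an exception escapes from it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was undone too.
        return False


def balances(conn):
    """Return the current balances as a dict of name to amount."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A transfer of 30 from alice to bob succeeds and leaves 70 and 80. A transfer of 500 fails on the debit, and the credit that was already applied to bob is rolled back with it, so no half-finished update is ever visible.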
Performance: One would assume that RDBMS, with all their
compacted and normalised storage schemes and their indices, are
the fastest way to go. And I guess that they do offer the fastest
possible way to sort fifty million records, but how often do you really
need to do that? And if sorting these fifty million records
in the fastest possible fashion is still too slow for an interactive
application, you start looking at alternative approaches such as
an intelligent hierarchy of pre-computed aggregated data. In the RDBMS
world this is called data warehousing. Once you get used to the idea
that ad-hoc queries are impossible anyway, and that anticipated queries
can be satisfied using clever indexing (that may not even need to
be completely up-to-date), the performance benefits of operations
that you can avoid become less important.
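
Here is a minimal Python sketch of that idea: an anticipated query (the total per month) is answered from a pre-computed aggregate that is only brought up to date now and then, instead of scanning every record. The `SalesStore` class, its method names and the sample figures are invented for the illustration.

```python
from collections import defaultdict


class SalesStore:
    """Toy store with a pre-computed, possibly stale, per-month total."""

    def __init__(self):
        self.records = []                 # the raw data
        self.pending = []                 # records not yet in the aggregate
        self.by_month = defaultdict(int)  # pre-computed aggregate

    def add(self, date, amount):
        """Store a record; date is a 'YYYY-MM-DD' string."""
        record = (date, amount)
        self.records.append(record)
        self.pending.append(record)

    def refresh(self):
        """Fold the pending records into the aggregate."""
        for date, amount in self.pending:
            self.by_month[date[:7]] += amount
        self.pending.clear()

    def month_total(self, month):
        """Anticipated query: one lookup, but only as fresh as the last refresh."""
        return self.by_month.get(month, 0)

    def month_total_by_scan(self, month):
        """The same answer computed the slow way, by reading every record."""
        return sum(amount for date, amount in self.records if date.startswith(month))


store = SalesStore()
store.add("2008-01-15", 100)
store.add("2008-01-20", 50)
store.add("2008-02-01", 70)
store.refresh()
store.add("2008-02-02", 30)  # not in the aggregate until the next refresh
```

Until `refresh()` runs again, `month_total("2008-02")` reports 70 while the full scan reports 100. Whether that staleness is acceptable depends on the application, but where it is, the expensive operation simply never runs at query time.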
So, in summary, I think that these new databases are obviously not
able to replace an RDBMS in its traditional field of operation
(record processing where consistent reads and writes, transaction isolation,
and atomic updates are critical), but they may very well take a sizable
chunk of the huge market where RDBMS are currently being used solely
because there have been no other choices. There may be no need
for an RDBMS in the usual web application stack after all.