The Rise of NoSQL

I recently read about the Cassandra project, a distributed database designed to store large amounts of data. It uses the Google BigTable data model (more on BigTable later) and stores key-value pairs.

Cassandra was created at Facebook and is used by Facebook, Twitter, and Digg. It is now an open source project managed by Apache, intended for large web applications. Cassandra is one of several similar projects that make up the NoSQL movement. Are these databases poised to replace relational ones like Oracle, DB2, and SQL Server? Let’s see what NoSQL databases have to offer.

NoSQL databases are, in general, data stores that do not impose a fixed structure on the data. Access to them tries to avoid joins. The technical term for these databases is structured storage. They typically offer weak consistency: if you update a copy on one server, the update is not guaranteed to propagate to every other copy immediately. These types of databases also have very simple interfaces.
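To make the weak consistency point concrete, here is a minimal sketch in Python. It is a toy model, not how Cassandra or any other real system works: the names `ToyCluster`, `write`, `read`, and `propagate` are made up for illustration, and the "replicas" are plain in-memory dictionaries.

```python
class ToyCluster:
    """Toy model of replicas that receive updates lazily (hypothetical, for illustration)."""

    def __init__(self, replica_count):
        # Each replica is just a dict of key-value pairs.
        self.replicas = [{} for _ in range(replica_count)]
        # Updates accepted by replica 0 but not yet copied to the others.
        self.pending = []

    def write(self, key, value):
        # The write lands on replica 0 right away; the rest are queued.
        self.replicas[0][key] = value
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, replica_index, key):
        # Reads go to a single replica and may return stale data.
        return self.replicas[replica_index].get(key)

    def propagate(self):
        # Apply every queued update, bringing all replicas up to date.
        for index, key, value in self.pending:
            self.replicas[index][key] = value
        self.pending.clear()


cluster = ToyCluster(3)
cluster.write("user:1", "alice")
stale = cluster.read(2, "user:1")   # replica 2 has not seen the write yet
cluster.propagate()
fresh = cluster.read(2, "user:1")   # now replica 2 has the value
```

Between the write and the call to `propagate`, a reader hitting replica 2 sees nothing for `user:1`. Real systems propagate in the background, but the window where copies disagree is the same idea.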

Let’s go over some other projects in the NoSQL space. Hadoop is a project with many subprojects. One of them is MapReduce, a framework for distributed processing of large data sets. Then there is Google’s BigTable, a distributed storage system that can scale to a very large size.
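If MapReduce is new to you, the classic word count example shows the shape of it. The sketch below runs in a single Python process and does not use Hadoop at all; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are my own. A real framework would spread the map and reduce steps across many machines.

```python
from collections import defaultdict


def map_phase(document):
    # Emit a (word, 1) pair for every word in the document.
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all emitted values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Collapse the values for one key into a single result.
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data big tables", "big clusters"])
# counts["big"] is 3; every other word appears once
```

The map step only looks at one document at a time and the reduce step only looks at one key at a time, which is what lets the work be split across a cluster.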

Next we have MemcacheDB, a distributed key-value storage system. Despite the name, it is not a cache. It is a persistent store that uses the memcached protocol to access a Berkeley DB back end. Another NoSQL database is Project Voldemort, a distributed key-value store used by LinkedIn.

CouchDB is an Apache project. It is a document-oriented database with a RESTful JSON API, and you query it using MapReduce. CouchDB is written in Erlang, a functional programming language. A similar offering is MongoDB, whose name is a play on “humongous”. It is also a document-oriented database, this one written in C++. It accepts JSON documents and stores them in a binary format called BSON.
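Here is a rough sketch of what "document-oriented" means in practice. It is plain Python over a list of parsed JSON documents and does not call CouchDB or MongoDB; the document fields and the helpers `map_tags` and `build_view` are invented for the example. Note that the documents do not share a fixed set of fields, and the query is written as a map function over each document.

```python
import json

# Three JSON documents with different fields: no fixed schema is imposed.
documents = [
    json.loads('{"_id": "1", "type": "post", "title": "Hello", "tags": ["intro"]}'),
    json.loads('{"_id": "2", "type": "post", "title": "NoSQL", "tags": ["db", "nosql"]}'),
    json.loads('{"_id": "3", "type": "comment", "post": "2", "text": "Nice"}'),
]


def map_tags(doc):
    # Emit (tag, 1) for each tag on a post; other document types emit nothing.
    if doc.get("type") == "post":
        for tag in doc.get("tags", []):
            yield tag, 1


def build_view(docs, map_fn):
    # Run the map function over every document and sum the values per key.
    view = {}
    for doc in docs:
        for key, value in map_fn(doc):
            view[key] = view.get(key, 0) + value
    return view


tag_counts = build_view(documents, map_tags)
```

The comment document has no `tags` field at all, and nothing breaks. In a relational table you would have had to decide on the columns up front.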

I have covered a lot of NoSQL implementations. Most of them provide distributed storage. Although they scale very well for large data sets, their functionality is limited. They are not a replacement for relational databases; you still need those for transaction processing. Nonetheless, it is good to know a bit about the NoSQL movement and the problems it tries to solve.