Lightweight databases in C : GDBM

If you’re a web developer, or a JDBC fan, you have probably already used SQL (or similar) systems such as Oracle, MySQL or PostgreSQL (there are so many of them). These engines are called RDBMSes, which stands for Relational Database Management Systems. These funny toys rest on quite a simple concept : data isn’t just data. RDBMSes create relations between attributes (fields), and these connections turn raw data into actual information. That’s why they’re called relational : they create links between the different pieces of information. When your applications become numerous, or just bigger, these systems tend to become essential. Sometimes, your needs are so specific that you even need to deploy an object-oriented database management system (OODBMS, Java/.NET developers love these things…).

Now, you’ve seen this blog’s design : heavy/complex implementations are not my thing, I like it simple. Well, not exactly… I like it simpler, as simple as possible! For this reason, today, I’d like to introduce you to a little-known system : GDBM, the GNU take on DBM (Data Base Management).

About (G)DBM

A simpler implementation

DBM stands for Data Base Management, which is quite self-explanatory : this is a tool for data management, simple management. Take a basic RDBMS, remove the relations, remove the multiple tables, and that’s it. DBM allows you to store data in a very simple format : all it does is associate a key with a value for each “row”.

History

Everything started at Bell Labs, with some guy named Ken Thompson (well, this one is pretty famous among UNIX users, actually). He created DBM, which allowed you to :

  • Create/Initialise a database file.
  • Close a database (well, one needs to free memory somewhere, right?)
  • Fetch information using its key.
  • Store information (key => value).
  • Delete an entry using its key.
  • Get the first key.
  • Jump from one key to the next (lovely queue!).

Now, the big problem with that was : you couldn’t open more than one database file at the same time, and this can prove inconvenient. For this reason, the University of California, Berkeley created NDBM (New DBM), and guess what : you can open several databases, yay! After this, the Free Software Foundation created GDBM (G for GNU, of course), which brought some additional improvements to the NDBM implementation. In this article, I’ll focus on GDBM directly, as I find it a little bit more convenient to use.

Important notes

GDBM is simple, but there are a few things that come with this lightweight API. I think it is important to know that :

  1. Database files are locked when opened : a UNIX file lock is set. You can disable this mechanism with a flag (GDBM_NOLOCK), but this implies building your own protection mechanism. That’s another GDBM improvement : there was absolutely no protection with NDBM.
  2. Some OSes tend to append a suffix to the database files when the API creates them. As far as I know, FreeBSD appends .db to the filename. On Debian-based systems, nothing changes. Another interesting fact : if you use NDBM, two files might be created instead of one (typically a .dir file and a .pag file). Be careful not to mix names up!
  3. A GDBM database is represented by a GDBM_FILE data type. It looks strange at first, but this is a pointer type. GDBM masks our lovely asterisk with a typedef.
  4. Maybe it’s me, but I ran into some data overflow problems when using NDBM : some characters were injected into my DBM entries, no matter how tight my memory management was. If you have used an NDBM database without such problems, feel free to contact me.

Anyway, if you develop a simple application, these will probably not become problems.

The API

Prerequisites

You’ll need to include the gdbm.h header file in order to call GDBM routines. You may need to install some packages such as libgdbm-dev or similar (have a look at your distribution-specific documentation).

Routines

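Let’s have a look at our routines first : gdbm_open and gdbm_close, the datum structure with its five routines (gdbm_store, gdbm_fetch, gdbm_delete, gdbm_firstkey and gdbm_nextkey), and finally gdbm_errno and gdbm_strerror.

The first two allow you to open and close a database. The datum structure and its 5 routines are meant for data manipulation. The last two are here for error handling (they work just like errno and strerror).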

  • gdbm_open’s open flags can be set to : GDBM_READER (read-only), GDBM_WRITER (read and write), GDBM_WRCREAT (read, write, create if necessary), or GDBM_NEWDB (always creates a new, empty database, even if one already exists). Some fancy additional flags also exist, such as : GDBM_SYNC (sync to disk after write operations), GDBM_NOLOCK (no locking on the database file) and GDBM_NOMMAP (no memory mapping).
  • The datum structure is a typical dynamically allocated data handler : a pointer to the beginning of the data’s memory area, and its length. Note : datum is used for both the key and the value, meaning you can use anything as a key, as long as it remains unique in your database.
  • gdbm_store’s store mode can be either GDBM_INSERT or GDBM_REPLACE. With GDBM_INSERT, the gdbm_store call fails (it returns 1) if an entry already exists with the given key. GDBM_REPLACE, on the other hand, overwrites an existing entry, and simply creates a new one if it cannot find what it is supposed to replace.
  • When passing from one key to the next using gdbm_nextkey, you need to specify the previous key in the call (second parameter). I’ll use a loop in the code below, have a look.

Besides those 4 points, everything is quite self-explanatory.
For more information, here’s the magic link : http://www.gnu.org.ua/software/gdbm/manual.html

Some code

Now, let’s build a simple agenda : a date associated with an event. You can’t have two events on the same date, simple enough ? Let’s do this.

This sample program produces the following output :

Now browsing the database…
Event found : Christmas Eve on 24/12/2014.
Event found : Christmas Day on 25/12/2014.
Event found : Boxing Day on 26/12/2014.

Now, the system isn’t perfect, but I think that’s a very nice example of easy, simple data storage on UNIX systems. Sometimes, you don’t actually need more, and it is usually a mistake to overestimate your needs.

Anyway, that’s it for today. See you next time !