cdb - cql database format library


#include <cdb.h>


Cdb_t*    cdbopen(const char* path, unsigned long flags, Cdbdisc_t* disc);
int       cdbclose(Cdb_t* cdb);

typedef int     (*Cdberror_f)(Cdb_t*, Cdbdisc_t*, int, const char*, ...);
typedef struct
{
       unsigned long version;    /* CDB_VERSION                  */
       const char*   schema;     /* schema descriptor            */
       const char*   comment;    /* data specific comment        */
       Cdberror_f    errorf;     /* error function               */
} Cdbdisc_t;

int       cdbread(Cdb_t* cdb);
int       cdbwrite(Cdb_t* cdb);

int       cdbflatten(Cdb_t* cdb, Sfio_t* output);
int       cdbsplit(Cdb_t* cdb, Sfio_t* input);

char*     cdbschema(Cdb_t* cdb);


cdb provides functions to read and write cql(1) database format files.

cdb is a portable binary format for databases of uniform records, where each record is an array of a fixed number of fields. It is often an efficient alternative to fixed field, delimited field, or tagged field flat file databases.

Three field types are supported: 4-byte long, 8-byte double, and 0-terminated string. long fields are stored using sfputu()/sfgetu() (see sfio(3)), double fields are stored using sfputd()/sfgetd(), and string fields are stored with an sfputu()/sfgetu() count (including the terminating 0) followed by the 0-terminated data, with the exception that null strings are stored with a 0 count and no data. There is no inter-record compression, so individual records are fully seekable.
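The string field layout described above can be sketched as follows. This is an illustration only, not the library's code: put_string_field() and get_string_field() are hypothetical names, and the count is written as a single byte for simplicity, whereas the real format stores it with the sfio(3) portable integer encoding.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Encode one string field into buf and return the number of bytes written.
 * Assumption: the count fits in one byte (the real format uses an sfio
 * variable-length integer for the count).
 */
static size_t put_string_field(unsigned char* buf, size_t size, const char* s)
{
    /* the count includes the terminating 0; null strings get a 0 count */
    size_t n = (s && *s) ? strlen(s) + 1 : 0;

    assert(n <= 255 && size >= n + 1);
    buf[0] = (unsigned char)n;
    if (n)
        memcpy(buf + 1, s, n);      /* data, including the terminating 0 */
    return n + 1;
}

/*
 * Decode one string field from buf and return the number of bytes consumed.
 * *s points into buf, or at an empty string when the count is 0.
 */
static size_t get_string_field(const unsigned char* buf, const char** s)
{
    size_t n = buf[0];

    *s = n ? (const char*)(buf + 1) : "";
    return n + 1;
}
```

Because a null string costs only its count, a record with many empty string fields stays small even before the keep/skip record encoding is applied.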

cdb is particularly efficient for storing sparse records and approaches gzip(1) levels of compression in some cases. A sparse record contains runs of fields with null values (0 for long, 0.0 for double, and 0 length for string). Non-sparse data may not compress as much, but record and field processing will be faster than ASCII flat file processing.

A cdb file consists of a header followed by the record data. The functions below provide an abstract interface to the physical layout. The header layout is:

           type     description
           4 byte   magic number
           1 byte   major version
           1 byte   minor version
           32 byte  0-terminated comment string
           integer  number of fields
           integer  number of permanent fields
           integer  flags (currently unused)
           1 byte   flat file field delimiter
           integer* <type,size,delimiter> info for each field
           string   tagged header fields: <tag><data>0
           sfputu   terminating 0

The magic number is always CDB_MAGIC="\003\004\002\000", the current major version is CDB_MAJOR=1, and the current minor version is CDB_MINOR=0. Implementation and database major numbers must match; lower minor numbers will work properly but may not take advantage of all features. The remaining fields are described below (see cdbopen()). There is currently only one optional header field: S<schema-descriptor> (see cdbschema()).
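The compatibility rule for the first six header bytes can be sketched as follows. This is a minimal illustration that inspects a raw buffer rather than using the library; header_compatible() and the constant names are hypothetical, and only the magic bytes and the major/minor positions are taken from the layout above.

```c
#include <stddef.h>
#include <string.h>

#define HDR_MAGIC_SIZE  4

/* the magic bytes given above: "\003\004\002\000" */
static const unsigned char hdr_magic[HDR_MAGIC_SIZE] = { 003, 004, 002, 000 };

/*
 * Return 1 if hdr starts with the cdb magic number and its major version
 * equals impl_major, 0 otherwise. On success the database minor version
 * is stored in *minor so the caller can decide which features to use.
 */
static int header_compatible(const unsigned char* hdr, size_t n,
                             unsigned int impl_major, unsigned int* minor)
{
    if (n < HDR_MAGIC_SIZE + 2)
        return 0;                               /* too short to be a header */
    if (memcmp(hdr, hdr_magic, HDR_MAGIC_SIZE) != 0)
        return 0;                               /* not a cdb file */
    if (hdr[HDR_MAGIC_SIZE] != impl_major)
        return 0;                               /* major numbers must match */
    *minor = hdr[HDR_MAGIC_SIZE + 1];
    return 1;
}
```

The sketch deliberately does not reject any minor version, since the text above only states that lower minor numbers work.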

The record layout is:

           f[1] .. f[perm] ( keep skip f[i] .. f[i+keep-1] )* 0

perm is the number of fields, starting from the beginning, that are present in every record. This number may be 0, but is usually at least 1, since most database schemas use the first field as a key. Following the permanent fields are zero or more groups of keep-skip-data. skip is the number of null fields to skip counting from the last non-null field. keep is the number of fields to keep after the skip, and data is the corresponding field data. The record ends with a 0 keep count.
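The record layout can be illustrated with a small decoder that expands one encoded record into a dense array. This is a sketch under stated assumptions, not the library's implementation: record_expand() is a hypothetical name, every field is treated as a long that has already been decoded into an array of tokens, and skip is read as the number of null fields between the last stored field and the next kept field.

```c
#include <stddef.h>

/*
 * Expand one encoded record into out[0..nfields-1].
 * in holds: perm values, then zero or more (keep, skip, keep values)
 * groups, then a terminating 0 keep count.
 * Returns the number of input tokens consumed, or 0 if in is malformed.
 */
static size_t record_expand(const unsigned long* in, size_t nin,
                            unsigned long* out, size_t nfields, size_t perm)
{
    size_t i = 0;   /* next input token */
    size_t f = 0;   /* next output field */
    size_t k;

    if (perm > nfields || perm > nin)
        return 0;
    for (k = 0; k < nfields; k++)
        out[k] = 0;                     /* fields not stored are null */
    while (f < perm)
        out[f++] = in[i++];             /* permanent fields are always present */
    for (;;)
    {
        unsigned long keep;
        unsigned long skip;

        if (i >= nin)
            return 0;                   /* missing terminating 0 */
        keep = in[i++];
        if (keep == 0)
            return i;                   /* a 0 keep count ends the record */
        if (i >= nin)
            return 0;
        skip = in[i++];
        if (skip > nfields - f || keep > nfields - f - skip)
            return 0;                   /* group runs past the last field */
        if (keep > nin - i)
            return 0;                   /* truncated field data */
        f += skip;                      /* skipped fields stay null */
        while (keep--)
            out[f++] = in[i++];
    }
}
```

For example, with 8 fields and perm=2, the token sequence 7 9 2 3 5 6 0 expands to the record 7 9 0 0 0 5 6 0: the two permanent fields, three skipped null fields, two kept fields, and a trailing null field that is never stored.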


Cdbdisc_t provides data specific information for the cdb context and is initialized before the cdbopen() call. It contains the following fields:
           unsigned long      version;
           const char*        schema;
           const char*        comment;
           Cdberror_f         errorf;

version must be initialized to CDB_VERSION. The implementation checks this value for interface compatibility.

schema is a 0-terminated string that specifies the record schema (see cdbschema()).

comment is a 0-terminated string limited to CDB_COMMENT bytes including the terminating 0. Comment semantics are controlled by the caller.

Cdb_t is the cdb context initialized and returned by cdbopen(). It contains the following fields:
           const char*        id;
           unsigned short     delimiter;
           unsigned short     fields;
           unsigned short     permanent;
           unsigned char      major;
           unsigned char      minor;
           unsigned long      flags;
           char               comment[CDB_COMMENT];
           Sfio_t*            io;
           char*              schema;
           Cdbformat_t*       format;
           Cdbdata_t*         data;
id is used by (*errorf)() to identify the library.

delimiter is the flat file field delimiter character. It is used by cdbflatten() and cdbsplit() for external data representation.

schema is a 0-terminated string that describes the three components of the record schema: the field delimiter character, the number of permanent fields, and the field types. The field delimiter character is specified by:


Glenn Fowler,

February 02, 2010