DBZ(3) manual page

Name

dbzinit, dbzfresh, dbzagain, dbzclose, dbzexists, dbzfetch, dbzstore,
dbzsync, dbzsize, dbzgetoptions, dbzsetoptions, dbzdebug - database routines

Synopsis

#include <dbz.h>

BOOL dbzinit(const char *base)
BOOL dbzclose(void)
BOOL dbzfresh(const char *base, const long size)
BOOL dbzagain(const char *base, const char *oldbase)
BOOL dbzexists(const HASH key)
OFFSET_T dbzfetch(const HASH key)
BOOL dbzfetch(const HASH key, void *ivalue)
BOOL dbzstore(const HASH key, const OFFSET_T offset)
BOOL dbzstore(const HASH key, void *ivalue)
BOOL dbzsync(void)
long dbzsize(const long nentries)
void dbzgetoptions(dbzoptions *opt)
void dbzsetoptions(const dbzoptions opt)
BOOL dbzdebug(const BOOL newvalue)

Description

These functions provide an indexing system for rapid random access to a
text file (the base file). Dbz stores offsets into the base text file for
rapid retrieval. All retrievals are keyed on a hash value that is generated
by the HashMessageID() function.

Dbzinit opens a database, an index into the base file base, consisting of
files base.dir, base.index, and base.hash, which must already exist. (If
the database is new, they should be zero-length files.) Subsequent accesses
go to that database until dbzclose is called to close the database.

Dbzfetch searches the database for the specified key. If dbz was configured
with --enable-tagged-hash, it returns the corresponding value, if any.
Otherwise, it returns TRUE on success and sets the content of ivalue.
Dbzstore stores the key-value pair in the database if dbz was configured
with --enable-tagged-hash; otherwise, it stores the content of ivalue.
Dbzstore will fail unless the database files are writable. Dbzexists checks
whether the given hash exists in the database. Dbz is optimized for this
operation, and it may be significantly faster than dbzfetch().

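The calling pattern for these three routines can be illustrated with a toy
in-memory model. This is only a sketch and not dbz code: every toy_* name,
the 16-byte key length, and the fixed-size linear table are assumptions made
for illustration. It mimics two behaviours stated on this page: a fetch that
finds nothing returns -1, and a store with a key that is already present is
refused.

```c
#include <stddef.h>
#include <string.h>

#define TOY_HASH_LEN 16   /* assumed key length; not taken from dbz.h */
#define TOY_SLOTS    64   /* capacity of the toy table */

typedef struct { unsigned char b[TOY_HASH_LEN]; } toy_hash;

static struct toy_entry {
    int      used;
    toy_hash key;
    long     offset;
} toy_table[TOY_SLOTS];

/* Return the slot holding key, or -1 if absent (linear scan for clarity). */
static int toy_find(toy_hash key)
{
    for (int i = 0; i < TOY_SLOTS; i++)
        if (toy_table[i].used &&
            memcmp(toy_table[i].key.b, key.b, TOY_HASH_LEN) == 0)
            return i;
    return -1;
}

/* Model of dbzexists: 1 if the key is present, 0 otherwise. */
int toy_exists(toy_hash key)
{
    return toy_find(key) >= 0;
}

/* Model of the offset-returning dbzfetch: stored offset, or -1 on failure. */
long toy_fetch(toy_hash key)
{
    int i = toy_find(key);
    return i >= 0 ? toy_table[i].offset : -1L;
}

/* Model of dbzstore: 1 on success, 0 if the key exists or the table is full. */
int toy_store(toy_hash key, long offset)
{
    if (toy_find(key) >= 0)
        return 0;                       /* duplicate keys are refused */
    for (int i = 0; i < TOY_SLOTS; i++) {
        if (!toy_table[i].used) {
            toy_table[i].used = 1;
            toy_table[i].key = key;
            toy_table[i].offset = offset;
            return 1;
        }
    }
    return 0;                           /* no free slot */
}
```

A caller that cannot rule out duplicates would test with the exists routine
first and store only when the key is absent.
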
Dbzfresh is a variant of dbzinit for creating a new database with more
control over details.

Dbzfresh's size parameter specifies the size of the first hash table within
the database, in key-value pairs. Performance will be best if the number of
key-value pairs stored in the database does not exceed about 2/3 of size.
(The dbzsize function, given the expected number of key-value pairs, will
suggest a database size that meets this criterion.) Assuming that an fseek
offset is 4 bytes, the .index file will be 4 * size bytes and the .hash file
will be DBZ_INTERNAL_HASH_SIZE * size bytes until the number of key-value
pairs exceeds about 80% of size; the .dir file is tiny and roughly constant
in size. (Nothing awful will happen if the database grows beyond 100% of
size, but accesses will slow down quite a bit and the .index and .hash files
will grow somewhat.)

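The sizing arithmetic in this paragraph can be written out as a small
sketch. The helper names are hypothetical. sketch_table_size is only a
stand-in for the 2/3 rule of thumb, not the algorithm dbzsize actually uses,
and the bytes-per-entry figures are passed in as parameters because the real
values come from the platform and from dbz.h.

```c
/* Hypothetical stand-in for the 2/3 rule: the smallest table size for
 * which nentries is at most 2/3 of the table. Not dbzsize itself. */
long sketch_table_size(long nentries)
{
    if (nentries <= 0)
        return 0;
    return (nentries * 3 + 1) / 2;      /* ceil(3 * nentries / 2) */
}

/* Estimated .index file size: one offset per table slot. */
long long sketch_index_bytes(long size, long offset_bytes)
{
    return (long long)size * offset_bytes;
}

/* Estimated .hash file size: hash_bytes (DBZ_INTERNAL_HASH_SIZE in the
 * text above) stored per table slot. */
long long sketch_hash_bytes(long size, long hash_bytes)
{
    return (long long)size * hash_bytes;
}
```

For 1,000,000 expected entries the sketch suggests a table of 1,500,000
slots. With 4-byte offsets that is an .index file of 6,000,000 bytes; with
an assumed 6 bytes of hash per slot, the .hash file would be 9,000,000
bytes.
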
Dbz stores up to DBZ_INTERNAL_HASH_SIZE bytes of the message-id's hash in
the .hash file to confirm a hit. This eliminates the need to read the base
file to handle collisions. This replaces the tagmask feature in previous dbz
releases.

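Confirming a hit against stored hash bytes amounts to a fixed-length byte
comparison. The sketch below assumes that the stored bytes are the leading
bytes of the key's hash; which bytes dbz actually keeps is not stated here,
and the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the stored_len bytes kept in the table match the leading
 * bytes of the full hash of the key being looked up, else 0.
 * Assumption: the table keeps a prefix of the hash. */
int sketch_confirm_hit(const unsigned char *stored,
                       const unsigned char *full_hash,
                       size_t stored_len)
{
    return memcmp(stored, full_hash, stored_len) == 0;
}
```
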
A size of ``0'' given to dbzfresh is synonymous with the local default; the
normal default is suitable for tables of 5,000,000 key-value pairs. Calling
dbzinit(name) with empty (zero-length) database files is equivalent to
calling dbzfresh(name, 0).

When databases are regenerated periodically, as in news, it is simplest to
pick the parameters for a new database based on the old one. This also
permits some memory of past sizes of the old database, so that a new
database size can be chosen to cover expected fluctuations. Dbzagain is a
variant of dbzinit for creating a new database as a new generation of an old
database. The database files for oldbase must exist. Dbzagain is equivalent
to calling dbzfresh with a size equal to the result of applying dbzsize to
the largest number of entries in the oldbase database and its previous 10
generations.

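The size selection described for dbzagain can be sketched as taking the peak
entry count over the remembered generations and sizing the new table from
it. The function names are hypothetical, the entry counts are supplied by
the caller rather than read from a .dir file, and the 3/2 factor is only a
stand-in for whatever dbzsize really returns.

```c
#include <stddef.h>

/* Largest entry count among the remembered generations (0 if none). */
long sketch_peak_entries(const long *counts, size_t n)
{
    long peak = 0;
    for (size_t i = 0; i < n; i++)
        if (counts[i] > peak)
            peak = counts[i];
    return peak;
}

/* Size for the next generation: the peak count scaled so that it fills
 * at most 2/3 of the table. Stand-in for dbzsize(peak), not the real
 * function. */
long sketch_next_size(const long *counts, size_t n)
{
    long peak = sketch_peak_entries(counts, n);
    return (peak * 3 + 1) / 2;          /* ceil(3 * peak / 2) */
}
```

Sizing from the peak rather than from the latest count is what lets the new
database absorb the fluctuations mentioned above without being resized.
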
When many accesses are being done by the same program, dbz is massively
faster if its first hash table is in memory. If the ``pag_incore'' flag is
set to INCORE_MEM, an attempt is made to read the table in when the database
is opened, and dbzclose writes it out to disk again (if it was read
successfully and has been modified). Dbzsetoptions can be used to set the
pag_incore and exists_incore flags to new values, which should be
``INCORE_NO'', ``INCORE_MEM'', or ``INCORE_MMAP''; the flags control the
.hash and .index files separately. Changing them does not affect the status
of a database that has already been opened. The default is ``INCORE_NO'' for
the .index file and ``INCORE_MMAP'' for the .hash file. The attempt to read
the table in may fail due to memory shortage; in this case dbz fails with an
error. Stores to an in-memory database are not (in general) written out to
the file until dbzclose or dbzsync, so if robustness in the presence of
crashes or concurrent accesses is crucial, in-memory databases should
probably be avoided or the writethrough option should be set to ``TRUE''.

If the nonblock option is ``TRUE'', then writes to the .hash and .index
files will be done using non-blocking I/O. This can be significantly faster
if your platform supports non-blocking I/O with files.

Dbzsync causes all buffers etc. to be flushed out to the files. It is
typically used as a precaution against crashes or concurrent accesses when a
dbz-using process will be running for a long time. It is a somewhat
expensive operation, especially for an in-memory database.

If dbz has been compiled with debugging facilities available (which makes it
bigger and a bit slower), dbzdebug alters the value (and returns the
previous value) of an internal flag which (when 1; default is 0) causes
verbose and cryptic debugging output on standard output.

Concurrent reading of databases is fairly safe, but there is no
(inter)locking, so concurrent updating is not.

An open database occupies three stdio streams and two file descriptors.
Memory consumption is negligible (apart from stdio buffers) except for
in-memory databases.

See Also

dbm(3), history(5), libinn(3)

Diagnostics

Functions returning BOOL values return ``TRUE'' for success, ``FALSE'' for
failure. Functions returning OFFSET_T values return -1 for failure. Dbzinit
attempts to have errno set plausibly on return, but otherwise this is not
guaranteed. An errno of EDOM from dbzinit indicates that the database did
not appear to be in dbz format.

If DBZTEST is defined at compile time, then a main() function will be
included. This will run performance tests and integrity tests.

History

The original dbz was written by Jon Zeeff (zeeff@b-tech.ann-arbor.mi.us).
Later contributions by David Butler and Mark Moraes. Extensive reworking,
including this documentation, by Henry Spencer (henry@zoo.toronto.edu) as
part of the C News project. MD5 code borrowed from RSA. Extensive reworking
to remove backwards compatibility and to add hashes into dbz files by
Clayton O'Neill (coneill@oneill.net).

Bugs

Unlike dbm, dbz will refuse to dbzstore with a key already in the database.
The user is responsible for avoiding this.

The RFC822 case mapper implements only a first approximation to the
hideously-complex RFC822 case rules.

Dbz no longer tries to be call-compatible with dbm in any way.