Lightning Memory-Mapped Database Manager (LMDB)

Introduction

LMDB is a Btree-based database management library modeled loosely on the
BerkeleyDB API, but much simplified. The entire database is exposed
in a memory map, and all data fetches return data directly
from the mapped memory, so no malloc or memcpy calls occur during
data fetches. As such, the library is extremely simple because it
requires no page caching layer of its own, and it is extremely high
performance and memory-efficient. It is also fully transactional with
full ACID semantics, and when the memory map is read-only, the
database integrity cannot be corrupted by stray pointer writes from
application code.
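
The zero-copy read path described above can be illustrated with a minimal
sketch. It uses Python's standard mmap module rather than LMDB itself, and
the record layout, file name and the function name fetch_zero_copy are
assumptions made up for this example, not part of the library.

```python
import mmap
import os
import tempfile


def fetch_zero_copy():
    """Read a value through a read-only memory map without copying it first."""
    # Hypothetical record layout for this sketch only (not LMDB's page format):
    # a 4-byte little-endian length followed by the value bytes.
    value = b"hello, mapped world"
    record = len(value).to_bytes(4, "little") + value

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.bin")
        with open(path, "wb") as f:
            f.write(record)

        with open(path, "rb") as f, \
                mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m, \
                memoryview(m) as view:
            size = int.from_bytes(view[:4], "little")
            # Slicing a memoryview yields a window onto the mapped pages;
            # no bytes are copied until the caller asks for a copy.
            with view[4:4 + size] as fetched:
                read_only = fetched.readonly   # the read-only map rejects writes
                copied = bytes(fetched)        # explicit copy, made on demand
    return copied, read_only
```

Because the map is opened read-only, the view handed to the caller cannot
be written through, which mirrors the protection against stray pointer
writes described above.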

The library is fully thread-aware and supports concurrent read/write
access from multiple processes and threads. Data pages use a copy-on-
write strategy so no active data pages are ever overwritten, which
also provides resistance to corruption and eliminates the need for any
special recovery procedures after a system crash. Writes are fully
serialized; only one write transaction may be active at a time, which
guarantees that writers can never deadlock. The database structure is
multi-versioned so readers run with no locks; writers cannot block
readers, and readers don't block writers.
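
As a rough illustration of the copy-on-write, multi-versioned scheme, here
is a toy model. It is a sketch under simplifying assumptions: ToyStore is a
hypothetical name, and the whole snapshot is copied on each write, whereas
a real B-tree would copy only the pages on the path to the change.

```python
import threading


class ToyStore:
    """Toy multi-version store: each commit publishes a new snapshot."""

    def __init__(self):
        self._root = {}                      # committed snapshot, never mutated
        self._write_lock = threading.Lock()  # writes are fully serialized

    def begin_read(self):
        # Readers take no lock: they keep a reference to the current root.
        return self._root

    def write(self, key, value):
        with self._write_lock:
            new_root = dict(self._root)  # copy; the old snapshot stays intact
            new_root[key] = value
            self._root = new_root        # "commit" by swapping the root


store = ToyStore()
store.write("a", 1)
snapshot = store.begin_read()   # reader starts here
store.write("a", 2)             # later commits do not disturb the reader
store.write("b", 3)
print(snapshot)                 # {'a': 1}
print(store.begin_read())       # {'a': 2, 'b': 3}
```

The reader's snapshot stays valid for as long as the reader holds it, which
is why old pages cannot be reused while a reader still refers to them.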

Unlike other well-known database mechanisms which use either write-ahead
transaction logs or append-only data writes, LMDB requires no maintenance
during operation. Both write-ahead loggers and append-only databases
require periodic checkpointing and/or compaction of their log or database
files; otherwise they grow without bound. LMDB tracks free pages within
the database and re-uses them for new write operations, so the database
size does not grow without bound in normal use.
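
The effect of free-page reuse can be sketched with a toy allocator. The
names ToyPageAllocator and run_updates are hypothetical, and the model
ignores readers entirely; it only contrasts reusing freed pages with an
append-only strategy that never reuses them.

```python
class ToyPageAllocator:
    """Toy page allocator that optionally reuses freed pages."""

    def __init__(self, reuse=True):
        self.reuse = reuse
        self.file_pages = 0   # number of pages the file has grown to
        self.free_pages = []  # freed page numbers available for reuse

    def allocate(self):
        if self.reuse and self.free_pages:
            return self.free_pages.pop()
        page = self.file_pages  # no free page: grow the file by one page
        self.file_pages += 1
        return page

    def free(self, page):
        self.free_pages.append(page)


def run_updates(allocator, updates):
    """Apply copy-on-write updates to one page; return the file size in pages."""
    current = allocator.allocate()
    for _ in range(updates):
        new = allocator.allocate()  # write the modified copy to a new page
        allocator.free(current)     # the old copy is no longer needed
        current = new
    return allocator.file_pages


print(run_updates(ToyPageAllocator(reuse=True), 1000))   # 2
print(run_updates(ToyPageAllocator(reuse=False), 1000))  # 1001
```

With reuse, the file settles at two pages no matter how many updates are
applied; without it, every update adds a page.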

The memory map can be used as a read-only or read-write map. It is
read-only by default as this provides total immunity to corruption.
Using read-write mode offers much higher write performance, but adds
the possibility of stray application writes through pointers silently
corrupting the database. Of course if your application code is known to
be bug-free (...) then this is not an issue.

Caveats

Troubleshooting the lock file, plus semaphores on BSD systems:

- A broken lockfile can cause sync issues.
  Stale reader transactions left behind by an aborted program
  cause further writes to grow the database quickly, and
  stale locks can block further operation.

  Fix: Check for stale readers periodically, using the
  [[reader_check]] function or the "mdb_stat" tool.
  Stale writers will be cleared automatically on some systems:
  - Windows - automatic
  - Linux, systems using POSIX mutexes with Robust option - automatic
  - BSD, systems using POSIX semaphores - not automatic
  Otherwise just make all programs using the database close it;
  the lockfile is always reset on first open of the environment.

- On BSD systems or others configured with [[USE_POSIX_SEM]],
  startup can fail due to semaphores owned by another userid.

  Fix: Open and close the database as the user which owns the
  semaphores (likely last user) or as root, while no other
  process is using the database.

Restrictions/caveats (in addition to those listed for some functions):

- Only the database owner should normally use the database on
  BSD systems or when otherwise configured with [[USE_POSIX_SEM]].
  Multiple users can cause startup to fail later, as noted above.

- There is normally no pure read-only mode, since readers need write
  access to locks and lock file. Exceptions: On read-only filesystems
  or with the [[NOLOCK]] flag described under [[env_open]].

- An LMDB configuration will often reserve considerable unused
  memory address space and maybe file size for future growth.
  This does not use actual memory or disk space, but users may need
  to understand the difference so they won't be scared off.

- By default, in versions before 0.9.10, unused portions of the data
  file might receive garbage data from memory freed by other code.
  (This does not happen when using the [[WRITEMAP]] flag.) As of
  0.9.10 the default behavior is to initialize such memory before
  writing to the data file. Since there may be a slight performance
  cost due to this initialization, applications may disable it using
  the [[NOMEMINIT]] flag. Applications handling sensitive data
  which must not be written should not use this flag. This flag is
  irrelevant when using [[WRITEMAP]].

- A thread can only use one transaction at a time, plus any child
  transactions.  Each transaction belongs to one thread.  See below.
  The [[NOTLS]] flag changes this for read-only transactions.

- Use an [[env]] in the process which opened it, not after
  [[rt::fork]].

- Do not have an LMDB database open twice in the same process at
  the same time.  Not even from a plain [[rt::open]] call - [[rt::close]]ing it
  breaks [[rt::fcntl]] advisory locking.  (It is OK to reopen it after
  [[rt::fork]] - [[rt::exec]], since the lockfile has [[rt::FD_CLOEXEC]] set.)

- Avoid long-lived transactions.  Read transactions prevent
  reuse of pages freed by newer write transactions, thus the
  database can grow quickly.  Write transactions prevent
  other write transactions, since writes are serialized.

- Avoid suspending a process with active transactions.  These
  would then be "long-lived" as above.  Also read transactions
  suspended when writers commit could sometimes see wrong data.

...when several processes can use a database concurrently:

- Avoid aborting a process with an active transaction.
  The transaction becomes "long-lived" as above until a check
  for stale readers is performed or the lockfile is reset,
  since the process may not remove it from the lockfile.

  This does not apply to write transactions if the system clears
  stale writers, see above.

- If you do that anyway, do a periodic check for stale readers. Or
  close the environment once in a while, so the lockfile can get reset.

- Do not use LMDB databases on remote filesystems, even between
  processes on the same host.  This breaks [[rt::flock]] on some OSes,
  possibly memory map sync, and certainly sync between programs
  on different hosts.

- Opening a database can fail if another process is opening or
  closing it at exactly the same time.