Option B — SQLite VFS substrate on FoundationDB

Pattern: SQLite (or any pager-style engine) with FDB as the disk. We implement the byte-range storage primitive a SQLite VFS sits on top of — fixed-size pages keyed by page number, atomic page-level updates — and demonstrate it through a small Go API. Wiring the C-level VFS hooks is mechanical glue that’s left for a follow-up.

What a SQLite VFS actually needs

SQLite’s pager talks to “the OS” through a thin C interface declared in sqlite3.h (the sqlite3_vfs and sqlite3_io_methods structs). Boiled down, a VFS file handle exposes:

SQLite call               What we map it to
xRead(buf, n, offset)     File.ReadAt(buf, offset)
xWrite(buf, n, offset)    File.WriteAt(buf, offset)
xTruncate(size)           File.Truncate(size)
xFileSize()               File.Size()
xLock / xUnlock           File.Lock(holder) / File.Unlock(holder)
xSync                     no-op (every WriteAt is already durable in FDB)

For the main database file, SQLite reads and writes whole pages (4096 bytes by default) apart from its initial read of the 100-byte file header, so storing one FDB KV per page is a natural fit and makes the pager-to-storage mapping 1:1.
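The offset-to-page arithmetic behind that mapping can be sketched as follows. This is a minimal illustration, not the pagestore code itself: the names pageSpan and splitRange are hypothetical, and the sketch assumes a fixed 4096-byte page and non-negative offsets.

```go
package main

import "fmt"

const pageSize = 4096 // default SQLite page size assumed in the text

// pageSpan describes the part of one page touched by a byte range.
// (Hypothetical helper type, not part of the original pagestore API.)
type pageSpan struct {
	Page  uint64 // page number, i.e. offset / pageSize
	Start int    // first affected byte within the page
	End   int    // one past the last affected byte within the page
}

// splitRange breaks the byte range [offset, offset+n) into per-page spans.
func splitRange(offset int64, n int) []pageSpan {
	var spans []pageSpan
	for n > 0 {
		start := int(offset % pageSize)
		end := start + n
		if end > pageSize {
			end = pageSize // the rest spills into the next page
		}
		spans = append(spans, pageSpan{
			Page:  uint64(offset / pageSize),
			Start: start,
			End:   end,
		})
		n -= end - start
		offset += int64(end - start)
	}
	return spans
}

func main() {
	// A page-aligned, page-sized access touches exactly one KV.
	fmt.Println(splitRange(8192, 4096)) // [{2 0 4096}]
	// An unaligned access straddles two pages.
	fmt.Println(splitRange(4090, 100)) // [{0 4090 4096} {1 0 94}]
}
```

Aligned page-sized requests always yield a single span, which is the 1:1 case; only the header access and unaligned test writes produce partial spans.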

Key layout

<ns> 0x00                            -> uint64 BE  file size in bytes
<ns> 0x01 <pageNum:uint64 BE>        -> 4096-byte page
<ns> 0x02                            -> lock holder name (or absent)
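As a rough sketch of how those keys might be built in Go: the helper names below (sizeKey, pageKey, lockKey, tagged) are assumptions for illustration, and the namespace is treated as an opaque byte prefix.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// Tag bytes from the key layout above.
const (
	tagSize byte = 0x00
	tagPage byte = 0x01
	tagLock byte = 0x02
)

// tagged returns a fresh key consisting of the namespace plus one tag byte.
func tagged(ns []byte, tag byte) []byte {
	k := make([]byte, 0, len(ns)+1)
	k = append(k, ns...)
	return append(k, tag)
}

func sizeKey(ns []byte) []byte { return tagged(ns, tagSize) }
func lockKey(ns []byte) []byte { return tagged(ns, tagLock) }

// pageKey appends the page number as a big-endian uint64 so that
// byte-wise key order matches numeric page order.
func pageKey(ns []byte, pageNum uint64) []byte {
	var num [8]byte
	binary.BigEndian.PutUint64(num[:], pageNum)
	return append(tagged(ns, tagPage), num[:]...)
}

func main() {
	ns := []byte("db")
	fmt.Printf("%x\n", sizeKey(ns))    // 646200
	fmt.Printf("%x\n", pageKey(ns, 1)) // 6462010000000000000001
	fmt.Printf("%x\n", lockKey(ns))    // 646202
	// Page 255 sorts before page 256 under byte-wise comparison.
	fmt.Println(bytes.Compare(pageKey(ns, 255), pageKey(ns, 256)) < 0) // true
}
```

Big-endian encoding matters here because FDB orders keys lexicographically: with it, "all pages at or beyond N" is one contiguous key range, which is what a truncate wants to clear.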

Where the transactional magic lives

The interesting method is WriteAt. For partial-page writes, we:

  1. Issue tr.Get for every affected page.
  2. Merge the existing page bytes with the new bytes.
  3. tr.Set the resulting full page.
  4. Update the file-size key if we grew.

All inside one FDB transaction. That gives SQLite a property it cannot get from a normal filesystem: a write that spans several pages is atomic. SQLite has elaborate journal/WAL machinery to recover from “we crashed halfway through updating pages 17, 18, and 19.” For any update that reaches this VFS as a single WriteAt, that recovery path is never exercised: either all three pages flipped or none did.
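The four steps can be sketched against an in-memory stand-in for the store. Everything here is illustrative: memStore is a hypothetical type whose map plays the role of the page keys, and treating a missing page as zeros is an assumption. With the real FDB Go bindings, the map accesses would become tr.Get and tr.Set calls inside a single transaction, which is what makes the whole loop atomic.

```go
package main

import (
	"bytes"
	"fmt"
)

const pageSize = 4096

// memStore is a hypothetical in-memory stand-in for the FDB keyspace:
// pages models the <ns> 0x01 keys, size models the <ns> 0x00 key.
type memStore struct {
	pages map[uint64][]byte
	size  int64
}

func newMemStore() *memStore { return &memStore{pages: map[uint64][]byte{}} }

// WriteAt performs the read-modify-write merge page by page.
func (s *memStore) WriteAt(p []byte, off int64) {
	if len(p) == 0 {
		return
	}
	end := off + int64(len(p))
	for len(p) > 0 {
		pageNum := uint64(off / pageSize)
		start := int(off % pageSize)
		// Step 1: fetch the existing page; a missing page reads as zeros.
		page := make([]byte, pageSize)
		copy(page, s.pages[pageNum])
		// Step 2: merge the new bytes over the old ones.
		n := copy(page[start:], p)
		// Step 3: store the resulting full page.
		s.pages[pageNum] = page
		p = p[n:]
		off += int64(n)
	}
	// Step 4: bump the file size only if the write grew the file.
	if end > s.size {
		s.size = end
	}
}

// ReadAt fills p from the stored pages, reading holes as zeros.
func (s *memStore) ReadAt(p []byte, off int64) {
	for len(p) > 0 {
		page := make([]byte, pageSize)
		copy(page, s.pages[uint64(off/pageSize)])
		n := copy(p, page[int(off%pageSize):])
		p = p[n:]
		off += int64(n)
	}
}

func main() {
	s := newMemStore()
	s.WriteAt([]byte("SQLite format 3\x00"), 0)
	fmt.Printf("size=%d\n", s.size) // size=16
	s.WriteAt(bytes.Repeat([]byte{'A'}, 100), 4090)
	fmt.Printf("size=%d\n", s.size) // size=4190
	hdr := make([]byte, 16)
	s.ReadAt(hdr, 0)
	fmt.Printf("%q\n", hdr) // "SQLite format 3\x00"
}
```

The merge in step 2 is what keeps the header intact when a later write touches only the tail of page 0.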

In practice you’d still set PRAGMA journal_mode = MEMORY (so SQLite keeps its rollback journal in RAM instead of writing it through the VFS) and rely on FDB’s transactional commit as the single durability point.

Hooking this into a real SQLite

There are two paths:

  1. cgo + mattn/go-sqlite3: register a custom sqlite3_vfs whose xRead/xWrite/... thunks call into our pagestore.File methods via a CGO bridge. ~300 lines of glue.
  2. modernc.org/sqlite (or zombiezen.com/go/sqlite, which is built on it): pure-Go SQLite. Its vfs subpackage exposes Register / VFS types. Same wiring, no CGO.

Either way the interesting code is what’s already here — the C/Go thunks add no further insight into how FDB serves as the storage tier.

Running the demo

cd option-b-sqlite
go mod tidy
go run ./demo -cluster ../fdb.cluster

Expected output:

After writing header (16 B): size=16
After 100-byte cross-page write at offset 4090: size=4190
Read 100 B back, all 'A'? true
Header preserved? "SQLite format 3\x00"
After truncate to 4096: size=4096
Lock contention as expected: pagestore: locked by "conn-A"
Lock handoff OK.

The cross-page write at offset 4090 is the key correctness test: it spans page 0 (bytes 4090–4095) and page 1 (bytes 4096–4189). The output above proves that:

  • The pre-existing 16-byte header on page 0 was preserved during the read-modify-write merge.
  • Both halves of the payload made it to disk.
  • A subsequent Truncate(4096) cleared page 1 entirely.
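The truncate behaviour in the last point might look roughly like the sketch below, again over an in-memory map rather than FDB. The pages type and truncate function are hypothetical names; zeroing the tail of a partially kept page is an assumption added here so that a later grow does not re-expose stale bytes. Against FDB, the deletion loop would presumably be one range clear over the page keys from the first dropped page onward.

```go
package main

import "fmt"

const pageSize = 4096

// pages is a hypothetical in-memory stand-in for the <ns> 0x01 page keys.
type pages map[uint64][]byte

// truncate drops every page that lies wholly at or beyond newSize and
// zeroes the unused tail of the last surviving page, if it is partial.
// newSize is assumed to be non-negative.
func truncate(ps pages, newSize int64) {
	// Pages 0..keep-1 survive; keep is ceil(newSize / pageSize).
	keep := uint64((newSize + pageSize - 1) / pageSize)
	for n := range ps {
		if n >= keep {
			delete(ps, n) // deleting during range is safe in Go
		}
	}
	if tail := int(newSize % pageSize); tail != 0 {
		if page, ok := ps[keep-1]; ok {
			for i := tail; i < len(page); i++ {
				page[i] = 0
			}
		}
	}
}

func main() {
	ps := pages{
		0: make([]byte, pageSize),
		1: make([]byte, pageSize),
	}
	truncate(ps, 4096)
	_, hasPage1 := ps[1]
	fmt.Println(len(ps), hasPage1) // 1 false
}
```

Truncating to an exact page boundary, as the demo does, takes only the deletion path; the tail-zeroing branch matters for sizes that land mid-page.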

What’s deliberately omitted

  • WAL mode. Skipping it forces SQLite into rollback-journal mode, which emits ordinary writes only — exactly what our VFS supports. WAL would require a second “shared-memory” backing store (xShmMap) and is the single largest source of complexity in real VFS implementations.
  • Multi-process locking. The lock key works for cooperating clients but has no liveness guarantee on holder crash. Production code would combine it with FDB watches and a heartbeat.
  • The C bridge itself. See docs/README.md above for the wiring sketch.
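On the multi-process locking point, one way a heartbeat could close the liveness gap is to store an expiry alongside the holder name, so a crashed holder's lock lapses by itself. The sketch below is an assumption about how that might look, not what the demo implements: lockCell and lease are hypothetical types, timestamps are plain Unix seconds supplied by the caller, and in FDB the check-and-set would run inside one transaction on the lock key.

```go
package main

import "fmt"

// lease is the hypothetical value stored under the lock key:
// the holder name plus an expiry time in Unix seconds.
type lease struct {
	Holder  string
	Expires int64
}

// lockCell stands in for the single <ns> 0x02 key.
type lockCell struct{ cur *lease }

// Lock acquires or renews the lease. Calling it again as the current
// holder acts as the heartbeat; another holder succeeds only once the
// existing lease has expired.
func (l *lockCell) Lock(holder string, now, ttl int64) error {
	if l.cur != nil && l.cur.Holder != holder && now < l.cur.Expires {
		return fmt.Errorf("pagestore: locked by %q", l.cur.Holder)
	}
	l.cur = &lease{Holder: holder, Expires: now + ttl}
	return nil
}

// Unlock releases the lease if, and only if, holder currently owns it.
func (l *lockCell) Unlock(holder string) {
	if l.cur != nil && l.cur.Holder == holder {
		l.cur = nil
	}
}

func main() {
	var l lockCell
	fmt.Println(l.Lock("conn-A", 0, 10))  // <nil>
	fmt.Println(l.Lock("conn-B", 5, 10))  // pagestore: locked by "conn-A"
	fmt.Println(l.Lock("conn-B", 11, 10)) // <nil>, conn-A's lease lapsed
}
```

A lease like this trades safety for liveness: if a holder stalls past its expiry without crashing, two clients can briefly both believe they hold the lock, so the TTL has to be chosen against the longest plausible pause.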