Option B — SQLite VFS substrate on FoundationDB
Pattern: SQLite (or any pager-style engine) with FDB as the disk. We implement the byte-range storage primitive a SQLite VFS sits on top of — fixed-size pages keyed by page number, atomic page-level updates — and demonstrate it through a small Go API. Wiring the C-level VFS hooks is mechanical glue that’s left for a follow-up.
What a SQLite VFS actually needs
SQLite’s pager talks to “the OS” through a thin C interface defined in sqlite3.h (the sqlite3_vfs and sqlite3_io_methods structs). Boiled down, a VFS file handle exposes:
| SQLite call | What we map it to |
|---|---|
| xRead(buf, n, offset) | File.ReadAt(buf, offset) |
| xWrite(buf, n, offset) | File.WriteAt(buf, offset) |
| xTruncate(size) | File.Truncate(size) |
| xFileSize() | File.Size() |
| xLock / xUnlock | File.Lock(holder) / File.Unlock(holder) |
| xSync | no-op: every WriteAt is already durable in FDB |
For the main database file, SQLite reads and writes whole pages (4096 bytes by default), apart from small reads within the 100-byte file header. Storing one FDB KV per page is therefore a natural fit and makes the pager-to-storage mapping 1:1.
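The right-hand column of the table can be read as a Go interface. The sketch below is one possible shape for it, not the actual pagestore API: the method names come from the table, while the exact signatures, the memFile type, and the error messages are assumptions. It is backed by an in-memory byte slice so it runs without a cluster.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// File is a hypothetical Go view of the calls in the table above.
// Only the method names come from the text; the signatures are assumed.
type File interface {
	io.ReaderAt
	io.WriterAt
	Truncate(size int64) error
	Size() (int64, error)
	Lock(holder string) error
	Unlock(holder string) error
}

// memFile is a stand-in backend that keeps the whole file in memory.
type memFile struct {
	data   []byte
	holder string
}

var _ File = (*memFile)(nil) // compile-time interface check

func (f *memFile) ReadAt(p []byte, off int64) (int, error) {
	if off >= int64(len(f.data)) {
		return 0, io.EOF
	}
	n := copy(p, f.data[off:])
	if n < len(p) {
		return n, io.EOF // short read at end of file
	}
	return n, nil
}

func (f *memFile) WriteAt(p []byte, off int64) (int, error) {
	if end := off + int64(len(p)); end > int64(len(f.data)) {
		// Grow with zero bytes so the write fits.
		f.data = append(f.data, make([]byte, end-int64(len(f.data)))...)
	}
	return copy(f.data[off:], p), nil
}

func (f *memFile) Truncate(size int64) error {
	if size < 0 {
		return errors.New("negative size")
	}
	if size <= int64(len(f.data)) {
		f.data = f.data[:size]
		return nil
	}
	f.data = append(f.data, make([]byte, size-int64(len(f.data)))...)
	return nil
}

func (f *memFile) Size() (int64, error) { return int64(len(f.data)), nil }

func (f *memFile) Lock(holder string) error {
	if f.holder != "" && f.holder != holder {
		return fmt.Errorf("locked by %q", f.holder)
	}
	f.holder = holder
	return nil
}

func (f *memFile) Unlock(holder string) error {
	if f.holder != holder {
		return errors.New("not the lock holder")
	}
	f.holder = ""
	return nil
}

func main() {
	var f File = &memFile{}
	f.WriteAt([]byte("SQLite format 3\x00"), 0)
	n, _ := f.Size()
	fmt.Println("size:", n)
	fmt.Println(f.Lock("conn-A"), f.Lock("conn-B"))
}
```

Embedding io.ReaderAt and io.WriterAt keeps the interface compatible with standard-library helpers such as io.NewSectionReader, which is convenient for testing.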
Key layout
<ns> 0x00 -> uint64 BE file size in bytes
<ns> 0x01 <pageNum:uint64 BE> -> 4096-byte page
<ns> 0x02 -> lock holder name (or absent)
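As an illustration of the layout above, the keys can be built with encoding/binary. This is a minimal sketch: the helper names (sizeKey, pageKey, lockKey, encodeSize) and the namespace value "db" are made up for the example and may not match the real code.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Tag bytes from the key layout above.
const (
	tagSize byte = 0x00
	tagPage byte = 0x01
	tagLock byte = 0x02
)

// tagged returns <ns> followed by a single tag byte.
func tagged(ns []byte, tag byte) []byte {
	k := make([]byte, len(ns)+1)
	copy(k, ns)
	k[len(ns)] = tag
	return k
}

func sizeKey(ns []byte) []byte { return tagged(ns, tagSize) }
func lockKey(ns []byte) []byte { return tagged(ns, tagLock) }

// pageKey returns <ns> 0x01 <pageNum as uint64 big-endian>.
func pageKey(ns []byte, pageNum uint64) []byte {
	k := make([]byte, len(ns)+1+8)
	n := copy(k, ns)
	k[n] = tagPage
	binary.BigEndian.PutUint64(k[n+1:], pageNum)
	return k
}

// encodeSize returns the value stored under the size key.
func encodeSize(size uint64) []byte {
	b := make([]byte, 8)
	binary.BigEndian.PutUint64(b, size)
	return b
}

func main() {
	ns := []byte("db") // hypothetical namespace prefix
	fmt.Printf("size key: %x\n", sizeKey(ns))
	fmt.Printf("page 1:   %x\n", pageKey(ns, 1))
	fmt.Printf("lock key: %x\n", lockKey(ns))
	fmt.Printf("size val: %x\n", encodeSize(4190))
}
```

Big-endian encoding matters here: it makes the byte order of the keys match the numeric order of the page numbers, so a range read over the 0x01 prefix returns pages in file order.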
Where the transactional magic lives
The interesting method is WriteAt. For partial-page writes, we:
- Issue tr.Get for every affected page.
- Merge the existing page bytes with the new bytes.
- tr.Set the resulting full page.
- Update the file-size key if we grew.
All inside one FDB transaction. That gives SQLite a property it cannot get from a normal filesystem: a write that spans several pages is atomic. SQLite has elaborate journal/WAL machinery to recover from “we crashed halfway through updating pages 17, 18, and 19.” On this VFS, for pages written in the same transaction, that recovery code becomes dead: either all three pages flipped or none did.
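The steps above can be sketched without a cluster by modelling the transaction as a set of staged pages that is applied in one go. This is an illustration only: the store type and writeAt function are hypothetical, the map stands in for FDB, and the staged map stands in for the transaction's buffered writes.

```go
package main

import "fmt"

const pageSize = 4096

// store is an in-memory stand-in for the FDB keyspace of one file.
type store struct {
	pages map[uint64][]byte // page number -> full page
	size  int64             // value of the file-size key
}

// writeAt performs a read-modify-write of every page touched by p.
// All changes are staged first and applied together at the end, which
// models a single transaction commit.
func (s *store) writeAt(p []byte, off int64) {
	staged := map[uint64][]byte{}
	for done := 0; done < len(p); {
		pos := off + int64(done)
		pageNum := uint64(pos / pageSize)
		inPage := int(pos % pageSize)

		page := make([]byte, pageSize)
		copy(page, s.pages[pageNum])       // read the existing page (zeros if absent)
		n := copy(page[inPage:], p[done:]) // merge the new bytes into it
		staged[pageNum] = page             // stage the resulting full page
		done += n
	}

	// "Commit": every staged page becomes visible together.
	for pageNum, page := range staged {
		s.pages[pageNum] = page
	}
	if end := off + int64(len(p)); end > s.size {
		s.size = end // update the size only if the file grew
	}
}

func main() {
	s := &store{pages: map[uint64][]byte{}}
	s.writeAt([]byte("SQLite format 3\x00"), 0)

	payload := make([]byte, 100)
	for i := range payload {
		payload[i] = 'A'
	}
	s.writeAt(payload, 4090) // touches pages 0 and 1

	fmt.Println("size:", s.size)
	fmt.Printf("header: %q\n", s.pages[0][:16])
}
```

In the real implementation the staging and the commit would be handled by the FDB transaction itself; the point of the sketch is only the per-page merge arithmetic and the fact that nothing is applied until every page has been prepared.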
In practice you’d still set PRAGMA journal_mode = MEMORY (so SQLite keeps the rollback journal in memory instead of writing a journal file it doesn’t need) and rely on FDB’s transactional commit as the single durability point.
Hooking this into a real SQLite
There are two paths:
- cgo + mattn/go-sqlite3: register a custom sqlite3_vfs whose xRead/xWrite/... thunks call into our pagestore.File methods via a CGO bridge. ~300 lines of glue.
- modernc.org/sqlite (the engine underneath zombiezen.com/go/sqlite): pure-Go SQLite. Its vfs subpackage exposes Register/VFS types. Same wiring, no CGO.
Either way the interesting code is what’s already here — the C/Go thunks add no further insight into how FDB serves as the storage tier.
Running the demo
cd option-b-sqlite
go mod tidy
go run ./demo -cluster ../fdb.cluster
Expected output:
After writing header (16 B): size=16
After 100-byte cross-page write at offset 4090: size=4190
Read 100 B back, all 'A'? true
Header preserved? "SQLite format 3\x00"
After truncate to 4096: size=4096
Lock contention as expected: pagestore: locked by "conn-A"
Lock handoff OK.
The cross-page write at offset 4090 is the key correctness test: it spans page 0 (bytes 4090–4095) and page 1 (bytes 4096–4189). The output above proves that:
- The pre-existing 16-byte header on page 0 was preserved during the read-modify-write merge.
- Both halves of the payload made it to disk.
- A subsequent Truncate(4096) cleared page 1 entirely.
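For the truncate step, a minimal sketch of the page arithmetic is shown below, again against an in-memory map rather than FDB. The truncate function is hypothetical; zeroing the tail of a partially kept page is an assumption about what a careful implementation would do, and the sketch only handles shrinking.

```go
package main

import "fmt"

const pageSize = 4096

// truncate shrinks the file to size bytes and returns the new size.
// Pages that lie wholly beyond the new end are deleted; if the new end
// falls inside a page, the bytes after it are zeroed so that a later
// grow cannot expose stale data.
func truncate(pages map[uint64][]byte, size int64) int64 {
	// First page number that is entirely past the new end of file.
	firstGone := uint64((size + pageSize - 1) / pageSize)
	for pageNum := range pages {
		if pageNum >= firstGone {
			delete(pages, pageNum) // deleting during range is safe in Go
		}
	}
	if rem := size % pageSize; rem != 0 {
		if page, ok := pages[uint64(size/pageSize)]; ok {
			for i := rem; i < pageSize; i++ {
				page[i] = 0
			}
		}
	}
	return size
}

func main() {
	// Recreate the state after the cross-page write: two pages, 4190 bytes.
	pages := map[uint64][]byte{
		0: make([]byte, pageSize),
		1: make([]byte, pageSize),
	}
	size := truncate(pages, 4096)
	_, hasPage1 := pages[1]
	fmt.Println("size:", size, "page 1 present:", hasPage1)
}
```

Against FDB, the loop over page numbers would more likely be a single range clear over the page keys, issued in the same transaction as the size update.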
What’s deliberately omitted
- WAL mode. Skipping it forces SQLite into rollback-journal mode, which emits ordinary writes only, exactly what our VFS supports. WAL would require a second “shared-memory” backing store (xShmMap) and is the single largest source of complexity in real VFS implementations.
- Multi-process locking. The lock key works for cooperating clients but has no liveness guarantee on holder crash. Production code would combine it with FDB watches and a heartbeat.
- The C bridge itself. See docs/README.md above for the wiring sketch.
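To make the liveness gap in the locking bullet concrete, here is one way a heartbeat could work: the lock value carries an expiry, the holder re-acquires periodically to extend it, and another client may take over once the lease has lapsed. This is a sketch of a generic lease scheme, not the author's design; the lockState and lockKey types, the acquire and release functions, and the millisecond timestamps are all hypothetical. The error text mirrors the demo output.

```go
package main

import "fmt"

// lockState models the value stored under the lock key, extended with an
// expiry time. Times are plain int64 milliseconds to keep the sketch
// deterministic; real code would need a clock source all clients agree on.
type lockState struct {
	holder  string
	expires int64
}

// lockKey models the lock key itself; a nil cur means the key is absent.
type lockKey struct{ cur *lockState }

// acquire takes the lock, or extends it if holder already owns it.
// It fails only while a different holder's lease is still live.
func (k *lockKey) acquire(holder string, now, ttl int64) error {
	if k.cur != nil && k.cur.holder != holder && now < k.cur.expires {
		return fmt.Errorf("pagestore: locked by %q", k.cur.holder)
	}
	k.cur = &lockState{holder: holder, expires: now + ttl}
	return nil
}

// release clears the lock if holder still owns it.
func (k *lockKey) release(holder string) {
	if k.cur != nil && k.cur.holder == holder {
		k.cur = nil
	}
}

func main() {
	var k lockKey
	fmt.Println(k.acquire("conn-A", 0, 1000))    // conn-A takes the lock
	fmt.Println(k.acquire("conn-B", 500, 1000))  // refused: lease still live
	fmt.Println(k.acquire("conn-A", 900, 1000))  // heartbeat extends the lease
	fmt.Println(k.acquire("conn-B", 2000, 1000)) // conn-A went silent: takeover
}
```

In an FDB-backed version, the check and the update would run in one transaction so two clients cannot both take over, and a waiting client could set a watch on the lock key instead of polling.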