Option B — LevelDB on top of FoundationDB

Pattern: existing storage engine, FDB as the disk. We give the unmodified goleveldb library an FDB-backed implementation of its storage.Storage interface. LevelDB still does its LSM thing (memtables, SSTables, compaction, MANIFEST), but every byte ends up in FDB key ranges instead of on local disk.

This is the mirror image of option-a: here the storage engine sits above FDB rather than below it.

How LevelDB sees the world

goleveldb accesses persistence exclusively through a small interface:

type Storage interface {
    Lock() (Locker, error)
    Log(str string)
    SetMeta(FileDesc) error
    GetMeta() (FileDesc, error)
    List(FileType) ([]FileDesc, error)
    Open(FileDesc) (Reader, error)
    Create(FileDesc) (Writer, error)
    Remove(FileDesc) error
    Rename(old, new FileDesc) error
    Close() error
}

Each FileDesc is {Type, Num}: for example, {TypeTable, 42} for SST #42 or {TypeManifest, 7} for the manifest file numbered 7. Filenames are an implementation detail; LevelDB never looks at strings.

Our fdbstorage package implements that interface against FDB. The whole file is ~250 lines.

Key layout

<ns> 0x01 <ftype:1B> <num:int64 BE>                 -> uint64 BE file size
<ns> 0x02 <ftype:1B> <num:int64 BE> <chunk:uint32 BE> -> 64 KiB chunk
<ns> 0x03                                             -> current MANIFEST {ftype,num}
<ns> 0x04                                             -> lock marker

Files are split into 64 KiB chunks so we stay well under FDB’s 100,000-byte value size limit and its 10 MB-per-transaction limit (both are hard limits).
Create returns a Writer that buffers in memory and flushes on Sync / Close. We split the flush across multiple transactions (100 chunks each ≈ 6.25 MiB) to safely handle files larger than 10 MB.
Rename is implemented as copy-then-clear inside one transaction. LevelDB only renames small files (temp → real on flush completion), so the inefficiency doesn’t matter.
SetMeta writes the manifest pointer atomically. Because FDB transactions are serializable, two concurrent flushes can’t observe a half-rotated manifest.
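The chunking and batching arithmetic described above can be sketched without touching FDB at all. The helper names here are hypothetical and the real Writer may be organised differently; the sketch only shows how a buffered file becomes 64 KiB values grouped 100 per transaction.

```go
package main

import "fmt"

const (
	chunkSize    = 64 << 10 // 64 KiB per FDB value
	chunksPerTxn = 100      // about 6.25 MiB of values per transaction
)

// splitChunks cuts buf into chunkSize pieces; the last one may be shorter.
func splitChunks(buf []byte) [][]byte {
	var chunks [][]byte
	for len(buf) > 0 {
		n := chunkSize
		if len(buf) < n {
			n = len(buf)
		}
		chunks = append(chunks, buf[:n])
		buf = buf[n:]
	}
	return chunks
}

// batches groups chunks so that each group fits in one transaction.
func batches(chunks [][]byte) [][][]byte {
	var out [][][]byte
	for len(chunks) > 0 {
		n := chunksPerTxn
		if len(chunks) < n {
			n = len(chunks)
		}
		out = append(out, chunks[:n])
		chunks = chunks[n:]
	}
	return out
}

func main() {
	file := make([]byte, 15<<20) // a 15 MiB file, larger than one transaction allows
	chunks := splitChunks(file)
	groups := batches(chunks)
	fmt.Println(len(chunks), len(groups)) // prints "240 3"
}
```

Each group would be written in its own FDB transaction, so a large file is not written atomically as a whole.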

Why this is interesting

You get a real LevelDB instance — with bloom filters, compaction, snapshots, the works — whose durability story is “whatever FDB’s durability story is.” That means:

Geo-replication and read scaling come for free from the FDB cluster.
Backups are FDB backups.
The local node has no on-disk state at all; it can crash and restart against a different FDB coordinator without losing anything.

The cost is latency: every SST read is at least one FDB round-trip, every flush is many. This isn’t a production architecture; it’s a teaching artifact that proves how cleanly the layers separate.

Running

cd option-b-leveldb
go mod tidy
go run ./demo -cluster ../fdb.cluster

Expected output (the second session re-opens and reads the persisted data):

First session: wrote 3 keys, then closed.
Reopening LevelDB on the same FDB namespace...

  apple -> red
  banana -> yellow
  cherry -> red

Iterating the whole DB:
  apple -> red
  banana -> yellow
  cherry -> red

What this implementation skips

Locker does not recover from a crashed holder: if the process that holds the lock dies, the lock key stays set. A production version would attach the lock to a client UUID and TTL it via FDB watches.
Reader loads the whole file into memory. LevelDB SSTs are bounded (default 2 MB), so this is fine for a demo but not for huge tables.
No caching layer. Every Open is a fresh FDB scan. A real impl would cache hot SSTs.

Read fdbstorage/storage.go — the whole thing is one file deliberately, so you can follow the data flow end to end.