Reading Guide — Where to Go Next

You’ve read through the five chapters of this guide and the lab implementation guides. By now you understand FDB’s architecture from the commit pipeline through every component of the cluster, the layer concept from byte-level encoding through production schema evolution, and how five real systems (Apple CloudKit, Snowflake, mvsqlite, Document Layer, Record Layer) deploy these exact patterns.

This chapter maps out the landscape of resources for going deeper in each direction.


6.1 FoundationDB — Going Deeper

Essential Reading

FDB Documentation — The official reference. The “Developer Guide” section covers the API in depth, including the precise read-your-writes behavior, exactly which operations cause conflicts, and the full list of transaction options.
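The read-your-writes rule can be made concrete with a minimal Go sketch. It assumes point reads only: a transaction keeps its uncommitted writes in a local buffer and consults that buffer before the snapshot it reads from. `Txn`, `NewTxn`, `Set`, `Clear`, and `Get` are hypothetical names chosen for illustration, not the FDB Go binding's API. The real client must also merge buffered writes into range reads.

```go
package main

// Txn overlays a transaction's uncommitted writes on the snapshot it reads
// from. Hypothetical names; this is not the FDB client API.
type Txn struct {
	snapshot map[string]string  // committed state as of the read version
	writes   map[string]*string // buffered writes; nil means cleared
}

// NewTxn starts a transaction over a read-only snapshot.
func NewTxn(snapshot map[string]string) *Txn {
	return &Txn{snapshot: snapshot, writes: map[string]*string{}}
}

// Set buffers a write locally; the snapshot is never modified.
func (t *Txn) Set(key, value string) {
	v := value
	t.writes[key] = &v
}

// Clear buffers a deletion.
func (t *Txn) Clear(key string) {
	t.writes[key] = nil
}

// Get checks the transaction's own writes first, then falls back to the
// snapshot, so a transaction always observes its own earlier writes.
func (t *Txn) Get(key string) (string, bool) {
	if w, ok := t.writes[key]; ok {
		if w == nil {
			return "", false // cleared earlier in this transaction
		}
		return *w, true
	}
	v, ok := t.snapshot[key]
	return v, ok
}
```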

FDB White Paper — SIGMOD 2021 — “FoundationDB: A Distributed Unbundled Transactional Key Value Store.” The definitive technical reference for FDB’s internal architecture. Covers the simulation framework, commit pipeline, log system, and storage layer in academic depth. If you’ve read this guide, you have the vocabulary to understand every section of this paper.

FDB Forum — Design discussions, Q&A, and announcements. Some of the most insightful posts are from FDB’s core team explaining design decisions.

Source Code

apple/foundationdb — The full FDB source, written in Flow, an actor-based extension of C++ (hence the .actor.cpp suffix). Key files:

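  • fdbserver/MasterProxyServer.actor.cpp → Commit Proxy implementation (the commit pipeline); renamed fdbserver/CommitProxyServer.actor.cpp in FDB 7.0, when proxies were split into commit proxies and GRV proxies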
  • fdbserver/Resolver.actor.cpp → Resolver (conflict detection)
  • fdbserver/TLogServer.actor.cpp → Transaction Log (write-ahead log)
  • fdbserver/StorageServer.actor.cpp → Storage Server (reads, MVCC)
  • fdbclient/ReadYourWrites.actor.cpp → Client-side read-your-writes cache
  • fdbserver/workloads/ → Simulation workloads (randomized fault testing)
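The Resolver's job fits in a few lines of Go, assuming point keys and a single resolver. A transaction aborts if any key it read was written by a transaction that committed after its read version. The `Resolver` type and its `Resolve` method are hypothetical. The real Resolver works on key ranges, processes commit batches, and discards history older than the MVCC window.

```go
package main

// Resolver is a simplified sketch of optimistic conflict detection over
// point keys (the real FDB Resolver tracks key ranges).
type Resolver struct {
	lastWrite map[string]int64 // key -> commit version of its latest write
}

func NewResolver() *Resolver {
	return &Resolver{lastWrite: map[string]int64{}}
}

// Resolve decides whether a transaction that read at readVersion may commit
// at commitVersion. It aborts if any key it read was written by a transaction
// that committed after readVersion (a read-write conflict). Only committed
// transactions record their writes.
func (r *Resolver) Resolve(readVersion, commitVersion int64, reads, writes []string) bool {
	for _, k := range reads {
		if v, ok := r.lastWrite[k]; ok && v > readVersion {
			return false // a value we read was overwritten: abort
		}
	}
	for _, k := range writes {
		r.lastWrite[k] = commitVersion
	}
	return true
}
```

A blind write (a key written but never read) never causes an abort here. This matches FDB's rule that only a transaction's read conflict ranges are checked against other transactions' writes.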

FoundationDB/fdb-record-layer — Java. The most complete example of a production FDB layer. Study FDBRecordContext (transaction management), RecordQueryPlanner (query compilation to range scans), and OnlineIndexer (safe online index builds). With this guide behind you, the source should be directly readable.


6.2 Storage Engines — Going Deeper

Books

Designing Data-Intensive Applications by Martin Kleppmann — The best single-volume overview of storage trade-offs, replication, consistency, and distributed transactions. Chapters 3 (storage engines), 7 (transactions), and 9 (consistency) are directly relevant. If you haven’t read this book, stop here and read it. It contextualizes everything in this guide.

Database Internals by Alex Petrov — Deep dive into B-trees (B+ variants, page splits, page merges), LSM trees (bloom filters, compaction algorithms, manifest management), and distributed consensus (Paxos, Raft, Multi-Paxos). Its LSM chapter is one of the clearest explanations of the ideas behind LevelDB and RocksDB.

Papers

The Log-Structured Merge Tree (1996) — The original LSM paper by O’Neil et al. Defines the C0/C1/C2 component model that LevelDB simplifies into level-0/level-1/…
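A toy version of the component model helps when reading the paper. This Go sketch assumes an in-memory C0 (a map) that is flushed into immutable sorted runs, with reads checking the newest data first. It omits the write-ahead log, tombstones, and bloom filters. `Compact` merges through a map for brevity, where a real engine would perform a streaming k-way merge.

```go
package main

import "sort"

type entry struct{ key, value string }

// LSM is a toy log-structured merge store: a mutable in-memory component
// (C0) plus immutable sorted runs standing in for on-disk components.
type LSM struct {
	memtable map[string]string // C0: absorbs all writes
	runs     [][]entry         // flushed sorted runs, newest first
	limit    int               // flush C0 once it holds this many keys
}

func NewLSM(limit int) *LSM {
	return &LSM{memtable: map[string]string{}, limit: limit}
}

func (l *LSM) Put(k, v string) {
	l.memtable[k] = v
	if len(l.memtable) >= l.limit {
		l.flush()
	}
}

// sortedRun turns a map into a key-sorted slice.
func sortedRun(m map[string]string) []entry {
	run := make([]entry, 0, len(m))
	for k, v := range m {
		run = append(run, entry{k, v})
	}
	sort.Slice(run, func(i, j int) bool { return run[i].key < run[j].key })
	return run
}

// flush writes C0 out as a new immutable run and starts an empty C0.
func (l *LSM) flush() {
	l.runs = append([][]entry{sortedRun(l.memtable)}, l.runs...)
	l.memtable = map[string]string{}
}

// Get checks C0, then each run from newest to oldest, so the most recent
// value for a key always wins.
func (l *LSM) Get(k string) (string, bool) {
	if v, ok := l.memtable[k]; ok {
		return v, true
	}
	for _, run := range l.runs {
		i := sort.Search(len(run), func(i int) bool { return run[i].key >= k })
		if i < len(run) && run[i].key == k {
			return run[i].value, true
		}
	}
	return "", false
}

// Compact merges all runs into one, keeping only the newest value per key.
func (l *LSM) Compact() {
	merged := map[string]string{}
	for i := len(l.runs) - 1; i >= 0; i-- { // oldest first; newer overwrite
		for _, e := range l.runs[i] {
			merged[e.key] = e.value
		}
	}
	l.runs = [][]entry{sortedRun(merged)}
}
```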

Bigtable: A Distributed Storage System for Structured Data (2006) — The architecture paper for Google Bigtable. Introduced the tablet-server model, the SSTable file format, and a three-level metadata hierarchy for locating tablets. Direct ancestor of HBase, Cassandra's SSTable-based storage, and LevelDB (written by two of the Bigtable authors).

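Spanner: Google’s Globally-Distributed Database (2012) — How Google built a globally distributed ACID database using TrueTime for external consistency. Spanner replicates each shard through its own Paxos group. FDB instead replicates data through synchronously replicated transaction logs and storage-server teams and reserves Paxos for its coordinators, so the two make a useful contrast. Essential reading for understanding distributed transactions.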

Source Code

google/leveldb — The C++ LevelDB. Read: db/version_set.cc (compaction and manifest management), db/log_reader.cc + db/log_writer.cc (WAL format), table/block.cc + table/format.cc (SSTable on-disk format), util/arena.cc (memtable arena allocator).
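Before reading db/log_writer.cc, it helps to see how LevelDB lays out a log record, shown here as a Go sketch. The file is divided into 32 KiB blocks. Each physical record carries a 7-byte header (4-byte CRC, 2-byte length, 1-byte type). A record that does not fit in the current block is split into FIRST, MIDDLE, and LAST fragments. The sketch computes only offsets and fragment types. It does not write bytes or compute the masked CRC32C checksums that the real format uses. `Layout` and `Fragment` are names invented for this sketch.

```go
package main

// Constants matching LevelDB's log format.
const (
	blockSize  = 32768
	headerSize = 7 // checksum (4) + length (2) + type (1)

	fullType   = 1
	firstType  = 2
	middleType = 3
	lastType   = 4
)

// Fragment is one physical record in the log file.
type Fragment struct {
	Type   int
	Offset int // file offset of the fragment's header
	Length int // payload bytes carried by this fragment
}

// Layout returns the fragments for an n-byte record written at file offset
// pos, plus the file offset just past the record. If fewer than headerSize
// bytes remain in a block, they are zero-padded and writing moves on to
// the next block.
func Layout(pos, n int) ([]Fragment, int) {
	var frags []Fragment
	first := true
	for {
		left := blockSize - pos%blockSize
		if left < headerSize {
			pos += left // block trailer: padding only
			left = blockSize
		}
		size := left - headerSize // payload space in this block
		if size > n {
			size = n
		}
		end := size == n
		t := middleType
		switch {
		case first && end:
			t = fullType
		case first:
			t = firstType
		case end:
			t = lastType
		}
		frags = append(frags, Fragment{Type: t, Offset: pos, Length: size})
		pos += headerSize + size
		n -= size
		first = false
		if end {
			return frags, pos
		}
	}
}
```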

syndtr/goleveldb — The Go LevelDB used in option-a-leveldb. Its leveldb/storage package defines the storage.Storage interface that our layer implements.

sqlite.org/sqlite — The SQLite source. Study: btree.c (B-tree implementation), os.c and os_unix.c (the VFS abstraction and the default Unix VFS), pager.c (page management, locking, and the rollback journal), and wal.c (WAL mode). The SQLite source is famously well-commented.

facebook/rocksdb — RocksDB is LevelDB’s production successor. It is much more complex (universal/tiered compaction, column families, transactions, extensive tuning options), but the core LSM design and sorted key-value data model are the same.


6.3 SQLite VFS — Going Deeper

SQLite VFS Documentation — The full VFS API with method semantics. Essential for understanding what each of the sqlite3_io_methods xRead, xWrite, xSync, xLock, and xUnlock must guarantee.
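One of these guarantees is easy to get wrong. When xRead asks for bytes past the end of the file, the VFS must zero-fill the unread part of the buffer and return SQLITE_IOERR_SHORT_READ. SQLite's documentation warns that failing to zero-fill short reads eventually leads to database corruption. This Go sketch models that contract over an in-memory file. The `File` type is hypothetical, and the constants mirror SQLite's numeric result codes.

```go
package main

// Result codes with the same numeric values as SQLite's.
const (
	sqliteOK             = 0
	sqliteIOErrShortRead = 522 // SQLITE_IOERR (10) | (2 << 8)
)

// File is a hypothetical in-memory stand-in for a VFS file handle.
type File struct{ data []byte }

// xRead fills buf with bytes starting at offset off. If the file ends
// before buf is full, the unread tail must be zero-filled and the call
// must report SQLITE_IOERR_SHORT_READ.
func (f *File) xRead(buf []byte, off int64) int {
	n := 0
	if off >= 0 && off < int64(len(f.data)) {
		n = copy(buf, f.data[off:])
	}
	for i := n; i < len(buf); i++ {
		buf[i] = 0 // zero-fill whatever the file could not supply
	}
	if n < len(buf) {
		return sqliteIOErrShortRead
	}
	return sqliteOK
}
```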

SQLite File Format — The page-by-page layout of a SQLite database file. After reading the option-b-sqlite guide, this document will make complete sense. The page header format, freelist, overflow pages, and B-tree cell formats are all documented here.

losfair/mvsqlite — The production FDB-backed SQLite VFS. The Rust code is clean and readable. The mvfs/ and mvsqlite/ directories implement the client side (the file abstraction and the SQLite VFS binding), and mvstore/ is the FDB-backed page store server. Compare with option-b-sqlite/pagestore/pagestore.go line by line.

SQLite WAL Mode — The WAL (Write-Ahead Logging) mode documentation. Explains how WAL mode lets readers run concurrently with a single writer. Our pagestore uses rollback journal mode semantics (xSync is a no-op because FDB is durable), but understanding WAL mode helps you see what we’re not implementing.


6.4 Distributed Systems Fundamentals

The Part-Time Parliament (Paxos) — Lamport’s original Paxos paper. It is dense, but the consensus problem it solves (how do N machines agree on one value when any machine can fail?) underlies the replication or coordination layer of most distributed databases.
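Stripped to its core, single-decree Paxos takes little code. This Go sketch assumes a single proposer that calls acceptors sequentially. It models no network and no failures, and it skips stable storage, even though a real acceptor must persist its state before replying. `Acceptor`, `Prepare`, `Accept`, and `Propose` are illustrative names.

```go
package main

// Acceptor implements the acceptor role of single-decree Paxos.
// Ballots start at 1; acceptedN == 0 means nothing accepted yet.
type Acceptor struct {
	promised    int    // highest ballot promised in phase 1
	acceptedN   int    // ballot of the accepted value
	acceptedVal string // the accepted value, if any
}

// Prepare (phase 1b): promise to ignore ballots <= n's predecessors and
// report any previously accepted value.
func (a *Acceptor) Prepare(n int) (ok bool, accN int, accVal string) {
	if n <= a.promised {
		return false, 0, ""
	}
	a.promised = n
	return true, a.acceptedN, a.acceptedVal
}

// Accept (phase 2b): accept unless a higher ballot has been promised.
func (a *Acceptor) Accept(n int, v string) bool {
	if n < a.promised {
		return false
	}
	a.promised, a.acceptedN, a.acceptedVal = n, n, v
	return true
}

// Propose runs both phases with ballot n. If any promising acceptor already
// accepted a value, the proposer must adopt the one with the highest ballot
// instead of v. It reports the value and whether a majority accepted it.
func Propose(acceptors []*Acceptor, n int, v string) (string, bool) {
	promises, highest := 0, 0
	for _, a := range acceptors {
		if ok, accN, accVal := a.Prepare(n); ok {
			promises++
			if accN > highest {
				highest, v = accN, accVal
			}
		}
	}
	majority := len(acceptors)/2 + 1
	if promises < majority {
		return "", false
	}
	accepted := 0
	for _, a := range acceptors {
		if a.Accept(n, v) {
			accepted++
		}
	}
	return v, accepted >= majority
}
```

Once a majority has accepted a value, any later proposal with a higher ballot learns that value in phase 1 and re-proposes it. This is why the chosen value cannot change.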

In Search of an Understandable Consensus Algorithm (Raft) — The Raft paper. More accessible than Paxos. FDB uses a Paxos variant, not Raft, but the core problem is the same. After this, read the Raft visualization.

Linearizability: A Correctness Condition for Concurrent Objects — The formal definition of linearizability and its relationship to sequential consistency and serializability. FDB’s transactions are strictly serializable, which combines serializability with the real-time ordering that linearizability requires.

A Critique of ANSI SQL Isolation Levels — The Berenson et al. paper that formalized isolation anomalies (dirty reads, non-repeatable reads, phantoms, lost updates, write skew) and showed that ANSI SQL’s definitions were imprecise. Essential vocabulary for discussing “serializable” vs “snapshot isolation” vs “repeatable read.”


6.5 Building Something Real — A Progression

If you want to go from “I understand these labs” to “I built something production-worthy,” here is a concrete progression:

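Step 1: Add continuations to option-a-leveldb. Add IterateWithCursor(cursor []byte, limit int) ([]KV, []byte, error), which returns one page of results plus the cursor for the next page. This is the single most important production feature missing from all the labs. FDB transactions cannot run longer than 5 seconds, so a range scan over a large dataset must be split across several transactions. That is why every real FDB application needs cursor-based pagination for range queries.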
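Here is a minimal sketch of the continuation contract. It assumes an in-memory sorted slice stands in for the lab's FDB-backed store. The cursor is the last key returned, and the next page begins strictly after it. In FDB, that corresponds to a range read beginning at the key selector firstGreaterThan(cursor). The `Store` type is hypothetical. A nil cursor on input means "start of range", and a nil cursor on output means "exhausted".

```go
package main

import (
	"bytes"
	"errors"
	"sort"
)

type KV struct{ Key, Value []byte }

// Store is a hypothetical stand-in for the lab store; kvs must be sorted
// by key, as an FDB range read would return them.
type Store struct{ kvs []KV }

// IterateWithCursor returns up to limit pairs with keys strictly greater
// than cursor, plus the cursor to resume from. Each call could run in its
// own FDB transaction, keeping every page well inside the 5-second limit.
func (s *Store) IterateWithCursor(cursor []byte, limit int) ([]KV, []byte, error) {
	if limit <= 0 {
		return nil, nil, errors.New("limit must be positive")
	}
	start := 0
	if cursor != nil {
		start = sort.Search(len(s.kvs), func(i int) bool {
			return bytes.Compare(s.kvs[i].Key, cursor) > 0
		})
	}
	end := start + limit
	if end > len(s.kvs) {
		end = len(s.kvs)
	}
	page := s.kvs[start:end]
	if end == len(s.kvs) {
		return page, nil, nil // range exhausted
	}
	return page, page[len(page)-1].Key, nil
}
```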

Step 2: Add range index queries to option-c-record-layer. Implement LookupByRange(schema, field string, minVal, maxVal interface{}) ([]Record, error). This requires: (a) sort-preserving encoding for all index value types, and (b) a range scan on the index subspace instead of a point lookup.

Step 3: Implement a simple query planner. For WHERE city='Paris' AND age >= 25, decide: which index scan is more selective? Scan city='Paris' and filter age, or scan age >= 25 and filter city? Look at how fdb-record-layer’s RecordQueryPlanner.java makes this decision.
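One way to frame the decision is cost-based: estimate how many index entries each candidate scan touches and pick the smaller. This Go sketch assumes simple statistics (row count, distinct cities, age range) and a uniform distribution. `Stats` and `Plan` are hypothetical. fdb-record-layer’s RecordQueryPlanner may weigh candidates by different, more rule-based criteria, so treat this as one possible approach.

```go
package main

// Stats holds the simple, assumed statistics this sketch plans with.
type Stats struct {
	Rows         int // total records
	CityDistinct int // distinct values of city
	AgeMin       int // smallest age observed
	AgeMax       int // largest age observed
}

// estCityEq estimates rows matching city = <value>, assuming every city is
// equally common.
func estCityEq(s Stats) float64 {
	return float64(s.Rows) / float64(s.CityDistinct)
}

// estAgeAtLeast estimates rows matching age >= lo, assuming ages are spread
// uniformly over [AgeMin, AgeMax].
func estAgeAtLeast(s Stats, lo int) float64 {
	switch {
	case lo <= s.AgeMin:
		return float64(s.Rows)
	case lo > s.AgeMax:
		return 0
	}
	span := float64(s.AgeMax - s.AgeMin + 1)
	return float64(s.Rows) * float64(s.AgeMax-lo+1) / span
}

// Plan picks the index expected to touch fewer entries for
// WHERE city = ? AND age >= ageLo; the other predicate becomes a filter
// applied to each fetched record.
func Plan(s Stats, ageLo int) string {
	if estCityEq(s) <= estAgeAtLeast(s, ageLo) {
		return "city" // scan the city index, filter on age
	}
	return "age" // scan the age index, filter on city
}
```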

Step 4: Add online index building. Given a table with 1 million records, add a new index without downtime. The approach: (a) mark the index as “building” (fdb-record-layer calls this state WRITE_ONLY), (b) a background job scans records in cursor-paginated chunks and writes index entries, (c) every concurrent PutRecord/DeleteRecord also maintains entries in the index being built, (d) when the background job finishes, mark the index “ready” (READABLE). This is how fdb-record-layer’s OnlineIndexer works.
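The four steps can be sketched in Go against an in-memory store, assuming a single index on one string field and no deletes. `Store`, `BuildIndex`, and the `IndexState` values are hypothetical, though the WriteOnly/Readable names echo fdb-record-layer’s IndexState. In FDB each chunk of the build would run in its own transaction, and conflicts with concurrent writers would be resolved by normal commit-time conflict detection. Here, the `between` callback simulates writes landing between chunks.

```go
package main

import "sort"

type IndexState int

const (
	Disabled  IndexState = iota // not maintained, not queryable
	WriteOnly                   // maintained by writes, not yet used by queries
	Readable                    // complete; queries may use it
)

// Store holds records (pk -> indexed field value) and one secondary index
// (field value -> set of pks).
type Store struct {
	records map[string]string
	index   map[string]map[string]bool
	state   IndexState
}

func NewStore() *Store {
	return &Store{records: map[string]string{}, index: map[string]map[string]bool{}}
}

// addEntry is idempotent: indexing the same record twice is harmless.
func (s *Store) addEntry(val, pk string) {
	if s.index[val] == nil {
		s.index[val] = map[string]bool{}
	}
	s.index[val][pk] = true
}

// PutRecord writes a record. Once the index is WriteOnly or Readable, it
// also removes the entry for the old value and adds one for the new value.
func (s *Store) PutRecord(pk, val string) {
	if s.state != Disabled {
		if old, ok := s.records[pk]; ok {
			delete(s.index[old], pk) // no-op if old was never indexed
		}
		s.addEntry(val, pk)
	}
	s.records[pk] = val
}

// BuildIndex switches to WriteOnly, backfills existing records in pk order
// in chunks, then marks the index Readable.
func (s *Store) BuildIndex(chunk int, between func()) {
	s.state = WriteOnly // from here on, writers maintain the index too
	pks := make([]string, 0, len(s.records))
	for pk := range s.records {
		pks = append(pks, pk)
	}
	sort.Strings(pks)
	for i := 0; i < len(pks); i += chunk {
		end := i + chunk
		if end > len(pks) {
			end = len(pks)
		}
		for _, pk := range pks[i:end] {
			s.addEntry(s.records[pk], pk) // index the record's current value
		}
		if between != nil {
			between()
		}
	}
	s.state = Readable
}
```

What makes this safe is that adding an index entry is idempotent, so the result is the same whether the builder or a concurrent writer indexes a record first.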

Step 5: Read the fdb-record-layer source. After steps 1-4, open fdb-record-layer’s FDBRecordStore.java, RecordQueryPlanner.java, and OnlineIndexer.java. You should recognize the reasoning behind each design decision. The gap between “lab exercise” and “production library used at Apple scale” comes down to these four features: continuations, sort-preserving range indexes, query planning, and online index builds.