A Paprika-inspired Ethereum state storage engine written in Rust for ethrex.
ethrex_db is a high-performance, persistent storage solution for Ethereum state and storage tries. It is inspired by Paprika, a C# implementation by Nethermind.
- Block-aware persistence: Understands Ethereum concepts like finality, reorgs, latest/safe/finalized blocks
- Copy-on-Write concurrency: Single writer with multiple lock-free readers
- Memory-mapped storage: LMDB-inspired page-based persistence
- Efficient trie operations: Optimized nibble path handling and in-page storage
Benchmark comparison against ethrex-trie and cita_trie (the baseline used by ethrex):
| Items | ethrex_db | ethrex-trie | cita_trie | Speedup |
|---|---|---|---|---|
| 100 | 68 µs (1.47 Melem/s) | 109 µs | 129 µs | 1.6-1.9x faster |
| 1,000 | 761 µs (1.31 Melem/s) | 1.35 ms | 1.49 ms | 1.8-2.0x faster |
| 10,000 | 7.9 ms (1.27 Melem/s) | 17.0 ms | 15.1 ms | 1.9-2.2x faster |
| Implementation | Time | Relative |
|---|---|---|
| ethrex_db | 14 ns | baseline |
| ethrex-trie | 143 ns | 10x slower |
| cita_trie | 206 ns | 15x slower |
| Items | ethrex_db | ethrex-trie | Speedup |
|---|---|---|---|
| 100 | 3.6 ns | 42 ns | 12x faster |
| 1,000 | 3.6 ns | 48 ns | 13x faster |
| Entries | Sequential | Parallel | Speedup |
|---|---|---|---|
| 100 | 64 µs | 47 µs | 1.4x |
| 1,000 | 678 µs | 298 µs | 2.3x |
| 10,000 | 7.3 ms | 2.3 ms | 3.2x |
| Operation | Performance |
|---|---|
| NibblePath from_bytes | 14.2 ns |
| NibblePath get_nibble | 1.0 ns |
| NibblePath common_prefix_len | 1.3 ns |
| SlottedArray insert (100 entries) | 27.2 Melem/s |
| SlottedArray insert (500 entries) | 31.5 Melem/s |
| SlottedArray lookup (100 entries) | 156 ns (3.2 Gelem/s) |
| Keccak256 (32 bytes) | 170 ns (179 MiB/s) |
| Optimization | Before | After | Improvement |
|---|---|---|---|
| Page read (cached) | 168 ns | 56 ns | 3x faster |
| Bloom filter membership | - | 173 ns | Fast negative lookups |
| Cache hit rate | - | 70%+ | Typical workload |
Run benchmarks yourself:
```bash
cargo bench --bench trie_comparison   # Compare against other tries
cargo bench --bench benchmarks        # Full benchmark suite
```

Traditional Ethereum clients store the Merkle Patricia Trie (MPT) on top of general-purpose key-value stores like RocksDB or LevelDB. This approach has fundamental inefficiencies:
- Write Amplification: RocksDB uses LSM-trees, which amplify writes 10-30x; every trie node update triggers multiple compactions.
- Read Amplification: Reading a single value requires traversing ~8 trie nodes (for 32-byte keys), each needing a separate RocksDB lookup through multiple SST file levels.
- Space Amplification: Trie nodes are stored as separate KV pairs with per-entry overhead, and LSM compaction creates multiple copies of data.
- No Block Awareness: RocksDB doesn't understand Ethereum's finality model; reorgs require expensive tombstone writes and compactions.
- Cache Inefficiency: The LSM-tree's multi-level structure has poor cache locality for trie traversals.
1. Flat Key-Value with Computed Merkle
Instead of persisting trie structure, we store flat key-value pairs and compute the Merkle root on demand:
- Inserts are O(1) hash table operations
- Root computation happens once at block commit
- No tree rebalancing on updates
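The idea can be sketched with a toy flat store. `FlatState` and `toy_hash` below are illustrative names, not ethrex_db's API, and the standard-library hasher stands in for Keccak-256:

```rust
use std::collections::HashMap;
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative sketch only: a flat key-value store that defers "root"
/// computation to commit time. NOT the real ethrex_db MerkleTrie.
struct FlatState {
    entries: HashMap<[u8; 32], Vec<u8>>,
}

impl FlatState {
    fn new() -> Self {
        Self { entries: HashMap::new() }
    }

    /// O(1) hash-table insert; no tree rebalancing happens here.
    fn insert(&mut self, key: [u8; 32], value: Vec<u8>) {
        self.entries.insert(key, value);
    }

    /// "Root" computed once, at block-commit time, over sorted entries
    /// so the result is deterministic regardless of insertion order.
    fn commit_root(&self) -> u64 {
        let mut sorted: Vec<_> = self.entries.iter().collect();
        sorted.sort_by_key(|(k, _)| **k);
        let mut h = DefaultHasher::new();
        for (k, v) in sorted {
            k.hash(&mut h);
            v.hash(&mut h);
        }
        h.finish()
    }
}

fn main() {
    let mut a = FlatState::new();
    let mut b = FlatState::new();
    // Different insertion order, same final state...
    a.insert([1; 32], b"one".to_vec());
    a.insert([2; 32], b"two".to_vec());
    b.insert([2; 32], b"two".to_vec());
    b.insert([1; 32], b"one".to_vec());
    // ...therefore the same committed root.
    assert_eq!(a.commit_root(), b.commit_root());
    println!("roots match");
}
```

The point of the sketch is the cost model: every update is a hash-map write, and the expensive root computation is paid exactly once per block.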
2. Page-Based Storage (Like LMDB)
```
┌─────────────────────────────────────────┐
│               Page (4KB)                │
├─────────┬───────────────────────────────┤
│ Header  │    Payload (SlottedArray)     │
│ 8 bytes │  ◄── slots │ free │ data ──►  │
└─────────┴───────────────────────────────┘
```
- Fixed 4KB pages with O(1) allocation
- No write amplification - pages are written in-place
- Memory-mapped for zero-copy reads
- Copy-on-Write for concurrent readers
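A minimal sketch of the fixed-page layout, assuming the 8-byte header shown in the diagram; the field split and `offset_of` helper are invented for illustration and are not the actual on-disk format:

```rust
/// Sketch under assumptions: fixed 4 KiB pages with an 8-byte header.
/// The layout here mirrors the diagram above, not ethrex_db's exact format.
const PAGE_SIZE: usize = 4096;
const HEADER_SIZE: usize = 8;

struct Page {
    bytes: [u8; PAGE_SIZE],
}

impl Page {
    fn zeroed() -> Self {
        Self { bytes: [0; PAGE_SIZE] }
    }

    /// First 8 bytes: page metadata.
    fn header(&self) -> &[u8] {
        &self.bytes[..HEADER_SIZE]
    }

    /// Remaining bytes: the SlottedArray payload.
    fn payload(&self) -> &[u8] {
        &self.bytes[HEADER_SIZE..]
    }

    /// Fixed-size pages make addressing O(1): a page ID maps to a file
    /// offset with a single multiply, which is what mmap-based reads rely on.
    fn offset_of(page_id: u64) -> u64 {
        page_id * PAGE_SIZE as u64
    }
}

fn main() {
    let page = Page::zeroed();
    assert_eq!(page.header().len(), 8);
    assert_eq!(page.payload().len(), 4096 - 8);
    assert_eq!(Page::offset_of(3), 12288);
    println!("ok");
}
```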
3. Block-Aware Architecture
```
          ┌──────────┐
          │ Genesis  │
          └────┬─────┘
               │
    ┌──────────┼──────────┐
    ▼          ▼          ▼
┌────────┐ ┌────────┐ ┌────────┐
│Block 1A│ │Block 1B│ │Block 1C│  ◄── Hot (Blockchain)
└───┬────┘ └────────┘ └────────┘
    │
    ▼
┌─────────┐
│Finalized│  ◄── Cold (PagedDb)
└─────────┘
```
- Hot storage (Blockchain): Uncommitted blocks use Copy-on-Write diffs
- Cold storage (PagedDb): Finalized state in memory-mapped pages
- Reorgs only affect hot storage - no disk writes needed
4. SlottedArray for Dense Packing
PostgreSQL-inspired in-page storage:
- Slots grow forward, data grows backward
- Variable-length values with minimal overhead
- Binary search for O(log n) lookups within page
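A stripped-down slotted page illustrates the two-ends-growing-inward layout. This toy version stores values only — no keys, tombstones, or defragmentation — so it is a simplification of the real SlottedArray:

```rust
/// Minimal slotted-page sketch: slot entries (offset, len as u16 pairs)
/// grow forward from the front of the buffer, record data grows backward
/// from the end. Free space is whatever remains between the two.
const PAGE: usize = 4096;

struct SlottedPage {
    buf: [u8; PAGE],
    slot_count: usize, // slots occupy buf[0 .. slot_count * 4]
    data_start: usize, // data occupies buf[data_start .. PAGE]
}

impl SlottedPage {
    fn new() -> Self {
        Self { buf: [0; PAGE], slot_count: 0, data_start: PAGE }
    }

    fn free_space(&self) -> usize {
        self.data_start - self.slot_count * 4
    }

    /// Returns the slot index, or None if the page is full.
    fn insert(&mut self, value: &[u8]) -> Option<usize> {
        if self.free_space() < value.len() + 4 {
            return None;
        }
        // Data grows backward from the end of the page.
        self.data_start -= value.len();
        self.buf[self.data_start..self.data_start + value.len()].copy_from_slice(value);
        // The 4-byte slot (offset, len) grows forward from the front.
        let slot = self.slot_count * 4;
        self.buf[slot..slot + 2].copy_from_slice(&(self.data_start as u16).to_le_bytes());
        self.buf[slot + 2..slot + 4].copy_from_slice(&(value.len() as u16).to_le_bytes());
        self.slot_count += 1;
        Some(self.slot_count - 1)
    }

    fn get(&self, idx: usize) -> &[u8] {
        let slot = idx * 4;
        let off = u16::from_le_bytes([self.buf[slot], self.buf[slot + 1]]) as usize;
        let len = u16::from_le_bytes([self.buf[slot + 2], self.buf[slot + 3]]) as usize;
        &self.buf[off..off + len]
    }
}

fn main() {
    let mut p = SlottedPage::new();
    let a = p.insert(b"hello").unwrap();
    let b = p.insert(b"world!").unwrap();
    assert_eq!(p.get(a), b"hello");
    assert_eq!(p.get(b), b"world!");
    println!("ok");
}
```

Because slots are fixed-size and contiguous, keeping them sorted by key is what enables the O(log n) binary search mentioned above.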
5. 256-Way Fanout
DataPages use 256-bucket fanout (2 nibbles):
- Reduces tree depth from ~64 to ~16
- Better cache utilization
- Fewer page accesses per lookup
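The bucket selection itself is just a byte read, since one byte holds exactly two nibbles; `fanout_bucket` is a hypothetical helper, not ethrex_db's API:

```rust
/// Sketch: a DataPage's 256-bucket fanout selects a child using one
/// full byte (two nibbles) of the keccak-derived key path per level.
fn fanout_bucket(path: &[u8], level: usize) -> usize {
    // Byte `level` covers nibbles 2*level and 2*level + 1.
    path[level] as usize
}

fn main() {
    let path = [0xAB, 0xCD, 0x12];
    assert_eq!(fanout_bucket(&path, 0), 0xAB); // nibbles 0xA, 0xB
    assert_eq!(fanout_bucket(&path, 1), 0xCD); // nibbles 0xC, 0xD
    println!("ok");
}
```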
| Aspect | MPT + RocksDB | ethrex_db |
|---|---|---|
| Write amplification | 10-30x (LSM) | 1x (in-place pages) |
| Read amplification | High (multi-level) | Low (mmap + fanout) |
| Reorg handling | Expensive (tombstones) | Cheap (COW diffs) |
| Concurrency | Lock-based | Lock-free readers |
| Memory efficiency | Poor (LSM buffers) | Excellent (mmap) |
The library is split into two major components:
Blockchain - handles blocks that are not yet finalized (latest, safe):
- Parallel block creation from the same parent
- Fork Choice Update (FCU) handling
- Copy-on-Write state per block
PagedDb - handles finalized blocks:
- Memory-mapped file storage using `memmap2`
- 4KB page-based addressing
- Copy-on-Write for consistency
Efficient path representation for trie traversal. A nibble is a half-byte (4 bits), representing values 0-15. Ethereum trie paths are sequences of nibbles derived from keccak hashes.
```rust
use ethrex_db::data::NibblePath;

let path = NibblePath::from_bytes(&[0xAB, 0xCD]);
assert_eq!(path.get(0), 0xA);
assert_eq!(path.get(1), 0xB);
```

In-page key-value storage using the PostgreSQL-inspired slot array pattern:
- Slots grow forward from the header
- Data grows backward from the end of the page
- Tombstone-based deletion with defragmentation
```rust
use ethrex_db::data::{SlottedArray, NibblePath};

let mut arr = SlottedArray::new();
let key = NibblePath::from_bytes(&[0xAB, 0xCD]);
arr.try_insert(&key, b"hello world");
```

- RootPage: Metadata, fanout array, abandoned page tracking
- DataPage: Intermediate nodes with 256-bucket fanout
- BottomPage: Leaf pages with SlottedArray
- AbandonedPage: Tracks pages for reuse after reorg depth
```rust
use ethrex_db::merkle::{MerkleTrie, keccak256, EMPTY_ROOT};

let mut trie = MerkleTrie::new();
trie.insert(&key, value);   // Insert/update
trie.get(&key);             // Lookup: returns Option<&[u8]>
trie.remove(&key);          // Delete
trie.root_hash();           // Compute state root: returns [u8; 32]
trie.len();                 // Entry count: returns usize
trie.iter();                // Iterate all entries: returns impl Iterator
```

```rust
use ethrex_db::chain::{Blockchain, Account, WorldState};
use ethrex_db::store::PagedDb;

let db = PagedDb::in_memory(1000)?;
let blockchain = Blockchain::new(db);

let mut block = blockchain.start_new(parent_hash, block_hash, number)?;
block.set_account(addr, account);      // Modify account
block.set_storage(addr, slot, value);  // Modify storage
blockchain.commit(block)?;             // Commit block
blockchain.finalize(hash)?;            // Finalize to cold storage
```

```rust
use ethrex_db::chain::Account;

let account = Account::with_balance(U256::from(1000));
account.nonce;              // Transaction count
account.balance;            // ETH balance
account.encode();           // RLP encode: returns Vec<u8>
Account::decode(bytes)?;    // RLP decode
```

```
ethrex_db/
├── src/
│   ├── lib.rs
│   ├── data/            # Core data structures
│   │   ├── nibble_path.rs
│   │   └── slotted_array.rs
│   ├── store/           # Page-based persistent storage
│   │   ├── page.rs
│   │   ├── data_page.rs
│   │   ├── root_page.rs
│   │   └── paged_db.rs
│   ├── chain/           # Block management
│   │   ├── blockchain.rs
│   │   └── block.rs
│   └── merkle/          # State root computation
│       └── compute.rs
├── Cargo.toml
└── tests/
```
Run the test suite:
```bash
cargo test
```

The project includes:
- 173 tests (unit + integration + Ethereum compatibility)
- Property-based tests using `proptest` for NibblePath, SlottedArray, and MerkleTrie
- End-to-end tests covering multi-block chains, forks, storage operations, and state persistence
The project includes fuzz targets for critical data structures:
```bash
# Install cargo-fuzz if needed
cargo install cargo-fuzz

# Run fuzzers
cargo fuzz run fuzz_nibble_path
cargo fuzz run fuzz_slotted_array
cargo fuzz run fuzz_merkle_trie
```

Core data structures:
- NibblePath - trie path traversal
- SlottedArray - in-page key-value storage
- Property-based tests with proptest
- Page abstraction (4KB pages)
- DataPage with fanout buckets
- RootPage with metadata
- LeafPage, AbandonedPage
- PagedDb with memory-mapped files
- Copy-on-Write concurrency
- Batch commit support
- Block abstraction with parent chain
- WorldState trait for state access
- Blockchain for unfinalized state
- Parallel block creation
- FCU handling
- Finalization to PagedDb
- RLP encoding for Ethereum
- Keccak-256 hashing
- Merkle node types (Leaf, Extension, Branch)
- In-memory MerkleTrie
- Deterministic root hash computation
- SlottedArray iterator for loading entries from pages
- PagedStateTrie for PagedDb integration
- Fanout-based storage for large tries (256 buckets)
- Account storage tries (nested tries per account)
- State root computation in Blockchain finalization
- Full read/write path: Blockchain → PagedDb → Merkle
- State persistence across database reopens
SIMD Vectorization Investigation:
- Benchmarked explicit SIMD (`wide` crate) vs scalar comparison
- Finding: LLVM auto-vectorization already optimizes `starts_with`
- Scalar comparison: 1.25-3.26 ns; SIMD: 2.00-4.81 ns
- Decision: Keep standard library functions - they're already SIMD-optimized
Lock-Free Readers:
- Replaced `std::sync::RwLock` with `parking_lot::RwLock` (faster, no poisoning)
- Added atomic variables for frequently-read metadata:
  - `batch_id`: Current batch ID (AtomicU32)
  - `block_number`: Current block number (AtomicU32)
  - `state_root`: State root address (AtomicU32)
  - `block_hash`: Block hash (4x AtomicU64)
- `begin_read_only()` now lock-free for common metadata
- `batch_id()`, `block_number()`, `block_hash()` all lock-free
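The pattern can be sketched with standard-library atomics; `DbMeta` and its fields here are illustrative stand-ins, not the actual ethrex_db types:

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Sketch of the lock-free-metadata idea: hot fields such as the batch
/// ID live in atomics, so readers never touch the lock that guards the
/// rest of the database state.
struct DbMeta {
    batch_id: AtomicU32,
    block_number: AtomicU32,
}

fn main() {
    let meta = Arc::new(DbMeta {
        batch_id: AtomicU32::new(0),
        block_number: AtomicU32::new(0),
    });

    // The single writer publishes new metadata with Release stores...
    let w = Arc::clone(&meta);
    let writer = thread::spawn(move || {
        for i in 1..=100u32 {
            w.block_number.store(i, Ordering::Release);
            w.batch_id.store(i, Ordering::Release);
        }
    });

    // ...and readers observe it with Acquire loads, never blocking.
    let r = Arc::clone(&meta);
    let reader = thread::spawn(move || {
        let _ = r.batch_id.load(Ordering::Acquire);
    });

    writer.join().unwrap();
    reader.join().unwrap();
    assert_eq!(meta.block_number.load(Ordering::Acquire), 100);
    println!("ok");
}
```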
Parallel Merkle Computation:
- Added Rayon for parallel computation
- Implemented `parallel_root_hash()` with automatic threshold
- Branch nodes with >64 entries computed in parallel
- Benchmarks:
- 100 entries: sequential faster (parallelization overhead)
- 1,000 entries: parallel 2.2x faster (1.44ms → 650µs)
- 10,000 entries: parallel 4.2x faster (9.8ms → 2.3ms)
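The threshold idea can be sketched as follows, with `std::thread::scope` standing in for Rayon and a toy hash standing in for the per-subtree Keccak work; the names and the simple two-way split are assumptions, not the real `parallel_root_hash()`:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::thread;

/// Below this entry count, thread overhead dominates and sequential wins.
const PARALLEL_THRESHOLD: usize = 64;

/// Toy stand-in for hashing one subtree (the real work is Keccak-256).
fn toy_hash(chunk: &[u64]) -> u64 {
    let mut h = DefaultHasher::new();
    chunk.hash(&mut h);
    h.finish()
}

/// Both paths split at the midpoint, so the result is identical
/// whichever path runs - only the execution strategy differs.
fn root(entries: &[u64]) -> u64 {
    let mid = entries.len() / 2;
    let (lo, hi) = entries.split_at(mid);
    let partials: Vec<u64> = if entries.len() > PARALLEL_THRESHOLD {
        // Parallel path: hash each half on its own scoped thread.
        thread::scope(|s| {
            let a = s.spawn(|| toy_hash(lo));
            let b = s.spawn(|| toy_hash(hi));
            vec![a.join().unwrap(), b.join().unwrap()]
        })
    } else {
        // Sequential path for small tries.
        vec![toy_hash(lo), toy_hash(hi)]
    };
    toy_hash(&partials)
}

fn main() {
    let small: Vec<u64> = (0..10).collect();
    let large: Vec<u64> = (0..10_000).collect();
    // Same inputs give the same root on either path.
    assert_eq!(root(&small), root(&small));
    assert_eq!(root(&large), root(&large));
    println!("ok");
}
```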
Abandoned Page Reuse:
- Added inline abandoned page storage in RootPage
- Batch ID tracking for reorg safety
- Configurable reorg depth (default: 64 batches)
- Automatic page reuse after reorg depth passes
- AbandonedPage linked list for overflow (structure in place)
- COW integration with `abandon_page()` - automatic tracking during Copy-on-Write
- AbandonedPage overflow handling - linked list for large abandoned sets
- SlottedArray defragmentation - `defragment()`, `wasted_space()`, `needs_defragmentation()`
- Merkle proof generation - `ProofNode`, `MerkleProof`, `generate_proof()`, `verify()`
- Snapshot/checkpoint support - `create_snapshot()`, `restore_snapshot()`, `export()`, `import()`
- Metrics module (`DbMetrics`, `MetricsSnapshot`)
  - Page allocations, reuse, abandonment tracking
  - COW operations, batch commits/aborts
  - Bytes read/written, snapshot operations
- Fuzz testing with `cargo-fuzz`
  - `fuzz_nibble_path` - NibblePath operations
  - `fuzz_slotted_array` - SlottedArray with consistency verification
  - `fuzz_merkle_trie` - Trie operations with expected state tracking
Option A: Ethereum Compatibility Verification
- Test state roots against real mainnet blocks (genesis accounts, precompiles)
- Verify RLP encoding matches Ethereum spec exactly (54 test vectors)
- Add test vectors from ethereum/tests repository patterns
- Ensure Merkle proofs are compatible with other clients
Test coverage includes:
- RLP encoding (empty, bytes, strings, lists, integers, HP encoding)
- Keccak-256 hash verification (empty, RLP empty, known values)
- Basic trie operations (insert, delete, branch, extension nodes)
- Secure trie (keccak-hashed keys, as used in Ethereum state trie)
- Account RLP encoding (nonce, balance, storageRoot, codeHash)
- Mainnet verification (genesis structure, precompile accounts, large balances)
- Merkle proof generation and verification (inclusion/exclusion)
- Storage trie operations (slots, deletion)
Option D: Performance Optimizations
- LRU page cache for hot data (256 pages, 3x speedup for cached reads)
- Bloom filters for non-existence proofs (~173 ns membership check)
- Contract ID system (like Paprika) for storage efficiency
- Background compaction of abandoned pages
Performance improvements:
- Page cache: 168 ns (miss) → 56 ns (hit) = 3x faster repeated reads
- Bloom filter: Fast rejection of non-existent keys without HashMap lookup
- Cache hit rate in typical workloads: 70%+
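How a Bloom filter enables fast negative lookups can be sketched as follows; the bit-array size and two-hash scheme are illustrative, not ethrex_db's actual parameters:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Minimal Bloom filter sketch: a definite "no" skips the expensive
/// page/HashMap lookup entirely; a "maybe" falls through to it.
struct Bloom {
    bits: Vec<u64>, // bit array packed into u64 words
}

impl Bloom {
    fn new(words: usize) -> Self {
        Self { bits: vec![0; words] }
    }

    /// Two cheap bit positions derived from seeded hashes of the key.
    fn indexes(&self, key: &[u8]) -> [usize; 2] {
        let mut out = [0usize; 2];
        for i in 0..2 {
            let mut h = DefaultHasher::new();
            (i as u64).hash(&mut h);
            key.hash(&mut h);
            out[i] = (h.finish() as usize) % (self.bits.len() * 64);
        }
        out
    }

    fn insert(&mut self, key: &[u8]) {
        for i in self.indexes(key) {
            self.bits[i / 64] |= 1 << (i % 64);
        }
    }

    /// false => key is definitely absent; true => key *may* be present.
    fn may_contain(&self, key: &[u8]) -> bool {
        self.indexes(key)
            .iter()
            .all(|&i| self.bits[i / 64] & (1 << (i % 64)) != 0)
    }
}

fn main() {
    let mut bloom = Bloom::new(64); // 4096 bits
    bloom.insert(b"present-key");
    // Bloom filters never give false negatives...
    assert!(bloom.may_contain(b"present-key"));
    // ...and with only one key inserted, nearly all absent keys are
    // rejected without ever touching the real store.
    let misses = (0..1000)
        .filter(|i| !bloom.may_contain(format!("absent-{i}").as_bytes()))
        .count();
    assert!(misses > 900);
    println!("ok");
}
```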
Option B: Durability & Crash Recovery
- Write-ahead logging (WAL)
- Proper fsync strategies for commit durability
- Recovery from incomplete/corrupted writes
- Atomic metadata updates
Option C: Production Hardening
- Better error handling and recovery paths
- Memory pressure handling (graceful degradation)
- OpenTelemetry/tracing integration
- Connection pooling for concurrent access
Option E: Genesis State Verification
- Full genesis state root verification (~8800 accounts)
- Load and verify against complete Ethereum mainnet genesis
| Crate | Purpose |
|---|---|
| memmap2 | Memory-mapped files |
| tiny-keccak | Keccak hashing |
| rlp | Ethereum RLP encoding |
| primitive-types | H256, U256 |
| parking_lot | Fast synchronization primitives |
| rayon | Parallel computation |
- Paprika (C# implementation)
- Paprika Design Document
- PostgreSQL Page Layout
- LMDB - Lightning Memory-Mapped Database
- Andy Pavlo - Database Storage Lectures
MIT