feat: add parquet receipt store with DuckDB range queries #2789

Open: jewei1997 wants to merge 10 commits into ledger-cache-layer from parquet-receiptdb

jewei1997 (Contributor) commented Feb 2, 2026

Summary

This PR adds a parquet-based receipt storage backend that uses DuckDB for range queries on logs, so that eth_getLogs queries across block ranges can be served efficiently.

  • Add a parquet backend option (Backend: "parquet" in config)
  • Rotate parquet files every 500 blocks
  • Run DuckDB queries across closed parquet files for efficient log filtering
  • Use a WAL for crash recovery of in-progress parquet files
  • Prune old parquet files based on the KeepRecent config
  • Add build tag support: use -tags duckdb to enable the parquet backend
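
To make the 500-block rotation concrete, here is a minimal sketch of how a block range could be mapped to the files that cover it. It assumes files are aligned to multiples of 500 and named after their first block; the `logs_<start>.parquet` naming and the helper names `fileStart` and `filesForRange` are hypothetical and not taken from this PR.

```go
package main

import "fmt"

// blocksPerFile mirrors the "rotate every 500 blocks" rule from the summary.
const blocksPerFile uint64 = 500

// fileStart returns the first block of the file that would hold blockNum,
// assuming files are aligned to multiples of blocksPerFile (an assumption).
func fileStart(blockNum uint64) uint64 {
	return blockNum - blockNum%blocksPerFile
}

// filesForRange lists the hypothetical file names whose block spans
// overlap the inclusive range [from, to].
func filesForRange(from, to uint64) []string {
	var names []string
	if from > to {
		return names
	}
	for start := fileStart(from); start <= to; start += blocksPerFile {
		// The naming scheme is illustrative only.
		names = append(names, fmt.Sprintf("logs_%d.parquet", start))
	}
	return names
}

func main() {
	// Blocks 950..1600 touch the files starting at 500, 1000, and 1500.
	fmt.Println(filesForRange(950, 1600))
}
```

With this layout a range query only needs to open the files returned by `filesForRange`, which is what keeps cross-block lookups cheap.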

The parquet backend supports the new FilterLogs range query API introduced in #2788, enabling efficient cross-block log queries without falling back to per-receipt fetching.
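
As a rough illustration of a cross-file range query, the sketch below builds a DuckDB SQL statement over a list of closed parquet files using DuckDB's `read_parquet` table function. The column names `block_number` and `log_index` and the helper `buildLogRangeQuery` are assumptions made for the example; the actual schema and query in this PR may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// buildLogRangeQuery returns a SQL string and its positional arguments for
// selecting logs in the inclusive block range [from, to] from the given files.
// Column names are hypothetical. It returns an empty query when no files are given.
func buildLogRangeQuery(files []string, from, to uint64) (string, []any) {
	if len(files) == 0 {
		return "", nil
	}
	quoted := make([]string, len(files))
	for i, f := range files {
		// Escape single quotes so each path is a valid SQL string literal.
		quoted[i] = "'" + strings.ReplaceAll(f, "'", "''") + "'"
	}
	query := fmt.Sprintf(
		"SELECT * FROM read_parquet([%s]) WHERE block_number BETWEEN ? AND ? ORDER BY block_number, log_index",
		strings.Join(quoted, ", "),
	)
	return query, []any{from, to}
}

func main() {
	q, args := buildLogRangeQuery([]string{"logs_500.parquet", "logs_1000.parquet"}, 950, 1200)
	fmt.Println(q)
	fmt.Println(args...)
}
```

The returned query and arguments could then be handed to a `database/sql` handle opened with a DuckDB driver; address and topic filters would be added as further predicates.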

Dependencies

Test plan

  • Receipt store unit tests pass (without duckdb tag)
  • Parquet store tests pass with -tags duckdb
  • Integration testing with full node using parquet backend

github-actions bot commented Feb 2, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build       Format      Lint        Breaking    Updated (UTC)
✅ passed   ✅ passed   ✅ passed   ✅ passed   Feb 9, 2026, 2:53 PM

codecov bot commented Feb 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 48.32%. Comparing base (6aa9c6b) to head (69423d9).

Additional details and impacted files

@@                   Coverage Diff                   @@
##           ledger-cache-layer    #2789       +/-   ##
=======================================================
+ Coverage               46.91%   48.32%    +1.40%     
=======================================================
  Files                    1969      671     -1298     
  Lines                  160784    50575   -110209     
=======================================================
- Hits                    75438    24438    -51000     
+ Misses                  78801    23998    -54803     
+ Partials                 6545     2139     -4406     
Flag         Coverage Δ
sei-chain    ?
sei-cosmos   48.13% <ø> (+<0.01%) ⬆️
sei-db       68.72% <ø> (ø)

Flags with carried forward coverage won't be shown.
see 1419 files with indirect coverage changes

Comment on lines +230 to +240
for blockNum, logs := range chunk.logs {
	if blockNum < fromBlock || blockNum > toBlock {
		continue
	}
	for _, lg := range logs {
		if matchLog(lg, crit) {
			logCopy := *lg
			result = append(result, &logCopy)
		}
	}
}

Check warning

Code scanning / CodeQL

Iteration over map Warning

Iteration over map may be a possible source of non-determinism
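
One common way to address this kind of warning is to collect the map keys, sort them, and iterate in that order, so the result slice always comes out in ascending block order. The sketch below shows the pattern with a stand-in `logEntry` type and a hypothetical `collectInRange` helper; the `matchLog` filter from the flagged snippet is omitted for brevity, and none of these names are taken from the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// logEntry is a stand-in for the real log type used by the store.
type logEntry struct {
	BlockNumber uint64
	Index       uint
}

// collectInRange copies logs from blocks in [from, to], visiting blocks in
// ascending order so the output does not depend on Go's map iteration order.
func collectInRange(logsByBlock map[uint64][]*logEntry, from, to uint64) []*logEntry {
	blocks := make([]uint64, 0, len(logsByBlock))
	for b := range logsByBlock {
		if b >= from && b <= to {
			blocks = append(blocks, b)
		}
	}
	sort.Slice(blocks, func(i, j int) bool { return blocks[i] < blocks[j] })

	var result []*logEntry
	for _, b := range blocks {
		for _, lg := range logsByBlock[b] {
			logCopy := *lg // copy so callers cannot mutate the cached entry
			result = append(result, &logCopy)
		}
	}
	return result
}

func main() {
	logsByBlock := map[uint64][]*logEntry{
		7: {{BlockNumber: 7, Index: 0}},
		5: {{BlockNumber: 5, Index: 0}, {BlockNumber: 5, Index: 1}},
		9: {{BlockNumber: 9, Index: 0}},
	}
	for _, lg := range collectInRange(logsByBlock, 5, 7) {
		fmt.Println(lg.BlockNumber, lg.Index)
	}
}
```

Sorting costs O(n log n) in the number of cached blocks, which is small when a chunk holds at most a few hundred blocks.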

	"database/sql"
	"fmt"
	"path/filepath"
	"runtime"

Check notice

Code scanning / CodeQL

Sensitive package import Note

Certain system packages contain functions which may be a possible source of non-determinism

Comment on lines +280 to +302
go func() {
	for {
		latestVersion := s.latestVersion.Load()
		pruneBeforeBlock := latestVersion - s.config.KeepRecent
		if pruneBeforeBlock > 0 {
			pruned := s.pruneOldFiles(uint64(pruneBeforeBlock))
			if pruned > 0 && s.log != nil {
				s.log.Info(fmt.Sprintf("Pruned %d parquet file pairs older than block %d", pruned, pruneBeforeBlock))
			}
		}

		// Add jitter to avoid thundering herd
		jitter := time.Duration(float64(pruneIntervalSeconds)*0.5) * time.Second
		sleepDuration := time.Duration(pruneIntervalSeconds)*time.Second + jitter

		select {
		case <-s.pruneStop:
			return
		case <-time.After(sleepDuration):
			// Continue to next iteration
		}
	}
}()

Check notice

Code scanning / CodeQL

Spawning a Go routine Note

Spawning a Go routine may be a possible source of non-determinism

		}

		// Add jitter to avoid thundering herd
		jitter := time.Duration(float64(pruneIntervalSeconds)*0.5) * time.Second

Check notice

Code scanning / CodeQL

Floating point arithmetic Note

Floating point arithmetic operations are not associative and a possible source of non-determinism
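
Note that the jitter in the quoted line is a fixed value (half the prune interval), so every sleep lasts exactly 1.5 times the interval rather than a varying amount. If randomized jitter is the intent, one possible sketch using only integer arithmetic is shown below. The helper name `jitteredInterval` is hypothetical, and the choice of a uniform extra delay in [0, base/2] is an assumption, not something stated in the PR.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitteredInterval returns base plus a uniformly random extra delay in
// [0, base/2], computed with integer arithmetic only.
func jitteredInterval(base time.Duration) time.Duration {
	maxJitter := base / 2
	if maxJitter <= 0 {
		// Nothing to randomize for zero, negative, or 1ns intervals.
		return base
	}
	// rand.Int63n(n) returns a value in [0, n), so add 1 to include maxJitter.
	return base + time.Duration(rand.Int63n(int64(maxJitter)+1))
}

func main() {
	// Each call yields a different sleep between 30s and 45s.
	fmt.Println(jitteredInterval(30 * time.Second))
}
```

The top-level `math/rand` functions are seeded automatically from Go 1.20 onward; on older toolchains the sequence is the same on every run unless the generator is seeded explicitly.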