Add periodic fallback pruning for stalled finalization#179
Add periodic fallback pruning for stalled finalization#179pablodeymo wants to merge 8 commits intomainfrom
Conversation
When finalization stalls (e.g., network issues, low participation), the States and Block tables grow unbounded because retention-based pruning only triggers on finalization advancement. This adds a periodic check every 7200 slots (~8 hours) that calls prune_old_states and prune_old_blocks directly when finalization is far behind, adapted from Zeam's approach. The check is cheap (two integer comparisons per slot-0 tick) and the actual pruning reuses existing idempotent methods.
🤖 Kimi Code ReviewReview SummaryThe PR introduces a fallback pruning mechanism for when finalization is stalled, which is a good defensive measure against unbounded storage growth. However, there are several issues that need attention: Critical Issues
Security & Correctness Issues
Performance & Best Practices
Suggested Fixes// In blockchain/src/store.rs, fix the race condition:
let current_finalized_slot = store.latest_finalized().slot;
if slot.is_multiple_of(PRUNING_FALLBACK_INTERVAL_SLOTS)
&& slot.saturating_sub(current_finalized_slot) > PRUNING_FALLBACK_INTERVAL_SLOTS
{
// ... rest of code
}
// In storage/src/store.rs, fix atomic reads:
pub fn periodic_prune(&mut self) {
let (finalized_checkpoint, justified_checkpoint) = {
let finalized = self.latest_finalized();
let justified = self.latest_justified();
(finalized, justified)
};
let protected_roots = [finalized_checkpoint.root, justified_checkpoint.root];
// ... rest of code
}Automated review by Kimi (Moonshot AI) · custom prompt |
🤖 Codex Code Review
Could not run tests in this environment due rustup permission errors ( Automated review by OpenAI Codex · custom prompt |
Greptile SummaryThis PR addresses a gap in PR #170's retention-based pruning: because Key points:
Confidence Score: 4/5
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["on_tick — interval 0"] --> B{slot > 0?}
B -- No --> G[Skip pruning]
B -- Yes --> C{slot % PRUNING_FALLBACK_INTERVAL_SLOTS == 0?}
C -- No --> G
C -- Yes --> D{"slot - finalized_slot\n> PRUNING_FALLBACK_INTERVAL_SLOTS?"}
D -- No\nFinalization healthy --> G
D -- Yes\nFinalization stalled --> E["warn! log stall event"]
E --> F["store.periodic_prune()"]
F --> F1["protected = [finalized.root, justified.root]"]
F1 --> F2["prune_old_states(protected)\nprune_old_blocks(protected)"]
F2 --> F3{anything pruned?}
F3 -- Yes --> F4["info! log pruned counts"]
F3 -- No --> G
Last reviewed commit: d6c7958 |
🤖 Claude Code ReviewHere is my review of PR #179: Review: Add periodic fallback pruning for stalled finalizationThe motivation is clear and the implementation is straightforward. A few points to consider: Notable:
|
…s 3 redundant DB reads for the same checkpoint value that doesnt change between calls in the interval-0 handler.
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Two tests that verify the 3-part guard in on_tick interval 0:
- on_tick_triggers_periodic_pruning_when_finalization_stalled: pruning
fires when slot is a multiple of the interval AND finalization lag
exceeds the threshold
- on_tick_skips_periodic_pruning_when_finalization_healthy: pruning does
NOT fire when finalization is recent (lag within threshold)
Also exports Table, StorageWriteBatch, and StorageReadView from the
storage crate to enable direct backend access in blockchain tests.
Motivation
PR #170 adds retention-based pruning for blocks and states, but it only triggers when finalization
advances (
finalized.slot > old_finalized_slotinupdate_checkpoints). If finalization stalls —due to network issues, low participation, or other consensus failures — the
StatesandBlockHeaders/BlockBodies/BlockSignaturestables grow unbounded because the pruning conditionis never met.
This is adapted from Zeam's approach, which runs a periodic
fallback every 7200 slots (~8 hours) to prune non-canonical states when finalization is far behind.
Description
Adds a periodic pruning check in
on_tick(interval 0) that calls the existingprune_old_statesand
prune_old_blocksmethods directly, bypassing the finalization-gated path inupdate_checkpoints.Trigger conditions (all must be true):
slot > 0(skip genesis)slotis a multiple ofPRUNING_FALLBACK_INTERVAL_SLOTS(7200 slots = ~8 hours at 4s/slot)current_slot - finalized_slot > PRUNING_FALLBACK_INTERVAL_SLOTS(finalization is truly stalled)If finalization is healthy, condition 3 is never met, so the normal pruning path in
update_checkpointshandles everything and this code is a no-op (two cheap integer comparisonsper slot-0 tick).
Changes
crates/storage/src/store.rsPRUNING_FALLBACK_INTERVAL_SLOTSconstant (7200), addperiodic_prune()method, add 2 testscrates/storage/src/lib.rsPRUNING_FALLBACK_INTERVAL_SLOTScrates/blockchain/src/store.rson_tickSafety
prune_old_statesandprune_old_blocksare already idempotent and protect finalized/justifiedroots via the
protected_rootsparameterperiodic_prunepasses[latest_finalized.root, latest_justified.root]as protected, matchingthe existing behavior in
update_checkpointsHow to test
New tests:
periodic_prune_removes_old_states_and_blocks— verifies pruning beyond retention window whileprotecting finalized/justified roots
periodic_prune_no_op_within_retention— verifies no pruning when within retention window