
fix: add max prunes to pruner #4191

Open
MasterPtato wants to merge 1 commit into 02-12-fix_add_max_param_to_prune_signals_cmd from 02-12-fix_add_max_prunes_to_pruner

Conversation

@MasterPtato
Contributor

Description

Please include a summary of the changes and the related issue. Please also include relevant motivation and context.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes.

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@railway-app

railway-app bot commented Feb 12, 2026

🚅 Deployed to the rivet-pr-4191 environment in rivet-frontend

| Service            | Status                      | Web | Updated (UTC)            |
| ------------------ | --------------------------- | --- | ------------------------ |
| website            | 😴 Sleeping (View Logs)     | Web | Feb 12, 2026 at 11:01 pm |
| frontend-inspector | ❌ Build Failed (View Logs) | Web | Feb 12, 2026 at 10:52 pm |
| frontend-cloud     | ❌ Build Failed (View Logs) | Web | Feb 12, 2026 at 10:52 pm |
| mcp-hub            | ✅ Success (View Logs)      | Web | Feb 12, 2026 at 10:51 pm |
| ladle              | ❌ Build Failed (View Logs) | Web | Feb 12, 2026 at 10:51 pm |

Contributor Author

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Feb 12, 2026

PR Review: Add max prunes to pruner

Summary

This PR adds a limit (MAX_PRUNES_PER_TXN = 10000) to prevent the pruner from processing too many workflows in a single transaction, which could cause transaction timeouts or memory issues.

Code Quality ✅

Positive aspects:

  • Clean, minimal change that addresses a specific issue
  • Follows existing patterns in the codebase (see prune_signals_inner at lines 1882-1886, which uses max_per_txn)
  • Constant is appropriately named and scoped
  • Placement of the break condition is correct (after incrementing prune_count and updating new_last_key)

Potential Issues ⚠️

1. Off-by-one boundary condition

Line 1382: if prune_count > MAX_PRUNES_PER_TXN

This should likely be >= instead of >. With the current implementation:

  • When prune_count reaches 10000, the condition is false (10000 is not > 10000)
  • The loop continues and processes one more item
  • Only when prune_count becomes 10001 does it break

Recommendation: Change to if prune_count >= MAX_PRUNES_PER_TXN to enforce the limit precisely.
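
To make the boundary concrete, here is a minimal standalone sketch (illustrative only; the entry type and loop body are stand-ins for the real pruner code) showing how the post-increment check with >= caps a batch at exactly MAX_PRUNES_PER_TXN, whereas > would let one extra entry through:

```rust
// Illustrative sketch, not the actual pruner: count entries pruned in one
// transaction and stop once the cap is reached.
const MAX_PRUNES_PER_TXN: usize = 10_000;

fn prune_batch(entries: &[u64]) -> usize {
    let mut prune_count = 0;
    for _entry in entries {
        // ... prune the entry and advance new_last_key here ...
        prune_count += 1;

        // With `>` this would only trip at 10_001 (one extra entry);
        // `>=` stops the batch exactly at the limit.
        if prune_count >= MAX_PRUNES_PER_TXN {
            break;
        }
    }
    prune_count
}

fn main() {
    let entries: Vec<u64> = (0..20_000).collect();
    assert_eq!(prune_batch(&entries), MAX_PRUNES_PER_TXN);
}
```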

2. Inconsistent timeout handling

The function uses two different timeout mechanisms:

  • Lines 1317-1320: Manual timeout check with EARLY_TXN_TIMEOUT that logs a warning and breaks
  • Lines 1382-1384: New count-based limit that breaks silently

Compare this to prune_signals_inner (lines 1936-1938), which wraps the entire loop in tokio::time::timeout.

Recommendation: Consider whether the new limit should also log when it is hit (e.g., tracing::debug!("reached max prunes per txn")) to help with debugging and observability.
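
A hedged sketch of what that could look like at the break site, assuming the tracing crate is in scope (the existing EARLY_TXN_TIMEOUT path already logs a warning, and the review suggests tracing::debug!); the tracing_subscriber setup in main exists only to make the example runnable on its own:

```rust
use tracing::debug;

const MAX_PRUNES_PER_TXN: usize = 10_000;

// Sketch: make the count-based early exit observable, mirroring the warning
// emitted by the EARLY_TXN_TIMEOUT path, so operators can tell why a
// transaction stopped short.
fn should_stop(prune_count: usize) -> bool {
    if prune_count >= MAX_PRUNES_PER_TXN {
        debug!(prune_count, "reached max prunes per txn, resuming in next transaction");
        return true;
    }
    false
}

fn main() {
    tracing_subscriber::fmt()
        .with_max_level(tracing::Level::DEBUG)
        .init();
    assert!(should_stop(MAX_PRUNES_PER_TXN));
}
```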

3. Transaction boundary implications

When breaking due to the limit:

  • new_last_key is set to the last processed entry + 0xff (line 1380)
  • The outer loop (line 1398) uses this to resume from the next position
  • This looks correct, but the ClickHouse inserter is committed regardless of whether we hit the limit

This is probably fine, but worth noting that partial batches will be committed to ClickHouse.
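
For readers unfamiliar with the + 0xff cursor trick, here is a rough standalone illustration of the property it relies on, assuming keys are byte strings scanned in lexicographic order (the types and key format are made up for the example, not taken from the pruner):

```rust
// Rough sketch of the resume-cursor idea: appending 0xff to the last
// processed key yields a cursor that sorts after that key and all of its
// subkeys, so the next transaction's range scan starts strictly past the
// work already done.
fn resume_cursor(last_processed_key: &[u8]) -> Vec<u8> {
    let mut cursor = last_processed_key.to_vec();
    cursor.push(0xff);
    cursor
}

fn main() {
    let last = b"workflow/0042".to_vec();
    let cursor = resume_cursor(&last);
    assert!(cursor.as_slice() > last.as_slice());
    assert!(cursor.starts_with(&last));
    println!("resume scan from {:?}", cursor);
}
```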

Performance Considerations 👍

The change improves performance by:

  • Preventing unbounded transaction growth
  • Allowing work to be chunked across multiple smaller transactions
  • Reducing memory pressure and lock contention

The value of 10000 seems reasonable, though it is not documented why this specific number was chosen. Consider adding a comment explaining the rationale.

Security Concerns ✅

No security issues identified. This is a pure defensive programming change.

Test Coverage ❌

Missing test coverage:

  • No tests verify the new limit is enforced
  • No tests verify pagination continues correctly after hitting the limit
  • No tests verify behavior when exactly at the boundary

Recommendation: Add a test that:

  1. Creates more than 10000 workflows to prune
  2. Calls prune_workflows
  3. Verifies all workflows are eventually pruned (pagination works)
  4. Optionally verifies transaction batching behavior
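
Since standing up more than 10000 real workflows may be heavy, a cheaper complement is a pure simulation of the batching loop. A minimal runnable sketch along those lines (the in-memory model and all names are illustrative, not the real test harness or pruner API):

```rust
// Simulates the per-transaction cap and cursor-based resumption: prune at
// most MAX_PRUNES_PER_TXN entries per call, then resume from the returned
// cursor until nothing is left.
const MAX_PRUNES_PER_TXN: usize = 10_000;

fn prune_once(entries: &mut Vec<u64>, cursor: u64) -> (usize, u64) {
    let mut pruned = 0;
    let mut last = cursor;
    entries.retain(|&e| {
        if e >= cursor && pruned < MAX_PRUNES_PER_TXN {
            pruned += 1;
            last = e + 1; // resume strictly after the last pruned entry
            false
        } else {
            true
        }
    });
    (pruned, last)
}

#[test]
fn prunes_everything_across_batches() {
    let mut entries: Vec<u64> = (0..25_000).collect();
    let mut cursor = 0;
    loop {
        let (pruned, next) = prune_once(&mut entries, cursor);
        if pruned == 0 {
            break;
        }
        assert!(pruned <= MAX_PRUNES_PER_TXN);
        cursor = next;
    }
    assert!(entries.is_empty());
}
```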

Style & Conventions ✅

Follows project conventions per CLAUDE.md:

  • Constant naming follows existing pattern (EARLY_TXN_TIMEOUT)
  • No unnecessary formatting changes
  • Minimal, focused diff

Summary of Recommendations

Critical:

  1. Change > to >= on line 1382 to enforce the limit correctly

Nice to have:

  2. Add a debug log when the limit is reached
  3. Add a comment explaining why 10000 was chosen
  4. Add test coverage for the new limit

Overall, this is a solid defensive fix that addresses a real operational concern. The main issue is the off-by-one error in the boundary condition.
