Add worker crash recovery to ProcessParallelController #395 by jhewers-pf · Pull Request #430 · algorithmicsuperintelligence/openevolve

jhewers-pf · 2026-03-04T14:59:41Z

When a child process in the ProcessPoolExecutor crashes (OOM, segfault, etc.), Python raises BrokenExecutor and the pool becomes unusable. Previously, this exception was caught by the generic Exception handler and logged, after which the evolution process failed silently.
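As a minimal sketch of why the generic handler hid the problem (this is not the PR's code, and `classify_failure` is a hypothetical helper): `ProcessPoolExecutor` reports a dead worker with `BrokenProcessPool`, which subclasses `BrokenExecutor`, which in turn subclasses `Exception`. A dedicated `except BrokenExecutor` clause therefore has to come before `except Exception`, or the two failure kinds are indistinguishable.

```python
from concurrent.futures import BrokenExecutor
from concurrent.futures.process import BrokenProcessPool


def classify_failure(get_result):
    """Call get_result() and report how it failed (hypothetical helper).

    get_result stands in for something like future.result().
    """
    try:
        get_result()
        return "ok"
    except BrokenExecutor:
        # The pool itself is unusable; every pending future fails the same way.
        # BrokenProcessPool is caught here because it subclasses BrokenExecutor.
        return "pool_broken"
    except Exception:
        # An ordinary per-iteration error; the pool is still healthy.
        return "iteration_error"
```

Swapping the two `except` clauses would send every broken-pool error down the generic branch, which matches the silent failure described above.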

Changes

- Add explicit BrokenExecutor exception handling in run_evolution()
- Add _recover_process_pool() method that gracefully shuts down the broken executor, runs garbage collection, waits briefly for system stabilization, and recreates the pool
- Re-queue all pending iterations after recovery
- Track recovery attempts with a limit of 3 consecutive failures to prevent infinite loops
- Reset recovery counter after successful iterations (only consecutive crashes count toward the limit)
- Propagate BrokenExecutor from _submit_iteration() for centralized handling
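The pool-recovery step in this list could look roughly like the sketch below. It is an illustration under stated assumptions, not the PR's `_recover_process_pool()`: the free function `recover_process_pool`, its parameters, and the `executor_factory` hook (added so the logic can be exercised without real worker processes) are all hypothetical. The `cancel_futures` argument to `shutdown()` requires Python 3.9 or later, and using `wait=False` is an assumption about how to avoid blocking on dead workers.

```python
import gc
import time
from concurrent.futures import ProcessPoolExecutor


def recover_process_pool(broken_executor, max_workers, settle_seconds=2.0,
                         executor_factory=ProcessPoolExecutor):
    """Discard a broken executor and return a fresh one (illustrative sketch)."""
    try:
        # Crashed workers will never finish, so do not block waiting for them;
        # cancel_futures drops work that was queued but not yet started.
        broken_executor.shutdown(wait=False, cancel_futures=True)
    except Exception:
        # Shutting down an already-broken pool may itself fail; recovery
        # should proceed regardless.
        pass
    # Release references held by the old pool before starting new workers.
    gc.collect()
    # Brief pause so the OS can reclaim memory from the dead processes.
    time.sleep(settle_seconds)
    return executor_factory(max_workers=max_workers)
```

The 2-second default mirrors the wait mentioned in the description; the right value likely depends on how quickly the host reclaims memory after an OOM kill.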

Behavior
When a worker crashes:

1. Detect BrokenExecutor exception
2. Collect all pending iteration numbers
3. Shut down broken pool, run GC, wait 2s
4. Recreate fresh pool
5. Re-queue failed iterations
6. Continue evolution

If 3 crashes occur without any successful iterations in between, evolution stops gracefully.
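The control flow described in this section can be sketched as follows. This is a simplified model rather than the PR's `run_evolution()`: `run_with_recovery`, `make_pool`, and `task` are hypothetical names, the GC and wait step is omitted for brevity, and stopping on the third consecutive crash (rather than after a third recovery attempt) is one reading of the stated limit.

```python
from collections import deque
from concurrent.futures import BrokenExecutor

MAX_RECOVERY_ATTEMPTS = 3  # limit stated in the PR description


def run_with_recovery(iterations, make_pool, task):
    """Run task(i) for each iteration, rebuilding the pool after a crash.

    Returns (results, finished); finished is False if the crash limit was hit.
    """
    queue = deque(iterations)
    pool = make_pool()
    results = {}
    consecutive_crashes = 0
    while queue:
        pending = {}  # iteration number -> future
        try:
            while queue:
                # Submit before popping so a failed submit loses nothing.
                pending[queue[0]] = pool.submit(task, queue[0])
                queue.popleft()
            for iteration in list(pending):
                results[iteration] = pending[iteration].result()
                del pending[iteration]
                consecutive_crashes = 0  # any success resets the counter
        except BrokenExecutor:
            consecutive_crashes += 1
            # Re-queue every unfinished iteration at the front, in order.
            queue.extendleft(reversed(list(pending)))
            pool.shutdown(wait=False)
            if consecutive_crashes >= MAX_RECOVERY_ATTEMPTS:
                return results, False  # stop gracefully
            pool = make_pool()
    pool.shutdown()
    return results, True
```

Because the pool is injected through `make_pool`, the loop can be tried with a `ThreadPoolExecutor` and a task that raises `BrokenProcessPool` on purpose, which avoids having to kill a real worker process.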

Recreation of #395

jhewers-pf added 2 commits March 4, 2026 14:57

fix: Add worker crash recovery to ProcessParallelController

cbd4ec3

chore: Test worker crash recovery

8ce71c1

jhewers-pf wants to merge 2 commits into algorithmicsuperintelligence:main from jhewers-pf:fix/worker_crash_recovery
