Description
Bug report
Bug description:
Our lifetime management of JIT executors is ad-hoc and has some logical flaws.
This results in bugs, and is hard to fix as fixing one bug often seems to produce another.
Some examples:
Executor management overview
Executors are reference-counted PyObject objects, which allows Python code to interact with them.
For performance reasons, the references to currently executing executors are deferred (not counted).
Executors can be invalidated at any time, even when it would be unsafe to deallocate an object.
Invalidation means that the machine code for that executor is no longer valid. The executor object remains a valid object.
Invalidation
Invalidating an executor does not make it unreachable, but in order to free executors as soon as
possible, when an executor is made invalid, we delete references to all executors that are side exits
from the invalidated executor. The reference from the code object is also deleted.
Invalidation can happen at any time, so care must be taken not to break any invariants when
invalidating executors.
Refcount dropping to zero
Because references to currently executing executors are deferred, executors may still be live when
their reference count drops to zero.
To handle this, when an executor's refcount drops to zero, it should not be deallocated, but moved to a "zero count" list.
Also, deferring deallocation means that it is safe to call Py_DECREF() on an executor, even if it would not be safe to deallocate it.
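As a rough illustration of the "zero count" list idea, here is a minimal toy model in C. It is not the CPython implementation: zc_executor, zc_decref, zero_count_list and zc_zero_count_len are hypothetical names, and a singly linked list is used only to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an executor; NOT the real CPython struct. */
typedef struct zc_executor {
    int refcount;               /* counted (non-deferred) references only */
    bool executing;             /* true while a deferred reference is running it */
    struct zc_executor *next;   /* link in the zero-count list */
} zc_executor;

/* Hypothetical head of the "zero count" list. */
static zc_executor *zero_count_list = NULL;

/* Drop one counted reference. The executor is never freed here; it is
 * only parked on the zero-count list, so the call is safe even while
 * the executor is still being executed. */
static void zc_decref(zc_executor *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0) {
        e->next = zero_count_list;
        zero_count_list = e;
    }
}

/* Number of executors currently waiting on the zero-count list. */
static size_t zc_zero_count_len(void)
{
    size_t n = 0;
    for (zc_executor *e = zero_count_list; e != NULL; e = e->next) {
        n++;
    }
    return n;
}
```

In this model, dropping the last counted reference to an executor that is still running leaves the object intact and simply makes it discoverable later through zero_count_list.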
Cycles
Since cycles can be formed between executors, we need the cycle GC to handle executors.
When executors become invalid, we should break any cycles involving them.
Current implementation
The current implementation (2025-12-01) maintains a doubly linked list of all executors.
This list is scanned to invalidate executors when a dependency is broken, in _Py_Executors_InvalidateDependency and _Py_Executors_InvalidateAll.
There is also a second doubly linked list, executor_deletion_list.
When the reference count of an executor reaches zero, it is moved from the main list
to the deletion list. This helps the GC find any executors with a zero reference count quickly.
When the executor_deletion_list reaches a certain capacity, we clear it.
This is broken. We should only do this during cycle GC.
Note that invalidation and unreachability (being garbage) are two different things.
When an executor becomes invalid, it is useful to drop some references to and from it,
but it could still be being executed and may still be reachable, so its integrity as an
object must be maintained.
When an executor becomes unreachable, it should be GC'd like any other object, but
when its reference count drops to zero it may still be being executed and should be kept
intact.
Proposed implementation
Invalidation
- Detach the executor from the code object
- Clear all exits.
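The two invalidation steps above could look roughly like the following toy model. All names here (inv_executor, inv_code, inv_invalidate, INV_MAX_EXITS) are hypothetical, the fixed-size exit array is an assumption made for brevity, and the plain integer refcount only stands in for Py_DECREF().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define INV_MAX_EXITS 4   /* arbitrary size for the sketch */

typedef struct inv_executor inv_executor;

/* Toy code object holding at most one counted reference to an executor. */
typedef struct inv_code {
    inv_executor *executor;
} inv_code;

struct inv_executor {
    int refcount;
    bool valid;                           /* machine code still usable */
    inv_code *code;                       /* borrowed back-pointer, or NULL */
    inv_executor *exits[INV_MAX_EXITS];   /* counted references to side exits */
};

/* Stand-in for Py_DECREF(); never frees anything in this sketch. */
static void inv_decref(inv_executor *e)
{
    assert(e->refcount > 0);
    e->refcount--;
}

/* Invalidate an executor while keeping the object itself intact. */
static void inv_invalidate(inv_executor *e)
{
    if (!e->valid) {
        return;   /* already invalidated; nothing left to drop */
    }
    e->valid = false;

    /* Step 1: detach the executor from the code object. */
    if (e->code != NULL) {
        inv_code *code = e->code;
        e->code = NULL;
        if (code->executor == e) {
            code->executor = NULL;
            inv_decref(e);
        }
    }

    /* Step 2: clear all exits, dropping the references they held. */
    for (size_t i = 0; i < INV_MAX_EXITS; i++) {
        inv_executor *target = e->exits[i];
        if (target != NULL) {
            e->exits[i] = NULL;
            inv_decref(target);
        }
    }
}
```

Note that the side-exit executors are not themselves invalidated in this sketch; they only lose one reference, and the invalidated executor remains a well-formed object throughout.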
Refcount dropping to zero
- Move the executor to the executor_deletion_list
- Invalidate the executor
tp_clear
- Clear all exits and any constants
Change to the cycle GC
- Move all currently executing executors in the executor_deletion_list to the currently_executing_list, incrementing the reference count of each.
- Deallocate all executors in the executor_deletion_list
- Perform cycle GC as normal
- For all executors in the currently_executing_list:
  - decrement the refcount, which will move it back to the executor_deletion_list if its refcount drops to zero.
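The GC steps listed above can be sketched as a small toy model. This is only an illustration under stated assumptions: gc_executor, its executing and deallocated flags, gc_push, gc_decref and gc_collect_executors are hypothetical; the lists are singly linked rather than doubly linked; deallocation is modelled by setting a flag; and the actual cycle collection pass is left as a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an executor; NOT the real CPython struct. */
typedef struct gc_executor {
    int refcount;
    bool executing;      /* hypothetical: a deferred reference is running it */
    bool deallocated;    /* stands in for actually freeing the memory */
    struct gc_executor *next;
} gc_executor;

static gc_executor *executor_deletion_list = NULL;
static gc_executor *currently_executing_list = NULL;

static void gc_push(gc_executor **list, gc_executor *e)
{
    e->next = *list;
    *list = e;
}

/* A refcount reaching zero parks the executor on the deletion list. */
static void gc_decref(gc_executor *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0) {
        gc_push(&executor_deletion_list, e);
    }
}

static void gc_collect_executors(void)
{
    /* Steps 1 and 2: running executors are moved to
     * currently_executing_list with a temporary reference;
     * everything else on the deletion list is deallocated. */
    gc_executor *pending = executor_deletion_list;
    executor_deletion_list = NULL;
    while (pending != NULL) {
        gc_executor *e = pending;
        pending = e->next;
        if (e->executing) {
            e->refcount++;
            gc_push(&currently_executing_list, e);
        }
        else {
            e->next = NULL;
            e->deallocated = true;
        }
    }

    /* Step 3: the normal cycle GC pass would run here (omitted). */

    /* Step 4: drop the temporary references. An executor whose count
     * returns to zero lands back on executor_deletion_list. */
    gc_executor *running = currently_executing_list;
    currently_executing_list = NULL;
    while (running != NULL) {
        gc_executor *e = running;
        running = e->next;
        gc_decref(e);
    }
}
```

In this model an executor that is still running survives the collection with a zero refcount and stays on executor_deletion_list, so a later collection can deallocate it once it has stopped executing.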