Fix supervisor: report vault execution so stuck-scan order isn't fixed #187

Open
holyfuchs wants to merge 13 commits into main from holyfuchs/supervisor-fix
Conversation


@holyfuchs (Member) commented Feb 25, 2026

Closes: #177

Description

The supervisor's “check the first N vaults” logic previously scanned the same fixed set of vaults on every run. Vault executions are now reported to the registry, which maintains a list ordered by “least recently executed”. The supervisor then scans only the first N vaults in that order (e.g. 5) and recovers the ones that are actually stuck, instead of always checking the same fixed set.

What was implemented

  • Execution callback
    Each AutoBalancer now has an execution callback that runs after a scheduled rebalance. The callback calls the registry with that vault’s id so the registry can update its internal order (remove id from the list, append to the end).

  • Shared callback resource
    In FlowYieldVaultsAutoBalancers, a single RegistryReportCallback resource per account implements DeFiActions.AutoBalancerExecutionCallback. Its onExecuted(balancerUUID) calls the registry so the vault that just ran is reported by id. Every new AutoBalancer gets a capability to this shared callback and passes it to setExecutionCallback(cap).
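The callback wiring described above can be sketched roughly as follows. This is a hypothetical sketch with simplified signatures: `RegistryReportCallback`, `onExecuted(balancerUUID)`, and `setExecutionCallback(cap)` are named in this PR, but the precise `DeFiActions` interface and the balancer-UUID-to-vault-id mapping are assumptions here.

```cadence
// Hypothetical sketch, not the PR's exact code.
access(all) resource RegistryReportCallback: DeFiActions.AutoBalancerExecutionCallback {

    /// Runs after a scheduled rebalance so the registry can move the
    /// vault to the "most recently executed" end of its scan order.
    access(all) fun onExecuted(balancerUUID: UInt64) {
        // Assumes the balancer UUID is the id the registry tracks.
        FlowYieldVaultsSchedulerRegistry.reportExecution(yieldVaultID: balancerUUID)
    }
}

// One shared instance lives in the FlowYieldVaultsAutoBalancers account;
// every new AutoBalancer is handed a capability to it:
//   balancer.setExecutionCallback(cap)
```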

Context (from discussion)

The supervisor was limited to processing a small batch (e.g. the first 5 vaults) per run. The agreed short-term approach was to order the vault list by “last executed” so the supervisor always checks the oldest / least recently executed vaults first, since those are the most likely to be stuck.

@holyfuchs force-pushed the holyfuchs/supervisor-fix branch from a6cce3f to b2af175 February 25, 2026 20:16
@holyfuchs holyfuchs changed the title feat(scheduler): FlowAutoBalancer + stuck-scan order for Supervisor Fix supervisor: report vault execution so stuck-scan order isn't fixed Feb 25, 2026
@holyfuchs holyfuchs self-assigned this Mar 2, 2026
@holyfuchs holyfuchs marked this pull request as ready for review March 3, 2026 19:44
@holyfuchs holyfuchs requested a review from a team as a code owner March 3, 2026 19:44

@holyfuchs (Member Author) commented Mar 4, 2026

Failing tests are due to rounding issues and should be fixed by onflow/FlowALP#188.

Just for info:
This PR required changes in FlowActions.
FlowYieldVaults includes FlowALP as a submodule, which in turn includes FlowActions as a submodule, hence the need to update FlowALP.
But since FlowALP's main branch contains changes that are currently incompatible with FlowYieldVaults, we are using a different branch.

@holyfuchs (Member Author) commented

Failing tests need: #182

@Kay-Zee (Member) left a comment:
LGTM, but you mentioned it still requires FlowActions changes? Or are those resolved?

if !(self.yieldVaultRegistry[yieldVaultID] ?? false) {
return
}
if let index = self.stuckScanOrder.firstIndex(of: yieldVaultID) {
A Member commented on the snippet above:
Not that there's a solution, but I wonder what the size limit of this array will be before we see problems, since we're just doing array ops.

@holyfuchs (Member Author) replied Mar 10, 2026:
There is a "solution".
In the most recent commit I used a linked list.
I'm not really happy about implementing a linked list in Cadence, but it gets execution time down to O(1).
If you prefer the old version without the added complexity, we can always revert that commit.
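For context, the O(1) reorder works roughly like this. The block below is a hypothetical sketch, not the PR's exact code; it follows the `ListNode` convention quoted in the review thread, where `prev` points toward the head (most recently executed) and `next` toward the tail (least recently executed). The contract name and all identifiers are illustrative.

```cadence
// Hypothetical sketch: a dictionary-backed doubly-linked list that
// reorders a vault in O(1) when its execution is reported.
access(all) contract StuckScanOrderSketch {

    access(all) struct ListNode {
        access(all) var prev: UInt64?  // toward head (most recently executed)
        access(all) var next: UInt64?  // toward tail (least recently executed)

        init(prev: UInt64?, next: UInt64?) {
            self.prev = prev
            self.next = next
        }

        // Cadence struct fields are only writable from within the
        // struct, so neighbors are patched through setters.
        access(all) fun setPrev(_ id: UInt64?) { self.prev = id }
        access(all) fun setNext(_ id: UInt64?) { self.next = id }
    }

    access(self) var nodes: {UInt64: ListNode}
    access(self) var head: UInt64?
    access(self) var tail: UInt64?

    /// O(1): unlink the vault's node, then re-insert it at the head.
    access(account) fun reportExecution(yieldVaultID: UInt64) {
        if let node = self.nodes[yieldVaultID] {
            // Structs are value types, so read-modify-write neighbors.
            if let prevID = node.prev {
                var p = self.nodes[prevID]!
                p.setNext(node.next)
                self.nodes[prevID] = p
            } else {
                self.head = node.next
            }
            if let nextID = node.next {
                var n = self.nodes[nextID]!
                n.setPrev(node.prev)
                self.nodes[nextID] = n
            } else {
                self.tail = node.prev
            }
        }
        // Re-insert as most recently executed.
        self.nodes[yieldVaultID] = ListNode(prev: nil, next: self.head)
        if let oldHead = self.head {
            var h = self.nodes[oldHead]!
            h.setPrev(yieldVaultID)
            self.nodes[oldHead] = h
        }
        self.head = yieldVaultID
        if self.tail == nil {
            self.tail = yieldVaultID
        }
    }

    init() {
        self.nodes = {}
        self.head = nil
        self.tail = nil
    }
}
```

The supervisor then walks at most N nodes from `tail`, so the least recently executed (most likely stuck) vaults are always scanned first.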

@holyfuchs force-pushed the holyfuchs/supervisor-fix branch from 293dd3a to c7fa0e6 ("added autobalancer callback to find potentially stuck vaults") March 10, 2026 05:03
@liobrasil (Contributor) commented:

I do have this PR against @holyfuchs's actual branch: #207


/// Node in the simulated doubly-linked list used for O(1) stuck-scan ordering.
/// `prev` points toward the head (most recently executed); `next` points toward the tail (oldest/least recently executed).
access(all) struct ListNode {
A Member commented on the `ListNode` snippet above:
Nice work with the doubly linked list!
Minor comment: I think we can extract it from this contract into a utility contract for cleaner code. I asked Claude; the decision is up to you:

The core list logic is already self-contained. The only coupling is in getStuckScanCandidates, which touches scheduleCaps and calls dequeuePending. The node operations themselves touch nothing external.

Here's the clean extraction boundary:

┌──────────────────────────────────────────────────────────────┐
│  NEW: UInt64LinkedList contract  (or resource)               │
│                                                              │
│  struct ListNode { prev: UInt64?, next: UInt64? }            │
│  var nodes: {UInt64: ListNode}                               │
│  var head: UInt64?                                           │
│  var tail: UInt64?                                           │
│                                                              │
│  access(account) fun insertAtHead(id: UInt64)                │
│  access(account) fun remove(id: UInt64): Bool                │
│  access(all) view fun contains(id: UInt64): Bool             │
│  access(all) view fun tailWalk(limit: UInt): [UInt64]        │
└──────────────────────────────────────────────────────────────┘
          ▲ called by
┌──────────────────────────────────────────────────────────────┐
│  FlowYieldVaultsSchedulerRegistry  (unchanged business API)  │
│                                                              │
│  register() → list.insertAtHead(id) if participates          │
│  unregister() → list.remove(id)                              │
│  reportExecution() → list.remove(id); list.insertAtHead(id)  │
│  getStuckScanCandidates() → list.tailWalk() + prune logic    │
│    (coupling to scheduleCaps + dequeuePending stays here)    │
└──────────────────────────────────────────────────────────────┘

As a resource (cleanest Cadence idiom):

access(all) contract UInt64LinkedList {
    access(all) struct ListNode {
        access(all) var prev: UInt64?
        access(all) var next: UInt64?
        // ... setters ...
    }

    access(all) resource List {
        access(all) var nodes: {UInt64: ListNode}
        access(all) var head: UInt64?
        access(all) var tail: UInt64?

        access(all) fun insertAtHead(id: UInt64) { ... }  // O(1)
        access(all) fun remove(id: UInt64): Bool { ... }  // O(1)
        access(all) view fun contains(id: UInt64): Bool { ... }
    }

    access(all) fun create(): @List {
        return <- create List(...)
    }
}

SchedulerRegistry stores one @UInt64LinkedList.List resource and delegates node operations to it. Changes required in SchedulerRegistry: replace the three fields and two private functions with a single @UInt64LinkedList.List field — roughly a 40-line reduction. The getStuckScanCandidates signature and behavior are unchanged.

The only decision is whether to use a resource (idiomatic, but requires the registry to store it) or a separate contract with its own module-level state (simpler to call, but less composable). The resource approach is preferred since it makes the list lifetime explicit and lets you instantiate multiple lists per contract if needed later.


Development

Successfully merging this pull request may close these issues.

Supervisor is only supervising 5 vaults
