Add support for multiple InferencePool backends #4439
Conversation
Still doing some testing; just wanted to run the pipeline. Will promote to a ready-to-review PR when cleaned up.
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #4439      +/-   ##
==========================================
+ Coverage   86.07%   86.20%   +0.12%
==========================================
  Files         132      132
  Lines       14389    14525     +136
  Branches       35       35
==========================================
+ Hits        12386    12521     +135
- Misses       1793     1794       +1
  Partials      210      210
```
Pull request overview
This PR adds support for multiple InferencePool backends on a Route, enabling weighted traffic distribution across inference backends. Previously, routes were limited to a single InferencePool backend per rule.
Key Changes:
- Removed restriction preventing multiple InferencePool backends in a single rule
- Added validation to prevent mixing InferencePool and non-InferencePool backends
- Implemented deduplication of inference maps to handle multiple backends efficiently
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `tests/Makefile` | Enabled `GatewayWeightedAcrossTwoInferencePools` conformance test and added `--ignore-not-found` flags for cleanup commands |
| `internal/controller/state/graph/httproute_test.go` | Added comprehensive test cases for multiple weighted InferencePool backends with and without HTTP matches |
| `internal/controller/state/graph/httproute.go` | Replaced single-backend restriction with validation for mixed backend types and added `checkForMixedBackendTypes` function |
| `internal/controller/nginx/config/split_clients_test.go` | Added test cases for inference backends with endpoint picker configs and split-client value generation |
| `internal/controller/nginx/config/split_clients.go` | Updated split-client generation to support inference backend groups with specialized variable naming |
| `internal/controller/nginx/config/servers_test.go` | Added extensive test coverage for multiple inference backend scenarios with various match conditions |
| `internal/controller/nginx/config/servers.go` | Refactored location generation to support multiple inference backends with proper EPP and proxy-pass locations |
| `internal/controller/nginx/config/maps_test.go` | Added test cases for unique backend deduplication and failure-mode verification |
| `internal/controller/nginx/config/maps.go` | Implemented deduplication logic using a map to prevent duplicate inference backend entries |
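The deduplication described for `maps.go` can be sketched roughly as follows. This is a minimal illustration with hypothetical types and names, not the actual implementation in `internal/controller/nginx/config/maps.go`:

```go
package main

import "fmt"

// inferenceMap is a simplified stand-in for a generated inference map entry.
type inferenceMap struct {
	Variable string
}

// dedupeMaps keeps the first entry for each map variable, dropping duplicates
// that arise when multiple backends reference the same InferencePool.
func dedupeMaps(maps []inferenceMap) []inferenceMap {
	seen := make(map[string]struct{}, len(maps))
	out := make([]inferenceMap, 0, len(maps))
	for _, m := range maps {
		if _, ok := seen[m.Variable]; ok {
			continue // duplicate inference map entry; skip it
		}
		seen[m.Variable] = struct{}{}
		out = append(out, m)
	}
	return out
}

func main() {
	maps := []inferenceMap{
		{Variable: "pool_a"},
		{Variable: "pool_b"},
		{Variable: "pool_a"},
	}
	fmt.Println(len(dedupeMaps(maps))) // prints 2
}
```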
```go
{
	Path:         "= /inference-match",
	Type:         http.RedirectLocationType,
	HTTPMatchKey: "1_1",
```
Copilot AI · Dec 11, 2025
The HTTPMatchKey is "1_1" but the test name and context suggest this is for pathRuleInferenceWithMatch which has rule index 1. However, in the mixed test case at line 2875, the same pathRuleInferenceWithMatch also uses HTTPMatchKey "1_1". This creates ambiguity since both test cases reference the same match key but they represent different scenarios in the configuration. The HTTPMatchKey should be unique per path rule and match combination to avoid potential configuration conflicts.
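For illustration, the `"1_1"` key appears to follow a `ruleIdx_matchIdx` pattern; a hypothetical helper (not the project's actual function) shows why the key alone cannot distinguish two scenarios that share the same pair of indices:

```go
package main

import "fmt"

// httpMatchKey sketches how a key like "1_1" could be derived. If two
// different configuration scenarios reuse the same (ruleIdx, matchIdx)
// pair, they produce identical keys, which is the ambiguity noted above.
func httpMatchKey(ruleIdx, matchIdx int) string {
	return fmt.Sprintf("%d_%d", ruleIdx, matchIdx)
}

func main() {
	fmt.Println(httpMatchKey(1, 1)) // prints 1_1
}
```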
```go
// This ensures the correct name gets generated to correlate with the split clients generation.
// If there is only one backend, this is effectively a no-op.
tempRule := r
tempRule.BackendGroup.Backends = []dataplane.Backend{b}
```
Copilot AI · Dec 11, 2025
Creating a shallow copy of the rule and modifying its BackendGroup.Backends slice could lead to unexpected behavior if other parts of the code assume the BackendGroup structure is immutable. Consider using a deep copy or creating a new MatchRule with only the necessary fields to make the intent clearer and avoid potential side effects.
Suggested change:

```diff
-	tempRule.BackendGroup.Backends = []dataplane.Backend{b}
+	// Deep copy BackendGroup with only the desired backend
+	tempRule.BackendGroup = &dataplane.BackendGroup{
+		Source:   r.BackendGroup.Source,
+		RuleIdx:  r.BackendGroup.RuleIdx,
+		Backends: []dataplane.Backend{b},
+	}
```
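The shallow-copy concern can be demonstrated with a minimal, self-contained sketch (simplified stand-in types, not the actual `dataplane` package): reassigning the slice header on a struct copy leaves the original intact, but mutating an element through the copy aliases the original backing array.

```go
package main

import "fmt"

// Simplified stand-ins for the dataplane types.
type BackendGroup struct {
	Backends []string
}

type MatchRule struct {
	BackendGroup BackendGroup
}

func main() {
	r := MatchRule{BackendGroup: BackendGroup{Backends: []string{"a", "b"}}}

	// Reassigning the slice header on a struct copy is safe:
	// the original still sees both backends.
	tempRule := r
	tempRule.BackendGroup.Backends = []string{"a"}
	fmt.Println(len(r.BackendGroup.Backends)) // prints 2

	// Mutating an element through a copied header aliases the original.
	temp2 := r
	temp2.BackendGroup.Backends[0] = "mutated"
	fmt.Println(r.BackendGroup.Backends[0]) // prints mutated
}
```

This is why the review suggests a deep copy when the copied rule might outlive the loop iteration or be mutated element-wise.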
sjberman left a comment
Great work on this. It really is a complex mess to build all of these locations, and I'm hopeful we can improve it in the future: potentially with improvements in NGINX so we don't need the NJS matching module, and potentially with the inference Rust module so we can skip the nested inference locations.
Can you verify that if a ClientSettingsPolicy with maxSize is set, it gets propagated into every location down the chain?
```nginx
location /_ngf-internal-upstreamNamePrimary-routeNS-routeName-rule0 {
    set $epp_internal_path /_ngf-internal-proxy-pass-rule0-route0-backend0-inference;
```
Should have internal;?
Yep, I have the conf pulled up and verified it has `internal` too.
```nginx
location /_ngf-internal-upstreamNameSecondary-routeNS-routeName-rule0 {
    set $epp_internal_path /_ngf-internal-proxy-pass-rule0-route0-backend1-inference;
```
Should have internal;?
```go
// External location calls the httpmatch NJS module. The module determines the HTTP request conditions that exist
// and which backend to use, then redirects to an internal location which will rewrite to another internal inference
// location based on a split clients variable. That internal inference location calls the inference NJS module
// to get the AI endpoint to proxy to, then redirects to the internal location that proxies to the backend.
```
I wish this was less insane lol, but unfortunately it seems necessary right now.
```go
	ruleIdx int,
) http.Location {
	return http.Location{
		Path: inferencePath(pathruleIdx, matchRuleIdx),
```
We're losing pathruleIdx here, couldn't that mean that two different pathrules could collide in naming?
Good catch, I need to update a comment somewhere about the naming. The ruleIdx used here is the backendGroup ruleIdx, which is the index of the rule on a specific HTTPRoute.
That is different from the pathRuleIdx, which comes from a collapsed data structure spanning all rules on this path, which is really confusing.
TL;DR: I don't think so, because we attach both the UpstreamName and the HTTPRoute NsName, along with the rule idx in the HTTPRoute. With the httprouteNsName and the ruleIdx of the HTTPRoute, alongside the unique UpstreamName, I think that guarantees unique naming. The only scenario I could see is something like a backendGroup with multiple of the same backend, which doesn't make sense to do.
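The uniqueness argument here can be sketched as a hypothetical helper. The name format mirrors the `/_ngf-internal-upstreamName-routeNS-routeName-ruleN` paths seen in the diff, but the exact function and its signature are assumptions for illustration only (and, as the rest of the thread shows, the argument turns out to have a gap):

```go
package main

import "fmt"

// internalLocationName sketches the naming described above: combining the
// unique upstream name, the HTTPRoute's namespaced name, and the HTTPRoute
// rule index.
func internalLocationName(upstreamName, routeNsName string, ruleIdx int) string {
	return fmt.Sprintf("/_ngf-internal-%s-%s-rule%d", upstreamName, routeNsName, ruleIdx)
}

func main() {
	fmt.Println(internalLocationName("upstreamNamePrimary", "routeNS-routeName", 0))
	// prints /_ngf-internal-upstreamNamePrimary-routeNS-routeName-rule0
}
```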
And we can't keep pathruleIdx because in split_clients.go I couldn't find a way to access that field from the backendGroup.
Hm...couldn't you have a route with multiple rules with different matching conditions, with splitting across the same inferencepool backends?
It's weird, but trying to think if there is a scenario where a collision could actually happen.
Hm, yeah, you're right: a route with even a single rule but multiple matching conditions will run into this case. Let me see what I can do.
Proposed changes
Add support for multiple InferencePool backends on a Route.
Problem: A route should be able to have multiple InferencePools in its backendRefs.
Solution: Add support for multiple InferencePool backends. Added logic to remove duplicated inference maps.
Testing: Added unit tests and enabled the correlating `GatewayWeightedAcrossTwoInferencePools` conformance test. Manually tested scenarios with multiple InferencePool backends, with and without HTTP matches.

Closes #4192
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.