
Conversation

@bjee19
Contributor

@bjee19 bjee19 commented Dec 10, 2025

Proposed changes

Add support for multiple InferencePool backends on a Route.

Problem: A route should be able to have multiple InferencePools in its backendRefs.

Solution: Add support for multiple InferencePool backends, with logic to deduplicate the generated inference maps.

Testing: Added unit tests and enabled the corresponding GatewayWeightedAcrossTwoInferencePools conformance test. Manually tested multiple InferencePool backends with and without HTTP matches.

Closes #4192

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

Release notes

If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.

Add support for multiple InferencePool backends on a Route. 

@github-actions github-actions bot added enhancement New feature or request tests Pull requests that update tests labels Dec 10, 2025
@bjee19
Contributor Author

bjee19 commented Dec 10, 2025

Still doing some testing; just wanted to run the pipeline. Will promote to a ready-for-review PR once cleaned up.

@bjee19 bjee19 force-pushed the enh/inference-extension-multiple-backendrefs branch from e611f14 to a8cbd36 Compare December 10, 2025 23:37
@bjee19 bjee19 changed the title Add support for multiple inferencepool backends Add support for multiple InferencePool backends Dec 10, 2025
@codecov

codecov bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 98.19005% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.20%. Comparing base (7046732) to head (6127cbf).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
internal/controller/nginx/config/servers.go 98.80% 1 Missing and 1 partial ⚠️
internal/controller/nginx/config/split_clients.go 88.88% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4439      +/-   ##
==========================================
+ Coverage   86.07%   86.20%   +0.12%     
==========================================
  Files         132      132              
  Lines       14389    14525     +136     
  Branches       35       35              
==========================================
+ Hits        12386    12521     +135     
- Misses       1793     1794       +1     
  Partials      210      210              


@bjee19 bjee19 marked this pull request as ready for review December 11, 2025 18:25
@bjee19 bjee19 requested a review from a team as a code owner December 11, 2025 18:25
@bjee19 bjee19 requested a review from Copilot December 11, 2025 18:42

Copilot AI left a comment


Pull request overview

This PR adds support for multiple InferencePool backends on a Route, enabling weighted traffic distribution across inference backends. Previously, routes were limited to a single InferencePool backend per rule.

Key Changes:

  • Removed restriction preventing multiple InferencePool backends in a single rule
  • Added validation to prevent mixing InferencePool and non-InferencePool backends
  • Implemented deduplication of inference maps to handle multiple backends efficiently

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/Makefile Enabled GatewayWeightedAcrossTwoInferencePools conformance test and added --ignore-not-found flags for cleanup commands
internal/controller/state/graph/httproute_test.go Added comprehensive test cases for multiple weighted InferencePool backends with and without HTTP matches
internal/controller/state/graph/httproute.go Replaced single-backend restriction with validation for mixed backend types and added checkForMixedBackendTypes function
internal/controller/nginx/config/split_clients_test.go Added test cases for inference backends with endpoint picker configs and split client value generation
internal/controller/nginx/config/split_clients.go Updated split client generation to support inference backend groups with specialized variable naming
internal/controller/nginx/config/servers_test.go Added extensive test coverage for multiple inference backend scenarios with various match conditions
internal/controller/nginx/config/servers.go Refactored location generation to support multiple inference backends with proper EPP and proxy pass locations
internal/controller/nginx/config/maps_test.go Added test cases for unique backend deduplication and failure mode verification
internal/controller/nginx/config/maps.go Implemented deduplication logic using a map to prevent duplicate inference backend entries


{
Path: "= /inference-match",
Type: http.RedirectLocationType,
HTTPMatchKey: "1_1",

Copilot AI Dec 11, 2025


The HTTPMatchKey is "1_1" but the test name and context suggest this is for pathRuleInferenceWithMatch which has rule index 1. However, in the mixed test case at line 2875, the same pathRuleInferenceWithMatch also uses HTTPMatchKey "1_1". This creates ambiguity since both test cases reference the same match key but they represent different scenarios in the configuration. The HTTPMatchKey should be unique per path rule and match combination to avoid potential configuration conflicts.

// This ensures the correct name gets generated to correlate with the split clients generation.
// If there is only one backend, this is effectively a no-op.
tempRule := r
tempRule.BackendGroup.Backends = []dataplane.Backend{b}

Copilot AI Dec 11, 2025


Creating a shallow copy of the rule and modifying its BackendGroup.Backends slice could lead to unexpected behavior if other parts of the code assume the BackendGroup structure is immutable. Consider using a deep copy or creating a new MatchRule with only the necessary fields to make the intent clearer and avoid potential side effects.

Suggested change
tempRule.BackendGroup.Backends = []dataplane.Backend{b}
// Deep copy BackendGroup with only the desired backend
tempRule.BackendGroup = &dataplane.BackendGroup{
	Source:   r.BackendGroup.Source,
	RuleIdx:  r.BackendGroup.RuleIdx,
	Backends: []dataplane.Backend{b},
}

Collaborator

@sjberman sjberman left a comment


Great work on this. It really is a complex mess to build all of these locations, and I'm hopeful in the future we can improve it, potentially with improvements in NGINX where we don't need the NJS matching module, as well as potentially the inference Rust module to skip the inference nested locations.

Can you verify that if a ClientSettingsPolicy with maxSize is set, that it gets propagated into every location down the chain?

Comment on lines +321 to +322
location /_ngf-internal-upstreamNamePrimary-routeNS-routeName-rule0 {
set $epp_internal_path /_ngf-internal-proxy-pass-rule0-route0-backend0-inference;
Collaborator


Should have internal;?

Contributor Author


Yep, I have the conf pulled up and verified it has internal; too.

Comment on lines +331 to +332
location /_ngf-internal-upstreamNameSecondary-routeNS-routeName-rule0 {
set $epp_internal_path /_ngf-internal-proxy-pass-rule0-route0-backend1-inference;
Collaborator


Should have internal;?

External location calls httpmatch NJS module. The module determines the HTTP request conditions that exist
and which backend to use, then redirects to an internal location which will rewrite to another internal inference
location based on a split clients variable. That internal inference location calls the inference NJS module
to get the AI endpoint to proxy to, then redirects to the internal location that proxies to the backend.
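The chain described above can be sketched as a hand-written config. This is a hedged illustration only: the location names, variables, and NJS function names (httpmatches.redirect, inference.pick_endpoint) are hypothetical, not the exact ones NGF generates.

```nginx
# 1. External location: the httpmatch NJS module evaluates the request
#    conditions and redirects to the matching internal location.
location /inference-match {
    js_content httpmatches.redirect;
}

# 2. Internal location: a split_clients variable carries the weighted
#    choice of inference backend; rewrite to that internal location.
location /_ngf-internal-rule0 {
    internal;
    rewrite ^ $inference_split_variable last;
}

# 3. Internal inference location: the inference NJS module asks the EPP
#    for the AI endpoint, then redirects to the proxy-pass location.
location /_ngf-internal-pool-a-inference {
    internal;
    js_content inference.pick_endpoint;
}

# 4. Final internal location: proxies to the chosen endpoint.
location /_ngf-internal-proxy-pass-rule0-backend0 {
    internal;
    proxy_pass http://$inference_backend_endpoint;
}
```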
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish this was less insane lol, but unfortunately it seems necessary right now.

ruleIdx int,
) http.Location {
return http.Location{
Path: inferencePath(pathruleIdx, matchRuleIdx),
Collaborator


We're losing pathruleIdx here, couldn't that mean that two different pathrules could collide in naming?

Contributor Author


Good catch, I need to update a comment somewhere about the naming. The ruleIdx used here is the backendGroup ruleIdx, which is the index of the rule on a specific HTTPRoute.

That is different than the pathRuleIdx, which indexes a collapsed data structure spanning all rules on this path, which is really confusing.

But tl;dr, I don't think so, because we attach both the UpstreamName and the HTTPRoute NsName, along with the rule idx in the HTTPRoute. With the HTTPRoute NsName and the ruleIdx of the HTTPRoute, alongside the unique UpstreamName, I think that guarantees unique naming. The only scenario I could see is something like a backendGroup with multiple of the same backend, which doesn't make sense to do.

Contributor Author


And we can't keep pathruleIdx, because in the split_clients.go file I couldn't find a way for it to access that field from the backendGroup.

Collaborator


Hm...couldn't you have a route with multiple rules with different matching conditions, with splitting across the same inferencepool backends?

It's weird, but trying to think if there is a scenario where a collision could actually happen.

Contributor Author


Hm, yeah you're right, a route with even a single rule but multiple matching conditions will run into this case. Let me see what I can do.


Labels

enhancement New feature or request release-notes tests Pull requests that update tests

Projects

Status: 🆕 New

Development

Successfully merging this pull request may close these issues.

Support multiple backend refs when ref is an InferencePool

3 participants