Add support for multiple InferencePool backends #4439
Conversation
Still doing some testing; just wanted to run the pipeline. Will promote to a ready-to-review PR when cleaned up.
Codecov Report

❌ Patch coverage is

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #4439      +/-   ##
==========================================
+ Coverage   86.07%   86.20%   +0.12%
==========================================
  Files         132      132
  Lines       14389    14525     +136
  Branches       35       35
==========================================
+ Hits        12386    12521     +135
- Misses       1793     1794       +1
  Partials      210      210
```
Pull request overview
This PR adds support for multiple InferencePool backends on a Route, enabling weighted traffic distribution across inference backends. Previously, routes were limited to a single InferencePool backend per rule.
Key Changes:
- Removed restriction preventing multiple InferencePool backends in a single rule
- Added validation to prevent mixing InferencePool and non-InferencePool backends
- Implemented deduplication of inference maps to handle multiple backends efficiently
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `tests/Makefile` | Enabled `GatewayWeightedAcrossTwoInferencePools` conformance test and added `--ignore-not-found` flags for cleanup commands |
| `internal/controller/state/graph/httproute_test.go` | Added comprehensive test cases for multiple weighted InferencePool backends with and without HTTP matches |
| `internal/controller/state/graph/httproute.go` | Replaced single-backend restriction with validation for mixed backend types and added `checkForMixedBackendTypes` function |
| `internal/controller/nginx/config/split_clients_test.go` | Added test cases for inference backends with endpoint picker configs and split-client value generation |
| `internal/controller/nginx/config/split_clients.go` | Updated split-client generation to support inference backend groups with specialized variable naming |
| `internal/controller/nginx/config/servers_test.go` | Added extensive test coverage for multiple inference backend scenarios with various match conditions |
| `internal/controller/nginx/config/servers.go` | Refactored location generation to support multiple inference backends with proper EPP and proxy-pass locations |
| `internal/controller/nginx/config/maps_test.go` | Added test cases for unique backend deduplication and failure-mode verification |
| `internal/controller/nginx/config/maps.go` | Implemented deduplication logic using a map to prevent duplicate inference backend entries |
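The deduplication described for `maps.go` can be sketched roughly as follows. This is a minimal illustration with hypothetical types and names, not the actual implementation in `internal/controller/nginx/config/maps.go`:

```go
package main

import "fmt"

// inferenceMap is a simplified stand-in for a generated inference map entry.
type inferenceMap struct {
	Variable string
}

// dedupeMaps keeps the first entry for each map variable, dropping duplicates
// that arise when multiple backends reference the same InferencePool.
func dedupeMaps(maps []inferenceMap) []inferenceMap {
	seen := make(map[string]struct{}, len(maps))
	out := make([]inferenceMap, 0, len(maps))
	for _, m := range maps {
		if _, ok := seen[m.Variable]; ok {
			continue // duplicate inference map entry; skip it
		}
		seen[m.Variable] = struct{}{}
		out = append(out, m)
	}
	return out
}

func main() {
	maps := []inferenceMap{
		{Variable: "pool_a"},
		{Variable: "pool_b"},
		{Variable: "pool_a"},
	}
	fmt.Println(len(dedupeMaps(maps))) // prints 2
}
```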
```go
{
	Path:         "= /inference-match",
	Type:         http.RedirectLocationType,
	HTTPMatchKey: "1_1",
```
Copilot AI · Dec 11, 2025
The HTTPMatchKey is "1_1" but the test name and context suggest this is for pathRuleInferenceWithMatch which has rule index 1. However, in the mixed test case at line 2875, the same pathRuleInferenceWithMatch also uses HTTPMatchKey "1_1". This creates ambiguity since both test cases reference the same match key but they represent different scenarios in the configuration. The HTTPMatchKey should be unique per path rule and match combination to avoid potential configuration conflicts.
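For illustration, the `"1_1"` key appears to follow a `ruleIdx_matchIdx` pattern; a hypothetical helper (not the project's actual function) shows why the key alone cannot distinguish two scenarios that share the same pair of indices:

```go
package main

import "fmt"

// httpMatchKey sketches how a key like "1_1" could be derived. If two
// different configuration scenarios reuse the same (ruleIdx, matchIdx)
// pair, they produce identical keys, which is the ambiguity noted above.
func httpMatchKey(ruleIdx, matchIdx int) string {
	return fmt.Sprintf("%d_%d", ruleIdx, matchIdx)
}

func main() {
	fmt.Println(httpMatchKey(1, 1)) // prints 1_1
}
```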
```go
// This ensures the correct name gets generated to correlate with the split clients generation.
// If there is only one backend, this is effectively a no-op.
tempRule := r
tempRule.BackendGroup.Backends = []dataplane.Backend{b}
```
Copilot AI · Dec 11, 2025
Creating a shallow copy of the rule and modifying its BackendGroup.Backends slice could lead to unexpected behavior if other parts of the code assume the BackendGroup structure is immutable. Consider using a deep copy or creating a new MatchRule with only the necessary fields to make the intent clearer and avoid potential side effects.
Suggested change:

```diff
-	tempRule.BackendGroup.Backends = []dataplane.Backend{b}
+	// Deep copy BackendGroup with only the desired backend
+	tempRule.BackendGroup = &dataplane.BackendGroup{
+		Source:   r.BackendGroup.Source,
+		RuleIdx:  r.BackendGroup.RuleIdx,
+		Backends: []dataplane.Backend{b},
+	}
```
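The shallow-copy concern can be demonstrated with a minimal, self-contained sketch (simplified stand-in types, not the actual `dataplane` package): reassigning the slice header on a struct copy leaves the original intact, but mutating an element through the copy aliases the original backing array.

```go
package main

import "fmt"

// Simplified stand-ins for the dataplane types.
type BackendGroup struct {
	Backends []string
}

type MatchRule struct {
	BackendGroup BackendGroup
}

func main() {
	r := MatchRule{BackendGroup: BackendGroup{Backends: []string{"a", "b"}}}

	// Reassigning the slice header on a struct copy is safe:
	// the original still sees both backends.
	tempRule := r
	tempRule.BackendGroup.Backends = []string{"a"}
	fmt.Println(len(r.BackendGroup.Backends)) // prints 2

	// Mutating an element through a copied header aliases the original.
	temp2 := r
	temp2.BackendGroup.Backends[0] = "mutated"
	fmt.Println(r.BackendGroup.Backends[0]) // prints mutated
}
```

This is why the review suggests a deep copy when the copied rule might outlive the loop iteration or be mutated element-wise.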
sjberman left a comment
Great work on this. It really is a complex mess to build all of these locations, and I'm hopeful we can improve it in the future: potentially with improvements in NGINX so we don't need the NJS matching module, and potentially with the inference Rust module so we can skip the nested inference locations.
Can you verify that if a ClientSettingsPolicy with maxSize is set, it gets propagated into every location down the chain?
```nginx
location /_ngf-internal-upstreamNamePrimary-routeNS-routeName-rule0 {
    set $epp_internal_path /_ngf-internal-proxy-pass-rule0-route0-backend0-inference;
```
Should have internal;?
Yep, I have the conf pulled up and verified it has `internal` too.
```nginx
location /_ngf-internal-upstreamNameSecondary-routeNS-routeName-rule0 {
    set $epp_internal_path /_ngf-internal-proxy-pass-rule0-route0-backend1-inference;
```
Should have internal;?
```go
// External location calls the httpmatch NJS module. The module determines the HTTP request conditions that exist
// and which backend to use, then redirects to an internal location which will rewrite to another internal inference
// location based on a split clients variable. That internal inference location calls the inference NJS module
// to get the AI endpoint to proxy to, then redirects to the internal location that proxies to the backend.
```
I wish this was less insane lol, but unfortunately it seems necessary right now.
```go
	ruleIdx int,
) http.Location {
	return http.Location{
		Path: inferencePath(pathruleIdx, matchRuleIdx),
```
We're losing pathruleIdx here, couldn't that mean that two different pathrules could collide in naming?
Good catch, I need to update a comment somewhere about the naming. The ruleIdx used here is the backendGroup ruleIdx, which is the index of the rule on a specific HTTPRoute.
That is different from the pathRuleIdx, which comes from a collapsed data structure spanning all rules on this path, which is really confusing.
TL;DR: I don't think so, because we attach both the UpstreamName and the HTTPRoute NsName, along with the rule idx in the HTTPRoute. With the httprouteNsName and the ruleIdx of the HTTPRoute, alongside the unique UpstreamName, I think that guarantees unique naming. The only scenario I could see is something like a backendGroup with multiple of the same backend, which doesn't make sense to do.
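The uniqueness argument here can be sketched as a hypothetical helper. The name format mirrors the `/_ngf-internal-upstreamName-routeNS-routeName-ruleN` paths seen in the diff, but the exact function and its signature are assumptions for illustration only (and, as the rest of the thread shows, the argument turns out to have a gap):

```go
package main

import "fmt"

// internalLocationName sketches the naming described above: combining the
// unique upstream name, the HTTPRoute's namespaced name, and the HTTPRoute
// rule index.
func internalLocationName(upstreamName, routeNsName string, ruleIdx int) string {
	return fmt.Sprintf("/_ngf-internal-%s-%s-rule%d", upstreamName, routeNsName, ruleIdx)
}

func main() {
	fmt.Println(internalLocationName("upstreamNamePrimary", "routeNS-routeName", 0))
	// prints /_ngf-internal-upstreamNamePrimary-routeNS-routeName-rule0
}
```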
And we can't keep pathruleIdx because in split_clients.go I couldn't find a way to access that field from the backendGroup.
Hm...couldn't you have a route with multiple rules with different matching conditions, with splitting across the same inferencepool backends?
It's weird, but trying to think if there is a scenario where a collision could actually happen.
Hm, yeah, you're right: a route with even a single rule but multiple matching conditions will run into this case. Let me see what I can do.
Proposed changes
Add support for multiple InferencePool backends on a Route.
Problem: A route should be able to have multiple InferencePools in its backendRefs.
Solution: Add support for multiple InferencePool backends. Added logic to remove duplicated inference maps.
Testing: Added unit tests and enabled the correlating `GatewayWeightedAcrossTwoInferencePools` conformance test. Manually tested scenarios with multiple InferencePool backends, with and without HTTP matches.

Closes #4192
Checklist
Before creating a PR, run through this checklist and mark each as complete.
Release notes
If this PR introduces a change that affects users and needs to be mentioned in the release notes,
please add a brief note that summarizes the change.