
fix: propagate trace context end-to-end for agent Services #1297

Open
syn-zhu wants to merge 3 commits into kagent-dev:main from syn-zhu:fix/agent-service-a2a-appprotocol

Conversation


@syn-zhu syn-zhu commented Feb 14, 2026

Summary

Three fixes to enable end-to-end W3C TraceContext propagation across the controller→agent boundary:

  1. AppProtocol on agent Services — Set appProtocol: kgateway.dev/a2a on the Service port created for each Agent CR so AgentGateway's A2A plugin can discover agent Services directly via protocol matching, rather than proxying through the kagent controller (which drops HTTP headers including traceparent).

  2. W3C TraceContext propagator in Python SDK — Configure the W3C TraceContext propagator in kagent-core tracing setup so agent pods correctly extract incoming traceparent headers and propagate them on outgoing requests.

  3. Trace header propagation in Go controller — The A2A server deserializes incoming HTTP requests into JSON-RPC params, discarding the original HTTP headers. When the controller forwards requests to agent pods via the A2A client, traceparent/tracestate are lost. Fix: capture W3C trace context headers from the incoming request into the Go context in the A2A auth middleware (A2AAuthenticator.Wrap), then inject them into outgoing requests in A2ARequestHandler.
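The capture-then-inject pattern from item 3 can be sketched in a language-neutral way. The snippet below is only an illustration in Python, not the Go code in this PR: `capture_trace_headers` stands in for what the auth middleware does on the way in, `inject_trace_headers` for what the request handler does on the way out, and the `contextvars` variable plays the role of the Go context. All function names and the parent ID in the sample header are hypothetical.

```python
import contextvars

# W3C Trace Context header names (lowercase, as the spec writes them).
TRACE_HEADERS = ("traceparent", "tracestate")

# Hypothetical per-request storage; stands in for the Go context.
_trace_ctx = contextvars.ContextVar("trace_ctx", default=None)


def capture_trace_headers(incoming_headers):
    """Store any W3C trace headers found on the incoming request."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {name.lower(): value for name, value in incoming_headers.items()}
    captured = {h: lowered[h] for h in TRACE_HEADERS if h in lowered}
    _trace_ctx.set(captured)
    return captured


def inject_trace_headers(outgoing_headers):
    """Return a copy of the outgoing headers with captured trace headers added."""
    merged = dict(outgoing_headers)
    merged.update(_trace_ctx.get() or {})
    return merged


# Incoming request as seen by the middleware (parent ID is made up).
incoming = {
    "Traceparent": "00-22222222cccc33334444555566667777-0123456789abcdef-01",
    "Content-Type": "application/json",
}
capture_trace_headers(incoming)

# Later, when a fresh outgoing request is built for the agent pod.
outgoing = inject_trace_headers({"Content-Type": "application/json"})
print(outgoing["traceparent"])
```

The point of the sketch is that the headers survive even though the outgoing request is constructed from scratch rather than copied from the incoming one.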

All golden test outputs have been updated to include the new appProtocol field, including agent_with_passthrough (added in #1327).

Incorporates changes from opspawn@d9f2a3a.

Test plan

  • Golden tests updated and passing — all testdata/outputs/*.json files include appProtocol: "kgateway.dev/a2a" on Service ports
  • go test ./internal/httpserver/auth/... passes
  • Deploy agent, verify kubectl get svc <agent> -o jsonpath='{.spec.ports[0].appProtocol}' returns kgateway.dev/a2a
  • Verify AgentGateway A2A plugin discovers the Service
  • Send request with traceparent header through gateway → controller → agent pod, verify trace ID is preserved end-to-end
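The golden-file item in the test plan could also be checked mechanically. The following is a minimal sketch, assuming each golden output is a JSON document that contains Kubernetes objects with a `kind` field somewhere in its structure; the actual layout of `testdata/outputs/*.json` is not shown in this PR, so the sample document and helper names here are hypothetical.

```python
import json

EXPECTED_APP_PROTOCOL = "kgateway.dev/a2a"


def find_services(node):
    """Yield every dict that looks like a Kubernetes Service, at any depth."""
    if isinstance(node, dict):
        if node.get("kind") == "Service":
            yield node
        for value in node.values():
            yield from find_services(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_services(item)


def ports_missing_app_protocol(doc):
    """Return (service name, port name) pairs lacking the expected appProtocol."""
    missing = []
    for svc in find_services(doc):
        name = svc.get("metadata", {}).get("name", "<unnamed>")
        for port in svc.get("spec", {}).get("ports", []):
            if port.get("appProtocol") != EXPECTED_APP_PROTOCOL:
                missing.append((name, port.get("name") or port.get("port")))
    return missing


# Hypothetical golden output, inlined so the sketch runs on its own.
sample_golden = json.loads("""
{
  "manifest": [
    {"kind": "Deployment", "metadata": {"name": "demo-agent"}},
    {
      "kind": "Service",
      "metadata": {"name": "demo-agent"},
      "spec": {"ports": [
        {"name": "http", "port": 8080, "appProtocol": "kgateway.dev/a2a"}
      ]}
    }
  ]
}
""")
print(ports_missing_app_protocol(sample_golden))
```

To run it against real files, one would load each path matched by `testdata/outputs/*.json` with `json.load` and pass the result to `ports_missing_app_protocol`; an empty list means every Service port carries the field.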

Closes #1295

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings February 14, 2026 01:01
Contributor

Copilot AI left a comment


Pull request overview

This PR updates the Agent manifest translation so that the Kubernetes Service created for each Agent CR sets an explicit appProtocol, enabling AgentGateway’s A2A plugin to discover and route directly to agent Services (preserving HTTP headers for distributed tracing).

Changes:

  • Set spec.ports[0].appProtocol: kgateway.dev/a2a on the per-Agent Service port.
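As a rough illustration of what "discovery via protocol matching" means, the sketch below filters Service objects by the `appProtocol` of their ports. This is not AgentGateway's implementation; the Service names and port numbers are hypothetical, and only the `appProtocol` value comes from this PR.

```python
A2A_APP_PROTOCOL = "kgateway.dev/a2a"


def a2a_ports(service):
    """Return the ports of a Service manifest that advertise the A2A protocol."""
    ports = service.get("spec", {}).get("ports", [])
    return [p for p in ports if p.get("appProtocol") == A2A_APP_PROTOCOL]


# Two hypothetical Services: one created for an Agent CR, one unrelated.
services = [
    {
        "kind": "Service",
        "metadata": {"name": "demo-agent"},
        "spec": {"ports": [
            {"name": "http", "port": 8080, "appProtocol": A2A_APP_PROTOCOL},
        ]},
    },
    {
        "kind": "Service",
        "metadata": {"name": "plain-web"},
        "spec": {"ports": [{"name": "http", "port": 80}]},
    },
]

# Only Services with at least one A2A port are treated as agent backends.
discovered = [s["metadata"]["name"] for s in services if a2a_ports(s)]
print(discovered)
```

A Service without the field is simply skipped, which is why the port created for each Agent CR has to set it explicitly.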


Contributor

EItanya commented Feb 16, 2026

Hey there, thanks for the PR, this is a great idea! You will need to update the goldens as well as sign your commits for us to merge this

Author

syn-zhu commented Feb 20, 2026

> Hey there, thanks for the PR, this is a great idea! You will need to update the goldens as well as sign your commits for us to merge this

Thanks! Just saw this now, but it looks like the issue was already fixed by opspawn@d9f2a3a :)

Gonna just close this PR, thanks!

@syn-zhu syn-zhu closed this Feb 20, 2026
@syn-zhu syn-zhu reopened this Feb 24, 2026
Author

syn-zhu commented Feb 24, 2026

Oops @EItanya, I realized the commit I linked wasn't actually a merged commit, but rather a branch. I've updated and reopened my PR to address the things you mentioned. Please lmk if there's anything else!

@syn-zhu syn-zhu force-pushed the fix/agent-service-a2a-appprotocol branch from 13799c1 to c2c5902 February 24, 2026 11:12
@syn-zhu syn-zhu requested a review from peterj as a code owner February 24, 2026 11:12
@syn-zhu syn-zhu changed the title from "fix: set appProtocol on agent Services for A2A discovery" to "fix: propagate trace context and enable A2A discovery for agent Services" Feb 24, 2026
@syn-zhu syn-zhu force-pushed the fix/agent-service-a2a-appprotocol branch from c2c5902 to 256bbda February 24, 2026 11:16
@syn-zhu syn-zhu changed the title from "fix: propagate trace context and enable A2A discovery for agent Services" to "fix: propagate trace context end-to-end for agent Services" Feb 24, 2026
@syn-zhu syn-zhu force-pushed the fix/agent-service-a2a-appprotocol branch from ffe2d68 to f965911 February 24, 2026 11:28
@krisztianfekete
Contributor

Hi @syn-zhu, could you please test this locally on your end first? As it's Claude-generated code, a brief manual validation, e.g. posting before/after screenshots from a tracing tool, would be the minimum step to ensure it's ready for contribution.

Author

syn-zhu commented Feb 26, 2026

End-to-End Test Results: Trace Propagation

Tested on a live EKS cluster running kagent v0.7.13 with Langfuse (OTLP-backed) as the trace backend. Each test sends an A2A tasks/send request with a known traceparent header and verifies what Langfuse receives.
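For reference, a `traceparent` header with a known trace ID can be built and validated as in the sketch below. It follows the version `00` layout from the W3C Trace Context specification (`version-traceid-parentid-flags`); the helper names and the parent ID are hypothetical, and the parser is deliberately strict about the four-field version `00` form only.

```python
import re

# version (2 hex) - trace-id (32 hex) - parent-id (16 hex) - flags (2 hex)
_TRACEPARENT_RE = re.compile(
    r"^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$"
)


def build_traceparent(trace_id, parent_id, sampled=True):
    """Build a version-00 traceparent value from lowercase hex IDs."""
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{parent_id}-{flags}"


def parse_traceparent(value):
    """Return the header fields as a dict, or None if the value is invalid."""
    match = _TRACEPARENT_RE.match(value.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent IDs.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


# Trace ID taken from the Stage 0 test; the parent ID is made up.
header = build_traceparent("00000000aaaa11112222333344445555", "00f067aa0ba902b7")
print(header)
print(parse_traceparent(header)["trace_id"])
```

Searching the trace backend for the `trace_id` that was sent is then enough to tell whether the request joined the caller's trace or started a new one.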

Setup

  • Agent: platform_assistant (Declarative agent with Anthropic via agentgateway-proxy)
  • LLM path: Agent → agentgateway-proxy → Anthropic API
  • OTEL path: Agent → gRPC OTLP → otel-collector → HTTP OTLP → Langfuse
  • Controller: kagent-controller proxies A2A requests to agent pods

Stage 0 — Baseline (upstream v0.7.13, no patches)

| Metric | Value |
| --- | --- |
| Trace ID | 00000000aaaa11112222333344445555 |
| Observations | 66 |
| Services in trace | platform_assistant only |
| AgentGateway spans | 0 |

Problem: Agent uses the default CompositeTextMapPropagator but never injects traceparent into outbound HTTP headers. LLM calls to agentgateway-proxy start a new disconnected trace.


Stage 1 — Commit 1 only (Python SDK: W3C propagator + AioHttpClientInstrumentor)

Sent request directly to agent pod (bypassing controller) to isolate the Python SDK changes.

| Metric | Value |
| --- | --- |
| Trace ID | 11111111bbbb22223333444455556666 |
| Observations | 130 |
| Services in trace | platform_assistant + agentgateway-proxy |
| AgentGateway spans | 4 |

Result: TraceContextTextMapPropagator + AioHttpClientInstrumentor causes the agent's outbound LLM calls (via litellm's aiohttp transport) to inject traceparent/tracestate headers. AgentGateway sees the trace context and its spans now join the same trace.


Stage 2 — Both commits (Python SDK + Go controller trace propagation)

A2A request sent through the controller (the production path).

Before (upstream controller, patched agent image)

| Metric | Value |
| --- | --- |
| Trace ID sent | 33333333dddd44445555666677778888 |
| Trace found in Langfuse | NOT FOUND |
| Orphan trace ID | 1159632b4b40815148c345bc9ead6dbf |
| Orphan observations | 64 |
| Orphan services | platform_assistant + agentgateway-proxy + agentgateway-waypoint |

Problem: The upstream controller strips traceparent when proxying A2A requests to agent pods. The agent creates a new root trace (1159632b...), disconnected from the caller's trace. The agent's outbound calls to agentgateway DO propagate correctly (thanks to Commit 1), but on the wrong trace ID.

After (patched controller + patched agent)

| Metric | Value |
| --- | --- |
| Trace ID sent | 22222222cccc33334444555566667777 |
| Trace found in Langfuse | YES |
| Observations | 69 |
| Services in trace | platform_assistant + agentgateway-proxy |

Result: The patched controller extracts traceparent/tracestate from the incoming A2A request and re-injects them when proxying to the agent pod. The full chain — caller → controller → agent → agentgateway → LLM — now shares a single unified trace ID.


Summary

| What | Before | After |
| --- | --- | --- |
| Agent → AgentGateway propagation | ❌ Disconnected | ✅ Same trace |
| Controller → Agent propagation | ❌ traceparent dropped | ✅ traceparent forwarded |
| End-to-end trace unity | ❌ 2-3 disconnected traces | ✅ Single trace across all services |

@syn-zhu syn-zhu force-pushed the fix/agent-service-a2a-appprotocol branch from f965911 to 3d2c566 February 26, 2026 04:21
Author

syn-zhu commented Feb 26, 2026

[Seven screenshots attached, captured 2026-02-25 between 10:26 PM and 11:02 PM]

Author

syn-zhu commented Feb 26, 2026

> Hi @syn-zhu, could you please test this locally on your end first? As it's Claude-generated code, a brief manual validation, e.g. posting before/after screenshots from a tracing tool, would be the minimum step to ensure it's ready for contribution.

updated

Two changes to enable end-to-end W3C TraceContext propagation:

1. Add AppProtocol "kgateway.dev/a2a" to agent Service port so
   AgentGateway can discover agent Services directly via kgateway
   protocol matching, rather than proxying through the controller.
   Update all golden test outputs to include the new appProtocol field.

2. Set up W3C TraceContext propagator in the Python agent SDK tracing
   configuration so agent pods correctly extract incoming traceparent
   headers and propagate them on outgoing requests.

Fixes kagent-dev#1295

Signed-off-by: Simon Zhu <[email protected]>
…t pods

The A2A server deserializes incoming HTTP requests into JSON-RPC params,
discarding the original HTTP headers. When the controller forwards
requests to agent pods via the A2A client, trace context headers
(traceparent, tracestate) are lost, breaking distributed tracing.

Fix: capture W3C trace context headers from the incoming request into
the Go context in the A2A auth middleware, then inject them into
outgoing requests in the A2ARequestHandler. This closes the gap
between the A2A server (which strips headers) and the A2A client
(which constructs new HTTP requests).

Also update the agent_with_passthrough golden test (added in kagent-dev#1327)
to include the appProtocol field.

Signed-off-by: Simon Zhu <[email protected]>
@syn-zhu syn-zhu force-pushed the fix/agent-service-a2a-appprotocol branch from 3d2c566 to 32027e9 February 26, 2026 04:25
Contributor

@krisztianfekete krisztianfekete left a comment


This looks mostly good, but can you please look at the two comments I've just added?

logging.info("Enabling tracing")
# Set up W3C TraceContext propagator so incoming traceparent headers
# are extracted and outgoing requests carry them forward.
set_global_textmap(CompositeHTTPPropagator([TraceContextTextMapPropagator()]))
Contributor

@krisztianfekete krisztianfekete Feb 26, 2026


Are you sure this is necessary? If it is (but I don't think it is), we should at least preserve the existing propagators that this overrides.

Contributor


These are unrelated to auth. Can we move these into internal/ or somewhere tracing-specific?

Development

Successfully merging this pull request may close these issues.

Trace context (traceparent) not propagated from controller to agent pods

4 participants