Skip to content

fixing MCP headers usag for Llama stack 0.4.x, adding additional e2e tests for mcp servers#1080

Open
blublinsky wants to merge 2 commits intolightspeed-core:mainfrom
blublinsky:mcp-e2e-tests
Open

fixing MCP headers usag for Llama stack 0.4.x, adding additional e2e tests for mcp servers#1080
blublinsky wants to merge 2 commits intolightspeed-core:mainfrom
blublinsky:mcp-e2e-tests

Conversation

@blublinsky
Copy link
Contributor

@blublinsky blublinsky commented Jan 30, 2026

Description

This PR achieves 3 main things:

  1. Fixed headers definition for llama stack 0.4.x, which introduced a new MCP parameter - authorization
  2. Add logging to simplify MCP execution debugging
  3. Add a new e2e test for MCP servers

This PR turned out much larger then expected because of 2 issues

Root Cause 1 - Streaming Response Bug: Llama Stack's MCP tool execution uses streaming responses (Server-Sent Events), which exposed critical bugs in FastAPI's BaseHTTPMiddleware - specifically the RuntimeError: "No response returned" error that occurs when middleware tries to handle streaming endpoints.
Root Cause 2 - MCP Cleanup & Connection Management: MCP server connections and LLM streaming calls need to be properly closed AFTER the response is fully streamed, but we also need to persist conversation data to the database without blocking the stream or delaying the client.

The Fix: Required a complete architectural change:

  1. Pure ASGI middleware - Rewrote from decorator-based (@app.middleware("http")) to pure ASGI classes that directly implement the ASGI protocol, bypassing the buggy BaseHTTPMiddleware entirely
  2. Background tasks - Moved database operations and MCP cleanup to asyncio.create_task() so they run AFTER streaming completes, without blocking the response
  3. Thread-safe DB operations - Used asyncio.to_thread() to run blocking database writes in a separate thread
  4. Integration test fixes - Tests now properly wait for background tasks and mock asyncio.to_thread() to use the test database

In short: MCP streaming responses + required cleanup + database persistence = complete architectural change from decorator-based to ASGI middleware + async background tasks for DB writes and MCP cleanup.

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Claude
  • Generated by: N/A

Related Tickets & Documents

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • JSON‑RPC MCP support (initialize, tools/list, tools/call) with simulated tool execution, multiple auth modes, and conversation cache (SQLite).
  • Improvements

    • Detached background tasks for async persistence and streaming cleanup; ASGI middleware for metrics and global exception handling; richer MCP/RPC logging; container startup now writes a test token and remaps mock-server port.
  • Tests

    • Extensive new E2E MCP suite, many step helpers, and unit/integration updates to synchronize background tasks.

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants