Fixing MCP headers usage for Llama Stack 0.4.x, adding additional e2e tests for MCP servers #1080
Open
blublinsky wants to merge 2 commits into lightspeed-core:main from
Description
This PR achieves 3 main things:
This PR turned out much larger than expected because of two issues:
Root Cause 1 - Streaming Response Bug: Llama Stack's MCP tool execution uses streaming responses (Server-Sent Events), which exposed critical bugs in BaseHTTPMiddleware (the Starlette class that backs FastAPI's @app.middleware("http") decorator), specifically the RuntimeError: "No response returned" error that occurs when the middleware tries to handle streaming endpoints.
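To illustrate the alternative to response-object middleware, here is a minimal sketch of a pure ASGI middleware that wraps `send` and forwards each streamed chunk as it arrives. It uses only the standard library and the ASGI calling convention; `PassThroughASGIMiddleware`, `streaming_app`, and `run_demo` are hypothetical names for illustration and are not taken from this PR's code.

```python
import asyncio


class PassThroughASGIMiddleware:
    """Pure ASGI middleware: wraps `send` instead of awaiting a full response."""

    def __init__(self, app):
        self.app = app
        self.statuses = []  # recorded status codes (illustrative only)

    async def __call__(self, scope, receive, send):
        if scope["type"] != "http":
            # Lifespan and websocket traffic pass through untouched.
            await self.app(scope, receive, send)
            return

        async def send_wrapper(message):
            # Inspect the response start, then forward every message as-is,
            # so streamed body chunks are never buffered.
            if message["type"] == "http.response.start":
                self.statuses.append(message["status"])
            await send(message)

        await self.app(scope, receive, send_wrapper)


async def streaming_app(scope, receive, send):
    """Hypothetical SSE-style endpoint written directly against ASGI."""
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/event-stream")],
    })
    for chunk in (b"data: one\n\n", b"data: two\n\n"):
        await send({"type": "http.response.body", "body": chunk, "more_body": True})
    await send({"type": "http.response.body", "body": b"", "more_body": False})


def run_demo():
    """Drive the middleware with fake receive/send callables."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    middleware = PassThroughASGIMiddleware(streaming_app)
    scope = {"type": "http", "method": "GET", "path": "/"}
    asyncio.run(middleware(scope, receive, send))
    return middleware, sent
```

Because the middleware never waits for a complete response object, there is no point at which it can find that no response was returned; it only observes messages as they flow through.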
Root Cause 2 - MCP Cleanup & Connection Management: MCP server connections and LLM streaming calls need to be properly closed AFTER the response is fully streamed, but we also need to persist conversation data to the database without blocking the stream or delaying the client.
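The ordering requirement above (stream first, then persist and clean up without delaying the client) can be sketched with plain `asyncio`. This is an assumption-laden illustration, not the PR's implementation: `persist_and_cleanup`, `stream_response`, `background_tasks`, and `events` are hypothetical names, and the database write and MCP connection close are replaced by list appends.

```python
import asyncio

# Strong references keep fire-and-forget tasks from being garbage-collected.
background_tasks = set()
events = []  # ordered log of what happened, for illustration


async def persist_and_cleanup(conversation_id, chunks):
    """Stand-in for the DB write and MCP connection close (hypothetical)."""
    await asyncio.sleep(0)  # pretend to do I/O
    events.append(("persisted", conversation_id, "".join(chunks)))
    events.append(("mcp_closed", conversation_id))


async def stream_response(conversation_id, tokens):
    """Yield tokens to the client; schedule cleanup only once streaming ends."""
    collected = []
    try:
        for token in tokens:
            collected.append(token)
            yield token
    finally:
        # Runs when the stream is exhausted or closed early. The task is
        # scheduled, not awaited, so the client is not held up by it.
        task = asyncio.create_task(persist_and_cleanup(conversation_id, collected))
        background_tasks.add(task)
        task.add_done_callback(background_tasks.discard)


async def main():
    received = []
    async for token in stream_response("conv-1", ["Hel", "lo"]):
        received.append(token)
    events.append(("client_done", "".join(received)))
    # In a server the event loop keeps running; here we wait explicitly.
    await asyncio.gather(*background_tasks)
    return received


if __name__ == "__main__":
    asyncio.run(main())
    print(events)
```

In this sketch the client finishes receiving the stream before the persistence and cleanup steps run, which is the ordering the description calls for.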
The Fix: Required a complete architectural change:
In short: MCP streaming responses + required cleanup + database persistence = complete architectural change from decorator-based to ASGI middleware + async background tasks for DB writes and MCP cleanup.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Improvements
Tests