fix(streamable-http): prevent SSE reconnect loops with shadow channels#660
Open
its-mash wants to merge 7 commits intomodelcontextprotocol:mainfrom
Open
fix(streamable-http): prevent SSE reconnect loops with shadow channels#660its-mash wants to merge 7 commits intomodelcontextprotocol:mainfrom
its-mash wants to merge 7 commits intomodelcontextprotocol:mainfrom
Conversation
…already active LocalSessionWorker::resume() unconditionally replaced self.common.tx on every GET request, orphaning the receiver the first SSE stream was reading from. All subsequent server-to-client notifications were sent to the new sender while the original client was still listening on the old, now-dead receiver. notify_tool_list_changed().await returned Ok(()) silently. This is triggered by VS Code's MCP extension which reconnects SSE every ~5 minutes with the same session ID. Fix: Check tx.is_closed() before replacing the common channel sender. If an active stream exists, return SessionError::Conflict which is propagated as HTTP 409 Conflict. This matches the TypeScript SDK behavior (streamableHttp.ts:423). Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: Mohammod Al Amin Ashik <[email protected]>
a727949 to
8bd424e
Compare
When a client sends GET with Last-Event-ID from a completed POST SSE response, the request-wise channel no longer exists in tx_router. Previously this returned ChannelClosed -> 500, causing clients like Cursor to enter an infinite re-initialization loop. Now falls back to the common channel when the request-wise channel is completed, per MCP spec: "Resumption applies regardless of how the original stream was initiated (POST or GET)." Co-Authored-By: Claude Opus 4.6 <[email protected]>
Per MCP spec §Streamable HTTP, "The client MAY remain connected to multiple SSE streams simultaneously." Returning 409 Conflict when a second GET arrives causes Cursor to enter an infinite re-initialization loop (~3s cycle). Instead of rejecting, replace the old common channel sender. Dropping the old sender closes the old receiver, cleanly terminating the previous SSE stream so the client can reconnect on the new stream. This fixes both code paths: - GET with Last-Event-ID from a completed POST SSE response - GET without Last-Event-ID (standalone stream reconnection) Co-Authored-By: Claude Opus 4.6 <[email protected]>
When a client opens a new GET SSE stream while a previous one is still active, the old sender is dropped (terminating the old stream) and a new channel is created. Previously, sync() replayed all cached events to the new stream, but the client already received those events on the old stream. This caused an infinite notification loop: 1. Client receives notifications (e.g. ResourceListChanged) 2. Old SSE stream dies (sender replaced) 3. Client reconnects after sse_retry (3s) 4. sync() replays cached notifications the client already handled 5. Client processes them again → goto 2 Fix: check tx.is_closed() BEFORE replacing the sender. If the old stream was still alive, skip replay entirely — the client already has those events. Only replay when the old stream was genuinely dead (network failure, timeout) so the client catches up on missed events. Co-Authored-By: Claude Opus 4.6 <[email protected]>
When POST SSE responses include a `retry` field, the browser's EventSource automatically reconnects via GET after the stream ends. This creates multiple competing EventSource connections that each replace the common channel sender, killing the other stream's receiver. Both reconnect every sse_retry seconds, creating an infinite loop. Instead of always replacing the common channel, check if the primary is still active. If so, create a "shadow" stream — an idle SSE connection kept alive by keep-alive pings that doesn't receive notifications or interfere with the primary channel. Also removes cache replay (sync) on common channel resume, as replaying server-initiated list_changed notifications causes clients to re-process old signals. Signed-off-by: Myko Ash <[email protected]> Signed-off-by: Mohammod Al Amin Ashik <[email protected]>
7 tasks
Rewrite test suite for SSE channel replacement fix: - Shadow creation: standalone GET returns 200, multiple GETs coexist - Dead primary: replacement, notification delivery, repeated cycles - Notification routing: primary receives, shadow does not - Resume paths: completed request-wise, common alive/dead - Real scenarios: Cursor leapfrog, VS Code reconnect - Edge cases: invalid session, missing header, shadow cleanup Fix Accept header bug (was missing text/event-stream for notifications/initialized POST, causing 406 rejection). Co-Authored-By: Claude Opus 4.6 <[email protected]>
a4b8ffe to
c3a557e
Compare
6 tasks
MCP spec (2025-11-25) section "Session Management" requires: - Missing session ID header → 400 Bad Request (not 401) - Unknown/terminated session → 404 Not Found (not 401) Using 401 Unauthorized caused MCP clients (e.g. VS Code) to trigger full OAuth re-authentication on server restart, instead of simply re-initializing the session. Co-Authored-By: Claude Opus 4.6 <[email protected]> Signed-off-by: Mohammod Al Amin Ashik <[email protected]>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes infinite SSE reconnect loops caused by multiple competing EventSource connections in the Streamable HTTP server session layer. Tested with Cursor, VS Code, and Claude Desktop.
Problem
When the server sends POST SSE responses with a
retryfield (priming event), the browser's EventSource API automatically reconnects via GET after the stream ends. This creates multiple competing EventSource connections — one from the initial standalone GET, and additional ones from completed POST responses (initialize, tools/list, etc.).Each reconnecting GET called
resume()which replacedself.common.tx(the common channel sender). Dropping the old sender closed the old receiver, terminating the OTHER EventSource's stream. Both reconnected everysse_retryseconds (3s), leapfrogging each other in an infinite loop.Root Cause Analysis
The connection lifecycle that causes the loop:
Solution — Shadow Channels
Instead of always replacing the common channel sender on resume, the new
resume_or_shadow_common()method checks whether the primary common channel is still active:tx.is_closed()) → Replace it. The new stream becomes the primary notification channel.:comments) that does NOT receive notifications and does NOT replace the primary channel.Shadow streams prevent competing EventSource connections from killing each other while keeping the HTTP connections alive (so EventSource doesn't reconnect again).
Changes
8bd424e0d03eb5a7bb8227cf5406sync) when replacing active streams (replaying list_changed notifications caused notification loops)a7df58cKey Design Decisions
No cache replay on common channel resume: Server-initiated notifications (tools/list_changed, resources/list_changed) are idempotent signals. Replaying cached ones causes clients to re-process old events, triggering unnecessary re-fetches. Missing one is harmless — the next real event will arrive naturally.
Shadow streams are idle: They don't receive notifications — only the primary common channel does. This is intentional: the reconnecting POST EventSources don't need notifications (their purpose — delivering the POST response — is already fulfilled). They just need to stay alive so EventSource stops reconnecting.
Automatic cleanup: Closed shadow senders (from client disconnection) are cleaned up via
retain(!is_closed())on eachresume()call andclear()onclose_sse_stream().Request-wise channels still sync: Only the common channel skips replay. Request-wise channels (POST response streams) still use
sync()for proper resumption, since those carry actual request/response data that may need replay.Files Changed
crates/rmcp/src/transport/streamable_http_server/session/local.rsshadow_txs: Vec<Sender<ServerSseMessage>>toLocalSessionWorkerresume_or_shadow_common()with primary-alive checkresume()to use shadow logic for both direct common and request-wise fallback pathsclose_sse_stream()to clear shadow senderscreate_local_session()to initializeshadow_txsTest Plan
cargo check --workspacepasses