Description
Summary
When multiple SSE streams exist for the same session (e.g. from POST response reconnections), LocalSessionWorker::resume() unconditionally replaces self.common.tx, killing the other stream's receiver. Both EventSource connections then reconnect every sse_retry seconds, leapfrogging each other in an infinite loop that floods the server with GET requests.
Affected versions: rmcp 0.14.0, 0.15.0
Severity: Critical — causes infinite reconnect loops with clients like Cursor, and breaks server-to-client notifications over Streamable HTTP
Root Cause
The MCP Streamable HTTP transport sends POST SSE responses with a priming event containing retry: 3000. When the POST stream ends (after delivering the response), the browser's EventSource API automatically reconnects via GET. This creates multiple competing EventSource connections:
- The initial standalone GET stream (primary notification channel)
- Reconnecting GETs from completed POST responses (initialize, tools/list, etc.)
Each reconnecting GET calls resume() which unconditionally replaces self.common.tx:
```rust
// Before fix — local.rs resume()
None => {
    let (tx, rx) = tokio::sync::mpsc::channel(self.session_config.channel_capacity);
    self.common.tx = tx; // ← Unconditionally replaces sender, kills other stream
    // ...
}
```

Dropping the old sender closes the old receiver, terminating the OTHER EventSource's stream. That stream then reconnects, replacing the sender again. Both leapfrog every `sse_retry` (3 s) indefinitely.
The Leapfrog Loop
1. Client POST initialize → SSE response with priming (retry: 3000) → stream ends
2. Client GET (standalone) → becomes primary common channel (tx1/rx1)
3. POST EventSource reconnects via GET (3s later) → replaces common.tx → kills rx1
4. GET from step 2 reconnects → replaces common.tx → kills stream from step 3
5. Repeat every 3 seconds indefinitely
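The kill mechanism in steps 3–4 comes from channel drop semantics: replacing the stored sender drops the old one, and the orphaned receiver observes a closed channel. A minimal, self-contained sketch of this (using std's `mpsc` for illustration; tokio's channels behave the same way when the last sender is dropped):

```rust
use std::sync::mpsc::{self, Receiver, Sender};

struct Session {
    tx: Sender<&'static str>,
}

/// Simulates the pre-fix resume(): unconditionally swap in a new sender,
/// dropping whatever sender was stored before.
fn resume(session: &mut Session) -> Receiver<&'static str> {
    let (tx, rx) = mpsc::channel();
    session.tx = tx; // old sender is dropped here; its receiver is now closed
    rx
}

fn main() {
    let (tx1, rx1) = mpsc::channel();
    let mut session = Session { tx: tx1 };

    // A reconnecting GET calls resume() and steals the channel.
    let _rx2 = resume(&mut session);

    // The first stream's receiver observes disconnection and reconnects,
    // which in turn would kill _rx2: the leapfrog loop.
    assert!(rx1.recv().is_err());
    println!("primary receiver closed");
}
```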
Server logs confirm the pattern — alternating GET requests every 3 seconds with different Last-Event-ID values:
```
13:33:51.670 GET Last-Event-ID: 0/2   ← from completed POST response
13:33:54.668 GET Last-Event-ID: 0     ← from killed standalone stream
13:33:57.679 GET Last-Event-ID: 0/2   ← leapfrog
13:34:00.674 GET Last-Event-ID: 0     ← leapfrog
...
```
Additional Issue: Cache Replay Loop
Even without the leapfrog, resume() called sync() on the common channel to replay cached events. Replaying server-initiated list_changed notifications caused clients to re-process old signals, triggering unnecessary re-fetches every reconnection cycle.
What Happens in Practice
Cursor (infinite loop)
- Connects via POST initialize + GET standalone
- POST SSE stream ends → EventSource reconnects via GET
- Two competing streams leapfrog every 3 seconds
- Server flooded with GET requests indefinitely
- Notifications intermittently lost as channels are swapped
VS Code (silent notification loss)
- Reconnects SSE every ~5 minutes with same session ID
- Each reconnection replaces the channel sender
- Previous stream's receiver is orphaned
- `notify_tool_list_changed().await` returns `Ok(())` — silent failure
Fix: Shadow Channels
PR: #660
Instead of unconditionally replacing the common channel, check if the primary is still active:
- Primary dead (`tx.is_closed()`) → Replace it. The new stream becomes primary.
- Primary alive → Create a shadow stream — an idle SSE connection kept alive by SSE keep-alive pings that does NOT receive notifications and does NOT replace the primary channel.
```rust
fn resume_or_shadow_common(&mut self) -> Result<StreamableHttpMessageReceiver, SessionError> {
    let (tx, rx) = tokio::sync::mpsc::channel(self.session_config.channel_capacity);
    if self.common.tx.is_closed() {
        // Primary is dead — replace it
        self.common.tx = tx;
    } else {
        // Primary is alive — create shadow (idle, keep-alive only)
        self.shadow_txs.push(tx);
    }
    Ok(StreamableHttpMessageReceiver { http_request_id: None, inner: rx })
}
```

Why Not 409 Conflict?
The initial approach (matching the TypeScript SDK) was to return 409 Conflict on duplicate standalone streams. However:
- The MCP spec states: "The client MAY remain connected to multiple SSE streams simultaneously" — 409 is not spec-compliant
- 409 causes Cursor to fail entirely on reconnection (500 errors from unhandled Conflict)
- The reconnecting EventSources are legitimate HTTP requests — they need a valid stream back
Shadow channels are the correct approach: keep all connections alive without interference.
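One way to picture the resulting send path (hypothetical field and method names; a sketch of the idea, not the actual rmcp code): notifications go only to the primary sender, while shadow senders exist solely to keep their SSE connections open.

```rust
use std::sync::mpsc::Sender;

// Hypothetical worker shape, for illustration only.
struct Worker {
    primary_tx: Sender<String>,
    // Shadow senders are never written to; their streams stay idle,
    // kept open by SSE keep-alive pings until close_sse_stream() clears them.
    shadow_txs: Vec<Sender<String>>,
}

impl Worker {
    /// Notifications are delivered to the primary stream only.
    fn send_notification(&self, msg: &str) -> bool {
        self.primary_tx.send(msg.to_string()).is_ok()
    }
}
```

Because shadows never carry traffic, no stream observes a closed receiver and nothing triggers an EventSource reconnect.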
Why No Cache Replay on Common Channel?
Common channel notifications (tools/list_changed, resources/list_changed) are idempotent signals. Replaying cached ones causes clients to re-process old events, triggering unnecessary re-fetches or infinite notification loops. Missing one is harmless — the next real event arrives naturally. Request-wise channels still use sync() for proper response replay.
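The distinction above can be stated as a tiny policy function (hypothetical names, sketching the rule the fix encodes rather than rmcp's actual types):

```rust
/// Which kind of SSE stream is being resumed.
enum StreamKind {
    /// POST response stream: the cached response must be replayed via sync().
    RequestWise,
    /// Standalone GET (common channel): list_changed signals are idempotent,
    /// so replaying cached ones only triggers redundant client re-fetches.
    Common,
}

/// Sketch of the post-fix policy: replay the cache only for request-wise streams.
fn should_replay_cache(kind: &StreamKind) -> bool {
    matches!(kind, StreamKind::RequestWise)
}
```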
Changes (5 commits)
| Commit | Description |
|---|---|
| `8bd424e` | Initial 409 Conflict approach (returned error on duplicate standalone stream) |
| `0d03eb5` | Handle resume with completed request-wise channels (fall through to common) |
| `a7bb822` | Remove 409 Conflict — allow channel replacement per MCP spec |
| `7cf5406` | Skip cache replay (sync) when replacing active streams |
| `a7df58c` | Shadow channels — the final fix that prevents the leapfrog loop |
Files Changed
`crates/rmcp/src/transport/streamable_http_server/session/local.rs`
- Added `shadow_txs: Vec<Sender<ServerSseMessage>>` to `LocalSessionWorker`
- New method `resume_or_shadow_common()` with primary-alive check
- Updated `resume()` to use shadow logic for both direct common and request-wise fallback paths
- Removed `sync()` calls on common channel resume
- Updated `close_sse_stream()` to clear shadow senders
- Updated `create_local_session()` to initialize `shadow_txs`
Test Results
- Cursor connects and initializes successfully (no 409/500 errors)
- Cursor does NOT enter infinite GET reconnect loop after connection
- Feature changes trigger exactly one batch of list_changed notifications
- Cursor receives and processes notifications correctly (re-fetches tools/resources)
- No notification replay loop (no repeated ResourceListChanged every 3s)
- VS Code connects and works correctly (unaffected by changes)
- `cargo check --workspace` passes
Environment
- rmcp 0.15.0 (also affects 0.14.0)
- `StreamableHttpService` with `stateful_mode: true`
- `LocalSessionManager` (default session manager)
- Clients tested: Cursor 2.4.37, VS Code MCP Extension