HTTP transport: respect retry_on_reload and add stale connection detection#798
HTTP transport: respect retry_on_reload and add stale connection detection#798
Conversation
Reviewer's GuideExtends the HTTP/WebSocket PluginHub transport to respect the retry_on_reload flag and adds pre-send WebSocket liveness checking with proper stale session eviction, while wiring the flag through unity_transport and covering behavior with new tests. Sequence diagram for PluginHub send_command_for_instance with retry_on_reload and liveness checkssequenceDiagram
actor Client
participant UnityTransport
participant PluginHub
participant Registry
participant WebSocket
Client->>UnityTransport: send_with_unity_instance(unity_instance, command_type, params, retry_on_reload)
UnityTransport->>UnityTransport: normalize retry_on_reload from kwargs
UnityTransport->>PluginHub: send_command_for_instance(unity_instance, command_type, params, user_id, retry_on_reload)
activate PluginHub
PluginHub->>PluginHub: _resolve_session_id(unity_instance, user_id, retry_on_reload)
alt no session found
PluginHub-->>UnityTransport: _unavailable_retry_response(no_unity_session)
UnityTransport-->>Client: failure with hint retry
else session resolved
PluginHub->>PluginHub: _ensure_live_connection(session_id)
alt connection live
PluginHub->>WebSocket: send command over WebSocket
WebSocket-->>PluginHub: command_result
else connection stale
PluginHub->>PluginHub: _evict_connection(session_id, stale_websocket_state)
alt retry_on_reload is false
PluginHub-->>UnityTransport: _unavailable_retry_response(stale_connection)
UnityTransport-->>Client: failure with hint retry
else retry_on_reload is true
PluginHub->>PluginHub: _resolve_session_id(unity_instance, user_id, true)
alt no session after retry
PluginHub-->>UnityTransport: _unavailable_retry_response(no_unity_session)
UnityTransport-->>Client: failure with hint retry
else new session resolved
PluginHub->>PluginHub: _ensure_live_connection(new_session_id)
alt new connection still stale
PluginHub-->>UnityTransport: _unavailable_retry_response(stale_connection)
UnityTransport-->>Client: failure with hint retry
else new connection live
PluginHub->>WebSocket: send command over WebSocket
WebSocket-->>PluginHub: command_result
end
end
end
end
PluginHub-->>UnityTransport: success response
UnityTransport-->>Client: normalized response
end
deactivate PluginHub
Class diagram for updated PluginHub HTTP transport retry and liveness handlingclassDiagram
class PluginHub {
+_get_connection(session_id: str) WebSocket
+_evict_connection(session_id: str, reason: str) None
+_ensure_live_connection(session_id: str) bool
+_unavailable_retry_response(reason: str) dict
+_resolve_session_id(unity_instance: str, user_id: str, retry_on_reload: bool) str
+send_command_for_instance(unity_instance: str, command_type: str, params: dict, user_id: str, retry_on_reload: bool) dict
}
class UnityTransport {
+send_with_unity_instance(unity_instance: str, command_type: str, params: dict, user_id: str, retry_on_reload: bool) dict
}
class PluginRegistry {
+unregister(session_id: str) None
}
class WebSocket {
+client_state: WebSocketState
+application_state: WebSocketState
+close(code: int) None
}
class PingTask {
+cancel() None
}
class PendingEntry {
+session_id: str
+future: Future
}
class MCPResponse {
+success: bool
+error: str
+hint: str
+data: dict
+model_dump() dict
}
PluginHub --> PluginRegistry : uses
PluginHub --> WebSocket : manages
PluginHub --> PingTask : manages ping_tasks
PluginHub --> PendingEntry : manages pending
PluginHub --> MCPResponse : builds retry responses
UnityTransport --> PluginHub : calls send_command_for_instance
File-Level Changes
Assessment against linked issues
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
|
Caution Review failedThe pull request is closed. 📝 WalkthroughWalkthroughThis pull request introduces robust session lifecycle management and retry control to the PluginHub. New helper methods (_evict_connection, _ensure_live_connection, _unavailable_retry_response) enable per-session cleanup and proactive liveness checking. The _resolve_session_id and send_command_for_instance methods gain a retry_on_reload parameter controlling reconnect waiting behavior. The HTTP transport layer forwards this flag to PluginHub, and integration tests validate fast-fail and no-wait paths. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant PluginHub
participant Session
participant Unity
rect rgba(100, 149, 237, 0.5)
Note over Client,Unity: retry_on_reload=True (Default Path)
Client->>PluginHub: send_command_for_instance(retry_on_reload=True)
PluginHub->>PluginHub: _resolve_session_id(retry_on_reload=True)
PluginHub->>PluginHub: Wait for reconnect if needed
PluginHub->>PluginHub: _ensure_live_connection
PluginHub->>Session: Check WebSocket state
Session-->>PluginHub: Connection active
PluginHub->>Session: Send command
Session->>Unity: Execute
Unity-->>Session: Response
Session-->>PluginHub: Response data
PluginHub-->>Client: Success response
end
rect rgba(255, 99, 71, 0.5)
Note over Client,Unity: retry_on_reload=False (Fast-Fail Path)
Client->>PluginHub: send_command_for_instance(retry_on_reload=False)
PluginHub->>PluginHub: _resolve_session_id(retry_on_reload=False)
PluginHub->>PluginHub: No wait, resolve immediately
PluginHub->>PluginHub: _ensure_live_connection
PluginHub->>Session: Check WebSocket state
Session-->>PluginHub: Connection stale/missing
PluginHub-->>Client: Unavailable retry response (fail fast)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~40 minutes Possibly related issues
Possibly related PRs
Poem
✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In
unity_transport.send_with_unity_instance, consider usingkwargs.pop('retry_on_reload', True)instead ofkwargs.get(...)so the helper does not unintentionally forward theretry_on_reloadflag to downstream send functions (e.g., the stdio path) that may not expect it. - In
_evict_connection, you gather matching entries fromcls._pendingbut never remove those entries from the mapping; if_pendingis long-lived this may lead to accumulation of stale entries, so it may be worth pruning them when you collect the associated futures.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In `unity_transport.send_with_unity_instance`, consider using `kwargs.pop('retry_on_reload', True)` instead of `kwargs.get(...)` so the helper does not unintentionally forward the `retry_on_reload` flag to downstream send functions (e.g., the stdio path) that may not expect it.
- In `_evict_connection`, you gather matching entries from `cls._pending` but never remove those entries from the mapping; if `_pending` is long-lived this may lead to accumulation of stale entries, so it may be worth pruning them when you collect the associated futures.
## Individual Comments
### Comment 1
<location> `Server/src/transport/plugin_hub.py:572-573` </location>
<code_context>
+ if ping_task is not None and not ping_task.done():
+ ping_task.cancel()
+
+ for future in pending_futures:
+ future.set_exception(
+ PluginDisconnectedError(
+ f"Unity plugin session {session_id} disconnected while awaiting command_result"
</code_context>
<issue_to_address>
**issue (bug_risk):** Re-check `future.done()` before `set_exception` to avoid race-induced `InvalidStateError`.
Because `pending_futures` is captured under the lock but `set_exception` is called after releasing it, a future can complete in between. Calling `set_exception` on a completed future raises `InvalidStateError`. To avoid this, either re-check `if not future.done():` immediately before `set_exception`, or wrap `set_exception` in a try/except that ignores `InvalidStateError`.
</issue_to_address>
### Comment 2
<location> `Server/src/transport/plugin_hub.py:558-562` </location>
<code_context>
+ websocket = cls._connections.pop(session_id, None)
+ ping_task = cls._ping_tasks.pop(session_id, None)
+ cls._last_pong.pop(session_id, None)
+ for entry in cls._pending.values():
+ if entry.get("session_id") == session_id:
+ future = entry.get("future")
+ if future and not future.done():
+ pending_futures.append(future)
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Consider removing `_pending` entries for the evicted session to avoid lingering state.
The eviction logic fails associated futures but leaves their `_pending` entries intact. This can leave stale state for fully-evicted sessions and potentially cause subtle behavior or memory growth. Consider collecting matching keys while holding the lock and removing those `_pending` entries as part of eviction.
```suggestion
pending_futures: list[asyncio.Future] = []
async with lock:
websocket = cls._connections.pop(session_id, None)
ping_task = cls._ping_tasks.pop(session_id, None)
cls._last_pong.pop(session_id, None)
# Collect and remove any pending entries associated with this session.
pending_keys_to_remove: list[object] = []
for key, entry in list(cls._pending.items()):
if entry.get("session_id") == session_id:
future = entry.get("future")
if future and not future.done():
pending_futures.append(future)
pending_keys_to_remove.append(key)
for key in pending_keys_to_remove:
cls._pending.pop(key, None)
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
… detection Fixes #795 — the HTTP/WebSocket transport path (PluginHub) now respects retry_on_reload=False and performs pre-send liveness checks, matching the stdio path behavior from PR #792. Co-Authored-By: Codex <[email protected]>
c446efa to
fb59d56
Compare
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
Server/src/transport/plugin_hub.py (1)
800-813:InstanceSelectionRequiredErrorcan escape the stale-connection retry path unhandled.The
try/except NoUnitySessionErrorblock at line 810 leavesInstanceSelectionRequiredErrorunhandled. For the auto-select case (unity_instance=None, originally a single session), if a second session registers during the reconnect-wait window,_resolve_session_idwill raiseInstanceSelectionRequiredErrormid-retry, producing an unexpected error format for what started as a stale-connection scenario.♻️ Proposed fix
try: session_id = await cls._resolve_session_id( unity_instance, user_id=user_id, retry_on_reload=True, ) - except NoUnitySessionError: + except (NoUnitySessionError, InstanceSelectionRequiredError): return cls._unavailable_retry_response("no_unity_session")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Server/src/transport/plugin_hub.py` around lines 800 - 813, The retry path can leak InstanceSelectionRequiredError from cls._resolve_session_id and bypass the stale-connection handling; update the exception handling around the second attempt to resolve the session id so it also catches InstanceSelectionRequiredError (in addition to NoUnitySessionError) and returns the same cls._unavailable_retry_response("stale_connection")/retry behavior; specifically modify the try/except that calls cls._resolve_session_id (in the block that previously only excepted NoUnitySessionError) to except (NoUnitySessionError, InstanceSelectionRequiredError) and handle them the same way so _ensure_live_connection + retry_on_reload logic remains consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Server/src/transport/plugin_hub.py`:
- Around line 579-583: In _evict_connection where you close the stale websocket
(the websocket.close call), replace the silent except with a debug log that
mirrors the behavior in _ping_loop: catch Exception as e and call the same
logger used in _ping_loop to emit a debug-level message including the exception
(e) and context that the close failed for the stale websocket; ensure you
reference the websocket variable in the log so diagnostics can tie the error to
the connection being evicted.
---
Nitpick comments:
In `@Server/src/transport/plugin_hub.py`:
- Around line 800-813: The retry path can leak InstanceSelectionRequiredError
from cls._resolve_session_id and bypass the stale-connection handling; update
the exception handling around the second attempt to resolve the session id so it
also catches InstanceSelectionRequiredError (in addition to NoUnitySessionError)
and returns the same cls._unavailable_retry_response("stale_connection")/retry
behavior; specifically modify the try/except that calls cls._resolve_session_id
(in the block that previously only excepted NoUnitySessionError) to except
(NoUnitySessionError, InstanceSelectionRequiredError) and handle them the same
way so _ensure_live_connection + retry_on_reload logic remains consistent.
| if websocket is not None: | ||
| try: | ||
| await websocket.close(code=1001) | ||
| except Exception: | ||
| pass |
There was a problem hiding this comment.
Add debug logging for the swallowed WebSocket close exception.
The except Exception: pass silently drops any close error, unlike the analogous block in _ping_loop (lines 509–511) which logs it. Given that _evict_connection is explicitly cleaning up a stale socket, even a debug-level log here would aid diagnostics.
🪵 Proposed fix
- if websocket is not None:
- try:
- await websocket.close(code=1001)
- except Exception:
- pass
+ if websocket is not None:
+ try:
+ await websocket.close(code=1001)
+ except Exception as close_ex:
+ logger.debug("Error closing evicted WebSocket for session %s: %s", session_id, close_ex)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if websocket is not None: | |
| try: | |
| await websocket.close(code=1001) | |
| except Exception: | |
| pass | |
| if websocket is not None: | |
| try: | |
| await websocket.close(code=1001) | |
| except Exception as close_ex: | |
| logger.debug("Error closing evicted WebSocket for session %s: %s", session_id, close_ex) |
🧰 Tools
🪛 Ruff (0.15.1)
[error] 582-583: try-except-pass detected, consider logging the exception
(S110)
[warning] 582-582: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Server/src/transport/plugin_hub.py` around lines 579 - 583, In
_evict_connection where you close the stale websocket (the websocket.close
call), replace the silent except with a debug log that mirrors the behavior in
_ping_loop: catch Exception as e and call the same logger used in _ping_loop to
emit a debug-level message including the exception (e) and context that the
close failed for the stale websocket; ensure you reference the websocket
variable in the log so diagnostics can tie the error to the connection being
evicted.
Summary
Fixes #795 — the HTTP/WebSocket transport path (PluginHub) now respects retry_on_reload=False and performs pre-send liveness checks, matching the stdio path behavior from PR #792.
Test plan
Generated with Claude Code
Co-Authored-By: Codex [email protected]
Summary by Sourcery
Respect retry_on_reload for HTTP/WebSocket Unity transport and add stale connection detection and eviction to improve domain reload behavior and error signalling.
Bug Fixes:
Enhancements:
Tests:
Summary by CodeRabbit
Bug Fixes
Tests