Fix task creation being skipped bug due to timeouts #996
Open
khandelwal-ayush wants to merge 2 commits into linkedin:master from
akshayrai
reviewed
Feb 16, 2026
// ConnectorTask being created. Clearing follows the same pattern as onSessionExpired().
_log.warn("Timeout when doing the assignment. Clearing current assignment state to force full "
    + "reconciliation on next assignment change.", e);
_assignedDatastreamTasks.clear();
Collaborator
can we add some tests to cover this?
Collaborator
Author
sure, let me add
Summary
Fix stale Coordinator assignment state after timeout causing missed ConnectorTask creation
When handleAssignmentChange() times out waiting for connector callbacks, it returns early without updating _assignedDatastreamTasks, leaving stale state from the last successful assignment. On the next assignment change, the Coordinator computes an incorrect diff and may skip creating ConnectorTasks for re-assigned DatastreamTasks, which leaves their data unprocessed.
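The failure sequence described above can be sketched in isolation. This is a simplified model, not the Coordinator's actual code: the class `StaleAssignmentSketch`, its fields, the `timesOut` flag, and the task names `task-a` and `task-b` are all hypothetical, and it assumes the diff is a plain equality check between the cached and the new task set.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of a coordinator that caches the last assignment.
class StaleAssignmentSketch {
  // Stand-in for _assignedDatastreamTasks: the last assignment the coordinator recorded.
  private final Set<String> cached = new HashSet<>();
  // Stand-in for the tasks the connector is actually running.
  private final Set<String> running = new HashSet<>();

  Set<String> running() {
    return running;
  }

  // Returns true if the assignment was dispatched to the connector.
  boolean handleAssignmentChange(Set<String> newAssignment, boolean timesOut) {
    if (newAssignment.equals(cached)) {
      return false; // no diff against the cache, so nothing is dispatched
    }
    // Assumption: the connector applies the new list even though the
    // coordinator later gives up waiting for all callbacks.
    running.clear();
    running.addAll(newAssignment);
    if (timesOut) {
      return true; // early return: the cache keeps the previous assignment
    }
    cached.clear();
    cached.addAll(newAssignment);
    return true;
  }
}
```

In this model, assigning {task-a, task-b}, then {task-a} with a timeout, then {task-a, task-b} again results in the third call being skipped, because the cache still holds the first assignment. task-b is never restarted.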
Root Cause (Production Incident)
Host: lva1-app128114.prod.linkedin.com, affected task: makto-db-152
Result: No ConnectorTask was created for makto-db-152, so all connected GaaS DBs/tables stopped processing.
Fix
Added _assignedDatastreamTasks.clear() in the catch (TimeoutException) block of handleAssignmentChange(). This forces a full reconciliation on the next assignment change: all connectors receive their complete task list and reconcile against their internal state.
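A minimal, self-contained sketch of the shape of this change follows. It is not the Coordinator code from this PR: `TimeoutClearSketch`, `assignedTasks`, and the single-callback signature are hypothetical simplifications, and only the standard `java.util.concurrent` calls are real APIs.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical coordinator that clears its cached assignment on timeout.
class TimeoutClearSketch {
  // Stand-in for _assignedDatastreamTasks.
  private final Set<String> assignedTasks = new HashSet<>();
  // Daemon thread so a stuck callback cannot keep the JVM alive.
  private final ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
    Thread t = new Thread(r);
    t.setDaemon(true);
    return t;
  });

  Set<String> assignedTasks() {
    return assignedTasks;
  }

  void handleAssignmentChange(Set<String> newAssignment, Runnable connectorCallback, long timeoutMs) {
    Future<?> future = executor.submit(connectorCallback);
    try {
      future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true);   // interrupt the connector thread
      assignedTasks.clear(); // drop cached state so the next change is a full re-dispatch
      return;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
    // Only a fully acknowledged assignment is recorded.
    assignedTasks.clear();
    assignedTasks.addAll(newAssignment);
  }

  void shutdown() {
    executor.shutdownNow();
  }
}
```

With an empty cache, any non-empty assignment that arrives next differs from the recorded state, so it cannot be mistaken for "no change".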
Why clear instead of updating to the new assignment? After a timeout, future.cancel(true) interrupts connector threads, so we don't know which connectors finished processing and which didn't. Clearing forces a full re-dispatch. Connectors handle this idempotently since onAssignmentChange() receives the full task list, not a delta.
This follows the existing pattern in onSessionExpired() (line 693), which performs the same clear for the same reason.
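The idempotency argument can be illustrated with a toy connector. This is an assumed model, not a real connector implementation: `IdempotentConnectorSketch`, `startCount`, and the string task names are hypothetical, and `onAssignmentChange` here only mirrors the name of the real callback.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical connector that reconciles against a full task list.
class IdempotentConnectorSketch {
  private final Set<String> running = new HashSet<>();
  // Counts how many times each task was started, to show repeats are harmless.
  private final Map<String, Integer> startCounts = new HashMap<>();

  Set<String> running() {
    return running;
  }

  int startCount(String task) {
    return startCounts.getOrDefault(task, 0);
  }

  // Receives the complete list of assigned tasks, not a delta.
  void onAssignmentChange(List<String> fullTaskList) {
    Set<String> target = new HashSet<>(fullTaskList);
    running.retainAll(target); // stop tasks that are no longer assigned
    for (String task : target) {
      if (running.add(task)) { // start only tasks that are not already running
        startCounts.merge(task, 1, Integer::sum);
      }
    }
  }
}
```

Delivering the same full list twice starts each task once, which is why a forced full re-dispatch after a timeout is safe under this model.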
Testing Done
Will be deployed and tested