fix: sort record IDs in lockForeignRecords to prevent deadlocks#2676
Open
dkindlund wants to merge 1 commit intoteableio:developfrom
Open
fix: sort record IDs in lockForeignRecords to prevent deadlocks#2676dkindlund wants to merge 1 commit intoteableio:developfrom
dkindlund wants to merge 1 commit intoteableio:developfrom
Conversation
lockForeignRecords uses SELECT ... FOR UPDATE to acquire row-level locks on foreign records before modifying link data. However, it passes recordIds to whereIn without sorting, and PostgreSQL does not guarantee deterministic lock acquisition order for WHERE IN clauses. When two concurrent operations lock overlapping sets of foreign records in different orders, a deadlock can occur. This is inconsistent with lockRecordsSql in postgres.provider.ts (line 536) which correctly sorts recordIds before forUpdate(). This fix sorts the record IDs and adds ORDER BY __id to ensure deterministic lock acquisition order, matching the pattern used elsewhere in the codebase. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dkindlund
added a commit
to dkindlund/teable
that referenced
this pull request
Mar 9, 2026
## Problem The /link-fix endpoint's InvalidLinkReference repair has a race condition under concurrent write load. The current two-step approach: 1. checkLinks() — SELECT to detect desynced record IDs 2. fixLinks(recordIds) — UPDATE only those specific records Between steps 1 and 2, concurrent writes can: - Create NEW desyncs not in the recordIds list (missed until next run) - Worsen existing desyncs (JSONB changes between detection and fix) - Overwrite the fix immediately after it's applied In production under sustained concurrent API load (200-800 writes/hr), we've observed full JSONB array wipes (e.g., 9,781 entries reduced to 0) caused by the read-modify-write race in Teable's link update path. The integrity fix endpoint, meant to repair these desyncs, is vulnerable to the same concurrent writes because of this detection-fix gap. ## Solution Add atomicFixLinks() to IntegrityQueryPostgres that combines detection and fix into a single UPDATE ... WHERE __id IN (SELECT ...) statement. Detection and fix execute under the same MVCC snapshot, eliminating the application-level gap entirely. The service (LinkFieldIntegrityService.checkAndFix) tries the atomic method first. If the database engine doesn't support it (e.g., SQLite), it falls back to the existing two-step approach. This is a safe, backwards-compatible change — SQLite behavior is completely unchanged. ## What the /link-fix endpoint covers For context, the endpoint handles 10 integrity issue types. This PR improves only InvalidLinkReference. Here is the full list: | Issue Type | What it detects | Fix applied | |---|---|---| | InvalidLinkReference | JSONB link column diverged from junction/FK source of truth | Rebuilds JSONB from junction/FK data (THIS PR) | | MissingRecordReference | Junction table has rows pointing to deleted records | Deletes orphaned junction rows | | ForeignTableNotFound | Link field references a deleted table | No auto-fix (requires manual intervention) | | ForeignKeyHostTableNotFound | Junction table is missing | No auto-fix | | ForeignKeyNotFound | Missing FK columns in junction table | Recreates columns, backfills from JSONB | | SelfKeyNotFound | Missing self-reference key in junction | No auto-fix | | SymmetricFieldNotFound | Bidirectional link missing its counterpart | Converts to one-way link | | ReferenceFieldNotFound | Referenced record was deleted | Deletes orphaned reference | | UniqueIndexNotFound | Missing unique constraint for OneOne links | Creates the index | | EmptyString | Text fields have empty strings instead of NULL | Converts to NULL | ## Relationship types handled The atomic fix handles all four relationship types: - ManyMany (isMultiValue=true): Rebuilds JSONB array from junction table - OneMany (isMultiValue=true): Same as ManyMany - ManyOne (isMultiValue=false, FK in same table): Rebuilds single JSONB object from FK column - OneOne (isMultiValue=false, FK in host table): Rebuilds via cross-table join ## Files changed - abstract.ts: Added atomicFixLinks() with default null return - integrity-query.postgres.ts: PostgreSQL implementation of atomicFixLinks() - link-field.service.ts: checkAndFix() tries atomic first, falls back to two-step ## Related issues - teableio#2680 (DataLoader cache invalidation under concurrency) - teableio#2676 (Sort record IDs in lockForeignRecords) - teableio#2677 (Wrap simpleUpdateRecords with transaction/timeout/retry) - teableio#2679 (Add foreign record locking to ManyMany, OneMany, OneOne) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Bug Description
lockForeignRecordsinlink.service.tsusesSELECT ... FOR UPDATEto acquire row-level locks on foreign records before modifying link data. However, it passesrecordIdstowhereInwithout sorting, and PostgreSQL does not guarantee deterministic lock acquisition order forWHERE INclauses.When two concurrent operations lock overlapping sets of foreign records in different orders, a deadlock can occur. For example:
[rec3, rec1, rec2][rec2, rec3, rec1]This is inconsistent with
lockRecordsSqlinpostgres.provider.ts(line 536) which correctly sortsrecordIdsbeforeforUpdate().Steps to Reproduce
Expected Behavior
Lock acquisition should follow a deterministic order to prevent deadlocks between concurrent operations.
What This Fix Does
recordIdsarray before passing towhereInorderBy('__id')to the lock queryThis ensures deterministic lock acquisition order, matching the pattern already used by
lockRecordsSqlinpostgres.provider.ts.Client Information
Platform
Docker standalone (Google Cloud Run)
Additional Context
Discovered during a deep stability audit of the Teable codebase focused on concurrency issues under high API load. The existing
lockRecordsSqlin the Postgres provider already implements correct sorted locking — this fix bringslockForeignRecordsinto alignment with that pattern.