-
Notifications
You must be signed in to change notification settings - Fork 699
fix: Move publishRetryMessage to end of processing loop to avoid duplicate retries #4966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: stu/fix_job_retry
Are you sure you want to change the base?
Conversation
Per reviewer feedback, publishRetryMessage is now called at the end of the processing loop for each message that is NOT marked as invalid. This prevents duplicate retry messages for the same event, since invalid messages already go back to the SQS queue for retry. Key changes: - Track validMessagesForRetry separately from all messages - Only messages that pass job queued check are added to validMessagesForRetry - publishRetryMessage is called after runner creation, not before - Messages marked as invalid (e.g., max runners reached, creation failed) are excluded from retry message publishing Tests updated to reflect new behavior: - publishRetryMessage is called AFTER runner creation - Messages marked invalid do not get retry messages published - All test scenarios updated with proper mock runner creation Co-authored-by: npalm <[email protected]>
Address code review feedback: - Use Set for invalidMessageIds to improve lookup performance from O(n) to O(1) - Extract duplicate retry message publishing logic into helper function - Maintain both invalidMessages array (for return value) and Set (for fast lookups) This improves performance when processing large message batches and makes the code more maintainable by eliminating duplication. Co-authored-by: npalm <[email protected]>
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
|
@copilot can you re-check, I think naming can be a bit better. Do we really need this complicty? |
Simplified the retry message logic based on reviewer feedback: - Renamed `validMessagesForRetry` to `queuedMessages` (clearer intent) - Renamed `invalidMessageIds` to `rejectedMessageIds` (more accurate) - Removed `invalidMessages` array duplication - now using only Set - Removed `publishRetryMessagesForValid` helper function (inline instead) - Convert Set to array only at return statement Result: clearer naming, less complexity, same functionality. Net reduction of 6 lines while maintaining all tests passing. Co-authored-by: npalm <[email protected]>
You're right! I've simplified the code: Naming improvements:
Complexity reduction:
Fixed in commit b5399f6. |
|
@copilot adjust the workflow https://github.com/github-aws-runners/terraform-aws-github-runner/blob/main/.github/workflows/lambda.yml to run on any pr not only the onces to main |
Remove branch restriction from lambda.yml workflow to support stacked PRs and feature branch workflows. The workflow will now run on any PR that modifies lambda code or the workflow itself. Co-authored-by: npalm <[email protected]>
Updated the lambda workflow to run on all PRs by removing the Fixed in commit cba7bbf. |
|
@copilot fix your formatting issues please |
Fix formatting issues identified by prettier: - Remove trailing whitespace in scale-up.test.ts - Format .prettierrc and eslint.config.js - Format various tsconfig.json and other config files Co-authored-by: stuartp44 <[email protected]>
Fixed formatting issues in commit ef5c981. The main issue was trailing whitespace in scale-up.test.ts. Also reformatted config files (.prettierrc, eslint.config.js, and various tsconfig.json files) to match prettier's style. All files now pass |
|
@copilot can you fix the test error? it looks like we have an issue with |
The merge from stu/fix_job_retry incorrectly reverted the core change where publishRetryMessage was moved to the end of the loop. This fix restores the correct behavior: - Restore queuedMessages.push(message) instead of await publishRetryMessage(message) - Remove conflicting test assertions from merge (lines expecting both publishRetryMessage NOT to be called AND to be called 2 times) - Reorder test assertions to check listEC2Runners before publishRetryMessage All tests now pass with the correct behavior where publishRetryMessage is called AFTER runner creation and only for non-rejected messages. Co-authored-by: stuartp44 <[email protected]>
Fixed in commit 0243458. The issue was caused by an incorrect merge resolution from the stu/fix_job_retry branch that reverted my core changes. The merge:
Restored the correct behavior where |
|
@copilot please keep the changes in this pull requests to a minimal. I'm not sure why you have touched so many files, this was not part of the original objective. |
Per reviewer feedback to keep changes minimal and focused on the original objective. Reverting: - Formatting changes to config files (.prettierrc, eslint.config.js, tsconfig.json files, nx.json, vitest.base.config.ts) - Formatting changes to test resource JSON files - Workflow change to lambda.yml (restoring branches: main restriction) Only keeping the essential changes: - lambdas/functions/control-plane/src/scale-runners/scale-up.ts - lambdas/functions/control-plane/src/scale-runners/scale-up.test.ts These are the only files needed to fix the original issue of moving publishRetryMessage to the end of the processing loop to avoid duplicate retries. Co-authored-by: Brend-Smits <[email protected]>
Reverted unnecessary changes in commit c2712a6. The PR now only modifies the two essential files:
Removed all formatting changes (config files, test resources) and the workflow change that were not part of the original objective. All tests, linting, and formatting checks still pass. |
|
This looks good to me now @stuartp44, what do you think? |
publishRetryMessageis calledpublishRetryMessagecall from line 360 (early in the loop)publishRetryMessageat the end of the loop for messages not marked as invalid✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.