@hoangtrann

This PR adds a second attempt to requeue "orphaned jobs": jobs that are stuck in the enqueued state, never got a record in queue_job_lock, and are also missed when jobs are selected after database initialization. Even a server restart does not unblock such a job.

This is most likely caused by the server being shut down or crashing during job enqueue. Manually requeuing the job from the UI gets everything running again.

@OCA-git-bot
Contributor

Hi @guewen,
some modules you are maintaining are being modified, check this out!

@sbidoul sbidoul added this to the 18.0 milestone Nov 22, 2025
Member

@sbidoul sbidoul left a comment

Good catch! I kind of remember we had thought about this case when doing the original implementation with @AnizR but maybe it got lost in the various iterations we did.

@sbidoul sbidoul changed the title [IMP] queue_job: requeue orphaned jobs [18.0][IMP] queue_job: requeue orphaned jobs Nov 22, 2025
@hoangtrann hoangtrann force-pushed the feat-requeue-orphaned-jobs branch from 8b82098 to 84a3fca Compare November 22, 2025 21:43
@AnizR
Contributor

AnizR commented Nov 26, 2025

> Good catch! I kind of remember we had thought about this case when doing the original implementation with @AnizR but maybe it got lost in the various iterations we did.

Indeed, we talked about it, but I think it got lost.

Thanks for this patch, I'll review it within the week.

@hoangtrann
Author

@sbidoul are we good to merge?

@sbidoul
Member

sbidoul commented Dec 31, 2025

Re-reading this I'm slightly annoyed that it reintroduces a complexity that we had simplified before.

I wonder if it would work to add an `OR NOT EXISTS (SELECT 1 FROM queue_job_lock...)` in `_query_requeue_dead_jobs`.
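
To make the intent of that suggestion concrete, here is a minimal in-memory sketch of the selection rule it implies. It is not the runner's code: the `Job` class, the `jobs_to_requeue` function, and the `lock_rows` / `held_locks` sets are hypothetical names used only for illustration, and the 10-second grace period is an assumption. A job is picked up either when its lock row exists but nobody holds it (the worker died), or when it has no lock row at all (the orphaned case from this PR).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Job:
    id: int
    state: str
    date_enqueued: datetime


def jobs_to_requeue(jobs, lock_rows, held_locks, now, grace=timedelta(seconds=10)):
    """Return the ids of jobs the suggested condition would select.

    lock_rows:  ids of jobs that have a row in the lock table.
    held_locks: subset of lock_rows whose row is currently locked by a live
                worker (a SKIP LOCKED query would not return those rows).
    """
    selected = []
    for job in jobs:
        if job.state not in ("enqueued", "started"):
            continue  # only jobs that look like they are being processed
        if job.date_enqueued >= now - grace:
            continue  # too recent, the worker may not have taken its lock yet
        has_row = job.id in lock_rows
        lockable = has_row and job.id not in held_locks
        # Existing rule: lock row present and free -> the worker is gone.
        # Suggested addition: no lock row at all -> the job was orphaned.
        if lockable or not has_row:
            selected.append(job.id)
    return selected


now = datetime(2026, 1, 2, tzinfo=timezone.utc)
old = now - timedelta(minutes=1)
jobs = [
    Job(1, "started", old),    # lock row held by a live worker
    Job(2, "started", old),    # lock row exists, nobody holds it
    Job(3, "enqueued", old),   # no lock row: orphaned
    Job(4, "enqueued", now - timedelta(seconds=2)),  # within the grace period
    Job(5, "pending", old),    # not in a candidate state
]
print(jobs_to_requeue(jobs, lock_rows={1, 2}, held_locks={1}, now=now))
```

In this model only jobs 2 and 3 are selected, which are the two situations the single query is meant to cover.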

@hoangtrann hoangtrann force-pushed the feat-requeue-orphaned-jobs branch from 89af958 to a688fa9 Compare December 31, 2025 12:37
@hoangtrann
Author

@sbidoul I've adapted the query to your suggestions and tests are passing!

@OCA-git-bot
Contributor

This PR has the approved label and has been created more than 5 days ago. It should therefore be ready to merge by a maintainer (or a PSC member if the concerned addon has no declared maintainer). 🤖

Member

@sbidoul sbidoul left a comment

Thanks! One important fix is needed but I think it looks better like this overall.

I'll need to check the execution plan of the new query to be sure it's efficient.

AND (
id in (
SELECT
queue_job_id
Member

Suggested change:
- queue_job_id
+ 1

for symmetry

queue_job_id
FROM
queue_job_lock
FOR UPDATE SKIP LOCKED
Member

It needs a WHERE clause here, otherwise it locks all jobs.
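
For illustration, one way to read this remark is that the locking subquery should be correlated with the outer `queue_job` row, so `FOR UPDATE SKIP LOCKED` only touches the lock row of the job being examined rather than every unlocked row in `queue_job_lock`. The sketch below is an assumption about the shape of the condition, not the merged code: `dead_job_condition` is a hypothetical helper that returns only the fragment under review, and the correlation predicate is inferred from the review comments.

```python
def dead_job_condition() -> str:
    """Return a sketch of the extra condition discussed in this review.

    Assumption: both subqueries are correlated on queue_job.id, so the
    locking subquery can only lock the lock row of the candidate job.
    """
    return """
        AND (
            id IN (
                SELECT
                    queue_job_id
                FROM
                    queue_job_lock
                WHERE
                    queue_job_lock.queue_job_id = queue_job.id
                FOR UPDATE SKIP LOCKED
            )
            OR NOT EXISTS (
                SELECT
                    1
                FROM
                    queue_job_lock
                WHERE
                    queue_job_lock.queue_job_id = queue_job.id
            )
        )
    """


print(dead_job_condition())
```

Without the correlated `WHERE`, the first subquery would return, and therefore lock, every row of `queue_job_lock` that is not already locked by another transaction, which is the problem the comment points out.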

@sbidoul
Member

sbidoul commented Jan 2, 2026

The execution plan looks like this (with the missing WHERE clause added), tested with one started job whose lock is not held and one enqueued job with no lock record. So it looks good to me:

 Update on queue_job  (cost=0.43..7.30 rows=1 width=106) (actual time=0.146..0.711 rows=2 loops=1)
   ->  Index Scan using queue_job_state_index on queue_job  (cost=0.43..7.30 rows=1 width=106) (actual time=0.046..0.095 rows=2 loops=1)
         Index Cond: ((state)::text = ANY ('{enqueued,started}'::text[]))
         Filter: ((date_enqueued < ((now() AT TIME ZONE 'utc'::text) - '00:00:10'::interval)) AND ((SubPlan 1) OR (NOT (SubPlan 2))))
         SubPlan 1
           ->  LockRows  (cost=0.43..2.66 rows=1 width=10) (actual time=0.010..0.010 rows=0 loops=2)
                 ->  Index Scan using queue_job_lock_queue_job_id_index on queue_job_lock  (cost=0.43..2.65 rows=1 width=10) (actual time=0.008..0.008 rows=0 loops=2)
                       Index Cond: (queue_job_id = queue_job.id)
         SubPlan 2
           ->  Index Only Scan using queue_job_lock_queue_job_id_index on queue_job_lock queue_job_lock_1  (cost=0.43..2.65 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1)
                 Index Cond: (queue_job_id = queue_job.id)
                 Heap Fetches: 0
 Planning Time: 0.247 ms
 Trigger queue_job_notify: time=0.083 calls=2
 Execution Time: 0.855 ms