Job queue performance - split into pending and done?
My application will use Postgresql as a job queue. There will be 100s-1000s of “pending” job rows at any time, and eventually grow to many millions of “completed” jobs spanning months.
The DB will be queried continuously by job workers for “next job”, after which the job updated with a start date and “in progress” status, before finally having a completed date and final status. After this final status, a job row won’t be updated again. The ratio of queries for completed jobs to pending jobs would be something like 1:1000 or more.
I am deciding whether to use a single table for this or whether it would be better to isolate the “churning” rows into a “job queue” table while then moving all completed jobs to a “job log” table which is essentially append-only.
Is there a performance advantage in cases like this of having a separate table for completed jobs with no UPDATE and few SELECT calls? If it’s relevant, I’d also been planning to partition jobs by month, because we will prune them by month later (e.g. archive and remove anything over 6 months).
A jobs queue probably needs to be vacuumed a lot to stay efficient. Having it mixed into the same table as the already finished jobs will make that harder to do. So having the still-not-finished jobs in a separate table (or a separate partition) is a good idea.
You could partition by "status" with one 'final' partition and one everything else partition, and have the system automatically migrate the row when "status" is updated to 'final'. And then have the 'final' partition subpartitioned by date if you want. Or you could have them be different tables and do the delete/insert yourself when a job is finished.