Inconsistent keyset pagination when using (timestamp, uuid) fields



  • I am using the keyset pagination method for uuids on my Postgres database described in this post:

    • https://dba.stackexchange.com/questions/267794/how-to-do-pagination-with-uuid-v4-and-created-time-on-concurrent-inserted-data

    However, I have noticed when I have two records where the date is the same, rows are being skipped from the result.

    For example when I run the query:

    SELECT id, created_at FROM collection
    ORDER BY created_at DESC, id DESC
    

    I get the records back as I expect them, with created_at being the primary order, then id acting as a tiebreaker:

    id created_at
    e327847a-7058-49cf-bd91-f562412aedd9 2022-05-23 23:07:22.592
    d35c6bb8-06dd-4b86-b5c6-d123340520e2 2022-05-23 23:07:22.592
    5167cf95-953f-4f7b-9881-03ef07adcf3c 2022-05-23 23:07:22.592
    d14f48dc-df22-4e98-871a-a14a91e8e3c1 2022-05-23 23:07:21.592

    However when I run a query to paginate through like:

    SELECT id, created_at
    FROM collection
    WHERE (created_at, id) < ('2022-05-23 23:07:22.592','d35c6bb8-06dd-4b86-b5c6-d123340520e2')
    ORDER BY created_at DESC, id DESC
    LIMIT 3
    

    I would expect to get back the last two records, but my result set is instead

    id created_at
    d14f48dc-df22-4e98-871a-a14a91e8e3c1 2022-05-23 23:07:21.592

    I've also tried some variations on the query to try to fix it, such as:

    SELECT id, created_at
    FROM collection
    WHERE created_at < '2022-05-23 23:07:22.592' OR
         (created_at = '2022-05-23 23:07:22.592' AND id < 'd35c6bb8-06dd-4b86-b5c6-d123340520e2')
    ORDER BY created_at DESC, id DESC
    

    But I still get back the same result set.

    What's going on with my query?



  • Either of your shown queries should return two rows, as expected:

    db<>fiddle https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f9d066ef2a3155dfd9995aaf36896f80

    If you see a different result, then the likely cause is index corruption. Test with:

    SELECT id, created_at
    FROM collection
    WHERE (created_at + interval '1 ms', id)
        < ('2022-05-23 23:07:22.592','d35c6bb8-06dd-4b86-b5c6-d123340520e2')
    ORDER BY created_at + interval '1 ms' DESC, id DESC
    LIMIT 3;
    

    That "disables" index support and gets the result of a sequential scan. See:

    • https://dba.stackexchange.com/questions/292185/text-column-compares-equal-to-where-clause-but-does-not-select-matching-row/292200#292200

    If so, fix with:

    REINDEX TABLE collection;
    

    Or just the involved index on (created_at, id).

    I have hardly ever seen index corruption myself. Typically, there is a troubling root cause, like failing hardware (RAM, storage) or a very old version of Postgres. Try to find and fix the cause.

    This might be a good time for a backup of your database first.

    Related:

    • https://dba.stackexchange.com/questions/219782/efficient-pagination-for-big-tables/219810#219810



Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2