Can one dictate the order of generated values when adding a new BIGSERIAL column to a table?



  • I need to add a new BIGSERIAL column to a huge table (~3 billion records). https://dba.stackexchange.com/q/20801/224076 is similar to what I need to do and the https://dba.stackexchange.com/a/20802/224076 has helped me somewhat. But I'm still wondering about something. In my case, the table already has a BIGSERIAL column which is the primary key, but many rows have been deleted so now there are gaps. (The table has subsequently been fully vacuumed.) I need to regenerate the values so that they are sequential again. Here are 5 example rows of what I want to achieve where the new_value > 1000:

    +---------+---------+
    |old_value|new_value|
    +---------+---------+
    |1026     |1001     |
    |1027     |1002     |
    |1030     |1003     |
    |1032     |1004     |
    |1039     |1005     |
    +---------+---------+
    

    I have successfully implemented the alternative approach as mentioned in the referenced answer above (CREATE TABLE ... and then INSERT INTO new_table SELECT * FROM ... ), but I would also like to attempt, and benchmark against, the initial suggestion. The problem, however, is that I don't know whether the new_value will be generated in the same order as the old_value as this is a requirement.

    How can I ensure the order of the new_value column follows/tracks the order of the old_value column when the new_value column is added using a statement like this:

    ALTER TABLE existing_table ADD COLUMN new_value BIGSERIAL;
    

    A different approach

    I also attempted the following (that works quite well on a small table), but it's much slower than the alternative suggestion of the referenced answer on very large tables:

    ALTER TABLE existing_table ADD COLUMN new_value BIGINT NOT NULL DEFAULT 0;
    

    UPDATE existing_table AS existing
    SET new_value = generated.new_id
    FROM (
    SELECT original.old_value
    , row_number() OVER (ORDER BY original.old_value) AS new_id
    FROM existing_table AS original
    ) generated
    WHERE existing.old_value = generated.old_value;



  • First, consider the advise in the comments: do you really need to remove gaps? Typically, you don't. And gaps will be creeping back in. See:

    • https://dba.stackexchange.com/questions/289986/is-identity-continuous-in-new-versions-of-postgresql/290022#290022
    • https://stackoverflow.com/a/24918964/939860

    Next, consider the updates to my old answer you have been working off:

    • https://dba.stackexchange.com/questions/20801/most-efficient-way-to-add-a-serial-column-to-a-huge-table/20802#20802

    Among other things, how to possibly avoid writing WAL for the table data in the new table - that will be the most significant bit to further improve performance.

    How can I ensure the order of the new_value column follows/tracks the order of the old_value column when the new_value column is added using a statement like this:

    ALTER TABLE existing_table ADD COLUMN new_value BIGSERIAL;

    You cannot. Use one of the other routes.

    I also attempted the following (that works quite well on a small table), but it's much slower than the alternative suggestion of the referenced answer on very large tables:

    The UPDATE route is inherently more expensive than writing a pristine new table from scratch (adding indexes constraints afterwards).




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2