Is updating a table by delete+insert better than merging with a temp table?



  • I have two tables in Postgres. The first table has 120 columns (stock metrics like returns, sales etc.), and the second table has 5 columns with a one-to-many relationship and on cascade delete constraint with the first table (this table contains 10-year stock prediction data of all public companies). All these table values should get updated every day as the stock price moves every day.

    In my project, there are two ways to update the database:

    1. One is on the API level where only one row in the first table is only updated. This depends on the number of times the user wants to update but there won't be a lot of updates from the user side.
    2. Another one is executed on a weekly basis where all the rows in both tables are updated.

    For both ways, I first delete the records in the first table and insert data in both the tables using python (for-loop).

    Is deleting and inserting a better approach than creating a temp table and merging (update for existing records and insert for non-existent rows) it with the old table?



  • The first table has 120 columns, and each primary key has one row

    The "Primary Key", by definition, uniquely identifies each row.
    I suspect that's not what you mean here.

    the second table has 5 columns, and each primary key has 1000 rows

    Then it cannot be the "Primary Key" of the second table.
    I assume you mean that the second table has 1000 rows for each corresponding [Primary] Key in the first table.

    ... two ways to update the database ...
    One is on the API level where only one primary key is updated.
    Another one ... where all primary keys are updated.

    The Primary Key of any record should be generated / stored when the record is first created and should remain unchanged for the entire lifetime of that record, until the record is finally destroyed.

    Nothing should update Primary Key values.
    It's a hugely expensive operation, will lock up the database while it's happening and, to all intents and purposes, it's a complete waste of time. If you're using a surrogate, numeric id for each record to make tying things together in the database "easier", then that value should never, ever change (and no-one, outside the database, should ever, ever see it).

    Every change to the database is logged, so effectively deleting and recreating the table, by "rewriting" every record is going to hammer your disk system, with an image of every deleted record and then every inserted record. That's going to be a lot of disk activity and CPU load and record/table locking and Application performance impact - for something that you probably shouldn't be doing at all.

    An analogy:
    Consider the effect if your bank decided to renumber everybody's accounts whenever someone else closed their account ... That's the sort of thing we're talking about here.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2