How to update a column with multiple row values for each group?



  • In my Postgres table, there are about 2 million rows in total and for each ticker, there are 752 rows. I have an algorithm that shows different prediction price values for each date and ticker which needs to be uploaded to the table every day. As of now, I delete the ticker rows for which I have new data and insert the new data to the table using python. This table has a foreign key (1:M) on the ticker column.

    I read that deleting and inserting has side effects like it increases the transactional log, the index needs to get updated and a few more. Deleting and inserting seems simple as I have to call only 2 functions in python to do it. I cannot upload the data to a new table and rename it to old name and drop the old table because this table is connected to an API that is used by a lot of users. So, if someone calls the API during this process, I cannot save the user analysis data.

    1. Is delete+insert a good way to update a large table like this which has different row values every day for all tickers?
    2. How about doing UPSERT using a temp table?
    3. How about storing all the prediction data in a JSON structure for all tickers in postgres so it's easy to update 1 JSON row (value) for each ticker (key) instead of 752 rows for each ticker in a traditional table?

    Table 1: Original table

    id (PK) ticker (indexed) prediction_price date
    1 AAPL 5.4 2021-01-01
    2 AAPL 5.6 2021-01-03
    3 AAPL 5.8 2021-01-04
    4 MSFT 10.2 2021-01-01
    5 MSFT 10.8 2021-01-03
    6 MSFT 10.8 2021-01-04

    Table 2: New predictions for all the dates. Keep only x days for each ticker and delete previous ones

    id (PK) ticker (indexed) prediction_price date
    1 AAPL 5.9 2021-01-03
    2 AAPL 5.1 2021-01-04
    3 AAPL 10 2021-01-05
    4 MSFT 12.8 2021-01-03
    5 MSFT 11 2021-01-04
    6 MSFT 15 2021-01-05


  • You could, instead of delete and insert, update the oldest record for a given ticker, with the new data you have for that ticker. If the primary key of your table was the ticker plus a sequence from 1 to 752, you could save the sequence of that new record in the ticker table, so that finding the next one to be updated would be very fast. Your table would never need reorganizing, vacuuming or even dusting.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2