Does UPDATE write a new row version for unchanged, TOASTed values?



  • I am working with a PostgreSQL table with a large TEXT field, which is theoretically updated on a regular basis. I have thought about storing the data directly in the filesystem, but with TOAST, the data is already being stored off-page and compressed in the database so I figured I would keep things simple and just use the database storage.

    To give some context, I am indexing RSS feeds. I will have a script run every 24h that pulls the RSS feeds and potentially updates the table. This can lead to a lot of dead tuples, and thus lots of space being used on disk. Of course, autovacuum will take care of this eventually but it has the potential to be a lot of data (many GB) and I want to make sure I know what will happen when I am doing lots of updates on this very large table.

    One solution I have is to only update the TEXT field (storing the RSS data) if there are certain substantial changes to the feed, e.g. a new post on the website. This means that I could avoid doing the UPDATE unless I really have to. However, I still want to update the table (to keep track of when I most recently did an HTTP request). This will create a dead tuple with the old version of the row's data.

    What will happen to the TEXT data, if it isn't actually changed? Will the UPDATE also duplicate the TEXT data, when it creates a dead tuple? Or will the TEXT data be left untouched, because it wasn't changed and it is stored off-page?



  • That's the one major shortcut an UPDATE takes in Postgres' MVCC model (as compared to DELETE + INSERT😞 Fields that are stored out-of-line (TOASTed) and not changed in the UPDATE are kept as is. Meaning, the old (soon to be dead) row version and the new row version in the main relation point to the same TOASTed values, no additional bloat there.

    Or https://www.postgresql.org/docs/current/storage-toast.html#STORAGE-TOAST-ONDISK :

    During an UPDATE operation, values of unchanged fields are normally preserved as-is; so an UPDATE of a row with out-of-line values incurs no TOAST costs if none of the out-of-line values change.

    As a_horse_with_no_name pointed out:
    "Not changed in the UPDATE", or "unchanged" as the manual puts it, means "not targeted in a SET clause of the UPDATE". Postgres does not verify whether a new column value actually differs from the previous row version.

    If possible, skip rows that do not change to begin with by adding a WHERE clause. That still leaves cases updating multiple columns at once where some stay unchanged. If that applies to a your big column, it might pay to update it separately, and only where it actually changes. See:

    • https://stackoverflow.com/a/12632129/939860

    If, OTOH, you update big, TOASTed fields on a regular basis, consider the https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-DEFAULT-TOAST-COMPRESSION (new in Postgres 14) for those columns. Disk footprint is a bit bigger, but performance is much better. See:

    • https://stackoverflow.com/a/71088092/939860



Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2