Postgresql - Split one big and often updated table into two for performance



  • Im about to make a decision on the shape of solution database. Thing im considering, and which is problematic to find clear answer with google is -

    Lets assume i have 1 big relation for my web application and analogical java object (JPA Hibernate is used for mapping). This is main relation, which consists of over 20 columns, and is heavily updated, but i can tell right now its only 5 columns, that are updated over and over, and i can tell they are all small datatypes(ints). Im thinking if extracting this 5 often updated values to separate table, and specifying one-to-one relation between them, would bring me any performance improvement? What i would like to achieve by implementing it this way is that only smaller table would be updated so often. It would require join on every query, but the load sent back and forth would be much much smaller. I will appreciate any advice. Hope i the question is clear enough.

    Thnx in advance.



  • Since the smaller part of the table is "updated over and over", and the larger part of the table is not, and SELECT "will not be sent so often as updates", your idea is solid.

    Of course this is anecdotal information, not hard numbers to calculate with. The resulting overall performance depends on the complete picture.

    The main benefit is to involve much fewer data pages in your updates, and to keep table and index bloat at bay. Indexes on the small table will bloat all the same, but indexes on the big table will not be affected by the many updates. So consider aggressive autovacuum setting for the small table. See:

    • https://dba.stackexchange.com/q/21068/3684

    Things to consider:

    • A FILLFACTOR below 100 (like 80?) for the small table (and below 90 for indexes) - depends on access patterns - while keeping default for the big table. See:

      • https://stackoverflow.com/questions/14187192/fillfactor-for-a-sequential-index-that-is-pk/14187367#14187367
    • Create just the right indexes for the small table, as those increase write cost.

    • Create all the right indexes for the big table. Few updates there, so less constraint.

    • It may pay to CLUSTER the big table from time to time. The small table, not so much, as the effect deteriorates with subsequent writes. See:

      • https://stackoverflow.com/questions/13998139/optimize-postgres-query-on-timestamp-range/14007963#14007963
    • Add a FOREIGN KEY constraint, possibly with ON DELETE CASCADE to enforce referential integrity, or use OUTER JOIN if one of the two parts can be missing. All according to actual requirements.

    • You might add a VIEW to merge the two parts for ease of use. That's just a convenience feature, no impact on performance.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2