PostgreSQL minimize data dir for Docker image



  • I am preparing a Postgres Docker image based on the official image. I have an application that fills the database with initial data (this takes about an hour for ~300MB of data). I want to make an image out of that Postgres database so that the data is already initialized. This works fine, but due to the nature of the WAL, the PG_DATA directory is larger than necessary. For Docker image distribution, minifying PG_DATA is important. This will add up if I later think about adding a couple more layers of data, which all add their own set of WAL files that are not strictly necessary.

    What options do I have to minimize PG_DATA? My main question is about minimizing the WAL, at the end I talk about minimizing other areas of PG_DATA.

    Delete the WAL

    When is it safe for me to delete the WAL?

    To be clear, this is a throw-away container filled with some initial data. It’s not intended to be a productive system. archive_mode = off, wal_level = minimal, no replication.

    I am trying to keep the WAL small by the following config:

    wal_keep_size = 0
    wal_recycle = off
    min_wal_size = 32MB
    max_wal_senders = 0 # necessary due to wal_level = minimal
    

    However, even after issuing CHECKPOINT;, the pg_wal directory is still ~200MB big. Why is Postgres not deleting the files? Based on the settings, I would assume that only 32MB should be kept. After CHECKPOINT; and stopping the server, is it safe to delete all files from pg_wal?

    From what I can tell from other questions and answers, Postgres should automatically delete the WAL files. Some Q&As I am already aware, but do not talk about my specific problem and use-case:

    • https://dba.stackexchange.com/questions/80317/how-can-i-solve-postgresql-problem-after-deleting-wal-files
    • https://stackoverflow.com/questions/49650016/how-to-reduce-wal-file-count-in-edb-postgresql-9-6-instance
    • https://stackoverflow.com/questions/35144403/which-postgresql-wal-files-can-i-safely-remove-from-the-wal-archive-folder
    • https://stackoverflow.com/questions/49539938/postgres-wal-file-not-getting-deleted
    • https://stackoverflow.com/questions/68858808/postgresql-how-to-safely-remove-files-inside-pg-wal-directory

    These questions are mostly concerned about archive, backup, replication, PITR. But that does not apply to my use-case.

    Minimizing PG_DATA

    My idea is that after executing CHECKPOINT; and clearing the WAL, I can also do a VACUUM FULL; to release any space to the file system. This seems to work as intended, I see the base directory getting a bit smaller. Of course, I expect this to have the most effect after deleting/updating data, not after mostly inserting rows during initialization.

    Are there even more options to minimize PG_DATA?



  • After you have populated your database, shut it down cleanly (important!) and run

    pg_resetwal -D /path/to/data/directory
    

    That will truncate your WAL. Never manually delete WAL files.


Log in to reply
 


Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2