How to fix "could not write to file pg_wal/xlogtemp" after a failed command?



  • I ran vacuum full analyze, but it took too long and it was using 100% of RAM, so I killed it. Now, Postgres won't start. I got:

     * Starting PostgreSQL 14 database server                                                                                            
     * Error: /usr/lib/postgresql/14/bin/pg_ctl /usr/lib/postgresql/14/bin/pg_ctl start -D /var/lib/postgresql/14/main -l /var/log/postgresql/postgresql-14-main.log -s -o  -c config_file="/etc/postgresql/14/main/postgresql.conf"  exited with status 1:
    2022-03-15 22:51:01.259 PDT [915] LOG:  starting PostgreSQL 14.2 (Ubuntu 14.2-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc 
    (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, 64-bit
    2022-03-15 22:51:01.259 PDT [915] LOG:  listening on IPv4 address "0.0.0.0", port 5432
    2022-03-15 22:51:01.259 PDT [915] LOG:  listening on IPv6 address "::", port 5432
    2022-03-15 22:51:01.263 PDT [915] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
    2022-03-15 22:51:01.271 PDT [916] LOG:  database system was interrupted while in recovery at 2022-03-15 21:52:50 PDT
    2022-03-15 22:51:01.271 PDT [916] HINT:  This probably means that some data is corrupted and you will have to use the last backup for recovery.
    2022-03-15 22:51:02.578 PDT [917] postgres@foo FATAL:  the database system is starting up
    2022-03-15 22:51:02.595 PDT [918] postgres@foo FATAL:  the database system is starting up
    2022-03-15 22:51:02.612 PDT [919] postgres@foo FATAL:  the database system is starting up
    2022-03-15 22:51:02.641 PDT [920] postgres@foo FATAL:  the database system is starting up
    2022-03-15 22:51:02.656 PDT [921] postgres@foo FATAL:  the database system is starting up
    2022-03-15 22:51:02.683 PDT [922] postgres@foo FATAL:  the database system is starting up
    2022-03-15 22:51:03.570 PDT [916] LOG:  database system was not properly shut down; automatic recovery in progress
    2022-03-15 22:51:03.573 PDT [916] LOG:  redo starts at 36/21030540
    2022-03-15 22:51:04.368 PDT [916] LOG:  redo done at 36/2EFFF1D8 system usage: CPU: user: 0.40 s, system: 0.38 s, elapsed: 0.79 s    
    2022-03-15 22:51:04.376 PDT [916] PANIC:  could not write to file "pg_wal/xlogtemp.916": No space left on device
    2022-03-15 22:51:04.378 PDT [915] LOG:  startup process (PID 916) was terminated by signal 6: Aborted
    2022-03-15 22:51:04.378 PDT [915] LOG:  aborting startup due to startup process failure
    2022-03-15 22:51:04.386 PDT [915] LOG:  database system is shut down
    pg_ctl: could not start server
    Examine the log output.
    

    The database is pretty small, it shouldn't be running out of disk space, so vacuum full analyze must've created a lot of junk files. What's the proper way to fix this? I can't figure out how to use pg_archivecleanup, I can't find the archive directory, /var/lib/postgresql/14/main/pg_xlog doesn't exist. Here's everything in the folder:

    sudo ls /var/lib/postgresql/14/main/
    PG_VERSION  pg_commit_ts  pg_multixact  pg_serial     pg_stat_tmp  pg_twophase postgresql.auto.conf
    base        pg_dynshmem   pg_notify     pg_snapshots  pg_subtrans  pg_wal      postmaster.opts
    global      pg_logical    pg_replslot   pg_stat       pg_tblspc    pg_xact
    

    This is a dev database, so in the worst case I can just delete everything. However, in case this happens in production, I'd like to know how else I can fix it.

    OS: Ubuntu 20 on WSL



  • You are clearly out of disk space for pg_wal.

    The way to recover:

    1. increase the size of the disk (do not remove files)

    2. start PostgreSQL

    3. pg_dumpall the cluster

    4. stop the PostgreSQL server and remove the data directory (saving configuration files)

    5. create a new cluster with initdb, add the configuration and start it

    6. restore the dump taken in step 3

    This will get rid of wll the files left behind from the crash.

    Next time, don't run VACUUM (FULL) unless you have a clear indication that it is indicated, and in particular don't run it on all tables in your database, but only those that really need it.

    Note: to gather statistics, run ANALYZE. And VACUUM (FULL) is not a better version of VACUUM, it is something quite different.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2