PITR with barman fails - is my understanding even accurate?



  • I manage to take backups and restore them to my postgres-server. Now I want to see what PITR is able to do, using these steps:

    1. create a backup (20220111T062908)
    2. wait a minute
    3. create a new database (at 06:34:11)
    4. run a barman recover operation:
       • pg_ctl stop (on postgres-server)
       • barman recover (on barman server)
       • pg_ctl start (on postgres-server)
       • check for my database from 06:34:11, which is not there (on postgres-server)

    It looks like the recovery is getting me to the point of the backup (06:29:08) but not to the --target-time (06:35:00). Or am I misunderstanding something fundamental about the PITR logic?

    Even though my gut tells me it can't be so: do I need another backup after 06:34 before I can do a PITR to a point in time between those two backups, or am I missing something along the way?


    These are the barman recover details:

    :~> barman recover vm-51150-0196 20220111T062908 --remote-ssh-command 'ssh postgres@[postgres-server]' --target-time 20220111T063500 /opt/db/data/postgres/data
            Starting remote restore for server vm-51150-0196 using backup 20220111T062908
            Destination directory: /opt/db/data/postgres/data
            Remote command: ssh postgres@[postgres-server]
            Doing PITR. Recovery target time: '2022-01-11 06:35:00+01:00'
            Using safe horizon time for smart rsync copy: 2022-01-11 06:29:08.521311+01:00
            Copying the base backup.
            Copying required WAL segments.
            Generating recovery configuration
            Identify dangerous settings in destination directory.
            IMPORTANT
            These settings have been modified to prevent data losses
            postgresql.conf line 242: archive_command = false
            postgresql.auto.conf line 5: recovery_target_time = None
            WARNING
            You are required to review the following options as potentially dangerous
            postgresql.conf line 760: include_if_exists = 'postgresql.conf.d/01_postgres_barman.conf' # include file only if it exists
            Recovery completed (start time: 2022-01-11 07:02:20.425453, elapsed time: 7 seconds)
            Your PostgreSQL server has been successfully prepared for recovery!
    


  • This should work, but there seems to be a bug in barman that breaks it: barman doesn't copy all the WAL files needed for the PITR to the restored server directory. If you specify get-wal, then it does work (for me), or you can find the missing WAL files and copy them manually.

    This is in 2.17. I haven't tried to figure out if the bug is recent or primordial.

    I've looked it over, and the issue goes all the way back to the initial git commit. The problem is that barman treats the "time" of a WAL file as the time it finished receiving that file (on the barman server's clock), and assumes that the last WAL file needed for PITR is the first one whose receipt time is after the --target-time. If the streamer/archiver ever falls behind, that assumption fails: recovery may need files which were generated around the target time but not received until minutes or hours later (or, I suppose, days later if there was some kind of outage).
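
    The selection rule described above can be illustrated with a toy model. This is a minimal sketch, not barman's actual code: the function names, the WAL file names, and all timestamps are made up for illustration, and the only thing it models is "stop at the first file received after the target" versus "stop at the first file whose contents reach past the target".

```python
from datetime import datetime


def t(hhmmss):
    """Build a timestamp on the day used in the question (hypothetical data)."""
    return datetime.fromisoformat(f"2022-01-11 {hhmmss}")


# Hypothetical WAL list: (name, time of the last record in the file,
# time the backup server finished receiving the file).
# The archiver is lagging, so every file arrives minutes after it was written.
WALS = [
    ("000000010000000000000003", t("06:31:00"), t("06:36:00")),
    ("000000010000000000000004", t("06:34:00"), t("06:37:00")),
    ("000000010000000000000005", t("06:36:00"), t("06:38:00")),
]
TARGET = t("06:35:00")


def select_by_receipt_time(wals, target):
    """Rule described in the answer: the first file received after the
    target is assumed to be the last one recovery needs."""
    chosen = []
    for name, _last_record, received in wals:
        chosen.append(name)
        if received > target:
            break
    return chosen


def select_by_content(wals, target):
    """What replay actually needs: every file up to the first one whose
    records extend past the target time."""
    chosen = []
    for name, last_record, _received in wals:
        chosen.append(name)
        if last_record > target:
            break
    return chosen


copied = select_by_receipt_time(WALS, TARGET)
needed = select_by_content(WALS, TARGET)
missing = [name for name in needed if name not in copied]

print("copied :", copied)
print("needed :", needed)
print("missing:", missing)
```

    With this made-up lag, the receipt-time rule stops at the first file, so the two later files, which hold everything written between 06:31 and the target, are never copied. That would match the symptom in the question, where recovery ends at roughly the state of the base backup.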

    As far as I can tell, 'get-wal' needs to go in the conf file as recovery_options = 'get-wal'; it can't be done on the command line. This works because the recovering PostgreSQL just keeps asking for files until it is done, so it doesn't need to know the stopping WAL file ahead of time.
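
    As a sketch, the setting would sit in the server's section of the barman configuration. The section name below is the server name from the question; which file holds that section depends on your installation, so treat the placement as an assumption.

```ini
; Server section for the host from the question (name taken from the
; barman recover command shown there). Other settings are omitted.
[vm-51150-0196]
; Have the recovering PostgreSQL fetch WAL files from barman on demand
; instead of relying on the set copied at recover time.
recovery_options = 'get-wal'
```

    With this in place, the same barman recover command from the question can be rerun unchanged.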



