Issues with differential backup on random days



  • We are having a very weird issue with differential backups on one of our VLDBs (around 10 TB).

    We take one full backup every Monday and differentials on the remaining days. The full runs on Monday because that is the only time a 6-hour window is available for it. Log backups run every 15 minutes. Backups are taken on the primary replica, as the database is part of an AG.

    On some random days the differential backup runs extremely slowly. On average it completes in 45 minutes across the 6 scheduled days, but in the last few months there have been odd days where it reaches only 10% progress after 2-3 hours. This can happen on any day (Tue, Wed, Thu, etc.). We thought the Fri or Sat differential might be larger and therefore slower, but that does not seem to be the case.
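
    To put the slowdown in numbers, here is a rough projection from the figures above (45 minutes on a normal day, about 10% progress after 2 to 3 hours on a bad day). It assumes progress stays linear for the whole run, which may not hold; the function name is only illustrative.

```python
NORMAL_MINUTES = 45  # typical differential duration reported in the post


def projected_minutes(elapsed_hours: float, percent_done: float) -> float:
    """Linearly extrapolate the total runtime from the progress so far."""
    return elapsed_hours * 60 * 100 / percent_done


for hours in (2, 3):
    total = projected_minutes(hours, 10)
    print(f"{hours} h at 10% -> ~{total / 60:.0f} h total, "
          f"~{total / NORMAL_MINUTES:.0f}x the normal run time")
```

    Under that assumption a bad run would need roughly 20 to 30 hours, which is why it never finishes before it has to be aborted.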

    From what I have checked, nothing unusual is running or conflicting during the backup window, and everything looks the same as on any other day.

    The wait type observed is ASYNC_IO_COMPLETION. On slow differential days we see a huge spike in read and write latency, touching 60 ms, and a logical disk queue length of around 3 on the (dedicated) backup drives. We also have multiple backup LUNs, since we back up to split files for faster backups.

    A strange fix we have found is to run a FULL backup on the day of the issue. The next day's differential then returns to a good run time without any major slowness. So far, whenever the issue occurs, we have never seen the differential complete; we just end up aborting the run.
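
    One thing that may be worth ruling out here: a differential contains every extent changed since the last full backup, so a fresh full resets that set and the next differential starts small again. The sketch below is only an illustration of how the amount of changed data could be sized. The page counts are made-up inputs; on versions that expose it, they could come from the modified_extent_page_count and allocated_extent_page_count columns of sys.dm_db_file_space_usage, summed over the data files.

```python
PAGE_BYTES = 8 * 1024  # SQL Server data page size (8 KB)


def diff_estimate(modified_pages: int, allocated_pages: int) -> tuple[float, float]:
    """Return (GB changed since the last full, percent of allocated pages changed)."""
    gb = modified_pages * PAGE_BYTES / 1024**3
    pct = 100 * modified_pages / allocated_pages
    return gb, pct


# Hypothetical example: ~10 TB allocated, 5% of pages changed since the full.
allocated = 10 * 1024**4 // PAGE_BYTES
modified = allocated // 20
gb, pct = diff_estimate(modified, allocated)
print(f"~{gb:.0f} GB to back up ({pct:.1f}% of allocated pages)")
```

    Logging these two numbers before each differential would show whether the slow days line up with an unusually large amount of changed data or with something on the storage side.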

    Please suggest what else I can check to find the root cause.

    Edit: adding additional details as asked in the comments.

    What other processes are running at the same time (integrity checks, index maintenance, etc.)? None of them are running, just the regular app load. No fixed pattern is visible across the last few hung runs.

    Are you using multiple backup files or backing up to a single file? Multiple backup files (8).

    Local storage or SAN disks? SAN.

    Cloud or on-premises? On-premises.

    Are you using backup compression? Yes, it is enabled.

    Are you backing up to an independent drive, or does it share the drive with data or log files? Independent backup drive with multiple mount points. This dedicated backup drive is almost 20 TB.

    Any VM-level backups occurring at the same time that might be freezing IO? None that I have heard of, but I can check.



  • We have seen similar backup slowdowns and disk latency problems when our backups were targeting a 40 TB drive. How big is the drive that your backups are on?

    We have had to set a 20 TB limit on drive sizes to avoid severe latency degradation when backing up large databases (5 TB or more). If you have any smaller drives available for testing, you could compare their backup speeds against the larger drive 🙂

    Otherwise, start looking at potential hardware issues with the LUN. Drives go bad, and it can be hard to prove, but it is worth checking.

    Also check out https://www.sqlservercentral.com/forums/topic/backup-taking-too-long-sql-server-2016, which mentions issues very similar to yours.



