ReadOnly queries on secondary server do not return correct values



  • I have two servers A and B. Both are in an Always On Group. The B server is configured for ReadOnly requests. This works fine so far, but recently there is a problem for queries for a service.

    The service first makes an update to a table (probably on the A server) and then a select with Read Only on the B server. The developer says these are two separate transactions.

    Unfortunately, the select does not return the values with which an update was previously made. Only after about 1 to 1.5 seconds do the correct values appear. We have never been able to observe this behaviour before.

    The server is set to synchronous commit. All databases are synchronous and have no data loss. The routing for read only has been tested and works. Readable Secondary is set to "Yes".

    I've only been working as a DBA for MSSQL databases for about 1.5 years and I don't quite understand how the two nodes A and B communicate in an AG, but I assumed that both nodes would receive the data at the same time when using "synchronous commit", right?

    What could be the reason for this 1 to 1.5 second time difference?

    Servers are SQL 2019 Enterprise with CU12 - Running on some powerful physical machines with Windows Server 2016 Standard.


  • QA Engineer

    The service first makes an update to a table (probably on the A server) and then a select with Read Only on the B server. Unfortunately, the select does not return the values with which an update was previously made. Only after about 1 to 1.5 seconds do the correct values appear.

    That's expected and how the product was designed. Commit harden (sync-commit) does not include redo, it only encompasses the harden on the secondary. Harden and Redo are two distinctly different processes for an AG in SQL Server.

    I don't quite understand how the two nodes A and B communicate in an AG, but I assumed that both nodes would receive the data at the same time when using "synchronous commit", right?

    The primary is responsible for capturing the "changes" which are packed up into log records, these are in turn packed into log blocks. The log blocks are (over simplifying here) closed and flushed on a commit, which is then used to copy and send to the other replicas independently of each other. Depending on various factors, this may or may not be a speedy process as it must traverse through other layers and threads.

    Eventually the log block will make it on the side of the secondary, be hardened, and depending on partner type will either immediately send a progress message or wait. When the primary receives the progress message for the specific replica that sent it, various internal values are updated, and other various checks occur. That happens for each replica, regardless of partner type (sync/async).

    Thus, data is not received at the same time on all replicas, nor is there any guarantee it completed redo before the progress message is processed on the primary.

    SQL Server is returning valid and correct data as per the start of the read transaction based on snapshot isolation, just not the data that is wanted/expected.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2