How does FCI guarantee data consistency and integrity?
FCI technology uses windows clustering to start-up the sql server service on the secondary node and provides the secondary node with the data and log disks.
Assuming the OS on node 1 crashed, and the FCI failover started-up the node 2, then what is the guarantee of data consistency and integrity? For example - it could be that the OS crashed while working on the sql data and log files thereby causing corruption.
To answer this question, let's look at what happens when a transaction is committed to the database.
When a DML statement is executed against SQL Server, a few events must happen.
- The transaction is written to the transaction log on disk.
- The transaction is next written to the corresponding data page in memory and the data page is then marked as dirty (I.E. it no longer matches what is on disk).
- At the next
LAZY WRITE, dirty pages will be written back to disk. Due to the frequency of when these run, it's actually possible for a single data page to receive several changes before the entire page is written back to disk.
Once the transaction is in the transaction log, it's essentially committed. This is the key point in the process. If the SQL Server were to crash prior to the change making it to the transaction log, SQL Server would have never acknowledged to the client that the transaction was successful.
If the SQL Server were to crash after the transaction log is updated, and prior to the page being updated in memory or on disk, the transaction could still be recovered from the transaction log. Once the SQL Server service starts on the new primary node, the database would go through recovery, and roll the database forward by applying the transaction from the transaction log that were not yet written to the data files.