Why significantly worse Column Store performance on Read-only AG secondary DB?
We have a pair of SQL 2019 Enterprise instances hosted in identically spec'd and configured VMs in Azure.
They for a simple primary/secondary availability group array, containing a small number of databases across a small number of AGs.
The single main production DB utilises a large (multi-billion row) column store table populated by production code, and used for read-only analytical reporting queries - these queries also join to row store tables for additional information.
We want to offload the read-only queries onto the read-only AG secondary to spread the load on the db, so we've to the listener with the
ApplicationIntent=ReadOnlyparameter - this works well and queries are issued against the secondary.
However, the duration of the read-only queries is regularly 10x slower than against the primary DB.
Comparing the identical execution plans, I can see that all of the additional time is taken reading the column store table. Repeatedly re-running the query to ensure caching does not make any difference on the secondary. But on the primary I am seeing cache-related performance improvements, returning results instantly on re-running. Also, temporarily suspending the AG data sync didn't have any effect.
Comparing the IO stats, they are both similar. Comparing memory used in the Column Store object pool and Row Store buffer pools, again both similar.
Really at a loss here - I would have expected that the secondary, being under much less load, but with identical resources would be performing if anything better rather than consistently and significantly worse.
Has anyone come across this scenario and have any suggestions how I can improve or at least justify this situation?
My blog post below details an issue which can cause a considerable increase in total CPU and total elapsed time for a query against a CCI with deleted rows when in Snapshot Isolation or RCSI, compared to read committed. This happens even when the CCI query against a table with deleted rows runs in isolation without any competing transactions.
Because an Availability Group secondary leverages RCSI under the hood, queries against CCIs there will see the same behavior as Snapshot Isolation or RCSI on the primary.
Getting rid of deleted rows in the CCI will bring performance of a RCSI or Snapshot Isolation query in isolation back in line with read committed.
When using index rebuild or re-organize to remove deleted rows from a CCI, remember that reorg uses 10% of rows deleted as default minimal threshold to remove deleted rows from rowgroups.