Columnstore index predicate is not used in index scan



  • We are using a table with a clustered columnstore index (~50M records) and are seeing a large performance degradation when running on https://en.wikipedia.org/wiki/Google_Cloud_Platform Cloud SQL compared to running locally with the same database schema and data (we simply exported and imported a .bak file).

    Using the query below, we found that the execution plan on GCP Cloud SQL ( https://www.brentozar.com/pastetheplan/?id=ByexjqpCF ) does not apply the projectId predicate in the columnstore index scan; it is applied only in a separate Filter step afterwards. When running locally ( https://www.brentozar.com/pastetheplan/?id=rJLO59pRK ) the predicate is pushed into the index scan, resulting in far fewer rows to scan and much better performance.

    What could be the reason for such a difference?

    SELECT YEAR(ReferenceDate) AS RefDateYear, MONTH(ReferenceDate) AS RefDateMonth, SUM(diffSum) AS DiffSum
    INTO #res
    FROM Journal jp
    WHERE jp.projectId = '582b02e2-add0-4b50-94f7-4e7e07497cf6'
      AND ReferenceDate < '20220101'
      AND batch != 9998
    GROUP BY YEAR(ReferenceDate), MONTH(ReferenceDate)
    

    See also the table schema:

    SET ANSI_NULLS ON
    GO
    SET QUOTED_IDENTIFIER ON
    GO
    CREATE TABLE [dbo].[Journal](
        [id] [int] IDENTITY(1,1) NOT NULL,
        [projectId] [uniqueidentifier] NOT NULL,
        [diffSum] [money] NOT NULL,
        [batch] [int] NOT NULL,
        [referenceDate] [datetime2](7) NOT NULL,
        ...
    ) ON [PRIMARY]
    GO
    ALTER TABLE [dbo].[Journal] ADD  CONSTRAINT [Journal_pkey] PRIMARY KEY NONCLUSTERED 
    (
        [id] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
    GO
    CREATE CLUSTERED COLUMNSTORE INDEX [CCI_journal] ON [dbo].[Journal] WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
    GO
    


  • SQL Server Standard Edition has documented limitations ( https://docs.microsoft.com/en-us/sql/sql-server/editions-and-components-of-sql-server-version-15?view=sql-server-ver15#RDBMSSP ) related to columnstore performance:

    1 In-Memory OLTP data size and Columnstore segment cache are limited to the amount of memory specified by edition in the https://docs.microsoft.com/en-us/sql/sql-server/editions-and-components-of-sql-server-version-15?view=sql-server-ver15#Cross-BoxScaleLimits section. The degree of parallelism (DOP) for https://docs.microsoft.com/en-us/sql/relational-databases/query-processing-architecture-guide?view=sql-server-ver15#batch-mode-execution operations is limited to 2 for SQL Server Standard Edition and 1 for SQL Server Web and Express Editions. This refers to columnstore indexes created over disk-based tables and memory-optimized tables.

    2 Aggregate Pushdown, String Predicate Pushdown, and SIMD Optimizations are SQL Server Enterprise Edition scalability enhancements.
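
    Since these limits are edition-specific, a quick first check is to confirm which edition and version each of the two instances is actually running. A minimal sketch using the standard SERVERPROPERTY function (interpret EngineEdition against the linked documentation; 2 = Standard, 3 = Enterprise/Developer):

    -- Report edition and version details for the current instance.
    SELECT SERVERPROPERTY('Edition')        AS Edition,
           SERVERPROPERTY('EngineEdition')  AS EngineEdition,
           SERVERPROPERTY('ProductVersion') AS ProductVersion,
           SERVERPROPERTY('ProductLevel')   AS ProductLevel;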

    From https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-what-s-new?view=sql-server-ver15#performance-for-database-compatibility-level-120-or-130 :

    • String Predicate pushdown speeds up queries that compare strings of type VARCHAR/CHAR or NVARCHAR/NCHAR. This applies to the common comparison operators and includes operators such as LIKE that use bitmap filters. This works with all supported collations. On SQL Server, this enhancement is reserved for Enterprise edition.

    Although your predicate compares a uniqueidentifier (UUID) column, it counts as a string predicate here. Without string predicate pushdown, the predicate is evaluated in a separate Filter operator, without any special optimizations.
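
    As an experiment (nothing the documentation promises will change the plan), you could take the string literal out of the predicate entirely by passing the value as an explicitly typed uniqueidentifier variable, then compare the actual plans on both instances. A sketch against the Journal table from the question:

    -- Hypothetical experiment: use a typed variable instead of a string literal,
    -- then check whether the predicate is still applied in a separate Filter.
    DECLARE @projectId uniqueidentifier = '582b02e2-add0-4b50-94f7-4e7e07497cf6';

    SELECT YEAR(ReferenceDate) AS RefDateYear,
           MONTH(ReferenceDate) AS RefDateMonth,
           SUM(diffSum) AS DiffSum
    FROM dbo.Journal jp
    WHERE jp.projectId = @projectId
      AND jp.ReferenceDate < '20220101'
      AND jp.batch != 9998
    GROUP BY YEAR(ReferenceDate), MONTH(ReferenceDate);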

    There is a more comprehensive description of string predicate pushdown in the documentation at https://docs.microsoft.com/en-us/sql/relational-databases/indexes/columnstore-indexes-query-performance?view=sql-server-ver15#string-predicate-pushdown .

    String predicate pushdown leverages the primary/secondary dictionary created for column(s) to improve query performance. For example, consider a string column segment within a rowgroup consisting of 100 distinct string values. This means each distinct string value is referenced 10,000 times on average, assuming 1 million rows.

    With string predicate pushdown, the query execution computes the predicate against the values in the dictionary, and if a dictionary value qualifies, all rows referring to that value are automatically qualified. This improves the performance in two ways:

    1. Only the qualified rows are returned, reducing the number of rows that need to flow out of the SCAN node.

    2. The number of string comparisons is significantly reduced. In this example, only 100 string comparisons are required instead of 1 million. There are some limitations, as described below:

      • No string predicate pushdown for delta rowgroups. There is no dictionary for columns in delta rowgroups.
      • No string predicate pushdown if the dictionary exceeds 64 KB entries.
      • Expressions evaluating NULLs are not supported.
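
    If you want to rule out the last two limitations for this table, the rowgroup and dictionary metadata can be inspected directly. A rough sketch, assuming the table name from the question; sys.dm_db_column_store_row_group_physical_stats and sys.column_store_dictionaries are standard views, but note that column_id in the dictionary view is the columnstore's internal column ID:

    -- Delta (OPEN) rowgroups have no dictionaries, so no string predicate pushdown there.
    SELECT rg.row_group_id, rg.state_desc, rg.total_rows, rg.deleted_rows
    FROM sys.dm_db_column_store_row_group_physical_stats AS rg
    WHERE rg.object_id = OBJECT_ID('dbo.Journal');

    -- Dictionary entry counts and sizes per column.
    SELECT p.partition_number, d.column_id, d.dictionary_id, d.type,
           d.entry_count, d.on_disk_size
    FROM sys.column_store_dictionaries AS d
    JOIN sys.partitions AS p
      ON d.partition_id = p.partition_id
    WHERE p.object_id = OBJECT_ID('dbo.Journal');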

    See https://www.erikdarlingdata.com/sql-server/how-useful-is-column-store-in-standard-edition/ by Erik Darling for other examples.



