Detail on MS SQL CTEs



  • I am trying to find out how MS CTEs actually access data. Trying to find out if CTEs will actually run slower when pulling data and what is the order that a CTE will pull data. Does a CTE pull the data when a view is called or does it happen in the last select statement when creating the result set? How often does the CTE have to make a database call for additional data?



  • Main Difference

    The main difference between https://www.erikdarlingdata.com/category/common-table-expression/ and temporary objects is that CTEs do not materialize a result set, and every time you reference them the inner query needs to be re-executed. I'm not bringing https://www.erikdarlingdata.com/category/table-variables/ into the picture because it introduces additional performance complexities.

    Common Table Expression Problems

    I'm using this demo as a portable example. There are no performance issues here because it's a one row table, but it's sufficient to get the point across.

    CREATE TABLE
        #t
    (
        id int NOT NULL PRIMARY KEY
    );
    

    INSERT
    #t
    (
    id
    )
    SELECT
    id = 1;

    We have a one-row table, and now let's hit it with a CTE:

    WITH 
        t AS
    (
        SELECT
            t.id
        FROM #t AS t
    )
    SELECT
        t.*
    FROM t
    JOIN t AS t2
        ON t2.id = t.id
    JOIN t AS t3
        ON t3.id = t.id;
    

    The resulting query plan looks like this:

    NUTS

    For each join to the Common Table Expression, we end up hitting the base table. If you have CTEs that are referenced multiple times, you're usually better off using a #temp table to materialize the result. In Common Table Expressions with more complex queries in them, this can be especially painful.

    Row Goals

    While row goals do not materialize Common Table Expression results, they do (at least up to the current version of SQL Server) offer an https://dba.stackexchange.com/q/221555/32281 around the query inside of them.

    As an example of how this can work, take a look at these two queries:

    SELECT
        c = COUNT_BIG(*)
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
        ON p.OwnerUserId = u.AccountId
    JOIN dbo.Comments AS c
        ON c.PostId = p.ParentId
    JOIN dbo.Votes AS v
        ON v.PostId = c.PostId
    WHERE c.Score > 1000;
    

    SELECT
    c = COUNT_BIG()
    FROM dbo.Users AS u
    JOIN dbo.Posts AS p
    ON p.OwnerUserId = u.AccountId
    JOIN
    (
    SELECT TOP (2147483647)
    c.

    FROM dbo.Comments AS c
    JOIN dbo.Votes AS v
    ON v.PostId = c.PostId
    WHERE c.Score > 1000
    ) AS c ON c.PostId = p.ParentId;

    Here are the query plans for them:

    First query: NUTS

    Second query: NUTS

    The join order is altered because SQL Server has to honor the https://sqlperformance.com/2018/02/sql-plan/setting-and-identifying-row-goals . There's no real performance difference here, but you get the idea. If there's a particular thing you want to happen together, this can be one way to achieve that without a temporary object, but sans the materialization.

    Managing Complexity

    For an example of tuning a query that misused Common Table expressions by using #temp tables instead, check out this post of mine:

    • https://www.erikdarlingdata.com/sql-server/software-vendor-mistakes-with-sql-server-misusing-common-table-expressions/

    There's a video over there that I can't embed here that walks through identifying things like poor cardinality estimates coming out of Common Table Expressions, and using each one as a stopping point to materialize results.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2