Bitmap Creation in Execution Plan Causes bad Estimate on Clustered Index Scan



  • Given the following simple query on the StackOverflow2010 database:

    SELECT  u.DisplayName,
            u.Reputation
    FROM    Users u
            JOIN Posts p
                ON u.id = p.OwnerUserId
    WHERE   u.DisplayName = 'alex' AND
            p.CreationDate >= '2010-01-01' AND
            p.CreationDate 

    I was trying to understand why creating an index

    CREATE INDEX IX_CreationDate ON Posts
    (
        CreationDate
    )
    INCLUDE (OwnerUserId)
    

    yields a better estimate on Posts.CreationDate

    When I run the query with no indexes, I get https://www.brentozar.com/pastetheplan/?id=rkuLejx0t . In this plan, SQL Server estimates 298,910 rows coming out of the CI scan on Posts and actually 552 come back - this estimate is way off.

    Once I add the index, I get https://www.brentozar.com/pastetheplan/?id=SynvZjeCK , which causes an index seek and much more accurate estimate.

    I was curious as to why adding an index would cause a better estimate, since statistics are created when a column is used in a WHERE predicate, regardless of whether it is indexed or not.

    On further inspection, I can see the predicate on Posts.CreationDate is different on plan 1 vs plan 2:

    Plan 1 predicate

    [StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]>='2010-01-01 00:00:00.000' AND [StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]

    Plan 2 predicate

    Seek Keys[1]: Start: [StackOverflow2010].[dbo].[Posts].CreationDate >= Scalar Operator('2010-01-01 00:00:00.000'), End: [StackOverflow2010].[dbo].[Posts].CreationDate 

    So i can see Plan 2 is just going to use the histogram to find the number of rows between the two dates but Plan 1 has a slightly more complicated predicate involving a bitmap probe.

    That (I think) explains why the estimate on the seek is more accurate but I am now wondering what is the bitmap probe? I can see in the plan that there is a bitmap created of the user Ids that match the Alex predicate and that is what is being probed.

    I wondered "without the index, why wouldn't Plan 1 be the same as Plan 2, the only difference being a CI scan instead of an index seek on CreationDate?"

    I did some further testing and found that if I run the query without the index but force the plan to go serial, using OPTION (MAXDOP 1) I get https://www.brentozar.com/pastetheplan/?id=H1OkLigCF which has a better estimate on CreationDate despite now doing a CI Scan on Posts. If I look at the predicate, I can see that the probe is now gone and the bitmap is no longer in the plan so this leads me to believe the bitmap is something to do with the plan going parallel.

    So my question is - why is a bitmap created when the plan goes parallel and why does it cause such a bad estimate on Posts.CreationDate?



  • A number of factors in play:

    The index comes with full scan statistics. The auto-created ones were sampled.

    Different cardinality estimation models and execution modes handle the calculation differently. You might be happier with the estimate using the original CE model in this case:

    USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')
    

    Bitmaps only appear in row mode parallel plans. I wrote about the details in https://www.sql.kiwi/2011/07/bitmap-magic.html

    Bitmaps are possible in both serial and parallel batch mode plans. You have your database set to compatibility mode 130 so https://docs.microsoft.com/en-us/sql/relational-databases/performance/intelligent-query-processing#batch-mode-on-rowstore is not available to you. Side note: you might want to apply the latest CU for 2019—you're still on RTM.

    The estimate formula varies, but usually has its roots in estimating a semi join between filtered rows on the build side of the hash join and the target table using histograms. Sometimes it is a guess. Sometimes the bitmap is not considered at all:

    In parallel row mode plans, there are two types of bitmap. The original type of bitmap was added heuristically after query optimization has completed. Since it is not present during optimization, it has no effect on cardinality estimates. These bitmaps are named Bitmapxxxx. Yours is one of these:

    Bitmap

    It is more difficult to see because the effect of the bitmap is mixed in with the CreationDate predicate. We can separate them out with undocumented trace flag 9130:

    SELECT
        U.DisplayName,
        U.Reputation
    FROM dbo.Users AS U
    JOIN dbo.Posts AS P
        ON U.id = P.OwnerUserId
    WHERE
        U.DisplayName = N'alex' 
        AND P.CreationDate >= CONVERT(datetime, '2010-01-01', 121)
        AND P.CreationDate 

    The bitmap is still applied in-row at the scan, but the predicate on CreationDate is in a later Filter operator:

    Scan estimate with TF 9130

    Filter estimate with TF 9130

    The estimate on the scan is the full cardinality of the base table, despite the bitmap still being applied there:

    Bitmap applied at scan

    If you're interested in seeing a plan without the bitmap to compare estimates, you can enable undocumented trace flag 9498.

    The second type of row mode bitmap is a so-called https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/bb510541(v=sql.105) . These are evaluated as part of cost-based optimization, so they do have an effect on cardinality estimates and final plan shape. These bitmaps are named Opt_Bitmapxxx.

    I wrote about the details for batch mode in https://sqlperformance.com/2019/08/sql-performance/batch-mode-bitmaps-in-sql-server .




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2