Bitmap Creation in Execution Plan Causes bad Estimate on Clustered Index Scan
Given the following simple query on the StackOverflow2010 database:
SELECT u.DisplayName, u.Reputation FROM Users u JOIN Posts p ON u.id = p.OwnerUserId WHERE u.DisplayName = 'alex' AND p.CreationDate >= '2010-01-01' AND p.CreationDate
I was trying to understand why creating an index
CREATE INDEX IX_CreationDate ON Posts ( CreationDate ) INCLUDE (OwnerUserId)
yields a better estimate on
When I run the query with no indexes, I get https://www.brentozar.com/pastetheplan/?id=rkuLejx0t . In this plan, SQL Server estimates 298,910 rows coming out of the CI scan on Posts and actually 552 come back - this estimate is way off.
Once I add the index, I get https://www.brentozar.com/pastetheplan/?id=SynvZjeCK , which causes an index seek and much more accurate estimate.
I was curious as to why adding an index would cause a better estimate, since statistics are created when a column is used in a
WHEREpredicate, regardless of whether it is indexed or not.
On further inspection, I can see the predicate on
Posts.CreationDateis different on plan 1 vs plan 2:
Plan 1 predicate
[StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]>='2010-01-01 00:00:00.000' AND [StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]
Plan 2 predicate
Seek Keys: Start: [StackOverflow2010].[dbo].[Posts].CreationDate >= Scalar Operator('2010-01-01 00:00:00.000'), End: [StackOverflow2010].[dbo].[Posts].CreationDate
So i can see Plan 2 is just going to use the histogram to find the number of rows between the two dates but Plan 1 has a slightly more complicated predicate involving a bitmap probe.
That (I think) explains why the estimate on the seek is more accurate but I am now wondering what is the bitmap probe? I can see in the plan that there is a bitmap created of the user Ids that match the Alex predicate and that is what is being probed.
I wondered "without the index, why wouldn't Plan 1 be the same as Plan 2, the only difference being a CI scan instead of an index seek on CreationDate?"
I did some further testing and found that if I run the query without the index but force the plan to go serial, using
OPTION (MAXDOP 1)I get https://www.brentozar.com/pastetheplan/?id=H1OkLigCF which has a better estimate on CreationDate despite now doing a CI Scan on Posts. If I look at the predicate, I can see that the probe is now gone and the bitmap is no longer in the plan so this leads me to believe the bitmap is something to do with the plan going parallel.
So my question is - why is a bitmap created when the plan goes parallel and why does it cause such a bad estimate on
A number of factors in play:
The index comes with full scan statistics. The auto-created ones were sampled.
Different cardinality estimation models and execution modes handle the calculation differently. You might be happier with the estimate using the original CE model in this case:
USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION')
Bitmaps only appear in row mode parallel plans. I wrote about the details in https://www.sql.kiwi/2011/07/bitmap-magic.html
Bitmaps are possible in both serial and parallel batch mode plans. You have your database set to compatibility mode 130 so https://docs.microsoft.com/en-us/sql/relational-databases/performance/intelligent-query-processing#batch-mode-on-rowstore is not available to you. Side note: you might want to apply the latest CU for 2019—you're still on RTM.
The estimate formula varies, but usually has its roots in estimating a semi join between filtered rows on the build side of the hash join and the target table using histograms. Sometimes it is a guess. Sometimes the bitmap is not considered at all:
In parallel row mode plans, there are two types of bitmap. The original type of bitmap was added heuristically after query optimization has completed. Since it is not present during optimization, it has no effect on cardinality estimates. These bitmaps are named
Bitmapxxxx. Yours is one of these:
It is more difficult to see because the effect of the bitmap is mixed in with the CreationDate predicate. We can separate them out with undocumented trace flag 9130:
SELECT U.DisplayName, U.Reputation FROM dbo.Users AS U JOIN dbo.Posts AS P ON U.id = P.OwnerUserId WHERE U.DisplayName = N'alex' AND P.CreationDate >= CONVERT(datetime, '2010-01-01', 121) AND P.CreationDate
The bitmap is still applied in-row at the scan, but the predicate on CreationDate is in a later Filter operator:
The estimate on the scan is the full cardinality of the base table, despite the bitmap still being applied there:
If you're interested in seeing a plan without the bitmap to compare estimates, you can enable undocumented trace flag 9498.
The second type of row mode bitmap is a so-called https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/bb510541(v=sql.105) . These are evaluated as part of cost-based optimization, so they do have an effect on cardinality estimates and final plan shape. These bitmaps are named
I wrote about the details for batch mode in https://sqlperformance.com/2019/08/sql-performance/batch-mode-bitmaps-in-sql-server .