Why is a table scan for small data so slow



  • I've got a table with 200k rows which lives on an SSD. I'm doing a lookup using a comparison on varchar field where ~20% of entries are null, otherwise 10 numeric characters with random distribution.

    I expect a search to take a bit of time without an index, but not 13s like it does now. The execution plan accounts all 13s to "Clustered Index Scan". What could be going wrong here? This amount of data can be read and grepped in less than a second, so I'm surprised with SQL Server doing >10x the work.

    The profiler says: CPU 359, reads 11143.

    Different slow run (3m 21s) plan: https://www.brentozar.com/pastetheplan/?id=rkNF4LF6K

    Edit: After compressing the table at Page level I'm down to sub-second results. After turning off compression I'm still at sub-second. After a few hours, I'm back to very long delays, this time >1min)



  • So as I mentioned in the comments, you appear to be having a https://docs.microsoft.com/en-us/sql/relational-databases/performance/cardinality-estimation-sql-server?view=sql-server-ver15 issue which is resulting in the unnecessary memory grant to occur (as Martin pointed out). You should always use your execution plans as a comparison even if they both are doing the same operations, because the runtime statistics may be different between the slow and fast run, which appears to be the case here.

    You can observe the cardinality estimate issue you're experiencing in this example by looking at the clustered index scan operation on the right side of your execution plan. It is showing Actual Number of Rows is 5, but the plan's Estimated Number of Rows is over 7,000. This is a significant overestimate that results in the query requesting and waiting for much more memory than it needs. (Sometimes you'll the see the opposite occur, where the cardinality estimate is an underestimate, and under requests memory and / or generates a less efficient execution plan as a result.)

    Your issue likely results from https://www.sqlshack.com/how-to-use-sargable-expressions-in-t-sql-queries-performance-advantages-and-examples/ and hard to optimize predicates in your WHERE clause, most likely these predicates WHERE (REPLACE(MEDICARE_NO, ' ', '') LIKE '111111111%') AND (STATUS_CODE Not in ('D', 'X','I','Z')). Using functions in predicates can result in cardinality estimate issues like you're currently seeing. So replacing () the REPLACE() function with a different solution would probably help fix your issue. You can either store the data already staged with spaces replaced or add a https://docs.microsoft.com/en-us/sql/relational-databases/tables/specify-computed-columns-in-a-table?view=sql-server-ver15 to your table that applies the REPLACE() function. You can even make it a persisted computed column and index it, since the REPLACE() function is deterministic.

    Your other predicate AND STATUS_CODE Not in ('D', 'X','I','Z')) could also be better rewritten to maximize performance. NOT IN is an inequality operator and makes it difficult for the engine to optimize for. You're better off using IN and listing the values that are valid, which would be an equality operation then. Additionally the IN clause is just syntactical sugar for multiple OR clauses. Sometimes that also throws off the optimizer and you can try replacing them with using a UNION clause instead, for each value from your IN clause. Of course that would require you to repeat your query multiple times, so hurts maintainability and readability a little bit, but most times you can refactor the core part of your code to a CTE first then using a UNION against that CTE for each value in the IN clause.

    But I'd recommend focussing on eliminating your use of the REPLACE() function in your WHERE clause first. My instincts would say that's your bigger issue here. You can quickly prove it out too by testing with selecting your data into a temp table first, with REPLACE(MEDICARE_NO, ' ', '') AS SomeColumn as a column you select into the temp table. Then use that temp table in your main query instead, so your main query's WHERE clause now becomes WHERE SomeColumn LIKE '111111111%'.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2