Unpredictable Slowness and Table Spool(Lazy Read)



  • I was testing on Stackoverflow database to find out possible cases wherein SQL Server doesn't recommend index in the execution plan however if we introduce one, it would help in great way!

    Did it quite easily for Group by, Order by Clause and for aggregate function(count function - smallest copy of table). I wrote a random query wherein I knew that introducing supportive index will certainly help however missing index recommendation will be only on join condition and not on the order by clause.

    Query is as below:

    select top 100 Location from Users U join Badges B
    on B.UserId = U.Id
    order by Location desc
    

    Below indexes were introduced to improve performance:

    create index Location on Users(Location)
    go
    create index UsersId on Badges(UserId)
    go
    

    Indexes used by optimizer as expected for above query:

    Query Execution Plan with Index

    Logical reads and time stats are as below:

    Query Stats with Index

    Now, I wanted to test the performance with only index on Users table at Location column and no Index on Badges (UserId) Table, here performance becomes terrible(takes almost 7 minutes):

    Index only on Users Table

    Logical reads and Time stats are as below:

    Logical reads and Time Stats

    Index at Users tables are very much used, as evident from execution plan and logical reads however doing clustered index scan and Table Spool (Lazy Spool) causes most of the issue.

    All above tests are conducted on SQL Server 2019 in SQL Server 2016 compatibility mode(130).

    If someone could please advise on underlying issue, would be of great help.

    One more thing to note here, when there is no non-clustered supportive index on either of these two table, same query finishes in 9 seconds. Below is the execution plan:

    Execution Plan with no NC index

    Logical Read and Time Stats:

    Logical Reads and Time Stats with no NC Index

    For testing purpose, I changed compatibility level to 2019(150) and to my surprise - same previous query which had index only on Users (Location) table and not on Badges table, finished in 2 seconds which was taking 7 minutes in SQL Server 2016 compatibility(130) mode:

    SQL Server 2019 Execution Plan

    Logical Stats and Time Stats:

    Logical Stats and Time Stats

    In 2019 compatibility mode, all the operators before Parallelism are in batch mode.

    Any input in this regard would help me in understanding this behavior.



  • The optimizer chooses the plan with the lazy spool because of the https://sqlperformance.com/2018/02/sql-plan/setting-and-identifying-row-goals set by using the TOP clause.

    select top 100 Location from Users U join Badges B
    on B.UserId = U.Id
    order by Location desc
    

    You are asking for the first 100 rows—not the full set—so the optimizer tries to find a plan that returns 100 rows quickly rather than a plan that would return all matches in the quickest time.

    The optimizer's reasoning is based on statistics and modelling assumptions, which can go wrong, as it does here.

    The lazy table spool is a https://sqlperformance.com/2019/09/sql-performance/nested-loops-joins-performance-spools .

    Both the row goal and performance spool factor into the estimated cost for the plan shown being calculated as cheapest.

    You can disable the performance spool with the query hint:

    NO_PERFORMANCE_SPOOL
    

    You can disable the row goal logic with the query hint:

    USE HINT ('DISABLE_OPTIMIZER_ROWGOAL')
    

    Either will give you a plan more like the hash and top n sort you showed last, but without batch mode when compatibility is set to 130.


    By the way, the performance of the lazy spool plan is very dependent on your hardware, especially memory and tempdb performance. A typical execution on my laptop ran in 6.5 seconds:

    Nested loop and spool plan

    With no nonclustered indexes, it runs for 6.9 seconds:

    No nonclustered indexes plan

    Executed on Microsoft SQL Server 2019 CU15. Max memory set to 4GB, 12 logical processors available.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2