SQL Update statistics with Fullscan and normal update without fullscan



  • What is the difference between SQL Server Update statistics with Fullscan and without Fullscan.

    Is the Fullscan will improve the performance better that without fullscan option. How can we get the see the difference on the both. Is there any query to find, to process fullscan or without full scan. Need your guidance for this.

    Thanks...



  • If your estimates are really wrong, increasing statistics samplerate may be a way of solving the problem. But it also requires longer time to scan your data - it's a trade-off as so many other things in SQL Server.

    I find that looking at an example is sometimes beneficial

    CREATE TABLE numbers (number int);
    

    WITH empty_rows AS (
    SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
    SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
    SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
    SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
    SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
    SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL
    SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n UNION ALL SELECT 1 AS n
    )
    INSERT INTO numbers
    SELECT row_number() over (order by a.n) as number
    FROM empty_rows as a, empty_rows as b, empty_rows as c, empty_rows as d,
    empty_rows as e
    ;

    CREATE NONCLUSTERED INDEX numbers_ix ON numbers (number);
    UPDATE STATISTICS numbers WITH SAMPLE 1 PERCENT;

    The above produces ~ 17 million rows, stores in table numbers, builds a nonclustered index and updates the statistics on said index to only sample 1% of the rows.

    Now, if you ask SQL Server to build a plan for this query:

    SELECT * FROM numbers WHERE number = 540400
    

    How many rows will SQL Server expect to return? You and I know it's exactly one (1), but SQL Server doesn't because it only sampled ~230.000 rows, and doesn't know that between say 540400 and 540500 there are 99 other values. For all it knows, 540400 may be repeated 99 times.

    If you check out the statistics

    dbcc show_statistics (numbers, numbers_ix)
    

    you'll se this

    sample statistics

    You'll notice that 540400 falls into the bucket with HIGH_KEY = 540526, which has an AVG_RANGE_ROWS of 4.172074 - so SQL Server will assume that the query returns 4.17 rows.

    Now let's try with fullscan

    UPDATE STATISTICS numbers WITH FULLSCAN;
    DBCC SHOW_STATISTICS (numbers, numbers_ix);
    

    The result is now

    Statistics after fullscan

    Which tells sqlserver that one bucket is enough, and every possible value between 1 and 17210367 occurs exactly once, and therefore the expected number of rows to be returned will be 1.

    One or four rows won't matter, but if your dataset contains fewer distinct values and maybe spans a larger interval than 1 - 17.2 million, then SQL Server will make guesses that are wrong by factors of 1-1.000. Those wrong estimates will confuse the optimizer, and it may end up choosing an inferior plan.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2