UDF and built-in function behavior



  • Take the following queries:

    DECLARE @X VARCHAR(200) = '1,2,3,4'
    

    SELECT
    *,
    dbo.aUserDefinedScalarFunction(4) AS ScalarValue
    FROM
    MyTable T
    INNER JOIN dbo.aUserDefineTableFunction(@X) A ON T.SomeID=A.SomeID
    WHERE
    (T.ID1 IS NULL OR T.ID1 IN (SELECT [value] FROM STRING_SPLIT(@X,',')))
    AND
    (T.ID2 IS NULL OR T.ID2 IN (SELECT Value FROM dbo.MySplitterFunction(@X))

    I normally create indexed #tempTables for the WHERE conditions above, I have found that to perform better on large datasets. However, I still can't find definitive answers to the following questions:

    1. Will the query analyzer optimize aUserDefinedScalarFunction(4) As ScalarValue or does it get evaluated for each record?

    2. Will the INNER JOIN dbo.aUserDefineTableFunction(@X) get materialized into a temp table once or will it be executed for each record? The function returns table (not a table variable).

    3. Does the result of SELECT [value] FROM STRING_SPLIT(@X,',') get optimized or is it evaluated for each comparison?

    4. Does the result of SELECT Value FROM dbo.MySplitterFunction(@X) get optimized or is it evaluated during each comparison?



    1. This depends on the function. For a deterministic function specified with schemabinding, the optimizer will usually be able to evaluate its result just once, caching the answer.

      For details, see:

      • https://dba.stackexchange.com/q/140381/1192
      • https://dba.stackexchange.com/q/84460/1192
    2. Inline table-valued functions are expanded into the calling query text before optimization, so there is no special opportunity for early evaluation and caching. That does not mean the execution plan won't feature any cache-and-reuse optimizations. Where applicable, the optimizer will consider those (e.g. a spool) just as it would for a query without an inline function.

    3. Generally, the optimizer will be able to tell that SPLIT_STRING on a constant input returns deterministic results. The plan is likely to evaluate the split just once, but it does depend on plan shape.

    4. Assuming the splitter function is a multi-statement table-valued function, the result is very likely to be evaluated once. For details, and how you can tell if caching occurs from the execution plan, see:

      • https://dba.stackexchange.com/q/159333/1192

    All that said, you are right to be cautious about all this. The optimizer will frequently be able to evaluate expressions once and reuse the result, but this is not guaranteed, and often requires expert analysis to determine.

    Forcing materialization using temporary tables is frequently the most robust solution since it guarantees a particular order and number of evaluations.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2