SQL Server: Is it worth checking for change in a field before the update?

  • I'm investigating this statement performance on SQL Server 2019 (I've extracted the important part):

            A1 = src.A1,
            A2 = src.A2,
            A3 = src.A3,
            A4 = src.A4,
            A5 = src.A5
          MyTable m 
          JOIN SourceTable src ON m.Id = src.Id
            A1 <> src.A1
        OR  A2 <> src.A2
        OR  A3 <> src.A3
        OR  A4 <> src.A4
        OR  A5 <> src.A5
    • There's no triggers
    • Row count is fairly small (around 200k-300k)
    • Table is mostly read from, but frequently
    • The A1/5 aren't under index, they're mostly nvarchar(50) or similar
    • No replication setup

    Is it worth it to do the WHERE part to prevent the unnecessary updates? If not - which use-case would make it worth it?

  • Is it worth it to do the WHERE part to prevent the unnecessary updates?

    Yes. Every update to a row even if nothing has actually changed causes a write to the log and potentially to the data file⁰. This extra IO load may directly affect performance. It may also require more by way of locks which could further harm concurrency, including your reads not just other writes.

    As well as the extra IO caused by the write itself there may be other knock-on effects: if you are log shipping for backups then you'll use more space and bandwidth there. Differential backups may also be larger than they'd otherwise need to be. And of course if you are using replication, mirroring, and other such, these NoOp writes will take effect there too.

    Also, if you are using temporal tables you will find an extra row in the history, so you'll consume more space too that way, possibly permanently.

    There's no triggers / No replication setup

    If you are not using any of the above now, you might choose to later. Or someone after you may.

    The A1/5 aren't under index, they're mostly nvarchar(50) or similar

    Whether these were indexed or not shouldn't affect the performance of the decision to write or not.

    Row count is fairly small (around 200k-300k)

    Always be aware that your data may grow significantly over time.

    If not - which use-case would make it worth it?

    As I'd say it is definitely worth it most of the time, I'll flip that question to “which use case would make it not worth it?”:

    For small updates where you can afford not to care about any of the above, your code is far simpler if you don't include the filtering clause. If the resulting filtering clause is long and complex then this will remove a potential source of odd bugs. Your WHERE clause example as it is might need to be more complex/ugly if the affected columns are NULLable, so may already be buggy, because A1 <> src.A1 may make it skip an update if either m.A1 or src.A1 are NULL but the other isn't².

    If you are using heavy checks, perhaps a sub-query that you can't somehow factor out of the SET and WHERE clauses¹ then the extra work there might dwarf the extra IO of unnecessary updates.

    [0] as pointed out by Charlieqace in a comment, the experiments documented in https://www.sql.kiwi/2010/08/the-impact-of-non-updating-updates.html show that the data pages are not modified in a number of cases, so my previous wording was inaccurate. Given other concerns (the article doesn't test if locking is affected, triggers and temporal table activities do still fire, there is some extra log activity, …) I would still recommend checking before updating everywhere this does not result in convoluted code.

    [1] CTEs are usually an answer to this, though not always, and the query planner might be clever enough not to run the identical sub-query twice anyway but I'd not want to reply upon that.

    [2] because not only is NULL not equal to NULL, it is also not not-equal to NULL.

Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2