Minimise duration of table lock



  • I've got a SQL table that needs to be updated daily. There may or may not be queries against that table while the update is happening. It's around 500,000 rows.

    We have an issue where there are locking conflicts when the job that updates the table runs at the same time as queries against it.

    So I have rewritten the process to update the table as follows:

    ALTER procedure [dbo].[Table_Generate] as

    declare @d datetime = getdate(), @c as int

    --Check temp tables
    IF OBJECT_ID('tempdb..#final') IS NOT NULL
    DROP TABLE #final

    IF OBJECT_ID('tempdb..#base') IS NOT NULL
    DROP TABLE #base

    --Get source data from linked server
    select
    ID,
    Reference,
    StartDate,
    EndDate,
    Description,
    SomeCode
    into #base

    from [LinkedServer].[Database].dbo.[A_View]

    --Generate row_hash (the FOR XML subquery serialises the whole row to a single string for hashbytes)
    select
    ID,
    Reference,
    StartDate,
    EndDate,
    Description,
    SomeCode,
    hashbytes('SHA2_256',(
    select
    ID,
    Reference,
    StartDate,
    EndDate,
    Description,
    SomeCode
    from #base sub where sub.ID = main.ID for xml raw)) as row_hash
    into #final
    from #base main

    select @c = count(*) from #final
    if @c > 0 begin

    merge [The_Table_Staging] as target
    using #final as source
    on source.ID = target.ID

    --New rows
    when not matched by target then
    insert ( RunDate,
    ID,
    Reference,
    StartDate,
    EndDate,
    Description,
    SomeCode,
    Row_Hash
    ) values (
    @d,
    source.ID,
    source.Reference,
    source.StartDate,
    source.EndDate,
    source.Description,
    source.SomeCode,
    source.row_hash)

    --Existing changed rows
    when matched and source.row_hash != target.row_hash then update set
    target.RunDate = @d
    ,target.Reference = source.Reference
    ,target.StartDate = source.StartDate
    ,target.EndDate = source.EndDate
    ,target.Description = source.Description
    ,target.SomeCode = source.SomeCode
    ,target.row_hash = source.row_hash

    --Deleted rows
    when not matched by source then delete;

    --Existing unchanged rows
    update [The_Table_Staging] set RunDate = @d where RunDate != @d

    --Swap the staging and live tables inside a transaction
    begin transaction

    exec sp_rename 'The_Table_Live', 'The_Table_Staging_T'
    exec sp_rename 'The_Table_Staging', 'The_Table_Live'
    exec sp_rename 'The_Table_Staging_T', 'The_Table_Staging'

    commit transaction
    end

    The idea is to reduce unnecessary row updates, and also to minimise the locking of the live table. I don't really like doing a table rename, but doing an update/insert takes 5-10 seconds, whereas the table rename is virtually instantaneous.

    So my question is: is this approach OK, and/or could I improve it?

    Thanks!

    Edit to respond to JD:

    Hi JD. Please, no need to apologise - I'm here for constructive criticism.

    1. I didn't know MERGE was problematic. I've never had issues with it myself, but thanks.
    2. I may rewrite this part into separate INSERT/UPDATE/DELETE statements.
    3. I normally agree. The reason I did it this way is that a TRUNCATE/INSERT from staging at that point takes 6-10 seconds, whereas the sp_rename takes under a second - so the table is locked for less time.
    4. This doesn't affect the table locking, as the data goes into the staging table first. I have no choice but to use a linked server or SSIS, and in this case I prefer the linked server to keep all the SQL in one place.
    5. I always use XML instead of CONCAT because otherwise 'a', 'bc' would hash the same as 'ab', 'c', which is not correct (see the snippet after this list).
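
    To illustrate point 5, a quick sketch (my own example, not from the production code) - the first two hashes collide because the concatenated strings are both 'abc', while a separator keeps them distinct:

    select
        hashbytes('SHA2_256', concat('a','bc'))      as no_sep_1,  --hash of 'abc'
        hashbytes('SHA2_256', concat('ab','c'))      as no_sep_2,  --hash of 'abc' - identical
        hashbytes('SHA2_256', concat('a','||','bc')) as sep_1,     --hash of 'a||bc'
        hashbytes('SHA2_256', concat('ab','||','c')) as sep_2      --hash of 'ab||c' - different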

    All of the processing up to the populating of the live table from staging is fine - I just want to minimise the amount of time the final live table is locked for.



  • Unfortunately I see a lot of sins in your code at a quick glance:

    1. There's a multitude of known issues with the MERGE statement (https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/), to the point that it really should be avoided altogether in production code.

    2. MERGE is also known to be less performant than writing the individual INSERT, UPDATE, and DELETE statements, which may account for some of your blocking issues. (A sketch of the rewrite follows this list.)

    3. While tempting, using sp_rename to minimize blocking issues can actually result in worse blocking issues, as discussed in Kendra Little's blog post: https://www.littlekendra.com/2017/01/19/why-you-should-switch-in-staging-tables-instead-of-renaming/ (if I recall correctly, it recommends switching in staging tables with ALTER TABLE ... SWITCH as a better solution - see the sketch after this list).

    4. Linked Servers are also known to be a performance problem at times too (https://docs.microsoft.com/en-us/troubleshoot/sql/performance/decreased-performance-linkedserver-views), because of fixed cardinality estimates and because all the data comes across the network before being processed. If you were previously relying on one inside the transaction in your code, it's possible you can just keep that part outside of the transaction to minimize the lock time on your local tables.

    5. Also, I do like the use of the HASHBYTES() function to generate a row hash; that's something I've used in the past. But you may find it more performant to call it on the row itself, passing it a CONCAT() across every column with a safe column separator (e.g. double pipe ||), instead of wrapping it around a subquery that leverages XML. (See the sketch after this list.)

    6. That being said, as noted by Erik Darling, HASHBYTES() itself can become a CPU bottleneck at scale: https://dba.stackexchange.com/q/228789/32281 . If it is your bottleneck here, you can try materializing the result of the function in a computed column (it is deterministic) or an indexed view, or use an alternative method, such as CLR, as mentioned in the linked post. (A computed-column sketch is at the end below.)
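
    For (2), a minimal, untested sketch of the MERGE rewritten as separate statements, using the table and column names from your question and slotting into the same place in your procedure (so @d and #final already exist):

    --Existing changed rows
    update t set
        t.RunDate     = @d,
        t.Reference   = s.Reference,
        t.StartDate   = s.StartDate,
        t.EndDate     = s.EndDate,
        t.Description = s.Description,
        t.SomeCode    = s.SomeCode,
        t.row_hash    = s.row_hash
    from [The_Table_Staging] t
    join #final s on s.ID = t.ID
    where s.row_hash != t.row_hash

    --New rows
    insert [The_Table_Staging]
        (RunDate, ID, Reference, StartDate, EndDate, Description, SomeCode, Row_Hash)
    select @d, s.ID, s.Reference, s.StartDate, s.EndDate, s.Description, s.SomeCode, s.row_hash
    from #final s
    where not exists (select 1 from [The_Table_Staging] t where t.ID = s.ID)

    --Deleted rows
    delete t
    from [The_Table_Staging] t
    where not exists (select 1 from #final s where s.ID = t.ID)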

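    For (3), a rough sketch of the switch-in pattern from Kendra's post. It assumes [The_Table_Live], [The_Table_Staging], and an empty [The_Table_Holding] (a hypothetical third table, not in your code) all have identical schemas and indexes on the same filegroup:

    begin transaction

    --Move the current live data out into the empty holding table.
    --WAIT_AT_LOW_PRIORITY (SQL Server 2014+) keeps the Sch-M lock request from
    --blocking the queries queued behind it; ABORT_AFTER_WAIT = BLOCKERS kills
    --the blockers after the wait, so pick the option you can live with.
    alter table [The_Table_Live]
        switch to [The_Table_Holding]
        with (wait_at_low_priority (max_duration = 1 minutes, abort_after_wait = blockers))

    --Move the freshly built staging data into the now-empty live table.
    alter table [The_Table_Staging] switch to [The_Table_Live]

    commit transaction

    --Outside the transaction: empty the holding table for next time.
    truncate table [The_Table_Holding]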

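    For (5), a sketch of the CONCAT() form of your #final query. The '||' separator and the explicit date styles are my choices, not from your code - explicit CONVERT styles keep the string (and therefore the hash) stable regardless of session date settings:

    --Generate row_hash without the XML subquery
    select
        ID,
        Reference,
        StartDate,
        EndDate,
        Description,
        SomeCode,
        hashbytes('SHA2_256', concat(
            ID, '||',
            Reference, '||',
            convert(varchar(30), StartDate, 126), '||',   --ISO 8601, deterministic
            convert(varchar(30), EndDate, 126), '||',
            Description, '||',
            SomeCode)) as row_hash
    into #final
    from #base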

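    And for (6), if you do go down the materialization route, a hypothetical persisted computed column (HASHBYTES() with deterministic inputs can be persisted; the explicit date styles matter for that determinism):

    alter table [The_Table_Staging]
        add row_hash_computed as hashbytes('SHA2_256', concat(
                ID, '||',
                Reference, '||',
                convert(varchar(30), StartDate, 126), '||',
                convert(varchar(30), EndDate, 126), '||',
                Description, '||',
                SomeCode)) persisted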
