Minimise duration of table lock
I've got a SQL table that needs to be updated daily. There may or may not be queries against that table while the update is happening. It's around 500,000 rows.
We have an issue where there is a locking conflict when the job that updates the table runs at the same time as a query against it.
So I have rewritten the process to update the table as follows:
ALTER procedure [dbo].[Table_Generate] as
declare @d datetime = getdate(), @c as int
--Check temp tables
IF OBJECT_ID('tempdb..#final') IS NOT NULL
DROP TABLE #final
IF OBJECT_ID('tempdb..#base') IS NOT NULL
DROP TABLE #base
--Get source data from linked server
--(the four-part source name below is a placeholder; substitute your own linked server and table)
select ID, Reference, StartDate, EndDate, Description, SomeCode
into #base
from [LinkedServerName].[SourceDb].[dbo].[Source_Table]

--Hash each row by feeding the whole row, as XML, into HASHBYTES
select main.*,
    hashbytes('SHA2_256', (select sub.*
    from #base sub where sub.ID = main.ID for xml raw)) as row_hash
into #final
from #base main
select @c = count(*) from #final
if @c > 0 begin
merge [The_Table_Staging] as target
using #final as source
on source.ID = target.ID
--New rows (column list inferred from the UPDATE branch below)
when not matched by target then
insert ( RunDate, ID, Reference, StartDate, EndDate, Description, SomeCode, row_hash )
values ( @d, source.ID, source.Reference, source.StartDate, source.EndDate,
         source.Description, source.SomeCode, source.row_hash )
--Existing changed rows
when matched and source.row_hash != target.row_hash then update set
target.RunDate = @d
,target.Reference = source.Reference
,target.StartDate = source.StartDate
,target.EndDate = source.EndDate
,target.Description = source.Description
,target.SomeCode = source.SomeCode
,target.row_hash = source.row_hash
when not matched by source then delete;
--Existing unchanged rows
update [The_Table_Staging] set RunDate = @d where RunDate != @d
--Swap the staging and live tables (near-instant renames)
exec sp_rename 'The_Table_Live', 'The_Table_Staging_T'
exec sp_rename 'The_Table_Staging', 'The_Table_Live'
exec sp_rename 'The_Table_Staging_T', 'The_Table_Staging'
end
The idea is to reduce unnecessary row updates, and also to minimise the locking of the live table. I don't really like doing a table rename, but doing an update/insert takes 5-10 seconds, whereas the table rename is virtually instantaneous.
So my question is: is this approach OK, and/or could I improve it?
Edit to respond to JD:
Hi JD. Please, no need to apologise - I'm here for constructive criticism.
- I didn't know MERGE was problematic. I've never had issues with it myself, but thanks.
- I may rewrite this part into separate INSERT, UPDATE, and DELETE statements.
- I normally agree. The reason I did this is that if I did a TRUNCATE/INSERT from staging at that point, it takes 6-10 seconds, whereas the sp_rename takes under a second. So less time locking the table.
- This doesn't affect the table locking, as the data is pulled into the staging table first. I have no choice but to use a linked server or SSIS, and in this case I prefer the linked server to keep all the SQL in one place.
- I always use a separator in the CONCAT because otherwise 'a', 'bc' would hash the same as 'ab', 'c', which is not correct (see the demo below this list).
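For example (a quick illustrative demo; SHA2_256 is just an example algorithm):

--Without a separator, different column values collide:
select hashbytes('SHA2_256', concat('a', 'bc')) as no_sep_1, --same hash...
       hashbytes('SHA2_256', concat('ab', 'c')) as no_sep_2  --...as this one

--With a '||' separator the two rows hash differently:
select hashbytes('SHA2_256', concat('a', '||', 'bc')) as sep_1,
       hashbytes('SHA2_256', concat('ab', '||', 'c')) as sep_2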
All of the processing up to the populating of the live table from staging is fine - I just want to minimise the amount of time the final live table is locked for.
Unfortunately I see a lot of sins in your code at a quick glance:
There's a multitude of issues with the MERGE statement (https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/), enough that it really should be avoided altogether in production code.
MERGE is also known to be less performant than writing the individual INSERT, UPDATE, and DELETE statements, which may be the reason for some of your blocking issues.
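For example, a rough sketch of the split, reusing the #final and [The_Table_Staging] names from your procedure (untested):

declare @d datetime = getdate() --as in your procedure

--Existing changed rows
update target set
    target.RunDate = @d,
    target.Reference = source.Reference,
    target.StartDate = source.StartDate,
    target.EndDate = source.EndDate,
    target.Description = source.Description,
    target.SomeCode = source.SomeCode,
    target.row_hash = source.row_hash
from [The_Table_Staging] as target
inner join #final as source on source.ID = target.ID
where source.row_hash != target.row_hash

--New rows
insert into [The_Table_Staging] ( RunDate, ID, Reference, StartDate, EndDate, Description, SomeCode, row_hash )
select @d, source.ID, source.Reference, source.StartDate, source.EndDate, source.Description, source.SomeCode, source.row_hash
from #final as source
where not exists (select 1 from [The_Table_Staging] as target where target.ID = source.ID)

--Rows that no longer exist in the source
delete target
from [The_Table_Staging] as target
where not exists (select 1 from #final as source where source.ID = target.ID)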
While tempting, using the sp_rename procedure to minimize blocking issues can actually result in worse blocking issues, as discussed in Kendra Little's post: https://www.littlekendra.com/2017/01/19/why-you-should-switch-in-staging-tables-instead-of-renaming/ (if I recall correctly, it discusses switching in staging tables via partition switching as a better solution).
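The switch-in pattern from that post looks roughly like this (a sketch, assuming both tables have identical schemas and indexes on the same filegroup):

begin tran
    truncate table dbo.The_Table_Live --SWITCH requires an empty target
    alter table dbo.The_Table_Staging switch to dbo.The_Table_Live --metadata-only, near-instant
commit

On SQL Server 2014+ the SWITCH can also be combined with WITH (WAIT_AT_LOW_PRIORITY ...) so it waits politely behind readers instead of escalating the blocking.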
Linked Servers are also known to be a performance pitfall (https://docs.microsoft.com/en-us/troubleshoot/sql/performance/decreased-performance-linkedserver-views) at times too, both for their fixed cardinality estimates and for bringing all the data across the network before processing it. If you were previously relying on one inside the transaction of your code, it's possible you can just keep that part outside of the transaction to minimize the lock time on your local tables.
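In other words, pay the network cost before any locks are taken; something like (the four-part name is a placeholder):

--Cross-server read happens first, with no locks held on the local live table
select ID, Reference, StartDate, EndDate, Description, SomeCode
into #base
from [LinkedServerName].[SourceDb].[dbo].[Source_Table]

--...then only fast, purely local statements (the MERGE replacement, the swap, etc.)
--run while locks on the live objects are held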
Also, I do like the use of the HASHBYTES() function to generate a row hash; that's something I've used in the past. But you may find it more performant to call it on the row itself, using CONCAT() across every column with a safe column separator (e.g. double pipe ||) as the parameter, as opposed to wrapping it around a subquery that leverages XML.
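A sketch with your column names (note that CONCAT() maps NULL to an empty string, so wrap columns in ISNULL() with a marker if you need to tell NULL apart from ''):

select main.*,
    hashbytes('SHA2_256', concat(
        main.ID, '||',
        main.Reference, '||',
        convert(varchar(33), main.StartDate, 126), '||', --explicit, deterministic date formatting
        convert(varchar(33), main.EndDate, 126), '||',
        main.Description, '||',
        main.SomeCode)) as row_hash
into #final
from #base main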
That being said, as noted by Erik Darling, HASHBYTES() itself can become a performance bottleneck (https://dba.stackexchange.com/q/228789/32281). If it is your bottleneck here, you can try materializing the result of the function in a computed column or indexed view (it is deterministic), or use an alternative method, such as CLR, as mentioned in the linked post.
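For instance, a sketch of the computed-column option (row_hash_c is a hypothetical column name; HASHBYTES over CONCAT with deterministic conversions can be PERSISTED):

alter table dbo.The_Table_Staging add row_hash_c as
    hashbytes('SHA2_256', concat(
        ID, '||',
        Reference, '||',
        convert(varchar(33), StartDate, 126), '||', --style 126 is a deterministic conversion
        convert(varchar(33), EndDate, 126), '||',
        Description, '||',
        SomeCode)) persisted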