How to resolve INSERT contention in Azure SQL Database?



  • I'm working with an Azure SQL Database that has a very hot logging table. Any time a row in any other table is inserted, updated, or deleted, the web application writes a row to this logging table. The rows are very small, and the table has only the PK and one nonclustered index, but the inserts can stack up and sometimes even time out, forcing the application to fall back on its exponential back-off retry logic.

    The architect decided to implement partitioning on this logging table to resolve the insert speed problem. Every table has ProjectId (UNIQUEIDENTIFIER) as part of the PK. He added a BIGINT IDENTITY ProjectNo column to the Project table, and the partitioning function looks up the ProjectNo for the ProjectId and then uses ProjectNo % 200 to pick which partition all rows for that project use.

    CREATE TABLE [dbo].[ChangeLog](
        [IncrementId]           [bigint] IDENTITY(1,1)  NOT NULL,
        [ProjectId]             [uniqueidentifier]      NOT NULL,
        [ProjectNoPartitionKey] [tinyint]               NOT NULL,
        [CreatedBy]             [int]                   NOT NULL,
        [CreatedOn]             [datetime2](0)          NOT NULL,
        [EntityId]              [uniqueidentifier]      NULL, -- but no null values
        [EntityName]            [varchar](50)           NULL, -- but no null values
        [RevisionId]            [uniqueidentifier]      NULL,
        [RootId]                [uniqueidentifier]      NULL,
        [Status]                [tinyint]               NOT NULL,
        [OperationType]         [tinyint]               NOT NULL,
     CONSTRAINT [PK_ChangeLog] PRIMARY KEY CLUSTERED
    (
        [IncrementId] ASC,
        [ProjectNoPartitionKey] ASC,
        [ProjectId] ASC
    )
    );
    

    CREATE NONCLUSTERED INDEX [IX_ChangeLog_ProjectNoPartitionKey_ProjectId_EntityName_IncrementId_INCLUDE_All] ON [dbo].[ChangeLog]
    (
        [ProjectNoPartitionKey] ASC,
        [ProjectId] ASC,
        [EntityName] ASC,
        [IncrementId] ASC
    )
    INCLUDE ([EntityId],[RootId],[RevisionId],[Status],[OperationType],[CreatedBy],[CreatedOn])
    WITH (STATISTICS_NORECOMPUTE = OFF, DROP_EXISTING = OFF, ONLINE = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF);
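
    For reference, a partition function and scheme matching the 200-way ProjectNo % 200 split described above might look like the following. This is only a sketch: the object names (pfChangeLog, psChangeLog) and the single-filegroup mapping are assumptions, since the actual partitioning objects aren't shown.

    -- Hypothetical sketch of the 200-way split; names are assumptions.
    CREATE PARTITION FUNCTION pfChangeLog (tinyint)
        AS RANGE LEFT FOR VALUES (0, 1, 2, /* ... */ 198); -- 199 boundaries => 200 partitions

    CREATE PARTITION SCHEME psChangeLog
        AS PARTITION pfChangeLog ALL TO ([PRIMARY]);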

    Unfortunately, there is a huge amount of skew in the number of rows per project: the largest 4 of the 200 partitions hold 25% of all the rows in the logging table. So while partitioning reduced the timeout frequency, timeouts still occur often enough to be a concern. The table is written to around 175x as often as it is read from.

    It doesn't seem likely that increasing the number of partitions will help much. I've been told that making the logging table in-memory would be the best solution, but none of us have used that feature. Suggestions?



  • I would definitely suggest modifying your PK to use OPTIMIZE_FOR_SEQUENTIAL_KEY (as suggested by https://dba.stackexchange.com/users/32281/erik-darling in https://dba.stackexchange.com/questions/305074/how-to-resolve-insert-contention-in-azure-sql-database?noredirect=1#comment595567_305074).

    Enabling OPTIMIZE_FOR_SEQUENTIAL_KEY (https://www.erikdarlingdata.com/optimize_for_sequential_key/enabling-optimize-for-sequential-key/) is pretty straightforward, which makes it a relatively low-effort, low-impact starting point.

    ALTER INDEX PK_ChangeLog ON dbo.ChangeLog
        SET (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);
    

    This feature was built specifically to address the scenario you're describing, so before re-clustering the table and fighting through those higher-effort, higher-risk changes, trying the purpose-built setting first is the right move. There's no guarantee it will be a silver bullet, but it should help.
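
    If you do enable it, you can confirm the setting took effect via sys.indexes, which exposes an optimize_for_sequential_key column:

    -- Returns 1 in optimize_for_sequential_key once the setting is on.
    SELECT  i.name,
            i.optimize_for_sequential_key
    FROM    sys.indexes AS i
    WHERE   i.object_id = OBJECT_ID(N'dbo.ChangeLog');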

    Pam Lahoud, from the SQL Server Product Group, wrote a great blog post on the feature: https://techcommunity.microsoft.com/t5/sql-server-blog/behind-the-scenes-on-optimize-for-sequential-key/ba-p/806888. From Pam's article:

    With last page insert contention, as the number of insert threads increases, the queue for the page latch increases which in turn increases latency. Throughput will also decrease, but if something slows down one of the threads that is holding the latch, this can trigger a convoy and throughput suddenly falls off a cliff.

    ...

    When using this option, you may not see a decrease in PAGELATCH waits, in fact you may even see an increase in waits with a new wait type called BTREE_INSERT_FLOW_CONTROL. Despite these waits, you should be able to achieve much better throughput, and scale to a much higher number of concurrent threads, without hitting the proverbial cliff.

    If you are still seeing limited throughput after enabling the feature, you may then need to change your indexing strategy to eliminate the "last page contention" that occurs when concurrent inserts all clamor to write to the end of the table.
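
    To verify whether last-page latch contention is actually your bottleneck (and whether the new flow-control wait appears after the change), you can check the database-scoped wait statistics; on Azure SQL Database that's sys.dm_db_wait_stats:

    -- PAGELATCH% waits indicate page latch contention; BTREE_INSERT_FLOW_CONTROL
    -- is the wait type introduced by OPTIMIZE_FOR_SEQUENTIAL_KEY.
    SELECT  wait_type,
            waiting_tasks_count,
            wait_time_ms
    FROM    sys.dm_db_wait_stats
    WHERE   wait_type LIKE N'PAGELATCH%'
       OR   wait_type = N'BTREE_INSERT_FLOW_CONTROL'
    ORDER BY wait_time_ms DESC;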



