ALTER TABLE ADD COLUMN in batches but with NEWSEQUENTIALID()



  • I have a large, but not huge, table (fewer than 2 million rows on aging hardware). When adding a non-nullable column to an existing table, I usually follow the structure set out in https://dba.stackexchange.com/a/188410/45757 to avoid the script timing out when our database migrations run on deployment (FYI: this isn't a full-text index problem).

    So, in summary, I:

    • Alter the table and add the column as NULL and do not add a default constraint
    • Backfill the column in batches
    • Alter the table and change the column to be NOT NULL and add the default constraint

    However, in the following case I want to add a new UNIQUEIDENTIFIER column and fill it with NEWSEQUENTIALID() rather than NEWID() values.

    Without running in batches, my script would look like this:

    IF NOT EXISTS (
        SELECT NULL FROM INFORMATION_SCHEMA.COLUMNS 
        WHERE TABLE_NAME = 'Invoice' 
        AND COLUMN_NAME = 'InternalId')
    BEGIN
        ALTER TABLE Invoice 
            ADD InternalId UNIQUEIDENTIFIER NOT NULL 
            CONSTRAINT [DF_Invoice_InternalId] DEFAULT (NEWSEQUENTIALID())
    END
    GO
    

    However, if I split this up into batches and attempt to fill the nullable InternalId with the following:

    IF NOT EXISTS (
        SELECT NULL FROM INFORMATION_SCHEMA.COLUMNS 
        WHERE TABLE_NAME = 'Invoice' 
        AND COLUMN_NAME = 'InternalId')
    BEGIN
        ALTER TABLE Invoice 
            ADD InternalId UNIQUEIDENTIFIER NULL
    END
    

    DECLARE @MaxId INT, @LoopStart INT, @LoopEnd INT, @LoopSize INT = 50000
    SELECT @MaxId = MAX(InvoiceId) FROM Invoice
    SELECT @LoopStart = MIN(InvoiceId) FROM Invoice
    SET @LoopEnd = @LoopStart + @LoopSize

    PRINT 'Updating InternalIds to a new GUID'
    WHILE @LoopStart <= @MaxId
    BEGIN
        -- update internal id
        UPDATE I
        SET InternalId = NEWSEQUENTIALID()
        FROM Invoice I
        WHERE I.InvoiceId BETWEEN @LoopStart AND @LoopEnd

        SET @LoopStart = @LoopEnd + 1
        SET @LoopEnd = @LoopEnd + @LoopSize
    END

    IF EXISTS (
        SELECT NULL FROM INFORMATION_SCHEMA.COLUMNS
        WHERE TABLE_NAME = 'Invoice'
        AND COLUMN_NAME = 'InternalId'
        AND IS_NULLABLE = 'YES')
    BEGIN
        ALTER TABLE Invoice
            ALTER COLUMN InternalId UNIQUEIDENTIFIER NOT NULL
    END

    IF NOT EXISTS (SELECT NULL FROM sys.objects WHERE name = 'DF_Invoice_InternalId')
    BEGIN
        ALTER TABLE Invoice
            ADD CONSTRAINT [DF_Invoice_InternalId]
            DEFAULT ((NEWSEQUENTIALID())) FOR [InternalId]
    END

    I get the following error:

    Msg 302, Level 16, State 0, Line 40 The newsequentialid() built-in function can only be used in a DEFAULT expression for a column of type 'uniqueidentifier' in a CREATE TABLE or ALTER TABLE statement. It cannot be combined with other operators to form a complex scalar expression.

    Any tips on how to work around this? Or am I over-thinking this?

    The reason for doing this change is to expose the Sequential IDs (InternalId or could be called PublicId) externally in an API, as a replacement for the current sequential numeric Ids (InvoiceId). The numeric Id (the primary key) should have been kept internal, as it exposes a sequential and guessable internal value. The Sequential GUID is still sequential, but also not so easily guessable. To illustrate the point, I'm doing something like this, but @First is being provided via an API call. It is used for polling and processing new invoices using a watermarking process.

    CREATE TABLE #Test (
        Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
        DateCreated DATETIME NOT NULL DEFAULT(GETDATE()),
        Code NVARCHAR(50) NOT NULL
    )
    

    DECLARE @Id INT
    DECLARE @NO_OF_CHARS INT = 10
    SET @Id = 1

    WHILE @Id <= 12000
    BEGIN
        INSERT INTO #Test (Code) VALUES (SUBSTRING (REPLACE(CONVERT(VARCHAR(40), NEWID()), '-',''), 1, @NO_OF_CHARS))
        SET @Id = @Id + 1
    END

    ALTER TABLE #Test
    ADD InternalId UNIQUEIDENTIFIER NOT NULL DEFAULT(NEWSEQUENTIALID())

    DECLARE @First UNIQUEIDENTIFIER
    SELECT * FROM #Test
    SELECT @First = InternalId FROM #Test WHERE Id = 1
    SELECT * FROM #Test WHERE InternalId > @First

    DROP TABLE #Test



  • You could add the column as nullable together with the default constraint, and then update the column using the DEFAULT keyword. Conceptually, something like this:

    ALTER TABLE Invoice 
        ADD InternalId UNIQUEIDENTIFIER NULL 
        CONSTRAINT [DF_Invoice_InternalId] DEFAULT (NEWSEQUENTIALID())
    

    UPDATE I
    SET InternalId = DEFAULT
    FROM Invoice I

    ALTER TABLE Invoice
    ALTER COLUMN InternalId UNIQUEIDENTIFIER NOT NULL
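
    To keep the batching from the question, the UPDATE in the middle could be run in ranges of InvoiceId instead of in one pass. The following is only a sketch under some assumptions: the nullable InternalId column and the DF_Invoice_InternalId constraint already exist (created in an earlier batch, so the column name resolves when this script compiles), InvoiceId is an INT key as in the question, and the batch size of 50000 is simply carried over from the question's script. It makes no promise about the order in which values are assigned to rows within a single batch.

    ```sql
    -- Sketch: backfill InternalId in InvoiceId ranges using the column's DEFAULT.
    -- Assumes the column and its NEWSEQUENTIALID() default constraint already exist.
    DECLARE @MaxId INT, @LoopStart INT, @LoopSize INT = 50000;

    SELECT @LoopStart = MIN(InvoiceId), @MaxId = MAX(InvoiceId)
    FROM Invoice;

    -- If the table is empty, @LoopStart is NULL and the loop body never runs.
    WHILE @LoopStart <= @MaxId
    BEGIN
        -- SET ... = DEFAULT evaluates the default constraint for each row,
        -- which avoids calling NEWSEQUENTIALID() directly (the cause of Msg 302).
        UPDATE I
        SET InternalId = DEFAULT
        FROM Invoice I
        WHERE I.InvoiceId >= @LoopStart
          AND I.InvoiceId < @LoopStart + @LoopSize
          AND I.InternalId IS NULL;   -- skip rows already filled, so a re-run is harmless

        SET @LoopStart = @LoopStart + @LoopSize;
    END
    ```

    Once the loop has finished, the ALTER COLUMN ... NOT NULL step applies unchanged.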

    But based on your motivation, I would advise against using NEWSEQUENTIALID(). It IS guessable: anyone who obtains one of the generated IDs can easily guess the preceding and following values. Take a look at this Stack Overflow answer: https://stackoverflow.com/a/24158535/2603086

    In most cases, the next newsequentialid can be predicted by taking the current value and adding one to the first hex pair.

    In other words:

    1E29E599-45F1-E311-80CA-00155D008B1C

    is followed by

    1F29E599-45F1-E311-80CA-00155D008B1C

    is followed by

    2029E599-45F1-E311-80CA-00155D008B1C
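
    If you want to check this on your own server, a small throwaway script along these lines should be enough. It is only a sketch: the temp table name #SeqDemo is made up for the illustration, and the exact values you get will differ from the ones quoted above.

    ```sql
    -- Sketch: generate a few sequential GUIDs and compare neighbouring values.
    -- #SeqDemo is a hypothetical temp table used only for this illustration.
    CREATE TABLE #SeqDemo (
        Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
        InternalId UNIQUEIDENTIFIER NOT NULL DEFAULT (NEWSEQUENTIALID())
    );

    -- Each insert takes both the identity value and the GUID from the column defaults.
    INSERT INTO #SeqDemo DEFAULT VALUES;
    INSERT INTO #SeqDemo DEFAULT VALUES;
    INSERT INTO #SeqDemo DEFAULT VALUES;

    -- Look at how little changes between consecutive rows.
    SELECT Id, InternalId
    FROM #SeqDemo
    ORDER BY Id;

    DROP TABLE #SeqDemo;
    ```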



