Recreating history from snapshoted data



  • i have a case where i need to compare snapshoted data(dbo.src) with destination table(dbo.dest) and find out which of the countries are new and which ones are not valid anymore.

    DATA:

    create table dbo.src
    (
     MessageID int not null
    ,ProductID int not null
    ,Country varchar(2) null
    ,ChangeDate date not null
    )
    

    insert into dbo.src
    (MessageID,ProductID, Country, ChangeDate)
    values
    (1,1,'AT','2021-10-05')
    ,(2,1,'AT','2021-10-10')
    ,(2,1,'CH','2021-10-10')
    ,(3,1,'CH','2021-10-20')

    create table dbo.dest
    (
    ProductID int not null
    ,Country varchar(2) null
    ,ChangeDate date not null
    ,Active varchar(1) not null
    )

    insert into dbo.dest
    (ProductID, Country, ChangeDate,Active)
    values
    (1,'DE','2021-10-02','Y')

    In dbo.dest table I already have one record with country DE and i have to compare it with the records in dbo.src in an exact way in which data arrived in source table(This can be tracked via column MessageID). Multiple changes arrived during some time period for the same product but different country setup.

    The desired output would look like this: Desired outcome

    Here is how to interpret desired results:

    1. In the 1st run we compared row from dbo.src with messageId=1 with current record in table dbo.dest. There is no more “DE” so we insert new record in dbo.dest with flag Active=N and put the date 2021-10-05 which would be some form of date when it became active. We also insert new record AT which would mean that new country arrived and became active.
    2. In the 2nd run, we would compare rows from dbo.src with MessageID=2 and the last active record with latest ChangeDate and do the same comparison like in previous step. The result would be that we insert only CH with date 2021-10-10 and do nothing for AT record since its still coming from dbo.src table.
    3. In the 3rd run we do the same comparisons like in previous 2 steps. The result would be inserting only new record for AT with active=N and date 2021-10-20 since this record no longer exists in dbo.src table where MessageID=3

    How can this be done without cursors or loops?



  • Below is my attempt.

    See comments in code for details:

    ;WITH CombinedList AS(
        -- Select ALL currentry Active records
        SELECT 0 AS MessageID, ProductID, Country, ChangeDate, Active FROM dest
        UNION ALL
        SELECT MessageID, ProductID, Country, ChangeDate, 'Y' AS Active FROM src ),
    GroupedList AS(
        SELECT ProductID, Country, MessageID, ChangeDate,
            -- Calculates contiguous data group numbers
            MessageID - ROW_NUMBER() OVER ( PARTITION BY ProductID, Country ORDER BY MessageID ) AS rn
        FROM CombinedList
    ),
    Islands AS(
        SELECT ProductID, Country, ActiveStart = MIN( MessageID ), ActiveEnd = MAX( MessageID )
        FROM GroupedList AS a
        GROUP BY ProductID, Country, rn
    ),
    StartStopRecords AS(
        -- Create Start Records
        SELECT I.ProductID, I.Country, AStart.ChangeDate, 'Y' AS Active
        FROM Islands AS I
            INNER JOIN GroupedList AS AStart ON I.ProductID = AStart.ProductID AND I.Country = AStart.Country AND I.ActiveStart = AStart.MessageID
        UNION ALL
        -- Create Stop Records
        SELECT I.ProductID, I.Country, AEnd.ChangeDate, 'N' AS Active
        FROM Islands AS I
            -- Create a Stop record only if subsequent Message has arrived
            CROSS APPLY( SELECT TOP 1 ChangeDate FROM GroupedList WHERE I.ActiveEnd < MessageID ORDER BY MessageID ) AS AEnd
    )
    -- Final Result
    SELECT *
    FROM StartStopRecords
    ORDER BY ProductID, ChangeDate, Country
    

    Some of the code was adopted from https://www.red-gate.com/simple-talk/databases/sql-server/t-sql-programming-sql-server/gaps-islands-sql-server-data/ by Dwain Camps.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2