Taking a weekly snapshot of table and appending it to a separate "history" table. Is this good practice?



  • I have set up a small project in work.

    Let's just say it's a list of customers and their "status" which is Gold/Silver/Bronze. Any time their status changes, a new row is created.

    However, these changes would be quite rare. So John Doe may have a timestamp in January, when he was created, and another in December, when he changed his phone number.

    Therefore I would like to take a copy of the data each week, and append it to a new table, let's call it Customer Weekly Trends. This will obviously grow with each week but will allow me to create trends with the data. For example, I can bring the data into Power BI and use this history table to get the number of "gold / bronze / silver" customers each week/month through the years. Perhaps even use the data for forecasting.

    It would be a stored procedure, running once a week and copying the data over.

    Is this good practice or is there a better way to do it?

    Using MSSQL version 14.1.1000.169



  • Your proposed solution is exactly how a data warehouse works. There's a source; at regular intervals data is copied out; the data is stored elsewhere marked with a date/time. That your history table is in the same database as the source doesn't matter.

    One problem is that multiple changes can happen to a row between extracts. These will be missed. It could be a customer goes from bronze to gold to silver to bronze again in the interval between extracts. Unlikely, yes, but does anything in the application prevent this happening? To the history table it looks like nothing happened but that customer actually had a rather eventful week. Your process effectively filters out high frequency data.

    As long as you are OK with omitting such data, and not being able to identify outliers, that's fine. Thousands of implementations do just that.

    You can, of course, decrease the time between extracts. The shorter the time the fewer changes there will be in the data. Taken to extreme you will be continuously reading the source and almost never finding any changes. This is quite wasteful of compute resource. At this point a different technology is required. SQL Server supports a few.

    Change tracking adds a new system table to the database. When a row changes in the application table a row is also written to this system table effectively saying "primary key X changed."

    Change Data Capture adds a system table which says "primary key X used to have values (a,b,c) but now has (a,b,q)."

    Temporal Tables adds a history table that shadows the application table. Before a change is written the system automatically copies the existing values to the history table, adding datetime values. You can query over the application and history tables as though they are a single table using the https://docs.microsoft.com/en-us/sql/relational-databases/tables/querying-data-in-a-system-versioned-temporal-table#query-for-a-specific-time-using-the-as-of-sub-clause syntax.

    In each case the additional tables are just tables. They can be read just like any other table. Each solution has different performance and overhead characteristics. Test carefully before you decide. For my two cents I'd suggest temporal tables are a good place to start.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2