Checksum for SQL-server tables



  • I'm in the process of installing applications and databases.

    In order to check whether a file is correct, I can simply take the checksum.
    Is there a way to check if a table is correct? (I realise I can drop the entire table into a file and take the checksum of that file, but maybe there's a simpler way.)



  • There are multiple ways and tools out there to do a database comparison. My preference is the tool https://www.sqlaccessories.com/sql-examiner/ . It is simple, powerful, relatively cheap, and can you tell you both schema differences and data differences. It's flexible in the advanced options by allowing you to ignore different things during comparison such as collation differences, defaults, check constraints, etc. And it can compare databases on the same server or across different servers.

    Additionally, here's a https://www.mssqltips.com/sqlservertip/1069/sql-server-comparison-tools/ for database comparison, from MSSQLTips.

    If you mostly only care about data differences, don't have a lot of tables to compare, and don't want to spend the money on a tool, you can probably roll your own solution using something like the https://docs.microsoft.com/en-us/sql/t-sql/functions/hashbytes-transact-sql?view=sql-server-ver15 or https://docs.microsoft.com/en-us/sql/t-sql/functions/checksum-transact-sql?view=sql-server-ver15 functions. I prefer HASHBYTES() for its https://sql-bits.com/sql-server-checksum-vs-checksum_binary-vs-hashbytes/ .

    Leveraging one of the aforementioned functions, the idea would be to create a single hash per row (of the concatenation of every column in the table), and do a FULL JOIN on the hashes between the two instances of the table, filtering where either side's key is null. This would leave you with just the rows that are different.

    How you want to implement the hashing function is your call. If it's a one time comparison, you can do it on the fly (though the join can be slow). If it's a recurring thing, you can add it as a column to the table itself or create a view on top of the table with it as a column, if you're able to alter the schema. This would allow you to create an index on the result of the function (HASHBYTES() is deterministic, I'm not so sure about CHECKSUM()), for better performance during comparison.

    The limitation to this method is that both tables need to be accessible to be joined on, either by having both databases on the same server or by leveraging something like a https://docs.microsoft.com/en-us/sql/relational-databases/linked-servers/linked-servers-database-engine?view=sql-server-ver15 to access the other database's server, which potentially can be slow for a lot of data.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2