Is there some good way to "diff" a table's data with a fresh copy of it, automatically?



  • I have a table representing all countries on this planet, with their country codes and full titles/common names, etc.

    I have not run my update_country_table function since 2020, so it's due for another run now.

    If I run it as-is, it will fetch the new data, truncate the current table (and reset identity), then INSERT all the new rows. In other words, I will have no idea how many new countries were added, or removed, or possibly had one of their columns edited. Well, I can count the rows before and after, but that will only tell me if it has changed in numbers -- not what changed. (Maybe two countries were removed and two others added, or many of the existing ones had typos fixed for their common names.)

    I could of course make this function much more advanced and have it painstakingly check each existing record instead of truncating it and then inserting all the new data, but this is anything but simple. It would have to check if any of the columns have changed, etc. It would just be an annoying mess, especially since this is a common thing that I want to do in many other places as well.

    Is there some way to do something like this? (This is made-up fantasy code from my dreams.)

    MAKE TEMPORARY COPY OF countries AS countries2;
    TRUNCATE TABLE countries;
    INSERT ...
    ...
    SELECT (COMPARE TABLES countries, countries2);
    

    I doubt this exists, but it sure would be useful for the many times when I have data of this kind which I regularly pull down. I don't like being "blind" to what it actually changes.

    For my source code files, I use the neat WinMerge program to see exactly what has changed since last time. I sure wish this could be easily done for my database tables as well.



  • MAKE TEMPORARY COPY OF countries AS countries2;
    

    You can use a CTAS (create table as select) operation. If you want PostgreSQL to drop it after disconnect, create a temporary table .

    create table countries2 as select * from countries;
    

    I assume you are not asking about the following part.

    TRUNCATE TABLE countries;
    INSERT ...
    ...
    

    For the comparison of 2 tables, you can use the except operator

    -- Rows in countries but not in countries2
    select * from countries
    except 
    select * from countries2;
    

    -- Rows in countries2 but not in countries
    select * from countries2
    except
    select * from countries;

    If your table has unique key (say name), you can use a SQL like the following.

    select c.*, c2.*
    from countries c full outer join 
    countries2 c2
    on c.name=c2.name
    where (c.full_name is distinct from c2.full_name
    or c.population is distinct from c2.population
    ...
    );
    

Log in to reply
 


Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2