Why are records from my data-modifying CTE not appearing in the subsequent SELECT from the insertion table?



  • I've crafted what seems to be a rather clever query for only inserting records that don't exist, and then returning the ids for all inserted data rows, with each row's PK id, whether it's been inserted or not (avoiding a lot of the problems that come with ON CONFLICT... RETURNING id clauses). The only problem is, it doesn't return the data that's been inserted, which doesn't make sense to me:

    In postgres 12.3:

    CREATE TABLE target_table (
      id BIGSERIAL PRIMARY KEY,
      name TEXT NOT NULL,
      value JSONB NOT NULL,
      UNIQUE (name, value)
    );
    

    INSERT INTO target_table(name, value) VALUES ('asdf', '18446744073709551615');
    INSERT INTO target_table(name, value) VALUES ('asdf', 'null');

    -- merge query
    WITH incoming AS (
    SELECT * FROM (VALUES ('asdf','null'::JSONB), ('asdf', '18446744073709551615'::JSONB), ('asdaf', '3'::JSONB)) AS tmp(name, value)
    )
    , merge_op AS (
    INSERT INTO target_table(name, value)
    SELECT i.name, i.value
    FROM incoming i
    LEFT JOIN target_table er
    ON er.name = i.name
    AND er.value = i.value
    WHERE er.name IS NULL
    ON CONFLICT DO NOTHING
    )
    SELECT er.id, er.name, er.value
    FROM target_table er
    JOIN incoming i ON i.name = er.name
    AND i.value = er.value;

    Since this is all one statement, it all happens in the same transaction, so I'd think that the last SELECT statement would be able to 'see' the inserted records and join appropriately. What happens is, the last record (with a value of 3) doesn't get returned on the first run of the merge query; only the first two existing records are returned. A subsequent run (inserting 0 records this time) will return all incoming records.

    Short of running the last SELECT in a separate query, can this be done? Any unforeseen drawbacks to this approach?



  • https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-MODIFYING

    The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables. (... ) and means that RETURNING data is the only way to communicate changes between different WITH sub-statements and the main query.

    (emphasis mine)

    So you need a RETURNING clause in your merge_op clause to see the inserted rows.

    -- merge query
    WITH incoming (name, value) AS (
      VALUES 
        ('asdf','null'::JSONB), 
        ('asdf', '18446744073709551615'::JSONB), 
        ('asdaf', '3'::JSONB)
    )
    , merge_op AS (
        INSERT INTO target_table(name, value)
        SELECT i.name, i.value
        FROM incoming i
        LEFT JOIN target_table er
            ON er.name = i.name
            AND er.value = i.value
        WHERE er.name IS NULL
        ON CONFLICT DO NOTHING
        returning *
    )
    SELECT er.id, er.name, er.value
    FROM target_table er
      JOIN incoming i ON i.name = er.name
    AND i.value = er.value;
    

    If you also want to see the rows from incoming that were not inserted (e.g. due to a conflict) you need an outer join:

    SELECT er.id, i.name, i.value
    FROM incoming i 
      LEFT JOIN target_table er ON i.name = er.name
    AND i.value = er.value;
    

Log in to reply
 


Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2