Why are records from my data-modifying CTE not appearing in the subsequent SELECT from the insertion table?
-
I've crafted what seems to be a rather clever query for only inserting records that don't exist, and then returning the ids for all inserted data rows, with each row's PK id, whether it's been inserted or not (avoiding a lot of the problems that come with
ON CONFLICT... RETURNING id
clauses). The only problem is, it doesn't return the data that's been inserted, which doesn't make sense to me:In postgres 12.3:
CREATE TABLE target_table ( id BIGSERIAL PRIMARY KEY, name TEXT NOT NULL, value JSONB NOT NULL, UNIQUE (name, value) );
INSERT INTO target_table(name, value) VALUES ('asdf', '18446744073709551615');
INSERT INTO target_table(name, value) VALUES ('asdf', 'null');-- merge query
WITH incoming AS (
SELECT * FROM (VALUES ('asdf','null'::JSONB), ('asdf', '18446744073709551615'::JSONB), ('asdaf', '3'::JSONB)) AS tmp(name, value)
)
, merge_op AS (
INSERT INTO target_table(name, value)
SELECT i.name, i.value
FROM incoming i
LEFT JOIN target_table er
ON er.name = i.name
AND er.value = i.value
WHERE er.name IS NULL
ON CONFLICT DO NOTHING
)
SELECT er.id, er.name, er.value
FROM target_table er
JOIN incoming i ON i.name = er.name
AND i.value = er.value;
Since this is all one statement, it all happens in the same transaction, so I'd think that the last
SELECT
statement would be able to 'see' the inserted records and join appropriately. What happens is, the last record (with avalue
of3
) doesn't get returned on the first run of the merge query; only the first two existing records are returned. A subsequent run (inserting 0 records this time) will return all incoming records.Short of running the last
SELECT
in a separate query, can this be done? Any unforeseen drawbacks to this approach?
-
https://www.postgresql.org/docs/current/queries-with.html#QUERIES-WITH-MODIFYING
The sub-statements in WITH are executed concurrently with each other and with the main query. Therefore, when using data-modifying statements in WITH, the order in which the specified updates actually happen is unpredictable. All the statements are executed with the same snapshot (see Chapter 13), so they cannot “see” one another's effects on the target tables. (... ) and means that RETURNING data is the only way to communicate changes between different WITH sub-statements and the main query.
(emphasis mine)
So you need a RETURNING clause in your
merge_op
clause to see the inserted rows.-- merge query WITH incoming (name, value) AS ( VALUES ('asdf','null'::JSONB), ('asdf', '18446744073709551615'::JSONB), ('asdaf', '3'::JSONB) ) , merge_op AS ( INSERT INTO target_table(name, value) SELECT i.name, i.value FROM incoming i LEFT JOIN target_table er ON er.name = i.name AND er.value = i.value WHERE er.name IS NULL ON CONFLICT DO NOTHING returning * ) SELECT er.id, er.name, er.value FROM target_table er JOIN incoming i ON i.name = er.name AND i.value = er.value;
If you also want to see the rows from
incoming
that were not inserted (e.g. due to a conflict) you need an outer join:SELECT er.id, i.name, i.value FROM incoming i LEFT JOIN target_table er ON i.name = er.name AND i.value = er.value;