EXISTS() vs EXISTS() = TRUE in Postgres
Marcee last edited by
Faced weird behaviour with
EXISTS(also applies for
NOT EXISTS) generating different execution plans for
EXPLAIN ANALYZE SELECT * FROM books WHERE EXISTS (SELECT 1 FROM authors WHERE id = books.author_id AND name LIKE 'asd%');
QUERY PLAN Hash Join (cost=218.01..454.43 rows=56 width=40) (actual time=0.975..0.975 rows=0 loops=1) Hash Cond: (books.author_id = authors.id) -> Seq Scan on books (cost=0.00..206.80 rows=11280 width=40) (actual time=0.010..0.010 rows=1 loops=1) -> Hash (cost=217.35..217.35 rows=53 width=4) (actual time=0.943..0.943 rows=0 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 8kB -> Seq Scan on authors (cost=0.00..217.35 rows=53 width=4) (actual time=0.942..0.943 rows=0 loops=1) Filter: ((name)::text ~~ 'asd%'::text) Rows Removed by Filter: 10000 Planning Time: 0.361 ms Execution Time: 1.022 ms
WHERE EXISTS(...) = TRUE
EXPLAIN ANALYZE SELECT * FROM books WHERE EXISTS (SELECT id FROM authors WHERE id = books.author_id AND name LIKE 'asd%') = True;
QUERY PLAN Seq Scan on books (cost=0.00..93887.20 rows=5640 width=40) (actual time=2.054..2.054 rows=0 loops=1) Filter: (alternatives: SubPlan 1 or hashed SubPlan 2) Rows Removed by Filter: 10000 SubPlan 1 -> Index Scan using authors_pkey on authors (cost=0.29..8.30 rows=1 width=0) (never executed) Index Cond: (id = books.author_id) Filter: ((name)::text ~~ 'asd%'::text) SubPlan 2 -> Seq Scan on authors authors_1 (cost=0.00..217.35 rows=53 width=4) (actual time=0.931..0.931 rows=0 loops=1) Filter: ((name)::text ~~ 'asd%'::text) Rows Removed by Filter: 10000 Planning Time: 0.298 ms Execution Time: 2.129 ms
Of particular interest is Hash Join vs. simple Seq Scan and the 2x time diff.
PostgreSQL is able to optimize
WHERE EXISTS (/* correlated subquery */)into a join or semi-join, but it is not smart enough to detect that the
EXISTS (...) = TRUEcan be removed, so it does not apply the optimization here.
Since the optimization is not used, it is unsurprising that the second plan is slower. Although, to be honest, with a tiny query like that the difference could just be noise.
Some background for the second execution plan:
The second plan with the
alternatives:shows that you are using an older version of PostgreSQL, which still had
AlternativeSubPlans. The idea behind that was that PostgreSQL could potentially decide to start using a different subplan during query execution if the row count estimates proved to be off. This capability was removed with commit https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=41efb8340877e8ffd0023bb6b2ef22ffd1ca014d in v14. You may want to refer to Tom Lane's commit message for details:
Move resolution of AlternativeSubPlan choices to the planner.
When commit bd3daddaf introduced AlternativeSubPlans, I had some
ambitions towards allowing the choice of subplan to change during
execution. That has not happened, or even been thought about, in the
ensuing twelve years; so it seems like a failed experiment. So let's
rip that out and resolve the choice of subplan at the end of planning
(in setrefs.c) rather than during executor startup. This has a number
of positive benefits:
Removal of a few hundred lines of executor code, since
AlternativeSubPlans need no longer be supported there.
Removal of executor-startup overhead (particularly, initialization
of subplans that won't be used).
Removal of incidental costs of having a larger plan tree, such as
tree-scanning and copying costs in the plancache; not to mention
setrefs.c's own costs of processing the discarded subplans.
EXPLAIN no longer has to print a weird (and undocumented)
representation of an AlternativeSubPlan choice; it sees only the
subplan actually used. This should mean less confusion for users.
Since setrefs.c knows which subexpression of a plan node it's
working on at any instant, it's possible to adjust the estimated
number of executions of the subplan based on that. For example,
we should usually estimate more executions of a qual expression
than a targetlist expression. The implementation used here is
pretty simplistic, because we don't want to expend a lot of cycles
on the issue; but it's better than ignoring the point entirely,
as the executor had to.
That last point might possibly result in shifting the choice
between hashed and non-hashed EXISTS subplans in a few cases,
but in general this patch isn't meant to change planner choices.
Since we're doing the resolution so late, it's really impossible
to change any plan choices outside the AlternativeSubPlan itself.
Patch by me; thanks to David Rowley for review.