EXISTS() vs EXISTS() = TRUE in Postgres



  • Faced weird behaviour with EXISTS (also applies for NOT EXISTS) generating different execution plans for

    WHERE EXISTS(...)

    EXPLAIN ANALYZE
    SELECT * FROM books
    WHERE EXISTS (SELECT 1 FROM authors WHERE id = books.author_id AND name LIKE 'asd%');
    
    QUERY PLAN
    Hash Join (cost=218.01..454.43 rows=56 width=40) (actual time=0.975..0.975 rows=0 loops=1)
    Hash Cond: (books.author_id = authors.id)
    -> Seq Scan on books (cost=0.00..206.80 rows=11280 width=40) (actual time=0.010..0.010 rows=1 loops=1)
    -> Hash (cost=217.35..217.35 rows=53 width=4) (actual time=0.943..0.943 rows=0 loops=1)
    Buckets: 1024 Batches: 1 Memory Usage: 8kB
    -> Seq Scan on authors (cost=0.00..217.35 rows=53 width=4) (actual time=0.942..0.943 rows=0 loops=1)
    Filter: ((name)::text ~~ 'asd%'::text)
    Rows Removed by Filter: 10000
    Planning Time: 0.361 ms
    Execution Time: 1.022 ms

    vs.

    WHERE EXISTS(...) = TRUE

    EXPLAIN ANALYZE
    SELECT * FROM books
    WHERE EXISTS (SELECT id FROM authors WHERE id = books.author_id AND name LIKE 'asd%') = True;
    
    QUERY PLAN
    Seq Scan on books (cost=0.00..93887.20 rows=5640 width=40) (actual time=2.054..2.054 rows=0 loops=1)
    Filter: (alternatives: SubPlan 1 or hashed SubPlan 2)
    Rows Removed by Filter: 10000
    SubPlan 1
    -> Index Scan using authors_pkey on authors (cost=0.29..8.30 rows=1 width=0) (never executed)
    Index Cond: (id = books.author_id)
    Filter: ((name)::text ~~ 'asd%'::text)
    SubPlan 2
    -> Seq Scan on authors authors_1 (cost=0.00..217.35 rows=53 width=4) (actual time=0.931..0.931 rows=0 loops=1)
    Filter: ((name)::text ~~ 'asd%'::text)
    Rows Removed by Filter: 10000
    Planning Time: 0.298 ms
    Execution Time: 2.129 ms

    Of particular interest is Hash Join vs. simple Seq Scan and the 2x time diff.

    Fiddle https://www.db-fiddle.com/f/a7kedUeFmCu4tWLFRoxMJA/2



  • PostgreSQL is able to optimize WHERE EXISTS (/* correlated subquery */) into a join or semi-join, but it is not smart enough to detect that the = TRUE in EXISTS (...) = TRUE can be removed, so it does not apply the optimization here.

    Since the optimization is not used, it is unsurprising that the second plan is slower. Although, to be honest, with a tiny query like that the difference could just be noise.


    Some background for the second execution plan:

    The second plan with the alternatives: shows that you are using an older version of PostgreSQL, which still had AlternativeSubPlans. The idea behind that was that PostgreSQL could potentially decide to start using a different subplan during query execution if the row count estimates proved to be off. This capability was removed with commit https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=41efb8340877e8ffd0023bb6b2ef22ffd1ca014d in v14. You may want to refer to Tom Lane's commit message for details:

    Move resolution of AlternativeSubPlan choices to the planner.
    

    When commit bd3daddaf introduced AlternativeSubPlans, I had some
    ambitions towards allowing the choice of subplan to change during
    execution. That has not happened, or even been thought about, in the
    ensuing twelve years; so it seems like a failed experiment. So let's
    rip that out and resolve the choice of subplan at the end of planning
    (in setrefs.c) rather than during executor startup. This has a number
    of positive benefits:

    • Removal of a few hundred lines of executor code, since
      AlternativeSubPlans need no longer be supported there.

    • Removal of executor-startup overhead (particularly, initialization
      of subplans that won't be used).

    • Removal of incidental costs of having a larger plan tree, such as
      tree-scanning and copying costs in the plancache; not to mention
      setrefs.c's own costs of processing the discarded subplans.

    • EXPLAIN no longer has to print a weird (and undocumented)
      representation of an AlternativeSubPlan choice; it sees only the
      subplan actually used. This should mean less confusion for users.

    • Since setrefs.c knows which subexpression of a plan node it's
      working on at any instant, it's possible to adjust the estimated
      number of executions of the subplan based on that. For example,
      we should usually estimate more executions of a qual expression
      than a targetlist expression. The implementation used here is
      pretty simplistic, because we don't want to expend a lot of cycles
      on the issue; but it's better than ignoring the point entirely,
      as the executor had to.

    That last point might possibly result in shifting the choice
    between hashed and non-hashed EXISTS subplans in a few cases,
    but in general this patch isn't meant to change planner choices.
    Since we're doing the resolution so late, it's really impossible
    to change any plan choices outside the AlternativeSubPlan itself.

    Patch by me; thanks to David Rowley for review.

    Discussion: https://postgr.es/m/1992952.1592785225@sss.pgh.pa.us




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2