Why parallel aggregation is not used for subquery when other subquery using non parallel aggregation?

  • I defined 2 equivalent aggregates. First supports parallel mode, but second one doesn't. I defined 2 identical tables with index.

    create table jsfield(j jsonb);
    insert into jsfield 
      select jsonb_build_object('c', s) 
       from generate_series(1, 100000) s; 
    create index jsfield_idx on jsfield(j);

    create table jsfield2 as select * from jsfield;
    create index jsfield2_idx on jsfield2(j);

    create or replace function shallow_jsonb_object_merge(st jsonb, a jsonb)
    returns jsonb
    language sql
    as $$ select st || a ; $$ ;

    create or replace aggregate "Semigroup_seq" (jsonb)
    sfunc = shallow_jsonb_object_merge,
    stype = jsonb

    create or replace aggregate "Semigroup_par" (jsonb)
    sfunc = shallow_jsonb_object_merge,
    stype = jsonb,
    combinefunc = shallow_jsonb_object_merge,
    parallel = safe,
    initcond = '{}'

    explain analyze produces expected plans when aggregates are used independently:

    explain analyze select "Semigroup_seq"(j) from jsfield ;
                                                         QUERY PLAN                                                      
     Aggregate  (cost=26737.04..26737.05 rows=1 width=32) (actual time=122.797..122.797 rows=1 loops=1)
       ->  Seq Scan on jsfield  (cost=0.00..1736.04 rows=100004 width=26) (actual time=0.015..6.934 rows=100004 loops=1)
     Planning Time: 0.420 ms
     Execution Time: 122.847 ms

    explain analyze select "Semigroup_par"(j) from jsfield2 ;

    Finalize Aggregate (cost=10559.84..10559.85 rows=1 width=32) (actual time=61.475..64.994 rows=1 loops=1)
    -> Gather (cost=10558.78..10559.09 rows=3 width=32) (actual time=61.363..64.884 rows=4 loops=1)
    Workers Planned: 3
    Workers Launched: 3
    -> Partial Aggregate (cost=9558.78..9558.79 rows=1 width=32) (actual time=42.316..42.316 rows=1 loops=4)
    -> Parallel Index Only Scan using jsfield2_idx on jsfield2 (cost=0.42..1494.03 rows=32259 width=26) (actual time=0.030..2.781 rows=25001 loops=4)
    Heap Fetches: 0
    Planning Time: 0.363 ms
    Execution Time: 65.065 ms

    Parallel aggregate runs definitely faster, but when I run both queries at once as subqueries, then explain shows sequential scan for both!

    explain analyze select 
      (select "Semigroup_par"(j) from jsfield2), 
      (select "Semigroup_seq"(j) from jsfield) ;
                                                              QUERY PLAN                                                          
     Result  (cost=53474.10..53474.11 rows=1 width=64) (actual time=271.377..271.378 rows=1 loops=1)
       InitPlan 1 (returns $0)
         ->  Aggregate  (cost=26737.04..26737.05 rows=1 width=32) (actual time=174.772..174.773 rows=1 loops=1)
               ->  Seq Scan on jsfield2  (cost=0.00..1736.04 rows=100004 width=26) (actual time=0.007..7.586 rows=100004 loops=1)
       InitPlan 2 (returns $1)
         ->  Aggregate  (cost=26737.04..26737.05 rows=1 width=32) (actual time=96.599..96.599 rows=1 loops=1)
               ->  Seq Scan on jsfield  (cost=0.00..1736.04 rows=100004 width=26) (actual time=0.006..5.362 rows=100004 loops=1)
     Planning Time: 0.581 ms
     Execution Time: 271.407 ms

    Why planner prefers seq scan over parallel one? Is it because cost for both subqueries the same and parallel mode is much more memory expensive? Though how planer knows that if it is guided just by cost parameter.

    Could it be a bug? Is it adjustable behavior?

     14.2 (Ubuntu 14.2-1.pgdg22.04+1+b1)

  • No, that is working as expected. If there is a parallel unsafe function used anywhere in a query, that query cannot use parallelism, period. In the words of https://www.postgresql.org/docs/current/parallel-safety.html :

    When a query contains anything that is parallel unsafe, parallel query is completely disabled for that query.

Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2