PostgreSQL recursive UPDATE resulting in a sequential scan



  • I have a table with a self-referential foreign key, and I would like to update a given parent and all of its descendants in a single recursive update.

    When using an UPDATE with a recursive CTE that returns only 10 rows out of 25K, the optimizer chooses a hash semi-join with a sequential scan on the table being updated, instead of the faster nested loop with an index scan. Execution time is a slow ~5-10 ms.

    Increasing the table to 250K rows makes the planner switch to the index scan. Ironically, execution is then an order of magnitude faster (~0.5-1.0 ms) than on the 25K-row table (~5-10 ms), precisely because it uses the index instead of sequentially scanning.

    My guess here is that the optimizer cannot first run the CTE and then plan the update; it has to plan in advance, and it incorrectly assumes the CTE returns far more rows than it really does.

    Postgres doesn't allow index optimizer hints. Short of setting enable_seqscan to off in production, is there any workaround to get postgres to use the index?
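    One way to narrow the blast radius of the enable_seqscan workaround (a sketch, not a full answer to the planning problem) is to scope it to a single transaction with SET LOCAL, so the setting never leaks to other statements or sessions:

    ```sql
    -- Sketch: disable seq scans for this one statement only.
    -- SET LOCAL reverts automatically at COMMIT/ROLLBACK.
    begin;
    set local enable_seqscan = off;

    with recursive foo as (
        select id from emp where id = 1000
        union all
        select emp.id from emp join foo on emp.manager_id = foo.id
    )
    update emp
    set department = 'IT'
    where id in (select id from foo);

    commit;
    ```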


    Set up:

    drop table if exists emp;
    

    create table emp (id int primary key, manager_id int, department text);
    create index emp_manager_id on emp (manager_id);

    insert into emp
    select i as id,
           case when mod(i, 10) = 0 then null else i - 1 end as manager_id,
           null as department
    from generate_series(0, 25000) as i;

    analyze emp;
    vacuum emp;

    Here is the update DML; it only updates 10 rows. It doesn't matter whether I use an IN, EXISTS, or update FROM the recursive CTE: they all result in a sequential scan.

    explain
    with recursive foo as (
        select id, manager_id, department
        from emp
        where id = 1000

        union all

        select emp.id, emp.manager_id, emp.department
        from emp join foo on emp.manager_id = foo.id
    )
    update emp
    set department = 'IT'
    where id in (select id from foo);
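    For completeness, the EXISTS and UPDATE ... FROM variants mentioned above (which plan the same way in my tests) look like this; both assume the same `with recursive foo as (...)` prefix as the query above:

    ```sql
    -- EXISTS variant: produces the same hash semi-join + seq scan plan.
    update emp
    set department = 'IT'
    where exists (select 1 from foo where foo.id = emp.id);

    -- UPDATE ... FROM variant: likewise.
    update emp
    set department = 'IT'
    from foo
    where emp.id = foo.id;
    ```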

    Results in

                                                 QUERY PLAN
    -----------------------------------------------------------------------------------------------------
     Update on emp  (cost=766.85..939.24 rows=101 width=74)
       CTE foo
         ->  Recursive Union  (cost=0.29..763.57 rows=101 width=40)
               ->  Index Scan using emp_pkey on emp emp_1  (cost=0.29..8.30 rows=1 width=40)
                     Index Cond: (id = 1000)
               ->  Nested Loop  (cost=0.29..75.32 rows=10 width=40)
                     ->  WorkTable Scan on foo foo_1  (cost=0.00..0.20 rows=10 width=4)
                     ->  Index Scan using emp_manager_id on emp emp_2  (cost=0.29..7.50 rows=1 width=40)
                           Index Cond: (manager_id = foo_1.id)
       ->  Hash Semi Join  (cost=3.28..175.67 rows=101 width=74)
             Hash Cond: (emp.id = foo.id)
             ->  Seq Scan on emp  (cost=0.00..145.01 rows=10001 width=14)
             ->  Hash  (cost=2.02..2.02 rows=101 width=32)
                   ->  CTE Scan on foo  (cost=0.00..2.02 rows=101 width=32)
    

    EXPLAIN ANALYZE gives the same plan; I'm using plain EXPLAIN here for brevity.


    This hash semi-join with a sequential scan is not optimal given that only 10 rows out of 25,000 are being updated. A nested loop with an index scan would be ideal here.

    Setting enable_seqscan = off reduces the time to ~0.1 ms (from ~5-10 ms).

    If I don't use a recursive CTE, the following update using generate_series shows that the primary-key index (emp_pkey) is correctly used to perform the update via a nested loop. This is what I would expect from the recursive CTE update.

    explain 
    update emp 
    set department = 'IT' 
    where id in (
        select i from generate_series(1000,1009) i
    );
    
                                        QUERY PLAN
    --------------------------------------------------------------------------------------
     Update on emp  (cost=0.43..83.59 rows=11 width=74)
       ->  Nested Loop  (cost=0.43..83.59 rows=11 width=74)
             ->  HashAggregate  (cost=0.14..0.25 rows=11 width=32)
                   Group Key: i.i
                   ->  Function Scan on generate_series i  (cost=0.00..0.11 rows=11 width=32)
             ->  Index Scan using emp_pkey on emp  (cost=0.29..7.58 rows=1 width=14)
                   Index Cond: (id = i.i)

    If I increase the number of rows in the table from 25K to 250K, the explain plan does result in the optimal use of the index. However, with 25K rows the seq scan takes ~5-10 ms to execute, while with 250K rows the index scan takes ~0.5-1.0 ms.

    My guess here is that Postgres is not able to first run the CTE and then calculate a plan for the update; it has to plan before running the CTE. Therefore Postgres cannot know that only 10 rows are being returned from the CTE and has to guess. It appears to guess the CTE will return something like 100 rows (the rows=101 estimate above), which makes it prefer the sequential scan when the table only contains 25K rows. The reason my 250K-row table uses the index scan, I assume, is that Postgres makes the same guess, but out of 250K rows an index scan makes more sense.




    @a_horse_with_no_name's solution using emp.id = any(array(select id from foo)) is great. It results in the following explain plan, which is slightly different:

                                         QUERY PLAN
    ------------------------------------------------------------------------------------
     Update on emp  (cost=44.19..48.93 rows=10 width=46)
       CTE foo
         ->  Recursive Union  (cost=0.00..42.17 rows=101 width=11)
               ->  Seq Scan on emp emp_1  (cost=0.00..3.08 rows=1 width=11)
                     Filter: (id = 0)
               ->  Hash Join  (cost=0.33..3.71 rows=10 width=11)
                     Hash Cond: (emp_2.manager_id = foo.id)
                     ->  Seq Scan on emp emp_2  (cost=0.00..2.66 rows=166 width=11)
                     ->  Hash  (cost=0.20..0.20 rows=10 width=4)
                           ->  WorkTable Scan on foo  (cost=0.00..0.20 rows=10 width=4)
       InitPlan 2 (returns $2)
         ->  CTE Scan on foo foo_1  (cost=0.00..2.02 rows=101 width=4)
       ->  Seq Scan on emp  (cost=0.00..4.73 rows=10 width=46)
             Filter: (id = ANY ($2))
    

    Can anyone explain the difference between these two parts:

    Original with enable_seqscan=off:

       ->  Nested Loop  (cost=2.56..294.11 rows=101 width=74) (actual time=0.091..0.118 rows=10 loops=1)
             ->  HashAggregate  (cost=2.27..3.28 rows=101 width=32) (actual time=0.076..0.080 rows=10 loops=1)
                   Group Key: foo.id
                   Batches: 1  Memory Usage: 24kB
                   ->  CTE Scan on foo  (cost=0.00..2.02 rows=101 width=32) (actual time=0.024..0.068 rows=10 loops=1)
             ->  Index Scan using emp_pkey on emp  (cost=0.29..2.88 rows=1 width=14) (actual time=0.003..0.003 rows=1 loops=10)
                   Index Cond: (id = foo.id)
    

    Using any(array(...)):

       InitPlan 2 (returns $2)
         ->  CTE Scan on foo foo_1  (cost=0.00..2.02 rows=101 width=4)
       ->  Seq Scan on emp  (cost=0.00..4.73 rows=10 width=46)
             Filter: (id = ANY ($2))
    

    First, my original query results in a HashAggregate of the recursive CTE's foo.id after performing the CTE scan, and only then does it loop over the emp index. I don't get why it is doing this. With any(array(...)) that step is gone entirely: there is no join at all, just a single scan of emp filtered against the array.

    Second, and probably most importantly, using any(array(...)) results in this InitPlan 2. I believe what is happening is that any(array(...)) forces the query planner to treat these as two separate steps: the CTE is executed first as an InitPlan, its 10 rows are collected into a single array parameter ($2), and the outer update is then filtered against that array. My original query, for some reason, cannot be split this way, so the query planner does not know in advance how many rows are being returned.
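    To make the contrast concrete (as I understand it, and assuming the same recursive CTE foo as above), the two predicate forms plan like this:

    ```sql
    -- IN (subquery): the subquery participates in join planning, so the
    -- planner must rely on its generic CTE row estimate (rows=101 above)
    -- when choosing between a seq scan and an index scan on emp.
    update emp
    set department = 'IT'
    where id in (select id from foo);

    -- = ANY (ARRAY(subquery)): the subquery is hoisted into an InitPlan,
    -- evaluated once, and handed to the outer scan as a single array
    -- parameter ($2); no join is planned at all.
    update emp
    set department = 'IT'
    where emp.id = any (array(select id from foo));
    ```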

    Any ideas?



  • This seems to always use an index scan (at least on Postgres 14):

    with recursive foo as (
        select id, manager_id, department
        from emp
        where id = 1000

        union all

        select emp.id, emp.manager_id, emp.department
        from emp
          join foo on emp.manager_id = foo.id
    )
    update emp
    set department = 'IT'
    where emp.id = any (array(select id from foo));

    If you have fast (SSD) disks, you might want to consider lowering random_page_cost to make Postgres favor index scans in general.
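    A sketch of adjusting random_page_cost (the database name below is a placeholder): the default of 4.0 is tuned for spinning disks, and values close to seq_page_cost (1.0) are common on SSDs.

    ```sql
    -- Session-level: affects only the current connection.
    set random_page_cost = 1.1;

    -- Persist per database ("mydb" is a placeholder name);
    -- takes effect for new sessions.
    alter database mydb set random_page_cost = 1.1;
    ```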



