Why does Postgres not use an index when joining a materialized view?



  • I have a table defined as:

    CREATE TABLE addresses (
        "address" varchar(42),
        "other_info1" text,
        "other_info2" bigint
    );
    

    and I have some manual corrections to that table defined as:

    CREATE TABLE address_corrections (
        "address" varchar(42),
        "other_info1" text,
        "other_info2" bigint
    );
    

    I have a simple union + dedup query that I am persisting as a materialized view corrected_addresses with the same schema as the two tables above. The size of the materialized view is about the same size as the original table addresses, which is about 400k rows. I have an index on the materialized view which matches the same exact index on the original addresses table:

    CREATE INDEX corrected_addresses_address_idx ON corrected_addresses(address);
    

    Now when querying some other tables I want to join on this table to get access to other_info1 and other_info2. Simple enough. However, this join appears to not use the index on address. The query looks like this:

    EXPLAIN (analyze, buffers, format text)
    SELECT transfer.from_name, transfer.to_name, transfer.value,
            transfer.address, address.other_info1, address.other_info2
    FROM transfers transfer
    LEFT JOIN corrected_addresses address ON transfer.address=address.address
    WHERE transfer.from_name='some name here';
    

    with either addresses or corrected_addresses as the join table.

    Here is the result for the original address table:

    Gather  (cost=1205.50..43489.92 rows=6241 width=173) (actual time=1.422..12.292 rows=2 loops=1)
      Workers Planned: 2
      Workers Launched: 2
      Buffers: shared hit=12
      ->  Nested Loop Left Join  (cost=205.50..41865.82 rows=2600 width=173) (actual time=0.184..0.198 rows=1 loops=3)
            Buffers: shared hit=12
            ->  Parallel Bitmap Heap Scan on transfers transfer  (cost=205.07..25042.94 rows=2600 width=204) (actual time=0.176..0.180 rows=1 loops=3)
                  Recheck Cond: ((from_name)::text = 'some name here'::text)
                  Heap Blocks: exact=2
                  Buffers: shared hit=8
                  ->  Bitmap Index Scan on transfers_from_name_timestamp_idx  (cost=0.00..203.51 rows=6241 width=0) (actual time=0.400..0.401 rows=2 loops=1)
                        Index Cond: ((from_name)::text = 'some name here'::text)
                        Buffers: shared hit=6
            ->  Memoize  (cost=0.43..7.99 rows=1 width=55) (actual time=0.017..0.018 rows=0 loops=2)
                  Cache Key: transfer.address
                  Cache Mode: logical
                  Hits: 0  Misses: 2  Evictions: 0  Overflows: 0  Memory Usage: 1kB
                  Buffers: shared hit=4
                  ->  Index Scan using addresses_idx on addresses a  (cost=0.42..7.98 rows=1 width=55) (actual time=0.013..0.013 rows=0 loops=2)
                        Index Cond: ((address)::text = (transfer.address)::text)
                        Buffers: shared hit=4
    Planning:
      Buffers: shared hit=334 dirtied=1
    Planning Time: 5.692 ms
    Execution Time: 12.601 ms
    

    And for the materialized view:

    Gather  (cost=10501.47..35970.26 rows=6241 width=173) (actual time=100.881..111.703 rows=2 loops=1)
      Workers Planned: 2
      Workers Launched: 2
      Buffers: shared hit=5327
      ->  Parallel Hash Left Join  (cost=9501.47..34346.16 rows=2600 width=173) (actual time=94.729..95.259 rows=1 loops=3)
            Hash Cond: ((transfer.address)::text = (a.address)::text)
            Buffers: shared hit=5327
            ->  Parallel Bitmap Heap Scan on transfers transfer  (cost=205.07..25042.94 rows=2600 width=204) (actual time=0.162..0.164 rows=1 loops=3)
                  Recheck Cond: ((from_name)::text = 'some name here'::text)
                  Heap Blocks: exact=1
                  Buffers: shared hit=8
                  ->  Bitmap Index Scan on transfers_from_name_timestamp_idx  (cost=0.00..203.51 rows=6241 width=0) (actual time=0.070..0.071 rows=2 loops=1)
                        Index Cond: ((from_name)::text = 'some name here'::text)
                        Buffers: shared hit=6
            ->  Parallel Hash  (cost=7051.18..7051.18 rows=179618 width=55) (actual time=92.901..92.901 rows=143694 loops=3)
                  Buckets: 524288  Batches: 1  Memory Usage: 41856kB
                  Buffers: shared hit=5255
                  ->  Parallel Seq Scan on corrected_addresses a  (cost=0.00..7051.18 rows=179618 width=55) (actual time=0.006..27.550 rows=143694 loops=3)
                        Buffers: shared hit=5255
    Planning:
      Buffers: shared hit=37
    Planning Time: 0.482 ms
    Execution Time: 111.758 ms
    

    Both of those execution times are small, I believe that's partially because of caching. The difference on a fresh run is ~100ms vs. a few seconds. Sorry I couldn't get an example here.

    Does anyone know why Postgres is not using the index for the materialized view?



  • The horrible estimation of 'transfers' condition (expected 2600, found 1) caused the cost of the hash join and the nested loop to appear falsely similar.

    Why does misestimating by 2600 fold cause one query to use a different plan but not the other? No idea, probably just luck. I doubt it is even reproducible, much less interesting. Fix the root problem.

    If an ANALYZE transfers doesn't fix it, then you will need to dig into pg_stats for column "from_name".



Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2