How to make Postgres use the correct index here?



  • Here's a test setup:

    CREATE TABLE t (
        filename int,
        cropped bool not null default false,
        resized bool not null default false,
        create_date date not null default '1970-01-01'
    );
    

    INSERT INTO t
    SELECT generate_series(1, 1000000);

    UPDATE t SET cropped = true, resized = true
    WHERE filename IN (SELECT filename FROM t ORDER BY random() LIMIT 900000);
    UPDATE t SET resized = false
    WHERE filename IN (SELECT filename FROM t WHERE cropped = true ORDER BY random() LIMIT 1000);

    VACUUM FULL t;
    ANALYZE t;

    https://dbfiddle.uk/?rdbms=postgres_14&fiddle=f356fd56a920ea8a93c192f5a8c16b1c

    Data now looks like this:

    SELECT cropped, resized, count(*)
    FROM t
    GROUP BY 1,2;
    

    table

    I create some partial indexes for other queries and one index specifically for the query below:

    CREATE INDEX idx_cropped ON t(cropped) WHERE NOT cropped;
    CREATE INDEX idx_resized ON t(resized) WHERE NOT resized;
    

    CREATE INDEX specific ON t(cropped,resized) WHERE cropped AND NOT resized;

    Now I run my query:

    EXPLAIN ANALYZE
        SELECT count(*) FROM t WHERE cropped AND NOT resized AND create_date < CURRENT_DATE;
    
    Aggregate  (cost=4001.25..4001.26 rows=1 width=8) (actual time=478.557..478.558 rows=1 loops=1)
      ->  Index Scan using idx_resized on t  (cost=0.29..3777.71 rows=89415 width=0) (actual time=478.177..478.480 rows=1000 loops=1)
            Filter: (cropped AND (create_date < CURRENT_DATE))
            Rows Removed by Filter: 100000
    

    It takes 478 ms on dbfiddle.uk (on my machine it's faster but the difference is still visible).

    Now I delete an index:

    DROP INDEX idx_resized;
    

    and run the same query again and I get a much better plan:

    Aggregate  (cost=11876.27..11876.28 rows=1 width=8) (actual time=0.315..0.316 rows=1 loops=1)
      ->  Bitmap Heap Scan on t  (cost=35.50..11652.73 rows=89415 width=0) (actual time=0.054..0.250 rows=1000 loops=1)
            Recheck Cond: (cropped AND (NOT resized))
            Filter: (create_date < CURRENT_DATE)
            Heap Blocks: exact=6
            ->  Bitmap Index Scan on specific  (cost=0.00..13.15 rows=89415 width=0) (actual time=0.040..0.040 rows=1000 loops=1)
    

    which uses the index specific and completes in less than a ms on both dbfiddle.uk and my machine.


    Edit: Additional mystery - when I set the values not with an UPDATE but with a DEFAULT, then the correct index is chosen. What is going on? https://dbfiddle.uk/?rdbms=postgres_14&fiddle=dc7d8aea14e90f08ab6537a855f34d8c



  • Firstly, your idx_cropped doesn’t use the same filter as your query so it’s not possible to use.

    Your plans tell us that it expects both possible indexes to return 89415 rows, this will be computed by the optimizer based on the information it has (statistics and query). You can try gathering additional statistics.

    Because it thinks the selectivity of both indexes are the same, it will go with the smaller one (the one that doesn’t include both columns in its index key).

    It’s a bit of a waste to include those columns as the index key - you already know what they are because they’re part of the index filter.

    Instead, you could use create_date which gets filtered on in the query. I’ve included an index on (create_date) where cropped and not resized In this demo https://dbfiddle.uk/?rdbms=postgres_14&fiddle=19b28197b2654b7492aa6a8a3ee44cd3




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2