How to make Postgres use the correct index here?
Here's a test setup:
CREATE TABLE t ( filename int, cropped bool not null default false, resized bool not null default false, create_date date not null default '1970-01-01' );
INSERT INTO t
SELECT generate_series(1, 1000000);
UPDATE t SET cropped = true, resized = true
WHERE filename IN (SELECT filename FROM t ORDER BY random() LIMIT 900000);
UPDATE t SET resized = false
WHERE filename IN (SELECT filename FROM t WHERE cropped = true ORDER BY random() LIMIT 1000);
VACUUM FULL t;
Data now looks like this:
SELECT cropped, resized, count(*) FROM t GROUP BY 1,2;
I create some partial indexes for other queries and one index specifically for the query below:
CREATE INDEX idx_cropped ON t(cropped) WHERE NOT cropped; CREATE INDEX idx_resized ON t(resized) WHERE NOT resized;
CREATE INDEX specific ON t(cropped,resized) WHERE cropped AND NOT resized;
Now I run my query:
EXPLAIN ANALYZE SELECT count(*) FROM t WHERE cropped AND NOT resized AND create_date < CURRENT_DATE;
Aggregate (cost=4001.25..4001.26 rows=1 width=8) (actual time=478.557..478.558 rows=1 loops=1) -> Index Scan using idx_resized on t (cost=0.29..3777.71 rows=89415 width=0) (actual time=478.177..478.480 rows=1000 loops=1) Filter: (cropped AND (create_date < CURRENT_DATE)) Rows Removed by Filter: 100000
It takes 478 ms on dbfiddle.uk (on my machine it's faster but the difference is still visible).
Now I delete an index:
DROP INDEX idx_resized;
and run the same query again and I get a much better plan:
Aggregate (cost=11876.27..11876.28 rows=1 width=8) (actual time=0.315..0.316 rows=1 loops=1) -> Bitmap Heap Scan on t (cost=35.50..11652.73 rows=89415 width=0) (actual time=0.054..0.250 rows=1000 loops=1) Recheck Cond: (cropped AND (NOT resized)) Filter: (create_date < CURRENT_DATE) Heap Blocks: exact=6 -> Bitmap Index Scan on specific (cost=0.00..13.15 rows=89415 width=0) (actual time=0.040..0.040 rows=1000 loops=1)
which uses the index
specificand completes in less than a ms on both dbfiddle.uk and my machine.
Edit: Additional mystery - when I set the values not with an UPDATE but with a DEFAULT, then the correct index is chosen. What is going on? https://dbfiddle.uk/?rdbms=postgres_14&fiddle=dc7d8aea14e90f08ab6537a855f34d8c
idx_croppeddoesn’t use the same filter as your query so it’s not possible to use.
Your plans tell us that it expects both possible indexes to return 89415 rows, this will be computed by the optimizer based on the information it has (statistics and query). You can try gathering additional statistics.
Because it thinks the selectivity of both indexes are the same, it will go with the smaller one (the one that doesn’t include both columns in its index key).
It’s a bit of a waste to include those columns as the index key - you already know what they are because they’re part of the index filter.
Instead, you could use
create_datewhich gets filtered on in the query. I’ve included an index on
(create_date) where cropped and not resizedIn this demo https://dbfiddle.uk/?rdbms=postgres_14&fiddle=19b28197b2654b7492aa6a8a3ee44cd3