PostgresSQL multiple indexes tuning
-
I have a table with those columns: id, uuid, customer_id, building, dept, values
I have a combined index on those columns:
building, dept
.If I run this query:
select count(*) from my_table where building = 666 and dept = 'my_dept'
, it takes 400 ms and there are 25 rowsIf I run this query:
select count(*) from my_table where building = 666 and dept = 'my_dept' and customer_id = 1
, it takes 4 minutes. It always return 25 rowsthat means all the 25 rows belong to customer 1. The table contains ~20 millions of rows
I know I can solve adding
customer_id
into the multi columns index, but why in the second case is so slow? should it benefit from the multi column index even if the customer_id is missed? I expect it a little bit slower but not 4 minutes.Any idea?
-
It looks like not having
customer_id
as a column on thebuilding_dept
is going to be your problem.For your first query, the optimizer knows that it can simply count all the rows directly returned from the
building_dept
index without needing to lookup additional data elsewhere.However, once you add the additional where clause, you give the optimizer a decision to make. To to find rows with
customer_id
, it can do one of two things.- Continue to use the
building_dept
index, and then lookup thecustomer_id
in the main table. - Use
customer_uuid
and then lookupbuilding
anddept
in the main table.
In either case, it now requires looking up data in two place, instead of one, to return the data you're requesting. If you didn't have the
customer_uuid
index, the optimizer would not have that decision to make, and would go with option 1. However, it would still be looking up data in two locations instead of 1. Which would still be sub-optimal.Postgres can estimate the number or rows that will be returned using statistics, and with those, it believe the better plan is to go to the
customer_id-uuid
index and then lookupbuilding
anddept
in the main table, vs using thebuilding_dept
index and looking upcustomer
in the main table. Try reviewing your statistics to see how many rows Postgres believes it has for the values you're passing to your query.Similarly, try running these three select independently and seeing how many rows are returned by each.
select count(*) from my_table where building = 666
select count(*) from my_table where dept = 'my_dept'
select count(*) from my_table where customer_id = 1
To summarize, your best bet is going to be to add
customer_id
to thebuilding_dept
index and see if that helps the second query.
- Continue to use the