PostgresSQL multiple indexes tuning



  • I have a table with those columns: id, uuid, customer_id, building, dept, values

    I have a combined index on those columns: building, dept.

    If I run this query: select count(*) from my_table where building = 666 and dept = 'my_dept', it takes 400 ms and there are 25 rows

    If I run this query: select count(*) from my_table where building = 666 and dept = 'my_dept' and customer_id = 1, it takes 4 minutes. It always return 25 rows

    that means all the 25 rows belong to customer 1. The table contains ~20 millions of rows

    I know I can solve adding customer_id into the multi columns index, but why in the second case is so slow? should it benefit from the multi column index even if the customer_id is missed? I expect it a little bit slower but not 4 minutes.

    Any idea?



  • It looks like not having customer_id as a column on the building_dept is going to be your problem.

    For your first query, the optimizer knows that it can simply count all the rows directly returned from the building_dept index without needing to lookup additional data elsewhere.

    However, once you add the additional where clause, you give the optimizer a decision to make. To to find rows with customer_id, it can do one of two things.

    1. Continue to use the building_dept index, and then lookup the customer_id in the main table.
    2. Use customer_uuid and then lookup building and dept in the main table.

    In either case, it now requires looking up data in two place, instead of one, to return the data you're requesting. If you didn't have the customer_uuid index, the optimizer would not have that decision to make, and would go with option 1. However, it would still be looking up data in two locations instead of 1. Which would still be sub-optimal.

    Postgres can estimate the number or rows that will be returned using statistics, and with those, it believe the better plan is to go to the customer_id-uuid index and then lookup building and dept in the main table, vs using the building_dept index and looking up customer in the main table. Try reviewing your statistics to see how many rows Postgres believes it has for the values you're passing to your query.

    Similarly, try running these three select independently and seeing how many rows are returned by each.

    select count(*) from my_table where building = 666

    select count(*) from my_table where dept = 'my_dept'

    select count(*) from my_table where customer_id = 1

    To summarize, your best bet is going to be to add customer_id to the building_dept index and see if that helps the second query.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2