Filter table lookup by array value, but returning groupby unfiltered



  • I have multiple lookup tables and each need to be filtered by an array of values:

    select 
    array_agg(joined_properties.name) as properties,
    items.name
    from items
    left join (
      select properties.name, item_property.item_id, item_property.property_id
      from item_property
      join properties on properties.id = item_property.property_id) as joined_properties on joined_properties.item_id = items.id
    group by items.name;
    

    properties name
    property1 item2
    property1,property2,property4 item1
    property2,property3 item3
    property2 item5
    property3,property4 item4

    I would like to filter each item by an array of properties, but return the unfiltered properties (just like above) in the same query. What is the performant/correct way of doing it without needing to re-run the joins once I found matching rows? Whenever I filter inside the join or outside it before grouping it only returns the matching rows of course:

    select 
    array_agg(joined_properties.name) as properties,
    items.name
    from items
    inner join (
      select properties.name, item_property.item_id, item_property.property_id
      from item_property
      join properties on properties.id = item_property.property_id
      where properties.name = ANY('{"property1","property2"}')
    ) as joined_properties on joined_properties.item_id = items.id
    group by items.name;
    
    properties name
    property1,property2 item1
    property1 item2
    property2 item3
    property2 item5

    Expected output:

    properties name
    property1 item2
    property1,property2,property4 item1
    property2,property3 item3
    property2 item5

    Fiddle below:

    https://www.db-fiddle.com/f/ur2oadEhiJ1H1CRAHJWn1D/2

    Edit 1:

    • Approximate hardware specs: 4-core Xeon Gold VM, 16GB RAM, SSD
    • Relevant DB cardinalities:
      • ~100k records in items,
      • 6 properties_i text tables with their respective intermediary tables,
      • most properties_i tables range from 10-30 records,
      • largest properties_i will have 200 records indexed,
      • each item in items has 0-10 associated properties for any property_i


  • The best solution heavily depends on data distribution and cardinalities - and available indexes.

    This query should be generally good:

    
    SELECT sub.properties, i.name AS item
    FROM  (
       SELECT item_id, array_agg(p.name) AS properties
       FROM  (
          SELECT DISTINCT item_id
          FROM   item_property
          WHERE  property_id = ANY (
             SELECT id FROM properties
             WHERE  name = ANY ('{"property1","property2"}')
             )
          ) ip0
       JOIN   item_property ip USING (item_id)
       JOIN   properties    p  ON p.id = ip.property_id
       GROUP  BY item_id
       ) sub
    JOIN   items i ON i.id = sub.item_id
    ORDER  BY i.name;  -- my optional addition
    

    db<>fiddle https://dbfiddle.uk/?rdbms=postgres_13&fiddle=2427a47000ff7b9095b7376021e505bc

    The innermost subquery ip0 identifies items (or rather: item_id's) that have one of the filtered properties.
    The next subquery sub joins to all properties and aggregates per item - not involving table items, yet!
    The outer query replaces item_id with actual item name to arrive at your desired result. If item names are not defined UNIQUE NOT NULL, I would still return item_id additionally to avoid ambiguities.

    Other query styles may be faster for particular data distributions / search arguments. Thee are many ways. I added one alternative with IN to the fiddle.

    Indexes

    For big tables and selective filters, matching indexes make all the difference. For your given data distribution:

    • You don't need any indexes on properties as it's tiny.

    • You'll already have a UNIQUE or PK constraint on item_property (item_id, property_id). Add another index on the same two columns in reversed order, so (property_id, item_id) in my example. Why? See:

      • https://dba.stackexchange.com/questions/27481/is-a-composite-index-also-good-for-queries-on-the-first-field/27493#27493

      If item_property is vacuumed enough, you'll even see https://www.postgresql.org/docs/14/indexes-index-only-scans.html .

    • Obviously an index on items(id) - that will already be covered by the PK.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2