Create an index for fields within an array of composite type

  • My question is a follow-up to the question answered here:

    Considering a table with an array column of a composite type, is it possible with PostgreSQL to index that column so that one can search for rows containing an array entry matching some arbitrary predicate?

    Here is a fiddle to start off the problem:

    We have the following schema:

    CREATE TYPE complex AS (
        r       double precision,
        i       double precision
    );
    CREATE TABLE tbl2 (tbl2_id serial, co complex[]);

    INSERT INTO tbl2(co)
    select array_agg((random()* 100, random()*100)::complex)
    from generate_series(1, 50000) i
    group by i % 10000;

    -- how to create an index on co[*].r basically?
    CREATE INDEX tbl2_co1_idx ON tbl2 (((co[1]).r)); -- note the parentheses!
    -- * this is only a single array entry's r values
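    This expression index can only support predicates on the first array element, e.g. (a sketch, not from the original question):

    ```sql
    -- can use tbl2_co1_idx; only ever inspects co[1]
    SELECT * FROM tbl2 WHERE (co[1]).r BETWEEN 9.65 AND 9.67;
    ```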

    Is there a mechanism to do an indexed lookup for queries such as this:

    SELECT *
    FROM  (SELECT *,
                  generate_subscripts(co, 1) AS s
           FROM   tbl2) AS foo
    WHERE (co[s].r) BETWEEN 9.65 AND 9.67;

    The rationale behind this could be to have items such as polygons, each with a small number of points (x,y), and to then look up easily which polygons are out of bounds. It is a more NoSQL-like approach, which would be great if it were doable without resorting to jsonb.

  • It is possible to create an index for fields within an array of composite type, but the only index type applicable to arrays is GIN. And a GIN index cannot serve a range predicate like your WHERE (co[*].r) BETWEEN 9.65 and 9.67. So not possible after all - with current Postgres 14, and probably future versions as well.
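    GIN's array operator class works with whole-element operators like @> (contains); even if such an index were built, there is no operator expressing a range over one field of the elements. For illustration (a sketch, not from the original answer):

    ```sql
    -- matches arrays containing this exact composite value - not a range on r
    SELECT * FROM tbl2 WHERE co @> ARRAY[(9.66, 50.0)::complex];
    ```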

    The problem is merely academic, though, because you wouldn't do that to begin with. The solution is a proper relational design.

    Whenever you find yourself creating a table with an array of composite type, cease and desist. With a probability bordering on certainty, you should create a separate table instead, normalizing your design at least that much.

    Then everything becomes rather simple. Could look like this:

    -- parent table optional, not needed for demo
    CREATE TABLE tbl2_parent (tbl2_id serial PRIMARY KEY);  -- more attributes?

    -- random test data
    INSERT INTO tbl2_parent
    SELECT FROM generate_series(1, 1000) t;

    -- main table
    CREATE TABLE tbl2 (
      tbl2_id int NOT NULL -- REFERENCES tbl2_parent
    , idx     int NOT NULL
    , l       float8 NOT NULL
    , r       float8 NOT NULL
    , PRIMARY KEY (tbl2_id, idx)
    );

    -- random test data
    INSERT INTO tbl2 (tbl2_id, idx, l, r)
    SELECT t
    , i
    , random() * 100
    , random() * 100
    FROM generate_series(1, 1000) t, generate_series(1, 5) i
    ORDER BY 1, 2;

    Now, the index is simple:

    CREATE INDEX tbl2_r_idx ON tbl2 (r, tbl2_id);

    Any B-tree index with leading r does the job. Minor additional optimizations depend on the complete use case.
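    To verify, one could inspect the plan (a sketch; the actual plan depends on data and settings):

    ```sql
    EXPLAIN (ANALYZE)
    SELECT DISTINCT tbl2_id
    FROM   tbl2
    WHERE  r BETWEEN 9.65 AND 9.67;
    ```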

    And the query is something like:

    SELECT DISTINCT tbl2_id
    FROM   tbl2
    WHERE  r BETWEEN 9.65 AND 9.67;

    Or, if you need more than distinct IDs:

    SELECT *
    FROM   tbl2_parent p
    WHERE  EXISTS (
       SELECT FROM tbl2 t
       WHERE  t.r BETWEEN 9.65 AND 9.67
       AND    t.tbl2_id = p.tbl2_id
       );

    Either query can use very fast index-only scans with tables that are vacuumed enough, or at least index scans.
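    "Vacuumed enough" means the visibility map is current; for a one-off test, something like this would do (a sketch):

    ```sql
    VACUUM (ANALYZE) tbl2;  -- updates visibility map and planner statistics
    ```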


    For your original scenario, you could just use the built-in type point. It consists of two float8 quantities, exactly like your custom row type. Reference first and second number with a subscript like:

    SELECT ('(7,8)'::point)[0];  -- 7
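    The second coordinate works the same way, and the schema from the question could then use point[] instead of the custom type (a sketch, reusing names from the question):

    ```sql
    SELECT ('(7,8)'::point)[1];  -- 8

    CREATE TABLE tbl2_pt (tbl2_id serial, co point[]);
    ```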



    Your query in your original test case can be rewritten as:

    SELECT t.*
    FROM   tbl2 t
    WHERE  EXISTS (
       SELECT FROM unnest(t.co) elem
       WHERE  elem.r BETWEEN 9.65 AND 9.67
       );

    Reading your rationale, though, you might consider PostGIS first, or at least the geometric types built into mainline Postgres.
