Any performance reasons to prefer idx() (intarray) over array_position()?



  • Are there performance reasons (or other reasons) to prefer idx (from https://www.postgresql.org/docs/current/intarray.html ) over array_position to find the position of an integer in a list of integers?

    I ask because we've used idx since before array_position was added to Postgres. It would be nice to use the built-in function rather than relying on the intarray extension, if there is no performance penalty.

    Very informal benchmarking suggests it's comparable. PostgreSQL version is 13.5.



  • There are multiple more or less subtle differences. Some become obvious from looking at the description in the manual:

    https://www.postgresql.org/docs/current/intarray.html#INTARRAY-FUNC-TABLE

    idx ( integer[], item integer ) → integer

    Returns index of the first array element matching item, or 0 if no match.

    https://www.postgresql.org/docs/current/functions-array.html#ARRAY-FUNCTIONS-TABLE

    array_position ( anycompatiblearray, anycompatible [, integer ] ) → integer

    Returns the subscript of the first occurrence of the second argument in the array, or NULL if it's not present. If the third argument is given, the search begins at that subscript. The array must be one-dimensional. Comparisons are done using IS NOT DISTINCT FROM semantics, so it is possible to search for NULL.

    • If not found, idx() returns 0, but array_position() returns null.

    • idx() cannot handle null at all. intarray as a whole disallows arrays with null values. So it also makes no sense to search for it, you always get null when searching for null.
      array_position() allows null.

    • Unlike idx(), array_position() allows an offset in the search as 3rd parameter. So you can iterate through arrays to find one match after the other. idx() can never find elements past the first occurrence. (intarray is mostly optimized to deal with sorted, unique array elements.)

    • OTOH, idx() accepts multi-dimensional arrays (treats them as flat), while array_position() does not.

    • Non-standard array subscripts are ignored by idx() (generally bypassing array dimensions and subscripts), but honored by array_position(). So results can differ for seemingly equivalent, simple expressions! It's best to stick with standard array subscripts, but one must be aware of possibilities. See:

      • https://stackoverflow.com/q/12011569/939860
    • array_position() takes polymorphic arguments, so it works for any array type, while idx() works for integer exclusively. There are pros and cons to either. idx() is faster with function type resolution, and will never be confused with different types, but it's limited to integer: integer or die.

    All the added versatility of array_position() comes at a price. Most importantly, array_position() uses IS NOT DISTINCT FROM semantics to handle null values, and that is substantially more expensive. This leads to the possibly most important difference:

    • idx() is much faster and scales much better with long arrays.
      (Your "very informal benchmarking" mislead you somehow.)

    db<>fiddle https://dbfiddle.uk/?rdbms=postgres_14&fiddle=b5d1bdda3aed30260c604e60fdc6ba99 - demonstrating all.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2