Why is count(x.*) slower than count(*)?



  • explain select count(x.*) from customer x;
    ...
    ->  Partial Aggregate  (cost=27005.45..27005.46 rows=1 width=8)
          ->  Parallel Seq Scan on customer x  (cost=0.00..26412.56 rows=237156 width=994)

    explain select count(*) from customer x;
    ...
    ->  Partial Aggregate  (cost=27005.45..27005.46 rows=1 width=8)
          ->  Parallel Seq Scan on customer x  (cost=0.00..26412.56 rows=237156 width=0)


    Judging by the width in the explain output (994 vs. 0), count(x.*) reads unnecessary row data.

    I thought the two should be identical, but apparently they are not. Why?



  • Logically, both are identical: x.* is a whole-row value, and it never counts as NULL, even when all of its columns are NULL. But Postgres has a separate implementation for count(*).

    It does not bother with any expression at all and only considers the existence of live rows. That's slightly faster, which adds up to a relevant difference over many rows.
    The performance penalty for count(x.*) grows with the number of columns / width of rows, and will be rather substantial for wide rows like yours (width=994).
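
    For instance, a quick way to reproduce the effect yourself (table and column names below are made up for illustration; actual timings depend on your setup):

        -- hypothetical wide test table to make the gap visible
        CREATE TABLE count_test AS
        SELECT g                AS id
             , repeat('x', 500) AS payload1   -- wide text columns inflate row width
             , repeat('y', 500) AS payload2
        FROM   generate_series(1, 1000000) g;

        \timing on
        SELECT count(*)   FROM count_test x;  -- fast path: only checks that a live row exists
        SELECT count(x.*) FROM count_test x;  -- builds a whole-row value per row, then tests it for NULL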

    It's even documented in the manual: https://www.postgresql.org/docs/current/functions-aggregate.html#FUNCTIONS-AGGREGATE-TABLE

    count ( * ) → bigint

    Computes the number of input rows.


    count ( "any" ) → bigint

    Computes the number of input rows in which the input value is not null.

    The gist of it: whenever you don't care whether an expression is NULL, use count(*) instead.
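
    To illustrate the difference in NULL handling with a throwaway example table:

        CREATE TEMP TABLE t (a int, b int);
        INSERT INTO t VALUES (1, 1), (NULL, 2), (NULL, NULL);

        SELECT count(*)   AS all_rows     -- 3: counts every row
             , count(t.a) AS non_null_a   -- 1: skips rows where a IS NULL
             , count(t.*) AS whole_row    -- 3: the whole-row value never counts as NULL,
        FROM   t;                         --    even when all of its columns are NULL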

    Related:

    • https://dba.stackexchange.com/questions/27558/for-absolute-performance-is-sum-faster-or-count/27572#27572

    Some other RDBMS do not have the same fast path for count(*). OTOH, counting all rows in a table is comparatively slow in Postgres due to its MVCC model, which forces a visibility check for every row. See the link below, followed by a quick estimate sketch:

    • https://stackoverflow.com/a/7945274/939860
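
    If an estimate is good enough, the planner's statistics can stand in for an exact count. A minimal sketch, assuming the table has been VACUUMed or ANALYZEd recently enough for reltuples to be current:

        SELECT reltuples::bigint AS estimated_rows
        FROM   pg_class
        WHERE  oid = 'customer'::regclass;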


