Query Performance for Reports in PostgreSQL



  • I use PostgreSQL 12 to store my raw data and Elasticsearch as the data warehouse for reports.

    There are situations involving JOINs (e.g., data that exists in one table but not in another) where Elasticsearch cannot serve me, so I have to run queries that return a large number of rows directly against PostgreSQL.

    Many years ago I worked with plain queries like

    SELECT <columns> FROM <table> WHERE <condition>;

    entirely without LIMIT or OFFSET. This put a high load on the database, which used a lot of memory to return the results, and my web server could not handle the amount of data returned and crashed (memory limit exceeded, for example).

    To solve this, I started working with a paging system: fetching a limited number of records per query and paging the results with LIMIT plus a condition that selects only the records below the last record of the previous query.

    Example:

    -- 1st query
    SELECT <columns> FROM <table> WHERE <condition> LIMIT 512;
    -- 2nd query
    SELECT <columns> FROM <table> WHERE <condition> AND id < last_id_query_1 LIMIT 512;
    -- 3rd query
    SELECT <columns> FROM <table> WHERE <condition> AND id < last_id_query_2 LIMIT 512;

    In my opinion this method does not seem very good. Reading about CURSORs, it seems they do something similar, but with a single query. Several sites also say that cursors perform much better than LIMIT/OFFSET.

    Is this claim true, or is the way I currently work also good?

    Since I cannot run EXPLAIN ANALYZE on each FETCH of a cursor, I have no way of verifying whether it is actually faster and performs better.



  • What you are currently doing is key-set pagination (except you don't show the necessary ORDER BY to make it work correctly; I'm assuming it is really there). With many queries and the appropriate index this can work very well. You mention some concerns about it, but without any details on those concerns it is hard to know how valid and/or addressable they are.
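    For completeness, a key-set page fetch with the ORDER BY and a supporting index spelled out might look like this (the table name items and the page size are illustrative, not from your schema):

    -- Index that lets each page be read with a single descending range scan
    CREATE INDEX ON items (id DESC);

    -- First page
    SELECT * FROM items ORDER BY id DESC LIMIT 512;

    -- Next page: start below the last id seen on the previous page
    SELECT * FROM items WHERE id < :last_seen_id ORDER BY id DESC LIMIT 512;

    The ORDER BY must match the key used in the WHERE condition, otherwise pages can skip or repeat rows.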

    Both cursor and key-set pagination should be much more efficient than OFFSET/LIMIT when there are a large number of pages, but note that in your example there is no OFFSET. There is just a LIMIT and a key-set for where to start. In other words, OFFSET/LIMIT might be bad, but that is not what you are doing.

    The nice thing about key-set pagination over server-side cursor pagination is that the state is entirely and efficiently managed by the client. If the client doesn't come back for the next page for two weeks (or ever), that is of no concern to the server. With a server-side cursor however, you need some mechanism to hold the database connection open between page visits, and reattach to it. You also need some mechanism to declare a cursor abandoned, lest they accumulate and consume an unbounded amount of resources.
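    For comparison, the server-side cursor version of the same report (cursor and table names are illustrative) keeps the paging state on the server:

    BEGIN;
    DECLARE report_cur CURSOR FOR
        SELECT * FROM items ORDER BY id DESC;

    FETCH 512 FROM report_cur;  -- first page
    FETCH 512 FROM report_cur;  -- second page; position is tracked by the server

    CLOSE report_cur;
    COMMIT;

    Without WITH HOLD, the cursor only lives as long as its transaction, which is exactly the connection-holding problem described above.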

    auto_explain will still record plans for cursors, as long as the cursor is closed cleanly, either by an explicit CLOSE or by COMMIT.
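    A minimal auto_explain setup for a test session might look like this (thresholds shown are examples; loading the module per-session requires superuser, otherwise add it to session_preload_libraries):

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;  -- log the plan of every completed statement
    SET auto_explain.log_analyze = on;      -- include actual row counts and timings

    With this in place, the plan for the cursor's query appears in the server log once the cursor is closed.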



