The best approach for optimizing query in large database table



  • I've PostgreSQL database table named worker_activity following structure

       Column      |              Type              | Collation | Nullable |                        
    ---------------+--------------------------------+-----------+----------+
     id            | bigint                         |           | not null | 
     workerName    | character varying(255)         |           | not null |
     workerId      | integer                        |           | not null |
     action        | text                           |           | not null |
     created_at    | timestamp(0) without time zone |           |          |
     updated_at    | timestamp(0) without time zone |           |          |
    

    This table has almost 30 millions of rows and I have the query that creates weekly reports

    select * from "worker_activity" 
      where "created_at" between $1 and $2 
      and ("action" = $3 or "action" = $4) order by "id" asc
    

    So, that "plain" query executes nearly 3 minutes, it's rather slow, isn't it? How can I speed it up using all features PostgreSQL provides? Are window functions or some other aggregation method applicable in my case?

    Additional info as members pointed out:

    • Real row from the table
    id| algorithmName|algorithmId |action |created_at|updated_at
    3 | LiquidityMaker|     1     | {"step_1":{"action":"openOrderSell","orderSum":0.032,"orderPrice":2.3049,"counter":1},"step_2":{"action":"workTime","time":0.03714489936828613}}|2021-07-06 06:49:26|2021-07-06 06:49:26
    
    • psql (PostgreSQL) 10.19 (Ubuntu 10.19-0ubuntu0.18.04.1)
    • Hardware KVM VPS, 32 Gb of RAM, 8 Core CPU, SSD Drive.

    Here is the execution plan:

    explain (analyze, buffers)
    select * from "worker_activity"
    where "created_at" between '2021-07-06 06:49:25' and '2021-07-07 00:00:00'
      and ("action" = 'Start Work' or "action" = 'End Work')
    order by "id" asc;
    
     Sort  (cost=4138424.27..4138424.29 rows=9 width=903) (actual time=235525.506..235571.749 rows=0 loops=1)
       Sort Key: worker_activity_liquidity_maker_1.id
       Sort Method: quicksort  Memory: 25kB
       Buffers: shared hit=120287 read=3750619
       ->  Gather  (cost=1000.00..4138424.12 rows=9 width=903) (actual time=235525.494..235571.733 rows=0 loops=1)
             Workers Planned: 2
             Workers Launched: 2
             Buffers: shared hit=120287 read=3750619
             ->  Append  (cost=0.00..4137423.22 rows=9 width=903) (actual time=235521.443..235521.462 rows=0 loops=3)
                   Buffers: shared hit=120287 read=3750619
                   ->  Parallel Seq Scan on worker_activity_liquidity_maker_1  (cost=0.00..577947.97 rows=1 width=883) (actual time=30871.374..30871.375 rows=0 loops=3)
                         Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at   Parallel Seq Scan on worker_activity_liquidity_maker_2  (cost=0.00..505967.24 rows=1 width=935) (actual time=29927.739..29927.740 rows=0 loops=3)
                         Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at   Parallel Seq Scan on worker_activity_liquidity_maker_3  (cost=0.00..409953.82 rows=1 width=978) (actual time=24560.669..24560.670 rows=0 loops=3)
                         Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at   Parallel Seq Scan on worker_activity_liquidity_maker_4  (cost=0.00..517571.53 rows=1 width=912) (actual time=30845.683..30845.684 rows=0 loops=3)
                         Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at   Parallel Seq Scan on worker_activity_liquidity_maker_5  (cost=0.00..412771.36 rows=1 width=969) (actual time=24726.327..24726.328 rows=0 loops=3)
                         Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at   Parallel Seq Scan on worker_activity_liquidity_maker_6  (cost=0.00..505008.02 rows=1 width=950) (actual time=30233.641..30233.642 rows=0 loops=3)
                         Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at   Parallel Seq Scan on worker_activity_liquidity_maker_7  (cost=0.00..452042.03 rows=1 width=871) (actual time=26881.948..26881.949 rows=0 loops=3)
                         Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at   Parallel Seq Scan on worker_activity_liquidity_maker_8  (cost=0.00..126317.74 rows=1 width=757) (actual time=199.055..199.056 rows=0 loops=3)
                         Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at   Parallel Seq Scan on worker_activity_liquidity_maker_9  (cost=0.00..629843.52 rows=1 width=873) (actual time=37274.984..37274.985 rows=0 loops=3)
                         Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at 


  • That query could be made faster with the following index:

    CREATE INDEX ON worker_activity (created_at, action);
    


Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2