The best approach for optimizing query in large database table
-
I've PostgreSQL database table named worker_activity following structure
Column | Type | Collation | Nullable | ---------------+--------------------------------+-----------+----------+ id | bigint | | not null | workerName | character varying(255) | | not null | workerId | integer | | not null | action | text | | not null | created_at | timestamp(0) without time zone | | | updated_at | timestamp(0) without time zone | | |
This table has almost 30 millions of rows and I have the query that creates weekly reports
select * from "worker_activity" where "created_at" between $1 and $2 and ("action" = $3 or "action" = $4) order by "id" asc
So, that "plain" query executes nearly 3 minutes, it's rather slow, isn't it? How can I speed it up using all features PostgreSQL provides? Are window functions or some other aggregation method applicable in my case?
Additional info as members pointed out:
- Real row from the table
id| algorithmName|algorithmId |action |created_at|updated_at 3 | LiquidityMaker| 1 | {"step_1":{"action":"openOrderSell","orderSum":0.032,"orderPrice":2.3049,"counter":1},"step_2":{"action":"workTime","time":0.03714489936828613}}|2021-07-06 06:49:26|2021-07-06 06:49:26
- psql (PostgreSQL) 10.19 (Ubuntu 10.19-0ubuntu0.18.04.1)
- Hardware KVM VPS, 32 Gb of RAM, 8 Core CPU, SSD Drive.
Here is the execution plan:
explain (analyze, buffers) select * from "worker_activity" where "created_at" between '2021-07-06 06:49:25' and '2021-07-07 00:00:00' and ("action" = 'Start Work' or "action" = 'End Work') order by "id" asc;
Sort (cost=4138424.27..4138424.29 rows=9 width=903) (actual time=235525.506..235571.749 rows=0 loops=1) Sort Key: worker_activity_liquidity_maker_1.id Sort Method: quicksort Memory: 25kB Buffers: shared hit=120287 read=3750619 -> Gather (cost=1000.00..4138424.12 rows=9 width=903) (actual time=235525.494..235571.733 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 Buffers: shared hit=120287 read=3750619 -> Append (cost=0.00..4137423.22 rows=9 width=903) (actual time=235521.443..235521.462 rows=0 loops=3) Buffers: shared hit=120287 read=3750619 -> Parallel Seq Scan on worker_activity_liquidity_maker_1 (cost=0.00..577947.97 rows=1 width=883) (actual time=30871.374..30871.375 rows=0 loops=3) Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at Parallel Seq Scan on worker_activity_liquidity_maker_2 (cost=0.00..505967.24 rows=1 width=935) (actual time=29927.739..29927.740 rows=0 loops=3) Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at Parallel Seq Scan on worker_activity_liquidity_maker_3 (cost=0.00..409953.82 rows=1 width=978) (actual time=24560.669..24560.670 rows=0 loops=3) Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at Parallel Seq Scan on worker_activity_liquidity_maker_4 (cost=0.00..517571.53 rows=1 width=912) (actual time=30845.683..30845.684 rows=0 loops=3) Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at Parallel Seq Scan on worker_activity_liquidity_maker_5 (cost=0.00..412771.36 rows=1 width=969) (actual time=24726.327..24726.328 rows=0 loops=3) Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at Parallel Seq Scan on worker_activity_liquidity_maker_6 (cost=0.00..505008.02 rows=1 width=950) (actual time=30233.641..30233.642 rows=0 loops=3) Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at Parallel Seq Scan on worker_activity_liquidity_maker_7 (cost=0.00..452042.03 rows=1 width=871) (actual time=26881.948..26881.949 rows=0 loops=3) Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at Parallel Seq Scan on worker_activity_liquidity_maker_8 (cost=0.00..126317.74 rows=1 width=757) (actual time=199.055..199.056 rows=0 loops=3) Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at Parallel Seq Scan on worker_activity_liquidity_maker_9 (cost=0.00..629843.52 rows=1 width=873) (actual time=37274.984..37274.985 rows=0 loops=3) Filter: ((created_at >= '2021-07-06 06:49:25'::timestamp without time zone) AND (created_at
-
That query could be made faster with the following index:
CREATE INDEX ON worker_activity (created_at, action);