Optimizing expensive GROUP BY / ORDER BY query on 690,000 row table
I am writing a query to calculate hiscore rankings for a game. It performs a GROUP BY to aggregate hiscore statistics for individual player stats that are stored across 23 different rows (separated by different "skills").
The query I came up with below works:
SELECT * FROM ( SELECT username, SUM(level) total_level, SUM(experience) total_experience, (SELECT rank FROM overall_hiscore_rankings WHERE username = skill_hiscore.username) rank FROM skill_hiscore GROUP BY username ORDER BY rank ) a WHERE rank >= 1 AND rank
Note: overall_hiscore_rankings is a materialized view and the filter at the end is my implementation of the seek method for pagination to prevent expensive LIMIT OFFSET pagination.
I did not notice any problems with this until seeding about ~30,000 players worth of data into my hiscores schema. Because each player has 23 different rows as mentioned above, this leaves us with a table with 690,000 rows.
This now gives us atrocious performance, literally taking my db 100 seconds to execute this query:
Sort (cost=35926899.17..35926907.50 rows=3330 width=35) (actual time=98599.592..98599.595 rows=25 loops=1) Sort Key: ((SubPlan 1)) Sort Method: quicksort Memory: 26kB -> GroupAggregate (cost=0.42..35926704.35 rows=3330 width=35) (actual time=149.023..98599.514 rows=25 loops=1) Group Key: skill_hiscore.username Filter: (((SubPlan 2) >= 1) AND ((SubPlan 3) Index Scan using skill_hiscore_username on skill_hiscore (cost=0.42..46884.88 rows=690063 width=19) (actual time=144.644..293.859 rows=690063 loops=1) SubPlan 1 -> Seq Scan on overall_hiscore_rankings (cost=0.00..567.04 rows=1 width=8) (actual time=0.002..1.592 rows=1 loops=25) Filter: ((username)::text = (skill_hiscore.username)::text) Rows Removed by Filter: 30002 SubPlan 2 -> Seq Scan on overall_hiscore_rankings overall_hiscore_rankings_1 (cost=0.00..567.04 rows=1 width=8) (actual time=0.817..1.633 rows=1 loops=30003) Filter: ((username)::text = (skill_hiscore.username)::text) Rows Removed by Filter: 30002 SubPlan 3 -> Seq Scan on overall_hiscore_rankings overall_hiscore_rankings_2 (cost=0.00..567.04 rows=1 width=8) (actual time=0.816..1.633 rows=1 loops=30003) Filter: ((username)::text = (skill_hiscore.username)::text) Rows Removed by Filter: 30002 Planning Time: 2.820 ms JIT: Functions: 21 Options: Inlining true, Optimization true, Expressions true, Deforming true Timing: Generation 1.248 ms, Inlining 44.936 ms, Optimization 60.995 ms, Emission 38.520 ms, Total 145.700 ms Execution Time: 98623.550 ms
GROUP BYis causing the largest performance drain, but I am not sure how to rewrite this.
Nothing pops out at my that indicates this query would take 100 seconds - I have never seen a query perform so poorly, and frankly trying to optimize this is out of my knowledge.
This query is not backed by any indexes.
My main question is this: Is the lack of an index on this query for the
ORDER BYdata enough to make the runtime of this query 100 seconds?
The correlated subquery is a real drag on performance, and without need.
The lack of indexes, too.
SELECT username , sum(s.level) AS total_level , sum(s.experience) AS total_experience , o.rank FROM overall_hiscore_rankings o JOIN skill_hiscore s USING (username) WHERE o.rank >= 1 AND o.rank
overall_hiscore_rankings.usernameis unique, you can just add
GROUP BY. I'd need table definitions to be sure ...
Ideally, you have indexes like:
CREATE INDEX ON overall_hiscore_rankings (rank, username); CREATE INDEX ON skill_hiscore (username) INCLUDE (level, experience);
The covering index only makes sense if your table is vacuumed enough. Else make it on just
(username)- the index
skill_hiscore_usernamein your query plan seems to cover that. See:
Be sure to have
autovacuumrunning, or run
VACUUM ANALYZEmanually once after creating your test case to get current column statistics and update the visibility map.
BTW, you mention "23 different rows (separated by different skills)", but
WHERE rank >= 1 AND rank seems to select 25 of them. Did you mean
WHERE rank > 1 AND rank < 25?