How to optimize simple table with groups and timestamps (SUM over 15M rows with GROUP BY)
I think I have quite a simple question, but I couldn't find the answer anywhere. I'm using MySQL and I have a simple table:
id  timestamp         groupId  costA  costB  costC  ...
1   2022-02-01 19:45  1        5,13   3,20   30,20  ...
2   2022-02-01 19:45  2        1,13   6,20   40,20  ...
3   2022-02-01 19:45  3        2,13   7,20   50,20  ...
4   2022-02-01 20:00  1        12,23  13,20  20,20  ...
5   2022-02-01 20:00  2        23,23  15,20  22,20  ...
Some rules of the table:
- timestamp always falls on a 15-minute boundary,
- for each timestamp there are about 5000 rows,
- there are > 5k groups,
- there are > 5k new rows every 15 minutes, which makes about 500k new rows a day,
- currently I have one index ON (timestamp, groupId).
I want to query that table in two ways:
SELECT timestamp, SUM(costA), SUM(costB), SUM(costC) FROM table WHERE timestamp BETWEEN :date1 and :date2 AND groupId IN (:idList) GROUP BY timestamp
timestamp         SUM(costA)  SUM(costB)  SUM(costC)  ...
2022-02-01 19:45  5,13        3,20        30,20       ...
2022-02-01 20:00  2,23        13,20       20,20       ...
SELECT groupId, SUM(costA), SUM(costB), SUM(costC) FROM table WHERE timestamp BETWEEN :date1 and :date2 AND groupId IN (:idList) GROUP BY groupId
groupId  SUM(costA)  SUM(costB)  SUM(costC)  ...
1        5,13        3,20        30,20       ...
2        2,23        13,20       20,20       ...
To do so, I created an index ON two columns (timestamp, groupId), but when I run the query over a long date range (for example a month, which covers about 15M rows), MySQL is either very slow (it takes minutes to execute) or a timeout occurs and the SELECT cannot be executed.
I just wonder what I could do to make it really fast, ideally responding in less than 1 second for a month of data. I don't know if the index I have is correct; when I use DESCRIBE, it looks like the correct index is being used. Or maybe I should add more memory to the MySQL server?
Marcee replied:
A summary table will help some of the cases you presented. Are your time ranges always full days? If so, daily subtotals in a summary table would be much faster.
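As a sketch of that idea (table and column names here are assumptions, not from your actual schema), a daily summary table and its maintenance query could look like:

```sql
-- Hypothetical summary table: one row per (day, group) instead of 96.
CREATE TABLE daily_costs (
  day     DATE NOT NULL,
  groupId INT NOT NULL,
  sumA    DECIMAL(12,2) NOT NULL,
  sumB    DECIMAL(12,2) NOT NULL,
  sumC    DECIMAL(12,2) NOT NULL,
  PRIMARY KEY (day, groupId)
);

-- Roll up yesterday's rows once per day (could also run after each 15-minute load).
-- `table` is the placeholder name from the question; TABLE is a reserved word, hence backticks.
INSERT INTO daily_costs (day, groupId, sumA, sumB, sumC)
SELECT DATE(timestamp), groupId, SUM(costA), SUM(costB), SUM(costC)
FROM `table`
WHERE timestamp >= CURDATE() - INTERVAL 1 DAY
  AND timestamp <  CURDATE()
GROUP BY DATE(timestamp), groupId;
```

A month-long query per group then reads about 30 summary rows instead of roughly 2880 detail rows (96 fifteen-minute slots per day), which is why it is so much faster.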
Please provide SHOW CREATE TABLE. With that many rows, let's discuss the datatypes. And if you have an id, whether you can get rid of it -- and promote one of the indexes to be the PK.
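For example, assuming every (timestamp, groupId) pair is unique and id is only a surrogate key, the promotion might look like this (index name is assumed):

```sql
-- Sketch: drop the surrogate id and cluster the table on (timestamp, groupId).
-- Doing all three steps in one ALTER avoids the "auto column must be a key" error.
ALTER TABLE `table`
  DROP PRIMARY KEY,
  DROP COLUMN id,
  ADD PRIMARY KEY (timestamp, groupId);
```

With InnoDB, the data is stored clustered on the primary key, so a WHERE on a timestamp range then reads the matching rows sequentially from disk instead of jumping around via a secondary index.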
How long does the insert of 5K rows take? If needed, we can discuss improvements on that.
Since you have a 2-dimensional index, it may be worth discussing PARTITIONing. But first, will you be deleting "old" rows? If yes, then I will definitely push for
PARTITION BY RANGE (TO_DAYS(..)). How much data will be kept (eg, "one year's worth")? How many different "groups"?
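A sketch of such partitioning (partition names and boundaries are illustrative; note that MySQL requires the partitioning column to be part of every unique key, so this assumes timestamp is in the PK):

```sql
-- Sketch: monthly partitions so old data can be removed cheaply.
ALTER TABLE `table`
PARTITION BY RANGE (TO_DAYS(timestamp)) (
  PARTITION p2022_01 VALUES LESS THAN (TO_DAYS('2022-02-01')),
  PARTITION p2022_02 VALUES LESS THAN (TO_DAYS('2022-03-01')),
  PARTITION pfuture  VALUES LESS THAN MAXVALUE
);

-- Deleting a month of old rows is then a metadata operation, not a 15M-row DELETE:
-- ALTER TABLE `table` DROP PARTITION p2022_01;
```

The win here is mostly in purging: DROP PARTITION is near-instant, whereas a DELETE of millions of rows is slow and bloats the undo log.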
How much RAM? What is the setting of innodb_buffer_pool_size?