How to optimize simple table with groups and timestamps (SUM over 15M rows with GROUP BY)



  • I think I have quite simple question, but I couldn't find the answer anywhere. I'm using MySQL and I have a simple table:

    id timestamp groupId costA costB costC ...
    1 2022-02-01 19:45 1 5,13 3,20 30,20 ...
    2 2022-02-01 19:45 2 1,13 6,20 40,20 ...
    3 2022-02-01 19:45 3 2,13 7,20 50,20 ...
    4 2022-02-01 20:00 1 12,23 13,20 20,20 ...
    5 2022-02-01 20:00 2 23,23 15,20 22,20 ...

    Some rules of the table:

    • timestamp is always the time dividable by 15 minutes.
    • for each timestamp there is about 5000 rows,
    • there are > 5k groups,
    • there are > 5k new rows each 15 minutes, which makes about 500k new rows a day,
    • currently I have one index ON (timestamp, groupId).

    I want to query that table in two ways:

    SELECT timestamp, SUM(costA), SUM(costB), SUM(costC) 
    FROM table 
    WHERE timestamp BETWEEN :date1 and :date2 
    AND groupId IN (:idList)
    GROUP BY timestamp
    

    which returns:

    timestamp SUM(costA) SUM(costB) SUM(costC) ...
    2022-02-01 19:45 5,13 3,20 30,20 ...
    2022-02-01 20:00 2,23 13,20 20,20 ...

    AND

    SELECT groupId, SUM(costA), SUM(costB), SUM(costC) 
    FROM table 
    WHERE timestamp BETWEEN :date1 and :date2 
    AND groupId IN (:idList)
    GROUP BY groupId
    

    which returns

    groupId SUM(costA) SUM(costB) SUM(costC) ...
    1 5,13 3,20 30,20 ...
    2 2,23 13,20 20,20 ...

    To do so - I created an index ON two columns (timestamp, groupId), but when I try to run the query on long date range (for example a month - which makes a sum of 15M rows), the MySQL is either very slow (takes minutes to execute it) or the timeout occurs and select cannot be executed.

    I just wonder what I could do to make it really fast, like respond in less than 1 second for a month set. I don't know if the index I have is correct, but when I use "DESCRIBE" it looks like it's using correct index or maybe I should add more memory to the MySQL server?



  • Add

    INDEX(groupID, timestamp)
    

    It will help some of the cases you presented.

    Are your time ranges always full days? If so, daily subtotals in a summary table would be much faster.

    Please provide SHOW CREATE TABLE. With that many rows, let's discuss the datatypes. And if you have an id whether you can get rid of it -- and promote one of the indexes to be the PK.

    How long does the insert of 5K rows take? If needed, we can discuss improvements on that.

    Since you have a 2-dimensional index, it may be worth discussing PARTITIONing. But first, will you be deleting "old" rows? If yes, then I will definitely push for PARTITION BY RANGE TO_ROWS(..). How much data will be kept (eg, "one year's worth")? How many different "groups"?

    How much RAM? What is the setting of innodb_buffer_pool_size?




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2