Database becoming unresponsive after time (4 to 24 hours)



  • Since a couple of days my mariadb database randomly becomes unresponsive after random intervals (which have been between 4 and 20 hours).

    The database comes back to life when rebooting the ct.
    And after some time these queries start to fail with MySqlException (0x80004005): The Command Timeout expired before the operation completed.
    I have also noticed that sometimes restarting the mariadb-service also gets stuck.

    Below is all information I have on the issue:

    System Info

    Ubuntu: Ubuntu 22.04 LTS
    Mariadb: 10.6.7
    Proxmox: 7.2-3

    Fixing attemps

    • Fresh install (vm) importing old data (only copying the database fivem) using mysqldump

    • mysqlcheck --all-databases --auto-repair (while mysql is working. When it is in the unresponsive state, there's no output.

    Proxmox

    This is a screenshot of all information on the proxmox CT: https://i.imgur.com/nR6n3nt.png

    Data and structure

    It happens in a database called fivem which has multiple tables. Notably only the bigger ones are stuck (meaning that select * from TABLE_NAME will not finish for a long time).

    Phpmyadmin table (content): https://i.imgur.com/cUdhvs1.png
    Phpmyadmin table (size and "is hung"): https://i.imgur.com/C6gxn1N.png

    MariaDB Config

    I have done these modification to the default config:

    skip-name-resolve
    max_allowed_packet      = 1G
    max_connections         = 100000
    table_cache             = 640
    thread_handling         = pool-of-threads
    innodb_buffer_pool_size = 48103633715
    

    INNODB

    SHOW ENGINE INNODB STATUS; was also unresponsive.

    Processlist

    Shows multiple queries on mostly the states commit and sending data (some commits).

    Hastebin-Link to full processlist: https://www.toptal.com/developers/hastebin/sapajapohi.sql

    Syslog

    Shows a lot of Got an error reading communication packets and (This connection closed normally without authentication.
    Note that these happen while mysql is operational too.

    Some examples: https://www.toptal.com/developers/hastebin/rewozofana.yaml

    Mariadb service

    Output while starting: https://www.toptal.com/developers/hastebin/uracoqutab.yaml

    Restarting the mariadb service also did not work. The last warning it shows are: https://www.toptal.com/developers/hastebin/usewerugut.less

    Conclusion

    If anyone has ideas of what the issue might be or just ideas on how to further debug this I would really appreciate it.

    Edit / Updates

    • max_connections has been lowered to 250 but will need to be increased
    • innodb_buffer has also been lowered
    • It has become apparent that shortly after (or right before) the unresponsivness a lot of Disk IO is observed (20x the usual amount)
    • mysql status shows some warnings the quickly increase in size ( https://i.imgur.com/TxMxC35.png ). I am not sure if these could be a cause and how to fix them for now
    • I noticed that restarting the service (mariadb) occasionally got it stuck at Waiting to flush the buffer pool with the last messages being
    May 19 08:11:06 database systemd[1]: Stopping MariaDB 10.6.7 database server...
    May 19 08:11:06 database mariadbd[659]: 2022-05-19  8:11:06 0 [Note] InnoDB: FTS optimize thread exiting.
    May 19 08:11:06 database mariadbd[659]: 2022-05-19  8:11:06 0 [Note] InnoDB: Starting shutdown...
    May 19 08:11:06 database mariadbd[659]: 2022-05-19  8:11:06 0 [Note] InnoDB: Dumping buffer pool(s) to /var/lib/mysql/ib_buffer_pool
    May 19 08:11:06 database mariadbd[659]: 2022-05-19  8:11:06 0 [Note] InnoDB: Buffer pool(s) dump completed at 220519  8:11:06
    


  • I have probably found the cause and it's a weird edge case.

    I have the cts (from proxmox) running on lvm instead of lvm-thin. And had a backup job using snapshot for the database every 4 hours. My guess is that since snapshot was unavailable it froze the ct for the backup and somehow mariadb did not like that.

    I have now changed that to a backup once a day with the shutdown option.



Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2