Mysql 5.6 Memory usage keeps increasing till it is killed by OOM killer
we have a production mysql 5.6.46, which is under heavy load (2000 Queries per second). The memory allocation to Mysql keeps on increasing over a period of time (sometimes in 1 day, sometime in 1 week) and the server gets auto restarted. Operating System is Ubuntu 20:04 LTS.
All the tables are Innodb tables. Total Ram on the server is 512 GB. InnoDB Buffer is set to 350 GB (we have tried changing it 400 GB, 300 GB, doesnt seem to affect the issue). Overall index size is close to 300 GB. Data size is 1.5 TB
Last we added a swap space to stop OOM Killer from killing the server, but even the additional 500 GB of swap becomes full over few additional hours/days and the server crashes (Also when the swapping takes place, the queries hangs for a few seconds). The time taken to crash is also not consistent, it has crashed in 12-14 hours on few occasions and ran OK for a week.
I am pretty sure it is either a memory leak in this mysql version, or we are running some crazy queries which should not be running. The application is pretty old and heavy and while a new complete revamp on the way, this needs to run for another 6-8 months. Checking every query is going to be challenge.
Here are the server variables and global status, if you can help on what configurations we can experiment with, it would be a great help.
Demir last edited by
Analysis of GLOBAL STATUS and VARIABLES:
- Version: 5.6.46-log
- 512 GB of RAM
- Uptime = 03:30:58; Please rerun SHOW GLOBAL STATUS after several hours.
- 1930 QPS
The More Important Issues:
If "transparent huge pages" is turned on in the OS, see if turning it off will prevent the OOMs. (There is some chatter about this as a problem.)
After 3.5 hours, the buffer pool was not filled up. I suggest lowering
innodb_buffer_pool_sizeto 280G as a way to at least delay OOM. (Just a delay, not a cure.)
table_open_cache is now 20000is much higher than
Opened_tables; recommend lowering to
3000. Also, I suggest
table_open_cache_instances = 16
innodb_purge_threads = 4
host_cache_size = 1000-- This might help with the high
Do you have some kind of "connection pooling" turned on? Is it set too high?
The "binlog" is turned on (
log_bin); is it being used for Replication? Or something else?
If you are having issues with "slow queries" there are several settings that need changing to turn on the slowlog to help locate such queries.
tmp_table_sizeis currently only 1MB. That is abnormally small, especially for a machine as big as yours. It may be slowing down some querys. Recommend
Details and other observations:
( table_open_cache ) = 20,000-- Number of table descriptors to cache -- Several hundred is usually good.
( innodb_buffer_pool_instances ) = 64-- For large RAM, consider using 1-16 buffer pool instances, not allowing less than 1GB each. Also, not more than, say, twice the number of CPU cores. -- Recommend no more than 16. (Beginning to go away in 10.5)
( innodb_lru_scan_depth * innodb_buffer_pool_instances ) = 1,024 * 64 = 65,536-- A metric of CPU usage. -- Lower either number.
( innodb_lru_scan_depth ) = 1,024-- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth
( Innodb_buffer_pool_pages_free / Innodb_buffer_pool_pages_total ) = 7,598,545 / 22937568 = 33.1%-- Pct of buffer_pool currently not in use -- innodb_buffer_pool_size (now 375809638400) is bigger than necessary?
( Com_rollback ) = 59,051 / 12658 = 4.7 /sec-- ROLLBACKs in InnoDB. -- An excessive frequency of rollbacks may indicate inefficient app logic.
( Handler_rollback ) = 31,472 / 12658 = 2.5 /sec-- Why so many rollbacks?
( Innodb_row_lock_time_max ) = 51,005-- Max time to lock a row (millisec) -- Possibly conflicting queries; possibly table scans.
( innodb_flush_neighbors ) = 1-- A minor optimization when writing blocks to disk. -- Use 0 for SSD drives; 1 for HDD.
( innodb_strict_mode ) = innodb_strict_mode = OFF-- ON would catch some subtle errors earlier. -- OFF leaves some warnings as warnings; ON makes them errors.
( sync_binlog ) = 0-- Use 1 for added security, at some cost of I/O =1 may lead to lots of "query end"; =0 may lead to "binlog at impossible position" and lose transactions in a crash, but is faster. 0 is OK for Galera.
( innodb_adaptive_hash_index ) = innodb_adaptive_hash_index = ON-- Usually should be ON. -- There are cases where OFF is better. See also innodb_adaptive_hash_index_parts (after 5.7.9) and innodb_adaptive_hash_index_partitions (MariaDB and Percona). ON has been implicated in rare crashes (bug 73890). 10.5.0 decided to default OFF.
( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF-- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.
( innodb_purge_threads ) = 1-- Number of threads to clean up history list. -- If you have a lot of writes, recommend 4 in versions 5.6 and 10.0 or later.
( max_connections ) = 2,000-- Maximum number of connections (threads). Impacts various allocations. -- If max_connections (now 2000) is too high and various memory settings are high, you could run out of RAM.
( local_infile ) = local_infile = ON-- local_infile (now ON) = ON is a potential security issue
( Created_tmp_tables ) = 2,256,582 / 12658 = 178 /sec-- Frequency of creating "temp" tables as part of complex SELECTs.
( Created_tmp_disk_tables ) = 19,445 / 12658 = 1.5 /sec-- Frequency of creating disk "temp" tables as part of complex SELECTs -- increase tmp_table_size (now 1048576) and max_heap_table_size (now 16777216). Check the rules for temp tables on when MEMORY is used instead of MyISAM. Perhaps minor schema or query changes can avoid MyISAM. Better indexes and reformulation of queries are more likely to help.
( Com_rollback / (Com_commit + Com_rollback) ) = 59,051 / (294527 + 59051) = 16.7%-- Rollback : Commit ratio -- Rollbacks are costly; change app logic
( Com_insert + Com_delete + Com_delete_multi + Com_replace + Com_update + Com_update_multi ) = (635420 + 10508 + 0 + 0 + 2733673 + 0) / 12658 = 266 /sec-- writes/sec -- 50 writes/sec + log flushes will probably max out I/O write capacity of normal drives
( binlog_error_action ) = binlog_error_action = IGNORE_ERROR-- What to do if the binlog cannot be written. -- IGNORE_ERROR is the default for backwards compatibility, but ABORT_SERVER is recommended.
( binlog_format ) = binlog_format = STATEMENT-- STATEMENT/ROW/MIXED. -- ROW is preferred by 5.7 (10.3)
( simplified_binlog_gtid_recovery ) = 0-- Percona: Perform a faster crash recovery. -- Set to 1 to speed up binlog recovery with/without GTIDs after a crash.
( slow_query_log ) = slow_query_log = OFF-- Whether to log slow queries. (5.1.12)
( long_query_time ) = 10-- Cutoff (Seconds) for defining a "slow" query. -- Suggest 2
( log_slow_slave_statements ) = log_slow_slave_statements = OFF-- (5.6.11, 5.7.1) By default, replicated statements won't show up in the slowlog; this causes them to show. -- It can be helpful in the slowlog to see writes that could be interfering with Replica reads.
( Max_used_connections ) = 831-- High-water mark for connections -- Lots of inactive connections is OK; over 100 active connections is likely to be a problem. Max_used_connections (now 831) does not distinguish them; Threads_running (now 9) is instantaneous.
( Max_used_connections / host_cache_size ) = 831 / 703 = 118.2%-- Increase host_cache_size (now 703)
( Aborted_clients ) = 5,129 / 12658 = 0.41 /sec-- Threads bumped due to wait_timeout -- Increase wait_timeout (now 28800); be nice, use disconnect
( thread_cache_size / Max_used_connections ) = 1,000 / 831 = 120.3%-- There is no advantage in having the thread cache bigger than your likely number of connections. Wasting space is the disadvantage.
Com_show_tables = 0 Handler_read_rnd_next / Handler_read_rnd = 2.91 Innodb_log_writes / Innodb_log_write_requests = 0.05% innodb_lru_scan_depth / innodb_io_capacity = 0.341 min(max_heap_table_size, tmp_table_size) = 1MB query_cache_limit = 0
( Innodb_pages_read + Innodb_pages_written ) / Uptime = 1,385 Binlog_cache_use = 173 /sec Bytes_sent = 6391896 /sec Com_kill = 0.48 /sec Com_set_option = 716 /sec Com_show_charsets = 0.25 /sec Com_show_collations = 0.25 /sec Com_update = 215 /sec Created_tmp_files = 0.32 /sec Handler_delete = 1387 /sec Handler_read_key = 171995 /sec Handler_read_next = 9216407 /sec Handler_read_prev = 41634 /sec Handler_read_rnd = 162780 /sec Innodb_buffer_pool_bytes_data = 19093702 /sec Innodb_buffer_pool_pages_data = 1.48e+7 Innodb_buffer_pool_pages_dirty = 198,092 Innodb_buffer_pool_pages_free = 7.6e+6 Innodb_buffer_pool_pages_misc = 587,553 Innodb_buffer_pool_pages_total = 2.29e+7 Innodb_buffer_pool_read_ahead = 44 /sec Innodb_buffer_pool_read_requests = 3098244 /sec Innodb_buffer_pool_write_requests = 74119 /sec Innodb_buffer_pool_write_requests / Innodb_buffer_pool_pages_flushed = 321 Innodb_data_read = 18940790 /sec Innodb_data_reads = 1154 /sec Innodb_data_writes - Innodb_log_writes - Innodb_dblwr_writes = 230 /sec Innodb_data_written = 8925503 /sec Innodb_dblwr_pages_written = 230 /sec Innodb_log_write_requests = 3442 /sec Innodb_os_log_written / (Uptime / 3600) / innodb_log_files_in_group = 2,359.1MB Innodb_pages_created = 11 /sec Innodb_pages_read = 1154 /sec Innodb_pages_read + Innodb_pages_written = 1385 /sec Innodb_pages_written = 230 /sec Innodb_rows_deleted = 1387 /sec Innodb_rows_deleted + Innodb_rows_inserted = 1435 /sec Innodb_rows_read = 9861927 /sec Innodb_rows_updated = 372 /sec Select_range = 73 /sec Sort_range = 50 /sec Threads_cached = 687 innodb_open_files = 20,000 innodb_read_io_threads = 64 innodb_write_io_threads = 64 optimizer_trace_offset = --1 performance_schema_max_cond_instances = 28,100 performance_schema_max_file_instances = 92,308 performance_schema_max_socket_instances = 4,020 performance_schema_max_thread_instances = 4,100 table_definition_cache = 10,000
optimizer_trace_features = greedy_search=on, range_optimizer=on, dynamic_range=on, repeated_subselect=on slave_rows_search_algorithms = TABLE_SCAN,INDEX_SCAN