MySQL replication setup based on an LVM snapshot – am I forgetting anything?



  • I'm tasked with starting a brand new replica of a client's MySQL 5.7.35 database server. While I've observed others doing this in the past in various circumstances, I haven't been in charge of this operation before, so I'd just like to confirm that my plan is correct and that I'm not missing anything crucial.

    The data lives in an XFS-formatted LVM logical volume, inside a volume group that has enough free space to enable taking an LVM snapshot.

    The database is a legacy mixture of MyISAM and InnoDB tables, over 200 GB in size at the moment. The replica server is set up with an identical MySQL version, and all of the prerequisites for row-based binlog replication (replica user, server ID, log_bin etc.) are in order.

    My process is this:

    1. On the master, run FLUSH TABLES WITH READ LOCK. Leave the client session open.
      • According to https://www.percona.com/blog/2012/03/23/how-flush-tables-with-read-lock-works-with-innodb-tables/ , it's important to try to prevent long-running SELECT queries from being in the middle of execution when this is run, so we'll make an effort to ensure this.
    2. In another client session (don't know why, but the docs say so), run SHOW MASTER STATUS to get the binlog file name and position.
    3. Create the LVM snapshot from the MySQL data volume. (Does not contain binlogs or relay logs.)
    4. In the original MySQL client session, run UNLOCK TABLES to restore normal operation.
    5. Mount the snapshot (-t xfs -o nouuid) and rsync the data directory from it onto the (shut down) replica server – excluding auto.cnf and the autogenerated .pem files (we're not using SSL to connect to MySQL, so they shouldn't particularly matter).
    6. Unmount and lvremove the snapshot.
    7. Start up MySQL on the replica and do the normal binlog replication initialization steps as outlined in the MySQL documentation, providing the MASTER_LOG_FILE and MASTER_LOG_POS options to CHANGE MASTER with the values from step 2, and then just wait for the replica to catch up.

    I've tested this process to work on a trafficless staging server, but my main concern is FLUSH TABLES WITH READ LOCK on the actual production instance. Is it sufficient to prevent long-running SELECTs or is there some other danger I need to be aware of? Any clients attempting writes will naturally fail while the lock is in place, but the goal is to get steps 1–4 finished in under 30 seconds, which is an acceptable partial outage for us.

    Also, since we're running mixed MyISAM and InnoDB, is FLUSH TABLES WITH READ LOCK all we need to do, or does one of the engines require something more to ensure a consistent LVM snapshot?



  • FLUSH TABLES WITH READ LOCK is sufficient. This locks the tables, preventing new writes, and it also makes sure any other writes to tables are flushed to the disk volume for MyISAM tables. For InnoDB tables, some changes may still be in the buffer pool, but those change are also accounted for by the InnoDB redo log, so as you start up the replica those changes will be restored automatically (this is the same as InnoDB crash recovery).

    Using SHOW MASTER STATUS during the read lock is necessary because you need those coordinates for your CHANGE MASTER command on the replica. It's basically analogous to a bookmark, so the replica knows where to start reading in the binary log.

    It is still important to listen to the advice about long-running queries blocking FLUSH TABLES WITH READ LOCK. Any query, even a read-only SELECT, holds a metadata lock on a table. But FTWRL requires no metadata locks held on any tables, at least briefly. So a long-running SELECT will block FTWRL. You can do an experiment to demonstrate this on a test instance of MySQL:

    In one window, create a MyISAM table:

    mysql> create table test.m (i int) engine=myisam;
    mysql> insert into test.m values (42);
    

    Query it in a way that will last some time:

    mysql> select sleep(120) from m;
    (hangs)
    

    In a second window, try FTWRL:

    mysql> flush tables with read lock;
    (hangs)
    

    The FTWRL is waiting for that query to finish, so FTWRL can get its own turn at the metadata lock. If that query takes a long time to finish, FTWRL will still be waiting, and also any other queries will be queued up waiting for FTWRL to release its metadata locks on all tables.

    Percona added a feature to their version of MySQL to help in these situations. It's a lock used during backups which does not suffer this blocking behavior. Read https://www.percona.com/blog/2014/03/11/introducing-backup-locks-percona-server-2/ for details if you're interested, but if you use mainstream MySQL, you don't have this feature.

    The LVM snapshot method is a pretty good solution given that you have a mix of InnoDB and MyISAM tables, because the lock can be done pretty briefly, just long enough to acquire the LVM snapshot and read the MASTER STATUS.

    I'm accustomed to using Percona XtraBackup, which does not require locking, as long as the database stores only InnoDB tables (except for the mysql system tables, which are very small, so they are not a problem to back up during a brief read lock). It's my preference to insist that all tables are stored in the InnoDB. MyISAM has poor performance and is susceptible to data corruption.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2