Does deleting multiple rows in one statement reduce the size of the binlog



  • I need to remove a large amount of data, but need to do so without overwhelming the amount of data in the binlog. From the binlog's perspective are the following 2 equivalent?

    delete from table sample where id in (1, 2,3)
    

    and

    delete from table sample where id = 1; 
    delete from table sample where id = 2;
    delete from table sample where id = 3;
    


    • First, I'll say that the answer is "it depends". With binlog_format = "STATEMENT", the statements are put in the binlog; hence the second format is bigger.
    • With "ROW" format, information about each row is put in the binlog. So, the number of bytes going into the binlog is the same either way. However, there is likely to be some overhead on each statement. So again, I think the second format will be at least slightly bigger.
    • Are the statement(s) inside a "transaction"? If not, then each statement is a separate transaction, which does have some overhead in the binlog. Furthermore, if the binlog is being used for replication, the ACID semantics are different.
    • A million-row DELETE is very heavyweight when executed -- it must save all the old rows, delete them, then toss the saved copies if you actually COMMIT (versus ROLLBACK). My point is that there are other issues you may need to consider in choosing between the two approaches. I like to delete no more than 1000 rows at a time.
    • If you have expire... set, the binlogs will be 'rotated' periodically, so it mostly does not matter how bulky they are.

    What is your concern? Maybe I left out something.




Suggested Topics

  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2
  • 2