How can MySQL be optimized for unattended operation on unreliable hardware?



  • I run some personal services on a not-entirely-reliable single-board computer. Sometimes it locks up and needs to be power cycled.

    One of these services is currently configured to use MySQL with InnoDB, keeping the database on an ext4 (IIRC) MicroSD card.

    My fundamental design concern here is that I need to give this setup as little attention as possible. I don't care if the database comes up completely empty, or with random stuff missing, but I need it to come up every time without human intervention.

    InnoDB does not want to work this way, and moreover is suspiciously vulnerable to data corruption in a way that seems unusual for an application designed to fsync often and journal correctly.

    Today the complaint was that it couldn't apply its redo log because, as I understood it, something was missing from the middle of the sequence number space in the log. Then it stopped, as if this were something I could be expected to fix, rather than evidence of a bug in InnoDB (or maybe an actual bad sector).

    So my questions are:

    1. How do I best configure MySQL to run 100% unattended, so that instead of failing safe to protect my data it is willing to throw away (or stash elsewhere, or work from a last known good snapshot of) a corrupted database rather than just halting? If MySQL can't do that, is there another SQL server that can?

    2. How can I configure InnoDB (or a more reliable MySQL storage engine) to be less likely to get into weird corrupt states like this in the first place? I should be able to stand there and pull the SD card in and out and never lose a transaction, if syncing and journaling are implemented properly, right? Is there some kind of flag I need to be setting in MySQL or on the filesystem to make something atomic? Or do MySQL and InnoDB not actually promise to be safe against unexpected shutdown/disk disconnection, under any configuration?
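
    For what it's worth, the durability-relevant InnoDB knobs look roughly like the sketch below. These are real MySQL options, and the values shown are the full-durability settings (most are already the defaults on modern MySQL); note that none of this helps if the SD card's controller acknowledges a flush before the data actually hits flash, which cheap cards are known to do:

    ```ini
    [mysqld]
    # Flush and fsync the redo log at every commit (the durable setting;
    # 0 or 2 trade durability for speed).
    innodb_flush_log_at_trx_commit = 1
    # Doublewrite buffer guards against torn pages on power loss (default ON).
    innodb_doublewrite = ON
    # Bypass the OS page cache for data files so fsync means what it says.
    innodb_flush_method = O_DIRECT
    # If binary logging is enabled, sync it at every commit too.
    sync_binlog = 1
    # Last-resort startup mode for a server that refuses to start: raise in
    # steps (1..6), dump what is readable, then re-initialize.
    # Keep it at 0 (or commented out) for normal operation.
    # innodb_force_recovery = 1
    ```

    `innodb_force_recovery` is the closest thing stock MySQL has to "come up anyway": it skips progressively more of crash recovery so you can at least `mysqldump` what survives, but it's meant for manual salvage, not unattended operation.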



  • Consider a Galera Cluster spread across 3 such pieces of crappy hardware.

    It can keep running if any one node dies. If that node somehow spontaneously comes back to life, the other two will help it get caught up with recent changes. Ditto if the dead node is manually rebooted.

    If two nodes die, you are SOL.

    With 5 nodes, the Cluster could survive 2 failures before needing manual intervention. (7 or more nodes may run into other issues; I don't recommend it.)
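
    A minimal per-node configuration for such a cluster might look like the following sketch; the options are real Galera/wsrep settings, but the paths, addresses, and names are placeholders you'd substitute for your own:

    ```ini
    [mysqld]
    # Galera requires InnoDB and row-based replication.
    default_storage_engine = InnoDB
    binlog_format = ROW
    innodb_autoinc_lock_mode = 2

    # Galera provider and cluster membership (example values).
    wsrep_on = ON
    wsrep_provider = /usr/lib/galera/libgalera_smm.so
    wsrep_cluster_name = sbc_cluster
    wsrep_cluster_address = gcomm://10.0.0.1,10.0.0.2,10.0.0.3
    wsrep_node_address = 10.0.0.1
    # State snapshot transfer method used to re-seed a rejoining node.
    wsrep_sst_method = rsync
    ```

    The rejoin behavior described above is what `wsrep_sst_method` governs: a node that was only briefly down catches up incrementally (IST), while one that fell too far behind gets a full state snapshot (SST) from a surviving node.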



