    Quick fix for watchdog/backup interaction; use a script lock. · 72b4ba32
    Leigh B Stoller authored
    From Slack:
    
    What I notice is that mysqldump is read locking all of the tables for a
    long time. This time gets longer and longer of course as the DB gets
    bigger. Last night enough stuff backed up (trying to get various write
    locks) that we hit the 500 thread limit. I only know this because mysql
    prints "killing 501" threads at 2:03am. Which makes me wonder if our
    thread limit is too small (but seems like it would have to be much
    bigger) or if our backup strategy is inappropriate for how big the DB is
    and how busy the system is. But to be clear, I am not even sure if
    mysqld throws in the towel when it hits 500 threads; I am in the midst
    of reading obscure mysql documentation. There are a bunch of other
    error messages that I do not understand yet.
    
    I can reproduce this in my elabinelab with a 10 line perl script. Two
    problems: 1) we do not use the permission system, so we cannot
    use dynamic permissions, which means that the single thread that is left
    for just this case can be used by anyone, and so the server is fully
    out of threads. And 2) the Emulab mysql watchdog then cannot perform its
    query, and so it thinks mysqld has gone catatonic and kills it, right in
    the middle of the backup. Yuck * 2.
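
The script lock named in the commit title can be sketched roughly as follows. This is an illustration only, not the actual Emulab change: it is written in Python rather than the project's own scripts, and the lock path and the names `script_lock`, `run_backup`, and `watchdog_check` are all hypothetical. The idea is that the backup holds an exclusive lock on a file while it dumps, and the watchdog tries to take the same lock without blocking; if it cannot, it assumes a backup is in progress and skips its liveness check instead of killing mysqld.

```python
import fcntl
import os
from contextlib import contextmanager

# Hypothetical lock file path; the real scripts may use a different location.
LOCK_PATH = "/tmp/mysql-backup.lock"


@contextmanager
def script_lock(path, blocking=True):
    """Take an exclusive flock() on path; yield True if the lock was acquired."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    held = False
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        try:
            fcntl.flock(fd, flags)
            held = True
        except BlockingIOError:
            # Another script holds the lock and we asked not to wait.
            held = False
        yield held
    finally:
        if held:
            fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def run_backup(dump, path=LOCK_PATH):
    """Hold the script lock for the whole duration of the dump."""
    with script_lock(path):
        return dump()


def watchdog_check(probe, path=LOCK_PATH):
    """Skip the liveness probe while a backup holds the lock."""
    with script_lock(path, blocking=False) as held:
        if not held:
            return "skipped"
        return "ok" if probe() else "restart"
```

Here `dump` and `probe` are caller-supplied callables (for example, one that runs mysqldump and one that issues the watchdog query). The watchdog holds the lock while it probes, so a backup that starts at the same moment waits for the probe to finish rather than racing it.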
    
    And if anyone is curious about a more typical approach: "If you want to
    do this for MyISAM or mixed tables without any downtime from locking the
    tables, you can set up a slave database, and take your snapshots from
    there. Setting up the slave database, unfortunately, causes some
    downtime to export the live database, but once it's running, you should
    be able to lock its tables, and export using the methods others have
    described. When this is happening, it will lag behind the master, but
    won't stop the master from updating its tables, and will catch up as
    soon as the backup is complete."
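
For reference, the replica-based approach in the quote could look roughly like the sketch below. It is not something this commit implements. The host name, dump path, and the helpers `replica_backup_plan` and `run_plan` are hypothetical, and the sketch assumes the older `STOP SLAVE` / `START SLAVE` statement names (newer MySQL releases spell these `STOP REPLICA` / `START REPLICA`). It only builds the command lines and hands them to a caller-supplied runner, so nothing touches a database unless the caller chooses to execute them.

```python
def replica_backup_plan(replica_host, dump_path):
    """Build the three commands: pause replication, dump, resume replication."""
    pause = ["mysql", f"--host={replica_host}",
             "--execute=STOP SLAVE SQL_THREAD"]
    dump = ["mysqldump", f"--host={replica_host}", "--all-databases",
            "--lock-all-tables", f"--result-file={dump_path}"]
    resume = ["mysql", f"--host={replica_host}",
              "--execute=START SLAVE SQL_THREAD"]
    return [pause, dump, resume]


def run_plan(plan, runner):
    """Run the steps in order; always attempt the final resume step."""
    *steps, resume = plan
    try:
        for step in steps:
            runner(step)
    finally:
        # Resume replication even if the dump failed, so the replica
        # does not stay paused and fall further behind the master.
        runner(resume)
```

A caller could pass `lambda cmd: subprocess.run(cmd, check=True)` as `runner` to execute the plan, or `print` to inspect it first. The table locks taken by mysqldump then land on the replica only, which is why the master keeps accepting writes during the dump.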