Leigh B. Stoller authored
mysqld hangs that cause the entire system to grind to a halt. The basic theory of operation is as follows:

* Once a minute, fork a child (protected by a 60-second timeout) that connects to the DB and issues a simple query. If the child can access the DB, it exits with a zero status.

* If the alarm fires, the child is killed. This indicates that mysqld is no longer responding within a reasonable amount of time (60 seconds), so we shift into trying to restart mysqld:

  * Send mysqld a TERM. Wait 30 seconds.
  * Try the query again. Typically the situation will not have changed one bit, but I do it anyway.
  * If mysqld is still running, send it a kill -9. Wait 15 seconds.
  * Start mysqld. Wait 5 seconds.
  * Try the query again. If it succeeds, we are done, and no one will have to deal with it Sunday morning at 6am (thanks Tim).
  * If the query still fails, send email and give up trying to fix anything.

The daemon continues to query the DB once a minute; once the query succeeds (because a human fixed things up), the daemon returns to its normal mode (attempt to fix things the next time the query fails).

So, the problem is what happens when someone kills off mysqld for some other reason. It may be that this daemon should try to restart mysqld if and only if it actually killed a running mysqld. Comments?
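The check-and-restart cycle described above can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: it assumes a POSIX system (it uses Python's "fork" start method to mirror the fork-a-child design), and every name here (`probe_ok`, `MysqldOps`, `try_recover`, `Watchdog`, `only_if_running`) is hypothetical. The mysqld controls are injected as callables so the state machine can be exercised without a real server. The `only_if_running` flag is one possible reading of the proposal in the last paragraph: leave mysqld alone if it was already down when the probe failed.

```python
import multiprocessing as mp
import time
from dataclasses import dataclass
from typing import Callable

# Assumes POSIX: the "fork" start method mirrors the original fork-a-child
# design and lets the probe be any callable (even a lambda).
_ctx = mp.get_context("fork")


def probe_ok(probe: Callable[[], None], timeout: float = 60.0) -> bool:
    """Run `probe` in a forked child; True only if it exits 0 within `timeout`."""
    child = _ctx.Process(target=probe)
    child.start()
    child.join(timeout)
    if child.is_alive():
        # Timeout expired ("the alarm fired"): the query hung, so kill the child.
        child.kill()
        child.join()
        return False
    return child.exitcode == 0


@dataclass
class MysqldOps:
    """Hypothetical hooks for controlling mysqld; all names are illustrative."""
    is_running: Callable[[], bool]
    send_term: Callable[[], None]
    send_kill: Callable[[], None]  # i.e. kill -9
    start: Callable[[], None]
    notify: Callable[[str], None]  # e.g. send email
    sleep: Callable[[float], None] = time.sleep


def try_recover(check: Callable[[], bool], ops: MysqldOps,
                only_if_running: bool = False) -> bool:
    """One restart attempt; returns True if the DB answers queries again."""
    if only_if_running and not ops.is_running():
        # Proposed variant: mysqld was stopped by someone else, so leave it alone.
        ops.notify("mysqld is not running; not restarting it")
        return False
    ops.send_term()
    ops.sleep(30)
    if check():  # rarely helps, but cheap to try
        return True
    if ops.is_running():
        ops.send_kill()
        ops.sleep(15)
    ops.start()
    ops.sleep(5)
    if check():
        return True
    ops.notify("mysqld is not responding; automatic restart failed")
    return False


class Watchdog:
    def __init__(self, check: Callable[[], bool], ops: MysqldOps,
                 only_if_running: bool = False):
        self.check = check
        self.ops = ops
        self.only_if_running = only_if_running
        self.gave_up = False

    def tick(self) -> None:
        """One pass of the once-a-minute loop."""
        if self.check():
            self.gave_up = False  # healthy again (e.g. a human fixed it): re-arm
            return
        if self.gave_up:
            return  # still broken; keep waiting for a human
        if not try_recover(self.check, self.ops, self.only_if_running):
            self.gave_up = True

    def run_forever(self, interval: float = 60.0) -> None:
        while True:
            self.tick()
            self.ops.sleep(interval)
```

In a real deployment the check would be something like `lambda: probe_ok(run_simple_query)`, where `run_simple_query` is a hypothetical function that connects to the DB, issues the query, and raises on failure. The `gave_up` flag is what keeps the daemon from looping on restarts after emailing: it keeps probing, but only re-arms automatic recovery once a probe succeeds again.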
c47cefa1