mysqld hangs that cause the entire system to grind to a halt. The
basic theory of operation is like this:
* Once a minute, fork a child (protected by a 60-second timeout) to
connect to the DB and issue a simple query. If the child can access
the DB okay, it exits with a zero status.
* If the alarm fires, the child is killed. This indicates that mysqld
is no longer responding in a reasonable amount of time (60 seconds).
We shift into trying to restart mysqld:
* Send mysqld a TERM. Wait for 30 seconds.
* Try query again; typically, the situation will not have changed one
bit, but I do it anyway.
* If mysqld is still running, send it a kill -9. Wait for 15 seconds.
* Start mysqld. Wait for 5 seconds.
* Try query again. If query succeeds, we are done, and no one
will have to deal with it Sunday morning at 6am (thanks Tim).
* If query still fails, send email and give up trying to fix
anything. The daemon continues to query the DB once a minute;
once the query succeeds (because a human fixed things up), the
daemon goes back into its normal mode (attempt to fix things
next time it fails).
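
The steps above can be sketched roughly as follows. This is only an
illustration in Python, not the daemon's actual code: the names
probe_ok, try_restart and watchdog_step are made up for the example,
and the probe command and the TERM/kill/start actions are passed in as
callables rather than spelled out. Two assumptions beyond the text: a
non-zero exit from the child is treated the same as a timeout, and a
successful retry after the TERM ends the restart attempt early.

```python
import subprocess
import time


def probe_ok(cmd, timeout=60):
    """Run the probe in a child process; True only if it exits 0 in time."""
    try:
        # subprocess.run kills the child itself when the timeout expires.
        return subprocess.run(cmd, timeout=timeout).returncode == 0
    except subprocess.TimeoutExpired:
        return False


def try_restart(probe, is_running, send_term, send_kill, start,
                sleep=time.sleep):
    """One restart attempt. Returns True if the DB answers again."""
    send_term()
    sleep(30)
    if probe():              # rarely changes anything, but checked anyway
        return True          # assumption: success here ends the attempt
    if is_running():
        send_kill()          # the kill -9 step
        sleep(15)
    start()
    sleep(5)
    return probe()


def watchdog_step(state, probe, restart, notify):
    """One once-a-minute tick; state is "normal" or "gave_up"."""
    if probe():
        return "normal"      # healthy, or a human fixed it: re-arm
    if state == "gave_up":
        return "gave_up"     # already sent email; keep polling only
    if restart():
        return "normal"
    notify("mysqld is not responding and the restart failed")
    return "gave_up"
```

A driver would call watchdog_step once a minute, feeding the returned
state back in. Keeping the "gave_up" state separate is what stops the
daemon from sending email or restarting on every tick while it waits
for a human.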
So, the problem is what happens when someone kills off mysqld for some
other reason. It may be that this daemon should try to restart mysqld
only if it actually killed a running mysqld. Comments?
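
One way that rule could look, as a hedged Python sketch and not a
settled design: check for a running mysqld before sending any signal,
and skip the whole restart sequence if there is none. The function name
restart_only_if_killed and the injected callables (probe, is_running,
send_term, send_kill, start) are hypothetical stand-ins for
illustration.

```python
import time


def restart_only_if_killed(probe, is_running, send_term, send_kill, start,
                           sleep=time.sleep):
    """Returns "skipped", "recovered" or "failed"."""
    if not is_running():
        # No mysqld to kill: assume someone stopped it on purpose.
        return "skipped"
    send_term()
    sleep(30)
    if probe():
        return "recovered"
    if is_running():
        send_kill()          # the kill -9 step
        sleep(15)
    # We took down a running mysqld ourselves, so bringing it back is on us.
    start()
    sleep(5)
    return "recovered" if probe() else "failed"
```

The cost is that a mysqld that crashed on its own would also be left
down, so the "skipped" case probably still wants an email.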