    qemu-ga: persist tracking of fsfreeze state via filesystem · f789aa7b
    Michael Roth authored
    
    
    Currently, qemu-ga may die/get killed/go away for whatever reason after
    guest-fsfreeze-freeze has been issued, and before guest-fsfreeze-thaw
    has been issued. This means the only way to unfreeze the guest is via
    VNC/network/console access, but obtaining that access after-the-fact can
    often be very difficult when filesystems are frozen. Logins will almost
    always hang, for instance. In many cases the only recourse would be to
    reboot the guest without any quiescing of volatile state, which makes
    this a corner-case worth giving some attention to.
    
    A likely failsafe for this situation would be to use a watchdog to
    restart qemu-ga if it goes away. There are some precautions qemu-ga
    needs to take in order to avoid immediately hanging itself on I/O,
    however. Namely, we must disable logging and defer processing/creation
    of user-specific logfiles, along with creation of the pid file if we're
    running as a daemon. We also need to disable non-fsfreeze-safe commands,
    as we normally would when processing the guest-fsfreeze-freeze command.
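
    As a rough illustration of the precautions above (this is not the code
    from this patch), the startup decisions could be captured in a small
    helper. The struct and function names below are hypothetical, and the
    pid file only matters when running as a daemon.

```c
#include <stdbool.h>

/* Hypothetical summary of what the agent may safely do at startup. */
struct startup_plan {
    bool enable_logging;        /* open the user-specific logfile now? */
    bool create_pidfile;        /* write the pid file now (daemon mode)? */
    bool allow_unsafe_commands; /* permit non-fsfreeze-safe commands? */
};

/*
 * If filesystems are still frozen, defer or disable everything that could
 * trigger a write and therefore hang on I/O.
 */
static struct startup_plan plan_startup(bool frozen)
{
    struct startup_plan plan = {
        .enable_logging = !frozen,
        .create_pidfile = !frozen,
        .allow_unsafe_commands = !frozen,
    };
    return plan;
}
```

    In this sketch, the deferred steps would be carried out once the thaw
    has completed.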
    
    To track when we need to do this in a way that persists between multiple
    invocations of qemu-ga, we create a file on the guest filesystem before
    issuing the fsfreeze, and delete it when doing the thaw. On qemu-ga
    startup, we check for the existence of this file to determine
    the need to take the above precautions.
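
    A minimal sketch of this marker-file scheme, assuming plain POSIX calls.
    The helper names are hypothetical and are not taken from the qemu-ga
    source; the caller supplies the marker path.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Create the marker before freezing; returns 0 on success, -1 on error. */
static int marker_create(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0600);
    if (fd < 0) {
        return -1;
    }
    return close(fd);
}

/* Remove the marker after thawing; returns 0 on success, -1 on error. */
static int marker_remove(const char *path)
{
    return unlink(path);
}

/* Existence check only: the file is never opened or read. */
static bool marker_exists(const char *path)
{
    return access(path, F_OK) == 0;
}
```

    With these helpers, a freeze would call marker_create() first, a thaw
    would call marker_remove() last, and startup would call marker_exists().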
    
    We're forced to do it this way since a more traditional approach such as
    reading/writing state to a dedicated state file will cause
    access/modification time updates, respectively, both of which will hang
    if the file resides on a frozen filesystem. Both can occur even if
    relatime is enabled. Checking for file existence will not update the
    access time, however, so it's a safe way to check for fsfreeze state.
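
    The claim about access times can be checked with a small experiment.
    The following is only a sketch with a hypothetical helper name: it
    compares a file's access time before and after an existence check.

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns true if the file's access time is the same before and after an
 * existence check, false if it changed or the file cannot be stat'ed.
 */
static bool existence_check_keeps_atime(const char *path)
{
    struct stat before, after;

    if (stat(path, &before) != 0) {
        return false;
    }
    (void)access(path, F_OK);   /* existence check; no open() or read() */
    if (stat(path, &after) != 0) {
        return false;
    }
    /* st_atime has one-second granularity, which is enough here. */
    return before.st_atime == after.st_atime;
}
```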
    
    An actual watchdog-based restart of qemu-ga can itself cause an access
    time update that would thus hang the invocation of qemu-ga, but the
    logic to work around that can be handled via the watchdog, so we don't
    address that here. (For relatime, we'd periodically touch the qemu-ga
    binary if the file $qga_statedir/qga.state.isfrozen is not present. This
    avoids qemu-ga updates or the 1 day relatime threshold causing an
    access-time update if we try to respawn qemu-ga shortly after it goes
    away.)
    
    Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>