clientside/tmcc/ubuntu16/fixup-fstab-swaps · 923fddebfb0c350db3a65ba099b59ec5285dbc89 · emulab / emulab-stable

Attempt to safely work around systemd swap service on Ubuntu 16. · 7afd5f24

David Johnson authored Aug 09, 2016

systemd.swap is one of systemd's special builtin services. Basically,
swap devices are parsed out of fstab, or found by examining a disk's
GPT. Any such devices are turned into instantiated units. This happens
via the systemd-fstab-generator (and systemd-gpt-auto-generator for the
GPT case). Generators in systemd are almost
uncontrollable. They run immediately, prior to on-disk unit file
parsing, and all you can do is disable or replace them. You cannot
express dependencies on the resulting units (unless you write your own
generator). Generators also run in an impoverished environment (think
read-only /etc), so we cannot just add another generator that does
basically what fixup-fstab-swaps does. Finally, we cannot write a
template unit file for all swap devices (we would use this to inject a
blocking dependency so that these swap units don't conflict with us).
Lennart has recognized the value in this, but thought the implementation
effort would be pretty hard. This makes sense, because the generators
run prior to unit
file load from disk (and presumably that would nix templates for
generated units)... and I gather there are other problems as well.

This is quite problematic for us because we rely on the ability to
update /etc/fstab with the name of the real swap device, and to mkswap
on it. However, on machines with lots of cores, systemd is at its
parallelizing best, and inevitably systemd tries to start up one of its
instantiated swap device units at the same time as our fixup-fstab-swaps
script is running.

So I've done several things to try to deal with this situation. First,
this Ubuntu 16-specific version of fixup-fstab-swaps no longer adds a
swap line to fstab with options=defaults -- instead it uses
options=noauto,x-emulab-auto. The noauto causes systemd's instantiated
swap units to not automatically run on boot (don't worry, they become
active if fixup-fstab-swaps swapons them, and thus they get swapped off
prior to umount -- important that happens to avoid hangs); but our
script will swapon the noauto,x-emulab-auto swap partitions as if they'd
had options=defaults|auto. What this does break is swapon/swapoff -a, but
who cares. The x-* comment option in fstab is something I didn't know
about, I'll admit.
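
As a rough illustration of the fstab convention described above, here
is a minimal Python sketch. It is not the actual fixup-fstab-swaps
code, which is not shown here and may well be written in another
language. The function names and the tab-separated layout are
assumptions; only the option string noauto,x-emulab-auto comes from the
text.

```python
# Option string taken from the commit message; everything else is illustrative.
EMULAB_OPTS = "noauto,x-emulab-auto"


def swap_fstab_line(device):
    """Build an fstab swap entry that is not auto-activated at boot."""
    return f"{device}\tnone\tswap\t{EMULAB_OPTS}\t0\t0"


def emulab_swap_devices(fstab_text):
    """Return the devices of swap entries that carry the x-emulab-auto marker."""
    devices = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4 or fields[2] != "swap":
            continue  # not a swap entry
        if "x-emulab-auto" in fields[3].split(","):
            devices.append(fields[0])
    return devices
```

A script following this convention would then run swapon on each
returned device explicitly, since swapon -a skips noauto entries.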

Second, I've made emulab-fstab-fixup.service Conflict with
swap.target, but also be pulled in by swap.target! The hope was that
this would ensure that our service *always* runs successfully, even if
it kills off swap.target to "handle" the conflict. Well, the problem is
that we need to Conflict with the instantiated swap unit files, not
swap.target... so I think that isn't really working. But I left it in
-- maybe it is helping us win races.
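
To Conflict with the instantiated swap units directly, one would need
their names, which systemd derives from the device path by its
path-escaping rules. The Python sketch below is a simplified
approximation of that escaping, for illustration only. The function
name is hypothetical, "." and ".." path components are not normalized,
and systemd-escape --path --suffix=swap remains the authoritative way
to get the name.

```python
def swap_unit_name(device_path):
    """Approximate the systemd unit name for a swap device path."""
    # Drop leading, trailing, and duplicate slashes.
    parts = [p for p in device_path.split("/") if p]
    if not parts:
        return "-.swap"  # the root path escapes to "-"
    out = []
    for i, ch in enumerate("/".join(parts)):
        if ch == "/":
            out.append("-")  # path separators become dashes
        elif ch.isascii() and (ch.isalnum() or ch in ":_.") and not (i == 0 and ch == "."):
            out.append(ch)  # safe characters pass through unchanged
        else:
            # Everything else, including a literal "-", becomes \xXX per byte.
            out.extend("\\x%02x" % b for b in ch.encode("utf-8"))
    return "".join(out) + ".swap"
```

For example, /dev/sda3 maps to dev-sda3.swap, while a by-uuid path
picks up \x2d escapes for each dash it contains.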

The one thing I cannot block is that systemd looks at the partition
types of at least one of our hardware types (d820) and generates swap
unit files by the partition UUID. How it is doing this, I have no idea
-- that behavior is only supposed to happen if your disk is GPT. So we
get failures on the d820s from the systemd instantiated swap units on
first boot, but our scripts always do the right thing.
