1. 11 Aug, 2016 4 commits
  2. 10 Aug, 2016 2 commits
    • Mike Hibler's avatar
      Rejiggered reload_daemon to enforce a max time. · b6d272a2
      Mike Hibler authored
      There are now some sitevars to control its behavior, the one of interest here
      is reload/failtime:
      
      The way the reload daemon is supposed to work now is that nodes will be
      started on their reloading adventure with an os_load. If they are still there
      after reload/retrytime minutes, then they will either be rebooted (if the
      os_load was successful) or os_load'ed again (if the first os_load failed
      outright). The logic for either of these is that there might have been some
      transient condition that caused the failure. If we do have to perform this
      "retry" then we will send email to testbed-ops if reload/warnonretry is set.
      If, after another reload/retrytime minutes, a node is still there, then the
      node will be sent to hwdown, possibly powering it off or booting it into the
      admin MFS depending on the setting of reload/hwdownaction.
      
      So really, reload/failtime should not be needed. All nodes should exit
      reloading in 2 * reload/retrytime minutes. But it is there as a backstop
      (and because I didn't understand the logic of the reload daemon at first!)
      Well, it also comes into play if the reload daemon is restarted after being
      down for a long period of time. In this case, all nodes in reloading will
      get moved to hwdown. May need to reconsider this...
      b6d272a2
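The retry policy described in this commit message can be sketched as a small decision function. This is only an illustration of the prose, not the reload daemon's actual code: the `ReloadingNode` fields, the `next_action` name, and the assumption that reload/failtime is measured in minutes from when the node entered reloading are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReloadingNode:
    """Hypothetical view of a node sitting in the reloading experiment."""
    node_id: str
    minutes_in_reloading: float   # time since the node entered reloading
    minutes_since_action: float   # time since the last os_load or reboot
    os_load_ok: bool              # False if the first os_load failed outright
    retried: bool = False         # True once the single retry has been used


def next_action(node, retrytime, failtime):
    """Return 'wait', 'reboot', 'os_load', or 'hwdown' for one node.

    retrytime and failtime stand in for the reload/retrytime and
    reload/failtime sitevars; treating failtime as minutes is an assumption.
    """
    # Backstop: a node stuck longer than failtime goes to hwdown regardless.
    if node.minutes_in_reloading >= failtime:
        return "hwdown"
    # Still inside the current retry window: leave the node alone.
    if node.minutes_since_action < retrytime:
        return "wait"
    # First window expired: retry once, rebooting if the os_load itself worked.
    if not node.retried:
        return "reboot" if node.os_load_ok else "os_load"
    # Second window expired as well: give up on the node.
    return "hwdown"
```

With retrytime set to 15, a node whose os_load succeeded is rebooted at the 15 minute mark and sent to hwdown if it is still in reloading 15 minutes after that, which matches the "2 * reload/retrytime" bound mentioned above.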
    • David Johnson's avatar
      Make sure udev settles enough to get us network interfaces. · 58db6edd
      David Johnson authored
      Ok, it seems that sometimes the network.target runs before network
      devices have fully finished going through udev.  I think what goes on
      here is that udev can "settle" (meaning there are no events), but there
      will still be some events in the future.
      
      So now in the special networking-emulab.service, we settle AND wait for
      at least one auto, non-lo interface to appear via ifquery.
      58db6edd
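The "settle AND wait" idea can be sketched as a polling loop with a deadline. The real networking-emulab.service is presumably a shell wrapper around `udevadm settle` and `ifquery`; the Python below is a hypothetical stand-in, and the function names, timeout, and polling interval are assumptions. It relies on `ifquery --list` printing the interfaces marked auto, one per line.

```python
import subprocess
import time


def list_auto_ifaces():
    """List 'auto' interfaces by running `ifquery --list` (one name per line)."""
    result = subprocess.run(["ifquery", "--list"],
                            capture_output=True, text=True, check=False)
    return result.stdout.split()


def wait_for_iface(list_ifaces=list_auto_ifaces, timeout=30.0, interval=0.5,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll until at least one non-lo interface is listed, or the timeout hits.

    Returns the non-lo interface names, or an empty list on timeout.
    """
    deadline = clock() + timeout
    while True:
        names = [name for name in list_ifaces() if name != "lo"]
        if names:
            return names
        if clock() >= deadline:
            return []
        sleep(interval)
```

The lister, sleep, and clock are injectable so the loop can be exercised without a real udev or ifupdown installation.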
  3. 09 Aug, 2016 11 commits
    • David Johnson's avatar
      Attempt to safely work around systemd swap service on Ubuntu 16. · 7afd5f24
      David Johnson authored
      systemd.swap is one of its special builtin services.  Basically, swap
      devices are parsed out of fstab, or by examining a disk's GPT.  Any such
      devices are turned into instantiated units.  This happens via the
      systemd-fstab-generator.  Generators in systemd are almost
      uncontrollable.  They run immediately, prior to on-disk unit file
      parsing, and all you can do is disable or replace them.  You cannot
      express dependencies on the resulting units (unless you write your own
      generator).  Generators also run in an impoverished environment (think
      read-only /etc), so we cannot just add another generator that does
      basically what fixup-fstab-swaps does.  Finally, we cannot write a
      template unit file for all swap devices (we would use this to inject a
      blocking dependency so that these swap units don't conflict with us).
      Lennart has recognized the value in this, but thought the impl effort is
      pretty hard.  This makes sense, because the generators run prior to unit
      file load from disk (and presumably that would nix templates for
      generated units)... and I gather there are other problems as well.
      
      This is quite problematic for us because we rely on the ability to
      update /etc/fstab with the name of the real swap device, and to mkswap
      on it.  However, on machines with lots of cores, systemd is at its
      parallelizing best, and inevitably systemd tries to start up one of its
      instantiated swap device units at the same time as our fixup-fstab-swaps
      script is running.
      
      So I've done several things to try to deal with this situation.  First,
      this Ubuntu 16-specific version of fixup-fstab-swaps no longer adds a
      swap line to fstab with options=defaults -- instead it uses
      options=noauto,x-emulab-auto .  The noauto causes systemd's instantiated
      swap units to not automatically run on boot (don't worry, they become
      active if fixup-fstab-swaps swapons them, and thus they get swapped off
      prior to umount -- important that happens to avoid hangs); but our
      script will swapon the noauto,x-emulab-auto swap partitions as if they'd
      had options=defaults|auto.  What this does break is swapon/off -a --- but
      who cares.  The x-* comment option in fstab is something I didn't know
      about, I'll admit.
      
      Second, I've made emulab-fstab-fixup.service Conflict with
      swap.target, but also be pulled in by swap.target!  The hope was that
      this would ensure that our service *always* runs successfully, even if
      it kills off swap.target to "handle" the conflict.  Well, the problem is
      that we need to Conflict with the instantiated swap unit files, not
      swap.target... so I think that isn't really working.  But I left it in
      -- maybe it is helping us win races.
      
      The one thing I cannot block is that systemd looks at the partition
      types of at least one of our hardware types (d820) and generates swap
      unit files by the partition UUID.  How it is doing this, I have no idea
      -- that behavior is only supposed to happen if your disk is GPT.  So we
      get failures on the d820s from the systemd instantiated swap units on
      first boot, but our scripts always do the right thing.
      7afd5f24
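The fstab side of this workaround can be sketched as follows. This is not the fixup-fstab-swaps script itself (its implementation is not shown here); the helper names are hypothetical. The sketch rewrites the options field of swap entries to `noauto,x-emulab-auto`, then finds the entries carrying the `x-emulab-auto` marker, which are the ones a script would swapon itself.

```python
EMULAB_OPTS = "noauto,x-emulab-auto"  # option string quoted in the commit message


def rewrite_swap_line(line):
    """Return an fstab line with its options replaced if it is a swap entry."""
    fields = line.split()
    if line.lstrip().startswith("#") or len(fields) < 4 or fields[2] != "swap":
        return line  # comments, blank lines, and non-swap entries are untouched
    fields[3] = EMULAB_OPTS
    return "\t".join(fields)


def emulab_swap_devices(fstab_text):
    """Return the devices of swap entries tagged with x-emulab-auto."""
    devices = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if line.lstrip().startswith("#") or len(fields) < 4:
            continue
        if fields[2] == "swap" and "x-emulab-auto" in fields[3].split(","):
            devices.append(fields[0])
    return devices
```

Entries that keep `defaults` are deliberately ignored by `emulab_swap_devices`, since only the tagged entries are meant to be activated by the Emulab script.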
    • Leigh Stoller's avatar
      Watch for unqualified interface names (not node_id:iface_id). Mostly · aabf7bb1
      Leigh Stoller authored
      we see that kind of interface naming and that is how geni-lib does it.
      But need to accept that syntax.
      aabf7bb1
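Accepting both spellings can be sketched with a small parser. The function name and the idea of falling back to a caller-supplied node are assumptions for illustration, not the actual change in this commit.

```python
def split_iface_name(name, default_node=None):
    """Split 'node_id:iface_id' into its parts.

    An unqualified name (no colon) is returned with default_node as the node.
    """
    node_id, sep, iface_id = name.partition(":")
    if sep:
        return node_id, iface_id
    return default_node, name
```

For example, `split_iface_name("node1:eth0")` yields the node and interface separately, while `split_iface_name("eth0", "node1")` fills in the node from context.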
    • David Johnson's avatar
      Control yet another systemd/udev race/dependency on Ubuntu 16. · 3ed87a0f
      David Johnson authored
      The stock Ubuntu 16 networking.service only runs `udevadm settle` if
      there are 'auto ...' stanzas in /etc/network/interfaces .  Well, we got
      rid of that a few commits ago, and now let udev rules populate
      /etc/network/interfaces (really /run/emulab-interfaces.d-auto-added/*).
      So, it's either hack the networking.service unit file to force udev to
      settle, and have it blown away on package update; or add a
      networking-emulab.service that has to run before networking.service to
      force udev to settle.  We *always* want udev to settle on any Emulab
      node before bringing up interfaces, just in case the control net NIC is
      slow for whatever reason.
      3ed87a0f
    • Mike Hibler's avatar
      b6abcfd2
    • David Johnson's avatar
      Allow the systemd fstab swap fixer to mkswap/swapon! · 492a748d
      David Johnson authored
      I cannot find why we called fixup-fstab-swaps with '-E' (which means,
      don't mkswap/swapon any swaps).  The only thing I can think of is that
      perhaps running swapon manually made the systemd dev-*.swap targets
      unhappy.  However, it is necessary to mkswap if the swap device didn't
      exist, because systemd will not mkswap for you, AFAIK -- it will only
      swapon.  On Ubuntu 16, the dev-*.swap targets are happy whether they or
      Emulab does the swapon.  If that's not true on CentOS 7 or other
      systemds, we may have to revisit this tweak.
      492a748d
    • David Johnson's avatar
      3229712e
    • Mike Hibler's avatar
      Duh! FreeBSD prepare does not need to remove Linux-generated swap lines! · d3da3fd1
      Mike Hibler authored
      Put that code in the Linux prepare instead.
      d3da3fd1
    • Mike Hibler's avatar
      Fix /etc/fstab swap line removal for Linux images. · 7186318f
      Mike Hibler authored
      The comment line is different.
      7186318f
    • Mike Hibler's avatar
      New sitevars for reload daemon. · c7f6e63d
      Mike Hibler authored
      c7f6e63d
    • Mike Hibler's avatar
      Allow root to run node_admin. · fc3b111b
      Mike Hibler authored
      fc3b111b
    • David Johnson's avatar
      Ensure the local fs is up before whacking fstab in swap fixup. · cbd967df
      David Johnson authored
      The remount-root-fs unit changed names in 16 to systemd-remount-fs ,
      and I didn't see the race in the first round of testing, I guess.
      cbd967df
  4. 08 Aug, 2016 2 commits
    • David Johnson's avatar
      Handle control net on newer systemd/udevds correctly on Ubuntu. · 9c456003
      David Johnson authored
      Prior to this commit, in Ubuntu 16, our control net hook was getting
      invoked accidentally by udev rules that look for bridge ports or vlan
      ports via ifquery.  Those rules call ifquery -l, but do not add
      the --no-mappings argument to skip mapping processing --- and thus our
      mapping hook got run.  But it was not getting run via systemd's
      networking.service, which is where it needs to run.  That service
      guarantees that udev has 'settled' (flushed its event queue it
      accumulated during boot), which is important for devices with slow
      firmware/drivers/etc.
      
      Sadly, our mapping hook could *not* get run by the normal
      networking.service, because we cannot predict the control net device
      name (the possibilities are determined now by hardware and firmware, and
      could range from enoX to enpXsYfZdA).  ifup -a requires that the real
      device name be present and be set to auto in /etc/network/interfaces.
      You can run ifup -a --force to bring up a non-existent device, but you
      cannot bring it down with ifdown.  Interestingly enough, ifquery does
      not require that all 'auto' devices it returns be real devices, and
      that's why things were working.
      
      First, we have to make sure our findcnet hook does not run via the
      builtin udev rules.  That's easy; we fixed up findcnet to look for some
      udev/systemd env vars, and do nothing in that case.  Hopefully we got
      env vars that are always present...
      
      There are basically 3 strategies we can try after that.  We can make
      our own networking-emulab.service that brings up and down the Emulab
      control net, and make networking.service pull that in.  This way,
      'service networking restart' or 'systemctl restart networking.service'
      would still work.  However, ifup/ifdown would not work, because the
      control net iface is not present in /etc/network/interfaces.  So nix
      that.
      
      Two other options require us to dynamically edit /etc/network/interfaces
      on first boot of a debian/systemd machine, to place all ethernet devices
      into it along with our mapping hook and set them to auto, *and* to
      remove those customizations in prepare.  This sort of sucks, but it
      doesn't suck much worse than if prepare fails in some other part of the
      process.  What is more, we can make it suck less by always checking to
      assure ourselves that the real control net device is present in
      /etc/network/interfaces, and is present on the system.  If we encounter
      anything to the contrary, we can recreate the Emulab section from
      scratch.  Thus if there are prepare failures, the image will still boot
      because any inconsistent cruft will get wiped away.  We can do this
      either by adding a networking-emulab.service that runs and finishes
      prior to networking.service, OR we can add a udev rule that calls a
      script to ensure all ethernet devices are added to
      /etc/network/interfaces prior to running.  At this point, I favor the
      latter approach, if we can guarantee that it finishes prior to anything
      looking at /etc/network/interfaces.  We can't guarantee anything about
      udev events being "finished" for a subsystem, AFAIK.
      
      Finally (and the best way), we can use yet another interfaces(5)
      mechanism and some strategic udev rules of our own!  We add udev rules
      (/etc/udev/rules.d/99-emulab-control-network.rules) that populate the
      /run/emulab-interfaces.d-auto-added dir (listed as a source dir in
      /etc/network/interfaces for the ifup/ifdown/ifquery commands below) with
      files that contain simply 'auto <IFACE>'.  Those rules are careful to do
      only that for certain valid wired Ethernet devices (and deliberately not
      wireless devices!).  Then, once we've got 'auto ...' stanzas for each
      possible Ethernet device, we can continue to utilize the mapping stanzas
      below like previous versions of this file did.  And we don't have
      anything to clean out on reboot or on image capture, because /run is
      automatically cleared.  ifup/ifdown/ifquery are not bothered by the
      absence of the sourced directory in /run, if that didn't exist for any
      reason.
      
      If you need to add another foo* device name, you'll need to edit the
      interfaces file (with another mapping stanza) and update the match rules
      in 99-emulab-control-network.rules .
      9c456003
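The per-device action of the final approach (drop a file containing only 'auto <IFACE>' into the sourced /run directory) can be sketched like this. The real work is done by udev rules, presumably calling a small helper; the Python function below is a hypothetical equivalent, and the choice to name each file after its interface is an assumption.

```python
import os
import tempfile

RUN_DIR = "/run/emulab-interfaces.d-auto-added"  # directory named in the commit


def add_auto_stanza(iface, directory=RUN_DIR):
    """Write a one-line 'auto <iface>' file for iface and return its path."""
    if not iface or "/" in iface or iface in (".", ".."):
        raise ValueError(f"unsafe interface name: {iface!r}")
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, iface)
    with open(path, "w") as f:
        f.write(f"auto {iface}\n")
    return path


# Usage against a scratch directory rather than the real /run path.
with tempfile.TemporaryDirectory() as scratch:
    with open(add_auto_stanza("eno1", scratch)) as f:
        print(f.read(), end="")
```

Because the files live under /run, nothing has to be cleaned up at reboot or image capture, which is the property the commit message highlights.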
    • Gary Wong's avatar
  5. 06 Aug, 2016 1 commit
  6. 04 Aug, 2016 1 commit
  7. 03 Aug, 2016 4 commits
    • David Johnson's avatar
      Fix findcnet for newer udev reliable fixed device names. · bbb4ebe0
      David Johnson authored
      In the latest udev world, udev generates predictable device names using
      firmware info and/or PCI bus info (e.g., eno1 or enps4f0).  So, we now
      try to run dhclient only on real ethernet devices (i.e., eth*, en*,
      sl*).  There are other kinds of ethernet devices (e.g., wireless: wl*,
      ww*) or virtual devices, but we don't care about finding the control net
      on those.  Might need to add another device name prefix for PV devices
      in Xen guests... we'll see.
      bbb4ebe0
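The device-name filter described above can be sketched as a prefix check. The prefixes come from the commit message (eth*, en*, sl*); the function name is hypothetical and findcnet itself may implement the check differently.

```python
CNET_PREFIXES = ("eth", "en", "sl")  # prefixes listed in the commit message


def cnet_candidates(names):
    """Keep only interface names that could be the wired control net."""
    return [name for name in names if name.startswith(CNET_PREFIXES)]
```

On a Linux node the input list could come from `os.listdir("/sys/class/net")`; loopback, wireless (wl*, ww*), and bridge names fall through the filter.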
    • David Johnson's avatar
      Workaround dhclient/resolvconf problem in Ubuntu 16. · c2bd98f6
      David Johnson authored
      This replaces the first attempt, which just masked the race condition,
      since I didn't understand what tmcc bossinfo was really doing.  This
      appears to fix it satisfactorily for now; it doesn't seem that we will
      run into the case where the file exists but has no nameserver.
      
        resolvconf on Linux also breaks DNS momentarily via dhclient exit
        hook, or something.  On Ubuntu 16, resolvconf is set up to run via
        dhclient enter hook (the hook redefines make_resolv_conf, which
        dhclient-script eventually executes prior to the exit hook execution).
        For whatever reason, though, sometimes when our exit hook (this
        script) runs, /etc/resolv.conf is a dangling symlink.  I was not able
        to find the source of the asynch behavior, so I can't say for sure.
        But sethostname.dhclient is an immediate casualty, because it calls
        tmcc bossinfo(), and the tmcc binary attempts to use res_init and read
        the resolver and use that as boss.  If there is no /etc/resolv.conf
        (or it is a broken symlink into /run, as it is on resolvconf systems
        before resolvconf runs for the first time on boot), res_init will
        return localhost, and there is no way for us in tmcc to know that is
        inappropriate (taking the res_init resolver might not be the best
        choice, but we do not dare to add a special-case rejection of
        localhost in tmcc).
      c2bd98f6
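A guard against the dangling-symlink case can be sketched as a readiness check that an exit hook might run before calling tmcc. This is not the fix made in this commit (which is not shown here); the function name is hypothetical, and the extra nameserver check goes beyond what the message says is needed.

```python
import os


def resolver_ready(path="/etc/resolv.conf"):
    """Return True if path resolves to a file with at least one nameserver."""
    # os.path.exists follows symlinks, so a dangling symlink counts as missing.
    if not os.path.exists(path):
        return False
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "nameserver":
                return True
    return False
```

A caller could poll this for a short time and only then invoke anything that depends on res_init, rather than letting it fall back to localhost.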
    • Mike Hibler's avatar
      Update pubsub port for KEEPALIVE fixes. · a3ea0297
      Mike Hibler authored
      a3ea0297
    • Leigh Stoller's avatar
  8. 02 Aug, 2016 1 commit
  9. 01 Aug, 2016 1 commit
    • Leigh Stoller's avatar
      Small DB changes for supporting secure transfer of datasets between · 43c7c976
      Leigh Stoller authored
      clusters using credentials to provide permission to access the datasets.
      
      * Add authority_urn to the images table, which is the urn of the origin
        dataset (similar to the slice urn, the Portal mints a credential in
        its namespace, so that the Portal always has permission to do anything
        it wants to the dataset at the remote cluster).
      
      * Add slot to the apt_datasets table to store a credential from the
        cluster where the dataset lives. This credential gives the owner
        permission to download the dataset, which the portal will delegate to
        any cluster that might need to get that dataset.
      43c7c976
  10. 29 Jul, 2016 8 commits
  11. 28 Jul, 2016 5 commits