1. 16 Oct, 2019 2 commits
  2. 14 Oct, 2019 5 commits
  3. 10 Oct, 2019 1 commit
  4. 09 Oct, 2019 1 commit
  5. 30 Sep, 2019 2 commits
    • Minor tweak for the FreeNAS 11.2-STABLE train. · 78be7b03
      Mike Hibler authored
      This appears to be backward compatible with the 11-STABLE train we are
      running currently.
    • Implement an "on server" strategy for copying persistent datasets. · fce7c7c7
      Mike Hibler authored
      This is implemented as a variant of createdataset. If you do:
      
          createdataset -F pid/old pid/new
      
      It will create a new dataset, initializing it with the contents of old.
      The new dataset will of course have the same size, type, and filesystem type
      (if any). Right now the old and new both have to be in the same project, and
      new gets placed in the same pool on the same server (i.e., this is a local
      "zfs send | zfs recv" pipeline).
      
      Implementing copy as a variant of create will hopefully make it easy for
      Leigh in the portal interface, since he doesn't have to treat it any
      differently than a normal create: fire it off in the background and wait
      until the lease state becomes "valid".
      
      Since a copy could take hours or even days, there are plenty of opportunities
      for failure that I have not considered much yet, e.g., the storage server or
      boss rebooting in the middle. These things could happen already, but we have
      just made the window of opportunity much larger.
      
      Anyway, this mechanism can serve as the basis for creating persistent datasets
      from clones or other ephemeral datasets.
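
      At its core, the on-server copy is just a local ZFS send/receive. A
      minimal sketch (pool and dataset names are illustrative; the real
      createdataset also handles size, type, and filesystem details):

          zfs snapshot mypool/pid/old@copy
          zfs send mypool/pid/old@copy | zfs recv mypool/pid/new
          zfs destroy mypool/pid/old@copy   # source snapshot no longer needed
          zfs destroy mypool/pid/new@copy   # recv leaves a matching snapshot behind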
  6. 04 Sep, 2019 1 commit
    • Make vnodesetup and mkvnode consistent in their use of signals. · 62865d37
      Mike Hibler authored
      The meanings of USR1 and HUP were reversed between the two.
      In particular, HUP meant "destroy vnode" to mkvnode instead of USR1.
      This had a particularly bad side-effect since HUPs tend to get flung
      around willy-nilly when the physical machine goes down.
      
      This caused a vnode to get destroyed when we rebooted a shared vnode
      host. See emulab-devel issue 521 for details.
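
      A minimal sh sketch of the agreement the fix enforces. The handler names
      and the exact HUP action are assumptions (the message only says that HUP
      must not mean "destroy vnode"); the real scripts are Perl:

          #!/bin/sh
          destroy_vnode() { echo "USR1: tear the vnode down"; exit 0; }   # hypothetical helper
          reconfig_vnode() { echo "HUP: benign action, never destroys"; } # hypothetical helper
          trap destroy_vnode USR1
          trap reconfig_vnode HUP
          while :; do sleep 5; done    # hang around waiting for signals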
  7. 27 Aug, 2019 1 commit
  8. 26 Aug, 2019 2 commits
  9. 21 Aug, 2019 2 commits
    • Scaling work on blockstore setup and teardown along with a couple of bug fixes. · eb0ff3b8
      Mike Hibler authored
      Previously we were failing on an experiment with 40 blockstores. Now we can
      handle at least 75. We still fail at 100 due to client-side timeouts, but that
      would be addressed with a longer timeout (which involves making new images).
      
      Attacked this on a number of fronts.
      
      On the infrastructure side:
      
      - Batch destruction calls. We were making one ssh call per blockstore
        to tear these down, leaving a lot of dead time. Now we batch them up
        in groups of 10 per call just like creation. They still serialize on
        the One True Lock, but switching is much faster.
      
      - Don't create new snapshots after destroying a clone. This was a bug.
        If the use of a persistent blockstore was read-write, we forced a new
        snapshot even if it was a RW clone. This resulted in two calls from
        boss to the storage server and two API calls: one to destroy the old
        snapshot and one to create the new one.
      
      Client-side:
      
      - Increase the timeout on first attach to iSCSI. One True Lock in
        action again. In the case where the storage server has a lot of
        blockstores to create, they would serialize with each blockstore
        taking 8-10 seconds to create. Meanwhile the node attaching to the
        blockstore would timeout after two minutes in the "login" call.
        Normally we would not hit this as the server would probably only
        be setting up 1-3 blockstores and the nodes would most likely first
        need to load an image and do a bunch of other boot-time operations
        before attempting the login. There is now a loop around the iSCSI
        login operation that will try up to five times (10 minutes total)
        before giving up; see the sketch below. This is completely arbitrary,
        but making it much longer would just trigger the node reboot timeout
        anyway.
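
      A rough sh sketch of that retry, using the stock open-iscsi client; the
      target, portal, and sleep values are illustrative and the real loop lives
      in our Perl client code:

          TARGET="iqn.2019-08.example:lease-1234"   # illustrative IQN
          PORTAL="10.254.254.1:3260"                # illustrative server portal
          tries=0
          until iscsiadm -m node -T "$TARGET" -p "$PORTAL" --login; do
              tries=$((tries + 1))
              if [ "$tries" -ge 5 ]; then
                  echo "iSCSI login failed after $tries attempts" >&2
                  exit 1
              fi
              sleep 120    # give the serialized server time; ~10 minutes total
          done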
      
      Server-side:
      
      - Cache the results of the libfreenas freenasVolumeList call.
        The call can take a second or more as it can make up to three API
        calls plus a ZFS CLI call. On blockstore VM creation, we were calling
        this twice through different paths. Now the second call will use
        the cached results. The cache is invalidated whenever we drop the
        global lock or make a POST-style API call (that might change the
        returned values).
      
      - Get rid of gratuitous synchronization. There was a stub vnode function
        on the creation path that was grabbing the lock, doing nothing, and then
        freeing the lock. This caused all the vnodes to pile up and then be
        released to pile up again.
      
      - Properly identify all the clones of a snapshot so that they all get
        torn down correctly. The ZFS get command we were using to read the
        "clones" property of a snapshot will return at most 1024 bytes of
        property value. When the property is a comma-separated list of ZFS
        names, you hit that limit at about 50-60 clones (given our naming
        conventions). Now we have to do a get of every volume and look at its
        "origin" property, which identifies the snapshot the volume was cloned
        from (see the sketch after this list).
      
      - Properly synchronize overlapping destruction/setup. We call snmpit to
        remove switch VLANs before we start tearing down nodes. This potentially
        allows the VLAN tags to become free for reuse by other blockstore
        experiments before we have torn down the old vnodes (and their VLAN
        devices) on the storage server. This was creating chaos on the server.
        Now we identify this situation and stall any new creations until the
        previous VLAN devices go away. Again, this is an arbitrary
        wait/timeout (10 minutes now) and can still fail. But this only comes
        into play if a new blockstore allocation comes immediately on the heels
        of a large deallocation.
      
      - Wait longer to get the One True Lock during teardown. Failure to get
        the lock at the beginning of the teardown process would result in all
        the iSCSI and ZFS state getting left behind, but all the vnode state
        being removed. Hence, a great deal of manual cleanup on the server
        was required. The solution? You guessed it, another arbitrary timeout,
        longer than before.
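
      A sketch of the property switch described in the clones/origin item above
      (pool, dataset, and snapshot names are illustrative):

          # Old way: ask the snapshot for its clones; the value can silently truncate.
          zfs get -H -o value clones mypool/pid/lease-1@snap
          # New way: ask every volume for its origin and match against the snapshot.
          zfs get -r -H -t volume -o name,value origin mypool | \
              awk -v snap="mypool/pid/lease-1@snap" '$2 == snap { print $1 }'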
    • Shorten the interval at which we check for vnodesetup termination. · ac31c487
      Mike Hibler authored
      When we kill a vnode, we invoke a new instance of vnodesetup which
      signals the real instance and then waits for it to die. Every 15 seconds
      it checks for death and resignals if it is still alive. Fifteen seconds
      is very coarse-grained and could lead to unnecessary delay for whatever
      is waiting above. Now we check every 5 seconds, while still only resignalling
      every 15 seconds.
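
      The cadence, sketched in sh (the pid handling and the resignal are
      simplified and illustrative):

          VNODEPID=$1        # pid of the real vnodesetup instance
          checks=0
          while kill -0 "$VNODEPID" 2>/dev/null; do
              sleep 5                                 # notice death within ~5 seconds
              checks=$((checks + 1))
              if [ $((checks % 3)) -eq 0 ]; then
                  kill -USR1 "$VNODEPID" 2>/dev/null  # but only resignal every 15 seconds
              fi
          done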
  10. 29 Jul, 2019 1 commit
    • Crank up the retries a notch. · ea99c381
      Mike Hibler authored
      My previous value worked fine when an image had to be loaded on the
      node as it would delay the node's setup of the blockstore and thus
      delay when it would start trying to connect to the storage server.
      When using the default image where no reload is needed at instantiation
      time, the node starts hitting the storage server much sooner and
      a longer timeout is then required.
  11. 24 Jul, 2019 1 commit
    • Address (badly) iSCSI blockstore scaling issue. · 2bab2dd3
      Mike Hibler authored
      We serialize setup of iSCSI "VMs" on the storage server which means
      that setup/teardown can take a really long time. On setup in particular
      this delay can cause client iSCSI logins to timeout and fail. Now we
      retry a couple of times if this happens. So we will wait up to 6 minutes
      for the server. We really can't wait much longer than this since there
      is a higher level timeout that will reboot the node at around 10 minutes
      (I think).
  12. 23 Jul, 2019 1 commit
  13. 22 Jul, 2019 2 commits
  14. 18 Jul, 2019 1 commit
  15. 20 Jun, 2019 1 commit
  16. 19 Jun, 2019 1 commit
    • Further tweaks to jumbo frames code. · 571b4a14
      Mike Hibler authored
      Now use a sitevar, general/allowjumboframes, rather than MAINSITE
      to determine whether we should even attempt any jumbo frames magic.
      
      Use a per-link/lan setting rather than the hacky per-experiment
      setting to let the user decide if they want to use jumbos. In NS
      world, we already had a link/lan method (set-settings) to specify
      virt_lan_settings which is where it winds up now.
      
      Client-side fixes to make jumbos work with vnodes.
  17. 11 Jun, 2019 1 commit
  18. 03 Jun, 2019 1 commit
  19. 28 May, 2019 1 commit
  20. 17 May, 2019 2 commits
  21. 13 May, 2019 1 commit
    • Properly cleanup when exportSlice fails. · 638b8731
      Mike Hibler authored
      We were relying on the subsequent unexportSlice call to do the
      cleanup, but it lacks the necessary state to know what needs to
      be cleaned up. The result was leftover targets, target groups, and extents.
  22. 26 Apr, 2019 2 commits
  23. 19 Apr, 2019 1 commit
    • Better handle systemd-networkd chattiness in control net search. · 126ef78e
      David Johnson authored
      systemd-networkd and friends have become very chatty.  This commit
      is about turning down the noise.  It also removes PreferredLifetime=forever
      because it is no longer valid where it used to be, and apparently cannot
      be used in the DHCP case.  It seems that the default is now "forever"
      anyway, so it's now irrelevant to us.  (Older systemd-networkds would
      set the address lifetime to the advertised lease.)
      
      We also only mark an iface with CriticalConnection=yes once that iface
      has been chosen as the control net.  We used to just mark them all
      in the udev helper so that we didn't have to modify the generated
      config after successful detection, but now systemd-networkd complains
      about bringing down a searched-but-not-control-net interface if
      it is critical.  So, avoid that.
      
      Finally, I added `-q` to our invocation of systemd-networkd-wait-online,
      and increased the timeout with which we call it.  The timeout increase is
      because we would get spurious event-loop disconnect messages without it;
      the -q quiets it in other ways.  Ugh.
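
      Roughly what this amounts to on a node (the interface name, timeout value,
      and binary path are illustrative; the config fragment uses the
      systemd.network options of that era):

          # Only the interface chosen as control net gets this in its .network file:
          #   [DHCP]
          #   CriticalConnection=yes
          # And the now-quieter, more patient wait:
          /lib/systemd/systemd-networkd-wait-online -q --interface=eth0 --timeout=120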
  24. 15 Apr, 2019 3 commits
    • Initial steps to enable jumbo frames on experiment interfaces. · 33beb373
      Mike Hibler authored
      This is just mods to the tmcd "ifconfig" command to include an MTU= arg.
      Right now we don't have anything in the DB for MTU, so tmcd is just returning
      "MTU=" which says to not explicitly set the MTU.
      
      It also includes the basic client-side support which I have tested on a
      physical interface with MTU=1500. Further changes will be needed to DTRT
      on virtual interfaces and their physical carrier interface.
      
      But the hope is to get the client-side part nailed down before the next
      set of images is rolled, so that we will be ready when support for the
      front side (UI and DB state) gets added.
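
      A sketch of that client-side logic; the tmcd line is abbreviated and the
      interface name is illustrative (the real code identifies interfaces by MAC
      and will need extra handling for virtual interfaces):

          IFCONFIG_LINE="INTERFACE=eth1 MTU="    # what tmcd hands back today: empty MTU
          IFACE=$(echo "$IFCONFIG_LINE" | sed -n 's/.*INTERFACE=\([^ ]*\).*/\1/p')
          MTU=$(echo "$IFCONFIG_LINE" | sed -n 's/.*MTU=\([0-9]*\).*/\1/p')
          if [ -n "$MTU" ]; then
              ifconfig "$IFACE" mtu "$MTU"       # only set it when tmcd supplies a value
          fi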
    • b1110139
      David Johnson authored
    • Fix breakage to raw xmlrpc mode in 13ee8406. · 535c8d7a
      David Johnson authored
      (The hack to get "raw" xml mode from xmlrpclib is quite different than
      for m2crypto.  Basically, the response is parsed in the Transport, so
      not only do we need a special raw input method on the ServerProxy, but
      also a custom "raw" transport that skips the parser.)
  25. 09 Apr, 2019 1 commit
    • Hack-ish change to allow ixl0 as control net. · 2452ef19
      Mike Hibler authored
      Our old hack-ish heuristic would only consider 10Gb interfaces if there
      were no 1Gb interfaces. But the new Powder nodes have both and we want
      one of the 10Gb interfaces to be the control net.
  26. 03 Apr, 2019 1 commit
    • Watch for a bogus handshake; I saw this happen on one of the FEs, we did · 58e1192e
      Leigh Stoller authored
      a handshake even though capserver was not running. But the uid/gid
      values were totally bogus. So sanity check them, and if they look
      whacky, abort the handshake until the next time we wake up, to do it
      again.
      
      I've got no good theories as to how this happened. A bad theory is that
      maybe some transient startup process bound that socket for a while, but
      that seems incredibly unlikely.
  27. 02 Apr, 2019 1 commit