1. 05 Sep, 2019 2 commits
  2. 04 Sep, 2019 1 commit
      Make vnodesetup and mkvnode consistent in their use of signals. · 62865d37
      Mike Hibler authored
      The meanings of USR1 and HUP were reversed between the two.
      In particular, HUP meant "destroy vnode" to mkvnode instead of USR1.
      This had a particularly bad side-effect since HUPs tend to get flung
      around willy-nilly when the physical machine goes down.
      
      This caused a vnode to get destroyed when we rebooted a shared vnode
      host. See emulab-devel issue 521 for details.
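
One way to keep two scripts from drifting apart like this is a single shared signal table. The following is a minimal sketch, not the actual vnodesetup/mkvnode code: the table name, `action_for`, and `install_handlers` are hypothetical, and the non-destructive meaning given to HUP is an assumption (the commit message only states that USR1 means "destroy vnode").

```python
import signal

# Hypothetical shared table that both scripts would import instead of
# hard-coding their own signal meanings.
VNODE_SIGNAL_ACTIONS = {
    signal.SIGUSR1: "destroy",  # from the commit message: USR1 destroys the vnode
    signal.SIGHUP: "halt",      # assumption: HUP maps to something non-destructive
}


def action_for(signum):
    """Return the action name for a signal, or None if it is not handled."""
    return VNODE_SIGNAL_ACTIONS.get(signum)


def install_handlers(dispatch):
    """Register one handler per known signal; dispatch(action) does the work."""
    for signum in VNODE_SIGNAL_ACTIONS:
        signal.signal(signum, lambda s, _frame: dispatch(action_for(s)))
```

With a table like this, a stray HUP delivered when the physical host goes down can never reach the destroy path unless the table itself is changed.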
  3. 03 Sep, 2019 1 commit
  4. 29 Aug, 2019 3 commits
  5. 28 Aug, 2019 1 commit
  6. 27 Aug, 2019 4 commits
  7. 26 Aug, 2019 3 commits
  8. 22 Aug, 2019 4 commits
  9. 21 Aug, 2019 3 commits
      Scaling work on blockstore setup and teardown along with a couple of bug fixes. · eb0ff3b8
      Mike Hibler authored
      Previously we were failing on an experiment with 40 blockstores. Now we can
      handle at least 75. We still fail at 100 due to client-side timeouts, but that
      would be addressed with a longer timeout (which involves making new images).
      
      Attacked this on a number of fronts.
      
      On the infrastructure side:
      
      - Batch destruction calls. We were making one ssh call per blockstore
        to tear these down, leaving a lot of dead time. Now we batch them up
        in groups of 10 per call just like creation. They still serialize on
        the One True Lock, but switching is much faster.
      
      - Don't create new snapshots after destroying a clone. This was a bug.
        If the use of a persistent blockstore was read-write, we forced a new
        snapshot even if it was a RW clone. This resulted in two calls from
        boss to the storage server and two API calls: one to destroy the old
        snapshot and one to create the new one.
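
The batching described in the first item above can be sketched as follows. This is an illustration only: `batches`, `destroy_all`, and the `run_remote` callback (standing in for one ssh call to the storage server) are hypothetical names; only the group size of 10 comes from the commit message.

```python
def batches(items, size=10):
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def destroy_all(blockstores, run_remote, size=10):
    """Tear down blockstores with one remote call per group.

    run_remote(group) is a placeholder for a single ssh invocation that
    destroys every blockstore in the group. Returns the number of calls made.
    """
    calls = 0
    for group in batches(blockstores, size):
        run_remote(group)
        calls += 1
    return calls
```

For 75 blockstores this makes 8 remote calls instead of 75.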
      
      Client-side:
      
      - Increase the timeout on first attach to iSCSI. One True Lock in
        action again. In the case where the storage server has a lot of
        blockstores to create, they would serialize with each blockstore
        taking 8-10 seconds to create. Meanwhile the node attaching to the
        blockstore would time out after two minutes in the "login" call.
        Normally we would not hit this as the server would probably only
        be setting up 1-3 blockstores and the nodes would most likely first
        need to load an image and do a bunch of other boot-time operations
        before attempting the login. There is now a loop around the iSCSI
        login operation that will try up to five times (10 minutes total)
        before giving up. This is completely arbitrary, but making it much
        longer will lead to triggering the node reboot timeout anyway.
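
The retry loop around the login can be sketched like this. It is a sketch under assumptions: `login_with_retries` and the `login` callable are hypothetical, and each `login()` call is assumed to block for up to its own two-minute timeout, so five attempts give the 10 minutes mentioned above.

```python
import time


def login_with_retries(login, attempts=5, pause=0.0, sleep=time.sleep):
    """Call login() until it succeeds or the attempts run out.

    login() is a placeholder for the iSCSI login step; it should return
    True on success and False on timeout. Returns the number of the
    successful attempt, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if login():
            return attempt
        # Optional pause between attempts; none after the final failure.
        if attempt < attempts and pause > 0:
            sleep(pause)
    return None
```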
      
      Server-side:
      
      - Cache the results of the libfreenas freenasVolumeList call.
        The call can take a second or more as it can make up to three API
        calls plus a ZFS CLI call. On blockstore VM creation, we were calling
        this twice through different paths. Now the second call will use
        the cached results. The cache is invalidated whenever we drop the
        global lock or make a POST-style API call (that might change the
        returned values).
      
      - Get rid of gratuitous synchronization. There was a stub vnode function
        on the creation path that was grabbing the lock, doing nothing, and then
        freeing the lock. This caused all the vnodes to pile up and then be
        released to pile up again.
      
      - Properly identify all the clones of a snapshot so that they all get
        torn down correctly. The ZFS get command we were using to read the
        "clones" property of a snapshot will return at most 1024 bytes of
        property value. When the property is a comma separated list of ZFS
        names, you hit that limit with about 50-60 clones (given our naming
        conventions). Now we have to do a get of every volume and look at the
        "origin" property which identifies any snapshot the volume is associated
        with.
      
      - Properly synchronize overlapping destruction/setup. We call snmpit to
        remove switch VLANs before we start tearing down nodes. This potentially
        allows the VLAN tags to become free for reuse by other blockstore
        experiments before we have torn down the old vnodes (and their VLAN
        devices) on the storage server. This was creating chaos on the server.
        Now we identify this situation and stall any new creations until the
        previous VLAN devices go away. Again, this is an arbitrary
        wait/timeout (10 minutes now) and can still fail. But this only comes
        into play if a new blockstore allocation comes immediately on the heels
        of a large deallocation.
      
      - Wait longer to get the One True Lock during teardown. Failure to get
        the lock at the beginning of the teardown process would result in all
        the iSCSI and ZFS state getting left behind, but all the vnode state
        being removed. Hence, a great deal of manual cleanup on the server
        was required. The solution? You guessed it, another arbitrary timeout,
        longer than before.
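
The caching item in the server-side list above can be sketched as a small wrapper. This is not the libfreenas code: the class name and the `fetch` callable (standing in for the expensive volume-list call) are hypothetical. The invalidation triggers, dropping the global lock or making a POST-style call, are the ones named in the commit message; the caller is assumed to invoke `invalidate()` at those points.

```python
class VolumeListCache:
    """Cache the result of an expensive volume-list call until invalidated."""

    def __init__(self, fetch):
        self._fetch = fetch      # placeholder for the real (slow) list call
        self._cached = None
        self._valid = False

    def volume_list(self):
        """Return the cached list, fetching it only if the cache is empty."""
        if not self._valid:
            self._cached = self._fetch()
            self._valid = True
        return self._cached

    def invalidate(self):
        """Call when the global lock is dropped or after a POST-style call."""
        self._cached = None
        self._valid = False
```

A separate validity flag is used so that an empty list is still a cacheable result.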
      Shorten the interval at which we check for vnodesetup termination. · ac31c487
      Mike Hibler authored
      When we kill a vnode, we invoke a new instance of vnodesetup which
      signals the real instance and then waits for it to die. Every 15 seconds
      it checks for death and resignals if it is still alive. 15 seconds is
      very coarse grained and could lead to unnecessary delay for anyone above
      waiting. Now we check every 5 seconds, while still only resignalling
      every 15 seconds.
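
The split between the check interval and the resignal interval can be sketched as below. The function and parameter names are hypothetical, `max_wait` is an assumed overall limit that the commit message does not mention, and the caller is assumed to have sent the first signal before calling.

```python
import time


def wait_for_exit(is_alive, resignal, check_every=5, resignal_every=15,
                  max_wait=120, sleep=time.sleep):
    """Poll for process death often, but resend the signal less often.

    is_alive() and resignal() are placeholders for the real process check
    and signal delivery. Returns the seconds waited when the process is
    found dead, or None if max_wait elapses first.
    """
    waited = 0
    since_signal = 0
    while waited < max_wait:
        sleep(check_every)
        waited += check_every
        since_signal += check_every
        if not is_alive():
            return waited
        # Still alive: resend the signal only once per resignal interval.
        if since_signal >= resignal_every:
            resignal()
            since_signal = 0
    return None
```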
  10. 20 Aug, 2019 1 commit
  11. 19 Aug, 2019 10 commits
  12. 18 Aug, 2019 1 commit
  13. 16 Aug, 2019 1 commit
  14. 13 Aug, 2019 2 commits
      Changes to handle extremely high load: · 186eff94
      Leigh Stoller authored
      So the basic problem is that when you load the instantiate page there is
      an ajax call to the profile info. Normally this returns before the user
      has a chance to move the mouse to the Next button. When there are 40 or so
      loads of that profile, it takes longer and the test harness 'clicks'
      next before it's ready for it. The steps package we use does not have a
      concept of disabling the next button until some event is ready, so I
      have to add that. Turns out the same exact problem happens on Step two;
      the parameterize step is not ready before the harness clicks the next
      button again. The test harness can look/wait for a button to become
      enabled, but only if something disables the button. Every one of the
      errors is some form of this problem. Anyway, users will see the same
      exact problem on a super loaded boss; they will click next and the page
      will break during the tutorial.
      
      Later ... turned out to be easy to disable the buttons, but they are
      links and Selenium does not have a way to deal with a disabled link.
      So I am doing this by changing the class of the buttons to look
      disabled, removing the click event handler, and adding a hidden div
      element that we can look for.
      
      Let's see if this works.
    • Leigh Stoller's avatar
      950d12fa
  15. 12 Aug, 2019 3 commits