  1. 21 Oct, 2016 1 commit
  2. 20 Oct, 2016 1 commit
  3. 17 Oct, 2016 2 commits
  4. 14 Oct, 2016 2 commits
    • Attempt to address the problem described in issue #166; that nodes fail · 5d7164b3
      Leigh B Stoller authored
      to go from PXEBOOTING (pxewakeup) to actually booting, but we do not
      know that for a really long time because we send a BOOTING event from
      bootinfo right after PXEBOOTING, since that was the only place to hook
      it in. Well, Mike discovered the "on commit" support in dhcpd, and so
      that is what we are going to use now. Note that uboot nodes have been
      using on commit; now all nodes will when BOOTINFO_EVENTS=0.
      Mike's reportboot program is now a daemon, renamed to report_daemon.
      The original reportboot program is a little script that writes the
      arguments from dhcpd to a unix socket to be picked up by the daemon,
      which does the original work of mapping the IP/Mac to a node id and
      sending an event. The code has also been modified to run on a subboss
      using the same node mapping given to dhcpd, reconstituted as a DBM
      file by subboss_dhcpd_makeconf.
      The reason for using a daemon this way is so that we do not hang up
      dhcpd in case we cannot get to the event system. The unix domain
      socket will give us some amount of buffering, but I suspect that any
      event problem will eat that space up quickly, and I will be back to
      revisit this (probably want reportboot to not block on its write
      to the socket).
      pxeboot changed to not send PXEBOOTING or BOOTING when
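      The non-blocking write suggested above for reportboot could look
      roughly like the following. This is only a sketch: the datagram socket
      type, the "ip mac" message format, and the names make_sender and
      report_boot are assumptions for illustration, not what the commit
      actually implements.

```python
import socket


def make_sender(path: str) -> socket.socket:
    """Connect a non-blocking datagram socket to the daemon's unix socket.

    The path and the use of SOCK_DGRAM are assumptions for this sketch.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    sock.setblocking(False)
    sock.connect(path)
    return sock


def report_boot(sock: socket.socket, ip: str, mac: str) -> bool:
    """Hand one dhcpd commit record to the daemon without ever blocking.

    Returns True if the record was queued, False if it was dropped.
    """
    message = f"{ip} {mac}\n".encode("ascii")  # hypothetical wire format
    try:
        sock.send(message)
        return True
    except OSError:
        # Covers BlockingIOError (socket buffer full) as well as errors
        # such as a missing daemon; dropping the record keeps dhcpd moving.
        return False
```

      Dropping a record when the buffer is full trades a lost event for a
      dhcpd that never stalls, which is the concern raised in the message.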
    • Couple of tiny fixes that I made while stumbling around on a question · ae70d47f
      Leigh B Stoller authored
      about nobwshaping from Kirk.
  5. 12 Oct, 2016 1 commit
  6. 11 Oct, 2016 2 commits
  7. 10 Oct, 2016 1 commit
    • Address linktest problems reported by Mike in issue #160: · e7422d49
      Leigh B Stoller authored
      1. Changes to gentopofile to not put in linktest info for links and LANs
         with only one member.
      2. Fix to the CM for deletenode of a node that has tagged links.
      3. Fixes to the status web page for deletenode; we were installing the
         linktest event handlers multiple times.
      4. Pass through -N argument to linktest from the CM, when the experiment
         has NFS mounts turned off, so that we use loghole to gather the data
         files (instead of via NFS).
      This closes issue #160.
  8. 06 Oct, 2016 1 commit
  9. 03 Oct, 2016 2 commits
  10. 29 Sep, 2016 2 commits
    • Fix the wording of a warning message. · ee854767
      Mike Hibler authored
    • Machinery for supporting multiple RO/RW clones of a dataset in one experiment. · 72fb6763
      Mike Hibler authored
      Mostly ptopgen/libvtop changes to get things through assign.
      Added a new virt_blockstore_attribute, 'prereserve', that can be applied to
      a RW clone to pre-allocate the full amount of space allocated to the volume
      being cloned. This is instead of the default "sparse" clone, which could
      fail at an inopportune time if the containing pool runs out of space.
      But it doesn't work yet.
      Everything is there in the front end to do the necessary capacity checks and
      allocations of space, but then I discovered that ZFS doesn't readily support
      a non-sparse clone! You can do this, I think, by tweaking the "refreservation"
      attribute of the volume after it is created but that would have to be done
      behind the back of FreeNAS and I would have to do some more testing before I
      am willing to go here.
      So for now, all clones are sparse and no one is charged for their usage.
  11. 26 Sep, 2016 1 commit
  12. 20 Sep, 2016 1 commit
    • Initial support for ephemeral RW clones of persistent blockstores. · f98ab0e5
      Mike Hibler authored
      Using "set-rwclone", as in:
          set $bsobj [$ns blockstore]
          $bsobj set-lease "emulab-ops/bar"
          $bsobj set-node $node
          $bsobj set-rwclone 1
      in your NS file will create a clone of the indicated persistent blockstore.
      Somewhat limited in utility since you can only have one clone of a
      particular blockstore per experiment.
  13. 15 Sep, 2016 1 commit
  14. 14 Sep, 2016 1 commit
  15. 12 Sep, 2016 2 commits
    • Modify NOVIRTNFSMOUNTS to allow mounts on vnodes with routable IPs. · 470a81e5
      Mike Hibler authored
      This is different from the traditional behavior of this defs- variable.
      Previously it caused tmcd to not expose any NFS mounts to shared-host vnodes.
      We relax that now to allow exposing such mounts to vnodes with routable IPs.
      The rationale for this change is simply that the original option was only
      intended to prevent exporting mounts to hosts that could not reach the FS
      node anyway due to their unroutable cnet IPs.
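      The relaxed rule described above can be summarized as a small
      predicate. This is a sketch only; the function and parameter names are
      hypothetical and do not correspond to actual tmcd code.

```python
def expose_nfs_mounts(novirtnfsmounts: bool,
                      is_shared_host_vnode: bool,
                      has_routable_ip: bool) -> bool:
    """Decide whether NFS mounts are exposed to a node (hypothetical helper)."""
    if not novirtnfsmounts:
        # Option is off: nothing is withheld.
        return True
    if not is_shared_host_vnode:
        # The option only ever applied to shared-host vnodes.
        return True
    # New behavior: a shared-host vnode still gets mounts if it can reach
    # the FS node, that is, if it has a routable IP.
    return has_routable_ip
```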
    • Make exports_setup consistent with tmcd w.r.t. NOVIRTNFSMOUNTS. · 7ba4cfd0
      Mike Hibler authored
      Previously, we would not pass the mounts via tmcd but they were still
      exported from fs.
  16. 06 Sep, 2016 1 commit
  17. 02 Sep, 2016 1 commit
    • Changes for dealing with group/password file locking errors: · fd0fd225
      Leigh B Stoller authored
      * Move user mod (gecos,password) into the accountsetup proxy instead of
        ssh chpass. Wrap all usermod/chpass system calls in a loop that looks
        for the busy file error, backs off, and tries again for a while.
      * Add same wrapping to local (boss) calls of usermod/chpass. I put that
        function into emutil.
      * Rename old modgroups in the proxy to setgroups, since that is what
        it was actually doing.
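      The retry wrapper described in the first two bullets might be sketched
      as follows. The function name, the "busy" marker string, and the
      attempt and delay values are assumptions for illustration; the real
      error text from usermod/chpass and the real emutil function are not
      shown in this log.

```python
import subprocess
import time


def retry_while_busy(cmd, busy_marker="busy", attempts=5, delay=1.0,
                     backoff=2.0, run=subprocess.run, sleep=time.sleep):
    """Run cmd, retrying with a growing delay while it reports a busy file.

    busy_marker is a hypothetical substring of the lock error on stderr.
    run and sleep are injectable so the loop can be exercised without
    touching the real password or group files.
    """
    result = None
    for attempt in range(attempts):
        result = run(list(cmd), capture_output=True, text=True)
        # Stop on success, or on any failure that is not the busy error.
        if result.returncode == 0 or busy_marker not in result.stderr:
            return result
        if attempt < attempts - 1:
            sleep(delay)
            delay *= backoff
    # Still busy after the final attempt: hand the last result back.
    return result
```

      Only the busy error is retried, so a genuine failure such as a bad
      argument is reported immediately instead of after several delays.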
  18. 29 Aug, 2016 3 commits
  19. 25 Aug, 2016 1 commit
  20. 10 Aug, 2016 1 commit
    • Rejiggered reload_daemon to enforce a max time. · b6d272a2
      Mike Hibler authored
      There are now some sitevars to control its behavior; the one of interest here
      is reload/failtime:
      The way the reload daemon is supposed to work now is that nodes will be
      started on their reloading adventure with an os_load. If they are still there
      after reload/retrytime minutes, then they will either be rebooted (if the
      os_load was successful) or os_load'ed again (if the first os_load failed
      outright). The logic for either of these is that there might have been some
      transient condition that caused the failure. If we do have to perform this
      "retry" then we will send email to testbed-ops if reload/warnonretry is set.
      If, after another reload/retrytime minutes, a node is still there, then the
      node will be sent to hwdown, possibly powering it off or booting it into the
      admin MFS depending on the setting of reload/hwdownaction.
      So really, reload/failtime should not be needed. All nodes should exit
      reloading in 2 * reload/retrytime minutes. But it is there as a backstop
      (and because I didn't understand the logic of the reload daemon at first!)
      Well, it also comes into play if the reload daemon is restarted after being
      down for a long period of time. In this case, all nodes in reloading will
      get moved to hwdown. May need to reconsider this...
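      The decision logic described above can be sketched as a single
      function. This is an illustration under assumptions: the function and
      parameter names are hypothetical, and treating reload/failtime as a
      limit on total minutes spent in reloading is an interpretation of the
      text, not something the commit message states outright.

```python
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    REBOOT = "reboot"
    RELOAD = "os_load again"
    HWDOWN = "move to hwdown"


def next_action(minutes_since_attempt, minutes_total, retried,
                load_succeeded, retrytime, failtime):
    """Pick the reload daemon's next step for one node (hypothetical helper).

    minutes_since_attempt: time since the last os_load or reboot.
    minutes_total: time since the node entered reloading.
    retried: whether the single retry has already been used.
    load_succeeded: whether the first os_load completed without error.
    """
    if minutes_total >= failtime:
        # Backstop, e.g. the daemon was down for a long period.
        return Action.HWDOWN
    if minutes_since_attempt < retrytime:
        return Action.WAIT
    if retried:
        # Second retrytime window expired: give up on the node.
        return Action.HWDOWN
    # First window expired: reboot if the load worked, else load again.
    return Action.REBOOT if load_succeeded else Action.RELOAD
```

      With this shape a node leaves reloading within 2 * retrytime minutes
      in the normal case, and failtime only matters when that bound is
      already exceeded.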
  21. 29 Jul, 2016 4 commits
  22. 21 Jul, 2016 1 commit
  23. 19 Jul, 2016 1 commit
  24. 17 Jun, 2016 1 commit
  25. 10 Jun, 2016 2 commits
  26. 06 Jun, 2016 1 commit
  27. 26 May, 2016 1 commit
  28. 16 May, 2016 1 commit