1. 12 Feb, 2019 1 commit
    • Recovery mode: · bde6c94d
      Leigh Stoller authored
      * Add a new Portal context menu option to nodes, to boot into "recovery"
        mode, which will be a Linux MFS (rather than the FreeBSD MFS, which
        99% of users will not know what to do with).
      
      * Plumb this all through to the Geni RPC interface, which invokes
        node_admin with a new option to use the recovery MFS nodetype attribute.
      
      * recoverymfs_osid is a distinct osid from adminmfs_osid; we use it in
        the CM to add an Emulab namespace attribute to the manifest that
        tells the Portal that a node supports recovery mode (and thus gets a
        context menu option).
      
      * Add an inrecovery flag to the sliver status blob, which the Portal
        uses to determine that a node is currently in recovery mode, so that
        we can indicate that in the topology and list tabs.
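      A minimal sketch of how a Portal-side check might combine the two
      signals described in this commit. The dictionary shapes and the
      `recovery_supported` key are assumptions for illustration; only the
      `inrecovery` flag name comes from the commit message.

```python
def node_recovery_state(manifest_attrs, sliver_status):
    """Return (show_menu_option, in_recovery) for a single node.

    manifest_attrs: hypothetical dict of Emulab-namespace attributes that the
                    CM attached to the node in the manifest.
    sliver_status:  hypothetical dict standing in for the sliver status blob.
    """
    # The context menu option is offered only if the manifest says the node
    # supports recovery mode (assumed key name).
    show_menu_option = bool(manifest_attrs.get("recovery_supported", False))
    # The flag may arrive as 0/1 or "0"/"1"; normalize it to a bool.
    in_recovery = bool(int(sliver_status.get("inrecovery", 0)))
    return show_menu_option, in_recovery
```

      The topology and list tabs could then mark a node whenever the second
      value is true.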
  2. 24 Oct, 2018 1 commit
    • Fixes for DeleteNodes(): · c14472f9
      Leigh Stoller authored
      * When deleting a lan and there is only one interface left, we need to go
        back and delete the interface from the last node. Otherwise it is a
        malformed rspec (which we have been ignoring), but it was passing
        through to the manifest, which made it a malformed manifest.
      
      * But a later bug was causing that now-removed interface to sneak back
        in via the old copy of the manifest in the database.
      
      * Also fix a bug that was causing multiple versions of the site_info
        element to get inserted during an update.
      
      * Remove code that updates the manifest in the DB; use the existing
        Aggregate->UpdateManifest() method instead.
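      The first fix can be illustrated with a small sketch. This is not the
      CM code; the data model (dicts of interface-id sets) and the function
      name are assumptions chosen to show the cleanup rule: a lan left with
      fewer than two interfaces is dropped, and its dangling interface is
      removed from the last node.

```python
def delete_nodes(nodes, lans, doomed):
    """Remove the nodes in `doomed` and clean up any lans they leave broken.

    nodes:  {node_id: set of interface ids}   (hypothetical model)
    lans:   {lan_id: set of interface ids}    (hypothetical model)
    doomed: set of node ids to delete
    """
    remaining = {n: set(ifs) for n, ifs in nodes.items() if n not in doomed}
    live = set().union(*remaining.values())

    kept_lans = {}
    for lan_id, members in lans.items():
        members = set(members) & live
        if len(members) >= 2:
            kept_lans[lan_id] = members
            continue
        # Zero or one interface left: the lan goes away, and the leftover
        # interface must be removed from the last node as well.
        for ifs in remaining.values():
            ifs -= members
    return remaining, kept_lans
```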
  3. 02 Oct, 2018 1 commit
  4. 01 Oct, 2018 1 commit
  5. 06 Sep, 2018 1 commit
  6. 29 Aug, 2018 1 commit
  7. 16 Jul, 2018 2 commits
  8. 12 Jul, 2018 1 commit
  9. 09 Jul, 2018 1 commit
  10. 21 Jun, 2018 1 commit
  11. 14 May, 2018 2 commits
  12. 18 Apr, 2018 1 commit
  13. 17 Apr, 2018 2 commits
  14. 16 Feb, 2018 2 commits
  15. 22 Jan, 2018 2 commits
  16. 05 Dec, 2017 1 commit
  17. 19 Nov, 2017 1 commit
    • Round of changes related to dataset approval: · f431479c
      Leigh Stoller authored
      Previously we forced all Portal datasets to auto approve at the target
      cluster; now we let the local policy settings determine that, and return
      status indicating that the dataset needs to be approved by an admin.
      
      Plumbed through the approval path to the remote cluster.
      
      Fixed up polling to handle unapproved datasets and to watch for the new
      failed state that Mike added to indicate that allocation failed.
  18. 03 Nov, 2017 1 commit
    • Fixes/Changes for reservations: · 79d99fa8
      Leigh Stoller authored
      1. Fix the user extend modal to show the proper number of days they can
         extend.
      
      2. Fix the admin extend modal warning that should appear when the
         extension would violate the max extension; it was not showing. Add
         new alerts for when we cannot get the max extension from the cluster
         or when no extension at all is allowed.
      
      3. Reduce the number of days in the box to the max allowed. Warn loudly
         if you type a different number and it's greater than the max extension.
      
      4. Add "force" box to override. Use with caution. Added the plumbing
         through to the back end as a new force option to RenewSliver().
      
      5. Add a check in RenewSliver() to ask the reservation system whether the
         extension is allowed before doing it. This was missing; adding it
         should solve some of the overbooking problems.
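      As a rough illustration of items 4 and 5, the check might look like the
      sketch below. The function signature, the exception class, and the use
      of `None` for "could not get max extension from the cluster" are all
      assumptions; the real RenewSliver() asks the reservation system itself.

```python
class ExtensionRefused(Exception):
    """Hypothetical error raised when an extension is not allowed."""


def renew_sliver(requested_days, max_extension_days, force=False):
    """Return the number of days granted, or raise ExtensionRefused.

    max_extension_days stands in for the reservation system's answer;
    None means the cluster could not report a maximum (assumed convention).
    """
    if force:
        # Admin override: skip the reservation check entirely.
        return requested_days
    if max_extension_days is None:
        raise ExtensionRefused("cannot get max extension from the cluster")
    if max_extension_days <= 0:
        raise ExtensionRefused("no extension allowed")
    if requested_days > max_extension_days:
        raise ExtensionRefused(
            f"requested {requested_days} days, max is {max_extension_days}"
        )
    return requested_days
```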
  19. 13 Oct, 2017 1 commit
    • Changes for automatic lockdown of experiments: · 8f4e3191
      Leigh Stoller authored
      1. First off, we no longer do automatic lockdown of experiments when
         granting an extension longer than 10 days.
      
      2. Instead, we will lock down experiments on a case-by-case basis.
      
      3. Changes to the lockdown path that ask the reservation system at the
         target cluster if locking down would throw the reservation system
         into chaos. If so, return a refused error and give the admin the
         choice to override. When we do override, send email to local tbops
         informing them that the reservation system is in a chaos state.
  20. 06 Oct, 2017 1 commit
  21. 25 Jul, 2017 1 commit
    • Add two new options to CreateImage(): · a7a3bc78
      Leigh Stoller authored
      1. nosnapshot: Create the descriptor (clone_image) but do not start the
         imaging process (create_image).
      
      2. mustnotexist: The image must be new in the project; otherwise return
         an error.
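      A small sketch of how the two options interact. Only the option names
      and the clone_image / create_image step names come from the commit
      message; the function signature, the `existing` set, and the exception
      are hypothetical.

```python
class ImageExists(Exception):
    """Hypothetical error for mustnotexist violations."""


def create_image(existing, project, name, nosnapshot=False, mustnotexist=False):
    """Return the list of steps that would run for this request.

    existing: set of (project, name) pairs already known (assumed model).
    """
    if mustnotexist and (project, name) in existing:
        raise ImageExists(f"{project}/{name} already exists")

    steps = ["clone_image"]           # always create the descriptor
    if not nosnapshot:
        steps.append("create_image")  # start the imaging process
    return steps
```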
  22. 28 Jun, 2017 1 commit
  23. 22 Jun, 2017 1 commit
  24. 25 Apr, 2017 1 commit
  25. 24 Mar, 2017 1 commit
  26. 01 Mar, 2017 3 commits
  27. 28 Feb, 2017 1 commit
  28. 25 Jan, 2017 1 commit
  29. 09 Jan, 2017 1 commit
  30. 29 Nov, 2016 1 commit
    • Fix two small problems with Addnode/Deletenode. · fd9bd976
      Leigh Stoller authored
      1. Do not start a second copy of the event scheduler. This is the cause
         of all the slurm error messages on the APT cluster. Clearly this was
         wrong for DeleteNode(). AddNode is still open for debate, but at
         least now the error mail will stop.
      
      2. Do not reset the startstatus either; this was causing the web
         interface to think startup services were running, when in fact they
         are not since the other nodes are not rebooted. In the classic
         interface, node reboot does not change the startstatus either, so
         let's mirror that in the Geni interface.
  31. 07 Nov, 2016 1 commit
    • Some work on restarting (rebooting) nodes. · 18cdfa8b
      Leigh Stoller authored
      Presently, there is a bit of an inconsistency in SliverAction(); when
      operating on the entire slice we do the whole thing in the background,
      returning (almost) immediately. This makes sense, since we expect the
      caller to poll for status afterward.
      
      But when operating on a subset of slivers (nodes), we do it
      synchronously, which means the caller is left waiting until we get
      through rebooting all the nodes. As David pointed out, when rebooting
      nodes in the openstack profile, this can take a long time as the VMs are
      torn down. This leaves the user looking at a spinner modal for a long
      time, which is not a nice UI feature.
      
      So I added a local option to do slivers in the background, and return
      immediately. I am doing this for restart and reload at the moment since
      that is primarily what we use from the Portal.
      
      Note that this has to be pushed out to all clusters.
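      The synchronous versus background behavior can be sketched as follows.
      This is an illustration only: the function name, the `background`
      parameter, and the status strings are assumptions, and a thread stands
      in for whatever mechanism the CM actually uses.

```python
import threading


def sliver_action(nodes, action, background=False):
    """Apply `action` to each node, optionally without blocking the caller.

    Returns (status, worker). `status` is a dict the caller can poll;
    `worker` is the background thread, or None when run synchronously.
    """
    status = {node: "pending" for node in nodes}

    def work():
        for node in nodes:
            action(node)            # e.g. reboot or reload the node
            status[node] = "done"

    if background:
        worker = threading.Thread(target=work, daemon=True)
        worker.start()
        return status, worker       # return immediately; caller polls status

    work()                          # caller waits for every node
    return status, None
```

      With `background=True` the caller gets control back right away and can
      poll `status`, which avoids holding a spinner modal open while slow
      nodes are torn down.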
  32. 02 Nov, 2016 1 commit
  33. 10 Oct, 2016 1 commit
    • Address linktest problems reported by Mike in issue #160: · e7422d49
      Leigh Stoller authored
      1. Changes to gentopofile to not put in linktest info for links and lans
         with only one member.
      
      2. Fix to the CM for deletenode of a node that has tagged links.
      
      3. Fixes to the status web page for deletenode; we were installing the
         linktest event handlers multiple times.
      
      4. Pass through -N argument to linktest from the CM, when the experiment
         has NFS mounts turned off, so that we use loghole to gather the data
         files (instead of via NFS).
      
      This closes issue #160.
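      Items 1 and 4 can be illustrated with a short sketch. The helper names
      and data shapes are hypothetical and this is not the gentopofile or CM
      code; only the `-N` argument and the one-member rule come from the
      commit message.

```python
def linktest_lans(lans):
    """Keep only links/lans that have at least two members.

    lans: {lan_name: list of member names} (assumed model).
    """
    return {name: members for name, members in lans.items() if len(members) >= 2}


def linktest_args(nfs_mounts_enabled):
    """Build the linktest argument list; pass -N when NFS mounts are off."""
    args = ["linktest"]
    if not nfs_mounts_enabled:
        args.append("-N")   # gather data files via loghole instead of NFS
    return args
```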