1. 08 Dec, 2016 1 commit
  2. 07 Dec, 2016 1 commit
  3. 01 Dec, 2016 1 commit
  4. 29 Nov, 2016 1 commit
    • Fix two small problems with Addnode/Deletenode. · fd9bd976
      Leigh Stoller authored
      1. Do not start a second copy of the event scheduler. This is the cause
         of all the slurm error messages on the APT cluster. Clearly this was
         wrong for DeleteNode(). AddNode is still open for debate, but at
         least now the error mail will stop.
      
      2. Do not reset the startstatus either; this was causing the web interface
         to think startup services were running, when in fact they are not,
         since the other nodes are not rebooted. In the classic interface,
         node reboot does not change the startstatus either, so let's mirror
         that in the Geni interface.
  5. 28 Nov, 2016 1 commit
  6. 12 Nov, 2016 2 commits
    • Bring the cluster monitor "inhouse", rather than depending on the jfed · d7c4230e
      Leigh Stoller authored
      monitoring system.
      
      New portal_monitor daemon does a GetVersion/ListResources call at each
      of the clusters every five minutes, and updates the new table in the
      DB called apt_aggregate_status. We calculate free/inuse counts for
      physical nodes and a free count for VMs. Failure to contact the
      aggregate for more than 10 minutes sets the aggregate as down, since
      from our perspective if we cannot get to it, the cluster is down.
      
      Unlike the jfed monitoring system, we are not going to try to
      instantiate a new experiment or ssh into it. Wait and see if that is
      necessary in our context.
      
      On the instantiate page, generate a json structure for each cluster,
      similar to the one described in issue #172 by Keith. This way we can easily
      switch the existing code over to this new system, but fall back to the
      old mechanism if this turns out to be a bust.
      
      Some other related changes to how we hand clusters to the several web
      pages.
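The up/down rule described for portal_monitor can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual code: the `AggregateStatus` class, its field names, and `record_poll` are hypothetical stand-ins for a row of apt_aggregate_status and the update step, and the result of the GetVersion/ListResources calls is assumed to arrive as a plain dict (or `None` on failure).

```python
from dataclasses import dataclass
from typing import Optional

POLL_INTERVAL = 5 * 60   # seconds between polls, per the commit message
DOWN_AFTER = 10 * 60     # seconds without contact before marking down


@dataclass
class AggregateStatus:
    """Hypothetical stand-in for one row of apt_aggregate_status."""
    name: str
    last_contact: float      # timestamp of the last successful poll
    status: str = "up"
    pnodes_free: int = 0
    pnodes_inuse: int = 0
    vms_free: int = 0


def record_poll(row: AggregateStatus, now: float,
                result: Optional[dict]) -> AggregateStatus:
    """Update a row after one poll; result is None if the poll failed."""
    if result is not None:
        # Successful contact: refresh the counts and mark the cluster up.
        row.last_contact = now
        row.status = "up"
        row.pnodes_free = result["pnodes_free"]
        row.pnodes_inuse = result["pnodes_inuse"]
        row.vms_free = result["vms_free"]
    elif now - row.last_contact > DOWN_AFTER:
        # No contact for more than ten minutes: treat the cluster as down.
        row.status = "down"
    return row
```

With a five-minute poll interval, a cluster in this sketch is marked down on the third consecutive failed poll, since the second failure lands exactly at the ten-minute mark rather than beyond it.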
    • Fix a couple of memory leaks. · 1fd592b5
      Leigh Stoller authored
  7. 11 Nov, 2016 1 commit
  8. 09 Nov, 2016 1 commit
  9. 08 Nov, 2016 1 commit
  10. 07 Nov, 2016 2 commits
    • Minor fix to previous revision. · b0bb1017
      Leigh Stoller authored
    • Some work on restarting (rebooting) nodes. Presently, there is a bit of · 18cdfa8b
      Leigh Stoller authored
      an inconsistency in SliverAction(); when operating on the entire slice
      we do the whole thing in the background, returning (almost) immediately.
      That makes sense, since we expect the caller to poll for status afterward.
      
      But when operating on a subset of slivers (nodes), we do it
      synchronously, which means the caller is left waiting until we get
      through rebooting all the nodes. As David pointed out, when rebooting
      nodes in the openstack profile, this can take a long time as the VMs are
      torn down. This leaves the user looking at a spinner modal for a long
      time, which is not a nice UI feature.
      
      So I added a local option to do slivers in the background, and return
      immediately. I am doing this for restart and reload at the moment since
      that is primarily what we use from the Portal.
      
      Note that this has to push out to all clusters.
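The difference between the synchronous and background paths can be illustrated with a small sketch. It is not the SliverAction() implementation: `sliver_action`, `reboot_node`, and the `background` flag are hypothetical names, and a thread stands in for whatever mechanism the real code uses to run work in the background.

```python
import threading
import time
from typing import List, Optional


def reboot_node(node: str, log: List[str]) -> None:
    """Hypothetical placeholder for a slow per-node reboot."""
    time.sleep(0.01)
    log.append(node)


def sliver_action(nodes: List[str], log: List[str],
                  background: bool = False) -> Optional[threading.Thread]:
    """Reboot the given nodes, optionally without blocking the caller.

    Returns the worker thread when background is True, else None.
    """
    def work() -> None:
        for node in nodes:
            reboot_node(node, log)

    if background:
        # Return right away; the caller is expected to poll for status.
        worker = threading.Thread(target=work)
        worker.start()
        return worker

    # Synchronous path: the caller waits until every node is done.
    work()
    return None
```

In the background case the caller gets control back before any node has finished rebooting, which is why polling for status afterward is needed.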
  11. 03 Nov, 2016 6 commits
  12. 02 Nov, 2016 2 commits
  13. 28 Oct, 2016 1 commit
  14. 25 Oct, 2016 1 commit
  15. 18 Oct, 2016 2 commits
  16. 16 Oct, 2016 2 commits
  17. 12 Oct, 2016 2 commits
  18. 10 Oct, 2016 1 commit
    • Address linktest problems reported by Mike in issue #160: · e7422d49
      Leigh Stoller authored
      1. Changes to gentopofile to not put in linktest info for links and LANs
         with only one member.
      
      2. Fix to the CM for deletenode of a node that has tagged links.
      
      3. Fixes to the status web page for deletenode; we were installing the
         linktest event handlers multiple times.
      
      4. Pass through -N argument to linktest from the CM, when the experiment
         has NFS mounts turned off, so that we use loghole to gather the data
         files (instead of via NFS).
      
      This closes issue #160.
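The first item above, skipping links and LANs with only one member, amounts to a simple filter. The sketch below assumes a topology given as a mapping from link or LAN name to its member nodes; the function name and data layout are hypothetical and do not reflect gentopofile's actual format.

```python
from typing import Dict, List


def linktest_candidates(links: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only links and LANs that have at least two members."""
    return {name: members for name, members in links.items()
            if len(members) >= 2}
```

A LAN with a single member has no peer to test against, so it is left out of the linktest input entirely.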
  19. 07 Oct, 2016 1 commit
  20. 06 Oct, 2016 2 commits
  21. 03 Oct, 2016 1 commit
  22. 26 Sep, 2016 3 commits
  23. 20 Sep, 2016 3 commits
  24. 19 Sep, 2016 1 commit