1. 01 Oct, 2018 1 commit
    • More work on the aggregate monitoring. · 9f3205c9
      Leigh Stoller authored
      1. Split the resource stuff (where we ask for an advertisement and
         process it) into a separate script, since that takes a long time to
         cycle through because of the size of the ads from the big clusters.
      
      2. On the monitor, distinguish offline (nologins) from actually being
         down.
      
      3. Add a table to store changes in status so we can see, over time, how
         much of the time the aggregates are usable; a sketch of the idea
         follows.
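      A minimal sketch of that log, in Python with an in-memory SQLite table;
      the table name and columns here are assumptions, since the real schema
      lives in the Emulab DB:

          import sqlite3
          from datetime import datetime, timezone

          db = sqlite3.connect(":memory:")
          db.execute("""CREATE TABLE apt_aggregate_status_log (
                          urn    TEXT NOT NULL,  -- aggregate URN
                          status TEXT NOT NULL,  -- 'up', 'offline', 'down'
                          stamp  TEXT NOT NULL   -- when the status changed
                        )""")

          def record_status_change(urn, old, new):
              # Append a row only on a transition; the gap between consecutive
              # rows for a URN says how long it sat in the old status, which
              # is what lets us total up usable time later.
              if old != new:
                  db.execute("INSERT INTO apt_aggregate_status_log VALUES (?,?,?)",
                             (urn, new, datetime.now(timezone.utc).isoformat()))
                  db.commit()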
  2. 17 Sep, 2018 1 commit
  3. 30 May, 2018 1 commit
  4. 16 Feb, 2018 1 commit
  5. 04 Oct, 2017 1 commit
  6. 01 Mar, 2017 2 commits
    • More work on deleting profiles/images; when deleting an entire profile · 7a07826e
      Leigh Stoller authored
      all at once, we want to delete the naked images so that all versions of
      the image get deleted (sketched below).
      
      Here is where things get tricky; our only record of where the image
      lives and what versions exist is at the cluster or in the source. But
      if the source is not using a version, then we have no record of it and
      do not know what cluster to delete from. This is a problem on the Cloudlab
      portal, not the Emulab portal, where we can handle that (although I do
      not yet).
      
      Still working on this, but it's a little better, and of course images can
      be deleted on the image management page, which asks for a complete list
      of all images.
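      A rough sketch of the naked-image idea, assuming versions appear as a
      trailing ':N' on the image URN and a hypothetical delete_image_at()
      helper; since we may not know which clusters hold the image, it simply
      tries them all:

          def delete_profile_images(image_urns, clusters, delete_image_at):
              for urn in image_urns:
                  # Strip a trailing numeric version so the URN names the
                  # "naked" image; deleting that removes every version.
                  head, _, tail = urn.rpartition(":")
                  naked = head if tail.isdigit() else urn
                  for cluster in clusters:
                      # A cluster that does not have the image is expected
                      # to answer with a harmless "no such image".
                      delete_image_at(cluster, naked)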
  7. 15 Feb, 2017 1 commit
  8. 29 Nov, 2016 1 commit
  9. 12 Nov, 2016 1 commit
    • Bring the cluster monitor "in-house", rather than depending on the jfed · d7c4230e
      Leigh Stoller authored
      monitoring system.
      
      The new portal_monitor daemon does a GetVersion/ListResources call at
      each of the clusters every five minutes, and updates the new table in
      the DB called apt_aggregate_status. We calculate free/inuse counts for
      physical nodes and a free count for VMs. Failure to contact the
      aggregate for more than 10 minutes sets the aggregate as down, since
      from our perspective, if we cannot get to it, the cluster is down.
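      Condensed, the polling loop amounts to the sketch below; the probe
      callback and the dictionaries are hypothetical stand-ins, while the
      five-minute period and ten-minute threshold come from the text above:

          import time

          POLL_INTERVAL = 5 * 60    # poll every five minutes
          DOWN_THRESHOLD = 10 * 60  # unreachable this long => mark down

          last_contact = {}         # aggregate URN -> time of last good poll
          status = {}               # aggregate URN -> latest status record

          def poll_once(aggregates, probe, now=None):
              # probe(agg) stands in for the GetVersion/ListResources calls;
              # it returns (free, inuse, free_vms) and raises on any failure
              # to reach the aggregate.
              now = time.time() if now is None else now
              for agg in aggregates:
                  try:
                      free, inuse, free_vms = probe(agg)
                      last_contact[agg] = now
                      status[agg] = {"status": "up", "free": free,
                                     "inuse": inuse, "free_vms": free_vms}
                  except Exception:
                      # One failed poll is not enough; only flip to "down"
                      # after DOWN_THRESHOLD seconds without a good contact.
                      if now - last_contact.get(agg, now) > DOWN_THRESHOLD:
                          status[agg] = {"status": "down"}

      The daemon would call poll_once() every POLL_INTERVAL seconds and write
      each status record into apt_aggregate_status.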
      
      Unlike the jfed monitoring system, we are not going to try to
      instantiate a new experiment or ssh into it. We will wait and see if
      that is necessary in our context.
      
      On the instantiate page, generate a json structure for each cluster,
      similar to the one described in issue #172 by Keith. This way we can
      easily switch the existing code over to this new system, but fall back
      to the old mechanism if this turns out to be a bust.
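      Issue #172 is not reproduced here, so the field names below are guesses;
      the shape is just one JSON object per cluster carrying the status and
      counts the monitor collects:

          import json

          cluster = {                  # all field names hypothetical
              "urn": "urn:publicid:IDN+emulab.net+authority+cm",
              "status": "up",          # up / offline / down
              "rawPCsAvailable": 42,   # free physical nodes
              "rawPCsTotal": 200,
              "VMsAvailable": 500,
          }
          print(json.dumps(cluster, indent=2))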
      
      Some other related changes to how we hand cluster information into the
      several web pages.
  10. 22 Feb, 2016 1 commit
  11. 04 Jan, 2016 1 commit