1. 28 Feb, 2019 2 commits
  2. 11 Oct, 2018 1 commit
  3. 29 Aug, 2018 1 commit
  4. 16 Jul, 2018 2 commits
  5. 21 Jun, 2018 2 commits
  6. 06 Jun, 2018 1 commit
  7. 04 Jun, 2018 1 commit
    • Fix a bug that was introduced when we shifted to using os_setup · e59fc714
      Leigh B Stoller authored
      directly (on the Cloudlab clusters); we lost the lockout that prevented
      DeleteSliver() from running in the middle of a CreateSliver(). This
      resulted in a lot of email about node failures, since the nodes were
      getting yanked out from underneath the CreateSliver(). From the user's
      perspective this did not matter much, since they wanted the slice gone,
      but it finally bothered me enough to look more closely.
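The lockout described in this commit can be sketched as a non-blocking, per-slice lock. This is only an illustration in Python, not the actual Emulab/ProtoGENI code: `SliceLock`, `create_sliver`, `delete_sliver`, and the `"busy"` return value are all hypothetical names chosen for the example.

```python
import threading

class SliceLock:
    """Hypothetical per-slice lockout; try_acquire never blocks."""

    def __init__(self):
        self._guard = threading.Lock()
        self._busy = set()

    def try_acquire(self, slice_id):
        # Refuse immediately if another operation holds the slice.
        with self._guard:
            if slice_id in self._busy:
                return False
            self._busy.add(slice_id)
            return True

    def release(self, slice_id):
        with self._guard:
            self._busy.discard(slice_id)

locks = SliceLock()

def create_sliver(slice_id, setup):
    """Run setup() (standing in for node setup) while holding the slice."""
    if not locks.try_acquire(slice_id):
        return "busy"
    try:
        setup()
        return "created"
    finally:
        locks.release(slice_id)

def delete_sliver(slice_id):
    """Refuse to delete while another operation holds the slice."""
    if not locks.try_acquire(slice_id):
        return "busy"
    try:
        return "deleted"
    finally:
        locks.release(slice_id)
```

With the lock held for the whole of `create_sliver`, a `delete_sliver` call that arrives mid-create is turned away instead of pulling nodes out from under the setup.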
  8. 19 Apr, 2018 1 commit
    • Minor changes to update keys: · c9faef6e
      Leigh B Stoller authored
      1. Do not mark down nodes as needing to be updated; they just get stuck
         there.
      
      2. In the CM daemon, watch for slivers/aggregates that have been stuck in
         updating users for too long, and cancel the update. Typically this is
         because a node is wedged or not participating (CORD), so again just
         cancel and reset the state.
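The daemon check in item 2 amounts to a timeout sweep over in-progress updates. A minimal Python sketch follows, assuming each sliver is a dict with hypothetical `id`, `state`, and `since` fields. The state names and the 15-minute threshold are invented for the example and are not taken from the CM daemon.

```python
import time

UPDATE_TIMEOUT = 15 * 60  # seconds; hypothetical threshold

def cancel_stuck_updates(slivers, now=None, timeout=UPDATE_TIMEOUT):
    """Reset slivers that have sat in 'updating_users' longer than timeout.

    Returns the ids of the slivers whose update was cancelled.
    """
    now = time.time() if now is None else now
    cancelled = []
    for s in slivers:
        if s["state"] == "updating_users" and now - s["since"] > timeout:
            s["state"] = "ready"   # cancel the update and reset the state
            s["since"] = now
            cancelled.append(s["id"])
    return cancelled
```

A periodic daemon loop would call `cancel_stuck_updates` on each pass, so a wedged node delays the update by at most one timeout.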
  9. 17 Apr, 2018 1 commit
  10. 09 Mar, 2018 1 commit
  11. 08 Mar, 2018 1 commit
  12. 16 Feb, 2018 2 commits
  13. 09 Feb, 2018 1 commit
  14. 22 Jan, 2018 3 commits
  15. 27 Nov, 2017 1 commit
  16. 12 Oct, 2017 2 commits
  17. 25 Sep, 2017 1 commit
    • Bugfix: don't (potentially) process dedicated vhost slivers twice. · db7e0b9d
      David Johnson authored
      libosload_new::osload cannot handle being told in one invocation to
      load the same image twice on the same node.  GeniAggregate::Action was
      telling it to do that if a vnode sliver was processed before a vhost
      sliver; the vhost sliver would be processed a second time, resulting in
      a double call to osload that left the osload child process hanging.
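The shape of this fix can be illustrated by deduplicating (node, image) pairs before the load request is built. The sketch below is in Python rather than the project's own code, and the sliver layout (`kind`, `vhost`, `node`, `image`) is an assumption made for the example. It also assumes that processing a vnode sliver schedules a load on its dedicated vhost.

```python
def plan_osload(slivers):
    """Build a load plan with at most one entry per (node, image) pair."""
    seen = set()
    plan = []
    for s in slivers:
        # Assumption: a vnode sliver implies a load on its dedicated vhost.
        node = s["vhost"] if s["kind"] == "vnode" else s["node"]
        key = (node, s["image"])
        if key in seen:
            continue  # already scheduled via an earlier sliver
        seen.add(key)
        plan.append(key)
    return plan
```

Because the check is keyed on the physical node, the order in which vnode and vhost slivers arrive no longer matters.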
  18. 06 Sep, 2017 1 commit
  19. 22 May, 2017 3 commits
  20. 03 Mar, 2017 1 commit
  21. 01 Mar, 2017 1 commit
  22. 20 Feb, 2017 1 commit
  23. 01 Feb, 2017 2 commits
    • Remove debugging code. · 4fb328bd
      Leigh B Stoller authored
    • Checkpoint two changes: · bd9613cc
      Leigh B Stoller authored
      1. Use frisbee events in libosload_new as a replacement for static
         (hand-waved) maxwait times for image loading. When frisbee is
         generating events, we use those to determine whether progress is
         being made.
      
      2. Convert the CM to use the libosload_new library directly (like
         os_setup does). This is conditional on the NewOsload feature being
         attached to the geniuser. Otherwise, we go through the old path.
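Item 1 replaces a fixed deadline with an idle timer that restarts whenever progress is reported. A minimal Python sketch of that idea follows. `ProgressWatchdog` and its methods are hypothetical and do not reflect the frisbee event API or the libosload_new interface.

```python
class ProgressWatchdog:
    """Declare a load stalled only after idle_limit seconds with no events."""

    def __init__(self, idle_limit, start):
        self.idle_limit = idle_limit
        self.last_progress = start

    def note_event(self, now):
        # Each progress event restarts the idle timer.
        self.last_progress = now

    def stalled(self, now):
        return now - self.last_progress > self.idle_limit
```

A slow but steadily progressing load is therefore never cut off, while a load that goes silent is caught after `idle_limit` seconds regardless of image size.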
  24. 03 Jan, 2017 1 commit
  25. 29 Nov, 2016 1 commit
    • Fix two small problems with Addnode/Deletenode. · fd9bd976
      Leigh B Stoller authored
      1. Do not start a second copy of the event scheduler. This is the cause
         of all the slurm error messages on the APT cluster. Clearly this was
         wrong for DeleteNode(). AddNode is still open for debate, but at
         least now the error mail will stop.
      
      2. Do not reset the startstatus either; this was causing the web
         interface to think startup services were running when in fact they
         were not, since the other nodes are not rebooted. In the classic
         interface, node reboot does not change the startstatus either, so
         let's mirror that in the Geni interface.
  26. 07 Nov, 2016 2 commits
    • Minor fix to previous revision. · b0bb1017
      Leigh B Stoller authored
    • Some work on restarting (rebooting) nodes. Presently, there is a bit of · 18cdfa8b
      Leigh B Stoller authored
      an inconsistency in SliverAction(): when operating on the entire slice
      we do the whole thing in the background, returning (almost) immediately.
      That makes sense, since we expect the caller to poll for status
      afterward.
      
      But when operating on a subset of slivers (nodes), we do it
      synchronously, which means the caller is left waiting until we get
      through rebooting all the nodes. As David pointed out, when rebooting
      nodes in the openstack profile, this can take a long time as the VMs are
      torn down. This leaves the user looking at a spinner modal for a long
      time, which is not a nice UI feature.
      
      So I added a local option to do slivers in the background and return
      immediately. I am doing this for restart and reload at the moment, since
      that is primarily what we use from the Portal.
      
      Note that this has to be pushed out to all clusters.
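The background option described in this commit can be sketched as a flag that moves the per-node work onto a worker and returns at once. The Python below is illustrative only: `sliver_action`, the `reboot` callback, and the use of a thread are assumptions for the example, and the real SliverAction() may background its work differently.

```python
import threading

def sliver_action(nodes, reboot, background=False):
    """Reboot each node; optionally do so in the background.

    Returns the worker thread when background=True so the caller can
    poll or join it, and None when the work ran synchronously.
    """
    def work():
        for n in nodes:
            reboot(n)

    if background:
        t = threading.Thread(target=work)
        t.start()
        return t  # caller returns to the user and polls for status
    work()
    return None
```

With `background=True` the caller gets control back immediately and can poll for status, matching what already happens for whole-slice operations.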
  27. 06 Oct, 2016 1 commit
  28. 03 Oct, 2016 1 commit
  29. 19 Sep, 2016 1 commit