1. 25 Feb, 2010 1 commit
    • Add revoke option (-r) to grantnodetype script. Does what you think it does. · e0234031
      Leigh B Stoller authored
      
      Change the code that rebuilds nodetypeXpid_permissions so that if
      a node type is specifically revoked from a project, it is granted to
      all other projects. This is kinda gross; in fact, we really need to
      ditch nodetypeXpid_permissions and use the policy tables directly,
      but I do not have time to do that.
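
      As a rough illustration of that rebuild rule, here is a minimal Perl
      sketch. It assumes a grant-only table and hypothetical helpers
      (all_projects(), revoked_projects()), and the column names in the
      query are guesses; it is not the actual grantnodetype code.

        #
        # Sketch only: nodetypeXpid_permissions is grant-based, so a type
        # that is explicitly revoked for one project has to be explicitly
        # granted to every other project when the table is rebuilt.
        # all_projects() and revoked_projects() are assumed helpers.
        #
        sub RebuildTypePermissions($)
        {
            my ($type) = @_;

            my %revoked = map { ($_ => 1) } revoked_projects($type);

            foreach my $pid (all_projects()) {
                next
                    if (exists($revoked{$pid}));

                DBQueryFatal("replace into nodetypeXpid_permissions ".
                             "(type, pid) values ('$type', '$pid')");
            }
        }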
      
      Remove all that robot lab open/close stuff in libadminctrl. Silly
      stuff that is no longer used.
  2. 23 Feb, 2010 2 commits
  3. 09 Feb, 2010 4 commits
    • Change static "use" of power_rmcp to a runtime "require". · 3693464c
      Mike Hibler authored
      Make sure power doesn't blow up if rmcp code is not installed.
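
      In general Perl terms, the static-to-runtime change amounts to wrapping
      the module load in an eval and checking the result before touching any
      RMCP code. A hedged sketch of the pattern; the surrounding structure is
      an assumption, not copied from power:

        # Instead of a compile-time dependency:
        #     use power_rmcp;
        # load the module only if it is actually present, so power keeps
        # working on installations without the rmcp code.
        my $have_rmcp = eval { require power_rmcp; 1; };

        if (!$have_rmcp) {
            warn("power_rmcp is not installed; RMCP-controlled devices ".
                 "cannot be powered on/off\n");
        }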
    • Minor fixes for snmpit_remote: · f85b65b1
      Mike Hibler authored
      Add missing argument to RecordVlanInsertion().
      
      Don't return 1 from RemoteDoList() on error.  The caller is expecting a list,
      not any sort of error.  Instead, on error we tbdie(), which is what the other
      backend modules do on errors.
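
      The list-vs-error point generalizes to any Perl routine whose callers
      iterate over its return value. A hedged sketch; the remote-call helper
      here is an assumption, only tbdie() comes from the message:

        sub RemoteDoList_sketch($@)
        {
            my ($switch, @args) = @_;

            my ($errcode, @results) = CallRemoteBackend($switch, @args);
            if ($errcode) {
                #
                # Do not "return 1" here; the caller iterates over the
                # return value and would treat the scalar as a one-element
                # list.  Die instead, as the other backend modules do.
                #
                tbdie("RemoteDoList: remote operation on $switch failed");
            }
            return @results;
        }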
    • One fix and one workaround for new (5.10.0) perl warnings about "used once" variables in libtblog_simple and SWIG-generated code. · d10e9504
      Mike Hibler authored
      
      The fix is to libtblog, to get rid of the warning.  The warning from the
      SWIG-generated code is a SWIG problem that was supposedly fixed, but
      apparently was not.  So those warnings remain, and they led to the workaround.
      
      The workaround is in showmmlists, which parses output captured by SUEXEC.
      That function captures all output, stdout and stderr, from the command it runs
      and presents it in an array.  So if you use it to invoke a perl script that
      emits the "used once" messages, those messages wind up in the array.
      showmmlists was not validating that output at all; it just assumed everything
      returned was the name of a mail list.  Added some syntactic validation
      (aka a regexp) to deal with that.
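
      A minimal sketch of that kind of filter, assuming the SUEXEC output has
      already been captured into an array; the character class is a guess, not
      the committed pattern:

        #
        # SUEXEC hands back everything the script printed, including any
        # stray perl warnings, so only keep lines that look like list names.
        #
        my @lists = ();
        foreach my $line (@suexec_output) {
            chomp($line);
            push(@lists, $1)
                if ($line =~ /^([-\w\.]+)$/);
            # anything else, e.g. "used once" warnings, is dropped
        }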
    • More work, still not tested · 1007a582
      Mike Hibler authored
  4. 08 Feb, 2010 1 commit
  5. 04 Feb, 2010 1 commit
  6. 03 Feb, 2010 3 commits
  7. 29 Jan, 2010 1 commit
  8. 28 Jan, 2010 1 commit
  9. 25 Jan, 2010 1 commit
  10. 22 Jan, 2010 1 commit
  11. 21 Jan, 2010 1 commit
  12. 15 Jan, 2010 1 commit
    • Fix waitmode==2 logic. · ee7974ab
      Mike Hibler authored
      We were getting stuck in an infinite loop if a node failed to come back
      up for longer than its time limit.
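
      A hedged sketch of the intended behavior, i.e. a wait loop that always
      drops a node once its time limit passes; node_is_up() and the %waiting
      bookkeeping are assumptions, not the actual code:

        #
        # Sketch only: %waiting maps node => deadline (epoch seconds).
        # The bug being fixed was a loop that never dropped a node whose
        # deadline had passed, and so never terminated.
        #
        while (keys(%waiting)) {
            foreach my $node (keys(%waiting)) {
                if (node_is_up($node)) {            # assumed helper
                    delete($waiting{$node});
                }
                elsif (time() > $waiting{$node}) {
                    print STDERR "$node: exceeded its time limit, giving up\n";
                    delete($waiting{$node});
                }
            }
            sleep(5)
                if (keys(%waiting));
        }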
  13. 14 Jan, 2010 1 commit
  14. 13 Jan, 2010 1 commit
  15. 12 Jan, 2010 2 commits
  16. 11 Jan, 2010 1 commit
  17. 08 Jan, 2010 2 commits
  18. 07 Jan, 2010 1 commit
  19. 06 Jan, 2010 1 commit
  20. 05 Jan, 2010 2 commits
  21. 28 Dec, 2009 1 commit
  22. 23 Dec, 2009 1 commit
    • A couple of changes that attempt to cut short the waiting when a node has failed. · 28ac96a5
      Leigh B. Stoller authored
      
      * In the main wait loop, I check the node's eventstate for TBFAILED or
        PXEFAILED. Neither of these should happen after the reboot, so it
        makes sense to quit waiting if they do.
      
      * I added an event handler to libosload, specifically to watch for
        nodes entering RELOADSETUP or RELOADING, after the reboot. Because
        of the race with reboot, this was best done with a handler instead
        of polling the DB state like case #1 above. The idea is that a node
        should hit one of these two states within a fairly short time (I
        currently have it set to 5 minutes). If not, something is wrong and
        the loop bails on that node. What happens after is subject to the
        normal waiting times.
      
      I believe that these two tests will catch a lot of cases where osload
      is waiting on something that will never finish.
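
      A hedged Perl sketch of the first check; the eventstate lookup and the
      failure handling shown here are assumptions about the surrounding osload
      code, only the TBFAILED/PXEFAILED state names come from the message:

        #
        # Sketch of check #1: while waiting on nodes after the reboot, bail
        # early on any node that has reported a failure state.
        # EventState() and mark_node_failed() are assumed helpers.
        #
        foreach my $node (@waiting_nodes) {
            my $state = EventState($node);

            if ($state eq "TBFAILED" || $state eq "PXEFAILED") {
                print STDERR "$node: entered $state after reboot; ".
                             "giving up on it early\n";
                mark_node_failed($node);
            }
        }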
  23. 22 Dec, 2009 4 commits
  24. 21 Dec, 2009 1 commit
    • New approach to dealing with nodes that fail to boot in os_setup and land in hwdown. · 5cf6aad2
      Leigh B. Stoller authored
      
      Currently, if a node fails to boot in os_setup and the node is running
      a system image, it is moved into hwdown. 99% of the time this is
      wasted work; the node did not fail for hardware reasons, but for some
      other reason that is transient.
      
      The new approach is to move the node into another holding experiment,
      emulab-ops/hwcheckup. The daemon watches that experiment, and nodes
      that land in it are freshly reloaded with the default image and
      rebooted. If the node reboots okay after reload, it is released back
      into the free pool. If it fails any part of the reload/reboot, it is
      officially moved into hwdown.
      
      Another possible use; if you have a suspect node, you go wiggle some
      hardware, and instead of releasing it into the free pool, you move it
      into hwcheckup, to see if it reloads/reboots. If not, it lands in
      hwdown again. Then you break out the hammer.
      
      Most of the changes in Node.pm, libdb.pm, and os_setup are
      organizational changes to make the code cleaner.
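
      A hedged sketch of the hwcheckup decision flow described above; every
      helper name here is an assumption, the point is just the
      release-or-hwdown branch:

        #
        # Sketch only: nodes parked in emulab-ops/hwcheckup get a fresh
        # default-image reload and a reboot.  Success sends them back to
        # the free pool; any failure moves them into hwdown for real.
        #
        foreach my $node (NodesInExperiment("emulab-ops", "hwcheckup")) {
            if (ReloadDefaultImage($node) == 0 && Reboot($node) == 0) {
                ReleaseToFreePool($node);
            }
            else {
                MoveToExperiment($node, "emulab-ops", "hwdown");
            }
        }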
  25. 18 Dec, 2009 1 commit
    • Changes to support the SPP nodes. My approach was a little odd. · fd015646
      Leigh B. Stoller authored
      What I did was create node table entries for the three SPP nodes.
      These are designated as local, shared nodes, reserved to a holding
      experiment. This allowed me to use all of the existing shared node
      pool support, albeit with a couple of tweaks in libvtop that I will
      not bother to mention since they are hideous (another thing I need to
      fix).
      
      The virtual nodes that are created on the spp nodes are figments; they
      will never be set up, booted, or torn down. They exist simply as
      placeholders in the DB, in order to hold the reserved bandwidth on the
      network interfaces. In other words, you can create as many of these
      imaginary spp nodes (in different slices if you like) as there are
      interfaces on the spp node. Or you can create a single imaginary spp
      node with all of the interfaces. You get the idea; it's the reserved
      bandwidth that drives the allocation.
      
      There are also some minor spp-specific changes in vnode_setup.in to
      avoid trying to generalize things. I will return to this later as
      needed.
      
      See this wiki page for info and sample rspecs:
      
      https://www.protogeni.net/trac/protogeni/wiki/SPPNodes
  26. 17 Dec, 2009 1 commit
  27. 15 Dec, 2009 2 commits