  1. 04 Jun, 2012 1 commit
  2. 14 Dec, 2010 1 commit
  3. 08 Dec, 2010 1 commit
  4. 10 May, 2010 1 commit
  5. 14 Apr, 2010 1 commit
    • Mike Hibler's avatar
      Changes for speeding up elabinelab server setup. · 6feda7d3
      Mike Hibler authored
      Boss/ops/fs: reboot them together after setup rather than serially.
      
      Nodes: leave them in PXEWAIT throughout the setup, until after boss has
      been rebooted.  At that point we send them the new bootinfo RESTART command
      telling pxeboot to re-DHCP and use the new info obtained (next-server) to
      contact a potentially new boss node.  This is a quick way to switch a node
      in PXEWAIT from talking to the outer boss to talking to the inner one.
      
      A significant number of rinky-dink changes were needed to do this, primarily
      adding a new state, PXELIMBO, where nodes can be sent to sit until they are
      restarted.  It turns out, just putting them in an existing state such as
      PXEWAKEUP or SHUTDOWN wouldn't work, as they tend to time out or otherwise
      reboot.
  6. 08 Apr, 2010 1 commit
  7. 22 Mar, 2010 1 commit
    • Leigh B Stoller's avatar
      Finish up user deletion. The big visible change is that when a user is · 2965922b
      Leigh B Stoller authored
      deleted, they still remain in the user table with a status of
      "archived", but since all the queries in the system now use uid_idx
      instead of uid, it is safe to reuse a uid since they are no longer
      ambiguous. 
      
      The reason for not deleting users from the users table is so that the
      stats records can refer to the original record (who was that person
      named "mike"). This is very handy and worth the additional effort it
      has taken.
      
      There is no way to resurrect a user, but it would not be hard to add.
  8. 11 Mar, 2010 1 commit
  9. 22 Dec, 2009 1 commit
    • Leigh B. Stoller's avatar
      New approach to dealing with nodes that fail to boot in os_setup, and · 5cf6aad2
      Leigh B. Stoller authored
      land in hwdown.
      
      Currently, if a node fails to boot in os_setup and the node is running
      a system image, it is moved into hwdown. 99% of the time this is
      wasted work; the node did not fail for hardware reasons, but for some
      other reason that is transient.
      
      The new approach is to move the node into another holding experiment,
      emulab-ops/hwcheckup. The daemon watches that experiment, and nodes
      that land in it are freshly reloaded with the default image and
      rebooted. If the node reboots okay after reload, it is released back
      into the free pool. If it fails any part of the reload/reboot, it is
      officially moved into hwdown.
      
      Another possible use; if you have a suspect node, you go wiggle some
      hardware, and instead of releasing it into the free pool, you move it
      into hwcheckup, to see if it reloads/reboots. If not, it lands in
      hwdown again. Then you break out the hammer.
      
      Most of the changes in Node.pm, libdb.pm, and os_setup are
      organizational changes to make the code cleaner.
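
      The hwcheckup flow described above can be sketched as a small decision
      function. This is an illustrative Python sketch only, not the os_setup
      or daemon code (which is Perl); the names Pool, after_boot_failure, and
      after_checkup are hypothetical.

```python
from enum import Enum


class Pool(Enum):
    """Places a node can end up in this sketch (labels are illustrative)."""
    FREE = "free pool"
    HWCHECKUP = "emulab-ops/hwcheckup"
    HWDOWN = "hwdown"


def after_boot_failure() -> Pool:
    """A node that fails to boot in os_setup is parked for a checkup first."""
    return Pool.HWCHECKUP


def after_checkup(reload_ok: bool, reboot_ok: bool) -> Pool:
    """Decide where a node in hwcheckup goes after the reload/reboot attempt.

    Only a node that passes both the reload of the default image and the
    following reboot is released; any failure sends it to hwdown.
    """
    if reload_ok and reboot_ok:
        return Pool.FREE
    return Pool.HWDOWN
```

      The same function covers the "suspect node" case: an operator moves the
      node into hwcheckup by hand and the outcome is decided the same way.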
  10. 16 Oct, 2009 1 commit
    • David Johnson's avatar
      Make shared vnodes reloadable. This whole thing sucks for modifies · abcc783c
      David Johnson authored
      because we (vnode_setup) need to go out to the nodes and run vnodesetup
      to trigger the reload, but os_setup needs to set up the reload.  So for
      now, os_setup sets up the reload but does not wait nor reboot the vnode;
      vnode_setup does that like normal.  Probably there are going to be timeout
      problems, but it's good enough for my needs right now.
  11. 12 Oct, 2009 1 commit
    • David Johnson's avatar
      Add the ability to load images on virtnodes. For now, we just overload · c6c57bc9
      David Johnson authored
      the tb-set-node-os command with a second optional argument; if that is
      present, the first arg is the child OS and the second is the parent OS.
      We add some new features in ptopgen (OS-parentOSname-childOSname) based
      off a new table that maps which child OSes can run on which parents, and
      the right desires get added to match.  We set up the reloads in os_setup
      along with the parents.  Also needed a new opmode, RELOAD-PCVM, to handle
      all this.
      
      For now, users only have to specify that their images can run on pcvms, a
      special hack for which type the images can run on.  This makes sense in
      general since there is no point conditionalizing childOS loading on
      hardware type at the moment, but rather on parentOS.  Hopefully this stuff
      will mostly work on shared nodes too, although we'll have to be more
      aggressive on the client side garbage collecting old frisbee'd images for
      long-lived shared hosts.
      
      I only made these changes in libvtop, so assign_wrapper folks are left in
      the dark.
      
      Currently, the client side supports frisbee.  Only in openvz for now, and
      this probably breaks libvnode_xen.pm.  Also in here are some openvz
      improvements, like ability to sniff out which network is the public
      control net, and which is the fake virtual control net.
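
      A minimal sketch of the argument overloading and feature naming from
      the first paragraph of this message, assuming the feature string is
      built literally as OS-parentOSname-childOSname. The function names are
      hypothetical and this is Python for illustration; the real logic lives
      in the NS parser and ptopgen.

```python
from typing import Optional, Sequence, Tuple


def split_os_args(os_args: Sequence[str]) -> Tuple[str, Optional[str]]:
    """Return (os, parent_os) from the OS arguments of tb-set-node-os.

    With one argument there is no parent. With two, the first is the
    child OS and the second is the parent OS, as the message describes.
    """
    if len(os_args) == 1:
        return os_args[0], None
    if len(os_args) == 2:
        child, parent = os_args
        return child, parent
    raise ValueError("expected one or two OS names")


def child_os_feature(parent: str, child: str) -> str:
    """Build a feature name of the form OS-parentOSname-childOSname."""
    return f"OS-{parent}-{child}"
```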
  12. 24 Sep, 2009 1 commit
  13. 08 May, 2009 1 commit
  14. 13 Feb, 2009 1 commit
  15. 12 Feb, 2009 1 commit
    • Kevin Atkinson's avatar
      Add code to os_setup to log information about Image usage. · 979442f3
      Kevin Atkinson authored
      Enough information is logged so that, at any point in time,
      it is possible to tell what images are being used.  After
      collecting some stats for a while I hope to use this data to
      evaluate various strategies for preloading disks with images
      other than the default.
      
      Although not its primary purpose, enough information is
      collected to be able to get a snapshot of node usage at any
      point in time.  This includes what nodes are being used and by
      whom, as in which experiments and thus which projects.
      
      NOTE: For a while you might see a few of these warnings,
        *** WARNING: os_setup:
        ***   could not find previous state (rsrcidx=484084) in image_history
        ***   table, won't be able to determine newly allocated nodes
      if someone does a swapmod to an experiment that was swapped in
      before this commit was installed.  This is because os_setup uses
      previous information in the table to determine newly allocated
      nodes.  This warning can safely be ignored in this case, and should
      go away over time.
  16. 10 Sep, 2008 1 commit
    • Kevin Atkinson's avatar
      Make nodereboot respect the waittime arg, and wait 10 minutes for PLC. · b7da57f5
      Kevin Atkinson authored
      Currently nodereboot in libreboot essentially ignores the waittime
      arg because it forks and calls node_reboot to do the real work, but
      doesn't pass on the waittime to it.  Fix this by adding a "-W"
      option to node_reboot in order to specify the waittime.
      
      Use this to extend the waittime for a PLC node to come up from 6 minutes to 10.
  17. 02 May, 2008 1 commit
  18. 25 Oct, 2007 1 commit
  19. 17 Sep, 2007 1 commit
  20. 16 Aug, 2007 1 commit
  21. 02 Aug, 2007 1 commit
  22. 25 Apr, 2007 1 commit
  23. 06 Apr, 2007 1 commit
  24. 08 Sep, 2006 1 commit
    • Kirk Webb's avatar
      · 3a3c95fb
      Kirk Webb authored
      Parallelize the setup of plab vnodes alongside the loading of local
      physical nodes.  We fork vnode_setup to operate on the plab vnodes just
      before firing off local reload/reboot/reconfig operations.  The status
      of the plab vnode setup is checked just before firing off vnode_setup
      for any local vnodes.  The ISUP wait for plab vnodes continues to fall
      within the same stage as waiting for local vnodes.  New arguments have been
      added to vnode_setup to tell it to only operate on specific vnode types:
      '-j' for local jail nodes, and '-p' for plab nodes.  If neither is
      specified, the default is to operate on all types.
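
      The '-j' / '-p' selection rule can be sketched as follows. This is an
      illustrative Python sketch, not vnode_setup itself (which is Perl); the
      function name and the assumption that "all types" means exactly jail
      and plab are mine.

```python
import argparse

# Assumption for this sketch: the only vnode types are jail and plab.
ALL_TYPES = frozenset({"jail", "plab"})


def select_vnode_types(argv):
    """Return the set of vnode types to operate on for the given flags."""
    parser = argparse.ArgumentParser(prog="vnode_setup_sketch")
    parser.add_argument("-j", dest="jail", action="store_true",
                        help="operate on local jail nodes")
    parser.add_argument("-p", dest="plab", action="store_true",
                        help="operate on plab nodes")
    opts = parser.parse_args(argv)

    # Neither flag given: fall back to operating on every type.
    if not opts.jail and not opts.plab:
        return set(ALL_TYPES)

    selected = set()
    if opts.jail:
        selected.add("jail")
    if opts.plab:
        selected.add("plab")
    return selected
```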
  25. 21 Aug, 2006 1 commit
    • Kevin Atkinson's avatar
      · 9b718661
      Kevin Atkinson authored
      Avoid counting planetlab vnodes twice.
  26. 16 Aug, 2006 1 commit
    • Kevin Atkinson's avatar
      - Added tbreport database schema (added three tables), storage for · 9c5d3308
      Kevin Atkinson authored
        tbreport errors & context.
      
      - Modified fatal() in swapexp, batchexp, and tbprerun, and die_noretry()
        in os_setup to pass hash parameter to tblog functions.
      
      - Added tbreport error & context information for select errors in
        swapexp, tbswap, assign_wrapper2, snmpit_lib, snmpit, batchexp,
        assign_wrapper, os_setup, parse-ns, & tbprerun.
      
      - Added assign error parser in assign_wrapper2.
      
      - Added parse.tcl error parser in parse-ns.
      
      - Added severity constants for tbreport in libtblog_simple.
      
      - Added tbreport() function & context table mapping for reporting
        discrete error types to libtblog.
  27. 27 Jul, 2006 1 commit
    • Kevin Atkinson's avatar
      · 0e5e57e6
      Kevin Atkinson authored
      Small bug fixes and cleanup in os_setup summary code.
  28. 26 Jul, 2006 2 commits
    • Kevin Atkinson's avatar
      · 9237c34b
      Kevin Atkinson authored
      Fix syntax error.
    • Kevin Atkinson's avatar
      · cbf3c5d4
      Kevin Atkinson authored
      swapexp: The previous commit, which added a message about the recovery
      action when a swap-modify failed to the top of the email, did not
      catch all of the possible cases.  Added the case when the experiment is
      not swapped in.
      
      os_setup: Refactored/rewrote os_setup error summary code.  Distinguish
      the case when nodes fail to properly load the OS from the case when they
      don't boot after loading the OS.
  29. 21 Jul, 2006 1 commit
    • Kevin Atkinson's avatar
      · 2093eb10
      Kevin Atkinson authored
      Don't use "no warnings 'uninitialized'" since that is a perl 5.6+ feature
      and some are still using an ancient version of perl.
  30. 20 Jul, 2006 3 commits
    • Kevin Atkinson's avatar
      · 494debf6
      Kevin Atkinson authored
      length => $length in os_setup!
    • Kevin Atkinson's avatar
      · 6c61b70c
      Kevin Atkinson authored
      Fixed bug in summary of failed nodes when there are more than can fit on a line.
    • Kevin Atkinson's avatar
      · 5710c340
      Kevin Atkinson authored
      Various tblog changes:
      
      Added message about recovery action when a swap-modify failed to the
      top of the email.
      
      Fine tuned os_setup summary error.  Added (possibly partial) list of
      nodes that fail; if a large number fail, only show as many as will
      fit on a single line.  Other tweaks.
      
      Flagged assign_wrapper errors of an Invalid OS as user errors.
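
      The single-line list of failed nodes mentioned above could be built
      roughly like this. This is a hedged Python sketch, not the os_setup
      code (which is Perl); the function name, the "Failed nodes: " prefix,
      the ", ..." marker, and the default width are all assumptions.

```python
def summarize_failed_nodes(nodes, width=72, prefix="Failed nodes: "):
    """Join node names on one line, truncating with ', ...' if needed.

    A name is only added if the line would still fit within `width`,
    including room for the ', ...' marker when more names remain. The
    first name is always shown, even if it alone exceeds the width.
    """
    marker = ", ..."
    line = prefix
    shown = 0
    for index, node in enumerate(nodes):
        separator = ", " if shown else ""
        candidate = line + separator + node
        more_remain = index + 1 < len(nodes)
        needed = len(candidate) + (len(marker) if more_remain else 0)
        if needed > width and shown:
            # Stop here; the line built so far already left room for the marker.
            return line + marker
        line = candidate
        shown += 1
    return line
```

      For example, summarize_failed_nodes(["pc1", "pc2", "pc3"], width=25)
      keeps only the first name and appends the marker, while a wider line
      shows all three.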
  31. 18 Jul, 2006 1 commit
    • Leigh B. Stoller's avatar
      Changes necessary for moving most of the stuff in the node_types · 624a0364
      Leigh B. Stoller authored
      table, into a new table called node_type_attributes, which is intended
      to be a more extensible way of describing nodes.
      
      The only things left in the node_types table will be type,class and the
      various isXXX boolean flags, since we use those in numerous joins all over
      the system (ie: when discriminating amongst nodes).
      
      For the most part, all of that other stuff is rarely used, or used in
      contexts where the information is needed, but not for type discrimination.
      Still, it made for a lot of queries to change!
      
      Along the way I added a NodeType library module that represents the type
      info as a perl object. I also beefed up the existing Node module, and
      started using it in more places. I also added an Interfaces module, but I
      have not done much with that yet.
      
      I have not yet removed all the slots from the node_types table; I plan to
      run the new code for a few days and then remove the slots.
      
      Example using the new NodeType object:
      
              use NodeType;
      
              my $typeinfo = NodeType->Lookup($type);
              my $control_iface;    # filled in through the reference below
      
              if ($typeinfo->control_interface(\$control_iface) ||
                  !$control_iface) {
                  warn "No control interface for $type is defined in the DB!\n";
              }
      
      or using the Node:
      
              use Node;
      
              my $nodeobject = Node->Lookup($node_id);
              my $imageable  = $nodeobject->NodeTypeInfo()->imageable();
      or
              my $rebootable = $nodeobject->isrebootable();
      or
              $nodeobject->NodeTypeAttribute("control_interface", \$control_iface);
      
      Lots of ways to accomplish the same thing, but the main point is that the
      Node is able to override the NodeType (if it wants to), which I think is
      necessary for flexibly describing one/two of a kind things like switches, etc.
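
      The override idea (a Node attribute winning over its NodeType default)
      can be sketched as below. This is an illustrative Python sketch, not
      the Node.pm / NodeType.pm API; the class names, the attribute() method,
      and the example values are hypothetical.

```python
class NodeTypeSketch:
    """Holds the default attributes shared by every node of one type."""

    def __init__(self, type_name, attributes):
        self.type_name = type_name
        self.attributes = dict(attributes)


class NodeSketch:
    """A node that may override individual attributes of its type."""

    def __init__(self, node_id, node_type, overrides=None):
        self.node_id = node_id
        self.node_type = node_type
        self.overrides = dict(overrides or {})

    def attribute(self, key):
        """Return the per-node value if set, else the type default, else None."""
        if key in self.overrides:
            return self.overrides[key]
        return self.node_type.attributes.get(key)


# Usage: a one-of-a-kind node overrides a single attribute of its type.
switch_type = NodeTypeSketch("switch", {"control_interface": "eth0"})
ordinary = NodeSketch("sw1", switch_type)
special = NodeSketch("sw2", switch_type, {"control_interface": "eth3"})
```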
  32. 10 Jul, 2006 1 commit
  33. 08 Jul, 2006 1 commit
  34. 07 Jul, 2006 1 commit
  35. 05 Jul, 2006 2 commits
    • Kevin Atkinson's avatar
      · 43c0b17f
      Kevin Atkinson authored
      Fixed perl warning about Use of uninitialized value in numeric gt.
    • Kevin Atkinson's avatar
      · 183040de
      Kevin Atkinson authored
      Many changes to tblog code.  Database update needed:
      
      1) Added summary of failed nodes in os_setup.  The cause of the error is now
      classified as "user" if it is only user images that failed and the user
      image failed on every pc of a particular type.  Otherwise I leave the cause
      as "unknown" since it is really hard to tell what the real cause is.
      
      2) Raised the confidence threshold for most errors so that they will appear
      on the top.
      
      3) Added a special error when an experiment is canceled.  The cause is
      "canceled" and testbed-ops won't see these errors.
      
      4) Fixed a bug in assign_wrapper where it will incorrectly report "This
      experiment cannot be instantiated on this testbed..." when really the user
      canceled the swapin.
      
      5) Fixed a bug where os_setup errors were being incorrectly reported as
      assign errors.  This happens when os_setup fails for some reason and
      tbswap tries again, but the second time around there are not enough nodes.
      So the last error is coming from assign even though the true cause of the
      error is due to failed nodes.  The fix for this involved adding a new column
      to the log table, "attempt", which will be 1 for the first attempt and then
      incremented for each new attempt.  tblog_find_error will then simply ignore
      any errors with "attempt > 1".
      
      6) Also fixed a potential problem when there is an error during the cleanup
      phase by adding another column "cleanup".  tblog_find_error will
      also ignore any errors with the cleanup bit set.
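
      The filtering described in items 5 and 6 can be sketched as follows.
      This is a hedged Python sketch, not tblog_find_error itself (which is
      Perl and works against the log table); the LogEntry class and the
      candidate_errors function are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    """A simplified log row with the two new columns from this commit."""
    message: str
    attempt: int = 1       # 1 for the first attempt, incremented on retries
    cleanup: bool = False  # set for errors raised during the cleanup phase


def candidate_errors(entries):
    """Keep only errors from the first attempt that are not cleanup errors."""
    return [e for e in entries if e.attempt <= 1 and not e.cleanup]


# Usage: the retry's assign error and the cleanup error are both ignored.
log = [
    LogEntry("os_setup failed"),
    LogEntry("assign: not enough nodes", attempt=2),
    LogEntry("error while tearing down", cleanup=True),
]
remaining = [e.message for e in candidate_errors(log)]
```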
  36. 14 Jun, 2006 1 commit
    • Leigh B. Stoller's avatar
      The template "datastore" ... · fe9aa6a4
      Leigh B. Stoller authored
      Each template has a datastore, which is really just a subdirectory that can
      be populated with files, and committed to the subversion archive.  Note,
      the datastore is specific to the template itself. The Template Archive link
      on the Show Template page takes you to the subdirectory, which by
      convention I am calling "datastore".
      
      The directory actually lives in /proj/pid/exp/eid/TGUID-VERS ... but that
      path is printed out for you on the archive page.
      
      Anyway, put stuff in the datastore directory, and then commit the template
      archive so there is a tag associated with it.
      
      When an instance is created, a checkout of the datastore is placed in the
      experiment directory (/proj/pid/exp/eid/template_datastore). The current
      tag (from above) is stored with the instance so that we can later recreate
      the environment for the instance, say for rerun.
      
      Tarfiles and rpms in the datastore can be referenced as xxx://foo.rpm (in
      your NS file).  tarfiles_setup transforms those when the instance is
      swapped in, sorta like it does other URLs, only it does not actually fetch
      them; it just needs to rewrite the paths so they reference the datastore.
      
      The program agent gets another environment variable so you can refer to the
      datastore without hardwiring paths ($DATASTORE). Eventually I want to move
      the checkout someplace else, but it was easy to drop it into the experiment
      directory for now.