1. 07 Dec, 2010 1 commit
  2. 20 Oct, 2010 1 commit
  3. 11 Oct, 2010 1 commit
    • Leigh B Stoller's avatar
      Work on an optimization to the perl code. Maybe you have noticed, but · 92f83e48
      Leigh B Stoller authored
      starting any one of our scripts can take a second or two. That time is
      spent including and compiling 10000s of thousands of lines of perl
      code, both from our libraries and from the perl libraries.
      Mostly this is just a maintenance thing; we just never thought about
      it much and we have a lot more code these days.
      So I have done two things.
      1) I have used SelfLoader() on some of our biggest perl modules.
         SelfLoader delays compilation until code is used. This is not as
         good as AutoLoader() though, and so I did it with just a few 
         modules (the biggest ones).
      2) Mostly I reorganized things:
        a) Split libdb into an EmulabConstants module and all the rest of
           the code, which is slowly getting phased out.
        b) Move little things around to avoid including libdb or Experiment
           (the biggest files).
        c) Change "use foo" in many places to a "require foo" in the
           function that actually uses that module. This was really a big
           win cause we have dozens of cases where we would include a
           module, but use it in only one place and typically not all.
      Most things are now starting up in 1/3 the time. I am hoping this will
      help to reduce the load spiking we see on boss, and also help with the
      upcoming Geni tutorial (which kill boss last time).
  4. 23 Jun, 2010 1 commit
  5. 14 Apr, 2010 1 commit
    • Mike Hibler's avatar
      Changes for speeding up elabinelab server setup. · 6feda7d3
      Mike Hibler authored
      Boss/ops/fs: reboot them together after setup rather than serially.
      Nodes: leave them in PXEWAIT throughout the setup, until after boss has
      been rebooted.  At that point we send them the new bootinfo RESTART command
      telling pxeboot to re-DHCP and use the new info obtained (next-server) to
      contact a potentially new boss node.  This is a quick way to switch a node
      in PXEWAIT from talking to the outer boss to talking to the inner one.
      A significant number of rinky-dink changes were needed to do this, primarily
      adding a new state, PXELIMBO, where nodes can be sent to sit until they are
      restarted.  It turns out, just putting them in an existing state such as
      PXEWAKEUP or SHUTDOWN wouldn't work, as they tend to timeout or otherwise
  6. 21 Dec, 2009 1 commit
    • Leigh B. Stoller's avatar
      New approach to dealing with nodes that fail to boot is os_setup, and · 5cf6aad2
      Leigh B. Stoller authored
      land in hwdown.
      Currently, if a node fails to boot in os_setup and the node is running
      a system image, it is moved into hwdown. 99% of the time this is
      wasted work; the node did not fail for hardware reasons, but for some
      other reason that is transient.
      The new approach is to move the node into another holding experiment,
      emulab-ops/hwcheckup. The daemon watches that experiment, and nodes
      that land in it are freshly reloaded with the default image and
      rebooted. If the node reboots okay after reload, it is released back
      into the free pool. If it fails any part of the reload/reboot, it is
      officially moved into hwdown.
      Another possible use; if you have a suspect node, you go wiggle some
      hardware, and instead of releasing it into the free pool, you move it
      into hwcheckup, to see if it reloads/reboots. If not, it lands in
      hwdown again. Then you break out the hammer.
      Most of the changes in Node.pm, libdb.pm, and os_setup are
      organizational changes to make the code cleaner.
  7. 12 Oct, 2009 1 commit
    • David Johnson's avatar
      Add the ability to load images on virtnodes. For now, we just overload · c6c57bc9
      David Johnson authored
      the tb-set-node-os command with a second optional argument; if that is
      present, the first arg is the child OS and the second is the parent OS.
      We add some new features in ptopgen (OS-parentOSname-childOSname) based
      off a new table that maps which child OSes can run on which parents, and
      the right desires get added to match.  We setup the reloads in os_setup
      along with the parents.  Also needed a new opmode, RELOAD-PCVM, to handle
      all this.
      For now, users only have to specify that their images can run on pcvms, a
      special hack for which type the images can run on.  This makes sense in
      general since there is no point conditionalizing childOS loading on
      hardware type at the moment, but rather on parentOS.  Hopefully this stuff
      wiill mostly work on shared nodes too, although we'll have to be more
      aggressive on the client side garbage collecting old frisbee'd images for
      long-lived shared hosts.
      I only made these changes in libvtop, so assign_wrapper folks are left in
      the dark.
      Currently, the client side supports frisbee.  Only in openvz for now, and
      this probably breaks libvnode_xen.pm.  Also in here are some openvz
      improvements, like ability to sniff out which network is the public
      control net, and which is the fake virtual control net.
  8. 22 Sep, 2009 1 commit
  9. 06 Mar, 2009 1 commit
  10. 05 Nov, 2008 1 commit
  11. 03 Nov, 2008 1 commit
  12. 10 Sep, 2008 1 commit
    • Kevin Atkinson's avatar
      Make nodereboot respect the waittime arg, and wait 10 minutes for PLC. · b7da57f5
      Kevin Atkinson authored
      Currently nodereboot in libreboot essentially ignores the waittime
      arg because it forks and calls node_reboot to do the real work, but
      doesn't pass on the waittime to it.  Fix this by adding a "-W"
      option to node_reboot in order to specify the waittime.
      Use this to extend the waittime for a PLC node to come up from 6 minutes to 10.
  13. 03 Sep, 2008 2 commits
  14. 14 Mar, 2008 1 commit
  15. 31 Aug, 2007 1 commit
    • David Johnson's avatar
      vnodesetup used to depend on there being an existing vnodesetup process · 2c3851ff
      David Johnson authored
      running on the node to force it to reboot.  Since planetlab has a bit of a
      history of bugs/oddities in the initscript process (and thus slivers don't
      always boot our stuff correctly), this commit special-cases the plab case
      so that even if there's not an existing vnodesetup, we kill all the sliver
      processes and rerun vnodesetup.
  16. 18 Jul, 2006 1 commit
    • Leigh B. Stoller's avatar
      Changes necessary for moving most of the stuff in the node_types · 624a0364
      Leigh B. Stoller authored
      table, into a new table called node_type_attributes, which is intended
      to be a more extensible way of describing nodes.
      The only things left in the node_types table will be type,class and the
      various isXXX boolean flags, since we use those in numerous joins all over
      the system (ie: when discriminating amongst nodes).
      For the most part, all of that other stuff is rarely used, or used in
      contexts where the information is needed, but not for type descrimination.
      Still, it made for a lot of queries to change!
      Along the way I added a NodeType library module that represents the type
      info as a perl object. I also beefed up the existing Node module, and
      started using it in more places. I also added an Interfaces module, but I
      have not done much with that yet.
      I have not yet removed all the slots from the node_types table; I plan to
      run the new code for a few days and then remove the slots.
      Example using the new NodeType object:
      	use NodeType;
      	my $typeinfo = NodeType->Lookup($type);
              if ($typeinfo->control_interface(\$control_iface) ||
                  !$control_iface) {
        	    warn "No control interface for $type is defined in the DB!\n";
      or using the Node:
      	use Node;
              my $nodeobject = Node->Lookup($node_id);
              my $imageable  = $nodeobject->NodeTypeInfo()->imageable();
              my $rebootable = $nodeobject->isrebootable();
              $nodeobject->NodeTypeAttribute("control_interface", \$control_iface);
      Lots of way to accomplish the same thing, but the main point is that the
      Node is able to override the NodeType (if it wants to), which I think is
      necessary for flexibly describing one/two of a kind things like switches, etc.
  17. 25 May, 2006 1 commit
  18. 13 Mar, 2006 1 commit
    • Leigh B. Stoller's avatar
      A set of changes to run "prepare" on a node just prior to an image · d8f8f9b4
      Leigh B. Stoller authored
      being taken.
      The basic strategy is to have node_reboot (when -p option supplied)
      invoke a special command on the node that will cause the shutdown
      procedure to run prepare as it goes single user, but before the
      network is turned off and the machine rebooted. The output of the
      prepare run is capture and send back via the tmcd BOOTLOG command and
      stored in the DB, so that create_image can dump that to the logfile
      (so that the person taking the image can know for certain that the
      prepare ran and finished okay).
      On linux this is pretty easy to arrange since reboot is actually
      shutdown and shutdown runs the K scripts in /etc/rc.d/rc6.d, and at
      the end the node is basically single user mode. I just added a new
      script to run prepare and send back the output.
      On FreeBSD this is a lot harder since there are no decent hooks.
      Instead, I had to hack up init (see tmcd/freebsd/init/{4,5,6}) with
      some simple code that looks for a command to run instead of going to a
      single user shell. The command (script) runs prepare, sends the output
      back to tmcd, and then does a real reboot.
      Okay, so how to get -p passed to node_reboot? I hacked up the
      libadminmfs code slightly to do that, with new 'prepare' argument
      option. This may not be the best approach; might have to do this as a
      real state transition if problems develop. I will wait and see.
      Also, I changed www/loadimage.php3 to spew the output of the
      create_image to the browser.
  19. 28 Dec, 2005 1 commit
  20. 22 Dec, 2005 1 commit
  21. 19 Dec, 2005 1 commit
    • Kevin Atkinson's avatar
      · 45f997fd
      Kevin Atkinson authored
      Updates to to Error Logging API Code.
      You should start seeing much better error messages coming from my
      system.  Errors coming from parse.proxy and assign (the two most
      frequent sources of errors) should now be concise and to the point.
      Errors coming from libosload/libreboot (the next most frequent source
      of errors) should now also be much better, but not perfect.  Getting
      perfect errors will likely a rework of how errors are handled in
      libosload/libreboot, just adding tberror/tbwarn/tbnotice calls is not
      enough.  I can do this at a latter date if necessary.
      A few minor database changes.
      Some changes to the API.  A few bug fixes. Lots of tberror/tbwarn/tbnotice
      added to scripts.
      Since assign is a C program, and at this time my API is perl only, I wrote a
      second wrapper around assign, assign_wrapper2.  When assign fails errors are
      now parsed in assign_wrapper2, sent to stderr and logged.  This means that
      RunAssign() just returns when assign fails rather than echoing some of
      assign.log output and then quiting.  The output to the activity log remains
      Since "parse.proxy" is run from ops I couldn't use my API in it, even though
      it is a perl program.  Instead I parse the errors coming form it in
  22. 21 Oct, 2005 1 commit
  23. 29 Jul, 2005 1 commit
    • Timothy Stack's avatar
      · cb7801fb
      Timothy Stack authored
      Fix the race between loading a mote and rebooting its host stargate.
      	* db/libdb.pm.in: Add TBNodeSubNodes function which returns the
      	list of subnodes for a given node.
      	* mote/tbuisp.in: Don't reboot the stargate anymore after loading
      	the attached mote.  The problem with the radio not working after
      	the upload should be fixed now.
      	* tbsetup/libreboot.pm.in: Check if a node's subnodes are being
      	reloaded.  If so, try to wait until they reach ISUP before
      	actually doing the reboot.
      	* tbsetup/os_setup.in: Do not skip the ISUP wait for subnodes that
      	are imageable (like motes), otherwise their allocstates are not
      	updated correctly.  Remove the robot-specific hack that	assumed
      	tbuisp would do the reboot if the attached mote was being reloaded.
  24. 06 Jan, 2005 1 commit
    • Leigh B. Stoller's avatar
      A bunch of boot changes. Read carefully. · 94ccc3f4
      Leigh B. Stoller authored
      * Add boot_errno to the nodes table so that nodes can report in a
        subcode to indicate what went wrong. At present, we do not report any
        real error codes; that is going to take some time to work out since it
        will reqiure a bunch of changes to the boot scripts.
      * Add new table node_bootlogs to store logs provided by the nodes. Not
        a full console log, but a log of the tmcd client side part. We can
        make it a full log if we want though; just means mucking about with
        the boot phase a bit.
      * Add new state transition to NORMALv2 and PCVM state machines. "TBFAILED"
        is a new state that is sent (after TBSETUP) if a node fails somewhere in
        the tmcd client side.
      * Change TBNodeStateWait() to take a list of states (instead of single
        state) and an optional pass by reference parameter to return the actual
        state that the node landed in. Change all calls to TBNodeStateWait() of
      * Change os_setup (and libreboot in wait mode) to look for both TBFAILED
        and ISUP. If a TBFAILED event is seen, we can terminate the wait early
        and not retry os_setup on physical nodes (although still retry virtual
        nodes). The nice thing about this is that the wait should terminate much
        earlier (rather then waiting for timeout), especially for virtual nodes
        which can take a really long time when there are a couple of hundred.
      * Add new routines dobooterrno() and dobootlog() to tmcd. Bump version
        number and increase the buffer size to allow for the larger packets that
        a console log wikk generate (added MAXTMCDPACKET variable, set to 0x4000).
      * Add new -f option to tmcc to specify a datafile to send along as the last
        argument to tmcd. This is more pleasing then trying to send a console log
        in on the command line. For example: "tmcc -f /tmp/log BOOTLOG" will send
        a BOOTLOG command along with the contents of /tmp/log.
        Also close the write side of the pipe so that server sees EOF on
        read. See aside comment below.
      * Changes to rc.bootsetup:
           1. Use perl tricks to capture all output, duping to the console and to
              a log file in /var/emulab/logs.
           2. On any error, send a status code (boot_errno) and the bootlog to
           3. Generate a TBFAILED state transition.
      * Changes to rc.injail:
           1. Same as rc.bootsetup, but do not send log files; that would pummel
              boss. Leave them on the physical node.
      * Change vnodesetup (which calls mkjail) to watch for any error and send a
        TBFAILED state transition. This should catch almost all errors, and
        dramatically reduce waiting when something fails.
      * Changes to rc.cdboot are essentially the same as rc.bootsetup, although a
        bootlog is sent all the time (success or failure), and I do not generate
        a boot_errno yet. Also, instead of TBFAILED, generate a PXEFAILED state
        since the CDROM is actually operating within the PXEFBSD opmode. I have
        yet to work this into the rest of the system though; waiting to get a new
        CD built and actually experiment with it.
      * Add new menu option and web page to display the node bootlog. We store
        only the lastest bootlog, but maybe someday store more then one. Display
        boot_errno on node page.
      Aside: I made a big mistake in the tmcd protocol; I did not envision
      passing more then a small amount of data (one fragment) and so I do not
      include a record terminator (ie: close of the write side on the client
      sends EOF) or a size field at the beginning. No big deal since small
      requests are sent in one fragment and the server sees the entire
      thing. Well, with a large console log, that will end up as multiple
      fragments, and the server will often not get the entire thing on the first
      read, and there are no subsequent reads (with no EOF or known size, it
      would block forever). Well, fixing this in a backwards compatable manner
      (for old images) was way too much pain. Instead, tmcc now closes the write
      side, and the server does subsequent reads *only* in the new dobbootlog()
      routine. Note that it *is* possible to fix this in a backwards compatable
      manner, but I did not want to go down that path just yet.
  25. 10 Dec, 2004 1 commit
  26. 06 Dec, 2004 1 commit
  27. 08 Mar, 2004 1 commit
    • Leigh B. Stoller's avatar
      Converted os_load and node_reboot into libraries. Basically that meant · 9bfe3d61
      Leigh B. Stoller authored
      splitting the existing code between a frontend script that parses arguments
      and does taint checking, and a backend library where all the work is done
      (including permission checks). The interface to the libraries is simple
      right now (didn't want to spend a lot of time on designing interface
      without knowing if the approach would work long term).
      	use libreboot;
      	use libosload;
              nodereboot(\%reboot_args, \%reboot_results);
              osload(\%reload_args, \%reload_results);
      Arguments are passed to the libraries in the form of a hash. For example,
      in os_setup:
      	$reload_args{'debug'}     = $dbg;
      	$reload_args{'asyncmode'} = 1;
      	$reload_args{'imageid'}   = $imageid;
      	$reload_args{'nodelist'}  = [ @nodelist ];
      Results are passed back both as a return code (-1 means total failure right
      away, while a positive argument indicates the number of nodes that failed),
      and in the results hash which gives the status for each individual node. At
      the moment it is just success or failure (0 or 1), but in the future might
      be something more meaningful.
      os_setup can now find out about individual failures, both in reboot and
      reload, and alter how it operates afterwards. The main thing is to not wait
      for nodes that fail to reboot/reload, and to terminate with no retry when
      this happens, since at the moment it indicates an unusual failure, and it
      is better to terminate early. In the past an os_load failure would result
      in a tbswap retry, and another failure (multiple times). I have already
      tested this by trying to load images that have no file on disk; it is nice
      to see those failures caught early and the experiment failure to happen
      much quicker!
      A note about "asyncmode" above. In order to promote parallelism in
      os_setup, asyncmode tells the library to fork off a child and return
      immediately. Later, os_setup can block and wait for status by calling
      back into the library:
      	my $foo = nodereboot(\%reboot_args, \%reboot_results);
      If you are wondering how the child reports individual node status back to
      the parent (so it can fill in the results hash), Perl really is a kitchen
      sink. I create a pipe with Perl's pipe function and then fork a child to so
      the work; the child writes the results to the pipe (status for each node),
      and the parent reads that back later when nodereboot_wait() is called,
      moving the results into the %reboot_results array. The parent meanwhile can
      go on and in the case of os_setup, make more calls to reboot/reload other
      nodes, later calling the wait() routines once all have been initiated.
      Also worth noting that in order to make the libraries "reentrant" I had to
      do some cleaning up and reorganizing of the code. Nothing too major though,
      just removal of lots of global variables. I also did some mild unrelated
      cleanup of code that had been run over once too many times with a tank.
      So how did this work out. Well, for os_setup/os_load it works rather
      node_reboot is another story. I probably should have left it alone, but
      since I had already climbed the curve on osload, I decided to go ahead and
      do reboot. The problem is that node_reboot needs to run as root (its a
      setuid script), which means it can only be used as a library from something
      that is already setuid. os_setup and os_load runs as the user. However,
      having a consistent library interface and the ability to cleanly figure out
      which individual nodes failed, is a very nice thing.
      So I came up with a suitable approach that is hidden in the library. When the
      library is entered without proper privs, it silently execs an instance of
      node_reboot (the setuid script), and then uses the same trick mentioned
      above to read back individual node status. I create the pipe in the parent
      before the exec, and set the no-close-on-exec flag. I pass the fileno along
      in an environment variable, and the library uses that to the write the
      results to, just like above. The result is that os_setup sees the same
      interface for both os_load and node_reboot, without having to worry that
      one or the other needs to be run setuid.