1. 25 Oct, 2007 1 commit
  2. 17 Sep, 2007 1 commit
  3. 16 Aug, 2007 1 commit
  4. 02 Aug, 2007 1 commit
  5. 25 Apr, 2007 1 commit
  6. 05 Apr, 2007 1 commit
  7. 08 Sep, 2006 1 commit
    • Kirk Webb's avatar
      · 3a3c95fb
      Kirk Webb authored
      Parallelize the setup of plab vnodes alongside the loading of local
      physical nodes.  We fork vnode_setup to operate on the plab vnodes just
      before firing off local reload/reboot/reconfig operations.  The status
      of the plab vnode setup is checked just before firing off vnode_setup
      for any local vnodes.  The ISUP wait for plab vnodes continues to fall
      within the same stage as waiting for local vnodes.  New arguments have been
      added to vnode_setup to tell it to only operate on specific vnode types:
      '-j' for local jail nodes, and '-p' for plab nodes.  If neither is
      specified, the default is to operate on all types.
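
      The type-selection rule can be sketched as follows.  This is an
      illustrative Python sketch, not the Perl code of vnode_setup itself:
      the function name select_vnode_types and the labels "jail" and "plab"
      are assumptions made for the example.  Only the '-j' and '-p' flags and
      the default-to-all behavior come from the commit message.

```python
import argparse

# Hypothetical labels for the two vnode types named in the commit message.
ALL_TYPES = ("jail", "plab")


def select_vnode_types(argv):
    """Return the tuple of vnode types to operate on for the given arguments."""
    parser = argparse.ArgumentParser(prog="vnode_setup")
    parser.add_argument("-j", dest="jail", action="store_true",
                        help="operate on local jail vnodes")
    parser.add_argument("-p", dest="plab", action="store_true",
                        help="operate on plab vnodes")
    opts = parser.parse_args(argv)

    selected = tuple(t for t in ALL_TYPES if getattr(opts, t))
    # With neither flag given, fall back to every type.
    return selected or ALL_TYPES
```

      For example, select_vnode_types(["-p"]) selects only the plab vnodes,
      which is what a forked, plab-only run would ask for.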
      3a3c95fb
  8. 21 Aug, 2006 1 commit
    • Kevin Atkinson's avatar
      · 9b718661
      Kevin Atkinson authored
      Avoid counting planetlab vnodes twice.
      9b718661
  9. 16 Aug, 2006 1 commit
    • Kevin Atkinson's avatar
      - Added tbreport database schema (added three tables), storage for · 9c5d3308
      Kevin Atkinson authored
        tbreport errors & context.
      
      - Modified fatal() in swapexp, batchexp, and tbprerun, and die_noretry()
        in os_setup to pass hash parameter to tblog functions.
      
      - Added tbreport error & context information for select errors in
        swapexp, tbswap, assign_wrapper2, snmpit_lib, snmpit, batchexp,
        assign_wrapper, os_setup, parse-ns, & tbprerun.
      
      - Added assign error parser in assign_wrapper2.
      
      - Added parse.tcl error parser in parse-ns.
      
      - Added severity constants for tbreport in libtblog_simple.
      
      - Added tbreport() function & context table mapping for reporting
        discrete error types to libtblog.
      9c5d3308
  10. 27 Jul, 2006 1 commit
    • Kevin Atkinson's avatar
      · 0e5e57e6
      Kevin Atkinson authored
      Small bug fixes in cleanup in os_setup summary code.
      0e5e57e6
  11. 26 Jul, 2006 2 commits
    • Kevin Atkinson's avatar
      · 9237c34b
      Kevin Atkinson authored
      Fix syntax error.
      9237c34b
    • Kevin Atkinson's avatar
      · cbf3c5d4
      Kevin Atkinson authored
      swapexp: The previous commit, which added a message about the recovery
      action when a swap-modify failed to the top of the email, did not
      catch all of the possible cases.  Added the case when the experiment is
      not swapped in.
      
      os_setup: Refactored/rewrote os_setup error summary code.  Distinguish
      the case when nodes fail to properly load the os and when they don't
      boot after loading the os.
      cbf3c5d4
  12. 21 Jul, 2006 1 commit
    • Kevin Atkinson's avatar
      · 2093eb10
      Kevin Atkinson authored
      Don't use "no warnings 'uninitialized'" since that is a perl 5.6+ feature
      and some are still using an ancient version of perl.
      2093eb10
  13. 20 Jul, 2006 3 commits
    • Kevin Atkinson's avatar
      · 494debf6
      Kevin Atkinson authored
      length => $length in os_setup!
      494debf6
    • Kevin Atkinson's avatar
      · 6c61b70c
      Kevin Atkinson authored
      Fixed bug in summary of failed nodes when there are more than can fit on a line.
      6c61b70c
    • Kevin Atkinson's avatar
      · 5710c340
      Kevin Atkinson authored
      Various tblog changes:
      
      Added message about recovery action when a swap-modify failed to the
      top of the email.
      
      Fine tuned os_setup summary error.  Added (possibly partial) list of
      nodes that fail; if a large number fail only show as many as will
      fit on a single line.  Other tweaks.
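
      A minimal sketch of the "as many as fit on one line" rule, in Python
      rather than the Perl used by os_setup.  The function name, the 78
      column default, the "Failed nodes: " prefix and the "..." marker are
      all assumptions for illustration; the commit message does not give the
      actual format.

```python
def summarize_failed_nodes(nodes, width=78, prefix="Failed nodes: ", marker="..."):
    """Return one line listing as many node names as fit within width columns."""
    shown = []
    for i, name in enumerate(nodes):
        is_last = (i == len(nodes) - 1)
        # Unless this is the last node, leave room for the truncation marker.
        parts = shown + [name] + ([] if is_last else [marker])
        if len(prefix + " ".join(parts)) > width:
            break
        shown.append(name)
    if len(shown) < len(nodes):
        shown.append(marker)  # signal that the list is partial
    return prefix + " ".join(shown)
```

      Reserving space for the marker before adding each name keeps the
      result within the width even when the list has to be cut short.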
      
      Flagged assign_wrapper errors of an Invalid OS as user errors.
      5710c340
  14. 18 Jul, 2006 1 commit
    • Leigh B. Stoller's avatar
      Changes necessary for moving most of the stuff in the node_types · 624a0364
      Leigh B. Stoller authored
      table, into a new table called node_type_attributes, which is intended
      to be a more extensible way of describing nodes.
      
      The only things left in the node_types table will be type,class and the
      various isXXX boolean flags, since we use those in numerous joins all over
      the system (ie: when discriminating amongst nodes).
      
      For the most part, all of that other stuff is rarely used, or used in
      contexts where the information is needed, but not for type discrimination.
      Still, it made for a lot of queries to change!
      
      Along the way I added a NodeType library module that represents the type
      info as a perl object. I also beefed up the existing Node module, and
      started using it in more places. I also added an Interfaces module, but I
      have not done much with that yet.
      
      I have not yet removed all the slots from the node_types table; I plan to
      run the new code for a few days and then remove the slots.
      
      Example using the new NodeType object:
      
              use NodeType;

              my $typeinfo = NodeType->Lookup($type);
              my $control_iface;

              if ($typeinfo->control_interface(\$control_iface) ||
                  !$control_iface) {
                  warn "No control interface for $type is defined in the DB!\n";
              }
      
      or using the Node:
      
              use Node;

              my $nodeobject = Node->Lookup($node_id);
              my $imageable  = $nodeobject->NodeTypeInfo()->imageable();
      or
              my $rebootable = $nodeobject->isrebootable();
      or
              $nodeobject->NodeTypeAttribute("control_interface", \$control_iface);
      
      Lots of ways to accomplish the same thing, but the main point is that the
      Node is able to override the NodeType (if it wants to), which I think is
      necessary for flexibly describing one/two of a kind things like switches, etc.
      624a0364
  15. 10 Jul, 2006 1 commit
  16. 08 Jul, 2006 1 commit
  17. 07 Jul, 2006 1 commit
  18. 05 Jul, 2006 2 commits
    • Kevin Atkinson's avatar
      · 43c0b17f
      Kevin Atkinson authored
      Fixed perl warning about Use of uninitialized value in numeric gt.
      43c0b17f
    • Kevin Atkinson's avatar
      · 183040de
      Kevin Atkinson authored
      Many changes to tblog code.  Database update needed:
      
      1) Added summary of failed nodes in os_setup.  The cause of the error is now
      classified as "user" if it is only user images that failed and the user
      image failed on every pc of a particular type.  Otherwise I leave the cause
      as "unknown" since it is really hard to tell what the real cause is.
      
      2) Raised the confidence threshold for most errors so that they will appear
      on the top.
      
      3) Added a special error when an experiment is canceled.  The cause is
      "canceled" and testbed-ops won't see these errors.
      
      4) Fixed a bug in assign_wrapper where it will incorrectly report "This
      experiment cannot be instantiated on this testbed..." when really the user
      canceled the swapin.
      
      5) Fixed a bug where os_setup errors were being incorrectly reported as
      assign errors.  This happens when os_setup fails for some reason and
      tbswap tries again, but the second time around there are not enough nodes.
      So the last error is coming from assign even though the true cause of the
      error is due to failed nodes.  The fix for this involved adding a new column
      to the log table, "attempt", which will be 1 for the first attempt and then
      incremented for each new attempt.  tblog_find_error will then simply ignore
      any errors with "attempt > 1".
      
      6) Also fixed a potential problem when there is an error during the cleanup
      phase by adding another column "cleanup".  tblog_find_error will
      also ignore any errors with the cleanup bit set.
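
      The filtering described in items 5 and 6 can be sketched like this.
      It is a Python illustration, not the real tblog_find_error: the row
      layout (dicts with "attempt", "cleanup" and "message" keys) and the
      choice of returning the last remaining error are assumptions, and the
      confidence handling from item 2 is left out entirely.

```python
def find_error(log_rows):
    """Return the last error logged during the first attempt, outside cleanup.

    Each row is a dict with an integer "attempt" (1 for the first attempt)
    and a boolean "cleanup" flag, mirroring the two new log columns.
    """
    candidates = [
        row for row in log_rows
        if row["attempt"] <= 1 and not row["cleanup"]
    ]
    return candidates[-1] if candidates else None
```

      With this rule, an assign failure logged on a retry no longer hides
      the os_setup failure that caused the retry in the first place.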
      183040de
  19. 14 Jun, 2006 1 commit
    • Leigh B. Stoller's avatar
      The template "datastore" ... · fe9aa6a4
      Leigh B. Stoller authored
      Each template has a datastore, which is really just a subdirectory that can
      be populated with files, and committed to the subversion archive.  Note,
      the datastore is specific to the template itself. The Template Archive link
      on the Show Template page takes you to the subdirectory, which by
      convention I am calling "datastore".
      
      The directory actually lives in /proj/pid/exp/eid/TGUID-VERS ... but that
      path is printed out for you on the archive page.
      
      Anyway, put stuff in the datastore directory, and then commit the template
      archive so there is a tag associated with it.
      
      When an instance is created, a checkout of the datastore is placed in the
      experiment directory (/proj/pid/eid/exp/template_datastore). The current
      tag (from above) is stored with the instance so that we can later recreate
      the environment for the instance, say for rerun.
      
      Tarfiles and rpms in the datastore can be referenced as xxx://foo.rpm (in
      your NS file).  tarfiles_setup transforms those when the instance is
      swapped in, sorta like it does other URLs, only it does not actually fetch
      them; it just needs to rewrite the paths so they reference the datastore.
      
      The program agent gets another environment variable so you can refer to the
      datastore without hardwiring paths ($DATASTORE). Eventually I want to move
      the checkout someplace else, but it was easy to drop it into the experiment
      directory for now.
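
      As a small illustration of why the environment variable helps, a
      program started by the program agent could locate datastore files like
      this.  The helper name datastore_path is hypothetical; only the
      DATASTORE variable itself comes from the commit message.

```python
import os


def datastore_path(relative_name, environ=os.environ):
    """Resolve a file inside the datastore checkout via $DATASTORE."""
    base = environ.get("DATASTORE")
    if base is None:
        # Outside the program agent the variable is presumably unset.
        raise RuntimeError("DATASTORE is not set")
    return os.path.join(base, relative_name)
```

      Because the path is read from the environment, the script keeps
      working if the checkout is later moved out of the experiment directory.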
      fe9aa6a4
  20. 15 May, 2006 1 commit
    • Mike Hibler's avatar
      Initial "Inner Plab" support. In your NS file, you declare one node: · 9512772e
      Mike Hibler authored
      tb-set-node-plab-role $plc plc
      
      to make it the PLC node.  Then any number of other nodes are declared as:
      
      tb-set-node-plab-role $plab1 node
      
      to make them inner plab nodes.  Unlike elabinelab, there is no magic
      "tb-plab-in-elab" command which implies the topology, you put all the
      plab nodes in a LAN or whatever yourself.  This may or may not be a good idea.
      
      Anyway, these NS commands set DB state in virt_nodes and reserved much like
      elabinelab.  During swapin, the dhcpd.conf file is rewritten so that
      inner plab nodes have their "filename" set to "pxelinux.0" and their
      "next-server" set to the designated PLC node.  The PLC node will then be
      loaded/booted before anything is done to the inner-plab nodes.  After
      it comes up, the inner plab nodes are rebooted and declared as up.
      There is a new tmcd command "eplabconfig" (suggestions for a new name
      welcome!), which returns info like:
      
          NAME=plc ROLE=plc IP=155.98.36.3 MAC=00d0b713f57d
          NAME=plab1 ROLE=node IP=155.98.36.10 MAC=0002b3877a4f
          NAME=plab2 ROLE=node IP=155.98.36.34 MAC=00d0b7141057
      
      to just the PLC node (returns nothing to any other node).
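
      On the PLC node, output in this shape is easy to turn into records.
      The following Python sketch assumes the reply is exactly
      whitespace-separated KEY=VALUE pairs, one node per line, as in the
      sample above; the function name parse_eplabconfig is made up for the
      example and how the text is fetched from tmcd is not shown.

```python
def parse_eplabconfig(text):
    """Parse lines of KEY=VALUE pairs into a list of dicts, one per node."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on whitespace, then split each pair at the first "=".
        fields = dict(pair.split("=", 1) for pair in line.split())
        nodes.append(fields)
    return nodes
```

      The resulting MAC and IP fields for the ROLE=node entries are the sort
      of information a pxelinux configuration on the PLC node would need.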
      
      The implications of this setup are:
      
       * The PLC node must act as a TFTP server as we have discussed in the past.
         The TMCC info above is hopefully enough to configure pxelinux, if not
         we can change it.
      
       * The PLC node is responsible for loading the disks of inner plab nodes.
         This is implied by the setup, where we change the dhcpd.conf file before
         doing anything to the inner nodes.  Thus, once the inner nodes are
         rebooted, they will be talking pxelinux with PLC, and not to boss.
         This step is dubious, as we could no doubt load the disks faster than
         whatever plab uses can.  But it simplified the setup (and is more
         realistic!).  The alternative, which is something that might be useful
         anyway, is to introduce a "state" after which nodes have been reloaded
         but before they are rebooted.  With that, we can reload the plab nodes
         and then change the dhcpd.conf file so when they reboot they start
         talking to the PLC.
      9512772e
  21. 16 Feb, 2006 1 commit
  22. 05 Jan, 2006 1 commit
    • Kevin Atkinson's avatar
      · b958376f
      Kevin Atkinson authored
      Hopefully fix bug in die_noretry which causes an experiment to incorrectly
      report an error as:
        Can't locate object method "tberror" via package "No image can be found
        for arms-bb-mysql on pc29!" at /usr/testbed/libexec/os_setup line 41.
        (os_setup)
      b958376f
  23. 22 Dec, 2005 1 commit
  24. 21 Dec, 2005 1 commit
    • Kirk Webb's avatar
      · 9256faf3
      Kirk Webb authored
      Add pid/eid to subject line of virtual node failure notification message.
      9256faf3
  25. 19 Dec, 2005 1 commit
    • Kevin Atkinson's avatar
      · 45f997fd
      Kevin Atkinson authored
      Updates to Error Logging API Code.
      
      You should start seeing much better error messages coming from my
      system.  Errors coming from parse.proxy and assign (the two most
      frequent sources of errors) should now be concise and to the point.
      Errors coming from libosload/libreboot (the next most frequent source
      of errors) should now also be much better, but not perfect.  Getting
      perfect errors will likely require a rework of how errors are handled in
      libosload/libreboot; just adding tberror/tbwarn/tbnotice calls is not
      enough.  I can do this at a later date if necessary.
      
      A few minor database changes.
      
      Some changes to the API.  A few bug fixes. Lots of tberror/tbwarn/tbnotice
      added to scripts.
      
      Since assign is a C program, and at this time my API is perl only, I wrote a
      second wrapper around assign, assign_wrapper2.  When assign fails errors are
      now parsed in assign_wrapper2, sent to stderr and logged.  This means that
      RunAssign() just returns when assign fails rather than echoing some of
      assign.log output and then quitting.  The output to the activity log remains
      unchanged.
      
      Since "parse.proxy" is run from ops I couldn't use my API in it, even though
      it is a perl program.  Instead I parse the errors coming from it in
      parse-ns.
      45f997fd
  26. 15 Dec, 2005 2 commits
    • Kirk Webb's avatar
      · 41c54939
      Kirk Webb authored
      The revived Plab interface is here!
      
      Lots of updates to the plab backend, including improved plab <-> elab node
      id translation and update handling.  Includes support for the current PLC
      API, and the new pl_conf node manager interface API.  Several more db library
      routines were ported from the perl library to the python one to support the
      new code (mostly the node_id tracking stuff).  Fixes to the client side and
      also a rootball creation cleanup (binaries removed from the CVS repo).
      
      There are also enhancements to the experiment view page for experiments
      including plab nodes: site and widearea hostname are now displayed along
      with the other node information.
      
      Note that the way setup timeout for vnodes is calculated has been changed a
      bit.  Instead of using a hardwired base timeout, the base timeout is now
      based on the reload_waittime database field, which comes from the 'OS'
      (e.g., FBSD-JAIL, RHL-PLAB) the vnode runs.
      
      The default max duration for a plab slice created through the plab_ez interface
      is set to 1 year, and linktest is currently disabled and hidden through
      the ez interface.
      
      There is still work to do, but this checkin brings with it a functional
      plab portal!
      41c54939
    • Leigh B. Stoller's avatar
      Commit the current archive support. Currently exposed to only · 6a0a1eb7
      Leigh B. Stoller authored
      studly users in the testbed project on the mainsite.
      6a0a1eb7
  27. 06 Dec, 2005 1 commit
    • Mike Hibler's avatar
      Phase II in disk state saving for swapout. · ed0d25b4
      Mike Hibler authored
      Exec summary: after this checkin, the infrastructure exists (once enabled)
      to create swapout-time "delta" images for all machines in experiments.
      There is only a single, cumulative swap image per node (i.e., all diffs
      are from the base image, not from the previous swap).
      
      What doesn't yet exist, is the mechanism for reloading the delta at
      swapin time.  That is Phase III.
      
      The nitty-gritty:
      
      1. Keep disk image signature files for all nodes in an experiment.
      
         New fields in the DB to track, for each disk partition, what image the
         partition was loaded from.  This enables us at swapin or os_load time to
         create signature files in /proj/<pid>/exp/<eid>/swapinfo for the current
         contents of a node disk/partition.  All nodes with the same image loaded
         will share (via symlink) the same signature file.  TODO: no longer
         referenced signature files should be removed.
      
         Signature info is only collected in the swapinfo directory if the
         experiment is set to have disk state saving enabled (see #5 below).
         Info consists of the <vname>.sig file, which is the file created
         by imagehash, and <vname>.part which says what the root disk is
         for the node and whether to look at the whole disk or just a single
         partition when crafting the delta image.
      
      2. Swapout-time hook for creating swapout image.
      
         If the experiment is marked as allowing disk state saving, tbswap
         will arrange to run and then monitor the create-swapimage command
         on each node.  This script will run the modified version of imagezip
         which uses the signature file to create a delta image.
      
         The command to run and maximum timeout are specified via sitevars
         (previously checked in).  Note that the tbswap script currently has
         special knowledge of /usr/local/bin/create-swapimage as a swapout
         time script.  If the swap/swapout_command sitevar is set to that,
         Magic Stuff shall occur (i.e. it will monitor the command and make
         periodic reports of progress).  The sitevars are a total hack and
         will disappear at some point.
      
      3. Client-side script for creating swapout image.
      
         os/create-swapimage, very similar to create-image.  Uses the info
         stashed in /proj/..blahblah../swapinfo to create a delta image.
      
         XXX fer now hack: the script first looks in /proj/<pid>/bin for an
         imagezip binary to use.  Failing that, it uses the one in the MFS.
         This allows for easier development of the imagezip changes (i.e.,
         don't have to update the MFS every time).
      
      4. Auto creation of signature files for new images.
      
         The create_image script (the one that runs on boss when creating images
         for users) has been modified to automatically create a signature via
         imagehash.  The .sig file winds up in /usr/testbed/images/sigs or
         in /proj/<pid>/images/sigs.  From there it will be copied at swapin/os_load
         time to the per-expt swapinfo directory for any node that uses the images.
      
         The process for creating standard system images (aka, "Mike") has not
         yet been modified.  When the image creation/installation procedure
         is formalized into a script, this will be done.
      
      5. Web changes to set/clear saving of disk state at swapout time.
      
         Add a checkbox to the experiment create page to allow setting "save
         swap state".  Also added to the experiment modify page, but currently
         "if (0)"ed out as it will need some additional support.  The showstuff
         page will show it.
      
         Taking a page from Leigh's hack book, if EXPOSESTATESAVE in defs.php3
         is set to zero (as it is now), then the checkbox doesn't appear in the
         create experiment page except for STUDLY users.
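
      The idea behind a signature-driven delta image can be sketched in a
      few lines of Python.  This is only a conceptual illustration: the real
      imagehash signature format and the imagezip delta format are not
      described in the commit message, so the fixed 4096-byte blocks, the use
      of SHA-1, and both function names are assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical block size for the example


def make_signature(data, block_size=BLOCK_SIZE):
    """Hash each fixed-size block; the list of digests is the signature."""
    return [hashlib.sha1(data[off:off + block_size]).hexdigest()
            for off in range(0, len(data), block_size)]


def make_delta(current, base_signature, block_size=BLOCK_SIZE):
    """Return {block_index: bytes} for blocks that differ from the base image."""
    delta = {}
    for index, off in enumerate(range(0, len(current), block_size)):
        block = current[off:off + block_size]
        digest = hashlib.sha1(block).hexdigest()
        # Keep blocks that changed or that lie beyond the end of the base.
        if index >= len(base_signature) or base_signature[index] != digest:
            delta[index] = block
    return delta
```

      Since every delta is computed against the base image signature, the
      result is cumulative: one delta per node is enough, whatever number of
      swapouts came before.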
      ed0d25b4
  28. 04 Nov, 2005 1 commit
  29. 28 Sep, 2005 1 commit
  30. 29 Jul, 2005 1 commit
    • Timothy Stack's avatar
      · cb7801fb
      Timothy Stack authored
      Fix the race between loading a mote and rebooting its host stargate.
      
      	* db/libdb.pm.in: Add TBNodeSubNodes function which returns the
      	list of subnodes for a given node.
      
      	* mote/tbuisp.in: Don't reboot the stargate anymore after loading
      	the attached mote.  The problem with the radio not working after
      	the upload should be fixed now.
      
      	* tbsetup/libreboot.pm.in: Check if a node's subnodes are being
      	reloaded.  If so, try to wait until they reach ISUP before
      	actually doing the reboot.
      
      	* tbsetup/os_setup.in: Do not skip the ISUP wait for subnodes that
      	are imageable (like motes), otherwise their allocstates are not
      	updated correctly.  Remove the robot-specific hack that assumed
      	tbuisp would do the reboot if the attached mote was being reloaded.
      cb7801fb
  31. 22 Feb, 2005 1 commit
    • Leigh B. Stoller's avatar
      Okay, first attempt to deal with os_setup waittimes on a per node_type · facc7acd
      Leigh B. Stoller authored
      and per OSID basis.
      
      * Added bios_waittime to node_types table and reboot_waittime to
        os_info table. Initialized them as follows:
      
              update node_types set bios_waittime=60 where class='pc';
              update os_info set reboot_waittime=150 where OS='Linux' or
                OS='FreeBSD' or OS='NetBSD';
              update os_info set reboot_waittime=180 where OS='Windows';
      
      * The bios waittime can be edited via the web interface.
      
      * The reboot waittime can be set only by admin people right now; this
        is another case of something that maybe the user should not see
        cause its too much stuff? Instead, default values are established in
        www/osiddefs.php3.
      
      * os_setup computes its per-node waittime as:
      
      	(bios_waittime + reboot_waittime) * 2
      
        as per Mike's suggestion. If either value is not defined in the DB,
        it defaults to the original 7 minute value.
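
      A worked version of the waittime computation, as a Python sketch.  The
      function name is hypothetical, and treating the stored values as
      seconds is an assumption (the commit message does not state the unit).

```python
DEFAULT_WAITTIME = 7 * 60  # the original fixed 7 minutes, assuming seconds


def node_waittime(bios_waittime, reboot_waittime):
    """Per-node wait: (bios + reboot) * 2, or the old default if either is unset."""
    if bios_waittime is None or reboot_waittime is None:
        return DEFAULT_WAITTIME
    return (bios_waittime + reboot_waittime) * 2
```

      With the initial values above, a pc running Linux gets
      (60 + 150) * 2 = 420 and a pc running Windows gets
      (60 + 180) * 2 = 480.  If the unit is indeed seconds, the first case
      matches the old 7 minute value.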
      facc7acd
  32. 04 Feb, 2005 1 commit
    • Timothy Stack's avatar
      · 6f545cf0
      Timothy Stack authored
      Some more robot integration.
      
      	* event/lib/event.h, event/lib/event.c: Add some
      	event_notification creation functions that get used in
      	event-sched.
      
      	* event/sched/event-sched.c, event/sched/rpc.h,
      	event/sched/rpc.cc: Sync the start of event time with the
      	robots reaching their initial positions.  This is done by
      	creating a master event-sequence that takes care of sending
      	the SETDEST events and then starting the Simulator timeline.
      
      	* mote/tbuisp.in: Use node_reboot instead of ssh'ing in.
      
      	* robots/GNUmakefile.in: Don't build tbsetdest if ulsshxmlrpcpp is
      	not available.
      
      	* robots/emc/loclistener.in: Clear the destination values for a
      	node when it reaches its destination.
      
      	* robots/primotion/garcia-pilot.cc,
      	robots/primotion/pilotClient.cc: Some cleanup and debugging.
      
      	* robots/primotion/wheelManager.cc: Check the rear sensors for
      	obstructions and then decide which way to pivot.
      
      	* robots/rmcd/pilotConnection.h, robots/rmcd/pilotConnection.c,
      	robots/rmcd/rclip.h: Even more tweaking.
      
      	* robots/tbsetdest/tbsetdest.cc: Don't generate points that are
      	outside the camera bounds or inside an obstacle.
      
      	* robots/vmcd/visionTrack.c, robots/vmcd/vmcd.h,
      	robots/vmcd/vmcd.c:  Add more debugging output.
      
      	* tbsetup/os_setup.in: Removed the robot hack used when deciding
      	which nodes to reconfig/reboot.  Added a robot hack to avoid
      	rebooting a robot whose mote is being os_load'd, since it would
      	interrupt tbuisp which does the reboot anyways.  Also fixed a
      	small typo.
      
      	* tbsetup/ns2ir/node.tcl, tbsetup/ns2ir/sim.tcl.in: Oops, forgot
      	to convert degrees to radians.
      
      	* tbsetup/ns2ir/topography.tcl: When checking for node destination
      	points in obstacles, include the implicit exclusion zone.
      
      	* tmcd/common/bootsubnodes: Add empty "mote" case.
      
      	* tmcd/linux-sg/GNUmakefile.in: Make some of the /etc subdirs when
      	doing the install.
      
      	* tmcd/linux-sg/rc.stargate: Start garcia-pilot.
      
      	* vis/floormap.in: Add options for showing the camera bounds,
      	obstacle exclusion zones, and displaying vnames instead of pnames.
      
      	* www/ledpipe.php3: Finally figured out how to use a socket
      	instead of popening a perl script.
      
      	* www/robotmap.php3: Add checkboxes for displaying/not displaying
      	the camera bounds and obstacle exclusion zones.  Add a legend
      	showing what actual vs. destination points are.  Pass pid/eid
      	through to vis/floormap if it is given.
      
      	* www/showexp.php3: Add a "Robot Map" link to experiments that
      	have allocated garcias.
      
      	* www/floormap/map_legend_node.gif,
      	www/floormap/map_legend_node_dst.gif: Icons used by the robot map
      	legend.
      
      	* www/floormap/robots-4.jpg: Added an obstacle around the entryway
      	so people are slightly less likely to trip over the robots.  Added
      	a coordinate system legend to the top left corner.
      
      	* www/tutorial/mobilewireless.php3: Add links to David's movie of
      	the robot making its way around the pillar.
      
      	* www/tutorial/robot_anim.gif: A nifty gifanim clip of the robot
      	movie.
      
      	* xmlrpc/emulabserver.py.in: Pull the camera data from the DB,
      	instead of returning hardcoded stuff.
      6f545cf0
  33. 24 Jan, 2005 1 commit
    • Timothy Stack's avatar
      · 3c1a5bad
      Timothy Stack authored
      Robot related stuff: power via e-mail, client-install fixups, checking
      coords against camera boundaries.
      
      	* configure, configure.in: Add tbsetup/power_mail.pm to the list
      	of template files.
      
      	* doc/cross-compiling.txt: More stargate notes.
      
      	* event/sched/rpc.cc: Updates for the addition of the cameras
      	table.
      
      	* robots/GNUmakefile.in, robots/emc/GNUmakefile.in,
      	robots/mtp/GNUmakefile.in, robots/rmcd/GNUmakefile.in,
      	robots/tbsetdest/GNUmakefile.in, robots/vmcd/GNUmakefile.in:
      	client-install fixups.
      
      	* tbsetup/GNUmakefile.in: Add power_mail.pm.
      
      	* tbsetup/os_setup.in: Don't skip reboot of robots anymore.
      
      	* tbsetup/power.in: Add special case for a power_id of "mail",
      	which calls into the power_mail.pm backend.
      
      	* tbsetup/power_mail.pm.in: E-mail backend for power, it sends an
      	e-mail to tbops and waits for the outlets.last_power value to be
      	updated from the power.php3 web page.
      
      	* tbsetup/ns2ir/parse-ns.in: Add the contents of the cameras table
      	to the TBCOMPAT namespace.
      
      	* tbsetup/ns2ir/sim.tcl.in: More checking of "setdest" inputs.
      
      	* tbsetup/ns2ir/topography.tcl: Update the checkdest method to
      	check destination points against the camera list.
      
      	* www/powertime.php3: Webpage used to update the last power time
      	for nodes.
      
      	* www/shownode.php3: Add "Update Power Time" menu button.
      3c1a5bad
  34. 19 Jan, 2005 1 commit
    • Leigh B. Stoller's avatar
      Some changes for Mike and Firewalled ElabInElab experiments. I need · 263e2cb3
      Leigh B. Stoller authored
      all of the nodes to boot up normally before I can turn them into an
      inner elab (later, after os_setup). That cannot happen with the
      firewall rules in place. So, when an experiment is firewalled, reorder
      the boot/wait list and wait for the firewall node first. Once that
      hits ISUP, tell the elabinelab code (via the -f option) so that it can do what
      it needs, which right now means an ssh over to the firewall node to
      temporarily disable all the rules.
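
      The reordering can be pictured with a short Python sketch.  The two
      callbacks stand in for "wait until the node reaches ISUP" and "tell the
      elabinelab code the firewall is up"; their names and the function name
      are assumptions for illustration, not the os_setup implementation.

```python
def wait_for_nodes(nodes, firewall_node, wait_isup, on_firewall_up):
    """Wait for the firewall node first, run the hook, then wait for the rest."""
    ordered = ([n for n in nodes if n == firewall_node] +
               [n for n in nodes if n != firewall_node])
    for node in ordered:
        wait_isup(node)
        if node == firewall_node:
            # The rules are relaxed here, before any other node is waited on.
            on_firewall_up(node)
    return ordered
```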
      
      We still need to deal with teardown though.
      263e2cb3
  35. 06 Jan, 2005 1 commit
    • Leigh B. Stoller's avatar
      A bunch of boot changes. Read carefully. · 94ccc3f4
      Leigh B. Stoller authored
      * Add boot_errno to the nodes table so that nodes can report in a
        subcode to indicate what went wrong. At present, we do not report any
        real error codes; that is going to take some time to work out since it
        will require a bunch of changes to the boot scripts.
      
      * Add new table node_bootlogs to store logs provided by the nodes. Not
        a full console log, but a log of the tmcd client side part. We can
        make it a full log if we want though; just means mucking about with
        the boot phase a bit.
      
      * Add new state transition to NORMALv2 and PCVM state machines. "TBFAILED"
        is a new state that is sent (after TBSETUP) if a node fails somewhere in
        the tmcd client side.
      
      * Change TBNodeStateWait() to take a list of states (instead of single
        state) and an optional pass by reference parameter to return the actual
        state that the node landed in. Change all calls to TBNodeStateWait() of
        course.
      
      * Change os_setup (and libreboot in wait mode) to look for both TBFAILED
        and ISUP. If a TBFAILED event is seen, we can terminate the wait early
        and not retry os_setup on physical nodes (although still retry virtual
        nodes). The nice thing about this is that the wait should terminate much
        earlier (rather than waiting for timeout), especially for virtual nodes
        which can take a really long time when there are a couple of hundred.
      
      * Add new routines dobooterrno() and dobootlog() to tmcd. Bump version
        number and increase the buffer size to allow for the larger packets that
        a console log will generate (added MAXTMCDPACKET variable, set to 0x4000).
      
      * Add new -f option to tmcc to specify a datafile to send along as the last
        argument to tmcd. This is more pleasing than trying to send a console log
        in on the command line. For example: "tmcc -f /tmp/log BOOTLOG" will send
        a BOOTLOG command along with the contents of /tmp/log.
      
        Also close the write side of the pipe so that server sees EOF on
        read. See aside comment below.
      
      * Changes to rc.bootsetup:
           1. Use perl tricks to capture all output, duping to the console and to
              a log file in /var/emulab/logs.
           2. On any error, send a status code (boot_errno) and the bootlog to
              tmcd.
           3. Generate a TBFAILED state transition.
      
      * Changes to rc.injail:
           1. Same as rc.bootsetup, but do not send log files; that would pummel
              boss. Leave them on the physical node.
      
      * Change vnodesetup (which calls mkjail) to watch for any error and send a
        TBFAILED state transition. This should catch almost all errors, and
        dramatically reduce waiting when something fails.
      
      * Changes to rc.cdboot are essentially the same as rc.bootsetup, although a
        bootlog is sent all the time (success or failure), and I do not generate
        a boot_errno yet. Also, instead of TBFAILED, generate a PXEFAILED state
        since the CDROM is actually operating within the PXEFBSD opmode. I have
        yet to work this into the rest of the system though; waiting to get a new
        CD built and actually experiment with it.
      
      * Add new menu option and web page to display the node bootlog. We store
        only the latest bootlog, but maybe someday store more than one. Display
        boot_errno on node page.
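
      The change to waiting on a list of states can be sketched as below.
      This is a polling illustration in Python under stated assumptions: the
      real TBNodeStateWait() is Perl and returns the state through a
      pass-by-reference parameter, while this sketch simply returns it.  The
      get_state callback and the injectable sleep and clock arguments are
      hypothetical and exist only to keep the example self-contained.

```python
import time


def node_state_wait(get_state, states, timeout, interval=1.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll until the node is in one of `states`.

    Returns the state actually reached, or None if the timeout expires first.
    """
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in states:
            return state
        if clock() >= deadline:
            return None
        sleep(interval)
```

      A caller waiting on both ISUP and TBFAILED can stop as soon as
      TBFAILED is returned instead of sitting out the full timeout.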
      
      Aside: I made a big mistake in the tmcd protocol; I did not envision
      passing more than a small amount of data (one fragment) and so I do not
      include a record terminator (ie: close of the write side on the client
      sends EOF) or a size field at the beginning. No big deal since small
      requests are sent in one fragment and the server sees the entire
      thing. Well, with a large console log, that will end up as multiple
      fragments, and the server will often not get the entire thing on the first
      read, and there are no subsequent reads (with no EOF or known size, it
      would block forever). Well, fixing this in a backwards compatible manner
      (for old images) was way too much pain. Instead, tmcc now closes the write
      side, and the server does subsequent reads *only* in the new dobootlog()
      routine. Note that it *is* possible to fix this in a backwards compatible
      manner, but I did not want to go down that path just yet.
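
      The "close the write side so the server sees EOF" technique can be
      demonstrated with a local socket pair.  This Python sketch is not the
      tmcc/tmcd code: the function names are made up, and a socketpair stands
      in for whatever transport the real client and server use.

```python
import socket
import threading


def send_request(sock, payload):
    """Send the whole payload, then close the write side to signal EOF."""
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)


def read_request(sock, bufsize=4096):
    """Keep reading until the peer closes its write side; return all bytes."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:  # an empty read means EOF
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo(payload):
    """Round-trip a payload larger than one read through a socket pair."""
    client, server = socket.socketpair()
    try:
        # Send from a thread so a large payload cannot block the reader.
        sender = threading.Thread(target=send_request, args=(client, payload))
        sender.start()
        received = read_request(server)
        sender.join()
        return received
    finally:
        client.close()
        server.close()
```

      Without the shutdown call, read_request would block after the last
      fragment, which is the "block forever" case described in the aside.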
      94ccc3f4