- 16 Jan, 2009 1 commit
Leigh B. Stoller authored
experiment with just a firewall and no nodes. That was failing because no vlans are created on the switches, so there is nothing to trunk with.
- 13 Jan, 2009 1 commit
Leigh B. Stoller authored
doswapin(modify) so that the new test in nfree for vlan membership does not trigger.
- 08 Jan, 2009 1 commit
Leigh B. Stoller authored
snmpit changes? See commitlog for snmpit.
- 09 Jul, 2008 1 commit
Leigh B. Stoller authored
Previously, any error in assign_wrapper would cause the experiment to swap out because the "DB had been modified". I have now isolated all of the changes that are made, and errors in assign_wrapper proper no longer do that; tbswap now restores the experiment to the way it was. Note that errors after assign_wrapper (such as in os_setup) are still a problem. In addition, rather than kill off all of the vlans, leave them in place and then do a comparison after assign_wrapper, removing only obsolete and modified vlans. I have made use of the obsolete vlans table for this by having snmpit track its changes in that table. There is a bunch of new code in Lan.pm for doing the comparisons.
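The keep/remove decision described above amounts to a set comparison between the vlans left in place and the vlans the new assign run wants. A minimal Python sketch, not the Lan.pm code: it assumes each vlan is identified by name and described by a frozenset of hypothetical "node:iface" member strings, and `diff_vlans` is an invented name.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical model: vlan name -> set of "node:iface" members.
VlanMap = Dict[str, FrozenSet[str]]


def diff_vlans(current: VlanMap, wanted: VlanMap) -> Tuple[List[str], List[str], List[str]]:
    """Compare the vlans left in place with what assign now wants.

    Returns (to_remove, to_create, untouched), each sorted by name.
    """
    to_remove: List[str] = []
    to_create: List[str] = []
    untouched: List[str] = []
    for name, members in current.items():
        if name not in wanted:
            to_remove.append(name)   # obsolete: no longer in the topology
        elif wanted[name] != members:
            to_remove.append(name)   # modified: tear down the old version...
            to_create.append(name)   # ...and build the new one
        else:
            untouched.append(name)   # identical: leave it on the switches
    for name in wanted:
        if name not in current:
            to_create.append(name)   # brand-new vlan
    return sorted(to_remove), sorted(to_create), sorted(untouched)
```

Because identical vlans land in `untouched`, they are never torn down and rebuilt, which is the point of comparing instead of killing everything off.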
- 03 Jun, 2008 1 commit
Leigh B. Stoller authored
- 22 Feb, 2008 1 commit
Kevin Atkinson authored
from an experiment such as hwdown.
- 06 Feb, 2008 1 commit
David Johnson authored
now, this is keyed off nodetype. Lots of hardcoded constants and config stuff moved to attributes in the db. You can now set per-PLC and per-slice attributes, so you can (for instance) use different auth info whenever you want. Experiments can use preexisting slices if somebody sets up the db before swapin. Also, we no longer have to rely on slices.xml to sync up nodes/sites with PLC... can use xmlrpc instead. Lots of code cleanup, improved some abstractions, etc.
- 25 Jan, 2008 1 commit
Leigh B. Stoller authored
the state was backed up but we do not know if assign_wrapper removed the state before it failed (with recover on).
- 02 Aug, 2007 1 commit
Leigh B. Stoller authored
thankless job but someone has to do it. I'm expecting to finish by the time Bush 43 leaves office.
- 15 May, 2007 1 commit
Leigh B. Stoller authored
* Records are now "held open" when a run is stopped. When the next run is started, a check is made to see if the files (/project/$pid/exp/$eid) have changed, and if so a new version of the archive is committed before the next run is started.
* Change the way swapmod is handled within an instance. There is a new option on the ShowExp page called Modify Resources. The intent is to allow an instance to be modified without having to start and stop runs, which tends to clutter things up, according to our user base. So, if you are within a run, that run is reset (reused) after the swapmod is finished. You can do this as many times as you like. If you are between runs (the last operation was a stoprun), do the swapmod and then "speculatively" start a new run. Subsequent modifies reuse that run again, as above. I think this is what Kevin was after ... there are some UI issues that may need to be resolved; will wait to hear what people have to say.
* Revising a record is now supported. Export, change in place, and then use the Revise link on the ShowRun page. Currently this has to happen from the export directory on ops, but eventually we will allow an upload (to correspond to downloaded exports).
* Check to see if the export already exists, and give a warning. Added a checkbox that allows the user to overwrite the export.
* A bunch of minor UI changes to the various template pages.
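The Modify Resources behavior (reset the current run, or speculatively start one when between runs) is a small state machine. A minimal Python sketch under stated assumptions: the `Instance` model, the helper names, and the "runN" naming scheme are all hypothetical, and the swapmod itself is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    """Hypothetical model of a template instance and its runs."""
    runs: List[str] = field(default_factory=list)
    in_run: bool = False  # False after a stoprun


def start_run(inst: Instance) -> str:
    name = f"run{len(inst.runs) + 1}"  # hypothetical naming scheme
    inst.runs.append(name)
    inst.in_run = True
    return name


def stop_run(inst: Instance) -> None:
    inst.in_run = False


def modify_resources(inst: Instance) -> str:
    """Decide what happens to runs once the swapmod itself has finished."""
    if inst.in_run:
        return "reset " + inst.runs[-1]   # within a run: reuse it
    return "start " + start_run(inst)     # between runs: speculative start
```

After a speculative start the instance is inside a run again, so repeated modifies keep resetting that same run instead of piling up new ones.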
- 25 Apr, 2007 1 commit
Leigh B. Stoller authored
- 20 Oct, 2006 1 commit
Mike Hibler authored
Two-day boondoggle to support "/scratch", an optional large, shared filesystem for users. To do this, I needed to find all the instances where /proj is used and behave accordingly. The boondoggle part was the decision to gather up all the hardwired instances of shared directory names ("/proj", "/users", etc.) so that they are set in a common place (via unexposed configure variables). This is a boondoggle because:
1. I didn't change the client-side scripts. They need a different mechanism (e.g., tmcd) to get the info, configure is the wrong way.
2. Even if I had done #1 it is likely--no, certain--that something would fail if you tried to rename "/proj" to be "/mike". These names are just too ingrained.
3. We may not even use "/scratch" as it turns out.
Note, I also didn't fix any of the .html documentation. Anyway, it is done. To maintain my illusion in the future you should:
1. Have perl scripts include "use libtestbed" and use the defined PROJRO...
- 02 Oct, 2006 1 commit
Leigh B. Stoller authored
- 25 Aug, 2006 1 commit
Leigh B. Stoller authored
that clients and servers can avoid using hardwired ports on those experimental nodes. I have added the following tmcd operation:
tmcc portregister <service> [<port>]
where we assume it's the control network IP (from the DB), and the pid/eid of the experiment the node belongs to. The given port is entered into the port_registration table for the experiment, using the service as the tag. Supplying port=0 clears the registration from the table. When called like:
tmcc portregister <service>
we return the registered port, or nothing. I hacked up a little C library module in libtb so that there is something that looks like a C interface to this:
int PortRegister(char *service, int port);
int PortLookup(char *service, char *hostname, int namelen, int *port);
The above routines call out to tmcc, of course. Lastly, I changed the sync server and client to use the new port registration, via the library calls above. There are other emulab services that need to be changed as well, but they can be done on an as-needed basis.
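The registration semantics above (keyed per experiment by service tag, port=0 clears, a bare lookup returns the entry or nothing) can be modeled with a dictionary. This Python sketch is only an in-memory stand-in for the port_registration table: the `PortRegistry` class is hypothetical, and it takes the control-net IP as an argument where tmcd would look it up in the DB.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (pid, eid, service)


class PortRegistry:
    """In-memory stand-in for the port_registration table (hypothetical)."""

    def __init__(self) -> None:
        self._table: Dict[Key, Tuple[str, int]] = {}

    def register(self, pid: str, eid: str, service: str, ip: str, port: int) -> None:
        key = (pid, eid, service)
        if port == 0:
            self._table.pop(key, None)  # port=0 clears the registration
        else:
            self._table[key] = (ip, port)

    def lookup(self, pid: str, eid: str, service: str) -> Optional[Tuple[str, int]]:
        # Like "tmcc portregister <service>": the registration, or nothing.
        return self._table.get((pid, eid, service))
```

Keying on (pid, eid, service) keeps two experiments that both run a "sync" service from colliding, which is what lets servers pick ports dynamically instead of hardwiring them.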
- 16 Aug, 2006 1 commit
Kevin Atkinson authored
tbreport errors & context.
  - Modified fatal() in swapexp, batchexp, and tbprerun, and die_noretry() in os_setup to pass a hash parameter to tblog functions.
  - Added tbreport error & context information for select errors in swapexp, tbswap, assign_wrapper2, snmpit_lib, snmpit, batchexp, assign_wrapper, os_setup, parse-ns, & tbprerun.
  - Added assign error parser in assign_wrapper2.
  - Added parse.tcl error parser in parse-ns.
  - Added severity constants for tbreport in libtblog_simple.
  - Added tbreport() function & context table mapping for reporting discrete error types to libtblog.
- 05 Jul, 2006 1 commit
Kevin Atkinson authored
Many changes to tblog code. Database update needed:
1) Added a summary of failed nodes in os_setup. The cause of the error is now classified as "user" if only user images failed and the user image failed on every pc of a particular type. Otherwise I leave the cause as "unknown" since it is really hard to tell what the real cause is.
2) Raised the confidence threshold for most errors so that they will appear at the top.
3) Added a special error when an experiment is canceled. The cause is "canceled" and testbed-ops won't see these errors.
4) Fixed a bug in assign_wrapper where it would incorrectly report "This experiment cannot be instantiated on this testbed..." when really the user canceled the swapin.
5) Fixed a bug where os_setup errors were being incorrectly reported as assign errors. This happens when os_setup fails for some reason and tbswap tries again, but the second time around there are not enough nodes, so the last error comes from assign even though the true cause is the failed nodes. The fix involved adding a new column to the log table, "attempt", which is 1 for the first attempt and is incremented for each new attempt. tblog_find_error then simply ignores any errors with "attempt > 1".
6) Also fixed a potential problem when there is an error during the cleanup phase by adding another column, "cleanup". tblog_find_error also ignores any errors with the cleanup bit set.
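Items 5 and 6 describe a filter applied before choosing which error to report. A minimal Python sketch of that filter, not the actual tblog_find_error: the `LogEntry` fields are hypothetical, and picking the highest-confidence survivor is an assumption based on item 2.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LogEntry:
    """One row of a hypothetical tblog table."""
    message: str
    cause: str
    confidence: float
    attempt: int = 1       # 1 for the first swapin attempt, incremented on retry
    cleanup: bool = False  # set for errors raised during the cleanup phase


def find_error(entries: List[LogEntry]) -> Optional[LogEntry]:
    """Pick the error to report, skipping retries and cleanup noise."""
    candidates = [e for e in entries if e.attempt <= 1 and not e.cleanup]
    if not candidates:
        return None
    # Assumption: the highest-confidence surviving error is the one reported.
    return max(candidates, key=lambda e: e.confidence)
```

With this filter, the "not enough nodes" error from the retry no longer masks the os_setup failure from the first attempt, even if the retry error has a higher confidence.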
- 15 May, 2006 1 commit
Mike Hibler authored
tb-set-node-plab-role $plc plc
to make it the PLC node. Then any number of other nodes are declared as:
tb-set-node-plab-role $plab1 node
to make them inner plab nodes. Unlike elabinelab, there is no magic "tb-plab-in-elab" command which implies the topology; you put all the plab nodes in a LAN or whatever yourself. This may or may not be a good idea. Anyway, these NS commands set DB state in virt_nodes and reserved, much like elabinelab. During swapin, the dhcpd.conf file is rewritten so that inner plab nodes have their "filename" set to "pxelinux.0" and their "next-server" set to the designated PLC node. The PLC node will then be loaded/booted before anything is done to the inner plab nodes. After it comes up, the inner plab nodes are rebooted and declared as up. There is a new tmcd command "eplabconfig" (suggestions for a new name welcome!), which returns info like:
NAME=plc ROLE=plc IP=155.98.36.3 MAC=00d0b713f57d
NAME=plab1 ROLE=node IP=155.98.36.10 MAC=0002b3877a4f
NAME=plab2 ROLE=node IP=155.98.36.34 MAC=00d0b7141057
to just the PLC node (it returns nothing to any other node). The implications of this setup are:
* The PLC node must act as a TFTP server, as we have discussed in the past. The TMCC info above is hopefully enough to configure pxelinux; if not, we can change it.
* The PLC node is responsible for loading the disks of inner plab nodes. This is implied by the setup, where we change the dhcpd.conf file before doing anything to the inner nodes. Thus, once the inner nodes are rebooted, they will be talking pxelinux with PLC, not with boss. This step is dubious, as we could no doubt load the disks faster than whatever plab uses can. But it simplified the setup (and is more realistic!). The alternative, which might be useful anyway, is to introduce a "state" in which nodes have been reloaded but not yet rebooted. With that, we can reload the plab nodes and then change the dhcpd.conf file so that when they reboot they start talking to the PLC.
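One way the PLC node could turn the eplabconfig output into pxelinux configuration is sketched below in Python. The parser follows the KEY=VALUE line format shown above; the function names are hypothetical. The "01-" prefix follows PXELINUX's convention of looking for a per-client file in pxelinux.cfg/ named after the client MAC (01 being the ARP hardware type for Ethernet).

```python
from typing import Dict, List


def parse_eplabconfig(text: str) -> List[Dict[str, str]]:
    """Turn lines of KEY=VALUE tokens into one dict per node."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        records.append(dict(tok.split("=", 1) for tok in line.split()))
    return records


def pxelinux_cfg_name(mac: str) -> str:
    """Per-client PXELINUX config name: '01-' plus dash-separated lowercase hex MAC."""
    mac = mac.lower()
    if len(mac) != 12:
        raise ValueError("expected 12 hex digits: " + mac)
    return "01-" + "-".join(mac[i:i + 2] for i in range(0, 12, 2))
```

For example, the plab1 record above would map to pxelinux.cfg/01-00-02-b3-87-7a-4f on the PLC's TFTP server.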
- 04 May, 2006 1 commit
Mike Hibler authored
- 02 May, 2006 1 commit
Mike Hibler authored
Since we rerun assign, it can shuffle around the interfaces on a node even if the node is "fixed".
- 30 Mar, 2006 1 commit
Mike Hibler authored
Use that option in the undoFW function to make sure we don't try to clean up virtnodes.
- 28 Mar, 2006 1 commit
Mike Hibler authored
couple of MFS booting problems:
* in the RPC power controller, make sure that an "on" command succeeds by checking the status, retrying if it failed (we already did this for "off")
* if nodes fail to boot up the MFS after a power on, try again with a power cycle. I have seen "power on" leave pc600s hung, and a power cycle seems to cure it.
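Both fixes follow a verify-then-retry pattern. A minimal Python sketch with the controller and boot-wait operations passed in as callables: the function names, the retry count of 3, and the returned strings are hypothetical, not the power controller module's real interface.

```python
from typing import Callable


def power_on_verified(send: Callable[[str], None],
                      status: Callable[[], str],
                      retries: int = 3) -> bool:
    """Send "on", then confirm via status; retry up to `retries` times."""
    for _ in range(retries):
        send("on")
        if status() == "on":
            return True
    return False


def boot_into_mfs(power_on: Callable[[], None],
                  power_cycle: Callable[[], None],
                  wait_for_mfs: Callable[[], bool]) -> str:
    """Power on; if the MFS never comes up, fall back to a power cycle."""
    power_on()
    if wait_for_mfs():
        return "on"
    power_cycle()  # a hung node often recovers from a full cycle
    if wait_for_mfs():
        return "cycle"
    return "failed"
```

Injecting the operations as callables keeps the retry policy separate from any particular controller, which also makes it easy to exercise with fakes.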
- 20 Mar, 2006 1 commit
Mike Hibler authored
- 02 Mar, 2006 1 commit
Leigh B. Stoller authored
- 16 Feb, 2006 1 commit
Leigh B. Stoller authored
- 02 Feb, 2006 1 commit
Timothy Stack authored
Various nfstrace changes that have been sitting in my tree for a while.
* GNUmakefile.in: Do fs-install in the sensors subdir so the nfstracer gets installed.
* sensors/and/and-emulab.priorities: Add some more daemon uids to be excluded from auto-nicing.
* sensors/and/and.c: Ignore invalid uids/gids in the config file instead of dying.
* sensors/nfstrace/GNUmakefile: Makefile used to generate nfsdb-create.sql.
* sensors/nfstrace/GNUmakefile.in: Some more installation stuff.
* sensors/nfstrace/nfsdb-create.sql: SQL used to create the nfsdb database.
* sensors/nfstrace/nfsdump2db: Bunch of bug fixes and cleanup.
* sensors/nfstrace/nfsdump2db.8, sensors/nfstrace/nfstrace.7, sensors/nfstrace/nfstrace.proxy.8: Start at some man pages.
* sensors/nfstrace/nfstrace.init.in: Try to detect the interface to listen on (not perfect, though). Add a restart handler that just restarts nfsdump2db. Some other cleanup.
* sensors/nfstrace/nfstrace.proxy: Some optimizations for resolving file names.
* sensors/nfstrace/nfsdump2/*: Only print summaries of read/write packets and start a separate thread to read from the bpf socket.
* tbsetup/tbswap.in: Stop transferring nfs accesses to boss' db until we figure out what we want to do with it.
- 19 Jan, 2006 1 commit
Leigh B. Stoller authored
- 28 Dec, 2005 1 commit
Leigh B. Stoller authored
* The rest of the backend support for simplistic experiment duplication, either from an existing (current) experiment, or from a specific archive revision of a current or terminated experiment.
* Rework the swapmod code so that the archive is committed at the exact end of the swapout phase. This required adding (and moving) some code from swapexp to tbswap, since that is where the actual swapout/swapin happens during a swapmod.
* Add a special directory called "archive" to the experiment directory, which is a place where users can store stuff they want saved away. This will eventually be a user-defined set of directories, but this was good for getting the basic mechanism in place. Note that when the contents of this directory are copied out for placement into the archive, it is an exact copy made with rsync.
* No longer "clean" the contents of the temporary store between commits of the archive. This was creating a lot of headaches, and was also causing the revision history to get messed up. The downside is that we have to be more careful to explicitly delete files that the user no longer uses. I have not solved all these issues yet, so in the meantime files will be left in the archive even if the user no longer references them.
- 27 Dec, 2005 1 commit
Mike Hibler authored
If you specify an explicit firewall, you are implicitly assigned security level 2 and you cannot explicitly specify the security level. Likewise, if you specify a security level, you cannot also specify a firewall. The reason for this is that security level 1 (aka "Blue") now has a slightly different meaning: it is intended for protecting the inside from the outside rather than vice versa. The only practical implication of this is that for level 1, we don't do all the fancy power-off-boot-into-MFS-zapbootblock stuff that we do for higher levels. Anyway, I wanted to make sure that if you specify your own firewall, you DO have to go through the full cleansing swapout, since we can't trust a firewall that the Average Joe sets up.
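The rules above reduce to a mutual-exclusion check plus a threshold. A minimal Python sketch: the function names are hypothetical, and treating "no firewall, no level requested" as level 0 is an assumption, not something this commit states.

```python
from typing import Optional


def effective_security_level(has_firewall: bool, requested: Optional[int]) -> int:
    """Apply the firewall/security-level rules described above."""
    if has_firewall and requested is not None:
        raise ValueError("specify either a firewall or a security level, not both")
    if has_firewall:
        return 2  # an explicit firewall implies level 2
    # Assumption: with neither specified, the experiment runs at level 0.
    return requested if requested is not None else 0


def needs_full_cleansing(level: int) -> bool:
    """Level 1 ("Blue") skips the power-off/MFS/zap-bootblock swapout."""
    return level >= 2
```

Forcing level 2 for any user-supplied firewall means such experiments always get the full cleansing swapout.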
- 21 Dec, 2005 2 commits
Mike Hibler authored
and now also tbres!
Mike Hibler authored
("correctly" here meaning "don't go down in flames", not "we should never save the state of vnodes", that will have to be revisited)
- 19 Dec, 2005 4 commits
Mike Hibler authored
Mike Hibler authored
Kevin Atkinson authored
Updates to the Error Logging API code. You should start seeing much better error messages coming from my system. Errors coming from parse.proxy and assign (the two most frequent sources of errors) should now be concise and to the point. Errors coming from libosload/libreboot (the next most frequent source of errors) should now also be much better, but not perfect. Getting perfect errors will likely require a rework of how errors are handled in libosload/libreboot; just adding tberror/tbwarn/tbnotice calls is not enough. I can do this at a later date if necessary. A few minor database changes. Some changes to the API. A few bug fixes. Lots of tberror/tbwarn/tbnotice calls added to scripts. Since assign is a C program, and at this time my API is perl only, I wrote a second wrapper around assign, assign_wrapper2. When assign fails, errors are now parsed in assign_wrapper2, sent to stderr, and logged. This means that RunAssign() just returns when assign fails rather than echoing some of the assign.log output and then quitting. The output to the activity log remains unchanged. Since "parse.proxy" is run from ops, I couldn't use my API in it, even though it is a perl program. Instead I parse the errors coming from it in parse-ns.
Mike Hibler authored
(actually just stats gathering right now, no images are produced)
- 15 Dec, 2005 1 commit
Timothy Stack authored
the experiment from ops' db to boss'. File names are stored in the accessed_files table and are associated with the experiment via the fs_resources table.
- 12 Dec, 2005 1 commit
Timothy Stack authored
- 08 Dec, 2005 1 commit
Mike Hibler authored
- 07 Dec, 2005 1 commit
Mike Hibler authored
Don't whine about missing swapinfo for nodes in experiments that are not saving disk state.
- 06 Dec, 2005 2 commits
Mike Hibler authored
Exec summary: after this checkin, the infrastructure exists (once enabled) to create swapout-time "delta" images for all machines in experiments. There is only a single, cumulative swap image per node (i.e., all diffs are from the base image, not from the previous swap). What doesn't yet exist is the mechanism for reloading the delta at swapin time. That is Phase III. The nitty-gritty:
1. Keep disk image signature files for all nodes in an experiment. New fields in the DB track, for each disk partition, what image the partition was loaded from. This enables us at swapin or os_load time to create signature files in /proj/<pid>/exp/<eid>/swapinfo for the current contents of a node disk/partition. All nodes with the same image loaded will share (via symlink) the same signature file. TODO: signature files that are no longer referenced should be removed. Signature info is only collected in the swapinfo directory if the experiment is set to have disk state saving enabled (see #5 below). Info consists of the <vname>.sig file, which is the file created by imagehash, and <vname>.part, which says what the root disk is for the node and whether to look at the whole disk or just a single partition when crafting the delta image.
2. Swapout-time hook for creating the swapout image. If the experiment is marked as allowing disk state saving, tbswap will arrange to run and then monitor the create-swapimage command on each node. This script will run the modified version of imagezip, which uses the signature file to create a delta image. The command to run and the maximum timeout are specified via sitevars (previously checked in). Note that the tbswap script currently has special knowledge of /usr/local/bin/create-swapimage as a swapout-time script. If the swap/swapout_command sitevar is set to that, Magic Stuff shall occur (i.e., it will monitor the command and make periodic reports of progress). The sitevars are a total hack and will disappear at some point.
3. Client-side script for creating the swapout image: os/create-swapimage, very similar to create-image. Uses the info stashed in /proj/..blahblah../swapinfo to create a delta image. XXX fer now hack: the script first looks in /proj/<pid>/bin for an imagezip binary to use. Failing that, it uses the one in the MFS. This allows for easier development of the imagezip changes (i.e., we don't have to update the MFS every time).
4. Auto creation of signature files for new images. The create_image script (the one that runs on boss when creating images for users) has been modified to automatically create a signature via imagehash. The .sig file winds up in /usr/testbed/images/sigs or in /proj/<pid>/images/sigs. From there it will be copied at swapin/os_load time to the per-expt swapinfo directory for any node that uses the image. The process for creating standard system images (aka, "Mike") has not yet been modified. When the image creation/installation procedure is formalized into a script, this will be done.
5. Web changes to set/clear saving of disk state at swapout time. Added a checkbox to the experiment create page to allow setting "save swap state". Also added to the experiment modify page, but currently "if (0)"ed out as it will need some additional support. The showstuff page will show it. Taking a page from Leigh's hack book, if EXPOSESTATESAVE in defs.php3 is set to zero (as it is now), then the checkbox doesn't appear in the create experiment page except for STUDLY users.
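Item 1 says nodes loaded with the same image share one signature file via symlink. A minimal Python sketch of that sharing, under stated assumptions: the "images/" subdirectory inside swapinfo, the function names, and the image names in the example are hypothetical, not the layout the scripts actually use.

```python
import os
import shutil
from typing import Dict, Tuple


def plan_swapinfo(swapinfo_dir: str, sigs_dir: str,
                  node_images: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return (copies, links).

    copies maps one per-image signature copy inside swapinfo to its source
    .sig; links maps each <vname>.sig to a relative symlink target, so all
    nodes loaded with the same image share a single signature file.
    """
    copies: Dict[str, str] = {}
    links: Dict[str, str] = {}
    for vname, image in node_images.items():
        rel = os.path.join("images", image + ".sig")  # hypothetical layout
        copies[os.path.join(swapinfo_dir, rel)] = os.path.join(sigs_dir, image + ".sig")
        links[os.path.join(swapinfo_dir, vname + ".sig")] = rel
    return copies, links


def apply_plan(copies: Dict[str, str], links: Dict[str, str]) -> None:
    """Copy each shared signature once, then (re)create the per-node symlinks."""
    for dst, src in copies.items():
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copyfile(src, dst)
    for link, target in links.items():
        if os.path.lexists(link):
            os.remove(link)
        os.symlink(target, link)
```

Keeping the per-image copies in a subdirectory avoids a name clash between a node's <vname>.sig and an image whose name happens to match a vname.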
Leigh B. Stoller authored
linktest at level 3 if a mere user. Studly users still have control though. Note that errors are no longer mailed to user by linktest_control. Also moved duplicated code to get dbuid (and email address) to top of file.