- 16 Dec, 2004 2 commits
Robert Ricci authored
the web interface.
Leigh B. Stoller authored
* tbsetup/panic.in: New backend script to implement the panic button
  feature. When used, it will sever the connection to the firewall node
  by using snmpit to disable the port. Sets the panic bit (and date) in
  the experiments table, and changes the state of the experiment from
  "active" to "paniced" to ensure that the experiment cannot be messed
  with (swapped out or modified). Sends email to tbops when the panic
  button is pressed. When used with the -r option, reverses the above:
  the state is set back to active, the panic bit is cleared, and the
  port is re-enabled with snmpit.

* tbsetup/tbswap.in: During swapout, a firewalled experiment that has
  been paniced will get a cleaning: the nodes are powered off, then the
  osids for all the nodes are reset (with os_select) so that they will
  boot the MFS, and then the nodes are powered on. Then the control
  network is turned back on, and then I wait for the nodes to reboot
  (this is simply because we do not record in the DB that a node is
  turned off, and if I do not wait, the reload daemon will end up
  hitting the power button again if they do not reboot in time; we can
  fix this later). I am not planning to apply this to general
  firewalled experiments yet, since the power cycling is going to be
  hard on the nodes; I would rather we at least have a half-baked plan
  before we do that.

* www/showexp.php3: If the experiment is firewalled, show the Panic
  Button, linked to the panic button web script. If the experiment has
  already had the panic button pressed, show a big warning message and
  explain that the user must talk to tbops to swap the experiment out.
  Also fiddle with menu options so that the terminate link is gone, and
  the swap link is visible only in admin mode. In other words, only an
  admin person can swap an experiment once it is paniced. And of
  course, an admin person can run the backend panic script above with
  the -r option, but that's not something to be done lightly.

* db/libdb.pm.in: Add "paniced" as an experiment state
  (EXPTSTATE_PANICED). Add utility functions: TBExptSetPanicBit(),
  TBExptGetPanicBit(), and TBExptClearPanicBit().

* tbsetup/swapexp.in: Minor state fiddling so that an experiment can be
  swapped while in paniced state, but only when in admin mode. Also
  clear the panic bit when the experiment is swapped out.

* www/dbdefs.php3.in: Add "paniced" as an experiment state. Add a
  utility function TBExptFirewall() to see if an experiment is
  firewalled.

* www/panicbutton.php3: New web script to invoke the backend panic
  script mentioned above, after the usual confirm song and dance.

* www/panicbutton.gif: New gif of a red panic button that I stole off
  the net. If anyone sees/has a better one, feel free to replace this
  one.

* utils/node_statewait.in: Add -s option so that I can pass in the
  state I want to wait for (used from tbswap above to wait for nodes to
  reach ISUP after power on).
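For illustration, the panic path might look roughly like this minimal
sketch. TBExptSetPanicBit()/TBExptClearPanicBit() and the "paniced"
state are from this commit; the snmpit flags, TBExptSetState(), and the
mail details are assumptions:

    use strict;
    use libdb;        # TBExptSetPanicBit() et al. were added by this commit.
    use libtestbed;   # SENDMAIL().

    sub Panic($$$)
    {
        my ($pid, $eid, $revert) = @_;

        if ($revert) {
            # -r option: re-enable the port and flip the state back.
            system("snmpit -e $pid $eid") == 0 or      # flag is an assumption
                die("*** Could not re-enable the firewall port!\n");
            TBExptClearPanicBit($pid, $eid);
            TBExptSetState($pid, $eid, "active");      # hypothetical helper
        }
        else {
            # Sever the connection by disabling the firewall node's port,
            # then set the panic bit (and date) and lock the state down.
            system("snmpit -d $pid $eid") == 0 or      # flag is an assumption
                die("*** Could not disable the firewall port!\n");
            TBExptSetPanicBit($pid, $eid);
            TBExptSetState($pid, $eid, "paniced");     # hypothetical helper
            SENDMAIL($TBOPS, "Panic Button Pressed",
                     "The panic button was pressed for $pid/$eid");
        }
        return 0;
    }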
- 14 Dec, 2004 1 commit
Robert Ricci authored
'power' command.
- 16 Nov, 2004 1 commit
Leigh B. Stoller authored
download images from the outer emulab. This script is invoked from
frisbeelauncher when ELABINELAB=1 and the filename does not exist (thus
attempting to get the image file before bailing). The frisbeeimage
script uses a new method in the RPC server to fire up a frisbeed (using
frisbeelauncher on the outer Emulab), subject to the usual permission
checks against the creator of the elabinelab experiment (I assume that
the creator will have access to any outer images that are used inside
the inner emulab).

If the outer frisbeelauncher succeeds, its return value is the
load_address (IP:port), which is used to fire up a frisbee client to
get the image file and write it out (using Mike's new -N option that
just dumps the raw data to a file). Once the image is downloaded,
control returns to the inner frisbeelauncher and proceeds as normal.

I whacked this together pretty quickly. Under heavy usage it might hit
a race condition or two, but I do not expect that to happen in an inner
elab for a while.
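A rough sketch of the inner-side flow, assuming a hypothetical
RemoteXMLRPC() helper for the RPC call; -N is the new option named
above, and -m/-p are the usual frisbee client address/port options:

    use strict;

    my ($imageid, $filename) = @ARGV;

    # Ask the outer emulab to fire up a frisbeed for the image (via its
    # frisbeelauncher). The return value is the load_address (IP:port).
    my $load_address = RemoteXMLRPC("frisbeelauncher", $imageid)
        or die("*** Could not start a frisbeed on the outer emulab!\n");
    my ($ip, $port) = split(":", $load_address);

    # Run a frisbee client against the outer frisbeed; -N just dumps
    # the raw data to the file instead of writing it to a disk.
    system("frisbee -N -m $ip -p $port $filename") == 0
        or die("*** Failed to download $filename!\n");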
- 15 Nov, 2004 1 commit
Leigh B. Stoller authored
* snmpit: When ElabInELab is true, use the routines in the new
  snmpit_remote.pm library for setting up and tearing down vlans for an
  experiment. At present, only these two operations are proxied out to
  the outer emulab.

* snmpit_remote.pm: A new little library that uses the XMLRPC server on
  the outer emulab to setup and destroy vlans for an inner experiment.
  This code is used from snmpit (see above).

* snmpit_lib.pm: A couple of minor changes for the server side of the
  proxy operation.

* snmpit.proxy.in: A new perl module that is invoked from the RPC
  server. This proxy sets up and tears down vlans for an inner elab.
  The basic model is that the container experiment will have lots of
  vlans for various individual experiments running on the inner emulab.

* swapexp: A couple of minor elabinelab hacks.

* tbswap: For elabinelab experiments, reconfig/restart dhcpd when
  tearing down the experiment, and call out to the new elabinelab
  script when setting up an elabinelab experiment. There is no
  provision for swapmod at this time.

* elabinelab: A new script to create the inner emulab. Does all kinds
  of gross DB stuff, then more gross stuff on the inner ops and boss.
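In rough outline, the snmpit change amounts to something like this
sketch; the RemoteSetup()/RemoteTeardown() names, the ElabInELab()
predicate, and the local entry point are assumptions for illustration:

    use strict;
    use snmpit_remote;    # The new library from this commit.

    my ($operation, $pid, $eid) = @ARGV;

    if (ElabInELab($pid, $eid)) {    # hypothetical predicate
        # Proxy the vlan operation out to the outer emulab's XMLRPC server.
        if ($operation eq "setup") {
            snmpit_remote::RemoteSetup($pid, $eid)
                or die("*** Could not setup vlans on the outer emulab!\n");
        }
        else {
            snmpit_remote::RemoteTeardown($pid, $eid)
                or die("*** Could not tear down vlans on the outer emulab!\n");
        }
    }
    else {
        # The normal path: talk to the switches directly.
        LocalVlanOperation($operation, $pid, $eid);    # hypothetical
    }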
- 12 Nov, 2004 1 commit
Robert Ricci authored
Contributed by Keith Sklower at Berkeley.
- 08 Oct, 2004 1 commit
Leigh B. Stoller authored
by more than just plab code.
- 30 Aug, 2004 1 commit
Leigh B. Stoller authored
* The per-experiment event scheduler now runs on ops instead of boss.
  Boss still runs elvind and uses events internally, but the user part
  of the event system has moved.

* Part of the guts of eventsys_control moved to a new script,
  eventsys.proxy, which runs on ops and fires off the event scheduler.
  The only tricky part of this is that the scheduler runs as the user,
  but killing it has to be done as root, since a different person might
  swap out the experiment. So, the proxy is a perl wrapper invoked from
  a root ssh from boss, which forks, writes the pid file into
  /var/run/emulab/evsched/$pid_$eid.pid, then flips to the user and
  execs the event scheduler (which is careful not to fork). Obviously,
  if the kill is done as root, the pid file has to be stored someplace
  the user is not allowed to write.

* The event scheduler has been rewritten to use Tim's C++ interface to
  the sshxmlrpc server on boss. Actually, I reorg'ed the scheduler so
  that it can be built either as a mysql client or as an RPC client.
  Note that it can also be built to use the SSL version of the XMLRPC
  server, but that will not go live until I finish the server stuff up.
  Also some goo for dealing with building the scheduler with C++.

* Changes to several makefiles to install the ops binaries over NFS to
  /usr/testbed/opsdir. Makes life easier, but only if boss and ops are
  running the same OS. For now, using static linking on the event
  scheduler until ops is upgraded to the same rev as boss.

* All of the event clients got little tweaks for dealing with the new
  CNAME for the event system server (event-server). Will need to build
  new images at some point. Old images and clients will continue to
  work because of an inetd hack on boss that uses netcat to
  transparently redirect elvind connections to ops.

* Note that eventdebug needs some explaining. In order to make the
  inetd redirect work, elvind cannot be listening on the standard port.
  So, the boss event system uses an alternate port, since there are
  just a few subsystems on boss that use the server, and it is easy to
  propagate changes on boss. Anyway, the default for eventdebug is to
  connect to the standard port on localhost, which means it will work
  as expected on ops, but will require the -b argument on boss.

* Linktest changes were slightly more involved. We no longer run
  linktest on boss when called from the experiment swapin path, but ssh
  over to ops to fire it off. This is done as the user of course, and
  there are some tricks to make it possible to kill a running linktest
  and its ssh when experiment swapin is canceled (or from the command
  line) by forcing allocation of a tty. I will probably revisit this at
  some point, but I did not want to spend a bunch of time on linktest.

* The upgrade path detailed in doc/UPDATING is necessarily complicated,
  and bound to cause consternation at remote sites doing an upgrade.
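The fork/flip/exec dance in eventsys.proxy might look roughly like this
sketch; the pid file path is from the text above, while the scheduler
binary name, its arguments, and the uid/gid handling are assumptions:

    use strict;
    use POSIX;

    my ($unix_uid, $unix_gid, $pid, $eid) = @ARGV;

    my $child = fork();
    die("*** fork failed: $!\n") if (!defined($child));

    if ($child) {
        # Still root: write the pid file someplace the user cannot
        # write, so a later root kill (possibly initiated by a
        # different user's swapout) can find it.
        open(PIDFILE, "> /var/run/emulab/evsched/${pid}_${eid}.pid")
            or die("*** Could not write pid file: $!\n");
        print PIDFILE "$child\n";
        close(PIDFILE);
        exit(0);
    }

    # Child: flip to the user for good, then exec the scheduler (which
    # is careful not to fork, so the pid recorded above stays valid).
    POSIX::setgid($unix_gid);
    POSIX::setuid($unix_uid);
    exec("event_sched", "-e", "$pid/$eid")    # binary/args are assumptions
        or die("*** exec failed: $!\n");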
- 09 Aug, 2004 1 commit
Leigh B. Stoller authored
supporting both a shell script driven interface, plus the newer XMLRPC
interface. This change removes the script driven interface from boss,
replacing it with just the XMLRPC interface. Since we like to maintain
backwards compatibility with interfaces we have advertised to users
(and which we know are being used), I have implemented a script wrapper
that exports the same interface, but which converts the operations into
XMLRPC requests to the server. This wrapper is written in python and
uses our locally grown xmlrpc-over-ssh library. Like the current
"demonstration" client, you can take this wrapper to your machine that
has python and ssh installed, and use it there; you do not need to use
these services from just users.emulab.net. Other things to note:

* The wrapper is a single python script that has a "class" for each
  wrapped script. Running the wrapper without any arguments will list
  all of the operations it supports. You can invoke the wrapper with
  the operation as its argument:

      {987} stoller$ script_wrapper.py swapexp --help
      swapexp -e pid,eid in|out
      swapexp pid eid in|out
      where:
          -w   - Wait for experiment to finish swapping
          -e   - Project and Experiment ID
          in   - Swap experiment in (must currently be swapped out)
          out  - Swap experiment out (must currently be swapped in)

      Wrapper Options:
          --help    Display this help message
          --server  Set the server hostname
          --login   Set the login id (defaults to $USER)
          --debug   Turn on semi-useful debugging

  But more convenient is to create a set of symlinks so that you can
  just invoke the operation by its familiar scriptname. This is what I
  have done on users.emulab.net.

      {987} stoller$ /usr/testbed/bin/swapexp --help
      swapexp -e pid,eid in|out
      swapexp pid eid in|out

* For those of you talking directly to the RPC server from python, I
  have added a wrapper class so that you can issue requests to any of
  the modules from a single connection. Instead of using
  /xmlrpc/modulename, you can use just /xmlrpc, and use method names of
  the form experiment.swapexp, node.reboot, etc. Tim, this should be
  useful for the netlab client, which I think opens up multiple ssh
  connections?

* I have replaced the paperbag shell with a stripped down xmlrpcbag
  shell that is quite a bit simpler, since we no longer allow access to
  anything but the RPC server. No interactive mode, no argument
  processing, no directory changing, etc. My main reason for reworking
  the bag is to make it easier to understand, maintain, and verify that
  it is secure. The new bag also logs all connections to syslog
  (something we should have done in the original). I also added some
  setrlimit calls (core, maxcpu). I also thought about nicing the
  server down, but that would put RPC users at a disadvantage relative
  to web interface users. When we switch the web interface to use the
  XMLRPC backend, we can add this (renicing from the web server would
  be a pain because of its scattered implementation).
- 28 Jul, 2004 1 commit
Leigh B. Stoller authored
NFS, convert to using a proxy that runs on ops, which does the copying locally.
- 01 Jun, 2004 1 commit
Leigh B. Stoller authored
- 26 Apr, 2004 1 commit
Mike Hibler authored
1. "make clean" will just remove stuff built in the process of a regular build 2. "make distclean" will also clean out configure generated files. This is how it was always supposed to be, there was just some bitrot.
- 07 Apr, 2004 1 commit
Leigh B. Stoller authored
- 08 Mar, 2004 1 commit
Leigh B. Stoller authored
splitting the existing code between a frontend script that parses
arguments and does taint checking, and a backend library where all the
work is done (including permission checks). The interface to the
libraries is simple right now (didn't want to spend a lot of time on
designing an interface without knowing if the approach would work long
term):

    use libreboot;
    use libosload;

    nodereboot(\%reboot_args, \%reboot_results);
    osload(\%reload_args, \%reload_results);

Arguments are passed to the libraries in the form of a hash. For
example, in os_setup:

    $reload_args{'debug'}     = $dbg;
    $reload_args{'asyncmode'} = 1;
    $reload_args{'imageid'}   = $imageid;
    $reload_args{'nodelist'}  = [ @nodelist ];

Results are passed back both as a return code (-1 means total failure
right away, while a positive value indicates the number of nodes that
failed), and in the results hash, which gives the status for each
individual node. At the moment it is just success or failure (0 or 1),
but in the future might be something more meaningful.

os_setup can now find out about individual failures, both in reboot and
reload, and alter how it operates afterwards. The main thing is to not
wait for nodes that fail to reboot/reload, and to terminate with no
retry when this happens, since at the moment it indicates an unusual
failure, and it is better to terminate early. In the past an os_load
failure would result in a tbswap retry, and another failure (multiple
times). I have already tested this by trying to load images that have
no file on disk; it is nice to see those failures caught early and the
experiment failure happen much quicker!

A note about "asyncmode" above. In order to promote parallelism in
os_setup, asyncmode tells the library to fork off a child and return
immediately. Later, os_setup can block and wait for status by calling
back into the library:

    my $foo = nodereboot(\%reboot_args, \%reboot_results);
    nodereboot_wait($foo);

If you are wondering how the child reports individual node status back
to the parent (so it can fill in the results hash), Perl really is a
kitchen sink. I create a pipe with Perl's pipe function and then fork a
child to do the work; the child writes the results to the pipe (status
for each node), and the parent reads that back later when
nodereboot_wait() is called, moving the results into the
%reboot_results hash. The parent meanwhile can go on and, in the case
of os_setup, make more calls to reboot/reload other nodes, later
calling the wait() routines once all have been initiated.

Also worth noting that in order to make the libraries "reentrant" I had
to do some cleaning up and reorganizing of the code. Nothing too major
though, just removal of lots of global variables. I also did some mild
unrelated cleanup of code that had been run over once too many times
with a tank.

So how did this work out? Well, for os_setup/os_load it works rather
nicely! node_reboot is another story. I probably should have left it
alone, but since I had already climbed the curve on osload, I decided
to go ahead and do reboot. The problem is that node_reboot needs to run
as root (it's a setuid script), which means it can only be used as a
library from something that is already setuid. os_setup and os_load run
as the user. However, having a consistent library interface and the
ability to cleanly figure out which individual nodes failed is a very
nice thing, so I came up with a suitable approach that is hidden in the
library. When the library is entered without proper privs, it silently
execs an instance of node_reboot (the setuid script), and then uses the
same trick mentioned above to read back individual node status. I
create the pipe in the parent before the exec, and set the
no-close-on-exec flag. I pass the fileno along in an environment
variable, and the library uses that to write the results to, just like
above. The result is that os_setup sees the same interface for both
os_load and node_reboot, without having to worry that one or the other
needs to be run setuid.
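A condensed sketch of that asyncmode pipe trick, heavily simplified
(the real library does much more bookkeeping, and RebootNode() here is
a hypothetical stand-in for the actual work):

    use strict;

    sub nodereboot {
        my ($args, $results) = @_;

        pipe(PARENT_READER, CHILD_WRITER);

        my $childpid = fork();
        die("*** fork failed: $!\n") if (!defined($childpid));

        if (!$childpid) {
            # Child: do the work, reporting "node status" per line.
            close(PARENT_READER);
            foreach my $node (@{ $args->{'nodelist'} }) {
                my $failed = RebootNode($node);    # hypothetical worker
                print CHILD_WRITER "$node $failed\n";
            }
            close(CHILD_WRITER);
            exit(0);
        }
        # Parent: return a handle for nodereboot_wait() and go on.
        close(CHILD_WRITER);
        return { 'pid'     => $childpid,
                 'reader'  => \*PARENT_READER,
                 'results' => $results };
    }

    sub nodereboot_wait {
        my ($handle) = @_;
        my $reader   = $handle->{'reader'};
        my $failures = 0;

        # Read back individual node status and fill in the results hash.
        while (<$reader>) {
            my ($node, $failed) = split;
            $handle->{'results'}->{$node} = $failed;
            $failures += $failed;
        }
        waitpid($handle->{'pid'}, 0);
        return $failures;
    }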
- 12 Feb, 2004 1 commit
Leigh B. Stoller authored
no reason for the separation for a long time, and it made maintenance
more difficult because of duplication between batchexp and startexp
(batch was the sole user of startexp). Cleaner solution.

* Check argument processing for batchexp, swapexp, and endexp to make
  sure the taint checks are correct. All three of these scripts will
  now be available from ops. I especially watched the filename
  processing, which was pretty loose before and could allow someone to
  grab a file on boss by trying to use it as an NS file (the scripts
  all run as the user, of course). The web interface generates
  filenames that are hard to guess, so rather than wrapping these
  scripts when invoked from ops, just allow the usual paths (/proj,
  /groups, /users) but also the /tmp/$uid-XXXXXX.nsfile pattern, which
  should be hard enough to guess that users will not be able to get
  anything they are not supposed to.

* Add -w (waitmode) options to all three scripts. In waitmode, the
  backend detaches, but the parent remains waiting for the child to
  finish so it can exit with the appropriate status (for scripting).
  The user can interrupt (^C), but it has no effect on the backend; it
  just kills the parent side that is waiting (the backend is in a new
  session ID). Log output still goes to the file (available from the
  web page) and is emailed.
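The filename check amounts to something like this sketch (illustrative
regexes, not the exact ones in the scripts):

    use strict;

    # Accept the usual user-visible trees, plus the hard-to-guess temp
    # file pattern that the web interface generates.
    sub ValidNSFile($$)
    {
        my ($uid, $nsfile) = @_;

        return 1 if ($nsfile =~ m#^/(proj|groups|users)/#);
        return 1 if ($nsfile =~ m#^/tmp/\Q$uid\E-\w{6}\.nsfile$#);
        return 0;
    }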
- 10 Feb, 2004 1 commit
Kirk Webb authored
- 02 Feb, 2004 1 commit
Robert Ricci authored
is broken right now. Told Jon I'd do this so that he can check in: that
way we have snapshots of his code, but he doesn't have to worry about
breaking the build (yet).
- 15 Jan, 2004 1 commit
Leigh B. Stoller authored
imageid. Uses a new slot in the images table (frisbee_pid) to track the
running frisbee daemon for an image, so that it can be killed from
create-image (killed before creating the new image) and from the web
page before deleting an imageid.
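The bookkeeping might look roughly like this sketch; DBQueryFatal() is
the usual Emulab query helper, while the signal choice and column
handling here are assumptions:

    use strict;
    use libdb;    # DBQueryFatal().

    # Kill any frisbeed still serving this image, and clear the new
    # frisbee_pid slot. Called before re-creating or deleting an image.
    sub KillFrisbee($)
    {
        my ($imageid) = @_;

        my $query_result =
            DBQueryFatal("select frisbee_pid from images ".
                         "where imageid='$imageid'");
        my ($frisbee_pid) = $query_result->fetchrow_array();

        if (defined($frisbee_pid) && $frisbee_pid > 0) {
            kill('TERM', $frisbee_pid);    # signal choice is an assumption
            DBQueryFatal("update images set frisbee_pid=0 ".
                         "where imageid='$imageid'");
        }
        return 0;
    }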
- 16 Dec, 2003 2 commits
Shashi Guruprasad authored
ns2ir and nseparse to chmod the directories with the right permissions.
Leigh B. Stoller authored
- 15 Dec, 2003 1 commit
Shashi Guruprasad authored
now mapped to more than one PC if required. The simnode_capacity column
in the node_types table determines how many sim nodes can be packed on
one PC. The packing factor can also be controlled via
tb-set-colocate-factor to be smaller than simnode_capacity.

- No frontend code changes. To summarize:

      $ns make-simulated { ... }

  is still the easy way to put a whole bunch of Tcl code into
  simulation. One unrelated fix in the frontend code is to fix the
  xmlencode() function, which prior to this would knock off newlines
  from columns in the XML output. This affected nseconfigs, since it is
  one of the few columns with embedded newlines. Also changed the event
  type and event object type in traffic.tcl from TRAFGEN/MODIFY to
  NSE/NSEEVENT.

- More Tcl code in a new directory, tbsetup/nseparse:
  -> Runs on ops similar to the main parser. This is invoked from
     assign_wrapper in the end if there are simnodes.
  -> Partitions the Tcl code into multiple Tcl specifications and
     updates the nseconfigs table via xmlconvert.
  -> Comes with a lot of caveats. Arbitrary Tcl code such as
     user-specified objects or procedures will not be re-generated. For
     example, if a user wanted a procedure to be included in the Tcl
     code for all partitions, there is no way for code in nseparse to
     do that. Besides that, it needs to be tested more thoroughly.

- xmlconvert has a new option -s. When invoked with this option, the
  experiments table is not allowed to be modified. Also, virtual tables
  are just updated (as opposed to deleting all rows in the first
  invocation before inserting new rows).

- nse.patch has all the IP address related changes committed in version
  1.11, plus 2 other changes: 1) MTU discovery support in the ICMP
  agent, and 2) the "$ns rlink" mechanism for sim node to real node
  links.

- nseinput.tcl includes several client side changes to add IP routes in
  NSE and the kernel routing table for packets crossing pnodes. Also
  made the parsing of tmcc command output more robust to new changes.
  Other client side changes in libsetup.pm and other scripts to run
  nse are also in this commit.

- Besides the expected changes in assign_wrapper for simulated nodes,
  the interfaces and veth_interfaces tables are updated with routing
  table identifiers (rtabid). The tmcd changes are already committed.
  This field is used only by sim hosts on the client side. Of course,
  they can be used by jails as well if desired.
- 02 Dec, 2003 1 commit
Jonathon Duerig authored
- 01 Dec, 2003 1 commit
Robert Ricci authored
The idea is to give us hooks for grabbing experimenters' tarballs (and
RPMs) from locations other than files on ops. Mainly, to remove another
dependence on users having shells on ops.

tarfiles_setup supports fetching files from http and ftp URLs right
now, through wget. It places them into the experiment directory, so
that they'll go away when the experiment is terminated, and the rest of
the chain (ie. downloading to clients and os_setup's checks) remains
unchanged.

It is now tarfiles_setup's job to copy tarballs and RPMs from the
virt_nodes table to the nodes table for allocated nodes. This way, it
can translate URLs into the local filenames it constructs. It gets
invoked from tbswap. Does the actual fetching over on ops, running as
the user, with fetchtar.proxy.

Should be idempotent, so we should be able to give the user a button to
run webtarfiles_setup (none exists yet) to 'freshen' their tarballs.
(We'd also have to somehow let the experiment's nodes know they need to
re-fetch their tarballs.)

One funny side effect of this is that the separator in
virt_nodes.tarfiles is now ';' instead of ':' like nodes.tarballs,
since we can now put URLs in the former. Making these consistent is a
project for another day.
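The URL-vs-file decision might look like this sketch; fetchtar.proxy is
named above, while the proxy path, its arguments, and the local naming
scheme are assumptions:

    use strict;
    use File::Basename;

    sub FetchTarball($$$)
    {
        my ($pid, $eid, $tarfile) = @_;

        # Plain files on ops are left alone; URLs get fetched.
        return $tarfile if ($tarfile !~ m#^(http|ftp)://#);

        # Construct a local name inside the experiment directory, so
        # the file goes away when the experiment is terminated.
        my $localfile =
            "/proj/$pid/exp/$eid/tarfiles/" . basename($tarfile);

        # Do the fetch over on ops, running as the user (proxy path
        # and arguments are assumptions).
        system("sshtb -host ops /usr/testbed/libexec/fetchtar.proxy ".
               "'$tarfile' '$localfile'") == 0
            or die("*** Could not fetch $tarfile!\n");

        return $localfile;
    }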
- 18 Nov, 2003 1 commit
Shashi Guruprasad authored
- 13 Nov, 2003 1 commit
Leigh B. Stoller authored
- 09 Oct, 2003 1 commit
Leigh B. Stoller authored
* install-rpm, install-tarfile, spewrpmtar.php3, spewrpmtar.in: Pumped
  up even more! The db file we store in /var/db now records both the
  timestamp (of the file, or if remote, the install time) and the MD5
  of the file that was installed. Locally, we can get this info when
  accessing the file via NFS (copymode on or off). Remote, we use wget
  to get the file, and so pass the timestamp along in the URL request,
  and let spewrpmtar.in determine if the file has changed. If the
  timestamp it gets is >= the timestamp of the file, an error code of
  304 (Not Modified) is returned. Otherwise the file is returned.

  If the timestamps are different (remote, server sends back an actual
  file), the MD5 of the file is compared against the value stored. If
  they are equal, update the timestamp in the db file to avoid repeated
  MD5s (or server downloads) in the future. If the MD5 is different,
  then reinstall the tarball or rpm, and update the db file with the
  new timestamp and MD5. Presto, we have auto-update capability!

  Caveat: I pass along the old MD5 in the URL, but it is currently
  ignored. I do not know if doing the MD5 on the server is a good idea,
  but obviously it is easy to add later. At the moment it happens on
  the node, which means wasted bandwidth when the timestamp has changed
  but the file has not (probably not something that will happen in
  typical usage).

  Caveat: The timestamp used on remote nodes is the time the tarfile is
  installed (GM time of course). We could arrange to return the
  timestamp of the local file back to the node, but that would mean
  complicating the protocol (or using an http header) and I was not in
  the mood for that. In typical usage, I do not think that people will
  be changing tarfiles and rpms so rapidly that this will make a
  difference, but if it does, we can change it.

* node_update.in, client side watchdog, and various web pages: Deflated
  node_update, removing all of the older ssh code. We now assume that
  all nodes will auto-update on a periodic basis, via the watchdog that
  runs on all client nodes, including plab nodes.

  Changed the permission check to look for the new UPDATE permission
  (used to be UPDATEACCOUNT). As before, it requires local_root or
  better. The reason for this is that node_update now implies more than
  just updating the accounts/mounts. The web pages have been changed to
  explain that in addition to mounts/accounts, rpms and tarfiles will
  also be updated. At the moment, this is still tied to a single
  variable (update_accounts) in the nodes table, but as Kirk requested
  at the meeting, it will probably be nice to split these out in the
  future.

  Added the ability to node_update a single node in an experiment (in
  addition to the all-nodes option on the showexp page). This has been
  added to the shownode webpage menu options.

  Changed locking code to use the newer wrapper states, and to move the
  experiment to RUNNING_LOCKED until the update completes. This is to
  prevent mayhem in the rest of the system (which could be dealt with,
  but is not worth the trouble; people have to wait until their
  initiated update is complete before they can swap out the
  experiment).

  Added "short" mode to the shownode routine, equivalent to the
  recently added short mode for showexp. I use this on the confirmation
  page for updating a single node, giving the user a couple of
  pertinent (feel good) facts before they confirm.
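The client-side freshness check described above might look roughly like
this sketch, assuming a hypothetical db-file record of (timestamp, md5)
per installed file; the URL parameter name and the treatment of wget
failures are also assumptions:

    use strict;
    use Digest::MD5;

    # Decide whether an rpm/tarball needs to be reinstalled, given the
    # timestamp and MD5 recorded in the /var/db file.
    sub NeedsReinstall($$$$)
    {
        my ($url, $tmpfile, $oldstamp, $oldmd5) = @_;

        # Pass the recorded timestamp along in the URL request; the
        # server answers 304 (Not Modified) if its copy is not newer.
        # For this sketch, any wget failure counts as "not modified".
        system("wget -q -O $tmpfile '$url&timestamp=$oldstamp'") == 0
            or return 0;

        # Timestamps differed, so the server sent a file; compare MD5s.
        open(FILE, $tmpfile)
            or die("*** Could not open $tmpfile: $!\n");
        my $newmd5 = Digest::MD5->new->addfile(\*FILE)->hexdigest();
        close(FILE);

        # Equal: the caller just refreshes the recorded timestamp, to
        # avoid repeated downloads/MD5s. Different: the caller
        # reinstalls and records the new timestamp and MD5.
        return ($newmd5 eq $oldmd5) ? 0 : 1;
    }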
- 25 Sep, 2003 1 commit
Leigh B. Stoller authored
- 18 Sep, 2003 1 commit
Robert Ricci authored
tree gets installed. I've set it for our default defs, and for all the *-emulab devel defs files.
- 22 Aug, 2003 3 commits
Leigh B. Stoller authored
Leigh B. Stoller authored
Austin Clements authored
- 06 Aug, 2003 1 commit
Mac Newbold authored
- 28 Jul, 2003 1 commit
Robert Ricci authored
database yet, so using node_reboot on them would be catastrophic. Uses
the special newnode MFS's ssh key. Also, when a node has booted for the
first time, it may be up on a temporary IP address rather than its
permanent one, so we pass the node's IP rather than its node_id on the
command line. Only tries ssh.
- 25 Jul, 2003 1 commit
Leigh B. Stoller authored
tbswap to use this version inside the testbed project only! All other
projects will see the old version for now; there are just too many
things to test, and the testsuite gets just a fraction of them. Some
highlights (which I will expand on later when I commit this version to
the main version):

* New -t option to create the TOP file, and then exit. The only other
  side effect of this is to update the min/max nodes for the experiment
  in the DB, unless the new option -n (impotent mode) is given.

* New -n option to operate in impotent mode; do not allocate nodes and
  do not modify the DB. Okay, so this option is not as great as it
  sounds. I eventually hit the point of diminishing returns with trying
  to make things work right without DB modification. At some point I
  just throw in the towel and exit. This currently happens after
  interpolating the link results of assign. But I have found it very
  useful, and it could get better with time. Being able to run assign
  on the main DB without sucking up the nodes is nice for debugging.

* Lots of data structure reorganization, mostly on the virtual topology
  side of assign_wrapper (you can think of assign_wrapper as two
  sections: the part that interprets the DB tables and creates the TOP
  file, and the part that reads the results of assign and sets up all
  the physical stuff in the DB). I removed numerous global hashes and
  combined them into aggregate data structures, such as they are in
  Perl. My approach for this was to read the tables from the DB and
  keep them handy, extending them as needed with stuff that
  assign_wrapper generates as it proceeds. This has the side effect of
  cutting down on the number of queries as well. The next task is to do
  the physical side reorg, but I am not up for that yet.
- 14 Jul, 2003 1 commit
Robert Ricci authored
- 08 Jul, 2003 1 commit
Leigh B. Stoller authored
has been working okay on minibed for a while. In other words, the new
parser really is going to be installed on mainbed any moment now!
- 30 Jun, 2003 1 commit
Leigh B. Stoller authored
do the actual parse. The parser now spits out XML instead of DB
queries, and the wrapper on boss converts that to DB insertions after
verification. There are some makefile changes as well, to install the
new parser on ops via NFS, since otherwise the parser could become
intolerably out of date on ops!
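Roughly, the boss-side wrapper flow is something like this sketch;
sshtb is the usual boss-to-ops ssh wrapper, while the parser path and
the converter are assumptions:

    use strict;

    my ($pid, $eid, $nsfile) = @ARGV;

    # Parse on ops; the parser has no DB access and just spits out XML.
    open(PARSER, "sshtb -host ops /usr/testbed/libexec/parse-ns ".
                 "$pid $eid $nsfile |")
        or die("*** Could not start the parser on ops!\n");
    my $xml = do { local $/; <PARSER> };
    close(PARSER)
        or die("*** Parser on ops failed!\n");

    # Back on boss: verify the XML, then turn it into DB insertions.
    ConvertAndInsert($pid, $eid, $xml);    # hypothetical converter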
- 21 Apr, 2003 1 commit
Robert Ricci authored
the experimental switches. The idea is to be able to auto-detect where
a node has been plugged in, so that we can fill out the wires table
without any manual intervention! This is a step towards being able to
automate the adding of nodes. Has a runtime linear in the number of
VLANs on the experimental switches, so it should run pretty fast on a
new testbed, but can be kinda slow on, say, ours.
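The search idea is roughly: for each vlan, consult the switch's
forwarding table for the new node's MAC and note the port it was
learned on. A minimal sketch, assuming Cisco-style per-vlan community
indexing and the standard BRIDGE-MIB forwarding table (the details of
the real script are likely different):

    use strict;
    use Net::SNMP;

    my ($switch, $community, $mac, @vlans) = @ARGV;
    my $dot1dTpFdbPort = '1.3.6.1.2.1.17.4.3.1.2';

    foreach my $vlan (@vlans) {
        # One table walk per vlan: runtime is linear in the number of
        # vlans on the switch.
        my ($session, $error) =
            Net::SNMP->session(-hostname  => $switch,
                               -community => "$community\@$vlan");
        next if (!defined($session));

        my $table = $session->get_table(-baseoid => $dot1dTpFdbPort);
        $session->close();
        next if (!defined($table));

        # The table is indexed by the MAC address in decimal-dotted form.
        my $index = join(".", map { hex($_) } split(/:/, $mac));
        if (exists($table->{"$dot1dTpFdbPort.$index"})) {
            print "MAC $mac is on bridge port " .
                  $table->{"$dot1dTpFdbPort.$index"} . " (vlan $vlan)\n";
            last;
        }
    }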
- 16 Apr, 2003 1 commit
Leigh B. Stoller authored
experiment, rather than as an administrator, which presents group
permission problems when the experiment is in a subgroup (requires two
additional groups, whereas suexec adds only one group). That aside, the
correct approach is to run the swap as the creator. To do that, we must
flip to the user (from the admin person) in the backend using the new
idleswap script, and then run the normal swapexp. Added a new option to
swapexp (-i) which changes the email slightly to make it clear that the
experiment was idleswapped, and so that the From: is tbops, not the
user (again, to make it more clear).
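A minimal sketch of the flip-to-the-user idea, assuming a setuid-root
backend; the argument handling and the swapexp invocation form are
assumptions:

    use strict;
    use POSIX;

    my ($creator, $pid, $eid) = @ARGV;

    # Look up the creator's unix ids (the experiment creator, not the
    # admin person running this).
    my (undef, undef, $unix_uid, $unix_gid) = getpwnam($creator)
        or die("*** No such user: $creator\n");

    # Flip to the creator, so group permissions work out even in
    # subgroups, then run the normal swap. The new -i option makes the
    # email say "idleswap" and sets From: to tbops.
    POSIX::setgid($unix_gid);
    POSIX::setuid($unix_uid);
    exec("swapexp", "-i", "-e", "$pid,$eid", "out")
        or die("*** exec failed: $!\n");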
- 04 Apr, 2003 1 commit
Chad Barb authored
tbswapin and tbswapout are no more.