- 23 Mar, 2004 3 commits
Mike Hibler authored
Mike Hibler authored
Mike Hibler authored
reinstall remote tarballs every time we change our mind. Also, make sure the rusage function processes updates.
- 20 Mar, 2004 1 commit
Mike Hibler authored
1. Make sure we send an immediate isalive on startup rather than waiting for the first interval to pass.
2. Don't fall back on a TCP call on plab nodes; this just eventually hangs all our tmcds on flaky plab machines.
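A minimal sketch of the startup ordering, in Perl; the port number and payload here are invented for illustration, and send_isalive is a hypothetical stand-in for the real watchdog plumbing:

    #!/usr/bin/perl -w
    use strict;
    use IO::Socket::INET;

    my $interval = 300;		# seconds between reports (assumed)

    sub send_isalive {
        # Fire-and-forget UDP; no TCP fallback, so a flaky plab node
        # cannot wedge a tmcd with a stuck connection.
        my $sock = IO::Socket::INET->new(Proto    => "udp",
                                         PeerAddr => "boss",
                                         PeerPort => 7777)
            or return;
        $sock->send("ISALIVE\n");
        close($sock);
    }

    send_isalive();		# immediate report at startup
    while (1) {
        sleep($interval);
        send_isalive();
    }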
- 18 Mar, 2004 1 commit
Mike Hibler authored
- check intervals driven by sitevars delivered by TMCD command
- handles rusage stats return on plab nodes

It is now a single process and executes any auxiliary scripts synchronously. This may prove to be unwieldy in the face of long-running scripts like update; if so, we'll have to add all that fork/exec/waitpid mucky-muck.
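A rough sketch of that single-process loop; the interval values and script names below are illustrative stand-ins, not the real sitevars:

    use strict;

    # Per-task intervals, as they might arrive via sitevars (assumed names).
    my %interval = ("isalive" => 180, "rusage" => 300, "update" => 3600);
    my %lastrun  = map { $_ => 0 } keys %interval;

    while (1) {
        foreach my $task (keys %interval) {
            next if ($interval{$task} == 0);	# task disabled by sitevar
            next if (time() - $lastrun{$task} < $interval{$task});
            # Synchronous: we block until the auxiliary script finishes,
            # which is the bit that may need fork/exec/waitpid later.
            system("/usr/local/etc/emulab/watchdog-$task");
            $lastrun{$task} = time();
        }
        sleep(10);
    }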
- 15 Mar, 2004 1 commit
Mike Hibler authored
has changed. Not a big deal when updates are 12 hours apart, but if we shorten the interval, it will help keep gratuitous updates down. Put a timestamp in the log when watchdog starts.
- 17 Feb, 2004 1 commit
Leigh B. Stoller authored
this was to add soft reconfig support so that nodes could be reconfigured without having to reboot them. This appears to work, and has been tested with jails getting moved around. I've also tested the new code on the MFS, but no testing has been done on PLAB nodes yet.

The main change is that most of the code moved out of libsetup.pm and was split into constituent rc scripts, each of which does its own thing, including cleaning up and preparing for making an image; most of that central knowledge has been moved out into the scripts. Still more to do, but this was a good start.
- 10 Jan, 2004 1 commit
Kirk Webb authored
- 26 Dec, 2003 1 commit
Leigh B. Stoller authored
(slivers). Account updates take longer of course, but that's okay for now.
- 09 Oct, 2003 1 commit
Leigh B. Stoller authored
* install-rpm, install-tarfile, spewrpmtar.php3, spewrpmtar.in: Pumped up even more! The db file we store in /var/db now records both the timestamp (of the file, or if remote, the install time) and the MD5 of the file that was installed. Locally, we can get this info when accessing the file via NFS (copymode on or off). Remotely, we use wget to get the file, and so pass the timestamp along in the URL request and let spewrpmtar.in determine if the file has changed. If the timestamp it gets is >= the timestamp of the file, an error code of 304 (Not Modified) is returned; otherwise the file is returned. If the timestamps are different (remote, server sends back an actual file), the MD5 of the file is compared against the stored value. If they are equal, update the timestamp in the db file to avoid repeated MD5s (or server downloads) in the future. If the MD5 is different, then reinstall the tarball or rpm, and update the db file with the new t...
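The decision amounts to a two-level check: timestamp first, MD5 only when the timestamp says the file may have changed. A sketch of the local (NFS) side, assuming a hypothetical db-file layout of one "timestamp md5" pair per installed file (the real format and helpers may differ):

    use strict;
    use Digest::MD5;

    sub md5_of {
        my ($file) = @_;
        open(my $fh, "<", $file) or die("$file: $!");
        binmode($fh);
        my $md5 = Digest::MD5->new->addfile($fh)->hexdigest();
        close($fh);
        return $md5;
    }

    sub check_install {
        my ($file, $stored_stamp, $stored_md5) = @_;
        my $stamp = (stat($file))[9];		# mtime of the (NFS) file
        return ("current", $stored_stamp, $stored_md5)
            if ($stamp <= $stored_stamp);	# analogous to a 304 reply
        my $md5 = md5_of($file);
        if ($md5 eq $stored_md5) {
            # Same contents, newer stamp: refresh the db-file stamp so
            # we skip repeated MD5s (or server downloads) next time.
            return ("touch", $stamp, $md5);
        }
        return ("reinstall", $stamp, $md5);	# contents really changed
    }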
- 06 Oct, 2003 1 commit
Leigh B. Stoller authored
code that was in libsetup has moved into this library, and underwent a giant cleaning and pumping up. The interface from your typical perl script now looks like this:

    use libtmcc;

    if (tmcc(TMCCCMD_STATUS, "optional arguments", \@tmccresults) < 0) {
        warn("*** WARNING: Could not get status from server!\n");
        return -1;
    }
    foreach my $me (@tmccresults) {
        print "bite $me";
    }

The arguments and results are optional values. There is a fourth optional value that is a hash of config options (basically converted to command line switches passed to tmcc). For example, to set the timeout on an individual call, pass a fourth argument like:

    ("timeout" => 5)

There is also a way to set global options so that all subsequent tmcc calls are affected:

    configtmcc("timeout", 5);

I'll probably clean this up a bit to avoid the direct strings. The result list is a list of strings. Since we are trending away from using tmcc to transfer large amounts of data, I think this is okay.

* A new tmcc.pl which does little more than load libtmcc and use it. This will become the new tmcc, with the existing C version becoming a backend binary for it.

* All of the perl scripts in tmcd have been changed to use the new library. I left the few uses of tmcc in shell scripts alone since they were of the simple variety (mostly the "state" command).

* And again, if you have read this far, you will learn why I bothered with all this. Well, the existing code was really bad and it was getting out of control, sort of like a squid that was getting harder to control as its rotting tentacles slithered into more and more scripts. Anyway ... more important, my goal is to use the libtmcc library to add caching. I have not worked out the details yet, but I am envisioning a configuration file, perhaps generated initially by tmcd, of all of the config values. If the library finds that file, it sucks the info out of the file instead of going to tmcd. Eventually, this config file would be generated as part of experiment swapping and stored in the DB, but that's a longer term project, and perhaps orthogonal (how we fill the cache is not as important as adding the ability to use a cache, right?). Note that certain operations (like "state" and "ready") are flagged by the library to always bypass the cache.
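The caching plan above exists only in prose; one plausible shape for it, with the cache file name and line format invented purely for illustration, might be:

    use strict;
    use libtmcc;

    # Hypothetical sketch: consult a cache file of config values first,
    # fall back to tmcd, and let flagged commands bypass the cache.
    my %nocache   = ("state" => 1, "ready" => 1);
    my $cachefile = "/var/emulab/boot/tmcc.cache";	# invented name

    sub tmcc_cached {
        my ($cmd, $args, $resultsref) = @_;
        if (!$nocache{$cmd} && open(my $fh, "<", $cachefile)) {
            # Assumed format: one "CMD result-line" entry per cached value.
            my @hits;
            while (my $line = <$fh>) {
                push(@hits, "$1\n") if ($line =~ /^\Q$cmd\E\s+(.*)$/);
            }
            close($fh);
            if (@hits) {
                @$resultsref = @hits;
                return 0;		# served from cache; no tmcd call
            }
        }
        return tmcc($cmd, $args, $resultsref);	# fall through to tmcd
    }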
- 03 Sep, 2003 1 commit
Mike Hibler authored
- 18 Aug, 2003 1 commit
Austin Clements authored
Jail changes.
- 05 Aug, 2003 1 commit
Leigh B. Stoller authored
* Parser: Added new tb command to set the name of the sync server:

      tb-set-sync-server <node>

  This initializes the sync_server slot of the experiment entry to the *vname* of the node that should run the sync server for that experiment. In other words, the sync server is per-experiment, runs on a node in the experiment, and the user gets to choose which node it runs on.

* tmcd and client side setup. Added new syncserver command which returns the name of the syncserver and whether the requesting node is the lucky one to run the daemon:

      SYNCSERVER SERVER='nodeG.syncserver.testbed.emulab.net' ISSERVER=1

  The name of the syncserver is written to /var/emulab/boot/syncserver on the nodes so that clients can easily figure out where the server is. Aside: the ready bits are now ignored (no DB accesses are made) for virtual nodes; they are forced to use the new sync server.

* New os/syncd directory containing the daemon and the client. The daemon is pretty simple. It waits for TCP (and UDP, although that path is not complete yet) connections, and reads in a little structure that gives the name of the "barrier" to wait for, and an optional count of clients in the group (this would be used by the "master" who initializes barriers for clients). The socket is saved (no reply is made, so the client is blocked) until the count reaches zero. Then all clients are released by writing back to the sockets, and the sockets are closed. Obviously, the number of clients is limited by the number of FDs (open sockets), hence the need for a UDP variant, but that will take more work.

  The client has a simple command line interface:

      usage: emulab-sync [options]
        -n <name>   Optional barrier name; must be less than 64 bytes long
        -d          Turn on debugging
        -s server   Specify a sync server to connect to
        -p portnum  Specify a port number to connect to
        -i count    Initialize named barrier to count waiters
        -u          Use UDP instead of TCP

  The client figures out the server by looking for the file created above by libsetup (/var/emulab/boot/syncserver). If you do not specify a barrier "name", it uses an internal default. Yes, the server can handle multiple barriers (differently named, of course) at once (non-overlapping clients, obviously).

  Clients can wait on a barrier before it is initialized; the count on the barrier just goes negative until someone initializes it using the -i option, which increments the count by the given count. Therefore, the master does not have to arrange to get there "first." As an example, consider a master and one client:

      nodeA> /usr/local/etc/emulab/emulab-sync -n mybarrier
      nodeB> /usr/local/etc/emulab/emulab-sync -n mybarrier -i 1

  Node A waits until Node B initializes the barrier (gives it a count). The count is the number of *waiters*, not including the master. The master is also blocked until all of the waiters have checked in. I have not made any provision for timeouts or crashed clients. Let's see how it goes.
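For flavor, here is roughly what the client side of that exchange could look like in Perl; the packed wire layout (64-byte name plus 32-bit count) and the port are assumptions for illustration, since the real structure lives in os/syncd:

    use strict;
    use IO::Socket::INET;

    my ($name, $count) = ("mybarrier", 0);	# count > 0 only for the -i master

    chomp(my $server = `cat /var/emulab/boot/syncserver`);
    my $sock = IO::Socket::INET->new(Proto    => "tcp",
                                     PeerAddr => $server,
                                     PeerPort => 16534)	# invented port
        or die("Cannot connect to sync server on $server\n");

    # Send the request, then block: the server holds the socket (no reply)
    # until the barrier count reaches zero, then writes back to release us.
    print $sock pack("a64N", $name, $count);
    sysread($sock, my $released, 4);		# blocks here until released
    close($sock);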
- 05 Jun, 2003 1 commit
Leigh B. Stoller authored
script both inside and outside the jail.
- 30 Jan, 2003 1 commit
Leigh B. Stoller authored
- 06 Jan, 2003 1 commit
Leigh B. Stoller authored
ntpdrift values reported from most nodes. Do not report ntpdrift from the MFS either. Clean up pidfile stuff, and add better "stop" mode for shutdown.
- 18 Dec, 2002 2 commits
Leigh B. Stoller authored
replace it eventually). Like the RON nodes, local nodes will now periodically (once every 5 minutes) send a UDP packet to boss to indicate the node is alive and to see if it needs to check for account updates. This will replace the once-every-5-minutes fping we do from db/node_status (once I whack that script), and will simplify the existing problem of propagating accounts to nodes (nodes down, nodes in the swapping phase, etc.).
Leigh B. Stoller authored
rc.conf: Remove fixed -p argument. Now set by mkjail.
rc.local, jailctl: Update for client side path reorg and cleanup.
jaildog.pl, mkjail.pl: Numerous fixes for jailed nodes.
- 26 Sep, 2002 1 commit
Leigh B. Stoller authored
well (requires new libsetup.pm to be installed too). Still a work in progress.
- 27 Aug, 2002 1 commit
Leigh B. Stoller authored
on the RON nodes and in the new widearea image.
- 15 Aug, 2002 1 commit
Leigh B. Stoller authored
- 29 Jul, 2002 1 commit
Leigh B. Stoller authored
client software to widearea nodes. Most of these changes were to reduce the embarrassment factor. At some point we need a proper autoconf and such, but for now there is a makefile in the src dir for creating the distribution. I've tested it on a local linux node and mostly on a freebsd node, but I've moved things around and so updating the RON nodes will require some hand intervention by me at some point.
- 10 Jul, 2002 1 commit
Leigh B. Stoller authored
- 07 Jul, 2002 1 commit
Leigh B. Stoller authored
- 02 Jul, 2002 2 commits
Robert Ricci authored
Leigh B. Stoller authored
fail (usually because of network dropout), continue to try every minute until it succeeds.
- 19 Jun, 2002 1 commit
Leigh B. Stoller authored
well. We use a UDP packet to keep it lightweight. If it does not get through, that's okay, obviously. The return value is just a yes/no flag that says an update needs to run. Right now, that's just accounts. This allows us to churn a little less on accounts. Other cleanups.
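Conceptually the exchange is tiny. A sketch, with the reply format ("UPDATE=0|1"), port, and update hook invented as stand-ins for the real protocol:

    use strict;
    use IO::Socket::INET;

    sub doupdate { system("/usr/local/etc/emulab/update"); }

    my $sock = IO::Socket::INET->new(Proto    => "udp",
                                     PeerAddr => "boss",
                                     PeerPort => 7777)	# assumed port
        or exit(0);		# best effort; catch up on the next pass
    $sock->send("ISALIVE\n");

    my $reply = "";
    eval {
        local $SIG{ALRM} = sub { die("timeout\n"); };
        alarm(5);
        $sock->recv($reply, 128);	# a lost packet is fine, obviously
        alarm(0);
    };
    doupdate() if ($reply =~ /UPDATE=1/);	# right now, just accounts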
- 06 Jun, 2002 1 commit
Leigh B. Stoller authored
- 31 May, 2002 1 commit
Leigh B. Stoller authored
watchdog at the moment, but it will be. Right now it does boot-time stuff: issues a tmcc state event so the testbed knows (REBOOTED), does an account update to get any accounts missed while dead, then sets up any vnodes (tunnels and such) that were supposed to be running on the node, then issues a tmcc ISUP event. After that, it goes into a loop doing periodic account updates. At some point it would be good to look for stale vnodes (that could not be torn down because of network connectivity problems), but there are some race conditions that I need to work out first.
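In outline, that boot-time path could look like the following sketch; the REBOOTED/ISUP state events are from the message above, while the script paths and the 5-minute period are placeholders:

    use strict;

    system("tmcc state REBOOTED");		 # tell the testbed we rebooted
    system("/usr/local/etc/emulab/update");	 # accounts missed while dead
    system("/usr/local/etc/emulab/vnodesetup");	 # restore vnodes (tunnels etc.)
    system("tmcc state ISUP");			 # fully up

    while (1) {
        sleep(300);
        system("/usr/local/etc/emulab/update");	# periodic account update
    }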