- 10 Oct, 2003 7 commits
-
-
Mac Newbold authored
-
Mac Newbold authored
-
Robert Ricci authored
they mean.
-
Leigh B. Stoller authored
www tree.
-
Mike Hibler authored
-
Mac Newbold authored
-
Mac Newbold authored
model of waiting for state changes. Before, we were watching the database, which meant we could only watch for terminal/stable/long-lived states and had to poll the db. Now things that are waiting for states to change become event listeners: they watch the stream of events flow by and don't have to do any polling. They can now watch for any state, and even sequences of states (e.g. a Shutdown followed by an Isup).

To do this, there is now a cool StateWait.pm library that encapsulates the functionality needed. To use it, you call initStateWait before you start the chain of events (i.e. before you call node reboot). Then do your stuff, and call waitForState() when you're ready to wait. It can be told to return periodically with the results so far, and you can cancel waiting for things. An example program called waitForState is in testbed/event/stated/ , and can also be used nicely as a command line tool that wraps up the library functionality.

This also required the introduction of a TBFAILED event that can be sent when a node isn't going to make it to the state that someone may be waiting for. I.e. if it gets wedged coming up, and stated retries but eventually gives up on it, stated sends this to let things know that the node is hosed and won't ever come up.

Another part of this is that node_reboot moves (back) to the fully-event-driven model, where users call node reboot, and it does some checks and sends some events. Then stated calls node_reboot in "real mode" to actually do the work, and handles doing the appropriate retries until the node either comes up or is deemed "failed" and stated gives up on it. This means stated is also the gatekeeper of when you can and cannot reboot a node. (See mail archives for extensive discussions of the details.)

A big part of the motivation for this was to get uninformed timeouts and retries out of os_load/os_setup and put them in stated, where we can make a wiser choice. So os_load and os_setup now use this new stuff and don't have to worry about timing out on nodes and rebooting. Stated makes sure that nodes either come up, get retried, or fail to boot. tbrestart also underwent a similar change.
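The listener model described above can be illustrated with a small sketch. This is not StateWait.pm or its interface: it is a hypothetical Python model, and the class name, function names, and the upper-case state strings are assumptions made for illustration. Only the TBFAILED event name and the Shutdown-then-Isup sequence come from the message above.

```python
from typing import Iterable, List, Sequence, Tuple

FAILED = "TBFAILED"  # event name taken from the commit message


class StateWaiter:
    """Hypothetical model of a listener waiting for a sequence of states."""

    def __init__(self, nodes: Iterable[str], sequence: Sequence[str]) -> None:
        self.sequence = list(sequence)
        # Index of the next state each node still has to reach.
        self.progress = {node: 0 for node in nodes}
        self.failed = set()

    def is_done(self, node: str) -> bool:
        return self.progress[node] == len(self.sequence)

    def handle_event(self, node: str, state: str) -> None:
        # Ignore nodes we are not waiting on, or that already finished/failed.
        if node not in self.progress or node in self.failed or self.is_done(node):
            return
        if state == FAILED:
            self.failed.add(node)
        elif state == self.sequence[self.progress[node]]:
            self.progress[node] += 1

    def results(self) -> Tuple[List[str], List[str], List[str]]:
        finished = sorted(n for n in self.progress if self.is_done(n))
        failed = sorted(self.failed)
        pending = sorted(
            n for n in self.progress
            if n not in self.failed and not self.is_done(n)
        )
        return finished, failed, pending


def wait_for_states(waiter: StateWaiter, events: Iterable[Tuple[str, str]]):
    """Consume (node, state) events until no node is left pending."""
    for node, state in events:
        waiter.handle_event(node, state)
        if not waiter.results()[2]:
            break
    return waiter.results()


if __name__ == "__main__":
    waiter = StateWaiter(["pc1", "pc2"], ["SHUTDOWN", "ISUP"])
    stream = [
        ("pc1", "SHUTDOWN"),
        ("pc2", "SHUTDOWN"),
        ("pc2", FAILED),
        ("pc1", "ISUP"),
    ]
    print(wait_for_states(waiter, stream))
```

The point of the sketch is that the waiter is created before the events start flowing, so no transition is missed, and that a failure event ends the wait for a node instead of leaving the caller to time out.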
-
- 09 Oct, 2003 8 commits
-
-
Mike Hibler authored
We have a few more source changes than they do, so we cannot just use it.
-
Mike Hibler authored
the closest match.
-
Mike Hibler authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
* install-rpm, install-tarfile, spewrpmtar.php3, spewrpmtar.in: Pumped up even more! The db file we store in /var/db now records both the timestamp (of the file, or if remote the install time) and the MD5 of the file that was installed. Locally, we can get this info when accessing the file via NFS (copymode on or off). Remote, we use wget to get the file, and so pass the timestamp along in the URL request, and let spewrpmtar.in determine if the file has changed. If the timestamp it gets is >= the timestamp of the file, an error code of 304 (Not Modified) is returned. Otherwise the file is returned.

  If the timestamps are different (remote, server sends back an actual file), the MD5 of the file is compared against the value stored. If they are equal, update the timestamp in the db file to avoid repeated MD5s (or server downloads) in the future. If the MD5 is different, then reinstall the tarball or rpm, and update the db file with the new timestamp and MD5. Presto, we have auto update capability!

  Caveat: I pass along the old MD5 in the URL, but it is currently ignored. I do not know if doing the MD5 on the server is a good idea, but obviously it is easy to add later. At the moment it happens on the node, which means wasted bandwidth when the timestamp has changed but the file has not (probably not something that will happen in typical usage).

  Caveat: The timestamp used on remote nodes is the time the tarfile is installed (GM time of course). We could arrange to return the timestamp of the local file back to the node, but that would mean complicating the protocol (or using an http header) and I was not in the mood for that. In typical usage, I do not think that people will be changing tarfiles and rpms so rapidly that this will make a difference, but if it does, we can change it.

* node_update.in, client side watchdog, and various web pages: Deflated node_update, removing all of the older ssh code. We now assume that all nodes will auto update on a periodic basis, via the watchdog that runs on all client nodes, including plab nodes.

  Changed the permission check to look for the new UPDATE permission (used to be UPDATEACCOUNT). As before, it requires local_root or better. The reason for this is that node_update now implies more than just updating the accounts/mounts. The web pages have been changed to explain that in addition to mounts/accounts, rpms and tarfiles will also be updated. At the moment, this is still tied to a single variable (update_accounts) in the nodes table, but as Kirk requested at the meeting, it will probably be nice to split these out in the future.

  Added the ability to node_update a single node in an experiment (in addition to the all nodes option on the showexp page). This has been added to the shownode webpage menu options.

  Changed locking code to use the newer wrapper states, and to move the experiment to RUNNING_LOCKED until the update completes. This is to prevent mayhem in the rest of the system (which could be dealt with, but is not worth the trouble; people have to wait until their initiated update is complete before they can swap out the experiment).

  Added "short" mode to the shownode routine, equivalent to the recently added short mode for showexp. I use this on the confirmation page for updating a single node, giving the user a couple of pertinent (feel good) facts before they confirm.
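The timestamp-then-MD5 check described for install-rpm and install-tarfile above can be sketched roughly as follows. This is a hypothetical Python model, not the actual scripts: the record layout, function names, and action labels are assumptions, as are the "no record yet means install" branch and the use of 200 for the "file returned" case. Only the 304 status, the `>=` comparison, and the compare-MD5-when-timestamps-differ rule come from the message.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class InstallRecord:
    """Hypothetical stand-in for one entry of the db file kept in /var/db."""
    timestamp: int
    md5: str


def server_status(client_timestamp: int, file_timestamp: int) -> int:
    """Server side: 304 if the client's timestamp is >= the file's."""
    # 200 for "send the file" is an assumption; the message only names 304.
    return 304 if client_timestamp >= file_timestamp else 200


def decide(record: Optional[InstallRecord], new_timestamp: int,
           file_bytes: bytes) -> Tuple[str, InstallRecord]:
    """Node side: choose an action and return the record to store."""
    digest = hashlib.md5(file_bytes).hexdigest()
    if record is None:
        # Assumption: nothing recorded yet means a first-time install.
        return "install", InstallRecord(new_timestamp, digest)
    if record.timestamp == new_timestamp:
        return "skip", record
    if record.md5 == digest:
        # Same contents: only refresh the timestamp to avoid repeat MD5s.
        return "update-timestamp", InstallRecord(new_timestamp, digest)
    return "reinstall", InstallRecord(new_timestamp, digest)


if __name__ == "__main__":
    old = InstallRecord(100, hashlib.md5(b"v1").hexdigest())
    print(server_status(100, 100))
    print(decide(old, 200, b"v1")[0])
    print(decide(old, 200, b"v2")[0])
```

For simplicity the sketch computes the digest up front; the scheme described above only needs it once the timestamps differ.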
-
Mac Newbold authored
-
Mac Newbold authored
-
- 08 Oct, 2003 1 commit
-
-
Leigh B. Stoller authored
this page is open to the world.
-
- 07 Oct, 2003 9 commits
-
-
Robert Ricci authored
second argument so that you can pass ($pid,$gid) when showing a group's experiments. But it has a default value, so you don't have to go around passing a superfluous second argument for showing user or project experiments.
-
Robert Ricci authored
-
Mac Newbold authored
-
Leigh B. Stoller authored
-
Robert Ricci authored
directory, symlink in /proj/<pid> .
-
Leigh B. Stoller authored
nodes it can). Change exit value; return -1 on fatal error, otherwise return the number of nodes that could not be allocated. Combined with the -p switch, assign_wrapper can easily determine that nalloc was able to reserve a subset of the nodes. Also fix up getopts() call, which had its arguments backwards! Good thing we hardly pass switches to nalloc.
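As a rough illustration of the new exit-value convention, a caller could classify the result as below. This is a hypothetical Python sketch, not code from assign_wrapper: the function name and labels are made up, and it assumes the caller already has the value as a signed integer (on POSIX systems a process exit value of -1 is seen by the parent as 255, so a real caller would have to account for that).

```python
def classify_nalloc_result(status: int, requested: int) -> str:
    """Map nalloc's return value onto an outcome label.

    status    -- -1 on fatal error, else the number of nodes NOT allocated
    requested -- how many nodes the caller asked for
    """
    if status < 0:
        return "fatal"
    if status == 0:
        return "all-allocated"
    if status < requested:
        return "partial"  # some, but not all, nodes were reserved
    return "none-allocated"


if __name__ == "__main__":
    for status in (-1, 0, 2, 5):
        print(status, classify_nalloc_result(status, requested=5))
```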
-
Leigh B. Stoller authored
Also add constant for ROLE=gw.
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
- 06 Oct, 2003 10 commits
-
-
Robert Ricci authored
take into account the fact that it works now for 55xx series switches.
-
Leigh B. Stoller authored
"fullconfig" command to cycle through the list of existing commands and spit out a section for each. Not all of them of course; just a subset that makes sense. I did note that mounts are a bit of a problem because of the USESFS argument. Not sure what to do yet.
-
Robert Ricci authored
where they are okay.
-
Leigh B. Stoller authored
code that was in libsetup has moved into this library, and underwent a giant cleaning and pumping up. The interface from your typical perl script now looks like this:

    use libtmcc;

    if (tmcc(TMCCCMD_STATUS, "optional arguments", \@tmccresults) < 0) {
        warn("*** WARNING: Could not get status from server!\n");
        return -1;
    }
    foreach my $me (@tmccresults) {
        print "bite $me";
    }

  The arguments and results are optional values. There is a fourth optional value that is a hash of config options (basically converted to command line switches passed to tmcc). For example, to set the timeout on an individual call, pass a fourth argument like:

    ("timeout" => 5)

  There is also a way to set global options so that all subsequent tmcc calls are affected:

    configtmcc("timeout", 5);

  I'll probably clean this up a bit to avoid the direct strings. The result list is a list of strings. Since we are trending away from using tmcc to transfer large amounts of data, I think this is okay.

* A new tmcc.pl which does little more than load libtmcc and use it. This will become the new tmcc, with the existing C version becoming a backend binary for it.

* All of the perl scripts in tmcd have been changed to use the new library. I left the few uses of tmcc in shell scripts alone since they were of the simple variety (mostly "state" command).

* And again, if you have read this far, you will learn why I bothered with all this. Well, the existing code was really bad and it was getting out of control. Sort of like a squid that was getting harder to control as its rotting tentacles slithered into more and more scripts. Anyway ...

  More important, my goal is to use the libtmcc library to add caching. I have not worked out the details yet, but I am envisioning a configuration file, perhaps generated initially by tmcd, of all of the config values. If the library finds that file, it sucks the info out of the file instead of going to tmcd. Eventually, this config file would be generated as part of experiment swapping and stored in the DB, but that's a longer term project, and perhaps orthogonal (how we fill the cache is not as important as adding the ability to use a cache, right?). Note that certain operations (like "state" and "ready") are flagged by the library to always bypass the "cache".
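The caching idea sketched in the message (answer from a local config cache when possible, but always go to the server for "state" and "ready") might look something like the following. This is a hypothetical Python model, not libtmcc: the class, its constructor, and the rule that calls with arguments skip the cache are all assumptions, since the message says the details had not been worked out.

```python
from typing import Callable, Dict, List, Optional

# Commands the message says must always bypass the cache.
ALWAYS_LIVE = {"state", "ready"}


class CachedTmcc:
    """Hypothetical cache-in-front-of-server wrapper."""

    def __init__(self, query_server: Callable[[str, str], List[str]],
                 cache: Optional[Dict[str, List[str]]] = None) -> None:
        self.query_server = query_server  # stand-in for a real tmcd request
        self.cache = dict(cache or {})    # command name -> result lines

    def tmcc(self, command: str, args: str = "") -> List[str]:
        cacheable = command not in ALWAYS_LIVE and not args
        if cacheable and command in self.cache:
            return list(self.cache[command])
        return self.query_server(command, args)


if __name__ == "__main__":
    def fake_server(command: str, args: str) -> List[str]:
        return ["live answer for " + command]

    client = CachedTmcc(fake_server, {"status": ["cached line"],
                                      "state": ["stale line"]})
    print(client.tmcc("status"))  # served from the cache
    print(client.tmcc("state"))   # bypasses the cache by design
```

Keeping the bypass list inside the library, rather than in each caller, matches the message's point that how the cache is filled matters less than having one place that decides when to use it.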
-
Robert Ricci authored
switches have some really low MAC timeouts!
-
Robert Ricci authored
many links to slip by unnoticed.
-
Leigh B. Stoller authored
change.
-
Mac Newbold authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
the DB, when doing main install. Also updated sql/database-fill.sql while I was at it.
-
- 03 Oct, 2003 5 commits
-
-
Robert Ricci authored
-
Robert Ricci authored
are doing their own firewalling can leave them open.
-
Mike Hibler authored
program, it only took a couple of hours. Heavily tested: it didn't core dump examining my 20GB FAT32 partition, ship it! Actually, I did imagezip/imageunzip a FAT12 DOS floppy. Since imagezip files are a minimum of 1MB (the chunk size), it is probably not practical for saving 1.4MB floppies :-) Also, updated the man page.
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-