- 08 Mar, 2004 3 commits
-
-
Leigh B. Stoller authored
splitting the existing code between a frontend script that parses arguments and does taint checking, and a backend library where all the work is done (including permission checks). The interface to the libraries is simple right now (I didn't want to spend a lot of time designing an interface without knowing whether the approach would work long term).

    use libreboot;
    use libosload;

    nodereboot(\%reboot_args, \%reboot_results);
    osload(\%reload_args, \%reload_results);

Arguments are passed to the libraries in the form of a hash. For example, in os_setup:

    $reload_args{'debug'}     = $dbg;
    $reload_args{'asyncmode'} = 1;
    $reload_args{'imageid'}   = $imageid;
    $reload_args{'nodelist'}  = [ @nodelist ];

Results are passed back both as a return code (-1 means total failure right away, while a positive value indicates the number of nodes that failed), and in the results hash, which gives the status for each individual node. At the moment it is just success or failure (0 or 1), but in the future it might be something more meaningful.

os_setup can now find out about individual failures, both in reboot and reload, and alter how it operates afterwards. The main thing is to not wait for nodes that fail to reboot/reload, and to terminate with no retry when this happens, since at the moment it indicates an unusual failure, and it is better to terminate early. In the past an os_load failure would result in a tbswap retry, and another failure (multiple times). I have already tested this by trying to load images that have no file on disk; it is nice to see those failures caught early and the experiment fail much more quickly!

A note about "asyncmode" above. In order to promote parallelism in os_setup, asyncmode tells the library to fork off a child and return immediately. Later, os_setup can block and wait for status by calling back into the library:

    my $foo = nodereboot(\%reboot_args, \%reboot_results);
    nodereboot_wait($foo);

If you are wondering how the child reports individual node status back to the parent (so it can fill in the results hash), Perl really is a kitchen sink. I create a pipe with Perl's pipe function and then fork a child to do the work; the child writes the results to the pipe (status for each node), and the parent reads that back later when nodereboot_wait() is called, moving the results into the %reboot_results hash. The parent meanwhile can go on and, in the case of os_setup, make more calls to reboot/reload other nodes, later calling the wait() routines once all have been initiated.

Also worth noting: in order to make the libraries "reentrant" I had to do some cleaning up and reorganizing of the code. Nothing too major though, just removal of lots of global variables. I also did some mild unrelated cleanup of code that had been run over once too many times with a tank.

So how did this work out? Well, for os_setup/os_load it works rather nicely! node_reboot is another story. I probably should have left it alone, but since I had already climbed the curve on osload, I decided to go ahead and do reboot. The problem is that node_reboot needs to run as root (it's a setuid script), which means it can only be used as a library from something that is already setuid. os_setup and os_load run as the user. However, having a consistent library interface and the ability to cleanly figure out which individual nodes failed is a very nice thing.

So I came up with a suitable approach that is hidden in the library. When the library is entered without proper privs, it silently execs an instance of node_reboot (the setuid script), and then uses the same trick mentioned above to read back individual node status. I create the pipe in the parent before the exec, and clear the close-on-exec flag so the descriptor survives the exec. I pass the fileno along in an environment variable, and the library uses that to write the results, just like above. The result is that os_setup sees the same interface for both os_load and node_reboot, without having to worry that one or the other needs to be run setuid.
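The pipe-and-fork reporting scheme described above can be sketched roughly as follows. This is a minimal illustration, not the actual libreboot/libosload code: start_work(), wait_work(), and do_node() are hypothetical names, the one-line-per-node "node status" wire format is an assumption, and the per-node work is faked so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical stand-in for the real per-node work: any node whose name
# starts with "bad" fails. Returns 0 for success, 1 for failure.
sub do_node {
    my ($node) = @_;
    return ($node =~ /^bad/) ? 1 : 0;
}

# Read the child's per-node status lines from the pipe, fill in the
# caller's results hash, reap the child, and return the failure count.
sub wait_work {
    my ($handle) = @_;
    my $reader = $handle->{reader};
    my $failed = 0;
    while (my $line = <$reader>) {
        chomp $line;
        my ($node, $status) = split ' ', $line;
        $handle->{results}{$node} = $status;
        $failed++ if $status;
    }
    close($reader);
    waitpid($handle->{pid}, 0);
    return $failed;
}

# Fork a child to work on $args->{nodelist}. Returns -1 on immediate
# failure. In asyncmode the parent gets a handle back right away;
# otherwise it blocks and gets the number of failed nodes.
sub start_work {
    my ($args, $results) = @_;
    my @nodes = @{ $args->{nodelist} || [] };
    return -1 unless @nodes;

    pipe(my $reader, my $writer) or return -1;
    my $pid = fork();
    return -1 unless defined $pid;

    if ($pid == 0) {
        # Child: write one "node status" line per node, then leave
        # without running the parent's END blocks or flushing its buffers.
        close($reader);
        print {$writer} "$_ " . do_node($_) . "\n" for @nodes;
        close($writer);
        POSIX::_exit(0);
    }

    # Parent: keep only the read end of the pipe.
    close($writer);
    my $handle = { pid => $pid, reader => $reader, results => $results };
    return $args->{asyncmode} ? $handle : wait_work($handle);
}

my (%args, %results);
$args{asyncmode} = 1;
$args{nodelist}  = [ qw(pc1 badpc2 pc3) ];

my $handle = start_work(\%args, \%results);
die "could not start work\n" unless ref $handle;

# The parent is free to start more work here before blocking.
my $failed = wait_work($handle);
print "$failed node(s) failed\n";
print "$_: $results{$_}\n" for sort keys %results;
```

The child's only channel back to the parent is the pipe, so the parent pays nothing until it chooses to call the wait routine; several such handles could be started first and drained afterwards.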
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
- 05 Mar, 2004 6 commits
-
-
Mike Hibler authored
-
Mike Hibler authored
-
Mike Hibler authored
-
Mike Hibler authored
the tmcd stuff...
-
Mike Hibler authored
-
Mike Hibler authored
"client-install"
-
- 04 Mar, 2004 8 commits
-
-
Shashi Guruprasad authored
comment that LANs don't have queues.
-
Robert Ricci authored
-
Shashi Guruprasad authored
-
Shashi Guruprasad authored
e.g. 50s instead of 50 slots. Expanded the abbreviation because one of the users was confused and guessed 50s to be 50 seconds * bandwidth.
-
Kirk Webb authored
operations for plabrenewd.
-
Robert Ricci authored
not n.node_id.
-
Kirk Webb authored
* Added 'leaseend' column to plab_slices table.
-
Robert Ricci authored
-
- 03 Mar, 2004 2 commits
-
-
Kirk Webb authored
* implemented PLC slice renewal
* restructured daemon code/startup
  - removed getfree daemon (replaced by plabdiscover; run from cron)
  - moved generic daemonizing code into libtestbed (class)
  - created plabrenewd - small script that utilizes daemonizing class
  - removed plabdaemon file.
  - updated bossnode startup scripts
* changed slice prefix - PLC denies permission w/ anything other than "utah"
* Minor semantic changes to module API to be more consistent with other parts.
* Some bug fixes.
-
Robert Ricci authored
it out of a file in /usr/testbed/etc. We put it in a separate file from the rest of the certificate, because we need the fingerprint to be publicly readable.
-
- 02 Mar, 2004 5 commits
-
-
Kirk Webb authored
* removed unused and not generally useful ping checking
* reorganized node discovery and added node info updating
  - e.g., update IP, SITE, or HOSTNAME when they have changed
  - no longer part of the backend module as this is independent of which backend is used; may modularize it due to plab's new "trumpet" service, which is basically its node DB available via a decentralized transport/API.
* introduced new method of getting node info - use plab sites.xml file
* various other cleanups.
-
Robert Ricci authored
putting type information into the database by hand.
-
Robert Ricci authored
-
Robert Ricci authored
instead of only 10 characters.
-
Robert Ricci authored
planning to change the size of node_ids.
-
- 01 Mar, 2004 2 commits
-
-
Jonathon Duerig authored
-
Mike Hibler authored
-
- 27 Feb, 2004 3 commits
-
-
Robert Ricci authored
-
Robert Ricci authored
given OSID, and include this view on the image and osinfo pages.
-
Robert Ricci authored
since some people really do have very short names.
-
- 26 Feb, 2004 11 commits
-
-
Robert Ricci authored
-
Robert Ricci authored
ones. Provide defaults for many fields, when creating a new type. Re-order a few fields to make a little more sense. Add a javascript function to build a value for control_iface based on what the user puts into control_net.
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Kirk Webb authored
-
Leigh B. Stoller authored
-
Kirk Webb authored
-
Kirk Webb authored
remove debugging flag from libdb.py
-
Kirk Webb authored
-
Leigh B. Stoller authored
node_types table, especially when setting up new testbeds. Currently, linked off the node summary page, when in admin mode you get the edit link instead of the show link.
-
Robert Ricci authored
-