- 20 Oct, 2003 1 commit
-
-
Leigh B. Stoller authored
-
- 13 Oct, 2003 1 commit
-
-
Mac Newbold authored
-
- 10 Oct, 2003 1 commit
-
-
Mac Newbold authored
model of waiting for state changes. Before, we were watching the database (which meant we could only watch for terminal/stable/long-lived states, and had to poll the db). Now things that are waiting for states to change become event listeners, watch the stream of events flow by, and don't have to do any polling. They can now watch for any state, and even sequences of states (e.g. a Shutdown followed by an Isup).

To do this, there is now a cool StateWait.pm library that encapsulates the functionality needed. To use it, you call initStateWait before you start the chain of events (i.e. before you call node reboot). Then do your stuff, and call waitForState() when you're ready to wait. It can be told to return periodically with the results so far, and you can cancel waiting for things. An example program called waitForState is in testbed/event/stated/, and can also be used nicely as a command line tool that wraps up the library functionality.

This also required the introduction of a TBFAILED event that can be sent when a node isn't going to make it to the state that someone may be waiting for. I.e. if it gets wedged coming up, and stated retries but eventually gives up on it, it sends this to let things know that the node is hosed and won't ever come up.

Another part of this is that node_reboot moves (back) to the fully-event-driven model, where users call node reboot, and it does some checks and sends some events. Then stated calls node_reboot in "real mode" to actually do the work, and handles doing the appropriate retries until the node either comes up or is deemed "failed" and stated gives up on it. This means stated is also the gatekeeper of when you can and cannot reboot a node. (See mail archives for extensive discussions of the details.)

A big part of the motivation for this was to get uninformed timeouts and retries out of os_load/os_setup and put them in stated, where we can make a wiser choice. So os_load and os_setup now use this new stuff and don't have to worry about timing out on nodes and rebooting. Stated makes sure that they either come up, get retried, or fail to boot. tbrestart also underwent a similar change.
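The listener model described in this message can be sketched roughly as follows. This is a minimal Python illustration, not the StateWait.pm interface: the class name, method names, and the uppercase state strings are assumptions made for the example. Only the ideas come from the message: register what you are waiting for before triggering the events, match a per-node sequence of states, and treat TBFAILED as "this node will never get there".

```python
from typing import Dict, Iterable, List, Tuple

TBFAILED = "TBFAILED"  # event name taken from the commit message


class StateWaiter:
    """Hypothetical stand-in for the listener described above."""

    def __init__(self, expected: Dict[str, List[str]]):
        # Register interest up front, before the reboot is triggered,
        # so that no event in the stream can be missed.
        self.pending: Dict[str, List[str]] = {}
        self.done: List[str] = []
        self.failed: List[str] = []
        for node, states in expected.items():
            if states:
                self.pending[node] = list(states)
            else:
                self.done.append(node)  # nothing to wait for

    def handle_event(self, node: str, state: str) -> None:
        remaining = self.pending.get(node)
        if remaining is None:
            return  # not a node we are waiting on
        if state == TBFAILED:
            # The node will never reach the awaited state; stop waiting.
            del self.pending[node]
            self.failed.append(node)
        elif state == remaining[0]:
            remaining.pop(0)  # next state in the sequence was seen
            if not remaining:
                del self.pending[node]
                self.done.append(node)

    def wait(self, events: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
        # Consume the event stream until every node has finished or failed.
        for node, state in events:
            self.handle_event(node, state)
            if not self.pending:
                break
        return self.done, self.failed
```

A caller would build the waiter first, then kick off the reboots, and only then consume events, mirroring the initStateWait / waitForState() ordering the message asks for.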
-
- 25 Sep, 2003 1 commit
-
-
Robert Ricci authored
does the daemon.
-
- 17 Sep, 2003 1 commit
-
-
Leigh B. Stoller authored
errors.
-
- 23 Jul, 2003 1 commit
-
-
Robert Ricci authored
In some cases (such as nodes that are not 'fully' in the testbed yet), this has to be done by an external program.
-
- 14 May, 2003 1 commit
-
-
Robert Ricci authored
per-OS. Also, moved the functionality to check for this into libdb, so we can call it from other places (like the batch daemon).
-
- 26 Mar, 2003 1 commit
-
-
Leigh B. Stoller authored
only pid, to pid/gid like most other things in the testbed. Also add a "global" slot to denote images that are globally available to all projects (system images). The older "shared" attribute is now used to denote images that are shared within a project (available to all subgroups in the project). The migration path for existing DBs is given in the migrate file. Be sure to run those commands on an existing testbed or things will break!

www/newimageid, www/newimageid_ez: A bunch of changes for shared/global attributes. Added a group menu to the form so users can create images in subgroups. Beefed up the Java code that constructs the path name to use the gid, shared, and global attributes of the form to give the user the best possible path that we can. Improved the pathname checking code so that we do not allow just any old path in case the user elects to disregard the path we carefully constructed for them. Also check the proj/group membership, and set up defaults for users that have permission in just one pid/gid to create images.

libdb.in: Changed permission check in TBImageIDAccessCheck() to reflect shared/global attribute changes.

os_load: Get rid of test that checked path of the image. The path checking is done in the web interface anyway, so why duplicate it in 4 places? Other minor changes reflecting shared->global name change. Also note that images can come from the group directory now.

create_image: Get rid of test that checked path of the image. The path checking is done in the web interface anyway, so why duplicate it in 4 places? Also note that images can come from the group directory now.

www/dbdefs: Changed permission check in TBImageIDAccessCheck() to reflect shared/global attribute changes.

www/showimageid_list, www/showstuff: Minor global/shared attribute changes.

www/menu: Change osids/imageids pointer to point to the image list, not the osid list. This is more reasonable for mere users who have access to the EZ form, and thus never really need to concern themselves with osids.

www/editimageid: Add proper pathname checking. There were no checks at all before!
-
- 25 Mar, 2003 1 commit
-
-
Mac Newbold authored
-
- 19 Feb, 2003 1 commit
-
-
Robert Ricci authored
loaded on at once. The idea is to use this for things like rtLinux that require a per-machine license. This mechanism is circumventable (for example, by simply making a second image), so it's mainly to prevent experimenters from accidentally violating the license.
-
- 29 Jan, 2003 1 commit
-
-
Robert Ricci authored
reloading, not come all the way back up. Also, always sets current_reloads right before calling os_select, in case it got cleared out in the meantime.
-
- 13 Jan, 2003 1 commit
-
-
Leigh B. Stoller authored
-
- 07 Jan, 2003 1 commit
-
-
Leigh B. Stoller authored
calculation based on the size of the image file. Okay, to avoid all you folks from going to see what bit of dreck I came up with, here it is:

    my $sb = stat($imagepath);
    my $chunks = $sb->size / (1024 * 1024);
    $maxwait = int((($chunks / 100.0) * 25) + (4 * 60));

Note the replacement of one hardwired number (15) with several dozen new ones! I like it anyway, 'cause I hate waiting 2*15 minutes when a 60 second load fails.
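To make the arithmetic in this message concrete, here is a small Python transliteration of the quoted Perl formula. The function name is made up for the example. The message does not state the unit of `$maxwait`; the `4 * 60` term suggests seconds, but that is an inference.

```python
def max_wait(image_size_bytes: int) -> int:
    """Transliteration of the quoted formula (function name is hypothetical)."""
    chunks = image_size_bytes / (1024 * 1024)      # image size in 1 MB chunks
    return int((chunks / 100.0) * 25 + (4 * 60))   # 25 per 100 chunks, plus a base of 240


# A 500 MB image: (500 / 100) * 25 + 240 = 365
print(max_wait(500 * 1024 * 1024))
```

Under this formula the wait grows linearly with image size and never drops below the fixed base of 240, which is what replaces the old hardwired 15.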
-
- 11 Dec, 2002 1 commit
-
-
Leigh B. Stoller authored
-
- 12 Nov, 2002 1 commit
-
-
Mac Newbold authored
-
- 01 Nov, 2002 1 commit
-
-
Mac Newbold authored
-
- 18 Oct, 2002 1 commit
-
-
Mac Newbold authored
Changes to watch out for:
  - db calls that change boot info in nodes table are now calls to os_select
  - whenever you want to change a node's pxe boot info, or def or next boot osids or paths, use os_select.
  - when you need to wait for a node to reach some point in the boot process (like ISUP), check the state in the database using the lib calls
  - Proxydhcp now sends a BOOTING state for each node that it talks to.
  - OSs that don't send ISUP will have one generated for them by stated either when they ping (if they support ping) or immediately after they get to BOOTING.
  - States now have timeouts. Actions aren't currently carried out, but they will be soon. If you notice problems here, let me know... we're still tuning it. (Before all timeouts were set to "none" in the db)

One temporary change:
  - While I make our new free node manager daemon (freed), all nodes are forced into reloading when they're nfreed and the calls to reset the os are disabled (that will move into freed).
-
- 04 Oct, 2002 1 commit
-
-
Mac Newbold authored
Small changes to image access permissions checks. Root can get any image it wants, and frisbeelauncher only requires READINFO permissions, so that users can still os_load shared images. Also, have os_load pass its debug flag to frisbeelauncher if set.
-
- 17 Sep, 2002 1 commit
-
-
Leigh B. Stoller authored
obscure errors since we allow the path to be set to null. Setting the path to null via the web page is probably not a good idea, but I just did that by accident and noticed what happened ...
-
- 07 Jul, 2002 1 commit
-
-
Leigh B. Stoller authored
-
- 24 Jun, 2002 1 commit
-
-
Leigh B. Stoller authored
when loading the default image.
-
- 19 Jun, 2002 1 commit
-
-
Leigh B. Stoller authored
-
- 06 Jun, 2002 1 commit
-
-
Leigh B. Stoller authored
-
- 22 Apr, 2002 1 commit
-
-
Leigh B. Stoller authored
-
- 28 Mar, 2002 1 commit
-
-
Leigh B. Stoller authored
-
- 12 Feb, 2002 1 commit
-
-
Leigh B. Stoller authored
invoked from os_setup, which runs as the user.
-
- 08 Feb, 2002 1 commit
-
-
Leigh B. Stoller authored
supporting autocreating and autoloading images. The imageid form now sports a field to specify a nodeid to create the image from; if set, the backend create_image script is invoked. That's the easy part.

Slightly harder is autoloading images based on the osid specified in the NS file. To support this, I have added a new DB table called osidtoimageid, which holds the mapping from osid/pctype to imageid. When users create images, they must specify what node types that image is good for. Obviously, the mappings have to be unique or it would be impossible to figure it out! Anyway, once that image mapping is in place and the image created, the user can specify that ID in the NS file. I've changed os_setup to look for IDs that are not loaded, and to try and find one in the osidtoimageid table. If found, it invokes os_load. To keep things running in parallel as much as possible, os_setup issues all the loads/reboots (could be more than a single set of loads if multiple IDs are in the NS file) at once, and waits for all the children to exit. I've hacked up os_load a bit to try and be more robust in the face of PXE failures, which still happen and are rather troublesome. Need an event system!

Contained in this revision are unrelated changes to make the OS and Image IDs per-project unique instead of globally unique, since that's a pain for the users. This turns out to be very messy, since underneath we do not want to pass around pid/ID in all the various places it's used. Rather, I create a globally unique name and extended the OS and Image tables to include pid/name/ID. The user selects pid/name, and I create the globally unique ID. For the most part this is invisible throughout the system, except where we interface with the user, say in the web pages; the user should see his chosen name where possible, and should invoke scripts (os_load, create_image, etc.) using his/her name, not the internal ID. Also, in the front end the NS file should use the user name, not the ID. All in all, this accounted for a number of annoying changes and some special cases that are unavoidable.
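The osid/pctype lookup described in this message might look roughly like the sketch below. It is Python rather than the Perl that os_setup is written in, and the `Node` record, the `plan_loads` function, and the sample IDs are all hypothetical. Only the mapping key (osid plus pctype) and the idea of batching loads so they can be issued at once come from the message.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set, Tuple


class Node(NamedTuple):
    node_id: str
    pctype: str
    wanted_osid: str        # osid requested in the NS file
    loaded_osids: Set[str]  # osids already present on the node's disk


def plan_loads(
    nodes: List[Node],
    osid_to_imageid: Dict[Tuple[str, str], str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group nodes that need a reload by the image to load.

    Returns (loads, unmapped): loads maps imageid -> node ids, so each
    group can be handed to one load invocation and all groups started
    together; unmapped lists nodes with no osid/pctype mapping.
    """
    loads: Dict[str, List[str]] = defaultdict(list)
    unmapped: List[str] = []
    for node in nodes:
        if node.wanted_osid in node.loaded_osids:
            continue  # already on disk, no load needed
        imageid = osid_to_imageid.get((node.wanted_osid, node.pctype))
        if imageid is None:
            unmapped.append(node.node_id)
        else:
            loads[imageid].append(node.node_id)
    return dict(loads), unmapped
```

Because the key is the pair (osid, pctype), the uniqueness requirement mentioned in the message is what guarantees the lookup yields at most one image per node.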
-
- 30 Jan, 2002 1 commit
-
-
Leigh B. Stoller authored
-
- 17 Jan, 2002 1 commit
-
-
Leigh B. Stoller authored
image is bigger than any previous image we have dealt with! Probably need to make this dynamic in some way.
-
- 14 Jan, 2002 1 commit
-
-
Leigh B. Stoller authored
* Add appropriate goo to os/GNUMakefile so that Frisbee daemon is built and installed.
* Rework the frisbee launcher slightly. Aside from little changes (send email to tbops when frisbeed dies, new cmdline syntax to frisbeed), allow for frisbeed to exit gracefully after a period of inactivity (no client requests for 30 minutes, at present). In order to prevent a race condition with a new client being added (and rebooted) and frisbeed terminating before the client gets started, add a load_busy indicator to the images table (next to load_address slot) and set that to one each time frisbeelauncher is invoked. When frisbeed exits, test and clear that bit atomically (lock tables) and go around another time (restart frisbeed for another 30 minute period).
* Rework waitmode in os_load. Wait for all of the nodes to finish at once, and track which nodes never finish. Retry those nodes again by rebooting. The number of retries is configurable in the script, and is currently set to one. This should take care of some PXE boot related problems, although obviously not all.
* Got rid of -w option to os_load and made waitmode the default. The -s option can be used to start a reload, but not to wait for it to complete.
* Minor changes to sched_reload and reload_daemon; pass in -s option to os_load.
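The load_busy handshake in the second item can be illustrated with a small in-memory sketch. This is Python with a `threading.Lock` standing in for the database table lock; the class and function names are invented for the example, and it does not model how the flag is treated on the very first launch.

```python
import threading
from typing import Callable


class ImageRow:
    """Hypothetical stand-in for one row of the images table."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # plays the role of "lock tables"
        self.load_busy = 0

    def mark_busy(self) -> None:
        # Done whenever a new client is set up for this image.
        with self._lock:
            self.load_busy = 1

    def test_and_clear(self) -> bool:
        # Read and reset the flag in one critical section, so a client
        # arriving in between cannot be lost.
        with self._lock:
            was_busy = bool(self.load_busy)
            self.load_busy = 0
            return was_busy


def serve_until_idle(row: ImageRow, run_server_period: Callable[[], None]) -> int:
    """Restart the server for another period while clients keep arriving.

    run_server_period is assumed to return once the server has exited
    after its inactivity timeout. Returns the number of periods run.
    """
    periods = 0
    while True:
        run_server_period()
        periods += 1
        if not row.test_and_clear():
            return periods  # nobody asked for the image; safe to stop
```

The point of doing the test and the clear under one lock is that a client registering just as the server exits either sets the flag before the check (causing a restart) or after the clear (and is seen on the next pass).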
-
- 06 Nov, 2001 1 commit
-
-
Robert Ricci authored
a '-n' option to use netdisk, and will respond properly to changing TB_DEFAULT_RELOADTYPE in libdb. os_load also got some fixes for the -w flag when used with Frisbee - it fires off all nodes at once, rather than two at once.
-
- 05 Nov, 2001 1 commit
-
-
Robert Ricci authored
TB_IMAGEID_READINFO - I was mistaken about the level of access implied by TB_IMAGEID_ACCESS.
-
- 22 Oct, 2001 1 commit
-
-
Leigh B. Stoller authored
reloads for nodes in an experiment. Change os_load to schedule a default image reload whenever a mereuser loads an image that is not the default image for that node type. Add some support stuff in libdb (TBSetSchedReload) and some constant definitions for sched_reload and for nodelog.
-
- 16 Oct, 2001 1 commit
-
-
Leigh B. Stoller authored
-
- 28 Sep, 2001 1 commit
-
-
Leigh B. Stoller authored
Usage: os_load [-s | -w] [-r] [-i <imageid>] <node> [node ...]
Usage: sched_reload [-f | -p] [-r] [-i <imageid>] <node> [node ...]

The imageid is now an optional argument. After continually forgetting what imageid to use, or just plain forgetting the argument, and having it try to load imageid pc53 on pcXX, I decided this interface was bogus. With no imageid, select the default imageid for each node provided. This is actually convenient since you can load multiple types of nodes in one shot.
-
- 18 Sep, 2001 1 commit
-
-
Robert Ricci authored
1.13, but had somehow gone unnoticed until now.
-
- 17 Sep, 2001 1 commit
-
-
Robert Ricci authored
to contain a list of reloads currently in process. It is filled by os_load, and is cleared out by the tmcd 'reset' command or by nfree. The tmcd 'loadaddr' command now uses this table instead of the reloads table. Also added Frisbee support to sched_reload, and changed the Frisbee command line option to os_load to '-r' to avoid a conflict with sched_reload's '-f' option.
-
- 04 Sep, 2001 1 commit
-
-
Robert Ricci authored
switch to try it out. The main thing missing at this point is a way to tell frisbee _which_ disk image to load - it will load whichever image there happens to be a server running for.
-
- 24 Aug, 2001 2 commits
-
-
Mac Newbold authored
Change occurrences of "@TESTMODE@" back to @TESTMODE@ like they were supposed to be in the first place...
-
Mac Newbold authored
-