- 02 Oct, 2003 1 commit
-
-
Leigh B. Stoller authored
someday, but not today!
-
- 30 Sep, 2003 1 commit
-
-
Leigh B. Stoller authored
awaiting word from Kirk.
-
- 24 Sep, 2003 1 commit
-
-
Leigh B. Stoller authored
trying to bring them back from the dead periodically by trying to instantiate a vserver/vnode on them, and then tearing it down. If we can do that, then the node is usable, and it gets moved back into the normal holding experiment so that ptopgen will add it to ptop files. This daemon is not turned on yet; waiting for other little bits and pieces to be done. There is an equivalent change in os_setup that moves physnodes into hwdown when a setup on a vnode fails. Lbs
-
- 22 Sep, 2003 1 commit
-
-
Leigh B. Stoller authored
fail in "plabnode alloc" or in the remote vnodesetup call. In the former case, we do not want to "plabnode free" it later. In the latter case, we want to "plabnode free" it right away, and make sure we do not try to remote vnode teardown or plabnode free it later. In either case, os_setup needs to check so that it does not bother waiting for the node, since that is wasted time. I use an alternate dead state for this, but the real solution is to move much of the vnode-specific code from os_setup to vnode_setup. Note that this stuff is mostly untested since I need nodes to fail! The normal path works fine though.
-
- 19 Sep, 2003 2 commits
-
-
Robert Ricci authored
-
Robert Ricci authored
-
- 18 Sep, 2003 1 commit
-
-
Leigh B. Stoller authored
bit based on type (plab nodes allowed to fail). The default in the DB is nofail for all nodes.
-
- 16 Sep, 2003 3 commits
-
-
Robert Ricci authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
- 02 Sep, 2003 1 commit
-
-
Leigh B. Stoller authored
or wait for them. I'll add this later.
-
- 22 Aug, 2003 1 commit
-
-
Austin Clements authored
that jails do. sshtb can now handle vnodeids, and must be given a vnodeid if it's a Plab node, because the username will differ depending on this.
-
- 25 Jul, 2003 1 commit
-
-
Leigh B. Stoller authored
ultimate solution to this whole problem is to change the experiment state so that we can distinguish between real swapout and swapout caused by swapmod/retry/error. Or, we need to add more intermediate allocstates for the nodes. Not sure yet.
-
- 18 Jul, 2003 2 commits
-
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
setup/teardown mistakes. Phys nodes that fail to boot should not have their vnodes set up or torn down; doing so results in pointless errors and email.
-
- 17 Jul, 2003 1 commit
-
-
Leigh B. Stoller authored
freed up. Not strictly necessary, but a good idea. In any event, if any vnodes fail to come up, do not retry. In most cases, it's going to fail again, so do not bother.
-
- 15 Jul, 2003 2 commits
-
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
does not yet work with remote virtual nodes; that will take even more work). Added a new allocstate called RES_TEARDOWN. assign_wrapper no longer deallocates unused nodes, but rather moves them into the new state for the wrapper (tbswap) to deal with. That is because deleted vnodes need to be torn down, since it is possible that the node on which they were living will not be deallocated (say, if there are other vnodes on it). We do not want to be doing that from assign_wrapper, so tbswap looks for those nodes. Made vnode_setup allocstate aware in the same way that os_setup is; do not reboot vnodes or try to set up vnodes when they are already in the RES_READY state, as they will be when doing a swapmod. In addition, if os_setup is going to reboot the underlying physnode, move the vnodes on that node into RES_READY too, since they will then set up automatically. Might need an interim state here, for correctness.
-
- 08 Jul, 2003 2 commits
-
-
Leigh B. Stoller authored
reason to override, or duplicate. Init the allocstate on virt nodes to RES_READY when they boot, to be complete and to avoid warnings elsewhere. Remove some commented-out code.
-
Leigh B. Stoller authored
to determine how long to wait, rather than (vnode * 30), which results in a REALLY long wait when "someone" tries a 1000 node experiment!
-
- 04 Jul, 2003 1 commit
-
-
Leigh B. Stoller authored
swapin this morning was well over the top.
-
- 25 Jun, 2003 1 commit
-
-
Leigh B. Stoller authored
120 + (30 * number_of_vnodes).
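A minimal sketch of the expression above in Python, assuming it is a wait time for vnodes to come up and that the unit is seconds (neither is stated in the message). The function and constant names are hypothetical and are not taken from os_setup.

```python
# Hypothetical names; only the numbers 120 and 30 come from the commit message.
BASE_WAIT = 120        # fixed part of the wait (unit assumed to be seconds)
PER_VNODE_WAIT = 30    # extra wait added per virtual node (same assumed unit)


def vnode_wait(number_of_vnodes: int) -> int:
    """Return 120 + (30 * number_of_vnodes)."""
    if number_of_vnodes < 0:
        raise ValueError("number_of_vnodes must be non-negative")
    return BASE_WAIT + PER_VNODE_WAIT * number_of_vnodes
```

Because the wait grows linearly with the node count, a 1000-vnode experiment gives 120 + 30000 = 30120 under this formula.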
-
- 14 Apr, 2003 2 commits
-
-
Chad Barb authored
Fixed error message. "There were 0 nodes." --> "There were 0 failed nodes."
-
Leigh B. Stoller authored
whatever assign_wrapper did. This is different than local nodes where we allocate the underlying phys node and set its osid. Confusing.
-
- 07 Apr, 2003 1 commit
-
-
Chad Barb authored
Modify os_setup return codes to enable "intelligent" retry. Now os_setup returns:
   0 on success
   1 on one or more retry-friendly errors
  -1 on no-retry errors
tbswap.in checks os_setup's return code, and will only retry on 1.
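The retry rule can be sketched as follows. This is an illustration in Python, not the code in tbswap.in; the function name, the constants, and the max_attempts limit are assumptions made for the example. Only the three return values and the "retry only on 1" rule come from the message.

```python
from typing import Callable

# Return codes as described in the commit message.
OS_SETUP_OK = 0       # success
OS_SETUP_RETRY = 1    # one or more retry-friendly errors
OS_SETUP_FATAL = -1   # no-retry errors


def swap_in(run_os_setup: Callable[[], int], max_attempts: int = 2) -> bool:
    """Call run_os_setup until it succeeds, retrying only on code 1.

    run_os_setup is a hypothetical stand-in for invoking os_setup;
    max_attempts is an assumed limit, not taken from the source.
    """
    for _ in range(max_attempts):
        code = run_os_setup()
        if code == OS_SETUP_OK:
            return True
        if code != OS_SETUP_RETRY:
            # -1 (or anything unexpected): retrying is pointless.
            return False
    return False  # ran out of attempts while still getting retryable errors
```

If os_setup is run as a separate process, note that an exit status of -1 is seen by the parent as 255 on POSIX systems, so a real caller would have to account for that.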
-
- 18 Mar, 2003 1 commit
-
-
Chad Barb authored
Here it is; reswap.
  nfree - modified to put node in FREE_DIRTY when it is freed
  assign_wrapper - '-u' update switch added.
  os_setup - doesn't reboot node which is already in RES_READY
  tbswap - calls all this stuff appropriately
-
- 17 Mar, 2003 1 commit
-
-
Leigh B. Stoller authored
to a specific one, for the purposes of mapping things like FBSD-STD to FBSD47-STD (the current OSID to use). This is technically more correct than what os_setup used to do, which was map FBSD-STD to whatever FreeBSD OSID was currently on the disk. Now it maps to a specific one, and if that is not loaded, it sets up a reload.
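A small sketch of the mapping described above, in Python. The table contents beyond the FBSD-STD example, the function name, and the representation of "what is loaded on the disk" as a set are assumptions for illustration; the real lookup is presumably done against the database.

```python
from typing import Iterable, Tuple

# Generic OSID -> specific OSID. Only this one pair is given in the message.
GENERIC_TO_SPECIFIC = {"FBSD-STD": "FBSD47-STD"}


def resolve_osid(requested: str, loaded_on_disk: Iterable[str]) -> Tuple[str, bool]:
    """Map a generic OSID to a specific one and say whether a reload is needed.

    Returns (specific_osid, needs_reload). An OSID with no mapping entry is
    treated as already specific.
    """
    specific = GENERIC_TO_SPECIFIC.get(requested, requested)
    needs_reload = specific not in set(loaded_on_disk)
    return specific, needs_reload
```

The point of the change is visible in the second return value: the generic name always resolves to the same specific OSID, and a disk holding some other FreeBSD image triggers a reload instead of being accepted as-is.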
-
- 31 Jan, 2003 2 commits
-
-
Robert Ricci authored
-
Robert Ricci authored
-
- 29 Jan, 2003 1 commit
-
-
Robert Ricci authored
no longer waits that long.
-
- 07 Jan, 2003 1 commit
-
-
Leigh B. Stoller authored
real nodes get. Also, run a proper os_select on jailed nodes, *after* the OS for the physical node is set up, since otherwise stated will not be happy.

Fixes for dealing with failed os_load. Previously, if os_load failed, os_setup would wait for those nodes anyway since it had no idea what nodes had failed (and we do not want to just quit from os_setup since that might cause a lot of extra power cycles). Now, for each node that got an os_load, check its eventstate; it should be in ISUP immediately after os_load exits (since that's what os_load waited for), and if it's not, then mark that node as failed. Note though that failed loads no longer result in the node going into hwdown, since 99 percent of the time it's a busted user image, not a hardware problem. I figure we will catch real hw errors via the reload daemon, when it sends email about nodes not finishing.

Do not bother with doing the vnode setup if any of the phys nodes failed to set up. Leads to cascading errors and prolongs the agony by another few minutes. Might revisit this later.

Remove local WaitTillAlive() function, and switch to using the version I put into libdb a couple of weeks ago. Fix up a bunch of print statements to be nicer.
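The failed-load check described above can be sketched like this. It is a Python illustration, not the os_setup code: the function names and the use of a plain dict for eventstates (which would come from the database) are assumptions.

```python
from typing import Dict, Iterable, List


def failed_loads(loaded_nodes: Iterable[str], eventstate: Dict[str, str]) -> List[str]:
    """Return the nodes that got an os_load but are not in ISUP afterwards.

    eventstate is a hypothetical node -> state mapping standing in for the
    database lookup. A node with no recorded state counts as failed.
    """
    return [node for node in loaded_nodes if eventstate.get(node) != "ISUP"]


def should_setup_vnodes(failed_physnodes: List[str]) -> bool:
    """Skip vnode setup entirely if any physical node failed to set up."""
    return not failed_physnodes
```

Checking the state right after os_load exits works because os_load has already waited for ISUP, so any node in another state at that moment did not finish loading.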
-
- 31 Oct, 2002 1 commit
-
-
Leigh B. Stoller authored
-
- 18 Oct, 2002 1 commit
-
-
Mac Newbold authored
Changes to watch out for:
  - db calls that change boot info in nodes table are now calls to os_select
  - whenever you want to change a node's pxe boot info, or def or next boot osids or paths, use os_select.
  - when you need to wait for a node to reach some point in the boot process (like ISUP), check the state in the database using the lib calls
  - Proxydhcp now sends a BOOTING state for each node that it talks to.
  - OSs that don't send ISUP will have one generated for them by stated either when they ping (if they support ping) or immediately after they get to BOOTING.
  - States now have timeouts. Actions aren't currently carried out, but they will be soon. If you notice problems here, let me know... we're still tuning it. (Before all timeouts were set to "none" in the db)

One temporary change:
  - While I make our new free node manager daemon (freed), all nodes are forced into reloading when they're nfreed and the calls to reset the os are disabled (that will move into freed).
-
- 26 Sep, 2002 1 commit
-
-
Mac Newbold authored
Fix small problem that was causing a failure in the test suite: If there are no nodes in the expt, don't die(), just exit().
-
- 05 Aug, 2002 1 commit
-
-
Leigh B. Stoller authored
-
- 07 Jul, 2002 1 commit
-
-
Leigh B. Stoller authored
-
- 03 Jul, 2002 1 commit
-
-
Robert Ricci authored
-
- 02 Jun, 2002 1 commit
-
-
Leigh B. Stoller authored
state to REBOOTING, and then wait for the ISUP state to be set. This change is reflected in the client-side startup scripts on remote nodes, which now issue a REBOOTED event, and then an ISUP event after everything is set up properly.
-
- 13 May, 2002 1 commit
-
-
Leigh B. Stoller authored
-
- 10 May, 2002 1 commit
-
-
Robert Ricci authored
too much CPU
-