- 26 Sep, 2012 3 commits
- 25 Sep, 2012 19 commits
-
-
Mike Hibler authored
Previously, a tb-set-node-failure-mode of "nonfatal" applied only to failures when rebooting a node. If there was an error during the disk reload phase, the experiment would still fail. This makes sense, as it is pretty dicey to let a node boot with an unloaded or partially-loaded disk. But there are situations, such as 500+ node experiments on PRObE, where it makes sense not to fail the experiment. If a node fails reload, we now clear the OSIDs and partition info for the node and then force it to reboot (by setting the state to TBFAILED, for which there is a REBOOT trigger in stated). This causes the node to come up and park in pxeboot in the PXEWAIT state. It should remain in this state across reboots. The user can manually os_load the machine, or do a swap modify, which will force the node to try to reload the original OS. Since this may not be for everyone, allowing non-fatal osload failures requires that the "OsloadFailNonfatal" feature be enabled. This allows the new behavior to be global, per-group, per-experiment or per-user. The default is disabled.
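The decision described above can be sketched roughly as follows. This is only an illustration in Python, not the actual Emulab code: the Node class, its fields, and handle_reload_failure are hypothetical names, and the feature lookup is reduced to a boolean argument.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a testbed node record."""
    node_id: str
    failure_mode: str = "fatal"  # value set via tb-set-node-failure-mode
    osids: list = field(default_factory=list)
    partitions: dict = field(default_factory=dict)
    state: str = "RELOADING"


def handle_reload_failure(node, feature_enabled):
    """Return True if the experiment may continue after a failed reload.

    feature_enabled stands in for the OsloadFailNonfatal feature check.
    """
    if node.failure_mode != "nonfatal" or not feature_enabled:
        # Old behavior: a reload failure fails the experiment.
        return False
    # Clear the OSIDs and partition info so the node parks in PXEWAIT.
    node.osids.clear()
    node.partitions.clear()
    # Setting TBFAILED is what would fire the REBOOT trigger in stated.
    node.state = "TBFAILED"
    return True
```

With the feature off, or with a failure mode other than "nonfatal", the sketch leaves the node record untouched and reports the failure as fatal.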
-
Mike Hibler authored
-
Leigh B Stoller authored
Snapshots are done a little differently than openvz of course, since there are potentially multiple disk partitions and a kernel. The basic operation is: 1. Fire off reboot_prepare from boss. Changes to reboot_prepare result in the guest "halting" instead of rebooting. 2. Fire off the create-image client script, which will take imagezips of all of the disks (except the swap partition), and grab a copy of the kernel. A new xm.conf file is written, and then the directory is first tar'ed and then we imagezip that bundle for upload. 3. When booting a guest, we now look for guest images that are packaged in this way, although we still support the older method for backwards compatibility. All of the disks are restored, and a new xm.conf created that points to the new kernel.
-
Leigh B Stoller authored
can mount the NFS filesystems. Only doing this for shared hosts at this time, to avoid driving up the number of exports too much. Might reconsider later.
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
in the proper reverse map.
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Leigh B Stoller authored
of reboot since otherwise the domU just reboots and keeps running.
-
Leigh B Stoller authored
they do not have to be created at swapin. This is mostly for the tutorial sessions, where we have dozens of containers launching at the same time.
-
Leigh B Stoller authored
Works a lot better!
-
Leigh B Stoller authored
-
Leigh B Stoller authored
the slothd, and remove the NFS rule that causes all traffic leaving the VM to look like it came from the physical host. Clean up the tmcc proxy startup and teardown.
-
Leigh B Stoller authored
file so that command line arguments override it, as they do for other options.
-
Leigh B Stoller authored
otherwise we just keep going and going and going.
-
Mike Hibler authored
Helps with debugging of lost state events. Also nit: fix the tmcd makefile so it would correctly build the bootinfo modules if we make in the tmcd directory before the pxe directory.
-
- 24 Sep, 2012 5 commits
-
-
Mike Hibler authored
Due to a race with collecting events, it looks like some events will still slip through the cracks and we might wind up having missed a transition after five minutes. If we see that we are already in RELOADING (the state transition we are looking for) when we would declare the node wedged, then fake the transition and continue. I suspect this would not happen if I just looped on event_poll until there were no more events, but I am afraid of letting that loop go unbounded. So until I gather more data, let's go with this hack check.
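The check described above amounts to something like the following sketch. It is written in Python for illustration and is not the real code: classify_wait and its parameters are hypothetical names, and the five-minute limit is expressed as 300 seconds.

```python
def classify_wait(saw_transition, current_state, waited_secs,
                  timeout_secs=300, target_state="RELOADING"):
    """Decide what to do about a node we are waiting on.

    Returns "ok", "waiting", or "wedged". All names are illustrative.
    """
    if saw_transition:
        return "ok"
    if waited_secs < timeout_secs:
        return "waiting"
    # Timed out without seeing the event. If the node is already in the
    # state we were waiting for, assume the event was lost in the race
    # and fake the transition instead of declaring the node wedged.
    if current_state == target_state:
        return "ok"
    return "wedged"
```

The fallback only runs at the point where the node would otherwise be declared wedged, so it does not change behavior for nodes whose events arrive normally.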
-
Leigh B Stoller authored
-
Leigh B Stoller authored
-
Eric Eide authored
-
Eric Eide authored
This commit is intended to make the license status of Emulab and ProtoGENI source files more clear. It replaces license symbols like "EMULAB-COPYRIGHT" and "GENIPUBLIC-COPYRIGHT" with {{{ }}}-delimited blocks that contain actual license statements. This change was driven by the fact that today, most people acquire and track Emulab and ProtoGENI sources via git. Before the Emulab source code was kept in git, the Flux Research Group at the University of Utah would roll distributions by making tar files. As part of that process, the Flux Group would replace the license symbols in the source files with actual license statements. When the Flux Group moved to git, people outside of the group started to see the source files with the "unexpanded" symbols. This meant that people acquired source files without actual license statements in them. All the relevant files had Utah *copyright* statements in them, but without the expanded *license* state...
-
- 23 Sep, 2012 4 commits
-
-
Mike Hibler authored
-
Mike Hibler authored
Previously, we were not doing this if there was a DHCP subboss (which, in theory, should make the next-server line unnecessary since the client should ignore us entirely). I have a vague recollection of some PXE caching the wrong next-server value and this seemed to help at PRObE.
-
Mike Hibler authored
event_poll will trigger at most one event, so we have to loop calling it to pick up all events in a timely manner. Also add a couple more timestamps and debug messages.
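A drain loop of the kind described here might look like the sketch below. It is in Python purely for illustration: poll_once is a hypothetical stand-in for a single event_poll call that reports whether it dispatched an event, and the max_events cap is an added assumption to keep the loop bounded.

```python
from collections import deque


def drain_events(poll_once, max_events=1000):
    """Call poll_once until it reports no event, or the cap is reached.

    poll_once must return True if it handled one event, False otherwise.
    Returns the number of events handled.
    """
    handled = 0
    while handled < max_events and poll_once():
        handled += 1
    return handled


def make_poller(pending, seen):
    """Build a toy poll_once that handles at most one queued event per call."""
    def poll_once():
        if not pending:
            return False
        seen.append(pending.popleft())
        return True
    return poll_once


if __name__ == "__main__":
    seen = []
    poller = make_poller(deque(["SHUTDOWN", "BOOTING", "RELOADING"]), seen)
    print(drain_events(poller), seen)
```

The cap trades completeness for a guaranteed exit; any events left over are picked up on the next call.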
-
Mike Hibler authored
This happens at PRObE with very large experiment swapins (500+ nodes). Don't know why it happens, but ignoring it is safe.
-
- 22 Sep, 2012 1 commit
-
-
Mike Hibler authored
-
- 21 Sep, 2012 3 commits
-
-
Mike Hibler authored
-
Mike Hibler authored
-
Mike Hibler authored
-
- 19 Sep, 2012 5 commits
-
-
Mike Hibler authored
Sigh...yet another change. This one is not critical, so I didn't remake the MFSes yet again.
-
Mike Hibler authored
"62" and "47" are now in the legacy tarball and may or may not work.
-
Mike Hibler authored
Left it in for newnode MFS since even with subbosses, newnodes really need to talk to boss.
-
Mike Hibler authored
I am not 100% sure this is correct, but if not correct, it is at least no more wrong than the old code!
-
Jonathon Duerig authored
-