- 18 Dec, 2002 12 commits
-
-
Leigh B. Stoller authored
rc.conf: Remove fixed -p argument. Now set by mkjail. rc.local,jailctl: Update for client side path reorg and cleanup. jaildog.pl,mkjail.pl: Numerous fixes for jailed nodes.
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
for BSD of course. First is a "proxy" mode that is used outside of a jail, to forward tmcc requests from inside the jail to boss over the normal ssl channel (when a remote node). We remove the pem files from inside the jail so it has no way to form a secure connection to tmcd on its own, and tmcd rejects non-ssl connections from remote nodes (it should probably reject them from local jails too). Second change is a "unix socket" mode that is the complement to the proxy; tmcc inside of a jail connects to the tmcc proxy outside the jail via a unix domain socket that can be shared between the two because the outer environment can see inside the jailed filesystems (the jail sees a chroot environment). When the jail is started, the initial root shell gets an environment variable called TMCCUNIXPATH which holds the path to the socket. This makes it easy for anything started from that shell, of course. It's still a minor pain when invoking tmcc from elsewhere, but that does not really happen except when running it by hand. Anyway, tmcc connects to the proxy over the unix socket and does its thing. The proxy filters out VNODE= and PRIVKEY= arguments, and inserts its own into the command string. This prevents a jail from trying to impersonate another vnode.
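The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the actual tmcc proxy code: the whitespace-separated request format, the function name `rewrite_request`, and its parameters are all assumptions made for the example.

```python
def rewrite_request(command: str, vnode_id: str, privkey: str) -> str:
    """Drop client-supplied VNODE=/PRIVKEY= tokens and prepend trusted ones.

    Hypothetical helper: assumes the request is a whitespace-separated
    string of KEY=value tokens followed by the command itself.
    """
    blocked = ("VNODE=", "PRIVKEY=")
    # Keep every token the jail sent except identity claims.
    kept = [tok for tok in command.split()
            if not tok.upper().startswith(blocked)]
    # The proxy, which lives outside the jail, supplies the real identity.
    trusted = [f"VNODE={vnode_id}", f"PRIVKEY={privkey}"]
    return " ".join(trusted + kept)


if __name__ == "__main__":
    print(rewrite_request("VNODE=someoneelse status", "vnode-1", "secret"))
```

Because the identity tokens are always stripped and re-added on the trusted side, a forged VNODE= sent from inside the jail never reaches the server.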
-
Leigh B. Stoller authored
like the remote nodes do, but for now do not update the up/down status from that. I need to mess with db/node_status first to make sure there is agreement between the parties. Note that remote nodes send one UDP message every 60 seconds (isalive is sent over UDP). Local nodes will send them at a slower rate, as is the practice in db/node_status, which wakes up every 5 minutes and fpings the world!
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
message about no libmysql on a testbed node.
-
Leigh B. Stoller authored
some trouble with old logs getting cached in the browser.
-
Leigh B. Stoller authored
Attempts to replay an experiment by rebooting all the nodes, clearing the various startup bits (ready, startstatus, bootstatus, portstats), and then restarting the event system. I am dubious that this is a workable solution because of the asynchronous nature of the testbed (nodes happily cruise from TBRESET to ISUP and beyond without stopping), and so it's hard to truly replicate the initial lack of state that a freshly swapped in experiment has. Still, people requested it and I cheerfully provided it because that's what I do; service with a smile and not a whit of complaint. Is anyone reading this?
-
Leigh B. Stoller authored
node. Used in new tbrestart code for replaying experiments. It remains to be seen if this is a workable approach. TBNodeStateWait() is really WaitTillAlive, which I need in several new spots now. It's not as general purpose as it seems though, since there are only a couple of terminal states (isup) that you can actually wait for by querying the DB. But I'm loath to add any more event code to the system.
-
Leigh B. Stoller authored
option is now "[[includevirt] or [virtonly[=<phys>]]]". In other words, you can ask to include virtual nodes, or you can ask for just virtual nodes. Optionally, you can ask for the virtual nodes for a specific physical node. I use this from assign_wrapper to map local jail nodes.
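As an illustration of the option grammar quoted above, here is a small parser sketch. It is not the real tool's argument handling; the `VirtOption` type, the `parse_virt_option` name, and the choice to reject an empty `virtonly=` are assumptions for the example.

```python
from typing import NamedTuple, Optional


class VirtOption(NamedTuple):
    include_virtual: bool      # virtual nodes appear in the result
    virtual_only: bool         # physical nodes are left out
    phys_node: Optional[str]   # restrict to virtual nodes on this physical node


def parse_virt_option(arg: Optional[str]) -> VirtOption:
    """Parse "", "includevirt", "virtonly" or "virtonly=<phys>"."""
    if not arg:
        return VirtOption(False, False, None)
    if arg == "includevirt":
        return VirtOption(True, False, None)
    name, sep, phys = arg.partition("=")
    if name == "virtonly":
        if sep and not phys:
            raise ValueError("virtonly= needs a physical node name")
        return VirtOption(True, True, phys or None)
    raise ValueError(f"unrecognized option: {arg!r}")


if __name__ == "__main__":
    for text in ("", "includevirt", "virtonly", "virtonly=pc42"):
        print(repr(text), parse_virt_option(text))
```

The node name `pc42` is only a placeholder.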
-
Leigh B. Stoller authored
nodes. The second argument can now be an NS node instead of the name of a real testbed node. For example:
    tb-set-hardware $node3 pc600
    tb-set-hardware $nodev1 pcvm600
    tb-fix-node $nodev1 $node3
So, "fix" $nodev1 to $node3. The intent is that once $node3 is allocated by assign to a real testbed node, we can then allocate a virtual node on pcXX to $nodev1. I did this primarily to allow for easy testing of jails via my NS file, without having to hack assign wrapper too deeply. Note there are still hacks in assign_wrapper to support this, but they are not extensive. Also my old usewatunnels stuff I never checked in:
    tb-set-usewatunnels 0/1
-
Leigh B. Stoller authored
before using it!
-
- 17 Dec, 2002 5 commits
-
-
Mike Hibler authored
-
Mike Hibler authored
-
Mike Hibler authored
(there was no copyright)
-
Mac Newbold authored
available to it.
-
Mac Newbold authored
Currently, the cache size is set to 128 entries max, and "recently" is defined as being within the last 60 seconds. This should really help ease the strain on the system (stated especially) when things are getting lost due to high load, and retries would normally cause duplicates.
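A duplicate-suppression cache with those two limits might look like the sketch below. This is an assumption-laden illustration rather than the committed code: the class name, the use of an opaque event key, and the oldest-first eviction policy are all made up for the example; only the 128-entry cap and the 60-second window come from the message.

```python
import time
from collections import OrderedDict


class RecentEventCache:
    """Remember recently handled events so that retries can be ignored."""

    def __init__(self, max_entries=128, window=60.0, clock=time.monotonic):
        self._max = max_entries
        self._window = window
        self._clock = clock
        self._entries = OrderedDict()  # key -> time it was last recorded

    def seen_recently(self, key) -> bool:
        """Return True for a duplicate; otherwise record the key."""
        now = self._clock()
        stamp = self._entries.get(key)
        if stamp is not None and now - stamp < self._window:
            return True
        # New (or expired) event: record it as the most recent entry.
        self._entries[key] = now
        self._entries.move_to_end(key)
        # Enforce the size cap by dropping the oldest entries first.
        while len(self._entries) > self._max:
            self._entries.popitem(last=False)
        return False


if __name__ == "__main__":
    cache = RecentEventCache()
    print(cache.seen_recently(("pc1", "ISUP")))
    print(cache.seen_recently(("pc1", "ISUP")))
```

A caller would check `seen_recently()` before acting on an event and skip the work when it returns True.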
-
- 16 Dec, 2002 3 commits
-
-
Mac Newbold authored
-
Mac Newbold authored
help nodes in reload_pending get sucked into reloading faster. If it doesn't do enough, we'll need to do more batching of stuff, so we get some parallelism in os_load instead of forcing it to serialize by calling os_load one node at a time. I was tempted to nuke all the stuff that was in there from the netdisk reload type, but decided not to. It won't be too long (relatively speaking) before we have freed, the new "free node manager" that will replace/supersede our current reload_daemon anyway.
-
Mac Newbold authored
events. This may delay handling of other stuff that happens in my main loop, but not by too much. To prevent skew, everything (including reload frequency) is done strictly by seconds elapsed, not by iterations or anything. I found that even polling for multiple events without sleeping, I could only handle a little over 1 per second when I was calling inuse/statetime for additional info on every event. Even though this only happens in the worst case (every event is wrong), it won't do. So I took that out. I'll probably end up adding a faster lookup of the info I need (mostly reservation, and what osid it thinks it is running). That change took it up to at least 4 per second (as fast as I could send them manually), more than 4x our previous performance. So we should be able to keep up now. Also, add the support for "announcements" to testbed ops when I die and such. (Been in a few days, but this is the first commit of it)
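The "strictly by seconds elapsed" idea can be sketched as a timer that advances along a fixed schedule, so a slow pass through the main loop delays a task but does not shift later deadlines. The `ElapsedTimer` class and its interface are hypothetical and not taken from the daemon.

```python
import time


class ElapsedTimer:
    """Fire every `interval` seconds of clock time, however often polled."""

    def __init__(self, interval, clock=time.monotonic):
        self._interval = interval
        self._clock = clock
        self._next_due = clock() + interval

    def due(self) -> bool:
        now = self._clock()
        if now < self._next_due:
            return False
        # Step forward in whole intervals from the original schedule,
        # not from "now", so late polls do not accumulate skew.
        missed = int((now - self._next_due) // self._interval) + 1
        self._next_due += missed * self._interval
        return True


if __name__ == "__main__":
    reload_timer = ElapsedTimer(interval=0.2)
    fired = 0
    while fired < 3:
        # ... handle any pending events here ...
        if reload_timer.due():
            fired += 1
            print("periodic task", fired)
        time.sleep(0.01)
```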
-
- 13 Dec, 2002 1 commit
-
-
Mike Hibler authored
netbed/RON machines. Eliminate the second pass over the file to fill in blockindex/total fields. Blockindex fields are filled in on the first pass; we don't bother with blocktotal since we don't use it any more.
-
- 12 Dec, 2002 6 commits
-
-
Mike Hibler authored
passed to the frisbee init routine. Instead of failing if the new image doesn't fit in the target slice, just warn and truncate. Back to aggressive adjustment of BSD partition table sizes. Warn about and truncate partitions that won't fit in the target slice and adjust the 'c' partition to exactly match the BSD slice.
-
Leigh B. Stoller authored
This is going to be used to sign the stuff we send out to widearea nodes (images, scripts, etc). The passphrase is the same as the one I used on the SSH priv keys for widearea nodes.
-
Mike Hibler authored
problem isn't fixed in the latest port or else there is a kernel problem too...
-
Mac Newbold authored
-
Mike Hibler authored
-
Mike Hibler authored
Add some additional debugging for bad packets (since I have seen some)
-
- 11 Dec, 2002 8 commits
-
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Mike Hibler authored
Add -Wall to CFLAGS and clean up lint. Update the TODO file. Explicitly size the header fields (e.g., int32_t not int).
imagezip:
  Version 2. Adds two ints to the header to help track free space. Each chunk now has a first and last sector number which can describe any free block before or after the data contained in the chunk. This is needed in order to properly zero all free space when laying down an image. In practice: the first chunk describes any free space before the first allocated range and any free space after its contained ranges and before the first allocated range in the second chunk. Every other chunk then describes just free space following itself (since the previous chunk has already described the space before this chunk). The point being, we only describe each free range once.
  Added "relocation" information. Relocation entries go in the chunk header along with region descriptors. This allows us to identify chunks of data which need to be absolute disk blocks instead of offsets from the containing partition. This is now used for BSD-slice partition tables which contain absolute disk blocks. We can now create an image in one slice and reload it into another slice.
  Allow zlib compression level 0 (no compression). This might be useful on machines that have slow CPUs: do just FS-compression and transfer the image elsewhere faster where it could be re-zipped with regular compression.
  Fix goof. Previously we were not saving any DOS partition with an unrecognized type. We should be naively compressing it instead. This is what we now do. We continue to skip partitions of type 0 ("unused").
  mikeism: add handler for SIGINFO (^T) to report progress of a zip-age.
  Added everybody's favorite "dots" mode for reporting progress.
  Eliminate some excess copies left over from the conversion from write-every-little-piece to buffer-up-a-full-chunk-and-then-write.
  Eliminated the special case handling of no skips (ranges) in compress_image by creating a single allocated range describing the whole disk/partition in this case.
  For NTFS, make the behavior of calling missing unicode routines be to return an error rather than exit. These calls happen, but their failing doesn't seem to be fatal.
  Lots of typical mike-pissing on everything else.
imageunzip:
  Modify to handle both V1 and V2 images.
  In slice mode, make sure we don't write past the bounds of the slice. ES&D if we try.
  Make output to unseekable devices work again (broken when pwrite was added).
  Add debug -F (Frisbee) option to randomize the presentation of chunks to the unzip/write threads. Used to simulate frisbee.
  Add "-T DOS-type" option to tell imageunzip, when in slice mode, to set the type of the slice in the DOS partition table. This is useful if you are dropping, say, a BSD filesystem into an unused slice: you don't have to go back later and set this with fdisk. Considered making this info part of the image itself (recorded by imagezip when creating a slice image), but decided against it.
  writezero takes an off_t for the size, since we can be asked to write many gigabytes of zero at the end of a disk.
  Turn off dots mode by default. Ya wanna see spots? Ya gotta turn it on!
  Lots of typical mike-pissing on everything else.
imagedump:
  New tool for checking/dumping the structure of an image and reporting stats about it.
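To make the version 2 free-space rule concrete, the following sketch computes per-chunk first/last sectors and the free ranges each chunk ends up describing. It is an illustration only, not imagezip's data layout: ranges are modeled as (start, size) tuples, the "last" sector is treated as exclusive, and both function names are invented here.

```python
def chunk_bounds(chunks, total_sectors):
    """Return a (first, last) sector pair for every chunk.

    `chunks` is a list of chunks, each a sorted list of allocated
    (start, size) ranges. The first chunk starts at sector 0 so it also
    covers leading free space; every chunk runs up to the first allocated
    sector of the next chunk (or the end of the disk/partition).
    """
    bounds = []
    for i, ranges in enumerate(chunks):
        first = 0 if i == 0 else ranges[0][0]
        if i + 1 < len(chunks):
            last = chunks[i + 1][0][0]
        else:
            last = total_sectors
        bounds.append((first, last))
    return bounds


def free_ranges(ranges, first, last):
    """Free (start, size) ranges inside [first, last) not covered by `ranges`."""
    free = []
    cursor = first
    for start, size in ranges:
        if start > cursor:
            free.append((cursor, start - cursor))
        cursor = start + size
    if last > cursor:
        free.append((cursor, last - cursor))
    return free


if __name__ == "__main__":
    chunks = [[(10, 5), (20, 5)], [(40, 10)]]
    for ranges, (first, last) in zip(chunks, chunk_bounds(chunks, 100)):
        print(first, last, free_ranges(ranges, first, last))
```

Since each chunk's span stops where the next chunk's data begins, every free sector falls into exactly one chunk's description, which is what lets an unzipper zero all free space without overlap.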
-
Mike Hibler authored
Server: make file readsize independent of burstsize (previously readsize had to be a divisor of burstsize). A subtle side-effect is that the dynamic burst rate is recalculated at the conclusion of every burst instead of after every readsize count of blocks has been sent (less than a burst). This just seems to be more logical.
Client: add "-T DOS-type" option to tell frisbee, when in slice mode, to set the type of the slice in the DOS partition table. This is useful if you are dropping, say, a BSD filesystem into an unused slice: you don't have to go back later and set this with fdisk. Considered making this info part of the image itself (recorded by imagezip when creating a slice image), but decided against it.
-
Mike Hibler authored
-
Mike Hibler authored
the different node types
-
Mike Hibler authored
Needed for the frisbee environment, so might as well use it everywhere.
-
Leigh B. Stoller authored
a temporary class for testing new images.
-
- 10 Dec, 2002 3 commits
-
-
Kirk Webb authored
Modified the timeout logic in create_image to track the image creation progress (size) rather than simply waiting a certain amount of time. Also changed the code to report progress at regularly spaced intervals (adjustable), and to indicate when the timeout timer has been activated, or halted due to progress. The changes also include an NFS cache slack factor, which makes the effective non-progress timeout equal to the sum of the slack time, plus the non-progress time (currently 3 + 5 = 8 minutes). Some changes were made to the error and cleanup logic to help revert the state of the DB and node as much as possible (node is not rebooted if the DB state cannot first be reverted) prior to exit.
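The progress-based timeout can be sketched as a small watchdog that resets whenever the image file grows. This is a hedged illustration, not the create_image code: the class and method names are hypothetical, and reading "3 + 5" as 3 minutes of slack plus 5 minutes of non-progress time is an assumption based on the order in the message.

```python
class ProgressWatchdog:
    """Time out only after the image has stopped growing for too long."""

    def __init__(self, idle_timeout=5 * 60, slack=3 * 60):
        # Effective limit is slack plus non-progress time (8 minutes here).
        self.limit = idle_timeout + slack
        self.last_size = None
        self.last_progress = None

    def update(self, size, now) -> bool:
        """Record the current size; return True once the limit is exceeded."""
        if self.last_size is None or size > self.last_size:
            # Progress was made (or this is the first sample): restart timer.
            self.last_size = size
            self.last_progress = now
            return False
        return now - self.last_progress >= self.limit


if __name__ == "__main__":
    dog = ProgressWatchdog()
    samples = [(0, 0), (60, 100), (300, 100), (540, 100)]  # (seconds, bytes)
    for now, size in samples:
        print(now, size, dog.update(size, now))
```

A caller would poll the image file size at a regular interval, pass it to `update()` with the current time, and abort when it returns True.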
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
- 09 Dec, 2002 2 commits
-
-
Mac Newbold authored
-
Leigh B. Stoller authored
protected page except those that are explicitly deemed okay for a webonly user. This makes me feel better and safer!
-