- 11 Jan, 2002 2 commits
-
-
Leigh B. Stoller authored
flood (say, if there were 40 clients wanting chunks). I added some backoff code that will slow the rate at which the clients make requests in the face of a non-answering daemon. It backs off in increments of the PKTRCV timeout value (30ms at present) until it gets to one second, then holds at one-second intervals.
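The backoff policy described above can be sketched as follows (a minimal sketch; the function name is an assumption, not the actual frisbee client code):

```c
#include <assert.h>

/*
 * Hypothetical sketch of the client backoff: grow the retry delay by
 * one PKTRCV timeout (30ms) per unanswered request, capping at one
 * second, where it then holds.
 */
#define PKTRCV_TIMEOUT_MS 30
#define MAX_BACKOFF_MS    1000

static int
next_backoff_ms(int current_ms)
{
	int next = current_ms + PKTRCV_TIMEOUT_MS;

	return (next > MAX_BACKOFF_MS) ? MAX_BACKOFF_MS : next;
}
```

Starting from zero, the delay steps 30, 60, 90, ... and saturates at 1000ms.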
-
Leigh B. Stoller authored
status when we kill them off intentionally.
-
- 10 Jan, 2002 5 commits
-
-
Leigh B. Stoller authored
also noticed that the slower machines were getting very far behind the faster machines (the faster machines request chunks faster), and actually dropping chunks because they had no room for them (chunkbufs at 32). I increased the timeout on the client (if no blocks are received for this long, request something) from 30ms to 90ms. This helped a bit, but the real help was increasing chunkbufs to 64. Now the clients run at pretty much single-node speed (152/174), and the CPU usage on boss went back down to 2-3% during the run. The stats show far less data loss and resending of blocks; in fact, we had been resending upwards of 300MB of data because of client loss. That went down to about 14MB for the 12-node test. Then I ran a 24-node test. Very sweet. All 24 nodes ran in 155-180 seconds. CPU peaked at 6% and dropped off to a steady state of 4%. None of the nodes saw any duplicate chunks. Note that the client is probably going to need some backoff code in case the server dies, to prevent swamping the boss with unanswerable packets. Next step is to have Matt run a test when he swaps in his 40 nodes.
-
Leigh B. Stoller authored
on for now, since it's minor code, and spits out good info to the console.
-
Leigh B. Stoller authored
forever.
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
statements from the two threads.
-
- 08 Jan, 2002 1 commit
-
-
Leigh B. Stoller authored
idleness is defined as an empty work queue. We still use join/leave messages, but the join message is so that the client can be informed of the number of blocks in the file. The leave message is strictly informational, and includes the elapsed time on the client, so that it can be written to the log file. If that message is lost, no big deal. I ran a 6 node test on this new code, and all the clients ran in 174 to 176 seconds, with frisbeed using 1% CPU on average (typically starts out at about 3%, and quickly drops off to steady state).
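The idle-exit rule described above can be sketched as a small predicate (names and signature are hypothetical, not the actual frisbeed code): the server exits once the work queue has been empty for longer than the configured timeout.

```c
#include <time.h>

/*
 * Hypothetical sketch: frisbeed-style idle timeout. The server only
 * exits when the work queue is empty AND it has been empty at least
 * timeout_sec seconds (last_work is the time of the last queued request).
 */
static int
should_exit(int queue_empty, time_t now, time_t last_work, int timeout_sec)
{
	return queue_empty && (now - last_work) >= timeout_sec;
}
```

Join/leave messages do not factor into this decision; as described above, they only report the file's block count and the client's elapsed time.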
-
- 07 Jan, 2002 3 commits
-
-
Leigh B. Stoller authored
requires the linux threads package to give us kernel level pthreads.

From: Leigh Stoller <stoller@fast.cs.utah.edu>
To: Testbed Operations <testbed-ops@fast.cs.utah.edu>
Cc: Jay Lepreau <lepreau@cs.utah.edu>
Subject: Frisbee Redux
Date: Mon, 7 Jan 2002 12:03:56 -0800

Server: The server is multithreaded. One thread takes in requests from the clients and adds them to a work queue. The other thread processes the work queue in FIFO order, spitting out the desired block ranges. A request is a chunk/block/blockcount tuple, and most of the time the clients are requesting complete 1MB chunks. The exception, of course, is when individual blocks are lost, in which case the clients request just those subranges. The server is totally asynchronous; it maintains a list of who is "connected", but that's just to make sure we can time the server out after a suitable inactive period. The server really only cares about the work queue; as long as the queue is non-empty, it spits out data.

Client: The client is also multithreaded. One thread receives data packets and stuffs them in a chunkbuffer data structure. This thread also requests more data, either to complete chunks with missing blocks or to request new chunks. Each client can read ahead up to 2 chunks, although with multiple clients it might actually be much further ahead, since it also receives chunks that other clients requested. I set the number of chunk buffers to 16, although this is probably unnecessary, as I will explain below. The other thread waits for chunkbuffers to be marked complete, and then invokes the imageunzip code on that chunk. Meanwhile, the receiving thread is busily getting more data and requesting/reading ahead, so that by the time the unzip is done, there is another chunk to unzip. In practice, the main thread never goes idle after the first chunk is received; there is always a ready chunk for it. Perfect overlap of I/O!
In order to prevent the clients from getting overly synchronized (and causing all the clients to wait until the last client is done!), each client randomizes its block request order. This is why we can retain the original frisbee name; clients end up catching random blocks flung out from the server until they have all the blocks.

Performance: The single-node speed is about 180 seconds for our current full image. Frisbee V1 compares at about 210 seconds. The two-node speed was 181 and 174 seconds. The amount of CPU used for the two-node run ranged from 1% to 4%, typically averaging about 2% while I watched it with "top". The main problem on the server side is how to keep boss (1GHz with a Gbit ethernet) from spitting out packets so fast that half of them get dropped. I eventually settled on a static 1ms delay every 64K of packets sent. Nothing to be proud of, but it works.

As mentioned above, the number of chunk buffers is 16, although only a few of them are used in practice. The reason is that the network transfer speed is perhaps 10 times faster than the decompression and raw device write speed. To know for sure, I would have to figure out the per-byte transfer rate for 350MB via the network, versus the time to decompress and write the 1.2GB of data to the raw disk. With such a big difference, it's only necessary to stay 1 or 2 chunks ahead, since you can request 10 chunks in the time it takes to write one of them.
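The randomized request order described above could be as simple as a Fisher-Yates shuffle of the chunk indices (a sketch under that assumption; the actual frisbee client's randomization scheme may differ):

```c
#include <stdlib.h>

/*
 * Hypothetical sketch of randomizing the chunk request order so that
 * clients do not all ask for the same chunks in lockstep. Fills
 * order[] with 0..nchunks-1 and shuffles it (Fisher-Yates).
 */
static void
shuffle_chunks(int *order, int nchunks)
{
	int i, j, tmp;

	for (i = 0; i < nchunks; i++)
		order[i] = i;
	for (i = nchunks - 1; i > 0; i--) {
		j = rand() % (i + 1);
		tmp = order[i];
		order[i] = order[j];
		order[j] = tmp;
	}
}
```

Each client then walks its own shuffled order, so the server's multicast stream carries a useful mix of chunks for everyone.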
-
Leigh B. Stoller authored
duplicate this code in the frisbee tree, build a version suitable for linking in with frisbee. I also modified the FrisbeeRead interface to pass back pointers instead of copying the data. There is no real performance benefit that I noticed, but it made me feel better not to copy 350MB of data another time. There is a new initialization function that is called by the frisbee main program to set up a few things.
-
Leigh B. Stoller authored
of start==end==0! This causes the entire disk to be compressed a second time!
-
- 06 Dec, 2001 1 commit
-
-
Robert Ricci authored
echoed to stderr.
-
- 30 Nov, 2001 1 commit
-
-
Leigh B. Stoller authored
-
- 09 Nov, 2001 1 commit
-
-
Chad Barb authored
-
- 22 Oct, 2001 1 commit
-
-
Chad Barb authored
-v is for verbosity; -t specifies a timeout in seconds before exiting if there are no clients.
-
- 15 Oct, 2001 2 commits
- 03 Oct, 2001 2 commits
-
-
Chad Barb authored
Changed request behavior to try to fill the most full of the incomplete buffers, rather than a random one. Also changed the MB request to be a bit cleverer (it tries to find contiguous received blocks and grow around those). The changes resulted in less "bimodal" behavior and a lower overall average runtime.
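The "most full incomplete buffer first" policy described above can be sketched like this (names are hypothetical; received[i] is the count of blocks already received for buffer i):

```c
/*
 * Hypothetical sketch: pick the incomplete chunk buffer with the most
 * blocks already received, so it can be finished and handed to the
 * unzipper soonest. Returns its index, or -1 if all are complete.
 */
static int
pick_buffer(const int *received, int nbufs, int blocks_per_chunk)
{
	int i, best = -1, bestcount = -1;

	for (i = 0; i < nbufs; i++) {
		if (received[i] < blocks_per_chunk && received[i] > bestcount) {
			bestcount = received[i];
			best = i;
		}
	}
	return best;
}
```

Preferring the nearly full buffer over a random one keeps completed chunks flowing steadily to the decompressor, which is consistent with the reduced "bimodal" runtimes noted above.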
-
Chad Barb authored
(now requests from a random bucket; doesn't give priority to the first bucket)
-
- 25 Sep, 2001 1 commit
-
-
Chad Barb authored
Added Port number to client arguments.
-
- 13 Sep, 2001 1 commit
-
-
Robert Ricci authored
Manages the launching of new frisbee servers, and records in the database the addresses they use. If called for an image that is already associated with a running server, it exits quietly. Otherwise, it registers the new server's address and goes into the background, waiting for the frisbee server to die so that it can unregister the address.
-
- 06 Sep, 2001 2 commits
- 04 Sep, 2001 2 commits
-
-
Chad Barb authored
Added checksums (simple add for now), as well as a Multicast leave-group on exit.
-
Leigh B. Stoller authored
in. Sigh. Took me too long to find this. On the other hand, there are now more debugging statements and more asserts in the code, plus a -d option to turn on progressive levels of debugging. Also changed the operation of imageunzip to allow individual slice writes (i.e., to unzip the BSD or Linux partition instead of the entire disk). Using the slice device (/dev/rad0s1) is actually a problem on BSD, since it snoops writes to where the disklabel should be and alters the offsets. Even worse, on non-BSD partitions it craps out completely because there is no disklabel at all. This is really a dumb thing. So, I added code to read the MBR when the -s (slice) option is given, and use the MBR to compute the offsets from the beginning of the raw disk. Must always use the raw disk device now, and the new operation is:

    imageunzip all.ndz /dev/rad0
    imageunzip -s 2 rhat.ndz /dev/rad0

Note that if you give a slice instead of a disk device, and there is a valid-looking MBR in the slice (which is very possible), then things will get very confused. Not sure what to do about that yet.
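Computing a slice's offset from the MBR, as described above, amounts to reading the DOS partition table out of sector 0: the table starts at byte 446, each of the four 16-byte entries holds the slice's starting LBA as a little-endian 32-bit value at entry offset 8. A sketch under those assumptions (not the actual imageunzip code):

```c
#include <stdint.h>

/*
 * Hypothetical sketch: given the 512-byte MBR sector and a DOS slice
 * number (1..4), return the slice's starting sector on the raw disk.
 * The partition table sits at byte 446; the 4-byte little-endian
 * starting LBA is at offset 8 within each 16-byte entry.
 */
#define MBR_PTABLE_OFF	446
#define MBR_ENTRY_SIZE	16

static uint32_t
slice_start_sector(const unsigned char *mbr, int slice)
{
	const unsigned char *e =
	    mbr + MBR_PTABLE_OFF + (slice - 1) * MBR_ENTRY_SIZE;

	return (uint32_t)e[8] | ((uint32_t)e[9] << 8) |
	       ((uint32_t)e[10] << 16) | ((uint32_t)e[11] << 24);
}
```

Writes for slice N then go to the raw disk device at this sector offset, sidestepping the slice device's disklabel snooping entirely.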
-
- 02 Sep, 2001 1 commit
-
-
Leigh B. Stoller authored
-
- 31 Aug, 2001 1 commit
-
-
Chad Barb authored
Added instrumentation; it's a bit easier to see what's going on in the client now.
-
- 30 Aug, 2001 1 commit
-
-
Chad Barb authored
Modified the makefile to create userfrisbee (the client) and frisbeed (the server). Added support for syslog to f_network.c, using an ugly makefile hack to make multiple versions of the f_network .o file available (one for the server, with #define SYSLOG prepended; one for the client, using the normal printfs).
-
- 24 Aug, 2001 5 commits
- 20 Aug, 2001 1 commit
-
-
Leigh B. Stoller authored
node!
-
- 01 Aug, 2001 2 commits
-
-
Leigh B. Stoller authored
This uses the pxe-booted FreeBSD kernel and MFS. In addition, I use the standard testbed mechanism of specifying a startup command to run, which will do the imagezip to the NFS-mounted /proj/<pid>/.... The controlling script on paper sets up the database, reboots the node, and then waits for the startstatus to change. Then it resets the DB and reboots the node so that it returns to its normal OS. The format of operation is:

    create_image <node> <imageid> <filename>

Node must be under the user's control, of course. The filename must reside in the node's project (/proj/<pid>/whatever), since that's the directory that is mounted by the testbed config software when the machine boots. The imageid already exists in the DB, and is used to determine what part of the disk to zip up (say, using the slice option to the zipper). Since this operation is rather time consuming, it does the usual trick of going into the background and sending email status later.
-
Leigh B. Stoller authored
Fix up the print statements so that DOS slice numbering is consistent. Remove a few printfs that were not under the debug flag.
-
- 25 Jul, 2001 1 commit
-
-
Leigh B. Stoller authored
"skip" region so that you can have an unknown slice anyplace on the disk and it will just look like a free region to be skipped over.
-
- 27 Jun, 2001 2 commits
-
-
Mike Hibler authored
(i.e., make sure all device reads are block aligned/sized). Put some asserts in imageunzip to verify the same.
-
Leigh B. Stoller authored
-
- 18 Jun, 2001 1 commit
-
-
Leigh B. Stoller authored
since they don't do what you expect.
-