os/frisbee.redux/decls.h · 295521639f7f353014349e2d5bad4e136ae9f683 · emulab / emulab-devel

Checkpoint first working version of Frisbee Redux. This version · 86efdd9e

Leigh B. Stoller authored Jan 07, 2002

requires the linux threads package to give us kernel level pthreads.

From: Leigh Stoller <stoller@fast.cs.utah.edu>
To: Testbed Operations <testbed-ops@fast.cs.utah.edu>
Cc: Jay Lepreau <lepreau@cs.utah.edu>
Subject: Frisbee Redux
Date: Mon, 7 Jan 2002 12:03:56 -0800

Server:
The server is multithreaded. One thread takes in requests from the
clients, and adds the request to a work queue. The other thread processes
the work queue in fifo order, spitting out the desrired block ranges. A
request is a chunk/block/blockcount tuple, and most of the time the clients
are requesting complete 1MB chunks. The exception of course is when
individual blocks are lost, in which case the clients request just those
subranges. The server it totally asynchronous; It maintains a list of who
is "connected", but thats just to make sure we can time the server out
after a suitable inactive time. The server really only cares about the work
queue; As long as the queue si non empty, it spits out data.

Client:
The client is also multithreaded. One thread receives data packets and
stuffs them in a chunkbuffer data structure. This thread also request more
data, either to complete chunks with missing blocks, or to request new
chunks. Each client can read ahead up 2 chunks, although with multiple
clients it might actually be much further ahead as it also receives chunks
that other clients requested. I set the number of chunk buffers to 16,
although this is probably unnecessary as I will explain below. The other
thread waits for chunkbuffers to be marked complete, and then invokes the
imagunzip code on that chunk. Meanwhile, the other thread is busily getting
more data and requesting/reading ahread, so that by the time the unzip is
done, there is another chunk to unzip. In practice, the main thread never
goes idle after the first chunk is received; there is always a ready chunk
for it. Perfect overlap of I/O! In order to prevent the clients from
getting overly synchronized (and causing all the clients to wait until the
last client is done!), each client randomizes it block request order. This
why we can retain the original frisbee name; clients end up catching random
blocks flung out from the server until it has all the blocks.

Performance:
The single node speed is about 180 seconds for our current full image.
Frisbee V1 compares at about 210 seconds. The two node speed was 181 and
174 seconds. The amount of CPU used for the two node run ranged from 1% to
4%, typically averaging about 2% while I watched it with "top."

The main problem on the server side is how to keep boss (1GHZ with a Gbit
ethernet) from spitting out packets so fast that 1/2 of them get dropped. I
eventually settled on a static 1ms delay every 64K of packets sent. Nothing
to be proud of, but it works.

As mentioned above, the number of chunk buffers is 16, although only a few
of them are used in practice. The reason is that the network transfer speed
is perhaps 10 times faster than the decompression and raw device write
speed. To know for sure, I would have to figure out the per byte transfer
rate for 350 MBs via network, via the time to decompress and write the
1.2GB of data to the raw disk. With such a big difference, its only
necessary to ensure that you stay 1 or 2 chunks ahead, since you can
request 10 chunks in the time it takes to write one of them.

86efdd9e