• Leigh B. Stoller's avatar
    Checkpoint first working version of Frisbee Redux. This version · 86efdd9e
    Leigh B. Stoller authored
    requires the linux threads package to give us kernel level pthreads.
    From: Leigh Stoller <stoller@fast.cs.utah.edu>
    To: Testbed Operations <testbed-ops@fast.cs.utah.edu>
    Cc: Jay Lepreau <lepreau@cs.utah.edu>
    Subject: Frisbee Redux
    Date: Mon, 7 Jan 2002 12:03:56 -0800
    The server is multithreaded. One thread takes in requests from the
    clients, and adds the request to a work queue. The other thread processes
    the work queue in fifo order, spitting out the desrired block ranges. A
    request is a chunk/block/blockcount tuple, and most of the time the clients
    are requesting complete 1MB chunks. The exception of course is when
    individual blocks are lost, in which case the clients request just those
    subranges.  The server it totally asynchronous; It maintains a list of who
    is "connected", but thats just to make sure we can time the server out
    after a suitable inactive time. The server really only cares about the work
    queue; As long as the queue si non empty, it spits out data.
    The client is also multithreaded. One thread receives data packets and
    stuffs them in a chunkbuffer data structure. This thread also request more
    data, either to complete chunks with missing blocks, or to request new
    chunks. Each client can read ahead up 2 chunks, although with multiple
    clients it might actually be much further ahead as it also receives chunks
    that other clients requested. I set the number of chunk buffers to 16,
    although this is probably unnecessary as I will explain below. The other
    thread waits for chunkbuffers to be marked complete, and then invokes the
    imagunzip code on that chunk. Meanwhile, the other thread is busily getting
    more data and requesting/reading ahread, so that by the time the unzip is
    done, there is another chunk to unzip. In practice, the main thread never
    goes idle after the first chunk is received; there is always a ready chunk
    for it. Perfect overlap of I/O! In order to prevent the clients from
    getting overly synchronized (and causing all the clients to wait until the
    last client is done!), each client randomizes it block request order. This
    why we can retain the original frisbee name; clients end up catching random
    blocks flung out from the server until it has all the blocks.
    The single node speed is about 180 seconds for our current full image.
    Frisbee V1 compares at about 210 seconds. The two node speed was 181 and
    174 seconds. The amount of CPU used for the two node run ranged from 1% to
    4%, typically averaging about 2% while I watched it with "top."
    The main problem on the server side is how to keep boss (1GHZ with a Gbit
    ethernet) from spitting out packets so fast that 1/2 of them get dropped. I
    eventually settled on a static 1ms delay every 64K of packets sent. Nothing
    to be proud of, but it works.
    As mentioned above, the number of chunk buffers is 16, although only a few
    of them are used in practice. The reason is that the network transfer speed
    is perhaps 10 times faster than the decompression and raw device write
    speed. To know for sure, I would have to figure out the per byte transfer
    rate for 350 MBs via network, via the time to decompress and write the
    1.2GB of data to the raw disk. With such a big difference, its only
    necessary to ensure that you stay 1 or 2 chunks ahead, since you can
    request 10 chunks in the time it takes to write one of them.