1. 24 Nov, 2010 1 commit
    • First crack at a frisbee "master server" for handling GET (download) requests. · a2a896ab
      Mike Hibler authored
      There are a couple of new packet types in the frisbee protocol which are
      exchanged via TCP with the master server: GETREQUEST and GETREPLY.  The
      client passes to the master server an opaque imageid and a couple of options
      and gets back the addr/port to use to actually download the image.  The
      implementation of the master server is fragile and is more of a test
      framework; Grant is working on a more robust master server.  I am mostly
      doing a backend that communicates with the Emulab DB to do its authentication
      and making the client changes.
      
      The client now uses the -S option to specify the IP address of the master
      server and the -F option to specify an imageid.  If no error is returned,
      the image is downloaded using the returned addr/port.  If -Q is used in place
      of -F, then the client makes a "status only" call getting back info about
      whether the named image is accessible to the client and whether a server is
      currently running.
      
      On the server side, the new master server (mserver.c) has an Emulab
      configuration "backend" that supports host-based authentication.
      The IP address of the caller is mapped to a node_id/pid/gid/eid combo
      that is used to determine access.  On a request, the specified imageid is
      treated either as a pathname (if it starts with '/') or an image identifier
      of the form "<pid>/<imagename>".  If it is a pathname, we check to make
      sure that pathname (after running through "realpath") is contained in one
      of the directories accessible to that node in its current experiment context;
      i.e., /share, /proj/<pid>, /groups/<pid>/<gid>, or /users/<swapper-uid>.
      If it is an image identifier, the DB is queried to ensure that access is
      allowed to that image; i.e., it must be "global" or in the appropriate
      project/group.
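
      A minimal sketch of the pathname check described above, assuming the
      allowed directories are already canonical.  The helper names is_under()
      and path_allowed() are hypothetical and are not taken from mserver.c;
      only the use of realpath() comes from the text.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if `path` equals `dir` or lies beneath it (both canonical). */
static int is_under(const char *path, const char *dir)
{
    size_t n = strlen(dir);

    assert(n > 0 && dir[0] == '/');
    if (n > 1 && dir[n - 1] == '/')     /* ignore one trailing slash */
        n--;
    if (strncmp(path, dir, n) != 0)
        return 0;
    /* "/proj/foo" must not match "/proj/foobar" */
    return n == 1 || path[n] == '\0' || path[n] == '/';
}

/* Resolve symlinks and "..", then test against each allowed directory. */
static int path_allowed(const char *path, const char *const dirs[],
                        size_t ndirs)
{
    char resolved[PATH_MAX];
    size_t i;

    if (path[0] != '/' || realpath(path, resolved) == NULL)
        return 0;                       /* not a pathname, or unresolvable */
    for (i = 0; i < ndirs; i++)
        if (is_under(resolved, dirs[i]))
            return 1;
    return 0;
}
```

      Comparing on a path-component boundary is what keeps a request for
      /proj/foobar from slipping through an allowance for /proj/foo.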
      
      The master server forks a frisbeed for each valid request, if one is not
      already running.  The multicast address selection is still based on the
      emulab_indicies.frisbee_index field, but the address/port/server info is no
      longer stored in the frisbee_blobs table (frisbee_pid, load_address,
      load_busy are not set).
      
      Note that this is not yet integrated in the os_load path.  Further work is
      required to replace frisbeelauncher.
  2. 28 Sep, 2009 1 commit
    • Changes: · 8fd4b67e
      Mike Hibler authored
      Support for jumbo packets.  Setting WITH_JUMBO on the make command line
      will change the image block size to 8192 bytes and reduce the number of
      blocks per chunk to 256 (to maintain the 1MB chunk size for compat with old
      images).  The default is still 1024.
      
      Added the notion of a "dubious" chunk buffer in the client.  If an incoming
      chunk buffer is marked as CHUNK_DUBIOUS, then its contents can be evicted and
      the buffer reused for a more promising chunk.  This is a crude replacement
      mechanism that is currently only used in one place: if we miss part of a
      chunk and the server switches to sending a new chunk for which we have no
      free buffer, we switch to collecting the new chunk.  The reasoning is that
      it will take a while for the server to switch back to completing the former
      chunk, during which time it may send one or more complete chunks that we
      could more fruitfully use (decompress and write out).
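
      The replacement policy above could be sketched roughly as follows.
      Only the CHUNK_DUBIOUS name comes from this commit; the other states,
      the struct layout and get_chunk_buf() are assumptions made for
      illustration, not the client's actual data structures.

```c
#include <assert.h>
#include <stddef.h>

enum chunk_state { CHUNK_FREE, CHUNK_FILLING, CHUNK_DUBIOUS, CHUNK_FULL };

struct chunk_buf {
    enum chunk_state state;
    int chunkno;                /* chunk being collected, or -1 */
};

/*
 * Pick a buffer for a newly seen chunk: prefer a free buffer, otherwise
 * evict the first buffer marked dubious.  Returns NULL if neither exists.
 */
static struct chunk_buf *get_chunk_buf(struct chunk_buf *bufs, size_t n,
                                       int chunkno)
{
    struct chunk_buf *victim = NULL;
    size_t i;

    assert(bufs != NULL);
    for (i = 0; i < n; i++) {
        if (bufs[i].state == CHUNK_FREE) {
            victim = &bufs[i];
            break;
        }
        if (bufs[i].state == CHUNK_DUBIOUS && victim == NULL)
            victim = &bufs[i];
    }
    if (victim == NULL)
        return NULL;
    victim->state = CHUNK_FILLING;      /* old contents are discarded */
    victim->chunkno = chunkno;
    return victim;
}
```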
      
      Changed the meaning of the "done" field for a chunk.  It used to mean either
      that we have completely processed the chunk or that we are currently collecting
      it.  It took additional work (scanning all chunk buffers) to differentiate
      these cases, so I made it explicit.
      
      Allow the client and server to dynamically determine the maximum socket
      buffer size.
      
      Fix a couple more on-the-wire data structure size/alignment issues that
      showed up on a 64-bit OS.
      
      A few minor speedups to the bitmap handling code.  Think: "rearranging deck
      chairs on the Titanic" here.  We need more serious algorithmic changes
      to scale all this code going forward.
      
      Add some more TRACE events and refine what is already there.
      
      Added some hacks to allow frisbee client/server to run on the same machine.
      We had made it remarkably hard to do this.  But then again, why would you
      want to!  Look for SAME_HOST_HACK in the makefile.
  3. 19 Aug, 2009 1 commit
  4. 25 May, 2007 1 commit
  5. 09 Jan, 2007 1 commit
    • Frisbee MFS changes: · 346c0562
      Mike Hibler authored
       * support FreeBSD 6
       * client-side changes to support enable/disable of ACPI via slicefix
       * use dynamically linked Emulab binaries in frisbee MFS (for size)
  6. 01 Dec, 2006 1 commit
    • Bug fixes from Annette DeSchon <deschon@ISI.EDU> and Keith Sklower · 8ea641c1
      Mike Hibler authored
      <sklower@vangogh.CS.Berkeley.EDU> for the following, related to the -z (zero)
      option in imageunzip/frisbee:
      
        1. For the case where a full-disk image is smaller than
           the disk the image is being unzipped onto, we added
           code to zero the area between the end of the image and
           the end of the disk.
      
        2. During the unzipping process, when zeros are being written
           at the end of a chunk, a write() that returned a length
           different from the expected value previously caused an
           infinite loop.  We noticed this problem at ISI, on a number
           of pc733s, which we suspect may have (relatively minor)
           hardware disk problems.
      
      The latter addressed a Mike-o that has existed for 4 years.  Call it failure
      resilient computing or just plain denial, but because of a botched conditional,
      I was ignoring failed writes to the disk.  This led to one of those infinite loop
      thingees if you actually had a bad disk.
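
      For reference, a write loop that cannot spin on a short or failed
      write() might look like the sketch below.  write_all() is a
      hypothetical helper, not the code in imageunzip; treating a zero-length
      write as EIO is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Write exactly len bytes; return 0 on success, -1 on error (errno set). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, just retry */
            return -1;                  /* real failure: report it */
        }
        if (n == 0) {                   /* no progress: do not loop forever */
            errno = EIO;
            return -1;
        }
        p += n;                         /* short write: advance and go on */
        len -= (size_t)n;
    }
    return 0;
}
```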
  7. 16 Nov, 2004 1 commit
  8. 12 Nov, 2004 1 commit
  9. 03 Nov, 2004 1 commit
  10. 28 Oct, 2004 1 commit
    • Minor tweaks from a one-day binge of performance analysis. · 1a76e634
      Mike Hibler authored
      The only meaningful change was to insert a sched_yield() in the frisbee
      decompressor path.  Apparently, the decompressor can run long enough to
      cause the incoming socket buffer to overflow.  I was under the assumption
      that the decompressor would not run much longer than a single time slice
      (0.001 seconds, about 8 packets) before its priority would force it to
      be context switched.  But it was running much longer than that!  Forcing
      a periodic yield seems to have taken care of this.
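
      A rough illustration of the periodic yield, assuming a per-block work
      loop.  YIELD_EVERY, process_blocks() and the stand-in
      decompress_block() are hypothetical; the real decompressor path is
      structured differently.

```c
#include <assert.h>
#include <sched.h>

#define YIELD_EVERY 8   /* hypothetical: yield after this many blocks */

/* Stand-in for the real per-block decompression work. */
static void decompress_block(unsigned blockno)
{
    (void)blockno;
}

/*
 * Run the work function over nblocks blocks, giving up the CPU every
 * YIELD_EVERY blocks so the network thread can drain the socket buffer.
 * Returns the number of yields performed.
 */
static unsigned process_blocks(unsigned nblocks, void (*work)(unsigned))
{
    unsigned i, yields = 0;

    assert(work != NULL);
    for (i = 0; i < nblocks; i++) {
        work(i);
        if ((i + 1) % YIELD_EVERY == 0) {
            sched_yield();
            yields++;
        }
    }
    return yields;
}
```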
      
      One other cause of retransmitted blocks that I saw was where the server
      was taking a long time to read data from a file (up to 0.25 seconds).
      This would stall the clients and force them to rerequest blocks (which
      they do after about 0.10 seconds).  We can improve on this by splitting
      the file reading off to a separate thread.
      
      Most other changes are related to the event logging code.
  11. 29 Sep, 2004 1 commit
  12. 10 May, 2004 1 commit
  13. 22 Mar, 2004 1 commit
  14. 08 Mar, 2004 1 commit
  15. 15 Oct, 2003 1 commit
    • Uniform syslog'ing. Change everything I could find to use a syslog facility · cc6d6fa7
      Mike Hibler authored
      as defined in the defs-* file (e.g. "TBLOGFACIL=local2").  The default is
      "local5" which is what we are set up to use so you shouldn't need to mess
      with your defs- file!
      
      Perl scripts just get this value configured in when configure is run.
      C programs get the value in two ways.  For programs that are intimate with
      the testbed infrastructure, and include "config.h", they just get it from
      that file.  For programs that we sometimes use outside the Emulab build
      environment (e.g., frisbee, capture) and that don't include config.h,
      the value is set via a "-DLOG_TESTBED=..." in the GNUmakefile build line.
      If the value isn't set, it defaults to what it used to be (usually LOG_USER).
      
      Still to do: healthd, hmcd (whose build doesn't seem to be completely
      integrated) and plabdaemon.in (since it's icky Python :-)
  16. 14 Jun, 2003 4 commits
  17. 09 Apr, 2003 2 commits
    • Once again remove the hacky/hokey LOSSRATE code, · 8f349249
      Mike Hibler authored
      hopefully never to return again!
    • Frisbee general: · 9e55b0b1
      Mike Hibler authored
      1. Implement PREQUEST message which passes a bit map of desired blocks.
         We still use the REQUEST message (start block + number of blocks) for
         full chunk requests as that is more efficient.  This message also
         includes a flag indicating whether it is a retry of a request we
         originally made or not.  This gives the server more accurate loss info.
      
      2. More stats and tracing goo.
      
      
      Frisbee client:
      
      1. Add 'C' and 'W' command line options to specify amount of memory
         for chunk buffers (network buffering) and for write buffers (disk
         buffering).  The Emulab frisbee startup script uses these to partition
         up all the available memory on a machine.  Previously we were just
         using a fixed ~128MB even though our machines have 256 or 512MB of
         memory.  Also add the 'M' option which specifies the overall memory,
         internally dividing it up between chunk buffers and write buffers.
      
      2. Add 'S' command line option to explicitly specify the server.  This
         allows us to make a feeb...um, "lightweight" authentication check
         on incoming messages.
      
      3. Use the common BlockMap data struct to track which pieces of a chunk
         we have received.  This is easily inverted to make PREQUESTS and it is
         also smaller than the older byte-per-block technique.
      
      4. Allow partial request-ahead.  Previously, we only issued request-ahead
         if there were enough empty chunk buffers for a maximum (2) request-ahead.
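
      To illustrate why a bitmap of received blocks is "easily inverted" into
      a PREQUEST, here is a small sketch.  The blockmap_t layout and the bm_*
      helpers are assumptions for illustration and are not the real BlockMap
      API; 1024 blocks per chunk assumes the default 1024-byte block and 1MB
      chunk.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOCKS_PER_CHUNK 1024   /* assumed: 1MB chunk of 1024-byte blocks */

/* One bit per block: 128 bytes instead of 1024 with a byte per block. */
typedef struct {
    uint8_t bits[BLOCKS_PER_CHUNK / 8];
} blockmap_t;

static void bm_set(blockmap_t *m, unsigned b)
{
    assert(b < BLOCKS_PER_CHUNK);
    m->bits[b / 8] |= (uint8_t)(1u << (b % 8));
}

static int bm_test(const blockmap_t *m, unsigned b)
{
    assert(b < BLOCKS_PER_CHUNK);
    return (m->bits[b / 8] >> (b % 8)) & 1;
}

/* The blocks still wanted are exactly the complement of those received. */
static void bm_invert(const blockmap_t *have, blockmap_t *want)
{
    size_t i;

    for (i = 0; i < sizeof(have->bits); i++)
        want->bits[i] = (uint8_t)~have->bits[i];
}

static unsigned bm_count(const blockmap_t *m)
{
    unsigned b, n = 0;

    for (b = 0; b < BLOCKS_PER_CHUNK; b++)
        n += (unsigned)bm_test(m, b);
    return n;
}
```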
      
      Frisbee server:
      
      1. Use BlockMap for workQ elements.  An easy way to allow a complete merge
         of incoming requests with existing ones.
      
      2. Check for overlap of incoming requests with the request currently
         being serviced.  This happens surprisingly often.
      
      3. Dubious: burst gap becomes burst interval.  The latter takes into
         account the time required to read data, etc., in other words, we now
         have variable-sized gaps and put out bursts at specific times rather
         than having fixed gaps and putting out bursts at variable times.
         This gives us more accurate pacing over shorter time periods.  I
         thought this might be important for dynamic pacing.
      
      4. Add 'W' command line option to specify a target bandwidth.  Frisbeed
         will use this to calculate a burst size/interval.
      
      5. Rewrote the dynamic pacing code.  It is now easily as bad as before
         if not worse.  But it does have fewer magic constants!  Needs to be
         redone by someone who understands the TCP-friendly rate equation.
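
      One plausible way to turn a target bandwidth into a burst interval, and
      to schedule bursts at fixed times rather than after fixed gaps, is
      sketched below.  Both functions are hypothetical, ignore packet header
      overhead, and are not the calculation frisbeed actually performs.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Microseconds between burst start times so that sending burst_blocks
 * blocks of block_bytes each per interval averages bandwidth_bps.
 */
static uint64_t burst_interval_us(uint64_t bandwidth_bps,
                                  unsigned burst_blocks,
                                  unsigned block_bytes)
{
    uint64_t bits = (uint64_t)burst_blocks * block_bytes * 8;

    assert(bandwidth_bps > 0);
    return bits * 1000000ULL / bandwidth_bps;
}

/*
 * Schedule relative to the previous scheduled start, not to when the
 * previous burst finished, so time spent reading data shrinks the gap
 * instead of stretching the period.
 */
static uint64_t next_burst_time_us(uint64_t prev_start_us,
                                   uint64_t interval_us)
{
    return prev_start_us + interval_us;
}
```

      With these assumptions, a 100Mb/s target and bursts of 16 blocks of
      1024 bytes work out to an interval of about 1.3ms.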
      
      Imagezip:
      
      1. add 'R' option to specify one or more partitions for which to force
         raw (naive) compression even if the FS format is understood.  Useful
         for benchmarking.
      
      2. add 'D' option to allow "dangerous" writes.  In this mode, we don't
         do the fsync's or retries of failed writes.  Overrides the hack we put
         in for NFS.  Use this if writing to a local filesystem (or /dev/null).
      
      3. Eliminate an extra copy of every chunk header.
      
      Imageunzip:
      
      1. Eliminate extra copy of decompressed data that we were doing between
         the single decompression buffer and the disk buffers.  Helps on slow
         machines (like gatech's 300MHz machines with 66MHz memory bus).
      
      2. Allow dynamic number of variable-sized write buffers.  Total memory
         not to exceed the writebufmem limit.  Previously we had a small number
         of fixed-size (256K) buffers.
      
      3. Add debugging 'C' option to just compute a single CRC of the decompressed
         image.  Back-ported to older imageunzip and used to make sure my write
         buffer changes were correct.  Maybe handy for similar massive changes
         in the future.
  18. 06 Jan, 2003 1 commit
  19. 11 Dec, 2002 1 commit
    • Server: back to using a condvar since they seem to be fixed. · 2e77122f
      Mike Hibler authored
      Server: make file readsize independent of burstsize (previously
      readsize had to be a divisor of burstsize).  A subtle side-effect
      is that the dynamic burst rate is recalculated at the conclusion
      of every burst instead of after every readsize count of blocks has
      been sent (less than a burst).  This just seems to be more logical.
      
      Client: add "-T DOS-type" option to tell frisbee, when in slice
      mode, to set the type of the slice in the DOS partition table.
      This is useful if you are dropping say a BSD filesystem into
      an unused slice, you don't have to go back later and set this
      with fdisk.  Considered making this info part of the image
      itself (recorded by imagezip when creating a slice image),
      but decided against it.
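
      Setting a slice type amounts to rewriting one byte of the MBR partition
      table.  The sketch below uses the standard MBR layout (table at offset
      446, 16-byte entries, type byte at offset 4 within an entry);
      set_slice_type() is a hypothetical helper, not the client's code, and
      how -T parses its argument is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBR_SIZE        512
#define MBR_PTABLE_OFF  446     /* start of the four partition entries */
#define MBR_PENTRY_SIZE 16
#define MBR_PTYPE_OFF   4       /* type byte within an entry */

/*
 * Set the type byte of DOS slice 1..4 in an in-memory copy of the MBR.
 * Returns 0 on success, -1 on a bad slice number or missing boot signature.
 */
static int set_slice_type(uint8_t mbr[MBR_SIZE], int slice, uint8_t type)
{
    assert(mbr != NULL);
    if (slice < 1 || slice > 4)
        return -1;
    if (mbr[510] != 0x55 || mbr[511] != 0xAA)
        return -1;
    mbr[MBR_PTABLE_OFF + (slice - 1) * MBR_PENTRY_SIZE + MBR_PTYPE_OFF] = type;
    return 0;
}
```

      For example, 0xA5 is the type conventionally used for FreeBSD slices.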
  20. 26 Nov, 2002 2 commits
    • Remove hokey loss rate code · 1f897669
      Mike Hibler authored
    • Commit of USENIX driven improvements: · 2ff95cee
      Mike Hibler authored
      1. Client: add "NAK avoidance."  We track our (and others', via snooping) block
         requests and avoid making re-requests unless it has been "long enough."
      
      2. Server: more aggressive merging of requests in the work queue.  For every
         new request, look for any overlap with an existing entry.
      
      3. Server: from Leigh: first cut at dynamic rate adjustment.  Can be enabled
         with -D option.
      
      4. Both: change a lot of the magic constants into runtime variables so that
         they can be adjusted on the command line or via the event interface (see
         below).
      
      5. Add code to do basic validation of incoming packets.
      
      6. Client: randomization of block request order is now optional.
      
      7. Client: startup delay is optional and specified via a parameter N which
         says "randomly delay between 0 and N seconds before attempting to join."
      
      8. Both: add a new LEAVE message which reports back all the client stats to
         the server (which logs them).
      
      9. Both: attempt to comment some of the magic values in decls.h.
      
      10. Both: add cheezy hack to fake packet loss.  Disabled by default, see
         the GNUmakefile.  This code is coming out right after I archive it with
         this commit.
      
      11. Add tracing code.  Frisbee server/client will record a number of
         interesting events in a memory buffer and dump them at the end.  Not
         compiled in by default, see the GNUmakefile (NEVENTS) for turning this on.
      
      12. Not to be confused with the events above, also added testbed event system
         code so that frisbee clients can be remotely controlled.  This is a hack
         for measurement purposes (it requires a special rc.frisbee in the frisbee
         MFS).  Allows changing of all sorts of parameters as well as implementing
         a crude form of identification allowing you to start only a subset of
         clients.  Interface is via tevc with commands like:
      	tevc -e testbed,frisbee now frisbee start maxclients=5 readahead=5
      	tevc -e testbed,frisbee now frisbee stop exitstatus=42
         Again, this is not compiled in by default as it makes the client about
         4x bigger.  See the GNUmakefile for turning it on.
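
      The NAK avoidance idea from item 1 might be sketched like this.  The
      per-chunk granularity, NCHUNKS, the struct and both function names are
      assumptions for illustration; the holdoff value is a parameter because
      the text only says "long enough."

```c
#include <assert.h>
#include <stdint.h>

#define NCHUNKS 2048    /* hypothetical upper bound on chunks per image */

struct nak_state {
    uint64_t last_req_us[NCHUNKS];  /* time of most recent request seen */
    uint8_t  seen[NCHUNKS];         /* has anyone requested this chunk? */
};

/* Record a request for a chunk, whether ours or one we snooped. */
static void note_request(struct nak_state *s, unsigned chunk, uint64_t now_us)
{
    assert(chunk < NCHUNKS);
    s->last_req_us[chunk] = now_us;
    s->seen[chunk] = 1;
}

/* Re-request only if nobody has asked for the chunk within holdoff_us. */
static int should_rerequest(const struct nak_state *s, unsigned chunk,
                            uint64_t now_us, uint64_t holdoff_us)
{
    assert(chunk < NCHUNKS);
    if (!s->seen[chunk])
        return 1;
    return now_us - s->last_req_us[chunk] >= holdoff_us;
}
```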
  21. 31 Oct, 2002 1 commit
  22. 07 Jul, 2002 1 commit
  23. 10 Jan, 2002 1 commit
  24. 07 Jan, 2002 1 commit
    • Checkpoint first working version of Frisbee Redux. This version · 86efdd9e
      Leigh Stoller authored
      requires the linux threads package to give us kernel level pthreads.
      
      From: Leigh Stoller <stoller@fast.cs.utah.edu>
      To: Testbed Operations <testbed-ops@fast.cs.utah.edu>
      Cc: Jay Lepreau <lepreau@cs.utah.edu>
      Subject: Frisbee Redux
      Date: Mon, 7 Jan 2002 12:03:56 -0800
      
      Server:
      The server is multithreaded. One thread takes in requests from the
      clients, and adds the request to a work queue. The other thread processes
      the work queue in fifo order, spitting out the desired block ranges. A
      request is a chunk/block/blockcount tuple, and most of the time the clients
      are requesting complete 1MB chunks. The exception of course is when
      individual blocks are lost, in which case the clients request just those
      subranges.  The server is totally asynchronous; it maintains a list of who
      is "connected", but that's just to make sure we can time the server out
      after a suitable inactive time. The server really only cares about the work
      queue; as long as the queue is non-empty, it spits out data.
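
      A minimal sketch of such a work queue, shared between a receiving
      thread and a sending thread.  The chunk/block/blockcount tuple comes
      from the description above; the fixed-size ring, WQ_SIZE and the wq_*
      names are assumptions, and a real sender would likely block on a
      condition variable rather than poll.

```c
#include <assert.h>
#include <pthread.h>

#define WQ_SIZE 64      /* hypothetical queue capacity */

struct request {
    int chunk;          /* which chunk */
    int block;          /* first block within the chunk */
    int count;          /* number of blocks wanted */
};

struct workq {
    struct request items[WQ_SIZE];
    unsigned head, tail, len;
    pthread_mutex_t lock;
};

static void wq_init(struct workq *q)
{
    q->head = q->tail = q->len = 0;
    pthread_mutex_init(&q->lock, NULL);
}

/* Called by the request thread; returns 0 if the queue is full. */
static int wq_put(struct workq *q, struct request r)
{
    int ok = 0;

    assert(r.count > 0);
    pthread_mutex_lock(&q->lock);
    if (q->len < WQ_SIZE) {
        q->items[q->tail] = r;
        q->tail = (q->tail + 1) % WQ_SIZE;
        q->len++;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Called by the sending thread; returns 0 if there is no work. */
static int wq_get(struct workq *q, struct request *out)
{
    int ok = 0;

    pthread_mutex_lock(&q->lock);
    if (q->len > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % WQ_SIZE;
        q->len--;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}
```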
      
      Client:
      The client is also multithreaded. One thread receives data packets and
      stuffs them in a chunkbuffer data structure. This thread also requests more
      data, either to complete chunks with missing blocks, or to request new
      chunks. Each client can read ahead up to 2 chunks, although with multiple
      clients it might actually be much further ahead as it also receives chunks
      that other clients requested. I set the number of chunk buffers to 16,
      although this is probably unnecessary as I will explain below. The other
      thread waits for chunkbuffers to be marked complete, and then invokes the
      imageunzip code on that chunk. Meanwhile, the other thread is busily getting
      more data and requesting/reading ahead, so that by the time the unzip is
      done, there is another chunk to unzip. In practice, the main thread never
      goes idle after the first chunk is received; there is always a ready chunk
      for it. Perfect overlap of I/O! In order to prevent the clients from
      getting overly synchronized (and causing all the clients to wait until the
      last client is done!), each client randomizes its block request order. This
      is why we can retain the original frisbee name; clients end up catching random
      blocks flung out from the server until it has all the blocks.
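
      Randomizing the request order can be done with a standard Fisher-Yates
      shuffle, sketched here over chunk numbers.  shuffle_order() is a
      hypothetical helper, and rand() is used only for brevity; the client's
      actual randomization may differ.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Fill order[0..n-1] with 0..n-1 in random order, so that different
 * clients walk the image in different sequences.
 */
static void shuffle_order(int *order, int n)
{
    int i;

    assert(order != NULL || n == 0);
    for (i = 0; i < n; i++)
        order[i] = i;
    for (i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);   /* slight modulo bias, fine here */
        int tmp = order[i];

        order[i] = order[j];
        order[j] = tmp;
    }
}
```

      Each client would seed the generator differently (for example with
      srand() on something host-specific) so that no two clients share an
      order.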
      
      Performance:
      The single node speed is about 180 seconds for our current full image.
      Frisbee V1 compares at about 210 seconds. The two node speed was 181 and
      174 seconds. The amount of CPU used for the two node run ranged from 1% to
      4%, typically averaging about 2% while I watched it with "top."
      
      The main problem on the server side is how to keep boss (1GHz with a Gbit
      ethernet) from spitting out packets so fast that 1/2 of them get dropped. I
      eventually settled on a static 1ms delay every 64K of packets sent. Nothing
      to be proud of, but it works.
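
      The static throttle could be as simple as the sketch below, which reads
      "64K of packets" as 64KB of data sent; that reading, the struct and the
      names are assumptions, not the server's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define PACE_BYTES    (64 * 1024)   /* assumed: pause after this much data */
#define PACE_DELAY_NS 1000000L      /* 1ms */

struct pacer {
    size_t sent_since_pause;
    unsigned pauses;
};

/* Account for nbytes just sent, sleeping 1ms each time 64KB has gone out. */
static void pacer_account(struct pacer *p, size_t nbytes)
{
    assert(p != NULL);
    p->sent_since_pause += nbytes;
    if (p->sent_since_pause >= PACE_BYTES) {
        struct timespec ts = { 0, PACE_DELAY_NS };

        nanosleep(&ts, NULL);
        p->sent_since_pause = 0;
        p->pauses++;
    }
}
```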
      
      As mentioned above, the number of chunk buffers is 16, although only a few
      of them are used in practice. The reason is that the network transfer speed
      is perhaps 10 times faster than the decompression and raw device write
      speed. To know for sure, I would have to figure out the per byte transfer
      rate for 350 MB via network, versus the time to decompress and write the
      1.2GB of data to the raw disk. With such a big difference, it's only
      necessary to ensure that you stay 1 or 2 chunks ahead, since you can
      request 10 chunks in the time it takes to write one of them.