1. 19 Dec, 2017 1 commit
      Revenge of the Delta Images. · a79af843
      Mike Hibler authored
      Can't live with 'em, can't kill 'em dead... When writing my hack
      routine to convert an image path into an imageid, I failed to
      consider the .ddz (delta image) suffix.
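
      To make the failure mode concrete, here is a minimal sketch of the suffix handling only, assuming the image name is the last path component minus its suffix. The function name `image_name_from_path` is hypothetical, and whatever lookup turns that name into an imageid is not shown; the point is simply that `.ddz` has to be stripped along with the usual `.ndz`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: reduce an image path such as "/some/dir/foo.ddz"
 * to the bare image name "foo". Both the full-image (.ndz) and the
 * delta-image (.ddz) suffixes are stripped.
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
static int image_name_from_path(const char *path, char *out, size_t outlen)
{
    static const char *suffixes[] = { ".ndz", ".ddz" };
    const char *base = strrchr(path, '/');
    size_t len;

    base = base ? base + 1 : path;          /* last path component */
    len = strlen(base);

    for (size_t i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++) {
        size_t slen = strlen(suffixes[i]);
        if (len > slen && strcmp(base + len - slen, suffixes[i]) == 0) {
            len -= slen;                    /* drop the image suffix */
            break;
        }
    }
    if (len + 1 > outlen)
        return -1;
    memcpy(out, base, len);
    out[len] = '\0';
    return 0;
}
```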
  2. 17 Jan, 2017 1 commit
      Implement heartbeat/status reports in Frisbee. · 2be46ba4
      Mike Hibler authored
      There are three pieces here: a change to the frisbee protocol itself, an
      Emulab event component to get status back to the portal, and the surrounding
      infrastructure to make it all work.
      
      Frisbee heartbeat messages:
      
      Added a new message type to the frisbee protocol, "Progress". In theory it
      operates by having the server send a multicast progress request to its clients
      which includes an interval at which to report (or "just once") and an
      indication of what to report (nothing, progress summary, or full stats). The
      client then sends unicast "fire and forget" UDP replies according to that
      schedule. However, I took a shortcut for the moment and just added a command
      line option to the client to tell it to report a summary at the indicated
      interval (-H <interval>).  So the server never sends requests.
      
      This is implemented in the client by a fourth thread since I wanted it to
      operate independently of packet reception (which would cause clients to report
      in a highly synchronized fashion due to multicast). The server instance just
      records progress reports in its log.
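
      A minimal pthreads sketch of such a reporter thread, assuming the other client threads update shared counters under a mutex. The struct layout, `reporter_thread`, and the `maxreports` knob (there only so the loop can be bounded) are hypothetical, and the unicast UDP send is left as a comment. The thread wakes on its own timer, so reports are not paced by multicast packet arrival.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Counters updated by the receive/decompress/write threads (hypothetical). */
struct progress {
    pthread_mutex_t lock;
    uint32_t chunks_recv;
    uint32_t chunks_decomp;
    uint64_t bytes_written;
};

struct reporter {
    struct progress *prog;
    unsigned interval;      /* seconds between reports, as with -H <interval> */
    int maxreports;         /* hypothetical: stop after this many; <= 0 = forever */
    uint32_t sent;          /* reports produced so far (doubles as SEQUENCE) */
    uint64_t last_bytes;    /* bytes_written carried in the latest report */
};

static void *reporter_thread(void *arg)
{
    struct reporter *r = arg;

    while (r->maxreports <= 0 || r->sent < (uint32_t)r->maxreports) {
        sleep(r->interval);             /* own clock, not packet arrival */

        pthread_mutex_lock(&r->prog->lock);
        uint64_t bytes = r->prog->bytes_written;   /* snapshot */
        pthread_mutex_unlock(&r->prog->lock);

        /* A real client would build the summary and sendto() it here,
         * fire-and-forget, to the server's unicast address. */
        r->last_bytes = bytes;
        r->sent++;
    }
    return NULL;
}
```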
      
      This protocol addition should be fully backward compatible as both client and
      server ignore (but log) unknown messages.
      
      Emulab progress report events:
      
      When this is compiled in (-DEMULAB_EVENTS) and turned on (-E <server>), each
      frisbee server instance will send a FRISBEEPROGRESS event to the indicated
      event server for every progress report it receives (in addition to logging the
      report in its own log). Right now it will create an event with key/value pairs
      for the information in a client summary reply:
      
      TSTAMP is the client's time at which it sends the event. Could be used by the
      receiver to determine the latency of the report if it cared (and if it assumed
      that the clocks are in sync). We don't care about this.
      
      SEQUENCE is the report number. Again, could be used by the receiver, in this
      case to detect loss, if it cared. We don't.
      
      CHUNKS_RECV is the number of complete chunks that the client has received from
      the network. CHUNKS_DECOMP is the number of chunks decompressed by the client.
      BYTES_WRITTEN is the number of bytes written to disk by the client.
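
      A sketch of how a server might flatten one summary reply into the key/value pairs listed above. The struct, its field widths, and the space-separated KEY=value rendering are assumptions for illustration, not the actual encoding used by the Emulab event system.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one client summary report. */
struct progress_summary {
    uint32_t tstamp;        /* client's send time (seconds) */
    uint32_t sequence;      /* report number */
    uint32_t chunks_recv;   /* complete chunks received from the network */
    uint32_t chunks_decomp; /* chunks decompressed */
    uint64_t bytes_written; /* bytes written to disk */
};

/* Render the summary as space-separated KEY=value pairs.
 * Returns what snprintf returns (length needed, excluding the NUL). */
static int format_progress(char *buf, size_t len,
                           const struct progress_summary *ps)
{
    return snprintf(buf, len,
                    "TSTAMP=%" PRIu32 " SEQUENCE=%" PRIu32
                    " CHUNKS_RECV=%" PRIu32 " CHUNKS_DECOMP=%" PRIu32
                    " BYTES_WRITTEN=%" PRIu64,
                    ps->tstamp, ps->sequence, ps->chunks_recv,
                    ps->chunks_decomp, ps->bytes_written);
}
```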
      
      Any of the three can be used by the event receiver as an indication of life
      and/or progress. However, only the last would be a reasonable indicator of
      time remaining since it is the last (and slowest) phase of imaging. To
      estimate time remaining we could compare that value to the amount of
      uncompressed data that is in the image. This makes the sketchy assumptions
      that disk write times are uniform and that the number and distance of seeks
      are uniform, but it is better than a sharp stick in the eye.
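
      The time-remaining estimate described above is plain linear extrapolation. A minimal sketch, assuming the receiver knows the image's uncompressed data size and when the client started writing (`est_remaining` is a hypothetical name):

```c
#include <stdint.h>

/*
 * Naive linear estimate of seconds remaining, assuming (as warned above)
 * that disk writes proceed at a uniform rate. `written` is the latest
 * BYTES_WRITTEN, `total` the uncompressed data size of the image, and
 * `elapsed` the seconds since the client started writing.
 * Returns -1.0 if there is not yet enough information to estimate.
 */
static double est_remaining(uint64_t written, uint64_t total, double elapsed)
{
    if (written >= total)
        return 0.0;
    if (written == 0 || elapsed <= 0.0)
        return -1.0;

    double rate = (double)written / elapsed;    /* bytes/sec so far */
    return (double)(total - written) / rate;
}
```

      With 250 of 1000 bytes written after 10 seconds, it predicts 30 more seconds.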
      
      Emulab infrastructure:
      
      There is a new sitevar "images/frisbee/heartbeat" which can be set to a
      non-zero value to tell the frisbee MFS to fire off frisbee with -H <value>
      and thus make reports. The default value of zero means to not make reports.
      The tmcd "loadinfo" command sends this through via the HEARTBEAT=<value>
      param.
      
      REQUIRED A TMCD VERSION BUMP TO 41.
  3. 18 Feb, 2014 1 commit
  4. 10 Feb, 2014 3 commits
      Add stat to keep track of "partial chunk drops". · 6c55bc25
      Mike Hibler authored
      These are drops of the so-called dubious chunks. Dubious chunks are those
      which we partially filled, but then we started receiving pieces of another
      chunk before completing the first. We mark that first chunk as dubious under
      the assumption that the remainder of that chunk got dropped (by us or on
      the wire) and we won't be seeing the remaining blocks for some time.
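
      A rough sketch of the dubious-chunk bookkeeping, assuming a simple array of chunk buffers; the state names, struct layout, and function names are hypothetical. A partially filled buffer is marked dubious when blocks of a different chunk show up, and evicting such a buffer for a new chunk is the event the new "partial chunk drop" stat would count.

```c
#include <stddef.h>

/* Hypothetical chunk-buffer states. */
enum cstate { CB_FREE, CB_FILLING, CB_DUBIOUS };

struct chunkbuf {
    enum cstate state;
    int chunkno;        /* chunk being collected in this buffer */
    int blocks_got;     /* blocks received so far */
};

/* A block of `chunkno` arrived while `cur` is still incomplete:
 * assume the rest of `cur` was lost and mark it dubious. */
static void maybe_mark_dubious(struct chunkbuf *cur, int chunkno,
                               int chunkblocks)
{
    if (cur != NULL && cur->state == CB_FILLING &&
        cur->chunkno != chunkno && cur->blocks_got < chunkblocks)
        cur->state = CB_DUBIOUS;
}

/* Get a buffer for a new chunk: prefer a free one, otherwise evict a
 * dubious one and count it as a partial chunk drop. */
static struct chunkbuf *alloc_chunkbuf(struct chunkbuf *bufs, size_t n,
                                       int chunkno, unsigned *partial_drops)
{
    struct chunkbuf *victim = NULL;

    for (size_t i = 0; i < n; i++) {
        if (bufs[i].state == CB_FREE) {
            victim = &bufs[i];          /* free always wins */
            break;
        }
        if (victim == NULL && bufs[i].state == CB_DUBIOUS)
            victim = &bufs[i];          /* remember first dubious one */
    }
    if (victim == NULL)
        return NULL;                    /* nothing free or evictable */
    if (victim->state == CB_DUBIOUS)
        (*partial_drops)++;
    victim->state = CB_FILLING;
    victim->chunkno = chunkno;
    victim->blocks_got = 0;
    return victim;
}
```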
      Make socket buffer size more configurable. · 2f1a8340
      Mike Hibler authored
      It appeared to be before, but wasn't really. The -k option for both client
      and server will set the max socketbuf size in KB (NOTE: THIS USED TO BE MB!)
      The actual socketbuf size will then be the min of that and what the system
      supports.
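
      A minimal sketch of "ask for -k KB, settle for what the system allows", assuming the kernel rejects oversized requests so that halving converges; `set_sockbuf` is a hypothetical name. On kernels that silently clamp instead (Linux does), the first request simply succeeds, and `getsockopt()` is the way to learn the effective size.

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: request maxkb KB of socket buffer (opt is
 * SO_RCVBUF or SO_SNDBUF), halving the request until setsockopt()
 * accepts it. Returns the accepted request size in bytes, or -1.
 * Note: a kernel that clamps rather than rejects will accept the first
 * request, so the returned value may differ from the size in effect.
 */
static int set_sockbuf(int sock, int opt, int maxkb)
{
    int size = maxkb * 1024;

    while (size >= 1024) {
        if (setsockopt(sock, SOL_SOCKET, opt, &size, sizeof(size)) == 0)
            return size;
        size /= 2;                      /* too big for this system */
    }
    return -1;
}
```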
      
      The client stats now include the sockbuf size of the run.
      Mega-commit renaming some symbols in utils.c. · 56d81336
      Mike Hibler authored
      To avoid namespace conflicts (e.g., with libm's "log" function).
  5. 24 Sep, 2012 1 commit
      Replace license symbols with {{{ }}}-enclosed license blocks. · 6df609a9
      Eric Eide authored
      This commit is intended to make the license status of Emulab and
      ProtoGENI source files more clear.  It replaces license symbols like
      "EMULAB-COPYRIGHT" and "GENIPUBLIC-COPYRIGHT" with {{{ }}}-delimited
      blocks that contain actual license statements.
      
      This change was driven by the fact that today, most people acquire and
      track Emulab and ProtoGENI sources via git.
      
      Before the Emulab source code was kept in git, the Flux Research Group
      at the University of Utah would roll distributions by making tar
      files.  As part of that process, the Flux Group would replace the
      license symbols in the source files with actual license statements.
      
      When the Flux Group moved to git, people outside of the group started
      to see the source files with the "unexpanded" symbols.  This meant
      that people acquired source files without actual license statements in
      them.  All the relevant files had Utah *copyright* statements in them,
      but without the expanded *license* statements, the licensing status of
      the source files was unclear.
      
      This commit is intended to clear up that confusion.
      
      Most Utah-copyrighted files in the Emulab source tree are distributed
      under the terms of the GNU Affero General Public License, version 3
      (AGPLv3).
      
      Most Utah-copyrighted files related to ProtoGENI are distributed under
      the terms of the GENI Public License, which is a BSD-like open-source
      license.
      
      Some Utah-copyrighted files in the Emulab source tree are distributed
      under the terms of the GNU Lesser General Public License, version 2.1
      (LGPL).
  6. 26 Apr, 2012 1 commit
  7. 20 Mar, 2012 1 commit
  8. 26 Aug, 2011 1 commit
  9. 27 May, 2011 1 commit
  10. 18 May, 2011 1 commit
      Support image PUT (aka, "upload") and assorted minor changes. · 77dbad39
      Mike Hibler authored
      1. Support for PUT.
      
      The big change is support for uploading via the master server, based heavily
      on the prototype that Grant did. Currently only host-based (IP-based)
      authentication is done as is the case with download. Grant's SSL-based
      authentication code is "integrated" but has not even been compiled in.
      
      The PUT protocol allows for assorted gewgaws, like specifying a maximum size,
      setting a timeout value, returning size and signature info, etc.
      
      There is a new, awkwardly-named client utility "frisupload" which, like the
      download client, takes an "image ID" as an argument and requests to upload
      (PUT) that image via the master server. As with download, the image ID can
      be either of the form "<pid>/<emulab-image-name>", to upload/update an actual
      Emulab image or it can start with a "/" in which case it is considered to
      be a pathname on the server.
      
      On the server side, the master server takes PUT requests, verifies permission
      to upload the image, fires up a separate instance of an upload daemon (with
      the even catchier moniker "frisuploadd"), and returns the unicast addr/port
      info to the client which then begins the upload. The master server also acts
      as a traffic cop to make sure that downloads and uploads (or uploads and
      uploads) don't overlap.
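
      The traffic-cop rule amounts to reader/writer admission per image: any number of downloads may share an image, but an upload needs it to itself. A minimal sketch, assuming a per-image state struct in the master server (names hypothetical; how concurrent downloads share a multicast session is not modeled):

```c
#include <stdbool.h>

/* Hypothetical per-image transfer state kept by the master server. */
struct image_xfer {
    int  downloads;   /* active GETs */
    bool uploading;   /* a PUT is in progress */
};

/* A GET may proceed unless an upload is rewriting the image. */
static bool admit_get(struct image_xfer *x)
{
    if (x->uploading)
        return false;
    x->downloads++;
    return true;
}

/* A PUT needs the image to itself: no downloads, no other upload. */
static bool admit_put(struct image_xfer *x)
{
    if (x->uploading || x->downloads > 0)
        return false;
    x->uploading = true;
    return true;
}

static void finish_get(struct image_xfer *x)
{
    if (x->downloads > 0)
        x->downloads--;
}

static void finish_put(struct image_xfer *x)
{
    x->uploading = false;
}
```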
      
      This has been integrated into the Emulab "create image" process in a
      backward-compatible way (i.e., so old admin MFSes will continue to work).
      Boy, was that fun. One not-so-desirable effect of this integration is that
      images now traverse our network twice, once to upload from node to boss and
      once for boss to write out the image file across NFS to ops. This is not
      really something that should be "fixed" in frisbee, it is only "undesirable"
      because we have a crappy NFS server.
      
      What has NOT been done includes: support of hierarchical PUT operations
      (we don't need it for either the elabinelab or subboss case), support for
      uploading standard images stored on boss (we really want something better
      than host-based authentication here), and the aforementioned support of
      SSL-based authentication.
      
      2. Other tidbits that got mixed in with PUT support:
      
      Added two new site variables:
          images/frisbee/maxrate_std
          images/frisbee/maxrate_usr
      which replace the hardwired (in mfrisbeed and frisbeelauncher before that)
      bandwidth limits for image download. mfrisbeed reads these (and the
      images/create/* variables) when it starts up or receives a HUP signal.
      These could be read from the DB on every GET/PUT, but they really don't change
      much and I needed something to test the reread-the-config-on-a-HUP code!
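
      The usual way to reread a config on HUP safely is to have the signal handler only set a flag and let the main loop do the rereading. A sketch of that pattern using standard `sigaction`; the reload itself is a hypothetical stub, and this is not necessarily how mfrisbeed structures it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t config_stale = 0;

/* Async-signal-safe: only set a flag; the main loop does the real work. */
static void on_hup(int sig)
{
    (void)sig;
    config_stale = 1;
}

static int install_hup_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_hup;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGHUP, &sa, NULL);
}

/* Call between GET/PUT requests. Returns 1 if a reload happened. */
static int maybe_reload_config(void)
{
    if (!config_stale)
        return 0;
    config_stale = 0;
    /* reload_sitevars();  hypothetical: reread images/frisbee/maxrate_*
     * and images/create/* from the DB */
    return 1;
}
```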
      
      Fixed avoidance of "problematic multicast addresses" so it would actually
      work as intended.
      
      Lots of internal "refactoring" to make up for things I did wrong the first
      time and to give the general impression that "Wow, Mike did a LOT!"
  11. 17 Jan, 2011 1 commit
      Random Windows (Cygwin) fixes. · 975cc86e
      Mike Hibler authored
      Client-side builds again, but haven't got node to boot correctly.
      Need to get pubsubd installed correctly as a service in place of elvind.
  12. 13 Jan, 2011 1 commit
  13. 13 Dec, 2010 1 commit
      Aerformed. If the mserver receiving the request allows the caller to be a · c5b7cceb
      Mike Hibler authored
      proxy, then it will perform its checks against the hostIP provided rather
      than the IP of the message sender.  For the Emulab subboss case, subboss
      nodes (as determined from the DB) are allowed to explicitly specify hosts
      that are under their control.  The master server on real boss will run with
      proxying enabled.
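
      The proxy rule above reduces to choosing which address the permission checks run against. A minimal sketch with hypothetical names, treating addresses as plain 32-bit values; the separate check that a subboss only names hosts under its control is assumed to happen elsewhere and is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Pick the IP address that permission checks should use. `sender` is the
 * source address of the request; `hostip` is the host the caller claims
 * to act for (0 = not supplied). The claim only counts when this server
 * runs with proxying enabled and the caller is a known proxy.
 */
static uint32_t check_ip(uint32_t sender, uint32_t hostip,
                         bool proxy_enabled, bool caller_is_proxy)
{
    if (proxy_enabled && caller_is_proxy && hostip != 0)
        return hostip;
    return sender;
}
```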
      
      Also, in a fit of madness, I added a version number to the master server
      protocol.  This way, if at some distant point in the future (say next week)
      I realize I screwed up the protocol, I can fix it without resorting to creative
      retrofitting of a version number (see imagezip) or the more Orwellian
      eradication and denial of past versions (what I am doing now). Furthermore,
      using an ill-thought-out insight, I made the version number be an ASCII string
      in case I decide to change to an all-text protocol at some equally distant
      point in the future.
  14. 03 Dec, 2010 1 commit
      Hierarchical downloading is now implemented. · 21b01bbb
      Mike Hibler authored
      The master server can be started with a "parent" mserver that it can call
      if it doesn't have an image.  The master server will return an EAGAIN type
      error to any clients that contact it while it is downloading an image from
      its parent.  The client now has a "-B N" option to tell it to try again
      every N seconds as long as it gets back that error.  This is the "store and
      forward" mode.  The mserver also has a "cut through" style where it will
      return to clients the mcast info it got from its parent so they can download
      directly from the parent until the local mserver has it.
  15. 02 Dec, 2010 1 commit
      Inching toward recursive use, ala what is needed for elabinelab or subbosses. · 2ead2a68
      Mike Hibler authored
      Add the ability of the master server to have a "parent" from which it can
      download an image if it doesn't have it or if the image is out of date.
      Had to add some more goo to the GET reply, notably a hash so that we can
      check for up-to-dateness.
      
      The actual part where we upcall to the parent isn't done yet, that is why
      this is "inching toward" and not "leaping and bounding toward"...
      
      Also redid the child process management to not use SIGCHLD, no need for that.
  16. 29 Oct, 2010 1 commit
      Improve regular (non-image) file transfer via frisbee. · 78f3f8d3
      Mike Hibler authored
      Basically, make it possible to transfer a non-imagezip image.  Previously
      you had to wrap a regular file as an image in order to transfer it.  The
      big hang-up was that the frisbee protocol could only transfer files that
      were a multiple of 1MB (the chunk size).
      
      This commit changes the frisbee protocol slightly to allow transfer of
      non-1MB-multiple files.  The protocol change was to add a new JOIN message
      that returns the size of the file in bytes rather than in blocks.  This
      allows the client to know that the file in question is not a multiple of 1MB
      and allows it to request the correct partial number of blocks for the
      final chunk and to extract the correct amount of data from the final 1K block
      (that block is still padded to 1K by the server).  For the server side, the
      request mostly allows it to do some sanity checking.  The fact that the
      server is started with a file that is not a multiple of 1MB is what triggers
      it to know about partial chunks.  The sanity checking is that the server will
      not acknowledge clients that attempt to join with a version 1 JOIN message,
      since nothing good would come of that pairing.
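
      The size-in-bytes JOIN reply is what lets the client work out the shape of the final chunk. A sketch of that arithmetic, assuming the default geometry of 1024 blocks of 1024 bytes (macro, struct, and function names are hypothetical):

```c
#include <stdint.h>

/* Default geometry: 1024 blocks of 1024 bytes = 1MB chunks. */
#define BLOCKSIZE   1024u
#define CHUNKBLOCKS 1024u

struct xfer_layout {
    uint64_t nchunks;      /* chunks to transfer, counting a partial last one */
    uint32_t last_blocks;  /* blocks to request for the final chunk */
    uint32_t last_bytes;   /* valid bytes in the final (padded) 1K block */
};

/* Derive the transfer layout from a file size given in bytes. */
static struct xfer_layout layout_for_size(uint64_t size)
{
    struct xfer_layout l = { 0, 0, 0 };
    uint64_t nblocks;

    if (size == 0)
        return l;
    nblocks = (size + BLOCKSIZE - 1) / BLOCKSIZE;            /* round up */
    l.nchunks = (nblocks + CHUNKBLOCKS - 1) / CHUNKBLOCKS;   /* round up */
    l.last_blocks = (uint32_t)(nblocks - (l.nchunks - 1) * CHUNKBLOCKS);
    l.last_bytes = (uint32_t)(size % BLOCKSIZE);
    if (l.last_bytes == 0)
        l.last_bytes = BLOCKSIZE;                             /* full block */
    return l;
}
```

      For a file of 3MB plus 5000 bytes this gives 4 chunks, 5 blocks to request for the last chunk, and 904 valid bytes in the final padded block.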
      
      On the client side, frisbee must be invoked with the -N (nodecompress) option
      in order to issue a v2 JOIN.  See the comment in the code for the rationale,
      but it is largely a backward compat feature.
      
      While I was changing the JOIN message, I added a couple of other future
      features.  One is that by passing back a 64-bit value for the size of the
      image in bytes, we can feed bigger images.  However there is still much to
      be done to realize this.  The other was to add blocksize/chunksize fields
      in the message so that the server/client can negotiate the transfer parameters,
      e.g., 1024 blocks of 1024 bytes vs. 256 blocks of 8192 bytes, the latter being
      for "jumbo" packets on a Gb ethernet.  But there is still more to be done to
      get this working too.
  17. 28 Sep, 2009 1 commit
      Changes: · 8fd4b67e
      Mike Hibler authored
      Support for jumbo packets.  Setting WITH_JUMBO on the make command line
      will change the image block size to 8192 bytes and reduce the number of
      blocks per chunk to 256 (to maintain the 1MB chunk size for compat with old
      images).  The default is still 1024.
      
      Added the notion of a "dubious" chunk buffer in the client.  If an incoming
      chunk buffer is marked as CHUNK_DUBIOUS, then its contents can be evicted and
      the buffer reused for a more promising chunk.  This is a crude replacement
      mechanism that is currently only used in one place: if we miss part of a
      chunk and the server switches to sending a new chunk for which we have no
      free buffer, we switch to collecting the new chunk.  The reasoning is that
      it will take a while for the server to switch back to completing the former
      chunk, during which time it may send one or more complete chunks that we
      could more fruitfully use (decompress and write out).
      
      Changed the meaning of the "done" field for a chunk.  It used to mean either
      that we have completely processed the chunk or that we are currently collecting
      it.  It took additional work (scanning all chunk buffers) to differentiate
      these cases, so I made it explicit.
      
      Allow the client and server to dynamically determine the maximum socket
      buffer size.
      
      Fix a couple more on-the-wire data structure size/alignment issues that
      showed up on a 64-bit OS.
      
      A few minor speedups to the bitmap handling code.  Think: "rearranging deck
      chairs on the Titanic" here.  We need more serious algorithmic changes
      to scale all this code going forward.
      
      Add some more TRACE events and refine what is already there.
      
      Added some hacks to allow frisbee client/server to run on the same machine.
      We had made it remarkably hard to do this.  But then again, why would you
      want to!  Look for SAME_HOST_HACK in the makefile.
  18. 27 Jun, 2007 1 commit
  19. 14 Jun, 2003 1 commit
  20. 09 Apr, 2003 1 commit
      Frisbee general: · 9e55b0b1
      Mike Hibler authored
      1. Implement PREQUEST message which passes a bit map of desired blocks.
         We still use the REQUEST message (start block + number of blocks) for
         full chunk requests as that is more efficient.  This message also
         includes a flag indicating whether it is a retry of a request we
         originally made or not.  This gives the server more accurate loss info.
      
      2. More stats and tracing goo.
      
      
      Frisbee client:
      
      1. Add 'C' and 'W' command line options to specify amount of memory
         for chunk buffers (network buffering) and for write buffers (disk
         buffering).  The Emulab frisbee startup script uses these to partition
         up all the available memory on a machine.  Previously we were just
         using a fixed ~128MB even though our machines have 256 or 512MB of
         memory.  Also add the 'M' option which specifies the overall memory,
         internally dividing it up between chunk buffers and write buffers.
      
      2. Add 'S' command line option to explicitly specify the server.  This
         allows us to make a feeb...um, "lightweight" authentication check
         on incoming messages.
      
      3. Use the common BlockMap data struct to track which pieces of a chunk
         we have received.  This is easily inverted to make PREQUESTS and it is
         also smaller than the older byte-per-block technique.
      
      4. Allow partial request-ahead.  Previously, we only issued request-ahead
         if there were enough empty chunk buffers for a maximum (2) request-ahead.
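
      A sketch of a one-bit-per-block map and its inversion into a wanted-blocks map for a PREQUEST. The layout here is hypothetical and only assumes the default 1024 blocks per chunk; at one bit per block it needs 128 bytes per chunk versus 1024 for a byte-per-block array.

```c
#include <stdint.h>

#define CHUNKBLOCKS 1024    /* blocks per chunk (default geometry) */

/* Hypothetical layout: one bit per block of a chunk, set = received. */
typedef struct {
    uint32_t map[CHUNKBLOCKS / 32];
} BlockMap;

static void bm_set(BlockMap *bm, unsigned b)
{
    bm->map[b / 32] |= UINT32_C(1) << (b % 32);
}

static int bm_isset(const BlockMap *bm, unsigned b)
{
    return (bm->map[b / 32] >> (b % 32)) & 1u;
}

/* Received-map -> wanted-map: the blocks to ask for in a PREQUEST. */
static void bm_invert(const BlockMap *got, BlockMap *want)
{
    for (unsigned i = 0; i < CHUNKBLOCKS / 32; i++)
        want->map[i] = ~got->map[i];
}

static unsigned bm_count(const BlockMap *bm)
{
    unsigned n = 0;

    for (unsigned i = 0; i < CHUNKBLOCKS / 32; i++)
        for (uint32_t w = bm->map[i]; w != 0; w &= w - 1)
            n++;                        /* clear lowest set bit each pass */
    return n;
}
```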
      
      Frisbee server:
      
      1. Use BlockMap for workQ elements.  An easy way to allow a complete merge
         of incoming requests with existing ones.
      
      2. Check for overlap of incoming requests with the request currently
         being serviced.  This happens surprisingly often.
      
      3. Dubious: burst gap becomes burst interval.  The latter takes into
         account the time required to read data, etc., in other words, we now
         have variable-sized gaps and put out bursts at specific times rather
         than having fixed gaps and putting out bursts at variable times.
         This gives us more accurate pacing over shorter time periods.  I
         thought this might be important for dynamic pacing.
      
      4. Add 'W' command line option to specify a target bandwidth.  Frisbeed
         will use this to calculate a burst size/interval.
      
      5. Rewrote the dynamic pacing code.  It is now easily as bad as before
         if not worse.  But it does have fewer magic constants!  Needs to be
         redone by someone who understands the TCP-friendly rate equation.
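
      The burst-interval idea from item 3 and the target bandwidth from item 4 combine roughly as below. A minimal sketch, assuming bandwidth in bits per second and times in microseconds (both assumptions, as are the function names): the interval is derived from the target rate, and each burst is due at a fixed offset from the start rather than a fixed gap after the previous one.

```c
#include <stdint.h>

/* Microseconds between the starts of consecutive bursts needed to hold
 * `bw_bps` bits/sec when each burst carries `burst_bytes` bytes. */
static uint64_t burst_interval_us(uint64_t burst_bytes, uint64_t bw_bps)
{
    if (bw_bps == 0)
        return 0;                       /* no target: don't pace */
    return burst_bytes * 8 * 1000000 / bw_bps;
}

/* Burst k is due at start + k * interval. Time spent reading data for a
 * burst shrinks the gap instead of pushing later bursts back. */
static uint64_t burst_due_us(uint64_t start_us, uint64_t k,
                             uint64_t interval_us)
{
    return start_us + k * interval_us;
}
```

      For example, 64KB bursts at a 500Mbit/s target come out to one burst every 1048 microseconds (integer division).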
      
      Imagezip:
      
      1. add 'R' option to specify one or more partitions for which to force
         raw (naive) compression even if the FS format is understood.  Useful
         for benchmarking.
      
      2. add 'D' option to allow "dangerous" writes.  In this mode, we don't
         do the fsync's or retries of failed writes.  Overrides the hack we put
         in for NFS.  Use this if writing to a local filesystem (or /dev/null).
      
      3. Eliminate an extra copy of every chunk header.
      
      Imageunzip:
      
      1. Eliminate extra copy of decompressed data that we were doing between
         the single decompression buffer and the disk buffers.  Helps on slow
         machines (like gatech's 300MHz machines with 66MHz memory bus).
      
      2. Allow dynamic number of variable-sized write buffers.  Total memory
         not to exceed the writebufmem limit.  Previously we had a small number
         of fixed-size (256K) buffers.
      
      3. Add debugging 'C' option to just compute a single CRC of the decompressed
         image.  Back-ported to older imageunzip and used to make sure my write
         buffer changes were correct.  Maybe handy for similar massive changes
         in the future.
  21. 08 Jan, 2003 1 commit
  22. 26 Nov, 2002 1 commit
      Commit of USENIX driven improvements: · 2ff95cee
      Mike Hibler authored
      1. Client: add "NAK avoidance."  We track our (and others', via snooping) block
         requests and avoid making re-requests unless it has been "long enough."
      
      2. Server: more aggressive merging of requests in the work queue.  For every
         new request, look for any overlap with an existing entry.
      
      3. Server: from Leigh: first cut at dynamic rate adjustment.  Can be enabled
         with -D option.
      
      4. Both: change a lot of the magic constants into runtime variables so that
         they can be adjusted on the command line or via the event interface (see
         below).
      
      5. Add code to do basic validation of incoming packets.
      
      6. Client: randomization of block request order is now optional.
      
      7. Client: startup delay is optional and specified via a parameter N which
         says "randomly delay between 0 and N seconds before attempting to join."
      
      8. Both: add a new LEAVE message which reports back all the client stats to
         the server (which logs them).
      
      9. Both: attempt to comment some of the magic values in decls.h.
      
      10. Both: add cheezy hack to fake packet loss.  Disabled by default, see
         the GNUmakefile.  This code is coming out right after I archive it with
         this commit.
      
      11. Add tracing code.  Frisbee server/client will record a number of
         interesting events in a memory buffer and dump them at the end.  Not
         compiled in by default, see the GNUmakefile (NEVENTS) for turning this on.
      
      12. Not to be confused with the events above, also added testbed event system
         code so that frisbee clients can be remotely controlled.  This is a hack
         for measurement purposes (it requires a special rc.frisbee in the frisbee
         MFS).  Allows changing of all sorts of parameters as well as implementing
         a crude form of identification allowing you to start only a subset of
         clients.  Interface is via tevc with commands like:
      	tevc -e testbed,frisbee now frisbee start maxclients=5 readahead=5
      	tevc -e testbed,frisbee now frisbee stop exitstatus=42
         Again, this is not compiled in by default as it makes the client about
         4x bigger.  See the GNUmakefile for turning it on.
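
      NAK avoidance (item 1) boils down to remembering when each block was last asked for, by anyone, and holding off until that is old enough. A sketch assuming per-block timestamps in microseconds and a single holdoff threshold; the struct and function names are hypothetical, and the real client may track requests at a coarser grain.

```c
#include <stdbool.h>
#include <stdint.h>

#define NBLOCKS 1024        /* blocks tracked (one chunk's worth here) */

/* Hypothetical NAK-avoidance state: when each block was last requested,
 * by us or by another client whose request we overheard. */
struct nak_state {
    uint64_t last_req_us[NBLOCKS];   /* 0 = no request seen yet */
    uint64_t holdoff_us;             /* what counts as "long enough" */
};

/* Record a request for block b, our own or a snooped one. */
static void nak_note(struct nak_state *s, unsigned b, uint64_t now_us)
{
    s->last_req_us[b] = now_us;
}

/* Only (re)request block b if nobody has asked for it recently. */
static bool nak_should_request(const struct nak_state *s, unsigned b,
                               uint64_t now_us)
{
    uint64_t t = s->last_req_us[b];

    return t == 0 || now_us - t >= s->holdoff_us;
}
```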
  23. 07 Jul, 2002 1 commit
  24. 07 Jan, 2002 1 commit
      Checkpoint first working version of Frisbee Redux. This version · 86efdd9e
      Leigh B. Stoller authored
      requires the linux threads package to give us kernel level pthreads.
      
      From: Leigh Stoller <stoller@fast.cs.utah.edu>
      To: Testbed Operations <testbed-ops@fast.cs.utah.edu>
      Cc: Jay Lepreau <lepreau@cs.utah.edu>
      Subject: Frisbee Redux
      Date: Mon, 7 Jan 2002 12:03:56 -0800
      
      Server:
      The server is multithreaded. One thread takes in requests from the
      clients, and adds the request to a work queue. The other thread processes
      the work queue in FIFO order, spitting out the desired block ranges. A
      request is a chunk/block/blockcount tuple, and most of the time the clients
      are requesting complete 1MB chunks. The exception of course is when
      individual blocks are lost, in which case the clients request just those
      subranges.  The server is totally asynchronous; it maintains a list of who
      is "connected", but that's just to make sure we can time the server out
      after a suitable inactive time. The server really only cares about the work
      queue; as long as the queue is non-empty, it spits out data.
      
      Client:
      The client is also multithreaded. One thread receives data packets and
      stuffs them in a chunkbuffer data structure. This thread also requests more
      data, either to complete chunks with missing blocks, or to request new
      chunks. Each client can read ahead up to 2 chunks, although with multiple
      clients it might actually be much further ahead as it also receives chunks
      that other clients requested. I set the number of chunk buffers to 16,
      although this is probably unnecessary as I will explain below. The other
      thread waits for chunkbuffers to be marked complete, and then invokes the
      imageunzip code on that chunk. Meanwhile, the other thread is busily getting
      more data and requesting/reading ahead, so that by the time the unzip is
      done, there is another chunk to unzip. In practice, the main thread never
      goes idle after the first chunk is received; there is always a ready chunk
      for it. Perfect overlap of I/O! In order to prevent the clients from
      getting overly synchronized (and causing all the clients to wait until the
      last client is done!), each client randomizes its block request order. This
      is why we can retain the original frisbee name; clients end up catching random
      blocks flung out from the server until they have all the blocks.
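
      The randomized request order is essentially a shuffle of the chunk numbers. A minimal sketch using a Fisher-Yates shuffle with `rand()`; the function name is hypothetical and the choice of shuffle is an assumption, not necessarily what the client does.

```c
#include <stdlib.h>

/* Fill `order` with 0..n-1 in random order (Fisher-Yates). Each client
 * walks its own order, so clients don't all request the same chunk. */
static void random_request_order(int *order, int n)
{
    for (int i = 0; i < n; i++)
        order[i] = i;
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);       /* slight modulo bias; fine here */
        int tmp = order[i];
        order[i] = order[j];
        order[j] = tmp;
    }
}
```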
      
      Performance:
      The single node speed is about 180 seconds for our current full image.
      Frisbee V1 compares at about 210 seconds. The two node speed was 181 and
      174 seconds. The amount of CPU used for the two node run ranged from 1% to
      4%, typically averaging about 2% while I watched it with "top."
      
      The main problem on the server side is how to keep boss (1GHz with a Gbit
      ethernet) from spitting out packets so fast that half of them get dropped. I
      eventually settled on a static 1ms delay every 64K of packets sent. Nothing
      to be proud of, but it works.
      
      As mentioned above, the number of chunk buffers is 16, although only a few
      of them are used in practice. The reason is that the network transfer speed
      is perhaps 10 times faster than the decompression and raw device write
      speed. To know for sure, I would have to compare the per-byte transfer
      rate for 350MB via the network with the time to decompress and write the
      1.2GB of data to the raw disk. With such a big difference, it's only
      necessary to ensure that you stay 1 or 2 chunks ahead, since you can
      request 10 chunks in the time it takes to write one of them.