1. 07 Jan, 2002 8 commits
    • Fix a minor typo in a comment · f4169034
      Robert Ricci authored
    • Checkpoint first working version of Frisbee Redux. This version · 86efdd9e
      Leigh B. Stoller authored
      requires the Linux threads package to give us kernel-level pthreads.
      
      From: Leigh Stoller <stoller@fast.cs.utah.edu>
      To: Testbed Operations <testbed-ops@fast.cs.utah.edu>
      Cc: Jay Lepreau <lepreau@cs.utah.edu>
      Subject: Frisbee Redux
      Date: Mon, 7 Jan 2002 12:03:56 -0800
      
      Server:
      The server is multithreaded. One thread takes in requests from the
      clients and adds each request to a work queue. The other thread processes
      the work queue in FIFO order, spitting out the desired block ranges. A
      request is a chunk/block/blockcount tuple, and most of the time the clients
      are requesting complete 1MB chunks. The exception, of course, is when
      individual blocks are lost, in which case the clients request just those
      subranges. The server is totally asynchronous; it maintains a list of who
      is "connected", but that's just to make sure we can time the server out
      after a suitable period of inactivity. The server really only cares about
      the work queue; as long as the queue is non-empty, it spits out data.
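
      A minimal sketch of the request tuple and FIFO work queue described
      above, with invented names (the real frisbee server structures differ):
      one thread appends incoming client requests, the sender thread drains
      them in order.

      /*
       * Sketch only: one thread enqueues client requests, the sender
       * thread dequeues them in FIFO order.  Names are illustrative.
       */
      #include <pthread.h>
      #include <stddef.h>

      typedef struct request {
          unsigned int chunk;          /* which 1MB chunk */
          unsigned int block;          /* first block within the chunk */
          unsigned int blockcount;     /* how many blocks are wanted */
          struct request *next;
      } request_t;

      typedef struct {
          request_t      *head, *tail;
          pthread_mutex_t lock;
          pthread_cond_t  nonempty;
      } workqueue_t;

      workqueue_t workq = {
          NULL, NULL, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER
      };

      /* Request thread: append an incoming request in arrival order. */
      void enqueue(workqueue_t *q, request_t *r)
      {
          r->next = NULL;
          pthread_mutex_lock(&q->lock);
          if (q->tail)
              q->tail->next = r;
          else
              q->head = r;
          q->tail = r;
          pthread_cond_signal(&q->nonempty);
          pthread_mutex_unlock(&q->lock);
      }

      /* Sender thread: as long as the queue is non-empty, pull work. */
      request_t *dequeue(workqueue_t *q)
      {
          request_t *r;

          pthread_mutex_lock(&q->lock);
          while (q->head == NULL)
              pthread_cond_wait(&q->nonempty, &q->lock);
          r = q->head;
          q->head = r->next;
          if (q->head == NULL)
              q->tail = NULL;
          pthread_mutex_unlock(&q->lock);
          return r;
      }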
      
      Client:
      The client is also multithreaded. One thread receives data packets and
      stuffs them into a chunkbuffer data structure. This thread also requests
      more data, either to complete chunks with missing blocks or to request new
      chunks. Each client can read ahead up to 2 chunks, although with multiple
      clients it might actually be much further ahead, since it also receives
      chunks that other clients requested. I set the number of chunk buffers to
      16, although this is probably more than necessary, as I will explain below.
      The other thread waits for chunkbuffers to be marked complete, and then
      invokes the imageunzip code on that chunk. Meanwhile, the first thread is
      busily getting more data and requesting/reading ahead, so that by the time
      the unzip is done, there is another chunk to unzip. In practice, the main
      thread never goes idle after the first chunk is received; there is always a
      ready chunk for it. Perfect overlap of I/O! In order to prevent the clients
      from getting overly synchronized (and causing all the clients to wait until
      the last client is done!), each client randomizes its block request order.
      This is why we can retain the original frisbee name; clients end up
      catching random blocks flung out from the server until they have all the
      blocks.
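
      A small sketch of the randomization idea, assuming the unit being
      shuffled is the chunk index; request_chunk(), NCHUNKS, and the seeding
      are illustrative stand-ins, not the actual frisbee client code.

      /* Sketch only: build a per-client random request order. */
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>

      #define NCHUNKS 350                 /* e.g. ~350 1MB chunks in an image */

      /* Stand-in for sending a request for an entire chunk. */
      static void request_chunk(unsigned int chunk)
      {
          printf("REQUEST chunk %u\n", chunk);
      }

      int main(void)
      {
          unsigned int order[NCHUNKS];
          unsigned int i;

          for (i = 0; i < NCHUNKS; i++)
              order[i] = i;

          /* Fisher-Yates shuffle; per-process seeding desynchronizes clients. */
          srandom(getpid());
          for (i = NCHUNKS - 1; i > 0; i--) {
              unsigned int j = random() % (i + 1);
              unsigned int tmp = order[i];
              order[i] = order[j];
              order[j] = tmp;
          }

          for (i = 0; i < NCHUNKS; i++)
              request_chunk(order[i]);
          return 0;
      }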
      
      Performance:
      The single-node speed is about 180 seconds for our current full image.
      Frisbee V1 compares at about 210 seconds. The two-node speeds were 181 and
      174 seconds. The amount of CPU used for the two-node run ranged from 1% to
      4%, typically averaging about 2% while I watched it with "top."
      
      The main problem on the server side is how to keep boss (1GHz with a Gbit
      Ethernet) from spitting out packets so fast that half of them get dropped.
      I eventually settled on a static 1ms delay for every 64K of packets sent.
      Nothing to be proud of, but it works.
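
      A sketch of that pacing, assuming "64K" means 64KB worth of packets;
      send_packet(), BLOCKSIZE, and send_range() are stand-ins, not the real
      frisbee sender.

      /* Sketch only: pause 1ms after each 64KB worth of packets. */
      #include <stdio.h>
      #include <unistd.h>

      #define BLOCKSIZE 1024              /* assumed payload bytes per packet */
      #define BURSTSIZE (64 * 1024)       /* sleep after this many bytes */

      /* Stand-in for the real multicast send. */
      static void send_packet(const char *buf, size_t len)
      {
          (void)buf;
          (void)len;
      }

      void send_range(const char *data, size_t nbytes)
      {
          size_t sent = 0, burst = 0;

          while (sent < nbytes) {
              size_t len = (nbytes - sent < BLOCKSIZE) ? nbytes - sent : BLOCKSIZE;

              send_packet(data + sent, len);
              sent  += len;
              burst += len;
              if (burst >= BURSTSIZE) {
                  usleep(1000);           /* static 1ms delay */
                  burst = 0;
              }
          }
      }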
      
      As mentioned above, the number of chunk buffers is 16, although only a few
      of them are used in practice. The reason is that the network transfer speed
      is perhaps 10 times faster than the decompression and raw device write
      speed. To know for sure, I would have to figure out the per-byte transfer
      rate for the 350MB sent over the network, versus the time to decompress and
      write the 1.2GB of data to the raw disk. With such a big difference, it is
      only necessary to stay 1 or 2 chunks ahead, since you can request 10 chunks
      in the time it takes to write one of them.
    • Add several FRISBEE ifdefs to the user level unzip code. Rather than · d0b9f55f
      Leigh B. Stoller authored
      duplicate this code in the frisbee tree, build a version suitable for
      linking in with frisbee. I also modified the FrisbeeRead interface to
      pass back pointers instead of copying the data. There is no real
      performance benefit that I noticed, but it made me feel better not to
      copy 350MB of data another time. There is a new initialization
      function that is called by the frisbee main program to set up a few
      things.
    • Minor bug fix: when skipping unknown slices, do not add a skip range · 630b7764
      Leigh B. Stoller authored
      of start==end==0! This causes the entire disk to be compressed a second time!
    • 9606491e
      Mac Newbold authored
    • Christopher Alfeld authored
    • Performance mods. Specifically adjusted to scale well with the number of · f24b982f
      Christopher Alfeld authored
      pclasses.  This involved removing the heuristics, which, for the most
      part, were not worth the cycles they consumed, and scaled badly.
    • Leigh B. Stoller authored
  2. 04 Jan, 2002 2 commits
  3. 03 Jan, 2002 17 commits
  4. 02 Jan, 2002 1 commit
    • 1eb1e1d9
      Christopher Alfeld authored
      This check-in consists of 7 modifications to assign.
      
      1. Equivalence Classes
      
      Defined an equivalence relation on the physical nodes and applied it
      to the physical topology to get the resulting quotient topology (abuse
      of terminology).  So instead of searching among all possible physical
      nodes to make a map, assign only searches among all possible
      equivalence classes of nodes.  This tremendously reduces the search
      space.  At the time of this writing it reduces the physical topology
      from 252 nodes to 13 nodes.  The equivalence classes are generated
      automatically from the ptop file.
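
      A rough sketch of the idea, with invented node fields and sizes (the
      real pclass code in assign keys on more than a type string and a
      feature summary): nodes whose attributes match exactly fall into the
      same class, so only one representative per class needs to be searched.

      /* Sketch only: group interchangeable physical nodes into classes. */
      #include <string.h>

      #define MAXNODES   256
      #define MAXCLASSES 64

      struct pnode {
          char name[32];
          char type[32];        /* e.g. "pc600" */
          char features[128];   /* canonicalized features/links summary */
      };

      struct pclass {
          struct pnode *members[MAXNODES];
          int           nmembers;
      };

      static int equivalent(const struct pnode *a, const struct pnode *b)
      {
          return strcmp(a->type, b->type) == 0 &&
                 strcmp(a->features, b->features) == 0;
      }

      /* Group n physical nodes into classes; returns the class count. */
      int build_pclasses(struct pnode *nodes, int n, struct pclass *classes)
      {
          int nclasses = 0;
          int i, c;

          for (i = 0; i < n; i++) {
              for (c = 0; c < nclasses; c++)
                  if (equivalent(&nodes[i], classes[c].members[0]))
                      break;
              if (c == nclasses)                    /* no match: new class   */
                  classes[nclasses++].nmembers = 0; /* (bounds check omitted) */
              classes[c].members[classes[c].nmembers++] = &nodes[i];
          }
          return nclasses;   /* e.g. 252 nodes collapsing to ~13 classes */
      }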
      
      2. Scoring based on equivalence classes.
      
      Each equivalence class used comes with a significant cost.  This
      strongly encourages assign to use machines from the same equivalence
      class when possible.  The result is that an experiment that does not
      otherwise specify will almost definitely get machines of the same type.
      If this cost needs to be reduced in the future, the constant to change
      is SCORE_PCLASS.
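
      A simplified sketch of that charge; SCORE_PCLASS is the real constant
      named above, but its value and the bookkeeping below are placeholders.
      The idea is just that the first node taken from a class costs extra, so
      solutions that touch fewer classes score better (lower is better).

      /* Sketch only: charge the score once per equivalence class in use. */
      #define MAXCLASSES   64
      #define SCORE_PCLASS 0.5       /* placeholder, not assign's actual value */

      static double score;
      static int    class_in_use[MAXCLASSES];  /* nodes currently used per class */

      void add_node(int pclass)
      {
          if (class_in_use[pclass]++ == 0)
              score += SCORE_PCLASS;  /* first node from this class: pay the cost */
      }

      void remove_node(int pclass)
      {
          if (--class_in_use[pclass] == 0)
              score -= SCORE_PCLASS;  /* last node released: refund the cost */
      }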
      
      3. Heuristics
      
      Added a bunch of heuristics for choosing which equivalence class to
      use.  This was less successful than I hoped.  A good solution is now
      found in record time but it still continues searching.  When OPTIMAL
      is turned on these heuristics help a lot.  When off they make little
      difference.  I may turn this into a compile time option in the future
      since the heuristics do take non-trivial CPU cycles.
      
      4. Fixed the very-very-big-and-evil disconnected-switches bug.
      
      Assign wasn't cleaning up after itself in certain cases.  Disconnected
      graphs are now merely a minor, easily ignored, bump rather than the
      towering cliffs they used to be.
      
      5. Fixed the not-yet-noticed not-enough-nodes bug.
      
      Found a bug that probably has never come up before because we have
      checks that avoid those circumstances.
      
      6. Modified constants.
      
      I was tired of waiting so long for results, so I lowered CYCLES and
      reduced the constant for naccepts (Mac, you probably want to add that
      inconspicuous number to your configurable constants; look for
      "naccepts =").  The result is roughly a 2x speedup.  It works great
      currently, but we may want to bump these numbers back up if we get
      problems with features and desires.
      
      7. General clean up.
      
      Associated with the other changes was a lot of restructuring and some
      cleanup.  Specifically to the assign loop and scoring code.
  5. 28 Dec, 2001 6 commits
  6. 27 Dec, 2001 1 commit
    • Another set of group changes. As discussed in email and meetings, · 8404af03
      Leigh B. Stoller authored
      group directories are now created in a different tree than the
      project directory so that they can be exported independently of
      the project tree to the nodes in a group experiment. The tree is
      rooted at /groups on boss/users and on nodes.
      
      1. mkgroup,rmgroup,mkproj - Minor changes to reflect new group
         directory location (/groups). We leave a symlink in the old spot to
         maintain compatibility, and to reduce the number of different
         directories that a person needs to worry about. So, when a group is
         made, you get a real directory /groups/pid/gid, and a symlink
         /proj/pid/groups/gid that points to the former.
      
      2. tmcd/tmcd.c - Minor change to add the additional group directory mount
         in the mounts command. Only done when pid!=gid for the experiment.
      
      3. tmcd/libsetup.pm and friends - Minor changes to fix the fact that
         mkdir does not create subdirs along the way unless the -p option is
         specified. Needed to create the local directory for the mounts
         returned by tmcd for group dirs. Pushed them out to the sup trees,
         although 6.2 images older than the most recent one are not going to
         work right. No one is using those images though, and we should just
         flush the sup trees.
      
      4. exports_setup.in - Ah, the crux of the issue. I really dislike NFS
         at this point. The original idea was to export a third set of
         directories to nodes that were part of a group experiment. Those
         nodes would get /groups/pid/gid exported, and /proj/pid read-only.
         Well, no such luck. On users, /groups and /proj are both really on
         /q, and the old restriction of mountd not allowing an IP to be
         specified more than once on the right-hand side for any FS reared
         its ugly head again. As far as mountd is concerned, /q/groups and
         /q/proj are the same thing, and so it bombed when I tried to export
         them on different lines, since that meant an IP was repeated.
         So, I reworked exports_setup, and now for any node that is part of
         a group experiment, it gets this:
      
      	/q/proj/pid /q/groups/pid/gid -maproot=root 155.101.132.26
      
         which at least allows the individual group dirs to be protected
         from each other, but does not allow /proj/pid to be exported read
         only. Sigh.
  7. 26 Dec, 2001 2 commits
    • 0cd744a4
      Robert Ricci authored
    • A bunch o' account management script changes. I have reworked · 46068860
      Leigh B. Stoller authored
      mkprojdir, mkacct-cntrl, mkgroup, and group-update into a set of new
      scripts that are more specific to their intended operation, and strive
      to do less work.
      
      1. mkacct - Replaces mkacct-cntrl. This script no longer does any
         group stuff. All it does is create new accounts, or update the
         password and gecos fields of existing accounts. Usage is the same
         as it was: "mkacct <userid>", and is typically invoked from the web
         interface via the approveuser form.
      
      2. mkgroup - Replaces group-update. This script creates new groups,
         either for the main project when it is approved, or for subgroup
         creation. This script does not alter the group membership. Usage
         is typically from the web interface, but mkgroup can be invoked
         from the command line: "mkgroup [-b | -a] <pid> <gid>" where -b
         puts it in the background and sends email later, while -a just
         captures the log and emails. This "audit" feature is going to find
         its way into more scripts as soon as I figure out a neat and clean
         perl mechanism to make it easy.
      
      3. setgroups - Replaces group-update. This script modifies the group
         membership of either specific users, or all the users in a
         project. It is typically invoked from the web interface when a
         project leader edits the subgroup membership or when a user is
         first approved to a project or subgroup. Command line usage is:
      
      	setgroups [-b | -a] -p <pid> [user ...]
              setgroups [-b | -a] [user ...]
      
         The first form is mostly a means to speed things up. The web
         interface knows exactly which users need to be changed, but a
         global project update is nice too.
      
      4. mkproj - Replaces mkprojdir. Actually, mkproj still has all that
         directory code, but it also handles creating the groups and the
         account for the project leader. Part of my policy to move as much
         random code out of the web interface and into the PERL backend
         where it belongs.
  8. 21 Dec, 2001 3 commits