1. 08 Jan, 2002 3 commits
    • Christopher Alfeld
      All the vclass stuff. Can now do: · 09737efc
      make-vclass A 0.5 pc600 pc850
      
      in the top file.  And then have nodes of type A.  Assign will try to put
      them all as either pc600 or pc850.
      
      Still need to write all the pre-assign stuff.
    • Leigh B. Stoller
      Fix up genlastlog to avoid Y2K-like problems. Basically, syslog does · cb096341
      not include a year in the output, so I was using the current year to
      init the tm structure. However, the log file could span from the old
      year to the new year, and so all the dates could be wrong. Using Mac's
      suggestion, look at the final time stamp, and if it's in the future,
      reset it back one year.
      
      Also add [-a #] option to roll through the specified number of
      rotation files, to make the job of reinitting all the records easier.
      I ran it as "genlastlog -a 7", which makes it process logins.7.gz
      through logins.0.gz, before doing the current log.
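      The year-rollover logic described above can be sketched as follows. This
      is a hedged Python illustration, not the actual C of genlastlog; the
      function name and argument layout are hypothetical.

```python
from datetime import datetime

def fix_syslog_year(month, day, hour, minute, second, now=None):
    """Syslog timestamps carry no year. Assume the current year; if
    that places the entry in the future (i.e., the log spans a year
    boundary), roll it back one year, per Mac's suggestion."""
    now = now or datetime.now()
    ts = datetime(now.year, month, day, hour, minute, second)
    if ts > now:
        ts = ts.replace(year=now.year - 1)
    return ts
```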
    • Leigh B. Stoller
      Remove all of the connection handling stuff. Clients come and go, and · 7523a98c
      idleness is defined as an empty work queue. We still use join/leave
      messages, but the join message is so that the client can be informed
      of the number of blocks in the file. The leave message is strictly
      informational, and includes the elapsed time on the client, so that it
      can be written to the log file. If that message is lost, no big deal.
      I ran a 6 node test on this new code, and all the clients ran in 174
      to 176 seconds, with frisbeed using 1% CPU on average (typically
      starts out at about 3%, and quickly drops off to steady state).
  2. 07 Jan, 2002 13 commits
    • Kirk Webb
      Added entry to start "hmcd" - Healthd Master Collection Daemon on bootup. · 29552163
      Noticed it wasn't running and recalled the recent machine room downtime -
      first reboot since hmcd was started up.
    • Robert Ricci
      3b838cd4
    • Robert Ricci
      eb79ba3c
    • Robert Ricci
      ee7ef90a
    • Robert Ricci
    • Robert Ricci
      Fix a minor typo in a comment · f4169034
    • Leigh B. Stoller
      Checkpoint first working version of Frisbee Redux. This version · 86efdd9e
      requires the Linux threads package to give us kernel-level pthreads.
      
      From: Leigh Stoller <stoller@fast.cs.utah.edu>
      To: Testbed Operations <testbed-ops@fast.cs.utah.edu>
      Cc: Jay Lepreau <lepreau@cs.utah.edu>
      Subject: Frisbee Redux
      Date: Mon, 7 Jan 2002 12:03:56 -0800
      
      Server:
      The server is multithreaded. One thread takes in requests from the
      clients and adds each request to a work queue. The other thread processes
      the work queue in FIFO order, spitting out the desired block ranges. A
      request is a chunk/block/blockcount tuple, and most of the time the clients
      are requesting complete 1MB chunks. The exception of course is when
      individual blocks are lost, in which case the clients request just those
      subranges. The server is totally asynchronous; it maintains a list of who
      is "connected", but that's just to make sure we can time the server out
      after a suitable inactive period. The server really only cares about the
      work queue; as long as the queue is non-empty, it spits out data.
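
      The request-handler/queue-processor split can be sketched like this
      (illustrative Python with assumed names; the real frisbeed is C and
      multicasts packets rather than appending to a list):

```python
import queue
import threading

work = queue.Queue()   # FIFO work queue shared by the two threads
sent = []              # stands in for the multicast send path

def request_handler(requests):
    # Takes in (chunk, block, blockcount) requests from clients.
    for req in requests:
        work.put(req)
    work.put(None)     # sentinel: no more requests in this sketch

def queue_processor():
    # Drains the queue in FIFO order, emitting each requested block.
    while True:
        req = work.get()
        if req is None:
            break
        chunk, block, count = req
        for b in range(block, block + count):
            sent.append((chunk, b))

reqs = [(0, 0, 4), (1, 2, 2)]   # one full-ish chunk, one subrange
t1 = threading.Thread(target=request_handler, args=(reqs,))
t2 = threading.Thread(target=queue_processor)
t1.start(); t2.start(); t1.join(); t2.join()
```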
      
      Client:
      The client is also multithreaded. One thread receives data packets and
      stuffs them in a chunkbuffer data structure. This thread also requests more
      data, either to complete chunks with missing blocks, or to request new
      chunks. Each client can read ahead up to 2 chunks, although with multiple
      clients it might actually be much further ahead, as it also receives chunks
      that other clients requested. I set the number of chunk buffers to 16,
      although this is probably unnecessary as I will explain below. The other
      thread waits for chunkbuffers to be marked complete, and then invokes the
      imageunzip code on that chunk. Meanwhile, the other thread is busily getting
      more data and requesting/reading ahead, so that by the time the unzip is
      done, there is another chunk to unzip. In practice, the main thread never
      goes idle after the first chunk is received; there is always a ready chunk
      for it. Perfect overlap of I/O! In order to prevent the clients from
      getting overly synchronized (and causing all the clients to wait until the
      last client is done!), each client randomizes its block request order. This
      is why we can retain the original frisbee name; clients end up catching
      random blocks flung out from the server until each has all the blocks.
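
      The randomized request order is the anti-synchronization trick. A
      minimal sketch, with hypothetical names (the real client is C):

```python
import random

def request_order(num_blocks, seed):
    """Return the blocks of a chunk in a per-client shuffled order,
    so clients don't all converge on the same final block."""
    order = list(range(num_blocks))
    random.Random(seed).shuffle(order)
    return order

# Two clients cover the same set of blocks, each in its own order,
# so the server's multicast of any one block is useful to both.
a = request_order(8, seed=1)
b = request_order(8, seed=2)
```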
      
      Performance:
      The single node speed is about 180 seconds for our current full image.
      Frisbee V1 compares at about 210 seconds. The two node speed was 181 and
      174 seconds. The amount of CPU used for the two node run ranged from 1% to
      4%, typically averaging about 2% while I watched it with "top."
      
      The main problem on the server side is how to keep boss (1GHz with a Gbit
      ethernet) from spitting out packets so fast that 1/2 of them get dropped. I
      eventually settled on a static 1ms delay every 64K of packets sent. Nothing
      to be proud of, but it works.
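
      The static pacing amounts to the following. The constants come from the
      text; the function itself is a hypothetical sketch, not the frisbeed code:

```python
import time

BURST = 64 * 1024   # bytes to send between delays
DELAY = 0.001       # the static 1ms delay

def paced_send(data, send, blocksize=1024, sleep=time.sleep):
    """Send data in blocksize pieces, sleeping DELAY after every
    BURST bytes so a fast server doesn't drown its clients."""
    sent_since_delay = 0
    for off in range(0, len(data), blocksize):
        send(data[off:off + blocksize])
        sent_since_delay += blocksize
        if sent_since_delay >= BURST:
            sleep(DELAY)
            sent_since_delay = 0
```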
      
      As mentioned above, the number of chunk buffers is 16, although only a few
      of them are used in practice. The reason is that the network transfer speed
      is perhaps 10 times faster than the decompression and raw device write
      speed. To know for sure, I would have to figure out the per-byte transfer
      rate for 350 MBs via the network, versus the time to decompress and write
      the 1.2GB of data to the raw disk. With such a big difference, it's only
      necessary to ensure that you stay 1 or 2 chunks ahead, since you can
      request 10 chunks in the time it takes to write one of them.
      86efdd9e
    • Leigh B. Stoller
      Add several FRISBEE ifdefs to the user level unzip code. Rather than · d0b9f55f
      duplicate this code in the frisbee tree, build a version suitable for
      linking in with frisbee. I also modified the FrisbeeRead interface to
      pass back pointers instead of copying the data. There is no real
      performance benefit that I noticed, but it made me feel better not to
      copy 350 MBs of data another time. There is a new initialization
      function that is called by the frisbee main program to set up a few
      things.
    • Leigh B. Stoller
      Minor bug fix; When skipping unknown slices, do not add a skip range · 630b7764
      of start==end==0! This causes the entire disk to be compressed a second time!
    • Mac Newbold
      9606491e
    • Christopher Alfeld
    • Christopher Alfeld
      Performance mods. Specifically adjusted to scale well with number of · f24b982f
      pclasses.  This involved removing the heuristics, which, for the most
      part, were not worth the cycles they consumed, and scaled badly.
    • Leigh B. Stoller
  3. 04 Jan, 2002 1 commit
    • Robert Ricci
      New script: unixgroups. Pretty simple - just a convenient way to manage the · 469dacdb
      unixgroup_membership table from the command line. Runs the appropriate
      commands to make changes in the 'real world' after the database has been
      updated. From the usage message:
      
      Usage: unixgroups <-h | -p | < <-a | -r> uid gid...> >
      -h            This message
      -p            Print group information
      -a uid gid... Add a user to one (or more) groups
      -r uid gid... Remove a user from one (or more) groups
  4. 03 Jan, 2002 18 commits
  5. 02 Jan, 2002 1 commit
    • Christopher Alfeld · 1eb1e1d9
      This check-in consists of 7 modifications to assign.
      
      1. Equivalence Classes
      
      Defined an equivalence relation on the physical nodes and applied it
      to the physical topology to get the resulting quotient topology (abuse
      of terminology).  So instead of searching among all possible physical
      nodes to make a map, assign only searches among all possible
      equivalence classes of nodes.  This tremendously reduces the search
      space.  At the time of this writing it reduces the physical topology
      from 252 nodes to 13 nodes.  The equivalence classes are generated
      automatically from the ptop file.
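
      The grouping step can be sketched as a simple partition by signature
      (illustrative Python; assign itself is C++ and keys on more than the
      bare node type):

```python
from collections import defaultdict

def pclasses(nodes):
    """nodes: {name: signature tuple}. Nodes with identical
    signatures are interchangeable for mapping, so they collapse
    into one equivalence class; search then runs over classes."""
    classes = defaultdict(list)
    for name, signature in nodes.items():
        classes[signature].append(name)
    return classes

# Toy ptop: 5 physical nodes of 2 types -> 2 equivalence classes
ptop = {
    "pc1": ("pc600",), "pc2": ("pc600",), "pc3": ("pc600",),
    "pc4": ("pc850",), "pc5": ("pc850",),
}
eq = pclasses(ptop)
```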
      
      2. Scoring based on equivalence classes.
      
      Each equivalence class used comes with a significant cost.  This
      strongly encourages assign to use equivalent machines when possible.
      The result is that an experiment that does not otherwise specify will
      almost definitely get machines of the same type.  If this needs to be
      reduced in the future it is the SCORE_PCLASS constant.
      
      3. Heuristics
      
      Added a bunch of heuristics for choosing which equivalence class to
      use.  This was less successful than I hoped.  A good solution is now
      found in record time but it still continues searching.  When OPTIMAL
      is turned on these heuristics help a lot.  When off they make little
      difference.  I may turn this into a compile time option in the future
      since the heuristics do take non-trivial CPU cycles.
      
      4. Fixed the very-very-big-and-evil disconnected-switches bug.
      
      Assign wasn't cleaning up after itself in certain cases.  Disconnected
      graphs are now merely a minor, easily ignored bump rather than the
      towering cliffs they used to be.
      
      5. Fixed the not-yet-noticed not-enough-nodes bug.
      
      Found a bug that probably has never come up before because we have
      checks that avoid those circumstances.
      
      6. Modified constants.
      
      I was tired of waiting so long for results, so I lowered CYCLES and
      reduced the constant for naccepts (Mac, you probably want to add that
      inconspicuous number to your configurable constants; look for
      "naccepts =").  The result is roughly a 2x speedup.  It works great
      currently, but we may want to bump these numbers back up if we get
      problems with features and desires.
      
      7. General clean up.
      
      Associated with the other changes was a lot of restructuring and some
      cleanup.  Specifically to the assign loop and scoring code.
  6. 28 Dec, 2001 4 commits