  1. 15 Jan, 2002 (7 commits)
  2. 14 Jan, 2002 (9 commits)
  3. 12 Jan, 2002 (1 commit)
  4. 11 Jan, 2002 (11 commits)
  5. 10 Jan, 2002 (12 commits)
    • A set of capture/capserver/DB changes. · 8ec05f0d
      Leigh B. Stoller authored
      Capserver and capture now handshake the owner/group of the tipline.
      The owner defaults to root, and the group defaults to root when the
      node is not allocated. Capture does the chmod after the handshake,
      so if boss is down when capture starts, the acl/run files will get
      0,0, but will get the proper owner/group later, once capture is able
      to handshake. As a result, console_setup.proxy was trimmed down and
      cleaned up a bit, since it no longer has to muck with some of this
      stuff.
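
      As a rough illustration, here is a minimal C sketch of that
      post-handshake chown/chmod step; the struct, file arguments, and
      mode bits are hypothetical, not the actual capture code:

      	#include <sys/stat.h>
      	#include <sys/types.h>
      	#include <unistd.h>

      	/* Hypothetical result of the capserver handshake. */
      	struct tipowner {
      	    uid_t uid;  /* 0 (root) if boss was unreachable */
      	    gid_t gid;  /* 0 (root) if the node is not allocated */
      	};

      	/* Apply ownership/permissions to the acl and run files once
      	 * the handshake says who owns the tipline. */
      	static int
      	apply_tipline_owner(const char *aclfile, const char *runfile,
      	                    const struct tipowner *own)
      	{
      	    if (chown(aclfile, own->uid, own->gid) < 0 ||
      	        chown(runfile, own->uid, own->gid) < 0)
      	        return -1;
      	    /* Mode bits are illustrative only. */
      	    if (chmod(aclfile, 0640) < 0 || chmod(runfile, 0640) < 0)
      	        return -1;
      	    return 0;
      	}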
      
      A second change was to support multiple tiplines per node. I have
      modified the tiplines table as follows:
      
      	+---------+-------------+------+-----+---------+-------+
      	| Field   | Type        | Null | Key | Default | Extra |
      	+---------+-------------+------+-----+---------+-------+
      	| tipname | varchar(32) |      | PRI |         |       |
      	| node_id | varchar(10) |      |     |         |       |
      	| server  | varchar(64) |      |     |         |       |
      	+---------+-------------+------+-----+---------+-------+
      
      That is, the name of the tip device (given to capture) is the unique
      key, and there can be multiple tiplines associated with each node.
      console_setup now uses the tiplines table to determine which tiplines
      need to be reset; it used to rely on just the node_id passed into
      console_setup. Conversely, capserver uses the tipname to map back to
      the node_id, so that it can get the owner/group from the reserved
      table.
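
      For concreteness, here is a sketch in C (using the MySQL client
      library) of that tipname-to-node_id lookup; the function name and
      error handling are made up, the header path may differ, and the
      follow-up owner/group query against the reserved table is omitted
      since its columns are not shown here:

      	#include <mysql/mysql.h>
      	#include <stdio.h>

      	/* Map a tipname back to its node_id via the tiplines table. */
      	static int
      	tipname_to_nodeid(MYSQL *db, const char *tipname,
      	                  char *node_id, size_t len)
      	{
      	    char      query[256];
      	    MYSQL_RES *res;
      	    MYSQL_ROW row;

      	    snprintf(query, sizeof(query),
      	             "select node_id from tiplines where tipname='%s'",
      	             tipname);
      	    if (mysql_query(db, query) != 0)
      	        return -1;
      	    if ((res = mysql_store_result(db)) == NULL)
      	        return -1;
      	    row = mysql_fetch_row(res);
      	    if (row == NULL || row[0] == NULL) {
      	        mysql_free_result(res);
      	        return -1;
      	    }
      	    snprintf(node_id, len, "%s", row[0]);
      	    mysql_free_result(res);
      	    return 0;
      	}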
      
      I also removed the shark hack from nalloc, nfree, and console_reset,
      since there is no longer any need for that; this can be described
      completely now with tiplines table entries. If we ever bring the
      sharks back, we will need to generate new entries. Hah!
    • D'oh! Forgot to update the makefile · 91b15ec9
      Mike Hibler authored
    • Mike Hibler authored
    • Take the submenu code from the experiments info page, and generalize · ae987100
      Leigh B. Stoller authored
      (somewhat) so that we can do submenus easily in other pages.
    • Leigh B. Stoller authored
    • Most of the time, findVlan() retries up to 10 times to find a VLAN, to · 6a3140fa
      Robert Ricci authored
      account for the time it may take for changes made at the master to
      propagate to the slaves. Added a paramter to override this, as sometimes,
      we know that we're talking to the master so the delay does not come into
      play.
      
      This should improve the running time of snmpit by about 10 seconds per VLAN
      created, since we can tell right away if the VLAN already exists or not.
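
      The shape of that change, sketched in C purely for illustration
      (snmpit itself is not written in C, and the names, types, and retry
      delay below are hypothetical):

      	#include <unistd.h>

      	#define FINDVLAN_RETRIES 10  /* allow master->slave propagation */

      	/* Placeholder for one lookup attempt; >= 0 means found. */
      	static int lookup_vlan_once(const char *name) { (void)name; return -1; }

      	/* Look up a VLAN, retrying so changes made on the master have
      	 * time to reach the slaves.  Callers that know they are talking
      	 * to the master pass max_tries = 1 and skip the waiting. */
      	static int
      	find_vlan(const char *name, int max_tries)
      	{
      	    if (max_tries <= 0)
      	        max_tries = FINDVLAN_RETRIES;

      	    for (int i = 0; i < max_tries; i++) {
      	        int id = lookup_vlan_once(name);
      	        if (id >= 0)
      	            return id;
      	        if (i + 1 < max_tries)
      	            sleep(1);  /* illustrative delay between attempts */
      	    }
      	    return -1;
      	}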
    • Christopher Alfeld authored · d2d02a3a
    • Christopher Alfeld authored
    • Leigh B. Stoller authored · 52d479a4
    • Added location of APC SNMP MIBs, and instructions for getting root's ssh · e6cdd32d
      Robert Ricci authored
      keys on boss and ops.
    • Leigh B. Stoller authored · e085bf6f
    • I noticed in the 12-node tests that CPU was running at 5-6% now. I · 5c02231f
      Leigh B. Stoller authored
      also noticed that the slower machines were getting very far behind
      the faster machines (the faster machines request chunks faster), and
      actually dropping them because they have no room for the chunks
      (chunkbufs at 32). I increased the timeout on the client (if no
      blocks are received for this long, request something) from 30ms to
      90ms. This helped a bit, but the real help was increasing chunkbufs
      up to 64. Now the clients run at pretty much single-node speed
      (152/174), and the CPU usage on boss went back down to 2-3% during
      the run. The stats show far less data loss and resending of blocks.
      In fact, we were resending upwards of 300MB of data because of
      client loss. That went down to about 14MB for the 12-node test.
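
      A stripped-down C sketch of the client behavior being tuned here;
      the constants match the values above, but the function names and
      packet handling are placeholders, not the real frisbee client:

      	#include <sys/select.h>
      	#include <sys/time.h>

      	#define IDLE_TIMEOUT_MS 90  /* was 30ms; raised to 90ms */
      	#define CHUNKBUFS       64  /* was 32; sizes the chunk buffer pool */

      	/* Placeholder packet handlers. */
      	static void request_missing_chunk(int sock) { (void)sock; }
      	static void receive_block(int sock) { (void)sock; }

      	/* Receive loop: if no blocks arrive within the idle timeout,
      	 * re-request a chunk we still need rather than just waiting. */
      	static void
      	client_loop(int sock)
      	{
      	    for (;;) {
      	        fd_set fds;
      	        struct timeval tv = { 0, IDLE_TIMEOUT_MS * 1000 };

      	        FD_ZERO(&fds);
      	        FD_SET(sock, &fds);
      	        if (select(sock + 1, &fds, NULL, NULL, &tv) <= 0) {
      	            request_missing_chunk(sock);
      	            continue;
      	        }
      	        receive_block(sock);
      	    }
      	}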
      
      Then I ran a 24-node test. Very sweet. All 24 nodes ran in 155-180
      seconds. CPU peaked at 6%, and dropped off to a steady state of 4%.
      None of the nodes saw any duplicate chunks. Note that the client is
      probably going to need some backoff code in case the server dies, to
      prevent swamping the boss with unanswerable packets. Next step is to
      have Matt run a test when he swaps in his 40 nodes.