1. 13 Jan, 2004 3 commits
  2. 12 Jan, 2004 28 commits
    • Robert Ricci's avatar
      6faa393a
    • Robert Ricci's avatar
      Snapshot · 20770ea5
      Robert Ricci authored
      20770ea5
    • Robert Ricci's avatar
      Several changes, mostly targeted at jail and simulation-type · 3a345f8e
      Robert Ricci authored
      topologies.
      
      First, added a new 'summary' of the solution, when the '-u' option is
      given. It's a view of things from the physical side - prints out how
      many vnodes were mapped to each pnode, as well as how much bandwidth
      (trivial and non-trivial) was used on each node. For 'normal' nodes,
      we also print out all links used and how much bandwidth was used on
      each of them. For switches, we print only inter-switch links. This is
      amazingly helpful in getting an intuitive feel for how well assign is
      doing.
      
      Added a SIGINFO handler for the impatient (like me) to see things such
      as the current temperature, and current and best scores, while assign
      is running.
      
      Fixed a bug in which emulated links could get over-subscribed, as well
      as a few other misc. bugfixes.
      
      Changed the way assign goes through the list of a node's pclasses in
      random order - there were problems with the old way in which you could
      end up with a situation in which some pnodes were chosen with a much
      higher probability than others. Now, rather than treating the list as
      a ring and starting at a random place, we make a randomly-ordered list
      of the pclasses, and go through it from start to finish.
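
A minimal sketch of the two orderings, not the actual assign code (which is not shown here). The names `ring_order`, `shuffled_order`, `first_usable`, and `pick_counts` are hypothetical, and the "usable" set is an assumption used to show one way a ring with a random start can favor some entries.

```python
import random
from collections import Counter

def ring_order(pclasses, rng):
    """Old approach: treat the list as a ring and start at a random place."""
    start = rng.randrange(len(pclasses))
    return pclasses[start:] + pclasses[:start]

def shuffled_order(pclasses, rng):
    """New approach: walk a randomly-ordered copy from start to finish."""
    order = list(pclasses)
    rng.shuffle(order)
    return order

def first_usable(order, usable):
    """Return the first entry that can actually be used, or None."""
    for pclass in order:
        if pclass in usable:
            return pclass
    return None

def pick_counts(order_fn, pclasses, usable, trials, seed=0):
    """Count how often each pclass is picked over many trials."""
    rng = random.Random(seed)
    return Counter(
        first_usable(order_fn(pclasses, rng), usable) for _ in range(trials)
    )
```

With pclasses A, B, C, D where only C and D are usable, the ring picks C for three of the four possible starting points, while the shuffled list picks C and D about equally often.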
      
      Did some work on dynamic pclasses so that we adjust the estimate of
      the neighborhood size to account for disabled pclasses (i.e., pnodes
      that have nothing mapped to them yet).
      
      Changed the way that find_link_to_switch() decides on the best link to
      use - the old method was doing very poorly at bin-packing emulated
      links into plinks. I now use a simple first-fit algorithm. This made a
      pretty big difference. I may try some other fast bin-packing
      approximation algorithm, but my main fear is that all of the good ones
      (such as the 'sort from largest to smallest, then do first-fit'
      algorithm) may require re-mapping other links. This might be slow,
      and/or it might make it difficult, if not impossible, to keep
      add_node() and remove_node() symmetric.
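
A rough sketch of first-fit packing of emulated links into plinks, under the assumption that each plink has a fixed capacity and a running total of used bandwidth. The function names and data layout are hypothetical and are not taken from find_link_to_switch().

```python
def first_fit(capacity, used, demand):
    """Return the index of the first plink with room for demand, or None."""
    for i, (cap, in_use) in enumerate(zip(capacity, used)):
        if in_use + demand <= cap:
            return i
    return None

def add_link(capacity, used, demand):
    """Place an emulated link on the first plink that fits it."""
    i = first_fit(capacity, used, demand)
    if i is not None:
        used[i] += demand
    return i

def remove_link(used, i, demand):
    """Undo add_link() without touching any other link."""
    used[i] -= demand
```

Because a placement never moves links that are already mapped, removing a link is the exact inverse of adding it, which is the symmetry the message worries a sort-then-fit variant could break.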
      
      Combined direct_link() and find_link_to_switch() into
      find_best_link(), since they really do the same thing.
      
      Standardized on std::random() to get random numbers - previously, some
      calls were using std::rand().
      
      The big one: I added a find_pnode_connected() function that finds a
      random pnode that one of the vnode's neighbors in the virtual graph is
      assigned to. Then, with a random probability (given with the -c option
      on the command line), we try that function to find a pnode first (if
      it fails, we still call find_pnode() ). Of course, this is only really
      applicable when you have a reasonable degree of vnode-to-pnode
      multiplexing. In the test case I'm using, this managed to get 3x as
      much bandwidth into trivial links as just using find_pnode().
      3a345f8e
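
The find_pnode_connected() idea from commit 3a345f8e could look roughly like the sketch below. Only the function names find_pnode_connected() and find_pnode() come from the message; the signatures, the dict-based virtual graph, and `choose_pnode` are assumptions made for illustration.

```python
import random

def find_pnode_connected(vnode, vgraph, assignment, rng):
    """Pick a random pnode that hosts one of vnode's assigned neighbors."""
    candidates = [assignment[n] for n in vgraph.get(vnode, []) if n in assignment]
    return rng.choice(candidates) if candidates else None

def find_pnode(pnodes, rng):
    """Fallback: pick any pnode at random (stand-in for the real search)."""
    return rng.choice(pnodes)

def choose_pnode(vnode, vgraph, assignment, pnodes, connected_prob, rng):
    """With probability connected_prob (the -c option), try neighbors first."""
    if rng.random() < connected_prob:
        pnode = find_pnode_connected(vnode, vgraph, assignment, rng)
        if pnode is not None:
            return pnode
    # Either the coin flip said no or no neighbor is mapped yet.
    return find_pnode(pnodes, rng)
```

Placing a vnode on the same pnode as a neighbor turns the link between them into a trivial link, which is where the bandwidth gain reported in the message would come from.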
    • Shashi Guruprasad's avatar
    • Leigh Stoller's avatar
      b39f9432
    • Leigh Stoller's avatar
      Remove all trace of proxydhcp! · 3f943535
      Leigh Stoller authored
      3f943535
    • Mike Hibler's avatar
      0bc1d045
    • Leigh Stoller's avatar
      Change to REUSEPORT instead of REUSEADDR! · a2f3e045
      Leigh Stoller authored
      a2f3e045
    • Leigh Stoller's avatar
      Add -a option to reboot all free pcs. · 7eb2adb6
      Leigh Stoller authored
      7eb2adb6
    • Leigh Stoller's avatar
      Add commitlog for bootinfo stuff. · 17a431e7
      Leigh Stoller authored
      17a431e7
    • Leigh Stoller's avatar
      Use whiteball for free nodes since they otherwise would look down (no · 57301970
      Leigh Stoller authored
      isalive reported from pxeboot kernel when node is free).
      57301970
    • Leigh Stoller's avatar
      Add another state to the PXEKERNEL state machine. After rebooting all · ce0bf7e7
      Leigh Stoller authored
      of the free nodes at once (113 nodes) three nodes lost bootinfo reply
      packets (one time each) causing them to retry, which was an invalid
      state (PXEWAIT to PXEBOOTING).
      ce0bf7e7
    • Leigh Stoller's avatar
      2006b0c3
    • Leigh Stoller's avatar
    • Leigh Stoller's avatar
      Proxydhcp is dead. · 7caae589
      Leigh Stoller authored
      7caae589
    • Mike Hibler's avatar
      I set out to make this the definitive document on vnodes, but aborted that. · 02905695
      Mike Hibler authored
      Basically, I just updated it and changed it from a chronology to a summary
      (i.e., collected all the jail features into one list).
      02905695
    • Mike Hibler's avatar
      Incorporate description of jail changes from Leigh's jail.html file. · be548710
      Mike Hibler authored
      Fix a few nits.
      be548710
    • Leigh Stoller's avatar
      Turn debug mode back off by default. · 71385a45
      Leigh Stoller authored
      71385a45
    • Leigh Stoller's avatar
      aeb80b9c
    • Leigh Stoller's avatar
      Hmm, this file dropped from previous commit. Added support for · 5378d87c
      Leigh Stoller authored
      handling PXEWAKEUP timeouts, retrying 3 times and then forcing a power
      cycle.  Changed BOOTING event action to auto switch in and out of the
      special PXEKERNEL state machine that all local nodes use since all
      local nodes boot the same pxeboot kernel and talk to bootinfo (as
      directed to by dhcp).
      5378d87c
    • Leigh Stoller's avatar
      0d63b396
    • Leigh Stoller's avatar
      Death to proxydhcp; one less specialized daemon. DHCP will return the · 2b2b8ca1
      Leigh Stoller authored
      filename to boot, and all local nodes will boot the same pxeboot kernel,
      which has been extended to allow for jumping directly into a specific MFS
      (in addition to the usual testbed boot into a partition or multiboot
      kernel).
      
      Bootinfo and the bootwhat protocol extended to tell the client node what
      MFS to jump into directly, without a reboot. pxe_boot_path and
      next_pxe_boot_path are now deprecated, with bootinfo used to control which
      MFS to boot. Nodes now boot a single pxeboot kernel, and bootinfo tells
      them what to do next.
      
      Bootinfo greatly simplified. temp_boot_osid has been added to allow for
      temporary booting of different kernels (such as with node_admin or
      create_image). Unlike next_boot_osid which is a one-shot boot,
      temp_boot_osid causes the node to boot that OS until told not to.
      
      next_boot_path and def_boot_path in the nodes table are now ignored.
      Bootinfo gets path info strictly from the os_info table entry for the osid
      given in one of def_boot_osid, temp_boot_osid, or next_boot_osid.  This
      makes the selection of what to do in bootinfo a lot simpler (and for
      TBBootWhat in libdb). The os_info table was also modified to include an MFS
      flag so that bootinfo knows to tell the client that the path refers to an
      MFS and not a multiboot kernel.
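
A sketch of how the osid lookup might work. The message does not state which field wins when more than one is set, so the precedence below (next, then temp, then def) is an assumption, as are the `path` and `mfs` column names, the sample paths, and the `select_boot` helper.

```python
def select_boot(node, os_info):
    """Return what to boot for a node row, or None if nothing is set.

    Assumed precedence: next_boot_osid, temp_boot_osid, def_boot_osid.
    """
    for field in ("next_boot_osid", "temp_boot_osid", "def_boot_osid"):
        osid = node.get(field)
        if not osid:
            continue
        if field == "next_boot_osid":
            node[field] = None  # one-shot: cleared once it has been used
        entry = os_info[osid]
        # The MFS flag tells the client whether path is an MFS or a kernel.
        return {"osid": osid, "path": entry["path"], "mfs": entry["mfs"]}
    return None
```

For example, a node with temp_boot_osid set keeps getting that OS on every query, while a next_boot_osid entry is returned once and then falls through to the other fields.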
      
      Change to boot sequence; free nodes no longer boot into the default OSID.
      Instead, they are told to wait in pxeboot until told what to do, which
      will typically be when the node is allocated and a specific OSID
      picked. If the node needs to be reloaded, then the node is told to jump
      directly into the Frisbee MFS, which saves one complete reboot cycle
      whether the node has the requested OS installed, or not.  New program
      added called "bootinfosend" that is used by node_reboot to "wake up"
      nodes sitting in pxewait mode, so that they query bootinfo again and boot.
      
      node_reboot changed to look at the event state of a node, and use
      bootinfosend to wake up nodes, rather than power cycle, since pxeboot does
      not respond to pings. Retry (if the UDP packet is lost) is handled by
      stated.
      
      Event support added to bootinfo, to replace the event generation that was
      in proxydhcp. I have not included the caching that Mac had in proxydhcp
      since it does not appear that bootinfo packets are lost very
      often. Cleaned up all of the event and DB query code to use lib/libtb for
      DB access, and moved all of the event code into a separate file.  The
      event sequence when a node boots now looks like this:
      
      	'SHUTDOWN'    --> 'PXEBOOTING'  (BootInfo)
      	'PXEBOOTING'  --> 'PXEBOOTING'  (BootInfo Retry)
      	'PXEBOOTING'  --> 'BOOTING'     (Node Not Free)
      	'PXEBOOTING'  --> 'PXEWAIT'     (Node is Free)
      	'PXEWAIT'     --> 'PXEWAKEUP'   (Node Allocated)
      	'PXEWAKEUP'   --> 'PXEWAKEUP'   (Bootinfo Retry)
      	'PXEWAKEUP'   --> 'PXEBOOTING'  (Node Woke Up)
      
      Change stated to support resending PXEWAKEUP events when node times out.
      After 3 tries, node is power cycled. Other minor cleanup in stated.
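
The event sequence and the PXEWAKEUP retry rule can be written down as a small table-driven check. This is only an illustration of the transitions listed above, not stated's real implementation; `is_valid_transition` and `on_wakeup_timeout` are hypothetical names.

```python
# (old state, new state) -> reason, copied from the event sequence above.
TRANSITIONS = {
    ("SHUTDOWN", "PXEBOOTING"): "BootInfo",
    ("PXEBOOTING", "PXEBOOTING"): "BootInfo Retry",
    ("PXEBOOTING", "BOOTING"): "Node Not Free",
    ("PXEBOOTING", "PXEWAIT"): "Node is Free",
    ("PXEWAIT", "PXEWAKEUP"): "Node Allocated",
    ("PXEWAKEUP", "PXEWAKEUP"): "Bootinfo Retry",
    ("PXEWAKEUP", "PXEBOOTING"): "Node Woke Up",
}

MAX_WAKEUP_TRIES = 3  # after 3 tries the node is power cycled

def is_valid_transition(old_state, new_state):
    """True if the table allows moving from old_state to new_state."""
    return (old_state, new_state) in TRANSITIONS

def on_wakeup_timeout(tries_so_far):
    """Decide what to do when a node in PXEWAKEUP times out."""
    if tries_so_far < MAX_WAKEUP_TRIES:
        return "resend"
    return "power_cycle"
```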
      
      Clean up and simplify os_select, while adding support for temp_next_boot
      and removing all trace of def_boot_path and next_boot_path processing.
      Remove all pxe_boot_path and next_pxe_boot_path processing.  Changed
      command line interface to support "clearing" fields. For example,
      node_admin changed to call os_select like this to have the node
      temporarily boot the FreeBSD MFS:
      
      	os_select -t FREEBSD-MFS pcXXX
      
      which sets temp_boot_osid. To turn admin mode off:
      
      	os_select -c -t pcXXX
      
      which says to clear temp_boot_osid.
      
      sql/database-fill-supplemental.sql modified to add os_info table
      entries for the FreeBSD, Frisbee, and newnode MFS's.
      
      Be sure to change dhcpd config, restart dhcp, kill proxydhcp, restart
      bootinfo,
      2b2b8ca1
    • Leigh Stoller's avatar
      Add more node state machine constants. · 899be6af
      Leigh Stoller authored
      Add constants for the osids describing the FreeBSD and Frisbee MFSs.
      Complete redo of TBBootWhat to match the changes in bootinfo. Look
      there for description of new boot protocol (how TBBootWhat now works).
      899be6af
    • Leigh Stoller's avatar
      Add dbclose() routine. · 0bcb887a
      Leigh Stoller authored
      0bcb887a
    • Leigh Stoller's avatar
      c33a6d9b
    • Leigh Stoller's avatar
      Remove PXE stuff and replace with simple "filename" directive to have · f03b9c86
      Leigh Stoller authored
      clients load the pxeboot kernel. Proxydhcp is dead.
      f03b9c86
    • Shashi Guruprasad's avatar
      Another bug fix. The newly added $ns ip-connect instproc had a bug. The · b113d029
      Shashi Guruprasad authored
      code originally tried to do a normal $ns connect between traffic agents
      attached to simnodes on the same pnode. What I forgot, of course, is
      that a partitioned topology is quite disconnected, which means that a
      packet is forced to exit the pnode and come back to it (in many cases).
      In other words, a direct intra pnode path does not exist. The fix is
      to just use the IP address based routes always. A similar problem
      is encountered in pdns as well. However, since IP address based routing
      is not used, there is no simple fix unless I work on it!
      
      The 416 node topology testbed/nse416 is working alright. It mapped to
      20 pnodes and as soon as a whole bunch of traffic started up, 7 pnodes
      couldn't track real-time and caused a modify. Expt modify happened 3
      times but eventually max_retries in my re-swapping code was reached. Need
      more measuring, tuning as well as eventrate based re-swapping.
      b113d029
    • Shashi Guruprasad's avatar
      Fixed tcl-to-tcl reparsing while testing the 416 node topology. It was · 4b9acdc6
      Shashi Guruprasad authored
      a simple problem in the duplex-link instproc which caused the code for
      simnode creation to go to one pnode while an rlink from this simnode
      was mapped to another pnode.
      
      Also added $ns rtproto Manual for generated tcl code since IP address
      based routes are being added.
      4b9acdc6
  3. 10 Jan, 2004 1 commit
  4. 09 Jan, 2004 5 commits
  5. 08 Jan, 2004 3 commits