1. 12 Jan, 2004 18 commits
    • Leigh Stoller authored · 57301970
      Use whiteball for free nodes since they otherwise would look down (no
      isalive reported from pxeboot kernel when node is free).
    • Leigh Stoller authored · ce0bf7e7
      Add another state to the PXEKERNEL state machine. After rebooting all
      of the free nodes at once (113 nodes), three nodes lost bootinfo reply
      packets (one time each), causing them to retry, which was an invalid
      state transition (PXEWAIT to PXEBOOTING).
    • Leigh Stoller's avatar
      2006b0c3
    • Leigh Stoller authored · 7caae589
      Proxydhcp is dead.
    • Mike Hibler authored · 02905695
      I set out to make this the definitive document on vnodes, but aborted that.
      Basically, I just updated it and changed it from a chronology to a summary
      (i.e., collected all the jail features into one list).
    • Mike Hibler authored · be548710
      Incorporate description of jail changes from Leigh's jail.html file.
      Fix a few nits.
    • Leigh Stoller authored · 71385a45
      Turn debug mode back off by default.
    • Leigh Stoller's avatar
      aeb80b9c
    • Leigh Stoller authored · 5378d87c
      Hmm, this file was dropped from the previous commit. Added support for
      handling PXEWAKEUP timeouts, retrying 3 times and then forcing a power
      cycle.  Changed BOOTING event action to auto switch in and out of the
      special PXEKERNEL state machine that all local nodes use, since all
      local nodes boot the same pxeboot kernel and talk to bootinfo (as
      directed to by dhcp).
    • Leigh Stoller's avatar
      0d63b396
    • Leigh Stoller authored · 2b2b8ca1
      Death to proxydhcp; one less specialized daemon. DHCP will return the
      filename to boot, and all local nodes will boot the same pxeboot kernel,
      which has been extended to allow for jumping directly into a specific MFS
      (in addition to the usual testbed boot into a partition or multiboot
      kernel).
      
      Bootinfo and the bootwhat protocol extended to tell the client node what
      MFS to jump into directly, without a reboot. pxe_boot_path and
      next_pxe_boot_path are now deprecated, with bootinfo used to control which
      MFS to boot. Nodes now boot a single pxeboot kernel, and bootinfo tells
      them what to do next.
      
      Bootinfo greatly simplified. temp_boot_osid has been added to allow for
      temporary booting of different kernels (such as with node_admin or
      create_image). Unlike next_boot_osid, which is a one-shot boot,
      temp_boot_osid causes the node to boot that OS until told not to.
      
      next_boot_path and def_boot_path in the nodes table are now ignored.
      Bootinfo gets path info strictly from the os_info table entry for the osid
      given in one of def_boot_osid, temp_boot_osid, or next_boot_osid.  This
      makes the selection of what to do in bootinfo a lot simpler (and for
      TBBootWhat in libdb). The os_info table was also modified to include an MFS
      flag so that bootinfo knows to tell the client that the path refers to an
      MFS and not a multiboot kernel.
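
      A minimal sketch of the selection described above, in Python for
      illustration only. The precedence (next_boot_osid, then temp_boot_osid,
      then def_boot_osid) is an assumption, as are the function name, the
      dict layout, and the "path"/"mfs" keys; the real logic lives in bootinfo
      and TBBootWhat.

```python
def select_boot(node, os_info):
    """Pick the os_info entry that bootinfo would hand to a node.

    node:    dict that may hold next_boot_osid, temp_boot_osid, def_boot_osid.
    os_info: dict mapping osid -> {"path": str, "mfs": bool} (hypothetical layout).

    The lookup order below is an assumption, not taken from the commit.
    Returns (field, osid, path, is_mfs), or None when no osid is set.
    """
    for field in ("next_boot_osid", "temp_boot_osid", "def_boot_osid"):
        osid = node.get(field)
        if osid:
            entry = os_info[osid]
            # The MFS flag tells the client whether the path is an MFS
            # or a multiboot kernel.
            return field, osid, entry["path"], entry["mfs"]
    return None
```

      Clearing the one-shot next_boot_osid after use is left out of the sketch.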
      
      Change to boot sequence; free nodes no longer boot into the default OSID.
      Instead, they are told to wait in pxeboot until told what to do, which
      will typically be when the node is allocated and a specific OSID
      picked. If the node needs to be reloaded, then the node is told to jump
      directly into the Frisbee MFS, which saves one complete reboot cycle
      whether or not the node has the requested OS installed.  New program
      added called "bootinfosend" that is used by node_reboot to "wake up"
      nodes sitting in pxewait mode, so that they query bootinfo again and boot.
      
      node_reboot changed to look at the event state of a node, and use
      bootinfosend to wake up nodes, rather than power cycle, since pxeboot does
      not respond to pings. Retry (if the UDP packet is lost) is handled by
      stated.
      
      Event support added to bootinfo, to replace the event generation that was
      in proxydhcp. I have not included the caching that Mac had in proxydhcp
      since it does not appear that bootinfo packets are lost very
      often. Cleaned up all of the event and DB query code to use lib/libtb for
      DB access, and moved all of the event code into a separate file.  The
      event sequence when a node boots now looks like this:
      
      	'SHUTDOWN'   --> 'PXEBOOTING'  (BootInfo)
      	'PXEBOOTING' --> 'PXEBOOTING'  (BootInfo Retry)
      	'PXEBOOTING' --> 'BOOTING'     (Node Not Free)
      	'PXEBOOTING' --> 'PXEWAIT'     (Node is Free)
      	'PXEWAIT'    --> 'PXEWAKEUP'   (Node Allocated)
      	'PXEWAKEUP'  --> 'PXEWAKEUP'   (BootInfo Retry)
      	'PXEWAKEUP'  --> 'PXEBOOTING'  (Node Woke Up)
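
      The sequence above can be sketched as a lookup table, in Python for
      illustration only. The names PXEKERNEL_TRANSITIONS, is_valid_transition,
      and check_sequence are assumptions; the table holds only the
      transitions listed in this message.

```python
# (old_state, new_state) -> trigger, copied from the list in the message.
PXEKERNEL_TRANSITIONS = {
    ("SHUTDOWN", "PXEBOOTING"): "BootInfo",
    ("PXEBOOTING", "PXEBOOTING"): "BootInfo Retry",
    ("PXEBOOTING", "BOOTING"): "Node Not Free",
    ("PXEBOOTING", "PXEWAIT"): "Node is Free",
    ("PXEWAIT", "PXEWAKEUP"): "Node Allocated",
    ("PXEWAKEUP", "PXEWAKEUP"): "BootInfo Retry",
    ("PXEWAKEUP", "PXEBOOTING"): "Node Woke Up",
}


def is_valid_transition(old_state, new_state):
    """True if the transition appears in the table above."""
    return (old_state, new_state) in PXEKERNEL_TRANSITIONS


def check_sequence(states):
    """Return the first invalid (old, new) pair in a list of states, or None."""
    for old, new in zip(states, states[1:]):
        if not is_valid_transition(old, new):
            return (old, new)
    return None
```

      With only these entries, a node that retries bootinfo while in PXEWAIT
      (PXEWAIT to PXEBOOTING) is reported as an invalid transition.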
      
      Change stated to support resending PXEWAKEUP events when node times out.
      After 3 tries, node is power cycled. Other minor cleanup in stated.
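
      A rough sketch of that retry policy, in Python for illustration only.
      The function name and the callbacks are assumptions, and so is the
      exact counting: here the first wakeup counts as try 1, and the power
      cycle happens on the timeout that follows the third wakeup.

```python
MAX_WAKEUP_TRIES = 3  # "After 3 tries, node is power cycled"


def handle_wakeup_timeout(tries_so_far, resend_wakeup, power_cycle):
    """Handle one PXEWAKEUP timeout for a node.

    tries_so_far:  wakeups already sent to this node (assumed counting).
    resend_wakeup: callable that sends another wakeup (hypothetical hook).
    power_cycle:   callable that power cycles the node (hypothetical hook).
    Returns the updated try count, reset to 0 after a power cycle.
    """
    if tries_so_far < MAX_WAKEUP_TRIES:
        resend_wakeup()
        return tries_so_far + 1
    power_cycle()
    return 0
```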
      
      Clean up and simplify os_select, while adding support for temp_next_boot
      and removing all trace of def_boot_path and next_boot_path processing.
      Remove all pxe_boot_path and next_pxe_boot_path processing.  Changed
      command line interface to support "clearing" fields. For example,
      node_admin changed to call os_select like this to have the node
      temporarily boot the FreeBSD MFS:
      
      	os_select -t FREEBSD-MFS pcXXX
      
      which sets temp_boot_osid. To turn admin mode off:
      
      	os_select -c -t pcXXX
      
      which says to clear temp_boot_osid.
      
      sql/database-fill-supplemental.sql modified to add os_info table
      entries for the FreeBSD, Frisbee, and newnode MFSs.
      
      Be sure to change dhcpd config, restart dhcp, kill proxydhcp, restart
      bootinfo,
    • Leigh Stoller authored · 899be6af
      Add more node state machine constants.
      Add constants for the osids describing the FreeBSD and Frisbee MFSs.
      Complete redo of TBBootWhat to match the changes in bootinfo. Look
      there for description of new boot protocol (how TBBootWhat now works).
    • Leigh Stoller authored · 0bcb887a
      Add dbclose() routine.
    • Leigh Stoller's avatar
      c33a6d9b
    • Leigh Stoller authored · f03b9c86
      Remove PXE stuff and replace with simple "filename" directive to have
      clients load the pxeboot kernel. Proxydhcp is dead.
    • Shashi Guruprasad authored · b113d029
      Another bug fix. The newly added $ns ip-connect instproc had a bug. The
      code originally tried to do a normal $ns connect between traffic agents
      attached to simnodes on the same pnode. The problem, which I forgot of course,
      is that a partitioned topology is quite disconnected, which means that a
      packet is forced to exit the pnode and come back to it (in many cases).
      In other words, a direct intra pnode path does not exist. The fix is
      to just use the IP address based routes always. A similar problem
      is encountered in pdns as well. However, since IP address based routing
      is not used, there is no simple fix unless I work on it!
      
      The 416 node topology testbed/nse416 is working alright. It mapped to
      20 pnodes and as soon as a whole bunch of traffic started up, 7 pnodes
      couldn't track real-time and caused a modify. Expt modify happened 3
      times but eventually max_retries in my re-swapping code was reached. Need
      more measuring, tuning as well as eventrate based re-swapping.
    • Shashi Guruprasad authored · 4b9acdc6
      Fixed tcl-to-tcl reparsing while testing the 416 node topology. It was
      a simple problem in the duplex-link instproc which caused the code for
      simnode creation to go to one pnode while an rlink from this simnode
      was mapped to another pnode.
      
      Also added $ns rtproto Manual for generated tcl code since IP address
      based routes are being added.
  2. 10 Jan, 2004 1 commit
  3. 09 Jan, 2004 5 commits
  4. 08 Jan, 2004 3 commits
  5. 07 Jan, 2004 3 commits
    • Leigh Stoller authored · 58bb3ced
      Fix minor bug I introduced a long time ago, that would show up only if
      you typed the URL directly instead of indirecting from the project
      page. No one did that till today.
    • Leigh Stoller authored · cf61f6f3
      A set of debugging changes to allow running multiple stateds. This is
      probably imperfect, but better than nothing. New option, "-t tag"
      allows you to specify an arbitrary tag to match against the stated_tag
      of the nodes table. The stated invocation will only operate on nodes
      that match the tag, ignoring all events for other nodes. If
      unspecified, stated will operate on all nodes with a NULL tag. This is
      set up at the beginning of time (or during a reload), saving the
      per-node tag in the $nodes hash. Each time an event arrives, stated checks
      the tag in the table, ignoring the event if it is not a match.
      
      On a signaled reload(), stated must also be careful to throw away timeouts
      from the queue (and be careful not to set up new timeouts for ignored
      nodes).  So, this allows you to set the tag for a node in the DB, and
      then HUP stated so that it reloads its tables. That node will now be
      ignored by that stated.
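
      The tag matching and the reload-time pruning can be sketched as follows,
      in Python for illustration only. The function names and the
      (node_id, value) tuples are assumptions; None stands in for a NULL
      stated_tag or for running without -t.

```python
def handles(node_tag, my_tag=None):
    """True if a stated run with tag my_tag should operate on this node.

    A stated started without -t (my_tag is None) matches only NULL tags.
    """
    return node_tag == my_tag


def reload_tables(rows, pending_timeouts, my_tag=None):
    """Rebuild the per-node tag table and drop timeouts for ignored nodes.

    rows:             iterable of (node_id, stated_tag) from the nodes table.
    pending_timeouts: iterable of (node_id, when) already queued.
    Returns (tags, kept_timeouts).
    """
    tags = {node_id: tag for node_id, tag in rows}
    kept = [
        (node_id, when)
        for node_id, when in pending_timeouts
        if node_id in tags and handles(tags[node_id], my_tag)
    ]
    return tags, kept


def on_event(node_id, tags, my_tag=None):
    """True if an incoming event for node_id should be processed."""
    return node_id in tags and handles(tags[node_id], my_tag)
```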
      
      Also made some changes to debug mode. In debug mode, don't worry about
      the pidfile or the lockfile or checking for other running stated
      (which causes my debug version to exit right away!). Also, added a new
      -l option to turn off syslog output and just send it all to stdout with
      the debug output. -l can only be used with -d, of course.
      
      So what can I do with all this:
      
      	update nodes set stated_tag='lbs' where node_id='pc5';
      	sudo kill -HUP `cat /var/run/stated.pid`
      	sudo stated -d -l -t lbs
      
      Which tells the main stated to ignore pc5. Then I run a debugging
      stated that operates only on pc5. Later when done:
      
      	update nodes set stated_tag=NULL where node_id='pc5';
      	sudo kill -HUP `cat /var/run/stated.pid`
      
      Which tells the main stated to operate on pc5 again.
    • Shashi Guruprasad authored · ac01c40b
      Yet another bugfix + code to call a function that sends NSESWAP event
      when it cannot keep up with real-time.
      
      bug: This affected encapsulated simulator packets that had to cross
      multiple physical nodes before arriving at the destination simulator
      traffic agent. This bug didn't affect live packets from traffic sources
      on real PCs.
      
      The NSESWAP event is now sent via the tevc command. The nse scheduler
      waits for the slop factor (diff between clock and event dispatch time
      that exceeds a threshold) to be crossed multiple times in a second
      before sending the NSESWAP event. Currently 5 times in 1 second.
      However, this needs more careful thought and will get modified later.
      When is it really necessary to declare that an nse is overloaded?
      i.e. what is the right slop factor? How many times can we tolerate
      that the slop factor is exceeded to ensure end-to-end performance
      is within a certain percentage of the expected?
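
      One way to picture the "5 times in 1 second" rule is a sliding window
      over the times at which the slop threshold was exceeded. This is a
      Python sketch for illustration only: the class name, its parameters,
      and the use of integer milliseconds are assumptions, and the threshold
      value itself is not given here.

```python
from collections import deque


class OverloadDetector:
    """Signal overload when slop exceeds a threshold often enough."""

    def __init__(self, slop_threshold, max_hits=5, window=1000):
        self.slop_threshold = slop_threshold  # same unit as the timestamps
        self.max_hits = max_hits              # "Currently 5 times"
        self.window = window                  # "in 1 second" (ms here)
        self.hits = deque()                   # times the threshold was crossed

    def record(self, clock_time, dispatch_time):
        """Record one dispatch; return True when NSESWAP should be sent."""
        # Slop is how far the clock has run past the scheduled dispatch time.
        if clock_time - dispatch_time > self.slop_threshold:
            self.hits.append(clock_time)
        # Forget crossings that fell out of the window.
        while self.hits and clock_time - self.hits[0] > self.window:
            self.hits.popleft()
        return len(self.hits) >= self.max_hits
```

      Tuning the slop factor and the hit count changes only the constructor
      arguments, which keeps the open questions above separate from the
      counting itself.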
  6. 06 Jan, 2004 9 commits
  7. 05 Jan, 2004 1 commit