1. 14 Mar, 2007 1 commit
  2. 15 May, 2006 1 commit
  3. 15 Nov, 2004 1 commit
  4. 29 Oct, 2004 1 commit
    • Leigh Stoller authored · 0749ef9c
      Such a brutal ElabinElab hack ... When trying to swapin an actual
      experiment from the web interface, I ran into another control network
      problem, this time in bootinfo. When a node is sitting free, it waits
      in pxeboot for a bootinfo packet from boss to tell it what to do (this
      is different than when the node is allocated, and bootinfo tells it
      what to do in a reply to the initial request). In the PXEWAIT case, we
      *send* it a packet, addressed to its *control network* address, which,
      in the inner DB, is on the inner control network, but of course PXE is
      really using the outer control network, so packets addressed to the
      inner control network are never seen by pxeboot.
      
      This is the only (known) case of this happening, and rather than try
      for some general, over-engineered solution, I did something unusual,
      and put in a hack, ifdefed for ELABINELAB (meaning, it's an inner
      elab). I know, you're thinking, how could he have done such a thing,
      it's so unlike him!
      
      Well, it was damn easy! Anyway, this little hack checks the DB for an
      interface tagged as role='outer_ctrl' and uses that IP instead of the
      inner control network. When I create the inner DB from the outer DB, I
      was already leaving the outer control network in place so that
      bootinfo could find the proper node (again, cause the bootinfo request
      packets are coming from the outer control network, and so its IP would
      not match any nodes in the DB).
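
      The address selection described above can be sketched roughly as
      follows. This is only an illustration, not the actual bootinfo code:
      the function name, the dict layout, the 'ctrl' role name, and the
      fallback when no 'outer_ctrl' interface exists are all assumptions.
      Only role='outer_ctrl' comes from the commit message.

```python
def wakeup_address(interfaces, elabinelab):
    """Pick the IP that a PXEWAIT wakeup packet should be sent to.

    `interfaces` is a list of dicts with 'role' and 'ip' keys, a
    hypothetical stand-in for a node's rows in the interfaces table.
    """
    # Remember the first IP seen for each role.
    by_role = {}
    for iface in interfaces:
        by_role.setdefault(iface["role"], iface["ip"])

    # Inner elab: pxeboot is really listening on the outer control network.
    if elabinelab and "outer_ctrl" in by_role:
        return by_role["outer_ctrl"]

    # Otherwise (assumed behavior) use the ordinary control network address.
    return by_role.get("ctrl")


if __name__ == "__main__":
    node_ifaces = [
        {"role": "ctrl", "ip": "10.0.0.5"},
        {"role": "outer_ctrl", "ip": "192.0.2.5"},
    ]
    print(wakeup_address(node_ifaces, elabinelab=True))
    print(wakeup_address(node_ifaces, elabinelab=False))
```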
      
      I'd like to say that this is the last problem with swapin, but I see
      in my other window that the event scheduler failed to start on inner
      ops with some silly ssh permission denied error. What's that all
      about?
  5. 13 Oct, 2004 1 commit
  6. 21 Jan, 2004 1 commit
  7. 12 Jan, 2004 1 commit
    • Leigh Stoller authored · 2b2b8ca1
      Death to proxydhcp; one less specialized daemon. DHCP will return the
      filename to boot, and all local nodes will boot the same pxeboot kernel,
      which has been extended to allow for jumping directly into a specific MFS
      (in addition to the usual testbed boot into a partition or multiboot
      kernel).
      
      Bootinfo and the bootwhat protocol extended to tell the client node what
      MFS to jump into directly, without a reboot. pxe_boot_path and
      next_pxe_boot_path are now deprecated, with bootinfo used to control which
      MFS to boot. Nodes now boot a single pxeboot kernel, and bootinfo tells
      them what to do next.
      
      Bootinfo greatly simplified. temp_boot_osid has been added to allow for
      temporary booting of different kernels (such as with node_admin or
      create_image). Unlike next_boot_osid, which is a one-shot boot,
      temp_boot_osid causes the node to boot that OS until told not to.
      
      next_boot_path and def_boot_path in the nodes table are now ignored.
      Bootinfo gets path info strictly from the os_info table entry for the osid
      given in one of def_boot_osid, temp_boot_osid, or next_boot_osid.  This
      makes the selection of what to do in bootinfo a lot simpler (and for
      TBBootWhat in libdb). The os_info table also modified to include an MFS
      flag so that bootinfo knows to tell the client that the path refers to an
      MFS and not a multiboot kernel.
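
      A minimal sketch of that selection, for illustration only. The
      precedence (next_boot_osid, then temp_boot_osid, then def_boot_osid)
      is an assumption that is not spelled out here, and the 'path' and
      'mfs' keys are hypothetical names for the os_info columns. Clearing
      the one-shot next_boot_osid after use is not modeled.

```python
# Assumed precedence: one-shot first, then temporary, then default.
OSID_FIELDS = ("next_boot_osid", "temp_boot_osid", "def_boot_osid")


def boot_what(node, os_info):
    """Decide what a node should boot.

    `node` is a dict holding the three osid fields (None when unset).
    `os_info` maps osid -> {"path": str, "mfs": bool}, a hypothetical
    stand-in for the os_info table.
    """
    for field in OSID_FIELDS:
        osid = node.get(field)
        if osid:
            entry = os_info[osid]
            return {
                "osid": osid,
                "path": entry["path"],      # path comes only from os_info
                "mfs": bool(entry["mfs"]),  # tells the client to jump into an MFS
                "source": field,
            }
    return None  # nothing set for this node


if __name__ == "__main__":
    os_info = {
        "FREEBSD-MFS": {"path": "/tftpboot/freebsd", "mfs": True},
        "FBSD-STD": {"path": "", "mfs": False},
    }
    node = {"def_boot_osid": "FBSD-STD", "temp_boot_osid": "FREEBSD-MFS"}
    print(boot_what(node, os_info))
```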
      
      Change to boot sequence; free nodes no longer boot into the default OSID.
      Instead, they are told to wait in pxeboot until told what to do, which
      will typically be when the node is allocated and a specific OSID
      picked. If the node needs to be reloaded, then the node is told to jump
      directly into the Frisbee MFS, which saves one complete reboot cycle
      whether the node has the requested OS installed, or not.  New program
      added called "bootinfosend" that is used by node_reboot to "wake up"
      nodes sitting in pxewait mode, so that they query bootinfo again and boot.
      
      node_reboot changed to look at the event state of a node, and use
      bootinfosend to wake up nodes, rather than power cycle, since pxeboot does
      not respond to pings. Retry (if the UDP packet is lost) is handled by
      stated.
      
      Event support added to bootinfo, to replace the event generation that was
      in proxydhcp. I have not included the caching that Mac had in proxydhcp
      since it does not appear that bootinfo packets are lost very
      often. Cleaned up all of the event and DB query code to use lib/libtb for
      DB access, and moved all of the event code into a separate file.  The
      event sequence when a node boots now looks like this:
      
      	'SHUTDOWN'    --> 'PXEBOOTING'  (BootInfo)
      	'PXEBOOTING'  --> 'PXEBOOTING'  (BootInfo Retry)
      	'PXEBOOTING'  --> 'BOOTING'     (Node Not Free)
      	'PXEBOOTING'  --> 'PXEWAIT'     (Node is Free)
      	'PXEWAIT'     --> 'PXEWAKEUP'   (Node Allocated)
      	'PXEWAKEUP'   --> 'PXEWAKEUP'   (BootInfo Retry)
      	'PXEWAKEUP'   --> 'PXEBOOTING'  (Node Woke Up)
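
      The sequence above can be written down as a small transition table.
      This is just a sketch to make the allowed transitions explicit; the
      helper names are made up, and stated's real tables live in the DB.

```python
# (old state, new state) -> reason, copied from the sequence listed above.
TRANSITIONS = {
    ("SHUTDOWN", "PXEBOOTING"): "BootInfo",
    ("PXEBOOTING", "PXEBOOTING"): "BootInfo Retry",
    ("PXEBOOTING", "BOOTING"): "Node Not Free",
    ("PXEBOOTING", "PXEWAIT"): "Node is Free",
    ("PXEWAIT", "PXEWAKEUP"): "Node Allocated",
    ("PXEWAKEUP", "PXEWAKEUP"): "BootInfo Retry",
    ("PXEWAKEUP", "PXEBOOTING"): "Node Woke Up",
}


def is_valid(old, new):
    """True if old -> new is one of the listed transitions."""
    return (old, new) in TRANSITIONS


def check_sequence(states):
    """True if every consecutive pair in `states` is a listed transition."""
    return all(is_valid(a, b) for a, b in zip(states, states[1:]))


if __name__ == "__main__":
    # A free node that waits in pxeboot, gets allocated, then boots.
    boot = ["SHUTDOWN", "PXEBOOTING", "PXEWAIT",
            "PXEWAKEUP", "PXEBOOTING", "BOOTING"]
    print(check_sequence(boot))
```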
      
      Change stated to support resending PXEWAKEUP events when node times out.
      After 3 tries, node is power cycled. Other minor cleanup in stated.
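
      Roughly, the timeout handling looks like the sketch below. It is not
      the stated code: the function name and the returned action strings
      are hypothetical, and it assumes the initial wakeup counts as the
      first of the 3 tries.

```python
MAX_WAKEUP_TRIES = 3  # "After 3 tries, node is power cycled."


def on_pxewakeup_timeout(tries_so_far):
    """Decide what to do when a node in PXEWAKEUP times out.

    `tries_so_far` is how many wakeups have already been sent, counting
    the initial one (an assumption). Returns (action, new_try_count).
    """
    if tries_so_far < MAX_WAKEUP_TRIES:
        # Wakeup packet was probably lost; send PXEWAKEUP again.
        return ("resend_wakeup", tries_so_far + 1)
    # Give up on waking the node and power cycle it; the count starts over.
    return ("power_cycle", 0)


if __name__ == "__main__":
    tries = 1  # the initial wakeup has been sent
    while True:
        action, tries = on_pxewakeup_timeout(tries)
        print(action)
        if action == "power_cycle":
            break
```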
      
      Clean up and simplify os_select, while adding support for temp_next_boot
      and removing all trace of def_boot_path and next_boot_path processing.
      Remove all pxe_boot_path and next_pxe_boot_path processing.  Changed
      command line interface to support "clearing" fields. For example,
      node_admin changed to call os_select like this to have the node
      temporarily boot the FreeBSD MFS:
      
      	os_select -t FREEBSD-MFS pcXXX
      
      which sets temp_boot_osid. To turn admin mode off:
      
      	os_select -c -t pcXXX
      
      which says to clear temp_boot_osid.
      
      sql/database-fill-supplemental.sql modified to add os_info table
      entries for the FreeBSD, Frisbee, and newnode MFS's.
      
      Be sure to change dhcpd config, restart dhcp, kill proxydhcp, restart
      bootinfo,
  8. 30 Jan, 2003 2 commits
  9. 18 Oct, 2002 1 commit
    • Mac Newbold authored · 5c961517
      Merge the newstated branch with the main tree.
      Changes to watch out for:
      
      - db calls that change boot info in nodes table are now calls to os_select
      
      - whenever you want to change a node's pxe boot info, or def or next boot
      osids or paths, use os_select.
      
      - when you need to wait for a node to reach some point in the boot process
      (like ISUP), check the state in the database using the lib calls
      
      - Proxydhcp now sends a BOOTING state for each node that it talks to.
      
      - OSs that don't send ISUP will have one generated for them by stated
      either when they ping (if they support ping) or immediately after they get
      to BOOTING.
      
      - States now have timeouts. Actions aren't currently carried out, but they
      will be soon. If you notice problems here, let me know... we're still
      tuning it. (Before, all timeouts were set to "none" in the db.)
      
      One temporary change:
      
      - While I make our new free node manager daemon (freed), all nodes are
      forced into reloading when they're nfreed and the calls to reset the os
      are disabled (that will move into freed).
  10. 20 Sep, 2002 1 commit
  11. 07 Jul, 2002 1 commit
  12. 22 Jun, 2001 1 commit
  13. 06 Jun, 2001 2 commits
  14. 09 May, 2001 1 commit
  15. 03 May, 2001 1 commit
  16. 02 Jan, 2001 1 commit
  17. 14 Dec, 2000 1 commit
  18. 13 Dec, 2000 1 commit
  19. 12 Dec, 2000 1 commit
  20. 20 Oct, 2000 1 commit
  21. 03 Oct, 2000 1 commit
  22. 22 Sep, 2000 1 commit