- 15 Nov, 2004 1 commit
-
-
Leigh B. Stoller authored
hack to deal with inner vs outer control network.
-
- 29 Oct, 2004 1 commit
-
-
Leigh B. Stoller authored
experiment from the web interface, I ran into another control network problem, this time in bootinfo. When a node is sitting free, it waits in pxeboot for a bootinfo packet from boss to tell it what to do (this is different than when the node is allocated, and bootinfo tells it what to do in a reply to the initial request). In the PXEWAIT case, we *send* it a packet, addressed to its *control network* address, which in the inner DB is on the inner control network. But of course PXE is really using the outer control network, so packets addressed to the inner control network are never seen by pxeboot.

This is the only (known) case of this happening, and rather than try for some general, over-engineered solution, I did something unusual and put in a hack, ifdefed for ELABINELAB (meaning, it's an inner elab). I know, you're thinking, how could he have done such a thing, it's so unlike him! Well, it was damn easy! Anyway, this little hack checks the DB for an interface tagged as role='outer_ctrl' and uses that IP instead of the inner control network IP. When I create the inner DB from the outer DB, I was already leaving the outer control network in place so that bootinfo could find the proper node (again, because the bootinfo request packets are coming from the outer control network, and otherwise their source IP would not match any node in the DB).

I'd like to say that this is the last problem with swapin, but I see in my other window that the event scheduler failed to start on inner ops with some silly ssh permission denied error. What's that all about?
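
As a rough illustration of the address selection described above (not the actual bootinfo code, which is C and ifdefed on ELABINELAB), here is a minimal Python sketch. The `interfaces` table layout with `node_id`, `IP`, and `role` columns, the function names, the sample addresses, and the fallback to the `ctrl` interface when no `outer_ctrl` entry exists are all assumptions made for the example.

```python
import sqlite3

def wakeup_address(db, node_id, elabinelab):
    """Return the IP a PXEWAIT wakeup packet should be sent to, or None."""
    roles = ["ctrl"]
    if elabinelab:
        # Inner elab: pxeboot is listening on the outer control network,
        # so look for an interface tagged role='outer_ctrl' first.
        roles.insert(0, "outer_ctrl")
    for role in roles:
        row = db.execute(
            "SELECT IP FROM interfaces WHERE node_id = ? AND role = ?",
            (node_id, role),
        ).fetchone()
        if row:
            return row[0]
    return None

def demo_db():
    """Build a tiny in-memory DB with hypothetical sample data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE interfaces (node_id TEXT, IP TEXT, role TEXT)")
    db.executemany(
        "INSERT INTO interfaces VALUES (?, ?, ?)",
        [
            ("pc1", "10.0.0.5", "ctrl"),           # inner control network
            ("pc1", "192.168.1.5", "outer_ctrl"),  # outer control network
        ],
    )
    return db
```

With the sample data, an inner elab build would address the wakeup packet to the `outer_ctrl` IP, while a regular build would keep using the `ctrl` IP.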
-
- 21 Jan, 2004 1 commit
-
-
Leigh B. Stoller authored
-
- 12 Jan, 2004 3 commits
-
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
filename to boot, and all local nodes will boot the same pxeboot kernel, which has been extended to allow for jumping directly into a specific MFS (in addition to the usual testbed boot into a partition or multiboot kernel).

Bootinfo and the bootwhat protocol have been extended to tell the client node what MFS to jump into directly, without a reboot.

pxe_boot_path and next_pxe_boot_path are now deprecated, with bootinfo used to control which MFS to boot. Nodes now boot a single pxeboot kernel, and bootinfo tells them what to do next.

Bootinfo greatly simplified. temp_boot_osid has been added to allow for temporary booting of different kernels (such as with node_admin or create_image). Unlike next_boot_osid, which is a one-shot boot, temp_boot_osid causes the node to boot that OS until told not to.

next_boot_path and def_boot_path in the nodes table are now ignored. Bootinfo gets path info strictly from the os_info table entry for the osid given in one of def_boot_osid, temp_boot_osid, or next_boot_osid. This makes the selection of what to do in bootinfo a lot simpler (and for TBBootWhat in libdb).

The os_info table has also been modified to include an MFS flag so that bootinfo knows to tell the client that the path refers to an MFS and not a multiboot kernel.

Change to boot sequence: free nodes no longer boot into the default OSID. Instead, they are told to wait in pxeboot until told what to do, which will typically be when the node is allocated and a specific OSID picked. If the node needs to be reloaded, then the node is told to jump directly into the Frisbee MFS, which saves one complete reboot cycle whether or not the node has the requested OS installed.

New program added called "bootinfosend" that is used by node_reboot to "wake up" nodes sitting in pxewait mode, so that they query bootinfo again and boot. node_reboot changed to look at the event state of a node, and use bootinfosend to wake up nodes, rather than power cycle, since pxeboot does not respond to pings.
Retry (if the UDP packet is lost) is handled by stated.

Event support added to bootinfo, to replace the event generation that was in proxydhcp. I have not included the caching that Mac had in proxydhcp since it does not appear that bootinfo packets are lost very often. Cleaned up all of the event and DB query code to use lib/libtb for DB access, and moved all of the event code into a separate file.

The event sequence when a node boots now looks like this:

    'SHUTDOWN'   --> 'PXEBOOTING' (BootInfo)
    'PXEBOOTING' --> 'PXEBOOTING' (BootInfo Retry)
    'PXEBOOTING' --> 'BOOTING'    (Node Not Free)
    'PXEBOOTING' --> 'PXEWAIT'    (Node is Free)
    'PXEWAIT'    --> 'PXEWAKEUP'  (Node Allocated)
    'PXEWAKEUP'  --> 'PXEWAKEUP'  (Bootinfo Retry)
    'PXEWAKEUP'  --> 'PXEBOOTING' (Node Woke Up)

Changed stated to support resending PXEWAKEUP events when a node times out. After 3 tries, the node is power cycled. Other minor cleanup in stated.

Clean up and simplify os_select, while adding support for temp_next_boot and removing all trace of def_boot_path and next_boot_path processing. Remove all pxe_boot_path and next_pxe_boot_path processing. Changed command line interface to support "clearing" fields. For example, node_admin changed to call os_select like this to have the node temporarily boot the FreeBSD MFS:

    os_select -t FREEBSD-MFS pcXXX

which sets temp_boot_osid. To turn admin mode off:

    os_select -c -t pcXXX

which says to clear temp_boot_osid.

sql/database-fill-supplemental.sql modified to add os_info table entries for the FreeBSD, Frisbee, and newnode MFSs.

Be sure to change dhcpd config, restart dhcp, kill proxydhcp, restart bootinfo,
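
The boot event sequence and the PXEWAKEUP retry rule from this commit message can be sketched as a small transition table. This is only an illustration in Python, not how stated is implemented; the names `BOOT_TRANSITIONS`, `is_valid`, and `on_wakeup_timeout` are hypothetical, and reading "after 3 tries" as "three wakeup packets already sent" is an assumption.

```python
# (old state, new state) -> trigger, copied from the commit message.
BOOT_TRANSITIONS = {
    ("SHUTDOWN", "PXEBOOTING"): "BootInfo",
    ("PXEBOOTING", "PXEBOOTING"): "BootInfo Retry",
    ("PXEBOOTING", "BOOTING"): "Node Not Free",
    ("PXEBOOTING", "PXEWAIT"): "Node is Free",
    ("PXEWAIT", "PXEWAKEUP"): "Node Allocated",
    ("PXEWAKEUP", "PXEWAKEUP"): "Bootinfo Retry",
    ("PXEWAKEUP", "PXEBOOTING"): "Node Woke Up",
}

# Assumed interpretation of "After 3 tries, node is power cycled."
MAX_WAKEUP_TRIES = 3

def is_valid(old_state, new_state):
    """True if the transition appears in the boot event sequence."""
    return (old_state, new_state) in BOOT_TRANSITIONS

def on_wakeup_timeout(tries_so_far):
    """Decide what to do when a node in PXEWAKEUP times out.

    tries_so_far is the number of wakeup packets already sent.
    """
    if tries_so_far < MAX_WAKEUP_TRIES:
        return "resend"       # send another PXEWAKEUP
    return "power_cycle"      # give up on waking the node and power cycle it
```

For example, `is_valid("PXEWAIT", "BOOTING")` is false because a free node has to be woken up and pass through PXEBOOTING again before it can boot an OS.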
-