- 12 Jan, 2004 25 commits
-
-
Shashi Guruprasad authored
of -AX)
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Mike Hibler authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
isalive reported from pxeboot kernel when node is free).
-
Leigh B. Stoller authored
of the free nodes at once (113 nodes) three nodes lost bootinfo reply packets (one time each) causing them to retry, which was an invalid state transition (PXEWAIT to PXEBOOTING).
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Mike Hibler authored
Basically, I just updated it and changed it from a chronology to a summary (i.e., collected all the jail features into one list).
-
Mike Hibler authored
Fix a few nits.
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
handling PXEWAKEUP timeouts, retrying 3 times and then forcing a power cycle. Changed BOOTING event action to automatically switch in and out of the special PXEKERNEL state machine that all local nodes use, since all local nodes boot the same pxeboot kernel and talk to bootinfo (as directed by dhcp).
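The retry policy described above can be sketched roughly as follows. This is an illustrative Python sketch, not the stated code itself: the function name `handle_wakeup_timeout` and the `resend_wakeup` and `power_cycle` callbacks are hypothetical, and only the limit of three retries followed by a power cycle comes from the message. Whether the power cycle happens on the third or the fourth timeout is an assumption here.

```python
MAX_WAKEUP_RETRIES = 3  # from the commit message: retry 3 times, then power cycle


def handle_wakeup_timeout(node, retries, resend_wakeup, power_cycle):
    """Handle one wakeup timeout for `node`.

    `retries` maps node id -> number of wakeups already resent.
    `resend_wakeup` and `power_cycle` are caller-supplied callbacks
    (hypothetical stand-ins for whatever stated actually invokes).
    Returns the action taken, which is handy for logging.
    """
    count = retries.get(node, 0)
    if count < MAX_WAKEUP_RETRIES:
        # Still within the retry budget: resend the wakeup and count it.
        retries[node] = count + 1
        resend_wakeup(node)
        return "retry"
    # Retries exhausted: forget the counter and force a power cycle.
    retries.pop(node, None)
    power_cycle(node)
    return "power_cycle"
```

Keeping the counter in a plain dict keyed by node id means a reload can simply drop the dict to reset all retry state.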
-
Leigh B. Stoller authored
number.
-
Leigh B. Stoller authored
filename to boot, and all local nodes will boot the same pxeboot kernel, which has been extended to allow for jumping directly into a specific MFS (in addition to the usual testbed boot into a partition or multiboot kernel).

Bootinfo and the bootwhat protocol extended to tell the client node what MFS to jump into directly, without a reboot. pxe_boot_path and next_pxe_boot_path are now deprecated, with bootinfo used to control which MFS to boot. Nodes now boot a single pxeboot kernel, and bootinfo tells them what to do next.

Bootinfo greatly simplified. temp_boot_osid has been added to allow for temporary booting of different kernels (such as with node_admin or create_image). Unlike next_boot_osid, which is a one-shot boot, temp_boot_osid causes the node to boot that OS until told not to. next_boot_path and def_boot_path in the nodes table are now ignored. Bootinfo gets path info strictly from the os_info table entry for the osid given in one of def_boot_osid, temp_boot_osid, or next_boot_osid. This makes the selection of what to do in bootinfo a lot simpler (and for TBBootWhat in libdb). The os_info table also modified to include an MFS flag so that bootinfo knows to tell the client that the path refers to an MFS and not a multiboot kernel.

Change to boot sequence: free nodes no longer boot into the default OSID. Instead, they are told to wait in pxeboot until told what to do, which will typically be when the node is allocated and a specific OSID picked. If the node needs to be reloaded, then the node is told to jump directly into the Frisbee MFS, which saves one complete reboot cycle whether the node has the requested OS installed or not.

New program added called "bootinfosend" that is used by node_reboot to "wake up" nodes sitting in pxewait mode, so that they query bootinfo again and boot. node_reboot changed to look at the event state of a node, and use bootinfosend to wake up nodes, rather than power cycle, since pxeboot does not respond to pings. Retry (if the UDP packet is lost) is handled by stated.

Event support added to bootinfo, to replace the event generation that was in proxydhcp. I have not included the caching that Mac had in proxydhcp since it does not appear that bootinfo packets are lost very often. Cleaned up all of the event and DB query code to use lib/libtb for DB access, and moved all of the event code into a separate file. The event sequence when a node boots now looks like this:

    'SHUTDOWN'   --> 'PXEBOOTING' (BootInfo)
    'PXEBOOTING' --> 'PXEBOOTING' (BootInfo Retry)
    'PXEBOOTING' --> 'BOOTING'    (Node Not Free)
    'PXEBOOTING' --> 'PXEWAIT'    (Node is Free)
    'PXEWAIT'    --> 'PXEWAKEUP'  (Node Allocated)
    'PXEWAKEUP'  --> 'PXEWAKEUP'  (Bootinfo Retry)
    'PXEWAKEUP'  --> 'PXEBOOTING' (Node Woke Up)

Change stated to support resending PXEWAKEUP events when node times out. After 3 tries, node is power cycled. Other minor cleanup in stated.

Clean up and simplify os_select, while adding support for temp_next_boot and removing all trace of def_boot_path and next_boot_path processing. Remove all pxe_boot_path and next_pxe_boot_path processing. Changed command line interface to support "clearing" fields. For example, node_admin changed to call os_select like this to have the node temporarily boot the FreeBSD MFS:

    os_select -t FREEBSD-MFS pcXXX

which sets temp_boot_osid. To turn admin mode off:

    os_select -c -t pcXXX

which says to clear temp_boot_osid.

sql/database-fill-supplemental.sql modified to add os_info table entries for the FreeBSD, Frisbee, and newnode MFSs.

Be sure to change dhcpd config, restart dhcp, kill proxydhcp, restart bootinfo,
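As a reading aid, the event sequence listed in this message can be written as a transition table. The Python sketch below is illustrative only: the table entries are copied from the message, while the names `TRANSITIONS` and `check_transition` are hypothetical and do not come from stated or bootinfo.

```python
# Allowed (old_state, new_state) pairs and the reason given for each,
# copied from the event sequence in the commit message.
TRANSITIONS = {
    ("SHUTDOWN", "PXEBOOTING"): "BootInfo",
    ("PXEBOOTING", "PXEBOOTING"): "BootInfo Retry",
    ("PXEBOOTING", "BOOTING"): "Node Not Free",
    ("PXEBOOTING", "PXEWAIT"): "Node is Free",
    ("PXEWAIT", "PXEWAKEUP"): "Node Allocated",
    ("PXEWAKEUP", "PXEWAKEUP"): "Bootinfo Retry",
    ("PXEWAKEUP", "PXEBOOTING"): "Node Woke Up",
}


def check_transition(old_state, new_state):
    """Return the reason label for a listed transition, or None if unlisted."""
    return TRANSITIONS.get((old_state, new_state))


def walk(states):
    """Check a whole sequence of states; return the list of reason labels.

    Raises ValueError at the first transition that is not in the table.
    """
    reasons = []
    for old_state, new_state in zip(states, states[1:]):
        reason = check_transition(old_state, new_state)
        if reason is None:
            raise ValueError(f"invalid transition {old_state} -> {new_state}")
        reasons.append(reason)
    return reasons
```

For example, `walk(["SHUTDOWN", "PXEBOOTING", "PXEWAIT", "PXEWAKEUP", "PXEBOOTING", "BOOTING"])` traces a free node that is later allocated and woken up.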
-
Leigh B. Stoller authored
Add constants for the osids describing the FreeBSD and Frisbee MFSs. Complete redo of TBBootWhat to match the changes in bootinfo. Look there for description of new boot protocol (how TBBootWhat now works).
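A rough Python sketch of the kind of lookup this implies, based on the bootinfo message elsewhere in this log, which says the path comes strictly from the os_info entry for one of def_boot_osid, temp_boot_osid, or next_boot_osid. The precedence order used here (next, then temp, then def) is an assumption that the log does not state, and the function names, dict layout, and example paths are all hypothetical.

```python
# ASSUMPTION: one-shot next_boot_osid wins over temp_boot_osid, which wins
# over def_boot_osid. The log names the three fields but not their order.
OSID_FIELDS = ("next_boot_osid", "temp_boot_osid", "def_boot_osid")


def pick_boot_osid(node):
    """Return (field_name, osid) for the first field that is set, else None.

    `node` is a dict standing in for a row of the nodes table.
    """
    for field in OSID_FIELDS:
        osid = node.get(field)
        if osid:
            return field, osid
    return None


def boot_what(node, os_info):
    """Resolve what to boot from the os_info entry of the chosen osid.

    `os_info` maps osid -> {"path": str, "mfs": bool}, a stand-in for the
    os_info table with its MFS flag. Returns None if nothing is selected.
    """
    picked = pick_boot_osid(node)
    if picked is None:
        return None
    field, osid = picked
    entry = os_info[osid]
    return {
        "osid": osid,
        "source": field,
        "path": entry["path"],
        # The MFS flag says the path is an MFS rather than a multiboot kernel.
        "is_mfs": entry["mfs"],
    }
```

Keeping the field order in one tuple makes the precedence easy to change if the real order differs.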
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
-
Leigh B. Stoller authored
clients load the pxeboot kernel. Proxydhcp is dead.
-
Shashi Guruprasad authored
code originally tried to do a normal $ns connect between traffic agents attached to simnodes on the same pnode. The problem, which I forgot of course, is that a partitioned topology is quite disconnected, which means that a packet is forced to exit the pnode and come back to it (in many cases). In other words, a direct intra-pnode path does not exist. The fix is to just use the IP address based routes always. A similar problem is encountered in pdns as well. However, since IP address based routing is not used there, there is no simple fix unless I work on it! The 416 node topology testbed/nse416 is working alright. It mapped to 20 pnodes and as soon as a whole bunch of traffic started up, 7 pnodes couldn't track real-time and caused a modify. Expt modify happened 3 times but eventually max_retries in my re-swapping code was reached. Need more measuring and tuning, as well as eventrate based re-swapping.
-
Shashi Guruprasad authored
a simple problem in the duplex-link instproc which caused the code for simnode creation to go to one pnode while an rlink from this simnode was mapped to another pnode. Also added $ns rtproto Manual for generated tcl code since IP address based routes are being added.
-
- 10 Jan, 2004 1 commit
-
-
Kirk Webb authored
-
- 09 Jan, 2004 5 commits
-
-
Mike Hibler authored
I'm trying to get a handle on what to write for the paper...
-
Robert Ricci authored
-
Shashi Guruprasad authored
(or more than one simhost) is unable to keep up with real-time. It includes changes to assign_wrapper to handle swap modify for simnodes, the simple algorithm in nseswap that bumps up the nodeweight of simnodes being hosted on a simhost that reports "can't keep up with real-time" (aka nse violation), and changes to ptopgen and sim.tcl to prefer nodes that already have the FBSD-NSE image. Also, changes to other files to send out NSESWAP event. One unrelated change: We now have per-swapin .top files and assign.log files along with .ptop files. This helps in debugging across multiple swapins since files remain in the form of <pid>-<eid>-<process_id>.{top,ptop} and assign-<pid>-<eid>-<process_id>.log. Also useful for archiving.
-
Shashi Guruprasad authored
-
Shashi Guruprasad authored
but not other virt_* tables. That's because the above two tables are not truly virtual, since they contain the vnode to which the event should be sent. My previous patch has been to use replace instead of insert. Unfortunately, the tables get messed up with the same agent having multiple entries where some of the entries were left behind from the previous swapin.
-
- 08 Jan, 2004 3 commits
-
-
Mac Newbold authored
committed last, but this dump came out in a very different order, so the diff looks huge, even though it is only about 8 lines.
-
Shashi Guruprasad authored
added -eventsys_restart option to tbswap. These options are allowed only with swapexp -s modify and correspondingly for tbswap with update. Tested with a 1 node experiment and things seem to work fine.
-
Shashi Guruprasad authored
of typing to log in to nodes, I wrote the same code in C. Compile on cygwin with gcc -mwindows -mno-cygwin -o ssh-mime ssh-mime.c . Set the env PUTTY_CMD to the path of where you have put putty. The compiled exe is in http://www.emulab.net/ssh-mime.exe
-
- 07 Jan, 2004 3 commits
-
-
Leigh B. Stoller authored
you typed the URL directly instead of indirecting from the project page. No one did that till today.
-
Leigh B. Stoller authored
probably imperfect, but better than nothing. New option, "-t tag", allows you to specify an arbitrary tag to match against the stated_tag of the nodes table. The stated invocation will only operate on nodes that match the tag, ignoring all events for other nodes. If unspecified, stated will operate on all nodes with a NULL tag. This is set up at the beginning of time (or during a reload), saving the per-node tag in the $nodes hash. Each time an event arrives, check the tag in the table, ignoring the event if not a match. On a signaled reload(), must also be careful to throw away timeouts from the queue (and be careful not to set up new timeouts for ignored nodes).

So, this allows you to set the tag for a node in the DB, and then HUP stated so that it reloads its tables. That node will now be ignored by that stated.

Also made some changes to debug mode. In debug mode, don't worry about the pidfile or the lockfile or checking for other running stated (which causes my debug version to exit right away!). Also, added a new -l option to turn off syslog output and just send it all to stdout with the debug output. -l can only be used with -d of course.

So what can I do with all this:

    update nodes set stated_tag='lbs' where node_id='pc5';
    sudo kill -HUP `cat /var/run/stated.pid`
    sudo stated -d -l -t lbs

Which tells the main stated to ignore pc5. Then I run a debugging stated that operates only on pc5. Later when done:

    update nodes set stated_tag=NULL where node_id='pc5';
    sudo kill -HUP `cat /var/run/stated.pid`

Which tells the main stated to operate on pc5 again.
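The tag matching rule above can be sketched as a small filter. This is an illustrative Python sketch, not the stated source: `load_node_tags` and `should_handle` are hypothetical names, `None` stands in for a NULL stated_tag, and ignoring events for nodes that are missing from the table is an assumption.

```python
def load_node_tags(rows):
    """Build the per-node tag table from (node_id, stated_tag) rows.

    A stated_tag of None stands in for NULL in the nodes table.
    """
    return {node_id: tag for node_id, tag in rows}


def should_handle(node_tags, node_id, my_tag=None):
    """Decide whether this stated instance processes events for node_id.

    `my_tag` is the value given with -t, or None when the option is
    omitted, in which case only nodes with a NULL tag match.
    ASSUMPTION: events for nodes not in the table are ignored.
    """
    if node_id not in node_tags:
        return False
    return node_tags[node_id] == my_tag
```

With pc5 tagged 'lbs', the main instance (no tag) skips pc5 while an instance started with `-t lbs` handles only pc5, matching the workflow in the message.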
-
Shashi Guruprasad authored
when it cannot keep up with real-time. bug: This affected encapsulated simulator packets that had to cross multiple physical nodes before arriving at the destination simulator traffic agent. This bug didn't affect live packets from traffic sources on real PCs. The NSESWAP event is now sent via the tevc command. The nse scheduler waits for the slop factor (diff between clock and event dispatch time that exceeds a threshold) to be crossed multiple times in a second before sending the NSESWAP event. Currently 5 times in 1 second. However, this needs more careful thought and will get modified later. When is it really necessary to declare that an nse is overloaded? I.e., what is the right slop factor? How many times can we tolerate the slop factor being exceeded while still ensuring that end-to-end performance is within a certain percentage of the expected?
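The "5 times in 1 second" rule can be sketched as a sliding-window counter. This Python sketch is illustrative and is not the nse scheduler code: the class name `SlopMonitor`, its parameters, and the choice to reset the counter after firing are assumptions. Only the idea of counting threshold crossings within one second, and the default of 5, come from the message.

```python
from collections import deque


class SlopMonitor:
    """Count slop violations in a sliding time window."""

    def __init__(self, threshold, max_violations=5, window=1.0):
        self.threshold = threshold            # allowed lag, in seconds
        self.max_violations = max_violations  # 5 in the commit message
        self.window = window                  # 1 second in the commit message
        self._violations = deque()            # clock times of recent violations

    def record(self, clock, dispatch_time):
        """Record one event dispatch.

        `clock` is the current real time and `dispatch_time` is when the
        event was scheduled to run. Returns True when enough violations
        have piled up in the window that NSESWAP should be sent.
        """
        if clock - dispatch_time <= self.threshold:
            return False
        self._violations.append(clock)
        # Drop violations that have fallen out of the window.
        while clock - self._violations[0] > self.window:
            self._violations.popleft()
        if len(self._violations) >= self.max_violations:
            # ASSUMPTION: start counting afresh after signalling.
            self._violations.clear()
            return True
        return False
```

Both the threshold and the violation count are constructor arguments, since the message says the right values are still an open question.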
-
- 06 Jan, 2004 3 commits
-
-
Shashi Guruprasad authored
number of times re-swapin has occurred due to a simhost not being able to keep up with real-time.
-
Robert Ricci authored
again. One of the fixes changes the way in which we iterate through pclasses in find_pnode(). We used to treat the vector like a ring buffer, and start (randomly) someplace in the middle. This turns out to give some bad statistical properties when doing dynamic pclasses, since long chains of disabled pclasses will cause some pclasses to be selected more often. My old hack of just hopping around randomly in the disabled-pclass case was bad, because it's hard to tell when you've actually tried all the pclasses - so, we were getting false negatives where it was looking like there was no place available where we could map a vnode, which turned out to have worse effects than I had thought. So, now, we make a list of all the indices and randomize the order, then just iterate through that list. We also now count the number of pclasses that are enabled at every temperature step, and adjust the neighborhood size to remove them. This makes dynamic pclasses quite a bit faster - it cuts the time by 30% - 50% for my test case. Cleaned up find_pnode() by removing some #ifdef's that we don't use, and probably will never want to again - this makes the function almost readable!
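The new iteration strategy (list all the indices, randomize their order, then walk the list once) can be sketched as below. This is a Python illustration rather than the assign code: the function signature, the `enabled` key, and the `accepts` callback are hypothetical.

```python
import random


def find_pnode(pclasses, accepts, rng=random):
    """Try every pclass exactly once, in random order.

    `pclasses` is a list of dicts with an "enabled" flag, and `accepts`
    is a caller-supplied predicate that says whether a vnode fits in a
    given pclass. Returns the index of the first enabled pclass that
    accepts, or None once every pclass has been tried.
    """
    order = list(range(len(pclasses)))
    rng.shuffle(order)  # random order, but each index appears exactly once
    for index in order:
        pclass = pclasses[index]
        if not pclass["enabled"]:
            continue
        if accepts(pclass):
            return index
    return None  # a true negative: all pclasses were considered


def count_enabled(pclasses):
    """Number of enabled pclasses, e.g. for sizing the neighborhood."""
    return sum(1 for pclass in pclasses if pclass["enabled"])
```

Because each index is visited once, a `None` result means no enabled pclass fits, which avoids the false negatives that random hopping could produce.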
-
Leigh B. Stoller authored
Well, that's the hope. Not sure it will work, but might as well try.
-