event/stated/stated.in · b438d5f5c3a3cbcda607bb290f69c1af9bac5ff1 · emulab / emulab-devel

Bunch of pretty good-sized changes to stated:

Mac Newbold authored May 20, 2003

1. Change from an inefficient timeout search algo that ran once per second
to a highly efficient priority queue method of managing timeouts. Now
instead of checking every node's timestamps, we just look at the head of
the queue. That check often runs much less than once a second, since we
know how long we have until the next timeout.

2. Start using a blocking poll for events, so I can sleep for long periods
of time instead of having to wake up at least once a second to check for
timeouts and events. The block timeout is set to the shorter of: the time
until the next batch of queued emails goes out, or the time until the next
timeout may occur. When there are no mails waiting and no timeouts
possible, it is 10 minutes. The poll comes back as soon as an event comes
in.

3. Given the above two items, we no longer need a sleep(1) in our main
loop.
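
A rough sketch of how items 1-3 fit together, written in Python for
illustration only; it is not the code in stated.in. The names
TimeoutQueue, block_timeout and MAX_BLOCK are hypothetical, and replacing
a node's deadline by leaving stale heap entries behind is an assumption
about one reasonable way to do it. The event-system poll itself is not
shown.

```python
import heapq

MAX_BLOCK = 600  # 10 minutes: used when no mail and no timeout is pending


class TimeoutQueue:
    """Min-heap of (deadline, node) pairs; the head is the next timeout."""

    def __init__(self):
        self._heap = []
        self._deadline = {}  # node -> current deadline; other entries are stale

    def set(self, node, deadline):
        """Set or replace the deadline for a node."""
        self._deadline[node] = deadline
        heapq.heappush(self._heap, (deadline, node))

    def cancel(self, node):
        """Forget a node's timeout; its heap entries are dropped lazily."""
        self._deadline.pop(node, None)

    def _drop_stale(self):
        # Discard head entries that no longer match the node's deadline.
        while self._heap:
            deadline, node = self._heap[0]
            if self._deadline.get(node) == deadline:
                return
            heapq.heappop(self._heap)

    def next_deadline(self):
        """Deadline at the head of the queue, or None if nothing is pending."""
        self._drop_stale()
        return self._heap[0][0] if self._heap else None

    def pop_expired(self, now):
        """Remove and return every node whose deadline is at or before now."""
        expired = []
        while True:
            self._drop_stale()
            if not self._heap or self._heap[0][0] > now:
                return expired
            _, node = heapq.heappop(self._heap)
            del self._deadline[node]
            expired.append(node)


def block_timeout(now, next_timeout, next_mail):
    """Seconds to block in the event poll before waking up on our own."""
    pending = [t for t in (next_timeout, next_mail) if t is not None]
    if not pending:
        return MAX_BLOCK
    return max(0, min(pending) - now)
```

In a main loop along these lines, each pass would call pop_expired(now),
send any mail that is due, and then hand block_timeout(...) to the blocking
poll, so no separate sleep(1) is needed.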

One small glitch is in the process of being fixed. When using blocking
polls, things hang when trying to unregister from the event system. Not a
big deal, just ^C twice to kill it. (It may also need two SIGUSR1s to get
it to restart.)

In the next update, look for:
 - Really take action on timeouts.
   - keep track of how many times we've retried, and notify if something
     may be wrong with the node.
   - Find out policy on taking action with timeouts.
     - Do it if the expt is in transition or the node is free
     - Probably don't touch if the expt is established.
     - Maybe? in active expt, send (good) email to expt owner on timeouts

Related "coming soon" items:
os_load/os_setup etc.:
 - Add the waitforstate stuff we've talked about
 - make os_load/os_setup use it
