Mac Newbold authored
1. Change from an inefficient timeout search algo that ran once per second
   to a highly efficient priority queue method of managing timeouts. Now,
   instead of checking every node's timestamps, we just look at the head of
   the queue, and often much less frequently than once a second, since we
   know how long we have until the next timeout.

2. Start using a blocking poll for events, so I can sleep for long periods
   of time instead of having to wake up at least once a second to check for
   timeouts and events. The block timeout is set to the shortest of: the
   time until the next batch of queued emails goes out, the time until the
   next timeout may occur, or, when there are no mails waiting and no
   timeouts possible, 10 minutes. The poll returns as soon as an event
   comes in.

3. Given the above two items, we no longer need a sleep(1) in our main loop.

One small glitch is in the process of being fixed. When using blocking
polls, things hang when trying to unregister from the event system. Not a
big deal, just ^C twice to kill it. (May cause it to need two SIGUSR1's to
get it to restart, too.)

In the next update, look for:
- Really take action on timeouts.
  - Keep track of how many times we've retried, and notify if something
    may be wrong with the node.
- Find out policy on taking action with timeouts.
  - Do it if the expt is in transition or the node is free.
  - Probably don't touch if the expt is established.
  - Maybe? In an active expt, send (good) email to expt owner on timeouts.

Related "coming soon" items:
os_load/os_setup etc.:
- Add the waitforstate stuff we've talked about.
- Make os_load/os_setup use it.
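For illustration only, here is a minimal Python sketch of the idea behind items 1 and 2. It is not the code in this commit: TimeoutQueue, block_timeout, IDLE_BLOCK_SECONDS and the node names are hypothetical, and times are plain numbers of seconds.

```python
import heapq

# Hypothetical constant: the 10-minute fallback used when nothing is pending.
IDLE_BLOCK_SECONDS = 600


class TimeoutQueue:
    """Min-heap of (deadline, node) pairs; the earliest deadline sits at the head."""

    def __init__(self):
        self._heap = []

    def add(self, deadline, node):
        heapq.heappush(self._heap, (deadline, node))

    def next_deadline(self):
        # Only the head is inspected; no scan over every node is needed.
        return self._heap[0][0] if self._heap else None

    def pop_expired(self, now):
        # Remove and return every node whose deadline has passed.
        expired = []
        while self._heap and self._heap[0][0] <= now:
            expired.append(heapq.heappop(self._heap)[1])
        return expired


def block_timeout(now, timeouts, next_mail_time=None):
    """How long a blocking poll may sleep before something needs attention."""
    waits = []
    deadline = timeouts.next_deadline()
    if deadline is not None:
        waits.append(max(0, deadline - now))
    if next_mail_time is not None:
        waits.append(max(0, next_mail_time - now))
    # With no timeouts and no queued mail, fall back to the idle interval.
    return min(waits) if waits else IDLE_BLOCK_SECONDS
```

With deadlines at 105 and 130 and the current time at 100, block_timeout returns 5, so the loop can block for five seconds rather than waking every second. The sketch does not handle entries that become stale when a node changes state before its deadline.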
b438d5f5