1. 29 Jan, 2003 3 commits
  2. 07 Jan, 2003 2 commits
  3. 20 Dec, 2002 1 commit
  4. 16 Dec, 2002 1 commit
    • Mac Newbold authored · a77a1559
      Fix the 1-event-per-second limitations. Poll until I don't get more
      events. This may delay handling of other stuff that happens in my main
      loop, but not by too much. To prevent skew, everything (including reload
      frequency) is done strictly by seconds elapsed, not by iterations or
      anything.
      
      I found that even polling for multiple events without sleeping, I could
      only handle a little over 1 per second when I was calling inuse/statetime
      for additional info on every event. Even though this only happens in the
      worst case (every event is wrong), it won't do. So I took that out. I'll
      probably end up adding a faster lookup of the info I need (mostly
      reservation, and what osid it thinks it is running). That change took it
      up to at least 4 per second (as fast as I could send them manually), more
      than 4x our previous performance. So we should be able to keep up now.
      
      Also, add the support for "announcements" to testbed ops when I die and
      such. (Been in a few days, but this is the first commit of it)
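A minimal sketch of the loop shape this commit describes: drain every pending event before sleeping, and schedule periodic work by elapsed seconds rather than by iteration count. It is written in Python for illustration only; `poll_event`, `handle_event`, `reload_state`, and the 300-second reload interval are hypothetical names and values, not taken from stated.

```python
import time


def drain_events(poll_event, handle_event):
    """Handle every pending event without sleeping; return the count handled."""
    handled = 0
    while True:
        event = poll_event()  # hypothetical: returns None when nothing is pending
        if event is None:
            return handled
        handle_event(event)
        handled += 1


def main_loop(poll_event, handle_event, reload_state,
              reload_interval=300.0, passes=None,
              clock=time.monotonic, sleep=time.sleep):
    """Drain events on each pass; schedule reloads by elapsed seconds."""
    last_reload = clock()
    done = 0
    while passes is None or done < passes:
        drain_events(poll_event, handle_event)
        now = clock()
        # Compare timestamps, not iteration counts, so a slow pass cannot
        # skew the reload frequency.
        if now - last_reload >= reload_interval:
            reload_state()
            last_reload = now
        sleep(1.0)
        done += 1
```

Passing `clock` and `sleep` as parameters is a choice made here so the timing logic can be exercised without real delays.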
  5. 09 Dec, 2002 1 commit
  6. 03 Dec, 2002 1 commit
  7. 22 Nov, 2002 1 commit
  8. 14 Nov, 2002 1 commit
    • Mac Newbold authored · 349db7bf
      Lots of changes.
      First, fix up the isup generation code. When a node/OS doesn't send its
      own isups, but is pingable, we need to fork and ping it, and send ISUP
      when it pings. The code was there, but was broken. This fixes it. The one
      time that it may cause errant messages is in modes other than MINIMAL.
      When we get BOOTING, we check if it needs isup generated. If we have to
      ping it, when it pings we send ISUP. This means that if we are really in
      NORMAL mode, we might send ISUP before the node sends REBOOTED (or TBSETUP
      in NORMALv1), and it would look funny. But that case will be really rare,
      since everything that sends REBOOTED or TBSETUP has no reason not to send
      ISUP itself.
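The decision made on BOOTING can be sketched roughly as follows. This is an illustrative Python outline, not stated's code: the node fields `sends_isup` and `pingable`, the `ping` and `send_event` callables, and the retry counts are all assumptions. The real daemon forks before pinging; this sketch pings inline to stay short. The non-pingable branch follows the merge note later in this log (ISUP is generated immediately after BOOTING).

```python
import time


def wait_for_ping(ping, attempts=30, delay=1.0, sleep=time.sleep):
    """Return True as soon as ping() succeeds, False after all attempts fail."""
    for _ in range(attempts):
        if ping():
            return True
        sleep(delay)
    return False


def on_booting(node, send_event, ping, attempts=30, delay=1.0, sleep=time.sleep):
    """Generate ISUP for a node that will not send one itself."""
    if node["sends_isup"]:
        return  # the node reports ISUP on its own; nothing to generate
    if node["pingable"]:
        # Send ISUP only once the node answers a ping.
        if wait_for_ping(lambda: ping(node["id"]), attempts, delay, sleep):
            send_event(node["id"], "ISUP")
    else:
        # No way to probe the node, so ISUP follows BOOTING immediately.
        send_event(node["id"], "ISUP")
```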
      
      Second, after mailbombing myself a couple of times, Kirk and I decided I'd
      better put some throttling in the notification code that stated uses. So
      now it throttles itself and digests the messages if they're sent too close
      together. The first message it gets will get sent immediately. If the next
      one is long enough after that, it sends it immediately too. If a message
      comes too soon after sending one, we queue it up, and send it later
      after enough time has passed. Currently it is set to wait 5 seconds
      between messages, so it will send up to 12 per minute, and wait no more
      than 5 seconds before sending a message that is queued up.
      
      (Something similar to this may be a nice thing in the rest of our stuff,
      but it was made a lot easier by the fact that stated already had a polling
      loop in it. Without that, you'd have to use alarms or some other weird
      thing, which would be painful.)
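The throttle-and-digest behavior described in this commit could look something like the sketch below. It is a Python illustration under stated assumptions, not the notification code stated uses: the class name, the `send` callable, and the explicit `now` arguments are hypothetical. `poll` stands in for the check the existing polling loop would make on each pass.

```python
class ThrottledNotifier:
    """Send the first message at once; digest any that arrive too soon after."""

    def __init__(self, send, min_interval=5.0):
        self.send = send                  # hypothetical delivery callable
        self.min_interval = min_interval  # seconds required between sends
        self.last_sent = None
        self.queue = []

    def notify(self, message, now):
        quiet_long_enough = (self.last_sent is None
                             or now - self.last_sent >= self.min_interval)
        if not self.queue and quiet_long_enough:
            self._deliver([message], now)
        else:
            # Too soon after the last send, or older messages are waiting.
            self.queue.append(message)

    def poll(self, now):
        """Call from the main loop; flushes the queue as a single digest."""
        if self.queue and now - self.last_sent >= self.min_interval:
            self._deliver(self.queue, now)
            self.queue = []

    def _deliver(self, messages, now):
        self.send("\n".join(messages))
        self.last_sent = now
```

With a 5-second interval and a loop that polls about once a second, a queued message waits at most roughly 5 seconds, and no more than 12 messages go out per minute.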
  9. 05 Nov, 2002 1 commit
  10. 04 Nov, 2002 1 commit
    • Mac Newbold authored · e9dcf743
      Bunch o' changes.
       - Better pidfile handling, do proper locking, etc.
       - Change die() to fatal(), so it sends mail and goes to syslog instead of
         to /dev/null
       - Fix RESET to not reset pxe_boot_path for Mike.
       - Fix sendmail call to have proper to and from addrs
  11. 01 Nov, 2002 1 commit
  12. 31 Oct, 2002 1 commit
  13. 22 Oct, 2002 2 commits
  14. 18 Oct, 2002 1 commit
    • Mac Newbold authored · 5c961517
      Merge the newstated branch with the main tree.
      Changes to watch out for:
      
      - db calls that change boot info in nodes table are now calls to os_select
      
      - whenever you want to change a node's pxe boot info, or def or next boot
      osids or paths, use os_select.
      
      - when you need to wait for a node to reach some point in the boot process
      (like ISUP), check the state in the database using the lib calls
      
      - Proxydhcp now sends a BOOTING state for each node that it talks to.
      
      - OSs that don't send ISUP will have one generated for them by stated
      either when they ping (if they support ping) or immediately after they get
      to BOOTING.
      
      - States now have timeouts. Actions aren't currently carried out, but they
      will be soon. If you notice problems here, let me know... we're still
      tuning it. (Before, all timeouts were set to "none" in the db)
      
      One temporary change:
      
      - While I make our new free node manager daemon (freed), all nodes are
      forced into reloading when they're nfreed and the calls to reset the os
      are disabled (that w...
  15. 20 Sep, 2002 2 commits
  16. 19 Sep, 2002 1 commit
    • Robert Ricci authored · 509c7b38
      A few changes for use with the testsuite's 'full' mode:
      1) Checks database redirects for nodes, and ignores events that aren't
         directed to its database.
      2) Doesn't insist on being run as root (doesn't need to be right now,
         anyway.)
      3) '-f' option that prevents it from forking into the background, for
         easier killing.
  17. 10 Jul, 2002 1 commit
  18. 12 Jun, 2002 1 commit
  19. 10 Jun, 2002 1 commit
  20. 20 May, 2002 1 commit
  21. 25 Apr, 2002 1 commit
  22. 16 Apr, 2002 1 commit
  23. 10 Apr, 2002 1 commit
    • Robert Ricci authored · 4db415f5
      First pass at operational mode support for node states.
      Operational mode (op_mode in the database) affects the state diagram
      and timeouts for a node. Modes planned so far are:
      NORMAL    - Normal operation
      DELAYING  - Acting as a delay node
      UNKNOWNOS - Running an OS that does not report its state (OSKit kernels, etc.)
      RELOADING - Disk reloading
      
      stated now responds to TBNODEOPMODE events, and sets database state
      accordingly. The set of state timeouts and valid state transitions are
      affected by a node's operational mode.
      
      The nodes table now stores information about operational modes, and
      the state_transitions and state_timeouts tables include the operational
      mode in addition to states.
      
      Next step will be to get the appropriate programs to send TBNODEOPMODE
      events.
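One way to picture transitions and timeouts keyed by operational mode is the lookup sketch below. It is an illustrative Python outline only: the table contents are invented examples, not the real rows of the state_transitions and state_timeouts tables, and the function names are hypothetical.

```python
# Invented example rows, keyed by (op_mode, current state).
VALID_TRANSITIONS = {
    ("NORMAL", "BOOTING"): {"REBOOTED"},
    ("NORMAL", "REBOOTED"): {"ISUP"},
    ("UNKNOWNOS", "BOOTING"): {"ISUP"},
}

# Invented example timeouts in seconds; a missing entry means "no timeout".
STATE_TIMEOUTS = {
    ("NORMAL", "BOOTING"): 180,
    ("RELOADING", "BOOTING"): 600,
}


def is_valid_transition(op_mode, old_state, new_state):
    """A transition is valid only if listed for the node's current op_mode."""
    return new_state in VALID_TRANSITIONS.get((op_mode, old_state), set())


def timed_out(op_mode, state, entered_at, now):
    """True if the node has stayed in `state` longer than its mode allows."""
    timeout = STATE_TIMEOUTS.get((op_mode, state))
    return timeout is not None and now - entered_at > timeout
```

Because both tables include the mode in the key, the same pair of states can be valid in one mode and invalid in another.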
  24. 02 Apr, 2002 1 commit
    • Robert Ricci authored · 67d3205d
      Changed behavior when reloading node state from database. Now, if we
      find a node that we already knew about, and it hasn't changed state or
      timestamp, we just use the old entry. This allows us to still notice
      new nodes, or nodes that have had their state changed externally (say,
      by hand), but not forget about nodes we've already sent mail about.
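The reload rule in this commit can be sketched as a merge of the in-memory table with freshly read rows. This is an illustrative Python outline; the dictionary layout and the `notified` flag are assumptions, not stated's actual data structures.

```python
def merge_reloaded(known, fresh):
    """Combine in-memory node entries with rows just read from the database.

    known: node_id -> {"state", "timestamp", "notified"}
    fresh: node_id -> {"state", "timestamp"}
    """
    merged = {}
    for node_id, row in fresh.items():
        old = known.get(node_id)
        unchanged = (old is not None
                     and old["state"] == row["state"]
                     and old["timestamp"] == row["timestamp"])
        if unchanged:
            # Keep the old entry so we remember mail was already sent.
            merged[node_id] = old
        else:
            # New node, or state changed externally: start with a clean entry.
            merged[node_id] = {"state": row["state"],
                               "timestamp": row["timestamp"],
                               "notified": False}
    return merged
```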
  25. 01 Apr, 2002 1 commit
  26. 29 Mar, 2002 1 commit
    • Robert Ricci authored · 8ffb8cf4
      Remove SWIG from the build process - unfortunately, it's slightly
      broken. Also, it made me slightly uneasy that there was no way to
      prevent swig from putting one of its generated files in the source
      directory. So, I've just checked in the two major files that get
      generated by SWIG, so that the make rule that runs it never gets
      invoked.
      
      One of the reasons for doing this is that swig generates slightly
      broken code when the -exportall (which does perl module exports
      correctly) argument is given. A very minor amount of manual tweaking
      of the generated .pm file can fix this problem. So, the checked in
      copy of event.pm has these tweaks applied.
      
      As a result of all of this, exports work correctly in the event perl
      module, so the hacky practice of putting your program in the event
      namespace is no longer necessary.
  27. 28 Mar, 2002 1 commit
    • Robert Ricci authored · 447bb8a5
      New script: stated
      Watches for events sent by TMCD regarding the state of nodes. Records
      this information in the database. Also watches for nodes that undergo
      invalid state transitions, or stay in the same state for too long.
      Right now, the only action it takes is to send email, but in the
      future it will take action to 'unstick' nodes.
      
      Not yet installed by default.