1. 29 Aug, 2003 1 commit
  2. 25 Aug, 2003 1 commit
  3. 22 Aug, 2003 1 commit
  4. 19 Jun, 2003 1 commit
    • The new and fully functional rebooting-via-events stuff and the
      really-reboot-nodes-that-timeout stuff. · 1daaa992
      Mac Newbold authored
      
      NOTE: Until the timeout/retry stuff is gone from os_load/os_setup, it is
      disabled in stated. It will still only send email. But all the stuff is
      there and has been tested.
      
      NOTE: Until other code stops depending on the old behavior of node_reboot
      (when it returns, all nodes are in SHUTDOWN), the event stuff is disabled.
      Real mode is the default, and can be run by anyone.
      
      In short, this commit is new versions of stated and node_reboot that act
      almost exactly like the old ones. But I wanted to commit them before I go
      on making a bunch more changes, to have a checkpoint that I know works.
  5. 06 Jun, 2003 1 commit
  6. 13 May, 2003 1 commit
  7. 04 Apr, 2003 1 commit
  8. 20 Mar, 2003 3 commits
  9. 19 Mar, 2003 1 commit
    • New slothd change: · b73aee17
      Mac Newbold authored
      node_reboot reports node activity into the "last_ext_act" column of
      node_activity (i.e., activity that is external to the node).
      
      This means that swapin, swapout, reload, etc., anything that reboots
      the node from boss/ops, will count as activity.
  10. 07 Jan, 2003 1 commit
  11. 31 Dec, 2002 1 commit
    • Add support for rebooting jailed (virtual) nodes, either remote or
      local. · ab8b901f
      Leigh B. Stoller authored
      For local nodes, we need to cull out jailed nodes if the physical node
      is also going to reboot. Jailed nodes are rebooted serially since they
      go down much faster.
      
      Fix up recently added wait mode for jailed nodes. Also, I noticed that
      I was having problems with events not filtering through stated before
      going into the ISUP wait loop; I was catching the nodes still in ISUP
      instead of SHUTDOWN. I added a sleep(2) before going into wait mode,
      but this might be something to watch out for elsewhere too.
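      The culling step for local jailed nodes can be pictured as a simple
      filter. This is a minimal Python sketch, not the actual node_reboot code:
      the function name, the `phys_of` map from a jailed node to its physical
      host, and the node names are all hypothetical.

```python
from typing import Dict, List, Tuple


def split_reboot_list(nodes: List[str],
                      phys_of: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Split a reboot request into physical nodes and jailed nodes.

    phys_of is a hypothetical map from each jailed (virtual) node to the
    physical node hosting it; names missing from it are treated as physical.
    A jailed node whose physical host is also in the request is culled,
    because rebooting the host takes the jail down with it.
    """
    physical = [n for n in nodes if n not in phys_of]
    rebooting_hosts = set(physical)
    jailed = [n for n in nodes
              if n in phys_of and phys_of[n] not in rebooting_hosts]
    return physical, jailed


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    phys_of = {"pcvm1-1": "pc1", "pcvm1-2": "pc1", "pcvm2-1": "pc2"}
    physical, jailed = split_reboot_list(
        ["pc1", "pcvm1-1", "pcvm1-2", "pcvm2-1"], phys_of)
    print(physical)  # ['pc1']
    print(jailed)    # ['pcvm2-1']; these are then rebooted one at a time
```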
  12. 18 Oct, 2002 1 commit
    • Merge the newstated branch with the main tree. · 5c961517
      Mac Newbold authored
      Changes to watch out for:
      
      - db calls that change boot info in the nodes table are now calls to os_select
      
      - whenever you want to change a node's PXE boot info, or its default or
      next boot osids or paths, use os_select.
      
      - when you need to wait for a node to reach some point in the boot process
      (like ISUP), check the state in the database using the lib calls.
      
      - Proxydhcp now sends a BOOTING state for each node that it talks to.
      
      - OSs that don't send ISUP will have one generated for them by stated
      either when they ping (if they support ping) or immediately after they get
      to BOOTING.
      
      - States now have timeouts. Actions aren't currently carried out, but they
      will be soon. If you notice problems here, let me know; we're still
      tuning it. (Before this, all timeouts were set to "none" in the db.)
      
      One temporary change:
      
      - While I write our new free node manager daemon (freed), all nodes are
      forced into reloading when they're nfreed, and the calls to reset the OS
      are disabled (that will move into freed).
  13. 17 Oct, 2002 1 commit
  14. 07 Oct, 2002 1 commit
  15. 20 Sep, 2002 1 commit
    • Remove -e flag from calls to power. · 8b23d335
      Mac Newbold authored
      node_reboot sends an event only when ssh reboot or ipod succeeds in
      rebooting the node, and calls power only when they do not. So power
      should send an event every time node_reboot calls it. This explains some
      of the problems we were having with tons of email from stated about
      invalid transitions: since the state changes weren't always happening,
      it appeared to skip over states.
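      The rule above (one event per reboot, sent either by node_reboot or by
      power, never both) can be sketched as follows. This is an illustrative
      Python sketch only: the ssh-then-ipod order and every callback name are
      assumptions, and the REBOOTING state name is taken from the 01 Apr 2002
      commit further down this log.

```python
from typing import Callable

Attempt = Callable[[str], bool]
EventSender = Callable[[str, str], None]


def reboot_node(node: str, try_ssh: Attempt, try_ipod: Attempt,
                power_cycle: Callable[[str], None],
                send_event: EventSender) -> str:
    """Reboot one node so that exactly one REBOOTING event is sent.

    The ssh-then-ipod order and all callback names are assumptions.
    Returns the method that rebooted the node.
    """
    for method, attempt in (("ssh", try_ssh), ("ipod", try_ipod)):
        if attempt(node):
            # Soft reboot worked: node_reboot reports the transition itself.
            send_event(node, "REBOOTING")
            return method
    # Both soft methods failed. power is expected to send the event on its
    # own, so node_reboot does not send a second one here.
    power_cycle(node)
    return "power"
```

      If power failed to send its event on that last path, stated would see
      the node jump straight past REBOOTING, which matches the "invalid
      transition" mail described above.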
  16. 07 Jul, 2002 1 commit
  17. 19 Jun, 2002 1 commit
  18. 16 Jun, 2002 1 commit
  19. 07 Jun, 2002 1 commit
  20. 05 Jun, 2002 1 commit
    • Changes to sshtb. Remove sshremote, and convert sshtb into a perl
      script that checks the database to see if the host is local or remote. · 231fc2b1
      Leigh B. Stoller authored
      The problem with this is that the ssh syntax makes it hard to determine
      the host name by inspection. We would need to parse all the ssh args (bad
      idea), or work backwards and try to figure out the difference between the
      command (which is not a string but a sequence of args) and the host and
      the preceding ssh args. Hell with that! Changed sshtb to require a
      specific -host argument. Read the args and look for it. Error out if it
      is not found, to catch improper usage.
      
      The moral of this update: "sshtb [ssh args] -host <host> [more args ...]"
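      The -host convention makes argument handling a simple scan: find -host,
      take the next argument as the host, and treat everything before it as
      ssh args and everything after it as the command. This Python sketch
      shows only that parsing step (the database lookup for local vs. remote
      is omitted); the function names and the "pc42" node are hypothetical,
      and reusing the 2001 default ssh options is an assumption.

```python
from typing import List, Tuple

# Defaults quoted in the 26 Jun 2001 sshtb commit; whether the perl
# rewrite kept exactly these is an assumption.
DEFAULT_SSH_ARGS = ["-q", "-o", "BatchMode yes", "-o", "StrictHostKeyChecking no"]


def parse_sshtb_args(argv: List[str]) -> Tuple[str, List[str], List[str]]:
    """Split 'sshtb [ssh args] -host <host> [more args ...]'.

    Returns (host, ssh_args, command_args). Raises ValueError when -host
    is missing or has no value, so improper usage is caught early.
    """
    if "-host" not in argv:
        raise ValueError("sshtb: -host <host> is required")
    i = argv.index("-host")
    if i + 1 >= len(argv):
        raise ValueError("sshtb: -host needs a host name")
    return argv[i + 1], argv[:i], argv[i + 2:]


def build_ssh_command(argv: List[str]) -> List[str]:
    """Rebuild a plain ssh argument vector with the host in its usual place."""
    host, ssh_args, command = parse_sshtb_args(argv)
    return ["ssh", *DEFAULT_SSH_ARGS, *ssh_args, host, *command]


if __name__ == "__main__":
    # "pc42" is a hypothetical node name.
    print(build_ssh_command(["-n", "-host", "pc42", "/sbin/reboot"]))
```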
  21. 22 Apr, 2002 1 commit
  22. 17 Apr, 2002 1 commit
    • Moved EventSend calls to the TBSetNodeEventState() function. · 15c13c32
      Robert Ricci authored
      This has two benefits: (1) it is more general, and (2) it regains the
      ability to run without the event system. Previously, since programs that
      wanted to set node state had to 'use event', we could not run without the
      event system. Now libdb checks for the event system and does not use it
      if EVENTSYS is not set; in that case, we update the state in the database
      directly rather than sending an event.
      
      Also added equivalent calls for node operational mode, as well as new
      constants for both state and mode.
      
      Converted power and node_reboot to use this new scheme.
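      The dispatch inside TBSetNodeEventState() boils down to one branch on
      EVENTSYS. The real function is perl in libdb; this Python sketch only
      mirrors the idea, with `eventsys` standing in for the EVENTSYS setting
      and `send_event`/`update_db` as hypothetical hooks.

```python
from typing import Callable

StateSink = Callable[[str, str], None]


def set_node_event_state(node: str, state: str, eventsys: bool,
                         send_event: StateSink, update_db: StateSink) -> str:
    """Record a node's new state, mirroring the TBSetNodeEventState() idea.

    eventsys stands in for the EVENTSYS setting; send_event and update_db
    are hypothetical hooks for the event system and the database.
    Returns which path was taken.
    """
    if eventsys:
        send_event(node, state)   # event system available: publish an event
        return "event"
    update_db(node, state)        # no event system: write the state directly
    return "db"


if __name__ == "__main__":
    log = []

    def to_events(node: str, state: str) -> None:
        log.append(("event", node, state))

    def to_db(node: str, state: str) -> None:
        log.append(("db", node, state))

    print(set_node_event_state("pc7", "REBOOTING", False, to_events, to_db))
    print(log)  # [('db', 'pc7', 'REBOOTING')]
```

      Callers such as power and node_reboot then never need to know whether
      the event system is present.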
  23. 03 Apr, 2002 1 commit
  24. 01 Apr, 2002 1 commit
    • Transition to tmcd and event-based node state reporting. · 44311142
      Robert Ricci authored
      Changed scripts that used the 'eventstatus' column to use the more
      descriptively-named 'eventstate' column.
      
      The FreeBSD and Linux startup scripts report a 'REBOOTED' state to tmcd
      when they start, and 'ISUP' when the startup script is done.
      
      node_reboot and power now send TBNODESTATE/REBOOTING events.
  25. 05 Mar, 2002 2 commits
    • A wide ranging set of event system changes: · 0318cc22
      Leigh B. Stoller authored
      assign_wrapper.in: Hack in a change that ensures a delay node is
      created for any link on which an event is posted (up, down, modify),
      no matter what its initial parameters are. That is, if a link is created
      with no delay but an event adds a delay later, then we must drop in a
      delay node. The same goes for up/down on a link, which we implement in
      the delay node. I am reasonably confident that this change is fine for
      duplex links, but I am less sure of the effect on LANs!
      
      eventsys_control.in: Checkpoint latest changes. Add a "replay" option,
      which right now just stops and starts the event scheduler so that it
      reloads the entire event list. Add a check for an existing experiment,
      and that the experiment is either active or swapping (we do not want to
      start a scheduler for a swapped-out experiment!). Add a check to see if
      there are any events, and skip startup if there are no events in the DB.
      Lastly, get very serious about preventing more than one scheduler from
      being started, either by accident or intentionally. My protocol is to
      lock the table, grab the pid and set it to -pid, and test the pid for a
      positive value; if positive, send the scheduler a kill(TERM) so that it
      can clean up, clear the pid to zero in the DB, and exit. This approach
      ensures that we do not try to send a kill to a pid that is no longer
      active or owned by the user (this last part is not really necessary
      because of how pids are reused, but it was easy to add, so why not).
      
      exports_setup.in: Trivial change to make it easier to turn this on
      temporarily in devel trees.
      named_setup.in: Ditto.
      
      node_reboot.in: Add a call to TBdbfork() in the child because of apparent
      DB connection problems across forks. In the child, set the eventstatus
      for the node to REBOOT if successful (note: this event status stuff is
      temporary and will be recast in the next set of revisions).
      
      GNUmakefile:  Add new controlling program, eventsys_control.
      power.in:     Ditto previous comment about REBOOT.
      os_setup.in:  Non event system cleanups.
      tbend.in:     Add DB cleanup of the new virt_trafgens and eventlist tables.
      tbprerun.in:  Ditto.
      tbreport.in:  Print out the event list in a pretty print format.
      tbswapin.in:  Add call to start the event system. Also a bug fix: move
                    the named script up above the os_setup so that the named
                    tables have been updated by the time the first node
                    reboots. I noticed that nodes were failing on gethostbyname().
      tbswapout.in: Add call to stop the event system.
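      The eventsys_control pid protocol guarantees that at most one caller
      signals a running scheduler: the pid is read and negated under the table
      lock, and only a caller that saw a positive value sends kill(TERM). This
      Python sketch models the DB row with a class and the table lock with a
      threading.Lock; the class and method names are hypothetical, and using
      -abs(pid) (identical to -pid for a live scheduler) is my choice so that
      a repeated stop cannot flip the marker back to positive.

```python
import os
import signal
import threading
from typing import Callable

Killer = Callable[[int, int], None]


class SchedulerRow:
    """Stand-in for the DB row holding an experiment's event scheduler pid.

    A threading.Lock plays the role of the SQL table lock; the class and
    method names are hypothetical.
    """

    def __init__(self, pid: int = 0) -> None:
        self._lock = threading.Lock()
        self.pid = pid

    def stop_scheduler(self, kill: Killer = os.kill) -> bool:
        """Send SIGTERM at most once, however many callers race to stop it."""
        with self._lock:              # "lock the table"
            pid = self.pid            # grab the pid ...
            self.pid = -abs(pid)      # ... and mark it as stopping (-pid)
        if pid > 0:                   # only the first caller sees a positive pid
            kill(pid, signal.SIGTERM)
            return True
        return False

    def scheduler_exit(self) -> None:
        """Run by the scheduler after cleaning up on SIGTERM: clear the pid."""
        with self._lock:
            self.pid = 0


if __name__ == "__main__":
    sent = []

    def fake_kill(pid: int, sig: int) -> None:
        sent.append((pid, sig))

    row = SchedulerRow(pid=4242)
    print(row.stop_scheduler(fake_kill))  # True
    print(row.stop_scheduler(fake_kill))  # False: already marked as stopping
    row.scheduler_exit()
    print(row.pid, len(sent))             # 0 1
```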
  26. 27 Nov, 2001 1 commit
  27. 16 Oct, 2001 1 commit
  28. 25 Jul, 2001 2 commits
    • Fix small syntax error. · 27de539d
      Mac Newbold authored
    • Another Shark hack. Well, maybe not. Batch node_reboots in groups of 8
      to avoid a blizzard of reboots all at once. · 73437a5c
      Leigh B. Stoller authored
      This might solve the problem of sharks rebooting okay but failing to
      become proper members of the testbed. It is a good thing to do in any
      event, especially with people trying to run 50-node experiments. The
      reason for 8, of course, is that I want to isolate each shelf (after
      sorting the list). I pause 15 seconds between each shelf, and 10 seconds
      between each batch of 8 PCs.
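      The batching scheme amounts to: sort, cut into groups of 8, and pause
      between groups (15 seconds for shark shelves, 10 for PCs). This Python
      sketch is illustrative only: the function names, the `reboot` hook, and
      the shark names are hypothetical, the caller chooses the pause, and for
      simplicity the nodes within a batch are rebooted one after another.

```python
import time
from typing import Callable, Iterable, List, Tuple

BATCH_SIZE = 8  # one shark shelf per batch, per the commit message


def reboot_schedule(nodes: Iterable[str],
                    pause: float) -> List[Tuple[List[str], float]]:
    """Sort the nodes, cut them into batches of 8, and pair each batch with
    the pause to take before starting the next one (none after the last)."""
    ordered = sorted(nodes)
    batches = [ordered[i:i + BATCH_SIZE]
               for i in range(0, len(ordered), BATCH_SIZE)]
    return [(batch, pause if i < len(batches) - 1 else 0.0)
            for i, batch in enumerate(batches)]


def reboot_in_batches(nodes: Iterable[str], pause: float,
                      reboot: Callable[[str], None],
                      sleep: Callable[[float], None] = time.sleep) -> None:
    """Reboot batch by batch; reboot is a hypothetical per-node hook."""
    for batch, wait in reboot_schedule(nodes, pause):
        for node in batch:
            reboot(node)
        if wait:
            sleep(wait)


if __name__ == "__main__":
    # Hypothetical shark names; 15 s between shelves, per the commit message.
    sharks = [f"sh{shelf}-{slot}" for shelf in range(1, 4) for slot in range(1, 9)]
    for batch, wait in reboot_schedule(sharks, pause=15):
        print(batch, wait)
```

      Sorting before slicing is what lets each batch line up with a physical
      shelf, as long as node names sort in shelf order.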
  29. 12 Jul, 2001 1 commit
  30. 11 Jul, 2001 2 commits
  31. 26 Jun, 2001 1 commit
    • New script: sshtb · 9de266c3
      Robert Ricci authored
      sshtb is a _very_ simple shell script that runs ssh with a few command-line
      parameters that make it play nicer in a script environment. These
      parameters can be changed with the '--with-ssh-args' argument, but default to:
      '-q -o "BatchMode yes" -o "StrictHostKeyChecking no"'
      All ssh calls now use this script.
  32. 05 Jun, 2001 1 commit
  33. 10 May, 2001 1 commit
    • Lots of little changes for sending email to the right places, with
      proper headers. · 3285bc3e
      Leigh B. Stoller authored
      Split out some of the mail into testbed-logs, testbed-ops, and
      testbed-approval. Added a library for our perl scripts to include. It
      contains a couple of mail helper functions, but will hopefully contain
      more as time goes by.
      
      Fixed a bug in the web interface that was causing breakage for people
      with multiple accounts. Mac and Jay noticed this when logging out and
      trying to join or create a project under a new or different name.
  34. 12 Apr, 2001 1 commit
  35. 11 Apr, 2001 1 commit