  1. 16 Sep, 2003 3 commits
  2. 02 Sep, 2003 1 commit
  3. 22 Aug, 2003 1 commit
  4. 25 Jul, 2003 1 commit
  5. 18 Jul, 2003 2 commits
  6. 17 Jul, 2003 1 commit
  7. 15 Jul, 2003 2 commits
    • Kill some leftover debugging. · 90e77f5c
      Leigh B. Stoller authored
    • A set of changes to make swapmod work on jailed nodes (note: swapmod does not yet work with remote virtual nodes; that will take even more work). · 92ff875a
      Leigh B. Stoller authored
      
      Added a new allocstate called RES_TEARDOWN. assign_wrapper no longer
      deallocates unused nodes, but rather moves them into the new state for
      the wrapper (tbswap) to deal with. That's because deleted vnodes need to
      be torn down, since it's possible that the node on which they were
      living will not be deallocated (say, if there are other vnodes on
      it). We do not want to be doing that from assign_wrapper, so tbswap
      looks for those nodes.
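      A minimal Python sketch of this hand-off, assuming reservations are held in an in-memory map from node ID to a record. The functions assign_wrapper_update() and tbswap_teardown(), the "is_vnode" flag, and the teardown/free callbacks are hypothetical stand-ins rather than the actual assign_wrapper or tbswap code; freeing every node after teardown is also an assumption.

```python
RES_READY = "RES_READY"
RES_TEARDOWN = "RES_TEARDOWN"


def assign_wrapper_update(reserved, still_needed):
    """Mark nodes the new mapping no longer uses as RES_TEARDOWN instead of
    deallocating them, leaving the cleanup to the wrapper (tbswap)."""
    for node_id, info in reserved.items():
        if node_id not in still_needed:
            info["allocstate"] = RES_TEARDOWN
    return reserved


def tbswap_teardown(reserved, teardown_vnode, free_node):
    """Handle every node left in RES_TEARDOWN. Deleted vnodes are torn down
    explicitly, because their physical host may stay allocated (for example,
    when it still carries other vnodes) and so will not be rebooted."""
    for node_id in list(reserved):
        info = reserved[node_id]
        if info["allocstate"] != RES_TEARDOWN:
            continue
        if info.get("is_vnode", False):
            teardown_vnode(node_id)
        free_node(node_id)  # assumption: release the node once torn down
        del reserved[node_id]
    return reserved
```

      In this sketch the physical host "pc1" keeps its reservation while its deleted vnode is torn down, which is exactly the case assign_wrapper cannot handle by simply deallocating.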
      
      Made vnode_setup allocstate-aware in the same way that os_setup is:
      do not reboot vnodes or try to set up vnodes when they are already in
      the RES_READY state, as they will be when doing a swapmod. In
      addition, if os_setup is going to reboot the underlying physnode, move
      the vnodes on that node into RES_READY too, since they will be set up
      automatically on that node. Might need an interim state here, for correctness.
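      A minimal Python sketch of the allocstate-aware vnode_setup rule, assuming each vnode record carries its host ("phys") and its allocstate, and that os_setup hands over the set of physical nodes it will reboot. The function name plan_vnode_setup() and the placeholder state "NOT_READY" used in the example are hypothetical.

```python
RES_READY = "RES_READY"


def plan_vnode_setup(vnodes, phys_rebooting):
    """Return the vnodes that vnode_setup still has to set up.

    vnodes maps vnode ID -> {"phys": host ID, "allocstate": state};
    phys_rebooting is the set of physical nodes os_setup will reboot.
    """
    todo = []
    for vnode_id, info in vnodes.items():
        if info["phys"] in phys_rebooting:
            # The host's reboot sets these vnodes up on its own.
            info["allocstate"] = RES_READY
        if info["allocstate"] == RES_READY:
            continue  # already up (e.g. during swapmod): no reboot, no setup
        todo.append(vnode_id)
    return todo
```

      Marking the vnodes RES_READY before their host has actually finished booting is the gap the commit message flags with "Might need an interim state here".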
  8. 08 Jul, 2003 2 commits
  9. 04 Jul, 2003 1 commit
  10. 25 Jun, 2003 1 commit
  11. 14 Apr, 2003 2 commits
  12. 07 Apr, 2003 1 commit
    • Modify os_setup return codes to enable "intelligent" retry. · 99a867cb
      Chad Barb authored
      
      Now os_setup returns:
        0 on success
        1 on one or more retry-friendly errors
       -1 on no-retry errors
      
      tbswap.in checks os_setup's return code,
      and will only retry on 1.
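      A minimal Python sketch of the retry policy on the tbswap side, modeling os_setup as a callable that returns 0, 1, or -1. The retry limit (max_tries) is an assumption; the commit does not say how many attempts tbswap makes.

```python
def run_os_setup_with_retry(os_setup, max_tries=3):
    """Call os_setup until it succeeds, retrying only on code 1.

    Returns (final_return_code, attempts_used).
    """
    for attempt in range(1, max_tries + 1):
        rc = os_setup()
        if rc == 0:
            return 0, attempt
        if rc != 1:
            # -1 (or anything unexpected) is a no-retry error: stop now.
            return rc, attempt
    return 1, max_tries
```

      For example, a sequence of return codes 1, 1, 0 succeeds on the third attempt, while 1, -1 stops at the second attempt without retrying again.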
  13. 18 Mar, 2003 1 commit
    • Here it is: reswap. · e928fbe9
      Chad Barb authored
      
      nfree
         - modified to put node in FREE_DIRTY when it is freed
      
      assign_wrapper
         - '-u' update switch added.
      
      os_setup
         - doesn't reboot node which is already in RES_READY
      
      tbswap
         - calls all this stuff appropriately
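      A minimal Python sketch of the two node-state rules in this commit (nfree leaving a node in FREE_DIRTY, and os_setup skipping nodes already in RES_READY), assuming a dict of node records with hypothetical "reserved" and "allocstate" fields. It does not model assign_wrapper's -u update mode or the full tbswap sequence.

```python
FREE_DIRTY = "FREE_DIRTY"
RES_READY = "RES_READY"


def nfree(nodes, node_id):
    """Release a node. It goes to FREE_DIRTY rather than a clean free state
    (assumption: a dirty node needs cleanup before the next user)."""
    nodes[node_id]["reserved"] = None
    nodes[node_id]["allocstate"] = FREE_DIRTY


def nodes_to_reboot(nodes, experiment):
    """os_setup reboots only the experiment's nodes not already RES_READY."""
    return sorted(
        node_id for node_id, info in nodes.items()
        if info["reserved"] == experiment and info["allocstate"] != RES_READY
    )
```

      During a reswap, nodes that stay in the experiment and are already RES_READY are left alone, so only newly added nodes go through a reboot.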
  14. 17 Mar, 2003 1 commit
    • Add "nextosid" slot to os_info table, for chaining from a generic OSID to a specific one, for the purposes of mapping things like FBSD-STD to FBSD47-STD (the current OSID to use). · 6eacae5e
      Leigh B. Stoller authored
      This is technically more correct than what os_setup used to do, which
      was to map FBSD-STD to whatever FreeBSD OSID was currently on the disk.
      Now it maps to a specific one, and if that is not loaded, it sets up a
      reload.
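      A minimal Python sketch of the nextosid mapping, assuming os_info is available as a dict keyed by OSID. Following more than one nextosid hop, and the hop limit that guards against cycles, are assumptions of the sketch rather than behavior stated in the commit.

```python
def resolve_osid(os_info, osid, max_hops=10):
    """Follow nextosid links from a generic OSID to a specific one."""
    seen = set()
    while os_info.get(osid, {}).get("nextosid"):
        if osid in seen or len(seen) >= max_hops:
            raise ValueError(f"nextosid chain too long or cyclic at {osid}")
        seen.add(osid)
        osid = os_info[osid]["nextosid"]
    return osid


def plan_os(os_info, requested, current_on_disk):
    """Return (osid_to_boot, needs_reload) for a requested OSID."""
    target = resolve_osid(os_info, requested)
    # Reload only if the specific OSID is not what the disk already holds.
    return target, target != current_on_disk
```

      With FBSD-STD chained to FBSD47-STD, a node whose disk already holds FBSD47-STD needs no reload, while a node holding some other FreeBSD image gets one.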
  15. 31 Jan, 2003 2 commits
  16. 29 Jan, 2003 1 commit
  17. 07 Jan, 2003 1 commit
    • Changes for setting up jailed nodes, which need checks similar to what real nodes get. · 5ab15776
      Leigh B. Stoller authored
      Also, run a proper os_select on jailed nodes, *after* the OS for the
      physical node is set up, since otherwise stated will not be happy.
      
      Fixes for dealing with failed os_load. Previously, if os_load would
      fail, os_setup would wait for those nodes anyway since it had no idea
      what nodes had failed (and we do not want to just quit from os_setup
      since that might cause a lot of extra power cycles). Now, for each
      node that got an os_load, check its eventstate; it should be in ISUP
      immediately after os_load exits (since that's what os_load waited for),
      and if it's not, then mark that node as failed. Note, though, that failed
      loads no longer result in the node going into hwdown, since 99 percent
      of the time it's a busted user image, not a hardware problem. I figure
      we will catch real hw errors via the reload daemon, when it sends
      email about nodes not finishing.
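      A minimal Python sketch of the post-os_load check, assuming the eventstate of each node can be looked up through a callable. The function name check_os_load() is hypothetical.

```python
ISUP = "ISUP"


def check_os_load(loaded_nodes, get_eventstate):
    """Split nodes that just went through os_load into (ok, failed).

    os_load waits for ISUP before it exits, so any node not in ISUP
    afterwards is treated as a failed load. Failed nodes are only reported,
    not moved to hwdown, since a bad user image is the usual cause.
    """
    ok, failed = [], []
    for node in loaded_nodes:
        if get_eventstate(node) == ISUP:
            ok.append(node)
        else:
            failed.append(node)
    return ok, failed
```

      os_setup would then wait only on the nodes in ok, instead of waiting on every node and power cycling ones that never came up.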
      
      Do not bother with doing the vnode setup if any of the phys nodes
      failed to set up. Doing so leads to cascading errors and prolongs the
      agony by another few minutes. Might revisit this later.
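      A minimal Python sketch of this ordering, assuming physical-node setup and vnode setup can be modeled as callables. The name setup_nodes() and its return shape are hypothetical.

```python
def setup_nodes(phys_nodes, vnodes, setup_phys, setup_vnodes):
    """Set up physical nodes first; set up vnodes only if all of them succeeded.

    Returns (failed_phys_nodes, vnodes_were_set_up).
    """
    failed = [node for node in phys_nodes if not setup_phys(node)]
    if failed:
        # Skipping vnode setup avoids cascading errors and minutes of waiting.
        return failed, False
    setup_vnodes(vnodes)
    return [], True
```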
      
      Remove local WaitTillAlive() function, and switch to using the version
      I put into libdb a couple of weeks ago.
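      The libdb WaitTillAlive() is not shown here, so its signature and exact behavior are unknown. This Python sketch only illustrates the general idea of polling a node's eventstate until it reaches ISUP or a deadline passes; the name wait_till_alive(), the timeout, and the poll interval are assumptions, and the clock and sleep hooks exist only so the loop can be exercised without real waiting.

```python
import time


def wait_till_alive(get_eventstate, node, timeout=300.0, interval=5.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll a node's eventstate until it is ISUP or the timeout expires.

    Returns True if the node came up, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if get_eventstate(node) == "ISUP":
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```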
      
      Fix up a bunch of print statements to be nicer.
  18. 31 Oct, 2002 1 commit
  19. 18 Oct, 2002 1 commit
    • Merge the newstated branch with the main tree. · 5c961517
      Mac Newbold authored
      Changes to watch out for:
      
      - db calls that change boot info in nodes table are now calls to os_select
      
      - whenever you want to change a node's pxe boot info, or def or next boot
      osids or paths, use os_select.
      
      - when you need to wait for a node to reach some point in the boot process
      (like ISUP), check the state in the database using the lib calls
      
      - Proxydhcp now sends a BOOTING state for each node that it talks to.
      
      - OSs that don't send ISUP will have one generated for them by stated
      either when they ping (if they support ping) or immediately after they get
      to BOOTING.
      
      - States now have timeouts. Actions aren't currently carried out, but they
      will be soon. If you notice problems here, let me know... we're still
      tuning it. (Before this, all timeouts were set to "none" in the db.)
      
      One temporary change:
      
      - While I build our new free node manager daemon (freed), all nodes are
      forced into reloading when they're nfreed, and the calls to reset the OS
      are disabled (that will move into freed).
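      A toy Python model of two of the stated behaviors listed above: generating ISUP for an OS that does not send it, and per-state timeouts that are detected but not yet acted on. The class StateTracker, its fields, and the default that unknown nodes do send ISUP are hypothetical; the ping-triggered ISUP path is deliberately left out.

```python
import time


class StateTracker:
    """Toy model of per-node boot-state tracking with timeouts (hypothetical)."""

    def __init__(self, timeouts, sends_isup, clock=time.monotonic):
        self.timeouts = timeouts      # state -> seconds, or None for "none"
        self.sends_isup = sends_isup  # node -> does its OS report ISUP itself?
        self.clock = clock
        self.state = {}
        self.entered = {}

    def _enter(self, node, state):
        self.state[node] = state
        self.entered[node] = self.clock()

    def event(self, node, state):
        """Record a state reported for node (e.g. BOOTING from proxydhcp)."""
        self._enter(node, state)
        # For an OS that never sends ISUP, generate one right after BOOTING.
        if state == "BOOTING" and not self.sends_isup.get(node, True):
            self._enter(node, "ISUP")

    def timed_out(self):
        """Nodes that have stayed in a state past its timeout. Like stated at
        the time, this only reports them and takes no action."""
        now = self.clock()
        return sorted(
            node for node, state in self.state.items()
            if self.timeouts.get(state) is not None
            and now - self.entered[node] > self.timeouts[state]
        )
```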
  20. 26 Sep, 2002 1 commit
  21. 05 Aug, 2002 1 commit
  22. 07 Jul, 2002 1 commit
  23. 03 Jul, 2002 1 commit
  24. 02 Jun, 2002 1 commit
  25. 13 May, 2002 1 commit
  26. 10 May, 2002 2 commits
  27. 09 May, 2002 1 commit
  28. 08 May, 2002 1 commit
  29. 22 Apr, 2002 1 commit
  30. 16 Apr, 2002 1 commit
  31. 05 Mar, 2002 1 commit
    • A wide-ranging set of event system changes: · 0318cc22
      Leigh B. Stoller authored
      assign_wrapper.in: Hack in a change that ensures a delay node is
      created for any link on which an event is posted (up,down,modify),
      no matter what its initial parameters are; i.e., if a link is created
      with no delay, but there is an event that adds a delay later, then we
      must drop in a delay node. Same for up/down on a link. We do this in
      the delay node. I am reasonably confident that this change is fine for
      duplex links, but I am less sure of the effect on lans!
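      A simplified Python sketch of the rule, assuming a link is described by a dict with hypothetical delay_ms and loss fields and that events arrive as (link_name, action) pairs. Real links carry more shaping parameters (bandwidth, for one), which this sketch ignores.

```python
EVENT_ACTIONS = {"up", "down", "modify"}


def needs_delay_node(link, events):
    """A link gets a delay node if its initial parameters need shaping, or if
    any up/down/modify event targets it, even when it starts unshaped."""
    initially_shaped = link.get("delay_ms", 0) > 0 or link.get("loss", 0.0) > 0
    has_event = any(
        name == link["name"] and action in EVENT_ACTIONS
        for name, action in events
    )
    return initially_shaped or has_event
```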
      
      eventsys_control.in: Checkpoint latest changes. Add "replay" option,
      which right now just stops and starts the event scheduler so that it
      reloads the entire event list. Add check for existing experiment, and
      that the experiment is either active or swapping (do not want to start
      a scheduler for a swapped out experiment!). Add check to see if there
      are any events, and skip startup if there are no events in the DB.
      Lastly, get very serious about preventing more than one scheduler from
      being started, either by accident or intentionally. My protocol is to
      lock the table, grab and set the pid to -pid, test the pid for a
      positive value, and if positive, send the scheduler a kill(TERM) so
      that it can clean up, clear the pid to zero in the DB, and exit. This
      approach ensures that we do not try to send a kill to a pid that is no
      longer active or owned by the user (this last part is not really
      necessary because of how pids are reused, but it was easy to add so why
      not).
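      A minimal Python sketch of that claim-then-signal protocol, assuming an experiments table with a sched_pid column (hypothetical names) and using sqlite3's BEGIN IMMEDIATE as a stand-in for locking the table in the real database. The connection must be opened with isolation_level=None so the transaction is managed by hand; send_signal defaults to os.kill and is injectable only for testing.

```python
import os
import signal
import sqlite3


def stop_scheduler(conn, eid, send_signal=os.kill):
    """Claim the scheduler's pid under a lock and signal it at most once.

    Returns the pid that was sent SIGTERM, or None if no live scheduler
    was recorded (pid zero, or already claimed as a negative pid).
    """
    conn.execute("BEGIN IMMEDIATE")  # stand-in for locking the table
    try:
        row = conn.execute(
            "SELECT sched_pid FROM experiments WHERE eid = ?", (eid,)
        ).fetchone()
        pid = row[0] if row else 0
        # Store -pid so any concurrent caller sees the scheduler as claimed.
        conn.execute(
            "UPDATE experiments SET sched_pid = ? WHERE eid = ?", (-abs(pid), eid)
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    if pid > 0:
        send_signal(pid, signal.SIGTERM)  # scheduler cleans up and exits
        return pid
    return None


def scheduler_exit(conn, eid):
    """What the scheduler does on SIGTERM: clear its pid to zero."""
    conn.execute("UPDATE experiments SET sched_pid = 0 WHERE eid = ?", (eid,))
```

      A second stop request made before the scheduler exits finds a negative pid and sends nothing, which is how the protocol avoids signalling a stale pid.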
      
      exports_setup.in: Trivial change to make it easier to turn this on
      temporarily in devel trees.
      named_setup.in: Ditto.
      
      node_reboot.in: Add call to TBdbfork() in child because of apparent DB
      connection problems across forks. In the child, set the eventstatus
      for the node to REBOOT if successful (note: this event status stuff is
      temporary, will be recast in next set of revisions).
      
      GNUmakefile:  Add new controlling program, eventsys_control.
      power.in:     Ditto previous comment about REBOOT.
      os_setup.in:  Non event system cleanups.
      tbend.in:     Add DB cleanup of the new virt_trafgens and eventlist tables.
      tbprerun.in:  Ditto.
      tbreport.in:  Print out the event list in a pretty print format.
      tbswapin.in:  Add call to start the event system. Also a bug fix; move
                    the named script up above the os_setup so that the named
                    tables have been updated by the time the first node
                    reboots. I noticed that nodes were failing on gethostbyname().
      tbswapout.in: Add call to stop the event system.
  32. 12 Feb, 2002 1 commit