1. 22 Aug, 2003 1 commit
  2. 25 Jul, 2003 1 commit
  3. 18 Jul, 2003 2 commits
  4. 17 Jul, 2003 1 commit
  5. 15 Jul, 2003 2 commits
    • Leigh B. Stoller's avatar
      Kill some leftover debugging. · 90e77f5c
      Leigh B. Stoller authored
      90e77f5c
    • Leigh B. Stoller's avatar
      A set of changes to make swapmod work on jailed nodes (note, swapmod · 92ff875a
      Leigh B. Stoller authored
      does not yet work with remove virtual nodes; that will take even more
      work).
      
      Added a new allocstate called RES_TEARDOWN. assign_wrapper no longer
      deallocates unused nodes, but rather moves them into the new state for
      the wrapper (tbswap) to deal with. Thats cause deleted vnodes need to
      be torn down, since its possible that the node on which they were
      living will not be deallocated (say, if there are other vnodes on
      it). We do not want to be doing that from assign_wrapper, so tbswap
      looks for those nodes.
      
      Made vnode_setup allocstate aware in the same way that os_setup is;
      do not reboot vnodes or try to set up vnodes when they are already in
      the RES_READY state, as they will be when doing a swapmod. In
      addition, if os_setup is going to reboot the underlying physnode, move
      the vnodes on that node into RES_READY too, since there they will
      setup automatically. Might need an interim state here, for correctness.
      92ff875a
  6. 08 Jul, 2003 2 commits
  7. 04 Jul, 2003 1 commit
  8. 25 Jun, 2003 1 commit
  9. 14 Apr, 2003 2 commits
  10. 07 Apr, 2003 1 commit
    • Chad Barb's avatar
      · 99a867cb
      Chad Barb authored
      Modify os_setup return codes to enable "intelligent" retry;
      
      Now os_setup returns:
        0 on success
        1 on one or more retry-friendly errors
       -1 on no-retry errors
      
      tbswap.in checks os_setup's return code,
      and will only retry on 1.
      99a867cb
  11. 18 Mar, 2003 1 commit
    • Chad Barb's avatar
      · e928fbe9
      Chad Barb authored
      Here it is; reswap.
      
      nfree
         - modified to put node in FREE_DIRTY when it is freed
      
      assign_wrapper
         - '-u' update switch added.
      
      os_setup
         - doesn't reboot node which is already in RES_READY
      
      tbswap
         - calls all this stuff appropriately
      e928fbe9
  12. 17 Mar, 2003 1 commit
    • Leigh B. Stoller's avatar
      Add "nextosid" slot to os_info table, for chaining from a generic OSID · 6eacae5e
      Leigh B. Stoller authored
      to a specific one, for the purposes of mapping things like FBSD-STD to
      FBSD47-STD (the current OSID to use). This is technically more correct
      than what os_setup used to do, which was map FBSD-STD to whatever
      FreeBSD OSID was currently on the disk. Now it maps to a specific one,
      and if that is not loaded, it sets up a reload.
      6eacae5e
  13. 31 Jan, 2003 2 commits
  14. 29 Jan, 2003 1 commit
  15. 07 Jan, 2003 1 commit
    • Leigh B. Stoller's avatar
      Changes for setting up jailed nodes, which need checks similar to what · 5ab15776
      Leigh B. Stoller authored
      real nodes get. Also, run a proper os_select on jailed nodes, *after*
      the os for the physical node is setup, since otherwise stated will not
      be happy.
      
      Fixes for dealing with failed os_load. Previously, if os_load would
      fail, os_setup would wait for those nodes anyway since it had no idea
      what nodes had failed (and we do not want to just quit from os_setup
      since that might cause a lot of extra power cycles). Now, for each
      node that got an os_load, check its eventstate; it should be in ISUP
      immediately after os_load exits (since thats what os_load waited for),
      and if its not, then mark that node as failed. Note though that failed
      loads no longer result in the node going into hwdown, since 99 percent
      of the time its a busted user image, not a hardware problem. I figure
      we will catch real hw errors via the reload daemon, when it sends
      email about nodes not finishing.
      
      Do not bother with doing the vnode setup if any of the phys nodes
      failed to setup. Leads to cascading errors and prolongs the angony by
      another few minutes. Might revisit this later.
      
      Remove local WaitTillAlive() function, and switch to using the version
      I put into libdb a couple of weeks ago.
      
      Fix up a bunch of print statements to be nicer.
      5ab15776
  16. 31 Oct, 2002 1 commit
  17. 18 Oct, 2002 1 commit
    • Mac Newbold's avatar
      Merge the newstated branch with the main tree. · 5c961517
      Mac Newbold authored
      Changes to watch out for:
      
      - db calls that change boot info in nodes table are now calls to os_select
      
      - whenever you want to change a node's pxe boot info, or def or next boot
      osids or paths, use os_select.
      
      - when you need to wait for a node to reach some point in the boot process
      (like ISUP), check the state in the database using the lib calls
      
      - Proxydhcp now sends a BOOTING state for each node that it talks to.
      
      - OSs that don't send ISUP will have one generated for them by stated
      either when they ping (if they support ping) or immediately after they get
      to BOOTING.
      
      - States now have timeouts. Actions aren't currently carried out, but they
      will be soon. If you notice problems here, let me know... we're still
      tuning it. (Before all timeouts were set to "none" in the db)
      
      One temporary change:
      
      - While I make our new free node manager daemon (freed), all nodes are
      forced into reloading when they're nfreed and the calls to reset the os
      are disabled (that will move into freed).
      5c961517
  18. 26 Sep, 2002 1 commit
  19. 05 Aug, 2002 1 commit
  20. 07 Jul, 2002 1 commit
  21. 03 Jul, 2002 1 commit
  22. 02 Jun, 2002 1 commit
  23. 13 May, 2002 1 commit
  24. 10 May, 2002 2 commits
  25. 09 May, 2002 1 commit
  26. 08 May, 2002 1 commit
  27. 22 Apr, 2002 1 commit
  28. 16 Apr, 2002 1 commit
  29. 05 Mar, 2002 1 commit
    • Leigh B. Stoller's avatar
      A wide ranging set of event system changes: · 0318cc22
      Leigh B. Stoller authored
      assign_wrapper.in: Hack in a change that ensures a delay node is
      created for any link on which an event is posted (up,down,modify),
      no matter what its initial parameters are. ie: If a link is created
      with no delay, but there is an event that adds a delay later, then we
      must drop in a delay node. Same for up/down on a link. We do this in
      the delay node. I am reasonably confident that this change is fine for
      duplex links, but I am less sure of the effect on lans!
      
      eventsys_control.in: Checkpoint latest changes. Add "replay" option,
      which right now just stops and starts the event scheduler so that it
      reloads the entire event list. Add check for existing experiment, and
      that the experiment is either active or swapping (do not want to start
      a scheduler for a swapped out experiment!). Add check to see if there
      are any events, and skip startup if there are not events in the DB.
      Lastly, get very serious about preventing more than one scheduler from
      being started, either by accident or intentionally. My protocol is to
      lock the table, grab and set the pid to -pid, test the pid for a
      positive value, and if positive, send the scheduler a kill(TERM) so
      that it can cleanup, clear the pid to zero in the DB, and exit. This
      approach ensures that we do not try to send a kill to a pid that is no
      longer active or owned by the user (this last part is not really
      necessary cause of how pids are reused, but it was easy to add so why
      not).
      
      exports_setup.in: Trivial change to make it easier to turn this on
      temporarily in devel trees.
      named_setup.in: Ditto.
      
      node_reboot.in: Add call to TBdbfork() in child cause of apparent DB
      connection problems across forks. In the child, set the eventstatus
      for the node to REBOOT if successful (not this event status stuff is
      temporary, will be recast in next set of revisions).
      
      GNUmakefile:  Add new controlling program, eventsys_control.
      power.in:     Ditto previous comment about REBOOT.
      os_setup.in:  Non event system cleanups.
      tbend.in:     Add DB cleanup of the new virt_trafgens and eventlist tables.
      tbprerun.in:  Ditto.
      tbreport.in:  Print out the event list in a pretty print format.
      tbswapin.in:  Add call to start the event system. Also a big fix; move
                    the named script up above the os_setup so that the named
                    tables have been updated by the time the first node
                    reboots. I noticed that nodes were failing on gethostbyname().
      tbswapout.in: Add call to stop the event system.
      0318cc22
  30. 12 Feb, 2002 1 commit
  31. 08 Feb, 2002 1 commit
    • Leigh B. Stoller's avatar
      Big round of image/osid changes. This is the first cut (final cut?) at · a73e627e
      Leigh B. Stoller authored
      supporting autocreating and autoloading images. The imageid form now
      sports a field to specify a nodeid to create the image from; If set,
      the backend create_image script is invoked. Thats the easy part.
      Slightly harder is autoloading images based on the osid specified in
      the NS file. To support this, I have added a new DB table called
      osidtoimageid, which holds the mapping from osid/pctype to imageid.
      When users create images, they must specify what node types that image
      is good for. Obviously, the mappings have to be unique or it would be
      impossible to figure it out! Anyway, once that image mapping is
      in place and the image created, the user can specify that ID in the NS
      file. I've changed os_setup to to look for IDs that are not loaded,
      and to try and find one in the osidtoimageid. If found, it invokes
      os_load. To keep things running in parallel as much as possible,
      os_setup issues all the loads/reboots (could be more than a single set
      of loads is multiple IDs are in the NS file) at once, and waits for
      all the children to exit. I've hacked up os_load a bit to try and be
      more robust in the face of PXE failures, which still happen and are
      rather troublsesome. Need an event system!
      
      Contained in this revision are unrelated changed to make the OS and
      Image IDs per-project unique instead of globally unique, since thats a
      pain for the users. This turns out to be very messy, since underneath
      we do not want to pass around pid/ID in all the various places its
      used. Rather, I create a globally unique name and extened the OS and
      Image tables to include pid/name/ID. The user selects pid/name, and I
      create the globally unique ID. For the most part this is invisible
      throughout the system, except where we interface with the user, say in
      the web pages; the user should see his chosen name where possible, and
      the should invoke scripts (os_load, create_image, etc) using his/her
      name not the internal ID. Also, in the front end the NS file should
      use the user name not the ID. All in all, this accounted for a number
      of annoying changes and some special cases that are unavoidable.
      a73e627e
  32. 20 Dec, 2001 1 commit
  33. 17 Dec, 2001 1 commit
  34. 11 Oct, 2001 1 commit