1. 07 Jan, 2003 1 commit
    • Changes for setting up jailed nodes, which need checks similar to what · 5ab15776
      Leigh B. Stoller authored
      real nodes get. Also, run a proper os_select on jailed nodes, *after*
      the os for the physical node is set up, since otherwise stated will not
      be happy.
      
      Fixes for dealing with failed os_load. Previously, if os_load would
      fail, os_setup would wait for those nodes anyway since it had no idea
      what nodes had failed (and we do not want to just quit from os_setup
      since that might cause a lot of extra power cycles). Now, for each
      node that got an os_load, check its eventstate; it should be in ISUP
      immediately after os_load exits (since that's what os_load waited for),
      and if it's not, then mark that node as failed. Note though that failed
      loads no longer result in the node going into hwdown, since 99 percent
      of the time it's a busted user image, not a hardware problem. I figure
      we will catch real hw errors via the reload daemon, when it sends
      email about nodes not finishing.
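
      The post-load check described above can be sketched roughly as
      follows. This is a minimal Python illustration, not the actual
      os_setup code; get_eventstate and the dictionary standing in for
      the database are hypothetical.

```python
ISUP = "ISUP"

def find_failed_loads(loaded_nodes, get_eventstate):
    """Return the nodes that are not in ISUP right after os_load exits.

    os_load itself waits for ISUP, so any other state is treated as a
    failed load. get_eventstate is a hypothetical callable that maps a
    node id to its current state string.
    """
    return [node for node in loaded_nodes if get_eventstate(node) != ISUP]

# Hypothetical stand-in for the per-node eventstate kept in the database.
states = {"pc1": "ISUP", "pc2": "BOOTING", "pc3": "ISUP"}
failed = find_failed_loads(["pc1", "pc2", "pc3"], states.get)
print(failed)
```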
      
      Do not bother with doing the vnode setup if any of the phys nodes
      failed to set up. Leads to cascading errors and prolongs the agony by
      another few minutes. Might revisit this later.
      
      Remove local WaitTillAlive() function, and switch to using the version
      I put into libdb a couple of weeks ago.
      
      Fix up a bunch of print statements to be nicer.
  2. 31 Oct, 2002 1 commit
  3. 18 Oct, 2002 1 commit
    • Merge the newstated branch with the main tree. · 5c961517
      Mac Newbold authored
      Changes to watch out for:
      
      - db calls that change boot info in nodes table are now calls to os_select
      
      - whenever you want to change a node's pxe boot info, or def or next boot
      osids or paths, use os_select.
      
      - when you need to wait for a node to reach some point in the boot process
      (like ISUP), check the state in the database using the lib calls
      
      - Proxydhcp now sends a BOOTING state for each node that it talks to.
      
      - OSs that don't send ISUP will have one generated for them by stated
      either when they ping (if they support ping) or immediately after they get
      to BOOTING.
      
      - States now have timeouts. Actions aren't currently carried out, but they
      will be soon. If you notice problems here, let me know... we're still
      tuning it. (Before, all timeouts were set to "none" in the db.)
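
      Waiting for a node to reach a state such as ISUP by checking the
      database could look something like the sketch below. It is
      written in Python for illustration only; get_state is a
      hypothetical stand-in for the lib calls mentioned above, and the
      timeout and interval values are arbitrary.

```python
import time

def wait_for_state(node, wanted, get_state, timeout=5.0, interval=0.1):
    """Poll get_state(node) until it returns wanted or timeout seconds pass.

    Returns True if the state was reached, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_state(node) == wanted:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)

# Fake state source: the node reports BOOTING twice, then ISUP.
_seq = iter(["BOOTING", "BOOTING", "ISUP"])
reached = wait_for_state("pc1", "ISUP", lambda node: next(_seq),
                         timeout=2.0, interval=0.01)
print(reached)
```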
      
      One temporary change:
      
      - While I make our new free node manager daemon (freed), all nodes are
      forced into reloading when they're nfreed and the calls to reset the os
      are disabled (that will move into freed).
  4. 26 Sep, 2002 1 commit
  5. 05 Aug, 2002 1 commit
  6. 07 Jul, 2002 1 commit
  7. 03 Jul, 2002 1 commit
  8. 02 Jun, 2002 1 commit
  9. 13 May, 2002 1 commit
  10. 10 May, 2002 2 commits
  11. 09 May, 2002 1 commit
  12. 08 May, 2002 1 commit
  13. 22 Apr, 2002 1 commit
  14. 16 Apr, 2002 1 commit
  15. 05 Mar, 2002 1 commit
    • A wide ranging set of event system changes: · 0318cc22
      Leigh B. Stoller authored
      assign_wrapper.in: Hack in a change that ensures a delay node is
      created for any link on which an event is posted (up,down,modify),
      no matter what its initial parameters are. ie: If a link is created
      with no delay, but there is an event that adds a delay later, then we
      must drop in a delay node. Same for up/down on a link. We do this in
      the delay node. I am reasonably confident that this change is fine for
      duplex links, but I am less sure of the effect on lans!
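
      The rule for when a delay node is needed can be sketched as
      below. This is an illustrative Python fragment, not the
      assign_wrapper code; the (link name, event type) tuple layout and
      the initial_delay_ms parameter are assumptions.

```python
LINK_EVENTS = {"up", "down", "modify"}

def needs_delay_node(link_name, initial_delay_ms, events):
    """Return True if the link must get a delay node.

    A delay node is needed when the link is shaped from the start, or
    when any up/down/modify event targets it, whatever its initial
    parameters are. events is a list of (link_name, event_type) pairs.
    """
    has_event = any(target == link_name and kind in LINK_EVENTS
                    for target, kind in events)
    return initial_delay_ms > 0 or has_event

events = [("link0", "modify")]
print(needs_delay_node("link0", 0, events))   # link with no delay but an event
print(needs_delay_node("link1", 0, events))   # link with no delay and no events
```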
      
      eventsys_control.in: Checkpoint latest changes. Add "replay" option,
      which right now just stops and starts the event scheduler so that it
      reloads the entire event list. Add check for existing experiment, and
      that the experiment is either active or swapping (do not want to start
      a scheduler for a swapped out experiment!). Add check to see if there
      are any events, and skip startup if there are no events in the DB.
      Lastly, get very serious about preventing more than one scheduler from
      being started, either by accident or intentionally. My protocol is to
      lock the table, grab and set the pid to -pid, test the pid for a
      positive value, and if positive, send the scheduler a kill(TERM) so
      that it can clean up, clear the pid to zero in the DB, and exit. This
      approach ensures that we do not try to send a kill to a pid that is no
      longer active or owned by the user (this last part is not really
      necessary because of how pids are reused, but it was easy to add so why
      not).
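
      One possible reading of the pid protocol above, sketched in
      Python. A threading.Lock stands in for the table lock and a
      dictionary for the DB row; the column name event_sched_pid is
      hypothetical, and the kill function is injected so the sketch
      never signals a real process. Whether the real script holds the
      lock across the kill is not stated, so this version releases it.

```python
import signal
import threading

table_lock = threading.Lock()                        # stands in for a DB table lock
experiments = {"myexp": {"event_sched_pid": 4242}}   # hypothetical row

def stop_scheduler(exp, kill):
    """Stop the scheduler for exp at most once.

    Under the lock, grab the pid and store -pid to mark it as claimed.
    Only a positive pid is signalled; afterwards the pid is cleared to 0.
    Returns the pid that was signalled, or None if there was nothing to do.
    """
    with table_lock:
        row = experiments[exp]
        pid = row["event_sched_pid"]
        if pid > 0:
            row["event_sched_pid"] = -pid            # claim it
    if pid <= 0:
        return None                                  # not running, or already claimed
    kill(pid, signal.SIGTERM)                        # let the scheduler clean up
    with table_lock:
        row["event_sched_pid"] = 0
    return pid

sent = []
first = stop_scheduler("myexp", lambda pid, sig: sent.append((pid, sig)))
second = stop_scheduler("myexp", lambda pid, sig: sent.append((pid, sig)))
print(first, second)
```

      In real use the injected kill would be os.kill; the second call
      doing nothing is the point of the exercise.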
      
      exports_setup.in: Trivial change to make it easier to turn this on
      temporarily in devel trees.
      named_setup.in: Ditto.
      
      node_reboot.in: Add call to TBdbfork() in child because of apparent DB
      connection problems across forks. In the child, set the eventstatus
      for the node to REBOOT if successful (note: this event status stuff is
      temporary, will be recast in next set of revisions).
      
      GNUmakefile:  Add new controlling program, eventsys_control.
      power.in:     Ditto previous comment about REBOOT.
      os_setup.in:  Non event system cleanups.
      tbend.in:     Add DB cleanup of the new virt_trafgens and eventlist tables.
      tbprerun.in:  Ditto.
      tbreport.in:  Print out the event list in a pretty print format.
      tbswapin.in:  Add call to start the event system. Also a big fix; move
                    the named script up above the os_setup so that the named
                    tables have been updated by the time the first node
                    reboots. I noticed that nodes were failing on gethostbyname().
      tbswapout.in: Add call to stop the event system.
  16. 12 Feb, 2002 1 commit
  17. 08 Feb, 2002 1 commit
    • Big round of image/osid changes. This is the first cut (final cut?) at · a73e627e
      Leigh B. Stoller authored
      supporting autocreating and autoloading images. The imageid form now
      sports a field to specify a nodeid to create the image from; if set,
      the backend create_image script is invoked. That's the easy part.
      Slightly harder is autoloading images based on the osid specified in
      the NS file. To support this, I have added a new DB table called
      osidtoimageid, which holds the mapping from osid/pctype to imageid.
      When users create images, they must specify what node types that image
      is good for. Obviously, the mappings have to be unique or it would be
      impossible to figure it out! Anyway, once that image mapping is
      in place and the image created, the user can specify that ID in the NS
      file. I've changed os_setup to look for IDs that are not loaded,
      and to try and find one in the osidtoimageid. If found, it invokes
      os_load. To keep things running in parallel as much as possible,
      os_setup issues all the loads/reboots (could be more than a single set
      of loads if multiple IDs are in the NS file) at once, and waits for
      all the children to exit. I've hacked up os_load a bit to try and be
      more robust in the face of PXE failures, which still happen and are
      rather troublesome. Need an event system!
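
      The lookup and the parallel loads might be sketched as below.
      This is illustrative Python, not the os_setup code: the table
      contents, node tuples, and the load callable are made up, and a
      thread pool is swapped in for the child processes described
      above.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical contents of osidtoimageid: (osid, pctype) -> imageid.
OSID_TO_IMAGEID = {
    ("MYOS", "pc600"): "MYOS-600",
    ("MYOS", "pc850"): "MYOS-850",
}

def plan_loads(nodes, loaded):
    """Group the nodes that need a load by imageid.

    nodes is a list of (node_id, pctype, wanted_osid); loaded maps a
    node_id to the set of osids already on its disk. Raises LookupError
    when an unloaded osid has no image mapping for that pctype.
    """
    plan = defaultdict(list)
    for node_id, pctype, osid in nodes:
        if osid in loaded.get(node_id, set()):
            continue                      # already on disk, no load needed
        key = (osid, pctype)
        if key not in OSID_TO_IMAGEID:
            raise LookupError(f"no image for {osid} on {pctype}")
        plan[OSID_TO_IMAGEID[key]].append(node_id)
    return dict(plan)

def run_loads(plan, load):
    """Issue one load per imageid at once and wait for all of them."""
    with ThreadPoolExecutor(max_workers=max(1, len(plan))) as pool:
        futures = {img: pool.submit(load, img, ids) for img, ids in plan.items()}
        return {img: fut.result() for img, fut in futures.items()}

nodes = [("pc1", "pc600", "MYOS"), ("pc2", "pc850", "MYOS"),
         ("pc3", "pc600", "MYOS"), ("pc4", "pc600", "FBSD")]
loaded = {"pc4": {"FBSD"}}
plan = plan_loads(nodes, loaded)
results = run_loads(plan, lambda imageid, node_ids: len(node_ids))
print(plan, results)
```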
      
      Contained in this revision are unrelated changes to make the OS and
      Image IDs per-project unique instead of globally unique, since that's a
      pain for the users. This turns out to be very messy, since underneath
      we do not want to pass around pid/ID in all the various places it's
      used. Rather, I create a globally unique name and extended the OS and
      Image tables to include pid/name/ID. The user selects pid/name, and I
      create the globally unique ID. For the most part this is invisible
      throughout the system, except where we interface with the user, say in
      the web pages; the user should see his chosen name where possible, and
      should invoke scripts (os_load, create_image, etc.) using his/her
      name not the internal ID. Also, in the front end the NS file should
      use the user name not the ID. All in all, this accounted for a number
      of annoying changes and some special cases that are unavoidable.
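
      The pid/name to internal ID translation could be pictured as in
      the Python sketch below. The class, its method names, and the
      "project-name" ID scheme are all hypothetical; the commit does
      not say how the globally unique ID is actually built.

```python
class NameRegistry:
    """Maps user-visible (project, name) pairs to internal global IDs."""

    def __init__(self):
        self._id_by_name = {}
        self._name_by_id = {}

    def register(self, project, name):
        """Create a globally unique ID for a per-project name."""
        key = (project, name)
        if key in self._id_by_name:
            raise ValueError(f"{project}/{name} already exists")
        global_id = f"{project}-{name}"   # hypothetical ID scheme
        if global_id in self._name_by_id:
            raise ValueError(f"internal ID clash for {global_id}")
        self._id_by_name[key] = global_id
        self._name_by_id[global_id] = key
        return global_id

    def to_internal(self, project, name):
        """What a script does with the user's name before going underneath."""
        return self._id_by_name[(project, name)]

    def to_user(self, global_id):
        """What a web page does before showing an ID to the user."""
        return self._name_by_id[global_id]

reg = NameRegistry()
a = reg.register("projA", "MYOS")
b = reg.register("projB", "MYOS")   # same name, different project
print(a, b)
```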
  18. 20 Dec, 2001 1 commit
  19. 17 Dec, 2001 1 commit
  20. 11 Oct, 2001 1 commit
  21. 06 Sep, 2001 1 commit
    • Minor hacks to support FBSD-STD and RHL-STD as generic OSIDs. These · d7532d24
      Leigh B. Stoller authored
      have been added as OSIDs so that the parser accepts them. os_setup
      maps them into whatever equiv OSID is loaded on the target node,
      according to the OS slot of the osid table entry. If no mapping can be
      made (no equiv OS loaded, as defined by the partitions table) os_setup
      fails. I've also changed the web page node control form so that the
      only OSIDs you can set for a node are the ones loaded (partitions
      table) or OSKit kernels (osid table entry has a path).
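
      A rough Python sketch of the generic OSID mapping follows. Only
      FBSD-STD and RHL-STD come from the commit; the OS values and the
      concrete OSIDs in the tables below are invented for illustration.

```python
# Generic OSID -> OS value it stands for (values are assumptions).
GENERIC_OSIDS = {"FBSD-STD": "FreeBSD", "RHL-STD": "Linux"}

# Hypothetical osid table rows: concrete osid -> OS slot.
OSID_OS = {"FBSD45": "FreeBSD", "RHL71": "Linux"}

def resolve_osid(requested, partitions):
    """Map a generic OSID to an equivalent OSID loaded on the node.

    partitions lists the osids on the node's disk (the partitions
    table). Non-generic OSIDs pass through unchanged. Raises LookupError
    when no equivalent OS is loaded, mirroring the os_setup failure.
    """
    if requested not in GENERIC_OSIDS:
        return requested
    wanted_os = GENERIC_OSIDS[requested]
    for osid in partitions:
        if OSID_OS.get(osid) == wanted_os:
            return osid
    raise LookupError(f"no {wanted_os} OSID loaded for {requested}")

resolved = resolve_osid("FBSD-STD", ["RHL71", "FBSD45"])
print(resolved)
```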
  22. 24 Aug, 2001 3 commits
  23. 23 Aug, 2001 1 commit
    • Lots of small changes for turning our 'require lib*' lines into 'use lib*'... · e2ed8a1c
      Mac Newbold authored
      Lots of small changes for turning our 'require lib*' lines into 'use lib*' lines. Proper modules declare themselves as a package, and use Exporter to export the names of the subroutines that should be visible from the outside world. Many of ours didn't do that, it was just a file with a bunch of subs in it. So now I've fixed many of them to be proper, and removed the requires and 'push(@INC,...)' hacks and changed it to the proper 'use lib @prefix@/lib/;' and use lib*.
  24. 23 Jul, 2001 1 commit
  25. 21 Jul, 2001 1 commit
    • Many changes and updates for handling new types. The db now has types like... · 78b4e4f5
      Mac Newbold authored
      Many changes and updates for handling new types. The db now has types like 'pc600', 'pc850', and 'dnard', and each type has a class like 'pc' or 'shark'. This updates scripts that use types to use classes where appropriate, and to handle the new types where there were hardcoded things that couldn't be eliminated right now.
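
      The type/class split can be illustrated with a small Python
      sketch. The type and class names come from the message above, but
      the exact pairing, in particular 'dnard' belonging to class
      'shark', is an assumption, as are the helper names.

```python
# Assumed type -> class pairing; in the testbed this lives in the db.
TYPE_CLASS = {"pc600": "pc", "pc850": "pc", "dnard": "shark"}

def node_class(node_type):
    """Return the class of a node type, e.g. 'pc' for 'pc850'."""
    return TYPE_CLASS[node_type]

def nodes_of_class(nodes, wanted_class):
    """Filter (node_id, node_type) pairs by class rather than by type."""
    return [nid for nid, ntype in nodes if node_class(ntype) == wanted_class]

nodes = [("pc1", "pc600"), ("pc2", "pc850"), ("sh1", "dnard")]
pcs = nodes_of_class(nodes, "pc")
print(pcs)
```

      Checking the class instead of a hardcoded type means a new type
      in an existing class needs only a new table entry.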
  26. 17 Jul, 2001 1 commit
    • Some minor changes, plus endless hours of PERL confusion. Anyway, add · d1c90991
      Leigh B. Stoller authored
      a bootstatus field to the nodes table. os_setup sets this to one of
      okay, failed, unknown. This is to be used with the still to be defined
      method of specifying certain nodes that can fail reboot on experiment
      creation. Right now sharks are wired to this, and this information is
      presented in the web page. It's also essential for the batch system,
      which needs to consider nodes that failed to reboot, or else batch
      experiments would never end. Might still need a way for an experiment
      to tell the batch system it's done though.
  27. 16 Jul, 2001 1 commit
  28. 13 Jul, 2001 1 commit
  29. 10 Jul, 2001 1 commit
  30. 05 Jul, 2001 1 commit
  31. 20 Jun, 2001 1 commit
  32. 08 Jun, 2001 1 commit
  33. 16 May, 2001 1 commit
  34. 10 May, 2001 1 commit
    • Lots of little changes for sending email to the right places, with · 3285bc3e
      Leigh B. Stoller authored
      proper headers. Split out some of the mail into testbed-logs,
      testbed-ops, and testbed-approval. Added a library for including from
      our perl scripts. Contains a couple of mail helper functions, but will
      hopefully contain more as time goes by.
      
      Fixed a bug in the web interface that was causing breakage for people
      with multiple accounts. Mac and Jay have noticed this, when logging
      out and trying to join or create a project under a new or different
      name.
  35. 07 May, 2001 2 commits
  36. 03 May, 2001 1 commit
    • A slew of changes for new images/os_info tables. disk_images is gone, · 23a230e8
      Leigh B. Stoller authored
      replaced by the "images" table. New os_info table is added. New web
      pages to add and delete OSIDs to/from the os_info table, for use in
      the NS file. tb-create-os is gone. handle_os no longer operates on the
      tbcmds file, and no longer writes anything into the ir file. Moved the
      setting up of os state (nodes table) from os_setup to handle_os, where
      it should be. os_load and sched_reload now take a single argument, the
      name of the imageid from the images table.