  1. 20 Oct, 2003 1 commit
  2. 13 Oct, 2003 1 commit
  3. 10 Oct, 2003 1 commit
    • Mac Newbold authored · 2b2a306d
      New StateWait changes - the main point of all this is to move to our new
      model of waiting for state changes. Before we were watching the database
      (which means we can only watch for terminal/stable/long-lived states, and
      have to poll the db). Now things that are waiting for states to change
      become event listeners, and watch the stream of events flow by, and don't
      have to do any polling. They can now watch for any state, and even
      sequences of states (e.g. a Shutdown followed by an Isup).
      
      To do this, there is now a cool StateWait.pm library that encapsulates the
      functionality needed. To use it, you call initStateWait before you start
      the chain of events (i.e. before you call node reboot). Then do your stuff,
      and call waitForState() when you're ready to wait. It can be told to
      return periodically with the results so far, and you can cancel waiting
      for things. An example program called waitForState is in
      testbed/event/stated/ , and can also be used nicely as a command line tool
      that wraps up the library functionality.
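
The listener pattern described above can be sketched roughly as follows. This is a toy Python model, not the StateWait.pm API: the class, its method names, and the state strings "SHUTDOWN" and "ISUP" are assumptions made for illustration. The point it shows is that interest is registered before the reboot is triggered, and that each node then advances through its expected sequence as events arrive, with no polling.

```python
from collections import deque


class StateWaiter:
    """Toy model of waiting on an event stream (hypothetical, not StateWait.pm)."""

    def __init__(self):
        # node -> queue of states still expected, in order
        self.pending = {}

    def init_state_wait(self, states, nodes):
        # Register interest before starting the chain of events,
        # so that no event can slip past unobserved.
        for node in nodes:
            self.pending[node] = deque(states)

    def handle_event(self, node, state):
        # Called once per event in the stream. Only the next expected
        # state for a node advances that node's sequence.
        queue = self.pending.get(node)
        if queue and queue[0] == state:
            queue.popleft()

    def finished(self):
        return sorted(n for n, q in self.pending.items() if not q)

    def still_waiting(self):
        return sorted(n for n, q in self.pending.items() if q)


waiter = StateWaiter()
waiter.init_state_wait(["SHUTDOWN", "ISUP"], ["pc1", "pc2"])
stream = [("pc1", "ISUP"), ("pc1", "SHUTDOWN"), ("pc2", "SHUTDOWN"), ("pc1", "ISUP")]
for node, state in stream:
    waiter.handle_event(node, state)
```

In this run the first ISUP from pc1 is ignored because a SHUTDOWN has not been seen yet, which is what distinguishes watching a sequence from watching a single stable state.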
      
      This also required the introduction of a TBFAILED event that can be sent
      when a node isn't going to make it to the state that someone may be
      waiting for. For example, if it gets wedged coming up, and stated retries,
      but eventually gives up on it, it sends this to let things know that the
      node is hosed and won't ever come up.
      
      Another thing that is part of this is that node_reboot moves (back) to the
      fully-event-driven model, where users call node reboot, and it does some
      checks and sends some events. Then stated calls node_reboot in "real mode"
      to actually do the work, and handles doing the appropriate retries until
      the node either comes up or is deemed "failed" and stated gives up on it.
      This means stated is also the gatekeeper of when you can and cannot reboot
      a node. (See mail archives for extensive discussions of the details.)
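
A minimal sketch of that retry-then-give-up behaviour, assuming a fixed retry budget. The function name, the `try_boot` callback, and the default of two retries are all hypothetical; the commit does not say how many retries stated performs or how it detects that a node came up.

```python
def supervise_reboot(node, try_boot, max_retries=2):
    """Retry a reboot until the node comes up or the retry budget is spent.

    `try_boot(node, attempt)` is a hypothetical callable that performs one
    reboot attempt and returns True if the node reached ISUP in time.
    Returns the final outcome and the number of attempts made.
    """
    attempts = 0
    while attempts <= max_retries:
        attempts += 1
        if try_boot(node, attempts):
            return ("ISUP", attempts)
    # Waiters are told explicitly that the node will never come up.
    return ("TBFAILED", attempts)


# A node that comes up on its second attempt, and one that never does.
flaky = supervise_reboot("pc1", lambda node, attempt: attempt >= 2)
dead = supervise_reboot("pc2", lambda node, attempt: False)
```

Keeping the loop in one place is the design point: callers such as os_load only need to wait for one of the two outcomes.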
      
      A big part of the motivation for this was to get uninformed timeouts and
      retries out of os_load/os_setup and put them in stated where we can make a
      wiser choice. So os_load and os_setup now use this new stuff and don't
      have to worry about timing out on nodes and rebooting. Stated makes sure
      that they either come up, get retried, or fail to boot. tbrestart also
      underwent a similar change.
  4. 25 Sep, 2003 1 commit
  5. 17 Sep, 2003 1 commit
  6. 23 Jul, 2003 1 commit
  7. 14 May, 2003 1 commit
  8. 26 Mar, 2003 1 commit
    • Leigh B. Stoller authored · 4c56daf6
      Add "gid" slot to the images table for changing permission scheme from
      only pid, to pid/gid like most other things in the testbed. Also add a
      "global" slot to denote images that are globally available to all
      projects (system images). The older "shared" attribute is now used to
      denote images that are shared within a project (available to all
      subgroups in the project). The migration path for existing DBs is
      given in the migrate file. Be sure to run those commands on an
      existing testbed or things will break!
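
One way to read the three visibility levels is sketched below. This is an illustration of the rule as stated in this message, not the logic of TBImageIDAccessCheck(); the `Image` type, the field names, and the membership representation are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    pid: str                 # owning project
    gid: str                 # owning group within the project
    shared: bool = False     # visible to every subgroup of the project
    is_global: bool = False  # visible to every project (system images)


def can_use_image(image, memberships):
    """memberships is a set of (pid, gid) pairs the user belongs to."""
    if image.is_global:
        return True
    if image.shared:
        return any(pid == image.pid for pid, _gid in memberships)
    return (image.pid, image.gid) in memberships


user = {("testbed", "testbed"), ("netlab", "wireless")}
private = Image("netlab", "routing")
shared = Image("netlab", "routing", shared=True)
system = Image("emulab-ops", "emulab-ops", is_global=True)
```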
      
      www/newimageid, www/newimageid_ez: A bunch of changes for
      shared/global attributes. Added a group menu to the form so users can
      create images in subgroups. Beefed up the Java code that constructs
      the path name to use the gid, shared, and global attributes of the
      form to give the user the best possible path that we can. Improved the
      pathname checking code so that we do not allow just any old path in
      case the user elects to disregard the path we carefully constructed
      for them. Also check the proj/group membership, and set up defaults for
      users that have permission in just one pid/gid to create images.
      
      libdb.in: Changed permission check in TBImageIDAccessCheck() to
      reflect shared/global attribute changes.
      
      os_load: Get rid of test that checked path of the image. The path
      checking is done in the web interface anyway, so why duplicate it in 4
      places? Other minor changes reflecting shared->global name change.
      Also note that images can come from the group directory now.
      
      create_image: Get rid of test that checked path of the image. The path
      checking is done in the web interface anyway, so why duplicate it in 4
      places? Also note that images can come from the group directory now.
      
      www/dbdefs: Changed permission check in TBImageIDAccessCheck() to
      reflect shared/global attribute changes.
      
      www/showimageid_list, www/showstuff: Minor global/shared attribute
      changes.
      
      www/menu: Change osids/imageids pointer to point to the image list,
      not the osid list. This is more reasonable for mere users who have
      access to the EZ form, and thus never really need to concern
      themselves with osids.
      
      www/editimageid: Add proper pathname checking. There were no checks at
      all before!
  9. 25 Mar, 2003 1 commit
  10. 19 Feb, 2003 1 commit
  11. 29 Jan, 2003 1 commit
  12. 13 Jan, 2003 1 commit
  13. 07 Jan, 2003 1 commit
    • Leigh B. Stoller authored · f7b3e7b7
      Remove hardwired 15 minute wait, and replace with a hardwired
      calculation based on the size of the image file. Okay, to save all
      you folks from going to see what bit of dreck I came up with, here it
      is:
      
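          use File::stat;    # object-style stat(), needed for $sb->size below
          my $sb     = stat($imagepath);
          my $chunks = $sb->size / (1024 * 1024);    # image size in MB
          # 25 per 100 MB of image, plus a fixed 4 * 60
          $maxwait   = int((($chunks / 100.0) * 25) + (4 * 60));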
      
      Note the replacement of one hardwired number (15) with several dozen
      new ones!
      
      I like it anyway, cause I hate waiting 2*15 minutes when a 60 second
      load fails.
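
To see what the formula yields, here is the same arithmetic transcribed into Python. The function name is made up, and reading the result as seconds is an assumption based on the `4 * 60` term; the message itself does not state the unit.

```python
def max_wait(image_bytes):
    """Same arithmetic as the Perl above; units assumed to be seconds."""
    chunks = image_bytes / (1024 * 1024)          # image size in MB
    return int(((chunks / 100.0) * 25) + (4 * 60))


MB = 1024 * 1024
empty = max_wait(0)
small = max_wait(400 * MB)
large = max_wait(1000 * MB)
```

Under that assumption a 400 MB image gets 340 (about 5.7 minutes) and a 1000 MB image gets 490 (about 8 minutes), against the old fixed 15 minutes.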
  14. 11 Dec, 2002 1 commit
  15. 12 Nov, 2002 1 commit
  16. 01 Nov, 2002 1 commit
  17. 18 Oct, 2002 1 commit
    • Mac Newbold authored · 5c961517
      Merge the newstated branch with the main tree.
      Changes to watch out for:
      
      - db calls that change boot info in nodes table are now calls to os_select
      
      - whenever you want to change a node's pxe boot info, or def or next boot
      osids or paths, use os_select.
      
      - when you need to wait for a node to reach some point in the boot process
      (like ISUP), check the state in the database using the lib calls
      
      - Proxydhcp now sends a BOOTING state for each node that it talks to.
      
      - OSs that don't send ISUP will have one generated for them by stated
      either when they ping (if they support ping) or immediately after they get
      to BOOTING.
      
      - States now have timeouts. Actions aren't currently carried out, but they
      will be soon. If you notice problems here, let me know... we're still
      tuning it. (Before, all timeouts were set to "none" in the db.)
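
The ISUP rule in the list above can be written as a small decision function. This is a sketch of the rule as described here, not stated's code; the function and its parameters are hypothetical.

```python
def generated_state(sends_isup, supports_ping, pinged=False):
    """Decide what a stated-like daemon generates for a node in BOOTING.

    Returns "ISUP" if one should be generated on the node's behalf now,
    or None if nothing should be generated yet (or ever).
    """
    if sends_isup:
        return None       # the OS reports ISUP itself
    if not supports_ping:
        return "ISUP"     # generated immediately after BOOTING
    # Pingable OS: generate ISUP only once the node answers a ping.
    return "ISUP" if pinged else None
```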
      
      One temporary change:
      
      - While I make our new free node manager daemon (freed), all nodes are
      forced into reloading when they're nfreed and the calls to reset the os
      are disabled (that will move into freed).
  18. 04 Oct, 2002 1 commit
  19. 17 Sep, 2002 1 commit
  20. 07 Jul, 2002 1 commit
  21. 24 Jun, 2002 1 commit
  22. 19 Jun, 2002 1 commit
  23. 06 Jun, 2002 1 commit
  24. 22 Apr, 2002 1 commit
  25. 28 Mar, 2002 1 commit
  26. 12 Feb, 2002 1 commit
  27. 08 Feb, 2002 1 commit
    • Leigh B. Stoller authored · a73e627e
      Big round of image/osid changes. This is the first cut (final cut?) at
      supporting autocreating and autoloading images. The imageid form now
      sports a field to specify a nodeid to create the image from; if set,
      the backend create_image script is invoked. That's the easy part.
      Slightly harder is autoloading images based on the osid specified in
      the NS file. To support this, I have added a new DB table called
      osidtoimageid, which holds the mapping from osid/pctype to imageid.
      When users create images, they must specify what node types that image
      is good for. Obviously, the mappings have to be unique or it would be
      impossible to figure it out! Anyway, once that image mapping is
      in place and the image created, the user can specify that ID in the NS
      file. I've changed os_setup to look for IDs that are not loaded,
      and to try and find one in the osidtoimageid. If found, it invokes
      os_load. To keep things running in parallel as much as possible,
      os_setup issues all the loads/reboots (could be more than a single set
      of loads if multiple IDs are in the NS file) at once, and waits for
      all the children to exit. I've hacked up os_load a bit to try and be
      more robust in the face of PXE failures, which still happen and are
      rather troublesome. Need an event system!
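
The lookup and the grouping of loads might look roughly like this. It is a sketch under assumptions: an in-memory dict stands in for the osidtoimageid table, and every class, function, and sample ID below is invented for illustration.

```python
class OsidToImageid:
    """In-memory stand-in for the osidtoimageid table (hypothetical API)."""

    def __init__(self):
        self._rows = {}  # (osid, pctype) -> imageid

    def add(self, osid, pctype, imageid):
        key = (osid, pctype)
        # Mappings must be unique, otherwise the lookup is ambiguous.
        if self._rows.get(key, imageid) != imageid:
            raise ValueError(f"{key} already maps to {self._rows[key]}")
        self._rows[key] = imageid

    def lookup(self, osid, pctype):
        return self._rows.get((osid, pctype))


def plan_loads(table, nodes, loaded):
    """Group nodes that need a load by the image to load on them.

    nodes:  list of (node_id, pctype, wanted_osid)
    loaded: dict of node_id -> osid currently on the node's disk
    Returns (loads, missing): imageid -> [node_id], and nodes with no mapping.
    """
    loads, missing = {}, []
    for node_id, pctype, osid in nodes:
        if loaded.get(node_id) == osid:
            continue  # already has the requested OS, nothing to do
        imageid = table.lookup(osid, pctype)
        if imageid is None:
            missing.append(node_id)
        else:
            loads.setdefault(imageid, []).append(node_id)
    return loads, missing


table = OsidToImageid()
table.add("FBSD-STD", "pc600", "FBSD-600")
table.add("FBSD-STD", "pc850", "FBSD-850")
nodes = [("pc1", "pc600", "FBSD-STD"),
         ("pc2", "pc850", "FBSD-STD"),
         ("pc3", "pc600", "FBSD-STD")]
loads, missing = plan_loads(table, nodes, loaded={"pc3": "FBSD-STD"})
```

Each key of `loads` corresponds to one set of loads that could be issued as a separate child, which is how several image IDs can proceed in parallel.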
      
      Contained in this revision are unrelated changes to make the OS and
      Image IDs per-project unique instead of globally unique, since that's a
      pain for the users. This turns out to be very messy, since underneath
      we do not want to pass around pid/ID in all the various places it's
      used. Rather, I create a globally unique name and extend the OS and
      Image tables to include pid/name/ID. The user selects pid/name, and I
      create the globally unique ID. For the most part this is invisible
      throughout the system, except where we interface with the user, say in
      the web pages; the user should see his chosen name where possible, and
      should invoke scripts (os_load, create_image, etc) using his/her
      name not the internal ID. Also, in the front end the NS file should
      use the user name not the ID. All in all, this accounted for a number
      of annoying changes and some special cases that are unavoidable.
  28. 30 Jan, 2002 1 commit
  29. 17 Jan, 2002 1 commit
  30. 14 Jan, 2002 1 commit
    • Leigh B. Stoller authored · d08b5e41
      Make Frisbee.Redux live:
      * Add appropriate goo to os/GNUMakefile so that Frisbee daemon is
        built and installed.
      
      * Rework the frisbee launcher slightly. Aside from little changes
        (send email to tbops when frisbeed dies, new cmdline syntax to
        frisbeed), allow for frisbeed to exit gracefully after a period of
        inactivity (no client requests for 30 minutes, at present). In order
        to prevent a race condition with a new client being added (and
        rebooted) and frisbeed terminating before the client gets started,
        add a load_busy indicator to the images table (next to load_address
        slot) and set that to one each time frisbeelauncher is invoked.
        When frisbeed exits, test and clear that bit atomically (lock
        tables) and go around another time (restart frisbeed for another 30
        minute period).
      
      * Rework waitmode in os_load. Wait for all of the nodes to finish at
        once, and track which nodes never finish. Retry those nodes again by
        rebooting. The number of retries is configurable in the script, and
        is currently set to one. This should take care of some PXE boot
        related problems, although obviously not all.
      
      * Got rid of -w option to os_load and made waitmode the default. The
        -s option can be used to start a reload, but not to wait for it to
        complete.
      
      * Minor changes to sched_reload and reload_daemon; pass in -s option
        to os_load.
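
The load_busy handshake from the second item above can be sketched as a test-and-clear loop. This is an interpretation, not the launcher's code: a `threading.Lock` stands in for the table lock, and the class and function names are hypothetical.

```python
import threading


class ImageRow:
    """Stand-in for one images-table row; the lock plays the role of LOCK TABLES."""

    def __init__(self):
        self._lock = threading.Lock()
        self.load_busy = 0

    def mark_busy(self):
        # What each launcher invocation does.
        with self._lock:
            self.load_busy = 1

    def test_and_clear(self):
        # Read and reset in one critical section so a concurrent
        # mark_busy() is either seen now or survives for the next check.
        with self._lock:
            was_busy = bool(self.load_busy)
            self.load_busy = 0
            return was_busy


def serve(row, run_until_idle):
    """Run idle-timeout periods until one passes with no new launch."""
    periods = 0
    while True:
        run_until_idle()          # the daemon exits after its idle timeout
        periods += 1
        if not row.test_and_clear():
            return periods        # nobody asked in the meantime: really stop


row = ImageRow()
calls = []


def fake_daemon():
    calls.append(1)
    if len(calls) == 2:
        row.mark_busy()           # a new client arrives during period 2


row.mark_busy()                   # the initial launch
periods = serve(row, fake_daemon)
```

In this run the daemon is restarted twice: once because the initial launch set the bit, and once because a client arrived during the second period.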
  31. 06 Nov, 2001 1 commit
  32. 05 Nov, 2001 1 commit
  33. 22 Oct, 2001 1 commit
    • Leigh B. Stoller authored · 6adf504b
      Add -e pid,eid option to sched_reload to make it easier to schedule
      reloads for nodes in an experiment.
      Change os_load to schedule a default image reload whenever a mere user
      loads an image that is not the default image for that node type.
      Add some support stuff in libdb (TBSetSchedReload) and some constant
      definitions for sched_reload and for nodelog.
  34. 16 Oct, 2001 1 commit
  35. 28 Sep, 2001 1 commit
    • Leigh B. Stoller authored · f870a7e9
      Interface change:
          Usage: os_load [-s | -w] [-r] [-i <imageid>] <node> [node ...]
          Usage: sched_reload [-f | -p] [-r] [-i <imageid>] <node> [node ...]
      
      The imageid is now an optional argument. After continually forgetting
      what imageid to use, or just plain forgetting the argument, and having
      it try to load imageid pc53 on pcXX, I decided this interface was
      bogus. With no imageid, select the default imageid for each node
      provided. This is actually convenient since you can load multiple
      types of nodes in one shot.
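
A small sketch of that defaulting rule. The function, the lookup tables, and the sample image names are hypothetical; only the behaviour (an explicit imageid wins, otherwise each node gets the default for its type) comes from the message.

```python
def resolve_images(nodes, node_types, default_images, imageid=None):
    """Pick the image to load on each node.

    node_types:     node_id -> node type (e.g. "pc600")
    default_images: node type -> default imageid for that type
    imageid:        explicit -i argument, or None if it was omitted
    """
    if imageid is not None:
        return {node: imageid for node in nodes}
    return {node: default_images[node_types[node]] for node in nodes}


node_types = {"pc1": "pc600", "pc53": "pc850"}
defaults = {"pc600": "IMG-600", "pc850": "IMG-850"}
mixed = resolve_images(["pc1", "pc53"], node_types, defaults)
forced = resolve_images(["pc1", "pc53"], node_types, defaults, imageid="IMG-TEST")
```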
  36. 18 Sep, 2001 1 commit
  37. 17 Sep, 2001 1 commit
    • Robert Ricci authored · 455dcee9
      Added support for the new current_reloads table - this table is intended
      to contain a list of reloads currently in progress. It is filled by
      os_load, and is cleared out by the tmcd 'reset' command or by nfree.
      The tmcd 'loadaddr' command now uses this table instead of the reloads table.
      
      Also added Frisbee support to sched_reload, and changed the Frisbee command
      line option to os_load to '-r' to avoid a conflict with sched_reload's '-f'
      option.
  38. 04 Sep, 2001 1 commit
  39. 24 Aug, 2001 2 commits