1. 22 Mar, 2005 1 commit
  2. 20 Mar, 2005 1 commit
  3. 18 Mar, 2005 1 commit
  4. 17 Mar, 2005 1 commit
    • Partial support for disk-zeroing on experiment termination. · 60e7adb8
      Mike Hibler authored
      I did the "back half" support.  If the 'mustwipe' field is non-zero
      in the reserved table entry for a node then its disk must be zeroed.
      How the zeroing is done depends on the value of the mustwipe field.
      Right now, '1' means pass the '-z' option to frisbee to have it zero
      all non-allocated blocks.  The value '2' is reserved for enabling a
      "full wipe" pass of the disk before running frisbee, which Keith Sklower
      (DETER) wanted to be able to do.  Note that 1 and 2 are effectively the
      same, if we are loading a full-disk image; i.e. all non-allocated blocks
      from the new image are zeroed.  But if the disk were being loaded with
      a single-partition image, then "frisbee -z" would only wipe unused
      blocks in that partition.
      
      The reload_daemon has been modified to extract the mustwipe info and
      invoke os_load accordingly.   os_load now takes a "-z <type>" option
      to enable the zeroing by setting a value in the current_reloads table.
      tmcd will read and return that info to its caller in the "loadinfo" command.
      Finally, the rc.frisbee script that runs in the frisbee MFS extracts the
      loadinfo info and crafts the frisbee startup command.
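
A minimal sketch of that last step, assuming the loadinfo reply is a single line of KEY=VALUE fields and that the wipe type arrives in a field called ZFILL. Both the reply format and the field name are assumptions made for illustration. Only the '-z' frisbee flag and the meaning of the values 1 and 2 come from the description above.

```python
def parse_loadinfo(reply):
    """Parse a 'KEY=VALUE KEY=VALUE' reply into a dict (assumed format)."""
    return dict(field.split("=", 1) for field in reply.split())


def frisbee_options(loadinfo):
    """Return (needs_full_wipe, extra_frisbee_args) for a parsed loadinfo."""
    wipe_type = int(loadinfo.get("ZFILL", "0"))  # hypothetical field name
    if wipe_type == 0:
        return False, []          # no zeroing requested
    if wipe_type == 1:
        return False, ["-z"]      # frisbee zeroes all non-allocated blocks
    if wipe_type == 2:
        # Reserved: a full wipe pass before frisbee runs. Whether '-z' is
        # also passed in this case is not specified, so it is left out here.
        return True, []
    raise ValueError(f"unknown wipe type: {wipe_type}")
```

The caller would run its own full-wipe step when the first element is true, then append the returned arguments to whatever frisbee command line it already builds.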
      
      What still needs to be done is the "front end," how the user specifies
      the value and how it winds up in the DB reserved table.  This will probably
      involve addition of state to the experiments table as this will likely be
      a per-experiment setting.
  5. 11 Mar, 2005 1 commit
  6. 08 Mar, 2005 1 commit
  7. 07 Mar, 2005 1 commit
  8. 01 Mar, 2005 1 commit
  9. 22 Feb, 2005 2 commits
    • Add loc_z to the location info table, and display that on both the
      static robot map and in the robot tracker applet. · 67ff3af6
      Leigh B. Stoller authored
    • Okay, first attempt to deal with os_setup waittimes on a per node_type
      and per OSID basis. · facc7acd
      Leigh B. Stoller authored
      
      * Added bios_waittime to node_types table and reboot_waittime to
        os_info table. Initialized them as follows:
      
              update node_types set bios_waittime=60 where class='pc';
              update os_info set reboot_waittime=150 where OS='Linux' or
                  OS='FreeBSD' or OS='NetBSD';
              update os_info set reboot_waittime=180 where OS='Windows';
      
      * The bios waittime can be edited via the web interface.
      
      * The reboot waittime can be set only by admin people right now; this
        is another case of something that maybe the user should not see
        because it is too much stuff. Instead, default values are established
        in www/osiddefs.php3.
      
      * os_setup computes its per-node waittime as:

              (bios_waittime + reboot_waittime) * 2

        as per Mike's suggestion. If either value is not defined in the DB,
        it defaults to the original 7 minute value.
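
The waittime rule above can be sketched as follows, assuming the values are in seconds and that an undefined DB value shows up as None (both are assumptions; the function name is made up for illustration).

```python
DEFAULT_WAITTIME = 7 * 60  # the original fixed 7 minute value, in seconds


def os_setup_waittime(bios_waittime, reboot_waittime):
    """Per-node wait: (bios + reboot) * 2, or the old default if unset."""
    if bios_waittime is None or reboot_waittime is None:
        return DEFAULT_WAITTIME
    return (bios_waittime + reboot_waittime) * 2
```

With the initial values from the update statements, a pc running Linux gets (60 + 150) * 2 = 420 seconds, which matches the old 7 minute value, and a pc running Windows gets (60 + 180) * 2 = 480 seconds.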
  10. 15 Feb, 2005 1 commit
  11. 14 Feb, 2005 1 commit
  12. 08 Feb, 2005 2 commits
  13. 02 Feb, 2005 1 commit
    • Add tb-set-delay-capacity NS directive: · 21044084
      Leigh B. Stoller authored
      	tb-set-delay-capacity 1
      
      Will override the default delay capacity as specified in the node_types
      table for each node type, and set it for all types when generating the
      ptop file.
      
      This is a big stick, to be used with caution, as it will effectively
      double the number of nodes used for delay nodes (within an experiment).
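
A rough illustration of why this is a big stick, assuming that the delay capacity is the number of shaped links one delay node can handle and that the usual default is 2 (the default is inferred from the "double" remark above, not stated in this commit). The function name is hypothetical.

```python
import math


def delay_nodes_needed(shaped_links, delay_capacity):
    """Number of delay nodes required for a given count of shaped links."""
    if delay_capacity < 1:
        raise ValueError("delay capacity must be at least 1")
    return math.ceil(shaped_links / delay_capacity)
```

Under those assumptions, an experiment with 6 shaped links needs 3 delay nodes at capacity 2, and 6 delay nodes after "tb-set-delay-capacity 1".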
  14. 28 Jan, 2005 4 commits
  15. 24 Jan, 2005 1 commit
    • Bottom line on this commit: Do not update the nodetypeXpid_permissions
      table by hand anymore! Update the group_policies table and then run
      the script to update the permissions table (sbin/update_permissions). · 775ca147
      Leigh B. Stoller authored
      
      Details:
      
      My original thought when I started this was that I would be able to
      replace the existing nodetypeXpid_permissions table with this new
      stuff. Well, it turns out that this was not a good thing to do, for a
      couple of reasons:
      
        * Engineering: We access the nodetypeXpid_permissions table from three
          different languages, and there was no way I wanted to rewrite this
          library in python and php!
      
        * Performance: We access the nodetypeXpid_permissions table from the web
          interface, on every single page load. In fact, we access it twice if
          you count the FreePCs() count that we put at the top of the menu.
          Going through this library on each page load would be a serious drag.
      
      So, rather than actually get rid of the nodetypeXpid_permissions table, I
      decided to keep it as a "cache" of permissions stored in the group
      policies table. Each time you update the policy tables, we need to run
      the update_permissions script which will call into this library (see the
      TBUpdateNodeTypeXpidPermissions() routine) to reconstruct the permissions
      table. I have whacked the grantnodetype script to do exactly that.
      
      Note that we could probably do the same thing for users by creating an
      equivalent nodetypeXuid_permissions table, mapping users to types they
      are allowed to use. That would be a lot of rows, but the amount of data in
      the table is small. That would give us very fine grained control of what
      we show people in the web interface. Not sure it is worth it though.
      
      I also added some instructions to previous commit in database-migrate.txt
      on populating the new group_policies table from the existing
      permissions table.
  16. 22 Jan, 2005 1 commit
  17. 18 Jan, 2005 1 commit
    • Here is a checkpoint of the admission control stuff I have been working on. · 54f55585
      Leigh B. Stoller authored
      The last part is the stuff to hook it in from assign_wrapper, and some
      additional support in assign that Rob is adding for me. This comment is
      from the top of new file db/libadminctrl.pm.in and describes everything in
      detail.
      
      # Admission control policies. These are the ones I could think of, although
      # not all of these are implemented.
      #
      #  * Number of experiments per type/class (only one expt using robots).
      #
      #  * Number of experiments per project
      #  * Number of experiments per subgroup
      #  * Number of experiments per user
      #
      #  * Number of nodes per project      (nodes really means pc testnodes)
      #  * Number of nodes per subgroup
      #  * Number of nodes per user
      #
      #  * Number of nodes of a class per project
      #  * Number of nodes of a class per group
      #  * Number of nodes of a class per user
      #
      #  * Number of nodes of a type per project
      #  * Number of nodes of a type per group
      #  * Number of nodes of a type per user
      #
      #  * Number of nodes with attribute(s) per project
      #  * Number of nodes with attribute(s) per group
      #  * Number of nodes with attribute(s) per user
      #
      # So we have group (pid/gid) policies and user policies. These are stored
      # into two different tables, group_policies and user_policies, indexed in
      # the obvious manner. Each row of the table defines a count (experiments,
      # nodes, etc) and a type of thing being counted (experiments, nodes, types,
      # classes, etc). When we test for admission, we look for each matching row
      # and test each condition. All conditions must pass. No conditions means a
      # pass. There is also some "auxdata" which holds extra information needed
      # for the policy (say, the type of node being restricted).
      #
      #      uid:     a uid
      #   policy:     'experiments', 'nodes', 'type', 'class', 'attribute'
      #    count:     a number
      #  auxdata:     a string (optional)
      #
      # Example: A user policy of ('mike', 'nodes', 10) says that poor mike is
      # not allowed to have more than 10 nodes at a time, while ('mike', 'type',
      # 10, 'pc850') says that mike cannot allocate more than 10 pc850s.
      #
      # The group_policies table:
      #
      #      pid:     a pid
      #      gid:     a gid
      #   policy:     'experiments', 'nodes', 'type', 'class', 'attribute'
      #    count:     a number
      #  auxdata:     a string (optional)
      #
      # Example: A project policy of ('testbed', 'testbed', 'experiments', 10)
      # says that the testbed project may not have more than 10 experiments
      # swapped in at a time, while ('testbed', 'TG1', 'nodes', 10) says that the
      # TG1 subgroup of the testbed project may not use more than 10 nodes at
      # a time.
      #
      # In addition to group and user policies (which are policies that apply to
      # specific users/projects/subgroups), we also need policies that apply to
      # all users/projects/subgroups (ie: do not want to specify a particular
      # restriction for every user!). To indicate such a policy, we use a special
      # tag in the tables (for the user or pid/gid):
      #
      #      '+'  -  The policy applies to all users (or project/groups).
      #
      # Example: ('+','experiments',10) says that no user may have more than 10
      # experiments swapped in at a time. The rule overrides anything more
      # specific (say a particular user is restricted to 20 experiments; the above
      # rule overrides that and the user (all users) is restricted to 10).
      #
      # Sometimes, you want one of these special rules to apply to everyone, but
      # *allow* it to be overridden by a more specific rule. For that we use:
      #
      #      '-'  -  The policy applies to all users (or project/groups),
      #              but can be overridden by a more specific rule.
      #
      # Example: The rules:
      #
      #       ('-', 'type', 0, 'garcia')
      #       ('testbed', 'testbed', 'type', 10, 'garcia')
      #
      # say that no one is allowed to allocate garcias, unless there is a specific
      # rule that allows it; in this case the testbed project can allocate them.
      #
      # There are other global policies we would like to enforce. For example,
      # "only one experiment can be using the robot testbed." Encoding this kind
      # of policy is harder, and leads down a path that can get arbitrarily
      # complex. That path leads to ruination, and so we want to avoid it at
      # all costs.
      #
      # Instead we define a simple global policies table that applies to all
      # experiments currently active on the testbed:
      #
      #   policy:     'nodes', 'type', 'class', 'attribute'
      #     test:     'max', others I cannot think of right now ...
      #    count:     a number
      #  auxdata:     a string
      #
      # Example: A global policy of ('nodes', 'max', 10, '') says that the maximum
      # number of nodes that may be allocated across the testbed is 10. That's not
      # a very realistic policy of course, but ('type', 'max', 1, 'garcia') says
      # that a max of one garcia can be allocated across the testbed, which
      # effectively means only one experiment will be able to use them at once.
      # This is of course very weak, but I want to step back and give it some
      # more thought before I redo this part.
      #
      # Is that clear? Hope so, cause it gets more complicated. Some admission
      # control tests can be done early in the swap phase, before we really do
      # anything (before assign_wrapper). Other tests (type and class) cannot
      # be done here; only assign can figure out how an experiment is going to map
      # to physical nodes (remember virtual types too), and in that case we need
      # to tell assign what the "constraints" are and let it figure out what is
      # possible.
      #
      # So, in addition to the simple checks we can do, we also generate an array
      # to return to assign_wrapper with the maximum counts of each node type and
      # class that is limited by the policies. assign_wrapper will dump those
      # values into the ptop file so that assign can enforce those maximum values
      # regardless of what hardware is actually available to use. As per discussion
      # with Rob, that will look like:
      #
      #	set-type-limit <type> <limit>
      #
      # and assign will spit out a new type of violation that assign_wrapper will
      # parse.
      #
      # NOTES:
      #
      #  1) Admission control is skipped in admin mode; returns okay.
      #  2) Admission control is skipped when the pid is emulab-ops; returns okay.
      #  3) When calculating current usage, nodes reserved to emulab-ops are
      #     ignored.
      #  4) The sitevar "swap/use_admission_control" controls the use of admission
      #     control; defaults to 1 (on).
      #  5) The current policies can be viewed in the web interface. See
      #     https://www.emulab.net/showpolicies.php3
      #  6) The global policy stuff is weak. I plan to step back and think about it
      #     some more before redoing it, but it will tide us over for now.
      #
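
The '+' and '-' rules described in the comment above can be sketched as follows. This is a simplified illustration over a single user_policies-style table; the Rule tuple and both function names are made up here, and the real library also consults group_policies and the global policies table. It assumes a rule matches on exact policy and auxdata, and that the value being tested is current usage plus the new request.

```python
from collections import namedtuple

# One row of a user_policies-style table; auxdata is optional.
Rule = namedtuple("Rule", ["uid", "policy", "count", "auxdata"], defaults=[""])


def applicable_rules(rules, uid, policy, auxdata=""):
    """Select the rows that apply to uid for one policy/auxdata pair."""
    same_kind = [r for r in rules if r.policy == policy and r.auxdata == auxdata]
    specific = [r for r in same_kind if r.uid == uid]
    forced = [r for r in same_kind if r.uid == "+"]    # always applies
    fallback = [r for r in same_kind if r.uid == "-"]  # only if nothing specific
    return forced + (specific if specific else fallback)


def admit(rules, uid, policy, wanted_total, auxdata=""):
    """All applicable conditions must pass; no conditions means a pass."""
    return all(wanted_total <= r.count
               for r in applicable_rules(rules, uid, policy, auxdata))
```

Because every applicable row must pass, a '+' limit of 10 effectively caps a user whose own limit is 20, while a '-' limit of 0 is ignored for anyone who has a more specific rule for the same type.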
  18. 15 Jan, 2005 1 commit
  19. 13 Jan, 2005 3 commits
  20. 12 Jan, 2005 2 commits
  21. 11 Jan, 2005 3 commits
  22. 10 Jan, 2005 1 commit
    • A quick hack job to get the webcams onto the web interface. · d46902e1
      Leigh B. Stoller authored
      * Add new DB table "webcams" which holds the id of the webcam, the
        server it is attached to, and the last update time.
      
      * Add new sitevars webcam/anyone_can_view and webcam/admins_can_view.
        Should be obvious what they mean.
      
      * Add trivial script grabwebcams (invoked from cron) to grab the images
        from the servers and stash in /usr/testbed/webcams. The images are
        grabbed with scp, protected by a 5 second timeout. Fine for a couple
        of cameras.
      
      * Add web page stuff to display webcams, linked from the robot map page.
      
      Permission to view the webcams is currently admin, or in a project that is
      allowed to use a robot. We can tighten this up later as needed.
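
A minimal sketch of the grab step, assuming one scp invocation per camera with a hard 5 second limit. The function names, the server name, and the remote path are hypothetical; only scp, the timeout, and the /usr/testbed/webcams destination come from the description above. The runner is injectable so the logic can be exercised without a real server.

```python
import subprocess


def scp_command(server, remote_path, local_path):
    """Build the scp command; -B avoids password prompts, -q keeps it quiet."""
    return ["scp", "-B", "-q", f"{server}:{remote_path}", local_path]


def grab_image(server, remote_path, local_path, timeout=5, run=subprocess.run):
    """Fetch one image; return False on timeout or a non-zero exit status."""
    try:
        result = run(scp_command(server, remote_path, local_path),
                     timeout=timeout)
    except subprocess.TimeoutExpired:
        return False  # a slow or hung server must not stall the cron job
    return result.returncode == 0
```

A cron-driven script could call, for example, grab_image("camserver", "/tmp/cam1.jpg", "/usr/testbed/webcams/cam1.jpg") for each row of the webcams table. Doing this serially is fine for a couple of cameras, as the commit notes.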
  23. 07 Jan, 2005 1 commit
  24. 06 Jan, 2005 1 commit
    • A bunch of boot changes. Read carefully. · 94ccc3f4
      Leigh B. Stoller authored
      * Add boot_errno to the nodes table so that nodes can report in a
        subcode to indicate what went wrong. At present, we do not report any
        real error codes; that is going to take some time to work out since it
        will require a bunch of changes to the boot scripts.
      
      * Add new table node_bootlogs to store logs provided by the nodes. Not
        a full console log, but a log of the tmcd client side part. We can
        make it a full log if we want though; just means mucking about with
        the boot phase a bit.
      
      * Add new state transition to NORMALv2 and PCVM state machines. "TBFAILED"
        is a new state that is sent (after TBSETUP) if a node fails somewhere in
        the tmcd client side.
      
      * Change TBNodeStateWait() to take a list of states (instead of single
        state) and an optional pass by reference parameter to return the actual
        state that the node landed in. Change all calls to TBNodeStateWait() of
        course.
      
      * Change os_setup (and libreboot in wait mode) to look for both TBFAILED
        and ISUP. If a TBFAILED event is seen, we can terminate the wait early
        and not retry os_setup on physical nodes (although still retry virtual
        nodes). The nice thing about this is that the wait should terminate much
        earlier (rather than waiting for timeout), especially for virtual nodes
        which can take a really long time when there are a couple of hundred.
      
      * Add new routines dobooterrno() and dobootlog() to tmcd. Bump version
        number and increase the buffer size to allow for the larger packets that
        a console log will generate (added MAXTMCDPACKET variable, set to 0x4000).
      
      * Add new -f option to tmcc to specify a datafile to send along as the last
        argument to tmcd. This is more pleasing than trying to send a console log
        in on the command line. For example: "tmcc -f /tmp/log BOOTLOG" will send
        a BOOTLOG command along with the contents of /tmp/log.
      
        Also close the write side of the pipe so that the server sees EOF on
        read. See aside comment below.
      
      * Changes to rc.bootsetup:
           1. Use perl tricks to capture all output, duping to the console and to
              a log file in /var/emulab/logs.
           2. On any error, send a status code (boot_errno) and the bootlog to
              tmcd.
           3. Generate a TBFAILED state transition.
      
      * Changes to rc.injail:
           1. Same as rc.bootsetup, but do not send log files; that would pummel
              boss. Leave them on the physical node.
      
      * Change vnodesetup (which calls mkjail) to watch for any error and send a
        TBFAILED state transition. This should catch almost all errors, and
        dramatically reduce waiting when something fails.
      
      * Changes to rc.cdboot are essentially the same as rc.bootsetup, although a
        bootlog is sent all the time (success or failure), and I do not generate
        a boot_errno yet. Also, instead of TBFAILED, generate a PXEFAILED state
        since the CDROM is actually operating within the PXEFBSD opmode. I have
        yet to work this into the rest of the system though; waiting to get a new
        CD built and actually experiment with it.
      
      * Add new menu option and web page to display the node bootlog. We store
        only the latest bootlog, but maybe someday store more than one. Display
        boot_errno on node page.
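
The TBNodeStateWait() change in the list above can be sketched like this, assuming a simple polling loop over a caller-supplied state lookup. The function name, the polling approach, and the tuple return (in place of the pass by reference parameter) are choices made for this illustration, not the actual implementation.

```python
import time


def node_state_wait(get_state, states, timeout, interval=1.0):
    """Poll get_state() until it returns one of `states` or time runs out.

    Returns (True, state) when a wanted state is reached, otherwise
    (False, last_state_seen).
    """
    deadline = time.monotonic() + timeout
    while True:
        state = get_state()
        if state in states:
            return True, state
        if time.monotonic() >= deadline:
            return False, state
        time.sleep(interval)
```

A caller in the style of os_setup would wait on both ISUP and TBFAILED, then check which one came back: TBFAILED means the wait can end early and a physical node need not be retried.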
      
      Aside: I made a big mistake in the tmcd protocol; I did not envision
      passing more than a small amount of data (one fragment) and so I do not
      include a record terminator (ie: close of the write side on the client
      sends EOF) or a size field at the beginning. No big deal since small
      requests are sent in one fragment and the server sees the entire
      thing. Well, with a large console log, that will end up as multiple
      fragments, and the server will often not get the entire thing on the first
      read, and there are no subsequent reads (with no EOF or known size, it
      would block forever). Well, fixing this in a backwards compatible manner
      (for old images) was way too much pain. Instead, tmcc now closes the write
      side, and the server does subsequent reads *only* in the new dobootlog()
      routine. Note that it *is* possible to fix this in a backwards compatible
      manner, but I did not want to go down that path just yet.
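
The fix described in the aside, where the client closes its write side and the server reads until EOF, can be illustrated with a local socket pair. This is only a sketch of the framing technique; the function names are made up and nothing here is tmcd or tmcc code.

```python
import socket
import threading


def send_request(sock, payload):
    """Client side: send everything, then close the write side to signal EOF."""
    sock.sendall(payload)
    sock.shutdown(socket.SHUT_WR)


def read_request(sock, bufsize=4096):
    """Server side: keep reading until the peer's EOF (recv returns b'')."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo(payload):
    """Push a payload through a connected socket pair and return what arrived."""
    client, server = socket.socketpair()
    sender = threading.Thread(target=send_request, args=(client, payload))
    sender.start()
    received = read_request(server)
    sender.join()
    client.close()
    server.close()
    return received
```

A single recv() call would return only the first fragment of a large payload. The loop terminates only because the sender shut down its write side, which is why old clients that never do so cannot be read this way.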
  25. 03 Jan, 2005 1 commit
  26. 21 Dec, 2004 2 commits
  27. 15 Dec, 2004 1 commit
  28. 14 Dec, 2004 1 commit
  29. 13 Dec, 2004 1 commit