1. 22 Jun, 2005 1 commit
    • Leigh B. Stoller's avatar
      Added my simplistic link tracing and monitoring. Example usage and · 7942119e
      Leigh B. Stoller authored
      some details can be found in the advanced tutorial that I wrote up.
      See this link:
      
      http://www.emulab.net/tutorial/docwrapper.php3?docname=advanced.html#Tracing
      
      The basic idea is that each virt_lan entry gets a couple of new slots
      describing the type of tracing that is desired.
      
        traced tinyint(1) default '0',
        trace_type enum('header','packet','monitor') NOT NULL default 'header',
        trace_expr tinytext,
        trace_snaplen int(11) NOT NULL default '0',
        trace_endnode tinyint(1) NOT NULL default '0',
      
      There is a new physical table called "traces" that is a little bit
      like the current delays table. A new tmcd command returns the trace
      configuration to the client nodes (tmcd/common/config/rc.trace).
      
      The delays table got a new boolean called "noshaping" that tells the
      delay node to bridge, but not set up any pipes. This allows us to
      capture traffic at the delay node, but without much less overhead on
      the packets.
      
      The pcapper got bloated up to do packet capture and more event stuff.
      I also had to add some mutex locking around calls into the pcap
      library and around malloc, since the current setup used linuxthreads,
      which is not compatable with the standard libc_r library. I was
      getting all kinds of memory corruption, and I am sure that if someone
      breathes on the pcapper again, it will break in some new way.
      7942119e
  2. 18 Jun, 2005 1 commit
  3. 13 Jun, 2005 1 commit
    • Timothy Stack's avatar
      · 5e43a771
      Timothy Stack authored
      Initial checkin of a "repositioning" daemon that moves robots back to
      their pens on swapout.
      
      	* configure, configure.in: Add tbsetup/repos_daemon.
      
      	* db/libdb.pm.in: Add constants for the
      	repositionpending/repositioning experiments.
      
      	* db/nfree.in: When freeing garcias, send them to
      	repositionpending instead of reloadpending.
      
      	* event/sched/event-sched.c: Deal with the rare case of no
      	SIMULATOR object being in the agent list for an experiment.
      
      	* robots/emc/emcd.c, robots/emc/locpiper.in: Fix some typos.
      
      	* robots/rmcd/masterController.h, robots/rmcd/masterController.c,
      	robots/rmcd/obstacles.h, robots/rmcd/obstacles.c: Ignore dynamic
      	obstacles that are far away and remove dynamic obstacles where the
      	robot is inside the natural obstacle area.
      
      	* sql/database-create.sql, sql/database-migrate.txt: Add a
      	reposition_status table that tracks the status of robots that are
      	being moved back to their pens.
      
      	* tbsetup/GNUmakefile.in: Install the repos_daemon script.
      
      	* tbsetup/reload_daemon.in: Move robots to the repositionpending
      	experiment, if they haven't already reached their pen.
      
      	* tbsetup/repos_daemon.in: Daemon that takes care of seeing robots
      	back to their pens after they are freed from an experiment.
      5e43a771
  4. 25 May, 2005 1 commit
    • Leigh B. Stoller's avatar
      Add a "nosetup" option to elabinelab experiments. In the experiments · eed85271
      Leigh B. Stoller authored
      table, if elabinelab_nosetup is non-zero, boss and ops setup will do
      just enough to get the nodes into a state that hopefully approximates
      what a real installation might look like before installing our stuff.
      I do install the packages cause there is no point in waiting for that
      to finish interactively.
      
      From this point, you can log into the console(s) and run the setup
      instructions verbatim, although I have not actually tried that yet.
      
      The nice thing is that if you manage to get things properly setup, it
      can function as a real elabinelab since the outer environment has been
      setup. This is quite a bit different then how we tested during the
      last release frenzy.
      
      Its not quite perfect of course, since the images are not "clean", but
      I think this is okay for testing.
      eed85271
  5. 20 May, 2005 1 commit
    • Leigh B. Stoller's avatar
      Checkpoint some robot changes. · 5f67fe09
      Leigh B. Stoller authored
      * New robot event listener:
      
          * It is intended to be started and stopped from the experiment
            swapin path instead of as a global daemon. It takes the pid/eid
            of the experiment, and will deal with events only for those
            nodes that are allocated to the experiment. We have some long
            range plans of time sharing the robot lab, so I figured we might
            as get a little bit of a start on that.
      
          * Once it fires up, it subscribes to the usual assortment of
            events, just like the loclistener does.
      
          * It then binds a socket on which to listen for connections from
            the web server.
      
          * Then it loops, looking for events and for connections from the
            web server. Connections from the web server are for forwarding
            the event stream in real time to whatever applets are currently
            viewing the robot lab.
      
          * As each event comes in, it is parsed, entered into the DB (nodes
            and location_info table), and fired out (in a textual form) to
            all the applet listeners. The web interface just acts as pipe in
            this case for the data.
      
          * The event stream is also duplicated to a file in the experiment
            directory (the same stuff that is piped to the applet), named by
            the current resource record ID. I hope to use this stream to
            playback the motion in the applet (coupled with webcam images
            once I figure out how to sync them).
      
      * tbswap: Start and stop the new listener.
      
      * Robotrack: I changed the interface for how we actually communicate
        the event data. Much more reasonable then it was (a comma separated
        string of numbers!).
      
      * new database fields in the experiments table to hold the process ID
        of the listener and the port it is listening on. The port is not
        used yet, as the robot lab is still not shared. Will need to revist
        this later. Currently uses a fixed port number.
      
      * www/robotrack/robopipe.php3: Killed most of the old code and replace
        with simple socket connect to the listener.
      5f67fe09
  6. 16 May, 2005 1 commit
    • Leigh B. Stoller's avatar
      Add support for specifying the CVS tag to use when getting the source code · f1863cfd
      Leigh B. Stoller authored
      for the inner elab.
      
      	tb-set-elabinelab-cvstag dist-foo
      
      Will result in this branch getting checked out from the mirrored repository
      (updated nightly) on boss and sent back to the node, instead of the usual
      source tarball that we keep in /usr/testbed/src (still the default
      behaviour if no tag is specified. You can also do this if you like:
      
      	tb-set-elabinelab-cvstag HEAD
      
      which of course is a special tag to CVS.
      f1863cfd
  7. 12 May, 2005 1 commit
    • Leigh B. Stoller's avatar
      Checkpoint the rest of my changes to support swapmod of both ElabInElab and · 6eff9de6
      Leigh B. Stoller authored
      Firewalled experiments (see tbsetup/elabinelab.in for the other stuff).
      
      * To support firewalled experiments, needed to add a new virt_firewalls
        table to split the existing firewalls table up, which included both
        virtual and physical stuff. There are the usual frontend changes and a
        few other things scattered around, including tmcd.c.
      
      * The firewall code in tbswap got some beefing up to support adding and
        deleting nodes from the its special control net vlan. Note that I have
        not made any progress on containment of deleted nodes, just as we do not
        do anything now for teardown (unless its paniced, in which case the
        experiment cannot be modified anyway).
      
      * ptopgen and assign_wrapper got some interesting modifications: Unlike
        regular swapmod, we cannot just tear down all the vlans since that would
        interrupt everything inside the inner elab. Instead, leave the vlans as
        is. The problem is that when assign runs, it can just as easily pick
        different interfaces on the same nodes, which would be a royal pain in
        the ass to deal with! So, ptopgen got a new option (-u) that assign
        wrapper uses to tell ptopgen that it should prune out unused interfaces
        from nodes that are already allocated to the experiment. This is, at
        best, as pathetically gross hack, but it makes sure that all the
        interfaces stay the same across swapmods.
      
      * The unrelated revision of elabinelab has a bunch of new code for adding
        and deleting nodes from the inner elab. Mostly it deals with dhcpd (inner
        and outer, waiting for nodes to reboot, etc). It also deals with updating
        the vlans table in the DB, pruning out any nodes (ports) that are deleted
        but for which there are still interfaces in existing vlans. Said ports
        are them moved back to the default vlan with calls to snmpit. Also under
        another revision a a couple of weeks ago are the web interface changes to
        support the newnode MFS inside an inner Emulab.
      
      * swapexp and endexp got some more checks for firewalled and paniced
        experiments, which were missing.
      6eff9de6
  8. 15 Apr, 2005 1 commit
  9. 13 Apr, 2005 1 commit
  10. 06 Apr, 2005 1 commit
  11. 05 Apr, 2005 1 commit
  12. 30 Mar, 2005 1 commit
  13. 29 Mar, 2005 1 commit
  14. 28 Mar, 2005 1 commit
  15. 25 Mar, 2005 1 commit
  16. 24 Mar, 2005 1 commit
  17. 22 Mar, 2005 1 commit
  18. 20 Mar, 2005 1 commit
  19. 18 Mar, 2005 1 commit
  20. 17 Mar, 2005 1 commit
    • Mike Hibler's avatar
      Partial support for disk-zeroing on experiment termination. · 60e7adb8
      Mike Hibler authored
      I did the "back half" support.  If the 'mustwipe' field is non-zero
      in the reserved table entry for a node then its disk must be zeroed.
      How the zeroing is done, depends on the value of the mustwipe field.
      Right now, '1' means pass the '-z' option to frisbee to have it zero
      all non-allocated blocks.  The value '2' is reserved for enabling a
      "full wipe" pass of the disk before running frisbee, which Keith Sklower
      (DETER) wanted to be able to do.  Note that 1 and 2 are effectively the
      same, if we are loading a full-disk image; i.e. all non-allocated blocks
      from the new image are zeroed.  But if the disk were being loaded with
      a single-partition image, then "frisbee -z" would only wipe unused
      blocks in that partition.
      
      The reload_daemon has been modified to extract the mustwipe info and
      invoke os_load accordingly.   os_load now takes a "-z <type>" option
      to enable the zeroing by setting a value in the current_reloads table.
      tmcd will read and return that info to its caller in the "loadinfo" command.
      Finally, the rc.frisbee script that runs in the frisbee MFS extracts the
      loadinfo info and crafts the frisbee startup command.
      
      What still needs to be done is the "front end," how the user specifies
      the value and how it winds up in the DB reserved table.  This will probably
      involve addition of state to the experiments table as this will likely be
      a per-experiment setting.
      60e7adb8
  21. 11 Mar, 2005 1 commit
  22. 08 Mar, 2005 1 commit
  23. 07 Mar, 2005 1 commit
  24. 01 Mar, 2005 1 commit
  25. 22 Feb, 2005 2 commits
    • Leigh B. Stoller's avatar
      Add loc_z to the location info table, and display that on both the · 67ff3af6
      Leigh B. Stoller authored
      static robot map and in the robot tracker applet.
      67ff3af6
    • Leigh B. Stoller's avatar
      Okay, first attempt to deal with os_setup waittimes on a per node_type · facc7acd
      Leigh B. Stoller authored
      and per OSID basis.
      
      * Added bios_waittime to node_types table and reboot_waittime to
        os_info table. Initialized them as follows:
      
              update node_types set bios_waittime=60 where class='pc';
              update os_info set reboot_waittime=150 where OS='Linux' or
      	  OS='FreeBSD' or OS='NetBSD';
              update os_info set reboot_waittime=180 where OS=Windows';
      
      * The bios waittime can be edited via the web interface.
      
      * The reboot waittime can be set only by admin people right now; this
        is another case of something that maybe the user should not see
        cause its too much stuff? Instead, default values are established in
        www/osiddefs.php3.
      
      * os_setup computes its per-node waitime as:
      
      	(bios_waittime + reboot_waittime) * 2
      
        as per Mike's suggestion. If either value is not defined in the DB,
        it defaults the original 7 minute value.
      facc7acd
  26. 15 Feb, 2005 1 commit
  27. 14 Feb, 2005 1 commit
  28. 08 Feb, 2005 2 commits
  29. 02 Feb, 2005 1 commit
    • Leigh B. Stoller's avatar
      Add tb-set-delay-capacity NS directive: · 21044084
      Leigh B. Stoller authored
      	tb-set-delay-capacity 1
      
      Will override the default delay capacity as specfied in the node_types
      table for each node type, and set it for all types when generating the
      ptop file.
      
      This is a big stick, to be used with caution, as it will effectively
      double the number of nodes used for delay nodes (withing an experiment).
      21044084
  30. 28 Jan, 2005 4 commits
  31. 24 Jan, 2005 1 commit
    • Leigh B. Stoller's avatar
      Bottom line on this commit: Do not update the nodetypeXpid_permissions · 775ca147
      Leigh B. Stoller authored
      table by hand anymore! Update the group_policies table and then run
      the script to update the permissions table (sbin/update_permissions).
      
      Details:
      
      My original thought when I started this was that I would be able to
      replace the existing nodetypeXpid_permissions table with this new
      stuff. Well, it turns out that this was not a good thing to do, for a
      couple of reasons:
      
        * Engineering: We access the nodetypeXpid_permissions table from three
          different languages, and no way I wanted to rewrite this library in
          in python and php!
      
        * Performance: We access the nodetypeXpid_permissions from the web
          interface, on every single page load. In fact, we access it twice if
          if you count the FreePCs() count that we put at the top of the menu.
          Going through this library on each page load would be a serious drag.
      
      So, rather then actually get rid of the nodetypeXpid_permissions table, I
      decided to keep it as a "cache" of permissions stored in the group
      policies table. Each time you update the policy tables, we need to run
      the update_permissions script which will call into this library (see the
      TBUpdateNodeTypeXpidPermissions() routine) to reconstruct the permissions
      table. I have whacked the grantnodetype script to do exactly that.
      
      Note that we could proably do the same thing for users by creating an
      equivalent nodetypeXuid_permissions table, mapping users to types they
      are allowed to use. That would be a lot rows, but the amount of data in
      the table is small. That would give us very fine grained control of what
      we show people in the web interface. Not sure it is worth it though.
      
      I also added some instructions to previous commit in database-migrate.txt
      on populating the new group_policies table from the existing
      permissions table.
      775ca147
  32. 22 Jan, 2005 1 commit
  33. 18 Jan, 2005 1 commit
    • Leigh B. Stoller's avatar
      Here is a checkpoint of the admission control stuff I have been working on. · 54f55585
      Leigh B. Stoller authored
      The last part is the stuff to hook it in from assign_wrapper, and some
      additional support in assign that Rob is adding for me. This comment is
      from the top of new file db/libadminctrl.pm.in and describes everything in
      detail.
      
      # Admission control policies. These are the ones I could think of, although
      # not all of these are implemented.
      #
      #  * Number of experiments per type/class (only one expt using robots).
      #
      #  * Number of experiments per project
      #  * Number of experiments per subgroup
      #  * Number of experiments per user
      #
      #  * Number of nodes per project      (nodes really means pc testnodes)
      #  * Number of nodes per subgroup
      #  * Number of nodes per user
      #
      #  * Number of nodes of a class per project
      #  * Number of nodes of a class per group
      #  * Number of nodes of a class per user
      #
      #  * Number of nodes of a type per project
      #  * Number of nodes of a type per group
      #  * Number of nodes of a type per user
      #
      #  * Number of nodes with attribute(s) per project
      #  * Number of nodes with attribute(s) per group
      #  * Number of nodes with attribute(s) per user
      #
      # So we have group (pid/gid) policies and user policies. These are stored
      # into two different tables, group_policies and user_policies, indexed in
      # the obvious manner. Each row of the table defines a count (experiments,
      # nodes, etc) and a type of thing being counted (experiments, nodes, types,
      # classes, etc). When we test for admission, we look for each matching row
      # and test each condition. All conditions must pass. No conditions means a
      # pass. There is also some "auxdata" which holds extra information needed
      # for the policy (say, the type of node being restricted).
      #
      #      uid:     a uid
      #   policy:     'experiments', 'nodes', 'type', 'class', 'attribute'
      #    count:     a number
      #  auxdata:     a string (optional)
      #
      # Example: A user policy of ('mike', 'nodes', 10) says that poor mike is
      # not allowed to have more 10 nodes at a time, while ('mike', 'type',
      # '10', 'pc850') says that mike cannot allocate more than 10 pc850s.
      #
      # The group_policies table:
      #
      #      pid:     a pid
      #      gid:     a gid
      #   policy:     'experiments', 'nodes', 'type', 'class', 'attribute'
      #    count:     a number
      #  auxdata:     a string (optional)
      #
      # Example: A project policy of ('testbed', 'testbed', 'experiments', 10)
      # says that the testbed project may not have more then 10 experiments
      # swapped in at a time, while ('testbed', 'TG1', 'nodes', 10) says that the
      # TG1 subgroup of the testbed project may not use more than 10 nodes at
      # time.
      #
      # In addition to group and user policies (which are policies that apply to
      # specific users/projects/subgroups), we also need policies that apply to
      # all users/projects/subgroups (ie: do not want to specify a particular
      # restriction for every user!). To indicate such a policy, we use a special
      # tag in the tables (for the user or pid/gid):
      #
      #      '+'  -  The policy applies to all users (or project/groups).
      #
      # Example: ('+','experiments',10) says that no user may have more then 10
      # experiments swapped in at a time. The rule overrides anything more
      # specific (say a particular user is restricted to 20 experiments; the above
      # rule overrides that and the user (all users) is restricted to 10.
      #
      # Sometimes, you want one of these special rules to apply to everyone, but
      # *allow* it to be overridden by a more specific rule. For that we use:
      #
      #      '-'  -  The policy applies to all users (or project/groups),
      #              but can be overridden by a more specific rule.
      #
      # Example: The rules:
      #
      #	('-','type',0, 'garcia')
      #       ('testbed', 'testbed', 'type', 10, 'garcia')
      #
      # says that no one is allowed to allocate garcias, unless there is specific
      # rule that allows it; in this case the testbed project can allocate them.
      #
      # There are other global policies we would like to enforce. For example,
      # "only one experiment can be using the robot testbed." Encoding this kind
      # of policy is harder, and leads down a path that can get arbitrarily
      # complex. Tha path leads to ruination, and so we want to avoid it at
      # all costs.
      #
      # Instead we define a simple global policies table that applies to all
      # experiments currently active on the testbed:
      #
      #   policy:     'nodes', 'type', 'class', 'attribute'
      #     test:     'max', others I cannot think of right now ...
      #    count:     a number
      #  auxdata:     a string
      #
      # Example: A global policy of ('nodes', 'max', 10, '') say that the maximum
      # number of nodes that may be allocated across the testbed is 10. Thats not
      # a very realistic policy of course, but ('type', 'max', 1, 'garcia') says
      # that a max of one garcia can be allocated across the testbed, which
      # effectively means only one experiment will be able to use them at once.
      # This is of course very weak, but I want to step back and give it some
      # more thought before I redo this part.
      #
      # Is that clear? Hope so, cause it gets more complicated. Some admission
      # control tests can be done early in the swap phase, before we really do
      # anything (before assign_wrapper). Others (type and class) tests cannot
      # be done here; only assign can figure out how an experiment is going to map
      # to physical nodes (remember virtual types too), and in that case we need
      # to tell assign what the "constraints" are and let it figure out what is
      # possible.
      #
      # So, in addition to the simple checks we can do, we also generate an array
      # to return to assign_wrapper with the maximum counts of each node type and
      # class that is limited by the policies. assign_wrapper will dump those
      # values into the ptop file so that assign can enforce those maximum values
      # regardless of what hardware is actually available to use. As per discussion
      # with Rob, that will look like:
      #
      #	set-type-limit <type> <limit>
      #
      # and assign will spit out a new type of violation that assign_wrapper will
      # parse.
      #
      # NOTES:
      #
      #  1) Admission control is skipped in admin mode; returns okay.
      #  2) Admission control is skipped when the pid is emulab-ops; returns okay.
      #  3) When calculating current usage, nodes reserved to emulab-ops are
      #     ignored.
      #  4) The sitevar "swap/use_admission_control" controls the use of admission
      #     control; defaults to 1 (on).
      #  5) The current policies can be viewed in the web interface. See
      #     https://www.emulab.net/showpolicies.php3
      #  6) The global policy stuff is weak. I plan to step back and think about it
      #     some more before redoing it, but it will tide us over for now.
      #
      54f55585
  34. 15 Jan, 2005 1 commit
  35. 13 Jan, 2005 1 commit