1. 12 Oct, 2005 1 commit
  2. 11 Oct, 2005 1 commit
    • Add very hacky global_policies test to use for NSDI deadline. To · 4c8a4cde
      Leigh B. Stoller authored
      support NSDI06 easily, we want to create a project called NSDI06, have
      anyone who has an NSDI deadline join that project, and use that
      membership to allow them to swapin experiments in their real projects.
      So, new global_policies entry format:
      
      	insert into global_policies
      	     values ('membership', 'emulab-ops,testbed,NSDI06', 'eq', 0);
      
      When this policy is inserted, only members of the comma-separated list
      of project names can swap in *any* experiments.
      
      Just remove the table entry to remove the restriction.
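
A minimal sketch of how a swapin check might consult such a policy row. The function name and the meaning given to the four tuple fields are assumptions for illustration; the real schema and the real check are not shown in the message above.

```python
def may_swapin(user_projects, policy_rows):
    """Return True if a user in `user_projects` may swap in.

    policy_rows: iterable of 4-tuples mirroring the insert above. The field
    meanings (kind, comma-separated project list, test, count) are assumed.
    """
    for kind, auxdata, _test, _count in policy_rows:
        if kind != "membership":
            continue
        allowed = {p.strip() for p in auxdata.split(",") if p.strip()}
        # The user must belong to at least one listed project.
        if not allowed & set(user_projects):
            return False
    return True


rows = [("membership", "emulab-ops,testbed,NSDI06", "eq", 0)]
```

With no membership row present the check allows everyone, which matches "just remove the table entry to remove the restriction".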
  3. 21 Sep, 2005 1 commit
  4. 13 Sep, 2005 1 commit
  5. 08 Sep, 2005 1 commit
  6. 07 Sep, 2005 2 commits
    • Inner-elab role changes: · 88926d1c
      Mike Hibler authored
      	'boss' -> 'boss+router'
      	'ops'  -> 'ops+fs'
      	'fs'      (new role)
      	'router'  (new role)
      
      These are in preparation for allowing configurations with split ops and fs
      nodes (and sometime down the road, split boss and router nodes).
      
      This checkin is just the DB state changes along with the scripts that look
      at that state.  The Big One, which actually sets up separate nodes
      automatically, is undergoing more testing but will be Coming Soon.
    • Leigh B. Stoller's avatar
  7. 02 Sep, 2005 1 commit
    • Base temporal OSID mapping on the last experiment modify time rather than · 5492c5c0
      Mike Hibler authored
      on the creation time.  This allows you to get an experiment out of an
      "FOO-STD maps to an old OS" trap by doing a modify on the experiment rather
      than having to recreate the experiment from scratch.
      
      The theory is that if you want the experiment to stay with the old OS,
      you either don't modify it or you modify it and fully specify the OS.
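
A rough sketch of the idea, assuming the mapping is a list of time ranges per generic OS name. "FOO-STD" comes from the message above; the concrete OS names, the dates, and the function name are made up for illustration.

```python
from datetime import datetime

# Hypothetical mapping history: (generic name, valid from, valid until, concrete OS).
OSID_MAP = [
    ("FOO-STD", datetime(2004, 1, 1), datetime(2005, 6, 1), "FOO-OLD"),
    ("FOO-STD", datetime(2005, 6, 1), datetime.max, "FOO-NEW"),
]


def resolve_osid(osid, last_modified):
    """Map a generic OS name using the experiment's last modify time."""
    for name, start, end, concrete in OSID_MAP:
        if name == osid and start <= last_modified < end:
            return concrete
    # Not a generic name: the OS was fully specified, so keep it.
    return osid
```

An experiment last modified before the cutover keeps the old OS; doing a modify moves the timestamp forward and picks up the new one.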
  8. 25 Aug, 2005 1 commit
    • Add some checks for 'free' nodes that are not allocatable. · 25ca9681
      Timothy Stack authored
      
      	* db/audit.in: Include the list of nodes that are not reserved but
      	have an eventstate that makes them unallocatable.
      
      	* www/dbdefs.php3.in: Add POWEROFF and ALWAYSUP node states.
      
      	* www/nodecontrol_list.php3: Add an asterisk next to the free
      	count for type(s) that have free, but unallocatable nodes.
      
      	* www/shownodetype.php3: If a node is free, but unallocatable, put
      	a yellow ball next to its name instead of a green one.
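
The audit check can be pictured roughly like this. The set of event states in which a free node counts as allocatable is an assumption (the message does not list them), as are the function name and the tuple layout.

```python
# Assumed placeholder: states in which a free node can still be allocated.
ALLOCATABLE_STATES = frozenset({"ISUP", "PXEWAIT", "POWEROFF", "ALWAYSUP"})


def free_but_unallocatable(nodes, allocatable=ALLOCATABLE_STATES):
    """nodes: iterable of (node_id, reserved, eventstate) tuples.

    Returns the ids of nodes that are not reserved but whose event state
    keeps them from being allocated.
    """
    return sorted(
        node_id
        for node_id, reserved, state in nodes
        if not reserved and state not in allocatable
    )


sample = [("pc1", False, "ISUP"), ("pc2", False, "SHUTDOWN"), ("pc3", True, "SHUTDOWN")]
```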
  9. 19 Aug, 2005 1 commit
  10. 17 Aug, 2005 3 commits
  11. 15 Aug, 2005 1 commit
    • The bulk of the mailman support. Still not turned on by default (cause · a64593f3
      Leigh B. Stoller authored
      Jay has "comments"), but I do not want it hanging around in my source
      tree. Here is my mail message:
      
      * The "My Mailing Lists" is context sensitive (copied from Tim's
        changes to the My Bug Databases). It takes you to the *archives* for
        the current project (or subgroup) list. Or it takes you to your
        first joined project.
      
      * The showproject and showgroup pages have direct links to the project
        and group specific archives. If you are in reddot mode, you also
        get a link to the admin page for the list. Note that project and
        group leaders are just plain members of these lists.
      
      * The interface to create a new "user" list is:
      
      	https://www.emulab.net/dev/stoller/newmmlist.php3
      
        We do not store the password, but just fire it over in the list
        creation process.
      
        Anyone can create their own mailing lists. They are not associated
        with projects, but just the person creating the list. That person
        is the list administrator and is given permission to access the
        configuration page.
      
        This page is not hooked in yet; not sure where.
      
      * Once you have your own lists, your user profile page includes a link
        in the sub menu: Show Mailman Lists. From this page you can delete
        lists, zap to the admin page, or change the admin password (which is
        really just a subpage of the admin page).
      
      * As usual, in reddot mode you can mess with anyone else's mailman lists
        (via the magic of mailman cookies).
      
      * Note on cross machine login. The mailman stuff has a really easy way
        to generate the right kind of cookie to give users access. You can
        generate a cookie for user access, or one for the admin interface of
        a list (a different cookie). Behind the scenes, I ssh over and get
        the cookie, and set it in the user's browser from boss. When the
        browser is redirected over to ops, that cookie goes along and gives
        the user the requested access. No passwords need be sent around,
        since we do the authentication ourselves.
  12. 04 Aug, 2005 1 commit
    • Further hackery to deal with more than 254 physical nodes in the testbed · 65e807a3
      Mike Hibler authored
      when assigning vnode (control net) IP addresses.  Once we hit 255, we
      increment the low bit of the network address ala:
      
      	on pc254: 172.16.254.N
      	on pc255: 172.17.1.N
      
      Note that the Utah case is different than the default in that we wrap at
      200 rather than 254.  This is so that we "wrap" at the boundary of the
      new cluster (which starts at pc201); i.e., pc201 has addresses 172.18.1.N.
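
The default wrap can be sketched as below. The arithmetic is inferred from the two default examples only; the function name, the `base` parameter, and the error handling are assumptions.

```python
def vnode_control_net(pc_number, base=(172, 16), wrap=254):
    """Return the first three octets used for vnodes hosted on pcN.

    Nodes 1..wrap map to base.N; every further block of `wrap` nodes
    bumps the second octet by one and restarts the third octet at 1.
    """
    if pc_number < 1:
        raise ValueError("pc numbers start at 1")
    block, offset = divmod(pc_number - 1, wrap)
    second = base[1] + block
    if second > 255:
        raise ValueError("ran out of network addresses")
    return f"{base[0]}.{second}.{offset + 1}"
```

With `wrap=200` and this assumed base the sketch would give 172.17.1 for pc201, while the message quotes 172.18.1.N, so the Utah setup presumably starts from a different base that the text does not state.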
  13. 29 Jul, 2005 1 commit
    • Fix the race between loading a mote and rebooting its host stargate. · cb7801fb
      Timothy Stack authored
      
      	* db/libdb.pm.in: Add TBNodeSubNodes function which returns the
      	list of subnodes for a given node.
      
      	* mote/tbuisp.in: Don't reboot the stargate anymore after loading
      	the attached mote.  The problem with the radio not working after
      	the upload should be fixed now.
      
      	* tbsetup/libreboot.pm.in: Check if a node's subnodes are being
      	reloaded.  If so, try to wait until they reach ISUP before
      	actually doing the reboot.
      
      	* tbsetup/os_setup.in: Do not skip the ISUP wait for subnodes that
      	are imageable (like motes), otherwise their allocstates are not
      	updated correctly.  Remove the robot-specific hack that assumed
      	tbuisp would do the reboot if the attached mote was being reloaded.
  14. 26 Jul, 2005 1 commit
    • Quick mod to stop an info leak. · 72fc6f2c
      Kirk Webb authored
      
      Bootlogs were persisting after experiment termination, only being replaced
      when a particular node failed during TBSETUP (and hence sent back a boot
      log to be stashed).  This was leaking info such as project and experiment
      names, uids, groups.
      
      For now the bootlog is being cleared as nodes come in to an experiment
      via experiment swapin (inside nalloc).  Running sched_reload or
      sched_reserve will also call nalloc if the node is free, hence clearing
      the bootlog as well.
  15. 20 Jul, 2005 1 commit
  16. 22 Jun, 2005 1 commit
    • Added my simplistic link tracing and monitoring. Example usage and · 7942119e
      Leigh B. Stoller authored
      some details can be found in the advanced tutorial that I wrote up.
      See this link:
      
      http://www.emulab.net/tutorial/docwrapper.php3?docname=advanced.html#Tracing
      
      The basic idea is that each virt_lan entry gets a couple of new slots
      describing the type of tracing that is desired.
      
        traced tinyint(1) default '0',
        trace_type enum('header','packet','monitor') NOT NULL default 'header',
        trace_expr tinytext,
        trace_snaplen int(11) NOT NULL default '0',
        trace_endnode tinyint(1) NOT NULL default '0',
      
      There is a new physical table called "traces" that is a little bit
      like the current delays table. A new tmcd command returns the trace
      configuration to the client nodes (tmcd/common/config/rc.trace).
      
      The delays table got a new boolean called "noshaping" that tells the
      delay node to bridge, but not set up any pipes. This allows us to
      capture traffic at the delay node, but with much less overhead on
      the packets.
      
      The pcapper got bloated up to do packet capture and more event stuff.
      I also had to add some mutex locking around calls into the pcap
      library and around malloc, since the current setup used linuxthreads,
      which is not compatible with the standard libc_r library. I was
      getting all kinds of memory corruption, and I am sure that if someone
      breathes on the pcapper again, it will break in some new way.
  17. 13 Jun, 2005 1 commit
    • Initial checkin of a "repositioning" daemon that moves robots back to their pens on swapout. · 5e43a771
      Timothy Stack authored
      
      	* configure, configure.in: Add tbsetup/repos_daemon.
      
      	* db/libdb.pm.in: Add constants for the
      	repositionpending/repositioning experiments.
      
      	* db/nfree.in: When freeing garcias, send them to
      	repositionpending instead of reloadpending.
      
      	* event/sched/event-sched.c: Deal with the rare case of no
      	SIMULATOR object being in the agent list for an experiment.
      
      	* robots/emc/emcd.c, robots/emc/locpiper.in: Fix some typos.
      
      	* robots/rmcd/masterController.h, robots/rmcd/masterController.c,
      	robots/rmcd/obstacles.h, robots/rmcd/obstacles.c: Ignore dynamic
      	obstacles that are far away and remove dynamic obstacles where the
      	robot is inside the natural obstacle area.
      
      	* sql/database-create.sql, sql/database-migrate.txt: Add a
      	reposition_status table that tracks the status of robots that are
      	being moved back to their pens.
      
      	* tbsetup/GNUmakefile.in: Install the repos_daemon script.
      
      	* tbsetup/reload_daemon.in: Move robots to the repositionpending
      	experiment, if they haven't already reached their pen.
      
      	* tbsetup/repos_daemon.in: Daemon that takes care of seeing robots
      	back to their pens after they are freed from an experiment.
  18. 31 May, 2005 1 commit
  19. 29 May, 2005 1 commit
  20. 26 May, 2005 1 commit
  21. 20 May, 2005 1 commit
    • Change to stats gathering code. Generate a new experiment_resources · c5b79546
      Leigh B. Stoller authored
      record for each swapin (previously, it was just at swapmod). The
      reason for this is that as the testbed gets more fragmented in terms
      of hardware, it is less and less likely that consecutive swapins of
      the same experiment will use the same number of physical resources.
      
      We end up with some duplication of data inside the table, but no big
      deal. I suspect we will revisit this per experiment state as the
      workbench stuff proceeds.
  22. 16 May, 2005 5 commits
    • Add support for specifying the CVS tag to use when getting the source code · f1863cfd
      Leigh B. Stoller authored
      for the inner elab.
      
      	tb-set-elabinelab-cvstag dist-foo
      
      Will result in this branch getting checked out from the mirrored repository
      (updated nightly) on boss and sent back to the node, instead of the usual
      source tarball that we keep in /usr/testbed/src (still the default
      behaviour if no tag is specified). You can also do this if you like:
      
      	tb-set-elabinelab-cvstag HEAD
      
      which of course is a special tag to CVS.
    • For robots, reset anything related to physical location · 1cfc4a28
      Timothy Stack authored
      (nodes.destination_x, location_info.loc_x, ...).
    • Some power-by-mail hacking: · da5e8604
      Timothy Stack authored
      
        - Bump the timeout for waiting for the operators to flip the switch
          to 20 minutes.
      
        - Fail fast if the node is in hwdown.  This case is intended to make
          an os_load fail for a robot-mounted mote whose robot is in hwdown.
      
        - Fail if the robotlab is not open since no one is around to do
          anything about it anyway.
      
        - Assume success if the event state for a node was updated
          "recently."  This is a fall back in case the powertime web page
          isn't used to notify the system that the node was powered
          on/cycled.  Also, do not send the SHUTDOWN event in this case.
      
        - Add a TBNodeEventStateUpdated() function to libdb.pm that returns
          true if the eventstate for a node was updated within N seconds
          from the current time.
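
The "updated recently" test behind TBNodeEventStateUpdated() boils down to a timestamp comparison. Below is a Python analogue of that idea, not the Perl function itself; the argument names and the use of epoch seconds are assumptions.

```python
import time


def event_state_updated(state_timestamp, within_seconds, now=None):
    """True if the node's event state changed within the last N seconds.

    state_timestamp: epoch seconds of the last eventstate update (assumed
    to be what the DB stores for the node).
    """
    if now is None:
        now = time.time()
    age = now - state_timestamp
    return 0 <= age <= within_seconds
```

The caller can then treat a power request as having succeeded when this returns true, even if nobody used the powertime web page.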
    • Oops, turn off the debug flag. · dfc41c50
      Leigh B. Stoller authored
    • Strike 1; Rework the restart code, now that I have a little experience · c901f72a
      Leigh B. Stoller authored
      with it while mysqld was actually hung.
  23. 12 May, 2005 3 commits
    • Checkpoint the rest of my changes to support swapmod of both ElabInElab and · 6eff9de6
      Leigh B. Stoller authored
      Firewalled experiments (see tbsetup/elabinelab.in for the other stuff).
      
      * To support firewalled experiments, needed to add a new virt_firewalls
        table to split the existing firewalls table up, which included both
        virtual and physical stuff. There are the usual frontend changes and a
        few other things scattered around, including tmcd.c.
      
      * The firewall code in tbswap got some beefing up to support adding and
        deleting nodes from its special control net vlan. Note that I have
        not made any progress on containment of deleted nodes, just as we do not
        do anything now for teardown (unless it is paniced, in which case the
        experiment cannot be modified anyway).
      
      * ptopgen and assign_wrapper got some interesting modifications: Unlike
        regular swapmod, we cannot just tear down all the vlans since that would
        interrupt everything inside the inner elab. Instead, leave the vlans as
        is. The problem is that when assign runs, it can just as easily pick
        different interfaces on the same nodes, which would be a royal pain in
        the ass to deal with! So, ptopgen got a new option (-u) that
        assign_wrapper uses to tell ptopgen that it should prune out unused
        interfaces from nodes that are already allocated to the experiment.
        This is, at best, a pathetically gross hack, but it makes sure that all
        the interfaces stay the same across swapmods.
      
      * The unrelated revision of elabinelab has a bunch of new code for adding
        and deleting nodes from the inner elab. Mostly it deals with dhcpd (inner
        and outer, waiting for nodes to reboot, etc). It also deals with updating
        the vlans table in the DB, pruning out any nodes (ports) that are deleted
        but for which there are still interfaces in existing vlans. Said ports
        are then moved back to the default vlan with calls to snmpit. Also under
        another revision a couple of weeks ago are the web interface changes to
        support the newnode MFS inside an inner Emulab.
      
      * swapexp and endexp got some more checks for firewalled and paniced
        experiments, which were missing.
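
The pruning that the ptopgen -u option is described as doing can be pictured with a small filter. This is only a sketch of the idea; the function name and the data shapes are assumptions, not ptopgen's actual code.

```python
def prune_unused_interfaces(interfaces, allocated_nodes, in_use):
    """Drop unused interfaces on nodes the experiment already holds.

    interfaces: iterable of (node, iface) pairs offered to assign.
    allocated_nodes: set of nodes already allocated to the experiment.
    in_use: set of (node, iface) pairs currently carrying experiment links.
    """
    return [
        (node, iface)
        for node, iface in interfaces
        # Free nodes keep all interfaces; held nodes keep only those in use.
        if node not in allocated_nodes or (node, iface) in in_use
    ]


offered = [("pc1", "eth0"), ("pc1", "eth1"), ("pc2", "eth0")]
```

Because assign never sees the unused interfaces on held nodes, it cannot pick a different one, so the existing vlans stay valid across the swapmod.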
    • Hmm, a questionable change that I needed cause of ElabInElab and swapmod. · b031d1a7
      Leigh B. Stoller authored
      When doing a swapmod, nodes already reserved to the experiment are "moved"
      (via update) to a holding reservation. Fine.  After assign runs, the old
      nodes are moved back, but this time by an insert into the table, which
      causes them to lose some fields that I do not want them to lose! This might
      really mean that these fields do not belong in the reserved table, but I do
      not want to ponder this right now. Instead I do another update bringing
      them back into the original experiment.
      
      I left a comment indicating that this is under review (and why this commit is
      separate from the rest of the swapmod changes).
    • Part of my changes to support swapmod of ElabInElab experiments. I needed · 283e27fd
      Leigh B. Stoller authored
      to get this change in cause it also includes some DHCPD conf changes and
      Mike and I were messing each other up.
      
      * The DHCPD change is that instead of using reserved.inner_elab_role
        as the flag to indicate a node should boot inside or outside, I
        added inner_elab_boot, which is a boolean that I set when it is
        actually time to do this. This avoids two ElabInElab swapins at the
        same time from messing each other up! Basically avoids the obvious
        race.
      
      * The rest of the changes are for swapmod itself, which are incomplete
        but should be harmless until the rest of the stuff is ready.
  24. 11 May, 2005 1 commit
  25. 04 May, 2005 1 commit
  26. 26 Apr, 2005 1 commit
    • A watchdog daemon to try and catch (and recover from) the periodic · c47cefa1
      Leigh B. Stoller authored
      mysqld hangs that cause the entire system to grind to a halt. The
      basic theory of operation is like this:
      
      * Once a minute fork a child (protected by a 60 second timeout) to
        connect to the DB and issue a simple query. If the child can access
        the DB okay, it exits with a zero status.
      
      * If the alarm fires, the child is killed. This indicates that mysqld
        is no longer responding in a reasonable amount of time (60 seconds).
        We shift into trying to restart mysqld:
      
           * Send mysqld a TERM. Wait for 30 seconds.
      
           * Try query again; typically, the situation will not have changed one
             bit, but I do it anyway.
      
           * If mysqld was running, send it a kill -9. Wait for 15 seconds.
      
           * Start mysqld. Wait for 5 seconds.
      
           * Try query again. If query succeeds, we are done, and no one
             will have to deal with it Sunday morning at 6am (thanks Tim).
      
           * If query still fails, send email and give up trying to fix
             anything. The daemon continues to query the DB once a minute;
             once the query succeeds (cause a human fixed things up), the
             daemon goes back into its normal mode (attempt to fix things
             next time it fails).
      
      So, the problem is what happens when someone kills off mysqld for some
      other reason. It may be that this daemon should try to restart mysqld
      only if it actually killed a running mysqld. Comments?
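
The recovery sequence above can be sketched as two small functions. This is not the daemon's actual code: every callable passed in (probe, term, is_running, kill9, start, notify) is a hypothetical hook, the probe is expected to enforce its own 60 second timeout, and treating a successful retry after TERM as "done" is an assumption.

```python
import time


def try_restart(probe, term, is_running, kill9, start, notify, sleep=time.sleep):
    """Run one recovery attempt; return True if the DB answers afterwards."""
    term()                      # send mysqld a TERM
    sleep(30)
    if probe():                 # assumed: a successful retry ends the attempt
        return True
    if is_running():            # still there: kill -9
        kill9()
        sleep(15)
    start()                     # start mysqld again
    sleep(5)
    if probe():
        return True
    notify("mysqld restart failed")
    return False


def watchdog_step(probe, gave_up, restart):
    """One once-a-minute tick; returns the new value of gave_up."""
    if probe():
        return False            # healthy: re-arm automatic recovery
    if gave_up:
        return True             # keep polling until a human fixes things
    return not restart()
```

Keeping the give-up flag outside the restart logic is what lets the daemon go back to its normal mode once a query succeeds again.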
  27. 25 Apr, 2005 1 commit
  28. 21 Apr, 2005 1 commit
  29. 15 Apr, 2005 1 commit
  30. 14 Apr, 2005 2 commits