1. 12 May, 2014 1 commit
    • Fix for loading an image on a remoteded pg node. · 15dce279
      Leigh Stoller authored
      This is a kludge: the notion of "dedicated" is currently a type-specific
      attribute, but we also have "shared" nodes running on "dedicated" nodes,
      which messes everything up. I am not inclined to fix the underlying
      problem since Utah is the only site that uses this stuff, and these nodes
      are slowly dying out anyway.
  2. 26 Mar, 2014 1 commit
  3. 17 Mar, 2014 2 commits
    • Refactor taintstate code and move final taint updates to stated. · 662972cd
      Kirk Webb authored
      Can't do the untainting for all cases in libosload*.  The untainting
      is now hooked into stated, where we catch the nodes as they send
      along their "RELOADDONE" events to update their taint state according
      to the final state of their partitions.
    • Add taint state tracking for OSes and Nodes. · 1de4e516
      Kirk Webb authored
      Emulab can now propagate OS taint traits on to nodes that load these OSes.
      The primary reason for doing this is for loading images which
      require special treatment of the node.  For example, an OS that has
      proprietary software, and which will be used as an appliance (blackbox)
      can be marked (tainted) as such.  Code that manages user accounts on such
      OSes, along with other side channel providers (console, node admin, image
      creation) can key off of these taint states to prevent or alter access.
      
      Taint states are defined as SQL sets in the 'os_info' and 'nodes' tables,
      kept in the 'taint_states' column in both.  Currently these sets are comprised
      of the following entries:
      
      * usermode: OS/node should only allow user level access (not root)
      * blackbox: OS/node should allow no direct interaction via shell, console, etc.
      * dangerous: OS image may contain malicious software.
      
      Taint states are inherited by a node from OSes it loads during the OS load
      process.  Similarly, they are cleared from nodes as these OSes are removed.
      Any taint state applied to a node will currently enforce disk zeroing.
      
      No other tools/subsystems consider the taint states currently, but that will
      change soon.
      
      Setting taint states for an OS has to be done via SQL presently.
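
The inheritance rule above can be sketched roughly as follows. This is an illustrative Python model, not the Perl code from the commit: the function names and the OS names in the usage note are hypothetical, and the SQL sets are modeled as Python sets.

```python
from typing import Dict, Iterable, Set

# The three taint states named in the commit message.
TAINT_STATES = frozenset({"usermode", "blackbox", "dangerous"})


def node_taint_states(partition_os: Dict[int, str],
                      os_taints: Dict[str, Iterable[str]]) -> Set[str]:
    """Union of the taint states of every OS currently on the node's partitions."""
    taints: Set[str] = set()
    for osname in partition_os.values():
        taints |= set(os_taints.get(osname, ()))
    unknown = taints - TAINT_STATES
    if unknown:
        raise ValueError(f"unknown taint states: {sorted(unknown)}")
    return taints


def must_zero_disk(taints: Set[str]) -> bool:
    """Any taint state on the node currently forces disk zeroing."""
    return bool(taints)
```

Recomputing the union from the partitions that remain means a taint disappears from the node once the OS that carried it is removed, for example when a partition holding a hypothetical "APPLIANCE" OS is overwritten.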
  4. 28 Feb, 2013 1 commit
  5. 17 Nov, 2012 1 commit
    • More PRObE inspired improvements to the swapin path. · 5a2810f2
      Mike Hibler authored
      Replace an exec of the perl os_select script with a call to the OSSelect()
      node method. This cut in half the time spent in the DB setup for each node.
      Note that this change had already been made to libosload_new.
      
      Reworked the code that set up the partitions table entries. We were potentially
      updating each DB row for each image loaded. Now we just work out all the changes
      in a perl data struct and make one set of DB changes at the end. The code is
      more comprehensible now as well (I hope!)
      
      Finally, disable the "swapinfo" stuff which was the first step in doing stateful
      swapout of disk state. That code never got finished.
  6. 25 Sep, 2012 1 commit
  7. 24 Sep, 2012 2 commits
    • Add check in "wedged" code to verify that the node is not already reloading. · 45ec3557
      Mike Hibler authored
      Due to a race with collecting events, it looks like some events will still
      slip through the cracks and we might wind up having missed a transition after
      five minutes. If we see that we are already in RELOADING (the state transition
      we are looking for) when we would declare the node wedged, then fake the
      transition and continue.
      
      I suspect this would not happen if I just looped on event_poll until there
      were no more events, but I am afraid of letting that loop go unbounded.
      So until I gather more data, let's go with this hack check.
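
A minimal sketch of the check described above, in Python rather than the Perl of libosload. The function name, its return values, and the "SHUTDOWN" state used in the usage note are assumptions for illustration; only RELOADING and the five-minute limit come from the commit message.

```python
import time
from typing import Callable, Optional

WEDGE_TIMEOUT = 5 * 60  # seconds; the commit message mentions five minutes


def check_wedged(saw_reloading_event: bool,
                 reboot_time: float,
                 current_state: Callable[[], str],
                 now: Optional[float] = None) -> str:
    """Return "ok", "waiting", or "wedged" for one node."""
    now = time.time() if now is None else now
    if saw_reloading_event:
        return "ok"
    if now - reboot_time < WEDGE_TIMEOUT:
        return "waiting"
    # Timer expired without the event. Before giving up, consult the
    # recorded state: the event may have been lost in the race.
    if current_state() == "RELOADING":
        return "ok"  # fake the missed transition and continue
    return "wedged"
```

With the timer expired and no event seen, `check_wedged(False, 0.0, lambda: "RELOADING", now=301.0)` reports "ok", while a node whose recorded state is something else (here a made-up "SHUTDOWN") is declared wedged.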
    • Replace license symbols with {{{ }}}-enclosed license blocks. · 6df609a9
      Eric Eide authored
      This commit is intended to make the license status of Emulab and
      ProtoGENI source files more clear.  It replaces license symbols like
      "EMULAB-COPYRIGHT" and "GENIPUBLIC-COPYRIGHT" with {{{ }}}-delimited
      blocks that contain actual license statements.
      
      This change was driven by the fact that today, most people acquire and
      track Emulab and ProtoGENI sources via git.
      
      Before the Emulab source code was kept in git, the Flux Research Group
      at the University of Utah would roll distributions by making tar
      files.  As part of that process, the Flux Group would replace the
      license symbols in the source files with actual license statements.
      
      When the Flux Group moved to git, people outside of the group started
      to see the source files with the "unexpanded" symbols.  This meant
      that people acquired source files without actual license statements in
      them.  All the relevant files had Utah *copyright* statements in them,
      but without the expanded *license* statements, the licensing status of
      the source files was unclear.
      
      This commit is intended to clear up that confusion.
      
      Most Utah-copyrighted files in the Emulab source tree are distributed
      under the terms of the Affero GNU General Public License, version 3
      (AGPLv3).
      
      Most Utah-copyrighted files related to ProtoGENI are distributed under
      the terms of the GENI Public License, which is a BSD-like open-source
      license.
      
      Some Utah-copyrighted files in the Emulab source tree are distributed
      under the terms of the GNU Lesser General Public License, version 2.1
      (LGPL).
  8. 23 Sep, 2012 1 commit
  9. 23 Aug, 2012 2 commits
  10. 25 Jul, 2012 3 commits
  11. 01 May, 2012 1 commit
    • Fix libosload to properly get the size of images it cannot read directly. · 014b0f04
      Mike Hibler authored
      If a project image has been "exported" with grantimage, os_load may not
      be able to stat it due to unix permissions. So have os_load make a query
      to the local frisbee master server to get attributes for the image. The
      master server already knows how to deal with these exported images.
      This query also works for an inner boss determining the size of an image
      it has not yet downloaded and thus subsumes that case.
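
The fallback described above might look roughly like this. It is a Python sketch under stated assumptions: `query_master` is a hypothetical callable standing in for the query to the local frisbee master server, whose real interface is not shown in the commit message, and the rule of trying a local stat first is a guess at the ordering.

```python
import os
from typing import Callable


def image_size(path: str, query_master: Callable[[str], int]) -> int:
    """Size in bytes, from stat if readable, else from the master server."""
    try:
        return os.stat(path).st_size
    except OSError:
        # Not readable or not present locally (e.g. an exported image,
        # or an image not yet downloaded): ask the master server.
        return query_master(path)
```

Because a missing file and a permission failure both raise `OSError`, the same fallback covers the exported-image case and the not-yet-downloaded inner boss case.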
  12. 20 Sep, 2011 1 commit
  13. 18 Feb, 2011 3 commits
  14. 18 Jan, 2011 1 commit
  15. 11 Jan, 2011 1 commit
    • More work toward getting this working on subboss. · 8d80301e
      Mike Hibler authored
      More work on the hierarchical configuration for subboss. When doing host-based
      authentication, allow client to pass an explicit host (IP) to the mserver.
      If the mserver is configured to allow it, that IP is used for authenticating
      the request instead of the caller's IP. Add a default ("null") configuration
      so the mserver can operate out-of-the-box with no config file. The goal of
      these two changes is for an mserver instance with the default config and a
      proxy option to serve the needs of a subboss node (i.e., so no explicit
      configuration will be needed).
  16. 11 Oct, 2010 1 commit
    • Work on an optimization to the perl code. · 92f83e48
      Leigh Stoller authored
      Maybe you have noticed, but starting any one of our scripts can take a
      second or two. That time is spent including and compiling tens of
      thousands of lines of perl code, both from our libraries and from the
      perl libraries.
      
      Mostly this is just a maintenance thing; we just never thought about
      it much and we have a lot more code these days.
      
      So I have done two things.
      
      1) I have used SelfLoader() on some of our biggest perl modules.
         SelfLoader delays compilation until code is used. This is not as
         good as AutoLoader() though, and so I did it with just a few 
         modules (the biggest ones).
      
      2) Mostly I reorganized things:
      
        a) Split libdb into an EmulabConstants module and all the rest of
           the code, which is slowly getting phased out.
      
        b) Move little things around to avoid including libdb or Experiment
           (the biggest files).
      
        c) Change "use foo" in many places to a "require foo" in the
           function that actually uses that module. This was really a big
           win because we have dozens of cases where we would include a
           module, but use it in only one place, and typically not at all.
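
Item (c) relies on Perl's `use` loading a module at compile time, while a `require` inside a function defers the load until that function runs. As a rough analogue only (Python, not the Perl from the commit, with a hypothetical function name), the same idea is an import placed inside the function that needs it:

```python
def render_report(data: dict) -> str:
    # Deferred import: json is loaded the first time this function runs,
    # not when the enclosing script starts.
    import json
    return json.dumps(data, sort_keys=True)
```

A script that never calls `render_report` never pays for loading the module, which is the effect the commit is after.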
      
      Most things are now starting up in 1/3 the time. I am hoping this will
      help to reduce the load spiking we see on boss, and also help with the
      upcoming Geni tutorial (which killed boss last time).
  17. 29 Sep, 2010 1 commit
  18. 20 Sep, 2010 1 commit
  19. 09 Jul, 2010 2 commits
  20. 23 Jun, 2010 1 commit
  21. 15 Mar, 2010 1 commit
    • Clear out bogus DB partition info on default reload. · ca75a221
      Mike Hibler authored
      The new d710s have a single-partition image as their default.  As a result,
      when they were reloaded, the info for the other partitions (in particular
      partition 1) was not being cleared.  So if I were to break my install of
      FBSD72-STD on a d710 node and then free it, the reload would not clear
      the partition 1 info in the DB.  If someone else came along requesting
      FBSD72-STD and got that node assigned, os_load would think that it didn't
      need to reload partition 1 and just boot right into the old stale partition.
      
      Now for a "default" reload (i.e., from reload_daemon), a single partition
      load will clear all the other partitions.  We don't do this all the time
      since a user might want to reload one of the partitions on their allocated
      node without clobbering the others.
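
The rule above can be modeled as follows. This is a Python sketch with a hypothetical function name and a plain dict standing in for the partitions table; the real change lives in the Perl reload path.

```python
from typing import Dict, Optional


def update_partitions(current: Dict[int, Optional[str]],
                      loaded_part: int,
                      osname: str,
                      default_reload: bool) -> Dict[int, Optional[str]]:
    """Return the new partition -> OS map after a single-partition load."""
    if default_reload:
        # Reload from reload_daemon: forget what was on every other partition.
        updated = {part: None for part in current}
    else:
        # User-initiated reload: leave the other partitions' records alone.
        updated = dict(current)
    updated[loaded_part] = osname
    return updated
```

In the d710 scenario, a default reload into another partition wipes the stale FBSD72-STD record for partition 1, so a later request for that OS forces a real reload instead of booting the broken install.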
  22. 15 Jan, 2010 1 commit
    • Fix waitmode==2 logic. · ee7974ab
      Mike Hibler authored
      We were getting stuck in an infinite loop if a node failed to come back
      up for longer than its time limit.
  23. 23 Dec, 2009 1 commit
    • A couple of changes that attempt to cut short the waiting when a node has failed. · 28ac96a5
      Leigh Stoller authored
      
      * In the main wait loop, I check the eventstate for the node, for
        TBFAILED or PXEFAILED. Neither of these should happen after the
        reboot, so it makes sense to quit waiting if they do.
      
      * I added an event handler to libosload, specifically to watch for
        nodes entering RELOADSETUP or RELOADING, after the reboot. Because
        of the race with reboot, this was best done with a handler instead
        of polling the DB state like case #1 above. The idea is that a node
        should hit one of these two states within a fairly short time (I
        currently have it set to 5 minutes). If not, something is wrong and
        the loop bails on that node. What happens after is subject to the
        normal waiting times.
      
      I believe that these two tests will catch a lot of cases where osload
      is waiting on something that will never finish.
  24. 22 Dec, 2009 1 commit
  25. 16 Oct, 2009 1 commit
    • Make shared vnodes reloadable. · abcc783c
      David Johnson authored
      This whole thing sucks for modifies because we (vnode_setup) need to go
      out to the nodes and run vnodesetup to trigger the reload, but os_setup
      needs to set up the reload.  So for now, os_setup sets up the reload but
      neither waits for nor reboots the vnode; vnode_setup does that like normal.
      Probably there are going to be timeout problems, but it's good enough for
      my needs right now.
  26. 12 Oct, 2009 1 commit
    • Add the ability to load images on virtnodes. · c6c57bc9
      David Johnson authored
      For now, we just overload the tb-set-node-os command with a second
      optional argument; if that is present, the first arg is the child OS and
      the second is the parent OS. We add some new features in ptopgen
      (OS-parentOSname-childOSname) based off a new table that maps which child
      OSes can run on which parents, and the right desires get added to match.
      We set up the reloads in os_setup along with the parents.  Also needed a
      new opmode, RELOAD-PCVM, to handle all this.
      
      For now, users only have to specify that their images can run on pcvms, a
      special hack for which type the images can run on.  This makes sense in
      general since there is no point conditionalizing childOS loading on
      hardware type at the moment, but rather on parentOS.  Hopefully this stuff
      will mostly work on shared nodes too, although we'll have to be more
      aggressive on the client side garbage collecting old frisbee'd images for
      long-lived shared hosts.
      
      I only made these changes in libvtop, so assign_wrapper folks are left in
      the dark.
      
      Currently, the client side supports frisbee.  Only in openvz for now, and
      this probably breaks libvnode_xen.pm.  Also in here are some openvz
      improvements, like ability to sniff out which network is the public
      control net, and which is the fake virtual control net.
  27. 04 Sep, 2009 1 commit
  28. 04 Aug, 2009 1 commit
    • Implement frontend and middleend support for loading multiple images at once with Frisbee (excludes the actual MFS changes). · e7871305
      Kevin Atkinson authored
      
      Os_load now takes a list of comma-separated image names for the
      "-i" and "-m" options.  The default OS is the OS for the last image
      specified in the list.  I also changed the "-p" option of osload to
      search both the project specified and emulab-ops for the image rather
      than just the project specified in order to simplify specifying
      multiple images (and because I personally found that behavior annoying
      when using osload).
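
A rough Python model of the list handling described above. The function name and the `known` lookup table are hypothetical, and searching the given project before emulab-ops is an assumption; the commit message only says both are searched.

```python
from typing import Dict, List, Tuple

FALLBACK_PROJECT = "emulab-ops"


def resolve_images(arg: str, project: str,
                   known: Dict[Tuple[str, str], str]
                   ) -> Tuple[List[Tuple[str, str]], str]:
    """Resolve a comma-separated image list; return (images, default OS).

    `known` maps (project, image name) to the OS name for that image.
    """
    resolved: List[Tuple[str, str]] = []
    for name in (n.strip() for n in arg.split(",")):
        if not name:
            continue
        for pid in (project, FALLBACK_PROJECT):
            if (pid, name) in known:
                resolved.append((pid, name))
                break
        else:
            raise LookupError(
                f"no image {name!r} in {project} or {FALLBACK_PROJECT}")
    if not resolved:
        raise ValueError("no image names given")
    # The default OS is the OS of the last image in the list.
    return resolved, known[resolved[-1]]
```

Falling back to emulab-ops is what lets a single "-p" value cover a list that mixes project images with standard ones.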
      
      I modified the current_reloads table to be able to specify more than one
      image for a node by adding an "idx" column which controls the order of
      the reloads.  I also added a "prepare" column to the table (explained
      below)
      
      I modified tmcd to basically loop over the entries in the table and
      create a multiline LOADINFO response, and modified rc.frisbee to
      handle the multiline response and load each image in turn.
      
      I modified os_load to take a new option "-P" which will tell rc.frisbee
      to zap the superblocks even if a whole disk image is not specified.
      To do this I set the prepare entry for the first image in the
      current_reloads table to true.  Tmcd then passes this info to
      rc.frisbee in the LOADINFO line.  When rc.frisbee sees this it will
      make sure to zap the superblock before loading that image.
      
      To support having multiple images as the default, "default_imageid"
      can now be a comma separated list.  I implemented a hack to be able to
      set multiple imageids via editnodetype.php3.  Basically the form
      splits default_imageid into default_imageid_0, default_imageid_1, etc
      and then adds an empty default_imageid_# slot to allow adding an
      imageid.  Multiple images can be added by adding one image, then
      submitting the form, and then adding another into the empty slot.  Not
      the best, but I don't think this will be a very common operation.
      When the form is submitted it will then combine all default_imageid_#
      into a comma separated list ignoring any that are deleted or set to
      "No ImageID" (ie 0).
      
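The split-and-recombine form hack described above can be sketched like this. It is a Python model with hypothetical function names, not the PHP of editnodetype.php3; only the default_imageid_# naming and the skip rules come from the commit message.

```python
from typing import Dict


def split_default_imageid(value: str) -> Dict[str, str]:
    """Expand the stored list into numbered form fields plus one empty slot."""
    ids = [v for v in value.split(",") if v]
    fields = {f"default_imageid_{i}": v for i, v in enumerate(ids)}
    fields[f"default_imageid_{len(ids)}"] = ""  # empty slot for adding one more
    return fields


def combine_default_imageid(fields: Dict[str, str]) -> str:
    """Rejoin numbered fields, skipping deleted (empty) or "0" entries."""
    prefix = "default_imageid_"
    numbered = sorted(
        (int(k[len(prefix):]), v) for k, v in fields.items()
        if k.startswith(prefix) and k[len(prefix):].isdigit()
    )
    return ",".join(v for _, v in numbered if v not in ("", "0"))
```

Each submit round-trips through both functions, which is why adding two images takes two submissions: only one empty slot is offered at a time.
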
      Everything will work fine with old MFSs as long as only one image is
      loaded.  If multiple images are loaded with an old MFS, an email will
      be sent to testbed-ops.  This works by having tmcd detect old MFS's by
      using the version number and setting the state to RELOADOLDMFS.  Stated
      will pick up on this and send the email to testbed-ops via a trigger.
  29. 10 Sep, 2008 1 commit
    • Slight beefing up of support for alternate MBRs: · 31009d09
      Mike Hibler authored
       * when creating an image from a node, make sure the new image
         gets the MBR version used by the existing image
       * when loading a single-partition image that requires a different
         MBR, invalidate all other existing partitions ("invalidate" in the
         sense that we remove any partitions table entries, we don't do anything
         to the disk)
  30. 05 Dec, 2007 1 commit
  31. 02 Nov, 2007 1 commit
  32. 31 Aug, 2007 1 commit