1. 17 Nov, 2003 1 commit
    • Merge the two state machines (batchstate and state) into a single · 2025e0bd
      Leigh B. Stoller authored
      state machine (state). All of the stuff that was previously handled by
      using batchstate is now embedded into the one state machine. Of
      course, these mostly overlapped, so it's not that much of a change,
      except that we also redid the machine, adding more states (for
      example, modify phases are now explicit). To get a picture of the
      actual state machine, on boss:
      
      		stategraph -o newstates EXPTSTATE
      		gv newstates.ps
      
      Things to note:
      
      * The "batchstate" slot of the experiments table is now used solely to
        provide a lock for the batch daemon. A secondary change will be to
        change the slot name to something more appropriate, but it can
        happen anytime after this new stuff is installed.
      
      * I have left expt_locked for now, but another later change will be to remove
        expt_locked, and change it to active_busy or some such new state name in
        the state machine. I have removed most uses of expt_locked, except those
        that were necessary until there is a new state to replace it.
      
      * These new changes are an implementation of the new state machine,
        but I have not done anything fancy. Most of the code is the same as
        it was before.
      
      * I suspect that there are races with the batch daemon now, but they
        are going to be rare, and the end result is probably that a
        cancelation is delayed a little bit.
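A minimal sketch of what "one state machine" gating experiment operations can look like, written in Python for illustration rather than in the testbed's own code. The state names and transitions below are hypothetical placeholders (this log only mentions states such as "active", "swapped", "swapping" and the modify phases); the real graph is whatever stategraph prints.

```python
# Hypothetical transition table; the real EXPTSTATE machine may differ.
TRANSITIONS = {
    "swapped": {"activating"},
    "activating": {"active", "swapped"},   # success, or failure/cancel
    "active": {"swapping", "modifying"},
    "modifying": {"active", "swapped"},    # modify done, or modify failed
    "swapping": {"swapped"},
}


class Experiment:
    """Holds a single state variable and refuses illegal transitions."""

    def __init__(self, state="swapped"):
        self.state = state

    def set_state(self, new_state):
        allowed = TRANSITIONS.get(self.state, set())
        if new_state not in allowed:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
```

With a single table like this, every entry point (web, command line, batch daemon) can ask the same question before touching an experiment, instead of consulting two overlapping variables.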
  2. 05 Nov, 2003 1 commit
  3. 31 Oct, 2003 1 commit
  4. 20 Oct, 2003 1 commit
    • Bring wanassign back from the bit rot abyss. Three changes. · fe9eba11
      Leigh B. Stoller authored
      * Remove all of the code that dealt with allocating unconnected nodes.
        It used to be assign_wrapper passed all widearea node allocation
        decisions to wanassign, those in links and those that were
        unconnected. assign_wrapper now handles all unconnected nodes since
        assign is much better with features/desires and node type stuff.
      
      * Do not modify any database state in wanassign. It used to do the
        actual nalloc calls, but now it just returns the mapping to
        assign_wrapper so that we can more easily track "recoverability" and
        because there is existing code in assign_wrapper to allocate vnodes
        on the selected pnodes. No point in duplication.
      
      * Switch from mapping to vnodes, to mapping to pnodes. We made this
        change for other virtual nodes; instead of "fixing" to a vnode on a
        pnode, fix to the pnode. The resulting mappings are also given as
        pnodes, and assign_wrapper does the allocation on those selected
        nodes.
      
      Now all we need is up-to-date widearea data!
  5. 19 Oct, 2003 1 commit
  6. 15 Oct, 2003 2 commits
  7. 13 Oct, 2003 1 commit
    • Aside from another round of cleanup, there is a significant change. · a70aef53
      Leigh B. Stoller authored
      I have implemented the suggestion Jay made a couple of weeks ago
      about allowing partial allocation in assign_wrapper, and retrying with a
      modified set of "fixed" nodes.
      
      My basic approach was to change nalloc to optionally allow partial
      allocations, returning the number of nodes that could not be allocated as
      its return value. In assign_wrapper, I determine which nodes we were able
      to get (in each loop), set their allocstate to INIT_DIRTY, augment the
      fixed_node set, and recreate the top file. Then I try again, up to the
      current number of maxtries. If assign fails with an unretryable error, or
      if we could not nalloc a user directed fixed node, then I stop right away
      since the experiment is not going to map (in the near term) if the fixed
      node list cannot be allocated.
      
      I am confident that this works okay, although testing is a little
      difficult. The main problem is how this interacts with experiment modify.
      Chad's implementation is that a modify can be reverted (recovered from)
      only as long as the DB is not modified by assign_wrapper. Well, a partial
      allocation, followed by failure, obviously modifies the DB, and so is
      deemed not recoverable. I am still trying to figure out the effects of
      this, and whether I can relax this requirement, but in the meantime
      let's install it and see what happens (won't affect many people).
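The retry loop described above can be sketched roughly as follows. This is an illustration in Python, not the assign_wrapper code: `run_assign` and `nalloc` are stand-in callables, and for simplicity this `nalloc` returns the set of nodes it could not allocate, whereas the real one returns only a count and the wrapper works out which nodes it got.

```python
def map_with_retries(run_assign, nalloc, user_fixed, maxtries):
    """Retry mapping, keeping whatever nodes were successfully allocated.

    run_assign(fixed) -> iterable of chosen nodes, or None on an
                         unretryable error (stand-in for running assign).
    nalloc(nodes)     -> set of nodes that could NOT be allocated.
    Returns the final set of nodes, or None if mapping failed.
    """
    user_fixed = set(user_fixed)
    fixed = set(user_fixed)   # grows as partial allocations succeed
    held = set()              # nodes already allocated on earlier tries
    for _ in range(maxtries):
        chosen = run_assign(frozenset(fixed))
        if chosen is None:
            return None       # unretryable assign error: stop right away
        wanted = set(chosen) - held
        failed = nalloc(wanted)
        held |= wanted - failed   # the real code marks these INIT_DIRTY
        if not failed:
            return set(chosen)
        if failed & user_fixed:
            return None       # a user-directed fixed node is unavailable
        fixed |= held         # next try must reuse what we already hold
    return None
```

Note that a `None` return after a partial allocation still leaves `held` nodes allocated, which is exactly the situation that makes a failed modify non-recoverable.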
  8. 30 Sep, 2003 3 commits
    • Init delaynodes and jail hosting nodes with startstatus=0 so that · 55db053e
      Leigh B. Stoller authored
      the batch system sees them as always done. There is no reason to do
      this from the node itself, since it would be really hard to have
      either a jail or delay node without other nodes in the topology!
    • Remove tiny bit of debugging code. · f3c381da
      Leigh B. Stoller authored
    • Up to now we have had two state variables associated with an experiment, · 4269dad1
      Leigh B. Stoller authored
      plus a lock field. The lock field was a simple "experiment locked, go away"
      slot that is easy to use when you do not care about the actual state that
      an experiment is in, just that it is in "transition" and should not be
      messed with.
      
      The other two state variables are "state" and "batchstate". The former
      (state) is the original variable that Chris added, and was used by the tb*
      scripts to make sure that the experiment was in the state each particular
      script wanted them to be in. But over time (and with the addition of so
      much wrapper goo around them), "state" has leaked out all over the place to
      determine what operations on an experiment are allowed, and if/when it
      should be displayed in various web pages. In addition to the usual
      "active", "swapped", etc., there is a set of transition states (like
      "swapping") that makes testing state a pain in the butt.
      
      I added the other state variable ("batchstate") when I did the batch
      system, obviously! It was intended as a wrapper state to control access to
      the batch queue, and to prevent batch experiments from being messed with
      except when it was really okay (for example, it's okay to terminate a
      swapped out batch experiment, but not a swapped in batch experiment since
      that would confuse the batch daemon). There are fewer of these states, plus
      one additional state for "modifying" experiments.
      
      So what I have done is change the system to use "batchstate" for all
      experiments to control entry into the swap system, from the web interface,
      from the command line, and from the batch daemon. The other state variable
      still exists, and will be brutally pushed back under the surface until it's
      just a vague memory, used only by the original tb* scripts. This will
      happen over time, and the "batchstate" variable will be renamed once I am
      convinced that this was the right thing to do and that my changes actually
      work as intended.
      
      Only people who have bothered to read this far will know that I also added
      the ability to cancel experiment swapin in progress. For that I am using
      the "canceled" flag (ah, this one was named properly from the start!), and
      I test that at various times in assign_wrapper and tbswap. A minor downside
      right now is that a canceled swapin looks too much like a failed swapin,
      and so tbops gets email about it. I'll fix that at some point (sometime
      after the boss complains).
      
      I also cleaned up various bits of code, replacing direct calls to exec
      with calls to the recently improved SUEXEC interface. This removes
      some cruft from each script that calls an external script.
      
      Cleaned up modifyexp.php3 quite a bit, reformatting and indenting.
      Also fixed it to not run the parser directly! This was very wrong; it should
      call nscheck instead. Changed to use "nobody" group instead of group
      flux (made the same change in nscheck).
      
      There is a script in the sql directory called newstates.pl. It needs
      to be run to initialize the batchstate slot of the experiments table
      for all existing experiments.
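The "test the canceled flag at various times" approach can be sketched as below. This is an illustrative Python outline, not the assign_wrapper or tbswap code; the step names and the `is_canceled` callable are assumptions. Raising a distinct exception is one possible way to keep a cancel from looking like a failure, not something this commit does.

```python
class SwapinCanceled(Exception):
    """Raised when the canceled flag is seen; distinct from a real failure."""


def run_swapin(steps, is_canceled):
    """Run (name, callable) steps in order, polling the flag before each."""
    done = []
    for name, step in steps:
        if is_canceled():
            raise SwapinCanceled(f"canceled before {name}")
        step()
        done.append(name)
    return done
```

A caller that catches `SwapinCanceled` separately from other exceptions could skip the failure email and go straight to cleanup.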
  9. 26 Sep, 2003 1 commit
  10. 18 Sep, 2003 1 commit
  11. 17 Sep, 2003 2 commits
  12. 16 Sep, 2003 1 commit
  13. 13 Sep, 2003 1 commit
  14. 12 Sep, 2003 1 commit
  15. 11 Sep, 2003 1 commit
    • Widearea changes, mostly started for plab but affects all widearea · a5e1e2ee
      Leigh B. Stoller authored
      nodes. The intent is to better support last mile types, which right
      now are a total mess because the types are associated with the virtual
      nodes, and a node can have just a single type. This info has now been
      moved into the node_types_auxtypes table and the node_types table.
      
      Anyway, we no longer use wanassign on unconnected widearea nodes, but
      use assign directly, much like we use assign to allocate virtual nodes
      on local physical nodes. ptopgen inserts the physical widearea nodes
      (even though they are allocated), and the proper counts for the types
      that are available on them. assign will pick the nodes, and
      assign_wrapper will allocate the necessary vnodes on the pnodes (but
      in the case of widearea nodes, the underlying physical node does not
      need to be allocated).
      
      Also added some fixes for dealing with vtypes when used in conjunction
      with widearea nodes. Rather than generating an error like it used to,
      you can create a vtype in your NS file:
      
      	tb-make-soft-vtype mytype {pcvroninet pcvwainet}
      	tb-set-hardware $v0 mytype
      
      and assign_wrapper now looks at the underlying types to figure out
      what it needs. Note: No consistency checking yet; mixing a remote/virt
      and a local/real type will break.
  16. 09 Sep, 2003 1 commit
  17. 02 Sep, 2003 1 commit
  18. 29 Aug, 2003 1 commit
    • Temporary patch to solve the non-connected veth interface problems, · 5f214b74
      Leigh B. Stoller authored
      which happens on lans of vnodes that are split between pnodes. assign
      spits out trivial links for the nodes collocated on the pnodes, but if
      there are two groups of vnodes on different pnodes, the connection is
      not explicit in the link statements that assign gives (technically,
      they should not be trivial links, but Rob is still thinking that
      over). Fortunately, I have enough info from assign to extend the vlan
      and to patch the veth interfaces afterwards. It's god-awful stuff, and
      I hope I can strip it out soon.
  19. 27 Aug, 2003 1 commit
    • Added "subnode" support, primarily for the IXPs. The main feature of · 4088258c
      Leigh B. Stoller authored
      subnodes is dealing with the hosting node, and the "fakelink" that
      we have to insert until assign is taught how to deal with hosting
      nodes. The fakelink causes assign to grab both nodes together.
      
      Also: * Minor cleanup.
            * Change control_net to control_iface.
            * Set the routertype to "manual" for jail hosts, and "none" for
              delay nodes. This is done in InitPnode.
            * Get rid of more shark code.
  20. 06 Aug, 2003 1 commit
  21. 05 Aug, 2003 1 commit
  22. 28 Jul, 2003 1 commit
  23. 15 Jul, 2003 1 commit
    • A set of changes to make swapmod work on jailed nodes (note, swapmod · 92ff875a
      Leigh B. Stoller authored
      does not yet work with remove virtual nodes; that will take even more
      work).
      
      Added a new allocstate called RES_TEARDOWN. assign_wrapper no longer
      deallocates unused nodes, but rather moves them into the new state for
      the wrapper (tbswap) to deal with. That's because deleted vnodes need to
      be torn down, since it's possible that the node on which they were
      living will not be deallocated (say, if there are other vnodes on
      it). We do not want to be doing that from assign_wrapper, so tbswap
      looks for those nodes.
      
      Made vnode_setup allocstate aware in the same way that os_setup is;
      do not reboot vnodes or try to set up vnodes when they are already in
      the RES_READY state, as they will be when doing a swapmod. In
      addition, if os_setup is going to reboot the underlying physnode, move
      the vnodes on that node into RES_READY too, since there they will
      set up automatically. Might need an interim state here, for correctness.
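The allocstate handling above can be sketched as two small helpers. This is a Python illustration under assumptions: the dictionary of allocstates and both function names are invented for the example, and only the state names RES_TEARDOWN and RES_READY come from the commit message.

```python
RES_READY = "RES_READY"
RES_TEARDOWN = "RES_TEARDOWN"


def mark_unused_for_teardown(old_vnodes, new_vnodes, allocstate):
    """Flag vnodes dropped by a swapmod instead of deallocating them.

    The wrapper (tbswap in the commit message) is then expected to look
    for nodes in RES_TEARDOWN and tear them down itself.
    """
    for v in old_vnodes:
        if v not in new_vnodes:
            allocstate[v] = RES_TEARDOWN
    return [v for v in old_vnodes if allocstate.get(v) == RES_TEARDOWN]


def plan_vnode_setup(vnodes, allocstate):
    """Split vnodes into those needing setup and those already RES_READY."""
    to_setup = [v for v in vnodes if allocstate.get(v) != RES_READY]
    skipped = [v for v in vnodes if allocstate.get(v) == RES_READY]
    return to_setup, skipped
```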
  24. 10 Jul, 2003 2 commits
    • Robert Ricci's avatar
      05c81c06
    • Split the -e option to ptopgen into -p/-e. Assign will always pass in · f5e2a9d3
      Leigh B. Stoller authored
      the pid so that ptopgen can do permission checks on the node types and
      classes before it sticks them into the ptop file. -e now takes just eid,
      but operates as before.
      
      Change the -v option slightly; assign passes -v option when it sees
      that the topology requires virtual nodes. Without -v, virtual nodes
      are not placed into the ptop file, saving about 6000 lines of node
      entries!
      
      Get rid of the pcvm and pc601 hacks, and replace with a permission
      check, as determined by the nodetypeXpid_permissions. I've updated
      that table to reflect current types/classes.
      
      Change Rob's last change, which was doing a DB query on each node,
      which when used with -v, was issuing a lot of queries. I was puzzled
      why ptopgen was taking 4 seconds to run!
      
      Kill more sharks.
      Clean up some terrible indenting.
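The per-node query problem is the familiar one-query-per-row pattern. A small Python sketch of the difference, using an in-memory style `sqlite3` connection; the `nodes` table and its `node_id` and `type` columns are made up for the example and are not the testbed schema.

```python
import sqlite3


def node_type_per_query(conn, node_id):
    """Slow pattern: one round trip to the database for each node."""
    row = conn.execute(
        "SELECT type FROM nodes WHERE node_id = ?", (node_id,)
    ).fetchone()
    return row[0] if row else None


def load_node_types(conn):
    """Faster pattern: fetch every node's type once, then look up in memory."""
    rows = conn.execute("SELECT node_id, type FROM nodes").fetchall()
    return dict(rows)
```

With thousands of virtual node entries, the second form replaces thousands of queries with one.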
  25. 08 Jul, 2003 2 commits
  26. 03 Jul, 2003 1 commit
  27. 01 Jul, 2003 2 commits
  28. 26 Jun, 2003 2 commits
    • Turn on lans for virtual nodes in assign_wrapper, now that Rob has · af86b3ba
      Leigh B. Stoller authored
      fixed the trivial_link problem for Lans.
      Add an option to ptopgen that says to include the various virtnode
      related things, so as not to slow down assign in the general case
      that virtnodes are not being used.
    • Major changes to the way assign handles LAN nodes. · 83cfa8ec
      Robert Ricci authored
      LAN nodes are no longer treated specially. Instead, I've introduced
      the idea of 'static' types (old-style types retroactively become
      'dynamic' types). While a pnode can only satisfy one dynamic type at a
      time, it can always satisfy its static types (assuming it has enough
      capacity left.) Static types are flagged by prepending them with a '*'
      in the ptop file. So, for example, you may give switches the
      '*lan:10000' type so that they can satisfy virtual LAN nodes. Of
      course, other pnodes can have this type too, so that we can get
      'trivial LANs'.
      
      Actually, removing special treatment for LANs cleans up a lot of code.
      However, it may have some negative impacts on solutions, since we're
      not as smart about where to place LAN nodes as we used to be (they get
      annealed along with everything else, and not migrated.) I haven't seen
      any evidence of this yet, however.
      
      This leaves us with a single type of special pnode, a switch.
      
      Also added a new bit of syntax in ptop files - when '*' is given as
      the maximum load for a type, the node is allowed to take on an
      infinite (well, actually, just a really big number of) vnodes of that
      type.
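A rough Python sketch of the static/dynamic distinction and the '*' load syntax. It assumes each type appears as a `name:maxload` token, as in the '*lan:10000' example; the rest of the ptop line format is not covered here. The function names are invented, and `math.inf` stands in for the "really big number" that assign actually uses.

```python
import math


def parse_ptop_type(token):
    """Parse one type token such as 'pc:1', '*lan:10000' or '*lan:*'."""
    is_static = token.startswith("*")
    if is_static:
        token = token[1:]
    name, _, load = token.partition(":")
    max_load = math.inf if load == "*" else int(load)
    return name, max_load, is_static


def can_host(types, active_dynamic, used, vtype):
    """Can a pnode take one more vnode of vtype?

    types:          {name: (max_load, is_static)} for this pnode
    active_dynamic: the dynamic type the pnode currently satisfies, or None
    used:           {name: vnodes of that type already placed here}
    """
    if vtype not in types:
        return False
    max_load, is_static = types[vtype]
    if not is_static and active_dynamic not in (None, vtype):
        return False   # already committed to a different dynamic type
    return used.get(vtype, 0) < max_load
```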
      
      ptopgen was modified to always report switches as being capable of
      hosting LANs, and assign_wrapper now understands direct links to LANs,
      which is what we get when the LAN is hosted directly on a switch.
      
      Fixed a bug in scoring direct links, in which the penalty was being
      added once when a direct link was mapped, but subtracted only once
      when it was freed.
      
      Added a '-T' option for doing simple self-testing. When adding a node
      to the solution, assign records the score, adds the node, removes it
      again, and checks to make sure that the resulting score is the same as
      the original score. The usefulness of this feature in debugging
      scoring problems cannot be overstated...
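The self-test idea (record the score, add, remove, compare) can be sketched in a few lines of Python. The `Solution` class here is a toy with a purely additive score, invented for the example; assign's real scoring is far more involved.

```python
class Solution:
    """Toy solution whose score is the sum of per-node costs."""

    def __init__(self):
        self.score = 0.0
        self.nodes = {}

    def add(self, name, cost):
        self.nodes[name] = cost
        self.score += cost

    def remove(self, name):
        self.score -= self.nodes.pop(name)


def add_with_selftest(sol, name, cost, tol=1e-9):
    """Add a node, but first check that add followed by remove is a no-op."""
    before = sol.score
    sol.add(name, cost)
    sol.remove(name)
    if abs(sol.score - before) > tol:
        raise RuntimeError(
            f"score drifted from {before} to {sol.score} for node {name}"
        )
    sol.add(name, cost)   # the real add, now that the round trip checked out
```

Any asymmetry between what is added on mapping and what is subtracted on freeing shows up as drift on the first affected node, which is how a mismatched penalty like the direct-link one gets caught.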
  29. 25 Jun, 2003 1 commit
  30. 24 Jun, 2003 1 commit
  31. 23 Jun, 2003 1 commit
  32. 19 Jun, 2003 1 commit
    • Add several controls from NS file. · 445edc6d
      Leigh B. Stoller authored
       * usewatunnels - Allow users to turn off widearea tunnels.
       * multiplex_factor - Allow user to specify vnode multiplex factor.
       * trivial_ok - Allow user to specify collocation okay for link.
      
      More work on the veth interface support and trivial link stuff.
      Appears to be operational and passes the test suite.