  1. 14 Mar, 2015 1 commit
  2. 13 Mar, 2015 1 commit
    • MFS fixups. · e12934a2
      Mike Hibler authored
      Make sure we clear out any partial GPTs. On FreeBSD we just use
      "gpart destroy" which will get rid of an MBR or GPT.
      
      Tee the output of rc.frisbee into a file and upload that back to
      boss in the event of a failure. We will see if this proves useful.
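
      A rough sketch of the two changes, assuming the boot disk is da0 and
      the log goes to /var/tmp (names and paths here are illustrative, not
      necessarily the ones rc.frisbee actually uses):

          # wipe any MBR or (partial) GPT before reloading; -F forces the
          # destroy even if the table still contains partitions
          gpart destroy -F da0

          # capture rc.frisbee's output so it can be sent back to boss if
          # the reload fails (the upload step itself is not shown)
          sh /etc/testbed/rc.frisbee 2>&1 | tee /var/tmp/frisbee.log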
  3. 10 Mar, 2015 1 commit
  4. 06 Mar, 2015 2 commits
  5. 05 Mar, 2015 3 commits
  6. 04 Mar, 2015 3 commits
  7. 03 Mar, 2015 3 commits
  8. 25 Feb, 2015 2 commits
  9. 24 Feb, 2015 2 commits
  10. 23 Feb, 2015 1 commit
  11. 20 Feb, 2015 2 commits
  12. 19 Feb, 2015 5 commits
  13. 17 Feb, 2015 1 commit
    • Major overhaul to support thin snapshot volumes and also fixup locking. · a9e75f33
      Mike Hibler authored
      A "thin volume" is one in which storage allocation is done on demand; i.e.,
      space is not pre-allocated, hence the "thin" part. If thin snapshots and
      the associated base volume are all part of a "thin pool", then all snapshots
      and the base share blocks from that pool. If there are N snapshots of the
      base, and none have written a particular block, then there is only one copy
      of that block in the pool that everyone shares.
      
      Anyway, we now create a global thin pool in which the thin snapshots can be
      created. We currently allocate up to 75% of the available space in the VG
      to the pool (note: space allocated to the thin pool IS statically allocated).
      The other 25% is for Things That Will Not Be Shared and as fallback in case
      something on the thin volume path fails. That is, we can disable thin
      volume creation and go back to the standard path.
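
      As a rough illustration (the VG and pool names are made up), the pool
      setup amounts to something like:

          # hand 75% of the VG's free space to a statically allocated thin
          # pool; thin volumes and their snapshots share blocks from it
          lvcreate -l 75%FREE --type thin-pool -n thinpool xen-vg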
      
      Images are still downloaded and saved in compressed form in individual
      LVs. These LVs are not allocated from the pool since they are TTWNBS.
      
      When the first vnode comes along that needs an image, we imageunzip the
      compressed version to create a "golden disk" LV in the pool. That first
      node and all subsequent nodes get thin snapshots of that volume.
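
      A hedged sketch of that flow, with made-up LV names and sizes (the
      real code in libvnode_xen.pm may differ in detail):

          # one time per image: a thin volume in the pool becomes the
          # "golden disk", filled by decompressing the image LV into it
          lvcreate -V 10g -T xen-vg/thinpool -n golden.FBSD102
          imageunzip /dev/xen-vg/image.FBSD102 /dev/xen-vg/golden.FBSD102

          # per vnode: a writable thin snapshot of the golden disk
          # (-kn so the snapshot is not created with activation skipped)
          lvcreate -s -kn -n pcvm1-1.disk xen-vg/golden.FBSD102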
      
      When the last vnode that uses a golden disk goes away we...well,
      do nothing. Unless $REAP_GDS (linux/xen/libvnode_xen.pm) is set non-zero,
      in which case we reap the golden disk. We always leave the compressed
      image LV around. Leigh says he is going to write a daemon to GC all these
      things when we start to run short of VG space...
      
      This speedup in the creation of vnodes that share an image turned up some
      more race conditions, particularly around iptables. I closed a couple more
      holes (in particular, ensuring that we lock iptables when setting up
      enet interfaces as we do for the cnet interface) and added some optional
      lock debug logging (turned off right now).
      
      Timestamped those messages and a variety of other important messages
      so that we could merge (important parts of) the assorted logfiles and
      get a sequential picture of what happened:
      
          grep TIMESTAMP *.log | sort +2
      
      (Think of it as Weir lite!)
  14. 10 Feb, 2015 1 commit
  15. 01 Feb, 2015 2 commits
  16. 30 Jan, 2015 1 commit
    • Preliminary "golden image" support using thin volumes in LVM. · 660b8e45
      Mike Hibler authored
      Disabled for now. This is a checkpoint. This version still downloads
      the compressed image into a volume and imageunzips into another volume.
      The difference is that only one client does the imageunzip and then
      everyone makes a snapshot of that.
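
      A hypothetical sketch of that arbitration (lock file, LV, and image
      names are invented; the real locking lives in the clientside code):

          # the first client to get the lock creates the golden volume and
          # unzips into it; later clients find it present and just snapshot
          flock /var/run/golden.FBSD102.lock -c '
              lvs xen-vg/golden.FBSD102 >/dev/null 2>&1 || {
                  lvcreate -L 10g -n golden.FBSD102 xen-vg &&
                  imageunzip /dev/xen-vg/image.FBSD102 /dev/xen-vg/golden.FBSD102
              }'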
      
      On to getting rid of the initial download of the compressed image...
  17. 28 Jan, 2015 1 commit
    • Implement "plan 1" for dataset sharing: "ephemeral RO snapshots". · 7aefdaa1
      Mike Hibler authored
      You can now simultaneously RW and RO map a dataset because all the RO
      mappings use copies (clones) of a snapshot. Only a single RW mapping
      of course.
      
      When the RW mapping swaps out it automatically creates a new snapshot.
      So there is currently no user control over when a version of the dataset
      is "published"; it just happens every time you swap out an experiment with
      a RW mapping.
      
      A new RW mapping does not affect current RO mappings, of course, as they
      continue to use whatever snapshot they were created with. New RO mappings
      will get the most recent snapshot, which we currently track in the DB via
      the per-lease attribute "last_snapshot".
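
      A hedged sketch of the flow, assuming a ZFS-backed lease (pool,
      dataset, and snapshot names below are made up):

          # at RW swapout: publish a new version of the dataset
          zfs snapshot tank/proj/dset-123@V3

          # each new RO mapping gets its own ephemeral clone of the most
          # recent snapshot (the one named by "last_snapshot")
          zfs clone tank/proj/dset-123@V3 tank/proj/dset-123.ro-exp1

          # when that RO mapping goes away, so does its clone
          zfs destroy tank/proj/dset-123.ro-exp1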
      
      You can also now declare a lease to be "exclusive use" by setting the
      "exclusive_use" lease attribute (via modlease). This means that it follows
      the old semantics of only one mapping at a time, whether it be RO or RW.
      This is an alternative to the "simultaneous_ro_datasets" sitevar which
      enforces the old behavior globally. Primarily, I put this attribute in to
      prevent an unexpected failure in the snapshot/clone path from wreaking
      havoc over time. I don't know if there is any value in exposing this to
      the user.
  18. 26 Jan, 2015 1 commit
  19. 22 Jan, 2015 1 commit
  20. 21 Jan, 2015 1 commit
  21. 16 Jan, 2015 1 commit
    • The Joy of Sed. · a861dd6c
      Mike Hibler authored
      We were parsing the disk number out with:
      
        dunit=`echo $disk | sed -e 's/..\([0-7]\)/\1/'`
      
      which works fine with "ad0", giving a dunit of "0",
      but with "ada0" you get a dunit of "a0".
      
      And after being run through another sed command:

        dunit=`echo $dunit | sed -e 'y/01234567/abcdefgh/'`
      
      we get "aa" instead of "a". Append that to "sd" and your disk
      becomes "sdaa" instead of "sda". Next thing you know I'm blaming
      Emacs for inserting an extra character in /etc/fstab!
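
      One way to make the parsing robust for both forms (an illustration
      only, not necessarily the exact fix that was committed):

          # anchor the pattern so only the trailing digit survives:
          # "ad0" -> "0" and "ada0" -> "0"
          dunit=`echo $disk | sed -e 's/^[a-z]*\([0-7]\)$/\1/'`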
  22. 12 Jan, 2015 1 commit
  23. 09 Jan, 2015 3 commits