    Major overhaul to support thin snapshot volumes and also fixup locking. · a9e75f33
    Mike Hibler authored
    A "thin volume" is one in which storage allocation is done on demand; i.e.,
    space is not pre-allocated, hence the "thin" part. If thin snapshots and
    the associated base volume are all part of a "thin pool", then all snapshots
    and the base share blocks from that pool. If there are N snapshots of the
    base, and none have written a particular block, then there is only one copy
    of that block in the pool that everyone shares.
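
    For illustration, thin snapshots of a thin volume are created without a
    pre-allocated size (the VG and LV names here are made up, not from this
    commit):

        # Each snapshot shares the base's blocks until it writes to them
        lvcreate -s -n snap1 xen-vg/base
        lvcreate -s -n snap2 xen-vg/base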
    
    Anyway, we now create a global thin pool in which the thin snapshots can be
    created. We currently allocate up to 75% of the available space in the VG
    to the pool (note: space allocated to the thin pool IS statically allocated).
    The other 25% is reserved for Things That Will Not Be Shared and serves
    as a fallback in case something on the thin-volume path fails. That is,
    we can disable thin volume creation and go back to the standard path.
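
    In LVM terms the pool creation amounts to something like this (pool and
    VG names hypothetical; the actual code computes the size itself):

        # Statically allocate 75% of the VG's free space to the thin pool
        lvcreate --type thin-pool -l 75%FREE -n thinpool xen-vg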
    
    Images are still downloaded and saved in compressed form in individual
    LVs. These LVs are not allocated from the pool since they are TTWNBS.
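
    I.e., each compressed image lives in an ordinary, fully allocated LV
    (name and size here are hypothetical):

        lvcreate -L 1G -n image-fbsd xen-vg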
    
    When the first vnode comes along that needs an image, we imageunzip the
    compressed version to create a "golden disk" LV in the pool. That first
    vnode and all subsequent vnodes get thin snapshots of that volume.
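
    Roughly, the golden-disk path looks like this (names, sizes, and the
    exact imageunzip invocation are illustrative, not lifted from the code):

        # Create a thin LV in the pool and decompress the image into it
        lvcreate -T xen-vg/thinpool -V 10G -n golden
        imageunzip /dev/xen-vg/image-fbsd /dev/xen-vg/golden
        # Each vnode then gets a thin snapshot of the golden disk
        lvcreate -s -n vnode1.disk xen-vg/golden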
    
    When the last vnode that uses a golden disk goes away we...well,
    do nothing. Unless $REAP_GDS (linux/xen/libvnode_xen.pm) is set non-zero,
    in which case we reap the golden disk. We always leave the compressed
    image LV around. Leigh says he is going to write a daemon to GC all these
    things when we start to run short of VG space...
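
    Reaping presumably boils down to an lvremove once no snapshots reference
    the golden disk; a hypothetical sketch (field names per lvs(8)):

        # If no thin snapshots of the golden disk remain, remove it;
        # the compressed image LV is left alone.
        if [ -z "$(lvs --noheadings -o lv_name --select 'origin=golden' xen-vg)" ]; then
            lvremove -f xen-vg/golden
        fi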
    
    This speedup in the creation of vnodes that share an image turned up some
    more race conditions, particularly around iptables. I closed a couple more
    holes (in particular, ensuring that we lock iptables when setting up
    enet interfaces as we do for the cnet interface) and added some optional
    lock debug logging (turned off right now).
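
    The locking pattern, sketched in shell (the real code is Perl, and the
    lock file path here is made up):

        (
            flock -x 9
            # ... iptables manipulation goes here, e.g.:
            iptables -A FORWARD -i xenbr0 -j ACCEPT
        ) 9>/var/run/iptables.lock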
    
    I timestamped those messages, and a variety of other important ones,
    so that we could merge (the important parts of) the assorted logfiles
    and get a sequential picture of what happened:
    
        grep TIMESTAMP *.log | sort +2
    
    (Think of it as Weir lite!)
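
    (Note: "sort +2" is the old field-offset syntax; with modern coreutils
    the equivalent is "sort -k 3", i.e. sort starting at the third
    whitespace-separated field.)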