1. 20 Oct, 2014 5 commits
    • block: Connect BlockBackend and DriveInfo · 18e46a03
      Markus Armbruster authored
      Make the BlockBackend own the DriveInfo.  Change blockdev_init() to
      return the BlockBackend instead of the DriveInfo.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Connect BlockBackend to BlockDriverState · 7e7d56d9
      Markus Armbruster authored
      Convenience function blk_new_with_bs() creates a BlockBackend with its
      BlockDriverState.  Callers have to unref both.  The commit after next
      will relieve them of the need to unref the BlockDriverState.
      
      Complication: due to the silly way drive_del works, we need a way to
      hide a BlockBackend, just like bdrv_make_anon().  To emphasize its
      "special" status, give the function a suitably off-putting name:
      blk_hide_on_behalf_of_do_drive_del().  Unfortunately, hiding turns the
      BlockBackend's name into the empty string.  Can't avoid that without
      breaking the blk->bs->device_name equals blk->name invariant.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: New BlockBackend · 26f54e9a
      Markus Armbruster authored
      A block device consists of a frontend device model and a backend.
      
      A block backend has a tree of block drivers doing the actual work.
      The tree is managed by the block layer.
      
      We currently use a single abstraction BlockDriverState both for tree
      nodes and the backend as a whole.  Drawbacks:
      
      * Its API includes both stuff that makes sense only at the block
        backend level (root of the tree) and stuff that's only for use
        within the block layer.  This makes the API bigger and more complex
        than necessary.  Moreover, it's not obvious which interfaces are
        meant for device models, and which really aren't.
      
      * Since device models keep a reference to their backend, the backend
        object can't just be destroyed.  But for media change, we need to
        replace the tree.  Our solution is to make the BlockDriverState
        generic, with actual driver state in a separate object, pointed to
        by member opaque.  That lets us replace the tree by deinitializing
        and reinitializing its root.  This special need of the root makes
        the data structure awkward everywhere in the tree.
      
      The general plan is to separate the APIs into "block backend", for use
      by device models, monitor and whatever other code dealing with block
      backends, and "block driver", for use by the block layer and whatever
      other code (if any) dealing with trees and tree nodes.
      
      Code dealing with block backends, device models in particular, should
      become completely oblivious of BlockDriverState.  This should let us
      clean up both APIs, and the tree data structures.
      
      This commit is a first step.  It creates a minimal "block backend"
      API: type BlockBackend and functions to create, destroy and find them.
      
      BlockBackend objects are created and destroyed exactly when root
      BlockDriverState objects are created and destroyed.  "Root" in the
      sense of "in bdrv_states".  They're not yet used for anything; that'll
      come shortly.
      
      A root BlockDriverState is created with bdrv_new_root(), so where to
      create a BlockBackend is obvious.  Where these roots get destroyed
      isn't always as obvious.
      
      It is obvious in qemu-img.c, qemu-io.c and qemu-nbd.c, and in error
      paths of blockdev_init() and blk_connect().  That leaves destruction of
      objects successfully created by blockdev_init() and blk_connect().
      
      blockdev_init() is used only by drive_new() and qmp_blockdev_add().
      Objects created by the latter are currently indestructible (see commit
      48f364dd "blockdev: Refuse to drive_del something added with
      blockdev-add" and commit 2d246f01 "blockdev: Introduce
      DriveInfo.enable_auto_del").  Objects created by the former get
      destroyed by drive_del().
      
      Objects created by blk_connect() get destroyed by blk_disconnect().
      
      BlockBackend is reference-counted.  Its reference count never exceeds
      one so far, but that's going to change.
      
      In drive_del(), the BB's reference count is surely one now.  The BDS's
      reference count is greater than one when something else is holding a
      reference, such as a block job.  In this case, the BB is destroyed
      right away, but the BDS lives on until all extra references get
      dropped.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
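
The drive_del() case described above can be sketched with two independently reference-counted objects. This is a toy model, not QEMU code: ToyBackend and ToyNode stand in for BlockBackend and BlockDriverState, and every toy_* name is made up for illustration. It only shows why the backend can vanish immediately while the node survives until a second holder (here, a pretend block job) drops its reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not QEMU's real types. */
typedef struct ToyNode { int refcnt; } ToyNode;        /* plays the BDS role */
typedef struct ToyBackend { int refcnt; } ToyBackend;  /* plays the BB role */

int toy_nodes_live;     /* ToyNode objects not yet freed */
int toy_backends_live;  /* ToyBackend objects not yet freed */

ToyNode *toy_node_new(void)
{
    ToyNode *bs = calloc(1, sizeof(*bs));
    if (!bs) {
        abort();
    }
    bs->refcnt = 1;
    toy_nodes_live++;
    return bs;
}

void toy_node_ref(ToyNode *bs)
{
    bs->refcnt++;
}

void toy_node_unref(ToyNode *bs)
{
    if (--bs->refcnt == 0) {
        toy_nodes_live--;
        free(bs);
    }
}

ToyBackend *toy_backend_new(void)
{
    ToyBackend *blk = calloc(1, sizeof(*blk));
    if (!blk) {
        abort();
    }
    blk->refcnt = 1;
    toy_backends_live++;
    return blk;
}

void toy_backend_unref(ToyBackend *blk)
{
    if (--blk->refcnt == 0) {
        toy_backends_live--;
        free(blk);
    }
}

/* Assumed shape of the deletion path: drop one reference on each object. */
void toy_drive_del(ToyBackend *blk, ToyNode *bs)
{
    toy_backend_unref(blk);
    toy_node_unref(bs);
}
```

If a job calls toy_node_ref() before toy_drive_del() runs, the backend is freed at once and the node is freed only when the job later calls toy_node_unref().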
    • block: Split bdrv_new_root() off bdrv_new() · e4e9986b
      Markus Armbruster authored
      Creating an anonymous BDS can't fail.  Make that obvious.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  2. 03 Oct, 2014 5 commits
  3. 25 Sep, 2014 4 commits
  4. 11 Sep, 2014 1 commit
    • blockdev: Refuse to drive_del something added with blockdev-add · 48f364dd
      Markus Armbruster authored
      For some device models, the guest can prevent unplug.  Some users need a
      way to forcibly revoke device model access to the block backend then, so
      the underlying images can be safely used for something else.
      
      drive_del lets you do that.  Unfortunately, it conflates revoking access
      with destroying the backend.
      
      Commit 9063f814 made drive_del immediately destroy the root BDS.  Nice:
      the device name becomes available for reuse immediately.  Not so nice:
      the device model's pointer to the root BDS dangles, and we're prone to
      crash when the memory gets reused.
      
      Commit d22b2f41 fixed that by hiding the root BDS instead of destroying
      it.  Destruction only happens on unplug.  "Hiding" means removing it
      from bdrv_states and graph_bdrv_states; see bdrv_make_anon().
      
      This "destroy on revoke" is a misfeature we don't want to carry
      forward to blockdev-add, just like "destroy on unplug" (commit
      2d246f01).  So make drive_del fail on anything added with blockdev-add.
      
      We'll add separate QMP commands to revoke device model access and to
      destroy backends.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  5. 08 Sep, 2014 1 commit
  6. 29 Aug, 2014 2 commits
  7. 20 Aug, 2014 2 commits
    • block: acquire AioContext in qmp_block_resize() · 927e0e76
      Stefan Hajnoczi authored
      Make block_resize safe for dataplane where another thread may be running
      the BlockDriverState's AioContext.
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Use g_new() & friends where that makes obvious sense · 5839e53b
      Markus Armbruster authored
      g_new(T, n) is neater than g_malloc(sizeof(T) * n).  It's also safer,
      for two reasons.  One, it catches multiplication overflowing size_t.
      Two, it returns T * rather than void *, which lets the compiler catch
      more type errors.
      
      Patch created with Coccinelle, with two manual changes on top:
      
      * Add const to bdrv_iterate_format() to keep the types straight
      
      * Convert the allocation in bdrv_drop_intermediate(), which Coccinelle
        inexplicably misses
      
      Coccinelle semantic patch:
      
          @@
          type T;
          @@
          -g_malloc(sizeof(T))
          +g_new(T, 1)
          @@
          type T;
          @@
          -g_try_malloc(sizeof(T))
          +g_try_new(T, 1)
          @@
          type T;
          @@
          -g_malloc0(sizeof(T))
          +g_new0(T, 1)
          @@
          type T;
          @@
          -g_try_malloc0(sizeof(T))
          +g_try_new0(T, 1)
          @@
          type T;
          expression n;
          @@
          -g_malloc(sizeof(T) * (n))
          +g_new(T, n)
          @@
          type T;
          expression n;
          @@
          -g_try_malloc(sizeof(T) * (n))
          +g_try_new(T, n)
          @@
          type T;
          expression n;
          @@
          -g_malloc0(sizeof(T) * (n))
          +g_new0(T, n)
          @@
          type T;
          expression n;
          @@
          -g_try_malloc0(sizeof(T) * (n))
          +g_try_new0(T, n)
          @@
          type T;
          expression p, n;
          @@
          -g_realloc(p, sizeof(T) * (n))
          +g_renew(T, p, n)
          @@
          type T;
          expression p, n;
          @@
          -g_try_realloc(p, sizeof(T) * (n))
          +g_try_renew(T, p, n)
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Reviewed-by: Max Reitz <mreitz@redhat.com>
      Reviewed-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
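
The two safety arguments in the message above (overflow checking and a typed result) can be illustrated without GLib. The following is a minimal sketch in plain C; toy_try_malloc_n() and TOY_TRY_NEW() are hypothetical names, not GLib API. It returns NULL on overflow, which resembles the g_try_new() family rather than g_new(), which aborts instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate room for n elements of elem_size bytes each.
 * Returns NULL if n * elem_size would overflow size_t, instead of
 * silently allocating a too-small buffer as malloc(sizeof(T) * n) would.
 */
void *toy_try_malloc_n(size_t n, size_t elem_size)
{
    if (elem_size != 0 && n > SIZE_MAX / elem_size) {
        return NULL;
    }
    return malloc(n * elem_size);
}

/*
 * Typed wrapper: the result is T * rather than void *, so assigning it
 * to a pointer of a different type draws a compiler diagnostic.
 */
#define TOY_TRY_NEW(T, n) ((T *)toy_try_malloc_n((n), sizeof(T)))
```

With this sketch, `int *v = TOY_TRY_NEW(int, 4);` yields an array of four ints, while `TOY_TRY_NEW(int, SIZE_MAX)` yields NULL because the byte count cannot be represented.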
  8. 01 Jul, 2014 4 commits
    • block: add backing-file option to block-stream · 13d8cc51
      Jeff Cody authored
      On some image chains, QEMU may not always be able to resolve the
      filenames properly, when updating the backing file of an image
      after a block job.
      
      For instance, certain relative pathnames may fail, or drives may
      have been specified originally by file descriptor (e.g. /dev/fd/???),
      or a relative protocol pathname may have been used.
      
      In these instances, QEMU may lack the information to be able to make
      the correct choice, but the user or management layer most likely does
      have that knowledge.
      
      With this extension to the block-stream API, the user is able to change
      the backing file of the active layer as part of the block-stream
      operation.
      
      This allows the change to be 'safe', in the sense that if the attempt
      to write the active image metadata fails, then the block-stream
      operation returns failure, without disrupting the guest.
      
      If a backing file string is not specified in the command, the backing
      file string to use is determined in the same manner as it was
      previously.
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: extend block-commit to accept a string for the backing file · 54e26900
      Jeff Cody authored
      On some image chains, QEMU may not always be able to resolve the
      filenames properly, when updating the backing file of an image
      after a block commit.
      
      For instance, certain relative pathnames may fail, or drives may
      have been specified originally by file descriptor (e.g. /dev/fd/???),
      or a relative protocol pathname may have been used.
      
      In these instances, QEMU may lack the information to be able to make
      the correct choice, but the user or management layer most likely does
      have that knowledge.
      
      With this extension to the block-commit API, the user is able to change
      the backing file of the overlay image as part of the block-commit
      operation.
      
      This allows the change to be 'safe', in the sense that if the attempt
      to write the overlay image metadata fails, then the block-commit
      operation returns failure, without disrupting the guest.
      
      If the commit top is the active layer, then specifying the backing
      file string will be treated as an error (there is no overlay image
      to modify in that case).
      
      If a backing file string is not specified in the command, the backing
      file string to use is determined in the same manner as it was
      previously.
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Jeff Cody <jcody@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
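
The selection rules in the block-commit message above can be written down as a small decision function. This is only a sketch of the stated rules, not the QEMU implementation; toy_choose_backing_str() and its parameters are hypothetical, and how the previous default string is computed is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Decide which backing file string to record in the overlay image.
 *
 *   top_is_active     nonzero if the commit top is the active layer
 *   requested         string supplied by the user, or NULL if omitted
 *   previous_default  string that would have been used before this option
 *   out               receives the chosen string, or NULL if there is
 *                     no overlay image to modify
 *
 * Returns 0 on success, -1 if a string was supplied although the top is
 * the active layer.
 */
int toy_choose_backing_str(int top_is_active, const char *requested,
                           const char *previous_default, const char **out)
{
    *out = NULL;
    if (top_is_active) {
        /* No overlay exists above the active layer. */
        return requested != NULL ? -1 : 0;
    }
    *out = requested != NULL ? requested : previous_default;
    return 0;
}
```

Rejecting the request up front, before any image metadata is touched, is what keeps the failure case from disrupting the guest in this toy model.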
    • block: add QAPI command to allow live backing file change · fa40e656
      Jeff Cody authored
      This allows a user to make a live change to the backing file recorded in
      an open image.
      
      The image file to modify can be specified 2 ways:
      
      1) image filename
      2) image node-name
      
      Note: this does not cause the backing file itself to be reopened; it
      merely changes the backing filename in the image file structure, and
      in internal BDS structures.
      
      It is the responsibility of the user to pass a filename string that
      can be resolved when the image chain is reopened, and the filename
      string is not validated.
      
      A good analogy for this command is that it is a live version of
      'qemu-img rebase -u', with respect to changing the backing file string.
      
      [Jeff is offline so I respun this patch in his absence.  Dropped image
      filename since using node-name is preferred and this is a new command.
      No need to introduce the limitations of finding images by filename.
      --Stefan]
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
    • block: make 'top' argument to block-commit optional · 7676e2c5
      Jeff Cody authored
      Now that active layer block-commit is supported, the 'top' argument
      no longer needs to be mandatory.
      
      Change it to optional, with the default being the active layer in the
      device chain.
      
      [kwolf: Rebased and resolved conflict in tests/qemu-iotests/040]
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Benoit Canet <benoit@irqsave.net>
      Signed-off-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
  9. 27 Jun, 2014 3 commits
  10. 23 Jun, 2014 1 commit
  11. 16 Jun, 2014 3 commits
  12. 04 Jun, 2014 1 commit
  13. 30 May, 2014 2 commits
  14. 28 May, 2014 4 commits
  15. 19 May, 2014 2 commits
    • block: optimize zero writes with bdrv_write_zeroes · 465bee1d
      Peter Lieven authored
      This patch tries to optimize zero write requests
      by automatically using bdrv_write_zeroes if it is
      supported by the format.
      
      This significantly speeds up file system initialization and
      should speed up the zero write tests used to measure backend
      storage performance.
      
      I ran the following 2 tests on my internal SSD with a
      50G QCOW2 container and on an attached iSCSI storage.
      
      a) mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/vdX
      
      QCOW2          [off]      [on]       [unmap]
      --------------------------------------------
      runtime:       14 s       1.1 s      1.1 s
      filesize:      937M       18M        18M
      
      iSCSI          [off]      [on]       [unmap]
      --------------------------------------------
      runtime:       9.3 s      0.9 s      0.9 s
      
      b) dd if=/dev/zero of=/dev/vdX bs=1M oflag=direct
      
      QCOW2          [off]      [on]       [unmap]
      --------------------------------------------
      runtime:       246 s      18 s       18 s
      filesize:      51G        192K       192K
      throughput:    203M/s     2.3G/s     2.3G/s
      
      iSCSI*         [off]      [on]       [unmap]
      --------------------------------------------
      runtime:       8 min      45 s       33 s
      throughput:    106M/s     1.2G/s     1.6G/s
      allocated:     100%       100%       0%
      
      * The storage was connected via a 1Gbit interface.
        It seems to handle writing zeroes via WRITESAME16
        very fast internally.
      Signed-off-by: Peter Lieven <pl@kamp.de>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
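
The idea of the zero-write optimization above can be sketched as a classification step in front of the write path. This is an illustration under assumptions, not the patch itself: the three modes mirror the [off], [on] and [unmap] columns of the measurements, and all toy_* and TOY_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Modes matching the [off] / [on] / [unmap] columns above (assumed meaning). */
typedef enum {
    TOY_DETECT_OFF,
    TOY_DETECT_ON,
    TOY_DETECT_UNMAP
} ToyDetectZeroes;

/* What the write path should do with a request. */
typedef enum {
    TOY_WRITE_DATA,          /* ordinary write of the payload */
    TOY_WRITE_ZEROES,        /* efficient "write zeroes" operation */
    TOY_WRITE_ZEROES_UNMAP   /* write zeroes and allow deallocation */
} ToyWriteKind;

/* Return 1 if every byte of buf is zero, 0 otherwise. */
int toy_buffer_is_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return 0;
        }
    }
    return 1;
}

/*
 * Classify a write request. The zero path is taken only when detection
 * is enabled, the format supports a zero-write operation, and the
 * payload really is all zeroes.
 */
ToyWriteKind toy_classify_write(ToyDetectZeroes mode, int format_supports_zero,
                                const unsigned char *buf, size_t len)
{
    if (mode == TOY_DETECT_OFF || !format_supports_zero ||
        !toy_buffer_is_zero(buf, len)) {
        return TOY_WRITE_DATA;
    }
    return mode == TOY_DETECT_UNMAP ? TOY_WRITE_ZEROES_UNMAP
                                    : TOY_WRITE_ZEROES;
}
```

The byte-by-byte scan is the simplest correct check; a production version would presumably use a faster word-wise or vectorized comparison.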
    • blockdev: add a function to parse enum ids from strings · 82a402e9
      Peter Lieven authored
      This adds a generic function to recover the enum id of a parameter
      given as a string.
      Signed-off-by: Peter Lieven <pl@kamp.de>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
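
A string-to-enum lookup of the kind described above might look like the following. This is a guess at the general shape only; toy_parse_enum() and its signature are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Map a string to its index in a lookup table of enum names.
 *
 *   lookup  table of max strings; index i names enum value i
 *   str     string to parse, or NULL if the parameter was not given
 *   max     number of entries in lookup
 *   def     value to return when str is NULL
 *
 * Returns the matching index, def for a missing parameter, or -1 when
 * the string matches no entry.
 */
int toy_parse_enum(const char *const lookup[], const char *str,
                   int max, int def)
{
    if (str == NULL) {
        return def;
    }
    for (int i = 0; i < max; i++) {
        if (strcmp(lookup[i], str) == 0) {
            return i;
        }
    }
    return -1;
}
```

With a table such as `{ "off", "on", "unmap" }`, the string "unmap" maps to index 2 and an unknown string maps to -1, which the caller can turn into an error.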