1. 22 Aug, 2013 1 commit
    • block: Introduce bs->zero_beyond_eof · 0d51b4de
      Asias He authored
      In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when
      protocols reading beyond end of file), we break qemu-iotests ./check
      -qcow2 022. This happens because qcow2 temporarily sets ->growable = 1
      for vmstate accesses (which are stored beyond the end of regular image
      data).
      We introduce bs->zero_beyond_eof to allow qcow2_load_vmstate() to
      disable ->zero_beyond_eof temporarily in addition to enabling ->growable.
      [Since the broken patch "block: Produce zeros when protocols reading
      beyond end of file" has not been merged yet, I have applied this fix
      *first* and will then apply the next patch to keep the tree bisectable.
      -- Stefan]
      Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Asias He <asias@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
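The interaction of the two flags can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDev`, `toy_read()` and `toy_load_vmstate()` are hypothetical names, not QEMU code, and the real logic lives in the block layer and qcow2 driver. The model assumes that raw storage may hold bytes past the nominal image end (as vmstate does), that `growable` permits requests past that end, and that `zero_beyond_eof` replaces the out-of-range part of a read with zeros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a block device; not QEMU's BlockDriverState. */
typedef struct {
    const unsigned char *storage; /* raw bytes, including data past the image end */
    size_t storage_len;           /* size of the raw storage */
    size_t image_len;             /* nominal end of the image */
    bool growable;                /* requests past image_len are permitted */
    bool zero_beyond_eof;         /* reads past image_len yield zeros */
} ToyDev;

/* Read n bytes at off. Returns 0 on success, -1 if the request is rejected. */
static int toy_read(const ToyDev *d, size_t off, unsigned char *buf, size_t n)
{
    size_t end = off + n;

    if (end > d->image_len && !d->growable) {
        return -1; /* request beyond the image end is not allowed */
    }
    if (d->zero_beyond_eof) {
        size_t in_image = off < d->image_len ? d->image_len - off : 0;
        if (in_image > n) {
            in_image = n;
        }
        if (in_image > 0) {
            memcpy(buf, d->storage + off, in_image);
        }
        memset(buf + in_image, 0, n - in_image); /* zero-fill the tail */
        return 0;
    }
    if (end > d->storage_len) {
        return -1; /* nothing stored there at all */
    }
    if (n > 0) {
        memcpy(buf, d->storage + off, n); /* pass through to raw storage */
    }
    return 0;
}

/* Read data stored past the image end by toggling both flags temporarily. */
static int toy_load_vmstate(ToyDev *d, size_t pos, unsigned char *buf, size_t n)
{
    bool saved_growable = d->growable;
    bool saved_zero = d->zero_beyond_eof;
    int ret;

    d->growable = true;
    d->zero_beyond_eof = false;
    ret = toy_read(d, d->image_len + pos, buf, n);
    d->growable = saved_growable;
    d->zero_beyond_eof = saved_zero;
    return ret;
}
```

In this model, enabling `growable` alone would still return zeros for the vmstate area, which is the kind of breakage the commit message describes.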
  2. 26 Jul, 2013 1 commit
    • Implement sync modes for drive-backup. · fc5d3f84
      Ian Main authored
      This patch adds sync-modes to the drive-backup interface and
      implements the FULL, NONE and TOP modes of synchronization.
      FULL performs as before, copying the entire contents of the drive
      while preserving the point-in-time using CoW.
      NONE only copies new writes to the target drive.
      TOP copies changes to the topmost drive image and preserves the
      point-in-time using CoW.
      For sync mode TOP we create a new target image using the same backing
      file as the original disk image.  Then any new data that has been laid
      on top of it since creation is copied in the main backup_run() loop.
      There is an extra check in the 'TOP' case so that we don't bother to copy
      all the data of the backing file as it already exists in the target.
      This is where the bdrv_co_is_allocated() is used to determine if the
      data exists in the topmost layer or below.
      Also any new data being written is intercepted via the write_notifier
      hook which ends up calling backup_do_cow() to copy old data out before
      it gets overwritten.
      For mode 'NONE' we create the new target image and only copy in the
      original data from the disk image starting from the time the call was
      made.  This preserves the point in time data by only copying the parts
      that are *going to change* to the target image.  This way we can
      reconstruct the final image by checking to see if the given block exists
      in the new target image first, and if it does not, you can get it from
      the original image.  This is basically an optimization allowing you to
      do point-in-time snapshots with low overhead vs the 'FULL' version.
      Since there is no old data to copy out, the loop in backup_run() for the
      NONE case just calls qemu_coroutine_yield() which only wakes up after
      an event (usually cancel in this case).  The rest is handled by the
      before_write notifier which again calls backup_do_cow() to write out
      the old data so it can be preserved.
      Signed-off-by: Ian Main <imain@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
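The per-mode decision made by the background loop can be sketched as follows. The names `ToySyncMode`, `toy_loop_copies()` and `toy_count_copied()` are hypothetical and exist only for illustration; the allocation flag stands in for the result of the bdrv_co_is_allocated() check mentioned above. In every mode, writes from the guest are still handled separately by the copy-on-write path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the three sync modes described in the commit. */
typedef enum { TOY_SYNC_FULL, TOY_SYNC_TOP, TOY_SYNC_NONE } ToySyncMode;

/*
 * Decide whether the background loop copies a cluster.
 * top_allocated: true if the cluster has data in the topmost image.
 */
static bool toy_loop_copies(ToySyncMode mode, bool top_allocated)
{
    switch (mode) {
    case TOY_SYNC_FULL:
        return true;          /* copy everything */
    case TOY_SYNC_TOP:
        return top_allocated; /* backing file data already exists in the target */
    case TOY_SYNC_NONE:
        return false;         /* only copy-on-write traffic reaches the target */
    }
    return false;
}

/* Count how many of n clusters the background loop would copy. */
static size_t toy_count_copied(ToySyncMode mode, const bool *top_allocated,
                               size_t n)
{
    size_t copied = 0;

    for (size_t i = 0; i < n; i++) {
        if (toy_loop_copies(mode, top_allocated[i])) {
            copied++;
        }
    }
    return copied;
}
```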
  3. 28 Jun, 2013 2 commits
    • block: add basic backup support to block driver · 98d2c6f2
      Dietmar Maurer authored
      backup_start() creates a block job that copies a point-in-time snapshot
      of a block device to a target block device.
      We call backup_do_cow() for each write during backup. That function
      reads the original data from the block device before it gets
      overwritten.  The data is then written to the target device.
      Currently backup cluster size is hardcoded to 65536 bytes.
      [I made a number of changes to Dietmar's original patch and folded them
      in to make code review easy.  Here is the full list:
       * Drop BackupDumpFunc interface in favor of a target block device
       * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes()
       * Use 0 delay instead of 1us, like other block jobs
       * Unify creation/start functions into backup_start()
       * Simplify cleanup, free bitmap in backup_run() instead of cb
         function
       * Use HBitmap to avoid duplicating bitmap code
       * Use bdrv_getlength() instead of accessing ->total_sectors
         directly
       * Delete the backup.h header file, it is no longer necessary
       * Move ./backup.c to block/backup.c
       * Remove #ifdefed out code
       * Coding style and whitespace cleanups
       * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks
       * Keep our own in-flight CowRequest list instead of using block.c
         tracked requests.  This means a little code duplication but is much
         simpler than trying to share the tracked requests list and use the
         backup block size.
       * Add on_source_error and on_target_error error handling.
       * Use trace events instead of DPRINTF()
      -- stefanha]
      Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
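The copy-on-write idea behind the backup job can be sketched with a toy model in which each cluster is a single integer rather than 65536 bytes. `ToyBackup`, `toy_do_cow()`, `toy_guest_write()` and `toy_backup_run()` are hypothetical names that only loosely follow the functions named in the commit message, and a plain bool array stands in for the bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CLUSTERS 4

/* Hypothetical backup state: one int per cluster instead of real data. */
typedef struct {
    int source[TOY_CLUSTERS];  /* live device contents */
    int target[TOY_CLUSTERS];  /* point-in-time copy being built */
    bool copied[TOY_CLUSTERS]; /* which clusters are already in the target */
} ToyBackup;

/* Copy a cluster to the target unless that has already happened. */
static void toy_do_cow(ToyBackup *b, size_t cluster)
{
    if (!b->copied[cluster]) {
        b->target[cluster] = b->source[cluster];
        b->copied[cluster] = true;
    }
}

/* A guest write: preserve the old data first, then overwrite it. */
static void toy_guest_write(ToyBackup *b, size_t cluster, int value)
{
    toy_do_cow(b, cluster);
    b->source[cluster] = value;
}

/* The background loop copies whatever has not been copied yet. */
static void toy_backup_run(ToyBackup *b)
{
    for (size_t c = 0; c < TOY_CLUSTERS; c++) {
        toy_do_cow(b, c);
    }
}
```

Because a cluster is copied at most once, the target ends up holding the contents as they were when the job started, regardless of how guest writes and the background loop interleave.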
    • block: add bdrv_add_before_write_notifier() · d616b224
      Stefan Hajnoczi authored
      The bdrv_add_before_write_notifier() function installs a callback that
      is invoked before a write request is processed.  This will be used to
      implement copy-on-write point-in-time snapshots where we need to copy
      out old data before overwriting it.
      Note that BdrvTrackedRequest is moved to block_int.h since it is passed
      to .notify() functions.
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Reviewed-by: Kevin Wolf <kwolf@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
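A before-write hook of this kind might look roughly like the sketch below. Everything here is hypothetical (`ToyNotifier`, `ToyDisk`, `ToySnapshot` and the `toy_*` functions); QEMU's actual notifier types and the BdrvTrackedRequest argument are not reproduced. The point is only the ordering: callbacks run before the write touches the data, so they can still read the old contents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical notifier: a callback plus a link to the next notifier. */
typedef struct ToyNotifier ToyNotifier;
struct ToyNotifier {
    void (*notify)(ToyNotifier *n, size_t sector);
    ToyNotifier *next;
};

/* Hypothetical disk with one int per sector and a notifier list. */
typedef struct {
    int sectors[8];
    ToyNotifier *before_write;
} ToyDisk;

static void toy_add_before_write_notifier(ToyDisk *d, ToyNotifier *n)
{
    n->next = d->before_write;
    d->before_write = n;
}

/* Invoke every notifier first, then perform the write. */
static void toy_write(ToyDisk *d, size_t sector, int value)
{
    for (ToyNotifier *n = d->before_write; n != NULL; n = n->next) {
        n->notify(n, sector);
    }
    d->sectors[sector] = value;
}

/* Example user: remembers the first old value of each overwritten sector. */
typedef struct {
    ToyNotifier base; /* must be first so the cast in the callback is valid */
    ToyDisk *disk;
    int saved[8];
    bool has_saved[8];
} ToySnapshot;

static void toy_snapshot_notify(ToyNotifier *n, size_t sector)
{
    ToySnapshot *s = (ToySnapshot *)n;

    if (!s->has_saved[sector]) {
        s->saved[sector] = s->disk->sectors[sector];
        s->has_saved[sector] = true;
    }
}
```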
  4. 04 Jun, 2013 1 commit
    • block: move qmp and info dump related code to block/qapi.c · f364ec65
      Wenchao Xia authored
      This patch is a pure code move, except for the following modifications:
      1. get_human_readable_size() is changed to a static function.
      2. dump_human_image_info() is renamed to bdrv_image_info_dump().
      3. In qmp_query_block() and qmp_query_blockstats(), bdrv_next(bs) is used
         instead of directly traversing the global array 'bdrv_states'.
      4. collect_snapshots() and collect_image_info() are renamed, and the
         unused parameter *fmt of collect_image_info() is removed.
      5. Code style fixes.
      To avoid conflicts and be more descriptive, the macro in the header file
      is BLOCK_QAPI_H instead of QAPI_H. Now block.h and snapshot.h are at the
      same level in the include path, and block_int.h and qapi.h both include
      them.
      Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  5. 22 Apr, 2013 1 commit
  6. 15 Apr, 2013 1 commit
  7. 05 Apr, 2013 2 commits
  8. 22 Mar, 2013 3 commits
    • block: Allow omitting the file name when using driver-specific options · c2ad1b0c
      Kevin Wolf authored
      After this patch, using -drive with an empty file name continues to open
      the file if driver-specific options are used. If no driver-specific
      options are specified, the semantics stay as they were: it defines a drive
      without an inserted medium.
      In order to achieve this, bdrv_open() must be made safe to work with a
      NULL filename parameter. The assumption that is made is that only block
      drivers which implement bdrv_parse_filename() support using driver
      specific options and could therefore work without a filename. These
      drivers must make sure to cope with NULL in their implementation of
      .bdrv_open() (this is only NBD for now). For all other drivers, the
      block layer code will make sure to error out before calling into their
      code - they can't possibly work without a filename.
      Now an NBD connection can be opened like this:
        qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
    • block: Introduce .bdrv_parse_filename callback · 6963a30d
      Kevin Wolf authored
      If a driver needs structured data and not just a string, it can provide
      a .bdrv_parse_filename callback now that parses the command line string
      into separate options. Keeping this separate from .bdrv_open_filename
      ensures that the preferred way of directly specifying the options always
      works as well if parsing the string works.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
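As a rough illustration of turning a filename string into separate options, the sketch below parses a simplified `nbd:HOST:PORT` string into a struct. `ToyNbdOpts` and `toy_parse_filename()` are hypothetical; the real callback fills a QDict, and the real NBD syntax has more forms (for example IPv6 addresses and Unix sockets) that this sketch does not handle.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical structured options; QEMU itself uses a QDict here. */
typedef struct {
    char host[64];
    int port;
} ToyNbdOpts;

/*
 * Parse "nbd:HOST:PORT" into opts.
 * Returns 0 on success, -1 if the string does not have that shape.
 */
static int toy_parse_filename(const char *filename, ToyNbdOpts *opts)
{
    char host[64];
    int port;
    char trailing;

    /* Exactly two conversions means host and port matched, nothing after. */
    if (sscanf(filename, "nbd:%63[^:]:%d%c", host, &port, &trailing) != 2) {
        return -1;
    }
    if (port < 1 || port > 65535) {
        return -1;
    }
    strcpy(opts->host, host); /* both buffers hold 64 bytes */
    opts->port = port;
    return 0;
}
```

Keeping the parser separate from the open path means a caller that already has structured options never depends on string parsing.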
    • block: Add options QDict to bdrv_file_open() prototypes · 787e4a85
      Kevin Wolf authored
      The new parameter is not used yet.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
      Reviewed-by: Eric Blake <eblake@redhat.com>
  9. 15 Mar, 2013 3 commits
  10. 01 Feb, 2013 1 commit
  11. 25 Jan, 2013 3 commits
  12. 19 Dec, 2012 4 commits
  13. 12 Dec, 2012 1 commit
    • qemu-io: Add AIO debugging commands · 41c695c7
      Kevin Wolf authored
      This makes the blkdebug suspend/resume functionality available in
      qemu-io. Use it like this:
        $ ./qemu-io blkdebug::/tmp/test.qcow2
        qemu-io> break write_aio req_a
        qemu-io> aio_write 0 4k
        qemu-io> blkdebug: Suspended request 'req_a'
        qemu-io> resume req_a
        blkdebug: Resuming request 'req_a'
        qemu-io> wrote 4096/4096 bytes at offset 0
        4 KiB, 1 ops; 0:00:30.71 (133.359788 bytes/sec and 0.0326 ops/sec)
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  14. 24 Oct, 2012 2 commits
    • mirror: add support for on-source-error/on-target-error · b952b558
      Paolo Bonzini authored
      Error management is important for mirroring; otherwise, an error on the
      target (even something as "innocent" as ENOSPC) requires starting again
      with a full copy.  Similar to on_read_error/on_write_error, two separate
      knobs are provided for on_source_error (reads) and on_target_error (writes).
      The default is 'report' for both.
      The 'ignore' policy will leave the sector dirty, so that it will be
      retried later.  Thus, it will not cause corruption.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
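Why 'ignore' does not cause corruption can be seen in a small sketch of one pass over a dirty map. The names (`ToyOnError`, `toy_mirror_pass()`, `toy_flaky_copy()`) are hypothetical, a single policy is used for brevity where the commit provides separate knobs for reads and writes, and only the 'report' and 'ignore' policies are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of the error policies. */
typedef enum { TOY_ON_ERROR_REPORT, TOY_ON_ERROR_IGNORE } ToyOnError;

/*
 * One pass over the dirty map. copy(i) returns 0 on success or a negative
 * error code. A sector is marked clean only after a successful copy.
 */
static int toy_mirror_pass(bool *dirty, size_t n, int (*copy)(size_t),
                           ToyOnError policy)
{
    for (size_t i = 0; i < n; i++) {
        int ret;

        if (!dirty[i]) {
            continue;
        }
        ret = copy(i);
        if (ret == 0) {
            dirty[i] = false;
            continue;
        }
        if (policy == TOY_ON_ERROR_REPORT) {
            return ret; /* the job fails with this error */
        }
        /* ignore: the sector stays dirty and a later pass retries it */
    }
    return 0;
}

/* Example copy function that fails for sector 1 a limited number of times. */
static int toy_fail_budget = 1;

static int toy_flaky_copy(size_t sector)
{
    if (sector == 1 && toy_fail_budget > 0) {
        toy_fail_budget--;
        return -1;
    }
    return 0;
}
```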
    • mirror: introduce mirror job · 893f7eba
      Paolo Bonzini authored
      This patch adds the implementation of a new job that mirrors a disk to
      a new image while letting the guest continue using the old image.
      The target is treated as a "black box" and data is copied from the
      source to the target in the background.  This can be used for several
      purposes, including storage migration, continuous replication, and
      observation of the guest I/O in an external program.  It is also a
      first step in replacing the inefficient block migration code that is
      part of QEMU.
      The job is possibly never-ending, but it is logically structured into
      two phases: 1) copy all data as fast as possible until the target
      first gets in sync with the source; 2) keep target in sync and
      ensure that reopening to the target gets a correct (full) copy
      of the source data.
      The second phase is indicated by the progress in "info block-jobs"
      reporting the current offset to be equal to the length of the file.
      When the job is cancelled in the second phase, QEMU will run the
      job until the source is clean and quiescent, then it will report
      successful completion of the job.
      In other words, the BLOCK_JOB_CANCELLED event means that the target
      may _not_ be consistent with a past state of the source; the
      BLOCK_JOB_COMPLETED event means that the target is consistent with
      a past state of the source.  (Note that it could already happen
      that management lost the race against QEMU and got a completion
      event instead of cancellation).
      It is not yet possible to complete the job and switch over to the target
      disk.  The next patches will fix this and add many refinements to the
      basic idea introduced here.  These include improved error management,
      some tunable knobs and performance optimizations.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  15. 23 Oct, 2012 1 commit
  16. 28 Sep, 2012 7 commits
  17. 24 Sep, 2012 2 commits
    • block: remove keep_read_only flag from BlockDriverState struct · dc1c13d9
      Jeff Cody authored
      The keep_read_only flag is no longer used, in favor of the bdrv
      flag BDRV_O_ALLOW_RDWR.
      Signed-off-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    • block: Framework for reopening files safely · e971aa12
      Jeff Cody authored
      This is based on Supriya Kannery's bdrv_reopen() patch series.
      This provides a transactional method to reopen multiple
      image files safely.
      Image files are queued for reopen via bdrv_reopen_queue(), and the
      reopen occurs when bdrv_reopen_multiple() is called.  Changes are
      staged in bdrv_reopen_prepare() and in the equivalent driver level
      functions.  If any of the staged images fails a prepare, then all
      of the images are left untouched, and the staged changes for each image
      are abandoned.
      Block drivers are passed a reopen state structure that contains:
          * BDS to reopen
          * flags for the reopen
          * opaque pointer for any driver-specific data that needs to be
            persistent from _prepare to _commit/_abort
          * reopen queue pointer, if the driver needs to queue additional
            BDS for a reopen
      Signed-off-by: Jeff Cody <jcody@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
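The prepare/commit/abort pattern can be sketched as below. This is a toy model with hypothetical names (`ToyImage` and the `toy_reopen_*` functions) and a plain array in place of the reopen queue; it only shows the all-or-nothing behaviour, not the actual reopen state structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical image with active flags and a staging area. */
typedef struct {
    int flags;         /* flags currently in effect */
    int staged_flags;  /* set by prepare, applied by commit */
    bool staged;       /* true between prepare and commit/abort */
    bool fail_prepare; /* lets the example simulate a failing driver */
} ToyImage;

static int toy_reopen_prepare(ToyImage *img, int flags)
{
    if (img->fail_prepare) {
        return -1;
    }
    img->staged_flags = flags;
    img->staged = true;
    return 0;
}

static void toy_reopen_commit(ToyImage *img)
{
    img->flags = img->staged_flags;
    img->staged = false;
}

static void toy_reopen_abort(ToyImage *img)
{
    img->staged = false; /* drop staged changes, active flags are untouched */
}

/* Reopen all images with the given flags, or change none of them. */
static int toy_reopen_multiple(ToyImage **imgs, const int *flags, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (toy_reopen_prepare(imgs[i], flags[i]) < 0) {
            while (i > 0) {
                toy_reopen_abort(imgs[--i]); /* undo earlier prepares */
            }
            return -1;
        }
    }
    for (i = 0; i < n; i++) {
        toy_reopen_commit(imgs[i]);
    }
    return 0;
}
```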
  18. 13 Aug, 2012 1 commit
  19. 06 Aug, 2012 1 commit
    • qcow2: implement lazy refcounts · bfe8043e
      Stefan Hajnoczi authored
      Lazy refcounts is a performance optimization for qcow2 that postpones
      refcount metadata updates and instead marks the image dirty.  In the
      case of crash or power failure the image will be left in a dirty state
      and repaired next time it is opened.
      Reducing metadata I/O is important for cache=writethrough and
      cache=directsync because these modes guarantee that data is on disk
      after each write (hence we cannot take advantage of caching updates in
      RAM).  Refcount metadata is not needed for guest->file block address
      translation and therefore does not need to be on-disk at the time of
      write completion - this is the motivation behind the lazy refcount
      optimization.
      The lazy refcount optimization must be enabled at image creation time:
        qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
        qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough
      Update qemu-iotests 031 and 036 since the extension header size changes
      when we add feature bit table entries.
      Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
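The idea of postponing refcount updates and repairing on the next open can be sketched with a toy model. `ToyQcow` and the `toy_*` functions are hypothetical and greatly simplified: a flat mapping array stands in for the qcow2 address translation tables, and the repair step simply recounts references from that mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NCLUSTERS 8

/* Hypothetical, heavily simplified image metadata. */
typedef struct {
    bool dirty;                  /* marker: refcounts may be stale */
    int map[TOY_NCLUSTERS];      /* guest cluster -> host cluster, -1 if unmapped */
    int refcount[TOY_NCLUSTERS]; /* references per host cluster */
} ToyQcow;

static void toy_init(ToyQcow *q)
{
    q->dirty = false;
    for (size_t i = 0; i < TOY_NCLUSTERS; i++) {
        q->map[i] = -1;
        q->refcount[i] = 0;
    }
}

/* Allocate lazily: record the mapping, skip the refcount update. */
static void toy_alloc_lazy(ToyQcow *q, int guest, int host)
{
    q->dirty = true;      /* mark the image before metadata goes stale */
    q->map[guest] = host; /* the mapping is needed for address translation */
    /* refcount[host] is deliberately left alone here */
}

/* Rebuild all refcounts from the mappings and clear the marker. */
static void toy_repair(ToyQcow *q)
{
    for (size_t i = 0; i < TOY_NCLUSTERS; i++) {
        q->refcount[i] = 0;
    }
    for (size_t g = 0; g < TOY_NCLUSTERS; g++) {
        if (q->map[g] >= 0) {
            q->refcount[q->map[g]]++;
        }
    }
    q->dirty = false;
}

/* Opening a dirty image triggers the repair. */
static void toy_open(ToyQcow *q)
{
    if (q->dirty) {
        toy_repair(q);
    }
}
```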
  20. 17 Jul, 2012 1 commit
    • block: Geometry and translation hints are now useless, purge them · 2b584959
      Markus Armbruster authored
      There are two producers of these hints: drive_init() on behalf of
      -drive, and hd_geometry_guess().
      The only consumer of the hint is hd_geometry_guess().
      The callers of hd_geometry_guess() call it only when drive_init()
      didn't set the hints.  Therefore, drive_init()'s hints are never used.
      Thus, hd_geometry_guess() only ever sees hints it produced itself in a
      prior call.  Only the first call computes something, subsequent calls
      just repeat the first call's results.  However, hd_geometry_guess() is
      never called more than once: the device models don't, and the block
      device is destroyed on unplug.  Thus, dropping the repeat feature
      doesn't break anything now.
      If a block device wasn't destroyed on unplug and could be reused with
      a new device, then repeating old results would be wrong.  Thus,
      dropping the repeat feature prevents future breakage.
      This renders the hints unused.  Purge them from the block layer.
      Signed-off-by: Markus Armbruster <armbru@redhat.com>
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>
  21. 15 Jun, 2012 1 commit
    • qemu-img check -r for repairing images · 4534ff54
      Kevin Wolf authored
      The QED block driver already provides the functionality to not only
      detect inconsistencies in images, but also fix them. However, this
      functionality cannot be manually invoked with qemu-img; the
      check only happens automatically during bdrv_open().
      This adds a -r switch to qemu-img check that allows manual invocation
      of an image repair.
      Signed-off-by: Kevin Wolf <kwolf@redhat.com>