- 22 Aug, 2013 1 commit
-
-
Asias He authored
Commit 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when protocols reading beyond end of file) breaks qemu-iotests ./check -qcow2 022. This happens because qcow2 temporarily sets ->growable = 1 for vmstate accesses (which are stored beyond the end of regular image data). We introduce bs->zero_beyond_eof so that qcow2_load_vmstate() can temporarily disable ->zero_beyond_eof in addition to enabling ->growable.

[Since the broken patch "block: Produce zeros when protocols reading beyond end of file" has not been merged yet, I have applied this fix *first* and will then apply the next patch to keep the tree bisectable. -- Stefan]

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
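A minimal sketch of the pattern described above; the field and helper names are assumed from the qcow2 code of that period:

    static int qcow2_load_vmstate(BlockDriverState *bs, uint8_t *buf,
                                  int64_t pos, int size)
    {
        BDRVQcowState *s = bs->opaque;
        bool growable = bs->growable;
        bool zero_beyond_eof = bs->zero_beyond_eof;
        int ret;

        /* vmstate is stored past the end of the regular image data, so widen
         * the file temporarily and do not zero-fill reads beyond EOF */
        bs->growable = 1;
        bs->zero_beyond_eof = false;
        ret = bdrv_pread(bs, qcow2_vm_state_offset(s) + pos, buf, size);
        bs->growable = growable;
        bs->zero_beyond_eof = zero_beyond_eof;

        return ret;
    }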
-
- 26 Jul, 2013 1 commit
-
-
Ian Main authored
This patch adds sync modes to the drive-backup interface and implements the FULL, NONE and TOP modes of synchronization. FULL performs as before, copying the entire contents of the drive while preserving the point-in-time using CoW. NONE only copies new writes to the target drive. TOP copies changes to the topmost drive image and preserves the point-in-time using CoW.

For sync mode TOP we create a new target image using the same backing file as the original disk image. Then any new data that has been laid on top of it since creation is copied in the main backup_run() loop. There is an extra check in the TOP case so that we don't bother to copy all the data of the backing file, as it already exists in the target; bdrv_co_is_allocated() is used to determine whether the data exists in the topmost layer or below. Any new data being written is intercepted via the write_notifier hook, which ends up calling backup_do_cow() to copy old data out before it gets overwritten.

For mode NONE we create the new target image and only copy in the original data from the disk image, starting from the time the call was made. This preserves the point-in-time data by copying only the parts that are *going to change* to the target image. The final image can then be reconstructed by checking whether a given block exists in the new target image first and, if it does not, fetching it from the original image. This is basically an optimization that allows point-in-time snapshots with low overhead compared to the FULL mode. Since there is no old data to copy out, the loop in backup_run() for the NONE case just calls qemu_coroutine_yield(), which only wakes up after an event (usually cancel in this case). The rest is handled by the before_write notifier, which again calls backup_do_cow() to write out the old data so it can be preserved.

Signed-off-by: Ian Main <imain@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
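A rough sketch of the TOP-mode decision described above, as a fragment of the job's main loop; the constant and enum names are assumptions modeled on the backup job code of that period:

    for (start = 0; start < end; start++) {
        if (sync_mode == MIRROR_SYNC_MODE_TOP) {
            int n;

            /* Skip clusters that live only in the backing file: the target
             * was created with the same backing file, so that data is
             * already reachable from it. */
            ret = bdrv_co_is_allocated(bs, start * BACKUP_SECTORS_PER_CLUSTER,
                                       BACKUP_SECTORS_PER_CLUSTER, &n);
            if (ret == 0) {
                continue;
            }
        }

        /* Copy allocated clusters to the target before the guest can
         * overwrite them. */
        ret = backup_do_cow(bs, start * BACKUP_SECTORS_PER_CLUSTER,
                            BACKUP_SECTORS_PER_CLUSTER, &error_is_read);
    }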
-
- 28 Jun, 2013 2 commits
-
-
Dietmar Maurer authored
backup_start() creates a block job that copies a point-in-time snapshot of a block device to a target block device. We call backup_do_cow() for each write during backup. That function reads the original data from the block device before it gets overwritten. The data is then written to the target device. Currently backup cluster size is hardcoded to 65536 bytes.

[I made a number of changes to Dietmar's original patch and folded them in to make code review easy. Here is the full list:
 * Drop BackupDumpFunc interface in favor of a target block device
 * Detect zero clusters with buffer_is_zero() and use bdrv_co_write_zeroes()
 * Use 0 delay instead of 1us, like other block jobs
 * Unify creation/start functions into backup_start()
 * Simplify cleanup, free bitmap in backup_run() instead of cb function
 * Use HBitmap to avoid duplicating bitmap code
 * Use bdrv_getlength() instead of accessing ->total_sectors directly
 * Delete the backup.h header file, it is no longer necessary
 * Move ./backup.c to block/backup.c
 * Remove #ifdefed out code
 * Coding style and whitespace cleanups
 * Use bdrv_add_before_write_notifier() instead of blockjob-specific hooks
 * Keep our own in-flight CowRequest list instead of using block.c tracked requests. This means a little code duplication but is much simpler than trying to share the tracked requests list and use the backup block size.
 * Add on_source_error and on_target_error error handling.
 * Use trace events instead of DPRINTF()
-- stefanha]

Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
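A simplified sketch of the copy-before-write step this describes (read the old data, detect zeros, write to the target, mark the cluster done). It only illustrates the flow; the in-flight CowRequest tracking and error policies of the real job are omitted, and the helper name and parameters are illustrative:

    /* Preserve one cluster in the target before the guest overwrites it. */
    static int coroutine_fn copy_cluster_before_write(BlockDriverState *source,
                                                      BlockDriverState *target,
                                                      HBitmap *copied_bitmap,
                                                      int64_t cluster,
                                                      void *bounce_buffer,
                                                      int sectors_per_cluster)
    {
        QEMUIOVector qiov;
        struct iovec iov = {
            .iov_base = bounce_buffer,
            .iov_len  = sectors_per_cluster * BDRV_SECTOR_SIZE,
        };
        int64_t sector = cluster * sectors_per_cluster;
        int ret;

        if (hbitmap_get(copied_bitmap, cluster)) {
            return 0;               /* already preserved in the target */
        }

        qemu_iovec_init_external(&qiov, &iov, 1);

        /* Read the old data from the source... */
        ret = bdrv_co_readv(source, sector, sectors_per_cluster, &qiov);
        if (ret < 0) {
            return ret;
        }

        /* ...and write it to the target, using zero writes when possible. */
        if (buffer_is_zero(bounce_buffer, iov.iov_len)) {
            ret = bdrv_co_write_zeroes(target, sector, sectors_per_cluster);
        } else {
            ret = bdrv_co_writev(target, sector, sectors_per_cluster, &qiov);
        }
        if (ret < 0) {
            return ret;
        }

        hbitmap_set(copied_bitmap, cluster, 1);   /* never copy it again */
        return 0;
    }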
-
Stefan Hajnoczi authored
The bdrv_add_before_write_notifier() function installs a callback that is invoked before a write request is processed. This will be used to implement copy-on-write point-in-time snapshots where we need to copy out old data before overwriting it. Note that BdrvTrackedRequest is moved to block_int.h since it is passed to .notify() functions.

Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
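A small sketch of how a consumer hooks in; the notifier type and the BdrvTrackedRequest fields are assumptions based on the interface described above:

    /* Copy the affected clusters out to the target before the guest's write
     * is processed; returning non-zero would fail the write. */
    static int coroutine_fn backup_before_write_notify(NotifierWithReturn *notifier,
                                                       void *opaque)
    {
        BdrvTrackedRequest *req = opaque;

        return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL);
    }

    /* Registration, typically done when the block job starts: */
    NotifierWithReturn before_write = {
        .notify = backup_before_write_notify,
    };
    bdrv_add_before_write_notifier(bs, &before_write);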
-
- 04 Jun, 2013 1 commit
-
-
Wenchao Xia authored
This patch is a pure code move, except for the following modifications:
1. get_human_readable_size() is changed to a static function.
2. dump_human_image_info() is renamed to bdrv_image_info_dump().
3. In qmp_query_block() and qmp_query_blockstats(), use bdrv_next(bs) instead of directly traversing the global array 'bdrv_states'.
4. collect_snapshots() and collect_image_info() are renamed; the unused parameter *fmt in collect_image_info() is removed.
5. Code style fixes.

To avoid conflicts and give a better hint, the macro in the header file is BLOCK_QAPI_H instead of QAPI_H. Now block.h and snapshot.h are at the same level in the include path; block_int.h and qapi.h will both include them.

Signed-off-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
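A sketch of the traversal change in item 3 above, assuming the bdrv_next() accessor of that period:

    BlockDriverState *bs = NULL;

    /* Walk all block devices through the accessor instead of touching the
     * bdrv_states list directly. */
    while ((bs = bdrv_next(bs)) != NULL) {
        /* collect BlockInfo / BlockStats for this device */
    }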
-
- 22 Apr, 2013 1 commit
-
-
Kevin Wolf authored
It is unused now in all block drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
-
- 15 Apr, 2013 1 commit
-
-
Kevin Wolf authored
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-
- 05 Apr, 2013 2 commits
-
-
Stefan Hajnoczi authored
It is not necessary to adjust the slice time at runtime. We already extend the current slice in order to carry over accounting into the next slice. Changing the actual slice time value introduces oscillations. The guest may experience large changes in throughput or IOPS from one moment to the next when slice times are adjusted.

Reported-by: Benoît Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
Stefan Hajnoczi authored
I/O throttling relies on bdrv_acct_done(), which is called when a request completes. This leaves a blind spot since we only charge for completed requests, not submitted requests. For example, if there is 1 operation remaining in this time slice, the guest could submit 3 operations and they will all be submitted successfully since they don't actually get accounted for until they complete. Originally we probably thought this was okay since the requests will be accounted for when the time slice is extended. In practice it causes fluctuations since the guest can exceed its I/O limit and will be punished for this later on. Account for I/O upon submission so that I/O limits are enforced properly.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Tested-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
- 22 Mar, 2013 3 commits
-
-
Kevin Wolf authored
After this patch, using -drive with an empty file name continues to open the file if driver-specific options are used. If no driver-specific options are specified, the semantics stay as they were: it defines a drive without an inserted medium.

In order to achieve this, bdrv_open() must be made safe to work with a NULL filename parameter. The assumption that is made is that only block drivers which implement bdrv_parse_filename() support using driver-specific options and could therefore work without a filename. These drivers must make sure to cope with NULL in their implementation of .bdrv_open() (this is only NBD for now). For all other drivers, the block layer code will make sure to error out before calling into their code - they can't possibly work without a filename.

Now an NBD connection can be opened like this:

    qemu-system-x86_64 -drive file.driver=nbd,file.port=1234,file.host=::1

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
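A sketch of the guard this implies in bdrv_open(); the error-reporting call and message are chosen for illustration:

    /* Drivers without a .bdrv_parse_filename() hook cannot work from
     * driver-specific options alone, so refuse a NULL filename up front
     * instead of calling into them. */
    if (filename == NULL && !drv->bdrv_parse_filename) {
        error_setg(errp, "The '%s' block driver requires a file name",
                   drv->format_name);
        return -EINVAL;
    }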
-
Kevin Wolf authored
If a driver needs structured data and not just a string, it can provide a .bdrv_parse_filename callback now that parses the command line string into separate options. Keeping this separate from .bdrv_open_filename ensures that the preferred way of directly specifying the options always works as well if parsing the string works.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
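A sketch of what such a callback can look like; the driver, the option names and the parsing are purely illustrative:

    /* Hypothetical driver: turn a legacy "mydrv:<host>:<port>" filename into
     * the structured options that .bdrv_open() will consume. */
    static void mydrv_parse_filename(const char *filename, QDict *options,
                                     Error **errp)
    {
        const char *host = filename + strlen("mydrv:");
        const char *colon = strrchr(host, ':');

        if (!colon) {
            error_setg(errp, "missing port in '%s'", filename);
            return;
        }

        qdict_put(options, "host", qstring_from_substr(host, 0, colon - host - 1));
        qdict_put(options, "port", qstring_from_str(colon + 1));
    }

    /* Hooked up in the driver definition: */
    static BlockDriver bdrv_mydrv = {
        .format_name         = "mydrv",
        .bdrv_parse_filename = mydrv_parse_filename,
        /* ... */
    };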
-
Kevin Wolf authored
The new parameter is not used yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
-
- 15 Mar, 2013 3 commits
-
-
Stefan Hajnoczi authored
For now bdrv_get_aio_context() is just a stub that calls qemu_aio_get_context() since the block layer is currently tied to the main loop AioContext. Add the stub now so that the block layer can begin accessing its AioContext.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
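A minimal sketch of the stub as described; the name of the main-loop accessor it returns is an assumption about the tree of that period:

    AioContext *bdrv_get_aio_context(BlockDriverState *bs)
    {
        /* The block layer is still tied to the main loop, so every
         * BlockDriverState reports the global AioContext for now. */
        return qemu_get_aio_context();
    }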
-
Kevin Wolf authored
It doesn't do anything yet except store the options QDict in the BlockDriverState.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-
Kevin Wolf authored
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-
- 01 Feb, 2013 1 commit
-
-
Othmar Pasteka authored
Introduce a new option "adapter_type" when converting to vmdk images. It can be one of the following: ide (default), buslogic, lsilogic or legacyESX (according to the vmdk spec from VMware). For a non-ide adapter, heads is set to 255 instead of 16, the value used for "ide".

Also see LP#545089

Signed-off-by: Othmar Pasteka <pasteka@kabsi.at>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
-
- 25 Jan, 2013 3 commits
-
-
Paolo Bonzini authored
This makes sense when the next commit starts using the extra buffer space to perform many I/O operations asynchronously.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
Paolo Bonzini authored
The desired granularity may be very different depending on the kind of operation (e.g. continuous replication vs. collapse-to-raw) and whether the VM is expected to perform lots of I/O while mirroring is in progress. Allow the user to customize it, while providing a sane default so that in general there will be no extra allocated space in the target compared to the source.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
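A sketch of the default described above, as a fragment of the mirror setup; the fallback values and clamping are assumptions:

    if (granularity == 0) {
        BlockDriverInfo bdi;

        /* Derive the granularity from the target's cluster size so the
         * target allocates no more space than the source; fall back to 64K
         * when the format does not report one. */
        if (bdrv_get_info(target, &bdi) >= 0 && bdi.cluster_size != 0) {
            granularity = MAX(4096, bdi.cluster_size);
        } else {
            granularity = 65536;
        }
    }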
-
Paolo Bonzini authored
This actually uses the dirty bitmap in the block layer, and converts mirroring to use an HBitmapIter.

Reviewed-by: Laszlo Ersek <lersek@redhat.com> (except block/mirror.c parts)
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
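A sketch of the iteration style this enables, assuming the HBitmap helper names of that period: visit only the dirty chunks instead of scanning every sector of the device.

    HBitmapIter hbi;
    int64_t sector;

    hbitmap_iter_init(&hbi, dirty_bitmap, 0);
    while ((sector = hbitmap_iter_next(&hbi)) >= 0) {
        /* copy the chunk starting at 'sector' from the source to the target */
    }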
-
- 19 Dec, 2012 4 commits
-
-
Paolo Bonzini authored
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 12 Dec, 2012 1 commit
-
-
Kevin Wolf authored
This makes the blkdebug suspend/resume functionality available in qemu-io. Use it like this:

    $ ./qemu-io blkdebug::/tmp/test.qcow2
    qemu-io> break write_aio req_a
    qemu-io> aio_write 0 4k
    qemu-io> blkdebug: Suspended request 'req_a'
    qemu-io> resume req_a
    blkdebug: Resuming request 'req_a'
    qemu-io> wrote 4096/4096 bytes at offset 0
    4 KiB, 1 ops; 0:00:30.71 (133.359788 bytes/sec and 0.0326 ops/sec)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
- 24 Oct, 2012 2 commits
-
-
Paolo Bonzini authored
Error management is important for mirroring; otherwise, an error on the target (even something as "innocent" as ENOSPC) requires starting again with a full copy. Similar to on_read_error/on_write_error, two separate knobs are provided for on_source_error (reads) and on_target_error (writes). The default is 'report' for both. The 'ignore' policy will leave the sector dirty, so that it will be retried later. Thus, it will not cause corruption.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
Paolo Bonzini authored
This patch adds the implementation of a new job that mirrors a disk to a new image while letting the guest continue using the old image. The target is treated as a "black box" and data is copied from the source to the target in the background. This can be used for several purposes, including storage migration, continuous replication, and observation of the guest I/O in an external program. It is also a first step in replacing the inefficient block migration code that is part of QEMU.

The job is possibly never-ending, but it is logically structured into two phases: 1) copy all data as fast as possible until the target first gets in sync with the source; 2) keep target in sync and ensure that reopening to the target gets a correct (full) copy of the source data.

The second phase is indicated by the progress in "info block-jobs" reporting the current offset to be equal to the length of the file. When the job is cancelled in the second phase, QEMU will run the job until the source is clean and quiescent, then it will report successful completion of the job. In other words, the BLOCK_JOB_CANCELLED event means that the target may _not_ be consistent with a past state of the source; the BLOCK_JOB_COMPLETED event means that the target is consistent with a past state of the source. (Note that it could already happen that management lost the race against QEMU and got a completion event instead of cancellation).

It is not yet possible to complete the job and switch over to the target disk. The next patches will fix this and add many refinements to the basic idea introduced here. These include improved error management, some tunable knobs and performance optimizations.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
- 23 Oct, 2012 1 commit
-
-
Paolo Bonzini authored
The first user of close notifiers will be the embedded NBD server. It would be possible to use them to do some of the ad hoc processing (e.g. for block jobs and I/O limits) that is currently done by bdrv_close.

Acked-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
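A sketch of how a user registers a close notifier; the callback body and the assumption that the notifier data is the closing BlockDriverState are illustrative:

    static void my_close_notify(Notifier *notifier, void *data)
    {
        BlockDriverState *bs = data;   /* assumed: the BDS being closed */

        /* tear down anything that still references bs here, e.g. stop an
         * embedded NBD server exporting it */
        (void)bs;
    }

    static void my_setup(BlockDriverState *bs)
    {
        static Notifier close_notifier = {
            .notify = my_close_notify,
        };

        bdrv_add_close_notifier(bs, &close_notifier);
    }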
-
- 28 Sep, 2012 7 commits
-
-
Paolo Bonzini authored
This patch adds support for error management to streaming.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
Paolo Bonzini authored
The following behaviors are possible:

'report': The behavior is the same as in 1.1. An I/O error, respectively during a read or a write, will complete the job immediately with an error code.

'ignore': An I/O error, respectively during a read or a write, will be ignored. For streaming, the job will complete with an error and the backing file will be left in place. For mirroring, the sector will be marked again as dirty and re-examined later.

'stop': The job will be paused and the job iostatus will be set to failed or nospace, while the VM will keep running. This can only be specified if the block device has rerror=stop and werror=stop or enospc.

'enospc': Behaves as 'stop' for ENOSPC errors, 'report' for others.

In all cases, even for 'report', the I/O error is reported as a QMP event BLOCK_JOB_ERROR, with the same arguments as BLOCK_IO_ERROR. It is possible that while stopping the VM a BLOCK_IO_ERROR event will be reported and will clobber the event from BLOCK_JOB_ERROR, or vice versa. This is not really avoidable since stopping the VM completes all pending I/O requests. In fact, it is already possible now that a series of BLOCK_IO_ERROR events are reported with rerror=stop, because vm_stop calls bdrv_drain_all and this can generate further errors.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
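A sketch of the policy mapping described above; the enum and function names are assumptions modeled on the QAPI schema and block-layer actions of that period:

    static BlockErrorAction on_error_to_action(BlockdevOnError on_err, int error)
    {
        switch (on_err) {
        case BLOCKDEV_ON_ERROR_ENOSPC:
            /* 'enospc': pause the job only when the device ran out of space */
            return (error == ENOSPC) ? BDRV_ACTION_STOP : BDRV_ACTION_REPORT;
        case BLOCKDEV_ON_ERROR_STOP:
            return BDRV_ACTION_STOP;
        case BLOCKDEV_ON_ERROR_REPORT:
            return BDRV_ACTION_REPORT;
        case BLOCKDEV_ON_ERROR_IGNORE:
            return BDRV_ACTION_IGNORE;
        default:
            abort();
        }
    }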
-
Paolo Bonzini authored
This will let block-stream reuse the enum. Places that used the enums are renamed accordingly.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
Paolo Bonzini authored
We want to remove knowledge of BLOCK_ERR_STOP_ENOSPC from drivers; drivers should only be told whether to stop/report/ignore the error. On the other hand, we want to keep using the nicer BlockErrorAction name in the drivers. So rename the enums, while leaving aside the names of the enum values for now.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
Paolo Bonzini authored
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
Paolo Bonzini authored
Do this in a separate commit before we move the functions to blockjob.h.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
Jeff Cody authored
This adds the live commit coroutine. This iteration focuses on the commit only below the active layer, and not the active layer itself. The behaviour is similar to block streaming; the sectors are walked through, and anything that exists above 'base' is committed back down into base. At the end, intermediate images are deleted, and the chain stitched together. Images are restored to their original open flags upon completion.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
- 24 Sep, 2012 2 commits
-
-
Jeff Cody authored
The keep_read_only flag is no longer used, in favor of the bdrv flag BDRV_O_ALLOW_RDWR.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
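A sketch of the replacement check, assuming the flag is kept in the open flags as described: instead of consulting a separate keep_read_only field, callers ask whether the image may be reopened read-write.

    if (!(bs->open_flags & BDRV_O_ALLOW_RDWR)) {
        /* the image must stay read-only */
        return -EPERM;
    }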
-
Jeff Cody authored
This is based on Supriya Kannery's bdrv_reopen() patch series.

This provides a transactional method to reopen multiple image files safely. Image files are queued for reopen via bdrv_reopen_queue(), and the reopen occurs when bdrv_reopen_multiple() is called. Changes are staged in bdrv_reopen_prepare() and in the equivalent driver-level functions. If any of the staged images fails a prepare, then all of the images are left untouched, and the staged changes for each image are abandoned.

Block drivers are passed a reopen state structure, which contains:
 * BDS to reopen
 * flags for the reopen
 * opaque pointer for any driver-specific data that needs to be persistent from _prepare to _commit/_abort
 * reopen queue pointer, if the driver needs to queue additional BDS for a reopen

Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
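A sketch of the transactional usage described above; the signatures are assumed from the API of that period and the bs_top/bs_base variables and new_flags are illustrative:

    BlockReopenQueue *queue = NULL;
    Error *local_err = NULL;
    int ret;

    /* Queue every image that must change, then reopen them all at once. */
    queue = bdrv_reopen_queue(queue, bs_top, new_flags);
    queue = bdrv_reopen_queue(queue, bs_base, new_flags);

    ret = bdrv_reopen_multiple(queue, &local_err);
    if (ret < 0) {
        /* if any prepare step failed, every queued image is left untouched */
        error_report("%s", error_get_pretty(local_err));
        error_free(local_err);
    }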
-
- 13 Aug, 2012 1 commit
-
-
Luiz Capitulino authored
Several block/ files are relying on qerror.h being provided by qapi-types.h. Fix this, as a future commit will change qapi-types.h not to provide qerror.h.

Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
-
- 06 Aug, 2012 1 commit
-
-
Stefan Hajnoczi authored
Lazy refcounts is a performance optimization for qcow2 that postpones refcount metadata updates and instead marks the image dirty. In the case of crash or power failure the image will be left in a dirty state and repaired next time it is opened.

Reducing metadata I/O is important for cache=writethrough and cache=directsync because these modes guarantee that data is on disk after each write (hence we cannot take advantage of caching updates in RAM). Refcount metadata is not needed for guest->file block address translation and therefore does not need to be on-disk at the time of write completion - this is the motivation behind the lazy refcount optimization.

The lazy refcount optimization must be enabled at image creation time:

    qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
    qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough

Update qemu-iotests 031 and 036 since the extension header size changes when we add feature bit table entries.

Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
- 17 Jul, 2012 1 commit
-
-
Markus Armbruster authored
There are two producers of these hints: drive_init() on behalf of -drive, and hd_geometry_guess(). The only consumer of the hint is hd_geometry_guess(). The callers of hd_geometry_guess() call it only when drive_init() didn't set the hints. Therefore, drive_init()'s hints are never used.

Thus, hd_geometry_guess() only ever sees hints it produced itself in a prior call. Only the first call computes something, subsequent calls just repeat the first call's results. However, hd_geometry_guess() is never called more than once: the device models don't, and the block device is destroyed on unplug. Thus, dropping the repeat feature doesn't break anything now. If a block device wasn't destroyed on unplug and could be reused with a new device, then repeating old results would be wrong. Thus, dropping the repeat feature prevents future breakage.

This renders the hints unused. Purge them from the block layer.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-
- 15 Jun, 2012 1 commit
-
-
Kevin Wolf authored
The QED block driver already provides the functionality to not only detect inconsistencies in images, but also fix them. However, this functionality cannot be manually invoked with qemu-img; the check happens only automatically during bdrv_open(). This adds a -r switch to qemu-img check that allows manual invocation of an image repair.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
-