path: root/block/qcow2.h
Commit message | Author | Age | Files | Lines
* qcow2: Catch some L1 table index overflows | Kevin Wolf | 2013-05-14 | 1 | -2/+3
| This catches the situation that is described in the bug report at
| https://bugs.launchpad.net/qemu/+bug/865518 and goes like this:
|
|     $ qemu-img create -f qcow2 huge.qcow2 $((1024*1024))T
|     Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off
|
|     $ qemu-io /tmp/huge.qcow2 -c "write $((1024*1024*1024*1024*1024*1024 - 1024)) 512"
|     Segmentation fault
|
| With this patch applied the segfault will be avoided, however the case
| will still fail, though gracefully:
|
|     $ qemu-img create -f qcow2 /tmp/huge.qcow2 $((1024*1024))T
|     Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off
|     qemu-img: The image size is too large for file format 'qcow2'
|
| Note that even long before these overflow checks kick in, you get
| insanely high memory usage (up to INT_MAX * sizeof(uint64_t) = 16 GB for
| the L1 table), so with somewhat smaller image sizes you'll probably see
| qemu aborting for a failed g_malloc().
|
| If you need huge image sizes, you should increase the cluster size to
| the maximum of 2 MB in order to get higher limits.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
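The limit in the commit above can be reproduced with a little arithmetic. The following is a Python model, not QEMU code. It assumes that each L2 entry is 8 bytes (so one L2 table in a cluster holds cluster_size / 8 entries) and that the number of L1 entries must not exceed INT_MAX. The helper names `l1_entries` and `fits_in_l1` are made up for illustration.

```python
INT_MAX = 2**31 - 1      # bound on the L1 table index (assumed from the commit message)
L2_ENTRY_SIZE = 8        # assumption: every L2 entry is a 64-bit value


def l1_entries(virtual_size: int, cluster_size: int = 65536) -> int:
    """Number of L1 entries needed to map virtual_size bytes (hypothetical helper)."""
    l2_entries = cluster_size // L2_ENTRY_SIZE       # entries in one L2 table
    bytes_per_l1_entry = cluster_size * l2_entries   # guest bytes covered by one L1 entry
    return -(-virtual_size // bytes_per_l1_entry)    # ceiling division


def fits_in_l1(virtual_size: int, cluster_size: int = 65536) -> bool:
    """True if the L1 table index stays within INT_MAX."""
    return l1_entries(virtual_size, cluster_size) <= INT_MAX


size = (1024 * 1024) << 40   # the $((1024*1024))T image from the transcript
print(l1_entries(size, 65536), fits_in_l1(size, 65536))
print(l1_entries(size, 2 * 1024 * 1024), fits_in_l1(size, 2 * 1024 * 1024))
```

Under these assumptions the 64 KB default needs one L1 entry more than INT_MAX allows, while 2 MB clusters bring the table down to about two million entries, which is why the message recommends a larger cluster size for huge images.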
* aes: move aes.h from include/block to include/qemu | Aurelien Jarno | 2013-04-13 | 1 | -1/+1
| Move aes.h from include/block to include/qemu to show it can be reused
| by other subsystems.
|
| Cc: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
| Reviewed-by: Edgar E. Iglesias <edgar.iglesias@gmail.com>
| Reviewed-by: Richard Henderson <rth@twiddle.net>
| Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* qcow2: Allow requests with multiple l2metas | Kevin Wolf | 2013-03-28 | 1 | -0/+3
| Instead of expecting a single l2meta, have a list of them. This makes
| it possible to still have a single I/O request for the guest data, even
| though multiple l2metas may be needed in order to describe both a COW
| overwrite and a new cluster allocation (typical sequential write case).
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Finalise interface of handle_alloc() | Kevin Wolf | 2013-03-28 | 1 | -0/+5
| The interface works completely on a byte granularity now and duplicated
| parameters are removed.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: handle_alloc(): Get rid of keep_clusters parameter | Kevin Wolf | 2013-03-28 | 1 | -0/+5
| handle_alloc() is now called with the offset at which the actual new
| allocation starts instead of the offset at which the whole write request
| starts, part of which may already be processed.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Change handle_dependency to byte granularity | Kevin Wolf | 2013-03-28 | 1 | -0/+11
| This is a more precise description of what really constitutes a
| dependency. The behaviour doesn't change at this point because the COW
| area of the old request is still aligned to cluster boundaries and
| therefore an overlap is detected whenever the requests touch any part of
| the same cluster.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
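A byte-granularity dependency check comes down to intersecting two byte ranges. The sketch below is a Python illustration, not the QEMU implementation; `overlap` and `cow_area` are hypothetical helpers, and the 64 KB cluster size is only an example value. It shows why behaviour stays the same while the old request's COW area is still rounded out to cluster boundaries.

```python
def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Bytes shared by the half-open ranges [a_start, a_end) and [b_start, b_end)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def cow_area(offset: int, nbytes: int, cluster_size: int = 65536):
    """Round a write out to cluster boundaries (cluster_size must be a power of two)."""
    start = offset & ~(cluster_size - 1)
    end = (offset + nbytes + cluster_size - 1) & ~(cluster_size - 1)
    return start, end


old = (0, 512)        # in-flight request: 512 bytes at offset 0
new = (4096, 8192)    # later request in the same cluster

print(overlap(*old, *new))                  # raw byte ranges do not touch
print(overlap(*cow_area(0, 512), *new))     # cluster-aligned COW area does
```

With the cluster-aligned area the second request is still reported as dependent, matching the statement that an overlap is detected whenever two requests touch the same cluster.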
* qcow2: Handle dependencies earlier | Kevin Wolf | 2013-03-28 | 1 | -0/+5
| Handling overlapping allocations isn't just a detail of cluster
| allocation. It is rather one of three ways to get the host cluster
| offset for a write request:
|
| 1. If a request overlaps an in-flight allocation, the cluster offset
|    can be taken from there (this is what handle_dependencies will evolve
|    into) or the request must just wait until the allocation has
|    completed. Accessing the L2 is not valid in this case, it has
|    outdated information.
|
| 2. Outside overlapping areas, check the clusters that can be written to
|    as they are, with no COW involved.
|
| 3. If a COW is required, allocate new clusters
|
| Changing the code to reflect this doesn't change the behaviour because
| overlaps cannot exist for clusters that are kept in step 2. It does
| however make it easier for later patches to work on clusters that belong
| to an allocation that is still in flight.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Fix segfault in qcow2_invalidate_cache | Kevin Wolf | 2013-03-19 | 1 | -0/+3
| Need to pass an options QDict to qcow2_open() now. This fixes a segfault
| on the migration target with qcow2.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Allow lazy refcounts to be enabled on the command line | Kevin Wolf | 2013-03-15 | 1 | -0/+1
| qcow2 images now accept a boolean lazy_refcounts option. Use it like
| this:
|
|     -drive file=test.qcow2,lazy_refcounts=on
|
| If the option is specified on the command line, it overrides the default
| specified by the qcow2 header flags that were set when creating the
| image.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Eric Blake <eblake@redhat.com>
| Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
| Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: move include files to include/block/ | Paolo Bonzini | 2012-12-19 | 1 | -2/+2
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2 | Kevin Wolf | 2012-12-13 | 1 | -0/+2
| This is closer to where the dirty flag is really needed, and it avoids
| having checks for special cases related to cluster allocation directly
| in the writev loop.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Allocate l2meta only for cluster allocations | Kevin Wolf | 2012-12-13 | 1 | -2/+5
| Even for writes to already allocated clusters, an l2meta is allocated,
| though it stays effectively unused. After this patch, only allocating
| requests still have one. Each l2meta now describes an in-flight request
| that writes to clusters that are not yet hooked up in the L2 table.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Drop l2meta.cluster_offset | Kevin Wolf | 2012-12-13 | 1 | -4/+1
| There's no real reason to have an l2meta for normal requests that don't
| allocate anything. Before we can get rid of it, we must return the host
| cluster offset in a different way.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Introduce Qcow2COWRegion | Kevin Wolf | 2012-12-13 | 1 | -6/+23
| This makes it easier to address the areas for which a COW must be
| performed. As a nice side effect, the COW code in
| qcow2_alloc_cluster_link_l2 becomes really trivial.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Round QCowL2Meta.offset down to cluster boundary | Kevin Wolf | 2012-12-13 | 1 | -0/+22
| The offset within the cluster is already present as n_start and this is
| what the code uses. QCowL2Meta.offset is only needed at a cluster
| granularity.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
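Rounding an offset down to a cluster boundary is a single mask operation when the cluster size is a power of two. The snippet below is a Python sketch with hypothetical helper names; it works in bytes for simplicity, whereas the units of n_start in the actual code may differ.

```python
def start_of_cluster(offset: int, cluster_size: int) -> int:
    """Round offset down to the start of its cluster."""
    assert cluster_size > 0 and cluster_size & (cluster_size - 1) == 0, "power of two expected"
    return offset & ~(cluster_size - 1)


def offset_into_cluster(offset: int, cluster_size: int) -> int:
    """Distance of offset from the start of its cluster."""
    assert cluster_size > 0 and cluster_size & (cluster_size - 1) == 0, "power of two expected"
    return offset & (cluster_size - 1)


guest_offset = 70000
print(start_of_cluster(guest_offset, 65536), offset_into_cluster(guest_offset, 65536))
```

The two values always add up to the original offset, so keeping the in-cluster part separately loses no information.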
* qcow2: implement lazy refcounts | Stefan Hajnoczi | 2012-08-06 | 1 | -0/+13
| Lazy refcounts is a performance optimization for qcow2 that postpones
| refcount metadata updates and instead marks the image dirty. In the
| case of crash or power failure the image will be left in a dirty state
| and repaired next time it is opened.
|
| Reducing metadata I/O is important for cache=writethrough and
| cache=directsync because these modes guarantee that data is on disk
| after each write (hence we cannot take advantage of caching updates in
| RAM). Refcount metadata is not needed for guest->file block address
| translation and therefore does not need to be on-disk at the time of
| write completion - this is the motivation behind the lazy refcount
| optimization.
|
| The lazy refcount optimization must be enabled at image creation time:
|
|     qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
|     qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough
|
| Update qemu-iotests 031 and 036 since the extension header size changes
| when we add feature bit table entries.
|
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: introduce dirty bit | Stefan Hajnoczi | 2012-08-06 | 1 | -0/+8
| This patch adds an incompatible feature bit to mark images that have not
| been closed cleanly. When a dirty image file is opened a consistency
| check and repair is performed.
|
| Update qemu-iotests 031 and 036 since the extension header size changes
| when we add feature bit table entries.
|
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
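The open-time behaviour described above can be modelled in a few lines. This is a Python sketch, not the driver code. It assumes the dirty flag is bit 0 of the incompatible-features field and that the flag is cleared once the repair has run; the dictionary header and the `check_and_repair` callback are stand-ins invented for the example.

```python
INCOMPAT_DIRTY = 1 << 0   # assumption: dirty flag is bit 0 of the incompatible features


def open_image(header: dict, check_and_repair) -> dict:
    """Model of opening an image: repair first if it was not closed cleanly."""
    if header["incompatible_features"] & INCOMPAT_DIRTY:
        check_and_repair(header)                            # consistency check and repair
        header["incompatible_features"] &= ~INCOMPAT_DIRTY  # image is clean again
    return header


repairs = []   # records every header that needed a repair
dirty = open_image({"incompatible_features": INCOMPAT_DIRTY}, repairs.append)
clean = open_image({"incompatible_features": 0}, repairs.append)
print(dirty, clean, len(repairs))
```

Making the flag an incompatible feature means software that does not know about it should refuse the image rather than trust possibly stale refcounts.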
* qcow2: always operate caches in writeback mode | Paolo Bonzini | 2012-06-15 | 1 | -4/+1
| Writethrough does not need special-casing anymore in the qcow2 caches.
| The block layer adds flushes after every guest-initiated data write,
| and these will also flush the qcow2 caches to the OS.
|
| Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Support for fixing refcount inconsistencies | Kevin Wolf | 2012-06-15 | 1 | -1/+2
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Zero write support | Kevin Wolf | 2012-04-20 | 1 | -0/+1
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Support for feature table header extension | Kevin Wolf | 2012-04-20 | 1 | -0/+12
| Instead of printing an ugly bitmask, qemu can now print a more helpful
| string even for yet unknown features.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Support reading zero clusters | Kevin Wolf | 2012-04-20 | 1 | -0/+5
| This adds support for reading zero clusters in version 3 images.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Version 3 images | Kevin Wolf | 2012-04-20 | 1 | -1/+16
| This adds the basic infrastructure to qcow2 to handle version 3 images.
| It includes code to create v3 images, allow header updates for v3 images
| and checks feature bits.
|
| It still misses support for zero clusters, so this is not a fully
| compliant implementation of v3 yet.
|
| The default for creating new images stays at v2 for now.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Ignore reserved bits in refcount table entries | Kevin Wolf | 2012-04-20 | 1 | -0/+2
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Ignore reserved bits in get_cluster_offset | Kevin Wolf | 2012-04-20 | 1 | -0/+21
| With this change, reading from a qcow2 image ignores all reserved bits
| that are set in an L1 or L2 table entry.
|
| Now get_cluster_offset() assigns *cluster_offset only the offset without
| any other flags. The cluster type is no longer encoded in the offset,
| but in a positive return value in case of success.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
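Separating the cluster type from the offset amounts to masking the raw table entry. The Python sketch below illustrates the idea. The bit positions and the offset mask reflect my reading of the qcow2 format description (bit 63 "copied", bit 62 "compressed", bits 9 to 55 host offset) and should be checked against the header before relying on them. The function name, the type constants and the tuple return value are made up for the example; the real function returns the type and stores the offset through a pointer.

```python
OFLAG_COPIED = 1 << 63               # assumed flag bit, see lead-in
OFLAG_COMPRESSED = 1 << 62           # assumed flag bit, see lead-in
L2E_OFFSET_MASK = 0x00FFFFFFFFFFFE00  # assumed: bits 9..55 carry the host cluster offset

CLUSTER_UNALLOCATED, CLUSTER_NORMAL, CLUSTER_COMPRESSED = range(3)


def decode_l2_entry(entry: int):
    """Return (cluster_type, value) with flags and reserved bits stripped."""
    if entry & OFLAG_COMPRESSED:
        # Compressed entries use the low 62 bits as a descriptor, not a plain offset.
        return CLUSTER_COMPRESSED, entry & (OFLAG_COMPRESSED - 1)
    offset = entry & L2E_OFFSET_MASK   # drops the flags and every reserved bit
    if offset == 0:
        return CLUSTER_UNALLOCATED, 0
    return CLUSTER_NORMAL, offset


# A normal entry with the "copied" flag plus two stray reserved bits set.
raw = OFLAG_COPIED | (1 << 57) | 0x50000 | 1
print(decode_l2_entry(raw))
print(decode_l2_entry(0))
```

Because the caller now receives a clean offset, it never has to strip flag bits itself before using the value as a file position.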
* qcow2: Save disk size in snapshot header | Kevin Wolf | 2012-04-20 | 1 | -0/+1
| This allows different snapshots of an image to have different sizes,
| which is a requirement for enabling image resizing even with images
| that have internal snapshots.
|
| We don't do the actual support for it now, but make sure that the
| additional field is present and not completely ignored in all version 3
| images. When trying to load a snapshot of different size, it returns
| an error.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Reduce number of I/O requests | Kevin Wolf | 2012-03-12 | 1 | -0/+1
| If the first part of a write request is allocated, but the second isn't
| and it can be allocated so that the resulting area is contiguous, handle
| it at once. This is a common case for sequential writes.
|
| After this patch, alloc_cluster_offset() only checks if the clusters are
| already allocated or how many new clusters can be allocated
| contiguously. The actual cluster allocation is split off into a new
| function do_alloc_cluster_offset().
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
* qcow2: Add qcow2_alloc_clusters_at() | Kevin Wolf | 2012-03-12 | 1 | -0/+2
| This function makes it possible to allocate clusters at a given offset
| in the image file. This is useful if you want to allocate the second
| part of an area that must be contiguous.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
* qcow2: Keep unknown header extension when rewriting header | Kevin Wolf | 2012-02-09 | 1 | -0/+8
| If we want header extensions to work as compatible extensions, we can't
| destroy yet unknown header extensions when rewriting the header (e.g.
| for changing the backing file). Save all unknown header extensions in a
| list of blobs and include them in a new header.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
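Keeping unknown extensions means parsing every extension into an opaque blob and writing the blobs back unchanged. The Python sketch below shows one way to do that round trip. It assumes the extension layout from the qcow2 format description as I understand it: a big-endian 32-bit type, a 32-bit length, the data, padding to a multiple of 8 bytes, and a type of 0 as the end marker. The function names and the extension type 0x12345678 are invented for the example.

```python
import struct

EXT_END = 0x00000000   # assumed end-of-extensions marker


def parse_extensions(buf: bytes):
    """Split the extension area into (type, data) blobs without interpreting them."""
    exts, pos = [], 0
    while pos + 8 <= len(buf):
        ext_type, length = struct.unpack_from(">II", buf, pos)
        pos += 8
        if ext_type == EXT_END:
            break
        exts.append((ext_type, buf[pos:pos + length]))
        pos += (length + 7) & ~7            # data is padded to 8 bytes
    return exts


def serialize_extensions(exts) -> bytes:
    """Write all blobs back, known or not, followed by the end marker."""
    out = bytearray()
    for ext_type, data in exts:
        out += struct.pack(">II", ext_type, len(data))
        out += data + b"\0" * (-len(data) % 8)
    out += struct.pack(">II", EXT_END, 0)
    return bytes(out)


original = [(0xE2792ACA, b"raw"), (0x12345678, b"opaque!!x")]
rewritten = parse_extensions(serialize_extensions(original))
print(rewritten == original)
```

Since the blobs are never interpreted, an extension introduced by a newer version survives a header rewrite by an older one.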
* qcow2: Update whole header at once | Kevin Wolf | 2012-02-09 | 1 | -0/+1
| In order to switch the backing file, qcow2 issues multiple write
| requests that each change only a part of the image header. Any failure
| after the first one would leave the header in a corrupted state. With
| this patch, the whole header is written at once, so we can't fail in the
| middle.
|
| At the same time, this gives us a reusable function that updates all
| fields of the qcow2 header and not only the backing file.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Allow >4 GB VM state | Kevin Wolf | 2011-12-15 | 1 | -1/+1
| This is a compatible extension to the snapshot header format that allows
| saving a 64 bit VM state size.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: implement bdrv_invalidate_cache (v2) | Anthony Liguori | 2011-11-21 | 1 | -0/+2
| We don't reopen the actual file, but instead invoke the close and open
| routines. We specifically ignore the backing file since its contents
| are read-only and therefore immutable.
|
| Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
* qcow2: removed unused depends_on field | Frediano Ziglio | 2011-09-12 | 1 | -1/+0
| Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: remove unused qcow2_create_refcount_update function | Frediano Ziglio | 2011-08-25 | 1 | -2/+0
| Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Use coroutines | Kevin Wolf | 2011-08-02 | 1 | -1/+4
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Use Qcow2Cache in writeback mode during loadvm/savevm | Kevin Wolf | 2011-07-19 | 1 | -0/+2
| In snapshotting there is no guest involved, so we can safely use a
| writeback mode and do the flushes in the right place (i.e. at the very
| end). This improves the time that creating/restoring an internal
| snapshot takes with an image in writethrough mode.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qemu-img create: Fix displayed default cluster size | Kevin Wolf | 2011-06-08 | 1 | -0/+2
| When not specifying a cluster size on the command line, qemu-img printed
| a cluster size of 0:
|
|     Formatting '/tmp/test.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=0
|
| This patch adds the default cluster size to the QEMUOptionParameter
| list, so that it displays the default value that is used.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Add bdrv_discard support | Kevin Wolf | 2011-01-31 | 1 | -0/+2
| This adds a bdrv_discard function to qcow2 that frees the discarded
| clusters. It does not yet pass the discard on to the underlying file
| system driver, but the space can be reused by future writes to the
| image.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
* qcow2: Batch flushes for COW | Kevin Wolf | 2011-01-24 | 1 | -0/+1
| qcow2 calls bdrv_flush() after performing COW in order to ensure that
| the L2 table change is never written before the copy is safe on disk.
| Now that the L2 table is cached, we can wait with flushing until we
| write out the next L2 table.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Use QcowCache | Kevin Wolf | 2011-01-24 | 1 | -5/+7
| Use the new functions of qcow2-cache.c for everything that works on
| refcount block and L2 tables.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Add QcowCache | Kevin Wolf | 2011-01-24 | 1 | -0/+19
| This adds some new cache functions to qcow2 which can be used for
| caching refcount blocks and L2 tables. When used with
| cache=writethrough they work like the old caching code which is spread
| all over qcow2, so for this case we have merely a cleanup.
|
| The interesting case is with writeback caching (this includes
| cache=none) where data isn't written to disk immediately but only kept
| in cache initially. This leads to some form of metadata write batching
| which avoids the current "write to refcount block, flush, write to L2
| table" pattern for each single request when a lot of cluster allocations
| happen. Instead, cache entries are only written out if it's required to
| maintain the right order. In the pure cluster allocation case this means
| that all metadata updates for requests are done in memory initially and
| on sync, first the refcount blocks are written to disk, then fsync, then
| L2 tables.
|
| This improves performance of scenarios with lots of cluster allocations
| noticeably (e.g. installation or after taking a snapshot).
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
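The ordering rule from the commit above (refcount blocks first, then fsync, then L2 tables) can be modelled with two small caches and a dependency between them. This is a toy Python model, not the qcow2-cache.c code; the class, its methods and the log strings are invented, and real caches track whole tables and eviction rather than a dictionary.

```python
class MetadataCache:
    """Toy writeback cache that records the order in which it would hit the disk."""

    def __init__(self, name, log):
        self.name = name
        self.log = log            # shared list of simulated disk operations
        self.dirty = {}           # table offset -> pending contents
        self.depends_on = None    # cache that must be on disk before this one

    def update(self, offset, data):
        self.dirty[offset] = data  # repeated updates to a table are merged in memory

    def flush(self):
        if self.depends_on is not None:
            self.depends_on.flush()
            self.log.append("fsync")   # order barrier between the two caches
            self.depends_on = None
        for offset in sorted(self.dirty):
            self.log.append(f"write {self.name} at {offset:#x}")
        self.dirty.clear()


log = []
refcounts = MetadataCache("refcount block", log)
l2 = MetadataCache("L2 table", log)

for i in range(3):   # three cluster allocations that touch the same tables
    refcounts.update(0x10000, f"refcount for cluster {i}")
    l2.update(0x20000, f"mapping for cluster {i}")
    l2.depends_on = refcounts   # a mapping must never reach disk before its refcount

l2.flush()
print(log)
```

In this model three allocations produce one refcount write, one fsync and one L2 write, instead of repeating the write, flush, write pattern per request.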
* block: Remove unused s->hd in various drivers | Kevin Wolf | 2010-11-24 | 1 | -1/+0
| All drivers use bs->file instead of s->hd for quite a while now, so it's
| time to remove s->hd.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
* Copy snapshots out of QCOW2 disk | edison | 2010-10-22 | 1 | -0/+1
| In order to back up snapshots created from a QCOW2 image, we want to
| copy snapshots out of the QCOW2 disk to separate storage.
| The following patch adds a new option in "qemu-img":
|
|     qemu-img convert -f qcow2 -O qcow2 -s snapshot_name src_img bck_img
|
| Right now, it only supports copying the full snapshot; delta snapshots
| are on the way.
|
| Changes from V1: all the comments from Kevin are addressed:
|     Add read-only checking
|     Fix coding style
|     Change the name from bdrv_snapshot_load to bdrv_snapshot_load_tmp
|
| Signed-off-by: Disheng Su <edison@cloud.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Support exact L1 table growth | Stefan Hajnoczi | 2010-10-22 | 1 | -1/+1
| The L1 table grow operation includes a size calculation that bumps up
| the new L1 table size in order to anticipate the size needs of vmstate
| data. This helps reduce the number of times that the L1 table has to be
| grown when vmstate data is appended.
|
| This size overhead is not necessary during image creation,
| bdrv_truncate(), or snapshot goto operations. In fact, existing
| qemu-iotests that exercise table growth are no longer able to trigger it
| because image creation preallocates an L1 table that is too large after
| changes to qcow_create2().
|
| This patch keeps the size calculation but also adds exact growth for
| callers that do not want to inflate the L1 table size unnecessarily.
|
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
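The difference between inflated and exact growth is easy to see in a small model. The Python sketch below is illustrative only: the function name is hypothetical and the growth factor of roughly 1.5 is an assumption chosen for the example, since the commit message only says that the size is "bumped up".

```python
def grown_l1_size(current: int, min_size: int, exact: bool) -> int:
    """New L1 table size (in entries) needed to hold at least min_size entries."""
    if min_size <= current:
        return current               # already large enough, nothing to grow
    if exact:
        return min_size              # callers such as image creation or truncate
    new_size = max(current, 1)
    while new_size < min_size:       # leave headroom for data appended later
        new_size = (new_size * 3 + 1) // 2
    return new_size


print(grown_l1_size(4, 5, exact=True))    # grows to exactly what was asked for
print(grown_l1_size(4, 5, exact=False))   # grows past the request
```

With exact growth a test can force the table to grow by a single entry, which is what the commit message says the existing qemu-iotests could no longer trigger.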
* qcow2: Avoid bounce buffers for AIO read requests | Kevin Wolf | 2010-09-21 | 1 | -2/+2
| qcow2 used to use bounce buffers for any AIO requests. This does not
| only imply unnecessary copying, but also unbounded allocations which
| should be avoided.
|
| This patch removes bounce buffers from the normal AIO read path, and
| constrains them to a constant size for encrypted images.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2/vdi: Change check to distinguish error cases | Kevin Wolf | 2010-07-06 | 1 | -1/+1
| This distinguishes between harmless leaks and real corruption. Hopefully
| users better understand what qemu-img check wants to tell them.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Allow qcow2_get_cluster_offset to return errors | Kevin Wolf | 2010-05-28 | 1 | -2/+2
| qcow2_get_cluster_offset() looks up a given virtual disk offset and
| returns the offset of the corresponding cluster in the image file.
| Errors (e.g. L2 table can't be read) are currently indicated by a return
| value of 0, which is unfortunately the same as for any unallocated
| cluster. So in effect we can't check for errors.
|
| This makes the old return value a by-reference parameter and returns the
| usual 0/-errno error code.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Implement bdrv_truncate() for growing images | Stefan Hajnoczi | 2010-05-03 | 1 | -0/+6
| This patch adds the ability to grow qcow2 images in-place using
| bdrv_truncate(). This enables qemu-img resize command support for
| qcow2.
|
| Snapshots are not supported and bdrv_truncate() will return -ENOTSUP.
| The notion of resizing an image with snapshots could lead to confusion:
| users may expect snapshots to remain unchanged, but this is not possible
| with the current qcow2 on-disk format where the header.size field is
| global instead of per-snapshot. Others may expect snapshots to change
| size along with the current image data. I think it is safest to not
| support snapshots and perhaps add behavior later if there is a
| consensus.
|
| Backing images continue to work. If the image is now larger than its
| backing image, zeroes are read when accessing beyond the end of the
| backing image.
|
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
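The rule for reads past the end of a shorter backing image can be shown with a byte string standing in for the backing file. This is a Python illustration with a made-up function name, not the block layer code.

```python
def read_backing(backing: bytes, offset: int, length: int) -> bytes:
    """Read from the backing image, returning zeroes beyond its end."""
    data = backing[offset:offset + length]          # empty if offset is past the end
    return data + b"\0" * (length - len(data))      # pad the missing tail with zeroes


backing_image = b"abcd"                      # a 4-byte "backing image"
print(read_backing(backing_image, 2, 4))     # straddles the end
print(read_backing(backing_image, 10, 3))    # entirely beyond the end
```

Padding with zeroes keeps the grown part of the image well defined without touching the backing file.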
* block: Open the underlying image file in generic code | Kevin Wolf | 2010-05-03 | 1 | -1/+1
| Format drivers shouldn't need to bother with things like file names, but
| rather just get an open BlockDriverState for the underlying protocol.
| This patch introduces this behaviour for bdrv_open implementation. For
| protocols which need to access the filename to open their
| file/device/connection/... a new callback bdrv_file_open is introduced
| which doesn't get an underlying file opened.
|
| For now, also some of the more obscure formats use bdrv_file_open
| because they open() the file themselves instead of using the block.c
| functions. They need to be fixed in later patches.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Fix signedness bugs | Kevin Wolf | 2010-02-10 | 1 | -4/+2
| Checking for return codes < 0 isn't really going to work with unsigned
| types. Use signed types instead.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>