path: root/block/qcow2-cluster.c
Commit message | Author | Age | Files | Lines
* qcow2: use start_of_cluster() and offset_into_cluster() everywhere | Hu Tao | 2013-12-06 | 1 | -1/+1
    Signed-off-by: Hu Tao <hutao@cn.fujitsu.com>
    Reviewed-by: Fam Zheng <famz@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
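The two helpers named in the commit subject above split a byte offset into a cluster-aligned base and a remainder. The following is a minimal standalone sketch of that idea, assuming a power-of-two cluster size. The `sk_` names and the plain `cluster_size` parameter are hypothetical; the helpers in QEMU itself take the driver state rather than a bare size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone variants for illustration only. */

/* Round offset down to the start of the cluster that contains it. */
static inline int64_t sk_start_of_cluster(int64_t cluster_size, int64_t offset)
{
    /* The mask trick below only works for power-of-two cluster sizes. */
    assert(cluster_size > 0 && (cluster_size & (cluster_size - 1)) == 0);
    return offset & ~(cluster_size - 1);
}

/* Byte position of offset inside its cluster. */
static inline int64_t sk_offset_into_cluster(int64_t cluster_size, int64_t offset)
{
    assert(cluster_size > 0 && (cluster_size & (cluster_size - 1)) == 0);
    return offset & (cluster_size - 1);
}
```

With 64 KiB clusters, offset 70000 lies in the cluster starting at 65536, at byte 4464 within it.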
* block: add flags to bdrv_*_write_zeroes | Peter Lieven | 2013-11-28 | 1 | -1/+1
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Signed-off-by: Peter Lieven <pl@kamp.de>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: fix possible corruption when reading multiple clusters | Peter Lieven | 2013-11-14 | 1 | -1/+1
    If multiple sectors spanning multiple clusters are read, the function count_contiguous_clusters should ensure that the cluster type does not change between the clusters. In particular, the for-loop should break when we have one or more normal clusters followed by a compressed cluster. Unfortunately, the wrong macro was used in the mask to compare the flags.

    This was discovered while debugging a data corruption issue when converting a compressed qcow2 image to raw. qemu-img reads 2MB chunks which span multiple clusters.

    CC: qemu-stable@nongnu.org
    Signed-off-by: Peter Lieven <pl@kamp.de>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* bswap.h: Remove cpu_to_be64wu() | Peter Maydell | 2013-11-05 | 1 | -1/+1
    Replace the legacy cpu_to_be64wu() with stq_be_p().

    Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    Reviewed-by: Richard Henderson <rth@twiddle.net>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Message-id: 1383669517-25598-9-git-send-email-peter.maydell@linaro.org
    Signed-off-by: Anthony Liguori <aliguori@amazon.com>
* qcow2: Use negated overflow check mask | Max Reitz | 2013-10-11 | 1 | -9/+7
    In qcow2_check_metadata_overlap and qcow2_pre_write_overlap_check, change the parameter signifying the checks to perform from its current positive form to a negative one, i.e., it will no longer explicitly specify every check to perform but rather a mask of checks not to perform.

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
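The change from a positive to a negated mask described above can be sketched as follows. The enum names and bit assignments are hypothetical and chosen only for illustration; they are not the identifiers or values used in QEMU.

```c
#include <assert.h>

/* Hypothetical check bits for illustration; not QEMU's identifiers. */
enum {
    SK_OL_MAIN_HEADER = 1 << 0,
    SK_OL_ACTIVE_L1   = 1 << 1,
    SK_OL_ACTIVE_L2   = 1 << 2,
    SK_OL_ALL         = (1 << 3) - 1
};

/* Negated form: the caller names the checks to skip, and every other
 * known check is performed. */
static int sk_checks_to_perform(int ignore_mask)
{
    /* Reject bits that do not correspond to any known check. */
    assert((ignore_mask & ~SK_OL_ALL) == 0);
    return SK_OL_ALL & ~ignore_mask;
}
```

One consequence of this shape is that a caller passing 0 gets every check, so a check added later is enabled for existing callers without touching them.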
* qcow2: Free allocated L2 cluster on error | Max Reitz | 2013-10-07 | 1 | -0/+4
    If an error occurs in l2_allocate, the allocated (but unused) L2 cluster should be freed.

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Reviewed-by: Benoit Canet <benoit@irqsave.net>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Switch L1 table in a single sequence | Max Reitz | 2013-10-02 | 1 | -2/+5
    Switching the L1 table in memory should be an atomic operation, as far as possible. Calling qcow2_free_clusters on the old L1 table on disk is not a good idea when the old L1 table is no longer valid and the address to the new one hasn't yet been written into the corresponding BDRVQcowState field. To be more specific, this can lead to segfaults due to qcow2_check_metadata_overlap trying to access the L1 table during the free operation.

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Remove useless count_contiguous_clusters() parameter | Kevin Wolf | 2013-09-27 | 1 | -6/+6
    All callers pass start = 0, and it's doubtful if any other value would actually do what you expect. Remove the parameter.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Reviewed-by: Jeff Cody <jcody@redhat.com>
* qcow2: COMPRESSED on count_contiguous_clusters | Max Reitz | 2013-09-27 | 1 | -4/+2
    Compressed clusters can never be contiguous, therefore the corresponding flag does not need to be given explicitly to count_contiguous_clusters.

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: count_contiguous_clusters and compression | Max Reitz | 2013-09-27 | 1 | -2/+5
    The function is not intended to be used on compressed clusters and will not work correctly, if used anyway, since L2E_OFFSET_MASK is not the right mask for determining the offset of compressed clusters. Therefore, assert that the first cluster is not compressed and always include the compression flag in the mask of significant flags, i.e., stop the search as soon as a compressed cluster occurs.

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
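The behaviour described above (assert on the first entry, always treat the compression flag as significant, stop at the first mismatch) can be illustrated with a simplified stand-in. This is a sketch, not the function from qcow2-cluster.c: the `sk_` names, the signature, and the constant values are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants; treat the exact bit positions as assumptions. */
#define SK_OFLAG_COMPRESSED (1ULL << 62)
#define SK_OFFSET_MASK      0x00fffffffffffe00ULL

/* Count how many L2 entries, starting at index 0, point to host clusters
 * that are contiguous and agree on all significant flags. */
static int sk_count_contiguous(int nb_clusters, uint64_t cluster_size,
                               const uint64_t *l2_table, uint64_t stop_flags)
{
    /* The compression flag is always significant, whatever the caller passes. */
    uint64_t mask = stop_flags | SK_OFFSET_MASK | SK_OFLAG_COMPRESSED;
    uint64_t first;
    int i;

    if (nb_clusters <= 0) {
        return 0;
    }

    /* Not meant to be called on a compressed first cluster. */
    assert(!(l2_table[0] & SK_OFLAG_COMPRESSED));

    first = l2_table[0] & mask;
    if (first == 0) {
        return 0; /* unallocated first entry: nothing to count */
    }

    for (i = 0; i < nb_clusters; i++) {
        /* A compressed entry keeps its flag bit under the mask and so can
         * never equal the expected plain offset. */
        if ((l2_table[i] & mask) != first + (uint64_t)i * cluster_size) {
            break;
        }
    }
    return i;
}
```

Because the flag bit survives the masking, a compressed entry ends the run even when its offset bits happen to continue the sequence.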
* qcow2: Free only newly allocated clusters on error | Max Reitz | 2013-09-27 | 1 | -6/+10
    In expand_zero_clusters_in_l1, a new cluster is only allocated if it was not already preallocated. On error, such preallocated clusters should not be freed, but only the newly allocated ones.

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Always use error path in l2_allocate | Max Reitz | 2013-09-27 | 1 | -2/+3
    Just returning -errno in some cases prevents trace_qcow2_l2_allocate_done from being executed (and, in one case, also the unused allocated L2 table from being freed). Always going down the error path fixes this.

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Don't put invalid L2 table into cache | Max Reitz | 2013-09-27 | 1 | -2/+4
    In l2_allocate, the fail path is executed if qcow2_cache_flush fails. However, the L2 table has not yet been fetched from the L2 table cache. The qcow2_cache_put in the fail path therefore basically gives an undefined argument as the L2 table address (in this case).

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Correct bitmap size in zero expansion | Max Reitz | 2013-09-27 | 1 | -11/+27
    Since the expanded_clusters bitmap is addressed using host offsets in the underlying image file, the correct size to use for allocating the bitmap is not determined by the guest disk image but by the underlying host image file.

    Furthermore, this size may change during the expansion due to cluster allocations on growable image files. In this case, the bitmap needs to be resized as well to reflect the growth.

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Assert against currently impossible overflow | Max Reitz | 2013-09-25 | 1 | -0/+1
    If qcow2_alloc_cluster_link_l2 is called with a QCowL2Meta describing a request crossing L2 boundaries, a buffer overflow will occur. This is impossible right now since such requests are never generated (every request is shortened to L2 boundaries before) and probably also completely unintended (considering the name "QCowL2Meta"), however, it is still worth an assertion.

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2-cluster: Expand zero clusters | Max Reitz | 2013-09-12 | 1 | -0/+233
    Add functionality for expanding zero clusters. This is necessary for downgrading the image version to one without zero cluster support.

    For non-backed images, this function may also just discard zero clusters instead of truly expanding them.

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Pass discard type to qcow2_discard_clusters() | Kevin Wolf | 2013-09-12 | 1 | -4/+4
    The function will be used internally instead of only being called for guest discard requests.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Reviewed-by: Max Reitz <mreitz@redhat.com>
* qcow2-refcount: Repair OFLAG_COPIED errors | Max Reitz | 2013-08-30 | 1 | -2/+2
    Since the OFLAG_COPIED checks are now executed after the refcounts have been repaired (if repairing), it is safe to assume that they are correct but the OFLAG_COPIED flag may not be. Therefore, if its value differs from what it should be (considering the corresponding refcount), that discrepancy can be repaired by correctly setting (or clearing) that flag.

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Employ metadata overlap checks | Max Reitz | 2013-08-30 | 1 | -0/+21
    The pre-write overlap check function is now called before most of the qcow2 writes (aborting it on collision or other error).

    Signed-off-by: Max Reitz <mreitz@redhat.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Batch discards | Kevin Wolf | 2013-06-24 | 1 | -4/+18
    This optimises the discard operation for freed clusters by batching discard requests (both snapshot deletion and bdrv_discard end up updating the refcounts cluster by cluster).

    Note that we don't discard asynchronously, but keep s->lock held. This is to avoid that a freed cluster is reallocated and written to while the discard is still in flight.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Add refcount update reason to all callers | Kevin Wolf | 2013-06-24 | 1 | -6/+13
    This adds a refcount update reason to all callers of update_refcounts(), so that a follow-up patch can use this information to decide whether clusters that reach a refcount of 0 should be discarded in the image file.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Catch some L1 table index overflows | Kevin Wolf | 2013-05-14 | 1 | -8/+15
    This catches the situation that is described in the bug report at https://bugs.launchpad.net/qemu/+bug/865518 and goes like this:

        $ qemu-img create -f qcow2 huge.qcow2 $((1024*1024))T
        Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off
        $ qemu-io /tmp/huge.qcow2 -c "write $((1024*1024*1024*1024*1024*1024 - 1024)) 512"
        Segmentation fault

    With this patch applied the segfault will be avoided, however the case will still fail, though gracefully:

        $ qemu-img create -f qcow2 /tmp/huge.qcow2 $((1024*1024))T
        Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=65536 lazy_refcounts=off
        qemu-img: The image size is too large for file format 'qcow2'

    Note that even long before these overflow checks kick in, you get insanely high memory usage (up to INT_MAX * sizeof(uint64_t) = 16 GB for the L1 table), so with somewhat smaller image sizes you'll probably see qemu aborting for a failed g_malloc().

    If you need huge image sizes, you should increase the cluster size to the maximum of 2 MB in order to get higher limits.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
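The numbers in the message above can be checked with a short calculation. The sketch below assumes the classic qcow2 layout, in which each L2 table occupies one cluster and each L2 entry is 8 bytes, and a platform where `int` is 32 bits. The function names are hypothetical and this is not the check added by the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper, not QEMU code: number of L1 entries needed to map
 * image_size bytes, assuming 8-byte L2 entries and one cluster per L2 table. */
static uint64_t sk_l1_entries(uint64_t image_size, unsigned cluster_bits)
{
    unsigned l2_bits, shift;
    uint64_t mask;

    /* qcow2 cluster sizes range from 512 bytes to 2 MB. */
    assert(cluster_bits >= 9 && cluster_bits <= 21);

    l2_bits = cluster_bits - 3;       /* entries per L2 table = cluster_size / 8 */
    shift = cluster_bits + l2_bits;   /* one L1 entry maps 2^shift bytes */
    mask = (1ULL << shift) - 1;

    /* Round up without overflowing for sizes close to UINT64_MAX. */
    return (image_size >> shift) + ((image_size & mask) != 0);
}

/* Does the highest L1 index still fit into a signed int? */
static int sk_l1_index_fits_int(uint64_t image_size, unsigned cluster_bits)
{
    return sk_l1_entries(image_size, cluster_bits) <= (uint64_t)INT_MAX;
}
```

For the 2^60-byte image from the bug report and 64 KiB clusters, one L1 entry maps 2^29 bytes, so 2^31 entries are needed, one more than a 32-bit `INT_MAX`. With 2 MB clusters one L1 entry maps 2^39 bytes and the same image needs only 2^21 entries, which is why the message recommends a larger cluster size.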
* qcow2: Gather clusters in a looping loop | Kevin Wolf | 2013-03-28 | 1 | -31/+43
    Instead of just checking once in exactly this order if there are dependencies, non-COW clusters and new allocation, this starts looping around these. This way we can, for example, gather non-COW clusters after new allocations as long as the host cluster offsets stay contiguous.

    Once handle_dependencies() is extended so that COW areas of in-flight allocations can be overwritten, this makes it possible to continue with gathering other clusters (we wouldn't be able to do that without this change because we would have missed a possible second dependency in one of the next clusters).

    This means that in the typical sequential write case, we can combine the COW overwrite of one cluster with the allocation of the next cluster as soon as something like Delayed COW gets actually implemented. It is only by avoiding splitting requests this way that Delayed COW actually starts improving performance noticeably.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Move cluster gathering to a non-looping loop | Kevin Wolf | 2013-03-28 | 1 | -64/+70
    This patch is mainly to separate the indentation change from the semantic changes. All that really changes here is that everything moves into a while loop, all 'goto done' become 'break' and at the end of the loop a new 'break' is inserted.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Allow requests with multiple l2metas | Kevin Wolf | 2013-03-28 | 1 | -0/+3
    Instead of expecting a single l2meta, have a list of them. This allows to still have a single I/O request for the guest data, even though multiple l2meta may be needed in order to describe both a COW overwrite and a new cluster allocation (typical sequential write case).

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Use byte granularity in qcow2_alloc_cluster_offset() | Kevin Wolf | 2013-03-28 | 1 | -56/+28
    This gets rid of the nb_clusters and keep_clusters and the associated complicated calculations. Just advance the number of bytes that have been processed and everything is fine.

    This patch advances the variables even after the last operation even though they aren't used any more afterwards to make things look more uniform. A later patch will turn the whole thing into a loop and then it actually starts making sense.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Prepare handle_alloc/copied() for byte granularity | Kevin Wolf | 2013-03-28 | 1 | -9/+16
    This makes handle_alloc() and handle_copied() return byte-granularity host offsets instead of returning always the cluster start. This is required so that qcow2_alloc_cluster_offset() can stop aligning everything to cluster boundaries.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: handle_copied(): Implement non-zero host_offset | Kevin Wolf | 2013-03-28 | 1 | -8/+20
    Look only for clusters that start at a given physical offset.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: handle_copied(): Get rid of keep_clusters parameter | Kevin Wolf | 2013-03-28 | 1 | -10/+13
    Now *bytes is used to return the length of the area that can be written to without performing an allocation or COW.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: handle_copied(): Get rid of nb_clusters parameter | Kevin Wolf | 2013-03-28 | 1 | -6/+18
    handle_copied() uses its bytes parameter now to determine how many clusters it should try to find.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Factor out handle_copied() | Kevin Wolf | 2013-03-28 | 1 | -40/+94
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Clean up handle_alloc() | Kevin Wolf | 2013-03-28 | 1 | -57/+53
    Things can be simplified a bit now. No semantic changes.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Finalise interface of handle_alloc() | Kevin Wolf | 2013-03-28 | 1 | -13/+16
    The interface works completely on a byte granularity now and duplicated parameters are removed.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: handle_alloc(): Get rid of keep_clusters parameter | Kevin Wolf | 2013-03-28 | 1 | -17/+27
    handle_alloc() is now called with the offset at which the actual new allocation starts instead of the offset at which the whole write request starts, part of which may already be processed.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: handle_alloc(): Get rid of nb_clusters parameter | Kevin Wolf | 2013-03-28 | 1 | -4/+15
    We already communicate the same information in *bytes.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Factor out handle_alloc() | Kevin Wolf | 2013-03-28 | 1 | -89/+151
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Decouple cluster allocation from cluster reuse code | Kevin Wolf | 2013-03-28 | 1 | -15/+20
    This moves some code that prepares the allocation of new clusters to where the actual allocation happens. This is the minimum required to be able to move it to a separate function in the next patch.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Change handle_dependency to byte granularity | Kevin Wolf | 2013-03-28 | 1 | -12/+28
    This is a more precise description of what really constitutes a dependency. The behaviour doesn't change at this point because the COW area of the old request is still aligned to cluster boundaries and therefore an overlap is detected whenever the requests touch any part of the same cluster.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Improve check for overlapping allocations | Kevin Wolf | 2013-03-28 | 1 | -1/+1
    The old code detected an overlapping allocation even when the allocations didn't actually overlap, but were only adjacent.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
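The difference between overlapping and merely adjacent ranges comes down to whether the comparison is strict. The sketch below shows one common way to test half-open byte ranges; it is an illustration of the distinction, not necessarily the expression used in the patch, and `sk_ranges_overlap` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: do the half-open ranges [a_start, a_end) and
 * [b_start, b_end) share at least one byte? */
static bool sk_ranges_overlap(uint64_t a_start, uint64_t a_end,
                              uint64_t b_start, uint64_t b_end)
{
    assert(a_start <= a_end && b_start <= b_end);

    /* Strict comparisons: ranges that only touch at a boundary do not
     * overlap. Using <= here would report adjacent ranges as overlapping. */
    return a_start < b_end && b_start < a_end;
}
```

With 64 KiB clusters, the ranges [0, 65536) and [65536, 131072) are adjacent and the function returns false, while extending the first range by one byte makes it return true.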
* qcow2: Handle dependencies earlier | Kevin Wolf | 2013-03-28 | 1 | -16/+43
    Handling overlapping allocations isn't just a detail of cluster allocation. It is rather one of three ways to get the host cluster offset for a write request:

    1. If a request overlaps an in-flight allocation, the cluster offset can be taken from there (this is what handle_dependencies will evolve into) or the request must just wait until the allocation has completed. Accessing the L2 is not valid in this case, it has outdated information.

    2. Outside overlapping areas, check the clusters that can be written to as they are, with no COW involved.

    3. If a COW is required, allocate new clusters.

    Changing the code to reflect this doesn't change the behaviour because overlaps cannot exist for clusters that are kept in step 2. It does however make it easier for later patches to work on clusters that belong to an allocation that is still in flight.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: make is_allocated return true for zero clusters | Paolo Bonzini | 2013-03-15 | 1 | -0/+3
    Otherwise, live migration of the top layer will miss zero clusters and let the backing file show through. This also matches what is done in qed.

    QCOW2_CLUSTER_ZERO clusters are invalid in v2 image files. Check this directly in qcow2_get_cluster_offset instead of replicating the test everywhere.

    Cc: qemu-stable@nongnu.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* qcow2: Allow lazy refcounts to be enabled on the command line | Kevin Wolf | 2013-03-15 | 1 | -1/+1
    qcow2 images now accept a boolean lazy_refcounts option. Use it like this:

        -drive file=test.qcow2,lazy_refcounts=on

    If the option is specified on the command line, it overrides the default specified by the qcow2 header flags that were set when creating the image.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: move include files to include/block/ | Paolo Bonzini | 2012-12-19 | 1 | -1/+1
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
* qcow2: Factor out handle_dependencies() | Kevin Wolf | 2012-12-13 | 1 | -28/+42
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Enable dirty flag in qcow2_alloc_cluster_link_l2 | Kevin Wolf | 2012-12-13 | 1 | -1/+4
    This is closer to where the dirty flag is really needed, and it avoids having checks for special cases related to cluster allocation directly in the writev loop.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Allocate l2meta only for cluster allocations | Kevin Wolf | 2012-12-13 | 1 | -14/+9
    Even for writes to already allocated clusters, an l2meta is allocated, though it stays effectively unused. After this patch, only allocating requests still have one. Each l2meta now describes an in-flight request that writes to clusters that are not yet hooked up in the L2 table.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Drop l2meta.cluster_offset | Kevin Wolf | 2012-12-13 | 1 | -4/+6
    There's no real reason to have an l2meta for normal requests that don't allocate anything. Before we can get rid of it, we must return the host cluster offset in a different way.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Introduce Qcow2COWRegion | Kevin Wolf | 2012-12-13 | 1 | -30/+53
    This makes it easier to address the areas for which a COW must be performed. As a nice side effect, the COW code in qcow2_alloc_cluster_link_l2 becomes really trivial.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Round QCowL2Meta.offset down to cluster boundary | Kevin Wolf | 2012-12-13 | 1 | -2/+2
    The offset within the cluster is already present as n_start and this is what the code uses. QCowL2Meta.offset is only needed at a cluster granularity.

    Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: implement lazy refcounts | Stefan Hajnoczi | 2012-08-06 | 1 | -1/+4
    Lazy refcounts is a performance optimization for qcow2 that postpones refcount metadata updates and instead marks the image dirty. In the case of crash or power failure the image will be left in a dirty state and repaired next time it is opened.

    Reducing metadata I/O is important for cache=writethrough and cache=directsync because these modes guarantee that data is on disk after each write (hence we cannot take advantage of caching updates in RAM). Refcount metadata is not needed for guest->file block address translation and therefore does not need to be on-disk at the time of write completion - this is the motivation behind the lazy refcount optimization.

    The lazy refcount optimization must be enabled at image creation time:

        qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
        qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough

    Update qemu-iotests 031 and 036 since the extension header size changes when we add feature bit table entries.

    Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
    Signed-off-by: Kevin Wolf <kwolf@redhat.com>