| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Lazy refcounts is a performance optimization for qcow2 that postpones
refcount metadata updates and instead marks the image dirty. In the
case of crash or power failure the image will be left in a dirty state
and repaired next time it is opened.
Reducing metadata I/O is important for cache=writethrough and
cache=directsync because these modes guarantee that data is on disk
after each write (hence we cannot take advantage of caching updates in
RAM). Refcount metadata is not needed for guest->file block address
translation and therefore does not need to be on-disk at the time of
write completion - this is the motivation behind the lazy refcount
optimization.
The lazy refcount optimization must be enabled at image creation time:
qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on a.qcow2 10G
qemu-system-x86_64 -drive if=virtio,file=a.qcow2,cache=writethrough
Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds an incompatible feature bit to mark images that have not
been closed cleanly. When a dirty image file is opened a consistency
check and repair is performed.
Update qemu-iotests 031 and 036 since the extension header size changes
when we add feature bit table entries.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Writethrough does not need special-casing anymore in the qcow2 caches.
The block layer adds flushes after every guest-initiated data write,
and these will also flush the qcow2 caches to the OS.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
| |
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
| |
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
| |
Instead of printing an ugly bitmask, qemu can now print a more helpful
string even for yet unknown features.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
| |
This adds support for reading zero clusters in version 3 images.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the basic infrastructure to qcow2 to handle version 3 images.
It includes code to create v3 images, allow header updates for v3 images
and checks feature bits.
It still misses support for zero clusters, so this is not a fully
compliant implementation of v3 yet.
The default for creating new images stays at v2 for now.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
| |
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
| |
With this change, reading from a qcow2 image ignores all reserved bits
that are set in an L1 or L2 table entry.
Now get_cluster_offset() assigns *cluster_offset only the offset without
any other flags. The cluster type is not longer encoded in the offset,
but a positive return value in case of success.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This allows that different snapshots of an image can have different
sizes, which is a requirement for enabling image resizing even with
images that have internal snapshots.
We don't do the actual support for it now, but make sure that the
additional field is present and not completely ignored in all version 3
images. When trying to load a snapshot of different size, it returns
an error.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If the first part of a write request is allocated, but the second isn't
and it can be allocated so that the resulting area is contiguous, handle
it at once. This is a common case for sequential writes.
After this patch, alloc_cluster_offset() only checks if the clusters are
already allocated or how many new clusters can be allocated contigouosly.
The actual cluster allocation is split off into a new function
do_alloc_cluster_offset().
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
This function allows to allocate clusters at a given offset in the image
file. This is useful if you want to allocate the second part of an area
that must be contiguous.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
If we want header extensions to work as compatible extensions, we can't
destroy yet unknown header extensions when rewriting the header (e.g.
for changing the backing file). Save all unknown header extensions in a
list of blobs and include them in a new header.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to switch the backing file, qcow2 issues multiple write
requests that only changed a part of the image header. Any failure after
the first one would leave the header in an corrupted state. With this
patch, the whole header is written at once, so we can't fail in the
middle.
At the same time, this gives us a reusable functions that updates all
fields of the qcow2 header and not only the backing file.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
| |
This is a compatible extension to the snapshot header format that allows
saving a 64 bit VM state size.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
| |
We don't reopen the actual file, but instead invoke the close and open routines.
We specifically ignore the backing file since it's contents are read-only and
therefore immutable.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
|
|
|
| |
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
| |
Signed-off-by: Frediano Ziglio <freddy77@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
| |
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
| |
In snapshotting there is no guest involved, so we can safely use a writeback
mode and do the flushes in the right place (i.e. at the very end). This
improves the time that creating/restoring an internal snapshot takes with an
image in writethrough mode.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When not specifying a cluster size on the command line, qemu-img printed
a cluster size of 0:
Formatting '/tmp/test.qcow2', fmt=qcow2 size=67108864
encryption=off cluster_size=0
This patch adds the default cluster size to the QEMUOptionParameter list, so
that it displays the default value that is used.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
| |
This adds a bdrv_discard function to qcow2 that frees the discarded clusters.
It does not yet pass the discard on to the underlying file system driver, but
the space can be reused by future writes to the image.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
| |
qcow2 calls bdrv_flush() after performing COW in order to ensure that the
L2 table change is never written before the copy is safe on disk. Now that the
L2 table is cached, we can wait with flushing until we write out the next L2
table.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
| |
Use the new functions of qcow2-cache.c for everything that works on refcount
block and L2 tables.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds some new cache functions to qcow2 which can be used for caching
refcount blocks and L2 tables. When used with cache=writethrough they work
like the old caching code which is spread all over qcow2, so for this case we
have merely a cleanup.
The interesting case is with writeback caching (this includes cache=none) where
data isn't written to disk immediately but only kept in cache initially. This
leads to some form of metadata write batching which avoids the current "write
to refcount block, flush, write to L2 table" pattern for each single request
when a lot of cluster allocations happen. Instead, cache entries are only
written out if its required to maintain the right order. In the pure cluster
allocation case this means that all metadata updates for requests are done in
memory initially and on sync, first the refcount blocks are written to disk,
then fsync, then L2 tables.
This improves performance of scenarios with lots of cluster allocations
noticably (e.g. installation or after taking a snapshot).
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
| |
All drivers use bs->file instead of s->hd for quite a while now, so it's time
to remove s->hd.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to backup snapshots, created from QCOW2 iamge, we want to copy snapshots out of QCOW2 disk to a seperate storage.
The following patch adds a new option in "qemu-img": qemu-img convert -f qcow2 -O qcow2 -s snapshot_name src_img bck_img.
Right now, it only supports to copy the full snapshot, delta snapshot is on the way.
Changes from V1: all the comments from Kevin are addressed:
Add read-only checking
Fix coding style
Change the name from bdrv_snapshot_load to bdrv_snapshot_load_tmp
Signed-off-by: Disheng Su <edison@cloud.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The L1 table grow operation includes a size calculation that bumps up
the new L1 table size in order to anticipate the size needs of vmstate
data. This helps reduce the number of times that the L1 table has to be
grown when vmstate data is appended.
This size overhead is not necessary during image creation,
bdrv_truncate(), or snapshot goto operations. In fact, existing
qemu-iotests that exercise table growth are no longer able to trigger it
because image creation preallocates an L1 table that is too large after
changes to qcow_create2().
This patch keeps the size calculation but also adds exact growth for
callers that do not want to inflate the L1 table size unnecessarily.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
| |
qcow2 used to use bounce buffers for any AIO requests. This does not only imply
unnecessary copying, but also unbounded allocations which should be avoided.
This patch removes bounce buffers from the normal AIO read path, and constrains
them to a constant size for encrypted images.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
| |
This distinguishes between harmless leaks and real corruption. Hopefully users
better understand what qemu-img check wants to tell them.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
qcow2_get_cluster_offset() looks up a given virtual disk offset and returns the
offset of the corresponding cluster in the image file. Errors (e.g. L2 table
can't be read) are currenctly indicated by a return value of 0, which is
unfortuately the same as for any unallocated cluster. So in effect we can't
check for errors.
This makes the old return value a by-reference parameter and returns the usual
0/-errno error code.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch adds the ability to grow qcow2 images in-place using
bdrv_truncate(). This enables qemu-img resize command support for
qcow2.
Snapshots are not supported and bdrv_truncate() will return -ENOTSUP.
The notion of resizing an image with snapshots could lead to confusion:
users may expect snapshots to remain unchanged, but this is not possible
with the current qcow2 on-disk format where the header.size field is
global instead of per-snapshot. Others may expect snapshots to change
size along with the current image data. I think it is safest to not
support snapshots and perhaps add behavior later if there is a
consensus.
Backing images continue to work. If the image is now larger than its
backing image, zeroes are read when accessing beyond the end of the
backing image.
Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Format drivers shouldn't need to bother with things like file names, but rather
just get an open BlockDriverState for the underlying protocol. This patch
introduces this behaviour for bdrv_open implementation. For protocols which
need to access the filename to open their file/device/connection/... a new
callback bdrv_file_open is introduced which doesn't get an underlying file
opened.
For now, also some of the more obscure formats use bdrv_file_open because they
open() the file themselves instead of using the block.c functions. They need to
be fixed in later patches.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
|
|
|
|
|
|
|
|
| |
Checking for return codes < 0 isn't really going to work with unsigned
types. Use signed types instead.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Returning 0/-errno allows it to distingush different errors classes. The
cluster offset of newly allocated clusters is now returned in the QCowL2Meta
struct.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
|
|
|
|
|
|
| |
It was merely a workaround and the real fix is done now.
This reverts commit ef845c3bf421290153154635dc18eaa677cecb43.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the synchronous read and write functions were dropped, they were replaced
by generic emulation functions. Unfortunately, these emulation functions don't
provide the same semantics as the original functions did.
The original bdrv_read would mean that we read some data synchronously and that
we won't be interrupted during this read. The latter assumption is no longer
true with the emulation function which needs to use qemu_aio_poll and therefore
allows the callback of any other concurrent AIO request to be run during the
read. Which in turn means that (meta)data read earlier could have changed and
be invalid now. qcow2 is not prepared to work in this way and it's just scary
how many places there are where other requests could run.
I'm not sure yet where exactly it breaks, but you'll see breakage with virtio
on qcow2 with a backing file. Providing synchronous functions again fixes the
problem for me.
Patchworks-ID: 35437
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch increases the maximum qcow2 cluster size to 2 MB. Starting with 128k
clusters, L2 tables span 2 GB or more of virtual disk space, causing 32 bit
truncation and wraparound of signed integers. Therefore some variables need to
use a larger data type.
While being at reviewing data types, change some integers that are used for
array indices to unsigned. In some places they were checked against some upper
limit but not for negative values. This could avoid potential segfaults with
corrupted qcow2 images.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Problem: Our file sys-queue.h is a copy of the BSD file, but there are
some additions and it's not entirely compatible. Because of that, there have
been conflicts with system headers on BSD systems. Some hacks have been
introduced in the commits 15cc9235840a22c289edbe064a9b3c19c5f49896,
f40d753718c72693c5f520f0d9899f6e50395e94,
96555a96d724016e13190b28cffa3bc929ac60dc and
3990d09adf4463eca200ad964cc55643c33feb50 but the fixes were fragile.
Solution: Avoid the conflict entirely by renaming the functions and the
file. Revert the previous hacks.
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When two AIO requests write to the same cluster, and this cluster is
unallocated, currently both requests allocate a new cluster and the second one
merges the first one when it is completed. This means an cluster allocation, a
read and a cluster deallocation which cause some overhead. If we simply let the
second request wait until the first one is done, we improve overall performance
with AIO requests (specifially, qcow2/virtio combinations).
This patch maintains a list of in-flight requests that have allocated new
clusters. A second request touching the same cluster is limited so that it
either doesn't touch the allocation of the first request (so it can have a
non-overlapping allocation) or it waits for the first request to complete.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
|
|
|
|
|
| |
Updated to use C99 comments.
Signed-off-by: Filip Navara <filip.navara@gmail.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The qcow2 source is now split into several more manageable files. During the
conversion quite some functions that were static before needed to be changed to
be global to make the source compile again.
We were lucky enough not to get name conflicts with these additional global
names, but they are not nice. This patch adds a qcow2_ prefix to all of the
global functions in qcow2.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
|
|
|
|
|
| |
qcow2-snapshot.c contains the code related to snapshotting.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
qcow2-cluster.c contains all functions related to the management of guest
clusters, i.e. what the guest sees on its virtual disk. This code is about
mapping these guest clusters to host clusters in the image file using the
two-level lookup tables.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|
|
qcow2-refcount.c contains all functions which are related to cluster
allocation and management in the image file. A large part of this is the
reference counting of these clusters.
Also a header file qcow2.h is introduced which will contain the interface of
the split qcow2 modules.
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
|