path: root/block
Commit message (Author, Age, Files, Lines)
* qcow2: Fix in-flight list after qcow2_cache_put failure (Kevin Wolf, 2011-06-15, 1 file, -4/+8)
|
| If qcow2_cache_put returns an error during cluster allocation and the
| allocation fails, the allocation must be removed from the list of in-flight
| allocations. Otherwise we'd get a loop in the list when the ACB is used for
| the next allocation.
|
| Luckily, this qcow2_cache_put shouldn't fail anyway because the L2 table is
| only read, so that qcow2_cache_put doesn't even involve I/O.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Christoph Hellwig <hch@lst.de>
* vdi: Avoid direct AIO callback (Kevin Wolf, 2011-06-15, 1 file, -5/+36)
|
| bdrv_aio_* must not call the callback before returning to its caller. In
| vdi, this could happen in some error cases. This starts the real request
| processing in a BH to avoid this situation.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
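The rule described in the vdi, qcow and qcow2 "Avoid direct AIO callback" commits can be illustrated with a minimal sketch. This is not QEMU's bottom-half API: `deferred_call`, `submit_request`, `run_deferred` and `store_result` are hypothetical names, and a single static slot stands in for a scheduled BH. The point it shows is only that an immediately failing request records its completion and delivers it later, after the submit function has returned.

```c
#include <errno.h>
#include <stddef.h>

typedef void completion_fn(void *opaque, int ret);

/* One pending completion; a stand-in for a scheduled bottom half. */
struct deferred_call {
    completion_fn *cb;
    void *opaque;
    int ret;
    int pending;
};

static struct deferred_call pending_call;

/* Submit a request. Even if it fails immediately, the callback is only
 * recorded here and is never invoked before this function returns. */
static void submit_request(int fail, completion_fn *cb, void *opaque)
{
    pending_call.cb = cb;
    pending_call.opaque = opaque;
    pending_call.ret = fail ? -EIO : 0;
    pending_call.pending = 1;
}

/* Called later from the (hypothetical) main loop: deliver the completion. */
static void run_deferred(void)
{
    if (pending_call.pending) {
        pending_call.pending = 0;
        pending_call.cb(pending_call.opaque, pending_call.ret);
    }
}

/* Example callback: store the request result in an int. */
static void store_result(void *opaque, int ret)
{
    *(int *)opaque = ret;
}
```

A caller can therefore finish setting up its own state after `submit_request()` returns and only sees the error once `run_deferred()` runs.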
* qcow: Avoid direct AIO callback (Kevin Wolf, 2011-06-14, 1 file, -2/+56)
|
| bdrv_aio_* must not call the callback before returning to its caller. In
| qcow, this could happen in some error cases. This starts the real request
| processing in a BH to avoid this situation.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Avoid direct AIO callback (Kevin Wolf, 2011-06-14, 1 file, -9/+30)
|
| bdrv_aio_* must not call the callback before returning to its caller. In
| qcow2, this could happen in some error cases. This starts the real request
| processing in a BH to avoid this situation.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/rbd: Remove unused local variable (Stefan Weil, 2011-06-14, 1 file, -4/+0)
|
| Variable 'snap' is assigned a value that is never used.
| Remove snap and the related code.
|
| Cc: Christian Brunner <chb@muc.de>
| Cc: Josh Durgin <josh.durgin@dreamhost.com>
| Cc: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Stefan Weil <weil@mail.berlios.de>
| Reviewed-by: Josh Durgin <josh.durgin@dreamhost.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Merge remote-tracking branch 'stefanha/trivial-patches' into staging (Anthony Liguori, 2011-06-08, 1 file, -2/+0)
|\
| * Fix compilation warning due to missing header for sigaction (followup) (Alexandre Raymond, 2011-06-08, 1 file, -2/+0)
| |
| | This patch removes all references to signal.h when qemu-common.h is
| | included as they become redundant.
| |
| | Signed-off-by: Alexandre Raymond <cerbere@gmail.com>
| | Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
* | qemu-img create: Fix displayed default cluster size (Kevin Wolf, 2011-06-08, 4 files, -5/+11)
| |
| | When not specifying a cluster size on the command line, qemu-img printed
| | a cluster size of 0:
| |
| |   Formatting '/tmp/test.qcow2', fmt=qcow2 size=67108864 encryption=off cluster_size=0
| |
| | This patch adds the default cluster size to the QEMUOptionParameter list,
| | so that it displays the default value that is used.
| |
| | Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | qcow2: Fix memory leaks in error cases (Kevin Wolf, 2011-06-08, 2 files, -4/+7)
| |
| | This fixes memory leaks that may be caused by I/O errors during L1 table
| | growth (can happen during save_vm) and in qemu-img check.
| |
| | Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | rbd: Add bdrv_truncate implementation (Josh Durgin, 2011-06-08, 1 file, -0/+14)
| |
| | Reviewed-by: Christian Brunner <chb@muc.de>
| | Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
| | Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | rbd: check return values when scheduling aio (Josh Durgin, 2011-06-08, 1 file, -4/+20)
| |
| | If scheduling fails, the number of outstanding I/Os must be correct, or
| | there will be a hang when waiting for everything to be flushed.
| |
| | Reviewed-by: Christian Brunner <chb@muc.de>
| | Reported-by: Stefan Hajnoczi <stefanha@gmail.com>
| | Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
| | Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | rbd: allow configuration of rados from the rbd filename (Josh Durgin, 2011-06-08, 1 file, -17/+102)
| |
| | The new format is
| |
| |   rbd:pool/image[@snapshot][:option1=value1[:option2=value2...]]
| |
| | Each option is used to configure rados, and may be any Ceph option, or
| | "conf". The "conf" option specifies a Ceph configuration file to read.
| |
| | This allows rbd volumes from more than one Ceph cluster to be used by
| | specifying different monitor addresses, as well as having different
| | logging levels or locations for different volumes.
| |
| | Reviewed-by: Christian Brunner <chb@muc.de>
| | Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
| | Signed-off-by: Kevin Wolf <kwolf@redhat.com>
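The filename syntax in the commit above, `rbd:pool/image[@snapshot][:option1=value1[:option2=value2...]]`, can be split with a few string searches. The sketch below is an illustration, not the parser in block/rbd.c: `struct rbd_name`, `copy_span` and `parse_rbd_name` are hypothetical names, the buffer sizes are arbitrary, and any escaping of special characters is ignored. The option list is returned as one raw string rather than being applied to rados.

```c
#include <stddef.h>
#include <string.h>

struct rbd_name {
    char pool[64];
    char image[64];
    char snap[64];      /* empty when no @snapshot is given */
    char options[256];  /* raw "k=v:k=v" tail, empty when absent */
};

/* Copy [start, end) into dst as a NUL-terminated string; -1 if too long. */
static int copy_span(char *dst, size_t dst_size,
                     const char *start, const char *end)
{
    size_t len = (size_t)(end - start);

    if (len >= dst_size) {
        return -1;
    }
    memcpy(dst, start, len);
    dst[len] = '\0';
    return 0;
}

/* Returns 0 on success, -1 if the string does not match the format. */
static int parse_rbd_name(const char *filename, struct rbd_name *out)
{
    const char *p, *slash, *opts, *img_end, *at;

    memset(out, 0, sizeof(*out));
    if (strncmp(filename, "rbd:", 4) != 0) {
        return -1;
    }
    p = filename + 4;

    /* pool is everything up to the first '/' */
    slash = strchr(p, '/');
    if (slash == NULL || slash == p ||
        copy_span(out->pool, sizeof(out->pool), p, slash) < 0) {
        return -1;
    }

    /* image[@snapshot] runs up to the first ':' (start of the options) */
    p = slash + 1;
    opts = strchr(p, ':');
    img_end = opts ? opts : p + strlen(p);
    at = memchr(p, '@', (size_t)(img_end - p));
    if (at != NULL) {
        if (copy_span(out->snap, sizeof(out->snap), at + 1, img_end) < 0) {
            return -1;
        }
        img_end = at;
    }
    if (img_end == p ||
        copy_span(out->image, sizeof(out->image), p, img_end) < 0) {
        return -1;
    }

    if (opts != NULL &&
        copy_span(out->options, sizeof(out->options),
                  opts + 1, opts + 1 + strlen(opts + 1)) < 0) {
        return -1;
    }
    return 0;
}
```

For example, `rbd:pool/image@snap:conf=/etc/ceph/ceph.conf` would yield pool `pool`, image `image`, snapshot `snap` and the option string `conf=/etc/ceph/ceph.conf`.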
* | rbd: use the higher level librbd instead of just librados (Josh Durgin, 2011-06-08, 2 files, -648/+218)
| |
| | librbd stacks on top of librados to provide access to rbd images.
| |
| | Using librbd simplifies the qemu code, and allows qemu to use new
| | versions of the rbd format with few (if any) changes.
| |
| | Reviewed-by: Christian Brunner <chb@muc.de>
| | Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
| | Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
| | Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block/raw-posix: get right partition size (Christoph Egger, 2011-06-08, 1 file, -0/+32)
| |
| | Use the correct way to get the size of a disk device or partition.
| |
| | From: Adam Hamsik <haad@netbsd.org>
| | Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
| | Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block/raw-posix: use a character device if a block device is given (Christoph Egger, 2011-06-08, 1 file, -0/+43)
| |
| | On NetBSD a userland process is better served by the character device
| | interface. In addition, a block device can't be opened twice; if a Xen
| | backend opens it, qemu can't and vice versa.
| |
| | Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
| | Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | vmdk: fix endianness bugs (Alexander Graf, 2011-06-08, 1 file, -8/+14)
| |
| | The vmdk code is sloppy when handling the header descriptor during
| | creation of an image. Fix all header accesses in the create path to
| | either store native endianness or convert it when appropriate.
| |
| | Reported-by: Yury Tsarev <ytsarev@novell.com>
| | Signed-off-by: Alexander Graf <agraf@suse.de>
| | Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* | block: clarify the meaning of BDRV_O_NOCACHE (Christoph Hellwig, 2011-06-08, 3 files, -8/+8)
|/
|
| Change BDRV_O_NOCACHE to only imply bypassing the host OS file cache, but
| no writeback semantics. All existing callers are changed to also specify
| BDRV_O_CACHE_WB to give them writeback semantics.
|
| Signed-off-by: Christoph Hellwig <hch@lst.de>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qed: support for growing images (Stefan Hajnoczi, 2011-05-18, 1 file, -1/+21)
|
| The .bdrv_truncate() operation resizes images and growing is easy to
| implement in QED. Simply check that the new size is valid and then update
| the image_size header field to reflect the new size.
|
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qed: Periodically flush and clear need check bit (Stefan Hajnoczi, 2011-05-18, 2 files, -2/+109)
|
| One strategy to limit the startup delay of consistency check when opening
| image files is to ensure that the file is marked dirty for as little time
| as possible.
|
| QED currently marks the image dirty when the first allocating write
| request is issued and clears the dirty bit again when the image is cleanly
| closed. In practice that means the image is marked dirty for most of a
| guest's lifetime and prone to being in a dirty state upon crash or power
| failure.
|
| It is safe to clear the dirty bit after all allocating write requests have
| completed and a flush has been performed. This patch adds a timer after
| the last allocating write request completes. When the timer fires it will
| flush and then clear the dirty bit. The timer is set to 5 seconds and is
| cancelled upon arrival of a new allocating write request.
|
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Fix typos in comments and code (occured -> occurred and related) (Stefan Weil, 2011-05-08, 1 file, -1/+1)
|
| The code changed here is an unused data type name (evt_flush_occurred).
|
| Signed-off-by: Stefan Weil <weil@mail.berlios.de>
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
* Fix typo in code and comments (Stefan Weil, 2011-05-06, 1 file, -2/+2)
|
| Replace writeable -> writable
|
| Signed-off-by: Stefan Weil <weil@mail.berlios.de>
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
* NBD: Avoid leaking a couple of strings when the NBD device is closed (Nick Thomas, 2011-05-03, 1 file, -0/+4)
|
| Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qed: Fix consistency check on 32-bit hosts (Stefan Hajnoczi, 2011-04-27, 2 files, -3/+3)
|
| The qed_bytes_to_clusters() function is normally used with size_t lengths.
| The consistency check used it with the file size and therefore failed on
| 32-bit hosts when the image file is 4 GB or more.
|
| Make qed_bytes_to_clusters() explicitly 64-bit and update the consistency
| check to keep 64-bit cluster counts.
|
| Reported-by: Michael Tokarev <mjt@tls.msk.ru>
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
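The failure mode in the qed consistency-check commit can be reproduced in isolation. The sketch below assumes a round-up division and uses the hypothetical names `bytes_to_clusters32` and `bytes_to_clusters64`; it is not the body of `qed_bytes_to_clusters()`. A `uint32_t` parameter stands in for `size_t` on a 32-bit host.

```c
#include <stdint.h>

/* Byte count squeezed through a 32-bit length, as size_t would be on a
 * 32-bit host: a 4 GiB file size has already wrapped to 0 on entry. */
static uint32_t bytes_to_clusters32(uint32_t bytes, uint32_t cluster_size)
{
    return (bytes + cluster_size - 1) / cluster_size;
}

/* Same round-up division carried out in 64-bit arithmetic. */
static uint64_t bytes_to_clusters64(uint64_t bytes, uint32_t cluster_size)
{
    return (bytes + cluster_size - 1) / cluster_size;
}
```

With a 4 GiB file and 64 KiB clusters, the 64-bit version reports 65536 clusters, while the 32-bit version sees a length of 0 and reports none.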
* vpc.c: Use get_option_parameter() to do the search (Mitnick Lyu, 2011-04-13, 1 file, -6/+2)
|
| Use get_option_parameter() instead of duplicating the loop, and use
| BDRV_SECTOR_SIZE instead of 512.
|
| Signed-off-by: Mitnick Lyu <mitnick.lyu@gmail.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qed: Add support for zero clusters (Anthony Liguori, 2011-04-13, 4 files, -17/+66)
|
| Zero clusters are similar to unallocated clusters except instead of
| reading their value from a backing file when one is available, the cluster
| is always read as zero.
|
| This implements read support only. At this stage, QED will never write a
| zero cluster.
|
| Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* NBD device: Separate out parsing configuration and opening sockets. (Nick Thomas, 2011-04-07, 1 file, -55/+102)
|
| We also change the way the file parameter is parsed so IPv6 IP addresses
| can be used, e.g.: "drive=nbd:[::1]:5000"
|
| Signed-off-by: Nick Thomas <nick@bytemark.co.uk>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
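The bracketed form in the NBD commit, `nbd:[::1]:5000`, exists because an IPv6 literal contains colons itself, so the last field cannot be found by splitting on the first `:`. A minimal sketch of such a split follows. The function name `parse_nbd_hostport` and its behaviour on edge cases are assumptions for illustration, not the parsing code in block/nbd.c.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Split "nbd:host:port" or "nbd:[ipv6]:port". Returns 0 on success. */
static int parse_nbd_hostport(const char *s, char *host, size_t host_size,
                              int *port)
{
    const char *h_start, *h_end, *port_str;
    char *endp;
    size_t len;
    long val;

    if (strncmp(s, "nbd:", 4) != 0) {
        return -1;
    }
    s += 4;

    if (*s == '[') {
        /* Bracketed literal: the host ends at ']' and ':' must follow. */
        h_start = s + 1;
        h_end = strchr(h_start, ']');
        if (h_end == NULL || h_end[1] != ':') {
            return -1;
        }
        port_str = h_end + 2;
    } else {
        /* Plain host name or IPv4 address: split at the first ':'. */
        h_start = s;
        h_end = strchr(s, ':');
        if (h_end == NULL) {
            return -1;
        }
        port_str = h_end + 1;
    }

    len = (size_t)(h_end - h_start);
    if (len == 0 || len >= host_size) {
        return -1;
    }
    memcpy(host, h_start, len);
    host[len] = '\0';

    val = strtol(port_str, &endp, 10);
    if (endp == port_str || *endp != '\0' || val < 1 || val > 65535) {
        return -1;
    }
    *port = (int)val;
    return 0;
}
```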
* Fix trivial "endianness bugs" (Stefan Weil, 2011-04-03, 1 file, -2/+2)
|
| Replace endianess -> endianness.
|
| Signed-off-by: Stefan Weil <weil@mail.berlios.de>
| Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
* get rid of private bitmap functions in block/sheepdog.c, use generic ones (Michael Tokarev, 2011-04-01, 1 file, -14/+1)
|
| qemu now has generic bitmap functions, so don't redefine them in
| sheepdog.c, use the common header instead.
|
| A small cleanup.
|
| There's only one function which is actually used in sheepdog and gets
| replaced with a generic one (simplified):
|
|   - static inline int test_bit(int nr, const volatile unsigned long *addr)
|   + static inline int test_bit(int nr, const unsigned long *addr)
|     {
|   -   return ((1UL << (nr % BITS_PER_LONG)) & ((unsigned long*)addr)[nr / BITS_PER_LONG]) != 0;
|   +   return 1UL & (addr[nr / BITS_PER_LONG] >> (nr & (BITS_PER_LONG-1)));
|     }
|
| The body is equivalent, but the argument is not: there's "volatile" in
| there. I'm not sure what it is used for.
|
| Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
| Acked-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
| Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
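The claim that the two `test_bit` bodies are equivalent can be checked mechanically. In the sketch below both bodies are copied from the commit message under the hypothetical names `test_bit_old` and `test_bit_new`, `BITS_PER_LONG` is defined locally as an assumption, and `bodies_agree` is a hypothetical helper that compares them bit by bit.

```c
#include <limits.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Body removed from sheepdog.c (as quoted, simplified). */
static int test_bit_old(int nr, const volatile unsigned long *addr)
{
    return ((1UL << (nr % BITS_PER_LONG)) &
            ((unsigned long *)addr)[nr / BITS_PER_LONG]) != 0;
}

/* Generic body that replaces it. */
static int test_bit_new(int nr, const unsigned long *addr)
{
    return 1UL & (addr[nr / BITS_PER_LONG] >> (nr & (BITS_PER_LONG - 1)));
}

/* Return 1 if both bodies agree on every bit of the first nbits bits. */
static int bodies_agree(const unsigned long *map, int nbits)
{
    for (int nr = 0; nr < nbits; nr++) {
        if (test_bit_old(nr, map) != test_bit_new(nr, map)) {
            return 0;
        }
    }
    return 1;
}
```

The equivalence relies on `BITS_PER_LONG` being a power of two, which makes `nr % BITS_PER_LONG` and `nr & (BITS_PER_LONG - 1)` identical for non-negative `nr`.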
* block/qcow: Don't ignore immediate read/write and other failures (Stefan Weil, 2011-03-15, 1 file, -4/+12)
|
| This patch is similar to 171e3d6b9997c98a97d0c525867f7cd9b640cadd which
| fixed qcow2:
|
| Returning -EIO is far from optimal, but at least it's an error code.
|
| In addition to read/write failures, -EIO is also returned when
| decompress_cluster failed.
|
| Cc: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Stefan Weil <weil@mail.berlios.de>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/vdi: Don't ignore immediate read/write failures (Stefan Weil, 2011-03-15, 1 file, -0/+5)
|
| This patch is similar to 171e3d6b9997c98a97d0c525867f7cd9b640cadd which
| fixed qcow2:
|
| Returning -EIO is far from optimal, but at least it's an error code.
|
| Cc: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Stefan Weil <weil@mail.berlios.de>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Fix order in L2 table COW (Kevin Wolf, 2011-02-10, 1 file, -3/+6)
|
| When copying L2 tables (this happens only with internal snapshots), the
| order wasn't completely safe, so that after a crash you could end up with
| an L2 table whose refcount is too low, possibly leading to corruption in
| the long run.
|
| This patch puts the operations in the right order: first allocate the new
| L2 table and replace the reference, and only then decrease the refcount of
| the old table.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qed: Report error for unsupported features (Kevin Wolf, 2011-02-10, 1 file, -1/+8)
|
| Instead of just returning -ENOTSUP, generate a more detailed error.
|
| Unfortunately we don't have a helpful text for features that we don't know
| yet, so just print the feature mask. It might be useful at least if
| someone asks for help.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
| Acked-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
* qcow2: Report error for version > 2 (Kevin Wolf, 2011-02-10, 1 file, -2/+11)
|
| The qcow2 driver is now declared responsible for any QCOW image that has
| version 2 or greater (before this, version 3 would be detected as raw).
|
| For everything newer than version 2, an error is reported.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
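The split between "claim the image" and "accept the image" in the version commit above can be sketched as two functions. This is an illustration under assumptions: the names `qcow2_like_probe` and `qcow2_like_open` are hypothetical, the header is reduced to the big-endian magic (`QFI\xfb`) followed by a big-endian 32-bit version, and returning `-ENOTSUP` for newer versions is an assumed error code rather than something the commit message states.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 32-bit value from an arbitrary byte pointer. */
static uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Probe: claim every image with the QCOW magic and version >= 2, so that
 * newer versions are not misdetected as raw. */
static int qcow2_like_probe(const uint8_t *hdr, size_t len)
{
    return len >= 8 &&
           load_be32(hdr) == 0x514649fbu &&   /* 'Q' 'F' 'I' 0xfb */
           load_be32(hdr + 4) >= 2;
}

/* Open: accept only version 2 and report an error for anything newer. */
static int qcow2_like_open(const uint8_t *hdr, size_t len)
{
    if (!qcow2_like_probe(hdr, len)) {
        return -EINVAL;
    }
    if (load_be32(hdr + 4) > 2) {
        return -ENOTSUP;   /* assumed error code */
    }
    return 0;
}
```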
* qcow2: Fix error handling for reading compressed clusters (Kevin Wolf, 2011-02-10, 2 files, -3/+5)
|
| When reading a compressed cluster failed, qcow2 falsely returned success.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Markus Armbruster <armbru@redhat.com>
* qcow2: Fix error handling for immediate backing file read failure (Kevin Wolf, 2011-02-10, 1 file, -1/+3)
|
| Requests could return success even though they failed when bdrv_aio_readv
| returned NULL for a backing file read.
|
| Reported-by: Chunqiang Tang <ctang@us.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* QCOW2: bug fix - read base image beyond its size (Chunqiang Tang, 2011-02-10, 1 file, -3/+2)
|
| This patch fixes the following bug in QCOW2. For a QCOW2 image that is
| larger than its base image, when handling a read request straddling over
| the end of the base image, the QCOW2 driver attempts to read beyond the
| end of the base image and the request would fail.
|
| This bug was found by Fast Virtual Disk (FVD)'s fully automated testing
| tool. The following test triggered the bug.
|
|   dd if=/dev/zero of=/var/ramdisk/truth.raw count=0 bs=1 seek=1098561536
|   dd if=/dev/zero of=/var/ramdisk/zero-500M.raw count=0 bs=1 seek=593099264
|   ./qemu-img create -f qcow2 -ocluster_size=65536,backing_fmt=blksim -b /var/ramdisk/zero-500M.raw /var/ramdisk/test.qcow2 1098561536
|   ./qemu-io --auto --seed=30477694 --truth=/var/ramdisk/truth.raw --format=qcow2 --test=blksim:/var/ramdisk/test.qcow2 --verify_write=true --compare_before=false --compare_after=true --round=100000 --parallel=100 --io_size=10485760 --fail_prob=0 --cancel_prob=0 --instant_qemubh=true
|
| Signed-off-by: Chunqiang Tang <ctang@us.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
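A read that straddles the end of a shorter base image has to be cut off at the base image's size. The sketch below shows one way to do that on an in-memory buffer; it is not the qcow2 fix itself. `read_from_backing` is a hypothetical name, and filling the part beyond the end with zeros is an assumption about the desired result.

```c
#include <stdint.h>
#include <string.h>

/* Read len bytes at offset from a backing image of backing_len bytes.
 * Only the part that lies inside the backing image is copied; the rest of
 * buf is zero-filled. Returns the number of bytes taken from the image. */
static size_t read_from_backing(const uint8_t *backing, uint64_t backing_len,
                                uint64_t offset, uint8_t *buf, size_t len)
{
    size_t n = 0;

    if (offset < backing_len) {
        uint64_t avail = backing_len - offset;

        n = avail < len ? (size_t)avail : len;
        memcpy(buf, backing + offset, n);
    }
    memset(buf + n, 0, len - n);
    return n;
}
```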
* block/vdi: Fix wrong size in conditionally used memset, memcmp (Stefan Weil, 2011-02-07, 1 file, -2/+2)
|
| Error report from cppcheck:
|
|   block/vdi.c:122: error: Using sizeof for array given as function argument returns the size of pointer.
|   block/vdi.c:128: error: Using sizeof for array given as function argument returns the size of pointer.
|
| Fix both by setting the correct size.
|
| The buggy code is only used when QEMU is built without uuid support.
| The bug is not critical, so there is no urgent need to apply it to old
| versions of QEMU.
|
| Cc: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Stefan Weil <weil@mail.berlios.de>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
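The cppcheck message refers to a general C rule: a parameter declared as an array is adjusted to a pointer, so `sizeof` on it yields the pointer size. The sketch below reproduces the effect with hypothetical helpers `uuid_clear_buggy` and `uuid_clear_fixed`; it is not the code from block/vdi.c. The parameter is written as a pointer directly, which is what an `unsigned char uu[16]` parameter becomes.

```c
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16

/* Buggy: uu is a pointer here, so sizeof(uu) is the pointer size
 * (typically 4 or 8), not the 16 bytes of the caller's array. */
static void uuid_clear_buggy(unsigned char *uu)
{
    size_t n = sizeof(uu);

    memset(uu, 0, n);
}

/* Fixed: pass the intended length explicitly. */
static void uuid_clear_fixed(unsigned char *uu)
{
    memset(uu, 0, UUID_LEN);
}
```

After the buggy call only the first few bytes are cleared and the tail of the 16-byte buffer keeps its old contents.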
* qcow2: Really use cache=unsafe for image creation (Kevin Wolf, 2011-02-07, 1 file, -1/+2)
|
| For cache=unsafe we also need to set BDRV_O_CACHE_WB, otherwise we have
| some strange unsafe writethrough mode.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
* qcow2-refcount: remove write-only variables (Blue Swirl, 2011-01-31, 1 file, -4/+1)
|
| Variables l2_modified and l2_size are not really used, remove them.
| Spotted by GCC 4.6.0:
|
|   CC block/qcow2-refcount.o
|   /src/qemu/block/qcow2-refcount.c: In function 'qcow2_update_snapshot_refcount':
|   /src/qemu/block/qcow2-refcount.c:708:37: error: variable 'l2_modified' set but not used [-Werror=unused-but-set-variable]
|   /src/qemu/block/qcow2-refcount.c:708:9: error: variable 'l2_size' set but not used [-Werror=unused-but-set-variable]
|
| CC: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* raw-win32: Fix bdrv_flush return value (Kevin Wolf, 2011-01-31, 1 file, -1/+1)
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qed: Images with backing file do not require QED_F_NEED_CHECK (Stefan Hajnoczi, 2011-01-31, 1 file, -7/+17)
|
| The consistency check on open is necessary in order to fix inconsistent
| table offsets left as a result of a crash mid-operation.
|
| Images with a backing file actually flush before updating table offsets
| and are therefore guaranteed to be consistent. Do not mark these images
| dirty.
|
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Add bdrv_discard support (Kevin Wolf, 2011-01-31, 3 files, -0/+92)
|
| This adds a bdrv_discard function to qcow2 that frees the discarded
| clusters. It does not yet pass the discard on to the underlying file
| system driver, but the space can be reused by future writes to the image.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
| Reviewed-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
* sheepdog: support creating images on remote hosts (MORITA Kazutaka, 2011-01-31, 1 file, -3/+14)
|
| This patch parses the input filename in sd_create(), and enables us to
| specify a target server on which to create sheepdog images.
|
| Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* Reorganize struct Qcow2Cache for better struct packing (Jes Sorensen, 2011-01-31, 1 file, -1/+1)
|
| Move size after the two pointers in struct Qcow2Cache to get better
| packing of struct elements on 64 bit architectures.
|
| Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
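The effect of member order on padding can be seen with two small structs. The layouts below are hypothetical and are not the real `struct Qcow2Cache` definition: the field names and the trailing `flag` member are assumptions chosen so that the padding difference is visible on a typical 64-bit ABI.

```c
#include <stddef.h>

/* int first: on a typical LP64 ABI, 4 bytes of padding follow size so the
 * pointers are 8-byte aligned, and 7 more bytes follow flag (32 bytes). */
struct cache_before {
    int size;
    void *entries;
    void *depends;
    char flag;
};

/* Pointers first: size and flag share the final 8-byte slot (24 bytes on a
 * typical LP64 ABI). */
struct cache_after {
    void *entries;
    void *depends;
    int size;
    char flag;
};
```

Comparing `sizeof(struct cache_before)` with `sizeof(struct cache_after)` on the target platform shows whether the reordering pays off there; the exact numbers depend on the ABI.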
* qed: Refuse to create images on block devices (Stefan Hajnoczi, 2011-01-24, 1 file, -0/+6)
|
| QED relies on the underlying filesystem to extend the file and maintain
| its size. Check that images are not created on a block device.
|
| Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Batch flushes for COW (Kevin Wolf, 2011-01-24, 3 files, -4/+19)
|
| qcow2 calls bdrv_flush() after performing COW in order to ensure that the
| L2 table change is never written before the copy is safe on disk. Now that
| the L2 table is cached, we can wait with flushing until we write out the
| next L2 table.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Use QcowCache (Kevin Wolf, 2011-01-24, 5 files, -298/+240)
|
| Use the new functions of qcow2-cache.c for everything that works on
| refcount block and L2 tables.
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Add QcowCache (Kevin Wolf, 2011-01-24, 2 files, -0/+309)
|
| This adds some new cache functions to qcow2 which can be used for caching
| refcount blocks and L2 tables. When used with cache=writethrough they work
| like the old caching code which is spread all over qcow2, so for this case
| we have merely a cleanup.
|
| The interesting case is with writeback caching (this includes cache=none)
| where data isn't written to disk immediately but only kept in cache
| initially. This leads to some form of metadata write batching which avoids
| the current "write to refcount block, flush, write to L2 table" pattern
| for each single request when a lot of cluster allocations happen. Instead,
| cache entries are only written out if it's required to maintain the right
| order. In the pure cluster allocation case this means that all metadata
| updates for requests are done in memory initially and on sync, first the
| refcount blocks are written to disk, then fsync, then L2 tables.
|
| This noticeably improves performance of scenarios with lots of cluster
| allocations (e.g. installation or after taking a snapshot).
|
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: fix unaligned access (Aurelien Jarno, 2011-01-24, 1 file, -1/+1)
|
| cpu_to_be64w() is called with an obviously non-aligned pointer. Use
| cpu_to_be64wu() instead. It fixes unaligned access errors on IA64 hosts.
|
| Cc: Kevin Wolf <kwolf@redhat.com>
| Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
| Signed-off-by: Kevin Wolf <kwolf@redhat.com>
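Storing a 64-bit value through a misaligned `uint64_t *` can trap on strict-alignment hosts such as IA64. One portable alternative is to write the value byte by byte. The helper below, `store_be64_unaligned`, is a hypothetical sketch of that idea and not the implementation of `cpu_to_be64wu()`.

```c
#include <stdint.h>

/* Store v in big-endian order at p. Only byte accesses are used, so p may
 * have any alignment. */
static void store_be64_unaligned(void *p, uint64_t v)
{
    uint8_t *b = p;

    for (int i = 0; i < 8; i++) {
        b[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}
```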
* vpc: fix a file descriptor leak (Blue Swirl, 2011-01-12, 1 file, -17/+30)
|
| Fix a file descriptor leak, reported by cppcheck:
|
|   [/src/qemu/block/vpc.c:524]: (error) Resource leak: fd
|
| Signed-off-by: Blue Swirl <blauwirbel@gmail.com>