summaryrefslogtreecommitdiffstats
path: root/block
Commit message (Collapse)AuthorAgeFilesLines
...
* qcow2: use an LRU algorithm to replace entries from the L2 cacheAlberto Garcia2015-05-221-18/+15
| | | | | | | | | | | | | | | | | | | The current algorithm to evict entries from the cache gives always preference to those in the lowest positions. As the size of the cache increases, the chances of the later elements of being removed decrease exponentially. In a scenario with random I/O and lots of cache misses, entries in positions 8 and higher are rarely (if ever) evicted. This can be seen even with the default cache size, but with larger caches the problem becomes more obvious. Using an LRU algorithm makes the chances of being removed from the cache independent from the position. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: simplify qcow2_cache_put() and qcow2_cache_entry_mark_dirty()Alberto Garcia2015-05-221-17/+15
| | | | | | | | | | | Since all tables are now stored together, it is possible to obtain the position of a particular table directly from its address, so the operation becomes O(1). Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: use one single memory block for the L2/refcount cache tablesAlberto Garcia2015-05-224-39/+39
| | | | | | | | | | | | | | | | | | The qcow2 L2/refcount cache contains one separate table for each cache entry. Doing one allocation per table adds unnecessary overhead and it also requires us to store the address of each table separately. Since the size of the cache is constant during its lifetime, it's better to have an array that contains all the tables using one single allocation. In my tests measuring freshly created caches with sizes 128MB (L2) and 32MB (refcount) this uses around 10MB of RAM less. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* vmdk: Fix overflow if l1_size is 0x20000000Fam Zheng2015-05-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Richard Jones caught this bug with afl fuzzer. In fact, that's the only possible value to overflow (extent->l1_size = 0x20000000) l1_size: l1_size = extent->l1_size * sizeof(long) => 0x80000000; g_try_malloc returns NULL because l1_size is interpreted as negative during type casting from 'int' to 'gsize', which yields a enormous value. Hence, by coincidence, we get a "not too bad" behavior: qemu-img: Could not open '/tmp/afl6.img': Could not open '/tmp/afl6.img': Cannot allocate memory Values larger than 0x20000000 will be refused by the validation in vmdk_add_extent. Values smaller than 0x20000000 will not overflow l1_size. Cc: qemu-stable@nongnu.org Reported-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Tested-by: Richard W.M. Jones <rjones@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* vmdk: Fix next_cluster_sector for compressed writeFam Zheng2015-05-221-4/+10
| | | | | | | | | | | | | | | This fixes the bug introduced by commit c6ac36e (vmdk: Optimize cluster allocation). Sometimes, write_len could be larger than cluster size, because it contains both data and marker. We must advance next_cluster_sector in this case, otherwise the image gets corrupted. Cc: qemu-stable@nongnu.org Reported-by: Antoni Villalonga <qemu-list@friki.cat> Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qcow2: Flush pending discards before allocating clusterKevin Wolf2015-05-221-0/+5
| | | | | | | | | | | | | | | | | | | | Before a freed cluster can be reused, pending discards for this cluster must be processed. The original assumption was that this was not a problem because discards are only cached during discard/write zeroes operations, which are synchronous so that no concurrent write requests can cause cluster allocations. However, the discard/write zeroes operation itself can allocate a new L2 table (and it has to in order to put zero flags there), so make sure we can cope with the situation. This fixes https://bugs.launchpad.net/bugs/1349972. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>
* block: get_block_status: use "else" when testing the opposite conditionPaolo Bonzini2015-05-221-3/+1
| | | | | | | | | | | | | A bit of Boolean algebra (and common sense) tells us that the second "if" here is looking for blocks that are not allocated. This is the opposite of the "if" that sets BDRV_BLOCK_ALLOCATED, and thus it can use an "else". Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Message-id: 1431599702-10431-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: Fix NULL deference for unaligned write if qiov is NULLFam Zheng2015-05-221-2/+95
| | | | | | | | | | | | | | | | | | | | | For zero write, callers pass in NULL qiov (qemu-io "write -z" or scsi-disk "write same"). Commit fc3959e466 fixed bdrv_co_write_zeroes which is the common case for this bug, but it still exists in bdrv_aio_write_zeroes. A simpler fix would be in bdrv_co_do_pwritev which is the NULL dereference point and covers both cases. So don't access it in bdrv_co_do_pwritev in this case, use three aligned writes. [Initialize ret to 0 in bdrv_co_do_zero_pwritev() to avoid uninitialized variable warning with gcc 4.9.2. --Stefan] Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1431522721-3266-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* Revert "block: Fix unaligned zero write"Fam Zheng2015-05-221-39/+6
| | | | | | | | | | | | | | | | This reverts commit fc3959e4669a1c2149b91ccb05101cfc7ae1fc05. The core write code already handles the case, so remove this duplication. Because commit 61007b316 moved the touched code from block.c to block/io.c, the change is manually reverted. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431522721-3266-2-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: align bounce buffers to pageDenis V. Lunev2015-05-222-7/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following sequence int fd = open(argv[1], O_RDWR | O_CREAT | O_DIRECT, 0644); for (i = 0; i < 100000; i++) write(fd, buf, 4096); performs 5% better if buf is aligned to 4096 bytes. The difference is quite reliable. On the other hand we do not want at the moment to enforce bounce buffering if guest request is aligned to 512 bytes. The patch changes default bounce buffer optimal alignment to MAX(page size, 4k). 4k is chosen as maximal known sector size on real HDD. The justification of the performance improve is quite interesting. From the kernel point of view each request to the disk was split by two. This could be seen by blktrace like this: 9,0 11 1 0.000000000 11151 Q WS 312737792 + 1023 [qemu-img] 9,0 11 2 0.000007938 11151 Q WS 312738815 + 8 [qemu-img] 9,0 11 3 0.000030735 11151 Q WS 312738823 + 1016 [qemu-img] 9,0 11 4 0.000032482 11151 Q WS 312739839 + 8 [qemu-img] 9,0 11 5 0.000041379 11151 Q WS 312739847 + 1016 [qemu-img] 9,0 11 6 0.000042818 11151 Q WS 312740863 + 8 [qemu-img] 9,0 11 7 0.000051236 11151 Q WS 312740871 + 1017 [qemu-img] 9,0 5 1 0.169071519 11151 Q WS 312741888 + 1023 [qemu-img] After the patch the pattern becomes normal: 9,0 6 1 0.000000000 12422 Q WS 314834944 + 1024 [qemu-img] 9,0 6 2 0.000038527 12422 Q WS 314835968 + 1024 [qemu-img] 9,0 6 3 0.000072849 12422 Q WS 314836992 + 1024 [qemu-img] 9,0 6 4 0.000106276 12422 Q WS 314838016 + 1024 [qemu-img] and the amount of requests sent to disk (could be calculated counting number of lines in the output of blktrace) is reduced about 2 times. Both qemu-img and qemu-io are affected while qemu-kvm is not. The guest does his job well and real requests comes properly aligned (to page). Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-3-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: minimal bounce buffer alignmentDenis V. Lunev2015-05-222-1/+7
| | | | | | | | | | | | | | | | | | | | The patch introduces new concept: minimal memory alignment for bounce buffers. Original so called "optimal" value is actually minimal required value for aligment. It should be used for validation that the IOVec is properly aligned and bounce buffer is not required. Though, from the performance point of view, it would be better if bounce buffer or IOVec allocated by QEMU will be aligned stricter. The patch does not change any alignment value yet. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-id: 1431441056-26198-2-git-send-email-den@openvz.org CC: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block: return EPERM on writes or discards to read-only devicesPaolo Bonzini2015-05-221-2/+2
| | | | | | | | | | | | | | | | | This is the behavior in the operating system, for example Linux's blkdev_write_iter has the following: if (bdev_read_only(I_BDEV(bd_inode))) return -EPERM; This does not apply to opening a device for read/write, when the device only supports read-only operation. In this case any of EACCES, EPERM or EROFS is acceptable depending on why writing is not possible. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-id: 1431013548-22492-1-git-send-email-pbonzini@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: improve image writing performance furtherDenis V. Lunev2015-05-221-20/+23
| | | | | | | | | | | | | | | | | Try to perform IO for the biggest continuous block possible. All blocks abscent in the image are accounted in the same type and preallocation is made for all of them at once. The performance for sequential write is increased from 200 Mb/sec to 235 Mb/sec on my SSD HDD. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-28-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: optimize linear image expansionDenis V. Lunev2015-05-221-10/+32
| | | | | | | | | | | | | | | | | | | | | | | | Plain image expansion spends a lot of time to update image file size. This seriously affects the performance. The following simple test qemu_img create -f parallels -o cluster_size=64k ./1.hds 64G qemu_io -n -c "write -P 0x11 0 1024M" ./1.hds could be improved if the format driver will pre-allocate some space in the image file with a reasonable chunk. This patch preallocates 128 Mb using bdrv_write_zeroes, which should normally use fallocate() call inside. Fallback to older truncate() could be used as a fallback using image open options thanks to the previous patch. The benefit is around 15%. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Karan <rkagan@parallels.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-27-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: add prealloc-mode and prealloc-size open paramemetsDenis V. Lunev2015-05-221-6/+77
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is preparational commit for tweaks in Parallels image expansion. The idea is that enlarge via truncate by one data block is slow. It would be much better to use fallocate via bdrv_write_zeroes and expand by some significant amount at once. Original idea with sequential file writing to the end of the file without fallocate/truncate would be slower than this approach if the image is expanded with several operations: - each image expanding means file metadata update, i.e. filesystem journal write. Truncate/write to newly truncated space update file metadata twice thus truncate removal helps. With fallocate call inside bdrv_write_zeroes file metadata is updated only once and this should happen infrequently thus this approach is the best one for the image expansion - tail writes are ordered, i.e. the guest IO queue could not be sent immediately to the host introducing additional IO delays This patch just adds proper parameters into BDRVParallelsState and performs options parsing in parallels_open. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-26-git-send-email-den@openvz.org CC: Roman Kagan <rkagan@parallels.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: delay writing to BAT till bdrv_co_flush_to_osDenis V. Lunev2015-05-221-6/+44
| | | | | | | | | | | | | | | | | | | | | | | | The idea is that we do not need to immediately sync BAT to the image as from the guest point of view there is a possibility that IO is lost even in the physical controller until flush command was finished. bdrv_co_flush_to_os is exactly the right place for this purpose. Technically the patch uses loaded BAT data as a cache and performs actual on-disk metadata updates in parallels_co_flush_to_os callback. This patch speed ups qemu-img create -f parallels -o cluster_size=64k ./1.hds 64G qemu-io -f parallels -c "write -P 0x11 0 1024k" 1.hds writing from 50-60 Mb/sec to 80-90 Mb/sec on rotational media and from 160 Mb/sec to 190 Mb/sec on SSD disk. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-25-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: create bat_entry_off helperDenis V. Lunev2015-05-221-6/+9
| | | | | | | | | | | | calculate offset of the BAT entry in the image file. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-24-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: improve image reading performanceDenis V. Lunev2015-05-221-5/+31
| | | | | | | | | | | | | | Try to perform IO for the biggest continuous block possible. The performance for sequential read is increased from 220 Mb/sec to 360 Mb/sec for continous image on my SSD HDD. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-23-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: implement incorrect close detectionDenis V. Lunev2015-05-221-0/+50
| | | | | | | | | | | | | | | | | | | | | | | The software driver must set inuse field in Parallels header to 0x746F6E59 when the image is opened in read-write mode. The presence of this magic in the header on open forces image consistency check. There is an unfortunate trick here. We can not check for inuse in parallels_check as this will happen too late. It is possible to do that for simple check, but during the fix this would always report an error as the image was opened in BDRV_O_RDWR mode. Thus we save the flag in BDRVParallelsState for this. On the other hand, nothing should be done to clear inuse in parallels_check. Generic close will do the job right. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-21-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: implement parallels_check method of block driverDenis V. Lunev2015-05-221-0/+85
| | | | | | | | | | | | | | | | The check is very simple at the moment. It calculates necessary stats and fix only the following errors: - space leak at the end of the image. This would happens due to preallocation - clusters outside the image are zeroed. Nothing else could be done here Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-20-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: move parallels_open/probe to the very end of the fileDenis V. Lunev2015-05-221-93/+98
| | | | | | | | | | | | | | | This will help to avoid forward declarations for upcoming parallels_check Some very obvious formatting fixes were made to the moved code to make checkpatch happy. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-19-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: read parallels image header and BAT into single bufferDenis V. Lunev2015-05-221-7/+17
| | | | | | | | | | | | | | This metadata cache would allow to properly batch BAT updates to disk in next patches. These updates will be properly aligned to avoid read-modify-write transactions on block level. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-18-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: keep BAT bitmap data in little endian in memoryDenis V. Lunev2015-05-221-12/+5
| | | | | | | | | | | | | This will allow to use this data as buffer to BAT update directly without any intermediate buffers. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-17-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: create bat2sect helperDenis V. Lunev2015-05-221-3/+9
| | | | | | | | | | | | deduplicate copy/paste arithmetcs Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-16-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: rename catalog_ names to bat_Denis V. Lunev2015-05-221-28/+30
| | | | | | | | | | | | | | | | BAT means 'block allocation table'. Thus this name is clean and shorter on writing. Some obvious formatting fixes in the old code were made to make checkpatch happy. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-15-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* parallels: change copyright information in the image headerDenis V. Lunev2015-05-221-1/+5
| | | | | | | | | | Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-14-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: support parallels image creationDenis V. Lunev2015-05-221-0/+97
| | | | | | | | | | | | | | | | | Do not even care to create WithoutFreeSpace image, it is obsolete. Always create WithouFreSpacExt one. The code also does not spend a lot of efforts to fill cylinders and heads fields, they are not used actually in a real life neither in QEMU nor in Parallels products. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-12-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: _co_writev callback for Parallels formatDenis V. Lunev2015-05-221-2/+88
| | | | | | | | | | | | | | Support write on Parallels images. The code is almost the same as one in the previous patch implemented scatter-gather IO for read. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-10-git-send-email-den@openvz.org CC: Roman Kagan <rkagan@parallels.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: mark parallels format driver as zero initedDenis V. Lunev2015-05-221-0/+1
| | | | | | | | | | | | | From the guest point of view unallocated blocks are zeroed. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-9-git-send-email-den@openvz.org CC: Roman Kagan <rkagan@parallels.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: replace magic constants 4, 64 with proper sizeofsDenis V. Lunev2015-05-221-4/+4
| | | | | | | | | | | | simple purification.. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-8-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: provide _co_readv routine for parallels format driverDenis V. Lunev2015-05-221-21/+33
| | | | | | | | | | | | | | | | | | Main approach is taken from qcow2_co_readv. The patch drops coroutine lock for the duration of IO operation and peforms normal scatter-gather IO using standard QEMU backend. The patch also adds comment about locking considerations in the driver. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Signed-off-by: Roman Kagan <rkagan@parallels.com> Message-id: 1430207220-24458-7-git-send-email-den@openvz.org CC: Roman Kagan <rkagan@parallels.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: add get_block_statusRoman Kagan2015-05-221-0/+21
| | | | | | | | | | | | | | | Implement VFS method for get_block_status to Parallels format driver. qemu_co_mutex_lock is not necessary yet (the driver is read-only) but will be necessary very soon when write will be supported. Signed-off-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Denis V. Lunev <den@openvz.org> Message-id: 1430207220-24458-6-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: read up to cluster end in one goRoman Kagan2015-05-221-5/+13
| | | | | | | | | | | | | Teach parallels_read() to do reads in coarser granularity than just a single sector: if requested, read up to the cluster end in one go. Signed-off-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1430207220-24458-5-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: switch to bdrv_readRoman Kagan2015-05-221-9/+11
| | | | | | | | | | | | | | | | | Switch the .bdrv_read method implementation from using bdrv_pread() to bdrv_read() on the underlying file, since the latter is subject to i/o throttling while the former is not. Besides, since bdrv_read() operates in sectors rather than bytes, adjust the helper functions to do so too. Signed-off-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Denis V. Lunev <den@openvz.org> Message-id: 1430207220-24458-4-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> CC: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* block/parallels: rename parallels_header to ParallelsHeaderDenis V. Lunev2015-05-221-4/+4
| | | | | | | | | | | this follows QEMU coding convention Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@parallels.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1430207220-24458-3-git-send-email-den@openvz.org CC: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
* Merge remote-tracking branch 'remotes/qmp-unstable/tags/for-upstream' into ↵Peter Maydell2015-05-121-3/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | staging QMP pull request # gpg: Signature made Mon May 11 14:15:19 2015 BST using RSA key ID E24ED5A7 # gpg: Good signature from "Luiz Capitulino <lcapitulino@gmail.com>" * remotes/qmp-unstable/tags/for-upstream: scripts: qmp-shell: Add verbose flag scripts: qmp-shell: add transaction subshell scripts: qmp-shell: Expand support for QMP expressions scripts: qmp-shell: refactor helpers MAINTAINERS: New maintainer for QMP and QAPI json-parser: Accept 'null' in QMP qobject: Add a special null QObject qobject: Clean up around qtype_code QJSON: Use OBJECT_CHECK Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
| * qobject: Clean up around qtype_codeMarkus Armbruster2015-05-111-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | QTYPE_NONE is a sentinel value. No QObject has this type code. Document it properly. Fix dump_qobject() to abort() on QTYPE_NONE, just like for any other invalid type code. Fix to_json() to abort() on all invalid type codes, not just QTYPE_MAX. Clean up Property member qtype's type: it's a qtype_code. Signed-off-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
* | sheepdog: fix resource leak with sd_snapshot_createzhanghailiang2015-05-081-0/+1
|/ | | | | | Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
* block: move I/O request processing to block/io.cStefan Hajnoczi2015-04-282-1/+2541
| | | | | | | | | | | | | | | | | | | | | | | | The block.c file has grown to over 6000 lines. It is time to split this file so there are fewer conflicts and the code is easier to maintain. Extract I/O request processing code: * Read * Write * Zero writes and making the image empty * Flush * Discard * ioctl * Tracked requests and queuing * Throttling and copy-on-read * Block status and allocated functions * Refreshing block limits * Reading/writing vmstate * qemu_blockalign() and friends The patch simply moves code from block.c into block/io.c. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* vmdk: Widen before shifting 32 bit header fieldFam Zheng2015-04-281-1/+1
| | | | | | | | | | | Coverity spotted this. The field is 32 bits, but if it's possible to overflow in 32 bit left shift. Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: John Snow <jsnow@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/dmg: make it modularMichael Tokarev2015-04-281-1/+2
| | | | | | | | dmg can optionally utilize libbz2, make it modular Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/mirror: Always call block_job_sleep_ns()Max Reitz2015-04-281-3/+0
| | | | | | | | | | | | The mirror block job is trying to take a clever shortcut if delay_ns is 0 and skips block_job_sleep_ns() in that case. But that function must be called in every block job iteration, because otherwise it is for example impossible to pause the job. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block: Ensure consistent bitmap function prototypesJohn Snow2015-04-282-17/+11
| | | | | | | | | | | | We often don't need the BlockDriverState for functions that operate on bitmaps. Remove it. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-15-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qmp: Add support of "dirty-bitmap" sync mode for drive-backupJohn Snow2015-04-282-22/+137
| | | | | | | | | | | | | | | | For "dirty-bitmap" sync mode, the block job will iterate through the given dirty bitmap to decide if a sector needs backup (backup all the dirty clusters and skip clean ones), just as allocation conditions of "top" sync mode. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-11-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qmp: Add block-dirty-bitmap-add and block-dirty-bitmap-removeJohn Snow2015-04-281-9/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new command pair is added to manage a user created dirty bitmap. The dirty bitmap's name is mandatory and must be unique for the same device, but different devices can have bitmaps with the same names. The granularity is an optional field. If it is not specified, we will choose a default granularity based on the cluster size if available, clamped to between 4K and 64K to mirror how the 'mirror' code was already choosing granularity. If we do not have cluster size info available, we choose 64K. This code has been factored out into a helper shared with block/mirror. This patch also introduces the 'block_dirty_bitmap_lookup' helper, which takes a device name and a dirty bitmap name and validates the lookup, returning NULL and setting errp if there is a problem with either field. This helper will be re-used in future patches in this series. The types added to block-core.json will be re-used in future patches in this series, see: 'qapi: Add transaction support to block-dirty-bitmap-{add, enable, disable}' Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Message-id: 1429314609-29776-5-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qmp: Ensure consistent granularity typeJohn Snow2015-04-281-2/+2
| | | | | | | | | | | | | We treat this field with a variety of different types everywhere in the code. Now it's just uint32_t. Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-4-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* qapi: Add optional field "name" to block dirty bitmapFam Zheng2015-04-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | This field will be set for user created dirty bitmap. Also pass in an error pointer to bdrv_create_dirty_bitmap, so when a name is already taken on this BDS, it can report an error message. This is not global check, two BDSes can have dirty bitmap with a common name. Implemented bdrv_find_dirty_bitmap to find a dirty bitmap by name, will be used later when other QMP commands want to reference dirty bitmap by name. Add bdrv_dirty_bitmap_make_anon. This unsets the name of dirty bitmap. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: John Snow <jsnow@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1429314609-29776-3-git-send-email-jsnow@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/iscsi: use the allocationmap also if cache.direct=onPeter Lieven2015-04-281-1/+1
| | | | | | | | | | | | | the allocationmap has only a hint character. The driver always double checks that blocks marked unallocated in the cache are still unallocated before taking the fast path and return zeroes. So using the allocationmap is migration safe and can also be enabled with cache.direct=on. Signed-off-by: Peter Lieven <pl@kamp.de> Message-id: 1429193313-4263-10-git-send-email-pl@kamp.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/iscsi: bump year in copyright noticePeter Lieven2015-04-281-1/+1
| | | | | | | Signed-off-by: Peter Lieven <pl@kamp.de> Message-id: 1429193313-4263-9-git-send-email-pl@kamp.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
* block/iscsi: handle SCSI_STATUS_TASK_SET_FULLPeter Lieven2015-04-281-2/+5
| | | | | | | | | | a target may issue a SCSI_STATUS_TASK_SET_FULL status if there is more than one "BUSY" command queued already. Signed-off-by: Peter Lieven <pl@kamp.de> Message-id: 1429193313-4263-8-git-send-email-pl@kamp.de Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
OpenPOWER on IntegriCloud