Commit message | Author | Age | Files | Lines
* dm crypt: remove unused io_pool and _crypt_io_poolMikulas Patocka2015-02-161-22/+2
| | | | | | | | | The previous commit ("dm crypt: don't allocate pages for a partial request") stopped using the io_pool slab mempool and backing _crypt_io_pool kmem cache. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm crypt: avoid deadlock in mempoolsMikulas Patocka2015-02-161-5/+36
Fix a theoretical deadlock introduced in the previous commit ("dm crypt: don't allocate pages for a partial request"). The function crypt_alloc_buffer may be called concurrently. If we allocate from the mempool concurrently, there is a possibility of deadlock. For example, if we have a mempool of 256 pages and two processes, each wanting 256 pages, allocate from the mempool concurrently, it may deadlock in a situation where both processes have allocated 128 pages and the mempool is exhausted. To avoid such a scenario we allocate the pages under a mutex. In order to not degrade performance with excessive locking, we try non-blocking allocations without the mutex first and, if that fails, we fall back to blocking allocations with the mutex. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
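A minimal sketch of the allocation pattern described above (illustrative only, not the actual dm-crypt code; the mempool, mutex, GFP flags and helper name are assumptions based on the commit text):

	/* Illustrative sketch, not the dm-crypt implementation. */
	static struct page *alloc_buffer_page(mempool_t *pool, struct mutex *lock,
					      bool *locked)
	{
		struct page *page;

		/* Fast path: non-blocking attempt with no lock held. */
		page = mempool_alloc(pool, GFP_NOWAIT | __GFP_NOWARN);
		if (page)
			return page;

		/*
		 * Slow path: serialize blocking allocations so that two callers
		 * cannot each hold half of the pool while waiting for the rest.
		 */
		if (!*locked) {
			mutex_lock(lock);
			*locked = true;
		}
		return mempool_alloc(pool, GFP_NOIO);
	}

The caller would drop the mutex once all pages for the full request have been allocated.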
* dm crypt: don't allocate pages for a partial requestMikulas Patocka2015-02-161-109/+30
| | | | | | | | | | | | | | | | Change crypt_alloc_buffer so that it only ever allocates pages for a full request. This is a prerequisite for the commit "dm crypt: offload writes to thread". This change simplifies the dm-crypt code at the expense of reduced throughput in low memory conditions (where allocation for a partial request is most useful). Note: the next commit ("dm crypt: avoid deadlock in mempools") is needed to fix a theoretical deadlock. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* dm crypt: use unbound workqueue for request processingMikulas Patocka2015-02-162-16/+40
Use an unbound workqueue by default so that work is automatically balanced between available CPUs. The original behavior of encrypting on the same CPU that the I/O was submitted on can still be enabled by setting the optional 'same_cpu_crypt' table argument. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
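As a hedged sketch of choosing between the two behaviours ('same_cpu_crypt' and 'crypt_queue' follow the commit text, not necessarily the exact dm-crypt field names):

	/* Illustrative only. */
	if (same_cpu_crypt)
		/* Old behaviour: per-CPU queue, encrypt on the submitting CPU. */
		cc->crypt_queue = alloc_workqueue("kcryptd",
						  WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM, 1);
	else
		/* New default: unbound queue, work balanced across CPUs. */
		cc->crypt_queue = alloc_workqueue("kcryptd",
						  WQ_CPU_INTENSIVE | WQ_MEM_RECLAIM | WQ_UNBOUND,
						  num_online_cpus());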
* dm io: reject unsupported DISCARD requests with EOPNOTSUPPDarrick J. Wong2015-02-131-0/+6
I created a dm-raid1 device backed by a device that supports DISCARD and another device that does NOT support DISCARD, with the following dm configuration:

  # echo '0 2048 mirror core 1 512 2 /dev/sda 0 /dev/sdb 0' | dmsetup create moo
  # lsblk -D
  NAME         DISC-ALN DISC-GRAN DISC-MAX DISC-ZERO
  sda                 0        4K       1G         0
  `-moo (dm-0)        0        4K       1G         0
  sdb                 0        0B       0B         0
  `-moo (dm-0)        0        4K       1G         0

Notice that the mirror device /dev/mapper/moo advertises DISCARD support even though one of the mirror halves doesn't. If I issue a DISCARD request (via fstrim, mount -o discard, or ioctl BLKDISCARD) through the mirror, kmirrord gets stuck in an infinite loop in do_region() when it tries to issue a DISCARD request to sdb. The problem is that when we call do_region() against sdb, num_sectors is set to zero because q->limits.max_discard_sectors is zero. Therefore, "remaining" never decreases and the loop never terminates. To fix this: before entering the loop, check for the combination of REQ_DISCARD and no discard support, and return -EOPNOTSUPP to avoid hanging up the mirror device. This bug was found by the unfortunate coincidence of pvmove and a discard operation in the RHEL 6.5 kernel; upstream is also affected. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Acked-by: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
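A sketch of the guard this describes (simplified; 'where', 'rw' and the completion helper are stand-ins for the dm-io internals, not verbatim source):

	struct request_queue *q = bdev_get_queue(where->bdev);

	/*
	 * A discard aimed at a queue with max_discard_sectors == 0 would make
	 * num_sectors zero and the submission loop spin forever, so fail the
	 * region up front instead.
	 */
	if ((rw & REQ_DISCARD) && !blk_queue_discard(q)) {
		dec_count(io, region, -EOPNOTSUPP);	/* hypothetical completion helper */
		return;
	}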
* dm mirror: do not degrade the mirror on discard errorMikulas Patocka2015-02-131-0/+9
It may be possible that a device claims discard support but rejects discards with -EOPNOTSUPP. This happens when using a loopback device on an ext2/ext3 filesystem driven by the ext4 driver. It may also happen if the underlying devices are moved from one disk to another. If a discard error happens, we reject the bio with -EOPNOTSUPP, but we do not degrade the array. This patch fixes the failed test shell/lvconvert-repair-transient.sh in the lvm2 testsuite when the testsuite is extracted on an ext2 or ext3 filesystem driven by the ext4 driver. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Cc: stable@vger.kernel.org
* dm space map disk: fix sm_disk_count_is_more_than_one()Mike Snitzer2015-02-131-1/+3
| | | | | | | | | | | | | dm_tm_shadow_block() is the only caller of dm_sm_count_is_more_than_one() which only ever operates on a metadata space-map. So in practice, sm_disk_count_is_more_than_one() isn't actually used (which explains why this bug never amounted to anything). But fix sm_disk_count_is_more_than_one() to properly set *result and return 0. Reported-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
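A sketch of what the corrected helper would look like (the count-lookup helper name is assumed from context, not copied from the source):

	static int sm_disk_count_is_more_than_one(struct dm_space_map *sm,
						  dm_block_t b, int *result)
	{
		uint32_t count;
		int r;

		r = sm_disk_get_count(sm, b, &count);	/* assumed lookup helper */
		if (r)
			return r;

		/* Previously *result was never set; report the answer and succeed. */
		*result = count > 1;
		return 0;
	}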
* Merge tag 'dm-3.20-changes' of ↵Linus Torvalds2015-02-1217-198/+416
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper changes from Mike Snitzer:

 - The most significant change this cycle is that request-based DM now supports stacking on top of blk-mq devices. This blk-mq support changes the model request-based DM uses for cloning a request to relying on calling blk_get_request() directly from the underlying blk-mq device. An early consumer of this code is Intel's emerging NVMe hardware; thanks to Keith Busch for working on, and pushing for, these changes.

 - A few other small fixes and cleanups across other DM targets.

* tag 'dm-3.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
  dm: inherit QUEUE_FLAG_SG_GAPS flags from underlying queues
  dm snapshot: remove unnecessary NULL checks before vfree() calls
  dm mpath: simplify failure path of dm_multipath_init()
  dm thin metadata: remove unused dm_pool_get_data_block_size()
  dm ioctl: fix stale comment above dm_get_inactive_table()
  dm crypt: update url in CONFIG_DM_CRYPT help text
  dm bufio: fix time comparison to use time_after_eq()
  dm: use time_in_range() and time_after()
  dm raid: fix a couple integer overflows
  dm table: train hybrid target type detection to select blk-mq if appropriate
  dm: allocate requests in target when stacking on blk-mq devices
  dm: prepare for allocating blk-mq clone requests in target
  dm: submit stacked requests in irq enabled context
  dm: split request structure out from dm_rq_target_io structure
  dm: remove exports for request-based interfaces without external callers
| * dm: inherit QUEUE_FLAG_SG_GAPS flags from underlying queuesKeith Busch2015-02-111-0/+13
| | | | | | | | | | | | | | | | | | | | | | A DM device must inherit the QUEUE_FLAG_SG_GAPS flags from its underlying block devices' request queues. This fixes problems when submitting cloned requests to multipathed devices requiring virtually contiguous buffers. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm snapshot: remove unnecessary NULL checks before vfree() callsMarkus Elfring2015-02-091-10/+4
| | | | | | | | | | | | | | | | | | | | The vfree() function performs input parameter validation. Thus the NULL pointer test around vfree() calls is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm mpath: simplify failure path of dm_multipath_init()Johannes Thumshirn2015-02-091-9/+15
Currently the cleanup of all error cases is open-coded. Introduce a common exit path and labels. Signed-off-by: Johannes Thumshirn <morbidrsa@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm thin metadata: remove unused dm_pool_get_data_block_size()Rickard Strandqvist2015-02-092-11/+0
| | | | | | | | | | | | | | | | | | | | | | The thin-pool target doesn't display the data block size as part of its table status, unlike the dm-cache target, so there is no need for dm_pool_get_data_block_size(). This was found using cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm ioctl: fix stale comment above dm_get_inactive_table()Junxiao Bi2015-02-091-2/+2
| | | | | | | | | | | | | | dm_table_put() was replaced by dm_put_live_table(). Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm crypt: update url in CONFIG_DM_CRYPT help textLoic Pefferkorn2015-02-091-3/+2
| | | | | | | | | | | | | | Update the obsolete url in the CONFIG_DM_CRYPT help text. Signed-off-by: Loic Pefferkorn <loic@loicp.eu> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm bufio: fix time comparison to use time_after_eq()Asaf Vertz2015-02-091-1/+2
| | | | | | | | | | | | | | | | To be future-proof and for better readability the time comparison is modified to use time_after_eq() instead of plain, error-prone math. Signed-off-by: Asaf Vertz <asaf.vertz@tandemg.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
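For illustration, the wrap-safe form of such a check (field and parameter names are assumptions, not the exact dm-bufio code):

	#include <linux/jiffies.h>

	/*
	 * True once at least 'age_hz' jiffies have passed since 'last_accessed';
	 * reads as "now is at or past last_accessed + age_hz" instead of
	 * open-coded arithmetic.
	 */
	static bool buffer_old_enough(unsigned long last_accessed, unsigned long age_hz)
	{
		return time_after_eq(jiffies, last_accessed + age_hz);
	}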
| * dm: use time_in_range() and time_after()Manuel Schölling2015-02-093-6/+9
| | | | | | | | | | | | | | | | To be future-proof and for better readability the time comparisons are modified to use time_in_range() and time_after() instead of plain, error-prone math. Signed-off-by: Manuel Schölling <manuel.schoelling@gmx.de> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm raid: fix a couple integer overflowsDan Carpenter2015-02-091-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | My static checker complains that if "num_raid_params" is UINT_MAX then the "if (num_raid_params + 1 > argc) {" check doesn't work as intended. The other change is that I moved the "if (argc != (num_raid_devs * 2))" condition forward a few lines so it was before the call to context_alloc(). If we had an integer overflow inside that function then it would lead to an immediate crash. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
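A sketch of the overflow-safe form of the first check (the error string is illustrative, not taken from the source):

	/*
	 * With "num_raid_params + 1 > argc", num_raid_params == UINT_MAX wraps
	 * the sum to 0 and the validation is skipped; compare without the
	 * addition instead.
	 */
	if (num_raid_params >= argc) {
		ti->error = "Insufficient number of arguments";	/* illustrative message */
		return -EINVAL;
	}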
| * dm table: train hybrid target type detection to select blk-mq if appropriateMike Snitzer2015-02-092-16/+22
| | | | | | | | | | | | | | | | | | | | | | Otherwise replacing the multipath target with the error target fails: device-mapper: ioctl: can't change device type after initial table load. The error target was mistakenly considered to be target type DM_TYPE_REQUEST_BASED rather than DM_TYPE_MQ_REQUEST_BASED even if the target it was to replace was of type DM_TYPE_MQ_REQUEST_BASED. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm: allocate requests in target when stacking on blk-mq devicesMike Snitzer2015-02-097-48/+185
For blk-mq request-based DM the responsibility of allocating a cloned request is transferred from DM core to the target type. Doing so enables the cloned request to be allocated from the appropriate blk-mq request_queue's pool (only the DM target, e.g. multipath, can know which block device to send a given cloned request to). Care was taken to preserve compatibility with old-style block request completion that requires request-based DM _not_ acquire the clone request's queue lock in the completion path. As such, there are now 2 different request-based DM target_type interfaces:
1) the original .map_rq() interface will continue to be used for non-blk-mq devices -- the preallocated clone request is passed in from DM core.
2) a new .clone_and_map_rq() and .release_clone_rq() will be used for blk-mq devices -- blk_get_request() and blk_put_request() are used respectively from these hooks.
dm_table_set_type() was updated to detect if the request-based target is being stacked on blk-mq devices; if so, DM_TYPE_MQ_REQUEST_BASED is set. DM core disallows switching the DM table's type after it is set. This means that there is no mixing of non-blk-mq and blk-mq devices within the same request-based DM table. [This patch was started by Keith and later heavily modified by Mike] Tested-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
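Roughly, the new hook pair could be used like this (a hedged sketch: the path-selection helper is hypothetical and the exact signatures may differ slightly from the tree):

	static int my_clone_and_map_rq(struct dm_target *ti, struct request *rq,
				       union map_info *map_context,
				       struct request **clone)
	{
		/* Pick the underlying device (e.g. a multipath path) -- hypothetical helper. */
		struct block_device *bdev = my_choose_path(ti, rq);
		struct request *c;

		/* Allocate the clone from the underlying blk-mq queue's own pool. */
		c = blk_get_request(bdev_get_queue(bdev), rq_data_dir(rq), GFP_ATOMIC);
		if (IS_ERR(c))
			return DM_MAPIO_REQUEUE;

		*clone = c;
		return DM_MAPIO_REMAPPED;
	}

	static void my_release_clone_rq(struct request *clone)
	{
		blk_put_request(clone);
	}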
| * dm: prepare for allocating blk-mq clone requests in targetKeith Busch2015-02-091-68/+66
For blk-mq request-based DM the responsibility of allocating a cloned request will be transferred from DM core to the target type. To prepare for conditionally using this new model, the original request's 'special' now points to the dm_rq_target_io because the clone is allocated later in the block layer rather than in DM core. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm: submit stacked requests in irq enabled contextKeith Busch2015-02-092-18/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch to having request-based DM enqueue all prep'ed requests into work processed by another thread. This allows request-based DM to invoke block APIs that assume interrupt enabled context (e.g. blk_get_request) and is a prerequisite for adding blk-mq support to request-based DM. The new kernel thread is only initialized for request-based DM devices. multipath_map() is now always in irq enabled context so change multipath spinlock (m->lock) locking to always disable interrupts. Signed-off-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm: split request structure out from dm_rq_target_io structureMike Snitzer2015-02-091-9/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Request-based DM support for blk-mq devices requires that dm_rq_target_io structures not be allocated with an embedded request structure. The request-based DM target (e.g. dm-multipath) must allocate the request from the blk-mq devices' request_queue using blk_get_request(). The unfortunate side-effect of this change is old-style request-based DM support will no longer use contiguous memory for the dm_rq_target_io and request structures for each clone. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
| * dm: remove exports for request-based interfaces without external callersMike Snitzer2015-02-092-9/+3
| | | | | | | | | | | | | | Remove exports for dm_dispatch_request, dm_requeue_unmapped_request, and dm_kill_unmapped_request. Signed-off-by: Mike Snitzer <snitzer@redhat.com>
* | Merge branch 'for-3.20/drivers' of git://git.kernel.dk/linux-blockLinus Torvalds2015-02-1219-440/+476
Pull block driver changes from Jens Axboe: "This contains:

 - The 4k/partition fixes for brd from Boaz/Matthew.

 - A few xen front/back block fixes from David Vrabel and Roger Pau Monne.

 - Floppy changes from Takashi, cleaning the device file creation.

 - Switching libata to use the new blk-mq tagging policy, removing code (and a suboptimal implementation) from libata. This will throw you a merge conflict, since a bug in the original libata tagging code was fixed since this code was branched. Trivial. From Shaohua.

 - Conversion of loop to blk-mq, from Ming Lei.

 - Cleanup of the io_schedule() handling in bsg from Peter Zijlstra. He claims it improves on unreadable code, which will cost him a beer.

 - Maintainer update for NBD, now handled by Markus Pargmann.

 - NVMe:
     - Optimization from me that avoids a kmalloc/kfree per IO for smaller (<= 8KB) IO. This cuts about 1% of high IOPS CPU overhead.
     - Removal of (now) dead RCU code, a relic from before NVMe was converted to blk-mq"

* 'for-3.20/drivers' of git://git.kernel.dk/linux-block:
  xen-blkback: default to X86_32 ABI on x86
  xen-blkfront: fix accounting of reqs when migrating
  xen-blkback,xen-blkfront: add myself as maintainer
  block: Simplify bsg complete all
  floppy: Avoid manual call of device_create_file()
  NVMe: avoid kmalloc/kfree for smaller IO
  MAINTAINERS: Update NBD maintainer
  libata: make sata_sil24 use fifo tag allocator
  libata: move sas ata tag allocation to libata-scsi.c
  libata: use blk taging
  NVMe: within nvme_free_queues(), delete RCU sychro/deferred free
  null_blk: suppress invalid partition info
  brd: Request from fdisk 4k alignment
  brd: Fix all partitions BUGs
  axonram: Fix bug in direct_access
  loop: add blk-mq.h include
  block: loop: don't handle REQ_FUA explicitly
  block: loop: introduce lo_discard() and lo_req_flush()
  block: loop: say goodby to bio
  block: loop: improve performance via blk-mq
| * | xen-blkback: default to X86_32 ABI on x86David Vrabel2015-02-102-2/+11
Prior to the existence of 64-bit backends using the X86_64 ABI, frontends used the X86_32 ABI. These old frontends do not specify the ABI and, when used with a 64-bit backend, do not work. On x86, default to the X86_32 ABI if one is not specified. Backends on ARM continue to default to their NATIVE ABI. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com>
| * | xen-blkfront: fix accounting of reqs when migratingRoger Pau Monne2015-02-101-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current migration code uses blk_put_request in order to finish a request before requeuing it. This function doesn't update the statistics of the queue, which completely screws accounting. Use blk_end_request_all instead which properly updates the statistics of the queue. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Reported-and-Tested-by: Ouyang Zhaowei (Charles) <ouyangzhaowei@huawei.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: xen-devel@lists.xenproject.org
| * | xen-blkback,xen-blkfront: add myself as maintainerRoger Pau Monne2015-02-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | I've done quite a lot of work in blkfront/blkback, and I usually end up looking at the patches, so add myself as maintainer together with Konrad. Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | block: Simplify bsg complete allPeter Zijlstra2015-02-042-47/+40
It took me a few tries to figure out what this code did; let's rewrite it into a more regular form. The thing that makes this one 'special' is the BSG_F_BLOCK flag: if that is not set we're not supposed/allowed to block and should spin-wait for completion. The (new) io_wait_event() will never see a false condition in the spinning case and we will therefore not block. Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | floppy: Avoid manual call of device_create_file()Takashi Iwai2015-02-031-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | Use the static attribute groups assigned to the device instead of calling device_create_file() after the device registration. Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
| * | NVMe: avoid kmalloc/kfree for smaller IOJens Axboe2015-01-292-33/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we allocate an nvme_iod for each IO, which holds the sg list, prps, and other IO related info. Set a threshold of 2 pages and/or 8KB of data, below which we can just embed this in the per-command pdu in blk-mq. For any IO at or below NVME_INT_PAGES and NVME_INT_BYTES, we save a kmalloc and kfree. For higher IOPS, this saves up to 1% of CPU time. Signed-off-by: Jens Axboe <axboe@fb.com> Reviewed-by: Keith Busch <keith.busch@intel.com>
| * | MAINTAINERS: Update NBD maintainerMarkus Pargmann2015-01-281-1/+2
Paul stops maintaining NBD and I will take his place from now on. Signed-off-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | libata: make sata_sil24 use fifo tag allocatorShaohua Li2015-01-261-0/+1
libata starts using block tags now, so we can use BLK_TAG_ALLOC_FIFO to solve the sata_sil24 tag bug. https://bugzilla.kernel.org/show_bug.cgi?id=87101 Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Shaohua Li <shli@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | libata: move sas ata tag allocation to libata-scsi.cShaohua Li2015-01-243-77/+46
Basically move the sas ata tag allocation to libata-scsi.c to make it clear this code is just for sas. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | libata: use blk tagingShaohua Li2015-01-234-17/+48
libata uses its own tag management, which is duplication, and the implementation is poor. And if we switch to blk-mq, tagging is built in. It's time to switch to generic tagging. The SAS driver has its own tag management, and it looks like we can't directly map the host controller tag to the SATA tag, so I just bypassed the SAS case. I changed the code/variable names for the tag management of libata to make it self-contained. Only sas will use it. Later, if libsas implements its own tag management, the tag management code in libata can be deleted easily. Cc: Jens Axboe <axboe@fb.com> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Shaohua Li <shli@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | Merge branch 'for-3.20/core' into for-3.20/driversJens Axboe2015-01-2316-62/+146
| |\ \ | | | | | | | | | | | | We need the tagging changes for the libata conversion.
| * | | NVMe: within nvme_free_queues(), delete RCU sychro/deferred freekaoudis2015-01-211-8/+1
Converting to blk-mq got rid of the driver's RCU locking-on-queue, so remove the unnecessary RCU locking-on-queue artefacts. Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Kelly Nicole Kaoudis <kaoudis@colorado.edu> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | null_blk: suppress invalid partition infoJens Axboe2015-01-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | null_blk is partitionable, but it doesn't store any of the info. When it is loaded, you would normally see: [1226739.343608] nullb0: unknown partition table [1226739.343746] nullb1: unknown partition table which can confuse some people. Add the appropriate gendisk flag to suppress this info. Signed-off-by: Jens Axboe <axboe@fb.com>
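The change described above amounts to setting the corresponding gendisk flag when the disk is created, along the lines of this sketch (field access assumed):

	/* Don't log "unknown partition table" for a freshly loaded null_blk disk. */
	disk->flags |= GENHD_FL_SUPPRESS_PARTITION_INFO;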
| * | | brd: Request from fdisk 4k alignmentBoaz Harrosh2015-01-131-0/+9
Because of the direct_access() API, which returns a PFN, partitions had better start on a 4K boundary; otherwise offset ZERO of a partition will not be aligned and blk_direct_access() will fail the call. By setting blk_queue_physical_block_size(PAGE_SIZE) we can communicate this to fdisk and friends. The call to blk_queue_physical_block_size() is harmless and will not affect the kernel's behavior in any way. It is only for communication to user mode. Before this patch, running fdisk on a default-size brd of 4M, the first sector offered is 34 (BAD); after this patch it will be 40, i.e. 8-sector aligned. Also, when entering some random partition sizes, the next partition-start sector offered is 8-sector aligned after this patch. (Please note that with fdisk the user can still enter bad values; only the offered default values will be correct.) Note that with bdev-size > 4M fdisk will try to align on a 1M boundary (above first-sector will be 2048), in any case. CC: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
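In brd this boils down to a single queue-limits hint at device setup, something like the following sketch (the queue field name is assumed):

	/*
	 * Tell user space (fdisk & friends) to align partitions to 4K; this
	 * hint does not change how the kernel itself performs I/O.
	 */
	blk_queue_physical_block_size(brd->brd_queue, PAGE_SIZE);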
| * | | brd: Fix all partitions BUGsBoaz Harrosh2015-01-131-62/+38
This patch fixes up brd's partition scheme, now enjoying all worlds.

The MAIN fix here is that currently, if one fdisks some partitions, a BAD bug will make all partitions point to the same start-end sector, i.e. 0 - brd_size, and an mkfs of any partition would trash the partition table and the other partitions.

Another fix is that "mount -U uuid" will not work if show_part was not specified, because of the GENHD_FL_SUPPRESS_PARTITION_INFO flag. We now always load without it and remove the show_part parameter. [We remove Dmitry's new module-param part_show; partitions are now always shown.]

So NOW the logic goes like this:
* max_part - Just says how many minors to reserve between ramX devices. In any way, there can be as many partitions as requested. If the minors between devices run out, then dynamic 259-major ids will be allocated on the fly. The default is now max_part=1, which means all partition devt(s) will be from the dynamic (259) major range. (If persistent partition minors are needed, use max_part=X.) For example, with /dev/sdX max_part is hard-coded to 16.
* Creation of new devices on the fly still/always works: mknod /path/devnod b 1 X; fdisk -l /path/devnod will create a new device if [X / max_part] was not already created before (just as before). Partitions on the dynamically created device will work as well; the same logic applies with minors as with the pre-created ones.

TODO: dynamic grow of device size, so each device can have its own size.

CC: Dmitry Monakhov <dmonakhov@openvz.org> Tested-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Boaz Harrosh <boaz@plexistor.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | Merge branch 'for-3.20/core' into for-3.20/driversJens Axboe2015-01-137-58/+86
| |\ \ \
| * | | | axonram: Fix bug in direct_accessMatthew Wilcox2015-01-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'pfn' returned by axonram was completely bogus, and has been since 2008. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | loop: add blk-mq.h includeJens Axboe2015-01-022-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Looks like we pull it in through other ways on x86, but we fail on sparc: In file included from drivers/block/cryptoloop.c:30:0: drivers/block/loop.h:63:24: error: field 'tag_set' has incomplete type struct blk_mq_tag_set tag_set; Add the include to loop.h, kill it from loop.c. Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | block: loop: don't handle REQ_FUA explicitlyMing Lei2015-01-021-11/+3
The block core handles REQ_FUA via its flush state machine, so don't handle it in loop explicitly. Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | block: loop: introduce lo_discard() and lo_req_flush()Ming Lei2015-01-021-33/+40
No behaviour change, just move the handling for REQ_DISCARD and REQ_FLUSH into these two functions. Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | block: loop: say goodby to bioMing Lei2015-01-021-25/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch to block request completely. Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | block: loop: improve performance via blk-mqMing Lei2015-01-022-170/+174
The conversion is fairly straightforward: use a workqueue to dispatch requests of the loop block device. One big change is that requests are now submitted to the backend file/device concurrently via the workqueue, so throughput may improve a lot. Given that write requests over the same file often run exclusively, don't handle them concurrently, to avoid extra context-switch cost, possible lock contention and work-scheduling cost. Also with blk-mq there is an opportunity to get loop I/O merged before submitting it to the backend file/device.

In the following test:
 - base: v3.19-rc2-2041231
 - loop over file in ext4 file system on SSD disk
 - bs: 4k, libaio, io depth: 64, O_DIRECT, num of jobs: 1
 - throughput: IOPS

 ------------------------------------------------------
 |           | base    | base with loop-mq | delta    |
 ------------------------------------------------------
 | randread  | 1740    | 25318             | +1355%   |
 ------------------------------------------------------
 | read      | 42196   | 51771             | +22.6%   |
 ------------------------------------------------------
 | randwrite | 35709   | 34624             | -3%      |
 ------------------------------------------------------
 | write     | 39137   | 40326             | +3%      |
 ------------------------------------------------------

So loop-mq improves throughput for both read and randread; meanwhile, write and randwrite performance is basically unaffected. Another benefit is that the loop driver code gets simplified a lot after the blk-mq conversion, and the patch can be thought of as a cleanup too. Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | | | Merge branch 'for-3.20/core' of git://git.kernel.dk/linux-blockLinus Torvalds2015-02-1230-575/+499
Pull core block IO changes from Jens Axboe: "This contains:

 - A series from Christoph that cleans up and refactors various parts of the REQ_BLOCK_PC handling. Contributions in that series from Dongsu Park and Kent Overstreet as well.

 - CFQ:
     - A bug fix for cfq for realtime IO scheduling from Jeff Moyer.
     - A stable patch fixing a potential crash in CFQ in OOM situations. From Konstantin Khlebnikov.

 - blk-mq:
     - Add support for tag allocation policies, from Shaohua. This is a prep patch enabling libata (and other SCSI parts) to use the blk-mq tagging, instead of rolling their own.
     - Various little tweaks from Keith and Mike, in preparation for DM blk-mq support.
     - Minor little fixes or tweaks from me.

 - A double free error fix from Tony Battersby.

 - The partition 4k issue fixes from Matthew and Boaz.

 - Add support for zero+unprovision for blkdev_issue_zeroout() from Martin"

* 'for-3.20/core' of git://git.kernel.dk/linux-block: (27 commits)
  block: remove unused function blk_bio_map_sg
  block: handle the null_mapped flag correctly in blk_rq_map_user_iov
  blk-mq: fix double-free in error path
  block: prevent request-to-request merging with gaps if not allowed
  blk-mq: make blk_mq_run_queues() static
  dm: fix multipath regression due to initializing wrong request
  cfq-iosched: handle failure of cfq group allocation
  block: Quiesce zeroout wrapper
  block: rewrite and split __bio_copy_iov()
  block: merge __bio_map_user_iov into bio_map_user_iov
  block: merge __bio_map_kern into bio_map_kern
  block: pass iov_iter to the BLOCK_PC mapping functions
  block: add a helper to free bio bounce buffer pages
  block: use blk_rq_map_user_iov to implement blk_rq_map_user
  block: simplify bio_map_kern
  block: mark blk-mq devices as stackable
  block: keep established cmd_flags when cloning into a blk-mq request
  block: add blk-mq support to blk_insert_cloned_request()
  block: require blk_rq_prep_clone() be given an initialized clone request
  blk-mq: add tag allocation policy
  ...
| * | | | | block: remove unused function blk_bio_map_sgChristoph Hellwig2015-02-112-31/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | | block: handle the null_mapped flag correctly in blk_rq_map_user_iovChristoph Hellwig2015-02-111-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tape drivers (and the sg driver in a special case that doesn't matter here) use the null_mapped flag to tell blk_rq_map_user to not copy around any data into or out of the bounce buffers. blk_rq_map_user_iov never got that treatment, which didn't matter until I refactored blk_rq_map_user to be implemented in terms of blk_rq_map_user_iov. Signed-off-by: Christoph Hellwig <hch@lst.de> Fixes: ddad8dd0a162 ("block: use blk_rq_map_user_iov to implement blk_rq_map_user") Signed-off-by: Jens Axboe <axboe@fb.com>
| * | | | | blk-mq: fix double-free in error pathTony Battersby2015-02-111-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the allocation of bt->bs fails, then bt->map can be freed twice, once in blk_mq_init_bitmap_tags() -> bt_alloc(), and once in blk_mq_init_bitmap_tags() -> bt_free(). Fix by setting the pointer to NULL after the first free. Cc: <stable@vger.kernel.org> Signed-off-by: Tony Battersby <tonyb@cybernetics.com> Signed-off-by: Jens Axboe <axboe@fb.com>