summaryrefslogtreecommitdiffstats
path: root/drivers/block
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'ceph-for-4.14-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2017-09-121-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "The highlights include: - a large series of fixes and improvements to the snapshot-handling code (Zheng Yan) - individual read/write OSD requests passed down to libceph are now limited to 16M in size to avoid hitting OSD-side limits (Zheng Yan) - encode MStatfs v2 message to allow for more accurate space usage reporting (Douglas Fuller) - switch to the new writeback error tracking infrastructure (Jeff Layton)" * tag 'ceph-for-4.14-rc1' of git://github.com/ceph/ceph-client: (35 commits) ceph: stop on-going cached readdir if mds revokes FILE_SHARED cap ceph: wait on writeback after writing snapshot data ceph: fix capsnap dirty pages accounting ceph: ignore wbc->range_{start,end} when write back snapshot data ceph: fix "range cyclic" mode writepages ceph: cleanup local variables in ceph_writepages_start() ceph: optimize pagevec iterating in ceph_writepages_start() ceph: make writepage_nounlock() invalidate page that beyonds EOF ceph: properly get capsnap's size in get_oldest_context() ceph: remove stale check in ceph_invalidatepage() ceph: queue cap snap only when snap realm's context changes ceph: handle race between vmtruncate and queuing cap snap ceph: fix message order check in handle_cap_export() ceph: fix NULL pointer dereference in ceph_flush_snaps() ceph: adjust 36 checks for NULL pointers ceph: delete an unnecessary return statement in update_dentry_lease() ceph: ENOMEM pr_err in __get_or_create_frag() is redundant ceph: check negative offsets in ceph_llseek() ceph: more accurate statfs ceph: properly set snap follows for cap reconnect ...
| * rbd: silence bogus uninitialized use warning in rbd_acquire_lock()Kefeng Wang2017-09-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | drivers/block/rbd.c: In function 'rbd_acquire_lock': drivers/block/rbd.c:3602:44: error: 'ret' may be used uninitialized in this function [-Werror=maybe-uninitialized] Silence the warning, found it when built old kernel(3.10) with OBS(opensuse build service). Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | Merge branch 'for-4.14/block-postmerge' of git://git.kernel.dk/linux-blockLinus Torvalds2017-09-092-92/+109
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull followup block layer updates from Jens Axboe: "I ended up splitting the main pull request for this series into two, mainly because of clashes between NVMe fixes that went into 4.13 after the for-4.14 branches were split off. This pull request is mostly NVMe, but not exclusively. In detail, it contains: - Two pull request for NVMe changes from Christoph. Nothing new on the feature front, basically just fixes all over the map for the core bits, transport, rdma, etc. - Series from Bart, cleaning up various bits in the BFQ scheduler. - Series of bcache fixes, which has been lingering for a release or two. Coly sent this in, but patches from various people in this area. - Set of patches for BFQ from Paolo himself, updating both documentation and fixing some corner cases in performance. - Series from Omar, attempting to now get the 4k loop support correct. Our confidence level is higher this time. - Series from Shaohua for loop as well, improving O_DIRECT performance and fixing a use-after-free" * 'for-4.14/block-postmerge' of git://git.kernel.dk/linux-block: (74 commits) bcache: initialize dirty stripes in flash_dev_run() loop: set physical block size to logical block size bcache: fix bch_hprint crash and improve output bcache: Update continue_at() documentation bcache: silence static checker warning bcache: fix for gc and write-back race bcache: increase the number of open buckets bcache: Correct return value for sysfs attach errors bcache: correct cache_dirty_target in __update_writeback_rate() bcache: gc does not work when triggering by manual command bcache: Don't reinvent the wheel but use existing llist API bcache: do not subtract sectors_to_gc for bypassed IO bcache: fix sequential large write IO bypass bcache: Fix leak of bdev reference block/loop: remove unused field block/loop: fix use after free bfq: Use icq_to_bic() consistently bfq: Suppress compiler warnings about comparisons bfq: Check kstrtoul() return value bfq: Declare local functions static ...
| * | loop: set physical block size to logical block sizeOmar Sandoval2017-09-061-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 6c6b6f28b333 ("loop: set physical block size to PAGE_SIZE") caused mkfs.xfs to barf on ppc64 [1]. Always using PAGE_SIZE as the physical block size still makes the most sense semantically, but let's just lie and always set it to the same value as the logical block size (same goes for io_min). In the future we might want to at least bump up io_min to PAGE_SIZE but I'm sick of these stupid changes so let's play it safe. 1: https://marc.info/?l=linux-xfs&m=150459024723753&w=2 Tested-by: Chandan Rajendra <chandan@linux.vnet.ibm.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block/loop: remove unused fieldShaohua Li2017-09-011-1/+0
| | | | | | | | | | | | | | | | | | | | | nobody uses the list. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block/loop: fix use after freeShaohua Li2017-09-012-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | lo_rw_aio->call_read_iter-> 1 aops->direct_IO 2 iov_iter_revert lo_rw_aio_complete could happen between 1 and 2, the bio and bvec could be freed before 2, which accesses bvec. Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block/loop: allow request merge for directio modeShaohua Li2017-09-012-15/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently loop disables merge. While it makes sense for buffer IO mode, directio mode can benefit from request merge. Without merge, loop could send small size IO to underlayer disk and harm performance. Reviewed-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block/loop: set hw_sectorsShaohua Li2017-09-011-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Loop can handle any size of request. Limiting it to 255 sectors just burns the CPU for bio split and request merge for underlayer disk and also cause bad fs block allocation in directio mode. Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | loop: fold loop_switch() into callersOmar Sandoval2017-08-311-65/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The comments here are really outdated, and blk-mq made flushing much simpler, so just fold the two cases into the callers. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | loop: add ioctl for changing logical block sizeOmar Sandoval2017-08-311-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a different approach from the first attempt in f2c6df7dbf9a ("loop: support 4k physical blocksize"). Rather than extending LOOP_{GET,SET}_STATUS, add a separate ioctl just for setting the block size. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | loop: set physical block size to PAGE_SIZEOmar Sandoval2017-08-311-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The physical block size is "the lowest possible sector size that the hardware can operate on without reverting to read-modify-write operations" (from the comment on blk_queue_physical_block_size()). Since loop does buffered I/O on the backing file by default, the RMW unit is a page. This isn't the case for direct I/O mode, but let's keep it simple. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | loop: get rid of lo_blocksizeOmar Sandoval2017-08-312-9/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is only used for setting the soft block size on the struct block_device once and then never used again. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Omar Sandoval <osandov@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | Merge tag 'v4.13-rc7' into for-4.14/block-postmergeJens Axboe2017-08-286-48/+82
| |\ \ | | | | | | | | | | | | | | | | | | | | Linux 4.13-rc7 Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | drivers/block/zram/zram_drv.c: convert to using memset_lMatthew Wilcox2017-09-081-11/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zram was the motivation for creating memset_l(). Minchan Kim sees a 7% performance improvement on x86 with 100MB of non-zero deduplicatable data: perf stat -r 10 dd if=/dev/zram0 of=/dev/null vanilla: 0.232050465 seconds time elapsed ( +- 0.51% ) memset_l: 0.217219387 seconds time elapsed ( +- 0.07% ) Link: http://lkml.kernel.org/r/20170720184539.31609-7-willy@infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Tested-by: Minchan Kim <minchan@kernel.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: David Miller <davem@davemloft.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Richard Henderson <rth@twiddle.net> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2017-09-077-7877/+0
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This is mostly updates of the usual suspects: lpfc, qla2xxx, hisi_sas, megaraid_sas, zfcp and a host of minor updates. The major driver change here is the elimination of the block based cciss driver in favour of the SCSI based hpsa driver (which now drives all the legacy cases cciss used to be required for). Plus a reset handler clean up and the redo of the SAS SMP handler to use bsg lib" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (279 commits) scsi: scsi-mq: Always unprepare before requeuing a request scsi: Show .retries and .jiffies_at_alloc in debugfs scsi: Improve requeuing behavior scsi: Call scsi_initialize_rq() for filesystem requests scsi: qla2xxx: Reset the logo flag, after target re-login. scsi: qla2xxx: Fix slow mem alloc behind lock scsi: qla2xxx: Clear fc4f_nvme flag scsi: qla2xxx: add missing includes for qla_isr scsi: qla2xxx: Fix an integer overflow in sysfs code scsi: aacraid: report -ENOMEM to upper layer from aac_convert_sgraw2() scsi: aacraid: get rid of one level of indentation scsi: aacraid: fix indentation errors scsi: storvsc: fix memory leak on ring buffer busy scsi: scsi_transport_sas: switch to bsg-lib for SMP passthrough scsi: smartpqi: remove the smp_handler stub scsi: hpsa: remove the smp_handler stub scsi: bsg-lib: pass the release callback through bsg_setup_queue scsi: Rework handling of scsi_device.vpd_pg8[03] scsi: Rework the code for caching Vital Product Data (VPD) scsi: rcu: Introduce rcu_swap_protected() ...
| * | | | scsi: cciss: Drop obsolete driverHannes Reinecke2017-08-247-7877/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The hpsa driver now has support for all boards the cciss driver used to support, so this patch removes the cciss driver and make hpsa an alias to cciss. Signed-off-by: Hannes Reinecke <hare@suse.com> Acked-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | | | | Merge branch 'for-4.14/block' of git://git.kernel.dk/linux-blockLinus Torvalds2017-09-0728-2672/+2404
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block layer updates from Jens Axboe: "This is the first pull request for 4.14, containing most of the code changes. It's a quiet series this round, which I think we needed after the churn of the last few series. This contains: - Fix for a registration race in loop, from Anton Volkov. - Overflow complaint fix from Arnd for DAC960. - Series of drbd changes from the usual suspects. - Conversion of the stec/skd driver to blk-mq. From Bart. - A few BFQ improvements/fixes from Paolo. - CFQ improvement from Ritesh, allowing idling for group idle. - A few fixes found by Dan's smatch, courtesy of Dan. - A warning fixup for a race between changing the IO scheduler and device remova. From David Jeffery. - A few nbd fixes from Josef. - Support for cgroup info in blktrace, from Shaohua. - Also from Shaohua, new features in the null_blk driver to allow it to actually hold data, among other things. - Various corner cases and error handling fixes from Weiping Zhang. - Improvements to the IO stats tracking for blk-mq from me. Can drastically improve performance for fast devices and/or big machines. - Series from Christoph removing bi_bdev as being needed for IO submission, in preparation for nvme multipathing code. - Series from Bart, including various cleanups and fixes for switch fall through case complaints" * 'for-4.14/block' of git://git.kernel.dk/linux-block: (162 commits) kernfs: checking for IS_ERR() instead of NULL drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_set drbd: Fix allyesconfig build, fix recent commit drbd: switch from kmalloc() to kmalloc_array() drbd: abort drbd_start_resync if there is no connection drbd: move global variables to drbd namespace and make some static drbd: rename "usermode_helper" to "drbd_usermode_helper" drbd: fix race between handshake and admin disconnect/down drbd: fix potential deadlock when trying to detach during handshake drbd: A single dot should be put into a sequence. drbd: fix rmmod cleanup, remove _all_ debugfs entries drbd: Use setup_timer() instead of init_timer() to simplify the code. drbd: fix potential get_ldev/put_ldev refcount imbalance during attach drbd: new disk-option disable-write-same drbd: Fix resource role for newly created resources in events2 drbd: mark symbols static where possible drbd: Send P_NEG_ACK upon write error in protocol != C drbd: add explicit plugging when submitting batches drbd: change list_for_each_safe to while(list_first_entry_or_null) drbd: introduce drbd_recv_header_maybe_unplug ...
| * | | | | drbd: remove BIOSET_NEED_RESCUER flag from drbd_{md_,}io_bio_setNeilBrown2017-08-301-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Careful analysis shows that this flag is not needed. The RESCUER flag is only needed when a make_request_fn might: - allocate a bio from the bioset - submit it with generic_make_request() or similar - allocate another bio from the bioset The second allocation can block until the first bio is processed, so a rescuer is needed to ensure the first bio does get processed. With a rescuer it will only get processed when the make_request_fn completes. In drbd, allocations from drbd_io_bio_set happen from drbd_new_req() or w_restart_disk_io() which is only called to handle RESTART_FROZEN_DISK_IO. In former is called precisely once from the make_request_fn. The later is never called by within the make_request_fn. So there cannot be two allocations in the same call to the make_request_fn, so a rescuer is not needed. Allocations from drbd_md_io_bio_set are used for IO to the bitmap and the activity log. There are only accessed from worker threads and workqueues, never directly from make_request_fn. Again, the rescuer isn't needed. Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: Fix allyesconfig build, fix recent commitPhilipp Reisner2017-08-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Globals where prefixed with drbd_, that was missed in the in #ifdef'nd code when it is built-in. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Fixes: 183ece30053f ("drbd: move global variables to drbd namespace and make some static") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: switch from kmalloc() to kmalloc_array()Roland Kammerer2017-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We had one call to kmalloc that actually allocates an array. Switch that one to the kmalloc_array() function. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: abort drbd_start_resync if there is no connectionRoland Kammerer2017-08-291-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was found by a static analysis tool. While highly unlikely, be sure to return without dereferencing the NULL pointer. Reported-by: Shaobo <shaobo@cs.utah.edu> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: move global variables to drbd namespace and make some staticRoland Kammerer2017-08-294-49/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a follow-up to Gregs complaints that drbd clutteres the global namespace. Some of DRBD's module parameters are only used within one compilation unit. Make these static. Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: rename "usermode_helper" to "drbd_usermode_helper"Greg Kroah-Hartman2017-08-293-13/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Nothing like having a very generic global variable in a tiny driver subsystem to make a mess of the global namespace... Note, there are many other "generic" named global variables in the drbd subsystem, someone should fix those up one day before they hit a linking error. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: fix race between handshake and admin disconnect/downLars Ellenberg2017-08-291-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | conn_try_disconnect() could potentialy hit the BUG_ON() in _conn_set_state() where it iterates over _drbd_set_state() and "asserts" via BUG_ON() that the latter was successful. If the STATE_SENT bit was not yet visible to conn_is_valid_transition() early in _conn_request_state(), but became visible before conn_set_state() later in that call path, we could hit the BUG_ON() after _drbd_set_state(), because it returned SS_IN_TRANSIENT_STATE. To avoid that race, we better protect set_bit(SENT_STATE) with the spinlock. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: fix potential deadlock when trying to detach during handshakeLars Ellenberg2017-08-293-23/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When requesting a detach, we first suspend IO, and also inhibit meta-data IO by means of drbd_md_get_buffer(), because we don't want to "fail" the disk while there is IO in-flight: the transition into D_FAILED for detach purposes may get misinterpreted as actual IO error in a confused endio function. We wrap it all into wait_event(), to retry in case the drbd_req_state() returns SS_IN_TRANSIENT_STATE, as it does for example during an ongoing connection handshake. In that example, the receiver thread may need to grab drbd_md_get_buffer() during the handshake to make progress. To avoid potential deadlock with detach, detach needs to grab and release the meta data buffer inside of that wait_event retry loop. To avoid lock inversion between mutex_lock(&device->state_mutex) and drbd_md_get_buffer(device), introduce a new enum chg_state_flag CS_INHIBIT_MD_IO, and move the call to drbd_md_get_buffer() inside the state_mutex grabbed in drbd_req_state(). Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: A single dot should be put into a sequence.Markus Elfring2017-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thus use the corresponding function "seq_putc". This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: fix rmmod cleanup, remove _all_ debugfs entriesLars Ellenberg2017-08-291-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there are still resources defined, but "empty", no more volumes or connections configured, they don't hold module reference counts, so rmmod is possible. To avoid DRBD leftovers in debugfs, we need to call our global drbd_debugfs_cleanup() only after all resources have been cleaned up. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: Use setup_timer() instead of init_timer() to simplify the code.Geliang Tang2017-08-291-12/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Geliang Tang <geliangtang@gmail.com> Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: fix potential get_ldev/put_ldev refcount imbalance during attachLars Ellenberg2017-08-291-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Race: drbd_adm_attach() | async drbd_md_endio() | device->ldev is still NULL. | | drbd_md_read( | .endio = drbd_md_endio; | submit; | .... | wait for done == 1; | done = 1; ); | wake_up(); .. lot of other stuff, | .. includeing taking and | ...giving up locks, | .. doing further IO, | .. stuff that takes "some time" | | while in this context, | this is the next statement. | which means this context was scheduled .. only then, finally, | away for "some time". device->ldev = nbc; | | if (device->ldev) | put_ldev() Unlikely, but possible. I was able to provoke it "reliably" by adding an mdelay(500); after the wake_up(). Fixed by moving the if (!NULL) put_ldev() before done = 1; Impact of the bug was that the resulting refcount imbalance could lead to premature destruction of the object, potentially causing a NULL pointer dereference during a subsequent detach. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: new disk-option disable-write-sameLars Ellenberg2017-08-291-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some backend devices claim to support write-same, but would fail actual write-same requests. Allow to set (or toggle) whether or not DRBD tries to support write-same. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: Fix resource role for newly created resources in events2Philipp Reisner2017-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conn_higest_role() (a terribly misnamed function) returns the role of the resource. It returned R_UNKNOWN as long as the resource has not a single device. Resources without devices are short living objects. But it matters for the NOTIFY_CREATE netwlink message. It makes a lot more sense to report R_SECONDARY for the newly created resource than R_UNKNOWN. I reviewd all call sites of conn_highest_role(), that change does not matter for the other call sites. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: mark symbols static where possibleBaoyou Xie2017-08-293-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We get a few warnings when building kernel with W=1: drbd/drbd_receiver.c:1224:6: warning: no previous prototype for 'one_flush_endio' [-Wmissing-prototypes] drbd/drbd_req.c:1450:6: warning: no previous prototype for 'send_and_submit_pending' [-Wmissing-prototypes] drbd/drbd_main.c:924:6: warning: no previous prototype for 'assign_p_sizes_qlim' [-Wmissing-prototypes] .... In fact, these functions are only used in the file in which they are declared and don't need a declaration, but can be made static. So this patch marks these functions with 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: Send P_NEG_ACK upon write error in protocol != CLars Ellenberg2017-08-291-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In protocol != C, we forgot to send the P_NEG_ACK for failing writes. Once we no longer submit to local disk, because we already "detached", due to the typical "on-io-error detach;" config setting, we already send the neg acks right away. Only those requests that have been submitted, and have been error-completed by the local disk, would forget to send the neg-ack, and only in asynchronous replication (protocol != C). Unless this happened during resync, where we already always send acks, regardless of protocol. The primary side needs the P_NEG_ACK in order to mark the affected block(s) for resync in its out-of-sync bitmap. If the blocks in question are not re-written again, we may miss to resync them later, causing data inconsistencies. This patch will always send the neg-acks, and also at least try to persist the out-of-sync status on the local node already. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: add explicit plugging when submitting batchesLars Ellenberg2017-08-291-6/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When submitting batches of requests which had been queued on the submitter thread, typically because they needed to wait for an activity log transactions, use explicit plugging to help potential merging of requests in the backend io-scheduler. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: change list_for_each_safe to while(list_first_entry_or_null)Lars Ellenberg2017-08-291-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Two instances of list_for_each_safe can drop their tmp element, they really just peel off each element in turn from the start of the list. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | drbd: introduce drbd_recv_header_maybe_unplugLars Ellenberg2017-08-296-9/+139
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently, drbd_recv_header() was changed to potentially implicitly "unplug" the backend device(s), in case there is currently nothing to receive. Be more explicit about it: re-introduce the original drbd_recv_header(), and introduce a new drbd_recv_header_maybe_unplug() for use by the receiver "main loop". Using explicit plugging via blk_start_plug(); blk_finish_plug(); really helps the io-scheduler of the backend with merging requests. Wrap the receiver "main loop" with such a plug. Also catch unplug events on the Primary, and try to propagate. This is performance relevant. Without this, if the receiving side does not merge requests, number of IOPS on the peer can me significantly higher than IOPS on the Primary, and can easily become the bottleneck. Together, both changes should help to reduce the number of IOPS as seen on the backend of the receiving side, by increasing the chance of merging mergable requests, without trading latency for more throughput. Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | skd: Let the block layer core choose .nr_requestsBart Van Assche2017-08-291-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since blk_mq_init_queue() initializes .nr_requests to the tag set size and since that value is a good default for the skd driver, do not overwrite the value set by blk_mq_init_queue(). This change doubles the default value of .nr_requests. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | skd: Remove blk_queue_bounce_limit() callBart Van Assche2017-08-291-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since sTec s1120 devices support 64-bit DMA it is not necessary to request data buffer bouncing. Hence remove the blk_queue_bounce_limit() call. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | nbd: make device_attribute constBhumika Goyal2017-08-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make this const as is is only passed as an argument to the function device_create_file and device_remove_file and the corresponding arguments are of type const. Done using Coccinelle Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | null_blk: use available 'dev' in nullb_device_power_store()Jens Axboe2017-08-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We already have this pointer, no need to use to_nullb_device() again. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | | block/nullb: delete unnecessary memory freeShaohua Li2017-08-281-1/+0
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2984c86(nullb: factor disk parameters) has a typo. The nullb_device allocation/free is done outside of null_add_dev. The commit accidentally frees the nullb_device in error code path. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | skd: Remove SKD_ID_INCRBart Van Assche2017-08-251-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The SKD_ID_INCR flag in skd_request_context.id duplicates information that is already available otherwise, e.g. through the block layer request state and through skd_request_context.state. Hence remove the code that manipulates this flag and also the flag itself. Since skd_isr_completion_posted() only uses the lower bits of skd_request_context.id as hardware tag, this patch does not change the behavior of the skd driver. I'm referring to the following code: tag = req_id & SKD_ID_SLOT_AND_TABLE_MASK; Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | skd: Make it easier for static analyzers to analyze skd_free_disk()Bart Van Assche2017-08-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although it is easy to see that skdev->disk != NULL if skdev->queue != NULL, add a test for skdev->disk to avoid that smatch reports the following warning: drivers/block/skd_main.c:3080 skd_free_disk() error: we previously assumed 'disk' could be null (see line 3074) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | skd: Inline skd_end_request()Bart Van Assche2017-08-251-32/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is not worth to keep the debug statements in skd_end_request(). Without debug statements that function only consists of two statements. Hence inline skd_end_request(). Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | skd: Rename skd_softirq_done() into skd_complete_rq()Bart Van Assche2017-08-251-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The latter name follows more closely the function names used in other blk-mq drivers. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | block/nullb: fix NULL dereferenceShaohua Li2017-08-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dan reported this: The patch 2984c8684f96: "nullb: factor disk parameters" from Aug 14, 2017, leads to the following Smatch complaint: drivers/block/null_blk.c:1759 null_init_tag_set() error: we previously assumed 'nullb' could be null (see line 1750) 1755 set->cmd_size = sizeof(struct nullb_cmd); 1756 set->flags = BLK_MQ_F_SHOULD_MERGE; 1757 set->driver_data = NULL; 1758 1759 if (nullb->dev->blocking) ^^^^^^^^^^^^^^^^^^^^ And an unchecked dereference. nullb could be NULL here. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | null_blk: update email adressJens Axboe2017-08-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update to a working one, the fusionio address hasn't been valid in 4 years. Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | block: replace bi_bdev with a gendisk pointer and partitions indexChristoph Hellwig2017-08-2310-20/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This way we don't need a block_device structure to submit I/O. The block_device has different life time rules from the gendisk and request_queue and is usually only available when the block device node is open. Other callers need to explicitly create one (e.g. the lightnvm passthrough code, or the new nvme multipathing code). For the actual I/O path all that we need is the gendisk, which exists once per block device. But given that the block layer also does partition remapping we additionally need a partition index, which is used for said remapping in generic_make_request. Note that all the block drivers generally want request_queue or sometimes the gendisk, so this removes a layer of indirection all over the stack. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | skd: Change default interrupt mode to MSI-XBart Van Assche2017-08-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since MSI support on some motherboards is unreliable, change the default interrupt mode from MSI to MSI-X. This patch avoids that the following message appears sporadially in the kernel logs of my test setup: do_IRQ: 3.193 No irq handler for vector Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | skd: Avoid double completions in case of a timeoutBart Van Assche2017-08-231-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid that normal request completion and the timeout handler can run concurrently by calling blk_mq_complete_request() instead of blk_mq_end_request() from skd_end_request(). Avoid that the block layer can reuse a request while the firmware is still processing it. Convert skd_softirq_done() to blk-mq. Pass the pointer to skd_softirq_done() to the block layer core through blk_mq_ops.complete instead of by calling blk_queue_softirq_done(). Pass the pointer to skd_timed_out() to the block layer core through blk_mq_ops.timeout instead of by calling blk_queue_timed_out(). The timeout handler has been tested as follows: echo 1 > /sys/block/skd0/io-timeout-fail && (cd /sys/kernel/debug/fail_io_timeout && echo 100 > probability && echo N > task-filter && echo 1 > times) Fixes: commit a74d5b76fab9 ("skd: Switch to block layer timeout mechanism") Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
OpenPOWER on IntegriCloud