summaryrefslogtreecommitdiffstats
path: root/drivers/block
Commit message (Collapse)AuthorAgeFilesLines
* zram: restrict add/remove attributes to root onlySergey Senozhatsky2016-12-071-1/+7
| | | | | | | | | | | | | | | | | | | zram hot_add sysfs attribute is a very 'special' attribute - reading from it creates a new uninitialized zram device. This file, by a mistake, can be read by a 'normal' user at the moment, while only root must be able to create a new zram device, therefore hot_add attribute must have S_IRUSR mode, not S_IRUGO. [akpm@linux-foundation.org: s/sence/sense/, reflow comment to use 80 cols] Fixes: 6566d1a32bf72 ("zram: add dynamic device add/remove functionality") Link: http://lkml.kernel.org/r/20161205155845.20129-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Reported-by: Steven Allen <steven@stebalien.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> [4.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* zram: fix unbalanced idr management at hot removalTakashi Iwai2016-11-301-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The zram hot removal code calls idr_remove() even when zram_remove() returns an error (typically -EBUSY). This results in a leftover at the device release, eventually leading to a crash when the module is reloaded. As described in the bug report below, the following procedure would cause an Oops with zram: - provision three zram devices via modprobe zram num_devices=3 - configure a size for each device + echo "1G" > /sys/block/$zram_name/disksize - mkfs and mount zram0 only - attempt to hot remove all three devices + echo 2 > /sys/class/zram-control/hot_remove + echo 1 > /sys/class/zram-control/hot_remove + echo 0 > /sys/class/zram-control/hot_remove - zram0 removal fails with EBUSY, as expected - unmount zram0 - try zram0 hot remove again + echo 0 > /sys/class/zram-control/hot_remove - fails with ENODEV (unexpected) - unload zram kernel module + completes successfully - zram0 device node still exists - attempt to mount /dev/zram0 + mount command is killed + following BUG is encountered BUG: unable to handle kernel paging request at ffffffffa0002ba0 IP: get_disk+0x16/0x50 Oops: 0000 [#1] SMP CPU: 0 PID: 252 Comm: mount Not tainted 4.9.0-rc6 #176 Call Trace: exact_lock+0xc/0x20 kobj_lookup+0xdc/0x160 get_gendisk+0x2f/0x110 __blkdev_get+0x10c/0x3c0 blkdev_get+0x19d/0x2e0 blkdev_open+0x56/0x70 do_dentry_open.isra.19+0x1ff/0x310 vfs_open+0x43/0x60 path_openat+0x2c9/0xf30 do_filp_open+0x79/0xd0 do_sys_open+0x114/0x1e0 SyS_open+0x19/0x20 entry_SYSCALL_64_fastpath+0x13/0x94 This patch adds the proper error check in hot_remove_store() not to call idr_remove() unconditionally. Fixes: 17ec4cd98578 ("zram: don't call idr_remove() from zram_remove()") Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=1010970 Link: http://lkml.kernel.org/r/20161121132140.12683-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Reviewed-by: David Disseldorp <ddiss@suse.de> Reported-by: David Disseldorp <ddiss@suse.de> Tested-by: David Disseldorp <ddiss@suse.de> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: <stable@vger.kernel.org> [4.4+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* aoe: fix crash in page count manipulationJens Axboe2016-11-121-41/+0
| | | | | | | | | | aoeblk contains some mysterious code, that wants to elevate the bio vec page counts while it's under IO. That is not needed, it's fragile, and it's causing kernel oopses for some. Reported-by: Tested-by: Don Koch <kochd@us.ibm.com> Tested-by: Tested-by: Don Koch <kochd@us.ibm.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* drbd: Fix kernel_sendmsg() usage - potential NULL derefRichard Weinberger2016-11-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't pass a size larger than iov_len to kernel_sendmsg(). Otherwise it will cause a NULL pointer deref when kernel_sendmsg() returns with rv < size. DRBD as external module has been around in the kernel 2.4 days already. We used to be compatible to 2.4 and very early 2.6 kernels, we used to use rv = sock_sendmsg(sock, &msg, iov.iov_len); then later changed to rv = kernel_sendmsg(sock, &msg, &iov, 1, size); when we should have used rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len); tcp_sendmsg() used to totally ignore the size parameter. 57be5bd ip: convert tcp_sendmsg() to iov_iter primitives changes that, and exposes our long standing error. Even with this error exposed, to trigger the bug, we would need to have an environment (config or otherwise) causing us to not use sendpage() for larger transfers, a failing connection, and have it fail "just at the right time". Apparently that was unlikely enough for most, so this went unnoticed for years. Still, it is known to trigger at least some of these, and suspected for the others: [0] http://lists.linbit.com/pipermail/drbd-user/2016-July/023112.html [1] http://lists.linbit.com/pipermail/drbd-dev/2016-March/003362.html [2] https://forums.grsecurity.net/viewtopic.php?f=3&t=4546 [3] https://ubuntuforums.org/showthread.php?t=2336150 [4] http://e2.howsolveproblem.com/i/1175162/ This should go into 4.9, and into all stable branches since and including v4.0, which is the first to contain the exposing change. It is correct for all stable branches older than that as well (which contain the DRBD driver; which is 2.6.33 and up). It requires a small "conflict" resolution for v4.4 and earlier, with v4.5 we dropped the comment block immediately preceding the kernel_sendmsg(). Fixes: b411b3637fa7 ("The DRBD driver") Cc: <stable@vger.kernel.org> # 2.6.33.x- Cc: viro@zeniv.linux.org.uk Cc: christoph.lechleitner@iteg.at Cc: wolfgang.glas@iteg.at Reported-by: Christoph Lechleitner <christoph.lechleitner@iteg.at> Tested-by: Christoph Lechleitner <christoph.lechleitner@iteg.at> Signed-off-by: Richard Weinberger <richard@nod.at> [changed oneliner to be "obvious" without context; more verbose message] Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* nbd: Fix error handlingChristophe JAILLET2016-11-061-1/+1
| | | | | | | | | | 'blk_mq_alloc_request()' returns an error pointer in case of error, not NULL. So test it with IS_ERR. Fixes: fd8383fd88a2 ("nbd: convert to blkmq") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Jens Axboe <axboe@fb.com>
* virtio_blk: Delete an unnecessary initialisation in init_vq()Markus Elfring2016-10-311-1/+1
| | | | | | | | | The local variable "err" will be set to an appropriate value by a following statement. Thus omit the explicit initialisation at the beginning. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* virtio_blk: Use kmalloc_array() in init_vq()Markus Elfring2016-10-311-4/+4
| | | | | | | | | | | Multiplications for the size determination of memory allocations indicated that array data structures should be processed. Thus use the corresponding function "kmalloc_array". This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
* block: DAC960: print a hex number after a 0x prefixUwe Kleine-König2016-10-271-2/+2
| | | | | | | | | | It makes the message hard to interpret correctly if a base 10 number is prefixed by 0x. So change to a hex number. Link: http://lkml.kernel.org/r/20161026125658.25728-3-u.kleine-koenig@pengutronix.de Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* nbd: fix incorrect unlock of nbd->sock_lock in sock_shutdownJohn W. Linville2016-10-241-1/+1
| | | | | | | | | | | | | | | | Commit 0eadf37afc250 ("nbd: allow block mq to deal with timeouts") changed normal usage of nbd->sock_lock to use spin_lock/spin_unlock rather than the *_irq variants, but it missed this unlock in an error path. Found by Coverity, CID 1373871. Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Jens Axboe <axboe@fb.com> Cc: Markus Pargmann <mpa@pengutronix.de> Fixes: 0eadf37afc250 ("nbd: allow block mq to deal with timeouts") Signed-off-by: Jens Axboe <axboe@fb.com>
* rbd: don't retry watch reregistration if header object is goneIlya Dryomov2016-10-151-1/+1
| | | | | | | | | If the header object gets deleted (perhaps along with the entire pool), there is no point in attempting to reregister the watch. Treat this the same as blacklisting: fail all pending and new I/Os requiring the lock. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* rbd: don't wait for the lock forever if blacklistedIlya Dryomov2016-10-151-17/+33
| | | | | | | | | | | -EBLACKLISTED from __rbd_register_watch() means that our ceph_client got blacklisted - we won't be able to restore the watch and reacquire the lock. Wake up and fail all outstanding requests waiting for the lock and arrange for all new requests that require the lock to fail immediately. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Mike Christie <mchristi@redhat.com>
* kthread: kthread worker API cleanupPetr Mladek2016-10-111-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A good practice is to prefix the names of functions by the name of the subsystem. The kthread worker API is a mix of classic kthreads and workqueues. Each worker has a dedicated kthread. It runs a generic function that process queued works. It is implemented as part of the kthread subsystem. This patch renames the existing kthread worker API to use the corresponding name from the workqueues API prefixed by kthread_: __init_kthread_worker() -> __kthread_init_worker() init_kthread_worker() -> kthread_init_worker() init_kthread_work() -> kthread_init_work() insert_kthread_work() -> kthread_insert_work() queue_kthread_work() -> kthread_queue_work() flush_kthread_work() -> kthread_flush_work() flush_kthread_worker() -> kthread_flush_worker() Note that the names of DEFINE_KTHREAD_WORK*() macros stay as they are. It is common that the "DEFINE_" prefix has precedence over the subsystem names. Note that INIT() macros and init() functions use different naming scheme. There is no good solution. There are several reasons for this solution: + "init" in the function names stands for the verb "initialize" aka "initialize worker". While "INIT" in the macro names stands for the noun "INITIALIZER" aka "worker initializer". + INIT() macros are used only in DEFINE() macros + init() functions are used close to the other kthread() functions. It looks much better if all the functions use the same scheme. + There will be also kthread_destroy_worker() that will be used close to kthread_cancel_work(). It is related to the init() function. Again it looks better if all functions use the same naming scheme. + there are several precedents for such init() function names, e.g. amd_iommu_init_device(), free_area_init_node(), jump_label_init_type(), regmap_init_mmio_clk(), + It is not an argument but it was inconsistent even before. [arnd@arndb.de: fix linux-next merge conflict] Link: http://lkml.kernel.org/r/20160908135724.1311726-1-arnd@arndb.de Link: http://lkml.kernel.org/r/1470754545-17632-3-git-send-email-pmladek@suse.com Suggested-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Petr Mladek <pmladek@suse.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jiri Kosina <jkosina@suse.cz> Cc: Borislav Petkov <bp@suse.de> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2016-10-102-264/+1179
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull Ceph updates from Ilya Dryomov: "The big ticket item here is support for rbd exclusive-lock feature, with maintenance operations offloaded to userspace (Douglas Fuller, Mike Christie and myself). Another block device bullet is a series fixing up layering error paths (myself). On the filesystem side, we've got patches that improve our handling of buffered vs dio write races (Neil Brown) and a few assorted fixes from Zheng. Also included a couple of random cleanups and a minor CRUSH update" * tag 'ceph-for-4.9-rc1' of git://github.com/ceph/ceph-client: (39 commits) crush: remove redundant local variable crush: don't normalize input of crush_ln iteratively libceph: ceph_build_auth() doesn't need ceph_auth_build_hello() libceph: use CEPH_AUTH_UNKNOWN in ceph_auth_build_hello() ceph: fix description for rsize and rasize mount options rbd: use kmalloc_array() in rbd_header_from_disk() ceph: use list_move instead of list_del/list_add ceph: handle CEPH_SESSION_REJECT message ceph: avoid accessing / when mounting a subpath ceph: fix mandatory flock check ceph: remove warning when ceph_releasepage() is called on dirty page ceph: ignore error from invalidate_inode_pages2_range() in direct write ceph: fix error handling of start_read() rbd: add rbd_obj_request_error() helper rbd: img_data requests don't own their page array rbd: don't call rbd_osd_req_format_read() for !img_data requests rbd: rework rbd_img_obj_exists_submit() error paths rbd: don't crash or leak on errors in rbd_img_obj_parent_read_full_callback() rbd: move bumping img_request refcount into rbd_obj_request_submit() rbd: mark the original request as done if stat request fails ...
| * rbd: use kmalloc_array() in rbd_header_from_disk()Markus Elfring2016-10-031-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | * A multiplication for the size determination of a memory allocation indicated that an array data structure should be processed. Thus use the corresponding function "kmalloc_array". This issue was detected by using the Coccinelle software. * Delete the local variable "size" which became unnecessary with this refactoring. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * rbd: add rbd_obj_request_error() helperIlya Dryomov2016-10-031-10/+18
| | | | | | | | | | | | | | | | Pull setting an error and marking a request done code into a new helper. obj_request_img_data_test() check isn't strictly needed right now, but makes it applicable to !img_data requests and a bit safer. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * rbd: img_data requests don't own their page arrayIlya Dryomov2016-10-031-8/+3
| | | | | | | | | | | | | | | | | | | | Move the check into rbd_obj_request_destroy() to avoid use-after-free on errors in rbd_img_request_fill(..., OBJ_REQUEST_PAGES, ...), where pages, owned by the caller, gets freed in rbd_img_request_fill(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: David Disseldorp <ddiss@suse.de>
| * rbd: don't call rbd_osd_req_format_read() for !img_data requestsIlya Dryomov2016-10-031-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Accessing obj_request->img_request union field is only valid for object requests associated with an image (i.e. if obj_request_img_data_test() returns true). rbd_osd_req_format_read() used to do more, but now it just sets osd_req->snap_id. Standalone and stat object requests always go to the HEAD revision and are fine with CEPH_NOSNAP set by libceph, so get around the invalid union field use by simply not calling rbd_osd_req_format_read() in those places. Reported-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: David Disseldorp <ddiss@suse.de>
| * rbd: rework rbd_img_obj_exists_submit() error pathsIlya Dryomov2016-10-031-20/+22
| | | | | | | | | | | | | | | | | | | | | | | | - don't put obj_request before rbd_obj_request_get() if rbd_obj_request_create() fails - don't leak pages if rbd_obj_request_create() fails - don't leak stat_request if rbd_osd_req_create() fails Reported-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: David Disseldorp <ddiss@suse.de>
| * rbd: don't crash or leak on errors in rbd_img_obj_parent_read_full_callback()Ilya Dryomov2016-10-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | - fix parent_length == img_request->xferred assert to not fire on copyup read failures - don't leak pages if copyup read fails or we can't allocate a new osd request Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: David Disseldorp <ddiss@suse.de>
| * rbd: move bumping img_request refcount into rbd_obj_request_submit()Ilya Dryomov2016-10-031-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 0f2d5be792b0 ("rbd: use reference counts for image requests") added rbd_img_request_get(), which rbd_img_request_fill() calls for each obj_request added to img_request. It was an urgent band-aid for the uglyness that is rbd_img_obj_callback() and none of the error paths were updated. Given that this img_request reference is meant to represent an obj_request that hasn't passed through rbd_img_obj_callback() yet, proper cleanup in appropriate destructors is a challenge. However, noting that if we don't get a chance to call rbd_obj_request_complete(), there is not going to be a call to rbd_img_obj_callback(), we can move rbd_img_request_get() into rbd_obj_request_submit() and fixup the two places that call rbd_obj_request_complete() directly and not through rbd_obj_request_submit() to temporarily bump img_request, so that rbd_img_obj_callback() can put as usual. This takes care of img_request leaks on errors on the submit side. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
| * rbd: mark the original request as done if stat request failsIlya Dryomov2016-10-031-13/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | If stat request fails with something other than -ENOENT (which just means that we need to copyup), the original object request is never marked as done and therefore never completed. Fix this by moving the mark done + complete snippet from rbd_img_obj_parent_read_full() into rbd_img_obj_exists_callback(). The former remains covered, as the latter is its only caller (through rbd_img_obj_request_submit()). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: David Disseldorp <ddiss@suse.de>
| * rbd: clean up asserts in rbd_img_obj_request_submit() helpersIlya Dryomov2016-10-031-20/+10
| | | | | | | | | | | | | | | | Assert once in rbd_img_obj_request_submit(). Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: David Disseldorp <ddiss@suse.de>
| * rbd: change rbd_obj_request_submit() signatureIlya Dryomov2016-10-031-47/+23
| | | | | | | | | | | | | | | | | | | | - osdc parameter is useless - starting with commit 5aea3dcd5021 ("libceph: a major OSD client update"), ceph_osdc_start_request() always returns success Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: David Disseldorp <ddiss@suse.de>
| * rbd: lock_on_read map optionIlya Dryomov2016-10-031-1/+12
| | | | | | | | | | | | | | | | | | Add a per-device option to acquire exclusive lock on reads (in addition to writes and discards). The use case is iSCSI, where it will be used to prevent execution of stale writes after the implicit failover. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Tested-by: Mike Christie <mchristi@redhat.com>
| * rbd: add force close optionMike Christie2016-08-241-9/+26
| | | | | | | | | | | | | | | | | | | | | | This adds a force close option, so we can force the unmapping of a rbd device that is open. If a path/device is blacklisted, apps like multipathd can map a new device and then unmap the old one. The unmapping cleanup would then be handled by the generic hotunplug code paths in multipahd like is done for iSCSI, FC/FCOE, SAS, etc. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * rbd: add 'config_info' sysfs rbd device attributeMike Christie2016-08-241-2/+21
| | | | | | | | | | | | | | | | | | Export the info used to setup the rbd image, so it can be used to remap the image. Signed-off-by: Mike Christie <mchristi@redhat.com> [idryomov@gmail.com: do_rbd_add() EH] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * rbd: add 'snap_id' sysfs rbd device attributeMike Christie2016-08-241-0/+10
| | | | | | | | | | | | | | Export snap id in sysfs, so tools like multipathd can use it in a uuid. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * rbd: add 'cluster_fsid' sysfs rbd device attributeMike Christie2016-08-241-0/+10
| | | | | | | | | | | | | | | | Export the cluster fsid, so tools like udev and multipath-tools can use it for part of the uuid. Signed-off-by: Mike Christie <mchristi@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * rbd: add 'client_addr' sysfs rbd device attributeIlya Dryomov2016-08-241-0/+13
| | | | | | | | | | | | | | | | | | Export client addr/nonce, so userspace can check if a image is being blacklisted. Signed-off-by: Mike Christie <mchristi@redhat.com> [idryomov@gmail.com: ceph_client_addr(), endianess fix] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * rbd: print capacity in decimal and features in hexIlya Dryomov2016-08-241-2/+3
| | | | | | | | | | | | | | | | With exclusive-lock added and more to come, print features into dmesg. Change capacity to decimal while at it. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com>
| * rbd: support for exclusive-lock featureIlya Dryomov2016-08-242-16/+807
| | | | | | | | | | | | | | | | | | | | | | Add basic support for RBD_FEATURE_EXCLUSIVE_LOCK feature. Maintenance operations (resize, snapshot create, etc) are offloaded to librbd via returning -EOPNOTSUPP - librbd should request the lock and execute the operation. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Tested-by: Mike Christie <mchristi@redhat.com>
| * rbd: retry watch re-registration periodicallyIlya Dryomov2016-08-241-29/+109
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Revamp watch code to support retrying watch re-registration: - add rbd_dev->watch_state for more robust errcb handling - store watch cookie separately to avoid dereferencing watch_handle which is set to NULL on unwatch - move re-register code into a delayed work and retry re-registration every second, unless the client is blacklisted Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Tested-by: Mike Christie <mchristi@redhat.com>
| * rbd: introduce a per-device ordered workqueueIlya Dryomov2016-08-241-80/+71
| | | | | | | | | | | | | | | | | | | | This is going to be used for re-registering watch requests and exclusive-lock tasks: acquire/request lock, notify-acquired, release lock, notify-released. Some refactoring in the map/unmap paths was necessary to give this workqueue a meaningful name: "rbdX-tasks". Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com>
| * libceph: rename ceph_client_id() -> ceph_client_gid()Ilya Dryomov2016-08-241-1/+1
| | | | | | | | | | | | | | | | It's gid / global_id in other places. Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Mike Christie <mchristi@redhat.com> Reviewed-by: Alex Elder <elder@linaro.org>
* | Merge branch 'for-4.9/block-irq' of git://git.kernel.dk/linux-blockLinus Torvalds2016-10-097-7/+0
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull blk-mq irq/cpu mapping updates from Jens Axboe: "This is the block-irq topic branch for 4.9-rc. It's mostly from Christoph, and it allows drivers to specify their own mappings, and more importantly, to share the blk-mq mappings with the IRQ affinity mappings. It's a good step towards making this work better out of the box" * 'for-4.9/block-irq' of git://git.kernel.dk/linux-block: blk_mq: linux/blk-mq.h does not include all the headers it depends on blk-mq: kill unused blk_mq_create_mq_map() blk-mq: get rid of the cpumask in struct blk_mq_tags nvme: remove the post_scan callout nvme: switch to use pci_alloc_irq_vectors blk-mq: provide a default queue mapping for PCI device blk-mq: allow the driver to pass in a queue mapping blk-mq: remove ->map_queue blk-mq: only allocate a single mq_map per tag_set blk-mq: don't redistribute hardware queues on a CPU hotplug event
| * | blk-mq: remove ->map_queueChristoph Hellwig2016-09-156-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All drivers use the default, so provide an inline version of it. If we ever need other queue mapping we can add an optional method back, although supporting will also require major changes to the queue setup code. This provides better code generation, and better debugability as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | nbd: use BLK_MQ_F_BLOCKINGJosef Bacik2016-09-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | We take a mutex when sending commands and send stuff over the network, we need to have queue_rq called asynchronously. Signed-off-by: Josef Bacik <jbacik@fb.com> Fixes: fd8383fd88a2 ("nbd: convert to blkmq") Signed-off-by: Jens Axboe <axboe@fb.com>
* | | lightnvm: control life of nvm_dev in driverMatias Bjørling2016-09-211-2/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LightNVM compatible device drivers does not have a method to expose LightNVM specific sysfs entries. To enable LightNVM sysfs entries to be exposed, lightnvm device drivers require a struct device to attach it to. To allow both the actual device driver and lightnvm sysfs entries to coexist, the device driver tracks the lifetime of the nvm_dev structure. This patch refactors NVMe and null_blk to handle the lifetime of struct nvm_dev, which eliminates the need for struct gendisk when a lightnvm compatible device is provided. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
* | | null_blk: refactor to support non-gendisk devicesMatias Bjørling2016-09-211-49/+61
|/ / | | | | | | | | | | | | | | | | | | | | | | With LightNVM enabled devices, the gendisk structure is not exposed to the user. This hides the device driver specific sysfs entries, and prevents binding of LightNVM geometry information to the device. Refactor the device registration process, so that gendisk and non-gendisk devices are easily managed. Signed-off-by: Matias Bjørling <m@bjorling.me> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nbd: allow block mq to deal with timeoutsJosef Bacik2016-09-081-37/+14
| | | | | | | | | | | | | | | | Instead of rolling our own timer, just utilize the blk mq req timeout and do the disconnect if any of our commands timeout. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nbd: use flags instead of boolJosef Bacik2016-09-081-8/+10
| | | | | | | | | | | | | | | | In preparation for some future changes, change a few of the state bools over to normal bits to set/clear properly. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nbd: don't shutdown sock with irq's disabledJosef Bacik2016-09-081-4/+15
| | | | | | | | | | | | | | | | | | | | We hit a warning when shutting down the nbd connection because we have irq's disabled. We don't really need to do the shutdown under the lock, just clear the nbd->sock. So do the shutdown outside of the irq. This gets rid of the warning. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | nbd: convert to blkmqJosef Bacik2016-09-081-208/+129
| | | | | | | | | | | | | | | | | | | | This moves NBD over to using blkmq, which allows us to get rid of the NBD wide queue lock and the async submit kthread. We will start with 1 hw queue for now, but I plan to add multiple tcp connection support in the future and we'll fix how we set the hwqueue's. Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | mtip32xx: mark symbols static where possibleBaoyou Xie2016-08-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | We get 1 warning when biuld kernel with W=1: drivers/block/mtip32xx/mtip32xx.c:3689:6: warning: no previous prototype for 'mtip_block_release' [-Wmissing-prototypes] In fact, this function is only used in the file in which it is declared and don't need a declaration, but can be made static. so this patch marks it 'static'. Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org> Signed-off-by: Jens Axboe <axboe@fb.com>
* | Revert "floppy: refactor open() flags handling"Jens Axboe2016-08-251-19/+15
| | | | | | | | This reverts commit 09954bad448791ef01202351d437abdd9497a804.
* | Revert "floppy: fix open(O_ACCMODE) for ioctl-only open"Jens Axboe2016-08-251-9/+12
| | | | | | | | This reverts commit ff06db1efb2ad6db06eb5b99b88a0c15a9cc9b0e.
* | xen-blkfront: free resources if xlvbd_alloc_gendisk failsBob Liu2016-08-191-1/+6
| | | | | | | | | | | | | | | | Current code forgets to free resources in the failure path of xlvbd_alloc_gendisk(), this patch fix it. Signed-off-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | xen-blkfront: introduce blkif_set_queue_limits()Bob Liu2016-08-191-38/+48
| | | | | | | | | | | | | | | | | | | | blk_mq_update_nr_hw_queues() reset all queue limits to default which it's not as xen-blkfront expected, introducing blkif_set_queue_limits() to reset limits with initial correct values. Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | xen-blkfront: fix places not updated after introducing 64KB page granularityBob Liu2016-08-191-2/+2
|/ | | | | | | | | Two places didn't get updated when 64KB page granularity was introduced, this patch fix them. Signed-off-by: Bob Liu <bob.liu@oracle.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds2016-08-111-18/+8
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull virtio/vhost fixes and cleanups from Michael Tsirkin: "Misc fixes and cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio/s390: deprecate old transport virtio/s390: keep early_put_chars virtio_blk: Fix a slient kernel panic virtio-vsock: fix include guard typo vhost/vsock: fix vhost virtio_vsock_pkt use-after-free 9p/trans_virtio: use kvfree() for iov_iter_get_pages_alloc() virtio: fix error handling for debug builds virtio: fix memory leak in virtqueue_add()
OpenPOWER on IntegriCloud