summaryrefslogtreecommitdiffstats
path: root/drivers/block
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-clientLinus Torvalds2018-01-111-6/+12
|\ | | | | | | | | | | | | | | | | | | Pull ceph fixes from Ilya Dryomov: "Two rbd fixes for 4.12 and 4.2 issues respectively, marked for stable" * tag 'ceph-for-4.15-rc8' of git://github.com/ceph/ceph-client: rbd: set max_segments to USHRT_MAX rbd: reacquire lock should update lock owner client id
| * rbd: set max_segments to USHRT_MAXIlya Dryomov2018-01-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit d3834fefcfe5 ("rbd: bump queue_max_segments") bumped max_segments (unsigned short) to max_hw_sectors (unsigned int). max_hw_sectors is set to the number of 512-byte sectors in an object and overflows unsigned short for 32M (largest possible) objects, making the block layer resort to handing us single segment (i.e. single page or even smaller) bios in that case. Cc: stable@vger.kernel.org Fixes: d3834fefcfe5 ("rbd: bump queue_max_segments") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Alex Elder <elder@linaro.org>
| * rbd: reacquire lock should update lock owner client idFlorian Margaine2018-01-091-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | Otherwise, future operations on this RBD using exclusive-lock are going to require the lock from a non-existent client id. Cc: stable@vger.kernel.org Fixes: 14bb211d324d ("rbd: support updating the lock cookie without releasing the lock") Link: http://tracker.ceph.com/issues/19929 Signed-off-by: Florian Margaine <florian@platform.sh> [idryomov@gmail.com: rbd_set_owner_cid() call, __rbd_lock() helper] Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | loop: fix concurrent lo_open/lo_releaseLinus Torvalds2018-01-061-2/+8
|/ | | | | | | | | | | | | | 范龙飞 reports that KASAN can report a use-after-free in __lock_acquire. The reason is due to insufficient serialization in lo_release(), which will continue to use the loop device even after it has decremented the lo_refcnt to zero. In the meantime, another process can come in, open the loop device again as it is being shut down. Confusion ensues. Reported-by: 范龙飞 <long7573@126.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* null_blk: unalign call_single_dataJens Axboe2017-12-201-2/+2
| | | | | | | | | | | | | | | Commit 966a967116e6 randomly added alignment to this structure, but it's actually detrimental to performance of null_blk. Test case: Running on both the home and remote node shows a ~5% degradation in performance. While in there, move blk_status_t to the hole after the integer tag in the nullb_cmd structure. After this patch, we shrink the size from 192 to 152 bytes. Fixes: 966a967116e69 ("smp: Avoid using two cache lines for struct call_single_data") Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2017-12-011-1/+4
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull block fixes from Jens Axboe: "A selection of fixes/changes that should make it into this series. This contains: - NVMe, two merges, containing: - pci-e, rdma, and fc fixes - Device quirks - Fix for a badblocks leak in null_blk - bcache fix from Rui Hua for a race condition regression where -EINTR was returned to upper layers that didn't expect it. - Regression fix for blktrace for a bug introduced in this series. - blktrace cleanup for cgroup id. - bdi registration error handling. - Small series with cleanups for blk-wbt. - Various little fixes for typos and the like. Nothing earth shattering, most important are the NVMe and bcache fixes" * 'for-linus' of git://git.kernel.dk/linux-block: (34 commits) nvme-pci: fix NULL pointer dereference in nvme_free_host_mem() nvme-rdma: fix memory leak during queue allocation blktrace: fix trace mutex deadlock nvme-rdma: Use mr pool nvme-rdma: Check remotely invalidated rkey matches our expected rkey nvme-rdma: wait for local invalidation before completing a request nvme-rdma: don't complete requests before a send work request has completed nvme-rdma: don't suppress send completions bcache: check return value of register_shrinker bcache: recover data from backing when data is clean bcache: Fix building error on MIPS bcache: add a comment in journal bucket reading nvme-fc: don't use bit masks for set/test_bit() numbers blk-wbt: fix comments typo blk-wbt: move wbt_clear_stat to common place in wbt_done blk-sysfs: remove NULL pointer checking in queue_wb_lat_store blk-wbt: remove duplicated setting in wbt_init nvme-pci: add quirk for delay before CHK RDY for WDC SN200 block: remove useless assignment in bio_split null_blk: fix dev->badblocks leak ...
| * null_blk: fix dev->badblocks leakDavid Disseldorp2017-11-221-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | null_alloc_dev() allocates memory for dev->badblocks, but cleanup currently only occurs in the configfs release codepath, missing a number of other places. This bug was found running the blktests block/010 test, alongside kmemleak: rapido1:/blktests# ./check block/010 ... rapido1:/blktests# echo scan > /sys/kernel/debug/kmemleak [ 306.966708] kmemleak: 32 new suspected memory leaks (see /sys/kernel/debug/kmemleak) rapido1:/blktests# cat /sys/kernel/debug/kmemleak unreferenced object 0xffff88001f86d000 (size 4096): comm "modprobe", pid 231, jiffies 4294892415 (age 318.252s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff814b0379>] kmemleak_alloc+0x49/0xa0 [<ffffffff810f180f>] kmem_cache_alloc+0x9f/0xe0 [<ffffffff8124e45f>] badblocks_init+0x2f/0x60 [<ffffffffa0019fae>] 0xffffffffa0019fae [<ffffffffa0021273>] nullb_device_badblocks_store+0x63/0x130 [null_blk] [<ffffffff810004cd>] do_one_initcall+0x3d/0x170 [<ffffffff8109fe0d>] do_init_module+0x56/0x1e9 [<ffffffff8109ebd7>] load_module+0x1c47/0x26a0 [<ffffffff8109f819>] SyS_finit_module+0xa9/0xd0 [<ffffffff814b4f60>] entry_SYSCALL_64_fastpath+0x13/0x94 Fixes: 2f54a613c942 ("nullb: badbblocks support") Reviewed-by: Shaohua Li <shli@fb.com> Signed-off-by: David Disseldorp <ddiss@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE castsKees Cook2017-11-212-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With all callbacks converted, and the timer callback prototype switched over, the TIMER_FUNC_TYPE cast is no longer needed, so remove it. Conversion was done with the following scripts: perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \ $(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u) perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \ $(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u) The now unused macros are also dropped from include/linux/timer.h. Signed-off-by: Kees Cook <keescook@chromium.org>
* | treewide: setup_timer() -> timer_setup() (2 field)Kees Cook2017-11-211-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This converts all remaining setup_timer() calls that use a nested field to reach a struct timer_list. Coccinelle does not have an easy way to match multiple fields, so a new script is needed to change the matches of "&_E->_timer" into "&_E->_field1._timer" in all the rules. spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/timer_setup-2fields.cocci @fix_address_of depends@ expression e; @@ setup_timer( -&(e) +&e , ...) // Update any raw setup_timer() usages that have a NULL callback, but // would otherwise match change_timer_function_usage, since the latter // will update all function assignments done in the face of a NULL // function initialization in setup_timer(). @change_timer_function_usage_NULL@ expression _E; identifier _field1; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_field1._timer, NULL, _E); +timer_setup(&_E->_field1._timer, NULL, 0); | -setup_timer(&_E->_field1._timer, NULL, (_cast_data)_E); +timer_setup(&_E->_field1._timer, NULL, 0); | -setup_timer(&_E._field1._timer, NULL, &_E); +timer_setup(&_E._field1._timer, NULL, 0); | -setup_timer(&_E._field1._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._field1._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _field1; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_field1._timer, _callback, _E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, &_callback, _E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, (_cast_func)_callback, _E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, _callback, (_cast_data)_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._field1._timer, _callback, 0); | _E->_field1._timer@_stl.function = _callback; | _E->_field1._timer@_stl.function = &_callback; | _E->_field1._timer@_stl.function = (_cast_func)_callback; | _E->_field1._timer@_stl.function = (_cast_func)&_callback; | _E._field1._timer@_stl.function = _callback; | _E._field1._timer@_stl.function = &_callback; | _E._field1._timer@_stl.function = (_cast_func)_callback; | _E._field1._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { ( ... when != _origarg _handletype *_handle = -(_handletype *)_origarg; +from_timer(_handle, t, _field1._timer); ... when != _origarg | ... when != _origarg _handletype *_handle = -(void *)_origarg; +from_timer(_handle, t, _field1._timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(_handletype *)_origarg; +from_timer(_handle, t, _field1._timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(void *)_origarg; +from_timer(_handle, t, _field1._timer); ... when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { + _handletype *_origarg = from_timer(_origarg, t, _field1._timer); + ... when != _origarg - (_handletype *)_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list *t) { ... } // callback(struct something *handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype *_handle +struct timer_list *t ) { + _handletype *_handle = from_timer(_handle, t, _field1._timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list *t) { - _handletype *_handle = from_timer(_handle, t, _field1._timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_field1._timer, _callback, 0); +setup_timer(&_E->_field1._timer, _callback, (_cast_data)_E); | -timer_setup(&_E._field1._timer, _callback, 0); +setup_timer(&_E._field1._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_field1._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_field1._timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_field1._timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; | _E->_field1._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; | _E._field1._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E._field1._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; | _E._field1._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; | _E._field1._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. @change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._field1; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_field1._timer | -(_cast_data)&_E +&_E._field1._timer | -_E +&_E->_field1._timer ) ) // If a timer has been configured without a data argument, it can be // converted without regard to the callback argument, since it is unused. @match_timer_function_unused_data@ expression _E; identifier _field1; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_field1._timer, _callback, 0); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, _callback, 0L); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E->_field1._timer, _callback, 0UL); +timer_setup(&_E->_field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, _callback, 0); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, _callback, 0L); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_E._field1._timer, _callback, 0UL); +timer_setup(&_E._field1._timer, _callback, 0); | -setup_timer(&_field1._timer, _callback, 0); +timer_setup(&_field1._timer, _callback, 0); | -setup_timer(&_field1._timer, _callback, 0L); +timer_setup(&_field1._timer, _callback, 0); | -setup_timer(&_field1._timer, _callback, 0UL); +timer_setup(&_field1._timer, _callback, 0); | -setup_timer(_field1._timer, _callback, 0); +timer_setup(_field1._timer, _callback, 0); | -setup_timer(_field1._timer, _callback, 0L); +timer_setup(_field1._timer, _callback, 0); | -setup_timer(_field1._timer, _callback, 0UL); +timer_setup(_field1._timer, _callback, 0); ) @change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list *unused ) { ... when != _origarg } Signed-off-by: Kees Cook <keescook@chromium.org>
* | treewide: setup_timer() -> timer_setup()Kees Cook2017-11-217-22/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This converts all remaining cases of the old setup_timer() API into using timer_setup(), where the callback argument is the structure already holding the struct timer_list. These should have no behavioral changes, since they just change which pointer is passed into the callback with the same available pointers after conversion. It handles the following examples, in addition to some other variations. Casting from unsigned long: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... setup_timer(&ptr->my_timer, my_callback, ptr); and forced object casts: void my_callback(struct something *ptr) { ... } ... setup_timer(&ptr->my_timer, my_callback, (unsigned long)ptr); become: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... timer_setup(&ptr->my_timer, my_callback, 0); Direct function assignments: void my_callback(unsigned long data) { struct something *ptr = (struct something *)data; ... } ... ptr->my_timer.function = my_callback; have a temporary cast added, along with converting the args: void my_callback(struct timer_list *t) { struct something *ptr = from_timer(ptr, t, my_timer); ... } ... ptr->my_timer.function = (TIMER_FUNC_TYPE)my_callback; And finally, callbacks without a data assignment: void my_callback(unsigned long data) { ... } ... setup_timer(&ptr->my_timer, my_callback, 0); have their argument renamed to verify they're unused during conversion: void my_callback(struct timer_list *unused) { ... } ... timer_setup(&ptr->my_timer, my_callback, 0); The conversion is done with the following Coccinelle script: spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/timer_setup.cocci @fix_address_of@ expression e; @@ setup_timer( -&(e) +&e , ...) // Update any raw setup_timer() usages that have a NULL callback, but // would otherwise match change_timer_function_usage, since the latter // will update all function assignments done in the face of a NULL // function initialization in setup_timer(). @change_timer_function_usage_NULL@ expression _E; identifier _timer; type _cast_data; @@ ( -setup_timer(&_E->_timer, NULL, _E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E->_timer, NULL, (_cast_data)_E); +timer_setup(&_E->_timer, NULL, 0); | -setup_timer(&_E._timer, NULL, &_E); +timer_setup(&_E._timer, NULL, 0); | -setup_timer(&_E._timer, NULL, (_cast_data)&_E); +timer_setup(&_E._timer, NULL, 0); ) @change_timer_function_usage@ expression _E; identifier _timer; struct timer_list _stl; identifier _callback; type _cast_func, _cast_data; @@ ( -setup_timer(&_E->_timer, _callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, &_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, _E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, &_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)_E); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, (_cast_func)&_callback, (_cast_data)&_E); +timer_setup(&_E._timer, _callback, 0); | _E->_timer@_stl.function = _callback; | _E->_timer@_stl.function = &_callback; | _E->_timer@_stl.function = (_cast_func)_callback; | _E->_timer@_stl.function = (_cast_func)&_callback; | _E._timer@_stl.function = _callback; | _E._timer@_stl.function = &_callback; | _E._timer@_stl.function = (_cast_func)_callback; | _E._timer@_stl.function = (_cast_func)&_callback; ) // callback(unsigned long arg) @change_callback_handle_cast depends on change_timer_function_usage@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; identifier _handle; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { ( ... when != _origarg _handletype *_handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(_handletype *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg | ... when != _origarg _handletype *_handle; ... when != _handle _handle = -(void *)_origarg; +from_timer(_handle, t, _timer); ... when != _origarg ) } // callback(unsigned long arg) without existing variable @change_callback_handle_cast_no_arg depends on change_timer_function_usage && !change_callback_handle_cast@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _origtype; identifier _origarg; type _handletype; @@ void _callback( -_origtype _origarg +struct timer_list *t ) { + _handletype *_origarg = from_timer(_origarg, t, _timer); + ... when != _origarg - (_handletype *)_origarg + _origarg ... when != _origarg } // Avoid already converted callbacks. @match_callback_converted depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier t; @@ void _callback(struct timer_list *t) { ... } // callback(struct something *handle) @change_callback_handle_arg depends on change_timer_function_usage && !match_callback_converted && !change_callback_handle_cast && !change_callback_handle_cast_no_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; @@ void _callback( -_handletype *_handle +struct timer_list *t ) { + _handletype *_handle = from_timer(_handle, t, _timer); ... } // If change_callback_handle_arg ran on an empty function, remove // the added handler. @unchange_callback_handle_arg depends on change_timer_function_usage && change_callback_handle_arg@ identifier change_timer_function_usage._callback; identifier change_timer_function_usage._timer; type _handletype; identifier _handle; identifier t; @@ void _callback(struct timer_list *t) { - _handletype *_handle = from_timer(_handle, t, _timer); } // We only want to refactor the setup_timer() data argument if we've found // the matching callback. This undoes changes in change_timer_function_usage. @unchange_timer_function_usage depends on change_timer_function_usage && !change_callback_handle_cast && !change_callback_handle_cast_no_arg && !change_callback_handle_arg@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type change_timer_function_usage._cast_data; @@ ( -timer_setup(&_E->_timer, _callback, 0); +setup_timer(&_E->_timer, _callback, (_cast_data)_E); | -timer_setup(&_E._timer, _callback, 0); +setup_timer(&_E._timer, _callback, (_cast_data)&_E); ) // If we fixed a callback from a .function assignment, fix the // assignment cast now. @change_timer_function_assignment depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression change_timer_function_usage._E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_func; typedef TIMER_FUNC_TYPE; @@ ( _E->_timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -&_callback +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)_callback; +(TIMER_FUNC_TYPE)_callback ; | _E->_timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -&_callback; +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)_callback +(TIMER_FUNC_TYPE)_callback ; | _E._timer.function = -(_cast_func)&_callback +(TIMER_FUNC_TYPE)_callback ; ) // Sometimes timer functions are called directly. Replace matched args. @change_timer_function_calls depends on change_timer_function_usage && (change_callback_handle_cast || change_callback_handle_cast_no_arg || change_callback_handle_arg)@ expression _E; identifier change_timer_function_usage._timer; identifier change_timer_function_usage._callback; type _cast_data; @@ _callback( ( -(_cast_data)_E +&_E->_timer | -(_cast_data)&_E +&_E._timer | -_E +&_E->_timer ) ) // If a timer has been configured without a data argument, it can be // converted without regard to the callback argument, since it is unused. @match_timer_function_unused_data@ expression _E; identifier _timer; identifier _callback; @@ ( -setup_timer(&_E->_timer, _callback, 0); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0L); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E->_timer, _callback, 0UL); +timer_setup(&_E->_timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0L); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_E._timer, _callback, 0UL); +timer_setup(&_E._timer, _callback, 0); | -setup_timer(&_timer, _callback, 0); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0L); +timer_setup(&_timer, _callback, 0); | -setup_timer(&_timer, _callback, 0UL); +timer_setup(&_timer, _callback, 0); | -setup_timer(_timer, _callback, 0); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0L); +timer_setup(_timer, _callback, 0); | -setup_timer(_timer, _callback, 0UL); +timer_setup(_timer, _callback, 0); ) @change_callback_unused_data depends on match_timer_function_unused_data@ identifier match_timer_function_unused_data._callback; type _origtype; identifier _origarg; @@ void _callback( -_origtype _origarg +struct timer_list *unused ) { ... when != _origarg } Signed-off-by: Kees Cook <keescook@chromium.org>
* | treewide: init_timer() -> setup_timer()Kees Cook2017-11-212-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This mechanically converts all remaining cases of ancient open-coded timer setup with the old setup_timer() API, which is the first step in timer conversions. This has no behavioral changes, since it ultimately just changes the order of assignment to fields of struct timer_list when finding variations of: init_timer(&t); f.function = timer_callback; t.data = timer_callback_arg; to be converted into: setup_timer(&t, timer_callback, timer_callback_arg); The conversion is done with the following Coccinelle script, which is an improved version of scripts/cocci/api/setup_timer.cocci, in the following ways: - assignments-before-init_timer() cases - limit the .data case removal to the specific struct timer_list instance - handling calls by dereference (timer->field vs timer.field) spatch --very-quiet --all-includes --include-headers \ -I ./arch/x86/include -I ./arch/x86/include/generated \ -I ./include -I ./arch/x86/include/uapi \ -I ./arch/x86/include/generated/uapi -I ./include/uapi \ -I ./include/generated/uapi --include ./include/linux/kconfig.h \ --dir . \ --cocci-file ~/src/data/setup_timer.cocci @fix_address_of@ expression e; @@ init_timer( -&(e) +&e , ...) // Match the common cases first to avoid Coccinelle parsing loops with // "... when" clauses. @match_immediate_function_data_after_init_timer@ expression e, func, da; @@ -init_timer +setup_timer ( \(&e\|e\) +, func, da ); ( -\(e.function\|e->function\) = func; -\(e.data\|e->data\) = da; | -\(e.data\|e->data\) = da; -\(e.function\|e->function\) = func; ) @match_immediate_function_data_before_init_timer@ expression e, func, da; @@ ( -\(e.function\|e->function\) = func; -\(e.data\|e->data\) = da; | -\(e.data\|e->data\) = da; -\(e.function\|e->function\) = func; ) -init_timer +setup_timer ( \(&e\|e\) +, func, da ); @match_function_and_data_after_init_timer@ expression e, e2, e3, e4, e5, func, da; @@ -init_timer +setup_timer ( \(&e\|e\) +, func, da ); ... when != func = e2 when != da = e3 ( -e.function = func; ... when != da = e4 -e.data = da; | -e->function = func; ... when != da = e4 -e->data = da; | -e.data = da; ... when != func = e5 -e.function = func; | -e->data = da; ... when != func = e5 -e->function = func; ) @match_function_and_data_before_init_timer@ expression e, e2, e3, e4, e5, func, da; @@ ( -e.function = func; ... when != da = e4 -e.data = da; | -e->function = func; ... when != da = e4 -e->data = da; | -e.data = da; ... when != func = e5 -e.function = func; | -e->data = da; ... when != func = e5 -e->function = func; ) ... when != func = e2 when != da = e3 -init_timer +setup_timer ( \(&e\|e\) +, func, da ); @r1 exists@ expression t; identifier f; position p; @@ f(...) { ... when any init_timer@p(\(&t\|t\)) ... when any } @r2 exists@ expression r1.t; identifier g != r1.f; expression e8; @@ g(...) { ... when any \(t.data\|t->data\) = e8 ... when any } // It is dangerous to use setup_timer if data field is initialized // in another function. @script:python depends on r2@ p << r1.p; @@ cocci.include_match(False) @r3@ expression r1.t, func, e7; position r1.p; @@ ( -init_timer@p(&t); +setup_timer(&t, func, 0UL); ... when != func = e7 -t.function = func; | -t.function = func; ... when != func = e7 -init_timer@p(&t); +setup_timer(&t, func, 0UL); | -init_timer@p(t); +setup_timer(t, func, 0UL); ... when != func = e7 -t->function = func; | -t->function = func; ... when != func = e7 -init_timer@p(t); +setup_timer(t, func, 0UL); ) Signed-off-by: Kees Cook <keescook@chromium.org>
* | treewide: Switch DEFINE_TIMER callbacks to struct timer_list *Kees Cook2017-11-211-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changes all DEFINE_TIMER() callbacks to use a struct timer_list pointer instead of unsigned long. Since the data argument has already been removed, none of these callbacks are using their argument currently, so this renames the argument to "unused". Done using the following semantic patch: @match_define_timer@ declarer name DEFINE_TIMER; identifier _timer, _callback; @@ DEFINE_TIMER(_timer, _callback); @change_callback depends on match_define_timer@ identifier match_define_timer._callback; type _origtype; identifier _origarg; @@ void -_callback(_origtype _origarg) +_callback(struct timer_list *unused) { ... } Signed-off-by: Kees Cook <keescook@chromium.org>
* | Merge tag 'ceph-for-4.15-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds2017-11-211-52/+13
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ceph updates from Ilya Dryomov: "We have a set of file locking improvements from Zheng, rbd rw/ro state handling code cleanup from myself and some assorted CephFS fixes from Jeff. rbd now defaults to single-major=Y, lifting the limit of ~240 rbd images per host for everyone" * tag 'ceph-for-4.15-rc1' of git://github.com/ceph/ceph-client: rbd: default to single-major device number scheme libceph: don't WARN() if user tries to add invalid key rbd: set discard_alignment to zero ceph: silence sparse endianness warning in encode_caps_cb ceph: remove the bump of i_version ceph: present consistent fsid, regardless of arch endianness ceph: clean up spinlocking and list handling around cleanup_cap_releases() rbd: get rid of rbd_mapping::read_only rbd: fix and simplify rbd_ioctl_set_ro() ceph: remove unused and redundant variable dropping ceph: mark expected switch fall-throughs ceph: -EINVAL on decoding failure in ceph_mdsc_handle_fsmap() ceph: disable cached readdir after dropping positive dentry ceph: fix bool initialization/comparison ceph: handle 'session get evicted while there are file locks' ceph: optimize flock encoding during reconnect ceph: make lock_to_ceph_filelock() static ceph: keep auth cap when inode has flocks or posix locks
| * rbd: default to single-major device number schemeIlya Dryomov2017-11-131-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's been 3.5 years, let's turn it on by default. Support in rbd(8) utility goes back to pre-firefly, "rbd map" has been loading the module with single_major=Y ever since. However, if the module is already loaded (whether by hand or at boot time), we end up with single_major=N. Also, some people don't install rbd(8) and use the sysfs interface directly. (With single-major=N, a major number is consumed for every mapping, imposing a limit of ~240 rbd images per host. single-major=Y allows mapping thousands of rbd images on a single machine.) Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com>
| * rbd: set discard_alignment to zeroDavid Disseldorp2017-11-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | RBD devices are currently incorrectly initialised with the block queue discard_alignment set to the underlying RADOS object size. As per Documentation/ABI/testing/sysfs-block: The discard_alignment parameter indicates how many bytes the beginning of the device is offset from the internal allocation unit's natural alignment. Correcting the discard_alignment parameter from the RADOS object size to zero (the blk_set_default_limits() default) has no effect on how discard requests are propagated through the block layer - @alignment in __blkdev_issue_discard() remains zero. However, it does fix the UNMAP granularity alignment value advertised to SCSI initiators via the Block Limits VPD. Signed-off-by: David Disseldorp <ddiss@suse.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * rbd: get rid of rbd_mapping::read_onlyIlya Dryomov2017-11-131-19/+4
| | | | | | | | | | | | | | It is redundant -- rw/ro state is stored in hd_struct and managed by the block layer. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
| * rbd: fix and simplify rbd_ioctl_set_ro()Ilya Dryomov2017-11-131-28/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | ->open_count/-EBUSY check is bogus and wrong: when an open device is set read-only, blkdev_write_iter() refuses further writes with -EPERM. This is standard behaviour and all other block devices allow this. set_disk_ro() call is also problematic: we affect the entire device when called on a single partition. All rbd_ioctl_set_ro() needs to do is refuse ro -> rw transition for mapped snapshots. Everything else can be handled by generic code. Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* | Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds2017-11-175-59/+54
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more block layer updates from Jens Axboe: "A followup pull request, with some parts that either needed a bit more testing before going in, merge sync, or just later arriving fixes. This contains: - Timer related updates from Kees. These were purposefully delayed since I didn't want to pull in a later v4.14-rc tag to my block tree. - ide-cd prep sense buffer fix from Bart. Also delayed, as not to clash with the late fix we put into 4.14-rc. - Small BFQ updates series from Luca and Paolo. - Single nvmet fix from James, fixing a non-functional case there. - Bio fast clone fix from Michael, which made bcache return the wrong data for some cases. - Legacy IO path regression hang fix from Ming" * 'for-linus' of git://git.kernel.dk/linux-block: bio: ensure __bio_clone_fast copies bi_partno nvmet_fc: fix better length checking block: wake up all tasks blocked in get_request() block, bfq: move debug blkio stats behind CONFIG_DEBUG_BLK_CGROUP block, bfq: update blkio stats outside the scheduler lock block, bfq: add missing invocations of bfqg_stats_update_io_add/remove doc, block, bfq: update max IOPS sustainable with BFQ ide: Make ide_cdrom_prep_fs() initialize the sense buffer pointer md: Convert timers to use timer_setup() block: swim3: Convert timers to use timer_setup() block/aoe: Convert timers to use timer_setup() amifloppy: Convert timers to use timer_setup() block/floppy: Convert callback to pass timer_list
| * | block: swim3: Convert timers to use timer_setup()Kees Cook2017-11-141-16/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Arvind Yadav <arvind.yadav.cs@gmail.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block/aoe: Convert timers to use timer_setup()Kees Cook2017-11-142-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Jens Axboe <axboe@kernel.dk> Cc: "Ed L. Cashin" <ed.cashin@acm.org> Cc: linux-block@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | amifloppy: Convert timers to use timer_setup()Kees Cook2017-11-141-31/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This converts the amifloppy driver to pass the timer pointer to the callback instead of the drive number (and flags). It eliminates the decusagecounter flag, as it was unused, and drops the ininterrupt flag which appeared to be a needless optimization. The drive can then be calculated from the offset of the timer in the drive timer array. Additionally moves to a static data variable instead of the soon-to-be-gone timer->data field. Cc: Jens Axboe <axboe@kernel.dk> Cc: Krzysztof Halasa <khc@pm.waw.pl> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | block/floppy: Convert callback to pass timer_listKees Cook2017-11-141-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to passing in the timer pointer explicitly. Calculate the drive from the offset of the timer in the timer list. Cc: Jiri Kosina <jikos@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Ming Lei <tom.leiming@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Geliang Tang <geliangtang@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | Merge tag 'libnvdimm-for-4.15' of ↵Linus Torvalds2017-11-172-77/+0
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull libnvdimm and dax updates from Dan Williams: "Save for a few late fixes, all of these commits have shipped in -next releases since before the merge window opened, and 0day has given a build success notification. The ext4 touches came from Jan, and the xfs touches have Darrick's reviewed-by. An xfstest for the MAP_SYNC feature has been through a few round of reviews and is on track to be merged. - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation. - Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods. - Add support for the ACPI 6.2 NVDIMM media error injection methods. - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control. - Cleanup physical address information disclosures to be root-only. - Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area. - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands. Acknowledgements that came after the commits were pushed to -next: - 957ac8c421ad ("dax: fix PMD faults on zero-length files"): Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> - a39e596baa07 ("xfs: support for synchronous DAX faults") and 7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()") Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>" * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits) acpi, nfit: add 'Enable Latch System Shutdown Status' command support dax: fix general protection fault in dax_alloc_inode dax: fix PMD faults on zero-length files dax: stop requiring a live device for dax_flush() brd: remove dax support dax: quiet bdev_dax_supported() fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core tools/testing/nvdimm: unit test clear-error commands acpi, nfit: validate commands against the device type tools/testing/nvdimm: stricter bounds checking for error injection commands xfs: support for synchronous DAX faults xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault() ext4: Support for synchronous DAX faults ext4: Simplify error handling in ext4_dax_huge_fault() dax: Implement dax_finish_sync_fault() dax, iomap: Add support for synchronous faults mm: Define MAP_SYNC and VM_SYNC flags dax: Allow tuning whether dax_insert_mapping_entry() dirties entry dax: Allow dax_iomap_fault() to return pfn dax: Fix comment describing dax_iomap_fault() ...
| * | | brd: remove dax supportDan Williams2017-11-142-77/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | DAX support in brd is awkward because its backing page frames are distinct from the ones provided by pmem, dcssblk, or axonram. We need pfn_t_devmap() entries to fully support DAX, and the limited DAX support for pfn_t_special() page frames is not interesting for brd when pmem is already a superset of brd. Lastly, brd is the only dax capable driver that may sleep in its ->direct_access() implementation. So it causes a global burden with no net gain of kernel functionality. For all these reasons, remove DAX support. Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox <mawilcox@microsoft.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | | drivers/block/zram/zram_drv.c: make zram_page_end_io() staticColin Ian King2017-11-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | zram_page_end_io() is local to the source and does not need to be in global scope, so make it static. Cleans up sparse warning: symbol 'zram_page_end_io' was not declared. Should it be static? Link: http://lkml.kernel.org/r/20171016173336.20320-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | zram: remove zlib from the list of recommended algorithmsSergey Senozhatsky2017-11-151-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ZSTD tends to outperform deflate/inflate, thus we remove zlib from the list of recommended algorithms and recommend zstd instead. Link: http://lkml.kernel.org/r/20170912050005.3247-2-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Suggested-by: Minchan Kim <minchan@kernel.org> Acked-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | zram: add zstd to the supported algorithms listSergey Senozhatsky2017-11-151-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add ZSTD to the list of supported compression algorithms. ZRAM fio perf test: LZO DEFLATE ZSTD #jobs1 WRITE: (2180MB/s) (77.2MB/s) (1429MB/s) WRITE: (1617MB/s) (77.7MB/s) (1202MB/s) READ: (426MB/s) (595MB/s) (1181MB/s) READ: (422MB/s) (572MB/s) (1020MB/s) READ: (318MB/s) (67.8MB/s) (563MB/s) WRITE: (318MB/s) (67.9MB/s) (564MB/s) READ: (336MB/s) (68.3MB/s) (583MB/s) WRITE: (335MB/s) (68.2MB/s) (582MB/s) #jobs2 WRITE: (3441MB/s) (152MB/s) (2141MB/s) WRITE: (2507MB/s) (147MB/s) (1888MB/s) READ: (801MB/s) (1146MB/s) (1890MB/s) READ: (767MB/s) (1096MB/s) (2073MB/s) READ: (621MB/s) (126MB/s) (1009MB/s) WRITE: (621MB/s) (126MB/s) (1009MB/s) READ: (656MB/s) (125MB/s) (1075MB/s) WRITE: (657MB/s) (126MB/s) (1077MB/s) #jobs3 WRITE: (4772MB/s) (225MB/s) (3394MB/s) WRITE: (3905MB/s) (211MB/s) (2939MB/s) READ: (1216MB/s) (1608MB/s) (3218MB/s) READ: (1159MB/s) (1431MB/s) (2981MB/s) READ: (906MB/s) (156MB/s) (1457MB/s) WRITE: (907MB/s) (156MB/s) (1458MB/s) READ: (953MB/s) (158MB/s) (1595MB/s) WRITE: (952MB/s) (157MB/s) (1593MB/s) #jobs4 WRITE: (6036MB/s) (265MB/s) (4469MB/s) WRITE: (5059MB/s) (263MB/s) (3951MB/s) READ: (1618MB/s) (2066MB/s) (4276MB/s) READ: (1573MB/s) (1942MB/s) (3830MB/s) READ: (1202MB/s) (227MB/s) (1971MB/s) WRITE: (1200MB/s) (227MB/s) (1968MB/s) READ: (1265MB/s) (226MB/s) (2116MB/s) WRITE: (1264MB/s) (226MB/s) (2114MB/s) #jobs5 WRITE: (5339MB/s) (233MB/s) (3781MB/s) WRITE: (4298MB/s) (234MB/s) (3276MB/s) READ: (1626MB/s) (2048MB/s) (4081MB/s) READ: (1567MB/s) (1929MB/s) (3758MB/s) READ: (1174MB/s) (205MB/s) (1747MB/s) WRITE: (1173MB/s) (204MB/s) (1746MB/s) READ: (1214MB/s) (208MB/s) (1890MB/s) WRITE: (1215MB/s) (208MB/s) (1892MB/s) #jobs6 WRITE: (5666MB/s) (270MB/s) (4338MB/s) WRITE: (4828MB/s) (267MB/s) (3772MB/s) READ: (1803MB/s) (2058MB/s) (4946MB/s) READ: (1805MB/s) (2156MB/s) (4711MB/s) READ: (1334MB/s) (235MB/s) (2135MB/s) WRITE: (1335MB/s) (235MB/s) (2137MB/s) READ: (1364MB/s) (236MB/s) (2268MB/s) WRITE: (1365MB/s) (237MB/s) (2270MB/s) #jobs7 WRITE: (5474MB/s) (270MB/s) (4300MB/s) WRITE: (4666MB/s) (266MB/s) (3817MB/s) READ: (2022MB/s) (2319MB/s) (5472MB/s) READ: (1924MB/s) (2260MB/s) (5031MB/s) READ: (1369MB/s) (242MB/s) (2153MB/s) WRITE: (1370MB/s) (242MB/s) (2155MB/s) READ: (1499MB/s) (246MB/s) (2310MB/s) WRITE: (1497MB/s) (246MB/s) (2307MB/s) #jobs8 WRITE: (5558MB/s) (273MB/s) (4439MB/s) WRITE: (4763MB/s) (271MB/s) (3918MB/s) READ: (2201MB/s) (2599MB/s) (6062MB/s) READ: (2105MB/s) (2463MB/s) (5413MB/s) READ: (1490MB/s) (252MB/s) (2238MB/s) WRITE: (1488MB/s) (252MB/s) (2236MB/s) READ: (1566MB/s) (254MB/s) (2434MB/s) WRITE: (1568MB/s) (254MB/s) (2437MB/s) #jobs9 WRITE: (5120MB/s) (264MB/s) (4035MB/s) WRITE: (4531MB/s) (267MB/s) (3740MB/s) READ: (1940MB/s) (2258MB/s) (4986MB/s) READ: (2024MB/s) (2387MB/s) (4871MB/s) READ: (1343MB/s) (246MB/s) (2038MB/s) WRITE: (1342MB/s) (246MB/s) (2037MB/s) READ: (1553MB/s) (238MB/s) (2243MB/s) WRITE: (1552MB/s) (238MB/s) (2242MB/s) #jobs10 WRITE: (5345MB/s) (271MB/s) (3988MB/s) WRITE: (4750MB/s) (254MB/s) (3668MB/s) READ: (1876MB/s) (2363MB/s) (5150MB/s) READ: (1990MB/s) (2256MB/s) (5080MB/s) READ: (1355MB/s) (250MB/s) (2019MB/s) WRITE: (1356MB/s) (251MB/s) (2020MB/s) READ: (1490MB/s) (252MB/s) (2202MB/s) WRITE: (1488MB/s) (252MB/s) (2199MB/s) jobs1 perfstat instructions 52,065,555,710 ( 0.79) 855,731,114,587 ( 2.64) 54,280,709,944 ( 1.40) branches 14,020,427,116 ( 725.847) 101,733,449,582 (1074.521) 11,170,591,067 ( 992.869) branch-misses 22,626,174 ( 0.16%) 274,197,885 ( 0.27%) 25,915,805 ( 0.23%) jobs2 perfstat instructions 103,633,110,402 ( 0.75) 1,710,822,100,914 ( 2.59) 107,879,874,104 ( 1.28) branches 27,931,237,282 ( 679.203) 203,298,267,479 (1037.326) 22,185,350,842 ( 884.427) branch-misses 46,103,811 ( 0.17%) 533,747,204 ( 0.26%) 49,682,483 ( 0.22%) jobs3 perfstat instructions 154,857,283,657 ( 0.76) 2,565,748,974,197 ( 2.57) 161,515,435,813 ( 1.31) branches 41,759,490,355 ( 670.529) 304,905,605,277 ( 978.765) 33,215,805,907 ( 888.003) branch-misses 74,263,293 ( 0.18%) 759,746,240 ( 0.25%) 76,841,196 ( 0.23%) jobs4 perfstat instructions 206,215,849,076 ( 0.75) 3,420,169,460,897 ( 2.60) 215,003,061,664 ( 1.31) branches 55,632,141,739 ( 666.501) 406,394,977,433 ( 927.241) 44,214,322,251 ( 883.532) branch-misses 102,287,788 ( 0.18%) 1,098,617,314 ( 0.27%) 103,891,040 ( 0.23%) jobs5 perfstat instructions 258,711,315,588 ( 0.67) 4,275,657,533,244 ( 2.23) 269,332,235,685 ( 1.08) branches 69,802,821,166 ( 588.823) 507,996,211,252 ( 797.036) 55,450,846,129 ( 735.095) branch-misses 129,217,214 ( 0.19%) 1,243,284,991 ( 0.24%) 173,512,278 ( 0.31%) jobs6 perfstat instructions 312,796,166,008 ( 0.61) 5,133,896,344,660 ( 2.02) 323,658,769,588 ( 1.04) branches 84,372,488,583 ( 520.541) 610,310,494,402 ( 697.642) 66,683,292,992 ( 693.939) branch-misses 159,438,978 ( 0.19%) 1,396,368,563 ( 0.23%) 174,406,934 ( 0.26%) jobs7 perfstat instructions 363,211,372,930 ( 0.56) 5,988,205,600,879 ( 1.75) 377,824,674,156 ( 0.93) branches 98,057,013,765 ( 463.117) 711,841,255,974 ( 598.762) 77,879,009,954 ( 600.443) branch-misses 199,513,153 ( 0.20%) 1,507,651,077 ( 0.21%) 248,203,369 ( 0.32%) jobs8 perfstat instructions 413,960,354,615 ( 0.52) 6,842,918,558,378 ( 1.45) 431,938,486,581 ( 0.83) branches 111,812,574,884 ( 414.224) 813,299,084,518 ( 491.173) 89,062,699,827 ( 517.795) branch-misses 233,584,845 ( 0.21%) 1,531,593,921 ( 0.19%) 286,818,489 ( 0.32%) jobs9 perfstat instructions 465,976,220,300 ( 0.53) 7,698,467,237,372 ( 1.47) 486,352,600,321 ( 0.84) branches 125,931,456,162 ( 424.063) 915,207,005,715 ( 498.192) 100,370,404,090 ( 517.439) branch-misses 256,992,445 ( 0.20%) 1,782,809,816 ( 0.19%) 345,239,380 ( 0.34%) jobs10 perfstat instructions 517,406,372,715 ( 0.53) 8,553,527,312,900 ( 1.48) 540,732,653,094 ( 0.84) branches 139,839,780,676 ( 427.732) 1,016,737,699,389 ( 503.172) 111,696,557,638 ( 516.750) branch-misses 259,595,561 ( 0.19%) 1,952,570,279 ( 0.19%) 357,818,661 ( 0.32%) seconds elapsed 20.630411534 96.084546565 12.743373571 seconds elapsed 22.292627625 100.984155001 14.407413560 seconds elapsed 22.396016966 110.344880848 14.032201392 seconds elapsed 22.517330949 113.351459170 14.243074935 seconds elapsed 28.548305104 156.515193765 19.159286861 seconds elapsed 30.453538116 164.559937678 19.362492717 seconds elapsed 33.467108086 188.486827481 21.492612173 seconds elapsed 35.617727591 209.602677783 23.256422492 seconds elapsed 42.584239509 243.959902566 28.458540338 seconds elapsed 47.683632526 269.635248851 31.542404137 Over all, ZSTD has slower WRITE, but much faster READ (perhaps a static compression buffer used during the test helped ZSTD a lot), which results in faster test results. Memory consumption (zram mm_stat file): zram LZO mm_stat mm_stat (jobs1): 2147483648 23068672 33558528 0 33558528 0 0 mm_stat (jobs2): 2147483648 23068672 33558528 0 33558528 0 0 mm_stat (jobs3): 2147483648 23068672 33558528 0 33562624 0 0 mm_stat (jobs4): 2147483648 23068672 33558528 0 33558528 0 0 mm_stat (jobs5): 2147483648 23068672 33558528 0 33558528 0 0 mm_stat (jobs6): 2147483648 23068672 33558528 0 33562624 0 0 mm_stat (jobs7): 2147483648 23068672 33558528 0 33566720 0 0 mm_stat (jobs8): 2147483648 23068672 33558528 0 33558528 0 0 mm_stat (jobs9): 2147483648 23068672 33558528 0 33558528 0 0 mm_stat (jobs10): 2147483648 23068672 33558528 0 33562624 0 0 zram DEFLATE mm_stat mm_stat (jobs1): 2147483648 16252928 25178112 0 25178112 0 0 mm_stat (jobs2): 2147483648 16252928 25178112 0 25178112 0 0 mm_stat (jobs3): 2147483648 16252928 25178112 0 25178112 0 0 mm_stat (jobs4): 2147483648 16252928 25178112 0 25178112 0 0 mm_stat (jobs5): 2147483648 16252928 25178112 0 25178112 0 0 mm_stat (jobs6): 2147483648 16252928 25178112 0 25178112 0 0 mm_stat (jobs7): 2147483648 16252928 25178112 0 25190400 0 0 mm_stat (jobs8): 2147483648 16252928 25178112 0 25190400 0 0 mm_stat (jobs9): 2147483648 16252928 25178112 0 25178112 0 0 mm_stat (jobs10): 2147483648 16252928 25178112 0 25178112 0 0 zram ZSTD mm_stat mm_stat (jobs1): 2147483648 11010048 16781312 0 16781312 0 0 mm_stat (jobs2): 2147483648 11010048 16781312 0 16781312 0 0 mm_stat (jobs3): 2147483648 11010048 16781312 0 16785408 0 0 mm_stat (jobs4): 2147483648 11010048 16781312 0 16781312 0 0 mm_stat (jobs5): 2147483648 11010048 16781312 0 16781312 0 0 mm_stat (jobs6): 2147483648 11010048 16781312 0 16781312 0 0 mm_stat (jobs7): 2147483648 11010048 16781312 0 16781312 0 0 mm_stat (jobs8): 2147483648 11010048 16781312 0 16781312 0 0 mm_stat (jobs9): 2147483648 11010048 16781312 0 16785408 0 0 mm_stat (jobs10): 2147483648 11010048 16781312 0 16781312 0 0 ================================================================================== Official benchmarks [1]: Compressor name Ratio Compression Decompress. zstd 1.1.3 -1 2.877 430 MB/s 1110 MB/s zlib 1.2.8 -1 2.743 110 MB/s 400 MB/s brotli 0.5.2 -0 2.708 400 MB/s 430 MB/s quicklz 1.5.0 -1 2.238 550 MB/s 710 MB/s lzo1x 2.09 -1 2.108 650 MB/s 830 MB/s lz4 1.7.5 2.101 720 MB/s 3600 MB/s snappy 1.1.3 2.091 500 MB/s 1650 MB/s lzf 3.6 -1 2.077 400 MB/s 860 MB/s Minchan said: : I did test with my sample data and compared zstd with deflate. zstd's : compress ratio is lower a little bit but compression speed is much faster : 3 times more and decompress speed is too 2 times more. With different : data, it is different but overall, zstd would be better for speed at the : cost of a little lower compress ratio(about 5%) so I believe it's worth to : replace deflate. [1] https://github.com/facebook/zstd Link: http://lkml.kernel.org/r/20170912050005.3247-1-sergey.senozhatsky@gmail.com Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Acked-by: Minchan Kim <minchan@kernel.org> Tested-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | bdi: introduce BDI_CAP_SYNCHRONOUS_IOMinchan Kim2017-11-152-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As discussed at https://lkml.kernel.org/r/<20170728165604.10455-1-ross.zwisler@linux.intel.com> someday we will remove rw_page(). If so, we need something to detect such super-fast storage on which synchronous IO operations like the current rw_page are always a win. Introduces BDI_CAP_SYNCHRONOUS_IO to indicate such devices. With it, we could use various optimization techniques. Link: http://lkml.kernel.org/r/1505886205-9671-3-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | zram: set BDI_CAP_STABLE_WRITES onceMinchan Kim2017-11-151-10/+6
| |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With fast swap storage, the platform wants to use swap more aggressively and swap-in is crucial to application latency. The rw_page() based synchronous devices like zram, pmem and btt are such fast storage. When I profile swapin performance with zram lz4 decompress test, S/W overhead is more than 70%. Maybe, it would be bigger in nvdimm. This patchset reduces swap-in latency by skipping swapcache if the swap device is a synchronous device like a rw_page() based device. It enhances by 45% my swapin test (5G sequential swapin, no readahead) from 2.41sec to 1.64sec. This patch (of 4): Commit 19b7ccf8651d ("block: get rid of blk_integrity_revalidate()") fixed a weird thing (i.e., reset BDI_CAP_STABLE_WRITES flag unconditionally whenever revalidat_disk is called) so zram doesn't need to reset the flag any more when revalidating the bdev. Instead, set the flag just once when the zram device is created. It shouldn't change any behavior. Link: http://lkml.kernel.org/r/1505886205-9671-2-git-send-email-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Hugh Dickins <hughd@google.com> Cc: Huang Ying <ying.huang@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mappingLinus Torvalds2017-11-141-0/+4
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull dma-mapping updates from Christoph Hellwig: - turn dma_cache_sync into a dma_map_ops instance and remove implementation that purely are dead because the architecture doesn't support noncoherent allocations - add a flag for busses that need DMA configuration (Robin Murphy) * tag 'dma-mapping-4.15' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: turn dma_cache_sync into a dma_map_ops method sh: make dma_cache_sync a no-op xtensa: make dma_cache_sync a no-op unicore32: make dma_cache_sync a no-op powerpc: make dma_cache_sync a no-op mn10300: make dma_cache_sync a no-op microblaze: make dma_cache_sync a no-op ia64: make dma_cache_sync a no-op frv: make dma_cache_sync a no-op x86: make dma_cache_sync a no-op floppy: consolidate the dummy fd_cacheflush definition drivers: flag buses which demand DMA configuration
| * | | floppy: consolidate the dummy fd_cacheflush definitionChristoph Hellwig2017-10-191-0/+4
| |/ / | | | | | | | | | | | | | | | | | | Only mips defines this helper, so remove all the other arch definitions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com>
* | | Merge branch 'for-4.15/block' of git://git.kernel.dk/linux-blockLinus Torvalds2017-11-1410-28/+41
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull core block layer updates from Jens Axboe: "This is the main pull request for block storage for 4.15-rc1. Nothing out of the ordinary in here, and no API changes or anything like that. Just various new features for drivers, core changes, etc. In particular, this pull request contains: - A patch series from Bart, closing the whole on blk/scsi-mq queue quescing. - A series from Christoph, building towards hidden gendisks (for multipath) and ability to move bio chains around. - NVMe - Support for native multipath for NVMe (Christoph). - Userspace notifications for AENs (Keith). - Command side-effects support (Keith). - SGL support (Chaitanya Kulkarni) - FC fixes and improvements (James Smart) - Lots of fixes and tweaks (Various) - bcache - New maintainer (Michael Lyle) - Writeback control improvements (Michael) - Various fixes (Coly, Elena, Eric, Liang, et al) - lightnvm updates, mostly centered around the pblk interface (Javier, Hans, and Rakesh). - Removal of unused bio/bvec kmap atomic interfaces (me, Christoph) - Writeback series that fix the much discussed hundreds of millions of sync-all units. This goes all the way, as discussed previously (me). - Fix for missing wakeup on writeback timer adjustments (Yafang Shao). - Fix laptop mode on blk-mq (me). - {mq,name} tupple lookup for IO schedulers, allowing us to have alias names. This means you can use 'deadline' on both !mq and on mq (where it's called mq-deadline). (me). - blktrace race fix, oopsing on sg load (me). - blk-mq optimizations (me). - Obscure waitqueue race fix for kyber (Omar). - NBD fixes (Josef). - Disable writeback throttling by default on bfq, like we do on cfq (Luca Miccio). - Series from Ming that enable us to treat flush requests on blk-mq like any other request. This is a really nice cleanup. - Series from Ming that improves merging on blk-mq with schedulers, getting us closer to flipping the switch on scsi-mq again. - BFQ updates (Paolo). - blk-mq atomic flags memory ordering fixes (Peter Z). - Loop cgroup support (Shaohua). - Lots of minor fixes from lots of different folks, both for core and driver code" * 'for-4.15/block' of git://git.kernel.dk/linux-block: (294 commits) nvme: fix visibility of "uuid" ns attribute blk-mq: fixup some comment typos and lengths ide: ide-atapi: fix compile error with defining macro DEBUG blk-mq: improve tag waiting setup for non-shared tags brd: remove unused brd_mutex blk-mq: only run the hardware queue if IO is pending block: avoid null pointer dereference on null disk fs: guard_bio_eod() needs to consider partitions xtensa/simdisk: fix compile error nvme: expose subsys attribute to sysfs nvme: create 'slaves' and 'holders' entries for hidden controllers block: create 'slaves' and 'holders' entries for hidden gendisks nvme: also expose the namespace identification sysfs files for mpath nodes nvme: implement multipath access to nvme subsystems nvme: track shared namespaces nvme: introduce a nvme_ns_ids structure nvme: track subsystems block, nvme: Introduce blk_mq_req_flags_t block, scsi: Make SCSI quiesce and resume work reliably block: Add the QUEUE_FLAG_PREEMPT_ONLY request queue flag ...
| * | | brd: remove unused brd_mutexMikulas Patocka2017-11-101-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove unused mutex brd_mutex. It is unused since the commit ff26956875c2 ("brd: remove support for BLKFLSBUF"). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | nbd: don't start req until after the dead connection logicJosef Bacik2017-11-061-13/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We can end up sleeping for a while waiting for the dead timeout, which means we could get the per request timer to fire. We did handle this case, but if the dead timeout happened right after we submitted we'd either tear down the connection or possibly requeue as we're handling an error and race with the endio which can lead to panics and other hilarity. Fixes: 560bc4b39952 ("nbd: handle dead connections") Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | nbd: wait uninterruptible for the dead timeoutJosef Bacik2017-11-061-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we have a pending signal or the user kills their application then it'll bring down the whole device, which is less than awesome. Instead wait uninterruptible for the dead timeout so we're sure we gave it our best shot. Fixes: 560bc4b39952 ("nbd: handle dead connections") Cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | cdrom: hide CONFIG_CDROM menu selectionJens Axboe2017-11-031-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't need to expose this. The point is that drivers select the uniform CDROM layer, if they need it, the user should not have to make a conscious decision on whether to include this separately or not. Fixes: 2a750166a5be ("block: Rework drivers/cdrom/Makefile") Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | skd: use ktime_get_real_seconds()Arnd Bergmann2017-11-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Like many storage drivers, skd uses an unsigned 32-bit number for interchanging the current time with the firmware. This will overflow in y2106 and is otherwise safe. However, the get_seconds() function is generally considered deprecated since the behavior is different between 32-bit and 64-bit architectures, and using it may indicate a bigger problem. To annotate that we've thought about this, let's add a comment here and migrate to the ktime_get_real_seconds() function that consistently returns a 64-bit number. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block: Rework drivers/cdrom/MakefileBart Van Assche2017-11-012-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of referring from inside drivers/cdrom/Makefile to all the drivers that use this driver, let these drivers select the cdrom driver. This change makes the cdrom build code follow the approach that is used for most other drivers, namely refer from the higher layers to the lower layer instead of from the lower layer to the higher layers. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | nullb: fix error return code in null_init()Wei Yongjun2017-10-171-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix to return error code -ENOMEM from the null_alloc_dev() error handling case instead of 0, as done elsewhere in this function. Fixes: 2984c8684f96 ("nullb: factor disk parameters") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | mtip32xx: Clean up unused variablesChristos Gkekas2017-10-121-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove variables that are set but never used. Signed-off-by: Christos Gkekas <chris.gekas@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | null_blk: add "no_sched" module parameterweiping zhang2017-09-301-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | add an option that disable io scheduler for null block device. Signed-off-by: weiping zhang <zhangweiping@didichuxing.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block: fix a build errorShaohua Li2017-09-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The code is only for blkcg not for all cgroups Fixes: d4478e92d618 ("block/loop: make loop cgroup aware") Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block: cryptoloop - Fix build warningCorentin Labbe2017-09-261-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fix the following build warning: drivers/block/cryptoloop.c:46:8: warning: variable 'cipher' set but not used [-Wunused-but-set-variable] Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | block/loop: make loop cgroup awareShaohua Li2017-09-262-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | loop block device handles IO in a separate thread. The actual IO dispatched isn't cloned from the IO loop device received, so the dispatched IO loses the cgroup context. I'm ignoring buffer IO case now, which is quite complicated. Making the loop thread aware cgroup context doesn't really help. The loop device only writes to a single file. In current writeback cgroup implementation, the file can only belong to one cgroup. For direct IO case, we could workaround the issue in theory. For example, say we assign cgroup1 5M/s BW for loop device and cgroup2 10M/s. We can create a special cgroup for loop thread and assign at least 15M/s for the underlayer disk. In this way, we correctly throttle the two cgroups. But this is tricky to setup. This patch tries to address the issue. We record bio's css in loop command. When loop thread is handling the command, we then use the API provided in patch 1 to set the css for current task. The bio layer will use the css for new IO (from patch 3). Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | Merge tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfsLinus Torvalds2017-11-141-2/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull configfs updates from Christoph Hellwig: "A couple of configfs cleanups: - proper use of the bool type (Thomas Meyer) - constification of struct config_item_type (Bhumika Goyal)" * tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfs: RDMA/cma: make config_item_type const stm class: make config_item_type const ACPI: configfs: make config_item_type const nvmet: make config_item_type const usb: gadget: configfs: make config_item_type const PCI: endpoint: make config_item_type const iio: make function argument and some structures const usb: gadget: make config_item_type structures const dlm: make config_item_type const netconsole: make config_item_type const nullb: make config_item_type const ocfs2/cluster: make config_item_type const target: make config_item_type const configfs: make ci_type field, some pointers and function arguments const configfs: make config_item_type const configfs: Fix bool initialization/comparison
| * | | | nullb: make config_item_type constBhumika Goyal2017-10-191-2/+2
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make these structures const as they are either passed to the functions having the argument as const or stored as a reference in the "ci_type" const field of a config_item structure. Done using Coccienlle. Signed-off-by: Bhumika Goyal <bhumirks@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de>
* | | | Merge branch 'timers-core-for-linus' of ↵Linus Torvalds2017-11-139-63/+30
|\ \ \ \ | |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "Yet another big pile of changes: - More year 2038 work from Arnd slowly reaching the point where we need to think about the syscalls themself. - A new timer function which allows to conditionally (re)arm a timer only when it's either not running or the new expiry time is sooner than the armed expiry time. This allows to use a single timer for multiple timeout requirements w/o caring about the first expiry time at the call site. - A new NMI safe accessor to clock real time for the printk timestamp work. Can be used by tracing, perf as well if required. - A large number of timer setup conversions from Kees which got collected here because either maintainers requested so or they simply got ignored. As Kees pointed out already there are a few trivial merge conflicts and some redundant commits which was unavoidable due to the size of this conversion effort. - Avoid a redundant iteration in the timer wheel softirq processing. - Provide a mechanism to treat RTC implementations depending on their hardware properties, i.e. don't inflict the write at the 0.5 seconds boundary which originates from the PC CMOS RTC to all RTCs. No functional change as drivers need to be updated separately. - The usual small updates to core code clocksource drivers. Nothing really exciting" * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (111 commits) timers: Add a function to start/reduce a timer pstore: Use ktime_get_real_fast_ns() instead of __getnstimeofday() timer: Prepare to change all DEFINE_TIMER() callbacks netfilter: ipvs: Convert timers to use timer_setup() scsi: qla2xxx: Convert timers to use timer_setup() block/aoe: discover_timer: Convert timers to use timer_setup() ide: Convert timers to use timer_setup() drbd: Convert timers to use timer_setup() mailbox: Convert timers to use timer_setup() crypto: Convert timers to use timer_setup() drivers/pcmcia: omap1: Fix error in automated timer conversion ARM: footbridge: Fix typo in timer conversion drivers/sgi-xp: Convert timers to use timer_setup() drivers/pcmcia: Convert timers to use timer_setup() drivers/memstick: Convert timers to use timer_setup() drivers/macintosh: Convert timers to use timer_setup() hwrng/xgene-rng: Convert timers to use timer_setup() auxdisplay: Convert timers to use timer_setup() sparc/led: Convert timers to use timer_setup() mips: ip22/32: Convert timers to use timer_setup() ...
| * | | block/aoe: discover_timer: Convert timers to use timer_setup()Kees Cook2017-11-061-36/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. This refactors the discover_timer to remove the needless locking and state machine used for synchronizing timer death. Using del_timer_sync() will already do the right thing. Cc: Jens Axboe <axboe@kernel.dk> Cc: "Ed L. Cashin" <ed.cashin@acm.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kees Cook <keescook@chromium.org>
| * | | drbd: Convert timers to use timer_setup()Kees Cook2017-11-066-21/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for unconditionally passing the struct timer_list pointer to all timer callbacks, switch to using the new timer_setup() and from_timer() to pass the timer pointer explicitly. Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Lars Ellenberg <lars.ellenberg@linbit.com> Cc: drbd-dev@lists.linbit.com Signed-off-by: Kees Cook <keescook@chromium.org>
| * | | timer: Remove meaningless .data/.function assignmentsKees Cook2017-10-171-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Several timer users needlessly reset their .function/.data fields during their timer callback, but nothing else changes them. Some users do not use their .data field at all. Each instance is removed here. Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> # for staging Acked-by: Krzysztof Halasa <khc@pm.waw.pl> # for wan/hdlc* Acked-by: Jens Axboe <axboe@kernel.dk> # for amiflop Cc: devel@driverdev.osuosl.org Cc: netdev@vger.kernel.org Cc: linux-wireless@vger.kernel.org Cc: Jens Axboe <axboe@fb.com> Cc: Ganesh Krishna <ganesh.krishna@microchip.com> Cc: Aditya Shankar <aditya.shankar@microchip.com> Link: https://lkml.kernel.org/r/20171010001032.GA119829@beast
OpenPOWER on IntegriCloud