| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs
Pull xfs pnfs block layout support from Dave Chinner:
"This contains the changes to XFS needed to support the PNFS block
layout server that you pulled in through Bruce's NFS server tree
merge.
I originally thought that I'd need to merge changes into the NFS
server side, but Bruce had already picked them up and so this is
purely changes to the fs/xfs/ codebase.
Summary:
This update contains the implementation of the PNFS server export
methods that enable use of XFS filesystems as a block layout target"
* tag 'xfs-pnfs-for-linus-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: recall pNFS layouts on conflicting access
xfs: implement pNFS export operations
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Recall all outstanding pNFS layouts and truncates, writes and similar extent
list modifying operations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add operations to export pNFS block layouts from an XFS filesystem. See
the previous commit adding the operations for an explanation of them.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull more NFS client updates from Trond Myklebust:
"Highlights include:
- Fix a use-after-free in decode_cb_sequence_args()
- Fix a compile error when #undef CONFIG_PROC_FS
- NFSv4.1 backchannel spinlocking issue
- Cleanups in the NFS unstable write code requested by Linus
- NFSv4.1 fix issues when the server denies our backchannel request
- Cleanups in create_session and bind_conn_to_session"
* tag 'nfs-for-3.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4.1: Clean up bind_conn_to_session
NFSv4.1: Always set up a forward channel when binding the session
NFSv4.1: Don't set up a backchannel if the server didn't agree to do so
NFSv4.1: Clean up create_session
pnfs: Refactor the *_layout_mark_request_commit to use pnfs_layout_mark_request_commit
NFSv4: Kill unused nfs_inode->delegation_state field
NFS: struct nfs_commit_info.lock must always point to inode->i_lock
nfs: Can call nfs_clear_page_commit() instead
nfs: Provide and use helper functions for marking a page as unstable
SUNRPC: Always manipulate rpc_rqst::rq_bc_pa_list under xprt->bc_pa_lock
SUNRPC: Fix a compile error when #undef CONFIG_PROC_FS
NFSv4.1: Convert open-coded array allocation calls to kmalloc_array()
NFSv4.1: Fix a kfree() of uninitialised pointers in decode_cb_sequence_args
|
| | |
| | |
| | |
| | |
| | |
| | | |
We don't need to fake up an entire session in order retrieve the arguments.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently, the client requests a back channel or a bidirectional
connection when binding a new TCP channel to an existing session.
Fix that to ask for a forward channel or bidirectional.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the server doesn't agree to out backchannel setup request, then
don't set one up.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Don't decode directly into the shared struct session
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Merge cleanups requested by Linus.
* cleanups: (3 commits)
pnfs: Refactor the *_layout_mark_request_commit to use pnfs_layout_mark_request_commit
nfs: Can call nfs_clear_page_commit() instead
nfs: Provide and use helper functions for marking a page as unstable
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
pnfs_layout_mark_request_commit
The File Layout's filelayout_mark_request_commit() is almost the
Flex File Layout's ff_layout_mark_request_commit(). And that can
be reduced by calling into nfs_request_add_commit_list().
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Tom Haynes <loghyr@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit 411a99adffb4f (nfs: clear_request_commit while holding i_lock)
assumes that the nfs_commit_info always points to the inode->i_lock.
For historical reasons, that is not the case for O_DIRECT writes.
Cc: Weston Andros Adamson <dros@primarydata.com>
Fixes: 411a99adffb4f ("nfs: clear_request_commit while holding i_lock")
Cc: stable@vger.kernel.org # 3.17.x
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Other code that accesses rq_bc_pa_list holds xprt->bc_pa_lock.
xprt_complete_bc_request() should do the same.
Fixes: 2ea24497a1b3 ("SUNRPC: RPC callbacks may be split . . .")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The definition of rpc_count_iostats_metrics() is borked.
Reported by: Jim Davis <jim.epost@gmail.com>
Fixes: d67ae825a59d6 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Cc: Tom Haynes <thomas.haynes@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
For added overflow protection...
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
If the call to decode_rc_list() fails due to a memory allocation error,
then we need to truncate the array size to ensure that we only call
kfree() on those pointer that were allocated.
Reported-by: David Ramos <daramos@stanford.edu>
Fixes: 4aece6a19cf7f ("nfs41: cb_sequence xdr implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull one more batch of power management and ACPI updates from Rafael Wysocki:
"These are mostly fixes on top of the previously merged recent PM and
ACPI material.
First, one commit that broke the ACPI LPSS (Low-Power Subsystem)
driver on a Dell box is reverted and there are two stable-candidate
fixes for that driver. Another fix cleans up two recently added ACPI
EC messages that look odd and the printk level of a noisy debug
message in the core ACPI resources handling code is reduced.
In addition to that we have two stable-candidate fixes for the s3c
cpufreq driver, two cpuidle powernv driver updates related to Device
Trees and a PNP subsystem cleanup that will allow us to get rid of
some old ugliness going forward. Also there is a new blacklist entry
for the ACPI backlight code.
Specifics:
- Revert a recent ACPI LPSS driver commit that prevented the touchpad
driver from loading on Dell XPS13 (Jarkko Nikula).
- Make the ACPI LPSS driver disable the I2C controllers and deassert
SPI host controllers resets at startup on Intel BayTrail and
Braswell SoCs in case they have been left in wrong states by the
platform firmware which then may casuse fatal controller driver
failures during resume from hibernation (Mika Westerberg).
- Make two recently added ACPI EC messages look better (Scot Doyle).
- Reduce the printk level of a recently added debug message related
to ACPI resources that may become noisy in some cases (Rafael J
Wysocki).
- Add a new ACPI backlight blacklist entry for Samsung Series 9
(900X3C/900X3D/900X3E/900X4C/900X4D) laptops where the native
backlight interface doesn't work while the ACPI based one does
(Jens Reyer).
- Make the PNP sybsystem's core code use __request_region() followed
by __release_region() instead of __check_region() which then will
allow us to get rid of the latter as it has no more users (Jakub
Sitnicki).
- Fix a build breakage and an issue with two __init functions that
may be called after initialization in the s3c cpufreq driver (Arnd
Bergmann).
- Make the powernv cpuidle driver read target_residency values for
idle states from a Device Tree (as we have the suitable DT bindings
for that now) and improve the parsing of the powermgmt DT node in
that driver (Preeti U Murthy)"
* tag 'pm+acpi-3.20-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: powernv: Avoid endianness conversions while parsing DT
cpufreq: s3c: remove last use of resume_clocks callback
cpufreq: s3c: remove incorrect __init annotations
ACPI / LPSS: Deassert resets for SPI host controllers on Braswell
ACPI / LPSS: Always disable I2C host controllers
ACPI / resources: Change pr_info() to pr_debug() for debug information
ACPI / video: Disable native backlight on Samsung Series 9 laptops
cpuidle: powernv: Read target_residency value of idle states from DT if available
Revert "ACPI / LPSS: Remove non-existing clock control from Intel Lynxpoint I2C"
ACPI / EC: Remove non-standard log emphasis
PNP: Switch from __check_region() to __request_region()
|
| | \ \ \ | |
| | \ \ \ | |
| | \ \ \ | |
| | \ \ \ | |
| |\ \ \ \ \ \
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
* pnp:
PNP: Switch from __check_region() to __request_region()
* pm-cpuidle:
cpuidle: powernv: Avoid endianness conversions while parsing DT
cpuidle: powernv: Read target_residency value of idle states from DT if available
* pm-cpufreq:
cpufreq: s3c: remove last use of resume_clocks callback
cpufreq: s3c: remove incorrect __init annotations
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Commit 32726d2d550 ("ARM: SAMSUNG: Remove legacy clock code")
already removed the callback pointer, but there was one remaining
user:
drivers/cpufreq/s3c24xx-cpufreq.c: In function 's3c_cpufreq_resume_clocks':
drivers/cpufreq/s3c24xx-cpufreq.c:149:14: error: 'struct s3c_cpufreq_info' has no member named 'resume_clocks'
cpu_cur.info->resume_clocks();
^
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 32726d2d550 ("ARM: SAMSUNG: Remove legacy clock code")
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.17+ <stable@vger.kernel.org> # v3.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The two functions s3c2416_cpufreq_driver_init and s3c_cpufreq_register
are marked init but are called from a context that might be run after
the __init sections are discarded, as the compiler points out:
WARNING: vmlinux.o(.data+0x1ad9dc): Section mismatch in reference from the variable s3c2416_cpufreq_driver to the function .init.text:s3c2416_cpufreq_driver_init()
WARNING: drivers/built-in.o(.text+0x35b5dc): Section mismatch in reference from the function s3c2410a_cpufreq_add() to the function .init.text:s3c_cpufreq_register()
This removes the __init markings.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
We currently read the information about idle states from the DT
so as to populate the cpuidle table. Use those APIs to read from
the DT that can avoid endianness conversions of the property values
in the cpuidle driver.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
available
The device tree now exposes the residency values for different idle states. Read
these values instead of calculating residency from the latency values. The values
exposed in the DT are validated for optimal power efficiency. However to maintain
compatibility with the older firmware code which does not expose residency
values, use default values as a fallback mechanism. While at it, use better
APIs to parse the powermgmt device tree node.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Stewart Smith <stewart@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
PNP core is the last user of the __check_region() which has been
deprecated for almost 12 years (since v2.5.54). Replace it with a combo
of __request_region() followed by __release_region().
pnp_check_port() and pnp_check_mem() remain racy after this change.
Signed-off-by: Jakub Sitnicki <jsitnicki@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | | | | | | | |
| | \ \ \ \ \ \ | |
| | \ \ \ \ \ \ | |
| | \ \ \ \ \ \ | |
| | \ \ \ \ \ \ | |
| | \ \ \ \ \ \ | |
| |\ \ \ \ \ \ \ \ \ \
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
* acpi-ec:
ACPI / EC: Remove non-standard log emphasis
* acpi-soc:
ACPI / LPSS: Deassert resets for SPI host controllers on Braswell
ACPI / LPSS: Always disable I2C host controllers
Revert "ACPI / LPSS: Remove non-existing clock control from Intel Lynxpoint I2C"
* acpi-video:
ACPI / video: Disable native backlight on Samsung Series 9 laptops
* acpi-resources:
ACPI / resources: Change pr_info() to pr_debug() for debug information
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Annoying and noisy ACPI debug messages are printed with pr_info()
after the recent ACPI resources handling rework. Replace the
pr_info() with pr_debug() to reduce to noise level.
Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Add video_disable_native_backlight quirk for SAMSUNG 900X3C/900X3D/
900X3E/900X4C/900X4D laptops.
The native intel backlight controls do not work correctly on SAMSUNG
Series 9 (900X3C/900X3D/900X3E/900X4C/900X4D) laptops:
One machine has an completely dimmed (= black) display after boot at the
GDM login screen and brightness controls work only between 0 and 5%
(= no effect).
Another machine has the same brightness control issues if an external
HDMI monitor is or gets connected, although the initial brightness is
ok.
After login to Gnome both machines always work fine.
Tested on both machines.
Link: https://bugs.freedesktop.org/show_bug.cgi?id=87286
Link: https://bugs.debian.org/772440
Signed-off-by: Jens Reyer <jens.reyer@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
On some Braswell systems BIOS leaves resets for SPI host controllers
active. This prevents the SPI driver from transferring messages on wire.
Fix this in similar way that we do for I2C already by deasserting resets
for the SPI host controllers.
Reported-by: Yang A Fang <yang.a.fang@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: 3.17+ <stable@vger.kernel.org> # 3.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
On Baytrail and Braswell the BIOS might leave the I2C host controllers
enabled, probably because it uses them for its own purposes. This is fine
in normal cases because the I2C driver will disable the hardware when it
is probed anyway.
However, in case of suspend to disk it is different story. If the driver
happens to be compiled as a module the boot kernel never loads the driver
thus leaving host controllers enabled upon loading the hibernation image.
The I2C host controller interrupt mask register has default value of 0x8ff,
in other words it has most of the interrupts unmasked. When combined with
the fact that the host controller is enabled, the driver immediately starts
getting interrupts even before its resume hook is called (once IO-APIC is
resumed). Since the driver is not prepared for this it will crash the
kernel due to NULL pointer derefence because dev->msgs is NULL.
Unfortunately we were not able to get full backtrace to from the console
which could be reproduced here.
In order to fix this even when the driver is compiled as module, we disable
the I2C host controllers in byt_i2c_setup() before devices are created.
Reported-by: Yu Chen <yu.c.chen@intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: 3.17+ <stable@vger.kernel.org> # 3.17+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Revert commit b893e80e3147 ("ACPI / LPSS: Remove non-existing clock control
from Intel Lynxpoint I2C") because it causes touchpad to not load on Dell
XPS13.
Regression is a clear indication that not only some early prototype version
of Lynxpoint I2C but also newer versions can be doing clock gating even
documentation does not state it.
Therefore it is best to revert since this clock gating haven't caused known
issues on those Lynxpoint version which don't do clock gating.
Reported-by-and-tested-by: Chris Rorvick <chris@rorvick.com>
Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | |
| | | | | | | | | | | | |
Remove unusual pr_info() visual emphasis introduced in ad479e7f47ca
"ACPI / EC: Introduce STARTED/STOPPED flags to replace BLOCKED flag".
Signed-off-by: Scot Doyle <lkml14@scotdoyle.com>
[ rjw: Change pr_info() to pr_debug() too in those places. ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|\ \ \ \ \ \ \ \ \ \ \ \
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | |
| | | | | | | | | | | | | |
Pull followup block layer updates from Jens Axboe:
"Two things in this pull request:
- A block throttle oops fix (marked for stable) from Thadeu.
- The NVMe fixes/features queued up for 3.20, but merged later in the
process. From Keith. We should have gotten this merged earlier,
we're ironing out the kinks in the process. Will be ready for the
initial pull next series"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-throttle: check stats_cpu before reading it from sysfs
NVMe: Fix potential corruption on sync commands
NVMe: Remove unused variables
NVMe: Fix scsi mode select llbaa setting
NVMe: Fix potential corruption during shutdown
NVMe: Asynchronous controller probe
NVMe: Register management handle under nvme class
NVMe: Update SCSI Inquiry VPD 83h translation
NVMe: Metadata format support
|
| |\ \ \ \ \ \ \ \ \ \ \ \
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
into for-linus
Merge 3.20 NVMe changes from Keith.
|
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
This makes all sync commands uninterruptible and schedules without timeout
so the controller either has to post a completion or the timeout recovery
fails the command. This fixes potential memory or data corruption from
a command timing out too early or woken by a signal. Previously any DMA
buffers mapped for that command would have been released even though we
don't know what the controller is planning to do with those addresses.
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
We don't track queues in a llist, subscribe to hot-cpu notifications,
or internally retry commands. Delete the unused artifacts.
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
It should be a logical bitwise AND, not conditional.
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
The driver has to end unreturned commands at some point even if the
controller has not provided a completion. The driver tried to be safe by
deleting IO queues prior to ending all unreturned commands. That should
cause the controller to internally abort inflight commands, but IO queue
deletion request does not have to be successful, so all bets are off. We
still have to make progress, so to be extra safe, this patch doesn't
clear a queue to release the dma mapping for a command until after the
pci device has been disabled.
This patch removes the special handling during device initialization
so controller recovery can be done all the time. This is possible since
initialization is not inlined with pci probe anymore.
Reported-by: Nilish Choudhury <nilesh.choudhury@oracle.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
This performs the longest parts of nvme device probe in scheduled work.
This speeds up probe significantly when multiple devices are in use.
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
This creates a new class type for nvme devices to register their
management character devices with. This is so we do not rely on miscdev
to provide enough minors for as many nvme devices some people plan to
use. The previous limit was approximately 60 NVMe controllers, depending
on the platform and kernel. Now the limit is 1M, which ought to be enough
for anybody.
Since we have a new device class, it makes sense to attach the block
devices under this as well, so part of this patch moves the management
handle initialization prior to the namespaces discovery.
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
The original translation created collisions on Inquiry VPD 83 for many
existing devices. Newer specifications provide other ways to translate
based on the device's version can be used to create unique identifiers.
Version 1.1 provides an EUI64 field that uniquely identifies each
namespace, and 1.2 added the longer NGUID field for the same reason.
Both follow the IEEE EUI format and readily translate to the SCSI device
identification EUI designator type 2h. For devices implementing either,
the translation will use this type, defaulting to the EUI64 8-byte type if
implemented then NGUID's 16 byte version if not. If neither are provided,
the 1.0 translation is used, and is updated to use the SCSI String format
to guarantee a unique identifier.
Knowing when to use the new fields depends on the nvme controller's
revision. The NVME_VS macro was not decoding this correctly, so that is
fixed in this patch and moved to a more appropriate place.
Since the Identify Namespace structure required an update for the NGUID
field, this patch adds the remaining new 1.2 fields to the structure.
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
Adds support for NVMe metadata formats and exposes block devices for
all namespaces regardless of their format. Namespace formats that are
unusable will have disk capacity set to 0, but a handle to the block
device is created to simplify device management. A namespace is not
usable when the format requires host interleave block and metadata in
single buffer, has no provisioned storage, or has better data but failed
to register with blk integrity.
The namespace has to be scanned in two phases to support separate
metadata formats. The first establishes the sector size and capacity
prior to invoking add_disk. If metadata is required, the capacity will
be temporarilly set to 0 until it can be revalidated and registered with
the integrity extenstions after add_disk completes.
The driver relies on the integrity extensions to provide the metadata
buffer. NVMe requires this be a single physically contiguous region,
so only one integrity segment is allowed per command. If the metadata
is used for T10 PI, the driver provides mappings to save and restore
the reftag physical block translation. The driver provides no-op
functions for generate and verify if metadata is not used for protection
information. This way the setup is always provided by the block layer.
If a request does not supply a required metadata buffer, the command
is failed with bad address. This could only happen if a user manually
disables verify/generate on such a disk. The only exception to where
this is okay is if the controller is capable of stripping/generating
the metadata, which is possible on some types of formats.
The metadata scatter gather list now occupies the spot in the nvme_iod
that used to be used to link retryable IOD's, but we don't do that
anymore, so the field was unused.
Signed-off-by: Keith Busch <keith.busch@intel.com>
|
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
When reading blkio.throttle.io_serviced in a recently created blkio
cgroup, it's possible to race against the creation of a throttle policy,
which delays the allocation of stats_cpu.
Like other functions in the throttle code, just checking for a NULL
stats_cpu prevents the following oops caused by that race.
[ 1117.285199] Unable to handle kernel paging request for data at address 0x7fb4d0020
[ 1117.285252] Faulting instruction address: 0xc0000000003efa2c
[ 1137.733921] Oops: Kernel access of bad area, sig: 11 [#1]
[ 1137.733945] SMP NR_CPUS=2048 NUMA PowerNV
[ 1137.734025] Modules linked in: bridge stp llc kvm_hv kvm binfmt_misc autofs4
[ 1137.734102] CPU: 3 PID: 5302 Comm: blkcgroup Not tainted 3.19.0 #5
[ 1137.734132] task: c000000f1d188b00 ti: c000000f1d210000 task.ti: c000000f1d210000
[ 1137.734167] NIP: c0000000003efa2c LR: c0000000003ef9f0 CTR: c0000000003ef980
[ 1137.734202] REGS: c000000f1d213500 TRAP: 0300 Not tainted (3.19.0)
[ 1137.734230] MSR: 9000000000009032 <SF,HV,EE,ME,IR,DR,RI> CR: 42008884 XER: 20000000
[ 1137.734325] CFAR: 0000000000008458 DAR: 00000007fb4d0020 DSISR: 40000000 SOFTE: 0
GPR00: c0000000003ed3a0 c000000f1d213780 c000000000c59538 0000000000000000
GPR04: 0000000000000800 0000000000000000 0000000000000000 0000000000000000
GPR08: ffffffffffffffff 00000007fb4d0020 00000007fb4d0000 c000000000780808
GPR12: 0000000022000888 c00000000fdc0d80 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 000001003e120200 c000000f1d5b0cc0 0000000000000200 0000000000000000
GPR24: 0000000000000001 c000000000c269e0 0000000000000020 c000000f1d5b0c80
GPR28: c000000000ca3a08 c000000000ca3dec c000000f1c667e00 c000000f1d213850
[ 1137.734886] NIP [c0000000003efa2c] .tg_prfill_cpu_rwstat+0xac/0x180
[ 1137.734915] LR [c0000000003ef9f0] .tg_prfill_cpu_rwstat+0x70/0x180
[ 1137.734943] Call Trace:
[ 1137.734952] [c000000f1d213780] [d000000005560520] 0xd000000005560520 (unreliable)
[ 1137.734996] [c000000f1d2138a0] [c0000000003ed3a0] .blkcg_print_blkgs+0xe0/0x1a0
[ 1137.735039] [c000000f1d213960] [c0000000003efb50] .tg_print_cpu_rwstat+0x50/0x70
[ 1137.735082] [c000000f1d2139e0] [c000000000104b48] .cgroup_seqfile_show+0x58/0x150
[ 1137.735125] [c000000f1d213a70] [c0000000002749dc] .kernfs_seq_show+0x3c/0x50
[ 1137.735161] [c000000f1d213ae0] [c000000000218630] .seq_read+0xe0/0x510
[ 1137.735197] [c000000f1d213bd0] [c000000000275b04] .kernfs_fop_read+0x164/0x200
[ 1137.735240] [c000000f1d213c80] [c0000000001eb8e0] .__vfs_read+0x30/0x80
[ 1137.735276] [c000000f1d213cf0] [c0000000001eb9c4] .vfs_read+0x94/0x1b0
[ 1137.735312] [c000000f1d213d90] [c0000000001ebb38] .SyS_read+0x58/0x100
[ 1137.735349] [c000000f1d213e30] [c000000000009218] syscall_exit+0x0/0x98
[ 1137.735383] Instruction dump:
[ 1137.735405] 7c6307b4 7f891800 409d00b8 60000000 60420000 3d420004 392a63b0 786a1f24
[ 1137.735471] 7d49502a e93e01c8 7d495214 7d2ad214 <7cead02a> e9090008 e9490010 e9290018
And here is one code that allows to easily reproduce this, although this
has first been found by running docker.
void run(pid_t pid)
{
int n;
int status;
int fd;
char *buffer;
buffer = memalign(BUFFER_ALIGN, BUFFER_SIZE);
n = snprintf(buffer, BUFFER_SIZE, "%d\n", pid);
fd = open(CGPATH "/test/tasks", O_WRONLY);
write(fd, buffer, n);
close(fd);
if (fork() > 0) {
fd = open("/dev/sda", O_RDONLY | O_DIRECT);
read(fd, buffer, 512);
close(fd);
wait(&status);
} else {
fd = open(CGPATH "/test/blkio.throttle.io_serviced", O_RDONLY);
n = read(fd, buffer, BUFFER_SIZE);
close(fd);
}
free(buffer);
exit(0);
}
void test(void)
{
int status;
mkdir(CGPATH "/test", 0666);
if (fork() > 0)
wait(&status);
else
run(getpid());
rmdir(CGPATH "/test");
}
int main(int argc, char **argv)
{
int i;
for (i = 0; i < NR_TESTS; i++)
test();
return 0;
}
Reported-by: Ricardo Marin Matinata <rmm@br.ibm.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|\ \ \ \ \ \ \ \ \ \ \ \ \ \
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull more device mapper changes from Mike Snitzer:
- Significant dm-crypt CPU scalability performance improvements thanks
to changes that enable effective use of an unbound workqueue across
all available CPUs. A large battery of tests were performed to
validate these changes, summary of results is available here:
https://www.redhat.com/archives/dm-devel/2015-February/msg00106.html
- A few additional stable fixes (to DM core, dm-snapshot and dm-mirror)
and a small fix to the dm-space-map-disk.
* tag 'dm-3.20-changes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm snapshot: fix a possible invalid memory access on unload
dm: fix a race condition in dm_get_md
dm crypt: sort writes
dm crypt: add 'submit_from_crypt_cpus' option
dm crypt: offload writes to thread
dm crypt: remove unused io_pool and _crypt_io_pool
dm crypt: avoid deadlock in mempools
dm crypt: don't allocate pages for a partial request
dm crypt: use unbound workqueue for request processing
dm io: reject unsupported DISCARD requests with EOPNOTSUPP
dm mirror: do not degrade the mirror on discard error
dm space map disk: fix sm_disk_count_is_more_than_one()
|
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | |
When the snapshot target is unloaded, snapshot_dtr() waits until
pending_exceptions_count drops to zero. Then, it destroys the snapshot.
Therefore, the function that decrements pending_exceptions_count
should not touch the snapshot structure after the decrement.
pending_complete() calls free_pending_exception(), which decrements
pending_exceptions_count, and then it performs up_write(&s->lock) and it
calls retry_origin_bios() which dereferences s->origin. These two
memory accesses to the fields of the snapshot may touch the dm_snapshot
struture after it is freed.
This patch moves the call to free_pending_exception() to the end of
pending_complete(), so that the snapshot will not be destroyed while
pending_complete() is in progress.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
|
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | |
The function dm_get_md finds a device mapper device with a given dev_t,
increases the reference count and returns the pointer.
dm_get_md calls dm_find_md, dm_find_md takes _minor_lock, finds the
device, tests that the device doesn't have DMF_DELETING or DMF_FREEING
flag, drops _minor_lock and returns pointer to the device. dm_get_md then
calls dm_get. dm_get calls BUG if the device has the DMF_FREEING flag,
otherwise it increments the reference count.
There is a possible race condition - after dm_find_md exits and before
dm_get is called, there are no locks held, so the device may disappear or
DMF_FREEING flag may be set, which results in BUG.
To fix this bug, we need to call dm_get while we hold _minor_lock. This
patch renames dm_find_md to dm_get_md and changes it so that it calls
dm_get while holding the lock.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
|
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | |
Write requests are sorted in a red-black tree structure and are
submitted in the sorted order.
In theory the sorting should be performed by the underlying disk
scheduler, however, in practice the disk scheduler only accepts and
sorts a finite number of requests. To allow the sorting of all
requests, dm-crypt needs to implement its own sorting.
The overhead associated with rbtree-based sorting is considered
negligible so it is not used conditionally. Even on SSD sorting can be
beneficial since in-order request dispatch promotes lower latency IO
completion to the upper layers.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | |
Make it possible to disable offloading writes by setting the optional
'submit_from_crypt_cpus' table argument.
There are some situations where offloading write bios from the
encryption threads to a single thread degrades performance
significantly.
The default is to offload write bios to the same thread because it
benefits CFQ to have writes submitted using the same IO context.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | |
Submitting write bios directly in the encryption thread caused serious
performance degradation. On a multiprocessor machine, encryption requests
finish in a different order than they were submitted. Consequently, write
requests would be submitted in a different order and it could cause severe
performance degradation.
Move the submission of write requests to a separate thread so that the
requests can be sorted before submitting. But this commit improves
dm-crypt performance even without having dm-crypt perform request
sorting (in particular it enables IO schedulers like CFQ to sort more
effectively).
Note: it is required that a previous commit ("dm crypt: don't allocate
pages for a partial request") be applied before applying this patch.
Otherwise, this commit could introduce a crash.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | |
| | | | | | | | | | | | | | | |
The previous commit ("dm crypt: don't allocate pages for a partial
request") stopped using the io_pool slab mempool and backing
_crypt_io_pool kmem cache.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|