Commit log entries: subject (author, date; files changed, -lines/+lines)
* clk: tegra: Add the DFLL as a possible parent of the cclk_g clock (Tuomas Tynkkynen, 2015-07-16; 1 file, -1/+3)
  The DFLL clocksource was missing from the list of possible parents for the fast CPU cluster. Add it to the list.
  Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
  Signed-off-by: Mikko Perttunen <mikko.perttunen@kapsi.fi>
  Acked-by: Michael Turquette <mturquette@linaro.org>
  Signed-off-by: Thierry Reding <treding@nvidia.com>
* clk: tegra: Save/restore CCLKG_BURST_POLICY on suspend (Tuomas Tynkkynen, 2015-07-16; 1 file, -0/+14)
  Save and restore this register since the LP1 restore assembly routines fiddle with it. Otherwise the CPU would keep running on PLLX after resume from suspend even when DFLL was the original clocksource.
  Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
  Signed-off-by: Mikko Perttunen <mikko.perttunen@kapsi.fi>
  Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
  Acked-by: Michael Turquette <mturquette@linaro.org>
  Signed-off-by: Thierry Reding <treding@nvidia.com>
* clk: tegra: Add Tegra124 DFLL clocksource platform driver (Tuomas Tynkkynen, 2015-07-16; 4 files, -4/+172)
  Add basic platform driver support for the fast CPU cluster DFLL clocksource found on Tegra124 SoCs. This small driver selects the appropriate Tegra124-specific characterization data and integration code. It relies on the DFLL common code to do most of the work.
  Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
  Signed-off-by: Mikko Perttunen <mikko.perttunen@kapsi.fi>
  Acked-by: Michael Turquette <mturquette@linaro.org>
  [treding@nvidia.com: move setup code into ->probe()]
  Signed-off-by: Thierry Reding <treding@nvidia.com>
* clk: tegra: Add DFLL DVCO reset control for Tegra124 (Paul Walmsley, 2015-07-16; 2 files, -0/+80)
  The DVCO present in the DFLL IP block has a separate reset line, exposed via the CAR IP block. This reset line is asserted upon SoC reset. Unless something (such as the DFLL driver) deasserts this line, the DVCO will not oscillate, although reads and writes to the DFLL IP block will complete.
  Thanks to Aleksandr Frid <afrid@nvidia.com> for identifying this and saving hours of debugging time.
  Signed-off-by: Paul Walmsley <pwalmsley@nvidia.com>
  [ttynkkynen: ported to tegra124 from tegra114]
  Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
  [mikko.perttunen: ported to special reset callback]
  Signed-off-by: Mikko Perttunen <mikko.perttunen@kapsi.fi>
  Acked-by: Michael Turquette <mturquette@linaro.org>
  Signed-off-by: Thierry Reding <treding@nvidia.com>
* clk: tegra: Introduce ability for SoC-specific reset control callbacks (Mikko Perttunen, 2015-07-16; 2 files, -8/+34)
  This patch allows SoC-specific CAR initialization routines to register their own reset_assert and reset_deassert callbacks with the common Tegra CAR code. If defined, the common code calls these callbacks whenever a reset control with number >= num_periph_banks * 32 is asserted or deasserted, respectively. Numbers greater than or equal to num_periph_banks * 32 are used to avoid clashes with low numbers, which are automatically mapped to standard CAR reset lines. Each SoC with these special resets should specify the defined reset control numbers in a device tree header file.
  Signed-off-by: Mikko Perttunen <mikko.perttunen@kapsi.fi>
  Acked-by: Michael Turquette <mturquette@linaro.org>
  Signed-off-by: Thierry Reding <treding@nvidia.com>
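Aside: a minimal user-space sketch of the ID-dispatch scheme described above. It is illustrative only; the constant and function names (NUM_PERIPH_BANKS, soc_special_assert, and so on) are assumptions and do not come from the Tegra driver.

    /* Sketch only: route reset IDs below num_periph_banks * 32 to the standard
     * CAR banks, and higher IDs to an SoC-specific callback. */
    #include <stdio.h>

    #define NUM_PERIPH_BANKS 6UL   /* assumed value, for illustration only */

    static void soc_special_assert(unsigned long id)
    {
        printf("special reset %lu handled by SoC-specific callback\n", id);
    }

    static void car_assert(unsigned long id)
    {
        printf("standard CAR reset: bank %lu, bit %lu\n", id / 32, id % 32);
    }

    static void tegra_reset_assert(unsigned long id)
    {
        if (id >= NUM_PERIPH_BANKS * 32)
            soc_special_assert(id);    /* e.g. the DFLL DVCO reset */
        else
            car_assert(id);
    }

    int main(void)
    {
        tegra_reset_assert(17);                      /* bank 0, bit 17 */
        tegra_reset_assert(NUM_PERIPH_BANKS * 32);   /* first "special" ID */
        return 0;
    }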
* clk: tegra: Add functions for parsing CVB tables (Tuomas Tynkkynen, 2015-07-16; 3 files, -0/+208)
  Tegra CVB tables encode the relationship between operating voltage and optimal frequency as a function of the so-called speedo value. The speedo value is written to the on-chip fuses at the factory, which allows the voltage-frequency operating points to be calculated on a per-chip basis.
  Add utility functions to parse the Tegra-specific tables and export the voltage-frequency pairs to the generic OPP framework for other drivers to use.
  Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
  Signed-off-by: Mikko Perttunen <mikko.perttunen@kapsi.fi>
  Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
  Acked-by: Michael Turquette <mturquette@linaro.org>
  Signed-off-by: Thierry Reding <treding@nvidia.com>
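Aside: a rough sketch of how a CVB-style table entry might be evaluated, assuming a simple polynomial-in-speedo form, rounding up to the regulator step, and clamping to rail limits. The coefficients, scaling, and values below are invented for illustration and are not the Tegra characterization data.

    /* Hypothetical CVB-style lookup: voltage as a quadratic in the per-chip
     * speedo value. All numbers are made up for illustration. */
    #include <stdio.h>

    struct cvb_entry {
        long freq_khz;
        long c0, c1, c2;   /* millivolt coefficients */
    };

    static long cvb_millivolts(const struct cvb_entry *e, long speedo,
                               long step_mv, long min_mv, long max_mv)
    {
        long mv = e->c0 + e->c1 * speedo + e->c2 * speedo * speedo;

        mv = ((mv + step_mv - 1) / step_mv) * step_mv;  /* round up to step */
        if (mv < min_mv) mv = min_mv;
        if (mv > max_mv) mv = max_mv;
        return mv;
    }

    int main(void)
    {
        struct cvb_entry entry = { .freq_khz = 1044000, .c0 = 800, .c1 = 1, .c2 = 0 };
        long speedo = 120;   /* on real hardware this comes from the fuses */

        printf("%ld kHz -> %ld mV\n", entry.freq_khz,
               cvb_millivolts(&entry, speedo, 10, 700, 1250));
        return 0;
    }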
* clk: tegra: Add closed loop support for the DFLL (Tuomas Tynkkynen, 2015-07-16; 1 file, -3/+663)
  With closed loop support, the clock rate of the DFLL can be adjusted. The oscillator itself in the DFLL is a free-running oscillator whose rate is directly determined by the supply voltage. However, the DFLL module contains logic to compare the DFLL output rate to a fixed reference clock (51 MHz) and make a decision to either lower or raise the DFLL supply voltage. The DFLL module can then autonomously change the supply voltage by communicating with an off-chip PMIC via either I2C or PWM signals. This driver currently supports only I2C.
  Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
  Signed-off-by: Mikko Perttunen <mikko.perttunen@kapsi.fi>
  Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
  Acked-by: Michael Turquette <mturquette@linaro.org>
  Signed-off-by: Thierry Reding <treding@nvidia.com>
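Aside: a conceptual model of the closed-loop behaviour described above, written as user-space C. The real loop runs in hardware and talks to the PMIC over I2C; the rate model, voltage steps, and names here are assumptions made purely to show the compare-and-step idea.

    /* Toy closed-loop model: compare measured rate to the target and step a
     * hypothetical PMIC voltage selector up or down. Not driver code. */
    #include <stdio.h>

    static int vsel = 10;   /* pretend PMIC voltage-selector register */

    static long measure_rate(void)   /* stand-in for the DFLL rate counter */
    {
        return 900000000L + vsel * 20000000L;
    }

    static void dfll_loop_step(long target_hz)
    {
        long rate = measure_rate();

        if (rate < target_hz)
            vsel++;   /* would request a higher voltage over I2C */
        else if (rate > target_hz)
            vsel--;   /* would request a lower voltage over I2C */
    }

    int main(void)
    {
        long target = 1200000000L;

        for (int i = 0; i < 20; i++)
            dfll_loop_step(target);
        printf("settled at vsel=%d, rate=%ld Hz\n", vsel, measure_rate());
        return 0;
    }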
* clk: tegra: Add library for the DFLL clock source (open-loop mode) (Tuomas Tynkkynen, 2015-07-16; 3 files, -0/+1150)
  Add shared code to support the Tegra DFLL clocksource in open-loop mode. This root clocksource is present on the Tegra124 SoCs. The DFLL is the intended primary clock source for the fast CPU cluster.
  This code is very closely based on a patch by Paul Walmsley from December (http://comments.gmane.org/gmane.linux.ports.tegra/15273), which in turn comes from the internal driver originally created by Aleksandr Frid <afrid@nvidia.com>.
  Subsequent patches will add support for closed loop mode and drivers for the Tegra124 fast CPU cluster DFLL devices, which rely on this code.
  Signed-off-by: Paul Walmsley <pwalmsley@nvidia.com>
  Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
  Signed-off-by: Mikko Perttunen <mikko.perttunen@kapsi.fi>
  Acked-by: Peter De Schrijver <pdeschrijver@nvidia.com>
  Acked-by: Michael Turquette <mturquette@linaro.org>
  Signed-off-by: Thierry Reding <treding@nvidia.com>
* clk: tegra: Add binding for the Tegra124 DFLL clocksource (Tuomas Tynkkynen, 2015-07-16; 1 file, -0/+79)
  The DFLL is the main clocksource for the fast CPU cluster on Tegra124 and also provides automatic CPU rail voltage scaling. The DFLL is a separate IP block from the usual Tegra124 clock-and-reset controller, so it gets its own node in the device tree.
  Signed-off-by: Tuomas Tynkkynen <ttynkkynen@nvidia.com>
  Signed-off-by: Mikko Perttunen <mikko.perttunen@kapsi.fi>
  Acked-by: Michael Turquette <mturquette@linaro.org>
  Signed-off-by: Thierry Reding <treding@nvidia.com>
* Linux 4.2-rc1 [tag: v4.2-rc1] (Linus Torvalds, 2015-07-05; 1 file, -2/+2)
* Merge tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 (Linus Torvalds, 2015-07-05; 7 files, -45/+1004)
  Pull late x86 platform driver updates from Darren Hart:
   "The following came in a bit later and I wanted them to bake in next a few more days before submitting, thus the second pull. A new intel_pmc_ipc driver, a symmetrical allocation and free fix in dell-laptop, a couple minor fixes, and some updated documentation in the dell-laptop comments.

    intel_pmc_ipc:
     - Add Intel Apollo Lake PMC IPC driver

    tc1100-wmi:
     - Delete an unnecessary check before the function call "kfree"

    dell-laptop:
     - Fix allocating & freeing SMI buffer page
     - Show info about WiGig and UWB in debugfs
     - Update information about wireless control"

  * tag 'platform-drivers-x86-v4.2-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
    intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver
    tc1100-wmi: Delete an unnecessary check before the function call "kfree"
    dell-laptop: Fix allocating & freeing SMI buffer page
    dell-laptop: Show info about WiGig and UWB in debugfs
    dell-laptop: Update information about wireless control
| * intel_pmc_ipc: Add Intel Apollo Lake PMC IPC driver (qipeng.zha, 2015-06-29; 5 files, -0/+864)
    This driver provides support for PMC control on Apollo Lake platforms. The PMC is an ARC processor which defines some IPC commands for communication with other entities in the CPU.
    Signed-off-by: qipeng.zha <qipeng.zha@intel.com>
    [fengguang.wu@intel.com: Fix Sparse and Coccinelle warnings]
    Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
    Signed-off-by: Darren Hart <dvhart@linux.intel.com>
| * tc1100-wmi: Delete an unnecessary check before the function call "kfree" (Markus Elfring, 2015-06-26; 1 file, -1/+1)
    The kfree() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed.
    This issue was detected by using the Coccinelle software.
    Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
    Signed-off-by: Darren Hart <dvhart@linux.intel.com>
| * dell-laptop: Fix allocating & freeing SMI buffer page (Pali Rohár, 2015-06-24; 1 file, -5/+3)
    This commit fixes a kernel crash when probing for rfkill devices in the dell-laptop driver failed. The free_page() function was incorrectly called on a struct page * instead of the virtual address of the SMI buffer.
    This commit also simplifies allocating the page for the SMI buffer by using the __get_free_page() function instead of the sequential calls to alloc_page() and page_address().
    Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
    Acked-by: Michal Hocko <mhocko@suse.cz>
    Cc: stable@vger.kernel.org
    Signed-off-by: Darren Hart <dvhart@linux.intel.com>
| * dell-laptop: Show info about WiGig and UWB in debugfs (Pali Rohár, 2015-06-22; 1 file, -0/+17)
    This commit shows additional information about rfkill state in debugfs, based on newly released documentation from Dell.
    Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
    Signed-off-by: Darren Hart <dvhart@linux.intel.com>
| * dell-laptop: Update information about wireless control (Pali Rohár, 2015-06-22; 1 file, -39/+119)
    Make sure that all existing SMBIOS calls for wireless control are properly documented. This commit also adds new documentation released by Dell.
    Signed-off-by: Pali Rohár <pali.rohar@gmail.com>
    Signed-off-by: Darren Hart <dvhart@linux.intel.com>
* | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs (Linus Torvalds, 2015-07-04; 99 files, -553/+784)
    Pull more vfs updates from Al Viro:
     "Assorted VFS fixes and related cleanups (IMO the most interesting in that part are f_path-related things and Eric's descriptor-related stuff). UFS regression fixes (it got broken last cycle). 9P fixes. fs-cache series, DAX patches, Jan's file_remove_suid() work"

    [ I'd say this is much more than "fixes and related cleanups". The file_table locking rule change by Eric Dumazet is a rather big and fundamental update even if the patch isn't huge. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
      9p: cope with bogus responses from server in p9_client_{read,write}
      p9_client_write(): avoid double p9_free_req()
      9p: forgetting to cancel request on interrupted zero-copy RPC
      dax: bdev_direct_access() may sleep
      block: Add support for DAX reads/writes to block devices
      dax: Use copy_from_iter_nocache
      dax: Add block size note to documentation
      fs/file.c: __fget() and dup2() atomicity rules
      fs/file.c: don't acquire files->file_lock in fd_install()
      fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
      vfs: avoid creation of inode number 0 in get_next_ino
      namei: make set_root_rcu() return void
      make simple_positive() public
      ufs: use dir_pages instead of ufs_dir_pages()
      pagemap.h: move dir_pages() over there
      remove the pointless include of lglock.h
      fs: cleanup slight list_entry abuse
      xfs: Correctly lock inode when removing suid and file capabilities
      fs: Call security_ops->inode_killpriv on truncate
      fs: Provide function telling whether file_remove_privs() will do anything
      ...
| * | 9p: cope with bogus responses from server in p9_client_{read,write} (Al Viro, 2015-07-04; 1 file, -0/+8)
      If the server claims to have written/read more than we'd told it to, warn and cap the claimed byte count to avoid advancing more than we are ready to.
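Aside: a tiny generic illustration of the cap-and-warn pattern this commit describes (plain C, not the 9p code itself):

    /* Never trust a peer-reported count to exceed what we actually asked for. */
    #include <stdio.h>
    #include <stddef.h>

    static size_t cap_count(size_t requested, size_t reported)
    {
        if (reported > requested) {
            fprintf(stderr, "warning: peer reported %zu bytes, asked for %zu\n",
                    reported, requested);
            return requested;   /* advance by at most what we asked for */
        }
        return reported;
    }

    int main(void)
    {
        printf("%zu\n", cap_count(4096, 5000));   /* capped to 4096, with a warning */
        printf("%zu\n", cap_count(4096, 1024));   /* 1024 passes through */
        return 0;
    }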
| * | p9_client_write(): avoid double p9_free_req() (Al Viro, 2015-07-04; 1 file, -0/+1)
      Braino in "9p: switch p9_client_write() to passing it struct iov_iter *"; if the response is impossible to parse and we discard the request, get out of the loop right there.
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | 9p: forgetting to cancel request on interrupted zero-copy RPC (Al Viro, 2015-07-04; 1 file, -1/+2)
      If we'd already sent a request and decide to abort it, we *must* issue TFLUSH properly and not just blindly reuse the tag, or we'll get seriously screwed when the response eventually arrives and we confuse it for the response to a later request that had reused the same tag.
      Cc: stable@vger.kernel.org # v3.2 and later
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | dax: bdev_direct_access() may sleep (Matthew Wilcox, 2015-07-04; 1 file, -0/+6)
      The brd driver is the only in-tree driver that may sleep currently. After some discussion on linux-fsdevel, we decided that any driver may choose to sleep in its ->direct_access method. To ensure that all callers of bdev_direct_access() are prepared for this, add a call to might_sleep().
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | block: Add support for DAX reads/writes to block devices (Matthew Wilcox, 2015-07-04; 2 files, -2/+8)
      If a block device supports the ->direct_access methods, bypass the normal DIO path and use DAX to go straight to memcpy() instead of allocating a DIO and a BIO.
      Includes support for the DIO_SKIP_DIO_COUNT flag in DAX, as is done in do_blockdev_direct_IO().
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | dax: Use copy_from_iter_nocache (Matthew Wilcox, 2015-07-04; 1 file, -1/+1)
      When userspace does a write, there's no need for the written data to pollute the CPU cache. This matches the original XIP code.
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | dax: Add block size note to documentation (Matthew Wilcox, 2015-07-04; 1 file, -2/+4)
      For block devices which are small enough, mkfs will default to creating a filesystem with block sizes smaller than page size.
      Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs/file.c: __fget() and dup2() atomicity rules (Eric Dumazet, 2015-07-01; 1 file, -2/+8)
      __fget() does a lockless fetch of the pointer from the descriptor table, attempts to grab a reference and treats "it was already zero" as "it's already gone from the table, we just hadn't seen the store, let's fail". Unfortunately, that breaks the atomicity of dup2() - __fget() might see the old pointer, notice that it's been already dropped and treat that as "it's closed". What we should be getting is either the old file or the new one, depending on whether we come before or after dup2().
      Dmitry had the following test failing sometimes:

        int fd;
        void *Thread(void *x) {
                char buf;
                int n = read(fd, &buf, 1);
                if (n != 1)
                        exit(printf("read failed: n=%d errno=%d\n", n, errno));
                return 0;
        }
        int main() {
                fd = open("/dev/urandom", O_RDONLY);
                int fd2 = open("/dev/urandom", O_RDONLY);
                if (fd == -1 || fd2 == -1)
                        exit(printf("open failed\n"));
                pthread_t th;
                pthread_create(&th, 0, Thread, 0);
                if (dup2(fd2, fd) == -1)
                        exit(printf("dup2 failed\n"));
                pthread_join(th, 0);
                if (close(fd) == -1)
                        exit(printf("close failed\n"));
                if (close(fd2) == -1)
                        exit(printf("close failed\n"));
                printf("DONE\n");
                return 0;
        }

      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs/file.c: don't acquire files->file_lock in fd_install() (Eric Dumazet, 2015-07-01; 3 files, -19/+55)
      Mateusz Guzik reported: currently obtaining a new file descriptor results in locking fdtable twice - once in order to reserve a slot and a second time to fill it.
      Holding the spinlock in __fd_install() is needed in case a resize is done, or to prevent a resize. Mateusz provided an RFC patch and a micro benchmark: http://people.redhat.com/~mguzik/pipebench.c
      A resize is an unlikely operation in a process lifetime, as table size is at least doubled at every resize. We can use RCU instead of the spinlock.
      __fd_install() must wait if a resize is in progress. The resize must block new __fd_install() callers from starting, and wait until ongoing installs are finished (synchronize_sched()). A resize should be attempted by a single thread to not waste resources.
      The rcu_sched variant is used, as __fd_install() and expand_fdtable() run from process context.
      It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R) CPU E5-2696 v2 @ 2.50GHz.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Mateusz Guzik <mguzik@redhat.com>
      Acked-by: Mateusz Guzik <mguzik@redhat.com>
      Tested-by: Mateusz Guzik <mguzik@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
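Aside: a loose user-space analogy for the install-versus-resize exclusion described above. The kernel code uses RCU-sched, not a rwlock; this sketch only shows that installers can run concurrently with each other while a resize excludes them, and every name in it is hypothetical.

    /* Analogy only: many concurrent "installs" share the table, a "resize"
     * excludes them all. Build with -lpthread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
    static int *table;
    static size_t table_size = 4;

    static void fd_install_sim(size_t slot, int value)
    {
        pthread_rwlock_rdlock(&table_lock);   /* installers don't exclude each other */
        table[slot] = value;
        pthread_rwlock_unlock(&table_lock);
    }

    static void expand_table_sim(void)
    {
        pthread_rwlock_wrlock(&table_lock);   /* resize waits for ongoing installs */
        table = realloc(table, table_size * 2 * sizeof(int));
        memset(table + table_size, 0, table_size * sizeof(int));
        table_size *= 2;
        pthread_rwlock_unlock(&table_lock);
    }

    int main(void)
    {
        table = calloc(table_size, sizeof(int));
        fd_install_sim(1, 42);
        expand_table_sim();
        fd_install_sim(5, 43);
        printf("size=%zu table[1]=%d table[5]=%d\n", table_size, table[1], table[5]);
        return 0;
    }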
| * | fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation (Wang YanQing, 2015-07-01; 1 file, -1/+1)
      Concurrent execution of get_anon_bdev() and kernel preemption can both lead to a race condition, so it is not enough to check dev against its upper limit with the equality operator only. This patch fixes it.
      Signed-off-by: Wang YanQing <udknight@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | vfs: avoid creation of inode number 0 in get_next_ino (Carlos Maiolino, 2015-06-30; 1 file, -1/+5)
      Currently, get_next_ino() is able to create inodes with inode number = 0. This has a bad impact on the filesystems relying on this function to generate inode numbers.
      While there is no problem at all in having inodes with number 0, userspace tools which handle file management tasks can have problems handling these files, like, for example, the impossibility for users to delete these files, since glibc will ignore them. So, I believe the best way is for the kernel to avoid creating them.
      This problem has been raised previously, but the old thread didn't have any other update for a year+, and I've seen too many users hitting the same issue regarding the impossibility to delete files while using filesystems relying on this function. So, I'm starting the thread again, with the same patch that I believe is enough to address this problem.
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
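Aside: a generic sketch of a wrap-safe ID counter that never hands out 0. This is a user-space illustration of the idea, not the get_next_ino() implementation (which uses per-CPU batching).

    #include <stdatomic.h>
    #include <stdio.h>

    static _Atomic unsigned int next_ino = 1;

    static unsigned int get_next_id(void)
    {
        unsigned int ino;

        do {
            ino = atomic_fetch_add(&next_ino, 1);
        } while (ino == 0);   /* skip 0 when the counter wraps around */
        return ino;
    }

    int main(void)
    {
        /* Force a wrap quickly for demonstration. */
        atomic_store(&next_ino, 0xfffffffeu);
        for (int i = 0; i < 4; i++)
            printf("%u\n", get_next_id());   /* never prints 0 */
        return 0;
    }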
| * | namei: make set_root_rcu() return void (Al Viro, 2015-06-29; 1 file, -3/+3)
      The only caller that cares about its return value can just as easily pick it from nd->root_seq itself. We used to just calculate it and return to caller, but these days we are storing it in nd->root_seq in all cases.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | make simple_positive() public (Al Viro, 2015-06-23; 14 files, -60/+25)
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | ufs: use dir_pages instead of ufs_dir_pages() (Fabian Frederick, 2015-06-23; 1 file, -9/+4)
      dir_pages was declared in a lot of filesystems. Use the newly added dir_pages() from pagemap.h.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | pagemap.h: move dir_pages() over there (Fabian Frederick, 2015-06-23; 8 files, -38/+6)
      That function was declared in a lot of filesystems to calculate directory pages.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | remove the pointless include of lglock.h (Al Viro, 2015-06-23; 1 file, -1/+0)
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fs: cleanup slight list_entry abuse (Rasmus Villemoes, 2015-06-23; 13 files, -13/+13)
      list_entry is just a wrapper for container_of, but it is arguably wrong (and slightly confusing) to use it when the pointed-to struct member is not a struct list_head. Use container_of directly instead.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | Merge branch 'fscache-fixes' into for-next (Al Viro, 2015-06-23; 11 files, -184/+378)
| | * | FS-Cache: Retain the netfs context in the retrieval op earlier (David Howells, 2015-04-02; 3 files, -11/+12)
        Now that the retrieval operation may be disposed of by fscache_put_operation() before we actually set the context, the retrieval-specific cleanup operation can produce a NULL-pointer dereference when it tries to unconditionally clean up the netfs context.
        Given that it is expected that we'll get at least as far as the place where we currently set the context pointer and it is unlikely we'll go through the error handling paths prior to that point, retain the context right from the point that the retrieval op is allocated.
        Concomitant to this, we need to retain the cookie pointer in the retrieval op also so that we can call the netfs to release its context in the release method.
        In addition, we might now get into fscache_release_retrieval_op() with the op only initialised. To this end, set the operation to DEAD only after the release method has been called and skip the n_pages test upon cleanup if the op is still in the INITIALISED state.
        Without these changes, the following oops might be seen:

          BUG: unable to handle kernel NULL pointer dereference at 00000000000000b8
          ...
          RIP: 0010:[<ffffffffa0089c98>] fscache_release_retrieval_op+0xae/0x100
          ...
          Call Trace:
           [<ffffffffa0088560>] fscache_put_operation+0x117/0x2e0
           [<ffffffffa008b8f5>] __fscache_read_or_alloc_pages+0x351/0x3ac
           [<ffffffffa00b761f>] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
           [<ffffffffa00b06c5>] nfs_readpages+0x10c/0x185 [nfs]
           [<ffffffff81124925>] ? alloc_pages_current+0x119/0x13e
           [<ffffffff810ee5fd>] ? __page_cache_alloc+0xfb/0x10a
           [<ffffffff810f87f8>] __do_page_cache_readahead+0x188/0x22c
           [<ffffffff810f8b3a>] ondemand_readahead+0x29e/0x2af
           [<ffffffff810f8c92>] page_cache_sync_readahead+0x38/0x3a
           [<ffffffff810ef337>] generic_file_read_iter+0x1a2/0x55a
           [<ffffffffa00a9dff>] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
           [<ffffffffa00a6a23>] nfs_file_read+0x49/0x70 [nfs]
           [<ffffffff811363be>] new_sync_read+0x78/0x9c
           [<ffffffff81137164>] __vfs_read+0x13/0x38
           [<ffffffff8113721e>] vfs_read+0x95/0x121
           [<ffffffff811372f6>] SyS_read+0x4c/0x8a
           [<ffffffff81557a52>] system_call_fastpath+0x12/0x17

        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| | * | FS-Cache: The operation cancellation method needs calling in more places (David Howells, 2015-04-02; 6 files, -40/+57)
        Any time an incomplete operation is cancelled, the operation cancellation function needs to be called to clean up. This is currently being passed directly to some of the functions that might want to call it, but not all.
        Instead, pass the cancellation method pointer to the fscache_operation_init() and have that cache it in the operation struct. Further, plug in a dummy cancellation handler if the caller declines to set one, as this allows us to call the function unconditionally (the extra overhead isn't worth bothering about as we don't expect to be calling this typically).
        The cancellation method must thence be called everywhere the CANCELLED state is set. Note that we call it *before* setting the CANCELLED state such that the method can use the old state value to guide its operation.
        fscache_do_cancel_retrieval() needs moving higher up in the sources so that the init function can use it now.
        Without this, the following oops may be seen:

          FS-Cache: Assertion failed
          FS-Cache: 3 == 0 is false
          ------------[ cut here ]------------
          kernel BUG at ../fs/fscache/page.c:261!
          ...
          RIP: 0010:[<ffffffffa0089c1b>] fscache_release_retrieval_op+0x77/0x100
           [<ffffffffa008853d>] fscache_put_operation+0x114/0x2da
           [<ffffffffa008b8c2>] __fscache_read_or_alloc_pages+0x358/0x3b3
           [<ffffffffa00b761f>] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
           [<ffffffffa00b06c5>] nfs_readpages+0x10c/0x185 [nfs]
           [<ffffffff81124925>] ? alloc_pages_current+0x119/0x13e
           [<ffffffff810ee5fd>] ? __page_cache_alloc+0xfb/0x10a
           [<ffffffff810f87f8>] __do_page_cache_readahead+0x188/0x22c
           [<ffffffff810f8b3a>] ondemand_readahead+0x29e/0x2af
           [<ffffffff810f8c92>] page_cache_sync_readahead+0x38/0x3a
           [<ffffffff810ef337>] generic_file_read_iter+0x1a2/0x55a
           [<ffffffffa00a9dff>] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
           [<ffffffffa00a6a23>] nfs_file_read+0x49/0x70 [nfs]
           [<ffffffff811363be>] new_sync_read+0x78/0x9c
           [<ffffffff81137164>] __vfs_read+0x13/0x38
           [<ffffffff8113721e>] vfs_read+0x95/0x121
           [<ffffffff811372f6>] SyS_read+0x4c/0x8a
           [<ffffffff81557a52>] system_call_fastpath+0x12/0x17

        The assertion is showing that the remaining number of pages (n_pages) is not 0 when the operation is being released.
        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| | * | FS-Cache: Put an aborted initialised op so that it is accounted correctly (David Howells, 2015-04-02; 2 files, -33/+35)
        Call fscache_put_operation() or a wrapper on any op that has gone through fscache_operation_init() so that the accounting shown in /proc is done correctly, specifically fscache_n_op_release.
        fscache_put_operation() therefore now allows an op in the INITIALISED state as well as in the CANCELLED and COMPLETE states.
        Note that this means that an operation can get put that doesn't have its ->object pointer filled in, so anything that depends on the object needs to be conditional in fscache_put_operation().
        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| | * | FS-Cache: Fix cancellation of in-progress operation (David Howells, 2015-04-02; 1 file, -0/+7)
        Cancellation of an in-progress operation needs to update the relevant counters and start any operations that are pending waiting on this one.
        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| | * | FS-Cache: Count the number of initialised operations (David Howells, 2015-04-02; 4 files, -2/+7)
        Count and display through /proc/fs/fscache/stats the number of initialised operations.
        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| | * | FS-Cache: Out of line fscache_operation_init() (David Howells, 2015-04-02; 2 files, -21/+25)
        Out of line fscache_operation_init() so that it can access internal FS-Cache features, such as stats, in a later commit.
        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| | * | FS-Cache: Permit fscache_cancel_op() to cancel in-progress operations too (David Howells, 2015-04-02; 3 files, -6/+21)
        Currently, fscache_cancel_op() only cancels pending operations - attempts to cancel in-progress operations are ignored. This leads to a problem in fscache_wait_for_operation_activation() whereby the wait is terminated, but the object has been killed.
        The check at the end of the function now triggers because it's no longer contingent on the cache having produced an I/O error since the commit that fixed the logic error in fscache_object_is_dead(). The result of the check is that it tries to cancel the operation - but since the object may not be pending by this point, the cancellation request may be ignored - with the result that the object is just put by the caller and fscache_put_operation() has an assertion failure because the operation isn't in either the COMPLETE or the CANCELLED states.
        To fix this, we permit in-progress ops to be cancelled under some circumstances.
        The bug results in an oops that looks something like this:

          FS-Cache: fscache_wait_for_operation_activation() = -ENOBUFS [obj dead 3]
          FS-Cache:
          FS-Cache: Assertion failed
          FS-Cache: 3 == 5 is false
          ------------[ cut here ]------------
          kernel BUG at ../fs/fscache/operation.c:432!
          ...
          RIP: 0010:[<ffffffffa0088574>] fscache_put_operation+0xf2/0x2cd
          Call Trace:
           [<ffffffffa008b92a>] __fscache_read_or_alloc_pages+0x2ec/0x3b3
           [<ffffffffa00b761f>] __nfs_readpages_from_fscache+0x59/0xbf [nfs]
           [<ffffffffa00b06c5>] nfs_readpages+0x10c/0x185 [nfs]
           [<ffffffff81124925>] ? alloc_pages_current+0x119/0x13e
           [<ffffffff810ee5fd>] ? __page_cache_alloc+0xfb/0x10a
           [<ffffffff810f87f8>] __do_page_cache_readahead+0x188/0x22c
           [<ffffffff810f8b3a>] ondemand_readahead+0x29e/0x2af
           [<ffffffff810f8c92>] page_cache_sync_readahead+0x38/0x3a
           [<ffffffff810ef337>] generic_file_read_iter+0x1a2/0x55a
           [<ffffffffa00a9dff>] ? nfs_revalidate_mapping+0xd6/0x288 [nfs]
           [<ffffffffa00a6a23>] nfs_file_read+0x49/0x70 [nfs]
           [<ffffffff811363be>] new_sync_read+0x78/0x9c
           [<ffffffff81137164>] __vfs_read+0x13/0x38
           [<ffffffff8113721e>] vfs_read+0x95/0x121
           [<ffffffff811372f6>] SyS_read+0x4c/0x8a
           [<ffffffff81557a52>] system_call_fastpath+0x12/0x17

        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| | * | FS-Cache: fscache_object_is_dead() has wrong logic, kill it (David Howells, 2015-04-02; 3 files, -9/+6)
        fscache_object_is_dead() returns true only if the object is marked dead and the cache got an I/O error. This should be a logical OR instead. Since two of the callers got split up into handling for separate subcases, expand the other callers and kill the function. This is probably the right thing to do anyway since one of the subcases isn't about the object at all, but rather about the cache.
        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| | * | FS-Cache: Synchronise object death state change vs operation submission (David Howells, 2015-04-02; 1 file, -3/+14)
        When an object is being marked as no longer live, do this under the object spinlock to prevent a race with operation submission targeted on that object.
        The problem occurs due to the following pair of intertwined sequences when the cache tries to create an object that would take it over the hard available space limit:

          NETFS INTERFACE
          ===============
          (A) The netfs calls fscache_acquire_cookie(). Object creation is deferred to the object state machine and the netfs is allowed to continue.

          OBJECT STATE MACHINE KTHREAD
          ============================
          (1) The object is looked up on disk by fscache_look_up_object() calling cachefiles_walk_to_object(). The latter finds that the object is not yet represented on disk and calls fscache_object_lookup_negative().
          (2) fscache_object_lookup_negative() sets FSCACHE_COOKIE_NO_DATA_YET and clears FSCACHE_COOKIE_LOOKING_UP, thus allowing the netfs to start queuing read operations.
          (B) The netfs calls fscache_read_or_alloc_pages(). This calls fscache_wait_for_deferred_lookup(), which sees FSCACHE_COOKIE_LOOKING_UP become clear, allowing the read to begin.
          (C) A read operation is set up and passed to fscache_submit_op() to deal with.
          (3) cachefiles_walk_to_object() calls cachefiles_has_space(), which fails (or one of the file operations to create stuff fails). cachefiles returns an error to fscache.
          (4) fscache_look_up_object() transits to the LOOKUP_FAILURE state.
          (5) fscache_lookup_failure() sets FSCACHE_OBJECT_LOOKED_UP and FSCACHE_COOKIE_UNAVAILABLE and clears FSCACHE_COOKIE_LOOKING_UP, then transits to the KILL_OBJECT state.
          (6) fscache_kill_object() clears FSCACHE_OBJECT_IS_LIVE in an attempt to reject any further requests from the netfs.
          (7) object->n_ops is examined and found to be 0. fscache_kill_object() transits to the DROP_OBJECT state.
          (D) fscache_submit_op() locks the object spinlock, sees if it can dispatch the op immediately by calling fscache_object_is_active() - which fails since FSCACHE_OBJECT_IS_AVAILABLE has not yet been set.
          (E) fscache_submit_op() then tests FSCACHE_OBJECT_LOOKED_UP - which is set. It then queues the object and increments object->n_ops.
          (8) fscache_drop_object() releases the object and eventually fscache_put_object() calls cachefiles_put_object(), which suffers an assertion failure here:

              ASSERTCMP(object->fscache.n_ops, ==, 0);

        Locking the object spinlock in step (6) around the clearance of FSCACHE_OBJECT_IS_LIVE ensures that the decision trees in fscache_submit_op() and fscache_submit_exclusive_op() don't see the IS_LIVE flag being cleared mid-decision: either the op is queued before step (7) - in which case fscache_kill_object() will see n_ops > 0 and will deal with the op - or the op will be rejected.
        This, combined with rejecting op submission if the target object is dying, fixes the problem.
        The problem shows up as the following oops:

          CacheFiles: Assertion failed
          CacheFiles: 1 == 0 is false
          ------------[ cut here ]------------
          kernel BUG at ../fs/cachefiles/interface.c:339!
          ...
          RIP: 0010:[<ffffffffa014fd9c>] [<ffffffffa014fd9c>] cachefiles_put_object+0x2a4/0x301 [cachefiles]
          ...
          Call Trace:
           [<ffffffffa008674b>] fscache_put_object+0x18/0x21 [fscache]
           [<ffffffffa00883e6>] fscache_object_work_func+0x3ba/0x3c9 [fscache]
           [<ffffffff81054dad>] process_one_work+0x226/0x441
           [<ffffffff81055d91>] worker_thread+0x273/0x36b
           [<ffffffff81055b1e>] ? rescuer_thread+0x2e1/0x2e1
           [<ffffffff81059b9d>] kthread+0x10e/0x116
           [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
           [<ffffffff815579ac>] ret_from_fork+0x7c/0xb0
           [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb

        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| | * | FS-Cache: Handle a new operation submitted against a killed object (David Howells, 2015-04-02; 2 files, -0/+8)
        Reject new operations that are being submitted against an object if that object has failed its lookup or creation states or has been killed by the cache backend for some other reason, such as having been culled.
        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| | * | FS-Cache: When submitting an op, cancel it if the target object is dying (David Howells, 2015-04-02; 2 files, -19/+37)
        When submitting an operation, prefer to cancel the operation immediately rather than queuing it for later processing if the object is marked as dying (ie. the object state machine has reached the KILL_OBJECT state).
        Whilst we're at it, change the series of related test_bit() calls into a READ_ONCE() and bitwise-AND operators to reduce the number of load instructions (test_bit() has a volatile address).
        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
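Aside: a small sketch of the "load the flags word once, then test several bits" pattern the commit mentions. Plain C stand-ins are used instead of the kernel's test_bit()/READ_ONCE() helpers, and the flag names are invented.

    #include <stdio.h>

    #define FLAG_PENDING   (1u << 0)
    #define FLAG_CANCELLED (1u << 1)
    #define FLAG_DEAD      (1u << 2)

    static volatile unsigned int object_flags = FLAG_PENDING | FLAG_DEAD;

    int main(void)
    {
        /* One volatile load instead of one per bit test. */
        unsigned int flags = object_flags;

        if (flags & (FLAG_CANCELLED | FLAG_DEAD))
            printf("cancel the op immediately\n");
        else if (flags & FLAG_PENDING)
            printf("queue the op\n");
        return 0;
    }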
| | * | FS-Cache: Move fscache_report_unexpected_submission() to make it more available (David Howells, 2015-04-02; 1 file, -37/+37)
        Move fscache_report_unexpected_submission() up within operation.c so that it can be called from fscache_submit_exclusive_op() too.
        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| | * | FS-Cache: Count culled objects and objects rejected due to lack of space (David Howells, 2015-02-24; 8 files, -13/+122)
        Count the number of objects that get culled by the cache backend and the number of objects that the cache backend declines to instantiate due to lack of space in the cache.
        These numbers are made available through /proc/fs/fscache/stats.
        Signed-off-by: David Howells <dhowells@redhat.com>
        Reviewed-by: Steve Dickson <steved@redhat.com>
        Acked-by: Jeff Layton <jeff.layton@primarydata.com>
| * | | xfs: Correctly lock inode when removing suid and file capabilities (Jan Kara, 2015-06-23; 1 file, -1/+10)
        Currently XFS calls file_remove_privs() without holding i_mutex. This is wrong because that function can end up messing with file permissions and file capabilities stored in xattrs, for which we need i_mutex held.
        Fix the problem by grabbing the iolock exclusively when we will need to change anything in permissions / xattrs.
        Reviewed-by: Dave Chinner <dchinner@redhat.com>
        Signed-off-by: Jan Kara <jack@suse.cz>
        Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | fs: Call security_ops->inode_killpriv on truncate (Jan Kara, 2015-06-23; 3 files, -6/+11)
        The comment in include/linux/security.h says that ->inode_killpriv() should be called when the setuid bit is being removed and that similar security labels (in fact this applies only to file capabilities) should be removed at this time as well. However, we don't call ->inode_killpriv() when we remove the suid bit on truncate.
        We fix the problem by calling ->inode_need_killpriv() and subsequently ->inode_killpriv() on truncate the same way as we do it on file write.
        After this patch there's only one user of should_remove_suid() - ocfs2 - and indeed it's buggy because it doesn't call ->inode_killpriv() on write. However, fixing it is difficult because of special locking constraints.
        Signed-off-by: Jan Kara <jack@suse.cz>
        Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>