summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* sysfs, kernfs: prepare open, release, poll paths for kernfsTejun Heo2013-11-293-21/+10
| | | | | | | | | | | | | | We're in the process of separating out core sysfs functionality into kernfs which will deal with sysfs_dirents directly. This patch prepares the rest - open, release and poll. There isn't much to do. Just renaming is enough. As sysfs_file_operations and sysfs_bin_operations are identical now, use the same file_operations for both - kernfs_file_operations. This patch doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sysfs, kernfs: prepare mmap path for kernfsTejun Heo2013-11-291-30/+39
| | | | | | | | | | | | | | | | | | | We're in the process of separating out core sysfs functionality into kernfs which will deal with sysfs_dirents directly. This patch rearranges mmap path so that the kernfs and sysfs parts are separate. sysfs_kf_bin_mmap() which handles the interaction with bin_attribute mmap method is factored out of sysfs_bin_mmap(), which is renamed to kernfs_file_mmap(). All vma ops are renamed accordingly. sysfs_bin_mmap() is updated such that it can be used for both file types. This will eventually allow using the same file_operations for both file types, which is necessary to separate out kernfs. This patch doesn't introduce any behavior changes. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sysfs, kernfs: prepare write path for kernfsTejun Heo2013-11-291-53/+50
| | | | | | | | | | | | | | | | | We're in the process of separating out core sysfs functionality into kernfs which will deal with sysfs_dirents directly. This patch rearranges write path so that the kernfs and sysfs parts are separate. kernfs_file_write() handles all boilerplate work including buffer management and locking and invokes sysfs_kf_write() or sysfs_kf_bin_write() depending on the file type which deals with the interaction with kobj store or bin_attribute write method. While this patch changes the order of some operations, it shouldn't change any visible behavior. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sysfs, kernfs: prepare read path for kernfsTejun Heo2013-11-291-65/+126
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We're in the process of separating out core sysfs functionality into kernfs which will deal with sysfs_dirents directly. This patch rearranges read path so that the kernfs and sysfs parts are separate. * Regular file read path is refactored such that kernfs_seq_start/next/stop/show() handle all the boilerplate work including locking and updating event count for poll, while sysfs_kf_seq_show() deals with interaction with kobj show method. * Bin file read path is refactored such that kernfs_file_direct_read() handles all the boilerplate work including buffer management and locking, while sysfs_kf_bin_read() deals with interaction with bin_attribute read method. kernfs_file_read() is added. It invokes either the seq_file or direct read path depending on the file type. This will eventually allow using the same file_operations for both file types, which is necessary to separate out kernfs. While this patch changes the order of some operations, it shouldn't change any visible behavior. v2: Dropped unnecessary zeroing of @count from sysfs_kf_seq_show(). Add comments explaining single_open() behavior. Both suggested by Pavel. v3: seq_stop() is called even after seq_start() failed. kernfs_seq_start() updated so that it doesn't unlock sysfs_open_file->mutex on failure so that kernfs_seq_stop() doesn't try to unlock an already unlocked mutex. Reported by Fengguang. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Pavel Machek <pavel@ucw.cz> Cc: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sysfs, kernfs: introduce kernfs_create_dir[_ns]()Tejun Heo2013-11-293-30/+36
| | | | | | | | | | | | | | | | | | | | | | | | | Introduce kernfs interface to manipulate a directory which takes and returns sysfs_dirents. create_dir() is renamed to kernfs_create_dir_ns() and its argumantes and return value are updated. create_dir() usages are replaced with kernfs_create_dir_ns() and sysfs_create_subdir() usages are replaced with kernfs_create_dir(). Dup warnings are handled explicitly by sysfs users of the kernfs interface. sysfs_enable_ns() is renamed to kernfs_enable_ns(). This patch doesn't introduce any behavior changes. v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS. v3: kernfs_enable_ns() added. v4: Refreshed on top of "sysfs: drop kobj_ns_type handling, take #2" so that this patch removes sysfs_enable_ns(). Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sysfs, kernfs: replace sysfs_dirent->s_dir.kobj and ->s_attr.[bin_]attr with ↵Tejun Heo2013-11-295-24/+21
| | | | | | | | | | | | | | | | | | | | | ->priv A directory sysfs_dirent points to the associated kobj. A regular or bin file points to the associated [bin_]attribute. This patch replaces sysfs_dirent->s_dir.kobj and ->s_attr.[bin_]attr with void * ->priv. This is to prepare for kernfs interface so that sysfs can specify the private data in the same way for directories and files. This lower debuggability but not by much - the whole thing was overlaid in a union anyway. If debuggability becomes an issue, we can later add ->priv accessors which explicitly check for the sysfs_dirent type and performs casting. This patch doesn't introduce any behavior difference. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Merge 3.13-rc2 into driver-core-nextGreg Kroah-Hartman2013-11-2917-68/+312
|\ | | | | | | | | | | We want those fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * Merge branch 'for-linus' of ↵Linus Torvalds2013-11-291-2/+1
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs dentry reference count fix from Al Viro. This fixes a possible inode_permission NULL pointer dereference (and other problems) that were due to the root dentry count being decremented too much. In commit 48a066e72d97 ("RCU'd vfsmounts") the placement of clearing the LOOKUP_RCU bit changed, and we then returned failure of incrementing the lockref on the parent dentry with LOOKUP_RCU cleared. But that meant we needed to go through the same cleanup routines that the later failures did wrt LOOKUP_ROOT and nd->root. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fix bogus path_put() of nd->root after some unlazy_walk() failures
| | * fix bogus path_put() of nd->root after some unlazy_walk() failuresAl Viro2013-11-291-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Failure to grab reference to parent dentry should go through the same cleanup as nd->seq mismatch. As it is, we might end up with caller thinking it needs to path_put() nd->root, with obvious nasty results once we'd hit that bug enough times to drive the refcount of root dentry all the way to zero... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2013-11-287-24/+189
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French: "SMB3 "validate negotiate" is needed to prevent certain types of downgrade attacks. Also changes SMB2/SMB3 copy offload from using the BTRFS copy ioctl (BTRFS_IOC_CLONE) to a cifs specific ioctl (CIFS_IOC_COPYCHUNK_FILE) to address Christoph's comment that there are semantic differences between requesting copy offload in which copy-on-write is mandatory (as in the BTRFS ioctl) and optional in the SMB2/SMB3 case. Also fixes SMB2/SMB3 copychunk for large files" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: [CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload Check SMB3 dialects against downgrade attacks Removed duplicated (and unneeded) goto CIFS: Fix SMB2/SMB3 Copy offload support (refcopy) for large files
| | * | [CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offloadSteve French2013-11-251-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Change cifs.ko to using CIFS_IOCTL_COPYCHUNK instead of BTRFS_IOC_CLONE to avoid confusion about whether copy-on-write is required or optional for this operation. SMB2/SMB3 copyoffload had used the BTRFS_IOC_CLONE ioctl since they both speed up copy by offloading the copy rather than passing many read and write requests back and forth and both have identical syntax (passing file handles), but for SMB2/SMB3 CopyChunk the server is not required to use copy-on-write to make a copy of the file (although some do), and Christoph has commented that since CopyChunk does not require copy-on-write we should not reuse BTRFS_IOC_CLONE. This patch renames the ioctl to use a cifs specific IOCTL CIFS_IOCTL_COPYCHUNK. This ioctl is particularly important for SMB2/SMB3 since large file copy over the network otherwise can be very slow, and with this is often more than 100 times faster putting less load on server and client. Note that if a copy syscall is ever introduced, depending on its requirements/format it could end up using one of the other three methods that CIFS/SMB2/SMB3 can do for copy offload, but this method is particularly useful for file copy and broadly supported (not just by Samba server). Signed-off-by: Steve French <smfrench@gmail.com> Reviewed-by: Jeff Layton <jlayton@redhat.com> Reviewed-by: David Disseldorp <ddiss@samba.org>
| | * | Check SMB3 dialects against downgrade attacksSteve French2013-11-196-4/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we are running SMB3 or SMB3.02 connections which are signed we need to validate the protocol negotiation information, to ensure that the negotiate protocol response was not tampered with. Add the missing FSCTL which is sent at mount time (immediately after the SMB3 Tree Connect) to validate that the capabilities match what we think the server sent. "Secure dialect negotiation is introduced in SMB3 to protect against man-in-the-middle attempt to downgrade dialect negotiation. The idea is to prevent an eavesdropper from downgrading the initially negotiated dialect and capabilities between the client and the server." For more explanation see 2.2.31.4 of MS-SMB2 or http://blogs.msdn.com/b/openspecification/archive/2012/06/28/smb3-secure-dialect-negotiation.aspx Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru> Signed-off-by: Steve French <smfrench@gmail.com>
| | * | Removed duplicated (and unneeded) gotoSteve French2013-11-181-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | Remove an unneeded goto (and also was duplicated goto target name). Signed-off-by: Steve French <smfrench@gmail.com>
| | * | CIFS: Fix SMB2/SMB3 Copy offload support (refcopy) for large filesSteve French2013-11-182-14/+93
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This third version of the patch, incorparating feedback from David Disseldorp extends the ability of copychunk (refcopy) over smb2/smb3 mounts to handle servers with smaller than usual maximum chunk sizes and also fixes it to handle files bigger than the maximum chunk sizes In the future this can be extended further to handle sending multiple chunk requests in on SMB2 ioctl request which will further improve performance, but even with one 1MB chunk per request the speedup on cp is quite large. Reviewed-by: David Disseldorp <ddiss@samba.org> Signed-off-by: Steve French <smfrench@gmail.com>
| * | | Merge tag 'driver-core-3.13-rc2' of ↵Linus Torvalds2013-11-271-2/+20
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are 3 patches for sysfs issues that have been reported. Well, 1 patch really, the first one is reverted as it's not really needed (the correct fix is coming in through the different driver subsystems instead) But that 1 sysfs fix is needed, so this is still a good thing to pull in now" Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> * tag 'driver-core-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: Revert "sysfs: handle duplicate removal attempts in sysfs_remove_group()" sysfs: use a separate locking class for open files depending on mmap sysfs: handle duplicate removal attempts in sysfs_remove_group()
| * | | | remove obsolete references to powertweakDave Jones2013-11-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This tool hasn't been maintained in over a decade, and is pretty much useless these days. Let's pretend it never happened. Also remove a long-dead email address. Signed-off-by: Dave Jones <davej@fedoraproject.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | Merge branch 'for-linus-bugs' of ↵Linus Torvalds2013-11-268-41/+121
| |\ \ \ \ | | |_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull ceph bug-fixes from Sage Weil: "These include a couple fixes to the new fscache code that went in during the last cycle (which will need to go stable@ shortly as well), a couple client-side directory fragmentation fixes, a fix for a race in the cap release queuing path, and a couple race fixes in the request abort and resend code. Obviously some of this could have gone into 3.12 final, but I preferred to overtest rather than send things in for a late -rc, and then my travel schedule intervened" * 'for-linus-bugs' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: allocate non-zero page to fscache in readpage() ceph: wake up 'safe' waiters when unregistering request ceph: cleanup aborted requests when re-sending requests. ceph: handle race between cap reconnect and cap release ceph: set caps count after composing cap reconnect message ceph: queue cap release in __ceph_remove_cap() ceph: handle frag mismatch between readdir request and reply ceph: remove outdated frag information ceph: hung on ceph fscache invalidate in some cases
| | * | | ceph: allocate non-zero page to fscache in readpage()Li Wang2013-11-231-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ceph_osdc_readpages() returns number of bytes read, currently, the code only allocate full-zero page into fscache, this patch fixes this. Signed-off-by: Li Wang <liwang@ubuntukylin.com> Reviewed-by: Milosz Tanski <milosz@adfin.com> Reviewed-by: Sage Weil <sage@inktank.com>
| | * | | ceph: wake up 'safe' waiters when unregistering requestYan, Zheng2013-11-231-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We also need to wake up 'safe' waiters if error occurs or request aborted. Otherwise sync(2)/fsync(2) may hang forever. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by: Sage Weil <sage@inktank.com>
| | * | | ceph: cleanup aborted requests when re-sending requests.Yan, Zheng2013-11-231-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Aborted requests usually get cleared when the reply is received. If MDS crashes, no reply will be received. So we need to cleanup aborted requests when re-sending requests. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Greg Farnum <greg@inktank.com> Signed-off-by: Sage Weil <sage@inktank.com>
| | * | | ceph: handle race between cap reconnect and cap releaseYan, Zheng2013-11-233-4/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a cap get released while composing the cap reconnect message. We should skip queuing the release message if the cap hasn't been added to the cap reconnect message. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
| | * | | ceph: set caps count after composing cap reconnect messageYan, Zheng2013-11-231-5/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's possible that some caps get released while composing the cap reconnect message. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
| | * | | ceph: queue cap release in __ceph_remove_cap()Yan, Zheng2013-11-233-21/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | call __queue_cap_release() in __ceph_remove_cap(), this avoids acquiring s_cap_lock twice. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
| | * | | ceph: handle frag mismatch between readdir request and replyYan, Zheng2013-09-303-5/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If client has outdated directory fragments information, it may request readdir an non-existent directory fragment. In this case, the MDS finds an approximate directory fragment and sends its contents back to the client. When receiving a reply with fragment that is different than the requested one, the client need to reset the 'readdir offset'. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
| | * | | ceph: remove outdated frag informationYan, Zheng2013-09-301-4/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If directory fragments change, fill_inode() inserts new frags into the fragtree, but it does not remove outdated frags from the fragtree. This patch fixes it. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com> Reviewed-by: Sage Weil <sage@inktank.com>
| | * | | ceph: hung on ceph fscache invalidate in some casesMilosz Tanski2013-09-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In some cases I'm on my ceph client cluster I'm seeing hunk kernel tasks in the invalidate page code path. This is due to the fact that we don't check if the page is marked as cache before calling fscache_wait_on_page_write(). This is the log from the hang INFO: task XXXXXX:12034 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. ... Call Trace: [<ffffffff81568d09>] schedule+0x29/0x70 [<ffffffffa01d4cbd>] __fscache_wait_on_page_write+0x6d/0xb0 [fscache] [<ffffffff81083520>] ? add_wait_queue+0x60/0x60 [<ffffffffa029a3e9>] ceph_invalidate_fscache_page+0x29/0x50 [ceph] [<ffffffffa027df00>] ceph_invalidatepage+0x70/0x190 [ceph] [<ffffffff8112656f>] ? delete_from_page_cache+0x5f/0x70 [<ffffffff81133cab>] truncate_inode_page+0x8b/0x90 [<ffffffff81133ded>] truncate_inode_pages_range.part.12+0x13d/0x620 [<ffffffff8113431d>] truncate_inode_pages_range+0x4d/0x60 [<ffffffff811343b5>] truncate_inode_pages+0x15/0x20 [<ffffffff8119bbf6>] evict+0x1a6/0x1b0 [<ffffffff8119c3f3>] iput+0x103/0x190 ... Signed-off-by: Milosz Tanski <milosz@adfin.com> Reviewed-by: Sage Weil <sage@inktank.com>
* | | | | Merge branch 'driver-core-linus' into driver-core-nextGreg Kroah-Hartman2013-11-271-2/+20
|\ \ \ \ \ | | |_|/ / | |/| | | | | | | | | | | | | | | | | | | | | | | We need those sysfs fixes in this branch to make testing, and future patches apply properly. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | Revert "sysfs: handle duplicate removal attempts in sysfs_remove_group()"Greg Kroah-Hartman2013-11-271-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 54d71145a4548330313ca664a4a009772fe8b7dd. The root cause of these "inverted" sysfs removals have now been found, so there is no need for this patch. Keep this functionality around so that this type of error doesn't show up in driver code again. Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | sysfs: use a separate locking class for open files depending on mmapTejun Heo2013-11-231-2/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following two commits implemented mmap support in the regular file path and merged bin file support into the regular path. 73d9714627ad ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c") 3124eb1679b2 ("sysfs: merge regular and bin file handling") After the merge, the following commands trigger a spurious lockdep warning. "test-mmap-read" simply mmaps the file and dumps the content. $ cat /sys/block/sda/trace/act_mask $ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096 ====================================================== [ INFO: possible circular locking dependency detected ] 3.12.0-work+ #378 Not tainted ------------------------------------------------------- test-mmap-read/567 is trying to acquire lock: (&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120 but task is already holding lock: (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #3 (&mm->mmap_sem){++++++}: ... -> #2 (sr_mutex){+.+.+.}: ... -> #1 (&bdev->bd_mutex){+.+.+.}: ... -> #0 (&of->mutex){+.+.+.}: ... other info that might help us debug this: Chain exists of: &of->mutex --> sr_mutex --> &mm->mmap_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&mm->mmap_sem); lock(sr_mutex); lock(&mm->mmap_sem); lock(&of->mutex); *** DEADLOCK *** 1 lock held by test-mmap-read/567: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0 stack backtrace: CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80 ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208 0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0 Call Trace: [<ffffffff81611ad2>] dump_stack+0x4e/0x7a [<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf [<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60 [<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0 [<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0 [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120 [<ffffffff8115d363>] mmap_region+0x3b3/0x5b0 [<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0 [<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0 [<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250 [<ffffffff81008282>] SyS_mmap+0x22/0x30 [<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b This happens because one file nests sr_mutex, which nests mm->mmap_sem under it, under of->mutex while mmap implementation naturally nests of->mutex under mm->mmap_sem. The warning is false positive as of->mutex is per open-file and the two paths belong to two different files. This warning didn't trigger before regular and bin file supports were merged because only bin file supported mmap and the other side of locking happened only on regular files which used equivalent but separate locking. It'd be best if we give separate locking classes per file but we can't easily do that. Let's differentiate on ->mmap() for now. Later we'll add explicit file operations struct and can add per-ops lockdep key there. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * | | | sysfs: handle duplicate removal attempts in sysfs_remove_group()Mika Westerberg2013-11-231-0/+9
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive) changed the behavior so that directory removals will be done recursively. This means that the sysfs group might already be removed if its parent directory has been removed. The current code outputs warnings similar to following log snippet when it detects that there is no group for the given kobject: WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0() sysfs group ffffffff81c6f1e0 not found for kobject 'host7' Modules linked in: CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13 Hardware name: /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013 Workqueue: kacpi_hotplug acpi_hotplug_work_fn 0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8 ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0 ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48 Call Trace: [<ffffffff817daab1>] dump_stack+0x45/0x56 [<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0 [<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50 [<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70 [<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0 [<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50 [<ffffffff8142a0d0>] device_del+0x40/0x1b0 [<ffffffff8142a24d>] device_unregister+0xd/0x20 [<ffffffff8144131a>] scsi_remove_host+0xba/0x110 [<ffffffff8145f526>] ata_host_detach+0xc6/0x100 [<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20 [<ffffffff812e8f48>] pci_device_remove+0x28/0x60 [<ffffffff8142d854>] __device_release_driver+0x64/0xd0 [<ffffffff8142d8de>] device_release_driver+0x1e/0x30 [<ffffffff8142d257>] bus_remove_device+0xf7/0x140 [<ffffffff8142a1b1>] device_del+0x121/0x1b0 [<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0 [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0 [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0 [<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20 [<ffffffff812fc743>] trim_stale_devices+0x73/0xe0 [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0 [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0 [<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0 [<ffffffff812fd90d>] hotplug_event+0xcd/0x160 [<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60 [<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22 [<ffffffff8105cf3a>] process_one_work+0x17a/0x430 [<ffffffff8105db29>] worker_thread+0x119/0x390 [<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0 [<ffffffff81063a5d>] kthread+0xcd/0xf0 [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180 [<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0 [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180 On this particular machine I see ~16 of these message during Thunderbolt hot-unplug. Fix this in similar way that was done for sysfs_remove_one() by checking if the parent directory has already been removed and bailing out early. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | sysfs, kernfs: introduce kernfs_setattr()Tejun Heo2013-11-273-11/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce kernfs setattr interface - kernfs_setattr(). sysfs_sd_setattr() is renamed to __kernfs_setattr() and kernfs_setattr() is a simple wrapper around it with sysfs_mutex locking. sysfs_chmod_file() is updated to get an explicit ref on kobj->sd and then invoke kernfs_setattr() so that it doesn't have to use internal interface. This patch doesn't introduce any behavior differences. v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | sysfs, kernfs: introduce kernfs_rename[_ns]()Tejun Heo2013-11-273-12/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce kernfs rename interface, krenfs_rename[_ns](). This is just rename of sysfs_rename(). No functional changes. Function comment is added to kernfs_rename_ns() and @new_parent_sd is renamed to @new_parent for consistency with other kernfs interfaces. v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | sysfs, kernfs: introduce kernfs_create_link()Tejun Heo2013-11-271-30/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Separate out kernfs symlink interface - kernfs_create_link() - which takes and returns sysfs_dirents, from sysfs_do_create_link_sd(). sysfs_do_create_link_sd() now just determines the parent and target sysfs_dirents and invokes the new interface and handles dup warning. This patch doesn't introduce behavior changes. v2: Dummy implementation for !CONFIG_SYSFS updated to return -ENOSYS. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | sysfs, kernfs: introduce kernfs_remove[_by_name[_ns]]()Tejun Heo2013-11-275-26/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce kernfs removal interfaces - kernfs_remove() and kernfs_remove_by_name[_ns](). These are just renames of sysfs_remove() and sysfs_hash_and_remove(). No functional changes. v2: Dummy kernfs_remove_by_name_ns() for !CONFIG_SYSFS updated to return -ENOSYS instead of 0. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | sysfs, kernfs: add skeletons for kernfsTejun Heo2013-11-277-1/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Core sysfs implementation will be separated into kernfs so that it can be used by other non-kobject users. This patch creates fs/kernfs/ directory and makes boilerplate changes. kernfs interface will be directly based on sysfs_dirent and its forward declaration is moved to include/linux/kernfs.h which is included from include/linux/sysfs.h. sysfs core implementation will be gradually separated out and moved to kernfs. This patch doesn't introduce any functional changes. v2: mount.c added. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: linux-fsdevel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | sysfs: make __sysfs_add_one() fail if the parent isn't a directoryTejun Heo2013-11-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the kobject based interface guarantees that a parent sysfs_dirent is always a directory; however, the planned kernfs interface will be directly based on sysfs_dirents and the caller may specify non-directory node as the parent. Add an explicit check in __sysfs_add_one() so that such attempts fail with -EINVAL. Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | | sysfs: drop kobj_ns_type handling, take #2Tejun Heo2013-11-274-112/+55
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The way namespace tags are implemented in sysfs is more complicated than necessary. As each tag is a pointer value and required to be non-NULL under a namespace enabled parent, there's no need to record separately what type each tag is. If multiple namespace types are needed, which currently aren't, we can simply compare the tag to a set of allowed tags in the superblock assuming that the tags, being pointers, won't have the same value across multiple types. This patch rips out kobj_ns_type handling from sysfs. sysfs now has an enable switch to turn on namespace under a node. If enabled, all children are required to have non-NULL namespace tags and filtered against the super_block's tag. kobject namespace determination is now performed in lib/kobject.c::create_dir() making sysfs_read_ns_type() unnecessary. The sanity checks are also moved. create_dir() is restructured to ease such addition. This removes most kobject namespace knowledge from sysfs proper which will enable proper separation and layering of sysfs. This is the second try. The first one was cb26a311578e ("sysfs: drop kobj_ns_type handling") which tried to automatically enable namespace if there are children with non-NULL namespace tags; however, it was broken for symlinks as they should inherit the target's tag iff namespace is enabled in the parent. This led to namespace filtering enabled incorrectly for wireless net class devices through phy80211 symlinks and thus network configuration failure. a1212d278c05 ("Revert "sysfs: drop kobj_ns_type handling"") reverted the commit. This shouldn't introduce any behavior changes, for real. v2: Dummy implementation of sysfs_enable_ns() for !CONFIG_SYSFS was missing and caused build failure. Reported by kbuild test robot. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kay Sievers <kay@vrfy.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | Merge tag 'ecryptfs-3.13-rc1-quiet-checkers' of ↵Linus Torvalds2013-11-221-6/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull minor eCryptfs fix from Tyler Hicks: "Quiet static checkers by removing unneeded conditionals" * tag 'ecryptfs-3.13-rc1-quiet-checkers' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: eCryptfs: file->private_data is always valid
| * | | eCryptfs: file->private_data is always validTyler Hicks2013-11-141-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When accessing the lower_file pointer located in private_data of eCryptfs files, there is no need to check to see if the private_data pointer has been initialized to a non-NULL value. The file->private_data and file->private_data->lower_file pointers are always initialized to non-NULL values in ecryptfs_open(). This change quiets a Smatch warning: CHECK /var/scm/kernel/linux/fs/ecryptfs/file.c fs/ecryptfs/file.c:321 ecryptfs_unlocked_ioctl() error: potential NULL dereference 'lower_file'. fs/ecryptfs/file.c:335 ecryptfs_compat_ioctl() error: potential NULL dereference 'lower_file'. Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Geyslan G. Bem <geyslan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge git://git.kvack.org/~bcrl/aio-nextLinus Torvalds2013-11-221-83/+51
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull aio fixes from Benjamin LaHaise. * git://git.kvack.org/~bcrl/aio-next: aio: nullify aio->ring_pages after freeing it aio: prevent double free in ioctx_alloc aio: Fix a trinity splat
| * | | | aio: nullify aio->ring_pages after freeing itSasha Levin2013-11-191-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After freeing ring_pages we leave it as is causing a dangling pointer. This has already caused an issue so to help catching any issues in the future NULL it out. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
| * | | | aio: prevent double free in ioctx_allocSasha Levin2013-11-191-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ioctx_alloc() calls aio_setup_ring() to allocate a ring. If aio_setup_ring() fails to do so it would call aio_free_ring() before returning, but ioctx_alloc() would call aio_free_ring() again causing a double free of the ring. This is easily reproducible from userspace. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
| * | | | Merge branch 'aio-fix' of http://evilpiepirate.org/git/linux-bcacheBenjamin LaHaise2013-11-041-81/+48
| |\ \ \ \
| | * | | | aio: Fix a trinity splatKent Overstreet2013-10-101-81/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | aio kiocb refcounting was broken - it was relying on keeping track of the number of available ring buffer entries, which it needs to do anyways; then at shutdown time it'd wait for completions to be delivered until the # of available ring buffer entries equalled what it was initialized to. Problem with that is that the ring buffer is mapped writable into userspace, so userspace could futz with the head and tail pointers to cause the kernel to see extra completions, and cause free_ioctx() to return while there were still outstanding kiocbs. Which would be bad. Fix is just to directly refcount the kiocbs - which is more straightforward, and with the new percpu refcounting code doesn't cost us any cacheline bouncing which was the whole point of the original scheme. Also clean up ioctx_alloc()'s error path and fix a bug where it wasn't subtracting from aio_nr if ioctx_add_table() failed. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* | | | | | Merge branch 'for-3.13' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2013-11-222-75/+101
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd bugfixes from Bruce Fields: "A couple nfsd bugfixes" * 'for-3.13' of git://linux-nfs.org/~bfields/linux: nfsd4: fix xdr decoding of large non-write compounds nfsd: make sure to balance get/put_write_access nfsd: split up nfsd_setattr
| * | | | | | nfsd4: fix xdr decoding of large non-write compoundsJ. Bruce Fields2013-11-191-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes a regression from 247500820ebd02ad87525db5d9b199e5b66f6636 "nfsd4: fix decoding of compounds across page boundaries". The previous code was correct: argp->pagelist is initialized in nfs4svc_deocde_compoundargs to rqstp->rq_arg.pages, and is therefore a pointer to the page *after* the page we are currently decoding. The reason that patch nevertheless fixed a problem with decoding compounds containing write was a bug in the write decoding introduced by 5a80a54d21c96590d013378d8c5f65f879451ab4 "nfsd4: reorganize write decoding", after which write decoding no longer adhered to the rule that argp->pagelist point to the next page. Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: make sure to balance get/put_write_accessChristoph Hellwig2013-11-181-14/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use a straight goto error label style in nfsd_setattr to make sure we always do the put_write_access call after we got it earlier. Note that the we have been failing to do that in the case nfsd_break_lease() returns an error, a bug introduced into 2.6.38 with 6a76bebefe15d9a08864f824d7f8d5beaf37c997 "nfsd4: break lease on nfsd setattr". Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | | | | | nfsd: split up nfsd_setattrChristoph Hellwig2013-11-181-60/+84
| | |_|_|_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split out two helpers to make the code more readable and easier to verify for correctness. Signed-off-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com>
* | | | | | Merge tag 'gfs2-fixes' of ↵Linus Torvalds2013-11-222-2/+6
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes Pull GFS2 fixes from Steven Whitehouse: "A couple of small, but important bug fixes for GFS2. The first one fixes a possible NULL pointer dereference, and the second one resolves a reference counting issue in one of the lesser used paths through atomic_open" * tag 'gfs2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes: GFS2: Fix ref count bug relating to atomic_open GFS2: fix potential NULL pointer dereference
| * | | | | | GFS2: Fix ref count bug relating to atomic_openSteven Whitehouse2013-11-211-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case that atomic_open calls finish_no_open() with the dentry that was supplied to gfs2_atomic_open() an extra reference count is required. This patch fixes that issue preventing a bug trap triggering at umount time. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
OpenPOWER on IntegriCloud