summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* xfs: fix _xfs_buf_find oops on blocks beyond the filesystem endDave Chinner2013-01-281-0/+18
| | | | | | | | | | | | | | | | | | | | | When _xfs_buf_find is passed an out of range address, it will fail to find a relevant struct xfs_perag and oops with a null dereference. This can happen when trying to walk a filesystem with a metadata inode that has a partially corrupted extent map (i.e. the block number returned is corrupt, but is otherwise intact) and we try to read from the corrupted block address. In this case, just fail the lookup. If it is readahead being issued, it will simply not be done, but if it is real read that fails we will get an error being reported. Ideally this case should result in an EFSCORRUPTED error being reported, but we cannot return an error through xfs_buf_read() or xfs_buf_get() so this lookup failure may result in ENOMEM or EIO errors being reported instead. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: pull up stack_switch check into xfs_bmapi_writeBrian Foster2013-01-281-3/+3
| | | | | | | | | | | | The stack_switch check currently occurs in __xfs_bmapi_allocate, which means the stack switch only occurs when xfs_bmapi_allocate() is called in a loop. Pull the check up before the loop in xfs_bmapi_write() such that the first iteration of the loop has consistent behavior. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: Do not return EFSCORRUPTED when filesystem probe finds no XFS magicEric Sandeen2013-01-281-1/+1
| | | | | | | | | | | 9802182 changed the return value from EWRONGFS (aka EINVAL) to EFSCORRUPTED which doesn't seem to be handled properly by the root filesystem probe. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Sergei Trofimovich <slyfox@gentoo.org> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: recalculate leaf entry pointer after compacting a dir2 blockEric Sandeen2013-01-161-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dave Jones hit this assert when doing a compile on recent git, with CONFIG_XFS_DEBUG enabled: XFS: Assertion failed: (char *)dup - (char *)hdr == be16_to_cpu(*xfs_dir2_data_unused_tag_p(dup)), file: fs/xfs/xfs_dir2_data.c, line: 828 Upon further digging, the tag found by xfs_dir2_data_unused_tag_p(dup) contained "2" and not the proper offset, and I found that this value was changed after the memmoves under "Use a stale leaf for our new entry." in xfs_dir2_block_addname(), i.e. memmove(&blp[mid + 1], &blp[mid], (highstale - mid) * sizeof(*blp)); overwrote it. What has happened is that the previous call to xfs_dir2_block_compact() has rearranged things; it changes btp->count as well as the blp array. So after we make that call, we must recalculate the proper pointer to the leaf entries by making another call to xfs_dir2_block_leaf_p(). Dave provided a metadump image which led to a simple reproducer (create a particular filename in the affected directory) and this resolves the testcase as well as the bug on his live system. Thanks also to dchinner for looking at this one with me. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Tested-by: Dave Jones <davej@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: remove int casts from debug dquot soft limit timer assertsBrian Foster2013-01-161-2/+2
| | | | | | | | | | | The int casts here make it easy to trigger an assert with a large soft limit. For example, set a >4TB soft limit on an empty volume to reproduce a (0 > -x) comparison due to an overflow of d_blk_softlimit. Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: fix the multi-segment log buffer formatMark Tinguely2013-01-162-5/+15
| | | | | | | | | | | | Per Dave Chinner suggestion, this patch: 1) Corrects the detection of whether a multi-segment buffer is still tracking data. 2) Clears all the buffer log formats for a multi-segment buffer. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: fix segment in xfs_buf_item_format_segmentMark Tinguely2013-01-161-5/+15
| | | | | | | | | | | | Not every segment in a multi-segment buffer is dirty in a transaction and they will not be outputted. The assert in xfs_buf_item_format_segment() that checks for the at least one chunk of data in the segment to be used is not necessary true for multi-segmented buffers. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: rename bli_format to avoid confusion with bli_formatsMark Tinguely2013-01-163-24/+24
| | | | | | | | | | Rename the bli_format structure to __bli_format to avoid accidently confusing them with the bli_formats pointer. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
* xfs: use b_maps[] for discontiguous buffersMark Tinguely2013-01-162-9/+9
| | | | | | | | | | | | | | | | | | Commits starting at 77c1a08 introduced a multiple segment support to xfs_buf. xfs_trans_buf_item_match() could not find a multi-segment buffer in the transaction because it was looking at the single segment block number rather than the multi-segment b_maps[0].bm.bn. This results on a recursive buffer lock that can never be satisfied. This patch: 1) Changed the remaining b_map accesses to be b_maps[0] accesses. 2) Renames the single segment b_map structure to __b_map to avoid future confusion. Signed-off-by: Mark Tinguely <tinguely@sgi.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ben Myers <bpm@sgi.com>
* Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2012-12-214-17/+18
|\ | | | | | | | | | | | | | | | | | | Pull CIFS fixes from Steve French: "Misc small cifs fixes" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: cifs: eliminate cifsERROR variable cifs: don't compare uniqueids in cifs_prime_dcache unless server inode numbers are in use cifs: fix double-free of "string" in cifs_parse_mount_options
| * cifs: eliminate cifsERROR variableJeff Layton2012-12-202-6/+1
| | | | | | | | | | | | | | It's always set to "1" and there's no way to change it to anything else. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * cifs: don't compare uniqueids in cifs_prime_dcache unless server inode ↵Jeff Layton2012-12-201-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | numbers are in use Oliver reported that commit cd60042c caused his cifs mounts to continually thrash through new inodes on readdir. His servers are not sending inode numbers (or he's not using them), and the new test in that function doesn't account for that sort of setup correctly. If we're not using server inode numbers, then assume that the inode attached to the dentry hasn't changed. Go ahead and update the attributes in place, but keep the same inode number. Cc: <stable@vger.kernel.org> # v3.5+ Reported-and-Tested-by: Oliver Mössinger <Oliver.Moessinger@ichaus.de> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <sfrench@us.ibm.com>
| * cifs: fix double-free of "string" in cifs_parse_mount_optionsJeff Layton2012-12-201-7/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dan reported the following regression in commit d387a5c5: + fs/cifs/connect.c:1903 cifs_parse_mount_options() error: double free of 'string' That patch has some of the new option parsing code free "string" without setting the variable to NULL afterward. Since "string" is automatically freed in an error condition, fix the code to just rely on that instead of freeing it explicitly. Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Steve French <smfrench@gmail.com>
* | Revert "nfsd: warn on odd reply state in nfsd_vfs_read"J. Bruce Fields2012-12-211-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e. This is obviously wrong, and I have no idea how I missed seeing the warning in testing: I must just not have looked at the right logs. The caller bumps rq_resused/rq_next_page, so it will always be hit on a large enough read. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | NFS: Kill fscache warnings when mounting without -ofscTrond Myklebust2012-12-211-4/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The fscache code will currently bleat a "non-unique superblock keys" warning even if the user is mounting without the 'fsc' option. There should be no reason to even initialise the superblock cache cookie unless we're planning on using fscache for something, so ensure that we check for the NFS_OPTION_FSCACHE flag before calling into the fscache code. Reported-by: Paweł Sikora <pawel.sikora@agmk.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: David Howells <dhowells@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | NFS: Provide stub nfs_fscache_wait_on_invalidate() for when CONFIG_NFS_FSCACHE=nDavid Howells2012-12-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Provide a stub nfs_fscache_wait_on_invalidate() function for when CONFIG_NFS_FSCACHE=n lest the following error appear: fs/nfs/inode.c: In function 'nfs_invalidate_mapping': fs/nfs/inode.c:887:2: error: implicit declaration of function 'nfs_fscache_wait_on_invalidate' [-Werror=implicit-function-declaration] cc1: some warnings being treated as errors Reported-by: kbuild test robot <fengguang.wu@intel.com> Reported-by: Vineet Gupta <Vineet.Gupta1@synopsys.com> Reported-by: Borislav Petkov <bp@alien8.de> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge branch 'for-next' of git://git.infradead.org/users/eparis/notifyLinus Torvalds2012-12-2011-104/+152
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull filesystem notification updates from Eric Paris: "This pull mostly is about locking changes in the fsnotify system. By switching the group lock from a spin_lock() to a mutex() we can now hold the lock across things like iput(). This fixes a problem involving unmounting a fs and having inodes be busy, first pointed out by FAT, but reproducible with tmpfs. This also restores signal driven I/O for inotify, which has been broken since about 2.6.32." Ugh. I *hate* the timing of this. It was rebased after the merge window opened, and then left to sit with the pull request coming the day before the merge window closes. That's just crap. But apparently the patches themselves have been around for over a year, just gathering dust, so now it's suddenly critical. Fixed up semantic conflict in fs/notify/fdinfo.c as per Stephen Rothwell's fixes from -next. * 'for-next' of git://git.infradead.org/users/eparis/notify: inotify: automatically restart syscalls inotify: dont skip removal of watch descriptor if creation of ignored event failed fanotify: dont merge permission events fsnotify: make fasync generic for both inotify and fanotify fsnotify: change locking order fsnotify: dont put marks on temporary list when clearing marks by group fsnotify: introduce locked versions of fsnotify_add_mark() and fsnotify_remove_mark() fsnotify: pass group to fsnotify_destroy_mark() fsnotify: use a mutex instead of a spinlock to protect a groups mark list fanotify: add an extra flag to mark_remove_from_mask that indicates wheather a mark should be destroyed fsnotify: take groups mark_lock before mark lock fsnotify: use reference counting for groups fsnotify: introduce fsnotify_get_group() inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()
| * | inotify: automatically restart syscallsEric Paris2012-12-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | We were mistakenly returning EINTR when we found an outstanding signal. Instead we should returen ERESTARTSYS and allow the kernel to handle things the right way. Patch-from: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | inotify: dont skip removal of watch descriptor if creation of ignored event ↵Lino Sanfilippo2012-12-111-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | failed In inotify_ignored_and_remove_idr() the removal of a watch descriptor is skipped if the allocation of an ignored event failed and we are leaking memory (the watch descriptor and the mark linked to it). This patch ensures that the watch descriptor is removed regardless of whether event creation failed or not. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fanotify: dont merge permission eventsLino Sanfilippo2012-12-111-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Boyd Yang reported a problem for the case that multiple threads of the same thread group are waiting for a reponse for a permission event. In this case it is possible that some of the threads are never woken up, even if the response for the event has been received (see http://marc.info/?l=linux-kernel&m=131822913806350&w=2). The reason is that we are currently merging permission events if they belong to the same thread group. But we are not prepared to wake up more than one waiter for each event. We do wait_event(group->fanotify_data.access_waitq, event->response || atomic_read(&group->fanotify_data.bypass_perm)); and after that event->response = 0; which is the reason that even if we woke up all waiters for the same event some of them may see event->response being already set 0 again, then go back to sleep and block forever. With this patch we avoid that more than one thread is waiting for a response by not merging permission events for the same thread group any more. Reported-by: Boyd Yang <boyd.yang@gmail.com> Signed-off-by: Lino Sanfilippo <LinoSanfilipp@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: make fasync generic for both inotify and fanotifyEric Paris2012-12-114-9/+16
| | | | | | | | | | | | | | | | | | | | | | | | inotify is supposed to support async signal notification when information is available on the inotify fd. This patch moves that support to generic fsnotify functions so it can be used by all notification mechanisms. Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: change locking orderLino Sanfilippo2012-12-111-10/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On Mon, Aug 01, 2011 at 04:38:22PM -0400, Eric Paris wrote: > > I finally built and tested a v3.0 kernel with these patches (I know I'm > SOOOOOO far behind). Not what I hoped for: > > > [ 150.937798] VFS: Busy inodes after unmount of tmpfs. Self-destruct in 5 seconds. Have a nice day... > > [ 150.945290] BUG: unable to handle kernel NULL pointer dereference at 0000000000000070 > > [ 150.946012] IP: [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] PGD 2bf9e067 PUD 2bf9f067 PMD 0 > > [ 150.946012] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC > > [ 150.946012] CPU 0 > > [ 150.946012] Modules linked in: nfs lockd fscache auth_rpcgss nfs_acl sunrpc ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_filter ip6_tables ext4 jbd2 crc16 joydev ata_piix i2c_piix4 pcspkr uinput ipv6 autofs4 usbhid [last unloaded: scsi_wait_scan] > > [ 150.946012] > > [ 150.946012] Pid: 2764, comm: syscall_thrash Not tainted 3.0.0+ #1 Red Hat KVM > > [ 150.946012] RIP: 0010:[<ffffffff810ffd58>] [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] RSP: 0018:ffff88002c2e5df8 EFLAGS: 00010282 > > [ 150.946012] RAX: 000000004e370d9f RBX: 0000000000000000 RCX: ffff88003a029438 > > [ 150.946012] RDX: 0000000033630a5f RSI: 0000000000000000 RDI: ffff88003491c240 > > [ 150.946012] RBP: ffff88002c2e5e08 R08: 0000000000000000 R09: 0000000000000000 > > [ 150.946012] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a029428 > > [ 150.946012] R13: ffff88003a029428 R14: ffff88003a029428 R15: ffff88003499a610 > > [ 150.946012] FS: 00007f5a05420700(0000) GS:ffff88003f600000(0000) knlGS:0000000000000000 > > [ 150.946012] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b > > [ 150.946012] CR2: 0000000000000070 CR3: 000000002a662000 CR4: 00000000000006f0 > > [ 150.946012] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > > [ 150.946012] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 > > [ 150.946012] Process syscall_thrash (pid: 2764, threadinfo ffff88002c2e4000, task ffff88002bfbc760) > > [ 150.946012] Stack: > > [ 150.946012] ffff88003a029438 ffff88003a029428 ffff88002c2e5e38 ffffffff81102f76 > > [ 150.946012] ffff88003a029438 ffff88003a029598 ffffffff8160f9c0 ffff88002c221250 > > [ 150.946012] ffff88002c2e5e68 ffffffff8115e9be ffff88002c2e5e68 ffff88003a029438 > > [ 150.946012] Call Trace: > > [ 150.946012] [<ffffffff81102f76>] shmem_evict_inode+0x76/0x130 > > [ 150.946012] [<ffffffff8115e9be>] evict+0x7e/0x170 > > [ 150.946012] [<ffffffff8115ee40>] iput_final+0xd0/0x190 > > [ 150.946012] [<ffffffff8115ef33>] iput+0x33/0x40 > > [ 150.946012] [<ffffffff81180205>] fsnotify_destroy_mark_locked+0x145/0x160 > > [ 150.946012] [<ffffffff81180316>] fsnotify_destroy_mark+0x36/0x50 > > [ 150.946012] [<ffffffff81181937>] sys_inotify_rm_watch+0x77/0xd0 > > [ 150.946012] [<ffffffff815aca52>] system_call_fastpath+0x16/0x1b > > [ 150.946012] Code: 67 4a 00 b8 e4 ff ff ff eb aa 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83 ec 10 48 89 1c 24 4c 89 64 24 08 48 8b 9f 40 05 00 00 > > [ 150.946012] 83 7b 70 00 74 1c 4c 8d a3 80 00 00 00 4c 89 e7 e8 d2 5d 4a > > [ 150.946012] RIP [<ffffffff810ffd58>] shmem_free_inode+0x18/0x50 > > [ 150.946012] RSP <ffff88002c2e5df8> > > [ 150.946012] CR2: 0000000000000070 > > Looks at aweful lot like the problem from: > http://www.spinics.net/lists/linux-fsdevel/msg46101.html > I tried to reproduce this bug with your test program, but without success. However, if I understand correctly, this occurs since we dont hold any locks when we call iput() in mark_destroy(), right? With the patches you tested, iput() is also not called within any lock, since the groups mark_mutex is released temporarily before iput() is called. This is, since the original codes behaviour is similar. However since we now have a mutex as the biggest lock, we can do what you suggested (http://www.spinics.net/lists/linux-fsdevel/msg46107.html) and call iput() with the mutex held to avoid the race. The patch below implements this. It uses nested locking to avoid deadlock in case we do the final iput() on an inode which still holds marks and thus would take the mutex again when calling fsnotify_inode_delete() in destroy_inode(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: dont put marks on temporary list when clearing marks by groupLino Sanfilippo2012-12-111-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In clear_marks_by_group_flags() the mark list of a group is iterated and the marks are put on a temporary list. Since we introduced fsnotify_destroy_mark_locked() we dont need the temp list any more and are able to remove the marks while the mark list is iterated and the mark list mutex is held. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: introduce locked versions of fsnotify_add_mark() and ↵Lino Sanfilippo2012-12-111-12/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fsnotify_remove_mark() This patch introduces fsnotify_add_mark_locked() and fsnotify_remove_mark_locked() which are essentially the same as fsnotify_add_mark() and fsnotify_remove_mark() but assume that the caller has already taken the groups mark mutex. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: pass group to fsnotify_destroy_mark()Lino Sanfilippo2012-12-117-25/+28
| | | | | | | | | | | | | | | | | | | | | | | | In fsnotify_destroy_mark() dont get the group from the passed mark anymore, but pass the group itself as an additional parameter to the function. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: use a mutex instead of a spinlock to protect a groups mark listLino Sanfilippo2012-12-114-14/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | Replaces the groups mark_lock spinlock with a mutex. Using a mutex instead of a spinlock results in more flexibility (i.e it allows to sleep while the lock is held). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fanotify: add an extra flag to mark_remove_from_mask that indicates wheather ↵Lino Sanfilippo2012-12-111-5/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a mark should be destroyed This patch adds an extra flag to mark_remove_from_mask() to inform the caller if the mark should be destroyed. With this we dont destroy the mark implicitly in the function itself any more but let the caller handle it. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: take groups mark_lock before mark lockLino Sanfilippo2012-12-111-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Race-free addition and removal of a mark to a groups mark list would be easier if we could lock the mark list of group before we lock the specific mark. This patch changes the order used to add/remove marks to/from mark lists from 1. mark->lock 2. group->mark_lock 3. inode->i_lock to 1. group->mark_lock 2. mark->lock 3. inode->i_lock Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: use reference counting for groupsLino Sanfilippo2012-12-114-28/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Get a group ref for each mark that is added to the groups list and release that ref when the mark is freed in fsnotify_put_mark(). We also use get a group reference for duplicated marks and for private event data. Now we dont free a group any more when the number of marks becomes 0 but when the groups ref count does. Since this will only happen when all marks are removed from a groups mark list, we dont have to set the groups number of marks to 1 at group creation. Beside clearing all marks in fsnotify_destroy_group() we do also flush the groups event queue. This is since events may hold references to groups (due to private event data) and we have to put those references first before we get a chance to put the final ref, which will result in a call to fsnotify_final_destroy_group(). Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | fsnotify: introduce fsnotify_get_group()Lino Sanfilippo2012-12-111-0/+8
| | | | | | | | | | | | | | | | | | | | | Introduce fsnotify_get_group() which increments the reference counter of a group. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
| * | inotify, fanotify: replace fsnotify_put_group() with fsnotify_destroy_group()Lino Sanfilippo2012-12-113-13/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently in fsnotify_put_group() the ref count of a group is decremented and if it becomes 0 fsnotify_destroy_group() is called. Since a groups ref count is only at group creation set to 1 and never increased after that a call to fsnotify_put_group() always results in a call to fsnotify_destroy_group(). With this patch fsnotify_destroy_group() is called directly. Signed-off-by: Lino Sanfilippo <LinoSanfilippo@gmx.de> Signed-off-by: Eric Paris <eparis@redhat.com>
* | | Merge branch 'akpm' (Andrew's patch-bomb)Linus Torvalds2012-12-2013-23/+85
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge the rest of Andrew's patches for -rc1: "A bunch of fixes and misc missed-out-on things. That'll do for -rc1. I still have a batch of IPC patches which still have a possible bug report which I'm chasing down." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (25 commits) keys: use keyring_alloc() to create module signing keyring keys: fix unreachable code sendfile: allows bypassing of notifier events SGI-XP: handle non-fatal traps fat: fix incorrect function comment Documentation: ABI: remove testing/sysfs-devices-node proc: fix inconsistent lock state linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors memcg: don't register hotcpu notifier from ->css_alloc() checkpatch: warn on uapi #includes that #include <uapi/... revert "rtc: recycle id when unloading a rtc driver" mm: clean up transparent hugepage sysfs error messages hfsplus: add error message for the case of failure of sync fs in delayed_sync_fs() method hfsplus: rework processing of hfs_btree_write() returned error hfsplus: rework processing errors in hfsplus_free_extents() hfsplus: avoid crash on failed block map free kcmp: include linux/ptrace.h drivers/rtc/rtc-imxdi.c: must include <linux/spinlock.h> mm: cma: WARN if freed memory is still in use exec: do not leave bprm->interp on stack ...
| * | | sendfile: allows bypassing of notifier eventsScott Wolchok2012-12-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | do_sendfile() in fs/read_write.c does not call the fsnotify functions, unlike its neighbors. This manifests as a lack of inotify ACCESS events when a file is sent using sendfile(2). Addresses https://bugzilla.kernel.org/show_bug.cgi?id=12812 [akpm@linux-foundation.org: use fsnotify_modify(out.file), not fsnotify_access(), per Dave] Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Scott Wolchok <swolchok@umich.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | fat: fix incorrect function commentRavishankar N2012-12-203-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fat_search_long() returns 0 on success, -ENOENT/ENOMEM on failure. Change the function comment accordingly. While at it, fix some trivial typos. Signed-off-by: Ravishankar N <cyberax82@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | proc: fix inconsistent lock stateXiaotian Feng2012-12-201-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Lockdep found an inconsistent lock state when rcu is processing delayed work in softirq. Currently, kernel is using spin_lock/spin_unlock to protect proc_inum_ida, but proc_free_inum is called by rcu in softirq context. Use spin_lock_bh/spin_unlock_bh fix following lockdep warning. ================================= [ INFO: inconsistent lock state ] 3.7.0 #36 Not tainted --------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes: (proc_inum_lock){+.?...}, at: proc_free_inum+0x1c/0x50 {SOFTIRQ-ON-W} state was registered at: __lock_acquire+0x8ae/0xca0 lock_acquire+0x199/0x200 _raw_spin_lock+0x41/0x50 proc_alloc_inum+0x4c/0xd0 alloc_mnt_ns+0x49/0xc0 create_mnt_ns+0x25/0x70 mnt_init+0x161/0x1c7 vfs_caches_init+0x107/0x11a start_kernel+0x348/0x38c x86_64_start_reservations+0x131/0x136 x86_64_start_kernel+0x103/0x112 irq event stamp: 2993422 hardirqs last enabled at (2993422): _raw_spin_unlock_irqrestore+0x55/0x80 hardirqs last disabled at (2993421): _raw_spin_lock_irqsave+0x29/0x70 softirqs last enabled at (2993394): _local_bh_enable+0x13/0x20 softirqs last disabled at (2993395): call_softirq+0x1c/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(proc_inum_lock); <Interrupt> lock(proc_inum_lock); *** DEADLOCK *** no locks held by swapper/1/0. stack backtrace: Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36 Call Trace: <IRQ> [<ffffffff810a40f1>] ? vprintk_emit+0x471/0x510 print_usage_bug+0x2a5/0x2c0 mark_lock+0x33b/0x5e0 __lock_acquire+0x813/0xca0 lock_acquire+0x199/0x200 _raw_spin_lock+0x41/0x50 proc_free_inum+0x1c/0x50 free_pid_ns+0x1c/0x50 put_pid_ns+0x2e/0x50 put_pid+0x4a/0x60 delayed_put_pid+0x12/0x20 rcu_process_callbacks+0x462/0x790 __do_softirq+0x1b4/0x3b0 call_softirq+0x1c/0x30 do_softirq+0x59/0xd0 irq_exit+0x54/0xd0 smp_apic_timer_interrupt+0x95/0xa3 apic_timer_interrupt+0x72/0x80 cpuidle_enter_tk+0x10/0x20 cpuidle_enter_state+0x17/0x50 cpuidle_idle_call+0x287/0x520 cpu_idle+0xba/0x130 start_secondary+0x2b3/0x2bc Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | hfsplus: add error message for the case of failure of sync fs in ↵Vyacheslav Dubeyko2012-12-201-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | delayed_sync_fs() method Add an error message for the case of failure of sync fs in delayed_sync_fs() method. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | hfsplus: rework processing of hfs_btree_write() returned errorVyacheslav Dubeyko2012-12-203-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add to hfs_btree_write() a return of -EIO on failure of b-tree node searching. Also add logic ofor processing errors from hfs_btree_write() in hfsplus_system_write_inode() with a message about b-tree writing failure. [akpm@linux-foundation.org: reduce scope of `err', print errno on error] Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | hfsplus: rework processing errors in hfsplus_free_extents()Vyacheslav Dubeyko2012-12-201-4/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, it doesn't process error codes from the hfsplus_block_free() call in hfsplus_free_extents() method. Add some error code processing. Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | hfsplus: avoid crash on failed block map freeAlan Cox2012-12-201-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the read fails we kmap an error code. This doesn't end well. Instead print a critical error and pray. This mirrors the rest of the fs behaviour with critical error cases. Acked-by: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Alan Cox <alan@linux.intel.com> Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@lst.de> Cc: Jan Kara <jack@suse.cz> Acked-by: Hin-Tak Leung <htl10@users.sourceforge.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | exec: do not leave bprm->interp on stackKees Cook2012-12-203-2/+22
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a series of scripts are executed, each triggering module loading via unprintable bytes in the script header, kernel stack contents can leak into the command line. Normally execution of binfmt_script and binfmt_misc happens recursively. However, when modules are enabled, and unprintable bytes exist in the bprm->buf, execution will restart after attempting to load matching binfmt modules. Unfortunately, the logic in binfmt_script and binfmt_misc does not expect to get restarted. They leave bprm->interp pointing to their local stack. This means on restart bprm->interp is left pointing into unused stack memory which can then be copied into the userspace argv areas. After additional study, it seems that both recursion and restart remains the desirable way to handle exec with scripts, misc, and modules. As such, we need to protect the changes to interp. This changes the logic to require allocation for any changes to the bprm->interp. To avoid adding a new kmalloc to every exec, the default value is left as-is. Only when passing through binfmt_script or binfmt_misc does an allocation take place. For a proof of concept, see DoTest.sh from: http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/ Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for-linus' of ↵Linus Torvalds2012-12-2064-435/+1096
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull VFS update from Al Viro: "fscache fixes, ESTALE patchset, vmtruncate removal series, assorted misc stuff." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (79 commits) vfs: make lremovexattr retry once on ESTALE error vfs: make removexattr retry once on ESTALE vfs: make llistxattr retry once on ESTALE error vfs: make listxattr retry once on ESTALE error vfs: make lgetxattr retry once on ESTALE vfs: make getxattr retry once on an ESTALE error vfs: allow lsetxattr() to retry once on ESTALE errors vfs: allow setxattr to retry once on ESTALE errors vfs: allow utimensat() calls to retry once on an ESTALE error vfs: fix user_statfs to retry once on ESTALE errors vfs: make fchownat retry once on ESTALE errors vfs: make fchmodat retry once on ESTALE errors vfs: have chroot retry once on ESTALE error vfs: have chdir retry lookup and call once on ESTALE error vfs: have faccessat retry once on an ESTALE error vfs: have do_sys_truncate retry once on an ESTALE error vfs: fix renameat to retry on ESTALE errors vfs: make do_unlinkat retry once on ESTALE errors vfs: make do_rmdir retry once on ESTALE errors vfs: add a flags argument to user_path_parent ...
| * | | vfs: make lremovexattr retry once on ESTALE errorJeff Layton2012-12-201-2/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | vfs: make removexattr retry once on ESTALEJeff Layton2012-12-201-2/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | vfs: make llistxattr retry once on ESTALE errorJeff Layton2012-12-201-2/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | vfs: make listxattr retry once on ESTALE errorJeff Layton2012-12-201-2/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | vfs: make lgetxattr retry once on ESTALEJeff Layton2012-12-201-2/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | vfs: make getxattr retry once on an ESTALE errorJeff Layton2012-12-201-2/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | vfs: allow lsetxattr() to retry once on ESTALE errorsJeff Layton2012-12-201-2/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | vfs: allow setxattr to retry once on ESTALE errorsJeff Layton2012-12-201-2/+7
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | vfs: allow utimensat() calls to retry once on an ESTALE errorJeff Layton2012-12-201-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clearly, we can't handle the NULL filename case, but we can deal with the case where there's a real pathname. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
OpenPOWER on IntegriCloud