summaryrefslogtreecommitdiffstats
path: root/fs
Commit message (Collapse)AuthorAgeFilesLines
* [ARM] fs/adfs/adfs.h: "extern inline" doesn't make senseAdrian Bunk2005-08-201-2/+2
| | | | | | | "extern inline" doesn't make sense. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* Merge master.kernel.org:/pub/scm/linux/kernel/git/aia21/ntfs-2.6Linus Torvalds2005-08-172-1/+2
|\
| * NTFS: Complete the previous fix for the unset device when mapping buffersAnton Altaparmakov2005-08-162-1/+2
| | | | | | | | | | | | | | for mft record writing. I had missed the writepage based mft record write code path. Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
* | [PATCH] nfsd to unlock kernel before exitingSteven Rostedt2005-08-171-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The nfsd holds the big kernel lock upon exit, when it really shouldn't. Not to mention that this breaks Ingo's RT patch. This is a trivial fix to release the lock. Ingo, this patch also works with your kernel, and stops the problem with nfsd. Note, there's a "goto out;" where "out:" is right above svc_exit_thread. The point of the goto also holds the kernel_lock, so I don't see any problem here in releasing it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | Merge head 'for-linus' of ↵Linus Torvalds2005-08-165-24/+34
|\ \ | |/ |/| | | master.kernel.org:/pub/scm/linux/kernel/git/shaggy/jfs-2.6
| * Merge with /home/shaggy/git/linus-clean/Dave Kleikamp2005-08-105-15/+16
| |\ | | | | | | | | | Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * | JFS: Fix race in txLockDave Kleikamp2005-08-102-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | TxAnchor.anon_list is protected by jfsTxnLock (TXN_LOCK), but there was a place in txLock() that was removing an entry from the list without holding the spinlock. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * | Merge with /home/shaggy/git/linus-clean/Dave Kleikamp2005-08-0413-20/+57
| |\ \ | | | | | | | | | | | | Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * | | JFS: Check for invalid inodes in jfs_delete_inodeDave Kleikamp2005-08-011-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Some error paths may iput an invalid inode with i_nlink=0. jfs should not try to actually delete such an inode. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * | | Merge with /home/shaggy/git/linus-clean/Dave Kleikamp2005-07-2815-45/+149
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | /home/shaggy/git/linus-clean/ /home/shaggy/git/linus-clean/ Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * \ \ \ Merge with /home/shaggy/git/linus-clean/Dave Kleikamp2005-07-271-54/+71
| |\ \ \ \ | | | | | | | | | | | | | | | | | | Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * | | | | JFS: Improve sync barrier processingDave Kleikamp2005-07-274-24/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Under heavy load, hot metadata pages are often locked by non-committed transactions, making them difficult to flush to disk. This prevents the sync point from advancing past a transaction that had modified the page. There is a point during the sync barrier processing where all outstanding transactions have been committed to disk, but no new transaction have been allowed to proceed. This is the best time to write the metadata. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
* | | | | | Merge master.kernel.org:/pub/scm/linux/kernel/git/aia21/ntfs-2.6Linus Torvalds2005-08-162-0/+5
|\ \ \ \ \ \
| * | | | | | NTFS: Fix bug in mft record writing where we forgot to set the device inAnton Altaparmakov2005-08-162-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | the buffers when mapping them after the VM had discarded them. Thanks to Martin MOKREJŠ for the bug report. Signed-off-by: Anton Altaparmakov <aia21@cantab.net>
* | | | | | | [PATCH] NFS: Ensure we always update inode->i_mode when doing O_EXCL createsTrond Myklebust2005-08-164-15/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the client performs an exclusive create and opens the file for writing, a Netapp filer will first create the file using the mode 01777. It does this since an NFSv3/v4 exclusive create cannot immediately set the mode bits. The 01777 mode then gets put into the inode->i_mode. After the file creation is successful, we then do a setattr to change the mode to the correct value (as per the NFS spec). The problem is that nfs_refresh_inode() no longer updates inode->i_mode, so the latter retains the 01777 mode. A bit later, the VFS notices this, and calls remove_suid(). This of course now resets the file mode to inode->i_mode & 0777. Hey presto, the file mode on the server is now magically changed to 0777. Duh... Fixes http://bugzilla.linux-nfs.org/show_bug.cgi?id=32 Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | | | | [PATCH] NFS: Ensure ACL xdr code doesn't overflow.Trond Myklebust2005-08-161-0/+1
|/ / / / / / | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | | | [PATCH] inotify: add MOVE_SELF eventJohn McCutchan2005-08-151-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a MOVE_SELF event to inotify. It is sent whenever the inode you are watching is moved. We need this event so that we can catch something like this: - app1: watch /etc/mtab - app2: cp /etc/mtab /tmp/mtab-work mv /etc/mtab /etc/mtab~ mv /tmp/mtab-work /etc/mtab app1 still thinks it's watching /etc/mtab but it's actually watching /etc/mtab~. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | | | [PATCH] inotify: fix idr_get_new_above usageRobert Love2005-08-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are saving the wrong thing in ->last_wd. We want the wd, not the return value. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | | | [PATCH] CIFS: Fix path name conversion for long filenamesSteve French2005-08-142-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix path name conversion for long filenames when mapchars mount option was specified at mount time. Signed-off-by: Steve French (sfrench@us.ibm.com) Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | | | [PATCH] CIFS: Fix missing entries in search resultsSteve French2005-08-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix missing entries in search results when very long file names and more than 50 (or so) of such long search entries in the directory. FindNext could send corrupt last byte of resume name when resume key was a few hundred bytes long file name or longer. Fixes Samba Bug # 2932 Signed-off-by: Steve French (sfrench@us.ibm.com) Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | | | [PATCH] Fix error handling in reiserfsJan Kara2005-08-131-0/+3
| |_|_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Initialize key object ID in inode so that we don't try to remove the inode when we fail on some checks even before we manage to allocate something. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | | [PATCH] fsnotify_name/inoderemoveJohn McCutchan2005-08-082-3/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch below unhooks fsnotify from vfs_unlink & vfs_rmdir. It introduces two new fsnotify calls, that are hooked in at the dcache level. This not only more closely matches how the VFS layer works, it also avoids the problem with locking and inode lifetimes. The two functions are - fsnotify_nameremove -- called when a directory entry is going away. It notifies the PARENT of the deletion. This is called from d_delete(). - inoderemove -- called when the files inode itself is going away. It notifies the inode that is being deleted. This is called from dentry_iput(). Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | | [PATCH] namespace.c: fix bind mount from foreign namespaceMiklos Szeredi2005-08-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'm resending this patch, because I still believe it's the correct fix. Tested before/after applying the patch with a test application available from: http://www.inf.bme.hu/~mszeredi/nstest.c Bind mount from a foreign namespace results in an un-removable mount. The reason is that mnt->mnt_namespace is copied from the old mount in clone_mnt(). Because of this check_mnt() in sys_umount() will fail. The solution is to set mnt->mnt_namespace to current->namespace in clone_mnt(). clone_mnt() is either called from do_loopback() or copy_tree(). copy_tree() is called from do_loopback() or copy_namespace(). When called (directly or indirectly) from do_loopback(), always current->namspace is being modified: check_mnt(nd->mnt). So setting mnt->mnt_namespace to current->namspace is the right thing to do. When called from copy_namespace(), the setting of mnt_namespace is irrelevant, since mnt_namespace is reset later in that function for all copied mounts. Jamie said: This patch is correct. The old code was buggy for more fundamental and serious reason: it broke the invariant that a tree of vfsmnts all have the same value of mnt_namespace (and the same for the mnt_list list). Signed-off-by: Miklos Szeredi <miklos@szeredi.hu> Acked-by: Jamie Lokier <jamie@shareable.org> Cc: <viro@parcelfarce.linux.theplanet.co.uk> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | | [PATCH] __bio_clone() dead commentAndrew Morton2005-08-071-6/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove a very wrong comment. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | | Check input buffer size in zisofsLinus Torvalds2005-08-061-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This uses the new deflateBound() thing to sanity-check the input to the zlib decompressor before we even bother to start reading in the blocks. Problem noted by Tim Yamin <plasmaroo@gentoo.org>
* | | | | [PATCH] Clean up inotify delete race fixJohn McCutchan2005-08-041-7/+2
| |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This avoids the whole #ifdef mess by just getting a copy of dentry->d_inode before d_delete is called - that makes the codepaths the same for the INOTIFY/DNOTIFY cases as for the regular no-notify case. I've been running this under a Gnome session for the last 10 minutes. Inotify is being used extensively. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] inotify delete race fixJohn McCutchan2005-08-041-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The included patch fixes a problem where a inotify client would receive a delete event before the file was actually deleted. The bug affects both dnotify & inotify. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] inotify: update help textRobert Love2005-08-041-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inotify help text still refers to the character device. Update it. Fixes kernel bug #4993. Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] hfs: don't reference missing pageRoman Zippel2005-08-012-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there was a read error, the bnode might miss some pages, so skip them. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] hfs: don't dirty unchanged inodeRoman Zippel2005-08-012-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If inode size hasn't changed, don't do anything further in truncate, which also prevents a dirty inode, what might upset some readonly devices quite badly. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] inotify: fix race between the kernel and user spaceJohn McCutchan2005-08-011-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When you rm a watch, an IN_IGNORED event is sent down the event queue with the watch descriptor that you just rm'd. If you then add a watch you could get the ignored watch's wd and if you haven't read the entire event queue, user space will think that it's newly created watch was just ignored. To avoid this problem we just use idr_get_new_above instead of idr_get_new. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] inotify: fix file deletion by rename detectionJohn McCutchan2005-08-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a file is moved over an existing file that you are watching, inotify won't send you a DELETE_SELF event and it won't unref the inode until the inotify instance is closed by the application. Signed-off-by: John McCutchan <ttb@tentacle.dhs.org> Signed-off-by: Robert Love <rml@novell.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] sysfs: fix sysfs_setattrManeesh Soni2005-07-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o sysfs_dirent's s_mode field should also be updated in sysfs_setattr(), else there could be inconsistency in the two fields. s_mode is used while ->readdir so as not to bring in the inode to cache. Signed-off-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] sysfs: fix sysfs_chmod_fileManeesh Soni2005-07-291-8/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | o sysfs_chmod_file() must update the new iattr field in sysfs_dirent else the mode change will not be persistent in case of inode evacuation from cache. Signed-off-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] uml: implement hostfs syncingPaolo 'Blaisorblade' Giarrusso2005-07-283-2/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Actually implement the hostfs "sync" method. Signed-off-by: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] bio_clone fixAndrew Morton2005-07-281-0/+1
| |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix bug introduced in 2.6.11-rc2: when we clone a BIO we need to copy over the current index into it as well. It corrupts data with some MD setups. See http://bugzilla.kernel.org/show_bug.cgi?id=4946 Huuuuuuuuge thanks to Matthew Stapleton <matthew4196@gmail.com> for doggedly chasing this one down. Acked-by: Jens Axboe <axboe@suse.de> Cc: <linux-raid@vger.kernel.org> Cc: <dm-devel@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | Merge head 'for-linus' of ↵Linus Torvalds2005-07-274-31/+42
|\ \ \ | |/ / | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/shaggy/jfs-2.6
| * | JFS: Fix i_blocks accounting when allocation failsDave Kleikamp2005-07-261-4/+9
| | | | | | | | | | | | | | | | | | | | | A failure in dbAlloc caused a directory's i_blocks to be incorrectly incremented, causing jfs_fsck to find the inode to be corrupt. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * | JFS: Don't set log_SYNCBARRIER when log->active == 0Dave Kleikamp2005-07-251-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a metadata page is kept active, it is possible that the sync barrier logic continues to trigger, even if all active transactions have been phyically written to the journal. This can cause a hang, since the completion of the journal I/O is what unsets the sync barrier flag to allow new transactions to be created. Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * | JFS: Fix typo in last patchDave Kleikamp2005-07-221-1/+1
| | | | | | | | | | | | Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * | Merge with /home/shaggy/git/linus-clean/Dave Kleikamp2005-07-1930-1008/+2632
| |\ \ | | | | | | | | | | | | Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * | | JFS: fsync wrong behavior when I/O failure occursQu Fuping2005-07-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is half of a patch that Qu Fuping submitted in April. The first part was applied to fs/mpage.c in 2.6.12-rc4. jfs_fsync should return error, but it doesn't wait for the metadata page to be uptodate, e.g.: jfs_fsync->jfs_commit_inode->txCommit->diWrite->read_metapage-> __get_metapage->read_cache_page reads a page from disk. Because read is async, when read_cache_page: err = filler(data, page), filler will not return error, it just submits I/O request and returns. So, page is not uptodate. Checking only if(IS_ERROR(mp->page)) is not enough, we should add "|| !PageUptodate(mp->page)" Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * | | JFS: Remove assert statement in dbJoin & return -EIO insteadDave Kleikamp2005-07-151-16/+30
| | | | | | | | | | | | | | | | Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
| * | | JFS: Remove bogus WARN_ON statement and some dead codeDave Kleikamp2005-07-141-9/+0
| | | | | | | | | | | | | | | | Signed-off-by: Dave Kleikamp <shaggy@austin.ibm.com>
* | | | [PATCH] clean up inline static vs static inlineJesper Juhl2005-07-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | `gcc -W' likes to complain if the static keyword is not at the beginning of the declaration. This patch fixes all remaining occurrences of "inline static" up with "static inline" in the entire kernel tree (140 occurrences in 47 files). While making this change I came across a few lines with trailing whitespace that I also fixed up, I have also added or removed a blank line or two here and there, but there are no functional changes in the patch. Signed-off-by: Jesper Juhl <juhl-lkml@dif.dk> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] turn many #if $undefined_string into #ifdef $undefined_stringOlaf Hering2005-07-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | turn many #if $undefined_string into #ifdef $undefined_string to fix some warnings after -Wno-def was added to global CFLAGS Signed-off-by: Olaf Hering <olh@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] reiserfs doesn't use mbcacheAndreas Gruenbacher2005-07-271-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | reiserfs doesn't use the mbcache, so this can go. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] mbcache: Remove unused mb_cache_shrink parameterAndreas Gruenbacher2005-07-273-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cache parameter to mb_cache_shrink isn't used. We may as well remove it. Signed-off-by: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] stale POSIX lock handlingPeter Staubach2005-07-272-35/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I believe that there is a problem with the handling of POSIX locks, which the attached patch should address. The problem appears to be a race between fcntl(2) and close(2). A multithreaded application could close a file descriptor at the same time as it is trying to acquire a lock using the same file descriptor. I would suggest that that multithreaded application is not providing the proper synchronization for itself, but the OS should still behave correctly. SUS3 (Single UNIX Specification Version 3, read: POSIX) indicates that when a file descriptor is closed, that all POSIX locks on the file, owned by the process which closed the file descriptor, should be released. The trick here is when those locks are released. The current code releases all locks which exist when close is processing, but any locks in progress are handled when the last reference to the open file is released. There are three cases to consider. One is the simple case, a multithreaded (mt) process has a file open and races to close it and acquire a lock on it. In this case, the close will release one reference to the open file and when the fcntl is done, it will release the other reference. For this situation, no locks should exist on the file when both the close and fcntl operations are done. The current system will handle this case because the last reference to the open file is being released. The second case is when the mt process has dup(2)'d the file descriptor. The close will release one reference to the file and the fcntl, when done, will release another, but there will still be at least one more reference to the open file. One could argue that the existence of a lock on the file after the close has completed is okay, because it was acquired after the close operation and there is still a way for the application to release the lock on the file, using an existing file descriptor. The third case is when the mt process has forked, after opening the file and either before or after becoming an mt process. In this case, each process would hold a reference to the open file. For each process, this degenerates to first case above. However, the lock continues to exist until both processes have released their references to the open file. This lock could block other lock requests. The changes to release the lock when the last reference to the open file aren't quite right because they would allow the lock to exist as long as there was a reference to the open file. This is too long. The new proposed solution is to add support in the fcntl code path to detect a race with close and then to release the lock which was just acquired when such as race is detected. This causes locks to be released in a timely fashion and for the system to conform to the POSIX semantic specification. This was tested by instrumenting a kernel to detect the handling locks and then running a program which generates case #3 above. A dangling lock could be reliably generated. When the changes to detect the close/fcntl race were added, a dangling lock could no longer be generated. Cc: Matthew Wilcox <willy@debian.org> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | | | [PATCH] fix xip sparse file handling in ext2Carsten Otte2005-07-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oliver Paukstadt from our test department is testing the xip patches in Linus' git-tree. He found a problem that shows when reading a file that contains sparse blocks (holes) on a -o xip mounted ext2 filesystem: the BUG_ON() in fs/ext2/xip.c:40 triggers where it should not. The problem was introduced by a cleanup in my previous patch, this patch fixes it. Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
OpenPOWER on IntegriCloud