summaryrefslogtreecommitdiffstats
path: root/fs/open.c
Commit message (Collapse)AuthorAgeFilesLines
...
* | Merge tag 'xfs-for-linus-3.15-rc1' of git://oss.sgi.com/xfs/xfsLinus Torvalds2014-04-041-3/+26
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull xfs update from Dave Chinner: "There are a couple of new fallocate features in this request - it was decided that it was easiest to push them through the XFS tree using topic branches and have the ext4 support be based on those branches. Hence you may see some overlap with the ext4 tree merge depending on how they including those topic branches into their tree. Other than that, there is O_TMPFILE support, some cleanups and bug fixes. The main changes in the XFS tree for 3.15-rc1 are: - O_TMPFILE support - allowing AIO+DIO writes beyond EOF - FALLOC_FL_COLLAPSE_RANGE support for fallocate syscall and XFS implementation - FALLOC_FL_ZERO_RANGE support for fallocate syscall and XFS implementation - IO verifier cleanup and rework - stack usage reduction changes - vm_map_ram NOIO context fixes to remove lockdep warings - various bug fixes and cleanups" * tag 'xfs-for-linus-3.15-rc1' of git://oss.sgi.com/xfs/xfs: (34 commits) xfs: fix directory hash ordering bug xfs: extra semi-colon breaks a condition xfs: Add support for FALLOC_FL_ZERO_RANGE fs: Introduce FALLOC_FL_ZERO_RANGE flag for fallocate xfs: inode log reservations are still too small xfs: xfs_check_page_type buffer checks need help xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation xfs: use NOIO contexts for vm_map_ram xfs: don't leak EFSBADCRC to userspace xfs: fix directory inode iolock lockdep false positive xfs: allocate xfs_da_args to reduce stack footprint xfs: always do log forces via the workqueue xfs: modify verifiers to differentiate CRC from other errors xfs: print useful caller information in xfs_error_report xfs: add xfs_verifier_error() xfs: add helper for updating checksums on xfs_bufs xfs: add helper for verifying checksums on xfs_bufs xfs: Use defines for CRC offsets in all cases xfs: skip pointless CRC updates after verifier failures xfs: Add support FALLOC_FL_COLLAPSE_RANGE for fallocate ...
| * fs: Introduce FALLOC_FL_ZERO_RANGE flag for fallocateLukas Czerner2014-03-131-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as xfs ioctl XFS_IOC_ZERO_RANGE. It can be used to convert a range of file to zeros preferably without issuing data IO. Blocks should be preallocated for the regions that span holes in the file, and the entire range is preferable converted to unwritten extents - even though file system may choose to zero out the extent or do whatever which will result in reading zeros from the range while the range remains allocated for the file. This can be also used to preallocate blocks past EOF in the same way as with fallocate. Flag FALLOC_FL_KEEP_SIZE which should cause the inode size to remain the same. Signed-off-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
| * fs: Add new flag(FALLOC_FL_COLLAPSE_RANGE) for fallocateNamjae Jeon2014-02-241-3/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is in response of the following post: http://lwn.net/Articles/556136/ "ext4: introduce two new ioctls" Dave chinner suggested that truncate_block_range (which was one of the ioctls name) should be a fallocate operation and not any fs specific ioctl, hence we add this functionality to new flags of fallocate. This new functionality of collapsing range could be used by media editing tools which does non linear editing to quickly purge and edit parts of a media file. This will immensely improve the performance of these operations. The limitation of fs block size aligned offsets can be easily handled by media codecs which are encapsulated in a conatiner as they have to just change the offset to next keyframe value to match the proper alignment. Signed-off-by: Namjae Jeon <namjae.jeon@samsung.com> Signed-off-by: Ashish Sangwan <a.sangwan@samsung.com> Reviewed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Dave Chinner <david@fromorbit.com>
* | vfs: atomic f_pos accesses as per POSIXLinus Torvalds2014-03-101-0/+4
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Our write() system call has always been atomic in the sense that you get the expected thread-safe contiguous write, but we haven't actually guaranteed that concurrent writes are serialized wrt f_pos accesses, so threads (or processes) that share a file descriptor and use "write()" concurrently would quite likely overwrite each others data. This violates POSIX.1-2008/SUSv4 Section XSI 2.9.7 that says: "2.9.7 Thread Interactions with Regular File Operations All of the following functions shall be atomic with respect to each other in the effects specified in POSIX.1-2008 when they operate on regular files or symbolic links: [...]" and one of the effects is the file position update. This unprotected file position behavior is not new behavior, and nobody has ever cared. Until now. Yongzhi Pan reported unexpected behavior to Michael Kerrisk that was due to this. This resolves the issue with a f_pos-specific lock that is taken by read/write/lseek on file descriptors that may be shared across threads or processes. Reported-by: Yongzhi Pan <panyongzhi@gmail.com> Reported-by: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* locks: break delegations on any attribute modificationJ. Bruce Fields2013-11-091-4/+18
| | | | | | | | | | | | | NFSv4 uses leases to guarantee that clients can cache metadata as well as data. Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Cc: David Howells <dhowells@redhat.com> Cc: Tyler Hicks <tyhicks@canonical.com> Cc: Dustin Kirkland <dustin.kirkland@gazzang.com> Acked-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of s_files and files_lockAl Viro2013-11-091-2/+0
| | | | | | | | The only thing we need it for is alt-sysrq-r (emergency remount r/o) and these days we can do just as well without going through the list of files. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* file->f_op is never NULL...Al Viro2013-10-241-2/+6
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: improve i_op->atomic_open() documentationMiklos Szeredi2013-09-161-3/+18
| | | | | | | | | | | | | | | | Fix documentation of ->atomic_open() and related functions: finish_open() and finish_no_open(). Also add details that seem to be unclear and a source of bugs (some of which are fixed in the following series). Cc-ing maintainers of all filesystems implementing ->atomic_open(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Cc: Eric Van Hensbergen <ericvh@gmail.com> Cc: Sage Weil <sage@inktank.com> Cc: Steve French <sfrench@samba.org> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-09-071-1/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace changes from Eric Biederman: "This is an assorted mishmash of small cleanups, enhancements and bug fixes. The major theme is user namespace mount restrictions. nsown_capable is killed as it encourages not thinking about details that need to be considered. A very hard to hit pid namespace exiting bug was finally tracked and fixed. A couple of cleanups to the basic namespace infrastructure. Finally there is an enhancement that makes per user namespace capabilities usable as capabilities, and an enhancement that allows the per userns root to nice other processes in the user namespace" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: userns: Kill nsown_capable it makes the wrong thing easy capabilities: allow nice if we are privileged pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD userns: Allow PR_CAPBSET_DROP in a user namespace. namespaces: Simplify copy_namespaces so it is clear what is going on. pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup sysfs: Restrict mounting sysfs userns: Better restrictions on when proc and sysfs can be mounted vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces kernel/nsproxy.c: Improving a snippet of code. proc: Restrict mounting the proc filesystem vfs: Lock in place mounts from more privileged users
| * userns: Kill nsown_capable it makes the wrong thing easyEric W. Biederman2013-08-301-1/+1
| | | | | | | | | | | | | | | | | | | | nsown_capable is a special case of ns_capable essentially for just CAP_SETUID and CAP_SETGID. For the existing users it doesn't noticably simplify things and from the suggested patches I have seen it encourages people to do the wrong thing. So remove nsown_capable. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* | switch fchmod() to fdgetAl Viro2013-09-031-6/+5
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | fs: Fix file mode for O_TMPFILEAndy Lutomirski2013-08-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | O_TMPFILE, like O_CREAT, should respect the requested mode and should create regular files. This fixes two bugs: O_TMPFILE required privilege (because the mode ended up as 000) and it produced bogus inodes with no type. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | allow O_TMPFILE to work with O_WRONLYAl Viro2013-07-201-0/+2
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Safer ABI for O_TMPFILEAl Viro2013-07-131-2/+2
| | | | | | | | [suggested by Rasmus Villemoes] make O_DIRECTORY | O_RDWR part of O_TMPFILE; that will fail on old kernels in a lot more cases than what I came up with. And make sure O_CREAT doesn't get there... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [O_TMPFILE] it's still short a few helpers, but infrastructure should be OK ↵Al Viro2013-06-291-5/+9
| | | | | | now... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* allow build_open_flags() to return an errorAl Viro2013-06-291-21/+28
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protectAl Viro2013-03-031-20/+4
| | | | | | | ... and switch i386 to HAVE_SYSCALL_WRAPPERS, killing open-coded uses of asmlinkage_protect() in a bunch of syscalls. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* teach SYSCALL_DEFINE<n> how to deal with long long/unsigned long longAl Viro2013-03-031-25/+3
| | | | | | | ... and convert a bunch of SYSCALL_DEFINE ones to SYSCALL_DEFINE<n>, killing the boilerplate crap around them. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2013-03-031-1/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more VFS bits from Al Viro: "Unfortunately, it looks like xattr series will have to wait until the next cycle ;-/ This pile contains 9p cleanups and fixes (races in v9fs_fid_add() etc), fixup for nommu breakage in shmem.c, several cleanups and a bit more file_inode() work" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: constify path_get/path_put and fs_struct.c stuff fix nommu breakage in shmem.c cache the value of file_inode() in struct file 9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry 9p: make sure ->lookup() adds fid to the right dentry 9p: untangle ->lookup() a bit 9p: double iput() in ->lookup() if d_materialise_unique() fails 9p: v9fs_fid_add() can't fail now v9fs: get rid of v9fs_dentry 9p: turn fid->dlist into hlist 9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine more file_inode() open-coded instances selinux: opened file can't have NULL or negative ->f_path.dentry (In the meantime, the hlist traversal macros have changed, so this required a semantic conflict fixup for the newly hlistified fid->dlist)
| * cache the value of file_inode() in struct fileAl Viro2013-03-011-1/+2
| | | | | | | | | | | | | | Note that this thing does *not* contribute to inode refcount; it's pinned down by dentry. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for-linus' of ↵Linus Torvalds2013-03-021-0/+15
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal/compat fixes from Al Viro: "Fixes for several regressions introduced in the last signal.git pile, along with fixing bugs in truncate and ftruncate compat (on just about anything biarch at least one of those two had been done wrong)." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: compat: restore timerfd settime and gettime compat syscalls [regression] braino in "sparc: convert to ksignal" fix compat truncate/ftruncate switch lseek to COMPAT_SYSCALL_DEFINE lseek() and truncate() on sparc really need sign extension
| * fix compat truncate/ftruncateAl Viro2013-02-251-0/+15
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zeroAl Viro2013-02-261-1/+0
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | propagate error from get_empty_filp() to its callersAl Viro2013-02-221-14/+13
| | | | | | | | | | | | Based on parts from Anatol's patch (the rest is the next commit). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | new helper: file_inode(file)Al Viro2013-02-221-3/+3
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: make fchownat retry once on ESTALE errorsJeff Layton2012-12-201-0/+5
| | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: make fchmodat retry once on ESTALE errorsJeff Layton2012-12-201-2/+7
| | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: have chroot retry once on ESTALE errorJeff Layton2012-12-201-2/+7
| | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: have chdir retry lookup and call once on ESTALE errorJeff Layton2012-12-201-2/+7
| | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: have faccessat retry once on an ESTALE errorJeff Layton2012-12-201-2/+7
| | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: have do_sys_truncate retry once on an ESTALE errorJeff Layton2012-12-201-1/+7
| | | | | Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Make more complete truncate operation available to CacheFilesDavid Howells2012-12-201-23/+27
| | | | | | | | | Make a more complete truncate operation available to CacheFiles (including security checks and suchlike) so that it can use this to clear invalidated cache files. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: Allow chroot if you have CAP_SYS_CHROOT in your user namespaceEric W. Biederman2012-11-191-1/+1
| | | | | | | | | Once you are confined to a user namespace applications can not gain privilege and escape the user namespace so there is no longer a reason to restrict chroot. Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* vfs: make path_openat take a struct filename pointerJeff Layton2012-10-121-4/+21
| | | | | | | | | | | | | ...and fix up the callers. For do_file_open_root, just declare a struct filename on the stack and fill out the .name field. For do_filp_open, make it also take a struct filename pointer, and fix up its callers to call it appropriately. For filp_open, add a variant that takes a struct filename pointer and turn filp_open into a wrapper around it. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vfs: define struct filename and have getname() return itJeff Layton2012-10-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | getname() is intended to copy pathname strings from userspace into a kernel buffer. The result is just a string in kernel space. It would however be quite helpful to be able to attach some ancillary info to the string. For instance, we could attach some audit-related info to reduce the amount of audit-related processing needed. When auditing is enabled, we could also call getname() on the string more than once and not need to recopy it from userspace. This patchset converts the getname()/putname() interfaces to return a struct instead of a string. For now, the struct just tracks the string in kernel space and the original userland pointer for it. Later, we'll add other information to the struct as it becomes convenient. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* audit: set the name_len in audit_inode for parent lookupsJeff Layton2012-10-121-2/+2
| | | | | | | | | | | | | | | | | | Currently, this gets set mostly by happenstance when we call into audit_inode_child. While that might be a little more efficient, it seems wrong. If the syscall ends up failing before audit_inode_child ever gets called, then you'll have an audit_names record that shows the full path but has the parent inode info attached. Fix this by passing in a parent flag when we call audit_inode that gets set to the value of LOOKUP_PARENT. We can then fix up the pathname for the audit entry correctly from the get-go. While we're at it, clean up the no-op macro for audit_inode in the !CONFIG_AUDITSYSCALL case. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2012-10-021-100/+30
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs update from Al Viro: - big one - consolidation of descriptor-related logics; almost all of that is moved to fs/file.c (BTW, I'm seriously tempted to rename the result to fd.c. As it is, we have a situation when file_table.c is about handling of struct file and file.c is about handling of descriptor tables; the reasons are historical - file_table.c used to be about a static array of struct file we used to have way back). A lot of stray ends got cleaned up and converted to saner primitives, disgusting mess in android/binder.c is still disgusting, but at least doesn't poke so much in descriptor table guts anymore. A bunch of relatively minor races got fixed in process, plus an ext4 struct file leak. - related thing - fget_light() partially unuglified; see fdget() in there (and yes, it generates the code as good as we used to have). - also related - bits of Cyrill's procfs stuff that got entangled into that work; _not_ all of it, just the initial move to fs/proc/fd.c and switch of fdinfo to seq_file. - Alex's fs/coredump.c spiltoff - the same story, had been easier to take that commit than mess with conflicts. The rest is a separate pile, this was just a mechanical code movement. - a few misc patches all over the place. Not all for this cycle, there'll be more (and quite a few currently sit in akpm's tree)." Fix up trivial conflicts in the android binder driver, and some fairly simple conflicts due to two different changes to the sock_alloc_file() interface ("take descriptor handling from sock_alloc_file() to callers" vs "net: Providing protocol type via system.sockprotoname xattr of /proc/PID/fd entries" adding a dentry name to the socket) * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits) MAX_LFS_FILESIZE should be a loff_t compat: fs: Generic compat_sys_sendfile implementation fs: push rcu_barrier() from deactivate_locked_super() to filesystems btrfs: reada_extent doesn't need kref for refcount coredump: move core dump functionality into its own file coredump: prevent double-free on an error path in core dumper usb/gadget: fix misannotations fcntl: fix misannotations ceph: don't abuse d_delete() on failure exits hypfs: ->d_parent is never NULL or negative vfs: delete surplus inode NULL check switch simple cases of fget_light to fdget new helpers: fdget()/fdput() switch o2hb_region_dev_write() to fget_light() proc_map_files_readdir(): don't bother with grabbing files make get_file() return its argument vhost_set_vring(): turn pollstart/pollstop into bool switch prctl_set_mm_exe_file() to fget_light() switch xfs_find_handle() to fget_light() switch xfs_swapext() to fget_light() ...
| * switch simple cases of fget_light to fdgetAl Viro2012-09-261-34/+30
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * switch fchmod(2) to fget_light()Al Viro2012-09-261-7/+5
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * switch fallocate(2) to fget_light()Al Viro2012-09-261-3/+3
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * switch ftruncate(2) to fget_lightAl Viro2012-09-261-5/+5
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * don't leak O_CLOEXEC into ->f_flagsAl Viro2012-09-261-1/+1
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * take descriptor-related part of close() to file.cAl Viro2012-09-261-21/+1
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * move put_unused_fd() and fd_install() to fs/file.cAl Viro2012-09-261-44/+0
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | Merge branch 'for-linus' of ↵Linus Torvalds2012-10-021-1/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull user namespace changes from Eric Biederman: "This is a mostly modest set of changes to enable basic user namespace support. This allows the code to code to compile with user namespaces enabled and removes the assumption there is only the initial user namespace. Everything is converted except for the most complex of the filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs, nfs, ocfs2 and xfs as those patches need a bit more review. The strategy is to push kuid_t and kgid_t values are far down into subsystems and filesystems as reasonable. Leaving the make_kuid and from_kuid operations to happen at the edge of userspace, as the values come off the disk, and as the values come in from the network. Letting compile type incompatible compile errors (present when user namespaces are enabled) guide me to find the issues. The most tricky areas have been the places where we had an implicit union of uid and gid values and were storing them in an unsigned int. Those places were converted into explicit unions. I made certain to handle those places with simple trivial patches. Out of that work I discovered we have generic interfaces for storing quota by projid. I had never heard of the project identifiers before. Adding full user namespace support for project identifiers accounts for most of the code size growth in my git tree. Ultimately there will be work to relax privlige checks from "capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing root in a user names to do those things that today we only forbid to non-root users because it will confuse suid root applications. While I was pushing kuid_t and kgid_t changes deep into the audit code I made a few other cleanups. I capitalized on the fact we process netlink messages in the context of the message sender. I removed usage of NETLINK_CRED, and started directly using current->tty. Some of these patches have also made it into maintainer trees, with no problems from identical code from different trees showing up in linux-next. After reading through all of this code I feel like I might be able to win a game of kernel trivial pursuit." Fix up some fairly trivial conflicts in netfilter uid/git logging code. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits) userns: Convert the ufs filesystem to use kuid/kgid where appropriate userns: Convert the udf filesystem to use kuid/kgid where appropriate userns: Convert ubifs to use kuid/kgid userns: Convert squashfs to use kuid/kgid where appropriate userns: Convert reiserfs to use kuid and kgid where appropriate userns: Convert jfs to use kuid/kgid where appropriate userns: Convert jffs2 to use kuid and kgid where appropriate userns: Convert hpfs to use kuid and kgid where appropriate userns: Convert btrfs to use kuid/kgid where appropriate userns: Convert bfs to use kuid/kgid where appropriate userns: Convert affs to use kuid/kgid wherwe appropriate userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids userns: On ia64 deal with current_uid and current_gid being kuid and kgid userns: On ppc convert current_uid from a kuid before printing. userns: Convert s390 getting uid and gid system calls to use kuid and kgid userns: Convert s390 hypfs to use kuid and kgid where appropriate userns: Convert binder ipc to use kuids userns: Teach security_path_chown to take kuids and kgids userns: Add user namespace support to IMA userns: Convert EVM to deal with kuids and kgids in it's hmac computation ...
| * userns: Teach security_path_chown to take kuids and kgidsEric W. Biederman2012-09-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Don't make the security modules deal with raw user space uid and gids instead pass in a kuid_t and a kgid_t so that security modules only have to deal with internal kernel uids and gids. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: James Morris <james.l.morris@oracle.com> Cc: John Johansen <john.johansen@canonical.com> Cc: Kentaro Takeda <takedakn@nttdata.co.jp> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* | vfs: canonicalize create mode in build_open_flags()Miklos Szeredi2012-08-151-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Userspace can pass weird create mode in open(2) that we canonicalize to "(mode & S_IALLUGO) | S_IFREG" in vfs_create(). The problem is that we use the uncanonicalized mode before calling vfs_create() with unforseen consequences. So do the canonicalization early in build_open_flags(). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> Tested-by: Richard W.M. Jones <rjones@redhat.com> CC: stable@vger.kernel.org
* | missed mnt_drop_write() in do_dentry_open()Al Viro2012-08-041-1/+1
|/ | | | | | | This one ought to be __mnt_drop_write(), to match __mnt_want_write() in the beginning... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: Protect write paths by sb_start_write - sb_end_writeJan Kara2012-07-311-1/+6
| | | | | | | | | | | | | | | | | | | | | | There are several entry points which dirty pages in a filesystem. mmap (handled by block_page_mkwrite()), buffered write (handled by __generic_file_aio_write()), splice write (generic_file_splice_write), truncate, and fallocate (these can dirty last partial page - handled inside each filesystem separately). Protect these places with sb_start_write() and sb_end_write(). ->page_mkwrite() calls are particularly complex since they are called with mmap_sem held and thus we cannot use standard sb_start_write() due to lock ordering constraints. We solve the problem by using a special freeze protection sb_start_pagefault() which ranks below mmap_sem. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fs: Add freezing handling to mnt_want_write() / mnt_drop_write()Jan Kara2012-07-311-1/+1
| | | | | | | | | | | | | | | | Most of places where we want freeze protection coincides with the places where we also have remount-ro protection. So make mnt_want_write() and mnt_drop_write() (and their _file alternative) prevent freezing as well. For the few cases that are really interested only in remount-ro protection provide new function variants. BugLink: https://bugs.launchpad.net/bugs/897421 Tested-by: Kamal Mostafa <kamal@canonical.com> Tested-by: Peter M. Petrakis <peter.petrakis@canonical.com> Tested-by: Dann Frazier <dann.frazier@canonical.com> Tested-by: Massimo Morana <massimo.morana@canonical.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
OpenPOWER on IntegriCloud