summaryrefslogtreecommitdiffstats
path: root/sys/kern/vfs_syscalls.c
Commit message (Collapse)AuthorAgeFilesLines
* Don't add VAPPEND if the file is not being opened for writing. Note that thistrasz2009-12-081-1/+1
| | | | | | | only affects cases where open(2) is being used improperly - i.e. when the user specifies O_APPEND without O_WRONLY or O_RDWR. Reviewed by: rwatson
* In fhopen, vfs_ref() the mount point while vnode is unlocked, to preventkib2009-09-061-1/+3
| | | | | | | | | | | vn_start_write(NULL, &mp) from operating on potentially freed or reused struct mount *. Remove unmatched vfs_rel() in cleanup. Noted and reviewed by: tegge Tested by: pho MFC after: 3 days
* Honor the vfs.timestamp_precision sysctl settings for utimes(path, NULL)kib2009-08-261-2/+1
| | | | | | | and similar calls. Obtained from: Petr Salinger, Debian GNU/kFreeBSD, Debian bug #489894 MFC after: 3 days
* Fix some LORs between vnode locks and filedescriptor table locks.jhb2009-07-311-8/+0
| | | | | | | | | | - Don't grab the filedesc lock just to read fd_cmask. - Drop vnode locks earlier when mounting the root filesystem and before sanitizing stdin/out/err file descriptors during execve(). Submitted by: kib Approved by: re (rwatson) MFC after: 1 week
* Rework vnode argument auditing to follow the same structure, in orderrwatson2009-07-281-8/+8
| | | | | | | | | | to avoid exposing ARG_ macros/flag values outside of the audit code in order to name which one of two possible vnodes will be audited for a system call. Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 month
* There is an optimization in chmod(1), that makes it not to call chmod(2)trasz2009-07-081-4/+23
| | | | | | | | | | | | | if the new file mode is the same as it was before; however, this optimization must be disabled for filesystems that support NFSv4 ACLs. Chmod uses pathconf(2) to determine whether this is the case - however, pathconf(2) always follows symbolic links, while the 'chmod -h' doesn't. This change adds lpathconf(3) to make it possible to solve that problem in a clean way. Reviewed by: rwatson (earlier version) Approved by: re (kib)
* For access(2) and eaccess(2), audit the requested access mode.rwatson2009-07-011-0/+1
| | | | | Approved by: re (audit argument blanket) MFC after: 3 days
* Audit the file descriptor number passed to lseek(2).rwatson2009-07-011-0/+1
| | | | | Approved by: re (kib) MFC after: 3 days
* Fix link(2) auditing: use the second audit record path for the new objectrwatson2009-07-011-1/+1
| | | | | | | name. Approved by: re (kib) MFC after: 3 days
* Replace AUDIT_ARG() with variable argument macros with a set more morerwatson2009-06-271-32/+32
| | | | | | | | | | | | | | specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week
* Remove the static from int hardlink_check_uid.bz2009-06-131-1/+1
| | | | | | | | | | | There is an external use in the opensolaris code. I am not sure how this ever worked but I have seen two reports of: link_elf: symbol hardlink_check_uid undefined lately. Reported by: Scott Ullrich (sullrich gmail.com), pfsense Reported by: Mister Olli (mister.olli googlemail.com)
* Simply shared vnode locking and extend it to also include fsync.ps2009-06-081-2/+8
| | | | | | | Also, in vop_write, no longer assert for exclusive locks on the vnode. Reviewed by: jhb, kmacy, jeffr
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERICrwatson2009-06-051-1/+0
| | | | | | | | and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
* Add hierarchical jails. A jail may further virtualize its environmentjamie2009-05-271-7/+1
| | | | | | | | | | | | | | | | | | | | | | by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
* - Implement a lockless file descriptor lookup algorithm injeff2009-05-141-14/+5
| | | | | | | | | | | | fget_unlocked(). - Save old file descriptor tables created on expansion until the entire descriptor table is freed so that pointers may be followed without regard for expanders. - Mark the file zone as NOFREE so we may attempt to reference potentially freed files. - Convert several fget_locked() users to fget_unlocked(). This requires us to manage reference counts explicitly but reduces locking overhead in the common case.
* Prevent overflow of uio_resid.kib2009-05-111-0/+3
| | | | | Noted by: jhb MFC after: 3 days
* Remove the thread argument from the FSD (File-System Dependent) parts ofattilio2009-05-111-7/+7
| | | | | | | | | | | | | | | | | the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
* Remove VOP_LEASE and supporting functions. This hasn't been used sincerwatson2009-04-101-24/+0
| | | | | | | | | | | | | | the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
* Don't make Linux stat() open character devices to resolve its name.ed2009-02-201-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing code calls kern_open() to resolve the vnode of a pathname right after a stat(). This is not correct, because it causes random character devices to be opened in /dev. This means ls'ing a tape streamer will cause it to rewind, for example. Changes I have made: - Add kern_statat_vnhook() to allow binary emulators to `post-process' struct stat, using the proper vnode. - Remove unneeded printf's from stat() and statfs(). - Make the Linuxolator use kern_statat_vnhook(), replacing translate_path_major_minor_at(). - Let translate_fd_major_minor() use vp->v_rdev instead of vp->v_un.vu_cdev. Result: crw-rw-rw- 1 root root 0, 14 Feb 20 13:54 /dev/ptmx crw--w---- 1 root adm 136, 0 Feb 20 14:03 /dev/pts/0 crw--w---- 1 root adm 136, 1 Feb 20 14:02 /dev/pts/1 crw--w---- 1 ed tty 136, 2 Feb 20 14:03 /dev/pts/2 Before this commit, ptmx also had a major number of 136, because it silently allocated and deallocated a pseudo-terminal. Device nodes that cannot be opened now have proper major/minor-numbers. Reviewed by: kib, netchild, rdivacky (thanks!)
* Use shared vnode locks when invoking VOP_READDIR().jhb2009-02-131-3/+2
| | | | MFC after: 1 month
* In some situations, mnt_lockref could go negative due to vfs_unbusy() beingtrasz2009-02-051-3/+5
| | | | | | | | | | | called without calling vfs_busy() first. This made umount(8) hang waiting for mnt_lockref to become zero, which would never happen. Reviewed by: kib Approved by: rwatson (mentor) Reported by: pho Found with: stress2 Sponsored by: FreeBSD Foundation
* Use shared vnode locks for fchdir().jhb2009-01-231-2/+2
| | | | Submitted by: ups
* Prevent overflow of uio_resid.pho2008-12-271-0/+2
| | | | Approved by: kib
* The quotactl, statfs and fstatfs syscall implementations may dereferencekib2008-12-181-6/+18
| | | | | | | | | | | | | | NULL pointer to struct mount if the looked up vnode is reclaimed. Also, these syscalls only mnt_ref() the mp, still allowing it to be unmounted; only struct mount memory is kept from being reused. Lock the vnode when doing name lookup, then reference its mount point, unlock the vnode and vfs_busy the mountpoint. This sequence shall take care of both races. Reported and tested by: pho Discussed with: attilio MFC after: 1 month
* In the nfsrv_fhtovp(), after the vfs_getvfs() function found the pointerkib2008-11-291-9/+9
| | | | | | | | | | | | | | | | | | | to the fs, but before a vnode on the fs is locked, unmount may free fs structures, causing access to destroyed data and freed memory. Introduce a vfs_busymp() function that looks up and busies found fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the implementation of the handle syscalls. Two other uses of the vfs_getvfs() in the vfs_subr.c, namely in sysctl_vfs_ctl and vfs_getnewfsid seems to be ok. In particular, sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl handler, that prevents Giant-locked unmount code to interfere with it. Noted by: tegge Reviewed by: dfr Tested by: pho MFC after: 1 month
* Merge latest DTrace changes from Perforce.rodrigc2008-11-051-0/+15
| | | | Approved by: jb
* Use shared vnode locks for auditing vnode arguments as auditing onlyjhb2008-11-041-4/+4
| | | | | | does a VOP_GETATTR() which does not require an exclusive lock. Reviewed by: csjp, rwatson
* Use shared vnode locks instead of exclusive vnode locks for the access(),jhb2008-11-031-14/+14
| | | | | | | | | | chdir(), chroot(), eaccess(), fpathconf(), fstat(), fstatfs(), lseek() (when figuring out the current size of the file in the SEEK_END case), pathconf(), readlink(), and statfs() system calls. Submitted by: ups (mostly) Tested by: pho MFC after: 1 month
* Improve VFS locking:attilio2008-11-021-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr alredy) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if it the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho
* Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessarytrasz2008-10-281-16/+18
| | | | | | | to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)
* Whitespace fix.jhb2008-10-231-1/+2
|
* Retire the MALLOC and FREE macros. They are an abomination unto style(9).des2008-10-231-2/+2
| | | | MFC after: 3 months
* Split the copyout of *base at the end of getdirentries() out leaving thejhb2008-10-221-10/+23
| | | | | | | | | | rest in kern_getdirentries(). Use kern_getdirentries() to implement freebsd32_getdirentries(). This fixes a bug where calls to getdirentries() in 32-bit binaries would trash the 4 bytes after the 'long base' in userland. Submitted by: ups MFC after: 1 week
* When setting error to EINVAL in 'fvp == tdvp' case, jump to out label,pjd2008-09-011-1/+3
| | | | | | | because if not, the error will be later overwritten by mac_vnode_check_rename_to() call. Reviewed by: rwatson
* Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.attilio2008-08-311-10/+10
| | | | | | Manpages are updated accordingly. Tested by: Diego Sardina <siarodx at gmail dot com>
* Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed threadattilio2008-08-281-13/+13
| | | | | | was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* If S_IFIFO is passed to mknod(2), invoke kern_mkfifoat(9) to create arwatson2008-06-221-0/+4
| | | | | | | | | | | | | | | | | FIFO, as required by SUSv3. No specific privilege check is performed in this case, as FIFOs may be created by unprivileged processes (subject to the normal file system name space restrictions that may be in place). Unlike the Apple implementation, we reject requests to create a FIFO using mknod(2) if there is a non-zero dev argument to the system call, which is permitted by the Open Group specification ("... undefined ..."). We might want to revise this if we find it causes compatibility problems for applications in practice. PR: kern/74242, kern/68459 Obtained from: Apple, Inc. MFC after: 3 weeks
* vfs_syscalls.c 1.452 mistakenly swapped the behavior of chown() and lchown().truckman2008-04-071-1/+1
|
* Implement thekib2008-03-311-125/+452
| | | | | | | | | | | | openat(2), faccessat(2), fchmodat(2), fchownat(2), fstatat(2), futimesat(2), linkat(2), mkdirat(2), mkfifoat(2), mknodat(2), readlinkat(2), renameat(2), symlinkat(2) syscalls. Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
* Add the support for the O_EXEC open(2) mode, as specified by thekib2008-03-311-2/+12
| | | | | | | POSIX Extended API Set Part 2 extension specification. Reviewed by: rwatson, rdivacky Tested by: pho
* This patch adds a new ktrace(2) record type, KTR_STRUCT, whose payloaddes2008-02-231-0/+12
| | | | | | | | | | | | | | | | | | | | | | | consists of the null-terminated name and the contents of any structure you wish to record. A new ktrstruct() function constructs and emits a KTR_STRUCT record. It is accompanied by convenience macros for struct stat and struct sockaddr. In kdump(1), KTR_STRUCT records are handled by a dispatcher function that runs stringent sanity checks on its contents before handing it over to individual decoding funtions for each type of structure. Currently supported structures are struct stat and struct sockaddr for the AF_INET, AF_INET6 and AF_UNIX families; support for AF_APPLETALK and AF_IPX is present but disabled, as I am unable to test it properly. Since 's' was already taken, the letter 't' is used by ktrace(1) to enable KTR_STRUCT trace points, and in kdump(1) to enable their decoding. Derived from patches by Andrew Li <andrew2.li@citi.com>. PR: kern/117836 MFC after: 3 weeks
* Change readlink(2)'s return type and type of the last argumentru2008-02-121-3/+3
| | | | | | to match POSIX. Prodded by: Alexey Lyashkov
* VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used inattilio2008-01-131-27/+27
| | | | | | | | | | | conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
* vn_lock() is currently only used with the 'curthread' passed as argument.attilio2008-01-101-20/+20
| | | | | | | | | | | | | | | | Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
* Make ftruncate a 'struct file' operation rather than a vnode operation.jhb2008-01-071-90/+0
| | | | | | | | | | | | | | This makes it possible to support ftruncate() on non-vnode file types in the future. - 'struct fileops' grows a 'fo_truncate' method to handle an ftruncate() on a given file descriptor. - ftruncate() moves to kern/sys_generic.c and now just fetches a file object and invokes fo_truncate(). - The vnode-specific portions of ftruncate() move to vn_truncate() in vfs_vnops.c which implements fo_truncate() for vnode file types. - Non-vnode file types return EINVAL in their fo_truncate() method. Submitted by: rwatson
* Remove explicit locking of struct file.jeff2007-12-301-19/+15
| | | | | | | | | | | | | - Introduce a finit() which is used to initailize the fields of struct file in such a way that the ops vector is only valid after the data, type, and flags are valid. - Protect f_flag and f_count with atomic operations. - Remove the global list of all files and associated accounting. - Rewrite the unp garbage collection such that it no longer requires the global list of all files and instead uses a list of all unp sockets. - Mark sockets in the accept queue so we don't incorrectly gc them. Tested by: kris, pho
* Merge first in a series of TrustedBSD MAC Framework KPI changesrwatson2007-10-241-31/+31
| | | | | | | | | | | | | | | | | | | | | | | from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
* Rename mac_check_vnode_delete() MAC Framework and MAC Policy entryrwatson2007-09-101-2/+2
| | | | | | | | | | | | | | point to mac_check_vnode_unlink(), reflecting UNIX naming conventions. This is the first of several commits to synchronize the MAC Framework in FreeBSD 7.0 with the MAC Framework as it will appear in Mac OS X Leopard. Reveiwed by: csjp, Samy Bahra <sbahra at gwu dot edu> Submitted by: Jacques Vidrine <nectar at apple dot com> Obtained from: Apple Computer, Inc. Sponsored by: SPARTA, SPAWAR Approved by: re (bmah)
* Rework the routines to convert a 5.x+ statfs structure (with fixed-sizejhb2007-08-281-4/+46
| | | | | | | | | | | | | | | 64-bit counters) to a 4.x statfs structure (with long-sized counters). - For block counters, we scale up the block size sufficiently large so that the resulting block counts fit into a the long-sized (long for the ABI, so 32-bit in freebsd32) counters. In 4.x the NFS client's statfs VOP did this already. This can lie about the block size to 4.x binaries, but it presents a more accurate picture of the ratios of free and available space. - For non-block counters, fix the freebsd32 stats converter to cap the values at INT32_MAX rather than losing the upper 32-bits to match the behavior of the 4.x statfs conversion routine in vfs_syscalls.c Approved by: re (kensmith)
* Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncatepeter2007-07-041-3/+36
| | | | Approved by: re (kensmith)
OpenPOWER on IntegriCloud