path: root/sys/kern/vfs_syscalls.c
Commit message | Author | Age | Files | Lines
* In revoke(), verify that VCHR vnode indeed belongs to devfs. | kib | 2010-07-06 | 1 | -1/+1
  Found and tested by: pho
  MFC after: 1 week
* Handle a case in kern_openat() when vn_open() changes the file type from DTYPE_VNODE. | kib | 2010-04-13 | 1 | -15/+2
  Only acquire locks for O_EXLOCK/O_SHLOCK if the file type is still vnode, since we allow fcntl(2) to operate on advisory locks for DTYPE_VNODE only. Another reason is that all fo_close() routines would otherwise need to check and release locks.
  For O_TRUNC, call fo_truncate() instead of truncating the vnode.
  Discussed with: rwatson
  MFC after: 2 weeks
* Remove XXX comment. Add another comment, describing why f_vnode assignment is useful. | kib | 2010-04-13 | 1 | -1/+6
  MFC after: 3 days
* Rename st_*timespec fields to st_*tim for POSIX 2008 compliance. | ed | 2010-03-28 | 1 | -7/+7
  A nice thing about POSIX 2008 is that it finally standardizes a way to obtain file access/modification/change times in sub-second precision, namely using struct timespec, which we have already had for a very long time. Unfortunately POSIX uses different names.
  This commit adds compatibility macros, so existing code should still build properly. Also change all source code in the kernel to work without any of the compatibility macros. This makes it all less ambiguous.
  I am also renaming st_birthtime to st_birthtim, even though it was a local extension anyway. It seems Cygwin also has a st_birthtim.
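As a rough illustration of the renamed fields, the sketch below reads the sub-second modification time through the POSIX 2008 name st_mtim. The helper name mtime_of() is hypothetical and not part of the commit; only stat(2) and the st_mtim field are taken as given.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper (not from the commit): copy the modification time
 * of `path`, including its nanosecond part, into `out`.
 * Returns 0 on success, or -1 with errno set by stat(2).
 */
int
mtime_of(const char *path, struct timespec *out)
{
	struct stat sb;

	assert(path != NULL && out != NULL);
	if (stat(path, &sb) == -1)
		return (-1);
	/* st_mtim is the POSIX 2008 spelling of the older st_mtimespec. */
	*out = sb.st_mtim;
	return (0);
}
```

Code written against st_mtim should build both with and without the compatibility macros the commit mentions, since it uses only the standardized name.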
* Actually make O_DIRECTORY work. | ed | 2010-03-21 | 1 | -0/+4
  According to POSIX, open() must return ENOTDIR when the path name does not refer to a directory. Change vn_open() to respect this flag. This also simplifies the Linuxolator a bit.
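From userland, the behaviour described here can be observed with a sketch like the following. The helper name open_dir_only() is an assumption made for illustration; it only wraps open(2) with O_DIRECTORY.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` only if it names a directory.
 * Returns a descriptor on success. On failure returns -1 with errno set;
 * errno is ENOTDIR when the path exists but is not a directory.
 */
int
open_dir_only(const char *path)
{
	assert(path != NULL);
	return (open(path, O_RDONLY | O_DIRECTORY));
}
```

A caller can distinguish "exists but is a regular file" from other failures by checking errno for ENOTDIR after a -1 return.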
* Fix a comment nit. | jhb | 2010-03-11 | 1 | -2/+2
  Submitted by: Alexander Best
* Allow lseek(SEEK_END) to work on disk devices by using the DIOCGMEDIASIZE ioctl to determine the media size. | jhb | 2010-03-03 | 1 | -1/+11
  Submitted by: nox
  MFC after: 1 week
* Remove stale comment about socket buffer accounting from access(2) code. | rwatson | 2010-02-27 | 1 | -2/+1
  It is the case, however, that the uidinfo of the temporary credential set up for access(2) is not properly updated when its effective uid is changed.
  MFC after: 3 days
* Background: | mckusick | 2010-01-11 | 1 | -5/+11
  When renaming a directory it passes through several intermediate states. First its new name will be created, causing it to have two names (from possibly different parents). Next, if it has different parents, its value of ".." will be changed from pointing to the old parent to pointing to the new parent. Concurrently, its old name will be removed, bringing it back into a consistent state. When fsck encounters an extra name for a directory, it offers to remove the "extraneous hard link"; when it finds that the names have been changed but the update to ".." has not happened, it offers to rewrite ".." to point at the correct parent. Both of these changes were considered unexpected, so they would cause fsck in preen mode or fsck in background mode to fail with the need to run fsck manually to fix these problems. Fsck running in preen mode or background mode now corrects these expected inconsistencies that arise during directory rename. The functionality added with this update is used by fsck running in background mode to make these fixes.
  Solution:
  This update adds three new fsck sysctl commands to support background fsck in correcting expected inconsistencies that arise from incomplete directory rename operations. They are:
    setcwd(dirinode) - set the current directory to dirinode in the filesystem associated with the snapshot.
    setdotdot(oldvalue, newvalue) - verify that the inode number for ".." in the current directory is oldvalue, then change it to newvalue.
    unlink(nameptr, oldvalue) - verify that the inode number associated with nameptr in the current directory is oldvalue, then unlink it.
  As with all other fsck sysctls, these new ones may only be used by processes with appropriate privilege.
  Reported by: jeff
  Security issues: rwatson
* Don't add VAPPEND if the file is not being opened for writing. | trasz | 2009-12-08 | 1 | -1/+1
  Note that this only affects cases where open(2) is being used improperly - i.e. when the user specifies O_APPEND without O_WRONLY or O_RDWR.
  Reviewed by: rwatson
* In fhopen, vfs_ref() the mount point while vnode is unlocked, to prevent vn_start_write(NULL, &mp) from operating on potentially freed or reused struct mount *. | kib | 2009-09-06 | 1 | -1/+3
  Remove unmatched vfs_rel() in cleanup.
  Noted and reviewed by: tegge
  Tested by: pho
  MFC after: 3 days
* Honor the vfs.timestamp_precision sysctl settings for utimes(path, NULL) and similar calls. | kib | 2009-08-26 | 1 | -2/+1
  Obtained from: Petr Salinger, Debian GNU/kFreeBSD, Debian bug #489894
  MFC after: 3 days
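For context, the call form affected by this change looks like the sketch below: passing NULL as the times argument asks the kernel to stamp the file with the current time, at whatever granularity the system applies. The helper name touch_now() is hypothetical, and the sketch does not read or set the sysctl itself.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/time.h>

/*
 * Hypothetical helper: set the access and modification times of `path`
 * to "now" and report the resulting attributes in `sb`.
 * Returns 0 on success, or -1 with errno set.
 */
int
touch_now(const char *path, struct stat *sb)
{
	assert(path != NULL && sb != NULL);
	/* A NULL times pointer means "use the current time". */
	if (utimes(path, NULL) == -1)
		return (-1);
	return (stat(path, sb));
}
```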
* Fix some LORs between vnode locks and filedescriptor table locks. | jhb | 2009-07-31 | 1 | -8/+0
  - Don't grab the filedesc lock just to read fd_cmask.
  - Drop vnode locks earlier when mounting the root filesystem and before sanitizing stdin/out/err file descriptors during execve().
  Submitted by: kib
  Approved by: re (rwatson)
  MFC after: 1 week
* Rework vnode argument auditing to follow the same structure, in order to avoid exposing ARG_ macros/flag values outside of the audit code in order to name which one of two possible vnodes will be audited for a system call. | rwatson | 2009-07-28 | 1 | -8/+8
  Approved by: re (kib)
  Obtained from: TrustedBSD Project
  MFC after: 1 month
* There is an optimization in chmod(1) that makes it skip the chmod(2) call if the new file mode is the same as it was before; however, this optimization must be disabled for filesystems that support NFSv4 ACLs. | trasz | 2009-07-08 | 1 | -4/+23
  Chmod uses pathconf(2) to determine whether this is the case - however, pathconf(2) always follows symbolic links, while 'chmod -h' doesn't.
  This change adds lpathconf(3) to make it possible to solve that problem in a clean way.
  Reviewed by: rwatson (earlier version)
  Approved by: re (kib)
* For access(2) and eaccess(2), audit the requested access mode. | rwatson | 2009-07-01 | 1 | -0/+1
  Approved by: re (audit argument blanket)
  MFC after: 3 days
* Audit the file descriptor number passed to lseek(2). | rwatson | 2009-07-01 | 1 | -0/+1
  Approved by: re (kib)
  MFC after: 3 days
* Fix link(2) auditing: use the second audit record path for the new object name. | rwatson | 2009-07-01 | 1 | -1/+1
  Approved by: re (kib)
  MFC after: 3 days
* Replace AUDIT_ARG() with variable argument macros with a set of more specific macros for each audit argument type. | rwatson | 2009-06-27 | 1 | -32/+32
  This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr).
  In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules.
  Suggested by: brooks
  Approved by: re (kib)
  Obtained from: TrustedBSD Project
  MFC after: 1 week
* Remove the static from int hardlink_check_uid. | bz | 2009-06-13 | 1 | -1/+1
  There is an external use in the opensolaris code. I am not sure how this ever worked but I have seen two reports of:
    link_elf: symbol hardlink_check_uid undefined
  lately.
  Reported by: Scott Ullrich (sullrich gmail.com), pfsense
  Reported by: Mister Olli (mister.olli googlemail.com)
* Simply shared vnode locking and extend it to also include fsync. | ps | 2009-06-08 | 1 | -2/+8
  Also, in vop_write, no longer assert for exclusive locks on the vnode.
  Reviewed by: jhb, kmacy, jeffr
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. | rwatson | 2009-06-05 | 1 | -1/+0
  Discussed with: pjd
* Add hierarchical jails. | jamie | 2009-05-27 | 1 | -7/+1
  A jail may further virtualize its environment by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings.
  Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge().
  Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call.
  Approved by: bz (mentor)
* - Implement a lockless file descriptor lookup algorithm in fget_unlocked(). | jeff | 2009-05-14 | 1 | -14/+5
  - Save old file descriptor tables created on expansion until the entire descriptor table is freed so that pointers may be followed without regard for expanders.
  - Mark the file zone as NOFREE so we may attempt to reference potentially freed files.
  - Convert several fget_locked() users to fget_unlocked(). This requires us to manage reference counts explicitly but reduces locking overhead in the common case.
* Prevent overflow of uio_resid. | kib | 2009-05-11 | 1 | -0/+3
  Noted by: jhb
  MFC after: 3 days
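The general shape of such a guard can be sketched as below. This is not the kernel's code: iov_total() is a hypothetical userland helper that adds iovec lengths into a signed residual count and refuses any sum that would exceed SSIZE_MAX, which is the kind of wraparound the commit title refers to.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper, not the kernel implementation: total the lengths
 * of an iovec array. Returns 0 and stores the sum in *resid, or returns
 * EINVAL (leaving *resid untouched) if the sum would exceed SSIZE_MAX.
 */
int
iov_total(const struct iovec *iov, int iovcnt, ssize_t *resid)
{
	ssize_t total = 0;

	assert(iov != NULL || iovcnt == 0);
	assert(resid != NULL);
	for (int i = 0; i < iovcnt; i++) {
		/* Check the remaining headroom before adding, so no wrap occurs. */
		if (iov[i].iov_len > (size_t)(SSIZE_MAX - total))
			return (EINVAL);
		total += (ssize_t)iov[i].iov_len;
	}
	*resid = total;
	return (0);
}
```

Comparing against the remaining headroom, rather than adding first and testing for a negative result, avoids relying on signed overflow, which is undefined behaviour in C.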
* Remove the thread argument from the FSD (File-System Dependent) parts of the VFS. | attilio | 2009-05-11 | 1 | -7/+7
  Now all the VFS_* functions and related parts no longer take the context, as it always refers to curthread.
  In some places, in particular when dealing with VOPs and functions living in the same namespace (e.g. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP.
  While here, fix a bug: UFS_EXTATTR can now be compiled alone without the UFS_EXTATTR_AUTOSTART option.
  The VFS KPI is heavily changed by this commit, so third-party modules need to be recompiled. Bump __FreeBSD_version in order to signal this situation.
* Remove VOP_LEASE and supporting functions. | rwatson | 2009-04-10 | 1 | -24/+0
  This hasn't been used since the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease-related code and interfaces.
  Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd.
  Proposed by: jeff
  Reviewed by: jeff
  Discussed with: rmacklem, zach loafman @ isilon
* Don't make Linux stat() open character devices to resolve its name. | ed | 2009-02-20 | 1 | -4/+11
  The existing code calls kern_open() to resolve the vnode of a pathname right after a stat(). This is not correct, because it causes random character devices to be opened in /dev. This means ls'ing a tape streamer will cause it to rewind, for example.
  Changes I have made:
  - Add kern_statat_vnhook() to allow binary emulators to `post-process' struct stat, using the proper vnode.
  - Remove unneeded printf's from stat() and statfs().
  - Make the Linuxolator use kern_statat_vnhook(), replacing translate_path_major_minor_at().
  - Let translate_fd_major_minor() use vp->v_rdev instead of vp->v_un.vu_cdev.
  Result:
    crw-rw-rw- 1 root root   0, 14 Feb 20 13:54 /dev/ptmx
    crw--w---- 1 root adm  136,  0 Feb 20 14:03 /dev/pts/0
    crw--w---- 1 root adm  136,  1 Feb 20 14:02 /dev/pts/1
    crw--w---- 1 ed   tty  136,  2 Feb 20 14:03 /dev/pts/2
  Before this commit, ptmx also had a major number of 136, because it silently allocated and deallocated a pseudo-terminal. Device nodes that cannot be opened now have proper major/minor-numbers.
  Reviewed by: kib, netchild, rdivacky (thanks!)
* Use shared vnode locks when invoking VOP_READDIR(). | jhb | 2009-02-13 | 1 | -3/+2
  MFC after: 1 month
* In some situations, mnt_lockref could go negative due to vfs_unbusy() being called without calling vfs_busy() first. | trasz | 2009-02-05 | 1 | -3/+5
  This made umount(8) hang waiting for mnt_lockref to become zero, which would never happen.
  Reviewed by: kib
  Approved by: rwatson (mentor)
  Reported by: pho
  Found with: stress2
  Sponsored by: FreeBSD Foundation
* Use shared vnode locks for fchdir(). | jhb | 2009-01-23 | 1 | -2/+2
  Submitted by: ups
* Prevent overflow of uio_resid. | pho | 2008-12-27 | 1 | -0/+2
  Approved by: kib
* The quotactl, statfs and fstatfs syscall implementations may dereference a NULL pointer to struct mount if the looked up vnode is reclaimed. | kib | 2008-12-18 | 1 | -6/+18
  Also, these syscalls only mnt_ref() the mp, still allowing it to be unmounted; only struct mount memory is kept from being reused.
  Lock the vnode when doing name lookup, then reference its mount point, unlock the vnode and vfs_busy the mountpoint. This sequence shall take care of both races.
  Reported and tested by: pho
  Discussed with: attilio
  MFC after: 1 month
* In nfsrv_fhtovp(), after the vfs_getvfs() function has found the pointer to the fs, but before a vnode on the fs is locked, unmount may free fs structures, causing access to destroyed data and freed memory. | kib | 2008-11-29 | 1 | -9/+9
  Introduce a vfs_busymp() function that looks up and busies the found fs while mountlist_mtx is held. Use it in nfsrv_fhtovp() and in the implementation of the handle syscalls.
  Two other uses of vfs_getvfs() in vfs_subr.c, namely in sysctl_vfs_ctl and vfs_getnewfsid, seem to be ok. In particular, sysctl_vfs_ctl is protected by Giant by being a non-sleeping sysctl handler, which prevents the Giant-locked unmount code from interfering with it.
  Noted by: tegge
  Reviewed by: dfr
  Tested by: pho
  MFC after: 1 month
* Merge latest DTrace changes from Perforce. | rodrigc | 2008-11-05 | 1 | -0/+15
  Approved by: jb
* Use shared vnode locks for auditing vnode arguments as auditing only does a VOP_GETATTR() which does not require an exclusive lock. | jhb | 2008-11-04 | 1 | -4/+4
  Reviewed by: csjp, rwatson
* Use shared vnode locks instead of exclusive vnode locks for the access(), chdir(), chroot(), eaccess(), fpathconf(), fstat(), fstatfs(), lseek() (when figuring out the current size of the file in the SEEK_END case), pathconf(), readlink(), and statfs() system calls. | jhb | 2008-11-03 | 1 | -14/+14
  Submitted by: ups (mostly)
  Tested by: pho
  MFC after: 1 month
* Improve VFS locking: | attilio | 2008-11-02 | 1 | -4/+4
  - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters.
  - Due to the change above, remove the mnt_lock lockmgr because it is now useless.
  - Due to the change above, vfs_busy() is no longer linked to a lockmgr. Change its KPI accordingly by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT, which basically replaces the LK_NOWAIT of the old version (which was already unlinked from the lockmgr), and MBF_MNTLSTLOCK, which provides the ability to drop the mountlist_mtx once the mnt interlock is held (an ability still desired by most consumers).
  - The stub used in vfs_mount_destroy(), which allows overriding the mnt_ref if running for more than 3 seconds, makes it totally useless. Remove it, as it was meant to work in older versions. If a problem of "refcount held never going away" should appear, we will need to fix it properly rather than trust such a hackish solution.
  - Fix a bug where returning (with an error) from dounmount() still left the MNTK_MWAIT flag on even if the waiters were actually woken up. Only one place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter.
  - Remove the markercnt refcount as it is useless.
  This patch modifies the VFS ABI and breaks the KPI for vfs_busy(), so manpages and __FreeBSD_version will be modified accordingly.
  Discussed with: kib
  Tested by: pho
* Introduce accmode_t. | trasz | 2008-10-28 | 1 | -16/+18
  This is required for NFSv4 ACLs - it will be necessary to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which are 16 bits wide.
  Approved by: rwatson (mentor)
* Whitespace fix. | jhb | 2008-10-23 | 1 | -1/+2
* Retire the MALLOC and FREE macros. They are an abomination unto style(9). | des | 2008-10-23 | 1 | -2/+2
  MFC after: 3 months
* Split the copyout of *base at the end of getdirentries() out leaving the rest in kern_getdirentries(). | jhb | 2008-10-22 | 1 | -10/+23
  Use kern_getdirentries() to implement freebsd32_getdirentries(). This fixes a bug where calls to getdirentries() in 32-bit binaries would trash the 4 bytes after the 'long base' in userland.
  Submitted by: ups
  MFC after: 1 week
* When setting error to EINVAL in 'fvp == tdvp' case, jump to out label, because if not, the error will be later overwritten by mac_vnode_check_rename_to() call. | pjd | 2008-09-01 | 1 | -1/+3
  Reviewed by: rwatson
* Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions. | attilio | 2008-08-31 | 1 | -10/+10
  Manpages are updated accordingly.
  Tested by: Diego Sardina <siarodx at gmail dot com>
* Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed thread was always curthread and thus redundant. | attilio | 2008-08-28 | 1 | -13/+13
  Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* If S_IFIFO is passed to mknod(2), invoke kern_mkfifoat(9) to create a FIFO, as required by SUSv3. | rwatson | 2008-06-22 | 1 | -0/+4
  No specific privilege check is performed in this case, as FIFOs may be created by unprivileged processes (subject to the normal file system name space restrictions that may be in place). Unlike the Apple implementation, we reject requests to create a FIFO using mknod(2) if there is a non-zero dev argument to the system call, which is permitted by the Open Group specification ("... undefined ..."). We might want to revise this if we find it causes compatibility problems for applications in practice.
  PR: kern/74242, kern/68459
  Obtained from: Apple, Inc.
  MFC after: 3 weeks
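The userland side of this behaviour can be sketched as follows. The helper name make_fifo() is hypothetical; it simply calls mknod(2) with S_IFIFO and a zero dev argument, matching the restriction described in the commit message.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: create a FIFO at `path` with permission bits
 * `perm` by way of mknod(2). The dev argument is passed as 0, since a
 * non-zero value is rejected according to the commit message.
 * Returns 0 on success, or -1 with errno set.
 */
int
make_fifo(const char *path, mode_t perm)
{
	assert(path != NULL);
	return (mknod(path, S_IFIFO | perm, 0));
}
```

No special privilege should be needed for this call, so it is expected to behave like mkfifo(2) for an ordinary user.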
* vfs_syscalls.c 1.452 mistakenly swapped the behavior of chown() and lchown(). | truckman | 2008-04-07 | 1 | -1/+1
* Implement the openat(2), faccessat(2), fchmodat(2), fchownat(2), fstatat(2), futimesat(2), linkat(2), mkdirat(2), mkfifoat(2), mknodat(2), readlinkat(2), renameat(2), symlinkat(2) syscalls. | kib | 2008-03-31 | 1 | -125/+452
  Based on the submission by rdivacky, sponsored by Google Summer of Code 2007
  Reviewed by: rwatson, rdivacky
  Tested by: pho
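As a small usage illustration of the *at() family, the sketch below opens a file relative to a directory descriptor instead of the current working directory. The helper name open_in_dir() is an assumption for this example; only open(2), openat(2) and close(2) are real interfaces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `name` relative to the directory `dirpath`.
 * O_CREAT is not supported here because no mode argument is passed.
 * Returns a descriptor on success, or -1 with errno set.
 */
int
open_in_dir(const char *dirpath, const char *name, int flags)
{
	int dfd, fd, saved;

	assert(dirpath != NULL && name != NULL);
	assert((flags & O_CREAT) == 0);
	dfd = open(dirpath, O_RDONLY | O_DIRECTORY);
	if (dfd == -1)
		return (-1);
	/* Resolve `name` against dfd rather than the process's cwd. */
	fd = openat(dfd, name, flags);
	saved = errno;		/* close() may overwrite errno */
	(void)close(dfd);
	errno = saved;
	return (fd);
}
```

In real code the directory descriptor would usually be kept open and reused for several *at() calls, which is where the approach helps against races on the directory path.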
* Add the support for the O_EXEC open(2) mode, as specified by the POSIX Extended API Set Part 2 extension specification. | kib | 2008-03-31 | 1 | -2/+12
  Reviewed by: rwatson, rdivacky
  Tested by: pho
* This patch adds a new ktrace(2) record type, KTR_STRUCT, whose payload consists of the null-terminated name and the contents of any structure you wish to record. | des | 2008-02-23 | 1 | -0/+12
  A new ktrstruct() function constructs and emits a KTR_STRUCT record. It is accompanied by convenience macros for struct stat and struct sockaddr.
  In kdump(1), KTR_STRUCT records are handled by a dispatcher function that runs stringent sanity checks on its contents before handing it over to individual decoding functions for each type of structure. Currently supported structures are struct stat and struct sockaddr for the AF_INET, AF_INET6 and AF_UNIX families; support for AF_APPLETALK and AF_IPX is present but disabled, as I am unable to test it properly.
  Since 's' was already taken, the letter 't' is used by ktrace(1) to enable KTR_STRUCT trace points, and in kdump(1) to enable their decoding.
  Derived from patches by Andrew Li <andrew2.li@citi.com>.
  PR: kern/117836
  MFC after: 3 weeks