summaryrefslogtreecommitdiffstats
path: root/sys/kern/vfs_lookup.c
Commit message (Collapse)AuthorAgeFilesLines
* Support only LOOKUP operation for "/" in relookup() because lookup()jh2010-03-261-11/+9
| | | | | | can't succeed for CREATE, DELETE and RENAME. Discussed with: bde
* Only audit pathnames in namei(9) if copying the directory string completesrwatson2010-02-021-5/+10
| | | | | | | successfully. Continue to do this before the empty path check so that the ENOENT returned in that case gets an empty string token in the BSM record. MFC after: 3 days
* When rename("a", "b/.") is performed, target namei() call returnskib2009-11-101-0/+6
| | | | | | | | | | | | | dvp == vp. Rename syscall does not check for the case, and at least ufs_rename() cannot deal with it. POSIX explicitely requires that both rename(2) and rmdir(2) return EINVAL when any of the pathes end in "/.". Detect the slashdot lookup for RENAME or REMOVE in lookup(), and return EINVAL. Reported by: Jim Meyering <jim meyering net> Tested by: simon, pho MFC after: 1 week
* Eliminate ARG_UPATH[12] arguments to AUDIT_ARG_UPATH() and insteadrwatson2009-07-291-2/+2
| | | | | | | | | | | | provide specific macros, AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() to capture path information for audit records. This allows us to move the definitions of ARG_* out of the public audit header file, as they are an implementation detail of our current kernel-internal audit record, which may change. Approved by: re (kensmith) Obtained from: TrustedBSD Project MFC after: 1 month
* Rework vnode argument auditing to follow the same structure, in orderrwatson2009-07-281-4/+4
| | | | | | | | | | to avoid exposing ARG_ macros/flag values outside of the audit code in order to name which one of two possible vnodes will be audited for a system call. Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 month
* Audit file descriptors passed to fooat(2) system calls, which are usedrwatson2009-07-281-1/+6
| | | | | | | | | | | | | | | instead of the root/current working directory as the starting point for lookups. Up to two such descriptors can be audited. Add audit record BSM encoding for fooat(2). Note: due to an error in the OpenBSM 1.1p1 configuration file, a further change is required to that file in order to fix openat(2) auditing. Approved by: re (kib) Reviewed by: rdivacky (fooat(2) portions) Obtained from: TrustedBSD Project MFC after: 1 month
* Replace AUDIT_ARG() with variable argument macros with a set more morerwatson2009-06-271-9/+6
| | | | | | | | | | | | | | specific macros for each audit argument type. This makes it easier to follow call-graphs, especially for automated analysis tools (such as fxr). In MFC, we should leave the existing AUDIT_ARG() macros as they may be used by third-party kernel modules. Suggested by: brooks Approved by: re (kib) Obtained from: TrustedBSD Project MFC after: 1 week
* Eliminate trailing_slash, which was made redundant in r193028.des2009-06-061-23/+11
| | | | | | | Remove a couple of 4-year-old "temporary" KASSERTs. Improve comments. MFC after: 1 week
* Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERICrwatson2009-06-051-1/+0
| | | | | | | | and used in a large number of files, but also because an increasing number of incorrect uses of MAC calls were sneaking in due to copy-and-paste of MAC-aware code without the associated opt_mac.h include. Discussed with: pjd
* Let vfs_lookup() return ENOTDIR if the path has a trailing slash anddes2009-05-291-8/+12
| | | | | | | | | | | | | | | | | the last component is a symlink to something that isn't a directory. We introduce a new namei flag, TRAILINGSLASH, which is set by lookup() if the last component is followed by a slash. The trailing slash is then stripped, as before. If the final component is a symlink, lookup() will return to namei(), which will expand the symlink and call lookup() with the new path. When all symlinks have been resolved, lookup() checks if the TRAILINGSLASH flag is set, and if it is, and the vnode it ended up with is not a directory, it returns ENOTDIR. PR: kern/21768 Submitted by: Eygene Ryabinkin <rea-fbsd@codelabs.ru> MFC after: 3 weeks
* Fix misleading comment.des2009-05-291-1/+1
| | | | MFC after: 1 week
* Add hierarchical jails. A jail may further virtualize its environmentjamie2009-05-271-0/+7
| | | | | | | | | | | | | | | | | | | | | | by creating a child jail, which is visible to that jail and to any parent jails. Child jails may be restricted more than their parents, but never less. Jail names reflect this hierarchy, being MIB-style dot-separated strings. Every thread now points to a jail, the default being prison0, which contains information about the physical system. Prison0's root directory is the same as rootvnode; its hostname is the same as the global hostname, and its securelevel replaces the global securelevel. Note that the variable "securelevel" has actually gone away, which should not cause any problems for code that properly uses securelevel_gt() and securelevel_ge(). Some jail-related permissions that were kept in global variables and set via sysctls are now per-jail settings. The sysctls still exist for backward compatibility, used only by the now-deprecated jail(2) system call. Approved by: bz (mentor)
* Fix a kernel compilation error, introduced after r191990, by definingattilio2009-05-111-0/+3
| | | | | | thread with curthread in the AUDIT case. Reported by: dchagin
* Remove the thread argument from the FSD (File-System Dependent) parts ofattilio2009-05-111-3/+4
| | | | | | | | | | | | | | | | | the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
* Add SDT DTrace probes for namei():rwatson2009-04-061-1/+14
| | | | | | | | | vfs:namei:lookup:entry takes parent directory vnode pointer, path to look up, and lookup flags. vfs:namei:lookup:return takes an error value, and if successful, the returned vnode pointer. MFC after: 1 month
* When a file lookup fails due to encountering a doomed vnode from a forcedjhb2009-03-241-4/+6
| | | | | | | unmount, consistently return ENOENT rather than EBADF. Reviewed by: kib MFC after: 1 month
* Gah, fix the code to match the comment. For non-open lookups use ajhb2009-03-111-1/+1
| | | | | | shared vnode lock for the leaf vnode if LOCKSHARED is set. Submitted by: rdivacky
* Add a new internal mount flag (MNTK_EXTENDED_SHARED) to indicate that ajhb2009-03-111-4/+38
| | | | | | | | | | | | | | | | | | | | | | | | filesystem supports additional operations using shared vnode locks. Currently this is used to enable shared locks for open() and close() of read-only file descriptors. - When an ISOPEN namei() request is performed with LOCKSHARED, use a shared vnode lock for the leaf vnode only if the mount point has the extended shared flag set. - Set LOCKSHARED in vn_open_cred() for requests that specify O_RDONLY but not O_CREAT. - Use a shared vnode lock around VOP_CLOSE() if the file was opened with O_RDONLY and the mountpoint has the extended shared flag set. - Adjust md(4) to upgrade the vnode lock on the vnode it gets back from vn_open() since it now may only have a shared vnode lock. - Don't enable shared vnode locks on FIFO vnodes in ZFS and UFS since FIFO's require exclusive vnode locks for their open() and close() routines. (My recent MPSAFE patches for UDF and cd9660 already included this change.) - Enable extended shared operations on UFS, cd9660, and UDF. Submitted by: ups Reviewed by: pjd (ZFS bits) MFC after: 1 month
* Do not return success and doomed vnode from lookup. LK_UPGRADE allowskib2008-12-181-0/+4
| | | | | | | the vnode to be reclaimed. Tested by: pho MFC after: 1 month
* Update ZFS from version 6 to 13 and bring some FreeBSD-specific changes.pjd2008-11-171-9/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This bring huge amount of changes, I'll enumerate only user-visible changes: - Delegated Administration Allows regular users to perform ZFS operations, like file system creation, snapshot creation, etc. - L2ARC Level 2 cache for ZFS - allows to use additional disks for cache. Huge performance improvements mostly for random read of mostly static content. - slog Allow to use additional disks for ZFS Intent Log to speed up operations like fsync(2). - vfs.zfs.super_owner Allows regular users to perform privileged operations on files stored on ZFS file systems owned by him. Very careful with this one. - chflags(2) Not all the flags are supported. This still needs work. - ZFSBoot Support to boot off of ZFS pool. Not finished, AFAIK. Submitted by: dfr - Snapshot properties - New failure modes Before if write requested failed, system paniced. Now one can select from one of three failure modes: - panic - panic on write error - wait - wait for disk to reappear - continue - serve read requests if possible, block write requests - Refquota, refreservation properties Just quota and reservation properties, but don't count space consumed by children file systems, clones and snapshots. - Sparse volumes ZVOLs that don't reserve space in the pool. - External attributes Compatible with extattr(2). - NFSv4-ACLs Not sure about the status, might not be complete yet. Submitted by: trasz - Creation-time properties - Regression tests for zpool(8) command. Obtained from: OpenSolaris
* A few style nits.jhb2008-11-031-1/+2
|
* Improve VFS locking:attilio2008-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr alredy) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if it the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho
* Enable shared locks for path name lookups on supported filesystems (NFSjhb2008-10-011-1/+1
| | | | client, UFS, and ZFS) by default.
* Remove the LOOKUP_SHARED kernel option. Instead, make vfs.lookup_sharedjhb2008-10-011-5/+1
| | | | a loader tunable (it was already a sysctl).
* Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.attilio2008-08-311-2/+2
| | | | | | Manpages are updated accordingly. Tested by: Diego Sardina <siarodx at gmail dot com>
* Implement the linux syscallskib2008-04-081-2/+11
| | | | | | | | | openat, mkdirat, mknodat, fchownat, futimesat, fstatat, unlinkat, renameat, linkat, symlinkat, readlinkat, fchmodat, faccessat. Submitted by: rdivacky Sponsored by: Google Summer of Code 2007 Tested by: pho
* Add the support for the AT_FDCWD and fd-relative name lookups to thekib2008-03-311-3/+23
| | | | | | | | | namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
* In keeping with style(9)'s recommendations on macros, use a ';'rwatson2008-03-161-1/+1
| | | | | | | | | after each SYSINIT() macro invocation. This makes a number of lightweight C parsers much happier with the FreeBSD kernel source, including cflow's prcc and lxr. MFC after: 1 month Discussed with: imp, rink
* Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it isattilio2008-02-251-2/+2
| | | | | | | | | always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>
* Introduce some functions in the vnode locks namespace and in the ffsattilio2008-02-241-1/+1
| | | | | | | | | | | | | | | namespace in order to handle lockmgr fields in a controlled way instead than spreading all around bogus stubs: - VN_LOCK_AREC() allows lock recursion for a specified vnode - VN_LOCK_ASHARE() allows lock sharing for a specified vnode In FFS land: - BUF_AREC() allows lock recursion for a specified buffer lock - BUF_NOREC() disallows recursion for a specified buffer lock Side note: union_subr.c::unionfs_node_update() is the only other function directly handling lockmgr fields. As this is not simple to fix, it has been left behind as "sole" exception.
* VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used inattilio2008-01-131-12/+11
| | | | | | | | | | | conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
* vn_lock() is currently only used with the 'curthread' passed as argument.attilio2008-01-101-7/+12
| | | | | | | | | | | | | | | | Remove this argument and pass curthread directly to underlying VOP_LOCK1() VFS method. This modify makes the code cleaner and in particular remove an annoying dependence helping next lockmgr() cleanup. KPI results, obviously, changed. Manpage and FreeBSD_version will be updated through further commits. As a side note, would be valuable to say that next commits will address a similar cleanup about VFS methods, in particular vop_lock1 and vop_unlock. Tested by: Diego Sardina <siarodx at gmail dot com>, Andrea Di Pasquale <whyx dot it at gmail dot com>
* Merge first in a series of TrustedBSD MAC Framework KPI changesrwatson2007-10-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
* Fix some locking cases where we ask for exclusively locked vnode, but we getpjd2007-09-211-0/+8
| | | | | | | | shared locked vnode in instead when vfs.lookup_shared is set to 1. Discussed with: kib, kris Tested by: kris Approved by: re (kensmith)
* Universally adopt most conventional spelling of acquire.rwatson2007-05-271-1/+1
|
* Replace custom file descriptor array sleep lock constructed using a mutexrwatson2007-04-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and flags with an sxlock. This leads to a significant and measurable performance improvement as a result of access to shared locking for frequent lookup operations, reduced general overhead, and reduced overhead in the event of contention. All of these are imported for threaded applications where simultaneous access to a shared file descriptor array occurs frequently. Kris has reported 2x-4x transaction rate improvements on 8-core MySQL benchmarks; smaller improvements can be expected for many workloads as a result of reduced overhead. - Generally eliminate the distinction between "fast" and regular acquisisition of the filedesc lock; the plan is that they will now all be fast. Change all locking instances to either shared or exclusive locks. - Correct a bug (pointed out by kib) in fdfree() where previously msleep() was called without the mutex held; sx_sleep() is now always called with the sxlock held exclusively. - Universally hold the struct file lock over changes to struct file, rather than the filedesc lock or no lock. Always update the f_ops field last. A further memory barrier is required here in the future (discussed with jhb). - Improve locking and reference management in linux_at(), which fails to properly acquire vnode references before using vnode pointers. Annotate improper use of vn_fullpath(), which will be replaced at a future date. In fcntl(), we conservatively acquire an exclusive lock, even though in some cases a shared lock may be sufficient, which should be revisited. The dropping of the filedesc lock in fdgrowtable() is no longer required as the sxlock can be held over the sleep operation; we should consider removing that (pointed out by attilio). Tested by: kris Discussed with: jhb, kris, attilio, jeff
* Rather than ignoring any error return from getnewvnode() in nameiinit(),rwatson2007-03-311-1/+5
| | | | | | | | explicitly test and panic. This should not ever happen, but if it does, this is a preferred failure mode to a NULL pointer dereference in kernel. Coverity CID: 1716 Found with: Coverity Prevent(tm)
* If both ISDOTDOT and NOCROSSMOUNT are set then lookup() might breaks outkib2007-02-151-3/+4
| | | | | | | | | of the special handling for ".." and perform an ISDOTDOT VOP_LOOKUP() for a filesystem root vnode. Handle this case inside lookup(). Submitted by: tegge PR: 92785 MFC after: 1 week
* Below is slightly edited description of the LOR by Tor Egge:kib2007-01-221-4/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | -------------------------- [Deadlock] is caused by a lock order reversal in vfs_lookup(), where [some] process is trying to lock a directory vnode, that is the parent directory of covered vnode) while holding an exclusive vnode lock on covering vnode. A simplified scenario: root fs var fs / A / (/var) D /var B /log (/var/log) E vfs lock C vfs lock F Within each file system, the lock order is clear: C->A->B and F->D->E When traversing across mounts, the system can choose between two lock orders, but everything must then follow that lock order: L1: C->A->B | +->F->D->E L2: F->D->E | +->C->A->B The lookup() process for namei("/var") mixes those two lock orders: VOP_LOOKUP() obtains B while A is held vfs_busy() obtains a shared lock on F while A and B are held (follows L1, violates L2) vput() releases lock on B VOP_UNLOCK() releases lock on A VFS_ROOT() obtains lock on D while shared lock on F is held vfs_unbusy() releases shared lock on F vn_lock() obtains lock on A while D is held (violates L1, follows L2) dounmount() follows L1 (B is locked while F is drained). Without unmount activity, vfs_busy() will always succeed without blocking and the deadlock isn't triggered (the system behaves as if L2 is followed). With unmount, you can get 4 processes in a deadlock: p1: holds D, want A (in lookup()) p2: holds shared lock on F, want D (in VFS_ROOT()) p3: holds B, want drain lock on F (in dounmount()) p4: holds A, want B (in VOP_LOOKUP()) You can have more than one instance of p2. The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode. - Tor Egge To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp is actually not used by the callers of namei. Thus, placeholder deadfs vnode vp_crossmp is introduced that is filled into ni_dvp. Idea by: ups Reviewed by: tegge, ups, jeff, rwatson (mac interaction) Tested by: Peter Holm MFC after: 2 weeks
* Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.hrwatson2006-10-221-1/+1
| | | | | | | | | | | | | begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA
* Fix for a potential bug caught by Coverity. Pointed out to me by Kris Kennaway.mohans2006-09-141-1/+2
|
* Fixes up the handling of shared vnode lock lookups in the NFS client,mohans2006-09-131-7/+23
| | | | | | | | | | | | | | | | | | | | adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately. - amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down. Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example isn't ready for shared vnode lock lookups, crashing pretty quickly. This change is the result of discussions with Stephan Uphoff (ups@). Reviewed by: ups@
* Remove register, use ANSI function headers.rwatson2006-08-051-15/+9
|
* We now spell "inode" as "vnode" in the VFS layer, so update commentrwatson2006-08-051-3/+3
| | | | | | | for new world order. MFC after: 3 days Pointed out by: mckusick
* Lock giant when assigning ni_vp and keep vfslocked state valid.kris2006-04-291-0/+1
| | | | Committed for: jeff
* - Consistently track ni_dvp and ni_vp with dvfslocked and vfslocked ratherjeff2006-04-281-13/+15
| | | | | | | | | | | | than trying to optimize it into a single lock. This adds more calls to lock giant with non smpsafe filesystems but is the only way to reliably hold the correct lock. - Remove an invalid assert in the mountedhere case in lookup and fix the code to properly deal with the scenario. We can actually have a lookup that returns dp == dvp with mountedhere set with certain unmount races. Tested by: kris Reported by: kris/mohans
* - LK_RETRY means nothing when passed to VOP_LOCK. Call vn_lock instead.jeff2006-03-311-1/+2
| | | | | | | | | | - Move the vn_lock of the dvp until after we've unbusied the filesystem to avoid a LOR with the mount point lock. - In the v_mountedhere while loop we acquire a new instance of giant each time through without releasing the first. This would cause us to leak Giant. Sponsored by: Isilon Systems, Inc.
* - Don't check v_mount for NULL to determine if a vnode has been recycled.jeff2006-02-061-2/+2
| | | | | | | Use the more appropriate VI_DOOMED flag instead. Sponsored by: Isilon Systems, Inc. MFC After: 1 week
* Add AUDITVNODE[12] flags to namei(), which cause namei() to audit pathrwatson2006-02-051-0/+19
| | | | | | | | | and vnode attribute information for looked up vnodes during the lookup operation. This will allow consumers of namei() to specify that this information be added to the in-process audit record. Submitted by: wsalamon Obtained from: TrustedBSD Project
* - Solve a problem where a vput could be called on an outgoing directoryjeff2006-02-011-5/+13
| | | | | | | | | without Giant held. Do this by tracking the vfslocked state for the directory seperate from the child. This is only important in the case where we cross a mountpoint. Sponsored by: Isilon Systems, Inc. MFC After: 3 days
OpenPOWER on IntegriCloud