summaryrefslogtreecommitdiffstats
path: root/sys/kern/vfs_default.c
Commit message (Collapse)AuthorAgeFilesLines
* The r241025 fixed the case when a binary, executed from nullfs mount,kib2012-11-021-0/+20
| | | | | | | | | | | | | | | | | | | | | | | was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Caling the VOPs instead of directly accessing v_writecount provide the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks
* Remove the support for using non-mpsafe filesystem modules.kib2012-10-221-4/+1
| | | | | | | | | | | | In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
* Fix the mis-handling of the VV_TEXT on the nullfs vnodes.kib2012-09-281-0/+30
| | | | | | | | | | | | | | | | If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks
* When synchronously syncing a device (MNT_WAIT), wait for buffersmckusick2012-06-091-2/+11
| | | | | | | | | to become available. Otherwise we may excessively spin and fail with ``fsync: giving up on dirty''. Reviewed by: kib Tested by: Peter Holm MFC after: 1 week
* Skip directory entries with zero inode number during traversal.gleb2012-05-161-2/+2
| | | | | | | | Entries with zero inode number are considered placeholders by libc and UFS. Fix remaining uses of VOP_READDIR in kernel: vop_stdvptocnp, unionfs. Sponsored by: Google Summer of Code 2011
* Replace the MNT_VNODE_FOREACH interface with MNT_VNODE_FOREACH_ALL.mckusick2012-04-171-10/+5
| | | | | | | | | | | | | | | | | | | | | The primary changes are that the user of the interface no longer needs to manage the mount-mutex locking and that the vnode that is returned has its mutex locked (thus avoiding the need to check to see if its is DOOMED or other possible end of life senarios). To minimize compatibility issues for third-party developers, the old MNT_VNODE_FOREACH interface will remain available so that this change can be MFC'ed to 9. Following the MFC to 9, MNT_VNODE_FOREACH will be removed in head. The reason for this update is to prepare for the addition of the MNT_VNODE_FOREACH_ACTIVE interface that will loop over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
* Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH()trociny2012-02-291-0/+27
| | | | | | | | | | | | | | | | | | | operations for setting and accessing vnode's v_socket field. The operations are necessary to implement proper unix socket handling on layered file systems like nullfs(5). This change fixes the long standing issue with nullfs(5) being in that unix sockets did not work between lower and upper layers: if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is one can connect to both the lower and the upper paths regardless what layer path one binds to. PR: kern/51583, kern/159663 Suggested by: kib Reviewed by: arch MFC after: 2 weeks
* Existing VOP_VPTOCNP() interface has a fatal flow that is critical forkib2011-11-191-1/+1
| | | | | | | | | | | | | | | | | | | | | nullfs. The problem is that resulting vnode is only required to be held on return from the successfull call to vop, instead of being referenced. Nullfs VOP_INACTIVE() method reclaims the vnode, which in combination with the VOP_VPTOCNP() interface means that the directory vnode returned from VOP_VPTOCNP() is reclaimed in advance, causing vn_fullpath() to error with EBADF or like. Change the interface for VOP_VPTOCNP(), now the dvp must be referenced. Convert all in-tree implementations of VOP_VPTOCNP(), which is trivial, because vhold(9) and vref(9) are similar in the locking prerequisites. Out-of-tree fs implementation of VOP_VPTOCNP(), if any, should have no trouble with the fix. Tested by: pho Reviewed by: mckusick MFC after: 3 weeks (subject of re approval)
* Add the posix_fadvise(2) system call. It is somewhat similar tojhb2011-11-041-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
* Add a lock flags argument to the VFS_FHTOVP() file systemrmacklem2011-05-221-1/+2
| | | | | | | | | | | method, so that callers can indicate the minimum vnode locking requirement. This will allow some file systems to choose to return a LK_SHARED locked vnode when LK_SHARED is specified for the flags argument. This patch only adds the flag. It does not change any file system to use it and all callers specify LK_EXCLUSIVE, so file system semantics are not changed. Reviewed by: kib
* Allow VOP_ALLOCATE to be iterative, and have kern_posix_fallocate(9)mdf2011-04-191-23/+21
| | | | | | drive looping and potentially yielding. Requested by: kib
* Add the posix_fallocate(2) syscall. The default implementation inmdf2011-04-181-0/+131
| | | | | | | | | | | | | | vop_stdallocate() is filesystem agnostic and will run as slow as a read/write loop in userspace; however, it serves to correctly implement the functionality for filesystems that do not implement a VOP_ALLOCATE. Note that __FreeBSD_version was already bumped today to 900036 for any ports which would like to use this function. Also reserve space in the syscall table for posix_fadvise(2). Reviewed by: -arch (previous version)
* If we read zero bytes from the directory, early out with ENOENTbrian2010-08-251-2/+6
| | | | | | | | | rather than forging ahead and interpreting garbage buffer content and dirent structures. This change backs out r211684 which was essentially a no-op. MFC after: 1 week
* uio_resid isn't updated by VOP_READDIR for nfs filesystems. Usebrian2010-08-231-3/+2
| | | | | | | | | the uio_offset adjustment instead to calculate a correct *len. Without this change, we run off the end of the directory data we're reading and panic horribly for nfs filesystems. MFC after: 1 week
* Add VOP_ADVLOCKPURGE so that the file system is called when purgingzml2010-05-121-0/+11
| | | | | | | locks (in the case where the VFS impl isn't using lf_*) Submitted by: Matthew Fleming <matthew.fleming@isilon.com> Reviewed by: zml, dfr
* Supply default implementation of VOP_RENAME() that does neccessarykib2010-04-021-0/+16
| | | | | | | | | | | | unlocks and unreferences for argument vnodes, as expected by kern_renameat(9), and returns EOPNOTSUPP. This fixes locks and reference leaks when rename is attempted on fs that does not implement rename. PR: kern/107439 Based on submission by: Mikolaj Golub <to.my.trociny gmail com> Tested by: Mikolaj Golub MFC after: 1 week
* Use vput() instead of VOP_UNLOCK()+vrele(). The comment here is out-dated,pjd2010-02-181-4/+1
| | | | we no longer pass thread pointer to VOP_UNLOCK().
* Revert r198873. Having different VAPPEND semantics for VOP_ACCESS(9)trasz2009-11-111-8/+0
| | | | and VOP_ACCESSX(9) is not a good idea.
* While VAPPEND without VWRITE makes sense for VOP_ACCESSX(9) (e.g. to checktrasz2009-11-041-0/+8
| | | | | | for the permission to create subdirectory (ACE4_ADD_SUBDIRECTORY)), it doesn't really make sense for VOP_ACCESS(9). Also, many VOP_ACCESS(9) implementations don't expect that. Make sure we don't confuse them.
* Provide default implementation for VOP_ACCESS(9), so that filesystems whichtrasz2009-10-011-0/+15
| | | | | | | want to provide VOP_ACCESSX(9) don't have to implement both. Note that this commit makes implementation of either of these two mandatory. Reviewed by: kib
* Add explicit struct ucred * argument for VOP_VPTOCNP, to be used bykib2009-06-211-4/+5
| | | | | | | | | | vn_open_cred in default implementation. Valid struct ucred is needed for audit and MAC, and curthread credentials may be wrong. This further requires modifying the interface of vn_fullpath(9), but it is out of scope of this change. Reviewed by: rwatson
* Add another flags argument to vn_open_cred. Use it to specify that somekib2009-06-211-1/+1
| | | | | | | | | | | | | | vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson
* Add mac_framework.h include missed when MAC code was (presumably) copiedrwatson2009-06-051-0/+2
| | | | from another file.
* Add VOP_ACCESSX, which can be used to query for newly added V*trasz2009-05-301-0/+17
| | | | | | | | permissions, such as VWRITE_ACL. For a filsystems that don't implement it, there is a default implementation, which works as a wrapper around VOP_ACCESS. Reviewed by: rwatson@
* Remove the thread argument from the FSD (File-System Dependent) parts ofattilio2009-05-111-12/+8
| | | | | | | | | | | | | | | | | the VFS. Now all the VFS_* functions and relating parts don't want the context as long as it always refers to curthread. In some points, in particular when dealing with VOPs and functions living in the same namespace (eg. vflush) which still need to be converted, pass curthread explicitly in order to retain the old behaviour. Such loose ends will be fixed ASAP. While here fix a bug: now, UFS_EXTATTR can be compiled alone without the UFS_EXTATTR_AUTOSTART option. VFS KPI is heavilly changed by this commit so thirdy parts modules needs to be recompiled. Bump __FreeBSD_version in order to signal such situation.
* Remove VOP_LEASE and supporting functions. This hasn't been used sincerwatson2009-04-101-1/+0
| | | | | | | | | | | | | | the removal of NQNFS, but was left in in case it was required for NFSv4. Since our new NFSv4 client and server can't use it for their requirements, GC the old mechanism, as well as other unused lease- related code and interfaces. Due to its impact on kernel programming and binary interfaces, this change should not be MFC'd. Proposed by: jeff Reviewed by: jeff Discussed with: rmacklem, zach loafman @ isilon
* Add a default implementation for VOP_VPTOCNP(9) which scans the parentmarcus2009-03-081-1/+235
| | | | | | | | | | | | directory of a vnode to find a dirent with a matching file number. The name from that dirent is then used to provide the component name. Note: if the initial vnode argument is not a directory itself, then the default VOP_VPTOCNP(9) implementation still returns ENOENT. Reviewed by: kib Approved by: kib Tested by: pho
* Extract the no_poll() and vop_nopoll() code into the common routinekib2009-03-061-11/+1
| | | | | | | | | poll_no_poll(). Return a poll_no_poll() result from devfs_poll_f() when filedescriptor does not reference the live cdev, instead of ENXIO. Noted and tested by: hps MFC after: 1 week
* Add a new VOP, VOP_VPTOCNP, which translates a vnode to its component namemarcus2008-12-121-0/+8
| | | | | | | | | | | | | | | | on a best-effort basis. Teach vn_fullpath to use this new VOP if a regular VFS cache lookup fails. This VOP is designed to supplement the VFS cache to provide a better chance that a vnode-to-name lookup will succeed. Currently, an implementation for devfs is being committed. The default implementation is to return ENOENT. A big thanks to kib for the mentorship on this, and to pho for running it through his stress test suite. Reviewed by: arch Approved by: kib
* Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed threadattilio2008-08-281-6/+10
| | | | | | was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* Move the head of byte-level advisory lock list from thekib2008-04-161-1/+40
| | | | | | | | | | | | | | | | | | | | | | filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks
* - Complete part of the unfinished bufobj work by consistently usingjeff2008-03-221-15/+12
| | | | | | | | | | | | | | | | | BO_LOCK/UNLOCK/MTX when manipulating the bufobj. - Create a new lock in the bufobj to lock bufobj fields independently. This leaves the vnode interlock as an 'identity' lock while the bufobj is an io lock. The bufobj lock is ordered before the vnode interlock and also before the mnt ilock. - Exploit this new lock order to simplify softdep_check_suspend(). - A few sync related functions are marked with a new XXX to note that we may not properly interlock against a non-zero bv_cnt when attempting to sync all vnodes on a mountlist. I do not believe this race is important. If I'm wrong this will make these locations easier to find. Reviewed by: kib (earlier diff) Tested by: kris, pho (earlier diff)
* Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it isattilio2008-02-251-4/+1
| | | | | | | | | always curthread. As KPI gets broken by this patch, manpages and __FreeBSD_version will be updated by further commits. Tested by: Andrea Barberio <insomniac at slackware dot it>
* - Introduce lockmgr_args() in the lockmgr space. This function performsattilio2008-02-151-1/+2
| | | | | | | | | | | the same operation of lockmgr() but accepting a custom wmesg, prio and timo for the particular lock instance, overriding default values lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo. - Use lockmgr_args() in order to implement BUF_TIMELOCK() - Cleanup BUF_LOCK() - Remove LK_INTERNAL as it is nomore used in the lockmgr namespace Tested by: Andrea Barberio <insomniac at slackware dot it>
* Cleanup lockmgr interface and exported KPI:attilio2008-01-241-4/+3
| | | | | | | | | | | | | | | | | | | | - Remove the "thread" argument from the lockmgr() function as it is always curthread now - Axe lockcount() function as it is no longer used - Axe LOCKMGR_ASSERT() as it is bogus really and no currently used. Hopefully this will be soonly replaced by something suitable for it. - Remove the prototype for dumplockinfo() as the function is no longer present Addictionally: - Introduce a KASSERT() in lockstatus() in order to let it accept only curthread or NULL as they should only be passed - Do a little bit of style(9) cleanup on lockmgr.h KPI results heavilly broken by this change, so manpages and FreeBSD_version will be modified accordingly by further commits. Tested by: matteo
* VOP_LOCK1() (and so VOP_LOCK()) and VOP_UNLOCK() are only used inattilio2008-01-131-4/+5
| | | | | | | | | | | conjuction with 'thread' argument passing which is always curthread. Remove the unuseful extra-argument and pass explicitly curthread to lower layer functions, when necessary. KPI results broken by this change, which should affect several ports, so version bumping and manpage update will be further committed. Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
* Since renaming of vop_lock to _vop_lock, pre- and post-conditionkib2007-05-181-2/+2
| | | | | | function calls are no more generated for vop_lock. Rename _vop_lock to vop_lock1 to satisfy tools/vnode_if.awk assumption about vop naming conventions. This restores pre/post-condition calls.
* Remove VFS_VPTOFH entirely. API is already broken and it is good time topjd2007-02-161-14/+1
| | | | | | do it. Suggested by: rwatson
* Move vnode-to-file-handle translation from vfs_vptofh to vop_vptofh method.pjd2007-02-151-0/+11
| | | | | | | | | | | | | | | | This way we may support multiple structures in v_data vnode field within one file system without using black magic. Vnode-to-file-handle should be VOP in the first place, but was made VFS operation to keep interface as compatible as possible with SUN's VFS. BTW. Now Solaris also implements vnode-to-file-handle as VOP operation. VFS_VPTOFH() was left for API backward compatibility, but is marked for removal before 8.0-RELEASE. Approved by: mckusick Discussed with: many (on IRC) Tested with: ufs, msdosfs, cd9660, nullfs and zfs
* change vop_lock handling to allowing tracking of callers' file and line forkmacy2006-11-131-3/+5
| | | | | | acquisition of lockmgr locks Approved by: scottl (standing in for mentor rwatson)
* Don't try to obtain a reference to a nonexisting (NULL) mount structure integge2006-09-201-4/+6
| | | | default VOP_GETWRITEMOUNT().
* - GETWRITEMOUNT now returns a referenced mountpoint to prevent itsjeff2006-03-311-1/+15
| | | | | | | | identity from changing. This is possible now that mounts are not freed. Discussed with: tegge Tested by: kris Sponsored by: Isilon Systems, Inc.
* - Add a comment warning about an anomalous condition where we VOP_UNLOCKjeff2006-01-301-0/+1
| | | | | and then vrele rather than vput because we would like to VOP_UNLOCK with a specific thread.
* Add marker vnodes to ensure that all vnodes associated with the mount point aretegge2006-01-091-3/+5
| | | | | | iterated over when using MNT_VNODE_FOREACH. Reviewed by: truckman
* Eradicate caddr_t from the VFS API.des2005-12-141-1/+1
|
* In vop_stdpathconf(ap) also default for _PC_NAME_MAX and _PC_PATH_MAX.phk2005-08-171-0/+6
|
* - Replace the series of DEBUG_LOCKS hacks which tried to save the vn_lockjeff2005-08-031-5/+0
| | | | | | | caller by saving the stack of the last locker/unlocker in lockmgr. We also put the stack in KTR at the moment. Contributed by: Antoine Brodin <antoine.brodin@laposte.net>
* - Add and enhance asserts related to the wrong bufobj panic.jeff2005-06-141-0/+3
| | | | | Sponsored by: Isilon Systems, Inc. Approved by: re (blanket vfs)
* Allow EVFILT_VNODE events to work on every filesystem type, not justssouhlal2005-06-091-0/+8
| | | | | | | | | | | | | | | UFS by: - Making the pre and post hooks for the VOP functions work even when DEBUG_VFS_LOCKS is not defined. - Moving the KNOTE activations into the corresponding VOP hooks. - Creating a MNTK_NOKNOTE flag for the mnt_kern_flag field of struct mount that permits filesystems to disable the new behavior. - Creating a default VOP_KQFILTER function: vfs_kqfilter() My benchmarks have not revealed any performance degradation. Reviewed by: jeff, bde Approved by: rwatson, jmg (kqueue changes), grehan (mentor)
* - Remove unnecessary spls.jeff2005-05-011-10/+2
|
OpenPOWER on IntegriCloud