summaryrefslogtreecommitdiffstats
path: root/sys/sys/vnode.h
Commit message (Collapse)AuthorAgeFilesLines
* MFC r279362:kib2015-03-061-0/+1
| | | | | | | The VNASSERT in vflush() FORCECLOSE case is trying to panic early to prevent errors from yanking devices out from under filesystems. Only care about special vnodes on devfs, special nodes on other kinds of filesystems do not have special properties.
* MFC r276008:kib2015-01-041-0/+1
| | | | | Add VN_OPEN_NAMECACHE flag for vn_open_cred(9), which requests that the created file name was cached. Use the flag for core dumps.
* MFC r273131:kib2014-10-221-0/+3
| | | | | When vnode bypass cannot be performed on the cdev file descriptor for read/write/poll/ioctl, call standard vnode filedescriptor fop.
* MFC r272534:kib2014-10-181-0/+1
| | | | | Add IO_RANGELOCKED flag for vn_rdwr(9), which specifies that vnode is not locked, but range is.
* Add function and wrapper to switch lockmgr and vnode lock back tokib2014-09-051-0/+1
| | | | | | auto-promotion of shared to exclusive. Approved by: re (gjb)
* MFC r268612:kib2014-07-281-0/+1
| | | | | | | Add helper helper vfs_write_suspend_umnt(). Fix the bug in the FFS unmount, when suspension failed, the ufs extattrs were not reinitialized.
* MFC r268606:kib2014-07-281-0/+4
| | | | | | Generalize vn_get_ino() to allow filesystems to use custom vnode producer. Convert inline copies of vn_get_ino() in msdosfs and cd9660 into the uses of vn_get_ino_gen().
* MFC r267564:kib2014-06-241-0/+2
| | | | | In msdosfs_setattr(), add a check for result of the utimes(2) permissions test. Refactor the permission checks for utimes(2).
* There are several code sequences likekib2013-07-091-1/+4
| | | | | | | | | | | | | | | | | | | vfs_busy(mp); vfs_write_suspend(mp); which are problematic if other thread starts unmount between two calls. The unmount starts a write, while vfs_write_suspend() drain writers. On the other hand, unmount drains busy references, causing the deadlock. Add a flag argument to vfs_write_suspend and require the callers of it to specify VS_SKIP_UNMOUNT flag, when the call is performed not in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and unmount is in progress. Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
* When renaming a directory from one parent directory to another,mckusick2013-03-201-0/+2
| | | | | | | | | | | | | | we need to call ufs_checkpath() to walk from our new location to the root of the filesystem to ensure that we do not encounter ourselves along the way. Until now, we accomplished this by reading the ".." entries of each directory in our path until we reached the root (or encountered an error). This change tries to avoid the I/O of reading the ".." entries by first looking them up in the name cache and only doing the I/O when the name cache lookup fails. Reviewed by: kib Tested by: Peter Holm MFC after: 4 weeks
* Implement the helper function vn_io_fault_pgmove(), intended to use bykib2013-03-151-0/+2
| | | | | | | | | | | the filesystem VOP_READ() and VOP_WRITE() implementations in the same way as vn_io_fault_uiomove() over the unmapped buffers. Helper provides the convenient wrapper over the pmap_copy_pages() for struct uio consumers, taking care of the TDP_UIOHELD situations. Sponsored by: The FreeBSD Foundation Tested by: pho MFC after: 2 weeks
* The softdep freeblks workitem might hold a reference on the dquot.kib2013-02-271-0/+1
| | | | | | | | | | | | | | | | | | | Current dqflush() panics when a dquot with with non-zero refcount is encountered. The situation is possible, because quotas are turned off before softdep workitem queue if flushed, due to the quota file writes might create softdep workitems. Make the encountering an active dquot in dqflush() not fatal, return the error from quotaoff() instead. Ignore the quotaoff() failures when ffs_flushfiles() is called in the course of softdep_flushfiles() loop, until the last iteration. At the last loop, the quotas must be closed, and because SU workitems should be already flushed, the references to dquot are gone. Sponsored by: The FreeBSD Foundation Reported and tested by: pho Reviewed by: mckusick MFC after: 2 weeks
* Rearrange the struct bufobj and struct vnode layouts to reducekib2013-01-141-16/+18
| | | | | | | | | | padding. On the amd64 kernel with INVARIANTS turned off, size of the struct vnode is reduced from 496 to 472 bytes, saving 24 bytes of memory and KVA per vnode. Noted and reviewed by: peter Tested by: pho Sponsored by: The FreeBSD Foundation
* Add exported vfs_hash_index() function, which calculates the canonicalkib2013-01-141-0/+1
| | | | | | | | | | | | pre-masked hash for the given vnode. The function assumes that vp->v_hash is initialized by the filesystem vnode instantiation function. At the moment, it is only done if filesystem uses vfs_hash_insert(). Reviewed by: peter Tested by: peter, pho (previous version) Sponsored by: The FreeBSD Foundation MFC after: 5 days
* Add flags argument to vfs_write_resume() and removekib2013-01-111-2/+1
| | | | | | vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation
* The process_deferred_inactive() function locks the vnodes of the ufskib2013-01-011-0/+1
| | | | | | | | | | | | | | | mount, which means that is must not be called while the snaplock is owned. The vfs_write_resume(9) does call the function as the VFS_SUSP_CLEAN() method, which is too early and falls into the region still protected by snaplock. Add yet another flag for the vfs_write_resume_flags() to avoid calling suspension cleanup handler after the suspend is lifted, and use it in the ffs_snapshot() call to vfs_write_resume. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* Make it possible to atomically resume writes on the mount and accountkib2012-12-281-0/+3
| | | | | | | | | | | | | | | | | | | the write start, by adding a variation of the vfs_write_resume(9) which accepts flags. Use the new function to prevent a deadlock between parallel suspension and snapshotting a UFS mount. The ffs_snapshot() code performed vfs_write_resume() followed by vn_start_write() while owning the snaplock. If the suspension intervene between resume and vn_start_write(), the deadlock occured after the suspending thread tried to lock the snaplock, most typically during the write in the ffs_copyonwrite(). Reported and tested by: Andreas Longwitz <longwitz@incore.de> Reviewed by: mckusick MFC after: 2 weeks X-MFC-note: make the vfs_write_resume(9) function a macro after the MFC, in HEAD
* - Add NOCAPCHECK flag to namei that allows lookup to work even if the processpjd2012-11-271-0/+1
| | | | | | | | | | | | is in capability mode. - Add VN_OPEN_NOCAPCHECK flag for vn_open_cred() to will ne converted into NOCAPCHECK namei flag. This functionality will be used to enable core dumps for sandboxed processes. Reviewed by: rwatson Obtained from: WHEEL Systems MFC after: 2 weeks
* Add a KPI to allow to reserve some amount of space in the numvnodeskib2012-10-141-0/+2
| | | | | | | | | | | | | counter, without actually allocating the vnodes. The supposed use of the getnewvnode_reserve(9) is to reclaim enough free vnodes while the code still does not hold any resources that might be needed during the reclamation, and to consume the slack later for getnewvnode() calls made from the innards. After the critical block is finished, the caller shall free any reserve left, by getnewvnode_drop_reserve(9). Reviewed by: avg Tested by: pho MFC after: 1 week
* Split the second half of vn_open_cred() (after a vnode has been found viajhb2012-06-081-0/+2
| | | | | | | | | | a lookup or created via VOP_CREATE()) into a new vn_open_vnode() function and use this function in fhopen() instead of duplicating code from vn_open_cred() directly. Tested by: pho Reviewed by: kib MFC after: 2 weeks
* vn_io_fault() is a facility to prevent page faults while filesystemskib2012-05-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | perform copyin/copyout of the file data into the usermode buffer. Typical filesystem hold vnode lock and some buffer locks over the VOP_READ() and VOP_WRITE() operations, and since page fault handler may need to recurse into VFS to get the page content, a deadlock is possible. The facility works by disabling page faults handling for the current thread and attempting to execute i/o while allowing uiomove() to access the usermode mapping of the i/o buffer. If all buffer pages are resident, uiomove() is successfull and request is finished. If EFAULT is returned from uiomove(), the pages backing i/o buffer are faulted in and held, and the copyin/out is performed using uiomove_fromphys() over the held pages for the second attempt of VOP call. Since pages are hold in chunks to prevent large i/o requests from starving free pages pool, and since vnode lock is only taken for i/o over the current chunk, the vnode lock no longer protect atomicity of the whole i/o request. Use newly added rangelocks to provide the required atomicity of i/o regardind other i/o and truncations. Filesystems need to explicitely opt-in into the scheme, by setting the MNTK_NO_IOPF struct mount flag, and optionally by using vn_io_fault_uiomove(9) helper which takes care of calling uiomove() or converting uio into request for uiomove_fromphys(). Reviewed by: bf (comments), mdf, pjd (previous version) Tested by: pho Tested by: flo, Gustau P?rez <gperez entel upc edu> (previous version) MFC after: 2 months
* Add a rangelock implementation, intended to be used to range-lockingkib2012-05-301-0/+11
| | | | | | | | | the i/o regions of the vnode data space. The implementation is quite simple-minded, it uses the list of the lock requests, ordered by arrival time. Each request may be for read or for write. The implementation is fair FIFO. MFC after: 2 month
* Clarify that the v_lockf is advisory lock list.kib2012-05-301-1/+1
| | | | MFC after: 3 days
* Add a vn_bmap_seekhole(9) vnode helper which can be used by anykib2012-05-261-0/+2
| | | | | | | filesystem which supports VOP_BMAP(9) to implement SEEK_HOLE/SEEK_DATA commands for lseek(2). MFC after: 2 weeks
* Remove unused thread argument to vrecycle().trasz2012-04-231-1/+1
| | | | Reviewed by: kib
* Remove unused thread argument from vtruncbuf().trasz2012-04-231-2/+2
| | | | Reviewed by: kib
* This change creates a new list of active vnodes associated withmckusick2012-04-201-1/+2
| | | | | | | | | | | | | | | | | | | | a mount point. Active vnodes are those with a non-zero use or hold count, e.g., those vnodes that are not on the free list. Note that this list is in addition to the list of all the vnodes associated with a mount point. To avoid adding another set of linkage pointers to the vnode structure, the active list uses the existing linkage pointers used by the free list (previously named v_freelist, now renamed v_actfreelist). This update adds the MNT_VNODE_FOREACH_ACTIVE interface that loops over just the active vnodes associated with a mount point (typically less than 1% of the vnodes associated with the mount point). Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
* Drop export of vdestroy() function from kern/vfs_subr.c as it ismckusick2012-04-171-1/+0
| | | | | | | | | | | | | | | used only as a helper function in that file. Replace sole call to vbusy() with inline code in vholdl(). Replace sole calls to vfree() and vdestroy() with inline code in vdropl(). The Clang compiler already inlines these functions, so they do not show up in a kernel backtrace which is confusing. Also you cannot set their frame in kgdb which means that it is impossible to view their local variables. So, while the produced code is unchanged, the debugging should be easier. Discussed with: kib MFC after: 2 weeks
* Export vinactive() from kern/vfs_subr.c (e.g., make it no longermckusick2012-04-111-0/+1
| | | | | | | | | | static and declare its prototype in sys/vnode.h) so that it can be called from process_deferred_inactive() (in ufs/ffs/ffs_snapshot.c) instead of the body of vinactive() being cut and pasted into process_deferred_inactive(). Reviewed by: kib MFC after: 2 weeks
* Remove unused define.trasz2012-03-251-1/+0
| | | | Discussed with: kib
* Remove fifo.h. The only used function declaration from the header iskib2012-03-111-0/+3
| | | | | | migrated to sys/vnode.h. Submitted by: gianni
* Post r230394, the Lookup RPC counts for both NFS clients increasedrmacklem2012-03-031-2/+3
| | | | | | | | | | | | | | | | | | | | | significantly. Upon investigation this was caused by name cache misses for lookups of "..". For name cache entries for non-".." directories, the cache entry serves double duty. It maps both the named directory plus ".." for the parent of the directory. As such, two ctime values (one for each of the directory and its parent) need to be saved in the name cache entry. This patch adds an entry for ctime of the parent directory to the name cache. It also adds an additional uma zone for large entries with this time value, in order to minimize memory wastage. As well, it fixes a couple of cases where the mtime of the parent directory was being saved instead of ctime for positive name cache entries. With this patch, Lookup RPC counts return to values similar to pre-r230394 kernels. Reported by: bde Discussed with: kib Reviewed by: jhb MFC after: 2 weeks
* Introduce VOP_UNP_BIND(), VOP_UNP_CONNECT(), and VOP_UNP_DETACH()trociny2012-02-291-0/+3
| | | | | | | | | | | | | | | | | | | operations for setting and accessing vnode's v_socket field. The operations are necessary to implement proper unix socket handling on layered file systems like nullfs(5). This change fixes the long standing issue with nullfs(5) being in that unix sockets did not work between lower and upper layers: if we bound to a socket on the lower layer we could connect only to the lower path; if we bound to the upper layer we could connect only to the upper path. The new behavior is one can connect to both the lower and the upper paths regardless what layer path one binds to. PR: kern/51583, kern/159663 Suggested by: kib Reviewed by: arch MFC after: 2 weeks
* When detaching an unix domain socket, uipc_detach() checkstrociny2012-02-251-0/+2
| | | | | | | | | | | | | | | unp->unp_vnode pointer to detect if there is a vnode associated with (binded to) this socket and does necessary cleanup if there is. The issue is that after forced unmount this check may be too late as the unp_vnode is reclaimed and the reference is stale. To fix this provide a helper function that is called on a socket vnode reclamation to do necessary cleanup. Pointed by: kib Reviewed by: kib MFC after: 2 weeks
* Fix found places where uio_resid is truncated to int.kib2012-02-211-1/+1
| | | | | | | | | Add the sysctl debug.iosize_max_clamp, enabled by default. Setting the sysctl to zero allows to perform the SSIZE_MAX-sized i/o requests from the usermode. Discussed with: bde, das (previous versions) MFC after: 1 month
* Trim 8 unused bytes from struct vnode on 64-bit architectures.kib2012-02-081-2/+2
| | | | Reviewed by: alc
* Rename cache_lookup_times() to cache_lookup() and retire the old API andjhb2012-02-061-3/+1
| | | | ABI stub for cache_lookup().
* Current implementations of sync(2) and syncer vnode fsync() VOP useskib2012-02-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | mnt_noasync counter to temporary remove MNTK_ASYNC mount option, which is needed to guarantee a synchronous completion of the initiated i/o before syscall or VOP return. Global removal of MNTK_ASYNC option is harmful because not only i/o started from corresponding thread becomes synchronous, but all i/o is synchronous on the filesystem which is initiated during sync(2) or syncer activity. Instead of removing MNTK_ASYNC from mnt_kern_flag, provide a local thread flag to disable async i/o for current thread only. Use the opportunity to move DOINGASYNC() macro into sys/vnode.h and consistently use it through places which tested for MNTK_ASYNC. Some testing demonstrated 60-70% improvements in run time for the metadata-intensive operations on async-mounted UFS volumes, but still with great deviation due to other reasons. Reviewed by: mckusick Tested by: scottl MFC after: 2 weeks
* Close a race in NFS lookup processing that could result in stale name cachejhb2012-01-201-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | entries on one client when a directory was renamed on another client. The root cause for the stale entry being trusted is that each per-vnode nfsnode structure has a single 'n_ctime' timestamp used to validate positive name cache entries. However, if there are multiple entries for a single vnode, they all share a single timestamp. To fix this, extend the name cache to allow filesystems to optionally store a timestamp value in each name cache entry. The NFS clients now fetch the timestamp associated with each name cache entry and use that to validate cache hits instead of the timestamps previously stored in the nfsnode. Another part of the fix is that the NFS clients now use timestamps from the post-op attributes of RPCs when adding name cache entries rather than pulling the timestamps out of the file's attribute cache. The latter is subject to races with other lookups updating the attribute cache concurrently. Some more details: - Add a variant of nfsm_postop_attr() to the old NFS client that can return a vattr structure with a copy of the post-op attributes. - Handle lookups of "." as a special case in the NFS clients since the name cache does not store name cache entries for ".", so we cannot get a useful timestamp. It didn't really make much sense to recheck the attributes on the the directory to validate the namecache hit for "." anyway. - ABI compat shims for the name cache routines are present in this commit so that it is safe to MFC. MFC after: 2 weeks
* Introduce vn_path_to_global_path()mm2012-01-151-0/+2
| | | | | | | | | | | | | This function updates path string to vnode's full global path and checks the size of the new path string against the pathlen argument. In vfs_domount(), sys_unmount() and kern_jail_set() this new function is used to update the supplied path argument to the respective global path. Unbreaks jailed zfs(8) with enforce_statfs set to 1. Reviewed by: kib MFC after: 1 month
* Add post-VOP hooks for VOP_DELETEEXTATTR() and VOP_SETEXTATTR() and usejhb2011-12-231-0/+2
| | | | | | | | | | these to trigger a NOTE_ATTRIB EVFILT_VNODE kevent when the extended attributes of a vnode are changed. Note that OS X already implements this behavior. Reviewed by: rwatson MFC after: 2 weeks
* Add the posix_fadvise(2) system call. It is somewhat similar tojhb2011-11-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
* Generalize ffs_pages_remove() into vn_pages_remove().mm2011-08-251-0/+1
| | | | | | | | | | | Remove mapped pages for all dataset vnodes in zfs_rezget() using new vn_pages_remove() to fix mmapped files changed by zfs rollback or zfs receive -F. PR: kern/160035, kern/156933 Reviewed by: kib, pjd Approved by: re (kib) MFC after: 1 week
* Add the fo_chown and fo_chmod methods to struct fileops and use themkib2011-08-161-1/+9
| | | | | | | | | | to implement fchown(2) and fchmod(2) support for several file types that previously lacked it. Add MAC entries for chown/chmod done on posix shared memory and (old) in-kernel posix semaphores. Based on the submission by: glebius Reviewed by: rwatson Approved by: re (bz)
* Update locking annotations for the struct vnode.kib2011-07-101-5/+2
| | | | MFC after: 3 days
* Implement fully asynchronous partial truncation with softupdates journalingjeff2011-06-101-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to resolve errors which can cause corruption on recovery with the old synchronous mechanism. - Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations. - On completion of a partial truncate journal work waits for zeroed pointers to hit indirects. - softdep_journal_freeblocks() handles last frag allocation and last block zeroing. - vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place. - Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted. - softdep_sync_metadata() is broken into two parts, the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks. - Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call. - Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files. - Push a fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid. Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho
* Add the posix_fallocate(2) syscall. The default implementation inmdf2011-04-181-0/+1
| | | | | | | | | | | | | | vop_stdallocate() is filesystem agnostic and will run as slow as a read/write loop in userspace; however, it serves to correctly implement the functionality for filesystems that do not implement a VOP_ALLOCATE. Note that __FreeBSD_version was already bumped today to 900036 for any ports which would like to use this function. Also reserve space in the syscall table for posix_fadvise(2). Reviewed by: -arch (previous version)
* Put the general logic for being a CPU hog into a new functionmdf2011-02-021-2/+0
| | | | | | | | | | should_yield(). Use this in various places. Encapsulate the common case of check-and-yield into a new function maybe_yield(). Change several checks for a magic number of iterations to use should_yield() instead. MFC after: 1 week
* Remove prtactive variable and related printf()s in the vop_inactivekib2010-11-191-1/+0
| | | | | | | | and vop_reclaim() methods. They seems to be unused, and the reported situation is normal for the forced unmount. MFC after: 1 week X-MFC-note: keep prtactive symbol in vfs_subr.c
* Add dedicated routines to toggle lockmgr flags such as LK_NOSHARE andjhb2010-08-201-4/+2
| | | | | | | | | | | | LK_CANRECURSE after a lock is created. Use them to implement macros that otherwise manipulated the flags directly. Assert that the associated lockmgr lock is exclusively locked by the current thread when manipulating these flags to ensure the flag updates are safe. This last change required some minor shuffling in a few filesystems to exclusively lock a brand new vnode slightly earlier. Reviewed by: kib MFC after: 3 days
OpenPOWER on IntegriCloud