summaryrefslogtreecommitdiffstats
path: root/sys/kern/vfs_subr.c
Commit message (Collapse)AuthorAgeFilesLines
* syncdelay, filedelay, dirdelay, metadelay are ints, not time_t's,dillon2001-10-271-4/+4
| | | | and can also be made static.
* Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has adillon2001-10-261-36/+76
| | | | | | | | | | | | | | | | real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week
* Add missing TAILQ_INSERT_TAIL's which somehow didn't get comitted withdillon2001-10-251-0/+2
| | | | the recent vnode cleanup.
* Change the vnode list under the mount point from a LIST to a TAILQdillon2001-10-231-11/+12
| | | | | | in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days
* fix minor bug in kern.minvnodes sysctl. Use OID_AUTO.dillon2001-10-161-1/+1
|
* WS Cleanupdillon2001-10-081-4/+0
|
* vinvalbuf() was only waiting for write-I/O to complete. It really has todillon2001-10-051-4/+15
| | | | | | | | wait for both read AND write I/O to complete. Only NFS calls vinvalbuf() on an active vnode (when the server indicates that the file is stale), so this bug fix only effects NFS clients. MFC after: 3 days
* After extensive testing it has been determined that adding complexitydillon2001-10-011-29/+68
| | | | | | | | | | | | | | | | | | to avoid removing higher level directory vnodes from the namecache has no perceivable effect and will be removed. This is especially true when vmiodirenable is turned on, which it is by default now. ( vmiodirenable makes a huge difference in directory caching ). The vfs.vmiodirenable and vfs.nameileafonly sysctls have been left in to allow further testing, but I expect to rip out vfs.nameileafonly soon too. I have also determined through testing that the real problem with numvnodes getting too large is due to the VM Page cache preventing the vnode from being reclaimed. The directory stuff made only a tiny dent relative to Poul's original code, enough so that some tests succeeded. But tests with several million small files show that the bigger problem is the VM Page cache. This will have to be addressed by a future commit. MFC after: 3 days
* KSE Milestone 2julian2001-09-121-89/+90
| | | | | | | | | | | | | | Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
* If a file has been completely unlinked, stop automatically syncing thepeter2001-08-271-0/+3
| | | | | | file. ffs will discard any pending dirty pages when it is closed, so we may as well not waste time trying to clean them. This doesn't stop other things from writing it out, eg: pageout, fsync(2) etc.
* Revert previous accidental commit. FWIW, it was part of enablingpeter2001-07-271-23/+0
| | | | | | VM caching of disks through mmap() and stopping syncing of open files that had their last reference in the fs removed (ie: their unsync'ed pages get discarded on close already, so I made it stop syncing too).
* Fix cut/paste blunder. Serves me right for doing a last minute tweakpeter2001-07-271-0/+23
| | | | | | to what I had for some time. Submitted by: bde
* With Alfred's permission, remove vm_mtx in favor of a fine-grained approachdillon2001-07-041-10/+6
| | | | | | | | | (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
* - Fix a mntvnode and vnode interlock reversal.jhb2001-06-281-6/+13
| | | | - Protect the mnt_vnode list with the mntvnode lock.
* Introduce a global lock for the vm subsystem (vm_mtx).alfred2001-05-191-1/+14
| | | | | | | | | | | | | | | | | | | vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
* Change the second argument of vflush() to an integer that specifiesiedowse2001-05-161-11/+45
| | | | | | | | | | | | | | | | | | | | the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount. Reviewed by: phk, bp
* In vrele() and vput(), avoid triggering the confusing "missed vn_close"iedowse2001-05-111-2/+6
| | | | | | | KASSERT when vp->v_usecount is zero or negative. In this case, the "v*: negative ref cnt" panic that follows is much more appropriate. Reviewed by: mckusick
* vfs_subr.c is getting rather fat. The underlying repocopy and thisphk2001-04-261-346/+0
| | | | commit moves the filesystem export handling code to vfs_export.c
* Move the netexport structure from the fs-specific mountstructurephk2001-04-251-5/+61
| | | | | | | | | | | | | | to struct mount. This makes the "struct netexport *" paramter to the vfs_export and vfs_checkexport interface unneeded. Consequently that all non-stacking filesystems can use vfs_stdcheckexp(). At the same time, make it a pointer to a struct netexport in struct mount, so that we can remove the bogus AF_MAX and #include <net/radix.h> from <sys/mount.h>
* Correct #includes to work with fixed sys/mount.h.grog2001-04-231-0/+2
|
* Reclaim directory vnodes held in namecache if few free vnodes aretanimura2001-04-181-0/+26
| | | | | | | | | | | | | | | available. Only directory vnodes holding no child directory vnodes held in v_cache_src are recycled, so that directory vnodes near the root of the filesystem hierarchy remain in namecache and directory vnodes are not reclaimed in cascade. The period of vnode reclaiming attempt and the number of vnodes attempted to reclaim can be tuned via sysctl(2). Suggested by: tegge Approved by: phk
* This patch removes the VOP_BWRITE() vector.phk2001-04-171-1/+1
| | | | | | | | | | | | | VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.
* Add a NOTE_REVOKE flag for vnodes, which is triggered from within vclean().jlemon2001-02-231-0/+5
| | | | | | | Use this to tell a filter attached to a vnode that the underlying vnode is no longer valid, by returning EV_EOF. PR: kern/25309, kern/25206
* Switch to using a struct xucred instead of a struct xucred when notgreen2001-02-181-2/+10
| | | | | | | | | | | | | | | | | actually in the kernel. This structure is a different size than what is currently in -CURRENT, but should hopefully be the last time any application breakage is caused there. As soon as any major inconveniences are removed, the definition of the in-kernel struct ucred should be conditionalized upon defined(_KERNEL). This also changes struct export_args to remove dependency on the constantly-changing struct ucred, as well as limiting the bounds of the size fields to the correct size. This means: a) mountd and friends won't break all the time, b) mountd and friends won't crash the kernel all the time if they don't know what they're doing wrt actual struct export_args layout. Reviewed by: bde
* Change and clean the mutex lock interface.bmilekic2001-02-091-87/+87
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
* Mechanical change to use <sys/queue.h> macro API instead ofphk2001-02-041-1/+1
| | | | | | | fondling implementation details. Created with: sed(1) Reviewed by: md5(1)
* Properly lock new vnode.bp2001-01-311-4/+9
| | | | Reminded by: tegge
* Convert all simplelocks to mutexes and remove the simplelock implementations.jasone2001-01-241-55/+55
|
* o The move to using VADMIN under vaccess() resulted in some systemrwatson2001-01-231-1/+1
| | | | | | | | | | | calls returning EACCES instead of EPERM. This patch modifies vaccess() to return EPERM instead of EACCES if VADMIN is among the requested rights. This affects functions normally limited to the owners of a file, such as chmod(), as EPERM is the error indicating that privilege would allow the operation, rather than a chance in mandatory or discretionary rights. Reported by: bde
* Stick the kthread API in a kthread_* namespace, and the specialized kprocjhb2000-12-151-2/+2
| | | | | | functions in a kproc_* namespace. Reviewed by: -arch
* Use proper mutex locking when calling setrunnable from speedup_syncer().mckusick2000-12-131-3/+2
| | | | Submitted by: Tor.Egge@fast.no
* Convert more malloc+bzero to malloc+M_ZERO.dwmalone2000-12-081-4/+2
| | | | | Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
* Untangle vfsinit() a bit. Use seperate sysinit functions rather thanpeter2000-12-061-2/+4
| | | | having a super-function calling bits all over the place.
* Correct int/long type mismatch in the proper place this time. freevnodesgallatin2000-12-021-3/+3
| | | | | | | | | | | and numvnodes are longs in the kernel. They should remain longs in systat, what really needs to change is that they should be using SYSCTL_LONG rather than SYSCTL_INT. I also changed wantfreevnodes to SYSCTL_LONG because I happened to notice it. I wish there was a way to find all of these automatically.. Pointed out by: bde
* Use msleep() instead of mtx_exit()/tsleep() so that we release the lock andjhb2000-12-011-13/+7
| | | | go to sleep as an "atomic" operation.
* Get rid of a bogus mtx_exit (it was attempting to release anmckusick2000-11-301-1/+0
| | | | | | already released mutex). Submitted by: "Chris Knight" <chris@aims.com.au>
* Implement a low-memory deadlock solution.dillon2000-11-181-4/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the *WRONG* page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
* Clear the VFREE flag when the vnode is removed from the free list integge2000-11-021-0/+1
| | | | | | | getnewvnode(). Otherwise routines called from VOP_INACTIVE() might attempt to remove the vnode from a free list the vnode isn't on, causing corruption. PR: 18012
* Take VBLK devices further out of their missery.phk2000-11-021-12/+11
| | | | This should fix the panic I introduced in my previous commit on this topic.
* Catch up to moving headers:jhb2000-10-201-1/+1
| | | | | - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
* o Introduce new VOP_ACCESS() flag VADMIN, allowing file systems to performrwatson2000-10-191-0/+5
| | | | | | | | | | | | | | | | | | | | "administrative" authorization checks. In most cases, the VADMIN test checks to make sure the credential effective uid is the same as the file owner. o Modify vaccess() to set VADMIN as an available right if the uid is appropriate. o Modify references to uid-based access control operations such that they now always invoke VOP_ACCESS() instead of using hard-coded policy checks. o This allows alternative UFS policies to be implemented by replacing only ufs_access() (such as mandatory system policies). o VOP_ACCESS() requires the caller to hold an exclusive vnode lock on the vnode: I believe that new invocations of VOP_ACCESS() are always called with the lock held. o Some direct checks of the uid remain, largely associated with the QUOTA and SUIDDIR code. Reviewed by: eivind Obtained from: TrustedBSD Project
* Blow away the v_specmountpoint define, replacing it with what it waseivind2000-10-091-2/+2
| | | | defined as (rdev->si_mountpoint)
* Do not call lockdestroy() for v_vnlock, which may point to a lock in ajasone2000-10-061-4/+1
| | | | | | deeper vfs stacking layer. Submitted by: bp
* Style fixes based on comments by bdeeivind2000-10-051-20/+31
|
* Convert lockmgr locks from using simple locks to using mutexes.jasone2000-10-041-53/+58
| | | | | | Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
* Move KASSERTs which checks value of v_usecount after vnode locking, sobp2000-10-021-2/+4
| | | | it will not produce wrong alarms.
* Do the right thing if bdevvp is called twice for the same device.mckusick2000-09-271-0/+2
| | | | Obtained from: Poul-Henning Kamp <phk@freebsd.org>
* Add a lock structure to vnode structure. Previously it was either allocatedbp2000-09-251-4/+5
| | | | | | | | | | | | | | | | | | | separately (nfs, cd9660 etc) or keept as a first element of structure referenced by v_data pointer(ffs). Such organization leads to known problems with stacked filesystems. From this point vop_no*lock*() functions maintain only interlock lock. vop_std*lock*() functions maintain built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on vnode. If filesystem wishes to export lockmgr compatible lock, it can put an address of this lock to v_vnlock field. This indicates that the upper filesystem can take advantage of it and use single lock structure for entire (or part) of stack of vnodes. This field shouldn't be examined or modified by VFS code except for initialization purposes. Reviewed in general by: mckusick
* Style fixes:eivind2000-09-221-52/+107
| | | | | | | | | | * Add lots of comments * Convert a couple of assertions to KASSERT() * Minimal whitespace & misapplied {} fixes * Convert #if 0 to #if COMPILING_LINT for code we presently do not support, but want to keep available. Reviewed by: adrian, markm
* Staticize addalias()eivind2000-09-221-1/+2
|
OpenPOWER on IntegriCloud