summaryrefslogtreecommitdiffstats
path: root/sys/kern/vfs_bio.c
Commit message (Collapse)AuthorAgeFilesLines
* Change and clean the mutex lock interface.bmilekic2001-02-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) similarily, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)
* This commit represents work mainly submitted by Tor and slightly modifieddillon2001-02-041-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | by myself. It solves a serious vm_map corruption problem that can occur with the buffer cache when block sizes > 64K are used. This code has been heavily tested in -stable but only tested somewhat on -current. An MFC will occur in a few days. My additions include the vm_map_simplify_entry() and minor buffer cache boundry case fix. Make the buffer cache use a system map for buffer cache KVM rather then a normal map. Ensure that VM objects are not allocated for system maps. There were cases where a buffer map could wind up with a backing VM object -- normally harmless, but this could also result in the buffer cache blocking in places where it assumes no blocking will occur, possibly resulting in corrupted maps. Fix a minor boundry case in the buffer cache size limit is reached that could result in non-optimal code. Add vm_map_simplify_entry() calls to prevent 'creeping proliferation' of vm_map_entry's in the buffer cache's vm_map. Previously only a simple linear optimization was made. (The buffer vm_map typically has only a handful of vm_map_entry's. This stabilizes it at that level permanently). PR: 20609 Submitted by: (Tor Egge) tegge
* Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variablesjake2001-01-101-5/+5
| | | | other then curproc.
* This implements a better launder limiting solution. There was a solutiondillon2000-12-261-135/+128
| | | | | | | | | | | | | | | | | | | in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then suceessive will be run without any limit. Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, amoung other things. NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.
* Stick the kthread API in a kthread_* namespace, and the specialized kprocjhb2000-12-151-2/+2
| | | | | | functions in a kproc_* namespace. Reviewed by: -arch
* Implement a low-memory deadlock solution.dillon2000-11-181-80/+128
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the *WRONG* page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
* Take VBLK devices further out of their missery.phk2000-11-021-1/+1
| | | | This should fix the panic I introduced in my previous commit on this topic.
* Catch up to moving headers:jhb2000-10-201-2/+1
| | | | | - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h
* Convert lockmgr locks from using simple locks to using mutexes.jasone2000-10-041-2/+2
| | | | | | Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.
* Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT.bp2000-09-121-11/+13
| | | | | | | They will be used by nullfs and other stacked filesystems to support full cache coherency. Reviewed in general by: mckusick, dillon
* Major update to the way synchronization is done in the kernel. Highlightsjasone2000-09-071-7/+13
| | | | | | | | | | | | | | | include: * Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) * Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
* Now that buffer locks can be recursive, we need to delete the panicsmckusick2000-07-251-4/+0
| | | | | | that complain about them. Obtained from: Brian Fundakowski Feldman <green@FreeBSD.org>
* Add snapshots to the fast filesystem. Most of the changes supportmckusick2000-07-111-1/+5
| | | | | | | | | | | | | | | | | | | | the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
* Virtualizes & untangles the bioops operations vector.phk2000-06-161-14/+13
| | | | Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@
* Back out the previous change to the queue(3) interface.jake2000-05-261-1/+1
| | | | | | It was not discussed and should probably not happen. Requested by: msmith and others
* Change the way that the queue(3) structures are declared; don't assume thatjake2000-05-231-1/+1
| | | | | | | | the type argument to *_HEAD and *_ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd
* Separate the struct bio related stuff out of <sys/buf.h> intophk2000-05-051-0/+1
| | | | | | | | | | | | | | | <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter
* Give struct bio it's own call back mechanism.phk2000-05-011-7/+11
|
* Hmm, diff/patch still doesn't like me.phk2000-04-301-2/+2
| | | | Missed one s/biowait/bufwait/g
* s/biowait/bufwait/gphk2000-04-291-3/+3
| | | | Prodded by: several.
* Remove a leftover dysonism.phk2000-04-291-4/+0
|
* Remove ~25 unneeded #include <sys/conf.h>phk2000-04-191-1/+0
| | | | Remove ~60 unneeded #include <sys/malloc.h>
* Don't declare common variables in include files:phk2000-04-181-0/+1
| | | | move buftimelock til vfs_bio.c where it is initialized.
* Complete the bio/buf divorce for all code below devfs::strategyphk2000-04-151-2/+8
| | | | | | | | | | Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case. CCD not converted yet, casts to struct buf (still safe) atapi-cd casts to struct buf to examine B_PHYS
* Move B_ERROR flag to b_ioflags and call it BIO_ERROR.phk2000-04-021-26/+38
| | | | | | | | | | | | | (Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.
* Draw the outline of "struct bio".phk2000-04-021-3/+3
| | | | Struct bio is the future carrier of I/O requests for "struct buf".
* Commit the buffer cache cleanup patch to 4.x and 5.x. This patch fixes adillon2000-03-271-198/+138
| | | | | | | | | | | | | | | | fragmentation problem due to geteblk() reserving too much space for the buffer and imposes a larger granularity (16K) on KVA reservations for the buffer cache to avoid fragmentation issues. The buffer cache size calculations have been redone to simplify them (fewer defines, better comments, less chance of running out of KVA). The geteblk() fix solves a performance problem that DG was able reproduce. This patch does not completely fix the KVA fragmentation problems, but it goes a long way Mostly Reviewed by: bde and others Approved by: jkh
* Rename the existing BUF_STRATEGY() to DEV_STRATEGY()phk2000-03-201-7/+7
| | | | | | | | substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.
* Remove B_READ, B_WRITE and B_FREEBUF and replace them with a newphk2000-03-201-23/+33
| | | | | | | | | | | | | | | | | | | | | field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch were machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.
* Eliminate the undocumented, experimental, non-delivering and highlyphk2000-03-161-26/+0
| | | | dangerous MAX_PERF option.
* Need to reset the buffer pointer to avoid reconsidering the same buffermckusick2000-01-181-0/+1
| | | | | | | again (without this the rollback analysis was being lost). Should reduce the write count for most workloads. Submitted by: Craig A Soules <soules+@andrew.cmu.edu>
* Give vn_isdisk() a second argument where it can return a suitable errno.phk2000-01-101-3/+3
| | | | Suggested by: bde
* Several performance improvements for soft updates have been added:mckusick2000-01-101-5/+144
| | | | | | | | | | | | | | | 1) Fastpath deletions. When a file is being deleted, check to see if it was so recently created that its inode has not yet been written to disk. If so, the delete can proceed to immediately free the inode. 2) Background writes: No file or block allocations can be done while the bitmap is being written to disk. To avoid these stalls, the bitmap is copied to another buffer which is written thus leaving the original available for futher allocations. 3) Link count tracking. Constantly track the difference in i_effnlink and i_nlink so that inodes that have had no change other than i_effnlink need not be written. 4) Identify buffers with rollback dependencies so that the buffer flushing daemon can choose to skip over them.
* Introduce a mechanism to suspend/resume system processes. Suspend syncerluoqi2000-01-071-10/+21
| | | | and bufdaemon prior to disk sync during system shutdown.
* Reimplement buf_daemon / getnewbuf() interaction for dealing withdillon1999-12-201-46/+43
| | | | | | | | stressful situations. buf_daemon now makes a distinction between being woken up and its sleep timing out, and as a consequence is now much better able to dynamically tune itself to its environment. Reviewed by: Alfred Perlstein <bright@wintelcom.net>
* Collect read and write counts for filesystems. This new codemckusick1999-12-011-20/+0
| | | | | | | | | | | | | | drops the counting in bwrite and puts it all in spec_strategy. I did some tests and verified that the counts collected for writes in spec_strategy is identical to the counts that we previously collected in bwrite. We now also get read counts (async reads come from requests for read-ahead blocks). Note that you need to compile a new version of mount to get the read counts printed out. The old mount binary is completely compatible, the only reason to install a new mount is to get the read counts printed. Submitted by: Craig A Soules <soules+@andrew.cmu.edu> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
* Convert various pieces of code to use vn_isdisk() rather than checkingphk1999-11-221-4/+4
| | | | | | | | for vp->v_type == VBLK. In ccd: we don't need to call VOP_GETATTR to find the type of a vnode. Reviewed by: sos
* This is a partial commit of the patch from PR 14914:phk1999-11-161-3/+1
| | | | | | | | | | | | | Alot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. This batch of changes compile to the same object files. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914
* Remove a #define which doesn't do miracles anymore.phk1999-10-301-1/+0
|
* useracc() the prequel:phk1999-10-291-1/+0
| | | | | | | | | | | Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
* Adjust the buffer cache to better handle small-memory machines. Adillon1999-10-241-56/+57
| | | | | | | | | | | | | slightly older version of this code was tested by BDE and I. Also fixes a lockup situation when kva gets too fragmented. Remove the maxvmiobufspace variable and sysctl, they are no longer used. Also cleanup (remove) #if 0 sections from prior commits. This code is more of a hack, but presumably the whole buffer cache implementation is going to be rewritten in the next year so it's no big deal.
* Trim unused options (or #ifdef for undoc options).peter1999-10-111-1/+0
| | | | Submitted by: phk
* Count bogus_page as wired.dt1999-09-301-0/+1
|
* Fix bug in brelse() regarding redirtying buffers on B_ERROR. brelse()dillon1999-09-201-3/+5
| | | | | | | | | | | | improperly ignored the B_INVAL flag when acting on the B_ERROR. If both B_INVAL and B_ERROR are set the buffer is typically out of the underlying device's block range and must be destroyed. If only B_ERROR is set (for a write), a write error occured and operation remains as it was before: the buffer must be redirtied to avoid corrupting the filesystem state. Reviewed by: David Greenman <dg@root.com> Submitted by: Tor.Egge@fast.no
* Allow getblk() to be called from an idle context (by panic() insideluoqi1999-09-031-4/+10
| | | | | | an interrupt handler). Reviewed by: dillon
* $Id$ -> $FreeBSD$peter1999-08-281-1/+1
|
* Cast pointers to uintptr_t instead of casting them to u_long, and/or vicebde1999-08-241-4/+5
| | | | versa. Cosmetic.
* Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,phk1999-08-081-2/+2
| | | | | | a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.
* Add sysctl and support code to allow directories to be VMIO'd. The defaultalc1999-07-261-1/+4
| | | | | | setting for the sysctl is OFF, which is the historical operation. Submitted by: dillon
* bufhashinit() is called with a caddr_t and is expected to return thepeter1999-07-091-3/+3
| | | | same in both the alpha and i386 ports.
OpenPOWER on IntegriCloud