FreeBSD-src - Raptor Engineering's fork of pfsense FreeBSD src with pfSense changes

	Commit message (Collapse)	Author	Age	Files	Lines
*	Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.	alfred	2003-01-21	1	-1/+1
\| \| \| \|	Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
*	- Use %j to print intmax_t values.	jhb	2002-11-07	1	-3/+4
\| \| \| \|	- Cast more daddr_t values to intmax_t when printing to quiet warnings.
*	- Use incore() where no other interlock locking is necessary.	jeff	2002-09-25	1	-2/+6
\| \| \| \|	- Lock access to numoutput.
*	Replace various spelling with FALLTHROUGH which is lint()able	charnier	2002-08-25	1	-3/+3
\|
*	o Lock page accesses by vm_page_io_start() with the page queues lock.	alc	2002-07-31	1	-1/+4
\| \| \| \|	o Assert that the page queues lock is held in vm_page_io_start().
*	Replace the global buffer hash table with per-vnode splay trees using a	dillon	2002-07-10	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing has appeared to have shown no increase in overhead. Disadvantages Dirties more cache lines during lookups. Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference). Advantages vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently. I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted. The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier. This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remembers its place, and skip to the next vnode). Note that the buffer cache splay is somewhat more complex then other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list. Suggested by: alc
*	This commit adds basic support for the UFS2 filesystem. The UFS2	mckusick	2002-06-21	1	-8/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
*	Make daddr_t and u_daddr_t 64bits wide.	phk	2002-05-14	1	-3/+3
\| \| \| \| \| \|	Retire daddr64_t and use daddr_t instead. Sponsored by: DARPA & NAI Labs.
*	Remove __P.	alfred	2002-03-19	1	-3/+3
\|
*	Introduce the new 64-bit size disk block, daddr64_t. Change	mckusick	2002-03-15	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \|	the bio and buffer structures to have daddr64_t bio_pblkno, b_blkno, and b_lblkno fields which allows access to disks larger than a Terabyte in size. This change also requires that the VOP_BMAP vnode operation accept and return daddr64_t blocks. This delta should not affect system operation in any way. It merely sets up the necessary interfaces to allow the development of disk drivers that work with these larger disk block addresses. It also allows for the development of UFS2 which will use 64-bit block addresses.
*	Document all functions, global and static variables, and sysctls.	eivind	2002-03-05	1	-3/+11
\| \| \| \| \| \| \| \|	Includes some minor whitespace changes, and re-ordering to be able to document properly (e.g, grouping of variables and the SYSCTL macro calls for them, where the documentation has been added.) Reviewed by: phk (but all errors are mine)
*	Implement IO_NOWDRAIN and B_NOWDRAIN - prevents the buffer cache from blocking	dillon	2001-11-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	in wdrain during a write. This flag needs to be used in devices whos strategy routines turn-around and issue another high level I/O, such as when MD turns around and issues a VOP_WRITE to vnode backing store, in order to avoid deadlocking the dirty buffer draining code. Remove a vprintf() warning from MD when the backing vnode is found to be in-use. The syncer of buf_daemon could be flushing the backing vnode at the time of an MD operation so the warning is not correct. MFC after: 1 week
*	In cluster_rbuild(), 'size' had better match buf->b_bcount and buf->b_bufsize	dillon	2001-10-25	1	-2/+14
\| \| \| \| \| \| \|	or the cluster will not be properly merged. Dup the code from cluster_wbuild() and add some printf()s to see if bad cases are present. MFC after: 2 weeks
*	Syntax cleanup and documentation, no operational changes.	dillon	2001-10-21	1	-10/+55
\| \| \| \|	MFC after: 1 day
*	Change the kernel's ucred API as follows:	jhb	2001-10-11	1	-4/+2
\| \| \| \| \| \| \| \|	- crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.
*	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach	dillon	2001-07-04	1	-10/+6
\| \| \| \| \| \| \| \| \|	(this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
*	This patch implements O_DIRECT about 80% of the way. It takes a patchset	dillon	2001-05-24	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet. Submitted by: tegge, dillon
*	Introduce a global lock for the vm subsystem (vm_mtx).	alfred	2001-05-19	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
*	Revert consequences of changes to mount.h, part 2.	grog	2001-04-29	1	-2/+0
\| \| \| \|	Requested by: bde
*	Correct #includes to work with fixed sys/mount.h.	grog	2001-04-23	1	-0/+2
\|
*	This patch removes the VOP_BWRITE() vector.	phk	2001-04-17	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \|	VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.
*	Fix lockup for loopback NFS mounts. The pipelined I/O limitations could be	dillon	2001-02-28	1	-6/+0
\| \| \| \| \| \| \| \| \|	hit on the client side and prevent the server side from retiring writes. Pipeline operations turned off for all READs (no big loss since reads are usually synchronous) and for NFS writes, and left on for the default bwrite(). (MFC expected prior to 4.3 freeze) Testing by: mjacob, dillon
*	Fix typo: teh -> the.	asmodai	2001-02-06	1	-1/+1
\|
*	Do not cluster with B_LOCKED buffers.	dillon	2001-01-19	1	-4/+20
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is an odd one. This patch appears to fix a panic related to background bitmap writes (for FFS), though neither Kirk, Ian, or I can figure out how B_CLUSTEROK could possibly be set on a bitmap block to cause the clustering code to improperly cluster with a buffer undergoing a background write. In anycase, the clustering code is very fragile and this patch helps with that, as well as possibly fixing a bug Andre was having. Suggested by: Ian Dowse <iedowse@maths.tcd.ie> Testing by: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
*	This implements a better launder limiting solution. There was a solution	dillon	2000-12-26	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then suceessive will be run without any limit. Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, amoung other things. NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.
*	Implement a low-memory deadlock solution.	dillon	2000-11-18	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the WRONG page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
*	Don't attempt to cluster write buffers where the VMIO flag isn't set.	tegge	2000-11-17	1	-1/+2
\|
*	Virtualizes & untangles the bioops operations vector.	phk	2000-06-16	1	-3/+2
\| \| \| \|	Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@
*	Separate the struct bio related stuff out of <sys/buf.h> into	phk	2000-05-05	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	<sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter
*	s/biowait/bufwait/g	phk	2000-04-29	1	-1/+1
\| \| \| \|	Prodded by: several.
*	Complete the bio/buf divorce for all code below devfs::strategy	phk	2000-04-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case. CCD not converted yet, casts to struct buf (still safe) atapi-cd casts to struct buf to examine B_PHYS
*	Move B_ERROR flag to b_ioflags and call it BIO_ERROR.	phk	2000-04-02	1	-6/+10
\| \| \| \| \| \| \| \| \| \| \| \| \|	(Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.
*	Change the write-behind code to take more care when starting	dillon	2000-04-02	1	-8/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	async I/O's. The sequential read heuristic has been extended to cover writes as well. We continue to call cluster_write() normally, thus blocks in the file will still be reallocated for large (but still random) I/O's, but I/O will only be initiated for truely sequential writes. This solves a number of annoying situations, especially with DBM (hash method) writes, and also has the side effect of fixing a number of (stupid) benchmarks. Reviewed-by: mckusick
*	Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new	phk	2000-03-20	1	-12/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch were machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.
*	useracc() the prequel:	phk	1999-10-29	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \|	Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ\|WRITE} rather than B_{READ\|WRITE} as argument.
*	Remove v_maxio from struct vnode.	phk	1999-09-29	1	-4/+4
\| \| \| \| \| \|	Replace it with mnt_iosize_max in struct mount. Nits from: bde
*	Initialize vp->v_maxio to its default in getnetvnode() rather than	phk	1999-09-20	1	-8/+0
\| \| \| \|	four different places in vfs_cluster.c
*	If integration of a buffer into a cluster write operation fails, release	tegge	1999-08-31	1	-1/+3
\| \| \| \| \| \|	the buffer instead of creating a future deadlock. PR: 12800 Submitted by: dillon
*	$Id$ -> $FreeBSD$	peter	1999-08-28	1	-1/+1
\|
*	These changes appear to give us benefits with both small (32MB) and	mckusick	1999-07-08	1	-13/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bring the machine to a grinding halt. * buffer cache hash table now dynamically allocated. This will have no effect on memory consumption for smaller systems and will help scale the buffer cache for larger systems. * minor enhancement to pmap_clearbit(). I noticed that all the calls to it used constant arguments. Making it an inline allows the constants to propogate to deeper inlines and should produce better code. * removal of inherent vfs_ioopt support through the emplacement of appropriate #ifdef's, with John's permission. If we do not find a use for it by the end of the year we will remove it entirely. * removal of getnewbufloops* counters & sysctl's - no longer necessary for debugging, getnewbuf() is now optimal. * buffer hash table functions removed from sys/buf.h and localized to vfs_bio.c * VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ), allowing processes to block when too many dirty buffers are present in the system. * removal of a softdep test in bdwrite() that is no longer necessary now that bdwrite() no longer attempts to flush dirty buffers. * slight optimization added to bqrelse() - there is no reason to test for available buffer space on B_DELWRI buffers. * addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite() will attempt to locate clusterable areas in both the forward and reverse direction relative to the offset of the buffer passed to it. This will probably not make much of a difference now, but I believe we will start to rely on it heavily in the future if we decide to shift some of the burden of the clustering closer to the actual I/O initiation. * Removal of the newbufcnt and lastnewbuf counters that Kirk added. They do not fix any race conditions that haven't already been fixed by the gbincore() test done after the only call to getnewbuf(). getnewbuf() is a static, so there is no chance of it being misused by other modules. ( Unless Kirk can think of a specific thing that this code fixes. I went through it very carefully and didn't see anything ). * removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think this check is necessary, the buffer should flush properly whether the vnode is locked or not. ( yes? ). * removal of extra arguments passed to getnewbuf() that are not necessary. * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in vfs_cluster.c * vn_write() now calls bwillwrite() PRIOR to locking the vnode, which should greatly aid flushing operations in heavy load situations - both the pageout and update daemons will be able to operate more efficiently. * removal of b_usecount. We may add it back in later but for now it is useless. Prior implementations of the buffer cache never had enough buffers for it to be useful, and current implementations which make more buffers available might not benefit relative to the amount of sophistication required to implement a b_usecount. Straight LRU should work just as well, especially when most things are VMIO backed. I expect that (even though John will not like this assumption) directories will become VMIO backed some point soon. Submitted by: Matthew Dillon <dillon@backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
*	The vfs.write_behind sysctl and related code support has been added to	mckusick	1999-07-04	1	-3/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	allow changes to the filesystem's write_behind behavior. By the default the filesystem aggressively issues write_behind's. Three values may be specified for vfs.write_behind. 0 disables write_behind, 1 results in historical operation (agressive write_behind), and 2 is an experimental backed-off write_behind. The values of 0 and 1 are recommended. The value of 0 is recommended in conjuction with an increase in the number of NBUF's and the number of dirty buffers allowed (vfs.{lo,hi}dirtybuffers). Note that a value of 0 will radically increase the dirty buffer load on the system. Future work on write_behind behavior will use values 2 and greater for testing purposes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
*	Hopefully fix the remaining glitches with the BUF_*() changes. This should	peter	1999-06-29	1	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \|	(really this time) fix pageout to swap and a couple of clustering cases. This simplifies BUF_KERNPROC() so that it unconditionally reassigns the lock owner rather than testing B_ASYNC and having the caller decide when to do the reassign. At present this is required because some places use B_CALL/b_iodone to free the buffers without B_ASYNC being set. Also, vfs_cluster.c explicitly calls BUF_KERNPROC() when attaching the buffers rather than the parent walking the cluster_head tailq. Reviewed by: Kirk McKusick <mckusick@mckusick.com>
*	Convert buffer locking from using the B_BUSY and B_WANTED flags to using	mckusick	1999-06-26	1	-18/+17
\| \| \| \| \| \| \|	lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
*	Reformat comment to match indentation of code around it.	julian	1999-06-17	1	-8/+9
\|
*	Changed trypbuf to a getpbuf to work around a problem where redundant writes	dg	1999-06-16	1	-2/+2
\| \| \| \| \| \| \| \| \|	would occur when clustering them - caused by running out of buffers and taking a degenerate code path as a result. It appears that waiting instead for buffers to become available is okay. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Discovered by: Craig A Soules <soules+@andrew.cmu.edu>
*	The VFS/BIO subsystem contained a number of hacks in order to optimize	alc	1999-05-02	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
*	Reviewed by: Many at differnt times in differnt parts,	julian	1999-03-12	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	including alan, john, me, luoqi, and kirk Submitted by: Matt Dillon <dillon@frebsd.org> This change implements a relatively sophisticated fix to getnewbuf(). There were two problems with getnewbuf(). First, the writerecursion can lead to a system stack overflow when you have NFS and/or VN devices in the system. Second, the free/dirty buffer accounting was completely broken. Not only did the nfs routines blow it trying to manually account for the buffer state, but the accounting that was done did not work well with the purpose of their existance: figuring out when getnewbuf() needs to sleep. The meat of the change is to kern/vfs_bio.c. The remaining diffs are all minor except for NFS, which includes both the fixes for bp interaction AND fixes for a 'biodone(): buffer already done' lockup. Sys/buf.h also contains a chaining structure which is not used by this patchset but is used by other patches that are coming soon. This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF. (sorry for the missing T matt)
*	Fix warnings in preparation for adding -Wall -Wcast-qual to the	dillon	1999-01-27	1	-2/+2
\| \| \| \|	kernel compile
*	This is a rather large commit that encompasses the new swapper,	dillon	1999-01-21	1	-4/+6
\| \| \| \| \| \| \| \| \| \|	changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
*	KNFize, by bde.	eivind	1999-01-10	1	-9/+8
\|