summaryrefslogtreecommitdiffstats
path: root/sys/kern/vfs_cluster.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Reviewed by: Many at differnt times in differnt parts,julian1999-03-121-3/+3
| | | | | | | | | | | | | | | | | | | | | | including alan, john, me, luoqi, and kirk Submitted by: Matt Dillon <dillon@frebsd.org> This change implements a relatively sophisticated fix to getnewbuf(). There were two problems with getnewbuf(). First, the writerecursion can lead to a system stack overflow when you have NFS and/or VN devices in the system. Second, the free/dirty buffer accounting was completely broken. Not only did the nfs routines blow it trying to manually account for the buffer state, but the accounting that was done did not work well with the purpose of their existance: figuring out when getnewbuf() needs to sleep. The meat of the change is to kern/vfs_bio.c. The remaining diffs are all minor except for NFS, which includes both the fixes for bp interaction AND fixes for a 'biodone(): buffer already done' lockup. Sys/buf.h also contains a chaining structure which is not used by this patchset but is used by other patches that are coming soon. This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF. (sorry for the missing T matt)
* Fix warnings in preparation for adding -Wall -Wcast-qual to thedillon1999-01-271-2/+2
| | | | kernel compile
* This is a rather large commit that encompasses the new swapper,dillon1999-01-211-4/+6
| | | | | | | | | | changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
* KNFize, by bde.eivind1999-01-101-9/+8
|
* Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT aseivind1999-01-081-18/+9
| | | | | | | | | discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith
* Even the most recently allocated buffer may not have its b_blknomckusick1998-12-051-2/+5
| | | | | | field properly filled in, so we must do a VOP_BMAP on that buffer as well if it is not resolved. Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>
* Because buffers may be tossed and recreated at will under the new VMmckusick1998-11-171-4/+9
| | | | | | system, the mapping from logical to physical block number may be lost. Hence we have to check for a reconstituted buffer and redo the call to VOP_BMAP if the physical block number has been lost.
* Fixed a missing include. <sys/kernel.h> is needed by the newbde1998-11-151-2/+2
| | | | | | | MALLOC_DEFINE() and MALLOC_DEFINE() is needed by the recently reenabled "reallocblks" code, but <sys/kernel.h> was only included if CLUSTERDEBUG was defined. This was too harmless. gcc only warns about garbage like `SYSINIT(blech);' at file scope ...
* Restored the "reallocblks" code to its former glory. What this does isdg1998-11-131-15/+4
| | | | | | basically do a on-the-fly defragmentation of the FFS filesystem, changing file block allocations to make them contiguous. Thanks to Kirk McKusick for providing hints on what needed to be done to get this working.
* Nitpicking and dusting performed on a train. Removes trivial warningsphk1998-10-251-2/+2
| | | | about unused variables, labels and other lint.
* Cosmetic changes to the PAGE_XXX macros to make them consistent withdfr1998-09-041-3/+3
| | | | the other objects in vm.
* Change various syscalls to use size_t arguments instead of u_int.dfr1998-08-241-9/+5
| | | | | | | | | | Add some overflow checks to read/write (from bde). Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptable. Reviewed by: Bruce Evans <bde@zeta.org.au>
* Protect all modifications to v_numoutput with splbio().dfr1998-08-131-1/+3
|
* Protect all modifications to paging_in_progress with splvm(). The i386dfr1998-08-061-2/+6
| | | | | | | | | managed to avoid corruption of this variable by luck (the compiler used a memory read-modify-write instruction which wasn't interruptable) but other architectures cannot. With this change, I am now able to 'make buildworld' on the alpha (sfx: the crowd goes wild...)
* Fixed printf format errors.bde1998-07-291-12/+12
|
* Fixed printf format errors.bde1998-07-111-5/+6
|
* Don't depend on gcc's feature of casting lvalues.bde1998-07-071-4/+5
|
* VOP_STRATEGY grows an (struct vnode *) argumentjulian1998-07-041-3/+3
| | | | | | as the value in b_vp is often not really what you want. (and needs to be frobbed). more cleanups will follow this. Reviewed by: Bruce Evans <bde@freebsd.org>
* Make flushing dirty pages work correctly on filesystems thatdyson1998-05-211-12/+8
| | | | | | unexpectedly do not complete writes even with sync I/O requests. This should help the behavior of mmaped files when using softupdates (and perhaps in other circumstances also.)
* Partially fixed write clustering for cases where cluster_wbuild() isbde1998-05-011-1/+4
| | | | | | | called from vfs_bio_awrite() without going through cluster_write() or ufs_bmaparray(), in particular for all writes to block disk devices. Only ufs_bmaparray() sets vp->v_maxio in a correct way, and it doesn't seem to be called early enough even for regular files.
* In kern_physio.c fix tsleep priority messup.dyson1998-03-191-11/+22
| | | | | | | | | | | | | | | | | | | | | | In vfs_bio.c, remove b_generation count usage, remove redundant reassignbuf, remove redundant spl(s), manage page PG_ZERO flags more correctly, utilize in invalid value for b_offset until it is properly initialized. Add asserts for #ifdef DIAGNOSTIC, when b_offset is improperly used. when a process is not performing I/O, and just waiting on a buffer generally, make the sleep priority low. only check page validity in getblk for B_VMIO buffers. In vfs_cluster, add b_offset asserts, correct pointer calculation for clustered reads. Improve readability of certain parts of the code. Remove redundant spl(s). In vfs_subr, correct usage of vfs_bio_awrite (From Andrew Gallatin <gallatin@cs.duke.edu>). More vtruncbuf problems fixed.
* Remove a soft-update hook that was accidentally added to the READ path.julian1998-03-161-26/+53
| | | | also add some comments, and a couple of very minor cosmetic changes.
* Some VM improvements, including elimination of alot of Sig-11dyson1998-03-161-16/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | problems. Tor Egge and others have helped with various VM bugs lately, but don't blame him -- blame me!!! pmap.c: 1) Create an object for kernel page table allocations. This fixes a bogus allocation method previously used for such, by grabbing pages from the kernel object, using bogus pindexes. (This was a code cleanup, and perhaps a minor system stability issue.) pmap.c: 2) Pre-set the modify and accessed bits when prudent. This will decrease bus traffic under certain circumstances. vfs_bio.c, vfs_cluster.c: 3) Rather than calculating the beginning virtual byte offset multiple times, stick the offset into the buffer header, so that the calculated offset can be reused. (Long long multiplies are often expensive, and this is a probably unmeasurable performance improvement, and code cleanup.) vfs_bio.c: 4) Handle write recursion more intelligently (but not perfectly) so that it is less likely to cause a system panic, and is also much more robust. vfs_bio.c: 5) getblk incorrectly wrote out blocks that are incorrectly sized. The problem is fixed, and writes blocks out ONLY when B_DELWRI is true. vfs_bio.c: 6) Check that already constituted buffers have fully valid pages. If not, then make sure that the B_CACHE bit is not set. (This was a major source of Sig-11 type problems.) vfs_bio.c: 7) Fix a potential system deadlock due to an incorrectly specified sleep priority while waiting for a buffer write operation. The change that I made opens the system up to serious problems, and we need to examine the issue of process sleep priorities. vfs_cluster.c, vfs_bio.c: 8) Make clustered reads work more correctly (and more completely) when buffers are already constituted, but not fully valid. (This was another system reliability issue.) vfs_subr.c, ffs_inode.c: 9) Create a vtruncbuf function, which is used by filesystems that can truncate files. The vinvalbuf forced a file sync type operation, while vtruncbuf only invalidates the buffers past the new end of file, and also invalidates the appropriate pages. (This was a system reliabiliy and performance issue.) 10) Modify FFS to use vtruncbuf. vm_object.c: 11) Make the object rundown mechanism for OBJT_VNODE type objects work more correctly. Included in that fix, create pager entries for the OBJT_DEAD pager type, so that paging requests that might slip in during race conditions are properly handled. (This was a system reliability issue.) vm_page.c: 12) Make some of the page validation routines be a little less picky about arguments passed to them. Also, support page invalidation change the object generation count so that we handle generation counts a little more robustly. vm_pageout.c: 13) Further reduce pageout daemon activity when the system doesn't need help from it. There should be no additional performance decrease even when the pageout daemon is running. (This was a significant performance issue.) vnode_pager.c: 14) Teach the vnode pager to handle race conditions during vnode deallocations.
* Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)julian1998-03-081-3/+8
| | | | | Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
* This mega-commit is meant to fix numerous interrelated problems. Theredyson1998-03-071-12/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
* Back out DIAGNOSTIC changes.eivind1998-02-061-2/+1
|
* Turn DIAGNOSTIC into a new-style option.eivind1998-02-041-1/+2
|
* Change the busy page mgmt, so that when pages are freed, theydyson1998-01-311-2/+13
| | | | | | | | | | | | | | MUST be PG_BUSY. It is bogus to free a page that isn't busy, because it is in a state of being "unavailable" when being freed. The additional advantage is that the page_remove code has a better cross-check that the page should be busy and unavailable for other use. There were some minor problems with the collapse code, and this plugs those subtile "holes." Also, the vfs_bio code wasn't checking correctly for PG_BUSY pages. I am going to develop a more consistant scheme for grabbing pages, busy or otherwise. For now, we are stuck with the current morass.
* Make the debug options new-style.eivind1998-01-311-1/+3
| | | | | This also zaps a DPT option from lint; it wasn't referenced from anywhere.
* Add better support for larger I/O clusters, including larger physicaldyson1998-01-241-5/+11
| | | | | I/O. The support is not mature yet, and some of the underlying implementation needs help. However, support does exist for IDE devices now.
* Make our v_usecount vnode reference count work identically to thedyson1998-01-061-6/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
* Remove a bunch of variables which were unused both in GENERIC and LINT.phk1997-11-071-2/+2
| | | | Found by: -Wunused
* Removed unused #includes.bde1997-08-021-5/+1
|
* Fix a problem with the VN device. Specifically, the VN device candyson1997-06-151-1/+2
| | | | | | | | | | cause a problem of spiraling death due to buffer resource limitations. The vfs_bio code in general had little ability to handle buffer resource management, and now it does. Also, there are a lot more knobs for tuning the vfs_bio code now. The knobs came free because of the need that there always be some immediately available buffers (non-delayed or locked) for use. Note that the buffer cache code is much less likely to get bogged down with lots of delayed writes, even more so than before.
* Don't zero b_dirtyoff and b_dirtyend on error.dfr1997-04-251-3/+3
| | | | Submitted by: Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp>
* Don't allow partial buffers to be cluster-comitted.dfr1997-04-181-1/+2
| | | | | | | Zero the b_dirty{off,end} after cluster-comitting a group of buffers. With these fixes, I was able to complete a 'make world' with remote src and obj directories.
* Use OID_AUTO instead of magic number for the old sysctl debug.rcluster.bde1997-04-011-16/+3
| | | | | | | | The magic number conflicted with the rotting disabled one in ext2fs for debug.doasyncfree. Removed messy debugging variable/constant/sysctl debug.doreallocblks. Lite2 removed it, and we don't use the code that it controls.
* Remove unnecessary check for vp->v_mount being null. Pointeddyson1997-03-071-2/+2
| | | | out by BDE.
* Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are notpeter1997-02-221-1/+1
| | | | ready for it yet.
* Make the long-awaited change from $Id$ to $FreeBSD$jkh1997-01-141-1/+1
| | | | | | | | This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
* This commit is the embodiment of some VFS read clustering improvements.dyson1996-12-291-127/+181
| | | | | | | | | | | | Firstly, now our read-ahead clustering is on a file descriptor basis and not on a per-vnode basis. This will allow multiple processes reading the same file to take advantage of read-ahead clustering. Secondly, there previously was a problem with large reads still using the ramp-up algorithm. Of course, that was bogus, and now we read the entire "chunk" off of the disk in one operation. The read-ahead clustering algorithm should use less CPU than the previous also (I hope :-)). NOTE: THAT LKMS MUST BE REBUILT!!!
* Implement a new totally dynamic (up to MAXPHYS) buffer kva allocationdyson1996-11-301-1/+9
| | | | | | | | | | | | | | | scheme. Additionally, add the capability for checking for unexpected kernel page faults. The maximum amount of kva space for buffers hasn't been decreased from where it is, but it will now be possible to do so. This scheme manages the kva space similar to the buffers themselves. If there isn't enough kva space because of usage or fragementation, buffers will be reclaimed until a buffer allocation is successful. This scheme should be very resistant to fragmentation problems until/if the LFS code is fixed and uses the bogus buffer locking scheme -- but a 'fixed' LFS is not likely to use such a scheme. Now there should be NO problem allocating buffers up to MAXPHYS.
* Fix 4 problems:dyson1996-10-061-4/+10
| | | | | | | | | | | Major: When blocking occurs in allocbuf() for VMIO files, excess wire counts could accumulate. Major: Pages are incorrectly accumulated into the physical buffer for clustered reads. This happens when bogus page is needed. Minor: When reclaiming buffers, the async flag on the buffer needs to be zero, or the reclaim is not optimal. Minor: The age flag should be cleared, if a buffer is wanted.
* Modification to vfs_cluster to allow clustering of NFS delayed writes.dyson1996-07-271-3/+14
| | | | Submitted by: Doug Rabson <dfr@render.com>
* Fix an error when B_MALLOC buffers are returned from the cluster readdyson1996-06-031-3/+4
| | | | | | code without the B_READ flag being set. This is a problem when the data is not cached, and the result will be a bogus attempted write. Submitted by: Kato Takenori <kato@eclogite.eps.nagoya-u.ac.jp>
* 1) Fix a bug that a buffer is removed from a queue, but thedyson1996-03-021-3/+3
| | | | | | | | | | | | | | | queue type is not set to QUEUE_NONE. This appears to have caused a hang bug that has been lurking. 2) Fix bugs that brelse'ing locked buffers do not "free" them, but the code assumes so. This can cause hangs when LFS is used. 3) Use malloced memory for directories when applicable. The amount of malloced memory is seriously limited, but should decrease the amount of memory used by an average directory to 1/4 - 1/2 previous. This capability is fully tunable. (Note that there is no config parameter, and might never be.) 4) Bias slightly the buffer cache usage towards non-VMIO buffers. Since the data in VMIO buffers is not lost when the buffer is reclaimed, this will help performance. This is adjustable also.
* An earlier modification had decreased CPU usage, but alsodyson1996-01-281-4/+4
| | | | decreased performance. This essentially undoes that change.
* Previous commit to vfs_cluster accidentally disabled read-ahead. Problemdyson1996-01-201-1/+2
| | | | | fixed by initializing "alreadyincore" to 0 in the case of sequential reads.
* Eliminated many redundant vm_map_lookup operations for vm_mmap.dyson1996-01-191-31/+26
| | | | | | | | | | | | | | | | | | | | | | | | | Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
* Fixed bugs and finished staticization for things inside `#ifdef DEBUG'.bde1995-12-221-20/+22
| | | | | Moved most of these things inside `#ifdef notyet_block_reallocation_enabled' where they may never be used again.
OpenPOWER on IntegriCloud