summaryrefslogtreecommitdiffstats
path: root/sys/vm/vm_fault.c
Commit message (Collapse)AuthorAgeFilesLines
...
* Convert a "page not busy" warning to an assertion.alc1999-07-201-7/+3
| | | | Submitted by: dillon@backplane.com
* The VFS/BIO subsystem contained a number of hacks in order to optimizealc1999-05-021-5/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
* Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>alc1999-02-251-2/+3
| | | | | | Corrected the computation of cnt.v_ozfod in vm_fault: vm_fault was counting the number of unoptimized rather than optimized zero-fill faults.
* Submitted by: Luoqi Chen <luoqi@watermarkgroup.com>dillon1999-02-171-1/+12
| | | | | | | Unlock vnode before messing with map to avoid deadlock between map and vnode ( e.g. with exec_map and underlying program binary vnode ). Solves a deadlock that most often occurs during a large -j# buildworld reported by three people.
* Remove MAP_ENTRY_IS_A_MAP 'share' maps. These maps were once used todillon1999-02-071-4/+3
| | | | | | attempt to optimize forks but were essentially given-up on due to problems and replaced with an explicit dup of the vm_map_entry structure. Prior to the removal, they were entirely unused.
* Change all manual settings of vm_page_t->dirty = VM_PAGE_BITS_ALLdillon1999-01-241-2/+2
| | | | | | to use the vm_page_dirty() inline. The inline can thus do sanity checks ( or not ) over all cases.
* Get rid of unused old_m in vm_fault. Add INVARIANTS to test whetherdillon1999-01-241-4/+13
| | | | | page is still busy after all the hell vm_fault goes through.. it is supposed to be, and printf() if it isn't. don't panic, though.
* Reenable John Dyson's low-memory VM_WAIT code for page reactivations outdillon1999-01-231-7/+13
| | | | | of PQ_CACHE. Add comments explaining what it accomplishes and its limitations.
* Mainly cleanup. Removed some inappropriate low-memory handling codedillon1999-01-211-1/+1
| | | | | and added lots of comments. Add tie-in to vm_pager ( and thus the new swapper ) to deallocate backing swap for dirtied pages on the fly.
* This is a rather large commit that encompasses the new swapper,dillon1999-01-211-37/+84
| | | | | | | | | | changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
* KNFize, by bde.eivind1999-01-101-2/+1
|
* Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT aseivind1999-01-081-5/+3
| | | | | | | | | discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith
* Add missing splvm protection around unqueue call. Without this, the pagedg1998-11-251-4/+4
| | | | queues would eventually get corrupted.
* Added a second argument, "activate" to the vm_page_unwire() call so thatdg1998-10-281-3/+3
| | | | the caller can select either inactive or active queue to put the page on.
* Nitpicking and dusting performed on a train. Removes trivial warningsphk1998-10-251-2/+1
| | | | about unused variables, labels and other lint.
* Cosmetic changes to the PAGE_XXX macros to make them consistent withdfr1998-09-041-12/+12
| | | | the other objects in vm.
* Change various syscalls to use size_t arguments instead of u_int.dfr1998-08-241-10/+11
| | | | | | | | | | Add some overflow checks to read/write (from bde). Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptable. Reviewed by: Bruce Evans <bde@zeta.org.au>
* Protect all modifications to paging_in_progress with splvm(). The i386dfr1998-08-061-3/+3
| | | | | | | | | managed to avoid corruption of this variable by luck (the compiler used a memory read-modify-write instruction which wasn't interruptable) but other architectures cannot. With this change, I am now able to 'make buildworld' on the alpha (sfx: the crowd goes wild...)
* Improved pager input failure message.dg1998-07-221-3/+3
|
* Fixed printf format errors.bde1998-07-111-2/+2
|
* Work around some VM bugs, the worst being an overly aggressivedyson1998-05-041-2/+3
| | | | | swap space free calculation. More complete fixes will be forthcoming, in a week.
* Make vm_fault much cleaner by removing the evil macro inlines, anddyson1998-03-071-220/+208
| | | | | | | | | | put alot of it's context into a data structure. This allows significant shortening of its codepath, and will significantly decrease it's cache footprint. Also, add some stats to vmmeter. Note that you'll have to rebuild/recompile vmstat, systat, etc... Otherwise, you'll get "very interesting" paging stats.
* 1) Use a more consistent page wait methodology.dyson1998-03-011-6/+7
| | | | | | | | | | | | | 2) Do not unnecessarily force page blocking when paging pages out. 3) Further improve swap pager performance and correctness, including fixing the paging in progress deadlock (except in severe I/O error conditions.) 4) Enable vfs_ioopt=1 as a default. 5) Fix and enable the page prezeroing in SMP mode. All in all, SMP systems especially should show a significant improvement in "snappyness."
* Staticize.eivind1998-02-091-3/+4
|
* Back out DIAGNOSTIC changes.eivind1998-02-061-3/+1
|
* Turn DIAGNOSTIC into a new-style option.eivind1998-02-041-1/+3
|
* Change the busy page mgmt, so that when pages are freed, theydyson1998-01-311-14/+5
| | | | | | | | | | | | | | MUST be PG_BUSY. It is bogus to free a page that isn't busy, because it is in a state of being "unavailable" when being freed. The additional advantage is that the page_remove code has a better cross-check that the page should be busy and unavailable for other use. There were some minor problems with the collapse code, and this plugs those subtile "holes." Also, the vfs_bio code wasn't checking correctly for PG_BUSY pages. I am going to develop a more consistant scheme for grabbing pages, busy or otherwise. For now, we are stuck with the current morass.
* VM level code cleanups.dyson1998-01-221-170/+89
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1) Start using TSM. Struct procs continue to point to upages structure, after being freed. Struct vmspace continues to point to pte object and kva space for kstack. u_map is now superfluous. 2) vm_map's don't need to be reference counted. They always exist either in the kernel or in a vmspace. The vmspaces are managed by reference counts. 3) Remove the "wired" vm_map nonsense. 4) No need to keep a cache of kernel stack kva's. 5) Get rid of strange looking ++var, and change to var++. 6) Change more data structures to use our "zone" allocator. Added struct proc, struct vmspace and struct vnode. This saves a significant amount of kva space and physical memory. Additionally, this enables TSM for the zone managed memory. 7) Keep ioopt disabled for now. 8) Remove the now bogus "single use" map concept. 9) Use generation counts or id's for data structures residing in TSM, where it allows us to avoid unneeded restart overhead during traversals, where blocking might occur. 10) Account better for memory deficits, so the pageout daemon will be able to make enough memory available (experimental.) 11) Fix some vnode locking problems. (From Tor, I think.) 12) Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp. (experimental.) 13) Significantly shrink, cleanup, and make slightly faster the vm_fault.c code. Use generation counts, get rid of unneded collpase operations, and clean up the cluster code. 14) Make vm_zone more suitable for TSM. This commit is partially as a result of discussions and contributions from other people, including DG, Tor Egge, PHK, and probably others that I have forgotten to attribute (so let me know, if I forgot.) This is not the infamous, final cleanup of the vnode stuff, but a necessary step. Vnode mgmt should be correct, but things might still change, and there is still some missing stuff (like ioopt, and physical backing of non-merged cache files, debugging of layering concepts.)
* Tie up some loose ends in vnode/object management. Remove an unneededdyson1998-01-171-40/+52
| | | | | | | config option in pmap. Fix a problem with faulting in pages. Clean-up some loose ends in swap pager memory management. The system should be much more stable, but all subtile bugs aren't fixed yet.
* Fix some vnode management problems, and better mgmt of vnode free list.dyson1998-01-121-1/+3
| | | | | | | | | | | | | | | Fix the UIO optimization code. Fix an assumption in vm_map_insert regarding allocation of swap pagers. Fix an spl problem in the collapse handling in vm_object_deallocate. When pages are freed from vnode objects, and the criteria for putting the associated vnode onto the free list is reached, either put the vnode onto the list, or put it onto an interrupt safe version of the list, for further transfer onto the actual free list. Some minor syntax changes changing pre-decs, pre-incs to post versions. Remove a bogus timeout (that I added for debugging) from vn_lock. PHK will likely still have problems with the vnode list management, and so do I, but it is better than it was.
* Make our v_usecount vnode reference count work identically to thedyson1998-01-061-11/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
* Some performance improvements, and code cleanups (including changing ourdyson1997-12-191-2/+7
| | | | expensive OFF_TO_IDX to btoc whenever possible.)
* Removed unused #includes.bde1997-09-011-5/+1
|
* Fixed type mismatches for functions with args of type vm_prot_t and/orbde1997-08-251-6/+2
| | | | | | | | | vm_inherit_t. These types are smaller than ints, so the prototypes should have used the promoted type (int) to match the old-style function definitions. They use just vm_prot_t and/or vm_inherit_t. This depends on gcc features to work. I fixed the definitions since this is easiest. The correct fix may be to change the small types to u_int, to optimize for time instead of space.
* Fix a few bugs with NFS and mmap caused by NFS' use of b_validoffdfr1997-05-191-2/+2
| | | | | | | | and b_validend. The changes to vfs_bio.c are a bit ugly but hopefully can be tidied up later by a slight redesign. PR: kern/2573, kern/2754, kern/3046 (possibly) Reviewed by: dyson
* Commit a typo fix that's been sitting in my tree for ages, quite forgotten.peter1997-04-061-2/+2
| | | | | | | | | The typo was detected once apon a time with the -Wunused compile option. The result was that a block of code for implementing madvise(.. MADV_SEQUENTIAL..) behavior was "dead" and unused, probably negating the effect of activating the option. Reviewed by: dyson
* Fix the gdb executable modify problem. Thanks to the detective workdyson1997-04-061-9/+11
| | | | | | | | | | | | by Alan Cox <alc@cs.rice.edu>, and his description of the problem. The bug was primarily in procfs_mem, but the mistake likely happened due to the lack of vm system support for the operation. I added better support for selective marking of page dirty flags so that vm_map_pageable(wiring) will not cause this problem again. The code in procfs_mem is now less bogus (but maybe still a little so.)
* Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are notpeter1997-02-221-1/+1
| | | | ready for it yet.
* This is the kernel Lite/2 commit. There are some requisite userlanddyson1997-02-101-2/+3
| | | | | | | | | | | | | | | changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
* Change the map entry flags from bitfields to bitmasks. Allowsdyson1997-01-161-2/+2
| | | | for some code simplification.
* Make the long-awaited change from $Id$ to $FreeBSD$jkh1997-01-141-1/+1
| | | | | | | | This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
* Undo the collapse breakage (swap space usage problem.)dyson1997-01-031-3/+1
|
* Guess what? We left alot of the old collapse code that is not neededdyson1997-01-011-1/+3
| | | | | | anymore with the "full" collapse fix that we added about 1yr ago!!! The code has been removed by optioning it out for now, so we can put it back in ASAP if any problems are found.
* Superficial cleanup of comment.dyson1996-12-291-2/+2
|
* Implement closer-to POSIX mlock semantics. The major difference isdyson1996-12-141-2/+69
| | | | | | | | | | that we do allow mlock to span unallocated regions (of course, not mlocking them.) We also allow mlocking of RO regions (which the old code couldn't.) The restriction there is that once a RO region is wired (mlocked), it cannot be debugged (or EVER written to.) Under normal usage, the new mlock code will be a significant improvement over our old stuff.
* Implement a new totally dynamic (up to MAXPHYS) buffer kva allocationdyson1996-11-301-8/+7
| | | | | | | | | | | | | | | scheme. Additionally, add the capability for checking for unexpected kernel page faults. The maximum amount of kva space for buffers hasn't been decreased from where it is, but it will now be possible to do so. This scheme manages the kva space similar to the buffers themselves. If there isn't enough kva space because of usage or fragementation, buffers will be reclaimed until a buffer allocation is successful. This scheme should be very resistant to fragmentation problems until/if the LFS code is fixed and uses the bogus buffer locking scheme -- but a 'fixed' LFS is not likely to use such a scheme. Now there should be NO problem allocating buffers up to MAXPHYS.
* Addition of page coloring support. Various levels of coloring are afforded.dyson1996-09-081-2/+2
| | | | | | The default level works with minimal overhead, but one can also enable full, efficient use of a 512K cache. (Parameters can be generated to support arbitrary cache sizes also.)
* Backed out the recent changes/enhancements to the VM code. Thedyson1996-07-301-6/+13
| | | | | | | problem with the 'shell scripts' was found, but there was a 'strange' problem found with a 486 laptop that we could not find. This commit backs the code back to 25-jul, and will be re-entered after the snapshot in smaller (more easily tested) chunks.
* Undo part of the scalability commit. Many of the changesdyson1996-07-281-29/+15
| | | | | in vm_fault had some performance enhancements not ready for prime time. This commit backs out some of the changes.
* This commit is meant to solve a couple of VM system problems ordyson1996-07-271-27/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | performance issues. 1) The pmap module has had too many inlines, and so the object file is simply bigger than it needs to be. Some common code is also merged into subroutines. 2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls. Unfortunately, a few have needed to be added also. The removal caused the need for more vm_page_lookups. I added lookup hints to minimize the need for the page table lookup operations. 3) Removal of some bogus performance improvements, that mostly made the code more complex (tracking individual page table page updates unnecessarily). Those improvements actually hurt 386 processors perf (not that people who worry about perf use 386 processors anymore :-)). 4) Changed pv queue manipulations/structures to be TAILQ's. 5) The pv queue code has had some performance problems since day one. Some significant scalability issues are resolved by threading the pv entries from the pmap AND the physical address instead of just the physical address. This makes certain pmap operations run much faster. This does not affect most micro-benchmarks, but should help loaded system performance *significantly*. DG helped and came up with most of the solution for this one. 6) Most if not all pmap bit operations follow the pattern: pmap_test_bit(); pmap_clear_bit(); That made for twice the necessary pv list traversal. The pmap interface now supports only pmap_tc_bit type operations: pmap_[test/clear]_modified, pmap_[test/clear]_referenced. Additionally, the modified routine now takes a vm_page_t arg instead of a phys address. This eliminates a PHYS_TO_VM_PAGE operation. 7) Several rewrites of routines that contain redundant code to use common routines, so that there is a greater likelihood of keeping the cache footprint smaller.
OpenPOWER on IntegriCloud