path: root/sys/vm/vm_page.h
Commit message | Author | Age | Files | Lines
* Implement idle zeroing of pages. I've been tinkering with this | peter | 2001-08-25 | 1 | -0/+1
  on and off since John Dyson left his work-in-progress.
  It is off by default for now. sysctl vm.zeroidle_enable=1 to turn it on.
  There are some hacks here to deal with the present lack of preemption - we yield after doing a small number of pages since we won't preempt otherwise. This is basically Matt's algorithm [with hysteresis] with an idle process to call it in a similar way it used to be called from the idle loop.
  I cleaned up the includes a fair bit here too.
* Oops. Last commit to vm_object.c should have got these files too. | jake | 2001-07-31 | 1 | -1/+0
  Remove the use of atomic ops to manipulate vm_object and vm_page flags. Giant is required here, so they are superfluous.
  Discussed with: dillon
* make vm_page_select_cache static | assar | 2001-07-23 | 1 | -1/+0
  Requested by: bde
* (vm_page_select_cache): add prototype | assar | 2001-07-21 | 1 | -0/+1
* Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for contigmalloc). | dillon | 2001-07-04 | 1 | -0/+17
  Also removed some spl's and added some VM mutexes, but they are not actually used yet, so this commit does not really make any operational changes to the system.
  vm_page.c relates to vm_page_t manipulation, including high level deactivation, activation, etc... vm_pageq.c relates to finding free pages and acquiring exclusive access to a page queue (exclusivity part not yet implemented). And the world still builds... :-)
* Change inlines back into mainline code in preparation for mutexing. Also, | dillon | 2001-07-04 | 1 | -314/+45
  most of these inlines had been bloated in -current far beyond their original intent. Normalize prototypes and function declarations to be ANSI only (half already were). And do some general cleanup.
  (kernel size also reduced by 50-100K, but that isn't the prime intent)
* With Alfred's permission, remove vm_mtx in favor of a fine-grained approach | dillon | 2001-07-04 | 1 | -22/+14
  (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
* This patch implements O_DIRECT about 80% of the way. It takes a patchset | dillon | 2001-05-24 | 1 | -0/+1
  Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece.
  Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency.
  I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet.
  Submitted by: tegge, dillon
* Introduce a global lock for the vm subsystem (vm_mtx). | alfred | 2001-05-19 | 1 | -6/+25
  vm_mtx does not recurse and is required for most low level vm operations.
  faults can not be taken without holding Giant.
  Memory subsystems can now call the base page allocators safely.
  Almost all atomic ops were removed as they are covered under the vm mutex.
  Alpha and ia64 now need to catch up to i386's trap handlers.
  FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties).
  Reviewed (partially) by: jake, jhb
* This implements a better launder limiting solution. There was a solution | dillon | 2000-12-26 | 1 | -0/+1
  in 4.2-REL which I ripped out in -stable and -current when implementing the low-memory handling solution. However, maxlaunder turns out to be the saving grace in certain very heavily loaded systems (e.g. newsreader box). The new algorithm limits the number of pages laundered in the first pageout daemon pass. If that is not sufficient then successive passes will be run without any limit.
  Write I/O is now pipelined using two sysctls, vfs.lorunningspace and vfs.hirunningspace. This prevents excessive buffered writes in the disk queues which cause long (multi-second) delays for reads. It leads to more stable (less jerky) and generally faster I/O streaming to disk by allowing required read ops (e.g. for indirect blocks and such) to occur without interrupting the write stream, among other things.
  NOTE: eventually, filesystem write I/O pipelining needs to be done on a per-device basis. At the moment it is globalized.
* Implement a low-memory deadlock solution. | dillon | 2000-11-18 | 1 | -0/+2
  Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather than block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired.
  Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist.
  Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled.
  In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS.
  Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmetic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op.
  In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the *WRONG* page(!), leading to corruption.
  There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather than continue to corrupt the filesystem. The problem does not occur very often; it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported.
  Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
  Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
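The ~PAGE_MASK remark in the low-memory commit above can be illustrated with a small user-space sketch. It is not the kernel's code: the EX_ names are hypothetical, and the mask is deliberately declared as a 32-bit unsigned value (an assumption made here to expose the failure mode that signed arithmetic happened to hide).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only: 4 KB pages, 32-bit unsigned mask. */
#define EX_PAGE_SIZE 4096u
#define EX_PAGE_MASK (EX_PAGE_SIZE - 1)   /* unsigned int, 0x00000fff */

/*
 * Inverting the 32-bit mask first yields 0xfffff000. When that value is
 * widened for the 64-bit AND it is zero-extended, so the upper 32 bits
 * of the offset are silently cleared.
 */
static uint64_t ex_trunc_page_narrow(uint64_t off)
{
    return off & ~EX_PAGE_MASK;
}

/* Widening the mask before inverting it keeps the upper 32 bits intact. */
static uint64_t ex_trunc_page_wide(uint64_t off)
{
    return off & ~(uint64_t)EX_PAGE_MASK;
}
```

With a signed int mask the inversion would sign-extend and both forms would agree, which matches the commit's note that the old code worked only because of signed arithmetic.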
* Make the arguments match the functionality of the functions. | obrien | 2000-08-26 | 1 | -2/+2
* #elsif -> #elif | alfred | 2000-07-11 | 1 | -8/+8
  Noticed by: green
* Replace the PQ_*CACHE options with a single PQ_CACHESIZE option that you | jhb | 2000-07-04 | 1 | -24/+31
  set equal to the number of kilobytes in your cache. The old options are still supported for backwards compatibility.
  Submitted by: Kelly Yancey <kbyanc@posi.net>
* This is a cleanup patch to Peter's new OBJT_PHYS VM object type | dillon | 2000-05-29 | 1 | -0/+9
  and sysv shared memory support for it. It implements a new PG_UNMANAGED flag that has slightly different characteristics from PG_FICTITIOUS.
  A new sysctl, kern.ipc.shm_use_phys has been added to enable the use of physically-backed sysv shared memory rather than swap-backed. Physically backed shm segments are not tracked with PV entries, allowing programs which use a large shm segment as a rendezvous point to operate without eating an insane amount of KVM in the PV entry management. Read: Oracle.
  Peter's OBJT_PHYS object will also allow us to eventually implement page-table sharing and/or 4MB physical page support for such segments. We're half way there.
* Back out the previous change to the queue(3) interface. | jake | 2000-05-26 | 1 | -3/+3
  It was not discussed and should probably not happen.
  Requested by: msmith and others
* Change the way that the queue(3) structures are declared; don't assume that | jake | 2000-05-23 | 1 | -3/+3
  the type argument to *_HEAD and *_ENTRY is a struct.
  Suggested by: phk
  Reviewed by: phk
  Approved by: mdodd
* Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly | peter | 2000-05-21 | 1 | -2/+5
  to various pmap_*() functions instead of looking up the physical address and passing that. In many cases, the first thing the pmap code was doing was going to a lot of trouble to get back the original vm_page_t, or its shadow pv_table entry.
  Inspired by: John Dyson's 1998 patches.
  Also: Eliminate pv_table as a separate thing and build it into a machine dependent part of vm_page_t. This eliminates having a separate set of structures that shadow each other in a 1:1 fashion that we often went to a lot of trouble to translate from one to the other. (see above) This happens to save 4 bytes of physical memory for each page in the system. (8 bytes on the Alpha).
  Eliminate the use of the phys_avail[] array to determine if a page is managed (ie: it has pv_entries etc). Store this information in a flag. Things like device_pager set it because they create vm_page_t's on the fly that do not have pv_entries. This makes it easier to "unmanage" a page of physical memory (this will be taken advantage of in subsequent commits).
  Add a function to add a new page to the freelist. This could be used for reclaiming the previously wasted pages left over from preloaded loader(8) files.
  Reviewed by: dillon
* Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" | peter | 1999-12-29 | 1 | -2/+2
  is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistent with the other BSDs, who made this change quite some time ago. More commits to come.
* Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to | dillon | 1999-12-12 | 1 | -4/+4
  madvise().
  This feature prevents the update daemon from gratuitously flushing dirty pages associated with a mapped file-backed region of memory. The system pager will still page the memory as necessary and the VM system will still be fully coherent with the filesystem. Modifications made by other means to the same area of memory, for example by write(), are unaffected. The feature works on a page-granularity basis.
  MAP_NOSYNC allows one to use mmap() to share memory between processes without incurring any significant filesystem overhead, putting it in the same performance category as SysV Shared memory and anonymous memory.
  Reviewed by: julian, alc, dg
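As a usage sketch for the MAP_NOSYNC commit above, a file-backed shared mapping could request the flag roughly as follows. The helper name ex_map_shared_nosync is hypothetical, and the fallback definition of MAP_NOSYNC to 0 is an assumption added here so the sketch also compiles on systems that lack the flag (where it then behaves like a plain MAP_SHARED mapping).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Assumption: on platforms without MAP_NOSYNC, degrade to plain MAP_SHARED. */
#ifndef MAP_NOSYNC
#define MAP_NOSYNC 0
#endif

/*
 * Map `len` bytes of the file behind `fd` as shared, writable memory and
 * ask that dirty pages not be flushed gratuitously by the update daemon.
 * Returns NULL on failure.
 */
static void *ex_map_shared_nosync(int fd, size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_NOSYNC, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Two processes that map the same file this way see each other's stores through the shared pages; the file must already be at least `len` bytes long (for example via ftruncate()).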
* The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to | alc | 1999-10-30 | 1 | -16/+9
  eliminate an extra (useless) level of indirection in half of the page queue accesses and (2) to use a single name for each queue throughout, instead of, e.g., "vm_page_queue_active" in some places and "vm_page_queues[PQ_ACTIVE]" in others.
  Reviewed by: dillon
* Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com> | dillon | 1999-09-17 | 1 | -1/+3
  Replace various VM related page count calculations strewn over the VM code with inlines to aid in readability and to reduce fragility in the code where modules depend on the same test being performed to properly sleep and wakeup.
  Split out a portion of the page deactivation code into an inline in vm_page.c to support vm_page_dontneed().
  add vm_page_dontneed(), which handles the madvise MADV_DONTNEED feature in a related commit coming up for vm_map.c/vm_object.c. This code prevents degenerate cases where an essentially active page may be rotated through a subset of the paging lists, resulting in premature disposal.
* $Id$ -> $FreeBSD$ | peter | 1999-08-28 | 1 | -1/+1
* Unbreak the nfs KLD_MODULE. It needs a bit more of vm_page.h than was | green | 1999-08-17 | 1 | -3/+5
  exported (notably vm_page_undirty()). Also, let vm_page_dirty() work in a KLD.
* Add the (inline) function vm_page_undirty for clearing the dirty bitmask | alc | 1999-08-17 | 1 | -1/+13
  of a vm_page.
  Use it.
  Submitted by: dillon
* contigmalloc1 (currently) depends on PQ_FREE and PQ_CACHE not being 0 | alc | 1999-08-15 | 1 | -2/+2
  to tell a valid "struct vm_page" from an invalid one in the vm_page_array. This isn't a very robust method.
* Add back in old definitions if we're compiling for alpha. | mjacob | 1999-08-15 | 1 | -1/+10
* Don't create a "struct vpgqueues" for PQ_NONE. | alc | 1999-08-14 | 1 | -7/+7
* Make the default page coloring parameters match a (non-Xeon) Pentium II/III. | alc | 1999-08-12 | 1 | -2/+8
  This setting is also acceptable for Celerons and Pentium Pros with less than 1MB L2 caches.
  Note: PQ_L2_SIZE is a misnomer. The correct number of colors is a function of the cache's degree of associativity as well as its size.
  Submitted by: bde and alc
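The note above, that the number of colors depends on associativity as well as cache size, corresponds to the usual relation colors = cache size / (associativity × page size). The following sketch only illustrates that arithmetic; the function name and the sample inputs are hypothetical and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of page colors for a physically indexed cache: the size of one
 * way divided by the page size. Pages of the same color compete for the
 * same cache sets. Returns at least 1.
 */
static size_t ex_page_colors(size_t cache_bytes, size_t ways, size_t page_bytes)
{
    assert(ways > 0 && page_bytes > 0);
    size_t way_bytes = cache_bytes / ways;   /* bytes covered by one way */
    size_t colors = way_bytes / page_bytes;
    return colors > 0 ? colors : 1;
}
```

For example, a hypothetical 512 KB, 4-way cache with 4 KB pages gives 32 colors, whereas dividing the cache size by the page size alone would suggest 128.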
* Change the type of vpgqueues::lcnt from "int *" to "int". The indirection | alc | 1999-07-31 | 1 | -2/+2
  served no purpose.
* Reduce the number of "magic constants" used for page coloring | alc | 1999-07-22 | 1 | -7/+1
  by one: PQ_PRIME2 and PQ_PRIME3 are used to accomplish the same thing at different places in the kernel. Drop PQ_PRIME3.
* Remove (1) "extern" declarations for variables that were previously | alc | 1999-06-22 | 1 | -4/+1
  made "static" and (2) initialized but unused variables.
* Remove some unused function and variable declarations. | alc | 1999-06-19 | 1 | -11/+1
* The VFS/BIO subsystem contained a number of hacks in order to optimize | alc | 1999-05-02 | 1 | -1/+7
  piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas.
  The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most notably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistency issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE.
  getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vice versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly.
  There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE.
  Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes.
  Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
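The step described above, zeroing the invalid portions of a page before marking it all-valid, might look roughly like the sketch below. It is a user-space illustration under stated assumptions (4 KB pages, one valid bit per 512-byte chunk); the struct, the EX_ constants, and ex_zero_invalid are hypothetical names, not the helper routines the commit added.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed geometry for illustration: 4 KB page, 512-byte chunks. */
#define EX_PAGE_SIZE  4096
#define EX_CHUNK_SIZE 512
#define EX_BITS_ALL   0xffu   /* 8 chunks, one valid bit each */

struct ex_vpage {
    uint8_t       valid;                /* bit i set: chunk i holds real data */
    unsigned char data[EX_PAGE_SIZE];
};

/*
 * Zero every chunk whose valid bit is clear, then mark the whole page
 * valid. Chunks already valid are left untouched, so data read from the
 * file survives while stale bytes past EOF cannot leak into a mapping.
 */
static void ex_zero_invalid(struct ex_vpage *m)
{
    assert(m != NULL);
    for (int i = 0; i < EX_PAGE_SIZE / EX_CHUNK_SIZE; i++) {
        if ((m->valid & (1u << i)) == 0)
            memset(m->data + i * EX_CHUNK_SIZE, 0, EX_CHUNK_SIZE);
    }
    m->valid = EX_BITS_ALL;
}
```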
* Catch a case spotted by Tor where files mmapped could leave garbage in the | julian | 1999-04-05 | 1 | -1/+2
  unallocated parts of the last page when the file ended on a frag but not a page boundary.
  Delimited by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c
  Submitted by: Matt Dillon <dillon@freebsd.org>
  Reviewed by: Alan Cox <alc@freebsd.org>
* Fix breakage in last commit | julian | 1999-03-15 | 1 | -3/+3
  Submitted by: Brian Feldman <green@unixhelp.org>
* A bit of a hack, but allows the vn device to be a module again. | julian | 1999-03-14 | 1 | -1/+15
  Submitted by: Matt Dillon <dillon@freebsd.org>
* Minor reorganization of vm_page_alloc(). No functional changes have | dillon | 1999-02-15 | 1 | -2/+2
  been made but the code has been reorganized and documented to make it more readable, reduce the size of the code, and optimize the branch path caching capabilities that most modern processors have.
* Rip out PQ_ZERO queue. PQ_ZERO functionality is now combined in with | dillon | 1999-02-08 | 1 | -14/+28
  PQ_FREE. There is little operational difference other than the kernel being a few kilobytes smaller and the code being more readable.
    * vm_page_select_free() has been *greatly* simplified.
    * The PQ_ZERO page queue and supporting structures have been removed
    * vm_page_zero_idle() revamped (see below)
  PG_ZERO setting and clearing has been migrated from vm_page_alloc() to vm_page_free[_zero]() and will eventually be guaranteed to remain tracked throughout a page's life ( if it isn't already ).
  When a page is freed, PG_ZERO pages are appended to the appropriate tailq in the PQ_FREE queue while non-PG_ZERO pages are prepended. When locating a new free page, PG_ZERO selection operates from within vm_page_list_find() ( get page from end of queue instead of beginning of queue ) and then only occurs in the nominal critical path case. If the nominal case misses, both normal and zero-page allocation devolves into the same _vm_page_list_find() select code without any specific zero-page optimizations.
  Additionally, vm_page_zero_idle() has been revamped. Hysteresis has been added and zero-page tracking adjusted to conform with the other changes. Currently hysteresis is set at 1/3 (lo) and 1/2 (hi) the number of free pages. We may wish to increase both parameters as time permits. The hysteresis is designed to avoid silly zeroing in borderline allocation/free situations.
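The queueing rule and the hysteresis described above can be sketched in user space as follows. This is only one reading of the commit text, not the kernel implementation: all ex_ names are hypothetical, a hand-rolled doubly linked list stands in for the kernel's tailq, and the assumption that zeroing starts below the 1/3 mark and stops at the 1/2 mark is an interpretation of "1/3 (lo) and 1/2 (hi)".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct ex_page {
    struct ex_page *prev, *next;
    bool zeroed;                      /* stands in for PG_ZERO */
};

struct ex_freeq {
    struct ex_page *head, *tail;
};

/* Free a page: zeroed pages go to the tail, all others to the head. */
static void ex_page_free(struct ex_freeq *q, struct ex_page *p)
{
    assert(q != NULL && p != NULL);
    p->prev = p->next = NULL;
    if (q->head == NULL) {
        q->head = q->tail = p;
    } else if (p->zeroed) {
        p->prev = q->tail;
        q->tail->next = p;
        q->tail = p;
    } else {
        p->next = q->head;
        q->head->prev = p;
        q->head = p;
    }
}

/* Allocate: callers wanting a zeroed page look at the tail first. */
static struct ex_page *ex_page_alloc(struct ex_freeq *q, bool want_zero)
{
    struct ex_page *p = want_zero ? q->tail : q->head;
    if (p == NULL)
        return NULL;
    if (p->prev != NULL) p->prev->next = p->next; else q->head = p->next;
    if (p->next != NULL) p->next->prev = p->prev; else q->tail = p->prev;
    p->prev = p->next = NULL;
    return p;   /* the caller must still check p->zeroed */
}

/*
 * Hysteresis for idle zeroing: start when fewer than 1/3 of the free
 * pages are zeroed, keep going until 1/2 are, then stay idle until the
 * count drops below 1/3 again. `*active` carries state between calls.
 */
static bool ex_should_zero(size_t zero_count, size_t free_count, bool *active)
{
    if (zero_count >= free_count / 2)
        *active = false;
    else if (zero_count < free_count / 3)
        *active = true;
    return *active;
}
```

Keeping both kinds of page in one queue means a miss on either end simply falls through to the other kind, which is roughly the simplification the commit describes.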
* Remove L1 cache coloring optimization ( leave L2 cache coloring opt ). | dillon | 1999-02-07 | 1 | -8/+16
  Rewrite vm_page_list_find() and vm_page_select_free() - make inline out of nominal case.
* Add vm_page_dirty() inline with PQ_CACHE sanity check | dillon | 1999-01-24 | 1 | -1/+20
* Add invariants to vm_page_busy() and vm_page_wakeup() to check for | dillon | 1999-01-24 | 1 | -1/+12
  PG_BUSY stupidity.
* The TAILQ hashq has been turned into a singly-linked list link, | dillon | 1999-01-21 | 1 | -1/+1
  reducing the size of vm_page_t.
  SWAPBLK_NONE and SWAPBLK_MASK are defined here. These actually are more generalized than their names imply, but their placement is somewhat of a legacy issue from a prior test version of this code that put the swapblk in the vm_page_t structure. That test code was eventually thrown away. The legacy remains.
  Added vm_page_flash() inline. Similar to vm_page_wakeup() except that it does not clear PG_BUSY ( one assumes that PG_BUSY is already clear ). Used by a number of routines to wakeup waiters.
  Collapsed some of the code in inline calls to make other inline calls. GCC will optimize this well and it reduces duplication.
  vm_page_free() and vm_page_free_zero() inlines added to convert to the proper vm_page_free_toq() call.
  vm_page_sleep_busy() inline added, replacing vm_page_sleep() ( which has been removed ). This implements a much more optimizable page-waiting function.
* This is a rather large commit that encompasses the new swapper, | dillon | 1999-01-21 | 1 | -21/+111
  changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues.
  Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
* Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT as | eivind | 1999-01-08 | 1 | -6/+2
  discussed on -hackers.
  Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic.
  Reviewed by: msmith
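The calling convention 'KASSERT(assertion, ("panic message", args))' relies on the second argument being a parenthesized argument list that is pasted directly after the panic function's name. The sketch below shows that shape in user space; EX_KASSERT, EX_INVARIANTS, ex_panic, and ex_page_unbusy are hypothetical stand-ins, and ex_panic merely records the message where a kernel panic would halt the machine.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define EX_INVARIANTS 1   /* remove to compile the checks away */

static char ex_last_panic[128];
static int  ex_panic_count;

/* Stand-in for panic(): records the message instead of halting. */
static void ex_panic(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(ex_last_panic, sizeof(ex_last_panic), fmt, ap);
    va_end(ap);
    ex_panic_count++;
}

#ifdef EX_INVARIANTS
/* `msg` is a parenthesized argument list, so `ex_panic msg` is a call. */
#define EX_KASSERT(exp, msg) do { if (!(exp)) ex_panic msg; } while (0)
#else
#define EX_KASSERT(exp, msg) do { } while (0)
#endif

/* Example caller: dropping a busy count must not go below zero. */
static int ex_page_unbusy(int busy)
{
    EX_KASSERT(busy > 0, ("ex_page_unbusy: busy count %d not positive", busy));
    return busy - 1;
}
```

Wrapping the message and its arguments in one set of parentheses lets a fixed two-argument macro forward a variable number of format arguments without variadic macro support.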
* Added a second argument, "activate" to the vm_page_unwire() call so that | dg | 1998-10-28 | 1 | -2/+2
  the caller can select either inactive or active queue to put the page on.
* Nuked PG_TABLED flag. Replaced with m->object != NULL. | dg | 1998-10-21 | 1 | -2/+1
* Cosmetic changes to the PAGE_XXX macros to make them consistent with | dfr | 1998-09-04 | 1 | -21/+47
  the other objects in vm.
* Separate wakeup conditions for page I/O count (pg_busy) and lock (PG_BUSY). | wollman | 1998-09-01 | 1 | -4/+3
  This is not a complete solution to the deadlock, but the additional wakeups have helped in my observation.
  Suggested by: John Dyson
* Change various syscalls to use size_t arguments instead of u_int. | dfr | 1998-08-24 | 1 | -8/+16
  Add some overflow checks to read/write (from bde).
  Change all modifications to vm_page::flags, vm_page::busy, vm_object::flags and vm_object::paging_in_progress to use operations which are not interruptible.
  Reviewed by: Bruce Evans <bde@zeta.org.au>