path: root/sys/vm/vm_page.h
Commit log, most recent first. Each entry: author, date (files changed, lines removed/added).
* jake, 2003-03-25 (1 file, -2/+2):
  - Add vm_paddr_t, a physical address type. This is required for systems
    where physical addresses are larger than virtual addresses, such as i386s
    with PAE.
  - Use this to represent physical addresses in the MI vm system and in the
    i386 pmap code. This also changes the paddr parameter to d_mmap_t.
  - Fix printf formats to handle physical addresses >4G in the i386 memory
    detection code, and due to kvtop returning vm_paddr_t instead of u_long.
  Note that this is a name change only; vm_paddr_t is still the same as
  vm_offset_t on all currently supported platforms.
  Sponsored by: DARPA, Network Associates Laboratories
  Discussed with: re, phk (cdevsw change)
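  An illustrative sketch (not part of the commit): with PAE, vm_paddr_t can
  be wider than vm_offset_t, so a physical address is printed by widening it
  to uintmax_t first. The helper name here is hypothetical.

      /* Hypothetical helper: print a physical address that may be >4G. */
      static void
      report_paddr(vm_paddr_t pa)
      {
              printf("physical page at %#jx\n", (uintmax_t)pa);
      }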
* alc, 2002-12-19 (1 file, -1/+0):
  - Remove vm_page_sleep_busy(). The transition to vm_page_sleep_if_busy(),
    which incorporates page queue and field locking, is complete.
  - Assert that the page queue lock rather than Giant is held in
    vm_page_flag_set().
* alc, 2002-11-18 (1 file, -1/+0):
  Remove vm_page_protect(). Instead, use pmap_page_protect() directly.
* alc, 2002-11-04 (1 file, -0/+1):
  Export the function vm_page_splay().
* jeff, 2002-11-01 (1 file, -3/+4):
  - Add a new flag to vm_page_alloc, VM_ALLOC_NOOBJ. This tells vm_page_alloc
    not to insert this page into an object. The pindex is still used for
    colorization.
  - Rework vm_page_select_* to accept a color instead of an object and pindex
    to work with VM_ALLOC_NOOBJ.
  - Document other VM_ALLOC_ flags.
  Reviewed by: peter, jake
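  A hedged usage sketch (not from the commit): with VM_ALLOC_NOOBJ the object
  argument is NULL and the pindex only steers page coloring, so the caller
  must track the page itself.

      vm_page_t m;

      /* No object insertion; "color" stands in for the pindex. */
      m = vm_page_alloc(NULL, color, VM_ALLOC_NOOBJ | VM_ALLOC_SYSTEM);
      if (m == NULL)
              return (NULL);          /* out of pages; caller must retry */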
* alc, 2002-10-20 (1 file, -1/+11):
  o Reinline vm_page_undirty(), reducing the kernel size. (This reverts a
    part of vm_page.h revision 1.87 and vm_page.c revision 1.167.)
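  For reference, a sketch of the inline roughly as it appears in vm_page.h of
  this era; clearing the dirty bitmask is a single store, which is why the
  out-of-line version only added call overhead:

      static __inline void
      vm_page_undirty(vm_page_t m)
      {
              m->dirty = 0;   /* clear the dirty bitmask */
      }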
* dillon, 2002-10-18 (1 file, -1/+2):
  Replace the vm_page hash table with a per-vmobject splay tree. There should
  be no major change in performance from this change at this time but this
  will allow other work to progress: Giant lock removal around VM system in
  favor of per-object mutexes, ranged fsyncs, more optimal COMMIT rpc's for
  NFS, partial filesystem syncs by the syncer, more optimal object flushing,
  etc. Note that the buffer cache is already using a similar splay tree
  mechanism. Note that a good chunk of the old hash table code is still in
  the tree. Alan or I will remove it prior to the release if the new code
  does not introduce unsolvable bugs, else we can revert more easily.
  Submitted by: alc (this is Alan's code)
  Approved by: re
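  A hedged sketch of a lookup against the per-object splay tree (the helper
  name is hypothetical; field names follow this era of the code): splaying
  rotates the page with the nearest pindex to the root, so runs of nearby
  lookups stay cheap.

      static vm_page_t
      lookup_page(vm_object_t object, vm_pindex_t pindex)
      {
              vm_page_t m;

              if ((m = object->root) != NULL && m->pindex != pindex) {
                      /* Splay; the new root is the closest match. */
                      m = vm_page_splay(pindex, m);
                      object->root = m;
              }
              return (m != NULL && m->pindex == pindex ? m : NULL);
      }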
* jeff, 2002-09-18 (1 file, -0/+1):
  - Split UMA_ZFLAG_OFFPAGE into UMA_ZFLAG_OFFPAGE and UMA_ZFLAG_HASH.
  - Remove all instances of the mallochash.
  - Stash the slab pointer in the vm page's object pointer when allocating
    from the kmem_obj.
  - Use the overloaded object pointer to find slabs for malloced memory.
* alc, 2002-08-25 (1 file, -2/+0):
  o Retire vm_page_zero_fill() and vm_page_zero_fill_area(). Ever since
    pmap_zero_page() and pmap_zero_page_area() were modified to accept a
    struct vm_page * instead of a physical address, vm_page_zero_fill() and
    vm_page_zero_fill_area() have served no purpose.
* alc, 2002-08-10 (1 file, -1/+0):
  o Remove the setting and clearing of the PG_MAPPED flag from the alpha and
    ia64 pmap.
  o Remove the PG_MAPPED flag's declaration.
* alc, 2002-07-29 (1 file, -0/+1):
  o Introduce vm_page_sleep_if_busy() as an eventual replacement for
    vm_page_sleep_busy(). vm_page_sleep_if_busy() uses the page queues lock.
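  A hedged usage sketch (wait-message string hypothetical): the function
  sleeps only if the page is busy and returns nonzero if it slept, so callers
  typically retry from the top.

      /* Wait for the page to unbusy; TRUE also checks m->busy. */
      if (vm_page_sleep_if_busy(m, TRUE, "vmpbsy"))
              goto retry;     /* we slept; re-validate the page's state */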
* alc, 2002-07-28 (1 file, -1/+1):
  o Modify vm_page_grab() to accept VM_ALLOC_WIRED.
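  A hedged usage sketch of the new flag:

      /* Find or allocate the page, retrying as needed, and wire it. */
      m = vm_page_grab(object, pindex,
          VM_ALLOC_NORMAL | VM_ALLOC_RETRY | VM_ALLOC_WIRED);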
* alc, 2002-07-20 (1 file, -2/+0):
  o Remove dead and/or unused code.
* alc, 2002-07-18 (1 file, -1/+5):
  o Introduce an argument, VM_ALLOC_WIRED, that requests vm_page_alloc() to
    return a wired page.
  o Use VM_ALLOC_WIRED within Alpha's pmap_growkernel(). Also, because
    Alpha's pmap_growkernel() calls vm_page_alloc() from within a critical
    section, specify VM_ALLOC_INTERRUPT instead of VM_ALLOC_SYSTEM. (Only
    VM_ALLOC_INTERRUPT is implemented entirely with a spin mutex.) A sketch
    follows this entry.
  o Assert that the page queues mutex is held in vm_page_wire() on Alpha,
    just like the other platforms.
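  A hedged sketch of the pmap_growkernel() case described above (variable and
  object names follow the Alpha pmap of this era; illustrative only):

      /* In a critical section: spin-mutex-only allocation, returned wired. */
      nkpg = vm_page_alloc(kptobj, pindex,
          VM_ALLOC_INTERRUPT | VM_ALLOC_WIRED);
      if (nkpg == NULL)
              panic("pmap_growkernel: no memory to grow kernel");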
* alc, 2002-07-13 (1 file, -4/+3):
  o Complete the locking of page queue accesses by vm_page_unwire().
  o Assert that the page queues lock is held in vm_page_unwire().
  o Make vm_page_lock_queues() and vm_page_unlock_queues() visible to kernel
    loadable modules.
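  A hedged sketch of the resulting protocol, now usable from modules as well:

      vm_page_lock_queues();
      vm_page_unwire(m, 1);   /* 1: requeue the page on the active queue */
      vm_page_unlock_queues();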
* alc, 2002-07-04 (1 file, -0/+5):
  o Resurrect vm_page_lock_queues(), vm_page_unlock_queues(), and the free
    queue lock (revision 1.33 of vm/vm_page.c removed them).
  o Make the free queue lock a spin lock because it's sometimes acquired
    inside of a critical section.
* ken, 2002-06-26 (1 file, -0/+5):
  At long last, commit the zero copy sockets code.
  MAKEDEV: Add MAKEDEV glue for the ti(4) device nodes.
  ti.4: Update the ti(4) man page to include information on the
    TI_JUMBO_HDRSPLIT and TI_PRIVATE_JUMBOS kernel options, and also include
    information about the new character device interface and the associated
    ioctls.
  man9/Makefile: Add jumbo.9 and zero_copy.9 man pages and associated links.
  jumbo.9: New man page describing the jumbo buffer allocator interface and
    operation.
  zero_copy.9: New man page describing the general characteristics of the
    zero copy send and receive code, and what an application author should do
    to take advantage of the zero copy functionality.
  NOTES: Add entries for ZERO_COPY_SOCKETS, TI_PRIVATE_JUMBOS,
    TI_JUMBO_HDRSPLIT, MSIZE, and MCLSHIFT.
  conf/files: Add uipc_jumbo.c and uipc_cow.c.
  conf/options: Add the 5 options mentioned above.
  kern_subr.c: Receive side zero copy implementation. This takes
    "disposable" pages attached to an mbuf, gives them to a user process, and
    then recycles the user's page. This is only active when
    ZERO_COPY_SOCKETS is turned on and the kern.ipc.zero_copy.receive sysctl
    variable is set to 1.
  uipc_cow.c: Send side zero copy functions. Takes a page written by the
    user, maps it copy on write, and assigns it kernel virtual address space.
    Removes the copy on write mapping once the buffer has been freed by the
    network stack.
  uipc_jumbo.c: Jumbo disposable page allocator code. This allocates
    (optionally) disposable pages for network drivers that want to give the
    user the option of doing zero copy receive.
  uipc_socket.c: Add kern.ipc.zero_copy.{send,receive} sysctls that are
    enabled if ZERO_COPY_SOCKETS is turned on. Add zero copy send support to
    sosend() -- pages get mapped into the kernel instead of getting copied if
    they meet size and alignment restrictions.
  uipc_syscalls.c: Un-staticize some of the sf* functions so that they can be
    used elsewhere. (uipc_cow.c)
  if_media.c: In the SIOCGIFMEDIA ioctl in ifmedia_ioctl(), avoid calling
    malloc() with M_WAITOK. Return an error if the M_NOWAIT malloc fails.
    The ti(4) driver and the wi(4) driver, at least, call this with a mutex
    held. This causes witness warnings for 'ifconfig -a' with a wi(4) or
    ti(4) board in the system. (I've only verified for ti(4)).
  ip_output.c: Fragment large datagrams so that each segment contains a
    multiple of PAGE_SIZE amount of data plus headers. This allows the
    receiver to potentially do page flipping on receives.
  if_ti.c: Add zero copy receive support to the ti(4) driver. If
    TI_PRIVATE_JUMBOS is not defined, it now uses the jumbo(9) buffer
    allocator for jumbo receive buffers. Add a new character device interface
    for the ti(4) driver for the new debugging interface. This allows (a
    patched version of) gdb to talk to the Tigon board and debug the
    firmware. There are also a few additional debugging ioctls available
    through this interface. Add header splitting support to the ti(4)
    driver. Tweak some of the default interrupt coalescing parameters to more
    useful defaults. Add hooks for supporting transmit flow control, but
    leave it turned off with a comment describing why it is turned off.
  if_tireg.h: Change the firmware rev to 12.4.11, since we're really at
    12.4.11 plus fixes from 12.4.13. Add defines needed for debugging.
    Remove the ti_stats structure; it is now defined in sys/tiio.h.
  ti_fw.h: 12.4.11 firmware.
  ti_fw2.h: 12.4.11 firmware, plus selected fixes from 12.4.13, and my header
    splitting patches. Revision 12.4.13 doesn't handle 10/100 negotiation
    properly. (This firmware is the same as what was in the tree previously,
    with the addition of header splitting support.)
  sys/jumbo.h: Jumbo buffer allocator interface.
  sys/mbuf.h: Add a new external mbuf type, EXT_DISPOSABLE, to indicate that
    the payload buffer can be thrown away / flipped to a userland process.
  socketvar.h: Add prototype for socow_setup.
  tiio.h: ioctl interface to the character portion of the ti(4) driver, plus
    associated structure/type definitions.
  uio.h: Change prototype for uiomoveco() so that we'll know whether the
    source page is disposable.
  ufs_readwrite.c: Update for new prototype of uiomoveco().
  vm_fault.c: In vm_fault(), check to see whether we need to do a page based
    copy on write fault.
  vm_object.c: Add a new function, vm_object_allocate_wait(). This does the
    same thing that vm_object_allocate() does, except that it gives the
    caller the opportunity to specify whether it should wait on the
    uma_zalloc() of the object structure. This allows vm objects to be
    allocated while holding a mutex. (Without generating WITNESS warnings.)
    vm_object_allocate() is implemented as a call to
    vm_object_allocate_wait() with the malloc flag set to M_WAITOK.
  vm_object.h: Add prototype for vm_object_allocate_wait().
  vm_page.c: Add page-based copy on write setup, clear and fault routines.
  vm_page.h: Add page based COW function prototypes and variable in the
    vm_page structure.
  Many thanks to Drew Gallatin, who wrote the zero copy send and receive
  code, and to all the other folks who have tested and reviewed this code
  over the years.
* jeff, 2002-06-25 (1 file, -1/+1):
  Turn VM_ALLOC_ZERO into a flag.
  Submitted by: tegge
  Reviewed by: dillon
* alc, 2002-05-27 (1 file, -9/+0):
  o Remove unused #defines.
* peter, 2002-04-28 (1 file, -0/+1):
  We do not necessarily need to map/unmap pages to zero parts of them. On
  systems where physical memory is also direct mapped (alpha, sparc, ia64,
  etc.) this is slightly harmful.
* eivind, 2002-03-10 (1 file, -4/+2):
  - Remove a number of extra newlines that do not belong here according to
    style(9).
  - Minor space adjustment in cases where we have "( ", " )", if(), return(),
    while(), for(), etc.
  - Add /* SYMBOL */ after a few #endifs.
  Reviewed by: alc
* alc, 2002-03-04 (1 file, -8/+1):
  o Create vm_pageq_enqueue() to encapsulate code that is duplicated time and
    again in vm_page.c and vm_pageq.c.
  o Delete unused prototypes. (Mainly a result of the earlier renaming of
    various functions from vm_page_*() to vm_pageq_*().)
* alc, 2002-03-02 (1 file, -9/+0):
  Remove some long dead code.
* tegge, 2002-02-19 (1 file, -1/+2):
  Add a page queue, PQ_HOLD, that temporarily owns pages with nonzero hold
  count that would otherwise be on one of the free queues. This eliminates a
  panic when broken programs unmap memory that still has pending IO from raw
  devices.
  Reviewed by: dillon, alc
* peter, 2001-08-25 (1 file, -0/+1):
  Implement idle zeroing of pages. I've been tinkering with this on and off
  since John Dyson left his work-in-progress. It is off by default for now;
  sysctl vm.zeroidle_enable=1 turns it on. There are some hacks here to deal
  with the present lack of preemption - we yield after doing a small number
  of pages since we won't preempt otherwise. This is basically Matt's
  algorithm [with hysteresis] with an idle process to call it in a similar
  way it used to be called from the idle loop. I cleaned up the includes a
  fair bit here too.
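  A hedged userland sketch for flipping the knob from C rather than via
  sysctl(8); error handling left to the caller:

      #include <sys/types.h>
      #include <sys/sysctl.h>

      int
      enable_idle_zero(void)
      {
              int one = 1;

              /* Returns 0 on success, -1 with errno set on failure. */
              return (sysctlbyname("vm.zeroidle_enable", NULL, NULL,
                  &one, sizeof(one)));
      }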
* jake, 2001-07-31 (1 file, -1/+0):
  Oops. Last commit to vm_object.c should have got these files too. Remove
  the use of atomic ops to manipulate vm_object and vm_page flags. Giant is
  required here, so they are superfluous.
  Discussed with: dillon
* assar, 2001-07-23 (1 file, -1/+0):
  make vm_page_select_cache static
  Requested by: bde
* assar, 2001-07-21 (1 file, -0/+1):
  (vm_page_select_cache): add prototype
* dillon, 2001-07-04 (1 file, -0/+17):
  Reorg vm_page.c into vm_page.c, vm_pageq.c, and vm_contig.c (for
  contigmalloc). Also removed some spl's and added some VM mutexes, but they
  are not actually used yet, so this commit does not really make any
  operational changes to the system.
  vm_page.c relates to vm_page_t manipulation, including high level
  deactivation, activation, etc...
  vm_pageq.c relates to finding free pages and acquiring exclusive access to
  a page queue (exclusivity part not yet implemented).
  And the world still builds... :-)
* dillon, 2001-07-04 (1 file, -314/+45):
  Change inlines back into mainline code in preparation for mutexing. Also,
  most of these inlines had been bloated in -current far beyond their
  original intent. Normalize prototypes and function declarations to be ANSI
  only (half already were). And do some general cleanup. (kernel size also
  reduced by 50-100K, but that isn't the prime intent)
* dillon, 2001-07-04 (1 file, -22/+14):
  With Alfred's permission, remove vm_mtx in favor of a fine-grained approach
  (this commit is just the first stage). Also add various GIANT_ macros to
  formalize the removal of Giant, making it easy to test in a more piecemeal
  fashion. These macros will allow us to test fine-grained locks to a degree
  before removing Giant, and also after, and to remove Giant in a piecemeal
  fashion via sysctl's on those subsystems which the authors believe can
  operate without Giant.
* dillon, 2001-05-24 (1 file, -0/+1):
  This patch implements O_DIRECT about 80% of the way. It takes a patchset
  Tor created a while ago, removes the raw I/O piece (that has cache
  coherency problems), and adds a buffer cache / VM freeing piece.
  Essentially this patch causes O_DIRECT I/O to not be left in the cache, but
  does not prevent it from going through the cache, hence the 80%. For the
  last 20% we need a method by which the I/O can be issued directly to a
  buffer supplied by the user process and bypass the buffer cache entirely,
  but still maintain cache coherency.
  I also have the code working under -stable but the changes made to
  sys/file.h may not be MFCable, so an MFC is not on the table yet.
  Submitted by: tegge, dillon
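  A hedged userland sketch (path illustrative): with the "80%" implementation
  described above, the data still flows through the buffer cache but is
  released rather than retained afterwards.

      #include <fcntl.h>

      /* Request cache-bypassing semantics for this descriptor. */
      int fd = open("/path/to/datafile", O_RDWR | O_DIRECT);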
* alfred, 2001-05-19 (1 file, -6/+25):
  Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not
  recurse and is required for most low level vm operations. Faults can not be
  taken without holding Giant. Memory subsystems can now call the base page
  allocators safely. Almost all atomic ops were removed as they are covered
  under the vm mutex. Alpha and ia64 now need to catch up to i386's trap
  handlers. FFS and NFS have been tested, other filesystems will need minor
  changes (grabbing the vm lock when twiddling page properties).
  Reviewed (partially) by: jake, jhb
* dillon, 2000-12-26 (1 file, -0/+1):
  This implements a better launder limiting solution. There was a solution in
  4.2-REL which I ripped out in -stable and -current when implementing the
  low-memory handling solution. However, maxlaunder turns out to be the
  saving grace in certain very heavily loaded systems (e.g. newsreader box).
  The new algorithm limits the number of pages laundered in the first pageout
  daemon pass. If that is not sufficient then successive passes will be run
  without any limit.
  Write I/O is now pipelined using two sysctls, vfs.lorunningspace and
  vfs.hirunningspace. This prevents excessive buffered writes in the disk
  queues which cause long (multi-second) delays for reads. It leads to more
  stable (less jerky) and generally faster I/O streaming to disk by allowing
  required read ops (e.g. for indirect blocks and such) to occur without
  interrupting the write stream, among other things.
  NOTE: eventually, filesystem write I/O pipelining needs to be done on a
  per-device basis. At the moment it is globalized.
* dillon, 2000-11-18 (1 file, -0/+2):
  Implement a low-memory deadlock solution.
  Removed most of the hacks that were trying to deal with low-memory
  situations prior to now.
  The new code is based on the concept that I/O must be able to function in a
  low memory situation. All major modules related to I/O (except networking)
  have been adjusted to allow allocation out of the system reserve memory
  pool. These modules now detect a low memory situation but rather than block
  they instead continue to operate, then return resources to the memory pool
  instead of cache them or leave them wired.
  Code has been added to stall in a low-memory situation prior to a vnode
  being locked. Thus situations where a process blocks in a low-memory
  condition while holding a locked vnode have been reduced to near nothing.
  Not only will I/O continue to operate, but many prior deadlock conditions
  simply no longer exist.
  Implement a number of VFS/BIO fixes (found by Ian):
  In biodone(), bogus-page replacement code, the loop was not properly
  incrementing loop variables prior to a continue statement. We do not
  believe this code can be hit anyway but we aren't taking any chances. We'll
  turn the whole section into a panic (as it already is in brelse()) after
  the release is rolled.
  In biodone(), the foff calculation was incorrectly clamped to the iosize,
  causing the wrong foff to be calculated for pages in the case of an I/O
  error or biodone() called without initiating I/O. The problem always caused
  a panic before. Now it doesn't. The problem is mainly an issue with NFS.
  Fixed casts for ~PAGE_MASK. This code worked properly before only because
  the calculations use signed arithmetic. Better to properly extend PAGE_MASK
  first before inverting it for the 64 bit masking op.
  In brelse(), the bogus_page fixup code was improperly throwing away the
  original contents of 'm' when it did the j-loop to fix the bogus pages. The
  result was that it would potentially invalidate parts of the *WRONG*
  page(!), leading to corruption.
  There may still be cases where a background bitmap write is being
  duplicated, causing potential corruption. We have identified a potentially
  serious bug related to this but the fix is still TBD. So instead this patch
  contains a KASSERT to detect the problem and panic the machine rather than
  continue to corrupt the filesystem. The problem does not occur very
  often... it is very hard to reproduce, and it may or may not be the cause
  of the corruption people have reported.
  Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
  Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
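  A hedged illustration of the ~PAGE_MASK cast fix mentioned above: PAGE_MASK
  is a plain int, so masking a 64-bit file offset with its complement only
  worked because the int mask happened to sign-extend. Widening before
  inverting makes the 64-bit masking explicit (foff stands for a 64-bit
  off_t such as a buffer's file offset):

      /* foff &= ~PAGE_MASK;            relied on int sign extension */
      foff &= ~(off_t)PAGE_MASK;      /* extend first, then invert */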
* obrien, 2000-08-26 (1 file, -2/+2):
  Make the arguments match the functionality of the functions.
* alfred, 2000-07-11 (1 file, -8/+8):
  #elsif -> #elif
  Noticed by: green
* jhb, 2000-07-04 (1 file, -24/+31):
  Replace the PQ_*CACHE options with a single PQ_CACHESIZE option that you
  set equal to the number of kilobytes in your cache. The old options are
  still supported for backwards compatibility.
  Submitted by: Kelly Yancey <kbyanc@posi.net>
* dillon, 2000-05-29 (1 file, -0/+9):
  This is a cleanup patch to Peter's new OBJT_PHYS VM object type and sysv
  shared memory support for it. It implements a new PG_UNMANAGED flag that
  has slightly different characteristics from PG_FICTITIOUS.
  A new sysctl, kern.ipc.shm_use_phys, has been added to enable the use of
  physically-backed sysv shared memory rather than swap-backed. Physically
  backed shm segments are not tracked with PV entries, allowing programs
  which use a large shm segment as a rendezvous point to operate without
  eating an insane amount of KVM in the PV entry management. Read: Oracle.
  Peter's OBJT_PHYS object will also allow us to eventually implement
  page-table sharing and/or 4MB physical page support for such segments.
  We're half way there.
* jake, 2000-05-26 (1 file, -3/+3):
  Back out the previous change to the queue(3) interface. It was not
  discussed and should probably not happen.
  Requested by: msmith and others
* jake, 2000-05-23 (1 file, -3/+3):
  Change the way that the queue(3) structures are declared; don't assume that
  the type argument to *_HEAD and *_ENTRY is a struct.
  Suggested by: phk
  Reviewed by: phk
  Approved by: mdodd
* peter, 2000-05-21 (1 file, -2/+5):
  Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly
  to various pmap_*() functions instead of looking up the physical address
  and passing that. In many cases, the first thing the pmap code was doing
  was going to a lot of trouble to get back the original vm_page_t, or its
  shadow pv_table entry.
  Inspired by: John Dyson's 1998 patches.
  Also:
  Eliminate pv_table as a separate thing and build it into a machine
  dependent part of vm_page_t. This eliminates having a separate set of
  structures that shadow each other in a 1:1 fashion that we often went to a
  lot of trouble to translate from one to the other. (see above) This happens
  to save 4 bytes of physical memory for each page in the system. (8 bytes on
  the Alpha).
  Eliminate the use of the phys_avail[] array to determine if a page is
  managed (ie: it has pv_entries etc). Store this information in a flag.
  Things like device_pager set it because they create vm_page_t's on the fly
  that do not have pv_entries. This makes it easier to "unmanage" a page of
  physical memory (this will be taken advantage of in subsequent commits).
  Add a function to add a new page to the freelist. This could be used for
  reclaiming the previously wasted pages left over from preloaded loader(8)
  files.
  Reviewed by: dillon
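  A hedged before/after sketch of the API change (pmap_enter() of this era
  takes the page itself rather than its physical address):

      /* Before: pmap_enter(pmap, va, VM_PAGE_TO_PHYS(m), prot, wired); */
      pmap_enter(pmap, va, m, prot, wired);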
* peter, 1999-12-29 (1 file, -2/+2):
  Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is
  an application space macro and the applications are supposed to be free to
  use it as they please (but cannot). This is consistent with the other BSD's
  who made this change quite some time ago. More commits to come.
* dillon, 1999-12-12 (1 file, -4/+4):
  Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to
  madvise(). This feature prevents the update daemon from gratuitously
  flushing dirty pages associated with a mapped file-backed region of memory.
  The system pager will still page the memory as necessary and the VM system
  will still be fully coherent with the filesystem. Modifications made by
  other means to the same area of memory, for example by write(), are
  unaffected. The feature works on a page-granularity basis.
  MAP_NOSYNC allows one to use mmap() to share memory between processes
  without incurring any significant filesystem overhead, putting it in the
  same performance category as SysV Shared memory and anonymous memory.
  Reviewed by: julian, alc, dg
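  A hedged userland sketch (fd and len illustrative): request the no-sync
  policy at map time, or apply it to an existing mapping with madvise(). The
  VM system still pages the memory out as needed and stays coherent with the
  filesystem, per the commit text above.

      #include <sys/mman.h>

      void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
          MAP_SHARED | MAP_NOSYNC, fd, 0);
      /* or, after the fact: */
      madvise(p, len, MADV_NOSYNC);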
* alc, 1999-10-30 (1 file, -16/+9):
  The core of this patch is to vm/vm_page.h. The effects are two-fold: (1) to
  eliminate an extra (useless) level of indirection in half of the page queue
  accesses and (2) to use a single name for each queue throughout, instead
  of, e.g., "vm_page_queue_active" in some places and
  "vm_page_queues[PQ_ACTIVE]" in others.
  Reviewed by: dillon
* dillon, 1999-09-17 (1 file, -1/+3):
  Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
  Replace various VM related page count calculations strewn over the VM code
  with inlines to aid in readability and to reduce fragility in the code
  where modules depend on the same test being performed to properly sleep and
  wakeup.
  Split out a portion of the page deactivation code into an inline in
  vm_page.c to support vm_page_dontneed().
  Add vm_page_dontneed(), which handles the madvise MADV_DONTNEED feature in
  a related commit coming up for vm_map.c/vm_object.c. This code prevents
  degenerate cases where an essentially active page may be rotated through a
  subset of the paging lists, resulting in premature disposal.
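  A hedged sketch of the kind of inline this introduced: one shared
  definition of "free memory is critically low" instead of each module
  comparing the vmmeter counters its own way. The exact counters and
  expression are illustrative, not the commit's literal code.

      static __inline int
      vm_page_count_severe(void)
      {
              return (cnt.v_free_severe > cnt.v_free_count + cnt.v_cache_count);
      }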
* peter, 1999-08-28 (1 file, -1/+1):
  $Id$ -> $FreeBSD$
* green, 1999-08-17 (1 file, -3/+5):
  Unbreak the nfs KLD_MODULE. It needs a bit more of vm_page.h than was
  exported (notably vm_page_undirty()). Also, let vm_page_dirty() work in a
  KLD.
* alc, 1999-08-17 (1 file, -1/+13):
  Add the (inline) function vm_page_undirty for clearing the dirty bitmask of
  a vm_page. Use it.
  Submitted by: dillon
* alc, 1999-08-15 (1 file, -2/+2):
  contigmalloc1 (currently) depends on PQ_FREE and PQ_CACHE not being 0 to
  tell a valid "struct vm_page" from an invalid one in the vm_page_array.
  This isn't a very robust method.