summaryrefslogtreecommitdiffstats
path: root/sys/vm
Commit message (Collapse)AuthorAgeFilesLines
* Add a new file descriptor type for IPC shared memory objects and use it tojhb2008-01-081-3/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | implement shm_open(2) and shm_unlink(2) in the kernel: - Each shared memory file descriptor is associated with a swap-backed vm object which provides the backing store. Each descriptor starts off with a size of zero, but the size can be altered via ftruncate(2). The shared memory file descriptors also support fstat(2). read(2), write(2), ioctl(2), select(2), poll(2), and kevent(2) are not supported on shared memory file descriptors. - shm_open(2) and shm_unlink(2) are now implemented as system calls that manage shared memory file descriptors. The virtual namespace that maps pathnames to shared memory file descriptors is implemented as a hash table where the hash key is generated via the 32-bit Fowler/Noll/Vo hash of the pathname. - As an extension, the constant 'SHM_ANON' may be specified in place of the path argument to shm_open(2). In this case, an unnamed shared memory file descriptor will be created similar to the IPC_PRIVATE key for shmget(2). Note that the shared memory object can still be shared among processes by sharing the file descriptor via fork(2) or sendmsg(2), but it is unnamed. This effectively serves to implement the getmemfd() idea bandied about the lists several times over the years. - The backing store for shared memory file descriptors are garbage collected when they are not referenced by any open file descriptors or the shm_open(2) virtual namespace. Submitted by: dillon, peter (previous versions) Submitted by: rwatson (I based this on his version) Reviewed by: alc (suggested converting getmemfd() to shm_open())
* When MAC is enabled in the kernel, fix a panic triggered by a lockingcsjp2008-01-081-8/+8
| | | | | | | | | | | | | | assertion hit in swapoff_one() when we un-mount a swap partition. We should be using curthread where we used thread0 before. This change also replaces the thread argument with a credential argument, as the MAC framework only requires the cred. It should be noted that this allows the machine to be rebooted without panicing with "cannot differ from curthread or NULL" when MAC is enabled. Submitted by: rwatson Reviewed by: attilio MFC after: 2 weeks
* In the vm_map_stack(), check for the specified stack region wraparound.kib2008-01-041-1/+3
| | | | | | Reported and tested by: Peter Holm Reviewed by: alc MFC after: 3 days
* Add an access type parameter to pmap_enter(). It will be used to implementalc2008-01-033-6/+8
| | | | | | | superpage promotion. Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is a Boolean.
* Defer setting either PG_CACHED or PG_FREE until after the free pagealc2008-01-021-2/+2
| | | | | | | | queues lock is acquired. Otherwise, the state of a reservation's pages' flags and its population count can be inconsistent. That could result in a page being freed twice. Reported by: kris
* Correct a style error that was introduced in revision 1.77.alc2008-01-011-1/+1
|
* Add the superpage reservation system. This is "part 2 of 2" of thealc2007-12-295-4/+834
| | | | | | | | | | | | machine-independent support for superpages. (The earlier part was the rewrite of the physical memory allocator.) The remainder of the code required for superpages support is machine-dependent and will be added to the various pmap implementations at a later date. Initially, I am only supporting one large page size per architecture. Moreover, I am only enabling the reservation system on amd64. (In an emergency, it can be disabled by setting VM_NRESERVLEVELS to 0 in amd64/include/vmparam.h or your kernel configuration file.)
* Add a list of reservations to the vm object structure.alc2007-12-271-0/+2
| | | | | | | | Recycle the vm object's "pg_color" field to represent the color of the first virtual page address at which the object is mapped instead of the color of the object's first physical page. Since an object may not be mapped, introduce a flag "OBJ_COLORED" that indicates whether "pg_color" is valid.
* Add the superpage reservation type.alc2007-12-271-0/+3
|
* Update the comment describing vm_phys_unfree_page().alc2007-12-211-1/+3
|
* Modify vm_phys_unfree_page() so that it no longer requires the givenalc2007-12-203-10/+19
| | | | | | | | | | page to be in the free lists. Instead, it now returns TRUE if it removed the page from the free lists and FALSE if the page was not in the free lists. This change is required to support superpage reservations. Specifically, once reservations are introduced, a cached page can either be in the free lists or a reservation.
* Correct one half of a loop continuation condition in vm_phys_unfree_page().alc2007-12-191-3/+1
| | | | | At present, this error is inconsequential; the other half of the loop continuation condition is sufficient to achieve correct execution.
* Eliminate redundant code from vm_page_startup().alc2007-12-191-17/+0
|
* Simplify vm_page_free_toq().alc2007-12-111-5/+3
|
* Correct a comment.alc2007-12-021-1/+1
|
* Modify stack(9) stack_print() and stack_sbuf_print() routines to use newrwatson2007-12-011-4/+4
| | | | | | | | | | | | | | | | linker interfaces for looking up function names and offsets from instruction pointers. Create two variants of each call: one that is "DDB-safe" and avoids locking in the linker, and one that is safe for use in live kernels, by virtue of observing locking, and in particular safe when kernel modules are being loaded and unloaded simultaneous to their use. This will allow them to be used outside of debugging contexts. Modify two of three current stack(9) consumers to use the DDB-safe interfaces, as they run in low-level debugging contexts, such as inside lockmgr(9) and the kernel memory allocator. Update man page.
* Make contigmalloc(9)'s page laundering more robust. Specifically, usealc2007-11-253-4/+8
| | | | | | | | | vm_pageout_fallback_object_lock() in vm_contig_launder_page() to better handle a lock-ordering problem. Consequently, trylock's failure on the page's containing object no longer implies that the page cannot be laundered. MFC after: 6 weeks
* Tidy up: Add comments. Eliminate the pointlessalc2007-11-251-19/+23
| | | | | | | malloc_type_allocated(..., 0) calls that occur when contigmalloc() has failed. Eliminate the acquisition and release of the page queues lock from vm_page_release_contig(). Rename contigmalloc2() to contigmapping(), reflecting what it does.
* Add a read/write sysctl for reconfiguring the maximum number of physicalalc2007-11-231-0/+2
| | | | | | | | pages that can be wired. Submitted by: Eugene Grosbein PR: 114654 MFC after: 6 weeks
* Remove an unnecessary call to pmap_remove_all() and the associated "XXX"alc2007-11-221-15/+1
| | | | | | | | comments from vnode_pager_setsize(). This call was introduced in revision 1.140 to address a problem that no longer exists. Specifically, pmap_zero_page_area() has replaced a (possibly) problematic implementation of page zeroing that was based on vm_pager_map(), bzero(), and vm_pager_unmap().
* When reactivating a cached page, reset the page's pool to the defaultalc2007-11-211-0/+1
| | | | | pool. (Not doing this before was a performance pessimization but not a cause for panic.)
* Prevent the leakage of wired pages in the following circumstances:alc2007-11-172-1/+14
| | | | | | | | | | | | | | | | | | | | | | First, a file is mmap(2)ed and then mlock(2)ed. Later, it is truncated. Under "normal" circumstances, i.e., when the file is not mlock(2)ed, the pages beyond the EOF are unmapped and freed. However, when the file is mlock(2)ed, the pages beyond the EOF are unmapped but not freed because they have a non-zero wire count. This can be a mistake. Specifically, it is a mistake if the sole reason why the pages are wired is because of wired, managed mappings. Previously, unmapping the pages destroys these wired, managed mappings, but does not reduce the pages' wire count. Consequently, when the file is unmapped, the pages are not unwired because the wired mapping has been destroyed. Moreover, when the vm object is finally destroyed, the pages are leaked because they are still wired. The fix is to reduce the pages' wired count by the number of wired, managed mappings destroyed. To do this, I introduce a new pmap function pmap_page_wired_mappings() that returns the number of managed mappings to the given physical page that are wired, and I use this function in vm_object_page_remove(). Reviewed by: tegge MFC after: 6 weeks
* Change unused 'user_wait' argument to 'timo' argument, which will bepjd2007-11-073-7/+7
| | | | | | | used to specify timeout for msleep(9). Discussed with: alc Reviewed by: alc
* Fix for the panic("vm_thread_new: kstack allocation failed") andkib2007-11-054-23/+51
| | | | | | | | | | | | | | | | | | | | silent NULL pointer dereference in the i386 and sparc64 pmap_pinit() when the kmem_alloc_nofault() failed to allocate address space. Both functions now return error instead of panicing or dereferencing NULL. As consequence, vmspace_exec() and vmspace_unshare() returns the errno int. struct vmspace arg was added to vm_forkproc() to avoid dealing with failed allocation when most of the fork1() job is already done. The kernel stack for the thread is now set up in the thread_alloc(), that itself may return NULL. Also, allocation of the first process thread is performed in the fork1() to properly deal with stack allocation failure. proc_linkup() is separated into proc_linkup() called from fork1(), and proc_linkup0(), that is used to set up the kernel process (was known as swapper). In collaboration with: Peter Holm Reviewed by: jhb
* The intent of the freeing the (zeroed) page in vm_page_cache() forkib2007-11-051-2/+5
| | | | | | | | | | | | | | | | | default object rather than cache it was to have vm_pager_has_page(object, pindex, ...) == FALSE to imply that there is no cached page in object at pindex. This allows to avoid explicit checks for cached pages in vm_object_backing_scan(). For now, we need the same bandaid for the swap object, otherwise both the vm_page_lookup() and the pager can report that there is no page at offset, while page is stored in the cache. Also, this fixes another instance of the KASSERT("object type is incompatible") failure in the vm_page_cache_transfer(). Reported and tested by: Peter Holm Reviewed by: alc MFC after: 3 days
* o Fix panic message: it's swap_pager_putpages() not swap_pager_getpages().maxim2007-11-021-1/+1
| | | | Submitted by: Mark Tinguely
* Correct a copy and paste'o in phys_pager.c, we are talking about phys hereremko2007-10-301-1/+1
| | | | | | | and not about devices. PR: 93755 Approved by: imp (mentor, implicit when re-assigning the ticket to me).
* Change vm_page_cache_transfer() such that it does not transfer pagesalc2007-10-271-10/+20
| | | | | | | | | that would have an offset beyond the end of the target object. Such pages should remain in the source object. MFC after: 3 days Diagnosed and reviewed by: Kostik Belousov Reported and tested by: Peter Holm
* Merge first in a series of TrustedBSD MAC Framework KPI changesrwatson2007-10-242-3/+3
| | | | | | | | | | | | | | | | | | | | | | | from Mac OS X Leopard--rationalize naming for entry points to the following general forms: mac_<object>_<method/action> mac_<object>_check_<method/action> The previous naming scheme was inconsistent and mostly reversed from the new scheme. Also, make object types more consistent and remove spaces from object types that contain multiple parts ("posix_sem" -> "posixsem") to make mechanical parsing easier. Introduce a new "netinet" object type for certain IPv4/IPv6-related methods. Also simplify, slightly, some entry point names. All MAC policy modules will need to be recompiled, and modules not updates as part of this commit will need to be modified to conform to the new KPI. Sponsored by: SPARTA (original patches against Mac OS X) Obtained from: TrustedBSD Project, Apple Computer
* Correct an error of omission in the reimplementation of the pagealc2007-10-221-0/+4
| | | | | | | | | | cache: vnode_pager_setsize() must handle the case where a file is truncated to a non-page-size-aligned boundary and there is a cached page underlying the new end of file. Reported by: kris, tegge Tested by: kris MFC after: 3 days
* Correct an error in vm_map_sync(), nee vm_map_clean(), that has existedalc2007-10-221-2/+4
| | | | | | | | | | | | since revision 1.1. Specifically, neither traversal of the vm map checks whether the end of the vm map has been reached. Consequently, the first traversal can wrap around and bogusly return an error. This error has gone unnoticed for so long because no one had ever before tried msync(2)ing a region above the stack. Reported by: peter MFC after: 1 week
* Rename the kthread_xxx (e.g. kthread_create()) callsjulian2007-10-201-1/+1
| | | | | | | | | | | to kproc_xxx as they actually make whole processes. Thos makes way for us to add REAL kthread_create() and friends that actually make theads. it turns out that most of these calls actually end up being moved back to the thread version when it's added. but we need to make this cosmetic change first. I'd LOVE to do this rename in 7.0 so that we can eventually MFC the new kthread_xxx() calls.
* The previous revision, updating vm_object_page_remove() for the new pagealc2007-10-181-1/+2
| | | | | | | | | cache, did not account for the case where the vm object has nothing but cached pages. Reported by: kris, tegge Reviewed by: tegge MFC after: 3 days
* Fix cosmetic bug in stale copy of msync_args. 'len' is size_t, not int.peter2007-10-181-1/+1
|
* Fix CTL_VM_NAMES.ru2007-10-161-8/+8
|
* Allow recursion on the 'zones' internal UMA zone.jhb2007-10-111-1/+1
| | | | | | | Submitted by: thompsa MFC after: 1 week Approved by: re (kensmith) Discussed with: jeff
* Do not dereference NULL pointer.kib2007-10-081-3/+2
| | | | | | Reported by: Peter Holm Reviewed by: alc Approved by: re (kensmith)
* In the rare case that vm_page_cache() actually frees the given page,alc2007-10-081-10/+3
| | | | | | | | | | | it must first ensure that the page is no longer mapped. This is trivially accomplished by calling pmap_remove_all() a little earlier in vm_page_cache(). While I'm in the neighborbood, make a related panic message a little more useful. Approved by: re (kensmith) Reported by: Peter Holm and Konstantin Belousov Reviewed by: Konstantin Belousov
* Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that isalc2007-10-071-1/+1
| | | | | | | | | | | | | a consequence of sparc64/sparc64/vm_machdep.c revision 1.76. It occurs when uma_small_free() frees a page. The solution has two parts: (1) Mark pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED. (2) Defer the lock assertion in pmap_page_is_mapped() until after PG_UNMANAGED is tested. This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable flags, i.e., they do not change state between the time that a page is allocated and freed. Approved by: re (kensmith) PR: 116794
* Correct an error of omission in the reimplementation of the pagealc2007-09-273-18/+48
| | | | | | | | | | | | | | cache: vm_object_page_remove() should convert any cached pages that fall with the specified range to free pages. Otherwise, there could be a problem if a file is first truncated and then regrown. Specifically, some old data from prior to the truncation might reappear. Generalize vm_page_cache_free() to support the conversion of either a subset or the entirety of an object's cached pages. Reported by: tegge Reviewed by: tegge Approved by: re (kensmith)
* Correct an error in the previous revision, specifically,alc2007-09-251-1/+2
| | | | | | | | vm_object_madvise() should request that the reactivated, cached page not be busied. Reported by: Rink Springer Approved by: re (kensmith)
* Change the management of cached pages (PQ_CACHE) in two fundamentalalc2007-09-2511-252/+449
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ways: (1) Cached pages are no longer kept in the object's resident page splay tree and memq. Instead, they are kept in a separate per-object splay tree of cached pages. However, access to this new per-object splay tree is synchronized by the _free_ page queues lock, not to be confused with the heavily contended page queues lock. Consequently, a cached page can be reclaimed by vm_page_alloc(9) without acquiring the object's lock or the page queues lock. This solves a problem independently reported by tegge@ and Isilon. Specifically, they observed the page daemon consuming a great deal of CPU time because of pages bouncing back and forth between the cache queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of this problem turned out to be a deadlock avoidance strategy employed when selecting a cached page to reclaim in vm_page_select_cache(). However, the root cause was really that reclaiming a cached page required the acquisition of an object lock while the page queues lock was already held. Thus, this change addresses the problem at its root, by eliminating the need to acquire the object's lock. Moreover, keeping cached pages in the object's primary splay tree and memq was, in effect, optimizing for the uncommon case. Cached pages are reclaimed far, far more often than they are reactivated. Instead, this change makes reclamation cheaper, especially in terms of synchronization overhead, and reactivation more expensive, because reactivated pages will have to be reentered into the object's primary splay tree and memq. (2) Cached pages are now stored alongside free pages in the physical memory allocator's buddy queues, increasing the likelihood that large allocations of contiguous physical memory (i.e., superpages) will succeed. Finally, as a result of this change long-standing restrictions on when and where a cached page can be reclaimed and returned by vm_page_alloc(9) are eliminated. Specifically, calls to vm_page_alloc(9) specifying VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page. Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to fail. Discussed with: many over the course of the summer, including jeff@, Justin Husted @ Isilon, peter@, tegge@ Tested by: an earlier version by kris@ Approved by: re (kensmith)
* - Redefine p_swtime and td_slptime as p_swtick and td_slptick. Thisjeff2007-09-211-9/+14
| | | | | | | | | | | | changes the units from seconds to the value of 'ticks' when swapped in/out. ULE does not have a periodic timer that scans all threads in the system and as such maintaining a per-second counter is difficult. - Change computations requiring the unit in seconds to subtract ticks and divide by hz. This does make the wraparound condition hz times more frequent but this is still in the range of several months to years and the adverse effects are minimal. Approved by: re
* - Move all of the PS_ flags into either p_flag or td_flags.jeff2007-09-172-67/+89
| | | | | | | | | | | | | | - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK or previously the sched_lock. These bugs have existed for some time. - Allow swapout to try each thread in a process individually and then swapin the whole process if any of these fail. This allows us to move most scheduler related swap flags into td_flags. - Keep ki_sflag for backwards compat but change all in source tools to use the new and more correct location of P_INMEM. Reported by: pho Reviewed by: attilio, kib Approved by: re (kensmith)
* Correct an assertion in vm_pageout_flush(). Specifically, if a page'salc2007-09-151-1/+2
| | | | | | | | | | | status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have already been recycled, i.e., freed and reallocated to a new purpose; thus, asserting that such pages cannot be written is inappropriate. Reported by: kris Submitted by: tegge Approved by: re (kensmith) MFC after: 1 week
* Do not drop vm_map lock between doing vm_map_remove() and vm_map_insert().kib2007-08-203-18/+40
| | | | | | | | | | | For this, introduce vm_map_fixed() that does that for MAP_FIXED case. Dropping the lock allowed for parallel thread to occupy the freed space. Reported by: Tijl Coosemans <tijl ulyssis org> Reviewed by: alc Approved by: re (kensmith) MFC after: 2 weeks
* Remove comment that is no longer quite true.kib2007-08-181-3/+0
| | | | | Noted by: alc Approved by: re (kensmith)
* Fix the phys_pager in the way similar to the rev. 1.83 of thekib2007-08-181-22/+25
| | | | | | | | | | | | | | sys/vm/device_pager.c: Protect the creation of the phys pager with non-NULL handle with the phys_pager_mtx. Lookup of phys pager in the pagers list by handle is now synchronized with its removal from the list, and phys_pager_mtx is put before vm object lock in lock order. Dispose the phys_pager_alloc_lock and tsleep calls, together with acquiring Giant, since phys_pager_mtx now covers the same block. Reviewed by: alc Approved by: re (kensmith)
* Protect the creation of the device pager with the dev_pager_mtx. Lookupkib2007-08-071-12/+24
| | | | | | | | | | | of device pager in the pagers list by handle is now synchronized with its removal from the list, and dev_pager_mtx is put before vm object lock in lock order. Dispose the dev_pager_sx lock, since dev_pager_mtx now covers the same block. Noted by: kensmith Reviewed by: alc Approved by: re (kensmith)
* Consider a scenario in which one processor, call it Pt, is performingalc2007-08-054-18/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vm_object_terminate() on a device-backed object at the same time that another processor, call it Pa, is performing dev_pager_alloc() on the same device. The problem is that vm_pager_object_lookup() should not be allowed to return a doomed object, i.e., an object with OBJ_DEAD set, but it does. In detail, the unfortunate sequence of events is: Pt in vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls vm_pager_object_lookup(), which returns the doomed object. Next, Pa calls vm_object_reference(), which requires the doomed object's lock, so Pa waits for Pt to release the doomed object's lock. Pt proceeds to the point in vm_object_terminate() where it releases the doomed object's lock. Pa is now able to complete vm_object_reference() because it can now complete the acquisition of the doomed object's lock. So, now the doomed object has a reference count of one! Pa releases dev_pager_sx and returns the doomed object from dev_pager_alloc(). Pt now acquires dev_pager_mtx, removes the doomed object from dev_pager_object_list, releases dev_pager_mtx, and finally calls uma_zfree with the doomed object. However, the doomed object is still in use by Pa. Repeating my key point, vm_pager_object_lookup() must not return a doomed object. Moreover, the test for the object's state, i.e., doomed or not, and the increment of the object's reference count should be carried out atomically. Reviewed by: kib Approved by: re (kensmith) MFC after: 3 weeks
OpenPOWER on IntegriCloud