path: root/sys/vm
* Allow recursion on the 'zones' internal UMA zone.
  Author: jhb   Date: 2007-10-11   Files: 1   Lines: -1/+1
  Submitted by: thompsa
  MFC after: 1 week
  Approved by: re (kensmith)
  Discussed with: jeff
* Do not dereference a NULL pointer.
  Author: kib   Date: 2007-10-08   Files: 1   Lines: -3/+2
  Reported by: Peter Holm
  Reviewed by: alc
  Approved by: re (kensmith)
* In the rare case that vm_page_cache() actually frees the given page, it
  must first ensure that the page is no longer mapped.
  Author: alc   Date: 2007-10-08   Files: 1   Lines: -10/+3
  This is trivially accomplished by calling pmap_remove_all() a little
  earlier in vm_page_cache(). While I'm in the neighborhood, make a
  related panic message a little more useful.
  Approved by: re (kensmith)
  Reported by: Peter Holm and Konstantin Belousov
  Reviewed by: Konstantin Belousov
* Correct a lock assertion failure in sparc64's pmap_page_is_mapped() that
  is a consequence of sparc64/sparc64/vm_machdep.c revision 1.76.
  Author: alc   Date: 2007-10-07   Files: 1   Lines: -1/+1
  It occurs when uma_small_free() frees a page. The solution has two parts:
  (1) Mark pages allocated with VM_ALLOC_NOOBJ as PG_UNMANAGED.
  (2) Defer the lock assertion in pmap_page_is_mapped() until after
      PG_UNMANAGED is tested.
  This is safe because both PG_UNMANAGED and PG_FICTITIOUS are immutable
  flags, i.e., they do not change state between the time that a page is
  allocated and freed.
  Approved by: re (kensmith)
  PR: 116794
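  A hedged sketch of the reordered test described in (2); the generic
  pv-list walk is an assumption standing in for sparc64's
  machine-dependent check, not the committed code:

      /*
       * Sketch only: test the immutable PG_UNMANAGED/PG_FICTITIOUS
       * flags before asserting on the page queues lock, so pages
       * freed by uma_small_free() never trip the assertion.
       */
      boolean_t
      pmap_page_is_mapped(vm_page_t m)
      {

              if ((m->flags & (PG_UNMANAGED | PG_FICTITIOUS)) != 0)
                      return (FALSE);
              mtx_assert(&vm_page_queue_mtx, MA_OWNED);
              return (!TAILQ_EMPTY(&m->md.pv_list)); /* assumed MD field */
      }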
* Correct an error of omission in the reimplementation of the page cache:
  vm_object_page_remove() should convert any cached pages that fall within
  the specified range to free pages.
  Author: alc   Date: 2007-09-27   Files: 3   Lines: -18/+48
  Otherwise, there could be a problem if a file is first truncated and then
  regrown. Specifically, some old data from prior to the truncation might
  reappear. Generalize vm_page_cache_free() to support the conversion of
  either a subset or the entirety of an object's cached pages.
  Reported by: tegge
  Reviewed by: tegge
  Approved by: re (kensmith)
* Correct an error in the previous revision, specifically,
  vm_object_madvise() should request that the reactivated, cached page not
  be busied.
  Author: alc   Date: 2007-09-25   Files: 1   Lines: -1/+2
  Reported by: Rink Springer
  Approved by: re (kensmith)
* Change the management of cached pages (PQ_CACHE) in two fundamental ways:
  Author: alc   Date: 2007-09-25   Files: 11   Lines: -252/+449
  (1) Cached pages are no longer kept in the object's resident page splay
  tree and memq. Instead, they are kept in a separate per-object splay
  tree of cached pages. However, access to this new per-object splay tree
  is synchronized by the _free_ page queues lock, not to be confused with
  the heavily contended page queues lock. Consequently, a cached page can
  be reclaimed by vm_page_alloc(9) without acquiring the object's lock or
  the page queues lock.
  This solves a problem independently reported by tegge@ and Isilon.
  Specifically, they observed the page daemon consuming a great deal of
  CPU time because of pages bouncing back and forth between the cache
  queue (PQ_CACHE) and the inactive queue (PQ_INACTIVE). The source of
  this problem turned out to be a deadlock avoidance strategy employed
  when selecting a cached page to reclaim in vm_page_select_cache().
  However, the root cause was really that reclaiming a cached page
  required the acquisition of an object lock while the page queues lock
  was already held. Thus, this change addresses the problem at its root,
  by eliminating the need to acquire the object's lock.
  Moreover, keeping cached pages in the object's primary splay tree and
  memq was, in effect, optimizing for the uncommon case. Cached pages are
  reclaimed far, far more often than they are reactivated. Instead, this
  change makes reclamation cheaper, especially in terms of synchronization
  overhead, and reactivation more expensive, because reactivated pages
  will have to be reentered into the object's primary splay tree and memq.
  (2) Cached pages are now stored alongside free pages in the physical
  memory allocator's buddy queues, increasing the likelihood that large
  allocations of contiguous physical memory (i.e., superpages) will
  succeed.
  Finally, as a result of this change, long-standing restrictions on when
  and where a cached page can be reclaimed and returned by vm_page_alloc(9)
  are eliminated. Specifically, calls to vm_page_alloc(9) specifying
  VM_ALLOC_INTERRUPT can now reclaim and return a formerly cached page.
  Consequently, a call to malloc(9) specifying M_NOWAIT is less likely to
  fail.
  Discussed with: many over the course of the summer, including jeff@,
                  Justin Husted @ Isilon, peter@, tegge@
  Tested by: an earlier version by kris@
  Approved by: re (kensmith)
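  A sketch of the lookup path that (1) makes possible; the object's
  "cache" field matches the commit's description, while the use of
  vm_page_splay() here is an illustrative assumption rather than the
  literal implementation:

      /*
       * Sketch: look up a cached page under the free page queues lock
       * alone; neither the object lock nor the page queues lock is
       * needed, which is the point of the per-object cache splay tree.
       */
      vm_page_t
      vm_page_cache_lookup(vm_object_t object, vm_pindex_t pindex)
      {
              vm_page_t m;

              m = NULL;
              mtx_lock(&vm_page_queue_free_mtx);
              if (object->cache != NULL) {
                      object->cache = vm_page_splay(pindex, object->cache);
                      if (object->cache->pindex == pindex)
                              m = object->cache;
              }
              mtx_unlock(&vm_page_queue_free_mtx);
              return (m);
      }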
* - Redefine p_swtime and td_slptime as p_swtick and td_slptick. This
    changes the units from seconds to the value of 'ticks' when swapped
    in/out. ULE does not have a periodic timer that scans all threads in
    the system and as such maintaining a per-second counter is difficult.
  - Change computations requiring the unit in seconds to subtract ticks
    and divide by hz. This does make the wraparound condition hz times
    more frequent but this is still in the range of several months to
    years and the adverse effects are minimal.
  Author: jeff   Date: 2007-09-21   Files: 1   Lines: -9/+14
  Approved by: re
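  A minimal sketch of the on-demand conversion (the helper is
  hypothetical; 'ticks' and 'hz' are the usual kernel globals):

      /*
       * Recover elapsed seconds from a tick stamp on demand instead of
       * maintaining a per-second counter with a periodic scan.  The
       * subtraction is wraparound-safe as long as the interval is
       * shorter than the (months-to-years) tick wrap period.
       */
      static int
      swap_elapsed_secs(struct proc *p)
      {

              return ((ticks - p->p_swtick) / hz);
      }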
* - Move all of the PS_ flags into either p_flag or td_flags.
  - p_sflag was mostly protected by PROC_LOCK rather than the PROC_SLOCK
    or previously the sched_lock. These bugs have existed for some time.
  - Allow swapout to try each thread in a process individually and then
    swapin the whole process if any of these fail. This allows us to move
    most scheduler related swap flags into td_flags.
  - Keep ki_sflag for backwards compat but change all in source tools to
    use the new and more correct location of P_INMEM.
  Author: jeff   Date: 2007-09-17   Files: 2   Lines: -67/+89
  Reported by: pho
  Reviewed by: attilio, kib
  Approved by: re (kensmith)
* Correct an assertion in vm_pageout_flush(). Specifically, if a page's
  status after vm_pager_put_pages() is VM_PAGER_PEND, then it could have
  already been recycled, i.e., freed and reallocated to a new purpose;
  thus, asserting that such pages cannot be written is inappropriate.
  Author: alc   Date: 2007-09-15   Files: 1   Lines: -1/+2
  Reported by: kris
  Submitted by: tegge
  Approved by: re (kensmith)
  MFC after: 1 week
* Do not drop the vm_map lock between doing vm_map_remove() and
  vm_map_insert(). For this, introduce vm_map_fixed() that does that for
  the MAP_FIXED case. Dropping the lock allowed a parallel thread to
  occupy the freed space.
  Author: kib   Date: 2007-08-20   Files: 3   Lines: -18/+40
  Reported by: Tijl Coosemans <tijl ulyssis org>
  Reviewed by: alc
  Approved by: re (kensmith)
  MFC after: 2 weeks
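  A sketch of the vm_map_fixed() idea described above (the exact
  signature is an assumption): removal and insertion happen under a
  single hold of the map lock.

      int
      vm_map_fixed(vm_map_t map, vm_object_t object, vm_ooffset_t offset,
          vm_offset_t start, vm_size_t length, vm_prot_t prot,
          vm_prot_t max, int cow)
      {
              vm_offset_t end;
              int result;

              end = start + length;
              vm_map_lock(map);
              VM_MAP_RANGE_CHECK(map, start, end);
              /*
               * Free the range and reuse it without dropping the lock,
               * so no parallel thread can occupy the freed space.
               */
              (void)vm_map_delete(map, start, end);
              result = vm_map_insert(map, object, offset, start, end,
                  prot, max, cow);
              vm_map_unlock(map);
              return (result);
      }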
* Remove a comment that is no longer quite true.
  Author: kib   Date: 2007-08-18   Files: 1   Lines: -3/+0
  Noted by: alc
  Approved by: re (kensmith)
* Fix the phys_pager in a way similar to rev. 1.83 of
  sys/vm/device_pager.c:
  Author: kib   Date: 2007-08-18   Files: 1   Lines: -22/+25
  Protect the creation of a phys pager with a non-NULL handle with the
  phys_pager_mtx. Lookup of a phys pager in the pagers list by handle is
  now synchronized with its removal from the list, and phys_pager_mtx is
  put before the vm object lock in the lock order. Dispose of the
  phys_pager_alloc_lock and tsleep calls, together with acquiring Giant,
  since phys_pager_mtx now covers the same block.
  Reviewed by: alc
  Approved by: re (kensmith)
* Protect the creation of the device pager with the dev_pager_mtx.
  Author: kib   Date: 2007-08-07   Files: 1   Lines: -12/+24
  Lookup of a device pager in the pagers list by handle is now
  synchronized with its removal from the list, and dev_pager_mtx is put
  before the vm object lock in the lock order. Dispose of the
  dev_pager_sx lock, since dev_pager_mtx now covers the same block.
  Noted by: kensmith
  Reviewed by: alc
  Approved by: re (kensmith)
* Consider a scenario in which one processor, call it Pt, is performing
  vm_object_terminate() on a device-backed object at the same time that
  another processor, call it Pa, is performing dev_pager_alloc() on the
  same device. The problem is that vm_pager_object_lookup() should not be
  allowed to return a doomed object, i.e., an object with OBJ_DEAD set,
  but it does.
  Author: alc   Date: 2007-08-05   Files: 4   Lines: -18/+20
  In detail, the unfortunate sequence of events is: Pt in
  vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD
  on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls
  vm_pager_object_lookup(), which returns the doomed object. Next, Pa
  calls vm_object_reference(), which requires the doomed object's lock,
  so Pa waits for Pt to release the doomed object's lock. Pt proceeds to
  the point in vm_object_terminate() where it releases the doomed
  object's lock. Pa is now able to complete vm_object_reference() because
  it can now complete the acquisition of the doomed object's lock. So,
  now the doomed object has a reference count of one! Pa releases
  dev_pager_sx and returns the doomed object from dev_pager_alloc(). Pt
  now acquires dev_pager_mtx, removes the doomed object from
  dev_pager_object_list, releases dev_pager_mtx, and finally calls
  uma_zfree with the doomed object. However, the doomed object is still
  in use by Pa.
  Repeating my key point, vm_pager_object_lookup() must not return a
  doomed object. Moreover, the test for the object's state, i.e., doomed
  or not, and the increment of the object's reference count should be
  carried out atomically.
  Reviewed by: kib
  Approved by: re (kensmith)
  MFC after: 3 weeks
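  A sketch of the atomic lookup-and-reference pattern the commit calls
  for; the list walk is simplified and assumes the caller holds the pager
  list mutex, which precedes the object lock in the lock order:

      vm_object_t
      vm_pager_object_lookup(struct pagerlst *pg_list, void *handle)
      {
              vm_object_t object;

              TAILQ_FOREACH(object, pg_list, pager_object_list) {
                      if (object->handle == handle) {
                              VM_OBJECT_LOCK(object);
                              if ((object->flags & OBJ_DEAD) == 0) {
                                      /*
                                       * The doomed-object test and the
                                       * reference bump are atomic with
                                       * respect to the object lock.
                                       */
                                      vm_object_reference_locked(object);
                                      VM_OBJECT_UNLOCK(object);
                                      break;
                              }
                              /* Never return a doomed object. */
                              VM_OBJECT_UNLOCK(object);
                      }
              }
              return (object);
      }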
* Do not acquire Giant unconditionally around the calls to the cdevsw
  d_mmap methods. prep_cdevsw() already installs the shims that
  acquire/drop Giant for the methods of a driver that specified the
  D_NEEDGIANT flag.
  Author: kib   Date: 2007-08-05   Files: 1   Lines: -5/+0
  Reviewed by: alc
  Approved by: re (kensmith)
* Add a counter for the total number of pages cached and support for
  reporting the value of this counter in the program "vmstat".
  Author: alc   Date: 2007-07-27   Files: 2   Lines: -0/+3
  Approved by: re (rwatson)
* When we do open, we should lock the vnode exclusively. This fixes a few
  races:
  Author: pjd   Date: 2007-07-26   Files: 1   Lines: -3/+3
  - fifo race, where two threads assign v_fifoinfo,
  - v_writecount modifications,
  - v_object modifications,
  - and probably more...
  Discussed with: kib, ups
  Approved by: re (rwatson)
* Two changes to vm_fault_additional_pages():
  Author: alc   Date: 2007-07-20   Files: 1   Lines: -19/+11
  1. Rewrite the backward scan. Specifically, reverse the order in which
     pages are allocated so that upon failure it is never necessary to
     free pages that were just allocated. Moreover, any allocated pages
     can be put to use. This makes the backward scan behave just like the
     forward scan.
  2. Eliminate an explicit, unsynchronized check for low memory before
     calling vm_page_alloc(). It serves no useful purpose. It is, in
     effect, optimizing the uncommon case at the expense of the common
     case.
  Approved by: re (hrs)
  MFC after: 3 weeks
* Eliminate two unused functions: vm_phys_alloc_pages() and
  vm_phys_free_pages(). Rename vm_phys_alloc_pages_locked() to
  vm_phys_alloc_pages() and vm_phys_free_pages_locked() to
  vm_phys_free_pages(). Add comments regarding the need for the free page
  queues lock to be held by callers to these functions. No functional
  changes.
  Author: alc   Date: 2007-07-14   Files: 3   Lines: -40/+17
  Approved by: re (hrs)
* Eliminate dead code, specifically, an unused sysctl: "vm.idlezero_maxrun".
  Author: alc   Date: 2007-07-14   Files: 1   Lines: -4/+0
  Approved by: re (hrs)
* Update a comment describing the page queues.
  Author: alc   Date: 2007-07-13   Files: 1   Lines: -6/+7
  Approved by: re (hrs)
* Eliminate dead code.
  Author: alc   Date: 2007-07-12   Files: 1   Lines: -10/+0
  Approved by: re (hrs)
* Correct a problem in the ZERO_COPY_SOCKETS option, specifically, in
  vm_page_cowfault().
  Author: alc   Date: 2007-07-10   Files: 1   Lines: -2/+22
  Initially, if vm_page_cowfault() sleeps, the given page is wired,
  preventing it from being recycled. However, when transmission of the
  page completes, the page is unwired and returned to the page queues. At
  that point, the page is not in any special state that prevents it from
  being recycled. Consequently, vm_page_cowfault() should verify that the
  page is still held by the same vm object before retrying the
  replacement of the page. Note: The containing object is, however, safe
  from being recycled by virtue of having a non-zero paging-in-progress
  count.
  While I'm here, add some assertions and comments.
  Approved by: re (rwatson)
  MFC after: 3 weeks
* Eliminate the special case handling of OBJT_DEVICE objects in
  vm_fault_additional_pages() that was introduced in revision 1.47.
  Author: alc   Date: 2007-07-08   Files: 1   Lines: -10/+0
  Then as now, it is unnecessary because dev_pager_haspage() returns zero
  for both the number of pages to read ahead and read behind, producing
  the same exact behavior by vm_fault_additional_pages() as the special
  case handling.
  Approved by: re (rwatson)
* When a cached page is reactivated in vm_fault(), update the counter
  that tracks the total number of reactivated pages. (We have not been
  counting reactivations by vm_fault() since revision 1.46.)
  Author: alc   Date: 2007-07-06   Files: 1   Lines: -8/+10
  Correct a comment in vm_fault_additional_pages().
  Approved by: re (kensmith)
  MFC after: 1 week
* Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate.
  Author: peter   Date: 2007-07-04   Files: 1   Lines: -0/+14
  Approved by: re (kensmith)
* In the previous revision, when I replaced the unconditional acquisition
  of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate
  the acquisition of the vnode interlock before releasing the vm object's
  lock because the vnode interlock cannot be held when VFS_LOCK_GIANT()
  is performed. Unfortunately, this allows the vnode to be recycled
  between the release of the vm object's lock and the vget() on the vnode.
  Author: alc   Date: 2007-07-02   Files: 1   Lines: -11/+10
  In this revision, I prevent the vnode from being recycled by acquiring
  another reference to the vm object and underlying vnode before
  releasing the vm object's lock.
  This change also addresses another preexisting but trivial problem. By
  acquiring another reference to the vm object, I also prevent the vm
  object from being recycled. Previously, the "vnodes skipped" counter
  could be wrong if it examined a recycled vm object.
  Reported by: kib
  Reviewed by: kib
  Approved by: re (kensmith)
  MFC after: 3 weeks
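  A sketch of the ordering described above (simplified; error handling is
  elided and the vget() flags are illustrative):

      static void
      pageout_object_vnode(vm_object_t object)
      {
              struct vnode *vp;
              int vfslocked;

              /* Entered with the object lock held. */
              vm_object_reference_locked(object); /* pins object + vnode */
              vp = object->handle;
              VM_OBJECT_UNLOCK(object);
              vfslocked = VFS_LOCK_GIANT(vp->v_mount);
              if (vget(vp, LK_EXCLUSIVE | LK_TIMELOCK, curthread) == 0) {
                      /* ... launder the object's dirty pages ... */
                      vput(vp);
              }
              VFS_UNLOCK_GIANT(vfslocked);
              vm_object_deallocate(object);       /* drop the extra ref */
      }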
* Eliminate the use of Giant from vm_daemon(). Replace the unconditional
  use of Giant in vm_pageout_scan() with VFS_LOCK_GIANT().
  Author: alc   Date: 2007-06-26   Files: 1   Lines: -21/+27
  Approved by: re (kensmith)
  MFC after: 3 weeks
* Eliminate GIANT_REQUIRED from swap_pager_putpages().
  Author: alc   Date: 2007-06-24   Files: 1   Lines: -1/+0
  Approved by: re (mux)
  MFC after: 1 week
* Eliminate unnecessary checks from vm_pageout_clean(): The page that is
  passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because it
  came from the inactive queue and PG_UNMANAGED pages are not in any page
  queue. Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS objects,
  and all pages within an OBJT_PHYS object are PG_UNMANAGED. So, if the
  page that is passed to vm_pageout_clean() is not PG_UNMANAGED, then it
  cannot be from an OBJT_PHYS object and its neighbors from the same
  object cannot themselves be PG_UNMANAGED.
  Author: alc   Date: 2007-06-18   Files: 1   Lines: -7/+4
  Reviewed by: tegge
* Don't declare inline a function which isn't.
  Author: mjacob   Date: 2007-06-17   Files: 1   Lines: -1/+1
* Make sure object is NULL - there is a possible case where you could
  fall through to it being used without being set. Put a break in the
  default case.
  Author: mjacob   Date: 2007-06-17   Files: 1   Lines: -1/+2
* Initialize reqpage to zero.
  Author: mjacob   Date: 2007-06-17   Files: 1   Lines: -1/+1
* If attempting to cache a "busy" page, panic instead of printing a
  diagnostic message and returning.
  Author: alc   Date: 2007-06-16   Files: 1   Lines: -2/+1
* Update a comment.
  Author: alc   Date: 2007-06-16   Files: 1   Lines: -7/+7
* Enable the new physical memory allocator.
  Author: alc   Date: 2007-06-16   Files: 7   Lines: -727/+120
  This allocator uses a binary buddy system with a twist. First and
  foremost, this allocator is required to support the implementation of
  superpages. As a side effect, it enables a more robust implementation
  of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9)
  eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...).
  The twist is that this allocator tries to reduce the number of TLB
  misses incurred by accesses through a direct map to small, UMA-managed
  objects and page table pages. Roughly speaking, the physical pages that
  are allocated for such purposes are clustered together in the physical
  address space. The performance benefits vary. In the most extreme
  case, a uniprocessor kernel running on an Opteron, I measured an 18%
  reduction in system time during a buildworld.
  This allocator does not implement page coloring. The reason is that
  superpages have much the same effect. The contiguous physical memory
  allocation necessary for a superpage is inherently colored.
  Finally, the one caveat is that this allocator does not effectively
  support prezeroed pages. I hope this is temporary. On i386, this is a
  slight pessimization. However, on amd64, the beneficial effects of the
  direct-map optimization outweigh the ill effects. I speculate that this
  is true in general of machines with a direct map.
  Approved by: re
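  To illustrate the binary buddy mechanism, a generic sketch (not the
  vm_phys code; the pool structure and helper names are hypothetical):

      /*
       * Free a block of order 'oind' at block index 'index',
       * coalescing with its buddy while the buddy is also free.
       * Buddies of a block of order k differ from it only in bit k
       * of the block index.
       */
      static void
      buddy_free(struct buddy_pool *pool, size_t index, int oind)
      {
              size_t buddy;

              while (oind < pool->max_order) {
                      buddy = index ^ ((size_t)1 << oind);
                      if (!block_is_free(pool, buddy, oind))
                              break;
                      block_remove(pool, buddy, oind); /* unlink buddy */
                      index &= buddy;                  /* merged index */
                      oind++;                          /* double size */
              }
              block_insert(pool, index, oind);         /* final block */
      }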
* Eliminate dead code: We have not performed pageouts on the kernel
  object in this millennium.
  Author: alc   Date: 2007-06-13   Files: 1   Lines: -3/+1
* Conditionally acquire Giant in vm_contig_launder_page().
  Author: alc   Date: 2007-06-11   Files: 1   Lines: -0/+4
* Optimize vmmeter locking.
  Author: attilio   Date: 2007-06-10   Files: 4   Lines: -20/+10
  In particular:
  - Add an explanatory table for the locking of struct vmmeter members
  - Apply new rules for some of those members
  - Remove some useless comments
  Heavily reviewed by: alc, bde, jeff
  Approved by: jeff (mentor)
* Add a new physical memory allocator. However, do not yet connect it to
  the build.
  Author: alc   Date: 2007-06-10   Files: 2   Lines: -0/+741
  This allocator uses a binary buddy system with a twist. First and
  foremost, this allocator is required to support the implementation of
  superpages. As a side effect, it enables a more robust implementation
  of contigmalloc(9). Moreover, this reimplementation of contigmalloc(9)
  eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...).
  The twist is that this allocator tries to reduce the number of TLB
  misses incurred by accesses through a direct map to small, UMA-managed
  objects and page table pages. Roughly speaking, the physical pages that
  are allocated for such purposes are clustered together in the physical
  address space. The performance benefits vary. In the most extreme
  case, a uniprocessor kernel running on an Opteron, I measured an 18%
  reduction in system time during a buildworld.
  This allocator does not implement page coloring. The reason is that
  superpages have much the same effect. The contiguous physical memory
  allocation necessary for a superpage is inherently colored.
  Finally, the one caveat is that this allocator does not effectively
  support prezeroed pages. I hope this is temporary. On i386, this is a
  slight pessimization. However, on amd64, the beneficial effects of the
  direct-map optimization outweigh the ill effects. I speculate that this
  is true in general of machines with a direct map.
  Approved by: re
* Commit 14/14 of sched_lock decomposition.
  Author: jeff   Date: 2007-06-05   Files: 4   Lines: -41/+63
  - Use thread_lock() rather than sched_lock for per-thread scheduling
    synchronization.
  - Use the per-process spinlock rather than the sched_lock for
    per-process scheduling synchronization.
  Tested by: kris, current@
  Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
  Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
* Do proper "locking" for missing vmmeters part.attilio2007-06-045-23/+33
| | | | | | | | Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)
* Rework the PCPU_* (MD) interface:
  Author: attilio   Date: 2007-06-04   Files: 1   Lines: -6/+6
  - Rename PCPU_LAZY_INC to PCPU_INC
  - Add the PCPU_ADD interface, which just does an add on the pcpu member
    given a specific value.
  Note that for most architectures PCPU_INC and PCPU_ADD are not safe.
  This is a point that needs some discussion/work in the coming days.
  Reviewed by: alc, bde
  Approved by: jeff (mentor)
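  A usage sketch of the renamed interface (the counter fields shown are
  illustrative):

      static void
      account_events(int n)
      {

              PCPU_INC(cnt.v_trap);     /* formerly PCPU_LAZY_INC() */
              PCPU_ADD(cnt.v_intr, n);  /* new: add an arbitrary value */
      }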
* - Move rusage from being per-process in struct pstats to per-thread in
    td_ru. This removes the requirement for per-process synchronization
    in statclock() and mi_switch(). This was previously supported by
    sched_lock which is going away. All modifications to rusage are now
    done in the context of the owning thread. Reads proceed without
    locks.
  - Aggregate exiting threads' rusage in thread_exit() such that the
    exiting thread's rusage is not lost.
  - Provide a new routine, rufetch(), to fetch an aggregate of all rusage
    structures from all threads in a process. This routine must be used
    in any place requiring a rusage from a process prior to its exit.
    The exited process's rusage is still available via p_ru.
  - Aggregate tick statistics only on demand via rufetch() or when a
    thread exits. Tick statistics are kept in the thread and protected
    by sched_lock until it exits.
  Author: jeff   Date: 2007-06-01   Files: 2   Lines: -11/+6
  Initial patch by: attilio
  Reviewed by: attilio, bde (some objections), arch (mostly silent)
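  A simplified sketch of the rufetch() aggregation (the real routine also
  folds in already-exited threads' usage and pending tick statistics;
  rucollect() is assumed to sum one struct rusage into another):

      void
      rufetch(struct proc *p, struct rusage *ru)
      {
              struct thread *td;

              bzero(ru, sizeof(*ru));
              PROC_SLOCK(p);
              FOREACH_THREAD_IN_PROC(p, td)
                      rucollect(ru, &td->td_ru); /* sum per-thread usage */
              PROC_SUNLOCK(p);
      }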
* Revert VMCNT_* operations introduction.
  Author: attilio   Date: 2007-05-31   Files: 14   Lines: -184/+177
  Probably, a general approach is not the best solution here, so we
  should solve the sched_lock protection problems separately.
  Requested by: alc
  Approved by: jeff (mentor)
* Revert UF_OPENING workaround for CURRENT.
  Author: kib   Date: 2007-05-31   Files: 1   Lines: -1/+1
  Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev
  operation argument from being a file descriptor index to a pointer to
  struct file.
  Proposed and reviewed by: jhb
  Reviewed by: daichi (unionfs)
  Approved by: re (kensmith)
* Add functions sx_xlock_sig() and sx_slock_sig().
  Author: attilio   Date: 2007-05-31   Files: 1   Lines: -2/+2
  These functions are intended to perform the same actions as sx_xlock()
  and sx_slock(), but with the difference that they perform an
  interruptible sleep, so that the sleep can be interrupted by external
  events. In order to support these new features, some code
  restructuring is needed, but the external API won't be affected at all.
  Note: use a "void" cast for "int"-returning functions in order to keep
  tools like Coverity from complaining.
  Requested by: rwatson
  Tested by: rwatson
  Reviewed by: jhb
  Approved by: jeff (mentor)
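  A usage sketch: the _sig variants return an error instead of sleeping
  uninterruptibly, so a pending signal can abort the wait.

      static int
      do_locked_work(struct sx *lock)
      {
              int error;

              error = sx_xlock_sig(lock);
              if (error != 0)
                      return (error);   /* interrupted by a signal */
              /* ... critical section ... */
              sx_xunlock(lock);
              return (0);
      }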
* Eliminate the reactivation of cached pages in vm_fault_prefault() and
  vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED).
  Author: alc   Date: 2007-05-22   Files: 2   Lines: -8/+16
  With the exception of calls to vm_map_pmap_enter() from
  madvise(MADV_WILLNEED), vm_fault_prefault() and vm_map_pmap_enter() are
  both used to create speculative mappings. Thus, always reactivating
  cached pages is a mistake. In principle, cached pages should only be
  reactivated by an actual access. Otherwise, the following misbehavior
  can occur. On a hard fault for a text page the clustering algorithm
  fetches not only the required page but also several of the adjacent
  pages. Now, suppose that one or more of the adjacent pages are never
  accessed. Ultimately, these unused pages become cached pages through
  the efforts of the page daemon. However, the next activation of the
  executable reactivates and maps these unused pages. Consequently, they
  are never replaced. In effect, they become pinned in memory.
* - Rename VMCNT_DEC to VMCNT_SUB to reflect the count argument.
  Author: jeff   Date: 2007-05-20   Files: 1   Lines: -1/+1
  Suggested by: julian@
  Contributed by: attilio@