path: root/sys/vm
Commit message | Author | Date | Files | Lines
* Eliminate the special case handling of OBJT_DEVICE objects in | alc | 2007-07-08 | 1 | -10/+0
    vm_fault_additional_pages() that was introduced in revision 1.47. Then as
    now, it is unnecessary because dev_pager_haspage() returns zero for both
    the number of pages to read ahead and read behind, producing the same
    exact behavior by vm_fault_additional_pages() as the special case handling.
    Approved by: re (rwatson)
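    For context, a pager's haspage method reports "no clustering" by returning
    zero pages of read-ahead and read-behind. A minimal hedged sketch of that
    pattern (illustrative only, not the actual dev_pager code; the function
    name is made up):

        #include <sys/param.h>
        #include <vm/vm.h>
        #include <vm/vm_object.h>

        /* Sketch: a haspage method that never asks for extra pages. */
        static boolean_t
        ex_pager_haspage(vm_object_t object, vm_pindex_t pindex,
            int *before, int *after)
        {
                if (before != NULL)
                        *before = 0;    /* nothing to read behind */
                if (after != NULL)
                        *after = 0;     /* nothing to read ahead */
                return (TRUE);          /* the requested page itself exists */
        }

    With both counts zero, vm_fault_additional_pages() fetches only the
    faulted page, which is why the special case was redundant.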
* When a cached page is reactivated in vm_fault(), update the counter that | alc | 2007-07-06 | 1 | -8/+10
    tracks the total number of reactivated pages. (We have not been counting
    reactivations by vm_fault() since revision 1.46.) Correct a comment in
    vm_fault_additional_pages().
    Approved by: re (kensmith)
    MFC after: 1 week
* Add freebsd6_ wrappers for mmap/lseek/pread/pwrite/truncate/ftruncate | peter | 2007-07-04 | 1 | -0/+14
    Approved by: re (kensmith)
* In the previous revision, when I replaced the unconditional acquisition | alc | 2007-07-02 | 1 | -11/+10
    of Giant in vm_pageout_scan() with VFS_LOCK_GIANT(), I had to eliminate
    the acquisition of the vnode interlock before releasing the vm object's
    lock because the vnode interlock cannot be held when VFS_LOCK_GIANT() is
    performed. Unfortunately, this allows the vnode to be recycled between the
    release of the vm object's lock and the vget() on the vnode.
    In this revision, I prevent the vnode from being recycled by acquiring
    another reference to the vm object and underlying vnode before releasing
    the vm object's lock.
    This change also addresses another preexisting but trivial problem. By
    acquiring another reference to the vm object, I also prevent the vm
    object from being recycled. Previously, the "vnodes skipped" counter
    could be wrong because it examined a recycled vm object.
    Reported by: kib
    Reviewed by: kib
    Approved by: re (kensmith)
    MFC after: 3 weeks
* Eliminate the use of Giant from vm_daemon(). Replace the unconditional | alc | 2007-06-26 | 1 | -21/+27
    use of Giant in vm_pageout_scan() with VFS_LOCK_GIANT().
    Approved by: re (kensmith)
    MFC after: 3 weeks
* Eliminate GIANT_REQUIRED from swap_pager_putpages(). | alc | 2007-06-24 | 1 | -1/+0
    Approved by: re (mux)
    MFC after: 1 week
* Eliminate unnecessary checks from vm_pageout_clean(): The page that is | alc | 2007-06-18 | 1 | -7/+4
    passed to vm_pageout_clean() cannot possibly be PG_UNMANAGED because it
    came from the inactive queue and PG_UNMANAGED pages are not in any page
    queue. Moreover, PG_UNMANAGED pages only exist in OBJT_PHYS objects, and
    all pages within an OBJT_PHYS object are PG_UNMANAGED. So, if the page
    that is passed to vm_pageout_clean() is not PG_UNMANAGED, then it cannot
    be from an OBJT_PHYS object and its neighbors from the same object cannot
    themselves be PG_UNMANAGED.
    Reviewed by: tegge
* Don't declare inline a function which isn't. | mjacob | 2007-06-17 | 1 | -1/+1
* Make sure object is NULL; there is a possible case where you could | mjacob | 2007-06-17 | 1 | -1/+2
    fall through to it being used without being set. Put a break in the
    default case.
* Initialize reqpage to zero. | mjacob | 2007-06-17 | 1 | -1/+1
* If attempting to cache a "busy", panic instead of printing a diagnostic | alc | 2007-06-16 | 1 | -2/+1
    message and returning.
* Update a comment. | alc | 2007-06-16 | 1 | -7/+7
* Enable the new physical memory allocator. | alc | 2007-06-16 | 7 | -727/+120
    This allocator uses a binary buddy system with a twist. First and
    foremost, this allocator is required to support the implementation of
    superpages. As a side effect, it enables a more robust implementation of
    contigmalloc(9). Moreover, this reimplementation of contigmalloc(9)
    eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...).
    The twist is that this allocator tries to reduce the number of TLB misses
    incurred by accesses through a direct map to small, UMA-managed objects
    and page table pages. Roughly speaking, the physical pages that are
    allocated for such purposes are clustered together in the physical
    address space. The performance benefits vary. In the most extreme case,
    a uniprocessor kernel running on an Opteron, I measured an 18% reduction
    in system time during a buildworld.
    This allocator does not implement page coloring. The reason is that
    superpages have much the same effect. The contiguous physical memory
    allocation necessary for a superpage is inherently colored.
    Finally, the one caveat is that this allocator does not effectively
    support prezeroed pages. I hope this is temporary. On i386, this is a
    slight pessimization. However, on amd64, the beneficial effects of the
    direct-map optimization outweigh the ill effects. I speculate that this
    is true in general of machines with a direct map.
    Approved by: re
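    As a generic illustration of the binary buddy principle mentioned above
    (not FreeBSD's actual vm_phys code): a free block of order k beginning at
    page index i has its buddy at index i XOR 2^k, and two free buddies merge
    into a block of order k + 1.

        #include <stddef.h>

        /*
         * Generic binary-buddy index arithmetic (illustrative only).
         * "order" is log2 of the block size in pages; "idx" is the index of
         * the block's first page within the managed region.
         */
        static size_t
        buddy_index(size_t idx, unsigned order)
        {
                return (idx ^ ((size_t)1 << order));    /* flip the order bit */
        }

        static size_t
        merged_index(size_t idx, unsigned order)
        {
                /* align down to the start of the order+1 block */
                return (idx & ~(((size_t)1 << (order + 1)) - 1));
        }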
* Eliminate dead code: We have not performed pageouts on the kernel object | alc | 2007-06-13 | 1 | -3/+1
    in this millennium.
* Conditionally acquire Giant in vm_contig_launder_page(). | alc | 2007-06-11 | 1 | -0/+4
* Optimize vmmeter locking. | attilio | 2007-06-10 | 4 | -20/+10
    In particular:
    - Add an explanatory table for locking of struct vmmeter members
    - Apply new rules for some of those members
    - Remove some comments that are not useful
    Heavily reviewed by: alc, bde, jeff
    Approved by: jeff (mentor)
* Add a new physical memory allocator. However, do not yet connect it | alc | 2007-06-10 | 2 | -0/+741
    to the build.
    This allocator uses a binary buddy system with a twist. First and
    foremost, this allocator is required to support the implementation of
    superpages. As a side effect, it enables a more robust implementation of
    contigmalloc(9). Moreover, this reimplementation of contigmalloc(9)
    eliminates the acquisition of Giant by contigmalloc(..., M_NOWAIT, ...).
    The twist is that this allocator tries to reduce the number of TLB misses
    incurred by accesses through a direct map to small, UMA-managed objects
    and page table pages. Roughly speaking, the physical pages that are
    allocated for such purposes are clustered together in the physical
    address space. The performance benefits vary. In the most extreme case,
    a uniprocessor kernel running on an Opteron, I measured an 18% reduction
    in system time during a buildworld.
    This allocator does not implement page coloring. The reason is that
    superpages have much the same effect. The contiguous physical memory
    allocation necessary for a superpage is inherently colored.
    Finally, the one caveat is that this allocator does not effectively
    support prezeroed pages. I hope this is temporary. On i386, this is a
    slight pessimization. However, on amd64, the beneficial effects of the
    direct-map optimization outweigh the ill effects. I speculate that this
    is true in general of machines with a direct map.
    Approved by: re
* Commit 14/14 of sched_lock decomposition. | jeff | 2007-06-05 | 4 | -41/+63
    - Use thread_lock() rather than sched_lock for per-thread scheduling
      synchronization.
    - Use the per-process spinlock rather than the sched_lock for per-process
      scheduling synchronization.
    Tested by: kris, current@
    Tested on: i386, amd64, ULE, 4BSD, libthr, libkse, PREEMPTION, etc.
    Discussed with: kris, attilio, kmacy, jhb, julian, bde (small parts each)
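    A hedged sketch of the per-thread locking idiom this decomposition
    introduces (the helper and its body are made up for illustration; only
    thread_lock()/thread_unlock() and sched_prio() are named by the tree):

        #include <sys/param.h>
        #include <sys/proc.h>
        #include <sys/sched.h>

        /*
         * Sketch: per-thread scheduling state is now protected by the lock
         * designated for that thread rather than by the global sched_lock.
         */
        static void
        ex_set_priority(struct thread *td, u_char pri)
        {
                thread_lock(td);
                sched_prio(td, pri);    /* touch per-thread scheduling state */
                thread_unlock(td);
        }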
* Do proper "locking" for missing vmmeters part.attilio2007-06-045-23/+33
| | | | | | | | Now, we assume no more sched_lock protection for some of them and use the distribuited loads method for vmmeter (distribuited through CPUs). Reviewed by: alc, bde Approved by: jeff (mentor)
* Rework the PCPU_* (MD) interface: | attilio | 2007-06-04 | 1 | -6/+6
    - Rename PCPU_LAZY_INC to PCPU_INC
    - Add the PCPU_ADD interface, which simply adds a given value to the
      specified pcpu member.
    Note that for most architectures PCPU_INC and PCPU_ADD are not safe. This
    is a point that needs some discussion/work in the coming days.
    Reviewed by: alc, bde
    Approved by: jeff (mentor)
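    A hedged sketch of how the renamed interface reads at a call site (the
    helper is made up and the field choices are arbitrary; the per-CPU
    vmmeter member cnt follows the usage visible in the 2007-03-05 entry
    below):

        #include <sys/param.h>
        #include <sys/pcpu.h>

        /*
         * Illustrative only: update per-CPU statistics.  On most
         * architectures these expand to non-atomic per-CPU updates, which is
         * the safety caveat the commit message mentions.
         */
        static void
        ex_update_counters(int npages)
        {
                PCPU_INC(cnt.v_vm_faults);      /* was PCPU_LAZY_INC(...) */
                PCPU_ADD(cnt.v_tfree, npages);  /* add an arbitrary value */
        }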
* - Move rusage from being per-process in struct pstats to per-thread in | jeff | 2007-06-01 | 2 | -11/+6
    td_ru. This removes the requirement for per-process synchronization in
    statclock() and mi_switch(). This was previously supported by sched_lock,
    which is going away. All modifications to rusage are now done in the
    context of the owning thread. Reads proceed without locks.
    - Aggregate an exiting thread's rusage in thread_exit() so that the
      exiting thread's rusage is not lost.
    - Provide a new routine, rufetch(), to fetch an aggregate of all rusage
      structures from all threads in a process. This routine must be used in
      any place requiring a rusage from a process prior to its exit. The
      exited process's rusage is still available via p_ru.
    - Aggregate tick statistics only on demand via rufetch() or when a thread
      exits. Tick statistics are kept in the thread and protected by
      sched_lock until it exits.
    Initial patch by: attilio
    Reviewed by: attilio, bde (some objections), arch (mostly silent)
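    A hedged usage sketch of the new routine (the wrapper is an assumption
    for illustration; only rufetch() and the structure names come from the
    commit message, and its exact locking requirements are glossed over):

        #include <sys/param.h>
        #include <sys/proc.h>
        #include <sys/resourcevar.h>

        /*
         * Sketch: obtain aggregate resource usage for a live process by
         * summing the per-thread td_ru structures.  After the process has
         * exited, p_ru holds the final numbers instead.
         */
        static void
        ex_get_rusage(struct proc *p, struct rusage *rup)
        {
                rufetch(p, rup);        /* aggregate across all threads */
        }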
* Revert VMCNT_* operations introduction. | attilio | 2007-05-31 | 14 | -184/+177
    Probably, a general approach is not the best solution here, so we should
    solve the sched_lock protection problems separately.
    Requested by: alc
    Approved by: jeff (mentor)
* Revert UF_OPENING workaround for CURRENT. | kib | 2007-05-31 | 1 | -1/+1
    Change the VOP_OPEN() and vn_open() vnode operation argument, and the
    d_fdopen() cdev operation argument, from a file descriptor index to a
    pointer to struct file.
    Proposed and reviewed by: jhb
    Reviewed by: daichi (unionfs)
    Approved by: re (kensmith)
* Add functions sx_xlock_sig() and sx_slock_sig(). | attilio | 2007-05-31 | 1 | -2/+2
    These functions are intended to do the same as sx_xlock() and sx_slock(),
    with the difference that they perform an interruptible sleep, so that the
    sleep can be interrupted by external events.
    In order to support these new features some code restructuring is needed,
    but the external API won't be affected at all.
    Note: use a "void" cast for "int"-returning functions in order to keep
    tools like Coverity Prevent from whining.
    Requested by: rwatson
    Tested by: rwatson
    Reviewed by: jhb
    Approved by: jeff (mentor)
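    A hedged sketch of the interruptible variant at a call site (the wrapper
    is made up; the error-return behavior follows the description above):

        #include <sys/param.h>
        #include <sys/lock.h>
        #include <sys/sx.h>

        /*
         * Sketch: unlike sx_xlock(), sx_xlock_sig() may return an error if
         * the sleep is interrupted, so the caller must check the return
         * value instead of assuming the lock was acquired.
         */
        static int
        ex_locked_op(struct sx *lk)
        {
                int error;

                error = sx_xlock_sig(lk);
                if (error != 0)
                        return (error);         /* sleep was interrupted */
                /* ... work under the exclusive lock ... */
                sx_xunlock(lk);
                return (0);
        }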
* Eliminate the reactivation of cached pages in vm_fault_prefault() and | alc | 2007-05-22 | 2 | -8/+16
    vm_map_pmap_enter() unless the caller is madvise(MADV_WILLNEED). With the
    exception of calls to vm_map_pmap_enter() from madvise(MADV_WILLNEED),
    vm_fault_prefault() and vm_map_pmap_enter() are both used to create
    speculative mappings. Thus, always reactivating cached pages is a
    mistake. In principle, cached pages should only be reactivated by an
    actual access. Otherwise, the following misbehavior can occur. On a hard
    fault for a text page the clustering algorithm fetches not only the
    required page but also several of the adjacent pages. Now, suppose that
    one or more of the adjacent pages are never accessed. Ultimately, these
    unused pages become cached pages through the efforts of the page daemon.
    However, the next activation of the executable reactivates and maps these
    unused pages. Consequently, they are never replaced. In effect, they
    become pinned in memory.
* - rename VMCNT_DEC to VMCNT_SUB to reflect the count argument. | jeff | 2007-05-20 | 1 | -1/+1
    Suggested by: julian@
    Contributed by: attilio@
* - define and use VMCNT_{GET,SET,ADD,SUB,PTR} macros for manipulating | jeff | 2007-05-18 | 14 | -177/+184
    vmcnts. This can be used to abstract away pcpu details, but it also
    changes all counters to use atomics now. This means sched_lock is no
    longer responsible for protecting counts in the switch routines.
    Contributed by: Attilio Rao <attilio@FreeBSD.org>
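    A rough sketch of the kind of wrapper these macros provide (illustrative
    only; these are not the actual FreeBSD definitions, and the interface was
    reverted again in the 2007-05-31 entry above):

        #include <sys/vmmeter.h>
        #include <machine/atomic.h>

        /*
         * Illustrative only: hide the backing store and update method of the
         * VM statistics behind macros, so callers do not care whether the
         * counters are global atomics or per-CPU fields.
         */
        #define EX_VMCNT_ADD(var, x)    atomic_add_int(&cnt.var, (x))
        #define EX_VMCNT_SUB(var, x)    atomic_subtract_int(&cnt.var, (x))
        #define EX_VMCNT_GET(var)       (cnt.var)
        #define EX_VMCNT_PTR(var)       (&cnt.var)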
* Update stale comment on protecting UMA per-CPU caches: we now use | rwatson | 2007-05-09 | 1 | -7/+7
    critical sections rather than mutexes.
* Define every architecture as either VM_PHYSSEG_DENSE or | alc | 2007-05-05 | 2 | -2/+28
    VM_PHYSSEG_SPARSE depending on whether the physical address space is
    densely or sparsely populated with memory. The effect of this definition
    is to determine which of two implementations of vm_page_array and
    PHYS_TO_VM_PAGE() is used. The legacy implementation is obtained by
    defining VM_PHYSSEG_DENSE, and a new implementation that trades off time
    for space is obtained by defining VM_PHYSSEG_SPARSE.
    For now, all architectures except for ia64 and sparc64 define
    VM_PHYSSEG_DENSE. Defining VM_PHYSSEG_SPARSE on ia64 allows the entirety
    of my Itanium 2's memory to be used. Previously, only the first 1 GB
    could be used. Defining VM_PHYSSEG_SPARSE on sparc64 allows USIIIi-based
    systems to boot without crashing.
    This change is a combination of Nathan Whitehorn's patch and my own work
    in perforce.
    Discussed with: kmacy, marius, Nathan Whitehorn
    PR: 112194
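    A hedged sketch of the time/space trade-off described above (types,
    names, and the 4 KB page shift are illustrative, not the actual
    implementation):

        #include <stdint.h>
        #include <stddef.h>

        struct ex_vm_page { int ex_dummy; };    /* stand-in for struct vm_page */

        /* Dense: one entry per page frame starting at first_page; O(1)
         * lookup, but entries exist even for holes in the physical map. */
        extern struct ex_vm_page ex_vm_page_array[];
        extern long ex_first_page;

        static struct ex_vm_page *
        ex_phys_to_vm_page_dense(uint64_t pa)
        {
                return (&ex_vm_page_array[(pa >> 12) - ex_first_page]);
        }

        /* Sparse: only populated ranges carry page structures; lookup must
         * search the segment table, trading time for space. */
        struct ex_seg { uint64_t start, end; struct ex_vm_page *first; };
        extern struct ex_seg ex_segs[];
        extern int ex_nsegs;

        static struct ex_vm_page *
        ex_phys_to_vm_page_sparse(uint64_t pa)
        {
                for (int i = 0; i < ex_nsegs; i++)
                        if (pa >= ex_segs[i].start && pa < ex_segs[i].end)
                                return (&ex_segs[i].first[(pa - ex_segs[i].start) >> 12]);
                return (NULL);
        }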
* Remove some code from vmspace_fork() that became redundant after | alc | 2007-04-26 | 1 | -4/+0
    revision 1.334 modified _vm_map_init() to initialize the new vm map's
    flags to zero.
* Audit pathnames looked up in swapon(2) and swapoff(2). | rwatson | 2007-04-23 | 1 | -2/+4
    MFC after: 2 weeks
    Obtained from: TrustedBSD Project
* Correct contigmalloc2()'s implementation of M_ZERO. Specifically, | alc | 2007-04-19 | 1 | -1/+1
    contigmalloc2() was always testing the first physical page for PG_ZERO,
    not the current page of interest.
    Submitted by: Michael Plass
    PR: 81301
    MFC after: 1 week
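    The class of bug being fixed, as a hedged sketch (the helper and loop are
    made up; only PG_ZERO and pmap_zero_page() are real names, and the actual
    one-line fix in contigmalloc2() differs):

        #include <sys/param.h>
        #include <vm/vm.h>
        #include <vm/vm_page.h>
        #include <vm/pmap.h>

        /* Sketch: zero every page of a contiguous run that is not already
         * zeroed.  The bug was testing m_start->flags on every iteration
         * instead of the current page's m->flags. */
        static void
        ex_zero_run(vm_page_t m_start, vm_page_t m_end)
        {
                vm_page_t m;

                for (m = m_start; m < m_end; m++)
                        if ((m->flags & PG_ZERO) == 0)  /* test the current page */
                                pmap_zero_page(m);
        }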
* Correct two comments. | alc | 2007-04-19 | 1 | -2/+2
    Submitted by: Michael Plass
* Minor typo fix, noticed while I was going through *_pager.c files. | keramida | 2007-04-10 | 1 | -1/+1
* When KVA is exhausted, try the vm_lowmem event for the last time before | pjd | 2007-04-05 | 1 | -4/+14
    panicking. This helps a lot with ZFS stability.
* Fix a problem for file systems that don't implement VOP_BMAP() operation. | pjd | 2007-04-05 | 1 | -0/+2
    The problem is this: vm_fault_additional_pages() calls
    vm_pager_has_page(), which calls vnode_pager_haspage(). Now when
    VOP_BMAP() returns an error (eg. EOPNOTSUPP), vnode_pager_haspage()
    returns TRUE without initializing the 'before' and 'after' arguments, so
    we have some accidental values there. This basically was causing this
    condition to be met:

        if ((rahead + rbehind) >
            ((cnt.v_free_count + cnt.v_cache_count) - cnt.v_free_reserved)) {
                pagedaemon_wakeup();
                [...]
        }

    (we have some random values in the rahead and rbehind variables)
    I'm not entirely sure this is the right fix; maybe we should just return
    FALSE in vnode_pager_haspage() when VOP_BMAP() fails? alc@ knows about
    this problem, maybe he will be able to come up with a better fix if this
    is not the right one.
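    One way to read the two-line fix described above, as a hedged sketch (not
    the literal patch; variable names follow the commit message):

        /*
         * Illustrative only: on the VOP_BMAP() failure path, define the
         * out-parameters before the early return, so callers such as
         * vm_fault_additional_pages() never see garbage read-ahead and
         * read-behind counts.
         */
        if (error != 0) {
                if (before != NULL)
                        *before = 0;
                if (after != NULL)
                        *after = 0;
                return (TRUE);
        }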
* Prevent a race between vm_object_collapse() and vm_object_split() from | alc | 2007-03-27 | 1 | -0/+8
    causing a crash.
    Suppose that we have two objects, obj and backing_obj, where backing_obj
    is obj's backing object. Further, suppose that backing_obj has a
    reference count of two. One being the reference held by obj and the other
    by a map entry.
    Now, suppose that the map entry is deallocated and its reference removed
    by vm_object_deallocate(). vm_object_deallocate() recognizes that the
    only remaining reference is from a shadow object, obj, and calls
    vm_object_collapse() on obj. vm_object_collapse() executes

        if (backing_object->ref_count == 1) {
                /*
                 * If there is exactly one reference to the backing
                 * object, we can collapse it into the parent.
                 */
                vm_object_backing_scan(object, OBSC_COLLAPSE_WAIT);

    vm_object_backing_scan(OBSC_COLLAPSE_WAIT) executes

        if (op & OBSC_COLLAPSE_WAIT) {
                vm_object_set_flag(backing_object, OBJ_DEAD);
        }

    Finally, suppose that either vm_object_backing_scan() or
    vm_object_collapse() sleeps releasing its locks. At this instant, another
    thread executes vm_object_split(). It crashes in
    vm_object_reference_locked() on the assertion that the object is not
    dead. If, however, assertions are not enabled, it crashes much later,
    after the object has been recycled, in vm_object_deallocate() because the
    shadow count and shadow list are inconsistent.
    Reviewed by: tegge
    Reported by: jhb
    MFC after: 1 week
* Two small changes to vm_map_pmap_enter(): | alc | 2007-03-25 | 1 | -4/+3
    1) Eliminate an unnecessary check for fictitious pages. Specifically,
       only device-backed objects contain fictitious pages and the object is
       not device-backed.
    2) Change the types of "psize" and "tmpidx" to vm_pindex_t in order to
       prevent possible wrap around with extremely large maps and objects,
       respectively.
    Observed by: tegge (last summer)
* vm_page_busy() no longer requires the page queues lock to be held. Reduce | alc | 2007-03-23 | 1 | -2/+2
    the scope of the page queues lock in vm_fault() accordingly.
* Change the order of lock reacquisition in vm_object_split() in order to | alc | 2007-03-22 | 1 | -2/+5
    simplify the code slightly. Add a comment concerning lock ordering.
* Use PCPU_LAZY_INC() to update page fault statistics. | alc | 2007-03-05 | 1 | -6/+6
* Use pause() in vm_object_deallocate() to yield the CPU to the lock holder | jhb | 2007-02-27 | 1 | -1/+1
    rather than a tsleep() on &proc0. The only wakeup on &proc0 is intended
    to awaken the swapper, not random threads blocked in
    vm_object_deallocate().
* Use pause() rather than tsleep() on stack variables and function pointers. | jhb | 2007-02-27 | 1 | -2/+1
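    A hedged before/after sketch of the idiom these two changes replace (the
    wait-message string, priority, and timeout are illustrative):

        #include <sys/param.h>
        #include <sys/systm.h>
        #include <sys/kernel.h>

        /*
         * Sketch: sleeping on the address of a stack variable is a poor wait
         * channel; nothing will ever wakeup() that address, so the sleep is
         * really just a timed delay.  pause() says what is meant.
         */
        static void
        ex_backoff(void)
        {
                int dummy;

                /* Before: a pure timeout dressed up as a channel sleep. */
                tsleep(&dummy, PVM, "exwait", hz / 10);

                /* After: an explicit timed pause, never expecting a wakeup. */
                pause("exwait", hz / 10);
        }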
* Change the way that unmanaged pages are created. Specifically, | alc | 2007-02-25 | 6 | -48/+11
    immediately flag any page that is allocated to an OBJT_PHYS object as
    unmanaged in vm_page_alloc() rather than waiting for a later call to
    vm_page_unmanage(). This allows for the elimination of some uses of the
    page queues lock.
    Change the type of the kernel and kmem objects from OBJT_DEFAULT to
    OBJT_PHYS. This allows us to take advantage of the above change to
    simplify the allocation of unmanaged pages in kmem_alloc() and
    kmem_malloc().
    Remove vm_page_unmanage(). It is no longer used.
* Change the page's CLEANCHK flag from being a page queue mutex synchronized | alc | 2007-02-22 | 2 | -16/+16
    flag to a vm object mutex synchronized flag.
* Enable vm_page_free() and vm_page_free_zero() to be called on some pages | alc | 2007-02-18 | 1 | -2/+4
    without the page queues lock being held, specifically, pages that are not
    contained in a vm object and not a member of a page queue.
* Remove a stale comment. Add punctuation to a nearby comment. | alc | 2007-02-17 | 1 | -6/+1
* Relax the page queue lock assertions in vm_page_remove() and | alc | 2007-02-15 | 1 | -2/+3
    vm_page_free_toq() to account for recent changes that allow
    vm_page_free_toq() to be called on some pages without the page queues
    lock being held, specifically, pages that are not contained in a vm
    object and not a member of a page queue. (Examples of such pages include
    page table pages, pv entry pages, and uma small alloc pages.)
* Avoid the unnecessary acquisition of the free page queues lock when a page | alc | 2007-02-14 | 1 | -4/+5
    is actually being added to the hold queue, not the free queue. At the
    same time, avoid unnecessary tests to wake up threads waiting for free
    memory and the idle thread that zeroes free pages. (These tests will be
    performed later when the page finally moves from the hold queue to the
    free queue.)
* Add uma_set_align() interface, which will be called at most once during | rwatson | 2007-02-11 | 2 | -2/+26
    boot by MD code to indicate a detected alignment preference. Rather than
    cache alignment being encoded in UMA consumers by defining a global
    alignment value of (16 - 1) in UMA_ALIGN_CACHE, UMA_ALIGN_CACHE is now a
    special value (-1) that causes UMA to look at the registered alignment.
    If no preferred alignment has been selected by MD code, a default
    alignment of (16 - 1) will be used.
    Currently, no hardware platforms specify alignment; architecture
    maintainers will need to modify MD startup code to specify an alignment
    if desired. This must occur before initialization of UMA so that all UMA
    zones pick up the requested alignment.
    Reviewed by: jeff, alc
    Submitted by: attilio
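    A hedged sketch of what an MD startup hook might do with the new
    interface (the function, its call site, and the 64-byte line size are
    made up; as noted above, no platform registers an alignment yet):

        #include <sys/param.h>
        #include <vm/uma.h>

        /*
         * Illustrative only: register a preferred cache-line alignment mask
         * before UMA is initialized, so every zone created afterwards that
         * asks for UMA_ALIGN_CACHE picks it up.  The argument is a mask,
         * matching the existing (16 - 1) default.
         */
        static void
        ex_md_register_uma_align(void)
        {
                uma_set_align(64 - 1);  /* hypothetical 64-byte cache lines */
        }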