path: root/sys/vm
Commit log (message; author, date; files changed, lines -/+):
* kib, 2008-11-16 (1 file, -2/+1):
  Instead of forcing vn_start_write() to reset mp back to NULL for
  failed calls with non-NULL vp, explicitly clear mp after failure.
  Tested by: stass
  Reviewed by: tegge
  PR: 123768
  MFC after: 1 week
* raj, 2008-11-06 (1 file, -1/+1):
  Support kernel crash mini dumps on the ARM architecture.
  Obtained from: Juniper Networks, Semihalf
* keramida, 2008-11-02 (1 file, -32/+32):
  Various comment nits and typos.
* rwatson, 2008-10-22 (1 file, -4/+0):
  Update mmap() comment: no more block devices, so no more block device
  cache coherency questions.
  MFC after: 3 days
* attilio, 2008-10-10 (1 file, -1/+1):
  Remove the useless struct thread argument from the bufobj interface.
  In particular, the KPI of the following functions is modified:
  - bufobj_invalbuf()
  - bufsync() and the BO_SYNC() "virtual method" of the buffer object set.
  The main consumers of bufobj functions are affected by this change
  too; in particular, the functions whose KPI changed are:
  - vinvalbuf()
  - g_vfs_close()
  Due to the KPI breakage, __FreeBSD_version will be bumped in a later
  commit. As a side note, please consider the 'curthread' argument
  passed to VOP_SYNC() (in bufsync()) temporary, as it will be axed out
  ASAP.
  Reviewed by: kib
  Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* kib, 2008-09-29 (3 files, -71/+87):
  Move the code for out-of-memory handling from vm_pageout_scan() into
  the separate function vm_pageout_oom(). Supply a parameter for
  vm_pageout_oom() describing the reason for the call.
  Call vm_pageout_oom() from swp_pager_meta_build() when the swap zone
  is exhausted.
  Reviewed by: alc
  Tested by: pho, jhb
  MFC after: 2 weeks
* emaste, 2008-09-26 (2 files, -7/+7):
  Move CTASSERT from header file to source file, per the implementation
  note now in the CTASSERT man page.
* kib, 2008-09-26 (1 file, -0/+6):
  Save the previous content of td_fpop before storing the current file
  descriptor into it. Make sure that td_fpop is NULL when calling
  d_mmap from dev_pager_getpages().
  This change guards against the td_fpop field being non-NULL with
  private state for another device, and against sudden clearing of
  td_fpop. This could occur either when a driver method calls another
  driver through a file descriptor operation, or when a page fault
  happens while a driver is writing to memory backed by another driver.
  Noted by: rwatson
  Tested by: rnoland
  MFC after: 3 days
* alc, 2008-09-21 (1 file, -1/+2):
  Prevent an integer overflow in vm_pageout_page_stats() on machines
  with a large number of physical pages.
  PR: 126158
  Submitted by: Dmitry Tejblum
  MFC after: 3 days
* kib, 2008-09-20 (1 file, -0/+2):
  Allow the d_mmap driver methods to use the cdevpriv KPI during the
  verification phase of establishing a mapping.
  Discussed with: rwatson, jhb, rnoland
  Tested by: rnoland
  MFC after: 3 days
* attilio, 2008-08-28 (3 files, -6/+7):
  Decontextualize the couplet VOP_GETATTR / VOP_SETATTR, as the passed
  thread was always curthread and thus useless.
  Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* antoine, 2008-08-23 (1 file, -7/+0):
  Remove unused variable nosleepwithlocks.
  PR: 126609
  Submitted by: Mateusz Guzik
  MFC after: 1 month
  X-MFC: to stable/7 only; this variable is still used in stable/6
* nwhitehorn, 2008-08-23 (1 file, -1/+1):
  Allow the MD UMA allocator to use VM routines like kmem_*(). Existing
  code requires the MD allocator to be available early in the boot
  process, before the VM is fully available. This defines a new VM
  define (UMA_MD_SMALL_ALLOC_NEEDS_VM) that allows an MD UMA small
  allocator to become available at the same time as the default UMA
  allocator.
  Approved by: marcel (mentor)
* julian, 2008-08-20 (1 file, -0/+1):
  A bunch of formatting fixes brought to light by, or created by, the
  Vimage commit a few days ago.
* kmacy, 2008-08-17 (1 file, -0/+4):
  Work around differences in page allocation for initial page tables
  on xen.
  MFC after: 1 month
* emaste, 2008-08-13 (1 file, -0/+2):
  Fix REDZONE(9) on amd64, and perhaps other 64-bit targets: ensure
  that the space redzone adds to the allocation for storing its
  metadata is at least as large as the metadata that it will store
  there.
  Submitted by: Nima Misaghian
* jhb, 2008-08-05 (1 file, -38/+15):
  If a thread that is swapped out is made runnable, then the
  setrunnable() routine wakes up proc0 so that proc0 can swap the
  thread back in. Historically, this has been done by waking up proc0
  directly from setrunnable() itself via a wakeup(). When waking up a
  sleeping thread that was swapped out (the usual case when waking
  proc0, since only sleeping threads are eligible to be swapped out),
  this resulted in a bit of recursion (e.g. wakeup() -> setrunnable()
  -> wakeup()).
  With sleep queues having separate locks in 6.x and later, this caused
  a spin lock LOR (sleepq lock -> sched_lock/thread lock -> sleepq
  lock). An attempt was made to fix this in 7.0 by making the proc0
  wakeup use the ithread mechanism for doing the wakeup. However, this
  required grabbing proc0's thread lock to perform the wakeup. If proc0
  was asleep elsewhere in the kernel (e.g. waiting for disk I/O), then
  this degenerated into the same LOR since the thread lock would be
  some other sleepq lock.
  Fix this by deferring the wakeup of the swapper until after the
  sleepq lock held by the upper layer has been locked. The
  setrunnable() routine now returns a boolean value to indicate whether
  or not proc0 needs to be woken up. The end result is that consumers
  of the sleepq API such as *sleep/wakeup, condition variables, sx
  locks, and lockmgr have to wake up proc0 if they get a non-zero
  return value from sleepq_abort(), sleepq_broadcast(), or
  sleepq_signal().
  Discussed with: jeff
  Glanced at by: sam
  Tested by: Jurgen Weber jurgen - ish com au
  MFC after: 2 weeks
* trhodes, 2008-08-03 (4 files, -7/+8):
  Fill in a few sysctl descriptions.
  Reviewed by: alc, Matt Dillon <dillon@apollo.backplane.com>
  Approved by: alc
* jhb, 2008-07-30 (1 file, -2/+0):
  One more whitespace nit.
* jhb, 2008-07-30 (2 files, -2/+1):
  A few more whitespace fixes.
* jhb, 2008-07-30 (1 file, -1/+1):
  If the kernel has run out of metadata for swap, then explicitly
  panic() instead of emitting a warning before deadlocking.
  MFC after: 1 month
* kib, 2008-07-30 (1 file, -2/+9):
  The behaviour of lockmgr, going back at least to 4.4BSD-Lite2, was to
  downgrade the exclusive lock to a shared one when the exclusive lock
  owner requested a shared lock. The new lockmgr panics instead.
  The vnode_pager_lock function requests a shared lock on the vnode
  backing the OBJT_VNODE, and can be called when the current thread
  already holds an exclusive lock on the vnode. For instance, this
  happens when handling a page fault from the VOP_WRITE() uiomove that
  writes to the file, with the faulted-in page fetched from the vm
  object backed by the same file. We then get the situation described
  above.
  Verify whether the vnode is already exclusively locked by curthread
  and, if so, request a recursed exclusive vnode lock instead of a
  shared one.
  Reported by: gallatin
  Discussed with: attilio
* alc, 2008-07-18 (1 file, -12/+0):
  Eliminate stale comments from kmem_malloc().
* kib, 2008-07-11 (1 file, -7/+5):
  Use VM_ALLOC_INTERRUPT for the page requests when allocating memory
  for the bio for a swapout write. It allows the page allocator to
  drain the free page list deeper. As a result, a deadlock where the
  pageout daemon sleeps waiting for a bio to be allocated for swapout
  is no longer reproducible in practice.
  Alan said that M_USE_RESERVE shall be resurrected and used there, but
  until this is implemented, M_NOWAIT does exactly what is needed.
  Tested by: pho, kris
  Reviewed by: alc
  No objections from: phk
  MFC after: 2 weeks (RELENG_7 only)
* alc, 2008-07-05 (1 file, -1/+1):
  Enable the creation of a kmem map larger than 4GB.
  Submitted by: Tz-Huan Huang
  Make several variables related to kmem map auto-sizing static.
  Found by: CScout
* alc, 2008-06-22 (1 file, -2/+6):
  Make preparations for increasing the size of the kernel virtual
  address space on the amd64 architecture. The amd64 architecture
  requires kernel code and global variables to reside in the highest
  2GB of the 64-bit virtual address space. Thus, the memory allocated
  during bootstrap, before the call to kmem_init(), starts at KERNBASE,
  which is not necessarily the same as VM_MIN_KERNEL_ADDRESS on amd64.
* alc, 2008-06-21 (1 file, -1/+1):
  KERNBASE is not necessarily an address within the kernel map, e.g.,
  on PowerPC/AIM. Consequently, it should not be used to determine the
  maximum number of kernel map entries. Instead, use
  VM_MIN_KERNEL_ADDRESS, which marks the start of the kernel map on all
  architectures.
  Tested by: marcel@ (PowerPC/AIM)
* ups, 2008-06-12 (1 file, -6/+7):
  Fix vm object creation locking to allow SHARED vnode locking for
  vnode_create_vobject. (Not currently used.)
  Noticed by: kib@
* alc, 2008-06-06 (1 file, -0/+7):
  Essentially, neither madvise(..., MADV_DONTNEED) nor
  madvise(..., MADV_FREE) work. (Moreover, I don't believe that they
  have ever worked as intended.) The explanation is fairly simple. Both
  MADV_DONTNEED and MADV_FREE perform vm_page_dontneed() on each page
  within the range given to madvise(). This function moves the page to
  the inactive queue. Specifically, if the page is clean, it is moved
  to the head of the inactive queue, where it is first in line for
  processing by the page daemon. On the other hand, if it is dirty, it
  is placed at the tail.
  Let's further examine the case in which the page is clean. Recall
  that the page is at the head of the line for processing by the page
  daemon. The expectation of vm_page_dontneed()'s author was that the
  page would be transferred from the inactive queue to the cache queue
  by the page daemon. (Once the page is in the cache queue, it is, in
  effect, free; that is, it can be reallocated to a new vm object by
  vm_page_alloc() if it isn't reactivated quickly enough by a user of
  the old vm object.) The trouble is that nowhere in the execution of
  either MADV_DONTNEED or MADV_FREE is either the machine-independent
  reference flag (PG_REFERENCED) or the reference bit in any page table
  entry (PTE) mapping the page cleared. Consequently, the immediate
  reaction of the page daemon is to reactivate the page because it is
  referenced. In effect, the madvise() was for naught. The case in
  which the page was dirty is not too different. Instead of being
  laundered, the page is reactivated.
  Note: The essential difference between MADV_DONTNEED and MADV_FREE is
  that MADV_FREE clears a page's dirty field. So, MADV_FREE is always
  executing the clean case above.
  This revision changes vm_page_dontneed() to clear both the machine-
  independent reference flag (PG_REFERENCED) and the reference bit in
  all PTEs mapping the page.
  MFC after: 6 weeks
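  The kernel path discussed above is exercised from userspace through
  the madvise(2) interface. The following is a hypothetical, minimal
  userspace sketch (not part of the commit) of that call pattern: dirty
  an anonymous page, then advise the kernel it is no longer needed.
  MADV_DONTNEED is used here because it is widely available; note that
  its semantics are only a hint on FreeBSD, while Linux discards the
  page contents immediately.

  ```c
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  /* Hypothetical demo: dirty one anonymous page, then hint the kernel
   * that it is no longer needed.  Returns 0 on success, -1 on error. */
  int dontneed_page(void)
  {
      long pagesz = sysconf(_SC_PAGESIZE);
      char *p = mmap(NULL, (size_t)pagesz, PROT_READ | PROT_WRITE,
                     MAP_ANON | MAP_PRIVATE, -1, 0);
      if (p == MAP_FAILED)
          return -1;
      memset(p, 0xa5, (size_t)pagesz);           /* make the page dirty */
      int r = madvise(p, (size_t)pagesz, MADV_DONTNEED);
      munmap(p, (size_t)pagesz);                 /* release the mapping */
      return r;
  }
  ```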
* alc, 2008-05-24 (1 file, -7/+0):
  To date, our implementation of munmap(2) has required that the
  entirety of the specified range be mapped. Specifically, it has
  returned EINVAL if the entire range is not mapped. There is not,
  however, any basis for this in either SUSv2 or our own man page.
  Moreover, neither Linux nor Solaris imposes this requirement. This
  revision removes this requirement.
  Submitted by: Tijl Coosemans
  PR: 118510
  MFC after: 6 weeks
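  The relaxed semantics can be illustrated with a small, hypothetical
  userspace sketch (not part of the commit): map two pages, punch a
  hole so the range is only partially mapped, then unmap the whole
  range. On systems with the behavior described above (FreeBSD after
  this change, Linux, Solaris), the final munmap succeeds.

  ```c
  #include <sys/mman.h>
  #include <unistd.h>

  /* Hypothetical demo: munmap over a partially-mapped range.
   * Returns 0 when the partial unmap succeeds, -1 on error. */
  int partial_munmap_ok(void)
  {
      size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
      char *p = mmap(NULL, 2 * pagesz, PROT_READ | PROT_WRITE,
                     MAP_ANON | MAP_PRIVATE, -1, 0);
      if (p == MAP_FAILED)
          return -1;
      /* Punch a hole: now only the first of the two pages is mapped. */
      if (munmap(p + pagesz, pagesz) != 0)
          return -1;
      /* Unmap the full two-page range, which includes the hole.
       * Pre-change FreeBSD would fail this with EINVAL. */
      return munmap(p, 2 * pagesz);
  }
  ```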
* ups, 2008-05-20 (3 files, -18/+41):
  Allow VM object creation in ufs_lookup (if vfs.vmiodirenable is set).
  Directory IO without a VM object will store data in 'malloced'
  buffers, severely limiting caching of the data. Without this change,
  VM objects for directories are only created on an open() of the
  directory.
  TODO: Inline a test for whether the VM object already exists to avoid
  locking/function call overhead.
  Tested by: kris@
  Reviewed by: jeff@
  Reported by: David Filo
* alc, 2008-05-18 (1 file, -1/+0):
  Retire pmap_addr_hint(). It is no longer used.
* alc, 2008-05-17 (1 file, -5/+3):
  In order to map device memory using superpages, mmap(2) must find a
  superpage-aligned virtual address for the mapping. Revision 1.65
  implemented an overly simplistic and generally ineffectual method for
  finding a superpage-aligned virtual address. Specifically, it rounds
  the virtual address corresponding to the end of the data segment up
  to the next superpage-aligned virtual address. If this virtual
  address is unallocated, then the device will be mapped using
  superpages. Unfortunately, in modern times, where applications like
  the X server dynamically load much of their code, this virtual
  address is already allocated. In such cases, mmap(2) simply uses the
  first available virtual address, which is not necessarily superpage
  aligned.
  This revision changes mmap(2) to use a more robust method,
  specifically, the VMFS_ALIGNED_SPACE option that is now implemented
  by vm_map_find().
* alc, 2008-05-17 (1 file, -1/+5):
  Preset a device object's alignment ("pg_color") based upon the
  physical address of the device's memory. This enables
  pmap_align_superpage() to propose a virtual address for mapping the
  device memory that permits the use of superpage mappings.
* alc, 2008-05-15 (1 file, -1/+1):
  Don't call vm_reserv_alloc_page() on device-backed objects.
  Otherwise, the system may panic because there is no reservation
  structure corresponding to the physical address of the device memory.
  Reported by: Giorgos Keramidas
* alc, 2008-05-10 (1 file, -1/+1):
  Provide the new argument to kmem_suballoc().
* alc, 2008-05-10 (3 files, -17/+15):
  Introduce a new parameter "superpage_align" to kmem_suballoc() that
  is used to request superpage alignment for the submap.
  Request superpage alignment for the kmem_map.
  Pass VMFS_ANY_SPACE instead of TRUE to vm_map_find(). (They are
  currently equivalent, but VMFS_ANY_SPACE is the new preferred
  spelling.)
  Remove a stale comment from kmem_malloc().
* alc, 2008-05-10 (2 files, -10/+23):
  Generalize vm_map_find(9)'s parameter "find_space". Specifically, add
  support for VMFS_ALIGNED_SPACE, which requests the allocation of an
  address range best suited to superpages. The old options TRUE and
  FALSE are mapped to VMFS_ANY_SPACE and VMFS_NO_SPACE, so that there
  is no immediate need to update all of vm_map_find(9)'s callers.
  While I'm here, correct a misstatement about vm_map_find(9)'s return
  values in the man page.
* alc, 2008-05-09 (1 file, -0/+2):
  Introduce pmap_align_superpage(). It increases the starting virtual
  address of the given mapping if a different alignment might result in
  more superpage mappings.
* kmacy, 2008-05-05 (1 file, -1/+1):
  Add a malloc flag to blist so that it can be used in ithread context.
  Reviewed by: alc, bsdimp
* alc, 2008-04-28 (1 file, -2/+2):
  Eliminate pointless casts from kmem_suballoc().
* alc, 2008-04-28 (3 files, -5/+5):
  vm_map_fixed(), unlike vm_map_find(), does not update "addr", so it
  can be passed by value.
* jeff, 2008-04-17 (2 files, -2/+2):
  - Make SCHED_STATS more generic by adding a wrapper to create the
    variables and sysctl nodes.
  - In reset, walk the children of kern_sched_stats and reset the
    counters via the oid_arg1 pointer. This allows us to add arbitrary
    counters to the tree and still reset them properly.
  - Define a set of switch types to be passed with flags to
    mi_switch(). These types are named SWT_*. These types correspond to
    SCHED_STATS counters and are automatically handled in this way.
  - Make the new SWT_ types more specific than the older switch stats.
    There are now stats for idle switches, remote idle wakeups, remote
    preemption, ithreads idling, etc.
  - Add switch statistics for ULE's pickcpu algorithm. These stats
    include how much migration there is, how often affinity was
    successful, how often threads were migrated to the local cpu on
    wakeup, etc.
  Sponsored by: Nokia
* alc, 2008-04-06 (4 files, -25/+99):
  Introduce vm_reserv_reclaim_contig(). This function is used by
  contigmalloc(9) as a last resort to steal pages from an inactive,
  partially-used superpage reservation.
  Rename vm_reserv_reclaim() to vm_reserv_reclaim_inactive() and
  refactor it so that a separate subroutine is responsible for breaking
  the selected reservation. This subroutine is also used by
  vm_reserv_reclaim_contig().
* alc, 2008-04-05 (1 file, -1/+1):
  Eliminate an unnecessary test from vm_phys_unfree_page().
* alc, 2008-04-04 (1 file, -2/+2):
  Update a comment in vm_map_pmap_enter().
* alc, 2008-04-04 (2 files, -0/+5):
  Reintroduce UMA_SLAB_KMAP; however, change its spelling to
  UMA_SLAB_KERNEL for consistency with its sibling UMA_SLAB_KMEM.
  (UMA_SLAB_KMAP met its original demise in revision 1.30 of
  vm/uma_core.c.)
  UMA_SLAB_KERNEL is now required by the jumbo frame allocators.
  Without it, UMA cannot correctly return pages from the jumbo frame
  zones to the VM system because it resets the pages' object field to
  NULL instead of the kernel object. In more detail, the jumbo frame
  zones are created with the option UMA_ZONE_REFCNT. This causes UMA to
  overwrite the pages' object field with the address of the slab.
  However, when UMA wants to release these pages, it doesn't know how
  to restore the object field, so it sets it to NULL. This change
  teaches UMA how to reset the object field to the kernel object.
  Crashes reported by: kris
  Fix tested by: kris
  Fix discussed with: jeff
  MFC after: 6 weeks
* alc, 2008-03-30 (1 file, -4/+2):
  Eliminate an unnecessary printf() from kmem_suballoc(). The
  subsequent panic() can be extended to convey the same information.
* jeff, 2008-03-29 (1 file, -15/+2):
  Use vm_object_reference_locked() directly from vm_object_reference().
  This is intended to get rid of vget() consumers who don't wish to
  acquire a lock. This is functionally the same as calling vref();
  vm_object_reference_locked() already uses vref.
  Discussed with: alc
* kib, 2008-03-20 (1 file, -4/+16):
  Do not dereference cdev->si_cdevsw; use dev_refthread() to properly
  obtain the reference. In particular, this fixes the panic reported in
  the PR. Remove the comments stating that this needs to be done.
  PR: kern/119422
  MFC after: 1 week