path: root/sys/vm
Commit message | Author | Age | Files | Lines
* Introduce the function kmem_alloc_attr(), which allocates kernel virtual memory with the specified physical attributes. In particular, like kmem_alloc_contig(), the caller can specify the physical address range from which the physical pages are allocated and the memory attributes (i.e., cache behavior) for these physical pages. However, in contrast to kmem_alloc_contig() or contigmalloc(), the physical pages that are allocated by kmem_alloc_attr() are not necessarily physically contiguous. This function is needed by DRM and VirtualBox.
  Correct an error in the prototype for kmem_malloc(). The third argument had the wrong type.
  Tested by: rnoland
  MFC after: 3 days
  [alc | 2010-04-09 | 2 files | -19/+100]
* Start copyright notice with /*-
  [joel | 2010-04-07 | 2 files | -2/+2]
* When OOM searches for a process to kill, ignore the processes already killed by OOM. When a killed process waits for a page allocation, try to satisfy the request as fast as possible. This removes the often-encountered deadlock where OOM continuously selects the same victim process, which sleeps uninterruptibly waiting for a page.
  The killed process may still sleep if a page cannot be obtained immediately, but testing has shown that the system has a much higher chance of surviving an OOM situation with the patch.
  In collaboration with: pho
  Reviewed by: alc
  MFC after: 4 weeks
  [kib | 2010-04-06 | 2 files | -8/+17]
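The selection rule in the entry above can be modeled in user space. This is only a minimal sketch under stated assumptions, not the kernel code: `struct proc_model`, its `killed` flag, and `pick_oom_victim()` are hypothetical names invented for illustration, and "largest resident size" is an assumed ranking criterion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a process entry. */
struct proc_model {
    int    pid;
    size_t resident_pages;  /* memory the process currently holds */
    int    killed;          /* nonzero once OOM has already chosen it */
};

/*
 * Return the index of the largest process that has not already been
 * killed by OOM, or -1 if every candidate was killed earlier.
 */
static int
pick_oom_victim(const struct proc_model *procs, size_t n)
{
    int best = -1;

    assert(procs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (procs[i].killed)
            continue;   /* skip earlier victims so they are not reselected */
        if (best < 0 ||
            procs[i].resident_pages > procs[(size_t)best].resident_pages)
            best = (int)i;
    }
    return best;
}
```

Skipping entries that are already marked is what keeps the search from returning the same sleeping victim on every pass.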
* vm_reserv_alloc_page() should never be called on an OBJT_SG object, just as it is never called on an OBJT_DEVICE object. (This change should have been included in r195840.)
  Reported by: dougb@, avg@
  MFC after: 3 days
  [alc | 2010-04-05 | 1 file | -0/+1]
* Make _vm_map_init() the one place where the vm map's pmap field is initialized.
  Reviewed by: kib
  [alc | 2010-04-03 | 2 files | -10/+10]
* Re-enable the call to pmap_release() by vmspace_dofree(). The accounting problem that is described in the comment has been addressed.
  Submitted by: kib
  Tested by: pho (a few months ago)
  MFC after: 6 weeks
  [alc | 2010-04-03 | 1 file | -6/+3]
* Reject attempts to create a MAP_ANON mapping with a non-zero offset.
  PR: kern/71258
  Submitted by: Alexander Best
  MFC after: 2 weeks
  [jhb | 2010-03-23 | 1 file | -2/+1]
* - enable alignment on amd64 only
  - only align pcpu caches and the volatile portion of uma_zone
  [kmacy | 2010-03-22 | 1 file | -2/+6]
* turn 205266 into a no-op until the problem can be properly diagnosed
  [kmacy | 2010-03-18 | 1 file | -1/+1]
* Cache line align various structures and move volatile counters so that they do not share a cache line with (mostly) immutable state
  Reviewed by: jeff@
  MFC after: 7 days
  [kmacy | 2010-03-17 | 1 file | -6/+14]
* Update comment for vm_page_alloc(9), listing all acceptable flags [1]. Note that the function does not sleep, it can block.
  Submitted by: Giovanni Trematerra <giovanni.trematerra gmail com> [1]
  MFC after: 3 days
  [kib | 2010-02-27 | 1 file | -1/+6]
* Remove write-only variable.
  MFC after: 3 days
  [kib | 2010-02-22 | 1 file | -3/+0]
* Align the start of the clean submap to a superpage boundary. Although no superpage mappings are created within the clean submap, aligning the start of the clean submap helps to prevent interference with kmem_alloc()'s use of superpages.
  [alc | 2010-02-21 | 1 file | -1/+1]
* The MAP_ENTRY_NEEDS_COPY flag belongs to protoeflags; the cow variable uses a different namespace.
  Reported by: Jonathan Anderson <jonathan.anderson cl cam ac uk>
  MFC after: 3 days
  [kib | 2010-01-29 | 1 file | -1/+1]
* When a vnode-backed vm object is referenced, it increments the vnode reference count, and decrements it on dereference. If a referenced object is deallocated, the object type is reset to OBJT_DEAD. Consequently, all vnode references that are owned by object references are never released. vunref() the vnode in the vm object deallocation code for OBJT_VNODE the appropriate number of times to prevent the leak.
  Add an assertion to vm_pageout() to make sure that we never get a reference on the vnode but then do not execute the code to release it.
  In collaboration with: pho
  Reviewed by: alc
  MFC after: 3 weeks
  [kib | 2010-01-17 | 2 files | -1/+8]
* Update d_mmap() to accept vm_ooffset_t and vm_memattr_t. This replaces d_mmap() with the d_mmap2() implementation and also changes the type of offset to vm_ooffset_t.
  Purge d_mmap2().
  All driver modules will need to be rebuilt since D_VERSION is also bumped.
  Reviewed by: jhb@
  MFC after: Not in this lifetime...
  [rnoland | 2009-12-29 | 1 file | -14/+3]
* (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. Fix some wrong usages.
  Note: this does not affect generated binaries as this argument is not used.
  PR: 137213
  Submitted by: Eygene Ryabinkin (initial version)
  MFC after: 1 month
  [antoine | 2009-12-28 | 1 file | -2/+2]
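For reference, a minimal user-space sketch of the usage this entry describes, written against `<sys/queue.h>`. The names `struct item`, `items`, `item_push()`, and `item_count()` are hypothetical and appear only for illustration; none of them comes from the changed file.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct item {
    int value;
    LIST_ENTRY(item) link;   /* list linkage embedded in each element */
};

/* The initializer is passed the head variable itself as its argument. */
static LIST_HEAD(item_list, item) items = LIST_HEAD_INITIALIZER(items);

/* Push an element onto the front of the statically initialized list. */
static void
item_push(struct item *it)
{
    assert(it != NULL);
    LIST_INSERT_HEAD(&items, it, link);
}

/* Count the elements by walking the list from the head. */
static int
item_count(void)
{
    struct item *it;
    int n = 0;

    LIST_FOREACH(it, &items, link)
        n++;
    return n;
}
```

Because the macro ignores its argument in common implementations, a wrong argument compiles to the same binary, which matches the note in the commit message.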
* VI_OBJDIRTY vnode flag mirrors the state of OBJ_MIGHTBEDIRTY vm object flag. Besides providing redundant information, the need to update both vnode and object flags causes more acquisitions of the vnode interlock. OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm objects.
  Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for vnode-backed vm objects.
  Suggested and reviewed by: alc
  Tested by: pho
  MFC after: 3 weeks
  [kib | 2009-12-21 | 2 files | -22/+6]
* Remove trailing ";" in UMA_HASH_INSERT and UMA_HASH_REMOVE macros.
  MFC after: 1 month
  [antoine | 2009-12-05 | 1 file | -2/+2]
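Why a trailing ";" inside a statement-like macro matters can be shown with a small sketch. `COUNTER_BUMP` and `bump_if()` are hypothetical; they are not the UMA macros, and the commit message does not say which call sites, if any, were affected.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement-like macro, defined WITHOUT a trailing semicolon. */
#define COUNTER_BUMP(c) do { (*(c))++; } while (0)

/*
 * If the macro body ended in ';', then "COUNTER_BUMP(hits);" would expand
 * to two statements, and the else below would no longer pair with the if,
 * which is a compile error.
 */
static int
bump_if(int cond, int *hits, int *misses)
{
    assert(hits != NULL && misses != NULL);
    if (cond)
        COUNTER_BUMP(hits);
    else
        COUNTER_BUMP(misses);
    return *hits - *misses;
}
```

Leaving the semicolon to the caller lets the macro be used anywhere a single statement is expected.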
* Properly synchronize the previous change.
  [alc | 2009-11-28 | 1 file | -0/+2]
* Support the new VM_PROT_COPY option on wired pages. The effect is that a debugger can now set a breakpoint in a program that uses mlock(2) on its text segment or mlockall(2) on its entire address space.
  [alc | 2009-11-27 | 1 file | -3/+6]
* Simplify the invocation of vm_fault(). Specifically, eliminate the flag VM_FAULT_DIRTY. The information provided by this flag can be trivially inferred by vm_fault().
  Discussed with: kib
  [alc | 2009-11-27 | 2 files | -9/+11]
* Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. VM_PROT_OVERRIDE_WRITE has represented a write access that is allowed to override write protection. Until now, VM_PROT_OVERRIDE_WRITE has been used to write breakpoints into text pages. Text pages are not just write protected but they are also copy-on-write. VM_PROT_OVERRIDE_WRITE overrides the write protection on the text page and triggers the replication of the page so that the breakpoint will be written to a private copy.
  However, here is where things become confused. It is the debugger, not the process being debugged that requires write access to the copied page. Nonetheless, the copied page is being mapped into the process with write access enabled. In other words, once the debugger sets a breakpoint within a text page, the program can write to its private copy of that text page. Whereas prior to setting the breakpoint, a SIGSEGV would have occurred upon a write access.
  VM_PROT_COPY addresses this problem. The combination of VM_PROT_READ and VM_PROT_COPY forces the replication of a copy-on-write page even though the access is only for read. Moreover, the replicated page is only mapped into the process with read access, and not write access.
  Reviewed by: kib
  MFC after: 4 weeks
  [alc | 2009-11-26 | 3 files | -23/+10]
* Simplify both the invocation and the implementation of vm_fault() for wiring pages.
  (Note: Claims made in the comments about the handling of breakpoints in wired pages have been false for roughly a decade. This and another bug involving breakpoints will be fixed in coming changes.)
  Reviewed by: kib
  [alc | 2009-11-18 | 4 files | -37/+16]
* Eliminate an unnecessary #include. (This #include should have been removed in r188331 when vnode_pager_lock() was eliminated.)
  [alc | 2009-11-04 | 1 file | -1/+0]
* Eliminate a bit of hackery from vm_fault(). The operations that this hackery sought to prevent are now properly supported by vm_map_protect(). (See r198505.)
  Reviewed by: kib
  [alc | 2009-11-03 | 1 file | -11/+0]
* Split P_NOLOAD into a per-thread flag (TDF_NOLOAD). This improvement aims to avoid further cache misses in scheduler-specific functions that need to keep track of average thread running time, and further locking in places that set this flag.
  Reported by: jeff (originally), kris (currently)
  Reviewed by: jhb
  Tested by: Giuseppe Cocomazzi <sbudella at email dot it>
  [attilio | 2009-11-03 | 1 file | -11/+6]
* Avoid pointless calls to pmap_protect().
  Reviewed by: kib
  [alc | 2009-11-02 | 1 file | -3/+3]
* Add sysctl documentation strings. The descriptions are derived from tuning(7). One of the descriptions references tuning(7) because it is too complex to adequately describe here (it is not a simple boolean sysctl) and users should be warned of that.
  Reviewed by: alc, kib
  Approved by: gnn (mentor)
  [ivoras | 2009-11-02 | 1 file | -3/+7]
* Correct an error in vm_fault_copy_entry() that has existed since the first version of this file. When a process forks, any wired pages are immediately copied because copy-on-write is not supported for wired pages. In other words, the child process is given its own private copy of each wired page from its parent's address space. Unfortunately, to date, these copied pages have been mapped into the child's address space with the wrong permissions, typically VM_PROT_ALL. This change corrects the permissions.
  Reviewed by: kib
  [alc | 2009-10-31 | 1 file | -1/+1]
* When the protection of a wired read-only mapping is changed to read-write, install a new shadow object behind the map entry and copy the pages from the underlying objects to it. This makes the mprotect(2) call actually perform the requested operation instead of silently doing nothing and returning success, which caused SIGSEGV on a later write access to the mapping.
  Reuse vm_fault_copy_entry() to do the copying, modifying it to behave correctly when src_entry == dst_entry.
  Reviewed by: alc
  MFC after: 3 weeks
  [kib | 2009-10-27 | 2 files | -20/+56]
* Simplify the inner loop of vm_fault_copy_entry().
  Reviewed by: kib
  [alc | 2009-10-26 | 1 file | -13/+12]
* Eliminate an unnecessary check from vm_fault_prefault().
  [alc | 2009-10-25 | 1 file | -2/+2]
* o Introduce vm_sync_icache() for making the I-cache coherent with the memory or D-cache, depending on the semantics of the platform. vm_sync_icache() is basically a wrapper around pmap_sync_icache() that translates the vm_map_t argument to pmap_t.
  o Introduce pmap_sync_icache() to all PMAP implementations. For powerpc it replaces the pmap_page_executable() function, added to solve the I-cache problem in uiomove_fromphys().
  o In proc_rwmem() call vm_sync_icache() when writing to a page that has execute permissions. This assures that when breakpoints are written, the I-cache will be coherent and the process will actually hit the breakpoint.
  o This also fixes the Book-E PMAP implementation that was missing necessary locking while trying to deal with the I-cache coherency in pmap_enter() (read: mmu_booke_enter_locked).
  The key property of this change is that the I-cache is made coherent *after* writes have been done. Doing it in the PMAP layer when adding or changing a mapping means that the I-cache is made coherent *before* any writes happen. The difference is key when the I-cache prefetches.
  [marcel | 2009-10-21 | 3 files | -0/+9]
* Remove spurious call to priv_check(PRIV_VM_SWAP_NOQUOTA). Call priv_check(PRIV_VM_SWAP_NORLIMIT) only when the per-uid limit is actually exceeded. Both changes aim at calling priv_check(9) only for the cases when the privilege is actually exercised by the process.
  Reported and tested by: rwatson
  Reviewed by: alc
  MFC after: 3 days
  [kib | 2009-10-18 | 1 file | -6/+4]
* Align and pad the page queue and free page queue locks so that the linker can't possibly place them together within the same cache line.
  MFC after: 3 weeks
  [alc | 2009-10-04 | 2 files | -4/+14]
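The align-and-pad technique can be illustrated with a user-space C11 sketch. The 64-byte line size and the names `CACHE_LINE`, `struct padded_lock`, `queue_locks`, and `same_cache_line()` are assumptions made for this example; the kernel's own lock types and alignment macros are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size; the real value is platform-specific */

/* Hypothetical lock word, aligned and padded to occupy a whole cache line. */
struct padded_lock {
    _Alignas(CACHE_LINE) volatile int locked;
    char pad[CACHE_LINE - sizeof(int)];
};

_Static_assert(sizeof(struct padded_lock) == CACHE_LINE,
    "each lock must fill exactly one cache line");

/* Two adjacent locks, standing in for two independently contended locks. */
static struct padded_lock queue_locks[2];

/* Return nonzero if the two addresses fall within the same cache line. */
static int
same_cache_line(const void *a, const void *b)
{
    assert(a != NULL && b != NULL);
    return ((uintptr_t)a / CACHE_LINE) == ((uintptr_t)b / CACHE_LINE);
}
```

Alignment fixes where each lock starts and padding fixes how much room it takes, so neither the compiler nor the linker can pack a second hot variable into the same line.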
* Back out the functional parts from r197537. After r197711, affecting all user mappings, mmap no longer needs special treatment.
  [bz | 2009-10-02 | 1 file | -15/+0]
* Move the annotation for vm_map_startup() immediately before the function.
  MFC after: 3 days
  [kib | 2009-10-01 | 1 file | -16/+16]
* Do not allow mmap with the MAP_FIXED argument to map at address zero. This is done to make it harder to exploit kernel NULL pointer security vulnerabilities. While this of course does not fix vulnerabilities, it does mitigate their impact.
  Note that this may break some applications, most likely emulators or similar, which for one reason or another require mapping memory at zero.
  This restriction can be disabled with the security.bsd.mmap_zero sysctl variable.
  Discussed with: rwatson, bz
  Tested by: bz (Wine), simon (VirtualBox)
  Submitted by: jhb
  [simon | 2009-09-27 | 1 file | -1/+18]
* Old (a.out) rtld attempts to mmap a zero-length region, e.g. when the bss of the linked object is zero-length. More old code assumes that mmap of zero length returns success.
  For a.out and pre-8 ELF binaries, allow the mmap of zero length.
  Reported by: tegge
  Reviewed by: tegge, alc, jhb
  MFC after: 3 days
  [kib | 2009-09-20 | 1 file | -1/+3]
* Reintroduce the r196640, after fixing the problem with my testing.
  Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equal to the actual, and reallocate the stack if sizes differ [1].
  This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size.
  Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, a system with a reasonable number of threads gets lower latency of thread creation, while still not exhausting a significant portion of KVA for unused kstacks.
  Submitted by: peter [1]
  Discussed with: jhb, julian, peter
  Reviewed by: jhb
  Tested by: pho (and retested according to new test scenarios)
  MFC after: 1 week
  [kib | 2009-09-01 | 2 files | -45/+95]
* Reverse r196640 and r196644 for now.
  [kib | 2009-08-29 | 2 files | -95/+45]
* Remove the altkstacks, instead instantiate threads with kernel stack allocated with the right size from the start. For the thread that has kernel stack cached, verify that requested stack size is equal to the actual, and reallocate the stack if sizes differ [1].
  This fixes the bug introduced by r173361 that was committed several days after r173004 and consisted of kthread_add(9) ignoring the non-default kernel stack size.
  Also, r173361 removed the caching of the kernel stacks for a non-first thread in the process. Introduce separate kernel stack cache that keeps some limited amount of preallocated kernel stacks to lower the latency of thread allocation. Add vm_lowmem handler to prune the cache on low memory condition. This way, a system with a reasonable number of threads gets lower latency of thread creation, while still not exhausting a significant portion of KVA for unused kstacks.
  Submitted by: peter [1]
  Discussed with: jhb, julian, peter
  Reviewed by: jhb
  Tested by: pho
  MFC after: 1 week
  [kib | 2009-08-29 | 2 files | -45/+95]
* Mark the fake pages constructed by the OBJT_SG pager valid. This was accidentally lost at one point during the PAT development. Without this fix vm_pager_get_pages() was zeroing each of the pages.
  Submitted by: czander @ NVidia
  MFC after: 3 days
  [jhb | 2009-08-29 | 1 file | -0/+1]
* Extend the device pager to support different memory attributes on different pages in an object.
  - Add a new variant of d_mmap() currently called d_mmap2() which accepts an additional in/out parameter that is the memory attribute to use for the requested page.
  - A driver either uses d_mmap() or d_mmap2() for all requests but not both. The current implementation uses a flag in the cdevsw (D_MMAP2) to indicate that the driver provides a d_mmap2() handler instead of d_mmap(). This is done to make the change ABI compatible with existing drivers and MFC'able to 7 and 8.
  Submitted by: alc
  MFC after: 1 month
  [jhb | 2009-08-28 | 2 files | -11/+21]
* Remove debugging that crept in with previous commit.
  Reported by: nwhitehorn
  Approved by: re (kib)
  [jhb | 2009-07-24 | 1 file | -5/+1]
* Add a new type of VM object: OBJT_SG. An OBJT_SG object is very similar to a device pager (OBJT_DEVICE) object in that it uses fictitious pages to provide aliases to other memory addresses. The primary difference is that it uses an sglist(9) to determine the physical addresses for a given offset into the object instead of invoking the d_mmap() method in a device driver.
  Reviewed by: alc
  Approved by: re (kensmith)
  MFC after: 2 weeks
  [jhb | 2009-07-24 | 11 files | -11/+293]
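The offset-to-address lookup that a scatter/gather list makes possible can be sketched in user space as follows. This is an illustrative model only: `struct sg_seg` and `sg_offset_to_paddr()` are hypothetical names and do not reproduce the sglist(9) API or the pager's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical segment: one physically contiguous run of bytes. */
struct sg_seg {
    uint64_t paddr;  /* starting physical address of the run */
    uint64_t len;    /* length of the run in bytes */
};

/*
 * Translate an offset into the object to a physical address by walking
 * the segments in order. Returns 0 on success, or -1 if the offset lies
 * beyond the end of the list.
 */
static int
sg_offset_to_paddr(const struct sg_seg *segs, size_t nsegs,
    uint64_t offset, uint64_t *paddr)
{
    assert(paddr != NULL);
    assert(segs != NULL || nsegs == 0);
    for (size_t i = 0; i < nsegs; i++) {
        if (offset < segs[i].len) {
            *paddr = segs[i].paddr + offset;
            return 0;
        }
        offset -= segs[i].len;  /* skip past this segment */
    }
    return -1;
}
```

With segments of 0x2000 bytes at 0x10000 and 0x1000 bytes at 0x80000, offset 0x2000 falls at the start of the second segment, so no driver callback is needed to resolve it.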
* Change the handling of fictitious pages by pmap_page_set_memattr() on amd64 and i386. Essentially, fictitious pages provide a mechanism for creating aliases for either normal or device-backed pages. Therefore, pmap_page_set_memattr() on a fictitious page needn't update the direct map or flush the cache. Such actions are the responsibility of the "primary" instance of the page or the device driver that "owns" the physical address. For example, these actions are already performed by pmap_mapdev().
  The device pager needn't restore the memory attributes on a fictitious page before releasing it. It's now pointless.
  Add pmap_page_set_memattr() to the Xen pmap.
  Approved by: re (kib)
  [alc | 2009-07-19 | 1 file | -12/+4]
* An addendum to r195649, "Add support to the virtual memory system for configuring machine-dependent memory attributes...":
  Don't set the memory attribute for a "real" page that is allocated to a device object in vm_page_alloc(). It is a pointless act, because the device pager replaces this "real" page with a "fake" page and sets the memory attribute on that "fake" page.
  Eliminate pointless code from pmap_cache_bits() on amd64.
  Employ the "Self Snoop" feature supported by some x86 processors to avoid cache flushes in the pmap.
  Approved by: re (kib)
  [alc | 2009-07-18 | 1 file | -1/+3]
* - Change mmap() to fail requests with EINVAL that pass a length of 0. This behavior is mandated by POSIX.
  - Do not fail requests that pass a length greater than SSIZE_MAX (such as > 2GB on 32-bit platforms). The 'len' parameter is actually an unsigned 'size_t' so negative values don't really make sense.
  Submitted by: Alexander Best alexbestms at math.uni-muenster.de
  Reviewed by: alc
  Approved by: re (kib)
  MFC after: 1 week
  [jhb | 2009-07-14 | 1 file | -1/+1]
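The zero-length behavior described above can be observed from user space. The following is a minimal sketch, assuming a POSIX system that provides anonymous mappings through `MAP_ANON` or `MAP_ANONYMOUS`; `try_anon_mmap()` is a hypothetical helper written for this example.

```c
#define _DEFAULT_SOURCE   /* expose MAP_ANON/MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANON
#define MAP_ANON MAP_ANONYMOUS
#endif

/*
 * Attempt an anonymous private mapping of the given length.
 * Returns 0 on success (after unmapping again) or the errno value
 * reported by mmap() on failure.
 */
static int
try_anon_mmap(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
        MAP_PRIVATE | MAP_ANON, -1, 0);
    int rc;

    if (p == MAP_FAILED)
        return errno;
    rc = munmap(p, len);
    assert(rc == 0);
    (void)rc;   /* keep the variable used when NDEBUG is defined */
    return 0;
}
```

Calling `try_anon_mmap(0)` is expected to yield `EINVAL` on a POSIX-conforming mmap(), while a one-page request normally succeeds.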