path: root/sys/vm
Commit message (author, date; files changed, lines -/+)
* When a vnode-backed vm object is referenced, it increments the vnode (kib, 2010-01-17; 2 files, -1/+8)
  reference count, and decrements it on dereference. If a referenced
  object is deallocated, its type is reset to OBJT_DEAD. Consequently,
  all vnode references that are owned by object references are never
  released. vunref() the vnode in the vm object deallocation code for
  OBJT_VNODE an appropriate number of times to prevent the leak. Add an
  assertion to vm_pageout() to make sure that we never take a reference
  on the vnode but then fail to execute the code that releases it.

  In collaboration with: pho
  Reviewed by: alc
  MFC after: 3 weeks
* Update d_mmap() to accept vm_ooffset_t and vm_memattr_t. (rnoland, 2009-12-29; 1 file, -14/+3)
  This replaces d_mmap() with the d_mmap2() implementation and also
  changes the type of offset to vm_ooffset_t. Purge d_mmap2().

  All driver modules will need to be rebuilt since D_VERSION is also
  bumped.

  Reviewed by: jhb@
  MFC after: Not in this lifetime...
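  For reference, a minimal sketch of the post-change handler shape; the
  mydev name is hypothetical and all error handling is omitted:

	#include <sys/param.h>
	#include <sys/conf.h>
	#include <vm/vm.h>

	static int
	mydev_mmap(struct cdev *dev, vm_ooffset_t offset, vm_paddr_t *paddr,
	    int nprot, vm_memattr_t *memattr)
	{
		/* Map the requested offset 1:1 onto physical memory. */
		*paddr = offset;
		return (0);
	}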
* (S)LIST_HEAD_INITIALIZER takes a (S)LIST_HEAD as an argument. (antoine, 2009-12-28; 1 file, -2/+2)
  Fix some incorrect usages.
  Note: this does not affect generated binaries, as this argument is not
  used.

  PR: 137213
  Submitted by: Eygene Ryabinkin (initial version)
  MFC after: 1 month
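  The correct pattern, as a small sketch (the item/itemhead names are
  hypothetical): the initializer's argument is the head variable itself,
  which the macro happens to ignore, which is why the wrong usages did
  not change the generated binaries.

	#include <sys/queue.h>

	struct item {
		int value;
		SLIST_ENTRY(item) link;
	};

	/* Pass the head variable, not the element type or struct tag. */
	static SLIST_HEAD(itemhead, item) items =
	    SLIST_HEAD_INITIALIZER(items);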
* VI_OBJDIRTY vnode flag mirrors the state of the OBJ_MIGHTBEDIRTY vm object (kib, 2009-12-21; 2 files, -22/+6)
  flag. Besides providing redundant information, the need to update both
  vnode and object flags causes more acquisitions of the vnode
  interlock. OBJ_MIGHTBEDIRTY is only checked for vnode-backed vm
  objects.

  Remove VI_OBJDIRTY and make sure that OBJ_MIGHTBEDIRTY is set only for
  vnode-backed vm objects.

  Suggested and reviewed by: alc
  Tested by: pho
  MFC after: 3 weeks
* Remove trailing ";" in UMA_HASH_INSERT and UMA_HASH_REMOVE macros. (antoine, 2009-12-05; 1 file, -2/+2)
  MFC after: 1 month
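  The trailing ";" matters because these macros are used as statements.
  A sketch of the failure mode, with a hypothetical macro:

	#define	BAD_INSERT(h, e)	hash_insert((h), (e));	/* note ";" */

	if (cond)
		BAD_INSERT(h, e);	/* expands to "hash_insert(h, e);;" */
	else				/* the extra ";" is an empty statement
					   that closes the "if", so this "else"
					   no longer parses */
		cleanup();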
* Properly synchronize the previous change. (alc, 2009-11-28; 1 file, -0/+2)
* Support the new VM_PROT_COPY option on wired pages. (alc, 2009-11-27; 1 file, -3/+6)
  The effect is that a debugger can now set a breakpoint in a program
  that uses mlock(2) on its text segment or mlockall(2) on its entire
  address space.
* Simplify the invocation of vm_fault(). (alc, 2009-11-27; 2 files, -9/+11)
  Specifically, eliminate the flag VM_FAULT_DIRTY. The information
  provided by this flag can be trivially inferred by vm_fault().

  Discussed with: kib
* Replace VM_PROT_OVERRIDE_WRITE by VM_PROT_COPY. (alc, 2009-11-26; 3 files, -23/+10)
  VM_PROT_OVERRIDE_WRITE has represented a write access that is allowed
  to override write protection. Until now, VM_PROT_OVERRIDE_WRITE has
  been used to write breakpoints into text pages. Text pages are not
  just write protected; they are also copy-on-write.
  VM_PROT_OVERRIDE_WRITE overrides the write protection on the text page
  and triggers the replication of the page so that the breakpoint will
  be written to a private copy. However, here is where things become
  confused. It is the debugger, not the process being debugged, that
  requires write access to the copied page. Nonetheless, the copied page
  is being mapped into the process with write access enabled. In other
  words, once the debugger sets a breakpoint within a text page, the
  program can write to its private copy of that text page, whereas prior
  to setting the breakpoint, a write access would have raised SIGSEGV.

  VM_PROT_COPY addresses this problem. The combination of VM_PROT_READ
  and VM_PROT_COPY forces the replication of a copy-on-write page even
  though the access is only for read. Moreover, the replicated page is
  only mapped into the process with read access, and not write access.

  Reviewed by: kib
  MFC after: 4 weeks
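  In the kernel, a debugger-style writer can now request replication
  without granting the target process write access. A sketch of the
  idea (the surrounding variables are hypothetical):

	vm_prot_t reqprot;
	int error;

	/* Force a private copy before a ptrace(2)-style write. */
	reqprot = writing ? (VM_PROT_COPY | VM_PROT_READ) : VM_PROT_READ;
	error = vm_fault(map, trunc_page(va), reqprot, VM_FAULT_NORMAL);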
* Simplify both the invocation and the implementation of vm_fault() for wiring (alc, 2009-11-18; 4 files, -37/+16)
  pages.

  (Note: Claims made in the comments about the handling of breakpoints
  in wired pages have been false for roughly a decade. This and another
  bug involving breakpoints will be fixed in coming changes.)

  Reviewed by: kib
* Eliminate an unnecessary #include. (alc, 2009-11-04; 1 file, -1/+0)
  (This #include should have been removed in r188331 when
  vnode_pager_lock() was eliminated.)
* Eliminate a bit of hackery from vm_fault(). (alc, 2009-11-03; 1 file, -11/+0)
  The operations that this hackery sought to prevent are now properly
  supported by vm_map_protect(). (See r198505.)

  Reviewed by: kib
* Split P_NOLOAD into a per-thread flag (TDF_NOLOAD). (attilio, 2009-11-03; 1 file, -11/+6)
  This improvement aims at avoiding further cache misses in
  scheduler-specific functions which need to keep track of average
  thread running time, and further locking in places that set this flag.

  Reported by: jeff (originally), kris (currently)
  Reviewed by: jhb
  Tested by: Giuseppe Cocomazzi <sbudella at email dot it>
* Avoid pointless calls to pmap_protect(). (alc, 2009-11-02; 1 file, -3/+3)
  Reviewed by: kib
* Add sysctl documentation strings. (ivoras, 2009-11-02; 1 file, -3/+7)
  The descriptions are derived from tuning(7). One of the descriptions
  references tuning(7) because it is too complex to adequately describe
  here (it is not a simple boolean sysctl), and users should be pointed
  there.

  Reviewed by: alc, kib
  Approved by: gnn (mentor)
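  A description string is the final argument of the SYSCTL_* macros; a
  sketch with a hypothetical knob:

	static int vm_example_enabled = 1;
	SYSCTL_INT(_vm, OID_AUTO, example_enabled, CTLFLAG_RW,
	    &vm_example_enabled, 0,
	    "Enable the example behavior (see tuning(7) for background)");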
* Correct an error in vm_fault_copy_entry() that has existed since the first (alc, 2009-10-31; 1 file, -1/+1)
  version of this file. When a process forks, any wired pages are
  immediately copied because copy-on-write is not supported for wired
  pages. In other words, the child process is given its own private copy
  of each wired page from its parent's address space. Unfortunately, to
  date, these copied pages have been mapped into the child's address
  space with the wrong permissions, typically VM_PROT_ALL. This change
  corrects the permissions.

  Reviewed by: kib
* When protection of a wired read-only mapping is changed to read-write, (kib, 2009-10-27; 2 files, -20/+56)
  install a new shadow object behind the map entry and copy the pages
  from the underlying objects to it. This makes the mprotect(2) call
  actually perform the requested operation instead of silently doing
  nothing and returning success, which caused SIGSEGV on later write
  access to the mapping.

  Reuse vm_fault_copy_entry() to do the copying, modifying it to behave
  correctly when src_entry == dst_entry.

  Reviewed by: alc
  MFC after: 3 weeks
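  A userland scenario of the kind this change fixes, as a sketch (error
  checking omitted):

	#include <sys/mman.h>
	#include <unistd.h>

	int
	main(void)
	{
		size_t sz = getpagesize();
		char *p = mmap(NULL, sz, PROT_READ,
		    MAP_ANON | MAP_PRIVATE, -1, 0);

		mlock(p, sz);			/* wire it read-only */
		mprotect(p, sz, PROT_READ | PROT_WRITE);
		p[0] = 1;	/* previously could fault even though
				   mprotect(2) had returned success */
		return (0);
	}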
* Simplify the inner loop of vm_fault_copy_entry(). (alc, 2009-10-26; 1 file, -13/+12)
  Reviewed by: kib
* Eliminate an unnecessary check from vm_fault_prefault(). (alc, 2009-10-25; 1 file, -2/+2)
* o Introduce vm_sync_icache() for making the I-cache coherent with (marcel, 2009-10-21; 3 files, -0/+9)
    the memory or D-cache, depending on the semantics of the platform.
    vm_sync_icache() is basically a wrapper around pmap_sync_icache()
    that translates the vm_map_t argument to a pmap_t.
  o Introduce pmap_sync_icache() in all PMAP implementations. For
    powerpc it replaces the pmap_page_executable() function, added to
    solve the I-cache problem in uiomove_fromphys().
  o In proc_rwmem() call vm_sync_icache() when writing to a page that
    has execute permissions. This ensures that when breakpoints are
    written, the I-cache will be coherent and the process will actually
    hit the breakpoint.
  o This also fixes the Book-E PMAP implementation that was missing
    necessary locking while trying to deal with the I-cache coherency in
    pmap_enter() (read: mmu_booke_enter_locked).

  The key property of this change is that the I-cache is made coherent
  *after* writes have been done. Doing it in the PMAP layer when adding
  or changing a mapping means that the I-cache is made coherent *before*
  any writes happen. The difference is key when the I-cache prefetches.
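  The wrapper itself is essentially the following (sketch):

	void
	vm_sync_icache(vm_map_t map, vm_offset_t va, vm_size_t sz)
	{
		/* Translate the map to its pmap and defer to the MD layer. */
		pmap_sync_icache(vm_map_pmap(map), va, sz);
	}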
* Remove spurious call to priv_check(PRIV_VM_SWAP_NOQUOTA). (kib, 2009-10-18; 1 file, -6/+4)
  Call priv_check(PRIV_VM_SWAP_NORLIMIT) only when the per-uid limit is
  actually exceeded. Both changes aim at calling priv_check(9) only in
  the cases when the privilege is actually exercised by the process.

  Reported and tested by: rwatson
  Reviewed by: alc
  MFC after: 3 days
* Align and pad the page queue and free page queue locks so that the linker (alc, 2009-10-04; 2 files, -4/+14)
  can't possibly place them together within the same cache line.

  MFC after: 3 weeks
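  One way to express this, as a sketch; the committed change pads the
  locks explicitly, so the exact layout may differ:

	/* Start each lock on its own cache-line boundary. */
	struct mtx vm_page_queue_mtx __aligned(CACHE_LINE_SIZE);
	struct mtx vm_page_queue_free_mtx __aligned(CACHE_LINE_SIZE);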
* Back out the functional parts from r197537. (bz, 2009-10-02; 1 file, -15/+0)
  After r197711, affecting all user mappings, mmap no longer needs
  special treatment.
* Move the annotation for vm_map_startup() immediately before the function. (kib, 2009-10-01; 1 file, -16/+16)
  MFC after: 3 days
* Do not allow mmap with the MAP_FIXED argument to map at address zero. (simon, 2009-09-27; 1 file, -1/+18)
  This is done to make it harder to exploit kernel NULL pointer security
  vulnerabilities. While this of course does not fix the vulnerabilities
  themselves, it does mitigate their impact.

  Note that this may break some applications, most likely emulators or
  similar, which for one reason or another require mapping memory at
  zero. This restriction can be disabled with the security.bsd.mmap_zero
  sysctl variable.

  Discussed with: rwatson, bz
  Tested by: bz (Wine), simon (VirtualBox)
  Submitted by: jhb
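  With the restriction enabled, a request like the following now fails
  (sketch; the exact errno is not stated above, so it is not asserted):

	#include <sys/mman.h>
	#include <stdio.h>
	#include <unistd.h>

	int
	main(void)
	{
		void *p = mmap((void *)0, getpagesize(),
		    PROT_READ | PROT_WRITE,
		    MAP_ANON | MAP_PRIVATE | MAP_FIXED, -1, 0);

		if (p == MAP_FAILED)
			perror("mmap at page zero rejected");
		return (0);
	}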
* Old (a.out) rtld attempts to mmap a zero-length region, e.g. when the bss (kib, 2009-09-20; 1 file, -1/+3)
  of the linked object is zero-length. More old code assumes that mmap
  of zero length returns success.

  For a.out and pre-8 ELF binaries, allow mmap of zero length.

  Reported by: tegge
  Reviewed by: tegge, alc, jhb
  MFC after: 3 days
* Reintroduce r196640, after fixing the problem with my testing. (kib, 2009-09-01; 2 files, -45/+95)
  Remove the altkstacks; instead, instantiate threads with a kernel
  stack allocated with the right size from the start. For a thread that
  has its kernel stack cached, verify that the requested stack size is
  equal to the actual one, and reallocate the stack if the sizes differ
  [1].

  This fixes the bug introduced by r173361, which was committed several
  days after r173004 and consisted of kthread_add(9) ignoring the
  non-default kernel stack size. Also, r173361 removed the caching of
  the kernel stacks for a non-first thread in the process.

  Introduce a separate kernel stack cache that keeps some limited number
  of preallocated kernel stacks to lower the latency of thread
  allocation. Add a vm_lowmem handler to prune the cache on low memory
  conditions. This way, systems with a reasonable number of threads get
  lower latency of thread creation, while still not exhausting a
  significant portion of KVA for unused kstacks.

  Submitted by: peter [1]
  Discussed with: jhb, julian, peter
  Reviewed by: jhb
  Tested by: pho (and retested according to new test scenarios)
  MFC after: 1 week
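  The low-memory hook follows the usual vm_lowmem pattern. A sketch with
  hypothetical function names (the handler signature is hedged):

	static void
	kstack_cache_lowmem(void *arg, int flags)
	{
		/* Release cached kernel stacks back to the VM system. */
	}

	static void
	kstack_cache_init(void *arg)
	{
		EVENTHANDLER_REGISTER(vm_lowmem, kstack_cache_lowmem, NULL,
		    EVENTHANDLER_PRI_ANY);
	}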
* Reverse r196640 and r196644 for now. (kib, 2009-08-29; 2 files, -95/+45)
* Remove the altkstacks; instead, instantiate threads with a kernel stack (kib, 2009-08-29; 2 files, -45/+95)
  allocated with the right size from the start. For a thread that has
  its kernel stack cached, verify that the requested stack size is equal
  to the actual one, and reallocate the stack if the sizes differ [1].

  This fixes the bug introduced by r173361, which was committed several
  days after r173004 and consisted of kthread_add(9) ignoring the
  non-default kernel stack size. Also, r173361 removed the caching of
  the kernel stacks for a non-first thread in the process.

  Introduce a separate kernel stack cache that keeps some limited number
  of preallocated kernel stacks to lower the latency of thread
  allocation. Add a vm_lowmem handler to prune the cache on low memory
  conditions. This way, systems with a reasonable number of threads get
  lower latency of thread creation, while still not exhausting a
  significant portion of KVA for unused kstacks.

  Submitted by: peter [1]
  Discussed with: jhb, julian, peter
  Reviewed by: jhb
  Tested by: pho
  MFC after: 1 week
* Mark the fake pages constructed by the OBJT_SG pager valid. (jhb, 2009-08-29; 1 file, -0/+1)
  This was accidentally lost at one point during the PAT development.
  Without this fix, vm_pager_get_pages() was zeroing each of the pages.

  Submitted by: czander @ NVidia
  MFC after: 3 days
* Extend the device pager to support different memory attributes on different (jhb, 2009-08-28; 2 files, -11/+21)
  pages in an object.
  - Add a new variant of d_mmap() currently called d_mmap2() which
    accepts an additional in/out parameter that is the memory attribute
    to use for the requested page.
  - A driver either uses d_mmap() or d_mmap2() for all requests, but not
    both. The current implementation uses a flag in the cdevsw (D_MMAP2)
    to indicate that the driver provides a d_mmap2() handler instead of
    d_mmap(). This is done to make the change ABI-compatible with
    existing drivers and MFC'able to 7 and 8.

  Submitted by: alc
  MFC after: 1 month
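  A sketch of a handler using the in/out attribute parameter; the driver
  names, offsets, and the exact transitional signature are hypothetical
  here:

	static int
	mydrv_mmap2(struct cdev *dev, vm_ooffset_t offset, vm_paddr_t *paddr,
	    int nprot, vm_memattr_t *memattr)
	{
		*paddr = mydrv_membase + offset;
		if (offset >= MYDRV_FB_OFFSET)	/* frame-buffer pages */
			*memattr = VM_MEMATTR_WRITE_COMBINING;
		return (0);
	}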
* Remove debugging that crept in with the previous commit. (jhb, 2009-07-24; 1 file, -5/+1)
  Reported by: nwhitehorn
  Approved by: re (kib)
* Add a new type of VM object: OBJT_SG. (jhb, 2009-07-24; 11 files, -11/+293)
  An OBJT_SG object is very similar to a device pager (OBJT_DEVICE)
  object in that it uses fictitious pages to provide aliases to other
  memory addresses. The primary difference is that it uses an sglist(9)
  to determine the physical addresses for a given offset into the object
  instead of invoking the d_mmap() method in a device driver.

  Reviewed by: alc
  Approved by: re (kensmith)
  MFC after: 2 weeks
* Change the handling of fictitious pages by pmap_page_set_memattr() on (alc, 2009-07-19; 1 file, -12/+4)
  amd64 and i386. Essentially, fictitious pages provide a mechanism for
  creating aliases for either normal or device-backed pages. Therefore,
  pmap_page_set_memattr() on a fictitious page needn't update the direct
  map or flush the cache. Such actions are the responsibility of the
  "primary" instance of the page or the device driver that "owns" the
  physical address. For example, these actions are already performed by
  pmap_mapdev().

  The device pager needn't restore the memory attributes on a fictitious
  page before releasing it. It's now pointless.

  Add pmap_page_set_memattr() to the Xen pmap.

  Approved by: re (kib)
* An addendum to r195649, "Add support to the virtual memory system for (alc, 2009-07-18; 1 file, -1/+3)
  configuring machine-dependent memory attributes...":

  Don't set the memory attribute for a "real" page that is allocated to
  a device object in vm_page_alloc(). It is a pointless act, because the
  device pager replaces this "real" page with a "fake" page and sets the
  memory attribute on that "fake" page.

  Eliminate pointless code from pmap_cache_bits() on amd64.

  Employ the "Self Snoop" feature supported by some x86 processors to
  avoid cache flushes in the pmap.

  Approved by: re (kib)
* - Change mmap() to fail requests with EINVAL that pass a length of 0. (jhb, 2009-07-14; 1 file, -1/+1)
    This behavior is mandated by POSIX.
  - Do not fail requests that pass a length greater than SSIZE_MAX (such
    as > 2GB on 32-bit platforms). The 'len' parameter is actually an
    unsigned 'size_t', so negative values don't really make sense.

  Submitted by: Alexander Best alexbestms at math.uni-muenster.de
  Reviewed by: alc
  Approved by: re (kib)
  MFC after: 1 week
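  The zero-length case can be demonstrated from userland (sketch):

	#include <sys/mman.h>
	#include <assert.h>
	#include <errno.h>

	int
	main(void)
	{
		void *p = mmap(NULL, 0, PROT_READ,
		    MAP_ANON | MAP_PRIVATE, -1, 0);

		assert(p == MAP_FAILED && errno == EINVAL);
		return (0);
	}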
* Add support to the virtual memory system for configuring machine- (alc, 2009-07-12; 10 files, -43/+125)
  dependent memory attributes:

  Rename vm_cache_mode_t to vm_memattr_t. The new name reflects the fact
  that there are machine-dependent memory attributes that have nothing
  to do with controlling the cache's behavior.

  Introduce vm_object_set_memattr() for setting the default memory
  attributes that will be given to an object's pages.

  Introduce and use pmap_page_{get,set}_memattr() for getting and
  setting a page's machine-dependent memory attributes. Add full support
  for these functions on amd64 and i386 and stubs for them on the other
  architectures. The function pmap_page_set_memattr() is also
  responsible for any other machine-dependent aspects of changing a
  page's memory attributes, such as flushing the cache or updating the
  direct map. The uses include kmem_alloc_contig(), vm_page_alloc(), and
  the device pager:

  kmem_alloc_contig() can now be used to allocate kernel memory with
  non-default memory attributes on amd64 and i386.

  vm_page_alloc() and the device pager will set the memory attributes
  for the real or fictitious page according to the object's default
  memory attributes.

  Update the various pmap functions on amd64 and i386 that map pages to
  incorporate each page's memory attributes in the mapping.

  Notes: (1) Inherent to this design are safety features that prevent
  the specification of inconsistent memory attributes by different
  mappings on amd64 and i386. In addition, the device pager provides a
  warning when a device driver creates a fictitious page with memory
  attributes that are inconsistent with the real page that the
  fictitious page is an alias for. (2) Storing the machine-dependent
  memory attributes for amd64 and i386 as a dedicated "int" in "struct
  md_page" represents a compromise between space efficiency and the ease
  of MFCing these changes to RELENG_7.

  In collaboration with: jhb
  Approved by: re (kib)
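  For example, kernel memory with a non-default attribute can now be
  obtained roughly as follows (sketch; the exact kmem_alloc_contig()
  parameter list of this era is hedged):

	vm_offset_t va;

	/* 1 MB of write-combining kernel memory, page-aligned. */
	va = kmem_alloc_contig(kernel_map, 1024 * 1024, M_WAITOK,
	    0, ~(vm_paddr_t)0, PAGE_SIZE, 0, VM_MEMATTR_WRITE_COMBINING);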
* When VM_MAP_WIRE_HOLESOK is not specified and vm_map_wire(9) encounters (kib, 2009-07-12; 1 file, -1/+1)
  a non-readable and non-executable map entry, the entry is skipped from
  wiring and the loop is aborted. But, since MAP_ENTRY_WIRE_SKIPPED was
  not set for the map entry, its wired_count is later erroneously
  decremented, and vm_map_delete(9) for such a map entry gets stuck in
  "vmmaps".

  Properly set MAP_ENTRY_WIRE_SKIPPED when aborting the loop.

  Reported by: John Marshall <john.marshall riverwillow com au>
  Approved by: re (kensmith)
* When forking a vm space that has wired map entries, do not forget to (kib, 2009-07-03; 3 files, -12/+16)
  charge the objects created by vm_fault_copy_entry. The object charge
  was set, but the reserve was not incremented.

  Reported by: Greg Rivers <gcr+freebsd-current tharned org>
  Reviewed by: alc (previous version)
  Approved by: re (kensmith)
* Eliminate code duplication by calling vm_object_destroy() (kib, 2009-06-28; 1 file, -18/+4)
  from vm_object_collapse().

  Requested and reviewed by: alc
  Approved by: re (kensmith)
* This change is the next step in implementing the cache control functionality (alc, 2009-06-26; 5 files, -6/+16)
  required by video card drivers. Specifically, this change introduces
  vm_cache_mode_t with an appropriate VM_CACHE_DEFAULT definition on all
  architectures. In addition, this change adds a vm_cache_mode_t
  parameter to kmem_alloc_contig() and vm_phys_alloc_contig(). These
  will be the interfaces for allocating mapped kernel memory and
  physical memory, respectively, with non-default cache modes.

  In collaboration with: jhb
* Change the type of the uio_resid member of struct uio from int to ssize_t. (kib, 2009-06-25; 1 file, -1/+1)
  Note that this does not actually enable full-range i/o requests for
  64-bit architectures; it is done now to update the KBI only.

  Tested by: pho
  Reviewed by: jhb, bde (as part of the review of the bigger patch)
* Initialize the uip to silence a gcc warning that appears in some (kib, 2009-06-24; 1 file, -0/+1)
  build environments.

  Reported by: alc, bf1783 at googlemail com
* The bits set in a page's dirty mask are a subset of the bits set in its (alc, 2009-06-24; 2 files, -10/+8)
  valid mask. Consequently, there is no need to perform a bit-wise and
  of the page's dirty and valid masks in order to determine which parts
  of a page are dirty and valid.

  Eliminate an unnecessary #include.
* Implement global and per-uid accounting of the anonymous memory. (kib, 2009-06-23; 16 files, -61/+575)
  Add rlimit RLIMIT_SWAP that limits the amount of swap that may be
  reserved for the uid.

  The accounting information (charge) is associated with either the map
  entry, or the vm object backing the entry, assuming the object is the
  first one in the shadow chain and the entry does not require COW.
  Charge is moved from entry to object on allocation of the object,
  e.g. during mmap, assuming the object is allocated, or on the first
  page fault on the entry. It moves back to the entry on forks due to
  the COW setup.

  The per-entry granularity of accounting makes the charge process fair
  for processes that change uid during their lifetime, and decrements
  the charge for the proper uid when the region is unmapped.

  The interface of vm_pager_allocate(9) is extended by adding a struct
  ucred *, which is used to charge the appropriate uid when the
  allocation is performed by the kernel, e.g. md(4).

  Several syscalls, among them fork(2), may now return ENOMEM when
  global or per-uid limits are enforced.

  In collaboration with: pho
  Reviewed by: alc
  Approved by: re (kensmith)
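  From userland, the new limit is manipulated like any other rlimit
  (sketch):

	#include <sys/resource.h>

	int
	main(void)
	{
		struct rlimit rl;

		if (getrlimit(RLIMIT_SWAP, &rl) == 0) {
			rl.rlim_cur = 512UL * 1024 * 1024;  /* 512 MB */
			setrlimit(RLIMIT_SWAP, &rl);
		}
		return (0);
	}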
* Validate the page in one place, dev_pager_getpages(), rather than doing it (alc, 2009-06-22; 1 file, -7/+6)
  in two places, dev_pager_getfake() and dev_pager_updatefake().

  Compare a pointer to "NULL" rather than "0".
* Implement a mechanism within vm_phys_alloc_contig() to defer all necessary (alc, 2009-06-21; 1 file, -9/+20)
  calls to vdrop() until after the free page queues lock is released.
  This eliminates repeatedly releasing and reacquiring the free page
  queues lock each time the last cached page is reclaimed from a
  vnode-backed object.
* Strive for greater consistency among the places that implement real, (alc, 2009-06-21; 3 files, -13/+18)
  fictitious, and contiguous page allocation. Eliminate unnecessary
  reinitialization of a page's fields.
* Track the kernel mapping of a physical page by a new entry in the vm_page (thompsa, 2009-06-18; 1 file, -2/+1)
  structure. When the page is shared, the kernel mapping becomes a
  special type of managed page to force the cache off the page mappings.
  This is needed to avoid stale entries on all ARM VIVT caches, and VIPT
  caches with cache color issues.

  Submitted by: Mark Tinguely
  Reviewed by: alc
  Tested by: Grzegorz Bernacki, thompsa
* Add support for UMA_SLAB_KERNEL to page_free(). (alc, 2009-06-18; 1 file, -2/+4)
  (While I'm here, remove an unnecessary newline character from the end
  of two panic messages.)