path: root/sys/vm/vm_kern.c
Commit message  (Author, Date, Files, Lines -/+)
* MFC r263710, r273377, r273378, r273423 and r273455:  (hselasky, 2014-10-27, 1 file, -2/+2)
    - De-vnet hash sizes and hash masks.
    - Fix multiple issues related to arguments passed to SYSCTL macros.
    Sponsored by: Mellanox Technologies
* Merge the changes to pmap_enter(9) for sleep-less operation (requested by flag).  (kib, 2014-08-24, 1 file, -5/+6)
    The ia64 pmap.c changes are a direct commit, since ia64 is removed on head.
    MFC r269368 (by alc): Retire PVO_EXECUTABLE.
    MFC r269728: Change the pmap_enter(9) interface to take a flags parameter and a superpage mapping size (currently unused).
    MFC r269759 (by alc): Update the text of a KASSERT() to reflect the changes in r269728.
    MFC r269822 (by alc): Change {_,}pmap_allocpte() so that they look for the flag PMAP_ENTER_NOSLEEP instead of M_NOWAIT/M_WAITOK when deciding whether to sleep on page table page allocation.
    MFC r270151 (by alc): Replace a KASSERT that no PV list locks are held with a conditional unlock.
    Reviewed by: alc
    Approved by: re (gjb)
    Sponsored by: The FreeBSD Foundation
* Add an mmap flag (MAP_32BIT) on 64-bit platforms to request that a mapping use an address in the first 2GB of the process's address space.  (jhb, 2013-09-09, 1 file, -1/+1)
    This flag should have the same semantics as the same flag on Linux. To facilitate this, add a new parameter to vm_map_find() that specifies an optional maximum virtual address. While here, fix several callers of vm_map_find() to use a VMFS_* constant for the find-space argument instead of TRUE and FALSE.
    Reviewed by: alc
    Approved by: re (kib)
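A userland sketch of the new flag in use (an illustration, not part of the commit); MAP_32BIT is the interface added here, everything else is ordinary mmap(2):

```c
#include <sys/mman.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
	/* Ask for an anonymous mapping in the low 2GB on FreeBSD/amd64. */
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE | MAP_32BIT, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		return (1);
	}
	/* The returned address fits in 32 bits, so it is safe to truncate. */
	assert((uintptr_t)p < (1ULL << 31));
	printf("mapped at %p\n", p);
	munmap(p, 4096);
	return (0);
}
```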
* Add new mmap(2) flags to permit applications to request specific virtual address alignment of mappings.  (jhb, 2013-08-16, 1 file, -1/+1)
    - MAP_ALIGNED(n) requests a mapping aligned on a boundary of (1 << n). Requests for n >= the number of bits in a pointer or less than the size of a page fail with EINVAL. This matches the API provided by NetBSD.
    - MAP_ALIGNED_SUPER is a special case of MAP_ALIGNED. It can be used to optimize the chances of using large pages. By default it will align the mapping on a large page boundary (the system is free to choose any large page size that seems best for the mapping request). However, if the object being mapped already uses large pages, then it will align the virtual mapping to match the existing large pages in the object instead.
    - Internally, VMFS_ALIGNED_SPACE is renamed to VMFS_SUPER_SPACE, and VMFS_ALIGNED_SPACE(n) is repurposed for specifying a specific alignment. MAP_ALIGNED(n) maps to VMFS_ALIGNED_SPACE(n), while MAP_ALIGNED_SUPER maps to VMFS_SUPER_SPACE.
    - mmap() of a device object now uses VMFS_OPTIMAL_SPACE rather than explicitly using VMFS_SUPER_SPACE. All device objects are forced to use a specific color on creation, so VMFS_OPTIMAL_SPACE is effectively equivalent.
    Reviewed by: alc
    MFC after: 1 month
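A sketch of the NetBSD-style alignment request (not from the commit); MAP_ALIGNED() takes log2 of the desired boundary, so MAP_ALIGNED(21) asks for 2MB alignment:

```c
#include <sys/mman.h>
#include <stddef.h>
#include <stdio.h>

int
main(void)
{
	const size_t len = 2 * 1024 * 1024;
	/* 1 << 21 = 2MB boundary; MAP_ALIGNED_SUPER would instead ask
	 * for whatever large-page alignment the system prefers. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_ANON | MAP_PRIVATE | MAP_ALIGNED(21), -1, 0);

	if (p == MAP_FAILED)
		return (1);
	printf("2MB-aligned mapping at %p\n", p);
	munmap(p, len);
	return (0);
}
```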
* Replace kernel virtual address space allocation with vmem. This provides transparent layering and reduced fragmentation.  (jeff, 2013-08-07, 1 file, -268/+116)
    - Normalize functions that allocate memory to use kmem_*.
    - Those that allocate address space are named kva_*.
    - Those that operate on maps are named kmap_*.
    - Implement recursive allocation handling for kmem_arena in vmem.
    Reviewed by: alc
    Tested by: pho
    Sponsored by: EMC / Isilon Storage Division
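A hedged kernel-side sketch of the resulting naming split; signatures are approximate for this point in the tree, and kmem_arena is assumed to be exported as the commit describes:

```c
#include <sys/param.h>
#include <sys/malloc.h>
#include <vm/vm.h>
#include <vm/vm_extern.h>
#include <vm/vm_kern.h>

static void
naming_example(void)
{
	vm_offset_t va;

	/* kmem_*: allocate backed kernel memory from an arena. */
	va = kmem_malloc(kmem_arena, PAGE_SIZE, M_WAITOK | M_ZERO);
	kmem_free(kmem_arena, va, PAGE_SIZE);

	/* kva_*: allocate address space only; backing comes later. */
	va = kva_alloc(PAGE_SIZE);
	kva_free(va, PAGE_SIZE);
}
```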
* Add a general purpose resource allocator, vmem, from NetBSD.  (jeff, 2013-06-28, 1 file, -2/+0)
    - It was originally inspired by the Solaris vmem described in the proceedings of USENIX 2001. The NetBSD version was heavily refactored for bugs and simplicity.
    - Use this resource allocator to allocate the buffer and transient maps. Buffer cache defragmentation is reduced by 25% when used by filesystems with mixed block sizes. Ultimately this may permit dynamic buffer cache sizing on low-KVA machines.
    Discussed with: alc, kib, attilio
    Tested by: pho
    Sponsored by: EMC / Isilon Storage Division
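A sketch of the vmem(9) interface as imported; the arena, its base, and its size here are hypothetical, and the signatures are the ones I believe landed with this commit:

```c
#include <sys/param.h>
#include <sys/malloc.h>
#include <sys/vmem.h>

static vmem_t *example_arena;	/* hypothetical arena */

static void
example_arena_init(vmem_addr_t base, vmem_size_t size)
{
	vmem_addr_t addr;

	/* Carve an arena with a page-sized quantum and no quantum caches. */
	example_arena = vmem_create("example", base, size, PAGE_SIZE,
	    0, M_WAITOK);

	/* Allocate and release a four-page range from it. */
	if (vmem_alloc(example_arena, 4 * PAGE_SIZE,
	    M_BESTFIT | M_WAITOK, &addr) == 0)
		vmem_free(example_arena, addr, 4 * PAGE_SIZE);
}
```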
* Implement the concept of unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA.  (kib, 2013-03-19, 1 file, -0/+1)
    The use of unmapped buffers eliminates the need to perform TLB shootdown for mapping on buffer creation and reuse, greatly reducing the number of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads.
    An unmapped buffer must be explicitly requested by the consumer with the GB_UNMAPPED flag. For an unmapped buffer, no KVA reservation is performed at all. The consumer might request an unmapped buffer which does have a KVA reservation, to manually map it without recursing into the buffer cache and blocking, with the GB_KVAALLOC flag. When a mapped buffer is requested and an unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation.
    An unmapped buffer is translated into an unmapped bio in g_vfs_strategy(). An unmapped bio carries a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitly declare a readiness to accept unmapped bios; otherwise the g_down geom thread performs a transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap.
    The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it can be manually tuned by the kern.bio_transient_maxcnt tunable, in units of transient mappings. Eventually, bio_transient_map could be removed once all geom classes and drivers can accept unmapped i/o requests.
    Unmapped support can be turned off with the vfs.unmapped_buf_allowed tunable; disabling it makes buffer (or cluster) creation requests ignore the GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on architectures where pmap_copy_page() was implemented and tested.
    In the rework, filesystem metadata is no longer subject to the maxbufspace limit. Since metadata buffers are always mapped, they still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound. Non-metadata buffer allocations, both mapped and unmapped, are accounted against maxbufspace as before. Effectively, this means that maxbufspace is enforced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not work, because buffer_map fragmentation does not allow the limit to be reached.
    At Jeff Roberson's request, the getnewbuf() function was split into smaller single-purpose functions.
    Sponsored by: The FreeBSD Foundation
    Discussed with: jeff (previous version)
    Tested by: pho, scottl (previous version), jhb, bf
    MFC after: 2 weeks
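A hedged sketch of the consumer side; bread_unmapped() is a hypothetical helper, while getblk() and GB_UNMAPPED are the interfaces named above:

```c
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/buf.h>
#include <sys/vnode.h>

/*
 * Ask the buffer cache for a block without a KVA mapping; the data
 * must then be reached through bp->b_pages[], not bp->b_data.
 */
static int
bread_unmapped(struct vnode *vp, daddr_t blkno, int size)
{
	struct buf *bp;

	bp = getblk(vp, blkno, size, 0, 0, GB_UNMAPPED);
	if (bp == NULL)
		return (ENOMEM);
	/* ... operate on bp->b_pages[]; b_data may be unmapped ... */
	brelse(bp);
	return (0);
}
```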
* Remove excessive and inconsistent initializers for the various kernel maps and submaps.  (kib, 2013-03-14, 1 file, -4/+4)
    MFC after: 2 weeks
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() to their "write" versions.  (attilio, 2013-02-20, 1 file, -15/+15)
    Sponsored by: EMC / Isilon Storage Division
* Switch the vm_object lock to be a rwlock.  (attilio, 2013-02-20, 1 file, -1/+1)
    - VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations.
    - VM_OBJECT_SLEEP() is introduced as a general purpose primitive to get a sleep operation using a VM_OBJECT_LOCK() as protection.
    - The approach must bear with vm_pager.h namespace pollution, so many files require including rwlock.h directly.
* On arm, like sparc64, the end of the kernel map varies from one type of machine to another.  (alc, 2013-02-18, 1 file, -1/+1)
    Therefore, VM_MAX_KERNEL_ADDRESS can't be a constant. Instead, #define it to be a variable, vm_max_kernel_address, just like we do on sparc64.
    Reviewed by: kib
    Tested by: ian
* Try to improve r242655 take III: move these SYSCTLs describing the kernel map, which is defined and initialized in vm/vm_kern.c, to the latter.  (marius, 2013-02-04, 1 file, -0/+11)
    Submitted by: alc
* Flip the semantic of M_NOWAIT to only require that the allocation not sleep, and perform the page allocations with the VM_ALLOC_SYSTEM class.  (kib, 2012-11-14, 1 file, -23/+3)
    Previously, the allocation was also allowed to completely drain the reserve of free pages, being translated to the VM_ALLOC_INTERRUPT request class for vm_page_alloc() and similar functions.
    Allow the caller of malloc* to request the 'deep drain' semantic by providing the M_USE_RESERVE flag, now translated to the VM_ALLOC_INTERRUPT class. Previously, it resulted in the less aggressive VM_ALLOC_SYSTEM allocation class.
    Centralize the translation of the M_* malloc(9) flags in a single inline function, malloc2vm_flags().
    Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com>
    Reviewed by: alc, mdf (previous version)
    Tested by: pho (previous version)
    MFC after: 2 weeks
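The shape of the centralized translation, sketched (not the committed function verbatim):

```c
#include <sys/param.h>
#include <sys/malloc.h>
#include <vm/vm.h>
#include <vm/vm_page.h>

static inline int
example_malloc2vm_flags(int malloc_flags)
{
	int pflags;

	/* M_USE_RESERVE opts into the deep-drain VM_ALLOC_INTERRUPT
	 * class; plain M_NOWAIT no longer drains the reserve. */
	pflags = (malloc_flags & M_USE_RESERVE) != 0 ?
	    VM_ALLOC_INTERRUPT : VM_ALLOC_SYSTEM;
	if ((malloc_flags & M_ZERO) != 0)
		pflags |= VM_ALLOC_ZERO;
	return (pflags);
}
```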
* Move what remains of vm/vm_contig.c into vm/vm_pageout.c, where similar code resides.  (alc, 2012-07-18, 1 file, -2/+2)
    Rename vm_contig_grow_cache() to vm_pageout_grow_cache().
    Reviewed by: kib
* Move kmem_alloc_{attr,contig}() to vm/vm_kern.c, where similarly named functions reside.  (alc, 2012-07-14, 1 file, -0/+142)
    Correct the comment describing kmem_alloc_contig().
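A hedged call sketch for this era of the tree, when kmem_alloc_contig() still took a vm_map_t; alloc_low_contig() is a hypothetical wrapper:

```c
#include <sys/param.h>
#include <sys/malloc.h>
#include <vm/vm.h>
#include <vm/vm_extern.h>
#include <vm/vm_kern.h>

/* A zeroed, physically contiguous, page-aligned buffer below 4GB. */
static vm_offset_t
alloc_low_contig(vm_size_t size)
{
	return (kmem_alloc_contig(kernel_map, size, M_WAITOK | M_ZERO,
	    0,			/* low physical address */
	    0xffffffffUL,	/* high: keep it below 4GB */
	    PAGE_SIZE,		/* alignment */
	    0,			/* boundary: no crossing restriction */
	    VM_MEMATTR_DEFAULT));
}
```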
* Simplify kmem_alloc() by eliminating code that existed on account of external pagers in Mach.  (alc, 2012-02-29, 1 file, -30/+0)
    FreeBSD doesn't implement external pagers. Moreover, it doesn't page out the kernel object. So, the reasons for having this code don't hold.
    Reviewed by: kib
    MFC after: 6 weeks
* Exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64.  (kmacy, 2012-01-27, 1 file, -0/+2)
    Excluding other allocations, including UMA, now entails only the addition of a single flag to kmem_alloc or uma zone create.
    Reviewed by: alc, avg
    MFC after: 2 weeks
* Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls to vm_page_alloc().  (alc, 2011-10-27, 1 file, -1/+1)
    While I'm here, for the sake of consistency, always specify the allocation class, such as VM_ALLOC_NORMAL, as the first of the flags.
* Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flag to VPO_UNMANAGED.  (kib, 2011-08-09, 1 file, -2/+2)
    - The flag is now protected by the vm object lock, instead of the vm page queue lock.
    - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code can now use just VPO_UNMANAGED to decide whether a page is unmanaged.
    Reviewed by: alc
    Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips)
    Sponsored by: The FreeBSD Foundation
    Approved by: re (bz)
* Move the ZERO_REGION_SIZE to a machine-dependent file, as on many architectures (i386, for example) the virtual memory space may be constrained enough that 2MB is a large chunk.  (mdf, 2011-05-13, 1 file, -9/+6)
    Use 64K for arches other than amd64 and ia64, with special handling for sparc64 due to differing hardware.
    Also commit the comment changes to kmem_init_zero_region() that I missed due to not saving the file. (Darn the unfamiliar development environment.)
    Arch maintainers, please feel free to adjust ZERO_REGION_SIZE as you see fit.
    Requested by: alc
    MFC after: 1 week
    MFC with: r221853
* Use a globally visible region of zeros for both /dev/zero and the md device.  (mdf, 2011-05-13, 1 file, -0/+34)
    There are likely other kernel uses of a "blob of zeros" that can be converted.
    Reviewed by: alc
    MFC after: 1 week
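A sketch of the consumer pattern, modeled on the /dev/zero usage: satisfy reads from the shared region instead of a private zeroed buffer. The zero_region declaration is assumed to come via sys/systm.h and ZERO_REGION_SIZE via the machine headers:

```c
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/uio.h>

static int
example_zero_read(struct uio *uio)
{
	ssize_t len;
	int error = 0;

	while (uio->uio_resid > 0 && error == 0) {
		/* Copy out at most one region's worth per iteration. */
		len = MIN(uio->uio_resid, ZERO_REGION_SIZE);
		error = uiomove(__DECONST(void *, zero_region), len, uio);
	}
	return (error);
}
```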
* Since r218070 reenabled the call to vm_map_simplify_entry() from vm_map_insert(), the kmem_back() assumption about the newly inserted entry might be broken due to the interference of two factors.  (kib, 2011-02-15, 1 file, -9/+23)
    In the low memory condition, when vm_page_alloc() returns NULL, the supplied map is unlocked. If another thread performs kmem_malloc() meanwhile, and its map entry is placed right next to our thread's map entry in the map, both entries' wire count is still 0 and the entries are coalesced due to vm_map_simplify_entry().
    Mark the new entry with MAP_ENTRY_IN_TRANSITION to prevent coalescing. Fix some style issues, and tighten the assertions to account for the MAP_ENTRY_IN_TRANSITION state.
    Reported and tested by: pho
    Reviewed by: alc
* Replace an XXX comment with the appropriate code.  (mdf, 2010-09-20, 1 file, -5/+1)
    Submitted by: alc
* Rework memguard(9) to reserve significantly more KVA to detect use-after-free over a longer time.  (mdf, 2010-08-11, 1 file, -8/+27)
    Also release the backing pages of a guarded allocation at free(9) time to reduce the overhead of using memguard(9). Allow setting and varying the malloc type at run-time. Add knobs to allow:
    - randomly guarding memory
    - adding un-backed KVA guard pages to detect underflow and overflow
    - a lower limit on the size of allocations that are guarded
    Reviewed by: alc
    Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page)
    Silence from: -arch
    Approved by: zml (mentor)
    MFC after: 1 month
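A userland sketch of the run-time type selection; the vm.memguard.desc knob name is taken from memguard(9) and is an assumption here, not something stated in the commit message:

```c
#include <sys/types.h>
#include <sys/sysctl.h>
#include <string.h>

/* Point memguard at a malloc(9) type by its short description. */
int
guard_malloc_type(const char *desc)
{
	return (sysctlbyname("vm.memguard.desc", NULL, NULL,
	    desc, strlen(desc)));
}
```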
* The pages allocated by kmem_alloc_attr() and kmem_malloc() are unmanaged.  (alc, 2010-05-03, 1 file, -4/+0)
    Consequently, neither the page lock nor the page queues lock is needed to unwire and free them.
* On Alan's advice, rather than do a wholesale conversion on a single architecture from the page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under the page queue mutex to the page lock.  (kmacy, 2010-04-30, 1 file, -0/+2)
    This changes pmap_extract_and_hold on all pmaps.
    Supported by: Bitgravity Inc.
    Discussed with: alc, jeffr, and kib
* Add a VM find-space option, VMFS_TLB_ALIGNED_SPACE, which searches the address space for an address as aligned by the new pmap_align_tlb() function, which is for constraints imposed by the TLB. [1]  (jmallett, 2010-04-18, 1 file, -0/+29)
    o) Add a kmem_alloc_nofault_space() function, which acts like kmem_alloc_nofault() but allows the caller to specify which find-space option to use. [1]
    o) Use kmem_alloc_nofault_space() with VMFS_TLB_ALIGNED_SPACE to allocate the kernel stack address on MIPS. [1]
    o) Make pmap_align_tlb() on MIPS align addresses so that they do not start on an odd boundary within the TLB, so that they are suitable for insertion as wired entries and do not have to share a TLB entry with another mapping, assuming they are appropriately sized.
    o) Eliminate md_realstack now that the kstack will be appropriately aligned on MIPS.
    o) Increase the number of guard pages to 2 so that we retain the proper alignment of the kstack address.
    Reviewed by: [1] alc
    X-MFC-after: Making sure alc has not come up with a better interface.
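A hedged sketch of the new interface; KSTACK_PAGES and the two guard pages follow the MIPS usage described above, and the wrapper itself is hypothetical:

```c
#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_extern.h>
#include <vm/vm_kern.h>
#include <vm/vm_map.h>

/* Reserve a TLB-aligned, no-fault address range for a kernel stack. */
static vm_offset_t
alloc_kstack_addr(void)
{
	return (kmem_alloc_nofault_space(kernel_map,
	    (KSTACK_PAGES + 2) * PAGE_SIZE, VMFS_TLB_ALIGNED_SPACE));
}
```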
* Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid.  (kib, 2009-06-23, 1 file, -2/+7)
    The accounting information (charge) is associated with either the map entry or the vm object backing the entry, assuming the object is the first one in the shadow chain and the entry does not require COW. The charge is moved from entry to object on allocation of the object, e.g. during mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup.
    The per-entry granularity of accounting makes the charge process fair for processes that change uid during their lifetime, and decrements the charge for the proper uid when a region is unmapped.
    The interface of vm_pager_allocate(9) is extended by adding struct ucred *, which is used to charge the appropriate uid when the allocation is performed by the kernel, e.g. md(4).
    Several syscalls, among them fork(2), may now return ENOMEM when global or per-uid limits are enforced.
    In collaboration with: pho
    Reviewed by: alc
    Approved by: re (kensmith)
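A userland sketch of the new rlimit (an illustration; only RLIMIT_SWAP comes from this commit):

```c
#include <sys/types.h>
#include <sys/resource.h>
#include <stdio.h>

int
main(void)
{
	struct rlimit rl;

	/* Cap this uid's swap reservation at 512MB. */
	rl.rlim_cur = rl.rlim_max = 512UL * 1024 * 1024;
	if (setrlimit(RLIMIT_SWAP, &rl) != 0) {
		perror("setrlimit");
		return (1);
	}
	return (0);
}
```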
* Revert the addition of the freelist argument for the vm_map_delete() function, done in r188334.  (kib, 2009-02-24, 1 file, -9/+3)
    Instead, collect the entries that shall be freed in the deferred_freelist member of the map. Automatically purge the deferred freelist when the map is unlocked.
    Tested by: pho
    Reviewed by: alc
* Put the debug.vm_lowmem sysctl under DIAGNOSTIC.  (rwatson, 2009-02-23, 1 file, -0/+2)
    Submitted by: sam
    MFC after: 3 days
* Add a debugging sysctl, debug.vm_lowmem, that when assigned a value of 1 will trigger a pass through the VM's low-memory handlers, such as protocol and UMA drain routines.  (rwatson, 2009-02-23, 1 file, -0/+22)
    This makes it easier to exercise these otherwise rarely-invoked code paths.
    MFC after: 3 days
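A minimal userland trigger for the handlers (illustration, not from the commit):

```c
#include <sys/types.h>
#include <sys/sysctl.h>
#include <stdio.h>

int
main(void)
{
	int one = 1;

	/* Writing 1 runs the vm_lowmem handlers once. */
	if (sysctlbyname("debug.vm_lowmem", NULL, NULL,
	    &one, sizeof(one)) != 0) {
		perror("debug.vm_lowmem");
		return (1);
	}
	return (0);
}
```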
* Do not call vm_object_deallocate() from vm_map_delete(), because we hold the map lock there, and might need the vnode lock for OBJT_VNODE objects.  (kib, 2009-02-08, 1 file, -3/+9)
    Postpone object deallocation until the caller of vm_map_delete() drops the map lock. Link the map entries to be freed into a freelist, which is released by the new helper function vm_map_entry_free_freelist().
    Reviewed by: tegge, alc
    Tested by: pho
* Eliminate stale comments from kmem_malloc().  (alc, 2008-07-18, 1 file, -12/+0)
* Make preparations for increasing the size of the kernel virtual address space on the amd64 architecture.  (alc, 2008-06-22, 1 file, -2/+6)
    The amd64 architecture requires kernel code and global variables to reside in the highest 2GB of the 64-bit virtual address space. Thus, the memory allocated during bootstrap, before the call to kmem_init(), starts at KERNBASE, which is not necessarily the same as VM_MIN_KERNEL_ADDRESS on amd64.
* Introduce a new parameter, "superpage_align", to kmem_suballoc() that is used to request superpage alignment for the submap.  (alc, 2008-05-10, 1 file, -11/+7)
    Request superpage alignment for the kmem_map.
    Pass VMFS_ANY_SPACE instead of TRUE to vm_map_find(). (They are currently equivalent, but VMFS_ANY_SPACE is the new preferred spelling.)
    Remove a stale comment from kmem_malloc().
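A call sketch under the extended signature; make_submap() is a hypothetical wrapper around the interface named above:

```c
#include <sys/param.h>
#include <vm/vm.h>
#include <vm/vm_extern.h>
#include <vm/vm_kern.h>
#include <vm/vm_map.h>

/* Carve a submap out of kernel_map with superpage alignment. */
static vm_map_t
make_submap(vm_size_t size)
{
	vm_offset_t minaddr, maxaddr;

	return (kmem_suballoc(kernel_map, &minaddr, &maxaddr, size,
	    1 /* superpage_align */));
}
```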
* Eliminate pointless casts from kmem_suballoc().  (alc, 2008-04-28, 1 file, -2/+2)
* Eliminate an unnecessary printf() from kmem_suballoc().  (alc, 2008-03-30, 1 file, -4/+2)
    The subsequent panic() can be extended to convey the same information.
* When one tries to allocate memory with the M_WAITOK flag and we are short of address space in the kmem map, call the vm_lowmem event in a loop and wait a bit for subsystems to reclaim some memory, which in turn will reclaim address space as well.  (pjd, 2008-01-10, 1 file, -6/+13)
    Note, this is a work-around.
    Reviewed by: alc
    Approved by: alc
    MFC after: 3 days
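The shape of the work-around, sketched (not the committed code); try_alloc_kva() is a hypothetical stand-in for the map allocation step:

```c
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/eventhandler.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <vm/vm.h>
#include <vm/uma.h>

static vm_offset_t try_alloc_kva(vm_size_t size);	/* hypothetical */

static vm_offset_t
alloc_kva_waitok(vm_size_t size, int flags)
{
	vm_offset_t addr;

	while ((addr = try_alloc_kva(size)) == 0) {
		if ((flags & M_NOWAIT) != 0)
			return (0);
		/* Ask subsystems to give memory (and thus KVA) back. */
		EVENTHANDLER_INVOKE(vm_lowmem, 0);
		uma_reclaim();
		/* Give them a moment before retrying. */
		pause("kmaloop", hz / 8);
	}
	return (addr);
}
```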
* Add an access type parameter to pmap_enter(). It will be used to implement superpage promotion.  (alc, 2008-01-03, 1 file, -1/+2)
    Correct a style error in kmem_malloc(): pmap_enter()'s last parameter is a Boolean.
* Change the unused 'user_wait' argument to a 'timo' argument, which will be used to specify the timeout for msleep(9).  (pjd, 2007-11-07, 1 file, -1/+1)
    Discussed with: alc
    Reviewed by: alc
* When KVA is exhausted, try the vm_lowmem event for the last time before panicking.  (pjd, 2007-04-05, 1 file, -4/+14)
    This helps a lot with ZFS stability.
* Change the way that unmanaged pages are created.  (alc, 2007-02-25, 1 file, -6/+4)
    Specifically, immediately flag any page that is allocated to an OBJT_PHYS object as unmanaged in vm_page_alloc() rather than waiting for a later call to vm_page_unmanage(). This allows for the elimination of some uses of the page queues lock.
    Change the type of the kernel and kmem objects from OBJT_DEFAULT to OBJT_PHYS. This allows us to take advantage of the above change to simplify the allocation of unmanaged pages in kmem_alloc() and kmem_malloc().
    Remove vm_page_unmanage(). It is no longer used.
* Declare the map entry created by kmem_init() for the range from VM_MIN_KERNEL_ADDRESS to the end of the kernel's bootstrap data as MAP_NOFAULT.  (alc, 2007-01-07, 1 file, -1/+2)
* There is no point in setting PG_REFERENCED on kmem_object pages because they are "unmanaged", i.e., non-pageable, pages.  (alc, 2006-11-13, 1 file, -6/+1)
    Remove a stale comment.
* Make pmap_enter() responsible for setting PG_WRITEABLE instead of its caller.  (alc, 2006-11-12, 1 file, -1/+1)
    (As a beneficial side-effect, a high-contention acquisition of the page queues lock in vm_fault() is eliminated.)
* The page queues lock is no longer required by vm_page_wakeup().  (alc, 2006-10-23, 1 file, -1/+1)
* /* -> /*- for license, minor formatting changes.  (imp, 2005-01-07, 1 file, -1/+1)
* Use VM_ALLOC_NOBUSY instead of calling vm_page_wakeup().  (alc, 2004-10-24, 1 file, -2/+1)
* Back out all behavioral changes.  (green, 2004-08-10, 1 file, -4/+0)
* Revamp VM map wiring.  (green, 2004-08-09, 1 file, -0/+4)
    - Allow no-fault wiring/unwiring to succeed for consistency; however, the wired count remains at zero, so it's a special case.
    - Fix issues inside vm_map_wire() and vm_map_unwire() where the exact state of user wiring (one or zero) and system wiring (zero or more) could be confused; for example, system unwiring could succeed in removing a user wire, instead of being an error.
    - Require all mappings to be unwired before they are deleted. When VM space is still wired upon deletion, it will be waited upon for the following unwire. This makes vslock(9) work rather than allowing kernel-locked memory to be deleted out from underneath its consumer as it would before.