summaryrefslogtreecommitdiffstats
path: root/sys/vm
Commit message (Collapse)AuthorAgeFilesLines
* Introduce vm_reserv_alloc_contig() and teach vm_page_alloc_contig() how toalc2011-12-053-71/+269
| | | | | | | | | | | | | | | use superpage reservations. So, for the first time, kernel virtual memory that is allocated by contigmalloc(), kmem_alloc_attr(), and kmem_alloc_contig() can be promoted to superpages. In fact, even a series of small contigmalloc() allocations may collectively result in a promoted superpage. Eliminate some duplication of code in vm_reserv_alloc_page(). Change the type of vm_reserv_reclaim_contig()'s first parameter in order that it be consistent with other vm_*_contig() functions. Tested by: marius (sparc64)
* Rename vm_page_set_valid() to vm_page_set_valid_range().kib2011-11-303-6/+6
| | | | | | | The vm_page_set_valid() is the most reasonable name for the m->valid accessor. Reviewed by: attilio, alc
* Hide the internals of vm_page_lock(9) from the loadable modules.kib2011-11-292-0/+49
| | | | | | | | | | | Since the address of vm_page lock mutex depends on the kernel options, it is easy for module to get out of sync with the kernel. No vm_page_lockptr() accessor is provided for modules. It can be added later if needed, unless proper KPI is developed to serve the needs. Reviewed by: attilio, alc MFC after: 3 weeks
* Introduce the same mutex-wise fix in r227758 for sx locks.attilio2011-11-211-29/+13
| | | | | | | | | | | | | | | | | | | | | The functions that offer file and line specifications are: - sx_assert_ - sx_downgrade_ - sx_slock_ - sx_slock_sig_ - sx_sunlock_ - sx_try_slock_ - sx_try_xlock_ - sx_try_upgrade_ - sx_unlock_ - sx_xlock_ - sx_xlock_sig_ - sx_xunlock_ Now vm_map locking is fully converted and can avoid to know specifics about locking procedures. Reviewed by: kib MFC after: 1 month
* Introduce macro stubs in the mutex implementation that will be alwaysattilio2011-11-201-15/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | defined and will allow consumers, willing to provide options, file and line to locking requests, to not worry about options redefining the interfaces. This is typically useful when there is the need to build another locking interface on top of the mutex one. The introduced functions that consumers can use are: - mtx_lock_flags_ - mtx_unlock_flags_ - mtx_lock_spin_flags_ - mtx_unlock_spin_flags_ - mtx_assert_ - thread_lock_flags_ Spare notes: - Likely we can get rid of all the 'INVARIANTS' specification in the ppbus code by using the same macro as done in this patch (but this is left to the ppbus maintainer) - all the other locking interfaces may require a similar cleanup, where the most notable case is sx which will allow a further cleanup of vm_map locking facilities - The patch should be fully compatible with older branches, thus a MFC is previewed (infact it uses all the underlying mechanisms already present). Comments review by: eadler, Ben Kaduk Discussed with: kib, jhb MFC after: 1 month
* Eliminate end-of-line white space.alc2011-11-171-2/+2
|
* Refactor the code that performs physically contiguous memory allocation,alc2011-11-165-109/+222
| | | | | | | | | | | | | | | | | | | | | | | | yielding a new public interface, vm_page_alloc_contig(). This new function addresses some of the limitations of the current interfaces, contigmalloc() and kmem_alloc_contig(). For example, the physically contiguous memory that is allocated with those interfaces can only be allocated to the kernel vm object and must be mapped into the kernel virtual address space. It also provides functionality that vm_phys_alloc_contig() doesn't, such as wiring the returned pages. Moreover, unlike that function, it respects the low water marks on the paging queues and wakes up the page daemon when necessary. That said, at present, this new function can't be applied to all types of vm objects. However, that restriction will be eliminated in the coming weeks. From a design standpoint, this change also addresses an inconsistency between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions. Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew about vnodes and reservations. Now, vm_page_alloc_contig() is responsible for these things. Reviewed by: kib Discussed with: jhb
* Update the device pager interface, while keeping the compatibilitykib2011-11-153-75/+175
| | | | | | | | | | | | | | | | | | | | | | | | layer for old KPI and KBI. New interface should be used together with d_mmap_single cdevsw method. Device pager can be allocated with the cdev_pager_allocate(9) function, which takes struct cdev_pager_ops, containing constructor/destructor and page fault handler methods supplied by driver. Constructor and destructor, called at the pager allocation and deallocation time, allow the driver to handle per-object private data. The pager handler is called to handle page fault on the vm map entry backed by the driver pager. Driver shall return either the vm_page_t which should be mapped, or error code (which does not cause kernel panic anymore). The page handler interface has a placeholder to specify the access mode causing the fault, but currently PROT_READ is always passed there. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month
* Remove the condition that is always true.kib2011-11-151-1/+1
| | | | | Submitted by: alc MFC after: 1 week
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.ed2011-11-073-3/+4
| | | | | | The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
* Wake up the page daemon in vm_page_alloc_freelist() if it couldn'talc2011-11-061-20/+36
| | | | | | | | | | | allocate the requested page because too few pages are cached or free. Document the VM_ALLOC_COUNT() option to vm_page_alloc() and vm_page_alloc_freelist(). Make style changes to vm_page_alloc() and vm_page_alloc_freelist(), such as using a variable name that more closely corresponds to the comments.
* Remove redundand definitions. The chunk was missed from r227102.kib2011-11-051-10/+0
| | | | MFC after: 2 weeks
* Provide typedefs for the type of bit mask for the page bits.kib2011-11-053-30/+33
| | | | | | | | | Use the defined types instead of int when manipulating masks. Supposedly, it could fix support for 32KB page size in the machine-independend VM layer. Reviewed by: alc MFC after: 2 weeks
* Simplify the implementation of the failure case in kmem_alloc_attr().alc2011-11-041-8/+7
|
* Add the posix_fadvise(2) system call. It is somewhat similar tojhb2011-11-042-0/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provide hints about data access patterns and are used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hints are used to hint to the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
* Add support for VM_ALLOC_WIRED and VM_ALLOC_ZERO to vm_page_alloc_freelist()alc2011-11-021-9/+42
| | | | | | | | | | | | and use these new options in the mips pmap. Wake up the page daemon in vm_page_alloc_freelist() if the number of free and cached pages becomes too low. Tidy up vm_page_alloc_init(). In particular, add a comment about an important restriction on its use. Tested by: jchandra@
* Eliminate vm_phys_bootstrap_alloc(). It was a failed attempt atalc2011-10-306-57/+76
| | | | | | | | | | | | | | | | | eliminating duplicated code in the various pmap implementations. Micro-optimize vm_phys_free_pages(). Introduce vm_phys_free_contig(). It is fast routine for freeing an arbitrary number of physically contiguous pages. In particular, it doesn't require the number of pages to be a power of two. Use "u_long" instead of "unsigned long". Bruce Evans (bde@) has convinced me that the "boundary" parameters to kmem_alloc_contig(), vm_phys_alloc_contig(), and vm_reserv_reclaim_contig() should be of type "vm_paddr_t" and not "u_long". Make this change.
* Use "u_long" instead of "unsigned long".alc2011-10-282-5/+4
|
* Tidy up the comment at the head of vm_page_alloc, and mention that thealc2011-10-271-6/+8
| | | | returned page has the flag VPO_BUSY set.
* Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls toalc2011-10-271-1/+1
| | | | | | vm_page_alloc(). While I'm here, for the sake of consistency, always specify the allocation class, such as VM_ALLOC_NORMAL, as the first of the flags.
* contigmalloc(9) and contigfree(9) are now implemented in terms of otheralc2011-10-271-28/+0
| | | | | | more general VM system interfaces. So, their implementation can now reside in kern_malloc.c alongside the other functions that are declared in malloc.h.
* Speed up vm_page_cache() and vm_page_remove() by checking for a fewalc2011-10-251-18/+72
| | | | | | | | | common cases that can be handled in constant time. The insight being that a page's parent in the vm object's tree is very often its predecessor or successor in the vm object's ordered memq. Tested by: jhb MFC after: 10 days
* VN_NRESERVLEVEL is used in this file but opt_vm is not includedattilio2011-10-221-0/+1
| | | | | | | | thus the stub switch won't be correctly handled. Include opt_vm.h. Submitted by: jeff MFC after: 3 days
* Control the execution permission of the readable segments forkib2011-10-151-1/+1
| | | | | | | i386 binaries on the amd64 and ia64 with the sysctl, instead of unconditionally enabling it. Reviewed by: marcel
* Fix a typo in a comment.jhb2011-10-141-1/+1
|
* In sys_obreak() and when compiling for amd64 or ia64, when the processmarcel2011-10-131-2/+12
| | | | | is ILP32 (i.e. i386) grant execute permissions by default. The JDK 1.4.x depends on being able to execute from the heap on i386.
* Make memguard(9) capable to guard uma(9) allocations.glebius2011-10-124-14/+84
|
* Style nit.kib2011-09-291-1/+0
| | | | | Submitted by: jhb MFC after: 2 weeks
* Fix grammar.kib2011-09-282-5/+5
| | | | | Submitted by: bf MFC after: 2 weeks
* Use the trick of performing the atomic operation on the contained alignedkib2011-09-283-49/+50
| | | | | | | | | | | word to handle the dirty mask updates in vm_page_clear_dirty_mask(). Remove the vm page queue lock around vm_page_dirty() call in vm_fault_hold() the sole purpose of which was to protect dirty on architectures which does not provide short or byte-wide atomics. Reviewed by: alc, attilio Tested by: flo (sparc64) MFC after: 2 weeks
* Use the explicitly-sized types for the dirty and valid masks.kib2011-09-281-8/+8
| | | | | | Requested by: attilio Reviewed by: alc MFC after: 2 weeks
* In order to maximize the re-usability of kernel code in user space thiskmacy2011-09-163-19/+19
| | | | | | | | | | | | | patch modifies makesyscalls.sh to prefix all of the non-compatibility calls (e.g. not linux_, freebsd32_) with sys_ and updates the kernel entry points and all places in the code that use them. It also fixes an additional name space collision between the kernel function psignal and the libc function of the same name by renaming the kernel psignal kern_psignal(). By introducing this change now we will ease future MFCs that change syscalls. Reviewed by: rwatson Approved by: re (bz)
* Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomickib2011-09-068-89/+106
| | | | | | | | | | | | | | | | | flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
* Update some comments in swap_pager.c.kib2011-08-221-30/+17
| | | | | | Reviewed and most wording by: alc MFC after: 1 week Approved by: re (bz)
* Apply the limit to avoid the overflows in the radix tree subr_blist.ckib2011-08-221-10/+12
| | | | | | | | | | | after the conversion of the swap device size to the page size units, not before. That lifts the limit on the usable swap partition size from 32GB to 256GB, that is less depressing for the modern systems. Submitted by: Alexander V. Chernikov <melifaro ipfw ru> Reviewed by: alc Approved by: re (bz) MFC after: 2 weeks
* Second-to-last commit implementing Capsicum capabilities in the FreeBSDrwatson2011-08-111-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
* - Move the PG_UNMANAGED flag from m->flags to m->oflags, renaming the flagkib2011-08-094-33/+37
| | | | | | | | | | | | | | to VPO_UNMANAGED (and also making the flag protected by the vm object lock, instead of vm page queue lock). - Mark the fake pages with both PG_FICTITIOUS (as it is now) and VPO_UNMANAGED. As a consequence, pmap code now can use use just VPO_UNMANAGED to decide whether the page is unmanaged. Reviewed by: alc Tested by: pho (x86, previous version), marius (sparc64), marcel (arm, ia64, powerpc), ray (mips) Sponsored by: The FreeBSD Foundation Approved by: re (bz)
* Fix an error in kmem_alloc_attr(). Unless "tries" is updated,alc2011-08-071-0/+1
| | | | | | | kmem_alloc_attr() could get stuck in a loop. Approved by: re (kib) MFC after: 3 days
* Implement the linprocfs swaps file, providing information about thekib2011-08-012-21/+40
| | | | | | | | | | configured swap devices in the Linux-compatible format. Based on the submission by: Robert Millan <rmh debian org> PR: kern/159281 Reviewed by: bde Approved by: re (kensmith) MFC after: 2 weeks
* Fix a race in the device pager allocation. If another thread won andkib2011-07-301-2/+9
| | | | | | | | | | | | allocated the device pager for the given handle, then the object fictitious pages list and the object membership in the global object list still need to be initialized. Otherwise, dev_pager_dealloc() will traverse uninitialized pointers. Reported and tested by: pho Reviewed by: jhb Approved by: re (kensmith) MFC after: 1 week
* Extract the code to translate VM error into errno, into an exportedkib2011-07-102-0/+8
| | | | | | | | | | function vm_mmap_to_errno(). It is useful for the drivers that implement mmap(2)-like functionality, to be able to return error codes consistent with mmap(2). Sponsored by: The FreeBSD Foundation No objections from: alc MFC after: 1 week
* Style.kib2011-07-101-1/+1
| | | | MFC after: 3 days
* Add a facility to disable processing page faults. When activated,kib2011-07-092-0/+18
| | | | | | | | uiomove generates EFAULT if any accessed address is not mapped, as opposed to handling the fault. Sponsored by: The FreeBSD Foundation Reviewed by: alc (previous version)
* All the racct_*() calls need to happen with the proc locked. Fixing thistrasz2011-07-066-0/+42
| | | | | | won't happen before 9.0. This commit adds "#ifdef RACCT" around all the "PROC_LOCK(p); racct_whatever(p, ...); PROC_UNLOCK(p)" instances, in order to avoid useless locking/unlocking in kernels built without "options RACCT".
* Handle a race between device_pager and devsw in a more graceful manner:attilio2011-07-061-2/+4
| | | | | | | | | return an error code rather than panic the kernel. Sponsored by: Sandvine Incorporated Reviewed by: kib Tested by: pho MFC after: 2 weeks
* Initialize marker pages as held rather than fictitious/wired. Marking thealc2011-07-021-2/+8
| | | | | | | page as held is more useful as a safety precaution in case someone forgets to check for PG_MARKER. Reviewed by: kib
* Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing thisalc2011-06-294-61/+83
| | | | | | | | | | | | | | | | | | option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib
* Revert to using the page queues lock in vm_page_clear_dirty_mask() onalc2011-06-232-4/+2
| | | | | MIPS. (At present, although atomic_clear_char() is defined by atomic.h on MIPS, it is not actually implemented by support.S.)
* Precisely document the synchronization rules for the page's dirty field.alc2011-06-193-10/+68
| | | | | | | | | | | | | (Saying that the lock on the object that the page belongs to must be held only represents one aspect of the rules.) Eliminate the use of the page queues lock for atomically performing read- modify-write operations on the dirty field when the underlying architecture supports atomic operations on char and short types. Document the fact that 32KB pages aren't really supported. Reviewed by: attilio, kib
* Assert that page is VPO_BUSY or page owner object is locked inkib2011-06-112-0/+26
| | | | | | | | vm_page_undirty(). The assert is not precise due to VPO_BUSY owner to tracked, so assertion does not catch the case when VPO_BUSY is owned by other thread. Reviewed by: alc
OpenPOWER on IntegriCloud