path: root/sys/vm
Commit message | Author | Age | Files | Lines
* Keep track of the mount point associated with a special devicemckusick2012-03-281-0/+4
| | | | | | | | | | | | | | | | | to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. The counts are displayed using `mount -v'. Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected. This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too. Reviewed by: kib
* Handle spurious page faults that may occur in no-fault sections of thealc2012-03-221-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | kernel. When access restrictions are added to a page table entry, we flush the corresponding virtual address mapping from the TLB. In contrast, when access restrictions are removed from a page table entry, we do not flush the virtual address mapping from the TLB. This is exactly as recommended in AMD's documentation. In effect, when access restrictions are removed from a page table entry, AMD's MMUs will transparently refresh a stale TLB entry. In short, this saves us from having to perform potentially costly TLB flushes. In contrast, Intel's MMUs are allowed to generate a spurious page fault based upon the stale TLB entry. Usually, such spurious page faults are handled by vm_fault() without incident. However, when we are executing no-fault sections of the kernel, we are not allowed to execute vm_fault(). This change introduces special-case handling for spurious page faults that occur in no-fault sections of the kernel. In collaboration with: kib Tested by: gibbs (an earlier version) I would also like to acknowledge Hiroki Sato's assistance in diagnosing this problem. MFC after: 1 week
* Bah, just revert my earlier change entirely. (Missed alc's request to dojhb2012-03-191-1/+1
| | | | | | this earlier.) Requested by: alc
* Fix madvise(MADV_WILLNEED) to properly handle individual mappings largerjhb2012-03-193-16/+14
| | | | | | | | than 4GB. Specifically, the inlined version of 'ptoa' of the 'int' count of pages overflowed on 64-bit platforms. While here, change vm_object_madvise() to accept two vm_pindex_t parameters (start and end) rather than a (start, count) tuple to match other VM APIs as suggested by alc@.
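Editorial illustration of the overflow described in this entry. This is a minimal user-space sketch, not the kernel code: the 12-bit page shift and the names bytes_narrow(), bytes_wide() and ptoa_sketch_check() are assumptions made for the example, and unsigned 32-bit arithmetic stands in for the kernel's 'int' so that the wraparound is well defined in C.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 4 KB pages. */
#define SKETCH_PAGE_SHIFT 12

/*
 * Hypothetical stand-in for a pages-to-bytes conversion done in a
 * 32-bit type: the shift is carried out in 32 bits, so high bits are lost.
 */
uint64_t
bytes_narrow(uint32_t npages)
{
	uint32_t bytes = npages << SKETCH_PAGE_SHIFT;	/* wraps modulo 2^32 */

	return (bytes);
}

/* Widening before the shift preserves the full byte count. */
uint64_t
bytes_wide(uint32_t npages)
{
	return ((uint64_t)npages << SKETCH_PAGE_SHIFT);
}

/* Small self-check: 2^20 pages of 4 KB is exactly 4 GB. */
void
ptoa_sketch_check(void)
{
	assert(bytes_narrow(1) == bytes_wide(1));
	assert(bytes_narrow(0x100000u) != bytes_wide(0x100000u));
}
```

With these assumptions, a mapping of 2^20 pages converts to zero bytes in the narrow version, which is the kind of truncation that a mapping larger than 4GB would trigger.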
* Alter the previous commit to use vm_size_t instead of vm_pindex_t.jhb2012-03-191-1/+1
| | | | | vm_pindex_t is not a count of pages per se, it is more like vm_ooffset_t, but a page index instead of a byte offset.
* In vm_object_page_clean(), do not clean OBJ_MIGHTBEDIRTY object flagkib2012-03-177-26/+60
| | | | | | | | | | | | | | | | | | if the filesystem performed a short write and we are skipping the page due to this. Propagate write error from the pager back to the callers of vm_pageout_flush(). Report the failure to write a page from the requested range as the FALSE return value from vm_object_page_clean(), and propagate it back to msync(2) to return EIO to usermode. While there, convert the clearobjflags variable in the vm_object_page_clean() and arguments of the helper functions to boolean. PR: kern/165927 Reviewed by: alc MFC after: 2 weeks
* Pedantic nit: use vm_pindex_t instead of long for a count of pages.jhb2012-03-141-1/+1
|
* Add KTR_VFS traces to track modifications to a vnode's writecount.jhb2012-03-081-0/+6
|
* Eliminate stale incorrect ARGSUSED comments.alc2012-03-021-3/+0
| | | | Submitted by: bde
* Simplify kmem_alloc() by eliminating code that existed on account ofalc2012-02-291-30/+0
| | | | | | | | | external pagers in Mach. FreeBSD doesn't implement external pagers. Moreover, it doesn't page out the kernel object. So, the reasons for having the code don't hold. Reviewed by: kib MFC after: 6 weeks
* Simplify vm_mmap()'s control flow.alc2012-02-251-16/+19
| | | | | | | | Add a comment describing what vm_mmap_to_errno() does. Reviewed by: kib MFC after: 3 weeks X-MFC after: r232071
* Simplify vmspace_fork()'s control flow by copying immutable data beforealc2012-02-251-14/+10
| | | | | | | | the vm map locks are acquired. Also, eliminate redundant initialization of the new vm map's timestamp. Reviewed by: kib MFC after: 3 weeks
* Place the if() at the right location, to activate the v_writecountkib2012-02-241-4/+4
| | | | | | | | | accounting for shared writeable mappings for all filesystems, not only for the bypass layers. Submitted by: alc Pointy hat to: kib MFC after: 20 days
* Account the writeable shared mappings backed by file in the vnodekib2012-02-236-15/+204
| | | | | | | | | | | | | | | | | | | | | | | | v_writecount. Keep the amount of the virtual address space used by the mappings in the new vm_object un_pager.vnp.writemappings counter. The vnode v_writecount is incremented when writemappings gets a non-zero value, and decremented when writemappings is returned to zero. Writeable shared vnode-backed mappings are accounted for in vm_mmap(), and vm_map_insert() is instructed to set the MAP_ENTRY_VN_WRITECNT flag on the created map entry. During deferred map entry deallocation, vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECNT and decrements writemappings for the vm object. Now, the writeable mount cannot be demoted to read-only while writeable shared mappings of the vnodes from the mount point exist. Also, execve(2) fails for such files with ETXTBSY, as it should. Noted by: tegge Reviewed by: tegge (long time ago, early version), alc Tested by: pho MFC after: 3 weeks
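Editorial illustration of the accounting rule in this entry: the per-object byte counter drives the vnode's v_writecount only on the transitions between zero and non-zero. This is a simplified sketch under stated assumptions. The structures and the sketch_* function names are hypothetical stand-ins, not the kernel's definitions, and all locking is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_vnode {
	int v_writecount;	/* writers of the file, including mappings */
};

struct sketch_object {
	struct sketch_vnode *vp;
	size_t writemappings;	/* bytes of writeable shared mappings */
};

/* Account for a new writeable shared mapping of 'size' bytes. */
void
sketch_writecount_add(struct sketch_object *obj, size_t size)
{
	if (obj->writemappings == 0 && size > 0)
		obj->vp->v_writecount++;	/* zero -> non-zero transition */
	obj->writemappings += size;
}

/* Release 'size' bytes when a map entry is deallocated. */
void
sketch_writecount_release(struct sketch_object *obj, size_t size)
{
	assert(obj->writemappings >= size);
	obj->writemappings -= size;
	if (obj->writemappings == 0 && size > 0)
		obj->vp->v_writecount--;	/* non-zero -> zero transition */
}
```

In this sketch, any number of overlapping writeable shared mappings contributes exactly one reference to v_writecount, so the vnode looks "open for writing" for as long as at least one such mapping exists.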
* Remove wrong comment.kib2012-02-221-4/+0
| | | | | Discussed with: alc MFC after: 3 days
* When vm_mmap() is used to map a vm object into a kernel vm_map, italc2012-02-161-10/+10
| | | | | | | makes no sense to check the size of the kernel vm_map against the user-level resource limits for the calling process. Reviewed by: kib
* Close a race due to dropping of the map lock between creating map entrykib2012-02-113-10/+11
| | | | | | | | | for a shared mapping and marking the entry for inheritance. Another thread might execute vmspace_fork() in between (e.g. by fork(2)), resulting in the mapping becoming private. Noted and reviewed by: alc MFC after: 1 week
* Remove direct access to si_name.ed2012-02-101-3/+3
| | | | | | | | Code should just use the devtoname() function to obtain the name of a character device. Also add const keywords to pieces of code that need it to build properly. MFC after: 2 weeks
* Fix NULL dereference panic on attempt to turn off (on system shutdown)mav2012-02-011-1/+1
| | | | | | | | | | | | | disconnected swap device. This is a quick and imperfect solution, as the swap device will still be opened and GEOM will not be able to destroy it. A proper solution would be to automatically turn off and close the disconnected swap device, but with the existing code it will cause a panic if there is at least one page on the device, even if it is an unimportant page of a user-level process. It needs some work. Reviewed by: kib@ MFC after: 1 week
* exclude kmem_alloc'ed ARC data buffers from kernel minidumps on amd64kmacy2012-01-276-0/+18
| | | | | | | | excluding other allocations including UMA now entails the addition of a single flag to kmem_alloc or uma zone create Reviewed by: alc, avg MFC after: 2 weeks
* Revert r212360 now that PowerPC can handle large sparse arguments tonwhitehorn2012-01-171-5/+2
| | | | | | pmap_remove() (changed in r228412). MFC after: 2 weeks
* Change the type of the paging_in_progress refcounter from u_short tokib2012-01-101-1/+1
| | | | | | | | | | u_int. With the auto-sized buffer cache on modern machines, UFS metadata can generate more than 65535 pages belonging to the buffers undergoing i/o, overflowing the counter. Reported and tested by: jimharris Reviewed by: alc MFC after: 1 week
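Editorial illustration of why the narrower counter was a problem. This is a minimal sketch, assuming a 16-bit u_short and a 32-bit u_int; the structure and the pip_* names are hypothetical and only model the counting, not the kernel's reference-count implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical object holding both counter widths side by side. */
struct sketch_pip {
	uint16_t pip_narrow;	/* stands in for the old u_short counter */
	uint32_t pip_wide;	/* stands in for the new u_int counter */
};

/* Start I/O on n pages. */
void
pip_start(struct sketch_pip *o, uint32_t n)
{
	o->pip_narrow = (uint16_t)(o->pip_narrow + n);	/* wraps modulo 65536 */
	o->pip_wide += n;
}

/* Finish I/O on n pages; at least n must be in progress. */
void
pip_done(struct sketch_pip *o, uint32_t n)
{
	assert(o->pip_wide >= n);
	o->pip_narrow = (uint16_t)(o->pip_narrow - n);
	o->pip_wide -= n;
}

/* What a waiter would conclude from the narrow counter alone. */
int
pip_looks_idle_narrow(const struct sketch_pip *o)
{
	return (o->pip_narrow == 0);
}
```

With 65536 pages in flight, the narrow counter in this sketch reads zero, so code waiting for paging to finish would wrongly see the object as idle, while the wide counter still holds the true value.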
* Do not restart the scan in vm_object_page_clean() on the objectkib2012-01-041-4/+12
| | | | | | | | | | | | generation change if requested mode is async. The object generation is only changed when the object is marked as OBJ_MIGHTBEDIRTY. For async mode it is enough to write each dirty page, without guaranteeing that all pages are clean after vm_object_page_clean() returns. Diagnosed by: truckman Tested by: flo Reviewed by: alc, truckman MFC after: 2 weeks
* Optimize vm_object_split()'s handling of reservations.alc2011-12-281-0/+15
|
* Optimize the common case of msyncing the whole file mapping withkib2011-12-231-3/+18
| | | | | | | | | | | | | | | MS_SYNC flag. The system must guarantee that all writes are finished before the syscall returns. Schedule the writes in async mode, which is much faster and allows the clustering to occur. Wait for writes using VOP_FSYNC(), since we are syncing the whole file mapping. Potentially, the restriction on when the optimization applies can be relaxed by not requiring that the mapping cover the whole file, as is done by other OSes. Reported and tested by: az Reviewed by: alc MFC after: 2 weeks
* Move kstack_cache_entry into the private header, and make thekib2011-12-161-7/+2
| | | | | | stack cache list header accessible outside vm_glue.c. MFC after: 1 week
* - The previous commit (r228449) accidentally moved the vm.stats.vm.* sysctlseadler2011-12-141-47/+50
| | | | | | | | | | to vm.stats.sys. Move them back. Noticed by: pho Reviewed by: bde (earlier version) Approved by: bz MFC after: 1 week Pointy hat to: me
* Document a large number of currently undocumented sysctls. While hereeadler2011-12-131-108/+63
| | | | | | | | | | | | fix some style(9) issues and reduce redundancy. PR: kern/155491 PR: kern/155490 PR: kern/155489 Submitted by: Galimov Albert <wtfcrap@mail.ru> Approved by: bde Reviewed by: jhb MFC after: 1 week
* Fix printf.kib2011-12-121-1/+1
| | | | | Submitted by: az MFC after: 1 week
* Introduce vm_reserv_alloc_contig() and teach vm_page_alloc_contig() how toalc2011-12-053-71/+269
| | | | | | | | | | | | | | | use superpage reservations. So, for the first time, kernel virtual memory that is allocated by contigmalloc(), kmem_alloc_attr(), and kmem_alloc_contig() can be promoted to superpages. In fact, even a series of small contigmalloc() allocations may collectively result in a promoted superpage. Eliminate some duplication of code in vm_reserv_alloc_page(). Change the type of vm_reserv_reclaim_contig()'s first parameter in order that it be consistent with other vm_*_contig() functions. Tested by: marius (sparc64)
* Rename vm_page_set_valid() to vm_page_set_valid_range().kib2011-11-303-6/+6
| | | | | | | vm_page_set_valid() is the most reasonable name for the m->valid accessor. Reviewed by: attilio, alc
* Hide the internals of vm_page_lock(9) from the loadable modules.kib2011-11-292-0/+49
| | | | | | | | | | | Since the address of the vm_page lock mutex depends on the kernel options, it is easy for a module to get out of sync with the kernel. No vm_page_lockptr() accessor is provided for modules. It can be added later if needed, unless a proper KPI is developed to serve the needs. Reviewed by: attilio, alc MFC after: 3 weeks
* Introduce the same mutex-wise fix in r227758 for sx locks.attilio2011-11-211-29/+13
| | | | | | | | | | | | | | | | | | | | | The functions that offer file and line specifications are: - sx_assert_ - sx_downgrade_ - sx_slock_ - sx_slock_sig_ - sx_sunlock_ - sx_try_slock_ - sx_try_xlock_ - sx_try_upgrade_ - sx_unlock_ - sx_xlock_ - sx_xlock_sig_ - sx_xunlock_ Now vm_map locking is fully converted and no longer needs to know the specifics of the locking procedures. Reviewed by: kib MFC after: 1 month
* Introduce macro stubs in the mutex implementation that will be alwaysattilio2011-11-201-15/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | defined and will allow consumers, willing to provide options, file and line to locking requests, to not worry about options redefining the interfaces. This is typically useful when there is the need to build another locking interface on top of the mutex one. The introduced functions that consumers can use are: - mtx_lock_flags_ - mtx_unlock_flags_ - mtx_lock_spin_flags_ - mtx_unlock_spin_flags_ - mtx_assert_ - thread_lock_flags_ Spare notes: - Likely we can get rid of all the 'INVARIANTS' specification in the ppbus code by using the same macro as done in this patch (but this is left to the ppbus maintainer) - all the other locking interfaces may require a similar cleanup, where the most notable case is sx which will allow a further cleanup of vm_map locking facilities - The patch should be fully compatible with older branches, thus an MFC is planned (in fact, it uses underlying mechanisms that are already present). Comments review by: eadler, Ben Kaduk Discussed with: kib, jhb MFC after: 1 month
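Editorial illustration of the pattern this entry describes: an always-defined, underscore-suffixed entry point that takes options, file and line explicitly, so that a second locking interface layered on top can forward its own caller's location. This is a sketch under stated assumptions, not the mutex implementation. Every sketch_* name and both structures are hypothetical, and the "lock" is a plain flag with no atomicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy lock that remembers where it was last acquired. */
struct sketch_mtx {
	int locked;
	const char *file;
	int line;
};

/* Always-defined variant: the caller supplies options, file and line. */
void
sketch_mtx_lock_flags_(struct sketch_mtx *m, int opts, const char *file,
    int line)
{
	(void)opts;		/* options are ignored in this sketch */
	assert(!m->locked);
	m->locked = 1;
	m->file = file;
	m->line = line;
}

void
sketch_mtx_unlock_flags_(struct sketch_mtx *m, int opts, const char *file,
    int line)
{
	(void)opts;
	(void)file;
	(void)line;
	assert(m->locked);
	m->locked = 0;
}

/* Convenience macro: records the location of the direct caller. */
#define	sketch_mtx_lock(m)						\
	sketch_mtx_lock_flags_((m), 0, __FILE__, __LINE__)

/* A second interface built on top forwards its own caller's location. */
struct sketch_map {
	struct sketch_mtx lock;
};

void
sketch_map_lock_(struct sketch_map *map, const char *file, int line)
{
	sketch_mtx_lock_flags_(&map->lock, 0, file, line);
}

void
sketch_map_unlock_(struct sketch_map *map, const char *file, int line)
{
	sketch_mtx_unlock_flags_(&map->lock, 0, file, line);
}

#define	sketch_map_lock(map)	sketch_map_lock_((map), __FILE__, __LINE__)
#define	sketch_map_unlock(map)	sketch_map_unlock_((map), __FILE__, __LINE__)
```

The point of the design, as sketched here, is that the location recorded in the lock is that of the code calling sketch_map_lock(), not the line inside the wrapper, and the wrapper does not need to know which kernel options are enabled.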
* Eliminate end-of-line white space.alc2011-11-171-2/+2
|
* Refactor the code that performs physically contiguous memory allocation,alc2011-11-165-109/+222
| | | | | | | | | | | | | | | | | | | | | | | | yielding a new public interface, vm_page_alloc_contig(). This new function addresses some of the limitations of the current interfaces, contigmalloc() and kmem_alloc_contig(). For example, the physically contiguous memory that is allocated with those interfaces can only be allocated to the kernel vm object and must be mapped into the kernel virtual address space. It also provides functionality that vm_phys_alloc_contig() doesn't, such as wiring the returned pages. Moreover, unlike that function, it respects the low water marks on the paging queues and wakes up the page daemon when necessary. That said, at present, this new function can't be applied to all types of vm objects. However, that restriction will be eliminated in the coming weeks. From a design standpoint, this change also addresses an inconsistency between vm_phys_alloc_contig() and the other vm_phys_alloc*() functions. Specifically, vm_phys_alloc_contig() manipulated vm_page fields that other functions in vm/vm_phys.c didn't. Moreover, vm_phys_alloc_contig() knew about vnodes and reservations. Now, vm_page_alloc_contig() is responsible for these things. Reviewed by: kib Discussed with: jhb
* Update the device pager interface, while keeping the compatibilitykib2011-11-153-75/+175
| | | | | | | | | | | | | | | | | | | | | | | | layer for old KPI and KBI. New interface should be used together with d_mmap_single cdevsw method. Device pager can be allocated with the cdev_pager_allocate(9) function, which takes struct cdev_pager_ops, containing constructor/destructor and page fault handler methods supplied by driver. Constructor and destructor, called at the pager allocation and deallocation time, allow the driver to handle per-object private data. The pager handler is called to handle page fault on the vm map entry backed by the driver pager. Driver shall return either the vm_page_t which should be mapped, or error code (which does not cause kernel panic anymore). The page handler interface has a placeholder to specify the access mode causing the fault, but currently PROT_READ is always passed there. Sponsored by: The FreeBSD Foundation Reviewed by: alc MFC after: 1 month
* Remove the condition that is always true.kib2011-11-151-1/+1
| | | | | Submitted by: alc MFC after: 1 week
* Mark all SYSCTL_NODEs static that have no corresponding SYSCTL_DECLs.ed2011-11-073-3/+4
| | | | | | The SYSCTL_NODE macro defines a list that stores all child-elements of that node. If there's no SYSCTL_DECL macro anywhere else, there's no reason why it shouldn't be static.
* Wake up the page daemon in vm_page_alloc_freelist() if it couldn'talc2011-11-061-20/+36
| | | | | | | | | | | allocate the requested page because too few pages are cached or free. Document the VM_ALLOC_COUNT() option to vm_page_alloc() and vm_page_alloc_freelist(). Make style changes to vm_page_alloc() and vm_page_alloc_freelist(), such as using a variable name that more closely corresponds to the comments.
* Remove redundant definitions. The chunk was missed from r227102.kib2011-11-051-10/+0
| | | | MFC after: 2 weeks
* Provide typedefs for the type of bit mask for the page bits.kib2011-11-053-30/+33
| | | | | | | | | Use the defined types instead of int when manipulating masks. Supposedly, it could fix support for 32KB page size in the machine-independent VM layer. Reviewed by: alc MFC after: 2 weeks
* Simplify the implementation of the failure case in kmem_alloc_attr().alc2011-11-041-8/+7
|
* Add the posix_fadvise(2) system call. It is somewhat similar tojhb2011-11-042-0/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | madvise(2) except that it operates on a file descriptor instead of a memory region. It is currently only supported on regular files. Just as with madvise(2), the advice given to posix_fadvise(2) can be divided into two types. The first type provides hints about data access patterns and is used in the file read and write routines to modify the I/O flags passed down to VOP_READ() and VOP_WRITE(). These modes are thus filesystem independent. Note that to ease implementation (and since this API is only advisory anyway), only a single non-normal range is allowed per file descriptor. The second type of hint is used to tell the OS that data will or will not be used. These hints are implemented via a new VOP_ADVISE(). A default implementation is provided which does nothing for the WILLNEED request and attempts to move any clean pages to the cache page queue for the DONTNEED request. This latter case required two other changes. First, a new V_CLEANONLY flag was added to vinvalbuf(). This requests vinvalbuf() to only flush clean buffers for the vnode from the buffer cache and to not remove any backing pages from the vnode. This is used to ensure clean pages are not wired into the buffer cache before attempting to move them to the cache page queue. The second change adds a new vm_object_page_cache() method. This method is somewhat similar to vm_object_page_remove() except that instead of freeing each page in the specified range, it attempts to move clean pages to the cache queue if possible. To preserve the ABI of struct file, the f_cdevpriv pointer is now reused in a union to point to the currently active advice region if one is present for regular files. Reviewed by: jilles, kib, arch@ Approved by: re (kib) MFC after: 1 month
* Add support for VM_ALLOC_WIRED and VM_ALLOC_ZERO to vm_page_alloc_freelist()alc2011-11-021-9/+42
| | | | | | | | | | | | and use these new options in the mips pmap. Wake up the page daemon in vm_page_alloc_freelist() if the number of free and cached pages becomes too low. Tidy up vm_page_alloc_init(). In particular, add a comment about an important restriction on its use. Tested by: jchandra@
* Eliminate vm_phys_bootstrap_alloc(). It was a failed attempt atalc2011-10-306-57/+76
| | | | | | | | | | | | | | | | | eliminating duplicated code in the various pmap implementations. Micro-optimize vm_phys_free_pages(). Introduce vm_phys_free_contig(). It is a fast routine for freeing an arbitrary number of physically contiguous pages. In particular, it doesn't require the number of pages to be a power of two. Use "u_long" instead of "unsigned long". Bruce Evans (bde@) has convinced me that the "boundary" parameters to kmem_alloc_contig(), vm_phys_alloc_contig(), and vm_reserv_reclaim_contig() should be of type "vm_paddr_t" and not "u_long". Make this change.
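Editorial illustration for the "arbitrary number of pages" point in this entry. One common way for a buddy-style allocator to free a run whose length is not a power of two is to carve it into naturally aligned power-of-two chunks. The sketch below shows only that decomposition; it is an assumption about the general technique, not code taken from this commit, and SKETCH_MAX_ORDER and sketch_split_contig() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_MAX_ORDER 10	/* assumed largest chunk: 2^10 pages */

/*
 * Split the page-frame range [pfn, pfn + npages) into chunks whose size
 * is a power of two and whose first frame is aligned to that size.
 * The order (log2 of the size) of each chunk is stored in orders[].
 * Returns the number of chunks, or -1 if orders[] is too small.
 */
int
sketch_split_contig(uint64_t pfn, uint64_t npages, int *orders, size_t cap)
{
	size_t n = 0;

	assert(orders != NULL || cap == 0);
	while (npages > 0) {
		int order = 0;

		/* Grow the chunk while it stays aligned and still fits. */
		while (order < SKETCH_MAX_ORDER &&
		    (pfn & ((1ULL << (order + 1)) - 1)) == 0 &&
		    (1ULL << (order + 1)) <= npages)
			order++;
		if (n == cap)
			return (-1);
		orders[n++] = order;
		pfn += 1ULL << order;
		npages -= 1ULL << order;
	}
	return ((int)n);
}
```

For example, under these assumptions a run of 5 pages starting at frame 3 splits into one single page (frame 3) followed by one aligned 4-page chunk (frames 4 to 7), each of which a power-of-two free routine could then handle.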
* Use "u_long" instead of "unsigned long".alc2011-10-282-5/+4
|
* Tidy up the comment at the head of vm_page_alloc, and mention that thealc2011-10-271-6/+8
| | | | returned page has the flag VPO_BUSY set.
* Eliminate vestiges of page coloring in VM_ALLOC_NOOBJ calls toalc2011-10-271-1/+1
| | | | | | vm_page_alloc(). While I'm here, for the sake of consistency, always specify the allocation class, such as VM_ALLOC_NORMAL, as the first of the flags.
* contigmalloc(9) and contigfree(9) are now implemented in terms of otheralc2011-10-271-28/+0
| | | | | | more general VM system interfaces. So, their implementation can now reside in kern_malloc.c alongside the other functions that are declared in malloc.h.