summaryrefslogtreecommitdiffstats
path: root/sys/vm/vnode_pager.c
Commit message (Collapse)AuthorAgeFilesLines
* MFC r271586:kib2014-09-211-10/+6
| | | | | | Fix mis-spelling of bits and types names in the vnode_pager_putpages(). Approved by: re (delphij)
* The soft and hard busy mechanism rely on the vm object lock to work.attilio2013-08-091-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unify the 2 concept into a real, minimal, sxlock where the shared acquisition represent the soft busy and the exclusive acquisition represent the hard busy. The old VPO_WANTED mechanism becames the hard-path for this new lock and it becomes per-page rather than per-object. The vm_object lock becames an interlock for this functionality: it can be held in both read or write mode. However, if the vm_object lock is held in read mode while acquiring or releasing the busy state, the thread owner cannot make any assumption on the busy state unless it is also busying it. Also: - Add a new flag to directly shared busy pages while vm_page_alloc and vm_page_grab are being executed. This will be very helpful once these functions happen under a read object lock. - Move the swapping sleep into its own per-object flag The KPI is heavilly changed this is why the version is bumped. It is very likely that some VM ports users will need to change their own code. Sponsored by: EMC / Isilon storage division Discussed with: alc Reviewed by: jeff, kib Tested by: gavin, bapt (older version) Tested by: pho, scottl
* - Correct a stale comment. We don't have vclean() anymore. The work isjeff2013-07-231-5/+0
| | | | | | | done by vgonel() and destroy_vobject() should only be called once from VOP_INACTIVE(). Sponsored by: EMC / Isilon Storage Division
* Assert that the object type for the vnode' non-NULL v_object, passedkib2013-04-281-0/+6
| | | | | | | | | | | | | | to vnode_pager_setsize(), is either OBJT_VNODE, or, if vnode was already reclaimed, OBJT_DEAD. Note that the later is only possible due to some filesystems, in particular, nfsiods from nfs clients, call vnode_pager_setsize() with unlocked vnode. More, if the object is terminated, do not perform the resizing operation. Reviewed by: alc Tested by: pho, bf MFC after: 1 week
* Convert panic() into KASSERT().kib2013-04-281-2/+1
| | | | | Reviewed by: alc MFC after: 1 week
* Fix the logic inversion in the r248512.kib2013-03-201-1/+1
| | | | Noted by: mckay
* Pass unmapped buffers for page in requests if the filesystem indicated supportkib2013-03-191-6/+30
| | | | | | | for the unmapped i/o. Sponsored by: The FreeBSD Foundation Tested by: pho
* Some style fixes.kib2013-03-141-1/+1
| | | | Sponsored by: The FreeBSD Foundation
* MFCattilio2013-02-261-2/+2
|
* Hide the details for the assertion for VM_OBJECT_LOCK operations.attilio2013-02-211-3/+3
| | | | | | | | Rename current VM_OBJECT_LOCK_ASSERT(foo, RA_WLOCKED) into VM_OBJECT_ASSERT_WLOCKED(foo) Sponsored by: EMC / Isilon storage division Requested by: alc
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() toattilio2013-02-201-55/+55
| | | | | | their "write" versions. Sponsored by: EMC / Isilon storage division
* Switch vm_object lock to be a rwlock.attilio2013-02-201-5/+6
| | | | | | | | * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitve to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h
* The r241025 fixed the case when a binary, executed from nullfs mount,kib2012-11-021-3/+3
| | | | | | | | | | | | | | | | | | | | | | | was still possible to open for write from the lower filesystem. There is a symmetric situation where the binary could already has file descriptors opened for write, but it can be executed from the nullfs overlay. Handle the issue by passing one v_writecount reference to the lower vnode if nullfs vnode has non-zero v_writecount. Note that only one write reference can be donated, since nullfs only keeps one use reference on the lower vnode. Always use the lower vnode v_writecount for the checks. Introduce the VOP_GET_WRITECOUNT to read v_writecount, which is currently always bypassed to the lower vnode, and VOP_ADD_WRITECOUNT to manipulate the v_writecount value, which manages a single bypass reference to the lower vnode. Caling the VOPs instead of directly accessing v_writecount provide the fix described in the previous paragraph. Tested by: pho MFC after: 3 weeks
* Remove the support for using non-mpsafe filesystem modules.kib2012-10-221-9/+0
| | | | | | | | | | | | In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change which does not result in the interface signatures changes. Conducted and reviewed by: attilio Tested by: pho
* Fix the mis-handling of the VV_TEXT on the nullfs vnodes.kib2012-09-281-1/+1
| | | | | | | | | | | | | | | | If you have a binary on a filesystem which is also mounted over by nullfs, you could execute the binary from the lower filesystem, or from the nullfs mount. When executed from lower filesystem, the lower vnode gets VV_TEXT flag set, and the file cannot be modified while the binary is active. But, if executed as the nullfs alias, only the nullfs vnode gets VV_TEXT set, and you still can open the lower vnode for write. Add a set of VOPs for the VV_TEXT query, set and clear operations, which are correctly bypassed to lower vnode. Tested by: pho (previous version) MFC after: 2 weeks
* Do not leave invalid pages in the object after the short read for akib2012-08-141-1/+1
| | | | | | | | | | | | | | network file systems (not only NFS proper). Short reads cause pages other then the requested one, which were not filled by read response, to stay invalid. Change the vm_page_readahead_finish() interface to not take the error code, but instead to make a decision to free or to (de)activate the page only by its validity. As result, not requested invalid pages are freed even if the read RPC indicated success. Noted and reviewed by: alc MFC after: 1 week
* After the PHYS_TO_VM_PAGE() function was de-inlined, the main reasonkib2012-08-051-0/+1
| | | | | | | | | | | | | to pull vm_param.h was removed. Other big dependency of vm_page.h on vm_param.h are PA_LOCK* definitions, which are only needed for in-kernel code, because modules use KBI-safe functions to lock the pages. Stop including vm_param.h into vm_page.h. Include vm_param.h explicitely for the kernel code which needs it. Suggested and reviewed by: alc MFC after: 2 weeks
* Reduce code duplication and exposure of direct access to structkib2012-08-041-31/+2
| | | | | | | | | vm_page oflags by providing helper function vm_page_readahead_finish(), which handles completed reads for pages with indexes other then the requested one, for VOP_GETPAGES(). Reviewed by: alc MFC after: 1 week
* Do a more targeted check on the page cache and avoid to check the cacheattilio2012-06-161-1/+1
| | | | | | | | | pointer directly in vnode_pager_setsize() by using newly introduced vm_page_is_cached() function. Reviewed by: alc MFC after: 2 weeks X-MFC: r234039,234064
* The page flag PGA_WRITEABLE is set and cleared exclusively by the pmapalc2012-06-161-1/+1
| | | | | | | | | | | | | | | | layer, but it is read directly by the MI VM layer. This change introduces pmap_page_is_write_mapped() in order to completely encapsulate all direct access to PGA_WRITEABLE in the pmap layer. Aesthetics aside, I am making this change because amd64 will likely begin using an alternative method to track write mappings, and having pmap_page_is_write_mapped() in place allows me to make such a change without further modification to the MI VM layer. As an added bonus, tidy up some nearby comments concerning page flags. Reviewed by: kib MFC after: 6 weeks
* Keep track of the mount point associated with a special devicemckusick2012-03-281-0/+4
| | | | | | | | | | | | | | | | | to enable the collection of counts of synchronous and asynchronous reads and writes for its associated filesystem. The counts are displayed using `mount -v'. Ensure that buffers used for paging indicate the vnode from which they are operating so that counts of paging I/O operations from the filesystem are collected. This checkin only adds the setting of the mount point for the UFS/FFS filesystem, but it would be trivial to add the setting and clearing of the mount point at filesystem mount/unmount time for other filesystems too. Reviewed by: kib
* Add KTR_VFS traces to track modifications to a vnode's writecount.jhb2012-03-081-0/+6
|
* Account the writeable shared mappings backed by file in the vnodekib2012-02-231-0/+85
| | | | | | | | | | | | | | | | | | | | | | | | v_writecount. Keep the amount of the virtual address space used by the mappings in the new vm_object un_pager.vnp.writemappings counter. The vnode v_writecount is incremented when writemappings gets non-zero value, and decremented when writemappings is returned to zero. Writeable shared vnode-backed mappings are accounted for in vm_mmap(), and vm_map_insert() is instructed to set MAP_ENTRY_VN_WRITECNT flag on the created map entry. During deferred map entry deallocation, vm_map_process_deferred() checks for MAP_ENTRY_VN_WRITECOUNT and decrements writemappings for the vm object. Now, the writeable mount cannot be demoted to read-only while writeable shared mappings of the vnodes from the mount point exist. Also, execve(2) fails for such files with ETXTBUSY, as it should be. Noted by: tegge Reviewed by: tegge (long time ago, early version), alc Tested by: pho MFC after: 3 weeks
* Rename vm_page_set_valid() to vm_page_set_valid_range().kib2011-11-301-2/+2
| | | | | | | The vm_page_set_valid() is the most reasonable name for the m->valid accessor. Reviewed by: attilio, alc
* Provide typedefs for the type of bit mask for the page bits.kib2011-11-051-2/+3
| | | | | | | | | Use the defined types instead of int when manipulating masks. Supposedly, it could fix support for 32KB page size in the machine-independend VM layer. Reviewed by: alc MFC after: 2 weeks
* Fix a typo in a comment.jhb2011-10-141-1/+1
|
* Split the vm_page flags PG_WRITEABLE and PG_REFERENCED into atomickib2011-09-061-1/+1
| | | | | | | | | | | | | | | | | flags field. Updates to the atomic flags are performed using the atomic ops on the containing word, do not require any vm lock to be held, and are non-blocking. The vm_page_aflag_set(9) and vm_page_aflag_clear(9) functions are provided to modify afalgs. Document the changes to flags field to only require the page lock. Introduce vm_page_reference(9) function to provide a stable KPI and KBI for filesystems like tmpfs and zfs which need to mark a page as referenced. Reviewed by: alc, attilio Tested by: marius, flo (sparc64); andreast (powerpc, powerpc64) Approved by: re (bz)
* Add a new option, OBJPR_NOTMAPPED, to vm_object_page_remove(). Passing thisalc2011-06-291-1/+1
| | | | | | | | | | | | | | | | | | option to vm_object_page_remove() asserts that the specified range of pages is not mapped, or more precisely that none of these pages have any managed mappings. Thus, vm_object_page_remove() need not call pmap_remove_all() on the pages. This change not only saves time by eliminating pointless calls to pmap_remove_all(), but it also eliminates an inconsistency in the use of pmap_remove_all() versus related functions, like pmap_remove_write(). It eliminates harmless but pointless calls to pmap_remove_all() that were being performed on PG_UNMANAGED pages. Update all of the existing assertions on pmap_remove_all() to reflect this change. Reviewed by: kib
* Fix a bug in r222586. Lock the page owner object around the modificationkib2011-06-111-0/+6
| | | | | | | of the m->dirty. Reported and tested by: nwhitehorn Reviewed by: alc
* In the VOP_PUTPAGES() implementations, change the default error fromkib2011-06-011-1/+18
| | | | | | | | | | | | | | | | VM_PAGER_AGAIN to VM_PAGER_ERROR for the uwritten pages. Return VM_PAGER_AGAIN for the partially written page. Always forward at least one page in the loop of vm_object_page_clean(). VM_PAGER_ERROR causes the page reactivation and does not clear the page dirty state, so the write is not lost. The change fixes an infinite loop in vm_object_page_clean() when the filesystem returns permanent errors for some page writes. Reported and tested by: gavin Reviewed by: alc, rmacklem MFC after: 1 week
* Minimize the use of the page queues lock for synchronizing access to thealc2010-06-021-2/+0
| | | | | page's dirty field. With the exception of one case, access to this field is now synchronized by the object lock.
* Push down page queues lock acquisition in pmap_enter_object() andalc2010-05-261-18/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pmap_is_referenced(). Eliminate the corresponding page queues lock acquisitions from vm_map_pmap_enter() and mincore(), respectively. In mincore(), this allows some additional cases to complete without ever acquiring the page queues lock. Assert that the page is managed in pmap_is_referenced(). On powerpc/aim, push down the page queues lock acquisition from moea*_is_modified() and moea*_is_referenced() into moea*_query_bit(). Again, this will allow some additional cases to complete without ever acquiring the page queues lock. Reorder a few statements in vm_page_dontneed() so that a race can't lead to an old reference persisting. This scenario is described in detail by a comment. Correct a spelling error in vm_page_dontneed(). Assert that the object is locked in vm_page_clear_dirty(), and restrict the page queues lock assertion to just those cases in which the page is currently writeable. Add object locking to vnode_pager_generic_putpages(). This was the one and only place where vm_page_clear_dirty() was being called without the object being locked. Eliminate an unnecessary vm_page_lock() around vnode_pager_setsize()'s call to vm_page_clear_dirty(). Change vnode_pager_generic_putpages() to the modern-style of function definition. Also, change the name of one of the parameters to follow virtual memory system naming conventions. Reviewed by: kib
* Push down the page queues lock into vm_page_activate().alc2010-05-071-6/+9
|
* Eliminate page queues locking around most calls to vm_page_free().alc2010-05-061-18/+0
|
* On Alan's advice, rather than do a wholesale conversion on a singlekmacy2010-04-301-26/+59
| | | | | | | | | | | | architecture from page queue lock to a hashed array of page locks (based on a patch by Jeff Roberson), I've implemented page lock support in the MI code and have only moved vm_page's hold_count out from under page queue mutex to page lock. This changes pmap_extract_and_hold on all pmaps. Supported by: Bitgravity Inc. Discussed with: alc, jeffr, and kib
* Remove write-only variable.kib2010-02-221-3/+0
| | | | MFC after: 3 days
* When a vnode-backed vm object is referenced, it increments the vnodekib2010-01-171-1/+6
| | | | | | | | | | | | | | | reference count, and decrements it on dereference. If referenced object is deallocated, object type is reset to OBJT_DEAD. Consequently, all vnode references that are owned by object references are never released. vunref() the vnode in vm object deallocation code for OBJT_VNODE appropriate number of times to prevent leak. Add an assertion to the vm_pageout() to make sure that we never get reference on the vnode but then do not execute code to release it. In collaboration with: pho Reviewed by: alc MFC after: 3 weeks
* Change the type of uio_resid member of struct uio from int to ssize_t.kib2009-06-251-1/+1
| | | | | | | | Note that this does not actually enable full-range i/o requests for 64 architectures, and is done now to update KBI only. Tested by: pho Reviewed by: jhb, bde (as part of the review of the bigger patch)
* Implement global and per-uid accounting of the anonymous memory. Addkib2009-06-231-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)
* Correct a boundary case error in the management of a page's dirty bits byalc2009-06-021-10/+16
| | | | | | | | shm_dotruncate() and vnode_pager_setsize(). Specifically, if the length of a shared memory object or a file is truncated such that the length modulo the page size is between 1 and 511, then all of the page's dirty bits were cleared. Now, a dirty bit is cleared only if the corresponding block is truncated in its entirety.
* Eliminate unnecessary clearing of the page's dirty mask from variousalc2009-05-151-5/+6
| | | | | | getpages functions. Eliminate a stale comment.
* Eliminate gratuitous clearing of the page's dirty mask.alc2009-05-121-1/+2
|
* Fix a race involving vnode_pager_input_smlfs(). Specifically, in the casealc2009-05-091-23/+10
| | | | | | | | | | | | | | | | | | | that vnode_pager_input_smlfs() zeroes the page, it should not mark the page as valid until after the page is zeroed. Otherwise, the page could be mapped for read access (e.g., by vm_map_pmap_enter()) before the page is zeroed. Reviewed by: tegge Eliminate gratuitous clearing of the page's dirty mask by vnode_pager_input_smlfs(). Instead, assert that the page is clean. Reviewed by: tegge Eliminate some blank lines. Eliminate pointless calls to pmap_clear_modify() and vm_page_undirty() from vnode_pager_input_old(). The page is not mapped. Therefore, it cannot have any page table entries that are modified. Eliminate an incorrect comment from vnode_pager_generic_getpages().
* Eliminate vnode_pager_input_smlfs()'s pointless call to pmap_clear_modify().alc2009-05-041-3/+0
| | | | | The page can't possibly have any modified page table entries because it isn't even mapped.
* Eliminate unnecessary calls to pmap_clear_modify(). Specifically, callingalc2009-04-251-2/+6
| | | | | | | | | pmap_clear_modify() on a page is pointless if that page is not mapped or it is only mapped for read access. Instead, assert that the page is not mapped or not mapped for write access as appropriate. Eliminate unnecessary clearing of a page's dirty mask. Instead, assert that the page's dirty mask is clear.
* Adjust some variables (mostly related to the buffer cache) that holdjhb2009-03-091-2/+2
| | | | | | | | | | | | | | | | | | | address space sizes to be longs instead of ints. Specifically, the follow values are now longs: runningbufspace, bufspace, maxbufspace, bufmallocspace, maxbufmallocspace, lobufspace, hibufspace, lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva. Previously, a relatively small number (~ 44000) of buffers set in kern.nbuf would result in integer overflows resulting either in hangs or bogus values of hidirtybuffers and lodirtybuffers. Now one has to overflow a long to see such problems. There was a check for a nbuf setting that would cause overflows in the auto-tuning of nbuf. I've changed it to always check and cap nbuf but warn if a user-supplied tunable would cause overflow. Note that this changes the ABI of several sysctls that are used by things like top(1), etc., so any MFC would probably require a some gross shims to allow for that. MFC after: 1 month
* Comment out the assertion from r188321. It is not valid for nfs.kib2009-02-091-1/+1
| | | | Reported by: alc
* Eliminate OBJ_NEEDGIANT. After r188331, OBJ_NEEDGIANT's only use is by aalc2009-02-081-2/+0
| | | | | | redundant assertion in vm_fault(). Reviewed by: kib
* Do not sleep for vnode lock while holding map lock in vm_fault. Try tokib2009-02-081-53/+0
| | | | | | | | | | | | | | | acquire vnode lock for OBJT_VNODE object after map lock is dropped. Because we have the busy page(s) in the object, sleeping there would result in deadlock with vnode resize. Try to get lock without sleeping, and, if the attempt failed, drop the state, lock the vnode, and restart the fault handler from the start with already locked vnode. Because the vnode_pager_lock() function is inlined in vm_fault(), axe it. Based on suggestion by: alc Reviewed by: tegge, alc Tested by: pho
* Assert that vnode is exclusively locked when its vm object is resized.kib2009-02-081-0/+1
| | | | Reviewed by: tegge
OpenPOWER on IntegriCloud