path: root/sys/vm
Commit message | Author | Age | Files | Lines
* Bring in MemGuard, a very simple and small replacement allocator | bmilekic | 2005-01-21 | 2 | -0/+253
  designed to help detect tamper-after-free scenarios, a problem more and
  more common and likely with multithreaded kernels where race conditions
  are more prevalent.

  Currently MemGuard can only take over malloc()/realloc()/free() for
  (a) particular malloc type(s) and the code brought in with this change
  manually instruments it to take over M_SUBPROC allocations as an example.

  If you are planning to use it, for now you must:

  1) Put "options DEBUG_MEMGUARD" in your kernel config.
  2) Edit src/sys/kern/kern_malloc.c manually, look for "XXX CHANGEME" and
     replace the M_SUBPROC comparison with the appropriate malloc type
     (this might require additional but small/simple code modification if,
     say, the malloc type is declared out of scope).
  3) Build and install your kernel. Tune vm.memguard_divisor boot-time
     tunable which is used to scale how much of kmem_map you want to allot
     for MemGuard's use. The default is 10, so kmem_size/10.

  ToDo:

  1) Bring in a memguard(9) man page.
  2) Better instrumentation (e.g., boot-time) of MemGuard taking over
     malloc types.
  3) Teach UMA about MemGuard to allow MemGuard to override zone
     allocations too.
  4) Improve MemGuard if necessary.

  This work is partly based on some old patches from Ian Dowse.
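  The scaling described in step 3 is a plain division. The following is a
  minimal sketch of that arithmetic only; memguard_share() is a hypothetical
  helper invented for illustration and is not a function from the commit.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the commit): how many bytes of kmem_map
 * MemGuard would be allotted for a given vm.memguard_divisor value.
 */
static size_t
memguard_share(size_t kmem_size, unsigned int divisor)
{
	assert(divisor != 0);	/* a zero divisor has no meaning here */
	return (kmem_size / divisor);
}
```

  With the default divisor of 10, a 200 MB kmem_size would leave 20 MB for
  MemGuard under this reading of the commit message.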
* Add checks to vm_map_findspace() to test for address wrap. The conditions | alc | 2005-01-18 | 1 | -4/+8
  where this could occur are very rare, but possible.

  Submitted by: Mark W. Krentel
  MFC after: 2 weeks
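  An address-wrap test of this kind usually relies on unsigned arithmetic
  wrapping around. The sketch below shows the general idiom, not the actual
  vm_map_findspace() code; vaddr_t and range_fits() are hypothetical names
  used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uintptr_t vaddr_t;	/* hypothetical stand-in for vm_offset_t */

/*
 * Return true if [start, start + length) ends at or below max and the
 * addition did not wrap past the top of the address space.
 */
static bool
range_fits(vaddr_t start, vaddr_t length, vaddr_t max)
{
	vaddr_t end = start + length;	/* unsigned: wraps modulo 2^N */

	if (end < start)		/* the sum wrapped around */
		return (false);
	return (end <= max);
}
```

  Without the wrap test, a start address close to the top of the address
  space plus a large length would produce a small end value that passes the
  comparison against max.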
* Consider three objects, O, BO, and BBO, where BO is O's backing object | alc | 2005-01-15 | 1 | -1/+8
  and BBO is BO's backing object. Now, suppose that O and BO are being
  collapsed. Furthermore, suppose that BO has been marked dead (OBJ_DEAD)
  by vm_object_backing_scan() and that either vm_object_backing_scan() has
  been forced to sleep due to encountering a busy page or
  vm_object_collapse() has been forced to sleep due to memory allocation in
  the swap pager.

  If vm_object_deallocate() is then called on BBO and BO is BBO's only
  shadow object, vm_object_deallocate() will collapse BO and BBO. In doing
  so, it adds a necessary temporary reference to BO. If this collapse also
  sleeps and the prior collapse resumes first, the temporary reference will
  cause vm_object_collapse to panic with the message
  "backing_object %p was somehow re-referenced during collapse!"

  Resolve this race by changing vm_object_deallocate() such that it doesn't
  collapse BO and BBO if BO is marked dead. Once O and BO are collapsed,
  vm_object_collapse() will attempt to collapse O and BBO. So,
  vm_object_deallocate() on BBO need do nothing.

  Reported by: Peter Holm on 20050107
  URL: http://www.holm.cc/stress/log/cons102.html
  In collaboration with: tegge@
  Candidate for RELENG_4 and RELENG_5
  MFC after: 2 weeks
* Eliminate unused and unnecessary "cred" argument from vinvalbuf() | phk | 2005-01-14 | 1 | -1/+1
* Remove the unused credential argument from VOP_FSYNC() and VFS_SYNC(). | phk | 2005-01-11 | 1 | -1/+1
  I'm not sure why a credential was added to these in the first place; it
  is not used anywhere and it doesn't make much sense:

  The credentials for syncing a file (ability to write to the file) should
  be checked at the system call level.

  Credentials for syncing one or more filesystems ("none") should be
  checked at the system call level as well.

  If the filesystem implementation needs a particular credential to carry
  out the syncing it would logically have to be the cached mount
  credential, or a credential cached along with any delayed write data.

  Discussed with: rwatson
* While we want the recursion protection for the bucket zones so that | bmilekic | 2005-01-11 | 1 | -1/+11
  recursion from the VM is handled (and the calling code that allocates
  buckets knows how to deal with it), we do not want to prevent allocation
  from the slab header zones (slabzone and slabrefzone) if uk_recurse is
  not zero for them. The reason is that it could lead to NULL being
  returned for the slab header allocations even in the M_WAITOK case, and
  the caller can't handle that (this is also explained in a comment with
  this commit).

  The problem analysis is documented in our mailing lists:
  http://docs.freebsd.org/cgi/getmsg.cgi?fetch=153445+0+archive/2004/freebsd-current/20041231.freebsd-current
  (see entire thread for proper context).

  Crash dump data provided by: Peter Holm <peter@holm.cc>
* ISO C requires at least one element in an initialiser list. | stefanf | 2005-01-10 | 1 | -1/+1
* Move the acquisition and release of the page queues lock outside of a loop | alc | 2005-01-08 | 1 | -2/+3
  in vm_object_split() to avoid repeated acquisition and release.
* Transfer responsibility for freeing the page taken from the cache | alc | 2005-01-07 | 1 | -19/+17
  queue and (possibly) unlocking the containing object from
  vm_page_alloc() to vm_page_select_cache(). Recent optimizations to
  vm_map_pmap_enter() (see vm_map.c revisions 1.362 and 1.363) and
  pmap_enter_quick() have resulted in panic()s because vm_page_alloc()
  mistakenly unlocked objects that had not been locked by
  vm_page_select_cache().

  Reported by: Peter Holm and Kris Kennaway
* /* -> /*- for license, minor formatting changes | imp | 2005-01-07 | 34 | -36/+36
* Revise the part of vm_pageout_scan() that moves pages from the cache | alc | 2005-01-06 | 1 | -12/+31
  queue to the free queue. With this change, if a page from the cache
  queue belongs to a locked object, it is simply skipped over rather than
  moved to the inactive queue.
* When allocating bio's in the swap_pager use M_WAITOK since the | phk | 2005-01-03 | 1 | -6/+7
  alternative is much worse.
* Assert that page allocations during an interrupt specify | alc | 2004-12-31 | 1 | -2/+6
  VM_ALLOC_INTERRUPT.

  Assert that pages removed from the cache queue are not busy.
* Access to the page's busy field is (now) synchronized by the containing | alc | 2004-12-29 | 1 | -1/+0
  object's lock. Therefore, the assertion that the page queues lock is
  held can be removed from vm_page_io_start().
* Note that access to the page's busy count is synchronized by the containing | alc | 2004-12-27 | 1 | -1/+1
  object's lock.
* Assert that the vm object is locked on entry to vm_page_sleep_if_busy(); | alc | 2004-12-26 | 1 | -8/+3
  remove some unneeded code.
* Add my copyright and update Jeff's copyright on UMA source files, | bmilekic | 2004-12-26 | 5 | -10/+20
  as per his request.

  Discussed with: Jeffrey Roberson
* fix comment | phk | 2004-12-25 | 1 | -1/+1
* Continue the transition from synchronizing access to the page's PG_BUSY | alc | 2004-12-24 | 1 | -11/+29
  flag and busy field with the global page queues lock to synchronizing
  their access with the containing object's lock. Specifically, acquire
  the containing object's lock before reading the page's PG_BUSY flag and
  busy field in vm_fault().

  Reviewed by: tegge@
* Modify pmap_enter_quick() so that it expects the page queues to be locked | alc | 2004-12-23 | 2 | -7/+11
  on entry and it assumes the responsibility for releasing the page queues
  lock if it must sleep.

  Remove a bogus comment from pmap_enter_quick().

  Using the first change, modify vm_map_pmap_enter() so that the page
  queues lock is acquired and released once, rather than each time that a
  page is mapped.
* Eliminate another unnecessary call to vm_page_busy(). (See revision 1.333 | alc | 2004-12-17 | 1 | -5/+0
  for a detailed explanation.)
* Enable debug.mpsafevm by default on alpha. | alc | 2004-12-17 | 1 | -1/+1
* In the common case, pmap_enter_quick() completes without sleeping. | alc | 2004-12-15 | 2 | -17/+6
  In such cases, the busying of the page and the unlocking of the
  containing object by vm_map_pmap_enter() and vm_fault_prefault() is
  unnecessary overhead. To eliminate this overhead, this change modifies
  pmap_enter_quick() so that it expects the object to be locked on entry
  and it assumes the responsibility for busying the page and unlocking the
  object if it must sleep.

  Note: alpha, amd64, i386 and ia64 are the only implementations optimized
  by this change; arm, powerpc, and sparc64 still conservatively busy the
  page and unlock the object within every pmap_enter_quick() call.

  Additionally, this change is the first case where we synchronize access
  to the page's PG_BUSY flag and busy field using the containing object's
  lock rather than the global page queues lock. (Modifications to the
  page's PG_BUSY flag and busy field have asserted both locks for several
  weeks, enabling an incremental transition.)
* With the removal of kern/uipc_jumbo.c and sys/jumbo.h, | alc | 2004-12-08 | 2 | -22/+5
  vm_object_allocate_wait() is not used. Remove it.
* Almost nine years ago, when support for 1TB files was introduced in | alc | 2004-12-07 | 1 | -1/+1
  revision 1.55, the address parameter to vnode_pager_addr() was changed
  from an unsigned 32-bit quantity to a signed 64-bit quantity. However,
  an out-of-range check on the address was not updated. Consequently,
  memory-mapped I/O on files greater than 2GB could cause a kernel panic.
  Since the address is now a signed 64-bit quantity, the problem
  resolution is simply to remove a cast.

  Reviewed by: bde@ and tegge@
  PR: 73010
  MFC after: 1 week
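  The commit message does not quote the cast that was removed, so the
  following is only a sketch of the general failure mode: a range check
  that inspects just the low 32 bits of a 64-bit offset misjudges offsets
  at or above 2GB. Both function names are hypothetical, and the stale
  check is an assumed reconstruction, not the original code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed shape of the stale check: only the low 32 bits are examined,
 * as a leftover 32-bit cast would do, and bit 31 is read as a sign bit.
 */
static bool
stale_out_of_range(int64_t address)
{
	uint32_t low = (uint32_t)address;	/* well-defined truncation */

	return ((low & 0x80000000u) != 0);
}

/* Check on the full signed 64-bit value, with no narrowing cast. */
static bool
fixed_out_of_range(int64_t address)
{
	return (address < 0);
}
```

  For an offset of exactly 2GB the stale check reports an out-of-range
  address while the full-width check accepts it, which is consistent with
  the 2GB threshold mentioned above.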
* Correct a sanity check in vnode_pager_generic_putpages(). The cast used | alc | 2004-12-05 | 1 | -1/+1
  to implement the sanity check should have been changed when we converted
  the implementation of vm_pindex_t from 32 to 64 bits. (Thus, RELENG_4 is
  not affected.) The consequence of this error would be a legitimate write
  to an extremely large file being treated as an errant attempt to write
  meta-data.

  Discussed with: tegge@
* Don't include sys/user.h merely for its side-effect of recursively | das | 2004-11-27 | 1 | -2/+0
  including other headers.
* Remove useless casts. | cognet | 2004-11-26 | 1 | -2/+2
* Try to close a potential, but serious race in our VM subsystem. | delphij | 2004-11-24 | 1 | -2/+15
  Historically, our contigmalloc1() and contigmalloc2() assume that a page
  in PQ_CACHE can be unconditionally reused by busying and freeing it.
  Unfortunately, when object happens to be not NULL, the code will set
  m->object to NULL and disregard the fact that the page is actually in
  the VM page bucket, resulting in page bucket hash table corruption and
  finally, a filesystem corruption, or a 'page not in hash' panic.

  This commit borrows the idea from DragonFlyBSD's VM fix by Matthew
  Dillon [1]. This version of the patch will do the following checks:

  - When scanning pages in PQ_CACHE, check hold_count and skip over pages
    that are held temporarily.
  - For pages in PQ_CACHE and selected as candidate of being freed, check
    if it is busy at that time.

  Note: It seems that this might be unrelated to kern/72539.

  Obtained from: DragonFlyBSD, sys/vm/vm_contig.c,v 1.11 and 1.12 [1]
  Reminded by: Matt Dillon
  Reworked by: alc
  MFC After: 1 week
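  The two checks listed above amount to filtering candidate pages before
  reuse. The sketch below shows only that filtering logic; struct page and
  count_reclaimable() are hypothetical simplifications and do not reflect
  the real struct vm_page or the contigmalloc code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page {			/* hypothetical, far simpler than vm_page */
	int	hold_count;	/* nonzero while temporarily held */
	bool	busy;		/* true while the page is busy */
};

/* Count the pages that pass both checks and could be reclaimed. */
static size_t
count_reclaimable(const struct page *pages, size_t n)
{
	size_t i, ok = 0;

	assert(pages != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (pages[i].hold_count != 0)	/* held: skip it */
			continue;
		if (pages[i].busy)		/* busy: not a candidate */
			continue;
		ok++;
	}
	return (ok);
}
```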
* Disable U area swapping and remove the routines that create, destroy, | das | 2004-11-20 | 4 | -206/+0
  copy, and swap U areas.

  Reviewed by: arch@
* Make VOP_BMAP return a struct bufobj for the underlying storage device | phk | 2004-11-15 | 1 | -10/+13
  instead of a vnode for it.

  The vnode_pager does not and should not have any interest in what the
  filesystem uses for backend.

  (vfs_cluster doesn't use the backing store argument.)
* Add pbgetbo()/pbrelbo() lighter weight versions of pbgetvp()/pbrelvp(). | phk | 2004-11-15 | 1 | -0/+42
* More kasserts. | phk | 2004-11-15 | 1 | -1/+6
* style polishing. | phk | 2004-11-15 | 1 | -7/+3
* Move pbgetvp() and pbrelvp() to vm_pager.c with the rest of the pbuf stuff. | phk | 2004-11-15 | 1 | -0/+44
* expect the caller to have called pbrelvp() if necessary. | phk | 2004-11-15 | 1 | -3/+0
* Explicitly call pbrelvp() | phk | 2004-11-15 | 1 | -0/+2
* Improve readability with a bunch of typedefs for the pager ops. | phk | 2004-11-09 | 1 | -7/+15
  These can also be used for prototypes in the pagers.
* #include <vm/vm_param.h> instead of <machine/vmparam.h> (the former | des | 2004-11-08 | 1 | -6/+6
  includes the latter, but also declares variables which are defined in
  kern/subr_param.c).

  Change some VM parameters from quad_t to unsigned long. They refer to
  quantities (size limits for text, heap and stack segments) which must
  necessarily be smaller than the size of the address space, so long is
  adequate on all platforms.

  MFC after: 1 week
* Eliminate an unnecessary atomic operation. Articulate the rationale in | alc | 2004-11-06 | 1 | -4/+11
  a comment.
* Abstract the logic to look up the uma_bucket_zone given a desired | rwatson | 2004-11-06 | 1 | -7/+23
  number of entries into bucket_zone_lookup(), which helps make more clear
  the logic of consumers of bucket zones.

  Annotate the behavior of bucket_init() with a comment indicating how the
  various data structures, including the bucket lookup tables, are
  initialized.
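  A lookup from a desired entry count to a bucket zone can be pictured as
  choosing the smallest size class that fits. The sketch below illustrates
  that idea only; the struct, the size classes, and the function name
  bucket_zone_lookup_sketch() are assumptions made for this example and
  are not taken from uma_core.c.

```c
#include <assert.h>
#include <stddef.h>

struct bucket_zone_sketch {	/* hypothetical stand-in for uma_bucket_zone */
	int		entries;	/* capacity of buckets in this zone */
	const char	*name;
};

/* Assumed size classes, in ascending order. */
static const struct bucket_zone_sketch zones[] = {
	{ 16, "16 Bucket" },
	{ 32, "32 Bucket" },
	{ 64, "64 Bucket" },
	{ 128, "128 Bucket" },
};

/* Return the smallest zone holding at least `entries`, or NULL if none. */
static const struct bucket_zone_sketch *
bucket_zone_lookup_sketch(int entries)
{
	size_t i;

	assert(entries > 0);
	for (i = 0; i < sizeof(zones) / sizeof(zones[0]); i++) {
		if (zones[i].entries >= entries)
			return (&zones[i]);
	}
	return (NULL);
}
```

  Keeping this mapping in one function means callers do not need to know
  how the size classes are laid out, which matches the stated goal of
  clarifying the logic of bucket zone consumers.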
* Remove dangling variable | phk | 2004-11-06 | 1 | -1/+0
* Annotate what bucket_size[] array does; staticize since it's used only | rwatson | 2004-11-06 | 1 | -1/+5
  in uma_core.c.
* Fix the last known race in swapoff(), which could lead to a spurious panic: | das | 2004-11-06 | 1 | -21/+14
      swapoff: failed to locate %d swap blocks

  The race occurred because putpages() can block between the time it
  allocates swap space and the time it updates the swap metadata to
  associate that space with a vm_object, so swapoff() would complain about
  the temporary inconsistency. I hoped to fix this by making
  swp_pager_getswapspace() and swp_pager_meta_build() a single atomic
  operation, but that proved to be inconvenient. With this change,
  swapoff() simply doesn't attempt to be so clever about detecting when
  all the pageout activity to the target device should have drained.
* Move a call to wakeup() from vm_object_terminate() to vnode_pager_dealloc() | alc | 2004-11-06 | 3 | -2/+6
  because this call is only needed to wake threads that slept when they
  discovered a dead object connected to a vnode. To eliminate unnecessary
  calls to wakeup() by vnode_pager_dealloc(), introduce a new flag,
  OBJ_DISCONNECTWNT.

  Reviewed by: tegge@
* - Set the priority of the page zeroing thread using sched_prio() when the | jhb | 2004-11-05 | 1 | -14/+5
    thread is created rather than adjusting the priority in the main
    function. (kthread_create() should probably take the initial priority
    as an argument.)
  - Only yield the CPU in the !PREEMPTION case if there are any other
    runnable threads. Yielding when there isn't anything else better to do
    just wastes time in pointless context switches (albeit while the
    system is idle.)
* During traversal of the inactive queue, try locking the page's containing | alc | 2004-11-05 | 1 | -4/+9
  object before accessing the page's flags or the object's reference count.
* Eliminate another unnecessary call to vm_page_busy() that immediately | alc | 2004-11-05 | 1 | -1/+0
  precedes a call to vm_page_rename(). (See the previous revision for a
  detailed explanation.)
* Close a race in swapoff(). Here are the gory details: | das | 2004-11-05 | 1 | -70/+53
  In order to avoid livelock, swapoff() skips over objects with a nonzero
  pip count and makes another pass if necessary. Since it is impossible to
  know which objects we care about, it would choose an arbitrary object
  with a nonzero pip count and wait for it before making another pass, the
  theory being that this object would finish paging about as quickly as
  the ones we care about. Unfortunately, we may have slept since we
  acquired a reference to this object. Hack around this problem by
  tsleep()ing on the pointer anyway, but timeout after a fixed interval.
  More elegant solutions are possible, but the ones I considered
  unnecessarily complicate this rare case.

  Also, kill some nits that seem to have crept into the swapoff() code in
  the last 75 revisions or so:

  - Don't pass both sp and sp->sw_used to swap_pager_swapoff(), since the
    latter can be derived from the former.
  - Replace swp_pager_find_dev() with something simpler. There's no need
    to iterate over the entire list of swap devices just to determine if a
    given block is assigned to the one we're interested in.
  - Expand the scope of the swhash_mtx in a couple of places so that it
    isn't released and reacquired once for every hash bucket.
  - Don't drop the swhash_mtx while holding a reference to an object. We
    need to lock the object first. Unfortunately, doing so would violate
    the established lock order, so use VM_OBJECT_TRYLOCK() and try again
    on a subsequent pass if the object is already locked.
  - Refactor swp_pager_force_pagein() and swap_pager_swapoff() a bit.
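  The "try the lock, otherwise retry on a later pass" pattern from the
  fourth item can be sketched in user-space C. This is an illustration of
  the pattern only: a C11 atomic_flag stands in for VM_OBJECT_TRYLOCK(),
  and struct obj and scan_pass() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct obj {			/* hypothetical stand-in for a vm_object */
	atomic_flag	lock;		/* set while the object is locked */
	int		processed;	/* nonzero once handled by a pass */
};

/*
 * Make one pass over the objects. Objects whose lock cannot be taken
 * immediately are skipped instead of waited on, so the lock order is
 * never violated. Returns the number skipped; the caller makes another
 * pass while the result is nonzero.
 */
static size_t
scan_pass(struct obj *objs, size_t n)
{
	size_t i, skipped = 0;

	assert(objs != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (objs[i].processed)
			continue;
		if (atomic_flag_test_and_set(&objs[i].lock)) {
			skipped++;	/* already locked: retry next pass */
			continue;
		}
		objs[i].processed = 1;
		atomic_flag_clear(&objs[i].lock);
	}
	return (skipped);
}
```

  A caller would loop on scan_pass() until it returns zero, ideally with a
  pause between passes so that the lock holders can make progress.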
* Retire b_magic now, we have the bufobj containing the same hint. | phk | 2004-11-04 | 1 | -1/+0