path: root/sys/vm
Commit message (author, date, files changed, lines -/+)
* Remove mpte optimization from pmap_enter_quick(). (ups, 2006-06-15, 2 files, -6/+4)
  There is a race with the current locking scheme and removing it should have no measurable performance impact. This fixes page faults leading to panics in pmap_enter_quick_locked() on amd64/i386.
  Reviewed by: alc, jhb, peter, ps
* Correct an error in the previous revision that could lead to a panic: "Found mapped cache page." (alc, 2006-06-14, 1 file, -0/+1)
  Specifically, if cnt.v_free_count dips below cnt.v_free_reserved after p_start has been set to a non-NULL value, then vm_map_pmap_enter() would break out of the loop and incorrectly call pmap_enter_object() for the remaining address range. To correct this error, this revision truncates the address range so that pmap_enter_object() will not map any cache pages.
  In collaboration with: tegge@
  Reported by: kris@
* Enable debug.mpsafevm on arm by default. (alc, 2006-06-10, 1 file, -1/+1)
  Tested by: cognet@
* Introduce the function pmap_enter_object(). (alc, 2006-06-05, 2 files, -5/+17)
  It maps a sequence of resident pages from the same object. Use it in vm_map_pmap_enter() to reduce the locking overhead of premapping objects.
  Reviewed by: tegge@
* Fix minidumps to include pages allocated via pmap_map on amd64. (ps, 2006-05-31, 1 file, -0/+9)
  These pages are allocated from the direct map and were not previously tracked. This included the vm_page_array and the early UMA bootstrap pages.
  Reviewed by: peter
* Close race between vmspace_exitfree() and exit1(), and races between vmspace_exitfree() and vmspace_free(), which could result in the same vmspace being freed twice. (tegge, 2006-05-29, 5 files, -24/+102)
  Factor out part of exit1() into new function vmspace_exit(). Attach to vmspace0 to allow old vmspace to be freed earlier.
  Add new function, vmspace_acquire_ref(), for obtaining a vmspace reference for a vmspace belonging to another process. Avoid changing vmspace refcount from 0 to 1 since that could also lead to the same vmspace being freed twice.
  Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().
  Reviewed by: alc
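[Editor's note: the "never go from 0 to 1" rule in the vmspace_acquire_ref() entry above is a general refcounting pattern. A minimal standalone sketch of it follows, using C11 atomics and hypothetical names; it is not the kernel's actual vmspace code.]

    #include <stdatomic.h>
    #include <stdbool.h>

    struct refobj {
        atomic_int refcnt;      /* 0 means the object is already being torn down */
    };

    /*
     * Try to obtain a reference.  Never bump the count from 0 to 1: once it
     * has reached 0 the object is on its way to destruction, and reviving it
     * would let two threads free it.
     */
    static bool
    refobj_acquire_ref(struct refobj *obj)
    {
        int old = atomic_load(&obj->refcnt);

        do {
            if (old == 0)
                return false;   /* too late; the caller must look the object up again */
        } while (!atomic_compare_exchange_weak(&obj->refcnt, &old, old + 1));
        return true;
    }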
* When allocating a bucket to hold a free'd item in UMA fails, don't report this as an allocation failure for the item type. (rwatson, 2006-05-21, 1 file, -2/+1)
  The failure will be separately recorded with the bucket type. This may eliminate high mbuf allocation failure counts under some circumstances, which can be alarming in appearance but are not actually a problem in practice.
  MFC after: 2 weeks
  Reported by: ps, Peter J. Blok <pblok at bsd4all dot org>, OxY <oxy at field dot hu>, Gabor MICSKO <gmicskoa at szintezis dot hu>
* Simplify the implementation of vm_fault_additional_pages() based upon the object's memq being ordered. (alc, 2006-05-13, 1 file, -12/+5)
  Specifically, replace repeated calls to vm_page_lookup() by two simple constant-time operations.
  Reviewed by: tegge
* Use better order here. (pjd, 2006-05-10, 1 file, -1/+1)
* Add synchronization to vm_pageq_add_new_page() so that it can be called safely after kernel initialization. Remove GIANT_REQUIRED. (alc, 2006-04-25, 1 file, -3/+3)
  MFC after: 6 weeks
* It seems that POSIX would rather ENODEV be returned in place of EINVAL when trying to mmap() an fd that isn't a normal file. (trhodes, 2006-04-21, 1 file, -1/+1)
  Reference: http://www.opengroup.org/onlinepubs/009695399/functions/mmap.html
  Submitted by: fanf
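[Editor's note: for userland callers the change above simply means one more errno value to expect from mmap(2) when a descriptor does not support mapping. The helper below is an illustrative sketch, not code from the tree.]

    #include <sys/mman.h>
    #include <errno.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical helper: map the first 'len' bytes of 'fd' read-only. */
    static void *
    map_fd(int fd, size_t len)
    {
        void *p = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

        if (p == MAP_FAILED) {
            if (errno == ENODEV || errno == EINVAL)   /* fd cannot be memory-mapped */
                fprintf(stderr, "descriptor does not support mmap\n");
            return NULL;
        }
        return p;
    }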
* Introduce minidumps. Full physical memory crash dumps are still available via the debug.minidump sysctl and tunable. (peter, 2006-04-21, 1 file, -0/+21)
  Traditional dumps store all physical memory. This was once a good thing when machines had a maximum of 64M of ram and 1GB of kvm. These days, machines often have many gigabytes of ram and a smaller amount of kvm. libkvm+kgdb don't have a way to access physical ram that is not mapped into kvm at the time of the crash dump, so the extra ram being dumped is mostly wasted.
  Minidumps invert the process. Instead of dumping physical memory in order to guarantee that all of kvm's backing is dumped, minidumps instead dump only memory that is actively mapped into kvm.
  amd64 has a direct map region that things like UMA use. Obviously we cannot dump all of the direct map region because that is effectively an old style all-physical-memory dump. Instead, introduce a bitmap and two helper routines (dump_add_page(pa) and dump_drop_page(pa)) that allow certain critical direct map pages to be included in the dump. uma_machdep.c's allocator is the intended consumer.
  Dumps are a custom format. At the very beginning of the file is a header, then a copy of the message buffer, then the bitmap of pages present in the dump, then the final level of the kvm page table trees (2MB mappings are expanded into 4K page mappings), then the sparse physical pages according to the bitmap. libkvm can now conveniently access the kvm page table entries.
  Booting my test 8GB machine, forcing it into ddb and forcing a dump leads to a 48MB minidump. While this is a best case, I expect minidumps to be in the 100MB-500MB range. Obviously, never larger than physical memory of course.
  Minidumps are on by default. They would only need to be turned off if it were necessary to debug corrupt kernel page table management, as that would mess up minidumps as well. Both minidumps and regular dumps are supported on the same machine.
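[Editor's note: the bitmap mentioned above boils down to one bit per physical page. The sketch below only borrows the dump_add_page()/dump_drop_page() names from the entry; the array name, page size, and memory size (matching the 8GB test machine) are assumptions, not the kernel's actual definitions.]

    #include <stdint.h>

    #define PAGE_SHIFT  12                       /* assume 4K pages */
    #define MAX_PAGES   (1UL << 21)              /* assume 8 GB of physical memory */

    static uint64_t dump_bitmap[MAX_PAGES / 64]; /* one bit per physical page */

    /* Mark the page containing physical address 'pa' for inclusion in the dump. */
    static void
    dump_add_page(uint64_t pa)
    {
        uint64_t idx = pa >> PAGE_SHIFT;

        dump_bitmap[idx / 64] |= 1ULL << (idx % 64);
    }

    /* Remove the page containing physical address 'pa' from the dump. */
    static void
    dump_drop_page(uint64_t pa)
    {
        uint64_t idx = pa >> PAGE_SHIFT;

        dump_bitmap[idx / 64] &= ~(1ULL << (idx % 64));
    }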
* Change msleep() and tsleep() to not alter the calling thread's priority if the specified priority is zero. (jhb, 2006-04-17, 1 file, -3/+1)
  This avoids a race where the calling thread could read a snapshot of its current priority, then a different thread could change the first thread's priority, then the original thread would call sched_prio() inside msleep() undoing the change made by the second thread. I used a priority of zero as no thread that calls msleep() or tsleep() should be specifying a priority of zero anyway. The various places that passed 'curthread->td_priority' or some variant as the priority now pass 0.
* On shutdown try to turn off all swap devices. This way GEOM providers are properly closed on shutdown. (pjd, 2006-04-10, 2 files, -19/+63)
  Requested by: ru
  Reviewed by: alc
  MFC after: 2 weeks
* Remove the unused sva and eva arguments from pmap_remove_pages(). (peter, 2006-04-03, 1 file, -1/+1)
* MFP4: Support for profiling dynamically loaded objects. (jkoshy, 2006-03-26, 1 file, -0/+41)
  Kernel changes:
  Inform hwpmc of executable objects brought into the system by kldload() and mmap(), and of their removal by kldunload() and munmap(). A helper function linker_hwpmc_list_objects() has been added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve the list of currently loaded kernel modules.
  The unused `MAPPINGCHANGE' event has been deprecated in favour of separate `MAP_IN' and `MAP_OUT' events; this change reduces space wastage in the log. Bump hwpmc's ABI version to "2.0.00". Teach hwpmc(4) to handle the map change callbacks. Change the default per-cpu sample buffer size to hold 32 samples (up from 16). Increment __FreeBSD_version.
  libpmc(3) changes:
  Update libpmc(3) to deal with the new events in the log file; bring the pmclog(3) manual page in sync with the code.
  pmcstat(8) changes:
  Introduce new options to pmcstat(8): "-r" (root fs path), "-M" (mapfile name), "-q"/"-v" (verbosity control). Option "-k" now takes a kernel directory as its argument but will also work with the older invocation syntax. Rework string handling in pmcstat(8) to use an opaque type for interned strings. Clean up ELF parsing code and add support for tracking dynamic object mappings reported by a v2.0.00 hwpmc(4). Report statistics at the end of a log conversion run depending on the requested verbosity level.
  Reviewed by: jhb, dds (kernel parts of an earlier patch)
  Tested by: gallatin (earlier patch)
* Remove leading __ from __(inline|const|signed|volatile). They are obsolete. (imp, 2006-03-08, 5 files, -10/+10)
  This should reduce diffs to NetBSD as well.
* Ignore dirty pages owned by "dead" objects. (tegge, 2006-03-08, 1 file, -0/+4)
* Eliminate a deadlock when creating snapshots. (tegge, 2006-03-02, 3 files, -2/+6)
  Blocking vn_start_write() must be called without any vnode locks held. Remove calls to vn_start_write() and vn_finished_write() in vnode_pager_putpages() and add these calls before the vnode lock is obtained to most of the callers that don't already have them.
* Hold extra reference to vm object while cleaning pages. (tegge, 2006-03-02, 1 file, -0/+2)
* Lock the vm_object while checking its type to see if it is a vnode-backed object that requires Giant in vm_object_deallocate(). (jhb, 2006-02-21, 1 file, -11/+25)
  This is somewhat hairy in that if we can't obtain Giant directly, we have to drop the object lock, then lock Giant, then relock the object lock and verify that we still need Giant. If we don't (because the object changed to OBJT_DEAD, for example), then we drop Giant before continuing.
  Reviewed by: alc
  Tested by: kris
* Expand scope of marker to reduce the number of page queue scan restarts. (tegge, 2006-02-17, 1 file, -12/+19)
* Check return value from nonblocking call to vn_start_write(). (tegge, 2006-02-17, 1 file, -2/+8)
* When the VM needs to allocate physical memory pages (for non-interrupt use) and it does not have plenty of free pages, it tries to free pages in the cache queue. (ups, 2006-02-15, 1 file, -3/+13)
  Unfortunately, freeing a cached page requires locking the object that owns the page. However, in the context of allocating pages we may not be able to lock the object and thus can only TRY to lock the object. If the locking try fails, the cache page cannot be freed and is activated to move it out of the way so that we may try to free other cache pages. If all pages in the cache belong to objects that are currently locked, the cache queue can be emptied without freeing a single page.
  This scenario caused two problems:
  1) vm_page_alloc always failed allocation when it tried freeing pages from the cache queue and failed to do so. However, if there are more than cnt.v_interrupt_free_min pages on the free list, it should return pages when requested with priority VM_ALLOC_SYSTEM. Failure to do so can cause resource exhaustion deadlocks.
  2) Threads that need to allocate pages spend a lot of time cleaning up the page queue without really getting anything done, while the pagedaemon needs to work overtime to refill the cache.
  This change fixes the first problem. (1)
  Reviewed by: tegge@
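[Editor's note: the first fix concerns which allocations may still proceed when no cache page can be freed. The following schematic sketch illustrates that kind of admission policy with hypothetical counters and class names; it is not the actual vm_page_alloc() code.]

    #include <stdbool.h>

    /* Hypothetical counters standing in for the cnt.v_* statistics in the real VM. */
    static int free_count;           /* pages on the free queue        */
    static int cache_count;          /* clean pages on the cache queue */
    static int free_reserved;        /* normal allocations stop here   */
    static int interrupt_free_min;   /* reserve kept for interrupt use */

    enum alloc_class { ALLOC_NORMAL, ALLOC_SYSTEM, ALLOC_INTERRUPT };

    /*
     * Decide whether an allocation of the given class may take a page now.
     * The bug described above was, in effect, failing system-class requests
     * whenever no cache page could be freed, even though the free queue was
     * still above the interrupt reserve.
     */
    static bool
    may_allocate(enum alloc_class class)
    {
        int avail = free_count + cache_count;

        switch (class) {
        case ALLOC_NORMAL:
            return avail > free_reserved;
        case ALLOC_SYSTEM:
            return avail > interrupt_free_min;   /* only the interrupt reserve is off limits */
        case ALLOC_INTERRUPT:
            return avail > 0;                    /* may dig into the last pages */
        }
        return false;
    }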
* Skip per-cpu caches associated with absent CPUs when generating a memory statistics record stream via sysctl. (rwatson, 2006-02-11, 1 file, -0/+2)
  MFC after: 3 days
* - Fix silly VI locking that is used to check a single flag. The vnode lock also protects this flag so it is not necessary. (jeff, 2006-02-06, 1 file, -14/+6)
  - Don't rely on v_mount to detect whether or not we've been recycled; use the more appropriate VI_DOOMED instead.
  Sponsored by: Isilon Systems, Inc.
  MFC after: 1 week
* Remove an unnecessary call to pmap_remove_all(). (alc, 2006-02-04, 1 file, -1/+0)
  The given page is not mapped because its contents are invalid.
* Adjust old comment (present in rev 1.1) to match changes in rev 1.82. (tegge, 2006-02-02, 1 file, -1/+1)
  PR: kern/92509
  Submitted by: "Bryan Venteicher" <bryanv@daemoninthecloset.org>
* Use off_t for file size passed to vnode_create_vobject(). (yar, 2006-02-01, 1 file, -1/+1)
  The former type, size_t, was causing truncation to 32 bits on i386, which immediately led to undersizing of VM objects backed by files >4GB. In particular, sendfile(2) was broken for such files.
  PR: kern/92243
  MFC after: 5 days
* - Install a temporary bandaid in vm_object_reference() that will stop mtx_assert()s from triggering until I find a real long-term solution. (jeff, 2006-02-01, 1 file, -5/+5)
* Change #if defined(DIAGNOSTIC) to KASSERT. (alc, 2006-01-31, 1 file, -4/+3)
* Add buffer corruption protection (RedZone) for the kernel's malloc(9). (pjd, 2006-01-31, 2 files, -0/+219)
  It detects both buffer underflow and buffer overflow bugs at runtime (on free(9) and realloc(9)) and prints backtraces from where memory was allocated and from where it was freed.
  Tested by: kris
* The change a few years ago of having contigmalloc start its scan at the top of physical RAM instead of the bottom was a sound idea, but the implementation left a lot to be desired. (scottl, 2006-01-29, 1 file, -2/+19)
  Scans would spend considerable time looking at pages that are above the address range given by the caller, and multiple calls (like what happens in busdma) would spend more time on top of that rescanning the same pages over and over.
  Solve this, at least for now, with two simple optimizations. The first is to not bother scanning high ordered pages that are outside of the provided address range. The second is to cache the page index from the last successful operation so that subsequent scans don't have to restart from the top. This is conditional on the numpages argument being the same or greater between calls.
  MFC after: 2 weeks
* Add a new macro wrapper WITNESS_CHECK() around the witness_warn() function. (jhb, 2006-01-27, 1 file, -1/+1)
  The difference between WITNESS_CHECK() and WITNESS_WARN() is that WITNESS_CHECK() should be used in the places where the return value of witness_warn() is checked, whereas WITNESS_WARN() should be used in places where the return value is ignored. Specifically, in a kernel without WITNESS enabled, WITNESS_WARN() evaluates to an empty string whereas WITNESS_CHECK evaluates to 0. I also updated the one place that was checking the return value of WITNESS_WARN() to use WITNESS_CHECK.
* Make sure b_vp and b_bufobj are NULL before calling relpbuf(), as it asserts they are. (cognet, 2006-01-27, 1 file, -0/+9)
  They should be NULL at this point, except if we're coming from swapdev_strategy(). It should only affect the case where we're swapping directly on a file over NFS.
* Style: Add blank line after local variable declarations. (alc, 2006-01-27, 1 file, -0/+1)
* Use the new macros abstracting the page coloring/queues implementation. (alc, 2006-01-27, 1 file, -2/+2)
  (There are no functional changes.)
* Use the new macros abstracting the page coloring/queues implementation. (alc, 2006-01-27, 4 files, -8/+8)
  (There are no functional changes.)
* Plug a leak in the newer contigmalloc() implementation. (alc, 2006-01-26, 1 file, -10/+5)
  Specifically, if a multipage allocation was aborted midway, the pages that were already allocated were not always returned to the free list.
  Submitted by: tegge
* - Avoid calling vm_object_backing_scan() when collapsing an object when the resident page count matches the object size. We know it fully backs its parent in this case. (jeff, 2006-01-25, 1 file, -1/+3)
  Reviewed by: acl, tegge
  Sponsored by: Isilon Systems, Inc.
* The previous revision incorrectly changed a switch statement into an if statement. (alc, 2006-01-25, 1 file, -3/+3)
  Specifically, a break statement that previously broke out of the enclosing switch was not changed. Consequently, the enclosing loop terminated prematurely. This could result in "vm_page_insert: page already inserted" panics.
  Submitted by: tegge
* With the recent changes to the implementation of page coloring, the option PQ_NOOPT is used exclusively by vm_pageq.c. (alc, 2006-01-24, 2 files, -4/+2)
  Thus, the include of opt_vmpage.h can be removed from vm_page.h.
* In vm_page_set_invalid() invalidate all of the page's mappings as soon as any part of the page's contents is invalidated. (alc, 2006-01-24, 1 file, -0/+2)
  Submitted by: tegge
* Make vm_object_vndeallocate() static. (alc, 2006-01-22, 2 files, -2/+2)
  The external calls to it were eliminated in ufs/ffs/ffs_vnops.c's revision 1.125.
* Reduce the scope of one #ifdef to avoid duplicating a SYSCTL_INT() macro and trim another unneeded #ifdef (it was just around a macro that is already conditionally defined). (jhb, 2006-01-06, 1 file, -5/+1)
* Convert the PAGE_SIZE check into a CTASSERT. (netchild, 2006-01-04, 1 file, -1/+3)
  Suggested by: jhb
* Prevent divide by zero; use default values in case one of the divisors is zero. (netchild, 2006-01-04, 1 file, -1/+1)
  Tested by: Randy Bush <randy@psg.com>
* MI changes: (netchild, 2005-12-31, 8 files, -145/+213)
  - provide an interface (macros) to the page coloring part of the VM system; this allows trying different coloring algorithms without the need to touch every file [1]
  - make the page queue tuning values readable: sysctl vm.stats.pagequeue
  - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible)
  MD changes:
  - detection of the cache size: only IA32 and AMD64 (untested) contain cache size detection code; every other arch just comes with a dummy function (this results in the use of default values like it was the case without the autotuning of the page coloring)
  - print some more info on Intel CPUs (like we do on AMD and Transmeta CPUs)
  Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not.
  Based upon work by: Chad David <davidc@acns.ab.ca> [1]
  Reviewed by: alc, arch (in 2004)
  Discussed with: alc, Chad David, arch (in 2004)
* Improve memguard a bit: (pjd, 2005-12-30, 2 files, -0/+93)
  - Provide tunable vm.memguard.desc, so one can specify a memory type without changing the code and recompiling the kernel.
  - Allow memguard to be used for kernel modules by providing sysctl vm.memguard.desc, which can be changed to the short description of a memory type before the module is loaded.
  - Move as much memguard code as possible to memguard.c.
  - Add sysctl node vm.memguard. and move memguard-specific sysctls there.
  - Add malloc_desc2type() function for finding a memory type based on its short description (ks_shortdesc field).
  - The memory type can be changed (via the vm.memguard.desc sysctl) only if it doesn't exist yet (will be loaded later) or when no memory is allocated yet. If there is allocated memory for the given memory type, return EBUSY.
  - Implement two ways of memory type comparison and make the safer/slower one the default.
* Don't access fs->first_object after dropping reference to it. (tegge, 2005-12-20, 1 file, -1/+3)
  The result could be a missed or extra giant unlock.
  Reviewed by: alc