path: root/sys/vm
Commit message / Author / Age / Files / Lines
* Eliminate OBJ_WRITEABLE. It hasn't been used in a long time.
  (alc, 2006-07-21; 3 files, -6/+4)
* Add pmap_clear_write() to the interface between the virtual memory
  system's machine-dependent and machine-independent layers.
  (alc, 2006-07-20; 1 file, -0/+1)

  Once pmap_clear_write() is implemented on all of our supported
  architectures, I intend to replace all calls to pmap_page_protect()
  by calls to pmap_clear_write(). Why? Both the use and implementation
  of pmap_page_protect() in our virtual memory system have subtle
  errors; specifically, the management of execute permission is broken
  on some architectures.

  The "prot" argument to pmap_page_protect() should behave differently
  from the "prot" argument to other pmap functions. Instead of meaning
  "give the specified access rights to all of the physical page's
  mappings," it means "don't take away the specified access rights
  from all of the physical page's mappings, but do take away the ones
  that aren't specified." However, owing to our i386 legacy, i.e., no
  support for no-execute rights, all but one invocation of
  pmap_page_protect() specifies VM_PROT_READ only, when the intent is,
  in fact, to remove only write permission. Consequently, a faithful
  implementation of pmap_page_protect(), e.g., ia64, would remove
  execute permission as well as write permission. On the other hand,
  some architectures that support execute permission have basically
  ignored whether or not VM_PROT_EXECUTE is passed to
  pmap_page_protect(), e.g., amd64 and sparc64.

  This change represents the first step in replacing
  pmap_page_protect() by the less subtle pmap_clear_write() that is
  already implemented on amd64, i386, and sparc64.

  Discussed with: grehan@ and marcel@
* Fix build of uma_core.c when DDB is not compiled into the kernel by
  making uma_zone_sumstat() #ifdef DDB, as it's only used with DDB now.
  (rwatson, 2006-07-18; 1 file, -0/+2)

  Submitted by: Wolfram Fenske <Wolfram.Fenske at Student.Uni-Magdeburg.DE>
* Ensure that vm_object_deallocate() doesn't dereference a stale object
  pointer.
  (alc, 2006-07-17; 1 file, -6/+13)

  When vm_object_deallocate() sleeps because of a non-zero paging in
  progress count on either object or object's shadow,
  vm_object_deallocate() must ensure that object is still the shadow's
  backing object when it reawakens. In fact, object may have been
  deallocated while vm_object_deallocate() slept. If so, reacquiring
  the lock on object can lead to a deadlock.

  Submitted by: ups@
  MFC after: 3 weeks
* Remove sysctl_vm_zone() and the vm.zone sysctl from 7.x.
  (rwatson, 2006-07-16; 1 file, -80/+0)

  As of 6.x, libmemstat(3) is used by vmstat (and friends) to produce
  more accurate and more detailed statistics information in a
  machine-readable way, and vmstat continues to provide the same
  text-based front-end. This change should not be MFC'd.
* Set debug.mpsafevm to true on PowerPC. (Now, by default, all
  architectures in CVS have debug.mpsafevm set to true.)
  (alc, 2006-07-10; 1 file, -4/+0)

  Tested by: grehan@
* Move the code to handle the vm.blacklist tunable up a layer into
  vm_page_startup().
  (jhb, 2006-06-23; 2 files, -31/+39)

  As a result, we now only look up the tunable once instead of looking
  it up once for every physical page of memory in the system. This cuts
  out about a one-second delay in boot on x86 systems. The delay is
  much larger and more noticeable on sun4v, apparently.

  Reported by: kmacy
  MFC after: 1 week
* Make mincore(2) return ENOMEM when the requested range is not fully
  mapped.
  (kib, 2006-06-21; 1 file, -3/+15)

  Requested by: Bruno Haible <bruno at clisp org>
  Reviewed by: alc
  Approved by: pjd (mentor)
  MFC after: 1 month
* Use ptoa(psize) instead of size to compute the end of the mapping in
  vm_map_pmap_enter().
  (alc, 2006-06-17; 1 file, -3/+3)
* Remove the mpte optimization from pmap_enter_quick().
  (ups, 2006-06-15; 2 files, -6/+4)

  There is a race with the current locking scheme, and removing it
  should have no measurable performance impact. This fixes page faults
  leading to panics in pmap_enter_quick_locked() on amd64/i386.

  Reviewed by: alc, jhb, peter, ps
* Correct an error in the previous revision that could lead to a panic:
  "Found mapped cache page."
  (alc, 2006-06-14; 1 file, -0/+1)

  Specifically, if cnt.v_free_count dips below cnt.v_free_reserved
  after p_start has been set to a non-NULL value, then
  vm_map_pmap_enter() would break out of the loop and incorrectly call
  pmap_enter_object() for the remaining address range. To correct this
  error, this revision truncates the address range so that
  pmap_enter_object() will not map any cache pages.

  In collaboration with: tegge@
  Reported by: kris@
* Enable debug.mpsafevm on arm by default.
  (alc, 2006-06-10; 1 file, -1/+1)

  Tested by: cognet@
* Introduce the function pmap_enter_object(). It maps a sequence of
  resident pages from the same object.
  (alc, 2006-06-05; 2 files, -5/+17)

  Use it in vm_map_pmap_enter() to reduce the locking overhead of
  premapping objects.

  Reviewed by: tegge@
* Fix minidumps to include pages allocated via pmap_map on amd64.
  (ps, 2006-05-31; 1 file, -0/+9)

  These pages are allocated from the direct map and were not previously
  tracked. This included the vm_page_array and the early UMA bootstrap
  pages.

  Reviewed by: peter
* Close race between vmspace_exitfree() and exit1(), and races between
  vmspace_exitfree() and vmspace_free(), which could result in the same
  vmspace being freed twice.
  (tegge, 2006-05-29; 5 files, -24/+102)

  Factor out part of exit1() into new function vmspace_exit(). Attach
  to vmspace0 to allow old vmspace to be freed earlier.

  Add new function, vmspace_acquire_ref(), for obtaining a vmspace
  reference for a vmspace belonging to another process. Avoid changing
  vmspace refcount from 0 to 1 since that could also lead to the same
  vmspace being freed twice.

  Change vmtotal() and swapout_procs() to use vmspace_acquire_ref().

  Reviewed by: alc
* When allocating a bucket to hold a freed item in UMA fails, don't
  report this as an allocation failure for the item type.
  (rwatson, 2006-05-21; 1 file, -2/+1)

  The failure will be separately recorded with the bucket type. This
  may eliminate high mbuf allocation failure counts under some
  circumstances, which can be alarming in appearance, but not actually
  a problem in practice.

  MFC after: 2 weeks
  Reported by: ps, Peter J. Blok <pblok at bsd4all dot org>,
               OxY <oxy at field dot hu>,
               Gabor MICSKO <gmicskoa at szintezis dot hu>
* Simplify the implementation of vm_fault_additional_pages() based upon
  the object's memq being ordered.
  (alc, 2006-05-13; 1 file, -12/+5)

  Specifically, replace repeated calls to vm_page_lookup() by two
  simple constant-time operations.

  Reviewed by: tegge
* Use better order here.
  (pjd, 2006-05-10; 1 file, -1/+1)
* Add synchronization to vm_pageq_add_new_page() so that it can be
  called safely after kernel initialization. Remove GIANT_REQUIRED.
  (alc, 2006-04-25; 1 file, -3/+3)

  MFC after: 6 weeks
* It seems that POSIX would rather have ENODEV returned in place of
  EINVAL when trying to mmap() an fd that isn't a normal file.
  (trhodes, 2006-04-21; 1 file, -1/+1)

  Reference: http://www.opengroup.org/onlinepubs/009695399/functions/mmap.html
  Submitted by: fanf
* Introduce minidumps. Full physical memory crash dumps are still
  available via the debug.minidump sysctl and tunable.
  (peter, 2006-04-21; 1 file, -0/+21)

  Traditional dumps store all physical memory. This was once a good
  thing when machines had a maximum of 64M of ram and 1GB of kvm.
  These days, machines often have many gigabytes of ram and a smaller
  amount of kvm. libkvm+kgdb don't have a way to access physical ram
  that is not mapped into kvm at the time of the crash dump, so the
  extra ram being dumped is mostly wasted.

  Minidumps invert the process. Instead of dumping physical memory in
  order to guarantee that all of kvm's backing is dumped, minidumps
  instead dump only memory that is actively mapped into kvm.

  amd64 has a direct map region that things like UMA use. Obviously we
  cannot dump all of the direct map region because that is effectively
  an old style all-physical-memory dump. Instead, introduce a bitmap
  and two helper routines (dump_add_page(pa) and dump_drop_page(pa))
  that allow certain critical direct map pages to be included in the
  dump. uma_machdep.c's allocator is the intended consumer.

  Dumps are a custom format. At the very beginning of the file is a
  header, then a copy of the message buffer, then the bitmap of pages
  present in the dump, then the final level of the kvm page table
  trees (2MB mappings are expanded into 4K page mappings), then the
  sparse physical pages according to the bitmap. libkvm can now
  conveniently access the kvm page table entries.

  Booting my test 8GB machine, forcing it into ddb and forcing a dump
  leads to a 48MB minidump. While this is a best case, I expect
  minidumps to be in the 100MB-500MB range. Obviously, never larger
  than physical memory of course.

  Minidumps are on by default. It would only be necessary to turn them
  off if it was necessary to debug corrupt kernel page table
  management, as that would mess up minidumps as well. Both minidumps
  and regular dumps are supported on the same machine.
* Change msleep() and tsleep() to not alter the calling thread's
  priority if the specified priority is zero.
  (jhb, 2006-04-17; 1 file, -3/+1)

  This avoids a race where the calling thread could read a snapshot of
  its current priority, then a different thread could change the first
  thread's priority, then the original thread would call sched_prio()
  inside msleep(), undoing the change made by the second thread. I
  used a priority of zero as no thread that calls msleep() or tsleep()
  should be specifying a priority of zero anyway. The various places
  that passed 'curthread->td_priority' or some variant as the priority
  now pass 0.
* On shutdown, try to turn off all swap devices. This way GEOM
  providers are properly closed on shutdown.
  (pjd, 2006-04-10; 2 files, -19/+63)

  Requested by: ru
  Reviewed by: alc
  MFC after: 2 weeks
* Remove the unused sva and eva arguments from pmap_remove_pages().
  (peter, 2006-04-03; 1 file, -1/+1)
* MFP4: Support for profiling dynamically loaded objects.
  (jkoshy, 2006-03-26; 1 file, -0/+41)

  Kernel changes:

  Inform hwpmc of executable objects brought into the system by
  kldload() and mmap(), and of their removal by kldunload() and
  munmap(). A helper function linker_hwpmc_list_objects() has been
  added to "sys/kern/kern_linker.c" and is used by hwpmc to retrieve
  the list of currently loaded kernel modules.

  The unused `MAPPINGCHANGE' event has been deprecated in favour of
  separate `MAP_IN' and `MAP_OUT' events; this change reduces space
  wastage in the log.

  Bump hwpmc's ABI version to "2.0.00". Teach hwpmc(4) to handle the
  map change callbacks.

  Change the default per-cpu sample buffer size to hold 32 samples
  (up from 16).

  Increment __FreeBSD_version.

  libpmc(3) changes:

  Update libpmc(3) to deal with the new events in the log file; bring
  the pmclog(3) manual page in sync with the code.

  pmcstat(8) changes:

  Introduce new options to pmcstat(8): "-r" (root fs path), "-M"
  (mapfile name), "-q"/"-v" (verbosity control). Option "-k" now takes
  a kernel directory as its argument but will also work with the older
  invocation syntax.

  Rework string handling in pmcstat(8) to use an opaque type for
  interned strings. Clean up ELF parsing code and add support for
  tracking dynamic object mappings reported by a v2.0.00 hwpmc(4).
  Report statistics at the end of a log conversion run depending on
  the requested verbosity level.

  Reviewed by: jhb, dds (kernel parts of an earlier patch)
  Tested by: gallatin (earlier patch)
* Remove leading __ from __(inline|const|signed|volatile). They are
  obsolete. This should reduce diffs to NetBSD as well.
  (imp, 2006-03-08; 5 files, -10/+10)
* Ignore dirty pages owned by "dead" objects.
  (tegge, 2006-03-08; 1 file, -0/+4)
* Eliminate a deadlock when creating snapshots. Blocking
  vn_start_write() must be called without any vnode locks held.
  (tegge, 2006-03-02; 3 files, -2/+6)

  Remove calls to vn_start_write() and vn_finished_write() in
  vnode_pager_putpages(), and add these calls before the vnode lock is
  obtained to most of the callers that don't already have them.
* Hold extra reference to vm object while cleaning pages.
  (tegge, 2006-03-02; 1 file, -0/+2)
* Lock the vm_object while checking its type to see if it is a
  vnode-backed object that requires Giant in vm_object_deallocate().
  (jhb, 2006-02-21; 1 file, -11/+25)

  This is somewhat hairy in that if we can't obtain Giant directly, we
  have to drop the object lock, then lock Giant, then relock the
  object lock and verify that we still need Giant. If we don't
  (because the object changed to OBJT_DEAD, for example), then we drop
  Giant before continuing.

  Reviewed by: alc
  Tested by: kris
* Expand scope of marker to reduce the number of page queue scan
  restarts.
  (tegge, 2006-02-17; 1 file, -12/+19)
* Check return value from nonblocking call to vn_start_write().
  (tegge, 2006-02-17; 1 file, -2/+8)
* When the VM needs to allocate physical memory pages (for
  non-interrupt use) and it does not have plenty of free pages, it
  tries to free pages in the cache queue.
  (ups, 2006-02-15; 1 file, -3/+13)

  Unfortunately, freeing a cached page requires locking the object
  that owns the page. However, in the context of allocating pages we
  may not be able to block on the object lock and thus can only TRY to
  lock the object. If the locking attempt fails, the cache page cannot
  be freed and is activated to move it out of the way, so that we may
  try to free other cache pages. If all pages in the cache belong to
  objects that are currently locked, the cache queue can be emptied
  without freeing a single page.

  This scenario caused two problems:

  1) vm_page_alloc always failed allocation when it tried freeing
     pages from the cache queue and failed to do so. However, if there
     are more than cnt.v_interrupt_free_min pages on the free list, it
     should return pages when requested with priority VM_ALLOC_SYSTEM.
     Failure to do so can cause resource exhaustion deadlocks.

  2) Threads that need to allocate pages spend a lot of time cleaning
     up the page queue without really getting anything done, while the
     pagedaemon needs to work overtime to refill the cache.

  This change fixes the first problem. (1)

  Reviewed by: tegge@
* Skip per-cpu caches associated with absent CPUs when generating a
  memory statistics record stream via sysctl.
  (rwatson, 2006-02-11; 1 file, -0/+2)

  MFC after: 3 days
* Fix silly VI locking that is used to check a single flag. The vnode
  lock also protects this flag, so it is not necessary.
  (jeff, 2006-02-06; 1 file, -14/+6)

  Also, don't rely on v_mount to detect whether or not we've been
  recycled; use the more appropriate VI_DOOMED instead.

  Sponsored by: Isilon Systems, Inc.
  MFC after: 1 week
* Remove an unnecessary call to pmap_remove_all(). The given page is
  not mapped because its contents are invalid.
  (alc, 2006-02-04; 1 file, -1/+0)
* Adjust old comment (present in rev 1.1) to match changes in rev 1.82.
  (tegge, 2006-02-02; 1 file, -1/+1)

  PR: kern/92509
  Submitted by: "Bryan Venteicher" <bryanv@daemoninthecloset.org>
* Use off_t for the file size passed to vnode_create_vobject().
  (yar, 2006-02-01; 1 file, -1/+1)

  The former type, size_t, was causing truncation to 32 bits on i386,
  which immediately led to undersizing of VM objects backed by files
  >4GB. In particular, sendfile(2) was broken for such files.

  PR: kern/92243
  MFC after: 5 days
* Install a temporary bandaid in vm_object_reference() that will stop
  mtx_assert()s from triggering until I find a real long-term solution.
  (jeff, 2006-02-01; 1 file, -5/+5)
* Change #if defined(DIAGNOSTIC) to KASSERT.
  (alc, 2006-01-31; 1 file, -4/+3)
* Add buffer corruption protection (RedZone) for the kernel's
  malloc(9).
  (pjd, 2006-01-31; 2 files, -0/+219)

  It detects both buffer underflow and buffer overflow bugs at runtime
  (on free(9) and realloc(9)) and prints backtraces from where the
  memory was allocated and from where it was freed.

  Tested by: kris
* The change a few years ago of having contigmalloc start its scan at
  the top of physical RAM instead of the bottom was a sound idea, but
  the implementation left a lot to be desired.
  (scottl, 2006-01-29; 1 file, -2/+19)

  Scans would spend considerable time looking at pages that are above
  the address range given by the caller, and multiple calls (like what
  happens in busdma) would spend more time on top of that rescanning
  the same pages over and over.

  Solve this, at least for now, with two simple optimizations. The
  first is to not bother scanning high-ordered pages that are outside
  of the provided address range. The second is to cache the page index
  from the last successful operation so that subsequent scans don't
  have to restart from the top. This is conditional on the numpages
  argument being the same or greater between calls.

  MFC After: 2 weeks
* Add a new macro wrapper WITNESS_CHECK() around the witness_warn()
  function.
  (jhb, 2006-01-27; 1 file, -1/+1)

  The difference between WITNESS_CHECK() and WITNESS_WARN() is that
  WITNESS_CHECK() should be used in the places where the return value
  of witness_warn() is checked, whereas WITNESS_WARN() should be used
  in places where the return value is ignored. Specifically, in a
  kernel without WITNESS enabled, WITNESS_WARN() evaluates to an empty
  string whereas WITNESS_CHECK() evaluates to 0. I also updated the
  one place that was checking the return value of WITNESS_WARN() to
  use WITNESS_CHECK().
* Make sure b_vp and b_bufobj are NULL before calling relpbuf(), as it
  asserts that they are.
  (cognet, 2006-01-27; 1 file, -0/+9)

  They should be NULL at this point, except if we're coming from
  swapdev_strategy(). It should only affect the case where we're
  swapping directly on a file over NFS.
* Style: Add blank line after local variable declarations.
  (alc, 2006-01-27; 1 file, -0/+1)
* Use the new macros abstracting the page coloring/queues
  implementation. (There are no functional changes.)
  (alc, 2006-01-27; 1 file, -2/+2)
* Use the new macros abstracting the page coloring/queues
  implementation. (There are no functional changes.)
  (alc, 2006-01-27; 4 files, -8/+8)
* Plug a leak in the newer contigmalloc() implementation.
  (alc, 2006-01-26; 1 file, -10/+5)

  Specifically, if a multipage allocation was aborted midway, the
  pages that were already allocated were not always returned to the
  free list.

  Submitted by: tegge
* Avoid calling vm_object_backing_scan() when collapsing an object
  when the resident page count matches the object size. We know it
  fully backs its parent in this case.
  (jeff, 2006-01-25; 1 file, -1/+3)

  Reviewed by: alc, tegge
  Sponsored by: Isilon Systems, Inc.
* The previous revision incorrectly changed a switch statement into an
  if statement.
  (alc, 2006-01-25; 1 file, -3/+3)

  Specifically, a break statement that previously broke out of the
  enclosing switch was not changed. Consequently, the enclosing loop
  terminated prematurely. This could result in "vm_page_insert: page
  already inserted" panics.

  Submitted by: tegge