path: root/sys/vm
Commit message | Author | Age | Files | Lines
* Reduce the scope of one #ifdef to avoid duplicating a SYSCTL_INT() macro | jhb | 2006-01-06 | 1 | -5/+1
| and trim another unneeded #ifdef (it was just around a macro that is already conditionally defined).
* Convert the PAGE_SIZE check into a CTASSERT. | netchild | 2006-01-04 | 1 | -1/+3
| Suggested by: jhb
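A compile-time assertion turns a configuration mistake into a build failure instead of a run-time check. The sketch below is a userland illustration only: `EX_PAGE_SIZE`, `EX_CTASSERT` and `ex_round_page` are hypothetical stand-ins for the kernel's `PAGE_SIZE` and `CTASSERT()`, and the power-of-two condition is an assumption, since the log does not show the actual condition that was checked.

```c
#include <stddef.h>

/* Hypothetical stand-ins: the kernel uses PAGE_SIZE and CTASSERT(). */
#define EX_PAGE_SIZE 4096u
#define EX_CTASSERT(cond) _Static_assert(cond, #cond)

/* The build fails here if the page size is zero or not a power of two. */
EX_CTASSERT(EX_PAGE_SIZE != 0 && (EX_PAGE_SIZE & (EX_PAGE_SIZE - 1)) == 0);

/* Round a byte count up to whole pages; relies on the assertion above. */
static size_t
ex_round_page(size_t n)
{
	return (n + EX_PAGE_SIZE - 1) & ~(size_t)(EX_PAGE_SIZE - 1);
}
```

Because the check happens at compile time, it costs nothing at run time and cannot be skipped on an untested code path.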
* Prevent divide by zero, use default values in case one of the divisors | netchild | 2006-01-04 | 1 | -1/+1
| is zero.
| Tested by: Randy Bush <randy@psg.com>
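The guard described above can be sketched as follows. This is not the committed code: the helper `ex_num_colors`, its parameters, and the formula (cache size divided by associativity and page size) are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical helper: derive a page-color count from cache geometry.
 * The formula and every name here are illustrative, not the committed code.
 */
static size_t
ex_num_colors(size_t cache_size, size_t ways, size_t page_size,
    size_t default_colors)
{
	/* Either divisor may be zero if cache detection failed. */
	if (ways == 0 || page_size == 0)
		return default_colors;
	size_t colors = cache_size / ways / page_size;
	/* Also fall back when the quotient itself comes out as zero. */
	return colors != 0 ? colors : default_colors;
}
```

Returning a default rather than an error keeps the caller simple, at the cost of hiding a failed detection unless the values are inspected separately.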
* MI changes: | netchild | 2005-12-31 | 8 | -145/+213
| - provide an interface (macros) to the page coloring part of the VM system; this allows trying different coloring algorithms without the need to touch every file [1]
| - make the page queue tuning values readable: sysctl vm.stats.pagequeue
| - autotuning of the page coloring values based upon the cache size instead of options in the kernel config (disabling of the page coloring as a kernel option is still possible)
| MD changes:
| - detection of the cache size: only IA32 and AMD64 (untested) contain cache size detection code; every other arch just comes with a dummy function (this results in the use of default values, as was the case without the autotuning of the page coloring)
| - print some more info on Intel CPUs (like we do on AMD and Transmeta CPUs)
| Note to AMD owners (IA32 and AMD64): please run "sysctl vm.stats.pagequeue" and report if the cache* values are zero (= bug in the cache detection code) or not.
| Based upon work by: Chad David <davidc@acns.ab.ca> [1]
| Reviewed by: alc, arch (in 2004)
| Discussed with: alc, Chad David, arch (in 2004)
* Improve memguard a bit: | pjd | 2005-12-30 | 2 | -0/+93
| - Provide tunable vm.memguard.desc, so one can specify memory type without changing the code and recompiling the kernel.
| - Allow memguard to be used for kernel modules by providing sysctl vm.memguard.desc, which can be changed to the short description of a memory type before the module is loaded.
| - Move as much memguard code as possible to memguard.c.
| - Add sysctl node vm.memguard. and move memguard-specific sysctl there.
| - Add malloc_desc2type() function for finding memory type based on its short description (ks_shortdesc field).
| - Memory type can be changed (via vm.memguard.desc sysctl) only if it doesn't exist (will be loaded later) or when no memory is allocated yet. If there is allocated memory for the given memory type, return EBUSY.
| - Implement two ways of comparing memory types and make the safer/slower one the default.
* Don't access fs->first_object after dropping reference to it. | tegge | 2005-12-20 | 1 | -1/+3
| The result could be a missed or extra giant unlock.
| Reviewed by: alc
* Use sf_buf_alloc() instead of vm_map_find() on exec_map to create the | alc | 2005-12-16 | 2 | -0/+74
| ephemeral mappings that are used as the source for three copy operations from kernel space to user space. There are two reasons for making this change:
| (1) Under heavy load exec_map can fill up causing vm_map_find() to fail. When it fails, the nascent process is aborted (SIGABRT). Whereas, this reimplementation using sf_buf_alloc() sleeps.
| (2) Although it is possible to sleep on vm_map_find()'s failure until address space becomes available (see kmem_alloc_wait()), using sf_buf_alloc() is faster. Furthermore, the reimplementation uses a CPU private mapping, avoiding a TLB shootdown on multiprocessors.
| Problem uncovered by: kris@
| Reviewed by: tegge@
| MFC after: 3 weeks
* Assert that the page that is given to vm_page_free_toq() does not have any | alc | 2005-12-13 | 1 | -0/+2
| managed mappings.
* Remove unneeded calls to pmap_remove_all(). The given page is not mapped. | alc | 2005-12-11 | 1 | -1/+0
| Reviewed by: tegge
* Simplify vmspace_dofree(). | alc | 2005-12-04 | 1 | -3/+1
* Eliminate unneeded preallocation at initialization. | alc | 2005-12-03 | 2 | -2/+0
| Reviewed by: tegge
* Eliminate unneeded preallocation at initialization. | alc | 2005-12-03 | 1 | -2/+0
| Reviewed by: tegge
* Eliminate pmap_init2(). It's no longer used. | alc | 2005-11-20 | 2 | -2/+0
* Reimplement the reclamation of PV entries. Specifically, perform | alc | 2005-11-09 | 2 | -36/+0
| reclamation synchronously from get_pv_entry() instead of asynchronously as part of the page daemon. Additionally, limit the reclamation to inactive pages unless allocation from the PV entry zone or reclamation from the inactive queue fails. Previously, reclamation destroyed mappings to both inactive and active pages. get_pv_entry() still, however, wakes up the page daemon when reclamation occurs. The reason being that the page daemon may move some pages from the active queue to the inactive queue, making some new pages available to future reclamations.
| Print the "reclaiming PV entries" message at most once per minute, but don't stop printing it after the fifth time. This way, we do not give the impression that the problem has gone away.
| Reviewed by: tegge
* If a physical page is mapped by two or more virtual addresses, transmitted | alc | 2005-11-08 | 1 | -0/+1
| by the zero-copy sockets method, and written to before the transmission completes, we need to destroy all of the existing mappings to the page, not just the one that we fault on. Otherwise, the mappings will no longer be to the same page and changes made through one of the mappings will not be visible through the others.
| Observed by: tegge
* Rate limit vnode_pager_putpages printfs to once a second. | ps | 2005-11-01 | 1 | -3/+8
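Rate limiting a diagnostic message usually means remembering when it was last emitted and suppressing repeats that arrive too soon. The following is a minimal userland sketch of that idea, not the code from the commit; `ex_ratelimit` and its parameters are hypothetical names.

```c
#include <time.h>

/*
 * Return 1 if a message may be printed at time `now`, 0 if it should be
 * suppressed. `*last` remembers when a message was last allowed; a value
 * of 0 means nothing has been printed yet.
 */
static int
ex_ratelimit(time_t now, time_t *last, time_t min_interval)
{
	if (*last != 0 && now - *last < min_interval)
		return 0;
	*last = now;
	return 1;
}
```

A caller would keep one `last` variable per message and wrap the `printf` in `if (ex_ratelimit(time(NULL), &last, 1))`. The same helper with an interval of 60 gives the "at most once per minute" behaviour mentioned for the PV entry message.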
* Consider the zero-copy transmission of a page that was wired by mlock(2). | alc | 2005-11-01 | 1 | -0/+2
| If a copy-on-write fault occurs on the page, the new copy should inherit a part of the original page's wire count.
| Submitted by: tegge
| MFC after: 1 week
* Normalize a significant number of kernel malloc type names: | rwatson | 2005-10-31 | 1 | -1/+1
| - Prefer '_' to ' ', as it results in more easily parsed results in memory monitoring tools such as vmstat.
| - Remove punctuation that is incompatible with using memory type names as file names, such as '/' characters.
| - Disambiguate some collisions by adding subsystem prefixes to some memory types.
| - Generally prefer lower case to upper case.
| - If the same type is defined in multiple architecture directories, attempt to use the same name in additional cases.
| Not all instances were caught in this change, so more work is required to finish this conversion. Similar changes are required for UMA zone names.
* Use of the ZERO_COPY_SOCKETS options can result in an unusual state that | alc | 2005-10-22 | 1 | -4/+12
| vm_object_backing_scan() was not written to handle. Specifically, a wired page within a backing object that is shadowed by a page within the shadow object. Handle this state by removing the wired page from the backing object. The wired page will be freed by socow_iodone().
| Stop masking errors: If a page is being freed by vm_object_backing_scan(), assert that it is no longer mapped rather than quietly destroying any mappings.
| Tested by: Harald Schmalzbauer
* Change format string for u_int64_t to %ju from %llu, in order to use the | rwatson | 2005-10-20 | 1 | -1/+1
| correct format string on 64-bit systems.
| Pointed out by: pjd
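The reason `%llu` is fragile is that a 64-bit type is `unsigned long` on some platforms and `unsigned long long` on others. Casting to `uintmax_t` and printing with `%ju` is a portable alternative in standard C. A small sketch, where `ex_format_u64` is a hypothetical helper and not part of the commit:

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit counter portably: cast to uintmax_t and print with %ju. */
static int
ex_format_u64(char *buf, size_t len, uint64_t value)
{
	return snprintf(buf, len, "%ju", (uintmax_t)value);
}
```

The explicit cast matters: `%ju` expects exactly `uintmax_t`, so passing the narrower or differently named type without the cast would reintroduce the mismatch.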
* Add a "show uma" command to DDB, which prints out the current stats for | rwatson | 2005-10-20 | 1 | -0/+36
| available UMA zones. Quite useful for post-mortem debugging of memory leaks without a dump device configured on a panicked box.
| MFC after: 2 weeks
* Move execve's access time update functionality into a new | dds | 2005-10-12 | 1 | -11/+1
| vfs_mark_atime() function, and use the new function for performing efficient atime updates in mmap().
| Reviewed by: bde
| MFC after: 2 weeks
* As alc pointed out to me, vm_page.c 1.305 was incomplete: uma_startup() | des | 2005-10-08 | 3 | -8/+8
| still uses the constant UMA_BOOT_PAGES. Change it to accept boot_pages as an additional argument.
| MFC after: 2 weeks
* Update the vnode's access time after an mmap operation on it. | dds | 2005-10-04 | 1 | -0/+12
| Before this change a copy operation with cp(1) would not update the file access times. According to the POSIX mmap(2) documentation: the st_atime field of the mapped file may be marked for update at any time between the mmap() call and the corresponding munmap() call. The initial read or write reference to a mapped region shall cause the file's st_atime field to be marked for update if it has not already been marked for update.
* Trim a couple of unneeded includes. | jhb | 2005-09-29 | 1 | -1/+0
* Make sure we have a bufobj before calling bstrategy(). | cognet | 2005-09-21 | 1 | -1/+3
| I'm not sure this is the right thing to do, but at least I don't panic anymore when swapping on a NFS file without using md(4).
| X-MFC after: proper review
* Remove unused (but initialized) variable 'objsize' from vm_mmap() | peter | 2005-09-20 | 1 | -2/+1
* Introduce a new lock for the purpose of synchronizing access to the | alc | 2005-09-09 | 1 | -22/+9
| UMA boot pages.
| Disable recursion on the general UMA lock now that startup_alloc() no longer uses it.
| Eliminate the variable uma_boot_free. It serves no purpose.
| Note: This change eliminates a lock-order reversal between a system map mutex and the UMA lock. See http://sources.zabbadoz.net/freebsd/lor.html#109 for details.
| MFC after: 3 days
* Eliminate an incorrect cast. | alc | 2005-09-07 | 1 | -1/+1
* Pass a value of type vm_prot_t to pmap_enter_quick() so that it can determine | alc | 2005-09-03 | 3 | -4/+5
| whether the mapping should permit execute access.
* Do not use vm_pager_init() to initialize vnode_pbuf_freecnt variable. | kan | 2005-08-13 | 2 | -9/+1
| vm_pager_init() is run before the required nswbuf variable has been set to its correct value. This caused the system to run with a single pbuf available for vnode_pager. Handle both the cluster_pbuf_freecnt and vnode_pbuf_freecnt variables in the same way.
| Reported by: ade
| Obtained from: alc
| MFC after: 2 days
* Check for marker pages when scanning active and inactive page queues. | tegge | 2005-08-12 | 1 | -0/+5
| Reviewed by: alc
* Introduce the vm.boot_pages tunable and sysctl, which controls the number | des | 2005-08-12 | 1 | -3/+8
| of pages reserved to bootstrap the kernel memory allocator.
| MFC after: 2 weeks
* Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE | tegge | 2005-08-10 | 2 | -5/+75
| due to the vm object being locked.
| When a process writes large amounts of data to a file, the vm object associated with that file can contain most of the physical pages on the machine. If the process is preempted while holding the lock on the vm object, pagedaemon would be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to other vm objects.
| Temporarily unlock the page queues lock while locking vm objects to avoid lock order violation. Detect and handle relevant page queue changes.
| This change depends on both the lock portion of struct vm_object and normal struct vm_page being type stable.
| Reviewed by: alc
* Use atomic operations on runningbufspace. | ssouhlal | 2005-08-08 | 1 | -2/+4
| PR: kern/84318
| Submitted by: ade
| MFC after: 3 days
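A shared counter that is adjusted from several contexts without a common lock needs atomic read-modify-write operations, otherwise concurrent updates can be lost. The sketch below illustrates the idea in userland with C11 `<stdatomic.h>`; the kernel has its own atomic primitives, and `ex_runningbufspace` and both helpers are hypothetical stand-ins, not the committed code.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's runningbufspace counter. */
static atomic_long ex_runningbufspace;

/* Account for a buffer whose write has started. */
static void
ex_runningbuf_add(long bytes)
{
	atomic_fetch_add(&ex_runningbufspace, bytes);
}

/* Account for a completed write; returns the space still in flight. */
static long
ex_runningbuf_sub(long bytes)
{
	/* atomic_fetch_sub returns the value before the subtraction. */
	return atomic_fetch_sub(&ex_runningbufspace, bytes) - bytes;
}
```

Returning the post-update value from the subtraction lets a caller decide whether to wake waiters without a second, racy read of the counter.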
* Don't perform a nested include of opt_vmpage.h if LIBMEMSTAT is defined, | rwatson | 2005-08-04 | 1 | -1/+1
| as opt_vmpage.h will not be available to user space library builds. A similar existing check is present for KLD_MODULE for similar reasons.
| MFC after: 3 days
* Wrap inlines in uma_int.h in #ifdef _KERNEL so that uma_int.h can be | rwatson | 2005-08-04 | 1 | -0/+2
| used from memstat_uma.c for the purposes of kvm access without lots of additional unsafe includes.
| MFC after: 3 days
* Rename UMA_MAX_NAME to UTH_MAX_NAME, since it's a maximum in the | rwatson | 2005-07-25 | 2 | -5/+14
| monitoring API, which might or might not be the same as the internal maximum (currently none).
| Export flag information on UMA zones -- in particular, whether or not this is a secondary zone, and so the keg free count should be considered in that light.
| MFC after: 1 day
* Eliminate inconsistency in the setting of the B_DONE flag. Specifically, | alc | 2005-07-20 | 1 | -2/+0
| make the b_iodone callback responsible for setting it if it is needed. Previously, it was set unconditionally by bufdone() without holding whichever lock is shared by the b_iodone callback and the corresponding top-half function. Consequently, in a race, the top-half function could conclude that operation was done before the b_iodone callback finished. See, for example, aio_physwakeup() and aio_fphysio().
| Note: I don't believe that the other, more widely-used b_iodone callbacks are affected.
| Discussed with: jeff
| Reviewed by: phk
| MFC after: 2 weeks
* Further UMA statistics related changes: | rwatson | 2005-07-20 | 1 | -14/+27
| - Add a new uma_zfree_internal() flag, ZFREE_STATFREE, which causes it to update the zone's uz_frees statistic. Previously, the statistic was updated unconditionally.
| - Use the flag in situations where a "real" free occurs: i.e., one where the caller is freeing an allocated item, to be differentiated from situations where uma_zfree_internal() is used to tear down the item during slab teardown in order to invoke its fini() method. Also use the flag when UMA is freeing its internal objects.
| - When exchanging a bucket with the zone from the per-CPU cache when freeing an item, flush cache statistics back to the zone (since the zone lock and critical section are both held) to match the allocation case.
| MFC after: 3 days
* Eliminate an incorrect (and unnecessary) cast. | alc | 2005-07-20 | 1 | -1/+1
* Use mp_maxid in preference to MAXCPU when creating exports of UMA | rwatson | 2005-07-16 | 1 | -3/+3
| per-CPU cache statistics. UMA sizes the cache array based on the number of CPUs at boot (mp_maxid + 1), and iterating based on MAXCPU could read off the end of the array (into the next zone).
| Reported by: yongari
| MFC after: 1 week
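The bug class described here is a loop bounded by a compile-time maximum walking an array that was sized at run time. A minimal sketch of the corrected pattern, using hypothetical names (`ex_cache_stat`, `ex_sum_allocs`) rather than the real UMA structures:

```c
#include <stddef.h>

/* Hypothetical per-CPU record; the real UMA cache structure differs. */
struct ex_cache_stat {
	unsigned long allocs;
	unsigned long frees;
};

/*
 * Sum per-CPU allocation counts. `caches` holds exactly max_id + 1
 * entries, so the loop is bounded by max_id, not a compile-time maximum.
 */
static unsigned long
ex_sum_allocs(const struct ex_cache_stat *caches, unsigned int max_id)
{
	unsigned long total = 0;

	for (unsigned int cpu = 0; cpu <= max_id; cpu++)
		total += caches[cpu].allocs;
	return total;
}
```

Note the inclusive bound: the largest valid index is `max_id` itself, which is why the array has `max_id + 1` entries.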
* Improve canonicalization of copyrights. Order copyrights by order of | rwatson | 2005-07-16 | 5 | -20/+15
| assertion (jeff, bmilekic, rwatson).
| Suggested ages ago by: bde
| MFC after: 1 week
* Move the unlocking of the zone mutex in sysctl_vm_zone_stats() so that | rwatson | 2005-07-16 | 1 | -5/+9
| it covers the following of the uc_alloc/freebucket cache pointers. Originally, I felt that the race wasn't helped by holding the mutex, hence a comment in the code and not holding it across the cache access. However, it does improve consistency, as while it doesn't prevent bucket exchange, it does prevent bucket pointer invalidation. So a race in gathering cache free space statistics still can occur, but not one that follows an invalid bucket pointer, if the mutex is held.
| Submitted by: yongari
| MFC after: 1 week
* Increase the flags field for kegs from a 16 to a 32 bit value; | silby | 2005-07-16 | 3 | -12/+12
| we have exhausted all 16 flags.
* Track UMA(9) allocation failures by zone, and export via sysctl. | rwatson | 2005-07-15 | 3 | -18/+35
| Requested by: victor cruceru <victor dot cruceru at gmail dot com>
| MFC after: 1 week
* Convert a remaining !fs.map->system_map to | jhb | 2005-07-14 | 1 | -1/+1
| fs.first_object->flags & OBJ_NEEDGIANT test that was missed in an earlier revision. This fixes mutex assertion failures in the debug.mpsafevm=0 case.
| Reported by: ps
| MFC after: 3 days
* Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator | rwatson | 2005-07-14 | 3 | -17/+240
| statistics via a binary structure stream:
| - Add structure 'uma_stream_header', which defines a stream version, definition of MAXCPUs used in the stream, and the number of zone records in the stream.
| - Add structure 'uma_type_header', which defines the name, alignment, size, resource allocation limits, current pages allocated, preferred bucket size, and central zone + keg statistics.
| - Add structure 'uma_percpu_stat', which, for each per-CPU cache, includes the number of allocations and frees, as well as the number of free items in the cache.
| - When the sysctl is queried, return a stream header, followed by a series of type descriptions, each consisting of a type header followed by a series of MAXCPUs uma_percpu_stat structures holding per-CPU allocation information. Typical values of MAXCPU will be 1 (UP compiled kernel) and 16 (SMP compiled kernel).
| This query mechanism allows user space monitoring tools to extract memory allocation statistics in a machine-readable form, and to do so at a per-CPU granularity, allowing monitoring of allocation patterns across CPUs in order to better understand the distribution of work and memory flow over multiple CPUs.
| While here, also export the number of UMA zones as a sysctl vm.uma_count, in order to assist in sizing user space buffers to receive the stream.
| A follow-up commit of libmemstat(3), a library to monitor kernel memory allocation, will occur in the next few days. This change directly supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats rather than separately maintained mbuf allocator statistics.
| MFC after: 1 week
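The stream layout described above (one stream header, then for each zone a type header followed by MAXCPU per-CPU records) determines how large a user space buffer must be. The sketch below computes that size. The `ex_` structures are hypothetical placeholders with invented fields; the real `uma_stream_header`, `uma_type_header` and `uma_percpu_stat` definitions are not reproduced in this log and will differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record layouts; the real field lists are not shown here. */
struct ex_stream_header {
	uint32_t version;
	uint32_t maxcpus;
	uint32_t count;
	uint32_t pad;
};
struct ex_type_header {
	char name[32];
	uint64_t size;
	uint64_t allocs;
	uint64_t frees;
};
struct ex_percpu_stat {
	uint64_t allocs;
	uint64_t frees;
	uint64_t cachefree;
};

/*
 * Bytes needed for one stream: the stream header, then for each zone a
 * type header plus `maxcpus` per-CPU records.
 */
static size_t
ex_stream_size(uint32_t count, uint32_t maxcpus)
{
	size_t per_zone = sizeof(struct ex_type_header) +
	    (size_t)maxcpus * sizeof(struct ex_percpu_stat);

	return sizeof(struct ex_stream_header) + (size_t)count * per_zone;
}
```

In this sketch a reader would obtain the zone count first (the commit exports it as vm.uma_count), size the buffer with a calculation like this one, and then walk the records in the same order when parsing.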
* In addition to tracking allocs in the zone, also track frees. Add | rwatson | 2005-07-14 | 2 | -0/+7
| a zone free counter, as well as a cache free counter.
| MFC after: 1 week
* In an earlier world order, UMA would flush per-CPU statistics to the | rwatson | 2005-07-14 | 1 | -1/+2
| zone whenever it was moving buckets between the zone and the cache, or when coalescing statistics across the CPU. Remove flushing of statistics to the zone when coalescing statistics as part of sysctl, as we won't be running on the right CPU to write to the cache statistics.
| Add a missed gathering of statistics: when uma_zalloc_internal() does a special case allocation of a single item, make sure to update the zone statistics to represent this. Previously this case wasn't accounted for in user-visible statistics.
| MFC after: 1 week