path: root/sys/vm
Commit messages (each entry: author, date, files changed, -deleted/+added lines)
* Do not use vm_pager_init() to initialize the vnode_pbuf_freecnt variable. (kan, 2005-08-13, 2 files, -9/+1)
  vm_pager_init() is run before the required nswbuf variable has been set to its correct value. This caused the system to run with a single pbuf available for vnode_pager. Handle both the cluster_pbuf_freecnt and vnode_pbuf_freecnt variables in the same way.
  Reported by: ade
  Obtained from: alc
  MFC after: 2 days
* Check for marker pages when scanning active and inactive page queues. (tegge, 2005-08-12, 1 file, -0/+5)
  Reviewed by: alc
* Introduce the vm.boot_pages tunable and sysctl, which controls the number of pages reserved to bootstrap the kernel memory allocator. (des, 2005-08-12, 1 file, -3/+8)
  MFC after: 2 weeks
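  As a rough illustration of the mechanism this commit adds, here is a minimal sketch of a boot-time tunable paired with a read-only sysctl, using the standard TUNABLE_INT/SYSCTL_INT macros; the default value and description string below are illustrative, not the actual vm_page.c code:

      #include <sys/param.h>
      #include <sys/kernel.h>
      #include <sys/sysctl.h>

      static int boot_pages = 48;                 /* hypothetical default */
      TUNABLE_INT("vm.boot_pages", &boot_pages);  /* overridable from loader.conf */
      SYSCTL_INT(_vm, OID_AUTO, boot_pages, CTLFLAG_RD, &boot_pages, 0,
          "number of pages reserved to bootstrap the kernel memory allocator");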
* Don't allow pagedaemon to skip pages while scanning PQ_ACTIVE or PQ_INACTIVE due to the vm object being locked. (tegge, 2005-08-10, 2 files, -5/+75)
  When a process writes large amounts of data to a file, the vm object associated with that file can contain most of the physical pages on the machine. If the process is preempted while holding the lock on the vm object, the pagedaemon would be able to move very few pages from PQ_INACTIVE to PQ_CACHE or from PQ_ACTIVE to PQ_INACTIVE, resulting in unlimited cleaning of dirty pages belonging to other vm objects. Temporarily unlock the page queues lock while locking vm objects to avoid a lock order violation. Detect and handle relevant page queue changes.
  This change depends on both the lock portion of struct vm_object and normal struct vm_page being type stable.
  Reviewed by: alc
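  The lock-order dance described above looks roughly like the following sketch; it is simplified, and queue_index and the revalidation step are schematic stand-ins, not the actual vm_pageout.c code:

      /* Within the pagedaemon's queue scan loop, roughly: */
      if (!VM_OBJECT_TRYLOCK(object)) {
              /* Cannot sleep on the object lock while holding the page
               * queues mutex, so drop it, lock the object, re-take it. */
              vm_page_unlock_queues();
              VM_OBJECT_LOCK(object);
              vm_page_lock_queues();
              /* The queue may have changed while unlocked: revalidate
               * the page before acting on it (a marker page left in the
               * queue lets the scan find its place again). */
              if (m->queue != queue_index) {
                      VM_OBJECT_UNLOCK(object);
                      continue;       /* skip to the next page */
              }
      }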
* Use atomic operations on runningbufspace. (ssouhlal, 2005-08-08, 1 file, -2/+4)
  PR: kern/84318
  Submitted by: ade
  MFC after: 3 days
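  A minimal sketch of the change's shape (the variable's real type and call sites live in vfs_bio.c; this standalone version is illustrative):

      #include <machine/atomic.h>

      static volatile u_long runningbufspace;

      static void
      runningbufspace_add(u_long bytes)
      {
              /* was: runningbufspace += bytes;  -- a racy unlocked
               * read-modify-write when called from multiple CPUs */
              atomic_add_long(&runningbufspace, bytes);
      }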
* Don't perform a nested include of opt_vmpage.h if LIBMEMSTAT is defined, as opt_vmpage.h will not be available to user space library builds. (rwatson, 2005-08-04, 1 file, -1/+1)
  A similar existing check is present for KLD_MODULE for similar reasons.
  MFC after: 3 days
* Wrap inlines in uma_int.h in #ifdef _KERNEL so that uma_int.h can be used from memstat_uma.c for the purposes of kvm access without lots of additional unsafe includes. (rwatson, 2005-08-04, 1 file, -0/+2)
  MFC after: 3 days
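  The guard pattern is simply the following; the inline below is a hypothetical stand-in for the real uma_int.h helpers, which touch kernel-only state that userland consumers (e.g. libmemstat reading structures via kvm(3)) cannot compile:

      /* Structure layouts above this point stay visible to userland. */
      #ifdef _KERNEL
      static __inline int
      uma_example_helper(int x)       /* hypothetical; for illustration */
      {
              return (x + 1);         /* real helpers walk kernel-only state */
      }
      #endif /* _KERNEL */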
* Rename UMA_MAX_NAME to UTH_MAX_NAME, since it's a maximum in the monitoring API, which might or might not be the same as the internal maximum (currently none). (rwatson, 2005-07-25, 2 files, -5/+14)
  Export flag information on UMA zones -- in particular, whether or not this is a secondary zone, and so the keg free count should be considered in that light.
  MFC after: 1 day
* Eliminate inconsistency in the setting of the B_DONE flag. (alc, 2005-07-20, 1 file, -2/+0)
  Specifically, make the b_iodone callback responsible for setting it if it is needed. Previously, it was set unconditionally by bufdone() without holding whichever lock is shared by the b_iodone callback and the corresponding top-half function. Consequently, in a race, the top-half function could conclude that the operation was done before the b_iodone callback finished. See, for example, aio_physwakeup() and aio_fphysio().
  Note: I don't believe that the other, more widely-used b_iodone callbacks are affected.
  Discussed with: jeff
  Reviewed by: phk
  MFC after: 2 weeks
* Further UMA statistics related changes: (rwatson, 2005-07-20, 1 file, -14/+27)
  - Add a new uma_zfree_internal() flag, ZFREE_STATFREE, which causes it to update the zone's uz_frees statistic. Previously, the statistic was updated unconditionally.
  - Use the flag in situations where a "real" free occurs: i.e., one where the caller is freeing an allocated item, to be differentiated from situations where uma_zfree_internal() is used to tear down the item during slab teardown in order to invoke its fini() method. Also use the flag when UMA is freeing its internal objects.
  - When exchanging a bucket with the zone from the per-CPU cache when freeing an item, flush cache statistics back to the zone (since the zone lock and critical section are both held) to match the allocation case.
  MFC after: 3 days
* Eliminate an incorrect (and unnecessary) cast. (alc, 2005-07-20, 1 file, -1/+1)
* Use mp_maxid in preference to MAXCPU when creating exports of UMA per-CPU cache statistics. (rwatson, 2005-07-16, 1 file, -3/+3)
  UMA sizes the cache array based on the number of CPUs at boot (mp_maxid + 1), and iterating based on MAXCPU could read off the end of the array (into the next zone).
  Reported by: yongari
  MFC after: 1 week
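  Schematically, the fix changes only the iteration bound; the surrounding function is a sketch, not the actual uma_core.c code, though uz_cpu/uc_allocs/uc_frees follow the UMA structures of the time:

      static void
      export_cpu_caches(uma_zone_t zone)
      {
              uma_cache_t cache;
              u_int cpu;

              /* The per-CPU cache array was sized at boot for
               * mp_maxid + 1 entries; iterating to MAXCPU could read
               * past its end into the next zone's memory. */
              for (cpu = 0; cpu <= mp_maxid; cpu++) {   /* was: cpu < MAXCPU */
                      cache = &zone->uz_cpu[cpu];
                      /* ... copy out cache->uc_allocs, cache->uc_frees ... */
              }
      }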
* Improve canonicalization of copyrights. (rwatson, 2005-07-16, 5 files, -20/+15)
  Order copyrights by order of assertion (jeff, bmilekic, rwatson).
  Suggested ages ago by: bde
  MFC after: 1 week
* Move the unlocking of the zone mutex in sysctl_vm_zone_stats() so that it covers the following of the uc_allocbucket/uc_freebucket cache pointers. (rwatson, 2005-07-16, 1 file, -5/+9)
  Originally, I felt that the race wasn't helped by holding the mutex, hence a comment in the code and not holding it across the cache access. However, it does improve consistency: while it doesn't prevent bucket exchange, it does prevent bucket pointer invalidation. So a race in gathering cache free-space statistics can still occur, but not one that follows an invalid bucket pointer, if the mutex is held.
  Submitted by: yongari
  MFC after: 1 week
* Increase the flags field for kegs from a 16- to a 32-bit value; we have exhausted all 16 flags. (silby, 2005-07-16, 3 files, -12/+12)
* Track UMA(9) allocation failures by zone, and export via sysctl. (rwatson, 2005-07-15, 3 files, -18/+35)
  Requested by: victor cruceru <victor dot cruceru at gmail dot com>
  MFC after: 1 week
* Convert a remaining !fs.map->system_map test to an (fs.first_object->flags & OBJ_NEEDGIANT) test that was missed in an earlier revision. (jhb, 2005-07-14, 1 file, -1/+1)
  This fixes mutex assertion failures in the debug.mpsafevm=0 case.
  Reported by: ps
  MFC after: 3 days
* Introduce a new sysctl, vm.zone_stats, which exports UMA(9) allocator statistics via a binary structure stream: (rwatson, 2005-07-14, 3 files, -17/+240)
  - Add structure 'uma_stream_header', which defines a stream version, the definition of MAXCPU used in the stream, and the number of zone records in the stream.
  - Add structure 'uma_type_header', which defines the name, alignment, size, resource allocation limits, current pages allocated, preferred bucket size, and central zone + keg statistics.
  - Add structure 'uma_percpu_stat', which, for each per-CPU cache, includes the number of allocations and frees, as well as the number of free items in the cache.
  - When the sysctl is queried, return a stream header, followed by a series of type descriptions, each consisting of a type header followed by a series of MAXCPU uma_percpu_stat structures holding per-CPU allocation information. Typical values of MAXCPU will be 1 (UP compiled kernel) and 16 (SMP compiled kernel).
  This query mechanism allows user space monitoring tools to extract memory allocation statistics in a machine-readable form, and to do so at a per-CPU granularity, allowing monitoring of allocation patterns across CPUs in order to better understand the distribution of work and memory flow over multiple CPUs.
  While here, also export the number of UMA zones as a sysctl vm.uma_count, in order to assist in sizing user space buffers to receive the stream. A follow-up commit of libmemstat(3), a library to monitor kernel memory allocation, will occur in the next few days. This change directly supports converting netstat(1)'s "-mb" mode to using UMA-sourced stats rather than separately maintained mbuf allocator statistics.
  MFC after: 1 week
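  As a schematic of the stream layout (field names abbreviated and padding omitted; sys/vm/uma.h holds the authoritative definitions):

      struct uma_stream_header {
              uint32_t ush_version;   /* stream format version */
              uint32_t ush_maxcpus;   /* MAXCPU the kernel was built with */
              uint32_t ush_count;     /* number of zone records to follow */
      };

      struct uma_percpu_stat {
              uint64_t ups_allocs;     /* allocations from this CPU's cache */
              uint64_t ups_frees;      /* frees to this CPU's cache */
              uint64_t ups_cache_free; /* items currently free in the cache */
      };

      /* Reply layout: one uma_stream_header, then for each of ush_count
       * zones a uma_type_header followed by ush_maxcpus uma_percpu_stat
       * records. */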
* In addition to tracking allocs in the zone, also track frees. (rwatson, 2005-07-14, 2 files, -0/+7)
  Add a zone free counter, as well as a cache free counter.
  MFC after: 1 week
* In an earlier world order, UMA would flush per-CPU statistics to the zone whenever it was moving buckets between the zone and the cache, or when coalescing statistics across the CPUs. (rwatson, 2005-07-14, 1 file, -1/+2)
  Remove the flushing of statistics to the zone when coalescing statistics as part of sysctl, as we won't be running on the right CPU to write to the cache statistics.
  Add a missed gathering of statistics: when uma_zalloc_internal() does a special-case allocation of a single item, make sure to update the zone statistics to represent this. Previously this case wasn't accounted for in user-visible statistics.
  MFC after: 1 week
* Change the panic in trash_ctor into just a printf for now. (silby, 2005-06-26, 1 file, -2/+4)
  Once the reports of panics in trash_ctor relating to mbufs have been examined and a fix found, this will be turned back into a panic.
  Approved by: re (rwatson)
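  The shape of the change is roughly the following sketch, not the exact uma_dbg.c diff; uma_junk stands for the fill pattern the trash routines check for use-after-free:

      static const uint32_t uma_junk = 0xdeadc0de;    /* canary pattern */

      static int
      trash_ctor(void *mem, int size, void *arg, int flags)
      {
              uint32_t *p;
              int cnt;

              for (p = mem, cnt = size / sizeof(uma_junk); cnt > 0; cnt--, p++)
                      if (*p != uma_junk)
                              printf("Memory modified after free %p(%d)\n",
                                  mem, size);         /* was: panic(...) */
              return (0);
      }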
* Increase UMA_BOOT_PAGES to prevent a crash during initialization. (alc, 2005-06-16, 1 file, -1/+1)
  See http://docs.FreeBSD.org/cgi/mid.cgi?42AD8270.8060906 for a detailed description of the crash.
  Reported by: Eric Anderson
  Approved by: re (scottl)
  MFC after: 3 days
* The new contigmalloc(9) has a bad degenerate case where many regions were checked again and again despite knowing the pages contained were not usable and only satisfied the alignment constraints. (green, 2005-06-11, 1 file, -11/+23)
  This case was compounded, especially for large allocations, by the practice of looping from the top of memory so as to keep out of the important low-memory regions. While the old contigmalloc(9) has the same problem, it is not as noticeable due to looping from low memory to high.
  This degenerate case is fixed, as well as reversing the sense of the rest of the loops within it, to provide a tremendous speed increase. This makes the best case O(n * VM overhead) much more likely than the worst case O(4 * VM overhead). For comparison, the worst case for old contigmalloc would be O(5 * VM overhead) in addition to its strategy of turning used memory into free being highly pessimal.
  Also, fix a bug that in practice most likely couldn't have been triggered in the new contigmalloc(9): it walked backwards from the end of memory without accounting for how many pages it needed. Potentially, nonexistent pages could have been mapped. This hasn't occurred because the kernel generally requests as its first contigmalloc(9) a single page.
  Reported by: Nicolas Dehaine <nicko@stbernard.com>, wes
  MFC after: 1 month
  More testing by: Nicolas Dehaine <nicko@stbernard.com>, wes
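  The bounds bug reads roughly like this sketch, where run_is_usable() is a hypothetical predicate standing in for the real alignment and usability checks:

      static long
      find_contig_run(long page_count, long npages)
      {
              long i;

              /* Start npages below the end so the candidate run
               * [i, i + npages) never indexes past the last real page;
               * the buggy scan started at page_count - 1. */
              for (i = page_count - npages; i >= 0; i--)
                      if (run_is_usable(i, npages))
                              return (i);
              return (-1);
      }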
* Add a comment to the effect that fictitious pages do not require the initialization of their machine-dependent fields. (alc, 2005-06-10, 1 file, -0/+4)
* Introduce a procedure, pmap_page_init(), that initializes the vm_page's machine-dependent fields. (alc, 2005-06-10, 2 files, -0/+2)
  Use this function in vm_pageq_add_new_page() so that the vm_page's machine-dependent and machine-independent fields are initialized at the same time.
  Remove code from pmap_init() for initializing the vm_page's machine-dependent fields. Remove stale comments from pmap_init().
  Eliminate the Boolean variable pmap_initialized from the alpha, amd64, i386, and ia64 pmap implementations. Its use is no longer required because of the above changes and earlier changes that result in physical memory that is being mapped at initialization time being mapped without pv entries.
  Tested by: cognet, kensmith, marcel
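  The resulting ordering looks roughly like this; the body is simplified from the vm_pageq.c of that era and the exact field manipulation is schematic:

      vm_page_t
      vm_pageq_add_new_page(vm_paddr_t pa)
      {
              vm_page_t m;

              m = PHYS_TO_VM_PAGE(pa);
              m->phys_addr = pa;
              pmap_page_init(m);      /* MD fields, alongside MI fields */
              vm_pageq_enqueue(m->pc + PQ_FREE, m);
              return (m);
      }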
* Update some comments to reflect the change from spl-based to lock-based synchronization. (alc, 2005-05-28, 1 file, -4/+3)
* Use low level constructs borrowed from interrupt threads to wait for work in proc0. (ups, 2005-05-23, 1 file, -1/+36)
  Remove the TDP_WAKEPROC0 workaround.
* Swap in can occur safely without Giant. Release Giant on entry to scheduler(). (alc, 2005-05-22, 1 file, -3/+2)
* Remove GIANT_REQUIRED from swapout_procs(). (alc, 2005-05-22, 1 file, -2/+0)
* Reduce the number of times that we acquire and release locks in swap_pager_getpages(). (alc, 2005-05-20, 1 file, -8/+6)
  MFC after: 1 week
* Remove calls to spl*(). (alc, 2005-05-19, 1 file, -43/+0)
* Remove a stale comment concerning spl* usage. (alc, 2005-05-19, 1 file, -2/+0)
* Update some comments to reflect the change from spl-based to lock-based synchronization. (alc, 2005-05-18, 1 file, -2/+3)
* Remove calls to spl*(). (alc, 2005-05-18, 1 file, -11/+0)
* Revert revision 1.270: swp_pager_async_iodone() need not perform VM_LOCK_GIANT(). (alc, 2005-05-18, 1 file, -2/+0)
  Discussed with: jeff
* Correct 32- vs 64-bit signedness issues. (bz, 2005-05-18, 1 file, -8/+9)
  Approved by: pjd (mentor)
  MFC after: 2 weeks
* The final test in unlock_and_deallocate() to determine if Giant needs to be unlocked wasn't updated to check for OBJ_NEEDGIANT. (grehan, 2005-05-12, 1 file, -1/+1)
  This caused a WITNESS panic when debug_mpsafevm was set to 0.
  Approved by: jeffr
* Enable debug_mpsafevm on ia64 due to the severe functional regression caused by recent locking changes when it's off. (marcel, 2005-05-08, 1 file, -1/+1)
  Revert the logic to trim down the conditional.
  Clued-in by: alc@
* - We need to inherit the OBJ_NEEDGIANT flag from the original object in vm_object_split(). (jeff, 2005-05-04, 1 file, -0/+1)
  Spotted by: alc
* - Add a new object flag "OBJ_NEEDGIANT". We set this flag if the underlying vnode requires Giant. (jeff, 2005-05-03, 4 files, -4/+14)
  - In vm_fault only acquire Giant if the underlying object has OBJ_NEEDGIANT set.
  - In vm_object_shadow inherit the OBJ_NEEDGIANT flag from the backing object.
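  The conditional-Giant pattern this introduces is, in sketch form (a schematic helper, not the actual vm_fault() code):

      static void
      fault_giant_sketch(vm_object_t object)
      {
              int need_giant;

              /* Take Giant only when the backing vnode's filesystem
               * still requires it; MP-safe objects skip the lock. */
              need_giant = (object->flags & OBJ_NEEDGIANT) != 0;
              if (need_giant)
                      mtx_lock(&Giant);
              /* ... fault handling that may call into VFS ... */
              if (need_giant)
                      mtx_unlock(&Giant);
      }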
* Remove GIANT_REQUIRED from vmspace_exec(). (alc, 2005-05-02, 1 file, -1/+0)
  Prodded by: jeff
* - VM_LOCK_GIANT in the swap pager's iodone routine as VFS will soon call it without Giant. (jeff, 2005-04-30, 1 file, -0/+2)
  Sponsored by: Isilon Systems, Inc.
* Modify UMA to use critical sections to protect per-CPU caches, rather than mutexes, which offers lower overhead on both UP and SMP. (rwatson, 2005-04-29, 2 files, -113/+120)
  When allocating from or freeing to the per-CPU cache, without INVARIANTS enabled, we now no longer perform any mutex operations, which offers a 1%-3% performance improvement in a variety of micro-benchmarks. We rely on critical sections to prevent (a) preemption resulting in reentrant access to UMA on a single CPU, and (b) migration of the thread during access.
  In the event we need to go back to the zone for a new bucket, we release the critical section to acquire the global zone mutex, and must re-acquire the critical section and re-evaluate which cache we are accessing in case migration has occurred, or circumstances have changed in the current cache.
  Per-CPU cache statistics are now gathered lock-free by the sysctl, which can result in small races in statistics reporting for caches.
  Reviewed by: bmilekic, jeff (somewhat)
  Tested by: rwatson, kris, gnn, scottl, mike at sentex dot net, others
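  Condensed to its skeleton, the allocation path described above behaves like this sketch; the real uma_zalloc_arg() is considerably more involved:

      static void *
      uma_cache_alloc_sketch(uma_zone_t zone)
      {
              uma_cache_t cache;
              uma_bucket_t bucket;
              void *item = NULL;

              critical_enter();
              cache = &zone->uz_cpu[curcpu];
              bucket = cache->uc_allocbucket;
              if (bucket == NULL || bucket->ub_cnt == 0) {
                      critical_exit();                /* may now block ... */
                      ZONE_LOCK(zone);                /* ... on the zone mutex */
                      critical_enter();
                      cache = &zone->uz_cpu[curcpu];  /* thread may have migrated */
                      /* ... exchange the empty bucket for a full one ... */
                      ZONE_UNLOCK(zone);
                      bucket = cache->uc_allocbucket;
              }
              if (bucket != NULL && bucket->ub_cnt > 0) {
                      bucket->ub_cnt--;
                      item = bucket->ub_bucket[bucket->ub_cnt];
                      cache->uc_allocs++;
              }
              critical_exit();
              return (item);
      }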
* - Pass the ISOPEN flag to namei so filesystems will know we're about to open them or otherwise access the data. (jeff, 2005-04-27, 1 file, -1/+1)
* Add the vm.exec_map_entries tunable and read-only sysctl, which controls the number of entries in exec_map (maximum number of simultaneous execs that can be handled by the kernel). (kris, 2005-04-25, 1 file, -1/+7)
  The default value of 16 is insufficient on heavily loaded machines (particularly SMP machines), and if it is exceeded then executing further processes will generate a SIGABRT. This is a workaround until a better solution can be implemented.
  Reviewed by: alc
  MFC after: 3 days
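  A sketch of how such a tunable feeds the map's size at creation time; the call shape is from memory and the size expression is illustrative, so the actual vm_init.c code may differ:

      static int exec_map_entries = 16;
      TUNABLE_INT("vm.exec_map_entries", &exec_map_entries);

      /* later, during kernel VM startup: */
      exec_map = kmem_suballoc(kernel_map, &minaddr, &maxaddr,
          exec_map_entries * (ARG_MAX + PAGE_SIZE));  /* size illustrative */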
* Unbreak the build on 64-bit architectures. (des, 2005-04-16, 1 file, -1/+2)
* Add a vm.blacklist tunable which can hold a space- or comma-separated list of physical addresses. (jhb, 2005-04-15, 1 file, -0/+29)
  The pages containing these physical addresses will not be added to the free list and thus will effectively be ignored by the VM system. This is mostly useful for the case when one knows of specific physical addresses that have bit errors (such as from a memtest run) so that one can blacklist the bad pages while waiting for the new sticks of RAM to arrive. The physical addresses of any ignored pages are listed in the message buffer as well.
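  Parsing such a list might look like this sketch (schematic, not the actual vm_page.c parser; strtoq() is the in-kernel string-to-quad converter from libkern):

      static int
      is_blacklisted(char *list, vm_paddr_t pa)
      {
              char *cp, *end;
              quad_t bad;

              for (cp = list; *cp != '\0'; cp = end) {
                      bad = strtoq(cp, &end, 0);       /* parse one address */
                      if (trunc_page(bad) == trunc_page(pa))
                              return (1);
                      while (*end == ' ' || *end == ',')
                              end++;                   /* skip separators */
                      if (end == cp)                   /* no progress: bail */
                              break;
              }
              return (0);
      }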
* Move the MAC check_vnode_mmap entry point out from being exclusive to MAP_SHARED so that the entry point gets executed unconditionally. (csjp, 2005-04-14, 1 file, -5/+5)
  This may be useful for security policies which want to perform access control checks around run-time linking.
  - Add the mmap(2) flags argument to the check_vnode_mmap entry point so that we can make access control decisions based on the type of mapped object.
  - Update any dependent API around this parameter addition, such as function prototype modifications, entry point parameter additions, and the inclusion of the sys/mman.h header file.
  - Change the MLS, BIBA, and LOMAC security policies so that subject domination routines are not executed unless the type of mapping is shared. This is done to maintain compatibility between the old vm_mmap_vnode(9) and these policies.
  Reviewed by: rwatson
  MFC after: 1 month
* Tidy vcnt() by moving a duplicated line above #ifdef and removing a useless variable. (jhb, 2005-04-12, 1 file, -5/+2)
* Flip the switch and turn mpsafevm on by default for sparc64. (jhb, 2005-04-04, 1 file, -1/+1)
  Approved by: alc