path: root/sys/vm
Commit log: each entry shows the commit message followed by
[author, date, files changed, -lines removed/+lines added].
* Remove references to vm_zone.h and switch over to the new uma API.
  [jeff, 2002-03-20, 10 files, -39/+30]
* Remove __P.
  [alfred, 2002-03-19, 17 files, -213/+209]
* Quiet a warning introduced by UMA.  This only occurs on machines where
  vm_size_t != unsigned long.
  Reviewed by: phk
  [jeff, 2002-03-19, 1 file, -1/+1]
* Fix a gcc-3.1+ warning.
      warning: deprecated use of label at end of compound statement
  ie: you cannot do this anymore:
      switch(foo) {
      ....
      default:
      }
  [peter, 2002-03-19, 1 file, -0/+1]
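  A minimal sketch of the form gcc 3.1 accepts; the function and variable
  names are made up for illustration, not taken from this commit:

      static void
      handle(int foo)
      {
              switch (foo) {
              case 1:
                      /* ... */
                      break;
              default:
                      break;  /* a label may no longer end the compound
                                 statement, so an empty statement or a
                                 break is required here */
              }
      }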
* This is the first part of the new kernel memory allocator.  This replaces
  malloc(9) and vm_zone with a slab-like allocator.
  Reviewed by: arch@
  [jeff, 2002-03-19, 12 files, -94/+2865]
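  A hedged sketch of the uma(9) interface this introduces: the zone name,
  the struct, and the wrapper functions below are invented for illustration,
  and the exact uma_zcreate() argument list has varied between FreeBSD
  versions.

      #include <sys/param.h>
      #include <sys/malloc.h>
      #include <vm/uma.h>

      struct foo {
              int     f_refs;
      };

      static uma_zone_t foo_zone;

      static void
      foo_zone_init(void)
      {
              /* One zone per object type; UMA caches initialized slabs. */
              foo_zone = uma_zcreate("foo", sizeof(struct foo),
                  NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0);
      }

      static struct foo *
      foo_alloc(void)
      {
              return (uma_zalloc(foo_zone, M_WAITOK));
      }

      static void
      foo_free(struct foo *fp)
      {
              uma_zfree(foo_zone, fp);
      }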
* Back out the modification of vm_map locks from lockmgr to sx locks.  The
  best path forward now is likely to change the lockmgr locks to simple
  sleep mutexes, then see whether any extra contention this generates is
  greater than the overhead removed by no longer managing local locking
  state information, making extra calls into lockmgr, etc.  Additionally,
  making the vm_map lock a mutex and respecting it properly will put us
  much closer to not needing Giant magic in vm.
  [green, 2002-03-18, 6 files, -104/+89]
* Remove vm_object_count: It's unused, incorrectly maintained and duplicates
  information maintained by the zone allocator.
  [alc, 2002-03-17, 1 file, -4/+1]
* Undo part of revision 1.57: Now that (o)sendsig() doesn't call useracc(),
  the motivation for saving and restoring the map->hint in useracc() is gone.
  (The same tests that motivated this change in revision 1.57 now show that
  there is no performance loss from removing it.)  This was really a hack and
  some day we would have had to add new synchronization here on map->hint to
  maintain it.
  [alc, 2002-03-17, 1 file, -13/+3]
* Acquire a read lock on the map inside of vm_map_check_protection() rather
  than expecting the caller to do so.  This (1) eliminates duplicated code in
  kernacc() and useracc() and (2) fixes missing synchronization in munmap().
  [alc, 2002-03-17, 2 files, -4/+7]
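  A minimal sketch of the locking pattern described, with the entry walk
  elided; this is not the committed function body:

      boolean_t
      vm_map_check_protection(vm_map_t map, vm_offset_t start,
          vm_offset_t end, vm_prot_t protection)
      {
              boolean_t result = TRUE;

              vm_map_lock_read(map);          /* taken here, not by callers */
              /*
               * Walk the entries covering [start, end) and clear "result"
               * if any of them lacks "protection".  (Walk omitted.)
               */
              vm_map_unlock_read(map);
              return (result);
      }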
* Convert all pmap_kenter/pmap_kremove pairs in MI code to use pmap_qenter/
  pmap_qremove.  pmap_kenter is not safe to use in MI code because it is not
  guaranteed to flush the mapping from the tlb on all cpus.  If the process
  in question is preempted and migrates cpus between the call to pmap_kenter
  and pmap_kremove, the original cpu will be left with stale mappings in its
  tlb.  This is currently not a problem for i386 because we do not use PG_G
  on SMP, and thus all mappings are flushed from the tlb on context switches,
  not just user mappings.  This is not the case on all architectures, and if
  PG_G is to be used with SMP on i386 it will be a problem.  This was
  committed by peter earlier as part of his fine grained tlb shootdown work
  for i386, which was backed out for other reasons.
  Reviewed by: peter
  [jake, 2002-03-17, 2 files, -3/+4]
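  A hedged illustration of the conversion described; "kva" and "m" are
  placeholder names, not the call sites touched by this commit:

      static void
      map_run_of_pages(vm_offset_t kva, vm_page_t *m, int count)
      {
              /*
               * Old, per-page style -- unsafe in MI code because the TLB
               * is not necessarily shot down on other cpus:
               *
               *      for (i = 0; i < count; i++)
               *              pmap_kenter(kva + i * PAGE_SIZE,
               *                  VM_PAGE_TO_PHYS(m[i]));
               *
               * Ranged replacement, which invalidates on all cpus:
               */
              pmap_qenter(kva, m, count);
      }

      static void
      unmap_run_of_pages(vm_offset_t kva, int count)
      {
              pmap_qremove(kva, count);
      }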
* Introduce the new 64-bit size disk block, daddr64_t.  Change the bio and
  buffer structures to have daddr64_t bio_pblkno, b_blkno, and b_lblkno
  fields which allows access to disks larger than a Terabyte in size.  This
  change also requires that the VOP_BMAP vnode operation accept and return
  daddr64_t blocks.  This delta should not affect system operation in any
  way.  It merely sets up the necessary interfaces to allow the development
  of disk drivers that work with these larger disk block addresses.  It also
  allows for the development of UFS2 which will use 64-bit block addresses.
  [mckusick, 2002-03-15, 1 file, -2/+2]
* Document faultstate.lookup_still_valid more than none.
  Requested by: alfred
  [green, 2002-03-14, 1 file, -10/+14]
* Rename SI_SUB_MUTEX to SI_SUB_MTX_POOL to make the name at all accurate.
  While doing this, move it earlier in the sysinit boot process so that the
  VM system can use it.  After that, the system is now able to use sx locks
  instead of lockmgr locks in the VM system.  To accomplish this, some of
  the more questionable uses of the locks (such as testing whether they are
  owned or not, as well as allowing shared+exclusive recursion) are removed,
  and simpler logic throughout is used so locks should also be easier to
  understand.  This has been tested on my laptop for months, and has not
  shown any problems on SMP systems, either, so appears quite safe.  One
  more user of lockmgr down, many more to go :)
  [green, 2002-03-13, 6 files, -84/+95]
* - Remove a number of extra newlines that do not belong here according to
    style(9)
  - Minor space adjustment in cases where we have "( ", " )", if(), return(),
    while(), for(), etc.
  - Add /* SYMBOL */ after a few #endifs.
  Reviewed by: alc
  [eivind, 2002-03-10, 24 files, -427/+102]
* Revert change in revision 1.53 and add a small comment to protect the
  revived code.  vm pages newly allocated are marked busy (PG_BUSY), thus
  calling vm_page_delete before the pages have been freed or unbusied will
  cause a deadlock since vm_page_object_page_remove will wait for the busy
  flag to be cleared.  This can be triggered by calling malloc with
  size > PAGE_SIZE and the M_NOWAIT flag on systems low on physical free
  memory.  A kernel module that reproduces the problem, written by Logan
  Gabriel <logan@mail.2cactus.com>, can be found in the freebsd-hackers mail
  archive (12 Apr 2001).  The problem was recently noticed again by Archie
  Cobbs <archie@dellroad.org>.
  Reviewed by: dillon
  [tegge, 2002-03-09, 1 file, -0/+12]
* Fix a bug in the vm_map_clean() procedure.  msync()ing an area of memory
  that has just been mapped MAP_ANON|MAP_NOSYNC and has not yet been
  accessed will panic the machine.
  MFC after: 1 day
  [dillon, 2002-03-07, 1 file, -1/+4]
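  A minimal userland sketch of the case being fixed: msync() on a fresh,
  never-touched MAP_ANON|MAP_NOSYNC mapping, which panicked an unpatched
  kernel and should now be a harmless no-op:

      #include <sys/types.h>
      #include <sys/mman.h>
      #include <err.h>
      #include <unistd.h>

      int
      main(void)
      {
              size_t len = 16 * getpagesize();
              void *p;

              p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                  MAP_ANON | MAP_NOSYNC, -1, 0);
              if (p == MAP_FAILED)
                      err(1, "mmap");
              /* Deliberately never touch the pages before syncing. */
              if (msync(p, len, MS_SYNC) == -1)
                      err(1, "msync");
              munmap(p, len);
              return (0);
      }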
* Add a sequential iteration optimization to vm_object_page_clean().  This
  moderately improves msync's and VM object flushing for objects containing
  randomly dirtied pages (fsync(), msync(), filesystem update daemon), and
  improves cpu use for small-ranged sequential msync()s in the face of very
  large mmap()ings from O(N) to O(1) as might be performed by a database.
  A sysctl, vm.msync_flush_flag, has been added and defaults to 3 (the two
  committed optimizations are turned on by default).  0 will turn off both
  optimizations.  This code has already been tested under stable and is one
  in a series of memq / vp->v_dirtyblkhd / fsync optimizations to remove
  O(N^2) restart conditions that will be coming down the pipe.
  MFC after: 3 days
  [dillon, 2002-03-06, 1 file, -75/+194]
* Move bswlist declaration and initialization from kern/vfs_bio.c to
  vm/vm_pager.c, which is the only place it is used.
  * Make the QUEUE_* definitions and bufqueues local to vfs_bio.c.
  * constify buf_wmesg.
  [eivind, 2002-03-05, 1 file, -1/+6]
* o Create vm_pageq_enqueue() to encapsulate code that is duplicated time
    and again in vm_page.c and vm_pageq.c.
  o Delete unused prototypes.  (Mainly a result of the earlier renaming of
    various functions from vm_page_*() to vm_pageq_*().)
  [alc, 2002-03-04, 3 files, -30/+24]
* Call vm_pageq_remove_nowakeup() rather than duplicating it.
  [alc, 2002-03-03, 1 file, -8/+2]
* Remove some long dead code.
  [alc, 2002-03-02, 1 file, -9/+0]
* Use thread0.td_ucred instead of proc0.p_ucred.  This change is cosmetic
  and isn't strictly required.  However, it lowers the number of false
  positives found when grep'ing the kernel sources for p_ucred to ensure
  proper locking.
  [jhb, 2002-02-27, 1 file, -6/+6]
* Simple p_ucred -> td_ucred changes to start using the per-thread ucred
  reference.
  [jhb, 2002-02-27, 3 files, -15/+14]
* Fix a horribly suboptimal algorithm in the vm_daemon.  In order to
  determine what to page out, the vm_daemon checks reference bits on all
  pages belonging to all processes.  Unfortunately, the algorithm used
  reacted badly with shared pages; each shared page would be checked once
  per process sharing it; this caused an O(N^2) growth of tlb invalidations.
  The algorithm has been changed so that each page will be checked only 16
  times.  Prior to this change, a fork/sleepbomb of 1300 processes could
  cause the vm_daemon to take over 60 seconds to complete, effectively
  freezing the system for that time period.  With this change in place, the
  vm_daemon completes in less than a second.  Any system with hundreds of
  processes sharing pages should benefit from this change.  Note that the
  vm_daemon is only run when the system is under extreme memory pressure.
  It is likely that many people with loaded systems saw no symptoms of this
  problem until they reached the point where swapping began.  Special thanks
  go to dillon, peter, and Chuck Cranor, who helped me get up to speed with
  vm internals.
  PR: 33542, 20393
  Reviewed by: dillon
  MFC after: 1 week
  [silby, 2002-02-27, 2 files, -2/+2]
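  A sketch of the idea only; the helper, the per-page counter, and its
  placement below are invented names rather than the actual vm_daemon code,
  and 16 is the limit quoted above:

      #define MAX_REF_CHECKS  16      /* per page, per vm_daemon pass */

      static int
      page_referenced_limited(vm_page_t m, int *checks_this_pass)
      {
              /*
               * Stop testing a shared page's reference bits after a fixed
               * number of checks in one pass; each check may cost a TLB
               * invalidation on every cpu.
               */
              if (*checks_this_pass >= MAX_REF_CHECKS)
                      return (0);
              (*checks_this_pass)++;
              return (pmap_ts_referenced(m));
      }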
* Back out all the pmap related stuff I've touched over the last few days.
  There is some unresolved badness that has been eluding me, particularly
  affecting uniprocessor kernels.  Turning off PG_G helped (which is a bad
  sign) but didn't solve it entirely.  Userland programs still crashed.
  [peter, 2002-02-27, 2 files, -3/+3]
* Jake further reduced IPI shootdowns on sparc64 in loops by using ranged
  shootdowns in a couple of key places.  Do the same for i386.  This also
  hides some physical addresses from higher levels and has it use the
  generic vm_page_t's instead.  This will help for PAE down the road.
  Obtained from: jake (MI code, suggestions for MD part)
  [peter, 2002-02-27, 2 files, -3/+3]
* Remove unused variable (td)
  [peter, 2002-02-26, 1 file, -1/+0]
* GC: BIO_ORDERED, various infrastructure dealing with BIO_ORDERED.
  [phk, 2002-02-22, 1 file, -1/+1]
* Add a page queue, PQ_HOLD, that temporarily owns pages with nonzero hold
  count that would otherwise be on one of the free queues.  This eliminates
  a panic when broken programs unmap memory that still has pending IO from
  raw devices.
  Reviewed by: dillon, alc
  [tegge, 2002-02-19, 3 files, -4/+11]
* Add one more comment to the OOM changes so that future readers of
  the code may better understand the code.
  Suggested by: dillon
  MFC after: 1 week
  [silby, 2002-02-19, 1 file, -0/+3]
* Changes to make the OOM killer much more effective:
  - Allow the OOM killer to target processes currently locked in memory.
    These very often are the ones doing the memory hogging.
  - Drop the wakeup priority of processes currently sleeping while waiting
    for their page fault to complete.  In order for the OOM killer to work
    well, the killed process and other system processes waiting on memory
    must be allowed to wakeup first.
  Reviewed by: dillon
  MFC after: 1 week
  [silby, 2002-02-19, 4 files, -4/+28]
* Garbage-collect options ACPI_NO_ENABLE_ON_BOOT, AML_DEBUG, BLEED,
  DEVICE_SYSCTLS, KEY, LOUTB, NFS_MUIDHASHSIZ, NFS_UIDHASHSIZ, PCI_QUIET
  and SIMPLELOCK_DEBUG.
  [bde, 2002-02-15, 2 files, -2/+0]
* In a threaded world, different priorities become properties of
  different entities.  Make it so.
  Reviewed by: jhb@freebsd.org (john baldwin)
  [julian, 2002-02-11, 3 files, -4/+6]
* Pre-KSE/M3 commit.  This is a low-functionality change that changes the
  kernel to access the main thread of a process via the linked list of
  threads rather than assuming that it is embedded in the process.  It IS
  still embedded there but remove all the code that assumes that in
  preparation for the next commit which will actually move it out.
  Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,
  [julian, 2002-02-07, 2 files, -5/+6]
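  A hedged illustration of the access-pattern change; the helper name is
  invented, and the list head field follows the KSE-era struct proc:

      #include <sys/param.h>
      #include <sys/queue.h>
      #include <sys/proc.h>

      static struct thread *
      proc_main_thread(struct proc *p)
      {
              /*
               * Walk the linked list of threads instead of assuming the
               * main thread is embedded in struct proc.
               */
              return (TAILQ_FIRST(&p->p_threads));
      }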
* Fix a race with free'ing vmspaces at process exit when vmspaces are
  shared.  Also introduce vm_endcopy instead of using pointer tricks when
  initializing new vmspaces.  The race occurred because of how the reference
  was utilized: test vmspace reference, possibly block, decrement reference.
  When sharing a vmspace between multiple processes it was possible for two
  processes exiting at the same time to test the reference count, possibly
  block and neither one free because they wouldn't see the other's update.
  Submitted by: green
  [alfred, 2002-02-05, 4 files, -17/+33]
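  A schematic of the race and the direction of the fix; the function names
  below are placeholders, not the committed code:

      /* Racy: test first, decrement much later.  Two exiting sharers can
       * both see vm_refcnt > 1, both skip the free, then both decrement. */
      static void
      vmspace_exit_racy(struct vmspace *vm)
      {
              if (vm->vm_refcnt == 1) {
                      /* may block here */
                      vmspace_release_resources(vm);  /* placeholder */
              }
              vm->vm_refcnt--;
      }

      /* Fixed direction: decrement and test in one step (under the
       * appropriate lock), so exactly one exiting sharer does the free. */
      static void
      vmspace_exit_fixed(struct vmspace *vm)
      {
              if (--vm->vm_refcnt == 0)
                      vmspace_release_resources(vm);  /* placeholder */
      }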
* GC P_BUFEXHAUST leftovers, we've had a new mechanism to avoid buffer
  cache lockups for over a year now.
  MFC after: 0 days
  [dillon, 2002-01-31, 1 file, -3/+0]
* Remove a parameter name from a prototype.
  [dwmalone, 2002-01-25, 1 file, -1/+1]
* Don't declare vm_swapout() in the NO_SWAPPING case when it is not defined.
  Fixed some style bugs.
  [bde, 2002-01-17, 1 file, -6/+4]
* Replace ffind_* with fget calls.  Make fget MPsafe.  Make fgetvp and
  fgetsock use the fget subsystem to reduce code bloat.  Push giant down
  in fpathconf().
  [alfred, 2002-01-14, 1 file, -4/+1]
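  A hedged sketch of the replacement pattern; the syscall body is invented,
  and the fget() signature shown is the one from this era (later versions
  grew additional arguments):

      static int
      example_fd_operation(struct thread *td, int fd)
      {
              struct file *fp;
              int error;

              /* Replaces ffind_hold() plus hand-rolled validity checks. */
              error = fget(td, fd, &fp);
              if (error != 0)
                      return (error);
              /* ... operate on fp ... */
              fdrop(fp, td);
              return (0);
      }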
* SMP Lock struct file, filedesc and the global file list.
  Seigo Tanimura (tanimura) posted the initial delta.  I've polished it
  quite a bit reducing the need for locking and adapting it for KSE.
  Locks:
  1 mutex in each filedesc
     protects all the fields.
     protects "struct file" initialization; while a struct file is being
     changed from &badfileops -> &pipeops or something the filedesc should
     be locked.
  1 mutex in each struct file
     protects the refcount fields.
     doesn't protect anything else.
     the flags used for garbage collection have been moved to f_gcflag
     which was the FILLER short; this doesn't need locking because the
     garbage collection is a single threaded container.
     could likely be made to use a pool mutex.
  1 sx lock for the global filelist.
  struct file * fhold(struct file *fp);
          /* increments reference count on a file */
  struct file * fhold_locked(struct file *fp);
          /* like fhold but expects file to be locked */
  struct file * ffind_hold(struct thread *, int fd);
          /* finds the struct file in thread, adds one reference and
             returns it unlocked */
  struct file * ffind_lock(struct thread *, int fd);
          /* ffind_hold, but returns file locked */
  I still have to smp-safe the fget cruft, I'll get to that asap.
  [alfred, 2002-01-13, 1 file, -11/+8]
* Change the preemption code for software interrupt thread schedules and
  mutex releases to not require flags for the cases when preemption is not
  allowed: The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to
  prevent switching to a higher priority thread on mutex release and swi
  schedule, respectively when that switch is not safe.  Now that the
  critical section API maintains a per-thread nesting count, the kernel can
  easily check whether or not it should switch without relying on flags
  from the programmer.  This fixes a few bugs in that all current callers
  of swi_sched() used SWI_NOSWITCH, when in fact, only the ones called from
  fast interrupt handlers and the swi_sched of softclock needed this flag.
  Note that to ensure that swi_sched()'s in clock and fast interrupt
  handlers do not switch, these handlers have to be explicitly wrapped in
  critical_enter/exit pairs.  Presently, just wrapping the handlers is
  sufficient, but in the future with the fully preemptive kernel, the
  interrupt must be EOI'd before critical_exit() is called.  (critical_exit()
  can switch due to a deferred preemption in a fully preemptive kernel.)
  I've tested the changes to the interrupt code on i386 and alpha.  I have
  not tested ia64, but the interrupt code is almost identical to the alpha
  code, so I expect it will work fine.  PowerPC and ARM do not yet have
  interrupt code in the tree so they shouldn't be broken.  Sparc64 is
  broken, but that's been ok'd by jake and tmm who will be fixing the
  interrupt code for sparc64 shortly.
  Reviewed by: peter
  Tested on: i386, alpha
  [jhb, 2002-01-05, 1 file, -1/+1]
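  A hedged illustration of the wrapping described for fast interrupt
  handlers; the handler and the "softclock_ih" cookie are placeholders:

      static void
      example_fast_intr(void *arg)
      {
              critical_enter();
              /* ... acknowledge the hardware ... */
              swi_sched(softclock_ih, 0);     /* no SWI_NOSWITCH needed now */
              critical_exit();
      }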
* Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget()
  against VM_WAIT in the pageout code.  Both fixes involve adjusting the
  lockmgr's timeout capability so locks obtained with timeouts do not
  interfere with locks obtained without a timeout.
  Hopefully MFC: before the 4.5 release
  [dillon, 2001-12-20, 1 file, -4/+11]
* This fixes a large number of bugs in our NFS client side code.  A recent
  commit by Kirk also fixed a softupdates bug that could easily be triggered
  by server side NFS.
  * An edge case with shared R+W mmap()'s and truncate whereby the system
    would inappropriately clear the dirty bits on still-dirty data.
    (applicable to all filesystems)
    THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING.
    see vm/vm_page.c line 1641
  * The straddle case for VM pages and buffer cache buffers when truncating.
    (applicable to NFS client side)
  * Possible SMP database corruption due to vm_pager_unmap_page() not
    clearing the TLB for the other cpu's.  (applicable to NFS client side
    but could affect all filesystems).  Note: not considered serious since
    the corruption occurs beyond the file EOF.
  * When flushing a dirty buffer due to B_CACHE getting cleared, we were
    accidentally setting B_CACHE again (that is, bwrite() sets B_CACHE),
    when we really want it to stay clear after the write is complete.  This
    resulted in a corrupt buffer.  (applicable to all filesystems but
    probably only triggered by NFS)
  * We have to call vtruncbuf() when ftruncate()ing to remove any buffer
    cache buffers.  This is still tentative, I may be able to remove it due
    to the second bug fix.  (applicable to NFS client side)
  * vnode_pager_setsize() race against nfs_vinvalbuf()... we have to set
    n_size before calling nfs_vinvalbuf or the NFS code may recursively
    vnode_pager_setsize() to the original value before the truncate.  This
    is what was causing the user mmap bus faults in the nfs tester program.
    (applicable to NFS client side)
  * Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made by Kirk).
  Testing program written by: Avadis Tevanian, Jr.
  Testing program supplied by: jkh / Apple (see Dec2001 posting to
  freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in
  one easy step')
  MFC after: 1 week
  [dillon, 2001-12-14, 2 files, -2/+41]
* vm/vm_kern.c: rate limit (to once per second) diagnostic printf when
  you run out of mbuf address space.
  kern/subr_mbuf.c: print a warning message when mb_alloc fails, again
  rate-limited to at most once per second.  This covers other cases of mbuf
  allocation failures.  Probably it also overlaps the one handled in
  vm/vm_kern.c, so maybe the latter should go away.
  This warning will let us gradually remove the printfs that are scattered
  across most network drivers to report mbuf allocation failures.  Those are
  potentially dangerous, in that they are not rate-limited and can easily
  cause systems to panic.
  Unless there is disagreement (which does not seem to be the case judging
  from the discussion on -net so far), and because this is sort of a safety
  bugfix, I plan to commit a similar change to STABLE during the weekend
  (it affects kern/uipc_mbuf.c there).
  Discussed-with: jlemon, silby and -net
  [luigi, 2001-12-01, 1 file, -2/+8]
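  A sketch of one way to express the rate limiting, using ppsratecheck(9);
  the committed code may well use a simpler hand-rolled timestamp check,
  and the message text below is approximate:

      #include <sys/param.h>
      #include <sys/systm.h>
      #include <sys/time.h>

      static struct timeval mb_map_lasttime;
      static int mb_map_curfail;

      static void
      mb_map_full_warn(void)
      {
              /* At most one diagnostic per second. */
              if (ppsratecheck(&mb_map_lasttime, &mb_map_curfail, 1))
                      printf("Out of mbuf address space!  "
                          "Consider increasing NMBCLUSTERS\n");
      }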
* When laying out objects in a ZONE_INTERRUPT zone, allow them to cross
  a page boundary, since we've already allocated all our contiguous kva
  space up front.  This eliminates some memory wastage, and allows us to
  actually reach the # of objects that were specified in the zinit() call.
  Reviewed by: peter, dillon
  [jlemon, 2001-11-17, 1 file, -2/+4]
* Fix deadlock introduced in 1.73 (Jan 1998).  The paging-in-progress count
  on a vnode-backed object must be incremented *after* obtaining the vnode
  lock.  If it is bumped before obtaining the vnode lock we can deadlock
  against vtruncbuf().
  Submitted by: peter, ps
  MFC after: 3 days
  [dillon, 2001-11-09, 1 file, -1/+5]
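  A sketch of the ordering being fixed; the function and the surrounding
  pageout details are invented for illustration:

      static void
      start_vnode_pageout(struct vnode *vp, vm_object_t object,
          struct thread *td)
      {
              /*
               * Take the vnode lock *before* bumping paging-in-progress;
               * the other order can deadlock against vtruncbuf(), which
               * waits for the PIP count to drain.
               */
              vn_lock(vp, LK_EXCLUSIVE | LK_RETRY, td);
              vm_object_pip_add(object, 1);
              /* ... issue the writes ... */
              vm_object_pip_wakeup(object);
              VOP_UNLOCK(vp, 0, td);
      }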
* Adjust vnode_pager_input_smlfs() to not attempt to BMAP blocks beyond the
  file EOF.  This works around a bug in the ISOFS (CDRom) BMAP code which
  returns bogus values for requests beyond the file EOF rather than
  returning an error, resulting in either corrupt data being mmap()'d beyond
  the file EOF or resulting in a seg-fault on the last page of a mmap()'d
  file (mmap()s of CDRom files).
  Reported by: peter / Yahoo
  MFC after: 3 days
  [dillon, 2001-11-05, 1 file, -2/+7]
* Don't let pmap_object_init_pt() exhaust all available free pages
  (allocating pv entries w/ zalloci) when called in a loop due to an
  madvise().  It is possible to completely exhaust the free page list and
  cause a system panic when an expected allocation fails.
  [dillon, 2001-10-31, 2 files, -1/+2]
* Move recently added procedure which was incorrectly placed within an
  #ifdef DDB block.
  [dillon, 2001-10-26, 1 file, -17/+16]
* Implement kern.maxvnodes.  adjusting kern.maxvnodes now actually has a
  real effect.
  Optimize vfs_msync().  Avoid having to continually drop and re-obtain
  mutexes when scanning the vnode list.  Improves looping case by 500%.
  Optimize ffs_sync().  Avoid having to continually drop and re-obtain
  mutexes when scanning the vnode list.  This makes a couple of assumptions,
  which I believe are ok, in regards to vnode stability when the mount list
  mutex is held.  Improves looping case by 500%.
  (more optimization work is needed on top of these fixes)
  MFC after: 1 week
  [dillon, 2001-10-26, 4 files, -5/+42]
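  A minimal userland sketch showing kern.maxvnodes being adjusted now that
  the setting takes real effect; the new value here is arbitrary:

      #include <sys/types.h>
      #include <sys/sysctl.h>
      #include <err.h>
      #include <stdio.h>

      int
      main(void)
      {
              int oldval, newval = 100000;
              size_t oldlen = sizeof(oldval);

              if (sysctlbyname("kern.maxvnodes", &oldval, &oldlen,
                  &newval, sizeof(newval)) == -1)
                      err(1, "sysctlbyname");
              printf("kern.maxvnodes: %d -> %d\n", oldval, newval);
              return (0);
      }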