path: root/sys/vm
Commit message; author, date, files changed, lines (-removed/+added)
* Fix a race with freeing vmspaces at process exit when vmspaces are shared.
  alfred, 2002-02-05, 4 files, -17/+33
  Also introduce vm_endcopy instead of using pointer tricks when initializing new vmspaces.
  The race occurred because of how the reference was used: test the vmspace reference, possibly block, then decrement the reference. When a vmspace was shared between multiple processes, two processes exiting at the same time could both test the reference count, possibly block, and then neither would free the vmspace because neither saw the other's update.
  Submitted by: green
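The race above comes from splitting "test the count" and "decrement the count" with a possible sleep in between. One general way to close that kind of window is to make the decrement and the test a single atomic step, so exactly one releaser observes the count reaching zero. This is a userspace sketch using C11 atomics; `struct vmspace_sketch` and `vmspace_release_sketch()` are hypothetical names, and this is not necessarily how the commit restructured the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared address space. */
struct vmspace_sketch {
    atomic_int refcnt;   /* number of processes sharing this vmspace */
};

/*
 * Drop one reference and report whether the caller held the last one.
 * atomic_fetch_sub() returns the value before the decrement, so only the
 * caller that saw 1 may free; two exiting processes cannot both miss it.
 */
static bool vmspace_release_sketch(struct vmspace_sketch *vm)
{
    return atomic_fetch_sub(&vm->refcnt, 1) == 1;
}
```

A caller would free the vmspace only when `vmspace_release_sketch()` returns true; any blocking work happens after that decision, not between the test and the decrement.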
* GC P_BUFEXHAUST leftovers; we've had a new mechanism to avoid buffer cache lockups for over a year now.
  dillon, 2002-01-31, 1 file, -3/+0
  MFC after: 0 days
* Remove a parameter name from a prototype.
  dwmalone, 2002-01-25, 1 file, -1/+1
* Don't declare vm_swapout() in the NO_SWAPPING case when it is not defined.
  bde, 2002-01-17, 1 file, -6/+4
  Fixed some style bugs.
* Replace ffind_* with fget calls.
  alfred, 2002-01-14, 1 file, -4/+1
  Make fget MPsafe. Make fgetvp and fgetsock use the fget subsystem to reduce code bloat. Push Giant down in fpathconf().
* SMP lock struct file, filedesc and the global file list.
  alfred, 2002-01-13, 1 file, -11/+8
  Seigo Tanimura (tanimura) posted the initial delta. I've polished it quite a bit, reducing the need for locking and adapting it for KSE.
  Locks:
  - 1 mutex in each filedesc protects all the fields. It also protects "struct file" initialization: while a struct file is being changed from &badfileops to &pipeops or something similar, the filedesc should be locked.
  - 1 mutex in each struct file protects the refcount fields and nothing else. The flags used for garbage collection have been moved to f_gcflag, which was the FILLER short; this doesn't need locking because garbage collection is a single-threaded container. It could likely be made to use a pool mutex.
  - 1 sx lock for the global file list.
  struct file *fhold(struct file *fp); /* increments the reference count on a file */
  struct file *fhold_locked(struct file *fp); /* like fhold, but expects the file to be locked */
  struct file *ffind_hold(struct thread *, int fd); /* finds the struct file in the thread, adds one reference and returns it unlocked */
  struct file *ffind_lock(struct thread *, int fd); /* like ffind_hold, but returns the file locked */
  I still have to make the fget cruft SMP-safe; I'll get to that ASAP.
* Change the preemption code for software interrupt thread schedules and mutex releases to not require flags for the cases when preemption is not allowed.
  jhb, 2002-01-05, 1 file, -1/+1
  The purpose of the MTX_NOSWITCH and SWI_NOSWITCH flags is to prevent switching to a higher priority thread on mutex release and swi schedule, respectively, when that switch is not safe. Now that the critical section API maintains a per-thread nesting count, the kernel can easily check whether or not it should switch without relying on flags from the programmer. This fixes a few bugs in that all current callers of swi_sched() used SWI_NOSWITCH, when in fact only the ones called from fast interrupt handlers and the swi_sched of softclock needed this flag.
  Note that to ensure that swi_sched() calls in clock and fast interrupt handlers do not switch, these handlers have to be explicitly wrapped in critical_enter/exit pairs. Presently, just wrapping the handlers is sufficient, but in the future, with the fully preemptive kernel, the interrupt must be EOI'd before critical_exit() is called. (critical_exit() can switch due to a deferred preemption in a fully preemptive kernel.)
  I've tested the changes to the interrupt code on i386 and alpha. I have not tested ia64, but the interrupt code is almost identical to the alpha code, so I expect it will work fine. PowerPC and ARM do not yet have interrupt code in the tree, so they shouldn't be broken. Sparc64 is broken, but that's been OK'd by jake and tmm, who will be fixing the interrupt code for sparc64 shortly.
  Reviewed by: peter
  Tested on: i386, alpha
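The flag-free check described above can be sketched as a per-thread nesting counter: a switch requested while the count is nonzero is recorded and taken later, when the outermost critical_exit() drops the count to zero. All names below are hypothetical userspace stand-ins, not the kernel's critical_enter()/critical_exit() implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread state; not the kernel's struct thread. */
struct thread_sketch {
    int critnest;      /* critical-section nesting depth */
    bool owepreempt;   /* a switch was requested while nested */
    int switches;      /* context switches actually taken */
};

static void critical_enter_sketch(struct thread_sketch *td)
{
    td->critnest++;
}

/* Called where a mutex release or swi schedule would like to switch. */
static void request_switch_sketch(struct thread_sketch *td)
{
    if (td->critnest > 0) {
        td->owepreempt = true;   /* unsafe now: defer the switch */
        return;
    }
    td->switches++;              /* safe: switch immediately */
}

static void critical_exit_sketch(struct thread_sketch *td)
{
    assert(td->critnest > 0);
    if (--td->critnest == 0 && td->owepreempt) {
        td->owepreempt = false;
        td->switches++;          /* take the deferred switch */
    }
}
```

Because the decision depends only on `critnest`, callers no longer pass a NOSWITCH-style flag; wrapping a handler in enter/exit is enough to suppress switching inside it.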
* Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code.
  dillon, 2001-12-20, 1 file, -4/+11
  Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout.
  Hopefully MFC: before the 4.5 release
* This fixes a large number of bugs in our NFS client side code. A recent commit by Kirk also fixed a softupdates bug that could easily be triggered by server side NFS.
  dillon, 2001-12-14, 2 files, -2/+41
  * An edge case with shared R+W mmap()s and truncate whereby the system would inappropriately clear the dirty bits on still-dirty data. (applicable to all filesystems) THIS FIX TEMPORARILY DISABLED PENDING FURTHER TESTING. See vm/vm_page.c line 1641.
  * The straddle case for VM pages and buffer cache buffers when truncating. (applicable to NFS client side)
  * Possible SMP database corruption due to vm_pager_unmap_page() not clearing the TLB for the other CPUs. (applicable to NFS client side but could affect all filesystems) Note: not considered serious since the corruption occurs beyond the file EOF.
  * When flushing a dirty buffer due to B_CACHE getting cleared, we were accidentally setting B_CACHE again (that is, bwrite() sets B_CACHE), when we really want it to stay clear after the write is complete. This resulted in a corrupt buffer. (applicable to all filesystems but probably only triggered by NFS)
  * We have to call vtruncbuf() when ftruncate()ing to remove any buffer cache buffers. This is still tentative; I may be able to remove it due to the second bug fix. (applicable to NFS client side)
  * vnode_pager_setsize() race against nfs_vinvalbuf()... we have to set n_size before calling nfs_vinvalbuf, or the NFS code may recursively vnode_pager_setsize() to the original value before the truncate. This is what was causing the user mmap bus faults in the nfs tester program. (applicable to NFS client side)
  * Fix to softupdates (see ufs/ffs/ffs_inode.c 1.73, commit made by Kirk).
  Testing program written by: Avadis Tevanian, Jr.
  Testing program supplied by: jkh / Apple (see Dec2001 posting to freebsd-hackers with Subject 'NFS: How to make FreeBS fall on its face in one easy step')
  MFC after: 1 week
* vm/vm_kern.c: rate limit (to once per second) the diagnostic printf when you run out of mbuf address space.
  luigi, 2001-12-01, 1 file, -2/+8
  kern/subr_mbuf.c: print a warning message when mb_alloc fails, again rate-limited to at most once per second. This covers other cases of mbuf allocation failures. It probably also overlaps the one handled in vm/vm_kern.c, so maybe the latter should go away.
  This warning will let us gradually remove the printfs that are scattered across most network drivers to report mbuf allocation failures. Those are potentially dangerous, in that they are not rate-limited and can easily cause systems to panic.
  Unless there is disagreement (which does not seem to be the case judging from the discussion on -net so far), and because this is sort of a safety bugfix, I plan to commit a similar change to STABLE during the weekend (it affects kern/uipc_mbuf.c there).
  Discussed-with: jlemon, silby and -net
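The "at most once per second" policy above amounts to remembering when a message was last allowed and suppressing any message that arrives before the interval has elapsed. A minimal sketch, assuming a hypothetical helper `rate_allow()` that takes the current time as an argument (so it can be tested without a clock); it is not the code used in vm_kern.c or subr_mbuf.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: return true at most once per `interval` seconds.
 * `*last` holds the time the last message was let through (0 = never).
 */
static bool rate_allow(time_t *last, time_t now, time_t interval)
{
    if (*last != 0 && now - *last < interval)
        return false;   /* too soon: suppress this message */
    *last = now;        /* remember when we last printed */
    return true;
}
```

A caller would keep one `static time_t` per message site and write `if (rate_allow(&last, time(NULL), 1)) printf(...)`, so a flood of allocation failures produces one line per second instead of one per failure.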
* When laying out objects in a ZONE_INTERRUPT zone, allow them to cross a page boundary, since we've already allocated all our contiguous KVA space up front.
  jlemon, 2001-11-17, 1 file, -2/+4
  This eliminates some memory wastage, and allows us to actually reach the number of objects that were specified in the zinit() call.
  Reviewed by: peter, dillon
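The memory saving above is easy to quantify: if objects may not straddle a page, each page holds only floor(PAGE_SIZE / size) objects and the remainder of every page is wasted; with contiguous KVA the whole region can be packed. A small sketch, assuming a 4096-byte page and hypothetical function names:

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_SKETCH 4096u  /* assumed page size */

/* Objects that fit when none may straddle a page (assumes objsize <= page). */
static size_t objs_no_cross(size_t npages, size_t objsize)
{
    return npages * (PAGE_SIZE_SKETCH / objsize);
}

/* Objects that fit when the KVA is contiguous and objects may cross pages. */
static size_t objs_cross(size_t npages, size_t objsize)
{
    return (npages * PAGE_SIZE_SKETCH) / objsize;
}
```

For example, with 1536-byte objects over 4 pages, the no-crossing layout fits 8 objects while packing across page boundaries fits 10; for sizes that divide the page evenly the two layouts are identical.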
* Fix deadlock introduced in 1.73 (Jan 1998). The paging-in-progress count on a vnode-backed object must be incremented *after* obtaining the vnode lock.
  dillon, 2001-11-09, 1 file, -1/+5
  If it is bumped before obtaining the vnode lock, we can deadlock against vtruncbuf().
  Submitted by: peter, ps
  MFC after: 3 days
* Adjust vnode_pager_input_smlfs() to not attempt to BMAP blocks beyond the file EOF.
  dillon, 2001-11-05, 1 file, -2/+7
  This works around a bug in the ISOFS (CD-ROM) BMAP code, which returns bogus values for requests beyond the file EOF rather than returning an error, resulting in either corrupt data being mmap()'d beyond the file EOF or a segfault on the last page of a mmap()'d file (mmap()s of CD-ROM files).
  Reported by: peter / Yahoo
  MFC after: 3 days
* Don't let pmap_object_init_pt() exhaust all available free pages (allocating pv entries w/ zalloci) when called in a loop due to an madvise().
  dillon, 2001-10-31, 2 files, -1/+2
  It is possible to completely exhaust the free page list and cause a system panic when an expected allocation fails.
* Move a recently added procedure which was incorrectly placed within an #ifdef DDB block.
  dillon, 2001-10-26, 1 file, -17/+16
* Implement kern.maxvnodes. Adjusting kern.maxvnodes now actually has a real effect.
  dillon, 2001-10-26, 4 files, -5/+42
  Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%.
  Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are OK, regarding vnode stability when the mount list mutex is held. Improves looping case by 500%.
  (More optimization work is needed on top of these fixes.)
  MFC after: 1 week
* Syntax cleanup and documentation, no operational changes.
  dillon, 2001-10-21, 1 file, -5/+9
  MFC after: 1 day
* Move the code that computes the system load average from vm_meter.c to kern_synch.c in preparation for adding some jitter to the inter-sample time.
  iedowse, 2001-10-20, 2 files, -56/+0
  Note that the "vm.loadavg" sysctl still lives in vm_meter.c, which isn't the right place, but it is appropriate for the current (bad) name of that sysctl.
  Suggested by: jhb (some time ago)
  Reviewed by: bde
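The commit only moves the load-average code, but for context, BSD kernels traditionally keep the load average as an exponentially damped moving average in fixed point, sampled every few seconds. This sketch assumes an 11-bit fixed-point shift, a 5-second sample interval, and the 1-minute decay factor exp(-5/60) ≈ 0.92004; the macro and function names are hypothetical.

```c
#include <assert.h>

#define FSHIFT_SKETCH 11                     /* assumed fixed-point shift */
#define FSCALE_SKETCH (1L << FSHIFT_SKETCH)  /* 1.0 in fixed point (2048) */

/* exp(-5/60) scaled to fixed point: decay for a 1-minute average. */
#define CEXP_1MIN ((long)(0.9200444146293232 * FSCALE_SKETCH))

/*
 * One sample: avg = avg * e + nrun * (1 - e), computed in fixed point.
 * nrun is the number of runnable processes counted at this sample.
 */
static long loadavg_step(long avg, long nrun)
{
    return (CEXP_1MIN * avg +
            nrun * FSCALE_SKETCH * (FSCALE_SKETCH - CEXP_1MIN)) >> FSHIFT_SKETCH;
}
```

Starting from 0 with one runnable process, the first sample gives 164/2048 (about 0.08), and repeated samples climb toward 1.0; because of integer truncation the value settles slightly below or at FSCALE_SKETCH rather than exactly on it.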
* contigmalloc1() could cause the vm_page_zero_count to become incorrect. Properly track the count.
  dillon, 2001-10-17, 1 file, -0/+2
  Submitted by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu>
* Don't use an uninitialized field reserved for callers in the bio structure passed to swap_pager_strategy(). Instead, use a field reserved for drivers and initialize it before usage.
  tegge, 2001-10-15, 1 file, -3/+4
  Reviewed by: dillon
* Don't remove all mappings of a swapped out process if the vm map contained wired entries. vm_fault_unwire() depends on the mapping being intact.
  tegge, 2001-10-14, 1 file, -1/+5
  Reviewed by: dillon
* Fix locking violations during page wiring:
  tegge, 2001-10-14, 1 file, -3/+32
  - vm map entries are not valid after the map has been unlocked.
  - An exclusive lock on the map is needed before calling vm_map_simplify_entry().
  Fix cleanup after page wiring failure to unwire all pages that had been successfully wired before the failure was detected.
  Reviewed by: dillon
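The cleanup rule above (on failure, unwire exactly the pages wired so far) is the usual partial-progress unwind pattern. A sketch with hypothetical types and names, where `fail_wire` is a test hook that simulates a wiring failure; it is not the vm_map wiring code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page stand-in: only the wire count matters here. */
struct page_sketch {
    int wire_count;
    bool fail_wire;   /* test hook: pretend wiring this page fails */
};

static bool wire_one(struct page_sketch *p)
{
    if (p->fail_wire)
        return false;
    p->wire_count++;
    return true;
}

/*
 * Wire every page in the range. If page i fails, unwire pages
 * i-1 down to 0 (exactly those already wired) and report failure.
 */
static bool wire_range(struct page_sketch *pages, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!wire_one(&pages[i])) {
            while (i-- > 0)
                pages[i].wire_count--;
            return false;
        }
    }
    return true;
}
```

After a failed call every page is back at its original wire count, so the caller can retry or report the error without leaking wirings.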
* Make contigalloc[1]() create the vm_map / underlying wired pages in the kernel map and object in a manner that contigfree() is actually able to free.
  dillon, 2001-10-13, 1 file, -4/+12
  Previously contigfree() freed up the KVA space but could not unwire and free the underlying VM pages due to mismatched pageability between the map entry and the VM pages.
  Submitted by: Thomas Moestl <tmoestl@gmx.net>
  Testing by: mark tinguely <tinguely@web.cs.ndsu.nodak.edu>
  MFC after: 3 days
* Finally fix the VM bug where a file whose EOF occurs in the middle of a page would sometimes prevent a dirty page from being cleaned, even when synced, resulting in the dirty page being re-flushed to disk every 30-60 seconds or so, forever.
  dillon, 2001-10-12, 1 file, -3/+21
  The problem is that when the filesystem flushes a page to its backing file, it typically does not clear dirty bits representing areas of the page that are beyond the file EOF. If the file is also mmap()'d and a fault is taken, vm_fault properly (as it is required to) sets the vm_page_t->dirty bits to VM_PAGE_BITS_ALL. This combination could leave us with an uncleanable, unfreeable page.
  The solution is to have the vnode_pager detect the edge case and manually clear the dirty bits representing areas beyond the file EOF. The filesystem does the rest and the page comes up clean after the write completes.
  MFC after: 3 days
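The fix above clears only the dirty bits that cover bytes beyond EOF. A sketch of that mask arithmetic, assuming a 4096-byte page, one dirty bit per 512-byte chunk (8 bits, so "all dirty" is 0xff), and bit i covering the i-th chunk from the start of the page; the names and granularity are illustrative assumptions, not the vnode_pager code.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u   /* assumed page size */
#define CHUNK_SKETCH      512u   /* assumed bytes covered by one dirty bit */
#define BITS_ALL_SKETCH  0xffu   /* 4096 / 512 = 8 bits, all set */

/*
 * Clear dirty bits for the part of a page lying beyond EOF.
 * page_off is the file offset of the page; eof is the file size.
 * A chunk that straddles EOF stays dirty because it holds valid data.
 */
static uint8_t clear_dirty_beyond_eof(uint8_t dirty, uint64_t page_off,
                                      uint64_t eof)
{
    if (eof <= page_off)
        return 0;                           /* whole page is past EOF */
    uint64_t valid = eof - page_off;        /* valid bytes in this page */
    if (valid >= PAGE_SIZE_SKETCH)
        return dirty;                       /* EOF is beyond this page */
    /* Chunks containing at least one valid byte keep their dirty bits. */
    unsigned keep = (unsigned)((valid + CHUNK_SKETCH - 1) / CHUNK_SKETCH);
    return (uint8_t)(dirty & ((1u << keep) - 1u));
}
```

With EOF at byte 1000 of the first page, a fully dirty page (0xff) is reduced to 0x03: the two chunks holding bytes 0-1023 stay dirty, and the six chunks entirely past EOF are cleared so the page can come up clean.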
* Change the kernel's ucred API as follows:
  jhb, 2001-10-11, 2 files, -21/+14
  - crhold() returns a reference to the ucred whose refcount it bumps.
  - crcopy() now simply copies the credentials from one credential to another and has no return value.
  - A new crshared() primitive is added, which returns true if a ucred's refcount is > 1 and false (0) otherwise.
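The three API changes above can be illustrated with a cut-down credential. This is a userspace sketch: `struct ucred_sketch` and the `*_sketch` functions are hypothetical, the (dest, src) argument order for the copy is an assumption, and the copy is assumed to leave the destination's own reference count untouched.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, cut-down credential; the real struct ucred has more fields. */
struct ucred_sketch {
    int cr_ref;   /* reference count */
    int cr_uid;   /* effective user id */
};

/* Bump the refcount and hand back the same credential. */
static struct ucred_sketch *crhold_sketch(struct ucred_sketch *cr)
{
    cr->cr_ref++;
    return cr;
}

/* True when more than one holder shares this credential. */
static bool crshared_sketch(const struct ucred_sketch *cr)
{
    return cr->cr_ref > 1;
}

/* Copy credential contents from src into dest; no return value. */
static void crcopy_sketch(struct ucred_sketch *dest,
                          const struct ucred_sketch *src)
{
    int ref = dest->cr_ref;            /* dest keeps its own refcount */
    memcpy(dest, src, sizeof(*dest));
    dest->cr_ref = ref;
}
```

Returning the pointer from crhold lets a caller write `p = crhold(cr)` in one step, and a crshared-style test lets code decide whether it must copy a credential before modifying it.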
* Add missing includes of sys/ktr.h.
  jhb, 2001-10-11, 1 file, -0/+1
* Make MAXTSIZ, DFLDSIZ, MAXDSIZ, DFLSSIZ, MAXSSIZ, SGROWSIZ loader tunable.
  ps, 2001-10-10, 4 files, -11/+15
  Reviewed by: peter
  MFC after: 2 weeks
* Remove the SSLEEP case from the load average computation. This has been a no-op for as long as our CVS history goes back.
  iedowse, 2001-10-04, 1 file, -5/+0
  Processes in state SSLEEP could only be counted if p_slptime == 0, but immediately before loadav() is called, schedcpu() has just incremented p_slptime on all SSLEEP processes.
* Modify access control checks in mmap() to use securelevel_gt() instead of direct variable access.
  rwatson, 2001-09-26, 1 file, -1/+1
  Obtained from: TrustedBSD Project
* KSE Milestone 2
  julian, 2001-09-12, 15 files, -213/+263
  Note ALL MODULES MUST BE RECOMPILED.
  Make the kernel aware that there are smaller units of scheduling than the process (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process.
  Sorry john! (your next MFC will be a doozy!)
  Reviewed by: peter@freebsd.org, dillon@freebsd.org
  X-MFC after: ha ha ha ha
* Rip some well-duplicated code out of cpu_wait() and cpu_exit() and move it to the MI area.
  peter, 2001-09-10, 2 files, -2/+19
  KSE touched cpu_wait(), which had the same change replicated five ways for each platform. Now it can just do it once. The only MD parts seemed to be dealing with FPU state cleanup and things like vm86 cleanup on x86. The rest was identical.
  XXX: ia64 and powerpc did not have cpu_throw(), so I've put a functional stub in place.
  Reviewed by: jake, tmm, dillon
* Process priority is locked by the sched_lock, not the proc lock.
  jhb, 2001-09-01, 1 file, -2/+2
* Make swapon() MPSAFE (will adjust syscalls.master later).
  dillon, 2001-08-31, 1 file, -5/+13
* Mark obreak() and ovadvise() as being MPSAFE.
  dillon, 2001-08-31, 1 file, -0/+6
* Cleanup
  dillon, 2001-08-31, 1 file, -27/+68
* Implement idle zeroing of pages. I've been tinkering with this on and off since John Dyson left his work-in-progress.
  peter, 2001-08-25, 3 files, -72/+100
  It is off by default for now. Set sysctl vm.zeroidle_enable=1 to turn it on.
  There are some hacks here to deal with the present lack of preemption: we yield after doing a small number of pages, since we won't preempt otherwise.
  This is basically Matt's algorithm [with hysteresis] with an idle process to call it in a similar way to how it used to be called from the idle loop.
  I cleaned up the includes a fair bit here too.
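One plausible reading of "with hysteresis" above is two watermarks: start zeroing only when the pool of pre-zeroed free pages falls below a low mark, then keep going until it reaches a high mark, in small batches so the idle process can yield between passes. This userspace sketch models only that control logic; the watermarks, batch size, struct and function names are all hypothetical and are not the kernel's values or code.

```c
#include <assert.h>
#include <stdbool.h>

#define ZERO_LO     10   /* hypothetical: start when fewer zeroed pages remain */
#define ZERO_HI     40   /* hypothetical: stop once this many are available */
#define ZERO_BATCH   4   /* hypothetical: pages per pass before yielding */

struct zidle_sketch {
    int free_pages;     /* free pages not yet zeroed */
    int zeroed_pages;   /* free pages already zeroed */
    bool active;        /* hysteresis state: currently zeroing? */
};

/* One idle pass; returns the number of pages zeroed before yielding. */
static int zero_idle_pass(struct zidle_sketch *s)
{
    if (!s->active && s->zeroed_pages < ZERO_LO)
        s->active = true;            /* fell below low mark: start */
    if (s->active && s->zeroed_pages >= ZERO_HI)
        s->active = false;           /* reached high mark: stop */
    if (!s->active)
        return 0;

    int done = 0;
    while (done < ZERO_BATCH && s->free_pages > 0 &&
           s->zeroed_pages < ZERO_HI) {
        /* Zeroing the page contents would happen here. */
        s->free_pages--;
        s->zeroed_pages++;
        done++;
    }
    return done;                     /* caller yields the CPU after this */
}
```

The gap between the two marks keeps the idle process from toggling on and off every time a single zeroed page is consumed.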
* Remove support for the badly broken MAP_INHERIT (from -current only).
  dillon, 2001-08-24, 1 file, -4/+1
* Move most of the kernel submap initialization code, including the timeout callwheel and buffer cache, out of the platform-specific areas and into the machine-independent area.
  dillon, 2001-08-22, 2 files, -0/+106
  i386 and alpha adjusted here. Other CPUs can be fixed piecemeal.
  Reviewed by: freebsd-smp, jake
* KASSERT if vm_page_t->wire_count overflows.
  dillon, 2001-08-22, 1 file, -0/+1
* Limit the amount of KVM reserved for the buffer cache and for swap-meta information.
  dillon, 2001-08-20, 1 file, -2/+5
  The default limits only affect machines with > 1GB of RAM and can be overridden with two new kernel conf variables, VM_SWZONE_SIZE_MAX and VM_BCACHE_SIZE_MAX, or with the loader variables kern.maxswzone and kern.maxbcache. This has the effect of leaving more KVM available for sizing NMBCLUSTERS and 'maxusers', and should avoid tripups where a sysadmin adds memory to a machine and then sees the kernel panic on boot due to running out of KVM.
  Also change the default swap-meta auto-sizing calculation to allocate half of what it was previously allocating. The prior defaults were way too high. Note that we cannot afford to run out of swap-meta structures, so we still stay somewhat conservative here.
* Remove asleep(), await(), and M_ASLEEP.
  jhb, 2001-08-10, 1 file, -5/+2
  Callers of asleep() and await() have been converted to calling tsleep(). The only caller outside of M_ASLEEP was the ata driver, which called both asleep() and await() with spl raised, so there was no need for the asleep() and await() pair. M_ASLEEP was unused.
  Reviewed by: jasone, peter
* Remove asleep(), await(), and M_ASLEEP.
  jhb, 2001-08-10, 2 files, -28/+0
  Callers of asleep() and await() have been converted to calling tsleep(). The only caller outside of M_ASLEEP was the ata driver, which called both asleep() and await() with spl raised, so there was no need for the asleep() and await() pair. M_ASLEEP was unused.
  Reviewed by: jasone, peter
* Add a missing semicolon to unbreak the kernel build with INVARIANTS (which was unfortunately turned off in the configuration I used for the last test build).
  tmm, 2001-08-05, 1 file, -1/+1
  Spotted by: jake
  Pointy hat to: tmm
* Whitespace fixes.
  jhb, 2001-08-04, 2 files, -2/+2
* Add a zdestroy() function to the zone allocator. This is needed for the unload case of modules that use their own zones.
  tmm, 2001-08-04, 2 files, -0/+109
  It has been tested with the nfs module.
* Fixups for the initial allocation by dillon:
  alfred, 2001-08-02, 1 file, -7/+15
  1) allocate fewer buckets
  2) when failing to allocate the swap zone, keep reducing the zone by a third rather than a half, in order to reduce the chance of allocating way too little.
  I also moved around some code for readability.
  Suggested by: dillon
  Reviewed by: dillon
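Point 2 above changes the retry policy from halving the request to removing a third of it, which lands closer to the largest size that actually fits. A sketch of that backoff loop; `try_alloc_sketch()` is a hypothetical allocator stand-in that succeeds only at or below a fixed limit, and neither function is the swap-zone code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical allocator stand-in: a request succeeds only if n <= limit. */
static bool try_alloc_sketch(long n, long limit)
{
    return n <= limit;
}

/*
 * Retry a failed allocation, shrinking the request by n/div each time
 * (div = 3 removes a third per retry, div = 2 halves it).
 * Always shrink by at least one so the loop terminates.
 */
static long size_with_backoff(long n, long limit, long div)
{
    while (n > 0 && !try_alloc_sketch(n, limit)) {
        long step = n / div;
        n -= (step > 0) ? step : 1;
    }
    return n;
}
```

Starting from 1000 entries with room for 300, shrinking by a third goes 1000, 667, 445, 297 and settles at 297, while halving goes 1000, 500, 250 and settles at 250, leaving more of the available space unused.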
* Oops. Last commit to vm_object.c should have got these files too.
  jake, 2001-07-31, 3 files, -8/+4
  Remove the use of atomic ops to manipulate vm_object and vm_page flags. Giant is required here, so they are superfluous.
  Discussed with: dillon
* Remove the use of atomic ops to manipulate vm_object and vm_page flags.
  jake, 2001-07-31, 1 file, -11/+6
  Giant is required here, so they are superfluous.
  Discussed with: dillon
* Permit direct swapping to NFS regular files using swapon(2). We already allow this for NFS swap configured via BOOTP, so it is known to work fine.
  iedowse, 2001-07-28, 1 file, -3/+10
  For many diskless configurations it is more flexible to have the client set up swapping itself; it can recreate a sparse swap file to save on server space, for example, and it works with a non-NFS root filesystem such as an in-kernel filesystem image.
* Make vm_page_select_cache static.
  assar, 2001-07-23, 2 files, -2/+1
  Requested by: bde