path: root/sys/vm
Commit message | Author | Age | Files | Lines
* Remove unneeded sys/ucred.h includes | alfred | 2000-11-30 | 1 | -1/+0
* Protect the following with a lockmgr lock: | jake | 2000-11-22 | 4 | -1/+17

    allproc
    zombproc
    pidhashtbl
    proc.p_list
    proc.p_hash
    nextpid

  Reviewed by: jhb
  Obtained from: BSD/OS and NetBSD
* o Export dmmax ("Maximum size of a swap block") using SYSCTL_INT. | rwatson | 2000-11-20 | 1 | -0/+3
  This removes a reason that systat requires setgid kmem.

  More to come.
* Implement a low-memory deadlock solution. | dillon | 2000-11-18 | 4 | -79/+130

  Removed most of the hacks that were trying to deal with low-memory
  situations prior to now. The new code is based on the concept that I/O
  must be able to function in a low memory situation. All major modules
  related to I/O (except networking) have been adjusted to allow allocation
  out of the system reserve memory pool. These modules now detect a low
  memory situation, but rather than block they continue to operate, then
  return resources to the memory pool instead of caching them or leaving
  them wired.

  Code has been added to stall in a low-memory situation prior to a vnode
  being locked. Thus situations where a process blocks in a low-memory
  condition while holding a locked vnode have been reduced to near nothing.
  Not only will I/O continue to operate, but many prior deadlock conditions
  simply no longer exist.

  Implement a number of VFS/BIO fixes (found by Ian):

  In biodone(), bogus-page replacement code, the loop was not properly
  incrementing loop variables prior to a continue statement. We do not
  believe this code can be hit anyway but we aren't taking any chances.
  We'll turn the whole section into a panic (as it already is in brelse())
  after the release is rolled.

  In biodone(), the foff calculation was incorrectly clamped to the iosize,
  causing the wrong foff to be calculated for pages in the case of an I/O
  error or biodone() called without initiating I/O. The problem always
  caused a panic before. Now it doesn't. The problem is mainly an issue
  with NFS.

  Fixed casts for ~PAGE_MASK. This code worked properly before only because
  the calculations use signed arithmetic. Better to properly extend
  PAGE_MASK first before inverting it for the 64 bit masking op.

  In brelse(), the bogus_page fixup code was improperly throwing away the
  original contents of 'm' when it did the j-loop to fix the bogus pages.
  The result was that it would potentially invalidate parts of the *WRONG*
  page(!), leading to corruption.

  There may still be cases where a background bitmap write is being
  duplicated, causing potential corruption. We have identified a
  potentially serious bug related to this but the fix is still TBD. So
  instead this patch contains a KASSERT to detect the problem and panic the
  machine rather than continue to corrupt the filesystem. The problem does
  not occur very often; it is very hard to reproduce, and it may or may not
  be the cause of the corruption people have reported.

  Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
  Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
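The ~PAGE_MASK cast fix described in this commit can be illustrated with a standalone sketch. The names DEMO_PAGE_MASK, trunc_page_narrow and trunc_page_wide are hypothetical and are not the kernel's definitions; the sketch assumes a 4 KB page and an unsigned 32-bit mask, which is the case where inverting before widening goes wrong. With a signed mask the inversion happens to sign-extend correctly, which matches the commit's remark that the old code only worked because of signed arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 4 KB page mask; not the kernel's definition. */
#define DEMO_PAGE_MASK ((uint32_t)4095)        /* 0x00000fff */

/*
 * Inverts in 32 bits and only then widens to 64 bits. The mask becomes
 * 0x00000000fffff000, so the upper 32 bits of the offset are cleared.
 */
static uint64_t trunc_page_narrow(uint64_t off)
{
    return off & ~DEMO_PAGE_MASK;
}

/*
 * Widens the mask to 64 bits first, then inverts. The mask becomes
 * 0xfffffffffffff000, so only the in-page bits are cleared.
 */
static uint64_t trunc_page_wide(uint64_t off)
{
    return off & ~(uint64_t)DEMO_PAGE_MASK;
}
```

For an offset above 4 GB, such as 0x100001234, the narrow variant loses the high bits while the wide variant keeps them.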
* Add the splvm() calls suggested in PR 20609 to protect vm_pager_page_unswapped(). | dillon | 2000-11-18 | 1 | -0/+3
  The remainder of the PR is still open.

  PR: kern/20609 (partial fix)
* This patchset fixes a large number of file descriptor race conditions. | dillon | 2000-11-18 | 2 | -17/+41
  Pre-rfork code assumed inherent locking of a process's file descriptor
  array. However, with the advent of rfork() the file descriptor table
  could be shared between processes. This patch closes over a dozen
  serious race conditions related to one thread manipulating the table
  (e.g. closing or dup()ing a descriptor) while another is blocked in an
  open(), close(), fcntl(), read(), write(), etc.

  PR: kern/11629
  Discussed with: Alexander Viro <viro@math.psu.edu>
* Clear the MAP_ENTRY_USER_WIRED flag from cloned vm_map entries. | tegge | 2000-11-02 | 1 | -0/+2

  PR: 2840
* Weaken a bogus dependency on <sys/proc.h> in <sys/buf.h> by #ifdef'ing | phk | 2000-10-29 | 1 | -1/+0
  the offending inline function (BUF_KERNPROC) on it being #included
  already.

  I'm not sure BUF_KERNPROC() is even the right thing to do or in the
  right place or implemented the right way (inline vs normal function).

  Remove consequently unneeded #includes of <sys/proc.h>
* - Catch a machine/mutex.h -> sys/mutex.h I somehow missed. | jhb | 2000-10-25 | 1 | -3/+3
  - Close a small race condition. The sched_lock mutex protects p->p_stat
    as well as the run queues. Another CPU could change p_stat of the
    process while we are waiting for the lock, and we would end up
    scheduling a process that isn't runnable.
* Implement write combining for crashdumps. This is useful when | ps | 2000-10-17 | 1 | -1/+1
  write caching is disabled on both SCSI and IDE disks where large
  memory dumps could take up to an hour to complete.

  Taking an i386 SCSI based system with 512MB of RAM and timing (in
  seconds) how long it took to complete a dump, the following results
  were obtained:

  Before:                 After:
  WCE   TIME              WCE   TIME
  ------------------      ------------------
  1     141.820972        1     15.600111
  0     797.265072        0     65.480465

  Obtained from: Yahoo!
  Reviewed by: peter
* The swap bitmap allocator was not calculating the bitmap size properly | dillon | 2000-10-13 | 3 | -10/+21
  in the face of non-stripe-aligned swap areas. The bug could cause a
  panic during boot.

  Refuse to configure a swap area that is too large (67 GB or so).

  Properly document the power-of-2 requirement for SWB_NPAGES.

  The patch is slightly different than the one Tor enclosed in the P.R.,
  but accomplishes the same thing.

  PR: kern/20273
  Submitted by: Tor.Egge@fast.no
* For lockmgr mutex protection, use an array of mutexes that are allocated | jasone | 2000-10-12 | 1 | -4/+4
  and initialized during boot. This avoids bloating sizeof(struct lock).
  As a side effect, it is no longer necessary to enforce the assumption
  that lockinit()/lockdestroy() calls are paired, so the LK_VALID flag has
  been removed.

  Idea taken from: BSD/OS.
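The idea of sharing a fixed array of mutexes instead of embedding one in every lock can be sketched in userland with pthreads. Everything here is an assumption for illustration: the names (demo_pool, demo_pool_find, struct demo_lock), the pool size, and the choice to pick a mutex by hashing the lock's address are not taken from the commit, which only states that the array is allocated and initialized during boot.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_POOL_SIZE 32   /* hypothetical size; must be a power of two */

static pthread_mutex_t demo_pool[DEMO_POOL_SIZE];

/* Called once at startup, analogous to initializing the array during boot. */
static void demo_pool_init(void)
{
    for (size_t i = 0; i < DEMO_POOL_SIZE; i++)
        pthread_mutex_init(&demo_pool[i], NULL);
}

/* Map an object's address to one of the shared mutexes (assumed scheme). */
static pthread_mutex_t *demo_pool_find(const void *obj)
{
    uintptr_t h = (uintptr_t)obj >> 4;   /* skip low bits fixed by alignment */
    return &demo_pool[h & (DEMO_POOL_SIZE - 1)];
}

/* The lock itself carries no mutex, so its size does not grow. */
struct demo_lock {
    int shared_count;
};

static void demo_lock_acquire_shared(struct demo_lock *lk)
{
    pthread_mutex_t *m = demo_pool_find(lk);

    assert(lk != NULL);
    pthread_mutex_lock(m);
    lk->shared_count++;
    pthread_mutex_unlock(m);
}
```

Because the pool outlives every lock that uses it, a lock of this shape needs no per-lock mutex teardown, which is consistent with the commit's note that paired init/destroy calls no longer have to be enforced.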
* If a process is over its resource limit for datasize, still allow | dwmalone | 2000-10-06 | 1 | -3/+7
  it to lower its memory usage. This was mentioned on the mailing
  lists ages ago, and I've lost the name of the person who brought
  it up.

  Reviewed by: alc
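The rule in this commit can be written as a small predicate. This is a minimal sketch with hypothetical names (demo_resize_allowed and its parameters), not the kernel code: a request that does not grow the data segment is accepted even when the current size already exceeds the limit, and only growth is checked against the limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: decide whether a data-segment resize is permitted. */
static bool demo_resize_allowed(size_t cur_size, size_t new_size, size_t limit)
{
    if (new_size <= cur_size)
        return true;             /* shrinking or no change is always allowed */
    return new_size <= limit;    /* growth must stay within the limit */
}
```

A process at 200 units with a limit of 100 may therefore shrink to 150, but may not grow to 250.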
* Convert lockmgr locks from using simple locks to using mutexes. | jasone | 2000-10-04 | 2 | -4/+14

  Add lockdestroy() and appropriate invocations, which corresponds to
  lockinit() and must be called to clean up after a lockmgr lock is no
  longer needed.
* - Add a new process flag P_NOLOAD that marks a process that should be | jhb | 2000-09-15 | 1 | -0/+3
    ignored during load average calculations.
  - Set this flag for the idle processes and the softinterrupt process.
* Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. | bp | 2000-09-12 | 1 | -0/+9
  They will be used by nullfs and other stacked filesystems to support
  full cache coherency.

  Reviewed in general by: mckusick, dillon
* Major update to the way synchronization is done in the kernel. Highlights | jasone | 2000-09-07 | 3 | -1/+23
  include:

  * Mutual exclusion is used instead of spl*(). See mutex(9). (Note: The
    alpha port is still in transition and currently uses both.)

  * Per-CPU idle processes.

  * Interrupts are run in their own separate kernel threads and can be
    preempted (i386 only).

  Partially contributed by: BSDi (BSD/OS)
  Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh
* Make the arguments match the functionality of the functions. | obrien | 2000-08-26 | 2 | -7/+7
* Minor cleanups: | peter | 2000-07-28 | 2 | -55/+19
  - remove unused variables (fix warnings)
  - use a more consistent ANSI style rather than a mixture
  - remove dead #if 0 code and declarations
* Clean up the snapshot code so that it no longer depends on the use of | mckusick | 2000-07-26 | 1 | -1/+1
  the SF_IMMUTABLE flag to prevent writing. Instead put in explicit
  checking for the SF_SNAPSHOT flag in the appropriate places. With
  this change, it is now possible to rename and link to snapshot files.
  It is also possible to set or clear any of the owner, group, or
  other read bits on the file, though none of the write or execute
  bits can be set. There is also an explicit test to prevent the
  setting or clearing of the SF_SNAPSHOT flag via chflags() or
  fchflags(). Note also that the modify time cannot be changed as
  it needs to accurately reflect the time that the snapshot was taken.

  Submitted by: Robert Watson <rwatson@FreeBSD.org>
* Add snapshots to the fast filesystem. Most of the changes support | mckusick | 2000-07-11 | 2 | -1/+16
  the gating of system calls that cause modifications to the underlying
  filesystem. The gating can be enabled by any filesystem that needs
  to consistently suspend operations by adding the vop_stdgetwritemount
  to their set of vnops. Once gating is enabled, the function
  vfs_write_suspend stops all new write operations to a filesystem,
  allows any filesystem modifying system calls already in progress
  to complete, then syncs the filesystem to disk and returns. The
  function vfs_write_resume allows the suspended write operations to
  begin again. Gating is not added by default for all filesystems as
  for SMP systems it adds two extra locks to such critical kernel
  paths as the write system call. Thus, gating should only be added
  as needed.

  Details on the use and current status of snapshots in FFS can be
  found in /sys/ufs/ffs/README.snapshot so for brevity and timeliness
  they are not included here. Unless and until you create a snapshot
  file, these changes should have no effect on your system (famous last
  words).
* #elsif -> #elif | alfred | 2000-07-11 | 1 | -8/+8

  Noticed by: green
* Support for unsigned integer and long sysctl variables. Update the | jhb | 2000-07-05 | 1 | -48/+48
  SYSCTL_LONG macro to be consistent with other integer sysctl variables
  and require an initial value instead of assuming 0. Update several
  sysctl variables to use the unsigned types.

  PR: 15251
  Submitted by: Kelly Yancey <kbyanc@posi.net>
* Previous commit changing SYSCTL_HANDLER_ARGS violated KNF. | phk | 2000-07-04 | 2 | -3/+3

  Pointed out by: bde
* Replace the PQ_*CACHE options with a single PQ_CACHESIZE option that you | jhb | 2000-07-04 | 1 | -24/+31
  set equal to the number of kilobytes in your cache. The old options are
  still supported for backwards compatibility.

  Submitted by: Kelly Yancey <kbyanc@posi.net>
* Simplify and rationalise the management of the vnode free list | mckusick | 2000-07-04 | 1 | -6/+2
  (preparing the code to add snapshots).
* Style police catches up with rev 1.26 of src/sys/sys/sysctl.h: | phk | 2000-07-03 | 2 | -3/+3

  Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grog our
  sources:

      -sysctl_vm_zone SYSCTL_HANDLER_ARGS
      +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
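The effect of moving the parentheses out of the macro can be sketched with a stand-in. DEMO_HANDLER_ARGS and demo_handler are hypothetical, and the parameter list shown is invented for illustration; the real SYSCTL_HANDLER_ARGS in <sys/sysctl.h> expands to a different argument list. The point is only that the macro now supplies the parameters while the definition keeps ordinary parentheses, so tools that look for `name(...)` still recognize a function.

```c
#include <assert.h>

/* Hypothetical stand-in: expands to the parameter list only, no parentheses. */
#define DEMO_HANDLER_ARGS int arg1, int arg2

/*
 * New style: the macro sits inside ordinary parentheses, so the line still
 * looks like a function definition to simple source-scanning tools.
 */
static int
demo_handler(DEMO_HANDLER_ARGS)
{
    return arg1 + arg2;
}
```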
* Nifty idea from Jeroen van Gelderen; don't call a routine to check if | markm | 2000-06-25 | 1 | -1/+2
  we are using the /dev/zero device, just check a flag (supplied by
  /dev/zero).

  Reviewed by: dfr
* Add missing increment of allocation counter. | hsu | 2000-06-05 | 1 | -0/+2
* This is a cleanup patch to Peter's new OBJT_PHYS VM object type | dillon | 2000-05-29 | 7 | -34/+92
  and sysv shared memory support for it. It implements a new
  PG_UNMANAGED flag that has slightly different characteristics
  from PG_FICTICIOUS.

  A new sysctl, kern.ipc.shm_use_phys, has been added to enable the
  use of physically-backed sysv shared memory rather than swap-backed.
  Physically backed shm segments are not tracked with PV entries,
  allowing programs which use a large shm segment as a rendezvous
  point to operate without eating an insane amount of KVM in the
  PV entry management. Read: Oracle.

  Peter's OBJT_PHYS object will also allow us to eventually implement
  page-table sharing and/or 4MB physical page support for such segments.
  We're halfway there.
* Brucify the pmap_enter_temporary() changes. | dfr | 2000-05-29 | 1 | -1/+1
* Fix bug in vm_pageout_page_stats() that always resulted in a full | dillon | 2000-05-29 | 1 | -0/+2
  scan of the active queue. This fix is not expected to have any
  noticeable impact on performance.

  Noticed by: Rik van Riel <riel@conectiva.com.br>
* Add a new pmap entry point, pmap_enter_temporary() to be used during | dfr | 2000-05-28 | 1 | -0/+1
  dumps to create temporary page mappings. This replaces the use of
  CADDR1 which is fairly x86 specific.

  Reviewed by: dillon
* Back out the previous change to the queue(3) interface. | jake | 2000-05-26 | 3 | -12/+12
  It was not discussed and should probably not happen.

  Requested by: msmith and others
* Change the way that the queue(3) structures are declared; don't assume that | jake | 2000-05-23 | 3 | -12/+12
  the type argument to *_HEAD and *_ENTRY is a struct.

  Suggested by: phk
  Reviewed by: phk
  Approved by: mdodd
* Checkpoint of a new physical memory backed object type, that does not | peter | 2000-05-21 | 5 | -3/+237
  have pv_entries. This is intended for very special circumstances,
  e.g. a certain database that has a 1GB shm segment mapped into 300
  processes. That would consume 2GB of kvm just to hold the pv_entries
  alone. This would not be used on systems unless the physical RAM was
  available, as it's not pageable.

  This is a work-in-progress, but is a useful and functional checkpoint.
  Matt has got some more fixes for it that will be committed soon.

  Reviewed by: dillon
* Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly | peter | 2000-05-21 | 10 | -47/+62
  to various pmap_*() functions instead of looking up the physical address
  and passing that. In many cases, the first thing the pmap code was doing
  was going to a lot of trouble to get back the original vm_page_t, or
  its shadow pv_table entry.

  Inspired by: John Dyson's 1998 patches.

  Also:
  Eliminate pv_table as a separate thing and build it into a machine
  dependent part of vm_page_t. This eliminates having a separate set of
  structures that shadow each other in a 1:1 fashion that we often went to
  a lot of trouble to translate from one to the other. (see above)
  This happens to save 4 bytes of physical memory for each page in the
  system. (8 bytes on the Alpha).

  Eliminate the use of the phys_avail[] array to determine if a page is
  managed (i.e. it has pv_entries etc). Store this information in a flag.
  Things like device_pager set it because they create vm_page_t's on the
  fly that do not have pv_entries. This makes it easier to "unmanage" a
  page of physical memory (this will be taken advantage of in subsequent
  commits).

  Add a function to add a new page to the freelist. This could be used
  for reclaiming the previously wasted pages left over from preloaded
  loader(8) files.

  Reviewed by: dillon
* Fixed bug in madvise() / MADV_WILLNEED. When the request is offset | dillon | 2000-05-14 | 1 | -1/+5
  from the base of the first map_entry the call to pmap_object_init_pt()
  uses the wrong start VA. MFC to follow.

  PR: i386/18095
* Separate the struct bio related stuff out of <sys/buf.h> into | phk | 2000-05-05 | 4 | -0/+4
  <sys/bio.h>.

  <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
  not be made a nested include according to bde's teachings on the
  subject of nested includes.

  Disk drivers and similar stuff below specfs::strategy() should no
  longer need to include <sys/buf.h> unless they need caching of data.

  Still a few bogus uses of struct buf to track down.

  Repocopy by: peter
* Convert the vm_pager_strategy() interface to take a struct bio instead of | phk | 2000-05-03 | 3 | -84/+62
  a struct buf. Don't try to examine B_ASYNC, it is a layering violation
  to do so. The only current user of this interface is vn(4) which, since
  it emulates a disk interface, operates on struct bio already.
* Move and staticize the bufchain functions so they become local to the | phk | 2000-05-01 | 3 | -137/+137
  only piece of code using them. This will ease a rewrite of them.
* Remove unneeded #include <vm/vm_zone.h> | phk | 2000-04-30 | 1 | -1/+0

  Generated by: src/tools/tools/kerninclude
* Implement POSIX.1b shared memory objects. In this implementation, | wollman | 2000-04-22 | 1 | -0/+10
  shared memory objects are regular files; the shm_open(3) routine
  uses fcntl(2) to set a flag on the descriptor which tells mmap(2)
  to automatically apply MAP_NOSYNC.

  Not objected to by: bde, dillon, dufault, jasone
* vm_object_shadow: Remove an incorrect assertion. In obscure circumstances | alc | 2000-04-19 | 1 | -3/+0
  vm_object_shadow can be called on an object with ref_count > 1 and
  OBJ_ONEMAPPING set. This isn't really a problem for vm_object_shadow.
* Remove unneeded <sys/buf.h> includes. | phk | 2000-04-18 | 1 | -1/+0

  Due to some interesting cpp tricks in lockmgr, the LINT kernel shrinks
  by 924 bytes.
* Complete the bio/buf divorce for all code below devfs::strategy | phk | 2000-04-15 | 3 | -10/+10

  Exceptions:
      Vinum untouched. This means that it cannot be compiled.
      Greg Lehey is on the case.

      CCD not converted yet, casts to struct buf (still safe)

      atapi-cd casts to struct buf to examine B_PHYS
* Fix _zget() so that it checks the return from kmem_alloc(), to avoid | msmith | 2000-04-04 | 1 | -3/+9
  attempting to bzero NULL when the kernel map fills up. _zget() will now
  return NULL as it seems it was originally intended to do.
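The pattern behind this fix, checking an allocator's return value before zeroing the memory, can be shown in a userland sketch. The names demo_zget, demo_alloc_fn and demo_failing_alloc are hypothetical, and malloc()/memset() stand in for the kernel's kmem_alloc() and bzero(); this is not the actual _zget() code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator hook so the failure path can be exercised. */
typedef void *(*demo_alloc_fn)(size_t);

/* Simulates an allocator that has run out of space. */
static void *demo_failing_alloc(size_t n)
{
    (void)n;
    return NULL;
}

/* Allocate n zeroed bytes, or return NULL if the allocator fails. */
static void *demo_zget(demo_alloc_fn alloc, size_t n)
{
    void *p = alloc(n);

    if (p == NULL)
        return NULL;    /* report failure instead of zeroing a NULL pointer */
    memset(p, 0, n);
    return p;
}
```

Callers of a function shaped like this must themselves be prepared for a NULL return.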
* Move B_ERROR flag to b_ioflags and call it BIO_ERROR. | phk | 2000-04-02 | 4 | -22/+22

  (Much of this done by script)

  Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

  Move b_pblkno and b_iodone_chain to struct bio while we transition, they
  will be obsoleted once bio structs chain/stack.

  Add bio_queue field for struct bio aware disksort.

  Address a lot of stylistic issues brought up by bde.
* Add necessary spl protection for swapper. The problem was located by | dillon | 2000-03-27 | 3 | -3/+19
  Alfred while testing his SPLASSERT stuff. This is not a complete fix,
  more protections are probably needed.
* Revert spelling mistake I made in the previous commit | charnier | 2000-03-27 | 17 | -18/+18

  Requested by: Alan and Bruce