path: root/sys/vm/swap_pager.c
Commit message | Author | Age | Files | Lines
* Don't use an uninitialized field reserved for callers in the bio structure | tegge | 2001-10-15 | 1 | -3/+4
  passed to swap_pager_strategy(). Instead, use a field reserved for drivers
  and initialize it before usage.
  Reviewed by: dillon
* Change the kernel's ucred API as follows: | jhb | 2001-10-11 | 1 | -11/+6
  - crhold() returns a reference to the ucred whose refcount it bumps.
  - crcopy() now simply copies the credentials from one credential to
    another and has no return value.
  - a new crshared() primitive is added which returns true if a ucred's
    refcount is > 1 and false (0) otherwise.
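The refcount semantics listed in this entry can be sketched in userland C. This is only an illustration under stated assumptions: struct cred and the cred_* functions are hypothetical stand-ins, not the kernel's struct ucred or its actual crhold()/crcopy()/crshared() code, and the sketch does no locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userland stand-in for a reference-counted credential. */
struct cred {
    unsigned int ref;   /* number of holders */
    unsigned int uid;
    unsigned int gid;
};

/* Like crhold() as described: bump the refcount, return the same object. */
static struct cred *cred_hold(struct cred *cr)
{
    assert(cr != NULL);
    cr->ref++;
    return cr;
}

/* Like crcopy() as described: copy credential data only, return nothing.
 * The destination keeps its own refcount. */
static void cred_copy(struct cred *dest, const struct cred *src)
{
    dest->uid = src->uid;
    dest->gid = src->gid;
}

/* Like crshared() as described: true when more than one holder exists. */
static bool cred_shared(const struct cred *cr)
{
    return cr->ref > 1;
}
```

Returning the pointer from the hold function lets a caller write an assignment and a reference bump in one expression, e.g. `newp = cred_hold(oldp);`.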
* Limit the amount of KVM reserved for the buffer cache and for swap-meta | dillon | 2001-08-20 | 1 | -2/+5
  information. The default limits only affect machines with > 1GB of RAM
  and can be overridden with two new kernel conf variables,
  VM_SWZONE_SIZE_MAX and VM_BCACHE_SIZE_MAX, or with the loader variables
  kern.maxswzone and kern.maxbcache. This has the effect of leaving more
  KVM available for sizing NMBCLUSTERS and 'maxusers' and should avoid
  tripups where a sysadmin adds memory to a machine and then sees the
  kernel panic on boot due to running out of KVM.

  Also change the default swap-meta auto-sizing calculation to allocate
  half of what it was previously allocating. The prior defaults were way
  too high. Note that we cannot afford to run out of swap-meta structures
  so we still stay somewhat conservative here.
* Fixups for the initial allocation by dillon: | alfred | 2001-08-02 | 1 | -7/+15
  1) allocate fewer buckets
  2) when failing to allocate swap zone, keep reducing the zone by a third
     rather than a half in order to reduce the chance of allocating way
     too little.

  I also moved around some code for readability.
  Suggested by: dillon
  Reviewed by: dillon
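The difference between shrinking by a half and by a third can be illustrated with a small sketch. The function name, its parameters, and the numbers are hypothetical and are not taken from swap_pager.c; the sketch only models "retry with a smaller size until the request fits under a limit" and ignores overflow for very large counts.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: starting from n entries, keep only keep_num/keep_den
 * of them after each failed attempt until the count fits under limit.
 * keep 1/2 models halving; keep 2/3 models reducing by a third.
 */
static size_t shrink_until_fits(size_t n, size_t limit,
                                size_t keep_num, size_t keep_den)
{
    assert(keep_den != 0 && keep_num < keep_den);
    while (n > limit && n > 0)
        n = n * keep_num / keep_den;
    return n;
}
```

With a starting count of 1000 and a limit of 300, halving ends at 250 while reducing by a third ends at 296. The smaller step does not win for every input, but it overshoots the limit by less on average, which matches the stated goal of not allocating far too little.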
* whitespace / register cleanup | dillon | 2001-07-04 | 1 | -1/+1
* With Alfred's permission, remove vm_mtx in favor of a fine-grained approach | dillon | 2001-07-04 | 1 | -69/+31
  (this commit is just the first stage). Also add various GIANT_ macros to
  formalize the removal of Giant, making it easy to test in a more piecemeal
  fashion. These macros will allow us to test fine-grained locks to a degree
  before removing Giant, and also after, and to remove Giant in a piecemeal
  fashion via sysctls on those subsystems which the authors believe can
  operate without Giant.
* - Protect all accesses to nsw_[rw]count{,_{,a}sync} with the pbuf mutex. | jhb | 2001-06-22 | 1 | -4/+4
  - Don't drop the vm mutex while grabbing the pbuf mutex to manipulate
    said variables.
* - Fix the sw_alloc_interlock to actually lock itself when the lock is | jhb | 2001-05-23 | 1 | -16/+20
    acquired.
  - Assert Giant is held in the strategy, getpages, and putpages methods and
    the getchainbuf, flushchainbuf, and waitchainbuf functions.
  - Always call flushchainbuf() w/o the VM lock.
* Acquire Giant when playing with the buffer cache and doing I/O. | alfred | 2001-05-23 | 1 | -2/+5
  Use msleep against the vm mutex while waiting for a page I/O to complete.
* Acquire vm mutex in swp_pager_async_iodone. Don't call swp_pager_async_iodone | alfred | 2001-05-22 | 1 | -2/+3
  with the mutex held.
* Introduce a global lock for the vm subsystem (vm_mtx). | alfred | 2001-05-19 | 1 | -11/+66
  vm_mtx does not recurse and is required for most low level vm operations.
  Faults cannot be taken without holding Giant.
  Memory subsystems can now call the base page allocators safely.
  Almost all atomic ops were removed as they are covered under the vm mutex.
  Alpha and ia64 now need to catch up to i386's trap handlers.
  FFS and NFS have been tested; other filesystems will need minor changes
  (grabbing the vm lock when twiddling page properties).
  Reviewed (partially) by: jake, jhb
* Actually biofinish(struct bio *, struct devstat *, int error) is more general | phk | 2001-05-06 | 1 | -3/+1
  than the bioerror().
  Most of this patch is generated by scripts.
* Protect pager object creation with sx locks. | alfred | 2001-04-18 | 1 | -10/+13
  Protect pager object list manipulation with a mutex.
  It doesn't look possible to combine them under a single sx lock because
  creation may block and we can't have the object list manipulation block
  on anything other than a mutex because of interrupt requests.
* protect pbufs and associated counts with a mutex | alfred | 2001-04-13 | 1 | -0/+2
* Introduce per-swap area accounting in the VM system, and export | rwatson | 2001-02-23 | 1 | -1/+10
  this information via the vm.nswapdev sysctl (number of swap areas) and
  vm.swapdevX nodes (where X is the device), which contain the MIBs dev,
  blocks, used, and flags. These changes are required to allow top and
  other userland swap-monitoring utilities to run without setgid kmem.
  Submitted by: Thomas Moestl <tmoestl@gmx.net>
  Reviewed by: freebsd-audit
* - If swap metadata does not fit into the KVM, reduce the number of | tanimura | 2000-12-13 | 1 | -12/+23
    struct swblock entries by dividing the number of the entries by 2
    until the swap metadata fits.
  - Reject swapon(2) upon failure of swap_zone allocation.

  This is just a temporary fix. Better solutions include:
  (suggested by: dillon)
  o reserving swap in SWAP_META_PAGES chunks, and
  o swapping the swblock structures themselves.

  Reviewed by: alfred, dillon
* Convert more malloc+bzero to malloc+M_ZERO. | dwmalone | 2000-12-08 | 1 | -2/+1
  Submitted by: josh@zipperup.org
  Submitted by: Robert Drehmel <robd@gmx.net>
* o Export dmmax ("Maximum size of a swap block") using SYSCTL_INT. | rwatson | 2000-11-20 | 1 | -0/+3
  This removes a reason that systat requires setgid kmem. More to come.
* Implement a low-memory deadlock solution. | dillon | 2000-11-18 | 1 | -1/+3
  Removed most of the hacks that were trying to deal with low-memory
  situations prior to now.

  The new code is based on the concept that I/O must be able to function in
  a low memory situation. All major modules related to I/O (except
  networking) have been adjusted to allow allocation out of the system
  reserve memory pool. These modules now detect a low memory situation but
  rather than block they instead continue to operate, then return resources
  to the memory pool instead of caching them or leaving them wired.

  Code has been added to stall in a low-memory situation prior to a vnode
  being locked.

  Thus situations where a process blocks in a low-memory condition while
  holding a locked vnode have been reduced to near nothing. Not only will
  I/O continue to operate, but many prior deadlock conditions simply no
  longer exist.

  Implement a number of VFS/BIO fixes (found by Ian):

  In biodone(), bogus-page replacement code, the loop was not properly
  incrementing loop variables prior to a continue statement. We do not
  believe this code can be hit anyway but we aren't taking any chances.
  We'll turn the whole section into a panic (as it already is in brelse())
  after the release is rolled.

  In biodone(), the foff calculation was incorrectly clamped to the iosize,
  causing the wrong foff to be calculated for pages in the case of an I/O
  error or biodone() called without initiating I/O. The problem always
  caused a panic before. Now it doesn't. The problem is mainly an issue
  with NFS.

  Fixed casts for ~PAGE_MASK. This code worked properly before only because
  the calculations use signed arithmetic. Better to properly extend
  PAGE_MASK first before inverting it for the 64 bit masking op.

  In brelse(), the bogus_page fixup code was improperly throwing away the
  original contents of 'm' when it did the j-loop to fix the bogus pages.
  The result was that it would potentially invalidate parts of the *WRONG*
  page(!), leading to corruption.

  There may still be cases where a background bitmap write is being
  duplicated, causing potential corruption. We have identified a
  potentially serious bug related to this but the fix is still TBD. So
  instead this patch contains a KASSERT to detect the problem and panic the
  machine rather than continue to corrupt the filesystem. The problem does
  not occur very often; it is very hard to reproduce, and it may or may not
  be the cause of the corruption people have reported.

  Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>)
  Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
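The remark about extending PAGE_MASK before inverting it can be illustrated with a short sketch. It assumes a hypothetical 32-bit unsigned mask (PAGE_MASK_U32) to show the failure mode; the kernel's actual PAGE_MASK type and the patched expressions are not reproduced here, and the function names are made up for the example.

```c
#include <stdint.h>

/* Hypothetical 32-bit unsigned page mask for a 4096-byte page. */
#define PAGE_MASK_U32 ((uint32_t)4095)

/*
 * Inverting in 32 bits yields 0xFFFFF000, which is then zero-extended to
 * 64 bits, so the AND also clears the upper 32 bits of the offset.
 */
static uint64_t page_trunc_narrow(uint64_t off)
{
    return off & ~PAGE_MASK_U32;
}

/*
 * Widening first and inverting afterwards yields 0xFFFFFFFFFFFFF000, so
 * only the low 12 bits are cleared.
 */
static uint64_t page_trunc_wide(uint64_t off)
{
    return off & ~(uint64_t)PAGE_MASK_U32;
}
```

With a signed int mask the narrow form happens to work, because the inverted value is negative and sign-extends to all ones in the upper half. That is consistent with the entry's note that the old code only worked because of signed arithmetic.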
* This patchset fixes a large number of file descriptor race conditions. | dillon | 2000-11-18 | 1 | -2/+4
  Pre-rfork code assumed inherent locking of a process's file descriptor
  array. However, with the advent of rfork() the file descriptor table
  could be shared between processes. This patch closes over a dozen serious
  race conditions related to one thread manipulating the table (e.g.
  closing or dup()ing a descriptor) while another is blocked in an open(),
  close(), fcntl(), read(), write(), etc...
  PR: kern/11629
  Discussed with: Alexander Viro <viro@math.psu.edu>
* The swap bitmap allocator was not calculating the bitmap size properly | dillon | 2000-10-13 | 1 | -1/+1
  in the face of non-stripe-aligned swap areas. The bug could cause a
  panic during boot.

  Refuse to configure a swap area that is too large (67 GB or so).

  Properly document the power-of-2 requirement for SWB_NPAGES.

  The patch is slightly different than the one Tor enclosed in the P.R.,
  but accomplishes the same thing.
  PR: kern/20273
  Submitted by: Tor.Egge@fast.no
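One common reason for a power-of-2 requirement on a block size is that offsets are computed with a bit mask instead of a division. The sketch below illustrates that general point only; the helper names are hypothetical and it does not reproduce how swap_pager.c uses SWB_NPAGES.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True when exactly one bit of n is set. */
static bool is_power_of_2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Offset within a block via masking; only correct for power-of-2 sizes. */
static size_t offset_by_mask(size_t index, size_t npages)
{
    assert(is_power_of_2(npages));
    return index & (npages - 1);
}

/* Offset within a block via modulo; correct for any non-zero size. */
static size_t offset_by_mod(size_t index, size_t npages)
{
    assert(npages != 0);
    return index % npages;
}
```

For a block size of 8 both helpers agree (index 13 gives offset 5). For a size of 12 the raw mask expression `13 & (12 - 1)` gives 9 while the modulo gives 1, which is why the mask form has to reject such sizes.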
* Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly | peter | 2000-05-21 | 1 | -2/+2
  to various pmap_*() functions instead of looking up the physical address
  and passing that. In many cases, the first thing the pmap code was doing
  was going to a lot of trouble to get back the original vm_page_t, or its
  shadow pv_table entry.

  Inspired by: John Dyson's 1998 patches.

  Also:
  Eliminate pv_table as a separate thing and build it into a machine
  dependent part of vm_page_t. This eliminates having a separate set of
  structures that shadow each other in a 1:1 fashion that we often went to
  a lot of trouble to translate from one to the other. (see above)
  This happens to save 4 bytes of physical memory for each page in the
  system. (8 bytes on the Alpha).

  Eliminate the use of the phys_avail[] array to determine if a page is
  managed (i.e. it has pv_entries etc). Store this information in a flag.
  Things like device_pager set it because they create vm_page_t's on the
  fly that do not have pv_entries. This makes it easier to "unmanage" a
  page of physical memory (this will be taken advantage of in subsequent
  commits).

  Add a function to add a new page to the freelist. This could be used
  for reclaiming the previously wasted pages left over from preloaded
  loader(8) files.

  Reviewed by: dillon
* Separate the struct bio related stuff out of <sys/buf.h> into | phk | 2000-05-05 | 1 | -0/+1
  <sys/bio.h>.

  <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be
  made a nested include according to bde's teachings on the subject of
  nested includes.

  Disk drivers and similar stuff below specfs::strategy() should no longer
  need to include <sys/buf.h> unless they need caching of data.

  Still a few bogus uses of struct buf to track down.

  Repocopy by: peter
* Convert the vm_pager_strategy() interface to take a struct bio instead of | phk | 2000-05-03 | 1 | -77/+55
  a struct buf. Don't try to examine B_ASYNC, it is a layering violation
  to do so. The only current user of this interface is vn(4) which, since
  it emulates a disk interface, operates on struct bio already.
* Move and staticize the bufchain functions so they become local to the | phk | 2000-05-01 | 1 | -0/+137
  only piece of code using them. This will ease a rewrite of them.
* Complete the bio/buf divorce for all code below devfs::strategy | phk | 2000-04-15 | 1 | -2/+2
  Exceptions:
    Vinum untouched. This means that it cannot be compiled.
    Greg Lehey is on the case.
    CCD not converted yet, casts to struct buf (still safe)
    atapi-cd casts to struct buf to examine B_PHYS
* Move B_ERROR flag to b_ioflags and call it BIO_ERROR. | phk | 2000-04-02 | 1 | -5/+6
  (Much of this done by script)

  Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.

  Move b_pblkno and b_iodone_chain to struct bio while we transition, they
  will be obsoleted once bio structs chain/stack.

  Add bio_queue field for struct bio aware disksort.

  Address a lot of stylistic issues brought up by bde.
* Add necessary spl protection for swapper. The problem was located by | dillon | 2000-03-27 | 1 | -3/+4
  Alfred while testing his SPLASSERT stuff. This is not a complete fix,
  more protections are probably needed.
* Revert spelling mistake I made in the previous commit | charnier | 2000-03-27 | 1 | -1/+1
  Requested by: Alan and Bruce
* Spelling | charnier | 2000-03-26 | 1 | -4/+4
* Fix one place which knew that B_WRITE was zero. | phk | 2000-03-22 | 1 | -1/+2
  Fix a stylistic mistake of mine while here.
  Found by: Stephen Hocking <shocking@prth.pgs.com>
* Rename the existing BUF_STRATEGY() to DEV_STRATEGY() | phk | 2000-03-20 | 1 | -3/+3
  substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)
  substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)

  This patch is machine generated except for the ccd.c and buf.h parts.
* Remove B_READ, B_WRITE and B_FREEBUF and replace them with a new | phk | 2000-03-20 | 1 | -13/+12
  field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly
  one bit set.

  B_WRITE was bogusly defined as zero giving rise to obvious coding
  mistakes.

  Also eliminate the redundant struct buf flag B_CALL, it can just as
  efficiently be done by comparing b_iodone to NULL.

  Should you get a panic or drop into the debugger, complaining about
  "b_iocmd", don't continue. It is likely to write on your disk where it
  should have been reading.

  This change is a step in the direction towards a stackable BIO
  capability.

  A lot of this patch was machine generated (Thanks to style(9)
  compliance!)

  Vinum users: Greg has not had time to test this yet, be careful.
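The "exactly one bit set" rule and the problem with a flag defined as zero can be sketched as follows. The CMD_* and OLD_WRITE values and the helper names are hypothetical placeholders chosen for the example; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command encodings: one distinct bit per command. */
#define CMD_READ  0x1u
#define CMD_WRITE 0x2u
#define CMD_FREE  0x4u

/* Old-style "flag" defined as zero: a bit test can never detect it. */
#define OLD_WRITE 0x0u

static bool old_is_write(unsigned int flags)
{
    return (flags & OLD_WRITE) != 0;    /* always false */
}

/* A command is valid only when exactly one bit is set. */
static bool iocmd_valid(unsigned int cmd)
{
    return cmd != 0 && (cmd & (cmd - 1)) == 0;
}

/* With one bit per command, an equality test is unambiguous. */
static bool is_write(unsigned int cmd)
{
    assert(iocmd_valid(cmd));
    return cmd == CMD_WRITE;
}
```

Rejecting zero and multi-bit values up front means a forgotten or doubled command is caught at the check, rather than silently being treated as the command whose encoding happened to be zero.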
* Eliminate the undocumented, experimental, non-delivering and highly | phk | 2000-03-16 | 1 | -10/+0
  dangerous MAX_PERF option.
* Fix the swap backed vn case - this was broken by my rev 1.128 to | peter | 1999-12-28 | 1 | -13/+8
  swap_pager.c and related commits.

  Essentially swap_pager.c is backed out to before the changes, but
  swapdev_vp is converted into a real vnode with just VOP_STRATEGY(). It
  no longer abuses specfs vnops and no longer needs a dev_t and /dev/drum
  (or /dev/swapdev) for the intermediate layer.

  This essentially restores the vnode interface as the interface to the
  bottom of the swap pager, and vm_swap.c provides a clean vnode interface.

  This will need to be revisited when we swap to files (vnodes) - which is
  the other reason for keeping the vnode interface between the swap pager
  and the swap devices.

  OK'ed by: dillon
* Isolate the swapdev_vp "not quite" vnode in the only source file which | phk | 1999-11-22 | 1 | -0/+7
  needs it now that /dev/drum is gone.
  Reviewed by: eivind, peter
* Remove the non-functional "swap device" userland front-end to the | peter | 1999-11-18 | 1 | -7/+7
  multiplexed underlying swap devices (/dev/drum). The only thing it did
  was to allow root to open /dev/drum, but not do anything with it.
  Various utilities used to grovel around in here, but Matt has written a
  much nicer (and clean) front-end to this for libkvm, and nothing uses the
  old system any more.

  The VM system was calling VOP_STRATEGY() on the vp of the first
  underlying swap device (not the /dev/drum one, the first real device),
  and using the VOP system to indirectly (and only) call swstrategy() to
  choose an underlying device and enqueue it on that device. I have
  changed it to avoid diverting through the VOP system and to call the only
  possible target directly, saving a little bit of time and some
  complexity.

  In all, nothing much changes, except some scaffolding to support the
  roundabout way of calling swstrategy() is gone.

  Matt gave me the ok to do this some time ago, and I apologize for taking
  so long to get around to it.
* useracc() the prequel: | phk | 1999-10-29 | 1 | -1/+0
  Merge the contents (less some trivial, bordering on the silly, comments)
  of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the
  #defines for the vm_inherit_t and vm_prot_t types next to their typedefs.

  This paves the road for the commit to follow shortly: change useracc() to
  use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
* Fix a number of spl bugs related to reserving and freeing swap space. | dillon | 1999-09-17 | 1 | -291/+232
  Swap space can be freed from an interrupt and so swap reservation and
  freeing must occur at splvm.

  Add swap_pager_reserve() code to support a new swap pre-reservation
  capability for the VN device.

  Generally clean up the swap code by simplifying the swp_pager_meta_build()
  static function and consolidating the SWAPBLK_NONE test from a bit test
  to an absolute compare. The bit test was left over from a rejected swap
  allocation scheme that was not ultimately committed. A few other minor
  cleanups were also made.

  Reorganize the swap strategy code, again for VN support, to not
  reallocate swap when writing as this messes up pre-reservation and can
  fragment I/O unnecessarily as VN-based disk is messed around with.

  Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
* $Id$ -> $FreeBSD$ | peter | 1999-08-28 | 1 | -1/+1
* Use devtoname to print dev_t's instead of casting them to u_long for | bde | 1999-08-23 | 1 | -6/+7
  misprinting with %lx.

  Cast pointers to intptr_t instead of casting them to long. Cosmetic.
* Correct an accidental omission of one "vm_page_undirty" replacement | alc | 1999-08-17 | 1 | -2/+2
  from the previous commit.
* Add the (inline) function vm_page_undirty for clearing the dirty bitmask | alc | 1999-08-17 | 1 | -2/+2
  of a vm_page.

  Use it.

  Submitted by: dillon
* Remove vm_object::last_read. It is used by the old swap pager, but | alc | 1999-07-16 | 1 | -2/+1
  not by the new one, i.e., vm/swap_pager.c rev 1.108.
  Reviewed by: dillon@backplane.com
* Kirk missed a required BUF_KERNPROC(). Even though this is a non-async | peter | 1999-06-27 | 1 | -1/+2
  transfer, the b_iodone hook causes biodone() to release it from
  interrupt context.
* Convert buffer locking from using the B_BUSY and B_WANTED flags to using | mckusick | 1999-06-26 | 1 | -4/+5
  lockmgr locks. This commit should be functionally equivalent to the old
  semantics. That is, all buffer locking is done with LK_EXCLUSIVE
  requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will
  be done in future commits.
* remove b_proc from struct buf, it's (now) unused. | phk | 1999-05-06 | 1 | -5/+3
  Reviewed by: dillon, bde
* Submitted by: Matt Dillon <dillon@freebsd.org> | julian | 1999-03-14 | 1 | -17/+242
  The old VN device broke in -4.x when the definition of B_PAGING changed.
  This patch fixes this plus implements additional capabilities. The new
  VN device can be backed by a file (as per normal), or it can be directly
  backed by swap.

  Due to dependencies in VM include files (on opt_xxx options) the new vn
  device cannot be a module yet. This will be fixed in a later commit.
  This commit delimited by tags {PRE,POST}_MATT_VNDEV
* Remove conditional sysctl's | dillon | 1999-02-21 | 1 | -46/+4
  Leave swap_async_max sysctl intact, remove swap_cluster_max sysctl.
  Reviewed by: Alan Cox <alc@cs.rice.edu>
* Reviewed by: Alan Cox <alc@cs.rice.edu> | dillon | 1999-02-21 | 1 | -9/+15
  Fix problem w/ low-swap/low-memory handling as reported by Bruce Evans.