summaryrefslogtreecommitdiffstats
path: root/sys/vm/swap_pager.c
Commit message (Collapse)AuthorAgeFilesLines
* Be consistent about "static" functions: if the function is markedphk2002-09-281-2/+2
| | | | | | static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512
* - Lock access to numoutput on the swap devices.jeff2002-09-251-0/+2
|
* Reduce the maximum KVA reserved for swap meta structures from 70 to 32 MB.dillon2002-08-311-2/+2
| | | | | | Reduce the swap meta calculation by a factor of 2, it's still massive overkill. X-MFC after: immediately
* o Lock page queue accesses by vm_page_free().alc2002-07-211-0/+2
|
* o Lock page queue accesses by vm_page_try_to_cache(). (The accessesalc2002-07-201-0/+2
| | | | | in kern/vfs_bio.c are already locked.) o Assert that the page queues lock is held in vm_page_try_to_cache().
* Avoid using the 64-bit vm_pindex_t in a few places where 64-bitiedowse2002-06-261-12/+14
| | | | | | | | | | | | | types are not required, as the overhead is unnecessary: o In the i386 pmap_protect(), `sindex' and `eindex' represent page indices within the 32-bit virtual address space. o In swp_pager_meta_build() and swp_pager_meta_ctl(), use a temporary variable to store the low few bits of a vm_pindex_t that gets used as an array index. o vm_uiomove() uses `osize' and `idx' for page offsets within a map entry. o In vm_object_split(), `idx' is a page offset within a map entry.
* Use an explicit cast to avoid relying on sign extension to do theiedowse2002-06-261-2/+2
| | | | | | right thing in code such as `vm_pindex_t x = ~SWAP_META_MASK'. Reviewed by: dillon
* o Replace GIANT_REQUIRED in swap_pager_alloc() by the acquisition andalc2002-06-221-3/+4
| | | | release of Giant. (Annotate as MPSAFE.)
* Change callers of mtx_init() to pass in an appropriate lock type name. Injhb2002-04-041-1/+1
| | | | | | | most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64
* Remove references to vm_zone.h and switch over to the new uma API.jeff2002-03-201-14/+10
|
* Remove __P.alfred2002-03-191-16/+16
|
* This is the first part of the new kernel memory allocator. This replacesjeff2002-03-191-8/+8
| | | | | | malloc(9) and vm_zone with a slab like allocator. Reviewed by: arch@
* - Remove a number of extra newlines that do not belong here according toeivind2002-03-101-98/+11
| | | | | | | | | style(9) - Minor space adjustment in cases where we have "( ", " )", if(), return(), while(), for(), etc. - Add /* SYMBOL */ after a few #endifs. Reviewed by: alc
* Use thread0.td_ucred instead of proc0.p_ucred. This change is cosmeticjhb2002-02-271-6/+6
| | | | | | and isn't strictly required. However, it lowers the number of false positives found when grep'ing the kernel sources for p_ucred to ensure proper locking.
* GC: BIO_ORDERED, various infrastructure dealing with BIO_ORDERED.phk2002-02-221-1/+1
|
* Don't use an uninitialized field reserved for callers in the bio structuretegge2001-10-151-3/+4
| | | | | | | passed to swap_pager_strategy(). Instead, use a field reserved for drivers and initialize it before usage. Reviewed by: dillon
* Change the kernel's ucred API as follows:jhb2001-10-111-11/+6
| | | | | | | | - crhold() returns a reference to the ucred whose refcount it bumps. - crcopy() now simply copies the credentials from one credential to another and has no return value. - a new crshared() primitive is added which returns true if a ucred's refcount is > 1 and false (0) otherwise.
* Limit the amount of KVM reserved for the buffer cache and for swap-metadillon2001-08-201-2/+5
| | | | | | | | | | | | | | | information. The default limits only effect machines with > 1GB of ram and can be overriden with two new kernel conf variables VM_SWZONE_SIZE_MAX and VM_BCACHE_SIZE_MAX, or with loader variables kern.maxswzone and kern.maxbcache. This has the effect of leaving more KVM available for sizing NMBCLUSTERS and 'maxusers' and should avoid tripups where a sysad adds memory to a machine and then sees the kernel panic on boot due to running out of KVM. Also change the default swap-meta auto-sizing calculation to allocate half of what it was previously allocating. The prior defaults were way too high. Note that we cannot afford to run out of swap-meta structures so we still stay somewhat conservative here.
* Fixups for the initial allocation by dillon:alfred2001-08-021-7/+15
| | | | | | | | | | | | 1) allocate fewer buckets 2) when failing to allocate swap zone, keep reducing the zone by a third rather than a half in order to reduce the chance of allocating way too little. I also moved around some code for readability. Suggested by: dillon Reviewed by: dillon
* whitespace / register cleanupdillon2001-07-041-1/+1
|
* With Alfred's permission, remove vm_mtx in favor of a fine-grained approachdillon2001-07-041-69/+31
| | | | | | | | | (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
* - Protect all accesses to nsw_[rw]count{,_{,a}sync} with the pbuf mutex.jhb2001-06-221-4/+4
| | | | | - Don't drop the vm mutex while grabbing the pbuf mutex to manipulate said variables.
* - Fix the sw_alloc_interlock to actually lock itself when the lock isjhb2001-05-231-16/+20
| | | | | | | acquired. - Assert Giant is held in the strategy, getpages, and putpages methods and the getchainbuf, flushchainbuf, and waitchainbuf functions. - Always call flushchainbuf() w/o the VM lock.
* aquire Giant when playing with the buffercache and doing IO.alfred2001-05-231-2/+5
| | | | use msleep against the vm mutex while waiting for a page IO to complete.
* aquire vm mutex in swp_pager_async_iodone. Don't call swp_pager_async_iodonealfred2001-05-221-2/+3
| | | | with the mutex held.
* Introduce a global lock for the vm subsystem (vm_mtx).alfred2001-05-191-11/+66
| | | | | | | | | | | | | | | | | | | vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
* Actually biofinish(struct bio *, struct devstat *, int error) is more generalphk2001-05-061-3/+1
| | | | | | than the bioerror(). Most of this patch is generated by scripts.
* Protect pager object creation with sx locks.alfred2001-04-181-10/+13
| | | | | | | | Protect pager object list manipulation with a mutex. It doesn't look possible to combine them under a single sx lock because creation may block and we can't have the object list manipulation block on anything other than a mutex because of interrupt requests.
* protect pbufs and associated counts with a mutexalfred2001-04-131-0/+2
|
* Introduce per-swap area accounting in the VM system, and exportrwatson2001-02-231-1/+10
| | | | | | | | | | | this information via the vm.nswapdev sysctl (number of swap areas) and vm.swapdevX nodes (where X is the device), which contain the MIBs dev, blocks, used, and flags. These changes are required to allow top and other userland swap-monitoring utilities to run without setgid kmem. Submitted by: Thomas Moestl <tmoestl@gmx.net> Reviewed by: freebsd-audit
* - If swap metadata does not fit into the KVM, reduce the number oftanimura2000-12-131-12/+23
| | | | | | | | | | | | | | | struct swblock entries by dividing the number of the entries by 2 until the swap metadata fits. - Reject swapon(2) upon failure of swap_zone allocation. This is just a temporary fix. Better solutions include: (suggested by: dillon) o reserving swap in SWAP_META_PAGES chunks, and o swapping the swblock structures themselves. Reviewed by: alfred, dillon
* Convert more malloc+bzero to malloc+M_ZERO.dwmalone2000-12-081-2/+1
| | | | | Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>
* o Export dmmax ("Maximum size of a swap block") using SYSCTL_INT.rwatson2000-11-201-0/+3
| | | | | This removes a reason that systat requires setgid kmem. More to come.
* Implement a low-memory deadlock solution.dillon2000-11-181-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather then block they instead continue to operate, then return resources to the memory pool instead of cache them or leave them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmatic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the *WRONG* page(!), leading to corruption. There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather then continue to corrupt the filesystem. The problem does not occur very often.. it is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>
* This patchset fixes a large number of file descriptor race conditions.dillon2000-11-181-2/+4
| | | | | | | | | | | | Pre-rfork code assumed inherent locking of a process's file descriptor array. However, with the advent of rfork() the file descriptor table could be shared between processes. This patch closes over a dozen serious race conditions related to one thread manipulating the table (e.g. closing or dup()ing a descriptor) while another is blocked in an open(), close(), fcntl(), read(), write(), etc... PR: kern/11629 Discussed with: Alexander Viro <viro@math.psu.edu>
* The swap bitmap allocator was not calculating the bitmap size properlydillon2000-10-131-1/+1
| | | | | | | | | | | | | | | in the face of non-stripe-aligned swap areas. The bug could cause a panic during boot. Refuse to configure a swap area that is too large (67 GB or so) Properly document the power-of-2 requirement for SWB_NPAGES. The patch is slightly different then the one Tor enclosed in the P.R., but accomplishes the same thing. PR: kern/20273 Submitted by: Tor.Egge@fast.no
* Implement an optimization of the VM<->pmap API. Pass vm_page_t's directlypeter2000-05-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | to various pmap_*() functions instead of looking up the physical address and passing that. In many cases, the first thing the pmap code was doing was going to a lot of trouble to get back the original vm_page_t, or it's shadow pv_table entry. Inspired by: John Dyson's 1998 patches. Also: Eliminate pv_table as a seperate thing and build it into a machine dependent part of vm_page_t. This eliminates having a seperate set of structions that shadow each other in a 1:1 fashion that we often went to a lot of trouble to translate from one to the other. (see above) This happens to save 4 bytes of physical memory for each page in the system. (8 bytes on the Alpha). Eliminate the use of the phys_avail[] array to determine if a page is managed (ie: it has pv_entries etc). Store this information in a flag. Things like device_pager set it because they create vm_page_t's on the fly that do not have pv_entries. This makes it easier to "unmanage" a page of physical memory (this will be taken advantage of in subsequent commits). Add a function to add a new page to the freelist. This could be used for reclaiming the previously wasted pages left over from preloaded loader(8) files. Reviewed by: dillon
* Separate the struct bio related stuff out of <sys/buf.h> intophk2000-05-051-0/+1
| | | | | | | | | | | | | | | <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter
* Convert the vm_pager_strategy() interface to take a struct bio instead ofphk2000-05-031-77/+55
| | | | | | a struct buf. Don't try to examine B_ASYNC, it is a layering violation to do so. The only current user of this interface is vn(4) which, since it emulates a disk interface, operates on struct bio already.
* Move and staticize the bufchain functions so they become local to thephk2000-05-011-0/+137
| | | | only piece of code using them. This will ease a rewrite of them.
* Complete the bio/buf divorce for all code below devfs::strategyphk2000-04-151-2/+2
| | | | | | | | | | Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case. CCD not converted yet, casts to struct buf (still safe) atapi-cd casts to struct buf to examine B_PHYS
* Move B_ERROR flag to b_ioflags and call it BIO_ERROR.phk2000-04-021-5/+6
| | | | | | | | | | | | | (Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.
* Add necessary spl protection for swapper. The problem was located bydillon2000-03-271-3/+4
| | | | | Alfred while testing his SPLASSERT stuff. This is not a complete fix, more protections are probably needed.
* Revert spelling mistake I made in the previous commitcharnier2000-03-271-1/+1
| | | | Requested by: Alan and Bruce
* Spellingcharnier2000-03-261-4/+4
|
* Fix one place which knew that B_WRITE was zero.phk2000-03-221-1/+2
| | | | | | Fix a stylistic mistake of mine while here. Found by: Stephen Hocking <shocking@prth.pgs.com>
* Rename the existing BUF_STRATEGY() to DEV_STRATEGY()phk2000-03-201-3/+3
| | | | | | | | substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.
* Remove B_READ, B_WRITE and B_FREEBUF and replace them with a newphk2000-03-201-13/+12
| | | | | | | | | | | | | | | | | | | | | field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch were machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.
* Eliminate the undocumented, experimental, non-delivering and highlyphk2000-03-161-10/+0
| | | | dangerous MAX_PERF option.
* Fix the swap backed vn case - this was broken by my rev 1.128 topeter1999-12-281-13/+8
| | | | | | | | | | | | | | | | | | swap_pager.c and related commits. Essentially swap_pager.c is backed out to before the changes, but swapdev_vp is converted into a real vnode with just VOP_STRATEGY(). It no longer abuses specfs vnops and no longer needs a dev_t and /dev/drum (or /dev/swapdev) for the intermediate layer. This essentially restores the vnode interface as the interface to the bottom of the swap pager, and vm_swap.c provides a clean vnode interface. This will need to be revisited when we swap to files (vnodes) - which is the other reason for keeping the vnode interface between the swap pager and the swap devices. OK'ed by: dillon
OpenPOWER on IntegriCloud