path: root/sys/sys/buf.h
Commit message [Author, Date, Files changed, Lines -/+]
* nfs_write() can use the recently introduced vfs_bio_set_valid() instead of [alc, 2009-05-31, 1 file, -1/+0]
  vfs_bio_set_validclean(), thereby avoiding the page queues lock.
  Garbage collect vfs_bio_set_validclean(); nothing uses it any longer.
* Introduce vfs_bio_set_valid() and use it from ffs_realloccg(). This [alc, 2009-05-17, 1 file, -0/+1]
  eliminates the misuse of vfs_bio_clrbuf() by ffs_realloccg().

  In collaboration with: tegge
* Fix two issues with bufdaemon, often causing processes to hang in [kib, 2009-03-16, 1 file, -1/+2]
  the "nbufkv" sleep.

  First, an ffs background cg block write requests a new buffer for the
  shadow copy. When ffs_bufwrite() is called from the bufdaemon due to a
  buffer shortage, requesting the buffer deadlocks the bufdaemon.
  Introduce a new flag for getnewbuf(), GB_NOWAIT_BD, to request that
  getblk() not block while allocating the buffer and return failure
  instead. Add a flag argument to geteblk() to allow passing the flags
  through to getblk(). Do not repeat the getnewbuf() call from geteblk()
  if buffer allocation failed and either GB_NOWAIT_BD is specified or
  geteblk() is called from the bufdaemon (or its helper, see below). In
  ffs_bufwrite(), fall back to a synchronous cg block write if shadow
  block allocation fails. Since r107847, buffer write assumes that the
  vnode owning the buffer is locked.

  The second problem is that the buffer cache may accumulate many
  buffers belonging to a limited number of vnodes. With such a workload,
  quite often the threads that own the mentioned vnode locks are trying
  to read another block from those vnodes and, due to buffer cache
  exhaustion, are asking the bufdaemon for help. The bufdaemon is unable
  to make any substantial progress because the vnodes are locked. Allow
  the threads owning vnode locks to help the bufdaemon by doing a flush
  pass over the buffer cache before getnewbuf() goes to uninterruptible
  sleep. Move the flushing code from buf_daemon() to the new helper
  function buf_do_flush(), which is called from getnewbuf(). The number
  of buffers flushed by a single call to buf_do_flush() from getnewbuf()
  is limited by the new sysctl vfs.flushbufqtarget. Prevent recursive
  calls to buf_do_flush() by marking the bufdaemon and the threads that
  temporarily help it with the TDP_BUFNEED flag.

  In collaboration with: pho
  Reviewed by: tegge (previous version)
  Tested by: glebius, yandex ...
  MFC after: 3 weeks
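  A minimal sketch of the fallback described above (getnewbuf(), geteblk()
  and GB_NOWAIT_BD come from the commit message; the surrounding error
  handling is illustrative, not the actual ffs_bufwrite() code):

      /* In ffs_bufwrite(): try to allocate the shadow copy without sleeping. */
      newbp = geteblk(bp->b_bufsize, GB_NOWAIT_BD);
      if (newbp == NULL) {
              /*
               * Allocation failed (we may be the bufdaemon, or buffers
               * are scarce): fall back to a synchronous write of the cg
               * block instead of sleeping for a buffer we may never get.
               */
              return (bufwrite(bp));
      }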
* Adjust some variables (mostly related to the buffer cache) that hold [jhb, 2009-03-09, 1 file, -4/+4]
  address space sizes to be longs instead of ints. Specifically, the
  following values are now longs: runningbufspace, bufspace, maxbufspace,
  bufmallocspace, maxbufmallocspace, lobufspace, hibufspace,
  lorunningspace, hirunningspace, maxswzone, maxbcache, and maxpipekva.
  Previously, a relatively small number (~44000) of buffers set in
  kern.nbuf would result in integer overflows, causing either hangs or
  bogus values of hidirtybuffers and lodirtybuffers. Now one has to
  overflow a long to see such problems. There was a check for an nbuf
  setting that would cause overflows in the auto-tuning of nbuf. I've
  changed it to always check and cap nbuf, but warn if a user-supplied
  tunable would cause overflow.

  Note that this changes the ABI of several sysctls that are used by
  things like top(1), etc., so any MFC would probably require some gross
  shims to allow for that.

  MFC after: 1 month
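  A hypothetical illustration of the overflow class being fixed here (the
  multiplier is illustrative, not the kernel's actual tuning formula):

      int  nbuf = 44000;
      /* int arithmetic: 44000 * 65536 = 2883584000 > INT_MAX, wraps. */
      int  bufspace_int  = nbuf * 65536;
      /* long arithmetic (LP64): promoted before the multiply, no overflow. */
      long bufspace_long = (long)nbuf * 65536;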
* b_waiters cannot be adequately protected by the interlock because it is [attilio, 2008-03-28, 1 file, -39/+15]
  dropped after the call to lockmgr(), so just revert this approach using
  something similar to the precedent one: BUF_LOCKWAITERS() just checks
  whether there are waiters (not the actual number of them) and is based
  on the newly introduced lockmgr_waiters(), which reports whether the
  lockmgr has waiters or not. The name was chosen to differ from the old
  lockwaiters() in order not to confuse the two.

  The KPI is enriched by this commit, so a __FreeBSD_version bump and
  manpage update will happen soon. 'struct buf' also changes, so the
  kernel ABI is disturbed.

  Bug found by: jeff
  Approved by: jeff, kib
* _lockmgr_args() accepts a 'char *' string as file, so modify the [attilio, 2008-03-28, 1 file, -2/+2]
  _BUF_LOCK() and _BUF_TIMELOCK() prototypes accordingly.
* Instrument buffer lock objects in order to correctly track consumers [attilio, 2008-03-28, 1 file, -38/+33]
  in locking operations. While here, perform some style(9) cleanups.
* - Complete part of the unfinished bufobj work by consistently using [jeff, 2008-03-22, 1 file, -1/+1]
    BO_LOCK/UNLOCK/MTX when manipulating the bufobj.
  - Create a new lock in the bufobj to lock bufobj fields independently.
    This leaves the vnode interlock as an 'identity' lock while the
    bufobj is an io lock. The bufobj lock is ordered before the vnode
    interlock and also before the mnt ilock.
  - Exploit this new lock order to simplify softdep_check_suspend().
  - A few sync related functions are marked with a new XXX to note that
    we may not properly interlock against a non-zero bv_cnt when
    attempting to sync all vnodes on a mountlist. I do not believe this
    race is important. If I'm wrong this will make these locations
    easier to find.

  Reviewed by: kib (earlier diff)
  Tested by: kris, pho (earlier diff)
* - Handle the buffer lock waiters count directly in the buffer cache [attilio, 2008-03-01, 1 file, -16/+33]
    rather than relying on the lockmgr support [1]:
    * bump the waiters only if the interlock is held
    * let brelvp() return the waiters count
    * rely on brelvp() rather than BUF_LOCKWAITERS() in order to check
      the number of waiters
  - Remove a namespace pollution introduced recently with lockmgr.h
    including lock.h, by including lock.h directly in the consumers and
    making it mandatory for using lockmgr.
  - Modify the flags accepted by lockinit():
    * introduce LK_NOPROFILE, which disables lock profiling for the
      specified lockmgr
    * introduce LK_QUIET, which disables ktr tracing for the specified
      lockmgr [2]
    * disallow LK_SLEEPFAIL and LK_NOWAIT to be passed there, so that
      they can only be used on a per-instance basis
  - Remove BUF_LOCKWAITERS() and lockwaiters() as they are no longer
    used

  This patch breaks the KPI, so __FreeBSD_version will be bumped and
  manpages updated by further commits. Additionally, the 'struct buf'
  changes disturb the ABI as well.

  [2] Really, currently there is no ktr tracing in the lockmgr, but it
  will be added soon.

  [1] Submitted by: kib
  Tested by: pho, Andrea Barberio <insomniac at slackware dot it>
* Axe the 'thread' argument from VOP_ISLOCKED() and lockstatus() as it is [attilio, 2008-02-25, 1 file, -1/+1]
  always curthread. As the KPI is broken by this patch, manpages and
  __FreeBSD_version will be updated by further commits.

  Tested by: Andrea Barberio <insomniac at slackware dot it>
* - Introduce lockmgr_args() in the lockmgr space. This function performs [attilio, 2008-02-15, 1 file, -31/+5]
    the same operation as lockmgr() but accepts a custom wmesg, prio and
    timo for the particular lock instance, overriding the default values
    lkp->lk_wmesg, lkp->lk_prio and lkp->lk_timo.
  - Use lockmgr_args() in order to implement BUF_TIMELOCK()
  - Clean up BUF_LOCK()
  - Remove LK_INTERNAL as it is no longer used in the lockmgr namespace

  Tested by: Andrea Barberio <insomniac at slackware dot it>
* - Add real assertions to lockmgr locking primitives. [attilio, 2008-02-13, 1 file, -6/+27]
    A couple of notes on this:
    * WITNESS support, when enabled, is only used for shared locks in
      order to avoid problems with the "disowned" locks
    * KA_HELD and KA_UNHELD exist only in the lockmgr namespace in order
      to assert that a generic thread (not curthread) does or does not
      own the lock. Really, this kind of check is bogus, but it seems
      very widespread in the consumers' code. So, for the moment, we
      cater to this untrusted behaviour until the consumers are fixed
      and the options can be removed (hopefully during the 8.0-CURRENT
      lifecycle)
    * Implementing KA_HELD and KA_UNHELD (not supported natively by
      WITNESS) made necessary the introduction of LA_MASKASSERT, which
      specifies the range of default lock assertion flags
    * In other respects, lockmgr_assert() follows exactly what other
      locking primitives offer for this operation.
  - Build real assertions for buffer cache locks on top of
    lockmgr_assert(). They can be used with the BUF_ASSERT_*(bp)
    paradigm.
  - Add checks at lock destruction time and use a cookie for verifying
    lock integrity at any operation.
  - Redefine BUF_LOCKFREE() so that it does not use a direct assert but
    relies on the aforementioned destruction-time check.

  The KPI is evidently broken, so a __FreeBSD_version bump and manpage
  update are necessary and will be committed soon.

  Side note: lockmgr_assert() will be used soon in order to implement
  real assertions in the vnode namespace, replacing the legacy and still
  bogus "VOP_ISLOCKED()" way.

  Tested by: kris (earlier version)
  Reviewed by: jhb
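  A plausible sketch of the BUF_ASSERT_*() layering on top of
  lockmgr_assert() (the exact macro set and the b_lock field name are
  assumptions based on the commit message):

      #define BUF_ASSERT_LOCKED(bp)                                   \
              lockmgr_assert(&(bp)->b_lock, KA_LOCKED)
      #define BUF_ASSERT_UNLOCKED(bp)                                 \
              lockmgr_assert(&(bp)->b_lock, KA_UNLOCKED)
      #define BUF_ASSERT_XLOCKED(bp)                                  \
              lockmgr_assert(&(bp)->b_lock, KA_XLOCKED)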
* Clean up the lockmgr interface and exported KPI: [attilio, 2008-01-24, 1 file, -3/+3]
  - Remove the "thread" argument from the lockmgr() function as it is
    always curthread now
  - Axe the lockcount() function as it is no longer used
  - Axe LOCKMGR_ASSERT() as it is really bogus and not currently used.
    Hopefully it will soon be replaced by something suitable.
  - Remove the prototype for dumplockinfo() as the function is no longer
    present

  Additionally:
  - Introduce a KASSERT() in lockstatus() in order to let it accept only
    curthread or NULL, as only they should be passed
  - Do a little bit of style(9) cleanup on lockmgr.h

  The KPI is heavily broken by this change, so manpages and
  __FreeBSD_version will be modified accordingly by further commits.

  Tested by: matteo
* - Introduce the function lockmgr_recursed(), which returns true if the [attilio, 2008-01-19, 1 file, -24/+12]
    lockmgr lkp, when held in exclusive mode, is recursed
  - Introduce the function BUF_RECURSED(), which does the same for
    bufobj locks, built on top of lockmgr_recursed()
  - Introduce the function BUF_ISLOCKED(), which works like its
    counterpart VOP_ISLOCKED(9), showing the state of the lockmgr linked
    with the bufobj

  BUF_RECURSED() and BUF_ISLOCKED() entirely replace the usage of the
  bogus BUF_REFCNT() in a clearer and SMP-compliant way. This allows us
  to axe BUF_REFCNT(), leaving the function lockcount() totally unused
  in our stock kernel. Further commits will axe lockcount() as part of
  the lockmgr() cleanup.

  The KPI is, obviously, broken, so further commits will update the
  manpages and FreeBSD version.

  Tested by: kris (on UFS and NFS)
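  A minimal sketch of the two helpers as buf.h-style macros (assuming the
  buffer's lockmgr lock lives in a b_lock field; lockstatus() is shown
  with its later thread-argument-free signature):

      #define BUF_ISLOCKED(bp)                                        \
              lockstatus(&(bp)->b_lock)
      #define BUF_RECURSED(bp)                                        \
              lockmgr_recursed(&(bp)->b_lock)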
* Remove explicit calls to lockmgr() with a NULL argument. [attilio, 2008-01-08, 1 file, -4/+1]
  Now the lockmgr() function can only be called passing curthread, and
  the KASSERT() is upgraded accordingly.

  In order to support on-the-fly owner switching, the new function
  lockmgr_disown() has been introduced and is used in BUF_KERNPROC().
  The KPI thus changes, and the FreeBSD version will be bumped soon.

  Differently from the previous code, we assume an idle thread cannot
  try to acquire the lockmgr as it cannot sleep, so we drop the
  corresponding check [1] in BUF_KERNPROC().

  Tested by: kris

  [1] kib asked for a KASSERT in lockmgr_disown() about this condition,
  but after thinking about it, as this is a well-known general rule, I
  found it not really necessary.
* Instead of doing comparisons using the pcpu area to see if [julian, 2007-03-08, 1 file, -2/+1]
  a thread is an idle thread, just see if it has the IDLETD flag set.
  That flag will probably move to the pflags word, as it's permanent and
  never changes for the life of the system, so it doesn't need locking.
* Cylinder group bitmaps and blocks containing inodes for a snapshot [kib, 2007-01-23, 1 file, -0/+4]
  file are after snaplock, while other ffs device buffers are before
  snaplock in the global lock order. By itself, this could cause a
  deadlock when bdwrite() tries to flush dirty buffers on a snapshotted
  ffs. If, during the flush, COW activity for the snapshot needs to
  allocate a block and ffs_alloccg() selects the cylinder group that is
  being written by bdwrite(), then the kernel would panic due to
  recursive buffer lock acquisition.

  Avoid dealing with buffers in bdwrite() that are on the other side of
  the snaplock divide in the lock order from the buffer being written.
  Add a new BOP, bop_bdwrite(), to do the dirty buffer flushing for the
  same vnode in bdwrite(). The default implementation, bufbdflush(),
  refactors the code from bdwrite(). For ffs device buffers, a
  specialized implementation is used.

  Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes)
  Tested by: Peter Holm
  X-MFC after: 3 weeks (if ever: it changes ABI)
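  A plausible sketch of the new BOP slot (field names are assumptions
  modeled on the commit message and the existing bufobj method-vector
  style; the other members shown are illustrative):

      /* The bufobj method table gains a dirty-buffer-flush hook. */
      struct buf_ops {
              char *bop_name;
              int  (*bop_write)(struct buf *bp);
              void (*bop_bdwrite)(struct bufobj *bo, struct buf *bp); /* new */
      };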
* If the buffer lock has waiters after the buffer has changed identity, [tegge, 2006-10-02, 1 file, -0/+11]
  then getnewbuf() needs to drop the buffer in order to wake waiters
  that might sleep on the buffer in the context of the old identity.
* - Add the B_NEEDSGIANT flag, which is only set if the vnode that owns [jeff, 2006-03-31, 1 file, -1/+1]
    a buf requires Giant. It is set in bgetvp and cleared in brelvp.
  - Create QUEUE_DIRTY_GIANT for dirty buffers that require Giant.
  - In the buf daemon, only grab Giant when processing QUEUE_DIRTY_GIANT
    and only if we think there are buffers in that queue.

  Sponsored by: Isilon Systems, Inc.
* Changes imported from XFS for the FreeBSD project: [rodrigc, 2005-12-07, 1 file, -1/+10]
  - add fields to struct buf (needed by XFS)
    - 3 private fields: b_fsprivate1, b_fsprivate2, b_fsprivate3
    - b_pin_count, a pin count for the buffer
  - add a new B_MANAGED flag
  - add a breada() function to initiate asynchronous I/O on read-ahead
    blocks
  - add bufdone_finish(), bpin(), bunpin_wait() functions

  Patches provided by: kan
  Reviewed by: phk
  Silence on: arch@
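  A sketch of the struct buf additions named above (the placement and the
  surrounding fields are illustrative):

      struct buf {
              /* ... existing fields ... */
              int    b_pin_count;     /* number of pins held on this buffer */
              void  *b_fsprivate1;    /* filesystem-private (XFS) */
              void  *b_fsprivate2;    /* filesystem-private (XFS) */
              void  *b_fsprivate3;    /* filesystem-private (XFS) */
      };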
* Un-staticize waitrunningbufspace() and call it before returning from [truckman, 2005-09-30, 1 file, -0/+1]
  ffs_copyonwrite() if any async writes were launched. Restore the
  thread's previous TDP_NORUNNINGBUF state before returning from
  ffs_copyonwrite().
* Un-staticize runningbufwakeup() and staticize updateproc. [truckman, 2005-09-30, 1 file, -0/+1]
  Add a new private thread flag to indicate that the thread should not
  sleep if runningbufspace is too large.

  Set this flag on the bufdaemon and syncer threads so that they skip
  the waitrunningbufspace() call in bufwrite() rather than checking the
  proc pointer against the known proc pointers for these two threads. A
  way of preventing these threads from being starved for I/O but still
  placing limits on their outstanding I/O would be desirable.

  Set this flag in ffs_copyonwrite() to prevent bufwrite() calls from
  blocking on the runningbufspace check while holding snaplk. This
  prevents snaplk from being held for an arbitrarily long period of time
  if runningbufspace is high and greatly reduces the contention for
  snaplk. The disadvantage is that ffs_copyonwrite() can start a large
  amount of I/O if there are a large number of snapshots, which could
  cause a deadlock in other parts of the code.

  Call runningbufwakeup() in ffs_copyonwrite() to decrement
  runningbufspace before attempting to grab snaplk so that I/O requests
  waiting on snaplk are not counted in runningbufspace as being
  in-progress. Increment runningbufspace again before actually launching
  the original I/O request.

  Prior to the above two changes, the system could deadlock if enough
  I/O requests were blocked by snaplk to prevent runningbufspace from
  falling below lorunningspace, and one of the bawrite() calls in
  ffs_copyonwrite() blocked in waitrunningbufspace() while holding
  snaplk. See <http://www.holm.cc/stress/log/cons143.html>
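  A minimal sketch of the flag check in bufwrite() (TDP_NORUNNINGBUF is
  named in the related entries; the surrounding code is assumed):

      /* Throttle ordinary writers, never the bufdaemon/syncer/copyonwrite. */
      if ((curthread->td_pflags & TDP_NORUNNINGBUF) == 0)
              waitrunningbufspace();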
* Add a new struct buf flag bit, B_PERSISTENT, and use it to tag [truckman, 2005-09-08, 1 file, -4/+4]
  struct bufs that are persistently held by ext2fs. Ignore any buffers
  with this flag in the code in boot() that counts "busy" and dirty
  buffers and attempts to sync the dirty buffers, which is done before
  attempting to unmount all the file systems during shutdown.

  This fixes the problem caused by any ext2fs file systems that are
  mounted at system shutdown time, which caused boot() to give up on a
  non-zero number of buffers and skip the call to vfs_unmountall(). This
  left all the mounted file systems in a dirty state and caused them to
  all require cleanup by fsck on reboot.

  Move the two separate copies of the "busy" buffer test in boot() to a
  separate function.

  Nuke the useless spl() stuff in the ext2fs ULCK_BUF() macro.

  Bring the PRINT_BUF_FLAGS definition in sys/buf.h up to date with this
  and previous flag changes.

  PR: kern/56675, kern/85163
  Tested by: "Matthias Andree" matthias.andree at gmx.de
  Reviewed by: bde
  MFC after: 3 days
* Do not use vm_pager_init() to initialize the vnode_pbuf_freecnt [kan, 2005-08-13, 1 file, -0/+2]
  variable. vm_pager_init() is run before the required nswbuf variable
  has been set to its correct value. This caused the system to run with
  a single pbuf available for the vnode_pager. Handle both the
  cluster_pbuf_freecnt and vnode_pbuf_freecnt variables in the same way.

  Reported by: ade
  Obtained from: alc
  MFC after: 2 days
* - We should never unlock a buf before we've cleared B_REMFREE. I [jeff, 2005-06-13, 1 file, -0/+2]
    believe this is happening at the moment and sometimes causing panics
    later on the package cluster when we bremfree() a buf whose delayed
    bremfree() did not previously happen.

  Sponsored by: Isilon Systems, Inc.
* Fix a serious deadlock with the NFS client. Given a large enough [green, 2005-06-10, 1 file, -0/+1]
  atomic write request, it can fill the buffer cache with the entirety
  of that write in order to handle retries. However, it never drops the
  vnode lock, or else it wouldn't be atomic, so it ends up waiting
  indefinitely for more buf memory that cannot be gotten as it has it
  all, and it waits in an uncancellable state.

  To fix this, hibufspace is exported and scaled to a reasonable
  fraction. This is used as the limit of how much of an atomic write
  request by the NFS client will be handled asynchronously. If the
  request is larger than this, it will be turned into a synchronous
  request which won't deadlock the system. It's possible this value is
  far off from what is required by some, so it shall be tunable as soon
  as mount_nfs(8) learns of the new field.

  The slowdown between an asynchronous and a synchronous write on NFS
  appears to be on the order of 2x-4x.

  General nod by: gad
  MFC after: 2 weeks
  More testing: wes
  PR: kern/79208
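  A hypothetical sketch of the demotion described above (the threshold
  name nfs_async_write_max and its scaling fraction are assumptions; only
  hibufspace comes from the commit message):

      /* Cap async handling of atomic writes at a fraction of hibufspace. */
      long nfs_async_write_max = hibufspace / 4;      /* illustrative */

      if (uiop->uio_resid > nfs_async_write_max)
              ioflag |= IO_SYNC;      /* too big: fall back to synchronous */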
* Make cluster_callback() static. [phk, 2005-02-10, 1 file, -1/+0]
* Background writes are entirely an FFS/Softupdates thing. [phk, 2005-02-08, 1 file, -8/+0]
  Give FFS vnodes a specific bufwrite method which contains all the
  background write stuff and then calls into the default bufwrite() for
  the rest of the job. Remove all the background write related stuff
  from the normal bufwrite.

  This drags softdep_move_dependencies() back into FFS.

  Long term, it is worth looking at simply copying the data into
  allocated memory and issuing the bio directly, not creating the
  "shadow buf" in the first place (just like copy-on-write is done in
  snapshots, for instance). I don't think we really gain anything but
  complexity from doing this with a buf.
* Wrap the bufobj operations in macros: BO_STRATEGY() and BO_WRITE(). [phk, 2005-01-11, 1 file, -2/+2]
* /* -> /*- for license, minor formatting changes [imp, 2005-01-07, 1 file, -1/+1]
* - Eliminate the acquisition and release of the bqlock in bremfree() by [jeff, 2004-11-18, 1 file, -1/+2]
    setting the B_REMFREE flag in the buf. This is done to prevent lock
    order reversals with code that must call bremfree() with a local
    lock held. This also reduces overhead by removing two lock
    operations per buf for fsync() and similar.
  - Check for the B_REMFREE flag in brelse() and bqrelse() after the
    bqlock has been acquired so that we may remove ourselves from the
    free-list.
  - Provide a bremfreef() function to immediately remove a buf from a
    free-list, for use only by NFS. This is done because the nfsclient
    code overloads the b_freelist queue for its own async I/O queue.
  - Simplify the numfreebuffers accounting by removing a switch
    statement that executed the same code in every possible case.
  - getnewbuf() can encounter locked bufs on free-lists once Giant is
    removed. Remove a panic associated with this condition and delay
    asserts that inspect the buf until after it is locked.

  Reviewed by: phk
  Sponsored by: Isilon Systems, Inc.
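  A minimal sketch of the deferred dequeue (bqlock, bufqueues and
  b_qindex are the buffer cache's queue machinery; the fragments are
  simplified and omit the qindex bookkeeping):

      /* bremfree(): mark instead of dequeueing, avoiding bqlock here. */
      void
      bremfree(struct buf *bp)
      {
              bp->b_flags |= B_REMFREE;
      }

      /* Later, in brelse()/bqrelse(), with bqlock held: */
      if (bp->b_flags & B_REMFREE) {
              TAILQ_REMOVE(&bufqueues[bp->b_qindex], bp, b_freelist);
              bp->b_flags &= ~B_REMFREE;
      }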
* Add pbgetbo()/pbrelbo(), lighter weight versions of pbgetvp()/pbrelvp(). [phk, 2004-11-15, 1 file, -0/+2]
* Retire b_magic now; we have the bufobj containing the same hint. [phk, 2004-11-04, 1 file, -3/+0]
* Eliminate the embedded struct bio in struct buf. [phk, 2004-11-04, 1 file, -10/+9]
  Saves approx 100-170 bytes per buf depending on architecture.
* Remove the buf->b_dev field. [phk, 2004-11-04, 1 file, -1/+0]
* Make the KASSERTs in bstrategy() stop claiming to be bwrite(). [phk, 2004-11-03, 1 file, -3/+4]

  Spotted by: delphij
* Move the syncer linkage from the vnode to the bufobj. [phk, 2004-10-27, 1 file, -2/+3]
  This is not quite a perfect separation: the syncer still thinks it
  knows that everything is a vnode.
* The island council met and voted buf_prewrite() home. [phk, 2004-10-26, 1 file, -10/+0]
  Give ffs its own bufobj->bo_ops vector and create a private strategy
  routine (currently misnamed for forwards compatibility), which is just
  a copy of the generic bufstrategy routine except that we call
  softdep_disk_prewrite() directly instead of through the buf_prewrite()
  indirection.

  Teach UFS about the need for softdep_disk_prewrite() and call the
  function directly in FFS.

  Remove buf_prewrite() from the default bufstrategy() and from the
  global bio_ops method vector.
* Collapse vnode->v_object and buf->b_object into bufobj->bo_object. [phk, 2004-10-25, 1 file, -1/+2]
* Move the buffer method vector (buf->b_op) to the bufobj. [phk, 2004-10-24, 1 file, -9/+22]
  Extend it with a strategy method.

  Add bufstrategy(), which does the usual VOP_SPECSTRATEGY/VOP_STRATEGY
  song and dance.

  Rename ibwrite to bufwrite().

  Move the two NFS buf_ops to more sensible places and add bufstrategy
  to them.

  Add inlines for bwrite() and bstrategy() which call through
  buf->b_bufobj->b_ops->b_{write,strategy}().

  Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with
  bstrategy().
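  A sketch of the inlines described above (the call-through shape and
  field names follow the commit message; the real versions also carry
  assertions, omitted here):

      static __inline int
      bwrite(struct buf *bp)
      {
              /* Dispatch through the buffer's bufobj method vector. */
              return (bp->b_bufobj->b_ops->b_write(bp));
      }

      static __inline void
      bstrategy(struct buf *bp)
      {
              bp->b_bufobj->b_ops->b_strategy(bp);
      }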
* Add b_bufobj to struct buf, which eventually will eliminate the need [phk, 2004-10-22, 1 file, -3/+4]
  for b_vp.

  Initialize b_bufobj for all buffers.

  Make incore() and gbincore() take a bufobj instead of a vnode.

  Make inmem() local to vfs_bio.c.

  Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj),
  and likewise VI_MTX() to BO_MTX().

  Make buf_vlist_add() take a bufobj instead of a vnode.

  Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

  Various minor polishing: remove "register", turn panic into KASSERT,
  use new function declarations, TAILQ_FOREACH_SAFE(), etc.
* Move the VI_BWAIT flag into the bo_flag element of bufobj and call it [phk, 2004-10-21, 1 file, -2/+1]
  BO_WWAIT.

  Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the
  write count on a bufobj. bufobj_wdrop() replaces vwakeup().

  Use these functions in all relevant places except in ffs_softdep.c,
  where the use of interlocked_sleep() makes this impossible.

  Rename b_vnbufs to b_bobufs now that we touch all the relevant files
  anyway.
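  A plausible sketch of the write-count pair (the bo_numoutput field name
  is an assumption; the function names, BO_WWAIT and the vwakeup()
  replacement come from the commit message):

      void
      bufobj_wref(struct bufobj *bo)          /* a write is starting */
      {
              BO_LOCK(bo);
              bo->bo_numoutput++;
              BO_UNLOCK(bo);
      }

      void
      bufobj_wdrop(struct bufobj *bo)         /* a write completed */
      {
              BO_LOCK(bo);
              if (--bo->bo_numoutput == 0 && (bo->bo_flag & BO_WWAIT) != 0) {
                      bo->bo_flag &= ~BO_WWAIT;
                      wakeup(&bo->bo_numoutput);
              }
              BO_UNLOCK(bo);
      }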
* Give cluster_write() an explicit vnode argument. [phk, 2004-09-27, 1 file, -1/+1]
  In the future a struct buf will not automatically point out a vnode
  for us.
* Remove the unused B_WRITEINPROG flag. [phk, 2004-09-15, 1 file, -2/+2]
* Eliminate the unused second argument to reassignbuf() and simplify it [phk, 2004-07-25, 1 file, -1/+1]
  accordingly.
* Remove the advertising clause from the University of California [imp, 2004-04-07, 1 file, -4/+0]
  Regents' license, per letter dated July 22, 1999.

  Approved by: core
* Fix copy&paste-o. [phk, 2004-03-12, 1 file, -1/+1]

  Spotted by: iedowse
* When I was a kid my work table was one cluttered mess, and cleaning it [phk, 2004-03-11, 1 file, -0/+10]
  up was a rather overwhelming task. I soon learned that if you don't
  know where you're going to store something, at least try to pile it
  next to something slightly related in the hope that a pattern emerges.

  Apply the same principle to the ffs/snapshot/softupdates code which
  has leaked into specfs: Add yet another buf quasi-method and call it
  from the only two places I can see it can make a difference, and
  implement the magic in ffs_softdep.c where it belongs.

  It's not pretty, but at least it's one less layer violated.
* Properly vector all bwrite() and BUF_WRITE() calls through the same [phk, 2004-03-11, 1 file, -2/+0]
  path and s/BUF_WRITE()/bwrite()/ since it now does the same as
  bwrite().
* Send B_PHYS out to pasture; it no longer serves any function. [phk, 2003-11-15, 1 file, -1/+1]