summaryrefslogtreecommitdiffstats
path: root/sys/nfsclient/nfs_bio.c
Commit message (Collapse)AuthorAgeFilesLines
* Grab the process lock while calling psignal and before calling psignal.jhb2001-03-071-1/+5
|
* Separate the struct bio related stuff out of <sys/buf.h> intophk2000-05-051-0/+1
| | | | | | | | | | | | | | | <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter
* Complete the bio/buf divorce for all code below devfs::strategyphk2000-04-151-3/+3
| | | | | | | | | | Exceptions: Vinum untouched. This means that it cannot be compiled. Greg Lehey is on the case. CCD not converted yet, casts to struct buf (still safe) atapi-cd casts to struct buf to examine B_PHYS
* Move B_ERROR flag to b_ioflags and call it BIO_ERROR.phk2000-04-021-12/+16
| | | | | | | | | | | | | (Much of this done by script) Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED. Move b_pblkno and b_iodone_chain to struct bio while we transition, they will be obsoleted once bio structs chain/stack. Add bio_queue field for struct bio aware disksort. Address a lot of stylistic issues brought up by bde.
* Rename the existing BUF_STRATEGY() to DEV_STRATEGY()phk2000-03-201-2/+2
| | | | | | | | substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.
* Remove B_READ, B_WRITE and B_FREEBUF and replace them with a newphk2000-03-201-11/+13
| | | | | | | | | | | | | | | | | | | | | field in struct buf: b_iocmd. The b_iocmd is enforced to have exactly one bit set. B_WRITE was bogusly defined as zero giving rise to obvious coding mistakes. Also eliminate the redundant struct buf flag B_CALL, it can just as efficiently be done by comparing b_iodone to NULL. Should you get a panic or drop into the debugger, complaining about "b_iocmd", don't continue. It is likely to write on your disk where it should have been reading. This change is a step in the direction towards a stackable BIO capability. A lot of this patch were machine generated (Thanks to style(9) compliance!) Vinum users: Greg has not had time to test this yet, be careful.
* Enhance reassignbuf(). When a buffer cannot be time-optimally inserteddillon2000-01-051-5/+43
| | | | | | | | | | | | | | | | | | | into vnode dirtyblkhd we append it to the list instead of prepend it to the list in order to maintain a 'forward' locality of reference, which is arguably better then 'reverse'. The original algorithm did things this way to but at a huge time cost. Enhance the append interlock for NFS writes to handle intr/soft mounts better. Fix the hysteresis for NFS async daemon I/O requests to reduce the number of unnecessary context switches. Modify handling of NFS mount options. Any given user option that is too high now defaults to the kernel maximum for that option rather then the kernel default for that option. Reviewed by: Alfred Perlstein <bright@wintelcom.net>
* Fix two problems: First, fix the append seek position race that candillon1999-12-141-46/+122
| | | | | | | | | | | | | | | | | | | | | | | | | | | occur due to np->n_size potentially changing if nfs_getcacheblk() blocks in nfs_write(). Second, under -current we must supply the proper bufsize when obtaining buffers that straddle the EOF, but due to the fact that np->n_size can change out from under us it is possible that we may specify the wrong buffer size and wind up truncating dirty data written by another process. Both problems are solved by implementing nfs_rslock(), which allows us to lock around sensitive buffer cache operations such as those that occur when appending to a file. It is believed that this race is responsible for causing dirtyoff/dirtyend and (in stable) validoff/validend to exceed the buffer size. Therefore we have now added a warning printf for the dirtyoff/end case in current. However, we have introduced a new problem which we need to fix at some point, and that is that soft or intr NFS mounts may become uninterruptable from the point of view of process A which is stuck waiting on rslock while process B is stuck doing the rpc. To unstick process A, process B would have to be interrupted first. Reviewed by: Alfred Perlstein <bright@wintelcom.net>
* Synopsis of problem being fixed: Dan Nelson originally reported thatdillon1999-12-121-10/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | blocks of zeros could wind up in a file written to over NFS by a client. The problem only occurs a few times per several gigabytes of data. This problem turned out to be bug #3 below. bug #1: B_CLUSTEROK must be cleared when an NFS buffer is reverted from stage 2 (ready for commit rpc) to stage 1 (ready for write). Reversions can occur when a dirty NFS buffer is redirtied with new data. Otherwise the VFS/BIO system may end up thinking that a stage 1 NFS buffer is clusterable. Stage 1 NFS buffers are not clusterable. bug #2: B_CLUSTEROK was inappropriately set for a 'short' NFS buffer (short buffers only occur near the EOF of the file). Change to only set when the buffer is a full biosize (usually 8K). This bug has no effect but should be fixed in -current anyway. It need not be backported. bug #3: B_NEEDCOMMIT was inappropriately set in nfs_flush() (which is typically only called by the update daemon). nfs_flush() does a multi-pass loop but due to the lack of vnode locking it is possible for new buffers to be added to the dirtyblkhd list while a flush operation is going on. This may result in nfs_flush() setting B_NEEDCOMMIT on a buffer which has *NOT* yet gone through its stage 1 write, causing only the commit rpc to be made and thus causing the contents of the buffer to be thrown away (never sent to the server). The patch also contains some cleanup, which only applies to the commit into -current. Reviewed by: dg, julian Originally Reported by: Dan Nelson <dnelson@emsphone.com>
* useracc() the prequel:phk1999-10-291-1/+0
| | | | | | | | | | | Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.
* Add comment to clarify a commit rpc optimization already being performed.dillon1999-09-201-0/+8
|
* Asynchronized client-side nfs_commit. NFS commit operations weredillon1999-09-171-3/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | previously issued synchronously even if async daemons (nfsiod's) were available. The commit has been moved from the strategy code to the doio code in order to asynchronize it. Removed use of lastr in preparation for removal of vnode->v_lastr. It has been replaced with seqcount, which is already supported by the system and, in fact, gives us a better heuristic for sequential detection then lastr ever did. Made major performance improvements to the server side commit. The server previously fsync'd the entire file for each commit rpc. The server now bawrite()s only those buffers related to the offset/size specified in the commit rpc. Note that we do not commit the meta-data yet. This works still needs to be done. Note that a further optimization can be done (and has not yet been done) on the client: we can merge multiple potential commit rpc's into a single rpc with a greater file offset/size range and greatly reduce rpc traffic. Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>
* $Id$ -> $FreeBSD$peter1999-08-281-1/+1
|
* Add the (inline) function vm_page_undirty for clearing the dirty bitmaskalc1999-08-171-3/+3
| | | | | | | | of a vm_page. Use it. Submitted by: dillon
* nfs_getcacheblk() can return 0 if the mount is interruptible. It need to bedt1999-08-121-1/+5
| | | | | | checked by the caller. Broken in: rev. 1.70 (1999/05/02)
* Convert buffer locking from using the B_BUSY and B_WANTED flags to usingmckusick1999-06-261-1/+2
| | | | | | | lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
* Add a vnode argument to VOP_BWRITE to get rid of the last vnodemckusick1999-06-161-3/+3
| | | | | operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.
* Don't mistake a non-async block that needs to be committed for anpeter1999-06-051-2/+2
| | | | | | interrupted write. Obtained from: fvdl@NetBSD.org via OpenBSD.
* remove b_proc from struct buf, it's (now) unused.phk1999-05-061-9/+7
| | | | Reviewed by: dillon, bde
* The VFS/BIO subsystem contained a number of hacks in order to optimizealc1999-05-021-173/+240
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
* Hold nfsd's upages in-core with PHOLD rather than P_NOSWAP.peter1999-04-061-2/+2
|
* Catch a case spotted by Tor where files mmapped could leave garbage in thejulian1999-04-051-10/+46
| | | | | | | | | | | | unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>
* Reviewed by: Many at differnt times in differnt parts,julian1999-03-121-21/+36
| | | | | | | | | | | | | | | | | | | | | | including alan, john, me, luoqi, and kirk Submitted by: Matt Dillon <dillon@frebsd.org> This change implements a relatively sophisticated fix to getnewbuf(). There were two problems with getnewbuf(). First, the writerecursion can lead to a system stack overflow when you have NFS and/or VN devices in the system. Second, the free/dirty buffer accounting was completely broken. Not only did the nfs routines blow it trying to manually account for the buffer state, but the accounting that was done did not work well with the purpose of their existance: figuring out when getnewbuf() needs to sleep. The meat of the change is to kern/vfs_bio.c. The remaining diffs are all minor except for NFS, which includes both the fixes for bp interaction AND fixes for a 'biodone(): buffer already done' lockup. Sys/buf.h also contains a chaining structure which is not used by this patchset but is used by other patches that are coming soon. This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF. (sorry for the missing T matt)
* This is a rather large commit that encompasses the new swapper,dillon1999-01-211-11/+27
| | | | | | | | | | changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
* (Hopefully) fix support for "large" files. Mostly cast block numbers to off_tdt1998-12-141-13/+13
| | | | before they multiplied to block sizes.
* The "easy" fixes for compiling the kernel -Wunused: remove unreferenced staticarchie1998-12-071-3/+1
| | | | and local variables, goto labels, and functions declared but not defined.
* Remove [apparently] bogus casts to u_long for the vnode_pager_setsize()peter1998-11-091-2/+2
| | | | | | | second argument. np_size is a 64 bit int, so is the second arg. This might have caused needless 2G/4G file size problems. I believe it was Bruce who queried this.
* Mark directory buffers that have no valid data with B_INVALmckusick1998-09-291-1/+6
| | | | so that they are not put in the cache.
* When adding data to a buffer, we need to clear the B_NEEDCOMMIT flagmckusick1998-09-291-1/+2
| | | | which says that the data is on server but not committed.
* Cosmetic changes to the PAGE_XXX macros to make them consistent withdfr1998-09-041-2/+2
| | | | the other objects in vm.
* Avoid an egcs pessimization for 64-bit signed division on i386's.bde1998-06-141-2/+2
| | | | | | | | | Pre-2.8 versions of gcc generate a call to __divdi3() for all 64-bit signed divisions, but egcs optimizes them to a shift and fixup when the divisor is a constant power of 2. Unfortunately, it generates a call to __cmpdi2() for the fixup, although all except possibly ancient versions of gcc and egcs do ordinary 64-bit comparisons inline.
* Make sure we go a nfs_fsinfo() in get/putpages before callingpeter1998-06-011-30/+70
| | | | | | | | readrpc/writerpc, since they assume it's already been done. This could break if the first read/write access to a nfs filesystem was an exec() or mmap() instead of a read(), write() syscall. (or statfs()). nfs_getpages() could return an errno (EOPNOTSUPP) instead of a VM_PAGER_* return code. Some layout tweaks for the get/putpages code.
* When using NFSv3, use the remote server's idea of the maximum file sizepeter1998-05-301-2/+7
| | | | | | | | | | | | | | | | rather than assuming 2^64. It may not like files that big. :-) On the nfs server, calculate and report the max file size as the point that the block numbers in the cache would turn negative. (ie: 1099511627775 bytes (1TB)). One of the things I'm worried about however, is that directory offsets are really cookies on a NFSv3 server and can be rather large, especially when/if the server generates the opaque directory cookies by using a local filesystem offset in what comes out as the upper 32 bits of the 64 bit cookie. (a server is free to do this, it could save byte swapping depending on the native 64 bit byte order) Obtained from: NetBSD
* A cleaner fix for PR#5102, clear nonsense flags at mount time rather thanpeter1998-05-201-3/+1
| | | | | | in the core of nfs_bio.c at the 11th hour. PR: 5102
* Allow control of the attribute cache timeouts at mount time.peter1998-05-191-3/+5
| | | | | | We had run out of bits in the nfs mount flags, I have moved the internal state flags into a seperate variable. These are no longer visible via statfs(), but I don't know of anything that looks at them.
* Don't allow the readdirplus routine to be used in NFS V2.steve1998-03-281-1/+3
| | | | | | PR: 5102 Reviewed by: msmith Submitted by: Dmitry Kohmanyuk <dk@farm.org>
* Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)julian1998-03-081-1/+5
| | | | | Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
* This mega-commit is meant to fix numerous interrelated problems. Theredyson1998-03-071-22/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | has been some bitrot and incorrect assumptions in the vfs_bio code. These problems have manifest themselves worse on NFS type filesystems, but can still affect local filesystems under certain circumstances. Most of the problems have involved mmap consistancy, and as a side-effect broke the vfs.ioopt code. This code might have been committed seperately, but almost everything is interrelated. 1) Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that are fully valid. 2) Rather than deactivating erroneously read initial (header) pages in kern_exec, we now free them. 3) Fix the rundown of non-VMIO buffers that are in an inconsistent (missing vp) state. 4) Fix the disassociation of pages from buffers in brelse. The previous code had rotted and was faulty in a couple of important circumstances. 5) Remove a gratuitious buffer wakeup in vfs_vmio_release. 6) Remove a crufty and currently unused cluster mechanism for VBLK files in vfs_bio_awrite. When the code is functional, I'll add back a cleaner version. 7) The page busy count wakeups assocated with the buffer cache usage were incorrectly cleaned up in a previous commit by me. Revert to the original, correct version, but with a cleaner implementation. 8) The cluster read code now tries to keep data associated with buffers more aggressively (without breaking the heuristics) when it is presumed that the read data (buffers) will be soon needed. 9) Change to filesystem lockmgr locks so that they use LK_NOPAUSE. The delay loop waiting is not useful for filesystem locks, due to the length of the time intervals. 10) Correct and clean-up spec_getpages. 11) Implement a fully functional nfs_getpages, nfs_putpages. 12) Fix nfs_write so that modifications are coherent with the NFS data on the server disk (at least as well as NFS seems to allow.) 13) Properly support MS_INVALIDATE on NFS. 14) Properly pass down MS_INVALIDATE to lower levels of the VM code from vm_map_clean. 15) Better support the notion of pages being busy but valid, so that fewer in-transit waits occur. (use p->busy more for pageouts instead of PG_BUSY.) Since the page is fully valid, it is still usable for reads. 16) It is possible (in error) for cached pages to be busy. Make the page allocation code handle that case correctly. (It should probably be a printf or panic, but I want the system to handle coding errors robustly. I'll probably add a printf.) 17) Correct the design and usage of vm_page_sleep. It didn't handle consistancy problems very well, so make the design a little less lofty. After vm_page_sleep, if it ever blocked, it is still important to relookup the page (if the object generation count changed), and verify it's status (always.) 18) In vm_pageout.c, vm_pageout_clean had rotted, so clean that up. 19) Push the page busy for writes and VM_PROT_READ into vm_pageout_flush. 20) Fix vm_pager_put_pages and it's descendents to support an int flag instead of a boolean, so that we can pass down the invalidate bit.
* Trivial filesystem getpages/putpages implementations, set the second.msmith1998-03-061-1/+16
| | | | | These should be considered the first steps in a work-in-progress. Submitted by: Terry Lambert <terry@freebsd.org>
* Back out DIAGNOSTIC changes.eivind1998-02-061-2/+1
|
* Turn DIAGNOSTIC into a new-style option.eivind1998-02-041-1/+2
|
* Release the buffer when an error occurs while reading directory entries.tegge1998-01-311-3/+6
|
* Various NFS fixes:dyson1998-01-251-87/+78
| | | | | | | | | | | | | | | | Make vfs_bio buffer mgmt work better. Buffers were being used after brelse. Make nfs_getpages work independently of other NFS interfaces. This eliminates some difficult recursion problems and decreases pagefault overhead. Remove an erroneous vfs_unbusy_pages. Fix a reentrancy problem, with nfs_vinvalbuf when vnode is already being rundown. Reassignbuf wasn't being called when needed under certain circumstances. (Thanks to Bill Paul for help.)
* Make our v_usecount vnode reference count work identically to thedyson1998-01-061-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
* Various of the ISP users have commented that the 1.41 version of thedyson1997-12-081-115/+19
| | | | | | nfs_bio.c code worked better than the 1.44. This commit reverts the important parts of 1.44 to 1.41, and we will fix it when we can get a handle on the problem.
* unifdef -U__NetBSD__ -D__FreeBSD__phk1997-09-101-5/+1
|
* Removed unused #includes.bde1997-08-021-3/+1
|
* Avoid small synchronous writes when an application does lots of random-accessdfr1997-06-251-19/+115
| | | | short writes within a block (e.g. ld).
* Upgrade NFS to support the new vfs_bio resource/buffer management.dyson1997-06-161-1/+2
|
* Fix a problem caused by removing large numbers of files from a directorydfr1997-06-061-2/+6
| | | | which could cause a bad size to be given to uiomove, causing a page fault.
OpenPOWER on IntegriCloud