summaryrefslogtreecommitdiffstats
path: root/sys/nfsclient
Commit message (Collapse)AuthorAgeFilesLines
* Make insmntque() externally visibile and allow it to fail (e.g. duringtegge2007-03-131-0/+11
| | | | | | | | | | | | | | | | | | | | | | | late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnode belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently setup. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib
* Back out a chance to nfs_timer() that inadvertantly crept in the last checkin :(mohans2007-03-091-1/+1
|
* Over NFS, an open() call could result in multiple over-the-wiremohans2007-03-094-2/+31
| | | | | | | | | | | | GETATTRs being generated - one from lookup()/namei() and the other from nfs_open() (for cto consistency). This change eliminates the GETATTR in nfs_open() if an otw GETATTR was done from the namei() path. Instead of extending the vop interface, we timestamp each attr load, and use this to detect whether a GETATTR was done from namei() for this syscall. Introduces a thread-local variable that counts the syscalls made by the thread and uses <pid, tid, thread syscalls> as the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on thread state that could be used as the timestamp with minimal overhead.
* Use pause() rather than tsleep() on stack variables and function pointers.jhb2007-02-271-1/+1
|
* Backing out an earlier change. It seems harmless for NFS to miss the "forcemohans2007-02-161-6/+0
| | | | | unmount" flag, making the acquisition of the MNT_ILOCK in nfs_request() and nfs_sigintr() unnecessary. Pointed out by tegge@.
* Add missing MNT_ILOCK around some mnt_kern_flag accesses.mohans2007-02-111-0/+6
|
* Fix for a vnode lock leak in nfs_create() in the event of an error.mohans2007-01-311-0/+2
| | | | Spotted by ups@.
* Instead of always hard-coding the socket type for the nfs root mount askris2007-01-301-1/+1
| | | | | | | | | | | | | SOCK_DGRAM (i.e. UDP), respect the value configured earlier. This allows TCP NFS root mounts using e.g. the boot.nfsroot.options="tcp" tunable. In this case some of the connection parameters like the retry timer were previously set appropriately for TCP but inappropriately for the UDP socket that was actually used, leading to e.g. extremely long recovery times (O(hours)) after a nfs server reboot. Reviewed by: mohans MFC After: 2 weeks
* Unstaticize nfs_iosize() in nfsclient and use it in nfs4client insteadbde2007-01-252-7/+7
| | | | | | | | | | | of duplicating it except for larger style bugs in the copy. Fix some nearby style bugs (including a harmless type mismatch) in and near the remaining copy. This is part of fixing collisions of the 2 nfs*client's names. Even static names should have a unique prefixes so that they can be debugged easily.
* Cylinder group bitmaps and blocks containing inode for a snapshotkib2007-01-231-0/+1
| | | | | | | | | | | | | | | | | | | | | file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquision. Avoid dealing with buffers in bdwrite() that are from other side of snaplock divisor in the lock order then the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used. Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)
* NetApp filers return corrupt post op attrs in the wcc on NFS error responses.mohans2006-12-111-1/+8
| | | | | | | This is easy to reproduce for EROFS. I am not sure if the attrs can be corrupt for other NFS error responses. For now, disabling wcc pre-op attr checks and post-op attr loads on NFS errors (sysctl'ed). Reported by: Kris Kennaway
* consolidate parsing of nfs root mount options in one placesam2006-12-064-51/+77
| | | | | | | and handle all options (some may require fixes elsewhere) Reviewed by: jhb, mohans MFC after: 1 month
* In nfs_nget(), we must initialize the fh in the nfsnode before inserting themohans2006-11-291-6/+6
| | | | | | vnode into the vfs hash. Otherwise, another thread walking the hash can trip on an nfsnode with an uninitialized or partially initialized fh. Thanks to ups@ for spotting this race.
* bde@ pointed out that tprintf() acquires Giant so callers of tprintf() don'tmohans2006-11-271-6/+4
| | | | | | have to explicitly acquire Giant (although they need to be aware of this and not hold any locks at that point). Remove the acquisitions of Giant in the NFS client wrapping tprintf().
* Fix for a bug caused by a race when 2 threads lookup the samemohans2006-11-271-1/+7
| | | | | | | | file. Leave the loser's lock(s) initialized, so the reclaim logic can unconditionally destroy them when that race occurs (or if the vfs hash insert happened to fail for some other reason). Thanks to ups@ for a careful review of the code. Reported by : Kris Kennaway
* 1) Fix up locking in nfs_up() and nfs_down.mohans2006-11-202-31/+40
| | | | | | | | | | | | | | | 2) Reduce the acquisitions of the Giant lock in the nfs_socket.c paths significantly. - We don't need to acquire Giant before tsleeping on lbolt anymore, since jhb specialcased lbolt handling in msleep. - nfs_up() needs to acquire Giant only if printing the "server up" message. - nfs_timer() held Giant for the duration of the NFS timer processing, just because the printing of the message in nfs_down() needed it (and we acquire other locks in nfs_timer()). The acquisition of Giant is moved down into nfs_down() now, reducing the time Giant is held in that path. Reported by: Kris Kennaway
* vfs_hash_insert() vputs() the losing vnode before returning, in the event ofmohans2006-11-161-2/+1
| | | | | a race where a duplicate vnode is entered into the vfs hash. nfs_nget() shouldn't be releasing the vnode in that case.
* Fix to readdir+ reply handling. When inserting an entry into the namecache,mohans2006-11-161-0/+2
| | | | | initialize the nfsnode's ctime. Otherwise a subsequent lookup purges the just entered namecache entry.
* honor nolockd flag in root mount optionssam2006-11-071-0/+2
| | | | MFC after: 2 weeks
* Make EWOULDBLOCK a recoverable error so that the request is retransmitted.mohans2006-10-311-2/+2
| | | | | | | This bug results in data corruption with NFS/TCP. Writes are silently dropped on EWOULDBLOCK (because socket send buffer is full and sockbuf timer fires). Reviewed by: ups@
* Fixed some style bugs (especially ones involving long lines and usebde2006-10-171-17/+19
| | | | of __P(())). There are many more.
* Don't do null Setattr RPCs for VA_MARK_ATIME. When we added thebde2006-10-141-2/+2
| | | | | | | | | | | | | | | | | | VA_MARK_ATIME feature to fix POSIX conformance fore execve() and mmap(), we thought that it was optimized well enough for the one file system that supports it (ffs) and harmless for other file systems (except layered ones which already get the layering for VOP_SETATTR() wrong). However, nfs_setattr() doesn't do much parameter checking, so when it gets a combination of parameters that it doesn't understand, it always does a Setattr RPC. This RPC can't do anything good, and for VA_MARK_ATIME it is null except for wasting a lot of time. This is the smallest and easiest to fix of several bugs that have increased the number of RPCs for kernel builds on nfs by more than 100% since 2004-11-05. The real-time increase depends on network latency and parallelization and can also be very large (approaching the same percentage for unparallelized operations like "make depend" on systems with fast CPUs and high-latency networks).
* First part of a little cleanup in the calendar/timezone/RTC handling.phk2006-10-021-0/+1
| | | | | | Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & spamce-efficient BCD routines.
* Add mnt_noasync counter to better handle interleaved calls to nmount(),tegge2006-09-261-1/+1
| | | | | | sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.
* Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.tegge2006-09-261-3/+13
| | | | | This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().
* Fixes up the handling of shared vnode lock lookups in the NFS client,mohans2006-09-135-14/+14
| | | | | | | | | | | | | | | | | | | | adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately. - amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down. Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example isn't ready for shared vnode lock lookups, crashing pretty quickly. This change is the result of discussions with Stephan Uphoff (ups@). Reviewed by: ups@
* Fix for a deadlock triggered by a 'umount -f' causing a NFS request to nevermohans2006-08-291-2/+14
| | | | | | retransmit (or return). Thanks to John Baldwin for helping nail this one. Found by : Kris Kennaway
* Fix typos in comment.thomas2006-08-161-1/+1
|
* Introduce a field to struct vm_page for storing flags that arealc2006-08-091-1/+1
| | | | | | | | | | | | | | | | synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().
* Add a new kernel environment variable "boot.netif.mtu" which is used tobrooks2006-08-091-0/+10
| | | | | | | | set the MTU prior to mounting root via NFS. This is required if the server supports a higher than default MTU because the client will not see the responses otherwise. MFC after: 3 weeks
* soreceive_generic(), and sopoll_generic(). Add new functions sosend(),rwatson2006-07-241-11/+6
| | | | | | | | | | | | | | | | soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used univerally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman
* Signals may be delivered to process as well as to the thread. Check thekib2006-07-081-1/+3
| | | | | | | | thread-delivered signals in addition to the process one. Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)
* Always supply curthread as argument to nfs_asyncio and nfs_doiokib2006-07-081-8/+2
| | | | | | | | | in nfs_strategy. Otherwise, for some buffers, signals would be ignored at the intr mounts. Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)
* There is a consensus that ifaddr.ifa_addr should never be NULL,yar2006-06-292-6/+7
| | | | | | | | | | except in places dealing with ifaddr creation or destruction; and in such special places incomplete ifaddrs should never be linked to system-wide data structures. Therefore we can eliminate all the superfluous checks for "ifa->ifa_addr != NULL" and get ready to the system crashing honestly instead of masking possible bugs. Suggested by: glebius, jhb, ru
* Use the elegant TAILQ_FOREACH() in place of a hand-rolled for() loop.yar2006-06-291-3/+1
|
* Kris Kennaway found that for '/' NFS mounts, the MPSAFE mount flag wasmohans2006-05-301-1/+2
| | | | not being set, which means Giant would be acquired for these mounts.
* Fix for a potential attempt to sleep while holding nm_mtx. Caught and reportedmohans2006-05-261-1/+1
| | | | | | by Witness (which forces the mbuf allocation flag to M_NOWAIT). Reported by: "sekes".
* Call vm_object_page_clean() with the object lock held.ups2006-05-251-0/+2
| | | | | | Submitted by: kensmith@ Reviewed by: mohans@ MFC after: 6 days
* Do not set B_NOCACHE on buffers when releasing them in flushbuflist().ups2006-05-251-0/+11
| | | | | | | | | | | | | | | If B_NOCACHE is set the pages of vm backed buffers will be invalidated. However clean buffers can be backed by dirty VM pages so invalidating them can lead to data loss. Add support for flush dirty page in the data invalidation function of some network file systems. This fixes data losses during vnode recycling (and other code paths using invalbuf(*,V_SAVE,*,*)) for data written using an mmaped file. Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@ Reviewed by: tegge@ MFC after: 7 days
* Since NFSv4 is not SMP safe, nfsiod needs to acquire Giant for NFSv4 mountsmohans2006-05-242-0/+9
| | | | | | before doing the read/write. Reported by: Chuck Lever.
* Adjust minimum iod threads from 4 to 0 -- since we compile the NFSrwatson2006-05-241-1/+1
| | | | | | | | | | | | | | | | | client into the kernel by default, and many users won't use NFS, don't start an extra 4 kernel threads that are unused. Once NFS becomes active, it will start nfsiod's as it needs them. We might consider mandating a minimum iod's equal to the number of active NFS mounts (truncated to some value), which would force some to remain available without having to create a new one if the file system is mostly inactive. PR: 70880 MFC after: 2 weeks Prodded by: cel Head nod: peter Pointed out by: Joe <fbsd_user at a1poweruser dot com>
* NFS over TCP retransmit behavior should default to a 60 second time out,cel2006-05-232-3/+9
| | | | | | | | | | | | mimicing the NFS reference implementation. NFS over TCP does not need fast retransmit timeouts, since network loss and congestion are managed by the transport (TCP), unlike with NFS over UDP. A long timeout prevents the unnecessary retransmission of non- idempotent NFS requests. Reviewed by: mohans, silby, rees? Sponsored by: Network Appliance, Incorporated
* Refactor the NFS over UDP retransmit timeout estimation logic to allowcel2006-05-233-62/+158
| | | | | | | | | | | | | the estimator to be more easily tuned and maintained. There should be no functional change except there is now a lower limit on the retransmit timeout to prevent the client from retransmitting faster than the server's disks can fill requests, and an upper limit to prevent the estimator from taking to long to retransmit during a server outage. Reviewed by: mohan, kris, silby Sponsored by: Network Appliance, Incorporated
* Vnode locks are recursive and the NFS client support shared vnode locks.mohans2006-05-231-0/+5
| | | | Found by: Kris Kennaway.
* Changes to make the NFS client MP safe.mohans2006-05-1910-450/+919
| | | | Thanks to Kris Kennaway for testing and sending lots of bugs my way.
* Fix a snafu caused while patching the previous fix from another branch.mohans2006-05-051-1/+0
|
* Fix for a NFS/TCP client bug which would cause the NFS/TCP stream to getmohans2006-05-051-0/+31
| | | | | out of sync under heavy loads, forcing frequent reconnets, causing EBADRPC errors etc.
* Keep track of the number of in-progress async direct IO writes in the nfsnode.mohans2006-04-063-5/+36
| | | | | Make fsync/close wait until all of these drain. Add a check to nfs_getpage() and nfs_putpage().
* - Busy the filesystem in nfs_statfs to prevent us from creating a newjeff2006-04-011-1/+7
| | | | | | | | | vnode after vflush() has succeeded. This would cause a dangling vnode panic at unmount time otherwise. Other filesystems may have this problem via their VFS_VGET() routines. Found by: kris Sponsored by: Isilon Systems, Inc.
* Fix a bug in the NFS/TCP retransmission path.kris2006-03-231-0/+1
| | | | | | | | | | | | | | | | | The bug was that earlier, if a request was retransmitted, we would do subsequent retransmits every 10 msecs. This can cause data corruption under moderate loads by reordering operations as seen by the client NFS attribute cache, and on the server side when the retransmission occurs after the original request has left the duplicate cache, since the operation will be committed for a second time. Further work on retransmission handling is needed (e.g. they are still being done sent too often since they are scaled by HZ, and the size of the dup cache is too small and easily overwhelmed on busy servers). Submitted by: mohans
OpenPOWER on IntegriCloud