FreeBSD-src - Raptor Engineering's fork of pfsense FreeBSD src with pfSense changes

	Commit message	Author	Age	Files	Lines
*	Make insmntque() externally visible and allow it to fail (e.g. during	tegge	2007-03-13	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	late stages of unmount). On failure, the vnode is recycled. Add insmntque1(), to allow for file system specific cleanup when recycling vnode on failure. Change getnewvnode() to no longer call insmntque(). Previously, embryonic vnodes were put onto the list of vnodes belonging to a file system, which is unsafe for a file system marked MPSAFE. Change vfs_hash_insert() to no longer lock the vnode. The caller now has that responsibility. Change most file systems to lock the vnode and call insmntque() or insmntque1() after a new vnode has been sufficiently set up. Handle failed insmntque*() calls by propagating errors to callers, possibly after some file system specific cleanup. Approved by: re (kensmith) Reviewed by: kib In collaboration with: kib
*	Back out a change to nfs_timer() that inadvertently crept into the last checkin :(	mohans	2007-03-09	1	-1/+1
\|
*	Over NFS, an open() call could result in multiple over-the-wire	mohans	2007-03-09	4	-2/+31
\| \| \| \| \| \| \| \| \| \| \| \|	GETATTRs being generated - one from lookup()/namei() and the other from nfs_open() (for cto consistency). This change eliminates the GETATTR in nfs_open() if an otw GETATTR was done from the namei() path. Instead of extending the vop interface, we timestamp each attr load, and use this to detect whether a GETATTR was done from namei() for this syscall. Introduces a thread-local variable that counts the syscalls made by the thread and uses <pid, tid, thread syscalls> as the attrload timestamp. Thanks to jhb@ and peter@ for a discussion on thread state that could be used as the timestamp with minimal overhead.
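The timestamp comparison described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the struct and function names (`attr_stamp`, `node_attrs`, `attrs_loaded`, `attrs_fresh_for`) are hypothetical, and only the `<pid, tid, thread syscalls>` triple comes from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; only the <pid, tid, syscalls> triple is from the commit. */
struct attr_stamp {
	int32_t  pid;       /* process id */
	int32_t  tid;       /* thread id */
	uint64_t syscalls;  /* per-thread count of syscalls made so far */
};

struct node_attrs {
	struct attr_stamp loaded_by;  /* stamp recorded at the last attribute load */
	bool              valid;      /* false until the first load */
};

/* Record which syscall performed the attribute load (e.g. from the lookup path). */
static void
attrs_loaded(struct node_attrs *na, struct attr_stamp now)
{
	assert(na != NULL);
	na->loaded_by = now;
	na->valid = true;
}

/* True if the attributes were already fetched during this same syscall,
 * in which case the open path could skip its own GETATTR. */
static bool
attrs_fresh_for(const struct node_attrs *na, struct attr_stamp now)
{
	assert(na != NULL);
	return (na->valid &&
	    na->loaded_by.pid == now.pid &&
	    na->loaded_by.tid == now.tid &&
	    na->loaded_by.syscalls == now.syscalls);
}
```

Because the per-thread syscall counter advances on every syscall, a stamp left by an earlier open() on the same thread no longer matches, so the next open() would fetch attributes again.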
*	Use pause() rather than tsleep() on stack variables and function pointers.	jhb	2007-02-27	1	-1/+1
\|
*	Backing out an earlier change. It seems harmless for NFS to miss the "force	mohans	2007-02-16	1	-6/+0
\| \| \| \| \|	unmount" flag, making the acquisition of the MNT_ILOCK in nfs_request() and nfs_sigintr() unnecessary. Pointed out by tegge@.
*	Add missing MNT_ILOCK around some mnt_kern_flag accesses.	mohans	2007-02-11	1	-0/+6
\|
*	Fix for a vnode lock leak in nfs_create() in the event of an error.	mohans	2007-01-31	1	-0/+2
\| \| \| \|	Spotted by ups@.
*	Instead of always hard-coding the socket type for the nfs root mount as	kris	2007-01-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	SOCK_DGRAM (i.e. UDP), respect the value configured earlier. This allows TCP NFS root mounts using e.g. the boot.nfsroot.options="tcp" tunable. In this case some of the connection parameters like the retry timer were previously set appropriately for TCP but inappropriately for the UDP socket that was actually used, leading to e.g. extremely long recovery times (O(hours)) after an NFS server reboot. Reviewed by: mohans MFC After: 2 weeks
*	Unstaticize nfs_iosize() in nfsclient and use it in nfs4client instead	bde	2007-01-25	2	-7/+7
\| \| \| \| \| \| \| \| \| \| \|	of duplicating it except for larger style bugs in the copy. Fix some nearby style bugs (including a harmless type mismatch) in and near the remaining copy. This is part of fixing collisions of the 2 nfs*client's names. Even static names should have unique prefixes so that they can be debugged easily.
*	Cylinder group bitmaps and blocks containing inode for a snapshot	kib	2007-01-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	file are after snaplock, while other ffs device buffers are before snaplock in global lock order. By itself, this could cause deadlock when bdwrite() tries to flush dirty buffers on snapshotted ffs. If, during the flush, COW activity for snapshot needs to allocate block and ffs_alloccg() selects the cylinder group that is being written by bdwrite(), then kernel would panic due to recursive buffer lock acquisition. Avoid dealing with buffers in bdwrite() that are from other side of snaplock divisor in the lock order than the buffer being written. Add new BOP, bop_bdwrite(), to do dirty buffer flushing for same vnode in the bdwrite(). Default implementation, bufbdflush(), refactors the code from bdwrite(). For ffs device buffers, specialized implementation is used. Reviewed by: tegge, jeff, Russell Cattelan (cattelan xfs org, xfs changes) Tested by: Peter Holm X-MFC after: 3 weeks (if ever: it changes ABI)
*	NetApp filers return corrupt post op attrs in the wcc on NFS error responses.	mohans	2006-12-11	1	-1/+8
\| \| \| \| \| \| \|	This is easy to reproduce for EROFS. I am not sure if the attrs can be corrupt for other NFS error responses. For now, disabling wcc pre-op attr checks and post-op attr loads on NFS errors (sysctl'ed). Reported by: Kris Kennaway
*	consolidate parsing of nfs root mount options in one place	sam	2006-12-06	4	-51/+77
\| \| \| \| \| \| \|	and handle all options (some may require fixes elsewhere) Reviewed by: jhb, mohans MFC after: 1 month
*	In nfs_nget(), we must initialize the fh in the nfsnode before inserting the	mohans	2006-11-29	1	-6/+6
\| \| \| \| \| \|	vnode into the vfs hash. Otherwise, another thread walking the hash can trip on an nfsnode with an uninitialized or partially initialized fh. Thanks to ups@ for spotting this race.
*	bde@ pointed out that tprintf() acquires Giant so callers of tprintf() don't	mohans	2006-11-27	1	-6/+4
\| \| \| \| \| \|	have to explicitly acquire Giant (although they need to be aware of this and not hold any locks at that point). Remove the acquisitions of Giant in the NFS client wrapping tprintf().
*	Fix for a bug caused by a race when 2 threads lookup the same	mohans	2006-11-27	1	-1/+7
\| \| \| \| \| \| \| \|	file. Leave the loser's lock(s) initialized, so the reclaim logic can unconditionally destroy them when that race occurs (or if the vfs hash insert happened to fail for some other reason). Thanks to ups@ for a careful review of the code. Reported by : Kris Kennaway
*	1) Fix up locking in nfs_up() and nfs_down().	mohans	2006-11-20	2	-31/+40
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	2) Reduce the acquisitions of the Giant lock in the nfs_socket.c paths significantly. - We don't need to acquire Giant before tsleeping on lbolt anymore, since jhb special-cased lbolt handling in msleep. - nfs_up() needs to acquire Giant only if printing the "server up" message. - nfs_timer() held Giant for the duration of the NFS timer processing, just because the printing of the message in nfs_down() needed it (and we acquire other locks in nfs_timer()). The acquisition of Giant is moved down into nfs_down() now, reducing the time Giant is held in that path. Reported by: Kris Kennaway
*	vfs_hash_insert() vputs() the losing vnode before returning, in the event of	mohans	2006-11-16	1	-2/+1
\| \| \| \| \|	a race where a duplicate vnode is entered into the vfs hash. nfs_nget() shouldn't be releasing the vnode in that case.
*	Fix to readdir+ reply handling. When inserting an entry into the namecache,	mohans	2006-11-16	1	-0/+2
\| \| \| \| \|	initialize the nfsnode's ctime. Otherwise a subsequent lookup purges the just entered namecache entry.
*	honor nolockd flag in root mount options	sam	2006-11-07	1	-0/+2
\| \| \| \|	MFC after: 2 weeks
*	Make EWOULDBLOCK a recoverable error so that the request is retransmitted.	mohans	2006-10-31	1	-2/+2
\| \| \| \| \| \| \|	This bug results in data corruption with NFS/TCP. Writes are silently dropped on EWOULDBLOCK (because socket send buffer is full and sockbuf timer fires). Reviewed by: ups@
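The distinction this commit draws can be pictured with a small classifier. It is only a sketch under assumptions: the enum and the function name `classify_send` are hypothetical, and the real client may treat other error codes as recoverable too. The one point taken from the message is that EWOULDBLOCK must lead to retransmission rather than to the request being dropped.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical outcome type; not an identifier from the NFS client. */
enum send_outcome {
	SEND_OK,               /* request went out */
	SEND_RETRANSMIT_LATER, /* keep the request queued and resend it */
	SEND_FAILED            /* report the error to the caller */
};

/* Map the error from a socket send to what happens to the request.
 * EWOULDBLOCK (send buffer full, timer fired) is treated as recoverable,
 * so a write is never silently discarded. */
static enum send_outcome
classify_send(int error)
{
	assert(error >= 0);
	if (error == 0)
		return (SEND_OK);
	if (error == EWOULDBLOCK)
		return (SEND_RETRANSMIT_LATER);
	return (SEND_FAILED);
}
```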
*	Fixed some style bugs (especially ones involving long lines and use	bde	2006-10-17	1	-17/+19
\| \| \| \|	of __P(())). There are many more.
*	Don't do null Setattr RPCs for VA_MARK_ATIME. When we added the	bde	2006-10-14	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VA_MARK_ATIME feature to fix POSIX conformance for execve() and mmap(), we thought that it was optimized well enough for the one file system that supports it (ffs) and harmless for other file systems (except layered ones which already get the layering for VOP_SETATTR() wrong). However, nfs_setattr() doesn't do much parameter checking, so when it gets a combination of parameters that it doesn't understand, it always does a Setattr RPC. This RPC can't do anything good, and for VA_MARK_ATIME it is null except for wasting a lot of time. This is the smallest and easiest to fix of several bugs that have increased the number of RPCs for kernel builds on nfs by more than 100% since 2004-11-05. The real-time increase depends on network latency and parallelization and can also be very large (approaching the same percentage for unparallelized operations like "make depend" on systems with fast CPUs and high-latency networks).
*	First part of a little cleanup in the calendar/timezone/RTC handling.	phk	2006-10-02	1	-0/+1
\| \| \| \| \| \|	Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & space-efficient BCD routines.
*	Add mnt_noasync counter to better handle interleaved calls to nmount(),	tegge	2006-09-26	1	-1/+1
\| \| \| \| \| \|	sync() and sync_fsync() without losing MNT_ASYNC. Add MNTK_ASYNC flag which is set only when MNT_ASYNC is set and mnt_noasync is zero, and check that flag instead of MNT_ASYNC before initiating async io.
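The counter-plus-derived-flag idea in this commit can be sketched as below. The `x_`/`X_` names and the flag values are hypothetical stand-ins, and the mount interlock that would protect these fields in the kernel is omitted; only the rule "MNTK_ASYNC is set only when MNT_ASYNC is set and mnt_noasync is zero" comes from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MNT_ASYNC and MNTK_ASYNC; the values are made up. */
#define X_MNT_ASYNC   0x0001u  /* async I/O requested for the mount */
#define X_MNTK_ASYNC  0x0002u  /* async I/O currently permitted */

struct x_mount {
	unsigned flag;       /* user-visible flags */
	unsigned kern_flag;  /* kernel-internal flags */
	int      noasync;    /* callers currently forbidding async I/O */
};

/* Recompute the derived flag from the requested flag and the counter. */
static void
x_update_async(struct x_mount *mp)
{
	assert(mp != NULL);
	if ((mp->flag & X_MNT_ASYNC) != 0 && mp->noasync == 0)
		mp->kern_flag |= X_MNTK_ASYNC;
	else
		mp->kern_flag &= ~X_MNTK_ASYNC;
}

/* A sync-like operation brackets its work with enter/exit. */
static void
x_noasync_enter(struct x_mount *mp)
{
	mp->noasync++;
	x_update_async(mp);
}

static void
x_noasync_exit(struct x_mount *mp)
{
	assert(mp->noasync > 0);
	mp->noasync--;
	x_update_async(mp);
}

/* I/O paths test the derived flag rather than the requested one. */
static int
x_async_allowed(const struct x_mount *mp)
{
	return ((mp->kern_flag & X_MNTK_ASYNC) != 0);
}
```

With a counter, two interleaved callers can each suppress async I/O and the requested flag survives untouched; async I/O resumes only after the last of them exits.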
*	Use mount interlock to protect all changes to mnt_flag and mnt_kern_flag.	tegge	2006-09-26	1	-3/+13
\| \| \| \| \|	This eliminates a race where MNT_UPDATE flag could be lost when nmount() raced against sync(), sync_fsync() or quotactl().
*	Fixes up the handling of shared vnode lock lookups in the NFS client,	mohans	2006-09-13	5	-14/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	adds a FS type specific flag indicating that the FS supports shared vnode lock lookups, adds some logic in vfs_lookup.c to test this flag and set lock flags appropriately. - amd on 6.x is a non-starter (without this change). Using amd under heavy load results in a deadlock (with cascading vnode locks all the way to the root) very quickly. - This change should also fix the more general problem of cascading vnode deadlocks when an NFS server goes down. Ideally, we wouldn't need these changes, as enabling shared vnode lock lookups globally would work. Unfortunately, UFS, for example isn't ready for shared vnode lock lookups, crashing pretty quickly. This change is the result of discussions with Stephan Uphoff (ups@). Reviewed by: ups@
*	Fix for a deadlock triggered by a 'umount -f' causing a NFS request to never	mohans	2006-08-29	1	-2/+14
\| \| \| \| \| \|	retransmit (or return). Thanks to John Baldwin for helping nail this one. Found by : Kris Kennaway
*	Fix typos in comment.	thomas	2006-08-16	1	-1/+1
\|
*	Introduce a field to struct vm_page for storing flags that are	alc	2006-08-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	synchronized by the lock on the object containing the page. Transition PG_WANTED and PG_SWAPINPROG to use the new field, eliminating the need for holding the page queues lock when setting or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to VPO_WANTED and VPO_SWAPINPROG, respectively. Eliminate the assertion that the page queues lock is held in vm_page_io_finish(). Eliminate the acquisition and release of the page queues lock around calls to vm_page_io_finish() in kern_sendfile() and vfs_unbusy_pages().
*	Add a new kernel environment variable "boot.netif.mtu" which is used to	brooks	2006-08-09	1	-0/+10
\| \| \| \| \| \| \| \|	set the MTU prior to mounting root via NFS. This is required if the server supports a higher than default MTU because the client will not see the responses otherwise. MFC after: 3 weeks
*	soreceive_generic(), and sopoll_generic(). Add new functions sosend(),	rwatson	2006-07-24	1	-11/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	soreceive(), and sopoll(), which are wrappers for pru_sosend, pru_soreceive, and pru_sopoll, and are now used universally by socket consumers rather than either directly invoking the old so*() functions or directly invoking the protocol switch method (about an even split prior to this commit). This completes an architectural change that was begun in 1996 to permit protocols to provide substitute implementations, as now used by UDP. Consumers now uniformly invoke sosend(), soreceive(), and sopoll() to perform these operations on sockets -- in particular, distributed file systems and socket system calls. Architectural head nod: sam, gnn, wollman
*	Signals may be delivered to process as well as to the thread. Check the	kib	2006-07-08	1	-1/+3
\| \| \| \| \| \| \| \|	thread-delivered signals in addition to the process one. Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)
*	Always supply curthread as argument to nfs_asyncio and nfs_doio	kib	2006-07-08	1	-8/+2
\| \| \| \| \| \| \| \| \|	in nfs_strategy. Otherwise, for some buffers, signals would be ignored at the intr mounts. Reviewed by: mohan MFC after: 1 month Approved by: kan (mentor)
*	There is a consensus that ifaddr.ifa_addr should never be NULL,	yar	2006-06-29	2	-6/+7
\| \| \| \| \| \| \| \| \| \|	except in places dealing with ifaddr creation or destruction; and in such special places incomplete ifaddrs should never be linked to system-wide data structures. Therefore we can eliminate all the superfluous checks for "ifa->ifa_addr != NULL" and get ready for the system crashing honestly instead of masking possible bugs. Suggested by: glebius, jhb, ru
*	Use the elegant TAILQ_FOREACH() in place of a hand-rolled for() loop.	yar	2006-06-29	1	-3/+1
\|
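For readers unfamiliar with the macro, the two loop styles compare as follows. This is a generic sketch using `<sys/queue.h>`; the `item` type and the summing functions are hypothetical and unrelated to the list touched by the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical element type for illustration. */
struct item {
	int               value;
	TAILQ_ENTRY(item) link;  /* list linkage */
};
TAILQ_HEAD(item_list, item);

/* Hand-rolled traversal with explicit first/next steps. */
static int
sum_hand_rolled(struct item_list *head)
{
	struct item *it;
	int sum = 0;

	assert(head != NULL);
	for (it = TAILQ_FIRST(head); it != NULL; it = TAILQ_NEXT(it, link))
		sum += it->value;
	return (sum);
}

/* The same traversal expressed with TAILQ_FOREACH(). */
static int
sum_foreach(struct item_list *head)
{
	struct item *it;
	int sum = 0;

	assert(head != NULL);
	TAILQ_FOREACH(it, head, link)
		sum += it->value;
	return (sum);
}
```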
*	Kris Kennaway found that for '/' NFS mounts, the MPSAFE mount flag was	mohans	2006-05-30	1	-1/+2
\| \| \| \|	not being set, which means Giant would be acquired for these mounts.
*	Fix for a potential attempt to sleep while holding nm_mtx. Caught and reported	mohans	2006-05-26	1	-1/+1
\| \| \| \| \| \|	by Witness (which forces the mbuf allocation flag to M_NOWAIT). Reported by: "sekes".
*	Call vm_object_page_clean() with the object lock held.	ups	2006-05-25	1	-0/+2
\| \| \| \| \| \|	Submitted by: kensmith@ Reviewed by: mohans@ MFC after: 6 days
*	Do not set B_NOCACHE on buffers when releasing them in flushbuflist().	ups	2006-05-25	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If B_NOCACHE is set the pages of vm backed buffers will be invalidated. However clean buffers can be backed by dirty VM pages so invalidating them can lead to data loss. Add support for flushing dirty pages in the data invalidation function of some network file systems. This fixes data losses during vnode recycling (and other code paths using invalbuf(,V_SAVE,,*)) for data written using an mmaped file. Collaborative effort by: jhb@,mohans@,peter@,ps@,ups@ Reviewed by: tegge@ MFC after: 7 days
*	Since NFSv4 is not SMP safe, nfsiod needs to acquire Giant for NFSv4 mounts	mohans	2006-05-24	2	-0/+9
\| \| \| \| \| \|	before doing the read/write. Reported by: Chuck Lever.
*	Adjust minimum iod threads from 4 to 0 -- since we compile the NFS	rwatson	2006-05-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	client into the kernel by default, and many users won't use NFS, don't start an extra 4 kernel threads that are unused. Once NFS becomes active, it will start nfsiod's as it needs them. We might consider mandating a minimum iod's equal to the number of active NFS mounts (truncated to some value), which would force some to remain available without having to create a new one if the file system is mostly inactive. PR: 70880 MFC after: 2 weeks Prodded by: cel Head nod: peter Pointed out by: Joe <fbsd_user at a1poweruser dot com>
*	NFS over TCP retransmit behavior should default to a 60 second time out,	cel	2006-05-23	2	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \|	mimicking the NFS reference implementation. NFS over TCP does not need fast retransmit timeouts, since network loss and congestion are managed by the transport (TCP), unlike with NFS over UDP. A long timeout prevents the unnecessary retransmission of non-idempotent NFS requests. Reviewed by: mohans, silby, rees? Sponsored by: Network Appliance, Incorporated
*	Refactor the NFS over UDP retransmit timeout estimation logic to allow	cel	2006-05-23	3	-62/+158
\| \| \| \| \| \| \| \| \| \| \| \| \|	the estimator to be more easily tuned and maintained. There should be no functional change except there is now a lower limit on the retransmit timeout to prevent the client from retransmitting faster than the server's disks can fill requests, and an upper limit to prevent the estimator from taking too long to retransmit during a server outage. Reviewed by: mohan, kris, silby Sponsored by: Network Appliance, Incorporated
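The lower and upper limits mentioned here amount to clamping the estimator's output. The sketch below shows only that clamping step; the function name `clamp_rto`, the tick units, and any bounds passed to it are assumptions, since the commit message does not state the actual limits.

```c
#include <assert.h>

/* Clamp an estimated retransmit timeout (in ticks) to [min_ticks, max_ticks].
 * Hypothetical helper; the real estimator and its bounds are not shown in the log. */
static int
clamp_rto(int estimate_ticks, int min_ticks, int max_ticks)
{
	assert(min_ticks <= max_ticks);
	if (estimate_ticks < min_ticks)
		return (min_ticks);  /* never retransmit faster than the floor */
	if (estimate_ticks > max_ticks)
		return (max_ticks);  /* never wait longer than the ceiling */
	return (estimate_ticks);
}
```

For example, with illustrative bounds of 10 and 600 ticks, an estimate of 3 becomes 10 and an estimate of 5000 becomes 600, while values in between pass through unchanged.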
*	Vnode locks are recursive and the NFS client supports shared vnode locks.	mohans	2006-05-23	1	-0/+5
\| \| \| \|	Found by: Kris Kennaway.
*	Changes to make the NFS client MP safe.	mohans	2006-05-19	10	-450/+919
\| \| \| \|	Thanks to Kris Kennaway for testing and sending lots of bugs my way.
*	Fix a snafu caused while patching the previous fix from another branch.	mohans	2006-05-05	1	-1/+0
\|
*	Fix for a NFS/TCP client bug which would cause the NFS/TCP stream to get	mohans	2006-05-05	1	-0/+31
\| \| \| \| \|	out of sync under heavy loads, forcing frequent reconnects, causing EBADRPC errors etc.
*	Keep track of the number of in-progress async direct IO writes in the nfsnode.	mohans	2006-04-06	3	-5/+36
\| \| \| \| \|	Make fsync/close wait until all of these drain. Add a check to nfs_getpage() and nfs_putpage().
*	- Busy the filesystem in nfs_statfs to prevent us from creating a new	jeff	2006-04-01	1	-1/+7
\| \| \| \| \| \| \| \| \|	vnode after vflush() has succeeded. This would cause a dangling vnode panic at unmount time otherwise. Other filesystems may have this problem via their VFS_VGET() routines. Found by: kris Sponsored by: Isilon Systems, Inc.
*	Fix a bug in the NFS/TCP retransmission path.	kris	2006-03-23	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The bug was that earlier, if a request was retransmitted, we would do subsequent retransmits every 10 msecs. This can cause data corruption under moderate loads by reordering operations as seen by the client NFS attribute cache, and on the server side when the retransmission occurs after the original request has left the duplicate cache, since the operation will be committed for a second time. Further work on retransmission handling is needed (e.g. they are still being sent too often since they are scaled by HZ, and the size of the dup cache is too small and easily overwhelmed on busy servers). Submitted by: mohans