| Commit message (Collapse) | Author | Age | Files | Lines |
| |
After a nullfs rmdir operation, reclaim the directory vnode that was
unlinked. Otherwise the vnode stays cached, causing a leak. This is
similar to r292961 for regular files.
Approved by: re (marius)
| |
Clear the cookie pointer on error in tmpfs_readdir().
Approved by: re (glebius)
| |
ext2fs: Remove panics for rename() race conditions.
Sync with r84642 from UFS:
The panics are inappropriate because the IN_RENAME flag only fixes a
few of the huge number of race conditions that can result in the
source path becoming invalid even prior to the VOP_RENAME() call.
Approved by: re (glebius)
| |
When devfs dirent is freed, a vnode might still keep a pointer to it,
apparently. Interlock and clear the pointer to avoid free memory
dereference.
Approved by: re (marius)
| |
Revert r294695; passthrough any extra timestamps to the dinode struct.
The original ext2fs change worked fine on disks formatted with default
values, but it was the cause of a regression when inodes are small.
Revert it for now, while we figure out safer ways to pass such values.
PR: 206820
Approved by: re
| |
ext2fs: passthrough any extra timestamps to the dinode struct.
In general we don't trust any of the extended timestamps unless the
EXT2F_ROCOMPAT_EXTRA_ISIZE feature is set. However, in the case where
we freshly allocated a new inode the information is valid and it is
better to pass it along instead of leaving the value undefined.
This should have no practical effect but should reduce the amount of
garbage if EXT2F_ROCOMPAT_EXTRA_ISIZE is set, like in cases where the
filesystem is converted from ext3 to ext4.
| |
bufobj lock
Add locking around access to bv_cnt, which is currently done unlocked.
Approved by: jhb
Sponsored by: Panasas, Inc.
| |
Hide transient EBADF errors caused by the parallel revoke(2) or forced
unmount of devfs mounts, by restarting the failed syscall.
| |
286974:
Remove reference to non-existent kern_openat(9).
291653:
The cdevpriv_dtr_t typedef was not able to be used in a function prototype
like the various d_*_t typedefs since it declared a function pointer rather
than a function. Add a new d_priv_dtor_t typedef that declares the function
and can be used as a function prototype. The previous typedef wasn't
useful outside of the cdevpriv implementation, so retire it.
The name d_priv_dtor_t was chosen to be more consistent with cdev methods
since it is commonly used in place of d_close_t even though it is not a
direct pointer in struct cdevsw.
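
The difference between a typedef that declares a function pointer and one that declares a function can be sketched in plain C as follows. All names starting with `example_` are hypothetical and are not the kernel's identifiers; this only illustrates why a function typedef can serve as a prototype while a pointer typedef cannot.

```c
#include <assert.h>
#include <stddef.h>

/* Function-pointer typedef: names a pointer type, so it cannot declare a function. */
typedef void (*example_dtr_ptr_t)(void *data);

/* Function typedef: names the function type itself. */
typedef void example_dtor_t(void *data);

/* The function typedef can be used as a prototype, in the style of d_*_t. */
static example_dtor_t example_dtor;

static int example_dtor_calls;

/* The definition still has to spell out the parameter list. */
static void
example_dtor(void *data)
{
	(void)data;
	example_dtor_calls++;
}

/* Storing the callback uses a pointer to the function type. */
struct example_priv {
	void		*data;
	example_dtor_t	*dtor;
};

/* Run the destructor at most once, then forget it. */
static void
example_priv_release(struct example_priv *p)
{
	assert(p != NULL);
	if (p->dtor != NULL)
		p->dtor(p->data);
	p->dtor = NULL;
}
```

Trying to write `static example_dtr_ptr_t example_dtor;` instead would declare a pointer variable, not a function, which is the limitation the commit message describes.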
| |
ext4: add support for reading sparse files
Our older GCC can't handle anonymous unions, so
ia64 and powerpc LINT kernels are now failing.
| |
ext4: add support for reading sparse files
Add support for sparse files in ext4. Also implement read-ahead, which
greatly increases the performance when transferring files from ext4.
The sparse file support has become very common in ext4.
Both features implemented by Damjan Jovanovic.
PR: 205816
| |
Change the type of the newsize argument in the smbfs_smb_setfsize() function
from int to int64.
MSDN says that SMB_SET_FILE_END_OF_FILE_INFO uses a signed 64-bit integer
to specify the offset, but since smbfs_smb_setfsize() used a plain int,
the value was truncated when the offset was larger than 2G.
https://msdn.microsoft.com/en-us/library/ff469975.aspx
In particular, now `truncate -s 10G` will work correctly on the mounted
SMB share.
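
The truncation can be reproduced in userland with a small sketch. The helper names are hypothetical, and an unsigned 32-bit type stands in for the plain int of the old prototype so that the wraparound is well-defined C; the kernel code itself is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of a file size passed through a 32-bit parameter.
 * Only the low 32 bits of the requested size survive.
 */
static int64_t
ex_size_via_32bit(int64_t newsize)
{
	assert(newsize >= 0);
	uint32_t narrow = (uint32_t)newsize;	/* drops the high 32 bits */
	return ((int64_t)narrow);
}

/* Hypothetical model of the widened parameter: the size passes through intact. */
static int64_t
ex_size_via_64bit(int64_t newsize)
{
	assert(newsize >= 0);
	return (newsize);
}
```

With these helpers, a 10G request (`10LL << 30`) comes back as 2147483648 from the 32-bit path and as 10737418240 from the 64-bit path, while sizes below 2G are unaffected by either.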
| |
ext4: mount panic from freeing invalid pointers
Initialize the struct with those fields to zeroes on allocation,
preventing the panic.
Patch by: Damjan Jovanovic.
PR: 206056
| |
ext2fs: reading mmapped file in Ext4 causes panic
Always call brelse(path.ep_bp), fixing reading EXT4 files using mmap().
Patch by Damjan Jovanovic.
PR: 205938
| |
Hide vfs.pfs.trace variable if it is not used.
| |
MFC r275121 (by kib). Only merge the syntax changes from r275121,
PROC_*LOCK() macros still lock the same proc spinlock.
The process spin lock currently has the following distinct uses:
- Threads lifetime cycle, in particular, counting of the threads in
the process, and interlocking with process mutex and thread lock.
The main reason for this is that turnstile locks are after thread
locks, so you e.g. cannot unlock blockable mutex (think process
mutex) while owning thread lock.
- Virtual and profiling itimers, since the timers activation is done
from the clock interrupt context. Replace the p_slock by p_itimmtx
and PROC_ITIMLOCK().
- Profiling code (profil(2)), for similar reason. Replace the p_slock
by p_profmtx and PROC_PROFLOCK().
- Resource usage accounting. Need for the spinlock there is subtle,
my understanding is that spinlock blocks context switching for the
current thread, which prevents td_runtime and similar fields from
changing (updates are done at the mi_switch()). Replace the p_slock
by p_statmtx and PROC_STATLOCK().
Discussed with: kib
| |
Minor style cleanup.
| |
ext2: recognize ext4 INCOMPAT_RECOVER flag
This is a flag specific for journalling in ext4.
Add it to the list of ext4 features we ignore for
read-only purposes.
PR: 205668
| |
Force nullfs vnode reclaim after unlinking, to potentially unlink
lower vnode.
| |
This MFC includes changes to better manage the vnode freelist
and to streamline the allocation and freeing of vnodes.
Note that to maintain the KPI the VI_AGE flag is left defined
in sys/vnode.h though its use is dropped as described in 291380.
To maintain KBI the vfs.vlru_alloc_cache_src sysctl variable
remains though it no longer has any effect as described in 291244.
MFC of 291244:
Move the comment about resident pages preventing vnode from leaving
active list, into the header comment for vdrop(), which is the
function that decides whether to leave the vnode on the list. Note
that dirty page write-out in vinactive() is asynchronous.
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC of 291380:
Remove VI_AGE vnode iflag, it is unused.
Noted by: bde
Sponsored by: The FreeBSD Foundation
MFC of 291459:
For performance reasons, it is useful to have a single string used as
the name of a filesystem when setting it as the first parameter to the
getnewvnode() function. Most filesystems call getnewvnode from just one
place so can use a literal string as the first parameter. However, NFS
calls getnewvnode from two places, so we create a global constant string
that can be used by the two instances. This change also collapses two
instances of getnewvnode() in the UFS filesystem to a single call.
Reviewed by: kib
Tested by: Peter Holm
MFC of 291460:
As the kernel allocates and frees vnodes, it fully initializes them
on every allocation and fully releases them on every free. These
are not trivial costs: it starts by zeroing a large structure then
initializes a mutex, a lock manager lock, an rw lock, four lists,
and six pointers. And looking at vfs.vnodes_created, these operations
are being done millions of times an hour on a busy machine.
As a performance optimization, this code update uses the uma_init
and uma_fini routines to do these initializations and cleanups only
as the vnodes enter and leave the vnode_zone. With this change the
initializations are only done kern.maxvnodes times at system startup
and then only rarely again. The frees are done only if the vnode_zone
shrinks which never happens in practice. For those curious about the
avoided work, look at the vnode_init() and vnode_fini() functions in
kern/vfs_subr.c to see the code that has been removed from the main
vnode allocation/free path.
Reviewed by: kib
Tested by: Peter Holm
MFC of 291671:
We need to zero out the union of pointers in a freed vnode structure.
Fix from: Mateusz Guzik
Tested by: Jason Unovitch
MFC of 291743:
We need to zero out the clustering variables in a freed vnode structure.
For completeness add a VNASSERT that there are no threads waiting on a
range lock (this was previously checked on every vnode free).
Reported by: Rick Macklem
Fix from: Mateusz Guzik
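
The idea behind the 291460 change, doing the expensive setup only when an object first enters the zone and resetting only per-use state on free, can be sketched with a small userland free-list pool. This is not the UMA API: `struct obj`, `struct pool`, and the `pool_*` functions are hypothetical stand-ins, and the `inits` counter exists only to make the effect visible.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct obj {
	struct obj	*next;		/* free-list link */
	int		 lock_ready;	/* stands in for mutex/lock/list setup */
	int		 payload;	/* per-use state */
};

struct pool {
	struct obj	*free;		/* cached, already-initialized objects */
	unsigned long	 inits;		/* how many times the expensive setup ran */
};

/* Expensive one-time setup, run only when an object enters the pool. */
static void
obj_init(struct pool *pl, struct obj *o)
{
	memset(o, 0, sizeof(*o));
	o->lock_ready = 1;
	pl->inits++;
}

/* Reuse a cached object when possible; initialize only fresh memory. */
static struct obj *
pool_alloc(struct pool *pl)
{
	struct obj *o = pl->free;

	if (o != NULL) {
		pl->free = o->next;
		o->next = NULL;
		return (o);
	}
	o = malloc(sizeof(*o));
	if (o == NULL)
		return (NULL);
	obj_init(pl, o);
	return (o);
}

/* Freeing keeps the setup intact but must clear the per-use fields. */
static void
pool_free(struct pool *pl, struct obj *o)
{
	assert(o->lock_ready);
	o->payload = 0;
	o->next = pl->free;
	pl->free = o;
}
```

Because cached objects skip `obj_init()`, any field left dirty at free time leaks into the next user, which is the class of problem the 291671 and 291743 follow-ups address by zeroing fields in the freed structure.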
| |
Keep devfs mount locked for the whole duration of the devfs_setattr(),
and ensure that our dirent is instantiated.
| |
Fix the memory leak that occurs when the nfscommon.ko module is unloaded.
This leak was introduced by r291527 (r292223 in stable/10).
Since the nfscommon.ko module is rarely unloaded, this leak would not
have been much of an issue.
| |
Add kernel support to the NFS server for the "-manage-gids"
option that will be added to the nfsuserd daemon in a future
commit. It modifies the cache used by NFSv4 for name<-->id
translation (both username/uid and group/gid) to support this.
When "-manage-gids" is set, the server looks up each uid
for the RPC and uses the list of groups cached in the server
instead of the list of groups provided in the RPC request.
The cached group list is acquired for the cache by the nfsuserd
daemon via getgrouplist(3).
This avoids the 16 groups limit for the list in the RPC request.
Since the cache is now used for every RPC when "-manage-gids"
is enabled, the code also modifies the cache to use a separate
mutex for each hash list instead of a single global mutex.
| |
When the nfsd threads are terminated, the NFSv4 server state
(opens, locks, etc) is retained, which I believe is correct behaviour.
However, for NFSv4.1, the server also retained a reference to the xprt
(RPC transport socket structure) for the backchannel. This caused
svcpool_destroy() to not call SVC_DESTROY() for the xprt and allowed
a socket upcall to occur after the mutexes in the svcpool were destroyed,
causing a crash.
This patch fixes the code so that the backchannel xprt structure is
dereferenced just before svcpool_destroy() is called, so the code
does do an SVC_DESTROY() on the xprt, which shuts down the socket upcall.
| |
Revert r283330 since it broke directory caching in the client.
At this time I cannot see a way to fix directory caching when it
has partial blocks in the buffer cache, due to the fact that the
syscall's uio_offset won't stay the same as the lblkno * NFS_DIRBLKSIZ
offset.
| |
mnt_stat.f_iosize (which is used to set bo_bsize) must be set to
the largest size of buffer cache block or the mapping of the buffer
is bogus. When a mount with rsize=4096,wsize=4096 was done, f_iosize
would be set to 4096. This resulted in corrupted directory data, since
the buffer cache block size for directories is NFS_DIRBLKSIZ (8192).
This patch fixes the code so that it always sets f_iosize to at least
NFS_DIRBLKSIZ.
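
The lower bound can be illustrated with a short sketch. The 8192 value comes from the commit message; taking the larger of rsize and wsize as the starting point is an assumption made for the example, as are the `EX_`/`ex_` names, and the real client code may derive the base value differently.

```c
#include <assert.h>

#define	EX_NFS_DIRBLKSIZ	8192	/* directory block size stated above */

/*
 * Hypothetical helper: pick an I/O size that is never smaller than the
 * directory block size, so directory buffers are always mapped correctly.
 */
static int
ex_iosize(int rsize, int wsize)
{
	int iosize;

	assert(rsize > 0 && wsize > 0);
	iosize = rsize > wsize ? rsize : wsize;	/* assumed starting point */
	if (iosize < EX_NFS_DIRBLKSIZ)
		iosize = EX_NFS_DIRBLKSIZ;
	return (iosize);
}
```

For the mount in the commit message, `ex_iosize(4096, 4096)` yields 8192 rather than 4096, while larger transfer sizes pass through unchanged.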
| |
After r286237 it should be fine to call vgone(9) on a busy GEOM vnode;
remove KASSERT that would prevent forced devfs unmount from working.
Sponsored by: The FreeBSD Foundation
| |
Provide vnode in memory map info for files on tmpfs
When providing memory map information to userland, populate the vnode pointer
for tmpfs files. Set the memory mapping to appear as a vnode type, to match
FreeBSD 9 behavior.
This fixes the use of tmpfs files with the dtrace pid provider,
procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY).
Submitted by: Eric Badger <eric@badgerio.us> (initial revision)
Obtained from: Dell Inc.
PR: 198431
| |
Ensure that when a blockable open of fifo returns success, a valid
file descriptor opened for complementary access exists as well.
| |
Various fixes to orphan handling which also fix issues with following
forks.
283281:
Always set p_oppid when attaching to an existing process via procfs
tracing. This matches the behavior of ptrace(PT_ATTACH). Also,
the procfs detach request assumes p_oppid is always set.
283282:
Only reparent a traced process to its old parent if the tracing process is
not the old parent. Otherwise, proc_reap() will leave the zombie in place
resulting in the process' status being returned twice to its parent.
Add test cases for PT_TRACE_ME and PT_ATTACH which are fixed by
this change.
283562:
Do not allow a process to reap an orphan (a child currently being
traced by another process such as a debugger). The parent process does
need to check for matching orphan pids to avoid returning ECHILD if an
orphan has exited, but it should not return the exited status for the
child until after the debugger has detached from the orphan process
either explicitly or implicitly via wait().
Add two tests for this case: one where the debugger is the direct
child (thus the parent has a non-empty children list) and one where
the debugger is not a direct child (so the only "child" of the parent
is the orphan).
283647:
Tweak the description of when waitpid() doesn't return any status for a
non-blocking wait to avoid the word "empty".
283836:
Consistently only use one end of the pipe in the parent and debugger
processes and do not rely on EOF due to a close() in the debugger.
284000:
Add a CHILD_REQUIRE macro similar to ATF_REQUIRE for use in child processes
of the main test process.
286158:
Clear P_TRACED before reparenting a detached process back to its
original parent. Otherwise the debuggee will be set as an orphan of
the debugger.
Add tests for tracing forks via PT_FOLLOW_FORK.
| |
For the case where an NFSv4.1 ExchangeID operation has the client identifier
that already has a confirmed ClientID, the nfsrv_setclient() function would
not fill in the clientidp being returned. As such, the value of ClientID
returned would be whatever garbage was on the stack.
This patch fixes the problem by filling in these fields.
| |
This patch fixes a problem where, if the NFSv4 server has a previous
unconfirmed clientid structure for the same client on the last hash list,
this old entry would not be removed/deleted. I do not think this bug would have
caused serious problems, since the new entry would have been before the old one
on the list. This old entry would have eventually been scavenged/removed.
| |
If a "principal" argument isn't provided for a Kerberized NFS mount,
the kernel would generate a bogus one with a ":/<path>" suffix.
This would only occur for the case where there was no explicit
"principal" argument and the getaddrinfo() call in mount_nfs.c failed to
return a canonical name for the server.
This patch fixes this unusual case.
| |
Alex Burlyga reported a POLA violation for the new NFS client as
compared to the old NFS client via email to the freebsd-fs@ mailing list.
For the new client, when multiple clients attempted to create a symbolic
link concurrently, more than one client would report success instead of
EEXIST. This was caused by code in the new client that mapped EEXIST to
OK assuming it was caused by a retried RPC request.
Since the old client did not do this, the patch defaults to the old
behaviour and permits the new behaviour to be enabled via a sysctl.
| |
Restore the td_cookie value upon detach.
| |
Clear p_stops when doing PT_DETACH and PROCFS_CTL_DETACH.
Without this, if a process was being traced by truss(1), which
uses different p_stops bits than gdb(1), the latter would
misbehave because of the unexpected bits.
Reported by: jceel
Submitted by: sef
Sponsored by: iXsystems, Inc.
| |
Make the NFS server use shared vnode locks for a few cases
that are allowed by the VFS/VOP interface instead of using
exclusive locks.
| |
head as r284216 are set via /boot/loader.conf in stable/10.
This is a direct commit to stable/10 because TUNABLE_INT()
is deprecated in head.
| |
Make the size of the hash tables used by the NFSv4 server tunable.
No appreciable change in performance was observed after increasing
the sizes of these tables and then testing with a single client.
However, there was an email that indicated high CPU overheads for
a heavily loaded NFSv4 server and it is hoped that increasing the sizes
of the hash tables via these tunables might help.
The tables remain the same size by default.
| |
Perform SU cleanup in the AST handler. Do not sleep waiting for SU cleanup
while owning vnode lock.
On MFC, for KBI stability, td_su member was moved to the end of the
struct thread.
| |
The NFS client generated directory block(s) with d_fileno == 0
so that it would not return less data than requested.
Since returning less directory data than requested is not a problem
for FreeBSD and even UFS no longer returns directory structures
with d_fileno == 0, this patch stops the client from doing this.
Although entries with d_fileno == 0 should not be a problem,
the man pages no longer document that these entries should be
ignored, so there was a concern that these entries might be an
issue in the future.
| |
The NFS client wasn't handling getdirentries(2) requests for sizes
that are not an exact multiple of DIRBLKSIZ correctly. Fortunately
readdir(3) always uses an exact multiple of DIRBLKSIZ, so few applications
were affected. This patch fixes this problem by reducing the size
of the directory read to an exact multiple of DIRBLKSIZ.
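
Reducing a request to an exact multiple of the block size is a simple round-down, sketched below. The helper name is hypothetical and the block size is passed as a parameter rather than hard-coded, since the commit message does not state the value of DIRBLKSIZ; how the client treats a request smaller than one block is also not covered here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: trim a requested byte count down to the largest
 * multiple of blksiz that does not exceed it.
 */
static size_t
ex_round_down(size_t resid, size_t blksiz)
{
	assert(blksiz != 0);
	return (resid - resid % blksiz);
}
```

Assuming a 1024-byte block purely for illustration, a 5000-byte request becomes 4096 bytes, and a request that is already a multiple of the block size is left unchanged.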
| |
Present implementation of large sync writes is too strict and so can be
quite slow. Instead of doing that, execute large async write in chunks,
syncing each chunk separately.
It would be good to fix large sync writes too, but I leave it to somebody
with more skills in this area.
| |
Make fuse(4) respect FOPEN_DIRECT_IO. This is required for correct
operation of GlusterFS.
PR: 192701
Submitted by: harsha at harshavardhana.net
Reviewed by: kib@
Sponsored by: The FreeBSD Foundation
| |
MAXBSIZE defines both the largest UFS block size and the
largest size for a buffer in the buffer cache. This patch
defines a new constant MAXBCACHEBUF, which is the largest
size for a buffer in the buffer cache. Having a separate
constant allows MAXBCACHEBUF to be set larger than MAXBSIZE
on a per-architecture basis, so that NFS can do larger read/writes
for these architectures. It modifies sys/param.h so that BKVASIZE
can also be set on a per-architecture basis.
A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE
are fixed as well.
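
One common way to let a constant be overridden per architecture is a guarded default, sketched below. The `#ifndef` pattern, the `EX_` names, and the 65536 value are assumptions for illustration; the commit message does not say how sys/param.h spells the override.

```c
#include <assert.h>

#define	EX_MAXBSIZE	65536	/* illustrative value only */

/*
 * An architecture-specific header could define EX_MAXBCACHEBUF before
 * this point; otherwise the buffer cache limit defaults to EX_MAXBSIZE.
 */
#ifndef EX_MAXBCACHEBUF
#define	EX_MAXBCACHEBUF	EX_MAXBSIZE
#endif

/* The buffer cache limit must never be smaller than the file system limit. */
static_assert(EX_MAXBCACHEBUF >= EX_MAXBSIZE,
    "buffer cache buffers must hold the largest file system block");

/* Hypothetical check a caller might make before requesting a buffer. */
static int
ex_fits_bcache(long size)
{
	return (size > 0 && size <= EX_MAXBCACHEBUF);
}
```

Keeping the two limits separate lets a consumer such as NFS size its transfers against the buffer cache limit without changing the largest block size seen by UFS.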
| |
Change wcommitsize default from one empirical value to another.
The new value is more predictable with growing RAM size:
        RAM    hibufspace  maxvnodes      old       new
i386:
      256MB      32980992      15800  2198732   2097152
        2GB      94027776     107677   878764   4194304
amd64:
      256MB      32980992      15800  2198732   2097152
        1GB     114114560      68062  1678155   4194304
        4GB     217055232     111807  1955452   4194304
       16GB    1717846016     337308  5097465  16777216
       64GB    1734918144    1164427  1490479  16777216
      256GB    1734918144    4426453   391983  16777216
| |
Fix the NFS server's handling of a bogus NFSv2 ROOT RPC.
The ROOT RPC is deprecated in the NFSv2 RFC, RFC-1094
and should never be used by a client.
| |
mav@ has found that NFS servers exporting ZFS file systems
can perform better when using a 128K read/write data size.
This patch changes NFS_MAXDATA from 64K to 128K so that
clients can use 128K for NFS mounts to allow this.
The patch also renames NFS_MAXDATA to NFS_SRVMAXIO so
that it is clear that it applies to the NFS server side
only. It also avoids a name conflict with the NFS_MAXDATA
defined in rpcsvc/nfs_prot.h, that is used for userland RPC.
| |
File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.
| |
nfsrpc_createv4: fix double free.
Reported by: Oliver Pinter, clang static checker
Obtained from: HardenedBSD (63cac77c42c0c3fc67da62f97d5ab651d52ae707)
Reviewed by: rmacklem
|