summaryrefslogtreecommitdiffstats
path: root/sys/fs
Commit message (Collapse)AuthorAgeFilesLines
...
* MFC r299413:kib2016-05-182-43/+13
| | | | Use vfs_hash_ref(9) to eliminate LK_EXCLOTHER kludge.
* MFC r298609:pfg2016-05-101-4/+4
| | | | | | | ext2fs: make use of the howmany() macro when available. We have a howmany() macro in the <sys/param.h> header that is convenient to re-use as it makes things easier to read.
* MFC: r298523rmacklem2016-05-071-0/+1
| | | | | | | | | | | | Allow the NFSv4 server to reply NFSERR_WRONGSEC for the SetClientID operation. It was reported via email that a Linux client couldn't do a Kerberized NFS mount when only "sec=krb5" was specified for the exports. The Linux client attempted a mount via krb5i and the server replied NFSERR_SERVERFAULT. Although NFSERR_WRONGSEC isn't listed as an error for SetClientID, I think it is the correct reply, so this patch enables that. I do not know if this fixes the mount attempt, but adding "krb5i" to the list of allowed security flavours does allow the mount to work.
* MFC: r298495rmacklem2016-05-072-6/+6
| | | | | | | | | | | | | | | Fix a LOR in the NFSv4.1 server. The ordering of acquisition of the state and session mutexes was reversed in two cases executed when an NFSv4.1 client created/freed a session. Since clients will typically do this only when mounting and dismounting, the likelyhood of causing a deadlock was low but possible. This can only occur for NFSv4.1 mounts, since the others do not use sessions. This was detected while testing the pNFS server/client where the client crashed during dismounting. The patch also reorders the unlocks, although that isn't necessary for correct operation.
* MFC: r297869rmacklem2016-05-071-0/+10
| | | | | | | | | | | If the VOP_SETATTR() call that saves the exclusive create verifier failed, the NFS server would leave the newly created vnode locked. This could result in a file system that would not unmount and processes wedged, waiting for the file to be unlocked. Since this VOP_SETATTR() never fails for most file systems, this bug doesn't normally manifest itself. I found it during testing of an exported GlusterFS file system, which can fail. This patch adds the vput() and changes the error to the correct NFS one.
* MFC: r297837rmacklem2016-05-062-4/+6
| | | | | | | | | | Bruce Evans reported that there was a performance regression between the old and new NFS clients. He did a good job of isolating the problem which was caused by the new NFS client not setting the post write mtime correctly. The new NFS client code was cloned from the old client, but was incorrect, because the mtime in the nfs vnode's cache wasn't yet updated. This patch fixes this problem. The patch also adds missing mutex locking.
* MFC r298732:pfg2016-05-051-1/+1
| | | | | | | sys/devfs: unsign an index to prevent signed integer overflow. cdp_maxdirent in struct:cdev_priv is of type u_int. Use the same type for the corresponding index in devfs_revoke().
* MFC r298664kp2016-04-292-9/+23
| | | | | | | | | | | | | | msdosfs: Prevent buffer overflow when expanding win95 names In win2unixfn() we expand Windows 95 style long names. In some cases that requires moving the data in the nbp->nb_buf buffer backwards to make room. That code failed to check for overflows, leading to a stack overflow in win2unixfn(). We now check for this event, and mark the entire conversion as failed in that case. This means we present the 8 character, dos style, name instead. PR: 204643 Differential Revision: https://reviews.freebsd.org/D6015
* MFC r298518:pfg2016-04-281-1/+1
| | | | | | | | | | | | ext2_htree_release(): prevent signed integer overflow in a loop. h_levels_num, as most data structs in ext2fs, is unsigned so the index that addresses it has to be unsigned as well. To get to overflow here we would probably be considering a degenerate case though. MFC after: 5 days
* MFC r297796:pfg2016-04-251-1/+2
| | | | | | | | | ext2fs: replace 0 with NULL for pointers. While here do late initialization of ebap, similar as was done in UFS. Found with devel/coccinelle.
* MFC r297479, r297695:kevlo2016-04-142-6/+4
| | | | | | | Update comment: Linux does set a randomized generation number of an inode on ext2/3/4. While here use arc4random() instead of random(). Reviewed by: pfg
* MFC r287109 (by trasz): Make it possible to forcibly unmount devfs.mav2016-04-031-0/+2
|
* MFC r297335:kevlo2016-04-012-17/+47
| | | | | | Update superblock and inode structs for ext4. Reviewed by: pfg
* MFC r296652:kib2016-03-251-10/+12
| | | | | Do not perform unneccessary shared recursion on the allproc_lock in pfs_visible().
* MFC r295811:pfg2016-03-191-5/+3
| | | | | | | | | | | | Ext2: cleanup setting of ctime/mtime/birthtime. This adopts the same change as r291936 for UFS. Directly clear IN_ACCESS or IN_UPDATE when user supplied the time, and copy the value into the inode. This keeps the behaviour cleaner and is consistent with UFS. Reviewed by: bde
* MFC r296388:kib2016-03-181-1/+1
| | | | | Pass MNTK_NO_IOPF and MNTK_UNMAPPED_BUFS flags from the lower filesystem to the nullfs mount.
* MFC r294504, r294652, r294653, r294655:pfg2016-03-1711-120/+1498
| | | | | | | | | | | | | | | | | | ext2fs: Bring back the htree dir_index implementation. The htree dir_index is perhaps one of the most characteristic features of the linux ext3 implementation. It was removed in r281670, due to repeated bug reports. Damjan Jovanic detected and fixed three bugs and did some stress testing by building Apache OpenOffice on top of it so it is now in good shape to bring back. Differential Revision: https://reviews.freebsd.org/D5007 Submitted by: Damjan Jovanovic Reviewed by: pfg RelNotes: yes
* MFC r295717:kib2016-02-241-0/+9
| | | | | | | | After nullfs rmdir operation, reclaim the directory vnode which was unlinked. Otherwise the vnode stays cached, causing leak. This is similar to r292961 for regular files. Approved by: re (marius)
* MFC r295574:markj2016-02-221-1/+4
| | | | | | Clear the cookie pointer on error in tmpfs_readdir(). Approved by: re (glebius)
* MFC r295616:pfg2016-02-171-5/+8
| | | | | | | | | | | | ext2fs: Remove panics for rename() race conditions. Sync with r84642 from UFS: The panics are inappropriate because the IN_RENAME flag only fixes a few of the huge number of race conditions that can result in the source path becoming invalid even prior to the VOP_RENAME() call. Approved by: re (glebius)
* MFC r294595:kib2016-02-141-0/+7
| | | | | | | | When devfs dirent is freed, a vnode might still keep a pointer to it, apparently. Interlock and clear the pointer to avoid free memory dereference. Approved by: re (marius)
* MFC r295209;pfg2016-02-061-5/+7
| | | | | | | | | | | Revert r294695; passthrough any extra timestamps to the dinode struct. The original ext2fs change worked fine on disks formated with default values, but it was the cause of a regression when inodes are small. Revert it for now, while we figure out safer ways pass such values, PR: 206820 Approved by: re
* MFC r294695:pfg2016-01-281-7/+5
| | | | | | | | | | | | | ext2fs: passthrough any extra timestamps to the dinode struct. In general we don't trust any of the extended timestamps unless the EXT2F_ROCOMPAT_EXTRA_ISIZE feature is set. However, in the case where we freshly allocated a new inode the information is valid and it is better to pass it along instead of leaving the value undefined. This should have no practical effect but should reduce the amount of garbage if EXT2F_ROCOMPAT_EXTRA_ISIZE is set, like in cases where the filesystem is converted from ext3 to ext4.
* MFC r294200: [PR 206224] bv_cnt is sometimes examined without holding therpokala2016-01-261-0/+5
| | | | | | | | | bufobj lock Add locking around access to bv_cnt which is currently being done unlocked Approved by: jhb Sponsored by: Panasas, Inc.
* MFC r293059:kib2016-01-231-3/+3
| | | | | Hide transient EBADF errors caused by the parallel revoke(2) or forced unmount of devfs mounts, by restarting the failed syscall.
* MFC 286974,291653:jhb2016-01-231-1/+1
| | | | | | | | | | | | | | | | 286974: Remove reference to non-existent kern_openat(9). 291653: The cdevpriv_dtr_t typedef was not able to be used in a function prototype like the various d_*_t typedefs since it declared a function pointer rather than a function. Add a new d_priv_dtor_t typedef that declares the function and can be used as a function prototype. The previous typedef wasn't useful outside of the cdevpriv implementation, so retire it. The name d_priv_dtor_t was chosen to be more consistent with cdev methods since it is commonly used in place of d_close_t even though it is not a direct pointer in struct cdevsw.
* Revert r294271:pfg2016-01-224-80/+34
| | | | | | | ext4: add support for reading sparse files Our older GCC can't handle anonymous unions, so ia64 and powerpc LINT kernels are now failing.
* MFC r293680pfg2016-01-184-34/+80
| | | | | | | | | | | | ext4: add support for reading sparse files Add support for sparse files in ext4. Also implement read-ahead, which greatly increases the performance when transferring files from ext4. The sparse file support has become very common in ext4. Both features implemented by Damjan Jovanovic. PR: 205816
* MFC r293679:ae2016-01-183-6/+8
| | | | | | | | | | | | Change the type of newsize argument in the smbfs_smb_setfsize() function from int to int64. MSDN says that SMB_SET_FILE_END_OF_FILE_INFO uses signed 64-bit integer to specify offset, but since smbfs_smb_setfsize() has used plain int, a value was truncated in case when offset was larger than 2G. https://msdn.microsoft.com/en-us/library/ff469975.aspx In particular, now `truncate -s 10G` will work correctly on the mounted SMB share.
* MFC r293683:pfg2016-01-141-1/+1
| | | | | | | | | | | ext4: mount panic from freeing invalid pointers Initialize the struct with those fields to zeroes on allocation, preventing the panic. Patch by: Damjan Jovanovic. PR: 206056
* MFC r293370:pfg2016-01-101-6/+13
| | | | | | | | | | ext2fs: reading mmaped file in Ext4 causes panic Always call brelse(path.ep_bp), fixing reading EXT4 files using mmap(). Patch by Damjan Jovanovic. PR: 205938
* MFC r283495:dchagin2016-01-091-0/+2
| | | | Hide vfs.pfs.trace variable if it is not used.
* To facillitate an upcoming Linuxulator merging partiallydchagin2016-01-091-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | MFC r275121 (by kib). Only merge the syntax changes from r275121, PROC_*LOCK() macros still lock the same proc spinlock. The process spin lock currently has the following distinct uses: - Threads lifetime cycle, in particular, counting of the threads in the process, and interlocking with process mutex and thread lock. The main reason of this is that turnstile locks are after thread locks, so you e.g. cannot unlock blockable mutex (think process mutex) while owning thread lock. - Virtual and profiling itimers, since the timers activation is done from the clock interrupt context. Replace the p_slock by p_itimmtx and PROC_ITIMLOCK(). - Profiling code (profil(2)), for similar reason. Replace the p_slock by p_profmtx and PROC_PROFLOCK(). - Resource usage accounting. Need for the spinlock there is subtle, my understanding is that spinlock blocks context switching for the current thread, which prevents td_runtime and similar fields from changing (updates are done at the mi_switch()). Replace the p_slock by p_statmtx and PROC_STATLOCK(). Discussed with: kib
* MFC r293042:kib2016-01-081-1/+1
| | | | Minor style cleanup.
* MFC r292872:pfg2016-01-061-0/+3
| | | | | | | | | | ext2: recognize ext4 INCOMPAT_RECOVER flag This is a flag specific for journalling in ext4. Add it to the list of ext4 features we ignore for read-only purposes. PR: 205668
* MFC r292961:kib2016-01-061-3/+5
| | | | | Force nullfs vnode reclaim after unlinking, to potentially unlink lower vnode.
* MFC of 291244, 291380, 291459, 291460, 291671, and 291743:mckusick2015-12-303-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This MFC includes changes to better manage the vnode freelist and to streamline the allocation and freeing of vnodes. Note that to maintain the KPI the VI_AGE flag is left defined in sys/vnode.h though its use is dropped as described in 291380. To maintain KBI the vfs.vlru_alloc_cache_src sysctl variable remains though it no longer has any effect as described in 291244. MFC of 291244: Move the comment about resident pages preventing vnode from leaving active list, into the header comment for vdrop(), which is the function that decides whether to leave the vnode on the list. Note that dirty page write-out in vinactive() is asynchronous. Discussed with: alc Sponsored by: The FreeBSD Foundation MFC of 291380: Remove VI_AGE vnode iflag, it is unused. Noted by: bde Sponsored by: The FreeBSD Foundation MFC of 291459: For performance reasons, it is useful to have a single string used as the name of a filesystem when setting it as the first parameter to the getnewvnode() function. Most filesystems call getnewvnode from just one place so can use a literal string as the first parameter. However, NFS calls getnewvnode from two places, so we create a global constant string that can be used by the two instances. This change also collapses two instances of getnewvnode() in the UFS filesystem to a single call. Reviewed by: kib Tested by: Peter Holm MFC of 291460: As the kernel allocates and frees vnodes, it fully initializes them on every allocation and fully releases them on every free. These are not trivial costs: it starts by zeroing a large structure then initializes a mutex, a lock manager lock, an rw lock, four lists, and six pointers. And looking at vfs.vnodes_created, these operations are being done millions of times an hour on a busy machine. As a performance optimization, this code update uses the uma_init and uma_fini routines to do these initializations and cleanups only as the vnodes enter and leave the vnode_zone. With this change the initializations are only done kern.maxvnodes times at system startup and then only rarely again. The frees are done only if the vnode_zone shrinks which never happens in practice. For those curious about the avoided work, look at the vnode_init() and vnode_fini() functions in kern/vfs_subr.c to see the code that has been removed from the main vnode allocation/free path. Reviewed by: kib Tested by: Peter Holm MFC of 291671: We need to zero out the union of pointers in a freed vnode structure. Fix from: Mateusz Guzik Tested by: Jason Unovitch MFC of 291743: We need to zero out the clustering variables in a freed vnode structure. For completeness add a VNASSERT that there are no threads waiting on a range lock (this was previously checked on every vnode free). Reported by; Rick Macklem Fix from: Mateusz Guzik
* MFC r292621:kib2015-12-291-7/+14
| | | | | Keep devfs mount locked for the whole duration of the devfs_setattr(), and ensure that our dirent is instantiated.
* MFC: r291638rmacklem2015-12-163-0/+52
| | | | | | | Fix the memory leak that occurs when the nfscommon.ko module is unloaded. This leak was introduced by r291527 (r292223 in stable/10). Since the nfscommon.ko module is rarely unloaded, this leak would not have been much of an issue.
* MFC: r291527rmacklem2015-12-146-220/+547
| | | | | | | | | | | | | | | | Add kernel support to the NFS server for the "-manage-gids" option that will be added to the nfsuserd daemon in a future commit. It modifies the cache used by NFSv4 for name<-->id translation (both username/uid and group/gid) to support this. When "-manage-gids" is set, the server looks up each uid for the RPC and uses the list of groups cached in the server instead of the list of groups provided in the RPC request. The cached group list is acquired for the cache by the nfsuserd daemon via getgrouplist(3). This avoids the 16 groups limit for the list in the RPC request. Since the cache is now used for every RPC when "-manage-gids" is enabled, the code also modifies the cache to use a separate mutex for each hash list instead of a single global mutex.
* MFC: r291150rmacklem2015-12-053-7/+50
| | | | | | | | | | | | | When the nfsd threads are terminated, the NFSv4 server state (opens, locks, etc) is retained, which I believe is correct behaviour. However, for NFSv4.1, the server also retained a reference to the xprt (RPC transport socket structure) for the backchannel. This caused svcpool_destroy() to not call SVC_DESTROY() for the xprt and allowed a socket upcall to occur after the mutexes in the svcpool were destroyed, causing a crash. This patch fixes the code so that the backchannel xprt structure is dereferenced just before svcpool_destroy() is called, so the code does do an SVC_DESTROY() on the xprt, which shuts down the socket upcall.
* MFC: r291117rmacklem2015-12-051-0/+38
| | | | | | | | Revert r283330 since it broke directory caching in the client. At this time I cannot see a way to fix directory caching when it has partial blocks in the buffer cache, due to the fact that the syscall's uio_offset won't stay the same as the lblkno * NFS_DIRBLKSIZ offset.
* MFC: r290970rmacklem2015-12-011-1/+3
| | | | | | | | | | mnt_stat.f_iosize (which is used to set bo_bsize) must be set to the largest size of buffer cache block or the mapping of the buffer is bogus. When a mount with rsize=4096,wsize=4096 was done, f_iosize would be set to 4096. This resulted in corrupted directory data, since the buffer cache block size for directories is NFS_DIRBLKSIZ (8192). This patch fixes the code so that it always sets f_iosize to at least NFS_DIRBLKSIZ.
* MFC r287033:trasz2015-10-181-1/+2
| | | | | | | After r286237 it should be fine to call vgone(9) on a busy GEOM vnode; remove KASSERT that would prevent forced devfs unmount from working. Sponsored by: The FreeBSD Foundation
* MFC r283924vangyzen2015-10-021-4/+10
| | | | | | | | | | | | | | | Provide vnode in memory map info for files on tmpfs When providing memory map information to userland, populate the vnode pointer for tmpfs files. Set the memory mapping to appear as a vnode type, to match FreeBSD 9 behavior. This fixes the use of tmpfs files with the dtrace pid provider, procstat -v, procfs, linprocfs, pmc (pmcstat), and ptrace (PT_VM_ENTRY). Submitted by: Eric Badger <eric@badgerio.us> (initial revision) Obtained from: Dell Inc. PR: 198431
* MFC r288044:kib2015-09-271-2/+9
| | | | | Ensure that when a blockable open of fifo returns success, a valid file descriptor opened for complimentary access exists as well.
* MFC 283281,283282,283562,283647,283836,284000,286158:jhb2015-09-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Various fixes to orphan handling which also fix issues with following forks. 283281: Always set p_oppid when attaching to an existing process via procfs tracing. This matches the behavior of ptrace(PT_ATTACH). Also, the procfs detach request assumes p_oppid is always set. 283282: Only reparent a traced process to its old parent if the tracing process is not the old parent. Otherwise, proc_reap() will leave the zombie in place resulting in the process' status being returned twice to its parent. Add test cases for PT_TRACE_ME and PT_ATTACH which are fixed by this change. 283562: Do not allow a process to reap an orphan (a child currently being traced by another process such as a debugger). The parent process does need to check for matching orphan pids to avoid returning ECHILD if an orphan has exited, but it should not return the exited status for the child until after the debugger has detached from the orphan process either explicitly or implicitly via wait(). Add two tests for for this case: one where the debugger is the direct child (thus the parent has a non-empty children list) and one where the debugger is not a direct child (so the only "child" of the parent is the orphan). 283647: Tweak the description of when waitpid() doesn't return any status for a non-blocking wait to avoid the word "empty". 283836: Consistently only use one end of the pipe in the parent and debugger processes and do not rely on EOF due to a close() in the debugger. 284000: Add a CHILD_REQUIRE macro similar to ATF_REQUIRE for use in child processes of the main test process. 286158: Clear P_TRACED before reparenting a detached process back to its original parent. Otherwise the debugee will be set as an orphan of the debugger. Add tests for tracing forks via PT_FOLLOW_FORK.
* MFC: r286790rmacklem2015-08-281-2/+5
| | | | | | | | For the case where an NFSv4.1 ExchangeID operation has the client identifier that already has a confirmed ClientID, the nfsrv_setclient() function would not fill in the clientidp being returned. As such, the value of ClientID returned would be whatever garbage was on the stack. This patch fixes the problem by filling in these fields.
* MFC: r286046rmacklem2015-08-011-1/+2
| | | | | | | | This patch fixes a problem where, if the NFSv4 server has a previous unconfirmed clientid structure for the same client on the last hash list, this old entry would not be removed/deleted. I do not think this bug would have caused serious problems, since the new entry would have been before the old one on the list. This old entry would have eventually been scavenged/removed.
* MFC: r285113rmacklem2015-07-311-2/+6
| | | | | | | | | If a "principal" argument isn't provided for a Kerberized NFS mount, the kernel would generate a bogus one with a ":/<path>" suffix. This would only occur for the case where there was no explicit "principal" argument and the getaddrinfo() call in mount_nfs.c failed to a return a cannonical name for the server. This patch fixes this unusual case.
OpenPOWER on IntegriCloud