path: root/sys/fs
Commit message | Author | Age | Files | Lines
...
* ext2fs: Use the complete random() range in i_gen. | pfg | 2013-06-30 | 1 | -1/+1
  i_gen is unsigned in ext2fs so we can handle the complete 32 bits.
  MFC after: 1 week
* Bring some updates from ufs_lookup to ext2fs. | pfg | 2013-06-29 | 1 | -8/+11
  r156418: Don't set IN_CHANGE and IN_UPDATE on inodes for potentially suspended file systems. This could cause deadlocks when creating snapshots. (We can't do snapshots on ext2fs but it is useful to keep things in sync.)
  r183079:
  - Only set i_offset in the parent directory's i-node during a lookup for non-LOOKUP operations.
  - Relax a VOP assertion for a DELETE lookup.
  r187528: Move the code from ufs_lookup.c used to do dotdot lookup into the helper function. It is supposed to be useful for any filesystem that has to unlock dvp to walk to the ".." entry in the lookup routine.
  MFC after: 5 days
* Properly use v_data field. This magically worked (even if wrong) until | davide | 2013-06-28 | 1 | -1/+1
  now because v_data is the first field of the structure, but it's not something we should rely on.
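The reliance on field order mentioned above can be illustrated with a minimal sketch. The commit does not show the original code, so the struct and function names below (`demo_vnode`, `demo_node`, `node_from_vnode_fragile`, `node_from_vnode`) are hypothetical stand-ins, not the real `struct vnode` or the smbfs code. The fragile accessor reads the first member through the struct pointer and only works while `v_data` stays at offset 0; the robust accessor names the field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; NOT the real FreeBSD struct vnode. */
struct demo_vnode {
	void	*v_data;	/* per-filesystem private data */
	int	 v_type;
};

/* Hypothetical per-filesystem node hanging off v_data. */
struct demo_node {
	int	n_id;
};

/*
 * Fragile: treats the vnode pointer as a pointer to its first member.
 * This yields v_data only as long as v_data is the first field.
 */
static struct demo_node *
node_from_vnode_fragile(struct demo_vnode *vp)
{
	return (*(void **)vp);
}

/* Robust: names the field, so reordering the struct cannot break it. */
static struct demo_node *
node_from_vnode(struct demo_vnode *vp)
{
	return (vp->v_data);
}
```

Both accessors agree in this layout; if another member were moved in front of `v_data`, only `node_from_vnode()` would keep returning the right pointer.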
* Garbage collect a useless check. smp should never be NULL. | davide | 2013-06-28 | 1 | -5/+0
* Plug a couple of leakages in smbfs_lookup(). | davide | 2013-06-28 | 1 | -3/+6
* Minor sorting. | pfg | 2013-06-26 | 1 | -1/+1
  MFC after: 3 days
* Define and use e2fs_lbn_t in ext2fs. | pfg | 2013-06-23 | 7 | -12/+19
  In line with what is done in UFS, define an internal type e2fs_lbn_t for the logical block numbers. This change is basically a no-op as the new type is unchanged (int32_t) but it may be useful as bumping this may be required for ext4fs.
  Also, as pointed out by Bruce Evans:
  - Use daddr_t for daddr in ext2_bmaparray(). This seems to improve reliability with the reallocblks option.
  - Add a cast to the fsbtodb() macro as in UFS.
  Reviewed by: bde
  MFC after: 3 days
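Why a cast in a block-to-disk-address macro matters can be sketched as follows. This is an illustration under stated assumptions, not the ext2fs macro itself: `demo_daddr_t` stands in for the kernel's `daddr_t` and is assumed to be 64 bits wide, and `struct demo_fs`, `DEMO_FSBTODB`, `demo_fsbtodb()` and `demo_fsbtodb_narrow()` are hypothetical names. Casting before the shift makes the shift happen in the wide type, so a large 32-bit block number does not lose its high bits.

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t demo_daddr_t;	/* assumed stand-in for the kernel's daddr_t */

/* Hypothetical superblock subset: log2(fs block size / device block size). */
struct demo_fs {
	int	fsbtodb_shift;
};

/* Cast first, then shift, so the arithmetic is done in 64 bits. */
#define	DEMO_FSBTODB(fs, b)	((demo_daddr_t)(b) << (fs)->fsbtodb_shift)

/* Convert a filesystem block number to a device block number. */
static demo_daddr_t
demo_fsbtodb(const struct demo_fs *fs, int32_t blk)
{
	assert(blk >= 0);
	return (DEMO_FSBTODB(fs, blk));
}

/*
 * For comparison: the same shift done in 32 bits. Unsigned arithmetic is
 * used so the wraparound is well defined rather than undefined behavior.
 */
static uint32_t
demo_fsbtodb_narrow(const struct demo_fs *fs, int32_t blk)
{
	assert(blk >= 0);
	return ((uint32_t)blk << fs->fsbtodb_shift);
}
```

With a shift of 3, block `0x40000000` maps to `0x200000000` in the wide version, while the 32-bit version wraps to 0.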
* Fix r252074 so that it builds on 64bit arches. | rmacklem | 2013-06-22 | 1 | -3/+1
* The NFSv4.1 LayoutCommit operation requires a valid offset and length. | rmacklem | 2013-06-21 | 1 | -8/+20
  (0, 0 is not sufficient.) This patch adds a loop for each file layout, using the offset and length of each file layout in a separate LayoutCommit.
* When the NFSv4.1 client is writing to a pNFS Data Server (DS), the | rmacklem | 2013-06-21 | 2 | -4/+25
  file's size attribute does not get updated. As such, it is necessary to invalidate the attribute cache before clearing NMODIFIED for pNFS.
  MFC after: 2 weeks
* Since some NFSv4 servers enforce the requirement for a reserved port#, | rmacklem | 2013-06-21 | 1 | -6/+0
  enable use of the (no)resvport mount option for NFSv4. I had thought that the RFC required that non-reserved port #s be allowed, but I couldn't find it in the RFC.
  MFC after: 2 weeks
* Rename some prefixes in the Block Group Descriptor fields to ext4bgd_ | pfg | 2013-06-20 | 1 | -6/+6
  Change prefix to avoid confusion and denote that these fields are generally only available starting with ext4.
  MFC after: 3 days
* More ext2fs header cleanups: | pfg | 2013-06-18 | 2 | -12/+12
  - Set MAXMNTLEN nearer to where it is used.
  - Move EXT2_LINK_MAX to ext2_dir.h.
  MFC after: 3 days
* Rename remaining DIAGNOSTIC to INVARIANTS. | pfg | 2013-06-17 | 1 | -1/+1
  MFC after: 3 days
* Re-sort ext2fs headers to make things easier to find. | pfg | 2013-06-16 | 6 | -56/+33
  In the ext2fs driver we have a mixture of headers:
  - The ext2_ prefixed headers are strongly influenced by NetBSD and carry specific ext2/3/4 information.
  - The unprefixed headers are inspired by UFS and carry implementation-specific information.
  Do some small adjustments so that the information is easier to find coming from either UFS or the NetBSD implementation.
  MFC after: 3 days
* Relax some unnecessary unsigned type changes in ext2fs. | pfg | 2013-06-13 | 2 | -10/+10
  While the changes in r245820 are in line with the ext2 spec, the code derived from UFS can use negative values so it is better to relax some types to keep them as they were, and somewhat more similar to UFS. While here clean some casts.
  Some of the original types are still wrong and will require more work.
  Discussed with: bde
  MFC after: 3 days
* Turn DIAGNOSTICs to INVARIANTS in ext2fs. | pfg | 2013-06-12 | 5 | -16/+16
  This is done to be consistent with what other filesystems, and particularly ffs, already do (see r173464).
  MFC after: 5 days
* s/file system/filesystem/g | pfg | 2013-06-11 | 3 | -8/+8
  Based on r96755 from UFS.
  MFC after: 3 days
* e2fs_bpg and e2fs_isize are always unsigned. | pfg | 2013-06-09 | 1 | -2/+2
  The superblock in ext2fs defines all the fields as unsigned but for some reason the in-memory superblock was carrying e2fs_bpg and e2fs_isize as signed.
  We should preserve the specified types for consistency.
  MFC after: 5 days
* Add missing VM object unlocks in an error case. | alc | 2013-06-07 | 1 | -0/+2
  Reviewed by: kib
* Don't busy the page unless we are likely to release the object lock. | alc | 2013-06-06 | 1 | -4/+7
  Reviewed by: kib
  Sponsored by: EMC / Isilon Storage Division
* Relax the vm object locking. Use a read lock. | alc | 2013-06-05 | 1 | -5/+5
  Sponsored by: EMC / Isilon Storage Division
* Eliminate unnecessary vm object locking from tmpfs_nocacheread(). | alc | 2013-06-04 | 1 | -2/+0
* ext2fs: space vs tab. | pfg | 2013-06-03 | 2 | -2/+2
  Obtained from: Christoph Mallon
  MFC after: 3 days
* ext2fs: Small cosmetic fixes. | pfg | 2013-06-03 | 2 | -2/+3
  Make a long macro readable and sort a header.
  Obtained from: Christoph Mallon
  MFC after: 3 days
* ext2fs: Update Block Group Descriptor struct. | pfg | 2013-06-03 | 1 | -3/+7
  Uncover some previously reserved fields that are used by Ext4. These are currently unused but it is good to have them for future reference.
  Reviewed by: bde
  MFC after: 3 days
* - Convert the bufobj lock to rwlock. | jeff | 2013-05-31 | 5 | -4/+7
  - Use a shared bufobj lock in getblk() and inmem().
  - Convert softdep's lk to rwlock to match the bufobj lock.
  - Move INFREECNT to b_flags and protect it with the buf lock.
  - Remove unnecessary locking around bremfree() and BKGRDINPROG.
  Sponsored by: EMC / Isilon Storage Division
  Discussed with: mckusick, kib, mdf
* Assert that OBJ_TMPFS flag on the vm object for the tmpfs node is | kib | 2013-05-30 | 1 | -0/+2
  cleared when the tmpfs node is going away.
  Tested by: bdrewery, pho
* Post-r248567, there were times when the client would return a | rmacklem | 2013-05-28 | 1 | -4/+2
  truncated directory for some NFS servers. This turned out to be because the size of a directory reported by an NFS server can be smaller than the ufs-like directory created from the RPC XDR in the client. This patch fixes the problem by changing r248567 so that vnode_pager_setsize() is only done for regular files.
  Reported and tested by: hartmut.brandt@dlr.de
  Reviewed by: kib
  MFC after: 1 week
* Do not leak the NULLV_NOUNLOCK flag from the nullfs_unlink_lowervp(), | kib | 2013-05-21 | 1 | -7/+19
  for the case when the nullfs vnode is not reclaimed. Otherwise, later reclamation would not unlock the lower vnode.
  Reported by: antoine
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 1 week
* Fix typo in comment. | des | 2013-05-15 | 1 | -1/+1
  Submitted by: Alex Weber <alexwebr@gmail.com>
  MFC after: 1 week
* Add support for the eofflag to nfs_readdir() in the new NFS | rmacklem | 2013-05-12 | 1 | -1/+8
  client so that it works under a unionfs mount.
  Submitted by: Jared Yanovich (slovichon@gmail.com)
  Reviewed by: kib
  MFC after: 2 weeks
* Fix several typos | eadler | 2013-05-12 | 1 | -1/+1
  PR: kern/176054
  Submitted by: Christoph Mallon <christoph.mallon@gmx.de>
  MFC after: 3 days
* fdescfs: Supply a real value for d_type in readdir. | jilles | 2013-05-12 | 1 | -1/+1
  All the fdescfs nodes (except . and ..) appear as character devices to stat(), so DT_CHR is correct.
* - Fix nullfs vnode reference leak in nullfs_reclaim_lowervp(). The | kib | 2013-05-11 | 4 | -7/+50
    null_hashget() obtains the reference on the nullfs vnode, which must be dropped.
  - Fix a wart which existed from the introduction of the nullfs caching, do not unlock lower vnode in the nullfs_reclaim_lowervp(). It should be innocent, but now it is also formally safe. Inform the nullfs_reclaim() about this using the NULLV_NOUNLOCK flag set on nullfs inode.
  - Add a callback to the upper filesystems for the lower vnode unlinking. When inactivating a nullfs vnode, check if the lower vnode was unlinked, indicated by nullfs flag NULLV_DROP or VV_NOSYNC on the lower vnode, and reclaim upper vnode if so. This allows nullfs to purge cached vnodes for the unlinked lower vnode, avoiding excessive caching.
  Reported by: Göran Löwkrantz <goran.lowkrantz@ismobile.com>
  Tested by: pho
  Sponsored by: The FreeBSD Foundation
  MFC after: 2 weeks
* Avoid deactivating the page if it is already on a queue, only requeue | kib | 2013-05-06 | 1 | -6/+10
  the page. This both reduces the amount of queue locking and avoids moving the active page to the inactive list just because the page was read or written.
  Based on the suggestion by: alc
  Reviewed by: alc
  Tested by: pho
* Change VM_OBJECT_LOCK/UNLOCK() -> VM_OBJECT_WLOCK/WUNLOCK() to reflect | davide | 2013-05-04 | 1 | -8/+9
  the recent switch of the vm object lock to a rwlock.
  Reported by: attilio
* Overhaul locking in netsmb, getting rid of the obsolete lockmgr() primitive. | davide | 2013-05-04 | 1 | -2/+2
  This solves a long standing LOR between smb_conn and smb_vc.
  Tested by: martymac, pho (previous version)
* Completely rewrite the interface to smbdev switching from dev_clone | davide | 2013-05-04 | 2 | -10/+30
  to cdevpriv(9). This commit changes the semantics of mount_smbfs in userland as well, which now passes a file descriptor in order to mount a specific filesystem instance.
  Reviewed by: attilio, ed
  Tested by: martymac
* The fsync(2) call should sync the vnode in such way that even after | kib | 2013-05-02 | 2 | -13/+50
  system crash which happens after successful fsync() return, the data is accessible. For msdosfs, this means that FAT entries for the file must be written. Since we do not track the FAT blocks containing entries for the current file, just do a sloppy sync of the devvp vnode for the mount, whose buffers, among other things, contain FAT blocks.
  Simultaneously, for deupdat():
  - optimize by clearing the modified flags before short-circuiting a return, if the mount is read-only;
  - only ignore the rest of the function for denode with DE_MODIFIED flag clear when the waitfor argument is false. The directory buffer for the entry might be of delayed write;
  - microoptimize by comparing the updated directory entry with the current block content;
  - try to cluster the write, fall back to bawrite() if low on resources.
  Based on the submission by: bde
  MFC after: 2 weeks
* Fix the v_object leak for non-regular tmpfs vnodes. | kib | 2013-05-02 | 1 | -0/+3
  Reported and tested by: pho
  Sponsored by: The FreeBSD Foundation
* For the new regular tmpfs vnode, v_object is initialized before | kib | 2013-05-02 | 3 | -14/+34
  insmntque() is called. The standard insmntque destructor resets the vop vector to deadfs one, and calls vgone() on the vnode. As a result, v_object is kept unchanged, which triggers an assertion in the reclaim code, on insmntque() failure. Also, in this case, OBJ_TMPFS flag on the backing vm object is not cleared.
  Provide the tmpfs insmntque() destructor which properly clears OBJ_TMPFS flag and resets v_object.
  Reported and tested by: pho
  Sponsored by: The FreeBSD Foundation
* The page read or written could be wired. Do not requeue if the page | kib | 2013-05-02 | 1 | -2/+4
  is not on a queue.
  Reported and tested by: pho
  Sponsored by: The FreeBSD Foundation
* Fix a bug that allows NFS clients to issue READDIR on files. | des | 2013-04-29 | 1 | -0/+2
  PR: kern/178016
  Security: CVE-2013-3266
  Security: FreeBSD-SA-13:05.nfsserver
* Rework the handling of the tmpfs node backing swap object and tmpfs | kib | 2013-04-28 | 2 | -164/+103
  vnode v_object to avoid double-buffering. Use the same object both as the backing store for tmpfs node and as the v_object. Besides reducing memory use by up to 2x when mapping files from tmpfs, it also makes tmpfs read and write operations copy half as many bytes.
  VM subsystem was already slightly adapted to tolerate OBJT_SWAP object as v_object. Now the vm_object_deallocate() is modified to not reinstantiate OBJ_ONEMAPPING flag and help the VFS to correctly handle VV_TEXT flag on the last dereference of the tmpfs backing object.
  Reviewed by: alc
  Tested by: pho, bf
  MFC after: 1 month
* When an NFS unmount occurs, once vflush() writes the last dirty | rmacklem | 2013-04-18 | 2 | -1/+20
  buffer for the last vnode on the mount back to the server, it returns. At that point, the code continues with the unmount, including freeing up the nfs specific part of the mount structure. It is possible that an nfsiod thread will try to check for an empty I/O queue in the nfs specific part of the mount structure after it has been free'd by the unmount. This patch avoids this problem by setting the iodmount entries for the mount back to NULL while holding the mutex in the unmount and checking the appropriate entry is non-NULL after acquiring the mutex in the nfsiod thread.
  Reported and tested by: pho
  Reviewed by: kib
  MFC after: 2 weeks
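The pattern described above (clear the shared pointers under the mutex before freeing, and re-check for NULL after taking the same mutex) can be sketched in userland with POSIX threads. This is only an illustration of the idea, not the NFS client code: `demo_mount`, `demo_iodmount`, `demo_iod_mtx`, `demo_unmount_detach()` and `demo_iod_queue_empty()` are hypothetical names, and the kernel uses its own mutex primitives rather than pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define	DEMO_MAXIOD	4

/* Hypothetical stand-in for the nfs specific part of a mount. */
struct demo_mount {
	int	queue_len;	/* number of queued I/O requests */
};

static pthread_mutex_t demo_iod_mtx = PTHREAD_MUTEX_INITIALIZER;
static struct demo_mount *demo_iodmount[DEMO_MAXIOD];

/*
 * Unmount side: detach the mount from every iod slot while holding the
 * mutex. Only after this returns is it safe for the caller to free mp.
 */
static void
demo_unmount_detach(struct demo_mount *mp)
{
	int i;

	pthread_mutex_lock(&demo_iod_mtx);
	for (i = 0; i < DEMO_MAXIOD; i++)
		if (demo_iodmount[i] == mp)
			demo_iodmount[i] = NULL;
	pthread_mutex_unlock(&demo_iod_mtx);
}

/*
 * iod side: returns 1 if the slot's mount has an empty queue, 0 if it has
 * pending work, and -1 if the mount has gone away. The pointer is only
 * dereferenced after the NULL check, and both happen under the mutex.
 */
static int
demo_iod_queue_empty(int slot)
{
	int ret;

	pthread_mutex_lock(&demo_iod_mtx);
	if (demo_iodmount[slot] == NULL)
		ret = -1;
	else
		ret = (demo_iodmount[slot]->queue_len == 0);
	pthread_mutex_unlock(&demo_iod_mtx);
	return (ret);
}
```

Because both sides take the same mutex, the iod thread either sees the mount before it is detached (and therefore before it is freed) or sees NULL and skips it.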
* Both NFS clients can deadlock when using the "rdirplus" mount | rmacklem | 2013-04-18 | 1 | -2/+10
  option. This can occur when an nfsiod thread that already holds a buffer lock attempts to acquire a vnode lock on an entry in the directory (a LOR) when another thread holding the vnode lock is waiting on an nfsiod thread. This patch avoids the deadlock by disabling readahead for this case, so the nfsiod threads never do readdirplus. Since readaheads for directories need the directory offset cookie from the previous read, they cannot normally happen in parallel. As such, testing by jhb@ and myself didn't find any performance degradation when this patch is applied. If there is a case where this results in a significant performance degradation, mounting without the "rdirplus" option can be done to re-enable readahead for directories.
  Reported and tested by: jhb
  Reviewed by: jhb
  MFC after: 2 weeks
* Move the NFS FHA (File Handle Affinity) code from sys/nfsserver to | ken | 2013-04-17 | 2 | -2/+2
  sys/nfs, since it is now shared by the two NFS servers.
  Suggested by: rmacklem
  Sponsored by: Spectra Logic
  MFC after: 2 weeks
* Revamp the old NFS server's File Handle Affinity (FHA) code so that | ken | 2013-04-17 | 12 | -21/+367
  it will work with either the old or new server.

  The FHA code keeps a cache of currently active file handles for NFSv2 and v3 requests, so that read and write requests for the same file are directed to the same group of threads (reads) or thread (writes). It does not currently work for NFSv4 requests. They are more complex, and will take more work to support.

  This improves read-ahead performance, especially with ZFS, if the FHA tuning parameters are configured appropriately. Without the FHA code, concurrent reads that are part of a sequential read from a file will be directed to separate NFS threads. This has the effect of confusing the ZFS zfetch (prefetch) code and makes sequential reads significantly slower with clients like Linux that do a lot of prefetching.

  The FHA code has also been updated to direct write requests to nearby file offsets to the same thread in the same way it batches reads, and the FHA code will now also send writes to multiple threads when needed.

  This improves sequential write performance in ZFS, because writes to a file are now more ordered. Since NFS writes (generally less than 64K) are smaller than the typical ZFS record size (usually 128K), out of order NFS writes to the same block can trigger a read in ZFS. Sending them down the same thread increases the odds of their being in order.

  In order for multiple write threads per file in the FHA code to be useful, writes in the NFS server have been changed to use a LK_SHARED vnode lock, and upgrade that to LK_EXCLUSIVE if the filesystem doesn't allow multiple writers to a file at once. ZFS is currently the only filesystem that allows multiple writers to a file, because it has internal file range locking. This change does not affect the NFSv4 code.

  This improves random write performance to a single file in ZFS, since we can now have multiple writers inside ZFS at one time.

  I have changed the default tuning parameters to a 22 bit (4MB) window size (from 256K) and unlimited commands per thread as a result of my benchmarking with ZFS.

  The FHA code has been updated to allow configuring the tuning parameters from loader tunable variables in addition to sysctl variables. The read offset window calculation has been slightly modified as well. Instead of having separate bins, each file handle has a rolling window of bin_shift size. This minimizes glitches in throughput when shifting from one bin to another.

  sys/conf/files:
    Add nfs_fha_new.c and nfs_fha_old.c. Compile nfs_fha.c when either the old or the new NFS server is built.

  sys/fs/nfs/nfsport.h, sys/fs/nfs/nfs_commonport.c:
    Bring in changes from Rick Macklem to newnfs_realign that allow it to operate in blocking (M_WAITOK) or non-blocking (M_NOWAIT) mode.

  sys/fs/nfs/nfs_commonsubs.c, sys/fs/nfs/nfs_var.h:
    Bring in a change from Rick Macklem to allow telling nfsm_dissect() whether or not to wait for mallocs.

  sys/fs/nfs/nfsm_subs.h:
    Bring in changes from Rick Macklem to create a new nfsm_dissect_nonblock() inline function and NFSM_DISSECT_NONBLOCK() macro.

  sys/fs/nfs/nfs_commonkrpc.c, sys/fs/nfsclient/nfs_clkrpc.c:
    Add the malloc wait flag to a newnfs_realign() call.

  sys/fs/nfsserver/nfs_nfsdkrpc.c:
    Setup the new NFS server's RPC thread pool so that it will call the FHA code.
    Add the malloc flag argument to newnfs_realign().
    Unstaticize newnfs_nfsv3_procid[] so that we can use it in the FHA code.

  sys/fs/nfsserver/nfs_nfsdsocket.c:
    In nfsrvd_dorpc(), add NFSPROC_WRITE to the list of RPC types that use the LK_SHARED lock type.

  sys/fs/nfsserver/nfs_nfsdport.c:
    In nfsd_fhtovp(), if we're starting a write, check to see whether the underlying filesystem supports shared writes. If not, upgrade the lock type from LK_SHARED to LK_EXCLUSIVE.

  sys/nfsserver/nfs_fha.c:
    Remove all code that is specific to the NFS server implementation. Anything that is server-specific is now accessed through a callback supplied by that server's FHA shim in the new softc.
    There are now separate sysctls and tunables for the FHA implementations for the old and new NFS servers. The new NFS server has its tunables under vfs.nfsd.fha, the old NFS server's tunables are under vfs.nfsrv.fha as before.
    In fha_extract_info(), use callouts for all server-specific code. Getting file handles and offsets is now done in the individual server's shim module.
    In fha_hash_entry_choose_thread(), change the way we decide whether two reads are in proximity to each other. Previously, the calculation was a simple shift operation to see whether the offsets were in the same power of 2 bucket. The issue was that there would be a bucket (and therefore thread) transition, even if the reads were in close proximity. When there is a thread transition, reads wind up going somewhat out of order, and ZFS gets confused. The new calculation simply tries to see whether the offsets are within 1 << bin_shift of each other. If they are, the reads will be sent to the same thread. The effect of this change is that for sequential reads, if the client doesn't exceed the max_reqs_per_nfsd parameter and the bin_shift is set to a reasonable value (22, or 4MB works well in my tests), the reads in any sequential stream will largely be confined to a single thread.
    Change fha_assign() so that it takes a softc argument. It is now called from the individual server's shim code, which will pass in the softc.
    Change fhe_stats_sysctl() so that it takes a softc parameter. It is now called from the individual server's shim code. Add the current offset to the list of things printed out about each active thread.
    Change the num_reads and num_writes counters in the fha_hash_entry structure to 32-bit values, and rename them num_rw and num_exclusive, respectively, to reflect their changed usage.
    Add an enable sysctl and tunable that allows the user to disable the FHA code (when vfs.XXX.fha.enable = 0). This is useful for before/after performance comparisons.

  nfs_fha.h:
    Move most structure definitions out of nfs_fha.c and into the header file, so that the individual server shims can see them.
    Change the default bin_shift to 22 (4MB) instead of 18 (256K). Allow unlimited commands per thread.

  sys/nfsserver/nfs_fha_old.c, sys/nfsserver/nfs_fha_old.h, sys/fs/nfsserver/nfs_fha_new.c, sys/fs/nfsserver/nfs_fha_new.h:
    Add shims for the old and new NFS servers to interface with the FHA code, and callbacks for the
    The shims contain all of the code and definitions that are specific to the NFS servers. They setup the server-specific callbacks and set the server name for the sysctl and loader tunable variables.

  sys/nfsserver/nfs_srvkrpc.c:
    Configure the RPC code to call fhaold_assign() instead of fha_assign().

  sys/modules/nfsd/Makefile:
    Add nfs_fha.c and nfs_fha_new.c.

  sys/modules/nfsserver/Makefile:
    Add nfs_fha_old.c.

  Reviewed by: rmacklem
  Sponsored by: Spectra Logic
  MFC after: 2 weeks
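The change to the proximity test in fha_hash_entry_choose_thread() can be sketched as two small predicates. This is a reading of the commit message, not the committed code: `demo_same_bin()` and `demo_within_window()` are hypothetical names, and treating "within 1 << bin_shift" as a strict less-than comparison is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Old scheme as described: two offsets are considered close when they
 * fall into the same power-of-2 bucket of size 1 << bin_shift.
 */
static int
demo_same_bin(uint64_t a, uint64_t b, unsigned bin_shift)
{
	assert(bin_shift < 64);
	return ((a >> bin_shift) == (b >> bin_shift));
}

/*
 * New scheme as described: two offsets are considered close when their
 * distance is less than 1 << bin_shift, wherever bucket edges fall.
 */
static int
demo_within_window(uint64_t a, uint64_t b, unsigned bin_shift)
{
	uint64_t dist;

	assert(bin_shift < 64);
	dist = (a > b) ? a - b : b - a;
	return (dist < ((uint64_t)1 << bin_shift));
}
```

With bin_shift = 22, a read at offset 4190208 and the next one at 4194304 are only 4096 bytes apart, yet they straddle a 4MB bucket edge: the bucket test reports them as unrelated while the window test keeps them together, which matches the thread-transition problem the message describes.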
* - Correct spelling in comments | gabor | 2013-04-17 | 1 | -1/+1
  Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)