path: root/sys/fs
Commit message (Author, Age, Files, Lines)
* MFC r321794: Improve FHA locality control for NFS read/write requests. (mav, 2017-08-07, 1 file, -1/+1)
  This change adds two new tunables that control serialization of read and write NFS requests separately. It does not change the default behavior, since there are too many factors to consider, but it leaves room for further experiments and tuning. The main motivation for this change is very low write speed with ZFS when sync=always is set or when NFS clients request synchronous operation: every separate request has to be written/flushed to the ZIL, and requests are processed one at a time. Setting vfs.nfsd.fha.write=0 in that case can increase ZIL throughput several times over by coalescing writes and cache flushes. There is a worry that doing so may increase data fragmentation on disks, but I suppose it should not happen for a pool with a SLOG.
  Sponsored by: iXsystems, Inc.
* MFC r320359: (trasz, 2017-08-01, 1 file, -2/+7)
  Add vfs.nfsd.nfsd_enable_uidtostring, which works just like vfs.nfsd.nfsd_enable_stringtouid, but in reverse: when set to 1, it forces the NFSv4 server to return numeric UIDs and GIDs instead of "user@domain" strings. This helps with clients that can't translate returned identifiers, e.g. when rerooting. The same can be achieved by just never running nfsuserd(8), but the sysctl is useful to toggle the behaviour back and forth without rebooting.
  MFC r320409: Revert part of r320359, as suggested by rmacklem@. That case is only used for nfsuserd -manage-gids and shouldn't depend on sysctl.
  MFC r321196: Rename vfs.nfsd.enable_uidtostring to vfs.nfs.enable_uidtostring. It applies to both NFS client and NFS server, and is useful for both. This is different from vfs.nfsd.enable_stringtouid, which is specific to server side.
  Sponsored by: DARPA, AFRL
* MFC: r321314 (rmacklem, 2017-07-27, 1 file, -0/+5)
  r320062 introduced a bug when doing NFSv4.1 mounts against some non-FreeBSD servers.
  r320062 used nm_rsize, nm_wsize to set the maximum request/response sizes for the NFSv4.1 session. If rsize,wsize are not specified as options, the value of nm_rsize, nm_wsize is 0 at session creation, resulting in values for request/response that are too small. This patch fixes the problem. A workaround is to specify rsize=N,wsize=N mount options explicitly, so they are set before session creation. This bug only affects NFSv4.1 mounts against some non-FreeBSD servers.
* MFC: r320458 (rmacklem, 2017-07-16, 1 file, -0/+2)
  Fix an NFSv3 client case that probably never happens.
  If an NFSv3 server were to reply with weak cache consistency attributes, but not post operation attributes, the client would use garbage attributes from memory. This was spotted during work on the code for the NFSv4.1 client. I have never seen evidence that this happens and it wouldn't make sense for an NFSv3 server to do this, so this patch is basically "theoretical", but does fix the problem if a server were to do the above.
* MFC: r320345 (rmacklem, 2017-07-15, 5 files, -29/+72)
  Add support to the NFSv4.1/pNFS client for commits through the DS.
  An NFSv4.1/pNFS server using File Layout can specify that Commit operations are to be done against the DS instead of MDS. Since no extant pNFS server did this, the code was untested and "#ifdef notyet". The FreeBSD pNFS server I am developing does specify that Commits be done through the DS, so the code has been enabled/tested. This patch should only affect the case of a pNFS server that specifies Commits through the DS.
  Relnotes: yes
* MFC r320408: (pfg, 2017-07-09, 1 file, -2/+6)
  ext2fs: Support e2di_uid_high and e2di_gid_high.
  The fields exist on all versions of the filesystem and using them is a mount option on linux. For FreeBSD, the corresponding i_uid and i_gid are always long enough so use them by default.
  Reviewed by: Fedor Uporov
* MFC r320079: (pfg, 2017-07-09, 1 file, -0/+1)
  ext2fs: Enable RO huge_file feature support.
  We have support for reading ext4 "huge" files, but we can't write (anything) on ext4 filesystems. Formally enable the feature so that we can mount such filesystems.
  Submitted by: Fedor Uporov
* MFC: r320208 (rmacklem, 2017-07-07, 1 file, -7/+7)
  Ensure that the credentials field of the NFSv4 client open structure is initialized.
  bdrewery@ has reported panics "newnfs_copycred: negative nfsc_ngroups". The only way I can see that this occurs is that the credentials field of the open structure gets used before being filled in. I am not sure quite how this happens, but for the file create case, the code is serialized via the vnode lock on the directory. If, somehow, a link to the same file gets created just after file creation, this might occur. This patch ensures that the credentials field is initialized to a reasonable set of credentials before the structure is linked into any list, so this should ensure it is initialized before use. I am committing the patch now, since bdrewery@ notes that the panics are intermittent and it may be months before he knows if the patch fixes his problem.
* MFC: r320062, r320070, r320126 (rmacklem, 2017-07-04, 2 files, -8/+36)
  This is a partial merge of only the NFS changes and not the maxbcachebuf tunable. The NFS client changes make the code handle different I/O sizes more correctly. However, with the limit at 64K, they are not actually necessary. This MFC is mainly being done so that subsequent MFCs to the NFS code will merge easily.
* MFC: r319882 (rmacklem, 2017-07-03, 1 file, -1/+15)
  Define NFS_MAXXDR as the upper bound on XDR overhead in an NFS RPC.
  This definition is a part of the maxiotune2 patch that will be committed soon.
* MFC: r318287 (rmacklem, 2017-05-22, 1 file, -0/+1)
  Make nfscl_mtofh() return ENXIO when *nfhpp == NULL.
  r317272 introduced a case where nfscl_mtofh() could return 0 when *nfhpp is NULL. This patch makes it return ENXIO for this case.
* MFC: r317576 (rmacklem, 2017-05-14, 1 file, -2/+2)
  Modify the NFSv4.1/pNFS client to ask for a maximum length of layout.
  The code specified the length of a layout as INT64_MAX instead of UINT64_MAX. This could result in getting a layout for less than the full file for extremely large files. Although having little practical effect, this patch corrects this in the code. Detected during recent testing of the pNFS server.
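The difference between the two constants can be illustrated with a small userland sketch. The helper `layout_covers()` is hypothetical and is not taken from the NFS client; treating a length of UINT64_MAX as "through end of file" is an assumption of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: does a layout that starts at 'off' and was
 * requested with length 'len' cover byte 'pos'?  A length of UINT64_MAX
 * is treated as "through end of file" (an assumption of this sketch).
 */
static bool
layout_covers(uint64_t off, uint64_t len, uint64_t pos)
{
	if (pos < off)
		return (false);
	if (len == UINT64_MAX)
		return (true);
	/* Compare via subtraction so that off + len cannot overflow. */
	return (pos - off < len);
}
```

With a requested length of INT64_MAX, a byte at offset 2^63 falls outside the layout; with UINT64_MAX it is covered.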
* MFC: r317465 (rmacklem, 2017-05-10, 1 file, -1/+10)
  Fix handling of a NFSv4.1 callback reply from the session cache.
  The nfsv4_seqsession() call returns NFSERR_REPLYFROMCACHE when it has a reply in the session, due to a requestor retry. The code erroneously assumed a return of 0 for this case. This patch fixes this and adds a KASSERT(). This would be an extremely rare occurrence. It was found during code inspection during the pNFS server development.
* MFC: r317382 (rmacklem, 2017-05-08, 1 file, -1/+7)
  Allow use of a write open stateid for reading in the NFSv4 server.
  The NFSv4 RFCs give a server the option of allowing the use of an open stateid for write access to be used for a Read operation. This patch enables this by default and adds a sysctl to disable it, for anyone who does not want this capability. Allowing this is particularly useful for a pNFS Data Server (DS), since they are not permitted to allow the use of special stateids. Discovered during recent testing of the pNFS server under development.
* MFC: r317345 (rmacklem, 2017-05-08, 3 files, -10/+42)
  Make the NFSv4 client use a write open for reading if allowed by the server.
  An NFSv4 server has the option of allowing a Read to be done using a Write Open. If this is not allowed, the server will return NFSERR_OPENMODE. This patch attempts the read with a write open and then disables this if the server replies NFSERR_OPENMODE. This change will avoid some uses of the special stateids. This will be useful for pNFS/DS Reads, since they cannot use special stateids. It will also be useful for any NFSv4 server that does not support reading via the special stateids. It has been tested against both types of NFSv4 server.
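The "try a write open, then disable on NFSERR_OPENMODE" behaviour can be sketched in userland as follows. This is only an illustration of the fallback pattern: every name here (`demo_read()`, `struct demo_mount`, the two toy servers and the status codes) is hypothetical and does not correspond to the actual client code.

```c
#include <stdbool.h>

/* Hypothetical status codes for this sketch; not the real NFSERR_* values. */
#define	DEMO_OK			0
#define	DEMO_ERR_OPENMODE	1

/* Which kind of open stateid accompanies the Read. */
enum demo_open { DEMO_READ_OPEN, DEMO_WRITE_OPEN };

/* Hypothetical per-mount state. */
struct demo_mount {
	bool	try_write_open;	/* cleared once the server refuses */
};

/* A toy server that rejects Reads done with a write open. */
static int
demo_strict_server(enum demo_open how)
{
	return (how == DEMO_WRITE_OPEN ? DEMO_ERR_OPENMODE : DEMO_OK);
}

/* A toy server that accepts either kind of open for a Read. */
static int
demo_lenient_server(enum demo_open how)
{
	(void)how;
	return (DEMO_OK);
}

/* Attempt the Read with a write open first, falling back on refusal. */
static int
demo_read(struct demo_mount *mp, int (*server)(enum demo_open))
{
	int error;

	if (mp->try_write_open) {
		error = server(DEMO_WRITE_OPEN);
		if (error != DEMO_ERR_OPENMODE)
			return (error);
		/* Remember the refusal so later Reads skip the attempt. */
		mp->try_write_open = false;
	}
	return (server(DEMO_READ_OPEN));
}
```

Keeping the flag per mount means the extra round trip is paid at most once against a server that refuses.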
* MFC: r317305 (rmacklem, 2017-05-08, 1 file, -19/+33)
  Fix the NFSv4.1/pNFS client return layout on close.
  The "return layout on close" case in the pNFS client was badly broken. Fortunately, extant pNFS servers that I have tested against do not do this. This patch fixes it. It also changes the way the layout stateid.seqid is set for LayoutReturn. I think this change is correct w.r.t. the RFC, but I am not 100% sure. This was found during recent testing of the pNFS server under development.
* MFC: r317296 (rmacklem, 2017-05-08, 2 files, -5/+18)
  Fix some krpc leaks for the NFSv4.1/pNFS client.
  The NFSv4.1/pNFS client wasn't doing a newnfs_disconnect() call for the connection to the Data Server (DS) under some circumstances. The main effect of this was a leak of malloc'd structures in the krpc. This patch adds the newnfs_disconnect() calls to fix this. Detected during recent testing against the pNFS server under development.
* MFC: r317276 (rmacklem, 2017-05-07, 1 file, -1/+3)
  Don't set ND_NOMOREDATA for a failed Setattr operation (NFSv4).
  The NFSv4 Setattr operation always has reply data even when it fails, so don't set the ND_NOMOREDATA for it. This would only affect unusual cases where Setattr fails and the RPC code wants to parse the rest of the compound. Detected during recent development related to the pNFS server.
* MFC: r317275, r317344 (rmacklem, 2017-05-07, 2 files, -2/+3)
  Don't create a backchannel for a DS connection.
  An NFSv4.1 client connection to a Data Server (DS) should not have a backchannel. This patch fixes the NFSv4.1/pNFS client to not do a backchannel for this case. Found during recent testing with the pNFS server under development.
* MFC: r317272 (rmacklem, 2017-05-07, 1 file, -1/+10)
  Add checks for failed operations to the NFSv4 client function nfscl_mtofh().
  The nfscl_mtofh() function didn't check for failed operations and, as such, would have returned EBADRPC for these cases, due to parsing failure. This patch adds checks, so that it returns with ND_NOMOREDATA set. This is needed for future use in the pNFS server and acts as a safety belt in the meantime.
* MFC: r317269 (rmacklem, 2017-05-07, 1 file, -2/+2)
  Set default uid/gid to nobody/nogroup for NFSv4 mapping.
  The default uid/gid for NFSv4 are set by the nfsuserd(8) daemon. However, they were 0 until nfsuserd(8) was run. Since it is possible to use NFSv4 without running the nfsuserd(8) daemon, set them to nobody/nogroup initially. Without this patch, the values would be set by the nfsuserd(8) daemon and left changed even if the nfsuserd(8) daemon was killed. The default values of 0 meant that setting a group to "wheel" would fail even when done by root. It also adds a definition of GID_NOGROUP to sys/conf.h.
* MFC: r317236 (rmacklem, 2017-05-07, 1 file, -1/+3)
  Fix the setting of atime for Linux client NFSv4 mounts.
  The FreeBSD NFSv4 server did not set the attribute bit for TimeAccess in the reply to an Open with exclusive_create, as required by the RFCs. (This is required since the FreeBSD NFS server stores the create_verifier in the va_atime attribute.) As such, the Linux NFSv4 client did not set the TimeAccess (atime) in the Setattr done in an RPC after the one with the Open/exclusive_create. This patch fixes the server to set the TimeAccess bit in the reply. I believe that storing the create_verifier in an extended attribute for file systems that support extended attributes might be a good idea, but I will wait for a discussion of this on the freebsd-fs@ email list before considering committing a patch to do this.
* MFC: r316829 (rmacklem, 2017-04-29, 4 files, -9/+8)
  Remove unused "cred" argument to ncl_flush().
  The "cred" argument of ncl_flush() is unused and it was confusing to have the code passing in NULL for this argument in some cases. This patch deletes this argument. There is no semantic change because of this patch.
* Revert r314937 as anonymous unions in GCC don't seem to work. (pfg, 2017-04-27, 4 files, -80/+34)
  This has been breaking the powerpc build (LINT64 at least) for quite a while now.
  Reported by: emaste
* MFC: r316792 (rmacklem, 2017-04-27, 6 files, -23/+94)
  Add an NFSv4.1 mount option for "use one openowner".
  Some NFSv4.1 servers such as AmazonEFS can only support a small fixed number of open_owner4s. This patch adds a mount option called "oneopenown" that can be used for NFSv4.1 mounts to make the client do all Opens with the same open_owner4 string. This option can only be used with NFSv4.1 and may not work correctly when Delegations are in use.
  Differential Revision: https://reviews.freebsd.org/D8988
* MFC: r316782 (rmacklem, 2017-04-27, 1 file, -0/+5)
  Add call to svcpool_close() for the NFSv4 callback pool (svcpool_nfscbd).
  A function called svcpool_close() was added to the server side krpc by r313735, so that a pool could be closed without destroying the data structures. This little patch adds a call to it for the callback pool (svcpool_nfscbd), so that the nfscbd daemon can be killed/restarted and continue to work correctly.
* MFC: r316745 (rmacklem, 2017-04-26, 1 file, -0/+32)
  Fix the NFS client for "text file modified, process killed" mmap'd case.
  When an mmap'd text file is written and then executed immediately afterwards, it was possible that the modify time would change after the text file was executing, resulting in the process executing the file being killed. This was usually only observed when the file system's times were set to higher resolution, but could have occurred for any time resolution. This patch adds a VOP_SET_TEXT() to the NFS client which flushes all dirty pages to the NFS server and then makes sure that n_mtime is up to date to prevent this from occurring. Thanks go to kib@ and pho@ for their help with developing this patch.
  The call to ncl_flush() has been removed. If r316532 is merged into stable/10, this call needs to go back into nfs_set_text().
* MFC: r316719 (rmacklem, 2017-04-26, 1 file, -9/+3)
  Don't throw away Open state when a NFSv4.1 client recovery fails.
  If the ExchangeID/CreateSession operations done by an NFSv4.1 client after the server crashes/reboots fail, it is possible that some process/thread is waiting for an open_owner lock. If the client state is freed, this can cause a crash. This would not normally happen, but has been observed on a mount of the AmazonEFS service.
* MFC: r316717 (rmacklem, 2017-04-26, 1 file, -9/+8)
  During a server crash recovery, fix the NFSv4.1 client for a NFSERR_BADSESSION during recovery.
  If the NFSv4.1 client gets a NFSv4.1 NFSERR_BADSESSION reply to an Open/Lock operation while recovering from the server crash/reboot, allow the opens to be retained for a subsequent recovery attempt. Since NFSv4.1 servers should only reply NFSERR_BADSESSION after a crash/reboot that has lost state, this case should almost never happen. However, for the AmazonEFS file service, this has been observed when the client does a fresh TCP connection for RPCs.
* MFC: r316692 (rmacklem, 2017-04-26, 1 file, -0/+8)
  Set initial values for nfsstatfs in the NFSv4 client.
  The AmazonEFS NFSv4.1 server does not support the FILES_FREE and FILES_TOTAL attributes. As such, an NFSv4.1 mount to the server would return garbage for these values. This patch initializes the fields of the nfsstatfs structure, so that "df" and friends will at least return consistent bogus values.
* MFC: r316669 (rmacklem, 2017-04-25, 1 file, -1/+11)
  Avoid starvation of the server crash recovery thread for the NFSv4 client.
  This patch gives a requestor of the exclusive lock on the client state in the NFSv4 client priority over shared lock requestors. This avoids the server crash recovery thread being starved out by other threads doing RPCs.
* MFC: r316667 (rmacklem, 2017-04-25, 1 file, -1/+1)
  Fix the NFSv4 client handling of a stale write verifier in the Commit operation.
  When the NFSv4 client Commit operation encountered a stale write verifier, it erroneously mapped that to EIO. This could have caused recently written data to be lost when a server crashes/reboots between an UNSTABLE write and the subsequent commit. This patch fixes this. The bug was only for the NFSv4 client and did not affect NFSv3.
* MFC: r316666 (rmacklem, 2017-04-25, 1 file, -1/+1)
  Fix the NFSv4.1 client for NFSERR_BADSESSION recovery via ReclaimComplete.
  For the ReclaimComplete operation, the RPC layer should not loop on NFSERR_BADSESSION. If it does, the recovery thread (nfscl) can get stuck looping and will not do a recovery. This patch fixes it so it does not loop. This bug only affects NFSv4.1 and only when a server reboots.
* MFC r316698: (kib, 2017-04-25, 1 file, -8/+11)
  Remove debugging printf.
* MFC: r316655 (rmacklem, 2017-04-25, 1 file, -1/+1)
  Fix parsing failure for NFSv4 Setattr operation for failed case.
  If an operation that precedes a Setattr in an NFSv4 compound fails, there is no bitmap of attributes to parse. Without this patch, the parsing would fail and return EBADRPC instead of the correct failure error. This could break recovery from a server crash/reboot.
* MFC: r310491 (rmacklem, 2017-04-25, 11 files, -181/+360)
  Fix NFSv4.1 client recovery from NFS4ERR_BAD_SESSION errors.
  For most NFSv4.1 servers, a NFS4ERR_BAD_SESSION error is a rare failure that indicates that the server has lost session/open/lock state. However, recent testing by cperciva@ against the AmazonEFS server found several problems with client recovery from this due to it generating this failure frequently. Briefly, the problems fixed are:
  - If all session slots were in use at the time of the failure, some processes would continue to loop waiting for a slot on the old session forever.
  - If an RPC that doesn't use open/lock state failed with NFS4ERR_BAD_SESSION, it would fail the RPC/syscall instead of initiating recovery and then looping to retry the RPC.
  - If a successful reply to an RPC for an old session wasn't processed until after a new session was created for a NFS4ERR_BAD_SESSION error, it would erroneously update the new session and corrupt it.
  - The use of the first element of the session list in the nfs mount structure (which is always the current metadata session) was slightly racy. With changes for the above problems it became more racy, so all uses of this head pointer were wrapped with a NFSLOCKMNT()/NFSUNLOCKMNT().
  - Although the kernel malloc() usually allocates more bytes than requested and, as such, this wouldn't have caused problems, the allocation of a session structure was 1 byte smaller than it should have been. (Null termination byte for the string not included in byte count.)
  There are probably still problems with a pNFS data server that fails with NFS4ERR_BAD_SESSION, but I have no server that does this to test against (the AmazonEFS server doesn't do pNFS), so I can't fix these yet. Although this patch is fairly large, it should only affect the handling of NFS4ERR_BAD_SESSION error replies from an NFSv4.1 server. Thanks go to cperciva@ for the extensive testing he did to help isolate/fix these problems.
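The off-by-one allocation mentioned in the last item above (the terminating NUL not counted) is a common pattern. A minimal userland sketch, assuming a structure with a trailing string; `struct demo_session` and `demo_session_alloc()` are hypothetical and are not the real NFS session layout or kernel allocation call:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure with a trailing string member. */
struct demo_session {
	size_t	namelen;	/* length of name, excluding the NUL */
	char	name[];		/* NUL-terminated copy of the name */
};

static struct demo_session *
demo_session_alloc(const char *name)
{
	size_t len = strlen(name);
	struct demo_session *sp;

	/* The "+ 1" reserves room for the terminating NUL byte. */
	sp = malloc(sizeof(*sp) + len + 1);
	if (sp == NULL)
		return (NULL);
	sp->namelen = len;
	memcpy(sp->name, name, len + 1);
	return (sp);
}
```

Omitting the `+ 1` often goes unnoticed because allocators round sizes up, which matches the remark in the commit message that the bug would not normally have caused problems.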
* Revert 294545: (pfg, 2017-03-09, 4 files, -34/+80)
  Bringing back ext4: add support for reading sparse files
  Add GCC_MS_EXTENSIONS to the CFLAGS in the module to make the old GCC in base happy. This workaround is only required in stable/10.
* MFC r283291: don't use CALLOUT_MPSAFE with callout_init() (avg, 2017-03-04, 1 file, -1/+1)
  The main purpose of this MFC is to reduce conflicts for other merges. Parts of the original change have already "trickled down" via individual MFCs.
* MFC r313897: (pfg, 2017-02-24, 1 file, -1/+0)
  ext2fs: Remove unused assignment.
  The value is re-assigned a few lines later without being read.
  Found by: Clang static analyzer
* MFC r313800: (kib, 2017-02-24, 1 file, -2/+7)
  Do not access memory past the buffer end. Do not accept and silently truncate a too-long hostname.
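The commit message above gives no detail about the affected code, so the following is only a generic userland sketch of the two rules it states: never read past the destination capacity, and reject an over-long name instead of truncating it. The helper `copy_hostname()` and the choice of ENAMETOOLONG as the error are assumptions.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: copy 'src' into 'dst', whose capacity is
 * 'dstsize' bytes including the NUL.  Returns 0 on success or
 * ENAMETOOLONG if the name does not fit; 'dst' is untouched on error.
 */
static int
copy_hostname(char *dst, size_t dstsize, const char *src)
{
	/* Look for the terminator within the first 'dstsize' bytes only. */
	const char *end = memchr(src, '\0', dstsize);

	if (end == NULL)
		return (ENAMETOOLONG);
	memcpy(dst, src, (size_t)(end - src) + 1);
	return (0);
}
```

Returning an error leaves the decision to the caller, whereas silent truncation could make two distinct hostnames compare equal.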
* MFC r313735: add svcpool_close to handle killed nfsd threads (avg, 2017-02-21, 1 file, -11/+9)
  PR: 204340
  Reported by: Panzura
  Reviewed by: rmacklem
  Approved by: rmacklem
* MFC r313797: (kib, 2017-02-19, 1 file, -2/+3)
  Minor style fixes.
* MFC r312432: (kib, 2017-02-02, 5 files, -17/+54)
  Add a mount option for tmpfs(5) to not use namecache.
* MFC r312430: (kib, 2017-02-02, 1 file, -0/+127)
  Implement VOP_VPTOCNP() for tmpfs.
* MFC r312429: (kib, 2017-02-02, 1 file, -5/+0)
  VNON nodes cannot exist.
* MFC r312428: (kib, 2017-02-02, 4 files, -44/+130)
  Refcount tmpfs nodes and mount structures.
* MFC r312425: (kib, 2017-01-26, 2 files, -7/+11)
  Make tmpfs directory cursor available outside tmpfs_subr.c.
* MFC r312414: (kib, 2017-01-26, 2 files, -7/+8)
  Rename tmpfs_mount member allnode_lock to include namespace prefix.
* MFC r312410: (kib, 2017-01-26, 3 files, -14/+2)
  Rework some tmpfs lock assertions.
  MFC r312412: Protect macro argument.
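"Protect macro argument" usually refers to parenthesizing a macro parameter in the expansion. The tmpfs macro itself is not shown in this log, so the macros below are hypothetical and only illustrate the general issue:

```c
/* Unprotected: the argument is pasted in without parentheses. */
#define	DOUBLE_UNSAFE(x)	(x * 2)
/* Protected: the argument is parenthesized before use. */
#define	DOUBLE_SAFE(x)		((x) * 2)

static int
demo_unsafe(void)
{
	return (DOUBLE_UNSAFE(1 + 2));	/* expands to (1 + 2 * 2) */
}

static int
demo_safe(void)
{
	return (DOUBLE_SAFE(1 + 2));	/* expands to ((1 + 2) * 2) */
}
```

Here `demo_unsafe()` evaluates to 5 while `demo_safe()` evaluates to 6, because operator precedence applies to the pasted text rather than to the argument as a unit.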
* MFC r312409: (kib, 2017-01-26, 4 files, -164/+179)
  Style fixes and comment updates.
  MFC r312435: Remove mistakenly merged field.