path: root/sys/fs
Commit message Author Age Files Lines
* Rename global cnt to vm_cnt to avoid shadowing.bdrewery2014-03-221-1/+1
| | | | | | | | | | | | | | To reduce the diff struct pcpu.cnt field was not renamed, so PCPU_OP(cnt.field) is still used. pc_cnt and pcpu are also used in kvm(3) and vmstat(8). The goal was to not affect externally used KPI. Bump __FreeBSD_version in case some out-of-tree module/code relies on the global cnt variable. Exp-run revealed no ports using it directly. No objection from: arch@ Sponsored by: EMC / Isilon Storage Division
* Revert r263449;pfg2014-03-211-18/+2
| | | | | | | | | ext2fs: minor update to the dirpref policy. The change in UFS r254996, reverted the change as the older code seems to work better. This was not visible in local testing but we can trust UFS is vastly more exercised in different environments.
* ext2fs: minor update to the dirpref policy.pfg2014-03-201-2/+18
| | | | | | | | | | Bring in a minor change to the dirpref policy based on r248623. This is pretty minimal change to keep the implementation in sync with UFS but other parts from the original change are not directly applicable so don't expect improvements in fsck times. MFC after: 2 weeks
* msdosfs: minor format fix - spaces vs tabpfg2014-03-201-1/+1
| | | | MFC after: 3 days
* Update kernel inclusions of capability.h to use capsicum.h instead; somerwatson2014-03-164-4/+4
| | | | | | | | further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. MFC after: 3 weeks
* Add missing FALLTHROUGH comment in tmpfs_dir_getdents for looking up '.' andbdrewery2014-03-141-0/+1
| | | | | | | | '..'. Reviewed by: Russell Cattelan Sponsored by: EMC / Isilon Storage Division MFC after: 2 weeks
* Rename cnt to maxcookies and change its use as the condition for when tobdrewery2014-03-142-14/+21
| | | | | | | | | | | | lookup cookies to be less obscure. No functional change. Since r245115, cnt has not really been needed in tmpfs_dir_getdents(). Keep it for the MPASS() for now though. Sponsored by: EMC / Isilon Storage Division MFC after: 2 weeks
* Cleanup redundant logic and add some comments to help explain howbdrewery2014-03-141-14/+17
| | | | | | | it works in lieu of potentially less clear code. Sponsored by: EMC / Isilon Storage Division Discussed with: Russell Cattelan
* Fix -o size less than PAGE_SIZE resulting in SIZE_MAX being used.bdrewery2014-03-141-2/+4
| | | | | Discussed with: kib MFC after: 2 weeks
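The commit message above names only the symptom: a `-o size` value below PAGE_SIZE ended up as SIZE_MAX, which means no limit. The sketch below illustrates one way to avoid that class of bug, assuming the limit is stored as a page count and that a truncating division is what turns a sub-page request into zero. The helper name `sk_size_to_pages` and the constant `SK_PAGE_SIZE` are hypothetical and are not taken from the tmpfs sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SK_PAGE_SIZE 4096u

/*
 * Hypothetical helper: turn a requested byte limit into a page limit.
 * A request of 0 is treated as "no limit" and maps to SIZE_MAX.
 */
static size_t
sk_size_to_pages(uint64_t size_max)
{
	uint64_t pages;

	if (size_max == 0)
		return (SIZE_MAX);

	/* Round up so that a sub-page request yields one page, not zero. */
	pages = size_max / SK_PAGE_SIZE + (size_max % SK_PAGE_SIZE != 0);
	assert(pages >= 1);

	/* Clamp in case size_t is narrower than 64 bits. */
	return (pages >= SIZE_MAX ? SIZE_MAX : (size_t)pages);
}
```

With rounding up, only an explicit request of 0 selects the unlimited case; any non-zero request yields at least one page.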
* ext2fs: Fix a bug when sorting htree entries.pfg2014-03-061-1/+1
| | | | | | | This is a typo introduced when bringing the original code from NetBSD. Reported by: Mike Ma MFC after: 3 days
* ext2fs: small formatting fixes.pfg2014-03-012-3/+3
| | | | | | | Remove some redundant spaces. No functional change. MFC after: 3 days
* ext2fs: use of tab vs spaces.pfg2014-02-2817-218/+216
| | | | | | | | | | Consistently use a single tab after a #define as mentioned in style(9). Use tabs instead of space for indenting. Fix a typo: "hash_vesion". No functional change. MFC after: 3 days
* ext2fs: fully enable ext4 read-only support.pfg2014-02-222-5/+13
| | | | | | | | | | | | | | | | | The ext4 developers tend to tag Ext4-specific flags as "incompatible" even when such features are not relevant for read-only support. This is a consequence of the process through which this filesystem is implemented without design and the fact that some new features are not extensible to ext2/3. Organize the features according to what we support and sort them so that we can now read-only mount filesystems with some features that may be found in newly formatted ext4 fs. Submitted by: Zheng Liu Reviewed by: pfg MFC after: 5 days
* In sys/fs/nandfs/nandfs_vfsops.c, #if 0 an unused static function.dim2014-02-151-0/+2
| | | | MFC after: 3 days
* ext2fs: Use i_flag instead of i_flags for Ext4 inode flags.pfg2014-01-288-16/+17
| | | | | | | | | | | | | The ext4 inode flags do not have equivalents for chflags (1) and hold information that is private to the implementation. The i_flag field in the inode is a better place to hold the Ext4 inode flags as it saves us from masking flags while setting or getting attributes. It should also make things cleaner if we implement write support for Ext4. Suggested by: bde Tested by: Mike Ma MFC after: 3 days
* ext2fs: Re-enable reallocblk.pfg2014-01-241-2/+2
| | | | | | | The major corruption issues affecting this code have been fixed a while ago. MFC after: 1 week
* ext2fs: fix a bug in dirindex and re-enable.pfg2014-01-242-6/+1
| | | | | | | | | | | | The IN_* flags should be set in i_flag instead of corrupting i_flags [1]. Re-enable HTree dirindex as the last series of bug fixes seems to have fixed the issues. Reported by: bde [1] Tested by: kevlo MFC after: 1 week
* ext2fs: fix logic error in the previous change.pfg2014-01-223-6/+5
| | | | | | | | | Use the bitwise negation instead of bogus boolean negation and move the flag manipulation with the assignment. Fix some grammatical errors introduced in the same change. Reported by: bde MFC after: 3 days
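The difference between bitwise and boolean negation mentioned above can be shown with a small sketch. The flag values and function names here are hypothetical and are not the ext2fs definitions; the point is only that `!` collapses any non-zero mask to 0, so the AND wipes every flag instead of clearing one bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
#define SK_FLAG_A 0x01u
#define SK_FLAG_B 0x04u

/* Correct: ~bit has every bit set except the one being cleared. */
static uint32_t
sk_clear_bitwise(uint32_t flags, uint32_t bit)
{
	return (flags & ~bit);
}

/* Bogus: !bit is 0 for any non-zero bit, so all flags are lost. */
static uint32_t
sk_clear_boolean(uint32_t flags, uint32_t bit)
{
	return (flags & !bit);
}
```

Clearing `SK_FLAG_B` from a value holding both flags leaves `SK_FLAG_A` set in the bitwise version and leaves nothing set in the boolean version.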
* ext2fs: Translate the EXT4_EXTENTS and EXT4_INDEX to the inode flags.pfg2014-01-219-21/+35
| | | | | | | | | | | | | | r260545 cleared the inode flags to fix corruption problems but we still need to pass some EXT4 flags for the ext4 read-only mode. None of these attributes has an equivalent in FreeBSD and are uninteresting for the system utilities so they should be inaccessible in ext2_getattrib(). Note: we also use EXT4_HUGE_FILE but we use it directly from the dinode structure so it is not necessary to translate it. Suggested by: bde MFC after: 3 days
* Fix lock leak in purely hypothetical case of TCP connection without SVC_ACKmav2014-01-143-11/+13
| | | | | | method. This change should be NOP now, but it is better to be future safe. Reported by: rmacklem
* ext2fs: fix inode flag conversion.pfg2014-01-111-2/+2
| | | | | | | | | | | | | | | | | | After r252890 we are naively attempting to pass through the inode flags. This is technically incorrect as the ext2 inode flags don't match the UFS/system values used in FreeBSD and a clean conversion is needed. Some filtering was left in place so the change didn't cause significant changes in FreeBSD but some of the garbage passed is likely to be the cause for warning messages in linux. Fix the issue by resetting the flags before conversion as was done previously. This also means we will not pass the EXT4_* inode flags into FreeBSD's inode. PR: kern/185448 MFC after: 3 days
* Fix off-by-one error in r260229.mav2014-01-071-1/+1
| | | | Coverity CID: 1148955
* Rework NFS Duplicate Request Cache cleanup logic.mav2014-01-036-160/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | | | - Introduce additional hash to group requests by hash of sockref. This allows to process TCP acknowledgements without looping through all the cache, and as a result allows to do it every time. - Introduce additional callbacks to notify application layer about sockets disconnection. Without this last few requests processed just before socket disconnection never processed their ACKs and stuck in cache for many hours. - Implement transport-specific method for tracking reply acknowledgements. New implementation does not cross multiple stack layers to get the data and does not have race conditions that previously made some requests stuck in cache. This could be done more efficiently at sockbuf layer, but that would break some KBIs, while I don't know other consumers for it aside from NFS. - Instead of traversing all DRC twice per request, run cleaning only once per request, and except in some conditions traverse only single hash slot at a time. Together this limits NFS DRC growth only to situations of real connectivity problems. If network is working well, and so all replies are acknowledged, cache remains almost empty even after hours of heavy load. Without this change on the same test cache was growing to many thousand requests even with perfectly working local network. As another result this reduces CPU time spent on the DRC handling during SPEC NFS benchmark from about 10% to 0.5%. Sponsored by: iXsystems, Inc.
* Slightly simplify expiration logic introduced in r254337.mav2013-12-251-12/+20
| | | | | | - Do not update the histogram for items we are any way deleting from cache. - Do not update the histogram if nfsrc_tcphighwater is not set. - Remove some extra math operations.
* The NFSv4 server would call VOP_SETATTR() with a shared locked vnodermacklem2013-12-253-5/+12
| | | | | | | | when a Getattr for a file is done by a client other than the one that holds the file's delegation. This would only happen when delegations are enabled and the problem is fixed by this patch. MFC after: 1 week
* An intermittent problem with NFSv4 exporting of ZFS snapshots wasrmacklem2013-12-241-0/+37
| | | | | | | | | | | | | | | | | | | | reported to the freebsd-fs mailing list. I believe the problem was caused by the Readdir operation using VFS_VGET() for a snapshot file entry instead of VOP_LOOKUP(). This would not occur for NFSv3, since it will do a VFS_VGET() of "." which fails with ENOTSUPP at the beginning of the directory, whereas NFSv4 does not check "." or "..". This patch adds a call to VFS_VGET() for the directory being read to check for ENOTSUPP. I also observed that the mount_on_fileid and fsid attributes were not correct at the snapshot's auto mountpoints when looking at packet traces for the Readdir. This patch fixes the attributes by doing a check for different v_mount structure, even if the vnode v_mountedhere is not set. Reported by: jas@cse.yorku.ca Tested by: jas@cse.yorku.ca Reviewed by: asomers MFC after: 1 week
* The NFSv4 client was passing both the p and cred arguments tormacklem2013-12-242-2/+7
| | | | | | | | | nfsv4_fillattr() as NULLs for the Getattr callback. This caused nfsv4_fillattr() to not fill in the Change attribute for the reply. I believe this was a violation of the RFC, but had little effect on server behaviour. This patch passes a non-NULL p argument to fix this. MFC after: 1 week
* ext2fs: make the hashing algorithm match the linux code.pfg2013-12-231-2/+2
| | | | | | | | There appears to be a hash function compatibility issue. The code is currently disabled but fix it nevertheless. PR: kern/183230 MFC after: 3 days
* The NFSv4.1 client didn't return NFSv4.1 specific error codesrmacklem2013-12-231-6/+11
| | | | | | | | for the Getattr and Recall callbacks. This patch fixes it. Since the NFSv4.1 specific error codes would only happen for abnormal circumstances, this patch has little effect, in practice. MFC after: 1 week
* Fix RPC server threads file handle affinity to work better with ZFS.mav2013-12-231-7/+11
| | | | | | | | | | | | Instead of taking 8 specific bytes of file handle to identify file during RPC thread affinity handling, use trivial hash of the full file handle. ZFS's struct zfid_short does not have padding field after the length field, as a result, originally picked 8 bytes are losing lower 16 bits of object ID, causing many false matches and unneeded requests affinity to same thread. This fix substantially improves NFS server latency and scalability in SPEC NFS benchmark by more flexible use of multiple NFS threads. Sponsored by: iXsystems, Inc.
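The effect described above can be sketched in a few lines. Everything here is an assumption made for illustration: the function names are hypothetical, the fixed window is simply the first 8 bytes, and the multiply-by-31 fold stands in for whatever "trivial hash" the commit actually uses. The sketch only shows why a key taken from a fixed byte window collides when handles differ outside that window, while a hash over the whole handle does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Key built from a fixed 8-byte window; bytes outside it are ignored. */
static uint64_t
sk_key_fixed8(const uint8_t *fh, size_t len)
{
	uint64_t key = 0;

	assert(fh != NULL);
	memcpy(&key, fh, len < sizeof(key) ? len : sizeof(key));
	return (key);
}

/* Key built by folding every byte of the handle into a simple hash. */
static uint64_t
sk_key_fullhash(const uint8_t *fh, size_t len)
{
	uint64_t hash = 0;
	size_t i;

	assert(fh != NULL);
	for (i = 0; i < len; i++)
		hash = hash * 31 + fh[i];
	return (hash);
}
```

Two 12-byte handles that differ only in byte 10 get the same key from `sk_key_fixed8` and different keys from `sk_key_fullhash`, which is the kind of false match the commit message describes.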
* Do not allow O_EXEC opens for fifo, return EINVAL.kib2013-12-171-1/+1
| | | | | | | | | Besides not making sense, open(O_EXEC) for fifo creates fifoinfo with zero readers and writers counts, which causes premature free of pipes. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Fix long known bug with handling device aliases residing not in devfs root.mav2013-12-121-4/+12
| | | | | | | | | Historically creation of device aliases created symbolic links using only name of target device as a link target, not considering current directory. Fix that by adding number of "../" chunks to the target device name, required to get out of the current directory to devfs root first. MFC after: 1 month
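The "../" prefixing described above can be sketched as follows, assuming that alias and target names are both given relative to the devfs root and that one "../" is needed per directory component in the alias name. The function `sk_alias_target` and the example device names are hypothetical and are not the devfs implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: build the symlink target for an alias.
 * One "../" is emitted for each '/' in the alias name, followed by the
 * target name. Returns 0 on success, -1 if the buffer is too small.
 */
static int
sk_alias_target(const char *alias, const char *target, char *buf,
    size_t buflen)
{
	size_t used = 0;
	size_t tlen;
	const char *p;

	assert(alias != NULL && target != NULL && buf != NULL);
	for (p = alias; *p != '\0'; p++) {
		if (*p != '/')
			continue;
		if (used + 3 >= buflen)
			return (-1);
		memcpy(buf + used, "../", 3);
		used += 3;
	}
	tlen = strlen(target);
	if (used + tlen + 1 > buflen)
		return (-1);
	memcpy(buf + used, target, tlen + 1);	/* copies the terminator */
	return (0);
}
```

An alias in the devfs root, such as `foo` pointing at `ada0`, keeps the plain target, while an alias two directories deep, such as `disk/by-id/foo`, gets `../../ada0`.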
* For software builds, the NFS client does many smallrmacklem2013-12-074-11/+42
| | | | | | | | | | | | | | | | | | | | | synchronous (with FILE_SYNC) writes because non-contiguous byte ranges in the same buffer cache block are being written. This patch adds a new mount option "noncontigwr" which allows the non-contiguous byte ranges to be combined, with the dirty byte range becoming the superset of the bytes that are dirty, if the file has not been file locked. This reduces the number of writes significantly for software builds. The only case where this change might break existing applications is where an application is writing non-overlapping byte ranges within the same buffer cache block of a file from multiple clients concurrently. Since such an application would normally do file locking on the file, avoiding the byte range merge for files that have been file locked should be sufficient for most (maybe all?) cases. Submitted by: jhb (earlier version) Reviewed by: kib MFC after: 3 weeks
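The merge rule described in the entry above can be sketched like this. The structure and function names are hypothetical and the buffer cache is reduced to a single half-open dirty range; the real NFS client code is more involved. The sketch assumes that a write which cannot be merged forces the existing dirty range to be written out first, which is signalled here by returning false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical dirty range [off, end) within one buffer; end == 0 means clean. */
struct sk_range {
	int off;
	int end;
};

/*
 * Try to fold a new write [woff, wend) into the dirty range.
 * Non-contiguous writes are merged into the superset only when the
 * noncontigwr option is set and the file has not been file locked.
 */
static bool
sk_merge_dirty(struct sk_range *dirty, int woff, int wend, bool noncontigwr,
    bool file_locked)
{
	assert(woff >= 0 && wend > woff);

	if (dirty->end == 0) {
		dirty->off = woff;
		dirty->end = wend;
		return (true);
	}
	if ((woff > dirty->end || wend < dirty->off) &&
	    (!noncontigwr || file_locked))
		return (false);		/* caller must flush first */
	if (woff < dirty->off)
		dirty->off = woff;
	if (wend > dirty->end)
		dirty->end = wend;
	return (true);
}
```

Because the merged range is a superset, bytes between the two writes are written back as well, which is why the entry restricts merging to files that have not been locked.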
* ext2fs: add two new reserved inodes.pfg2013-12-041-0/+2
| | | | | | | | | | | According to online documentation [1], Ext4 has two new "special" inodes so add the new exclude and replica inodes. Reference: [1] https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout Reported by: Mike Ma MFC after: 3 weeks
* - Nuke a second copy of nfscl_attrcache extern declarations from underpluknet2013-11-261-12/+5
| | | | | ifdef KDTRACE_HOOKS. This fixes kernel build with options KDTRACE_HOOKS. - Fix style inconsistencies.
* Fix build, attempt two.glebius2013-11-261-3/+10
|
* Fix build.glebius2013-11-261-15/+0
|
* - For kernel compiled only with KDTRACE_HOOKS and not any lock debuggingattilio2013-11-256-9/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | option, unbreak the lock tracing release semantic by embedding calls to LOCKSTAT_PROFILE_RELEASE_LOCK() directly in the inlined version of the releasing functions for mutex, rwlock and sxlock. Failing to do so skips the lockstat_probe_func invocation for unlocking. - As part of the LOCKSTAT support is inlined in mutex operation, for kernel compiled without lock debugging options, potentially every consumer must be compiled including opt_kdtrace.h. Fix this by moving KDTRACE_HOOKS into opt_global.h and remove the dependency by opt_kdtrace.h for all files, as now only KDTRACE_FRAMES is linked there and it is only used as a compile-time stub [0]. [0] immediately shows some new bug as DTRACE-derived support for debug in sfxge is broken and it was never really tested. As it was not including correctly opt_kdtrace.h before it was never enabled so it was kept broken for a while. Fix this by using a protection stub, leaving sfxge driver authors the responsibility for fixing it appropriately [1]. Sponsored by: EMC / Isilon storage division Discussed with: rstone [0] Reported by: rstone [1] Discussed with: philip
* Redo r258088 to avoid relying on signed arithmetic overflow, sincekib2013-11-201-9/+4
| | | | | | | | | | | | compiler interprets this as an undefined behaviour. Instead, ensure that the sum of uio_offset and uio_resid is below OFF_MAX using the operation which cannot overflow. Reported and tested by: pho Discussed with: bde Approved by: des (pseudofs maintainer) Sponsored by: The FreeBSD Foundation MFC after: 1 week
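A check of the kind described above can be written without ever forming the sum. The sketch below assumes a 64-bit signed offset type and uses the hypothetical names `sk_range_ok` and `SK_OFF_MAX`; whether the real bound is strict or inclusive is not stated in the entry, so the inclusive form here is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for OFF_MAX, assuming a 64-bit signed off_t. */
#define SK_OFF_MAX INT64_MAX

/*
 * Return 1 if offset + resid stays within SK_OFF_MAX, 0 otherwise.
 * The subtraction cannot overflow because offset is known to be
 * non-negative by the time it is evaluated.
 */
static int
sk_range_ok(int64_t offset, int64_t resid)
{
	if (offset < 0 || resid < 0)
		return (0);
	return (resid <= SK_OFF_MAX - offset);
}
```

Comparing `resid` against `SK_OFF_MAX - offset` gives the same answer as testing the sum, but stays inside the representable range, so the compiler has no undefined behaviour to exploit.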
* Remove useless comparisons of assigned offset and resid with thekib2013-11-131-4/+6
| | | | | | | | | | | | | sources from uio. Both uio_offset and offset, and uio_resid and resid have the same types for some time. Add check for buflen overflow by comparing the buflen with both offset and resid (vs. comparing with offset only, as it is currently done). Reported and tested by: pho Approved by: des (pseudofs maintainer) Sponsored by: The FreeBSD Foundation MFC after: 1 week
* Fix an NFSv4.1 client specific case where a forced dismount would hang.rmacklem2013-11-093-7/+21
| | | | | | | | The hang occurred in nfsv4_setsequence() when it couldn't find an available session slot and is fixed by checking for a forced dismount in progress and just returning for this case. MFC after: 1 month
* During code inspection, I spotted that there was a code path wherermacklem2013-11-031-10/+11
| | | | | | | | | | CLNT_CONTROL() would be called on "client" after it was released via CLNT_RELEASE(). It was unlikely that this code path gets executed and I have not heard of any problem report caused by this bug. This patch fixes the code so that this cannot happen. MFC after: 2 months
* The r48589 promised to remove implicit inclusion of if_var.h soon. Prepareglebius2013-10-261-0/+1
| | | | | | | | to this event, adding if_var.h to files that do need it. Also, include all includes that now are included due to implicit pollution via if_var.h Sponsored by: Netflix Sponsored by: Nginx, Inc.
* UFS2: make di_extsize unsigned.pfg2013-10-241-1/+1
| | | | | | | | di_extsize is the EA size and as such it should be unsigned. Adjust related types for consistency. Reviewed by: mckusick (previous version) MFC after: 3 weeks
* Similar to debug.iosize_max_clamp sysctl, introducekib2013-10-151-0/+4
| | | | | | | | | devfs_iosize_max_clamp sysctl, which allows/disables SSIZE_MAX-sized i/o requests on the devfs files. Sponsored by: The FreeBSD Foundation Reminded by: Dmitry Sivachenko <trtrmitya@gmail.com> MFC after: 1 week
* Remove two instances of ARGSUSED comment, and wrap lines nearby thekib2013-10-151-4/+4
| | | | | | | code that is to be changed. Sponsored by: The FreeBSD Foundation MFC after: 1 week
* NULL stale pointers (should be a no-op as they should no longer bejmg2013-09-251-0/+5
| | | | | | | | | used)... Reviewed by: dteske Approved by: re (kib) Sponsored by: Vicor MFC after: 3 days
* fix a bug where we access a bread buffer after we have brelse'd it...jmg2013-09-251-5/+5
| | | | | | | | | | | The kernel normally didn't unmap/context switch away before we accessed the buffer most of the time, but under heavy I/O pressure and lots of mount/unmounting this would cause a fault on nofault panic... Reviewed by: dteske Approved by: re (kib) Sponsored by: Vicor MFC after: 3 days
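The bug class in the entry above, reading a buffer after it has been released, is usually fixed by copying what is needed before the release. The sketch below models that ordering with a hypothetical `struct sk_buf` and `sk_release()`, which are stand-ins and not the kernel's `struct buf` or `brelse()`. The release function deliberately poisons the data so that a late read would be visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a held buffer. */
struct sk_buf {
	uint32_t data[4];
	int held;
};

/* Stand-in for releasing the buffer: contents are invalid afterwards. */
static void
sk_release(struct sk_buf *bp)
{
	memset(bp->data, 0xdb, sizeof(bp->data));
	bp->held = 0;
}

/* Copy the wanted field out first, then release the buffer. */
static uint32_t
sk_read_field(struct sk_buf *bp, size_t idx)
{
	uint32_t value;

	assert(bp->held);
	assert(idx < sizeof(bp->data) / sizeof(bp->data[0]));
	value = bp->data[idx];	/* read while the buffer is still held */
	sk_release(bp);
	return (value);
}
```

Swapping the copy and the release in `sk_read_field` would return the poison pattern instead of the stored value, which mirrors how the original bug only showed up once the released buffer was actually reused.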
* Fix the length calculation for the final block of a sendfile(2)des2013-09-101-0/+10
| | | | | | | | | | | | | | | | | | | | transmission which could be tricked into rounding up to the nearest page size, leaking up to a page of kernel memory. [SA-13:11] In IPv6 and NetATM, stop SIOCSIFADDR, SIOCSIFBRDADDR, SIOCSIFDSTADDR and SIOCSIFNETMASK at the socket layer rather than pass them on to the link layer without validation or credential checks. [SA-13:12] Prevent cross-mount hardlinks between different nullfs mounts of the same underlying filesystem. [SA-13:13] Security: CVE-2013-5666 Security: FreeBSD-SA-13:11.sendfile Security: CVE-2013-5691 Security: FreeBSD-SA-13:12.ifioctl Security: CVE-2013-5710 Security: FreeBSD-SA-13:13.nullfs Approved by: re
* ext2fs: temporarily disable htree directory index.pfg2013-09-072-0/+4
| | | | | | | | | | | | Our code does not consider yet the case of hash collisions. This is a rather annoying situation where two or more files that happen to have the same hash value will not appear accessible. The situation is not difficult to work-around but given that things will just work without enabling htree we will save possible embarrassments for the next release. Reported by: Kevin Lo