FreeBSD-src - Raptor Engineering's fork of pfsense FreeBSD src with pfSense changes

	Commit message (Collapse)	Author	Age	Files	Lines
*	MFC of 291244, 291380, 291459, 291460, 291671, and 291743:	mckusick	2015-12-30	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This MFC includes changes to better manage the vnode freelist and to streamline the allocation and freeing of vnodes. Note that to maintain the KPI the VI_AGE flag is left defined in sys/vnode.h though its use is dropped as described in 291380. To maintain KBI the vfs.vlru_alloc_cache_src sysctl variable remains though it no longer has any effect as described in 291244. MFC of 291244: Move the comment about resident pages preventing vnode from leaving active list, into the header comment for vdrop(), which is the function that decides whether to leave the vnode on the list. Note that dirty page write-out in vinactive() is asynchronous. Discussed with: alc Sponsored by: The FreeBSD Foundation MFC of 291380: Remove VI_AGE vnode iflag, it is unused. Noted by: bde Sponsored by: The FreeBSD Foundation MFC of 291459: For performance reasons, it is useful to have a single string used as the name of a filesystem when setting it as the first parameter to the getnewvnode() function. Most filesystems call getnewvnode from just one place so can use a literal string as the first parameter. However, NFS calls getnewvnode from two places, so we create a global constant string that can be used by the two instances. This change also collapses two instances of getnewvnode() in the UFS filesystem to a single call. Reviewed by: kib Tested by: Peter Holm MFC of 291460: As the kernel allocates and frees vnodes, it fully initializes them on every allocation and fully releases them on every free. These are not trivial costs: it starts by zeroing a large structure then initializes a mutex, a lock manager lock, an rw lock, four lists, and six pointers. And looking at vfs.vnodes_created, these operations are being done millions of times an hour on a busy machine. As a performance optimization, this code update uses the uma_init and uma_fini routines to do these initializations and cleanups only as the vnodes enter and leave the vnode_zone. With this change the initializations are only done kern.maxvnodes times at system startup and then only rarely again. The frees are done only if the vnode_zone shrinks which never happens in practice. For those curious about the avoided work, look at the vnode_init() and vnode_fini() functions in kern/vfs_subr.c to see the code that has been removed from the main vnode allocation/free path. Reviewed by: kib Tested by: Peter Holm MFC of 291671: We need to zero out the union of pointers in a freed vnode structure. Fix from: Mateusz Guzik Tested by: Jason Unovitch MFC of 291743: We need to zero out the clustering variables in a freed vnode structure. For completeness add a VNASSERT that there are no threads waiting on a range lock (this was previously checked on every vnode free). Reported by; Rick Macklem Fix from: Mateusz Guzik
*	MFC r292541:	kib	2015-12-28	1	-32/+32
\| \| \| \|	Recheck curthread->td_su after the VFS_SYNC() call.
*	MFC r291936:	kib	2015-12-21	1	-8/+3
\| \| \| \| \|	Update ctime when atime or birthtime are updated. Cleanup setting of ctime/mtime/birthtime.
*	MFC r290047:	kib	2015-11-10	1	-2/+6
\| \| \| \| \|	Do not perform read-ahead for BA_CLRBUF request when we are low on memory or when dirty buffer queue is too large.
*	MFC r287361:	kib	2015-09-16	1	-19/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Handle excess of D_NEWBLK in the same way as excess of D_INODEDEP and D_DIRREM, by scheduling ast to flush dependencies. For 32bit arches, reduce the total amount of allowed dependencies by two. MFC r287479: Declare the writes around the call to VFS_SYNC() in softdep_ast_cleanup_proc(). MFC r287483: Do not consume extra reference.
*	MFC r284887:	kib	2015-07-11	1	-1/+18
\| \| \| \| \| \| \| \| \|	Handle errors from background write of the cylinder group blocks. MFC r284927: Simplify code. Approved by: re (gjb)
*	MFC r284495:	kib	2015-07-01	1	-3/+11
\| \| \| \| \|	Keep a vnode which is freed but still owing inactivation, on the active list. This closes a race where such vnode is not msync-ed until reboot.
*	MFC r283832:	kib	2015-06-14	1	-2/+0
\| \| \| \|	Remove unused variable.
*	MFC r283968:	kib	2015-06-10	1	-2/+2
\| \| \| \| \| \|	Syncing a directory vnode might drop the vnode lock in the softdep_sync() similarly to the regular vnode sync. Allow retry for both vnode types.
*	MFC r283604:	kib	2015-06-10	1	-38/+17
\| \| \| \|	Remove NODELAY flag.
*	MFC r283600:	kib	2015-06-10	1	-14/+104
\| \| \| \| \| \| \| \|	Perform SU cleanup in the AST handler. Do not sleep waiting for SU cleanup while owning vnode lock. On MFC, for KBI stability, td_su member was moved to the end of the struct thread.
*	MFC r283735:	kib	2015-06-05	6	-36/+2
\| \| \| \|	Remove several write-only variables.
*	MFC of 269533:	mckusick	2015-05-28	3	-16/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Limit the number of cylinder groups that will be searched when trying to build a cluster. The limit is tunable using the sysctl vfs.ffs.maxclustersearch. The current limit is 10 cylinder groups per block allocation. It was previously limited to the number of cylinder groups in the filesystem per block allocation. When there were no clusters of the needed size left, it repeatedly searched the whole filesystem for a non-existent cluster on every block allocation. The result was very slow filesystem allocation with 100% CPU utilization. The old behavior can be had by setting vfs.ffs.maxclustersearch to a huge number (1,000,000). This change affects only the layout policy routines so is not able to interfere with the integrity of the filesystem. Reported by: Dmitry Sivachenko (demon@) Tested by: Dmitry Sivachenko (demon@)
*	MFC: r281562	rmacklem	2015-04-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	File systems that do not use the buffer cache (such as ZFS) must use VOP_FSYNC() to perform the NFS server's Commit operation. This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which is set by file systems that use the buffer cache. If this flag is not set, the NFS server always does a VOP_FSYNC(). This should be ok for old file system modules that do not set MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although it might not be optimal for file systems that use the buffer cache.
*	MFC r280760:	kib	2015-04-10	2	-24/+68
\| \| \| \| \| \| \|	Fix the hand after the immediate reboot after the init binary is unlinked. MFC r280763: Fix build (with gcc).
*	Merge r263233 from HEAD to stable/10:	rwatson	2015-03-19	1	-1/+1
\| \| \| \| \| \| \| \| \|	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. Sponsored by: Google, Inc.
*	MFC r277922:	kib	2015-02-13	2	-12/+21
\| \| \| \| \| \| \| \|	When mounting SU-enabled mount point, wait until the softdep_flush() thread started and incremented the stat_flush_threads. MFC r278257: Partially revert r277922.
*	MFC r277794:	kib	2015-02-03	2	-3/+15
\| \| \| \| \| \|	The sys_quotactl() contract demands that the mount point is vfs_unbusy()ed when the cmd is Q_QUOTAON, regardless of other input parameters or error return.
*	MFC r276007:	kib	2015-01-04	1	-1/+3
\| \| \| \|	Handle MAKEENTRY cnp flag in the VOP_CREATE().
*	MFC r275897:	kib	2015-01-01	2	-2/+3
\| \| \| \| \|	Set NOCACHE flag for CREATE namei() calls, do not specially handle MAKEENTRY in VOP_LOOKUP().
*	MFC r273967:	kib	2014-11-09	1	-5/+6
\| \| \| \| \|	Only trigger a panic when forced operation is done. Convert direct panic() call into KASSERT().
*	MFC r272952:	kib	2014-10-18	1	-1/+1
\| \| \| \|	Do not set IN_ACCESS flag for read-only mounts.
*	MFC r270797:	kib	2014-09-05	1	-0/+13
\| \| \| \| \| \| \| \|	Direct access to the quota files, in particular, lookup, causes lock conflict with the quota metadata access. Mark quota vnode lock as recursive and always exclusive to avoid the problem. Approved by: re (gjb)
*	MFC r270204:	kib	2014-08-27	1	-9/+0
\| \| \| \|	Do not busy the UFS mount point inside VOP_RENAME().
*	MFC r270203:	kib	2014-08-27	1	-1/+1
\| \| \| \|	Correct the test for condition to suspend UFS filesystem during unmount.
*	MFC of 269533 (by mckusick):	mckusick	2014-08-18	2	-109/+254
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for multi-threading of soft updates. Replace a single soft updates thread with a thread per FFS-filesystem mount point. The threads are associated with the bufdaemon process. Reviewed by: kib Tested by: Peter Holm and Scott Long MFC after: 2 weeks Sponsored by: Netflix MFC of 269853 (by kib): Revision r269457 removed the Giant around mount and unmount code, but r269533, which was tested before r269457 was committed, implicitely relied on the Giant to protect the manipulations of the softdepmounts list. Use softdep global lock consistently to guarantee the list structure now. Insert the new struct mount_softdeps into the softdepmounts only after it is sufficiently initialized, to prevent softdep_speedup() from accessing bare memory. Similarly, remove struct mount_softdeps for the unmounted filesystem from the tailq before destroying structure rwlock. Reported and tested by: pho Reviewed by: mckusick Sponsored by: The FreeBSD Foundation
*	MFC of 269674:	mckusick	2014-08-14	1	-0/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The journal is only prepared to handle full-size block numbers, so we have to adjust freeblk records to reflect the change to a full-size block. For example, suppose we have a block made up of fragments 8-15 and want to free its last two fragments. We are given a request that says: FREEBLK ino=5, blkno=14, lbn=0, frags=2, oldfrags=0 where frags are the number of frags to free and oldfrags are the number of fragments to keep. To block align it, we have to change it to have a valid full-size blkno, so it becomes: FREEBLK ino=5, blkno=8, lbn=0, frags=2, oldfrags=6 Submitted by: Mikihito Takehara Tested by: Mikihito Takehara Reviewed by: Jeff Roberson
*	MFC r268764:	kib	2014-07-30	1	-4/+0
\| \| \| \| \|	Check for the cross-device cross-link attempt in the VFS, instead of VOP_LINK() implemenations.
*	MFC r268612:	kib	2014-07-28	1	-44/+7
\| \| \| \| \| \| \|	Add helper helper vfs_write_suspend_umnt(). Fix the bug in the FFS unmount, when suspension failed, the ufs extattrs were not reinitialized.
*	Merge r265463:	scottl	2014-07-01	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \|	Due to reasons unknown at this time, the system can be forced to write a journal block even when there are no journal entries to be written. Until the root cause is found, handle this case by ensuring that a valid journal segment is always written. Second, the data buffer used for writing journal entries was never being scrubbed of old data. Fix this. Submitted by: Takehara Mikihito Obtained from: Netflix, Inc.
*	MFC r267564:	kib	2014-06-24	1	-29/+2
\| \| \| \| \|	In msdosfs_setattr(), add a check for result of the utimes(2) permissions test. Refactor the permission checks for utimes(2).
*	MFC r267226:	kib	2014-06-15	1	-6/+4
\| \| \| \| \| \| \|	Initialize the pbuf counter for directio using SYSINIT. Mark ffs_rawread.c as requiring both ffs and directio options to be compiled into the kernel. Add ffs_rawread.c to the list of ufs.ko module' sources.
*	MFC r262814	scottl	2014-04-15	1	-2/+10
\| \| \| \| \| \| \| \| \| \| \|	- If we fail to do a non-blocking acquire of a buf lock while doing a waiting sync pass we need to do a blocking acquire and restart. Another thread, typically the buf daemon, may have this buf locked and if we don't wait we can fail to sync the file. This lead to a great variety of softdep panics because we rely on all dependencies being flushed before proceeding in several cases. Submitted by: jeffr
*	MFC r262812	scottl	2014-04-15	1	-3/+6
\| \| \| \| \| \| \| \|	- Gracefully handle truncation failures when trying to shrink directories. This could cause dirhash panics since the dirhash state would be successfully truncated while the directory was not. Submitted by: jeffr
*	MFC r262678;	pfg	2014-03-05	17	-196/+196
\| \| \| \| \| \| \| \| \| \|	ufs: small formatting fixes. Cleanup some extra space. Use of tabs vs. spaces. No functional change. Reviewed by: mckusick
*	MFC of 260088:	mckusick	2014-01-17	1	-4/+4
\| \| \| \| \| \| \| \|	Fine tune filesystem block allocations under low free-space conditions (-r254995) based on further operational experience. Submitted by: Dmitry Sivachenko Fix Tested by: Dmitry Sivachenko
*	MFC of 260079:	mckusick	2014-01-17	1	-10/+4
\| \| \| \|	Properly handle unsigned comparison.
*	MFC of 256801, 256803, 256808, 256812, 256817, 256845, and 256860.	mckusick	2013-12-30	6	-702/+1023
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This set of changes puts in place the infrastructure to allow soft updates to be multi-threaded. It introduces no functional changes from its current operation. MFC of 256860: Allow kernels without options SOFTUPDATES to build. This should fix the embedded tinderboxes. Reviewed by: emaste MFC of 256845: Fix build problem on ARM (which defaults to building without soft updates). Reported by: Tinderbox Sponsored by: Netflix MFC of 256817: Restructuring of the soft updates code to set it up so that the single kernel-wide soft update lock can be replaced with a per-filesystem soft-updates lock. This per-filesystem lock will allow each filesystem to have its own soft-updates flushing thread rather than being limited to a single soft-updates flushing thread for the entire kernel. Move soft update variables out of the ufsmount structure and into their own mount_softdeps structure referenced by ufsmount field um_softdep. Eventually the per-filesystem lock will be in this structure. For now there is simply a pointer to the kernel-wide soft updates lock. Change all instances of ACQUIRE_LOCK and FREE_LOCK to pass the lock pointer in the mount_softdeps structure instead of a pointer to the kernel-wide soft-updates lock. Replace the five hash tables used by soft updates with per-filesystem copies of these tables allocated in the mount_softdeps structure. Several functions that flush dependencies when too many are allocated in the kernel used to operate across all filesystems. They are now parameterized to flush dependencies from a specified filesystem. For now, we stick with the round-robin flushing strategy when the kernel as a whole has too many dependencies allocated. While there are many lines of changes, there should be no functional change in the operation of soft updates. Tested by: Peter Holm and Scott Long Sponsored by: Netflix MFC of 256812: Fourth of several cleanups to soft dependency implementation. Add KASSERTS that soft dependency functions only get called for filesystems running with soft dependencies. Calling these functions when soft updates are not compiled into the system become panic's. No functional change. Tested by: Peter Holm and Scott Long Sponsored by: Netflix MFC of 256808: Third of several cleanups to soft dependency implementation. Ensure that softdep_unmount() and softdep_setup_sbupdate() only get called for filesystems running with soft dependencies. No functional change. Tested by: Peter Holm and Scott Long Sponsored by: Netflix MFC of 256803: Second of several cleanups to soft dependency implementation. Delete two unused functions in ffs_sofdep.c. No functional change. Tested by: Peter Holm and Scott Long Sponsored by: Netflix MFC of 256801: First of several cleanups to soft dependency implementation. Convert three functions exported from ffs_softdep.c to static functions as they are not used outside of ffs_softdep.c. No functional change. Tested by: Peter Holm and Scott Long Sponsored by: Netflix
*	MFC of 258789:	mckusick	2013-12-29	1	-2/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We needlessly panic when trying to flush MKDIR_PARENT dependencies. We had previously tried to flush all MKDIR_PARENT dependencies (and all the NEWBLOCK pagedeps) by calling ffs_update(). However this will only resolve these dependencies in direct blocks. So very large directories with MKDIR_PARENT dependencies in indirect blocks had not yet gotten flushed. As the directory is in the midst of doing a complete sync, we simply defer the checking of the MKDIR_PARENT dependencies until the indirect blocks have been sync'ed. Reported by: Shawn Wallbridge of imaginaryforces.com Tested by: John-Mark Gurney <jmg@funkthat.com> PR: 183424
*	MFC r256448, r257029;	pfg	2013-12-11	2	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	Make di_blocks unsigned in UFS1 as is the case already for UFS2. Most of the code between UFS1 and UFS2 is shared so this change is pretty safe. Not only this makes UFS1 and 2 consistent but it also matches what NetBSD and MacOS X have for some years now. UFS2: make di_extsize unsigned. di_extsize is the EA size and as such it should be unsigned. Adjust related types for consistency. Reviewed by: mckusick
*	Change the cap_rights_t type from uint64_t to a structure that we can extend	pjd	2013-09-05	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this: struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; }; The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future. To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg. #define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL) We still support aliases that combine few rights, but the rights have to belong to the same array element, eg: #define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL) #define CAP_FCHMODAT (CAP_FCHMOD \| CAP_LOOKUP) There is new API to manage the new cap_rights_t structure: cap_rights_t cap_rights_init(cap_rights_t rights, ...); void cap_rights_set(cap_rights_t rights, ...); void cap_rights_clear(cap_rights_t rights, ...); bool cap_rights_is_set(const cap_rights_t rights, ...); bool cap_rights_is_valid(const cap_rights_t rights); void cap_rights_merge(cap_rights_t dst, const cap_rights_t src); void cap_rights_remove(cap_rights_t dst, const cap_rights_t src); bool cap_rights_contains(const cap_rights_t big, const cap_rights_t little); Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg: cap_rights_t rights; cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT); There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg: #define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...); Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1: cap_rights_init(&rights, CAP_LOOKUP \| CAP_PDKILL); Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
*	In looking at block layouts as part of fixing filesystem block	mckusick	2013-08-28	1	-2/+2
\| \| \| \| \| \| \| \|	allocations under low free-space conditions (-r254995), determine that old block-preference search order used before -r249782 worked a bit better. This change reverts to that block-preference search order. MFC after: 2 weeks
*	A performance problem was reported in PR kern/181226:	mckusick	2013-08-28	1	-2/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I have 25TB Dell PERC 6 RAID5 array. When it becomes almost full (10-20GB free), processes which write data to it start eating 100% CPU and write speed drops below 1MB/sec (normally to gives 400MB/sec). The revision at which it first became apparent was http://svnweb.freebsd.org/changeset/base/249782. The offending change reserved an area in each cylinder group to store metadata. The new algorithm attempts to save this area for metadata and allows its use for non-metadata only after all the data areas have been exhausted. The size of the reserved area defaults to half of minfree, so the filesystem reports full before the data area can completely fill. However, in this report, the filesystem has had minfree reduced to 1% thus forcing the metadata area to be used for data. As the filesystem approached full, it had only metadata areas left to allocate. The result was that every block allocation had to scan summary data for 30,000 cylinder groups before falling back to searching up to 30,000 metadata areas. The fix is to give up on saving the metadata areas once the free space reserve drops below 2%. The effect of this change is to use the old algorithm of just accepting the first available block that we find. Since most filesystems use the default 5% minfree, this will have no effect on their operation. For those that want to push to the limit, they will get their crappy block placements quickly. Submitted by: Dmitry Sivachenko Fix Tested by: Dmitry Sivachenko PR: kern/181226 MFC after: 2 weeks
*	Take a very small step toward the Century of the Anchovy by increasing the	ivoras	2013-08-28	1	-1/+1
\| \| \| \| \|	time dirhash entries stay in memory before being considered for eviction to 1 minute.
*	Expand the use of stat(2) flags to allow storing some Windows/DOS	ken	2013-08-21	1	-3/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	and CIFS file attributes as BSD stat(2) flags. This work is intended to be compatible with ZFS, the Solaris CIFS server's interaction with ZFS, somewhat compatible with MacOS X, and of course compatible with Windows. The Windows attributes that are implemented were chosen based on the attributes that ZFS already supports. The summary of the flags is as follows: UF_SYSTEM: Command line name: "system" or "usystem" ZFS name: XAT_SYSTEM, ZFS_SYSTEM Windows: FILE_ATTRIBUTE_SYSTEM This flag means that the file is used by the operating system. FreeBSD does not enforce any special handling when this flag is set. UF_SPARSE: Command line name: "sparse" or "usparse" ZFS name: XAT_SPARSE, ZFS_SPARSE Windows: FILE_ATTRIBUTE_SPARSE_FILE This flag means that the file is sparse. Although ZFS may modify this in some situations, there is not generally any special handling for this flag. UF_OFFLINE: Command line name: "offline" or "uoffline" ZFS name: XAT_OFFLINE, ZFS_OFFLINE Windows: FILE_ATTRIBUTE_OFFLINE This flag means that the file has been moved to offline storage. FreeBSD does not have any special handling for this flag. UF_REPARSE: Command line name: "reparse" or "ureparse" ZFS name: XAT_REPARSE, ZFS_REPARSE Windows: FILE_ATTRIBUTE_REPARSE_POINT This flag means that the file is a Windows reparse point. ZFS has special handling code for reparse points, but we don't currently have the other supporting infrastructure for them. UF_HIDDEN: Command line name: "hidden" or "uhidden" ZFS name: XAT_HIDDEN, ZFS_HIDDEN Windows: FILE_ATTRIBUTE_HIDDEN This flag means that the file may be excluded from a directory listing if the application honors it. FreeBSD has no special handling for this flag. The name and bit definition for UF_HIDDEN are identical to the definition in MacOS X. UF_READONLY: Command line name: "urdonly", "rdonly", "readonly" ZFS name: XAT_READONLY, ZFS_READONLY Windows: FILE_ATTRIBUTE_READONLY This flag means that the file may not written or appended, but its attributes may be changed. ZFS currently enforces this flag, but Illumos developers have discussed disabling enforcement. The behavior of this flag is different than MacOS X. MacOS X uses UF_IMMUTABLE to represent the DOS readonly permission, but that flag has a stronger meaning than the semantics of DOS readonly permissions. UF_ARCHIVE: Command line name: "uarch", "uarchive" ZFS_NAME: XAT_ARCHIVE, ZFS_ARCHIVE Windows name: FILE_ATTRIBUTE_ARCHIVE The UF_ARCHIVED flag means that the file has changed and needs to be archived. The meaning is same as the Windows FILE_ATTRIBUTE_ARCHIVE attribute, and the ZFS XAT_ARCHIVE and ZFS_ARCHIVE attribute. msdosfs and ZFS have special handling for this flag. i.e. they will set it when the file changes. sys/param.h: Bump __FreeBSD_version to 1000047 for the addition of new stat(2) flags. chflags.1: Document the new command line flag names (e.g. "system", "hidden") available to the user. ls.1: Reference chflags(1) for a list of file flags and their meanings. strtofflags.c: Implement the mapping between the new command line flag names and new stat(2) flags. chflags.2: Document all of the new stat(2) flags, and explain the intended behavior in a little more detail. Explain how they map to Windows file attributes. Different filesystems behave differently with respect to flags, so warn the application developer to take care when using them. zfs_vnops.c: Add support for getting and setting the UF_ARCHIVE, UF_READONLY, UF_SYSTEM, UF_HIDDEN, UF_REPARSE, UF_OFFLINE, and UF_SPARSE flags. All of these flags are implemented using attributes that ZFS already supports, so the on-disk format has not changed. ZFS currently doesn't allow setting the UF_REPARSE flag, and we don't really have the other infrastructure to support reparse points. msdosfs_denode.c, msdosfs_vnops.c: Add support for getting and setting UF_HIDDEN, UF_SYSTEM and UF_READONLY in MSDOSFS. It supported SF_ARCHIVED, but this has been changed to be UF_ARCHIVE, which has the same semantics as the DOS archive attribute instead of inverse semantics like SF_ARCHIVED. After discussion with Bruce Evans, change several things in the msdosfs behavior: Use UF_READONLY to indicate whether a file is writeable instead of file permissions, but don't actually enforce it. Refuse to change attributes on the root directory, because it is special in FAT filesystems, but allow most other attribute changes on directories. Don't set the archive attribute on a directory when its modification time is updated. Windows and DOS don't set the archive attribute in that scenario, so we are now bug-for-bug compatible. smbfs_node.c, smbfs_vnops.c: Add support for UF_HIDDEN, UF_SYSTEM, UF_READONLY and UF_ARCHIVE in SMBFS. This is similar to changes that Apple has made in their version of SMBFS (as of smb-583.8, posted on opensource.apple.com), but not quite the same. We map SMB_FA_READONLY to UF_READONLY, because UF_READONLY is intended to match the semantics of the DOS readonly flag. The MacOS X code maps both UF_IMMUTABLE and SF_IMMUTABLE to SMB_FA_READONLY, but the immutable flags have stronger meaning than the DOS readonly bit. stat.h: Add definitions for UF_SYSTEM, UF_SPARSE, UF_OFFLINE, UF_REPARSE, UF_ARCHIVE, UF_READONLY and UF_HIDDEN. The definition of UF_HIDDEN is the same as the MacOS X definition. Add commented-out definitions of UF_COMPRESSED and UF_TRACKED. They are defined in MacOS X (as of 10.8.2), but we do not implement them (yet). ufs_vnops.c: Add support for getting and setting UF_ARCHIVE, UF_HIDDEN, UF_OFFLINE, UF_READONLY, UF_REPARSE, UF_SPARSE, and UF_SYSTEM in UFS. Alphabetize the flags that are supported. These new flags are only stored, UFS does not take any action if the flag is set. Sponsored by: Spectra Logic Reviewed by: bde (earlier version)
*	This bug fix is in a code path in rename taken when there is a	mckusick	2013-08-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	collision between a rename and an open system call for the same target file. Here, rename releases its vnode references, waits for the open to finish, and then restarts by reacquiring its needed vnode locks. In this case, rename was unlocking but failing to release its reference to one of its held vnodes. The effect was that even after all the actual references to the vnode had gone, the vnode still showed active references. For files that had been removed, their space was not reclaimed until the filesystem was forcibly unmounted. This bug manifested itself in the Postgres server which would leak/lose hundreds of files per day amounting to many gigabytes of disk space. This bug required shutting down Postgres, forcibly unmounting its filesystem, remounting its filesystem and restarting Postgres every few days to recover the lost space. Reported by: Dan Thomas and Palle Girgensohn Bug-fix by: kib Tested by: Dan Thomas and Palle Girgensohn MFC after: 2 weeks
*	With the addition of journalled soft updates, the "newblk" structures	mckusick	2013-08-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	persist much longer than previously. Historically we had at most 100 entries; now the count may reach a million. With the increased count we spent far too much time looking them up in the grossly undersized newblk hash table. Configure the newblk hash table to accurately reflect the number of entries that it must index. Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
*	To better understand performance problems with journalled soft updates,	mckusick	2013-08-05	1	-9/+43
\| \| \| \| \| \| \| \| \| \|	we need to collect the highest level of allocation for each of the different soft update dependency structures. This change collects these statistics and makes them available using `sysctl debug.softdep.highuse'. Reviewed by: kib Tested by: Peter Holm MFC after: 2 weeks
*	Update to comments describing block allocation policy.	mckusick	2013-07-14	1	-7/+6
\| \| \| \|	Submitted by: Bruce Evans
*	Only copy as much bytes as there in superblock, instead of the full	kib	2013-07-12	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	block copy, when copying the superblock into the snapshot. UFS1 does not align superblock on the block boundary, and bcopy runs off the end of the buffer. Reported by: Andre Albsmeier <Andre.Albsmeier@siemens.com> Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week