FreeBSD-src - Raptor Engineering's fork of pfsense FreeBSD src with pfSense changes

	Commit message	Author	Age	Files	Lines
*	MFC of 269533:	mckusick	2015-05-28	1	-16/+62
	Limit the number of cylinder groups that will be searched when trying to build a cluster. The limit is tunable using the sysctl vfs.ffs.maxclustersearch. The current limit is 10 cylinder groups per block allocation. It was previously limited to the number of cylinder groups in the filesystem per block allocation. When there were no clusters of the needed size left, it repeatedly searched the whole filesystem for a non-existent cluster on every block allocation. The result was very slow filesystem allocation with 100% CPU utilization. The old behavior can be had by setting vfs.ffs.maxclustersearch to a huge number (1,000,000). This change affects only the layout policy routines so is not able to interfere with the integrity of the filesystem. Reported by: Dmitry Sivachenko (demon@) Tested by: Dmitry Sivachenko (demon@)
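The bounded search described above can be sketched in C. This is a minimal illustration, assuming a hypothetical per-group array `maxcluster[]` that holds the largest free cluster in each cylinder group and a hypothetical function `find_cluster_cg()`. For simplicity it scans groups linearly with wrap-around rather than reproducing FFS's own search order.

```c
#include <assert.h>

/*
 * Hypothetical sketch: find a cylinder group whose largest free cluster
 * is at least len blocks, examining at most maxsearch groups (the role
 * played by vfs.ffs.maxclustersearch, default 10).
 * Returns the group index, or -1 if none was found within the limit.
 */
static int
find_cluster_cg(const int *maxcluster, int ncg, int start, int len,
    int maxsearch)
{
	int limit, i, cg;

	assert(ncg > 0 && start >= 0 && start < ncg);
	/* Never look at more groups than the filesystem has. */
	limit = maxsearch < ncg ? maxsearch : ncg;
	for (i = 0; i < limit; i++) {
		cg = (start + i) % ncg;	/* wrap around after the last group */
		if (maxcluster[cg] >= len)
			return (cg);
	}
	return (-1);	/* give up; the caller allocates without a cluster */
}
```

If `maxsearch` is set to 1,000,000, the limit collapses to `ncg`, which corresponds to the old whole-filesystem scan.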
*	Merge r263233 from HEAD to stable/10:	rwatson	2015-03-19	1	-1/+1
	Update kernel inclusions of capability.h to use capsicum.h instead; some further refinement is required as some device drivers intended to be portable over FreeBSD versions rely on __FreeBSD_version to decide whether to include capability.h. Sponsored by: Google, Inc.
*	MFC r262678;	pfg	2014-03-05	1	-2/+2
	ufs: small formatting fixes. Cleanup some extra space. Use of tabs vs. spaces. No functional change. Reviewed by: mckusick
*	MFC of 260088:	mckusick	2014-01-17	1	-4/+4
	Fine tune filesystem block allocations under low free-space conditions (-r254995) based on further operational experience. Submitted by: Dmitry Sivachenko Fix Tested by: Dmitry Sivachenko
*	Change the cap_rights_t type from uint64_t to a structure that we can extend	pjd	2013-09-05	1	-3/+4
	in the future in a backward compatible (API and ABI) way. The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough. The structure definition looks like this:
		struct cap_rights {
			uint64_t cr_rights[CAP_RIGHTS_VERSION + 2];
		};
	The initial CAP_RIGHTS_VERSION is 0. The top two bits in the first element of the cr_rights[] array contain the total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements. The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain the array index. Only one bit is used, and the bit position within this five-bit range defines the array index. This means there can be at most five array elements in the future. To define a new right the CAPRIGHT() macro must be used. The macro takes two arguments, an array index and a bit to set, e.g.:
		#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
	We still support aliases that combine a few rights, but the rights have to belong to the same array element, e.g.:
		#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL)
		#define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
		#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
	There is a new API to manage the new cap_rights_t structure:
		cap_rights_t *cap_rights_init(cap_rights_t *rights, ...);
		void cap_rights_set(cap_rights_t *rights, ...);
		void cap_rights_clear(cap_rights_t *rights, ...);
		bool cap_rights_is_set(const cap_rights_t *rights, ...);
		bool cap_rights_is_valid(const cap_rights_t *rights);
		void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src);
		void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src);
		bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
	Capability rights are passed to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions as a comma-separated list, e.g.:
		cap_rights_t rights;
		cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
	There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, e.g.:
		#define cap_rights_set(rights, ...) \
			__cap_rights_set((rights), __VA_ARGS__, 0ULL)
		void __cap_rights_set(cap_rights_t *rights, ...);
	Thanks to using one bit as an array index we can assert in those functions that no two rights belonging to different array elements are provided together. For example, this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:
		cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
	Providing several rights that belong to the same array element this way is correct, but is not advised. It should only be used for alias definitions. This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x. Sponsored by: The FreeBSD Foundation
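The bit layout described in this message can be illustrated with a small standalone C sketch. The `SK_` macros, `struct sk_cap_rights`, and the helper functions are hypothetical and are not the kernel's capsicum.h API. The shift of 57 is derived from the description: bits 63-62 hold the element count minus 2, and bits 61-57 hold the one-hot index. Only version 0 (two elements) is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 63..62 of element 0: number of array elements minus 2. */
#define SK_VERSION_SHIFT	62
/* Bits 61..57 of every element: one-hot array index. */
#define SK_INDEX_SHIFT		57
#define SK_INDEX_MASK		0x1FULL
#define SK_NELEM		2	/* version 0: 0 + 2 elements */

/* A right is its one-hot index bit plus the right's own bit. */
#define SK_CAPRIGHT(idx, bit)	((1ULL << (SK_INDEX_SHIFT + (idx))) | (bit))

struct sk_cap_rights {
	uint64_t cr_rights[SK_NELEM];
};

/* Element a right belongs to, or -1 unless exactly one index bit is set. */
static int
sk_right_to_index(uint64_t right)
{
	uint64_t idxbits = (right >> SK_INDEX_SHIFT) & SK_INDEX_MASK;
	int idx = 0;

	if (idxbits == 0 || (idxbits & (idxbits - 1)) != 0)
		return (-1);
	while ((idxbits & 1) == 0) {
		idxbits >>= 1;
		idx++;
	}
	return (idx);
}

/* Start with no rights: each element carries only its own index bit. */
static void
sk_rights_init(struct sk_cap_rights *r)
{
	r->cr_rights[0] = SK_CAPRIGHT(0, 0ULL);
	r->cr_rights[1] = SK_CAPRIGHT(1, 0ULL);
	/* Version 0 leaves the top two bits of element 0 clear. */
	assert((r->cr_rights[0] >> SK_VERSION_SHIFT) == 0);
}

/* Add a right; rejects values that mix bits from different elements. */
static bool
sk_rights_set(struct sk_cap_rights *r, uint64_t right)
{
	int idx = sk_right_to_index(right);

	if (idx < 0 || idx >= SK_NELEM)
		return (false);
	r->cr_rights[idx] |= right;
	return (true);
}

static bool
sk_rights_is_set(const struct sk_cap_rights *r, uint64_t right)
{
	int idx = sk_right_to_index(right);

	if (idx < 0 || idx >= SK_NELEM)
		return (false);
	return ((r->cr_rights[idx] & right) == right);
}
```

An alias such as CAP_FCHMODAT combines rights from the same element, so it keeps a single index bit and passes `sk_right_to_index()`. OR-ing rights from different elements sets two index bits and is rejected, which mirrors the check the message describes. The kernel asserts in this case, whereas the sketch returns false.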
*	In looking at block layouts as part of fixing filesystem block	mckusick	2013-08-28	1	-2/+2
	allocations under low free-space conditions (-r254995), it was determined that the old block-preference search order used before -r249782 worked a bit better. This change reverts to that block-preference search order. MFC after: 2 weeks
*	A performance problem was reported in PR kern/181226:	mckusick	2013-08-28	1	-2/+14
	I have 25TB Dell PERC 6 RAID5 array. When it becomes almost full (10-20GB free), processes which write data to it start eating 100% CPU and write speed drops below 1MB/sec (normally it gives 400MB/sec). The revision at which it first became apparent was http://svnweb.freebsd.org/changeset/base/249782. The offending change reserved an area in each cylinder group to store metadata. The new algorithm attempts to save this area for metadata and allows its use for non-metadata only after all the data areas have been exhausted. The size of the reserved area defaults to half of minfree, so the filesystem reports full before the data area can completely fill. However, in this report, the filesystem has had minfree reduced to 1% thus forcing the metadata area to be used for data. As the filesystem approached full, it had only metadata areas left to allocate. The result was that every block allocation had to scan summary data for 30,000 cylinder groups before falling back to searching up to 30,000 metadata areas. The fix is to give up on saving the metadata areas once the free space reserve drops below 2%. The effect of this change is to use the old algorithm of just accepting the first available block that we find. Since most filesystems use the default 5% minfree, this will have no effect on their operation. For those that want to push to the limit, they will get their crappy block placements quickly. Submitted by: Dmitry Sivachenko Fix Tested by: Dmitry Sivachenko PR: kern/181226 MFC after: 2 weeks
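Two rules in this message can be sketched as follows: the default reservation of half of minfree, and the 2% cut-off below which the metadata area is no longer protected. The helper names are hypothetical, and sizes are in blocks for simplicity. This illustrates the stated policy only; it is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical: blocks held for metadata in one cylinder group, defaulting
 * to half of minfree (a percentage) of the group's blocks.
 */
static long
held_for_metadata(long blocks_per_cg, int minfree_percent)
{
	assert(blocks_per_cg >= 0 && minfree_percent >= 0);
	/* half of minfree percent == blocks * minfree / 200 (truncated) */
	return (blocks_per_cg * minfree_percent / 200);
}

/*
 * Hypothetical: keep saving the metadata area for metadata only while the
 * free-space reserve is at least 2%; below that, accept the first free block.
 */
static bool
protect_metadata_area(int minfree_percent)
{
	return (minfree_percent >= 2);
}
```

For example, a 32768-block group with minfree at 8% holds 1310 blocks (4%, truncated) for metadata. The 1% minfree setting from the report falls below the cut-off, so the metadata area is no longer protected.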
*	Update to comments describing block allocation policy.	mckusick	2013-07-14	1	-7/+6
	Submitted by: Bruce Evans
*	Make better use of metadata area by avoiding using it for data blocks	mckusick	2013-07-02	1	-1/+1
	that should no longer immediately follow their indirect blocks. MFC after: 2 weeks
*	The purpose of this change to the FFS layout policy is to reduce the	mckusick	2013-03-22	1	-65/+163
	running time for a full fsck. It also reduces the random access time for large files and speeds the traversal time for directory tree walks. The key idea is to reserve a small area in each cylinder group immediately following the inode blocks for the use of metadata, specifically indirect blocks and directory contents. The new policy is to preferentially place metadata in the metadata area and everything else in the blocks that follow the metadata area. The size of this area can be set when creating a filesystem using newfs(8) or changed in an existing filesystem using tunefs(8). Both utilities use the `-k held-for-metadata-blocks' option to specify the amount of space to be held for metadata blocks in each cylinder group. By default, newfs(8) sets this area to half of minfree (typically 4% of the data area). This work was inspired by a paper presented at Usenix's FAST '13: www.usenix.org/conference/fast13/ffsck-fast-file-system-checker Details of this implementation appear in the April 2013 issue of ;login: www.usenix.org/publications/login/april-2013-volume-38-number-2. A copy of the April 2013 ;login: paper can also be downloaded from: www.mckusick.com/publications/faster_fsck.pdf. Reviewed by: kib Tested by: Peter Holm MFC after: 4 weeks
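The placement preference can be sketched as choosing where the free-block search starts within a cylinder group. The `struct cg_layout` fields, the `enum blktype` values, and `preferred_start()` are hypothetical. Real FFS computes preferences in its block-preference routines and uses far more context than this.

```c
#include <assert.h>

enum blktype { BLK_DATA, BLK_INDIRECT, BLK_DIRECTORY };

/*
 * Hypothetical layout of one cylinder group, in block numbers relative to
 * the group: inode blocks end at meta_start, the metadata area runs from
 * meta_start to data_start - 1, and ordinary data runs to end - 1.
 */
struct cg_layout {
	long meta_start;
	long data_start;
	long end;
};

/* Block at which the search for a free block of type t should begin. */
static long
preferred_start(const struct cg_layout *cg, enum blktype t)
{
	assert(cg->meta_start <= cg->data_start && cg->data_start <= cg->end);
	switch (t) {
	case BLK_INDIRECT:
	case BLK_DIRECTORY:
		return (cg->meta_start);	/* metadata: held area first */
	default:
		return (cg->data_start);	/* everything else follows it */
	}
}
```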
*	UFS support of the unmapped i/o for the user data buffers.	kib	2013-03-19	1	-4/+6
	Sponsored by: The FreeBSD Foundation Tested by: pho, scottl, jhb, bf
*	An inode block must not be blockingly read while cg block is owned.	kib	2013-02-27	1	-6/+67
	The order is inode buffer lock -> snaplk -> cg buffer lock; reversing the order causes deadlocks. Inode block must not be written while cg block buffer is owned. The FFS copy on write needs to allocate a block to copy the content of the inode block, and the cylinder group selected for the allocation might be the same as the owned cg block. The reserved block detection code in the ffs_copyonwrite() and ffs_bp_snapblk() is unable to detect the situation, because the locked cg buffer is not exposed to it. In order to maintain the dependency between initialized inode block and the cg_initediblk pointer, look up the inode buffer in non-blocking mode. If succeeded, brelse cg block, initialize the inode block and write it. After the write is finished, reread cg block and update the cg_initediblk. If inode block is already locked by another thread, let the other thread initialize it. If another thread raced with us after we started writing inode block, the situation is detected by an update of cg_initediblk. Note that double-initialization of the inode block is harmless; the block cannot be used until cg_initediblk is incremented. Sponsored by: The FreeBSD Foundation In collaboration with: pho Reviewed by: mckusick MFC after: 1 month X-MFC-note: after r246877
*	The UFS2 filesystem allocates new blocks of inodes as they are needed.	mckusick	2013-02-16	1	-3/+10
	When a cylinder group runs short of inodes, a new block for inodes is allocated, zero'ed, and written to the disk. The zero'ed inodes must be on the disk before the cylinder group can be updated to claim them. If the cylinder group claiming the new inodes were written before the zero'ed block of inodes, the system could crash with the filesystem in an unrecoverable state. Rather than adding a soft updates dependency to ensure that the new inode block is written before it is claimed by the cylinder group map, we just do a barrier write of the zero'ed inode block to ensure that it will get written before the updated cylinder group map can be written. This change should only slow down bulk loading of newly created filesystems since that is the primary time that new inode blocks need to be created. Reported by: Robert Watson Reviewed by: kib Tested by: Peter Holm
*	Fix several unsafe pointer dereferences in the buffered_write()	kib	2013-02-10	1	-3/+23
	function, implementing the sysctl vfs.ffs.set_bufoutput (not used in the tree yet). - The current directory vnode dereference is unsafe since fd_cdir could be changed and unreferenced; lock the filedesc around it and vref the fd_cdir. - The VTOI() conversion of the fd_cdir is unsafe without first checking that the vnode is indeed from an FFS mount; otherwise the code dereferences random memory. - The cdir could be reclaimed from under us; lock it around the checks. - The type of the fp vnode might not be a disk, or it might have changed while the thread was in flight; check the type. Reviewed and tested by: mckusick MFC after: 2 weeks
*	When a file is first being written, the dynamic block reallocation	mckusick	2012-11-03	1	-0/+65
	(implemented by ffs_reallocblks_ufs[12]) relocates the file's blocks so as to cluster them together into a contiguous set of blocks on the disk. When the cluster crosses the boundary into the first indirect block, the first indirect block is initially allocated in a position immediately following the last direct block. Block reallocation would usually destroy locality by moving the indirect block out of the way to keep the data blocks contiguous. This change compensates for this problem by noting that the first indirect block should be left immediately following the last direct block. It then tries to start a new cluster of contiguous blocks (referenced by the indirect block) immediately following the indirect block. We should also do this for other indirect block boundaries, but it is only important for the first one. Suggested by: Bruce Evans MFC: 2 weeks
*	Remove the support for using non-mpsafe filesystem modules.	kib	2012-10-22	1	-12/+2
	In particular, do not lock Giant conditionally when calling into the filesystem module, remove the VFS_LOCK_GIANT() and related macros. Stop handling buffers belonging to non-mpsafe filesystems. The VFS_VERSION is bumped to indicate the interface change, which does not result in any interface signature changes. Conducted and reviewed by: attilio Tested by: pho
*	Fix up kernel sources to be ready for a 64-bit ino_t.	mdf	2012-09-27	1	-12/+14
	Original code by: Gleb Kurtsou
*	Extend the KPI to lock and unlock f_offset member of struct file. It	kib	2012-07-02	1	-5/+2
	now fully encapsulates all accesses to f_offset, and extends f_offset locking to other consumers that need it, in particular, to lseek() and variants of getdirentries(). Ensure that on 32bit architectures f_offset, which is a 64bit quantity, is always read and written under the mtxpool protection. This fixes an apparently easy to trigger race where parallel lseek()s, or lseek() and read/write, could destroy the file offset. The already broken ABI emulations, including iBCS and SysV, are not converted (yet). Tested by: pho No objections from: jhb MFC after: 3 weeks
*	Migrate ufs and ext2fs from skpc() to memcchr().	ed	2012-01-01	1	-13/+7
	While there, remove a useless check from the code. memcchr() always returns characters unequal to 0xff in this case, so inosused[i] ^ 0xff can never be equal to zero. Also, the fact that memcchr() returns a pointer instead of the number of bytes until the end makes conversion to an offset far easier.
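The conversion can be illustrated with a portable stand-in. `sk_memcchr()` is a hypothetical userland version with the behavior described here: it returns a pointer to the first byte not equal to `c`, or NULL. `first_free_byte()` is also hypothetical. It shows why a pointer result turns the offset into a simple subtraction.

```c
#include <assert.h>
#include <stddef.h>

/* Portable stand-in for memcchr(): first byte != c, or NULL if none. */
static const void *
sk_memcchr(const void *begin, int c, size_t len)
{
	const unsigned char *p = begin;
	size_t i;

	assert(begin != NULL || len == 0);
	for (i = 0; i < len; i++)
		if (p[i] != (unsigned char)c)
			return (p + i);
	return (NULL);
}

/* Offset of the first inode-bitmap byte with a free (clear) bit, or -1. */
static long
first_free_byte(const unsigned char *inosused, size_t len)
{
	const unsigned char *p = sk_memcchr(inosused, 0xff, len);

	/* Any byte found is != 0xff, so it has at least one free bit. */
	return (p == NULL ? -1 : (long)(p - inosused));
}
```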
*	Fix two cases involving opt_capsicum.h and module builds:	rwatson	2011-08-15	1	-1/+0
	(1) opt_capsicum.h is no longer required in ffs_alloc.c, so remove the #include. (2) portalfs depends on opt_capsicum.h, so have the Makefile generate one if required. These affect only modules built without a kernel (i.e., not buildkernel, but yes buildworld if the dubious MODULES_WITH_WORLD is used). Approved by: re (bz) Sponsored by: Google Inc
*	Second-to-last commit implementing Capsicum capabilities in the FreeBSD	rwatson	2011-08-11	1	-2/+6
	kernel for FreeBSD 9.0: Add a new capability mask argument to fget(9) and friends, allowing system call code to declare what capabilities are required when an integer file descriptor is converted into an in-kernel struct file *. With options CAPABILITIES compiled into the kernel, this enforces capability protection; without, this change is effectively a no-op. Some cases require special handling, such as mmap(2), which must preserve information about the maximum rights at the time of mapping in the memory map so that they can later be enforced in mprotect(2) -- this is done by narrowing the rights in the existing max_protection field used for similar purposes with file permissions. In namei(9), we assert that the code is not reached from within capability mode, as we're not yet ready to enforce namespace capabilities there. This will follow in a later commit. Update two capability names: CAP_EVENT and CAP_KEVENT become CAP_POST_KEVENT and CAP_POLL_KEVENT to more accurately indicate what they represent. Approved by: re (bz) Submitted by: jonathan Sponsored by: Google Inc
*	Update to -r224294 to ensure that only one of MNT_SUJ or MNT_SOFTDEP	mckusick	2011-07-30	1	-2/+2
	is set so that mount can revert back to using MNT_NOWAIT when doing getmntinfo. Approved by: re (kib)
*	Default debugging error messages to off for journaled soft updates sysctls.	mckusick	2011-07-22	1	-5/+3
	Delete limiting on output of these sysctls. Approved by: re (kib)
*	Add an FFS specific mount option to allow a filesystem checker	mckusick	2011-07-15	1	-8/+177
	(typically fsck_ffs) to register that it wishes to use FFS specific sysctl's to update the filesystem. This ensures that two checkers cannot run on a given filesystem at the same time and that no other process accidentally or maliciously uses the filesystem updating sysctls inappropriately. This functionality is needed by the journaling soft-updates recovery code.
*	When first creating snapshots, we may free some blocks within it.	mckusick	2011-07-10	1	-1/+5
	These blocks should not have TRIM applied to them. Submitted by: Kostik Belousov
*	- Fix directory count rollbacks by passing the mode to the journal dep	jeff	2011-06-20	1	-1/+1
	earlier. - Add rollback/forward code for frag and cluster accounting. - Handle the FREEDEP case in softdep_sync_buf(). (submitted by pho)
*	Ensure that filesystem metadata contained within persistent snapshots	mckusick	2011-06-15	1	-9/+10
	is always kept consistent. Suggested by: Jeff Roberson
*	With the restructuring of the block reclamation code, the notification	mckusick	2011-06-15	1	-4/+3
	messages for a filesystem being out of space need to be moved so that they do not print out until after a failed cleanup attempt. Suggested by: Jeff Roberson
*	Update to soft updates journaling to properly track freed blocks	mckusick	2011-06-12	1	-1/+1
	that get claimed by snapshots. Submitted by: Jeff Roberson Tested by: Peter Holm
*	Implement fully asynchronous partial truncation with softupdates journaling	jeff	2011-06-10	1	-2/+2
	to resolve errors which can cause corruption on recovery with the old synchronous mechanism.
	- Append partial truncation freework structures to indirdeps while truncation is proceeding. These prevent new block pointers from becoming valid until truncation completes and serialize truncations.
	- On completion of a partial truncate, journal work waits for zeroed pointers to hit indirects.
	- softdep_journal_freeblocks() handles last frag allocation and last block zeroing.
	- vtruncbuf/ffs_page_remove moved into softdep_*_freeblocks() so it is only implemented in one place.
	- Block allocation failure handling moved up one level so it does not proceed with buf locks held. This permits us to do more extensive reclaims when filesystem space is exhausted.
	- softdep_sync_metadata() is broken into two parts: the first executes once at the start of ffs_syncvnode() and flushes truncations and inode dependencies. The second is called on each locked buf. This eliminates excessive looping and rollbacks.
	- Improve the mechanism in process_worklist_item() that handles acquiring vnode locks for handle_workitem_remove() so that it works more generally and does not loop excessively over the same worklist items on each call.
	- Don't corrupt directories by zeroing the tail in fsck. This is only done for regular files.
	- Push an fsync complete record for files that need it so the checker knows a truncation in the journal is no longer valid.
	Discussed with: mckusick, kib (ffs_pages_remove and ffs_truncate parts) Tested by: pho
*	Grammar fix in comment.	mckusick	2011-06-05	1	-3/+3
	Eliminate one (of several) possible conflicting buffer locks when trying to reclaim blocks. Rest of fix to be incorporated as part of SUJ update by jeff. Pointed out by: Kostik Belousov
*	Due to a lag in updating the fs_pendinginodes count, we cannot depend	mckusick	2011-05-28	1	-1/+1
	on it to decide whether we should try to reclaim inodes when we run short. Discovered by: Peter Holm
*	The check for whether a block is going to be claimed by a snapshot	mckusick	2011-05-26	1	-4/+12
	needs to happen before we notify the underlying layer that it is being freed.
*	VFS sometimes is unable to inactivate a vnode when vnode use count	kib	2011-04-24	1	-1/+2
	goes to zero. E.g., the vnode might be only shared-locked at the time of the vput() call. Such vnodes are kept in the hash, so they can be found later. If ffs_valloc() allocated an inode that has its vnode cached in the hash, and that vnode still owes inactivation, then the vget() call from ffs_valloc() clears VI_OWEINACT, and then the vnode is reused for the newly allocated inode. The problem is, the vnode is not reclaimed before it is put to the new use. ffs_valloc() recycles the vnode's vm object, but this is not enough. In particular, at least v_vflag should be cleared, and several bits of UFS state need to be removed. It is very inconvenient to call vgone() at this point. Instead, move some parts of ufs_reclaim() into helper function ufs_prepare_reclaim(), and call the helper from VOP_RECLAIM and ffs_valloc(). Reviewed by: mckusick Tested by: pho MFC after: 3 weeks
*	Be far more persistent in reclaiming blocks and inodes before giving	mckusick	2011-04-05	1	-5/+5
	up and declaring a filesystem out of space. Especially necessary when running on a small filesystem. With this improvement, it should be possible to use soft updates on a small root filesystem. Kudos to: Peter Holm Testing by: Peter Holm MFC: 2 weeks
*	Add retry code analogous to the block allocation retry code	mckusick	2011-03-23	1	-3/+10
	to avoid running out of inodes. Reported by: Peter Holm
*	Use ffs() to locate free bits in the inode bitmap rather than a loop with	jhb	2011-03-04	1	-10/+6
	bit shifts. Reviewed by: mckusick MFC after: 1 month
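Here is a minimal sketch of the ffs() approach. It assumes a single byte of the inode bitmap in which a set bit means "in use". `first_free_bit()` is a hypothetical helper. ffs() itself is the standard function from <strings.h>; it returns the 1-based position of the least significant set bit, or 0 if no bit is set.

```c
#include <assert.h>
#include <strings.h>

/* Bit index (0-7) of the first free (clear) bit in a bitmap byte, or -1. */
static int
first_free_bit(unsigned char map)
{
	/* Invert so free bits become set bits, then keep only the low byte. */
	int bit = ffs(~map & 0xff);

	assert(bit >= 0 && bit <= 8);
	return (bit - 1);	/* ffs() is 1-based; 0 (full byte) becomes -1 */
}
```

A single ffs() call replaces the loop that shifted and tested one bit at a time.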
*	Add kernel side support for BIO_DELETE/TRIM on UFS.	kib	2010-12-29	1	-2/+100
	The FS_TRIM fs flag indicates that the administrator requested issuing of TRIM commands for the volume. UFS will only send the command to disk if the disk reports the GEOM::candelete attribute. Since the disk queue is reordered, a data block is marked as free in the bitmap only after the TRIM command has completed. Due to the need to sleep waiting for i/o to finish, the TRIM bio_done routine schedules a taskqueue to set the bitmap bit. Based on the patch by: mckusick Reviewed by: mckusick, pjd Tested by: pho MFC after: 1 month
*	- Handle the truncation of an inode with an effective link count of 0 in	jeff	2010-07-06	1	-20/+0
	the context of the process that reduced the effective count. Previously all truncation as a result of unlink happened in the softdep flush thread. This had the effect of being impossible to rate limit properly with the journal code. Now the process issuing unlinks is suspended when the journal fills. This has a side-effect of improving rm performance by allowing more concurrent work. - Handle two cases in inactive, one for effnlink == 0 and another when nlink finally reaches 0. - Eliminate the SPACECOUNTED related code since the truncation is no longer delayed. Discussed with: mckusick
*	- Merge soft-updates journaling from projects/suj/head into head. This	jeff	2010-04-24	1	-169/+83
	brings in support for an optional intent log which eliminates the need for background fsck on unclean shutdown. Sponsored by: iXsystems, Yahoo!, and Juniper. With help from: McKusick and Peter Holm
*	When ffs_realloccg() failed to allocate bigger fragment and, because	kib	2010-02-13	1	-1/+3
	pending blocks are scheduled for removal, goes to retry the (re)allocation, clear the bp pointer. It might happen that in the meantime free space is really exhausted and we are entering the nospace: label without bread()ing the buffer, causing a stale bp value to be brelse()d again. Tested by: pho (Producing a scenario to reliably reproduce the race appeared to be much harder than fixing the bug) MFC after: 1 week
*	This fix corrects a problem in the file system that treats large	mckusick	2010-02-10	1	-32/+38
	inode numbers as negative rather than unsigned. For a default (16K block) file system, this bug began to show up at a file system size above about 16Tb. To fully handle this problem, newfs must be updated to ensure that it will never create a filesystem with more than 2^32 inodes. That patch will be forthcoming soon. Reported by: Scott Burns, John Kilburg, Bruce Evans Followup by: Jeff Roberson PR: 133980 MFC after: 2 weeks
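The failure mode can be shown in a few lines of C. The helpers below are hypothetical stand-ins for any computation that divides an inode number by the inodes-per-group count. Once an inode number exceeds 2^31 - 1, a signed 32-bit view of it becomes negative and the result is meaningless. Converting an out-of-range value to int32_t is implementation-defined; it wraps on common two's-complement targets.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy: inode number held in a signed 32-bit integer. */
static int32_t
cg_of_signed(int32_t ino, int32_t ipg)
{
	assert(ipg > 0);
	return (ino / ipg);
}

/* Fixed: inode numbers treated as unsigned. */
static uint32_t
cg_of_unsigned(uint32_t ino, uint32_t ipg)
{
	assert(ipg > 0);
	return (ino / ipg);
}
```

With inode 3,000,000,000 and 1,000 inodes per group, the unsigned version yields group 3,000,000. The signed version sees a negative inode number and produces a negative group index.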
*	Cast 64-bit quantity to intptr_t rather than int so as to work properly	mckusick	2010-01-11	1	-2/+2
	with 64-bit architectures (such as amd64). Reported by: bz
*	Background:	mckusick	2010-01-11	1	-8/+124
	When renaming a directory it passes through several intermediate states. First its new name will be created causing it to have two names (from possibly different parents). Next, if it has different parents, its value of ".." will be changed from pointing to the old parent to pointing to the new parent. Concurrently, its old name will be removed bringing it back into a consistent state. When fsck encounters an extra name for a directory, it offers to remove the "extraneous hard link"; when it finds that the names have been changed but the update to ".." has not happened, it offers to rewrite ".." to point at the correct parent. Both of these changes were considered unexpected so would cause fsck in preen mode or fsck in background mode to fail with the need to run fsck manually to fix these problems. Fsck running in preen mode or background mode now corrects these expected inconsistencies that arise during directory rename. The functionality added with this update is used by fsck running in background mode to make these fixes. Solution: This update adds three new fsck sysctl commands to support background fsck in correcting expected inconsistencies that arise from incomplete directory rename operations. They are:
		setcwd(dirinode) - set the current directory to dirinode in the filesystem associated with the snapshot.
		setdotdot(oldvalue, newvalue) - Verify that the inode number for ".." in the current directory is oldvalue then change it to newvalue.
		unlink(nameptr, oldvalue) - Verify that the inode number associated with nameptr in the current directory is oldvalue then unlink it.
	As with all other fsck sysctls, these new ones may only be used by processes with appropriate privilege. Reported by: jeff Security issues: rwatson
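The verify-then-modify contract of the setdotdot and unlink commands can be sketched against a toy in-memory directory. Everything here is hypothetical: `struct sk_dir`, `sk_setdotdot()`, `sk_unlink()`, and the ENOENT/EINVAL return values. The sketch only illustrates checking that the current value matches before changing it. It is not the sysctl interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SK_MAXENT	8

struct sk_dirent {
	char		name[16];
	uint32_t	ino;
};

struct sk_dir {
	struct sk_dirent ent[SK_MAXENT];
	int		nent;
};

static struct sk_dirent *
sk_lookup(struct sk_dir *d, const char *name)
{
	int i;

	for (i = 0; i < d->nent; i++)
		if (strcmp(d->ent[i].name, name) == 0)
			return (&d->ent[i]);
	return (NULL);
}

/* setdotdot: change ".." only if it still points at oldvalue. */
static int
sk_setdotdot(struct sk_dir *d, uint32_t oldvalue, uint32_t newvalue)
{
	struct sk_dirent *e = sk_lookup(d, "..");

	if (e == NULL)
		return (ENOENT);
	if (e->ino != oldvalue)
		return (EINVAL);	/* current value does not match */
	e->ino = newvalue;
	return (0);
}

/* unlink: remove name only if it still maps to oldvalue. */
static int
sk_unlink(struct sk_dir *d, const char *name, uint32_t oldvalue)
{
	struct sk_dirent *e = sk_lookup(d, name);

	if (e == NULL)
		return (ENOENT);
	if (e->ino != oldvalue)
		return (EINVAL);
	assert(d->nent > 0);
	/* Fill the hole with the last entry (entry order is not preserved). */
	*e = d->ent[d->nent - 1];
	d->nent--;
	return (0);
}
```

Checking the old value first means a background fsck acting on a stale view of the filesystem fails harmlessly instead of overwriting a newer update.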
*	Introduce vfs_bio_set_valid() and use it from ffs_realloccg(). This	alc	2009-05-17	1	-8/+6
	eliminates the misuse of vfs_bio_clrbuf() by ffs_realloccg(). In collaboration with: tegge
*	When a device containing mounted UFS filesystem disappears, the type	trasz	2009-02-06	1	-4/+4
	of devvp becomes VBAD, which UFS incorrectly interprets as a snapshot vnode, which in turn causes a panic. Fix it by replacing '!= VCHR' with '== VREG'. With this fix in place, you should no longer be able to panic the system by removing a device with a UFS filesystem mounted from it - assuming you don't use softupdates. Reviewed by: kib Tested by: pho Approved by: rwatson (mentor) Sponsored by: FreeBSD Foundation
*	Following a fair amount of real world experience with ACLs and	rwatson	2009-01-27	1	-7/+16
	extended attributes since FreeBSD 5, make the following semantic changes: - Don't update the inode modification time (mtime) when extended attributes (and hence also ACLs) are added, modified, or removed. - Don't update the inode access time (atime) when extended attributes (and hence also ACLs) are queried. This means that rsync (and related tools) won't improperly think that the data in the file has changed when only the ACL has changed. Note that ffs_reallocblks() has not been changed to not update on an IO_EXT transaction, but currently EAs don't use the cluster write routines so this shouldn't be a problem. If EAs grow support for clustering, then VOP_REALLOCBLKS() will need to grow a flag argument to carry down IO_EXT to UFS. MFC after: 1 week PR: ports/125739 Reported by: Alexander Zagrebin <alexz@visp.ru> Tested by: pluknet <pluknet@gmail.com>, Greg Byshenk <freebsd@byshenk.net> Discussed with: kib, kientzle, timur, Alexander Bokovoy <ab@samba.org>
*	In ffs_valloc(), ffs_vget() may fail because insmntque() refused to	kib	2008-08-28	1	-1/+11
	insert new vnode into the mount vnode list. Then, for the SU-enabled mount, ffs_vfree could create freefile dependency. This dependency can hang around forever since inode is not marked as IN_MODIFIED and correspondingly inodeblock may be not marked as dirty. After ffs_vget() fails, retry with FFSV_FORCEINSMQ, mark the inode as modified, and vput() it immediately. Take care of the dup alloc. Tested by: pho Reviewed by: tegge MFC after: 1 month
*	Fix a broken check that recently became more annoying because it now	kensmith	2007-12-01	1	-2/+4
	gets enabled when INVARIANTS is on instead of DIAGNOSTIC (which apparently nobody uses). From Tor's description: This happens when the block range spans two block maps, the first in the inode (mapping up to NDADDR direct blocks) and the second being the first indirect block. The current check assumes that both block maps are indirect blocks. Work done by: tegge Tested by: kris, kensmith
*	Turn most ffs 'DIAGNOSTIC's into INVARIANTS.	obrien	2007-11-08	1	-17/+17