path: root/sys/ufs
Commit message [Author, Age, Files, Lines]
* Use ACL_PERM_NONE instead of hardcoding 0 when initializing [jedgar, 2001-09-01, 1 file, -3/+3]
| | | | | | ACL entry permissions. Reviewed by: rwatson
* o At some point, unmounting a non-EA file system with EAs compiled [rwatson, 2001-09-01, 1 file, -2/+4]
| | | | | | | | | in got a bit broken: when ufs_extattr_stop() was called and failed, ufs_extattr_destroy() would panic. This makes the call to destroy() conditional on the success of stop(). Submitted by: Christian Carstensen <cc@devcon.net> Obtained from: TrustedBSD Project
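The fix above amounts to guarding the teardown call with the result of the stop call. A minimal user-space sketch of that control flow follows. The names `struct ea_state`, `ea_stop()`, `ea_destroy()` and `ea_unmount()` are hypothetical stand-ins, not the kernel functions, and the error code is chosen only for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-mount extended attribute state. */
struct ea_state {
    bool started;        /* EA support is currently running */
    int  destroy_calls;  /* how many times teardown ran (illustration only) */
};

/* Stand-in for ufs_extattr_stop(): fails when EAs were never started.
 * The specific error value is an assumption made for this sketch. */
static int ea_stop(struct ea_state *s)
{
    if (!s->started)
        return EOPNOTSUPP;
    s->started = false;
    return 0;
}

/* Stand-in for ufs_extattr_destroy(): must only run after a clean stop. */
static void ea_destroy(struct ea_state *s)
{
    s->destroy_calls++;
}

/* Unmount path: tear down only if stopping succeeded. */
static int ea_unmount(struct ea_state *s)
{
    int error = ea_stop(s);

    if (error == 0)
        ea_destroy(s);
    return error;
}
```

With this shape, a file system that never started EAs reports the stop error and never reaches the teardown routine.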
* If a file has been completely unlinked, stop automatically syncing the [peter, 2001-08-27, 1 file, -0/+2]
| | | | | | file. ffs will discard any pending dirty pages when it is closed, so we may as well not waste time trying to clean them. This doesn't stop other things from writing it out, eg: pageout, fsync(2) etc.
* Stop using dirhash when a directory is removed, and ensure that we [iedowse, 2001-08-26, 2 files, -0/+12]
| | | | | | never attempt to hash directories once they are deleted. This fixes a problem where operations on a deleted directory could trigger dirhash sanity panics.
* When compacting directories, ufs_direnter() always trusted DIRSIZ() [iedowse, 2001-08-26, 1 file, -11/+29]
| | | | | | | | | | | | | | | | | | | | to supply the number of bytes to be bcopy()'d to move an entry. If d_ino == 0 however, DIRSIZ() is not guaranteed to return a sensible length, so ufs_direnter() could end up corrupting a directory during compaction. In practice I believe this can only happen after fsck_ffs has fixed a previously-corrupted directory. We now deal with any mid-block unused entries specially to avoid using DIRSIZ() or bcopy() on such entries. We also ensure that the variables 'dsize' and 'spacefree' contain meaningful values at all times. Add a few comments to better describe this intricate piece of code. The special handling of mid-block unused entries makes the dirhash-specific bugfix in the previous revision (1.53) now unnecessary, so this change removes it. Reviewed by: mckusick
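The compaction rule described above can be sketched in user space. This is not the ufs_direnter() code. `struct dhdr`, `live_size()`, `compact_block()` and `BLKSIZ` are hypothetical names, and the layout is a simplified assumption (8-byte header followed by a NUL-terminated name padded to 4 bytes). The point illustrated is that an entry with `d_ino == 0` is skipped using only its `d_reclen`, so its name length is never trusted and it is never copied.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLKSIZ 512  /* plays the role of DIRBLKSIZ in this sketch */

/* Simplified directory entry header; the name follows it in the block. */
struct dhdr {
    uint32_t d_ino;     /* 0 means the entry is unused */
    uint16_t d_reclen;  /* bytes spanned, including trailing free space */
    uint8_t  d_type;
    uint8_t  d_namlen;
};

/* Bytes a live entry really needs (the role DIRSIZ() plays). */
static size_t live_size(const struct dhdr *h)
{
    return (sizeof(*h) + h->d_namlen + 1 + 3) & ~(size_t)3;
}

/* Slide live entries to the front of the block and hand all free space to
 * the last live entry. Returns the free byte count, or -1 on a bad block. */
static long compact_block(unsigned char *blk)
{
    size_t rd = 0, wr = 0;
    size_t last = BLKSIZ;   /* sentinel: no live entry seen yet */
    size_t spacefree;
    struct dhdr h;

    while (rd < BLKSIZ) {
        memcpy(&h, blk + rd, sizeof(h));
        size_t reclen = h.d_reclen;
        if (reclen < sizeof(h) || reclen > BLKSIZ - rd)
            return -1;
        if (h.d_ino != 0) {
            size_t dsize = live_size(&h);
            if (dsize > reclen)
                return -1;
            memmove(blk + wr, blk + rd, dsize);
            h.d_reclen = (uint16_t)dsize;
            memcpy(blk + wr, &h, sizeof(h));
            last = wr;
            wr += dsize;
        }
        /* Unused entry: contribute all of d_reclen as free space and do
         * not compute a size from d_namlen or copy anything. */
        rd += reclen;
    }

    spacefree = BLKSIZ - wr;
    if (last != BLKSIZ) {
        memcpy(&h, blk + last, sizeof(h));
        h.d_reclen = (uint16_t)(h.d_reclen + spacefree);
        memcpy(blk + last, &h, sizeof(h));
    } else {
        /* Empty block: one unused entry covering everything. */
        memset(&h, 0, sizeof(h));
        h.d_reclen = BLKSIZ;
        memcpy(blk, &h, sizeof(h));
    }
    return (long)spacefree;
}
```

A block holding ".", an unused 20-byte entry with a garbage name length, and "foo" compacts to two adjacent 12-byte entries, with the remaining space attached to "foo".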
* When compressing directory blocks, the dirhash code didn't check [iedowse, 2001-08-22, 1 file, -1/+1]
| | | | | | | | | that the directory entry was in use before attempting to find it in the hash structures to change its offset. Normally, unused entries do not need to be moved, but fsck can leave behind some unused entries that do. A dirhash sanity panic resulted when the entry to be moved was not found. Add a check that stops entries with d_ino == 0 from being passed to ufsdirhash_move().
* Sigh. ufs_lookup() calls ffs_snapgone(), meaning that 'options EXT2FS' [peter, 2001-08-18, 1 file, -0/+5]
| | | | without 'options FFS' would fail to link.
* Two recent commits in sys/ufs/ufs interacted badly with ext2fs [iedowse, 2001-07-29, 2 files, -2/+5]
| | | | | | | | | | | | | | because it shares ufs code. In ufs_fhtovp(), the test on i_effnlink is invalid because ext2fs does not maintain this field. In ufs_close(), i_effnlink is also tested, to determine whether or not to call vn_start_write(). The ufs_fhtovp() issue breaks NFS exporting of ext2fs filesystems; I believe the other is harmless. Fix both cases by checking um_i_effnlink_valid in the ufsmount struct, and use i_nlink if necessary. Noticed by: bde Reviewed by: mckusick, bde
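The selection between the two link counts can be sketched as follows, assuming hypothetical user-space types `struct mount_info` and `struct inode_info` in place of the kernel's ufsmount and inode structures. The stale-handle check mirrors the ESTALE behaviour described for ufs_fhtovp() elsewhere in this log. It is a sketch of the idea, not the kernel code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-mount flag: does this file system maintain effnlink? */
struct mount_info {
    bool effnlink_valid;
};

/* Hypothetical inode subset carrying both link counts. */
struct inode_info {
    int nlink;     /* on-disk link count */
    int effnlink;  /* effective link count (soft updates aware) */
};

/* Use the effective count only where it is maintained. */
static int link_count(const struct mount_info *mp, const struct inode_info *ip)
{
    return mp->effnlink_valid ? ip->effnlink : ip->nlink;
}

/* File handle lookup: a file with no remaining links is stale. */
static int fh_check(const struct mount_info *mp, const struct inode_info *ip)
{
    return link_count(mp, ip) <= 0 ? ESTALE : 0;
}
```

On a mount that does not maintain the effective count, an inode with `nlink == 1` and an untouched `effnlink == 0` is still reachable, which is the case that broke NFS export of ext2fs.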
* Disable the dirhash sanity check that panics if an unused directory [iedowse, 2001-07-27, 1 file, -0/+8]
| | | | | | | | | entry (d_ino == 0) is found in a position that is not the start of a DIRBLKSIZ block. While such entries cannot occur normally (ufs always extends the previous entry to cover the free space instead), they do not cause problems and fsck does not fix them, so panicking is bad.
* Use a fixed type for times in on-disk structures for ufs rather than [peter, 2001-07-16, 2 files, -5/+5]
| | | | something that could potentially change like time_t.
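A small sketch of why a fixed-width type matters for on-disk layout. `struct disk_times` and the two conversion helpers are hypothetical names, and clamping out-of-range values is an assumption of this sketch, not something the commit states.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical on-disk record: field widths are fixed, so the layout does
 * not change if the platform's time_t grows. */
struct disk_times {
    int32_t dt_atime;
    int32_t dt_mtime;
    int32_t dt_ctime;
};

_Static_assert(sizeof(struct disk_times) == 12,
               "on-disk layout must not depend on sizeof(time_t)");

/* Convert an in-core time to the on-disk width (clamping is an assumption
 * made for this sketch). */
static int32_t time_to_disk(time_t t)
{
    if (t > INT32_MAX)
        return INT32_MAX;
    if (t < INT32_MIN)
        return INT32_MIN;
    return (int32_t)t;
}

/* Widen an on-disk time back to the in-core type. */
static time_t time_from_disk(int32_t d)
{
    return (time_t)d;
}
```

The static assertion fails at compile time if someone swaps a field back to a platform-dependent type that changes the record size.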
* Return a locked struct buf from ufsdirhash_lookup() to avoid one [iedowse, 2001-07-13, 3 files, -9/+9]
| | | | | | | | | | extra getblk/brelse sequence for each lookup. We already had this buf in ufsdirhash_lookup(), so there was no point in brelse'ing it only to have the caller immediately reacquire the same buffer. This should make the case of sequential lookups marginally faster; in my tests, sequential lookups with dirhash enabled are now only around 1% slower than without dirhash.
* Bring in dirhash, a simple hash-based lookup optimisation for large [iedowse, 2001-07-10, 5 files, -2/+1276]
| directories. When enabled via "options UFS_DIRHASH", in-core hash arrays are maintained for large directories. These allow all directory operations to take place quickly instead of requiring long linear searches. For now anyway, dirhash is not enabled by default.
| The in-core hash arrays have a memory requirement that is approximately half the size of the on-disk directory file. A number of new sysctl variables allow control over which directories get hashed and over the maximum amount of memory that dirhash will use:
|   vfs.ufs.dirhash_minsize: The minimum on-disk directory size for which hashing should be used. The default is 2560 (2.5k).
|   vfs.ufs.dirhash_maxmem: The system-wide maximum total memory to be used by dirhash data structures. The default is 2097152 (2MB).
| The current amount of memory being used by dirhash is visible through the read-only sysctl variable vfs.ufs.dirhash_mem. Finally, some extra sanity checks that are enabled by default, but which may have an impact on performance, can be disabled by setting vfs.ufs.dirhash_docheck to 0.
| Discussed on: -fs, -hackers
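To make the two tunables concrete, here is a rough sketch of how a size threshold and a memory cap could interact. The helper names are hypothetical and the admission rule is an assumption for illustration: the commit message gives only the defaults and the "about half the directory size" estimate, not the exact policy dirhash applies.

```c
#include <stdbool.h>
#include <stddef.h>

#define DIRHASH_MINSIZE 2560u     /* default of vfs.ufs.dirhash_minsize */
#define DIRHASH_MAXMEM  2097152u  /* default of vfs.ufs.dirhash_maxmem */

/* Rough in-core cost: about half the on-disk directory size. */
static size_t dirhash_estimate(size_t dirsize)
{
    return dirsize / 2;
}

/* Hypothetical admission check: hash only directories at or above the
 * minimum size, and only while the estimated total stays within the cap. */
static bool dirhash_should_build(size_t dirsize, size_t mem_in_use)
{
    if (dirsize < DIRHASH_MINSIZE)
        return false;
    return mem_in_use + dirhash_estimate(dirsize) <= DIRHASH_MAXMEM;
}
```

Under these assumptions a 1 MB directory costs roughly 512 KB of hash memory, so about four such directories would fit under the default 2 MB cap.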
* With Alfred's permission, remove vm_mtx in favor of a fine-grained approach [dillon, 2001-07-04, 1 file, -30/+5]
| | | | | | | | | (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.
* Fix more mntvnode and vnode interlock order reversals. [jhb, 2001-06-28, 1 file, -2/+2]
|
* - Fix a mntvnode and vnode interlock reversal. [jhb, 2001-06-28, 2 files, -19/+46]
| - Protect the mnt_vnode list with the mntvnode lock.
| - Use queue(9) macros.
* Fix warning: [peter, 2001-06-15, 1 file, -1/+1]
| | | | 1973: warning: int format, long int arg (arg 5)
* Build on the change in revision 1.98 by Tor.Egge@fast.no. [mckusick, 2001-06-13, 1 file, -13/+21]
| | | | | | | | | | | The symptom being treated in 1.98 was to avoid freeing a pagedep dependency if there was still a newdirblk dependency referencing it. That change is correct and no longer prints a warning message when it occurs. The other part of revision 1.98 was to panic when a newdirblk dependency was encountered during a file truncation. This fix removes that panic and replaces it with code to find and delete the newdirblk dependency so that the truncation can succeed.
* Call vn_close on the backing file vnode if ufs_extattr_enable failed to [tmm, 2001-06-07, 1 file, -1/+4]
| | | | | | avoid leaking it. Reviewed by: rwatson
* Add a wrapper for the fifo kqfilter which falls through to the ufs routine. [jlemon, 2001-06-06, 1 file, -0/+19]
| | | | This permits the fifo to inherit the ufs VNODE kqfilter.
* Add a kqueue filter for writing to ufs filesystems which always returns [jlemon, 2001-06-05, 1 file, -0/+22]
| | | | | | | true. This permits better interoperability with programs which register filters on their stdin/stdout handles. Submitted by: Niels Provos <provos@citi.umich.edu>
* There seems to be a problem with the order of disk write operations being [obrien, 2001-06-05, 1 file, -2/+11]
| | | | | | | | | | | incorrect due to a missing check for some dependency. This change avoids the freelist corruption (but not the temporarily inconsistent state of the file system). A message is printed as a reminder of the underlying problem when a pagedep structure is not freed due to the NEWBLOCK flag being set. Submitted by: Tor.Egge@fast.no
* Revert the previous commit in favor of the fix in rev 1.42 of [jhb, 2001-05-30, 1 file, -1/+0]
| | | | | | ufs/ffs/ffs_extern.h instead. Requested by: bde
* Forward declare struct cg to quiet a warning. [jhb, 2001-05-30, 1 file, -0/+1]
| | | | Submitted by: bde
* Include <ufs/ffs/fs.h> to get the definition of struct cg to quiet a [jhb, 2001-05-29, 1 file, -0/+1]
| | | | warning.
* Remove last vestiges of MFS. [phk, 2001-05-29, 2 files, -14/+4]
|
* Remove MFS from the kernel. [phk, 2001-05-29, 4 files, -944/+0]
|
* Add a check to determine whether extended attributes have been [tmm, 2001-05-25, 1 file, -0/+8]
| | | | | | | | | | initialized on the file system before trying to grab the lock of the per-mount extattr structure, as this lock is uninitialized in that case. This is needed because ufs_extattr_vnode_inactive is called from ufs_inactive, which is also used by EA-unaware file systems such as ext2fs. Reviewed by: rwatson
* o Merge contents of struct pcred into struct ucred. Specifically, add the [rwatson, 2001-05-25, 2 files, -4/+4]
|   real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid.
| o Remove p_cred from struct proc; add p_ucred to struct proc, replacing the original macro that pointed p->p_ucred to p->p_cred->pc_ucred.
| o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc.
| o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there.
| o Restructure many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this:
|       newcred = crdup(oldcred);
|       ...
|       p->p_ucred = newcred;
|       crfree(oldcred);
|   It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit.
| o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem.
| o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives.
| o Clean up exit1() so as to do less work in credential cleanup due to pcred removal.
| o Clean up fork1() so as to do less work in credential cleanup and allocation.
| o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparison.
| o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done.
| o Update credential management calls, such as crfree(), to take into account new ruidinfo reference.
| o Modify or add the following uid and gid helper routines: change_euid(), change_egid(), change_ruid(), change_rgid(), change_svuid(), change_svgid(). In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements.
| o CANSIGIO() is simplified to require only credentials, not processes and pcreds.
| o Remove lots of (p_pcred==NULL) checks.
| o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully.
| o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances.
| o Update libkvm to take these changes into account.
| Obtained from: TrustedBSD Project
| Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
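The duplicate-modify-swap-release pattern quoted in the message above can be mocked in user space. Everything here is a hypothetical stand-in: `struct cred`, `cred_dup()`, `cred_free()`, `struct proc_sketch` and `proc_set_euid()` imitate the roles of ucred, crdup(), crfree(), struct proc and a uid-changing call, and as the message itself notes, the pattern is not race-free without locking.

```c
#include <stdlib.h>

/* Mock reference-counted credential (not the kernel's struct ucred). */
struct cred {
    int      cr_ref;    /* reference count */
    unsigned cr_uid;    /* effective uid */
    unsigned cr_ruid;   /* real uid */
    unsigned cr_svuid;  /* saved uid */
};

/* Role of crdup(): private copy holding exactly one reference. */
static struct cred *cred_dup(const struct cred *old)
{
    struct cred *c = malloc(sizeof(*c));

    if (c != NULL) {
        *c = *old;
        c->cr_ref = 1;
    }
    return c;
}

/* Role of crfree(): drop one reference, free on the last one. */
static void cred_free(struct cred *c)
{
    if (--c->cr_ref == 0)
        free(c);
}

/* Mock process holding one credential reference. */
struct proc_sketch {
    struct cred *p_ucred;
};

/* Change the effective uid without touching a credential that other
 * holders may still reference. */
static int proc_set_euid(struct proc_sketch *p, unsigned euid)
{
    struct cred *oldcred = p->p_ucred;
    struct cred *newcred = cred_dup(oldcred);

    if (newcred == NULL)
        return -1;
    newcred->cr_uid = euid;   /* modify only the private copy */
    p->p_ucred = newcred;     /* publish the new credential */
    cred_free(oldcred);       /* drop this process's old reference */
    return 0;
}
```

Because the old credential is never modified in place, any other holder of a reference to it continues to see the original uids.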
* This patch implements O_DIRECT about 80% of the way. It takes a patchset [dillon, 2001-05-24, 1 file, -7/+29]
| | | | | | | | | | | | | | | | Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet. Submitted by: tegge, dillon
* ufs_bmaparray() may block on IO, drop vm mutex and acquire Giant when [alfred, 2001-05-23, 1 file, -0/+10]
| | | | calling it from the pager routine
* - FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file [ru, 2001-05-23, 1 file, -1/+1]
|   systems were repo-copied from sys/miscfs to sys/fs.
| - Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs.
| - Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.
| - Install header files for the above file systems.
| - Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.
* Update softdep_setup_directory_add prototype to reflect changes in [mckusick, 2001-05-20, 1 file, -2/+3]
| | | | | | actual function. Obtained from: Jim Bloom <bloom@jbloom.jbloom.org>
* Must ensure that all the entries on the pd_pendinghd list have been [mckusick, 2001-05-19, 1 file, -3/+11]
| | | | | | | | | | | | | | committed to disk before clearing them. More specifically, when free_newdirblk is called, we know that the inode claims the new directory block. However, if the associated pagedep is still linked onto the directory buffer dependency chain, then some of the entries on the pd_pendinghd list may not be committed to disk yet. In this case, we will simply note that the inode claims the block and let the pd_pendinghd list be processed when the pagedep is next written. If the pagedep is no longer on the buffer dependency chain, then all the entries on the pd_pendinghd list are committed to disk and we can free them in free_newdirblk. This corrects a window of vulnerability introduced in the code added in version 1.95.
* Introduce a global lock for the vm subsystem (vm_mtx). [alfred, 2001-05-19, 1 file, -9/+38]
| | | | | | | | | | | | | | | | | | | vm_mtx does not recurse and is required for most low level vm operations. Faults cannot be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb
* Must be a bit less aggressive about freeing pagedep structures. [mckusick, 2001-05-18, 1 file, -1/+1]
| | | | | Obtained from: Robert Watson <rwatson@FreeBSD.org> and Matthew Jacob <mjacob@feral.com>
* When a new block is allocated to a directory, an fsync of a file [mckusick, 2001-05-17, 4 files, -39/+242]
| | | | | | | | | | | | | whose name is within that block must ensure not only that the block containing the file name has been written, but also that the on-disk directory inode references that block. When a new directory block is created, we allocate a newdirblk structure which is linked to the associated allocdirect (on its ad_newdirblk list). When the allocdirect has been satisfied, the newdirblk structure is moved to the inodedep id_bufwait list of its directory to await the inode being written. When the inode is written, the directory entries are fully committed and can be deleted from their pagedep->pd_pendinghd and inodedep->id_pendinghd lists.
* Change the second argument of vflush() to an integer that specifies [iedowse, 2001-05-16, 1 file, -3/+3]
| | | | | | | | | | | | | | | | | | | | the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount. Reviewed by: phk, bp
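The reference accounting described above can be illustrated with a simplified model. `struct root_vnode` and `flush_root()` are hypothetical names, and the logic is an assumption that captures only the idea in the message (expected root references are not counted as "busy" and are released on success). It is not the vflush() implementation.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the root vnode: only its use count matters here. */
struct root_vnode {
    int usecount;
};

/* rootrefs is the number of references the file system itself holds on
 * its root vnode. References beyond that mean someone else is using the
 * file system, which blocks a normal unmount but not a forced one. */
static int flush_root(struct root_vnode *rootvp, int rootrefs, bool force)
{
    if (rootvp->usecount < rootrefs)
        return EINVAL;              /* caller miscounted its references */
    if (!force && rootvp->usecount > rootrefs)
        return EBUSY;               /* extra users: file system is busy */
    rootvp->usecount -= rootrefs;   /* release the expected references */
    return 0;
}
```

The forced case is the one the message says every file system got wrong: with the accounting centralized, a forced flush releases the expected references even when other users remain.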
* Further fixes for deadlock in the presence of multiple snapshots. [mckusick, 2001-05-14, 1 file, -7/+20]
| | | | | There are still more to find, but this fix should cover the common cases that folks are hitting.
* If the effective link count is zero when an NFS file handle request [mckusick, 2001-05-13, 1 file, -1/+3]
| | | | | | | | | | | | | | | | comes in for it, the file is really gone, so return ESTALE. The problem arises when the last reference to an FFS file is released because soft-updates may delay the actual freeing of the inode for some time. Since there are no filesystem links or open file descriptors referencing the inode, from the point of view of the system, the file is inaccessible. However, if the filesystem is NFS exported, then the remote client can still access the inode via ufs_fhtovp() until the inode really goes away. To prevent this anomaly, it is necessary to begin returning ESTALE at the same time that the file ceases to be accessible to the local filesystem. Obtained from: Ian Dowse <iedowse@maths.tcd.ie>
* Remove yet another deadlock case. [mckusick, 2001-05-11, 1 file, -3/+6]
|
* When running with soft updates, track the number of blocks and files [mckusick, 2001-05-08, 9 files, -11/+119]
| | | | | | | | | | | | | that are committed to being freed and reflect these blocks in the counts returned by statfs (and thus also by the `df' command). This change allows programs such as those that do news expiration to know when to stop if they are trying to create a certain percentage of free space. Note that this change does not solve the much harder problem of making this to-be-freed space available to applications that want it (thus on a nearly full filesystem, you may still encounter out-of-space conditions even though the free space will show up eventually). Hopefully this harder problem will be the subject of a future enhancement.
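The accounting change above boils down to adding the committed-but-not-yet-freed count to what is reported. A sketch with hypothetical names (`struct fs_counts`, `reported_free()`, `percent_free()`) follows. The real statfs code works in file system units that this example ignores.

```c
#include <stdint.h>

/* Hypothetical counters; the unit is "blocks" throughout this sketch. */
struct fs_counts {
    int64_t total_blocks;    /* size of the file system */
    int64_t free_blocks;     /* blocks already free on disk */
    int64_t pending_blocks;  /* blocks committed to being freed */
};

/* What a statfs-like call would report once pending frees are included. */
static int64_t reported_free(const struct fs_counts *fs)
{
    return fs->free_blocks + fs->pending_blocks;
}

/* Percentage an expiry tool could compare against its target. */
static int percent_free(const struct fs_counts *fs)
{
    return (int)(reported_free(fs) * 100 / fs->total_blocks);
}
```

With 50 blocks free and 150 pending out of 1000, the reported figure is 20% and not 5%, so a tool aiming for 20% free can stop deleting. As the message notes, the pending blocks are still not allocatable until they are really freed.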
* Several fixes for units errors: [mckusick, 2001-05-08, 1 file, -10/+19]
| 1) Do not assume that the superblock will be of size fs->fs_bsize. This fixes a panic when taking a snapshot on a filesystem with a block size bigger than 8K.
| 2) Properly calculate the number of fragments that follow the superblock summary information. This fixes a bug with inconsistent snapshots.
| 3) When cleaning up a snapshot that is about to be removed, properly calculate the number of blocks that need to be checked. This fixes a bug that created partially allocated inodes.
| 4) When moving blocks from a snapshot that is about to be removed to another snapshot, properly account for the reduced number of blocks in the snapshot from which they are taken. This fixes a bug in which the number of blocks released from a snapshot did not match the number that it claimed to have.
* When syncing out snapshot metadata, we must temporarily allow recursive [mckusick, 2001-05-08, 1 file, -27/+29]
| | | | | buffer locking so as to avoid locking against ourselves if we need to write filesystem metadata.
* Refinement to revision 1.16 of ufs/ffs/ffs_snapshot.c to reduce [mckusick, 2001-05-04, 3 files, -120/+227]
| | | | | the amount of time that the filesystem must be suspended. The current snapshot is elided as well as the earlier snapshots.
* Use ufs_bmaparray() rather than VOP_BMAP() on our own vnodes. [phk, 2001-05-01, 1 file, -2/+2]
|
* Remove blatantly pointless call to VOP_BMAP(). [phk, 2001-05-01, 2 files, -9/+3]
| | | | Use ufs_bmaparray() rather than VOP_BMAP() on our own vnodes.
* Implement vop_std{get|put}pages() and add them to the default vop[]. [phk, 2001-05-01, 3 files, -18/+0]
| | | | | Un-copy&paste all the VOP_{GET|PUT}PAGES() functions which do nothing but the default.
* Undo part of the tangle of having sys/lock.h and sys/mutex.h included in [markm, 2001-05-01, 2 files, -6/+11]
| | | | | | | | | | | other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h from kernel .c files. Sort sys/*.h includes where possible in affected files. OK'ed by: bde (with reservations)
* VOP_BALLOC was never really a VOP in the first place, so convert it [phk, 2001-04-29, 11 files, -49/+44]
| | | | to UFS_BALLOC like the other "between UFS and FFS function interfaces".
* Add a vop_stdbmap(), and make it part of the default vop vector. [phk, 2001-04-29, 1 file, -25/+1]
| | | | | | Make 7 filesystems which don't really know about VOP_BMAP rely on the default vector, rather than more or less complete local vop_nopbmap() implementations.