path: root/sys/ufs
Commit message | Author | Age | Files | Lines
* With Alfred's permission, remove vm_mtx in favor of a fine-grained approach | dillon | 2001-07-04 | 1 | -30/+5
| (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctls on those subsystems which the authors believe can operate without Giant.
* Fix more mntvnode and vnode interlock order reversals. | jhb | 2001-06-28 | 1 | -2/+2
|
* - Fix a mntvnode and vnode interlock reversal. | jhb | 2001-06-28 | 2 | -19/+46
| - Protect the mnt_vnode list with the mntvnode lock.
| - Use queue(9) macros.
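The two points in this commit (a list guarded by a dedicated lock, and queue macros instead of hand-rolled pointer code) can be sketched in userland C. This is only an illustration under stated assumptions: `vnode_sketch`, `mount_sketch` and the helper functions are hypothetical names, and a pthread mutex stands in for the kernel's mntvnode lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a vnode; only the list linkage matters here. */
struct vnode_sketch {
    int id;
    TAILQ_ENTRY(vnode_sketch) mnt_link;   /* linkage on the per-mount list */
};

TAILQ_HEAD(vnode_list_sketch, vnode_sketch);

/* Hypothetical stand-in for a mount point with its vnode list. */
struct mount_sketch {
    pthread_mutex_t list_lock;            /* plays the role of the mntvnode lock */
    struct vnode_list_sketch vnodes;
};

void mount_init_sketch(struct mount_sketch *mp)
{
    pthread_mutex_init(&mp->list_lock, NULL);
    TAILQ_INIT(&mp->vnodes);
}

/* Every modification of the list happens with the list lock held. */
void mount_insert_sketch(struct mount_sketch *mp, struct vnode_sketch *vp)
{
    assert(vp != NULL);
    pthread_mutex_lock(&mp->list_lock);
    TAILQ_INSERT_TAIL(&mp->vnodes, vp, mnt_link);
    pthread_mutex_unlock(&mp->list_lock);
}

/* Traversal also holds the lock so the list cannot change underneath it. */
int mount_count_sketch(struct mount_sketch *mp)
{
    struct vnode_sketch *vp;
    int n = 0;

    pthread_mutex_lock(&mp->list_lock);
    TAILQ_FOREACH(vp, &mp->vnodes, mnt_link)
        n++;
    pthread_mutex_unlock(&mp->list_lock);
    return n;
}
```

The sketch does not model the vnode interlock or the lock ordering that the commit corrects; it only shows the list protection and the macro style.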
* Fix warning: | peter | 2001-06-15 | 1 | -1/+1
| 1973: warning: int format, long int arg (arg 5)
* Build on the change in revision 1.98 by Tor.Egge@fast.no. | mckusick | 2001-06-13 | 1 | -13/+21
| The symptom being treated in 1.98 was to avoid freeing a pagedep dependency if there was still a newdirblk dependency referencing it. That change is correct and no longer prints a warning message when it occurs. The other part of revision 1.98 was to panic when a newdirblk dependency was encountered during a file truncation. This fix removes that panic and replaces it with code to find and delete the newdirblk dependency so that the truncation can succeed.
* Call vn_close on the backing file vnode if ufs_extattr_enable failed to | tmm | 2001-06-07 | 1 | -1/+4
| avoid leaking it.
| Reviewed by: rwatson
* Add a wrapper for the fifo kqfilter which falls through to the ufs routine. | jlemon | 2001-06-06 | 1 | -0/+19
| This permits the fifo to inherit the ufs VNODE kqfilter.
* Add a kqueue filter for writing to ufs filesystems which always returns | jlemon | 2001-06-05 | 1 | -0/+22
| true. This permits better interoperability with programs which register filters on their stdin/stdout handles.
| Submitted by: Niels Provos <provos@citi.umich.edu>
* There seems to be a problem with the order of disk write operations being | obrien | 2001-06-05 | 1 | -2/+11
| incorrect due to a missing check for some dependency. This change avoids the freelist corruption (but not the temporarily inconsistent state of the file system). A message is printed as a reminder of the underlying problem when a pagedep structure is not freed due to the NEWBLOCK flag being set.
| Submitted by: Tor.Egge@fast.no
* Revert the previous commit in favor of the fix in rev 1.42 of | jhb | 2001-05-30 | 1 | -1/+0
| ufs/ffs/ffs_extern.h instead.
| Requested by: bde
* Forward declare struct cg to quiet a warning. | jhb | 2001-05-30 | 1 | -0/+1
| Submitted by: bde
* Include <ufs/ffs/fs.h> to get the definition of struct cg to quiet a | jhb | 2001-05-29 | 1 | -0/+1
| warning.
* Remove last vestiges of MFS. | phk | 2001-05-29 | 2 | -14/+4
|
* Remove MFS from the kernel. | phk | 2001-05-29 | 4 | -944/+0
|
* Add a check to determine whether extended attributes have been | tmm | 2001-05-25 | 1 | -0/+8
| initialized on the file system before trying to grab the lock of the per-mount extattr structure, as this lock is uninitialized in that case. This is needed because ufs_extattr_vnode_inactive is called from ufs_inactive, which is also used by EA-unaware file systems such as ext2fs.
| Reviewed by: rwatson
* o Merge contents of struct pcred into struct ucred. Specifically, add the | rwatson | 2001-05-25 | 2 | -4/+4
| real uid, saved uid, real gid, and saved gid to ucred, as well as the pcred->pc_uidinfo, which was associated with the real uid, only rename it to cr_ruidinfo so as not to conflict with cr_uidinfo, which corresponds to the effective uid.
| o Remove p_cred from struct proc; add p_ucred to struct proc, replacing the original macro that pointed p->p_ucred to p->p_cred->pc_ucred.
| o Universally update code so that it makes use of ucred instead of pcred, p->p_ucred instead of p->p_pcred, cr_ruidinfo instead of p_uidinfo, cr_{r,sv}{u,g}id instead of p_*, etc.
| o Remove pcred0 and its initialization from init_main.c; initialize cr_ruidinfo there.
| o Restructure many credential modification chunks to always crdup while we figure out locking and optimizations; generally speaking, this means moving to a structure like this:
|       newcred = crdup(oldcred);
|       ...
|       p->p_ucred = newcred;
|       crfree(oldcred);
|   It's not race-free, but better than nothing. There are also races in sys_process.c, all inter-process authorization, fork, exec, and exit.
| o Remove sigio->sio_ruid since sigio->sio_ucred now contains the ruid; remove comments indicating that the old arrangement was a problem.
| o Restructure exec1() a little to use newcred/oldcred arrangement, and use improved uid management primitives.
| o Clean up exit1() so as to do less work in credential cleanup due to pcred removal.
| o Clean up fork1() so as to do less work in credential cleanup and allocation.
| o Clean up ktrcanset() to take into account changes, and move to using suser_xxx() instead of performing a direct uid==0 comparison.
| o Improve commenting in various kern_prot.c credential modification calls to better document current behavior. In a couple of places, current behavior is a little questionable and we need to check POSIX.1 to make sure it's "right". More commenting work still remains to be done.
| o Update credential management calls, such as crfree(), to take into account new ruidinfo reference.
| o Modify or add the following uid and gid helper routines:
|       change_euid()
|       change_egid()
|       change_ruid()
|       change_rgid()
|       change_svuid()
|       change_svgid()
|   In each case, the call now acts on a credential not a process, and as such no longer requires more complicated process locking/etc. They now assume the caller will do any necessary allocation of an exclusive credential reference. Each is commented to document its reference requirements.
| o CANSIGIO() is simplified to require only credentials, not processes and pcreds.
| o Remove lots of (p_pcred==NULL) checks.
| o Add an XXX to authorization code in nfs_lock.c, since it's questionable, and needs to be considered carefully.
| o Simplify posix4 authorization code to require only credentials, not processes and pcreds. Note that this authorization, as well as CANSIGIO(), needs to be updated to use the p_cansignal() and p_cansched() centralized authorization routines, as they currently do not take into account some desirable restrictions that are handled by the centralized routines, as well as being inconsistent with other similar authorization instances.
| o Update libkvm to take these changes into account.
| Obtained from: TrustedBSD Project
| Reviewed by: green, bde, jhb, freebsd-arch, freebsd-audit
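The newcred/oldcred arrangement quoted in this commit message is a copy-then-swap pattern on a reference-counted credential. A minimal userland sketch follows, assuming a simplified credential with a plain integer reference count; `cred_sketch`, `proc_sketch`, `cred_dup_sketch`, `cred_free_sketch` and `proc_set_euid_sketch` are hypothetical names that model the idea, not the kernel's crdup()/crfree() implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified credential: one reference count and three uids. */
struct cred_sketch {
    int refcount;
    unsigned ruid, euid, svuid;
};

/* Hypothetical process that points at a (possibly shared) credential. */
struct proc_sketch {
    struct cred_sketch *p_ucred;
};

/* Make a private copy holding exactly one reference. */
struct cred_sketch *cred_dup_sketch(const struct cred_sketch *old)
{
    struct cred_sketch *n = malloc(sizeof(*n));

    if (n == NULL)
        abort();
    *n = *old;
    n->refcount = 1;
    return n;
}

/* Drop one reference and free the credential when the last one goes away. */
void cred_free_sketch(struct cred_sketch *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0)
        free(c);
}

/* The pattern from the commit message: duplicate, modify, swap, release. */
void proc_set_euid_sketch(struct proc_sketch *p, unsigned euid)
{
    struct cred_sketch *oldcred = p->p_ucred;
    struct cred_sketch *newcred = cred_dup_sketch(oldcred);

    newcred->euid = euid;        /* the change acts on the credential copy */
    p->p_ucred = newcred;
    cred_free_sketch(oldcred);   /* other holders keep the old, unmodified copy */
}
```

As the commit message itself notes, the pattern is not race-free without locking; the sketch makes no attempt to add any.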
* This patch implements O_DIRECT about 80% of the way. It takes a patchset | dillon | 2001-05-24 | 1 | -7/+29
| Tor created a while ago, removes the raw I/O piece (that has cache coherency problems), and adds a buffer cache / VM freeing piece. Essentially this patch causes O_DIRECT I/O to not be left in the cache, but does not prevent it from going through the cache, hence the 80%. For the last 20% we need a method by which the I/O can be issued directly to a buffer supplied by the user process and bypass the buffer cache entirely, but still maintain cache coherency. I also have the code working under -stable but the changes made to sys/file.h may not be MFCable, so an MFC is not on the table yet.
| Submitted by: tegge, dillon
* ufs_bmaparray() may block on IO, drop vm mutex and acquire Giant when | alfred | 2001-05-23 | 1 | -0/+10
| calling it from the pager routine
* - FDESC, FIFO, NULL, PORTAL, PROC, UMAP and UNION file | ru | 2001-05-23 | 1 | -1/+1
|   systems were repo-copied from sys/miscfs to sys/fs.
| - Renamed the following file systems and their modules: fdesc -> fdescfs, portal -> portalfs, union -> unionfs.
| - Renamed corresponding kernel options: FDESC -> FDESCFS, PORTAL -> PORTALFS, UNION -> UNIONFS.
| - Install header files for the above file systems.
| - Removed bogus -I${.CURDIR}/../../sys CFLAGS from userland Makefiles.
* Update softdep_setup_directory_add prototype to reflect changes in | mckusick | 2001-05-20 | 1 | -2/+3
| actual function.
| Obtained from: Jim Bloom <bloom@jbloom.jbloom.org>
* Must ensure that all the entries on the pd_pendinghd list have been | mckusick | 2001-05-19 | 1 | -3/+11
| committed to disk before clearing them. More specifically, when free_newdirblk is called, we know that the inode claims the new directory block. However, if the associated pagedep is still linked onto the directory buffer dependency chain, then some of the entries on the pd_pendinghd list may not be committed to disk yet. In this case, we will simply note that the inode claims the block and let the pd_pendinghd list be processed when the pagedep is next written. If the pagedep is no longer on the buffer dependency chain, then all the entries on the pd_pending list are committed to disk and we can free them in free_newdirblk. This corrects a window of vulnerability introduced in the code added in version 1.95.
* Introduce a global lock for the vm subsystem (vm_mtx). | alfred | 2001-05-19 | 1 | -9/+38
| vm_mtx does not recurse and is required for most low level vm operations. Faults cannot be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested; other filesystems will need minor changes (grabbing the vm lock when twiddling page properties).
| Reviewed (partially) by: jake, jhb
* Must be a bit less aggressive about freeing pagedep structures. | mckusick | 2001-05-18 | 1 | -1/+1
| Obtained from: Robert Watson <rwatson@FreeBSD.org> and Matthew Jacob <mjacob@feral.com>
* When a new block is allocated to a directory, an fsync of a file | mckusick | 2001-05-17 | 4 | -39/+242
| whose name is within that block must ensure not only that the block containing the file name has been written, but also that the on-disk directory inode references that block. When a new directory block is created, we allocate a newdirblk structure which is linked to the associated allocdirect (on its ad_newdirblk list). When the allocdirect has been satisfied, the newdirblk structure is moved to the inodedep id_bufwait list of its directory to await the inode being written. When the inode is written, the directory entries are fully committed and can be deleted from their pagedep->id_pendinghd and inodedep->id_pendinghd lists.
* Change the second argument of vflush() to an integer that specifies | iedowse | 2001-05-16 | 1 | -3/+3
| the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount.
| Reviewed by: phk, bp
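The accounting described here (a count of root vnode references that is both expected and released) can be sketched as a single check. This is a deliberately simplified userland model, not the kernel's vflush(): `rootvnode_sketch` and `flush_root_sketch` are hypothetical names, and everything else vflush() does is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical root vnode carrying only a use count. */
struct rootvnode_sketch {
    int usecount;
};

/*
 * rootrefs is the number of references the filesystem itself holds on its
 * root vnode. Anything beyond that means somebody else is still using it.
 * Returns 0 after releasing the expected references, or EBUSY when the
 * root is in use and the unmount is not forced.
 */
int flush_root_sketch(struct rootvnode_sketch *root, int rootrefs, bool force)
{
    assert(rootrefs >= 0);
    assert(root->usecount >= rootrefs);

    if (root->usecount > rootrefs && !force)
        return EBUSY;                /* extra users and no force: refuse */

    root->usecount -= rootrefs;      /* release the expected references */
    return 0;
}
```

Handling the forced case in the same place as the busy check is the point of the change: the individual unmount routines no longer need their own copy of this logic.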
* Further fixes for deadlock in the presence of multiple snapshots. | mckusick | 2001-05-14 | 1 | -7/+20
| There are still more to find, but this fix should cover the common cases that folks are hitting.
* If the effective link count is zero when an NFS file handle request | mckusick | 2001-05-13 | 1 | -1/+3
| comes in for it, the file is really gone, so return ESTALE. The problem arises when the last reference to an FFS file is released because soft-updates may delay the actual freeing of the inode for some time. Since there are no filesystem links or open file descriptors referencing the inode, from the point of view of the system, the file is inaccessible. However, if the filesystem is NFS exported, then the remote client can still access the inode via ufs_fhtovp() until the inode really goes away. To prevent this anomaly, it is necessary to begin returning ESTALE at the same time that the file ceases to be accessible to the local filesystem.
| Obtained from: Ian Dowse <iedowse@maths.tcd.ie>
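The check described above can be sketched as a small predicate on the inode. This is a userland model under stated assumptions: `inode_sketch` and its fields are hypothetical simplifications, with `effnlink` standing for the effective link count that soft updates maintain ahead of the on-disk count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified in-core inode. */
struct inode_sketch {
    int mode;       /* 0 means the inode is not allocated */
    int nlink;      /* on-disk link count, may lag behind */
    int effnlink;   /* effective link count as seen by the local system */
};

/*
 * Decide whether a file handle lookup may hand out this inode.
 * Returns 0 if the file is still reachable, ESTALE otherwise.
 */
int fh_check_sketch(const struct inode_sketch *ip)
{
    assert(ip != NULL);
    if (ip->mode == 0 || ip->effnlink <= 0)
        return ESTALE;   /* gone locally, so gone for NFS clients too */
    return 0;
}
```

The interesting case is an inode whose on-disk `nlink` is still positive while `effnlink` has already dropped to zero: the sketch treats it as stale, matching the behavior the commit message asks for.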
* Remove yet another deadlock case. | mckusick | 2001-05-11 | 1 | -3/+6
|
* When running with soft updates, track the number of blocks and files | mckusick | 2001-05-08 | 9 | -11/+119
| that are committed to being freed and reflect these blocks in the counts returned by statfs (and thus also by the `df' command). This change allows programs such as those that do news expiration to know when to stop if they are trying to create a certain percentage of free space. Note that this change does not solve the much harder problem of making this to-be-freed space available to applications that want it (thus on a nearly full filesystem, you may still encounter out-of-space conditions even though the free space will show up eventually). Hopefully this harder problem will be the subject of a future enhancement.
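The reporting change amounts to adding the pending-free counts to the free counts when filling in the statfs result. A minimal sketch follows, assuming all counts are in the same unit; `fs_counts_sketch`, `statfs_sketch` and `fill_statfs_sketch` are hypothetical names, and the block/fragment conversions of the real code are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-filesystem counters. */
struct fs_counts_sketch {
    int64_t free_blocks;      /* blocks that are free right now */
    int64_t free_inodes;      /* inodes that are free right now */
    int64_t pending_blocks;   /* blocks committed to being freed */
    int64_t pending_inodes;   /* inodes committed to being freed */
};

/* Hypothetical subset of a statfs result. */
struct statfs_sketch {
    int64_t f_bfree;
    int64_t f_ffree;
};

/* Report free space as what is free plus what is already on its way out. */
void fill_statfs_sketch(const struct fs_counts_sketch *fs,
    struct statfs_sketch *out)
{
    assert(fs->pending_blocks >= 0 && fs->pending_inodes >= 0);
    out->f_bfree = fs->free_blocks + fs->pending_blocks;
    out->f_ffree = fs->free_inodes + fs->pending_inodes;
}
```

As the commit message points out, this only changes what is reported: an allocation can still fail while `pending_blocks` is nonzero, because the space is not yet usable.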
* Several fixes for units errors: | mckusick | 2001-05-08 | 1 | -10/+19
| 1) Do not assume that the superblock will be of size fs->fs_bsize. This fixes a panic when taking a snapshot on a filesystem with a block size bigger than 8K.
| 2) Properly calculate the number of fragments that follow the superblock summary information. This fixes a bug with inconsistent snapshots.
| 3) When cleaning up a snapshot that is about to be removed, properly calculate the number of blocks that need to be checked. This fixes a bug that created partially allocated inodes.
| 4) When moving blocks from a snapshot that is about to be removed to another snapshot, properly account for the reduced number of blocks in the snapshot from which they are taken. This fixes a bug in which the number of blocks released from a snapshot did not match the number that it claimed to have.
* When syncing out snapshot metadata, we must temporarily allow recursive | mckusick | 2001-05-08 | 1 | -27/+29
| buffer locking so as to avoid locking against ourselves if we need to write filesystem metadata.
* Refinement to revision 1.16 of ufs/ffs/ffs_snapshot.c to reduce | mckusick | 2001-05-04 | 3 | -120/+227
| the amount of time that the filesystem must be suspended. The current snapshot is elided as well as the earlier snapshots.
* Use ufs_bmaparray() rather than VOP_BMAP() on our own vnodes. | phk | 2001-05-01 | 1 | -2/+2
|
* Remove blatantly pointless call to VOP_BMAP(). | phk | 2001-05-01 | 2 | -9/+3
| Use ufs_bmaparray() rather than VOP_BMAP() on our own vnodes.
* Implement vop_std{get|put}pages() and add them to the default vop[]. | phk | 2001-05-01 | 3 | -18/+0
| Un-copy&paste all the VOP_{GET|PUT}PAGES() functions which do nothing but the default.
* Undo part of the tangle of having sys/lock.h and sys/mutex.h included in | markm | 2001-05-01 | 2 | -6/+11
| other "system" header files. Also help the deprecation of lockmgr.h by making it a sub-include of sys/lock.h and removing sys/lockmgr.h from kernel .c files. Sort sys/*.h includes where possible in affected files.
| OK'ed by: bde (with reservations)
* VOP_BALLOC was never really a VOP in the first place, so convert it | phk | 2001-04-29 | 11 | -49/+44
| to UFS_BALLOC like the other "between UFS and FFS function interfaces".
* Add a vop_stdbmap(), and make it part of the default vop vector. | phk | 2001-04-29 | 1 | -25/+1
| Make 7 filesystems which don't really know about VOP_BMAP rely on the default vector, rather than more or less complete local vop_nopbmap() implementations.
* Call ufs_bmaparray() directly instead of indirectly via VOP_BMAP(). | phk | 2001-04-29 | 1 | -2/+3
|
* Remove two unused arguments from ufs_bmaparray(). | phk | 2001-04-29 | 2 | -22/+17
|
* Remove faint traces of blind copy&paste. | phk | 2001-04-29 | 1 | -1/+0
|
* Remove faint traces of non-existent ffs_bmap(). | phk | 2001-04-29 | 1 | -2/+0
|
* Revert consequences of changes to mount.h, part 2. | grog | 2001-04-29 | 16 | -32/+0
| Requested by: bde
* Rather than copying all the indirect blocks of the snapshot, | mckusick | 2001-04-26 | 1 | -35/+19
| simply mark them as BLK_NOCOPY. This trick cuts the initial size of the snapshot in half and cuts the time to take a snapshot by a third.
* When closing the last reference to an unlinked file, it is freed | mckusick | 2001-04-25 | 3 | -26/+104
| by the inactive routine. Because the freeing causes the filesystem to be modified, the close must be held up during periods when the filesystem is suspended. For snapshots to be consistent across crashes, they must write blocks that they copy and claim those written blocks in their on-disk block pointers before the old blocks that they referenced can be allowed to be written. Close a loophole that allowed unwritten blocks to be skipped when doing ffs_sync with a request to wait for all I/O activity to be completed.
* Move the netexport structure from the fs-specific mount structure | phk | 2001-04-25 | 6 | -38/+6
| to struct mount. This makes the "struct netexport *" parameter to the vfs_export and vfs_checkexport interface unneeded. Consequently, all non-stacking filesystems can use vfs_stdcheckexp(). At the same time, make it a pointer to a struct netexport in struct mount, so that we can remove the bogus AF_MAX and #include <net/radix.h> from <sys/mount.h>
* Pre-dirpref versions of fsck may zero out the new superblock fields | iedowse | 2001-04-24 | 1 | -0/+6
| fs_contigdirs, fs_avgfilesize and fs_avgfpdir. This could cause panics if these fields were zeroed while a filesystem was mounted read-only, and then remounted read-write. Add code to ffs_reload() which copies the fs_contigdirs pointer from the previous superblock, and reinitialises fs_avgf* if necessary.
| Reviewed by: mckusick
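The reload fix has two parts: carry an in-core pointer across the superblock copy, and restore sane values for fields that an older tool may have zeroed. A minimal sketch follows; `sb_sketch`, `reload_sb_sketch` and the two `DEFAULT_*_SKETCH` values are hypothetical placeholders chosen for illustration, not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder defaults; the real values are an assumption of this sketch. */
#define DEFAULT_AVGFILESIZE_SKETCH 16384
#define DEFAULT_AVGFPDIR_SKETCH    64

/* Hypothetical subset of a superblock. */
struct sb_sketch {
    unsigned char *contigdirs;   /* in-core only; meaningless on disk */
    int avgfilesize;
    int avgfpdir;
};

/* Refresh the in-core superblock from a freshly read on-disk copy. */
void reload_sb_sketch(struct sb_sketch *incore, const struct sb_sketch *ondisk)
{
    unsigned char *saved;

    assert(incore != NULL && ondisk != NULL);
    saved = incore->contigdirs;       /* remember the in-core pointer */
    *incore = *ondisk;                /* take everything from disk ... */
    incore->contigdirs = saved;       /* ... except the pointer */

    /* Fields zeroed by an older fsck get defaults instead of staying 0. */
    if (incore->avgfilesize <= 0)
        incore->avgfilesize = DEFAULT_AVGFILESIZE_SKETCH;
    if (incore->avgfpdir <= 0)
        incore->avgfpdir = DEFAULT_AVGFPDIR_SKETCH;
}
```

Nonzero on-disk values are kept as they are, so only a superblock that was actually zeroed is affected.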
* Correct #includes to work with fixed sys/mount.h. | grog | 2001-04-23 | 16 | -0/+32
|
* This patch removes the VOP_BWRITE() vector. | phk | 2001-04-17 | 1 | -1/+0
| VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized in all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.
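A per-buffer operations vector plus a magic field for catching uninitialized buffers can be sketched as follows. All names here (`buf_sketch`, `buf_ops_sketch`, `BUF_MAGIC_SKETCH`, the two example writers) are hypothetical; the sketch shows the general pattern, not the kernel's struct buf.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_MAGIC_SKETCH 0x42554621u   /* arbitrary value for this sketch */

struct buf_sketch;

/* Operations vector: more methods can be added without touching callers. */
struct buf_ops_sketch {
    const char *name;
    int (*write)(struct buf_sketch *bp);
};

struct buf_sketch {
    unsigned magic;                     /* set only by the init routine */
    const struct buf_ops_sketch *ops;   /* per-buffer method table */
    int written_by;                     /* records which writer ran */
};

static int default_write_sketch(struct buf_sketch *bp)
{
    bp->written_by = 1;
    return 0;
}

static int special_write_sketch(struct buf_sketch *bp)
{
    bp->written_by = 2;
    return 0;
}

const struct buf_ops_sketch default_ops_sketch = { "default", default_write_sketch };
const struct buf_ops_sketch special_ops_sketch = { "special", special_write_sketch };

void buf_init_sketch(struct buf_sketch *bp, const struct buf_ops_sketch *ops)
{
    assert(bp != NULL && ops != NULL);
    bp->magic = BUF_MAGIC_SKETCH;
    bp->ops = ops;
    bp->written_by = 0;
}

/* Dispatch through the vector; refuse buffers that were never initialized. */
int buf_write_sketch(struct buf_sketch *bp)
{
    if (bp->magic != BUF_MAGIC_SKETCH || bp->ops == NULL || bp->ops->write == NULL)
        return -1;
    return bp->ops->write(bp);
}
```

The magic check turns a forgotten initialization into an immediate, recognizable failure instead of a jump through a garbage pointer, which is the debugging aid the commit message describes.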
* Add debugging option to always read/write cylinder groups as full | mckusick | 2001-04-17 | 1 | -0/+13
| sized blocks. To enable this option, use: `sysctl -w debug.bigcgs=1'. Add debugging option to disable background writes of cylinder groups. To enable this option, use: `sysctl -w debug.dobkgrdwrite=0'. These debugging options should be tried on systems that are panicking with corrupted cylinder group maps to see if it makes the problem go away. The set of panics in question are:
|     ffs_clusteralloc: map mismatch
|     ffs_nodealloccg: map corrupted
|     ffs_nodealloccg: block not in map
|     ffs_alloccg: map corrupted
|     ffs_alloccg: block not in map
|     ffs_alloccgblk: cyl groups corrupted
|     ffs_alloccgblk: can't find blk in cyl
|     ffs_checkblk: partially free fragment
| The following panics are less likely to be related to this problem, but might be helped by these debugging options:
|     ffs_valloc: dup alloc
|     ffs_blkfree: freeing free block
|     ffs_blkfree: freeing free frag
|     ffs_vfree: freeing free inode
| If you try these options, please report whether they helped reduce your bitmap corruption panics to Kirk McKusick at <mckusick@mckusick.com> and to Matt Dillon <dillon@earth.backplane.com>.