path: root/sys/gnu/fs
Commit message | Author | Age | Files | Lines
* Update to C99, s/__FUNCTION__/__func__/, | obrien | 2001-12-10 | 1 | -1/+1
  also don't use ANSI string concatenation.
* Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount | dillon | 2001-11-04 | 1 | -0/+1
  structure changes now rather than piecemeal later on. mnt_nvnodelist
  currently holds all the vnodes under the mount point. This will
  eventually be split into a 'dirty' and 'clean' list. This way we only
  break klds once rather than twice. nvnodelist will eventually turn into
  the dirty list and should remain compatible with the klds.
* Change the vnode list under the mount point from a LIST to a TAILQ | dillon | 2001-10-23 | 1 | -4/+5
  in preparation for an implementation of limiting code for kern.maxvnodes.

  MFC after: 3 days
* The addition of i_dirhash to struct inode pushed RELENG_4's | iedowse | 2001-09-24 | 2 | -3/+3
  sizeof(struct inode) into a new malloc bucket on the i386. This didn't
  happen in -current due to the removal of i_lock, but it does no harm to
  apply the workaround to -current first.

  Reduce the size of the i_spare[] array in struct inode from 4 to 3
  entries, and change ext2fs to use i_din.di_spare[1] so that it does not
  need i_spare[3].

  Reviewed by: bde
  MFC after: 3 days
* KSE Milestone 2 | julian | 2001-09-12 | 8 | -112/+112
  Note: ALL MODULES MUST BE RECOMPILED.

  Make the kernel aware that there are smaller units of scheduling than
  the process (but only allow one thread per process at this time). This
  is functionally equivalent to the previous -current except that there
  is a thread associated with each process.

  Sorry john! (your next MFC will be a doozy!)

  Reviewed by: peter@freebsd.org, dillon@freebsd.org
  X-MFC after: ha ha ha ha
* Bring in dirhash, a simple hash-based lookup optimisation for large | iedowse | 2001-07-10 | 1 | -0/+2
  directories. When enabled via "options UFS_DIRHASH", in-core hash
  arrays are maintained for large directories. These allow all directory
  operations to take place quickly instead of requiring long linear
  searches. For now anyway, dirhash is not enabled by default.

  The in-core hash arrays have a memory requirement that is approximately
  half the size of the on-disk directory file. A number of new sysctl
  variables allow control over which directories get hashed and over the
  maximum amount of memory that dirhash will use:

    vfs.ufs.dirhash_minsize
        The minimum on-disk directory size for which hashing should be
        used. The default is 2560 (2.5k).

    vfs.ufs.dirhash_maxmem
        The system-wide maximum total memory to be used by dirhash data
        structures. The default is 2097152 (2MB).

  The current amount of memory being used by dirhash is visible through
  the read-only sysctl variable vfs.ufs.dirhash_mem.

  Finally, some extra sanity checks that are enabled by default, but
  which may have an impact on performance, can be disabled by setting
  vfs.ufs.dirhash_docheck to 0.

  Discussed on: -fs, -hackers
* Fix more mntvnode and vnode interlock order reversals. | jhb | 2001-06-28 | 1 | -2/+2
* Fix a mntvnode and vnode interlock reversal. | jhb | 2001-06-28 | 1 | -3/+4
* Remove last vestiges of MFS. | phk | 2001-05-29 | 1 | -10/+0
* Change the second argument of vflush() to an integer that specifies | iedowse | 2001-05-16 | 1 | -2/+2
  the number of references on the filesystem root vnode to be both
  expected and released. Many filesystems hold an extra reference on the
  filesystem root vnode, which must be accounted for when determining if
  the filesystem is busy and then released if it isn't busy. The old
  `skipvp' approach required individual filesystem xxx_unmount functions
  to re-implement much of vflush()'s logic to deal with the root vnode.

  All 9 filesystems that hold an extra reference on the root vnode got
  the logic wrong in the case of forced unmounts, so `umount -f' would
  always fail if there were any extra root vnode references. Fix this
  issue centrally in vflush(), now that we can.

  This commit also fixes a vnode reference leak in devfs, which could
  result in idle devfs filesystems that refuse to unmount.

  Reviewed by: phk, bp
* When running with soft updates, track the number of blocks and files | mckusick | 2001-05-08 | 1 | -4/+3
  that are committed to being freed and reflect these blocks in the
  counts returned by statfs (and thus also by the `df' command). This
  change allows programs such as those that do news expiration to know
  when to stop if they are trying to create a certain percentage of free
  space. Note that this change does not solve the much harder problem of
  making this to-be-freed space available to applications that want it
  (thus on a nearly full filesystem, you may still encounter out-of-space
  conditions even though the free space will show up eventually).
  Hopefully this harder problem will be the subject of a future
  enhancement.
* Remove blatantly pointless call to VOP_BMAP(). | phk | 2001-05-01 | 1 | -4/+1
* Implement vop_std{get|put}pages() and add them to the default vop[]. | phk | 2001-05-01 | 1 | -32/+0
  Un-copy&paste all the VOP_{GET|PUT}PAGES() functions which do nothing
  but the default.
* Undo part of the tangle of having sys/lock.h and sys/mutex.h included in | markm | 2001-05-01 | 1 | -6/+9
  other "system" header files.

  Also help the deprecation of lockmgr.h by making it a sub-include of
  sys/lock.h and removing sys/lockmgr.h from kernel .c files.

  Sort sys/*.h includes where possible in affected files.

  OK'ed by: bde (with reservations)
* VOP_BALLOC was never really a VOP in the first place, so convert it | phk | 2001-04-29 | 1 | -0/+2
  to UFS_BALLOC like the other "between UFS and FFS function interfaces".
* Make a panic less misleading. | phk | 2001-04-29 | 1 | -1/+1
* Remove two unused arguments from ufs_bmaparray(). | phk | 2001-04-29 | 1 | -20/+15
* Revert consequences of changes to mount.h, part 2. | grog | 2001-04-29 | 1 | -2/+0
  Requested by: bde
* MFffs ffs_balloc.c 1.5. | bde | 2001-04-25 | 1 | -0/+2
  Long ago, bread() set b_blkno to the disk block number as a side effect
  of doing physical i/o (or it just retained the setting from when the
  i/o was done). The setting is lost when buffers go away and then are
  reconstituted from VM. bread() originally compensated by doing a
  VOP_BMAP() to recover b_blkno, but this was no good since it sometimes
  caused extra i/o or even deadlock for bread()ing metadata to do the
  bmap. This was fixed in vfs_bio.c 1.33 (1995/03/03) and ffs_balloc.c
  1.5, etc., by removing the VOP_BMAP() from bread() and breadn(), and
  changing all (?) places that used b_blkno to set it if necessary.

  ext2fs was not imported until later in 1995 and was still depending on
  the old behaviour of bread() in at least ext2_balloc(). This caused
  filesystem and file corruption by clobbering direct block numbers in
  inodes.
* Move the netexport structure from the fs-specific mount structure | phk | 2001-04-25 | 2 | -4/+2
  to struct mount.

  This makes the "struct netexport *" parameter to the vfs_export and
  vfs_checkexport interface unneeded.

  Consequently all non-stacking filesystems can use vfs_stdcheckexp().

  At the same time, make it a pointer to a struct netexport in struct
  mount, so that we can remove the bogus AF_MAX and
  #include <net/radix.h> from <sys/mount.h>
* Correct #includes to work with fixed sys/mount.h. | grog | 2001-04-23 | 1 | -0/+2
* Fixes to track snapshot copy-on-write checking in the specinfo | mckusick | 2001-03-07 | 1 | -1/+1
  structure rather than assuming that the device vnode would reside in
  the FFS filesystem (which is obviously a broken assumption with the
  device filesystem).
* Grab the process lock while calling psignal and before calling psignal. | jhb | 2001-03-07 | 1 | -0/+2
* Reviewed by: jlemon | adrian | 2001-03-01 | 1 | -5/+11
  An initial tidyup of the mount() syscall and VFS mount code.

  This code replaces the earlier work done by jlemon in an attempt to
  make linux_mount() work.

  * the guts of the mount work has been moved into vfs_mount().
  * move `type', `path' and `flags' from being userland variables into
    being kernel variables in vfs_mount(). `data' remains a pointer into
    userspace.
  * Attempt to verify the `type' and `path' strings passed to
    vfs_mount() aren't too long.
  * rework mount() and linux_mount() to take the userland parameters
    (besides data, as mentioned) and pass kernel variables to
    vfs_mount(). (linux_mount() already did this, I've just tidied it up
    a little more.)
  * remove the copyin*() stuff for `path'. `data' still requires
    copyin*() since it's a pointer into userland.
  * set `mount->mnt_statf_mntonname' in vfs_mount() rather than in each
    filesystem. This variable is generally initialised with `path', and
    each filesystem can override it if they want to.
  * NOTE: f_mntonname is initialised with "/" in the case of a root
    mount.
* Preceed/preceeding are not English words. Use precede or preceding. | asmodai | 2001-02-18 | 1 | -1/+1
* Change and clean the mutex lock interface. | bmilekic | 2001-02-09 | 1 | -13/+13
  mtx_enter(lock, type) becomes:

    mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks)
    mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized)

  similarly, for releasing a lock, we now have:

    mtx_unlock(lock) for MTX_DEF and
    mtx_unlock_spin(lock) for MTX_SPIN.

  We change the caller interface for the two different types of locks
  because the semantics are entirely different for each case, and this
  makes it explicitly clear and, at the same time, it rids us of the
  extra `type' argument.

  The enter->lock and exit->unlock change has been made with the idea
  that we're "locking data" and not "entering locked code" in mind.

  Further, remove all additional "flags" previously passed to the lock
  acquire/release routines with the exception of two:

    MTX_QUIET and MTX_NOSWITCH

  The functionality of these flags is preserved and they can be passed
  to the lock/unlock routines by calling the corresponding wrappers:

    mtx_{lock, unlock}_flags(lock, flag(s)) and
    mtx_{lock, unlock}_spin_flags(lock, flag(s))

  for MTX_DEF and MTX_SPIN locks, respectively.

  Re-inline some lock acq/rel code; in the sleep lock case, we only
  inline the _obtain_lock()s in order to ensure that the inlined code
  fits into a cache line. In the spin lock case, we inline recursion and
  actually only perform a function call if we need to spin. This change
  has been made with the idea that we generally tend to avoid spin locks
  and that also the spin locks that we do have and are heavily used
  (i.e. sched_lock) do recurse, and therefore in an effort to reduce
  function call overhead for some architectures (such as alpha), we
  inline recursion for this case.

  Create a new malloc type for the witness code and retire from using
  the M_DEV type. The new type is called M_WITNESS and is only declared
  if WITNESS is enabled.

  Begin cleaning up some machdep/mutex.h code - specifically updated the
  "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and
  MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need
  those.

  Finally, caught up to the interface changes in all sys code.

  Contributors: jake, jhb, jasone (in no particular order)
* Mechanical change to use <sys/queue.h> macro API instead of | phk | 2001-02-04 | 1 | -4/+4
  fondling implementation details.

  Created with: sed(1)
  Reviewed by: md5(1)
* Back out proc locking to protect p_ucred for obtaining additional | jhb | 2001-01-27 | 1 | -24/+3
  references along with the actual obtaining of additional references.
* Convert all simplelocks to mutexes and remove the simplelock implementations. | jasone | 2001-01-24 | 1 | -13/+12
* Proc locking, mostly protecting p_ucred while obtaining additional | jhb | 2001-01-23 | 2 | -5/+28
  references.
* Avoid a data-consistency race between write() and mmap() | dillon | 2000-12-17 | 1 | -0/+9
  by ensuring that newly allocated blocks are zeroed. The race can occur
  even in the case where the write covers the entire block.

  Reported by: Sven Berkvens <sven@berkvens.net>, Marc Olzheim <zlo@zlo.nu>
* Put the bits in place for Alpha support for ext2. Not tested. | mjacob | 2000-12-09 | 2 | -0/+4
* Correct to a common %ld the 5th argument to a printf. | mjacob | 2000-12-09 | 1 | -2/+2
* Use a pointer to a size_t for the 4th argument to copyinstr- | mjacob | 2000-12-09 | 1 | -1/+1
  not a pointer to a u_int.
* Backed out previous commit. Don't depend on namespace pollution in | bde | 2000-12-02 | 2 | -0/+2
  <sys/buf.h>.
* remove unneeded sys/ucred.h includes | alfred | 2000-11-30 | 2 | -2/+0
* Quick fix for not writing group descriptor group, inode bitmaps or | bde | 2000-11-10 | 1 | -1/+2
  block bitmaps before unmount() completes. They were written using
  bdwrite(), so they were normally written less than 32 seconds after
  unmount(), but this is too late if the media is removed or the system
  is rebooted soon after unmount().

  sync()ing before unmount() didn't help, because ext2fs uses buggy
  private caching for these blocks -- it doesn't even bdwrite() them
  until they are uncached or the filesystem is unmounted. sync()ing after
  unmount() didn't help, because sync() only applies to (vnodes for)
  mounted filesystems.

  PR: 22726
* Fixed breakage of mknod() in rev.1.48 of ext2_vnops.c and rev.1.126 of | bde | 2000-11-04 | 1 | -1/+3
  ufs_vnops.c:

  1) i_ino was confused with i_number, so the inode number passed to
     VFS_VGET() was usually wrong (usually 0U).
  2) ip was dereferenced after vgone() freed it, so the inode number
     passed to VFS_VGET() was sometimes not even wrong.

  Bug (1) was usually fatal in ext2_mknod(), since ext2fs doesn't have
  space for inode 0 on the disk; ino_to_fsba() subtracts 1 from the inode
  number, so inode number 0U gives a way out of bounds array index.

  Bug (1) was usually harmless in ufs_mknod(); ino_to_fsba() doesn't
  subtract 1, and VFS_VGET() reads suitable garbage (all 0's?) from the
  disk for the invalid inode number 0U; ufs_mknod() returns a wrong
  vnode, but most callers just vput() it; the correct vnode is eventually
  obtained by an implicit VFS_VGET() just like it used to be.

  Bug (2) usually doesn't happen.
* Support filesystems with the not-so-new "sparse_superblocks" feature. | bde | 2000-11-03 | 4 | -15/+45
  When this feature is enabled, mke2fs doesn't necessarily allocate a
  super block and its associated descriptor blocks for every group. The
  (non-)allocations are reflected in the block bitmap. Since the
  filesystem code doesn't write to these blocks except for the first
  superblock, all it has to do to support them is to not count them in
  ext2_statfs() and not attempt to check them at mount time in
  ext2_check_blocks_bitmap() (the check has never been enabled in FreeBSD
  anyway).
* Weaken a bogus dependency on <sys/proc.h> in <sys/buf.h> by #ifdef'ing | phk | 2000-10-29 | 2 | -2/+0
  the offending inline function (BUF_KERNPROC) on it being #included
  already.

  I'm not sure BUF_KERNPROC() is even the right thing to do or in the
  right place or implemented the right way (inline vs normal function).

  Remove consequently unneeded #includes of <sys/proc.h>
* Convert all users of fldoff() to offsetof(). fldoff() is bad | phk | 2000-10-27 | 1 | -2/+0
  because it only takes a struct tag which makes it impossible to use
  unions, typedefs etc.

  Define __offsetof() in <machine/ansi.h>

  Define offsetof() in terms of __offsetof() in <stddef.h> and
  <sys/types.h>

  Remove myriad of local offsetof() definitions.

  Remove includes of <stddef.h> in kernel code.

  NB: Kernel code should *never* include from /usr/include !

  Make <sys/queue.h> include <machine/ansi.h> to avoid polluting the API.

  Deprecate <struct.h> with a warning. The warning turns into an error on
  01-12-2000 and the file gets removed entirely on 01-01-2001.

  Partial reviews by: various.
  Significant brucifications by: bde
* Blow away the v_specmountpoint define, replacing it with what it was | eivind | 2000-10-09 | 1 | -2/+2
  defined as (rdev->si_mountpoint)
* Convert lockmgr locks from using simple locks to using mutexes. | jasone | 2000-10-04 | 1 | -3/+5
  Add lockdestroy() and appropriate invocations, which corresponds to
  lockinit() and must be called to clean up after a lockmgr lock is no
  longer needed.
* ext2fs depends on ufs code, so update it to properly handle v_lock field. | bp | 2000-09-26 | 1 | -1/+1
  Noticed by: bde
* Add a lock structure to vnode structure. Previously it was either allocated | bp | 2000-09-25 | 1 | -1/+0
  separately (nfs, cd9660 etc) or kept as a first element of structure
  referenced by v_data pointer (ffs). Such organization leads to known
  problems with stacked filesystems.

  From this point vop_no*lock*() functions maintain only interlock lock.
  vop_std*lock*() functions maintain built-in v_lock structure using
  lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but
  maintains a shared lock on vnode.

  If filesystem wishes to export lockmgr compatible lock, it can put an
  address of this lock to v_vnlock field. This indicates that the upper
  filesystem can take advantage of it and use single lock structure for
  entire (or part) of stack of vnodes. This field shouldn't be examined
  or modified by VFS code except for initialization purposes.

  Reviewed in general by: mckusick
* Fixed some serious bugs in ext2_readdir(): | bde | 2000-09-12 | 1 | -11/+22
  The cookie buffer was usually overrun by a large amount whenever
  cookies were used. Cookies are used by nfs and the Linuxulator, so this
  bug usually caused panics whenever an ext2fs filesystem was nfs mounted
  or a Linux utility that calls readdir() was run on an ext2fs
  filesystem.

  The directory buffer was sometimes overrun by a small amount. This
  sometimes caused panics and wrong results even for FreeBSD utilities,
  but it was usually harmless because FreeBSD utilities use a large
  enough buffer size (4K). Linux utilities usually triggered the bug
  since they use a too-small buffer size (512 bytes), at least with the
  old RedHat utilities that I tested with.

  PR: 19407 (this fix is incomplete or for a slightly different bug)
* This patch corrects the first round of panics and hangs reported | mckusick | 2000-07-24 | 1 | -1/+23
  with the new snapshot code.

  Update addaliasu to correctly implement the semantics of the old
  checkalias function. When a device vnode first comes into existence,
  check to see if an anonymous vnode for the same device was created at
  boot time by bdevvp(). If so, adopt the bdevvp vnode rather than
  creating a new vnode for the device. This corrects a problem which
  caused the kernel to panic when taking a snapshot of the root
  filesystem.

  Change the calling convention of vn_write_suspend_wait() to be the same
  as vn_start_write().

  Split out softdep_flushworklist() from softdep_flushfiles() so that it
  can be used to clear the work queue when suspending filesystem
  operations.

  Access to buffers becomes recursive so that snapshots can recursively
  traverse their indirect blocks using ffs_copyonwrite() when checking
  for the need for copy on write when flushing one of their own indirect
  blocks. This eliminates a deadlock between the syncer daemon and a
  process taking a snapshot.

  Ensure that softdep_process_worklist() can never block because of a
  snapshot being taken. This eliminates a problem with buffer starvation.

  Cleanup change in ffs_sync() which did not synchronously wait when
  MNT_WAIT was specified. The result was an unclean filesystem panic when
  doing forcible unmount with heavy filesystem I/O in progress.

  Return a zero'ed block when reading a block that was not in use at the
  time that a snapshot was taken. Normally, these blocks should never be
  read. However, the readahead code will occasionally read them which can
  cause unexpected behavior.

  Clean up the debugging code that ensures that no blocks be written on a
  filesystem while it is suspended. Snapshots must explicitly label the
  blocks that they are writing during the suspension so that they do not
  cause a `write on suspended filesystem' panic.

  Reorganize ffs_copyonwrite() to eliminate a deadlock and also to
  prevent a race condition that would permit the same block to be copied
  twice. This change eliminates an unexpected soft updates inconsistency
  in fsck caused by the double allocation.

  Use bqrelse rather than brelse for buffers that will be needed soon
  again by the snapshot code. This improves snapshot performance.
* Add snapshots to the fast filesystem. Most of the changes support | mckusick | 2000-07-11 | 2 | -7/+17
  the gating of system calls that cause modifications to the underlying
  filesystem. The gating can be enabled by any filesystem that needs to
  consistently suspend operations by adding the vop_stdgetwritemount to
  their set of vnops. Once gating is enabled, the function
  vfs_write_suspend stops all new write operations to a filesystem,
  allows any filesystem modifying system calls already in progress to
  complete, then sync's the filesystem to disk and returns. The function
  vfs_write_resume allows the suspended write operations to begin again.
  Gating is not added by default for all filesystems as for SMP systems
  it adds two extra locks to such critical kernel paths as the write
  system call. Thus, gating should only be added as needed.

  Details on the use and current status of snapshots in FFS can be found
  in /sys/ufs/ffs/README.snapshot so for brevity and timeliness is not
  included here. Unless and until you create a snapshot file, these
  changes should have no effect on your system (famous last words).
* Fix typo (accessable --> accessible). | alex | 2000-06-14 | 1 | -1/+1
  PR: 18588
  Submitted by: Anatoly Vorobey <mellon@pobox.com>
  Reviewed by: asmodai
* Back out the previous change to the queue(3) interface. | jake | 2000-05-26 | 1 | -1/+1
  It was not discussed and should probably not happen.

  Requested by: msmith and others