path: root/sys/kern/vfs_vnops.c
Commit message | Author | Age | Files | Lines
* o vn_extattr_set() will now call appropriate vn_start_write() and vn_finished_write() if IO_NODELOCKED is not set.
  Obtained from: TrustedBSD Project
  [rwatson, 2000-09-05, 1 file, -2/+8]
* o Introduce vn_extattr_{get,set}, wrapper routines for VOP_GETEXTATTR and VOP_SETEXTATTR to simplify calling from in-kernel consumers, such as capability code. Both accept a vnode (optionally locked, with ioflg to indicate that), attribute name, and a buffer + buffer length in UIO_SYSSPACE. Both authorize the call as a kernel request, with cred set to NULL for the actual VOP_ calls.
  Obtained from: TrustedBSD Project
  [rwatson, 2000-08-08, 1 file, -0/+74]
* This patch corrects the first round of panics and hangs reported with the new snapshot code.

  Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem.

  Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write().

  Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations.

  Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot.

  Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation.

  Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress.

  Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occasionally read them, which can cause unexpected behavior.

  Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic.

  Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation.

  Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.
  [mckusick, 2000-07-24, 1 file, -6/+8]
* Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed.

  Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot, so for brevity and timeliness they are not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
  [mckusick, 2000-07-11, 1 file, -2/+162]
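The gating described in this entry can be pictured with a small user-space model. This is only a sketch under stated assumptions: `mount_model` and the `*_model` functions are hypothetical stand-ins, not the kernel's vn_start_write() or vfs_write_suspend(), and where the kernel would sleep the model simply returns an error code.

```c
#include <errno.h>

/* Hypothetical user-space stand-in for a mount point's write-gating state. */
struct mount_model {
	int writers;	/* modifying operations currently in progress */
	int suspended;	/* nonzero once a suspend has been requested */
};

/* Models entering a modifying system call: refused once suspension began. */
int
start_write_model(struct mount_model *mp)
{
	if (mp->suspended)
		return (EWOULDBLOCK);	/* the kernel would normally sleep here */
	mp->writers++;
	return (0);
}

/* Models leaving a modifying system call. */
void
finished_write_model(struct mount_model *mp)
{
	mp->writers--;
}

/*
 * Models the suspend step: gate new writers, then report whether the
 * in-progress ones have drained (0) or are still running (EBUSY).
 */
int
write_suspend_model(struct mount_model *mp)
{
	mp->suspended = 1;
	return (mp->writers == 0 ? 0 : EBUSY);
}

/* Models the resume step: new writers are admitted again. */
void
write_resume_model(struct mount_model *mp)
{
	mp->suspended = 0;
}
```

The model shows the two properties the entry names: writers already in progress are allowed to finish, and no new writer gets past the gate until the resume call.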
* Move the truncation code out of vn_open and into the open system call after the acquisition of any advisory locks. This fix corrects a case in which a process tries to open a file with a non-blocking exclusive lock. Even if it fails to get the lock it would still truncate the file even though its open failed. With this change, the truncation is done only after the lock is successfully acquired.
  Obtained from: BSD/OS
  [mckusick, 2000-07-04, 1 file, -24/+13]
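The same ordering (lock first, truncate second) can be followed from user space. The sketch below is an illustration, not the kernel change itself: `open_locked_trunc` is a hypothetical helper, and it uses flock(2) and ftruncate(2) instead of the open-time lock and truncate flags.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Open `path` for writing and truncate it only after an exclusive,
 * non-blocking advisory lock has been obtained. Returns the descriptor,
 * or -1 with errno set (EWOULDBLOCK if someone else holds the lock),
 * in which case the file contents are left untouched.
 */
int
open_locked_trunc(const char *path)
{
	int fd, saved;

	/* Deliberately no O_TRUNC here: truncation must wait for the lock. */
	fd = open(path, O_WRONLY);
	if (fd == -1)
		return (-1);

	if (flock(fd, LOCK_EX | LOCK_NB) == -1 || ftruncate(fd, 0) == -1) {
		saved = errno;		/* close() may overwrite errno */
		close(fd);
		errno = saved;
		return (-1);
	}
	return (fd);
}
```

If the lock attempt fails, the function returns before ftruncate() is ever reached, which is the behavior the entry describes for a failed open.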
* Fix stupid braino in last commit, initialize `vp' before we test vp->v_tag.
  Spotted by: dillon
  [jlemon, 2000-06-25, 1 file, -2/+2]
* Add a hack to fail registration of kq events on a non-ufs filesystem, as support for those is non-existent at the moment.
  [jlemon, 2000-06-22, 1 file, -0/+8]
* Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen.
  Requested by: msmith and others
  [jake, 2000-05-26, 1 file, -1/+1]
* Change the way that the queue(3) structures are declared; don't assume that the type argument to *_HEAD and *_ENTRY is a struct.
  Suggested by: phk
  Reviewed by: phk
  Approved by: mdodd
  [jake, 2000-05-23, 1 file, -1/+1]
* Fix comment typo.
  Submitted by: nrahlstr
  [asmodai, 2000-05-12, 1 file, -1/+1]
* Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>.

  <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bde's teachings on the subject of nested includes.

  Disk drivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.h> unless they need caching of data.

  Still a few bogus uses of struct buf to track down.
  Repocopy by: peter
  [phk, 2000-05-05, 1 file, -0/+1]
* Remove unneeded #include <vm/vm_zone.h>
  Generated by: src/tools/tools/kerninclude
  [phk, 2000-04-30, 1 file, -1/+0]
* Introduce kqueue() and kevent(), a kernel event notification facility.
  [jlemon, 2000-04-16, 1 file, -0/+77]
* Change the write-behind code to take more care when starting async I/O's. The sequential read heuristic has been extended to cover writes as well. We continue to call cluster_write() normally, thus blocks in the file will still be reallocated for large (but still random) I/O's, but I/O will only be initiated for truly sequential writes.

  This solves a number of annoying situations, especially with DBM (hash method) writes, and also has the side effect of fixing a number of (stupid) benchmarks.
  Reviewed-by: mckusick
  [dillon, 2000-04-02, 1 file, -27/+36]
* Give vn_isdisk() a second argument where it can return a suitable errno.
  Suggested by: bde
  [phk, 2000-01-10, 1 file, -3/+1]
* Add bwillwrite to all system calls that create things in the filesystem. Benchmarks that create huge trees of empty files overwhelm the buffer cache.
  [mckusick, 2000-01-10, 1 file, -0/+1]
* Introduce NDFREE (and remove VOP_ABORTOP)
  [eivind, 1999-12-15, 1 file, -3/+10]
* Ensure that garbage from the kernel stack does not wind up being returned to user mode in the spare fields of the stat structure.
  PR: kern/14966
  Reviewed by: dillon@freebsd.org
  Submitted by: Kelly Yancey kbyanc@posi.net
  [dillon, 1999-11-18, 1 file, -0/+8]
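One common way to avoid this kind of leak is to clear the whole structure before filling it in. The sketch below illustrates that general technique with a hypothetical, cut-down `stat_model` layout; the log does not say whether the actual commit cleared the whole structure or assigned the spare fields individually.

```c
#include <string.h>

/* Hypothetical, cut-down stat layout; the real struct stat has more fields. */
struct stat_model {
	long st_size;
	int  st_mode;
	long st_spare[2];	/* reserved, never assigned by the fill routine */
};

void
fill_stat_model(struct stat_model *sb, long size, int mode)
{
	/* Clear everything first so unassigned fields and padding hold zeros. */
	memset(sb, 0, sizeof(*sb));
	sb->st_size = size;
	sb->st_mode = mode;
}
```

Without the memset(), `st_spare` would keep whatever bytes happened to be on the caller's stack, and copying the structure out to user mode would expose them.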
* Add a vnode fo_stat() entry point.
  [peter, 1999-11-08, 1 file, -1/+13]
* This is what was "fdfix2.patch," a fix for fd sharing. It's pretty far-reaching in fd-land, so you'll want to consult the code for changes. The biggest change is that now, you don't use
      fp->f_ops->fo_foo(fp, bar)
  but instead
      fo_foo(fp, bar),
  which increments and decrements the fp refcount upon entry and exit. Two new calls, fhold() and fdrop(), are provided. Each does what it seems like it should, and if fdrop() brings the refcount to zero, the fd is freed as well.

  Thanks to peter ("to hell with it, it looks ok to me.") for his review. Thanks to msmith for keeping me from putting locks everywhere :)
  Reviewed by: peter
  [green, 1999-09-19, 1 file, -7/+12]
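The wrapper pattern this entry describes can be sketched in user space as follows. Everything here carries a `_model` suffix because it is a hypothetical illustration: the structure layouts, `falloc_model`, and `probe_read` are assumptions, and only the hold/call/drop shape is taken from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

struct file_model;

struct fileops_model {
	int (*fo_read)(struct file_model *fp, int arg);
};

struct file_model {
	int f_count;				/* reference count */
	const struct fileops_model *f_ops;	/* per-type operations */
};

/* Allocate a file with a single reference held by the caller. */
struct file_model *
falloc_model(const struct fileops_model *ops)
{
	struct file_model *fp = malloc(sizeof(*fp));

	if (fp != NULL) {
		fp->f_count = 1;
		fp->f_ops = ops;
	}
	return (fp);
}

void
fhold_model(struct file_model *fp)
{
	fp->f_count++;
}

/* Drop one reference; frees the file and returns 1 when it was the last. */
int
fdrop_model(struct file_model *fp)
{
	assert(fp->f_count > 0);
	if (--fp->f_count > 0)
		return (0);
	free(fp);
	return (1);
}

/* Wrapper: hold a reference across the indirect call, then release it. */
int
fo_read_model(struct file_model *fp, int arg)
{
	int result;

	fhold_model(fp);
	result = fp->f_ops->fo_read(fp, arg);
	fdrop_model(fp);
	return (result);
}

/* Sample operation: reports the reference count it sees while running. */
int
probe_read(struct file_model *fp, int arg)
{
	(void)arg;
	return (fp->f_count);
}

const struct fileops_model probe_ops = { probe_read };
```

Because the wrapper holds its own reference for the duration of the call, the file cannot be freed underneath the operation even if another sharer of the descriptor table drops its reference in the meantime.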
* Changes to centralise the default blocksize behaviour. More likely to follow.
  Submitted by: phk@freebsd.org
  [julian, 1999-09-09, 1 file, -13/+20]
* Revert a bunch of controversial changes by PHK. After a quick think and discussion among various people some form of some of these changes will probably be recommitted.

  The reversion was requested by dg while discussions proceed. PHK has indicated that he can live with this, and it has been agreed that some form of some of these changes may return shortly after further discussion.
  [julian, 1999-09-03, 1 file, -21/+13]
* Improve the returned values in st_blksize a little bit, avoid accessing union fields not valid for dev_t type.
  [phk, 1999-09-01, 1 file, -11/+21]
* Make bdev userland access work like cdev userland access unless the highly non-recommended option ALLOW_BDEV_ACCESS is used.

  (bdev access is evil because you don't get write errors reported.)

  Kill si_bsize_best before it kills Matt :-)

  Use the specfs routines rather than having cloned copies in devfs.
  [phk, 1999-08-30, 1 file, -2/+0]
* $Id$ -> $FreeBSD$
  [peter, 1999-08-28, 1 file, -1/+1]
* Add FIODTYPE ioctl for getting d_flags (type) info on a device.
  Okayed by: phk
  [green, 1999-08-27, 1 file, -1/+7]
* Add a couple of missing but unimportant break; statements.
  [phk, 1999-08-25, 1 file, -1/+3]
* oops: Add missing include.
  [phk, 1999-08-13, 1 file, -1/+2]
* Move the special-casing of stat(2)->st_blksize for device files from UFS to the generic level. For chr/blk devices we don't care about the blocksize of the filesystem, we want what the device asked for.
  [phk, 1999-08-13, 1 file, -2/+15]
* Fix fd race conditions (during shared fd table usage.) Badfileops is now used in f_ops in place of NULL, and modifications to the files are more carefully ordered. f_ops should also be set to &badfileops upon "close" of a file.

  This does not fix other problems mentioned in this PR than the first one.
  PR: 11629
  Reviewed by: peter
  [green, 1999-08-04, 1 file, -1/+3]
* Add sysctl and support code to allow directories to be VMIO'd. The default setting for the sysctl is OFF, which is the historical operation.
  Submitted by: dillon
  [alc, 1999-07-26, 1 file, -2/+2]
* These changes appear to give us benefits with both small (32MB) and large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bringing the machine to a grinding halt.

    * buffer cache hash table now dynamically allocated. This will have no effect on memory consumption for smaller systems and will help scale the buffer cache for larger systems.
    * minor enhancement to pmap_clearbit(). I noticed that all the calls to it used constant arguments. Making it an inline allows the constants to propagate to deeper inlines and should produce better code.
    * removal of inherent vfs_ioopt support through the emplacement of appropriate #ifdef's, with John's permission. If we do not find a use for it by the end of the year we will remove it entirely.
    * removal of getnewbufloops* counters & sysctl's - no longer necessary for debugging, getnewbuf() is now optimal.
    * buffer hash table functions removed from sys/buf.h and localized to vfs_bio.c
    * VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ), allowing processes to block when too many dirty buffers are present in the system.
    * removal of a softdep test in bdwrite() that is no longer necessary now that bdwrite() no longer attempts to flush dirty buffers.
    * slight optimization added to bqrelse() - there is no reason to test for available buffer space on B_DELWRI buffers.
    * addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite() will attempt to locate clusterable areas in both the forward and reverse direction relative to the offset of the buffer passed to it. This will probably not make much of a difference now, but I believe we will start to rely on it heavily in the future if we decide to shift some of the burden of the clustering closer to the actual I/O initiation.
    * Removal of the newbufcnt and lastnewbuf counters that Kirk added. They do not fix any race conditions that haven't already been fixed by the gbincore() test done after the only call to getnewbuf(). getnewbuf() is a static, so there is no chance of it being misused by other modules. ( Unless Kirk can think of a specific thing that this code fixes. I went through it very carefully and didn't see anything ).
    * removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think this check is necessary, the buffer should flush properly whether the vnode is locked or not. ( yes? ).
    * removal of extra arguments passed to getnewbuf() that are not necessary.
    * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in vfs_cluster.c
    * vn_write() now calls bwillwrite() *PRIOR* to locking the vnode, which should greatly aid flushing operations in heavy load situations - both the pageout and update daemons will be able to operate more efficiently.
    * removal of b_usecount. We may add it back in later but for now it is useless. Prior implementations of the buffer cache never had enough buffers for it to be useful, and current implementations which make more buffers available might not benefit relative to the amount of sophistication required to implement a b_usecount. Straight LRU should work just as well, especially when most things are VMIO backed. I expect that (even though John will not like this assumption) directories will become VMIO backed some point soon.

  Submitted by: Matthew Dillon <dillon@backplane.com>
  Reviewed by: Kirk McKusick <mckusick@mckusick.com>
  [mckusick, 1999-07-08, 1 file, -2/+6]
* Make sure that stat(2) and friends always return a valid st_dev field. Pseudo-FS need not fill in the va_fsid anymore, the syscall code will use the first half of the fsid, which now looks like a udev_t with major 255.
  [phk, 1999-07-02, 1 file, -2/+5]
* This Implements the mumbled about "Jail" feature.

  This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do.

  For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers".

  Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname.

  Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors.

  It generally does what one would expect, but setting up a jail still takes a little knowledge.

  A few notes:
    I have no scripts for setting up a jail, don't ask me for them.
    The IP number should be an alias on one of the interfaces.
    mount a /proc in each jail, it will make ps more useable.
    /proc/<pid>/status tells the hostname of the prison for jailed processes.
    Quotas are only sensible if you have a mountpoint per prison.
    There are no provisions for stopping resource-hogging.
    Some "#ifdef INET" and similar may be missing (send patches!)

  If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome!

  Tools, comments, patches & documentation most welcome.

  Have fun...
  Sponsored by: http://www.rndassociates.com/
  Run for almost a year by: http://www.servetheweb.com/
  [phk, 1999-04-28, 1 file, -2/+2]
* Suser() simplification:
    1: s/suser/suser_xxx/
    2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
    3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/

  The remaining suser_xxx() calls will be scrutinized and dealt with later.

  There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce.

  More changes to the suser() API will come along with the "jail" code.
  [phk, 1999-04-27, 1 file, -2/+2]
* Address several problems in vn_read and vn_write:
    1. Make read-ahead work for pread and aio_read.
    2. Fix one place where a comparison of uio_offset with -1 wasn't updated to use FOF_OFFSET.
    3. Honor O_APPEND in the FOF_OFFSET case.

  In addition, use the variable name "ioflag" in both vn_read and vn_write to avoid possible confusion between the variable "flag" and the parameter "flags".
  Submitted by: Bruce Evans <bde@zeta.org.au> and me
  [alc, 1999-04-21, 1 file, -35/+21]
* Add standard padding argument to pread and pwrite syscall. That should make them NetBSD compatible.

  Add parameter to fo_read and fo_write. (The only flag, FOF_OFFSET, means that the offset is set in the struct uio.)

  Factor out some common code from read/pread/write/pwrite syscalls.
  [dt, 1999-04-04, 1 file, -7/+9]
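The effect of the FOF_OFFSET flag can be illustrated with a small user-space model. This is a sketch under stated assumptions: the `_MODEL`/`_model` names, the flag value, and the structure layouts are hypothetical, and the data transfer is reduced to advancing the offset.

```c
#include <sys/types.h>

#define FOF_OFFSET_MODEL 0x01	/* hypothetical value: offset is in the uio */

struct ofile_model {
	off_t f_offset;		/* offset shared by read(2)/write(2) callers */
};

struct uio_model {
	off_t  uio_offset;	/* where this transfer starts */
	size_t uio_resid;	/* bytes still to transfer */
};

/* Returns the offset at which the transfer actually started. */
off_t
rw_offset_model(struct ofile_model *fp, struct uio_model *uio, int flags)
{
	off_t start;

	/* Without the flag, the transfer starts at the file's own offset. */
	if ((flags & FOF_OFFSET_MODEL) == 0)
		uio->uio_offset = fp->f_offset;
	start = uio->uio_offset;

	/* Stand-in for the real I/O: consume the whole request. */
	uio->uio_offset += (off_t)uio->uio_resid;
	uio->uio_resid = 0;

	/* Only the non-positional path moves the shared file offset. */
	if ((flags & FOF_OFFSET_MODEL) == 0)
		fp->f_offset = uio->uio_offset;
	return (start);
}
```

A positional call (pread/pwrite style) passes the flag and leaves `f_offset` alone, while a plain read or write omits it and advances `f_offset` by the amount transferred.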
* Changed vn_read/write such that fp->f_offset isn't touched if uio->uio_offset != -1. This fixes a problem with aio_read/write and permits a straightforward implementation of pread/pwrite.
  PR: kern/8669
  Submitted by: John Plevyak <jplevyak@inktomi.com>
  Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
  [alc, 1999-03-26, 1 file, -4/+15]
* Use suser() to determine super-user-ness, don't examine cr_uid directly.
  [phk, 1999-01-30, 1 file, -2/+2]
* Add 'options DEBUG_LOCKS', which stores extra information in struct lock, and add some macros and function parameters to make sure that the information gets to the point where it can be put in the lock structure.

  While I'm here, add DEBUG_VFS_LOCKS to LINT.
  [eivind, 1999-01-20, 1 file, -2/+15]
* Remove the 'waslocked' parameter to vfs_object_create().
  [eivind, 1999-01-05, 1 file, -2/+2]
* Only do one VOP_ACCESS() per open() instead of two. This should reduce the NFSv3 ACCESS RPC problems a little for busy clients that do a lot of open/close. The nfs code could probably cache the results, but I'm not sure whether this would be legal or useful. The problem is that with a CPU farm, on each open there would be a lookup, getattr then access RPC then the read/write RPC activity. Caching the access results probably isn't going to help much if the clients access lots of files. Having the nfs_access() routine interpret the getattr results is a bit of a hack, but it's how NFSv2 is done and it might be OK for a mount attribute for v3.
  [peter, 1998-11-02, 1 file, -8/+9]
* Report the mode as the result of the VOP_GETATTR rather than the vnode's type, they may not correspond.
  [phk, 1998-06-27, 1 file, -2/+2]
* This commit fixes various 64bit portability problems required for FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us in line with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change.

  The prototype FreeBSD/alpha machdep will follow in a couple of days time.
  [dfr, 1998-06-07, 1 file, -3/+3]
* In the words of the submitter:
  ---------
  Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly.

  Quality testing was done with testvn, and lat_fs from the lmbench suite.

  Some NFS client testing courtesy of Patrik Kudo.

  vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches.
  ---------
  Submitted by: Michael Hancock <michaelh@cet.co.jp>
  [msmith, 1998-05-07, 1 file, -3/+5]
* Grammar police.
  [alex, 1998-04-10, 1 file, -2/+2]
* New mount option nosymfollow. If enabled, the kernel lookup() function will not follow symbolic links on the mounted file system and return EACCES (Permission denied).
  [wosch, 1998-04-08, 1 file, -1/+6]
* Today is not my lucky day. Fix missing brace and I got a request to use EMLINK instead.
  [peter, 1998-04-06, 1 file, -3/+3]
* Use a different errno (ELOOP (as sef mentioned) since the text that goes with the error sounds ok for the condition) if O_NOFOLLOW gets a link.
  [peter, 1998-04-06, 1 file, -2/+6]
* Rather than let users get fd's to symlink files, make O_NOFOLLOW cause an error if it gets a link (like it does if it gets a socket). The implications of letting users try and do file operations on symlinks themselves were too worrying.
  [peter, 1998-04-06, 1 file, -3/+3]