path: root/sys/kern/vfs_vnops.c
Commit message [author, date, files changed, lines -removed/+added]
* Give vn_isdisk() a second argument where it can return a suitable errno. [phk, 2000-01-10, 1 file, -3/+1]
| Suggested by: bde
* Add bwillwrite to all system calls that create things in the filesystem. [mckusick, 2000-01-10, 1 file, -0/+1]
| Benchmarks that create huge trees of empty files overwhelm the buffer cache.
* Introduce NDFREE (and remove VOP_ABORTOP) [eivind, 1999-12-15, 1 file, -3/+10]
|
* Ensure that garbage from the kernel stack does not wind up being [dillon, 1999-11-18, 1 file, -0/+8]
| returned to user mode in the spare fields of the stat structure.
| PR: kern/14966
| Reviewed by: dillon@freebsd.org
| Submitted by: Kelly Yancey kbyanc@posi.net
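The fix described in this entry can be illustrated with a minimal sketch, assuming the usual remedy of clearing the whole structure before filling it in. The names `demo_stat` and `demo_fill_stat` are hypothetical and are not taken from the kernel source; the actual change may have cleared only the spare fields.

```c
#include <string.h>

/* Hypothetical stand-in for a stat-like structure with reserved fields. */
struct demo_stat {
    long size;
    int  mode;
    long spare[2];   /* reserved; never assigned by the filler below */
};

/*
 * Zero the whole structure first, then fill in the known fields.
 * Anything not explicitly assigned (spare fields, padding) is then
 * guaranteed to be zero rather than leftover stack contents.
 */
void demo_fill_stat(struct demo_stat *sb, long size, int mode)
{
    memset(sb, 0, sizeof(*sb));
    sb->size = size;
    sb->mode = mode;
}
```

A caller that copies `*sb` out to user space afterwards exposes only the assigned values and zeros.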
* Add a vnode fo_stat() entry point. [peter, 1999-11-08, 1 file, -1/+13]
|
* This is what was "fdfix2.patch," a fix for fd sharing. It's pretty [green, 1999-09-19, 1 file, -7/+12]
| far-reaching in fd-land, so you'll want to consult the code for changes. The biggest change is that now, you don't use fp->f_ops->fo_foo(fp, bar) but instead fo_foo(fp, bar), which increments and decrements the fp refcount upon entry and exit. Two new calls, fhold() and fdrop(), are provided. Each does what it seems like it should, and if fdrop() brings the refcount to zero, the fd is freed as well.
| Thanks to peter ("to hell with it, it looks ok to me.") for his review. Thanks to msmith for keeping me from putting locks everywhere :)
| Reviewed by: peter
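The wrapper pattern this entry describes can be sketched as follows. This is a simplified user-space model, not the kernel code: every `demo_` name is hypothetical, and "freeing" is modelled by setting a flag so the behaviour stays observable.

```c
#include <stddef.h>

struct demo_file;

/* Hypothetical operations table with a single entry point. */
struct demo_fileops {
    int (*fo_read)(struct demo_file *fp, int arg);
};

struct demo_file {
    int refcount;
    int freed;                        /* set instead of really freeing */
    const struct demo_fileops *ops;
};

/* Take an extra reference. */
void demo_fhold(struct demo_file *fp)
{
    fp->refcount++;
}

/* Drop a reference; returns 1 if this was the last one. */
int demo_fdrop(struct demo_file *fp)
{
    if (--fp->refcount == 0) {
        fp->freed = 1;
        return 1;
    }
    return 0;
}

/* Wrapper: hold the file across the indirect call, then drop it. */
int demo_fo_read(struct demo_file *fp, int arg)
{
    int error;

    demo_fhold(fp);
    error = fp->ops->fo_read(fp, arg);
    demo_fdrop(fp);
    return error;
}

/* Sample operation that reports the refcount it observed. */
int demo_echo_read(struct demo_file *fp, int arg)
{
    return fp->refcount * 100 + arg;
}

const struct demo_fileops demo_echo_ops = { demo_echo_read };
```

Callers use `demo_fo_read(fp, arg)` rather than `fp->ops->fo_read(fp, arg)`, so the file cannot reach a zero refcount while the operation is still running.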
* Changes to centralise the default blocksize behaviour. [julian, 1999-09-09, 1 file, -13/+20]
| More likely to follow.
| Submitted by: phk@freebsd.org
* Revert a bunch of controversial changes by PHK. After [julian, 1999-09-03, 1 file, -21/+13]
| a quick think and discussion among various people some form of some of these changes will probably be recommitted.
| The reversion was requested by dg while discussions proceed. PHK has indicated that he can live with this, and it has been agreed that some form of some of these changes may return shortly after further discussion.
* Improve the returned values in st_blksize a little bit, avoid [phk, 1999-09-01, 1 file, -11/+21]
| accessing union fields not valid for dev_t type.
* Make bdev userland access work like cdev userland access unless [phk, 1999-08-30, 1 file, -2/+0]
| the highly non-recommended option ALLOW_BDEV_ACCESS is used. (bdev access is evil because you don't get write errors reported.)
| Kill si_bsize_best before it kills Matt :-)
| Use the specfs routines rather than having cloned copies in devfs.
* $Id$ -> $FreeBSD$ [peter, 1999-08-28, 1 file, -1/+1]
|
* Add FIODTYPE ioctl for getting d_flags (type) info on a device. [green, 1999-08-27, 1 file, -1/+7]
| Okayed by: phk
* Add a couple of missing but unimportant break; statements. [phk, 1999-08-25, 1 file, -1/+3]
|
* oops: Add missing include. [phk, 1999-08-13, 1 file, -1/+2]
|
* Move the special-casing of stat(2)->st_blksize for device files [phk, 1999-08-13, 1 file, -2/+15]
| from UFS to the generic level. For chr/blk devices we don't care about the blocksize of the filesystem, we want what the device asked for.
* Fix fd race conditions (during shared fd table usage.) Badfileops is [green, 1999-08-04, 1 file, -1/+3]
| now used in f_ops in place of NULL, and modifications to the files are more carefully ordered. f_ops should also be set to &badfileops upon "close" of a file.
| This does not fix other problems mentioned in this PR than the first one.
| PR: 11629
| Reviewed by: peter
* Add sysctl and support code to allow directories to be VMIO'd. The default [alc, 1999-07-26, 1 file, -2/+2]
| setting for the sysctl is OFF, which is the historical operation.
| Submitted by: dillon
* These changes appear to give us benefits with both small (32MB) and [mckusick, 1999-07-08, 1 file, -2/+6]
| large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bringing the machine to a grinding halt.
| * buffer cache hash table now dynamically allocated. This will have no effect on memory consumption for smaller systems and will help scale the buffer cache for larger systems.
| * minor enhancement to pmap_clearbit(). I noticed that all the calls to it used constant arguments. Making it an inline allows the constants to propagate to deeper inlines and should produce better code.
| * removal of inherent vfs_ioopt support through the emplacement of appropriate #ifdef's, with John's permission. If we do not find a use for it by the end of the year we will remove it entirely.
| * removal of getnewbufloops* counters & sysctl's - no longer necessary for debugging, getnewbuf() is now optimal.
| * buffer hash table functions removed from sys/buf.h and localized to vfs_bio.c
| * VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ), allowing processes to block when too many dirty buffers are present in the system.
| * removal of a softdep test in bdwrite() that is no longer necessary now that bdwrite() no longer attempts to flush dirty buffers.
| * slight optimization added to bqrelse() - there is no reason to test for available buffer space on B_DELWRI buffers.
| * addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite() will attempt to locate clusterable areas in both the forward and reverse direction relative to the offset of the buffer passed to it. This will probably not make much of a difference now, but I believe we will start to rely on it heavily in the future if we decide to shift some of the burden of the clustering closer to the actual I/O initiation.
| * Removal of the newbufcnt and lastnewbuf counters that Kirk added. They do not fix any race conditions that haven't already been fixed by the gbincore() test done after the only call to getnewbuf(). getnewbuf() is a static, so there is no chance of it being misused by other modules. ( Unless Kirk can think of a specific thing that this code fixes. I went through it very carefully and didn't see anything ).
| * removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think this check is necessary, the buffer should flush properly whether the vnode is locked or not. ( yes? ).
| * removal of extra arguments passed to getnewbuf() that are not necessary.
| * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in vfs_cluster.c
| * vn_write() now calls bwillwrite() *PRIOR* to locking the vnode, which should greatly aid flushing operations in heavy load situations - both the pageout and update daemons will be able to operate more efficiently.
| * removal of b_usecount. We may add it back in later but for now it is useless. Prior implementations of the buffer cache never had enough buffers for it to be useful, and current implementations which make more buffers available might not benefit relative to the amount of sophistication required to implement a b_usecount. Straight LRU should work just as well, especially when most things are VMIO backed. I expect that (even though John will not like this assumption) directories will become VMIO backed some point soon.
| Submitted by: Matthew Dillon <dillon@backplane.com>
| Reviewed by: Kirk McKusick <mckusick@mckusick.com>
* Make sure that stat(2) and friends always return a valid st_dev field. [phk, 1999-07-02, 1 file, -2/+5]
| Pseudo-FS need not fill in the va_fsid anymore, the syscall code will use the first half of the fsid, which now looks like a udev_t with major 255.
* This Implements the mumbled about "Jail" feature. [phk, 1999-04-28, 1 file, -2/+2]
| This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do.
| For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers".
| Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname.
| Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors.
| It generally does what one would expect, but setting up a jail still takes a little knowledge.
| A few notes:
|   I have no scripts for setting up a jail, don't ask me for them.
|   The IP number should be an alias on one of the interfaces.
|   mount a /proc in each jail, it will make ps more useable.
|   /proc/<pid>/status tells the hostname of the prison for jailed processes.
|   Quotas are only sensible if you have a mountpoint per prison.
|   There are no provisions for stopping resource-hogging.
|   Some "#ifdef INET" and similar may be missing (send patches!)
| If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome!
| Tools, comments, patches & documentation most welcome.
| Have fun...
| Sponsored by: http://www.rndassociates.com/
| Run for almost a year by: http://www.servetheweb.com/
* Suser() simplification: [phk, 1999-04-27, 1 file, -2/+2]
| 1: s/suser/suser_xxx/
| 2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>.
| 3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/
| The remaining suser_xxx() calls will be scrutinized and dealt with later.
| There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce.
| More changes to the suser() API will come along with the "jail" code.
* Address several problems in vn_read and vn_write: [alc, 1999-04-21, 1 file, -35/+21]
| 1. Make read-ahead work for pread and aio_read.
| 2. Fix one place where a comparison of uio_offset with -1 wasn't updated to use FOF_OFFSET.
| 3. Honor O_APPEND in the FOF_OFFSET case.
| In addition, use the variable name "ioflag" in both vn_read and vn_write to avoid possible confusion between the variable "flag" and the parameter "flags".
| Submitted by: Bruce Evans <bde@zeta.org.au> and me
* Add standard padding argument to pread and pwrite syscall. That should make them [dt, 1999-04-04, 1 file, -7/+9]
| NetBSD compatible.
| Add parameter to fo_read and fo_write. (The only flag, FOF_OFFSET, means that the offset is set in the struct uio.)
| Factor out some common code from read/pread/write/pwrite syscalls.
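A toy model of what an "offset supplied by the caller" flag means for a read routine is sketched below. It is an assumption-laden illustration, not the kernel interface: `DEMO_FOF_OFFSET`, `demo_ofile`, and `demo_read` are hypothetical, and the offset is passed as a plain parameter rather than inside a struct uio.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_FOF_OFFSET 0x1   /* hypothetical flag value */

/* Hypothetical open-file object backed by an in-memory buffer. */
struct demo_ofile {
    const char *data;
    size_t len;
    size_t f_offset;          /* shared file position */
};

/*
 * With DEMO_FOF_OFFSET set, read at the caller's offset and leave the
 * shared position alone (pread-like). Without it, read at the shared
 * position and advance it (read-like). Returns the bytes copied.
 */
size_t demo_read(struct demo_ofile *fp, char *buf, size_t n,
                 size_t offset, int flags)
{
    size_t start = (flags & DEMO_FOF_OFFSET) ? offset : fp->f_offset;

    if (start >= fp->len)
        return 0;
    if (n > fp->len - start)
        n = fp->len - start;
    memcpy(buf, fp->data + start, n);
    if ((flags & DEMO_FOF_OFFSET) == 0)
        fp->f_offset = start + n;
    return n;
}
```

Keeping both behaviours in one routine, selected by a flag, is what lets the read and pread paths share common code.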
* Changed vn_read/write such that fp->f_offset isn't touched [alc, 1999-03-26, 1 file, -4/+15]
| if uio->uio_offset != -1. This fixes a problem with aio_read/write and permits a straightforward implementation of pread/pwrite.
| PR: kern/8669
| Submitted by: John Plevyak <jplevyak@inktomi.com>
| Reviewed by: Matthew Dillon <dillon@apollo.backplane.com>
* Use suser() to determine super-user-ness, don't examine cr_uid directly. [phk, 1999-01-30, 1 file, -2/+2]
|
* Add 'options DEBUG_LOCKS', which stores extra information in struct [eivind, 1999-01-20, 1 file, -2/+15]
| lock, and add some macros and function parameters to make sure that the information gets to the point where it can be put in the lock structure.
| While I'm here, add DEBUG_VFS_LOCKS to LINT.
* Remove the 'waslocked' parameter to vfs_object_create(). [eivind, 1999-01-05, 1 file, -2/+2]
|
* Only do one VOP_ACCESS() per open() instead of two. This should reduce [peter, 1998-11-02, 1 file, -8/+9]
| the NFSv3 ACCESS RPC problems a little for busy clients that do a lot of open/close. The nfs code could probably cache the results, but I'm not sure whether this would be legal or useful. The problem is that with a CPU farm, on each open there would be a lookup, getattr then access RPC then the read/write RPC activity. Caching the access results probably isn't going to help much if the clients access lots of files. Having the nfs_access() routine interpret the getattr results is a bit of a hack, but it's how NFSv2 is done and it might be OK for a mount attribute for v3.
* Report the mode as the result of the VOP_GETATTR rather than the [phk, 1998-06-27, 1 file, -2/+2]
| vnode's type, they may not correspond.
* This commit fixes various 64bit portability problems required for [dfr, 1998-06-07, 1 file, -3/+3]
| FreeBSD/alpha. The most significant item is to change the command argument to ioctl functions from int to u_long. This change brings us in line with various other BSD versions. Driver writers may like to use (__FreeBSD_version == 300003) to detect this change.
| The prototype FreeBSD/alpha machdep will follow in a couple of days time.
* In the words of the submitter: [msmith, 1998-05-07, 1 file, -3/+5]
| ---------
| Make callers of namei() responsible for releasing references or locks instead of having the underlying filesystems do it. This eliminates redundancy in all terminal filesystems and makes it possible for stacked transport layers such as umapfs or nullfs to operate correctly.
| Quality testing was done with testvn, and lat_fs from the lmbench suite.
| Some NFS client testing courtesy of Patrik Kudo.
| vop_mknod and vop_symlink still release the returned vpp. vop_rename still releases 4 vnode arguments before it returns. These remaining cases will be corrected in the next set of patches.
| ---------
| Submitted by: Michael Hancock <michaelh@cet.co.jp>
* Grammar police. [alex, 1998-04-10, 1 file, -2/+2]
|
* New mount option nosymfollow. If enabled, the kernel lookup() [wosch, 1998-04-08, 1 file, -1/+6]
| function will not follow symbolic links on the mounted file system and return EACCES (Permission denied).
* Today is not my lucky day. Fix missing brace and I got a request [peter, 1998-04-06, 1 file, -3/+3]
| to use EMLINK instead.
* Use a different errno (ELOOP (as sef mentioned) since the text that goes [peter, 1998-04-06, 1 file, -2/+6]
| with the error sounds ok for the condition) if O_NOFOLLOW gets a link.
* Rather than let users get fd's to symlink files, make O_NOFOLLOW cause [peter, 1998-04-06, 1 file, -3/+3]
| an error if it gets a link (like it does if it gets a socket). The implications of letting users try and do file operations on symlinks themselves were too worrying.
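From user space, the behaviour this entry settles on (open fails when the final path component is a symlink) can be exercised with a small helper. This is a sketch assuming a POSIX.1-2008 system that defines `O_NOFOLLOW`; `demo_open_nofollow` is a hypothetical name. The errno reported on failure differs between systems, as the neighbouring entries discuss, so the sketch only distinguishes success from failure.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* ask for O_NOFOLLOW under strict modes */
#endif
#include <fcntl.h>
#include <unistd.h>

/*
 * Open path read-only, refusing to follow a symlink in the final
 * component. Returns a descriptor on success, or -1 with errno set
 * if the path is a symlink or the open fails for any other reason.
 */
int demo_open_nofollow(const char *path)
{
    return open(path, O_RDONLY | O_NOFOLLOW);
}
```

This is the usual building block for opening files safely in world-writable directories such as /tmp, where another user could otherwise plant a link.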
* Implement a new open(2) flag: O_NOFOLLOW. This will instruct open [peter, 1998-04-06, 1 file, -2/+3]
| to not follow symlinks, but to open a handle on the link itself(!). As strange as this might sound, it has several useful applications, such as safe race-free ways of opening files in hostile areas (eg: /tmp, a mode 1777 /var/mail, etc). It also would allow things like fchown() to work on the link rather than having to implement a new syscall specifically for that task.
| Reviewed by: phk
* Removed unused #includes. [bde, 1998-02-25, 1 file, -2/+1]
|
* Back out DIAGNOSTIC changes. [eivind, 1998-02-06, 1 file, -3/+1]
|
* Turn DIAGNOSTIC into a new-style option. [eivind, 1998-02-04, 1 file, -1/+3]
|
* Fix some vnode management problems, and better mgmt of vnode free list. [dyson, 1998-01-12, 1 file, -4/+2]
| Fix the UIO optimization code.
| Fix an assumption in vm_map_insert regarding allocation of swap pagers.
| Fix an spl problem in the collapse handling in vm_object_deallocate.
| When pages are freed from vnode objects, and the criteria for putting the associated vnode onto the free list is reached, either put the vnode onto the list, or put it onto an interrupt safe version of the list, for further transfer onto the actual free list.
| Some minor syntax changes changing pre-decs, pre-incs to post versions.
| Remove a bogus timeout (that I added for debugging) from vn_lock.
| PHK will likely still have problems with the vnode list management, and so do I, but it is better than it was.
* Make our v_usecount vnode reference count work identically to the [dyson, 1998-01-06, 1 file, -2/+2]
| original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does.
| When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex.
| Vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore.
| A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes.
| Please be careful with your system while running this code, but I would greatly appreciate feedback as soon as reasonably possible.
* Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem [dyson, 1997-12-29, 1 file, -3/+4]
| with the object cache removal.
* Lots of improvements, including restructuring the caching and management [dyson, 1997-12-29, 1 file, -2/+4]
| of vnodes and objects. There are some metadata performance improvements that come along with this. There are also a few prototypes added when the need is noticed. Changes include:
| 1) Cleaning up vref, vget.
| 2) Removal of the object cache.
| 3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
| 4) Correct some missing LK_RETRY's in vn_lock.
| 5) Correct the page range in the code for msync.
| Be gentle, and please give me feedback asap.
* Changes to allow event-based process monitoring and control. [sef, 1997-12-06, 1 file, -2/+3]
|
* Fix and complete the AIO syscalls. There are some performance enhancements [dyson, 1997-11-29, 1 file, -3/+4]
| coming up soon, but the code is functional. Docs will be forthcoming.
* Remove a bunch of variables which were unused both in GENERIC and LINT. [phk, 1997-11-07, 1 file, -2/+2]
| Found by: -Wunused
* Use 127 instead of CHAR_MAX for the limit on the sequence count. The [bde, 1997-10-27, 1 file, -18/+17]
| limit doesn't have anything to do with characters. The count mainly needs to fit in the VOP_READ() ioflag after being left shifted by 16.
| Moved vn_lock() before vn_closefile(). vn_lock() was mismerged from Lite2.
| Removed some gratuitous braces.
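The constraint mentioned in this entry, a small counter carried in the upper bits of a flags word, can be sketched as follows. The shift of 16 and the limit of 127 come from the commit message; the macro and function names are hypothetical, and treating the low 16 bits as the ordinary flag bits is an assumption made for the illustration.

```c
#define DEMO_SEQ_SHIFT 16     /* shift named in the commit message */
#define DEMO_SEQ_MAX   127    /* limit named in the commit message */

/*
 * Clamp seqcount to [0, DEMO_SEQ_MAX] and store it above the low
 * 16 bits of ioflag, which are preserved unchanged.
 */
int demo_pack_seqcount(int ioflag, int seqcount)
{
    if (seqcount < 0)
        seqcount = 0;
    if (seqcount > DEMO_SEQ_MAX)
        seqcount = DEMO_SEQ_MAX;
    return (ioflag & 0xffff) | (seqcount << DEMO_SEQ_SHIFT);
}

/* Recover the sequence count from a packed flags word. */
int demo_unpack_seqcount(int ioflag)
{
    return (ioflag >> DEMO_SEQ_SHIFT) & DEMO_SEQ_MAX;
}
```

With a limit of 127 the shifted count occupies seven bits (16 through 22), so it stays well clear of the sign bit of a 32-bit int, and the limit no longer depends on the platform's character type.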
* Relax the vnode locking for read only operations. [dyson, 1997-10-06, 1 file, -3/+3]
|
* vn_select -> vn_poll [peter, 1997-09-14, 1 file, -8/+9]
|