summaryrefslogtreecommitdiffstats
path: root/sys/ufs/ufs
Commit message (Collapse)AuthorAgeFilesLines
...
* Bug fixes for currently harmless bugs that could rise to bitemckusick2000-03-152-4/+4
| | | | | | | | | | | | | | | | | | | | | | the unwary if the code were called in slightly different ways. 1) In ufs_bmaparray() the code for calculating 'runb' will stop one block short of the first entry in an indirect block. i.e. if an indirect block contains N block numbers b[0]..b[N-1] then the code will never check if b[0] and b[1] are sequential. For reference, compare with the equivalent code that deals with direct blocks. 2) In ufs_lookup() there is an off-by-one error in the test that checks if dp->i_diroff is outside the range of the the current directory size. This is completely harmless, since the following while-loop condition 'dp->i_offset < endsearch' is never met, so the code immediately does a second pass starting at dp->i_offset = 0. 3) Again in ufs_lookup(), the condition in a sanity check is wrong for directories that are longer than one block. This bug means that the sanity check is only effective for small directories. Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
* In the 'found' case for ufs_lookup() the underlying bp's data wasdillon2000-03-091-1/+1
| | | | | | | | being accessed after the bp had been releaed. A simple move of the brelse() solves the problem. Approved by: jkh Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
* After much consulting with bde, concluded that this fix was the best fixrwatson2000-02-221-2/+2
| | | | | | | | | | to the current jail/chflags interactions. This fix conditionalizes ``root behavior'' in the chflags() case on not being in jail, so attempts to perform a chflags in a jail are limited to what a normal user could do. For example, this does allow setting of user flags as appropriate, but prohibits changing of system flags. Reviewed by: bde
* Disable chflags() from within jail() so that root within jail can't makerwatson2000-02-201-1/+1
| | | | | | | a mess in securelevel environments. Results in one warning during /etc/rc as it attempts to remove file flags, but this is harmless. Approved by: High Lord Hubbard
* Several performance improvements for soft updates have been added:mckusick2000-01-103-29/+56
| | | | | | | | | | | | | | | 1) Fastpath deletions. When a file is being deleted, check to see if it was so recently created that its inode has not yet been written to disk. If so, the delete can proceed to immediately free the inode. 2) Background writes: No file or block allocations can be done while the bitmap is being written to disk. To avoid these stalls, the bitmap is copied to another buffer which is written thus leaving the original available for futher allocations. 3) Link count tracking. Constantly track the difference in i_effnlink and i_nlink so that inodes that have had no change other than i_effnlink need not be written. 4) Identify buffers with rollback dependencies so that the buffer flushing daemon can choose to skip over them.
* Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL"peter1999-12-293-7/+7
| | | | | | is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistant with the other BSD's who made this change quite some time ago. More commits to come.
* Introduce NDFREE (and remove VOP_ABORTOP)eivind1999-12-152-43/+3
|
* Remove WILLRELE from VOP_SYMLINKeivind1999-11-131-1/+2
| | | | | | Note: Previous commit to these files (except coda_vnops and devfs_vnops) that claimed to remove WILLRELE from VOP_RENAME actually removed it from VOP_MKNOD.
* Remove WILLRELE from VOP_RENAMEeivind1999-11-121-2/+6
|
* Quick fix for breakage of ext2fs link counts as reported by stat(2) bybde1999-11-032-1/+3
| | | | | | | the soft updates changes: only report the link count to be i_effnlink in ufs_getattr() for file systems that maintain i_effnlink. Tested by: Mike Dracopoulos <mdraco@math.uoa.gr>
* Add sysctl debug.dircheck to allow directory sanity checking to be turneddillon1999-10-301-0/+11
| | | | | | | on with a sysctl. Fix two bugs in ufs_lookup that can cause deadlocks due to out-of-order locking. This fix was tested for a few days prior to commit.
* Remove v_maxio from struct vnode.phk1999-09-291-2/+1
| | | | | | Replace it with mnt_iosize_max in struct mount. Nits from: bde
* More removals of vnode->v_lastr, replaced by preexisting seqcountdillon1999-09-201-78/+11
| | | | | | | | | heuristic to detect sequential operation. VM-related forced clustering code removed from ufs in preparation for a commit to vm/vm_fault.c that does it more generally. Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>
* Fix a harmless bug I introduced, simplify a bit more while here.phk1999-09-201-6/+4
|
* Step one of replacing devsw->d_maxio with si_bsize_max.phk1999-09-201-35/+4
| | | | | | | | Rename dev->si_bsize_max to si_iosize_max and set it in spec_open if the device didn't. Set vp->v_maxio from dev->si_bsize_max in spec_open rather than in ufs_bmap.c
* Removed diskerr()'s unused d_name arg and updated callers. This fixesbde1999-09-131-2/+2
| | | | | | warnings caused by the arg having the wrong type (not const enough). The arg was also wrong (a full name instead of a short one) for calls from from subr_diskmbr.c and pc98/diskslice_machdep.c.
* Seperate the export check in VFS_FHTOVP, exports are now checked viaalfred1999-09-112-18/+33
| | | | | | | | | VFS_CHECKEXP. Add fh(open|stat|stafs) syscalls to allow userland to query filesystems based on (network) filehandle. Obtained from: NetBSD
* remove unused variables.phk1999-08-281-1/+1
|
* We don't need to pass the diskname argument all over the diskslice/labelphk1999-08-281-1/+1
| | | | code, we can find the name from any convenient dev_t
* $Id$ -> $FreeBSD$peter1999-08-2815-15/+15
|
* Simplify the handling of VCHR and VBLK vnodes using the new dev_t:phk1999-08-261-19/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Make the alias list a SLIST. Drop the "fast recycling" optimization of vnodes (including the returning of a prexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore. Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively. Make the revoke syscalls use vcount() instead of VALIASED. Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag. vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one. Print the devicename in specfs/vprint(). Remove a couple of stale LFS vnode flags. Remove unimplemented/unused LK_DRAINED;
* Use devtoname() to print dev_t's instead of casting them to long or u_longbde1999-08-231-3/+3
| | | | for misprinting in %lx format.
* Support full-precision file timestamps. Until now, only the secondsjdp1999-08-221-10/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | have been maintained, and that is still the default. A new sysctl variable "vfs.timestamp_precision" can be used to enable higher levels of precision: 0 = seconds only; nanoseconds zeroed (default). 1 = seconds and nanoseconds, accurate within 1/HZ. 2 = seconds and nanoseconds, truncated to microseconds. >=3 = seconds and nanoseconds, maximum precision. Level 1 uses getnanotime(), which is fast but can be wrong by up to 1/HZ. Level 2 uses microtime(). It might be desirable for consistency with utimes() and friends, which take timeval structures rather than timespecs. Level 3 uses nanotime() for the higest precision. I benchmarked levels 0, 1, and 3 by copying a 550 MB tree with "cpio -pdu". There was almost negligible difference in the system times -- much less than 1%, and less than the variation among multiple runs at the same level. Bruce Evans dreamed up a torture test involving 1-byte reads with intervening fstat() calls, but the cpio test seems more realistic to me. This feature is currently implemented only for the UFS (FFS and MFS) filesystems. But I think it should be easy to support it in the others as well. An earlier version of this was reviewed by Bruce. He's not to blame for any breakage I've introduced since then. Reviewed by: bde (an earlier version of the code)
* Add the (inline) function vm_page_undirty for clearing the dirty bitmaskalc1999-08-171-2/+2
| | | | | | | | of a vm_page. Use it. Submitted by: dillon
* Spring cleaning around strategy and disklabels/slices:phk1999-08-141-9/+7
| | | | | | | | | | | | | | Introduce BUF_STRATEGY(struct buf *, int flag) macro, and use it throughout. please see comment in sys/conf.h about the flag argument. Remove strategy argument from all the diskslice/label/bad144 implementations, it should be found from the dev_t. Remove bogus and unused strategy1 routines. Remove open/close arguments from dssize(). Pick them up from dev_t. Remove unused and unfinished setgeom support from diskslice/label/bad144 code.
* Move the special-casing of stat(2)->st_blksize for device filesphk1999-08-131-15/+2
| | | | | | from UFS to the generic level. For chr/blk devices we don't care about the blocksize of the filesystem, we want what the device asked for.
* The bdevsw() and cdevsw() are now identical, so kill the former.phk1999-08-131-4/+4
|
* s/v_specinfo/v_rdev/phk1999-08-131-4/+4
|
* Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,phk1999-08-082-4/+3
| | | | | | a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.
* Move the memory access behavior information provided by madvisealc1999-08-011-5/+4
| | | | | | from the vm_object to the vm_map. Submitted by: dillon
* Fixed access timestamp bugs:bde1999-07-251-5/+17
| | | | | | | | | | Set IN_ACCESS for successful reads of 0 bytes (except for requests to read 0 bytes). This was broken in rev.1.42. PR: misc/10148 Don't set IN_ACCESS for requests to read 0 bytes. Don't set IN_ACCESS for unsuccessful reads.
* Create the macro DOINGASYNC to check whether the MNT_ASYNC flag hasmckusick1999-07-134-26/+38
| | | | | | | | | been set for a mount point. Insert missing checks to ensure that all write operations are done asynchronously when the MNT_ASYNC option has been requested. Submitted by: Craig A Soules <soules+@andrew.cmu.edu> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
* These changes appear to give us benefits with both small (32MB) andmckusick1999-07-081-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bring the machine to a grinding halt. * buffer cache hash table now dynamically allocated. This will have no effect on memory consumption for smaller systems and will help scale the buffer cache for larger systems. * minor enhancement to pmap_clearbit(). I noticed that all the calls to it used constant arguments. Making it an inline allows the constants to propogate to deeper inlines and should produce better code. * removal of inherent vfs_ioopt support through the emplacement of appropriate #ifdef's, with John's permission. If we do not find a use for it by the end of the year we will remove it entirely. * removal of getnewbufloops* counters & sysctl's - no longer necessary for debugging, getnewbuf() is now optimal. * buffer hash table functions removed from sys/buf.h and localized to vfs_bio.c * VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ), allowing processes to block when too many dirty buffers are present in the system. * removal of a softdep test in bdwrite() that is no longer necessary now that bdwrite() no longer attempts to flush dirty buffers. * slight optimization added to bqrelse() - there is no reason to test for available buffer space on B_DELWRI buffers. * addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite() will attempt to locate clusterable areas in both the forward and reverse direction relative to the offset of the buffer passed to it. This will probably not make much of a difference now, but I believe we will start to rely on it heavily in the future if we decide to shift some of the burden of the clustering closer to the actual I/O initiation. * Removal of the newbufcnt and lastnewbuf counters that Kirk added. They do not fix any race conditions that haven't already been fixed by the gbincore() test done after the only call to getnewbuf(). getnewbuf() is a static, so there is no chance of it being misused by other modules. ( Unless Kirk can think of a specific thing that this code fixes. I went through it very carefully and didn't see anything ). * removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think this check is necessary, the buffer should flush properly whether the vnode is locked or not. ( yes? ). * removal of extra arguments passed to getnewbuf() that are not necessary. * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in vfs_cluster.c * vn_write() now calls bwillwrite() *PRIOR* to locking the vnode, which should greatly aid flushing operations in heavy load situations - both the pageout and update daemons will be able to operate more efficiently. * removal of b_usecount. We may add it back in later but for now it is useless. Prior implementations of the buffer cache never had enough buffers for it to be useful, and current implementations which make more buffers available might not benefit relative to the amount of sophistication required to implement a b_usecount. Straight LRU should work just as well, especially when most things are VMIO backed. I expect that (even though John will not like this assumption) directories will become VMIO backed some point soon. Submitted by: Matthew Dillon <dillon@backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
* Convert buffer locking from using the B_BUSY and B_WANTED flags to usingmckusick1999-06-261-5/+5
| | | | | | | lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
* Add a vnode argument to VOP_BWRITE to get rid of the last vnodemckusick1999-06-162-6/+6
| | | | | operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.
* Divorce "dev_t" from the "major|minor" bitmap, which is now calledphk1999-05-112-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev() For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to a actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few houskeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I belive this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).
* I got tired of seeing all the cdevsw[major(foo)] all over the place.phk1999-05-081-4/+4
| | | | | | | | Made a new (inline) function devsw(dev_t dev) and substituted it. Changed to the BDEV variant to this format as well: bdevsw(dev_t dev) DEVFS will eventually benefit from this change too.
* Continue where Julian left off in July 1998:phk1999-05-071-4/+4
| | | | | | | | | | | | | | Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline) function. Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention to the order of the cmaj/bmaj arguments!) Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE (ditto!) (Next step will be to convert all bdev dev_t's to cdev dev_t's before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
* The VFS/BIO subsystem contained a number of hacks in order to optimizealc1999-05-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | piecemeal, middle-of-file writes for NFS. These hacks have caused no end of trouble, especially when combined with mmap(). I've removed them. Instead, NFS will issue a read-before-write to fully instantiate the struct buf containing the write. NFS does, however, optimize piecemeal appends to files. For most common file operations, you will not notice the difference. The sole remaining fragment in the VFS/BIO system is b_dirtyoff/end, which NFS uses to avoid cache coherency issues with read-merge-write style operations. NFS also optimizes the write-covers-entire-buffer case by avoiding the read-before-write. There is quite a bit of room for further optimization in these areas. The VM system marks pages fully-valid (AKA vm_page_t->valid = VM_PAGE_BITS_ALL) in several places, most noteably in vm_fault. This is not correct operation. The vm_pager_get_pages() code is now responsible for marking VM pages all-valid. A number of VM helper routines have been added to aid in zeroing-out the invalid portions of a VM page prior to the page being marked all-valid. This operation is necessary to properly support mmap(). The zeroing occurs most often when dealing with file-EOF situations. Several bugs have been fixed in the NFS subsystem, including bits handling file and directory EOF situations and buf->b_flags consistancy issues relating to clearing B_ERROR & B_INVAL, and handling B_DONE. getblk() and allocbuf() have been rewritten. B_CACHE operation is now formally defined in comments and more straightforward in implementation. B_CACHE for VMIO buffers is based on the validity of the backing store. B_CACHE for non-VMIO buffers is based simply on whether the buffer is B_INVAL or not (B_CACHE set if B_INVAL clear, and vise-versa). biodone() is now responsible for setting B_CACHE when a successful read completes. B_CACHE is also set when a bdwrite() is initiated and when a bwrite() is initiated. VFS VOP_BWRITE routines (there are only two - nfs_bwrite() and bwrite()) are now expected to set B_CACHE. This means that bowrite() and bawrite() also set B_CACHE indirectly. There are a number of places in the code which were previously using buf->b_bufsize (which is DEV_BSIZE aligned) when they should have been using buf->b_bcount. These have been fixed. getblk() now clears B_DONE on return because the rest of the system is so bad about dealing with B_DONE. Major fixes to NFS/TCP have been made. A server-side bug could cause requests to be lost by the server due to nfs_realign() overwriting other rpc's in the same TCP mbuf chain. The server's kernel must be recompiled to get the benefit of the fixes. Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
* This Implements the mumbled about "Jail" feature.phk1999-04-282-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a seriously beefed up chroot kind of thing. The process is jailed along the same lines as a chroot does it, but with additional tough restrictions imposed on what the superuser can do. For all I know, it is safe to hand over the root bit inside a prison to the customer living in that prison, this is what it was developed for in fact: "real virtual servers". Each prison has an ip number associated with it, which all IP communications will be coerced to use and each prison has its own hostname. Needless to say, you need more RAM this way, but the advantage is that each customer can run their own particular version of apache and not stomp on the toes of their neighbors. It generally does what one would expect, but setting up a jail still takes a little knowledge. A few notes: I have no scripts for setting up a jail, don't ask me for them. The IP number should be an alias on one of the interfaces. mount a /proc in each jail, it will make ps more useable. /proc/<pid>/status tells the hostname of the prison for jailed processes. Quotas are only sensible if you have a mountpoint per prison. There are no privisions for stopping resource-hogging. Some "#ifdef INET" and similar may be missing (send patches!) If somebody wants to take it from here and develop it into more of a "virtual machine" they should be most welcome! Tools, comments, patches & documentation most welcome. Have fun... Sponsored by: http://www.rndassociates.com/ Run for almost a year by: http://www.servetheweb.com/
* Suser() simplification:phk1999-04-272-8/+8
| | | | | | | | | | | | | | | | | | | 1: s/suser/suser_xxx/ 2: Add new function: suser(struct proc *), prototyped in <sys/proc.h>. 3: s/suser_xxx(\([a-zA-Z0-9_]*\)->p_ucred, \&\1->p_acflag)/suser(\1)/ The remaining suser_xxx() calls will be scrutinized and dealt with later. There may be some unneeded #include <sys/cred.h>, but they are left as an exercise for Bruce. More changes to the suser() API will come along with the "jail" code.
* Catch a case spotted by Tor where files mmapped could leave garbage in thejulian1999-04-051-6/+17
| | | | | | | | | | | | unallocated parts of the last page when the file ended on a frag but not a page boundary. Delimitted by tags PRE_MATT_MMAP_EOF and POST_MATT_MMAP_EOF, in files alpha/alpha/pmap.c i386/i386/pmap.c nfs/nfs_bio.c vm/pmap.h vm/vm_page.c vm/vm_page.h vm/vnode_pager.c miscfs/specfs/spec_vnops.c ufs/ufs/ufs_readwrite.c kern/vfs_bio.c Submitted by: Matt Dillon <dillon@freebsd.org> Reviewed by: Alan Cox <alc@freebsd.org>
* Don't depend on <ufs/ufs/quota.h> or another (old) prerequisite includingbde1999-03-061-1/+2
| | | | | <sys/queue.h>. This fixes my recent breakage of biosboot by unpolluting <ufs/ufs/quota.h> in the !KERNEL case.
* Moved kernel declarations inside the KERNEL ifdef, and removedbde1999-03-051-7/+7
| | | | | | | | | | | | | | | include of <sys/queue.h> in the !KERNEL case. The prerequisites for <ufs/ufs/quota.h> were broken in Lite2 by converting some of the kernel declarations to use queue macros without including <sys/queue.h>. <sys/queue.h> was included in applications in /usr/src instead. We polluted this file instead of merging the changes in the applications. Include <sys/queue.h> in the KERNEL case, and forward-declare all structs that are used in prototypes, so that this file is almost self-sufficient even in the kernel. Obtained from: mostly from NetBSD
* Changed the type of quotactl()'s 4th arg from `char *' to `void *'bde1999-03-051-2/+2
| | | | | | | | | so that non-sloppy applications can call it without using disgusting casts to avoid warnings. The 4th arg is sort of varargs -- it must sometimes represent a filename, sometimes a struct pointer, and is sometimes unused. The arg type is still caddr_t in the kernel. Obtained from: mostly from NetBSD
* Merge patch to ufs_vnops.c's ufs_rename to the copy of ufs_rename thatimp1999-03-021-2/+2
| | | | | | lives in ext2_vnops.c for ext2fs. Also remove cast from comparision. Bruce pointed out that it was bogus since we'd force a signed comparision when we really wanted an unsigned comparison.
* Fix last commit based on feedback from Guido, Bruce and Terry.imp1999-02-261-5/+6
| | | | | | | | | | | | | | | | Specifically, the test was in the wrong place, lacked a cast, didn't unlock the node, and exited to bad rather than abortit. Now we don't allow renaming of a file with LINK_MAX references. Move the test to earlier in the code as it is closer to where ip is obtained, as that is the style of the rest of the function. Didn't fix the problems bruce pointed out in the rename man page to include EMLINK, nor address his complaints about how the whole idea of incrementing the link count during a rename is potentially asking for trouble. Also didn't try to correct potential problem Terry pointed out with decrements not being similarly protected against underflow.
* Add missing check for LINK_MAX in ufs_rename. Since ip->i_effnlink andimp1999-02-251-1/+5
| | | | | | ip->nlink were different types, there was a masked overflow. Reported by: Mark Slemco <marcs@znep.com>
* Update ufs_vnops code to use new specinfo fields rather then guess.dillon1999-02-251-7/+14
| | | | This is part of general specinfo / d_parms() commit.
* Remove XXX comment in regarsd to why NFS doesn't use VOP_ABORT(). NFSdillon1999-02-131-3/+3
| | | | is being fixed now.
OpenPOWER on IntegriCloud