summaryrefslogtreecommitdiffstats
path: root/sys/kern/vfs_subr.c
Commit message (Collapse)AuthorAgeFilesLines
* The bdevsw() and cdevsw() are now identical, so kill the former.phk1999-08-131-2/+2
|
* s/v_specinfo/v_rdev/phk1999-08-131-4/+4
|
* Decommision miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>,phk1999-08-081-4/+2
| | | | | | a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.
* Add sysctl and support code to allow directories to be VMIO'd. The defaultalc1999-07-261-3/+3
| | | | | | setting for the sysctl is OFF, which is the historical operation. Submitted by: dillon
* Now a dev_t is a pointer to struct specinfo which is shared by all specdevphk1999-07-201-51/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | vnodes referencing this device. Details: cdevsw->d_parms has been removed, the specinfo is available now (== dev_t) and the driver should modify it directly when applicable, and the only driver doing so, does so: vn.c. I am not sure the logic in checking for "<" was right before, and it looks even less so now. An intial pool of 50 struct specinfo are depleted during early boot, after that malloc had better work. It is likely that fewer than 50 would do. Hashing is done from udev_t to dev_t with a prime number remainder hash, experiments show no better hash available for decent cost (MD5 is only marginally better) The prime number used should not be close to a power of two, we use 83 for now. Add new checkalias2() to get around the loss of info from dev2udev() in bdevvp(); The aliased vnodes are hung on a list straight of the dev_t, and speclisth[SPECSZ] is unused. The sharing of struct specinfo means that the v_specnext moves into the vnode which grows by 4 bytes. Don't use a VBLK dev_t which doesn't make sense in MFS, now we hang a dummy cdevsw on B/Cmaj 253 so that things look sane. Storage overhead from all of this is O(50k). Bump __FreeBSD_version to 400009 The next step will add the stuff needed so device-drivers can start to hang things from struct specinfo
* [click] Now all dev_t's in the kernel have their char device major.phk1999-07-191-2/+4
| | | | | | | | | | Only know casualy of this is swapinfo/pstat which should be fixes the right way: Store the actual pathname in the kernel like mount does. [Volounteers sought for this task] The road map from here is roughly: expand struct specinfo into struct based dev_t. Add dev_t registration facilities for device drivers and start to use them.
* Introduce the vn_todev(struct vnode*) function, which returns the dev_tphk1999-07-181-1/+13
| | | | corresponding to a VBLK or VCHR node, or NODEV.
* Fix 2nd arg to udev2dev().phk1999-07-171-2/+2
|
* I have not one single time remembered the name of this function correctlyphk1999-07-171-4/+4
| | | | so obviously I gave it the wrong name. s/umakedev/makeudev/g
* Correct a couple of spelling errors in comments.kris1999-07-121-3/+3
|
* These changes appear to give us benefits with both small (32MB) andmckusick1999-07-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | large (1G) memory machine configurations. I was able to run 'dbench 32' on a 32MB system without bring the machine to a grinding halt. * buffer cache hash table now dynamically allocated. This will have no effect on memory consumption for smaller systems and will help scale the buffer cache for larger systems. * minor enhancement to pmap_clearbit(). I noticed that all the calls to it used constant arguments. Making it an inline allows the constants to propogate to deeper inlines and should produce better code. * removal of inherent vfs_ioopt support through the emplacement of appropriate #ifdef's, with John's permission. If we do not find a use for it by the end of the year we will remove it entirely. * removal of getnewbufloops* counters & sysctl's - no longer necessary for debugging, getnewbuf() is now optimal. * buffer hash table functions removed from sys/buf.h and localized to vfs_bio.c * VFS_BIO_NEED_DIRTYFLUSH flag and support code added ( bwillwrite() ), allowing processes to block when too many dirty buffers are present in the system. * removal of a softdep test in bdwrite() that is no longer necessary now that bdwrite() no longer attempts to flush dirty buffers. * slight optimization added to bqrelse() - there is no reason to test for available buffer space on B_DELWRI buffers. * addition of reverse-scanning code to vfs_bio_awrite(). vfs_bio_awrite() will attempt to locate clusterable areas in both the forward and reverse direction relative to the offset of the buffer passed to it. This will probably not make much of a difference now, but I believe we will start to rely on it heavily in the future if we decide to shift some of the burden of the clustering closer to the actual I/O initiation. * Removal of the newbufcnt and lastnewbuf counters that Kirk added. They do not fix any race conditions that haven't already been fixed by the gbincore() test done after the only call to getnewbuf(). getnewbuf() is a static, so there is no chance of it being misused by other modules. ( Unless Kirk can think of a specific thing that this code fixes. I went through it very carefully and didn't see anything ). * removal of VOP_ISLOCKED() check in flushbufqueues(). I do not think this check is necessary, the buffer should flush properly whether the vnode is locked or not. ( yes? ). * removal of extra arguments passed to getnewbuf() that are not necessary. * missed cluster_wbuild() that had to be a cluster_wbuild_wb() in vfs_cluster.c * vn_write() now calls bwillwrite() *PRIOR* to locking the vnode, which should greatly aid flushing operations in heavy load situations - both the pageout and update daemons will be able to operate more efficiently. * removal of b_usecount. We may add it back in later but for now it is useless. Prior implementations of the buffer cache never had enough buffers for it to be useful, and current implementations which make more buffers available might not benefit relative to the amount of sophistication required to implement a b_usecount. Straight LRU should work just as well, especially when most things are VMIO backed. I expect that (even though John will not like this assumption) directories will become VMIO backed some point soon. Submitted by: Matthew Dillon <dillon@backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
* The buffer queue mechanism has been reformulated. Instead of havingmckusick1999-07-041-10/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | QUEUE_AGE, QUEUE_LRU, and QUEUE_EMPTY we instead have QUEUE_CLEAN, QUEUE_DIRTY, QUEUE_EMPTY, and QUEUE_EMPTYKVA. With this patch clean and dirty buffers have been separated. Empty buffers with KVM assignments have been separated from truely empty buffers. getnewbuf() has been rewritten and now operates in a 100% optimal fashion. That is, it is able to find precisely the right kind of buffer it needs to allocate a new buffer, defragment KVM, or to free-up an existing buffer when the buffer cache is full (which is a steady-state situation for the buffer cache). Buffer flushing has been reorganized. Previously buffers were flushed in the context of whatever process hit the conditions forcing buffer flushing to occur. This resulted in processes blocking on conditions unrelated to what they were doing. This also resulted in inappropriate VFS stacking chains due to multiple processes getting stuck trying to flush dirty buffers or due to a single process getting into a situation where it might attempt to flush buffers recursively - a situation that was only partially fixed in prior commits. We have added a new daemon called the buf_daemon which is responsible for flushing dirty buffers when the number of dirty buffers exceeds the vfs.hidirtybuffers limit. This daemon attempts to dynamically adjust the rate at which dirty buffers are flushed such that getnewbuf() calls (almost) never block. The number of nbufs and amount of buffer space is now scaled past the 8MB limit that was previously imposed for systems with over 64MB of memory, and the vfs.{lo,hi}dirtybuffers limits have been relaxed somewhat. The number of physical buffers has been increased with the intention that we will manage physical I/O differently in the future. reassignbuf previously attempted to keep the dirtyblkhd list sorted which could result in non-deterministic operation under certain conditions, such as when a large number of dirty buffers are being managed. This algorithm has been changed. reassignbuf now keeps buffers locally sorted if it can do so cheaply, and otherwise gives up and adds buffers to the head of the dirtyblkhd list. The new algorithm is deterministic but not perfect. The new algorithm greatly reduces problems that previously occured when write_behind was turned off in the system. The P_FLSINPROG proc->p_flag bit has been replaced by the more descriptive P_BUFEXHAUST bit. This bit allows processes working with filesystem buffers to use available emergency reserves. Normal processes do not set this bit and are not allowed to dig into emergency reserves. The purpose of this bit is to avoid low-memory deadlocks. A small race condition was fixed in getpbuf() in vm/vm_pager.c. Submitted by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
* Make sure that stat(2) and friends always return a valid st_dev field.phk1999-07-021-4/+5
| | | | | | Pseudo-FS need not fill in the va_fsid anymore, the syscall code will use the first half of the fsid, which now looks like a udev_t with major 255.
* Slight reorganization of kernel thread/process creation. Instead of usingpeter1999-07-011-3/+4
| | | | | | | | | | | | | | | SYSINIT_KT() etc (which is a static, compile-time procedure), use a NetBSD-style kthread_create() interface. kproc_start is still available as a SYSINIT() hook. This allowed simplification of chunks of the sysinit code in the process. This kthread_create() is our old kproc_start internals, with the SYSINIT_KT fork hooks grafted in and tweaked to work the same as the NetBSD one. One thing I'd like to do shortly is get rid of nfsiod as a user initiated process. It makes sense for the nfs client code to create them on the fly as needed up to a user settable limit. This means that nfsiod doesn't need to be in /sbin and is always "available". This is a fair bit easier to do outside of the SYSINIT_KT() framework.
* Convert buffer locking from using the B_BUSY and B_WANTED flags to usingmckusick1999-06-261-28/+23
| | | | | | | lockmgr locks. This commit should be functionally equivalent to the old semantics. That is, all buffer locking is done with LK_EXCLUSIVE requests. Changes to take advantage of LK_SHARED and LK_RECURSIVE will be done in future commits.
* Add a vnode argument to VOP_BWRITE to get rid of the last vnodemckusick1999-06-161-4/+4
| | | | | operator special case. Delete special case code from vnode_if.sh, vnode_if.src, umap_vnops.c, and null_vnops.c.
* Get rid of the global variable rushjob and replace it with a function inmckusick1999-06-151-6/+36
| | | | | kern/vfs_subr.c named speedup_syncer() which handles the speedup request. Change the various clients of rushjob to use the new function.
* Simplify cdevsw registration.phk1999-05-311-9/+7
| | | | | | | | | | | | | | | | | | | | | | | | | The cdevsw_add() function now finds the major number(s) in the struct cdevsw passed to it. cdevsw_add_generic() is no longer needed, cdevsw_add() does the same thing. cdevsw_add() will print an message if the d_maj field looks bogus. Remove nblkdev and nchrdev variables. Most places they were used bogusly. Instead check a dev_t for validity by seeing if devsw() or bdevsw() returns NULL. Move bdevsw() and devsw() functions to kern/kern_conf.c Bump __FreeBSD_version to 400006 This commit removes: 72 bogus makedev() calls 26 bogus SYSINIT functions if_xe.c bogusly accessed cdevsw[], author/maintainer please fix. I4b and vinum not changed. Patches emailed to authors. LINT probably broken until they catch up.
* Remove the test for bdevsw(dev) == NULL from bdevvp() because it failsjb1999-05-241-2/+2
| | | | | | | | if there is no character device associated with the block device. In this case that doesn't matter because bdevvp() doesn't use the character device structure. I can use the pointy bit of the axe too.
* Legally acquire a major number for mfs.luoqi1999-05-141-5/+2
|
* Previously directories were sync'ed every 10 seconds while bitmaps &mckusick1999-05-141-3/+3
| | | | | | | | inodes were synced every 15 seconds. This is now reversed as during directory create, we cannot commit the directory entry until its inode has been written. With this switch, the inodes will be more likely to be written by the time that the directory is written thus reducing the number of directory rollbacks that are needed.
* Fix (?) SPECHASH dev_t/major/minor/etc argspeter1999-05-121-2/+2
|
* Don't peek into dev_tphk1999-05-121-2/+2
|
* Divorce "dev_t" from the "major|minor" bitmap, which is now calledphk1999-05-111-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | udev_t in the kernel but still called dev_t in userland. Provide functions to manipulate both types: major() umajor() minor() uminor() makedev() umakedev() dev2udev() udev2dev() For now they're functions, they will become in-line functions after one of the next two steps in this process. Return major/minor/makedev to macro-hood for userland. Register a name in cdevsw[] for the "filedescriptor" driver. In the kernel the udev_t appears in places where we have the major/minor number combination, (ie: a potential device: we may not have the driver nor the device), like in inodes, vattr, cdevsw registration and so on, whereas the dev_t appears where we carry around a reference to a actual device. In the future the cdevsw and the aliased-from vnode will be hung directly from the dev_t, along with up to two softc pointers for the device driver and a few houskeeping bits. This will essentially replace the current "alias" check code (same buck, bigger bang). A little stunt has been provided to try to catch places where the wrong type is being used (dev_t vs udev_t), if you see something not working, #undef DEVT_FASCIST in kern/kern_conf.c and see if it makes a difference. If it does, please try to track it down (many hands make light work) or at least try to reproduce it as simply as possible, and describe how to do that. Without DEVT_FASCIST I belive this patch is a no-op. Stylistic/posixoid comments about the userland view of the <sys/*.h> files welcome now, from userland they now contain the end result. Next planned step: make all dev_t's refer to the same devsw[] which means convert BLK's to CHR's at the perimeter of the vnodes and other places where they enter the game (bootdev, mknod, sysctl).
* Fix some of the places where too much inside knowledge about major/minorphk1999-05-081-3/+3
| | | | layout and dev_t structure is being (ab)used.
* I got tired of seeing all the cdevsw[major(foo)] all over the place.phk1999-05-081-7/+7
| | | | | | | | Made a new (inline) function devsw(dev_t dev) and substituted it. Changed to the BDEV variant to this format as well: bdevsw(dev_t dev) DEVFS will eventually benefit from this change too.
* Continue where Julian left off in July 1998:phk1999-05-071-5/+5
| | | | | | | | | | | | | | Virtualize bdevsw[] from cdevsw. bdevsw() is now an (inline) function. Join CDEV_MODULE and BDEV_MODULE to DEV_MODULE (please pay attention to the order of the cmaj/bmaj arguments!) Join CDEV_DRIVER_MODULE and BDEV_DRIVER_MODULE to DEV_DRIVER_MODULE (ditto!) (Next step will be to convert all bdev dev_t's to cdev dev_t's before they get to do any damage^H^H^H^H^H^Hwork in the kernel.)
* Add sysctl descriptions to many SYSCTL_XXXsbillf1999-05-031-2/+3
| | | | | | | PR: kern/11197 Submitted by: Adrian Chadd <adrian@FreeBSD.org> Reviewed by: billf(spelling/style/minor nits) Looked at by: bde(style)
* Reviewed by: Many at differnt times in differnt parts,julian1999-03-121-12/+23
| | | | | | | | | | | | | | | | | | | | | | including alan, john, me, luoqi, and kirk Submitted by: Matt Dillon <dillon@frebsd.org> This change implements a relatively sophisticated fix to getnewbuf(). There were two problems with getnewbuf(). First, the writerecursion can lead to a system stack overflow when you have NFS and/or VN devices in the system. Second, the free/dirty buffer accounting was completely broken. Not only did the nfs routines blow it trying to manually account for the buffer state, but the accounting that was done did not work well with the purpose of their existance: figuring out when getnewbuf() needs to sleep. The meat of the change is to kern/vfs_bio.c. The remaining diffs are all minor except for NFS, which includes both the fixes for bp interaction AND fixes for a 'biodone(): buffer already done' lockup. Sys/buf.h also contains a chaining structure which is not used by this patchset but is used by other patches that are coming soon. This patch deliniated by tags PRE_MAT_GETBUF and POST_MAT_GETBUF. (sorry for the missing T matt)
* Reviewed by: Julian Elischer <julian@whistle.com>dillon1999-02-251-7/+29
| | | | | | | Add d_parms() to {c,b}devsw[]. If non-NULL this function points to a device routine that will properly fill in the specinfo structure. vfs_subr.c's checkalias() supplies appropriate defaults. This change should be fully backwards compatible with existing devices.
* Protect vn worklist and vn->v_{clean,dirty}blkhd at splbio().dillon1999-02-191-9/+17
| | | | | | | Get rid of extra LIST_REMOVE() Reviewed by: hsu@FreeBSD.ORG (Jeffrey Hsu), mckusick@McKusick.COM Submitted by: hsu@FreeBSD.ORG (Jeffrey Hsu), dillon@backplane.com ( Matthew Dillon )
* vp->v_object must be valid after normal flow of vfs_object_create()dillon1999-02-041-3/+9
| | | | | | | | | | | completes, change if() to KASSERT(). This is not a bug, we are simplify clarifying and optimizing the code. In if/else in vfs_object_create(), the failure of both conditionals will lead to a NULL object. Exit gracefully if this case occurs. ( this case does not normally occur, but needed to be handled ). Obtained from: Eivind Eklund <eivind@FreeBSD.org>
* More const fixes for -Wall, -Wcast-qualdillon1999-01-291-2/+2
|
* Fix warnings in preparation for adding -Wall -Wcast-qual to thedillon1999-01-281-2/+2
| | | | kernel compile
* This is a rather large commit that encompasses the new swapper,dillon1999-01-211-1/+45
| | | | | | | | | | changes to the VM system to support the new swapper, VM bug fixes, several VM optimizations, and some additional revamping of the VM code. The specific bug fixes will be documented with additional forced commits. This commit is somewhat rough in regards to code cleanup issues. Reviewed by: "John S. Dyson" <root@dyson.iquest.net>, "David Greenman" <dg@root.com>
* KNFize, by bde.eivind1999-01-101-2/+4
|
* Split DIAGNOSTIC -> DIAGNOSTIC, INVARIANTS, and INVARIANT_SUPPORT aseivind1999-01-081-30/+11
| | | | | | | | | discussed on -hackers. Introduce 'KASSERT(assertion, ("panic message", args))' for simple check + panic. Reviewed by: msmith
* Remove the 'waslocked' parameter to vfs_object_create().eivind1999-01-051-19/+6
|
* Finish staticization.eivind1999-01-051-5/+5
|
* Ifdefed conditionally used simplock variables.bde1999-01-021-2/+4
|
* Restored rev.1.31 which was clobbered by rev.1.69 (the big Lite2bde1998-12-241-2/+3
| | | | | | | | | merge). This fixes at least hanging in revoke(2) when a somewhat active slave pty is revoked. The hang made the window for the null pointer bug in ufsspec_{read,write} much larger. There are many other bugs in this area (revoke of an active fifo at best leaks memory...).
* Check return value of tsleep(). I've checked of all call points -eivind1998-12-221-4/+7
| | | | | | | | there does not seem to be a problem with this. PR: kern/8732 Analysis by: David G Andersen <danderse@cs.utah.edu> Tested by: Alfred Perlstein <bright@hotjobs.com>
* Staticize.eivind1998-12-211-9/+9
|
* Examine all occurrences of sprintf(), strcat(), and str[n]cpy()archie1998-12-041-2/+2
| | | | | | | | | | | | | | for possible buffer overflow problems. Replaced most sprintf()'s with snprintf(); for others cases, added terminating NUL bytes where appropriate, replaced constants like "16" with sizeof(), etc. These changes include several bug fixes, but most changes are for maintainability's sake. Any instance where it wasn't "immediately obvious" that a buffer overflow could not occur was made safer. Reviewed by: Bruce Evans <bde@zeta.org.au> Reviewed by: Matthew Dillon <dillon@apollo.backplane.com> Reviewed by: Mike Spengler <mks@networkcs.com>
* Convert lists for bufs attached to vnodes from a LIST to a TAILQ.peter1998-10-311-51/+58
| | | | | | | | | | | | | | | | | | | | | | - Use TAILQ_* macros extensively instead of internal names - use b_xflags instead of the NOLIST magic number hack in the next pointer - clean bufs are inserted at the tail rather than the head. - redo dirty buffer insert so that metadata (negative lbn) goes to the tail directly rather than at the HEAD. This makes a difference when inserting dirty data blocks in lbn sorted order since data block insertion will not have to bypass all the metadata cruft. data is lbn sorted since it makes sense for clustering and writeback ordering, while metadata sorting doesn't help much since the lbn's are meaningless when walking the list for writebacks. Small systems will not notice much (if any) benefit from this, but really busy systems with large dirty block lists should get a lot more. I've tested this with softdep, and it doesn't seem to mind the change of queueing of metadata. Reviewed (in princible) by: dg Obtained from: partly from John Dyson's work-in-progress patches in June.
* The last argument to vm_object_page_clean() are now bit flags, rather thanpeter1998-10-311-2/+2
| | | | | | | | | | | | the old true/false. While here, have vfs_msync() only call vm_object_page_clean() with OBJPC_SYNC if called with MNT_WAIT flags. vfs_msync() is called at unmount time (with MNT_WAIT) and from the syncer process (formerly update). This should make dirty mmap writebacks a little less nasty. I have tested this a little with SOFTUPDATES enabled, but I don't normally use it since I've been badly burned too many times.
* Oops, rev.1.167 made the device number checking in bdevvp() too strictbde1998-10-291-3/+4
| | | | for mfs root mounts. Don't require major 255 to be in bdevsw[].
* Remove the V_SAVEMETA flag, nothing uses it any more now that msdosfs andpeter1998-10-291-18/+7
| | | | | ext2fs call vtruncbuf() directly. This simplifies and cleans up vinvalbuf() a little.
* Updated the major number check in vfs_object_create(). It's notbde1998-10-261-2/+3
| | | | | | clear if the check is necessary, but vfs_object_create() is called for all vnodes and it was silly to create objects for VBLK vnodes that don't even have a driver.
* Nitpicking and dusting performed on a train. Removes trivial warningsphk1998-10-251-13/+5
| | | | about unused variables, labels and other lint.
OpenPOWER on IntegriCloud