summaryrefslogtreecommitdiffstats
path: root/sys/ufs/ffs/ffs_balloc.c
Commit message (Collapse)AuthorAgeFilesLines
* This update is a performance improvement when allocating blocks onmckusick2002-10-221-0/+12
| | | | | | | | | | | | | a full filesystem. Previously, if the allocation failed, we had to fsync the file before rolling back any partial allocation of indirect blocks. Most block allocation requests only need to allocate a single data block and if that allocation fails, there is nothing to unroll. So, before doing the fsync, we check to see if any rollback will really be necessary. If none is necessary, then we simply return. This update eliminates the flurry of disk activity that got triggered whenever a filesystem would run out of space. Sponsored by: DARPA & NAI Labs.
* Fix a file-rewrite performance case for UFS[2]. When rewriting portionsdillon2002-10-181-2/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of a file in chunks that are less then the filesystem block size, if the data is not already cached the system will perform a read-before-write. The problem is that it does this on a block-by-block basis, breaking up the I/Os and making clustering impossible for the writes. Programs such as INN using cyclic file buffers suffer greatly. This problem is only going to get worse as we use larger and larger filesystem block sizes. The solution is to extend the sequential heuristic so UFS[2] can perform a far larger read and readahead when dealing with this case. (note: maximum disk write bandwidth is 27MB/sec thru filesystem) (note: filesystem blocksize in test is 8K (1K frag)) dd if=/dev/zero of=test.dat bs=1k count=2m conv=notrunc Before: (note half of these are reads) tty da0 da1 acd0 cpu tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id 0 76 14.21 598 8.30 0.00 0 0.00 0.00 0 0.00 0 0 7 1 92 0 76 14.09 813 11.19 0.00 0 0.00 0.00 0 0.00 0 0 9 5 86 0 76 14.28 821 11.45 0.00 0 0.00 0.00 0 0.00 0 0 8 1 91 After: (note half of these are reads) tty da0 da1 acd0 cpu tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy in id 0 76 63.62 434 26.99 0.00 0 0.00 0.00 0 0.00 0 0 18 1 80 0 76 63.58 424 26.30 0.00 0 0.00 0.00 0 0.00 0 0 17 2 82 0 76 63.82 438 27.32 0.00 0 0.00 0.00 0 0.00 1 0 19 2 79 Reviewed by: mckusick Approved by: re X-MFC after: immediately (was heavily tested in -stable for 4 months)
* Add support to UFS2 to provide storage for extended attributes.mckusick2002-07-191-42/+148
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags |= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, lets call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended atribute storage. Sponsored by: DARPA & NAI Labs.
* Rename the BALLOC flags from B_* to BA_* to avoid confusion with thedillon2002-06-231-20/+20
| | | | | | struct buf B_ flags. Approved by: mckusick
* This commit adds basic support for the UFS2 filesystem. The UFS2mckusick2002-06-211-43/+399
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | filesystem expands the inode to 256 bytes to make space for 64-bit block pointers. It also adds a file-creation time field, an ability to use jumbo blocks per inode to allow extent like pointer density, and space for extended attributes (up to twice the filesystem block size worth of attributes, e.g., on a 16K filesystem, there is space for 32K of attributes). UFS2 fully supports and runs existing UFS1 filesystems. New filesystems built using newfs can be built in either UFS1 or UFS2 format using the -O option. In this commit UFS1 is the default format, so if you want to build UFS2 format filesystems, you must specify -O 2. This default will be changed to UFS2 when UFS2 proves itself to be stable. In this commit the boot code for reading UFS2 filesystems is not compiled (see /sys/boot/common/ufsread.c) as there is insufficient space in the boot block. Once the size of the boot block is increased, this code can be defined. Things to note: the definition of SBSIZE has changed to SBLOCKSIZE. The header file <ufs/ufs/dinode.h> must be included before <ufs/ffs/fs.h> so as to get the definitions of ufs2_daddr_t and ufs_lbn_t. Still TODO: Verify that the first level bootstraps work for all the architectures. Convert the utility ffsinfo to understand UFS2 and test growfs. Add support for the extended attribute storage. Update soft updates to ensure integrity of extended attribute storage. Switch the current extended attribute interfaces to use the extended attribute storage. Add the extent like functionality (framework is there, but is currently never used). Sponsored by: DARPA & NAI Labs. Reviewed by: Poul-Henning Kamp <phk@freebsd.org>
* More s/file system/filesystem/gtrhodes2002-05-161-1/+1
|
* When taking a snapshot, we must check for active files that havemckusick2002-02-021-1/+1
| | | | | | been unlinked (e.g., with a zero link count). We have to expunge all trace of these files from the snapshot so that they are neither reclaimed prematurely by fsck nor saved unnecessarily by dump.
* KSE Milestone 2julian2001-09-121-3/+3
| | | | | | | | | | | | | | Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to teh previousl -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha
* VOP_BALLOC was never really a VOP in the first place, so convert itphk2001-04-291-20/+13
| | | | to UFS_BALLOC like the other "between UFS and FFS function interfaces".
* Revert consequences of changes to mount.h, part 2.grog2001-04-291-2/+0
| | | | Requested by: bde
* Correct #includes to work with fixed sys/mount.h.grog2001-04-231-0/+2
|
* Add snapshots to the fast filesystem. Most of the changes supportmckusick2000-07-111-0/+9
| | | | | | | | | | | | | | | | | | | | the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timelyness is not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).
* Separate the struct bio related stuff out of <sys/buf.h> intophk2000-05-051-0/+1
| | | | | | | | | | | | | | | <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bdes teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter
* Fix a 'freeing free block' panic in UFS. The problem occurs when thedillon2000-02-241-1/+24
| | | | | | | | | | | | | | filesystem fills up. If the first indirect block exists and FFS is able to allocate deeper indirect blocks, but is not able to allocate the data block, FFS improperly unwinds the indirect blocks and leaves a block pointer hanging to a freed block. This will cause a panic later when the file is removed. The solution is to properly account for the first block-pointer-to-an-indirect-block we had to create in a balloc operation and then unwind it if a failure occurs. Detective work by: Ian Dowse <iedowse@maths.tcd.ie> Reviewed by: mckusick, Ian Dowse <iedowse@maths.tcd.ie> Approved by: jkh
* The only known cause of this panic is running out of disk space.mckusick2000-01-111-3/+13
| | | | | | | | | | | | | | | | | | | | | | The problem occurs when an indirect block and a data block are being allocated at the same time. For example when the 13th block of the file is written, the filesystem needs to allocate the first indirect block and a data block. If the indirect block allocation succeeds, but the data block allocation fails, the error code dellocates the indirect block as it has nothing at which to point. Unfortunately, it does not deallocate the indirect block's associated dependencies which then fail when they find the block unexpectedly gone (ptr == 0 instead of its expected value). The fix is to fsync the file before doing the block rollback, as the fsync will flush out all of the dependencies. Once the rollback is done the file must be fsync'ed again so that the soft updates code does not find unexpected changes. This approach is much slower than writing the code to back out the extraneous dependencies, but running out of disk space is not expected to be a common occurence, so just getting it right is the main criterion. PR: kern/15063 Submitted by: Assar Westerlund <assar@stacken.kth.se>
* $Id$ -> $FreeBSD$peter1999-08-281-1/+1
|
* Create the macro DOINGASYNC to check whether the MNT_ASYNC flag hasmckusick1999-07-131-2/+4
| | | | | | | | | been set for a mount point. Insert missing checks to ensure that all write operations are done asynchronously when the MNT_ASYNC option has been requested. Submitted by: Craig A Soules <soules+@andrew.cmu.edu> Reviewed by: Kirk McKusick <mckusick@mckusick.com>
* Fix warnings in preparation for adding -Wall -Wcast-qual to thedillon1999-01-281-8/+8
| | | | kernel compile
* Don't dereference an uninitialized pointer in dead code. The deadbde1998-09-121-2/+2
| | | | code gets executed if it is compiled without optimization.
* Reviewed by: dyson@freebsd.org (john Dyson), dg@root.com (david greenman)julian1998-03-081-25/+67
| | | | | Submitted by: Kirk McKusick (mcKusick@mckusick.com) Obtained from: WHistle development tree
* Back out DIAGNOSTIC changes.eivind1998-02-061-3/+1
|
* Turn DIAGNOSTIC into a new-style option.eivind1998-02-041-1/+3
|
* Make our v_usecount vnode reference count work identically to thedyson1998-01-061-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | original BSD code. The association between the vnode and the vm_object no longer includes reference counts. The major difference is that vm_object's are no longer freed gratuitiously from the vnode, and so once an object is created for the vnode, it will last as long as the vnode does. When a vnode object reference count is incremented, then the underlying vnode reference count is incremented also. The two "objects" are now more intimately related, and so the interactions are now much less complex. When vnodes are now normally placed onto the free queue with an object still attached. The rundown of the object happens at vnode rundown time, and happens with exactly the same filesystem semantics of the original VFS code. There is absolutely no need for vnode_pager_uncache and other travesties like that anymore. A side-effect of these changes is that SMP locking should be much simpler, the I/O copyin/copyout optimizations work, NFS should be more ponderable, and further work on layered filesystems should be less frustrating, because of the totally coherent management of the vnode objects and vnodes. Please be careful with your system while running this code, but I would greatly appreciate feedback as soon a reasonably possible.
* Don't include <sys/lock.h> in headers when only `struct simplelock' isbde1997-12-051-1/+2
| | | | required. Fixed everything that depended on the pollution.
* Removed unused #includes.bde1997-08-021-5/+1
|
* Use smalllblktosize() instead of multiplying small block numbersbde1997-04-101-4/+4
| | | | | by fs->fs_bsize. The macro is usually faster and makes it clearer that the multiplication can't overflow.
* Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are notpeter1997-02-221-1/+1
| | | | ready for it yet.
* This is the kernel Lite/2 commit. There are some requisite userlanddyson1997-02-101-48/+68
| | | | | | | | | | | | | | | changes, so don't expect to be able to run the kernel as-is (very well) without the appropriate Lite/2 userland changes. The system boots and can mount UFS filesystems. Untested: ext2fs, msdosfs, NFS Known problems: Incorrect Berkeley ID strings in some files. Mount_std mounts will not work until the getfsent library routine is changed. Reviewed by: various people Submitted by: Jeffery Hsu <hsu@freebsd.org>
* Make the long-awaited change from $Id$ to $FreeBSD$jkh1997-01-141-1/+1
| | | | | | | | This will make a number of things easier in the future, as well as (finally!) avoiding the Id-smashing problem which has plagued developers for so long. Boy, I'm glad we're not using sup anymore. This update would have been insane otherwise.
* Eliminated many redundant vm_map_lookup operations for vm_mmap.dyson1996-01-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish overhead for merged cache. Efficiency improvement for vfs_cluster. It used to do alot of redundant calls to cluster_rbuild. Correct the ordering for vrele of .text and release of credentials. Use the selective tlb update for 486/586/P6. Numerous fixes to the size of objects allocated for files. Additionally, fixes in the various pagers. Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs. Fixes in the swap pager for exhausted resources. The pageout code will not as readily thrash. Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE), thereby improving efficiency of several routines. Eliminate even more unnecessary vm_page_protect operations. Significantly speed up process forks. Make vm_object_page_clean more efficient, thereby eliminating the pause that happens every 30seconds. Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the case of filesystems mounted async. Fix a panic with busy pages when write clustering is done for non-VMIO buffers.
* Remove trailing whitespace.rgrimes1995-05-301-2/+2
|
* Kill bogus vnode_pager_setsize(). It was being called at the wrong timedg1995-05-281-2/+1
| | | | | | | and resulted in the object size being too small. This caused bad things to happen later when the file was mapped. Reviewed by: John Dyson
* Changes from John Dyson and myself:dg1995-04-091-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixed remaining known bugs in the buffer IO and VM system. vfs_bio.c: Fixed some race conditions and locking bugs. Improved performance by removing some (now) unnecessary code and fixing some broken logic. Fixed process accounting of # of FS outputs. Properly handle NFS interrupts (B_EINTR). (various) Replaced calls to clrbuf() with calls to an optimized routine called vfs_bio_clrbuf(). (various FS sync) Sync out modified vnode_pager backed pages. ffs_vnops.c: Do two passes: Sync out file data first, then indirect blocks. vm_fault.c: Fixed deadly embrace caused by acquiring locks in the wrong order. vnode_pager.c: Changed to use buffer I/O system for writing out modified pages. This should fix the problem with the modification date previous not getting updated. Also dramatically simplifies the code. Note that this is going to change in the future and be implemented via VOP_PUTPAGES(). vm_object.c: Fixed a pile of bugs related to cleaning (vnode) objects. The performance of vm_object_page_clean() is terrible when dealing with huge objects, but this will change when we implement a binary tree to keep the object pages sorted. vm_pageout.c: Fixed broken clustering of pageouts. Fixed race conditions and other lockup style bugs in the scanning of pages. Improved performance.
* Removed redundant newlines that were in some panic strings.dg1995-03-191-2/+2
|
* Fixes from John Dyson to work around vnode lock hang. Basically, removedg1995-03-031-1/+3
| | | | | | the VOP_BMAP calls, and add one to bdwrite. Submitted by: John Dyson
* Cosmetics. make gcc less noisy. Still some way to go here.phk1994-10-101-10/+16
|
* Added $Id$dg1994-08-021-0/+1
|
* The big 4.4BSD Lite to FreeBSD 2.0.0 (Development) patch.rgrimes1994-05-251-0/+1
| | | | | Reviewed by: Rodney W. Grimes Submitted by: John Dyson and David Greenman
* BSD 4.4 Lite Kernel Sourcesrgrimes1994-05-241-0/+282
OpenPOWER on IntegriCloud