path: root/sys/ufs
Commit message | Author | Age | Files | Lines
* Remove M_TRYWAIT/M_WAITOK/M_WAIT. Callers should use 0.alfred2003-01-219-29/+29
| | | | Merge M_NOWAIT/M_DONTWAIT into a single flag M_NOWAIT.
* Bow to the whining masses and change a union back into void *. Retaindillon2003-01-131-1/+1
| | | | | removal of unnecessary casts and throw in some minor cleanups to see if anyone complains, just for the hell of it.
* Change struct file f_data to un_data, a union of the correct structdillon2003-01-121-1/+1
| | | | | | | | | | pointer types, and remove a huge number of casts from code using it. Change struct xfile xf_data to xun_data (ABI is still compatible). If we need to add a #define for f_data and xf_data we can, but I don't think it will be necessary. There are no operational changes in this commit.
* o Improve wording of the comment that accompanies fs_pad. Themarcel2003-01-101-1/+6
| | | | | | | | | | | padding is not specific to non-i386 architectures. It is caused by non-i386 specific alignment requirements of fs_swuid. o Add a CTASSERT to catch a change in the size of struct fs at compile-time rather than run-time. Ok'd: gordon Tested on: i386 ia64
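A compile-time size check of the kind described above can be sketched in portable C11. This is only an illustration: `struct demo_sb`, its fields, and `DEMO_SB_SIZE` are hypothetical stand-ins, not the real `struct fs` layout, and `static_assert` is used in place of the kernel's CTASSERT macro.

```c
#include <assert.h>   /* static_assert (C11) and assert */
#include <stddef.h>   /* size_t, offsetof */
#include <stdint.h>

/* Hypothetical on-disk record; NOT the real struct fs layout. */
struct demo_sb {
    int32_t  sb_magic;
    int32_t  sb_pad;          /* explicit padding keeps sb_swuid 8-byte aligned */
    uint64_t sb_swuid;        /* 64-bit field with stricter alignment */
    char     sb_volname[32];
};

/* Assumed on-disk size for this sketch: 4 + 4 + 8 + 32 bytes. */
#define DEMO_SB_SIZE 48

/* The build fails here if someone changes the layout by accident. */
static_assert(sizeof(struct demo_sb) == DEMO_SB_SIZE,
    "demo_sb no longer matches its on-disk size");
static_assert(offsetof(struct demo_sb, sb_swuid) % 8 == 0,
    "sb_swuid must be 8-byte aligned");

/* Runtime accessor so the size can also be inspected in a test. */
size_t demo_sb_size(void)
{
    return sizeof(struct demo_sb);
}
```

Making the padding explicit, rather than leaving it to the compiler, is what keeps the layout identical on platforms with different alignment rules.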
* Fix superblock alignment problems on non-i386 platforms. Also change fs_uuidgordon2003-01-091-2/+3
| | | | | | | | to fs_swuid, making it more descriptive. Submitted by: marcel Reviewed by: peter Pointy hat to: gordon
* Steal some space from fs_fsmnt to create fs_volname and fs_uuid. The volnamegordon2003-01-081-1/+9
| | | | | | | | will be used to support volume names with the help of a GEOM module (to be committed). uuid will be used to deal with conflicting volume names (which doesn't work just yet). Approved by: mckusick@
* This patch fixes a problem caused by applications that rapidly andmckusick2003-01-072-5/+13
| | | | | | | | | | | | | | | | | | repeatedly truncate the same file. Each time the file is truncated, a buffer is grabbed to store the indirect block numbers that need to be freed. Those blocks cannot be freed until the inode claiming them is written to disk. Thus, the number of buffers being held by soft updates explodes and in extreme cases can run the kernel out of buffers. The problem can be avoided by doing an fsync on the file every debug.maxindirdep truncates (currently defaulted to 50). The fsync causes the inode to be written so that the held buffers can be freed. The check for excessive buffers is performed as part of the existing hook for excessive dependencies (softdep_slowdown) in the truncate code. Reported by: David Schultz <dschultz@uclink.Berkeley.EDU> Sponsored by: DARPA & NAI Labs. MFC after: 3 weeks
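The throttling idea in this commit can be sketched as a counter that forces a flush once too many buffers are pinned. This is a simplified model, not the soft updates code: `struct demo_file`, `demo_fsync()`, `demo_truncate()`, and `DEMO_MAXINDIRDEP` are hypothetical names, and the threshold of 50 is taken from the default quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the debug.maxindirdep default mentioned in the commit. */
#define DEMO_MAXINDIRDEP 50

/* Hypothetical per-file state for this sketch. */
struct demo_file {
    int held_bufs;   /* buffers pinned by truncates not yet flushed */
    int fsyncs;      /* number of forced flushes so far */
};

/* Model of fsync: writing the inode lets the held buffers be freed. */
static void demo_fsync(struct demo_file *fp)
{
    fp->held_bufs = 0;
    fp->fsyncs++;
}

/* Each truncate pins one buffer; flush once the threshold is reached. */
void demo_truncate(struct demo_file *fp)
{
    assert(fp != NULL);
    fp->held_bufs++;
    if (fp->held_bufs >= DEMO_MAXINDIRDEP)
        demo_fsync(fp);
}
```

With this model, 120 back-to-back truncates trigger two flushes and leave 20 buffers held, so the number of pinned buffers stays bounded by the threshold.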
* Temporarily introduce a new VOP_SPECSTRATEGY operation while I tryphk2003-01-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | to sort out disk-io from file-io in the vm/buffer/filesystem space. The intent is to sort VOP_STRATEGY calls into those which operate on "real" vnodes and those which operate on VCHR vnodes. For the latter kind, the call will be changed to VOP_SPECSTRATEGY, possibly conditionally for those places where dual-use happens. Add a default VOP_SPECSTRATEGY method which will call the normal VOP_STRATEGY. First time it is called it will print debugging information. This will only happen if a normal vnode is passed to VOP_SPECSTRATEGY by mistake. Add a real VOP_SPECSTRATEGY in specfs, which does what VOP_STRATEGY does on a VCHR vnode today. Add a new VOP_STRATEGY method in specfs to catch instances where the conversion to VOP_SPECSTRATEGY has not yet happened. Handle the request just like we always did, but first time called print debugging information. Apart from up to two instances of console messages per boot, this amounts to a glorified no-op commit. If you get any of the messages on your console I would very much like a copy of them mailed to phk@freebsd.org
* Since Jeffr made the std* functions the default in rev 1.63 ofphk2003-01-041-9/+0
| | | | | | | kern/vfs_defaults.c it is wrong for the individual filesystems to use the std* functions as that prevents override of the default. Found by: src/tools/tools/vop_table
* Convert calls to BUF_STRATEGY to VOP_STRATEGY calls. This is a no-op sincephk2003-01-032-2/+2
| | | | all BUF_STRATEGY did in the first place was call VOP_STRATEGY.
* Correct typos, mostly s/ a / an / where appropriate. Some whitespace cleanup,schweikh2003-01-012-2/+2
| | | | especially in troff files.
* When compiling the kernel do not implicitly include filedesc.h from proc.h,alfred2003-01-011-0/+1
| | | | | | this was causing filedesc work to be very painful. In order to make this work split out sigio definitions to their own header (sigio.h) which is included from proc.h for the time being.
* Use three UMA zones for FFS/UFS inodes instead of malloc space.phk2002-12-271-11/+20
| | | | | Since inodes are currently 144 bytes, this will save 112 bytes per inode. This can amount to up to 10MByte on large systems.
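The 112-byte figure is consistent with an allocator that rounds each request up to the next power of two: 144 bytes lands in a 256-byte bucket, wasting 256 - 144 = 112 bytes. The sketch below works that arithmetic through; the power-of-two bucketing and the 16-byte minimum bucket are assumptions made for illustration, and `demo_bucket_size()` and `demo_bucket_waste()` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Round a request up to the next power-of-two bucket (assumed policy). */
size_t demo_bucket_size(size_t request)
{
    size_t bucket = 16;   /* assumed smallest bucket */

    assert(request > 0);
    while (bucket < request)
        bucket <<= 1;
    return bucket;
}

/* Bytes lost to rounding for a single allocation of this size. */
size_t demo_bucket_waste(size_t request)
{
    return demo_bucket_size(request) - request;
}
```

At 112 bytes saved per inode, the 10 MByte quoted in the commit corresponds to on the order of 90,000 in-core inodes.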
* Move the allocation of the inode contents into ffs_vfsops.c rather thanphk2002-12-273-12/+10
| | | | passing malloc types around.
* Make ffs_mountfs() static.phk2002-12-274-17/+24
| | | | | | | | | Remove the malloctype from the ufs mount structure, instead add a callback to the storage method for freeing inodes: UFS_IFREE(). Add vfs_ifree() method function which frees an inode. Unvariablelize the malloc type used for allocating inodes.
* Fix corruption introduced in previous delta.mckusick2002-12-181-4/+12
| | | | | Reported by: Aurelien Nephtali <aurelien.nephtali@wanadoo.fr> Sponsored by: DARPA & NAI Labs.
* Keep comments consistent with the code. Minor optimization.mckusick2002-12-181-14/+4
| | | | Sponsored by: DARPA & NAI Labs.
* Cosmetic cleanup of unsigned buglets.mckusick2002-12-181-5/+5
| | | | | Submitted by: Bruce Evans <bde@zeta.org.au> Sponsored by: DARPA & NAI Labs.
* Remove unused lockcnt variable.phk2002-12-171-3/+0
| | | | Approved by: mckusick
* Update to previous change (1.54) to use an appropriately wide inode fieldmckusick2002-12-152-9/+12
| | | | | | | | so as to work correctly on 64-bit platforms. Reported-by: Jake Burkholder <jake@locore.ca> Sponsored by: DARPA & NAI Labs. Approved by: Ian Dowse <iedowse@maths.tcd.ie>
* Undo the adjustment of the total memory used by dirhash in the caseiedowse2002-12-141-6/+10
| | | | | | | where allocating the dirhash structure fails. Fix a few typos in comments and update copyright. MFC after: 1 week
* Only the most recent snapshot contains the complete list of blocksmckusick2002-12-142-140/+188
| | | | | | | | | | | that were copied in all of the earlier snapshots, thus its precomputed list must be used in the copyonwrite test. Using incomplete lists may lead to deadlock. Also do not include the blocks used for the indirect pointers in the indirect pointers as this may lead to inconsistent snapshots. Sponsored by: DARPA & NAI Labs. Approved by: re
* Remove the comment about dump(8) not working properly with snapshots.trhodes2002-12-121-3/+1
| | | | | Discussed with: mckusick Approved by: re (rwatson)
* More tightly verify the preference returned for the new inode.mckusick2002-12-061-1/+1
| | | | | | Submitted by: Kris Kennaway <kris@obsecurity.org> Sponsored by: DARPA & NAI Labs. Approved by: re
* Have to use bread() rather than UFS_BALLOC() when obtaining amckusick2002-12-031-24/+30
| | | | | | | | | | | previously allocated block as the previous use of the block may have fallen out of the cache. Failure to reread its contents causes zeroed results to be written instead of the proper contents. Conversely, when the block is going to be entirely filled in, it is not necessary to reread the old contents. Sponsored by: DARPA & NAI Labs. Approved by: re
* Add a check to disable the previous patch so that future filesystemsmckusick2002-11-301-2/+4
| | | | | | | that choose to place their superblocks in non-standard locations will not get them smashed. Sponsored by: DARPA & NAI Labs.
* Remove a race condition / deadlock from snapshots. Whenmckusick2002-11-301-54/+112
| | | | | | | | | | | | converting from individual vnode locks to the snapshot lock, be sure to pass any waiting processes along to the new lock as well. This transfer is done by a new function in the lock manager, transferlockers(from_lock, to_lock); Thanks to Lamont Granquist <lamont@scriptkiddie.org> for his help in pounding on snapshots beyond all reason and finding this deadlock. Sponsored by: DARPA & NAI Labs.
* Fix two deadlocks in snapshots:mckusick2002-11-301-2/+7
| | | | | | | | | | | | | | | | 1) Release the snapshot file lock while suspending the system. Otherwise a process trying to read the lock may block on its containing directory preventing the suspension from completing. Thanks to Sean Kelly <smkelly@zombie.org> for finding this deadlock. 2) Replace some bdwrite's with bawrite's so as not to fill all the buffers with dirty data. The buffers could not be cleaned as the snapshot vnode was locked hence the system could deadlock when making snapshots of really massive filesystems. Thanks to Hidetoshi Shimokawa <simokawa@sat.t.u-tokyo.ac.jp> for figuring this out. Sponsored by: DARPA & NAI Labs.
* Check to make sure that the fs_sblockloc field was properly updatedmckusick2002-11-291-0/+10
| | | | | | | | | | before using it to write the superblock. This is to guard against accidentally trashing the disklabel if the superblock format missed being upgraded by the new kernel. Reported by: Sam Leffler <sam@errno.com> Sponsored by: DARPA & NAI Labs. Approved by: Murray Stokely <murray@FreeBSD.org>
* Create a new 32-bit fs_flags word in the superblock. Add code to movemckusick2002-11-273-15/+24
| | | | | | | | | | | | | | | | | the old 8-bit fs_old_flags to the new location the first time that the filesystem is mounted by a new kernel. One of the unused flags in fs_old_flags is used to indicate that the flags have been moved. Leave the fs_old_flags word intact so that it will work properly if used on an old kernel. Change the fs_sblockloc superblock location field to be in units of bytes instead of in units of filesystem fragments. The old units did not work properly when the fragment size exceeded the superblock size (8192). Update old fs_sblockloc values at the same time that the flags are moved. Suggested by: BOUWSMA Barry <freebsd-misuser@netscum.dyndns.dk> Sponsored by: DARPA & NAI Labs.
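The one-time migration described here can be sketched as an idempotent upgrade step guarded by a marker bit. Everything in the block is hypothetical: `struct demo_sb`, `DEMO_FLAGS_UPDATED`, and `demo_upgrade_sb()` are illustrative names, and converting the location by multiplying by the fragment size is an assumption about how a fragment count maps to bytes, not a copy of the committed code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker bit taken from the unused old flag bits. */
#define DEMO_FLAGS_UPDATED 0x80

/* Hypothetical superblock subset for this sketch. */
struct demo_sb {
    uint8_t  old_flags;   /* legacy 8-bit flags, left in place */
    uint32_t flags;       /* new 32-bit flags word */
    int64_t  sblockloc;   /* fragments before upgrade, bytes after */
    int32_t  fsize;       /* fragment size in bytes */
};

/* Run at mount time; does nothing if the upgrade already happened. */
void demo_upgrade_sb(struct demo_sb *sb)
{
    assert(sb != NULL);
    if (sb->old_flags & DEMO_FLAGS_UPDATED)
        return;

    sb->flags = sb->old_flags;             /* copy the legacy flags */
    sb->sblockloc *= sb->fsize;            /* fragments -> bytes (assumed) */
    sb->old_flags |= DEMO_FLAGS_UPDATED;   /* remember that we did it */
}
```

Because the old word keeps its original bits and only gains the marker, an old kernel can still read its flags, while a new kernel sees the marker and never converts the location twice.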
* The target for the maximum number of dependencies has been cutmckusick2002-11-201-1/+1
| | | | | | | | | in half because of reports that under heavy load the kernel could exhaust its memory pool. The limit is now (desiredvnodes * 4) rather than (desiredvnodes * 8), so it will still scale with larger systems, just not as quickly. Sponsored by: DARPA & NAI Labs.
* If an error occurs while writing a buffer, then the data willmckusick2002-11-201-0/+6
| | | | | | | | | | not have hit the disk and the dependencies cannot be unrolled. In this case, the system will mark the buffer as dirty again so that the write can be retried in the future. When the write succeeds or the system gives up on the buffer and marks it as invalid (B_INVAL), the dependencies will be cleared. Sponsored by: DARPA & NAI Labs.
* Do not assume that time_t is an int.peter2002-11-151-2/+2
| | | | Approved by: re (jhb)
* Print daddr_t's with %j and intmax_t.jhb2002-11-081-4/+5
|
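The `%j` idiom named in this commit is standard C99: cast the value to `intmax_t` and print it with `%jd`, so the format stays correct whatever the underlying width of the type. In the sketch below, `demo_daddr_t` and `demo_format_blkno()` are hypothetical; the real `daddr_t` is defined by the system headers.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for daddr_t; assumed to be 64 bits wide here. */
typedef int64_t demo_daddr_t;

/*
 * Format a block number without assuming its width: the cast to
 * intmax_t always matches the %jd conversion.
 */
int demo_format_blkno(char *buf, size_t len, demo_daddr_t blkno)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "blkno %jd", (intmax_t)blkno);
}
```

A block number such as 5000000000 does not fit in 32 bits, which is exactly the case where a `%d` or `%ld` format would print the wrong value on some platforms.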
* Update licenses and wording: NAI has authorized the removal of clause threerwatson2002-11-041-7/+4
| | | | | of their BSD-style license; also, carry out the NAI Labs -> Network Associates Laboratories renaming in these files.
* Implement the new 1003.1-2001 pathconf() keys, including the Advisorywollman2002-10-271-11/+47
| | | | | | | Information option. Other filesystem implementations should do something similar. With advice from: mckusick, phk
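From userland, these keys are queried with the POSIX `pathconf()` call. A minimal sketch follows; `demo_path_limit()` and `DEMO_LIMIT_ERROR` are hypothetical names, while `pathconf()`, `errno`, and `_PC_NAME_MAX` are standard. The wrapper exists because `pathconf()` returns -1 both on error and when an option has no fixed limit, and only `errno` tells the two apart.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical sentinel for "the query itself failed". */
#define DEMO_LIMIT_ERROR (-2L)

/*
 * Query one pathconf() key. Returns the limit, -1 when the call
 * reports no fixed limit, or DEMO_LIMIT_ERROR when the call failed.
 */
long demo_path_limit(const char *path, int name)
{
    long value;

    assert(path != NULL);
    errno = 0;                      /* must be cleared before the call */
    value = pathconf(path, name);
    if (value == -1 && errno != 0)
        return DEMO_LIMIT_ERROR;
    return value;
}
```

The same wrapper can be pointed at the Advisory Information keys from 1003.1-2001, such as `_PC_ALLOC_SIZE_MIN` or `_PC_REC_XFER_ALIGN`, on systems whose headers define them.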
* Slightly change the semantics of vnode labels for MAC: rather thanrwatson2002-10-262-15/+30
| | | | | | | | | | | | | | | | | | | | | "refreshing" the label on the vnode before use, just get the label right from inception. For single-label file systems, set the label in the generic VFS getnewvnode() code; for multi-label file systems, leave the labeling up to the file system. With UFS1/2, this means reading the extended attribute during vfs_vget() as the inode is pulled off disk, rather than hitting the extended attributes frequently during operations later, improving performance. This also corrects semantics for shared vnode locks, which were not previously present in the system. This changes the cache coherency properties WRT out-of-band access to label data, but in an acceptable form. With UFS1, there is a small race condition during automatic extended attribute start -- this is not present with UFS2, and occurs because EAs aren't available at vnode inception. We'll introduce a work around for this shortly. Approved by: re Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* Within ufs, the ffs_sync and ffs_fsync functions did not alwaysmckusick2002-10-252-4/+11
| | | | | | | | | | | | check for and/or report I/O errors. The result is that a VFS_SYNC or VOP_FSYNC called with MNT_WAIT could loop infinitely on ufs in the presence of a hard error writing a disk sector or in a filesystem full condition. This patch ensures that I/O errors will always be checked and returned. This patch also ensures that every call to VFS_SYNC or VOP_FSYNC with MNT_WAIT set checks for and takes appropriate action when an error is returned. Sponsored by: DARPA & NAI Labs.
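The failure mode fixed here, a wait loop that never ends because a write keeps failing, can be modelled in a few lines. This is a toy model under stated assumptions: `struct demo_vnode`, `demo_flush_one()`, and `demo_fsync_wait()` are hypothetical and only mimic the shape of a "flush until clean" loop.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical vnode state for this sketch. */
struct demo_vnode {
    int ndirty;       /* dirty buffers still to be written */
    int fail_errno;   /* nonzero: every write fails with this error */
};

/* Try to write one buffer; on failure the buffer stays dirty. */
static int demo_flush_one(struct demo_vnode *vp)
{
    if (vp->fail_errno != 0)
        return vp->fail_errno;
    vp->ndirty--;
    return 0;
}

/* Flush until clean, but stop and report the first I/O error. */
int demo_fsync_wait(struct demo_vnode *vp)
{
    int error;

    assert(vp != NULL);
    while (vp->ndirty > 0) {
        error = demo_flush_one(vp);
        if (error != 0)
            return error;   /* without this check the loop never ends */
    }
    return 0;
}
```

With a persistent error such as EIO, the dirty count never drops, so returning the error to the caller is the only way out of the loop.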
* We must be careful to avoid recursive copy-on-write faults whenmckusick2002-10-231-1/+14
| | | | | | trying to clean up during disk-full scenarios. Sponsored by: DARPA & NAI Labs.
* Misplaced FREE_LOCK causes a panic when hit while taking a snapshot.mckusick2002-10-231-1/+1
| | | | Sponsored by: DARPA & NAI Labs.
* This update further fine tunes the locking of snapshot vnodes inmckusick2002-10-221-12/+21
| | | | | | | | | the ffs_copyonwrite routine to avoid a deadlock between the syncer daemon trying to sync out a snapshot vnode and the bufdaemon trying to write out a buffer containing the snapshot inode. With any luck this will be the last snapshot race condition. Sponsored by: DARPA & NAI Labs.
* This update is a performance improvement when allocating blocks onmckusick2002-10-221-0/+12
| | | | | | | | | | | | | a full filesystem. Previously, if the allocation failed, we had to fsync the file before rolling back any partial allocation of indirect blocks. Most block allocation requests only need to allocate a single data block and if that allocation fails, there is nothing to unroll. So, before doing the fsync, we check to see if any rollback will really be necessary. If none is necessary, then we simply return. This update eliminates the flurry of disk activity that got triggered whenever a filesystem would run out of space. Sponsored by: DARPA & NAI Labs.
* This checkin reimplements the io-request priority hack in a waymckusick2002-10-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | that works in the new threaded kernel. It was commented out of the disksort routine earlier this year for the reasons given in kern/subr_disklabel.c (which is where this code used to reside before it moved to kern/subr_disk.c):
----------------------------
revision 1.65
date: 2002/04/22 06:53:20; author: phk; state: Exp; lines: +5 -0
Comment out Kirks io-request priority hack until we can do this in a civilized way which doesn't cause grief.
The problem is that it is not generally safe to cast a "struct bio *" to a "struct buf *". Things like ccd, vinum, ata-raid and GEOM constructs bio's which are not entrails of a struct buf.
Also, curthread may or may not have anything to do with the I/O request at hand.
The correct solution can either be to tag struct bio's with a priority derived from the requesting threads nice and have disksort act on this field, this wouldn't address the "silly-seek syndrome" where two equal processes bang the diskheads from one edge to the other of the disk repeatedly.
Alternatively, and probably better: a sleep should be introduced either at the time the I/O is requested or at the time it is completed where we can be sure to sleep in the right thread.
The sleep also needs to be in constant timeunits, 1/hz can be practically any sub-second size, at high HZ the current code practically doesn't do anything.
----------------------------
As suggested in this comment, it is no longer located in the disk sort routine, but rather now resides in spec_strategy where the disk operations are being queued by the thread that is associated with the process that is really requesting the I/O. At that point, the disk queues are not visible, so the I/O for positively niced processes is always slowed down whether or not there is other activity on the disk.
On the issue of scaling HZ, I believe that the current scheme is better than using a fixed quantum of time. As machines and I/O subsystems get faster, the resolution on the clock also rises. So, ten years from now we will be slowing things down for shorter periods of time, but the proportional effect on the system will be about the same as it is today. So, I view this as a feature rather than a drawback. Hence this patch sticks with using HZ.
Sponsored by: DARPA & NAI Labs.
Reviewed by: Poul-Henning Kamp <phk@critter.freebsd.dk>
* Rename _POSIX_FOO_PRESENT and friends from POSIX.1e to _PC_FOO_PRESENTrwatson2002-10-201-3/+3
| | | | | | | and related friends. This would have been corrected had POSIX.1e progressed to a standard. Pointed out by: wollman
* Implement _POSIX_ACL_PATH_MAX, which returns the maximum number of ACLrwatson2002-10-201-0/+10
| | | | | | | entries for a file system node using pathconf(). Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* Teach UFS to respond to pathconf() tests for _POSIX_ACL_EXTENDED andrwatson2002-10-201-0/+20
| | | | | | | | _POSIX_MAC_PRESENT based on available mount flags, if the services are available. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* Clarify that the UFS1 extended attribute configuration steps do not applyrwatson2002-10-191-2/+2
| | | | | | | to UFS2 file systems. Submitted by: jedgar Obtained from: TrustedBSD Project
* Fix a file-rewrite performance case for UFS[2]. When rewriting portionsdillon2002-10-183-7/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of a file in chunks that are less than the filesystem block size, if the data is not already cached the system will perform a read-before-write. The problem is that it does this on a block-by-block basis, breaking up the I/Os and making clustering impossible for the writes. Programs such as INN using cyclic file buffers suffer greatly. This problem is only going to get worse as we use larger and larger filesystem block sizes. The solution is to extend the sequential heuristic so UFS[2] can perform a far larger read and readahead when dealing with this case.
(note: maximum disk write bandwidth is 27MB/sec thru filesystem)
(note: filesystem blocksize in test is 8K (1K frag))
dd if=/dev/zero of=test.dat bs=1k count=2m conv=notrunc
Before: (note half of these are reads)
      tty             da0              da1             acd0            cpu
 tin tout   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   76  14.21 598  8.30   0.00   0  0.00   0.00   0  0.00   0  0  7  1 92
   0   76  14.09 813 11.19   0.00   0  0.00   0.00   0  0.00   0  0  9  5 86
   0   76  14.28 821 11.45   0.00   0  0.00   0.00   0  0.00   0  0  8  1 91
After: (note half of these are reads)
      tty             da0              da1             acd0            cpu
 tin tout   KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy in id
   0   76  63.62 434 26.99   0.00   0  0.00   0.00   0  0.00   0  0 18  1 80
   0   76  63.58 424 26.30   0.00   0  0.00   0.00   0  0.00   0  0 17  2 82
   0   76  63.82 438 27.32   0.00   0  0.00   0.00   0  0.00   1  0 19  2 79
Reviewed by: mckusick
Approved by: re
X-MFC after: immediately (was heavily tested in -stable for 4 months)
* Update extended attribute readme file to note that no special configurationrwatson2002-10-181-1/+6
| | | | | | | | is required to use EAs with UFS2, and that UFS2 is recommended for EA use for a variety of reasons. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories
* Update instructions for ACLs given recent tunefs, mount changes. Alsorwatson2002-10-181-5/+33
| | | | | | | | note that UFS2 doesn't require explicit extended attribute configuration, and is recommended for this and other reasons if you plan to use ACLs. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories