path: root/sys/ufs
Commit message / Author / Age / Files / Lines
* Update to comments describing block allocation policy.mckusick2013-07-141-7/+6
| | | | Submitted by: Bruce Evans
* Only copy as many bytes as there are in the superblock, instead of the fullkib2013-07-121-1/+1
| | | | | | | | | | | block copy, when copying the superblock into the snapshot. UFS1 does not align superblock on the block boundary, and bcopy runs off the end of the buffer. Reported by: Andre Albsmeier <Andre.Albsmeier@siemens.com> Reviewed by: mckusick Sponsored by: The FreeBSD Foundation MFC after: 1 week
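The fix described above can be illustrated with a small userland sketch. This is not the kernel code: `copy_superblock`, `sb_size` and `blk_size` are hypothetical names, and zero-padding the rest of the destination block is an assumption made only for the example. The point is that the copy length comes from the superblock's own size, so a source buffer holding just the superblock is never read past its end.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a superblock into a block-sized snapshot
 * buffer. Only sb_size bytes are read from src; the remainder of the
 * destination block is zero-filled (an assumption of this sketch).
 * Returns the number of bytes copied, or 0 if the block is too small.
 */
size_t
copy_superblock(void *dst, size_t blk_size, const void *src, size_t sb_size)
{
	assert(dst != NULL && src != NULL);
	if (sb_size > blk_size)
		return (0);
	memset(dst, 0, blk_size);	/* clear the whole destination block */
	memcpy(dst, src, sb_size);	/* read sb_size bytes, not blk_size */
	return (sb_size);
}
```

Copying `blk_size` bytes instead would read beyond `src` whenever the superblock does not fill a whole block, which is the overrun the commit message reports.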
* Change i_gen in UFS to an unsigned type.pfg2013-07-101-1/+1
| | | | | | | | | | Missing type change from r252435. This fixes a "Stale NFS file handle" error. Reported by: Claude Bisson Tested by: Claude Bisson Pointed hat: pfg
* There are several code sequences likekib2013-07-093-4/+4
| | | | | | | | | | | | | | | | | | | vfs_busy(mp); vfs_write_suspend(mp); which are problematic if another thread starts an unmount between the two calls. The unmount starts a write, while vfs_write_suspend() drains writers. On the other hand, unmount drains busy references, causing a deadlock. Add a flag argument to vfs_write_suspend() and require its callers to specify the VS_SKIP_UNMOUNT flag when the call is not performed in the mount path, i.e. the covered vnode is not locked. The suspension is not attempted if VS_SKIP_UNMOUNT is specified and an unmount is in progress. Reported and tested by: Andreas Longwitz <longwitz@incore.de> Sponsored by: The FreeBSD Foundation MFC after: 3 weeks
* Make better use of metadata area by avoiding using it for data blocksmckusick2013-07-022-3/+25
| | | | | | that should no longer immediately follow their indirect blocks. MFC after: 2 weeks
* Style fix: spaces.pfg2013-07-021-1/+1
| | | | | | | Cleanup the incomplete revert. Reported by: bde MFC after: 4 weeks
* Change i_gen in UFS to an unsigned type.pfg2013-07-011-1/+1
| | | | | | | | | | Revert the simplification of the i_gen calculation. It is still a good idea to avoid zero values and for the case of old filesystems there is probably no advantage in using the complete 32 bits anyways. Discussed with: bde MFC after: 4 weeks
* Change i_gen in UFS to an unsigned type.pfg2013-07-011-1/+1
| | | | | | | | | Further simplify the i_gen calculation for older disks. Having a zero here is not really a problem and this is more similar to what is done in newfs_random(). Reported by: Xin Li MFC after: 4 weeks
* Don't assume that UFS on-disk format of a directory is the same asgleb2013-07-012-106/+108
| | | | | | | | | | | | | | | | | | defined by <sys/dirent.h> Always start parsing at DIRBLKSIZ aligned offset, skip first entries if uio_offset is not DIRBLKSIZ aligned. Return EINVAL if buffer is too small for single entry. Preallocate buffer for cookies. Cookies will be replaced with d_off field in struct dirent at later point. Skip entries with zero inode number. Stop mangling dirent in ufs_extattr_iterate_directory(). Reviewed by: kib Sponsored by: Google Summer Of Code 2011
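The "start at an aligned offset, then skip" rule can be sketched in plain C. This is an illustration only: `DIRBLKSIZ_SKETCH` is an assumed block size of 512 bytes, and the array of record lengths is a toy stand-in for the on-disk directory entries of the block that contains the requested offset.

```c
#include <assert.h>
#include <stddef.h>

#define	DIRBLKSIZ_SKETCH	512	/* assumed directory block size */

/* Round an offset down to the start of its directory block. */
size_t
dirblk_start(size_t offset)
{
	return (offset & ~((size_t)DIRBLKSIZ_SKETCH - 1));
}

/*
 * Walk the record lengths of one directory block from its aligned
 * start and return the offset of the first entry that begins at or
 * after 'offset'. Entries that begin before 'offset' are skipped.
 */
size_t
first_entry_at_or_after(const unsigned short *reclens, size_t nrec,
    size_t offset)
{
	size_t i, pos;

	pos = dirblk_start(offset);
	for (i = 0; i < nrec && pos < offset; i++) {
		assert(reclens[i] != 0);	/* a zero length would never advance */
		pos += reclens[i];
	}
	return (pos);
}
```

Parsing from the block start means an unaligned offset can never land in the middle of an entry: the walk always resumes on a real entry boundary.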
* Change i_gen in UFS to an unsigned type.pfg2013-07-011-1/+1
| | | | | | | Missed format specifier. Reported by: mdf MFC after: 4 weeks
* Change i_gen in UFS to an unsigned type.pfg2013-07-014-5/+5
| | | | | | | | | | | | | In UFS, i_gen is a randomly generated value and there is no way for it to be negative. Actually, the value of i_gen is just used to match bit patterns and it is of no consequence whether the values are signed or not. Following other filesystems, set it to unsigned and use it as such. Discussed by: mckusick Reviewed by: mckusick (previous version) MFC after: 4 weeks
* - Convert the bufobj lock to rwlock.jeff2013-05-314-66/+67
| | | | | | | | | | - Use a shared bufobj lock in getblk() and inmem(). - Convert softdep's lk to rwlock to match the bufobj lock. - Move INFREECNT to b_flags and protect it with the buf lock. - Remove unnecessary locking around bremfree() and BKGRDINPROG. Sponsored by: EMC / Isilon Storage Division Discussed with: mckusick, kib, mdf
* Properly spell sentinel (missed in 250891)mckusick2013-05-221-1/+1
| | | | | | | No functional changes. Spotted by: Navdeep Parhar and Alexey Dokuchaev MFC after: 2 weeks
* Add missing buffer releases (brelse) after bread calls that returnmckusick2013-05-221-2/+6
| | | | | | | | | an error. One could argue that returning a buffer even when it is not valid is incorrect, but bread has always returned a buffer valid or not. Reviewed by: kib MFC after: 2 weeks
* Add missing 28th element to softdep types name array.mckusick2013-05-221-1/+4
| | | | | | Found by: Coverity Scan, CID 1007621 Reviewed by: kib MFC after: 2 weeks
* Null a pointer after it is freed so that when it is returnedmckusick2013-05-221-0/+1
| | | | | | | | | | | the return value is NULL. Based on the returned flags, the return value should never be inspected in the case where NULL is returned, but it is good coding practice not to return a pointer to freed memory. Found by: Coverity Scan, CID 1006096 Reviewed by: kib MFC after: 2 weeks
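A minimal userland sketch of the practice described above, assuming a hypothetical `finish_item()` helper and an `ITEM_DONE` flag (neither name comes from the softdep code): once the object is freed, the local pointer is cleared so the function cannot return a dangling pointer.

```c
#include <assert.h>
#include <stdlib.h>

struct workitem {
	int	state;		/* 0 means the item is complete */
};

#define	ITEM_DONE	0x1	/* hypothetical flag: item finished and freed */

/*
 * Hypothetical helper: free the item if it is complete. The result is
 * reported through *flagsp; the return value is the item if it is
 * still live, or NULL if it was freed.
 */
struct workitem *
finish_item(struct workitem *item, int *flagsp)
{
	assert(item != NULL && flagsp != NULL);
	*flagsp = 0;
	if (item->state == 0) {
		free(item);
		item = NULL;	/* never hand back a pointer to freed memory */
		*flagsp |= ITEM_DONE;
	}
	return (item);
}
```

Callers are expected to consult the flags first, but returning NULL makes a mistaken use of the return value fail immediately instead of touching freed memory.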
* Remove a bogus check for a NULL buffer pointer.mckusick2013-05-221-7/+8
| | | | | | | | Add a KASSERT that it is not NULL. Found by: Coverity Scan, CID 1009114 Reviewed by: kib MFC after: 2 weeks
* Properly spell sentinel (not sintenel or sentinal).mckusick2013-05-221-28/+28
| | | | | | | No functional changes. Spotted by: kib MFC after: 2 weeks
* Fix several typoseadler2013-05-121-1/+1
| | | | | | PR: kern/176054 Submitted by: Christoph Mallon <christoph.mallon@gmx.de> MFC after: 3 days
* - Correct misspellings of the word occurrencegabor2013-04-171-1/+1
| | | | Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)
* Prepare to replace the buf splay with a trie:jeff2013-04-062-27/+18
| | | | | | | | | | | | | | | | - Don't insert BKGRDMARKER bufs into the splay or dirty/clean buf lists. No consumers need to find them there and it complicates the tree. These flags are all FFS specific and could be moved out of the buf cache. - Use pbgetvp() and pbrelvp() to associate the background and journal bufs with the vp. Not only is this much cheaper it makes more sense for these transient bufs. - Fix the assertions in pbget* and pbrel*. It's not safe to check list pointers which were never initialized. Use the BX flags instead. We also check B_PAGING in reassignbuf() so this should cover all cases. Discussed with: kib, mckusick, attilio Sponsored by: EMC / Isilon Storage Division
* The code in clear_remove() and clear_inodedeps() skips one entrymckusick2013-04-031-4/+4
| | | | | | | | | | | | | | | | in the pagedep and inodedep hash tables. An entry in the table is skipped because 'pagedep_hash' and 'inodedep_hash' hold the size of the hash tables - 1. It is extremely unlikely that this would cause any operational failure. These functions only need to find a single entry and are only called when there are too many entries. The chance that they would fail because all the entries are on the single skipped hash chain is remote. Submitted by: Pedro Martelletto Reviewed by: kib MFC after: 2 weeks
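The off-by-one described above is easy to reproduce outside the kernel. In the sketch below the table is a plain array of chain lengths and `mask` plays the role of a variable that holds "table size - 1"; the exact shape of the original loops is an assumption, chosen to show how treating a mask as a count leaves one chain unvisited.

```c
#include <assert.h>
#include <stddef.h>

/*
 * 'mask' is the table size minus one (a power-of-two hash mask), so
 * the valid chain indices are 0..mask inclusive.
 */
size_t
count_nonempty_buggy(const int *chains, size_t mask)
{
	size_t i, n = 0;

	assert(((mask + 1) & mask) == 0);	/* size must be a power of two */
	for (i = 0; i < mask; i++)		/* index 'mask' is never visited */
		if (chains[i] != 0)
			n++;
	return (n);
}

size_t
count_nonempty_fixed(const int *chains, size_t mask)
{
	size_t i, n = 0;

	assert(((mask + 1) & mask) == 0);
	for (i = 0; i <= mask; i++)		/* visits every chain */
		if (chains[i] != 0)
			n++;
	return (n);
}
```

With a four-entry table whose only non-empty chain is the last one, the first function reports nothing to clean up while the second finds the chain.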
* The purpose of this change to the FFS layout policy is to reduce themckusick2013-03-223-70/+175
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | running time for a full fsck. It also reduces the random access time for large files and speeds the traversal time for directory tree walks. The key idea is to reserve a small area in each cylinder group immediately following the inode blocks for the use of metadata, specifically indirect blocks and directory contents. The new policy is to preferentially place metadata in the metadata area and everything else in the blocks that follow the metadata area. The size of this area can be set when creating a filesystem using newfs(8) or changed in an existing filesystem using tunefs(8). Both utilities use the `-k held-for-metadata-blocks' option to specify the amount of space to be held for metadata blocks in each cylinder group. By default, newfs(8) sets this area to half of minfree (typically 4% of the data area). This work was inspired by a paper presented at Usenix's FAST '13: www.usenix.org/conference/fast13/ffsck-fast-file-system-checker Details of this implementation appear in the April 2013 issue of ;login: www.usenix.org/publications/login/april-2013-volume-38-number-2. A copy of the April 2013 ;login: paper can also be downloaded from: www.mckusick.com/publications/faster_fsck.pdf. Reviewed by: kib Tested by: Peter Holm MFC after: 4 weeks
* When renaming a directory from one parent directory to another,mckusick2013-03-201-17/+30
| | | | | | | | | | | | | | we need to call ufs_checkpath() to walk from our new location to the root of the filesystem to ensure that we do not encounter ourselves along the way. Until now, we accomplished this by reading the ".." entries of each directory in our path until we reached the root (or encountered an error). This change tries to avoid the I/O of reading the ".." entries by first looking them up in the name cache and only doing the I/O when the name cache lookup fails. Reviewed by: kib Tested by: Peter Holm MFC after: 4 weeks
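The walk described above can be modelled with a toy directory tree. Everything here is hypothetical scaffolding rather than the UFS implementation: directories are small integers, `parent_disk[]` stands for the on-disk ".." entries, `parent_cache[]` for the name cache, and `io_count` records how often the "disk" had to be consulted.

```c
#include <assert.h>
#include <errno.h>

#define	NDIRS	8
#define	ROOT	0
#define	NOCACHE	(-1)

struct toyfs {
	int	parent_disk[NDIRS];	/* ".." as stored on disk */
	int	parent_cache[NDIRS];	/* cached "..", or NOCACHE */
	int	io_count;		/* number of simulated disk reads */
};

/* Prefer the cache; fall back to a (counted) disk read on a miss. */
static int
lookup_parent(struct toyfs *fs, int dir)
{
	if (fs->parent_cache[dir] != NOCACHE)
		return (fs->parent_cache[dir]);
	fs->io_count++;
	return (fs->parent_disk[dir]);
}

/*
 * Walk from 'target' up to the root. Return EINVAL if 'source' is met
 * on the way (the rename would move a directory beneath itself),
 * 0 otherwise.
 */
int
toy_checkpath(struct toyfs *fs, int source, int target)
{
	int dir, steps;

	assert(source >= 0 && source < NDIRS);
	assert(target >= 0 && target < NDIRS);
	dir = target;
	for (steps = 0; steps < NDIRS; steps++) {
		if (dir == source)
			return (EINVAL);
		if (dir == ROOT)
			return (0);
		dir = lookup_parent(fs, dir);
	}
	return (EINVAL);	/* toy tree contains a cycle */
}
```

Each cached ".." saves one simulated read, which is the saving the commit aims for on deep directory trees.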
* UFS support of the unmapped i/o for the user data buffers.kib2013-03-195-35/+62
| | | | | Sponsored by: The FreeBSD Foundation Tested by: pho, scottl, jhb, bf
* Do not remap usermode pages into KVA for physio.kib2013-03-191-2/+2
| | | | | Sponsored by: The FreeBSD Foundation Tested by: pho
* Remove negative name cache entry pointing to the target name, whichkib2013-03-171-0/+1
| | | | | | | | could be instantiated while tdvp was unlocked. Reported by: Rick Miller <vmiller at hostileadmin com> Tested by: pho MFC after: 1 week
* Some style fixes.kib2013-03-141-4/+4
| | | | Sponsored by: The FreeBSD Foundation
* Add currently unused flag argument to the cluster_read(),kib2013-03-142-4/+6
| | | | | | | | cluster_write() and cluster_wbuild() functions. The flags to be allowed are a subset of the GB_* flags for getblk(). Sponsored by: The FreeBSD Foundation Tested by: pho
* MFCattilio2013-02-274-23/+129
* Rename VM_OBJECT_LOCK(), VM_OBJECT_UNLOCK() and VM_OBJECT_TRYLOCK() toattilio2013-02-202-5/+5
| | | | | | their "write" versions. Sponsored by: EMC / Isilon storage division
* Switch vm_object lock to be a rwlock.attilio2013-02-202-0/+2
| | | | | | | | * VM_OBJECT_LOCK and VM_OBJECT_UNLOCK are mapped to write operations * VM_OBJECT_SLEEP() is introduced as a general purpose primitive to get a sleep operation using a VM_OBJECT_LOCK() as protection * The approach must bear with vm_pager.h namespace pollution so many files require including directly rwlock.h
* The UFS2 filesystem allocates new blocks of inodes as they are needed.mckusick2013-02-161-3/+10
| | | | | | | | | | | | | | | | | | | | | When a cylinder group runs short of inodes, a new block for inodes is allocated, zero'ed, and written to the disk. The zero'ed inodes must be on the disk before the cylinder group can be updated to claim them. If the cylinder group claiming the new inodes were written before the zero'ed block of inodes, the system could crash with the filesystem in an unrecoverable state. Rather than adding a soft updates dependency to ensure that the new inode block is written before it is claimed by the cylinder group map, we just do a barrier write of the zero'ed inode block to ensure that it will get written before the updated cylinder group map can be written. This change should only slow down bulk loading of newly created filesystems since that is the primary time that new inode blocks need to be created. Reported by: Robert Watson Reviewed by: kib Tested by: Peter Holm
* Fix several unsafe pointer dereferences in the buffered_write()kib2013-02-101-3/+23
| | | | | | | | | | | | | | | | | | | function, implementing the sysctl vfs.ffs.set_bufoutput (not used in the tree yet). - The current directory vnode dereference is unsafe since fd_cdir could be changed and unreferenced; lock the filedesc around the access and vref the fd_cdir. - The VTOI() conversion of the fd_cdir is unsafe without first checking that the vnode is indeed from an FFS mount, otherwise the code dereferences random memory. - The cdir could be reclaimed from under us; lock it around the checks. - The type of the fp vnode might not be a disk, or it might have changed while the thread was in flight; check the type. Reviewed and tested by: mckusick MFC after: 2 weeks
* Remove unused MAXSYMLINKLEN macro.pfg2013-02-081-4/+0
| | | | | | Reviewed by: mckusick PR: kern/175794 MFC after: 1 week
* UFS: Remove dead assignment.pfg2013-02-031-1/+0
| | | | | Submitted by: Christoph Mallon MFC after: 3 days
* For UFS2 i_blocks is unsigned. The current "sanity" check that itmckusick2013-02-031-3/+3
| | | | | | | | | | | has gone below zero after the blocks in its inode are freed is a no-op which the compiler fails to warn about because of the use of the DIP macro. Change the sanity check to compare the number of blocks being freed against the value i_blocks. If the number of blocks being freed exceeds i_blocks, just set i_blocks to zero. Reported by: Pedro Giffuni (pfg@) MFC after: 2 weeks
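The reasoning above can be shown with a short sketch. `release_blocks()` is a hypothetical helper, not the FFS function: it demonstrates why a "below zero" test is useless on an unsigned counter and how comparing before subtracting avoids the wraparound.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Subtract 'freed' from an unsigned block count. A test such as
 * "count < 0" can never be true for an unsigned value, so the
 * comparison has to happen before the subtraction.
 */
uint64_t
release_blocks(uint64_t i_blocks, uint64_t freed)
{
	uint64_t left;

	if (freed > i_blocks)		/* subtraction would wrap around */
		return (0);
	left = i_blocks - freed;
	assert(left <= i_blocks);	/* no wraparound happened */
	return (left);
}
```

Without the comparison, freeing 10 blocks from a count of 3 would produce a value close to 2^64 rather than a negative number.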
* Add flags argument to vfs_write_resume() and removekib2013-01-114-14/+10
| | | | | | vfs_write_resume_flags(). Sponsored by: The FreeBSD Foundation
* The process_deferred_inactive() function locks the vnodes of the ufskib2013-01-011-1/+1
| | | | | | | | | | | | | | | mount, which means that it must not be called while the snaplock is owned. The vfs_write_resume(9) does call the function as the VFS_SUSP_CLEAN() method, which is too early and falls into the region still protected by snaplock. Add yet another flag for the vfs_write_resume_flags() to avoid calling suspension cleanup handler after the suspend is lifted, and use it in the ffs_snapshot() call to vfs_write_resume. Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 2 weeks
* Make it possible to atomically resume writes on the mount and accountkib2012-12-281-2/+1
| | | | | | | | | | | | | | | | | | | the write start, by adding a variation of the vfs_write_resume(9) which accepts flags. Use the new function to prevent a deadlock between parallel suspension and snapshotting a UFS mount. The ffs_snapshot() code performed vfs_write_resume() followed by vn_start_write() while owning the snaplock. If the suspension intervenes between resume and vn_start_write(), the deadlock occurred after the suspending thread tried to lock the snaplock, most typically during the write in the ffs_copyonwrite(). Reported and tested by: Andreas Longwitz <longwitz@incore.de> Reviewed by: mckusick MFC after: 2 weeks X-MFC-note: make the vfs_write_resume(9) function a macro after the MFC, in HEAD
* Fixup r218424: uio_yield() was scaling directly to userland priority.attilio2012-12-211-1/+1
| | | | | | | | | | | | | | | When kern_yield() was introduced with the possibility to specify a new priority, the behaviour changed by not lowering priority at all in the consumers, making the yielding mechanism highly ineffective for high priority kthreads like bufdaemon, syncer, vlrudaemon, etc. There is no evidence that consumers could bear with such a change in semantics, and this situation could finally lead to bugs similar to the ones fixed in r244240. Re-specify userland pri for kthreads involved. Tested by: pho Reviewed by: kib, mdf MFC after: 1 week
* Fix a typo, resulting in the NULL pointer dereference.kib2012-12-151-1/+1
| | | | | | Reported and tested by: pho Sponsored by: The FreeBSD Foundation MFC after: 3 days
* r16312 is not any longer real since many years (likely since when VFSattilio2012-11-191-8/+0
| | | | | | | | | | received granular locking) but the comment present in UFS has been copied incorrectly into other filesystems' code several times. Remove comments that make no sense now. Reviewed by: kib MFC after: 3 days
* Fix build of kdump(1).trasz2012-11-181-0/+1
* Add UFS writesuspension mechanism, designed to allow userland processestrasz2012-11-185-10/+384
| | | | | | | to modify on-disk metadata for filesystems mounted for write. Reviewed by: kib, mckusick Sponsored by: FreeBSD Foundation
* - Fix a truncation bug with softdep journaling that could leak blocks onjeff2012-11-141-39/+100
| | | | | | | | | | | | | | crash. When truncating a file that never made it to disk we use the canceled allocation dependencies to hold the journal records until the truncation completes. Previously allocdirect dependencies on the id_bufwait list were not considered and their journal space could expire before the bitmaps were written. Cancel them and attach them to the freeblks as we do for other allocdirects. - Add KTR traces that were used to debug this problem. - When adding jsegdeps, always use jwork_insert() so we don't have more than one segdep on a given jwork list. Sponsored by: EMC / Isilon Storage Division
* - Fix a bug that has existed since the original softdep implementation.jeff2012-11-121-14/+27
| | | | | | | | | | | | | When a background copy of a cg is written we complete any work associated with that bmsafemap. If new work has been added to the non-background copy of the buffer it will be completed before the next write happens. The solution is to do the rollbacks when we make the copy so only those dependencies that were present at the time of writing will be completed when the background write completes. This would've resulted in various bitmap related corruptions and panics. It also would've expired journal entries early causing journal replay to miss some records. MFC after: 2 weeks
* Complete MPSAFE VFS interface and remove MNTK_MPSAFE flag.attilio2012-11-091-2/+2
| | | | | Porters should refer to __FreeBSD_version 1000021 for this change as it may have happened at the same timeframe.
* - Correct rev 242734, segments can sometimes get stuck. Be a bit morejeff2012-11-091-1/+4
| | | | | | defensive with segment state. Reported by: b. f. <bf1783@googlemail.com>
* - Implement BIO_FLUSH support around journal entries. This will not 100%jeff2012-11-081-16/+121
| | | | | | | | | | | solve power loss problems with dishonest write caches. However, it should improve the situation and force a full fsck when it is unable to resolve with the journal. - Resolve a case where the journal could wrap in an unsafe way causing us to prematurely lose journal entries in very specific scenarios. Discussed with: mckusick MFC after: 1 month