summaryrefslogtreecommitdiffstats
path: root/sys/ufs
Commit message (Collapse)AuthorAgeFilesLines
* For now on every 10 cyclinder groups flush the buffer cache to freeambrisko2008-11-131-0/+4
| | | | | | | | | up space. If the buffer cache fills up then the disk systems can grind to a halt. Better tuning can be figured out later. Tested by: Tim, others and work Reviewed by: Kostik Belousov PR: 128832
* Quiet a WITNESS warning with the dirhash sx locks by setting the DUPOKjhb2008-11-041-1/+10
| | | | | | | flag. Specifically, if two threads race to create a dirhash for a directory, then one might already have created a private dirhash structure (and locked it) when it realizes the directory now has a structure and tries to lock that one.
* In UFS, when reading EA that contains ACL fails for some reason, includetrasz2008-11-041-2/+5
| | | | | | inode number and filesystem name, so the administrator can fix the problem. Approved by: rwatson (mentor)
* Improve VFS locking:attilio2008-11-022-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | - Implement real draining for vfs consumers by not relying on the mnt_lock and using instead a refcount in order to keep track of lock requesters. - Due to the change above, remove the mnt_lock lockmgr because it is now useless. - Due to the change above, vfs_busy() is no more linked to a lockmgr. Change so its KPI by removing the interlock argument and defining 2 new flags for it: MBF_NOWAIT which basically replaces the LK_NOWAIT of the old version (which was unlinked from the lockmgr alredy) and MBF_MNTLSTLOCK which provides the ability to drop the mountlist_mtx once the mnt interlock is held (ability still desired by most consumers). - The stub used into vfs_mount_destroy(), that allows to override the mnt_ref if running for more than 3 seconds, make it totally useless. Remove it as it was thought to work into older versions. If a problem of "refcount held never going away" should appear, we will need to fix properly instead than trust on such hackish solution. - Fix a bug where returning (with an error) from dounmount() was still leaving the MNTK_MWAIT flag on even if it the waiters were actually woken up. Just a place in vfs_mount_destroy() is left because it is going to recycle the structure in any case, so it doesn't matter. - Remove the markercnt refcount as it is useless. This patch modifies VFS ABI and breaks KPI for vfs_busy() so manpages and __FreeBSD_version will be modified accordingly. Discussed with: kib Tested by: pho
* Introduce accmode_t. This is required for NFSv4 ACLs - it will be neccessarytrasz2008-10-282-12/+12
| | | | | | | to add more V* constants, and the variables changed by this patch were often being assigned to mode_t variables, which is 16 bit. Approved by: rwatson (mentor)
* Provide an explanation for getinoquota() call in the ufs_access vop.kib2008-10-281-0/+11
| | | | MFC after: 3 days
* Fix a number of style issues in the MALLOC / FREE commit. I've tried todes2008-10-232-3/+6
| | | | | be careful not to fix anything that was already broken; the NFSv4 code is particularly bad in this respect.
* Retire the MALLOC and FREE macros. They are an abomination unto style(9).des2008-10-237-65/+62
| | | | MFC after: 3 months
* Assert that v_holdcnt is non-zero before entering lockmgr in vn_lockkib2008-10-201-0/+4
| | | | | | | | | and ffs_lock. This cannot catch situations where holdcnt is incremented not by curthread, but I think it is useful. Reviewed by: tegge, attilio Tested by: pho MFC after: 2 weeks
* Sync up summary information for cylinder groups while data is alreadykib2008-10-131-0/+7
| | | | | | | | in memory during snapshot creation. This improves the results of the background fsck. Submitted by: tegge MFC after: 1 week
* Remove the struct thread unuseful argument from bufobj interface.attilio2008-10-102-6/+6
| | | | | | | | | | | | | | | | | | | | | In particular following functions KPI results modified: - bufobj_invalbuf() - bufsync() and BO_SYNC() "virtual method" of the buffer objects set. Main consumers of bufobj functions are affected by this change too and, in particular, functions which changed their KPI are: - vinvalbuf() - g_vfs_close() Due to the KPI breakage, __FreeBSD_version will be bumped in a later commit. As a side note, please consider just temporary the 'curthread' argument passing to VOP_SYNC() (in bufsync()) as it will be axed out ASAP Reviewed by: kib Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* Enable shared lookups on UFS. There are some remaining issues with forcedjhb2008-09-241-1/+1
| | | | | | unmounts, but those are in the VFS lookup code are not UFS specific. Tested by: pho, kris
* Close a race between concurrent calls to ufsdirhash_recycle() andjhb2008-09-221-5/+10
| | | | | | | | ufsdirhash_free() introduced in my last commit by removing the dirhash about to be free'd in ufsdirhash_free() from the global dirhash list before dropping the sx lock. Tested by: kris
* Initialize va_flags and va_filerev properly in VOP_GETATTR(). Don'tkib2008-09-201-2/+0
| | | | | | | | | | initialize va_vaflags and va_spare because they are not part of the VOP_GETATTR() API. Also don't initialize birthtime to ctime or zero. Submitted by: Jaakko Heinonen <jh saunalahti fi> Reviewed by: bde Discussed on: freebsd-fs MFC after: 1 month
* Retire the 'i_reclen' field from the in-memory i-node. Previously,jhb2008-09-162-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | during a DELETE lookup operation, lookup would cache the length of the directory entry to be deleted in 'i_reclen'. Later, the actual VOP to remove the directory entry (ufs_remove, ufs_rename, etc.) would call ufs_dirremove() which extended the length of the previous directory entry to "remove" the deleted entry. However, we always read the entire block containing the directory entry when doing the removal, so we always have the directory entry to be deleted in-memory when doing the update to the directory block. Also, we already have to figure out where the directory entry that is being removed is in the block so that we can pass the component name to the dirhash code to update the dirhash. So, instead of passing 'i_reclen' from ufs_lookup() to the ufs_dirremove() routine, just read the 'd_reclen' field directly out of the entry being removed when updating the length of the previous entry in the block. This avoids a cosmetic issue of writing to 'i_reclen' while holding a shared vnode lock. It also slightly reduces the amount of side-band data passed from ufs_lookup() to operations updating a directory via the directory's i-node. Reviewed by: jeff
* Fix a race with shared lookups on UFS. If the the dirhash code reached thejhb2008-09-162-34/+87
| | | | | | | | | | | | | | | | cap on memory usage, then shared LOOKUP operations could start free'ing dirhash structures. Without these fixes, concurrent free's on the same directory could result in one of the threads blocked on a lock in a dirhash structure free'd by the other thread. - Replace the lockmgr lock in the dirhash structure with an sx lock. - Use a reference count managed with ufsdirhash_hold()/drop() to determine when to free the dirhash structures. The directory i-node holds a reference while the dirhash is attached to an i-node. Code that wishes to lock the dirhash while holding a shared vnode lock must first acquire a private reference to the dirhash while holding the vnode interlock before acquiring the dirhash sx lock. After acquiring the sx lock, it drops the private reference after checking to see if the dirhash is still used by the directory i-node.
* - Only set i_offset in the parent directory's i-node during a lookup forjhb2008-09-161-3/+9
| | | | | | | | | | | | | non-LOOKUP operations. - Relax a VOP assertion for a DELETE lookup. rename() uses WANTPARENT instead of LOCKPARENT when looking up the source pathname. ufs_rename() uses a relookup() to lock the parent directory when it decides to finally remove the source path. Thus, it is ok for a DELETE with WANTPARENT set instead of LOCKPARENT to use a shared vnode lock rather than an exclusive vnode lock. Reported by: kris (2) Reviewed by: jeff
* vdropl() drops the vnode interlock. Thus, the code in the QUOTA case thatjhb2008-09-161-3/+2
| | | | | | | | | | upgrades the vnode lock if it is share locked was dropping the interlock before actually checking VI_DOOMED. Fix this by do the vdropl() after the check and relying on it to drop the vnode interlock. Reported by: pho Reviewed by: kib MFC after: 1 week
* Suspend the write operations on the UFS filesystem being unmounted orkib2008-09-161-14/+73
| | | | | | | | remounted from rw to ro. Proposed and reviewed by: tegge In collaboration with: pho MFC after: 1 month
* When attempt is made to suspend a filesystem that is already syspended,kib2008-09-163-2/+7
| | | | | | | | | | | | | | | | | | | wait until the current suspension is lifted instead of silently returning success immediately. The consequences of calling vfs_write() resume when not owning the suspension are not well-defined at best. Add the vfs_susp_clean() mount method to be called from vfs_write_resume(). Set it to process_deferred_inactive() for ffs, and stop calling it manually. Add the thread flag TDP_IGNSUSP that allows to bypass the suspension point in the vn_start_write. It is intended for use by VFS in the situations where the suspender want to do some i/o requiring calls to vn_start_write(), and this i/o cannot be done later. Reviewed by: tegge In collaboration with: pho MFC after: 1 month
* Add the ffs structures introspection functions for ddb.kib2008-09-162-1/+65
| | | | | | | | | Show the b_dep value for the buffer in the show buffer command. Add a comand to dump the dirty/clean buffer list for vnode. Reviewed by: tegge Tested and used by: pho MFC after: 1 month
* When downgrading the read-write mount to read-only, do_unmount() setskib2008-09-166-4/+16
| | | | | | | | | | | | | | | MNT_RDONLY flag before the VFS_MOUNT() is called. In ufs_inactive() and ufs_itimes_locked(), UFS verifies whether the fs is read-only by checking MNT_RDONLY, but this may cause loss of the IN_MODIFIED flag for inode on the fs being remounted rw->ro. Introduce UFS_RDONLY() struct ufsmount' method that reports the value of the fs_ronly. The later is set to 1 only after the remount is finished. Reviewed by: tegge In collaboration with: pho MFC after: 1 month
* The struct inode *ip supplied to softdep_freefile is not neccessary thekib2008-09-161-1/+2
| | | | | | | | | | | inode having number ino. In r170991, the ip was marked IN_MODIFIED, that is not quite correct. Mark only the right inode modified by checking inode number. Reviewed by: tegge In collaboration with: pho MFC after: 1 month
* When calling extattr_check_cred, use V{READ,WRITE}, not I{READ,WRITE}.trasz2008-09-032-7/+7
| | | | Approved by: rwatson (mentor)
* Decontextualize vfs_busy(), vfs_unbusy() and vfs_mount_alloc() functions.attilio2008-08-312-4/+4
| | | | | | Manpages are updated accordingly. Tested by: Diego Sardina <siarodx at gmail dot com>
* Decontextualize the couplet VOP_GETATTR / VOP_SETATTR as the passed threadattilio2008-08-282-4/+2
| | | | | | was always curthread and totally unuseful. Tested by: Giovanni Trematerra <giovanni dot trematerra at gmail dot com>
* In ffs_valloc(), ffs_vget() may fail because insmntque() refused tokib2008-08-281-1/+11
| | | | | | | | | | | | | | insert new vnode into the mount vnode list. Then, for the SU-enabled mount, ffs_vfree could create freefile dependency. This dependency can hang around forever since inode is not marked as IN_MODIFIED and correspondingly inodeblock may be not marked as dirty. After ffs_vget() fails, retry with FFSV_FORCEINSMQ, mark the inode as modified, and vput() it immediately. Take care of the dup alloc. Tested by: pho Reviewed by: tegge MFC after: 1 month
* Softdep code may need to instantiate vnode when processingkib2008-08-283-15/+59
| | | | | | | | | | | | | | | | | | dependencies. In particular, it may need this while syncing filesystem being unmounted. Since during unmount MNTK_NOINSMNTQUE flag is set, that could sometimes disallow insertion of the vnode into the vnode mount list, softdep code needs to overwrite the MNTK_NOINSMNTQUE flag. Create the ffs_vgetf() function that sets the VV_FORCEINSMQ flag for new vnode and use it consistently from the softdep code instead of ffs_vget(). Add the retry logic to the softdep_flushfiles() to flush the vnodes that could be instantiated while flushing softdep dependencies. Tested by: pho, kris Reviewed by: tegge MFC after: 1 month
* Put the relocked variable from the r182111 into the #ifdef QUOTA braceskib2008-08-241-1/+4
| | | | | | | to prevent warning about unused var on the !QUOTA kernels. Reported by: ed MFC after: 1 week
* Revert the r167541: "Remove unneeded getinoquota() call in thekib2008-08-241-1/+23
| | | | | | | | | | | | | | | | ufs_access()." The call to getinoquota in ufs_access() serves the purpose of instantiating inode dquot from the vn_open(). Since quotas are accounted only for the inodes with already attached dquot, removal of the call prevented opened inodes from participation in the quota calculations. Since ufs_access() may be called with the vnode being only shared locked, upgrade (and then downgrade) vnode lock if calling getinoquota(). Reported by: simon at optinet com In collaboration with: pho MFC after: 1 week
* Revert r181345.kib2008-08-101-2/+1
| | | | | | | Move the NULL pointer check to the vfs_deleteopt() function. Discussed with: rodrigc MFC after: 3 days
* User may do "mount -o snapshot ...", that causes new FFS mount to bekib2008-08-061-1/+2
| | | | | | | | performed with snapshot option, while the mp->mnt_opt is NULL. Protect against NULL pointer dereference. Noted by: Mateusz Guzik <mjguzik gmail com> MFC after: 3 days
* ufsmount.h uses "struct\tfoo *bar;", except where it doesn't.des2008-08-052-7/+7
| | | | | quota.h uses "struct foo\t*bar;", except where it doesn't. Try to make them both agree with themselves (though not with eachother)
* Whitespace, prototypesdes2008-08-051-88/+27
|
* Whitespace tweak.jhb2008-07-301-1/+0
|
* The ffs_balloc_ufs{1,2} functions call bdwrite() while having severalkib2008-07-231-2/+22
| | | | | | | | | | | | | | | vnode buffers locked at once. In particular, there are indirect buffers among locked ones. The bdwrite() may start the flushing to keep dirty buffer list at the bounds. If any buffer on the dirty list requires translation from logical to physical block number, code may ends up trying to lock an indirect buffer already locked in ffs_balloc_ufsX. Prevent the bdflush() activity when several buffers are locked at once by setting the TDP_INBDFUSH for the problematic code blocks. Reported and tested by: pho, Josef Buchsteiner at Juniper In collaboration with: kan MFC after: 1 month
* Say hi to svn, by simplifing ffs_vget() function a bit - there is no need forpjd2008-07-191-3/+1
| | | | a variable that is used only once.
* Fix comments to replace SBSIZE with SBLOCKSIZE, since SBSIZErodrigc2008-05-241-2/+2
| | | | | | was renamed to SBLOCKSIZE in version 1.33 Reviewed by: mckusick
* After converting the "snapshot" mount option to the MNT_SNAPSHOT flag,rodrigc2008-05-241-1/+8
| | | | | | | | | | | delete "snapshot" from the persistent mount options list. This should fix problems with doing a mount -o snapshot of a file system, followed by an NFS export of the same file system. PR: 122833 Reported by: Leon Kos <leon.kos lecad fs uni-lj si>, Jaakko Heinonen <jh saunalahti fi> MFC after: 1 month
* For the following mount options, do not perform the string to flag conversionsrodrigc2008-05-241-21/+0
| | | | | | | | | | | | | here, because we already do them further up in vfs_donmount() in vfs_mount.c async -> MNT_ASYNC force -> MNT_FORCE multilabel -> MNT_MULTILABEL noatime -> MNT_NOATIME noclusterr -> MNT_NOCLUSTERR noclusterw -> MNT_NOCLUSTERW MFC after: 1 month
* Allow VM object creation in ufs_lookup. (If vfs.vmiodirenable is set)ups2008-05-201-0/+10
| | | | | | | | | | | | Directory IO without a VM object will store data in 'malloced' buffers severely limiting caching of the data. Without this change VM objects for directories are only created on an open() of the directory. TODO: Inline test if VM object already exists to avoid locking/function call overhead. Tested by: kris@ Reviewed by: jeff@ Reported by: David Filo
* - Use a local variable for i_ino in ufs_lookup. It is only used tojeff2008-04-222-14/+10
| | | | | | | communicate between two parts of this one function. This was causing problems with shared lookups as each would trash the ino value in the inode. - Remove the unused i_ino field from the inode structure.
* Move the head of byte-level advisory lock list from thekib2008-04-162-42/+0
| | | | | | | | | | | | | | | | | | | | | | filesystem-specific vnode data to the struct vnode. Provide the default implementation for the vop_advlock and vop_advlockasync. Purge the locks on the vnode reclaim by using the lf_purgelocks(). The default implementation is augmented for the nfs and smbfs. In the nfs_advlock, push the Giant inside the nfs_dolock. Before the change, the vop_advlock and vop_advlockasync have taken the unlocked vnode and dereferenced the fs-private inode data, racing with with the vnode reclamation due to forced unmount. Now, the vop_getattr under the shared vnode lock is used to obtain the inode size, and later, in the lf_advlockasync, after locking the vnode interlock, the VI_DOOMED flag is checked to prevent an operation on the doomed vnode. The implementation of the lf_purgelocks() is submitted by dfr. Reported by: kris Tested by: kris, pho Discussed with: jeff, dfr MFC after: 2 weeks
* - Use a lockmgr lock rather than a mtx to protect dirhash. This lockjeff2008-04-112-228/+291
| | | | | | | | | | | | | | may be held for the duration of the various dirhash operations which avoids many complex unlock/lock/revalidate sequences. - Permit shared locks on lookup. To protect the ip->i_dirhash pointer we use the vnode interlock in the shared case. Callers holding the exclusive vnode lock can run without fear of concurrent modification to i_dirhash. - Hold an exclusive dirhash lock when creating the dirhash structure for the first time or when re-creating a dirhash structure which has been recycled. Tested by: kris, pho
* - cache dp->i_offset in the local 'i_offset' variable for use in loopjeff2008-04-111-29/+48
| | | | | | | | | | | | | | | indexes so directory lookup becomes shared lock safe. In the modifying cases an exclusive lock is held here so the commit routine may rely on the state of i_offset. - Similarly handle i_diroff by fetching at the start and setting only once the operation is complete. Without the exclusive lock these are only considered hints. - Assert that an exclusive lock is held when we're preparing for a commit routine. - Honor the lock type request from lookup instead of always using exclusive locking. Tested by: pho, kris
* Correct function name in panic().pjd2008-04-071-1/+1
| | | | Reported by: kensmith
* Optimize lockmgr in order to get rid of the pool mutex interlock, of theattilio2008-04-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | state transitioning flags and of msleep(9) callings. Use, instead, an algorithm very similar to what sx(9) and rwlock(9) alredy do and direct accesses to the sleepqueue(9) primitive. In order to avoid writer starvation a mechanism very similar to what rwlock(9) uses now is implemented, with the correspective per-thread shared lockmgrs counter. This patch also adds 2 new functions to lockmgr KPI: lockmgr_rw() and lockmgr_args_rw(). These two are like the 2 "normal" versions, but they both accept a rwlock as interlock. In order to realize this, the general lockmgr manager function "__lockmgr_args()" has been implemented through the generic lock layer. It supports all the blocking primitives, but currently only these 2 mappers live. The patch drops the support for WITNESS atm, but it will be probabilly added soon. Also, there is a little race in the draining code which is also present in the current CVS stock implementation: if some sharers, once they wakeup, are in the runqueue they can contend the lock with the exclusive drainer. This is hard to be fixed but the now committed code mitigate this issue a lot better than the (past) CVS version. In addition assertive KA_HELD and KA_UNHELD have been made mute assertions because they are dangerous and they will be nomore supported soon. In order to avoid namespace pollution, stack.h is splitted into two parts: one which includes only the "struct stack" definition (_stack.h) and one defining the KPI. In this way, newly added _lockmgr.h can just include _stack.h. Kernel ABI results heavilly changed by this commit (the now committed version of "struct lock" is a lot smaller than the previous one) and KPI results broken by lockmgr_rw() / lockmgr_args_rw() introduction, so manpages and __FreeBSD_version will be updated accordingly. Tested by: kris, pho, jeff, danger Reviewed by: jeff Sponsored by: Google, Summer of Code program 2007
* Add the support for the AT_FDCWD and fd-relative name lookups to thekib2008-03-311-0/+1
| | | | | | | | | namei(9). Based on the submission by rdivacky, sponsored by Google Summer of Code 2007 Reviewed by: rwatson, rdivacky Tested by: pho
* - Since rev 1.142 of ffs_snapshot.c the interlock has not been requiredjeff2008-03-311-11/+4
| | | | | | | | | | | | | to protect the v_lock pointer. Removing the interlock acquisition here allows vn_lock() to proceed without requiring the interlock at all. - If the lock mutated while we were sleeping on it the interlock has been dropped. It is conceivable that the upper layer code was relying on the interlock and LK_NOWAIT to protect the identity or state of the vnode while acquiring the lock. In this case return EBUSY rather than trying the new lock to prevent potential races. Reviewed by: tegge
* - Don't free snapdata structures when they are no longer in use.jeff2008-03-311-67/+109
| | | | | | | | | | | | | | Keeping the lockmgr lock valid allows us to switch the v_lock pointer in snapshot vnodes between the embedded lockmgr lock and snapdata lock without needing the vnode interlock to protect against races - Keep unused snapdata structures in a list. - Add a function to lock the devvp and allocate a snapdata to it or acquire a new one without races. The old function was safe from creation races because we set the mount flag when creating snapshots and thus serializing them. However, it might have been subject to destroying races. Reviewed by: tegge
OpenPOWER on IntegriCloud