From 79752b63e1e72d72a46513db9b95df486877fdd2 Mon Sep 17 00:00:00 2001 From: kib Date: Mon, 22 Jan 2007 11:25:22 +0000 Subject: Below is slightly edited description of the LOR by Tor Egge: -------------------------- [Deadlock] is caused by a lock order reversal in vfs_lookup(), where [some] process is trying to lock a directory vnode, that is the parent directory of covered vnode) while holding an exclusive vnode lock on covering vnode. A simplified scenario: root fs var fs / A / (/var) D /var B /log (/var/log) E vfs lock C vfs lock F Within each file system, the lock order is clear: C->A->B and F->D->E When traversing across mounts, the system can choose between two lock orders, but everything must then follow that lock order: L1: C->A->B | +->F->D->E L2: F->D->E | +->C->A->B The lookup() process for namei("/var") mixes those two lock orders: VOP_LOOKUP() obtains B while A is held vfs_busy() obtains a shared lock on F while A and B are held (follows L1, violates L2) vput() releases lock on B VOP_UNLOCK() releases lock on A VFS_ROOT() obtains lock on D while shared lock on F is held vfs_unbusy() releases shared lock on F vn_lock() obtains lock on A while D is held (violates L1, follows L2) dounmount() follows L1 (B is locked while F is drained). Without unmount activity, vfs_busy() will always succeed without blocking and the deadlock isn't triggered (the system behaves as if L2 is followed). With unmount, you can get 4 processes in a deadlock: p1: holds D, want A (in lookup()) p2: holds shared lock on F, want D (in VFS_ROOT()) p3: holds B, want drain lock on F (in dounmount()) p4: holds A, want B (in VOP_LOOKUP()) You can have more than one instance of p2. The reversal was introduced in revision 1.81 of src/sys/kern/vfs_lookup.c and MFCed to revision 1.80.2.1, probably to avoid a cascade of vnode locks when nfs servers are dead (VFS_ROOT() just hangs) spreading to the root fs root vnode. - Tor Egge To fix the LOR, ups@ noted that when crossing the mount point, ni_dvp is actually not used by the callers of namei. Thus, placeholder deadfs vnode vp_crossmp is introduced that is filled into ni_dvp. Idea by: ups Reviewed by: tegge, ups, jeff, rwatson (mac interaction) Tested by: Peter Holm MFC after: 2 weeks --- sys/fs/deadfs/dead_vnops.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) (limited to 'sys/fs/deadfs') diff --git a/sys/fs/deadfs/dead_vnops.c b/sys/fs/deadfs/dead_vnops.c index 1bbf453..a5c359f 100644 --- a/sys/fs/deadfs/dead_vnops.c +++ b/sys/fs/deadfs/dead_vnops.c @@ -49,6 +49,7 @@ static vop_poll_t dead_poll; static vop_read_t dead_read; static vop_write_t dead_write; static vop_getwritemount_t dead_getwritemount; +static vop_rename_t dead_rename; struct vop_vector dead_vnodeops = { .vop_default = &default_vnodeops, @@ -73,7 +74,7 @@ struct vop_vector dead_vnodeops = { .vop_readlink = VOP_EBADF, .vop_reclaim = VOP_NULL, .vop_remove = VOP_PANIC, - .vop_rename = VOP_PANIC, + .vop_rename = dead_rename, .vop_rmdir = VOP_PANIC, .vop_setattr = VOP_EBADF, .vop_symlink = VOP_PANIC, @@ -211,3 +212,25 @@ dead_poll(ap) { return (POLLHUP); } + +static int +dead_rename(ap) + struct vop_rename_args /* { + struct vnode *a_fdvp; + struct vnode *a_fvp; + struct componentname *a_fcnp; + struct vnode *a_tdvp; + struct vnode *a_tvp; + struct componentname *a_tcnp; + } */ *ap; +{ + if (ap->a_tvp) + vput(ap->a_tvp); + if (ap->a_tdvp == ap->a_tvp) + vrele(ap->a_tdvp); + else + vput(ap->a_tdvp); + vrele(ap->a_fdvp); + vrele(ap->a_fvp); + return (EXDEV); +} -- cgit v1.1