diff options
author | Philipp Reisner <philipp.reisner@linbit.com> | 2014-11-10 17:21:11 +0100 |
---|---|---|
committer | Jens Axboe <axboe@fb.com> | 2014-11-10 09:27:35 -0700 |
commit | a88215312c5ed74697973f6c9f0fce718bcf18ad (patch) | |
tree | 831cef10aa7728cff1fe109ba5f6ee8dad4e3225 /drivers/block/drbd/drbd_nl.c | |
parent | f221f4bcc5f40e2967e4596ef167bdbc987c8e9d (diff) | |
download | op-kernel-dev-a88215312c5ed74697973f6c9f0fce718bcf18ad.zip op-kernel-dev-a88215312c5ed74697973f6c9f0fce718bcf18ad.tar.gz |
drbd: fix race between role change and handshake
Symptoms:
If DRBD was "cleanly shut down" (all in sync, both Secondary before
disconnect, identical data generation uuids), and then one side was
promoted *during* the next connection handshake, the role change
could confuse the handshake.
The Primary would get stuck in WFBitmapS, the Secondary would log
unexpected cstate (Connected) in receive_bitmap
and get stuck in WFBitmapT.
Fix:
The test in is_valid_soft_transition wrong. It works because
the not allowed actions (promote/attach) do not touch the
cstate. The previous condition failed to demand a cstate change
in one clause.
In order to avoid deadlocks give up the state_mutex while waiting
for the transient state to go away.
Conflicts:
drbd/drbd_state.c
drbd/drbd_state.h
drbd/drbd_wrappers.h
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Diffstat (limited to 'drivers/block/drbd/drbd_nl.c')
-rw-r--r-- | drivers/block/drbd/drbd_nl.c | 2 |
1 files changed, 1 insertions, 1 deletions
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c index 4782d07..74df8cf 100644 --- a/drivers/block/drbd/drbd_nl.c +++ b/drivers/block/drbd/drbd_nl.c @@ -588,7 +588,7 @@ drbd_set_role(struct drbd_device *const device, enum drbd_role new_role, int for val.i = 0; val.role = new_role; while (try++ < max_tries) { - rv = _drbd_request_state(device, mask, val, CS_WAIT_COMPLETE); + rv = _drbd_request_state_holding_state_mutex(device, mask, val, CS_WAIT_COMPLETE); /* in case we first succeeded to outdate, * but now suddenly could establish a connection */ |