Btrfs: fix race between device replace and read repair

While we are finishing a device replace operation we can have a concurrent
task trying to do a read repair operation, in which case it will call
btrfs_map_block() to get a struct btrfs_bio which can have a stripe that
points to the source device of the device replace operation. This allows
for the read repair task to dereference the stripe's device pointer after
the device replace operation has freed the source device, resulting in an
invalid memory access. This is similar to the problem solved by my previous
patch in the same series and named "Btrfs: fix race between device replace
and discard".

So fix this by surrounding the call to btrfs_map_block() and the code that
uses the returned struct btrfs_bio with calls to
btrfs_bio_counter_inc_blocked() and btrfs_bio_counter_dec(), giving the
proper serialization with the finishing phase of the device replace
operation.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
author: Filipe Manana <fdmanana@suse.com> 2016-05-27 22:21:27 +0100
committer: Filipe Manana <fdmanana@suse.com> 2016-05-31 01:00:03 +0100
commit: b5de8d0df80fa87f1f97fbcc4bbc8cad0a018802 (patch)
tree: de3fd8d94140b60fb49bdcfc675c80ee069aec26 /fs
parent: 2999241daa8d77947f108dfbde35c268cd7bd709 (diff)
1 files changed, 10 insertions, 0 deletions
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3cd5782..6e953de 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2025,9 +2025,16 @@ int repair_io_failure(struct inode *inode, u64 start, u64 length, u64 logical,
 	bio->bi_iter.bi_size = 0;
 	map_length = length;
 
+	/*
+	 * Avoid races with device replace and make sure our bbio has devices
+	 * associated to its stripes that don't go away while we are doing the
+	 * read repair operation.
+	 */
+	btrfs_bio_counter_inc_blocked(fs_info);
 	ret = btrfs_map_block(fs_info, WRITE, logical,
 			      &map_length, &bbio, mirror_num);
 	if (ret) {
+		btrfs_bio_counter_dec(fs_info);
 		bio_put(bio);
 		return -EIO;
 	}
@@ -2037,6 +2044,7 @@ int repair_io_failure(struct inode *inode, u64 start, u64 length, u64 logical,
 	dev = bbio->stripes[mirror_num-1].dev;
 	btrfs_put_bbio(bbio);
 	if (!dev || !dev->bdev || !dev->writeable) {
+		btrfs_bio_counter_dec(fs_info);
 		bio_put(bio);
 		return -EIO;
 	}
@@ -2045,6 +2053,7 @@ int repair_io_failure(struct inode *inode, u64 start, u64 length, u64 logical,
 
 	if (btrfsic_submit_bio_wait(WRITE_SYNC, bio)) {
 		/* try to remap that extent elsewhere? */
+		btrfs_bio_counter_dec(fs_info);
 		bio_put(bio);
 		btrfs_dev_stat_inc_and_print(dev, BTRFS_DEV_STAT_WRITE_ERRS);
 		return -EIO;
@@ -2054,6 +2063,7 @@ int repair_io_failure(struct inode *inode, u64 start, u64 length, u64 logical,
 		"read error corrected: ino %llu off %llu (dev %s sector %llu)",
 				  btrfs_ino(inode), start,
 				  rcu_str_deref(dev->name), sector);
+	btrfs_bio_counter_dec(fs_info);
 	bio_put(bio);
 	return 0;
 }
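
The serialization that the commit message describes can be pictured with a small userspace analogue. This is only a sketch under stated assumptions: `struct fs_state`, `counter_inc_blocked()`, `counter_dec()`, `replace_begin_finish()`, `replace_end_finish()` and `repair_sketch()` are hypothetical names invented for illustration, and a pthread mutex and condition variable stand in for whatever primitives the kernel actually uses. It is not the btrfs implementation. The point it tries to show is the shape of the fix: take the counter before mapping the block, and drop it exactly once on every exit path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the filesystem-wide state. */
struct fs_state {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	long in_flight;        /* tasks currently using device pointers */
	bool replace_blocked;  /* true while a replace is finishing */
};

static void fs_state_init(struct fs_state *fs)
{
	pthread_mutex_init(&fs->lock, NULL);
	pthread_cond_init(&fs->cond, NULL);
	fs->in_flight = 0;
	fs->replace_blocked = false;
}

/* Wait until no replace is finishing, then register as a user. */
static void counter_inc_blocked(struct fs_state *fs)
{
	pthread_mutex_lock(&fs->lock);
	while (fs->replace_blocked)
		pthread_cond_wait(&fs->cond, &fs->lock);
	fs->in_flight++;
	pthread_mutex_unlock(&fs->lock);
}

/* Unregister and wake anyone waiting for the count to drain. */
static void counter_dec(struct fs_state *fs)
{
	pthread_mutex_lock(&fs->lock);
	assert(fs->in_flight > 0);
	fs->in_flight--;
	pthread_cond_broadcast(&fs->cond);
	pthread_mutex_unlock(&fs->lock);
}

/* Replace side: keep new users out, then wait for existing ones. */
static void replace_begin_finish(struct fs_state *fs)
{
	pthread_mutex_lock(&fs->lock);
	fs->replace_blocked = true;
	while (fs->in_flight > 0)
		pthread_cond_wait(&fs->cond, &fs->lock);
	pthread_mutex_unlock(&fs->lock);
	/* Only after this point would it be safe to free the source device. */
}

/* Replace side: let blocked users proceed again. */
static void replace_end_finish(struct fs_state *fs)
{
	pthread_mutex_lock(&fs->lock);
	fs->replace_blocked = false;
	pthread_cond_broadcast(&fs->cond);
	pthread_mutex_unlock(&fs->lock);
}

/* Repair path shape: every return drops the counter exactly once. */
static int repair_sketch(struct fs_state *fs, bool map_fails)
{
	counter_inc_blocked(fs);
	if (map_fails) {
		counter_dec(fs);
		return -1;
	}
	/* ... the mapped device pointers would be used here ... */
	counter_dec(fs);
	return 0;
}
```

In this sketch a repair task that arrives while `replace_blocked` is set simply waits, and the replace side cannot get past `replace_begin_finish()` while any repair task is still between its increment and decrement. That is the property the patch relies on; a missed decrement on any error path would leave the replace side waiting forever, which is why the diff adds one to each return.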