Btrfs: prevent RAID level downgrades when space is low

The extent allocator has code that lets us fill allocations from any available block group, even if it doesn't match the RAID level we requested. This was put in because adding a new drive to a filesystem made with the default mkfs options upgrades the metadata from single-spindle dup to full RAID1.

But the code also allows us to allocate from a raid0 chunk when we really want a raid1 or raid10 chunk. This can cause big trouble because mkfs creates a small (4MB) raid0 chunk for data and metadata, which then goes unused for raid1/raid10 installs. The allocator will happily wander in and allocate from that chunk when things get tight, which is not correct.

The fix here is to make sure that we provide duplication when the caller has asked for it. It does allow the dups to be any raid level, which preserves the dup->raid1 upgrade abilities.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
author: Chris Mason <chris.mason@oracle.com> 2010-12-13 15:06:46 -0500
committer: Chris Mason <chris.mason@oracle.com> 2010-12-13 20:07:01 -0500
commit: 83a50de97fe96aca82389e061862ed760ece2283 (patch)
tree: 95421594f180c32cca1ff7f6881f4cf272cf2b5c /fs
parent: cd02dca56442e1504fd6bc5b96f7f1870162b266 (diff)
1 files changed, 19 insertions, 1 deletions
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 4be231e..7e5162e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -4943,6 +4943,25 @@ search:
 		btrfs_get_block_group(block_group);
 		search_start = block_group->key.objectid;
 
+		/*
+		 * this can happen if we end up cycling through all the
+		 * raid types, but we want to make sure we only allocate
+		 * for the proper type.
+		 */
+		if (!block_group_bits(block_group, data)) {
+		    u64 extra = BTRFS_BLOCK_GROUP_DUP |
+				BTRFS_BLOCK_GROUP_RAID1 |
+				BTRFS_BLOCK_GROUP_RAID10;
+
+			/*
+			 * if they asked for extra copies and this block group
+			 * doesn't provide them, bail.  This does allow us to
+			 * fill raid0 from raid1.
+			 */
+			if ((data & extra) && !(block_group->flags & extra))
+				goto loop;
+		}
+
 have_block_group:
 		if (unlikely(block_group->cached == BTRFS_CACHE_NO)) {
 			u64 free_percent;
@@ -8273,7 +8292,6 @@ int btrfs_read_block_groups(struct btrfs_root *root)
 			break;
 		if (ret != 0)
 			goto error;
-
 		leaf = path->nodes[0];
 		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
 		cache = kzalloc(sizeof(*cache), GFP_NOFS);
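
The decision made by the new hunk can be modeled outside the kernel as a small predicate. The following is a minimal sketch, not the kernel code itself: the `BG_*` macros are local stand-ins for the `BTRFS_BLOCK_GROUP_*` flags named in the patch (their values are defined here only so the example compiles on its own), and `group_has_bits()` and `group_usable()` are hypothetical helper names. The sketch assumes `block_group_bits()` tests that every requested bit is set on the group.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Local stand-ins for the BTRFS_BLOCK_GROUP_* flags used in the patch.
 * The values are defined here so the sketch builds outside the kernel.
 */
#define BG_DATA     (1ULL << 0)
#define BG_METADATA (1ULL << 2)
#define BG_RAID0    (1ULL << 3)
#define BG_RAID1    (1ULL << 4)
#define BG_DUP      (1ULL << 5)
#define BG_RAID10   (1ULL << 6)

/* Profiles that keep more than one copy of each block. */
#define BG_EXTRA_COPIES (BG_DUP | BG_RAID1 | BG_RAID10)

/* Assumed behavior of block_group_bits(): all requested bits are present. */
static bool group_has_bits(uint64_t group_flags, uint64_t requested)
{
	return (group_flags & requested) == requested;
}

/*
 * Hypothetical helper: returns true when the allocator may take space
 * from a block group with group_flags for a request with requested flags.
 */
static bool group_usable(uint64_t group_flags, uint64_t requested)
{
	/* An exact profile match is always acceptable. */
	if (group_has_bits(group_flags, requested))
		return true;

	/*
	 * Mismatch: refuse only when the caller asked for extra copies
	 * and this group stores none (the "goto loop" case in the patch).
	 */
	if ((requested & BG_EXTRA_COPIES) && !(group_flags & BG_EXTRA_COPIES))
		return false;

	return true;
}
```

Under these assumptions, `group_usable(BG_DATA | BG_RAID0, BG_DATA | BG_RAID1)` is false, which is the downgrade the commit prevents, while `group_usable(BG_METADATA | BG_RAID1, BG_METADATA | BG_DUP)` is true, which keeps the dup->raid1 upgrade path open. The sketch does not model the data/metadata type separation; it only covers the redundancy test added by this hunk.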