path: root/fs/btrfs/volumes.c
Commit message | Author | Age | Files | Lines
* Remove Btrfs compat code for older kernels | Chris Mason | 2008-09-25 | 1 | -20/+0
    Btrfs had compatibility code for kernels back to 2.6.18. These have been removed, and will be maintained in a separate backport git tree from now on.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: free space accounting redo | Josef Bacik | 2008-09-25 | 1 | -3/+8
    1) Replace the per fs_info extent_io_tree that tracked free space with two rb-trees per block group to track free space areas via offset and size. The reason to do this is that most allocations come with a hint byte where to start, so we can usually find a chunk of free space at that hint byte to satisfy the allocation and get good space packing. If we cannot find free space at or after the given offset we fall back on looking for a chunk of the given size as close to that given offset as possible. When we fall back on the size search we also try to find a slot as close to the size we want as possible, to avoid breaking small chunks off of huge areas.
    2) Remove the extent_io_tree that tracked the block group cache from fs_info and replace it with an rb-tree that tracks the block group cache via offset. Also add a per space_info list that tracks the block group cache for the particular space so we can look up related block groups easily.
    3) Clean up the allocation code to make it a little easier to read and a little less complicated. Basically there are 3 steps: first look from our provided hint. If we couldn't find space from that hint, start back at our original search start and look for space from there. If that fails, try to allocate space if we can and start looking again. If not we're screwed and need to start over again.
    4) Small fixes. There were some issues in volumes.c where we wouldn't allocate the rest of the disk. Fixed cow_file_range to actually pass the alloc_hint, which has helped a good bit in making the fs_mark test I run have semi-normal results as we run out of space. Generally with data allocations we don't track where we last allocated from, so every time we did a data allocation we'd search through every block group that we have looking for free space. Now searching a block group with no free space isn't terribly time consuming, but it was causing a slight degradation as we got more data block groups. The alloc_hint has fixed this slight degradation and made things semi-normal.
    There is still one nagging problem I'm working on where we will get ENOSPC when there is definitely plenty of space. This only happens with metadata allocations, and only when we are almost full. So you generally hit the 85% mark first, but sometimes you'll hit the BUG before you hit the 85% wall. I'm still tracking it down, but until then this seems to be pretty stable and makes a significant performance gain.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
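The two-pass lookup described in item 1 of the commit above can be sketched in userspace C. This is a minimal sketch under stated assumptions, not the kernel code: a sorted array stands in for the two rb-trees, and `struct free_area`, `find_free_space`, and `NO_SPACE` are hypothetical names. The tie-breaking in the fallback pass (smallest sufficient area first, then nearest to the hint) is one reading of the commit text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-space record; the commit keeps these in two rb-trees. */
struct free_area {
    uint64_t start;
    uint64_t len;
};

#define NO_SPACE UINT64_MAX

static uint64_t distance(uint64_t a, uint64_t b)
{
    return a > b ? a - b : b - a;
}

/*
 * areas[] must be sorted by start and non-overlapping.
 * Returns the start of a range of `size` free bytes, or NO_SPACE.
 */
uint64_t find_free_space(const struct free_area *areas, size_t n,
                         uint64_t hint, uint64_t size)
{
    const struct free_area *best = NULL;
    size_t i;

    assert(areas != NULL || n == 0);

    /* Pass 1: walk in offset order, take the first fit at or after the hint. */
    for (i = 0; i < n; i++) {
        uint64_t end = areas[i].start + areas[i].len;
        uint64_t from = areas[i].start > hint ? areas[i].start : hint;

        if (end > from && end - from >= size)
            return from;
    }

    /* Pass 2: best fit by size, ties broken by distance from the hint. */
    for (i = 0; i < n; i++) {
        if (areas[i].len < size)
            continue;
        if (!best || areas[i].len < best->len ||
            (areas[i].len == best->len &&
             distance(areas[i].start, hint) < distance(best->start, hint)))
            best = &areas[i];
    }
    return best ? best->start : NO_SPACE;
}
```

Preferring the smallest sufficient area in the fallback pass is what keeps small allocations from carving pieces off of huge free areas.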
* Btrfs: properly set blocksize when adding new device. | Zheng Yan | 2008-09-25 | 1 | -0/+2
    ---
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add debugging checks to track down corrupted metadata | Chris Mason | 2008-09-25 | 1 | -19/+21
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Throttle for async bio submits higher up the chain | Chris Mason | 2008-09-25 | 1 | -6/+0
    The current code waits for the count of async bio submits to get below a given threshold if it is too high right after adding the latest bio to the work queue. This isn't optimal because the caller may have sequential adjacent bios pending they are waiting to send down the pipe.
    This changeset requires the caller to wait on the async bio count, and changes the async checksumming submits to wait for async bios any time they self throttle.
    The end result is much higher sequential throughput.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Wait for async bio submissions to make some progress at queue time | Chris Mason | 2008-09-25 | 1 | -1/+17
    Before, the btrfs bdi congestion function was used to test for too many async bios. This keeps that check to throttle pdflush, but also adds a check while queuing bios.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Count async bios separately from async checksum work items | Chris Mason | 2008-09-25 | 1 | -3/+3
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix the multi-bio code to save the original bio for completion | Chris Mason | 2008-09-25 | 1 | -1/+10
    The multi-bio code is responsible for duplicating blocks in raid1 and single spindle duplication. It has counters to make sure all of the locations for a given extent are properly written before io completion is returned to the higher layers.
    But it didn't always complete the same bio it was given; sometimes a clone was completed instead. This led to problems with the async work queues because they saved a pointer to the bio in a struct off bi_private.
    The fix is to remember the original bio and only complete that one.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
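The fix described above (count outstanding copies, then complete only the bio the caller handed in) can be illustrated with a small userspace model. This is a sketch under assumptions, not the kernel implementation: `struct io_req`, `struct multi_io`, `caller_end_io`, and `stripe_end_io` are hypothetical stand-ins for the bio machinery, and error handling is reduced to reporting the last error seen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio. */
struct io_req {
    void (*end_io)(struct io_req *req, int error);
    void *private_data;   /* plays the role of bi_private */
    int completed;        /* how many times the caller saw completion */
    int error;
};

/* Shared state for one logical write that fans out to several copies. */
struct multi_io {
    struct io_req *orig;  /* the request the caller submitted */
    int pending;          /* copies still in flight */
    int error;            /* last error reported by any copy */
};

/* Completion handler owned by the higher layer. */
void caller_end_io(struct io_req *req, int error)
{
    req->completed++;
    req->error = error;
}

/* Completion handler for each per-device copy (clone or original). */
void stripe_end_io(struct io_req *clone, int error)
{
    struct multi_io *multi = clone->private_data;

    assert(multi != NULL && multi->pending > 0);
    if (error)
        multi->error = error;

    /* Only the last copy to finish completes the original request. */
    if (--multi->pending == 0)
        multi->orig->end_io(multi->orig, multi->error);
}
```

Because the higher layer only ever sees `multi->orig`, anything it stored alongside that request stays valid no matter which copy finishes last.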
* Btrfs: Hold a reference on bios during submit_bio, add some extra bio checks | Chris Mason | 2008-09-25 | 1 | -1/+9
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: implement memory reclaim for leaf reference cache | Yan | 2008-09-25 | 1 | -1/+0
    The memory reclaiming issue happens when a snapshot exists. In that case, some cache entries may not be used during old snapshot dropping, so they will remain in the cache until umount.
    The patch adds a field to struct btrfs_leaf_ref to record create time. Besides, the patch makes all dead roots of a given snapshot linked together in order of create time. After an old snapshot is completely dropped, we check the dead root list and remove all cache entries created before the oldest dead root in the list.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
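The reclaim rule above (drop every cache entry created before the oldest remaining dead root) can be sketched as follows. This is an illustration only: `struct leaf_ref` here is a hypothetical, cut-down stand-in for `struct btrfs_leaf_ref`, the cache is modeled as a flat array, and `create_time` is an assumed name for the field the patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cache entry; only the create time matters here. */
struct leaf_ref {
    uint64_t bytenr;       /* which leaf this entry describes */
    uint64_t create_time;  /* assumed name for the field the patch adds */
};

/*
 * Drop every entry created before the oldest remaining dead root.
 * Survivors are compacted to the front; the new entry count is returned.
 */
size_t reclaim_leaf_refs(struct leaf_ref *refs, size_t count,
                         uint64_t oldest_dead_root_time)
{
    size_t kept = 0;
    size_t i;

    assert(refs != NULL || count == 0);
    for (i = 0; i < count; i++) {
        if (refs[i].create_time >= oldest_dead_root_time)
            refs[kept++] = refs[i];
    }
    return kept;
}
```

Entries older than every dead root can no longer be consulted by any pending snapshot drop, which is why it is safe to free them before umount.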
* Btrfs: Add locking around volume management (device add/remove/balance) | Chris Mason | 2008-09-25 | 1 | -14/+44
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Replace the transaction work queue with kthreads | Chris Mason | 2008-09-25 | 1 | -4/+8
    This creates one kthread for commits and one kthread for deleting old snapshots. All the work queues are removed.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Replace the big fs_mutex with a collection of other locks | Chris Mason | 2008-09-25 | 1 | -6/+13
    Extent allocations are still protected by a large alloc_mutex.
    Objectid allocations are covered by an objectid mutex.
    Other btree operations are protected by a lock on individual btree nodes.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add a thread pool just for submit_bio | Chris Mason | 2008-09-25 | 1 | -1/+2
    If a bio submission is queued behind a lock holder that is waiting for the bio on the work queue, it is possible to deadlock. Move the bios into their own pool.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add async worker threads for pre and post IO checksumming | Chris Mason | 2008-09-25 | 1 | -5/+157
    Btrfs has been using workqueues to spread the checksumming load across other CPUs in the system. But, workqueues only schedule work on the same CPU that queued the work, giving them a limited benefit for systems with higher CPU counts.
    This code adds a generic facility to schedule work with pools of kthreads, and changes the bio submission code to queue bios up. The queueing is important to make sure large numbers of procs on the system don't turn streaming workloads into random workloads by sending IO down concurrently.
    The end result of all of this is much higher performance (and CPU usage) when doing checksumming on large machines. Two worker pools are created, one for writes and one for endio processing. The two could deadlock if we tried to service both from a single pool.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Allocator fix variety pack | Chris Mason | 2008-09-25 | 1 | -8/+4
    * Force chunk allocation when find_free_extent has to do a full scan
    * Record the max key at the start of defrag so it doesn't run forever
    * Block groups might not be contiguous, make a forward search for the next block group in extent-tree.c
    * Get rid of extra checks for total fs size
    * Fix relocate_one_reference to avoid relocating the same file data block twice when referenced by an older transaction
    * Use the open device count when allocating chunks so that we don't try to allocate from devices that don't exist
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Use kzalloc on the fs_devices allocation | Chris Mason | 2008-09-25 | 1 | -2/+1
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Handle transid == 0 while opening devices | Chris Mason | 2008-09-25 | 1 | -1/+1
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Fix btrfs_open_devices to deal with changes since the scan ioctls | Chris Mason | 2008-09-25 | 1 | -11/+59
    Devices can change after the scan ioctls are done, and btrfs_open_devices needs to be able to verify them as they are opened and used by the FS.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add mount -o degraded to allow mounts to continue with missing devices | Chris Mason | 2008-09-25 | 1 | -78/+201
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Handle write errors on raid1 and raid10 | Chris Mason | 2008-09-25 | 1 | -3/+8
    When duplicate copies exist, writes are allowed to fail to one of those copies. This changeset includes a few changes that allow the FS to continue even when some IOs fail.
    It also adds verification of the parent generation number for btree blocks. This generation is stored in the pointer to a block, and it ensures that missed writes are detected.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
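The parent generation check described above can be sketched as a comparison between the generation recorded in the parent's pointer and the generation found in each copy of the child block. This is a minimal illustration under assumptions: `struct block_ptr`, `struct block_copy`, and `pick_valid_copy` are hypothetical names, and real btree block headers carry far more than a generation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified structures for illustration only. */
struct block_ptr {
    uint64_t blocknr;
    uint64_t generation;   /* generation the parent expects in the child */
};

struct block_copy {
    uint64_t generation;   /* generation found in this copy's header */
};

/*
 * Return the index of the first mirror whose header generation matches the
 * parent's pointer, or -1 if every copy is stale (a missed write).
 */
int pick_valid_copy(const struct block_ptr *parent,
                    const struct block_copy *copies, int num_copies)
{
    int i;

    assert(parent != NULL && (copies != NULL || num_copies == 0));
    for (i = 0; i < num_copies; i++) {
        if (copies[i].generation == parent->generation)
            return i;
    }
    return -1;
}
```

A copy whose write was dropped can still hold an older, internally consistent block, so a check against the generation stored in the parent is what exposes it.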
* Btrfs: Chunk relocation fine tuning, and add a few printks to show progress | Chris Mason | 2008-09-25 | 1 | -0/+2
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Only open block devices once during mount -o subvol= | Chris Mason | 2008-09-25 | 1 | -0/+3
    btrfs_open_devices needed a check to see if the device was already open.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add support for online device removal | Chris Mason | 2008-09-25 | 1 | -9/+212
    This required a few structural changes to the code that manages bdev pointers:
    The VFS super block now gets an anon-bdev instead of a pointer to the lowest bdev. This allows us to avoid swapping the super block bdev pointer around at run time.
    The code to read in the super block no longer goes through the extent buffer interface. Things got ugly keeping the mapping constant.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Compile warning fixup in volume.c | Chris Mason | 2008-09-25 | 1 | -1/+1
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Tune stripe selection for raid1 and raid10 | Chris Mason | 2008-09-25 | 1 | -10/+7
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Deal with failed writes in mirrored configurations | Chris Mason | 2008-09-25 | 1 | -3/+14
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Drop some verbose printks | Chris Mason | 2008-09-25 | 1 | -2/+0
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add balance ioctl to restripe the chunks | Chris Mason | 2008-09-25 | 1 | -9/+106
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add new ioctl to add devices | Chris Mason | 2008-09-25 | 1 | -0/+75
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Make the resizer work based on shrinking and growing devices | Chris Mason | 2008-09-25 | 1 | -12/+312
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add failure handling for read_sys_array | Chris Mason | 2008-09-25 | 1 | -7/+9
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Fix btrfs_get_extent and get_block corner cases, and disable O_DIRECT reads | Chris Mason | 2008-09-25 | 1 | -1/+1
    The generic O_DIRECT code assumes all the bios have the same bdev, which isn't true for multi-device btrfs.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add a special device list for chunk allocations | Chris Mason | 2008-09-25 | 1 | -5/+10
    This allows other code that needs to walk every device in the FS to do so without locking against allocations.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Simplify device selection for mirrored reads | Chris Mason | 2008-09-25 | 1 | -16/+7
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Make an unplug function that doesn't unplug every spindle | Chris Mason | 2008-09-25 | 1 | -22/+57
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add 1MB to the min_free in alloc_chunk | Chris Mason | 2008-09-25 | 1 | -0/+3
    This properly reflects the first 1MB we skip at the start of the device.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Fix chunk allocation when some devices don't have enough room for stripes | Chris Mason | 2008-09-25 | 1 | -16/+29
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Calculate appropriate chunk sizes for both small and large filesystems | Chris Mason | 2008-09-25 | 1 | -7/+61
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add support for labels in the super block | Chris Mason | 2008-09-25 | 1 | -8/+9
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Check device uuids along with devids | Chris Mason | 2008-09-25 | 1 | -7/+23
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Avoid 64 bit div for RAID10 | Chris Mason | 2008-09-25 | 1 | -1/+1
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Use the extent map cache to find the logical disk block during data retries | Chris Mason | 2008-09-25 | 1 | -0/+3
    The data read retry code needs to find the logical disk block before it can resubmit new bios. But, finding this block isn't allowed to take the fs_mutex because that will deadlock with a number of different callers.
    This changes the retry code to use the extent map cache instead, but that requires the extent map cache to have the extent we're looking for. This is a problem because btrfs_drop_extent_cache just drops the entire extent instead of the little tiny part it is invalidating.
    The bulk of the code in this patch changes btrfs_drop_extent_cache to invalidate only a portion of the extent cache, and changes btrfs_get_extent to deal with the results.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
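Invalidating only part of a cached extent, as the commit above describes, amounts to splitting the cached range around the dropped range and keeping whatever lies before and after it. The following is a minimal sketch under assumptions, not the body of btrfs_drop_extent_cache: `struct cached_extent` and `drop_extent_range` are hypothetical names, and only the file range is modeled (no disk block mapping).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified cached mapping: file range [start, start + len). */
struct cached_extent {
    uint64_t start;
    uint64_t len;
};

/*
 * Remove [drop_start, drop_start + drop_len) from *em.
 * Surviving pieces are written to *front and *back; len == 0 means no piece.
 */
void drop_extent_range(const struct cached_extent *em,
                       uint64_t drop_start, uint64_t drop_len,
                       struct cached_extent *front,
                       struct cached_extent *back)
{
    uint64_t em_end;
    uint64_t drop_end;

    assert(em != NULL && front != NULL && back != NULL);
    em_end = em->start + em->len;
    drop_end = drop_start + drop_len;

    front->start = em->start;
    front->len = 0;
    back->start = em_end;
    back->len = 0;

    /* No overlap: the whole extent survives as the front piece. */
    if (drop_end <= em->start || drop_start >= em_end) {
        front->len = em->len;
        return;
    }
    /* Keep the part of the extent that precedes the dropped range. */
    if (drop_start > em->start)
        front->len = drop_start - em->start;
    /* Keep the part of the extent that follows the dropped range. */
    if (drop_end < em_end) {
        back->start = drop_end;
        back->len = em_end - drop_end;
    }
}
```

Keeping the front and back pieces cached is what lets a later lookup, such as the read retry path, still find a mapping for the untouched parts of the extent.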
* Btrfs: Add RAID10 support | Chris Mason | 2008-09-25 | 1 | -5/+41
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Add chunk uuids and update multi-device back references | Chris Mason | 2008-09-25 | 1 | -26/+50
    Block headers now store the chunk tree uuid
    Chunk items record the device uuid for each stripe
    Device extent items record better back refs to the chunk tree
    Block groups record better back refs to the chunk tree
    The chunk tree format has also changed. The objectid of BTRFS_CHUNK_ITEM_KEY used to be the logical offset of the chunk. Now it is a chunk tree id, with the logical offset being stored in the offset field of the key.
    This allows a single chunk tree to record multiple logical address spaces, upping the number of bytes indexed by a chunk tree from 2^64 to 2^128.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
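The new key layout described above can be illustrated with a simplified key structure: the chunk tree id goes in the objectid and the logical offset goes in the offset field, so a 64-bit id combined with a 64-bit offset spans 2^128 addressable bytes. This is a sketch under assumptions: `struct key`, `make_chunk_key`, and `key_cmp` are hypothetical names, and `EXAMPLE_CHUNK_ITEM_KEY` is an arbitrary placeholder, not the on-disk value of BTRFS_CHUNK_ITEM_KEY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder type value for illustration; not the real on-disk constant. */
#define EXAMPLE_CHUNK_ITEM_KEY 1

/* Simplified key with objectid, type, and offset fields. */
struct key {
    uint64_t objectid;
    uint8_t type;
    uint64_t offset;
};

/* New layout: objectid is the chunk tree id, offset is the logical offset. */
struct key make_chunk_key(uint64_t chunk_tree_id, uint64_t logical)
{
    struct key k;

    k.objectid = chunk_tree_id;
    k.type = EXAMPLE_CHUNK_ITEM_KEY;
    k.offset = logical;
    return k;
}

/* Order keys by objectid, then type, then offset. */
int key_cmp(const struct key *a, const struct key *b)
{
    assert(a != NULL && b != NULL);
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}
```

With this ordering, all chunks of one logical address space sort together by logical offset, and a second address space simply uses a different objectid.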
* Btrfs: A few updates for 2.6.18 and versions older than 2.6.25 | Chris Mason | 2008-09-25 | 1 | -8/+7
    This includes fixing a missing spinlock init call that caused oops on mount for most kernels other than 2.6.25.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: bio_endio support for linux 2.6.23 and older. | Miguel | 2008-09-25 | 1 | -0/+4
    bio_endio() changed prototype in Linux 2.6.24; support older kernels using the older prototype.
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Write out all super blocks on commit, and bring back proper barrier support | Chris Mason | 2008-09-25 | 1 | -3/+5
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Btrfs: Retry metadata reads in the face of checksum failures | Chris Mason | 2008-09-25 | 1 | -4/+35
    Signed-off-by: Chris Mason <chris.mason@oracle.com>
* Change btrfs_map_block to return a structure with mappings for all stripes | Chris Mason | 2008-09-25 | 1 | -60/+75
    Signed-off-by: Chris Mason <chris.mason@oracle.com>