summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Btrfs: fix crash on close_ctree() if cleaner starts new transactionFilipe Manana2015-06-301-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Often when running fstests btrfs/079 I was running into the following trace during umount on one of my qemu/kvm test vms: [ 8245.682441] WARNING: CPU: 8 PID: 25064 at fs/btrfs/extent-tree.c:138 btrfs_put_block_group+0x51/0x69 [btrfs]() [ 8245.685039] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 acpi_cpufreq processor psmouse i2c_core thermal_sys parport evdev serio_raw button pcspkr microcode ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod virtio e1000 [last unloaded: btrfs] [ 8245.693860] CPU: 8 PID: 25064 Comm: umount Tainted: G W 4.1.0-rc5-btrfs-next-10+ #1 [ 8245.695081] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [ 8245.697583] 0000000000000009 ffff88020d047ce8 ffffffff8145eec7 ffffffff81095dce [ 8245.699234] 0000000000000000 ffff88020d047d28 ffffffff8104b399 0000000000000028 [ 8245.700995] ffffffffa04db07b ffff8801c6036c00 ffff8801c6036d68 ffff880202eb40b0 [ 8245.702510] Call Trace: [ 8245.703006] [<ffffffff8145eec7>] dump_stack+0x4f/0x7b [ 8245.705393] [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2 [ 8245.706569] [<ffffffff8104b399>] warn_slowpath_common+0xa1/0xbb [ 8245.707747] [<ffffffffa04db07b>] ? btrfs_put_block_group+0x51/0x69 [btrfs] [ 8245.709101] [<ffffffff8104b456>] warn_slowpath_null+0x1a/0x1c [ 8245.710274] [<ffffffffa04db07b>] btrfs_put_block_group+0x51/0x69 [btrfs] [ 8245.711823] [<ffffffffa04e3473>] btrfs_free_block_groups+0x145/0x322 [btrfs] [ 8245.713251] [<ffffffffa04ef31a>] close_ctree+0x1ef/0x325 [btrfs] [ 8245.714448] [<ffffffff8117d26e>] ? evict_inodes+0xdc/0xeb [ 8245.715539] [<ffffffffa04cb3ad>] btrfs_put_super+0x19/0x1b [btrfs] [ 8245.716835] [<ffffffff81167607>] generic_shutdown_super+0x73/0xef [ 8245.718015] [<ffffffff81167a3a>] kill_anon_super+0x13/0x1e [ 8245.719101] [<ffffffffa04cb1b6>] btrfs_kill_super+0x17/0x23 [btrfs] [ 8245.720316] [<ffffffff81167544>] deactivate_locked_super+0x3b/0x68 [ 8245.721517] [<ffffffff81167dd6>] deactivate_super+0x3f/0x43 [ 8245.722581] [<ffffffff8117fbb9>] cleanup_mnt+0x59/0x78 [ 8245.723538] [<ffffffff8117fc18>] __cleanup_mnt+0x12/0x14 [ 8245.724572] [<ffffffff81065371>] task_work_run+0x8f/0xbc [ 8245.725598] [<ffffffff810028fb>] do_notify_resume+0x45/0x53 [ 8245.726892] [<ffffffff814651ac>] int_signal+0x12/0x17 [ 8245.737887] ---[ end trace a01d038397e99b92 ]--- [ 8245.769363] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [ 8245.770737] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse parport_pc i2c_piix4 acpi_cpufreq processor psmouse i2c_core thermal_sys parport evdev serio_raw button pcspkr microcode ext4 crc16 jbd2 mbcache sg sr_mod cdrom sd_mod ata_generic virtio_scsi ata_piix libata floppy virtio_pci virtio_ring scsi_mod virtio e1000 [last unloaded: btrfs] [ 8245.772641] CPU: 2 PID: 25064 Comm: umount Tainted: G W 4.1.0-rc5-btrfs-next-10+ #1 [ 8245.772641] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [ 8245.772641] task: ffff880013005810 ti: ffff88020d044000 task.ti: ffff88020d044000 [ 8245.772641] RIP: 0010:[<ffffffffa051c8e6>] [<ffffffffa051c8e6>] btrfs_queue_work+0x2c/0x14d [btrfs] [ 8245.772641] RSP: 0018:ffff88020d0478b8 EFLAGS: 00010202 [ 8245.772641] RAX: 0000000000000004 RBX: 6b6b6b6b6b6b6b6b RCX: ffffffffa0581488 [ 8245.772641] RDX: 0000000000000000 RSI: ffff880194b7bf48 RDI: ffff880144b6a7a0 [ 8245.772641] RBP: ffff88020d0478d8 R08: 0000000000000000 R09: 000000000000ffff [ 8245.772641] R10: 0000000000000004 R11: 0000000000000005 R12: ffff880194b7bf48 [ 8245.772641] R13: ffff880194b7bf48 R14: 0000000000000410 R15: 0000000000000000 [ 8245.772641] FS: 00007f991e77d840(0000) GS:ffff88023e280000(0000) knlGS:0000000000000000 [ 8245.772641] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 8245.772641] CR2: 00007fbbd325ee68 CR3: 000000021de8e000 CR4: 00000000000006e0 [ 8245.772641] Stack: [ 8245.772641] ffff880194b7bf00 ffff880202eb4000 ffff880194b7bf48 0000000000000410 [ 8245.772641] ffff88020d047958 ffffffffa04ec6d5 ffff8801629b2ee8 0000000082987570 [ 8245.772641] 0000000000a5813f 0000000000000001 ffff880013006100 0000000000000002 [ 8245.772641] Call Trace: [ 8245.772641] [<ffffffffa04ec6d5>] btrfs_wq_submit_bio+0xe1/0x17b [btrfs] [ 8245.772641] [<ffffffff81086bff>] ? check_irq_usage+0x76/0x87 [ 8245.772641] [<ffffffffa04ec825>] btree_submit_bio_hook+0xb6/0xd9 [btrfs] [ 8245.772641] [<ffffffffa04ebb7c>] ? btree_csum_one_bio+0xad/0xad [btrfs] [ 8245.772641] [<ffffffffa04eb1a6>] ? btree_io_failed_hook+0x5e/0x5e [btrfs] [ 8245.772641] [<ffffffffa050a6e7>] submit_one_bio+0x8c/0xc7 [btrfs] [ 8245.772641] [<ffffffffa050d75b>] submit_extent_page.isra.18+0x9d/0x186 [btrfs] [ 8245.772641] [<ffffffffa050d95b>] write_one_eb+0x117/0x1ae [btrfs] [ 8245.772641] [<ffffffffa050a79b>] ? end_extent_buffer_writeback+0x21/0x21 [btrfs] [ 8245.772641] [<ffffffffa0510510>] btree_write_cache_pages+0x2ab/0x385 [btrfs] [ 8245.772641] [<ffffffffa04eb2b8>] btree_writepages+0x23/0x5c [btrfs] [ 8245.772641] [<ffffffff8111c661>] do_writepages+0x23/0x2c [ 8245.772641] [<ffffffff81189cd4>] __writeback_single_inode+0xda/0x5bd [ 8245.772641] [<ffffffff8118aa60>] ? writeback_single_inode+0x2b/0x173 [ 8245.772641] [<ffffffff8118aafd>] writeback_single_inode+0xc8/0x173 [ 8245.772641] [<ffffffff8118ac95>] write_inode_now+0x8a/0x95 [ 8245.772641] [<ffffffff81247bf0>] ? _atomic_dec_and_lock+0x30/0x4e [ 8245.772641] [<ffffffff8117cc5e>] iput+0x17d/0x26a [ 8245.772641] [<ffffffffa04ef355>] close_ctree+0x22a/0x325 [btrfs] [ 8245.772641] [<ffffffff8117d26e>] ? evict_inodes+0xdc/0xeb [ 8245.772641] [<ffffffffa04cb3ad>] btrfs_put_super+0x19/0x1b [btrfs] [ 8245.772641] [<ffffffff81167607>] generic_shutdown_super+0x73/0xef [ 8245.772641] [<ffffffff81167a3a>] kill_anon_super+0x13/0x1e [ 8245.772641] [<ffffffffa04cb1b6>] btrfs_kill_super+0x17/0x23 [btrfs] [ 8245.772641] [<ffffffff81167544>] deactivate_locked_super+0x3b/0x68 [ 8245.772641] [<ffffffff81167dd6>] deactivate_super+0x3f/0x43 [ 8245.772641] [<ffffffff8117fbb9>] cleanup_mnt+0x59/0x78 [ 8245.772641] [<ffffffff8117fc18>] __cleanup_mnt+0x12/0x14 [ 8245.772641] [<ffffffff81065371>] task_work_run+0x8f/0xbc [ 8245.772641] [<ffffffff810028fb>] do_notify_resume+0x45/0x53 [ 8245.772641] [<ffffffff814651ac>] int_signal+0x12/0x17 [ 8245.772641] Code: 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49 89 f4 48 8b 46 70 a8 04 74 09 48 8b 5f 08 48 85 db 75 03 48 8b 1f 49 89 5c 24 68 <83> 7b 5c ff 74 04 f0 ff 43 50 49 83 7c 24 08 00 74 2c 4c 8d 6b [ 8245.772641] RIP [<ffffffffa051c8e6>] btrfs_queue_work+0x2c/0x14d [btrfs] [ 8245.772641] RSP <ffff88020d0478b8> [ 8245.845040] ---[ end trace a01d038397e99b93 ]--- For logical reasons such as the phase of the moon, this happened more often with "-o inode_cache" than without any mount options. After some debugging it turned out to be simple to understand what was happening: 1) close_ctree() is called; 2) It then stops the transaction kthread, which commits the current transaction; 3) It asks the cleaner kthread to stop, which is currently running btrfs_delete_unused_bgs(); 4) btrfs_delete_unused_bgs() finds an unused block group, starts a new transaction, deletes the block group, which implies COWing some tree nodes and leafs and dirtying their respective pages, and then finally it ends the transaction it started, without committing it; 5) The cleaner kthread stops; 6) close_ctree() releases (from memory) the block group objects, which produces the warning in the trace pasted above; 7) Then it invalidates all pages of the btree inode, by calling invalidate_inode_pages2(), which waits for any pages under writeback, and releases any non-dirty pages; 8) All work queues are destroyed (waiting first for their current tasks to finish execution); 9) A final iput() is called against the btree inode; 10) This iput triggers a writeback of the btree inode because it still has dirty pages; 11) This starts the whole chain of callbacks for the btree inode until it eventually reaches btrfs_wq_submit_bio() where it leads to a NULL pointer dereference because the work queues were already destroyed. Fix this by making the cleaner commit any transaction that it started after the transaction kthread was stopped. Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
* Btrfs: fix race between caching kthread and returning inode to inode cacheFilipe Manana2015-06-301-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While the inode cache caching kthread is calling btrfs_unpin_free_ino(), we could have a concurrent call to btrfs_return_ino() that adds a new entry to the root's free space cache of pinned inodes. This concurrent call does not acquire the fs_info->commit_root_sem before adding a new entry if the caching state is BTRFS_CACHE_FINISHED, which is a problem because the caching kthread calls btrfs_unpin_free_ino() after setting the caching state to BTRFS_CACHE_FINISHED and therefore races with the task calling btrfs_return_ino(), which is adding a new entry, while the former (caching kthread) is navigating the cache's rbtree, removing and freeing nodes from the cache's rbtree without acquiring the spinlock that protects the rbtree. This race resulted in memory corruption due to double free of struct btrfs_free_space objects because both tasks can end up doing freeing the same objects. Note that adding a new entry can result in merging it with other entries in the cache, in which case those entries are freed. This is particularly important as btrfs_free_space structures are also used for the block group free space caches. This memory corruption can be detected by a debugging kernel, which reports it with the following trace: [132408.501148] slab error in verify_redzone_free(): cache `btrfs_free_space': double free detected [132408.505075] CPU: 15 PID: 12248 Comm: btrfs-ino-cache Tainted: G W 4.1.0-rc5-btrfs-next-10+ #1 [132408.505075] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [132408.505075] ffff880023e7d320 ffff880163d73cd8 ffffffff8145eec7 ffffffff81095dce [132408.505075] ffff880009735d40 ffff880163d73ce8 ffffffff81154e1e ffff880163d73d68 [132408.505075] ffffffff81155733 ffffffffa054a95a ffff8801b6099f00 ffffffffa0505b5f [132408.505075] Call Trace: [132408.505075] [<ffffffff8145eec7>] dump_stack+0x4f/0x7b [132408.505075] [<ffffffff81095dce>] ? console_unlock+0x356/0x3a2 [132408.505075] [<ffffffff81154e1e>] __slab_error.isra.28+0x25/0x36 [132408.505075] [<ffffffff81155733>] __cache_free+0xe2/0x4b6 [132408.505075] [<ffffffffa054a95a>] ? __btrfs_add_free_space+0x2f0/0x343 [btrfs] [132408.505075] [<ffffffffa0505b5f>] ? btrfs_unpin_free_ino+0x8e/0x99 [btrfs] [132408.505075] [<ffffffff810f3b30>] ? time_hardirqs_off+0x15/0x28 [132408.505075] [<ffffffff81084d42>] ? trace_hardirqs_off+0xd/0xf [132408.505075] [<ffffffff811563a1>] ? kfree+0xb6/0x14e [132408.505075] [<ffffffff811563d0>] kfree+0xe5/0x14e [132408.505075] [<ffffffffa0505b5f>] btrfs_unpin_free_ino+0x8e/0x99 [btrfs] [132408.505075] [<ffffffffa0505e08>] caching_kthread+0x29e/0x2d9 [btrfs] [132408.505075] [<ffffffffa0505b6a>] ? btrfs_unpin_free_ino+0x99/0x99 [btrfs] [132408.505075] [<ffffffff8106698f>] kthread+0xef/0xf7 [132408.505075] [<ffffffff810f3b08>] ? time_hardirqs_on+0x15/0x28 [132408.505075] [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad [132408.505075] [<ffffffff814653d2>] ret_from_fork+0x42/0x70 [132408.505075] [<ffffffff810668a0>] ? __kthread_parkme+0xad/0xad [132408.505075] ffff880023e7d320: redzone 1:0x9f911029d74e35b, redzone 2:0x9f911029d74e35b. [132409.501654] slab: double free detected in cache 'btrfs_free_space', objp ffff880023e7d320 [132409.503355] ------------[ cut here ]------------ [132409.504241] kernel BUG at mm/slab.c:2571! Therefore fix this by having btrfs_unpin_free_ino() acquire the lock that protects the rbtree while doing the searches and removing entries. Fixes: 1c70d8fb4dfa ("Btrfs: fix inode caching vs tree log") Cc: stable@vger.kernel.org Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
* Btrfs: use kmem_cache_free when freeing entry in inode cacheFilipe Manana2015-06-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | The free space entries are allocated using kmem_cache_zalloc(), through __btrfs_add_free_space(), therefore we should use kmem_cache_free() and not kfree() to avoid any confusion and any potential problem. Looking at the kfree() definition at mm/slab.c it has the following comment: /* * (...) * * Don't free memory not originally allocated by kmalloc() * or you will run into trouble. */ So better be safe and use kmem_cache_free(). Cc: stable@vger.kernel.org Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com>
* Btrfs: fix race between balance and unused block group deletionFilipe Manana2015-06-304-6/+58
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a race between deleting an unused block group and balancing the same block group that leads to an assertion failure/BUG(), producing the following trace: [181631.208236] BTRFS: assertion failed: 0, file: fs/btrfs/volumes.c, line: 2622 [181631.220591] ------------[ cut here ]------------ [181631.222959] kernel BUG at fs/btrfs/ctree.h:4062! [181631.223932] invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC [181631.224566] Modules linked in: btrfs dm_flakey dm_mod crc32c_generic xor raid6_pq nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc loop fuse acpi_cpufreq parpor$ [181631.224566] CPU: 8 PID: 17451 Comm: btrfs Tainted: G W 4.1.0-rc5-btrfs-next-10+ #1 [181631.224566] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20150316_085822-nilsson.home.kraxel.org 04/01/2014 [181631.224566] task: ffff880127e09590 ti: ffff8800b5824000 task.ti: ffff8800b5824000 [181631.224566] RIP: 0010:[<ffffffffa03f19f6>] [<ffffffffa03f19f6>] assfail.constprop.50+0x1e/0x20 [btrfs] [181631.224566] RSP: 0018:ffff8800b5827ae8 EFLAGS: 00010246 [181631.224566] RAX: 0000000000000040 RBX: ffff8800109fc218 RCX: ffffffff81095dce [181631.224566] RDX: 0000000000005124 RSI: ffffffff81464819 RDI: 00000000ffffffff [181631.224566] RBP: ffff8800b5827ae8 R08: 0000000000000001 R09: 0000000000000000 [181631.224566] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8800109fc200 [181631.224566] R13: ffff880020095000 R14: ffff8800b1a13f38 R15: ffff880020095000 [181631.224566] FS: 00007f70ca0b0c80(0000) GS:ffff88013ec00000(0000) knlGS:0000000000000000 [181631.224566] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [181631.224566] CR2: 00007f2872ab6e68 CR3: 00000000a717c000 CR4: 00000000000006e0 [181631.224566] Stack: [181631.224566] ffff8800b5827ba8 ffffffffa03f3916 ffff8800b5827b38 ffffffffa03d080e [181631.224566] ffffffffa03d1423 ffff880020095000 ffff88001233c000 0000000000000001 [181631.224566] ffff880020095000 ffff8800b1a13f38 0000000a69c00000 0000000000000000 [181631.224566] Call Trace: [181631.224566] [<ffffffffa03f3916>] btrfs_remove_chunk+0xa4/0x6bb [btrfs] [181631.224566] [<ffffffffa03d080e>] ? join_transaction.isra.8+0xb9/0x3ba [btrfs] [181631.224566] [<ffffffffa03d1423>] ? wait_current_trans.isra.13+0x22/0xfc [btrfs] [181631.224566] [<ffffffffa03f3fbc>] btrfs_relocate_chunk.isra.29+0x8f/0xa7 [btrfs] [181631.224566] [<ffffffffa03f54df>] btrfs_balance+0xaa4/0xc52 [btrfs] [181631.224566] [<ffffffffa03fd388>] btrfs_ioctl_balance+0x23f/0x2b0 [btrfs] [181631.224566] [<ffffffff810872f9>] ? trace_hardirqs_on+0xd/0xf [181631.224566] [<ffffffffa04019a3>] btrfs_ioctl+0xfe2/0x2220 [btrfs] [181631.224566] [<ffffffff812603ed>] ? __this_cpu_preempt_check+0x13/0x15 [181631.224566] [<ffffffff81084669>] ? arch_local_irq_save+0x9/0xc [181631.224566] [<ffffffff81138def>] ? handle_mm_fault+0x834/0xcd2 [181631.224566] [<ffffffff81138def>] ? handle_mm_fault+0x834/0xcd2 [181631.224566] [<ffffffff8103e48c>] ? __do_page_fault+0x211/0x424 [181631.224566] [<ffffffff811755e6>] do_vfs_ioctl+0x3c6/0x479 (...) The sequence of steps leading to this are: CPU 0 CPU 1 btrfs_balance() btrfs_relocate_chunk() btrfs_relocate_block_group(bg X) btrfs_lookup_block_group(bg X) cleaner_kthread locks fs_info->cleaner_mutex btrfs_delete_unused_bgs() finds bg X, which became unused in the previous transaction checks bg X ->ro == 0, so it proceeds sets bg X ->ro to 1 (btrfs_set_block_group_ro(bg X)) blocks on fs_info->cleaner_mutex btrfs_remove_chunk(bg X) unlocks fs_info->cleaner_mutex acquires fs_info->cleaner_mutex relocate_block_group() --> does nothing, no extents found in the extent tree from bg X unlocks fs_info->cleaner_mutex btrfs_relocate_block_group(bg X) returns btrfs_remove_chunk(bg X) extent map not found --> ASSERT(0) Fix this by using a new mutex to make sure these 2 operations, block group relocation and removal, are serialized. This issue is reproducible by running fstests generic/038 (which stresses chunk allocation and automatic removal of unused block groups) together with the following balance loop: while true; do btrfs balance start -dusage=0 <mountpoint> ; done Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Chris Mason <clm@fb.com>
* btrfs: add error handling for scrub_workers_get()Zhao Lei2015-06-301-19/+20
| | | | | | | | | Although it is a rare case, we'd better free previous allocated memory on error. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* btrfs: cleanup noused initialization of dev in btrfs_end_bio()Zhao Lei2015-06-301-1/+1
| | | | | | | | | | | | It is introduced by: c404e0dc2c843b154f9a36c3aec10d0a715d88eb Btrfs: fix use-after-free in the finishing procedure of the device replace But seems no relationship with that bug, this patch revirt these code block for cleanup. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* btrfs: qgroup: allow user to clear the limitation on qgroupYang Dongsheng2015-06-301-8/+41
| | | | | | | | | | | | | Currently, we can only set a limitation on a qgroup, but we can not clear it. This patch provide a choice to user to clear a limitation on qgroup by passing a value of CLEAR_VALUE(-1) to kernel. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com> Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* btrfs: delayed-ref: double free in btrfs_add_delayed_tree_ref()Dan Carpenter2015-06-241-9/+11
| | | | | | | | | There is a cut and paste error so instead of freeing "head_ref", we free "ref" twice. Fixes: 3368d001ba5d ('btrfs: qgroup: Record possible quota-related extent for qgroup.') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chris Mason <clm@fb.com>
* Merge branch 'sysfs-fsdevices-4.2-part1' of ↵Chris Mason2015-06-238-53/+239
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux into anand
| * Btrfs: Check if kobject is initialized before putAnand Jain2015-06-221-3/+5
| | | | | | | | | | | | Signed-off-by: Anand Jain <anand.jain@oracle.com> Tested-by: David Sterba <dsterba@suse.cz> Signed-off-by: David Sterba <dsterba@suse.cz>
| * lib: export symbol kobject_move()Anand Jain2015-06-191-0/+1
| | | | | | | | | | | | | | | | | | drivers/cpufreq/cpufreq.c is already using this function. And now btrfs needs it as well. Export symbol kobject_move(). Signed-off-by: Anand Jain <anand.jain@oracle.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: add support to show replacing target in the sysfsAnand Jain2015-06-192-1/+7
| | | | | | | | | | | | | | | | This patch will add support to show the replacing target in sysfs during the process of replacement. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: free the stale deviceAnand Jain2015-06-191-0/+61
| | | | | | | | | | | | | | | | | | When btrfs on a device is overwritten with a new btrfs (mkfs), the old btrfs instance in the kernel becomes stale. So with this patch, if kernel finds device is overwritten then delete the stale fsid/uuid. Signed-off-by: Anand Jain <anand.jain@oracle.com>
| * Btrfs: sysfs: don't fail seeding for the sake of sysfs kobject issueAnand Jain2015-05-271-1/+1
| | | | | | | | | | Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: add support to add parent for fsidAnand Jain2015-05-271-2/+2
| | | | | | | | | | | | | | | | | | To support seed sysfs layout and represent seed fsid under the sprout we need the facility to create fsid under the specified parent. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: separate kobject and attribute creationAnand Jain2015-05-272-14/+19
| | | | | | | | | | Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: btrfs_sysfs_remove_fsid() make it non staticAnand Jain2015-05-272-1/+2
| | | | | | | | | | Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: make btrfs_sysfs_add_device() non staticAnand Jain2015-05-271-0/+1
| | | | | | | | | | Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: make btrfs_sysfs_add_fsid() non staticAnand Jain2015-05-272-1/+3
| | | | | | | | | | Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs btrfs_kobj_rm_device() pass fs_devices instead of fs_infoAnand Jain2015-05-274-10/+10
| | | | | | | | | | | | | | since btrfs_kobj_rm_device() does nothing with fs_info Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs btrfs_kobj_add_device() pass fs_devices instead of fs_infoAnand Jain2015-05-274-7/+6
| | | | | | | | | | | | | | btrfs_kobj_add_device() does not need fs_info any more. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: provide framework to remove all fsid sysfs kobjectAnand Jain2015-05-271-1/+16
| | | | | | | | | | | | | | Just a helper function to clean up the sysfs fsid kobjects. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: add pointer to access fs_info from fs_devicesAnand Jain2015-05-273-0/+25
| | | | | | | | | | | | | | adds fs_info pointer with struct btrfs_fs_devices. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: introduce btrfs_get_fs_uuids to get fs_uuidsAnand Jain2015-05-272-0/+5
| | | | | | | | | | Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: move super_kobj and device_dir_kobj from fs_info to ↵Anand Jain2015-05-274-43/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_fs_devices This patch will provide a framework and help to create attributes from the structure btrfs_fs_devices which are available even before fs_info is created. So by moving the parent kobject super_kobj from fs_info to btrfs_fs_devices, it will help to create attributes from the btrfs_fs_devices as well. Patches on top of this patch now will be able to create the sys/fs/btrfs/fsid kobject and attributes from btrfs_fs_devices when devices are scanned and registered to the kernel. Just to note, this does not change any of the existing btrfs sysfs external kobject names and its attributes and not even the life cycle of them. Changes are internal only. And to ensure the same, this path has been tested with various device operations and, checking and comparing the sysfs kobjects and attributes with sysfs kobject and attributes with out this patch, and they remain same. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: separate device kobject and its attribute creationAnand Jain2015-05-271-6/+15
| | | | | | | | | | | | | | | | Separate device kobject and its attribute creation so that device kobject can be created from the device discovery thread. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: let default_attrs be separate from the ksetAnand Jain2015-05-271-4/+8
| | | | | | | | | | | | | | | | | | | | | | As of now btrfs_attrs are provided using the default_attrs through the kset. Separate them and create the default_attrs using the sysfs_create_files instead. By doing this we will have the flexibility that device discovery thread could create fsid kobject. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: introduce function btrfs_sysfs_add_fsid() to create sysfs fsidAnand Jain2015-05-271-1/+14
| | | | | | | | | | | | | | | | We need it in a seperate function so that it can be called from the device discovery thread as well. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: rename __btrfs_sysfs_remove_one to btrfs_sysfs_remove_fsidAnand Jain2015-05-271-4/+4
| | | | | | | | | | Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: reorder the kobject creationsAnand Jain2015-05-271-10/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of now the order in which the kobjects are created at btrfs_sysfs_add_one() is.. fsid features unknown features (dynamic features) devices. Since we would move fsid and device kobject to fs_devices from fs_info structure, this patch will reorder in which the kobjects are created as below. fsid devices features unknown features (dynamic features) And hence the btrfs_sysfs_remove_one() will follow the same in reverse order. and the device kobject destroy now can be moved into the function __btrfs_sysfs_remove_one() Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfc: sysfs: fix, check if device_dir_kobj is init before destroyAnand Jain2015-05-271-4/+6
| | | | | | | | | | | | | | | | | | Since the failure code in the btrfs_sysfs_add_one() can call btrfs_sysfs_remove_one() even before device_dir_kobj has been created we need to check if its null. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: fix, kobject pointer clean up needed after kobject releaseAnand Jain2015-05-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The sysfs clean up self test like in the below code fails, since fs_info->device_dir_kobject still points to its stale kobject. Reseting this pointer will help to fix this. open_ctree() { ret = btrfs_sysfs_add_one(fs_info); :: + btrfs_sysfs_remove_one(fs_info); + ret = btrfs_sysfs_add_one(fs_info); + if (ret) { + pr_err("BTRFS: failed to init sysfs interface: %d\n", ret); + goto fail_block_groups; + } Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: fix, undo sysfs device linksAnand Jain2015-05-271-0/+17
| | | | | | | | | | | | | | | | Theoritically need to remove the device links attributes, but since its entire device kobject was removed, so there wasn't any issue of about it. Just do it nicely. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: fix, fs_info kobject_unregister has init_completion() twiceAnand Jain2015-05-271-1/+0
| | | | | | | | | | | | | | | | | | kobject_unregister is to handle the release of the kobject, its completion init is being called in btrfs_sysfs_add_one(), so we don't have to do the same in the open_ctree() again. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
| * Btrfs: sysfs: fix, btrfs_release_super_kobj() should to clean up the kobject ↵Anand Jain2015-05-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | data The following test case fails indicating that, thread tried to init an initialized object. kernel: [232104.016513] kobject (ffff880006c1c980): tried to init an initialized object, something is seriously wrong. btrfs_sysfs_remove_one() self test code: open_tree() { :: ret = btrfs_sysfs_add_one(fs_info); if (ret) { pr_err("BTRFS: failed to init sysfs interface: %d\n", ret); goto fail_block_groups; } + btrfs_sysfs_remove_one(fs_info); + ret = btrfs_sysfs_add_one(fs_info); + if (ret) { + pr_err("BTRFS: failed to init sysfs interface: %d\n", ret); + goto fail_block_groups; + } cleaning up the unregistered kobject fixes this. Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.cz>
* | Btrfs: use received_uuid of parent during sendJosef Bacik2015-06-121-4/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Neil Horman pointed out a problem where if he did something like this receive A snap A B change B send -p A B and then on another box do recieve A receive B the receive B would fail because we use the UUID of A for the clone sources for B. This makes sense most of the time because normally you are sending from the original sources, not a received source. However when you use a recieved subvol its UUID is going to be something completely different, so if you then try to receive the diff on a different volume it won't find the UUID because the new A will be something else. The only constant is the received uuid. So instead check to see if we have received_uuid set on the root, and if so use that as the clone source, as btrfs receive looks for matches either in received_uuid or uuid. Thanks, Reported-by: Neil Horman <nhorman@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Reviewed-by: Hugo Mills <hugo@carfax.org.uk> Signed-off-by: Chris Mason <clm@fb.com>
* | Btrfs: fix use-after-free in btrfs_replay_logLiu Bo2015-06-121-1/+2
| | | | | | | | | | | | | | | | | | @log_root_tree should not be referenced after kfree. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Reviewed-by: David Sterba <dsterba@suse.cz> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: wait for delayed iputs on no spaceZhao Lei2015-06-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs will report no_space when we run following write and delete file loop: # FILE_SIZE_M=[ 75% of fs space ] # DEV=[ some dev ] # MNT=[ some dir ] # # mkfs.btrfs -f "$DEV" # mount -o nodatacow "$DEV" "$MNT" # for ((i = 0; i < 100; i++)); do dd if=/dev/zero of="$MNT"/file0 bs=1M count="$FILE_SIZE_M"; rm -f "$MNT"/file0; done # Reason: iput() and evict() is run after write pages to block device, if write pages work is not finished before next write, the "rm"ed space is not freed, and caused above bug. Fix: We can add "-o flushoncommit" mount option to avoid above bug, but it have performance problem. Actually, we can to wait for on-the-fly writes only when no-space happened, it is which this patch do. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: qgroup: Make snapshot accounting work with new extent-orientedQu Wenruo2015-06-101-20/+33
| | | | | | | | | | | | | | | | | | | | qgroup. Make snapshot accounting work with new extent-oriented mechanism by skipping given root in new/old_roots in create_pending_snapshot(). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: qgroup: Add the ability to skip given qgroup for old/new_roots.Qu Wenruo2015-06-104-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | This is used by later qgroup fix patches for snapshot. As current snapshot accounting is done by btrfs_qgroup_inherit(), but new extent oriented quota mechanism will account extent from btrfs_copy_root() and other snapshot things, causing wrong result. So add this ability to handle snapshot accounting. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: ulist: Add ulist_del() function.Qu Wenruo2015-06-102-11/+37
| | | | | | | | | | | | | | | | | | | | | | This function will delete unode with given (val,aux) pair. And with this patch, seqnum for debug usage doesn't have any meaning now, so remove them. This is used by later patches to skip snapshot root. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: qgroup: Cleanup the old ref_node-oriented mechanism.Qu Wenruo2015-06-105-972/+3
| | | | | | | | | | | | | | Goodbye, the old mechanisim. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: qgroup: Switch self test to extent-oriented qgroup mechanism.Qu Wenruo2015-06-103-27/+89
| | | | | | | | | | | | | | | | Since the self test transaction don't have delayed_ref_roots, so use find_all_roots() and export btrfs_qgroup_account_extent() to simulate it Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: qgroup: Switch to new extent-oriented qgroup mechanism.Qu Wenruo2015-06-102-100/+28
| | | | | | | | | | | | | | | | | | | | | | | | Switch from old ref_node based qgroup to extent based qgroup mechanism for normal operations. The new mechanism should hugely reduce the overhead of btrfs quota system, and further more, the codes and logic should be more clean and easier to maintain. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: qgroup: Switch rescan to new mechanism.Qu Wenruo2015-06-101-36/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | Switch rescan to use the new new extent oriented mechanism. As rescan is also based on extent, new mechanism is just a perfect match for rescan. With re-designed internal functions, rescan is quite easy, just call btrfs_find_all_roots() and then btrfs_qgroup_account_one_extent(). Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: qgroup: Add new qgroup calculation functionQu Wenruo2015-06-103-0/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_qgroup_account_extents(). The new btrfs_qgroup_account_extents() function should be called in btrfs_commit_transaction() and it will update all the qgroup according to delayed_ref_root->dirty_extent_root. The new function can handle both normal operation during commit_transaction() or in rescan in a unified method with clearer logic. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: backref: Add special time_seq == (u64)-1 case forQu Wenruo2015-06-101-6/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | btrfs_find_all_roots(). Allow btrfs_find_all_roots() to skip all delayed_ref_head lock and tree lock to do tree search. This is important for later qgroup implement which will call find_all_roots() after fs trees are committed. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: qgroup: Add new function to record old_roots.Qu Wenruo2015-06-102-0/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add function btrfs_qgroup_prepare_account_extents() to get old_roots which are needed for qgroup. We do it in commit_transaction() and before switch_roots(), and only search commit_root, so it gives a quite accurate view for previous transaction. With old_roots from previous transaction, we can use it to do accurate account with current transaction. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: qgroup: Record possible quota-related extent for qgroup.Qu Wenruo2015-06-105-7/+95
| | | | | | | | | | | | | | | | | | Add hook in add_delayed_ref_head() to record quota-related extent record into delayed_ref_root->dirty_extent_record rb-tree for later qgroup accounting. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
* | btrfs: qgroup: Add function qgroup_update_counters().Qu Wenruo2015-06-101-0/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | Add function qgroup_update_counters(), which will update related qgroups' rfer/excl according to old/new_roots. This is one of the two core functions for the new qgroup implement. This is based on btrfs_adjust_coutners() but with clearer logic and comment. Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com>
OpenPOWER on IntegriCloud