summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'irqchip/urgent-gic' into irqchip/urgentJason Cooper2015-03-292-9/+65
|\
| * irqchip: gicv3-its: Use non-cacheable accesses when no shareabilityMarc Zyngier2015-03-292-4/+47
| | | | | | | | | | | | | | | | | | | | If the ITS or the redistributors report their shareability as zero, then it is important to make sure they will no generate any cacheable traffic, as this is unlikely to produce the expected result. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1427465705-17126-5-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gicv3-its: Fix PROP/PEND and BASE/CBASE confusionMarc Zyngier2015-03-292-3/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ITS driver sometime mixes up the use of GICR_PROPBASE bitfields for the GICR_PENDBASE register, and GITS_BASER for GICR_CBASE. This does not lead to any observable bug because similar bits are at the same location, but this just make the code even harder to understand... This patch provides the required #defines and fixes the mixup. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1427465705-17126-4-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gicv3-its: Fix device ID encodingAndre Przywara2015-03-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | When building ITS commands which have the device ID in it, we should mask off the whole upper 32 bits of the first command word before inserting the new value in there. Signed-off-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1427465705-17126-3-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gicv3-its: Fix encoding of collection's target redistributorMarc Zyngier2015-03-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With a monolithic GICv3, redistributors are addressed using a linear number, while a distributed implementation uses physical addresses. When encoding a target address into a command, we strip the lower 16 bits, as redistributors are always 64kB aligned. This works perfectly well with a distributed implementation, but has the silly effect of always encoding target 0 in the monolithic case (unless you have more than 64k CPUs, of course). The obvious fix is to shift the linear target number by 16 when computing the target address, so that we don't loose any precious bit. Reported-by: Andre Przywara <andre.przywara@arm.com> Tested-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1427465705-17126-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
* | Merge branch 'irqchip/urgent-gic' into irqchip/urgentJason Cooper2015-03-154-38/+146
|\ \ | |/
| * irqchip: gicv3-its: Support safe initializationYun Wu2015-03-081-0/+35
| | | | | | | | | | | | | | | | | | | | | | | | It's unsafe to change the configurations of an activated ITS directly since this will lead to unpredictable results. This patch guarantees the ITSes being initialized are quiescent. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Yun Wu <wuyun.wu@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1425659870-11832-12-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gicv3-its: Define macros for GITS_CTLR fieldsYun Wu2015-03-082-1/+4
| | | | | | | | | | | | | | | | | | | | Define macros for GITS_CTLR fields to avoid using magic numbers. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Yun Wu <wuyun.wu@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1425659870-11832-11-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gicv3-its: Add limitation to page orderYun Wu2015-03-081-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When required size of Device Table is out of the page allocator's capability, the whole ITS will fail in probing. This actually is not the hardware's problem and is mainly a limitation of the kernel page allocator. This patch will keep ITS going on to the next initializaion stage with an explicit warning. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Yun Wu <wuyun.wu@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1425659870-11832-10-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gicv3-its: Use 64KB page as default granuleYun Wu2015-03-081-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The field of page size in register GITS_BASERn might be read-only if an implementation only supports a single, fixed page size. But currently the ITS driver will throw out an error when PAGE_SIZE is less than the minimum size supported by an ITS. So addressing this problem by using 64KB pages as default granule for all the ITS base tables. Acked-by: Marc Zyngier <marc.zyngier@arm.com> [maz: fixed bug breaking non Device Table allocations] Signed-off-by: Yun Wu <wuyun.wu@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1425659870-11832-9-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gicv3-its: Zero itt before handling to hardwareYun Wu2015-03-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some kind of brain-dead implementations chooses to insert ITEes in rapid sequence of disabled ITEes, and an un-zeroed ITT will confuse ITS on judging whether an ITE is really enabled or not. Considering the implementations are still supported by the GICv3 architecture, in which ITT is not required to be zeroed before being handled to hardware, we do the favor in ITS driver. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Yun Wu <wuyun.wu@huawei.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1425659870-11832-8-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gic-v3: Fix out of bounds access to cpu_logical_mapVladimir Murzin2015-03-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While playing with KASan support for arm64/arm the following appeared on boot: ================================================================== BUG: AddressSanitizer: out of bounds access in __asan_load8+0x14/0x1c at addr ffffffc000ad0dc0 Read of size 8 by task swapper/0/1 page:ffffffbdc202b400 count:1 mapcount:0 mapping: (null) index:0x0 flags: 0x400(reserved) page dumped because: kasan: bad access detected Address belongs to variable __cpu_logical_map+0x200/0x220 CPU: 2 PID: 1 Comm: swapper/0 Not tainted 3.19.0-rc6-next-20150129+ #481 Hardware name: FVP Base (DT) Call trace: [<ffffffc00008a794>] dump_backtrace+0x0/0x184 [<ffffffc00008a928>] show_stack+0x10/0x1c [<ffffffc00075e46c>] dump_stack+0xa0/0xf8 [<ffffffc0001df490>] kasan_report_error+0x23c/0x264 [<ffffffc0001e0188>] check_memory_region+0xc0/0xe4 [<ffffffc0001dedf0>] __asan_load8+0x10/0x1c [<ffffffc000431294>] gic_raise_softirq+0xc4/0x1b4 [<ffffffc000091fc0>] smp_send_reschedule+0x30/0x3c [<ffffffc0000f0d1c>] try_to_wake_up+0x394/0x434 [<ffffffc0000f0de8>] wake_up_process+0x2c/0x6c [<ffffffc0000d9570>] wake_up_worker+0x38/0x48 [<ffffffc0000dbb50>] insert_work+0xac/0xec [<ffffffc0000dbd38>] __queue_work+0x1a8/0x374 [<ffffffc0000dbf60>] queue_work_on+0x5c/0x7c [<ffffffc0000d8a78>] call_usermodehelper_exec+0x170/0x188 [<ffffffc0004037b8>] kobject_uevent_env+0x650/0x6bc [<ffffffc000403830>] kobject_uevent+0xc/0x18 [<ffffffc00040292c>] kset_register+0xa8/0xc8 [<ffffffc0004d6c88>] bus_register+0x134/0x2e8 [<ffffffc0004d73b4>] subsys_virtual_register+0x2c/0x5c [<ffffffc000a76a4c>] wq_sysfs_init+0x14/0x20 [<ffffffc000082a28>] do_one_initcall+0xa8/0x1fc [<ffffffc000a70db4>] kernel_init_freeable+0x1ec/0x294 [<ffffffc00075aa5c>] kernel_init+0xc/0xec Memory state around the buggy address: ffffff80003e0820: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffff80003e0830: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffffff80003e0840: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00 ^ ffffff80003e0850: 00 00 fa fa fa fa fa fa 00 00 00 00 00 00 00 00 ================================================================== The reason for that cpumask_next() returns >= nr_cpu_ids if no further cpus set, but "==" condition is checked only, so we end up with out-of-bounds access to cpu_logical_map. Fix is by using the condition check for cpumask_next. Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1425659870-11832-7-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gic: Fix unsafe locking reported by lockdepMarc Zyngier2015-03-081-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiled with CONFIG_LOCKDEP, the kernel shouts badly, saying that the locking in the GIC code is unsafe. I'm afraid the kernel is right: CPU0 ---- lock(irq_controller_lock); <Interrupt> lock(irq_controller_lock); *** DEADLOCK *** This can happen while enabling, disabling, setting the type or the affinity of an interrupt. The fix is to take the interrupt_controller_lock with interrupts disabled in these cases. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1425659870-11832-6-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gicv3-its: Fix unsafe locking reported by lockdepMarc Zyngier2015-03-081-9/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When compiled with CONFIG_LOCKDEP, the kernel shouts badly, saying that my locking is unsafe. I'm afraid the kernel is right: CPU0 CPU1 ---- ---- lock(&its->lock); local_irq_disable(); lock(&irq_desc_lock_class); lock(&its->lock); <Interrupt> lock(&irq_desc_lock_class); *** DEADLOCK *** The fix is to always take its->lock with interrupts disabled. Reported-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1425659870-11832-5-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gicv3-its: Iterate over PCI aliases to generate ITS configurationMarc Zyngier2015-03-081-8/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current PCI/MSI support in the GICv3 ITS doesn't really deal with systems where different PCI devices end-up using the same RequesterID (as it would be the case with non-transparent bridges, for example). It is likely that none of these devices would actually generate any interrupt, as the ITS is programmed with the device's own ID, and not that of the bridge. A solution to this is to iterate over the PCI hierarchy to discover what the device aliases too. We also use this to discover the upper bound of the number of MSIs that this sub-hierarchy can generate. With this in place, PCI aliases can be supported. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1425659870-11832-4-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gicv3-its: Allocate enough memory for the full range of DeviceIDMarc Zyngier2015-03-082-4/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ITS table allocator is only allocating a single page per table. This works fine for most things, but leads to silent lack of interrupt delivery if we end-up with a device that has an ID that is out of the range defined by a single page of memory. Even worse, depending on the page size, behaviour changes, which is not a very good experience. A solution is actually to allocate memory for the full range of ID that the ITS supports. A massive waste memory wise, but at least a safe bet. Tested on a Phytium SoC. Tested-by: Chen Baozi <chenbaozi@kylinos.com.cn> Acked-by: Chen Baozi <chenbaozi@kylinos.com.cn> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1425659870-11832-3-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
| * irqchip: gicv3-its: Fix ITS CPU initVladimir Murzin2015-03-081-5/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We skip initialisation of ITS in case the device-tree has no corresponding description, but we are still accessing to ITS bits while setting CPU interface what leads to the kernel panic: ITS: No ITS available, not enabling LPIs CPU0: found redistributor 0 region 0:0x000000002f100000 Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = ffffffc0007fb000 [00000000] *pgd=00000000fc407003, *pud=00000000fc407003, *pmd=00000000fc408003, *pte=006000002f000707 Internal error: Oops: 96000005 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0-rc2+ #318 Hardware name: FVP Base (DT) task: ffffffc00077edb0 ti: ffffffc00076c000 task.ti: ffffffc00076c000 PC is at its_cpu_init+0x2c/0x320 LR is at gic_cpu_init+0x168/0x1bc It happens in gic_rdists_supports_plpis() because gic_rdists is NULL. The gic_rdists is set to non-NULL only when ITS node is presented in the device-tree. Fix this by moving the call to gic_rdists_supports_plpis() inside the !list_empty(&its_nodes) block, because it is that list that guards the validity of the rest of the information in this driver. Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Link: https://lkml.kernel.org/r/1425659870-11832-2-git-send-email-marc.zyngier@arm.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
* | irqchip: armada-370-xp: Fix chained per-cpu interruptsMaxime Ripard2015-03-081-1/+20
|/ | | | | | | | | | | | | | | | | | | | | | | | | | On the Cortex-A9-based Armada SoCs, the MPIC is not the primary interrupt controller. Yet, it still has to handle some per-cpu interrupt. To do so, it is chained with the GIC using a per-cpu interrupt. However, the current code only call irq_set_chained_handler, which is called and enable that interrupt only on the boot CPU, which means that the parent per-CPU interrupt is never unmasked on the secondary CPUs, preventing the per-CPU interrupt to actually work as expected. This was not seen until now since the only MPIC PPI users were the Marvell timers that were not working, but not used either since the system use the ARM TWD by default, and the ethernet controllers, that are faking there interrupts as SPI, and don't really expect to have interrupts on the secondary cores anyway. Add a CPU notifier that will enable the PPI on the secondary cores when they are brought up. Cc: <stable@vger.kernel.org> # 3.15+ Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com> Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Link: https://lkml.kernel.org/r/1425378443-28822-1-git-send-email-maxime.ripard@free-electrons.com Signed-off-by: Jason Cooper <jason@lakedaemon.net>
* Linux 4.0-rc1v4.0-rc1Linus Torvalds2015-02-221-4/+4
| | | | | | | | | | | | | | | | | | | | | .. after extensive statistical analysis of my G+ polling, I've come to the inescapable conclusion that internet polls are bad. Big surprise. But "Hurr durr I'ma sheep" trounced "I like online polls" by a 62-to-38% margin, in a poll that people weren't even supposed to participate in. Who can argue with solid numbers like that? 5,796 votes from people who can't even follow the most basic directions? In contrast, "v4.0" beat out "v3.20" by a slimmer margin of 56-to-44%, but with a total of 29,110 votes right now. Now, arguably, that vote spread is only about 3,200 votes, which is less than the almost six thousand votes that the "please ignore" poll got, so it could be considered noise. But hey, I asked, so I'll honor the votes.
* Merge tag 'ext4_for_linus' of ↵Linus Torvalds2015-02-225-56/+108
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fixes from Ted Ts'o: "Ext4 bug fixes. We also reserved code points for encryption and read-only images (for which the implementation is mostly just the reserved code point for a read-only feature :-)" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix indirect punch hole corruption ext4: ignore journal checksum on remount; don't fail ext4: remove duplicate remount check for JOURNAL_CHECKSUM change ext4: fix mmap data corruption in nodelalloc mode when blocksize < pagesize ext4: support read-only images ext4: change to use setup_timer() instead of init_timer() ext4: reserve codepoints used by the ext4 encryption feature jbd2: complain about descriptor block checksum errors
| * ext4: fix indirect punch hole corruptionOmar Sandoval2015-02-141-34/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4f579ae7de56 (ext4: fix punch hole on files with indirect mapping) rewrote FALLOC_FL_PUNCH_HOLE for ext4 files with indirect mapping. However, there are bugs in several corner cases. This fixes 5 distinct bugs: 1. When there is at least one entire level of indirection between the start and end of the punch range and the end of the punch range is the first block of its level, we can't return early; we have to free the intervening levels. 2. When the end is at a higher level of indirection than the start and ext4_find_shared returns a top branch for the end, we still need to free the rest of the shared branch it returns; we can't decrement partial2. 3. When a punch happens within one level of indirection, we need to converge on an indirect block that contains the start and end. However, because the branches returned from ext4_find_shared do not necessarily start at the same level (e.g., the partial2 chain will be shallower if the last block occurs at the beginning of an indirect group), the walk of the two chains can end up "missing" each other and freeing a bunch of extra blocks in the process. This mismatch can be handled by first making sure that the chains are at the same level, then walking them together until they converge. 4. When the punch happens within one level of indirection and ext4_find_shared returns a top branch for the start, we must free it, but only if the end does not occur within that branch. 5. When the punch happens within one level of indirection and ext4_find_shared returns a top branch for the end, then we shouldn't free the block referenced by the end of the returned chain (this mirrors the different levels case). Signed-off-by: Omar Sandoval <osandov@osandov.com>
| * ext4: ignore journal checksum on remount; don't failEric Sandeen2015-02-121-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As of v3.18, ext4 started rejecting a remount which changes the journal_checksum option. Prior to that, it was simply ignored; the problem here is that if someone has this in their fstab for the root fs, now the box fails to boot properly, because remount of root with the new options will fail, and the box proceeds with a readonly root. I think it is a little nicer behavior to accept the option, but warn that it's being ignored, rather than failing the mount, but that might be a subjective matter... Reported-by: Cónräd <conradsand.arma@gmail.com> Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: remove duplicate remount check for JOURNAL_CHECKSUM changeEric Sandeen2015-02-121-11/+0
| | | | | | | | | | | | | | | | | | | | rejection of, changing journal_checksum during remount. One suffices. While we're at it, remove old comment about the "check" option which has been deprecated for some time now. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: fix mmap data corruption in nodelalloc mode when blocksize < pagesizeXiaoguang Wang2015-02-121-0/+7
| | | | | | | | | | | | | | | | | | Since commit 90a8020 and d6320cb, Jan Kara has fixed this issue partially. This mmap data corruption still exists in nodelalloc mode, fix this. Signed-off-by: Xiaoguang Wang <wangxg.fnst@cn.fujitsu.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * ext4: support read-only imagesDarrick J. Wong2015-02-122-1/+10
| | | | | | | | | | | | | | | | | | Add a rocompat feature, "readonly" to mark a FS image as read-only. The feature prevents the kernel and e2fsprogs from changing the image; the flag can be toggled by tune2fs. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: change to use setup_timer() instead of init_timer()Jan Mrazek2015-01-261-3/+2
| | | | | | | | | | Signed-off-by: Jan Mrazek <email@honzamrazek.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * ext4: reserve codepoints used by the ext4 encryption featureTheodore Ts'o2015-01-191-4/+13
| | | | | | | | Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * jbd2: complain about descriptor block checksum errorsDarrick J. Wong2015-01-191-0/+3
| | | | | | | | | | | | | | | | | | We should complain in dmesg when journal recovery fails on account of the descriptor block being corrupt, so that the diagnostic data can be recovered. Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | Merge branch 'for-linus-2' of ↵Linus Torvalds2015-02-2270-758/+907
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull more vfs updates from Al Viro: "Assorted stuff from this cycle. The big ones here are multilayer overlayfs from Miklos and beginning of sorting ->d_inode accesses out from David" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits) autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation procfs: fix race between symlink removals and traversals debugfs: leave freeing a symlink body until inode eviction Documentation/filesystems/Locking: ->get_sb() is long gone trylock_super(): replacement for grab_super_passive() fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) SELinux: Use d_is_positive() rather than testing dentry->d_inode Smack: Use d_is_positive() rather than testing dentry->d_inode TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR() Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb VFS: Split DCACHE_FILE_TYPE into regular and special types VFS: Add a fallthrough flag for marking virtual dentries VFS: Add a whiteout dentry type VFS: Introduce inode-getting helpers for layered/unioned fs environments Infiniband: Fix potential NULL d_inode dereference posix_acl: fix reference leaks in posix_acl_create autofs4: Wrong format for printing dentry ...
| * | autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocationAl Viro2015-02-221-2/+6
| | | | | | | | | | | | | | | | | | X-Coverup: just ask spender Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | procfs: fix race between symlink removals and traversalsAl Viro2015-02-223-12/+22
| | | | | | | | | | | | | | | | | | | | | use_pde()/unuse_pde() in ->follow_link()/->put_link() resp. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | debugfs: leave freeing a symlink body until inode evictionAl Viro2015-02-221-17/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As it is, we have debugfs_remove() racing with symlink traversals. Supply ->evict_inode() and do freeing there - inode will remain pinned until we are done with the symlink body. And rip the idiocy with checking if dentry is positive right after we'd verified debugfs_positive(), which is a stronger check... Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | Documentation/filesystems/Locking: ->get_sb() is long goneAl Viro2015-02-221-2/+0
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | trylock_super(): replacement for grab_super_passive()Konstantin Khlebnikov2015-02-223-26/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I've noticed significant locking contention in memory reclaimer around sb_lock inside grab_super_passive(). Grab_super_passive() is called from two places: in icache/dcache shrinkers (function super_cache_scan) and from writeback (function __writeback_inodes_wb). Both are required for progress in memory allocator. Grab_super_passive() acquires sb_lock to increment sb->s_count and check sb->s_instances. It seems sb->s_umount locked for read is enough here: super-block deactivation always runs under sb->s_umount locked for write. Protecting super-block itself isn't a problem: in super_cache_scan() sb is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers are still active. Inside writeback super-block comes from inode from bdi writeback list under wb->list_lock. This patch removes locking sb_lock and checks s_instances under s_umount: generic_shutdown_super() unlinks it under sb->s_umount locked for write. New variant is called trylock_super() and since it only locks semaphore, callers must call up_read(&sb->s_umount) instead of drop_super(sb) when they're done. Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversionsDavid Howells2015-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup() rather than d_is_dir() when checking a dir watch and give an error on fake directories. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversionsDavid Howells2015-02-224-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack thereof) in cachefiles: (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as it doesn't want to deal with automounts in its cache. (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in cachefiles. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)David Howells2015-02-2234-71/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert the following where appropriate: (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry). (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry). (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more complicated than it appears as some calls should be converted to d_can_lookup() instead. The difference is whether the directory in question is a real dir with a ->lookup op or whether it's a fake dir with a ->d_automount op. In some circumstances, we can subsume checks for dentry->d_inode not being NULL into this, provided we the code isn't in a filesystem that expects d_inode to be NULL if the dirent really *is* negative (ie. if we're going to use d_inode() rather than d_backing_inode() to get the inode pointer). Note that the dentry type field may be set to something other than DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS manages the fall-through from a negative dentry to a lower layer. In such a case, the dentry type of the negative union dentry is set to the same as the type of the lower dentry. However, if you know d_inode is not NULL at the call site, then you can use the d_is_xxx() functions even in a filesystem. There is one further complication: a 0,0 chardev dentry may be labelled DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was intended for special directory entry types that don't have attached inodes. The following perl+coccinelle script was used: use strict; my @callers; open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') || die "Can't grep for S_ISDIR and co. callers"; @callers = <$fd>; close($fd); unless (@callers) { print "No matches\n"; exit(0); } my @cocci = ( '@@', 'expression E;', '@@', '', '- S_ISLNK(E->d_inode->i_mode)', '+ d_is_symlink(E)', '', '@@', 'expression E;', '@@', '', '- S_ISDIR(E->d_inode->i_mode)', '+ d_is_dir(E)', '', '@@', 'expression E;', '@@', '', '- S_ISREG(E->d_inode->i_mode)', '+ d_is_reg(E)' ); my $coccifile = "tmp.sp.cocci"; open($fd, ">$coccifile") || die $coccifile; print($fd "$_\n") || die $coccifile foreach (@cocci); close($fd); foreach my $file (@callers) { chomp $file; print "Processing ", $file, "\n"; system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 || die "spatch failed"; } [AV: overlayfs parts skipped] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | SELinux: Use d_is_positive() rather than testing dentry->d_inodeDavid Howells2015-02-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Use d_is_positive() rather than testing dentry->d_inode in SELinux to get rid of direct references to d_inode outside of the VFS. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | Smack: Use d_is_positive() rather than testing dentry->d_inodeDavid Howells2015-02-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Use d_is_positive() rather than testing dentry->d_inode in Smack to get rid of direct references to d_inode outside of the VFS. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR()David Howells2015-02-221-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | Use d_is_dir() rather than d_inode and S_ISDIR(). Note that this will include fake directories such as automount triggers. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inodeDavid Howells2015-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Use d_is_positive(dentry) or d_is_negative(dentry) rather than testing dentry->d_inode as the dentry may cover another layer that has an inode when the top layer doesn't or may hold a 0,0 chardev that's actually a whiteout. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sbDavid Howells2015-02-222-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | mediated_filesystem() should use dentry->d_sb not dentry->d_inode->i_sb and should avoid file_inode() also since it is really dealing with the path. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | VFS: Split DCACHE_FILE_TYPE into regular and special typesDavid Howells2015-02-222-8/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Split DCACHE_FILE_TYPE into DCACHE_REGULAR_TYPE (dentries representing regular files) and DCACHE_SPECIAL_TYPE (representing blockdev, chardev, FIFO and socket files). d_is_reg() and d_is_special() are added to detect these subtypes and d_is_file() is left as the union of the two. This allows a number of places that use S_ISREG(dentry->d_inode->i_mode) to use d_is_reg(dentry) instead. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | VFS: Add a fallthrough flag for marking virtual dentriesDavid Howells2015-02-222-1/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a DCACHE_FALLTHRU flag to indicate that, in a layered filesystem, this is a virtual dentry that covers another one in a lower layer that should be used instead. This may be recorded on medium if directory integration is stored there. The flag can be set with d_set_fallthru() and tested with d_is_fallthru(). Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | VFS: Add a whiteout dentry typeDavid Howells2015-02-221-6/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add DCACHE_WHITEOUT_TYPE and provide a d_is_whiteout() accessor function. A d_is_miss() accessor is also added for ordinary cache misses and d_is_negative() is modified to indicate either an ordinary miss or an enforced miss (whiteout). Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | VFS: Introduce inode-getting helpers for layered/unioned fs environmentsDavid Howells2015-02-221-0/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce some function for getting the inode (and also the dentry) in an environment where layered/unioned filesystems are in operation. The problem is that we have places where we need *both* the union dentry and the lower source or workspace inode or dentry available, but we can only have a handle on one of them. Therefore we need to derive the handle to the other from that. The idea is to introduce an extra field in struct dentry that allows the union dentry to refer to and pin the lower dentry. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | Merge branch 'overlayfs-next' of ↵Al Viro2015-02-207-319/+517
| |\ \ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs into for-next
| | * | ovl: discard independent cursor in readdir()hujianyang2015-01-091-24/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the ovl_dir_cache is stable during a directory reading, the cursor of struct ovl_dir_file don't need to be an independent entry in the list of a merged directory. This patch changes *cursor* to a pointer which points to the entry in the ovl_dir_cache. After this, we don't need to check *is_cursor* either. Signed-off-by: hujianyang <hujianyang@huawei.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | ovl: document lower layer orderingMiklos Szeredi2015-01-081-2/+6
| | | | | | | | | | | | | | | | | | | | Reported-by: Fabian Sturm <fabian.sturm@aduu.de> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | ovl: Prevent rw remount when it should be ro mountSeunghun Lee2015-01-081-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Overlayfs should be mounted read-only when upper-fs is read-only or nonexistent. But now it can be remounted read-write and this can cause kernel panic. So we should prevent read-write remount when the above situation happens. Signed-off-by: Seunghun Lee <waydi1@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
OpenPOWER on IntegriCloud