summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* ext4: check for overlapping extents in ext4_valid_extent_entries()Eryu Guan2013-12-031-1/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A corrupted ext4 may have out of order leaf extents, i.e. extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT ^^^^ overlap with previous extent Reading such extent could hit BUG_ON() in ext4_es_cache_extent(). BUG_ON(end < lblk); The problem is that __read_extent_tree_block() tries to cache holes as well but assumes 'lblk' is greater than 'prev' and passes underflowed length to ext4_es_cache_extent(). Fix it by checking for overlapping extents in ext4_valid_extent_entries(). I hit this when fuzz testing ext4, and am able to reproduce it by modifying the on-disk extent by hand. Also add the check for (ee_block + len - 1) in ext4_valid_extent() to make sure the value is not overflow. Ran xfstests on patched ext4 and no regression. Cc: Lukáš Czerner <lczerner@redhat.com> Signed-off-by: Eryu Guan <guaneryu@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
* ext4: fix use-after-free in ext4_mb_new_blocksJunho Ryu2013-12-031-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count. While ext4_mb_use_preallocated checks pa->pa_deleted first and then increments pa->count later, ext4_mb_put_pa decrements pa->pa_count before holding pa->pa_lock and then sets pa->pa_deleted. * Free sequence ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count ext4_mb_put_pa (2): lock pa->pa_lock ext4_mb_put_pa (3): check pa->pa_deleted ext4_mb_put_pa (4): set pa->pa_deleted=1 ext4_mb_put_pa (5): unlock pa->pa_lock ext4_mb_put_pa (6): remove pa from a list ext4_mb_pa_callback: free pa * Use sequence ext4_mb_use_preallocated (1): iterate over preallocation ext4_mb_use_preallocated (2): lock pa->pa_lock ext4_mb_use_preallocated (3): check pa->pa_deleted ext4_mb_use_preallocated (4): increase pa->pa_count ext4_mb_use_preallocated (5): unlock pa->pa_lock ext4_mb_release_context: access pa * Use-after-free sequence [initial status] <pa->pa_deleted = 0, pa_count = 1> ext4_mb_use_preallocated (1): iterate over preallocation ext4_mb_use_preallocated (2): lock pa->pa_lock ext4_mb_use_preallocated (3): check pa->pa_deleted ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count [pa_count decremented] <pa->pa_deleted = 0, pa_count = 0> ext4_mb_use_preallocated (4): increase pa->pa_count [pa_count incremented] <pa->pa_deleted = 0, pa_count = 1> ext4_mb_use_preallocated (5): unlock pa->pa_lock ext4_mb_put_pa (2): lock pa->pa_lock ext4_mb_put_pa (3): check pa->pa_deleted ext4_mb_put_pa (4): set pa->pa_deleted=1 [race condition!] <pa->pa_deleted = 1, pa_count = 1> ext4_mb_put_pa (5): unlock pa->pa_lock ext4_mb_put_pa (6): remove pa from a list ext4_mb_pa_callback: free pa ext4_mb_release_context: access pa AddressSanitizer has detected use-after-free in ext4_mb_new_blocks Bug report: http://goo.gl/rG1On3 Signed-off-by: Junho Ryu <jayr@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
* ext4: call ext4_error_inode() if jbd2_journal_dirty_metadata() failsTheodore Ts'o2013-12-021-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | While it's true that errors can only happen if there is a bug in jbd2_journal_dirty_metadata(), if a bug does happen, we need to halt the kernel or remount the file system read-only in order to avoid further data loss. The ext4_journal_abort_handle() function doesn't do any of this, and while it's likely that this call (since it doesn't adjust refcounts) will likely result in the file system eventually deadlocking since the current transaction will never be able to close, it's much cleaner to call let ext4's error handling system deal with this situation. There's a separate bug here which is that if certain jbd2 errors errors occur and file system is mounted errors=continue, the file system will probably eventually end grind to a halt as described above. But things have been this way in a long time, and usually when we have these sorts of errors it's pretty much a disaster --- and that's why the jbd2 layer aggressively retries memory allocations, which is the most likely cause of these jbd2 errors. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@vger.kernel.org
* Linux 3.13-rc2v3.13-rc2Linus Torvalds2013-11-291-1/+1
|
* Merge tag 'arm64-stable' of ↵Linus Torvalds2013-11-298-66/+67
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 Pull ARM64 fixes from Catalin Marinas: - Remove preempt_count modifications in the arm64 IRQ handling code since that's already dealt with in generic irq_enter/irq_exit - PTE_PROT_NONE bit moved higher up to avoid overlapping with the hardware bits (for PROT_NONE mappings which are pte_present) - Big-endian fixes for ptrace support - Asynchronous aborts unmasking while in the kernel - pgprot_writecombine() change to create Normal NonCacheable memory rather than Device GRE * tag 'arm64-stable' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64: arm64: Move PTE_PROT_NONE higher up arm64: Use Normal NonCacheable memory for writecombine arm64: debug: make aarch32 bkpt checking endian clean arm64: ptrace: fix compat registes get/set to be endian clean arm64: Unmask asynchronous aborts when in kernel mode arm64: dts: Reserve the memory used for secondary CPU release address arm64: let the core code deal with preempt_count
| * arm64: Move PTE_PROT_NONE higher upCatalin Marinas2013-11-291-14/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | PTE_PROT_NONE means that a pte is present but does not have any read/write attributes. However, setting the memory type like pgprot_writecombine() is allowed and such bits overlap with PTE_PROT_NONE. This causes mmap/munmap issues in drivers that change the vma->vm_pg_prot on PROT_NONE mappings. This patch reverts the PTE_FILE/PTE_PROT_NONE shift in commit 59911ca4325d (ARM64: mm: Move PTE_PROT_NONE bit) and moves PTE_PROT_NONE together with the other software bits. Signed-off-by: Steve Capper <steve.capper@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Tested-by: Steve Capper <steve.capper@linaro.org> Cc: <stable@vger.kernel.org> # 3.11+
| * arm64: Use Normal NonCacheable memory for writecombineCatalin Marinas2013-11-291-1/+1
| | | | | | | | | | | | | | | | This provides better performance compared to Device GRE and also allows unaligned accesses. Such memory is intended to be used with standard RAM (e.g. framebuffers) and not I/O. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: debug: make aarch32 bkpt checking endian cleanMatthew Leach2013-11-281-8/+12
| | | | | | | | | | | | | | | | | | | | The current breakpoint instruction checking code for A32 is not endian clean. Fix this with appropriate byte-swapping when retrieving instructions. Signed-off-by: Matthew Leach <matthew.leach@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: ptrace: fix compat registes get/set to be endian cleanMatthew Leach2013-11-281-21/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | On a BE system the wrong half of the X registers is retrieved/written when attempting to get/set the value of aarch32 registers through ptrace. Ensure that types are the correct width so that the relevant casting occurs. Signed-off-by: Matthew Leach <matthew.leach@arm.com> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: Unmask asynchronous aborts when in kernel modeCatalin Marinas2013-11-253-0/+9
| | | | | | | | | | | | | | | | | | The asynchronous aborts are generally fatal for the kernel but they can be masked via the pstate A bit. If a system error happens while in kernel mode, it won't be visible until returning to user space. This patch enables this kind of abort early to help identifying the cause. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: dts: Reserve the memory used for secondary CPU release addressCatalin Marinas2013-11-251-0/+2
| | | | | | | | | | | | | | | | | | With the spin-table SMP booting method, secondary CPUs poll a location passed in the DT. The foundation-v8.dts file doesn't have this memory reserved and there is a risk of Linux using it before secondary CPUs are started. Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
| * arm64: let the core code deal with preempt_countMarc Zyngier2013-11-251-22/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit f27dde8deef3 (sched: Add NEED_RESCHED to the preempt_count) introduced the use of bit 31 in preempt_count for obscure scheduling purposes. This causes interrupts taken from EL0 to hit the (open coded) BUG when this flag is flipped while handling the interrupt (we compare the values before and after, and kill the kernel if they are different). The fix is to stop messing with the preempt count entirely, as this is already being dealt with in the generic code (irq_enter/irq_exit). Tested on a dual A53 FPGA running cyclictest. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2013-11-2914-88/+87
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Martin Schwidefsky: "One performance improvement and a few bug fixes. Two of the fixes deal with the clock related problems we have seen on recent kernels" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/mm: handle asce-type exceptions as normal page fault s390,time: revert direct ktime path for s390 clockevent device s390/time,vdso: convert to the new update_vsyscall interface s390/uaccess: add missing page table walk range check s390/mm: optimize copy_page s390/dasd: validate request size before building CCW/TCW request s390/signal: always restore saved runtime instrumentation psw bit
| * | s390/mm: handle asce-type exceptions as normal page faultMartin Schwidefsky2013-11-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Git commit 9e34f2686bb088b211b6cac8772e1f644c6180f8 "s390/mm,tlb: tlb flush on page table upgrade fixup" removed the exception handler for the asce-type exception. This is incorrect as the user-copy with MVCOS can cause asce-type exceptions in the kernel if a user pointer is too large. Those need to be handled with do_no_context to branch to the fixup in the user-copy code. The simplest fix for this problem is to call do_dat_exception for asce-type excpetions, as there is no vma for the address the code will handle the exception correctly. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390,time: revert direct ktime path for s390 clockevent deviceMartin Schwidefsky2013-11-251-15/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Git commit 4f37a68cdaf6dea833cfdded2a3e0c47c0f006da "s390: Use direct ktime path for s390 clockevent device" makes use of the CLOCK_EVT_FEAT_KTIME clockevent option to avoid the delta calculation with ktime_get() in clockevents_program_event and the get_tod_clock() in s390_next_event. This is based on the assumption that the difference between the internal ktime and the hardware clock is reflected in the wall_to_monotonic delta. But this is not true, the ntp corrections are applied via changes to the tk->mult multiplier and this is not reflected in wall_to_monotonic. In theory this could be solved by using the raw monotonic clock but it is simpler to switch back to the standard clock delta calculation. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/time,vdso: convert to the new update_vsyscall interfaceMartin Schwidefsky2013-11-258-45/+62
| | | | | | | | | | | | | | | | | | | | | Switch to the improved update_vsyscall interface that provides sub-nanosecond precision for gettimeofday and clock_gettime. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/uaccess: add missing page table walk range checkHeiko Carstens2013-11-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When translating a user space address, the address must be checked against the ASCE limit of the process. If the address is larger than the maximum address that is reachable with the ASCE, an ASCE type exception must be generated. The current code simply ignored the higher order bits. This resulted in an address wrap around in user space instead of an exception in user space. Cc: stable@vger.kernel.org # v3.9+ Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | s390/mm: optimize copy_pageHeiko Carstens2013-11-201-25/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Always use the mvcl instruction to copy a page instead of mvpg or a couple of mvc instructions. Copying a huge page is 25% faster this way. Also bypass caches when copying pages since only parts of a page will be used afterwards. Especially when copying a huge page this would kick everything out of the L1 and L2 data caches on a zEC12 machine. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
| * | s390/dasd: validate request size before building CCW/TCW requestStefan Weinhuber2013-11-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An I/O request that does not read or write full blocks cannot be translated into a correct CCW or TCW program and should be rejected right away. In particular the code that creates TCW requests will not notice this problem and create broken TCWs that will be rejected by the hardware. Signed-off-by: Stefan Weinhuber <wein@de.ibm.com> Reference-ID: RQM1956
| * | s390/signal: always restore saved runtime instrumentation psw bitHendrik Brueckner2013-11-202-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit "s390: fix handling of runtime instrumentation psw bit" (5ebf250dab) changed the behavior of setting the runtime instrumentation psw bit. This commit restores the original logic: 1. When returning from the signal handler, the runtime instrumentation psw bit is restored to its saved state. 2. If the runtime instrumentation psw bit is enabled during the signal handler, it is always turned off when leaving the signal handler. The saved state is restored as described in 1. That also implies that turning on runtime instrumentation in the signal handler is only effective while running in the signal context. Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
* | | Merge branch 'i2c/for-current' of ↵Linus Torvalds2013-11-295-13/+19
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Some easy but needed fixes for i2c drivers since rc1" * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: bcm2835: Linking platform nodes to adapter nodes i2c: omap: raw read and write endian fix i2c: i2c-bcm-kona: Fix module build i2c: i2c-diolan-u2c: different usb endpoints for DLN-2-U2C i2c: bcm-kona: remove duplicated include i2c: davinci: raw read and write endian fix
| * | | i2c: bcm2835: Linking platform nodes to adapter nodesFlorian Meier2013-11-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to find I2C devices in the device tree, the platform nodes have to be known by the I2C core. This requires setting the dev.of_node parameter of the adapter. Signed-off-by: Florian Meier <florian.meier@koalo.de> Tested-by: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | | i2c: omap: raw read and write endian fixVictor Kamensky2013-11-271-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All OMAP IP blocks expect LE data, but CPU may operate in BE mode. Need to use endian neutral functions to read/write h/w registers. I.e instead of __raw_read[lw] and __raw_write[lw] functions code need to use read[lw]_relaxed and write[lw]_relaxed functions. If the first simply reads/writes register, the second will byteswap it if host operates in BE mode. Changes are trivial sed like replacement of __raw_xxx functions with xxx_relaxed variant. Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org> Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | | i2c: i2c-bcm-kona: Fix module buildTim Kryger2013-11-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Correct a typo that prevented the driver from being built as a module. Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Tim Kryger <tim.kryger@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | | i2c: i2c-diolan-u2c: different usb endpoints for DLN-2-U2CMartin Vogt2013-11-261-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The previous diolan adapter uses other out/in endpoints than the current DLN-2-U2C in compatibility mode. They changed from 0x2/0x84 to 0x3/0x83. This patch gets the endpoints from the usb interface, instead of hardcode them in the driver. This was tested on a current DLN-2-U2C board. Signed-off-by: Martin Vogt <mvogt1@gmail.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | | i2c: bcm-kona: remove duplicated includeWei Yongjun2013-11-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Markus Mayer <markus.mayer@linaro.org> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
| * | | i2c: davinci: raw read and write endian fixTaras Kondratiuk2013-11-261-2/+2
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I2C IP block expect LE data, but CPU may operate in BE mode. Need to use endian neutral functions to read/write h/w registers. I.e instead of __raw_read[lw] and __raw_write[lw] functions code need to use read[lw]_relaxed and write[lw]_relaxed functions. If the first simply reads/writes register, the second will byteswap it if host operates in BE mode. Changes are trivial sed like replacement of __raw_xxx functions with xxx_relaxed variant. Signed-off-by: Taras Kondratiuk <taras.kondratiuk@linaro.org> Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
* | | Merge branch 'for-3.13-fixes' of ↵Linus Torvalds2013-11-291-13/+37
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "This contains one important fix. The NUMA support added a while back broke ordering guarantees on ordered workqueues. It was enforced by having single frontend interface with @max_active == 1 but the NUMA support puts multiple interfaces on unbound workqueues on NUMA machines thus breaking the ordered guarantee. This is fixed by disabling NUMA support on ordered workqueues. The above and a couple other patches were sitting in for-3.12-fixes but I forgot to push that out, so they ended up waiting a bit too long. My aplogies. Other fixes are minor" * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in init_workqueues workqueue: fix comment typo for __queue_work() workqueue: fix ordered workqueues in NUMA setups workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITY
| * | | workqueue: fix pool ID allocation leakage and remove BUILD_BUG_ON() in ↵Li Bin2013-11-221-6/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | init_workqueues When one work starts execution, the high bits of work's data contain pool ID. It can represent a maximum of WORK_OFFQ_POOL_NONE. Pool ID is assigned WORK_OFFQ_POOL_NONE when the work being initialized indicating that no pool is associated and get_work_pool() uses it to check the associated pool. So if worker_pool_assign_id() assigns a ID greater than or equal WORK_OFFQ_POOL_NONE to a pool, it triggers leakage, and it may break the non-reentrance guarantee. This patch fix this issue by modifying the worker_pool_assign_id() function calling idr_alloc() by setting @end param WORK_OFFQ_POOL_NONE. Furthermore, in the current implementation, the BUILD_BUG_ON() in init_workqueues makes no sense. The number of worker pools needed cannot be determined at compile time, because the number of backing pools for UNBOUND workqueues is dynamic based on the assigned custom attributes. So remove it. tj: Minor comment and indentation updates. Signed-off-by: Li Bin <huawei.libin@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | workqueue: fix comment typo for __queue_work()Li Bin2013-11-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It seems the "dying" should be "draining" here. Signed-off-by: Li Bin <huawei.libin@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | workqueue: fix ordered workqueues in NUMA setupsTejun Heo2013-11-221-2/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | An ordered workqueue implements execution ordering by using single pool_workqueue with max_active == 1. On a given pool_workqueue, work items are processed in FIFO order and limiting max_active to 1 enforces the queued work items to be processed one by one. Unfortunately, 4c16bd327c ("workqueue: implement NUMA affinity for unbound workqueues") accidentally broke this guarantee by applying NUMA affinity to ordered workqueues too. On NUMA setups, an ordered workqueue would end up with separate pool_workqueues for different nodes. Each pool_workqueue still limits max_active to 1 but multiple work items may be executed concurrently and out of order depending on which node they are queued to. Fix it by using dedicated ordered_wq_attrs[] when creating ordered workqueues. The new attrs match the unbound ones except that no_numa is always set thus forcing all NUMA nodes to share the default pool_workqueue. While at it, add sanity check in workqueue creation path which verifies that an ordered workqueues has only the default pool_workqueue. Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Libin <huawei.libin@huawei.com> Cc: stable@vger.kernel.org Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
| * | | workqueue: swap set_cpus_allowed_ptr() and PF_NO_SETAFFINITYOleg Nesterov2013-11-221-4/+5
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | Move the setting of PF_NO_SETAFFINITY up before set_cpus_allowed() in create_worker(). Otherwise userland can change ->cpus_allowed in between. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | Merge branch 'for-3.13-fixes' of ↵Linus Torvalds2013-11-295-5/+6
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata Pull libata fixes from Tejun Heo: "libata device removal path was removing parent device node before its child, which is mostly harmless but triggers warning after recent sysfs changes. Rafael's patch fixes the order. Other than that, minor controller-specific fixes and device ID additions" * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata: ATA: Fix port removal ordering ahci: add Marvell 9230 to the AHCI PCI device list ata: fix acpi_bus_get_device() return value check pata_arasan_cf: add missing clk_disable_unprepare() on error path ahci: add support for IBM Akebono platform device
| * | | ATA: Fix port removal orderingRafael J. Wysocki2013-11-271-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit bcdde7e221a8 (sysfs: make __sysfs_remove_dir() recursive) Mika Westerberg sees traces analogous to the one below in Thunderbolt hot-remove testing: WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0() sysfs group ffffffff81c6f1e0 not found for kobject 'host7' Modules linked in: CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13 Hardware name: /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013 Workqueue: kacpi_hotplug acpi_hotplug_work_fn 0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8 ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0 ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48 Call Trace: [<ffffffff817daab1>] dump_stack+0x45/0x56 [<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0 [<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50 [<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70 [<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0 [<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50 [<ffffffff8142a0d0>] device_del+0x40/0x1b0 [<ffffffff8142a24d>] device_unregister+0xd/0x20 [<ffffffff8144131a>] scsi_remove_host+0xba/0x110 [<ffffffff8145f526>] ata_host_detach+0xc6/0x100 [<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20 [<ffffffff812e8f48>] pci_device_remove+0x28/0x60 [<ffffffff8142d854>] __device_release_driver+0x64/0xd0 [<ffffffff8142d8de>] device_release_driver+0x1e/0x30 [<ffffffff8142d257>] bus_remove_device+0xf7/0x140 [<ffffffff8142a1b1>] device_del+0x121/0x1b0 [<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0 [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0 [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0 [<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20 [<ffffffff812fc743>] trim_stale_devices+0x73/0xe0 [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0 [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0 [<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0 [<ffffffff812fd90d>] hotplug_event+0xcd/0x160 [<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60 [<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22 [<ffffffff8105cf3a>] process_one_work+0x17a/0x430 [<ffffffff8105db29>] worker_thread+0x119/0x390 [<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0 [<ffffffff81063a5d>] kthread+0xcd/0xf0 [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180 [<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0 [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180 The source of this problem is that SCSI hosts are removed from ATA ports after calling ata_tport_delete() which removes the port's sysfs directory, among other things. Now, after commit bcdde7e221a8, the sysfs directory is removed along with all of its subdirectories that include the SCSI host's sysfs directory and its subdirectories at this point. Consequently, when device_del() is finally called for any child device of the SCSI host and tries to remove its "power" group (which is already gone then), it triggers the above warning. To make the warnings go away, change the removal ordering in ata_port_detach() so that the SCSI host is removed from the port before ata_tport_delete() is called. References: https://bugzilla.kernel.org/show_bug.cgi?id=65281 Reported-and-tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>
| * | | ahci: add Marvell 9230 to the AHCI PCI device listSamir Benmendil2013-11-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tested with a DAWICONTROL DC-624e on 3.10.10 Signed-off-by: Samir Benmendil <samir.benmendil@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Levente Kurusa <levex@linux.com> Cc: stable@vger.kernel.org
| * | | ata: fix acpi_bus_get_device() return value checkYijing Wang2013-11-231-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since acpi_bus_get_device() returns plain int and not acpi_status, ACPI_FAILURE() should not be used for checking its return value. Fix that. tj: Dropped unused local variable @status from odd_can_poweroff(). Reported by kbuild test bot. Signed-off-by: Yijing Wang <wangyijing@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Aaron Lu <aaron.lu@intel.com> Cc: linux-ide@vger.kernel.org Cc: kbuild test robot <fengguang.wu@intel.com>
| * | | pata_arasan_cf: add missing clk_disable_unprepare() on error pathWei Yongjun2013-11-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the missing clk_disable_unprepare() before return from cf_init() in the error handling case. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
| * | | ahci: add support for IBM Akebono platform deviceAlistair Popple2013-11-221-0/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | The new IBM Akebono board has a PPC476GTR SoC with an AHCI compliant SATA controller. This patch adds a compatible property for the new SoC to the AHCI platform driver. Signed-off-by: Alistair Popple <alistair@popple.id.au> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
* | | Merge branch 'for-3.13-fixes' of ↵Linus Torvalds2013-11-292-6/+37
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Fixes for three issues. - cgroup destruction path could swamp system_wq possibly leading to deadlock. This actually seems to happen in the wild with memcg because memcg destruction path adds nested dependency on system_wq. Resolved by isolating cgroup destruction work items on its dedicated workqueue. - Possible locking context deadlock through seqcount reported by lockdep - Memory leak under certain conditions" * 'for-3.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: fix cgroup_subsys_state leak for seq_files cpuset: Fix memory allocator deadlock cgroup: use a dedicated workqueue for cgroup destruction
| * | | cgroup: fix cgroup_subsys_state leak for seq_filesTejun Heo2013-11-271-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a cgroup file implements either read_map() or read_seq_string(), such file is served using seq_file by overriding file->f_op to cgroup_seqfile_operations, which also overrides the release method to single_release() from cgroup_file_release(). Because cgroup_file_open() didn't use to acquire any resources, this used to be fine, but since f7d58818ba42 ("cgroup: pin cgroup_subsys_state when opening a cgroupfs file"), cgroup_file_open() pins the css (cgroup_subsys_state) which is put by cgroup_file_release(). The patch forgot to update the release path for seq_files and each open/release cycle leaks a css reference. Fix it by updating cgroup_file_release() to also handle seq_files and using it for seq_file release path too. Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org # v3.12
| * | | cpuset: Fix memory allocator deadlockPeter Zijlstra2013-11-271-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Juri hit the below lockdep report: [ 4.303391] ====================================================== [ 4.303392] [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ] [ 4.303394] 3.12.0-dl-peterz+ #144 Not tainted [ 4.303395] ------------------------------------------------------ [ 4.303397] kworker/u4:3/689 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire: [ 4.303399] (&p->mems_allowed_seq){+.+...}, at: [<ffffffff8114e63c>] new_slab+0x6c/0x290 [ 4.303417] [ 4.303417] and this task is already holding: [ 4.303418] (&(&q->__queue_lock)->rlock){..-...}, at: [<ffffffff812d2dfb>] blk_execute_rq_nowait+0x5b/0x100 [ 4.303431] which would create a new lock dependency: [ 4.303432] (&(&q->__queue_lock)->rlock){..-...} -> (&p->mems_allowed_seq){+.+...} [ 4.303436] [ 4.303898] the dependencies between the lock to be acquired and SOFTIRQ-irq-unsafe lock: [ 4.303918] -> (&p->mems_allowed_seq){+.+...} ops: 2762 { [ 4.303922] HARDIRQ-ON-W at: [ 4.303923] [<ffffffff8108ab9a>] __lock_acquire+0x65a/0x1ff0 [ 4.303926] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140 [ 4.303929] [<ffffffff81063dd6>] kthreadd+0x86/0x180 [ 4.303931] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0 [ 4.303933] SOFTIRQ-ON-W at: [ 4.303933] [<ffffffff8108abcc>] __lock_acquire+0x68c/0x1ff0 [ 4.303935] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140 [ 4.303940] [<ffffffff81063dd6>] kthreadd+0x86/0x180 [ 4.303955] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0 [ 4.303959] INITIAL USE at: [ 4.303960] [<ffffffff8108a884>] __lock_acquire+0x344/0x1ff0 [ 4.303963] [<ffffffff8108cbe3>] lock_acquire+0x93/0x140 [ 4.303966] [<ffffffff81063dd6>] kthreadd+0x86/0x180 [ 4.303969] [<ffffffff816ded6c>] ret_from_fork+0x7c/0xb0 [ 4.303972] } Which reports that we take mems_allowed_seq with interrupts enabled. A little digging found that this can only be from cpuset_change_task_nodemask(). This is an actual deadlock because an interrupt doing an allocation will hit get_mems_allowed()->...->__read_seqcount_begin(), which will spin forever waiting for the write side to complete. Cc: John Stultz <john.stultz@linaro.org> Cc: Mel Gorman <mgorman@suse.de> Reported-by: Juri Lelli <juri.lelli@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Tested-by: Juri Lelli <juri.lelli@gmail.com> Acked-by: Li Zefan <lizefan@huawei.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Tejun Heo <tj@kernel.org> Cc: stable@vger.kernel.org
| * | | cgroup: use a dedicated workqueue for cgroup destructionTejun Heo2013-11-221-3/+27
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since be44562613851 ("cgroup: remove synchronize_rcu() from cgroup_diput()"), cgroup destruction path makes use of workqueue. css freeing is performed from a work item from that point on and a later commit, ea15f8ccdb430 ("cgroup: split cgroup destruction into two steps"), moves css offlining to workqueue too. As cgroup destruction isn't depended upon for memory reclaim, the destruction work items were put on the system_wq; unfortunately, some controller may block in the destruction path for considerable duration while holding cgroup_mutex. As large part of destruction path is synchronized through cgroup_mutex, when combined with high rate of cgroup removals, this has potential to fill up system_wq's max_active of 256. Also, it turns out that memcg's css destruction path ends up queueing and waiting for work items on system_wq through work_on_cpu(). If such operation happens while system_wq is fully occupied by cgroup destruction work items, work_on_cpu() can't make forward progress because system_wq is full and other destruction work items on system_wq can't make forward progress because the work item waiting for work_on_cpu() is holding cgroup_mutex, leading to deadlock. This can be fixed by queueing destruction work items on a separate workqueue. This patch creates a dedicated workqueue - cgroup_destroy_wq - for this purpose. As these work items shouldn't have inter-dependencies and mostly serialized by cgroup_mutex anyway, giving high concurrency level doesn't buy anything and the workqueue's @max_active is set to 1 so that destruction work items are executed one by one on each CPU. Hugh Dickins: Because cgroup_init() is run before init_workqueues(), cgroup_destroy_wq can't be allocated from cgroup_init(). Do it from a separate core_initcall(). In the future, we probably want to reorder so that workqueue init happens before cgroup_init(). Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Hugh Dickins <hughd@google.com> Reported-by: Shawn Bohrer <shawn.bohrer@gmail.com> Link: http://lkml.kernel.org/r/20131111220626.GA7509@sbohrermbp13-local.rgmadvisors.com Link: http://lkml.kernel.org/g/alpine.LNX.2.00.1310301606080.2333@eggly.anvils Cc: stable@vger.kernel.org # v3.9+
* | | Merge tag 'sound-3.13-rc2' of ↵Linus Torvalds2013-11-298-40/+138
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Quite a few HD-Audio fixes, a WUSB audio fix and a fix for FireWire audio. The HD-audio part contains a couple of fixes for the generic parser, and these are the only intrusive fixes. The rest are mostly device-specific fixes" * tag 'sound-3.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda - Add LFE chmap to ASUS ET2700 ALSA: hda - Initialize missing bass speaker pin for ASUS AIO ET2700 ALSA: hda - limit mic boost on Asus UX31[A,E] ALSA: hda - Check leaf nodes to find aamix amps ALSA: hda - Fix hp-mic mode without VREF bits ALSA: hda - Create Headhpone Mic Jack Mode when really needed ALSA: usb: use multiple packets per urb for Wireless USB inbound audio ALSA: hda - Enable mute/mic-mute LEDs for more Thinkpads with Conexant codec ALSA: hda - Drop bus->avoid_link_reset flag ALSA: hda/realtek - Set pcbeep amp for ALC668 ALSA: hda/realtek - Add support of ALC231 codec ALSA: firewire-lib: fix wrong value for FDF field as an empty packet
| * | | ALSA: hda - Add LFE chmap to ASUS ET2700Takashi Iwai2013-11-281-3/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As the previous commit 1f0bbf03cb82 added the pin config for the bass speaker, this patch adds the corresponding LFE-only channel map on ASUS ET2700. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65961 Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | ALSA: hda - Initialize missing bass speaker pin for ASUS AIO ET2700Takashi Iwai2013-11-281-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a fixup entry for the missing bass speaker pin 0x16 on ASUS ET2700 AiO desktop. The channel map will be added in the next patch, so that this can be backported easily to stable kernels. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65961 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | ALSA: hda - limit mic boost on Asus UX31[A,E]Oleksij Rempel2013-11-281-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This both devices need limit for internal dmic. [cosmetic change; renamed fixup name by tiwai] Signed-off-by: Oleksij Rempel <linux@rempel-privat.de> Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | ALSA: hda - Check leaf nodes to find aamix ampsTakashi Iwai2013-11-281-12/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current generic parser assumes blindly that the volume and mute amps are found in the aamix node itself. But on some codecs, typically Analog Devices ones, the aamix amps are separately implemented in each leaf node of the aamix node, and the current driver can't establish the correct amp controls. This is a regression compared with the previous static quirks. This patch extends the search for the amps to the leaf nodes for allowing the aamix controls again on such codecs. In this implementation, I didn't code to loop through the whole paths, since usually one depth should suffice, and we can't search too deeply, as it may result in the conflicting control assignments. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65641 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | ALSA: hda - Fix hp-mic mode without VREF bitsTakashi Iwai2013-11-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the hp mic pin has no VREF bits, the driver forgot to set PIN_IN bit. Spotted during debugging old MacBook Airs. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65681 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | ALSA: hda - Create Headhpone Mic Jack Mode when really neededTakashi Iwai2013-11-271-8/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a headphone jack is configurable as input, the generic parser tries to make it retaskable as Headphone Mic. The switching can be done smoothly if Capture Source control exists (i.e. there is another input source). Or when user explicitly enables the creation of jack mode controls, "Headhpone Mic Jack Mode" will be created accordingly. However, if the headphone mic is the only input source, we have to create "Headphone Mic Jack Mode" control because there is no capture source selection. Otherwise, the generic parser assumes that the input is constantly enabled, thus the headphone is permanently set as input. This situation happens on the old MacBook Airs where no input is supported properly, for example. This patch fixes the problem: now "Headphone Mic Jack Mode" is created when such an input selection isn't possible. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=65681 Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>
| * | | ALSA: usb: use multiple packets per urb for Wireless USB inbound audioThomas Pugliese2013-11-271-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For Wireless USB audio devices, use multiple isoc packets per URB for inbound endpoints with a datainterval < 5. This allows the WUSB host controller to take advantage of bursting to service endpoints whose logical polling interval is less than the 4ms minimum polling interval limit in WUSB. Signed-off-by: Thomas Pugliese <thomas.pugliese@gmail.com> Signed-off-by: Takashi Iwai <tiwai@suse.de>
OpenPOWER on IntegriCloud