summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* x86/platform/quark: Print boundaries correctlyAndy Shevchenko2016-01-211-5/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | When we print values, such as @size, we have to understand that it's derived from [begin .. end] as: size = end - begin + 1 On the opposite the @end is derived from the rest as: end = begin + size - 1 Correct the IMR code to print values correctly. Note that @__end_rodata actually points to the next address after the aligned .rodata section. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Ong, Boon Leong <boon.leong.ong@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1453320821-64328-1-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/platform/UV: Remove EFI memmap quirk for UV2+Alex Thorlton2016-01-192-5/+17
| | | | | | | | | | | | | | | | | | | | | | | Commit a5d90c923bcf ("x86/efi: Quirk out SGI UV") added a quirk to efi_apply_memmap_quirks to force SGI UV systems to fall back to the old EFI memmap mechanism. We have a BIOS fix for this issue on all systems except for UV1. This commit fixes up the EFI quirk/MMR mapping code so that we only apply the special case to UV1 hardware. Signed-off-by: Alex Thorlton <athorlton@sgi.com> Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Hedi Berriche <hedi@sgi.com> Cc: Len Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Travis <travis@sgi.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/1449867585-189233-2-git-send-email-athorlton@sgi.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/platform/intel-mid: Join string and fix SoC nameAndy Shevchenko2016-01-191-5/+3
| | | | | | | | | | | | | | | | | | | Join string back to make grepping a bit easier. While here, lowering case for Penwell SoC name in one case to be aligned with the rest messages. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452888668-147116-2-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/platform/intel-mid: Enable 64-bit buildAndy Shevchenko2016-01-192-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel Tangier SoC is known to have 64-bit dual core CPU. Enable 64-bit build for it. The kernel has been tested on Intel Edison board: Linux buildroot 4.4.0-next-20160115+ #25 SMP Fri Jan 15 22:03:19 EET 2016 x86_64 GNU/Linux processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 74 model name : Genuine Intel(R) CPU 4000 @ 500MHz stepping : 8 Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452888668-147116-1-git-send-email-andriy.shevchenko@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/irq: Plug vector cleanup raceThomas Gleixner2016-01-151-10/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We still can end up with a stale vector due to the following: CPU0 CPU1 CPU2 lock_vector() data->move_in_progress=0 sendIPI() unlock_vector() set_affinity() assign_irq_vector() lock_vector() handle_IPI move_in_progress = 1 lock_vector() unlock_vector() move_in_progress == 1 So we need to serialize the vector assignment against a pending cleanup. The solution is rather simple now. We not only check for the move_in_progress flag in assign_irq_vector(), we also check whether there is still a cleanup pending in the old_domain cpumask. If so, we return -EBUSY to the caller and let him deal with it. Though we have to be careful in the cpu unplug case. If the cleanout has not yet completed then the following setaffinity() call would return -EBUSY. Add code which prevents this. Full context is here: http://lkml.kernel.org/r/5653B688.4050809@stratus.com Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160107.207265407@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Call irq_force_move_complete with irq descriptorThomas Gleixner2016-01-153-7/+11
| | | | | | | | | | | | | | | | | First of all there is no point in looking up the irq descriptor again, but we also need the descriptor for the final cleanup race fix in the next patch. Make that change seperate. No functional difference. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160107.125211743@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Remove outgoing CPU from vector cleanup maskThomas Gleixner2016-01-151-2/+16
| | | | | | | | | | | | | | | | We want to synchronize new vector assignments with a pending cleanup. Remove a dying cpu from a pending cleanup mask. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160107.045961667@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Remove the cpumask allocation from send_cleanup_vector()Thomas Gleixner2016-01-151-13/+3
| | | | | | | | | | | | | | | | | There is no need to allocate a new cpumask for sending the cleanup vector. The old_domain mask is now protected by the vector_lock, so we can safely remove the offline cpus from it and send the IPI with the resulting mask. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.967993932@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Clear move_in_progress before sending cleanup IPIThomas Gleixner2016-01-151-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | send_cleanup_vector() fiddles with the old_domain mask unprotected because it relies on the protection by the move_in_progress flag. But this is fatal, as the flag is reset after the IPI has been sent. So a cpu which receives the IPI can still see the flag set and therefor ignores the cleanup request. If no other cleanup request happens then the vector stays stale on that cpu and in case of an irq removal the vector still persists. That can lead to use after free when the next cleanup IPI happens. Protect the code with vector_lock and clear move_in_progress before sending the IPI. This does not plug the race which Joe reported because: CPU0 CPU1 CPU2 lock_vector() data->move_in_progress=0 sendIPI() unlock_vector() set_affinity() assign_irq_vector() lock_vector() handle_IPI move_in_progress = 1 lock_vector() unlock_vector() move_in_progress == 1 The full fix comes with a later patch. Reported-and-tested-by: Joe Lawrence <joe.lawrence@stratus.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.892412198@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Remove offline cpus from vector cleanupThomas Gleixner2016-01-151-2/+6
| | | | | | | | | | | | | | | No point of keeping offline cpus in the cleanup mask. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.808642683@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Get rid of code duplicationThomas Gleixner2016-01-151-18/+15
| | | | | | | | | | | | | | | | | | Reusing an existing vector and assigning a new vector has duplicated code. Consolidate it. This is also a preparatory patch for finally plugging the cleanup race. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.721599216@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Copy vectormask instead of an AND operationThomas Gleixner2016-01-151-1/+1
| | | | | | | | | | | | | | | | | | In the case that the new vector mask is a subset of the existing mask there is no point to do a AND operation of currentmask & newmask. The result is newmask. So we can simply copy the new mask to the current mask and be done with it. Preparatory patch for further consolidation. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.640253454@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Check vector allocation earlyThomas Gleixner2016-01-151-13/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | __assign_irq_vector() uses the vector_cpumask which is assigned by apic->vector_allocation_domain() without doing basic sanity checks. That can result in a situation where the final assignement of a newly found vector fails in apic->cpu_mask_to_apicid_and(). So we have to do rollbacks for no reason. apic->cpu_mask_to_apicid_and() only fails if vector_cpumask & requested_cpumask & cpu_online_mask is empty. Check for this condition right away and if the result is empty try immediately the next possible cpu in the requested mask. So in case of a failure the old setting is unchanged and we can remove the rollback code. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.561877324@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Reorganize the search in assign_irq_vectorThomas Gleixner2016-01-151-8/+16
| | | | | | | | | | | | | | | | | | | Split out the code which advances the target cpu for the search so we can reuse it for the next patch which adds an early validation check for the vectormask which we get from the apic. Add comments while at it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.484562040@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Reorganize the return path in assign_irq_vectorThomas Gleixner2016-01-151-14/+8
| | | | | | | | | | | | | | | | | | Use an explicit goto for the cases where we have success in the search/update and return -ENOSPC if the search loop ends due to no space. Preparatory patch for fixes. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.403491024@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Do not use apic_chip_data.old_domain as temporary bufferJiang Liu2016-01-151-3/+5
| | | | | | | | | | | | | | | | | | | | Function __assign_irq_vector() makes use of apic_chip_data.old_domain as a temporary buffer, which is in the way of using apic_chip_data.old_domain for synchronizing the vector cleanup with the vector assignement code. Use a proper temporary cpumask for this. [ tglx: Renamed the mask to searched_cpumask for clarity ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/1450880014-11741-1-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Validate that irq descriptor is still activeThomas Gleixner2016-01-151-0/+9
| | | | | | | | | | | | | | | | | In fixup_irqs() we unconditionally dereference the irq chip of an irq descriptor. The descriptor might still be valid, but already cleaned up, i.e. the chip removed. Add a check for this condition. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: Borislav Petkov <bp@alien8.de> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/20151231160106.236423282@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Fix a race in x86_vector_free_irqs()Jiang Liu2016-01-151-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's a race condition between x86_vector_free_irqs() { free_apic_chip_data(irq_data->chip_data); xxxxx //irq_data->chip_data has been freed, but the pointer //hasn't been reset yet irq_domain_reset_irq_data(irq_data); } and smp_irq_move_cleanup_interrupt() { raw_spin_lock(&vector_lock); data = apic_chip_data(irq_desc_get_irq_data(desc)); access data->xxxx // may access freed memory raw_spin_unlock(&desc->lock); } which may cause smp_irq_move_cleanup_interrupt() to access freed memory. Call irq_domain_reset_irq_data(), which clears the pointer with vector lock held. [ tglx: Free memory outside of lock held region. ] Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Tested-by: Borislav Petkov <bp@alien8.de> Tested-by: Joe Lawrence <joe.lawrence@stratus.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: stable@vger.kernel.org #4.3+ Link: http://lkml.kernel.org/r/1450880014-11741-3-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/irq: Call chip->irq_set_affinity in proper contextThomas Gleixner2016-01-151-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | setup_ioapic_dest() calls irqchip->irq_set_affinity() completely unprotected. That's wrong in several aspects: - it opens a race window where irq_set_affinity() can be interrupted and the irq chip left in unconsistent state. - it triggers a lockdep splat when we fix the vector race for 4.3+ because vector lock is taken with interrupts enabled. The proper calling convention is irq descriptor lock held and interrupts disabled. Reported-and-tested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Cc: Jeremiah Mahler <jmmahler@gmail.com> Cc: andy.shevchenko@gmail.com Cc: Guenter Roeck <linux@roeck-us.net> Cc: Joe Lawrence <joe.lawrence@stratus.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1601140919420.3575@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/cpu/amd: Remove an unneeded condition in srat_detect_node()Dan Carpenter2016-01-141-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | Originally we calculated ht_nodeid as "ht_nodeid = apicid - boot_cpu_id;" so presumably it could be negative. But after commit: 01aaea1afbcd ('x86: introduce initial apicid') we use c->initial_apicid which is an unsigned short and thus always >= 0. It causes a static checker warning to test for impossible conditions so let's remove it. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> Cc: Borislav Petkov <bp@suse.de> Cc: Hector Marco-Gisbert <hecmargi@upv.es> Cc: Huang Rui <ray.huang@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yinghai Lu <yhlu.kernel@gmail.com> Link: http://lkml.kernel.org/r/20160113123940.GE19993@mwanda Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/vdso/pvclock: Protect STABLE check with the seqcountAndy Lutomirski2016-01-131-6/+6
| | | | | | | | | | | | | | | | | | | | | | If the clock becomes unstable while we're reading it, we need to bail. We can do this by simply moving the check into the seqcount loop. Reported-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Alexander Graf <agraf@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Radim Krcmar <rkrcmar@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/755dcedb17269e1d7ce12a9a713dea303835137e.1451949191.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm: Improve switch_mm() barrier commentsAndy Lutomirski2016-01-131-7/+8
| | | | | | | | | | | | | | | | | | | | | My previous comments were still a bit confusing and there was a typo. Fix it up. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 71b3c126e611 ("x86/mm: Add barriers and document switch_mm()-vs-flush synchronization") Link: http://lkml.kernel.org/r/0a0b43cdcdd241c5faaaecfbcc91a155ddedc9a1.1452631609.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* selftests/x86: Test __kernel_sigreturn and __kernel_rt_sigreturnAndy Lutomirski2016-01-132-1/+90
| | | | | | | | | | | | | | | | | | | | | | | | | | The vdso-based sigreturn mechanism is fragile and isn't used by modern glibc so, if we break it, we'll only notice when someone tests an unusual libc. Add an explicit selftest. [ I wrote this while debugging a Bionic breakage -- my first guess was that I had somehow messed up sigreturn. I've caused problems in that code before, and it's really easy to fail to notice it because there's nothing on a modern distro that needs vdso-based sigreturn. ] Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/32946d714156879cd8e5d8eab044cd07557ed558.1452628504.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/reboot/quirks: Add iMac10,1 to pci_reboot_dmi_table[]Mario Kleiner2016-01-121-0/+8
| | | | | | | | | | | | | | | | | | | | Without the reboot=pci method, the iMac 10,1 simply hangs after printing "Restarting system" at the point when it should reboot. This fixes it. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Cc: <stable@vger.kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1450466646-26663-1-git-send-email-mario.kleiner.de@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* lguest: Map switcher text R/ORusty Russell2016-01-122-24/+54
| | | | | | | | | | | | | | | | | | | | | | | Pavel noted that lguest maps the switcher code executable and read-write. This is a bad idea for any kernel text, but particularly for text mapped at a fixed address. Create two vmas, one for the text (PAGE_KERNEL_RX) and another for the stacks (PAGE_KERNEL). Use VM_NO_GUARD to map them adjacent (as expected by the rest of the code). Reported-by: Pavel Machek <pavel@ucw.cz> Tested-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/boot: Hide local labels in verify_cpu()Borislav Petkov2016-01-121-25/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... from the final ELF image's symbol table as they're not really needed there. Before: $ readelf -a vmlinux | grep verify_cpu 43: ffffffff810001a9 0 NOTYPE LOCAL DEFAULT 1 verify_cpu 45: ffffffff8100028f 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_no_longmode 46: ffffffff810001de 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_noamd 47: ffffffff8100022b 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_check 48: ffffffff8100021c 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_clear_xd 49: ffffffff81000263 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_sse_test 50: ffffffff81000296 0 NOTYPE LOCAL DEFAULT 1 verify_cpu_sse_ok After: $ readelf -a vmlinux | grep verify_cpu 43: ffffffff810001a9 0 NOTYPE LOCAL DEFAULT 1 verify_cpu No functionality change. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1451860733-21163-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/fpu: Disable AVX when eagerfpu is offyu-cheng yu2016-01-122-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When "eagerfpu=off" is given as a command-line input, the kernel should disable AVX support. The Task Switched bit used for lazy context switching does not support AVX. If AVX is enabled without eagerfpu context switching, one task's AVX state could become corrupted or leak to other tasks. This is a bug and has bad security implications. This only affects systems that have AVX/AVX2/AVX512 and this issue will be found only when one actually uses AVX/AVX2/AVX512 _AND_ does eagerfpu=off. Reference: Intel Software Developer's Manual Vol. 3A Sec. 2.5 Control Registers: TS Task Switched bit (bit 3 of CR0) -- Allows the saving of the x87 FPU/ MMX/SSE/SSE2/SSE3/SSSE3/SSE4 context on a task switch to be delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task. Sec. 13.4.1 Using the TS Flag to Control the Saving of the X87 FPU and SSE State When the TS flag is set, the processor monitors the instruction stream for x87 FPU, MMX, SSE instructions. When the processor detects one of these instructions, it raises a device-not-available exeception (#NM) prior to executing the instruction. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-5-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/fpu: Disable MPX when eagerfpu is offyu-cheng yu2016-01-123-14/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This issue is a fallout from the command-line parsing move. When "eagerfpu=off" is given as a command-line input, the kernel should disable MPX support. The decision for turning off MPX was made in fpu__init_system_ctx_switch(), which is after the selection of the XSAVE format. This patch fixes it by getting that decision done earlier in fpu__init_system_xstate(). Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-4-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/fpu: Disable XGETBV1 when no XSAVEyu-cheng yu2016-01-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When "noxsave" is given as a command-line input, the kernel should disable XGETBV1. This issue currently does not cause any actual problems. XGETBV1 is only useful if we have something using the 'init optimization' (i.e. xsaveopt, xsaves). We already clear both of those in fpu__xstate_clear_all_cpu_caps(). But this is good for completeness. Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Reviewed-by: Dave Hansen <dave.hansen@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-3-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/fpu: Fix early FPU command-line parsingyu-cheng yu2016-01-121-71/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function fpu__init_system() is executed before parse_early_param(). This causes wrong FPU configuration. This patch fixes this issue by parsing boot_command_line in the beginning of fpu__init_system(). With all four patches in this series, each parameter disables features as the following: eagerfpu=off: eagerfpu, avx, avx2, avx512, mpx no387: fpu nofxsr: fxsr, fxsropt, xmm noxsave: xsave, xsaveopt, xsaves, xsavec, avx, avx2, avx512, mpx, xgetbv1 noxsaveopt: xsaveopt noxsaves: xsaves Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Ravi V. Shankar <ravi.v.shankar@intel.com> Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: yu-cheng yu <yu-cheng.yu@intel.com> Link: http://lkml.kernel.org/r/1452119094-7252-2-git-send-email-yu-cheng.yu@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm: Use PAGE_ALIGNED instead of IS_ALIGNEDKefeng Wang2016-01-121-2/+1
| | | | | | | | | | | | | Use PAGE_ALIGEND macro in <linux/mm.h> to simplify code. Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: <guohanjun@huawei.com> Cc: Alexander Kuleshov <kuleshovmail@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1452565170-11083-1-git-send-email-wangkefeng.wang@huawei.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* selftests/x86: Disable the ldt_gdt_64 test for nowAndy Lutomirski2016-01-121-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ldt_gdt.c relies on cross-cpu invalidation of SS to do one of its tests. On 32-bit builds, this works fine, but on 64-bit builds, it only works if the kernel has proper SS sigcontext handling for 64-bit user programs. Since the SS fixes are currently reverted, restrict the test case to 32 bits for now. In principle, I could change the test to use a different segment register, but it would be messy: CS can't point to the LDT for 64-bit code, and the other registers don't result in immediate faults because they aren't reloaded on kernel -> user transitions. When we fix sigcontext (in 4.6?), we can revert this. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/231591d9122d282402d8f53175134f8db5b3bc73.1452561752.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/pat: Make split_page_count() check for empty levels to fix ↵Dave Jones2016-01-121-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | /proc/meminfo output In CONFIG_PAGEALLOC_DEBUG=y builds, we disable 2M pages. Unfortunatly when we split up mappings during boot, split_page_count() doesn't take this into account, and starts decrementing an empty direct_pages_count[] level. This results in /proc/meminfo showing crazy things like: DirectMap2M: 18446744073709543424 kB Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge commit 'linus' into x86/urgent, to pick up recent x86 changesIngo Molnar2016-01-12576-5089/+20459
|\ | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds2016-01-116-3/+107
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 platform updates from Ingo Molnar: "Two changes: - one to quirk-save/restore certain system MSRs across suspend/resume, to make certain Intel systems work better (Chen Yu) - and also to constify a read only structure (Julia Lawall)" * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/calgary: Constify cal_chipset_ops structures x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume
| | * x86/platform/calgary: Constify cal_chipset_ops structuresJulia Lawall2015-11-292-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The cal_chipset_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Cc: Borislav Petkov <bp@alien8.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jon D. Mason <jdmason@kudzu.us> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Muli Ben-Yehuda <muli@il.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448726295-10959-1-git-send-email-Julia.Lawall@lip6.fr Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * x86/pm: Introduce quirk framework to save/restore extra MSR registers around ↵Chen Yu2015-11-264-0/+104
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | suspend/resume A bug was reported that on certain Broadwell platforms, after resuming from S3, the CPU is running at an anomalously low speed. It turns out that the BIOS has modified the value of the THERM_CONTROL register during S3, and changed it from 0 to 0x10, thus enabled clock modulation(bit4), but with undefined CPU Duty Cycle(bit1:3) - which causes the problem. Here is a simple scenario to reproduce the issue: 1. Boot up the system 2. Get MSR 0x19a, it should be 0 3. Put the system into sleep, then wake it up 4. Get MSR 0x19a, it shows 0x10, while it should be 0 Although some BIOSen want to change the CPU Duty Cycle during S3, in our case we don't want the BIOS to do any modification. Fix this issue by introducing a more generic x86 framework to save/restore specified MSR registers(THERM_CONTROL in this case) for suspend/resume. This allows us to fix similar bugs in a much simpler way in the future. When the kernel wants to protect certain MSRs during suspending, we simply add a quirk entry in msr_save_dmi_table, and customize the MSR registers inside the quirk callback, for example: u32 msr_id_need_to_save[] = {MSR_ID0, MSR_ID1, MSR_ID2...}; and the quirk mechanism ensures that, once resumed from suspend, the MSRs indicated by these IDs will be restored to their original, pre-suspend values. Since both 64-bit and 32-bit kernels are affected, this patch covers the common 64/32-bit suspend/resume code path. And because the MSRs specified by the user might not be available or readable in any situation, we use rdmsrl_safe() to safely save these MSRs. Reported-and-tested-by: Marcin Kaszewski <marcin.kaszewski@intel.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: bp@suse.de Cc: len.brown@intel.com Cc: linux@horizon.com Cc: luto@kernel.org Cc: rjw@rjwysocki.net Link: http://lkml.kernel.org/r/c9abdcbc173dd2f57e8990e304376f19287e92ba.1448382971.git.yu.c.chen@intel.com [ More edits to the naming of data structures. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds2016-01-1118-54/+146
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 mm updates from Ingo Molnar: "The main changes in this cycle were: - make the debugfs 'kernel_page_tables' file read-only, as it only has read ops. (Borislav Petkov) - micro-optimize clflush_cache_range() (Chris Wilson) - swiotlb enhancements, which fixes certain KVM emulated devices (Igor Mammedov) - fix an LDT related debug message (Jan Beulich) - modularize CONFIG_X86_PTDUMP (Kees Cook) - tone down an overly alarming warning (Laura Abbott) - Mark variable __initdata (Rasmus Villemoes) - PAT additions (Toshi Kani)" * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mm: Micro-optimise clflush_cache_range() x86/mm/pat: Change free_memtype() to support shrinking case x86/mm/pat: Add untrack_pfn_moved for mremap x86/mm: Drop WARN from multi-BAR check x86/LDT: Print the real LDT base address x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFN x86/mm: Introduce max_possible_pfn x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-only x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() as __initdata x86/mm: Turn CONFIG_X86_PTDUMP into a module
| | * | x86/mm: Micro-optimise clflush_cache_range()Chris Wilson2016-01-081-4/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whilst inspecting the asm for clflush_cache_range() and some perf profiles that required extensive flushing of single cachelines (from part of the intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading boot_cpu_data.x86_clflush_size on every iteration of the loop. We can manually hoist that read which perf regarded as taking ~25% of the function time for a single cacheline flush. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Borislav Petkov <bp@suse.de> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Sai Praneeth <sai.praneeth.prakhya@intel.com> Link: http://lkml.kernel.org/r/1452246933-10890-1-git-send-email-chris@chris-wilson.co.uk Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | x86/mm/pat: Change free_memtype() to support shrinking caseToshi Kani2016-01-052-10/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using mremap() to shrink the map size of a VM_PFNMAP range causes the following error message, and leaves the pfn range allocated. x86/PAT: test:3493 freeing invalid memtype [mem 0x483200000-0x4863fffff] This is because rbt_memtype_erase(), called from free_memtype() with spin_lock held, only supports to free a whole memtype node in memtype_rbroot. Therefore, this patch changes rbt_memtype_erase() to support a request that shrinks the size of a memtype node for mremap(). memtype_rb_exact_match() is renamed to memtype_rb_match(), and is enhanced to support EXACT_MATCH and END_MATCH in @match_type. Since the memtype_rbroot tree allows overlapping ranges, rbt_memtype_erase() checks with EXACT_MATCH first, i.e. free a whole node for the munmap case. If no such entry is found, it then checks with END_MATCH, i.e. shrink the size of a node from the end for the mremap case. On the mremap case, rbt_memtype_erase() proceeds in two steps, 1) remove the node, and then 2) insert the updated node. This allows proper update of augmented values, subtree_max_end, in the tree. Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: stsp@list.ru Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1450832064-10093-3-git-send-email-toshi.kani@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | x86/mm/pat: Add untrack_pfn_moved for mremapToshi Kani2016-01-053-1/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | mremap() with MREMAP_FIXED on a VM_PFNMAP range causes the following WARN_ON_ONCE() message in untrack_pfn(). WARNING: CPU: 1 PID: 3493 at arch/x86/mm/pat.c:985 untrack_pfn+0xbd/0xd0() Call Trace: [<ffffffff817729ea>] dump_stack+0x45/0x57 [<ffffffff8109e4b6>] warn_slowpath_common+0x86/0xc0 [<ffffffff8109e5ea>] warn_slowpath_null+0x1a/0x20 [<ffffffff8106a88d>] untrack_pfn+0xbd/0xd0 [<ffffffff811d2d5e>] unmap_single_vma+0x80e/0x860 [<ffffffff811d3725>] unmap_vmas+0x55/0xb0 [<ffffffff811d916c>] unmap_region+0xac/0x120 [<ffffffff811db86a>] do_munmap+0x28a/0x460 [<ffffffff811dec33>] move_vma+0x1b3/0x2e0 [<ffffffff811df113>] SyS_mremap+0x3b3/0x510 [<ffffffff817793ee>] entry_SYSCALL_64_fastpath+0x12/0x71 MREMAP_FIXED moves a pfnmap from old vma to new vma. untrack_pfn() is called with the old vma after its pfnmap page table has been removed, which causes follow_phys() to fail. The new vma has a new pfnmap to the same pfn & cache type with VM_PAT set. Therefore, we only need to clear VM_PAT from the old vma in this case. Add untrack_pfn_moved(), which clears VM_PAT from a given old vma. move_vma() is changed to call this function with the old vma when VM_PFNMAP is set. move_vma() then calls do_munmap(), and untrack_pfn() is a no-op since VM_PAT is cleared. Reported-by: Stas Sergeev <stsp@list.ru> Signed-off-by: Toshi Kani <toshi.kani@hpe.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Borislav Petkov <bp@suse.de> Cc: linux-mm@kvack.org Link: http://lkml.kernel.org/r/1450832064-10093-2-git-send-email-toshi.kani@hpe.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | x86/mm: Drop WARN from multi-BAR checkLaura Abbott2015-12-291-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ioremapping multiple BARs produces a warning with a message "Your kernel is fine". This message mostly serves to comfort kernel developers. Users do not read the message, they only see the big scary warning which means something must be horribly broken with their system. Less dramatically, the warn also sets the taint flag which makes it difficult to differentiate problems. If the kernel is actually fine as the warning claims it doesn't make sense for it to be tainted. Change the WARN_ONCE to a pr_warn with the caller of the ioremap. Signed-off-by: Laura Abbott <labbott@fedoraproject.org> Link: http://lkml.kernel.org/r/1450728074-31029-1-git-send-email-labbott@fedoraproject.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | x86/LDT: Print the real LDT base addressJan Beulich2015-12-291-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This was meant to print base address and entry count; make it do so again. Fixes: 37868fe113ff "x86/ldt: Make modify_ldt synchronous" Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Andy Lutomirski <luto@kernel.org> Link: http://lkml.kernel.org/r/56797D8402000078000C24F0@prv-mh.provo.novell.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | * | x86/mm/64: Enable SWIOTLB if system has SRAT memory regions above MAX_DMA32_PFNIgor Mammedov2015-12-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when memory hotplug enabled system is booted with less than 4GB of RAM and then later more RAM is hotplugged 32-bit devices stop functioning with following error: nommu_map_single: overflow 327b4f8c0+1522 of device mask ffffffff the reason for this is that if x86_64 system were booted with RAM less than 4GB, it doesn't enable SWIOTLB and when memory is hotplugged beyond MAX_DMA32_PFN, devices that expect 32-bit addresses can't handle 64-bit addresses. Fix it by tracking max possible PFN when parsing memory affinity structures from SRAT ACPI table and enable SWIOTLB if there is hotpluggable memory regions beyond MAX_DMA32_PFN. It fixes KVM guests when they use emulated devices (reproduces with ata_piix, e1000 and usb devices, RHBZ: 1275941, 1275977, 1271527) It also fixes the HyperV, VMWare with emulated devices which are affected by this issue as well. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: fujita.tomonori@lab.ntt.co.jp Cc: konrad.wilk@oracle.com Cc: pbonzini@redhat.com Cc: revers@redhat.com Cc: riel@redhat.com Link: http://lkml.kernel.org/r/1449234426-273049-3-git-send-email-imammedo@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | x86/mm: Introduce max_possible_pfnIgor Mammedov2015-12-065-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | max_possible_pfn will be used for tracking max possible PFN for memory that isn't present in E820 table and could be hotplugged later. By default max_possible_pfn is initialized with max_pfn, but later it could be updated with highest PFN of hotpluggable memory ranges declared in ACPI SRAT table if any present. Signed-off-by: Igor Mammedov <imammedo@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: akataria@vmware.com Cc: fujita.tomonori@lab.ntt.co.jp Cc: konrad.wilk@oracle.com Cc: pbonzini@redhat.com Cc: revers@redhat.com Cc: riel@redhat.com Link: http://lkml.kernel.org/r/1449234426-273049-2-git-send-email-imammedo@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | x86/mm/ptdump: Make (debugfs)/kernel_page_tables read-onlyBorislav Petkov2015-12-041-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | File should be created with S_IRUSR and not with S_IWUSR too because writing to it doesn't make any sense. I mean, we don't have a ->write method anyway but let's have the permissions correct too. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1448885579-32506-1-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | x86/mm/mtrr: Mark the 'range_new' static variable in mtrr_calc_range_state() ↵Rasmus Villemoes2015-12-041-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | as __initdata 'range_new' doesn't seem to be used after init. It is only passed to memset(), sum_ranges(), memcmp() and x86_get_mtrr_mem_range(), the latter of which also only passes it on to various *range* library functions. So mark it __initdata to free up an extra page after init. Its contents are wiped at every call to mtrr_calc_range_state(), so it being static is not about preserving state between calls, but simply to avoid a 4k+ stack frame. While there, add a comment explaining this and why it's safe. We could also mark nr_range_new as __initdata, but since it's just a single int and also doesn't carry state between calls (it is unconditionally assigned to before it is read), we might as well make it an ordinary automatic variable. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hp.com> Link: http://lkml.kernel.org/r/1449002691-20783-1-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | x86/mm: Turn CONFIG_X86_PTDUMP into a moduleKees Cook2015-11-234-33/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Being able to examine page tables is handy, so make this a module that can be loaded as needed. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Toshi Kani <toshi.kani@hpe.com> Cc: Vladimir Murzin <vladimir.murzin@arm.com> Cc: Will Deacon <will.deacon@arm.com> Link: http://lkml.kernel.org/r/20151120010755.GA9060@www.outflux.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | Merge branch 'x86-fpu-for-linus' of ↵Linus Torvalds2016-01-113-94/+87
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Ingo Molnar: "This cleans up the FPU fault handling methods to be more robust, and moves eligible variables to .init.data" * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Put a few variables in .init.data x86/fpu: Get rid of xstate_fault() x86/fpu: Add an XSTATE_OP() macro
| | * | | x86/fpu: Put a few variables in .init.dataRasmus Villemoes2015-11-272-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These are clearly just used during init. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1447424312-26400-1-git-send-email-linux@rasmusvillemoes.dk Signed-off-by: Ingo Molnar <mingo@kernel.org>
OpenPOWER on IntegriCloud