summaryrefslogtreecommitdiffstats
path: root/arch/x86
Commit message (Collapse)AuthorAgeFilesLines
* Merge 3.9-rc4 into driver-core-nextGreg Kroah-Hartman2013-01-174-9/+109
|\ | | | | | | | | | | | | This is to fix up a build problem with a wireless driver due to the dynamic-debug patches in this branch. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * Merge branch 'x86/urgent' of ↵Linus Torvalds2013-01-162-1/+81
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This is mainly a workaround for a bug in Sandy Bridge graphics which causes corruption of certain memory pages." * 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCI x86/Sandy Bridge: mark arrays in __init functions as __initconst x86/Sandy Bridge: reserve pages when integrated graphics is present x86, efi: correct precedence of operators in setup_efi_pci
| | * x86/Sandy Bridge: Sandy Bridge workaround depends on CONFIG_PCIH. Peter Anvin2013-01-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | early_pci_allowed() and read_pci_config_16() are only available if CONFIG_PCI is defined. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * x86/Sandy Bridge: mark arrays in __init functions as __initconstH. Peter Anvin2013-01-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | Mark static arrays as __initconst so they get removed when the init sections are flushed. Reported-by: Mathias Krause <minipli@googlemail.com> Link: http://lkml.kernel.org/r/75F4BEE6-CB0E-4426-B40B-697451677738@googlemail.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| | * x86/Sandy Bridge: reserve pages when integrated graphics is presentJesse Barnes2013-01-111-0/+78
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | SNB graphics devices have a bug that prevent them from accessing certain memory ranges, namely anything below 1M and in the pages listed in the table. So reserve those at boot if set detect a SNB gfx device on the CPU to avoid GPU hangs. Stephane Marchesin had a similar patch to the page allocator awhile back, but rather than reserving pages up front, it leaked them at allocation time. [ hpa: made a number of stylistic changes, marked arrays as static const, and made less verbose; use "memblock=debug" for full verbosity. ] Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| | * x86, efi: correct precedence of operators in setup_efi_pciSasha Levin2012-12-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the current code, the condition in the if() doesn't make much sense due to precedence of operators. Signed-off-by: Sasha Levin <sasha.levin@oracle.com> Link: http://lkml.kernel.org/r/1356030701-16284-25-git-send-email-sasha.levin@oracle.com Cc: Matt Fleming <matt.fleming@intel.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
| * | Merge git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2013-01-102-8/+28
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull KVM bugfixes from Marcelo Tosatti. * git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86: use dynamic percpu allocations for shared msrs area KVM: PPC: Book3S HV: Fix compilation without CONFIG_PPC_POWERNV powerpc: Corrected include header path in kvm_para.h Add rcu user eqs exception hooks for async page fault
| | * | KVM: x86: use dynamic percpu allocations for shared msrs areaMarcelo Tosatti2013-01-081-6/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use dynamic percpu allocations for the shared msrs structure, to avoid using the limited reserved percpu space. Reviewed-by: Gleb Natapov <gleb@redhat.com> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
| | * | Add rcu user eqs exception hooks for async page faultLi Zhong2012-12-181-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds user eqs exception hooks for async page fault page not present code path, to exit the user eqs and re-enter it as necessary. Async page fault is different from other exceptions that it may be triggered from idle process, so we still need rcu_irq_enter() and rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code that needs use rcu. As Frederic pointed out it would be safest and simplest to protect the whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the code there deeply for potential RCU uses and ensure it will never be extended later to use RCU.". However, We'd better re-enter the cpu idle eqs if we get the exception in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt(). So the patch does what Frederic suggested for rcu_irq_*() API usage here, except that I moved the rcu_irq_*() pair originally in do_async_page_fault() into kvm_async_pf_task_wait(). That's because, I think it's better to have rcu_irq_*() pairs to be in one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here, kvm_async_pf_task_wait() has other callers, which might cause rcu_irq_exit() be called without a matching rcu_irq_enter() before it, which is illegal if the cpu happens to be in rcu idle state. Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Gleb Natapov <gleb@redhat.com>
* | | | arch/x86/um: remove depends on CONFIG_EXPERIMENTALKees Cook2013-01-111-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: Jeff Dike <jdike@addtoit.com> CC: Richard Weinberger <richard@nod.at> CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Richard Weinberger <richard@nod.at>
* | | | arch/x86: remove depends on CONFIG_EXPERIMENTALKees Cook2013-01-111-12/+10
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The CONFIG_EXPERIMENTAL config item has not carried much meaning for a while now and is almost always enabled by default. As agreed during the Linux kernel summit, remove it from any "depends on" lines in Kconfigs. CC: Thomas Gleixner <tglx@linutronix.de> CC: Ingo Molnar <mingo@redhat.com> CC: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Kees Cook <keescook@chromium.org> Acked-by: Ingo Molnar <mingo@kernel.org>
* | | X86: drivers: remove __dev* attributes.Greg Kroah-Hartman2013-01-0321-89/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CONFIG_HOTPLUG is going away as an option. As a result, the __dev* markings need to be removed. This change removes the use of __devinit, __devexit_p, __devinitconst, and __devexit from these drivers. Based on patches originally written by Bill Pemberton, but redone by me in order to handle some of the coding style issues better, by hand. Cc: Bill Pemberton <wfp5p@virginia.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Daniel Drake <dsd@laptop.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | | PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)Myron Stowe2012-12-261-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 284f5f9 was intended to disable the "only_one_child()" optimization on Stratus ftServer systems, but its DMI check is wrong. It looks for DMI_SYS_VENDOR that contains "ftServer", when it should look for DMI_SYS_VENDOR containing "Stratus" and DMI_PRODUCT_NAME containing "ftServer". Tested on Stratus ftServer 6400. Reported-by: Fadeeva Marina <astarta@rat.ru> Reference: https://bugzilla.kernel.org/show_bug.cgi?id=51331 Signed-off-by: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org # v3.5+
* | | Merge branch 'for-linus' of ↵Linus Torvalds2012-12-2019-118/+24
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal Pull signal handling cleanups from Al Viro: "sigaltstack infrastructure + conversion for x86, alpha and um, COMPAT_SYSCALL_DEFINE infrastructure. Note that there are several conflicts between "unify SS_ONSTACK/SS_DISABLE definitions" and UAPI patches in mainline; resolution is trivial - just remove definitions of SS_ONSTACK and SS_DISABLED from arch/*/uapi/asm/signal.h; they are all identical and include/uapi/linux/signal.h contains the unified variant." Fixed up conflicts as per Al. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: alpha: switch to generic sigaltstack new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to those generic compat_sys_sigaltstack() introduce generic sys_sigaltstack(), switch x86 and um to it new helper: compat_user_stack_pointer() new helper: restore_altstack() unify SS_ONSTACK/SS_DISABLE definitions new helper: current_user_stack_pointer() missing user_stack_pointer() instances Bury the conditionals from kernel_thread/kernel_execve series COMPAT_SYSCALL_DEFINE: infrastructure
| * | new helpers: __save_altstack/__compat_save_altstack, switch x86 and um to thoseAl Viro2012-12-193-25/+7
| | | | | | | | | | | | | | | | | | note that they are relying on access_ok() already checked by caller. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | generic compat_sys_sigaltstack()Al Viro2012-12-198-67/+6
| | | | | | | | | | | | | | | | | | Again, conditional on CONFIG_GENERIC_SIGALTSTACK Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | introduce generic sys_sigaltstack(), switch x86 and um to itAl Viro2012-12-1910-16/+4
| | | | | | | | | | | | | | | | | | | | | Conditional on CONFIG_GENERIC_SIGALTSTACK; architectures that do not select it are completely unaffected Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | new helper: compat_user_stack_pointer()Al Viro2012-12-191-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compat counterpart of current_user_stack_pointer(); for most of the biarch architectures those two are identical, but e.g. arm64 and arm use different registers for stack pointer... Note that amd64 variants of current_user_stack_pointer/compat_user_stack_pointer do *not* rely on pt_regs having been through FIXUP_TOP_OF_STACK. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | unify SS_ONSTACK/SS_DISABLE definitionsAl Viro2012-12-191-6/+0
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | missing user_stack_pointer() instancesAl Viro2012-12-191-0/+1
| | | | | | | | | | | | | | | | | | | | | for the architectures that have usp in pt_regs and do not have user_stack_pointer() already defined. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | Bury the conditionals from kernel_thread/kernel_execve seriesAl Viro2012-12-193-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All architectures have CONFIG_GENERIC_KERNEL_THREAD CONFIG_GENERIC_KERNEL_EXECVE __ARCH_WANT_SYS_EXECVE None of them have __ARCH_WANT_KERNEL_EXECVE and there are only two callers of kernel_execve() (which is a trivial wrapper for do_execve() now) left. Kill the conditionals and make both callers use do_execve(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | Merge tag 'iommu-updates-v3.8' of ↵Linus Torvalds2012-12-201-0/+1
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: "A few new features this merge-window. The most important one is probably, that dma-debug now warns if a dma-handle is not checked with dma_mapping_error by the device driver. This requires minor changes to some architectures which make use of dma-debug. Most of these changes have the respective Acks by the Arch-Maintainers. Besides that there are updates to the AMD IOMMU driver for refactor the IOMMU-Groups support and to make sure it does not trigger a hardware erratum. The OMAP changes (for which I pulled in a branch from Tony Lindgren's tree) have a conflict in linux-next with the arm-soc tree. The conflict is in the file arch/arm/mach-omap2/clock44xx_data.c which is deleted in the arm-soc tree. It is safe to delete the file too so solve the conflict. Similar changes are done in the arm-soc tree in the common clock framework migration. A missing hunk from the patch in the IOMMU tree will be submitted as a seperate patch when the merge-window is closed." * tag 'iommu-updates-v3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (29 commits) ARM: dma-mapping: support debug_dma_mapping_error ARM: OMAP4: hwmod data: ipu and dsp to use parent clocks instead of leaf clocks iommu/omap: Adapt to runtime pm iommu/omap: Migrate to hwmod framework iommu/omap: Keep mmu enabled when requested iommu/omap: Remove redundant clock handling on ISR iommu/amd: Remove obsolete comment iommu/amd: Don't use 512GB pages iommu/tegra: smmu: Move bus_set_iommu after probe for multi arch iommu/tegra: gart: Move bus_set_iommu after probe for multi arch iommu/tegra: smmu: Remove unnecessary PTC/TLB flush all tile: dma_debug: add debug_dma_mapping_error support sh: dma_debug: add debug_dma_mapping_error support powerpc: dma_debug: add debug_dma_mapping_error support mips: dma_debug: add debug_dma_mapping_error support microblaze: dma-mapping: support debug_dma_mapping_error ia64: dma_debug: add debug_dma_mapping_error support c6x: dma_debug: add debug_dma_mapping_error support ARM64: dma_debug: add debug_dma_mapping_error support intel-iommu: Prevent devices with RMRRs from being placed into SI Domain ...
| | \ \
| | \ \
| | \ \
| | \ \
| | \ \
| | \ \
| *-----. \ \ Merge branches 'iommu/fixes', 'dma-debug', 'x86/amd', 'x86/vt-d', ↵Joerg Roedel2012-12-161-0/+1
| |\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | 'arm/tegra' and 'arm/omap' into next
| | * | | | | | dma-debug: New interfaces to debug dma mapping errorsShuah Khan2012-10-241-0/+1
| | |/ / / / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add dma-debug interface debug_dma_mapping_error() to debug drivers that fail to check dma mapping errors on addresses returned by dma_map_single() and dma_map_page() interfaces. This interface clears a flag set by debug_dma_map_page() to indicate that dma_mapping_error() has been called by the driver. When driver does unmap, debug_dma_unmap() checks the flag and if this flag is still set, prints warning message that includes call trace that leads up to the unmap. This interface can be called from dma_mapping_error() routines to enable dma mapping error check debugging. Tested: Intel iommu and swiotlb (iommu=soft) on x86-64 with CONFIG_DMA_API_DEBUG enabled and disabled. Signed-off-by: Shuah Khan <shuah.khan@hp.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
* | | | | | | Merge branch 'x86/nuke386' of ↵Linus Torvalds2012-12-193-52/+1
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull one final 386 removal patch from Peter Anvin. IRQ 13 FPU error handling is gone. That was not one of the proudest moments in PC history. * 'x86/nuke386' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, 386 removal: Remove support for IRQ 13 FPU error reporting
| * | | | | | | x86, 386 removal: Remove support for IRQ 13 FPU error reportingH. Peter Anvin2012-12-173-52/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove support for FPU error reporting via IRQ 13, as opposed to exception 16 (#MF). One last remnant of i386 gone. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Cc: Alan Cox <alan@linux.intel.com>
* | | | | | | | Merge tag 'modules-next-for-linus' of ↵Linus Torvalds2012-12-192-0/+2
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux Pull module update from Rusty Russell: "Nothing all that exciting; a new module-from-fd syscall for those who want to verify the source of the module (ChromeOS) and/or use standard IMA on it or other security hooks." * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: MODSIGN: Fix kbuild output when using default extra_certificates MODSIGN: Avoid using .incbin in C source modules: don't hand 0 to vmalloc. module: Remove a extra null character at the top of module->strtab. ASN.1: Use the ASN1_LONG_TAG and ASN1_INDEFINITE_LENGTH constants ASN.1: Define indefinite length marker constant moduleparam: use __UNIQUE_ID() __UNIQUE_ID() MODSIGN: Add modules_sign make target powerpc: add finit_module syscall. ima: support new kernel module syscall add finit_module syscall to asm-generic ARM: add finit_module syscall to ARM security: introduce kernel_module_from_file hook module: add flags arg to sys_finit_module() module: add syscall to load module from fd
| * | | | | | | | module: add syscall to load module from fdKees Cook2012-12-142-0/+2
| | |/ / / / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As part of the effort to create a stronger boundary between root and kernel, Chrome OS wants to be able to enforce that kernel modules are being loaded only from our read-only crypto-hash verified (dm_verity) root filesystem. Since the init_module syscall hands the kernel a module as a memory blob, no reasoning about the origin of the blob can be made. Earlier proposals for appending signatures to kernel modules would not be useful in Chrome OS, since it would involve adding an additional set of keys to our kernel and builds for no good reason: we already trust the contents of our root filesystem. We don't need to verify those kernel modules a second time. Having to do signature checking on module loading would slow us down and be redundant. All we need to know is where a module is coming from so we can say yes/no to loading it. If a file descriptor is used as the source of a kernel module, many more things can be reasoned about. In Chrome OS's case, we could enforce that the module lives on the filesystem we expect it to live on. In the case of IMA (or other LSMs), it would be possible, for example, to examine extended attributes that may contain signatures over the contents of the module. This introduces a new syscall (on x86), similar to init_module, that has only two arguments. The first argument is used as a file descriptor to the module and the second argument is a pointer to the NULL terminated string of module arguments. Signed-off-by: Kees Cook <keescook@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (merge fixes)
* | | | | | | | Merge branch 'akpm' (more patches from Andrew)Linus Torvalds2012-12-181-10/+57
|\ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge patches from Andrew Morton: "Most of the rest of MM, plus a few dribs and drabs. I still have quite a few irritating patches left around: ones with dubious testing results, lack of review, ones which should have gone via maintainer trees but the maintainers are slack, etc. I need to be more activist in getting these things wrapped up outside the merge window, but they're such a PITA." * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (48 commits) mm/vmscan.c: avoid possible deadlock caused by too_many_isolated() vmscan: comment too_many_isolated() mm/kmemleak.c: remove obsolete simple_strtoul mm/memory_hotplug.c: improve comments mm/hugetlb: create hugetlb cgroup file in hugetlb_init mm/mprotect.c: coding-style cleanups Documentation: ABI: /sys/devices/system/node/ slub: drop mutex before deleting sysfs entry memcg: add comments clarifying aspects of cache attribute propagation kmem: add slab-specific documentation about the kmem controller slub: slub-specific propagation changes slab: propagate tunable values memcg: aggregate memcg cache values in slabinfo memcg/sl[au]b: shrink dead caches memcg/sl[au]b: track all the memcg children of a kmem_cache memcg: destroy memcg caches sl[au]b: allocate objects from memcg cache sl[au]b: always get the cache from its page in kmem_cache_free() memcg: skip memcg kmem allocations in specified code regions memcg: infrastructure to match an allocation to the right cache ...
| * | | | | | | | arch/x86/platform/iris/iris.c: register a platform device and a platform driverShérab2012-12-181-10/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This makes the iris driver use the platform API, so it is properly exposed in /sys. [akpm@linux-foundation.org: remove commented-out code, add missing space to printk, clean up code layout] Signed-off-by: Shérab <Sebastien.Hinderer@ens-lyon.org> Cc: Len Brown <lenb@kernel.org> Cc: Matthew Garrett <mjg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | Merge branch 'release' of ↵Linus Torvalds2012-12-181-0/+37
|\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull powertool update from Len Brown: "This updates the tree w/ the latest version of turbostat, which reports temperature and - on SNB and later - Watts." Fix up semantic merge conflict as per Len. * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools: Allow tools to be installed in a user specified location tools/power: turbostat: make Makefile a bit more capable tools/power x86_energy_perf_policy: close /proc/stat in for_every_cpu() tools/power turbostat: v3.0: monitor Watts and Temperature tools/power turbostat: fix output buffering issue tools/power turbostat: prevent infinite loop on migration error path x86 power: define RAPL MSRs tools/power/x86/turbostat: share kernel MSR #defines
| * | | | | | | | | x86 power: define RAPL MSRsLen Brown2012-11-231-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The Run Time Average Power Limiting interface is currently model specific, present on Sandy Bridge and Ivy Bridge processors. These #defines correspond to documentation in the latest "Intel® 64 and IA-32 Architectures Software Developer Manual", plus some typos in that document corrected. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org
| * | | | | | | | | tools/power/x86/turbostat: share kernel MSR #definesLen Brown2012-11-231-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that turbostat is built in the kernel tree, it can share MSR #defines with the kernel. Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org
* | | | | | | | | | Merge tag 'stable/for-linus-3.8-rc0-bugfix-tag' of ↵Linus Torvalds2012-12-182-4/+5
|\ \ \ \ \ \ \ \ \ \ | |_|/ / / / / / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen Pull Xen bugfixes from Konrad Rzeszutek Wilk: "Two fixes. One of them is caused by the recent change introduced by the 'x86-bsp-hotplug-for-linus' tip tree that inhibited bootup (old function does not do what it used to do). The other one is just a vanilla bug. - Fix to bootup regression introduced by 'x86-bsp-hotplug-for-linus' tip branch. - Fix to vcpu hotplug code." * tag 'stable/for-linus-3.8-rc0-bugfix-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/vcpu: Fix vcpu restore path. xen: Add EVTCHNOP_reset in Xen interface header files. xen/smp: Use smp_store_boot_cpu_info() to store cpu info for BSP during boot time.
| * | | | | | | | | xen/vcpu: Fix vcpu restore path.Wei Liu2012-12-171-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The runstate of vcpu should be restored for all possible cpus, as well as the vcpu info placement. Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Wei Liu <wei.liu2@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * | | | | | | | | xen/smp: Use smp_store_boot_cpu_info() to store cpu info for BSP during boot ↵Konrad Rzeszutek Wilk2012-12-171-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | time. Git commit 30106c174311b8cfaaa3186c7f6f9c36c62d17da ("x86, hotplug: Support functions for CPU0 online/offline") alters what the call to smp_store_cpu_info() does. For BSP we should use the smp_store_boot_cpu_info() and for secondary CPU's the old variant of smp_store_cpu_info() should be used. This fixes the regression introduced by said commit. Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | | | | | | | | | x86, paravirt: fix build error when thp is disabledDavid Rientjes2012-12-181-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With CONFIG_PARAVIRT=y and CONFIG_TRANSPARENT_HUGEPAGE=n, the build breaks because set_pmd_at() is undeclared: mm/memory.c: In function 'do_pmd_numa_page': mm/memory.c:3520: error: implicit declaration of function 'set_pmd_at' mm/mprotect.c: In function 'change_pmd_protnuma': mm/mprotect.c:120: error: implicit declaration of function 'set_pmd_at' This is because paravirt defines set_pmd_at() only when CONFIG_TRANSPARENT_HUGEPAGE=y and such a restriction is unneeded. The fix is to define it for all CONFIG_PARAVIRT configurations. Signed-off-by: David Rientjes <rientjes@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | Merge tag 'md-3.8' of git://neil.brown.name/mdLinus Torvalds2012-12-181-2/+3
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull md update from Neil Brown: "Mostly just little fixes. Probably biggest part is AVX accelerated RAID6 calculations." * tag 'md-3.8' of git://neil.brown.name/md: md/raid5: add blktrace calls md/raid5: use async_tx_quiesce() instead of open-coding it. md: Use ->curr_resync as last completed request when cleanly aborting resync. lib/raid6: build proper files on corresponding arch lib/raid6: Add AVX2 optimized gen_syndrome functions lib/raid6: Add AVX2 optimized recovery functions md: Update checkpoint of resync/recovery based on time. md:Add place to update ->recovery_cp. md.c: re-indent various 'switch' statements. md: close race between removing and adding a device. md: removed unused variable in calc_sb_1_csm.
| * | | | | | | | | | lib/raid6: Add AVX2 optimized recovery functionsJim Kukunas2012-12-131-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimize RAID6 recovery functions to take advantage of the 256-bit YMM integer instructions introduced in AVX2. The patch was tested and benchmarked before submission. However hardware is not yet released so benchmark numbers cannot be reported. Acked-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Jim Kukunas <james.t.kukunas@linux.intel.com> Signed-off-by: NeilBrown <neilb@suse.de>
* | | | | | | | | | | create non-empty arch/x86/include/uapi/asm/ filesAndrew Morton2012-12-172-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | patch(1) doesn't create zero-length files, so my kernel didn't compile. Put something in these files so patch(1) actually creates them. Cc: David Howells <dhowells@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Dave Jones <davej@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2012-12-161-51/+59
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem updates from James Morris: "A quiet cycle for the security subsystem with just a few maintenance updates." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: Smack: create a sysfs mount point for smackfs Smack: use select not depends in Kconfig Yama: remove locking from delete path Yama: add RCU to drop read locking drivers/char/tpm: remove tasklet and cleanup KEYS: Use keyring_alloc() to create special keyrings KEYS: Reduce initial permissions on keys KEYS: Make the session and process keyrings per-thread seccomp: Make syscall skipping and nr changes more consistent key: Fix resource leak keys: Fix unreachable code KEYS: Add payload preparsing opportunity prior to key instantiate or update
| * | | | | | | | | | | seccomp: Make syscall skipping and nr changes more consistentAndy Lutomirski2012-10-021-51/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes two issues that could cause incompatibility between kernel versions: - If a tracer uses SECCOMP_RET_TRACE to select a syscall number higher than the largest known syscall, emulate the unknown vsyscall by returning -ENOSYS. (This is unlikely to make a noticeable difference on x86-64 due to the way the system call entry works.) - On x86-64 with vsyscall=emulate, skipped vsyscalls were buggy. This updates the documentation accordingly. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Acked-by: Will Drewry <wad@chromium.org> Signed-off-by: James Morris <james.l.morris@oracle.com>
* | | | | | | | | | | | Merge tag 'balancenuma-v11' of ↵Linus Torvalds2012-12-164-3/+44
|\ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma Pull Automatic NUMA Balancing bare-bones from Mel Gorman: "There are three implementations for NUMA balancing, this tree (balancenuma), numacore which has been developed in tip/master and autonuma which is in aa.git. In almost all respects balancenuma is the dumbest of the three because its main impact is on the VM side with no attempt to be smart about scheduling. In the interest of getting the ball rolling, it would be desirable to see this much merged for 3.8 with the view to building scheduler smarts on top and adapting the VM where required for 3.9. The most recent set of comparisons available from different people are mel: https://lkml.org/lkml/2012/12/9/108 mingo: https://lkml.org/lkml/2012/12/7/331 tglx: https://lkml.org/lkml/2012/12/10/437 srikar: https://lkml.org/lkml/2012/12/10/397 The results are a mixed bag. In my own tests, balancenuma does reasonably well. It's dumb as rocks and does not regress against mainline. On the other hand, Ingo's tests shows that balancenuma is incapable of converging for this workloads driven by perf which is bad but is potentially explained by the lack of scheduler smarts. Thomas' results show balancenuma improves on mainline but falls far short of numacore or autonuma. Srikar's results indicate we all suffer on a large machine with imbalanced node sizes. My own testing showed that recent numacore results have improved dramatically, particularly in the last week but not universally. We've butted heads heavily on system CPU usage and high levels of migration even when it shows that overall performance is better. There are also cases where it regresses. Of interest is that for specjbb in some configurations it will regress for lower numbers of warehouses and show gains for higher numbers which is not reported by the tool by default and sometimes missed in treports. Recently I reported for numacore that the JVM was crashing with NullPointerExceptions but currently it's unclear what the source of this problem is. Initially I thought it was in how numacore batch handles PTEs but I'm no longer think this is the case. It's possible numacore is just able to trigger it due to higher rates of migration. These reports were quite late in the cycle so I/we would like to start with this tree as it contains much of the code we can agree on and has not changed significantly over the last 2-3 weeks." * tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits) mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable mm/rmap: Convert the struct anon_vma::mutex to an rwsem mm: migrate: Account a transhuge page properly when rate limiting mm: numa: Account for failed allocations and isolations as migration failures mm: numa: Add THP migration for the NUMA working set scanning fault case build fix mm: numa: Add THP migration for the NUMA working set scanning fault case. mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG mm: sched: numa: Control enabling and disabling of NUMA balancing mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely task<->node relationships mm: numa: migrate: Set last_nid on newly allocated page mm: numa: split_huge_page: Transfer last_nid on tail page mm: numa: Introduce last_nid to the page frame sched: numa: Slowly increase the scanning period as NUMA faults are handled mm: numa: Rate limit setting of pte_numa if node is saturated mm: numa: Rate limit the amount of memory that is migrated between nodes mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting mm: numa: Migrate pages handled during a pmd_numa hinting fault mm: numa: Migrate on reference policy ...
| * | | | | | | | | | | | mm: numa: Add fault driven placement and migrationPeter Zijlstra2012-12-111-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NOTE: This patch is based on "sched, numa, mm: Add fault driven placement and migration policy" but as it throws away all the policy to just leave a basic foundation I had to drop the signed-offs-by. This patch creates a bare-bones method for setting PTEs pte_numa in the context of the scheduler that when faulted later will be faulted onto the node the CPU is running on. In itself this does nothing useful but any placement policy will fundamentally depend on receiving hints on placement from fault context and doing something intelligent about it. Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: Rik van Riel <riel@redhat.com>
| * | | | | | | | | | | | mm: numa: pte_numa() and pmd_numa()Andrea Arcangeli2012-12-111-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement pte_numa and pmd_numa. We must atomically set the numa bit and clear the present bit to define a pte_numa or pmd_numa. Once a pte or pmd has been set as pte_numa or pmd_numa, the next time a thread touches a virtual address in the corresponding virtual range, a NUMA hinting page fault will trigger. The NUMA hinting page fault will clear the NUMA bit and set the present bit again to resolve the page fault. The expectation is that a NUMA hinting page fault is used as part of a placement policy that decides if a page should remain on the current node or migrated to a different node. Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de>
| * | | | | | | | | | | | mm: numa: define _PAGE_NUMAAndrea Arcangeli2012-12-111-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The objective of _PAGE_NUMA is to be able to trigger NUMA hinting page faults to identify the per NUMA node working set of the thread at runtime. Arming the NUMA hinting page fault mechanism works similarly to setting up a mprotect(PROT_NONE) virtual range: the present bit is cleared at the same time that _PAGE_NUMA is set, so when the fault triggers we can identify it as a NUMA hinting page fault. _PAGE_NUMA on x86 shares the same bit number of _PAGE_PROTNONE (but it could also use a different bitflag, it's up to the architecture to decide). It would be confusing to call the "NUMA hinting page faults" as "do_prot_none faults". They're different events and _PAGE_NUMA doesn't alter the semantics of mprotect(PROT_NONE) in any way. Sharing the same bitflag with _PAGE_PROTNONE in fact complicates things: it requires us to ensure the code paths executed by _PAGE_PROTNONE remains mutually exclusive to the code paths executed by _PAGE_NUMA at all times, to avoid _PAGE_NUMA and _PAGE_PROTNONE to step into each other toes. Because we want to be able to set this bitflag in any established pte or pmd (while clearing the present bit at the same time) without losing information, this bitflag must never be set when the pte and pmd are present, so the bitflag picked for _PAGE_NUMA usage, must not be used by the swap entry format. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com>
| * | | | | | | | | | | | x86/mm: Introduce pte_accessible()Rik van Riel2012-12-111-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need pte_present to return true for _PAGE_PROTNONE pages, to indicate that the pte is associated with a page. However, for TLB flushing purposes, we would like to know whether the pte points to an actually accessible page. This allows us to skip remote TLB flushes for pages that are not actually accessible. Fill in this method for x86 and provide a safe (but slower) method on other architectures. Signed-off-by: Rik van Riel <riel@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixed-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/n/tip-66p11te4uj23gevgh4j987ip@git.kernel.org [ Added Linus's review fixes. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | | | | | | | | | | x86: mm: drop TLB flush from ptep_set_access_flagsRik van Riel2012-12-111-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Intel has an architectural guarantee that the TLB entry causing a page fault gets invalidated automatically. This means we should be able to drop the local TLB invalidation. Because of the way other areas of the page fault code work, chances are good that all x86 CPUs do this. However, if someone somewhere has an x86 CPU that does not invalidate the TLB entry causing a page fault, this one-liner should be easy to revert. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Michel Lespinasse <walken@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com>
| * | | | | | | | | | | | x86: mm: only do a local tlb flush in ptep_set_access_flags()Rik van Riel2012-12-111-1/+8
| | |_|_|_|_|_|_|/ / / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function ptep_set_access_flags() is only ever invoked to set access flags or add write permission on a PTE. The write bit is only ever set together with the dirty bit. Because we only ever upgrade a PTE, it is safe to skip flushing entries on remote TLBs. The worst that can happen is a spurious page fault on other CPUs, which would flush that TLB entry. Lazily letting another CPU incur a spurious page fault occasionally is (much!) cheaper than aggressively flushing everybody else's TLB. Signed-off-by: Rik van Riel <riel@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michel Lespinasse <walken@google.com> Cc: Ingo Molnar <mingo@kernel.org>
* | | | | | | | | | | | Revert "x86-64/efi: Use EFI to deal with platform wall clock (again)"Linus Torvalds2012-12-155-51/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit bd52276fa1d4 ("x86-64/efi: Use EFI to deal with platform wall clock (again)"), and the two supporting commits: da5a108d05b4: "x86/kernel: remove tboot 1:1 page table creation code" 185034e72d59: "x86, efi: 1:1 pagetable mapping for virtual EFI calls") as they all depend semantically on commit 53b87cf088e2 ("x86, mm: Include the entire kernel memory map in trampoline_pgd") that got reverted earlier due to the problems it caused. This was pointed out by Yinghai Lu, and verified by me on my Macbook Air that uses EFI. Pointed-out-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OpenPOWER on IntegriCloud