summaryrefslogtreecommitdiffstats
path: root/arch/x86/xen/enlighten.c
Commit message (Collapse)AuthorAgeFilesLines
* x86/xen: Probe target addresses in set_aliased_prot() before the hypercallAndy Lutomirski2015-07-311-0/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The update_va_mapping hypercall can fail if the VA isn't present in the guest's page tables. Under certain loads, this can result in an OOPS when the target address is in unpopulated vmap space. While we're at it, add comments to help explain what's going on. This isn't a great long-term fix. This code should probably be changed to use something like set_memory_ro. Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andrew Cooper <andrew.cooper3@citrix.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: David Vrabel <dvrabel@cantab.net> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jan Beulich <jbeulich@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: security@kernel.org <security@kernel.org> Cc: <stable@vger.kernel.org> Cc: xen-devel <xen-devel@lists.xen.org> Link: http://lkml.kernel.org/r/0b0e55b995cda11e7829f140b833ef932fcabe3a.1438291540.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge branch 'x86-core-for-linus' of ↵Linus Torvalds2015-06-221-3/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Ingo Molnar: "There were so many changes in the x86/asm, x86/apic and x86/mm topics in this cycle that the topical separation of -tip broke down somewhat - so the result is a more traditional architecture pull request, collected into the 'x86/core' topic. The topics were still maintained separately as far as possible, so bisectability and conceptual separation should still be pretty good - but there were a handful of merge points to avoid excessive dependencies (and conflicts) that would have been poorly tested in the end. The next cycle will hopefully be much more quiet (or at least will have fewer dependencies). The main changes in this cycle were: * x86/apic changes, with related IRQ core changes: (Jiang Liu, Thomas Gleixner) - This is the second and most intrusive part of changes to the x86 interrupt handling - full conversion to hierarchical interrupt domains: [IOAPIC domain] ----- | [MSI domain] --------[Remapping domain] ----- [ Vector domain ] | (optional) | [HPET MSI domain] ----- | | [DMAR domain] ----------------------------- | [Legacy domain] ----------------------------- This now reflects the actual hardware and allowed us to distangle the domain specific code from the underlying parent domain, which can be optional in the case of interrupt remapping. It's a clear separation of functionality and removes quite some duct tape constructs which plugged the remap code between ioapic/msi/hpet and the vector management. - Intel IOMMU IRQ remapping enhancements, to allow direct interrupt injection into guests (Feng Wu) * x86/asm changes: - Tons of cleanups and small speedups, micro-optimizations. This is in preparation to move a good chunk of the low level entry code from assembly to C code (Denys Vlasenko, Andy Lutomirski, Brian Gerst) - Moved all system entry related code to a new home under arch/x86/entry/ (Ingo Molnar) - Removal of the fragile and ugly CFI dwarf debuginfo annotations. Conversion to C will reintroduce many of them - but meanwhile they are only getting in the way, and the upstream kernel does not rely on them (Ingo Molnar) - NOP handling refinements. (Borislav Petkov) * x86/mm changes: - Big PAT and MTRR rework: making the code more robust and preparing to phase out exposing direct MTRR interfaces to drivers - in favor of using PAT driven interfaces (Toshi Kani, Luis R Rodriguez, Borislav Petkov) - New ioremap_wt()/set_memory_wt() interfaces to support Write-Through cached memory mappings. This is especially important for good performance on NVDIMM hardware (Toshi Kani) * x86/ras changes: - Add support for deferred errors on AMD (Aravind Gopalakrishnan) This is an important RAS feature which adds hardware support for poisoned data. That means roughly that the hardware marks data which it has detected as corrupted but wasn't able to correct, as poisoned data and raises an APIC interrupt to signal that in the form of a deferred error. It is the OS's responsibility then to take proper recovery action and thus prolonge system lifetime as far as possible. - Add support for Intel "Local MCE"s: upcoming CPUs will support CPU-local MCE interrupts, as opposed to the traditional system- wide broadcasted MCE interrupts (Ashok Raj) - Misc cleanups (Borislav Petkov) * x86/platform changes: - Intel Atom SoC updates ... and lots of other cleanups, fixlets and other changes - see the shortlog and the Git log for details" * 'x86-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (222 commits) x86/hpet: Use proper hpet device number for MSI allocation x86/hpet: Check for irq==0 when allocating hpet MSI interrupts x86/mm/pat, drivers/infiniband/ipath: Use arch_phys_wc_add() and require PAT disabled x86/mm/pat, drivers/media/ivtv: Use arch_phys_wc_add() and require PAT disabled x86/platform/intel/baytrail: Add comments about why we disabled HPET on Baytrail genirq: Prevent crash in irq_move_irq() genirq: Enhance irq_data_to_desc() to support hierarchy irqdomain iommu, x86: Properly handle posted interrupts for IOMMU hotplug iommu, x86: Provide irq_remapping_cap() interface iommu, x86: Setup Posted-Interrupts capability for Intel iommu iommu, x86: Add cap_pi_support() to detect VT-d PI capability iommu, x86: Avoid migrating VT-d posted interrupts iommu, x86: Save the mode (posted or remapped) of an IRTE iommu, x86: Implement irq_set_vcpu_affinity for intel_ir_chip iommu: dmar: Provide helper to copy shared irte fields iommu: dmar: Extend struct irte for VT-d Posted-Interrupts iommu: Add new member capability to struct irq_remap_ops x86/asm/entry/64: Disentangle error_entry/exit gsbase/ebx/usermode code x86/asm/entry/32: Shorten __audit_syscall_entry() args preparation x86/asm/entry/32: Explain reloading of registers after __audit_syscall_entry() ...
| *-. Merge branches 'x86/apic', 'x86/asm', 'x86/mm' and 'x86/platform' into ↵Ingo Molnar2015-06-221-2/+3
| |\ \ | | | | | | | | | | | | | | | | | | | | x86/core, to merge last updates Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | * | x86/mm/pat: Emulate PAT when it is disabledBorislav Petkov2015-06-071-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case when PAT is disabled on the command line with "nopat" or when virtualization doesn't support PAT (correctly) - see 9d34cfdf4796 ("x86: Don't rely on VMWare emulating PAT MSR correctly"). we emulate it using the PWT and PCD cache attribute bits. Get rid of boot_pat_state while at it. Based on a conglomerate patch from Toshi Kani. Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Toshi Kani <toshi.kani@hp.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Elliott@hp.com Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Luis R. Rodriguez <mcgrof@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: arnd@arndb.de Cc: hch@lst.de Cc: hmh@hmh.eng.br Cc: konrad.wilk@oracle.com Cc: linux-mm <linux-mm@kvack.org> Cc: linux-nvdimm@lists.01.org Cc: stefan.bader@canonical.com Cc: yigal@plexistor.com Link: http://lkml.kernel.org/r/1433436928-31903-3-git-send-email-bp@alien8.de Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | Merge branch 'linus' into x86/asm, before applying dependent patchIngo Molnar2015-05-081-98/+19
| |\ \ \ | | |/ / | | | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * | | x86, paravirt, xen: Remove the 64-bit ->irq_enable_sysexit() pvopAndy Lutomirski2015-04-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't use irq_enable_sysexit on 64-bit kernels any more. Remove all the paravirt and Xen machinery to support it on 64-bit kernels. Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Denys Vlasenko <vda.linux@googlemail.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8a03355698fe5b94194e9e7360f19f91c1b2cf1f.1428100853.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | x86/fpu: Simplify fpu__cpu_init()Ingo Molnar2015-05-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After the latest round of cleanups, fpu__cpu_init() has become a simple call to fpu__init_cpu(). Rename fpu__init_cpu() to fpu__cpu_init() and remove the extra layer. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | | x86/fpu: Rename fpu_init() to fpu__cpu_init()Ingo Molnar2015-05-191-1/+1
| |/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | fpu_init() is a bit of a misnomer in that it (falsely) creates the impression that it's related to the (old) fpu_finit() function, which initializes FPU ctx state. Rename it to fpu__cpu_init() to make its boot time initialization clear, and to move it to the fpu__*() namespace. Also fix and extend its comment block to point out that it's called not only on the boot CPU, but on secondary CPUs as well. Reviewed-by: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | | hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guestsBoris Ostrovsky2015-05-051-9/+18
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 61f01dd941ba ("x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue") makes AMD processors set SS to __KERNEL_DS in __switch_to() to deal with cases when SS is NULL. This breaks Xen PV guests who do not want to load SS with__KERNEL_DS. Since the problem that the commit is trying to address would have to be fixed in the hypervisor (if it in fact exists under Xen) there is no reason to set X86_BUG_SYSRET_SS_ATTRS flag for PV VPCUs here. This can be easily achieved by adding x86_hyper_xen_hvm.set_cpu_features op which will clear this flag. (And since this structure is no longer HVM-specific we should do some renaming). Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reported-by: Sander Eikelenboom <linux@eikelenboom.it> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | Merge tag 'stable/for-linus-4.1-rc0-tag' of ↵Linus Torvalds2015-04-161-89/+1
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen features and fixes from David Vrabel: - use a single source list of hypercalls, generating other tables etc. at build time. - add a "Xen PV" APIC driver to support >255 VCPUs in PV guests. - significant performance improve to guest save/restore/migration. - scsiback/front save/restore support. - infrastructure for multi-page xenbus rings. - misc fixes. * tag 'stable/for-linus-4.1-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pci: Try harder to get PXM information for Xen xenbus_client: Extend interface to support multi-page ring xen-pciback: also support disabling of bus-mastering and memory-write-invalidate xen: support suspend/resume in pvscsi frontend xen: scsiback: add LUN of restored domain xen-scsiback: define a pr_fmt macro with xen-pvscsi xen/mce: fix up xen_late_init_mcelog() error handling xen/privcmd: improve performance of MMAPBATCH_V2 xen: unify foreign GFN map/unmap for auto-xlated physmap guests x86/xen/apic: WARN with details. x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUs xen/pciback: Don't print scary messages when unsupported by hypervisor. xen: use generated hypercall symbols in arch/x86/xen/xen-head.S xen: use generated hypervisor symbols in arch/x86/xen/trace.c xen: synchronize include/xen/interface/xen.h with xen xen: build infrastructure for generating hypercall depending symbols xen: balloon: Use static attribute groups for sysfs entries xen: pcpu: Use static attribute groups for sysfs entry
| * x86/xen: Provide a "Xen PV" APIC driver to support >255 VCPUsKonrad Rzeszutek Wilk2015-03-161-89/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Instead of mangling the default APIC driver, provide a Xen PV guest specific one that explicitly provides appropriate methods. This allows use to report that all APIC IDs are valid, allowing dom0 to boot with more than 255 VCPUs. Since the probe order of APIC drivers is link dependent, we add in an late probe function to change to the Xen PV if it hadn't been done during bootup. Suggested-by: David Vrabel <david.vrabel@citrix.com> Reported-by: Cathy Avery <cathy.avery@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | x86/asm/entry: Add this_cpu_sp0() to read sp0 for the current cpuAndy Lutomirski2015-03-061-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently store references to the top of the kernel stack in multiple places: kernel_stack (with an offset) and init_tss.x86_tss.sp0 (no offset). The latter is defined by hardware and is a clean canonical way to find the top of the stack. Add an accessor so we can start using it. This needs minor paravirt tweaks. On native, sp0 defines the top of the kernel stack and is therefore always correct. On Xen and lguest, the hypervisor tracks the top of the stack, but we want to start reading sp0 in the kernel. Fixing this is simple: just update our local copy of sp0 as well as the hypervisor's copy on task switches. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8d675581859712bee09a055ed8f785d80dac1eca.1425611534.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/xen: Initialize cr4 shadow for 64-bit PV(H) guestsBoris Ostrovsky2015-02-231-0/+1
| | | | | | | | | | | Commit 1e02ce4cccdc ("x86: Store a per-cpu shadow copy of CR4") introduced CR4 shadows. These shadows are initialized in early boot code. The commit missed initialization for 64-bit PV(H) guests that this patch adds. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* x86/xen: Make sure X2APIC_ENABLE bit of MSR_IA32_APICBASE is not setBoris Ostrovsky2015-02-231-1/+18
| | | | | | | | | | | | | Commit d524165cb8db ("x86/apic: Check x2apic early") tests X2APIC_ENABLE bit of MSR_IA32_APICBASE when CONFIG_X86_X2APIC is off and panics the kernel when this bit is set. Xen's PV guests will pass this MSR read to the hypervisor which will return its version of the MSR, where this bit might be set. Make sure we clear it before returning MSR value to the caller. Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* x86: Clean up cr4 manipulationAndy Lutomirski2015-02-041-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | CR4 manipulation was split, seemingly at random, between direct (write_cr4) and using a helper (set/clear_in_cr4). Unfortunately, the set_in_cr4 and clear_in_cr4 helpers also poke at the boot code, which only a small subset of users actually wanted. This patch replaces all cr4 access in functions that don't leave cr4 exactly the way they found it with new helpers cr4_set_bits, cr4_clear_bits, and cr4_set_bits_and_update_boot. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Vince Weaver <vince@deater.net> Cc: "hillf.zj" <hillf.zj@alibaba-inc.com> Cc: Valdis Kletnieks <Valdis.Kletnieks@vt.edu> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/495a10bdc9e67016b8fd3945700d46cfd5c12c2f.1414190806.git.luto@amacapital.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge tag 'stable/for-linus-3.19-rc4-tag' of ↵Linus Torvalds2015-01-141-1/+21
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen bug fixes from David Vrabel: "Several critical linear p2m fixes that prevented some hosts from booting" * tag 'stable/for-linus-3.19-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: properly retrieve NMI reason xen: check for zero sized area when invalidating memory xen: use correct type for physical addresses xen: correct race in alloc_p2m_pmd() xen: correct error for building p2m list on 32 bits x86/xen: avoid freeing static 'name' when kasprintf() fails x86/xen: add extra memory for remapped frames during setup x86/xen: don't count how many PFNs are identity mapped x86/xen: Free bootmem in free_p2m_page() during early boot x86/xen: Remove unnecessary BUG_ON(preemptible()) in xen_setup_timer()
| * x86/xen: properly retrieve NMI reasonJan Beulich2015-01-131-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Using the native code here can't work properly, as the hypervisor would normally have cleared the two reason bits by the time Dom0 gets to see the NMI (if passed to it at all). There's a shared info field for this, and there's an existing hook to use - just fit the two together. This is particularly relevant so that NMIs intended to be handled by APEI / GHES actually make it to the respective handler. Note that the hook can (and should) be used irrespective of whether being in Dom0, as accessing port 0x61 in a DomU would be even worse, while the shared info field would just hold zero all the time. Note further that hardware NMI handling for PVH doesn't currently work anyway due to missing code in the hypervisor (but it is expected to work the native rather than the PV way). Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | xen: Support Xen pv-domains using PATJuergen Gross2014-11-161-18/+7
|/ | | | | | | | | | | | | | | | | | | | | | | | With the dynamical mapping between cache modes and pgprot values it is now possible to use all cache modes via the Xen hypervisor PAT settings in a pv domain. All to be done is to read the PAT configuration MSR and set up the translation tables accordingly. Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stefan.bader@canonical.com Cc: xen-devel@lists.xensource.com Cc: ville.syrjala@linux.intel.com Cc: jbeulich@suse.com Cc: toshi.kani@hp.com Cc: plagnioj@jcrosoft.com Cc: tomi.valkeinen@ti.com Cc: bhelgaas@google.com Link: http://lkml.kernel.org/r/1415019724-4317-19-git-send-email-jgross@suse.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* x86/xen: delay construction of mfn_list_listJuergen Gross2014-10-231-3/+0
| | | | | | | | | | | | | | | | | | | | | | | The 3 level p2m tree for the Xen tools is constructed very early at boot by calling xen_build_mfn_list_list(). Memory needed for this tree is allocated via extend_brk(). As this tree (other than the kernel internal p2m tree) is only needed for domain save/restore, live migration and crash dump analysis it doesn't matter whether it is constructed very early or just some milliseconds later when memory allocation is possible by other means. This patch moves the call of xen_build_mfn_list_list() just after calling xen_pagetable_p2m_copy() simplifying this function, too, as it doesn't have to bother with two parallel trees now. The same applies for some other internal functions. While simplifying code, make early_can_reuse_p2m_middle() static and drop the unused second parameter. p2m_mid_identity_mfn can be removed as well, it isn't used either. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* Merge branch 'for-3.18-consistent-ops' of ↵Linus Torvalds2014-10-151-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu Pull percpu consistent-ops changes from Tejun Heo: "Way back, before the current percpu allocator was implemented, static and dynamic percpu memory areas were allocated and handled separately and had their own accessors. The distinction has been gone for many years now; however, the now duplicate two sets of accessors remained with the pointer based ones - this_cpu_*() - evolving various other operations over time. During the process, we also accumulated other inconsistent operations. This pull request contains Christoph's patches to clean up the duplicate accessor situation. __get_cpu_var() uses are replaced with with this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr(). Unfortunately, the former sometimes is tricky thanks to C being a bit messy with the distinction between lvalues and pointers, which led to a rather ugly solution for cpumask_var_t involving the introduction of this_cpu_cpumask_var_ptr(). This converts most of the uses but not all. Christoph will follow up with the remaining conversions in this merge window and hopefully remove the obsolete accessors" * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits) irqchip: Properly fetch the per cpu offset percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write. percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t Revert "powerpc: Replace __get_cpu_var uses" percpu: Remove __this_cpu_ptr clocksource: Replace __this_cpu_ptr with raw_cpu_ptr sparc: Replace __get_cpu_var uses avr32: Replace __get_cpu_var with __this_cpu_write blackfin: Replace __get_cpu_var uses tile: Use this_cpu_ptr() for hardware counters tile: Replace __get_cpu_var uses powerpc: Replace __get_cpu_var uses alpha: Replace __get_cpu_var ia64: Replace __get_cpu_var uses s390: cio driver &__get_cpu_var replacements s390: Replace __get_cpu_var uses mips: Replace __get_cpu_var uses MIPS: Replace __get_cpu_var uses in FPU emulator. arm: Replace __this_cpu_ptr with raw_cpu_ptr ...
| * x86: Replace __get_cpu_var usesChristoph Lameter2014-08-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | __get_cpu_var() is used for multiple purposes in the kernel source. One of them is address calculation via the form &__get_cpu_var(x). This calculates the address for the instance of the percpu variable of the current processor based on an offset. Other use cases are for storing and retrieving data from the current processors percpu area. __get_cpu_var() can be used as an lvalue when writing data or on the right side of an assignment. __get_cpu_var() is defined as : #define __get_cpu_var(var) (*this_cpu_ptr(&(var))) __get_cpu_var() always only does an address determination. However, store and retrieve operations could use a segment prefix (or global register on other platforms) to avoid the address calculation. this_cpu_write() and this_cpu_read() can directly take an offset into a percpu area and use optimized assembly code to read and write per cpu variables. This patch converts __get_cpu_var into either an explicit address calculation using this_cpu_ptr() or into a use of this_cpu operations that use the offset. Thereby address calculations are avoided and less registers are used when code is generated. Transformations done to __get_cpu_var() 1. Determine the address of the percpu instance of the current processor. DEFINE_PER_CPU(int, y); int *x = &__get_cpu_var(y); Converts to int *x = this_cpu_ptr(&y); 2. Same as #1 but this time an array structure is involved. DEFINE_PER_CPU(int, y[20]); int *x = __get_cpu_var(y); Converts to int *x = this_cpu_ptr(y); 3. Retrieve the content of the current processors instance of a per cpu variable. DEFINE_PER_CPU(int, y); int x = __get_cpu_var(y) Converts to int x = __this_cpu_read(y); 4. Retrieve the content of a percpu struct DEFINE_PER_CPU(struct mystruct, y); struct mystruct x = __get_cpu_var(y); Converts to memcpy(&x, this_cpu_ptr(&y), sizeof(x)); 5. Assignment to a per cpu variable DEFINE_PER_CPU(int, y) __get_cpu_var(y) = x; Converts to __this_cpu_write(y, x); 6. Increment/Decrement etc of a per cpu variable DEFINE_PER_CPU(int, y); __get_cpu_var(y)++ Converts to __this_cpu_inc(y) Cc: Thomas Gleixner <tglx@linutronix.de> Cc: x86@kernel.org Acked-by: H. Peter Anvin <hpa@linux.intel.com> Acked-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | x86/xen: Set EFER.NX and EFER.SCE in PVH guestsMukesh Rathor2014-10-061-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes two bugs in PVH guests: - Not setting EFER.NX means the NX bit in page table entries is ignored on Intel processors and causes reserved bit page faults on AMD processors. - After the Xen commit 7645640d6ff1 ("x86/PVH: don't set EFER_SCE for pvh guest") PVH guests are required to set EFER.SCE to enable the SYSCALL instruction. Secondary VCPUs are started with pagetables with the NX bit set so EFER.NX must be set before using any stack or data segment. xen_pvh_cpu_early_init() is the new secondary VCPU entry point that sets EFER before jumping to cpu_bringup_and_idle(). Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | xen: eliminate scalability issues from initrd handlingJuergen Gross2014-10-031-2/+9
| | | | | | | | | | | | | | | | | | | | Size restrictions native kernels wouldn't have resulted from the initrd getting mapped into the initial mapping. The kernel doesn't really need the initrd to be mapped, so use infrastructure available in Xen to avoid the mapping and hence the restriction. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | x86: remove the Xen-specific _PAGE_IOMAP PTE flagDavid Vrabel2014-09-231-2/+0
|/ | | | | | | | | | | | | | | | | The _PAGE_IO_MAP PTE flag was only used by Xen PV guests to mark PTEs that were used to map I/O regions that are 1:1 in the p2m. This allowed Xen to obtain the correct PFN when converting the MFNs read from a PTE back to their PFN. Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn() returns the correct PFN by using a combination of the m2p and p2m to determine if an MFN corresponds to a 1:1 mapping in the the p2m. Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for future uses of the PTE flag. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com>
* Merge tag 'stable/for-linus-3.17-rc0-tag' of ↵Linus Torvalds2014-08-071-0/+13
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from David Vrabel: - remove unused V2 grant table support - note that Konrad is xen-blkkback/front maintainer - add 'xen_nopv' option to disable PV extentions for x86 HVM guests - misc minor cleanups * tag 'stable/for-linus-3.17-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-pciback: Document the 'quirks' sysfs file xen/pciback: Fix error return code in xen_pcibk_attach() xen/events: drop negativity check of unsigned parameter xen/setup: Remove Identity Map Debug Message xen/events/fifo: remove a unecessary use of BM() xen/events/fifo: ensure all bitops are properly aligned even on x86 xen/events/fifo: reset control block and local HEADs on resume xen/arm: use BUG_ON xen/grant-table: remove support for V2 tables x86/xen: safely map and unmap grant frames when in atomic context MAINTAINERS: Make me the Xen block subsystem (front and back) maintainer xen: Introduce 'xen_nopv' to disable PV extensions for HVM guests.
| * xen: Introduce 'xen_nopv' to disable PV extensions for HVM guests.Konrad Rzeszutek Wilk2014-07-141-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | By default when CONFIG_XEN and CONFIG_XEN_PVHVM kernels are run, they will enable the PV extensions (drivers, interrupts, timers, etc) - which is the best option for the majority of use cases. However, in some cases (kexec not fully working, benchmarking) we want to disable Xen PV extensions. As such introduce the 'xen_nopv' parameter that will do it. This parameter is intended only for HVM guests as the Xen PV guests MUST boot with PV extensions. However, even if you use 'xen_nopv' on Xen PV guests it will be ignored. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> --- [v2: s/off/xen_nopv/ per Boris Ostrovsky recommendation.] [v3: Add Reviewed-by] [v4: Clarify that this is only for HVM guests]
* | Merge branch 'x86-efi-for-linus' of ↵Linus Torvalds2014-08-041-0/+2
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI changes from Ingo Molnar: "Main changes in this cycle are: - arm64 efi stub fixes, preservation of FP/SIMD registers across firmware calls, and conversion of the EFI stub code into a static library - Ard Biesheuvel - Xen EFI support - Daniel Kiper - Support for autoloading the efivars driver - Lee, Chun-Yi - Use the PE/COFF headers in the x86 EFI boot stub to request that the stub be loaded with CONFIG_PHYSICAL_ALIGN alignment - Michael Brown - Consolidate all the x86 EFI quirks into one file - Saurabh Tangri - Additional error logging in x86 EFI boot stub - Ulf Winkelvos - Support loading initrd above 4G in EFI boot stub - Yinghai Lu - EFI reboot patches for ACPI hardware reduced platforms" * 'x86-efi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits) efi/arm64: Handle missing virtual mapping for UEFI System Table arch/x86/xen: Silence compiler warnings xen: Silence compiler warnings x86/efi: Request desired alignment via the PE/COFF headers x86/efi: Add better error logging to EFI boot stub efi: Autoload efivars efi: Update stale locking comment for struct efivars arch/x86: Remove efi_set_rtc_mmss() arch/x86: Replace plain strings with constants xen: Put EFI machinery in place xen: Define EFI related stuff arch/x86: Remove redundant set_bit(EFI_MEMMAP) call arch/x86: Remove redundant set_bit(EFI_SYSTEM_TABLES) call efi: Introduce EFI_PARAVIRT flag arch/x86: Do not access EFI memory map if it is not available efi: Use early_mem*() instead of early_io*() arch/ia64: Define early_memunmap() x86/reboot: Add EFI reboot quirk for ACPI Hardware Reduced flag efi/reboot: Allow powering off machines using EFI efi/reboot: Add generic wrapper around EfiResetSystem() ...
| * arch/x86/xen: Silence compiler warningsDaniel Kiper2014-07-181-14/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Compiler complains in the following way when x86 32-bit kernel with Xen support is build: CC arch/x86/xen/enlighten.o arch/x86/xen/enlighten.c: In function ‘xen_start_kernel’: arch/x86/xen/enlighten.c:1726:3: warning: right shift count >= width of type [enabled by default] Such line contains following EFI initialization code: boot_params.efi_info.efi_systab_hi = (__u32)(__pa(efi_systab_xen) >> 32); There is no issue if x86 64-bit kernel is build. However, 32-bit case generate warning (even if that code will not be executed because Xen does not work on 32-bit EFI platforms) due to __pa() returning unsigned long type which has 32-bits width. So move whole EFI initialization stuff to separate function and build it conditionally to avoid above mentioned warning on x86 32-bit architecture. Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
| * xen: Put EFI machinery in placeDaniel Kiper2014-07-181-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables EFI usage under Xen dom0. Standard EFI Linux Kernel infrastructure cannot be used because it requires direct access to EFI data and code. However, in dom0 case it is not possible because above mentioned EFI stuff is fully owned and controlled by Xen hypervisor. In this case all calls from dom0 to EFI must be requested via special hypercall which in turn executes relevant EFI code in behalf of dom0. When dom0 kernel boots it checks for EFI availability on a machine. If it is detected then artificial EFI system table is filled. Native EFI callas are replaced by functions which mimics them by calling relevant hypercall. Later pointer to EFI system table is passed to standard EFI machinery and it continues EFI subsystem initialization taking into account that there is no direct access to EFI boot services, runtime, tables, structures, etc. After that system runs as usual. This patch is based on Jan Beulich and Tang Liang work. Signed-off-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Tang Liang <liang.tang@oracle.com> Signed-off-by: Daniel Kiper <daniel.kiper@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com Signed-off-by: Matt Fleming <matt.fleming@intel.com>
* | Merge tag 'stable/for-linus-3.16-rc1-tag' of ↵Linus Torvalds2014-06-191-1/+4
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen fixes from David Vrabel: "Xen regression and PVH fixes for 3.16-rc1 - fix dom0 PVH memory setup on latest unstable Xen releases - fix 64-bit x86 PV guest boot failure on Xen 3.1 and earlier - fix resume regression on non-PV (auto-translated physmap) guests" * tag 'stable/for-linus-3.16-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/grant-table: fix suspend for non-PV guests x86/xen: no need to explicitly register an NMI callback Revert "xen/pvh: Update E820 to work with PVH (v2)" x86/xen: fix memory setup for PVH dom0
| * x86/xen: fix memory setup for PVH dom0David Vrabel2014-06-051-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Since af06d66ee32b (x86: fix setup of PVH Dom0 memory map) in Xen, PVH dom0 need only use the memory memory provided by Xen which has already setup all the correct holes. xen_memory_setup() then ends up being trivial for a PVH guest so introduce a new function (xen_auto_xlated_memory_setup()). Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Tested-by: Roger Pau Monné <roger.pau@citrix.com>
* | Merge tag 'stable/for-linus-3.16-rc0-tag' of ↵Linus Torvalds2014-06-021-0/+1
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into next Pull Xen updates from David Vrabel: "xen: features and fixes for 3.16-rc0 - support foreign mappings in PVH domains (needed when dom0 is PVH) - fix mapping high MMIO regions in x86 PV guests (this is also the first half of removing the PAGE_IOMAP PTE flag). - ARM suspend/resume support. - ARM multicall support" * tag 'stable/for-linus-3.16-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: map foreign pfns for autotranslated guests xen-acpi-processor: Don't display errors when we get -ENOSYS xen/pciback: Document the entry points for 'pcistub_put_pci_dev' xen/pciback: Document when the 'unbind' and 'bind' functions are called. xen-pciback: Document when we FLR an PCI device. xen-pciback: First reset, then free. xen-pciback: Cleanup up pcistub_put_pci_dev x86/xen: do not use _PAGE_IOMAP in xen_remap_domain_mfn_range() x86/xen: set regions above the end of RAM as 1:1 x86/xen: only warn once if bad MFNs are found during setup x86/xen: compactly store large identity ranges in the p2m x86/xen: fix set_phys_range_identity() if pfn_e > MAX_P2M_PFN x86/xen: rename early_p2m_alloc() and early_p2m_alloc_middle() xen/x86: set panic notifier priority to minimum arm,arm64/xen: introduce HYPERVISOR_suspend() xen: refactor suspend pre/post hooks arm: xen: export HYPERVISOR_multicall to modules. arm64: introduce virt_to_pfn arm/xen: Remove definiition of virt_to_pfn in asm/xen/page.h arm: xen: implement multicall hypercall support.
| * xen/x86: set panic notifier priority to minimumRadim Krčmář2014-05-151-0/+1
| | | | | | | | | | | | | | | | | | | | Execution is not going to continue after telling Xen about the crash. Let other panic notifiers run by postponing the final hypercall as much as possible. Signed-off-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com>
* | asmlinkage, x86: Add explicit __visible to arch/x86/*Andi Kleen2014-05-051-1/+1
|/ | | | | | | | | | | As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by: Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* xen/pvh: set CR4 flags for APsMukesh Rathor2014-02-031-0/+12
| | | | | | | | | | | | | During bootup in the 'probe_page_size_mask' these CR4 flags are set in there. But for AP processors they are not set as we do not use 'secondary_startup_64' which the baremetal kernels uses. Instead do it in this function which we use in Xen PVH during our startup for AP processors. As such fix it up to make sure we have that flag set. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/pvh: Set X86_CR0_WP and others in CR0 (v2)Roger Pau Monne2014-01-211-3/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | otherwise we will get for some user-space applications that use 'clone' with CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID end up hitting an assert in glibc manifested by: general protection ip:7f80720d364c sp:7fff98fd8a80 error:0 in libc-2.13.so[7f807209e000+180000] This is due to the nature of said operations which sets and clears the PID. "In the successful one I can see that the page table of the parent process has been updated successfully to use a different physical page, so the write of the tid on that page only affects the child... On the other hand, in the failed case, the write seems to happen before the copy of the original page is done, so both the parent and the child end up with the same value (because the parent copies the page after the write of the child tid has already happened)." (Roger's analysis). The nature of this is due to the Xen's commit of 51e2cac257ec8b4080d89f0855c498cbbd76a5e5 "x86/pvh: set only minimal cr0 and cr4 flags in order to use paging" the CR0_WP was removed so COW features of the Linux kernel were not operating properly. While doing that also update the rest of the CR0 flags to be inline with what a baremetal Linux kernel would set them to. In 'secondary_startup_64' (baremetal Linux) sets: X86_CR0_PE | X86_CR0_MP | X86_CR0_ET | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM | X86_CR0_PG The hypervisor for HVM type guests (which PVH is a bit) sets: X86_CR0_PE | X86_CR0_ET | X86_CR0_TS For PVH it specifically sets: X86_CR0_PG Which means we need to set the rest: X86_CR0_MP | X86_CR0_NE | X86_CR0_WP | X86_CR0_AM to have full parity. Signed-off-by: Roger Pau Monne <roger.pau@citrix.com> Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Took out the cr4 writes to be a seperate patch] [v2: 0-DAY kernel found xen_setup_gdt to be missing a static]
* xen/pvh: remove duplicated include from enlighten.cWei Yongjun2014-01-071-1/+0
| | | | | | | Remove duplicated include. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/pvh: Piggyback on PVHVM for event channels (v2)Mukesh Rathor2014-01-061-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | PVH is a PV guest with a twist - there are certain things that work in it like HVM and some like PV. There is a similar mode - PVHVM where we run in HVM mode with PV code enabled - and this patch explores that. The most notable PV interfaces are the XenBus and event channels. We will piggyback on how the event channel mechanism is used in PVHVM - that is we want the normal native IRQ mechanism and we will install a vector (hvm callback) for which we will call the event channel mechanism. This means that from a pvops perspective, we can use native_irq_ops instead of the Xen PV specific. Albeit in the future we could support pirq_eoi_map. But that is a feature request that can be shared with PVHVM. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
* xen/pvh: Secondary VCPU bringup (non-bootup CPUs)Mukesh Rathor2014-01-061-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | The VCPU bringup protocol follows the PV with certain twists. From xen/include/public/arch-x86/xen.h: Also note that when calling DOMCTL_setvcpucontext and VCPU_initialise for HVM and PVH guests, not all information in this structure is updated: - For HVM guests, the structures read include: fpu_ctxt (if VGCT_I387_VALID is set), flags, user_regs, debugreg[*] - PVH guests are the same as HVM guests, but additionally use ctrlreg[3] to set cr3. All other fields not used should be set to 0. This is what we do. We piggyback on the 'xen_setup_gdt' - but modify a bit - we need to call 'load_percpu_segment' so that 'switch_to_new_gdt' can load per-cpu data-structures. It has no effect on the VCPU0. We also piggyback on the %rdi register to pass in the CPU number - so that when we bootup a new CPU, the cpu_bringup_and_idle will have passed as the first parameter the CPU number (via %rdi for 64-bit). Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* xen/pvh: Load GDT/GS in early PV bootup code for BSP.Mukesh Rathor2014-01-061-2/+37
| | | | | | | | | | | | | | | | | | | | | | | | During early bootup we start life using the Xen provided GDT, which means that we are running with %cs segment set to FLAT_KERNEL_CS (FLAT_RING3_CS64 0xe033, GDT index 261). But for PVH we want to be use HVM type mechanism for segment operations. As such we need to switch to the HVM one and also reload ourselves with the __KERNEL_CS:eip to run in the proper GDT and segment. For HVM this is usually done in 'secondary_startup_64' in (head_64.S) but since we are not taking that bootup path (we start in PV - xen_start_kernel) we need to do that in the early PV bootup paths. For good measure we also zero out the %fs, %ds, and %es (not strictly needed as Xen has already cleared them for us). The %gs is loaded by 'switch_to_new_gdt'. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
* xen/pvh: Don't setup P2M tree.Konrad Rzeszutek Wilk2014-01-061-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | P2M is not available for PVH. Fortunatly for us the P2M code already has mostly the support for auto-xlat guest thanks to commit 3d24bbd7dddbea54358a9795abaf051b0f18973c "grant-table: call set_phys_to_machine after mapping grant refs" which: " introduces set_phys_to_machine calls for auto_translated guests (even on x86) in gnttab_map_refs and gnttab_unmap_refs. translated by swiotlb-xen... " so we don't need to muck much. with above mentioned "commit you'll get set_phys_to_machine calls from gnttab_map_refs and gnttab_unmap_refs but PVH guests won't do anything with them " (Stefano Stabellini) which is OK - we want them to be NOPs. This is because we assume that an "IOMMU is always present on the plaform and Xen is going to make the appropriate IOMMU pagetable changes in the hypercall implementation of GNTTABOP_map_grant_ref and GNTTABOP_unmap_grant_ref, then eveything should be transparent from PVH priviligied point of view and DMA transfers involving foreign pages keep working with no issues[sp] Otherwise we would need a P2M (and an M2P) for PVH priviligied to track these foreign pages .. (see arch/arm/xen/p2m.c)." (Stefano Stabellini). We still have to inhibit the building of the P2M tree. That had been done in the past by not calling xen_build_dynamic_phys_to_machine (which setups the P2M tree and gives us virtual address to access them). But we are missing a check for xen_build_mfn_list_list - which was continuing to setup the P2M tree and would blow up at trying to get the virtual address of p2m_missing (which would have been setup by xen_build_dynamic_phys_to_machine). Hence a check is needed to not call xen_build_mfn_list_list when running in auto-xlat mode. Instead of replicating the check for auto-xlat in enlighten.c do it in the p2m.c code. The reason is that the xen_build_mfn_list_list is called also in xen_arch_post_suspend without any checks for auto-xlat. So for PVH or PV with auto-xlat - we would needlessly allocate space for an P2M tree. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
* xen/pvh: Early bootup changes in PV code (v4).Mukesh Rathor2014-01-061-14/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't use the filtering that 'xen_cpuid' is doing because the hypervisor treats 'XEN_EMULATE_PREFIX' as an invalid instruction. This means that all of the filtering will have to be done in the hypervisor/toolstack. Without the filtering we expose to the guest the: - cpu topology (sockets, cores, etc); - the APERF (which the generic scheduler likes to use), see 5e626254206a709c6e937f3dda69bf26c7344f6f "xen/setup: filter APERFMPERF cpuid feature out" - and the inability to figure out whether MWAIT_LEAF should be exposed or not. See df88b2d96e36d9a9e325bfcd12eb45671cbbc937 "xen/enlighten: Disable MWAIT_LEAF so that acpi-pad won't be loaded." - x2apic, see 4ea9b9aca90cfc71e6872ed3522356755162932c "xen: mask x2APIC feature in PV" We also check for vector callback early on, as it is a required feature. PVH also runs at default kernel IOPL. Finally, pure PV settings are moved to a separate function that are only called for pure PV, ie, pv with pvmmu. They are also #ifdef with CONFIG_XEN_PVMMU. Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
* Merge tag 'stable/for-linus-3.12-rc0-tag-two' of ↵Linus Torvalds2013-09-101-1/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen bug-fixes from Konrad Rzeszutek Wilk: "This pull I usually do after rc1 is out but because we have a nice amount of fixes, some bootup related fixes for ARM, and it is early in the cycle we figured to do it now to help with tracking of potential regressions. The simple ones are the ARM ones - one of the patches fell through the cracks, other fixes a bootup issue (unconditionally using Xen functions). Then a fix for a regression causing preempt count being off (patch causing this went in v3.12). Lastly are the fixes to make Xen PVHVM guests use PV ticketlocks (Xen PV already does). The enablement of that was supposed to be part of the x86 spinlock merge in commit 816434ec4a67 ("The biggest change here are paravirtualized ticket spinlocks (PV spinlocks), which bring a nice speedup on various benchmarks...") but unfortunatly it would cause hang when booting Xen PVHVM guests. Yours truly got all of the bugs fixed last week and they (six of them) are included in this pull. Bug-fixes: - Boot on ARM without using Xen unconditionally - On Xen ARM don't run cpuidle/cpufreq - Fix regression in balloon driver, preempt count warnings - Fixes to make PVHVM able to use pv ticketlock. - Revert Xen PVHVM disabling pv ticketlock (aka, re-enable pv ticketlocks)" * tag 'stable/for-linus-3.12-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/spinlock: Don't use __initdate for xen_pv_spin Revert "xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM" xen/spinlock: Don't setup xen spinlock IPI kicker if disabled. xen/smp: Update pv_lock_ops functions before alternative code starts under PVHVM xen/spinlock: We don't need the old structure anymore xen/spinlock: Fix locking path engaging too soon under PVHVM. xen/arm: disable cpuidle and cpufreq when linux is running as dom0 xen/p2m: Don't call get_balloon_scratch_page() twice, keep interrupts disabled for multicalls ARM: xen: only set pm function ptrs for Xen guests
| * xen/spinlock: Fix locking path engaging too soon under PVHVM.Konrad Rzeszutek Wilk2013-09-091-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The xen_lock_spinning has a check for the kicker interrupts and if it is not initialized it will spin normally (not enter the slowpath). But for PVHVM case we would initialize the kicker interrupt before the CPU came online. This meant that if the booting CPU used a spinlock and went in the slowpath - it would enter the slowpath and block forever. The forever part because during bootup: the spinlock would be taken _before_ the CPU sets itself to be online (more on this further), and we enter to poll on the event channel forever. The bootup CPU (see commit fc78d343fa74514f6fd117b5ef4cd27e4ac30236 "xen/smp: initialize IPI vectors before marking CPU online" for details) and the CPU that started the bootup consult the cpu_online_mask to determine whether the booting CPU should get an IPI. The booting CPU has to set itself in this mask via: set_cpu_online(smp_processor_id(), true); However, if the spinlock is taken before this (and it is) and it polls on an event channel - it will never be woken up as the kernel will never send an IPI to an offline CPU. Note that the PVHVM logic in sending IPIs is using the HVM path which has numerous checks using the cpu_online_mask and cpu_active_mask. See above mention git commit for details. Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
* | Merge tag 'stable/for-linus-3.12-rc0-tag' of ↵Linus Torvalds2013-09-041-5/+10
|\ \ | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from Konrad Rzeszutek Wilk: "A couple of features and a ton of bug-fixes. There is also some maintership changes. Jeremy is enjoying the full-time work at the startup and as much as he would love to help - he can't find the time. I have a bunch of other things that I promised to work on - paravirt diet, get SWIOTLB working everywhere, etc, but haven't been able to find the time. As such both David Vrabel and Boris Ostrovsky have graciously volunteered to help with the maintership role. They will keep the lid on regressions, bug-fixes, etc. I will be in the background to help - but eventually there will be less of me doing the Xen GIT pulls and more of them. Stefano is still doing the ARM/ARM64 and will continue on doing so. Features: - Xen Trusted Platform Module (TPM) frontend driver - with the backend in MiniOS. - Scalability improvements in event channel. - Two extra Xen co-maintainers (David, Boris) and one going away (Jeremy) Bug-fixes: - Make the 1:1 mapping work during early bootup on selective regions. - Add scratch page to balloon driver to deal with unexpected code still holding on stale pages. - Allow NMIs on PV guests (64-bit only) - Remove unnecessary TLB flush in M2P code. - Fixes duplicate callbacks in Xen granttable code. - Fixes in PRIVCMD_MMAPBATCH ioctls to allow retries - Fix for events being lost due to rescheduling on different VCPUs. - More documentation" * tag 'stable/for-linus-3.12-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (23 commits) hvc_xen: Remove unnecessary __GFP_ZERO from kzalloc drivers/xen-tpmfront: Fix compile issue with missing option. xen/balloon: don't set P2M entry for auto translated guest xen/evtchn: double free on error Xen: Fix retry calls into PRIVCMD_MMAPBATCH*. xen/pvhvm: Initialize xen panic handler for PVHVM guests xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping xen: fix ARM build after 6efa20e4 MAINTAINERS: Remove Jeremy from the Xen subsystem. xen/events: document behaviour when scanning the start word for events x86/xen: during early setup, only 1:1 map the ISA region x86/xen: disable premption when enabling local irqs swiotlb-xen: replace dma_length with sg_dma_len() macro swiotlb: replace dma_length with sg_dma_len() macro xen/balloon: set a mapping for ballooned out pages xen/evtchn: improve scalability by using per-user locks xen/p2m: avoid unneccesary TLB flush in m2p_remove_override() MAINTAINERS: Add in two extra co-maintainers of the Xen tree. MAINTAINERS: Update the Xen subsystem's with proper mailing list. xen: replace strict_strtoul() with kstrtoul() ...
| * xen/pvhvm: Initialize xen panic handler for PVHVM guestsVaughan Cao2013-08-201-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kernel use callback linked in panic_notifier_list to notice others when panic happens. NORET_TYPE void panic(const char * fmt, ...){ ... atomic_notifier_call_chain(&panic_notifier_list, 0, buf); } When Xen becomes aware of this, it will call xen_reboot(SHUTDOWN_crash) to send out an event with reason code - SHUTDOWN_crash. xen_panic_handler_init() is defined to register on panic_notifier_list but we only call it in xen_arch_setup which only be called by PV, this patch is necessary for PVHVM. Without this patch, setting 'on_crash=coredump-restart' in PVHVM guest config file won't lead a vmcore to be generate when the guest panics. It can be reproduced with 'echo c > /proc/sysrq-trigger'. Signed-off-by: Vaughan Cao <vaughan.cao@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: Joe Jin <joe.jin@oracle.com>
| * xen: Support 64-bit PV guest receiving NMIsKonrad Rzeszutek Wilk2013-08-091-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is based on a patch that Zhenzhong Duan had sent - which was missing some of the remaining pieces. The kernel has the logic to handle Xen-type-exceptions using the paravirt interface in the assembler code (see PARAVIRT_ADJUST_EXCEPTION_FRAME - pv_irq_ops.adjust_exception_frame and and INTERRUPT_RETURN - pv_cpu_ops.iret). That means the nmi handler (and other exception handlers) use the hypervisor iret. The other changes that would be neccessary for this would be to translate the NMI_VECTOR to one of the entries on the ipi_vector and make xen_send_IPI_mask_allbutself use different events. Fortunately for us commit 1db01b4903639fcfaec213701a494fe3fb2c490b (xen: Clean up apic ipi interface) implemented this and we piggyback on the cleanup such that the apic IPI interface will pass the right vector value for NMI. With this patch we can trigger NMIs within a PV guest (only tested x86_64). For this to work with normal PV guests (not initial domain) we need the domain to be able to use the APIC ops - they are already implemented to use the Xen event channels. For that to be turned on in a PV domU we need to remove the masking of X86_FEATURE_APIC. Incidentally that means kgdb will also now work within a PV guest without using the 'nokgdbroundup' workaround. Note that the 32-bit version is different and this patch does not enable that. CC: Lisa Nguyen <lisa@xenapiadmin.com> CC: Ben Guthro <benjamin.guthro@citrix.com> CC: Zhenzhong Duan <zhenzhong.duan@oracle.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> [v1: Fixed up per David Vrabel comments] Reviewed-by: Ben Guthro <benjamin.guthro@citrix.com> Reviewed-by: David Vrabel <david.vrabel@citrix.com>
* | x86: Correctly detect hypervisorJason Wang2013-08-051-6/+3
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We try to handle the hypervisor compatibility mode by detecting hypervisor through a specific order. This is not robust, since hypervisors may implement each others features. This patch tries to handle this situation by always choosing the last one in the CPUID leaves. This is done by letting .detect() return a priority instead of true/false and just re-using the CPUID leaf where the signature were found as the priority (or 1 if it was found by DMI). Then we can just pick hypervisor who has the highest priority. Other sophisticated detection method could also be implemented on top. Suggested by H. Peter Anvin and Paolo Bonzini. Acked-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Doug Covelli <dcovelli@vmware.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* x86: delete __cpuinit usage from all x86 filesPaul Gortmaker2013-07-141-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The __cpuinit type of throwaway sections might have made sense some time ago when RAM was more constrained, but now the savings do not offset the cost and complications. For example, the fix in commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time") is a good example of the nasty type of bugs that can be created with improper use of the various __init prefixes. After a discussion on LKML[1] it was decided that cpuinit should go the way of devinit and be phased out. Once all the users are gone, we can then finally remove the macros themselves from linux/init.h. Note that some harmless section mismatch warnings may result, since notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c) are flagged as __cpuinit -- so if we remove the __cpuinit from arch specific callers, we will also get section mismatch warnings. As an intermediate step, we intend to turn the linux/init.h cpuinit content into no-ops as early as possible, since that will get rid of these warnings. In any case, they are temporary and harmless. This removes all the arch/x86 uses of the __cpuinit macros from all C files. x86 only had the one __CPUINIT used in assembly files, and it wasn't paired off with a .previous or a __FINIT, so we can delete it directly w/o any corresponding additional change there. [1] https://lkml.org/lkml/2013/5/20/589 Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Ingo Molnar <mingo@kernel.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
* x86: Get rid of ->hard_math and all the FPU asm fuH. Peter Anvin2013-06-061-1/+1
| | | | | | | | | | | | | | | | | Reimplement FPU detection code in C and drop old, not-so-recommended detection method in asm. Move all the relevant stuff into i387.c where it conceptually belongs. Finally drop cpuinfo_x86.hard_math. [ hpa: huge thanks to Borislav for taking my original concept patch and productizing it ] [ Boris, note to self: do not use static_cpu_has before alternatives! ] Signed-off-by: H. Peter Anvin <hpa@zytor.com> Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
OpenPOWER on IntegriCloud