op-kernel-dev - Development kernel branch for OpenPOWER systems

	Commit message (Collapse)	Author	Age	Files	Lines
*	powerpc: Fix hypervisor facility unavaliable vector number	Michael Neuling	2013-08-09	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently if we take hypervisor facility unavaliable (from 0xf80/0x4f80) we mark it as an OS facility unavaliable (0xf60) as the two share the same code path. The becomes a problem in facility_unavailable_exception() as we aren't able to see the hypervisor facility unavailable exceptions. Below fixes this by duplication the required macros. Signed-off-by: Michael Neuling <mikey@neuling.org> Cc: <stable@vger.kernel.org> [v3.10] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	Merge tag 'v3.10' into next	Benjamin Herrenschmidt	2013-07-01	1	-1/+1
\|\ \| \| \| \| \| \| \| \| \| \|	Merge 3.10 in order to get some of the last minute powerpc changes, resolve conflicts and add additional fixes on top of them.
\| *	powerpc: Fix emulation of illegal instructions on PowerNV platform	Paul Mackerras	2013-06-15	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Normally, the kernel emulates a few instructions that are unimplemented on some processors (e.g. the old dcba instruction), or privileged (e.g. mfpvr). The emulation of unimplemented instructions is currently not working on the PowerNV platform. The reason is that on these machines, unimplemented and illegal instructions cause a hypervisor emulation assist interrupt, rather than a program interrupt as on older CPUs. Our vector for the emulation assist interrupt just calls program_check_exception() directly, without setting the bit in SRR1 that indicates an illegal instruction interrupt. This fixes it by making the emulation assist interrupt set that bit before calling program_check_interrupt(). With this, old programs that use no-longer implemented instructions such as dcba now work again. CC: <stable@vger.kernel.org> Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* \|	powerpc: Wire up the HV facility unavailable exception	Michael Ellerman	2013-07-01	1	-0/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Similar to the facility unavailble exception, except the facilities are controlled by HFSCR. Adapt the facility_unavailable_exception() so it can be called for either the regular or Hypervisor facility unavailable exceptions. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> CC: <stable@vger.kernel.org> [v3.10] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* \|	powerpc: Rename and flesh out the facility unavailable exception handler	Michael Ellerman	2013-07-01	1	-14/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The exception at 0xf60 is not the TM (Transactional Memory) unavailable exception, it is the "Facility Unavailable Exception", rename it as such. Flesh out the handler to acknowledge the fact that it can be called for many reasons, one of which is TM being unavailable. Use STD_EXCEPTION_COMMON() for the exception body, for some reason we had it open-coded, I've checked the generated code is identical. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> CC: <stable@vger.kernel.org> [v3.10] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* \|	powerpc: Remove KVMTEST from RELON exception handlers	Michael Ellerman	2013-07-01	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	KVMTEST is a macro which checks whether we are taking an exception from guest context, if so we branch out of line and eventually call into the KVM code to handle the switch. When running real guests on bare metal (HV KVM) the hardware ensures that we never take a relocation on exception when transitioning from guest to host. For PR KVM we disable relocation on exceptions ourself in kvmppc_core_init_vm(), as of commit a413f47 "Disable relocation on exceptions whenever PR KVM is active". So convert all the RELON macros to use NOTEST, and drop the remaining KVM_HANDLER() definitions we have for 0xe40 and 0xe80. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> CC: <stable@vger.kernel.org> [v3.9+] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* \|	powerpc: Remove unreachable relocation on exception handlers	Michael Ellerman	2013-07-01	1	-15/+3
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have relocation on exception handlers defined for h_data_storage and h_instr_storage. However we will never take relocation on exceptions for these because they can only come from a guest, and we never take relocation on exceptions when we transition from guest to host. We also have a handler for hmi_exception (Hypervisor Maintenance) which is defined in the architecture to never be delivered with relocation on, see see v2.07 Book III-S section 6.5. So remove the handlers, leaving a branch to self just to be double extra paranoid. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> CC: <stable@vger.kernel.org> [v3.9+] Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/power8: Update denormalization handler	Michael Neuling	2013-06-10	1	-0/+10
\| \| \| \| \| \| \| \| \|	POWER8 can take a denormalisation exception on any VSX registers. This does the extra 32 VSX registers we don't currently handle. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pseries: Simplify denormalization handler	Michael Neuling	2013-06-10	1	-64/+16
\| \| \| \| \| \| \| \| \| \| \|	The following simplifies the denorm code by using macros to generate the long stream of almost identical instructions. This patch results in no changes to the output binary, but removes a lot of lines of code. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Save DAR and DSISR in pt_regs on MCE	Aneesh Kumar K.V	2013-04-30	1	-0/+9
\| \| \| \| \| \| \| \| \|	We were not saving DAR and DSISR on MCE. Save then and also print the values along with exception details in xmon. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Fix "attempt to move .org backwards" error	Paul Mackerras	2013-04-26	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Building a 64-bit powerpc kernel with PR KVM enabled currently gives this error: AS arch/powerpc/kernel/head_64.o arch/powerpc/kernel/exceptions-64s.S: Assembler messages: arch/powerpc/kernel/exceptions-64s.S:258: Error: attempt to move .org backwards make[2]: *** [arch/powerpc/kernel/head_64.o] Error 1 This happens because the MASKABLE_EXCEPTION_PSERIES macro turns into 33 instructions, but we only have space for 32 at the decrementer interrupt vector (from 0x900 to 0x980). In the code generated by the MASKABLE_EXCEPTION_PSERIES macro, we currently have two instances of the HMT_MEDIUM macro, which has the effect of setting the SMT thread priority to medium. One is the first instruction, and is overwritten by a no-op on processors where we save the PPR (processor priority register), that is, POWER7 or later. The other is after we have saved the PPR. In order to reduce the code at 0x900 by one instruction, we omit the first HMT_MEDIUM. On processors without SMT this will have no effect since HMT_MEDIUM is a no-op there. On POWER5 and RS64 machines this will mean that the first few instructions take a little longer in the case where a decrementer interrupt occurs when the hardware thread is running at low SMT priority. On POWER6 and later machines, the hardware automatically boosts the thread priority when a decrementer interrupt is taken if the thread priority was below medium, so this change won't make any difference. The alternative would be to branch out of line after saving the CFAR. However, that would incur an extra overhead on all processors, whereas the approach adopted here only adds overhead on older threaded processors. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Fix hardware IRQs with MMU on exceptions when HV=0	Michael Neuling	2013-04-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	POWER8 allows us to take interrupts with the MMU on. This gives us a second set of vectors offset at 0x4000. Unfortunately when coping these vectors we missed checking for MSR HV for hardware interrupts (0x500). This results in us trying to use HSRR0/1 when HV=0, rather than SRR0/1 on HW IRQs The below fixes this to check CPU_FTR_HVMODE when patching the code at 0x4500. Also we remove the check for CPU_FTR_ARCH_206 since relocation on IRQs are only available in arch 2.07 and beyond. Thanks to benh for helping find this. Signed-off-by: Michael Neuling <mikey@neuling.org> CC: <stable@vger.kernel.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: remove dead CONFIG_HVC_SCOM code	Paul Bolle	2013-04-18	1	-4/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit c1fb6816fb1b78dd94b673b0fdaa9a7a16e97bd1 ("powerpc: Add relocation on exception vector handlers") added two lines of code that depend on the macro CONFIG_HVC_SCOM. That macro doesn't exist. Perhaps it was intended to use CONFIG_PPC_SCOM here. But since "maintence_interrupt" is a typo and there's nothing in arch/powerpc that looks like maintenance_interrupt it seems best to just delete these lines. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Acked-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
*	powerpc: make additional room in exception vector area	Chen Gang	2013-03-25	1	-72/+72
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The FWNMI region is fixed at 0x7000 and the vector are now overflowing that with allmodconfig. Fix that by moving slb_miss_realmode code out of that region as it doesn't need to be that close to the call sites (it is a _GLOBAL function) Fixes this build error: arch/powerpc/kernel/exceptions-64s.S: Assembler messages: arch/powerpc/kernel/exceptions-64s.S:1304: Error: attempt to move .org backwards Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
*	powerpc: Rename USER_ESID_BITS* to ESID_BITS*	Aneesh Kumar K.V	2013-03-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	Now we use ESID_BITS of kernel address to build proto vsid. So rename USER_ESIT_BITS to ESID_BITS Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.8]
*	powerpc: Update kernel VSID range	Aneesh Kumar K.V	2013-03-17	1	-9/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch change the kernel VSID range so that we limit VSID_BITS to 37. This enables us to support 64TB with 65 bit VA (37+28). Without this patch we have boot hangs on platforms that only support 65 bit VA. With this patch we now have proto vsid generated as below: We first generate a 37-bit "proto-VSID". Proto-VSIDs are generated from mmu context id and effective segment id of the address. For user processes max context id is limited to ((1ul << 19) - 5) for kernel space, we use the top 4 context ids to map address as below 0x7fffc - [ 0xc000000000000000 - 0xc0003fffffffffff ] 0x7fffd - [ 0xd000000000000000 - 0xd0003fffffffffff ] 0x7fffe - [ 0xe000000000000000 - 0xe0003fffffffffff ] 0x7ffff - [ 0xf000000000000000 - 0xf0003fffffffffff ] Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Tested-by: Geoff Levand <geoff@infradead.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> CC: <stable@vger.kernel.org> [v3.8]
*	powerpc: Avoid link stack corruption in MMU on syscall entry path	Michael Neuling	2013-03-05	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we use the link register to branch up high in the early MMU on syscall entry path. Unfortunately, this trashes the link stack as the address we are going to is not associated with the earlier mflr. This patch simply converts us to used the count register (volatile over syscalls anyway) instead. This is much better at predicting in this scenario and doesn't trash link stack causing a bunch of additional branch mispredicts later. Benchmarking this on POWER8 saves a bunch of cycles on Anton's null syscall benchmark here: http://ozlabs.org/~anton/junkcode/null_syscall.c Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Hook in new transactional memory code	Michael Neuling	2013-02-15	1	-2/+54
\| \| \| \| \| \| \| \| \|	This hooks the new transactional memory code into context switching, FP/VMX/VMX unavailable and exception return. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Add transactional memory unavaliable execption handler	Michael Neuling	2013-02-15	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \|	These should never happen since we always turn on MSR TM when in userspace. We don't do lazy TM. Hence if we hit this, we barf and kill the task as something's gone horribly wrong. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Save CFAR before branching in interrupt entry paths	Paul Mackerras	2013-02-15	1	-30/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some of the interrupt vectors on 64-bit POWER server processors are only 32 bytes long, which is not enough for the full first-level interrupt handler. For these we currently just have a branch to an out-of-line handler. However, this means that we corrupt the CFAR (come-from address register) on POWER7 and later processors. To fix this, we split the EXCEPTION_PROLOG_1 macro into two pieces: EXCEPTION_PROLOG_0 contains the part up to the point where the CFAR is saved in the PACA, and EXCEPTION_PROLOG_1 contains the rest. We then put EXCEPTION_PROLOG_0 in the short interrupt vectors before we branch to the out-of-line handler, which contains the rest of the first-level interrupt handler. To facilitate this, we define new _OOL (out of line) variants of STD_EXCEPTION_PSERIES, etc. In order to get EXCEPTION_PROLOG_0 to be short enough, i.e., no more than 6 instructions, it was necessary to move the stores that move the PPR and CFAR values into the PACA into __EXCEPTION_PROLOG_1 and to get rid of one of the two HMT_MEDIUM instructions. Previously there was a HMT_MEDIUM_PPR_DISCARD before the prolog, which was nop'd out on processors with the PPR (POWER7 and later), and then another HMT_MEDIUM inside the HMT_MEDIUM_PPR_SAVE macro call inside __EXCEPTION_PROLOG_1, which was nop'd out on processors without PPR. Now the HMT_MEDIUM inside EXCEPTION_PROLOG_0 is there unconditionally and the HMT_MEDIUM_PPR_DISCARD is not strictly necessary, although this leaves it in for the interrupt vectors where there is room for it. Previously we had a handler for hypervisor maintenance interrupts at 0xe50, which doesn't leave enough room for the vector for hypervisor emulation assist interrupts at 0xe40, since we need 8 instructions. The 0xe50 vector was only used on POWER6, as the HMI vector was moved to 0xe60 on POWER7. Since we don't support running in hypervisor mode on POWER6, we just remove the handler at 0xe50. This also changes denorm_exception_hv to use EXCEPTION_PROLOG_0 instead of open-coding it, and removes the HMT_MEDIUM_PPR_DISCARD from the relocation-on vectors (since any CPU that supports relocation-on interrupts also has the PPR). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Remove Cell-specific relocation-on interrupt vector code	Paul Mackerras	2013-02-15	1	-10/+0
\| \| \| \| \| \| \| \| \|	The Cell processor doesn't support relocation-on interrupts, so we don't need relocation-on versions of the interrupt vectors that are purely Cell-specific. This removes them. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Make room in exception vector area	Benjamin Herrenschmidt	2013-01-10	1	-55/+55
\| \| \| \| \| \| \| \| \| \|	The FWNMI region is fixed at 0x7000 and the vector are now overflowing that with some configurations. Fix that by moving some hash management code out of that region as it doesn't need to be that close to the call sites (isn't accessed using conditional branches). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Hardware breakpoints rewrite to handle non DABR breakpoint registers	Michael Neuling	2013-01-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	This is a rewrite so that we don't assume we are using the DABR throughout the code. We now use the arch_hw_breakpoint to store the breakpoint in a generic manner in the thread_struct, rather than storing the raw DABR value. The ptrace GET/SET_DEBUGREG interface currently passes the raw DABR in from userspace. We keep this functionality, so that future changes (like the POWER8 DAWR), will still fake the DABR to userspace. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Implement PPR save/restore	Haren Myneni	2013-01-10	1	-10/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[PATCH 6/6] powerpc: Implement PPR save/restore When the task enters in to kernel space, the user defined priority (PPR) will be saved in to PACA at the beginning of first level exception vector and then copy from PACA to thread_info in second level vector. PPR will be restored from thread_info before exits the kernel space. P7/P8 temporarily raises the thread priority to higher level during exception until the program executes HMT_* calls. But it will not modify PPR register. So we save PPR value whenever some register is available to use and then calls HMT_MEDIUM to increase the priority. This feature supports on P7 or later processors. We save/ restore PPR for all exception vectors except system call entry. GLIBC will be saving / restore for system calls. So the default PPR value (3) will be set for the system call exit when the task returned to the user space. Signed-off-by: Haren Myneni <haren@us.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Add code to handle soft-disabled doorbells on server	Ian Munsie	2013-01-10	1	-12/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds the logic to properly handle doorbells that come in when interrupts have been soft disabled and to replay them when interrupts are re-enabled: - masked_##_H##interrupt is modified to leave interrupts enabled when a doorbell has come in since doorbells are edge sensitive and as such won't be automatically re-raised. - __check_irq_replay now tests if a doorbell happened on book3s, and returns either 0xe80 or 0xa00 depending on whether we are the hypervisor or not. - restore_check_irq_replay now tests for the two possible server doorbell vector numbers to replay. - __replay_interrupt also adds tests for the two server doorbell vector numbers, and is modified to use a compare instruction rather than an andi. on the single bit difference between 0x500 and 0x900. The last two use a CPU feature section to avoid needlessly testing against the hypervisor vector if it is not the hypervisor, and vice versa. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Add book3s privileged doorbell exception vectors	Ian Munsie	2013-01-10	1	-2/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	Directed Privileged Doorbell Interrupts come in at 0xa00 (or 0xc000000000004a00 if relocation on exception is enabled), so add exception vectors at these locations. If doorbell support is not compiled in we handle it as an unknown_exception. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Tested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Add book3s hypervisor doorbell exception vectors	Ian Munsie	2013-01-10	1	-3/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	Directed Hypervisor Doorbell Interrupts come in at 0xe80 (or 0xc000000000004e80 if relocation on exceptions is enabled), so add exception vectors at these locations. If doorbell support is not compiled in we handle it as an unknown_exception. Signed-off-by: Ian Munsie <imunsie@au1.ibm.com> Tested-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Add relocation on exception vector handlers	Michael Neuling	2012-11-15	1	-8/+172
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	POWER8/v2.07 allows exceptions to be taken with the MMU still on. A new set of exception vectors is added at 0xc000_0000_0000_4xxx. When the HW takes us here, MSR IR/DR will be set already and we no longer need a costly RFID to turn the MMU back on again. The original 0x0 based exception vectors remain for when the HW can't leave the MMU on. Examples of this are when we can't trust the current MMU mappings, like when we are changing from guest to hypervisor (HV 0 -> 1) or when the MMU was off already. In these cases the HW will take us to the original 0x0 based exception vectors with the MMU off as before. This uses the new macros added previously too implement these new execption vectors at 0xc000_0000_0000_4xxx. We exit these exception vectors using mflr/blr (rather than mtspr SSR0/RFID), since we don't need the costly MMU switch anymore. This moves the __end_interrupts marker down past these new 0x4000 vectors since they will need to be copied down to 0x0 when the kernel is not at 0x0. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Add new macros needed for relocation on exceptions	Michael Neuling	2012-11-15	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	POWER8/v2.07 allows exceptions to be taken with the MMU still on. A new set of exception vectors is added at 0xc000_0000_0000_4xxx. When the HW takes us here, MSR IR/DR will be set already and we no longer need a costly RFID to turn the MMU back on again. The original 0x0 based exception vectors remain for when the HW can't leave the MMU on. Examples of this are when we can't trust the current the MMU mappings, like when we are changing from guest to hypervisor (HV 0 -> 1) or when the MMU was off already. In these cases the HW will take us to the original 0x0 based exception vectors with the MMU off as before. The below macros are copies of the macros used at the 0x0 offset but modified to handle the MMU being on. In these macros we use the link register to jump to the secondary handlers rather than using RFID (RFID was also use to turn on the MMU). Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Turn syscall handler into macros	Michael Neuling	2012-11-15	1	-23/+40
\| \| \| \| \| \| \| \| \|	This turns the syscall handler into macros as we are going to want to reuse them again later. Signed-off-by: Matt Evans <matt@ozlabs.org> Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Make load_hander handle upto 64k offset	Michael Neuling	2012-11-15	1	-2/+2
\| \| \| \| \| \| \|	If we change load_hander() to use an ori instead of addi, we can load handlers upto 64k away provided we are still 64k aligned. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Remove unessessary 0x3000 location enforcement	Michael Neuling	2012-11-15	1	-1/+3
\| \| \| \| \| \| \|	This removes the large gap between 0x1800 and 0x3000. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Whitespace changes in exception64s.S	Michael Neuling	2012-11-15	1	-15/+15
\| \| \| \| \| \| \|	Remove redundancy spaces and make tab usage consistent. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Fix denorm symbol name	Michael Neuling	2012-11-15	1	-1/+1
\| \| \| \| \| \| \|	Fix global symbol name to match actual denorm_exception_hv label. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/mm: Add 64TB support	Aneesh Kumar K.V	2012-09-17	1	-1/+3
\| \| \| \| \| \| \| \| \|	Increase max addressable range to 64TB. This is not tested on real hardware yet. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Add denormalisation exception handling for POWER6/7	Michael Neuling	2012-09-17	1	-0/+123
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On POWER6 and POWER7 if the input operand to an instruction is a denormalised single precision binary floating point value we can take a denormalisation exception where it's expected that the hypervisor (HV=1) will fix up the inputs before the instruction is run. This adds code to handle this denormalisation exception for POWER6 and POWER7. It also add a CONFIG_PPC_DENORMALISATION option and sets it in pseries/ppc64_defconfig. This is useful on bare metal systems only. Based on patch from Milton Miller. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Give hypervisor decrementer interrupts their own handler	Paul Mackerras	2012-09-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At the moment the handler for hypervisor decrementer interrupts is the same as for decrementer interrupts, i.e. timer_interrupt(). This is bogus; if we ever do get a hypervisor decrementer interrupt it won't have anything to do with the next timer event. In fact the only time we get hypervisor decrementer interrupts is when one is left pending on exit from a KVM guest. When we get a hypervisor decrementer interrupt we don't need to do anything special to clear it, since they are edge-triggered on the transition of HDEC from 0 to -1. Thus this adds an empty handler function for them. We don't need to have them masked when interrupts are soft-disabled, so we use STD_EXCEPTION_HV instead of MASKABLE_EXCEPTION_HV. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Add a symbol for hypervisor trampolines	Michael Ellerman	2012-07-11	1	-0/+1
\| \| \| \| \| \| \| \|	Purely for cosmetic purposes, otherwise it can appear that we are in single_step_pSeries() which is slightly confusing. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Use CURRENT_THREAD_INFO instead of open coded assembly	Stuart Yoder	2012-07-11	1	-1/+1
\| \| \| \| \|	Signed-off-by: Stuart Yoder <stuart.yoder@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	2012-05-24	1	-5/+7
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull KVM changes from Avi Kivity: "Changes include additional instruction emulation, page-crossing MMIO, faster dirty logging, preventing the watchdog from killing a stopped guest, module autoload, a new MSI ABI, and some minor optimizations and fixes. Outside x86 we have a small s390 and a very large ppc update. Regarding the new (for kvm) rebaseless workflow, some of the patches that were merged before we switch trees had to be rebased, while others are true pulls. In either case the signoffs should be correct now." Fix up trivial conflicts in Documentation/feature-removal-schedule.txt arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h. I suspect the kvm_para.h resolution ends up doing the "do I have cpuid" check effectively twice (it was done differently in two different commits), but better safe than sorry ;) * 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits) KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block KVM: s390: onereg for timer related registers KVM: s390: epoch difference and TOD programmable field KVM: s390: KVM_GET/SET_ONEREG for s390 KVM: s390: add capability indicating COW support KVM: Fix mmu_reload() clash with nested vmx event injection KVM: MMU: Don't use RCU for lockless shadow walking KVM: VMX: Optimize %ds, %es reload KVM: VMX: Fix %ds/%es clobber KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte() KVM: VMX: unlike vmcs on fail path KVM: PPC: Emulator: clean up SPR reads and writes KVM: PPC: Emulator: clean up instruction parsing kvm/powerpc: Add new ioctl to retreive server MMU infos kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler KVM: PPC: Book3S: Enable IRQs during exit handling KVM: PPC: Fix PR KVM on POWER7 bare metal KVM: PPC: Fix stbux emulation KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields ...
\| *	KVM: PPC: Book3S HV: Make secondary threads more robust against stray IPIs	Paul Mackerras	2012-04-08	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently on POWER7, if we are running the guest on a core and we don't need all the hardware threads, we do nothing to ensure that the unused threads aren't executing in the kernel (other than checking that they are offline). We just assume they're napping and we don't do anything to stop them trying to enter the kernel while the guest is running. This means that a stray IPI can wake up the hardware thread and it will then try to enter the kernel, but since the core is in guest context, it will execute code from the guest in hypervisor mode once it turns the MMU on, which tends to lead to crashes or hangs in the host. This fixes the problem by adding two new one-byte flags in the kvmppc_host_state structure in the PACA which are used to interlock between the primary thread and the unused secondary threads when entering the guest. With these flags, the primary thread can ensure that the unused secondaries are not already in kernel mode (i.e. handling a stray IPI) and then indicate that they should not try to enter the kernel if they do get woken for any reason. Instead they will go into KVM code, find that there is no vcpu to run, acknowledge and clear the IPI and go back to nap mode. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* \|	Merge branch 'merge' into next	Benjamin Herrenschmidt	2012-05-09	1	-1/+1
\|\ \
\| * \|	powerpc/irq: Make alignment & program interrupt behave the same	Benjamin Herrenschmidt	2012-05-09	1	-1/+1
\| \|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Alignment was the last user of the ENABLE_INTS macro, which we can now remove. All non-syscall exceptions now disable interrupts on entry, they get re-enabled conditionally from C code. Don't unconditionally re-enable in program check either, check the original context. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* \|	powerpc: Remove CONFIG_POWER4_ONLY	Anton Blanchard	2012-04-30	1	-4/+0
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove CONFIG_POWER4_ONLY, the option is badly named and only does two things: - It wraps the MMU segment table code. With feature fixups there is little downside to compiling this in. - It uses the newer mtocrf instruction in various assembly functions. Instead of making this a compile option just do it at runtime via a feature fixup. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	Merge branch 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	2012-03-28	1	-4/+4
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull kvm updates from Avi Kivity: "Changes include timekeeping improvements, support for assigning host PCI devices that share interrupt lines, s390 user-controlled guests, a large ppc update, and random fixes." This is with the sign-off's fixed, hopefully next merge window we won't have rebased commits. * 'kvm-updates/3.4' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits) KVM: Convert intx_mask_lock to spin lock KVM: x86: fix kvm_write_tsc() TSC matching thinko x86: kvmclock: abstract save/restore sched_clock_state KVM: nVMX: Fix erroneous exception bitmap check KVM: Ignore the writes to MSR_K7_HWCR(3) KVM: MMU: make use of ->root_level in reset_rsvds_bits_mask KVM: PMU: add proper support for fixed counter 2 KVM: PMU: Fix raw event check KVM: PMU: warn when pin control is set in eventsel msr KVM: VMX: Fix delayed load of shared MSRs KVM: use correct tlbs dirty type in cmpxchg KVM: Allow host IRQ sharing for assigned PCI 2.3 devices KVM: Ensure all vcpus are consistent with in-kernel irqchip settings KVM: x86 emulator: Allow PM/VM86 switch during task switch KVM: SVM: Fix CPL updates KVM: x86 emulator: VM86 segments must have DPL 3 KVM: x86 emulator: Fix task switch privilege checks arch/powerpc/kvm/book3s_hv.c: included linux/sched.h twice KVM: x86 emulator: correctly mask pmc index bits in RDPMC instruction emulation KVM: mmu_notifier: Flush TLBs before releasing mmu_lock ...
\| *	KVM: PPC: Implement MMIO emulation support for Book3S HV guests	Paul Mackerras	2012-03-05	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This provides the low-level support for MMIO emulation in Book3S HV guests. When the guest tries to map a page which is not covered by any memslot, that page is taken to be an MMIO emulation page. Instead of inserting a valid HPTE, we insert an HPTE that has the valid bit clear but another hypervisor software-use bit set, which we call HPTE_V_ABSENT, to indicate that this is an absent page. An absent page is treated much like a valid page as far as guest hcalls (H_ENTER, H_REMOVE, H_READ etc.) are concerned, except of course that an absent HPTE doesn't need to be invalidated with tlbie since it was never valid as far as the hardware is concerned. When the guest accesses a page for which there is an absent HPTE, it will take a hypervisor data storage interrupt (HDSI) since we now set the VPM1 bit in the LPCR. Our HDSI handler for HPTE-not-present faults looks up the hash table and if it finds an absent HPTE mapping the requested virtual address, will switch to kernel mode and handle the fault in kvmppc_book3s_hv_page_fault(), which at present just calls kvmppc_hv_emulate_mmio() to set up the MMIO emulation. This is based on an earlier patch by Benjamin Herrenschmidt, but since heavily reworked. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Avi Kivity <avi@redhat.com>
* \|	powerpc: Rework lazy-interrupt handling	Benjamin Herrenschmidt	2012-03-09	1	-92/+58
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current implementation of lazy interrupts handling has some issues that this tries to address. We don't do the various workarounds we need to do when re-enabling interrupts in some cases such as when returning from an interrupt and thus we may still lose or get delayed decrementer or doorbell interrupts. The current scheme also makes it much harder to handle the external "edge" interrupts provided by some BookE processors when using the EPR facility (External Proxy) and the Freescale Hypervisor. Additionally, we tend to keep interrupts hard disabled in a number of cases, such as decrementer interrupts, external interrupts, or when a masked decrementer interrupt is pending. This is sub-optimal. This is an attempt at fixing it all in one go by reworking the way we do the lazy interrupt disabling from the ground up. The base idea is to replace the "hard_enabled" field with a "irq_happened" field in which we store a bit mask of what interrupt occurred while soft-disabled. When re-enabling, either via arch_local_irq_restore() or when returning from an interrupt, we can now decide what to do by testing bits in that field. We then implement replaying of the missed interrupts either by re-using the existing exception frame (in exception exit case) or via the creation of a new one from an assembly trampoline (in the arch_local_irq_enable case). This removes the need to play with the decrementer to try to create fake interrupts, among others. In addition, this adds a few refinements: - We no longer hard disable decrementer interrupts that occur while soft-disabled. We now simply bump the decrementer back to max (on BookS) or leave it stopped (on BookE) and continue with hard interrupts enabled, which means that we'll potentially get better sample quality from performance monitor interrupts. - Timer, decrementer and doorbell interrupts now hard-enable shortly after removing the source of the interrupt, which means they no longer run entirely hard disabled. Again, this will improve perf sample quality. - On Book3E 64-bit, we now make the performance monitor interrupt act as an NMI like Book3S (the necessary C code for that to work appear to already be present in the FSL perf code, notably calling nmi_enter instead of irq_enter). (This also fixes a bug where BookE perfmon interrupts could clobber r14 ... oops) - We could make "masked" decrementer interrupts act as NMIs when doing timer-based perf sampling to improve the sample quality. Signed-off-by-yet: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: - Add hard-enable to decrementer, timer and doorbells - Fix CR clobber in masked irq handling on BookE - Make embedded perf interrupt act as an NMI - Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want to retrigger an interrupt without preventing hard-enable v3: - Fix or vs. ori bug on Book3E - Fix enabling of interrupts for some exceptions on Book3E v4: - Fix resend of doorbells on return from interrupt on Book3E v5: - Rebased on top of my latest series, which involves some significant rework of some aspects of the patch. v6: - 32-bit compile fix - more compile fixes with various .config combos - factor out the asm code to soft-disable interrupts - remove the C wrapper around preempt_schedule_irq v7: - Fix a bug with hard irq state tracking on native power7
* \|	powerpc: Replace mfmsr instructions with load from PACA kernel_msr field	Benjamin Herrenschmidt	2012-03-09	1	-3/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On 64-bit, the mfmsr instruction can be quite slow, slower than loading a field from the cache-hot PACA, which happens to already contain the value we want in most cases. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* \|	powerpc: Disable interrupts in 64-bit kernel FP and vector faults	Benjamin Herrenschmidt	2012-03-09	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we get a floating point, altivec or vsx unavaible interrupt in kernel, we trigger a kernel error. There is no point preserving the interrupt state, in fact, that can even make debugging harder as the processor state might change (we may even preempt) between taking the exception and landing in a debugger. So just make those 3 disable interrupts unconditionally. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2: On BookE only disable when hitting the kernel unavailable path, otherwise it will fail to restore softe as fast_exception_return doesn't do it.
* \|	powerpc: Call do_page_fault() with interrupts off	Benjamin Herrenschmidt	2012-03-09	1	-42/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We currently turn interrupts back to their previous state before calling do_page_fault(). This can be annoying when debugging as a bad fault will potentially have lost some processor state before getting into the debugger. We also end up calling some generic code with interrupts enabled such as notify_page_fault() with interrupts enabled, which could be unexpected. This changes our code to behave more like other architectures, and make the assembly entry code call into do_page_faults() with interrupts disabled. They are conditionally re-enabled from within do_page_fault() in the same spot x86 does it. While there, add the might_sleep() test in the case of a successful trylock of the mmap semaphore, again like x86. Also fix a bug in the existing assembly where r12 (_MSR) could get clobbered by C calls (the DTL accounting in the exception common macro and DISABLE_INTS) in some cases. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- v2. Add the r12 clobber fix