path: root/arch/powerpc/mm
Commit log for arch/powerpc/mm (newest first). Each entry shows the commit message, author, date, number of files changed, and line diffstat (-removed/+added).
* powerpc/64e: Add __ref to early_alloc_pgtable() (Scott Wood, 2014-08-05, 1 file, -1/+1)

  This silences a section mismatch warning. early_alloc_pgtable() is called
  from map_kernel_page(), which cannot be __init, but only when
  slab_is_available() returns false, which can only happen during early boot.

  Signed-off-by: Scott Wood <scottwood@freescale.com>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
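  A minimal illustration of the annotation (the allocator calls below are
  placeholders, not the exact kernel source): __ref tells modpost that any
  reference from this function into early-boot code is intentional, because
  that path is only reachable while slab_is_available() is still false.

      /* Illustrative sketch only -- allocator calls are stand-ins.
       * __ref suppresses the section-mismatch warning for the early
       * path, which is only reachable before slab is up. */
      static void * __ref early_alloc_pgtable(unsigned long size)
      {
              if (slab_is_available())
                      return kzalloc(size, GFP_KERNEL);  /* normal runtime path */
              return alloc_bootmem_pages(size);          /* early-boot-only path */
      }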
* powerpc/mm/numa: Fix break placement (Andrey Utkin, 2014-08-05, 1 file, -1/+1)

  Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81631

  Reported-by: David Binderman <dcb314@hotmail.com>
  Signed-off-by: Andrey Utkin <andrey.krieger.utkin@gmail.com>
  CC: <stable@vger.kernel.org>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Merge remote-tracking branch 'scott/next' into next (Benjamin Herrenschmidt, 2014-08-05, 1 file, -12/+56)

  Scott writes: Highlights include e6500 hardware threading support, an e6500
  TLB erratum workaround, corenet error reporting, support for a new board,
  and some minor fixes.
  * powerpc/e6500: Work around erratum A-008139 (Scott Wood, 2014-07-29, 1 file, -12/+56)

    Erratum A-008139 can cause duplicate TLB entries if an indirect entry is
    overwritten using tlbwe while the other thread is using it to do a lookup.
    Work around this by using tlbilx to invalidate prior to overwriting.

    To avoid the need to save another register to hold MAS1 during the
    workaround code, TID clearing has been moved from tlb_miss_kernel_e6500
    until after the SMT section.

    Signed-off-by: Scott Wood <scottwood@freescale.com>
* powerpc: Remove power3 from comments (Michael Ellerman, 2014-07-28, 2 files, -2/+2)

  There are still a few occurrences where it remains, because it helps to
  explain something that persists.

  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Remove CONFIG_POWER3 (Michael Ellerman, 2014-07-28, 1 file, -1/+1)

  Now that we have dropped power3 support we can remove CONFIG_POWER3. The
  usage in pgtable_32.c was already dead code, as CONFIG_POWER3 was not
  selectable on PPC32.

  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Remove MMU_FTR_SLB (Michael Ellerman, 2014-07-28, 1 file, -4/+2)

  We now only support cpus that use an SLB, so we don't need an MMU feature
  to indicate that.

  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Remove STAB code (Michael Ellerman, 2014-07-28, 3 files, -303/+5)

  Old cpus didn't have a Segment Lookaside Buffer (SLB); instead they had a
  Segment Table (STAB). Now that we've dropped support for those cpus, we can
  remove the STAB support entirely.

  Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Merge branch 'merge' into next (Benjamin Herrenschmidt, 2014-07-11, 1 file, -11/+1)
  * powerpc: Clean up MMU_FTRS_A2 and MMU_FTR_TYPE_3E (Michael Ellerman, 2014-07-11, 1 file, -11/+1)

    In fb5a515704d7 "powerpc: Remove platforms/wsp and associated pieces", we
    removed the last user of MMU_FTRS_A2. So remove it.

    MMU_FTRS_A2 was the last user of MMU_FTR_TYPE_3E, so remove it also. This
    leaves some unreachable code in mmu_context_nohash.c, so remove that also.

    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/booke64: wrap tlb lock and search in htw miss with FTR_SMT (Laurentiu Tudor, 2014-06-20, 1 file, -0/+4)

  Virtualized environments may expose an e6500 dual-threaded core as two
  single-threaded e6500 cores. Take advantage of this and get rid of the tlb
  lock and the trap-causing tlbsx in the htw miss handler by guarding with
  CPU_FTR_SMT, as is already done in the bolted tlb1 miss handler.

  As seen in the results below, measured with the lmbench random memory
  access latency test running under Freescale's Embedded Hypervisor, there
  is a ~34% improvement:

  Memory latencies in nanoseconds - smaller is better
  (WARNING - may not be correct, check graphs)
  ----------------------------------------------------
  Host      Mhz   L1 $    L2 $   Main mem   Rand mem
  --------- ----  ------  -----  ---------  ---------
  smt       1665  1.8020  13.2   83.0       1149.7
  nosmt     1665  1.8020  13.2   83.0       758.1

  Signed-off-by: Laurentiu Tudor <Laurentiu.Tudor@freescale.com>
  Cc: Scott Wood <scottwood@freescale.com>
  [scottwood@freescale.com: commit message tweak]
  Signed-off-by: Scott Wood <scottwood@freescale.com>
* powerpc/e6500: hw tablewalk: fix recursive tlb lock on cpu 0 (Scott Wood, 2014-06-20, 1 file, -1/+2)

  Commit 82d86de25b9c99db546e17c6f7ebf9a691da557e "TLB lock recursive"
  introduced a bug whereby cpu 0 uses the same value for "lock held" as is
  used to indicate that the lock is free. This means that cpu 1 can acquire
  the lock whenever it wants, regardless of whether cpu 0 has it locked,
  which in turn means we can get duplicate TLB entries.

  Add one to the CPU value to ensure we do not use zero as a "lock held"
  value.

  Signed-off-by: Scott Wood <scottwood@freescale.com>
  Reported-by: Ed Swarthout <ed.swarthout@freescale.com>
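  The shape of the fix, written as plain C rather than the actual assembly in
  the TLB miss handler (the function name and locking primitive here are
  illustrative): the token stored in the lock must never equal the "free"
  value, so it becomes cpu + 1.

      #define TLB_LOCK_FREE 0

      /* Illustrative sketch: the real lock lives in per-core data and is
       * taken in asm.  Using cpu + 1 keeps cpu 0 from looking like an
       * unlocked lock, while still letting the same cpu (e.g. a nested
       * miss) pass straight through. */
      static inline void book3e_tlb_lock(int *lock, int cpu)
      {
              int token = cpu + 1;    /* never collides with TLB_LOCK_FREE */

              /* Already held by this cpu: recursive acquisition is allowed. */
              if (__atomic_load_n(lock, __ATOMIC_ACQUIRE) == token)
                      return;

              /* Otherwise spin until we swap FREE -> token. */
              while (!__sync_bool_compare_and_swap(lock, TLB_LOCK_FREE, token))
                      ;
      }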
* powerpc/e6500: hw tablewalk: clear TID in kernel indirect entries (Scott Wood, 2014-06-20, 1 file, -7/+5)

  Previously TID was being cleared before the tlbsx, but not after. This can
  lead to a multiway hit between a TLB entry with TID=0 (previously inserted
  when PID=0) and a TLB entry with TID!=0 that matches PID.

  This can theoretically result in undefined behavior, though we probably get
  lucky due to the details of the overlap. It also results in the inability
  to use multihit detection to detect other conflicting TLB entries, as well
  as poorer TLB utilization due to duplicating kernel TLB entries.

  Rather than try to patch up MAS1 after tlbsx, the entire value is
  saved/restored as with MAS2. I observed a slight improvement in TLB miss
  performance with this patch applied.

  Signed-off-by: Scott Wood <scottwood@freescale.com>
  Reported-by: Ed Swarthout <ed.swarthout@freescale.com>
* Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc (Linus Torvalds, 2014-06-10, 5 files, -62/+107)

  Pull powerpc updates from Ben Herrenschmidt:
  "Here is the bulk of the powerpc changes for this merge window. It got a
   bit delayed in part because I wasn't paying attention, and in part
   because I discovered I had a core PCI change without a PCI maintainer ack
   in it. Bjorn eventually agreed it was ok to merge it though we'll
   probably improve it later and I didn't want to rebase to add his ack.

   There is going to be a bit more next week, essentially fixes that I still
   want to sort through and test.

   The biggest item this time is the support to build the ppc64 LE kernel
   with our new v2 ABI. We previously supported v2 userspace but the kernel
   itself was a tougher nut to crack. This is now sorted, mostly thanks to
   Anton and Rusty.

   We also have a fairly big series from Cedric that adds support for a
   64-bit LE zImage boot wrapper. This was made harder by the fact that
   traditionally our zImage wrapper was always 32-bit, but our new LE
   toolchains don't really support 32-bit anymore (it's somewhat there but
   not really "supported") so we didn't want to rely on it. This meant more
   churn than just endian fixes.

   This brings some more LE bits as well, such as the ability to run in LE
   mode without a hypervisor (ie. under OPAL firmware) by doing the right
   OPAL call to reinitialize the CPU to take HV interrupts in the right
   mode, and the usual pile of endian fixes.

   There's another series from Gavin adding EEH improvements (one day we
   *will* have a release with less than 20 EEH patches, I promise!).

   Another highlight is the support for the "Split core" functionality on P8
   by Michael. This allows a P8 core to be split into "sub cores" of 4
   threads, which allows the subcores to run different guests under KVM (the
   HW still doesn't support a partition per thread).

   And then the usual misc bits and fixes ..."

  [ Further delayed by gmail deciding that BenH is a dirty spammer. Google knows. ]

  * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (155 commits)
    powerpc/powernv: Add missing include to LPC code
    selftests/powerpc: Test the THP bug we fixed in the previous commit
    powerpc/mm: Check paca psize is up to date for huge mappings
    powerpc/powernv: Pass buffer size to OPAL validate flash call
    powerpc/pseries: hcall functions are exported to modules, need _GLOBAL_TOC()
    powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC()
    powerpc/powernv: Set memory_block_size_bytes to 256MB
    powerpc: Allow ppc_md platform hook to override memory_block_size_bytes
    powerpc/powernv: Fix endian issues in memory error handling code
    powerpc/eeh: Skip eeh sysfs when eeh is disabled
    powerpc: 64bit sendfile is capped at 2GB
    powerpc/powernv: Provide debugfs access to the LPC bus via OPAL
    powerpc/serial: Use saner flags when creating legacy ports
    powerpc: Add cpu family documentation
    powerpc/xmon: Fix up xmon format strings
    powerpc/powernv: Add calls to support little endian host
    powerpc: Document sysfs DSCR interface
    powerpc: Fix regression of per-CPU DSCR setting
    powerpc: Split __SYSFS_SPRSETUP macro
    arch: powerpc/fadump: Cleaning up inconsistent NULL checks
    ...
  * powerpc/mm: Check paca psize is up to date for huge mappings (Michael Ellerman, 2014-06-06, 1 file, -11/+20)

    We have a bug in our hugepage handling which exhibits as an infinite loop
    of hash faults. If the fault is being taken in the kernel it will
    typically trigger the softlockup detector, or the RCU stall detector.

    The bug is as follows:

    1. mmap(0xa0000000, ..., MAP_FIXED | MAP_HUGE_TLB | MAP_ANONYMOUS ..)
    2. Slice code converts the slice psize to 16M.
    3. The code on lines 539-540 of slice.c in slice_get_unmapped_area()
       synchronises the mm->context with the paca->context. So the paca slice
       mask is updated to include the 16M slice.
    3. Either:
       * mmap() fails because there are no huge pages available.
       * mmap() succeeds and the mapping is then munmapped.
       In both cases the slice psize remains at 16M in both the paca & mm.
    4. mmap(0xa0000000, ..., MAP_FIXED | MAP_ANONYMOUS ..)
    5. The slice psize is converted back to 64K. Because of the check on line
       539 of slice.c we DO NOT update the paca->context. The paca slice mask
       is now out of sync with the mm slice mask.
    6. User/kernel accesses 0xa0000000.
    7. The SLB miss handler slb_allocate_realmode() **uses the paca slice
       mask** to create an SLB entry and inserts it in the SLB.
    8. With the 16M SLB entry in place the hardware does a hash lookup, no
       entry is found so a data access exception is generated.
    9. The data access handler calls do_page_fault() -> handle_mm_fault().
    10. __handle_mm_fault() creates a THP mapping with
        do_huge_pmd_anonymous_page().
    11. The hardware retries the access, there is still nothing in the hash
        table so once again a data access exception is generated.
    12. hash_page() calls into __hash_page_thp() and inserts a mapping in the
        hash. Although the THP mapping maps 16M, the hashing is done using
        64K as the segment page size.
    13. hash_page() returns immediately after calling __hash_page_thp(),
        skipping over the code at line 1125, so the mismatch between the
        paca->context and mm->context is not detected.
    14. The hardware retries the access, the hash it generates using the 16M
        SLB entry does NOT match the hash we inserted.
    15. We take another data access and go into __hash_page_thp().
    16. We see a valid entry in the hpte_slot_array and so we call
        updatepp() which succeeds.
    17. Goto 14.

    We could fix this in two ways. The first would be to remove or modify the
    check on line 539 of slice.c. The second option is to cause the check of
    paca psize in hash_page() on line 1125 to also be done for THP pages.

    We prefer the latter, because the check & update of the paca psize is not
    done until we know it's necessary. It's also done only on the current
    cpu, so we don't need to IPI all other cpus.

    Without further rearranging the code, the simplest fix is to pull out the
    code that checks paca psize and call it in two places. Firstly for
    THP/hugetlb, and secondly for other mappings as before.

    Thanks to Dave Jones for trinity, which originally found this bug.

    Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    CC: stable@vger.kernel.org [v3.11+]
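    A hedged sketch of the shape of the fix: the paca-psize check is factored
    into a helper that hash_page() can call both on the THP/hugetlb path and
    on the ordinary path. Names mirror the description above but should be
    read as illustrative rather than the exact upstream diff.

        /* Sketch: compare the segment page size recorded in the paca with
         * the mm's slice psize for this EA and resynchronise if needed. */
        static void check_paca_psize(unsigned long ea, struct mm_struct *mm,
                                     int psize, int user_region)
        {
                if (user_region) {
                        if (psize != get_paca_psize(ea)) {
                                get_paca()->context = mm->context;
                                slb_flush_and_rebolt();
                        }
                } else if (get_paca()->vmalloc_sllp !=
                           mmu_psize_defs[mmu_vmalloc_psize].sllp) {
                        get_paca()->vmalloc_sllp =
                                mmu_psize_defs[mmu_vmalloc_psize].sllp;
                        slb_vmalloc_update();
                }
        }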
  * powerpc/fsl-booke64: Set vmemmap_psize to 4K (Scott Wood, 2014-05-22, 1 file, -1/+6)

    The only way Freescale booke chips support mappings larger than 4K is via
    TLB1. The only way we support (direct) TLB1 entries is via hugetlb, which
    is not what map_kernel_page() does when given a large page size.

    Without this, a kernel with CONFIG_SPARSEMEM_VMEMMAP enabled crashes on
    boot with messages such as:

    PID hash table entries: 4096 (order: 3, 32768 bytes)
    Sorting __ex_table...
    BUG: Bad page state in process swapper  pfn:00a2f
    page:8000040000023a48 count:0 mapcount:0 mapping:0000040000ffce48 index:0x40000ffbe50
    page flags: 0x40000ffda40(active|arch_1|private|private_2|head|tail|swapcache|mappedtodisk|reclaim|swapbacked|unevictable|mlocked)
    page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
    bad because of flags:
    page flags: 0x311840(active|private|private_2|swapcache|unevictable|mlocked)
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted 3.15.0-rc1-00003-g7fa250c #299
    Call Trace:
    [c00000000098ba20] [c000000000008b3c] .show_stack+0x7c/0x1cc (unreliable)
    [c00000000098baf0] [c00000000060aa50] .dump_stack+0x88/0xb4
    [c00000000098bb70] [c0000000000c0468] .bad_page+0x144/0x1a0
    [c00000000098bc10] [c0000000000c0628] .free_pages_prepare+0x164/0x17c
    [c00000000098bcc0] [c0000000000c24cc] .free_hot_cold_page+0x48/0x214
    [c00000000098bd60] [c00000000086c318] .free_all_bootmem+0x1fc/0x354
    [c00000000098be70] [c00000000085da84] .mem_init+0xac/0xdc
    [c00000000098bef0] [c0000000008547b0] .start_kernel+0x21c/0x4d4
    [c00000000098bf90] [c000000000000448] .start_here_common+0x20/0x58

    Signed-off-by: Scott Wood <scottwood@freescale.com>
  * Merge remote-tracking branch 'anton/abiv2' into next (Benjamin Herrenschmidt, 2014-05-05, 4 files, -46/+58)

    This series adds support for building the powerpc 64-bit LE kernel using
    the new ABI v2. We already supported running ABI v2 userspace programs,
    but this adds support for building the kernel itself using the new ABI.
    * powerpc: Fix branch patching code for ABIv2 (Anton Blanchard, 2014-04-23, 4 files, -33/+47)

      The MMU hashtable and SLB branch patching code uses function pointers
      for the update sites. This creates a difference between ABIv1 and ABIv2
      because we don't have function descriptors on ABIv2.

      Get rid of the function pointer and just point at the update sites
      directly. This works on both ABIs.

      Signed-off-by: Anton Blanchard <anton@samba.org>
    * powerpc: Use ppc_function_entry instead of open coding it (Anton Blanchard, 2014-04-23, 1 file, -10/+8)

      Replace FUNCTION_TEXT with ppc_function_entry, which can handle both
      ABIv1 and ABIv2.

      Signed-off-by: Anton Blanchard <anton@samba.org>
    * powerpc: No need to use dot symbols when branching to a function (Anton Blanchard, 2014-04-23, 1 file, -4/+4)

      binutils is smart enough to know that a branch to a function descriptor
      is actually a branch to the function's text address. Alan tells me that
      binutils has been doing this for 9 years.

      Signed-off-by: Anton Blanchard <anton@samba.org>
  * powerpc/mm: use macro PGTABLE_EADDR_SIZE instead of a numeric literal (Liu Ping Fan, 2014-05-01, 1 file, -1/+1)

    If the eaddr range is extended in future, using the PGTABLE_EADDR_SIZE
    macro eases maintenance of the code.

    Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  * powerpc: Use 64k io pages when we never see an HEA (Alexander Graf, 2014-05-01, 1 file, -3/+22)

    When we never get around to seeing an HEA ethernet adapter, there's no
    point in restricting ourselves to a 4k IO page size. This speeds up IO
    maps when CONFIG_IBMEBUS is disabled.

    [ Updated the test to also lift the restriction on arch 2.07 (Power 8)
      which cannot have an HEA -- BenH ]

    Signed-off-by: Alexander Graf <agraf@suse.de>
    Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* hugetlb: restrict hugepage_migration_support() to x86_64 (Naoya Horiguchi, 2014-06-04, 1 file, -10/+0)

  Currently hugepage migration is available for all archs which support
  pmd-level hugepage, but testing is done only for x86_64 and there're bugs
  for other archs. So to avoid breaking such archs, this patch limits the
  availability strictly to x86_64 until developers of other archs get
  interested in enabling this feature.

  Simply disabling hugepage migration on non-x86_64 archs is not enough to
  fix the reported problem where sys_move_pages() hits the BUG_ON() in
  follow_page(FOLL_GET), so let's fix this by checking if hugepage migration
  is supported in vma_migratable().

  Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
  Reported-by: Michael Ellerman <mpe@ellerman.id.au>
  Tested-by: Michael Ellerman <mpe@ellerman.id.au>
  Acked-by: Hugh Dickins <hughd@google.com>
  Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Cc: Tony Luck <tony.luck@intel.com>
  Cc: Russell King <rmk@arm.linux.org.uk>
  Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
  Cc: James Hogan <james.hogan@imgtec.com>
  Cc: Ralf Baechle <ralf@linux-mips.org>
  Cc: David Miller <davem@davemloft.net>
  Cc: <stable@vger.kernel.org> [3.12+]
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
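  The vma_migratable() part of the fix, sketched from the description above
  (simplified; treat the exact flag list as illustrative): hugetlb VMAs are
  only considered migratable when the architecture actually supports
  hugepage migration, so move_pages() never reaches follow_page(FOLL_GET)
  on them.

      /* Sketch of the check added to vma_migratable(). */
      static inline int vma_migratable(struct vm_area_struct *vma)
      {
              if (vma->vm_flags & (VM_IO | VM_PFNMAP))
                      return 0;
              if (is_vm_hugetlb_page(vma) &&
                  !hugepage_migration_support(hstate_vma(vma)))
                      return 0;       /* arch can't migrate huge pages */
              return 1;
      }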
* Merge tag 'devicetree-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux into next (Linus Torvalds, 2014-06-04, 1 file, -11/+11)

  Pull DeviceTree updates from Rob Herring:

  - Another round of clean-up of FDT related code in architecture code.
    This removes knowledge of internal FDT details from most architectures
    except powerpc.
  - Conversion of kernel's custom FDT parsing code to use libfdt.
  - DT based initialization for generic serial earlycon. The introduction
    of generic serial earlycon support went in through the tty tree.
  - Improve the platform device naming for DT probed devices to ensure
    unique naming and use parent names instead of a global index.
  - Fix a race condition in of_update_property.
  - Unify the various linker section OF match tables and fix several
    function prototype errors.
  - Update platform_get_irq_byname to work in deferred probe cases.
  - 2 binding doc updates

  * tag 'devicetree-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (58 commits)
    of: handle NULL node in next_child iterators
    of/irq: provide more wrappers for !CONFIG_OF
    devicetree: bindings: Document micrel vendor prefix
    dt: bindings: dwc2: fix required value for the phy-names property
    of_pci_irq: kill useless variable in of_irq_parse_pci()
    of/irq: do irq resolution in platform_get_irq_byname()
    of: Add a testcase for of_find_node_by_path()
    of: Make of_find_node_by_path() handle /aliases
    of: Create unlocked version of for_each_child_of_node()
    lib: add glibc style strchrnul() variant
    of: Handle memory@0 node on PPC32 only
    pci/of: Remove dead code
    of: fix race between search and remove in of_update_property()
    of: Use NULL for pointers
    of: Stop naming platform_device using dcr address
    of: Ensure unique names without sacrificing determinism
    tty/serial: pl011: add DT based earlycon support
    of/fdt: add FDT serial scanning for earlycon
    of/fdt: add FDT address translation support
    serial: earlycon: add DT support
    ...
  * Merge branch 'dt-bus-name' into for-next (Rob Herring, 2014-05-13, 1 file, -22/+16)
  * of/fdt: update of_get_flat_dt_prop in prep for libfdt (Rob Herring, 2014-04-30, 1 file, -11/+11)

    Make of_get_flat_dt_prop arguments compatible with the libfdt fdt_getprop
    call in preparation to convert FDT code to use libfdt. Make the return
    value const and the property length ptr type an int.

    Signed-off-by: Rob Herring <robh@kernel.org>
    Tested-by: Michal Simek <michal.simek@xilinx.com>
    Tested-by: Grant Likely <grant.likely@linaro.org>
    Tested-by: Stephen Chivers <schivers@csc.com>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm into next (Linus Torvalds, 2014-06-04, 1 file, -1/+1)

  Pull KVM updates from Paolo Bonzini:
  "At over 200 commits, covering almost all supported architectures, this
   was a pretty active cycle for KVM. Changes include:

   - a lot of s390 changes: optimizations, support for migration, GDB
     support and more
   - ARM changes are pretty small: support for the PSCI 0.2 hypercall
     interface on both the guest and the host (the latter acked by Catalin)
   - initial POWER8 and little-endian host support
   - support for running u-boot on embedded POWER targets
   - pretty large changes to MIPS too, completing the userspace interface
     and improving the handling of virtualized timer hardware
   - for x86, a larger set of changes is scheduled for 3.17. Still, we have
     a few emulator bugfixes and support for running nested
     fully-virtualized Xen guests (para-virtualized Xen guests have always
     worked). And some optimizations too.

   The only missing architecture here is ia64. It's not a coincidence that
   support for KVM on ia64 is scheduled for removal in 3.17"

  * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (203 commits)
    KVM: add missing cleanup_srcu_struct
    KVM: PPC: Book3S PR: Rework SLB switching code
    KVM: PPC: Book3S PR: Use SLB entry 0
    KVM: PPC: Book3S HV: Fix machine check delivery to guest
    KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
    KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
    KVM: PPC: Book3S HV: Fix dirty map for hugepages
    KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
    KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
    KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
    KVM: PPC: Book3S: Add ONE_REG register names that were missed
    KVM: PPC: Add CAP to indicate hcall fixes
    KVM: PPC: MPIC: Reset IRQ source private members
    KVM: PPC: Graciously fail broken LE hypercalls
    PPC: ePAPR: Fix hypercall on LE guest
    KVM: PPC: BOOK3S: Remove open coded make_dsisr in alignment handler
    KVM: PPC: BOOK3S: Always use the saved DAR value
    PPC: KVM: Make NX bit available with magic page
    KVM: PPC: Disable NX for old magic page using guests
    KVM: PPC: BOOK3S: HV: Add mixed page-size support for guest
    ...
  * KVM: PPC: Book3S PR: Rework SLB switching code (Alexander Graf, 2014-05-30, 1 file, -1/+1)

    On LPAR guest systems Linux enables the shadow SLB to indicate to the
    hypervisor a number of SLB entries that always have to be available.

    Today we go through this shadow SLB and disable all ESID's valid bits.
    However, pHyp doesn't like this approach very much and honors us with
    fancy machine checks.

    Fortunately the shadow SLB descriptor also has an entry that indicates
    the number of valid entries following. During the lifetime of a guest we
    can just swap that value to 0 and don't have to worry about the SLB
    restoration magic.

    While we're touching the code, let's also make it more readable (get rid
    of rldicl), allow it to deal with a dynamic number of bolted SLB entries
    and only do shadow SLB swizzling on LPAR systems.

    Signed-off-by: Alexander Graf <agraf@suse.de>
* Merge tag 'signed-for-3.15' of git://github.com/agraf/linux-2.6 into kvm-master (Paolo Bonzini, 2014-05-13, 1 file, -0/+4)

  Patch queue for 3.15 - 2014-05-12

  This request includes a few bug fixes that really shouldn't wait for the
  next release. It fixes KVM on 32bit PowerPC when built as a module. It
  also fixes the PV KVM acceleration when NX gets honored by the host.
  Furthermore we fix transactional memory support and NUMA support on HV
  KVM.
  * KVM guest: Make pv trampoline code executable (Alexander Graf, 2014-04-29, 1 file, -0/+4)

    Our PV guest patching code assembles chunks of instructions on the fly
    when it encounters more complicated instructions to hijack. These
    instructions need to live in a section that we don't mark as
    non-executable, as otherwise we fault when jumping there.

    Right now we put it into the .bss section where it automatically gets
    marked as non-executable. Add a check to the NX setting function to
    ensure that we leave these particular pages executable.

    Signed-off-by: Alexander Graf <agraf@suse.de>
* powerpc/mm: Fix tlbie to add AVAL fields for 64K pages (Aneesh Kumar K.V, 2014-04-28, 1 file, -22/+16)

  The if condition check was based on a draft ISA doc. Remove it.

  Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/mm: fix ".__node_distance" undefined (Mike Qiu, 2014-04-18, 1 file, -0/+1)

    CHK     include/config/kernel.release
    CHK     include/generated/uapi/linux/version.h
    CHK     include/generated/utsrelease.h
    ...
    Building modules, stage 2.
  WARNING: 1 bad relocations
  c0000000013d6a30 R_PPC64_ADDR64 uprobes_fetch_type_table
    WRAP    arch/powerpc/boot/zImage.pseries
    WRAP    arch/powerpc/boot/zImage.epapr
    MODPOST 1849 modules
  ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
  make[1]: *** [__modpost] Error 1
  make: *** [modules] Error 2
  make: *** Waiting for unfinished jobs....

  The reason is that the symbol "__node_distance" is not exported on powerpc.

  Signed-off-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
  Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  Cc: Paul Mackerras <paulus@samba.org>
  Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com>
  Cc: Stephen Rothwell <sfr@canb.auug.org.au>
  Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
  Cc: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
  Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
  Cc: Alistair Popple <alistair@popple.id.au>
  Cc: Mike Qiu <qiudayu@linux.vnet.ibm.com>
  Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
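  The fix amounts to exporting the symbol so that modules such as nvme.ko can
  link against it. A sketch of what that looks like (the function body here
  is illustrative; only the EXPORT_SYMBOL() line is the point):

      /* Sketch: making __node_distance visible to modules. */
      int __node_distance(int a, int b)
      {
              if (a == b)
                      return LOCAL_DISTANCE;
              return RECLAIM_DISTANCE;        /* placeholder body */
      }
      EXPORT_SYMBOL(__node_distance);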
* power, sched: stop updating inside arch_update_cpu_topology() when nothing needs updating (Michael Wang, 2014-04-09, 1 file, -0/+15)

  Since v1: edited the comment according to Srivatsa's suggestion.

  During testing, we encountered the WARN below, followed by an Oops:

  WARNING: at kernel/sched/core.c:6218
  ...
  NIP [c000000000101660] .build_sched_domains+0x11d0/0x1200
  LR [c000000000101358] .build_sched_domains+0xec8/0x1200
  PACATMSCRATCH [800000000000f032]
  Call Trace:
  [c00000001b103850] [c000000000101358] .build_sched_domains+0xec8/0x1200
  [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
  [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
  [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
  ...
  Oops: Kernel access of bad area, sig: 11 [#1]
  ...
  NIP [c00000000045c000] .__bitmap_weight+0x60/0xf0
  LR [c00000000010132c] .build_sched_domains+0xe9c/0x1200
  PACATMSCRATCH [8000000000029032]
  Call Trace:
  [c00000001b1037a0] [c000000000288ff4] .kmem_cache_alloc_node_trace+0x184/0x3a0
  [c00000001b103850] [c00000000010132c] .build_sched_domains+0xe9c/0x1200
  [c00000001b1039a0] [c00000000010aad4] .partition_sched_domains+0x484/0x510
  [c00000001b103aa0] [c00000000016d0a8] .rebuild_sched_domains+0x68/0xa0
  [c00000001b103b30] [c00000000005cbf0] .topology_work_fn+0x10/0x30
  ...

  This was caused by 'sd->groups == NULL' after building groups, which was in
  turn caused by the empty 'sd->span'.

  The cpu's domain contained nothing because the cpu was assigned to a wrong
  node, due to the following unfortunate sequence of events:

  1. The hypervisor sent a topology update to the guest OS, to notify changes
     to the cpu-node mapping. However, the update was actually redundant -
     i.e., the "new" mapping was exactly the same as the old one.
  2. Due to this, the 'updated_cpus' mask turned out to be empty after
     exiting the 'for-loop' in arch_update_cpu_topology().
  3. So we ended up calling stop-machine() with an empty cpumask list, which
     made stop-machine internally elect cpumask_first(cpu_online_mask), i.e.
     CPU0, as the cpu to run the payload (the update_cpu_topology() function).
  4. This causes update_cpu_topology() to be run by CPU0. And since 'updates'
     is kzalloc()'ed inside arch_update_cpu_topology(), update_cpu_topology()
     finds update->cpu as well as update->new_nid to be 0. In other words, we
     end up assigning CPU0 (and eventually its siblings) to node 0,
     incorrectly.

  Along with the subsequent wrong updating, this causes the sched-domain
  rebuild code to break and crash the system.

  Fix this by skipping the topology update in cases where we find that the
  topology has not actually changed in reality (i.e., spurious updates).

  CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  CC: Paul Mackerras <paulus@samba.org>
  CC: Nathan Fontenot <nfont@linux.vnet.ibm.com>
  CC: Stephen Rothwell <sfr@canb.auug.org.au>
  CC: Andrew Morton <akpm@linux-foundation.org>
  CC: Robert Jennings <rcj@linux.vnet.ibm.com>
  CC: Jesse Larrew <jlarrew@linux.vnet.ibm.com>
  CC: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
  CC: Alistair Popple <alistair@popple.id.au>
  Suggested-by: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
  Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
  Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
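  The essence of the fix, written as a simplified C excerpt (variable names
  follow the description above and are not the exact upstream diff): bail
  out before stop_machine() when the firmware update turned out to be
  spurious.

      /* Simplified sketch of arch_update_cpu_topology() after the fix:
       * if no cpu actually changed node, skip the stop_machine() call
       * entirely instead of letting it run the update on CPU0. */
      if (cpumask_empty(&updated_cpus)) {
              kfree(updates);
              return 0;       /* nothing changed, no sched-domain rebuild */
      }
      stop_machine(update_cpu_topology, &updates[0], &updated_cpus);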
* powerpc/mm: NUMA pte should be handled via slow path in get_user_pages_fast() (Aneesh Kumar K.V, 2014-04-09, 1 file, -0/+13)

  We need to handle NUMA ptes via the slow path.

  Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
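  A sketch of the kind of check this adds in the lockless gup path
  (pte_numa() was the helper of that era; the surrounding code is
  illustrative): a _PAGE_NUMA pte is refused in the fast path so the fault
  is taken through the normal slow path.

      /* Sketch: inside the fast gup pte loop, bail out on NUMA ptes just
       * like on non-present ones, forcing the slow path to handle them. */
      if (!pte_present(pte) || pte_special(pte) || pte_numa(pte))
              return 0;       /* caller falls back to get_user_pages() */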
* Merge remote-tracking branch 'scott/next' into next (Benjamin Herrenschmidt, 2014-03-24, 2 files, -23/+51)

  Freescale updates from Scott. Mostly support for critical and machine
  check exceptions on 64-bit BookE, some new PCI suspend/resume work and
  misc bits.
  * powerpc/booke64: Critical and machine check exception support (Scott Wood, 2014-03-19, 1 file, -0/+11)

    Add special state saving for critical and machine check exceptions. Most
    of this code could be used to handle debug exceptions taken from kernel
    space, but actually doing so is outside the scope of this patch.

    The various critical and machine check exceptions now point to their
    real handlers, rather than hanging the kernel.

    Signed-off-by: Scott Wood <scottwood@freescale.com>
  * powerpc/booke64: Use SPRG_TLB_EXFRAME on bolted handlers (Scott Wood, 2014-03-19, 1 file, -16/+28)

    While bolted handlers (including e6500) do not need to deal with a TLB
    miss recursively causing another TLB miss, nested TLB misses can still
    happen with crit/mc/debug exceptions -- so we still need to honor
    SPRG_TLB_EXFRAME.

    We don't need to spend time modifying it in the TLB miss fastpath,
    though -- the special level exception will handle that.

    Signed-off-by: Scott Wood <scottwood@freescale.com>
    Cc: Mihai Caraman <mihai.caraman@freescale.com>
    Cc: kvm-ppc@vger.kernel.org
  * powerpc/e6500: Make TLB lock recursive (Scott Wood, 2014-03-19, 1 file, -7/+12)

    Once special level interrupts are supported, we may take nested TLB
    misses -- so allow the same thread to acquire the lock recursively.

    The lock will not be effective against the nested TLB miss handler
    trying to write the same entry as the interrupted TLB miss handler, but
    that's also a problem on non-threaded CPUs that lack TLB write
    conditional. This will be addressed in the patch that enables crit/mc
    support, by invalidating the TLB on return from level exceptions.

    Signed-off-by: Scott Wood <scottwood@freescale.com>
* powerpc/mm: Make sure a local_irq_disable prevents a parallel THP split (Aneesh Kumar K.V, 2014-03-24, 1 file, -0/+5)

  We have generic code, like that in get_futex_key, that assumes a
  local_irq_disable prevents a parallel THP split. Support that by adding a
  dummy smp call function after setting _PAGE_SPLITTING.

  Code paths like get_user_pages_fast still need to check for
  _PAGE_SPLITTING after disabling IRQs, which indicates that a parallel THP
  split is ongoing. If they don't find _PAGE_SPLITTING set, then we can be
  sure that the parallel split will block in pmdp_splitting_flush until we
  enable IRQs again.

  Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
  Acked-by: Rik van Riel <riel@redhat.com>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
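  A sketch of the idea in pmdp_splitting_flush() (simplified;
  kick_all_cpus_sync() is the obvious primitive for a dummy IPI, but read
  this as illustrative rather than the exact upstream change):

      /* Sketch: after marking the pmd as splitting, send a no-op IPI to
       * all cpus.  Any get_futex_key()-style walker running with IRQs
       * disabled has either finished or will see _PAGE_SPLITTING when it
       * re-checks the pmd. */
      pmd_hugepage_update(vma->vm_mm, address, pmdp, 0, _PAGE_SPLITTING);
      kick_all_cpus_sync();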
* powerpc/book3e: Fix check for linear mapping in TLB miss handler (Benjamin Krill, 2014-03-07, 1 file, -1/+2)

  The previous code added wrong TLB entries and caused machine check errors,
  instead of a clean page fault, if a driver accessed past the end of the
  linear mapping.

  Signed-off-by: Ralph E. Bellofatto <ralphbel@us.ibm.com>
  Signed-off-by: Benjamin Krill <ben@codiert.org>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/pseries: Use remove_memory() to remove memory (Nathan Fontenot, 2014-03-07, 1 file, -1/+6)

  The memory remove code for powerpc/pseries should call remove_memory() so
  that we are holding the hotplug_memory lock during memory remove
  operations.

  This patch updates the memory node remove handler to call remove_memory()
  and adds a ppc_md.remove_memory() entry to handle pseries-specific work
  that is called from arch_remove_memory().

  During memory remove in pseries_remove_memblock() we have to stay with
  removing memory one section at a time. This is needed because of how
  memory resources are handled. During memory add for pseries (via the probe
  file in sysfs) we add memory one section at a time, which gives us a
  memory resource for each section. Future patches will aim to address this
  so that we will not have to remove memory one section at a time.

  Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/mm: Add new "set" flag argument to pte/pmd update function (Aneesh Kumar K.V, 2014-02-17, 2 files, -6/+8)

  pte_update() is a powerpc-ism used to change the bits of a PTE when the
  access permission is being restricted (a flush is potentially needed). It
  uses atomic operations when needed and handles the hash synchronization on
  hash-based processors.

  It is currently only used to clear PTE bits, and so the current
  implementation doesn't provide a way to also set PTE bits.

  The new _PAGE_NUMA bit, when set, is actually restricting access, so it
  must use that function too. This change adds the ability for pte_update()
  to also set bits. We will use this later to set the _PAGE_NUMA bit.

  Acked-by: Mel Gorman <mgorman@suse.de>
  Acked-by: Rik van Riel <riel@redhat.com>
  Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
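  A non-atomic sketch of the extended helper (the real version uses
  reservation loops and hash invalidation; this only shows the new
  parameter): existing callers pass 0 for "set", and the _PAGE_NUMA code can
  now pass a bit to be set.

      /* Simplified, non-atomic sketch of the new signature.  mm, addr and
       * huge are unused here; the real pte_update() needs them for hash
       * synchronization and hugepage handling. */
      static inline unsigned long pte_update(struct mm_struct *mm,
                                             unsigned long addr, pte_t *ptep,
                                             unsigned long clr,
                                             unsigned long set, int huge)
      {
              unsigned long old = pte_val(*ptep);

              *ptep = __pte((old & ~clr) | set);   /* clear then set bits */
              return old;
      }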
* powerpc: Fix kdump hang issue on p8 with relocation on exception enabled (Mahesh Salgaonkar, 2014-02-11, 1 file, -0/+14)

  On p8 systems, with the relocation-on-exception feature enabled, we are
  seeing the kdump kernel hang at interrupt vector 0xc*4400. The reason is
  that with this feature enabled, exceptions are raised with the MMU on
  (IR=DR=1) at the default offset of 0xc*4000. Since the exception is raised
  in virtual mode, it requires the vector region to be executable, without
  which it fails to fetch and execute the instruction at 0xc*4xxx.

  For the default kernel, since the kernel is loaded at real 0, the htab
  mappings set the entire kernel text region executable. But for a
  relocatable kernel (e.g. the kdump case) we only copy the interrupt
  vectors down to real 0 and never mark that region as executable, because
  on p7 and below we always take exceptions in real mode.

  This patch fixes the issue by marking the htab mapping range that overlaps
  with the interrupt vector region as executable for relocatable kernels.

  Thanks to Ben who helped me to debug this issue and find the root cause.

  Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/hugetlb: Replace __get_cpu_var with get_cpu_var (Tiejun Chen, 2014-01-29, 1 file, -1/+3)

  Replace __get_cpu_var safely with get_cpu_var to avoid the following call
  trace:

  [ 7253.637591] BUG: using smp_processor_id() in preemptible [00000000 00000000] code: hugemmap01/9048
  [ 7253.637601] caller is free_hugepd_range.constprop.25+0x88/0x1a8
  [ 7253.637605] CPU: 1 PID: 9048 Comm: hugemmap01 Not tainted 3.10.20-rt14+ #114
  [ 7253.637606] Call Trace:
  [ 7253.637617] [cb049d80] [c0007ea4] show_stack+0x4c/0x168 (unreliable)
  [ 7253.637624] [cb049dc0] [c031c674] debug_smp_processor_id+0x114/0x134
  [ 7253.637628] [cb049de0] [c0016d28] free_hugepd_range.constprop.25+0x88/0x1a8
  [ 7253.637632] [cb049e00] [c001711c] hugetlb_free_pgd_range+0x6c/0x168
  [ 7253.637639] [cb049e40] [c0117408] free_pgtables+0x12c/0x150
  [ 7253.637646] [cb049e70] [c011ce38] unmap_region+0xa0/0x11c
  [ 7253.637671] [cb049ef0] [c011f03c] do_munmap+0x224/0x3bc
  [ 7253.637676] [cb049f20] [c011f2e0] vm_munmap+0x38/0x5c
  [ 7253.637682] [cb049f40] [c000ef88] ret_from_syscall+0x0/0x3c
  [ 7253.637686] --- Exception: c01 at 0xff16004

  Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
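  The shape of the change, sketched with a hypothetical per-CPU variable
  name: get_cpu_var() disables preemption around the access, so
  smp_processor_id() is no longer reached from preemptible context, and
  put_cpu_var() re-enables preemption afterwards.

      /* Before (preemptible context, triggers the smp_processor_id() splat):
       *     batchp = &__get_cpu_var(hugepd_freelist);
       *
       * After: the get/put pair pins the task to a cpu for the access.
       * "hugepd_freelist" is an illustrative name. */
      batchp = &get_cpu_var(hugepd_freelist);
      /* ... queue the hugepd for freeing on this cpu ... */
      put_cpu_var(hugepd_freelist);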
* powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the allowed address space (Jerome Marchand <jmarchan@redhat.com>, 2014-01-29, 1 file, -1/+1)

  According to POSIX, if MAP_FIXED is specified, mmap shall set ENOMEM if
  the requested mapping exceeds the allowed range for the address space of
  the process. The generic code gets it right, but the powerpc-specific
  slice_get_unmapped_area() function currently returns -EINVAL in that case.
  This patch corrects it.

  Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
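  A small userspace probe of the semantics being fixed (the errno you
  actually see depends on kernel version and architecture; POSIX expects
  ENOMEM for a fixed mapping outside the allowed address space):

      #include <errno.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/mman.h>

      int main(void)
      {
              /* A page-aligned hint far beyond any sane task size limit. */
              void *hint = (void *)(-4096L);
              void *p = mmap(hint, 4096, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

              if (p == MAP_FAILED)
                      printf("mmap: %s (POSIX says this should be ENOMEM)\n",
                             strerror(errno));
              else
                      munmap(p, 4096);
              return 0;
      }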
* powerpc/numa: Fix decimal permissions (Joe Perches, 2014-01-29, 1 file, -1/+1)

  This should have been octal.

  Signed-off-by: Joe Perches <joe@perches.com>
  Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
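  The bug class, illustrated (the proc file name and fops below are just
  examples, not the actual call site): decimal 644 is octal 01204, i.e.
  sticky bit set, owner write-only, world readable, which is nonsense,
  while the intended mode is octal 0644 (rw-r--r--).

      /* Wrong: decimal 644 == octal 01204 */
      proc_create("powerpc/example_updates", 644, NULL, &example_fops);

      /* Right: octal 0644 == rw-r--r-- */
      proc_create("powerpc/example_updates", 0644, NULL, &example_fops);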
* Merge remote-tracking branch 'scott/next' into next (Benjamin Herrenschmidt, 2014-01-29, 2 files, -1/+4)

  << This contains a fix for a chroma_defconfig build break that was
  introduced by e6500 tablewalk support, and a device tree binding patch
  that missed the previous pull request due to some last-minute
  polishing. >>
  * powerpc/booke64: Guard e6500 tlb handler with CONFIG_PPC_FSL_BOOK3E (Scott Wood, 2014-01-17, 2 files, -1/+4)

    ...and make CONFIG_PPC_FSL_BOOK3E conflict with CONFIG_PPC_64K_PAGES.

    This fixes a build break with CONFIG_PPC_64K_PAGES on 64-bit book3e that
    was introduced by commit 28efc35fe68dacbddc4b12c2fa8f2df1593a4ad3
    ("powerpc/e6500: TLB miss handler with hardware tablewalk support").

    Signed-off-by: Scott Wood <scottwood@freescale.com>
* Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc (Linus Torvalds, 2014-01-27, 16 files, -79/+503)

  Pull powerpc updates from Ben Herrenschmidt:
  "So here's my next branch for powerpc. A bit late as I was on vacation
   last week. It's mostly the same stuff that was in next already, I just
   added two patches today which are the wiring up of lockref for powerpc,
   which for some reason fell through the cracks last time and is trivial.

   The highlights are, in addition to a bunch of bug fixes:

   - Reworked Machine Check handling on kernels running without a
     hypervisor (or acting as a hypervisor). Provides hooks to handle some
     errors in real mode such as TLB errors, handle SLB errors, etc...
   - Support for retrieving memory error information from the service
     processor on IBM servers running without a hypervisor and routing them
     to the memory poison infrastructure.
   - _PAGE_NUMA support on server processors
   - 32-bit BookE relocatable kernel support
   - FSL e6500 hardware tablewalk support
   - A bunch of new/revived board support
   - FSL e6500 deeper idle states and altivec powerdown support

   You'll notice a generic mm change here, it has been acked by the relevant
   authorities and is a pre-req for our _PAGE_NUMA support"

  * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (121 commits)
    powerpc: Implement arch_spin_is_locked() using arch_spin_value_unlocked()
    powerpc: Add support for the optimised lockref implementation
    powerpc/powernv: Call OPAL sync before kexec'ing
    powerpc/eeh: Escalate error on non-existing PE
    powerpc/eeh: Handle multiple EEH errors
    powerpc: Fix transactional FP/VMX/VSX unavailable handlers
    powerpc: Don't corrupt transactional state when using FP/VMX in kernel
    powerpc: Reclaim two unused thread_info flag bits
    powerpc: Fix races with irq_work
    Move precessing of MCE queued event out from syscall exit path.
    pseries/cpuidle: Remove redundant call to ppc64_runlatch_off() in cpu idle routines
    powerpc: Make add_system_ram_resources() __init
    powerpc: add SATA_MV to ppc64_defconfig
    powerpc/powernv: Increase candidate fw image size
    powerpc: Add debug checks to catch invalid cpu-to-node mappings
    powerpc: Fix the setup of CPU-to-Node mappings during CPU online
    powerpc/iommu: Don't detach device without IOMMU group
    powerpc/eeh: Hotplug improvement
    powerpc/eeh: Call opal_pci_reinit() on powernv for restoring config space
    powerpc/eeh: Add restore_config operation
    ...
  * Merge remote-tracking branch 'scott/next' into next (Benjamin Herrenschmidt, 2014-01-15, 9 files, -42/+384)

    Freescale updates from Scott:
    << Highlights include 32-bit booke relocatable support, e6500 hardware
    tablewalk support, various e500 SPE fixes, some new/revived boards, and
    e6500 deeper idle and altivec powerdown modes. >>