summaryrefslogtreecommitdiffstats
path: root/arch/powerpc/include/asm/pgtable.h
Commit message (Collapse)AuthorAgeFilesLines
* x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levelsMel Gorman2014-06-041-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | _PAGE_NUMA is currently an alias of _PROT_PROTNONE to trap NUMA hinting faults on x86. Care is taken such that _PAGE_NUMA is used only in situations where the VMA flags distinguish between NUMA hinting faults and prot_none faults. This decision was x86-specific and conceptually it is difficult requiring special casing to distinguish between PROTNONE and NUMA ptes based on context. Fundamentally, we only need the _PAGE_NUMA bit to tell the difference between an entry that is really unmapped and a page that is protected for NUMA hinting faults as if the PTE is not present then a fault will be trapped. Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset. This patch shrinks the maximum possible swap size and uses the bit to uniquely distinguish between NUMA hinting ptes and swap ptes. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bitAneesh Kumar K.V2014-02-171-0/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions when using a hash table MMU for various reasons (the flush is handled as part of the PTE modification when necessary). ppc64 thus doesn't implement flush_tlb_range for hash based MMUs. Additionally ppc64 require the tlb flushing to be batched within ptl locks. The reason to do that is to ensure that the hash page table is in sync with linux page table. We track the hpte index in linux pte and if we clear them without flushing hash and drop the ptl lock, we can have another cpu update the pte and can end up with duplicate entry in the hash table, which is fatal. We also want to keep set_pte_at simpler by not requiring them to do hash flush for performance reason. We do that by assuming that set_pte_at() is never *ever* called on a PTE that is already valid. This was the case until the NUMA code went in which broke that assumption. Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a way similar to ptep/pmdp_set_wrprotect(), with a generic implementation using set_pte_at() and a powerpc specific one using the appropriate mechanism needed to keep the hash table in sync. Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds2014-01-311-0/+21
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more KVM updates from Paolo Bonzini: "Second batch of KVM updates. Some minor x86 fixes, two s390 guest features that need some handling in the host, and all the PPC changes. The PPC changes include support for little-endian guests and enablement for new POWER8 features" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (45 commits) x86, kvm: correctly access the KVM_CPUID_FEATURES leaf at 0x40000101 x86, kvm: cache the base of the KVM cpuid leaves kvm: x86: move KVM_CAP_HYPERV_TIME outside #ifdef KVM: PPC: Book3S PR: Cope with doorbell interrupts KVM: PPC: Book3S HV: Add software abort codes for transactional memory KVM: PPC: Book3S HV: Add new state for transactional memory powerpc/Kconfig: Make TM select VSX and VMX KVM: PPC: Book3S HV: Basic little-endian guest support KVM: PPC: Book3S HV: Add support for DABRX register on POWER7 KVM: PPC: Book3S HV: Prepare for host using hypervisor doorbells KVM: PPC: Book3S HV: Handle new LPCR bits on POWER8 KVM: PPC: Book3S HV: Handle guest using doorbells for IPIs KVM: PPC: Book3S HV: Consolidate code that checks reason for wake from nap KVM: PPC: Book3S HV: Implement architecture compatibility modes for POWER8 KVM: PPC: Book3S HV: Add handler for HV facility unavailable KVM: PPC: Book3S HV: Flush the correct number of TLB sets on POWER8 KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs KVM: PPC: Book3S HV: Align physical and virtual CPU thread numbers KVM: PPC: Book3S HV: Don't set DABR on POWER8 kvm/ppc: IRQ disabling cleanup ...
| * kvm: powerpc: define a linux pte lookup functionBharat Bhushan2014-01-091-0/+21
| | | | | | | | | | | | | | | | | | We need to search linux "pte" to get "pte" attributes for setting TLB in KVM. This patch defines a lookup_linux_ptep() function which returns pte pointer. Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com> Reviewed-by: Scott Wood <scottwood@freescale.com> Signed-off-by: Alexander Graf <agraf@suse.de>
* | powerpc/mm: Enable _PAGE_NUMA for book3sAneesh Kumar K.V2013-12-091-1/+65
|/ | | | | | | | We steal the _PAGE_COHERENCE bit and use that for indicating NUMA ptes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* Merge branch 'next' of ↵Linus Torvalds2013-07-041-0/+6
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc updates from Ben Herrenschmidt: "This is the powerpc changes for the 3.11 merge window. In addition to the usual bug fixes and small updates, the main highlights are: - Support for transparent huge pages by Aneesh Kumar for 64-bit server processors. This allows the use of 16M pages as transparent huge pages on kernels compiled with a 64K base page size. - Base VFIO support for KVM on power by Alexey Kardashevskiy - Wiring up of our nvram to the pstore infrastructure, including putting compressed oopses in there by Aruna Balakrishnaiah - Move, rework and improve our "EEH" (basically PCI error handling and recovery) infrastructure. It is no longer specific to pseries but is now usable by the new "powernv" platform as well (no hypervisor) by Gavin Shan. - I fixed some bugs in our math-emu instruction decoding and made it usable to emulate some optional FP instructions on processors with hard FP that lack them (such as fsqrt on Freescale embedded processors). - Support for Power8 "Event Based Branch" facility by Michael Ellerman. This facility allows what is basically "userspace interrupts" for performance monitor events. - A bunch of Transactional Memory vs. Signals bug fixes and HW breakpoint/watchpoint fixes by Michael Neuling. And more ... I appologize in advance if I've failed to highlight something that somebody deemed worth it." * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits) pstore: Add hsize argument in write_buf call of pstore_ftrace_call powerpc/fsl: add MPIC timer wakeup support powerpc/mpic: create mpic subsystem object powerpc/mpic: add global timer support powerpc/mpic: add irq_set_wake support powerpc/85xx: enable coreint for all the 64bit boards powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig powerpc/mpic: Add get_version API both for internal and external use powerpc: Handle both new style and old style reserve maps powerpc/hw_brk: Fix off by one error when validating DAWR region end powerpc/pseries: Support compression of oops text via pstore powerpc/pseries: Re-organise the oops compression code pstore: Pass header size in the pstore write callback powerpc/powernv: Fix iommu initialization again powerpc/pseries: Inform the hypervisor we are using EBB regs powerpc/perf: Add power8 EBB support powerpc/perf: Core EBB support for 64-bit book3s powerpc/perf: Drop MMCRA from thread_struct powerpc/perf: Don't enable if we have zero events ...
| * powerpc: move find_linux_pte_or_hugepte and gup_hugepte to common codeAneesh Kumar K.V2013-06-211-0/+2
| | | | | | | | | | | | | | | | We will use this in the later patch for handling THP pages Reviewed-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * powerpc/THP: Implement transparent hugepages for ppc64Aneesh Kumar K.V2013-06-211-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We now have pmd entries covering 16MB range and the PMD table double its original size. We use the second half of the PMD table to deposit the pgtable (PTE page). The depoisted PTE page is further used to track the HPTE information. The information include [ secondary group | 3 bit hidx | valid ]. We use one byte per each HPTE entry. With 16MB hugepage and 64K HPTE we need 256 entries and with 4K HPTE we need 4096 entries. Both will fit in a 4K PTE page. On hugepage invalidate we need to walk the PTE page and invalidate all valid HPTEs. This patch implements necessary arch specific functions for THP support and also hugepage invalidate logic. These PMD related functions are intentionally kept similar to their PTE counter-part. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | consolidate io_remap_pfn_range definitionsAl Viro2013-06-291-3/+0
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* powerpc: Switch 16GB and 16MB explicit hugepages to a different page table ↵Aneesh Kumar K.V2013-04-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | format We will be switching PMD_SHIFT to 24 bits to facilitate THP impmenetation. With PMD_SHIFT set to 24, we now have 16MB huge pages allocated at PGD level. That means with 32 bit process we cannot allocate normal pages at all, because we cover the entire address space with one pgd entry. Fix this by switching to a new page table format for hugepages. With the new page table format for 16GB and 16MB hugepages we won't allocate hugepage directory. Instead we encode the PTE information directly at the directory level. This forces 16MB hugepage at PMD level. This will also make the page take walk much simpler later when we add the THP support. With the new table format we have 4 cases for pgds and pmds: (1) invalid (all zeroes) (2) pointer to next table, as normal; bottom 6 bits == 0 (3) leaf pte for huge page, bottom two bits != 00 (4) hugepd pointer, bottom two bits == 00, next 4 bits indicate size of table Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Don't hard code the size of pte pageAneesh Kumar K.V2013-04-301-0/+6
| | | | | | | | | USE PTRS_PER_PTE to indicate the size of pte page. To support THP, later patches will be changing PTRS_PER_PTE value. Acked-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/mm: Make some of the PGTABLE_RANGE dependency explicitAneesh Kumar K.V2012-09-171-8/+2
| | | | | | | | slice array size and slice mask size depend on PGTABLE_RANGE. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Add pgprot_cached_noncoherent()Geoff Thorpe2011-11-251-0/+3
| | | | | | | | This adds a pgprot combination required by some cache-enabled IO device mappings, such as Freescale datapath (QMan and BMan) portals. Signed-off-by: Geoff Thorpe <geoff@geoffthorpe.net> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc: Add pgprot_writecombineAnton Blanchard2011-03-021-0/+1
| | | | | | | | A number of drivers are using pgprot_writecombine() to enable write combining on userspace mappings. Implement it on powerpc. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itselfRussell King2010-02-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On VIVT ARM, when we have multiple shared mappings of the same file in the same MM, we need to ensure that we have coherency across all copies. We do this via make_coherent() by making the pages uncacheable. This used to work fine, until we allowed highmem with highpte - we now have a page table which is mapped as required, and is not available for modification via update_mmu_cache(). Ralf Beache suggested getting rid of the PTE value passed to update_mmu_cache(): On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables to construct a pointer to the pte again. Passing a pte_t * is much more elegant. Maybe we might even replace the pte argument with the pte_t? Ben Herrenschmidt would also like the pte pointer for PowerPC: Passing the ptep in there is exactly what I want. I want that -instead- of the PTE value, because I have issue on some ppc cases, for I$/D$ coherency, where set_pte_at() may decide to mask out the _PAGE_EXEC. So, pass in the mapped page table pointer into update_mmu_cache(), and remove the PTE value, updating all implementations and call sites to suit. Includes a fix from Stephen Rothwell: sparc: fix fallout from update_mmu_cache API change Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* powerpc/mm: Allow more flexible layouts for hugepage pagetablesDavid Gibson2009-10-301-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently each available hugepage size uses a slightly different pagetable layout: that is, the bottem level table of pointers to hugepages is a different size, and may branch off from the normal page tables at a different level. Every hugepage aware path that needs to walk the pagetables must therefore look up the hugepage size from the slice info first, and work out the correct way to walk the pagetables accordingly. Future hardware is likely to add more possible hugepage sizes, more layout options and more mess. This patch, therefore reworks the handling of hugepage pagetables to reduce this complexity. In the new scheme, instead of having to consult the slice mask, pagetable walking code can check a flag in the PGD/PUD/PMD entries to see where to branch off to hugepage pagetables, and the entry also contains the information (eseentially hugepage shift) necessary to then interpret that table without recourse to the slice mask. This scheme can be extended neatly to handle multiple levels of self-describing "special" hugepage pagetables, although for now we assume only one level exists. This approach means that only the pagetable allocation path needs to know how the pagetables should be set out. All other (hugepage) pagetable walking paths can just interpret the structure as they go. There already was a flag bit in PGD/PUD/PMD entries for hugepage directory pointers, but it was only used for debug. We alter that flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable pointer (normally it would be 1 since the pointer lies in the linear mapping). This means that asm pagetable walking can test for (and punt on) hugepage pointers with the same test that checks for unpopulated page directory entries (beq becomes bge), since hugepage pointers will always be positive, and normal pointers always negative. While we're at it, we get rid of the confusing (and grep defeating) #defining of hugepte_shift to be the same thing as mmu_huge_psizes. Signed-off-by: David Gibson <dwg@au1.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/32: Always order writes to halves of 64-bit PTEsPaul Mackerras2009-08-181-3/+3
| | | | | | | | | | | | | | | | On 32-bit systems with 64-bit PTEs, the PTEs have to be written in two 32-bit halves. On SMP we write the higher-order half and then the lower-order half, with a write barrier between the two halves, but on UP there was no particular ordering of the writes to the two halves. This extends the ordering that we already do on SMP to the UP case as well. The reason is that with the perf_counter subsystem potentially accessing user memory at interrupt time to get stack traces, we have to be careful not to create an incorrect but apparently valid PTE even on UP. Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* powerpc/mm: Merge various PTE bits and accessors definitionsBenjamin Herrenschmidt2009-03-241-4/+50
| | | | | | | | | | | | | | | | | Now that they are almost identical, we can merge some of the definitions related to the PTE format into common files. This creates a new pte-common.h which is included by both 32 and 64-bit right after the CPU specific pte-*.h file, and which defines some bits to "default" values if they haven't been defined already, and then provides a generic definition of most of the bit combinations based on these and exposed to the rest of the kernel. I also moved to the common pgtable.h most of the "small" accessors to the PTE bits and modification helpers (pte_mk*). The actual accessors remain in their separate files. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/mm: Tweak PTE bit combination definitionsBenjamin Herrenschmidt2009-03-241-0/+4
| | | | | | | | | | | | | | | | | This patch tweaks the way some PTE bit combinations are defined, in such a way that the 32 and 64-bit variant become almost identical and that will make it easier to bring in a new common pte-* file for the new variant of the Book3-E support. The combination of bits defining access to kernel pages are now clearly separated from the combination used by userspace and the core VM. The resulting generated code should remain identical unless I made a mistake. Note: While at it, I removed a non-sensical statement related to CONFIG_KGDB in ppc_mmu_32.c which could cause kernel mappings to be user accessible when that option is enabled. Probably something that bitrot. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/mm: Rework I$/D$ coherency (v3)Benjamin Herrenschmidt2009-02-111-0/+84
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reworks the way we do I and D cache coherency on PowerPC. The "old" way was split in 3 different parts depending on the processor type: - Hash with per-page exec support (64-bit and >= POWER4 only) does it at hashing time, by preventing exec on unclean pages and cleaning pages on exec faults. - Everything without per-page exec support (32-bit hash, 8xx, and 64-bit < POWER4) does it for all page going to user space in update_mmu_cache(). - Embedded with per-page exec support does it from do_page_fault() on exec faults, in a way similar to what the hash code does. That leads to confusion, and bugs. For example, the method using update_mmu_cache() is racy on SMP where another processor can see the new PTE and hash it in before we have cleaned the cache, and then blow trying to execute. This is hard to hit but I think it has bitten us in the past. Also, it's inefficient for embedded where we always end up having to do at least one more page fault. This reworks the whole thing by moving the cache sync into two main call sites, though we keep different behaviours depending on the HW capability. The call sites are set_pte_at() which is now made out of line, and ptep_set_access_flags() which joins the former in pgtable.c The base idea for Embedded with per-page exec support, is that we now do the flush at set_pte_at() time when coming from an exec fault, which allows us to avoid the double fault problem completely (we can even improve the situation more by implementing TLB preload in update_mmu_cache() but that's for later). If for some reason we didn't do it there and we try to execute, we'll hit the page fault, which will do a minor fault, which will hit ptep_set_access_flags() to do things like update _PAGE_ACCESSED or _PAGE_DIRTY if needed, we just make this guys also perform the I/D cache sync for exec faults now. This second path is the catch all for things that weren't cleaned at set_pte_at() time. For cpus without per-pag exec support, we always do the sync at set_pte_at(), thus guaranteeing that when the PTE is visible to other processors, the cache is clean. For the 64-bit hash with per-page exec support case, we keep the old mechanism for now. I'll look into changing it later, once I've reworked a bit how we use _PAGE_EXEC. This is also a first step for adding _PAGE_EXEC support for embedded platforms Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* powerpc/mm: Rework usage of _PAGE_COHERENT/NO_CACHE/GUARDEDBenjamin Herrenschmidt2008-12-211-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently, we never set _PAGE_COHERENT in the PTEs, we just OR it in in the hash code based on some CPU feature bit. We also manipulate _PAGE_NO_CACHE and _PAGE_GUARDED by hand in all sorts of places. This changes the logic so that instead, the PTE now contains _PAGE_COHERENT for all normal RAM pages thay have I = 0 on platforms that need it. The hash code clears it if the feature bit is not set. It also adds some clean accessors to setup various valid combinations of access flags and change various bits of code to use them instead. This should help having the PTE actually containing the bit combinations that we really want. I also removed _PAGE_GUARDED from _PAGE_BASE on 44x and instead set it explicitely from the TLB miss. I will ultimately remove it completely as it appears that it might not be needed after all but in the meantime, having it in the TLB miss makes things a lot easier. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Paul Mackerras <paulus@samba.org>
* powerpc: Move include files to arch/powerpc/include/asmStephen Rothwell2008-08-041-0/+57
from include/asm-powerpc. This is the result of a mkdir arch/powerpc/include/asm git mv include/asm-powerpc/* arch/powerpc/include/asm Followed by a few documentation/comment fixups and a couple of places where <asm-powepc/...> was being used explicitly. Of the latter only one was outside the arch code and it is a driver only built for powerpc. Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Paul Mackerras <paulus@samba.org>
OpenPOWER on IntegriCloud