summaryrefslogtreecommitdiffstats
path: root/arch/powerpc
Commit message (Collapse)AuthorAgeFilesLines
* powerpc32: move xxxxx_dcache_range() functions inlineChristophe Leroy2016-03-113-68/+51
| | | | | | | | | | | | | flush/clean/invalidate _dcache_range() functions are all very similar and are quite short. They are mainly used in __dma_sync() perf_event locate them in the top 3 consumming functions during heavy ethernet activity They are good candidate for inlining, as __dma_sync() does almost nothing but calling them Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc32: Remove clear_pages() and define clear_page() inlineChristophe Leroy2016-03-113-20/+14
| | | | | | | | | | | | clear_pages() is never used expect by clear_page, and PPC32 is the only architecture (still) having this function. Neither PPC64 nor any other architecture has it. This patch removes clear_pages() and moves clear_page() function inline (same as PPC64) as it only is a few isns Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc: add inline functions for cache related instructionsChristophe Leroy2016-03-111-0/+19
| | | | | | | | This patch adds inline functions to use dcbz, dcbi, dcbf, dcbst from C functions Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/8xx: rewrite flush_instruction_cache() in CChristophe Leroy2016-03-112-6/+11
| | | | | | | | | | On PPC8xx, flushing instruction cache is performed by writing in register SPRN_IC_CST. This registers suffers CPU6 ERRATA. The patch rewrites the fonction in C so that CPU6 ERRATA will be handled transparently Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/8xx: rewrite set_context() in CChristophe Leroy2016-03-112-44/+34
| | | | | | | | | There is no real need to have set_context() in assembly. Now that we have mtspr() handling CPU6 ERRATA directly, we can rewrite set_context() in C language for easier maintenance. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/8xx: remove special handling of CPU6 errata in set_dec()Christophe Leroy2016-03-112-23/+1
| | | | | | | | CPU6 ERRATA is now handled directly in mtspr(), so we can use the standard set_dec() fonction in all cases. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/8xx: Handle CPU6 ERRATA directly in mtspr() macroChristophe Leroy2016-03-112-0/+84
| | | | | | | | | MPC8xx has an ERRATA on the use of mtspr() for some registers This patch includes the ERRATA handling directly into mtspr() macro so that mtspr() users don't need to bother about that errata Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/8xx: Add missing SPRN defines into reg_8xx.hChristophe Leroy2016-03-112-2/+13
| | | | | | | | | | | Add missing SPRN defines into reg_8xx.h Some of them are defined in mmu-8xx.h, so we include mmu-8xx.h in reg_8xx.h, for that we remove references to PAGE_SHIFT in mmu-8xx.h to have it self sufficient, as includers of reg_8xx.h don't all include asm/page.h Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc32: remove ioremap_baseChristophe Leroy2016-03-114-14/+2
| | | | | | | ioremap_base is not initialised and is nowhere used so remove it Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc32: Remove useless/wrong MMU:setio progress messageChristophe Leroy2016-03-111-4/+0
| | | | | | | | | Commit 771168494719 ("[POWERPC] Remove unused machine call outs") removed the call to setup_io_mappings(), so remove the associated progress line message Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc32: refactor x_mapped_by_bats() and x_mapped_by_tlbcam() togetherChristophe Leroy2016-03-114-42/+22
| | | | | | | | | | x_mapped_by_bats() and x_mapped_by_tlbcam() serve the same kind of purpose, and are never defined at the same time. So rename them x_block_mapped() and define them in the relevant places Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc32: Fix pte_offset_kernel() to return NULL for bad pagesChristophe Leroy2016-03-111-1/+2
| | | | | | | | | | The fixmap related functions try to map kernel pages that are already mapped through Large TLBs. pte_offset_kernel() has to return NULL for LTLBs, otherwise the caller will try to access level 2 table which doesn't exist Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/8xx: move setup_initial_memory_limit() into 8xx_mmu.cChristophe Leroy2016-03-112-19/+17
| | | | | | | | Now we have a 8xx specific .c file for that so put it in there as other powerpc variants do Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/8xx: Map linear kernel RAM with 8M pagesChristophe Leroy2016-03-114-14/+120
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a live running system (VoIP gateway for Air Trafic Control), over a 10 minutes period (with 277s idle), we get 87 millions DTLB misses and approximatly 35 secondes are spent in DTLB handler. This represents 5.8% of the overall time and even 10.8% of the non-idle time. Among those 87 millions DTLB misses, 15% are on user addresses and 85% are on kernel addresses. And within the kernel addresses, 93% are on addresses from the linear address space and only 7% are on addresses from the virtual address space. MPC8xx has no BATs but it has 8Mb page size. This patch implements mapping of kernel RAM using 8Mb pages, on the same model as what is done on the 40x. In 4k pages mode, each PGD entry maps a 4Mb area: we map every two entries to the same 8Mb physical page. In each second entry, we add 4Mb to the page physical address to ease life of the FixupDAR routine. This is just ignored by HW. In 16k pages mode, each PGD entry maps a 64Mb area: each PGD entry will point to the first page of the area. The DTLB handler adds the 3 bits from EPN to map the correct page. With this patch applied, we now get only 13 millions TLB misses during the 10 minutes period. The idle time has increased to 313s and the overall time spent in DTLB miss handler is 6.3s, which represents 1% of the overall time and 2.2% of non-idle time. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/8xx: Save r3 all the time in DTLB miss handlerChristophe Leroy2016-03-111-9/+4
| | | | | | | | | | | We are spending between 40 and 160 cycles with a mean of 65 cycles in the DTLB handling routine (measured with mftbl) so make it more simple althought it adds one instruction. With this modification, we get three registers available at all time, which will help with following patch. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/p5040: Add device node for RAID EngineXuelin Shi2016-03-092-0/+7
| | | | | | | | add the missing RAID Engine device node for p5040. otherwise, the device can not be detected. Signed-off-by: Xuelin Shi <xuelin.shi@nxp.com> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc: optimise csum_partial() call when len is constantChristophe Leroy2016-03-094-28/+61
| | | | | | | | | | | | | | | | | | | | | | | csum_partial is often called for small fixed length packets for which it is suboptimal to use the generic csum_partial() function. For instance, in my configuration, I got: * One place calling it with constant len 4 * Seven places calling it with constant len 8 * Three places calling it with constant len 14 * One place calling it with constant len 20 * One place calling it with constant len 24 * One place calling it with constant len 32 This patch renames csum_partial() to __csum_partial() and implements csum_partial() as a wrapper inline function which * uses csum_add() for small 16bits multiple constant length * uses ip_fast_csum() for other 32bits multiple constant * uses __csum_partial() in all other cases Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/fsl-lbc: Modify suspend/resume entry sequenceRaghav Dogra2016-03-091-11/+38
| | | | | | | | | | Modify platform driver suspend/resume to syscore suspend/resume. This is because p1022ds needs to use localbus when entering the PCIE resume. Signed-off-by: Raghav Dogra <raghav.dogra@nxp.com> [scottwood: dropped makefile churn] Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/8xx: CONFIG_DEBUG_PAGEALLOC requires ITLBmiss for kernel addressesChristophe Leroy2016-03-091-1/+1
| | | | | | | | | When CONFIG_DEBUG_PAGEALLOC is activated, the initial TLB mapping gets flushed to track accesses to wrong areas. Therefore, kernel addresses will also generate ITLB misses. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/885: set SDCR to 0x40Christophe Leroy2016-03-091-1/+4
| | | | | | | | | | | | | The MPC885 reference manual says that SDCR shall have value 0x40, but most exemples set SDCR to 0x1 With 0x1 in SDCR, we observe TX underruns on SCC when using it in QMC mode. According the NXP technical support, this is a copy/paste error from MPC860 reference manual, 0x40 being the only value supported by the MPC885 HW. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/86xx: disable IDE subsystem in mpc8610_hpcd_defconfigBartlomiej Zolnierkiewicz2016-03-091-1/+0
| | | | | | | | | This patch disables deprecated IDE subsystem in mpc8610_hpcd_defconfig (no IDE host drivers are selected in this config so there is no valid reason to enable IDE subsystem itself). Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/85xx: disable IDE subsystem in stx_gp3_defconfigBartlomiej Zolnierkiewicz2016-03-091-2/+0
| | | | | | | | | | | This patch disables deprecated IDE subsystem in stx_gp3_defconfig (no IDE host drivers are selected in this config so there is no valid reason to enable IDE subsystem itself). Cc: Scott Wood <oss@buserror.net> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/85xx: disable IDE subsystem in ksi8560_defconfigBartlomiej Zolnierkiewicz2016-03-091-1/+0
| | | | | | | | | | | This patch disables deprecated IDE subsystem in ksi8560_defconfig (no IDE host drivers are selected in this config so there is no valid reason to enable IDE subsystem itself). Cc: Scott Wood <oss@buserror.net> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/83xx: disable IDE subsystem in mpc834x_itx_defconfigBartlomiej Zolnierkiewicz2016-03-091-1/+0
| | | | | | | | | | | This patch disables deprecated IDE subsystem in mpc834x_itx_defconfig (no IDE host drivers are selected in this config so there is no valid reason to enable IDE subsystem itself). Cc: Scott Wood <oss@buserror.net> Cc: Kumar Gala <galak@kernel.crashing.org> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/mpc85xx: Add CPU hotplug support for E6500chenhui zhao2016-03-044-31/+124
| | | | | | | Support Freescale E6500 core-based platforms, like t4240. Support disabling/enabling individual CPU thread dynamically. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
* powerpc/mpc85xx: Add hotplug support on E5500 and E500MC coreschenhui zhao2016-03-044-90/+118
| | | | | | | | | | | Freescale E500MC and E5500 core-based platforms, like P4080, T1040, support disabling/enabling CPU dynamically. This patch adds this feature on those platforms. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com> Signed-off-by: Tang Yuantian <Yuantian.Tang@feescale.com> [scottwood: removed unused pr_fmt] Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/mpc85xx: refactor the PM operationschenhui zhao2016-03-044-54/+127
| | | | | | | | | | | Freescale CoreNet-based and Non-CoreNet-based platforms require different PM operations. This patch extracted existing PM operations on Non-CoreNet-based platforms to a new file which can accommodate both platforms. In this way, PM operation codes are clearer structurally. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com> Signed-off-by: Tang Yuantian <Yuantian.Tang@feescale.com> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/rcpm: add RCPM driverchenhui zhao2016-03-048-0/+466
| | | | | | | | | | | | | | There is a RCPM (Run Control/Power Management) in Freescale QorIQ series processors. The device performs tasks associated with device run control and power management. The driver implements some features: mask/unmask irq, enter/exit low power states, freeze time base, etc. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com> Signed-off-by: Tang Yuantian <Yuantian.Tang@freescale.com> [scottwood: remove __KERNEL__ ifdef] Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/cache: add cache flush operation for various e500chenhui zhao2016-03-047-78/+128
| | | | | | | | | | | | Various e500 core have different cache architecture, so they need different cache flush operations. Therefore, add a callback function cpu_flush_caches to the struct cpu_spec. The cache flush operation for the specific kind of e500 is selected at init time. The callback function will flush all caches inside the current cpu. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com> Signed-off-by: Tang Yuantian <Yuantian.Tang@feescale.com> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/mm: any thread in one core can be the first to setup TLB1chenhui zhao2016-03-042-3/+9
| | | | | | | | | | | On e6500, in the case of cpu hotplug, either thread in one core may be the first thread initilzing the TLB1. The subsequent threads must not setup it again. The code is derived from the comment of Scott Wood. Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc: simplify csum_add(a, b) in case a or b is constant 0Christophe Leroy2016-03-041-0/+6
| | | | | | | Simplify csum_add(a, b) in case a or b is constant 0 Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc32: optimise csum_partial() loopChristophe Leroy2016-03-041-1/+15
| | | | | | | | | | | | | On the 8xx, load latency is 2 cycles and taking branches also takes 2 cycles. So let's unroll the loop. This patch improves csum_partial() speed by around 10% on both: * 8xx (single issue processor with parallel execution) * 83xx (superscalar 6xx processor with dual instruction fetch and parallel execution) Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc32: optimise a few instructions in csum_partial()Christophe Leroy2016-03-041-20/+17
| | | | | | | | | | | | | | | | | | | | | | r5 does contain the value to be updated, so lets use r5 all way long for that. It makes the code more readable. To avoid confusion, it is better to use adde instead of addc The first addition is useless. Its only purpose is to clear carry. As r4 is a signed int that is always positive, this can be done by using srawi instead of srwi Let's also remove the comment about bdnz having no overhead as it is not correct on all powerpc, at least on MPC8xx In the last part, in our situation, the remaining quantity of bytes to be proceeded is between 0 and 3. Therefore, we can base that part on the value of bit 31 and bit 30 of r4 instead of anding r4 with 3 then proceding on comparisons and substractions. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()Christophe Leroy2016-03-041-111/+209
| | | | | | | | | | | | | | | | | | | | | | | | csum_partial_copy_generic() does the same as copy_tofrom_user and also calculates the checksum during the copy. Unlike copy_tofrom_user(), the existing version of csum_partial_copy_generic() doesn't take benefit of the cache. This patch is a rewrite of csum_partial_copy_generic() based on copy_tofrom_user(). The previous version of csum_partial_copy_generic() was handling errors. Now we have the checksum wrapper functions to handle the error case like in powerpc64 so we can make the error case simple: just return -EFAULT. copy_tofrom_user() only has r12 available => we use it for the checksum r7 and r8 which contains pointers to error feedback are used, so we stack them. On a TCP benchmark using socklib on the loopback interface on which checksum offload and scatter/gather have been deactivated, we get about 20% performance increase. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc: inline ip_fast_csum()Christophe Leroy2016-03-044-56/+38
| | | | | | | | | | | | In several architectures, ip_fast_csum() is inlined There are functions like ip_send_check() which do nothing much more than calling ip_fast_csum(). Inlining ip_fast_csum() allows the compiler to optimise better Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> [scottwood: whitespace and cast fixes] Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc32: checksum_wrappers_64 becomes checksum_wrappersChristophe Leroy2016-03-043-11/+1
| | | | | | | | | | | | | | | | | The powerpc64 checksum wrapper functions adds csum_and_copy_to_user() which otherwise is implemented in include/net/checksum.h by using csum_partial() then copy_to_user() Those two wrapper fonctions are also applicable to powerpc32 as it is based on the use of csum_partial_copy_generic() which also exists on powerpc32 This patch renames arch/powerpc/lib/checksum_wrappers_64.c to arch/powerpc/lib/checksum_wrappers.c and makes it non-conditional to CONFIG_WORD_SIZE Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc: mark xer clobbered in csum_add()Christophe Leroy2016-03-041-1/+1
| | | | | | | addc uses carry so xer is clobbered in csum_add() Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc: unexport csum_tcpudp_magicChristophe Leroy2016-03-041-1/+0
| | | | | | | | csum_tcpudp_magic is now an inline function, so there is nothing to export Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Scott Wood <oss@buserror.net>
* powerpc/mm: Move hash64 tlbflush code into a new headerAneesh Kumar K.V2016-03-032-91/+95
| | | | | | | No code changes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Move hash related mmu-*.h headers to book3s/Aneesh Kumar K.V2016-03-0313-12/+12
| | | | | | | No code changes. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: add _PAGE_HASHPTE similar to 4K hashAneesh Kumar K.V2016-03-031-1/+1
| | | | | | | | | | | | | | | | We don't need to update linux page table entry with _PAGE_HASHPTE early in hash pte fault. A parallel pte update will loop via _PAGE_BUSY and look at _PAGE_HASHPTE for a required hpte flush only if _PAGE_BUSY is cleared. That ensures a pte update will wait for a parallel hpte insert to finish before looking at _PAGE_HASHPTE bit. To avoid further confusion drop setting _PAGE_HASHPTE in cmpxchg in __hash_page_4K. commit 41743a4e34f0 ("powerpc: Free a PTE bit on ppc64 with 64K pages") did similar change for 64K config Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerp/mm: Update code commentsAneesh Kumar K.V2016-03-032-3/+3
| | | | | | | We are updating pte in those functions. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* mm: Some arch may want to use HPAGE_PMD related values as variablesKirill A. Shutemov2016-03-031-0/+7
| | | | | | | | | | | | | | | | | | | | With next generation power processor, we are having a new mmu model [1] that require us to maintain a different linux page table format. Inorder to support both current and future ppc64 systems with a single kernel we need to make sure kernel can select between different page table format at runtime. With the new MMU (radix MMU) added, we will have two different pmd hugepage size 16MB for hash model and 2MB for Radix model. Hence make HPAGE_PMD related values as a variable. Actual conversion of HPAGE_PMD to a variable for ppc64 happens in a followup patch. [1] http://ibm.biz/power-isa3 (Needs registration). Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Switch book3s 64 with 64K page size to 4 level page tableAneesh Kumar K.V2016-03-038-64/+102
| | | | | | | | | | | | This is needed so that we can support both hash and radix page table using single kernel. Radix kernel uses a 4 level table. We now use physical address in upper page table tree levels. Even though they are aligned to their size, for the masked bits we use the bit positions as per PowerISA 3.0. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Don't have conditional defines for real_pte_tAneesh Kumar K.V2016-03-032-22/+9
| | | | | | | | We remove real_pte_t out of STRICT_MM_TYPESCHECK. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc/mm: Split pgtable types to separate headerAneesh Kumar K.V2016-03-032-103/+109
| | | | | | | | | | We move the page table accessors into a separate header. We will later add a big endian variant of the table which is needed for radix. No functionality change only code movement. Reviewed-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Add the ability to save VSX without giving it upCyril Bur2016-03-024-37/+30
| | | | | | | | | | | | This patch adds the ability to be able to save the VSX registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch builds on a previous optimisation for the FPU and VEC registers in the thread copy path to avoid a possibly pointless reload of VSX state. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Add the ability to save Altivec without giving it upCyril Bur2016-03-023-22/+17
| | | | | | | | | | | | This patch adds the ability to be able to save the VEC registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch builds on a previous optimisation for the FPU registers in the thread copy path to avoid a possibly pointless reload of VEC state. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Add the ability to save FPU without giving it upCyril Bur2016-03-023-19/+17
| | | | | | | | | | | | | This patch adds the ability to be able to save the FPU registers to the thread struct without giving up (disabling the facility) next time the process returns to userspace. This patch optimises the thread copy path (as a result of a fork() or clone()) so that the parent thread can return to userspace with hot registers avoiding a possibly pointless reload of FPU register state. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
* powerpc: Prepare for splitting giveup_{fpu, altivec, vsx} in twoCyril Bur2016-03-023-1/+45
| | | | | | | | | | | | | | | This prepares for the decoupling of saving {fpu,altivec,vsx} registers and marking {fpu,altivec,vsx} as being unused by a thread. Currently giveup_{fpu,altivec,vsx}() does both however optimisations to task switching can be made if these two operations are decoupled. save_all() will permit the saving of registers to thread structs and leave threads MSR with bits enabled. This patch introduces no functional change. Signed-off-by: Cyril Bur <cyrilbur@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
OpenPOWER on IntegriCloud