summaryrefslogtreecommitdiffstats
path: root/contrib/llvm/lib/CodeGen/InterleavedAccessPass.cpp
diff options
context:
space:
mode:
authordim <dim@FreeBSD.org>2016-12-26 20:36:37 +0000
committerdim <dim@FreeBSD.org>2016-12-26 20:36:37 +0000
commit06210ae42d418d50d8d9365d5c9419308ae9e7ee (patch)
treeab60b4cdd6e430dda1f292a46a77ddb744723f31 /contrib/llvm/lib/CodeGen/InterleavedAccessPass.cpp
parent2dd166267f53df1c3748b4325d294b9b839de74b (diff)
downloadFreeBSD-src-06210ae42d418d50d8d9365d5c9419308ae9e7ee.zip
FreeBSD-src-06210ae42d418d50d8d9365d5c9419308ae9e7ee.tar.gz
MFC r309124:
Upgrade our copies of clang, llvm, lldb, compiler-rt and libc++ to 3.9.0 release, and add lld 3.9.0. Also completely revamp the build system for clang, llvm, lldb and their related tools. Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11 support to build; see UPDATING for more information. Release notes for llvm, clang and lld are available here: <http://llvm.org/releases/3.9.0/docs/ReleaseNotes.html> <http://llvm.org/releases/3.9.0/tools/clang/docs/ReleaseNotes.html> <http://llvm.org/releases/3.9.0/tools/lld/docs/ReleaseNotes.html> Thanks to Ed Maste, Bryan Drewery, Andrew Turner, Antoine Brodin and Jan Beich for their help. Relnotes: yes MFC r309147: Pull in r282174 from upstream llvm trunk (by Krzysztof Parzyszek): [PPC] Set SP after loading data from stack frame, if no red zone is present Follow-up to r280705: Make sure that the SP is only restored after all data is loaded from the stack frame, if there is no red zone. This completes the fix for https://llvm.org/bugs/show_bug.cgi?id=26519. Differential Revision: https://reviews.llvm.org/D24466 Reported by: Mark Millard PR: 214433 MFC r309149: Pull in r283060 from upstream llvm trunk (by Hal Finkel): [PowerPC] Refactor soft-float support, and enable PPC64 soft float This change enables soft-float for PowerPC64, and also makes soft-float disable all vector instruction sets for both 32-bit and 64-bit modes. This latter part is necessary because the PPC backend canonicalizes many Altivec vector types to floating-point types, and so soft-float breaks scalarization support for many operations. Both for embedded targets and for operating-system kernels desiring soft-float support, it seems reasonable that disabling hardware floating-point also disables vector instructions (embedded targets without hardware floating point support are unlikely to have Altivec, etc. and operating system kernels desiring not to use floating-point registers to lower syscall cost are unlikely to want to use vector registers either). If someone needs this to work, we'll need to change the fact that we promote many Altivec operations to act on v4f32. To make it possible to disable Altivec when soft-float is enabled, hardware floating-point support needs to be expressed as a positive feature, like the others, and not a negative feature, because target features cannot have dependencies on the disabling of some other feature. So +soft-float has now become -hard-float. Fixes PR26970. Pull in r283061 from upstream clang trunk (by Hal Finkel): [PowerPC] Enable soft-float for PPC64, and +soft-float -> -hard-float Enable soft-float support on PPC64, as the backend now supports it. Also, the backend now uses -hard-float instead of +soft-float, so set the target features accordingly. Fixes PR26970. Reported by: Mark Millard PR: 214433 MFC r309212: Add a few missed clang 3.9.0 files to OptionalObsoleteFiles. MFC r309262: Fix packaging for clang, lldb and lld 3.9.0 During the upgrade of clang/llvm etc to 3.9.0 in r309124, the PACKAGE directive in the usr.bin/clang/*.mk files got dropped accidentally. Restore it, with a few minor changes and additions: * Correct license in clang.ucl to NCSA * Add PACKAGE=clang for clang and most of the "ll" tools * Put lldb in its own package * Put lld in its own package Reviewed by: gjb, jmallett Differential Revision: https://reviews.freebsd.org/D8666 MFC r309656: During the bootstrap phase, when building the minimal llvm library on PowerPC, add lib/Support/Atomic.cpp. This is needed because upstream llvm revision r271821 disabled the use of std::call_once, which causes some fallback functions from Atomic.cpp to be used instead. Reported by: Mark Millard PR: 214902 MFC r309835: Tentatively apply https://reviews.llvm.org/D18730 to work around gcc PR 70528 (bogus error: constructor required before non-static data member). This should fix buildworld with the external gcc package. Reported by: https://jenkins.freebsd.org/job/FreeBSD_HEAD_amd64_gcc/ MFC r310194: Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 3.9.1 release. Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11 support to build; see UPDATING for more information. Release notes for llvm, clang and lld will be available here: <http://releases.llvm.org/3.9.1/docs/ReleaseNotes.html> <http://releases.llvm.org/3.9.1/tools/clang/docs/ReleaseNotes.html> <http://releases.llvm.org/3.9.1/tools/lld/docs/ReleaseNotes.html> Relnotes: yes
Diffstat (limited to 'contrib/llvm/lib/CodeGen/InterleavedAccessPass.cpp')
-rw-r--r--contrib/llvm/lib/CodeGen/InterleavedAccessPass.cpp132
1 files changed, 116 insertions, 16 deletions
diff --git a/contrib/llvm/lib/CodeGen/InterleavedAccessPass.cpp b/contrib/llvm/lib/CodeGen/InterleavedAccessPass.cpp
index 724f1d6..3f11119 100644
--- a/contrib/llvm/lib/CodeGen/InterleavedAccessPass.cpp
+++ b/contrib/llvm/lib/CodeGen/InterleavedAccessPass.cpp
@@ -1,6 +1,6 @@
-//=----------------------- InterleavedAccessPass.cpp -----------------------==//
+//===--------------------- InterleavedAccessPass.cpp ----------------------===//
//
-// The LLVM Compiler Infrastructure
+// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
@@ -8,16 +8,18 @@
//===----------------------------------------------------------------------===//
//
// This file implements the Interleaved Access pass, which identifies
-// interleaved memory accesses and transforms into target specific intrinsics.
+// interleaved memory accesses and transforms them into target specific
+// intrinsics.
//
// An interleaved load reads data from memory into several vectors, with
// DE-interleaving the data on a factor. An interleaved store writes several
// vectors to memory with RE-interleaving the data on a factor.
//
-// As interleaved accesses are hard to be identified in CodeGen (mainly because
-// the VECTOR_SHUFFLE DAG node is quite different from the shufflevector IR),
-// we identify and transform them to intrinsics in this pass. So the intrinsics
-// can be easily matched into target specific instructions later in CodeGen.
+// As interleaved accesses are difficult to identified in CodeGen (mainly
+// because the VECTOR_SHUFFLE DAG node is quite different from the shufflevector
+// IR), we identify and transform them to intrinsics in this pass so the
+// intrinsics can be easily matched into target specific instructions later in
+// CodeGen.
//
// E.g. An interleaved load (Factor = 2):
// %wide.vec = load <8 x i32>, <8 x i32>* %ptr
@@ -38,6 +40,7 @@
//===----------------------------------------------------------------------===//
#include "llvm/CodeGen/Passes.h"
+#include "llvm/IR/Dominators.h"
#include "llvm/IR/InstIterator.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/MathExtras.h"
@@ -56,10 +59,6 @@ static cl::opt<bool> LowerInterleavedAccesses(
static unsigned MaxFactor; // The maximum supported interleave factor.
-namespace llvm {
-static void initializeInterleavedAccessPass(PassRegistry &);
-}
-
namespace {
class InterleavedAccess : public FunctionPass {
@@ -67,7 +66,7 @@ class InterleavedAccess : public FunctionPass {
public:
static char ID;
InterleavedAccess(const TargetMachine *TM = nullptr)
- : FunctionPass(ID), TM(TM), TLI(nullptr) {
+ : FunctionPass(ID), DT(nullptr), TM(TM), TLI(nullptr) {
initializeInterleavedAccessPass(*PassRegistry::getPassRegistry());
}
@@ -75,7 +74,13 @@ public:
bool runOnFunction(Function &F) override;
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
+ AU.addRequired<DominatorTreeWrapperPass>();
+ AU.addPreserved<DominatorTreeWrapperPass>();
+ }
+
private:
+ DominatorTree *DT;
const TargetMachine *TM;
const TargetLowering *TLI;
@@ -86,13 +91,26 @@ private:
/// \brief Transform an interleaved store into target specific intrinsics.
bool lowerInterleavedStore(StoreInst *SI,
SmallVector<Instruction *, 32> &DeadInsts);
+
+ /// \brief Returns true if the uses of an interleaved load by the
+ /// extractelement instructions in \p Extracts can be replaced by uses of the
+ /// shufflevector instructions in \p Shuffles instead. If so, the necessary
+ /// replacements are also performed.
+ bool tryReplaceExtracts(ArrayRef<ExtractElementInst *> Extracts,
+ ArrayRef<ShuffleVectorInst *> Shuffles);
};
} // end anonymous namespace.
char InterleavedAccess::ID = 0;
-INITIALIZE_TM_PASS(InterleavedAccess, "interleaved-access",
- "Lower interleaved memory accesses to target specific intrinsics",
- false, false)
+INITIALIZE_TM_PASS_BEGIN(
+ InterleavedAccess, "interleaved-access",
+ "Lower interleaved memory accesses to target specific intrinsics", false,
+ false)
+INITIALIZE_PASS_DEPENDENCY(DominatorTreeWrapperPass)
+INITIALIZE_TM_PASS_END(
+ InterleavedAccess, "interleaved-access",
+ "Lower interleaved memory accesses to target specific intrinsics", false,
+ false)
FunctionPass *llvm::createInterleavedAccessPass(const TargetMachine *TM) {
return new InterleavedAccess(TM);
@@ -181,9 +199,18 @@ bool InterleavedAccess::lowerInterleavedLoad(
return false;
SmallVector<ShuffleVectorInst *, 4> Shuffles;
+ SmallVector<ExtractElementInst *, 4> Extracts;
- // Check if all users of this load are shufflevectors.
+ // Check if all users of this load are shufflevectors. If we encounter any
+ // users that are extractelement instructions, we save them to later check if
+ // they can be modifed to extract from one of the shufflevectors instead of
+ // the load.
for (auto UI = LI->user_begin(), E = LI->user_end(); UI != E; UI++) {
+ auto *Extract = dyn_cast<ExtractElementInst>(*UI);
+ if (Extract && isa<ConstantInt>(Extract->getIndexOperand())) {
+ Extracts.push_back(Extract);
+ continue;
+ }
ShuffleVectorInst *SVI = dyn_cast<ShuffleVectorInst>(*UI);
if (!SVI || !isa<UndefValue>(SVI->getOperand(1)))
return false;
@@ -219,6 +246,11 @@ bool InterleavedAccess::lowerInterleavedLoad(
Indices.push_back(Index);
}
+ // Try and modify users of the load that are extractelement instructions to
+ // use the shufflevector instructions instead of the load.
+ if (!tryReplaceExtracts(Extracts, Shuffles))
+ return false;
+
DEBUG(dbgs() << "IA: Found an interleaved load: " << *LI << "\n");
// Try to create target specific intrinsics to replace the load and shuffles.
@@ -232,6 +264,73 @@ bool InterleavedAccess::lowerInterleavedLoad(
return true;
}
+bool InterleavedAccess::tryReplaceExtracts(
+ ArrayRef<ExtractElementInst *> Extracts,
+ ArrayRef<ShuffleVectorInst *> Shuffles) {
+
+ // If there aren't any extractelement instructions to modify, there's nothing
+ // to do.
+ if (Extracts.empty())
+ return true;
+
+ // Maps extractelement instructions to vector-index pairs. The extractlement
+ // instructions will be modified to use the new vector and index operands.
+ DenseMap<ExtractElementInst *, std::pair<Value *, int>> ReplacementMap;
+
+ for (auto *Extract : Extracts) {
+
+ // The vector index that is extracted.
+ auto *IndexOperand = cast<ConstantInt>(Extract->getIndexOperand());
+ auto Index = IndexOperand->getSExtValue();
+
+ // Look for a suitable shufflevector instruction. The goal is to modify the
+ // extractelement instruction (which uses an interleaved load) to use one
+ // of the shufflevector instructions instead of the load.
+ for (auto *Shuffle : Shuffles) {
+
+ // If the shufflevector instruction doesn't dominate the extract, we
+ // can't create a use of it.
+ if (!DT->dominates(Shuffle, Extract))
+ continue;
+
+ // Inspect the indices of the shufflevector instruction. If the shuffle
+ // selects the same index that is extracted, we can modify the
+ // extractelement instruction.
+ SmallVector<int, 4> Indices;
+ Shuffle->getShuffleMask(Indices);
+ for (unsigned I = 0; I < Indices.size(); ++I)
+ if (Indices[I] == Index) {
+ assert(Extract->getOperand(0) == Shuffle->getOperand(0) &&
+ "Vector operations do not match");
+ ReplacementMap[Extract] = std::make_pair(Shuffle, I);
+ break;
+ }
+
+ // If we found a suitable shufflevector instruction, stop looking.
+ if (ReplacementMap.count(Extract))
+ break;
+ }
+
+ // If we did not find a suitable shufflevector instruction, the
+ // extractelement instruction cannot be modified, so we must give up.
+ if (!ReplacementMap.count(Extract))
+ return false;
+ }
+
+ // Finally, perform the replacements.
+ IRBuilder<> Builder(Extracts[0]->getContext());
+ for (auto &Replacement : ReplacementMap) {
+ auto *Extract = Replacement.first;
+ auto *Vector = Replacement.second.first;
+ auto Index = Replacement.second.second;
+ Builder.SetInsertPoint(Extract);
+ Extract->replaceAllUsesWith(Builder.CreateExtractElement(Vector, Index));
+ Extract->eraseFromParent();
+ }
+
+ return true;
+}
+
bool InterleavedAccess::lowerInterleavedStore(
StoreInst *SI, SmallVector<Instruction *, 32> &DeadInsts) {
if (!SI->isSimple())
@@ -264,6 +363,7 @@ bool InterleavedAccess::runOnFunction(Function &F) {
DEBUG(dbgs() << "*** " << getPassName() << ": " << F.getName() << "\n");
+ DT = &getAnalysis<DominatorTreeWrapperPass>().getDomTree();
TLI = TM->getSubtargetImpl(F)->getTargetLowering();
MaxFactor = TLI->getMaxSupportedInterleaveFactor();
OpenPOWER on IntegriCloud