path: root/contrib/llvm/lib/Target/PowerPC
author     dim <dim@FreeBSD.org>  2017-04-02 17:24:58 +0000
committer  dim <dim@FreeBSD.org>  2017-04-02 17:24:58 +0000
commit     60b571e49a90d38697b3aca23020d9da42fc7d7f (patch)
tree       99351324c24d6cb146b6285b6caffa4d26fce188 /contrib/llvm/lib/Target/PowerPC
parent     bea1b22c7a9bce1dfdd73e6e5b65bc4752215180 (diff)
Update clang, llvm, lld, lldb, compiler-rt and libc++ to 4.0.0 release:
MFC r309142 (by emaste):

  Add WITH_LLD_AS_LD build knob

  If set it installs LLD as /usr/bin/ld. LLD (as of version 3.9) is not
  capable of linking the world and kernel, but can self-host and link many
  substantial applications. GNU ld continues to be used for the world and
  kernel build, regardless of how this knob is set. It is on by default
  for arm64, and off for all other CPU architectures.

  Sponsored by:	The FreeBSD Foundation

MFC r310840:

  Reapply r310775, now it also builds correctly if lldb is disabled:

  Move llvm-objdump from CLANG_EXTRAS to installed by default

  We currently install three tools from binutils 2.17.50: as, ld, and
  objdump. Work is underway to migrate to a permissively-licensed
  toolchain, with one goal being the retirement of binutils 2.17.50.

  LLVM's llvm-objdump is intended to be compatible with GNU objdump,
  although it is currently missing some options and may have formatting
  differences. Enable it by default for testing and further investigation.
  It may later be changed to install as /usr/bin/objdump, if it becomes a
  fully viable replacement.

  Reviewed by:	emaste
  Differential Revision:	https://reviews.freebsd.org/D8879

MFC r312855 (by emaste):

  Rename LLD_AS_LD to LLD_IS_LD, for consistency with CLANG_IS_CC

  Reported by:	Dan McGregor <dan.mcgregor usask.ca>

MFC r313559 (by glebius):

  Don't check struct rtentry on FreeBSD, it is an internal kernel
  structure. On other systems it may be an API structure for
  SIOCADDRT/SIOCDELRT.

  Reviewed by:	emaste, dim

MFC r314152 (by jkim):

  Remove an assembler flag, which is redundant since r309124. The
  upstream took care of it by introducing the macro
  NO_EXEC_STACK_DIRECTIVE.

  http://llvm.org/viewvc/llvm-project?rev=273500&view=rev

  Reviewed by:	dim

MFC r314564:

  Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
  4.0.0 (branches/release_40 296509). The release will follow soon.

  Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11
  support to build; see UPDATING for more information.

  Also note that as of 4.0.0, lld should be able to link the base system
  on amd64 and aarch64. See the WITH_LLD_IS_LD setting in src.conf(5).
  Though please be aware that this is work in progress.

  Release notes for llvm, clang and lld will be available here:
  <http://releases.llvm.org/4.0.0/docs/ReleaseNotes.html>
  <http://releases.llvm.org/4.0.0/tools/clang/docs/ReleaseNotes.html>
  <http://releases.llvm.org/4.0.0/tools/lld/docs/ReleaseNotes.html>

  Thanks to Ed Maste, Jan Beich, Antoine Brodin and Eric Fiselier for
  their help.

  Relnotes:	yes
  Exp-run:	antoine
  PR:		215969, 216008

MFC r314708:

  For now, revert r287232 from upstream llvm trunk (by Daniil Fukalov):

    [SCEV] limit recursion depth of CompareSCEVComplexity

    Summary:
    CompareSCEVComplexity goes too deep (50+ on quite a big unrolled
    loop) and runs almost infinite time.

    Added a cache of "equal" SCEV pairs for earlier cutoff of further
    estimation. A recursion depth limit was also introduced as a
    parameter.

    Reviewers: sanjoy

    Subscribers: mzolotukhin, tstellarAMD, llvm-commits

    Differential Revision: https://reviews.llvm.org/D26389

  This commit is the cause of excessive compile times on skein_block.c
  (and possibly other files) during kernel builds on amd64. We never saw
  the problematic behavior described in this upstream commit, so for now
  it is better to revert it. An upstream bug has been filed here:
  https://bugs.llvm.org/show_bug.cgi?id=32142

  Reported by:	mjg
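For context, the depth-limit-plus-cache idea that the MFCs above and
below go back and forth on looks roughly like the following sketch. It
is an illustration only: SCEV is a stand-in forward declaration, and
MaxCompareDepth and the comparison body are illustrative names, not the
actual LLVM identifiers.

    // Sketch: memoize pairs already proven "equal" and cap recursion
    // depth, so pathological SCEV trees cannot make the comparison run
    // almost forever.
    #include <set>
    #include <utility>

    struct SCEV; // stand-in for llvm::SCEV

    constexpr unsigned MaxCompareDepth = 32;

    int CompareSCEVComplexity(
        std::set<std::pair<const SCEV *, const SCEV *>> &EqCache,
        const SCEV *LHS, const SCEV *RHS, unsigned Depth = 0) {
      if (LHS == RHS || EqCache.count({LHS, RHS}))
        return 0;                   // known equal: early cutoff
      if (Depth > MaxCompareDepth)
        return 0;                   // give up rather than recurse forever
      int Result = 0;               // the real code structurally compares
                                    // LHS and RHS here with Depth + 1
      if (Result == 0)
        EqCache.insert({LHS, RHS}); // remember equal pairs for next time
      return Result;
    }

r296992, pulled in below, then gives CompareValueComplexity its own,
much smaller, threshold.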
MFC r314795:

  Reapply r287232 from upstream llvm trunk (by Daniil Fukalov):

    [SCEV] limit recursion depth of CompareSCEVComplexity

    Summary:
    CompareSCEVComplexity goes too deep (50+ on quite a big unrolled
    loop) and runs almost infinite time.

    Added a cache of "equal" SCEV pairs for earlier cutoff of further
    estimation. A recursion depth limit was also introduced as a
    parameter.

    Reviewers: sanjoy

    Subscribers: mzolotukhin, tstellarAMD, llvm-commits

    Differential Revision: https://reviews.llvm.org/D26389

  Pull in r296992 from upstream llvm trunk (by Sanjoy Das):

    [SCEV] Decrease the recursion threshold for CompareValueComplexity

    Fixes PR32142.

    r287232 accidentally increased the recursion threshold for
    CompareValueComplexity from 2 to 32. This change reverses that by
    introducing a separate flag for CompareValueComplexity's threshold.

  The latter revision fixes the excessive compile times for
  skein_block.c.

MFC r314907 (by mmel):

  Unbreak ARMv6 world.

  The new compiler_rt library imported with clang 4.0.0 has several
  fatal issues (a non-functional __udivsi3, for example) with
  ARM-specific intrinsic functions. As a temporary workaround, until
  upstream solves these problems, disable all thumb[1][2] related
  features.

MFC r315016:

  Update clang, llvm, lld, lldb, compiler-rt and libc++ to the 4.0.0
  release. We were already very close to the last release candidate, so
  this is a pretty minor update.

  Relnotes:	yes

MFC r316005:

  Revert r314907, and pull in r298713 from upstream compiler-rt trunk
  (by Weiming Zhao):

    builtins: Select correct code fragments when compiling for
    Thumb1/Thumb2/ARM ISA.

    Summary:
    The value of __ARM_ARCH_ISA_THUMB isn't based on the actual
    compilation mode (-mthumb, -marm); it reflects the capability of
    the given CPU. Due to this:
    - use __thumb__ and __thumb2__ instead of __ARM_ARCH_ISA_THUMB
    - use the '.thumb' directive consistently in all affected files
    - decorate all thumb functions using DEFINE_COMPILERRT_THUMB_FUNCTION()

    Note: This patch doesn't fix the broken Thumb1 variant of __udivsi3!

    Reviewers: weimingz, rengolin, compnerd

    Subscribers: aemerson, dim

    Differential Revision: https://reviews.llvm.org/D30938

  Discussed with: mmel
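A minimal sketch of the mode-based selection that r298713 describes,
assuming a compiler-rt-style C source; the bodies are placeholders, not
the real builtins:

    /* __ARM_ARCH_ISA_THUMB describes what the target CPU is capable of;
       __thumb__ / __thumb2__ describe the mode this translation unit is
       actually compiled in, which is what the builtins must match. */
    #if defined(__thumb2__)
      /* Thumb2 encoding of the builtin goes here. */
    #elif defined(__thumb__)
      /* Thumb1 encoding; per the log above, Thumb1 __udivsi3 stays broken. */
    #else
      /* ARM-mode encoding. */
    #endif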
Diffstat (limited to 'contrib/llvm/lib/Target/PowerPC')
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp | 369
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp | 39
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp | 61
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/InstPrinter/PPCInstPrinter.h | 1
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp | 3
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp | 22
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp | 3
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h | 10
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/P9InstrResources.td | 808
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPC.td | 6
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp | 209
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCBoolRetToInt.cpp | 80
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCBranchSelector.cpp | 81
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCCTRLoops.cpp | 8
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCCallingConv.td | 26
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCEarlyReturn.cpp | 4
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCFastISel.cpp | 85
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp | 212
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCHazardRecognizers.cpp | 2
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp | 222
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCISelLowering.cpp | 955
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCISelLowering.h | 78
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCInstr64Bit.td | 13
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCInstrAltivec.td | 142
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCInstrFormats.td | 65
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp | 262
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.h | 29
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td | 156
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCInstrQPX.td | 10
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCInstrVSX.td | 775
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCLoopPreIncPrep.cpp | 38
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCMCInstLower.cpp | 8
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp | 164
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCQPXLoadSplat.cpp | 2
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp | 31
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.h | 2
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.td | 35
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCSchedule.td | 4
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCScheduleE500mc.td | 8
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCScheduleE5500.td | 10
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCScheduleP9.td | 335
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCSubtarget.h | 3
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCTLSDynamicCall.cpp | 26
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp | 27
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCTargetObjectFile.cpp | 8
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCTargetObjectFile.h | 3
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp | 28
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h | 10
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCVSXCopy.cpp | 42
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCVSXFMAMutate.cpp | 78
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/PPCVSXSwapRemoval.cpp | 3
-rw-r--r--  contrib/llvm/lib/Target/PowerPC/TargetInfo/PowerPCTargetInfo.cpp | 25
52 files changed, 4389 insertions, 1237 deletions
diff --git a/contrib/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp b/contrib/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
index 4181775..52432a5 100644
--- a/contrib/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/AsmParser/PPCAsmParser.cpp
@@ -83,6 +83,16 @@ static const MCPhysReg FRegs[32] = {
PPC::F24, PPC::F25, PPC::F26, PPC::F27,
PPC::F28, PPC::F29, PPC::F30, PPC::F31
};
+static const MCPhysReg VFRegs[32] = {
+ PPC::VF0, PPC::VF1, PPC::VF2, PPC::VF3,
+ PPC::VF4, PPC::VF5, PPC::VF6, PPC::VF7,
+ PPC::VF8, PPC::VF9, PPC::VF10, PPC::VF11,
+ PPC::VF12, PPC::VF13, PPC::VF14, PPC::VF15,
+ PPC::VF16, PPC::VF17, PPC::VF18, PPC::VF19,
+ PPC::VF20, PPC::VF21, PPC::VF22, PPC::VF23,
+ PPC::VF24, PPC::VF25, PPC::VF26, PPC::VF27,
+ PPC::VF28, PPC::VF29, PPC::VF30, PPC::VF31
+};
static const MCPhysReg VRegs[32] = {
PPC::V0, PPC::V1, PPC::V2, PPC::V3,
PPC::V4, PPC::V5, PPC::V6, PPC::V7,
@@ -103,14 +113,14 @@ static const MCPhysReg VSRegs[64] = {
PPC::VSL24, PPC::VSL25, PPC::VSL26, PPC::VSL27,
PPC::VSL28, PPC::VSL29, PPC::VSL30, PPC::VSL31,
- PPC::VSH0, PPC::VSH1, PPC::VSH2, PPC::VSH3,
- PPC::VSH4, PPC::VSH5, PPC::VSH6, PPC::VSH7,
- PPC::VSH8, PPC::VSH9, PPC::VSH10, PPC::VSH11,
- PPC::VSH12, PPC::VSH13, PPC::VSH14, PPC::VSH15,
- PPC::VSH16, PPC::VSH17, PPC::VSH18, PPC::VSH19,
- PPC::VSH20, PPC::VSH21, PPC::VSH22, PPC::VSH23,
- PPC::VSH24, PPC::VSH25, PPC::VSH26, PPC::VSH27,
- PPC::VSH28, PPC::VSH29, PPC::VSH30, PPC::VSH31
+ PPC::V0, PPC::V1, PPC::V2, PPC::V3,
+ PPC::V4, PPC::V5, PPC::V6, PPC::V7,
+ PPC::V8, PPC::V9, PPC::V10, PPC::V11,
+ PPC::V12, PPC::V13, PPC::V14, PPC::V15,
+ PPC::V16, PPC::V17, PPC::V18, PPC::V19,
+ PPC::V20, PPC::V21, PPC::V22, PPC::V23,
+ PPC::V24, PPC::V25, PPC::V26, PPC::V27,
+ PPC::V28, PPC::V29, PPC::V30, PPC::V31
};
static const MCPhysReg VSFRegs[64] = {
PPC::F0, PPC::F1, PPC::F2, PPC::F3,
@@ -246,13 +256,11 @@ class PPCAsmParser : public MCTargetAsmParser {
bool IsDarwin;
void Warning(SMLoc L, const Twine &Msg) { getParser().Warning(L, Msg); }
- bool Error(SMLoc L, const Twine &Msg) { return getParser().Error(L, Msg); }
bool isPPC64() const { return IsPPC64; }
bool isDarwin() const { return IsDarwin; }
- bool MatchRegisterName(const AsmToken &Tok,
- unsigned &RegNo, int64_t &IntVal);
+ bool MatchRegisterName(unsigned &RegNo, int64_t &IntVal);
bool ParseRegister(unsigned &RegNo, SMLoc &StartLoc, SMLoc &EndLoc) override;
@@ -264,8 +272,8 @@ class PPCAsmParser : public MCTargetAsmParser {
bool ParseOperand(OperandVector &Operands);
- bool ParseDirectiveWord(unsigned Size, SMLoc L);
- bool ParseDirectiveTC(unsigned Size, SMLoc L);
+ bool ParseDirectiveWord(unsigned Size, AsmToken ID);
+ bool ParseDirectiveTC(unsigned Size, AsmToken ID);
bool ParseDirectiveMachine(SMLoc L);
bool ParseDarwinDirectiveMachine(SMLoc L);
bool ParseDirectiveAbiVersion(SMLoc L);
@@ -545,6 +553,7 @@ public:
&& isUInt<5>(getImm())); }
bool isCRBitMask() const { return Kind == Immediate && isUInt<8>(getImm()) &&
isPowerOf2_32(getImm()); }
+ bool isATBitsAsHint() const { return false; }
bool isMem() const override { return false; }
bool isReg() const override { return false; }
@@ -596,6 +605,11 @@ public:
Inst.addOperand(MCOperand::createReg(FRegs[getReg()]));
}
+ void addRegVFRCOperands(MCInst &Inst, unsigned N) const {
+ assert(N == 1 && "Invalid number of operands!");
+ Inst.addOperand(MCOperand::createReg(VFRegs[getReg()]));
+ }
+
void addRegVRRCOperands(MCInst &Inst, unsigned N) const {
assert(N == 1 && "Invalid number of operands!");
Inst.addOperand(MCOperand::createReg(VRegs[getReg()]));
@@ -874,6 +888,23 @@ void PPCAsmParser::ProcessInstruction(MCInst &Inst,
Inst = TmpInst;
break;
}
+ case PPC::DCBFx:
+ case PPC::DCBFL:
+ case PPC::DCBFLP: {
+ int L = 0;
+ if (Opcode == PPC::DCBFL)
+ L = 1;
+ else if (Opcode == PPC::DCBFLP)
+ L = 3;
+
+ MCInst TmpInst;
+ TmpInst.setOpcode(PPC::DCBF);
+ TmpInst.addOperand(MCOperand::createImm(L));
+ TmpInst.addOperand(Inst.getOperand(0));
+ TmpInst.addOperand(Inst.getOperand(1));
+ Inst = TmpInst;
+ break;
+ }
case PPC::LAx: {
MCInst TmpInst;
TmpInst.setOpcode(PPC::LA);
@@ -1263,68 +1294,54 @@ bool PPCAsmParser::MatchAndEmitInstruction(SMLoc IDLoc, unsigned &Opcode,
llvm_unreachable("Implement any new match types added!");
}
-bool PPCAsmParser::
-MatchRegisterName(const AsmToken &Tok, unsigned &RegNo, int64_t &IntVal) {
- if (Tok.is(AsmToken::Identifier)) {
- StringRef Name = Tok.getString();
-
+bool PPCAsmParser::MatchRegisterName(unsigned &RegNo, int64_t &IntVal) {
+ if (getParser().getTok().is(AsmToken::Identifier)) {
+ StringRef Name = getParser().getTok().getString();
if (Name.equals_lower("lr")) {
RegNo = isPPC64()? PPC::LR8 : PPC::LR;
IntVal = 8;
- return false;
} else if (Name.equals_lower("ctr")) {
RegNo = isPPC64()? PPC::CTR8 : PPC::CTR;
IntVal = 9;
- return false;
} else if (Name.equals_lower("vrsave")) {
RegNo = PPC::VRSAVE;
IntVal = 256;
- return false;
} else if (Name.startswith_lower("r") &&
!Name.substr(1).getAsInteger(10, IntVal) && IntVal < 32) {
RegNo = isPPC64()? XRegs[IntVal] : RRegs[IntVal];
- return false;
} else if (Name.startswith_lower("f") &&
!Name.substr(1).getAsInteger(10, IntVal) && IntVal < 32) {
RegNo = FRegs[IntVal];
- return false;
} else if (Name.startswith_lower("vs") &&
!Name.substr(2).getAsInteger(10, IntVal) && IntVal < 64) {
RegNo = VSRegs[IntVal];
- return false;
} else if (Name.startswith_lower("v") &&
!Name.substr(1).getAsInteger(10, IntVal) && IntVal < 32) {
RegNo = VRegs[IntVal];
- return false;
} else if (Name.startswith_lower("q") &&
!Name.substr(1).getAsInteger(10, IntVal) && IntVal < 32) {
RegNo = QFRegs[IntVal];
- return false;
} else if (Name.startswith_lower("cr") &&
!Name.substr(2).getAsInteger(10, IntVal) && IntVal < 8) {
RegNo = CRRegs[IntVal];
- return false;
- }
+ } else
+ return true;
+ getParser().Lex();
+ return false;
}
-
return true;
}
bool PPCAsmParser::
ParseRegister(unsigned &RegNo, SMLoc &StartLoc, SMLoc &EndLoc) {
- MCAsmParser &Parser = getParser();
- const AsmToken &Tok = Parser.getTok();
+ const AsmToken &Tok = getParser().getTok();
StartLoc = Tok.getLoc();
EndLoc = Tok.getEndLoc();
RegNo = 0;
int64_t IntVal;
-
- if (!MatchRegisterName(Tok, RegNo, IntVal)) {
- Parser.Lex(); // Eat identifier token.
- return false;
- }
-
- return Error(StartLoc, "invalid register name");
+ if (MatchRegisterName(RegNo, IntVal))
+ return TokError("invalid register name");
+ return false;
}
/// Extract \code @l/@ha \endcode modifier from expression. Recursively scan
@@ -1550,14 +1567,21 @@ bool PPCAsmParser::ParseOperand(OperandVector &Operands) {
Parser.Lex(); // Eat the '%'.
unsigned RegNo;
int64_t IntVal;
- if (!MatchRegisterName(Parser.getTok(), RegNo, IntVal)) {
- Parser.Lex(); // Eat the identifier token.
- Operands.push_back(PPCOperand::CreateImm(IntVal, S, E, isPPC64()));
- return false;
- }
- return Error(S, "invalid register name");
+ if (MatchRegisterName(RegNo, IntVal))
+ return Error(S, "invalid register name");
+
+ Operands.push_back(PPCOperand::CreateImm(IntVal, S, E, isPPC64()));
+ return false;
case AsmToken::Identifier:
+ case AsmToken::LParen:
+ case AsmToken::Plus:
+ case AsmToken::Minus:
+ case AsmToken::Integer:
+ case AsmToken::Dot:
+ case AsmToken::Dollar:
+ case AsmToken::Exclaim:
+ case AsmToken::Tilde:
// Note that non-register-name identifiers from the compiler will begin
// with '_', 'L'/'l' or '"'. Of course, handwritten asm could include
// identifiers like r31foo - so we fall through in the event that parsing
@@ -1565,25 +1589,17 @@ bool PPCAsmParser::ParseOperand(OperandVector &Operands) {
if (isDarwin()) {
unsigned RegNo;
int64_t IntVal;
- if (!MatchRegisterName(Parser.getTok(), RegNo, IntVal)) {
- Parser.Lex(); // Eat the identifier token.
+ if (!MatchRegisterName(RegNo, IntVal)) {
Operands.push_back(PPCOperand::CreateImm(IntVal, S, E, isPPC64()));
return false;
}
}
- // Fall-through to process non-register-name identifiers as expression.
- // All other expressions
- case AsmToken::LParen:
- case AsmToken::Plus:
- case AsmToken::Minus:
- case AsmToken::Integer:
- case AsmToken::Dot:
- case AsmToken::Dollar:
- case AsmToken::Exclaim:
- case AsmToken::Tilde:
+ // All other expressions
+
if (!ParseExpression(EVal))
break;
- /* fall through */
+ // Fall-through
+ LLVM_FALLTHROUGH;
default:
return Error(S, "unknown operand");
}
@@ -1621,40 +1637,33 @@ bool PPCAsmParser::ParseOperand(OperandVector &Operands) {
case AsmToken::Percent:
Parser.Lex(); // Eat the '%'.
unsigned RegNo;
- if (MatchRegisterName(Parser.getTok(), RegNo, IntVal))
+ if (MatchRegisterName(RegNo, IntVal))
return Error(S, "invalid register name");
- Parser.Lex(); // Eat the identifier token.
break;
case AsmToken::Integer:
- if (!isDarwin()) {
- if (getParser().parseAbsoluteExpression(IntVal) ||
- IntVal < 0 || IntVal > 31)
- return Error(S, "invalid register number");
- } else {
+ if (isDarwin())
return Error(S, "unexpected integer value");
- }
+ else if (getParser().parseAbsoluteExpression(IntVal) || IntVal < 0 ||
+ IntVal > 31)
+ return Error(S, "invalid register number");
break;
-
case AsmToken::Identifier:
if (isDarwin()) {
unsigned RegNo;
- if (!MatchRegisterName(Parser.getTok(), RegNo, IntVal)) {
- Parser.Lex(); // Eat the identifier token.
+ if (!MatchRegisterName(RegNo, IntVal)) {
break;
}
}
- // Fall-through..
+ LLVM_FALLTHROUGH;
default:
return Error(S, "invalid memory operand");
}
- if (getLexer().isNot(AsmToken::RParen))
- return Error(Parser.getTok().getLoc(), "missing ')'");
E = Parser.getTok().getLoc();
- Parser.Lex(); // Eat the ')'.
-
+ if (parseToken(AsmToken::RParen, "missing ')'"))
+ return true;
Operands.push_back(PPCOperand::CreateImm(IntVal, S, E, isPPC64()));
}
@@ -1668,14 +1677,12 @@ bool PPCAsmParser::ParseInstruction(ParseInstructionInfo &Info, StringRef Name,
// If the next character is a '+' or '-', we need to add it to the
// instruction name, to match what TableGen is doing.
std::string NewOpcode;
- if (getLexer().is(AsmToken::Plus)) {
- getLexer().Lex();
+ if (parseOptionalToken(AsmToken::Plus)) {
NewOpcode = Name;
NewOpcode += '+';
Name = NewOpcode;
}
- if (getLexer().is(AsmToken::Minus)) {
- getLexer().Lex();
+ if (parseOptionalToken(AsmToken::Minus)) {
NewOpcode = Name;
NewOpcode += '-';
Name = NewOpcode;
@@ -1700,20 +1707,15 @@ bool PPCAsmParser::ParseInstruction(ParseInstructionInfo &Info, StringRef Name,
}
// If there are no more operands then finish
- if (getLexer().is(AsmToken::EndOfStatement))
+ if (parseOptionalToken(AsmToken::EndOfStatement))
return false;
// Parse the first operand
if (ParseOperand(Operands))
return true;
- while (getLexer().isNot(AsmToken::EndOfStatement) &&
- getLexer().is(AsmToken::Comma)) {
- // Consume the comma token
- Lex();
-
- // Parse the next operand
- if (ParseOperand(Operands))
+ while (!parseOptionalToken(AsmToken::EndOfStatement)) {
+ if (parseToken(AsmToken::Comma) || ParseOperand(Operands))
return true;
}
@@ -1738,108 +1740,94 @@ bool PPCAsmParser::ParseInstruction(ParseInstructionInfo &Info, StringRef Name,
/// ParseDirective parses the PPC specific directives
bool PPCAsmParser::ParseDirective(AsmToken DirectiveID) {
StringRef IDVal = DirectiveID.getIdentifier();
- if (!isDarwin()) {
- if (IDVal == ".word")
- return ParseDirectiveWord(2, DirectiveID.getLoc());
- if (IDVal == ".llong")
- return ParseDirectiveWord(8, DirectiveID.getLoc());
- if (IDVal == ".tc")
- return ParseDirectiveTC(isPPC64()? 8 : 4, DirectiveID.getLoc());
+ if (isDarwin()) {
if (IDVal == ".machine")
- return ParseDirectiveMachine(DirectiveID.getLoc());
- if (IDVal == ".abiversion")
- return ParseDirectiveAbiVersion(DirectiveID.getLoc());
- if (IDVal == ".localentry")
- return ParseDirectiveLocalEntry(DirectiveID.getLoc());
- } else {
- if (IDVal == ".machine")
- return ParseDarwinDirectiveMachine(DirectiveID.getLoc());
- }
- return true;
+ ParseDarwinDirectiveMachine(DirectiveID.getLoc());
+ else
+ return true;
+ } else if (IDVal == ".word")
+ ParseDirectiveWord(2, DirectiveID);
+ else if (IDVal == ".llong")
+ ParseDirectiveWord(8, DirectiveID);
+ else if (IDVal == ".tc")
+ ParseDirectiveTC(isPPC64() ? 8 : 4, DirectiveID);
+ else if (IDVal == ".machine")
+ ParseDirectiveMachine(DirectiveID.getLoc());
+ else if (IDVal == ".abiversion")
+ ParseDirectiveAbiVersion(DirectiveID.getLoc());
+ else if (IDVal == ".localentry")
+ ParseDirectiveLocalEntry(DirectiveID.getLoc());
+ else
+ return true;
+ return false;
}
/// ParseDirectiveWord
/// ::= .word [ expression (, expression)* ]
-bool PPCAsmParser::ParseDirectiveWord(unsigned Size, SMLoc L) {
- MCAsmParser &Parser = getParser();
- if (getLexer().isNot(AsmToken::EndOfStatement)) {
- for (;;) {
- const MCExpr *Value;
- SMLoc ExprLoc = getLexer().getLoc();
- if (getParser().parseExpression(Value))
- return false;
-
- if (const auto *MCE = dyn_cast<MCConstantExpr>(Value)) {
- assert(Size <= 8 && "Invalid size");
- uint64_t IntValue = MCE->getValue();
- if (!isUIntN(8 * Size, IntValue) && !isIntN(8 * Size, IntValue))
- return Error(ExprLoc, "literal value out of range for directive");
- getStreamer().EmitIntValue(IntValue, Size);
- } else {
- getStreamer().EmitValue(Value, Size, ExprLoc);
- }
-
- if (getLexer().is(AsmToken::EndOfStatement))
- break;
-
- if (getLexer().isNot(AsmToken::Comma))
- return Error(L, "unexpected token in directive");
- Parser.Lex();
- }
- }
+bool PPCAsmParser::ParseDirectiveWord(unsigned Size, AsmToken ID) {
+ auto parseOp = [&]() -> bool {
+ const MCExpr *Value;
+ SMLoc ExprLoc = getParser().getTok().getLoc();
+ if (getParser().parseExpression(Value))
+ return true;
+ if (const auto *MCE = dyn_cast<MCConstantExpr>(Value)) {
+ assert(Size <= 8 && "Invalid size");
+ uint64_t IntValue = MCE->getValue();
+ if (!isUIntN(8 * Size, IntValue) && !isIntN(8 * Size, IntValue))
+ return Error(ExprLoc, "literal value out of range for '" +
+ ID.getIdentifier() + "' directive");
+ getStreamer().EmitIntValue(IntValue, Size);
+ } else
+ getStreamer().EmitValue(Value, Size, ExprLoc);
+ return false;
+ };
- Parser.Lex();
+ if (parseMany(parseOp))
+ return addErrorSuffix(" in '" + ID.getIdentifier() + "' directive");
return false;
}
/// ParseDirectiveTC
/// ::= .tc [ symbol (, expression)* ]
-bool PPCAsmParser::ParseDirectiveTC(unsigned Size, SMLoc L) {
+bool PPCAsmParser::ParseDirectiveTC(unsigned Size, AsmToken ID) {
MCAsmParser &Parser = getParser();
// Skip TC symbol, which is only used with XCOFF.
while (getLexer().isNot(AsmToken::EndOfStatement)
&& getLexer().isNot(AsmToken::Comma))
Parser.Lex();
- if (getLexer().isNot(AsmToken::Comma)) {
- Error(L, "unexpected token in directive");
- return false;
- }
- Parser.Lex();
+ if (parseToken(AsmToken::Comma))
+ return addErrorSuffix(" in '.tc' directive");
// Align to word size.
getParser().getStreamer().EmitValueToAlignment(Size);
// Emit expressions.
- return ParseDirectiveWord(Size, L);
+ return ParseDirectiveWord(Size, ID);
}
/// ParseDirectiveMachine (ELF platforms)
/// ::= .machine [ cpu | "push" | "pop" ]
bool PPCAsmParser::ParseDirectiveMachine(SMLoc L) {
MCAsmParser &Parser = getParser();
- if (getLexer().isNot(AsmToken::Identifier) &&
- getLexer().isNot(AsmToken::String)) {
- Error(L, "unexpected token in directive");
- return false;
- }
+ if (Parser.getTok().isNot(AsmToken::Identifier) &&
+ Parser.getTok().isNot(AsmToken::String))
+ return Error(L, "unexpected token in '.machine' directive");
StringRef CPU = Parser.getTok().getIdentifier();
- Parser.Lex();
// FIXME: Right now, the parser always allows any available
// instruction, so the .machine directive is not useful.
// Implement ".machine any" (by doing nothing) for the benefit
// of existing assembler code. Likewise, we can then implement
// ".machine push" and ".machine pop" as no-op.
- if (CPU != "any" && CPU != "push" && CPU != "pop") {
- Error(L, "unrecognized machine type");
- return false;
- }
+ if (CPU != "any" && CPU != "push" && CPU != "pop")
+ return TokError("unrecognized machine type");
+
+ Parser.Lex();
+
+ if (parseToken(AsmToken::EndOfStatement))
+ return addErrorSuffix(" in '.machine' directive");
- if (getLexer().isNot(AsmToken::EndOfStatement)) {
- Error(L, "unexpected token in directive");
- return false;
- }
PPCTargetStreamer &TStreamer =
*static_cast<PPCTargetStreamer *>(
getParser().getStreamer().getTargetStreamer());
@@ -1852,11 +1840,9 @@ bool PPCAsmParser::ParseDirectiveMachine(SMLoc L) {
/// ::= .machine cpu-identifier
bool PPCAsmParser::ParseDarwinDirectiveMachine(SMLoc L) {
MCAsmParser &Parser = getParser();
- if (getLexer().isNot(AsmToken::Identifier) &&
- getLexer().isNot(AsmToken::String)) {
- Error(L, "unexpected token in directive");
- return false;
- }
+ if (Parser.getTok().isNot(AsmToken::Identifier) &&
+ Parser.getTok().isNot(AsmToken::String))
+ return Error(L, "unexpected token in directive");
StringRef CPU = Parser.getTok().getIdentifier();
Parser.Lex();
@@ -1864,25 +1850,14 @@ bool PPCAsmParser::ParseDarwinDirectiveMachine(SMLoc L) {
// FIXME: this is only the 'default' set of cpu variants.
// However we don't act on this information at present, this is simply
// allowing parsing to proceed with minimal sanity checking.
- if (CPU != "ppc7400" && CPU != "ppc" && CPU != "ppc64") {
- Error(L, "unrecognized cpu type");
- return false;
- }
-
- if (isPPC64() && (CPU == "ppc7400" || CPU == "ppc")) {
- Error(L, "wrong cpu type specified for 64bit");
- return false;
- }
- if (!isPPC64() && CPU == "ppc64") {
- Error(L, "wrong cpu type specified for 32bit");
- return false;
- }
-
- if (getLexer().isNot(AsmToken::EndOfStatement)) {
- Error(L, "unexpected token in directive");
- return false;
- }
-
+ if (check(CPU != "ppc7400" && CPU != "ppc" && CPU != "ppc64", L,
+ "unrecognized cpu type") ||
+ check(isPPC64() && (CPU == "ppc7400" || CPU == "ppc"), L,
+ "wrong cpu type specified for 64bit") ||
+ check(!isPPC64() && CPU == "ppc64", L,
+ "wrong cpu type specified for 32bit") ||
+ parseToken(AsmToken::EndOfStatement))
+ return addErrorSuffix(" in '.machine' directive");
return false;
}
@@ -1890,14 +1865,10 @@ bool PPCAsmParser::ParseDarwinDirectiveMachine(SMLoc L) {
/// ::= .abiversion constant-expression
bool PPCAsmParser::ParseDirectiveAbiVersion(SMLoc L) {
int64_t AbiVersion;
- if (getParser().parseAbsoluteExpression(AbiVersion)){
- Error(L, "expected constant expression");
- return false;
- }
- if (getLexer().isNot(AsmToken::EndOfStatement)) {
- Error(L, "unexpected token in directive");
- return false;
- }
+ if (check(getParser().parseAbsoluteExpression(AbiVersion), L,
+ "expected constant expression") ||
+ parseToken(AsmToken::EndOfStatement))
+ return addErrorSuffix(" in '.abiversion' directive");
PPCTargetStreamer &TStreamer =
*static_cast<PPCTargetStreamer *>(
@@ -1911,28 +1882,16 @@ bool PPCAsmParser::ParseDirectiveAbiVersion(SMLoc L) {
/// ::= .localentry symbol, expression
bool PPCAsmParser::ParseDirectiveLocalEntry(SMLoc L) {
StringRef Name;
- if (getParser().parseIdentifier(Name)) {
- Error(L, "expected identifier in directive");
- return false;
- }
- MCSymbolELF *Sym = cast<MCSymbolELF>(getContext().getOrCreateSymbol(Name));
-
- if (getLexer().isNot(AsmToken::Comma)) {
- Error(L, "unexpected token in directive");
- return false;
- }
- Lex();
+ if (getParser().parseIdentifier(Name))
+ return Error(L, "expected identifier in '.localentry' directive");
+ MCSymbolELF *Sym = cast<MCSymbolELF>(getContext().getOrCreateSymbol(Name));
const MCExpr *Expr;
- if (getParser().parseExpression(Expr)) {
- Error(L, "expected expression");
- return false;
- }
- if (getLexer().isNot(AsmToken::EndOfStatement)) {
- Error(L, "unexpected token in directive");
- return false;
- }
+ if (parseToken(AsmToken::Comma) ||
+ check(getParser().parseExpression(Expr), L, "expected expression") ||
+ parseToken(AsmToken::EndOfStatement))
+ return addErrorSuffix(" in '.localentry' directive");
PPCTargetStreamer &TStreamer =
*static_cast<PPCTargetStreamer *>(
@@ -1946,9 +1905,9 @@ bool PPCAsmParser::ParseDirectiveLocalEntry(SMLoc L) {
/// Force static initialization.
extern "C" void LLVMInitializePowerPCAsmParser() {
- RegisterMCAsmParser<PPCAsmParser> A(ThePPC32Target);
- RegisterMCAsmParser<PPCAsmParser> B(ThePPC64Target);
- RegisterMCAsmParser<PPCAsmParser> C(ThePPC64LETarget);
+ RegisterMCAsmParser<PPCAsmParser> A(getThePPC32Target());
+ RegisterMCAsmParser<PPCAsmParser> B(getThePPC64Target());
+ RegisterMCAsmParser<PPCAsmParser> C(getThePPC64LETarget());
}
#define GET_REGISTER_MATCHER
diff --git a/contrib/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp b/contrib/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp
index 6ea4fb1..12ffbfd 100644
--- a/contrib/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/Disassembler/PPCDisassembler.cpp
@@ -51,11 +51,11 @@ static MCDisassembler *createPPCLEDisassembler(const Target &T,
extern "C" void LLVMInitializePowerPCDisassembler() {
// Register the disassembler for each target.
- TargetRegistry::RegisterMCDisassembler(ThePPC32Target,
+ TargetRegistry::RegisterMCDisassembler(getThePPC32Target(),
createPPCDisassembler);
- TargetRegistry::RegisterMCDisassembler(ThePPC64Target,
+ TargetRegistry::RegisterMCDisassembler(getThePPC64Target(),
createPPCDisassembler);
- TargetRegistry::RegisterMCDisassembler(ThePPC64LETarget,
+ TargetRegistry::RegisterMCDisassembler(getThePPC64LETarget(),
createPPCLEDisassembler);
}
@@ -89,6 +89,17 @@ static const unsigned FRegs[] = {
PPC::F28, PPC::F29, PPC::F30, PPC::F31
};
+static const unsigned VFRegs[] = {
+ PPC::VF0, PPC::VF1, PPC::VF2, PPC::VF3,
+ PPC::VF4, PPC::VF5, PPC::VF6, PPC::VF7,
+ PPC::VF8, PPC::VF9, PPC::VF10, PPC::VF11,
+ PPC::VF12, PPC::VF13, PPC::VF14, PPC::VF15,
+ PPC::VF16, PPC::VF17, PPC::VF18, PPC::VF19,
+ PPC::VF20, PPC::VF21, PPC::VF22, PPC::VF23,
+ PPC::VF24, PPC::VF25, PPC::VF26, PPC::VF27,
+ PPC::VF28, PPC::VF29, PPC::VF30, PPC::VF31
+};
+
static const unsigned VRegs[] = {
PPC::V0, PPC::V1, PPC::V2, PPC::V3,
PPC::V4, PPC::V5, PPC::V6, PPC::V7,
@@ -110,14 +121,14 @@ static const unsigned VSRegs[] = {
PPC::VSL24, PPC::VSL25, PPC::VSL26, PPC::VSL27,
PPC::VSL28, PPC::VSL29, PPC::VSL30, PPC::VSL31,
- PPC::VSH0, PPC::VSH1, PPC::VSH2, PPC::VSH3,
- PPC::VSH4, PPC::VSH5, PPC::VSH6, PPC::VSH7,
- PPC::VSH8, PPC::VSH9, PPC::VSH10, PPC::VSH11,
- PPC::VSH12, PPC::VSH13, PPC::VSH14, PPC::VSH15,
- PPC::VSH16, PPC::VSH17, PPC::VSH18, PPC::VSH19,
- PPC::VSH20, PPC::VSH21, PPC::VSH22, PPC::VSH23,
- PPC::VSH24, PPC::VSH25, PPC::VSH26, PPC::VSH27,
- PPC::VSH28, PPC::VSH29, PPC::VSH30, PPC::VSH31
+ PPC::V0, PPC::V1, PPC::V2, PPC::V3,
+ PPC::V4, PPC::V5, PPC::V6, PPC::V7,
+ PPC::V8, PPC::V9, PPC::V10, PPC::V11,
+ PPC::V12, PPC::V13, PPC::V14, PPC::V15,
+ PPC::V16, PPC::V17, PPC::V18, PPC::V19,
+ PPC::V20, PPC::V21, PPC::V22, PPC::V23,
+ PPC::V24, PPC::V25, PPC::V26, PPC::V27,
+ PPC::V28, PPC::V29, PPC::V30, PPC::V31
};
static const unsigned VSFRegs[] = {
@@ -242,6 +253,12 @@ static DecodeStatus DecodeF8RCRegisterClass(MCInst &Inst, uint64_t RegNo,
return decodeRegisterClass(Inst, RegNo, FRegs);
}
+static DecodeStatus DecodeVFRCRegisterClass(MCInst &Inst, uint64_t RegNo,
+ uint64_t Address,
+ const void *Decoder) {
+ return decodeRegisterClass(Inst, RegNo, VFRegs);
+}
+
static DecodeStatus DecodeVRRCRegisterClass(MCInst &Inst, uint64_t RegNo,
uint64_t Address,
const void *Decoder) {
diff --git a/contrib/llvm/lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp b/contrib/llvm/lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
index d9d9b4f1..609d959 100644
--- a/contrib/llvm/lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/InstPrinter/PPCInstPrinter.cpp
@@ -12,6 +12,7 @@
//===----------------------------------------------------------------------===//
#include "PPCInstPrinter.h"
+#include "PPCInstrInfo.h"
#include "MCTargetDesc/PPCMCTargetDesc.h"
#include "MCTargetDesc/PPCPredicates.h"
#include "llvm/MC/MCExpr.h"
@@ -33,6 +34,11 @@ static cl::opt<bool>
FullRegNames("ppc-asm-full-reg-names", cl::Hidden, cl::init(false),
cl::desc("Use full register names when printing assembly"));
+// Useful for testing purposes. Prints vs{31-63} as v{0-31} respectively.
+static cl::opt<bool>
+ShowVSRNumsAsVR("ppc-vsr-nums-as-vr", cl::Hidden, cl::init(false),
+ cl::desc("Prints full register names with vs{31-63} as v{0-31}"));
+
#define PRINT_ALIAS_INSTR
#include "PPCGenAsmWriter.inc"
@@ -135,6 +141,25 @@ void PPCInstPrinter::printInst(const MCInst *MI, raw_ostream &O,
printAnnotation(O, Annot);
return;
}
+
+ if (MI->getOpcode() == PPC::DCBF) {
+ unsigned char L = MI->getOperand(0).getImm();
+ if (!L || L == 1 || L == 3) {
+ O << "\tdcbf";
+ if (L == 1 || L == 3)
+ O << "l";
+ if (L == 3)
+ O << "p";
+ O << " ";
+
+ printOperand(MI, 1, O);
+ O << ", ";
+ printOperand(MI, 2, O);
+
+ printAnnotation(O, Annot);
+ return;
+ }
+ }
if (!printAliasInstr(MI, O))
printInstruction(MI, O);
@@ -239,6 +264,15 @@ void PPCInstPrinter::printPredicateOperand(const MCInst *MI, unsigned OpNo,
printOperand(MI, OpNo+1, O);
}
+void PPCInstPrinter::printATBitsAsHint(const MCInst *MI, unsigned OpNo,
+ raw_ostream &O) {
+ unsigned Code = MI->getOperand(OpNo).getImm();
+ if (Code == 2)
+ O << "-";
+ else if (Code == 3)
+ O << "+";
+}
+
void PPCInstPrinter::printU1ImmOperand(const MCInst *MI, unsigned OpNo,
raw_ostream &O) {
unsigned int Value = MI->getOperand(OpNo).getImm();
@@ -295,10 +329,12 @@ void PPCInstPrinter::printU7ImmOperand(const MCInst *MI, unsigned OpNo,
O << (unsigned int)Value;
}
+// Operands of BUILD_VECTOR are signed and we use this to print operands
+// of XXSPLTIB which are unsigned. So we simply truncate to 8 bits and
+// print as unsigned.
void PPCInstPrinter::printU8ImmOperand(const MCInst *MI, unsigned OpNo,
raw_ostream &O) {
- unsigned int Value = MI->getOperand(OpNo).getImm();
- assert(Value <= 255 && "Invalid u8imm argument!");
+ unsigned char Value = MI->getOperand(OpNo).getImm();
O << (unsigned int)Value;
}
@@ -412,7 +448,7 @@ void PPCInstPrinter::printTLSCall(const MCInst *MI, unsigned OpNo,
/// stripRegisterPrefix - This method strips the character prefix from a
/// register name so that only the number is left. Used by for linux asm.
static const char *stripRegisterPrefix(const char *RegName) {
- if (FullRegNames)
+ if (FullRegNames || ShowVSRNumsAsVR)
return RegName;
switch (RegName[0]) {
@@ -433,7 +469,24 @@ void PPCInstPrinter::printOperand(const MCInst *MI, unsigned OpNo,
raw_ostream &O) {
const MCOperand &Op = MI->getOperand(OpNo);
if (Op.isReg()) {
- const char *RegName = getRegisterName(Op.getReg());
+ unsigned Reg = Op.getReg();
+
+ // There are VSX instructions that use VSX register numbering (vs0 - vs63)
+ // as well as those that use VMX register numbering (v0 - v31 which
+ // correspond to vs32 - vs63). If we have an instruction that uses VSX
+ // numbering, we need to convert the VMX registers to VSX registers.
+ // Namely, we print 32-63 when the instruction operates on one of the
+ // VMX registers.
+ // (Please synchronize with PPCAsmPrinter::printOperand)
+ if ((MII.get(MI->getOpcode()).TSFlags & PPCII::UseVSXReg) &&
+ !ShowVSRNumsAsVR) {
+ if (PPCInstrInfo::isVRRegister(Reg))
+ Reg = PPC::VSX32 + (Reg - PPC::V0);
+ else if (PPCInstrInfo::isVFRegister(Reg))
+ Reg = PPC::VSX32 + (Reg - PPC::VF0);
+ }
+
+ const char *RegName = getRegisterName(Reg);
// The linux and AIX assembler does not take register prefixes.
if (!isDarwinSyntax())
RegName = stripRegisterPrefix(RegName);
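The VMX-to-VSX renumbering described in the comment in this hunk is a
fixed offset of 32. Roughly, as an illustrative helper rather than the
actual LLVM code:

    // v0..v31 alias vs32..vs63, so an instruction flagged UseVSXReg
    // prints the shifted number for a VMX register operand.
    unsigned vmxToVsxNumber(unsigned VmxNum) { // VmxNum in [0, 31]
      return VmxNum + 32;                      // printed as vs32..vs63
    }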
diff --git a/contrib/llvm/lib/Target/PowerPC/InstPrinter/PPCInstPrinter.h b/contrib/llvm/lib/Target/PowerPC/InstPrinter/PPCInstPrinter.h
index d0ffeff..9c79ffb 100644
--- a/contrib/llvm/lib/Target/PowerPC/InstPrinter/PPCInstPrinter.h
+++ b/contrib/llvm/lib/Target/PowerPC/InstPrinter/PPCInstPrinter.h
@@ -45,6 +45,7 @@ public:
void printOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printPredicateOperand(const MCInst *MI, unsigned OpNo,
raw_ostream &O, const char *Modifier = nullptr);
+ void printATBitsAsHint(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printU1ImmOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
void printU2ImmOperand(const MCInst *MI, unsigned OpNo, raw_ostream &O);
diff --git a/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp b/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
index 9100ecb..5847b3a 100644
--- a/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCAsmBackend.cpp
@@ -230,7 +230,8 @@ namespace {
MCAsmBackend *llvm::createPPCAsmBackend(const Target &T,
const MCRegisterInfo &MRI,
- const Triple &TT, StringRef CPU) {
+ const Triple &TT, StringRef CPU,
+ const MCTargetOptions &Options) {
if (TT.isOSDarwin())
return new DarwinPPCAsmBackend(T);
diff --git a/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp b/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp
index e7b2d83..017d21a 100644
--- a/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCCodeEmitter.cpp
@@ -11,6 +11,7 @@
//
//===----------------------------------------------------------------------===//
+#include "PPCInstrInfo.h"
#include "MCTargetDesc/PPCMCTargetDesc.h"
#include "MCTargetDesc/PPCFixupKinds.h"
#include "llvm/ADT/Statistic.h"
@@ -105,6 +106,9 @@ public:
void encodeInstruction(const MCInst &MI, raw_ostream &OS,
SmallVectorImpl<MCFixup> &Fixups,
const MCSubtargetInfo &STI) const override {
+ verifyInstructionPredicates(MI,
+ computeAvailableFeatures(STI.getFeatureBits()));
+
unsigned Opcode = MI.getOpcode();
const MCInstrDesc &Desc = MCII.get(Opcode);
@@ -138,7 +142,11 @@ public:
++MCNumEmitted; // Keep track of the # of mi's emitted.
}
-
+
+private:
+ uint64_t computeAvailableFeatures(const FeatureBitset &FB) const;
+ void verifyInstructionPredicates(const MCInst &MI,
+ uint64_t AvailableFeatures) const;
};
} // end anonymous namespace
@@ -350,7 +358,6 @@ get_crbitm_encoding(const MCInst &MI, unsigned OpNo,
return 0x80 >> CTX.getRegisterInfo()->getEncodingValue(MO.getReg());
}
-
unsigned PPCMCCodeEmitter::
getMachineOpValue(const MCInst &MI, const MCOperand &MO,
SmallVectorImpl<MCFixup> &Fixups,
@@ -361,7 +368,14 @@ getMachineOpValue(const MCInst &MI, const MCOperand &MO,
assert((MI.getOpcode() != PPC::MTOCRF && MI.getOpcode() != PPC::MTOCRF8 &&
MI.getOpcode() != PPC::MFOCRF && MI.getOpcode() != PPC::MFOCRF8) ||
MO.getReg() < PPC::CR0 || MO.getReg() > PPC::CR7);
- return CTX.getRegisterInfo()->getEncodingValue(MO.getReg());
+ unsigned Reg = MO.getReg();
+ unsigned Encode = CTX.getRegisterInfo()->getEncodingValue(Reg);
+
+ if ((MCII.get(MI.getOpcode()).TSFlags & PPCII::UseVSXReg))
+ if (PPCInstrInfo::isVRRegister(Reg))
+ Encode += 32;
+
+ return Encode;
}
assert(MO.isImm() &&
@@ -370,4 +384,6 @@ getMachineOpValue(const MCInst &MI, const MCOperand &MO,
}
+
+#define ENABLE_INSTR_PREDICATE_VERIFIER
#include "PPCGenMCCodeEmitter.inc"
diff --git a/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp b/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp
index c907444..bbd10e5 100644
--- a/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.cpp
@@ -228,7 +228,8 @@ static MCInstPrinter *createPPCMCInstPrinter(const Triple &T,
}
extern "C" void LLVMInitializePowerPCTargetMC() {
- for (Target *T : {&ThePPC32Target, &ThePPC64Target, &ThePPC64LETarget}) {
+ for (Target *T :
+ {&getThePPC32Target(), &getThePPC64Target(), &getThePPC64LETarget()}) {
// Register the MC asm info.
RegisterMCAsmInfoFn C(*T, createPPCMCAsmInfo);
diff --git a/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h b/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h
index 77fe458..0989e0c 100644
--- a/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h
+++ b/contrib/llvm/lib/Target/PowerPC/MCTargetDesc/PPCMCTargetDesc.h
@@ -28,22 +28,24 @@ class MCInstrInfo;
class MCObjectWriter;
class MCRegisterInfo;
class MCSubtargetInfo;
+class MCTargetOptions;
class Target;
class Triple;
class StringRef;
class raw_pwrite_stream;
class raw_ostream;
-extern Target ThePPC32Target;
-extern Target ThePPC64Target;
-extern Target ThePPC64LETarget;
+Target &getThePPC32Target();
+Target &getThePPC64Target();
+Target &getThePPC64LETarget();
MCCodeEmitter *createPPCMCCodeEmitter(const MCInstrInfo &MCII,
const MCRegisterInfo &MRI,
MCContext &Ctx);
MCAsmBackend *createPPCAsmBackend(const Target &T, const MCRegisterInfo &MRI,
- const Triple &TT, StringRef CPU);
+ const Triple &TT, StringRef CPU,
+ const MCTargetOptions &Options);
/// Construct an PPC ELF object writer.
MCObjectWriter *createPPCELFObjectWriter(raw_pwrite_stream &OS, bool Is64Bit,
diff --git a/contrib/llvm/lib/Target/PowerPC/P9InstrResources.td b/contrib/llvm/lib/Target/PowerPC/P9InstrResources.td
new file mode 100644
index 0000000..aea022f
--- /dev/null
+++ b/contrib/llvm/lib/Target/PowerPC/P9InstrResources.td
@@ -0,0 +1,808 @@
+//===- P9InstrResources.td - P9 Instruction Resource Defs -*- tablegen -*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines resources required by some of P9 instruction. This is part
+// P9 processor model used for instruction scheduling. Not every instruction
+// is listed here. Instructions in this file belong to itinerary classes that
+// have instructions with different resource requirements.
+//
+//===----------------------------------------------------------------------===//
+
+
+def : InstRW<[P9_ALUE_2C, P9_ALUO_2C, IP_EXECE_1C, IP_EXECO_1C,
+ DISP_1C, DISP_1C],
+ (instrs
+ VADDCUW,
+ VADDUBM,
+ VADDUDM,
+ VADDUHM,
+ VADDUWM,
+ VAND,
+ VANDC,
+ VCMPEQUB,
+ VCMPEQUBo,
+ VCMPEQUD,
+ VCMPEQUDo,
+ VCMPEQUH,
+ VCMPEQUHo,
+ VCMPEQUW,
+ VCMPEQUWo,
+ VCMPGTSB,
+ VCMPGTSBo,
+ VCMPGTSD,
+ VCMPGTSDo,
+ VCMPGTSH,
+ VCMPGTSHo,
+ VCMPGTSW,
+ VCMPGTSWo,
+ VCMPGTUB,
+ VCMPGTUBo,
+ VCMPGTUD,
+ VCMPGTUDo,
+ VCMPGTUH,
+ VCMPGTUHo,
+ VCMPGTUW,
+ VCMPGTUWo,
+ VCMPNEB,
+ VCMPNEBo,
+ VCMPNEH,
+ VCMPNEHo,
+ VCMPNEW,
+ VCMPNEWo,
+ VCMPNEZB,
+ VCMPNEZBo,
+ VCMPNEZH,
+ VCMPNEZHo,
+ VCMPNEZW,
+ VCMPNEZWo,
+ VEQV,
+ VEXTSB2D,
+ VEXTSB2W,
+ VEXTSH2D,
+ VEXTSH2W,
+ VEXTSW2D,
+ VMRGEW,
+ VMRGOW,
+ VNAND,
+ VNEGD,
+ VNEGW,
+ VNOR,
+ VOR,
+ VORC,
+ VPOPCNTB,
+ VPOPCNTH,
+ VPOPCNTW,
+ VSEL,
+ VSUBCUW,
+ VSUBUBM,
+ VSUBUDM,
+ VSUBUHM,
+ VSUBUWM,
+ VXOR,
+ V_SET0B,
+ V_SET0H,
+ V_SET0,
+ XVABSDP,
+ XVABSSP,
+ XVCPSGNDP,
+ XVCPSGNSP,
+ XVIEXPDP,
+ XVNABSDP,
+ XVNABSSP,
+ XVNEGDP,
+ XVNEGSP,
+ XVXEXPDP,
+ XXLAND,
+ XXLANDC,
+ XXLEQV,
+ XXLNAND,
+ XXLNOR,
+ XXLOR,
+ XXLORf,
+ XXLORC,
+ XXLXOR,
+ XXSEL
+)>;
+
+def : InstRW<[P9_ALU_2C, IP_EXEC_1C, DISP_1C, DISP_1C],
+ (instrs
+ XSABSQP,
+ XSCPSGNQP,
+ XSIEXPQP,
+ XSNABSQP,
+ XSNEGQP,
+ XSXEXPQP,
+ XSABSDP,
+ XSCPSGNDP,
+ XSIEXPDP,
+ XSNABSDP,
+ XSNEGDP,
+ XSXEXPDP
+)>;
+
+def : InstRW<[P9_ALUE_3C, P9_ALUO_3C, IP_EXECE_1C, IP_EXECO_1C, DISP_1C, DISP_1C],
+ (instrs
+
+ VMINSB,
+ VMINSD,
+ VMINSH,
+ VMINSW,
+ VMINUB,
+ VMINUD,
+ VMINUH,
+ VMINUW,
+ VPOPCNTD,
+ VPRTYBD,
+ VPRTYBW,
+ VRLB,
+ VRLD,
+ VRLDMI,
+ VRLDNM,
+ VRLH,
+ VRLW,
+ VRLWMI,
+ VRLWNM,
+ VSHASIGMAD,
+ VSHASIGMAW,
+ VSLB,
+ VSLD,
+ VSLH,
+ VSLW,
+ VSRAB,
+ VSRAD,
+ VSRAH,
+ VSRAW,
+ VSRB,
+ VSRD,
+ VSRH,
+ VSRW,
+ VSUBSBS,
+ VSUBSHS,
+ VSUBSWS,
+ VSUBUBS,
+ VSUBUHS,
+ VSUBUWS,
+ XSCMPEQDP,
+ XSCMPEXPDP,
+ XSCMPGEDP,
+ XSCMPGTDP,
+ XSCMPODP,
+ XSCMPUDP,
+ XSCVSPDPN,
+ XSMAXCDP,
+ XSMAXDP,
+ XSMAXJDP,
+ XSMINCDP,
+ XSMINDP,
+ XSMINJDP,
+ XSTDIVDP,
+ XSTSQRTDP,
+ XSTSTDCDP,
+ XSTSTDCSP,
+ XSXSIGDP,
+ XVCMPEQDP,
+ XVCMPEQDPo,
+ XVCMPEQSP,
+ XVCMPEQSPo,
+ XVCMPGEDP,
+ XVCMPGEDPo,
+ XVCMPGESP,
+ XVCMPGESPo,
+ XVCMPGTDP,
+ XVCMPGTDPo,
+ XVCMPGTSP,
+ XVCMPGTSPo,
+ XVIEXPSP,
+ XVMAXDP,
+ XVMAXSP,
+ XVMINDP,
+ XVMINSP,
+ XVTDIVDP,
+ XVTDIVSP,
+ XVTSQRTDP,
+ XVTSQRTSP,
+ XVTSTDCDP,
+ XVTSTDCSP,
+ XVXEXPSP,
+ XVXSIGDP,
+ XVXSIGSP
+)>;
+
+def : InstRW<[P9_ALUE_4C, P9_ALUO_4C, IP_EXECE_1C, IP_EXECO_1C, DISP_1C, DISP_1C],
+ (instrs
+ VABSDUB,
+ VABSDUH,
+ VABSDUW,
+ VADDSBS,
+ VADDSHS,
+ VADDSWS,
+ VADDUBS,
+ VADDUHS,
+ VADDUWS,
+ VAVGSB,
+ VAVGSH,
+ VAVGSW,
+ VAVGUB,
+ VAVGUH,
+ VAVGUW,
+ VBPERMD,
+ VCLZB,
+ VCLZD,
+ VCLZH,
+ VCLZW,
+ VCMPBFP,
+ VCMPBFPo,
+ VCMPGTFP,
+ VCMPGTFPo,
+ VCTZB,
+ VCTZD,
+ VCTZH,
+ VCTZW,
+ VMAXFP,
+ VMAXSB,
+ VMAXSD,
+ VMAXSH,
+ VMAXSW,
+ VMAXUB,
+ VMAXUD,
+ VMAXUH,
+ VMAXUW,
+ VMINFP,
+ VCMPEQFP,
+ VCMPEQFPo,
+ VCMPGEFP,
+ VCMPGEFPo
+)>;
+
+def : InstRW<[P9_DPE_7C, P9_DPO_7C, IP_EXECE_1C, IP_EXECO_1C, DISP_1C, DISP_1C],
+ (instrs
+ VADDFP,
+ VCTSXS,
+ VCTSXS_0,
+ VCTUXS,
+ VCTUXS_0,
+ VEXPTEFP,
+ VLOGEFP,
+ VMADDFP,
+ VMHADDSHS,
+ VNMSUBFP,
+ VREFP,
+ VRFIM,
+ VRFIN,
+ VRFIP,
+ VRFIZ,
+ VRSQRTEFP,
+ VSUBFP,
+ XVADDDP,
+ XVADDSP,
+ XVCVDPSP,
+ XVCVDPSXDS,
+ XVCVDPSXWS,
+ XVCVDPUXDS,
+ XVCVDPUXWS,
+ XVCVHPSP,
+ XVCVSPDP,
+ XVCVSPHP,
+ XVCVSPSXDS,
+ XVCVSPSXWS,
+ XVCVSPUXDS,
+ XVCVSPUXWS,
+ XVCVSXDDP,
+ XVCVSXDSP,
+ XVCVSXWDP,
+ XVCVSXWSP,
+ XVCVUXDDP,
+ XVCVUXDSP,
+ XVCVUXWDP,
+ XVCVUXWSP,
+ XVMADDADP,
+ XVMADDASP,
+ XVMADDMDP,
+ XVMADDMSP,
+ XVMSUBADP,
+ XVMSUBASP,
+ XVMSUBMDP,
+ XVMSUBMSP,
+ XVMULDP,
+ XVMULSP,
+ XVNMADDADP,
+ XVNMADDASP,
+ XVNMADDMDP,
+ XVNMADDMSP,
+ XVNMSUBADP,
+ XVNMSUBASP,
+ XVNMSUBMDP,
+ XVNMSUBMSP,
+ XVRDPI,
+ XVRDPIC,
+ XVRDPIM,
+ XVRDPIP,
+ XVRDPIZ,
+ XVREDP,
+ XVRESP,
+ XVRSPI,
+ XVRSPIC,
+ XVRSPIM,
+ XVRSPIP,
+ XVRSPIZ,
+ XVRSQRTEDP,
+ XVRSQRTESP,
+ XVSUBDP,
+ XVSUBSP,
+ VCFSX,
+ VCFSX_0,
+ VCFUX,
+ VCFUX_0,
+ VMHRADDSHS,
+ VMLADDUHM,
+ VMSUMMBM,
+ VMSUMSHM,
+ VMSUMSHS,
+ VMSUMUBM,
+ VMSUMUHM,
+ VMSUMUHS,
+ VMULESB,
+ VMULESH,
+ VMULESW,
+ VMULEUB,
+ VMULEUH,
+ VMULEUW,
+ VMULOSB,
+ VMULOSH,
+ VMULOSW,
+ VMULOUB,
+ VMULOUH,
+ VMULOUW,
+ VMULUWM,
+ VSUM2SWS,
+ VSUM4SBS,
+ VSUM4SHS,
+ VSUM4UBS,
+ VSUMSWS
+)>;
+
+def : InstRW<[P9_DP_7C, IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ XSMADDADP,
+ XSMADDASP,
+ XSMADDMDP,
+ XSMADDMSP,
+ XSMSUBADP,
+ XSMSUBASP,
+ XSMSUBMDP,
+ XSMSUBMSP,
+ XSMULDP,
+ XSMULSP,
+ XSNMADDADP,
+ XSNMADDASP,
+ XSNMADDMDP,
+ XSNMADDMSP,
+ XSNMSUBADP,
+ XSNMSUBASP,
+ XSNMSUBMDP,
+ XSNMSUBMSP
+)>;
+
+
+def : InstRW<[P9_DP_7C, IP_EXEC_1C, DISP_1C, DISP_1C],
+ (instrs
+ XSADDDP,
+ XSADDSP,
+ XSCVDPHP,
+ XSCVDPSP,
+ XSCVDPSXDS,
+ XSCVDPSXWS,
+ XSCVDPUXDS,
+ XSCVDPUXWS,
+ XSCVHPDP,
+ XSCVSPDP,
+ XSCVSXDDP,
+ XSCVSXDSP,
+ XSCVUXDDP,
+ XSCVUXDSP,
+ XSRDPI,
+ XSRDPIC,
+ XSRDPIM,
+ XSRDPIP,
+ XSRDPIZ,
+ XSREDP,
+ XSRESP,
+ //XSRSP,
+ XSRSQRTEDP,
+ XSRSQRTESP,
+ XSSUBDP,
+ XSSUBSP,
+ XSCVDPSPN
+)>;
+
+def : InstRW<[P9_PM_3C, IP_EXECO_1C, IP_EXECE_1C, DISP_1C, DISP_1C],
+ (instrs
+ VBPERMQ,
+ VCLZLSBB,
+ VCTZLSBB,
+ VEXTRACTD,
+ VEXTRACTUB,
+ VEXTRACTUH,
+ VEXTRACTUW,
+ VEXTUBLX,
+ VEXTUBRX,
+ VEXTUHLX,
+ VEXTUHRX,
+ VEXTUWLX,
+ VEXTUWRX,
+ VGBBD,
+ VINSERTB,
+ VINSERTD,
+ VINSERTH,
+ VINSERTW,
+ VMRGHB,
+ VMRGHH,
+ VMRGHW,
+ VMRGLB,
+ VMRGLH,
+ VMRGLW,
+ VPERM,
+ VPERMR,
+ VPERMXOR,
+ VPKPX,
+ VPKSDSS,
+ VPKSDUS,
+ VPKSHSS,
+ VPKSHUS,
+ VPKSWSS,
+ VPKSWUS,
+ VPKUDUM,
+ VPKUDUS,
+ VPKUHUM,
+ VPKUHUS,
+ VPKUWUM,
+ VPKUWUS,
+ VPRTYBQ,
+ VSL,
+ VSLDOI,
+ VSLO,
+ VSLV,
+ VSPLTB,
+ VSPLTH,
+ VSPLTISB,
+ VSPLTISH,
+ VSPLTISW,
+ VSPLTW,
+ VSR,
+ VSRO,
+ VSRV,
+ VUPKHPX,
+ VUPKHSB,
+ VUPKHSH,
+ VUPKHSW,
+ VUPKLPX,
+ VUPKLSB,
+ VUPKLSH,
+ VUPKLSW,
+ XXBRD,
+ XXBRH,
+ XXBRQ,
+ XXBRW,
+ XXEXTRACTUW,
+ XXINSERTW,
+ XXMRGHW,
+ XXMRGLW,
+ XXPERM,
+ XXPERMR,
+ XXSLDWI,
+ XXSPLTIB,
+ XXSPLTW,
+ VADDCUQ,
+ VADDECUQ,
+ VADDEUQM,
+ VADDUQM,
+ VMUL10CUQ,
+ VMUL10ECUQ,
+ VMUL10EUQ,
+ VMUL10UQ,
+ VSUBCUQ,
+ VSUBECUQ,
+ VSUBEUQM,
+ VSUBUQM,
+ XSCMPEXPQP,
+ XSCMPOQP,
+ XSCMPUQP,
+ XSTSTDCQP,
+ XSXSIGQP
+)>;
+
+def : InstRW<[P9_DFU_12C, IP_EXECE_1C, IP_EXECO_1C, DISP_1C, DISP_1C],
+ (instrs
+ XSADDQP,
+ XSADDQPO,
+ XSCVDPQP,
+ XSCVQPDP,
+ XSCVQPDPO,
+ XSCVQPSDZ,
+ XSCVQPSWZ,
+ XSCVQPUDZ,
+ XSCVQPUWZ,
+ XSCVSDQP,
+ XSCVUDQP,
+ XSRQPI,
+ XSRQPXP,
+ XSSUBQP,
+ XSSUBQPO
+)>;
+
+def : InstRW<[P9_DFU_24C, IP_EXECE_1C, IP_EXECO_1C, DISP_1C, DISP_1C],
+ (instrs
+ XSMADDQP,
+ XSMADDQPO,
+ XSMSUBQP,
+ XSMSUBQPO,
+ XSMULQP,
+ XSMULQPO,
+ XSNMADDQP,
+ XSNMADDQPO,
+ XSNMSUBQP,
+ XSNMSUBQPO
+)>;
+
+def : InstRW<[P9_DFU_58C, IP_EXECE_1C, IP_EXECO_1C, DISP_1C, DISP_1C],
+ (instrs
+ XSDIVQP,
+ XSDIVQPO
+)>;
+
+def : InstRW<[P9_DFU_76C, IP_EXECE_1C, IP_EXECO_1C, DISP_1C, DISP_1C],
+ (instrs
+ XSSQRTQP,
+ XSSQRTQPO
+)>;
+
+// Load Operation in IIC_LdStLFD
+
+def : InstRW<[P9_LS_5C, IP_AGEN_1C, DISP_1C, DISP_1C],
+ (instrs
+ LXSDX,
+ LXVD2X,
+ LXSIWZX,
+ LXV,
+ LXSD
+)>;
+
+def : InstRW<[P9_LS_5C, IP_AGEN_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ LFIWZX,
+ LFDX,
+ LFD
+)>;
+
+def : InstRW<[P9_LoadAndALUOp_7C, IP_AGEN_1C, IP_EXEC_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ LXSSPX,
+ LXSIWAX,
+ LXSSP
+)>;
+
+def : InstRW<[P9_LoadAndALUOp_7C, IP_AGEN_1C, IP_EXEC_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ LFIWAX,
+ LFSX,
+ LFS
+)>;
+
+def : InstRW<[P9_LoadAndPMOp_8C, IP_AGEN_1C, IP_EXEC_1C, DISP_1C, DISP_1C],
+ (instrs
+ LXVDSX,
+ LXVW4X
+)>;
+
+// Store Operations in IIC_LdStSTFD.
+
+def : InstRW<[P9_LS_1C, IP_EXEC_1C, IP_AGEN_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ STFS,
+ STFD,
+ STFIWX,
+ STFSX,
+ STFDX,
+ STXSDX,
+ STXSSPX,
+ STXSIWX
+)>;
+
+def : InstRW<[P9_LS_1C, IP_EXEC_1C, IP_EXEC_1C, IP_AGEN_1C, DISP_1C, DISP_1C],
+ (instrs
+ STXVD2X,
+ STXVW4X
+)>;
+
+
+// Divide Operations in IIC_IntDivW, IIC_IntDivD.
+
+def : InstRW<[P9_DIV_16C_8, IP_EXECE_1C, DISP_1C, DISP_1C],
+ (instrs
+ DIVW,
+ DIVWU
+)>;
+
+def : InstRW<[P9_DIV_24C_8, IP_EXECE_1C, DISP_1C, DISP_1C],
+ (instrs
+ DIVWE,
+ DIVD,
+ DIVWEU,
+ DIVDU
+)>;
+
+def : InstRW<[P9_DIV_40C_8, IP_EXECE_1C, DISP_1C, DISP_1C],
+ (instrs
+ DIVDE,
+ DIVDEU
+)>;
+
+def : InstRW<[P9_IntDivAndALUOp_26C_8, IP_EXECE_1C, IP_EXEC_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ DIVWEo,
+ DIVWEUo
+)>;
+
+def : InstRW<[P9_IntDivAndALUOp_42C_8, IP_EXECE_1C, IP_EXEC_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ DIVDEo,
+ DIVDEUo
+)>;
+
+// Rotate Operations in IIC_IntRotateD, IIC_IntRotateDI
+def : InstRW<[P9_ALU_2C, IP_EXEC_1C, DISP_1C, DISP_1C],
+ (instrs
+ SLD,
+ SRD,
+ SRAD,
+ SRADI,
+ RLDIC
+)>;
+
+def : InstRW<[P9_ALU_2C, IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ RLDCL,
+ RLDCR,
+ RLDIMI,
+ RLDICL,
+ RLDICR,
+ RLDICL_32_64
+)>;
+
+// CR access instructions in _BrMCR, IIC_BrMCRX.
+
+def : InstRW<[P9_ALU_2C, P9_ALU_2C, IP_EXEC_1C, IP_EXEC_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ MTOCRF,
+ MTOCRF8,
+ MTCRF,
+ MTCRF8
+)>;
+
+def : InstRW<[P9_ALU_5C, IP_EXEC_1C, DISP_1C, DISP_1C],
+ (instrs
+ MCRF,
+ MCRXRX
+)>;
+
+def : InstRW<[P9_ALU_5C, P9_ALU_5C, IP_EXEC_1C, IP_EXEC_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ MCRFS
+)>;
+
+// FP Div instructions in IIC_FPDivD and IIC_FPDivS.
+
+def : InstRW<[P9_DP_33C_8, IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ FDIV,
+ XSDIVDP
+)>;
+
+def : InstRW<[P9_DP_22C_5, IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ FDIVS,
+ XSDIVSP
+)>;
+
+def : InstRW<[P9_DP_24C_8, IP_EXECE_1C, IP_EXECO_1C, DISP_1C, DISP_1C],
+ (instrs
+ XVDIVSP
+)>;
+
+def : InstRW<[P9_DP_33C_8, IP_EXECE_1C, IP_EXECO_1C, DISP_1C, DISP_1C],
+ (instrs
+ XVDIVDP
+)>;
+
+// FP Instructions in IIC_FPGeneral, IIC_FPFused
+
+def : InstRW<[P9_DP_7C, IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ FRSP,
+ FRIND,
+ FRINS,
+ FRIPD,
+ FRIPS,
+ FRIZD,
+ FRIZS,
+ FRIMD,
+ FRIMS,
+ FRE,
+ FRES,
+ FRSQRTE,
+ FRSQRTES,
+ FMADDS,
+ FMADD,
+ FMSUBS,
+ FMSUB,
+ FNMADDS,
+ FNMADD,
+ FNMSUBS,
+ FNMSUB,
+ FSELD,
+ FSELS,
+ FADDS,
+ FMULS,
+ FMUL,
+ FSUBS,
+ FCFID,
+ FCTID,
+ FCTIDZ,
+ FCFIDU,
+ FCFIDS,
+ FCFIDUS,
+ FCTIDUZ,
+ FCTIWUZ,
+ FCTIW,
+ FCTIWZ
+)>;
+
+def : InstRW<[P9_DP_7C, IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ FMR,
+ FABSD,
+ FABSS,
+ FNABSD,
+ FNABSS,
+ FNEGD,
+ FNEGS,
+ FCPSGND,
+ FCPSGNS
+)>;
+
+def : InstRW<[P9_ALU_3C, IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ FCMPUS,
+ FCMPUD
+)>;
+
+// Load instructions in IIC_LdStLFDU and IIC_LdStLFDUX.
+
+def : InstRW<[P9_LoadAndALUOp_7C, P9_ALU_2C,
+ IP_AGEN_1C, IP_EXEC_1C, IP_EXEC_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ LFSU,
+ LFSUX
+)>;
+
+def : InstRW<[P9_LS_5C, P9_ALU_2C, IP_AGEN_1C, IP_EXEC_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ (instrs
+ LFDU,
+ LFDUX
+)>;
+
diff --git a/contrib/llvm/lib/Target/PowerPC/PPC.td b/contrib/llvm/lib/Target/PowerPC/PPC.td
index 6a8e87e..4650220 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPC.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPC.td
@@ -216,7 +216,7 @@ def ProcessorFeatures {
list<SubtargetFeature> Power8FeatureList =
!listconcat(Power7FeatureList, Power8SpecificFeatures);
list<SubtargetFeature> Power9SpecificFeatures =
- [FeatureP9Altivec, FeatureP9Vector, FeatureISA3_0];
+ [DirectivePwr9, FeatureP9Altivec, FeatureP9Vector, FeatureISA3_0];
list<SubtargetFeature> Power9FeatureList =
!listconcat(Power8FeatureList, Power9SpecificFeatures);
}
@@ -289,7 +289,6 @@ def getAltVSXFMAOpcode : InstrMapping {
include "PPCRegisterInfo.td"
include "PPCSchedule.td"
-include "PPCInstrInfo.td"
//===----------------------------------------------------------------------===//
// PowerPC processors supported.
@@ -418,8 +417,7 @@ def : ProcessorModel<"pwr6x", G5Model,
FeatureMFTB, DeprecatedDST]>;
def : ProcessorModel<"pwr7", P7Model, ProcessorFeatures.Power7FeatureList>;
def : ProcessorModel<"pwr8", P8Model, ProcessorFeatures.Power8FeatureList>;
-// FIXME: Same as P8 until the POWER9 scheduling info is available
-def : ProcessorModel<"pwr9", P8Model, ProcessorFeatures.Power9FeatureList>;
+def : ProcessorModel<"pwr9", P9Model, ProcessorFeatures.Power9FeatureList>;
def : Processor<"ppc", G3Itineraries, [Directive32, FeatureHardFloat,
FeatureMFTB]>;
def : Processor<"ppc32", G3Itineraries, [Directive32, FeatureHardFloat,
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp b/contrib/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
index 76c52ab..f0e0ebc 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCAsmPrinter.cpp
@@ -17,28 +17,29 @@
//===----------------------------------------------------------------------===//
#include "PPC.h"
+#include "PPCInstrInfo.h"
#include "InstPrinter/PPCInstPrinter.h"
#include "MCTargetDesc/PPCMCExpr.h"
-#include "MCTargetDesc/PPCPredicates.h"
+#include "MCTargetDesc/PPCMCTargetDesc.h"
#include "PPCMachineFunctionInfo.h"
#include "PPCSubtarget.h"
#include "PPCTargetMachine.h"
#include "PPCTargetStreamer.h"
#include "llvm/ADT/MapVector.h"
-#include "llvm/ADT/StringExtras.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/ADT/Triple.h"
+#include "llvm/ADT/Twine.h"
#include "llvm/CodeGen/AsmPrinter.h"
-#include "llvm/CodeGen/MachineConstantPool.h"
-#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstr.h"
-#include "llvm/CodeGen/MachineInstrBuilder.h"
#include "llvm/CodeGen/MachineModuleInfoImpls.h"
+#include "llvm/CodeGen/MachineOperand.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/StackMaps.h"
#include "llvm/CodeGen/TargetLoweringObjectFileImpl.h"
-#include "llvm/IR/Constants.h"
-#include "llvm/IR/DebugInfo.h"
-#include "llvm/IR/DerivedTypes.h"
-#include "llvm/IR/Mangler.h"
+#include "llvm/IR/DataLayout.h"
+#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/Module.h"
#include "llvm/MC/MCAsmInfo.h"
#include "llvm/MC/MCContext.h"
@@ -48,21 +49,30 @@
#include "llvm/MC/MCSectionELF.h"
#include "llvm/MC/MCSectionMachO.h"
#include "llvm/MC/MCStreamer.h"
+#include "llvm/MC/MCSymbol.h"
#include "llvm/MC/MCSymbolELF.h"
+#include "llvm/MC/SectionKind.h"
+#include "llvm/Support/Casting.h"
+#include "llvm/Support/CodeGen.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/ELF.h"
#include "llvm/Support/ErrorHandling.h"
-#include "llvm/Support/MathExtras.h"
-#include "llvm/Support/TargetRegistry.h"
+#include "llvm/Support/MachO.h"
#include "llvm/Support/raw_ostream.h"
-#include "llvm/Target/TargetInstrInfo.h"
-#include "llvm/Target/TargetOptions.h"
-#include "llvm/Target/TargetRegisterInfo.h"
+#include "llvm/Support/TargetRegistry.h"
+#include "llvm/Target/TargetMachine.h"
+#include <algorithm>
+#include <cassert>
+#include <cstdint>
+#include <memory>
+#include <new>
+
using namespace llvm;
#define DEBUG_TYPE "asmprinter"
namespace {
+
class PPCAsmPrinter : public AsmPrinter {
protected:
MapVector<MCSymbol *, MCSymbol *> TOC;
@@ -74,17 +84,15 @@ public:
std::unique_ptr<MCStreamer> Streamer)
: AsmPrinter(TM, std::move(Streamer)), SM(*this) {}
- const char *getPassName() const override {
- return "PowerPC Assembly Printer";
- }
+ StringRef getPassName() const override { return "PowerPC Assembly Printer"; }
- MCSymbol *lookUpOrCreateTOCEntry(MCSymbol *Sym);
+ MCSymbol *lookUpOrCreateTOCEntry(MCSymbol *Sym);
- virtual bool doInitialization(Module &M) override {
- if (!TOC.empty())
- TOC.clear();
- return AsmPrinter::doInitialization(M);
- }
+ bool doInitialization(Module &M) override {
+ if (!TOC.empty())
+ TOC.clear();
+ return AsmPrinter::doInitialization(M);
+ }
void EmitInstruction(const MachineInstr *MI) override;
@@ -115,7 +123,7 @@ public:
std::unique_ptr<MCStreamer> Streamer)
: PPCAsmPrinter(TM, std::move(Streamer)) {}
- const char *getPassName() const override {
+ StringRef getPassName() const override {
return "Linux PPC Assembly Printer";
}
@@ -136,14 +144,15 @@ public:
std::unique_ptr<MCStreamer> Streamer)
: PPCAsmPrinter(TM, std::move(Streamer)) {}
- const char *getPassName() const override {
+ StringRef getPassName() const override {
return "Darwin PPC Assembly Printer";
}
bool doFinalization(Module &M) override;
void EmitStartOfAsmFile(Module &M) override;
};
-} // end of anonymous namespace
+
+} // end anonymous namespace
/// stripRegisterPrefix - This method strips the character prefix from a
/// register name so that only the number is left. Used for Linux asm.
@@ -169,7 +178,23 @@ void PPCAsmPrinter::printOperand(const MachineInstr *MI, unsigned OpNo,
switch (MO.getType()) {
case MachineOperand::MO_Register: {
- const char *RegName = PPCInstPrinter::getRegisterName(MO.getReg());
+ unsigned Reg = MO.getReg();
+
+ // There are VSX instructions that use VSX register numbering (vs0 - vs63)
+ // as well as those that use VMX register numbering (v0 - v31 which
+ // correspond to vs32 - vs63). If we have an instruction that uses VSX
+ // numbering, we need to convert the VMX registers to VSX registers.
+ // Namely, we print 32-63 when the instruction operates on one of the
+ // VMX registers.
+ // (Please synchronize with PPCInstPrinter::printOperand)
+ if (MI->getDesc().TSFlags & PPCII::UseVSXReg) {
+ if (PPCInstrInfo::isVRRegister(Reg))
+ Reg = PPC::VSX32 + (Reg - PPC::V0);
+ else if (PPCInstrInfo::isVFRegister(Reg))
+ Reg = PPC::VSX32 + (Reg - PPC::VF0);
+ }
+ const char *RegName = PPCInstPrinter::getRegisterName(Reg);
+
// Linux assembler (Others?) does not take register mnemonics.
// FIXME - What about special registers used in mfspr/mtspr?
if (!Subtarget->isDarwin())
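To make the renumbering above concrete, here is a minimal sketch of the same
conversion in isolation (a hedged illustration: toVSXNumbering is an
illustrative name, not part of the patch; it assumes, as the patch code does,
that the V, VF and VSX register enums are contiguous):

    // v0-v31 and vf0-vf31 both alias vs32-vs63 in VSX numbering.
    static unsigned toVSXNumbering(unsigned Reg) {
      if (PPCInstrInfo::isVRRegister(Reg))   // v0..v31   -> vs32..vs63
        return PPC::VSX32 + (Reg - PPC::V0);
      if (PPCInstrInfo::isVFRegister(Reg))   // vf0..vf31 -> vs32..vs63
        return PPC::VSX32 + (Reg - PPC::VF0);
      return Reg;                            // not a VMX register; leave as is
    }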
@@ -347,11 +372,10 @@ void PPCAsmPrinter::LowerPATCHPOINT(StackMaps &SM, const MachineInstr &MI) {
PatchPointOpers Opers(&MI);
unsigned EncodedBytes = 0;
- const MachineOperand &CalleeMO =
- Opers.getMetaOper(PatchPointOpers::TargetPos);
+ const MachineOperand &CalleeMO = Opers.getCallTarget();
if (CalleeMO.isImm()) {
- int64_t CallTarget = Opers.getMetaOper(PatchPointOpers::TargetPos).getImm();
+ int64_t CallTarget = CalleeMO.getImm();
if (CallTarget) {
assert((CallTarget & 0xFFFFFFFFFFFF) == CallTarget &&
"High 16 bits of call target should be zero.");
@@ -430,7 +454,7 @@ void PPCAsmPrinter::LowerPATCHPOINT(StackMaps &SM, const MachineInstr &MI) {
EncodedBytes *= 4;
// Emit padding.
- unsigned NumBytes = Opers.getMetaOper(PatchPointOpers::NBytesPos).getImm();
+ unsigned NumBytes = Opers.getNumPatchBytes();
assert(NumBytes >= EncodedBytes &&
"Patchpoint can't request size less than the length of a call.");
assert((NumBytes - EncodedBytes) % 4 == 0 &&
@@ -674,6 +698,13 @@ void PPCAsmPrinter::EmitInstruction(const MachineInstr *MI) {
const MCExpr *Exp =
MCSymbolRefExpr::create(MOSymbol, MCSymbolRefExpr::VK_PPC_TOC_HA,
OutContext);
+
+ if (!MO.isJTI() && MO.getOffset())
+ Exp = MCBinaryExpr::createAdd(Exp,
+ MCConstantExpr::create(MO.getOffset(),
+ OutContext),
+ OutContext);
+
TmpInst.getOperand(2) = MCOperand::createExpr(Exp);
EmitToStreamer(*OutStreamer, TmpInst);
return;
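The reason the addend must live inside the @toc@ha expression, rather than
being applied after relocation, is the high-adjusted split. A hedged
illustration (ha16 and lo16 are illustrative helpers, not patch code):

    // PPC splits an offset so that (ha16(X) << 16) + lo16(X) == X,
    // where lo16 is sign-extended; @ha pre-compensates for that sign.
    static uint32_t ha16(uint64_t X) { return (X + 0x8000) >> 16; } // @ha
    static int16_t  lo16(uint64_t X) { return (int16_t)X; }         // @l
    // ha16(Sym + Off) has no fixed relation to ha16(Sym), so the offset is
    // folded into the MCExpr and resolved by the relocation itself.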
@@ -1147,10 +1178,12 @@ bool PPCLinuxAsmPrinter::doFinalization(Module &M) {
E = TOC.end(); I != E; ++I) {
OutStreamer->EmitLabel(I->second);
MCSymbol *S = I->first;
- if (isPPC64)
+ if (isPPC64) {
TS.emitTCEntry(*S);
- else
+ } else {
+ OutStreamer->EmitValueToAlignment(4);
OutStreamer->EmitSymbolValue(S, 4);
+ }
}
}
@@ -1193,6 +1226,9 @@ void PPCLinuxAsmPrinter::EmitFunctionBodyStart() {
if (Subtarget->isELFv2ABI()
// Only do all that if the function uses r2 in the first place.
&& !MF->getRegInfo().use_empty(PPC::X2)) {
+ // Note: The logic here must be synchronized with the code in the
+ // branch-selection pass which sets the offset of the first block in the
+ // function. This matters because it affects the alignment.
const PPCFunctionInfo *PPCFI = MF->getInfo<PPCFunctionInfo>();
MCSymbol *GlobalEntryLabel = PPCFI->getGlobalEPSymbol();
@@ -1345,57 +1381,61 @@ bool PPCDarwinAsmPrinter::doFinalization(Module &M) {
// Darwin/PPC always uses mach-o.
const TargetLoweringObjectFileMachO &TLOFMacho =
static_cast<const TargetLoweringObjectFileMachO &>(getObjFileLowering());
- MachineModuleInfoMachO &MMIMacho =
- MMI->getObjFileInfo<MachineModuleInfoMachO>();
-
- if (MAI->doesSupportExceptionHandling() && MMI) {
- // Add the (possibly multiple) personalities to the set of global values.
- // Only referenced functions get into the Personalities list.
- for (const Function *Personality : MMI->getPersonalities()) {
- if (Personality) {
- MCSymbol *NLPSym =
- getSymbolWithGlobalValueBase(Personality, "$non_lazy_ptr");
- MachineModuleInfoImpl::StubValueTy &StubSym =
- MMIMacho.getGVStubEntry(NLPSym);
- StubSym =
- MachineModuleInfoImpl::StubValueTy(getSymbol(Personality), true);
+ if (MMI) {
+ MachineModuleInfoMachO &MMIMacho =
+ MMI->getObjFileInfo<MachineModuleInfoMachO>();
+
+ if (MAI->doesSupportExceptionHandling()) {
+ // Add the (possibly multiple) personalities to the set of global values.
+ // Only referenced functions get into the Personalities list.
+ for (const Function *Personality : MMI->getPersonalities()) {
+ if (Personality) {
+ MCSymbol *NLPSym =
+ getSymbolWithGlobalValueBase(Personality, "$non_lazy_ptr");
+ MachineModuleInfoImpl::StubValueTy &StubSym =
+ MMIMacho.getGVStubEntry(NLPSym);
+ StubSym =
+ MachineModuleInfoImpl::StubValueTy(getSymbol(Personality), true);
+ }
}
}
- }
- // Output stubs for dynamically-linked functions.
- MachineModuleInfoMachO::SymbolListTy Stubs = MMIMacho.GetGVStubList();
-
- // Output macho stubs for external and common global variables.
- if (!Stubs.empty()) {
- // Switch with ".non_lazy_symbol_pointer" directive.
- OutStreamer->SwitchSection(TLOFMacho.getNonLazySymbolPointerSection());
- EmitAlignment(isPPC64 ? 3 : 2);
-
- for (unsigned i = 0, e = Stubs.size(); i != e; ++i) {
- // L_foo$stub:
- OutStreamer->EmitLabel(Stubs[i].first);
- // .indirect_symbol _foo
- MachineModuleInfoImpl::StubValueTy &MCSym = Stubs[i].second;
- OutStreamer->EmitSymbolAttribute(MCSym.getPointer(), MCSA_IndirectSymbol);
-
- if (MCSym.getInt())
- // External to current translation unit.
- OutStreamer->EmitIntValue(0, isPPC64 ? 8 : 4/*size*/);
- else
- // Internal to current translation unit.
- //
- // When we place the LSDA into the TEXT section, the type info pointers
- // need to be indirect and pc-rel. We accomplish this by using NLPs.
- // However, sometimes the types are local to the file. So we need to
- // fill in the value for the NLP in those cases.
- OutStreamer->EmitValue(MCSymbolRefExpr::create(MCSym.getPointer(),
- OutContext),
- isPPC64 ? 8 : 4/*size*/);
- }
+ // Output stubs for dynamically-linked functions.
+ MachineModuleInfoMachO::SymbolListTy Stubs = MMIMacho.GetGVStubList();
+
+ // Output macho stubs for external and common global variables.
+ if (!Stubs.empty()) {
+ // Switch with ".non_lazy_symbol_pointer" directive.
+ OutStreamer->SwitchSection(TLOFMacho.getNonLazySymbolPointerSection());
+ EmitAlignment(isPPC64 ? 3 : 2);
+
+ for (unsigned i = 0, e = Stubs.size(); i != e; ++i) {
+ // L_foo$stub:
+ OutStreamer->EmitLabel(Stubs[i].first);
+ // .indirect_symbol _foo
+ MachineModuleInfoImpl::StubValueTy &MCSym = Stubs[i].second;
+ OutStreamer->EmitSymbolAttribute(MCSym.getPointer(),
+ MCSA_IndirectSymbol);
+
+ if (MCSym.getInt())
+ // External to current translation unit.
+ OutStreamer->EmitIntValue(0, isPPC64 ? 8 : 4 /*size*/);
+ else
+ // Internal to current translation unit.
+ //
+ // When we place the LSDA into the TEXT section, the type info
+ // pointers
+ // need to be indirect and pc-rel. We accomplish this by using NLPs.
+ // However, sometimes the types are local to the file. So we need to
+ // fill in the value for the NLP in those cases.
+ OutStreamer->EmitValue(
+ MCSymbolRefExpr::create(MCSym.getPointer(), OutContext),
+ isPPC64 ? 8 : 4 /*size*/);
+ }
- Stubs.clear();
- OutStreamer->AddBlankLine();
+ Stubs.clear();
+ OutStreamer->AddBlankLine();
+ }
}
// Funny Darwin hack: This flag tells the linker that no global symbols
@@ -1422,7 +1462,10 @@ createPPCAsmPrinterPass(TargetMachine &tm,
// Force static initialization.
extern "C" void LLVMInitializePowerPCAsmPrinter() {
- TargetRegistry::RegisterAsmPrinter(ThePPC32Target, createPPCAsmPrinterPass);
- TargetRegistry::RegisterAsmPrinter(ThePPC64Target, createPPCAsmPrinterPass);
- TargetRegistry::RegisterAsmPrinter(ThePPC64LETarget, createPPCAsmPrinterPass);
+ TargetRegistry::RegisterAsmPrinter(getThePPC32Target(),
+ createPPCAsmPrinterPass);
+ TargetRegistry::RegisterAsmPrinter(getThePPC64Target(),
+ createPPCAsmPrinterPass);
+ TargetRegistry::RegisterAsmPrinter(getThePPC64LETarget(),
+ createPPCAsmPrinterPass);
}
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCBoolRetToInt.cpp b/contrib/llvm/lib/Target/PowerPC/PPCBoolRetToInt.cpp
index bfb4d87..93c201d 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCBoolRetToInt.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCBoolRetToInt.cpp
@@ -1,4 +1,4 @@
-//===- PPCBoolRetToInt.cpp - Convert bool literals to i32 if they are returned ==//
+//===- PPCBoolRetToInt.cpp ------------------------------------------------===//
//
// The LLVM Compiler Infrastructure
//
@@ -33,15 +33,26 @@
//===----------------------------------------------------------------------===//
#include "PPC.h"
-#include "llvm/Transforms/Scalar.h"
+#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SmallPtrSet.h"
+#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/Statistic.h"
+#include "llvm/ADT/STLExtras.h"
+#include "llvm/IR/Argument.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/Dominators.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
-#include "llvm/Support/raw_ostream.h"
+#include "llvm/IR/OperandTraits.h"
+#include "llvm/IR/Type.h"
+#include "llvm/IR/Use.h"
+#include "llvm/IR/User.h"
+#include "llvm/IR/Value.h"
+#include "llvm/Support/Casting.h"
#include "llvm/Pass.h"
+#include <cassert>
using namespace llvm;
@@ -57,7 +68,6 @@ STATISTIC(NumBoolToIntPromotion,
"Total number of times a bool was promoted to an int");
class PPCBoolRetToInt : public FunctionPass {
-
static SmallPtrSet<Value *, 8> findAllDefs(Value *V) {
SmallPtrSet<Value *, 8> Defs;
SmallVector<Value *, 8> WorkList;
@@ -66,7 +76,10 @@ class PPCBoolRetToInt : public FunctionPass {
while (!WorkList.empty()) {
Value *Curr = WorkList.back();
WorkList.pop_back();
- if (User *CurrUser = dyn_cast<User>(Curr))
+ auto *CurrUser = dyn_cast<User>(Curr);
+ // Operands of CallInst are skipped because they may not be of bool type,
+ // and their positions are defined by the ABI.
+ if (CurrUser && !isa<CallInst>(Curr))
for (auto &Op : CurrUser->operands())
if (Defs.insert(Op).second)
WorkList.push_back(Op);
@@ -77,9 +90,9 @@ class PPCBoolRetToInt : public FunctionPass {
// Translate a i1 value to an equivalent i32 value:
static Value *translate(Value *V) {
Type *Int32Ty = Type::getInt32Ty(V->getContext());
- if (Constant *C = dyn_cast<Constant>(V))
+ if (auto *C = dyn_cast<Constant>(V))
return ConstantExpr::getZExt(C, Int32Ty);
- if (PHINode *P = dyn_cast<PHINode>(V)) {
+ if (auto *P = dyn_cast<PHINode>(V)) {
// Temporarily set the operands to 0. We'll fix this later in
// runOnUse.
Value *Zero = Constant::getNullValue(Int32Ty);
@@ -90,8 +103,8 @@ class PPCBoolRetToInt : public FunctionPass {
return Q;
}
- Argument *A = dyn_cast<Argument>(V);
- Instruction *I = dyn_cast<Instruction>(V);
+ auto *A = dyn_cast<Argument>(V);
+ auto *I = dyn_cast<Instruction>(V);
assert((A || I) && "Unknown value type");
auto InstPt =
@@ -114,7 +127,7 @@ class PPCBoolRetToInt : public FunctionPass {
// Condition 1
for (auto &BB : F)
for (auto &I : BB)
- if (const PHINode *P = dyn_cast<PHINode>(&I))
+ if (const auto *P = dyn_cast<PHINode>(&I))
if (P->getType()->isIntegerTy(1))
Promotable.insert(P);
@@ -131,14 +144,14 @@ class PPCBoolRetToInt : public FunctionPass {
};
const auto &Users = P->users();
const auto &Operands = P->operands();
- if (!std::all_of(Users.begin(), Users.end(), IsValidUser) ||
- !std::all_of(Operands.begin(), Operands.end(), IsValidOperand))
+ if (!llvm::all_of(Users, IsValidUser) ||
+ !llvm::all_of(Operands, IsValidOperand))
ToRemove.push_back(P);
}
// Iterate to convergence
auto IsPromotable = [&Promotable] (const Value *V) -> bool {
- const PHINode *Phi = dyn_cast<PHINode>(V);
+ const auto *Phi = dyn_cast<PHINode>(V);
return !Phi || Promotable.count(Phi);
};
while (!ToRemove.empty()) {
@@ -150,8 +163,8 @@ class PPCBoolRetToInt : public FunctionPass {
// Condition 4 and 5
const auto &Users = P->users();
const auto &Operands = P->operands();
- if (!std::all_of(Users.begin(), Users.end(), IsPromotable) ||
- !std::all_of(Operands.begin(), Operands.end(), IsPromotable))
+ if (!llvm::all_of(Users, IsPromotable) ||
+ !llvm::all_of(Operands, IsPromotable))
ToRemove.push_back(P);
}
}
@@ -163,11 +176,12 @@ class PPCBoolRetToInt : public FunctionPass {
public:
static char ID;
+
PPCBoolRetToInt() : FunctionPass(ID) {
initializePPCBoolRetToIntPass(*PassRegistry::getPassRegistry());
}
- bool runOnFunction(Function &F) {
+ bool runOnFunction(Function &F) override {
if (skipFunction(F))
return false;
@@ -176,12 +190,12 @@ class PPCBoolRetToInt : public FunctionPass {
bool Changed = false;
for (auto &BB : F) {
for (auto &I : BB) {
- if (ReturnInst *R = dyn_cast<ReturnInst>(&I))
+ if (auto *R = dyn_cast<ReturnInst>(&I))
if (F.getReturnType()->isIntegerTy(1))
Changed |=
runOnUse(R->getOperandUse(0), PromotablePHINodes, Bool2IntMap);
- if (CallInst *CI = dyn_cast<CallInst>(&I))
+ if (auto *CI = dyn_cast<CallInst>(&I))
for (auto &U : CI->operands())
if (U->getType()->isIntegerTy(1))
Changed |= runOnUse(U, PromotablePHINodes, Bool2IntMap);
@@ -196,18 +210,19 @@ class PPCBoolRetToInt : public FunctionPass {
auto Defs = findAllDefs(U);
// If the values are all Constants or Arguments, don't bother
- if (!std::any_of(Defs.begin(), Defs.end(), isa<Instruction, Value *>))
+ if (llvm::none_of(Defs, isa<Instruction, Value *>))
return false;
- // Presently, we only know how to handle PHINode, Constant, and Arguments.
- // Potentially, bitwise operations (AND, OR, XOR, NOT) and sign extension
- // could also be handled in the future.
+ // Presently, we only know how to handle PHINode, Constant, Arguments and
+ // CallInst. Potentially, bitwise operations (AND, OR, XOR, NOT) and sign
+ // extension could also be handled in the future.
for (Value *V : Defs)
- if (!isa<PHINode>(V) && !isa<Constant>(V) && !isa<Argument>(V))
+ if (!isa<PHINode>(V) && !isa<Constant>(V) &&
+ !isa<Argument>(V) && !isa<CallInst>(V))
return false;
for (Value *V : Defs)
- if (const PHINode *P = dyn_cast<PHINode>(V))
+ if (const auto *P = dyn_cast<PHINode>(V))
if (!PromotablePHINodes.count(P))
return false;
@@ -221,32 +236,35 @@ class PPCBoolRetToInt : public FunctionPass {
if (!BoolToIntMap.count(V))
BoolToIntMap[V] = translate(V);
- // Replace the operands of the translated instructions. There were set to
+ // Replace the operands of the translated instructions. They were set to
// zero in the translate function.
for (auto &Pair : BoolToIntMap) {
- User *First = dyn_cast<User>(Pair.first);
- User *Second = dyn_cast<User>(Pair.second);
+ auto *First = dyn_cast<User>(Pair.first);
+ auto *Second = dyn_cast<User>(Pair.second);
assert((!First || Second) && "translated from user to non-user!?");
- if (First)
+ // Operands of CallInst are skipped because they may not be of bool type,
+ // and their positions are defined by the ABI.
+ if (First && !isa<CallInst>(First))
for (unsigned i = 0; i < First->getNumOperands(); ++i)
Second->setOperand(i, BoolToIntMap[First->getOperand(i)]);
}
Value *IntRetVal = BoolToIntMap[U];
Type *Int1Ty = Type::getInt1Ty(U->getContext());
- Instruction *I = cast<Instruction>(U.getUser());
+ auto *I = cast<Instruction>(U.getUser());
Value *BackToBool = new TruncInst(IntRetVal, Int1Ty, "backToBool", I);
U.set(BackToBool);
return true;
}
- void getAnalysisUsage(AnalysisUsage &AU) const {
+ void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addPreserved<DominatorTreeWrapperPass>();
FunctionPass::getAnalysisUsage(AU);
}
};
-}
+
+} // end anonymous namespace
char PPCBoolRetToInt::ID = 0;
INITIALIZE_PASS(PPCBoolRetToInt, "bool-ret-to-int",
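For reference, a hedged sketch of the rewrite this pass performs, shown as
LLVM IR in comments (illustrative; not taken from the patch or its tests):

    // before:  ret i1 %cmp
    // after:   %z = zext i1 %cmp to i32  ; computation promoted to i32
    //          %b = trunc i32 %z to i1   ; the "backToBool" trunc from runOnUse
    //          ret i1 %b
    // With this change, CallInst operands are left untouched, since their
    // width and position are fixed by the calling convention.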
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCBranchSelector.cpp b/contrib/llvm/lib/Target/PowerPC/PPCBranchSelector.cpp
index 4d63c5b..ae76386 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCBranchSelector.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCBranchSelector.cpp
@@ -19,8 +19,10 @@
#include "MCTargetDesc/PPCPredicates.h"
#include "PPCInstrBuilder.h"
#include "PPCInstrInfo.h"
+#include "PPCSubtarget.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/Support/MathExtras.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetSubtargetInfo.h"
@@ -41,19 +43,19 @@ namespace {
initializePPCBSelPass(*PassRegistry::getPassRegistry());
}
- /// BlockSizes - The sizes of the basic blocks in the function.
- std::vector<unsigned> BlockSizes;
+ // The sizes of the basic blocks in the function (the first
+ // element of the pair); the second element of the pair is the amount of the
+ // size that is due to potential padding.
+ std::vector<std::pair<unsigned, unsigned>> BlockSizes;
bool runOnMachineFunction(MachineFunction &Fn) override;
MachineFunctionProperties getRequiredProperties() const override {
return MachineFunctionProperties().set(
- MachineFunctionProperties::Property::AllVRegsAllocated);
+ MachineFunctionProperties::Property::NoVRegs);
}
- const char *getPassName() const override {
- return "PowerPC Branch Selector";
- }
+ StringRef getPassName() const override { return "PowerPC Branch Selector"; }
};
char PPCBSel::ID = 0;
}
@@ -92,8 +94,19 @@ bool PPCBSel::runOnMachineFunction(MachineFunction &Fn) {
return AlignAmt + OffsetToAlignment(Offset, AlignAmt);
};
+ // We need to be careful about the offset of the first block in the function
+ // because it might not have the function's alignment. This happens because,
+ // under the ELFv2 ABI, for functions which require a TOC pointer, we add a
+ // two-instruction sequence to the start of the function.
+ // Note: This needs to be synchronized with the check in
+ // PPCLinuxAsmPrinter::EmitFunctionBodyStart.
+ unsigned InitialOffset = 0;
+ if (Fn.getSubtarget<PPCSubtarget>().isELFv2ABI() &&
+ !Fn.getRegInfo().use_empty(PPC::X2))
+ InitialOffset = 8;
+
// Measure each MBB and compute a size for the entire function.
- unsigned FuncSize = 0;
+ unsigned FuncSize = InitialOffset;
for (MachineFunction::iterator MFI = Fn.begin(), E = Fn.end(); MFI != E;
++MFI) {
MachineBasicBlock *MBB = &*MFI;
@@ -102,15 +115,19 @@ bool PPCBSel::runOnMachineFunction(MachineFunction &Fn) {
// alignment requirement.
if (MBB->getNumber() > 0) {
unsigned AlignExtra = GetAlignmentAdjustment(*MBB, FuncSize);
- BlockSizes[MBB->getNumber()-1] += AlignExtra;
+
+ auto &BS = BlockSizes[MBB->getNumber()-1];
+ BS.first += AlignExtra;
+ BS.second = AlignExtra;
+
FuncSize += AlignExtra;
}
unsigned BlockSize = 0;
for (MachineInstr &MI : *MBB)
- BlockSize += TII->GetInstSizeInBytes(MI);
+ BlockSize += TII->getInstSizeInBytes(MI);
- BlockSizes[MBB->getNumber()] = BlockSize;
+ BlockSizes[MBB->getNumber()].first = BlockSize;
FuncSize += BlockSize;
}
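A worked example of the padding bookkeeping above (a minimal model assuming
power-of-two alignment; offsetToAlignment here is a simplified stand-in for
the LLVM helper the pass calls):

    // Bytes of padding needed to raise Offset to the next Align boundary.
    static unsigned offsetToAlignment(unsigned Offset, unsigned Align) {
      return (Align - (Offset % Align)) % Align;
    }
    // With the ELFv2 global-entry prologue, block 0 starts at offset 8; if it
    // is 20 bytes long, a successor aligned to 16 needs
    // offsetToAlignment(28, 16) == 4 bytes of padding, tracked in BS.second.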
@@ -155,7 +172,7 @@ bool PPCBSel::runOnMachineFunction(MachineFunction &Fn) {
Dest = I->getOperand(0).getMBB();
if (!Dest) {
- MBBStartOffset += TII->GetInstSizeInBytes(*I);
+ MBBStartOffset += TII->getInstSizeInBytes(*I);
continue;
}
@@ -169,14 +186,14 @@ bool PPCBSel::runOnMachineFunction(MachineFunction &Fn) {
BranchSize = MBBStartOffset;
for (unsigned i = Dest->getNumber(), e = MBB.getNumber(); i != e; ++i)
- BranchSize += BlockSizes[i];
+ BranchSize += BlockSizes[i].first;
} else {
// Otherwise, add the size of the blocks between this block and the
// dest to the number of bytes left in this block.
BranchSize = -MBBStartOffset;
for (unsigned i = MBB.getNumber(), e = Dest->getNumber(); i != e; ++i)
- BranchSize += BlockSizes[i];
+ BranchSize += BlockSizes[i].first;
}
// If this branch is in range, ignore it.
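The in-range test that follows depends on the encoding limits of the
conditional branch. A hedged sketch (the signed 16-bit byte distance is what
the 14-bit word displacement of BCC implies; the helper name is illustrative):

    #include <cstdint>
    // Conditional PPC branches encode a 14-bit signed word displacement,
    // i.e. roughly a signed 16-bit byte distance.
    static bool inConditionalBranchRange(int64_t BranchSizeBytes) {
      return BranchSizeBytes >= INT16_MIN && BranchSizeBytes <= INT16_MAX;
    }
    // Branches that fail this test are rewritten below into an inverted BCC
    // around an unconditional B, growing the block by 4 bytes.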
@@ -186,9 +203,9 @@ bool PPCBSel::runOnMachineFunction(MachineFunction &Fn) {
}
// Otherwise, we have to expand it to a long branch.
- MachineInstr *OldBranch = I;
- DebugLoc dl = OldBranch->getDebugLoc();
-
+ MachineInstr &OldBranch = *I;
+ DebugLoc dl = OldBranch.getDebugLoc();
+
if (I->getOpcode() == PPC::BCC) {
// The BCC operands are:
// 0. PPC branch predicate
@@ -222,16 +239,42 @@ bool PPCBSel::runOnMachineFunction(MachineFunction &Fn) {
I = BuildMI(MBB, I, dl, TII->get(PPC::B)).addMBB(Dest);
// Remove the old branch from the function.
- OldBranch->eraseFromParent();
-
+ OldBranch.eraseFromParent();
+
// Remember that this instruction sequence is 8 bytes, increase the size of the
// block by 4, and remember to iterate.
- BlockSizes[MBB.getNumber()] += 4;
+ BlockSizes[MBB.getNumber()].first += 4;
MBBStartOffset += 8;
++NumExpanded;
MadeChange = true;
}
}
+
+ if (MadeChange) {
+ // If we're going to iterate again, make sure we've updated our
+ // padding-based contributions to the block sizes.
+ unsigned Offset = InitialOffset;
+ for (MachineFunction::iterator MFI = Fn.begin(), E = Fn.end(); MFI != E;
+ ++MFI) {
+ MachineBasicBlock *MBB = &*MFI;
+
+ if (MBB->getNumber() > 0) {
+ auto &BS = BlockSizes[MBB->getNumber()-1];
+ BS.first -= BS.second;
+ Offset -= BS.second;
+
+ unsigned AlignExtra = GetAlignmentAdjustment(*MBB, Offset);
+
+ BS.first += AlignExtra;
+ BS.second = AlignExtra;
+
+ Offset += AlignExtra;
+ }
+
+ Offset += BlockSizes[MBB->getNumber()].first;
+ }
+ }
+
EverMadeChange |= MadeChange;
}
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCCTRLoops.cpp b/contrib/llvm/lib/Target/PowerPC/PPCCTRLoops.cpp
index 8752266..2c62a0f 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCCTRLoops.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCCTRLoops.cpp
@@ -618,9 +618,9 @@ bool PPCCTRLoops::convertToCTRLoop(Loop *L) {
}
#ifndef NDEBUG
-static bool clobbersCTR(const MachineInstr *MI) {
- for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
- const MachineOperand &MO = MI->getOperand(i);
+static bool clobbersCTR(const MachineInstr &MI) {
+ for (unsigned i = 0, e = MI.getNumOperands(); i != e; ++i) {
+ const MachineOperand &MO = MI.getOperand(i);
if (MO.isReg()) {
if (MO.isDef() && (MO.getReg() == PPC::CTR || MO.getReg() == PPC::CTR8))
return true;
@@ -659,7 +659,7 @@ check_block:
break;
}
- if (I != BI && clobbersCTR(I)) {
+ if (I != BI && clobbersCTR(*I)) {
DEBUG(dbgs() << "BB#" << MBB->getNumber() << " (" <<
MBB->getFullName() << ") instruction " << *I <<
" clobbers CTR, invalidating " << "BB#" <<
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCCallingConv.td b/contrib/llvm/lib/Target/PowerPC/PPCCallingConv.td
index 53d2f77..a4f4c86 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCCallingConv.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPCCallingConv.td
@@ -26,6 +26,9 @@ class CCIfNotSubtarget<string F, CCAction A>
class CCIfOrigArgWasNotPPCF128<CCAction A>
: CCIf<"!static_cast<PPCCCState *>(&State)->WasOriginalArgPPCF128(ValNo)",
A>;
+class CCIfOrigArgWasPPCF128<CCAction A>
+ : CCIf<"static_cast<PPCCCState *>(&State)->WasOriginalArgPPCF128(ValNo)",
+ A>;
//===----------------------------------------------------------------------===//
// Return Value Calling Convention
@@ -65,11 +68,9 @@ def RetCC_PPC : CallingConv<[
// Vector types returned as "direct" go into V2 .. V9; note that only the
// ELFv2 ABI fully utilizes all these registers.
- CCIfType<[v16i8, v8i16, v4i32, v2i64, v1i128, v4f32],
+ CCIfType<[v16i8, v8i16, v4i32, v2i64, v1i128, v4f32, v2f64],
CCIfSubtarget<"hasAltivec()",
- CCAssignToReg<[V2, V3, V4, V5, V6, V7, V8, V9]>>>,
- CCIfType<[v2f64, v2i64], CCIfSubtarget<"hasVSX()",
- CCAssignToReg<[VSH2, VSH3, VSH4, VSH5, VSH6, VSH7, VSH8, VSH9]>>>
+ CCAssignToReg<[V2, V3, V4, V5, V6, V7, V8, V9]>>>
]>;
// No explicit register is specified for the AnyReg calling convention. The
@@ -118,11 +119,9 @@ def RetCC_PPC64_ELF_FIS : CallingConv<[
CCIfType<[f64], CCAssignToReg<[F1, F2, F3, F4, F5, F6, F7, F8]>>,
CCIfType<[v4f64, v4f32, v4i1],
CCIfSubtarget<"hasQPX()", CCAssignToReg<[QF1, QF2]>>>,
- CCIfType<[v16i8, v8i16, v4i32, v2i64, v1i128, v4f32],
+ CCIfType<[v16i8, v8i16, v4i32, v2i64, v1i128, v4f32, v2f64],
CCIfSubtarget<"hasAltivec()",
- CCAssignToReg<[V2, V3, V4, V5, V6, V7, V8, V9]>>>,
- CCIfType<[v2f64, v2i64], CCIfSubtarget<"hasVSX()",
- CCAssignToReg<[VSH2, VSH3, VSH4, VSH5, VSH6, VSH7, VSH8, VSH9]>>>
+ CCAssignToReg<[V2, V3, V4, V5, V6, V7, V8, V9]>>>
]>;
//===----------------------------------------------------------------------===//
@@ -142,6 +141,9 @@ def CC_PPC32_SVR4_Common : CallingConv<[
CCIfType<[i32],
CCIfSplit<CCIfNotSubtarget<"useSoftFloat()",
CCCustom<"CC_PPC32_SVR4_Custom_AlignArgRegs">>>>,
+ CCIfSplit<CCIfSubtarget<"useSoftFloat()",
+ CCIfOrigArgWasPPCF128<CCCustom<
+ "CC_PPC32_SVR4_Custom_SkipLastArgRegsPPCF128">>>>,
// The 'nest' parameter, if any, is passed in R11.
CCIfNest<CCAssignToReg<[R11]>>,
@@ -187,12 +189,9 @@ def CC_PPC32_SVR4 : CallingConv<[
CCAssignToReg<[QF1, QF2, QF3, QF4, QF5, QF6, QF7, QF8]>>>,
// The first 12 Vector arguments are passed in AltiVec registers.
- CCIfType<[v16i8, v8i16, v4i32, v2i64, v1i128, v4f32],
+ CCIfType<[v16i8, v8i16, v4i32, v2i64, v1i128, v4f32, v2f64],
CCIfSubtarget<"hasAltivec()", CCAssignToReg<[V2, V3, V4, V5, V6, V7,
V8, V9, V10, V11, V12, V13]>>>,
- CCIfType<[v2f64, v2i64], CCIfSubtarget<"hasVSX()",
- CCAssignToReg<[VSH2, VSH3, VSH4, VSH5, VSH6, VSH7, VSH8, VSH9,
- VSH10, VSH11, VSH12, VSH13]>>>,
CCDelegateTo<CC_PPC32_SVR4_Common>
]>;
@@ -281,6 +280,5 @@ def CSR_64_AllRegs_Altivec : CalleeSavedRegs<(add CSR_64_AllRegs,
(sequence "V%u", 0, 31))>;
def CSR_64_AllRegs_VSX : CalleeSavedRegs<(add CSR_64_AllRegs_Altivec,
- (sequence "VSL%u", 0, 31),
- (sequence "VSH%u", 0, 31))>;
+ (sequence "VSL%u", 0, 31))>;
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCEarlyReturn.cpp b/contrib/llvm/lib/Target/PowerPC/PPCEarlyReturn.cpp
index fcd2f50..6bd2296 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCEarlyReturn.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCEarlyReturn.cpp
@@ -58,7 +58,7 @@ protected:
bool Changed = false;
MachineBasicBlock::iterator I = ReturnMBB.begin();
- I = ReturnMBB.SkipPHIsAndLabels(I);
+ I = ReturnMBB.SkipPHIsLabelsAndDebug(I);
// The block must be essentially empty except for the blr.
if (I == ReturnMBB.end() ||
@@ -196,7 +196,7 @@ public:
MachineFunctionProperties getRequiredProperties() const override {
return MachineFunctionProperties().set(
- MachineFunctionProperties::Property::AllVRegsAllocated);
+ MachineFunctionProperties::Property::NoVRegs);
}
void getAnalysisUsage(AnalysisUsage &AU) const override {
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCFastISel.cpp b/contrib/llvm/lib/Target/PowerPC/PPCFastISel.cpp
index 7e92042..9b91b9a 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCFastISel.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCFastISel.cpp
@@ -146,11 +146,11 @@ class PPCFastISel final : public FastISel {
bool isTypeLegal(Type *Ty, MVT &VT);
bool isLoadTypeLegal(Type *Ty, MVT &VT);
bool isValueAvailable(const Value *V) const;
- bool isVSFRCRegister(unsigned Register) const {
- return MRI.getRegClass(Register)->getID() == PPC::VSFRCRegClassID;
+ bool isVSFRCRegClass(const TargetRegisterClass *RC) const {
+ return RC->getID() == PPC::VSFRCRegClassID;
}
- bool isVSSRCRegister(unsigned Register) const {
- return MRI.getRegClass(Register)->getID() == PPC::VSSRCRegClassID;
+ bool isVSSRCRegClass(const TargetRegisterClass *RC) const {
+ return RC->getID() == PPC::VSSRCRegClassID;
}
bool PPCEmitCmp(const Value *Src1Value, const Value *Src2Value,
bool isZExt, unsigned DestReg);
@@ -358,7 +358,7 @@ bool PPCFastISel::PPCComputeAddress(const Value *Obj, Address &Addr) {
for (User::const_op_iterator II = U->op_begin() + 1, IE = U->op_end();
II != IE; ++II, ++GTI) {
const Value *Op = *II;
- if (StructType *STy = dyn_cast<StructType>(*GTI)) {
+ if (StructType *STy = GTI.getStructTypeOrNull()) {
const StructLayout *SL = DL.getStructLayout(STy);
unsigned Idx = cast<ConstantInt>(Op)->getZExtValue();
TmpOffset += SL->getElementOffset(Idx);
@@ -458,7 +458,7 @@ void PPCFastISel::PPCSimplifyAddress(Address &Addr, bool &UseOffset,
// Emit a load instruction if possible, returning true if we succeeded,
// otherwise false. See commentary below for how the register class of
-// the load is determined.
+// the load is determined.
bool PPCFastISel::PPCEmitLoad(MVT VT, unsigned &ResultReg, Address &Addr,
const TargetRegisterClass *RC,
bool IsZExt, unsigned FP64LoadOpc) {
@@ -489,20 +489,18 @@ bool PPCFastISel::PPCEmitLoad(MVT VT, unsigned &ResultReg, Address &Addr,
Opc = Is32BitInt ? PPC::LBZ : PPC::LBZ8;
break;
case MVT::i16:
- Opc = (IsZExt ?
- (Is32BitInt ? PPC::LHZ : PPC::LHZ8) :
- (Is32BitInt ? PPC::LHA : PPC::LHA8));
+ Opc = (IsZExt ? (Is32BitInt ? PPC::LHZ : PPC::LHZ8)
+ : (Is32BitInt ? PPC::LHA : PPC::LHA8));
break;
case MVT::i32:
- Opc = (IsZExt ?
- (Is32BitInt ? PPC::LWZ : PPC::LWZ8) :
- (Is32BitInt ? PPC::LWA_32 : PPC::LWA));
+ Opc = (IsZExt ? (Is32BitInt ? PPC::LWZ : PPC::LWZ8)
+ : (Is32BitInt ? PPC::LWA_32 : PPC::LWA));
if ((Opc == PPC::LWA || Opc == PPC::LWA_32) && ((Addr.Offset & 3) != 0))
UseOffset = false;
break;
case MVT::i64:
Opc = PPC::LD;
- assert(UseRC->hasSuperClassEq(&PPC::G8RCRegClass) &&
+ assert(UseRC->hasSuperClassEq(&PPC::G8RCRegClass) &&
"64-bit load with 32-bit target??");
UseOffset = ((Addr.Offset & 3) == 0);
break;
@@ -521,10 +519,10 @@ bool PPCFastISel::PPCEmitLoad(MVT VT, unsigned &ResultReg, Address &Addr,
// If this is a potential VSX load with an offset of 0, a VSX indexed load can
// be used.
- bool IsVSSRC = (ResultReg != 0) && isVSSRCRegister(ResultReg);
- bool IsVSFRC = (ResultReg != 0) && isVSFRCRegister(ResultReg);
+ bool IsVSSRC = isVSSRCRegClass(UseRC);
+ bool IsVSFRC = isVSFRCRegClass(UseRC);
bool Is32VSXLoad = IsVSSRC && Opc == PPC::LFS;
- bool Is64VSXLoad = IsVSSRC && Opc == PPC::LFD;
+ bool Is64VSXLoad = IsVSFRC && Opc == PPC::LFD;
if ((Is32VSXLoad || Is64VSXLoad) &&
(Addr.BaseType != Address::FrameIndexBase) && UseOffset &&
(Addr.Offset == 0)) {
@@ -579,8 +577,18 @@ bool PPCFastISel::PPCEmitLoad(MVT VT, unsigned &ResultReg, Address &Addr,
case PPC::LFS: Opc = IsVSSRC ? PPC::LXSSPX : PPC::LFSX; break;
case PPC::LFD: Opc = IsVSFRC ? PPC::LXSDX : PPC::LFDX; break;
}
- BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), ResultReg)
- .addReg(Addr.Base.Reg).addReg(IndexReg);
+
+ auto MIB = BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc),
+ ResultReg);
+
+ // If we have an index register defined we use it in the load inst,
+ // otherwise we use X0 as the base, as it makes the vector instructions
+ // use zero in the computation of the effective address regardless of
+ // the content of the register.
+ if (IndexReg)
+ MIB.addReg(Addr.Base.Reg).addReg(IndexReg);
+ else
+ MIB.addReg(PPC::ZERO8).addReg(Addr.Base.Reg);
}
return true;
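The ZERO8 choice above follows the PowerPC effective-address rule for X-form
loads: an RA field of 0 reads as the literal value zero, not the contents of
r0. Illustration kept in comments (the assembly is illustrative):

    // X-form EA computation: EA = (RA == 0 ? 0 : GPR[RA]) + GPR[RB].
    //   lxsdx vsN, 0, rBase     // EA = 0 + rBase    (the else-branch above)
    //   lxsdx vsN, rBase, rIdx  // EA = rBase + rIdx (when an index exists)
    // Keeping the real base register in RB therefore yields the right address
    // whether or not an index register was computed.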
@@ -657,8 +665,8 @@ bool PPCFastISel::PPCEmitStore(MVT VT, unsigned SrcReg, Address &Addr) {
// If this is a potential VSX store with an offset of 0, a VSX indexed store
// can be used.
- bool IsVSSRC = isVSSRCRegister(SrcReg);
- bool IsVSFRC = isVSFRCRegister(SrcReg);
+ bool IsVSSRC = isVSSRCRegClass(RC);
+ bool IsVSFRC = isVSFRCRegClass(RC);
bool Is32VSXStore = IsVSSRC && Opc == PPC::STFS;
bool Is64VSXStore = IsVSFRC && Opc == PPC::STFD;
if ((Is32VSXStore || Is64VSXStore) &&
@@ -689,8 +697,9 @@ bool PPCFastISel::PPCEmitStore(MVT VT, unsigned SrcReg, Address &Addr) {
// Base reg with offset in range.
} else if (UseOffset) {
// VSX only provides an indexed store.
- if (Is32VSXStore || Is64VSXStore) return false;
-
+ if (Is32VSXStore || Is64VSXStore)
+ return false;
+
BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc))
.addReg(SrcReg).addImm(Addr.Offset).addReg(Addr.Base.Reg);
@@ -828,7 +837,7 @@ bool PPCFastISel::PPCEmitCmp(const Value *SrcValue1, const Value *SrcValue2,
long Imm = 0;
bool UseImm = false;
- // Only 16-bit integer constants can be represented in compares for
+ // Only 16-bit integer constants can be represented in compares for
// PowerPC. Others will be materialized into a register.
if (const ConstantInt *ConstInt = dyn_cast<ConstantInt>(SrcValue2)) {
if (SrcVT == MVT::i64 || SrcVT == MVT::i32 || SrcVT == MVT::i16 ||
@@ -1617,7 +1626,7 @@ bool PPCFastISel::SelectRet(const Instruction *I) {
CCState CCInfo(CC, F.isVarArg(), *FuncInfo.MF, ValLocs, *Context);
CCInfo.AnalyzeReturn(Outs, RetCC_PPC64_ELF_FIS);
const Value *RV = Ret->getOperand(0);
-
+
// FIXME: Only one output register for now.
if (ValLocs.size() > 1)
return false;
@@ -1663,7 +1672,7 @@ bool PPCFastISel::SelectRet(const Instruction *I) {
if (RVVT != DestVT && RVVT != MVT::i8 &&
RVVT != MVT::i16 && RVVT != MVT::i32)
return false;
-
+
if (RVVT != DestVT) {
switch (VA.getLocInfo()) {
default:
@@ -1907,7 +1916,9 @@ unsigned PPCFastISel::PPCMaterializeFP(const ConstantFP *CFP, MVT VT) {
unsigned Align = DL.getPrefTypeAlignment(CFP->getType());
assert(Align > 0 && "Unexpectedly missing alignment information!");
unsigned Idx = MCP.getConstantPoolIndex(cast<Constant>(CFP), Align);
- unsigned DestReg = createResultReg(TLI.getRegClassFor(VT));
+ const TargetRegisterClass *RC =
+ (VT == MVT::f32) ? &PPC::F4RCRegClass : &PPC::F8RCRegClass;
+ unsigned DestReg = createResultReg(RC);
CodeModel::Model CModel = TM.getCodeModel();
MachineMemOperand *MMO = FuncInfo.MF->getMachineMemOperand(
@@ -1936,8 +1947,9 @@ unsigned PPCFastISel::PPCMaterializeFP(const ConstantFP *CFP, MVT VT) {
BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(PPC::LDtocL),
TmpReg2).addConstantPoolIndex(Idx).addReg(TmpReg);
BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), DestReg)
- .addImm(0).addReg(TmpReg2);
- } else
+ .addImm(0)
+ .addReg(TmpReg2);
+ } else
BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc, TII.get(Opc), DestReg)
.addConstantPoolIndex(Idx, 0, PPCII::MO_TOC_LO)
.addReg(TmpReg)
@@ -2028,8 +2040,8 @@ unsigned PPCFastISel::PPCMaterialize32BitInt(int64_t Imm,
// Just Hi bits.
BuildMI(*FuncInfo.MBB, FuncInfo.InsertPt, DbgLoc,
TII.get(IsGPRC ? PPC::LIS : PPC::LIS8), ResultReg)
- .addImm(Hi);
-
+ .addImm(Hi);
+
return ResultReg;
}
@@ -2145,7 +2157,12 @@ unsigned PPCFastISel::fastMaterializeConstant(const Constant *C) {
else if (const GlobalValue *GV = dyn_cast<GlobalValue>(C))
return PPCMaterializeGV(GV, VT);
else if (const ConstantInt *CI = dyn_cast<ConstantInt>(C))
- return PPCMaterializeInt(CI, VT, VT != MVT::i1);
+ // Note that the code in FunctionLoweringInfo::ComputePHILiveOutRegInfo
+ // assumes that constant PHI operands will be zero extended, and failure to
+ // match that assumption will cause problems if we sign extend here but
+ // some user of a PHI is in a block for which we fall back to full SDAG
+ // instruction selection.
+ return PPCMaterializeInt(CI, VT, false);
return 0;
}
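A hedged before/after of the miscompile the comment above guards against
(values illustrative):

    // Materializing the i1 constant 'true' into a 64-bit register:
    //   zero-extended: 0x0000000000000001  // what PHI live-out analysis assumes
    //   sign-extended: 0xFFFFFFFFFFFFFFFF  // what a signed materialization gives
    // If a PHI user falls back to full SDAG selection in another block, the
    // recorded live-out value and the actual register contents would disagree.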
@@ -2263,7 +2280,7 @@ bool PPCFastISel::fastLowerArguments() {
// Handle materializing integer constants into a register. This is not
// automatically generated for PowerPC, so must be explicitly created here.
unsigned PPCFastISel::fastEmit_i(MVT Ty, MVT VT, unsigned Opc, uint64_t Imm) {
-
+
if (Opc != ISD::Constant)
return 0;
@@ -2276,8 +2293,8 @@ unsigned PPCFastISel::fastEmit_i(MVT Ty, MVT VT, unsigned Opc, uint64_t Imm) {
return ImmReg;
}
- if (VT != MVT::i64 && VT != MVT::i32 && VT != MVT::i16 &&
- VT != MVT::i8 && VT != MVT::i1)
+ if (VT != MVT::i64 && VT != MVT::i32 && VT != MVT::i16 && VT != MVT::i8 &&
+ VT != MVT::i1)
return 0;
const TargetRegisterClass *RC = ((VT == MVT::i64) ? &PPC::G8RCRegClass :
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp b/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
index c3a5d3c..e786ef9 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCFrameLowering.cpp
@@ -253,8 +253,8 @@ const PPCFrameLowering::SpillSlot *PPCFrameLowering::getCalleeSavedSpillSlots(
/// contents is spilled and reloaded around the call. Without the prolog code,
/// the spill instruction refers to an undefined register. This code needs
/// to account for all uses of that GPR.
-static void RemoveVRSaveCode(MachineInstr *MI) {
- MachineBasicBlock *Entry = MI->getParent();
+static void RemoveVRSaveCode(MachineInstr &MI) {
+ MachineBasicBlock *Entry = MI.getParent();
MachineFunction *MF = Entry->getParent();
// We know that the MTVRSAVE instruction immediately follows MI. Remove it.
@@ -293,16 +293,16 @@ static void RemoveVRSaveCode(MachineInstr *MI) {
}
// Finally, nuke the UPDATE_VRSAVE.
- MI->eraseFromParent();
+ MI.eraseFromParent();
}
// HandleVRSaveUpdate - MI is the UPDATE_VRSAVE instruction introduced by the
// instruction selector. Based on the vector registers that have been used,
// transform this into the appropriate ORI instruction.
-static void HandleVRSaveUpdate(MachineInstr *MI, const TargetInstrInfo &TII) {
- MachineFunction *MF = MI->getParent()->getParent();
+static void HandleVRSaveUpdate(MachineInstr &MI, const TargetInstrInfo &TII) {
+ MachineFunction *MF = MI.getParent()->getParent();
const TargetRegisterInfo *TRI = MF->getSubtarget().getRegisterInfo();
- DebugLoc dl = MI->getDebugLoc();
+ DebugLoc dl = MI.getDebugLoc();
const MachineRegisterInfo &MRI = MF->getRegInfo();
unsigned UsedRegMask = 0;
@@ -343,44 +343,44 @@ static void HandleVRSaveUpdate(MachineInstr *MI, const TargetInstrInfo &TII) {
return;
}
- unsigned SrcReg = MI->getOperand(1).getReg();
- unsigned DstReg = MI->getOperand(0).getReg();
+ unsigned SrcReg = MI.getOperand(1).getReg();
+ unsigned DstReg = MI.getOperand(0).getReg();
if ((UsedRegMask & 0xFFFF) == UsedRegMask) {
if (DstReg != SrcReg)
- BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORI), DstReg)
- .addReg(SrcReg)
- .addImm(UsedRegMask);
+ BuildMI(*MI.getParent(), MI, dl, TII.get(PPC::ORI), DstReg)
+ .addReg(SrcReg)
+ .addImm(UsedRegMask);
else
- BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORI), DstReg)
- .addReg(SrcReg, RegState::Kill)
- .addImm(UsedRegMask);
+ BuildMI(*MI.getParent(), MI, dl, TII.get(PPC::ORI), DstReg)
+ .addReg(SrcReg, RegState::Kill)
+ .addImm(UsedRegMask);
} else if ((UsedRegMask & 0xFFFF0000) == UsedRegMask) {
if (DstReg != SrcReg)
- BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORIS), DstReg)
- .addReg(SrcReg)
- .addImm(UsedRegMask >> 16);
+ BuildMI(*MI.getParent(), MI, dl, TII.get(PPC::ORIS), DstReg)
+ .addReg(SrcReg)
+ .addImm(UsedRegMask >> 16);
else
- BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORIS), DstReg)
- .addReg(SrcReg, RegState::Kill)
- .addImm(UsedRegMask >> 16);
+ BuildMI(*MI.getParent(), MI, dl, TII.get(PPC::ORIS), DstReg)
+ .addReg(SrcReg, RegState::Kill)
+ .addImm(UsedRegMask >> 16);
} else {
if (DstReg != SrcReg)
- BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORIS), DstReg)
- .addReg(SrcReg)
- .addImm(UsedRegMask >> 16);
+ BuildMI(*MI.getParent(), MI, dl, TII.get(PPC::ORIS), DstReg)
+ .addReg(SrcReg)
+ .addImm(UsedRegMask >> 16);
else
- BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORIS), DstReg)
- .addReg(SrcReg, RegState::Kill)
- .addImm(UsedRegMask >> 16);
+ BuildMI(*MI.getParent(), MI, dl, TII.get(PPC::ORIS), DstReg)
+ .addReg(SrcReg, RegState::Kill)
+ .addImm(UsedRegMask >> 16);
- BuildMI(*MI->getParent(), MI, dl, TII.get(PPC::ORI), DstReg)
- .addReg(DstReg, RegState::Kill)
- .addImm(UsedRegMask & 0xFFFF);
+ BuildMI(*MI.getParent(), MI, dl, TII.get(PPC::ORI), DstReg)
+ .addReg(DstReg, RegState::Kill)
+ .addImm(UsedRegMask & 0xFFFF);
}
// Remove the old UPDATE_VRSAVE instruction.
- MI->eraseFromParent();
+ MI.eraseFromParent();
}
static bool spillsCR(const MachineFunction &MF) {
@@ -422,15 +422,15 @@ static bool MustSaveLR(const MachineFunction &MF, unsigned LR) {
unsigned PPCFrameLowering::determineFrameLayout(MachineFunction &MF,
bool UpdateMF,
bool UseEstimate) const {
- MachineFrameInfo *MFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
// Get the number of bytes to allocate from the FrameInfo
unsigned FrameSize =
- UseEstimate ? MFI->estimateStackSize(MF) : MFI->getStackSize();
+ UseEstimate ? MFI.estimateStackSize(MF) : MFI.getStackSize();
// Get stack alignments. The frame must be aligned to the greatest of these:
unsigned TargetAlign = getStackAlignment(); // alignment required per the ABI
- unsigned MaxAlign = MFI->getMaxAlignment(); // algmt required by data in frame
+ unsigned MaxAlign = MFI.getMaxAlignment(); // algmt required by data in frame
unsigned AlignMask = std::max(MaxAlign, TargetAlign) - 1;
const PPCRegisterInfo *RegInfo =
@@ -448,18 +448,18 @@ unsigned PPCFrameLowering::determineFrameLayout(MachineFunction &MF,
!Subtarget.isSVR4ABI() || // allocated locals.
FrameSize == 0) &&
FrameSize <= 224 && // Fits in red zone.
- !MFI->hasVarSizedObjects() && // No dynamic alloca.
- !MFI->adjustsStack() && // No calls.
+ !MFI.hasVarSizedObjects() && // No dynamic alloca.
+ !MFI.adjustsStack() && // No calls.
!MustSaveLR(MF, LR) &&
!RegInfo->hasBasePointer(MF)) { // No special alignment.
// No need for frame
if (UpdateMF)
- MFI->setStackSize(0);
+ MFI.setStackSize(0);
return 0;
}
// Get the maximum call frame size of all the calls.
- unsigned maxCallFrameSize = MFI->getMaxCallFrameSize();
+ unsigned maxCallFrameSize = MFI.getMaxCallFrameSize();
// Maximum call frame needs to be at least big enough for linkage area.
unsigned minCallFrameSize = getLinkageSize();
@@ -467,12 +467,12 @@ unsigned PPCFrameLowering::determineFrameLayout(MachineFunction &MF,
// If we have dynamic alloca then maxCallFrameSize needs to be aligned so
// that allocations will be aligned.
- if (MFI->hasVarSizedObjects())
+ if (MFI.hasVarSizedObjects())
maxCallFrameSize = (maxCallFrameSize + AlignMask) & ~AlignMask;
// Update maximum call frame size.
if (UpdateMF)
- MFI->setMaxCallFrameSize(maxCallFrameSize);
+ MFI.setMaxCallFrameSize(maxCallFrameSize);
// Include call frame size in total.
FrameSize += maxCallFrameSize;
@@ -482,7 +482,7 @@ unsigned PPCFrameLowering::determineFrameLayout(MachineFunction &MF,
// Update frame info.
if (UpdateMF)
- MFI->setStackSize(FrameSize);
+ MFI.setStackSize(FrameSize);
return FrameSize;
}
@@ -490,18 +490,18 @@ unsigned PPCFrameLowering::determineFrameLayout(MachineFunction &MF,
// hasFP - Return true if the specified function actually has a dedicated frame
// pointer register.
bool PPCFrameLowering::hasFP(const MachineFunction &MF) const {
- const MachineFrameInfo *MFI = MF.getFrameInfo();
+ const MachineFrameInfo &MFI = MF.getFrameInfo();
// FIXME: This is pretty much broken by design: hasFP() might be called really
// early, before the stack layout was calculated and thus hasFP() might return
// true or false here depending on the time of call.
- return (MFI->getStackSize()) && needsFP(MF);
+ return (MFI.getStackSize()) && needsFP(MF);
}
// needsFP - Return true if the specified function should have a dedicated frame
// pointer register. This is true if the function has variable sized allocas or
// if frame pointer elimination is disabled.
bool PPCFrameLowering::needsFP(const MachineFunction &MF) const {
- const MachineFrameInfo *MFI = MF.getFrameInfo();
+ const MachineFrameInfo &MFI = MF.getFrameInfo();
// Naked functions have no stack frame pushed, so we don't have a frame
// pointer.
@@ -509,8 +509,7 @@ bool PPCFrameLowering::needsFP(const MachineFunction &MF) const {
return false;
return MF.getTarget().Options.DisableFramePointerElim(MF) ||
- MFI->hasVarSizedObjects() ||
- MFI->hasStackMap() || MFI->hasPatchPoint() ||
+ MFI.hasVarSizedObjects() || MFI.hasStackMap() || MFI.hasPatchPoint() ||
(MF.getTarget().Options.GuaranteedTailCallOpt &&
MF.getInfo<PPCFunctionInfo>()->hasFastCall());
}
@@ -671,8 +670,8 @@ PPCFrameLowering::twoUniqueScratchRegsRequired(MachineBasicBlock *MBB) const {
unsigned FrameSize = determineFrameLayout(MF, false);
int NegFrameSize = -FrameSize;
bool IsLargeFrame = !isInt<16>(NegFrameSize);
- MachineFrameInfo *MFI = MF.getFrameInfo();
- unsigned MaxAlign = MFI->getMaxAlignment();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
+ unsigned MaxAlign = MFI.getMaxAlignment();
bool HasRedZone = Subtarget.isPPC64() || !Subtarget.isSVR4ABI();
return (IsLargeFrame || !HasRedZone) && HasBP && MaxAlign > 1;
@@ -694,7 +693,7 @@ bool PPCFrameLowering::canUseAsEpilogue(const MachineBasicBlock &MBB) const {
void PPCFrameLowering::emitPrologue(MachineFunction &MF,
MachineBasicBlock &MBB) const {
MachineBasicBlock::iterator MBBI = MBB.begin();
- MachineFrameInfo *MFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
const PPCInstrInfo &TII =
*static_cast<const PPCInstrInfo *>(Subtarget.getInstrInfo());
const PPCRegisterInfo *RegInfo =
@@ -719,7 +718,7 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
if (!isSVR4ABI)
for (unsigned i = 0; MBBI != MBB.end(); ++i, ++MBBI) {
if (MBBI->getOpcode() == PPC::UPDATE_VRSAVE) {
- HandleVRSaveUpdate(MBBI, TII);
+ HandleVRSaveUpdate(*MBBI, TII);
break;
}
}
@@ -733,7 +732,7 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
if (!isInt<32>(NegFrameSize))
llvm_unreachable("Unhandled stack size!");
- if (MFI->isFrameAddressTaken())
+ if (MFI.isFrameAddressTaken())
replaceFPWithRealFP(MF);
// Check if the link register (LR) must be saved.
@@ -779,7 +778,7 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
assert((isPPC64 || !isSVR4ABI || !(!FrameSize && (MustSaveLR || HasFP))) &&
"FrameSize must be >0 to save/restore the FP or LR for 32-bit SVR4.");
- // Using the same bool variable as below to supress compiler warnings.
+ // Using the same bool variable as below to suppress compiler warnings.
bool SingleScratchReg =
findScratchRegister(&MBB, false, twoUniqueScratchRegsRequired(&MBB),
&ScratchReg, &TempReg);
@@ -793,10 +792,10 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
int FPOffset = 0;
if (HasFP) {
if (isSVR4ABI) {
- MachineFrameInfo *FFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
int FPIndex = FI->getFramePointerSaveIndex();
assert(FPIndex && "No Frame Pointer Save Slot!");
- FPOffset = FFI->getObjectOffset(FPIndex);
+ FPOffset = MFI.getObjectOffset(FPIndex);
} else {
FPOffset = getFramePointerSaveOffset();
}
@@ -805,10 +804,10 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
int BPOffset = 0;
if (HasBP) {
if (isSVR4ABI) {
- MachineFrameInfo *FFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
int BPIndex = FI->getBasePointerSaveIndex();
assert(BPIndex && "No Base Pointer Save Slot!");
- BPOffset = FFI->getObjectOffset(BPIndex);
+ BPOffset = MFI.getObjectOffset(BPIndex);
} else {
BPOffset = getBasePointerSaveOffset();
}
@@ -816,14 +815,14 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
int PBPOffset = 0;
if (FI->usesPICBase()) {
- MachineFrameInfo *FFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
int PBPIndex = FI->getPICBasePointerSaveIndex();
assert(PBPIndex && "No PIC Base Pointer Save Slot!");
- PBPOffset = FFI->getObjectOffset(PBPIndex);
+ PBPOffset = MFI.getObjectOffset(PBPIndex);
}
// Get stack alignments.
- unsigned MaxAlign = MFI->getMaxAlignment();
+ unsigned MaxAlign = MFI.getMaxAlignment();
if (HasBP && MaxAlign > 1)
assert(isPowerOf2_32(MaxAlign) && isInt<16>(MaxAlign) &&
"Invalid alignment!");
@@ -1106,12 +1105,12 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
// because if the stack needed aligning then CFA won't be at a fixed
// offset from FP/SP.
unsigned Reg = MRI->getDwarfRegNum(BPReg, true);
- CFIIndex = MMI.addFrameInst(
+ CFIIndex = MF.addFrameInst(
MCCFIInstruction::createDefCfaRegister(nullptr, Reg));
} else {
// Adjust the definition of CFA to account for the change in SP.
assert(NegFrameSize);
- CFIIndex = MMI.addFrameInst(
+ CFIIndex = MF.addFrameInst(
MCCFIInstruction::createDefCfaOffset(nullptr, NegFrameSize));
}
BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
@@ -1120,7 +1119,7 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
if (HasFP) {
// Describe where FP was saved, at a fixed offset from CFA.
unsigned Reg = MRI->getDwarfRegNum(FPReg, true);
- CFIIndex = MMI.addFrameInst(
+ CFIIndex = MF.addFrameInst(
MCCFIInstruction::createOffset(nullptr, Reg, FPOffset));
BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex);
@@ -1129,7 +1128,7 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
if (FI->usesPICBase()) {
// Describe where FP was saved, at a fixed offset from CFA.
unsigned Reg = MRI->getDwarfRegNum(PPC::R30, true);
- CFIIndex = MMI.addFrameInst(
+ CFIIndex = MF.addFrameInst(
MCCFIInstruction::createOffset(nullptr, Reg, PBPOffset));
BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex);
@@ -1138,7 +1137,7 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
if (HasBP) {
// Describe where BP was saved, at a fixed offset from CFA.
unsigned Reg = MRI->getDwarfRegNum(BPReg, true);
- CFIIndex = MMI.addFrameInst(
+ CFIIndex = MF.addFrameInst(
MCCFIInstruction::createOffset(nullptr, Reg, BPOffset));
BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex);
@@ -1147,7 +1146,7 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
if (MustSaveLR) {
// Describe where LR was saved, at a fixed offset from CFA.
unsigned Reg = MRI->getDwarfRegNum(LRReg, true);
- CFIIndex = MMI.addFrameInst(
+ CFIIndex = MF.addFrameInst(
MCCFIInstruction::createOffset(nullptr, Reg, LROffset));
BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex);
@@ -1164,7 +1163,7 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
// Change the definition of CFA from SP+offset to FP+offset, because SP
// will change at every alloca.
unsigned Reg = MRI->getDwarfRegNum(FPReg, true);
- unsigned CFIIndex = MMI.addFrameInst(
+ unsigned CFIIndex = MF.addFrameInst(
MCCFIInstruction::createDefCfaRegister(nullptr, Reg));
BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
@@ -1175,7 +1174,7 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
if (needsCFI) {
// Describe where callee saved registers were saved, at fixed offsets from
// CFA.
- const std::vector<CalleeSavedInfo> &CSI = MFI->getCalleeSavedInfo();
+ const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
for (unsigned I = 0, E = CSI.size(); I != E; ++I) {
unsigned Reg = CSI[I].getReg();
if (Reg == PPC::LR || Reg == PPC::LR8 || Reg == PPC::RM) continue;
@@ -1198,15 +1197,15 @@ void PPCFrameLowering::emitPrologue(MachineFunction &MF,
// the whole CR word. In the ELFv2 ABI, every CR that was
// actually saved gets its own CFI record.
unsigned CRReg = isELFv2ABI? Reg : (unsigned) PPC::CR2;
- unsigned CFIIndex = MMI.addFrameInst(MCCFIInstruction::createOffset(
+ unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
nullptr, MRI->getDwarfRegNum(CRReg, true), 8));
BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex);
continue;
}
- int Offset = MFI->getObjectOffset(CSI[I].getFrameIdx());
- unsigned CFIIndex = MMI.addFrameInst(MCCFIInstruction::createOffset(
+ int Offset = MFI.getObjectOffset(CSI[I].getFrameIdx());
+ unsigned CFIIndex = MF.addFrameInst(MCCFIInstruction::createOffset(
nullptr, MRI->getDwarfRegNum(Reg, true), Offset));
BuildMI(MBB, MBBI, dl, TII.get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex);
@@ -1228,10 +1227,10 @@ void PPCFrameLowering::emitEpilogue(MachineFunction &MF,
static_cast<const PPCRegisterInfo *>(Subtarget.getRegisterInfo());
// Get alignment info so we know how to restore the SP.
- const MachineFrameInfo *MFI = MF.getFrameInfo();
+ const MachineFrameInfo &MFI = MF.getFrameInfo();
// Get the number of bytes allocated from the FrameInfo.
- int FrameSize = MFI->getStackSize();
+ int FrameSize = MFI.getStackSize();
// Get processor type.
bool isPPC64 = Subtarget.isPPC64();
@@ -1272,7 +1271,7 @@ void PPCFrameLowering::emitEpilogue(MachineFunction &MF,
int FPOffset = 0;
- // Using the same bool variable as below to supress compiler warnings.
+ // Using the same bool variable as below to suppress compiler warnings.
bool SingleScratchReg = findScratchRegister(&MBB, true, false, &ScratchReg,
&TempReg);
assert(SingleScratchReg &&
@@ -1284,7 +1283,7 @@ void PPCFrameLowering::emitEpilogue(MachineFunction &MF,
if (isSVR4ABI) {
int FPIndex = FI->getFramePointerSaveIndex();
assert(FPIndex && "No Frame Pointer Save Slot!");
- FPOffset = MFI->getObjectOffset(FPIndex);
+ FPOffset = MFI.getObjectOffset(FPIndex);
} else {
FPOffset = getFramePointerSaveOffset();
}
@@ -1295,7 +1294,7 @@ void PPCFrameLowering::emitEpilogue(MachineFunction &MF,
if (isSVR4ABI) {
int BPIndex = FI->getBasePointerSaveIndex();
assert(BPIndex && "No Base Pointer Save Slot!");
- BPOffset = MFI->getObjectOffset(BPIndex);
+ BPOffset = MFI.getObjectOffset(BPIndex);
} else {
BPOffset = getBasePointerSaveOffset();
}
@@ -1305,7 +1304,7 @@ void PPCFrameLowering::emitEpilogue(MachineFunction &MF,
if (FI->usesPICBase()) {
int PBPIndex = FI->getPICBasePointerSaveIndex();
assert(PBPIndex && "No PIC Base Pointer Save Slot!");
- PBPOffset = MFI->getObjectOffset(PBPIndex);
+ PBPOffset = MFI.getObjectOffset(PBPIndex);
}
bool IsReturnBlock = (MBBI != MBB.end() && MBBI->isReturn());
@@ -1380,7 +1379,7 @@ void PPCFrameLowering::emitEpilogue(MachineFunction &MF,
.addReg(FPReg)
.addReg(ScratchReg);
}
- } else if (!isLargeFrame && !HasBP && !MFI->hasVarSizedObjects()) {
+ } else if (!isLargeFrame && !HasBP && !MFI.hasVarSizedObjects()) {
if (HasRedZone) {
BuildMI(MBB, MBBI, dl, AddImmInst, SPReg)
.addReg(SPReg)
@@ -1603,14 +1602,14 @@ void PPCFrameLowering::determineCalleeSaves(MachineFunction &MF,
int FPSI = FI->getFramePointerSaveIndex();
bool isPPC64 = Subtarget.isPPC64();
bool isDarwinABI = Subtarget.isDarwinABI();
- MachineFrameInfo *MFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
// If the frame pointer save index hasn't been defined yet.
if (!FPSI && needsFP(MF)) {
// Find out what the fix offset of the frame pointer save area.
int FPOffset = getFramePointerSaveOffset();
// Allocate the frame index for frame pointer save area.
- FPSI = MFI->CreateFixedObject(isPPC64? 8 : 4, FPOffset, true);
+ FPSI = MFI.CreateFixedObject(isPPC64? 8 : 4, FPOffset, true);
// Save the result.
FI->setFramePointerSaveIndex(FPSI);
}
@@ -1619,7 +1618,7 @@ void PPCFrameLowering::determineCalleeSaves(MachineFunction &MF,
if (!BPSI && RegInfo->hasBasePointer(MF)) {
int BPOffset = getBasePointerSaveOffset();
// Allocate the frame index for the base pointer save area.
- BPSI = MFI->CreateFixedObject(isPPC64? 8 : 4, BPOffset, true);
+ BPSI = MFI.CreateFixedObject(isPPC64? 8 : 4, BPOffset, true);
// Save the result.
FI->setBasePointerSaveIndex(BPSI);
}
@@ -1627,7 +1626,7 @@ void PPCFrameLowering::determineCalleeSaves(MachineFunction &MF,
// Reserve stack space for the PIC Base register (R30).
// Only used in SVR4 32-bit.
if (FI->usesPICBase()) {
- int PBPSI = MFI->CreateFixedObject(4, -8, true);
+ int PBPSI = MFI.CreateFixedObject(4, -8, true);
FI->setPICBasePointerSaveIndex(PBPSI);
}
@@ -1646,7 +1645,7 @@ void PPCFrameLowering::determineCalleeSaves(MachineFunction &MF,
int TCSPDelta = 0;
if (MF.getTarget().Options.GuaranteedTailCallOpt &&
(TCSPDelta = FI->getTailCallSPDelta()) < 0) {
- MFI->CreateFixedObject(-1 * TCSPDelta, TCSPDelta, true);
+ MFI.CreateFixedObject(-1 * TCSPDelta, TCSPDelta, true);
}
// For 32-bit SVR4, allocate the nonvolatile CR spill slot iff the
@@ -1655,7 +1654,7 @@ void PPCFrameLowering::determineCalleeSaves(MachineFunction &MF,
(SavedRegs.test(PPC::CR2) ||
SavedRegs.test(PPC::CR3) ||
SavedRegs.test(PPC::CR4))) {
- int FrameIdx = MFI->CreateFixedObject((uint64_t)4, (int64_t)-4, true);
+ int FrameIdx = MFI.CreateFixedObject((uint64_t)4, (int64_t)-4, true);
FI->setCRSpillFrameIndex(FrameIdx);
}
}
@@ -1669,15 +1668,15 @@ void PPCFrameLowering::processFunctionBeforeFrameFinalized(MachineFunction &MF,
}
// Get callee saved register information.
- MachineFrameInfo *FFI = MF.getFrameInfo();
- const std::vector<CalleeSavedInfo> &CSI = FFI->getCalleeSavedInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
+ const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
// If the function is shrink-wrapped, and if the function has a tail call, the
// tail call might not be in the new RestoreBlock, so a real branch instruction
// won't be generated by emitEpilogue(), because shrink-wrap has chosen a new
// RestoreBlock. So we handle this case here.
- if (FFI->getSavePoint() && FFI->hasTailCall()) {
- MachineBasicBlock *RestoreBlock = FFI->getRestorePoint();
+ if (MFI.getSavePoint() && MFI.hasTailCall()) {
+ MachineBasicBlock *RestoreBlock = MFI.getRestorePoint();
for (MachineBasicBlock &MBB : MF) {
if (MBB.isReturnBlock() && (&MBB) != RestoreBlock)
createTailCallBranchInstr(MBB);
@@ -1768,7 +1767,7 @@ void PPCFrameLowering::processFunctionBeforeFrameFinalized(MachineFunction &MF,
for (unsigned i = 0, e = FPRegs.size(); i != e; ++i) {
int FI = FPRegs[i].getFrameIdx();
- FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI));
+ MFI.setObjectOffset(FI, LowerBound + MFI.getObjectOffset(FI));
}
LowerBound -= (31 - TRI->getEncodingValue(MinFPR) + 1) * 8;
@@ -1782,7 +1781,7 @@ void PPCFrameLowering::processFunctionBeforeFrameFinalized(MachineFunction &MF,
int FI = PFI->getFramePointerSaveIndex();
assert(FI && "No Frame Pointer Save Slot!");
- FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI));
+ MFI.setObjectOffset(FI, LowerBound + MFI.getObjectOffset(FI));
}
if (PFI->usesPICBase()) {
@@ -1791,7 +1790,7 @@ void PPCFrameLowering::processFunctionBeforeFrameFinalized(MachineFunction &MF,
int FI = PFI->getPICBasePointerSaveIndex();
assert(FI && "No PIC Base Pointer Save Slot!");
- FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI));
+ MFI.setObjectOffset(FI, LowerBound + MFI.getObjectOffset(FI));
}
const PPCRegisterInfo *RegInfo =
@@ -1802,7 +1801,7 @@ void PPCFrameLowering::processFunctionBeforeFrameFinalized(MachineFunction &MF,
int FI = PFI->getBasePointerSaveIndex();
assert(FI && "No Base Pointer Save Slot!");
- FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI));
+ MFI.setObjectOffset(FI, LowerBound + MFI.getObjectOffset(FI));
}
// General register save area starts right below the Floating-point
@@ -1813,7 +1812,7 @@ void PPCFrameLowering::processFunctionBeforeFrameFinalized(MachineFunction &MF,
for (unsigned i = 0, e = GPRegs.size(); i != e; ++i) {
int FI = GPRegs[i].getFrameIdx();
- FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI));
+ MFI.setObjectOffset(FI, LowerBound + MFI.getObjectOffset(FI));
}
// Move general register save area spill slots down, taking into account
@@ -1821,7 +1820,7 @@ void PPCFrameLowering::processFunctionBeforeFrameFinalized(MachineFunction &MF,
for (unsigned i = 0, e = G8Regs.size(); i != e; ++i) {
int FI = G8Regs[i].getFrameIdx();
- FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI));
+ MFI.setObjectOffset(FI, LowerBound + MFI.getObjectOffset(FI));
}
unsigned MinReg =
@@ -1852,7 +1851,7 @@ void PPCFrameLowering::processFunctionBeforeFrameFinalized(MachineFunction &MF,
PPC::CRRCRegClass.contains(Reg)))) {
int FI = CSI[i].getFrameIdx();
- FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI));
+ MFI.setObjectOffset(FI, LowerBound + MFI.getObjectOffset(FI));
}
}
@@ -1869,7 +1868,7 @@ void PPCFrameLowering::processFunctionBeforeFrameFinalized(MachineFunction &MF,
if (PPC::VRSAVERCRegClass.contains(Reg)) {
int FI = CSI[i].getFrameIdx();
- FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI));
+ MFI.setObjectOffset(FI, LowerBound + MFI.getObjectOffset(FI));
}
}
@@ -1883,7 +1882,7 @@ void PPCFrameLowering::processFunctionBeforeFrameFinalized(MachineFunction &MF,
for (unsigned i = 0, e = VRegs.size(); i != e; ++i) {
int FI = VRegs[i].getFrameIdx();
- FFI->setObjectOffset(FI, LowerBound + FFI->getObjectOffset(FI));
+ MFI.setObjectOffset(FI, LowerBound + MFI.getObjectOffset(FI));
}
}
@@ -1907,25 +1906,25 @@ PPCFrameLowering::addScavengingSpillSlot(MachineFunction &MF,
// because we've not yet computed callee-saved register spills or the
// needed alignment padding.
unsigned StackSize = determineFrameLayout(MF, false, true);
- MachineFrameInfo *MFI = MF.getFrameInfo();
- if (MFI->hasVarSizedObjects() || spillsCR(MF) || spillsVRSAVE(MF) ||
+ MachineFrameInfo &MFI = MF.getFrameInfo();
+ if (MFI.hasVarSizedObjects() || spillsCR(MF) || spillsVRSAVE(MF) ||
hasNonRISpills(MF) || (hasSpills(MF) && !isInt<16>(StackSize))) {
const TargetRegisterClass *GPRC = &PPC::GPRCRegClass;
const TargetRegisterClass *G8RC = &PPC::G8RCRegClass;
const TargetRegisterClass *RC = Subtarget.isPPC64() ? G8RC : GPRC;
- RS->addScavengingFrameIndex(MFI->CreateStackObject(RC->getSize(),
- RC->getAlignment(),
- false));
+ RS->addScavengingFrameIndex(MFI.CreateStackObject(RC->getSize(),
+ RC->getAlignment(),
+ false));
// Might we have over-aligned allocas?
- bool HasAlVars = MFI->hasVarSizedObjects() &&
- MFI->getMaxAlignment() > getStackAlignment();
+ bool HasAlVars = MFI.hasVarSizedObjects() &&
+ MFI.getMaxAlignment() > getStackAlignment();
// These kinds of spills might need two registers.
if (spillsCR(MF) || spillsVRSAVE(MF) || HasAlVars)
- RS->addScavengingFrameIndex(MFI->CreateStackObject(RC->getSize(),
- RC->getAlignment(),
- false));
+ RS->addScavengingFrameIndex(MFI.CreateStackObject(RC->getSize(),
+ RC->getAlignment(),
+ false));
}
}
@@ -2049,8 +2048,7 @@ eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
unsigned ADDInstr = is64Bit ? PPC::ADD8 : PPC::ADD4;
unsigned LISInstr = is64Bit ? PPC::LIS8 : PPC::LIS;
unsigned ORIInstr = is64Bit ? PPC::ORI8 : PPC::ORI;
- MachineInstr *MI = I;
- const DebugLoc &dl = MI->getDebugLoc();
+ const DebugLoc &dl = I->getDebugLoc();
if (isInt<16>(CalleeAmt)) {
BuildMI(MBB, I, dl, TII.get(ADDIInstr), StackReg)
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCHazardRecognizers.cpp b/contrib/llvm/lib/Target/PowerPC/PPCHazardRecognizers.cpp
index caab67d..f327396 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCHazardRecognizers.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCHazardRecognizers.cpp
@@ -226,7 +226,7 @@ void PPCDispatchGroupSBHazardRecognizer::EmitNoop() {
// group-terminating nop, the group is complete.
// FIXME: the same for P9 as previous gen until POWER9 scheduling is ready
if (Directive == PPC::DIR_PWR6 || Directive == PPC::DIR_PWR7 ||
- Directive == PPC::DIR_PWR8 || Directive == PPC::DIR_PWR8 ||
+ Directive == PPC::DIR_PWR8 || Directive == PPC::DIR_PWR9 ||
CurSlots == 6) {
CurGroup.clear();
CurSlots = CurBranches = 0;
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp b/contrib/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
index 0e9b2da..1e51c1f 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
@@ -215,7 +215,7 @@ namespace {
void InsertVRSaveCode(MachineFunction &MF);
- const char *getPassName() const override {
+ StringRef getPassName() const override {
return "PowerPC DAG->DAG Pattern Instruction Selection";
}
@@ -334,12 +334,12 @@ SDNode *PPCDAGToDAGISel::getGlobalBaseReg() {
}
} else {
GlobalBaseReg =
- RegInfo->createVirtualRegister(&PPC::GPRC_NOR0RegClass);
+ RegInfo->createVirtualRegister(&PPC::GPRC_and_GPRC_NOR0RegClass);
BuildMI(FirstMBB, MBBI, dl, TII.get(PPC::MovePCtoLR));
BuildMI(FirstMBB, MBBI, dl, TII.get(PPC::MFLR), GlobalBaseReg);
}
} else {
- GlobalBaseReg = RegInfo->createVirtualRegister(&PPC::G8RC_NOX0RegClass);
+ GlobalBaseReg = RegInfo->createVirtualRegister(&PPC::G8RC_and_G8RC_NOX0RegClass);
BuildMI(FirstMBB, MBBI, dl, TII.get(PPC::MovePCtoLR8));
BuildMI(FirstMBB, MBBI, dl, TII.get(PPC::MFLR8), GlobalBaseReg);
}
@@ -633,6 +633,13 @@ static unsigned getInt64CountDirect(int64_t Imm) {
// If no shift, we're done.
if (!Shift) return Result;
+ // If Hi word == Lo word,
+ // we can use rldimi to insert the Lo word into Hi word.
+ if ((unsigned)(Imm & 0xFFFFFFFF) == Remainder) {
+ ++Result;
+ return Result;
+ }
+
// Shift for next step if the upper 32-bits were not zero.
if (Imm)
++Result;
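
The early-out added above rests on simple arithmetic: when the two 32-bit halves of the immediate are equal, materializing the low word and copying it up with one rldimi beats rebuilding the high word from scratch. Below is a toy cost model illustrating the saving; it is a hedged simplification for illustration, not getInt64CountDirect itself.

    #include <cassert>
    #include <cstdint>

    // Toy instruction count for materializing a 64-bit immediate on PPC64.
    // A simplification for illustration, not LLVM's exact algorithm.
    unsigned count(uint64_t imm) {
      uint32_t lo = static_cast<uint32_t>(imm);
      uint32_t hi = static_cast<uint32_t>(imm >> 32);
      unsigned loCost = (lo >> 16 ? 1 : 0) + (lo & 0xFFFF ? 1 : 0); // lis / ori
      if (hi == 0)
        return loCost;            // value fits in the low word
      if (hi == lo)
        return loCost + 1;        // the new early-out: one rldimi copies lo into hi
      unsigned hiCost = (hi >> 16 ? 1 : 0) + (hi & 0xFFFF ? 1 : 0);
      return hiCost + 1 + loCost; // build hi, shift left 32, then OR in lo
    }

    int main() {
      assert(count(0x1234567812345678ULL) == 3); // lis+ori, then rldimi
      assert(count(0x123456789ABCDEF0ULL) == 5); // worst case without the trick
    }
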
@@ -731,6 +738,14 @@ static SDNode *getInt64Direct(SelectionDAG *CurDAG, const SDLoc &dl,
// If no shift, we're done.
if (!Shift) return Result;
+ // If Hi word == Lo word,
+ // we can use rldimi to insert the Lo word into Hi word.
+ if ((unsigned)(Imm & 0xFFFFFFFF) == Remainder) {
+ SDValue Ops[] =
+ { SDValue(Result, 0), SDValue(Result, 0), getI32Imm(Shift), getI32Imm(0)};
+ return CurDAG->getMachineNode(PPC::RLDIMI, dl, MVT::i64, Ops);
+ }
+
// Shift for next step if the upper 32-bits were not zero.
if (Imm) {
Result = CurDAG->getMachineNode(PPC::RLDICR, dl, MVT::i64,
@@ -912,84 +927,95 @@ class BitPermutationSelector {
}
};
- // Return true if something interesting was deduced, return false if we're
+ using ValueBitsMemoizedValue = std::pair<bool, SmallVector<ValueBit, 64>>;
+ using ValueBitsMemoizer =
+ DenseMap<SDValue, std::unique_ptr<ValueBitsMemoizedValue>>;
+ ValueBitsMemoizer Memoizer;
+
+ // Return a pair of bool and a SmallVector pointer to a memoization entry.
+ // The bool is true if something interesting was deduced; it is false if we're
// providing only a generic representation of V (or something else likewise
- // uninteresting for instruction selection).
- bool getValueBits(SDValue V, SmallVector<ValueBit, 64> &Bits) {
+ // uninteresting for instruction selection) through the SmallVector.
+ std::pair<bool, SmallVector<ValueBit, 64> *> getValueBits(SDValue V,
+ unsigned NumBits) {
+ auto &ValueEntry = Memoizer[V];
+ if (ValueEntry)
+ return std::make_pair(ValueEntry->first, &ValueEntry->second);
+ ValueEntry.reset(new ValueBitsMemoizedValue());
+ bool &Interesting = ValueEntry->first;
+ SmallVector<ValueBit, 64> &Bits = ValueEntry->second;
+ Bits.resize(NumBits);
+
switch (V.getOpcode()) {
default: break;
case ISD::ROTL:
if (isa<ConstantSDNode>(V.getOperand(1))) {
unsigned RotAmt = V.getConstantOperandVal(1);
- SmallVector<ValueBit, 64> LHSBits(Bits.size());
- getValueBits(V.getOperand(0), LHSBits);
+ const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second;
- for (unsigned i = 0; i < Bits.size(); ++i)
- Bits[i] = LHSBits[i < RotAmt ? i + (Bits.size() - RotAmt) : i - RotAmt];
+ for (unsigned i = 0; i < NumBits; ++i)
+ Bits[i] = LHSBits[i < RotAmt ? i + (NumBits - RotAmt) : i - RotAmt];
- return true;
+ return std::make_pair(Interesting = true, &Bits);
}
break;
case ISD::SHL:
if (isa<ConstantSDNode>(V.getOperand(1))) {
unsigned ShiftAmt = V.getConstantOperandVal(1);
- SmallVector<ValueBit, 64> LHSBits(Bits.size());
- getValueBits(V.getOperand(0), LHSBits);
+ const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second;
- for (unsigned i = ShiftAmt; i < Bits.size(); ++i)
+ for (unsigned i = ShiftAmt; i < NumBits; ++i)
Bits[i] = LHSBits[i - ShiftAmt];
for (unsigned i = 0; i < ShiftAmt; ++i)
Bits[i] = ValueBit(ValueBit::ConstZero);
- return true;
+ return std::make_pair(Interesting = true, &Bits);
}
break;
case ISD::SRL:
if (isa<ConstantSDNode>(V.getOperand(1))) {
unsigned ShiftAmt = V.getConstantOperandVal(1);
- SmallVector<ValueBit, 64> LHSBits(Bits.size());
- getValueBits(V.getOperand(0), LHSBits);
+ const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second;
- for (unsigned i = 0; i < Bits.size() - ShiftAmt; ++i)
+ for (unsigned i = 0; i < NumBits - ShiftAmt; ++i)
Bits[i] = LHSBits[i + ShiftAmt];
- for (unsigned i = Bits.size() - ShiftAmt; i < Bits.size(); ++i)
+ for (unsigned i = NumBits - ShiftAmt; i < NumBits; ++i)
Bits[i] = ValueBit(ValueBit::ConstZero);
- return true;
+ return std::make_pair(Interesting = true, &Bits);
}
break;
case ISD::AND:
if (isa<ConstantSDNode>(V.getOperand(1))) {
uint64_t Mask = V.getConstantOperandVal(1);
- SmallVector<ValueBit, 64> LHSBits(Bits.size());
- bool LHSTrivial = getValueBits(V.getOperand(0), LHSBits);
+ const SmallVector<ValueBit, 64> *LHSBits;
+ // Mark this as interesting only if the LHS was also interesting. This
+ // prevents the overall procedure from matching a single immediate 'and'
+ // (which is non-optimal because such an and might be folded with other
+ // things if we don't select it here).
+ std::tie(Interesting, LHSBits) = getValueBits(V.getOperand(0), NumBits);
- for (unsigned i = 0; i < Bits.size(); ++i)
+ for (unsigned i = 0; i < NumBits; ++i)
if (((Mask >> i) & 1) == 1)
- Bits[i] = LHSBits[i];
+ Bits[i] = (*LHSBits)[i];
else
Bits[i] = ValueBit(ValueBit::ConstZero);
- // Mark this as interesting, only if the LHS was also interesting. This
- // prevents the overall procedure from matching a single immediate 'and'
- // (which is non-optimal because such an and might be folded with other
- // things if we don't select it here).
- return LHSTrivial;
+ return std::make_pair(Interesting, &Bits);
}
break;
case ISD::OR: {
- SmallVector<ValueBit, 64> LHSBits(Bits.size()), RHSBits(Bits.size());
- getValueBits(V.getOperand(0), LHSBits);
- getValueBits(V.getOperand(1), RHSBits);
+ const auto &LHSBits = *getValueBits(V.getOperand(0), NumBits).second;
+ const auto &RHSBits = *getValueBits(V.getOperand(1), NumBits).second;
bool AllDisjoint = true;
- for (unsigned i = 0; i < Bits.size(); ++i)
+ for (unsigned i = 0; i < NumBits; ++i)
if (LHSBits[i].isZero())
Bits[i] = RHSBits[i];
else if (RHSBits[i].isZero())
@@ -1002,14 +1028,14 @@ class BitPermutationSelector {
if (!AllDisjoint)
break;
- return true;
+ return std::make_pair(Interesting = true, &Bits);
}
}
- for (unsigned i = 0; i < Bits.size(); ++i)
+ for (unsigned i = 0; i < NumBits; ++i)
Bits[i] = ValueBit(V, i);
- return false;
+ return std::make_pair(Interesting = false, &Bits);
}
// For each value (except the constant ones), compute the left-rotate amount
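
The memoizer introduced above caches one (interesting, bits) entry per SDValue so that shared subexpressions are analyzed only once; the unique_ptr indirection keeps the cached entry's address stable even when recursive calls insert new keys into the map. A stripped-down sketch of the same pattern with standard containers follows; the names and the int key are placeholders standing in for SDValue, not LLVM's API.

    #include <map>
    #include <memory>
    #include <utility>
    #include <vector>

    using Bits = std::vector<int>;
    using Entry = std::pair<bool, Bits>;        // (interesting?, bit vector)
    std::map<int, std::unique_ptr<Entry>> memo; // stand-in for the DenseMap

    std::pair<bool, Bits *> getValueBits(int v, unsigned numBits) {
      auto &slot = memo[v];
      if (slot)                                 // cache hit: reuse the prior result
        return {slot->first, &slot->second};
      slot = std::make_unique<Entry>();
      slot->second.assign(numBits, v & 1);      // stand-in for the real analysis
      slot->first = (v != 0);
      return {slot->first, &slot->second};
    }

    int main() {
      auto a = getValueBits(7, 8);
      auto b = getValueBits(7, 8);
      return a.second == b.second ? 0 : 1;      // same cached storage both times
    }
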
@@ -1648,9 +1674,12 @@ class BitPermutationSelector {
unsigned NumRLInsts = 0;
bool FirstBG = true;
+ bool MoreBG = false;
for (auto &BG : BitGroups) {
- if (!MatchingBG(BG))
+ if (!MatchingBG(BG)) {
+ MoreBG = true;
continue;
+ }
NumRLInsts +=
SelectRotMask64Count(BG.RLAmt, BG.Repl32, BG.StartIdx, BG.EndIdx,
!FirstBG);
@@ -1668,7 +1697,10 @@ class BitPermutationSelector {
// because that exposes more opportunities for CSE.
if (NumAndInsts > NumRLInsts)
continue;
- if (Use32BitInsts && NumAndInsts == NumRLInsts)
+ // When merging multiple bit groups, an OR instruction is used.
+ // But when a rotate is used, rldimi can insert the rotated value into any
+ // register, so the OR instruction can be avoided.
+ if ((Use32BitInsts || MoreBG) && NumAndInsts == NumRLInsts)
continue;
DEBUG(dbgs() << "\t\t\t\tusing masking\n");
@@ -1886,8 +1918,7 @@ class BitPermutationSelector {
}
void eraseMatchingBitGroups(function_ref<bool(const BitGroup &)> F) {
- BitGroups.erase(std::remove_if(BitGroups.begin(), BitGroups.end(), F),
- BitGroups.end());
+ BitGroups.erase(remove_if(BitGroups, F), BitGroups.end());
}
SmallVector<ValueBit, 64> Bits;
@@ -1910,9 +1941,12 @@ public:
// rotate-and-shift/shift/and/or instructions, using a set of heuristics
// known to produce optimal code for common cases (like i32 byte swapping).
SDNode *Select(SDNode *N) {
- Bits.resize(N->getValueType(0).getSizeInBits());
- if (!getValueBits(SDValue(N, 0), Bits))
+ Memoizer.clear();
+ auto Result =
+ getValueBits(SDValue(N, 0), N->getValueType(0).getSizeInBits());
+ if (!Result.first)
return nullptr;
+ Bits = std::move(*Result.second);
DEBUG(dbgs() << "Considering bit-permutation-based instruction"
" selection for: ");
@@ -2623,6 +2657,23 @@ void PPCDAGToDAGISel::Select(SDNode *N) {
MB = 64 - countTrailingOnes(Imm64);
SH = 0;
+ if (Val.getOpcode() == ISD::ANY_EXTEND) {
+ auto Op0 = Val.getOperand(0);
+ if (Op0.getOpcode() == ISD::SRL &&
+ isInt32Immediate(Op0.getOperand(1).getNode(), Imm) && Imm <= MB) {
+
+ auto ResultType = Val.getNode()->getValueType(0);
+ auto ImDef = CurDAG->getMachineNode(PPC::IMPLICIT_DEF, dl,
+ ResultType);
+ SDValue IDVal(ImDef, 0);
+
+ Val = SDValue(CurDAG->getMachineNode(PPC::INSERT_SUBREG, dl,
+ ResultType, IDVal, Op0.getOperand(0),
+ getI32Imm(1, dl)), 0);
+ SH = 64 - Imm;
+ }
+ }
+
// If the operand is a logical right shift, we can fold it into this
// instruction: rldicl(rldicl(x, 64-n, n), 0, mb) -> rldicl(x, 64-n, mb)
// for n <= mb. The right shift is really a left rotate followed by a
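
The identity in the comment above, rldicl(rldicl(x, 64-n, n), 0, mb) == rldicl(x, 64-n, mb) for n <= mb, is easy to sanity-check numerically. Below is a small standalone verifier that models rldicl as rotate-left-then-clear-the-high-mb-bits; it is illustrative only, not part of the patch.

    #include <cassert>
    #include <cstdint>

    static uint64_t rotl64(uint64_t x, unsigned s) {
      return s ? (x << s) | (x >> (64 - s)) : x;
    }
    // rldicl x, sh, mb: rotate left by sh, then clear the high mb bits.
    static uint64_t rldicl(uint64_t x, unsigned sh, unsigned mb) {
      return rotl64(x, sh) & (mb ? ~0ULL >> mb : ~0ULL);
    }

    int main() {
      uint64_t x = 0x0123456789ABCDEFULL;
      for (unsigned n = 1; n < 16; ++n)
        for (unsigned mb = n; mb < 64; ++mb) // the fold requires n <= mb
          assert(rldicl(rldicl(x, 64 - n, n), 0, mb) == rldicl(x, 64 - n, mb));
    }
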
@@ -3187,7 +3238,7 @@ SDValue PPCDAGToDAGISel::combineToCMPB(SDNode *N) {
Op0.getOperand(1) == Op1.getOperand(1) && CC == ISD::SETEQ &&
isa<ConstantSDNode>(Op0.getOperand(1))) {
- unsigned Bits = Op0.getValueType().getSizeInBits();
+ unsigned Bits = Op0.getValueSizeInBits();
if (b != Bits/8-1)
return false;
if (Op0.getConstantOperandVal(1) != Bits-8)
@@ -3215,9 +3266,9 @@ SDValue PPCDAGToDAGISel::combineToCMPB(SDNode *N) {
// Now we need to make sure that the upper bytes are known to be
// zero.
- unsigned Bits = Op0.getValueType().getSizeInBits();
- if (!CurDAG->MaskedValueIsZero(Op0,
- APInt::getHighBitsSet(Bits, Bits - (b+1)*8)))
+ unsigned Bits = Op0.getValueSizeInBits();
+ if (!CurDAG->MaskedValueIsZero(
+ Op0, APInt::getHighBitsSet(Bits, Bits - (b + 1) * 8)))
return false;
LHS = Op0.getOperand(0);
@@ -3250,7 +3301,7 @@ SDValue PPCDAGToDAGISel::combineToCMPB(SDNode *N) {
} else if (Op.getOpcode() == ISD::SRL) {
if (!isa<ConstantSDNode>(Op.getOperand(1)))
return false;
- unsigned Bits = Op.getValueType().getSizeInBits();
+ unsigned Bits = Op.getValueSizeInBits();
if (b != Bits/8-1)
return false;
if (Op.getConstantOperandVal(1) != Bits-8)
@@ -3562,7 +3613,8 @@ void PPCDAGToDAGISel::PeepholeCROps() {
Op.getOperand(0) == Op.getOperand(1))
Op2Not = true;
}
- } // fallthrough
+ LLVM_FALLTHROUGH;
+ }
case PPC::BC:
case PPC::BCn:
case PPC::SELECT_I4:
@@ -3989,8 +4041,9 @@ static bool PeepholePPC64ZExtGather(SDValue Op32,
return true;
}
- // CNTLZW always produces a 64-bit value in [0,32], and so is zero extended.
- if (Op32.getMachineOpcode() == PPC::CNTLZW) {
+ // CNT[LT]ZW always produce a 64-bit value in [0,32], and so are zero extended.
+ if (Op32.getMachineOpcode() == PPC::CNTLZW ||
+ Op32.getMachineOpcode() == PPC::CNTTZW) {
ToPromote.insert(Op32.getNode());
return true;
}
@@ -4185,6 +4238,7 @@ void PPCDAGToDAGISel::PeepholePPC64ZExt() {
case PPC::LHBRX: NewOpcode = PPC::LHBRX8; break;
case PPC::LWBRX: NewOpcode = PPC::LWBRX8; break;
case PPC::CNTLZW: NewOpcode = PPC::CNTLZW8; break;
+ case PPC::CNTTZW: NewOpcode = PPC::CNTTZW8; break;
case PPC::RLWIMI: NewOpcode = PPC::RLWIMI8; break;
case PPC::OR: NewOpcode = PPC::OR8; break;
case PPC::SELECT_I4: NewOpcode = PPC::SELECT_I8; break;
@@ -4312,13 +4366,6 @@ void PPCDAGToDAGISel::PeepholePPC64() {
if (!Base.isMachineOpcode())
continue;
- // On targets with fusion, we don't want this to fire and remove a fusion
- // opportunity, unless a) it results in another fusion opportunity or
- // b) optimizing for size.
- if (PPCSubTarget->hasFusion() &&
- (!MF->getFunction()->optForSize() && !Base.hasOneUse()))
- continue;
-
unsigned Flags = 0;
bool ReplaceFlags = true;
@@ -4363,15 +4410,64 @@ void PPCDAGToDAGISel::PeepholePPC64() {
}
SDValue ImmOpnd = Base.getOperand(1);
- int MaxDisplacement = 0;
+
+ // On PPC64, the TOC base pointer is guaranteed by the ABI only to have
+ // 8-byte alignment, and so we can only use offsets less than 8 (otherwise,
+ // we might need different @ha relocation values for the offset
+ // pointers).
+ int MaxDisplacement = 7;
if (GlobalAddressSDNode *GA = dyn_cast<GlobalAddressSDNode>(ImmOpnd)) {
const GlobalValue *GV = GA->getGlobal();
- MaxDisplacement = GV->getAlignment() - 1;
+ MaxDisplacement = std::min((int) GV->getAlignment() - 1, MaxDisplacement);
}
+ bool UpdateHBase = false;
+ SDValue HBase = Base.getOperand(0);
+
int Offset = N->getConstantOperandVal(FirstOp);
- if (Offset < 0 || Offset > MaxDisplacement)
- continue;
+ if (ReplaceFlags) {
+ if (Offset < 0 || Offset > MaxDisplacement) {
+ // If we have an addi(toc@l)/addis(toc@ha) pair, and the addis has only
+ // one use, then we can do this for any offset; we just need to also
+ // update the offset (i.e. the symbol addend) on the addis.
+ if (Base.getMachineOpcode() != PPC::ADDItocL)
+ continue;
+
+ if (!HBase.isMachineOpcode() ||
+ HBase.getMachineOpcode() != PPC::ADDIStocHA)
+ continue;
+
+ if (!Base.hasOneUse() || !HBase.hasOneUse())
+ continue;
+
+ SDValue HImmOpnd = HBase.getOperand(1);
+ if (HImmOpnd != ImmOpnd)
+ continue;
+
+ UpdateHBase = true;
+ }
+ } else {
+ // If we're directly folding the addend from an addi instruction, then:
+ // 1. In general, the offset on the memory access must be zero.
+ // 2. If the addend is a constant, then it can be combined with a
+ // non-zero offset, but only if the result meets the encoding
+ // requirements.
+ if (auto *C = dyn_cast<ConstantSDNode>(ImmOpnd)) {
+ Offset += C->getSExtValue();
+
+ if ((StorageOpcode == PPC::LWA || StorageOpcode == PPC::LD ||
+ StorageOpcode == PPC::STD) && (Offset % 4) != 0)
+ continue;
+
+ if (!isInt<16>(Offset))
+ continue;
+
+ ImmOpnd = CurDAG->getTargetConstant(Offset, SDLoc(ImmOpnd),
+ ImmOpnd.getValueType());
+ } else if (Offset != 0) {
+ continue;
+ }
+ }
// We found an opportunity. Reverse the operands from the add
// immediate and substitute them into the load or store. If
@@ -4414,6 +4510,10 @@ void PPCDAGToDAGISel::PeepholePPC64() {
(void)CurDAG->UpdateNodeOperands(N, ImmOpnd, Base.getOperand(0),
N->getOperand(2));
+ if (UpdateHBase)
+ (void)CurDAG->UpdateNodeOperands(HBase.getNode(), HBase.getOperand(0),
+ ImmOpnd);
+
// The add-immediate may now be dead, in which case remove it.
if (Base.getNode()->use_empty())
CurDAG->RemoveDeadNode(Base.getNode());
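
The MaxDisplacement cap of 7 above follows from @ha arithmetic: the high-adjusted half of an address is (addr + 0x8000) >> 16, and for an 8-byte-aligned symbol an addend of at most 7 can never carry into those high bits, while an addend of 8 can. A quick standalone check; the helper and the sample address are illustrative.

    #include <cassert>
    #include <cstdint>

    // @ha of an address: the high 16 bits, adjusted for the signed low half.
    static uint32_t ha(uint32_t x) { return (x + 0x8000) >> 16; }

    int main() {
      uint32_t sym = 0x10007FF8; // 8-byte aligned, chosen at the worst case
      for (uint32_t off = 0; off < 8; ++off)
        assert(ha(sym + off) == ha(sym)); // addends 0..7 share the same @ha
      assert(ha(sym + 8) != ha(sym));     // an addend of 8 crosses into a new @ha
    }
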
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCISelLowering.cpp b/contrib/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
index 9089c6a..2b9195b 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCISelLowering.cpp
@@ -27,6 +27,7 @@
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunction.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineJumpTableInfo.h"
#include "llvm/CodeGen/MachineLoopInfo.h"
#include "llvm/CodeGen/MachineRegisterInfo.h"
#include "llvm/CodeGen/SelectionDAG.h"
@@ -216,11 +217,17 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
setOperationAction(ISD::FROUND, MVT::f32, Legal);
}
- // PowerPC does not have BSWAP, CTPOP or CTTZ
+ // PowerPC does not have BSWAP.
+ // CTPOP and CTTZ were introduced in P8 and P9, respectively.
setOperationAction(ISD::BSWAP, MVT::i32 , Expand);
- setOperationAction(ISD::CTTZ , MVT::i32 , Expand);
setOperationAction(ISD::BSWAP, MVT::i64 , Expand);
- setOperationAction(ISD::CTTZ , MVT::i64 , Expand);
+ if (Subtarget.isISA3_0()) {
+ setOperationAction(ISD::CTTZ , MVT::i32 , Legal);
+ setOperationAction(ISD::CTTZ , MVT::i64 , Legal);
+ } else {
+ setOperationAction(ISD::CTTZ , MVT::i32 , Expand);
+ setOperationAction(ISD::CTTZ , MVT::i64 , Expand);
+ }
if (Subtarget.hasPOPCNTD() == PPCSubtarget::POPCNTD_Fast) {
setOperationAction(ISD::CTPOP, MVT::i32 , Legal);
@@ -433,6 +440,12 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
setOperationAction(ISD::CTLZ, VT, Expand);
}
+ // Vector instructions introduced in P9
+ if (Subtarget.hasP9Altivec() && (VT.SimpleTy != MVT::v1i128))
+ setOperationAction(ISD::CTTZ, VT, Legal);
+ else
+ setOperationAction(ISD::CTTZ, VT, Expand);
+
// We promote all shuffles to v16i8.
setOperationAction(ISD::VECTOR_SHUFFLE, VT, Promote);
AddPromotedToType (ISD::VECTOR_SHUFFLE, VT, MVT::v16i8);
@@ -489,7 +502,6 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
setOperationAction(ISD::SCALAR_TO_VECTOR, VT, Expand);
setOperationAction(ISD::FPOW, VT, Expand);
setOperationAction(ISD::BSWAP, VT, Expand);
- setOperationAction(ISD::CTTZ, VT, Expand);
setOperationAction(ISD::VSELECT, VT, Expand);
setOperationAction(ISD::SIGN_EXTEND_INREG, VT, Expand);
setOperationAction(ISD::ROTL, VT, Expand);
@@ -660,6 +672,10 @@ PPCTargetLowering::PPCTargetLowering(const PPCTargetMachine &TM,
setOperationAction(ISD::FABS, MVT::v4f32, Legal);
setOperationAction(ISD::FABS, MVT::v2f64, Legal);
+ if (Subtarget.hasDirectMove())
+ setOperationAction(ISD::BUILD_VECTOR, MVT::v2i64, Custom);
+ setOperationAction(ISD::BUILD_VECTOR, MVT::v2f64, Custom);
+
addRegisterClass(MVT::v2i64, &PPC::VSRCRegClass);
}
@@ -1061,6 +1077,9 @@ const char *PPCTargetLowering::getTargetNodeName(unsigned Opcode) const {
case PPCISD::STBRX: return "PPCISD::STBRX";
case PPCISD::LFIWAX: return "PPCISD::LFIWAX";
case PPCISD::LFIWZX: return "PPCISD::LFIWZX";
+ case PPCISD::LXSIZX: return "PPCISD::LXSIZX";
+ case PPCISD::STXSIX: return "PPCISD::STXSIX";
+ case PPCISD::VEXTS: return "PPCISD::VEXTS";
case PPCISD::LXVD2X: return "PPCISD::LXVD2X";
case PPCISD::STXVD2X: return "PPCISD::STXVD2X";
case PPCISD::COND_BRANCH: return "PPCISD::COND_BRANCH";
@@ -1832,9 +1851,9 @@ static void fixupFuncForFI(SelectionDAG &DAG, int FrameIdx, EVT VT) {
return;
MachineFunction &MF = DAG.getMachineFunction();
- MachineFrameInfo *MFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
- unsigned Align = MFI->getObjectAlignment(FrameIdx);
+ unsigned Align = MFI.getObjectAlignment(FrameIdx);
if (Align >= 4)
return;
@@ -2158,6 +2177,55 @@ SDValue PPCTargetLowering::LowerConstantPool(SDValue Op,
return LowerLabelRef(CPIHi, CPILo, IsPIC, DAG);
}
+// For 64-bit PowerPC, prefer the more compact relative encodings.
+// This trades 32 bits per jump table entry for one or two instructions
+// at the jump site.
+unsigned PPCTargetLowering::getJumpTableEncoding() const {
+ if (isJumpTableRelative())
+ return MachineJumpTableInfo::EK_LabelDifference32;
+
+ return TargetLowering::getJumpTableEncoding();
+}
+
+bool PPCTargetLowering::isJumpTableRelative() const {
+ if (Subtarget.isPPC64())
+ return true;
+ return TargetLowering::isJumpTableRelative();
+}
+
+SDValue PPCTargetLowering::getPICJumpTableRelocBase(SDValue Table,
+ SelectionDAG &DAG) const {
+ if (!Subtarget.isPPC64())
+ return TargetLowering::getPICJumpTableRelocBase(Table, DAG);
+
+ switch (getTargetMachine().getCodeModel()) {
+ case CodeModel::Default:
+ case CodeModel::Small:
+ case CodeModel::Medium:
+ return TargetLowering::getPICJumpTableRelocBase(Table, DAG);
+ default:
+ return DAG.getNode(PPCISD::GlobalBaseReg, SDLoc(),
+ getPointerTy(DAG.getDataLayout()));
+ }
+}
+
+const MCExpr *
+PPCTargetLowering::getPICJumpTableRelocBaseExpr(const MachineFunction *MF,
+ unsigned JTI,
+ MCContext &Ctx) const {
+ if (!Subtarget.isPPC64())
+ return TargetLowering::getPICJumpTableRelocBaseExpr(MF, JTI, Ctx);
+
+ switch (getTargetMachine().getCodeModel()) {
+ case CodeModel::Default:
+ case CodeModel::Small:
+ case CodeModel::Medium:
+ return TargetLowering::getPICJumpTableRelocBaseExpr(MF, JTI, Ctx);
+ default:
+ return MCSymbolRefExpr::create(MF->getPICBaseSymbol(), Ctx);
+ }
+}
+
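
In concrete terms, the encoding change above means a 64-case jump table stores 64 x 4 = 256 bytes of label differences instead of 64 x 8 = 512 bytes of absolute pointers, at the cost of roughly one sign-extend and one add at each dispatch site to rebase the offset; the figures are illustrative, and the exact codegen depends on the code model.
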
SDValue PPCTargetLowering::LowerJumpTable(SDValue Op, SelectionDAG &DAG) const {
EVT PtrVT = Op.getValueType();
JumpTableSDNode *JT = cast<JumpTableSDNode>(Op);
@@ -2365,20 +2433,10 @@ SDValue PPCTargetLowering::LowerSETCC(SDValue Op, SelectionDAG &DAG) const {
// If we're comparing for equality to zero, expose the fact that this is
// implemented as a ctlz/srl pair on ppc, so that the dag combiner can
// fold the new nodes.
+ if (SDValue V = lowerCmpEqZeroToCtlzSrl(Op, DAG))
+ return V;
+
if (ConstantSDNode *C = dyn_cast<ConstantSDNode>(Op.getOperand(1))) {
- if (C->isNullValue() && CC == ISD::SETEQ) {
- EVT VT = Op.getOperand(0).getValueType();
- SDValue Zext = Op.getOperand(0);
- if (VT.bitsLT(MVT::i32)) {
- VT = MVT::i32;
- Zext = DAG.getNode(ISD::ZERO_EXTEND, dl, VT, Op.getOperand(0));
- }
- unsigned Log2b = Log2_32(VT.getSizeInBits());
- SDValue Clz = DAG.getNode(ISD::CTLZ, dl, VT, Zext);
- SDValue Scc = DAG.getNode(ISD::SRL, dl, VT, Clz,
- DAG.getConstant(Log2b, dl, MVT::i32));
- return DAG.getNode(ISD::TRUNCATE, dl, MVT::i32, Scc);
- }
// Leave comparisons against 0 and -1 alone for now, since they're usually
// optimized. FIXME: revisit this when we can custom lower all setcc
// optimizations.
@@ -2679,6 +2737,32 @@ bool llvm::CC_PPC32_SVR4_Custom_AlignArgRegs(unsigned &ValNo, MVT &ValVT,
return false;
}
+bool
+llvm::CC_PPC32_SVR4_Custom_SkipLastArgRegsPPCF128(unsigned &ValNo, MVT &ValVT,
+ MVT &LocVT,
+ CCValAssign::LocInfo &LocInfo,
+ ISD::ArgFlagsTy &ArgFlags,
+ CCState &State) {
+ static const MCPhysReg ArgRegs[] = {
+ PPC::R3, PPC::R4, PPC::R5, PPC::R6,
+ PPC::R7, PPC::R8, PPC::R9, PPC::R10,
+ };
+ const unsigned NumArgRegs = array_lengthof(ArgRegs);
+
+ unsigned RegNum = State.getFirstUnallocated(ArgRegs);
+ int RegsLeft = NumArgRegs - RegNum;
+
+ // Skip if there are not enough registers left for the long double type (4 GPRs
+ // in soft-float mode) and put the long double argument on the stack.
+ if (RegNum != NumArgRegs && RegsLeft < 4) {
+ for (int i = 0; i < RegsLeft; i++) {
+ State.AllocateReg(ArgRegs[RegNum + i]);
+ }
+ }
+
+ return false;
+}
+
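
As a worked example of the rule above: if r3-r8 already hold earlier arguments, only r9 and r10 remain free; a soft-float ppcf128 needs four GPRs, so the two leftover registers are marked allocated and the whole long double is passed on the stack rather than split between registers and memory. The particular register assignment here is illustrative.
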
bool llvm::CC_PPC32_SVR4_Custom_AlignFPArgRegs(unsigned &ValNo, MVT &ValVT,
MVT &LocVT,
CCValAssign::LocInfo &LocInfo,
@@ -2896,7 +2980,7 @@ SDValue PPCTargetLowering::LowerFormalArguments_32SVR4(
// AltiVec Technology Programming Interface Manual
MachineFunction &MF = DAG.getMachineFunction();
- MachineFrameInfo *MFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
PPCFunctionInfo *FuncInfo = MF.getInfo<PPCFunctionInfo>();
EVT PtrVT = getPointerTy(MF.getDataLayout());
@@ -2956,7 +3040,7 @@ SDValue PPCTargetLowering::LowerFormalArguments_32SVR4(
break;
case MVT::v2f64:
case MVT::v2i64:
- RC = &PPC::VSHRCRegClass;
+ RC = &PPC::VRRCRegClass;
break;
case MVT::v4f64:
RC = &PPC::QFRCRegClass;
@@ -2980,8 +3064,8 @@ SDValue PPCTargetLowering::LowerFormalArguments_32SVR4(
assert(VA.isMemLoc());
unsigned ArgSize = VA.getLocVT().getStoreSize();
- int FI = MFI->CreateFixedObject(ArgSize, VA.getLocMemOffset(),
- isImmutable);
+ int FI = MFI.CreateFixedObject(ArgSize, VA.getLocMemOffset(),
+ isImmutable);
// Create load nodes to retrieve arguments from the stack.
SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
@@ -3042,10 +3126,10 @@ SDValue PPCTargetLowering::LowerFormalArguments_32SVR4(
NumFPArgRegs * MVT(MVT::f64).getSizeInBits()/8;
FuncInfo->setVarArgsStackOffset(
- MFI->CreateFixedObject(PtrVT.getSizeInBits()/8,
- CCInfo.getNextStackOffset(), true));
+ MFI.CreateFixedObject(PtrVT.getSizeInBits()/8,
+ CCInfo.getNextStackOffset(), true));
- FuncInfo->setVarArgsFrameIndex(MFI->CreateStackObject(Depth, 8, false));
+ FuncInfo->setVarArgsFrameIndex(MFI.CreateStackObject(Depth, 8, false));
SDValue FIN = DAG.getFrameIndex(FuncInfo->getVarArgsFrameIndex(), PtrVT);
// The fixed integer arguments of a variadic function are stored to the
@@ -3118,7 +3202,7 @@ SDValue PPCTargetLowering::LowerFormalArguments_64SVR4(
bool isELFv2ABI = Subtarget.isELFv2ABI();
bool isLittleEndian = Subtarget.isLittleEndian();
MachineFunction &MF = DAG.getMachineFunction();
- MachineFrameInfo *MFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
PPCFunctionInfo *FuncInfo = MF.getInfo<PPCFunctionInfo>();
assert(!(CallConv == CallingConv::Fast && isVarArg) &&
@@ -3139,10 +3223,6 @@ SDValue PPCTargetLowering::LowerFormalArguments_64SVR4(
PPC::V2, PPC::V3, PPC::V4, PPC::V5, PPC::V6, PPC::V7, PPC::V8,
PPC::V9, PPC::V10, PPC::V11, PPC::V12, PPC::V13
};
- static const MCPhysReg VSRH[] = {
- PPC::VSH2, PPC::VSH3, PPC::VSH4, PPC::VSH5, PPC::VSH6, PPC::VSH7, PPC::VSH8,
- PPC::VSH9, PPC::VSH10, PPC::VSH11, PPC::VSH12, PPC::VSH13
- };
const unsigned Num_GPR_Regs = array_lengthof(GPR);
const unsigned Num_FPR_Regs = useSoftFloat() ? 0 : 13;
@@ -3231,7 +3311,7 @@ SDValue PPCTargetLowering::LowerFormalArguments_64SVR4(
// pretend we have an 8-byte item at the current address for that
// purpose.
if (!ObjSize) {
- int FI = MFI->CreateFixedObject(PtrByteSize, ArgOffset, true);
+ int FI = MFI.CreateFixedObject(PtrByteSize, ArgOffset, true);
SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
InVals.push_back(FIN);
continue;
@@ -3246,9 +3326,9 @@ SDValue PPCTargetLowering::LowerFormalArguments_64SVR4(
int FI;
if (HasParameterArea ||
ArgSize + ArgOffset > LinkageSize + Num_GPR_Regs * PtrByteSize)
- FI = MFI->CreateFixedObject(ArgSize, ArgOffset, false, true);
+ FI = MFI.CreateFixedObject(ArgSize, ArgOffset, false, true);
else
- FI = MFI->CreateStackObject(ArgSize, Align, false);
+ FI = MFI.CreateStackObject(ArgSize, Align, false);
SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
// Handle aggregates smaller than 8 bytes.
@@ -3418,9 +3498,7 @@ SDValue PPCTargetLowering::LowerFormalArguments_64SVR4(
// passed directly. The latter are used to implement ELFv2 homogeneous
// vector aggregates.
if (VR_idx != Num_VR_Regs) {
- unsigned VReg = (ObjectVT == MVT::v2f64 || ObjectVT == MVT::v2i64) ?
- MF.addLiveIn(VSRH[VR_idx], &PPC::VSHRCRegClass) :
- MF.addLiveIn(VR[VR_idx], &PPC::VRRCRegClass);
+ unsigned VReg = MF.addLiveIn(VR[VR_idx], &PPC::VRRCRegClass);
ArgVal = DAG.getCopyFromReg(Chain, dl, VReg, ObjectVT);
++VR_idx;
} else {
@@ -3469,7 +3547,7 @@ SDValue PPCTargetLowering::LowerFormalArguments_64SVR4(
if (needsLoad) {
if (ObjSize < ArgSize && !isLittleEndian)
CurArgOffset += ArgSize - ObjSize;
- int FI = MFI->CreateFixedObject(ObjSize, CurArgOffset, isImmutable);
+ int FI = MFI.CreateFixedObject(ObjSize, CurArgOffset, isImmutable);
SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
ArgVal = DAG.getLoad(ObjectVT, dl, Chain, FIN, MachinePointerInfo());
}
@@ -3498,7 +3576,7 @@ SDValue PPCTargetLowering::LowerFormalArguments_64SVR4(
int Depth = ArgOffset;
FuncInfo->setVarArgsFrameIndex(
- MFI->CreateFixedObject(PtrByteSize, Depth, true));
+ MFI.CreateFixedObject(PtrByteSize, Depth, true));
SDValue FIN = DAG.getFrameIndex(FuncInfo->getVarArgsFrameIndex(), PtrVT);
// If this function is vararg, store any remaining integer argument regs
@@ -3530,7 +3608,7 @@ SDValue PPCTargetLowering::LowerFormalArguments_Darwin(
// TODO: add description of PPC stack frame format, or at least some docs.
//
MachineFunction &MF = DAG.getMachineFunction();
- MachineFrameInfo *MFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
PPCFunctionInfo *FuncInfo = MF.getInfo<PPCFunctionInfo>();
EVT PtrVT = getPointerTy(MF.getDataLayout());
@@ -3665,7 +3743,7 @@ SDValue PPCTargetLowering::LowerFormalArguments_Darwin(
CurArgOffset = CurArgOffset + (4 - ObjSize);
}
// The value of the object is its address.
- int FI = MFI->CreateFixedObject(ObjSize, CurArgOffset, false, true);
+ int FI = MFI.CreateFixedObject(ObjSize, CurArgOffset, false, true);
SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
InVals.push_back(FIN);
if (ObjSize==1 || ObjSize==2) {
@@ -3698,7 +3776,7 @@ SDValue PPCTargetLowering::LowerFormalArguments_Darwin(
VReg = MF.addLiveIn(GPR[GPR_idx], &PPC::G8RCRegClass);
else
VReg = MF.addLiveIn(GPR[GPR_idx], &PPC::GPRCRegClass);
- int FI = MFI->CreateFixedObject(PtrByteSize, ArgOffset, true);
+ int FI = MFI.CreateFixedObject(PtrByteSize, ArgOffset, true);
SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
SDValue Val = DAG.getCopyFromReg(Chain, dl, VReg, PtrVT);
SDValue Store = DAG.getStore(Val.getValue(1), dl, Val, FIN,
@@ -3735,7 +3813,7 @@ SDValue PPCTargetLowering::LowerFormalArguments_Darwin(
ArgOffset += PtrByteSize;
break;
}
- // FALLTHROUGH
+ LLVM_FALLTHROUGH;
case MVT::i64: // PPC64
if (GPR_idx != Num_GPR_Regs) {
unsigned VReg = MF.addLiveIn(GPR[GPR_idx], &PPC::G8RCRegClass);
@@ -3819,9 +3897,9 @@ SDValue PPCTargetLowering::LowerFormalArguments_Darwin(
// We need to load the argument to a virtual register if we determined above
// that we ran out of physical registers of the appropriate type.
if (needsLoad) {
- int FI = MFI->CreateFixedObject(ObjSize,
- CurArgOffset + (ArgSize - ObjSize),
- isImmutable);
+ int FI = MFI.CreateFixedObject(ObjSize,
+ CurArgOffset + (ArgSize - ObjSize),
+ isImmutable);
SDValue FIN = DAG.getFrameIndex(FI, PtrVT);
ArgVal = DAG.getLoad(ObjectVT, dl, Chain, FIN, MachinePointerInfo());
}
@@ -3852,8 +3930,8 @@ SDValue PPCTargetLowering::LowerFormalArguments_Darwin(
int Depth = ArgOffset;
FuncInfo->setVarArgsFrameIndex(
- MFI->CreateFixedObject(PtrVT.getSizeInBits()/8,
- Depth, true));
+ MFI.CreateFixedObject(PtrVT.getSizeInBits()/8,
+ Depth, true));
SDValue FIN = DAG.getFrameIndex(FuncInfo->getVarArgsFrameIndex(), PtrVT);
// If this function is vararg, store any remaining integer argument regs
@@ -3903,40 +3981,46 @@ static int CalculateTailCallSPDiff(SelectionDAG& DAG, bool isTailCall,
static bool isFunctionGlobalAddress(SDValue Callee);
static bool
-resideInSameModule(SDValue Callee, Reloc::Model RelMod) {
+resideInSameSection(const Function *Caller, SDValue Callee,
+ const TargetMachine &TM) {
// If !G, Callee can be an external symbol.
GlobalAddressSDNode *G = dyn_cast<GlobalAddressSDNode>(Callee);
- if (!G) return false;
+ if (!G)
+ return false;
const GlobalValue *GV = G->getGlobal();
-
- if (GV->isDeclaration()) return false;
-
- switch(GV->getLinkage()) {
- default: llvm_unreachable("unknow linkage type");
- case GlobalValue::AvailableExternallyLinkage:
- case GlobalValue::ExternalWeakLinkage:
+ if (!GV->isStrongDefinitionForLinker())
return false;
- // Callee with weak linkage is allowed if it has hidden or protected
- // visibility
- case GlobalValue::LinkOnceAnyLinkage:
- case GlobalValue::LinkOnceODRLinkage: // e.g. c++ inline functions
- case GlobalValue::WeakAnyLinkage:
- case GlobalValue::WeakODRLinkage: // e.g. c++ template instantiation
- if (GV->hasDefaultVisibility())
+ // Any explicitly-specified sections and section prefixes must also match.
+ // Also, if we're using -ffunction-sections, then each function is always in
+ // a different section (the same is true for COMDAT functions).
+ if (TM.getFunctionSections() || GV->hasComdat() || Caller->hasComdat() ||
+ GV->getSection() != Caller->getSection())
+ return false;
+ if (const auto *F = dyn_cast<Function>(GV)) {
+ if (F->getSectionPrefix() != Caller->getSectionPrefix())
return false;
-
- case GlobalValue::ExternalLinkage:
- case GlobalValue::InternalLinkage:
- case GlobalValue::PrivateLinkage:
- break;
}
- // With '-fPIC', calling default visiblity function need insert 'nop' after
- // function call, no matter that function resides in same module or not, so
- // we treat it as in different module.
- if (RelMod == Reloc::PIC_ && GV->hasDefaultVisibility())
+ // If the callee might be interposed, then we can't assume the ultimate call
+ // target will be in the same section. Even in cases where we can assume that
+ // interposition won't happen, in any case where the linker might insert a
+ // stub to allow for interposition, we must generate code as though
+ // interposition might occur. To understand why this matters, consider a
+ // situation where: a -> b -> c where the arrows indicate calls. b and c are
+ // in the same section, but a is in a different module (i.e. has a different
+ // TOC base pointer). If the linker allows for interposition between b and c,
+ // then it will generate a stub for the call edge between b and c which will
+ // save the TOC pointer into the designated stack slot allocated by b. If we
+ // return true here, and therefore allow a tail call between b and c, that
+ // stack slot won't exist and the b -> c stub will end up saving b's TOC base
+ // pointer into the stack slot allocated by a (where the a -> b stub saved
+ // a's TOC base pointer). If we're not considering a tail call, but rather
+ // whether a nop is needed after the call instruction in b because the linker
+ // will insert a stub, the linker might complain about a missing nop if we
+ // omit it (although many linkers don't complain in this case).
+ if (!TM.shouldAssumeDSOLocal(*Caller->getParent(), GV))
return false;
return true;
@@ -4037,8 +4121,7 @@ PPCTargetLowering::IsEligibleForTailCallOptimization_64SVR4(
return false;
// Caller contains any byval parameter is not supported.
- if (std::any_of(Ins.begin(), Ins.end(),
- [](const ISD::InputArg& IA) { return IA.Flags.isByVal(); }))
+ if (any_of(Ins, [](const ISD::InputArg &IA) { return IA.Flags.isByVal(); }))
return false;
// Callee contains any byval parameter is not supported, too.
@@ -4053,11 +4136,11 @@ PPCTargetLowering::IsEligibleForTailCallOptimization_64SVR4(
!isa<ExternalSymbolSDNode>(Callee))
return false;
- // Check if Callee resides in the same module, because for now, PPC64 SVR4 ABI
- // (ELFv1/ELFv2) doesn't allow tail calls to a symbol resides in another
- // module.
+ // Check if Callee resides in the same section, because for now, the PPC64
+ // SVR4 ABI (ELFv1/ELFv2) doesn't allow tail calls to a symbol that resides
+ // in another section.
// ref: https://bugzilla.mozilla.org/show_bug.cgi?id=973977
- if (!resideInSameModule(Callee, getTargetMachine().getRelocationModel()))
+ if (!resideInSameSection(MF.getFunction(), Callee, getTargetMachine()))
return false;
// TCO allows altering callee ABI, so we don't have to check further.
@@ -4174,8 +4257,8 @@ static SDValue EmitTailCallStoreFPAndRetAddr(SelectionDAG &DAG, SDValue Chain,
bool isPPC64 = Subtarget.isPPC64();
int SlotSize = isPPC64 ? 8 : 4;
int NewRetAddrLoc = SPDiff + FL->getReturnSaveOffset();
- int NewRetAddr = MF.getFrameInfo()->CreateFixedObject(SlotSize,
- NewRetAddrLoc, true);
+ int NewRetAddr = MF.getFrameInfo().CreateFixedObject(SlotSize,
+ NewRetAddrLoc, true);
EVT VT = isPPC64 ? MVT::i64 : MVT::i32;
SDValue NewRetAddrFrIdx = DAG.getFrameIndex(NewRetAddr, VT);
Chain = DAG.getStore(Chain, dl, OldRetAddr, NewRetAddrFrIdx,
@@ -4185,8 +4268,8 @@ static SDValue EmitTailCallStoreFPAndRetAddr(SelectionDAG &DAG, SDValue Chain,
// slot as the FP is never overwritten.
if (Subtarget.isDarwinABI()) {
int NewFPLoc = SPDiff + FL->getFramePointerSaveOffset();
- int NewFPIdx = MF.getFrameInfo()->CreateFixedObject(SlotSize, NewFPLoc,
- true);
+ int NewFPIdx = MF.getFrameInfo().CreateFixedObject(SlotSize, NewFPLoc,
+ true);
SDValue NewFramePtrIdx = DAG.getFrameIndex(NewFPIdx, VT);
Chain = DAG.getStore(Chain, dl, OldFP, NewFramePtrIdx,
MachinePointerInfo::getFixedStack(
@@ -4203,8 +4286,8 @@ CalculateTailCallArgDest(SelectionDAG &DAG, MachineFunction &MF, bool isPPC64,
SDValue Arg, int SPDiff, unsigned ArgOffset,
SmallVectorImpl<TailCallArgumentInfo>& TailCallArguments) {
int Offset = ArgOffset + SPDiff;
- uint32_t OpSize = (Arg.getValueType().getSizeInBits()+7)/8;
- int FI = MF.getFrameInfo()->CreateFixedObject(OpSize, Offset, true);
+ uint32_t OpSize = (Arg.getValueSizeInBits() + 7) / 8;
+ int FI = MF.getFrameInfo().CreateFixedObject(OpSize, Offset, true);
EVT VT = isPPC64 ? MVT::i64 : MVT::i32;
SDValue FIN = DAG.getFrameIndex(FI, VT);
TailCallArgumentInfo Info;
@@ -4430,7 +4513,8 @@ PrepareCall(SelectionDAG &DAG, SDValue &Callee, SDValue &InFlag, SDValue &Chain,
LDChain = CallSeqStart.getValue(CallSeqStart->getNumValues()-2);
auto MMOFlags = Subtarget.hasInvariantFunctionDescriptors()
- ? MachineMemOperand::MOInvariant
+ ? (MachineMemOperand::MODereferenceable |
+ MachineMemOperand::MOInvariant)
: MachineMemOperand::MONone;
MachinePointerInfo MPI(CS ? CS->getCalledValue() : nullptr);
@@ -4514,14 +4598,6 @@ PrepareCall(SelectionDAG &DAG, SDValue &Callee, SDValue &InFlag, SDValue &Chain,
return CallOpc;
}
-static
-bool isLocalCall(const SDValue &Callee)
-{
- if (GlobalAddressSDNode *G = dyn_cast<GlobalAddressSDNode>(Callee))
- return G->getGlobal()->isStrongDefinitionForLinker();
- return false;
-}
-
SDValue PPCTargetLowering::LowerCallResult(
SDValue Chain, SDValue InFlag, CallingConv::ID CallConv, bool isVarArg,
const SmallVectorImpl<ISD::InputArg> &Ins, const SDLoc &dl,
@@ -4610,7 +4686,7 @@ SDValue PPCTargetLowering::FinishCall(
isa<ConstantSDNode>(Callee)) &&
"Expecting an global address, external symbol, absolute value or register");
- DAG.getMachineFunction().getFrameInfo()->setHasTailCall();
+ DAG.getMachineFunction().getFrameInfo().setHasTailCall();
return DAG.getNode(PPCISD::TC_RETURN, dl, MVT::Other, Ops);
}
@@ -4623,6 +4699,7 @@ SDValue PPCTargetLowering::FinishCall(
// stack frame. If caller and callee belong to the same module (and have the
// same TOC), the NOP will remain unchanged.
+ MachineFunction &MF = DAG.getMachineFunction();
if (!isTailCall && Subtarget.isSVR4ABI()&& Subtarget.isPPC64() &&
!isPatchPoint) {
if (CallOpc == PPCISD::BCTRL) {
@@ -4646,11 +4723,11 @@ SDValue PPCTargetLowering::FinishCall(
// The address needs to go after the chain input but before the flag (or
// any other variadic arguments).
Ops.insert(std::next(Ops.begin()), AddTOC);
- } else if ((CallOpc == PPCISD::CALL) &&
- (!isLocalCall(Callee) ||
- DAG.getTarget().getRelocationModel() == Reloc::PIC_))
+ } else if (CallOpc == PPCISD::CALL &&
+ !resideInSameSection(MF.getFunction(), Callee, DAG.getTarget())) {
// Otherwise insert NOP for non-local calls.
CallOpc = PPCISD::CALL_NOP;
+ }
}
Chain = DAG.getNode(CallOpc, dl, NodeTys, Ops);
@@ -5026,10 +5103,6 @@ SDValue PPCTargetLowering::LowerCall_64SVR4(
PPC::V2, PPC::V3, PPC::V4, PPC::V5, PPC::V6, PPC::V7, PPC::V8,
PPC::V9, PPC::V10, PPC::V11, PPC::V12, PPC::V13
};
- static const MCPhysReg VSRH[] = {
- PPC::VSH2, PPC::VSH3, PPC::VSH4, PPC::VSH5, PPC::VSH6, PPC::VSH7, PPC::VSH8,
- PPC::VSH9, PPC::VSH10, PPC::VSH11, PPC::VSH12, PPC::VSH13
- };
const unsigned NumGPRs = array_lengthof(GPR);
const unsigned NumFPRs = 13;
@@ -5456,13 +5529,7 @@ SDValue PPCTargetLowering::LowerCall_64SVR4(
SDValue Load =
DAG.getLoad(MVT::v4f32, dl, Store, PtrOff, MachinePointerInfo());
MemOpChains.push_back(Load.getValue(1));
-
- unsigned VReg = (Arg.getSimpleValueType() == MVT::v2f64 ||
- Arg.getSimpleValueType() == MVT::v2i64) ?
- VSRH[VR_idx] : VR[VR_idx];
- ++VR_idx;
-
- RegsToPass.push_back(std::make_pair(VReg, Load));
+ RegsToPass.push_back(std::make_pair(VR[VR_idx++], Load));
}
ArgOffset += 16;
for (unsigned i=0; i<16; i+=PtrByteSize) {
@@ -5480,12 +5547,7 @@ SDValue PPCTargetLowering::LowerCall_64SVR4(
// Non-varargs Altivec params go into VRs or on the stack.
if (VR_idx != NumVRs) {
- unsigned VReg = (Arg.getSimpleValueType() == MVT::v2f64 ||
- Arg.getSimpleValueType() == MVT::v2i64) ?
- VSRH[VR_idx] : VR[VR_idx];
- ++VR_idx;
-
- RegsToPass.push_back(std::make_pair(VReg, Arg));
+ RegsToPass.push_back(std::make_pair(VR[VR_idx++], Arg));
} else {
if (CallConv == CallingConv::Fast)
ComputePtrOff();
@@ -6126,7 +6188,7 @@ SDValue PPCTargetLowering::getReturnAddrFrameIndex(SelectionDAG &DAG) const {
// Find out what the fixed offset of the frame pointer save area is.
int LROffset = Subtarget.getFrameLowering()->getReturnSaveOffset();
// Allocate the frame index for frame pointer save area.
- RASI = MF.getFrameInfo()->CreateFixedObject(isPPC64? 8 : 4, LROffset, false);
+ RASI = MF.getFrameInfo().CreateFixedObject(isPPC64? 8 : 4, LROffset, false);
// Save the result.
FI->setReturnAddrSaveIndex(RASI);
}
@@ -6149,7 +6211,7 @@ PPCTargetLowering::getFramePointerFrameIndex(SelectionDAG & DAG) const {
// Find out what the fixed offset of the frame pointer save area is.
int FPOffset = Subtarget.getFrameLowering()->getFramePointerSaveOffset();
// Allocate the frame index for frame pointer save area.
- FPSI = MF.getFrameInfo()->CreateFixedObject(isPPC64? 8 : 4, FPOffset, true);
+ FPSI = MF.getFrameInfo().CreateFixedObject(isPPC64? 8 : 4, FPOffset, true);
// Save the result.
FI->setFramePointerSaveIndex(FPSI);
}
@@ -6183,7 +6245,7 @@ SDValue PPCTargetLowering::LowerEH_DWARF_CFA(SDValue Op,
bool isPPC64 = Subtarget.isPPC64();
EVT PtrVT = getPointerTy(DAG.getDataLayout());
- int FI = MF.getFrameInfo()->CreateFixedObject(isPPC64 ? 8 : 4, 0, false);
+ int FI = MF.getFrameInfo().CreateFixedObject(isPPC64 ? 8 : 4, 0, false);
return DAG.getFrameIndex(FI, PtrVT);
}
@@ -6467,10 +6529,7 @@ SDValue PPCTargetLowering::LowerFP_TO_INT(SDValue Op, SelectionDAG &DAG,
LowerFP_TO_INTForReuse(Op, RLI, DAG, dl);
return DAG.getLoad(Op.getValueType(), dl, RLI.Chain, RLI.Ptr, RLI.MPI,
- RLI.Alignment,
- RLI.IsInvariant ? MachineMemOperand::MOInvariant
- : MachineMemOperand::MONone,
- RLI.AAInfo, RLI.Ranges);
+ RLI.Alignment, RLI.MMOFlags(), RLI.AAInfo, RLI.Ranges);
}
// We're trying to insert a regular store, S, and then a load, L. If the
@@ -6513,6 +6572,7 @@ bool PPCTargetLowering::canReuseLoadAddress(SDValue Op, EVT MemVT,
RLI.Chain = LD->getChain();
RLI.MPI = LD->getPointerInfo();
+ RLI.IsDereferenceable = LD->isDereferenceable();
RLI.IsInvariant = LD->isInvariant();
RLI.Alignment = LD->getAlignment();
RLI.AAInfo = LD->getAAInfo();
@@ -6545,11 +6605,17 @@ void PPCTargetLowering::spliceIntoChain(SDValue ResChain,
/// \brief Analyze profitability of direct move
/// prefer float load to int load plus direct move
/// when there is no integer use of int load
-static bool directMoveIsProfitable(const SDValue &Op) {
+bool PPCTargetLowering::directMoveIsProfitable(const SDValue &Op) const {
SDNode *Origin = Op.getOperand(0).getNode();
if (Origin->getOpcode() != ISD::LOAD)
return true;
+ // If there is no LXSIBZX/LXSIHZX (e.g. on Power8),
+ // prefer a direct move if the loaded value is only 1 or 2 bytes.
+ MachineMemOperand *MMO = cast<LoadSDNode>(Origin)->getMemOperand();
+ if (!Subtarget.hasP9Vector() && MMO->getSize() <= 2)
+ return true;
+
for (SDNode::use_iterator UI = Origin->use_begin(),
UE = Origin->use_end();
UI != UE; ++UI) {
@@ -6705,11 +6771,8 @@ SDValue PPCTargetLowering::LowerINT_TO_FP(SDValue Op,
MachineFunction &MF = DAG.getMachineFunction();
if (canReuseLoadAddress(SINT, MVT::i64, RLI, DAG)) {
- Bits =
- DAG.getLoad(MVT::f64, dl, RLI.Chain, RLI.Ptr, RLI.MPI, RLI.Alignment,
- RLI.IsInvariant ? MachineMemOperand::MOInvariant
- : MachineMemOperand::MONone,
- RLI.AAInfo, RLI.Ranges);
+ Bits = DAG.getLoad(MVT::f64, dl, RLI.Chain, RLI.Ptr, RLI.MPI,
+ RLI.Alignment, RLI.MMOFlags(), RLI.AAInfo, RLI.Ranges);
spliceIntoChain(RLI.ResChain, Bits.getValue(1), DAG);
} else if (Subtarget.hasLFIWAX() &&
canReuseLoadAddress(SINT, MVT::i32, RLI, DAG, ISD::SEXTLOAD)) {
@@ -6736,10 +6799,10 @@ SDValue PPCTargetLowering::LowerINT_TO_FP(SDValue Op,
(Subtarget.hasFPCVT() &&
SINT.getOpcode() == ISD::ZERO_EXTEND)) &&
SINT.getOperand(0).getValueType() == MVT::i32) {
- MachineFrameInfo *FrameInfo = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
EVT PtrVT = getPointerTy(DAG.getDataLayout());
- int FrameIdx = FrameInfo->CreateStackObject(4, 4, false);
+ int FrameIdx = MFI.CreateStackObject(4, 4, false);
SDValue FIdx = DAG.getFrameIndex(FrameIdx, PtrVT);
SDValue Store =
@@ -6782,7 +6845,7 @@ SDValue PPCTargetLowering::LowerINT_TO_FP(SDValue Op,
// 64-bit register with extsw, store the WHOLE 64-bit value into the stack
// then lfd it and fcfid it.
MachineFunction &MF = DAG.getMachineFunction();
- MachineFrameInfo *FrameInfo = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
EVT PtrVT = getPointerTy(MF.getDataLayout());
SDValue Ld;
@@ -6791,7 +6854,7 @@ SDValue PPCTargetLowering::LowerINT_TO_FP(SDValue Op,
bool ReusingLoad;
if (!(ReusingLoad = canReuseLoadAddress(Op.getOperand(0), MVT::i32, RLI,
DAG))) {
- int FrameIdx = FrameInfo->CreateStackObject(4, 4, false);
+ int FrameIdx = MFI.CreateStackObject(4, 4, false);
SDValue FIdx = DAG.getFrameIndex(FrameIdx, PtrVT);
SDValue Store =
@@ -6823,7 +6886,7 @@ SDValue PPCTargetLowering::LowerINT_TO_FP(SDValue Op,
assert(Subtarget.isPPC64() &&
"i32->FP without LFIWAX supported only on PPC64");
- int FrameIdx = FrameInfo->CreateStackObject(8, 8, false);
+ int FrameIdx = MFI.CreateStackObject(8, 8, false);
SDValue FIdx = DAG.getFrameIndex(FrameIdx, PtrVT);
SDValue Ext64 = DAG.getNode(ISD::SIGN_EXTEND, dl, MVT::i64,
@@ -6882,7 +6945,7 @@ SDValue PPCTargetLowering::LowerFLT_ROUNDS_(SDValue Op,
SDValue Chain = DAG.getNode(PPCISD::MFFS, dl, NodeTys, None);
// Save FP register to stack slot
- int SSFI = MF.getFrameInfo()->CreateStackObject(8, 8, false);
+ int SSFI = MF.getFrameInfo().CreateStackObject(8, 8, false);
SDValue StackSlot = DAG.getFrameIndex(SSFI, PtrVT);
SDValue Store = DAG.getStore(DAG.getEntryNode(), dl, Chain, StackSlot,
MachinePointerInfo());
@@ -7068,6 +7131,57 @@ static SDValue BuildVSLDOI(SDValue LHS, SDValue RHS, unsigned Amt, EVT VT,
return DAG.getNode(ISD::BITCAST, dl, VT, T);
}
+/// Do we have an efficient pattern in a .td file for this node?
+///
+/// \param V - pointer to the BuildVectorSDNode being matched
+/// \param HasDirectMove - does this subtarget have VSR <-> GPR direct moves?
+///
+/// There are some patterns where it is beneficial to keep a BUILD_VECTOR
+/// node as a BUILD_VECTOR node rather than expanding it. The patterns where
+/// the opposite is true (expansion is beneficial) are:
+/// - The node builds a vector out of integers that are not 32 or 64-bits
+/// - The node builds a vector out of constants
+/// - The node is a "load-and-splat"
+/// In all other cases, we will choose to keep the BUILD_VECTOR.
+static bool haveEfficientBuildVectorPattern(BuildVectorSDNode *V,
+ bool HasDirectMove) {
+ EVT VecVT = V->getValueType(0);
+ bool RightType = VecVT == MVT::v2f64 || VecVT == MVT::v4f32 ||
+ (HasDirectMove && (VecVT == MVT::v2i64 || VecVT == MVT::v4i32));
+ if (!RightType)
+ return false;
+
+ bool IsSplat = true;
+ bool IsLoad = false;
+ SDValue Op0 = V->getOperand(0);
+
+ // This function is called in a block that confirms the node is not a constant
+ // splat. So a constant BUILD_VECTOR here means the vector is built out of
+ // different constants.
+ if (V->isConstant())
+ return false;
+ for (int i = 0, e = V->getNumOperands(); i < e; ++i) {
+ if (V->getOperand(i).isUndef())
+ return false;
+ // We want to expand nodes that represent load-and-splat even if the
+ // loaded value is a floating point truncation or conversion to int.
+ if (V->getOperand(i).getOpcode() == ISD::LOAD ||
+ (V->getOperand(i).getOpcode() == ISD::FP_ROUND &&
+ V->getOperand(i).getOperand(0).getOpcode() == ISD::LOAD) ||
+ (V->getOperand(i).getOpcode() == ISD::FP_TO_SINT &&
+ V->getOperand(i).getOperand(0).getOpcode() == ISD::LOAD) ||
+ (V->getOperand(i).getOpcode() == ISD::FP_TO_UINT &&
+ V->getOperand(i).getOperand(0).getOpcode() == ISD::LOAD))
+ IsLoad = true;
+ // If the operands are different or the input is not a load and has more
+ // uses than just this BV node, then it isn't a splat.
+ if (V->getOperand(i) != Op0 ||
+ (!IsLoad && !V->isOnlyUserOf(V->getOperand(i).getNode())))
+ IsSplat = false;
+ }
+ return !(IsSplat && IsLoad);
+}
+
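
Applying the taxonomy above: a v4f32 built from four uses of one loaded value is a load-and-splat and is worth expanding, while a v2f64 built from two distinct in-register doubles (with direct moves available) is kept as a BUILD_VECTOR so a single VSX pattern can match it. Both examples are illustrative rather than exhaustive.
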
// If this is a case we can't handle, return null and let the default
// expansion code take care of it. If we CAN select this case, and if it
// selects to a single instruction, return Op. Otherwise, if we can codegen
@@ -7083,8 +7197,8 @@ SDValue PPCTargetLowering::LowerBUILD_VECTOR(SDValue Op,
// We first build an i32 vector, load it into a QPX register,
// then convert it to a floating-point vector and compare it
// to a zero vector to get the boolean result.
- MachineFrameInfo *FrameInfo = DAG.getMachineFunction().getFrameInfo();
- int FrameIdx = FrameInfo->CreateStackObject(16, 16, false);
+ MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo();
+ int FrameIdx = MFI.CreateStackObject(16, 16, false);
MachinePointerInfo PtrInfo =
MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FrameIdx);
EVT PtrVT = getPointerTy(DAG.getDataLayout());
@@ -7189,8 +7303,15 @@ SDValue PPCTargetLowering::LowerBUILD_VECTOR(SDValue Op,
bool HasAnyUndefs;
if (! BVN->isConstantSplat(APSplatBits, APSplatUndef, SplatBitSize,
HasAnyUndefs, 0, !Subtarget.isLittleEndian()) ||
- SplatBitSize > 32)
+ SplatBitSize > 32) {
+ // BUILD_VECTOR nodes that are not constant splats of up to 32 bits can be
+ // lowered to VSX instructions under certain conditions.
+ // Without VSX, there is no pattern more efficient than expanding the node.
+ if (Subtarget.hasVSX() &&
+ haveEfficientBuildVectorPattern(BVN, Subtarget.hasDirectMove()))
+ return Op;
return SDValue();
+ }
unsigned SplatBits = APSplatBits.getZExtValue();
unsigned SplatUndef = APSplatUndef.getZExtValue();
@@ -7208,6 +7329,22 @@ SDValue PPCTargetLowering::LowerBUILD_VECTOR(SDValue Op,
return Op;
}
+ // We have XXSPLTIB for constant splats one byte wide
+ if (Subtarget.hasP9Vector() && SplatSize == 1) {
+ // This is a splat of 1-byte elements with some elements potentially undef.
+ // Rather than trying to match undef in the SDAG patterns, ensure that all
+ // elements are the same constant.
+ if (HasAnyUndefs || ISD::isBuildVectorAllOnes(BVN)) {
+ SmallVector<SDValue, 16> Ops(16, DAG.getConstant(SplatBits,
+ dl, MVT::i32));
+ SDValue NewBV = DAG.getBuildVector(MVT::v16i8, dl, Ops);
+ if (Op.getValueType() != MVT::v16i8)
+ return DAG.getBitcast(Op.getValueType(), NewBV);
+ return NewBV;
+ }
+ return Op;
+ }
+
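// Illustrative sketch of the XXSPLTIB path above (not patch code): on a
// P9 target, (v16i8 build_vector 7, 7, undef, 7, ...) with SplatSize == 1
// is rebuilt as sixteen explicit i32 constant-7 operands so the XXSPLTIB
// pattern never has to reason about undef lanes; a non-v16i8 result type
// just receives a bitcast of the rebuilt node.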
// If the sign extended value is in the range [-16,15], use VSPLTI[bhw].
int32_t SextVal= (int32_t(SplatBits << (32-SplatBitSize)) >>
(32-SplatBitSize));
@@ -7451,6 +7588,18 @@ SDValue PPCTargetLowering::LowerVECTOR_SHUFFLE(SDValue Op,
if (Subtarget.hasVSX()) {
if (V2.isUndef() && PPC::isSplatShuffleMask(SVOp, 4)) {
int SplatIdx = PPC::getVSPLTImmediate(SVOp, 4, DAG);
+
+ // If the source for the shuffle is a scalar_to_vector that came from a
+ // 32-bit load, it will have used LXVWSX so we don't need to splat again.
+ if (Subtarget.hasP9Vector() &&
+ ((isLittleEndian && SplatIdx == 3) ||
+ (!isLittleEndian && SplatIdx == 0))) {
+ SDValue Src = V1.getOperand(0);
+ if (Src.getOpcode() == ISD::SCALAR_TO_VECTOR &&
+ Src.getOperand(0).getOpcode() == ISD::LOAD &&
+ Src.getOperand(0).hasOneUse())
+ return V1;
+ }
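// Sketch of the early exit above: on little-endian P9, a shuffle that
// splats lane 3 of a vector fed by (scalar_to_vector (load %p)) is returned
// as V1 unchanged, because the single-use 32-bit load will be selected as
// lxvwsx, which already replicates the word across the register; emitting
// XXSPLT again would be redundant.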
SDValue Conv = DAG.getNode(ISD::BITCAST, dl, MVT::v4i32, V1);
SDValue Splat = DAG.getNode(PPCISD::XXSPLT, dl, MVT::v4i32, Conv,
DAG.getConstant(SplatIdx, dl, MVT::i32));
@@ -7662,6 +7811,27 @@ static bool getVectorCompareInfo(SDValue Intrin, int &CompareOpc,
return false;
break;
+ case Intrinsic::ppc_altivec_vcmpneb_p:
+ case Intrinsic::ppc_altivec_vcmpneh_p:
+ case Intrinsic::ppc_altivec_vcmpnew_p:
+ case Intrinsic::ppc_altivec_vcmpnezb_p:
+ case Intrinsic::ppc_altivec_vcmpnezh_p:
+ case Intrinsic::ppc_altivec_vcmpnezw_p:
+ if (Subtarget.hasP9Altivec()) {
+ switch(IntrinsicID) {
+ default: llvm_unreachable("Unknown comparison intrinsic.");
+ case Intrinsic::ppc_altivec_vcmpneb_p: CompareOpc = 7; break;
+ case Intrinsic::ppc_altivec_vcmpneh_p: CompareOpc = 71; break;
+ case Intrinsic::ppc_altivec_vcmpnew_p: CompareOpc = 135; break;
+ case Intrinsic::ppc_altivec_vcmpnezb_p: CompareOpc = 263; break;
+ case Intrinsic::ppc_altivec_vcmpnezh_p: CompareOpc = 327; break;
+ case Intrinsic::ppc_altivec_vcmpnezw_p: CompareOpc = 391; break;
+ }
+ isDot = 1;
+ } else
+ return false;
+
+ break;
case Intrinsic::ppc_altivec_vcmpgefp_p: CompareOpc = 454; isDot = 1; break;
case Intrinsic::ppc_altivec_vcmpgtfp_p: CompareOpc = 710; isDot = 1; break;
case Intrinsic::ppc_altivec_vcmpgtsb_p: CompareOpc = 774; isDot = 1; break;
@@ -7723,6 +7893,26 @@ static bool getVectorCompareInfo(SDValue Intrin, int &CompareOpc,
return false;
break;
+ case Intrinsic::ppc_altivec_vcmpneb:
+ case Intrinsic::ppc_altivec_vcmpneh:
+ case Intrinsic::ppc_altivec_vcmpnew:
+ case Intrinsic::ppc_altivec_vcmpnezb:
+ case Intrinsic::ppc_altivec_vcmpnezh:
+ case Intrinsic::ppc_altivec_vcmpnezw:
+ if (Subtarget.hasP9Altivec()) {
+ switch (IntrinsicID) {
+ default: llvm_unreachable("Unknown comparison intrinsic.");
+ case Intrinsic::ppc_altivec_vcmpneb: CompareOpc = 7; break;
+ case Intrinsic::ppc_altivec_vcmpneh: CompareOpc = 71; break;
+ case Intrinsic::ppc_altivec_vcmpnew: CompareOpc = 135; break;
+ case Intrinsic::ppc_altivec_vcmpnezb: CompareOpc = 263; break;
+ case Intrinsic::ppc_altivec_vcmpnezh: CompareOpc = 327; break;
+ case Intrinsic::ppc_altivec_vcmpnezw: CompareOpc = 391; break;
+ }
+ isDot = 0;
+ } else
+ return false;
+ break;
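// Informal note, not patch code: the CompareOpc values above are the
// VX-form extended opcodes of the new P9 instructions, e.g.
//   7 -> vcmpneb, 71 -> vcmpneh, 135 -> vcmpnew,
//   263 -> vcmpnezb, 327 -> vcmpnezh, 391 -> vcmpnezw,
// and isDot selects the record (dot, CR6-setting) form, matching the
// VCMP/VCMPo definitions added to PPCInstrAltivec.td further below.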
case Intrinsic::ppc_altivec_vcmpgefp: CompareOpc = 454; isDot = 0; break;
case Intrinsic::ppc_altivec_vcmpgtfp: CompareOpc = 710; isDot = 0; break;
case Intrinsic::ppc_altivec_vcmpgtsb: CompareOpc = 774; isDot = 0; break;
@@ -7857,8 +8047,8 @@ SDValue PPCTargetLowering::LowerSCALAR_TO_VECTOR(SDValue Op,
SelectionDAG &DAG) const {
SDLoc dl(Op);
// Create a stack slot that is 16-byte aligned.
- MachineFrameInfo *FrameInfo = DAG.getMachineFunction().getFrameInfo();
- int FrameIdx = FrameInfo->CreateStackObject(16, 16, false);
+ MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo();
+ int FrameIdx = MFI.CreateStackObject(16, 16, false);
EVT PtrVT = getPointerTy(DAG.getDataLayout());
SDValue FIdx = DAG.getFrameIndex(FrameIdx, PtrVT);
@@ -7909,8 +8099,8 @@ SDValue PPCTargetLowering::LowerEXTRACT_VECTOR_ELT(SDValue Op,
DAG.getConstant(Intrinsic::ppc_qpx_qvfctiwu, dl, MVT::i32),
Value);
- MachineFrameInfo *FrameInfo = DAG.getMachineFunction().getFrameInfo();
- int FrameIdx = FrameInfo->CreateStackObject(16, 16, false);
+ MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo();
+ int FrameIdx = MFI.CreateStackObject(16, 16, false);
MachinePointerInfo PtrInfo =
MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FrameIdx);
EVT PtrVT = getPointerTy(DAG.getDataLayout());
@@ -8109,8 +8299,8 @@ SDValue PPCTargetLowering::LowerVectorStore(SDValue Op,
DAG.getConstant(Intrinsic::ppc_qpx_qvfctiwu, dl, MVT::i32),
Value);
- MachineFrameInfo *FrameInfo = DAG.getMachineFunction().getFrameInfo();
- int FrameIdx = FrameInfo->CreateStackObject(16, 16, false);
+ MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo();
+ int FrameIdx = MFI.CreateStackObject(16, 16, false);
MachinePointerInfo PtrInfo =
MachinePointerInfo::getFixedStack(DAG.getMachineFunction(), FrameIdx);
EVT PtrVT = getPointerTy(DAG.getDataLayout());
@@ -8545,6 +8735,7 @@ PPCTargetLowering::EmitPartwordAtomicBinary(MachineInstr &MI,
// registers without caring whether they're 32 or 64, but here we're
// doing actual arithmetic on the addresses.
bool is64bit = Subtarget.isPPC64();
+ bool isLittleEndian = Subtarget.isLittleEndian();
unsigned ZeroReg = is64bit ? PPC::ZERO8 : PPC::ZERO;
const BasicBlock *LLVM_BB = BB->getBasicBlock();
@@ -8574,7 +8765,8 @@ PPCTargetLowering::EmitPartwordAtomicBinary(MachineInstr &MI,
: &PPC::GPRCRegClass;
unsigned PtrReg = RegInfo.createVirtualRegister(RC);
unsigned Shift1Reg = RegInfo.createVirtualRegister(RC);
- unsigned ShiftReg = RegInfo.createVirtualRegister(RC);
+ unsigned ShiftReg =
+ isLittleEndian ? Shift1Reg : RegInfo.createVirtualRegister(RC);
unsigned Incr2Reg = RegInfo.createVirtualRegister(RC);
unsigned MaskReg = RegInfo.createVirtualRegister(RC);
unsigned Mask2Reg = RegInfo.createVirtualRegister(RC);
@@ -8619,8 +8811,9 @@ PPCTargetLowering::EmitPartwordAtomicBinary(MachineInstr &MI,
}
BuildMI(BB, dl, TII->get(PPC::RLWINM), Shift1Reg).addReg(Ptr1Reg)
.addImm(3).addImm(27).addImm(is8bit ? 28 : 27);
- BuildMI(BB, dl, TII->get(is64bit ? PPC::XORI8 : PPC::XORI), ShiftReg)
- .addReg(Shift1Reg).addImm(is8bit ? 24 : 16);
+ if (!isLittleEndian)
+ BuildMI(BB, dl, TII->get(is64bit ? PPC::XORI8 : PPC::XORI), ShiftReg)
+ .addReg(Shift1Reg).addImm(is8bit ? 24 : 16);
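// Worked example of the endianness handling above (illustrative): for a
// byte at address A, RLWINM computes Shift1Reg = (A & 3) << 3, which is
// already the correct lane shift on little-endian. Big-endian needs the
// mirrored lane, (3 - (A & 3)) << 3, and since that equals
// ((A & 3) << 3) ^ 24, the XORI by 24 (16 for halfwords) is now emitted
// only when !isLittleEndian.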
if (is64bit)
BuildMI(BB, dl, TII->get(PPC::RLDICR), PtrReg)
.addReg(Ptr1Reg).addImm(0).addImm(61);
@@ -9325,6 +9518,7 @@ PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
// since we're actually doing arithmetic on them. Other registers
// can be 32-bit.
bool is64bit = Subtarget.isPPC64();
+ bool isLittleEndian = Subtarget.isLittleEndian();
bool is8bit = MI.getOpcode() == PPC::ATOMIC_CMP_SWAP_I8;
unsigned dest = MI.getOperand(0).getReg();
@@ -9351,7 +9545,8 @@ PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
: &PPC::GPRCRegClass;
unsigned PtrReg = RegInfo.createVirtualRegister(RC);
unsigned Shift1Reg = RegInfo.createVirtualRegister(RC);
- unsigned ShiftReg = RegInfo.createVirtualRegister(RC);
+ unsigned ShiftReg =
+ isLittleEndian ? Shift1Reg : RegInfo.createVirtualRegister(RC);
unsigned NewVal2Reg = RegInfo.createVirtualRegister(RC);
unsigned NewVal3Reg = RegInfo.createVirtualRegister(RC);
unsigned OldVal2Reg = RegInfo.createVirtualRegister(RC);
@@ -9406,8 +9601,9 @@ PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
}
BuildMI(BB, dl, TII->get(PPC::RLWINM), Shift1Reg).addReg(Ptr1Reg)
.addImm(3).addImm(27).addImm(is8bit ? 28 : 27);
- BuildMI(BB, dl, TII->get(is64bit ? PPC::XORI8 : PPC::XORI), ShiftReg)
- .addReg(Shift1Reg).addImm(is8bit ? 24 : 16);
+ if (!isLittleEndian)
+ BuildMI(BB, dl, TII->get(is64bit ? PPC::XORI8 : PPC::XORI), ShiftReg)
+ .addReg(Shift1Reg).addImm(is8bit ? 24 : 16);
if (is64bit)
BuildMI(BB, dl, TII->get(PPC::RLDICR), PtrReg)
.addReg(Ptr1Reg).addImm(0).addImm(61);
@@ -9532,23 +9728,21 @@ PPCTargetLowering::EmitInstrWithCustomInserter(MachineInstr &MI,
// Target Optimization Hooks
//===----------------------------------------------------------------------===//
-static std::string getRecipOp(const char *Base, EVT VT) {
- std::string RecipOp(Base);
+static int getEstimateRefinementSteps(EVT VT, const PPCSubtarget &Subtarget) {
+ // For the estimates, convergence is quadratic, so we essentially double the
+ // number of correct digits after every iteration. For both FRE and FRSQRTE,
+ // the minimum architected relative accuracy is 2^-5. When hasRecipPrec(),
+ // this is 2^-14. IEEE float has 23 digits and double has 52 digits.
+ int RefinementSteps = Subtarget.hasRecipPrec() ? 1 : 3;
if (VT.getScalarType() == MVT::f64)
- RecipOp += "d";
- else
- RecipOp += "f";
-
- if (VT.isVector())
- RecipOp = "vec-" + RecipOp;
-
- return RecipOp;
+ RefinementSteps++;
+ return RefinementSteps;
}
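// The arithmetic behind these counts, as a worked example: a 2^-5 estimate
// has 5 good bits and each Newton-Raphson step doubles them, so
// 5 -> 10 -> 20 -> 40 gives 3 steps for f32's 24 significand bits and one
// extra step (40 -> 80) for f64's 53. With hasRecipPrec(), 14 -> 28 covers
// f32 in a single step and 14 -> 28 -> 56 covers f64 in two.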
-SDValue PPCTargetLowering::getRsqrtEstimate(SDValue Operand,
- DAGCombinerInfo &DCI,
- unsigned &RefinementSteps,
- bool &UseOneConstNR) const {
+SDValue PPCTargetLowering::getSqrtEstimate(SDValue Operand, SelectionDAG &DAG,
+ int Enabled, int &RefinementSteps,
+ bool &UseOneConstNR,
+ bool Reciprocal) const {
EVT VT = Operand.getValueType();
if ((VT == MVT::f32 && Subtarget.hasFRSQRTES()) ||
(VT == MVT::f64 && Subtarget.hasFRSQRTE()) ||
@@ -9556,21 +9750,18 @@ SDValue PPCTargetLowering::getRsqrtEstimate(SDValue Operand,
(VT == MVT::v2f64 && Subtarget.hasVSX()) ||
(VT == MVT::v4f32 && Subtarget.hasQPX()) ||
(VT == MVT::v4f64 && Subtarget.hasQPX())) {
- TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;
- std::string RecipOp = getRecipOp("sqrt", VT);
- if (!Recips.isEnabled(RecipOp))
- return SDValue();
+ if (RefinementSteps == ReciprocalEstimate::Unspecified)
+ RefinementSteps = getEstimateRefinementSteps(VT, Subtarget);
- RefinementSteps = Recips.getRefinementSteps(RecipOp);
UseOneConstNR = true;
- return DCI.DAG.getNode(PPCISD::FRSQRTE, SDLoc(Operand), VT, Operand);
+ return DAG.getNode(PPCISD::FRSQRTE, SDLoc(Operand), VT, Operand);
}
return SDValue();
}
-SDValue PPCTargetLowering::getRecipEstimate(SDValue Operand,
- DAGCombinerInfo &DCI,
- unsigned &RefinementSteps) const {
+SDValue PPCTargetLowering::getRecipEstimate(SDValue Operand, SelectionDAG &DAG,
+ int Enabled,
+ int &RefinementSteps) const {
EVT VT = Operand.getValueType();
if ((VT == MVT::f32 && Subtarget.hasFRES()) ||
(VT == MVT::f64 && Subtarget.hasFRE()) ||
@@ -9578,13 +9769,9 @@ SDValue PPCTargetLowering::getRecipEstimate(SDValue Operand,
(VT == MVT::v2f64 && Subtarget.hasVSX()) ||
(VT == MVT::v4f32 && Subtarget.hasQPX()) ||
(VT == MVT::v4f64 && Subtarget.hasQPX())) {
- TargetRecip Recips = DCI.DAG.getTarget().Options.Reciprocals;
- std::string RecipOp = getRecipOp("div", VT);
- if (!Recips.isEnabled(RecipOp))
- return SDValue();
-
- RefinementSteps = Recips.getRefinementSteps(RecipOp);
- return DCI.DAG.getNode(PPCISD::FRE, SDLoc(Operand), VT, Operand);
+ if (RefinementSteps == ReciprocalEstimate::Unspecified)
+ RefinementSteps = getEstimateRefinementSteps(VT, Subtarget);
+ return DAG.getNode(PPCISD::FRE, SDLoc(Operand), VT, Operand);
}
return SDValue();
}
@@ -9635,13 +9822,13 @@ static bool isConsecutiveLSLoc(SDValue Loc, EVT VT, LSBaseSDNode *Base,
if (Loc.getOpcode() == ISD::FrameIndex) {
if (BaseLoc.getOpcode() != ISD::FrameIndex)
return false;
- const MachineFrameInfo *MFI = DAG.getMachineFunction().getFrameInfo();
+ const MachineFrameInfo &MFI = DAG.getMachineFunction().getFrameInfo();
int FI = cast<FrameIndexSDNode>(Loc)->getIndex();
int BFI = cast<FrameIndexSDNode>(BaseLoc)->getIndex();
- int FS = MFI->getObjectSize(FI);
- int BFS = MFI->getObjectSize(BFI);
+ int FS = MFI.getObjectSize(FI);
+ int BFS = MFI.getObjectSize(BFI);
if (FS != BFS || FS != (int)Bytes) return false;
- return MFI->getObjectOffset(FI) == (MFI->getObjectOffset(BFI) + Dist*Bytes);
+ return MFI.getObjectOffset(FI) == (MFI.getObjectOffset(BFI) + Dist*Bytes);
}
SDValue Base1 = Loc, Base2 = BaseLoc;
@@ -9699,9 +9886,11 @@ static bool isConsecutiveLS(SDNode *N, LSBaseSDNode *Base,
case Intrinsic::ppc_altivec_lvx:
case Intrinsic::ppc_altivec_lvxl:
case Intrinsic::ppc_vsx_lxvw4x:
+ case Intrinsic::ppc_vsx_lxvw4x_be:
VT = MVT::v4i32;
break;
case Intrinsic::ppc_vsx_lxvd2x:
+ case Intrinsic::ppc_vsx_lxvd2x_be:
VT = MVT::v2f64;
break;
case Intrinsic::ppc_altivec_lvebx:
@@ -9748,6 +9937,12 @@ static bool isConsecutiveLS(SDNode *N, LSBaseSDNode *Base,
case Intrinsic::ppc_vsx_stxvd2x:
VT = MVT::v2f64;
break;
+ case Intrinsic::ppc_vsx_stxvw4x_be:
+ VT = MVT::v4i32;
+ break;
+ case Intrinsic::ppc_vsx_stxvd2x_be:
+ VT = MVT::v2f64;
+ break;
case Intrinsic::ppc_altivec_stvebx:
VT = MVT::i8;
break;
@@ -9833,6 +10028,87 @@ static bool findConsecutiveLoad(LoadSDNode *LD, SelectionDAG &DAG) {
return false;
}
+
+/// This function is called when we have proved that a SETCC node can be
+/// replaced by subtraction (and other supporting instructions) so that the
+/// result of the comparison is kept in a GPR instead of a CR. This function is
+/// purely for codegen purposes and has some flags to guide the codegen process.
+static SDValue generateEquivalentSub(SDNode *N, int Size, bool Complement,
+ bool Swap, SDLoc &DL, SelectionDAG &DAG) {
+
+ assert(N->getOpcode() == ISD::SETCC && "ISD::SETCC Expected.");
+
+ // Zero extend the operands to the largest legal integer. Originally, they
+ // must be of a strictly smaller size.
+ auto Op0 = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i64, N->getOperand(0),
+ DAG.getConstant(Size, DL, MVT::i32));
+ auto Op1 = DAG.getNode(ISD::ZERO_EXTEND, DL, MVT::i64, N->getOperand(1),
+ DAG.getConstant(Size, DL, MVT::i32));
+
+ // Swap if needed. Depends on the condition code.
+ if (Swap)
+ std::swap(Op0, Op1);
+
+ // Subtract extended integers.
+ auto SubNode = DAG.getNode(ISD::SUB, DL, MVT::i64, Op0, Op1);
+
+ // Move the sign bit to the least significant position and zero out the rest.
+ // Now the least significant bit carries the result of original comparison.
+ auto Shifted = DAG.getNode(ISD::SRL, DL, MVT::i64, SubNode,
+ DAG.getConstant(Size - 1, DL, MVT::i32));
+ auto Final = Shifted;
+
+ // Complement the result if needed. Based on the condition code.
+ if (Complement)
+ Final = DAG.getNode(ISD::XOR, DL, MVT::i64, Shifted,
+ DAG.getConstant(1, DL, MVT::i64));
+
+ return DAG.getNode(ISD::TRUNCATE, DL, MVT::i1, Final);
+}
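// A worked instance (illustrative, with Size = 64 on PPC64): for i32
// operands %a = 1, %b = 2 and SETULT, both are zero-extended to i64,
// SubNode = 1 - 2 = 0xFFFFFFFFFFFFFFFF, and SRL by Size - 1 = 63 leaves 1
// in the low bit, exactly (%a <u %b). For the complemented predicates the
// final XOR with 1 flips that bit.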
+
+SDValue PPCTargetLowering::ConvertSETCCToSubtract(SDNode *N,
+ DAGCombinerInfo &DCI) const {
+
+ assert(N->getOpcode() == ISD::SETCC && "ISD::SETCC Expected.");
+
+ SelectionDAG &DAG = DCI.DAG;
+ SDLoc DL(N);
+
+ // The size of the integers being compared has a critical role in the
+ // following analysis, so we prefer to do this when all types are legal.
+ if (!DCI.isAfterLegalizeVectorOps())
+ return SDValue();
+
+ // If all users of SETCC extend its value to a legal integer type,
+ // then we replace SETCC with a subtraction.
+ for (SDNode::use_iterator UI = N->use_begin(),
+ UE = N->use_end(); UI != UE; ++UI) {
+ if (UI->getOpcode() != ISD::ZERO_EXTEND)
+ return SDValue();
+ }
+
+ ISD::CondCode CC = cast<CondCodeSDNode>(N->getOperand(2))->get();
+ auto OpSize = N->getOperand(0).getValueSizeInBits();
+
+ unsigned Size = DAG.getDataLayout().getLargestLegalIntTypeSizeInBits();
+
+ if (OpSize < Size) {
+ switch (CC) {
+ default: break;
+ case ISD::SETULT:
+ return generateEquivalentSub(N, Size, false, false, DL, DAG);
+ case ISD::SETULE:
+ return generateEquivalentSub(N, Size, true, true, DL, DAG);
+ case ISD::SETUGT:
+ return generateEquivalentSub(N, Size, false, true, DL, DAG);
+ case ISD::SETUGE:
+ return generateEquivalentSub(N, Size, true, false, DL, DAG);
+ }
+ }
+
+ return SDValue();
+}
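// The Swap/Complement choices above follow from unsigned identities
// (informal sketch):
//   a <u  b  ==   sign(a - b)   -> no swap, no complement
//   a <=u b  ==  !(b <u a)      -> swap and complement
//   a >u  b  ==   (b <u a)      -> swap only
//   a >=u b  ==  !(a <u b)      -> complement only
// where sign() is the bit extracted by the SRL in generateEquivalentSub.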
+
SDValue PPCTargetLowering::DAGCombineTruncBoolExt(SDNode *N,
DAGCombinerInfo &DCI) const {
SelectionDAG &DAG = DCI.DAG;
@@ -9874,7 +10150,8 @@ SDValue PPCTargetLowering::DAGCombineTruncBoolExt(SDNode *N,
APInt::getHighBitsSet(OpBits, OpBits-1)) ||
!DAG.MaskedValueIsZero(N->getOperand(1),
APInt::getHighBitsSet(OpBits, OpBits-1)))
- return SDValue();
+ return (N->getOpcode() == ISD::SETCC ? ConvertSETCCToSubtract(N, DCI)
+ : SDValue());
} else {
// This is neither a signed nor an unsigned comparison, just make sure
// that the high bits are equal.
@@ -10398,6 +10675,173 @@ SDValue PPCTargetLowering::DAGCombineExtBoolTrunc(SDNode *N,
ShiftCst);
}
+/// \brief Reduces the number of fp-to-int conversions when building a vector.
+///
+/// If this vector is built out of floating to integer conversions,
+/// transform it to a vector built out of floating point values followed by a
+/// single floating to integer conversion of the vector.
+/// Namely (build_vector (fptosi $A), (fptosi $B), ...)
+/// becomes (fptosi (build_vector ($A, $B, ...)))
+SDValue PPCTargetLowering::
+combineElementTruncationToVectorTruncation(SDNode *N,
+ DAGCombinerInfo &DCI) const {
+ assert(N->getOpcode() == ISD::BUILD_VECTOR &&
+ "Should be called with a BUILD_VECTOR node");
+
+ SelectionDAG &DAG = DCI.DAG;
+ SDLoc dl(N);
+
+ SDValue FirstInput = N->getOperand(0);
+ assert(FirstInput.getOpcode() == PPCISD::MFVSR &&
+ "The input operand must be an fp-to-int conversion.");
+
+ // This combine happens after legalization so the fp_to_[su]i nodes are
+ // already converted to PPCISD nodes.
+ unsigned FirstConversion = FirstInput.getOperand(0).getOpcode();
+ if (FirstConversion == PPCISD::FCTIDZ ||
+ FirstConversion == PPCISD::FCTIDUZ ||
+ FirstConversion == PPCISD::FCTIWZ ||
+ FirstConversion == PPCISD::FCTIWUZ) {
+ bool IsSplat = true;
+ bool Is32Bit = FirstConversion == PPCISD::FCTIWZ ||
+ FirstConversion == PPCISD::FCTIWUZ;
+ EVT SrcVT = FirstInput.getOperand(0).getValueType();
+ SmallVector<SDValue, 4> Ops;
+ EVT TargetVT = N->getValueType(0);
+ for (int i = 0, e = N->getNumOperands(); i < e; ++i) {
+ if (N->getOperand(i).getOpcode() != PPCISD::MFVSR)
+ return SDValue();
+ unsigned NextConversion = N->getOperand(i).getOperand(0).getOpcode();
+ if (NextConversion != FirstConversion)
+ return SDValue();
+ if (N->getOperand(i) != FirstInput)
+ IsSplat = false;
+ }
+
+ // If this is a splat, we leave it as-is since there will be only a single
+ // fp-to-int conversion followed by a splat of the integer. This is better
+ // for 32-bit and smaller ints and neutral for 64-bit ints.
+ if (IsSplat)
+ return SDValue();
+
+ // Now that we know we have the right type of node, get its operands
+ for (int i = 0, e = N->getNumOperands(); i < e; ++i) {
+ SDValue In = N->getOperand(i).getOperand(0);
+ // For 32-bit values, we need to add an FP_ROUND node.
+ if (Is32Bit) {
+ if (In.isUndef())
+ Ops.push_back(DAG.getUNDEF(SrcVT));
+ else {
+ SDValue Trunc = DAG.getNode(ISD::FP_ROUND, dl,
+ MVT::f32, In.getOperand(0),
+ DAG.getIntPtrConstant(1, dl));
+ Ops.push_back(Trunc);
+ }
+ } else
+ Ops.push_back(In.isUndef() ? DAG.getUNDEF(SrcVT) : In.getOperand(0));
+ }
+
+ unsigned Opcode;
+ if (FirstConversion == PPCISD::FCTIDZ ||
+ FirstConversion == PPCISD::FCTIWZ)
+ Opcode = ISD::FP_TO_SINT;
+ else
+ Opcode = ISD::FP_TO_UINT;
+
+ EVT NewVT = TargetVT == MVT::v2i64 ? MVT::v2f64 : MVT::v4f32;
+ SDValue BV = DAG.getBuildVector(NewVT, dl, Ops);
+ return DAG.getNode(Opcode, dl, TargetVT, BV);
+ }
+ return SDValue();
+}
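// Sketch of the rewrite (illustrative DAG, not patch code):
//   (v4i32 build_vector (mfvsr (fctiwz %a)), ..., (mfvsr (fctiwz %d)))
// becomes
//   (v4i32 fp_to_sint (v4f32 build_vector %a', ..., %d'))
// where each %x' is (fp_round %x) in the 32-bit case, so one vector
// conversion (e.g. xvcvspsxws) replaces four scalar ones.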
+
+/// \brief Reduce the number of loads when building a vector.
+///
+/// Building a vector out of multiple loads can be converted to a load
+/// of the vector type if the loads are consecutive. If the loads are
+/// consecutive but in descending order, a shuffle is added at the end
+/// to reorder the vector.
+static SDValue combineBVOfConsecutiveLoads(SDNode *N, SelectionDAG &DAG) {
+ assert(N->getOpcode() == ISD::BUILD_VECTOR &&
+ "Should be called with a BUILD_VECTOR node");
+
+ SDLoc dl(N);
+ bool InputsAreConsecutiveLoads = true;
+ bool InputsAreReverseConsecutive = true;
+ unsigned ElemSize = N->getValueType(0).getScalarSizeInBits() / 8;
+ SDValue FirstInput = N->getOperand(0);
+ bool IsRoundOfExtLoad = false;
+
+ if (FirstInput.getOpcode() == ISD::FP_ROUND &&
+ FirstInput.getOperand(0).getOpcode() == ISD::LOAD) {
+ LoadSDNode *LD = dyn_cast<LoadSDNode>(FirstInput.getOperand(0));
+ IsRoundOfExtLoad = LD->getExtensionType() == ISD::EXTLOAD;
+ }
+ // Not a build vector of (possibly fp_rounded) loads.
+ if (!IsRoundOfExtLoad && FirstInput.getOpcode() != ISD::LOAD)
+ return SDValue();
+
+ for (int i = 1, e = N->getNumOperands(); i < e; ++i) {
+ // If any inputs are fp_round(extload), they all must be.
+ if (IsRoundOfExtLoad && N->getOperand(i).getOpcode() != ISD::FP_ROUND)
+ return SDValue();
+
+ SDValue NextInput = IsRoundOfExtLoad ? N->getOperand(i).getOperand(0) :
+ N->getOperand(i);
+ if (NextInput.getOpcode() != ISD::LOAD)
+ return SDValue();
+
+ SDValue PreviousInput =
+ IsRoundOfExtLoad ? N->getOperand(i-1).getOperand(0) : N->getOperand(i-1);
+ LoadSDNode *LD1 = dyn_cast<LoadSDNode>(PreviousInput);
+ LoadSDNode *LD2 = dyn_cast<LoadSDNode>(NextInput);
+
+ // If any inputs are fp_round(extload), they all must be.
+ if (IsRoundOfExtLoad && LD2->getExtensionType() != ISD::EXTLOAD)
+ return SDValue();
+
+ if (!isConsecutiveLS(LD2, LD1, ElemSize, 1, DAG))
+ InputsAreConsecutiveLoads = false;
+ if (!isConsecutiveLS(LD1, LD2, ElemSize, 1, DAG))
+ InputsAreReverseConsecutive = false;
+
+ // Exit early if the loads are neither consecutive nor reverse consecutive.
+ if (!InputsAreConsecutiveLoads && !InputsAreReverseConsecutive)
+ return SDValue();
+ }
+
+ assert(!(InputsAreConsecutiveLoads && InputsAreReverseConsecutive) &&
+ "The loads cannot be both consecutive and reverse consecutive.");
+
+ SDValue FirstLoadOp =
+ IsRoundOfExtLoad ? FirstInput.getOperand(0) : FirstInput;
+ SDValue LastLoadOp =
+ IsRoundOfExtLoad ? N->getOperand(N->getNumOperands()-1).getOperand(0) :
+ N->getOperand(N->getNumOperands()-1);
+
+ LoadSDNode *LD1 = dyn_cast<LoadSDNode>(FirstLoadOp);
+ LoadSDNode *LDL = dyn_cast<LoadSDNode>(LastLoadOp);
+ if (InputsAreConsecutiveLoads) {
+ assert(LD1 && "Input needs to be a LoadSDNode.");
+ return DAG.getLoad(N->getValueType(0), dl, LD1->getChain(),
+ LD1->getBasePtr(), LD1->getPointerInfo(),
+ LD1->getAlignment());
+ }
+ if (InputsAreReverseConsecutive) {
+ assert(LDL && "Input needs to be a LoadSDNode.");
+ SDValue Load = DAG.getLoad(N->getValueType(0), dl, LDL->getChain(),
+ LDL->getBasePtr(), LDL->getPointerInfo(),
+ LDL->getAlignment());
+ SmallVector<int, 16> Ops;
+ for (int i = N->getNumOperands() - 1; i >= 0; i--)
+ Ops.push_back(i);
+
+ return DAG.getVectorShuffle(N->getValueType(0), dl, Load,
+ DAG.getUNDEF(N->getValueType(0)), Ops);
+ }
+ return SDValue();
+}
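// Sketch: four f32 loads from %p, %p+4, %p+8 and %p+12 used in that order
// collapse into a single (v4f32 load %p). If they appear in descending
// address order instead, the combine loads from the last operand's (lowest)
// address and appends a reversing shuffle <3,2,1,0> to restore the
// requested element order.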
+
SDValue PPCTargetLowering::DAGCombineBuildVector(SDNode *N,
DAGCombinerInfo &DCI) const {
assert(N->getOpcode() == ISD::BUILD_VECTOR &&
@@ -10405,21 +10849,41 @@ SDValue PPCTargetLowering::DAGCombineBuildVector(SDNode *N,
SelectionDAG &DAG = DCI.DAG;
SDLoc dl(N);
- if (N->getValueType(0) != MVT::v2f64 || !Subtarget.hasVSX())
+
+ if (!Subtarget.hasVSX())
+ return SDValue();
+
+ // The target independent DAG combiner will leave a build_vector of
+ // float-to-int conversions intact. We can generate MUCH better code for
+ // a float-to-int conversion of a vector of floats.
+ SDValue FirstInput = N->getOperand(0);
+ if (FirstInput.getOpcode() == PPCISD::MFVSR) {
+ SDValue Reduced = combineElementTruncationToVectorTruncation(N, DCI);
+ if (Reduced)
+ return Reduced;
+ }
+
+ // If we're building a vector out of consecutive loads, just load that
+ // vector type.
+ SDValue Reduced = combineBVOfConsecutiveLoads(N, DAG);
+ if (Reduced)
+ return Reduced;
+
+ if (N->getValueType(0) != MVT::v2f64)
return SDValue();
// Looking for:
// (build_vector ([su]int_to_fp (extractelt 0)), ([su]int_to_fp (extractelt 1)))
- if (N->getOperand(0).getOpcode() != ISD::SINT_TO_FP &&
- N->getOperand(0).getOpcode() != ISD::UINT_TO_FP)
+ if (FirstInput.getOpcode() != ISD::SINT_TO_FP &&
+ FirstInput.getOpcode() != ISD::UINT_TO_FP)
return SDValue();
if (N->getOperand(1).getOpcode() != ISD::SINT_TO_FP &&
N->getOperand(1).getOpcode() != ISD::UINT_TO_FP)
return SDValue();
- if (N->getOperand(0).getOpcode() != N->getOperand(1).getOpcode())
+ if (FirstInput.getOpcode() != N->getOperand(1).getOpcode())
return SDValue();
- SDValue Ext1 = N->getOperand(0).getOperand(0);
+ SDValue Ext1 = FirstInput.getOperand(0);
SDValue Ext2 = N->getOperand(1).getOperand(0);
if (Ext1.getOpcode() != ISD::EXTRACT_VECTOR_ELT ||
Ext2.getOpcode() != ISD::EXTRACT_VECTOR_ELT)
@@ -10464,6 +10928,34 @@ SDValue PPCTargetLowering::combineFPToIntToFP(SDNode *N,
SDLoc dl(N);
SDValue Op(N, 0);
+ SDValue FirstOperand(Op.getOperand(0));
+ bool SubWordLoad = FirstOperand.getOpcode() == ISD::LOAD &&
+ (FirstOperand.getValueType() == MVT::i8 ||
+ FirstOperand.getValueType() == MVT::i16);
+ if (Subtarget.hasP9Vector() && Subtarget.hasP9Altivec() && SubWordLoad) {
+ bool Signed = N->getOpcode() == ISD::SINT_TO_FP;
+ bool DstDouble = Op.getValueType() == MVT::f64;
+ unsigned ConvOp = Signed ?
+ (DstDouble ? PPCISD::FCFID : PPCISD::FCFIDS) :
+ (DstDouble ? PPCISD::FCFIDU : PPCISD::FCFIDUS);
+ SDValue WidthConst =
+ DAG.getIntPtrConstant(FirstOperand.getValueType() == MVT::i8 ? 1 : 2,
+ dl, false);
+ LoadSDNode *LDN = cast<LoadSDNode>(FirstOperand.getNode());
+ SDValue Ops[] = { LDN->getChain(), LDN->getBasePtr(), WidthConst };
+ SDValue Ld = DAG.getMemIntrinsicNode(PPCISD::LXSIZX, dl,
+ DAG.getVTList(MVT::f64, MVT::Other),
+ Ops, MVT::i8, LDN->getMemOperand());
+
+ // For signed conversion, we need to sign-extend the value in the VSR
+ if (Signed) {
+ SDValue ExtOps[] = { Ld, WidthConst };
+ SDValue Ext = DAG.getNode(PPCISD::VEXTS, dl, MVT::f64, ExtOps);
+ return DAG.getNode(ConvOp, dl, DstDouble ? MVT::f64 : MVT::f32, Ext);
+ } else
+ return DAG.getNode(ConvOp, dl, DstDouble ? MVT::f64 : MVT::f32, Ld);
+ }
+
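// Sketch of the new sub-word path: on a P9 target,
//   (f64 sint_to_fp (i8 load %p))
// becomes (fcfid (vexts (lxsizx %p, 1), 1)): a one-byte zero-extending load
// straight into a VSR, an in-register sign extension (VEXTS) because the
// value arrived zero-extended, and the usual integer-to-FP conversion, with
// no round trip through a GPR.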
// Don't handle ppc_fp128 here or i1 conversions.
if (Op.getValueType() != MVT::f32 && Op.getValueType() != MVT::f64)
return SDValue();
@@ -10676,10 +11168,14 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
case ISD::UINT_TO_FP:
return combineFPToIntToFP(N, DCI);
case ISD::STORE: {
+ EVT Op1VT = N->getOperand(1).getValueType();
+ bool ValidTypeForStoreFltAsInt = (Op1VT == MVT::i32) ||
+ (Subtarget.hasP9Vector() && (Op1VT == MVT::i8 || Op1VT == MVT::i16));
+
// Turn STORE (FP_TO_SINT F) -> STFIWX(FCTIWZ(F)).
if (Subtarget.hasSTFIWX() && !cast<StoreSDNode>(N)->isTruncatingStore() &&
N->getOperand(1).getOpcode() == ISD::FP_TO_SINT &&
- N->getOperand(1).getValueType() == MVT::i32 &&
+ ValidTypeForStoreFltAsInt &&
N->getOperand(1).getOperand(0).getValueType() != MVT::ppcf128) {
SDValue Val = N->getOperand(1).getOperand(0);
if (Val.getValueType() == MVT::f32) {
@@ -10689,15 +11185,31 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
Val = DAG.getNode(PPCISD::FCTIWZ, dl, MVT::f64, Val);
DCI.AddToWorklist(Val.getNode());
- SDValue Ops[] = {
- N->getOperand(0), Val, N->getOperand(2),
- DAG.getValueType(N->getOperand(1).getValueType())
- };
+ if (Op1VT == MVT::i32) {
+ SDValue Ops[] = {
+ N->getOperand(0), Val, N->getOperand(2),
+ DAG.getValueType(N->getOperand(1).getValueType())
+ };
+
+ Val = DAG.getMemIntrinsicNode(PPCISD::STFIWX, dl,
+ DAG.getVTList(MVT::Other), Ops,
+ cast<StoreSDNode>(N)->getMemoryVT(),
+ cast<StoreSDNode>(N)->getMemOperand());
+ } else {
+ unsigned WidthInBytes =
+ N->getOperand(1).getValueType() == MVT::i8 ? 1 : 2;
+ SDValue WidthConst = DAG.getIntPtrConstant(WidthInBytes, dl, false);
+
+ SDValue Ops[] = {
+ N->getOperand(0), Val, N->getOperand(2), WidthConst,
+ DAG.getValueType(N->getOperand(1).getValueType())
+ };
+ Val = DAG.getMemIntrinsicNode(PPCISD::STXSIX, dl,
+ DAG.getVTList(MVT::Other), Ops,
+ cast<StoreSDNode>(N)->getMemoryVT(),
+ cast<StoreSDNode>(N)->getMemOperand());
+ }
- Val = DAG.getMemIntrinsicNode(PPCISD::STFIWX, dl,
- DAG.getVTList(MVT::Other), Ops,
- cast<StoreSDNode>(N)->getMemoryVT(),
- cast<StoreSDNode>(N)->getMemOperand());
DCI.AddToWorklist(Val.getNode());
return Val;
}
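// Sketch of the widened combine: with P9 vector support,
// (store (fp_to_sint f64 %v) -> i8/i16) keeps the converted integer in a
// VSR and emits STXSIX with a byte-width operand of 1 or 2, instead of
// moving through a GPR; i32 stores keep using the existing STFIWX path.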
@@ -10726,10 +11238,11 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
}
// For little endian, VSX stores require generating xxswapd/lxvd2x.
+ // Not needed on ISA 3.0 based CPUs since we have a non-permuting store.
EVT VT = N->getOperand(1).getValueType();
if (VT.isSimple()) {
MVT StoreVT = VT.getSimpleVT();
- if (Subtarget.hasVSX() && Subtarget.isLittleEndian() &&
+ if (Subtarget.needsSwapsForVSXMemOps() &&
(StoreVT == MVT::v2f64 || StoreVT == MVT::v2i64 ||
StoreVT == MVT::v4f32 || StoreVT == MVT::v4i32))
return expandVSXStoreForLE(N, DCI);
@@ -10741,9 +11254,10 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
EVT VT = LD->getValueType(0);
// For little endian, VSX loads require generating lxvd2x/xxswapd.
+ // Not needed on ISA 3.0 based CPUs since we have a non-permuting load.
if (VT.isSimple()) {
MVT LoadVT = VT.getSimpleVT();
- if (Subtarget.hasVSX() && Subtarget.isLittleEndian() &&
+ if (Subtarget.needsSwapsForVSXMemOps() &&
(LoadVT == MVT::v2f64 || LoadVT == MVT::v2i64 ||
LoadVT == MVT::v4f32 || LoadVT == MVT::v4i32))
return expandVSXLoadForLE(N, DCI);
@@ -11014,11 +11528,9 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
int Bits = IID == Intrinsic::ppc_qpx_qvlpcld ?
5 /* 32 byte alignment */ : 4 /* 16 byte alignment */;
- if (DAG.MaskedValueIsZero(
- Add->getOperand(1),
- APInt::getAllOnesValue(Bits /* alignment */)
- .zext(
- Add.getValueType().getScalarType().getSizeInBits()))) {
+ if (DAG.MaskedValueIsZero(Add->getOperand(1),
+ APInt::getAllOnesValue(Bits /* alignment */)
+ .zext(Add.getScalarValueSizeInBits()))) {
SDNode *BasePtr = Add->getOperand(0).getNode();
for (SDNode::use_iterator UI = BasePtr->use_begin(),
UE = BasePtr->use_end();
@@ -11060,7 +11572,8 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
break;
case ISD::INTRINSIC_W_CHAIN: {
// For little endian, VSX loads require generating lxvd2x/xxswapd.
- if (Subtarget.hasVSX() && Subtarget.isLittleEndian()) {
+ // Not needed on ISA 3.0 based CPUs since we have a non-permuting load.
+ if (Subtarget.needsSwapsForVSXMemOps()) {
switch (cast<ConstantSDNode>(N->getOperand(1))->getZExtValue()) {
default:
break;
@@ -11073,7 +11586,8 @@ SDValue PPCTargetLowering::PerformDAGCombine(SDNode *N,
}
case ISD::INTRINSIC_VOID: {
// For little endian, VSX stores require generating xxswapd/stxvd2x.
- if (Subtarget.hasVSX() && Subtarget.isLittleEndian()) {
+ // Not needed on ISA 3.0 based CPUs since we have a non-permuting store.
+ if (Subtarget.needsSwapsForVSXMemOps()) {
switch (cast<ConstantSDNode>(N->getOperand(1))->getZExtValue()) {
default:
break;
@@ -11392,7 +11906,7 @@ unsigned PPCTargetLowering::getPrefLoopAlignment(MachineLoop *ML) const {
uint64_t LoopSize = 0;
for (auto I = ML->block_begin(), IE = ML->block_end(); I != IE; ++I)
for (auto J = (*I)->begin(), JE = (*I)->end(); J != JE; ++J) {
- LoopSize += TII->GetInstSizeInBytes(*J);
+ LoopSize += TII->getInstSizeInBytes(*J);
if (LoopSize > 32)
break;
}
@@ -11688,8 +12202,8 @@ bool PPCTargetLowering::isLegalAddressingMode(const DataLayout &DL,
SDValue PPCTargetLowering::LowerRETURNADDR(SDValue Op,
SelectionDAG &DAG) const {
MachineFunction &MF = DAG.getMachineFunction();
- MachineFrameInfo *MFI = MF.getFrameInfo();
- MFI->setReturnAddressIsTaken(true);
+ MachineFrameInfo &MFI = MF.getFrameInfo();
+ MFI.setReturnAddressIsTaken(true);
if (verifyReturnAddressArgumentIsConstant(Op, DAG))
return SDValue();
@@ -11726,8 +12240,8 @@ SDValue PPCTargetLowering::LowerFRAMEADDR(SDValue Op,
unsigned Depth = cast<ConstantSDNode>(Op.getOperand(0))->getZExtValue();
MachineFunction &MF = DAG.getMachineFunction();
- MachineFrameInfo *MFI = MF.getFrameInfo();
- MFI->setFrameAddressIsTaken(true);
+ MachineFrameInfo &MFI = MF.getFrameInfo();
+ MFI.setFrameAddressIsTaken(true);
EVT PtrVT = getPointerTy(MF.getDataLayout());
bool isPPC64 = PtrVT == MVT::i64;
@@ -12237,3 +12751,20 @@ void PPCTargetLowering::insertSSPDeclarations(Module &M) const {
if (!Subtarget.isTargetLinux())
return TargetLowering::insertSSPDeclarations(M);
}
+
+bool PPCTargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT) const {
+
+ if (!VT.isSimple() || !Subtarget.hasVSX())
+ return false;
+
+ switch(VT.getSimpleVT().SimpleTy) {
+ default:
+ // For FP types that are not currently supported by the PPC backend, return
+ // false. Examples: f16, f80.
+ return false;
+ case MVT::f32:
+ case MVT::f64:
+ case MVT::ppcf128:
+ return Imm.isPosZero();
+ }
+}
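// Usage sketch (an assumption about the hook's contract, not patch code):
// this tells the legalizer which FP immediates are cheap enough to keep out
// of the constant pool. With VSX, +0.0 can be materialized by zeroing a
// register (an xxlxor-style idiom, cf. the XX3Form_Zero/XX3Form_SetZero
// classes added below), so isFPImmLegal(+0.0, MVT::f64) is true while -0.0
// and all other constants still load from memory.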
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCISelLowering.h b/contrib/llvm/lib/Target/PowerPC/PPCISelLowering.h
index cc7222b..05acd25 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCISelLowering.h
+++ b/contrib/llvm/lib/Target/PowerPC/PPCISelLowering.h
@@ -47,9 +47,13 @@ namespace llvm {
FCTIDZ, FCTIWZ,
/// Newer FCTI[D,W]UZ floating-point-to-integer conversion instructions for
- /// unsigned integers.
+ /// unsigned integers with round toward zero.
FCTIDUZ, FCTIWUZ,
+ /// VEXTS, ByteWidth - takes an input in VSFRC and produces an output in
+ /// VSFRC that is sign-extended from ByteWidth to a 64-bit integer.
+ VEXTS,
+
/// Reciprocal estimate instructions (unary FP ops).
FRE, FRSQRTE,
@@ -365,6 +369,16 @@ namespace llvm {
/// destination 64-bit register.
LFIWZX,
+ /// GPRC, CHAIN = LXSIZX, CHAIN, Ptr, ByteWidth - This is a load of an
+ /// integer smaller than 64 bits into a VSR. The integer is zero-extended.
+ /// This can be used for converting loaded integers to floating point.
+ LXSIZX,
+
+ /// STXSIX - The STXSI[bh]X instruction. The first operand is an input
+ /// chain, then an f64 value to store, then an address to store it to,
+ /// followed by a byte-width for the store.
+ STXSIX,
+
/// VSRC, CHAIN = LXVD2X_LE CHAIN, Ptr - Occurs only for little endian.
/// Maps directly to an lxvd2x instruction that will be followed by
/// an xxswapd.
@@ -474,7 +488,7 @@ namespace llvm {
/// then the VPERM for the shuffle. All in all a very slow sequence.
TargetLoweringBase::LegalizeTypeAction getPreferredVectorAction(EVT VT)
const override {
- if (VT.getVectorElementType().getSizeInBits() % 8 == 0)
+ if (VT.getScalarSizeInBits() % 8 == 0)
return TypeWidenVector;
return TargetLoweringBase::getPreferredVectorAction(VT);
}
@@ -492,6 +506,14 @@ namespace llvm {
return true;
}
+ bool isCtlzFast() const override {
+ return true;
+ }
+
+ bool hasAndNotCompare(SDValue) const override {
+ return true;
+ }
+
bool supportSplitCSR(MachineFunction *MF) const override {
return
MF->getFunction()->getCallingConv() == CallingConv::CXX_FAST_TLS &&
@@ -747,18 +769,40 @@ namespace llvm {
bool useLoadStackGuardNode() const override;
void insertSSPDeclarations(Module &M) const override;
+ bool isFPImmLegal(const APFloat &Imm, EVT VT) const override;
+
+ unsigned getJumpTableEncoding() const override;
+ bool isJumpTableRelative() const override;
+ SDValue getPICJumpTableRelocBase(SDValue Table,
+ SelectionDAG &DAG) const override;
+ const MCExpr *getPICJumpTableRelocBaseExpr(const MachineFunction *MF,
+ unsigned JTI,
+ MCContext &Ctx) const override;
+
private:
struct ReuseLoadInfo {
SDValue Ptr;
SDValue Chain;
SDValue ResChain;
MachinePointerInfo MPI;
+ bool IsDereferenceable;
bool IsInvariant;
unsigned Alignment;
AAMDNodes AAInfo;
const MDNode *Ranges;
- ReuseLoadInfo() : IsInvariant(false), Alignment(0), Ranges(nullptr) {}
+ ReuseLoadInfo()
+ : IsDereferenceable(false), IsInvariant(false), Alignment(0),
+ Ranges(nullptr) {}
+
+ MachineMemOperand::Flags MMOFlags() const {
+ MachineMemOperand::Flags F = MachineMemOperand::MONone;
+ if (IsDereferenceable)
+ F |= MachineMemOperand::MODereferenceable;
+ if (IsInvariant)
+ F |= MachineMemOperand::MOInvariant;
+ return F;
+ }
};
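// Consumption sketch (an assumption from the struct's role, not patch
// code): when a load address is reused, the rebuilt load passes
// RLI.MMOFlags() to DAG.getLoad() so the dereferenceable/invariant facts
// recorded from the original MachineMemOperand survive the rewrite.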
bool canReuseLoadAddress(SDValue Op, EVT MemVT, ReuseLoadInfo &RLI,
@@ -771,6 +815,8 @@ namespace llvm {
SelectionDAG &DAG, const SDLoc &dl) const;
SDValue LowerFP_TO_INTDirectMove(SDValue Op, SelectionDAG &DAG,
const SDLoc &dl) const;
+
+ bool directMoveIsProfitable(const SDValue &Op) const;
SDValue LowerINT_TO_FPDirectMove(SDValue Op, SelectionDAG &DAG,
const SDLoc &dl) const;
@@ -933,14 +979,23 @@ namespace llvm {
SDValue DAGCombineTruncBoolExt(SDNode *N, DAGCombinerInfo &DCI) const;
SDValue combineFPToIntToFP(SDNode *N, DAGCombinerInfo &DCI) const;
- SDValue getRsqrtEstimate(SDValue Operand, DAGCombinerInfo &DCI,
- unsigned &RefinementSteps,
- bool &UseOneConstNR) const override;
- SDValue getRecipEstimate(SDValue Operand, DAGCombinerInfo &DCI,
- unsigned &RefinementSteps) const override;
+ /// ConvertSETCCToSubtract - looks at SETCC that compares ints. It replaces
+ /// SETCC with integer subtraction when (1) there is a legal way of doing it
+ /// and (2) keeping the result of the comparison in a GPR has a performance
+ /// benefit.
+ SDValue ConvertSETCCToSubtract(SDNode *N, DAGCombinerInfo &DCI) const;
+
+ SDValue getSqrtEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
+ int &RefinementSteps, bool &UseOneConstNR,
+ bool Reciprocal) const override;
+ SDValue getRecipEstimate(SDValue Operand, SelectionDAG &DAG, int Enabled,
+ int &RefinementSteps) const override;
unsigned combineRepeatedFPDivisors() const override;
CCAssignFn *useFastISelCCs(unsigned Flag) const;
+
+ SDValue
+ combineElementTruncationToVectorTruncation(SDNode *N,
+ DAGCombinerInfo &DCI) const;
};
namespace PPC {
@@ -959,6 +1014,13 @@ namespace llvm {
ISD::ArgFlagsTy &ArgFlags,
CCState &State);
+ bool
+ CC_PPC32_SVR4_Custom_SkipLastArgRegsPPCF128(unsigned &ValNo, MVT &ValVT,
+ MVT &LocVT,
+ CCValAssign::LocInfo &LocInfo,
+ ISD::ArgFlagsTy &ArgFlags,
+ CCState &State);
+
bool CC_PPC32_SVR4_Custom_AlignFPArgRegs(unsigned &ValNo, MVT &ValVT,
MVT &LocVT,
CCValAssign::LocInfo &LocInfo,
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCInstr64Bit.td b/contrib/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
index 5e514c8..fbec878 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPCInstr64Bit.td
@@ -65,16 +65,6 @@ def SRL64 : SDNodeXForm<imm, [{
: getI32Imm(0, SDLoc(N));
}]>;
-def HI32_48 : SDNodeXForm<imm, [{
- // Transformation function: shift the immediate value down into the low bits.
- return getI32Imm((unsigned short)(N->getZExtValue() >> 32, SDLoc(N)));
-}]>;
-
-def HI48_64 : SDNodeXForm<imm, [{
- // Transformation function: shift the immediate value down into the low bits.
- return getI32Imm((unsigned short)(N->getZExtValue() >> 48, SDLoc(N)));
-}]>;
-
//===----------------------------------------------------------------------===//
// Calls.
@@ -1164,6 +1154,9 @@ defm FCFID : XForm_26r<63, 846, (outs f8rc:$frD), (ins f8rc:$frB),
defm FCTID : XForm_26r<63, 814, (outs f8rc:$frD), (ins f8rc:$frB),
"fctid", "$frD, $frB", IIC_FPGeneral,
[]>, isPPC64;
+defm FCTIDU : XForm_26r<63, 942, (outs f8rc:$frD), (ins f8rc:$frB),
+ "fctidu", "$frD, $frB", IIC_FPGeneral,
+ []>, isPPC64;
defm FCTIDZ : XForm_26r<63, 815, (outs f8rc:$frD), (ins f8rc:$frB),
"fctidz", "$frD, $frB", IIC_FPGeneral,
[(set f64:$frD, (PPCfctidz f64:$frB))]>, isPPC64;
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCInstrAltivec.td b/contrib/llvm/lib/Target/PowerPC/PPCInstrAltivec.td
index e1c4673..5c02274 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCInstrAltivec.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPCInstrAltivec.td
@@ -26,6 +26,7 @@
// ** in PPCVSXSwapRemoval::gatherVectorInstructions(). **
// ****************************************************************************
+
//===----------------------------------------------------------------------===//
// Altivec transformation functions and pattern fragments.
//
@@ -242,7 +243,7 @@ def VSPLTISB_get_imm : SDNodeXForm<build_vector, [{
return PPC::get_VSPLTI_elt(N, 1, *CurDAG);
}]>;
def vecspltisb : PatLeaf<(build_vector), [{
- return PPC::get_VSPLTI_elt(N, 1, *CurDAG).getNode() != 0;
+ return PPC::get_VSPLTI_elt(N, 1, *CurDAG).getNode() != nullptr;
}], VSPLTISB_get_imm>;
// VSPLTISH_get_imm xform function: convert build_vector to VSPLTISH imm.
@@ -250,7 +251,7 @@ def VSPLTISH_get_imm : SDNodeXForm<build_vector, [{
return PPC::get_VSPLTI_elt(N, 2, *CurDAG);
}]>;
def vecspltish : PatLeaf<(build_vector), [{
- return PPC::get_VSPLTI_elt(N, 2, *CurDAG).getNode() != 0;
+ return PPC::get_VSPLTI_elt(N, 2, *CurDAG).getNode() != nullptr;
}], VSPLTISH_get_imm>;
// VSPLTISW_get_imm xform function: convert build_vector to VSPLTISW imm.
@@ -258,7 +259,7 @@ def VSPLTISW_get_imm : SDNodeXForm<build_vector, [{
return PPC::get_VSPLTI_elt(N, 4, *CurDAG);
}]>;
def vecspltisw : PatLeaf<(build_vector), [{
- return PPC::get_VSPLTI_elt(N, 4, *CurDAG).getNode() != 0;
+ return PPC::get_VSPLTI_elt(N, 4, *CurDAG).getNode() != nullptr;
}], VSPLTISW_get_imm>;
//===----------------------------------------------------------------------===//
@@ -706,6 +707,12 @@ def VSPLTW : VXForm_1<652, (outs vrrc:$vD), (ins u5imm:$UIMM, vrrc:$vB),
"vspltw $vD, $vB, $UIMM", IIC_VecPerm,
[(set v16i8:$vD,
(vspltw_shuffle:$UIMM v16i8:$vB, (undef)))]>;
+let isCodeGenOnly = 1 in {
+ def VSPLTBs : VXForm_1<524, (outs vrrc:$vD), (ins u5imm:$UIMM, vfrc:$vB),
+ "vspltb $vD, $vB, $UIMM", IIC_VecPerm, []>;
+ def VSPLTHs : VXForm_1<588, (outs vrrc:$vD), (ins u5imm:$UIMM, vfrc:$vB),
+ "vsplth $vD, $vB, $UIMM", IIC_VecPerm, []>;
+}
def VSR : VX1_Int_Ty< 708, "vsr" , int_ppc_altivec_vsr, v4i32>;
def VSRO : VX1_Int_Ty<1100, "vsro" , int_ppc_altivec_vsro, v4i32>;
@@ -1218,34 +1225,23 @@ def VSBOX : VXBX_Int_Ty<1480, "vsbox", int_ppc_altivec_crypto_vsbox, v2i64>;
def HasP9Altivec : Predicate<"PPCSubTarget->hasP9Altivec()">;
let Predicates = [HasP9Altivec] in {
-// Vector Compare Not Equal (Zero)
-class P9VCMP<bits<10> xo, string asmstr, ValueType Ty>
- : VXRForm_1<xo, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB), asmstr,
- IIC_VecFPCompare, []>;
-class P9VCMPo<bits<10> xo, string asmstr, ValueType Ty>
- : VXRForm_1<xo, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB), asmstr,
- IIC_VecFPCompare, []> {
- let Defs = [CR6];
- let RC = 1;
-}
-
// i8 element comparisons.
-def VCMPNEB : P9VCMP < 7, "vcmpneb $vD, $vA, $vB" , v16i8>;
-def VCMPNEBo : P9VCMPo< 7, "vcmpneb. $vD, $vA, $vB" , v16i8>;
-def VCMPNEZB : P9VCMP <263, "vcmpnezb $vD, $vA, $vB" , v16i8>;
-def VCMPNEZBo : P9VCMPo<263, "vcmpnezb. $vD, $vA, $vB", v16i8>;
+def VCMPNEB : VCMP < 7, "vcmpneb $vD, $vA, $vB" , v16i8>;
+def VCMPNEBo : VCMPo < 7, "vcmpneb. $vD, $vA, $vB" , v16i8>;
+def VCMPNEZB : VCMP <263, "vcmpnezb $vD, $vA, $vB" , v16i8>;
+def VCMPNEZBo : VCMPo<263, "vcmpnezb. $vD, $vA, $vB", v16i8>;
// i16 element comparisons.
-def VCMPNEH : P9VCMP < 71, "vcmpneh $vD, $vA, $vB" , v8i16>;
-def VCMPNEHo : P9VCMPo< 71, "vcmpneh. $vD, $vA, $vB" , v8i16>;
-def VCMPNEZH : P9VCMP <327, "vcmpnezh $vD, $vA, $vB" , v8i16>;
-def VCMPNEZHo : P9VCMPo<327, "vcmpnezh. $vD, $vA, $vB", v8i16>;
+def VCMPNEH : VCMP < 71, "vcmpneh $vD, $vA, $vB" , v8i16>;
+def VCMPNEHo : VCMPo< 71, "vcmpneh. $vD, $vA, $vB" , v8i16>;
+def VCMPNEZH : VCMP <327, "vcmpnezh $vD, $vA, $vB" , v8i16>;
+def VCMPNEZHo : VCMPo<327, "vcmpnezh. $vD, $vA, $vB", v8i16>;
// i32 element comparisons.
-def VCMPNEW : P9VCMP <135, "vcmpnew $vD, $vA, $vB" , v4i32>;
-def VCMPNEWo : P9VCMPo<135, "vcmpnew. $vD, $vA, $vB" , v4i32>;
-def VCMPNEZW : P9VCMP <391, "vcmpnezw $vD, $vA, $vB" , v4i32>;
-def VCMPNEZWo : P9VCMPo<391, "vcmpnezw. $vD, $vA, $vB", v4i32>;
+def VCMPNEW : VCMP <135, "vcmpnew $vD, $vA, $vB" , v4i32>;
+def VCMPNEWo : VCMPo<135, "vcmpnew. $vD, $vA, $vB" , v4i32>;
+def VCMPNEZW : VCMP <391, "vcmpnezw $vD, $vA, $vB" , v4i32>;
+def VCMPNEZWo : VCMPo<391, "vcmpnezw. $vD, $vA, $vB", v4i32>;
// VX-Form: [PO VRT / UIM VRB XO].
// We use VXForm_1 to implement it, that is, we use "VRA" (5 bit) to represent
@@ -1281,17 +1277,28 @@ def VINSERTD : VX1_VT5_UIM5_VB5<973, "vinsertd", []>;
class VX_VT5_EO5_VB5<bits<11> xo, bits<5> eo, string opc, list<dag> pattern>
: VXForm_RD5_XO5_RS5<xo, eo, (outs vrrc:$vD), (ins vrrc:$vB),
!strconcat(opc, " $vD, $vB"), IIC_VecGeneral, pattern>;
+class VX_VT5_EO5_VB5s<bits<11> xo, bits<5> eo, string opc, list<dag> pattern>
+ : VXForm_RD5_XO5_RS5<xo, eo, (outs vfrc:$vD), (ins vfrc:$vB),
+ !strconcat(opc, " $vD, $vB"), IIC_VecGeneral, pattern>;
// Vector Count Leading/Trailing Zero LSB. Result is placed into GPR[rD]
-def VCLZLSBB : VXForm_RD5_XO5_RS5<1538, 0, (outs g8rc:$rD), (ins vrrc:$vB),
- "vclzlsbb $rD, $vB", IIC_VecGeneral, []>;
-def VCTZLSBB : VXForm_RD5_XO5_RS5<1538, 1, (outs g8rc:$rD), (ins vrrc:$vB),
- "vctzlsbb $rD, $vB", IIC_VecGeneral, []>;
+def VCLZLSBB : VXForm_RD5_XO5_RS5<1538, 0, (outs gprc:$rD), (ins vrrc:$vB),
+ "vclzlsbb $rD, $vB", IIC_VecGeneral,
+ [(set i32:$rD, (int_ppc_altivec_vclzlsbb
+ v16i8:$vB))]>;
+def VCTZLSBB : VXForm_RD5_XO5_RS5<1538, 1, (outs gprc:$rD), (ins vrrc:$vB),
+ "vctzlsbb $rD, $vB", IIC_VecGeneral,
+ [(set i32:$rD, (int_ppc_altivec_vctzlsbb
+ v16i8:$vB))]>;
// Vector Count Trailing Zeros
-def VCTZB : VX_VT5_EO5_VB5<1538, 28, "vctzb", []>;
-def VCTZH : VX_VT5_EO5_VB5<1538, 29, "vctzh", []>;
-def VCTZW : VX_VT5_EO5_VB5<1538, 30, "vctzw", []>;
-def VCTZD : VX_VT5_EO5_VB5<1538, 31, "vctzd", []>;
+def VCTZB : VX_VT5_EO5_VB5<1538, 28, "vctzb",
+ [(set v16i8:$vD, (cttz v16i8:$vB))]>;
+def VCTZH : VX_VT5_EO5_VB5<1538, 29, "vctzh",
+ [(set v8i16:$vD, (cttz v8i16:$vB))]>;
+def VCTZW : VX_VT5_EO5_VB5<1538, 30, "vctzw",
+ [(set v4i32:$vD, (cttz v4i32:$vB))]>;
+def VCTZD : VX_VT5_EO5_VB5<1538, 31, "vctzd",
+ [(set v2i64:$vD, (cttz v2i64:$vB))]>;
// Vector Extend Sign
def VEXTSB2W : VX_VT5_EO5_VB5<1538, 16, "vextsb2w", []>;
@@ -1299,15 +1306,31 @@ def VEXTSH2W : VX_VT5_EO5_VB5<1538, 17, "vextsh2w", []>;
def VEXTSB2D : VX_VT5_EO5_VB5<1538, 24, "vextsb2d", []>;
def VEXTSH2D : VX_VT5_EO5_VB5<1538, 25, "vextsh2d", []>;
def VEXTSW2D : VX_VT5_EO5_VB5<1538, 26, "vextsw2d", []>;
+let isCodeGenOnly = 1 in {
+ def VEXTSB2Ws : VX_VT5_EO5_VB5s<1538, 16, "vextsb2w", []>;
+ def VEXTSH2Ws : VX_VT5_EO5_VB5s<1538, 17, "vextsh2w", []>;
+ def VEXTSB2Ds : VX_VT5_EO5_VB5s<1538, 24, "vextsb2d", []>;
+ def VEXTSH2Ds : VX_VT5_EO5_VB5s<1538, 25, "vextsh2d", []>;
+ def VEXTSW2Ds : VX_VT5_EO5_VB5s<1538, 26, "vextsw2d", []>;
+}
// Vector Integer Negate
-def VNEGW : VX_VT5_EO5_VB5<1538, 6, "vnegw", []>;
-def VNEGD : VX_VT5_EO5_VB5<1538, 7, "vnegd", []>;
+def VNEGW : VX_VT5_EO5_VB5<1538, 6, "vnegw",
+ [(set v4i32:$vD,
+ (sub (v4i32 immAllZerosV), v4i32:$vB))]>;
+
+def VNEGD : VX_VT5_EO5_VB5<1538, 7, "vnegd",
+ [(set v2i64:$vD,
+ (sub (v2i64 (bitconvert (v4i32 immAllZerosV))),
+ v2i64:$vB))]>;
// Vector Parity Byte
-def VPRTYBW : VX_VT5_EO5_VB5<1538, 8, "vprtybw", []>;
-def VPRTYBD : VX_VT5_EO5_VB5<1538, 9, "vprtybd", []>;
-def VPRTYBQ : VX_VT5_EO5_VB5<1538, 10, "vprtybq", []>;
+def VPRTYBW : VX_VT5_EO5_VB5<1538, 8, "vprtybw", [(set v4i32:$vD,
+ (int_ppc_altivec_vprtybw v4i32:$vB))]>;
+def VPRTYBD : VX_VT5_EO5_VB5<1538, 9, "vprtybd", [(set v2i64:$vD,
+ (int_ppc_altivec_vprtybd v2i64:$vB))]>;
+def VPRTYBQ : VX_VT5_EO5_VB5<1538, 10, "vprtybq", [(set v1i128:$vD,
+ (int_ppc_altivec_vprtybq v1i128:$vB))]>;
// Vector (Bit) Permute (Right-indexed)
def VBPERMD : VXForm_1<1484, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB),
@@ -1320,14 +1343,32 @@ class VX1_VT5_VA5_VB5<bits<11> xo, string opc, list<dag> pattern>
!strconcat(opc, " $vD, $vA, $vB"), IIC_VecFP, pattern>;
// Vector Rotate Left Mask/Mask-Insert
-def VRLWNM : VX1_VT5_VA5_VB5<389, "vrlwnm", []>;
-def VRLWMI : VX1_VT5_VA5_VB5<133, "vrlwmi", []>;
-def VRLDNM : VX1_VT5_VA5_VB5<453, "vrldnm", []>;
-def VRLDMI : VX1_VT5_VA5_VB5<197, "vrldmi", []>;
+def VRLWNM : VX1_VT5_VA5_VB5<389, "vrlwnm",
+ [(set v4i32:$vD,
+ (int_ppc_altivec_vrlwnm v4i32:$vA,
+ v4i32:$vB))]>;
+def VRLWMI : VXForm_1<133, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB, vrrc:$vDi),
+ "vrlwmi $vD, $vA, $vB", IIC_VecFP,
+ [(set v4i32:$vD,
+ (int_ppc_altivec_vrlwmi v4i32:$vA, v4i32:$vB,
+ v4i32:$vDi))]>,
+ RegConstraint<"$vDi = $vD">, NoEncode<"$vDi">;
+def VRLDNM : VX1_VT5_VA5_VB5<453, "vrldnm",
+ [(set v2i64:$vD,
+ (int_ppc_altivec_vrldnm v2i64:$vA,
+ v2i64:$vB))]>;
+def VRLDMI : VXForm_1<197, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB, vrrc:$vDi),
+ "vrldmi $vD, $vA, $vB", IIC_VecFP,
+ [(set v2i64:$vD,
+ (int_ppc_altivec_vrldmi v2i64:$vA, v2i64:$vB,
+ v2i64:$vDi))]>,
+ RegConstraint<"$vDi = $vD">, NoEncode<"$vDi">;
// Vector Shift Left/Right
-def VSLV : VX1_VT5_VA5_VB5<1860, "vslv", []>;
-def VSRV : VX1_VT5_VA5_VB5<1796, "vsrv", []>;
+def VSLV : VX1_VT5_VA5_VB5<1860, "vslv",
+ [(set v16i8 : $vD, (int_ppc_altivec_vslv v16i8 : $vA, v16i8 : $vB))]>;
+def VSRV : VX1_VT5_VA5_VB5<1796, "vsrv",
+ [(set v16i8 : $vD, (int_ppc_altivec_vsrv v16i8 : $vA, v16i8 : $vB))]>;
// Vector Multiply-by-10 (& Write Carry) Unsigned Quadword
def VMUL10UQ : VXForm_BX<513, (outs vrrc:$vD), (ins vrrc:$vA),
@@ -1396,4 +1437,15 @@ def BCDSRo : VX_VT5_VA5_VB5_PS1_XO9_o<449, "bcdsr.", []>;
// Decimal (Unsigned) Truncate
def BCDTRUNCo : VX_VT5_VA5_VB5_PS1_XO9_o<257, "bcdtrunc." , []>;
def BCDUTRUNCo : VX_VT5_VA5_VB5_XO9_o <321, "bcdutrunc.", []>;
+
+// Absolute Difference
+def VABSDUB : VXForm_1<1027, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB),
+ "vabsdub $vD, $vA, $vB", IIC_VecGeneral,
+ [(set v16i8:$vD, (int_ppc_altivec_vabsdub v16i8:$vA, v16i8:$vB))]>;
+def VABSDUH : VXForm_1<1091, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB),
+ "vabsduh $vD, $vA, $vB", IIC_VecGeneral,
+ [(set v8i16:$vD, (int_ppc_altivec_vabsduh v8i16:$vA, v8i16:$vB))]>;
+def VABSDUW : VXForm_1<1155, (outs vrrc:$vD), (ins vrrc:$vA, vrrc:$vB),
+ "vabsduw $vD, $vA, $vB", IIC_VecGeneral,
+ [(set v4i32:$vD, (int_ppc_altivec_vabsduw v4i32:$vA, v4i32:$vB))]>;
} // end HasP9Altivec
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCInstrFormats.td b/contrib/llvm/lib/Target/PowerPC/PPCInstrFormats.td
index 5acff75..ef7d201 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCInstrFormats.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPCInstrFormats.td
@@ -38,6 +38,14 @@ class I<bits<6> opcode, dag OOL, dag IOL, string asmstr, InstrItinClass itin>
let TSFlags{2} = PPC970_Cracked;
let TSFlags{5-3} = PPC970_Unit;
+ /// Indicate that the VSX instruction is to use VSX numbering/encoding.
+ /// Since ISA 3.0, there are scalar instructions that use the upper
+ /// half of the VSX register set only. Rather than adding further complexity
+ /// to the register class set, the VSX registers just include the Altivec
+ /// registers and this flag decides the numbering to be used for them.
+ bits<1> UseVSXReg = 0;
+ let TSFlags{6} = UseVSXReg;
+
// Fields used for relation models.
string BaseName = "";
@@ -62,6 +70,8 @@ class PPC970_Unit_VALU { bits<3> PPC970_Unit = 5; }
class PPC970_Unit_VPERM { bits<3> PPC970_Unit = 6; }
class PPC970_Unit_BRU { bits<3> PPC970_Unit = 7; }
+class UseVSXReg { bits<1> UseVSXReg = 1; }
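// A hypothetical use of the mixin (illustrative, not from this patch):
//   def XSFOO : XX2Form<60, 0, (outs vssrc:$XT), (ins vssrc:$XB),
//                       "xsfoo $XT, $XB", IIC_VecFP, []>, UseVSXReg;
// Inheriting UseVSXReg sets TSFlags{6}, so the encoder numbers $XT and $XB
// as VSX registers even though the operands live in the Altivec subset.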
+
// Two joined instructions; used to emit two adjacent instructions as one.
// The itinerary from the first instruction is used for scheduling and
// classification.
@@ -163,6 +173,22 @@ class BForm_3<bits<6> opcode, bit aa, bit lk,
let Inst{31} = lk;
}
+class BForm_3_at<bits<6> opcode, bit aa, bit lk,
+ dag OOL, dag IOL, string asmstr>
+ : I<opcode, OOL, IOL, asmstr, IIC_BrB> {
+ bits<5> BO;
+ bits<2> at;
+ bits<5> BI;
+ bits<14> BD;
+
+ let Inst{6-8} = BO{4-2};
+ let Inst{9-10} = at;
+ let Inst{11-15} = BI;
+ let Inst{16-29} = BD;
+ let Inst{30} = aa;
+ let Inst{31} = lk;
+}
+
class BForm_4<bits<6> opcode, bits<5> bo, bit aa, bit lk,
dag OOL, dag IOL, string asmstr>
: I<opcode, OOL, IOL, asmstr, IIC_BrB> {
@@ -577,6 +603,12 @@ class XForm_17<bits<6> opcode, bits<10> xo, dag OOL, dag IOL, string asmstr,
let Inst{31} = 0;
}
+class XForm_17a<bits<6> opcode, bits<10> xo, dag OOL, dag IOL, string asmstr,
+ InstrItinClass itin>
+ : XForm_17<opcode, xo, OOL, IOL, asmstr, itin > {
+ let FRA = 0;
+}
+
// Used for QPX
class XForm_18<bits<6> opcode, bits<10> xo, dag OOL, dag IOL, string asmstr,
InstrItinClass itin, list<dag> pattern>
@@ -1043,6 +1075,20 @@ class XX3Form<bits<6> opcode, bits<8> xo, dag OOL, dag IOL, string asmstr,
let Inst{31} = XT{5};
}
+class XX3Form_Zero<bits<6> opcode, bits<8> xo, dag OOL, dag IOL, string asmstr,
+ InstrItinClass itin, list<dag> pattern>
+ : XX3Form<opcode, xo, OOL, IOL, asmstr, itin, pattern> {
+ let XA = XT;
+ let XB = XT;
+}
+
+class XX3Form_SetZero<bits<6> opcode, bits<8> xo, dag OOL, dag IOL, string asmstr,
+ InstrItinClass itin, list<dag> pattern>
+ : XX3Form<opcode, xo, OOL, IOL, asmstr, itin, pattern> {
+ let XB = XT;
+ let XA = XT;
+}
+
class XX3Form_1<bits<6> opcode, bits<8> xo, dag OOL, dag IOL, string asmstr,
InstrItinClass itin, list<dag> pattern>
: I<opcode, OOL, IOL, asmstr, itin> {
@@ -1193,6 +1239,25 @@ class XLForm_1<bits<6> opcode, bits<10> xo, dag OOL, dag IOL, string asmstr,
let Inst{31} = 0;
}
+class XLForm_1_np<bits<6> opcode, bits<10> xo, dag OOL, dag IOL, string asmstr,
+ InstrItinClass itin, list<dag> pattern>
+ : XLForm_1<opcode, xo, OOL, IOL, asmstr, itin, pattern> {
+ let CRD = 0;
+ let CRA = 0;
+ let CRB = 0;
+}
+
+class XLForm_1_gen<bits<6> opcode, bits<10> xo, dag OOL, dag IOL, string asmstr,
+ InstrItinClass itin, list<dag> pattern>
+ : XLForm_1<opcode, xo, OOL, IOL, asmstr, itin, pattern> {
+ bits<5> RT;
+ bits<5> RB;
+
+ let CRD = RT;
+ let CRA = 0;
+ let CRB = RB;
+}
+
class XLForm_1_ext<bits<6> opcode, bits<10> xo, dag OOL, dag IOL, string asmstr,
InstrItinClass itin, list<dag> pattern>
: I<opcode, OOL, IOL, asmstr, itin> {
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp b/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
index b6ae70e..2e0b935 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.cpp
@@ -273,6 +273,7 @@ unsigned PPCInstrInfo::isLoadFromStackSlot(const MachineInstr &MI,
case PPC::RESTORE_CRBIT:
case PPC::LVX:
case PPC::LXVD2X:
+ case PPC::LXVX:
case PPC::QVLFDX:
case PPC::QVLFSXs:
case PPC::QVLFDXb:
@@ -302,6 +303,7 @@ unsigned PPCInstrInfo::isStoreToStackSlot(const MachineInstr &MI,
case PPC::SPILL_CRBIT:
case PPC::STVX:
case PPC::STXVD2X:
+ case PPC::STXVX:
case PPC::QVSTFDX:
case PPC::QVSTFSXs:
case PPC::QVSTFDXb:
@@ -460,57 +462,57 @@ bool PPCInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
return false;
// Get the last instruction in the block.
- MachineInstr *LastInst = I;
+ MachineInstr &LastInst = *I;
// If there is only one terminator instruction, process it.
if (I == MBB.begin() || !isUnpredicatedTerminator(*--I)) {
- if (LastInst->getOpcode() == PPC::B) {
- if (!LastInst->getOperand(0).isMBB())
+ if (LastInst.getOpcode() == PPC::B) {
+ if (!LastInst.getOperand(0).isMBB())
return true;
- TBB = LastInst->getOperand(0).getMBB();
+ TBB = LastInst.getOperand(0).getMBB();
return false;
- } else if (LastInst->getOpcode() == PPC::BCC) {
- if (!LastInst->getOperand(2).isMBB())
+ } else if (LastInst.getOpcode() == PPC::BCC) {
+ if (!LastInst.getOperand(2).isMBB())
return true;
// Block ends with fall-through condbranch.
- TBB = LastInst->getOperand(2).getMBB();
- Cond.push_back(LastInst->getOperand(0));
- Cond.push_back(LastInst->getOperand(1));
+ TBB = LastInst.getOperand(2).getMBB();
+ Cond.push_back(LastInst.getOperand(0));
+ Cond.push_back(LastInst.getOperand(1));
return false;
- } else if (LastInst->getOpcode() == PPC::BC) {
- if (!LastInst->getOperand(1).isMBB())
+ } else if (LastInst.getOpcode() == PPC::BC) {
+ if (!LastInst.getOperand(1).isMBB())
return true;
// Block ends with fall-through condbranch.
- TBB = LastInst->getOperand(1).getMBB();
+ TBB = LastInst.getOperand(1).getMBB();
Cond.push_back(MachineOperand::CreateImm(PPC::PRED_BIT_SET));
- Cond.push_back(LastInst->getOperand(0));
+ Cond.push_back(LastInst.getOperand(0));
return false;
- } else if (LastInst->getOpcode() == PPC::BCn) {
- if (!LastInst->getOperand(1).isMBB())
+ } else if (LastInst.getOpcode() == PPC::BCn) {
+ if (!LastInst.getOperand(1).isMBB())
return true;
// Block ends with fall-through condbranch.
- TBB = LastInst->getOperand(1).getMBB();
+ TBB = LastInst.getOperand(1).getMBB();
Cond.push_back(MachineOperand::CreateImm(PPC::PRED_BIT_UNSET));
- Cond.push_back(LastInst->getOperand(0));
+ Cond.push_back(LastInst.getOperand(0));
return false;
- } else if (LastInst->getOpcode() == PPC::BDNZ8 ||
- LastInst->getOpcode() == PPC::BDNZ) {
- if (!LastInst->getOperand(0).isMBB())
+ } else if (LastInst.getOpcode() == PPC::BDNZ8 ||
+ LastInst.getOpcode() == PPC::BDNZ) {
+ if (!LastInst.getOperand(0).isMBB())
return true;
if (DisableCTRLoopAnal)
return true;
- TBB = LastInst->getOperand(0).getMBB();
+ TBB = LastInst.getOperand(0).getMBB();
Cond.push_back(MachineOperand::CreateImm(1));
Cond.push_back(MachineOperand::CreateReg(isPPC64 ? PPC::CTR8 : PPC::CTR,
true));
return false;
- } else if (LastInst->getOpcode() == PPC::BDZ8 ||
- LastInst->getOpcode() == PPC::BDZ) {
- if (!LastInst->getOperand(0).isMBB())
+ } else if (LastInst.getOpcode() == PPC::BDZ8 ||
+ LastInst.getOpcode() == PPC::BDZ) {
+ if (!LastInst.getOperand(0).isMBB())
return true;
if (DisableCTRLoopAnal)
return true;
- TBB = LastInst->getOperand(0).getMBB();
+ TBB = LastInst.getOperand(0).getMBB();
Cond.push_back(MachineOperand::CreateImm(0));
Cond.push_back(MachineOperand::CreateReg(isPPC64 ? PPC::CTR8 : PPC::CTR,
true));
@@ -522,80 +524,79 @@ bool PPCInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
}
// Get the instruction before it if it's a terminator.
- MachineInstr *SecondLastInst = I;
+ MachineInstr &SecondLastInst = *I;
// If there are three terminators, we don't know what sort of block this is.
- if (SecondLastInst && I != MBB.begin() && isUnpredicatedTerminator(*--I))
+ if (I != MBB.begin() && isUnpredicatedTerminator(*--I))
return true;
// If the block ends with PPC::B and PPC::BCC, handle it.
- if (SecondLastInst->getOpcode() == PPC::BCC &&
- LastInst->getOpcode() == PPC::B) {
- if (!SecondLastInst->getOperand(2).isMBB() ||
- !LastInst->getOperand(0).isMBB())
+ if (SecondLastInst.getOpcode() == PPC::BCC &&
+ LastInst.getOpcode() == PPC::B) {
+ if (!SecondLastInst.getOperand(2).isMBB() ||
+ !LastInst.getOperand(0).isMBB())
return true;
- TBB = SecondLastInst->getOperand(2).getMBB();
- Cond.push_back(SecondLastInst->getOperand(0));
- Cond.push_back(SecondLastInst->getOperand(1));
- FBB = LastInst->getOperand(0).getMBB();
+ TBB = SecondLastInst.getOperand(2).getMBB();
+ Cond.push_back(SecondLastInst.getOperand(0));
+ Cond.push_back(SecondLastInst.getOperand(1));
+ FBB = LastInst.getOperand(0).getMBB();
return false;
- } else if (SecondLastInst->getOpcode() == PPC::BC &&
- LastInst->getOpcode() == PPC::B) {
- if (!SecondLastInst->getOperand(1).isMBB() ||
- !LastInst->getOperand(0).isMBB())
+ } else if (SecondLastInst.getOpcode() == PPC::BC &&
+ LastInst.getOpcode() == PPC::B) {
+ if (!SecondLastInst.getOperand(1).isMBB() ||
+ !LastInst.getOperand(0).isMBB())
return true;
- TBB = SecondLastInst->getOperand(1).getMBB();
+ TBB = SecondLastInst.getOperand(1).getMBB();
Cond.push_back(MachineOperand::CreateImm(PPC::PRED_BIT_SET));
- Cond.push_back(SecondLastInst->getOperand(0));
- FBB = LastInst->getOperand(0).getMBB();
+ Cond.push_back(SecondLastInst.getOperand(0));
+ FBB = LastInst.getOperand(0).getMBB();
return false;
- } else if (SecondLastInst->getOpcode() == PPC::BCn &&
- LastInst->getOpcode() == PPC::B) {
- if (!SecondLastInst->getOperand(1).isMBB() ||
- !LastInst->getOperand(0).isMBB())
+ } else if (SecondLastInst.getOpcode() == PPC::BCn &&
+ LastInst.getOpcode() == PPC::B) {
+ if (!SecondLastInst.getOperand(1).isMBB() ||
+ !LastInst.getOperand(0).isMBB())
return true;
- TBB = SecondLastInst->getOperand(1).getMBB();
+ TBB = SecondLastInst.getOperand(1).getMBB();
Cond.push_back(MachineOperand::CreateImm(PPC::PRED_BIT_UNSET));
- Cond.push_back(SecondLastInst->getOperand(0));
- FBB = LastInst->getOperand(0).getMBB();
+ Cond.push_back(SecondLastInst.getOperand(0));
+ FBB = LastInst.getOperand(0).getMBB();
return false;
- } else if ((SecondLastInst->getOpcode() == PPC::BDNZ8 ||
- SecondLastInst->getOpcode() == PPC::BDNZ) &&
- LastInst->getOpcode() == PPC::B) {
- if (!SecondLastInst->getOperand(0).isMBB() ||
- !LastInst->getOperand(0).isMBB())
+ } else if ((SecondLastInst.getOpcode() == PPC::BDNZ8 ||
+ SecondLastInst.getOpcode() == PPC::BDNZ) &&
+ LastInst.getOpcode() == PPC::B) {
+ if (!SecondLastInst.getOperand(0).isMBB() ||
+ !LastInst.getOperand(0).isMBB())
return true;
if (DisableCTRLoopAnal)
return true;
- TBB = SecondLastInst->getOperand(0).getMBB();
+ TBB = SecondLastInst.getOperand(0).getMBB();
Cond.push_back(MachineOperand::CreateImm(1));
Cond.push_back(MachineOperand::CreateReg(isPPC64 ? PPC::CTR8 : PPC::CTR,
true));
- FBB = LastInst->getOperand(0).getMBB();
+ FBB = LastInst.getOperand(0).getMBB();
return false;
- } else if ((SecondLastInst->getOpcode() == PPC::BDZ8 ||
- SecondLastInst->getOpcode() == PPC::BDZ) &&
- LastInst->getOpcode() == PPC::B) {
- if (!SecondLastInst->getOperand(0).isMBB() ||
- !LastInst->getOperand(0).isMBB())
+ } else if ((SecondLastInst.getOpcode() == PPC::BDZ8 ||
+ SecondLastInst.getOpcode() == PPC::BDZ) &&
+ LastInst.getOpcode() == PPC::B) {
+ if (!SecondLastInst.getOperand(0).isMBB() ||
+ !LastInst.getOperand(0).isMBB())
return true;
if (DisableCTRLoopAnal)
return true;
- TBB = SecondLastInst->getOperand(0).getMBB();
+ TBB = SecondLastInst.getOperand(0).getMBB();
Cond.push_back(MachineOperand::CreateImm(0));
Cond.push_back(MachineOperand::CreateReg(isPPC64 ? PPC::CTR8 : PPC::CTR,
true));
- FBB = LastInst->getOperand(0).getMBB();
+ FBB = LastInst.getOperand(0).getMBB();
return false;
}
// If the block ends with two PPC::Bs, handle it. The second one is not
// executed, so remove it.
- if (SecondLastInst->getOpcode() == PPC::B &&
- LastInst->getOpcode() == PPC::B) {
- if (!SecondLastInst->getOperand(0).isMBB())
+ if (SecondLastInst.getOpcode() == PPC::B && LastInst.getOpcode() == PPC::B) {
+ if (!SecondLastInst.getOperand(0).isMBB())
return true;
- TBB = SecondLastInst->getOperand(0).getMBB();
+ TBB = SecondLastInst.getOperand(0).getMBB();
I = LastInst;
if (AllowModify)
I->eraseFromParent();
@@ -606,7 +607,10 @@ bool PPCInstrInfo::analyzeBranch(MachineBasicBlock &MBB,
return true;
}
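The pointer-to-reference cleanup above does not change analyzeBranch's contract. As a reading aid, here is a minimal sketch of how a caller interprets its outputs, assuming the LLVM 4.0-era TargetInstrInfo API; the classify helper and BlockShape enum are illustrative names, not part of this patch:

  enum class BlockShape { Fallthrough, Uncond, CondFallthrough, CondUncond,
                          Unanalyzable };

  static BlockShape classify(const TargetInstrInfo &TII,
                             MachineBasicBlock &MBB) {
    MachineBasicBlock *TBB = nullptr, *FBB = nullptr;
    SmallVector<MachineOperand, 4> Cond;
    if (TII.analyzeBranch(MBB, TBB, FBB, Cond, /*AllowModify=*/false))
      return BlockShape::Unanalyzable;  // returning true means "could not analyze"
    if (!TBB)
      return BlockShape::Fallthrough;   // no branch terminators reported
    if (Cond.empty())
      return BlockShape::Uncond;        // unconditional branch to TBB
    return FBB ? BlockShape::CondUncond // conditional to TBB, then uncond to FBB
               : BlockShape::CondFallthrough; // conditional to TBB, else fall through
  }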
-unsigned PPCInstrInfo::RemoveBranch(MachineBasicBlock &MBB) const {
+unsigned PPCInstrInfo::removeBranch(MachineBasicBlock &MBB,
+ int *BytesRemoved) const {
+ assert(!BytesRemoved && "code size not handled");
+
MachineBasicBlock::iterator I = MBB.getLastNonDebugInstr();
if (I == MBB.end())
return 0;
@@ -635,15 +639,17 @@ unsigned PPCInstrInfo::RemoveBranch(MachineBasicBlock &MBB) const {
return 2;
}
-unsigned PPCInstrInfo::InsertBranch(MachineBasicBlock &MBB,
+unsigned PPCInstrInfo::insertBranch(MachineBasicBlock &MBB,
MachineBasicBlock *TBB,
MachineBasicBlock *FBB,
ArrayRef<MachineOperand> Cond,
- const DebugLoc &DL) const {
+ const DebugLoc &DL,
+ int *BytesAdded) const {
// Shouldn't be a fall through.
- assert(TBB && "InsertBranch must not be told to insert a fallthrough");
+ assert(TBB && "insertBranch must not be told to insert a fallthrough");
assert((Cond.size() == 2 || Cond.size() == 0) &&
"PPC branch conditions have two components!");
+ assert(!BytesAdded && "code size not handled");
bool isPPC64 = Subtarget.isPPC64();
@@ -853,15 +859,6 @@ void PPCInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
llvm_unreachable("nop VSX copy");
DestReg = SuperReg;
- } else if (PPC::VRRCRegClass.contains(DestReg) &&
- PPC::VSRCRegClass.contains(SrcReg)) {
- unsigned SuperReg =
- TRI->getMatchingSuperReg(DestReg, PPC::sub_128, &PPC::VSRCRegClass);
-
- if (VSXSelfCopyCrash && SrcReg == SuperReg)
- llvm_unreachable("nop VSX copy");
-
- DestReg = SuperReg;
} else if (PPC::F8RCRegClass.contains(SrcReg) &&
PPC::VSRCRegClass.contains(DestReg)) {
unsigned SuperReg =
@@ -871,15 +868,6 @@ void PPCInstrInfo::copyPhysReg(MachineBasicBlock &MBB,
llvm_unreachable("nop VSX copy");
SrcReg = SuperReg;
- } else if (PPC::VRRCRegClass.contains(SrcReg) &&
- PPC::VSRCRegClass.contains(DestReg)) {
- unsigned SuperReg =
- TRI->getMatchingSuperReg(SrcReg, PPC::sub_128, &PPC::VSRCRegClass);
-
- if (VSXSelfCopyCrash && DestReg == SuperReg)
- llvm_unreachable("nop VSX copy");
-
- SrcReg = SuperReg;
}
// Different class register copy
@@ -1004,19 +992,22 @@ PPCInstrInfo::StoreRegToStackSlot(MachineFunction &MF,
FrameIdx));
NonRI = true;
} else if (PPC::VSRCRegClass.hasSubClassEq(RC)) {
- NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::STXVD2X))
+ unsigned Op = Subtarget.hasP9Vector() ? PPC::STXVX : PPC::STXVD2X;
+ NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(Op))
.addReg(SrcReg,
getKillRegState(isKill)),
FrameIdx));
NonRI = true;
} else if (PPC::VSFRCRegClass.hasSubClassEq(RC)) {
- NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::STXSDX))
+ unsigned Opc = Subtarget.hasP9Vector() ? PPC::DFSTOREf64 : PPC::STXSDX;
+ NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(Opc))
.addReg(SrcReg,
getKillRegState(isKill)),
FrameIdx));
NonRI = true;
} else if (PPC::VSSRCRegClass.hasSubClassEq(RC)) {
- NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::STXSSPX))
+ unsigned Opc = Subtarget.hasP9Vector() ? PPC::DFSTOREf32 : PPC::STXSSPX;
+ NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(Opc))
.addReg(SrcReg,
getKillRegState(isKill)),
FrameIdx));
@@ -1066,6 +1057,15 @@ PPCInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
PPCFunctionInfo *FuncInfo = MF.getInfo<PPCFunctionInfo>();
FuncInfo->setHasSpills();
+ // We need to avoid a situation in which the value from a VRRC register is
+ // spilled using an Altivec instruction and reloaded into a VSRC register
+ // using a VSX instruction. The issue with this is that the VSX
+ // load/store instructions swap the doublewords in the vector and the Altivec
+ // ones don't. The register classes on the spill/reload may be different if
+ // the register is defined using an Altivec instruction and is then used by a
+ // VSX instruction.
+ RC = updatedRC(RC);
+
bool NonRI = false, SpillsVRS = false;
if (StoreRegToStackSlot(MF, SrcReg, isKill, FrameIdx, RC, NewMIs,
NonRI, SpillsVRS))
@@ -1080,7 +1080,7 @@ PPCInstrInfo::storeRegToStackSlot(MachineBasicBlock &MBB,
for (unsigned i = 0, e = NewMIs.size(); i != e; ++i)
MBB.insert(MI, NewMIs[i]);
- const MachineFrameInfo &MFI = *MF.getFrameInfo();
+ const MachineFrameInfo &MFI = MF.getFrameInfo();
MachineMemOperand *MMO = MF.getMachineMemOperand(
MachinePointerInfo::getFixedStack(MF, FrameIdx),
MachineMemOperand::MOStore, MFI.getObjectSize(FrameIdx),
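Taken together, this hunk and the reload hunk below make two decisions on the spill path: promote VRRC to VSRC so spill and reload agree on one instruction family, and prefer the Power9 opcodes when available. A condensed sketch, with an illustrative helper name (chooseVSXSpillOpcode is not in the patch):

  // On P9, LXVX/STXVX keep memory element order on both endiannesses,
  // unlike the doubleword-swapping LXVD2X/STXVD2X pair.
  static unsigned chooseVSXSpillOpcode(const PPCSubtarget &ST, bool IsStore) {
    if (IsStore)
      return ST.hasP9Vector() ? PPC::STXVX : PPC::STXVD2X;
    return ST.hasP9Vector() ? PPC::LXVX : PPC::LXVD2X;
  }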
@@ -1125,16 +1125,19 @@ bool PPCInstrInfo::LoadRegFromStackSlot(MachineFunction &MF, const DebugLoc &DL,
FrameIdx));
NonRI = true;
} else if (PPC::VSRCRegClass.hasSubClassEq(RC)) {
- NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::LXVD2X), DestReg),
+ unsigned Op = Subtarget.hasP9Vector() ? PPC::LXVX : PPC::LXVD2X;
+ NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(Op), DestReg),
FrameIdx));
NonRI = true;
} else if (PPC::VSFRCRegClass.hasSubClassEq(RC)) {
- NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::LXSDX), DestReg),
- FrameIdx));
+ unsigned Opc = Subtarget.hasP9Vector() ? PPC::DFLOADf64 : PPC::LXSDX;
+ NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(Opc),
+ DestReg), FrameIdx));
NonRI = true;
} else if (PPC::VSSRCRegClass.hasSubClassEq(RC)) {
- NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(PPC::LXSSPX), DestReg),
- FrameIdx));
+ unsigned Opc = Subtarget.hasP9Vector() ? PPC::DFLOADf32 : PPC::LXSSPX;
+ NewMIs.push_back(addFrameReference(BuildMI(MF, DL, get(Opc),
+ DestReg), FrameIdx));
NonRI = true;
} else if (PPC::VRSAVERCRegClass.hasSubClassEq(RC)) {
assert(Subtarget.isDarwin() &&
@@ -1177,6 +1180,16 @@ PPCInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
PPCFunctionInfo *FuncInfo = MF.getInfo<PPCFunctionInfo>();
FuncInfo->setHasSpills();
+ // We need to avoid a situation in which the value from a VRRC register is
+ // spilled using an Altivec instruction and reloaded into a VSRC register
+ // using a VSX instruction. The issue with this is that the VSX
+ // load/store instructions swap the doublewords in the vector and the Altivec
+ // ones don't. The register classes on the spill/reload may be different if
+ // the register is defined using an Altivec instruction and is then used by a
+ // VSX instruction.
+ if (Subtarget.hasVSX() && RC == &PPC::VRRCRegClass)
+ RC = &PPC::VSRCRegClass;
+
bool NonRI = false, SpillsVRS = false;
if (LoadRegFromStackSlot(MF, DL, DestReg, FrameIdx, RC, NewMIs,
NonRI, SpillsVRS))
@@ -1191,7 +1204,7 @@ PPCInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
for (unsigned i = 0, e = NewMIs.size(); i != e; ++i)
MBB.insert(MI, NewMIs[i]);
- const MachineFrameInfo &MFI = *MF.getFrameInfo();
+ const MachineFrameInfo &MFI = MF.getFrameInfo();
MachineMemOperand *MMO = MF.getMachineMemOperand(
MachinePointerInfo::getFixedStack(MF, FrameIdx),
MachineMemOperand::MOLoad, MFI.getObjectSize(FrameIdx),
@@ -1200,7 +1213,7 @@ PPCInstrInfo::loadRegFromStackSlot(MachineBasicBlock &MBB,
}
bool PPCInstrInfo::
-ReverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {
+reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const {
assert(Cond.size() == 2 && "Invalid PPC branch opcode!");
if (Cond[1].getReg() == PPC::CTR8 || Cond[1].getReg() == PPC::CTR)
Cond[0].setImm(Cond[0].getImm() == 0 ? 1 : 0);
@@ -1809,7 +1822,7 @@ bool PPCInstrInfo::optimizeCompareInstr(MachineInstr &CmpInstr, unsigned SrcReg,
/// getInstSizeInBytes - Return the number of bytes of code the specified
/// instruction may be. This returns the maximum number of bytes.
///
-unsigned PPCInstrInfo::GetInstSizeInBytes(const MachineInstr &MI) const {
+unsigned PPCInstrInfo::getInstSizeInBytes(const MachineInstr &MI) const {
unsigned Opcode = MI.getOpcode();
if (Opcode == PPC::INLINEASM) {
@@ -1817,10 +1830,11 @@ unsigned PPCInstrInfo::GetInstSizeInBytes(const MachineInstr &MI) const {
const char *AsmStr = MI.getOperand(0).getSymbolName();
return getInlineAsmLength(AsmStr, *MF->getTarget().getMCAsmInfo());
} else if (Opcode == TargetOpcode::STACKMAP) {
- return MI.getOperand(1).getImm();
+ StackMapOpers Opers(&MI);
+ return Opers.getNumPatchBytes();
} else if (Opcode == TargetOpcode::PATCHPOINT) {
PatchPointOpers Opers(&MI);
- return Opers.getMetaOper(PatchPointOpers::NBytesPos).getImm();
+ return Opers.getNumPatchBytes();
} else {
const MCInstrDesc &Desc = get(Opcode);
return Desc.getSize();
@@ -1872,6 +1886,48 @@ bool PPCInstrInfo::expandPostRAPseudo(MachineInstr &MI) const {
.addReg(Reg);
return true;
}
+ case PPC::DFLOADf32:
+ case PPC::DFLOADf64:
+ case PPC::DFSTOREf32:
+ case PPC::DFSTOREf64: {
+ assert(Subtarget.hasP9Vector() &&
+ "Invalid D-Form Pseudo-ops on non-P9 target.");
+ unsigned UpperOpcode, LowerOpcode;
+ switch (MI.getOpcode()) {
+ case PPC::DFLOADf32:
+ UpperOpcode = PPC::LXSSP;
+ LowerOpcode = PPC::LFS;
+ break;
+ case PPC::DFLOADf64:
+ UpperOpcode = PPC::LXSD;
+ LowerOpcode = PPC::LFD;
+ break;
+ case PPC::DFSTOREf32:
+ UpperOpcode = PPC::STXSSP;
+ LowerOpcode = PPC::STFS;
+ break;
+ case PPC::DFSTOREf64:
+ UpperOpcode = PPC::STXSD;
+ LowerOpcode = PPC::STFD;
+ break;
+ }
+ unsigned TargetReg = MI.getOperand(0).getReg();
+ unsigned Opcode;
+ if ((TargetReg >= PPC::F0 && TargetReg <= PPC::F31) ||
+ (TargetReg >= PPC::VSL0 && TargetReg <= PPC::VSL31))
+ Opcode = LowerOpcode;
+ else
+ Opcode = UpperOpcode;
+ MI.setDesc(get(Opcode));
+ return true;
+ }
}
return false;
}
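The operand test in the DFLOAD/DFSTORE expansion above can be restated simply: fall back to the classic FP loads/stores whenever register allocation picked the FPR half of the VSX file, since only the VR half (vs32-vs63) is encodable by the new P9 D-form instructions. A sketch with an illustrative helper name:

  // LFS/LFD/STFS/STFD encode FPRs (overlapping vs0-vs31);
  // LXSSP/LXSD/STXSSP/STXSD encode VRs (vs32-vs63).
  static bool fitsClassicFPForm(unsigned Reg) {
    return (Reg >= PPC::F0 && Reg <= PPC::F31) ||
           (Reg >= PPC::VSL0 && Reg <= PPC::VSL31);
  }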
+
+const TargetRegisterClass *
+PPCInstrInfo::updatedRC(const TargetRegisterClass *RC) const {
+ if (Subtarget.hasVSX() && RC == &PPC::VRRCRegClass)
+ return &PPC::VSRCRegClass;
+ return RC;
+}
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.h b/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.h
index 98baf12..32b2f00 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.h
+++ b/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.h
@@ -61,6 +61,15 @@ enum PPC970_Unit {
PPC970_VPERM = 6 << PPC970_Shift, // Vector Permute Unit
PPC970_BRU = 7 << PPC970_Shift // Branch Unit
};
+
+enum {
+ /// Shift count to bypass PPC970 flags
+ NewDef_Shift = 6,
+
+ /// Indicates that the VSX instruction uses a VSX register (vs0-vs63)
+ /// instead of a VMX register (v0-v31).
+ UseVSXReg = 0x1 << NewDef_Shift
+};
} // end namespace PPCII
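A brief sketch of how a TSFlags bit such as UseVSXReg is typically consumed; the helper below is illustrative only and not part of this change:

  static bool usesVSXNumbering(const MCInstrDesc &Desc) {
    // Set on VSX instructions that take vs0-vs63 numbering rather than
    // the v0-v31 VMX numbering.
    return (Desc.TSFlags & PPCII::UseVSXReg) != 0;
  }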
class PPCSubtarget;
@@ -168,10 +177,12 @@ public:
MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,
bool AllowModify) const override;
- unsigned RemoveBranch(MachineBasicBlock &MBB) const override;
- unsigned InsertBranch(MachineBasicBlock &MBB, MachineBasicBlock *TBB,
+ unsigned removeBranch(MachineBasicBlock &MBB,
+ int *BytesRemoved = nullptr) const override;
+ unsigned insertBranch(MachineBasicBlock &MBB, MachineBasicBlock *TBB,
MachineBasicBlock *FBB, ArrayRef<MachineOperand> Cond,
- const DebugLoc &DL) const override;
+ const DebugLoc &DL,
+ int *BytesAdded = nullptr) const override;
// Select analysis.
bool canInsertSelect(const MachineBasicBlock &, ArrayRef<MachineOperand> Cond,
@@ -198,7 +209,7 @@ public:
const TargetRegisterInfo *TRI) const override;
bool
- ReverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const override;
+ reverseBranchCondition(SmallVectorImpl<MachineOperand> &Cond) const override;
bool FoldImmediate(MachineInstr &UseMI, MachineInstr &DefMI, unsigned Reg,
MachineRegisterInfo *MRI) const override;
@@ -256,7 +267,7 @@ public:
/// getInstSizeInBytes - Return the number of bytes of code the specified
/// instruction may be. This returns the maximum number of bytes.
///
- unsigned GetInstSizeInBytes(const MachineInstr &MI) const;
+ unsigned getInstSizeInBytes(const MachineInstr &MI) const override;
void getNoopForMachoTarget(MCInst &NopInst) const override;
@@ -271,6 +282,14 @@ public:
// Lower pseudo instructions after register allocation.
bool expandPostRAPseudo(MachineInstr &MI) const override;
+
+ static bool isVFRegister(unsigned Reg) {
+ return Reg >= PPC::VF0 && Reg <= PPC::VF31;
+ }
+ static bool isVRRegister(unsigned Reg) {
+ return Reg >= PPC::V0 && Reg <= PPC::V31;
+ }
+ const TargetRegisterClass *updatedRC(const TargetRegisterClass *RC) const;
};
}
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td b/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
index a40d4e1..f615cc7 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPCInstrInfo.td
@@ -23,6 +23,15 @@ def SDT_PPCstfiwx : SDTypeProfile<0, 2, [ // stfiwx
def SDT_PPClfiwx : SDTypeProfile<1, 1, [ // lfiw[az]x
SDTCisVT<0, f64>, SDTCisPtrTy<1>
]>;
+def SDT_PPCLxsizx : SDTypeProfile<1, 2, [
+ SDTCisVT<0, f64>, SDTCisPtrTy<1>, SDTCisPtrTy<2>
+]>;
+def SDT_PPCstxsix : SDTypeProfile<0, 3, [
+ SDTCisVT<0, f64>, SDTCisPtrTy<1>, SDTCisPtrTy<2>
+]>;
+def SDT_PPCVexts : SDTypeProfile<1, 2, [
+ SDTCisVT<0, f64>, SDTCisVT<1, f64>, SDTCisPtrTy<2>
+]>;
def SDT_PPCCallSeqStart : SDCallSeqStart<[ SDTCisVT<0, i32> ]>;
def SDT_PPCCallSeqEnd : SDCallSeqEnd<[ SDTCisVT<0, i32>,
@@ -108,6 +117,11 @@ def PPClfiwax : SDNode<"PPCISD::LFIWAX", SDT_PPClfiwx,
[SDNPHasChain, SDNPMayLoad]>;
def PPClfiwzx : SDNode<"PPCISD::LFIWZX", SDT_PPClfiwx,
[SDNPHasChain, SDNPMayLoad]>;
+def PPClxsizx : SDNode<"PPCISD::LXSIZX", SDT_PPCLxsizx,
+ [SDNPHasChain, SDNPMayLoad]>;
+def PPCstxsix : SDNode<"PPCISD::STXSIX", SDT_PPCstxsix,
+ [SDNPHasChain, SDNPMayStore]>;
+def PPCVexts : SDNode<"PPCISD::VEXTS", SDT_PPCVexts, []>;
// Extract FPSCR (not modeled at the DAG level).
def PPCmffs : SDNode<"PPCISD::MFFS",
@@ -312,6 +326,8 @@ def immZExt16 : PatLeaf<(imm), [{
// field. Used by instructions like 'ori'.
return (uint64_t)N->getZExtValue() == (unsigned short)N->getZExtValue();
}], LO16>;
+def immAnyExt8 : ImmLeaf<i32, [{ return isInt<8>(Imm) || isUInt<8>(Imm); }]>;
+def immSExt5NonZero : ImmLeaf<i32, [{ return Imm && isInt<5>(Imm); }]>;
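Spelled out, the two ImmLeaf predicates above accept the following ranges (plain C++ sketch of the MathExtras isInt/isUInt checks):

  #include <cstdint>

  // immAnyExt8: representable in 8 bits, signed or unsigned.
  static bool anyExt8(int64_t Imm) {
    return (Imm >= -128 && Imm <= 127) || (Imm >= 0 && Imm <= 255);
  }
  // immSExt5NonZero: a non-zero 5-bit signed immediate.
  static bool sExt5NonZero(int64_t Imm) {
    return Imm != 0 && Imm >= -16 && Imm <= 15;
  }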
// imm16Shifted* - These match immediates where the low 16-bits are zero. There
// are two forms: imm16ShiftedSExt and imm16ShiftedZExt. These two forms are
@@ -444,6 +460,12 @@ def PPCRegVRRCAsmOperand : AsmOperandClass {
def vrrc : RegisterOperand<VRRC> {
let ParserMatchClass = PPCRegVRRCAsmOperand;
}
+def PPCRegVFRCAsmOperand : AsmOperandClass {
+ let Name = "RegVFRC"; let PredicateMethod = "isRegNumber";
+}
+def vfrc : RegisterOperand<VFRC> {
+ let ParserMatchClass = PPCRegVFRCAsmOperand;
+}
def PPCRegCRBITRCAsmOperand : AsmOperandClass {
let Name = "RegCRBITRC"; let PredicateMethod = "isCRBitNumber";
}
@@ -478,6 +500,15 @@ def u2imm : Operand<i32> {
let ParserMatchClass = PPCU2ImmAsmOperand;
}
+def PPCATBitsAsHintAsmOperand : AsmOperandClass {
+ let Name = "ATBitsAsHint"; let PredicateMethod = "isATBitsAsHint";
+ let RenderMethod = "addImmOperands"; // Irrelevant, predicate always fails.
+}
+def atimm : Operand<i32> {
+ let PrintMethod = "printATBitsAsHint";
+ let ParserMatchClass = PPCATBitsAsHintAsmOperand;
+}
+
def PPCU3ImmAsmOperand : AsmOperandClass {
let Name = "U3Imm"; let PredicateMethod = "isU3Imm";
let RenderMethod = "addImmOperands";
@@ -591,6 +622,9 @@ def s17imm : Operand<i32> {
let ParserMatchClass = PPCS17ImmAsmOperand;
let DecoderMethod = "decodeSImmOperand<16>";
}
+
+def fpimm0 : PatLeaf<(fpimm), [{ return N->isExactlyValue(+0.0); }]>;
+
def PPCDirectBrAsmOperand : AsmOperandClass {
let Name = "DirectBr"; let PredicateMethod = "isDirectBr";
let RenderMethod = "addBranchTargetOperands";
@@ -1448,9 +1482,6 @@ def RFEBB : XLForm_S<19, 146, (outs), (ins u1imm:$imm), "rfebb $imm",
def DCBA : DCB_Form<758, 0, (outs), (ins memrr:$dst), "dcba $dst",
IIC_LdStDCBF, [(int_ppc_dcba xoaddr:$dst)]>,
PPC970_DGroup_Single;
-def DCBF : DCB_Form<86, 0, (outs), (ins memrr:$dst), "dcbf $dst",
- IIC_LdStDCBF, [(int_ppc_dcbf xoaddr:$dst)]>,
- PPC970_DGroup_Single;
def DCBI : DCB_Form<470, 0, (outs), (ins memrr:$dst), "dcbi $dst",
IIC_LdStDCBF, [(int_ppc_dcbi xoaddr:$dst)]>,
PPC970_DGroup_Single;
@@ -1464,6 +1495,10 @@ def DCBZL : DCB_Form<1014, 1, (outs), (ins memrr:$dst), "dcbzl $dst",
IIC_LdStDCBF, [(int_ppc_dcbzl xoaddr:$dst)]>,
PPC970_DGroup_Single;
+def DCBF : DCB_Form_hint<86, (outs), (ins u5imm:$TH, memrr:$dst),
+ "dcbf $dst, $TH", IIC_LdStDCBF, []>,
+ PPC970_DGroup_Single;
+
let hasSideEffects = 0, mayLoad = 1, mayStore = 1 in {
def DCBT : DCB_Form_hint<278, (outs), (ins u5imm:$TH, memrr:$dst),
"dcbt $dst, $TH", IIC_LdStDCBF, []>,
@@ -1473,13 +1508,21 @@ def DCBTST : DCB_Form_hint<246, (outs), (ins u5imm:$TH, memrr:$dst),
PPC970_DGroup_Single;
} // hasSideEffects = 0
+def ICBLC : XForm_icbt<31, 230, (outs), (ins u4imm:$CT, memrr:$src),
+ "icblc $CT, $src", IIC_LdStStore>, Requires<[HasICBT]>;
+def ICBLQ : XForm_icbt<31, 198, (outs), (ins u4imm:$CT, memrr:$src),
+ "icblq. $CT, $src", IIC_LdStLoad>, Requires<[HasICBT]>;
def ICBT : XForm_icbt<31, 22, (outs), (ins u4imm:$CT, memrr:$src),
"icbt $CT, $src", IIC_LdStLoad>, Requires<[HasICBT]>;
+def ICBTLS : XForm_icbt<31, 486, (outs), (ins u4imm:$CT, memrr:$src),
+ "icbtls $CT, $src", IIC_LdStLoad>, Requires<[HasICBT]>;
def : Pat<(int_ppc_dcbt xoaddr:$dst),
(DCBT 0, xoaddr:$dst)>;
def : Pat<(int_ppc_dcbtst xoaddr:$dst),
(DCBTST 0, xoaddr:$dst)>;
+def : Pat<(int_ppc_dcbf xoaddr:$dst),
+ (DCBF 0, xoaddr:$dst)>;
def : Pat<(prefetch xoaddr:$dst, (i32 0), imm, (i32 1)),
(DCBT 0, xoaddr:$dst)>; // data prefetch for loads
@@ -2135,26 +2178,34 @@ let isCompare = 1, hasSideEffects = 0 in {
"fcmpu $crD, $fA, $fB", IIC_FPCompare>;
}
+def FTDIV: XForm_17<63, 128, (outs crrc:$crD), (ins f8rc:$fA, f8rc:$fB),
+ "ftdiv $crD, $fA, $fB", IIC_FPCompare>;
+def FTSQRT: XForm_17a<63, 160, (outs crrc:$crD), (ins f8rc:$fB),
+ "ftsqrt $crD, $fB", IIC_FPCompare>;
+
let Uses = [RM] in {
let hasSideEffects = 0 in {
defm FCTIW : XForm_26r<63, 14, (outs f8rc:$frD), (ins f8rc:$frB),
"fctiw", "$frD, $frB", IIC_FPGeneral,
[]>;
+ defm FCTIWU : XForm_26r<63, 142, (outs f8rc:$frD), (ins f8rc:$frB),
+ "fctiwu", "$frD, $frB", IIC_FPGeneral,
+ []>;
defm FCTIWZ : XForm_26r<63, 15, (outs f8rc:$frD), (ins f8rc:$frB),
"fctiwz", "$frD, $frB", IIC_FPGeneral,
[(set f64:$frD, (PPCfctiwz f64:$frB))]>;
defm FRSP : XForm_26r<63, 12, (outs f4rc:$frD), (ins f8rc:$frB),
"frsp", "$frD, $frB", IIC_FPGeneral,
- [(set f32:$frD, (fround f64:$frB))]>;
+ [(set f32:$frD, (fpround f64:$frB))]>;
let Interpretation64Bit = 1, isCodeGenOnly = 1 in
defm FRIND : XForm_26r<63, 392, (outs f8rc:$frD), (ins f8rc:$frB),
"frin", "$frD, $frB", IIC_FPGeneral,
- [(set f64:$frD, (frnd f64:$frB))]>;
+ [(set f64:$frD, (fround f64:$frB))]>;
defm FRINS : XForm_26r<63, 392, (outs f4rc:$frD), (ins f4rc:$frB),
"frin", "$frD, $frB", IIC_FPGeneral,
- [(set f32:$frD, (frnd f32:$frB))]>;
+ [(set f32:$frD, (fround f32:$frB))]>;
}
let hasSideEffects = 0 in {
@@ -2336,6 +2387,13 @@ def MTSPR : XFXForm_1<31, 467, (outs), (ins i32imm:$SPR, gprc:$RT),
def MFTB : XFXForm_1<31, 371, (outs gprc:$RT), (ins i32imm:$SPR),
"mftb $RT, $SPR", IIC_SprMFTB>;
+def MFPMR : XFXForm_1<31, 334, (outs gprc:$RT), (ins i32imm:$SPR),
+ "mfpmr $RT, $SPR", IIC_SprMFPMR>;
+
+def MTPMR : XFXForm_1<31, 462, (outs), (ins i32imm:$SPR, gprc:$RT),
+ "mtpmr $SPR, $RT", IIC_SprMTPMR>;
+
+
// A pseudo-instruction used to implement the read of the 64-bit cycle counter
// on a 32-bit target.
let hasSideEffects = 1, usesCustomInserter = 1 in
@@ -2892,7 +2950,7 @@ def : Pat<(f64 (extloadf32 iaddr:$src)),
def : Pat<(f64 (extloadf32 xaddr:$src)),
(COPY_TO_REGCLASS (LFSX xaddr:$src), F8RC)>;
-def : Pat<(f64 (fextend f32:$src)),
+def : Pat<(f64 (fpextend f32:$src)),
(COPY_TO_REGCLASS $src, F8RC)>;
// Only seq_cst fences require the heavyweight sync (SYNC 0).
@@ -3185,6 +3243,46 @@ defm : ExtSetCCPat<SETLE,
OutPatFrag<(ops node:$in),
(RLDICL $in, 1, 63)> >;
+// An extended SETCC with shift amount.
+multiclass ExtSetCCShiftPat<CondCode cc, PatFrag pfrag,
+ OutPatFrag rfrag, OutPatFrag rfrag8> {
+ def : Pat<(i32 (zext (i1 (pfrag i32:$s1, i32:$sa, cc)))),
+ (rfrag $s1, $sa)>;
+ def : Pat<(i64 (zext (i1 (pfrag i64:$s1, i32:$sa, cc)))),
+ (rfrag8 $s1, $sa)>;
+ def : Pat<(i64 (zext (i1 (pfrag i32:$s1, i32:$sa, cc)))),
+ (INSERT_SUBREG (i64 (IMPLICIT_DEF)), (rfrag $s1, $sa), sub_32)>;
+ def : Pat<(i32 (zext (i1 (pfrag i64:$s1, i32:$sa, cc)))),
+ (EXTRACT_SUBREG (rfrag8 $s1, $sa), sub_32)>;
+
+ def : Pat<(i32 (anyext (i1 (pfrag i32:$s1, i32:$sa, cc)))),
+ (rfrag $s1, $sa)>;
+ def : Pat<(i64 (anyext (i1 (pfrag i64:$s1, i32:$sa, cc)))),
+ (rfrag8 $s1, $sa)>;
+ def : Pat<(i64 (anyext (i1 (pfrag i32:$s1, i32:$sa, cc)))),
+ (INSERT_SUBREG (i64 (IMPLICIT_DEF)), (rfrag $s1, $sa), sub_32)>;
+ def : Pat<(i32 (anyext (i1 (pfrag i64:$s1, i32:$sa, cc)))),
+ (EXTRACT_SUBREG (rfrag8 $s1, $sa), sub_32)>;
+}
+
+defm : ExtSetCCShiftPat<SETNE,
+ PatFrag<(ops node:$in, node:$sa, node:$cc),
+ (setcc (and $in, (shl 1, $sa)), 0, $cc)>,
+ OutPatFrag<(ops node:$in, node:$sa),
+ (RLWNM $in, (SUBFIC $sa, 32), 31, 31)>,
+ OutPatFrag<(ops node:$in, node:$sa),
+ (RLDCL $in, (SUBFIC $sa, 64), 63)> >;
+
+defm : ExtSetCCShiftPat<SETEQ,
+ PatFrag<(ops node:$in, node:$sa, node:$cc),
+ (setcc (and $in, (shl 1, $sa)), 0, $cc)>,
+ OutPatFrag<(ops node:$in, node:$sa),
+ (RLWNM (i32not $in),
+ (SUBFIC $sa, 32), 31, 31)>,
+ OutPatFrag<(ops node:$in, node:$sa),
+ (RLDCL (i64not $in),
+ (SUBFIC $sa, 64), 63)> >;
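Restated as scalar code, the two instantiations above match a single-bit test; the rotate-and-mask output patterns move bit $sa into the low bit (sketch):

  #include <cstdint>

  // SETNE form: extract bit SA of In, zero/any-extended.
  static uint32_t bitTestNE(uint32_t In, uint32_t SA) {
    return (In >> SA) & 1;      // RLWNM In, 32-SA, 31, 31
  }
  // SETEQ form: the complement of that bit.
  static uint32_t bitTestEQ(uint32_t In, uint32_t SA) {
    return (~In >> SA) & 1;     // RLWNM (not In), 32-SA, 31, 31
  }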
+
// SETCC for i32.
def : Pat<(i1 (setcc i32:$s1, immZExt16:$imm, SETULT)),
(EXTRACT_SUBREG (CMPLWI $s1, imm:$imm), sub_lt)>;
@@ -3654,6 +3752,9 @@ def SLBMTE : XForm_26<31, 402, (outs), (ins gprc:$RS, gprc:$RB),
def SLBMFEE : XForm_26<31, 915, (outs gprc:$RT), (ins gprc:$RB),
"slbmfee $RT, $RB", IIC_SprSLBMFEE, []>;
+def SLBMFEV : XLForm_1_gen<31, 851, (outs gprc:$RT), (ins gprc:$RB),
+ "slbmfev $RT, $RB", IIC_SprSLBMFEV, []>;
+
def SLBIA : XForm_0<31, 498, (outs), (ins), "slbia", IIC_SprSLBIA, []>;
def TLBIA : XForm_0<31, 370, (outs), (ins),
@@ -3716,6 +3817,9 @@ def MFDCR : XFXForm_1<31, 323, (outs gprc:$RT), (ins i32imm:$SPR),
def MTDCR : XFXForm_1<31, 451, (outs), (ins gprc:$RT, i32imm:$SPR),
"mtdcr $SPR, $RT", IIC_SprMTSPR>, Requires<[IsPPC4xx]>;
+def HRFID : XLForm_1_np<19, 274, (outs), (ins), "hrfid", IIC_BrB, []>;
+def NAP : XLForm_1_np<19, 434, (outs), (ins), "nap", IIC_BrB, []>;
+
def ATTN : XForm_attn<0, 256, (outs), (ins), "attn", IIC_BrB>;
def LBZCIX : XForm_base_r3xo<31, 853, (outs gprc:$RST), (ins gprc:$A, gprc:$B),
@@ -3780,6 +3884,10 @@ def DCBTSTCT : PPCAsmPseudo<"dcbtstct $dst, $TH", (ins memrr:$dst, u5imm:$TH)>;
def DCBTSTDS : PPCAsmPseudo<"dcbtstds $dst, $TH", (ins memrr:$dst, u5imm:$TH)>;
def DCBTSTT : PPCAsmPseudo<"dcbtstt $dst", (ins memrr:$dst)>;
+def DCBFx : PPCAsmPseudo<"dcbf $dst", (ins memrr:$dst)>;
+def DCBFL : PPCAsmPseudo<"dcbfl $dst", (ins memrr:$dst)>;
+def DCBFLP : PPCAsmPseudo<"dcbflp $dst", (ins memrr:$dst)>;
+
def : InstAlias<"crset $bx", (CREQV crbitrc:$bx, crbitrc:$bx, crbitrc:$bx)>;
def : InstAlias<"crclr $bx", (CRXOR crbitrc:$bx, crbitrc:$bx, crbitrc:$bx)>;
def : InstAlias<"crmove $bx, $by", (CROR crbitrc:$bx, crbitrc:$by, crbitrc:$by)>;
@@ -4081,6 +4189,16 @@ let PPC970_Unit = 7 in {
def gBCA : BForm_3<16, 1, 0, (outs),
(ins u5imm:$bo, crbitrc:$bi, abscondbrtarget:$dst),
"bca $bo, $bi, $dst">;
+ let isAsmParserOnly = 1 in {
+ def gBCat : BForm_3_at<16, 0, 0, (outs),
+ (ins u5imm:$bo, atimm:$at, crbitrc:$bi,
+ condbrtarget:$dst),
+ "bc$at $bo, $bi, $dst">;
+ def gBCAat : BForm_3_at<16, 1, 0, (outs),
+ (ins u5imm:$bo, atimm:$at, crbitrc:$bi,
+ abscondbrtarget:$dst),
+ "bca$at $bo, $bi, $dst">;
+ } // isAsmParserOnly = 1
}
let Defs = [LR, CTR], Uses = [CTR, RM] in {
def gBCL : BForm_3<16, 0, 1, (outs),
@@ -4089,6 +4207,16 @@ let PPC970_Unit = 7 in {
def gBCLA : BForm_3<16, 1, 1, (outs),
(ins u5imm:$bo, crbitrc:$bi, abscondbrtarget:$dst),
"bcla $bo, $bi, $dst">;
+ let isAsmParserOnly = 1 in {
+ def gBCLat : BForm_3_at<16, 0, 1, (outs),
+ (ins u5imm:$bo, atimm:$at, crbitrc:$bi,
+ condbrtarget:$dst),
+ "bcl$at $bo, $bi, $dst">;
+ def gBCLAat : BForm_3_at<16, 1, 1, (outs),
+ (ins u5imm:$bo, atimm:$at, crbitrc:$bi,
+ abscondbrtarget:$dst),
+ "bcla$at $bo, $bi, $dst">;
+ } // isAsmParserOnly = 1
}
let Defs = [CTR], Uses = [CTR, LR, RM] in
def gBCLR : XLForm_2<19, 16, 0, (outs),
@@ -4107,6 +4235,20 @@ let PPC970_Unit = 7 in {
(ins u5imm:$bo, crbitrc:$bi, i32imm:$bh),
"bcctrl $bo, $bi, $bh", IIC_BrB, []>;
}
+
+multiclass BranchSimpleMnemonicAT<string pm, int at> {
+ def : InstAlias<"bc"#pm#" $bo, $bi, $dst", (gBCat u5imm:$bo, at, crbitrc:$bi,
+ condbrtarget:$dst)>;
+ def : InstAlias<"bca"#pm#" $bo, $bi, $dst", (gBCAat u5imm:$bo, at, crbitrc:$bi,
+ condbrtarget:$dst)>;
+ def : InstAlias<"bcl"#pm#" $bo, $bi, $dst", (gBCLat u5imm:$bo, at, crbitrc:$bi,
+ condbrtarget:$dst)>;
+ def : InstAlias<"bcla"#pm#" $bo, $bi, $dst", (gBCLAat u5imm:$bo, at, crbitrc:$bi,
+ condbrtarget:$dst)>;
+}
+defm : BranchSimpleMnemonicAT<"+", 3>;
+defm : BranchSimpleMnemonicAT<"-", 2>;
+
def : InstAlias<"bclr $bo, $bi", (gBCLR u5imm:$bo, crbitrc:$bi, 0)>;
def : InstAlias<"bclrl $bo, $bi", (gBCLRL u5imm:$bo, crbitrc:$bi, 0)>;
def : InstAlias<"bcctr $bo, $bi", (gBCCTR u5imm:$bo, crbitrc:$bi, 0)>;
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCInstrQPX.td b/contrib/llvm/lib/Target/PowerPC/PPCInstrQPX.td
index 4312007..4940c77 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCInstrQPX.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPCInstrQPX.td
@@ -88,11 +88,11 @@ def pre_truncstv4f32 : PatFrag<(ops node:$val, node:$base, node:$offset),
return cast<StoreSDNode>(N)->getMemoryVT() == MVT::v4f32;
}]>;
-def fround_inexact : PatFrag<(ops node:$val), (fround node:$val), [{
+def fround_inexact : PatFrag<(ops node:$val), (fpround node:$val), [{
return cast<ConstantSDNode>(N->getOperand(1))->getZExtValue() == 0;
}]>;
-def fround_exact : PatFrag<(ops node:$val), (fround node:$val), [{
+def fround_exact : PatFrag<(ops node:$val), (fpround node:$val), [{
return cast<ConstantSDNode>(N->getOperand(1))->getZExtValue() == 1;
}]>;
@@ -311,11 +311,11 @@ let Uses = [RM] in {
def QVFRIN : XForm_19<4, 392, (outs qfrc:$FRT), (ins qfrc:$FRB),
"qvfrin $FRT, $FRB", IIC_FPGeneral,
- [(set v4f64:$FRT, (frnd v4f64:$FRB))]>;
+ [(set v4f64:$FRT, (fround v4f64:$FRB))]>;
let isCodeGenOnly = 1 in
def QVFRINs : XForm_19<4, 392, (outs qsrc:$FRT), (ins qsrc:$FRB),
"qvfrin $FRT, $FRB", IIC_FPGeneral,
- [(set v4f32:$FRT, (frnd v4f32:$FRB))]>;
+ [(set v4f32:$FRT, (fround v4f32:$FRB))]>;
def QVFRIP : XForm_19<4, 456, (outs qfrc:$FRT), (ins qfrc:$FRB),
"qvfrip $FRT, $FRB", IIC_FPGeneral,
@@ -1103,7 +1103,7 @@ def : Pat<(xor v4i1:$FRA, v4i1:$FRB),
def : Pat<(not v4i1:$FRA),
(QVFLOGICALb $FRA, $FRA, (i32 10))>;
-def : Pat<(v4f64 (fextend v4f32:$src)),
+def : Pat<(v4f64 (fpextend v4f32:$src)),
(COPY_TO_REGCLASS $src, QFRC)>;
def : Pat<(v4f32 (fround_exact v4f64:$src)),
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCInstrVSX.td b/contrib/llvm/lib/Target/PowerPC/PPCInstrVSX.td
index a02ace0..0d9e345 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCInstrVSX.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPCInstrVSX.td
@@ -89,22 +89,42 @@ multiclass XX3Form_Rcr<bits<6> opcode, bits<7> xo, string asmbase,
}
}
+// Instruction form with a single input register for instructions such as
+// XXPERMDI. The reason for defining this is that specifying multiple chained
+// operands (such as loads) to an instruction will perform both chained
+// operations rather than coalescing them into a single register - even though
+// the source memory location is the same. This simply forces the instruction
+// to use the same register for both inputs.
+// For example, an output DAG such as this:
+//   (XXPERMDI (LXSIBZX xoaddr:$src), (LXSIBZX xoaddr:$src), 0)
+// would result in two load instructions emitted and used as separate inputs
+// to the XXPERMDI instruction.
+class XX3Form_2s<bits<6> opcode, bits<5> xo, dag OOL, dag IOL, string asmstr,
+ InstrItinClass itin, list<dag> pattern>
+ : XX3Form_2<opcode, xo, OOL, IOL, asmstr, itin, pattern> {
+ let XB = XA;
+}
+
def HasVSX : Predicate<"PPCSubTarget->hasVSX()">;
def IsLittleEndian : Predicate<"PPCSubTarget->isLittleEndian()">;
def IsBigEndian : Predicate<"!PPCSubTarget->isLittleEndian()">;
+def HasOnlySwappingMemOps : Predicate<"!PPCSubTarget->hasP9Vector()">;
let Predicates = [HasVSX] in {
let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
+let UseVSXReg = 1 in {
let hasSideEffects = 0 in { // VSX instructions don't have side effects.
let Uses = [RM] in {
// Load indexed instructions
let mayLoad = 1 in {
+ let CodeSize = 3 in
def LXSDX : XX1Form<31, 588,
(outs vsfrc:$XT), (ins memrr:$src),
"lxsdx $XT, $src", IIC_LdStLFD,
[(set f64:$XT, (load xoaddr:$src))]>;
+ let Predicates = [HasVSX, HasOnlySwappingMemOps] in
def LXVD2X : XX1Form<31, 844,
(outs vsrc:$XT), (ins memrr:$src),
"lxvd2x $XT, $src", IIC_LdStLFD,
@@ -114,6 +134,7 @@ let Uses = [RM] in {
(outs vsrc:$XT), (ins memrr:$src),
"lxvdsx $XT, $src", IIC_LdStLFD, []>;
+ let Predicates = [HasVSX, HasOnlySwappingMemOps] in
def LXVW4X : XX1Form<31, 780,
(outs vsrc:$XT), (ins memrr:$src),
"lxvw4x $XT, $src", IIC_LdStLFD,
@@ -122,21 +143,25 @@ let Uses = [RM] in {
// Store indexed instructions
let mayStore = 1 in {
+ let CodeSize = 3 in
def STXSDX : XX1Form<31, 716,
(outs), (ins vsfrc:$XT, memrr:$dst),
"stxsdx $XT, $dst", IIC_LdStSTFD,
[(store f64:$XT, xoaddr:$dst)]>;
+ let Predicates = [HasVSX, HasOnlySwappingMemOps] in {
+ // The behaviour of this instruction is endianness-specific, so we provide no
+ // pattern to match it without considering endianness.
def STXVD2X : XX1Form<31, 972,
(outs), (ins vsrc:$XT, memrr:$dst),
"stxvd2x $XT, $dst", IIC_LdStSTFD,
- [(store v2f64:$XT, xoaddr:$dst)]>;
+ []>;
def STXVW4X : XX1Form<31, 908,
(outs), (ins vsrc:$XT, memrr:$dst),
"stxvw4x $XT, $dst", IIC_LdStSTFD,
[(store v4i32:$XT, xoaddr:$dst)]>;
-
+ }
} // mayStore
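A rough model of the endianness issue behind HasOnlySwappingMemOps and the comment above (sketch; "MS half" means vs[0:63] in the ISA's big-endian numbering):

  #include <cstdint>

  // lxvd2x always places the doubleword at EA in the MS half and the one
  // at EA+8 in the LS half; bytes within each doubleword follow processor
  // endianness. In LE element order the two halves therefore appear
  // swapped, which is why LE code pairs these with xxswapd, or uses the
  // P9 lxvx/stxvx instead.
  static void lxvd2xModel(const uint64_t Mem[2], uint64_t Reg[2]) {
    Reg[0] = Mem[0];  // EA     -> vs[0:63]   (MS half)
    Reg[1] = Mem[1];  // EA + 8 -> vs[64:127] (LS half)
  }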
// Add/Mul Instructions
@@ -545,18 +570,38 @@ let Uses = [RM] in {
(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvdpsxds $XT, $XB", IIC_VecFP,
[(set f64:$XT, (PPCfctidz f64:$XB))]>;
+ let isCodeGenOnly = 1 in
+ def XSCVDPSXDSs : XX2Form<60, 344,
+ (outs vssrc:$XT), (ins vssrc:$XB),
+ "xscvdpsxds $XT, $XB", IIC_VecFP,
+ [(set f32:$XT, (PPCfctidz f32:$XB))]>;
def XSCVDPSXWS : XX2Form<60, 88,
(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvdpsxws $XT, $XB", IIC_VecFP,
[(set f64:$XT, (PPCfctiwz f64:$XB))]>;
+ let isCodeGenOnly = 1 in
+ def XSCVDPSXWSs : XX2Form<60, 88,
+ (outs vssrc:$XT), (ins vssrc:$XB),
+ "xscvdpsxws $XT, $XB", IIC_VecFP,
+ [(set f32:$XT, (PPCfctiwz f32:$XB))]>;
def XSCVDPUXDS : XX2Form<60, 328,
(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvdpuxds $XT, $XB", IIC_VecFP,
[(set f64:$XT, (PPCfctiduz f64:$XB))]>;
+ let isCodeGenOnly = 1 in
+ def XSCVDPUXDSs : XX2Form<60, 328,
+ (outs vssrc:$XT), (ins vssrc:$XB),
+ "xscvdpuxds $XT, $XB", IIC_VecFP,
+ [(set f32:$XT, (PPCfctiduz f32:$XB))]>;
def XSCVDPUXWS : XX2Form<60, 72,
(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvdpuxws $XT, $XB", IIC_VecFP,
[(set f64:$XT, (PPCfctiwuz f64:$XB))]>;
+ let isCodeGenOnly = 1 in
+ def XSCVDPUXWSs : XX2Form<60, 72,
+ (outs vssrc:$XT), (ins vssrc:$XB),
+ "xscvdpuxws $XT, $XB", IIC_VecFP,
+ [(set f32:$XT, (PPCfctiwuz f32:$XB))]>;
def XSCVSPDP : XX2Form<60, 329,
(outs vsfrc:$XT), (ins vsfrc:$XB),
"xscvspdp $XT, $XB", IIC_VecFP, []>;
@@ -571,47 +616,55 @@ let Uses = [RM] in {
def XVCVDPSP : XX2Form<60, 393,
(outs vsrc:$XT), (ins vsrc:$XB),
- "xvcvdpsp $XT, $XB", IIC_VecFP, []>;
+ "xvcvdpsp $XT, $XB", IIC_VecFP,
+ [(set v4f32:$XT, (int_ppc_vsx_xvcvdpsp v2f64:$XB))]>;
def XVCVDPSXDS : XX2Form<60, 472,
(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvdpsxds $XT, $XB", IIC_VecFP,
[(set v2i64:$XT, (fp_to_sint v2f64:$XB))]>;
def XVCVDPSXWS : XX2Form<60, 216,
(outs vsrc:$XT), (ins vsrc:$XB),
- "xvcvdpsxws $XT, $XB", IIC_VecFP, []>;
+ "xvcvdpsxws $XT, $XB", IIC_VecFP,
+ [(set v4i32:$XT, (int_ppc_vsx_xvcvdpsxws v2f64:$XB))]>;
def XVCVDPUXDS : XX2Form<60, 456,
(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvdpuxds $XT, $XB", IIC_VecFP,
[(set v2i64:$XT, (fp_to_uint v2f64:$XB))]>;
def XVCVDPUXWS : XX2Form<60, 200,
(outs vsrc:$XT), (ins vsrc:$XB),
- "xvcvdpuxws $XT, $XB", IIC_VecFP, []>;
+ "xvcvdpuxws $XT, $XB", IIC_VecFP,
+ [(set v4i32:$XT, (int_ppc_vsx_xvcvdpuxws v2f64:$XB))]>;
def XVCVSPDP : XX2Form<60, 457,
(outs vsrc:$XT), (ins vsrc:$XB),
- "xvcvspdp $XT, $XB", IIC_VecFP, []>;
+ "xvcvspdp $XT, $XB", IIC_VecFP,
+ [(set v2f64:$XT, (int_ppc_vsx_xvcvspdp v4f32:$XB))]>;
def XVCVSPSXDS : XX2Form<60, 408,
(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvspsxds $XT, $XB", IIC_VecFP, []>;
def XVCVSPSXWS : XX2Form<60, 152,
(outs vsrc:$XT), (ins vsrc:$XB),
- "xvcvspsxws $XT, $XB", IIC_VecFP, []>;
+ "xvcvspsxws $XT, $XB", IIC_VecFP,
+ [(set v4i32:$XT, (fp_to_sint v4f32:$XB))]>;
def XVCVSPUXDS : XX2Form<60, 392,
(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvspuxds $XT, $XB", IIC_VecFP, []>;
def XVCVSPUXWS : XX2Form<60, 136,
(outs vsrc:$XT), (ins vsrc:$XB),
- "xvcvspuxws $XT, $XB", IIC_VecFP, []>;
+ "xvcvspuxws $XT, $XB", IIC_VecFP,
+ [(set v4i32:$XT, (fp_to_uint v4f32:$XB))]>;
def XVCVSXDDP : XX2Form<60, 504,
(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvsxddp $XT, $XB", IIC_VecFP,
[(set v2f64:$XT, (sint_to_fp v2i64:$XB))]>;
def XVCVSXDSP : XX2Form<60, 440,
(outs vsrc:$XT), (ins vsrc:$XB),
- "xvcvsxdsp $XT, $XB", IIC_VecFP, []>;
+ "xvcvsxdsp $XT, $XB", IIC_VecFP,
+ [(set v4f32:$XT, (int_ppc_vsx_xvcvsxdsp v2i64:$XB))]>;
def XVCVSXWDP : XX2Form<60, 248,
(outs vsrc:$XT), (ins vsrc:$XB),
- "xvcvsxwdp $XT, $XB", IIC_VecFP, []>;
+ "xvcvsxwdp $XT, $XB", IIC_VecFP,
+ [(set v2f64:$XT, (int_ppc_vsx_xvcvsxwdp v4i32:$XB))]>;
def XVCVSXWSP : XX2Form<60, 184,
(outs vsrc:$XT), (ins vsrc:$XB),
"xvcvsxwsp $XT, $XB", IIC_VecFP,
@@ -622,19 +675,22 @@ let Uses = [RM] in {
[(set v2f64:$XT, (uint_to_fp v2i64:$XB))]>;
def XVCVUXDSP : XX2Form<60, 424,
(outs vsrc:$XT), (ins vsrc:$XB),
- "xvcvuxdsp $XT, $XB", IIC_VecFP, []>;
+ "xvcvuxdsp $XT, $XB", IIC_VecFP,
+ [(set v4f32:$XT, (int_ppc_vsx_xvcvuxdsp v2i64:$XB))]>;
def XVCVUXWDP : XX2Form<60, 232,
(outs vsrc:$XT), (ins vsrc:$XB),
- "xvcvuxwdp $XT, $XB", IIC_VecFP, []>;
+ "xvcvuxwdp $XT, $XB", IIC_VecFP,
+ [(set v2f64:$XT, (int_ppc_vsx_xvcvuxwdp v4i32:$XB))]>;
def XVCVUXWSP : XX2Form<60, 168,
(outs vsrc:$XT), (ins vsrc:$XB),
- "xvcvuxwsp $XT, $XB", IIC_VecFP, []>;
+ "xvcvuxwsp $XT, $XB", IIC_VecFP,
+ [(set v4f32:$XT, (uint_to_fp v4i32:$XB))]>;
// Rounding Instructions
def XSRDPI : XX2Form<60, 73,
(outs vsfrc:$XT), (ins vsfrc:$XB),
"xsrdpi $XT, $XB", IIC_VecFP,
- [(set f64:$XT, (frnd f64:$XB))]>;
+ [(set f64:$XT, (fround f64:$XB))]>;
def XSRDPIC : XX2Form<60, 107,
(outs vsfrc:$XT), (ins vsfrc:$XB),
"xsrdpic $XT, $XB", IIC_VecFP,
@@ -655,7 +711,7 @@ let Uses = [RM] in {
def XVRDPI : XX2Form<60, 201,
(outs vsrc:$XT), (ins vsrc:$XB),
"xvrdpi $XT, $XB", IIC_VecFP,
- [(set v2f64:$XT, (frnd v2f64:$XB))]>;
+ [(set v2f64:$XT, (fround v2f64:$XB))]>;
def XVRDPIC : XX2Form<60, 235,
(outs vsrc:$XT), (ins vsrc:$XB),
"xvrdpic $XT, $XB", IIC_VecFP,
@@ -676,7 +732,7 @@ let Uses = [RM] in {
def XVRSPI : XX2Form<60, 137,
(outs vsrc:$XT), (ins vsrc:$XB),
"xvrspi $XT, $XB", IIC_VecFP,
- [(set v4f32:$XT, (frnd v4f32:$XB))]>;
+ [(set v4f32:$XT, (fround v4f32:$XB))]>;
def XVRSPIC : XX2Form<60, 171,
(outs vsrc:$XT), (ins vsrc:$XB),
"xvrspic $XT, $XB", IIC_VecFP,
@@ -761,6 +817,21 @@ let Uses = [RM] in {
"xxlxor $XT, $XA, $XB", IIC_VecGeneral,
[(set v4i32:$XT, (xor v4i32:$XA, v4i32:$XB))]>;
} // isCommutable
+ let isCodeGenOnly = 1 in
+ def XXLXORz : XX3Form_Zero<60, 154, (outs vsrc:$XT), (ins),
+ "xxlxor $XT, $XT, $XT", IIC_VecGeneral,
+ [(set v4i32:$XT, (v4i32 immAllZerosV))]>;
+
+ let isCodeGenOnly = 1 in {
+ def XXLXORdpz : XX3Form_SetZero<60, 154,
+ (outs vsfrc:$XT), (ins),
+ "xxlxor $XT, $XT, $XT", IIC_VecGeneral,
+ [(set f64:$XT, (fpimm0))]>;
+ def XXLXORspz : XX3Form_SetZero<60, 154,
+ (outs vssrc:$XT), (ins),
+ "xxlxor $XT, $XT, $XT", IIC_VecGeneral,
+ [(set f32:$XT, (fpimm0))]>;
+ }
// Permutation Instructions
def XXMRGHW : XX3Form<60, 18,
@@ -773,6 +844,9 @@ let Uses = [RM] in {
def XXPERMDI : XX3Form_2<60, 10,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB, u2imm:$DM),
"xxpermdi $XT, $XA, $XB, $DM", IIC_VecPerm, []>;
+ let isCodeGenOnly = 1 in
+ def XXPERMDIs : XX3Form_2s<60, 10, (outs vsrc:$XT), (ins vsfrc:$XA, u2imm:$DM),
+ "xxpermdi $XT, $XA, $XA, $DM", IIC_VecPerm, []>;
def XXSEL : XX4Form<60, 3,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB, vsrc:$XC),
"xxsel $XT, $XA, $XB, $XC", IIC_VecPerm, []>;
@@ -787,7 +861,12 @@ let Uses = [RM] in {
"xxspltw $XT, $XB, $UIM", IIC_VecPerm,
[(set v4i32:$XT,
(PPCxxsplt v4i32:$XB, imm32SExt16:$UIM))]>;
+ let isCodeGenOnly = 1 in
+ def XXSPLTWs : XX2Form_2<60, 164,
+ (outs vsrc:$XT), (ins vfrc:$XB, u2imm:$UIM),
+ "xxspltw $XT, $XB, $UIM", IIC_VecPerm, []>;
} // hasSideEffects
+} // UseVSXReg = 1
// SELECT_CC_* - Used to implement the SELECT_CC DAG operation. Expanded after
// instruction selection into a branch sequence.
@@ -839,9 +918,17 @@ def : InstAlias<"xxmrgld $XT, $XA, $XB",
(XXPERMDI vsrc:$XT, vsrc:$XA, vsrc:$XB, 3)>;
def : InstAlias<"xxswapd $XT, $XB",
(XXPERMDI vsrc:$XT, vsrc:$XB, vsrc:$XB, 2)>;
+def : InstAlias<"xxspltd $XT, $XB, 0",
+ (XXPERMDIs vsrc:$XT, vsfrc:$XB, 0)>;
+def : InstAlias<"xxspltd $XT, $XB, 1",
+ (XXPERMDIs vsrc:$XT, vsfrc:$XB, 3)>;
+def : InstAlias<"xxswapd $XT, $XB",
+ (XXPERMDIs vsrc:$XT, vsfrc:$XB, 2)>;
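The three aliases above rely on XXPERMDI's 2-bit DM field; a scalar model of its selection (sketch; doublewords in the ISA's big-endian order, DM's high bit selects from XA, low bit from XB):

  #include <cstdint>

  static void xxpermdiModel(const uint64_t A[2], const uint64_t B[2],
                            unsigned DM, uint64_t T[2]) {
    T[0] = A[(DM >> 1) & 1];  // DM=0: splat dword 0 (xxspltd ..., 0)
    T[1] = B[DM & 1];         // DM=3: splat dword 1; DM=2 with A==B: swap
  }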
let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
+def : Pat<(v4i32 (vnot_ppc v4i32:$A)),
+ (v4i32 (XXLNOR $A, $A))>;
let Predicates = [IsBigEndian] in {
def : Pat<(v2f64 (scalar_to_vector f64:$A)),
(v2f64 (SUBREG_TO_REG (i64 1), $A, sub_64))>;
@@ -948,18 +1035,27 @@ def : Pat<(v2f64 (PPCuvec2fp v4i32:$C, 1)),
(v2f64 (XVCVUXWDP (v2i64 (XXMRGLW $C, $C))))>;
// Loads.
-def : Pat<(v2f64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;
-def : Pat<(v2i64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;
-def : Pat<(v4i32 (load xoaddr:$src)), (LXVW4X xoaddr:$src)>;
-def : Pat<(v2f64 (PPClxvd2x xoaddr:$src)), (LXVD2X xoaddr:$src)>;
-
-// Stores.
-def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),
- (STXVD2X $rS, xoaddr:$dst)>;
-def : Pat<(store v2i64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;
-def : Pat<(int_ppc_vsx_stxvw4x v4i32:$rS, xoaddr:$dst),
- (STXVW4X $rS, xoaddr:$dst)>;
-def : Pat<(PPCstxvd2x v2f64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;
+let Predicates = [HasVSX, HasOnlySwappingMemOps] in {
+ def : Pat<(v2f64 (PPClxvd2x xoaddr:$src)), (LXVD2X xoaddr:$src)>;
+
+ // Stores.
+ def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),
+ (STXVD2X $rS, xoaddr:$dst)>;
+ def : Pat<(int_ppc_vsx_stxvw4x v4i32:$rS, xoaddr:$dst),
+ (STXVW4X $rS, xoaddr:$dst)>;
+ def : Pat<(int_ppc_vsx_stxvd2x_be v2f64:$rS, xoaddr:$dst),
+ (STXVD2X $rS, xoaddr:$dst)>;
+ def : Pat<(int_ppc_vsx_stxvw4x_be v4i32:$rS, xoaddr:$dst),
+ (STXVW4X $rS, xoaddr:$dst)>;
+ def : Pat<(PPCstxvd2x v2f64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;
+}
+let Predicates = [IsBigEndian, HasVSX, HasOnlySwappingMemOps] in {
+ def : Pat<(v2f64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;
+ def : Pat<(v2i64 (load xoaddr:$src)), (LXVD2X xoaddr:$src)>;
+ def : Pat<(v4i32 (load xoaddr:$src)), (LXVW4X xoaddr:$src)>;
+ def : Pat<(store v2f64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;
+ def : Pat<(store v2i64:$rS, xoaddr:$dst), (STXVD2X $rS, xoaddr:$dst)>;
+}
// Permutes.
def : Pat<(v2f64 (PPCxxswapd v2f64:$src)), (XXPERMDI $src, $src, 2)>;
@@ -1054,6 +1150,22 @@ def : Pat<(f64 (PPCfcfidu (PPCmtvsra (i64 (vector_extract v2i64:$S, 1))))),
} // AddedComplexity
} // HasVSX
+def ScalarLoads {
+ dag Li8 = (i32 (extloadi8 xoaddr:$src));
+ dag ZELi8 = (i32 (zextloadi8 xoaddr:$src));
+ dag ZELi8i64 = (i64 (zextloadi8 xoaddr:$src));
+ dag SELi8 = (i32 (sext_inreg (extloadi8 xoaddr:$src), i8));
+ dag SELi8i64 = (i64 (sext_inreg (extloadi8 xoaddr:$src), i8));
+
+ dag Li16 = (i32 (extloadi16 xoaddr:$src));
+ dag ZELi16 = (i32 (zextloadi16 xoaddr:$src));
+ dag ZELi16i64 = (i64 (zextloadi16 xoaddr:$src));
+ dag SELi16 = (i32 (sextloadi16 xoaddr:$src));
+ dag SELi16i64 = (i64 (sextloadi16 xoaddr:$src));
+
+ dag Li32 = (i32 (load xoaddr:$src));
+}
+
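The dag shorthands collected in ScalarLoads correspond to ordinary load/extend combinations; two of them restated as scalar C++ (sketch):

  #include <cstdint>

  // ScalarLoads.ZELi8: zero-extending byte load to i32.
  static uint32_t zELi8(const uint8_t *P) { return *P; }
  // ScalarLoads.SELi8: any-extending byte load, then sign-extend in register.
  static int32_t sELi8(const uint8_t *P) {
    return static_cast<int8_t>(*P);  // sext_inreg(extloadi8, i8)
  }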
// The following VSX instructions were introduced in Power ISA 2.07
/* FIXME: if the operands are v2i64, these patterns will not match.
we should define new patterns or otherwise match the same patterns
@@ -1063,7 +1175,7 @@ def HasP8Vector : Predicate<"PPCSubTarget->hasP8Vector()">;
def HasDirectMove : Predicate<"PPCSubTarget->hasDirectMove()">;
let Predicates = [HasP8Vector] in {
let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
- let isCommutable = 1 in {
+ let isCommutable = 1, UseVSXReg = 1 in {
def XXLEQV : XX3Form<60, 186,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
"xxleqv $XT, $XA, $XB", IIC_VecGeneral,
@@ -1073,11 +1185,12 @@ let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
"xxlnand $XT, $XA, $XB", IIC_VecGeneral,
[(set v4i32:$XT, (vnot_ppc (and v4i32:$XA,
v4i32:$XB)))]>;
- } // isCommutable
+ } // isCommutable, UseVSXReg
def : Pat<(int_ppc_vsx_xxleqv v4i32:$A, v4i32:$B),
(XXLEQV $A, $B)>;
+ let UseVSXReg = 1 in {
def XXLORC : XX3Form<60, 170,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
"xxlorc $XT, $XA, $XB", IIC_VecGeneral,
@@ -1085,6 +1198,7 @@ let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
// VSX scalar loads introduced in ISA 2.07
let mayLoad = 1 in {
+ let CodeSize = 3 in
def LXSSPX : XX1Form<31, 524, (outs vssrc:$XT), (ins memrr:$src),
"lxsspx $XT, $src", IIC_LdStLFD,
[(set f32:$XT, (load xoaddr:$src))]>;
@@ -1098,6 +1212,7 @@ let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
// VSX scalar stores introduced in ISA 2.07
let mayStore = 1 in {
+ let CodeSize = 3 in
def STXSSPX : XX1Form<31, 652, (outs), (ins vssrc:$XT, memrr:$dst),
"stxsspx $XT, $dst", IIC_LdStSTFD,
[(store f32:$XT, xoaddr:$dst)]>;
@@ -1105,10 +1220,13 @@ let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
"stxsiwx $XT, $dst", IIC_LdStSTFD,
[(PPCstfiwx f64:$XT, xoaddr:$dst)]>;
} // mayStore
+ } // UseVSXReg = 1
def : Pat<(f64 (extloadf32 xoaddr:$src)),
(COPY_TO_REGCLASS (LXSSPX xoaddr:$src), VSFRC)>;
- def : Pat<(f64 (fextend f32:$src)),
+ def : Pat<(f32 (fpround (extloadf32 xoaddr:$src))),
+ (f32 (LXSSPX xoaddr:$src))>;
+ def : Pat<(f64 (fpextend f32:$src)),
(COPY_TO_REGCLASS $src, VSFRC)>;
def : Pat<(f32 (selectcc i1:$lhs, i1:$rhs, f32:$tval, f32:$fval, SETLT)),
@@ -1132,6 +1250,7 @@ let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
def : Pat<(f32 (selectcc i1:$lhs, i1:$rhs, f32:$tval, f32:$fval, SETNE)),
(SELECT_VSSRC (CRXOR $lhs, $rhs), $tval, $fval)>;
+ let UseVSXReg = 1 in {
// VSX Elementary Scalar FP arithmetic (SP)
let isCommutable = 1 in {
def XSADDSP : XX3Form<60, 0,
@@ -1256,6 +1375,7 @@ let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
"xscvdpspn $XT, $XB", IIC_VecFP, []>;
def XSCVSPDPN : XX2Form<60, 331, (outs vssrc:$XT), (ins vsrc:$XB),
"xscvspdpn $XT, $XB", IIC_VecFP, []>;
+ } // UseVSXReg = 1
let Predicates = [IsLittleEndian] in {
def : Pat<(f32 (PPCfcfids (PPCmtvsra (i64 (vector_extract v2i64:$S, 0))))),
@@ -1278,9 +1398,12 @@ let AddedComplexity = 400 in { // Prefer VSX patterns over non-VSX patterns.
def : Pat<(f32 (PPCfcfidus (PPCmtvsra (i64 (vector_extract v2i64:$S, 1))))),
(f32 (XSCVUXDSP (COPY_TO_REGCLASS (XXPERMDI $S, $S, 2), VSFRC)))>;
}
+ def : Pat<(v4i32 (scalar_to_vector ScalarLoads.Li32)),
+ (v4i32 (XXSPLTWs (LXSIWAX xoaddr:$src), 1))>;
} // AddedComplexity = 400
} // HasP8Vector
+let UseVSXReg = 1, AddedComplexity = 400 in {
let Predicates = [HasDirectMove] in {
// VSX direct move instructions
def MFVSRD : XX1_RS6_RD5_XO<31, 51, (outs g8rc:$rA), (ins vsfrc:$XT),
@@ -1304,8 +1427,7 @@ let Predicates = [HasDirectMove] in {
let Predicates = [IsISA3_0, HasDirectMove] in {
def MTVSRWS: XX1_RS6_RD5_XO<31, 403, (outs vsrc:$XT), (ins gprc:$rA),
- "mtvsrws $XT, $rA", IIC_VecGeneral,
- []>;
+ "mtvsrws $XT, $rA", IIC_VecGeneral, []>;
def MTVSRDD: XX1Form<31, 435, (outs vsrc:$XT), (ins g8rc:$rA, g8rc:$rB),
"mtvsrdd $XT, $rA, $rB", IIC_VecGeneral,
@@ -1316,6 +1438,7 @@ let Predicates = [IsISA3_0, HasDirectMove] in {
[]>, Requires<[In64BitMode]>;
} // IsISA3_0, HasDirectMove
+} // UseVSXReg = 1
/* Direct moves of various widths from GPRs into VSRs. Each move lines
the value up into element 0 (both BE and LE). Namely, entities smaller than
@@ -1626,6 +1749,7 @@ def VectorExtractions {
dag BE_VARIABLE_DOUBLE = (COPY_TO_REGCLASS BE_VDOUBLE_PERMUTE, VSRC);
}
+let AddedComplexity = 400 in {
// v4f32 scalar <-> vector conversions (BE)
let Predicates = [IsBigEndian, HasP8Vector] in {
def : Pat<(v4f32 (scalar_to_vector f32:$A)),
@@ -1754,6 +1878,9 @@ let Predicates = [IsLittleEndian, HasVSX] in
def : Pat<(f64 (vector_extract v2f64:$S, i64:$Idx)),
(f64 VectorExtractions.LE_VARIABLE_DOUBLE)>;
+ def : Pat<(v4i32 (int_ppc_vsx_lxvw4x_be xoaddr:$src)), (LXVW4X xoaddr:$src)>;
+ def : Pat<(v2f64 (int_ppc_vsx_lxvd2x_be xoaddr:$src)), (LXVD2X xoaddr:$src)>;
+
let Predicates = [IsLittleEndian, HasDirectMove] in {
// v16i8 scalar <-> vector conversions (LE)
def : Pat<(v16i8 (scalar_to_vector i32:$A)),
@@ -1864,6 +1991,11 @@ def : Pat<(f64 (bitconvert i64:$S)),
(f64 (MTVSRD $S))>;
}
+// Materialize a zero-vector of long long
+def : Pat<(v2i64 immAllZerosV),
+ (v2i64 (XXLXORz))>;
+}
+
def AlignValues {
dag F32_TO_BE_WORD1 = (v4f32 (XXSLDWI (XSCVDPSPN $B), (XSCVDPSPN $B), 3));
dag I32_TO_BE_WORD1 = (COPY_TO_REGCLASS (MTVSRWZ $B), VSRC);
@@ -1891,6 +2023,7 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
: X_RD5_XO5_RS5<opcode, xo2, xo, (outs vrrc:$vT), (ins vbtype:$vB),
!strconcat(opc, " $vT, $vB"), IIC_VecFP, pattern>;
+ let UseVSXReg = 1 in {
// [PO T XO B XO BX /]
class XX2_RT5_XO5_XB6<bits<6> opcode, bits<5> xo2, bits<9> xo, string opc,
list<dag> pattern>
@@ -1909,6 +2042,7 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
InstrItinClass itin, list<dag> pattern>
: XX3Form<opcode, xo, (outs xty:$XT), (ins aty:$XA, bty:$XB),
!strconcat(opc, " $XT, $XA, $XB"), itin, pattern>;
+ } // UseVSXReg = 1
// [PO VRT VRA VRB XO /]
class X_VT5_VA5_VB5<bits<6> opcode, bits<10> xo, string opc,
@@ -1977,7 +2111,8 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
// DP/QP Compare Exponents
def XSCMPEXPDP : XX3Form_1<60, 59,
(outs crrc:$crD), (ins vsfrc:$XA, vsfrc:$XB),
- "xscmpexpdp $crD, $XA, $XB", IIC_FPCompare, []>;
+ "xscmpexpdp $crD, $XA, $XB", IIC_FPCompare, []>,
+ UseVSXReg;
def XSCMPEXPQP : X_BF3_VA5_VB5<63, 164, "xscmpexpqp", []>;
// DP Compare ==, >=, >, !=
@@ -1991,6 +2126,7 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
IIC_FPCompare, []>;
def XSCMPNEDP : XX3_XT5_XA5_XB5<60, 27, "xscmpnedp", vsrc, vsfrc, vsfrc,
IIC_FPCompare, []>;
+ let UseVSXReg = 1 in {
// Vector Compare Not Equal
def XVCMPNEDP : XX3Form_Rc<60, 123,
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
@@ -2008,12 +2144,13 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
(outs vsrc:$XT), (ins vsrc:$XA, vsrc:$XB),
"xvcmpnesp. $XT, $XA, $XB", IIC_VecFPCompare, []>,
isDOT;
+ } // UseVSXReg = 1
//===--------------------------------------------------------------------===//
// Quad-Precision Floating-Point Conversion Instructions:
// Convert DP -> QP
- def XSCVDPQP : X_VT5_XO5_VB5_TyVB<63, 22, 836, "xscvdpqp", vsfrc, []>;
+ def XSCVDPQP : X_VT5_XO5_VB5_TyVB<63, 22, 836, "xscvdpqp", vfrc, []>;
// Round & Convert QP -> DP (dword[1] is set to zero)
def XSCVQPDP : X_VT5_XO5_VB5 <63, 20, 836, "xscvqpdp" , []>;
@@ -2026,9 +2163,10 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
def XSCVQPUWZ : X_VT5_XO5_VB5<63, 1, 836, "xscvqpuwz", []>;
// Convert (Un)Signed DWord -> QP
- def XSCVSDQP : X_VT5_XO5_VB5_TyVB<63, 10, 836, "xscvsdqp", vsfrc, []>;
- def XSCVUDQP : X_VT5_XO5_VB5_TyVB<63, 2, 836, "xscvudqp", vsfrc, []>;
+ def XSCVSDQP : X_VT5_XO5_VB5_TyVB<63, 10, 836, "xscvsdqp", vfrc, []>;
+ def XSCVUDQP : X_VT5_XO5_VB5_TyVB<63, 2, 836, "xscvudqp", vfrc, []>;
+ let UseVSXReg = 1 in {
//===--------------------------------------------------------------------===//
// Round to Floating-Point Integer Instructions
@@ -2041,7 +2179,17 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
// Vector HP -> SP
def XVCVHPSP : XX2_XT6_XO5_XB6<60, 24, 475, "xvcvhpsp", vsrc, []>;
- def XVCVSPHP : XX2_XT6_XO5_XB6<60, 25, 475, "xvcvsphp", vsrc, []>;
+ def XVCVSPHP : XX2_XT6_XO5_XB6<60, 25, 475, "xvcvsphp", vsrc,
+ [(set v4f32:$XT,
+ (int_ppc_vsx_xvcvsphp v4f32:$XB))]>;
+
+ } // UseVSXReg = 1
+
+ // Pattern for matching Vector HP -> Vector SP intrinsic. Defined as a
+ // separate pattern so that it can convert the input register class from
+ // VRRC(v8i16) to VSRC.
+ def : Pat<(v4f32 (int_ppc_vsx_xvcvhpsp v8i16:$A)),
+ (v4f32 (XVCVHPSP (COPY_TO_REGCLASS $A, VSRC)))>;
class Z23_VT5_R1_VB5_RMC2_EX1<bits<6> opcode, bits<8> xo, bit ex, string opc,
list<dag> pattern>
@@ -2064,7 +2212,7 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
// Insert Exponent DP/QP
// XT NOTE: XT.dword[1] = 0xUUUU_UUUU_UUUU_UUUU
def XSIEXPDP : XX1Form <60, 918, (outs vsrc:$XT), (ins g8rc:$rA, g8rc:$rB),
- "xsiexpdp $XT, $rA, $rB", IIC_VecFP, []>;
+ "xsiexpdp $XT, $rA, $rB", IIC_VecFP, []>, UseVSXReg;
// vB NOTE: only vB.dword[0] is used, that's why we don't use
// X_VT5_VA5_VB5 form
def XSIEXPQP : XForm_18<63, 868, (outs vrrc:$vT), (ins vrrc:$vA, vsfrc:$vB),
@@ -2073,10 +2221,12 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
// Extract Exponent/Significand DP/QP
def XSXEXPDP : XX2_RT5_XO5_XB6<60, 0, 347, "xsxexpdp", []>;
def XSXSIGDP : XX2_RT5_XO5_XB6<60, 1, 347, "xsxsigdp", []>;
+
def XSXEXPQP : X_VT5_XO5_VB5 <63, 2, 804, "xsxexpqp", []>;
def XSXSIGQP : X_VT5_XO5_VB5 <63, 18, 804, "xsxsigqp", []>;
// Vector Insert Word
+ let UseVSXReg = 1 in {
// XB NOTE: Only XB.dword[1] is used, but we use vsrc on XB.
def XXINSERTW :
XX2_RD6_UIM5_RS6<60, 181, (outs vsrc:$XT),
@@ -2090,39 +2240,64 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
def XXEXTRACTUW : XX2_RD6_UIM5_RS6<60, 165,
(outs vsfrc:$XT), (ins vsrc:$XB, u4imm:$UIMM),
"xxextractuw $XT, $XB, $UIMM", IIC_VecFP, []>;
+ } // UseVSXReg = 1
// Vector Insert Exponent DP/SP
def XVIEXPDP : XX3_XT5_XA5_XB5<60, 248, "xviexpdp", vsrc, vsrc, vsrc,
- IIC_VecFP, []>;
+ IIC_VecFP, [(set v2f64:$XT, (int_ppc_vsx_xviexpdp v2i64:$XA, v2i64:$XB))]>;
def XVIEXPSP : XX3_XT5_XA5_XB5<60, 216, "xviexpsp", vsrc, vsrc, vsrc,
- IIC_VecFP, []>;
+ IIC_VecFP, [(set v4f32:$XT, (int_ppc_vsx_xviexpsp v4i32:$XA, v4i32:$XB))]>;
// Vector Extract Exponent/Significand DP/SP
- def XVXEXPDP : XX2_XT6_XO5_XB6<60, 0, 475, "xvxexpdp", vsrc, []>;
- def XVXEXPSP : XX2_XT6_XO5_XB6<60, 8, 475, "xvxexpsp", vsrc, []>;
- def XVXSIGDP : XX2_XT6_XO5_XB6<60, 1, 475, "xvxsigdp", vsrc, []>;
- def XVXSIGSP : XX2_XT6_XO5_XB6<60, 9, 475, "xvxsigsp", vsrc, []>;
+ def XVXEXPDP : XX2_XT6_XO5_XB6<60, 0, 475, "xvxexpdp", vsrc,
+ [(set v2i64:$XT,
+ (int_ppc_vsx_xvxexpdp v2f64:$XB))]>;
+ def XVXEXPSP : XX2_XT6_XO5_XB6<60, 8, 475, "xvxexpsp", vsrc,
+ [(set v4i32:$XT,
+ (int_ppc_vsx_xvxexpsp v4f32:$XB))]>;
+ def XVXSIGDP : XX2_XT6_XO5_XB6<60, 1, 475, "xvxsigdp", vsrc,
+ [(set v2i64:$XT,
+ (int_ppc_vsx_xvxsigdp v2f64:$XB))]>;
+ def XVXSIGSP : XX2_XT6_XO5_XB6<60, 9, 475, "xvxsigsp", vsrc,
+ [(set v4i32:$XT,
+ (int_ppc_vsx_xvxsigsp v4f32:$XB))]>;
+
+ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
+ // Extra patterns expanding to vector Extract Word/Insert Word
+ def : Pat<(v4i32 (int_ppc_vsx_xxinsertw v4i32:$A, v2i64:$B, imm:$IMM)),
+ (v4i32 (XXINSERTW $A, $B, imm:$IMM))>;
+ def : Pat<(v2i64 (int_ppc_vsx_xxextractuw v2i64:$A, imm:$IMM)),
+ (v2i64 (COPY_TO_REGCLASS (XXEXTRACTUW $A, imm:$IMM), VSRC))>;
+ } // AddedComplexity = 400, HasP9Vector
//===--------------------------------------------------------------------===//
// Test Data Class SP/DP/QP
+ let UseVSXReg = 1 in {
def XSTSTDCSP : XX2_BF3_DCMX7_RS6<60, 298,
(outs crrc:$BF), (ins u7imm:$DCMX, vsfrc:$XB),
"xststdcsp $BF, $XB, $DCMX", IIC_VecFP, []>;
def XSTSTDCDP : XX2_BF3_DCMX7_RS6<60, 362,
(outs crrc:$BF), (ins u7imm:$DCMX, vsfrc:$XB),
"xststdcdp $BF, $XB, $DCMX", IIC_VecFP, []>;
+ } // UseVSXReg = 1
def XSTSTDCQP : X_BF3_DCMX7_RS5 <63, 708,
(outs crrc:$BF), (ins u7imm:$DCMX, vrrc:$vB),
"xststdcqp $BF, $vB, $DCMX", IIC_VecFP, []>;
// Vector Test Data Class SP/DP
+ let UseVSXReg = 1 in {
def XVTSTDCSP : XX2_RD6_DCMX7_RS6<60, 13, 5,
(outs vsrc:$XT), (ins u7imm:$DCMX, vsrc:$XB),
- "xvtstdcsp $XT, $XB, $DCMX", IIC_VecFP, []>;
+ "xvtstdcsp $XT, $XB, $DCMX", IIC_VecFP,
+ [(set v4i32: $XT,
+ (int_ppc_vsx_xvtstdcsp v4f32:$XB, imm:$DCMX))]>;
def XVTSTDCDP : XX2_RD6_DCMX7_RS6<60, 15, 5,
(outs vsrc:$XT), (ins u7imm:$DCMX, vsrc:$XB),
- "xvtstdcdp $XT, $XB, $DCMX", IIC_VecFP, []>;
+ "xvtstdcdp $XT, $XB, $DCMX", IIC_VecFP,
+ [(set v2i64: $XT,
+ (int_ppc_vsx_xvtstdcdp v2f64:$XB, imm:$DCMX))]>;
+ } // UseVSXReg = 1
//===--------------------------------------------------------------------===//
@@ -2153,20 +2328,22 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
// Vector Splat Immediate Byte
def XXSPLTIB : X_RD6_IMM8<60, 360, (outs vsrc:$XT), (ins u8imm:$IMM8),
- "xxspltib $XT, $IMM8", IIC_VecPerm, []>;
+ "xxspltib $XT, $IMM8", IIC_VecPerm, []>, UseVSXReg;
//===--------------------------------------------------------------------===//
// Vector/Scalar Load/Store Instructions
+ // When adding new D-Form loads/stores, be sure to update the ImmToIdxMap in
+ // PPCRegisterInfo::PPCRegisterInfo and maybe save yourself some debugging.
let mayLoad = 1 in {
// Load Vector
def LXV : DQ_RD6_RS5_DQ12<61, 1, (outs vsrc:$XT), (ins memrix16:$src),
- "lxv $XT, $src", IIC_LdStLFD, []>;
+ "lxv $XT, $src", IIC_LdStLFD, []>, UseVSXReg;
// Load DWord
- def LXSD : DSForm_1<57, 2, (outs vrrc:$vD), (ins memrix:$src),
+ def LXSD : DSForm_1<57, 2, (outs vfrc:$vD), (ins memrix:$src),
"lxsd $vD, $src", IIC_LdStLFD, []>;
// Load SP from src, convert it to DP, and place in dword[0]
- def LXSSP : DSForm_1<57, 3, (outs vrrc:$vD), (ins memrix:$src),
+ def LXSSP : DSForm_1<57, 3, (outs vfrc:$vD), (ins memrix:$src),
"lxssp $vD, $src", IIC_LdStLFD, []>;
// [PO T RA RB XO TX] almost equal to [PO S RA RB XO SX], but has different
@@ -2174,59 +2351,83 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
class X_XT6_RA5_RB5<bits<6> opcode, bits<10> xo, string opc,
RegisterOperand vtype, list<dag> pattern>
: XX1Form<opcode, xo, (outs vtype:$XT), (ins memrr:$src),
- !strconcat(opc, " $XT, $src"), IIC_LdStLFD, pattern>;
+ !strconcat(opc, " $XT, $src"), IIC_LdStLFD, pattern>, UseVSXReg;
// Load as Integer Byte/Halfword & Zero Indexed
- def LXSIBZX : X_XT6_RA5_RB5<31, 781, "lxsibzx", vsfrc, []>;
- def LXSIHZX : X_XT6_RA5_RB5<31, 813, "lxsihzx", vsfrc, []>;
+ def LXSIBZX : X_XT6_RA5_RB5<31, 781, "lxsibzx", vsfrc,
+ [(set f64:$XT, (PPClxsizx xoaddr:$src, 1))]>;
+ def LXSIHZX : X_XT6_RA5_RB5<31, 813, "lxsihzx", vsfrc,
+ [(set f64:$XT, (PPClxsizx xoaddr:$src, 2))]>;
// Load Vector Halfword*8/Byte*16 Indexed
def LXVH8X : X_XT6_RA5_RB5<31, 812, "lxvh8x" , vsrc, []>;
def LXVB16X : X_XT6_RA5_RB5<31, 876, "lxvb16x", vsrc, []>;
// Load Vector Indexed
- def LXVX : X_XT6_RA5_RB5<31, 268, "lxvx" , vsrc, []>;
+ def LXVX : X_XT6_RA5_RB5<31, 268, "lxvx" , vsrc,
+ [(set v2f64:$XT, (load xoaddr:$src))]>;
// Load Vector (Left-justified) with Length
- def LXVL : X_XT6_RA5_RB5<31, 269, "lxvl" , vsrc, []>;
- def LXVLL : X_XT6_RA5_RB5<31, 301, "lxvll" , vsrc, []>;
+ def LXVL : XX1Form<31, 269, (outs vsrc:$XT), (ins memr:$src, g8rc:$rB),
+ "lxvl $XT, $src, $rB", IIC_LdStLoad,
+ [(set v4i32:$XT, (int_ppc_vsx_lxvl addr:$src, i64:$rB))]>,
+ UseVSXReg;
+ def LXVLL : XX1Form<31,301, (outs vsrc:$XT), (ins memr:$src, g8rc:$rB),
+ "lxvll $XT, $src, $rB", IIC_LdStLoad,
+ [(set v4i32:$XT, (int_ppc_vsx_lxvll addr:$src, i64:$rB))]>,
+ UseVSXReg;
// Load Vector Word & Splat Indexed
def LXVWSX : X_XT6_RA5_RB5<31, 364, "lxvwsx" , vsrc, []>;
- } // end mayLoad
+ } // mayLoad
+ // When adding new D-Form loads/stores, be sure to update the ImmToIdxMap in
+ // PPCRegisterInfo::PPCRegisterInfo and maybe save yourself some debugging.
let mayStore = 1 in {
// Store Vector
def STXV : DQ_RD6_RS5_DQ12<61, 5, (outs), (ins vsrc:$XT, memrix16:$dst),
- "stxv $XT, $dst", IIC_LdStSTFD, []>;
+ "stxv $XT, $dst", IIC_LdStSTFD, []>, UseVSXReg;
// Store DWord
- def STXSD : DSForm_1<61, 2, (outs), (ins vrrc:$vS, memrix:$dst),
+ def STXSD : DSForm_1<61, 2, (outs), (ins vfrc:$vS, memrix:$dst),
"stxsd $vS, $dst", IIC_LdStSTFD, []>;
// Convert DP of dword[0] to SP, and Store to dst
- def STXSSP : DSForm_1<61, 3, (outs), (ins vrrc:$vS, memrix:$dst),
+ def STXSSP : DSForm_1<61, 3, (outs), (ins vfrc:$vS, memrix:$dst),
"stxssp $vS, $dst", IIC_LdStSTFD, []>;
// [PO S RA RB XO SX]
class X_XS6_RA5_RB5<bits<6> opcode, bits<10> xo, string opc,
RegisterOperand vtype, list<dag> pattern>
: XX1Form<opcode, xo, (outs), (ins vtype:$XT, memrr:$dst),
- !strconcat(opc, " $XT, $dst"), IIC_LdStSTFD, pattern>;
+ !strconcat(opc, " $XT, $dst"), IIC_LdStSTFD, pattern>, UseVSXReg;
// Store as Integer Byte/Halfword Indexed
- def STXSIBX : X_XS6_RA5_RB5<31, 909, "stxsibx" , vsfrc, []>;
- def STXSIHX : X_XS6_RA5_RB5<31, 941, "stxsihx" , vsfrc, []>;
+ def STXSIBX : X_XS6_RA5_RB5<31, 909, "stxsibx" , vsfrc,
+ [(PPCstxsix f64:$XT, xoaddr:$dst, 1)]>;
+ def STXSIHX : X_XS6_RA5_RB5<31, 941, "stxsihx" , vsfrc,
+ [(PPCstxsix f64:$XT, xoaddr:$dst, 2)]>;
+ let isCodeGenOnly = 1 in {
+ def STXSIBXv : X_XS6_RA5_RB5<31, 909, "stxsibx" , vrrc, []>;
+ def STXSIHXv : X_XS6_RA5_RB5<31, 941, "stxsihx" , vrrc, []>;
+ }
// Store Vector Halfword*8/Byte*16 Indexed
def STXVH8X : X_XS6_RA5_RB5<31, 940, "stxvh8x" , vsrc, []>;
def STXVB16X : X_XS6_RA5_RB5<31, 1004, "stxvb16x", vsrc, []>;
// Store Vector Indexed
- def STXVX : X_XS6_RA5_RB5<31, 396, "stxvx" , vsrc, []>;
+ def STXVX : X_XS6_RA5_RB5<31, 396, "stxvx" , vsrc,
+ [(store v2f64:$XT, xoaddr:$dst)]>;
// Store Vector (Left-justified) with Length
- def STXVL : X_XS6_RA5_RB5<31, 397, "stxvl" , vsrc, []>;
- def STXVLL : X_XS6_RA5_RB5<31, 429, "stxvll" , vsrc, []>;
- } // end mayStore
+ def STXVL : XX1Form<31, 397, (outs), (ins vsrc:$XT, memr:$dst, g8rc:$rB),
+ "stxvl $XT, $dst, $rB", IIC_LdStLoad,
+ [(int_ppc_vsx_stxvl v4i32:$XT, addr:$dst, i64:$rB)]>,
+ UseVSXReg;
+ def STXVLL : XX1Form<31, 429, (outs), (ins vsrc:$XT, memr:$dst, g8rc:$rB),
+ "stxvll $XT, $dst, $rB", IIC_LdStLoad,
+ [(int_ppc_vsx_stxvll v4i32:$XT, addr:$dst, i64:$rB)]>,
+ UseVSXReg;
+ } // mayStore
// Patterns for which instructions from ISA 3.0 are a better match
let Predicates = [IsLittleEndian, HasP9Vector] in {
@@ -2282,4 +2483,442 @@ let AddedComplexity = 400, Predicates = [HasP9Vector] in {
def : Pat<(v4f32 (insertelt v4f32:$A, f32:$B, 3)),
(v4f32 (XXINSERTW v4f32:$A, AlignValues.F32_TO_BE_WORD1, 12))>;
} // IsLittleEndian, HasP9Vector
+
+ def : Pat<(v2f64 (load xoaddr:$src)), (LXVX xoaddr:$src)>;
+ def : Pat<(v2i64 (load xoaddr:$src)), (LXVX xoaddr:$src)>;
+ def : Pat<(v4f32 (load xoaddr:$src)), (LXVX xoaddr:$src)>;
+ def : Pat<(v4i32 (load xoaddr:$src)), (LXVX xoaddr:$src)>;
+ def : Pat<(v4i32 (int_ppc_vsx_lxvw4x xoaddr:$src)), (LXVX xoaddr:$src)>;
+ def : Pat<(v2f64 (int_ppc_vsx_lxvd2x xoaddr:$src)), (LXVX xoaddr:$src)>;
+ def : Pat<(store v2f64:$rS, xoaddr:$dst), (STXVX $rS, xoaddr:$dst)>;
+ def : Pat<(store v2i64:$rS, xoaddr:$dst), (STXVX $rS, xoaddr:$dst)>;
+ def : Pat<(store v4f32:$rS, xoaddr:$dst), (STXVX $rS, xoaddr:$dst)>;
+ def : Pat<(store v4i32:$rS, xoaddr:$dst), (STXVX $rS, xoaddr:$dst)>;
+ def : Pat<(int_ppc_vsx_stxvw4x v4i32:$rS, xoaddr:$dst),
+ (STXVX $rS, xoaddr:$dst)>;
+ def : Pat<(int_ppc_vsx_stxvd2x v2f64:$rS, xoaddr:$dst),
+ (STXVX $rS, xoaddr:$dst)>;
+
+ def : Pat<(v4i32 (scalar_to_vector (i32 (load xoaddr:$src)))),
+ (v4i32 (LXVWSX xoaddr:$src))>;
+ def : Pat<(v4f32 (scalar_to_vector (f32 (load xoaddr:$src)))),
+ (v4f32 (LXVWSX xoaddr:$src))>;
+ def : Pat<(v4f32 (scalar_to_vector (f32 (fpround (extloadf32 xoaddr:$src))))),
+ (v4f32 (LXVWSX xoaddr:$src))>;
+
+ // Build vectors from i8 loads
+ def : Pat<(v16i8 (scalar_to_vector ScalarLoads.Li8)),
+ (v16i8 (VSPLTBs 7, (LXSIBZX xoaddr:$src)))>;
+ def : Pat<(v8i16 (scalar_to_vector ScalarLoads.ZELi8)),
+ (v8i16 (VSPLTHs 3, (LXSIBZX xoaddr:$src)))>;
+ def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi8)),
+ (v4i32 (XXSPLTWs (LXSIBZX xoaddr:$src), 1))>;
+ def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi8i64)),
+ (v2i64 (XXPERMDIs (LXSIBZX xoaddr:$src), 0))>;
+ def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi8)),
+ (v4i32 (XXSPLTWs (VEXTSB2Ws (LXSIBZX xoaddr:$src)), 1))>;
+ def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi8i64)),
+ (v2i64 (XXPERMDIs (VEXTSB2Ds (LXSIBZX xoaddr:$src)), 0))>;
+
+ // Build vectors from i16 loads
+ def : Pat<(v8i16 (scalar_to_vector ScalarLoads.Li16)),
+ (v8i16 (VSPLTHs 3, (LXSIHZX xoaddr:$src)))>;
+ def : Pat<(v4i32 (scalar_to_vector ScalarLoads.ZELi16)),
+ (v4i32 (XXSPLTWs (LXSIHZX xoaddr:$src), 1))>;
+ def : Pat<(v2i64 (scalar_to_vector ScalarLoads.ZELi16i64)),
+ (v2i64 (XXPERMDIs (LXSIHZX xoaddr:$src), 0))>;
+ def : Pat<(v4i32 (scalar_to_vector ScalarLoads.SELi16)),
+ (v4i32 (XXSPLTWs (VEXTSH2Ws (LXSIHZX xoaddr:$src)), 1))>;
+ def : Pat<(v2i64 (scalar_to_vector ScalarLoads.SELi16i64)),
+ (v2i64 (XXPERMDIs (VEXTSH2Ds (LXSIHZX xoaddr:$src)), 0))>;
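A hedged illustration of the load-and-splat shapes the i8/i16 patterns above target; the exact selection depends on how the surrounding DAG folds, so treat this as a sketch rather than a guarantee:

    __vector unsigned int from_u16(const unsigned short *p) {
      // Insert a zero-extended halfword load into lane 0; a DAG of this
      // shape (scalar_to_vector of ZELi16) aims at lxsihzx + xxspltw
      // instead of a GPR load followed by a direct move.
      __vector unsigned int v;
      v[0] = *p;  // lanes 1-3 stay undefined, as in scalar_to_vector
      return v;
    }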
+
+ let Predicates = [IsBigEndian, HasP9Vector] in {
+ // Scalar stores of i8
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 0)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 9), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 1)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 10), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 2)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 11), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 3)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 12), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 4)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 13), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 5)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 14), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 6)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 15), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 7)), xoaddr:$dst),
+ (STXSIBXv $S, xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 8)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 1), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 9)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 2), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 10)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 3), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 11)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 4), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 12)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 5), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 13)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 6), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 14)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 7), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 15)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 8), xoaddr:$dst)>;
+
+ // Scalar stores of i16
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 0)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 10), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 1)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 12), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 2)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 14), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 3)), xoaddr:$dst),
+ (STXSIHXv $S, xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 4)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 2), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 5)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 4), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 6)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 6), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 7)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 8), xoaddr:$dst)>;
+ } // IsBigEndian, HasP9Vector
+
+ let Predicates = [IsLittleEndian, HasP9Vector] in {
+ // Scalar stores of i8
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 0)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 8), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 1)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 7), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 2)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 6), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 3)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 5), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 4)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 4), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 5)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 3), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 6)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 2), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 7)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 1), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 8)), xoaddr:$dst),
+ (STXSIBXv $S, xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 9)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 15), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 10)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 14), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 11)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 13), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 12)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 12), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 13)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 11), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 14)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 10), xoaddr:$dst)>;
+ def : Pat<(truncstorei8 (i32 (vector_extract v16i8:$S, 15)), xoaddr:$dst),
+ (STXSIBXv (VSLDOI $S, $S, 9), xoaddr:$dst)>;
+
+ // Scalar stores of i16
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 0)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 8), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 1)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 6), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 2)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 4), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 3)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 2), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 4)), xoaddr:$dst),
+ (STXSIHXv $S, xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 5)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 14), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 6)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 12), xoaddr:$dst)>;
+ def : Pat<(truncstorei16 (i32 (vector_extract v8i16:$S, 7)), xoaddr:$dst),
+ (STXSIHXv (VSLDOI $S, $S, 10), xoaddr:$dst)>;
+ } // IsLittleEndian, HasP9Vector
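A hedged source-level view of what the truncating-store patterns above buy on both endiannesses: extracting a byte or halfword element and storing it stays in the vector unit (vsldoi rotates the element into position, then stxsibx/stxsihx stores it) with no GPR round trip.

    void store_elem5(__vector unsigned char v, unsigned char *p) {
      *p = v[5];  // vector_extract + truncstorei8 -> VSLDOI + STXSIBXv
    }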
+
+ // Vector sign extensions
+ def : Pat<(f64 (PPCVexts f64:$A, 1)),
+ (f64 (COPY_TO_REGCLASS (VEXTSB2Ds $A), VSFRC))>;
+ def : Pat<(f64 (PPCVexts f64:$A, 2)),
+ (f64 (COPY_TO_REGCLASS (VEXTSH2Ds $A), VSFRC))>;
+
+ let isPseudo = 1 in {
+ def DFLOADf32 : Pseudo<(outs vssrc:$XT), (ins memrix:$src),
+ "#DFLOADf32",
+ [(set f32:$XT, (load iaddr:$src))]>;
+ def DFLOADf64 : Pseudo<(outs vsfrc:$XT), (ins memrix:$src),
+ "#DFLOADf64",
+ [(set f64:$XT, (load iaddr:$src))]>;
+ def DFSTOREf32 : Pseudo<(outs), (ins vssrc:$XT, memrix:$dst),
+ "#DFSTOREf32",
+ [(store f32:$XT, iaddr:$dst)]>;
+ def DFSTOREf64 : Pseudo<(outs), (ins vsfrc:$XT, memrix:$dst),
+ "#DFSTOREf64",
+ [(store f64:$XT, iaddr:$dst)]>;
+ }
+ def : Pat<(f64 (extloadf32 iaddr:$src)),
+ (COPY_TO_REGCLASS (DFLOADf32 iaddr:$src), VSFRC)>;
+ def : Pat<(f32 (fpround (extloadf32 iaddr:$src))),
+ (f32 (DFLOADf32 iaddr:$src))>;
} // end HasP9Vector, AddedComplexity
+
+// Integer extend helper dags 32 -> 64
+def AnyExts {
+ dag A = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $A, sub_32);
+ dag B = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $B, sub_32);
+ dag C = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $C, sub_32);
+ dag D = (INSERT_SUBREG (i64 (IMPLICIT_DEF)), $D, sub_32);
+}
+
+def DblToFlt {
+ dag A0 = (f32 (fpround (f64 (extractelt v2f64:$A, 0))));
+ dag A1 = (f32 (fpround (f64 (extractelt v2f64:$A, 1))));
+ dag B0 = (f32 (fpround (f64 (extractelt v2f64:$B, 0))));
+ dag B1 = (f32 (fpround (f64 (extractelt v2f64:$B, 1))));
+}
+def FltToIntLoad {
+ dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (extloadf32 xoaddr:$A)))));
+}
+def FltToUIntLoad {
+ dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (extloadf32 xoaddr:$A)))));
+}
+def FltToLongLoad {
+ dag A = (i64 (PPCmfvsr (PPCfctidz (f64 (extloadf32 xoaddr:$A)))));
+}
+def FltToULongLoad {
+ dag A = (i64 (PPCmfvsr (PPCfctiduz (f64 (extloadf32 xoaddr:$A)))));
+}
+def FltToLong {
+ dag A = (i64 (PPCmfvsr (PPCfctidz (fpextend f32:$A))));
+}
+def FltToULong {
+ dag A = (i64 (PPCmfvsr (PPCfctiduz (fpextend f32:$A))));
+}
+def DblToInt {
+ dag A = (i32 (PPCmfvsr (f64 (PPCfctiwz f64:$A))));
+}
+def DblToUInt {
+ dag A = (i32 (PPCmfvsr (f64 (PPCfctiwuz f64:$A))));
+}
+def DblToLong {
+ dag A = (i64 (PPCmfvsr (f64 (PPCfctidz f64:$A))));
+}
+def DblToULong {
+ dag A = (i64 (PPCmfvsr (f64 (PPCfctiduz f64:$A))));
+}
+def DblToIntLoad {
+ dag A = (i32 (PPCmfvsr (PPCfctiwz (f64 (load xoaddr:$A)))));
+}
+def DblToUIntLoad {
+ dag A = (i32 (PPCmfvsr (PPCfctiwuz (f64 (load xoaddr:$A)))));
+}
+def DblToLongLoad {
+ dag A = (i64 (PPCmfvsr (PPCfctidz (f64 (load xoaddr:$A)))));
+}
+def DblToULongLoad {
+ dag A = (i64 (PPCmfvsr (PPCfctiduz (f64 (load xoaddr:$A)))));
+}
+
+// FP merge dags (for f32 -> v4f32)
+def MrgFP {
+ dag AC = (XVCVDPSP (XXPERMDI (COPY_TO_REGCLASS $A, VSRC),
+ (COPY_TO_REGCLASS $C, VSRC), 0));
+ dag BD = (XVCVDPSP (XXPERMDI (COPY_TO_REGCLASS $B, VSRC),
+ (COPY_TO_REGCLASS $D, VSRC), 0));
+ dag ABhToFlt = (XVCVDPSP (XXPERMDI $A, $B, 0));
+ dag ABlToFlt = (XVCVDPSP (XXPERMDI $A, $B, 3));
+ dag BAhToFlt = (XVCVDPSP (XXPERMDI $B, $A, 0));
+ dag BAlToFlt = (XVCVDPSP (XXPERMDI $B, $A, 3));
+}
+
+// Patterns for BUILD_VECTOR nodes.
+def NoP9Vector : Predicate<"!PPCSubTarget->hasP9Vector()">;
+let AddedComplexity = 400 in {
+
+ let Predicates = [HasVSX] in {
+ // Build vectors of floating point converted to i32.
+ def : Pat<(v4i32 (build_vector DblToInt.A, DblToInt.A,
+ DblToInt.A, DblToInt.A)),
+ (v4i32 (XXSPLTW (COPY_TO_REGCLASS (XSCVDPSXWS $A), VSRC), 1))>;
+ def : Pat<(v4i32 (build_vector DblToUInt.A, DblToUInt.A,
+ DblToUInt.A, DblToUInt.A)),
+ (v4i32 (XXSPLTW (COPY_TO_REGCLASS (XSCVDPUXWS $A), VSRC), 1))>;
+ def : Pat<(v2i64 (build_vector DblToLong.A, DblToLong.A)),
+ (v2i64 (XXPERMDI (COPY_TO_REGCLASS (XSCVDPSXDS $A), VSRC),
+ (COPY_TO_REGCLASS (XSCVDPSXDS $A), VSRC), 0))>;
+ def : Pat<(v2i64 (build_vector DblToULong.A, DblToULong.A)),
+ (v2i64 (XXPERMDI (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC),
+ (COPY_TO_REGCLASS (XSCVDPUXDS $A), VSRC), 0))>;
+ def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),
+ (v4i32 (XXSPLTW (COPY_TO_REGCLASS
+ (XSCVDPSXWSs (LXSSPX xoaddr:$A)), VSRC), 1))>;
+ def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),
+ (v4i32 (XXSPLTW (COPY_TO_REGCLASS
+ (XSCVDPUXWSs (LXSSPX xoaddr:$A)), VSRC), 1))>;
+ def : Pat<(v4f32 (build_vector f32:$A, f32:$A, f32:$A, f32:$A)),
+ (v4f32 (XXSPLTW (v4f32 (XSCVDPSPN $A)), 0))>;
+
+ // Build vectors of floating point converted to i64.
+ def : Pat<(v2i64 (build_vector FltToLong.A, FltToLong.A)),
+ (v2i64 (XXPERMDIs
+ (COPY_TO_REGCLASS (XSCVDPSXDSs $A), VSFRC), 0))>;
+ def : Pat<(v2i64 (build_vector FltToULong.A, FltToULong.A)),
+ (v2i64 (XXPERMDIs
+ (COPY_TO_REGCLASS (XSCVDPUXDSs $A), VSFRC), 0))>;
+ def : Pat<(v2i64 (scalar_to_vector DblToLongLoad.A)),
+ (v2i64 (XVCVDPSXDS (LXVDSX xoaddr:$A)))>;
+ def : Pat<(v2i64 (scalar_to_vector DblToULongLoad.A)),
+ (v2i64 (XVCVDPUXDS (LXVDSX xoaddr:$A)))>;
+ }
+
+ let Predicates = [HasVSX, NoP9Vector] in {
+ // Load-and-splat with fp-to-int conversion (using X-Form VSX loads).
+ def : Pat<(v4i32 (scalar_to_vector DblToIntLoad.A)),
+ (v4i32 (XXSPLTW (COPY_TO_REGCLASS
+ (XSCVDPSXWS (LXSDX xoaddr:$A)), VSRC), 1))>;
+ def : Pat<(v4i32 (scalar_to_vector DblToUIntLoad.A)),
+ (v4i32 (XXSPLTW (COPY_TO_REGCLASS
+ (XSCVDPUXWS (LXSDX xoaddr:$A)), VSRC), 1))>;
+ def : Pat<(v2i64 (scalar_to_vector FltToLongLoad.A)),
+ (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS
+ (LXSSPX xoaddr:$A), VSFRC)), 0))>;
+ def : Pat<(v2i64 (scalar_to_vector FltToULongLoad.A)),
+ (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS
+ (LXSSPX xoaddr:$A), VSFRC)), 0))>;
+ }
+
+ // Big endian, available on all targets with VSX
+ let Predicates = [IsBigEndian, HasVSX] in {
+ def : Pat<(v2f64 (build_vector f64:$A, f64:$B)),
+ (v2f64 (XXPERMDI
+ (COPY_TO_REGCLASS $A, VSRC),
+ (COPY_TO_REGCLASS $B, VSRC), 0))>;
+
+ def : Pat<(v4f32 (build_vector f32:$A, f32:$B, f32:$C, f32:$D)),
+ (VMRGEW MrgFP.AC, MrgFP.BD)>;
+ def : Pat<(v4f32 (build_vector DblToFlt.A0, DblToFlt.A1,
+ DblToFlt.B0, DblToFlt.B1)),
+ (v4f32 (VMRGEW MrgFP.ABhToFlt, MrgFP.ABlToFlt))>;
+ }
+
+ let Predicates = [IsLittleEndian, HasVSX] in {
+ // Little endian, available on all targets with VSX
+ def : Pat<(v2f64 (build_vector f64:$A, f64:$B)),
+ (v2f64 (XXPERMDI
+ (COPY_TO_REGCLASS $B, VSRC),
+ (COPY_TO_REGCLASS $A, VSRC), 0))>;
+
+ def : Pat<(v4f32 (build_vector f32:$D, f32:$C, f32:$B, f32:$A)),
+ (VMRGEW MrgFP.AC, MrgFP.BD)>;
+ def : Pat<(v4f32 (build_vector DblToFlt.A0, DblToFlt.A1,
+ DblToFlt.B0, DblToFlt.B1)),
+ (v4f32 (VMRGEW MrgFP.BAhToFlt, MrgFP.BAlToFlt))>;
+ }
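A hedged illustration (clang vector extensions) of the build_vector shape the VMRGEW patterns above handle without a trip through memory:

    __vector float build4(float a, float b, float c, float d) {
      // Pairs the scalars with xxpermdi, converts with xvcvdpsp, and
      // interleaves the two halves with vmrgew, per MrgFP above.
      __vector float v = {a, b, c, d};
      return v;
    }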
+
+ let Predicates = [HasDirectMove] in {
+ // Endianness-neutral constant splat on P8 and newer targets. The reason
+ // for this pattern is that on targets with direct moves, we don't expand
+ // BUILD_VECTOR nodes for v4i32.
+ def : Pat<(v4i32 (build_vector immSExt5NonZero:$A, immSExt5NonZero:$A,
+ immSExt5NonZero:$A, immSExt5NonZero:$A)),
+ (v4i32 (VSPLTISW imm:$A))>;
+ }
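A source-level example of the endianness-neutral splat this block matches: a v4i32 splat of a small non-zero signed immediate selects a single vspltisw rather than a constant-pool load.

    __vector int splat5(void) {
      __vector int v = {5, 5, 5, 5};  // -> vspltisw; fits immSExt5NonZero
      return v;
    }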
+
+ let Predicates = [IsBigEndian, HasDirectMove, NoP9Vector] in {
+ // Big endian integer vectors using direct moves.
+ def : Pat<(v2i64 (build_vector i64:$A, i64:$B)),
+ (v2i64 (XXPERMDI
+ (COPY_TO_REGCLASS (MTVSRD $A), VSRC),
+ (COPY_TO_REGCLASS (MTVSRD $B), VSRC), 0))>;
+ def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),
+ (VMRGOW (XXPERMDI (COPY_TO_REGCLASS (MTVSRWZ $A), VSRC),
+ (COPY_TO_REGCLASS (MTVSRWZ $C), VSRC), 0),
+ (XXPERMDI (COPY_TO_REGCLASS (MTVSRWZ $B), VSRC),
+ (COPY_TO_REGCLASS (MTVSRWZ $D), VSRC), 0))>;
+ def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)),
+ (XXSPLTW (COPY_TO_REGCLASS (MTVSRWZ $A), VSRC), 1)>;
+ }
+
+ let Predicates = [IsLittleEndian, HasDirectMove, NoP9Vector] in {
+ // Little endian integer vectors using direct moves.
+ def : Pat<(v2i64 (build_vector i64:$A, i64:$B)),
+ (v2i64 (XXPERMDI
+ (COPY_TO_REGCLASS (MTVSRD $B), VSRC),
+ (COPY_TO_REGCLASS (MTVSRD $A), VSRC), 0))>;
+ def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),
+ (VMRGOW (XXPERMDI (COPY_TO_REGCLASS (MTVSRWZ $D), VSRC),
+ (COPY_TO_REGCLASS (MTVSRWZ $B), VSRC), 0),
+ (XXPERMDI (COPY_TO_REGCLASS (MTVSRWZ $C), VSRC),
+ (COPY_TO_REGCLASS (MTVSRWZ $A), VSRC), 0))>;
+ def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)),
+ (XXSPLTW (COPY_TO_REGCLASS (MTVSRWZ $A), VSRC), 1)>;
+ }
+
+ let Predicates = [HasP9Vector] in {
+ // Endianness-neutral patterns for const splats with ISA 3.0 instructions.
+ def : Pat<(v4i32 (scalar_to_vector i32:$A)),
+ (v4i32 (MTVSRWS $A))>;
+ def : Pat<(v4i32 (build_vector i32:$A, i32:$A, i32:$A, i32:$A)),
+ (v4i32 (MTVSRWS $A))>;
+ def : Pat<(v16i8 (build_vector immAnyExt8:$A, immAnyExt8:$A, immAnyExt8:$A,
+ immAnyExt8:$A, immAnyExt8:$A, immAnyExt8:$A,
+ immAnyExt8:$A, immAnyExt8:$A, immAnyExt8:$A,
+ immAnyExt8:$A, immAnyExt8:$A, immAnyExt8:$A,
+ immAnyExt8:$A, immAnyExt8:$A, immAnyExt8:$A,
+ immAnyExt8:$A)),
+ (v16i8 (COPY_TO_REGCLASS (XXSPLTIB imm:$A), VSRC))>;
+ def : Pat<(v16i8 immAllOnesV),
+ (v16i8 (COPY_TO_REGCLASS (XXSPLTIB 255), VSRC))>;
+ def : Pat<(v8i16 immAllOnesV),
+ (v8i16 (COPY_TO_REGCLASS (XXSPLTIB 255), VSRC))>;
+ def : Pat<(v4i32 immAllOnesV),
+ (v4i32 (XXSPLTIB 255))>;
+ def : Pat<(v2i64 immAllOnesV),
+ (v2i64 (XXSPLTIB 255))>;
+ def : Pat<(v4i32 (scalar_to_vector FltToIntLoad.A)),
+ (v4i32 (XVCVSPSXWS (LXVWSX xoaddr:$A)))>;
+ def : Pat<(v4i32 (scalar_to_vector FltToUIntLoad.A)),
+ (v4i32 (XVCVSPUXWS (LXVWSX xoaddr:$A)))>;
+ def : Pat<(v4i32 (scalar_to_vector DblToIntLoad.A)),
+ (v4i32 (XXSPLTW (COPY_TO_REGCLASS
+ (XSCVDPSXWS (DFLOADf64 iaddr:$A)), VSRC), 1))>;
+ def : Pat<(v4i32 (scalar_to_vector DblToUIntLoad.A)),
+ (v4i32 (XXSPLTW (COPY_TO_REGCLASS
+ (XSCVDPUXWS (DFLOADf64 iaddr:$A)), VSRC), 1))>;
+ def : Pat<(v2i64 (scalar_to_vector FltToLongLoad.A)),
+ (v2i64 (XXPERMDIs (XSCVDPSXDS (COPY_TO_REGCLASS
+ (DFLOADf32 iaddr:$A),
+ VSFRC)), 0))>;
+ def : Pat<(v2i64 (scalar_to_vector FltToULongLoad.A)),
+ (v2i64 (XXPERMDIs (XSCVDPUXDS (COPY_TO_REGCLASS
+ (DFLOADf32 iaddr:$A),
+ VSFRC)), 0))>;
+ }
+
+ let Predicates = [IsISA3_0, HasDirectMove, IsBigEndian] in {
+ def : Pat<(i64 (extractelt v2i64:$A, 1)),
+ (i64 (MFVSRLD $A))>;
+ // Better way to build integer vectors if we have MTVSRDD. Big endian.
+ def : Pat<(v2i64 (build_vector i64:$rB, i64:$rA)),
+ (v2i64 (MTVSRDD $rB, $rA))>;
+ def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),
+ (VMRGOW (COPY_TO_REGCLASS (MTVSRDD AnyExts.A, AnyExts.C), VSRC),
+ (COPY_TO_REGCLASS (MTVSRDD AnyExts.B, AnyExts.D), VSRC))>;
+ }
+
+ let Predicates = [IsISA3_0, HasDirectMove, IsLittleEndian] in {
+ def : Pat<(i64 (extractelt v2i64:$A, 0)),
+ (i64 (MFVSRLD $A))>;
+ // Better way to build integer vectors if we have MTVSRDD. Little endian.
+ def : Pat<(v2i64 (build_vector i64:$rA, i64:$rB)),
+ (v2i64 (MTVSRDD $rB, $rA))>;
+ def : Pat<(v4i32 (build_vector i32:$A, i32:$B, i32:$C, i32:$D)),
+ (VMRGOW (COPY_TO_REGCLASS (MTVSRDD AnyExts.D, AnyExts.B), VSRC),
+ (COPY_TO_REGCLASS (MTVSRDD AnyExts.C, AnyExts.A), VSRC))>;
+ }
+}
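A hedged example of the MTVSRDD patterns that close the block above: on ISA 3.0 targets with direct moves, packing two GPRs into a vector register takes a single mtvsrdd (operands swapped on little endian), where P8 needed two mtvsrd plus a merge. Compile with something like -mcpu=pwr9 (assumption):

    __vector long long pack(long long a, long long b) {
      __vector long long v = {a, b};  // -> mtvsrdd
      return v;
    }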
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCLoopPreIncPrep.cpp b/contrib/llvm/lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
index 48a71cf..2c3e755 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCLoopPreIncPrep.cpp
@@ -20,31 +20,38 @@
//===----------------------------------------------------------------------===//
#define DEBUG_TYPE "ppc-loop-preinc-prep"
+
#include "PPC.h"
+#include "PPCSubtarget.h"
#include "PPCTargetMachine.h"
#include "llvm/ADT/DepthFirstIterator.h"
-#include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallSet.h"
-#include "llvm/ADT/Statistic.h"
-#include "llvm/Analysis/CodeMetrics.h"
-#include "llvm/Analysis/InstructionSimplify.h"
+#include "llvm/ADT/SmallVector.h"
#include "llvm/Analysis/LoopInfo.h"
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
-#include "llvm/Analysis/ValueTracking.h"
+#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Dominators.h"
-#include "llvm/IR/Function.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Module.h"
+#include "llvm/IR/Value.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Transforms/Scalar.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/Local.h"
#include "llvm/Transforms/Utils/LoopUtils.h"
-#include "llvm/Transforms/Utils/ValueMapper.h"
+#include <cassert>
+#include <iterator>
+#include <utility>
+
using namespace llvm;
// By default, we limit this to creating 16 PHIs (which is a little over half
@@ -54,14 +61,17 @@ static cl::opt<unsigned> MaxVars("ppc-preinc-prep-max-vars",
cl::desc("Potential PHI threshold for PPC preinc loop prep"));
namespace llvm {
+
void initializePPCLoopPreIncPrepPass(PassRegistry&);
-}
+
+} // end namespace llvm
namespace {
class PPCLoopPreIncPrep : public FunctionPass {
public:
static char ID; // Pass ID, replacement for typeid
+
PPCLoopPreIncPrep() : FunctionPass(ID), TM(nullptr) {
initializePPCLoopPreIncPrepPass(*PassRegistry::getPassRegistry());
}
@@ -89,7 +99,8 @@ namespace {
ScalarEvolution *SE;
bool PreserveLCSSA;
};
-}
+
+} // end anonymous namespace
char PPCLoopPreIncPrep::ID = 0;
static const char *name = "Prepare loop for pre-inc. addressing modes";
@@ -103,6 +114,7 @@ FunctionPass *llvm::createPPCLoopPreIncPrepPass(PPCTargetMachine &TM) {
}
namespace {
+
struct BucketElement {
BucketElement(const SCEVConstant *O, Instruction *I) : Offset(O), Instr(I) {}
BucketElement(Instruction *I) : Offset(nullptr), Instr(I) {}
@@ -118,7 +130,8 @@ namespace {
const SCEV *BaseSCEV;
SmallVector<BucketElement, 16> Elements;
};
-}
+
+} // end anonymous namespace
static bool IsPtrInBounds(Value *BasePtr) {
Value *StrippedBasePtr = BasePtr;
@@ -140,7 +153,7 @@ static Value *GetPointerOperand(Value *MemI) {
return IMemI->getArgOperand(0);
}
- return 0;
+ return nullptr;
}
bool PPCLoopPreIncPrep::runOnFunction(Function &F) {
@@ -394,7 +407,7 @@ bool PPCLoopPreIncPrep::runOnLoop(Loop *L) {
Instruction *PtrIP = dyn_cast<Instruction>(Ptr);
if (PtrIP && isa<Instruction>(NewBasePtr) &&
cast<Instruction>(NewBasePtr)->getParent() == PtrIP->getParent())
- PtrIP = 0;
+ PtrIP = nullptr;
else if (isa<PHINode>(PtrIP))
PtrIP = &*PtrIP->getParent()->getFirstInsertionPt();
else if (!PtrIP)
@@ -437,4 +450,3 @@ bool PPCLoopPreIncPrep::runOnLoop(Loop *L) {
return MadeChange;
}
-
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCMCInstLower.cpp b/contrib/llvm/lib/Target/PowerPC/PPCMCInstLower.cpp
index 18377a4..e527b01 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCMCInstLower.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCMCInstLower.cpp
@@ -34,10 +34,10 @@ static MachineModuleInfoMachO &getMachOMMI(AsmPrinter &AP) {
return AP.MMI->getObjFileInfo<MachineModuleInfoMachO>();
}
-
-static MCSymbol *GetSymbolFromOperand(const MachineOperand &MO, AsmPrinter &AP){
+static MCSymbol *GetSymbolFromOperand(const MachineOperand &MO,
+ AsmPrinter &AP) {
const TargetMachine &TM = AP.TM;
- Mangler *Mang = AP.Mang;
+ Mangler &Mang = TM.getObjFileLowering()->getMangler();
const DataLayout &DL = AP.getDataLayout();
MCContext &Ctx = AP.OutContext;
@@ -54,7 +54,7 @@ static MCSymbol *GetSymbolFromOperand(const MachineOperand &MO, AsmPrinter &AP){
Mangler::getNameWithPrefix(Name, MO.getSymbolName(), DL);
} else {
const GlobalValue *GV = MO.getGlobal();
- TM.getNameWithPrefix(Name, GV, *Mang);
+ TM.getNameWithPrefix(Name, GV, Mang);
}
Name += Suffix;
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp b/contrib/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
index a57a83d..2413af3 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCMIPeephole.cpp
@@ -124,10 +124,40 @@ bool PPCMIPeephole::simplifyCode(void) {
if (TrueReg1 == TrueReg2
&& TargetRegisterInfo::isVirtualRegister(TrueReg1)) {
MachineInstr *DefMI = MRI->getVRegDef(TrueReg1);
+ unsigned DefOpc = DefMI ? DefMI->getOpcode() : 0;
+
+ // If this is a splat fed by a splatting load, the splat is
+ // redundant. Replace with a copy. This doesn't happen directly due
+ // to code in PPCDAGToDAGISel.cpp, but it can happen when converting
+ // a load of a double to a vector of 64-bit integers.
+ auto isConversionOfLoadAndSplat = [=]() -> bool {
+ if (DefOpc != PPC::XVCVDPSXDS && DefOpc != PPC::XVCVDPUXDS)
+ return false;
+ unsigned DefReg = lookThruCopyLike(DefMI->getOperand(1).getReg());
+ if (TargetRegisterInfo::isVirtualRegister(DefReg)) {
+ MachineInstr *LoadMI = MRI->getVRegDef(DefReg);
+ if (LoadMI && LoadMI->getOpcode() == PPC::LXVDSX)
+ return true;
+ }
+ return false;
+ };
+ if (DefMI && (Immed == 0 || Immed == 3)) {
+ if (DefOpc == PPC::LXVDSX || isConversionOfLoadAndSplat()) {
+ DEBUG(dbgs()
+ << "Optimizing load-and-splat/splat "
+ "to load-and-splat/copy: ");
+ DEBUG(MI.dump());
+ BuildMI(MBB, &MI, MI.getDebugLoc(),
+ TII->get(PPC::COPY), MI.getOperand(0).getReg())
+ .addOperand(MI.getOperand(1));
+ ToErase = &MI;
+ Simplified = true;
+ }
+ }
// If this is a splat or a swap fed by another splat, we
// can replace it with a copy.
- if (DefMI && DefMI->getOpcode() == PPC::XXPERMDI) {
+ if (DefOpc == PPC::XXPERMDI) {
unsigned FeedImmed = DefMI->getOperand(3).getImm();
unsigned FeedReg1
= lookThruCopyLike(DefMI->getOperand(1).getReg());
@@ -170,14 +200,144 @@ bool PPCMIPeephole::simplifyCode(void) {
ToErase = &MI;
Simplified = true;
}
+ } else if ((Immed == 0 || Immed == 3) && DefOpc == PPC::XXPERMDIs &&
+ (DefMI->getOperand(2).getImm() == 0 ||
+ DefMI->getOperand(2).getImm() == 3)) {
+ // Splat fed by another splat - switch the output of the first
+ // and remove the second.
+ DefMI->getOperand(0).setReg(MI.getOperand(0).getReg());
+ ToErase = &MI;
+ Simplified = true;
+ DEBUG(dbgs() << "Removing redundant splat: ");
+ DEBUG(MI.dump());
+ }
+ }
+ }
+ break;
+ }
+ case PPC::VSPLTB:
+ case PPC::VSPLTH:
+ case PPC::XXSPLTW: {
+ unsigned MyOpcode = MI.getOpcode();
+ unsigned OpNo = MyOpcode == PPC::XXSPLTW ? 1 : 2;
+ unsigned TrueReg = lookThruCopyLike(MI.getOperand(OpNo).getReg());
+ if (!TargetRegisterInfo::isVirtualRegister(TrueReg))
+ break;
+ MachineInstr *DefMI = MRI->getVRegDef(TrueReg);
+ if (!DefMI)
+ break;
+ unsigned DefOpcode = DefMI->getOpcode();
+ auto isConvertOfSplat = [=]() -> bool {
+ if (DefOpcode != PPC::XVCVSPSXWS && DefOpcode != PPC::XVCVSPUXWS)
+ return false;
+ unsigned ConvReg = DefMI->getOperand(1).getReg();
+ if (!TargetRegisterInfo::isVirtualRegister(ConvReg))
+ return false;
+ MachineInstr *Splt = MRI->getVRegDef(ConvReg);
+ return Splt && (Splt->getOpcode() == PPC::LXVWSX ||
+ Splt->getOpcode() == PPC::XXSPLTW);
+ };
+ bool AlreadySplat = (MyOpcode == DefOpcode) ||
+ (MyOpcode == PPC::VSPLTB && DefOpcode == PPC::VSPLTBs) ||
+ (MyOpcode == PPC::VSPLTH && DefOpcode == PPC::VSPLTHs) ||
+ (MyOpcode == PPC::XXSPLTW && DefOpcode == PPC::XXSPLTWs) ||
+ (MyOpcode == PPC::XXSPLTW && DefOpcode == PPC::LXVWSX) ||
+ (MyOpcode == PPC::XXSPLTW && DefOpcode == PPC::MTVSRWS)||
+ (MyOpcode == PPC::XXSPLTW && isConvertOfSplat());
+ // If the instruction[s] that feed this splat have already splat
+ // the value, this splat is redundant.
+ if (AlreadySplat) {
+ DEBUG(dbgs() << "Changing redundant splat to a copy: ");
+ DEBUG(MI.dump());
+ BuildMI(MBB, &MI, MI.getDebugLoc(), TII->get(PPC::COPY),
+ MI.getOperand(0).getReg())
+ .addOperand(MI.getOperand(OpNo));
+ ToErase = &MI;
+ Simplified = true;
+ }
+      // Splat fed by a shift: this usually arises when aligning the value to
+      // splat into vector element zero. The two immediates fold, e.g. an
+      // xxspltw of element 2 fed by an xxsldwi by one word becomes an xxspltw
+      // of element 3 ((2 + 1) & 3) applied directly to the shift's input.
+ if (DefOpcode == PPC::XXSLDWI) {
+ unsigned ShiftRes = DefMI->getOperand(0).getReg();
+ unsigned ShiftOp1 = DefMI->getOperand(1).getReg();
+ unsigned ShiftOp2 = DefMI->getOperand(2).getReg();
+ unsigned ShiftImm = DefMI->getOperand(3).getImm();
+ unsigned SplatImm = MI.getOperand(2).getImm();
+ if (ShiftOp1 == ShiftOp2) {
+ unsigned NewElem = (SplatImm + ShiftImm) & 0x3;
+ if (MRI->hasOneNonDBGUse(ShiftRes)) {
+ DEBUG(dbgs() << "Removing redundant shift: ");
+ DEBUG(DefMI->dump());
+ ToErase = DefMI;
}
+ Simplified = true;
+ DEBUG(dbgs() << "Changing splat immediate from " << SplatImm <<
+ " to " << NewElem << " in instruction: ");
+ DEBUG(MI.dump());
+ MI.getOperand(1).setReg(ShiftOp1);
+ MI.getOperand(2).setImm(NewElem);
}
}
break;
}
+ case PPC::XVCVDPSP: {
+ // If this is a DP->SP conversion fed by an FRSP, the FRSP is redundant.
+ unsigned TrueReg = lookThruCopyLike(MI.getOperand(1).getReg());
+ if (!TargetRegisterInfo::isVirtualRegister(TrueReg))
+ break;
+ MachineInstr *DefMI = MRI->getVRegDef(TrueReg);
+
+ // This can occur when building a vector of single precision or integer
+ // values.
+ if (DefMI && DefMI->getOpcode() == PPC::XXPERMDI) {
+ unsigned DefsReg1 = lookThruCopyLike(DefMI->getOperand(1).getReg());
+ unsigned DefsReg2 = lookThruCopyLike(DefMI->getOperand(2).getReg());
+ if (!TargetRegisterInfo::isVirtualRegister(DefsReg1) ||
+ !TargetRegisterInfo::isVirtualRegister(DefsReg2))
+ break;
+ MachineInstr *P1 = MRI->getVRegDef(DefsReg1);
+ MachineInstr *P2 = MRI->getVRegDef(DefsReg2);
+
+ if (!P1 || !P2)
+ break;
+
+ // Remove the passed FRSP instruction if it only feeds this MI and
+ // set any uses of that FRSP (in this MI) to the source of the FRSP.
+ auto removeFRSPIfPossible = [&](MachineInstr *RoundInstr) {
+ if (RoundInstr->getOpcode() == PPC::FRSP &&
+ MRI->hasOneNonDBGUse(RoundInstr->getOperand(0).getReg())) {
+ Simplified = true;
+ unsigned ConvReg1 = RoundInstr->getOperand(1).getReg();
+ unsigned FRSPDefines = RoundInstr->getOperand(0).getReg();
+ MachineInstr &Use = *(MRI->use_instr_begin(FRSPDefines));
+ for (int i = 0, e = Use.getNumOperands(); i < e; ++i)
+ if (Use.getOperand(i).isReg() &&
+ Use.getOperand(i).getReg() == FRSPDefines)
+ Use.getOperand(i).setReg(ConvReg1);
+ DEBUG(dbgs() << "Removing redundant FRSP:\n");
+ DEBUG(RoundInstr->dump());
+ DEBUG(dbgs() << "As it feeds instruction:\n");
+ DEBUG(MI.dump());
+ DEBUG(dbgs() << "Through instruction:\n");
+ DEBUG(DefMI->dump());
+ RoundInstr->eraseFromParent();
+ }
+ };
+
+ // If the input to XVCVDPSP is a vector that was built (even
+ // partially) out of FRSP's, the FRSP(s) can safely be removed
+ // since this instruction performs the same operation.
+ if (P1 != P2) {
+ removeFRSPIfPossible(P1);
+ removeFRSPIfPossible(P2);
+ break;
+ }
+ removeFRSPIfPossible(P1);
+ }
+ break;
+ }
}
}
-
// If the last instruction was marked for elimination,
// remove it now.
if (ToErase) {
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCQPXLoadSplat.cpp b/contrib/llvm/lib/Target/PowerPC/PPCQPXLoadSplat.cpp
index bfe20c1..8a18ab9 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCQPXLoadSplat.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCQPXLoadSplat.cpp
@@ -44,7 +44,7 @@ namespace {
bool runOnMachineFunction(MachineFunction &Fn) override;
- const char *getPassName() const override {
+ StringRef getPassName() const override {
return "PowerPC QPX Load Splat Simplification";
}
};
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp b/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
index f0161a0..e492014 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
@@ -78,6 +78,18 @@ PPCRegisterInfo::PPCRegisterInfo(const PPCTargetMachine &TM)
ImmToIdxMap[PPC::STB8] = PPC::STBX8; ImmToIdxMap[PPC::STH8] = PPC::STHX8;
ImmToIdxMap[PPC::STW8] = PPC::STWX8; ImmToIdxMap[PPC::STDU] = PPC::STDUX;
ImmToIdxMap[PPC::ADDI8] = PPC::ADD8;
+
+ // VSX
+ ImmToIdxMap[PPC::DFLOADf32] = PPC::LXSSPX;
+ ImmToIdxMap[PPC::DFLOADf64] = PPC::LXSDX;
+ ImmToIdxMap[PPC::DFSTOREf32] = PPC::STXSSPX;
+ ImmToIdxMap[PPC::DFSTOREf64] = PPC::STXSDX;
+ ImmToIdxMap[PPC::LXV] = PPC::LXVX;
+ ImmToIdxMap[PPC::LXSD] = PPC::LXSDX;
+ ImmToIdxMap[PPC::LXSSP] = PPC::LXSSPX;
+ ImmToIdxMap[PPC::STXV] = PPC::STXVX;
+ ImmToIdxMap[PPC::STXSD] = PPC::STXSDX;
+ ImmToIdxMap[PPC::STXSSP] = PPC::STXSSPX;
}
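A hedged sketch of how these new entries are consumed by eliminateFrameIndex later in this file (simplified; the real code also checks the form-specific offset encodings, e.g. DQ-form offsets must be multiples of 16): when the folded frame offset no longer fits the instruction's immediate field, the D-form opcode is swapped for its X-form twin from the map and the offset is materialized in a register.

    // Illustrative only; plumbing elided, names follow the surrounding file.
    unsigned OpC = MI.getOpcode();
    if (ImmToIdxMap.count(OpC) && !isInt<16>(Offset)) {
      unsigned NewOpcode = ImmToIdxMap.find(OpC)->second; // e.g. LXV -> LXVX
      MI.setDesc(TII.get(NewOpcode)); // offset then goes in a scratch register
    }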
/// getPointerRegClass - Return the register class to use to hold pointers.
@@ -303,7 +315,6 @@ unsigned PPCRegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
case PPC::VRRCRegClassID:
case PPC::VFRCRegClassID:
case PPC::VSLRCRegClassID:
- case PPC::VSHRCRegClassID:
return 32 - DefaultSafety;
case PPC::VSRCRegClassID:
case PPC::VSFRCRegClassID:
@@ -352,7 +363,7 @@ void PPCRegisterInfo::lowerDynamicAlloc(MachineBasicBlock::iterator II) const {
// Get the basic block's function.
MachineFunction &MF = *MBB.getParent();
// Get the frame info.
- MachineFrameInfo *MFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
// Get the instruction info.
const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
@@ -361,14 +372,14 @@ void PPCRegisterInfo::lowerDynamicAlloc(MachineBasicBlock::iterator II) const {
DebugLoc dl = MI.getDebugLoc();
// Get the maximum call stack size.
- unsigned maxCallFrameSize = MFI->getMaxCallFrameSize();
+ unsigned maxCallFrameSize = MFI.getMaxCallFrameSize();
// Get the total frame size.
- unsigned FrameSize = MFI->getStackSize();
+ unsigned FrameSize = MFI.getStackSize();
// Get stack alignments.
const PPCFrameLowering *TFI = getFrameLowering(MF);
unsigned TargetAlign = TFI->getStackAlignment();
- unsigned MaxAlign = MFI->getMaxAlignment();
+ unsigned MaxAlign = MFI.getMaxAlignment();
assert((maxCallFrameSize & (MaxAlign-1)) == 0 &&
"Maximum call-frame size not sufficiently aligned");
@@ -466,12 +477,12 @@ void PPCRegisterInfo::lowerDynamicAreaOffset(
// Get the basic block's function.
MachineFunction &MF = *MBB.getParent();
// Get the frame info.
- MachineFrameInfo *MFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
const PPCSubtarget &Subtarget = MF.getSubtarget<PPCSubtarget>();
// Get the instruction info.
const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
- unsigned maxCallFrameSize = MFI->getMaxCallFrameSize();
+ unsigned maxCallFrameSize = MFI.getMaxCallFrameSize();
DebugLoc dl = MI.getDebugLoc();
BuildMI(MBB, II, dl, TII.get(PPC::LI), MI.getOperand(0).getReg())
.addImm(maxCallFrameSize);
@@ -787,7 +798,7 @@ PPCRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
// Get the instruction info.
const TargetInstrInfo &TII = *Subtarget.getInstrInfo();
// Get the frame info.
- MachineFrameInfo *MFI = MF.getFrameInfo();
+ MachineFrameInfo &MFI = MF.getFrameInfo();
DebugLoc dl = MI.getDebugLoc();
unsigned OffsetOperandNo = getOffsetONFromFION(MI, FIOperandNum);
@@ -848,7 +859,7 @@ PPCRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
OpC != TargetOpcode::PATCHPOINT && !ImmToIdxMap.count(OpC);
// Now add the frame object offset to the offset from r1.
- int Offset = MFI->getObjectOffset(FrameIndex);
+ int Offset = MFI.getObjectOffset(FrameIndex);
Offset += MI.getOperand(OffsetOperandNo).getImm();
// If we're not using a Frame Pointer that has been set to the value of the
@@ -859,7 +870,7 @@ PPCRegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
// functions.
if (!MF.getFunction()->hasFnAttribute(Attribute::Naked)) {
if (!(hasBasePointer(MF) && FrameIndex < 0))
- Offset += MFI->getStackSize();
+ Offset += MFI.getStackSize();
}
// If we can, encode the offset directly into the instruction. If this is a
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.h b/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.h
index 459502e..4a96327 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.h
+++ b/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.h
@@ -75,7 +75,7 @@ public:
/// Code Generation virtual methods...
const MCPhysReg *getCalleeSavedRegs(const MachineFunction *MF) const override;
- const MCPhysReg *getCalleeSavedRegsViaCopy(const MachineFunction *MF) const override;
+ const MCPhysReg *getCalleeSavedRegsViaCopy(const MachineFunction *MF) const;
const uint32_t *getCallPreservedMask(const MachineFunction &MF,
CallingConv::ID CC) const override;
const uint32_t *getNoPreservedMask() const override;
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.td b/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.td
index e5f363c..896cec7 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPCRegisterInfo.td
@@ -17,7 +17,6 @@ def sub_eq : SubRegIndex<1, 2>;
def sub_un : SubRegIndex<1, 3>;
def sub_32 : SubRegIndex<32>;
def sub_64 : SubRegIndex<64>;
-def sub_128 : SubRegIndex<128>;
}
@@ -79,15 +78,6 @@ class VSRL<FPR SubReg, string n> : PPCReg<n> {
let SubRegIndices = [sub_64];
}
-// VSRH - One of the 32 128-bit VSX registers that overlap with the vector
-// registers.
-class VSRH<VR SubReg, string n> : PPCReg<n> {
- let HWEncoding{4-0} = SubReg.HWEncoding{4-0};
- let HWEncoding{5} = 1;
- let SubRegs = [SubReg];
- let SubRegIndices = [sub_128];
-}
-
// CR - One of the 8 4-bit condition registers
class CR<bits<3> num, string n, list<Register> subregs> : PPCReg<n> {
let HWEncoding{2-0} = num;
@@ -116,9 +106,12 @@ foreach Index = 0-31 in {
DwarfRegNum<[!add(Index, 32), !add(Index, 32)]>;
}
-// Floating-point vector subregisters (for VSX)
+// 64-bit Floating-point subregisters of Altivec registers
+// Note: the register names are v0-v31 or vs32-vs63 depending on the use.
+// Custom C++ code is used to produce the correct name and encoding.
foreach Index = 0-31 in {
- def VF#Index : VF<Index, "vs" # !add(Index, 32)>;
+ def VF#Index : VF<Index, "v" #Index>,
+ DwarfRegNum<[!add(Index, 77), !add(Index, 77)]>;
}
// QPX Floating-point registers
@@ -138,9 +131,11 @@ foreach Index = 0-31 in {
def VSL#Index : VSRL<!cast<FPR>("F"#Index), "vs"#Index>,
DwarfRegAlias<!cast<FPR>("F"#Index)>;
}
-foreach Index = 0-31 in {
- def VSH#Index : VSRH<!cast<VR>("V"#Index), "vs" # !add(Index, 32)>,
- DwarfRegAlias<!cast<VR>("V"#Index)>;
+
+// Dummy VSX registers; these define the strings "vs32"-"vs63" and are used
+// only for asm printing.
+foreach Index = 32-63 in {
+ def VSX#Index : PPCReg<"vs"#Index>;
}
// The representation of r0 when treated as the constant 0.
@@ -288,7 +283,7 @@ def F8RC : RegisterClass<"PPC", [f64], 64, (add (sequence "F%u", 0, 13),
(sequence "F%u", 31, 14))>;
def F4RC : RegisterClass<"PPC", [f32], 32, (add F8RC)>;
-def VRRC : RegisterClass<"PPC", [v16i8,v8i16,v4i32,v2i64,v1i128,v4f32], 128,
+def VRRC : RegisterClass<"PPC", [v16i8,v8i16,v4i32,v2i64,v1i128,v4f32,v2f64], 128,
(add V2, V3, V4, V5, V0, V1, V6, V7, V8, V9, V10, V11,
V12, V13, V14, V15, V16, V17, V18, V19, V31, V30,
V29, V28, V27, V26, V25, V24, V23, V22, V21, V20)>;
@@ -298,14 +293,8 @@ def VRRC : RegisterClass<"PPC", [v16i8,v8i16,v4i32,v2i64,v1i128,v4f32], 128,
def VSLRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128,
(add (sequence "VSL%u", 0, 13),
(sequence "VSL%u", 31, 14))>;
-def VSHRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128,
- (add VSH2, VSH3, VSH4, VSH5, VSH0, VSH1, VSH6, VSH7,
- VSH8, VSH9, VSH10, VSH11, VSH12, VSH13, VSH14,
- VSH15, VSH16, VSH17, VSH18, VSH19, VSH31, VSH30,
- VSH29, VSH28, VSH27, VSH26, VSH25, VSH24, VSH23,
- VSH22, VSH21, VSH20)>;
def VSRC : RegisterClass<"PPC", [v4i32,v4f32,v2f64,v2i64], 128,
- (add VSLRC, VSHRC)>;
+ (add VSLRC, VRRC)>;
// Register classes for the 64-bit "scalar" VSX subregisters.
def VFRC : RegisterClass<"PPC", [f64], 64,
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCSchedule.td b/contrib/llvm/lib/Target/PowerPC/PPCSchedule.td
index b4d72ef..d240529 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCSchedule.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPCSchedule.td
@@ -109,6 +109,7 @@ def IIC_SprSLBIE : InstrItinClass;
def IIC_SprSLBIEG : InstrItinClass;
def IIC_SprSLBMTE : InstrItinClass;
def IIC_SprSLBMFEE : InstrItinClass;
+def IIC_SprSLBMFEV : InstrItinClass;
def IIC_SprSLBIA : InstrItinClass;
def IIC_SprSLBSYNC : InstrItinClass;
def IIC_SprTLBIA : InstrItinClass;
@@ -117,6 +118,8 @@ def IIC_SprTLBIE : InstrItinClass;
def IIC_SprABORT : InstrItinClass;
def IIC_SprMSGSYNC : InstrItinClass;
def IIC_SprSTOP : InstrItinClass;
+def IIC_SprMFPMR : InstrItinClass;
+def IIC_SprMTPMR : InstrItinClass;
//===----------------------------------------------------------------------===//
// Processor instruction itineraries.
@@ -128,6 +131,7 @@ include "PPCScheduleG4Plus.td"
include "PPCScheduleG5.td"
include "PPCScheduleP7.td"
include "PPCScheduleP8.td"
+include "PPCScheduleP9.td"
include "PPCScheduleA2.td"
include "PPCScheduleE500mc.td"
include "PPCScheduleE5500.td"
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCScheduleE500mc.td b/contrib/llvm/lib/Target/PowerPC/PPCScheduleE500mc.td
index f687d32..15d5991 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCScheduleE500mc.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPCScheduleE500mc.td
@@ -249,6 +249,10 @@ def PPCE500mcItineraries : ProcessorItineraries<
InstrStage<5, [E500_SFX0]>],
[8, 1],
[E500_GPR_Bypass, E500_CR_Bypass]>,
+ InstrItinData<IIC_SprMFPMR, [InstrStage<1, [E500_DIS0, E500_DIS1], 0>,
+ InstrStage<4, [E500_SFX0]>],
+ [7, 1], // Latency = 4, Repeat rate = 4
+ [E500_GPR_Bypass, E500_GPR_Bypass]>,
InstrItinData<IIC_SprMFMSR, [InstrStage<1, [E500_DIS0, E500_DIS1], 0>,
InstrStage<4, [E500_SFX0]>],
[7, 1], // Latency = 4, Repeat rate = 4
@@ -257,6 +261,10 @@ def PPCE500mcItineraries : ProcessorItineraries<
InstrStage<1, [E500_SFX0, E500_SFX1]>],
[4, 1], // Latency = 1, Repeat rate = 1
[E500_GPR_Bypass, E500_CR_Bypass]>,
+ InstrItinData<IIC_SprMTPMR, [InstrStage<1, [E500_DIS0, E500_DIS1], 0>,
+ InstrStage<1, [E500_SFX0]>],
+ [4, 1], // Latency = 1, Repeat rate = 1
+ [E500_CR_Bypass, E500_GPR_Bypass]>,
InstrItinData<IIC_SprMFTB, [InstrStage<1, [E500_DIS0, E500_DIS1], 0>,
InstrStage<4, [E500_SFX0]>],
[7, 1], // Latency = 4, Repeat rate = 4
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCScheduleE5500.td b/contrib/llvm/lib/Target/PowerPC/PPCScheduleE5500.td
index 5db886c..32f8e65 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCScheduleE5500.td
+++ b/contrib/llvm/lib/Target/PowerPC/PPCScheduleE5500.td
@@ -313,20 +313,24 @@ def PPCE5500Itineraries : ProcessorItineraries<
InstrStage<5, [E5500_CFX_0]>],
[9, 2], // Latency = 5, Repeat rate = 5
[E5500_GPR_Bypass, E5500_CR_Bypass]>,
- InstrItinData<IIC_SprMFMSR, [InstrStage<1, [E5500_DIS0, E5500_DIS1], 0>,
- InstrStage<4, [E5500_SFX0]>],
+ InstrItinData<IIC_SprMFPMR, [InstrStage<1, [E5500_DIS0, E5500_DIS1], 0>,
+ InstrStage<4, [E5500_CFX_0]>],
[8, 2], // Latency = 4, Repeat rate = 4
[E5500_GPR_Bypass, E5500_GPR_Bypass]>,
InstrItinData<IIC_SprMFSPR, [InstrStage<1, [E5500_DIS0, E5500_DIS1], 0>,
InstrStage<1, [E5500_CFX_0]>],
[5], // Latency = 1, Repeat rate = 1
[E5500_GPR_Bypass]>,
+ InstrItinData<IIC_SprMTPMR, [InstrStage<1, [E5500_DIS0, E5500_DIS1], 0>,
+ InstrStage<1, [E5500_CFX_0]>],
+ [5], // Latency = 1, Repeat rate = 1
+ [E5500_GPR_Bypass]>,
InstrItinData<IIC_SprMFTB, [InstrStage<1, [E5500_DIS0, E5500_DIS1], 0>,
InstrStage<4, [E5500_CFX_0]>],
[8, 2], // Latency = 4, Repeat rate = 4
[NoBypass, E5500_GPR_Bypass]>,
InstrItinData<IIC_SprMTSPR, [InstrStage<1, [E5500_DIS0, E5500_DIS1], 0>,
- InstrStage<1, [E5500_SFX0, E5500_SFX1]>],
+ InstrStage<1, [E5500_CFX_0]>],
[5], // Latency = 1, Repeat rate = 1
[E5500_GPR_Bypass]>,
InstrItinData<IIC_FPGeneral, [InstrStage<1, [E5500_DIS0, E5500_DIS1], 0>,
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCScheduleP9.td b/contrib/llvm/lib/Target/PowerPC/PPCScheduleP9.td
new file mode 100644
index 0000000..a9c1bd7
--- /dev/null
+++ b/contrib/llvm/lib/Target/PowerPC/PPCScheduleP9.td
@@ -0,0 +1,335 @@
+//===-- PPCScheduleP9.td - PPC P9 Scheduling Definitions ---*- tablegen -*-===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is distributed under the University of Illinois Open Source
+// License. See LICENSE.TXT for details.
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the itinerary class data for the POWER9 processor.
+//
+//===----------------------------------------------------------------------===//
+include "PPCInstrInfo.td"
+
+def P9Model : SchedMachineModel {
+ let IssueWidth = 8;
+
+ let LoadLatency = 5;
+
+ let MispredictPenalty = 16;
+
+ // Try to make sure we have at least 10 dispatch groups in a loop.
+ let LoopMicroOpBufferSize = 60;
+
+ let CompleteModel = 0;
+
+}
+
+let SchedModel = P9Model in {
+
+ // ***************** Processor Resources *****************
+
+ //Dispatcher:
+ def DISPATCHER : ProcResource<12>;
+
+ // Issue Ports
+ def IP_AGEN : ProcResource<4>;
+ def IP_EXEC : ProcResource<4>;
+ def IP_EXECE : ProcResource<2> {
+ //Even Exec Ports
+ let Super = IP_EXEC;
+ }
+ def IP_EXECO : ProcResource<2> {
+ //Odd Exec Ports
+ let Super = IP_EXEC;
+ }
+
+ // Pipeline Groups
+ def ALU : ProcResource<4>;
+ def ALUE : ProcResource<2> {
+ //Even ALU pipelines
+ let Super = ALU;
+ }
+ def ALUO : ProcResource<2> {
+ //Odd ALU pipelines
+ let Super = ALU;
+ }
+ def DIV : ProcResource<2>;
+ def DP : ProcResource<4>;
+ def DPE : ProcResource<2> {
+ //Even DP pipelines
+ let Super = DP;
+ }
+ def DPO : ProcResource<2> {
+ //Odd DP pipelines
+ let Super = DP;
+ }
+ def LS : ProcResource<4>;
+ def PM : ProcResource<2>;
+ def DFU : ProcResource<1>;
+
+ def TestGroup : ProcResGroup<[ALU, DP]>;
+
+ // ***************** SchedWriteRes Definitions *****************
+
+ //Dispatcher
+ def DISP_1C : SchedWriteRes<[DISPATCHER]> {
+ let NumMicroOps = 0;
+ let Latency = 1;
+ }
+
+ // Issue Ports
+ def IP_AGEN_1C : SchedWriteRes<[IP_AGEN]> {
+ let NumMicroOps = 0;
+ let Latency = 1;
+ }
+
+ def IP_EXEC_1C : SchedWriteRes<[IP_EXEC]> {
+ let NumMicroOps = 0;
+ let Latency = 1;
+ }
+
+ def IP_EXECE_1C : SchedWriteRes<[IP_EXECE]> {
+ let NumMicroOps = 0;
+ let Latency = 1;
+ }
+
+ def IP_EXECO_1C : SchedWriteRes<[IP_EXECO]> {
+ let NumMicroOps = 0;
+ let Latency = 1;
+ }
+
+ //Pipeline Groups
+ def P9_ALU_2C : SchedWriteRes<[ALU]> {
+ let Latency = 2;
+ }
+
+ def P9_ALUE_2C : SchedWriteRes<[ALUE]> {
+ let Latency = 2;
+ }
+
+ def P9_ALUO_2C : SchedWriteRes<[ALUO]> {
+ let Latency = 2;
+ }
+
+ def P9_ALU_3C : SchedWriteRes<[ALU]> {
+ let Latency = 3;
+ }
+
+ def P9_ALUE_3C : SchedWriteRes<[ALUE]> {
+ let Latency = 3;
+ }
+
+ def P9_ALUO_3C : SchedWriteRes<[ALUO]> {
+ let Latency = 3;
+ }
+
+ def P9_ALU_4C : SchedWriteRes<[ALU]> {
+ let Latency = 4;
+ }
+
+ def P9_ALUE_4C : SchedWriteRes<[ALUE]> {
+ let Latency = 4;
+ }
+
+ def P9_ALUO_4C : SchedWriteRes<[ALUO]> {
+ let Latency = 4;
+ }
+
+ def P9_ALU_5C : SchedWriteRes<[ALU]> {
+ let Latency = 5;
+ }
+
+ def P9_ALU_6C : SchedWriteRes<[ALU]> {
+ let Latency = 6;
+ }
+
+ def P9_DIV_16C_8 : SchedWriteRes<[DIV]> {
+ let ResourceCycles = [8];
+ let Latency = 16;
+ }
+
+ def P9_DIV_24C_8 : SchedWriteRes<[DIV]> {
+ let ResourceCycles = [8];
+ let Latency = 24;
+ }
+
+ def P9_DIV_40C_8 : SchedWriteRes<[DIV]> {
+ let ResourceCycles = [8];
+ let Latency = 40;
+ }
+
+ def P9_DP_2C : SchedWriteRes<[DP]> {
+ let Latency = 2;
+ }
+
+ def P9_DP_5C : SchedWriteRes<[DP]> {
+ let Latency = 5;
+ }
+
+ def P9_DP_7C : SchedWriteRes<[DP]> {
+ let Latency = 7;
+ }
+
+ def P9_DPE_7C : SchedWriteRes<[DPE]> {
+ let Latency = 7;
+ }
+
+ def P9_DPO_7C : SchedWriteRes<[DPO]> {
+ let Latency = 7;
+ }
+
+ def P9_DP_22C_5 : SchedWriteRes<[DP]> {
+ let ResourceCycles = [5];
+ let Latency = 22;
+ }
+
+ def P9_DP_24C_8 : SchedWriteRes<[DP]> {
+ let ResourceCycles = [8];
+ let Latency = 24;
+ }
+
+ def P9_DP_26C_5 : SchedWriteRes<[DP]> {
+ let ResourceCycles = [5];
+    let Latency = 26;
+ }
+
+ def P9_DP_27C_7 : SchedWriteRes<[DP]> {
+ let ResourceCycles = [7];
+ let Latency = 27;
+ }
+
+ def P9_DP_33C_8 : SchedWriteRes<[DP]> {
+ let ResourceCycles = [8];
+ let Latency = 33;
+ }
+
+ def P9_DP_36C_10 : SchedWriteRes<[DP]> {
+ let ResourceCycles = [10];
+ let Latency = 36;
+ }
+
+ def P9_PM_3C : SchedWriteRes<[PM]> {
+ let Latency = 3;
+ }
+
+ def P9_PM_7C : SchedWriteRes<[PM]> {
+    let Latency = 7;
+ }
+
+ def P9_LS_1C : SchedWriteRes<[LS]> {
+ let Latency = 1;
+ }
+
+ def P9_LS_4C : SchedWriteRes<[LS]> {
+ let Latency = 4;
+ }
+
+ def P9_LS_5C : SchedWriteRes<[LS]> {
+ let Latency = 5;
+ }
+
+ def P9_DFU_12C : SchedWriteRes<[DFU]> {
+ let Latency = 12;
+ }
+
+ def P9_DFU_24C : SchedWriteRes<[DFU]> {
+ let Latency = 24;
+ let ResourceCycles = [12];
+ }
+
+ def P9_DFU_58C : SchedWriteRes<[DFU]> {
+ let Latency = 58;
+ let ResourceCycles = [44];
+ }
+
+ def P9_DFU_76C : SchedWriteRes<[TestGroup, DFU]> {
+ let Latency = 76;
+ let ResourceCycles = [62];
+ }
+ // ***************** WriteSeq Definitions *****************
+
+ def P9_LoadAndALUOp_6C : WriteSequence<[P9_LS_4C, P9_ALU_2C]>;
+ def P9_LoadAndALUOp_7C : WriteSequence<[P9_LS_5C, P9_ALU_2C]>;
+ def P9_LoadAndPMOp_8C : WriteSequence<[P9_LS_5C, P9_PM_3C]>;
+ def P9_IntDivAndALUOp_26C_8 : WriteSequence<[P9_DIV_24C_8, P9_ALU_2C]>;
+ def P9_IntDivAndALUOp_42C_8 : WriteSequence<[P9_DIV_40C_8, P9_ALU_2C]>;
+ def P9_StoreAndALUOp_4C : WriteSequence<[P9_LS_1C, P9_ALU_3C]>;
+ def P9_ALUOpAndALUOp_4C : WriteSequence<[P9_ALU_2C, P9_ALU_2C]>;
+
+ // ***************** Defining Itinerary Class Resources *****************
+
+ def : ItinRW<[P9_DFU_76C, IP_EXEC_1C, DISP_1C, DISP_1C], [IIC_IntSimple,
+ IIC_IntGeneral]>;
+
+ def : ItinRW<[P9_ALU_2C, IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C],
+ [IIC_IntISEL, IIC_IntRotate, IIC_IntShift]>;
+
+ def : ItinRW<[P9_ALU_2C, IP_EXEC_1C, DISP_1C, DISP_1C], [IIC_IntCompare]>;
+
+ def : ItinRW<[P9_DP_5C, IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C],
+ [IIC_IntMulHW, IIC_IntMulHWU, IIC_IntMulLI]>;
+
+ def : ItinRW<[P9_LS_5C, IP_EXEC_1C, DISP_1C, DISP_1C],
+ [IIC_LdStLoad, IIC_LdStLD]>;
+
+ def : ItinRW<[P9_LS_4C, P9_ALU_2C, IP_EXEC_1C, IP_EXEC_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ [IIC_LdStLoadUpd, IIC_LdStLDU]>;
+
+ def : ItinRW<[P9_LS_4C, P9_ALU_2C, IP_EXECE_1C, IP_EXECO_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ [IIC_LdStLoadUpdX, IIC_LdStLDUX]>;
+
+ def : ItinRW<[P9_LS_1C, P9_ALU_2C, IP_EXEC_1C, IP_EXEC_1C, IP_AGEN_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ [IIC_LdStSTFDU]>;
+
+ def : ItinRW<[P9_LoadAndALUOp_6C,
+ IP_AGEN_1C, IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ [IIC_LdStLHA, IIC_LdStLWA]>;
+
+ def : ItinRW<[P9_LoadAndALUOp_6C, P9_ALU_2C,
+ IP_AGEN_1C, IP_EXEC_1C, IP_EXEC_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ [IIC_LdStLHAU, IIC_LdStLHAUX]>;
+
+ // IIC_LdStLMW covers microcoded instructions, so this entry is not
+ // accurate for them, but those instructions are rarely used, if at all.
+ def : ItinRW<[P9_LS_4C, IP_EXEC_1C, DISP_1C, DISP_1C],
+ [IIC_LdStLWARX, IIC_LdStLDARX, IIC_LdStLMW]>;
+
+ def : ItinRW<[P9_LS_1C, IP_EXEC_1C, IP_AGEN_1C, DISP_1C, DISP_1C, DISP_1C],
+ [IIC_LdStSTFD, IIC_LdStSTD, IIC_LdStStore]>;
+
+ def : ItinRW<[P9_LS_1C, P9_ALU_2C, IP_EXEC_1C, IP_EXEC_1C, IP_AGEN_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ [IIC_LdStSTDU, IIC_LdStSTDUX]>;
+
+ def : ItinRW<[P9_StoreAndALUOp_4C, IP_EXEC_1C, IP_EXEC_1C, IP_AGEN_1C,
+ DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C],
+ [IIC_LdStSTDCX, IIC_LdStSTWCX]>;
+
+ def : ItinRW<[P9_ALU_5C, IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C],
+ [IIC_BrCR, IIC_IntMTFSB0]>;
+
+ def : ItinRW<[P9_ALUOpAndALUOp_4C, P9_ALU_2C, IP_EXEC_1C, IP_EXEC_1C,
+ IP_EXEC_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C, DISP_1C,
+ DISP_1C, DISP_1C, DISP_1C], [IIC_SprMFCR, IIC_SprMFCRF]>;
+
+ // This class should be broken down to the instruction level once the
+ // missing information is obtained.
+ def : ItinRW<[P9_LoadAndALUOp_6C, IP_EXEC_1C, IP_AGEN_1C,
+ DISP_1C, DISP_1C, DISP_1C], [IIC_SprMTSPR]>;
+
+ def : ItinRW<[P9_DP_7C, IP_EXEC_1C,
+ DISP_1C, DISP_1C, DISP_1C], [IIC_FPGeneral, IIC_FPAddSub]>;
+
+ def : ItinRW<[P9_DP_36C_10, IP_EXEC_1C], [IIC_FPSqrtD]>;
+ def : ItinRW<[P9_DP_26C_5, P9_DP_26C_5, IP_EXEC_1C, IP_EXEC_1C], [IIC_FPSqrtS]>;
+
+ include "P9InstrResources.td"
+
+}
+
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCSubtarget.h b/contrib/llvm/lib/Target/PowerPC/PPCSubtarget.h
index 46da840..7fd9079 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCSubtarget.h
+++ b/contrib/llvm/lib/Target/PowerPC/PPCSubtarget.h
@@ -277,6 +277,9 @@ public:
bool hasFloat128() const { return HasFloat128; }
bool isISA3_0() const { return IsISA3_0; }
bool useLongCalls() const { return UseLongCalls; }
+ bool needsSwapsForVSXMemOps() const {
+ return hasVSX() && isLittleEndian() && !hasP9Vector();
+ }
POPCNTDKind hasPOPCNTD() const { return HasPOPCNTD; }
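The new needsSwapsForVSXMemOps() predicate gives passes a single query for the pre-POWER9 little-endian case, where VSX loads and stores need element swaps. A minimal sketch of a caller, assuming the usual machine-function-pass boilerplate (the pass body is illustrative, not part of this patch):

// Sketch only: assumes a MachineFunctionPass with access to the subtarget.
bool runOnMachineFunction(MachineFunction &MF) {
  const PPCSubtarget &ST = MF.getSubtarget<PPCSubtarget>();
  // On POWER9 (or big-endian, or without VSX) there is nothing to do.
  if (!ST.needsSwapsForVSXMemOps())
    return false;
  // ... gather and optimize the swap instructions ...
  return true;
}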
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCTLSDynamicCall.cpp b/contrib/llvm/lib/Target/PowerPC/PPCTLSDynamicCall.cpp
index 61ce48e..0c1260a 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCTLSDynamicCall.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCTLSDynamicCall.cpp
@@ -56,26 +56,26 @@ protected:
for (MachineBasicBlock::iterator I = MBB.begin(), IE = MBB.end();
I != IE;) {
- MachineInstr *MI = I;
+ MachineInstr &MI = *I;
- if (MI->getOpcode() != PPC::ADDItlsgdLADDR &&
- MI->getOpcode() != PPC::ADDItlsldLADDR &&
- MI->getOpcode() != PPC::ADDItlsgdLADDR32 &&
- MI->getOpcode() != PPC::ADDItlsldLADDR32) {
+ if (MI.getOpcode() != PPC::ADDItlsgdLADDR &&
+ MI.getOpcode() != PPC::ADDItlsldLADDR &&
+ MI.getOpcode() != PPC::ADDItlsgdLADDR32 &&
+ MI.getOpcode() != PPC::ADDItlsldLADDR32) {
++I;
continue;
}
- DEBUG(dbgs() << "TLS Dynamic Call Fixup:\n " << *MI;);
+ DEBUG(dbgs() << "TLS Dynamic Call Fixup:\n " << MI);
- unsigned OutReg = MI->getOperand(0).getReg();
- unsigned InReg = MI->getOperand(1).getReg();
- DebugLoc DL = MI->getDebugLoc();
+ unsigned OutReg = MI.getOperand(0).getReg();
+ unsigned InReg = MI.getOperand(1).getReg();
+ DebugLoc DL = MI.getDebugLoc();
unsigned GPR3 = Is64Bit ? PPC::X3 : PPC::R3;
unsigned Opc1, Opc2;
const unsigned OrigRegs[] = {OutReg, InReg, GPR3};
- switch (MI->getOpcode()) {
+ switch (MI.getOpcode()) {
default:
llvm_unreachable("Opcode inconsistency error");
case PPC::ADDItlsgdLADDR:
@@ -104,7 +104,7 @@ protected:
// Expand into two ops built prior to the existing instruction.
MachineInstr *Addi = BuildMI(MBB, I, DL, TII->get(Opc1), GPR3)
.addReg(InReg);
- Addi->addOperand(MI->getOperand(2));
+ Addi->addOperand(MI.getOperand(2));
// The ADDItls* instruction is the first instruction in the
// repair range.
@@ -113,7 +113,7 @@ protected:
MachineInstr *Call = (BuildMI(MBB, I, DL, TII->get(Opc2), GPR3)
.addReg(GPR3));
- Call->addOperand(MI->getOperand(3));
+ Call->addOperand(MI.getOperand(3));
BuildMI(MBB, I, DL, TII->get(PPC::ADJCALLSTACKUP)).addImm(0).addImm(0);
@@ -126,7 +126,7 @@ protected:
// Move past the original instruction and remove it.
++I;
- MI->removeFromParent();
+ MI.removeFromParent();
// Repair the live intervals.
LIS->repairIntervalsInRange(&MBB, First, Last, OrigRegs);
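One detail worth keeping in mind when reading the loop above: the iterator is advanced past the instruction before removeFromParent() runs, so erasing the matched instruction never invalidates the iterator. A minimal sketch of the idiom, with a hypothetical ShouldExpand predicate standing in for the opcode checks:

// Sketch: rewrite-and-erase while iterating over a MachineBasicBlock.
for (MachineBasicBlock::iterator I = MBB.begin(), IE = MBB.end(); I != IE;) {
  MachineInstr &MI = *I;
  if (!ShouldExpand(MI)) {   // hypothetical predicate
    ++I;
    continue;
  }
  // ... build replacement instructions before I ...
  ++I;                       // step past MI first...
  MI.removeFromParent();     // ...so removal cannot invalidate I
}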
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp b/contrib/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp
index 1bb6b67..91b1d24 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCTargetMachine.cpp
@@ -74,9 +74,9 @@ EnableMachineCombinerPass("ppc-machine-combiner",
extern "C" void LLVMInitializePowerPCTarget() {
// Register the targets
- RegisterTargetMachine<PPC32TargetMachine> A(ThePPC32Target);
- RegisterTargetMachine<PPC64TargetMachine> B(ThePPC64Target);
- RegisterTargetMachine<PPC64TargetMachine> C(ThePPC64LETarget);
+ RegisterTargetMachine<PPC32TargetMachine> A(getThePPC32Target());
+ RegisterTargetMachine<PPC64TargetMachine> B(getThePPC64Target());
+ RegisterTargetMachine<PPC64TargetMachine> C(getThePPC64LETarget());
PassRegistry &PR = *PassRegistry::getPassRegistry();
initializePPCBoolRetToIntPass(PR);
@@ -181,6 +181,10 @@ static PPCTargetMachine::PPCABI computeTargetABI(const Triple &TT,
static Reloc::Model getEffectiveRelocModel(const Triple &TT,
Optional<Reloc::Model> RM) {
if (!RM.hasValue()) {
+ if (TT.getArch() == Triple::ppc64 || TT.getArch() == Triple::ppc64le) {
+ if (!TT.isOSBinFormatMachO() && !TT.isMacOSX())
+ return Reloc::PIC_;
+ }
if (TT.isOSDarwin())
return Reloc::DynamicNoPIC;
return Reloc::Static;
@@ -204,23 +208,6 @@ PPCTargetMachine::PPCTargetMachine(const Target &T, const Triple &TT,
TargetABI(computeTargetABI(TT, Options)),
Subtarget(TargetTriple, CPU, computeFSAdditions(FS, OL, TT), *this) {
- // For the estimates, convergence is quadratic, so we essentially double the
- // number of digits correct after every iteration. For both FRE and FRSQRTE,
- // the minimum architected relative accuracy is 2^-5. When hasRecipPrec(),
- // this is 2^-14. IEEE float has 23 digits and double has 52 digits.
- unsigned RefinementSteps = Subtarget.hasRecipPrec() ? 1 : 3,
- RefinementSteps64 = RefinementSteps + 1;
-
- this->Options.Reciprocals.setDefaults("sqrtf", true, RefinementSteps);
- this->Options.Reciprocals.setDefaults("vec-sqrtf", true, RefinementSteps);
- this->Options.Reciprocals.setDefaults("divf", true, RefinementSteps);
- this->Options.Reciprocals.setDefaults("vec-divf", true, RefinementSteps);
-
- this->Options.Reciprocals.setDefaults("sqrtd", true, RefinementSteps64);
- this->Options.Reciprocals.setDefaults("vec-sqrtd", true, RefinementSteps64);
- this->Options.Reciprocals.setDefaults("divd", true, RefinementSteps64);
- this->Options.Reciprocals.setDefaults("vec-divd", true, RefinementSteps64);
-
initAsmInfo();
}
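The getEffectiveRelocModel hunk changes the default for 64-bit PowerPC: with no explicit relocation model, ELF ppc64/ppc64le now defaults to PIC, while Darwin and 32-bit targets keep their old defaults. A condensed restatement as a sketch (the helper name is assumed):

static Reloc::Model defaultRelocModel(const Triple &TT) {
  bool IsPPC64 =
      TT.getArch() == Triple::ppc64 || TT.getArch() == Triple::ppc64le;
  if (IsPPC64 && !TT.isOSBinFormatMachO() && !TT.isMacOSX())
    return Reloc::PIC_;        // new: 64-bit ELF PowerPC defaults to PIC
  if (TT.isOSDarwin())
    return Reloc::DynamicNoPIC;
  return Reloc::Static;
}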
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCTargetObjectFile.cpp b/contrib/llvm/lib/Target/PowerPC/PPCTargetObjectFile.cpp
index 8f66035..a049dc3 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCTargetObjectFile.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCTargetObjectFile.cpp
@@ -23,8 +23,7 @@ Initialize(MCContext &Ctx, const TargetMachine &TM) {
}
MCSection *PPC64LinuxTargetObjectFile::SelectSectionForGlobal(
- const GlobalValue *GV, SectionKind Kind, Mangler &Mang,
- const TargetMachine &TM) const {
+ const GlobalObject *GO, SectionKind Kind, const TargetMachine &TM) const {
// Here override ReadOnlySection to DataRelROSection for PPC64 SVR4 ABI
// when we have a constant that contains global relocations. This is
// necessary because of this ABI's handling of pointers to functions in
@@ -40,14 +39,13 @@ MCSection *PPC64LinuxTargetObjectFile::SelectSectionForGlobal(
// For more information, see the description of ELIMINATE_COPY_RELOCS in
// GNU ld.
if (Kind.isReadOnly()) {
- const GlobalVariable *GVar = dyn_cast<GlobalVariable>(GV);
+ const auto *GVar = dyn_cast<GlobalVariable>(GO);
if (GVar && GVar->isConstant() && GVar->getInitializer()->needsRelocation())
Kind = SectionKind::getReadOnlyWithRel();
}
- return TargetLoweringObjectFileELF::SelectSectionForGlobal(GV, Kind,
- Mang, TM);
+ return TargetLoweringObjectFileELF::SelectSectionForGlobal(GO, Kind, TM);
}
const MCExpr *PPC64LinuxTargetObjectFile::
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCTargetObjectFile.h b/contrib/llvm/lib/Target/PowerPC/PPCTargetObjectFile.h
index d248791..c8b9b2e 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCTargetObjectFile.h
+++ b/contrib/llvm/lib/Target/PowerPC/PPCTargetObjectFile.h
@@ -22,8 +22,7 @@ namespace llvm {
void Initialize(MCContext &Ctx, const TargetMachine &TM) override;
- MCSection *SelectSectionForGlobal(const GlobalValue *GV, SectionKind Kind,
- Mangler &Mang,
+ MCSection *SelectSectionForGlobal(const GlobalObject *GO, SectionKind Kind,
const TargetMachine &TM) const override;
/// \brief Describe a TLS variable address within debug info.
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp b/contrib/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
index 9331e41..f94d1ea 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.cpp
@@ -131,12 +131,12 @@ int PPCTTIImpl::getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
return TTI::TCC_Free;
case Instruction::And:
RunFree = true; // (for the rotate-and-mask instructions)
- // Fallthrough...
+ LLVM_FALLTHROUGH;
case Instruction::Add:
case Instruction::Or:
case Instruction::Xor:
ShiftedFree = true;
- // Fallthrough...
+ LLVM_FALLTHROUGH;
case Instruction::Sub:
case Instruction::Mul:
case Instruction::Shl:
@@ -147,7 +147,8 @@ int PPCTTIImpl::getIntImmCost(unsigned Opcode, unsigned Idx, const APInt &Imm,
case Instruction::ICmp:
UnsignedFree = true;
ImmIdx = 1;
- // Fallthrough... (zero comparisons can use record-form instructions)
+ // Zero comparisons can use record-form instructions.
+ LLVM_FALLTHROUGH;
case Instruction::Select:
ZeroFree = true;
break;
@@ -280,7 +281,7 @@ unsigned PPCTTIImpl::getMaxInterleaveFactor(unsigned VF) {
int PPCTTIImpl::getArithmeticInstrCost(
unsigned Opcode, Type *Ty, TTI::OperandValueKind Op1Info,
TTI::OperandValueKind Op2Info, TTI::OperandValueProperties Opd1PropInfo,
- TTI::OperandValueProperties Opd2PropInfo) {
+ TTI::OperandValueProperties Opd2PropInfo, ArrayRef<const Value *> Args) {
assert(TLI->InstructionOpcodeToISD(Opcode) && "Invalid opcode");
// Fallback to the default implementation.
@@ -359,11 +360,6 @@ int PPCTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,
int Cost = BaseT::getMemoryOpCost(Opcode, Src, Alignment, AddressSpace);
- // Aligned loads and stores are easy.
- unsigned SrcBytes = LT.second.getStoreSize();
- if (!SrcBytes || !Alignment || Alignment >= SrcBytes)
- return Cost;
-
bool IsAltivecType = ST->hasAltivec() &&
(LT.second == MVT::v16i8 || LT.second == MVT::v8i16 ||
LT.second == MVT::v4i32 || LT.second == MVT::v4f32);
@@ -372,6 +368,20 @@ int PPCTTIImpl::getMemoryOpCost(unsigned Opcode, Type *Src, unsigned Alignment,
bool IsQPXType = ST->hasQPX() &&
(LT.second == MVT::v4f64 || LT.second == MVT::v4f32);
+ // VSX has 32-bit and 64-bit load instructions, and legalization lowers
+ // such loads into a VSR correctly and cheaply. However, neither
+ // BaseT::getMemoryOpCost nor PPCTargetLowering computes the cost
+ // appropriately, so check this case explicitly here. (Note that despite
+ // its name, MemBytes holds a size in bits.)
+ unsigned MemBytes = Src->getPrimitiveSizeInBits();
+ if (Opcode == Instruction::Load && ST->hasVSX() && IsAltivecType &&
+ (MemBytes == 64 || (ST->hasP8Vector() && MemBytes == 32)))
+ return 1;
+
+ // Aligned loads and stores are easy.
+ unsigned SrcBytes = LT.second.getStoreSize();
+ if (!SrcBytes || !Alignment || Alignment >= SrcBytes)
+ return Cost;
+
// If we can use the permutation-based load sequence, then this is also
// relatively cheap (not counting loop-invariant instructions): one load plus
// one permute (the last load in a series has extra cost, but we're
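The reordered getMemoryOpCost logic returns a cost of 1 for loads that VSX can perform with a single 32-bit or 64-bit load, before the alignment-based estimates run. A hypothetical caller (context and names assumed) that would now see the cheap cost on a VSX-capable subtarget:

// Sketch: a <2 x float> load is 64 bits, legalizes into a vector register,
// and is now priced as a single instruction by the check above.
Type *V2F32 = VectorType::get(Type::getFloatTy(Ctx), 2); // Ctx assumed
int Cost = TTI.getMemoryOpCost(Instruction::Load, V2F32,
                               /*Alignment=*/8, /*AddressSpace=*/0);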
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h b/contrib/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
index 5ea9a54..30ee281 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
+++ b/contrib/llvm/lib/Target/PowerPC/PPCTargetTransformInfo.h
@@ -41,13 +41,6 @@ public:
: BaseT(TM, F.getParent()->getDataLayout()), ST(TM->getSubtargetImpl(F)),
TLI(ST->getTargetLowering()) {}
- // Provide value semantics. MSVC requires that we spell all of these out.
- PPCTTIImpl(const PPCTTIImpl &Arg)
- : BaseT(static_cast<const BaseT &>(Arg)), ST(Arg.ST), TLI(Arg.TLI) {}
- PPCTTIImpl(PPCTTIImpl &&Arg)
- : BaseT(std::move(static_cast<BaseT &>(Arg))), ST(std::move(Arg.ST)),
- TLI(std::move(Arg.TLI)) {}
-
/// \name Scalar TTI Implementations
/// @{
@@ -78,7 +71,8 @@ public:
TTI::OperandValueKind Opd1Info = TTI::OK_AnyValue,
TTI::OperandValueKind Opd2Info = TTI::OK_AnyValue,
TTI::OperandValueProperties Opd1PropInfo = TTI::OP_None,
- TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None);
+ TTI::OperandValueProperties Opd2PropInfo = TTI::OP_None,
+ ArrayRef<const Value *> Args = ArrayRef<const Value *>());
int getShuffleCost(TTI::ShuffleKind Kind, Type *Tp, int Index, Type *SubTp);
int getCastInstrCost(unsigned Opcode, Type *Dst, Type *Src);
int getCmpSelInstrCost(unsigned Opcode, Type *ValTy, Type *CondTy);
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCVSXCopy.cpp b/contrib/llvm/lib/Target/PowerPC/PPCVSXCopy.cpp
index 60f1ad5..3b5d8f0 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCVSXCopy.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCVSXCopy.cpp
@@ -89,37 +89,31 @@ protected:
bool Changed = false;
MachineRegisterInfo &MRI = MBB.getParent()->getRegInfo();
- for (MachineBasicBlock::iterator I = MBB.begin(), IE = MBB.end();
- I != IE; ++I) {
- MachineInstr *MI = I;
- if (!MI->isFullCopy())
+ for (MachineInstr &MI : MBB) {
+ if (!MI.isFullCopy())
continue;
- MachineOperand &DstMO = MI->getOperand(0);
- MachineOperand &SrcMO = MI->getOperand(1);
+ MachineOperand &DstMO = MI.getOperand(0);
+ MachineOperand &SrcMO = MI.getOperand(1);
if ( IsVSReg(DstMO.getReg(), MRI) &&
!IsVSReg(SrcMO.getReg(), MRI)) {
// This is a copy *to* a VSX register from a non-VSX register.
Changed = true;
- const TargetRegisterClass *SrcRC =
- IsVRReg(SrcMO.getReg(), MRI) ? &PPC::VSHRCRegClass :
- &PPC::VSLRCRegClass;
+ const TargetRegisterClass *SrcRC = &PPC::VSLRCRegClass;
assert((IsF8Reg(SrcMO.getReg(), MRI) ||
- IsVRReg(SrcMO.getReg(), MRI) ||
IsVSSReg(SrcMO.getReg(), MRI) ||
IsVSFReg(SrcMO.getReg(), MRI)) &&
"Unknown source for a VSX copy");
unsigned NewVReg = MRI.createVirtualRegister(SrcRC);
- BuildMI(MBB, MI, MI->getDebugLoc(),
+ BuildMI(MBB, MI, MI.getDebugLoc(),
TII->get(TargetOpcode::SUBREG_TO_REG), NewVReg)
- .addImm(1) // add 1, not 0, because there is no implicit clearing
- // of the high bits.
- .addOperand(SrcMO)
- .addImm(IsVRReg(SrcMO.getReg(), MRI) ? PPC::sub_128 :
- PPC::sub_64);
+ .addImm(1) // add 1, not 0, because there is no implicit clearing
+ // of the high bits.
+ .addOperand(SrcMO)
+ .addImm(PPC::sub_64);
// The source of the original copy is now the new virtual register.
SrcMO.setReg(NewVReg);
@@ -128,25 +122,21 @@ protected:
// This is a copy *from* a VSX register to a non-VSX register.
Changed = true;
- const TargetRegisterClass *DstRC =
- IsVRReg(DstMO.getReg(), MRI) ? &PPC::VSHRCRegClass :
- &PPC::VSLRCRegClass;
+ const TargetRegisterClass *DstRC = &PPC::VSLRCRegClass;
assert((IsF8Reg(DstMO.getReg(), MRI) ||
IsVSFReg(DstMO.getReg(), MRI) ||
- IsVSSReg(DstMO.getReg(), MRI) ||
- IsVRReg(DstMO.getReg(), MRI)) &&
+ IsVSSReg(DstMO.getReg(), MRI)) &&
"Unknown destination for a VSX copy");
// Copy the VSX value into a new VSX register of the correct subclass.
unsigned NewVReg = MRI.createVirtualRegister(DstRC);
- BuildMI(MBB, MI, MI->getDebugLoc(),
- TII->get(TargetOpcode::COPY), NewVReg)
- .addOperand(SrcMO);
+ BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(TargetOpcode::COPY),
+ NewVReg)
+ .addOperand(SrcMO);
// Transform the original copy into a subregister extraction copy.
SrcMO.setReg(NewVReg);
- SrcMO.setSubReg(IsVRReg(DstMO.getReg(), MRI) ? PPC::sub_128 :
- PPC::sub_64);
+ SrcMO.setSubReg(PPC::sub_64);
}
}
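With the Altivec cases gone, the copy-to-VSX path above always widens through the 64-bit subregister. The core idiom is SUBREG_TO_REG; a sketch under the same assumptions as the pass body (MRI, TII, MBB, MI, and SrcMO in scope):

// Sketch: widen a non-VSX 64-bit FP value into a VSX register by placing
// it in sub_64; the immediate 1 (not 0) records that the high bits of the
// wider register are not implicitly cleared.
unsigned NewVReg = MRI.createVirtualRegister(&PPC::VSLRCRegClass);
BuildMI(MBB, MI, MI.getDebugLoc(), TII->get(TargetOpcode::SUBREG_TO_REG),
        NewVReg)
    .addImm(1)
    .addOperand(SrcMO)
    .addImm(PPC::sub_64);
SrcMO.setReg(NewVReg); // the original copy now reads the widened value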
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCVSXFMAMutate.cpp b/contrib/llvm/lib/Target/PowerPC/PPCVSXFMAMutate.cpp
index 7c22cb2..f6d20ce 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCVSXFMAMutate.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCVSXFMAMutate.cpp
@@ -21,6 +21,7 @@
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/Statistic.h"
#include "llvm/CodeGen/LiveIntervalAnalysis.h"
+#include "llvm/CodeGen/MachineDominators.h"
#include "llvm/CodeGen/MachineFrameInfo.h"
#include "llvm/CodeGen/MachineFunctionPass.h"
#include "llvm/CodeGen/MachineInstrBuilder.h"
@@ -74,7 +75,7 @@ protected:
const TargetRegisterInfo *TRI = &TII->getRegisterInfo();
for (MachineBasicBlock::iterator I = MBB.begin(), IE = MBB.end();
I != IE; ++I) {
- MachineInstr *MI = I;
+ MachineInstr &MI = *I;
// The default (A-type) VSX FMA form kills the addend (it is taken from
// the target register, which is then updated to reflect the result of
@@ -82,7 +83,7 @@ protected:
// used for the product, then we can use the M-form instruction (which
// will take that value from the to-be-defined register).
- int AltOpc = PPC::getAltVSXFMAOpcode(MI->getOpcode());
+ int AltOpc = PPC::getAltVSXFMAOpcode(MI.getOpcode());
if (AltOpc == -1)
continue;
@@ -105,10 +106,10 @@ protected:
// %RM<imp-use>; VSLRC:%vreg16,%vreg18,%vreg9
// and we remove: %vreg5<def> = COPY %vreg9; VSLRC:%vreg5,%vreg9
- SlotIndex FMAIdx = LIS->getInstructionIndex(*MI);
+ SlotIndex FMAIdx = LIS->getInstructionIndex(MI);
VNInfo *AddendValNo =
- LIS->getInterval(MI->getOperand(1).getReg()).Query(FMAIdx).valueIn();
+ LIS->getInterval(MI.getOperand(1).getReg()).Query(FMAIdx).valueIn();
// This can be null if the register is undef.
if (!AddendValNo)
@@ -118,7 +119,7 @@ protected:
// The addend and this instruction must be in the same block.
- if (!AddendMI || AddendMI->getParent() != MI->getParent())
+ if (!AddendMI || AddendMI->getParent() != MI.getParent())
continue;
// The addend must be a full copy within the same register class.
@@ -182,12 +183,12 @@ protected:
// %vreg5 = A-form-op %vreg5, %vreg5, %vreg11;
// where vreg5 and vreg11 are both kills. This case would be skipped
// otherwise.
- unsigned OldFMAReg = MI->getOperand(0).getReg();
+ unsigned OldFMAReg = MI.getOperand(0).getReg();
// Find one of the product operands that is killed by this instruction.
unsigned KilledProdOp = 0, OtherProdOp = 0;
- unsigned Reg2 = MI->getOperand(2).getReg();
- unsigned Reg3 = MI->getOperand(3).getReg();
+ unsigned Reg2 = MI.getOperand(2).getReg();
+ unsigned Reg3 = MI.getOperand(3).getReg();
if (LIS->getInterval(Reg2).Query(FMAIdx).isKill()
&& Reg2 != OldFMAReg) {
KilledProdOp = 2;
@@ -214,20 +215,20 @@ protected:
// Transform: (O2 * O3) + O1 -> (O2 * O1) + O3.
- unsigned KilledProdReg = MI->getOperand(KilledProdOp).getReg();
- unsigned OtherProdReg = MI->getOperand(OtherProdOp).getReg();
+ unsigned KilledProdReg = MI.getOperand(KilledProdOp).getReg();
+ unsigned OtherProdReg = MI.getOperand(OtherProdOp).getReg();
unsigned AddSubReg = AddendMI->getOperand(1).getSubReg();
- unsigned KilledProdSubReg = MI->getOperand(KilledProdOp).getSubReg();
- unsigned OtherProdSubReg = MI->getOperand(OtherProdOp).getSubReg();
+ unsigned KilledProdSubReg = MI.getOperand(KilledProdOp).getSubReg();
+ unsigned OtherProdSubReg = MI.getOperand(OtherProdOp).getSubReg();
bool AddRegKill = AddendMI->getOperand(1).isKill();
- bool KilledProdRegKill = MI->getOperand(KilledProdOp).isKill();
- bool OtherProdRegKill = MI->getOperand(OtherProdOp).isKill();
+ bool KilledProdRegKill = MI.getOperand(KilledProdOp).isKill();
+ bool OtherProdRegKill = MI.getOperand(OtherProdOp).isKill();
bool AddRegUndef = AddendMI->getOperand(1).isUndef();
- bool KilledProdRegUndef = MI->getOperand(KilledProdOp).isUndef();
- bool OtherProdRegUndef = MI->getOperand(OtherProdOp).isUndef();
+ bool KilledProdRegUndef = MI.getOperand(KilledProdOp).isUndef();
+ bool OtherProdRegUndef = MI.getOperand(OtherProdOp).isUndef();
// If there isn't a class that fits, we can't perform the transform.
// This is needed for correctness with a mixture of VSX and Altivec
@@ -240,39 +241,39 @@ protected:
assert(OldFMAReg == AddendMI->getOperand(0).getReg() &&
"Addend copy not tied to old FMA output!");
- DEBUG(dbgs() << "VSX FMA Mutation:\n " << *MI;);
+ DEBUG(dbgs() << "VSX FMA Mutation:\n " << MI);
- MI->getOperand(0).setReg(KilledProdReg);
- MI->getOperand(1).setReg(KilledProdReg);
- MI->getOperand(3).setReg(AddendSrcReg);
+ MI.getOperand(0).setReg(KilledProdReg);
+ MI.getOperand(1).setReg(KilledProdReg);
+ MI.getOperand(3).setReg(AddendSrcReg);
- MI->getOperand(0).setSubReg(KilledProdSubReg);
- MI->getOperand(1).setSubReg(KilledProdSubReg);
- MI->getOperand(3).setSubReg(AddSubReg);
+ MI.getOperand(0).setSubReg(KilledProdSubReg);
+ MI.getOperand(1).setSubReg(KilledProdSubReg);
+ MI.getOperand(3).setSubReg(AddSubReg);
- MI->getOperand(1).setIsKill(KilledProdRegKill);
- MI->getOperand(3).setIsKill(AddRegKill);
+ MI.getOperand(1).setIsKill(KilledProdRegKill);
+ MI.getOperand(3).setIsKill(AddRegKill);
- MI->getOperand(1).setIsUndef(KilledProdRegUndef);
- MI->getOperand(3).setIsUndef(AddRegUndef);
+ MI.getOperand(1).setIsUndef(KilledProdRegUndef);
+ MI.getOperand(3).setIsUndef(AddRegUndef);
- MI->setDesc(TII->get(AltOpc));
+ MI.setDesc(TII->get(AltOpc));
// If the addend is also a multiplicand, replace it with the addend
// source in both places.
if (OtherProdReg == AddendMI->getOperand(0).getReg()) {
- MI->getOperand(2).setReg(AddendSrcReg);
- MI->getOperand(2).setSubReg(AddSubReg);
- MI->getOperand(2).setIsKill(AddRegKill);
- MI->getOperand(2).setIsUndef(AddRegUndef);
+ MI.getOperand(2).setReg(AddendSrcReg);
+ MI.getOperand(2).setSubReg(AddSubReg);
+ MI.getOperand(2).setIsKill(AddRegKill);
+ MI.getOperand(2).setIsUndef(AddRegUndef);
} else {
- MI->getOperand(2).setReg(OtherProdReg);
- MI->getOperand(2).setSubReg(OtherProdSubReg);
- MI->getOperand(2).setIsKill(OtherProdRegKill);
- MI->getOperand(2).setIsUndef(OtherProdRegUndef);
+ MI.getOperand(2).setReg(OtherProdReg);
+ MI.getOperand(2).setSubReg(OtherProdSubReg);
+ MI.getOperand(2).setIsKill(OtherProdRegKill);
+ MI.getOperand(2).setIsUndef(OtherProdRegUndef);
}
- DEBUG(dbgs() << " -> " << *MI);
+ DEBUG(dbgs() << " -> " << MI);
// The killed product operand was killed here, so we can reuse it now
// for the result of the fma.
@@ -374,6 +375,8 @@ public:
AU.addPreserved<LiveIntervals>();
AU.addRequired<SlotIndexes>();
AU.addPreserved<SlotIndexes>();
+ AU.addRequired<MachineDominatorTree>();
+ AU.addPreserved<MachineDominatorTree>();
MachineFunctionPass::getAnalysisUsage(AU);
}
};
@@ -383,6 +386,7 @@ INITIALIZE_PASS_BEGIN(PPCVSXFMAMutate, DEBUG_TYPE,
"PowerPC VSX FMA Mutation", false, false)
INITIALIZE_PASS_DEPENDENCY(LiveIntervals)
INITIALIZE_PASS_DEPENDENCY(SlotIndexes)
+INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree)
INITIALIZE_PASS_END(PPCVSXFMAMutate, DEBUG_TYPE,
"PowerPC VSX FMA Mutation", false, false)
diff --git a/contrib/llvm/lib/Target/PowerPC/PPCVSXSwapRemoval.cpp b/contrib/llvm/lib/Target/PowerPC/PPCVSXSwapRemoval.cpp
index d53c8e3..8197285 100644
--- a/contrib/llvm/lib/Target/PowerPC/PPCVSXSwapRemoval.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/PPCVSXSwapRemoval.cpp
@@ -962,7 +962,8 @@ void PPCVSXSwapRemoval::dumpSwapVector() {
DEBUG(dbgs() << format("%6d", ID));
DEBUG(dbgs() << format("%6d", EC->getLeaderValue(ID)));
DEBUG(dbgs() << format(" BB#%3d", MI->getParent()->getNumber()));
- DEBUG(dbgs() << format(" %14s ", TII->getName(MI->getOpcode())));
+ DEBUG(dbgs() << format(" %14s ",
+ TII->getName(MI->getOpcode()).str().c_str()));
if (SwapVector[EntryIdx].IsLoad)
DEBUG(dbgs() << "load ");
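The extra .str().c_str() above is needed because getName() now returns a StringRef, which is not guaranteed to be NUL-terminated; a printf-style formatter therefore needs a std::string copy. In isolation:

// Sketch: format a StringRef through a printf-style API safely.
StringRef Name = TII->getName(MI->getOpcode());
DEBUG(dbgs() << format(" %14s ", Name.str().c_str()));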
diff --git a/contrib/llvm/lib/Target/PowerPC/TargetInfo/PowerPCTargetInfo.cpp b/contrib/llvm/lib/Target/PowerPC/TargetInfo/PowerPCTargetInfo.cpp
index 5b2fe19..a637dd1 100644
--- a/contrib/llvm/lib/Target/PowerPC/TargetInfo/PowerPCTargetInfo.cpp
+++ b/contrib/llvm/lib/Target/PowerPC/TargetInfo/PowerPCTargetInfo.cpp
@@ -12,15 +12,26 @@
#include "llvm/Support/TargetRegistry.h"
using namespace llvm;
-Target llvm::ThePPC32Target, llvm::ThePPC64Target, llvm::ThePPC64LETarget;
+Target &llvm::getThePPC32Target() {
+ static Target ThePPC32Target;
+ return ThePPC32Target;
+}
+Target &llvm::getThePPC64Target() {
+ static Target ThePPC64Target;
+ return ThePPC64Target;
+}
+Target &llvm::getThePPC64LETarget() {
+ static Target ThePPC64LETarget;
+ return ThePPC64LETarget;
+}
extern "C" void LLVMInitializePowerPCTargetInfo() {
- RegisterTarget<Triple::ppc, /*HasJIT=*/true>
- X(ThePPC32Target, "ppc32", "PowerPC 32");
+ RegisterTarget<Triple::ppc, /*HasJIT=*/true> X(getThePPC32Target(), "ppc32",
+ "PowerPC 32");
- RegisterTarget<Triple::ppc64, /*HasJIT=*/true>
- Y(ThePPC64Target, "ppc64", "PowerPC 64");
+ RegisterTarget<Triple::ppc64, /*HasJIT=*/true> Y(getThePPC64Target(), "ppc64",
+ "PowerPC 64");
- RegisterTarget<Triple::ppc64le, /*HasJIT=*/true>
- Z(ThePPC64LETarget, "ppc64le", "PowerPC 64 LE");
+ RegisterTarget<Triple::ppc64le, /*HasJIT=*/true> Z(
+ getThePPC64LETarget(), "ppc64le", "PowerPC 64 LE");
}
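The target accessors above replace mutable globals with function-local statics, the usual way to sidestep cross-translation-unit static initialization order problems; since C++11 the first call also initializes the object thread-safely. The pattern in miniature (names illustrative):

// Sketch: lazy, order-safe singleton access.
Target &getTheExampleTarget() {
  static Target TheExampleTarget; // constructed on first use
  return TheExampleTarget;
}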