author     dim <dim@FreeBSD.org>    2016-12-26 20:36:37 +0000
committer  dim <dim@FreeBSD.org>    2016-12-26 20:36:37 +0000
commit     06210ae42d418d50d8d9365d5c9419308ae9e7ee (patch)
tree       ab60b4cdd6e430dda1f292a46a77ddb744723f31 /contrib/llvm/lib/Target/X86/X86CallFrameOptimization.cpp
parent     2dd166267f53df1c3748b4325d294b9b839de74b (diff)
MFC r309124:
Upgrade our copies of clang, llvm, lldb, compiler-rt and libc++ to 3.9.0
release, and add lld 3.9.0. Also completely revamp the build system for
clang, llvm, lldb and their related tools.
Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11
support to build; see UPDATING for more information.
Release notes for llvm, clang and lld are available here:
<http://llvm.org/releases/3.9.0/docs/ReleaseNotes.html>
<http://llvm.org/releases/3.9.0/tools/clang/docs/ReleaseNotes.html>
<http://llvm.org/releases/3.9.0/tools/lld/docs/ReleaseNotes.html>
Thanks to Ed Maste, Bryan Drewery, Andrew Turner, Antoine Brodin and Jan
Beich for their help.
Relnotes: yes
MFC r309147:
Pull in r282174 from upstream llvm trunk (by Krzysztof Parzyszek):
[PPC] Set SP after loading data from stack frame, if no red zone is
present
Follow-up to r280705: Make sure that the SP is only restored after
all data is loaded from the stack frame, if there is no red zone.
This completes the fix for
https://llvm.org/bugs/show_bug.cgi?id=26519.
Differential Revision: https://reviews.llvm.org/D24466
Reported by: Mark Millard
PR: 214433
MFC r309149:
Pull in r283060 from upstream llvm trunk (by Hal Finkel):
[PowerPC] Refactor soft-float support, and enable PPC64 soft float
This change enables soft-float for PowerPC64, and also makes
soft-float disable all vector instruction sets for both 32-bit and
64-bit modes. This latter part is necessary because the PPC backend
canonicalizes many Altivec vector types to floating-point types, and
so soft-float breaks scalarization support for many operations. Both
for embedded targets and for operating-system kernels desiring
soft-float support, it seems reasonable that disabling hardware
floating-point also disables vector instructions (embedded targets
without hardware floating point support are unlikely to have Altivec,
etc. and operating system kernels desiring not to use floating-point
registers to lower syscall cost are unlikely to want to use vector
registers either). If someone needs this to work, we'll need to
change the fact that we promote many Altivec operations to act on
v4f32. To make it possible to disable Altivec when soft-float is
enabled, hardware floating-point support needs to be expressed as a
positive feature, like the others, and not a negative feature,
because target features cannot have dependencies on the disabling of
some other feature. So +soft-float has now become -hard-float.
Fixes PR26970.
Pull in r283061 from upstream clang trunk (by Hal Finkel):
[PowerPC] Enable soft-float for PPC64, and +soft-float -> -hard-float
Enable soft-float support on PPC64, as the backend now supports it.
Also, the backend now uses -hard-float instead of +soft-float, so set
the target features accordingly.
Fixes PR26970.
Reported by: Mark Millard
PR: 214433
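As a concrete illustration of what the soft-float change above means for generated code: with a soft-float target configuration, ordinary floating-point arithmetic is lowered to runtime library calls instead of FPU (or AltiVec/VSX) instructions. The snippet below is only a sketch of that behaviour, not code from this commit; the exact helper symbol depends on the runtime (e.g. an __adddf3-style routine in compiler-rt or libgcc).

  // Built for a soft-float target (e.g. with -msoft-float), this addition
  // is emitted as a call to a soft-float runtime helper rather than as a
  // hardware floating-point instruction, and no vector unit is involved.
  double add_doubles(double a, double b) {
    return a + b;
  }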
MFC r309212:
Add a few missed clang 3.9.0 files to OptionalObsoleteFiles.
MFC r309262:
Fix packaging for clang, lldb and lld 3.9.0
During the upgrade of clang/llvm etc to 3.9.0 in r309124, the PACKAGE
directive in the usr.bin/clang/*.mk files got dropped accidentally.
Restore it, with a few minor changes and additions:
* Correct license in clang.ucl to NCSA
* Add PACKAGE=clang for clang and most of the "ll" tools
* Put lldb in its own package
* Put lld in its own package
Reviewed by: gjb, jmallett
Differential Revision: https://reviews.freebsd.org/D8666
MFC r309656:
During the bootstrap phase, when building the minimal llvm library on
PowerPC, add lib/Support/Atomic.cpp. This is needed because upstream
llvm revision r271821 disabled the use of std::call_once, which causes
some fallback functions from Atomic.cpp to be used instead.
Reported by: Mark Millard
PR: 214902
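For illustration, the general pattern such fallbacks implement is once-only initialization built on an atomic compare-and-swap instead of std::call_once. The sketch below shows that pattern only; the names and the use of std::atomic are illustrative assumptions, not the actual LLVM interfaces in Atomic.cpp.

  #include <atomic>

  enum OnceState : int { Uninitialized = 0, InProgress = 1, Done = 2 };

  // Hypothetical fallback: run Init exactly once even if several threads
  // race to call this, using only an atomic compare-and-swap.
  template <typename Fn>
  void callOnceFallback(std::atomic<int> &Flag, Fn Init) {
    int Expected = Uninitialized;
    if (Flag.compare_exchange_strong(Expected, InProgress)) {
      Init();                                  // this thread won the race
      Flag.store(Done, std::memory_order_release);
      return;
    }
    // Another thread is (or was) initializing; wait until it finishes.
    while (Flag.load(std::memory_order_acquire) != Done) {
      // busy-wait; a real implementation would yield or sleep here
    }
  }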
MFC r309835:
Tentatively apply https://reviews.llvm.org/D18730 to work around gcc PR
70528 (bogus error: constructor required before non-static data member).
This should fix buildworld with the external gcc package.
Reported by: https://jenkins.freebsd.org/job/FreeBSD_HEAD_amd64_gcc/
MFC r310194:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
3.9.1 release.
Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11
support to build; see UPDATING for more information.
Release notes for llvm, clang and lld will be available here:
<http://releases.llvm.org/3.9.1/docs/ReleaseNotes.html>
<http://releases.llvm.org/3.9.1/tools/clang/docs/ReleaseNotes.html>
<http://releases.llvm.org/3.9.1/tools/lld/docs/ReleaseNotes.html>
Relnotes: yes
Diffstat (limited to 'contrib/llvm/lib/Target/X86/X86CallFrameOptimization.cpp')
-rw-r--r--  contrib/llvm/lib/Target/X86/X86CallFrameOptimization.cpp | 191
1 file changed, 112 insertions, 79 deletions
diff --git a/contrib/llvm/lib/Target/X86/X86CallFrameOptimization.cpp b/contrib/llvm/lib/Target/X86/X86CallFrameOptimization.cpp
index fc6ee17..8f6fc40 100644
--- a/contrib/llvm/lib/Target/X86/X86CallFrameOptimization.cpp
+++ b/contrib/llvm/lib/Target/X86/X86CallFrameOptimization.cpp
@@ -10,9 +10,9 @@
 // This file defines a pass that optimizes call sequences on x86.
 // Currently, it converts movs of function parameters onto the stack into
 // pushes. This is beneficial for two main reasons:
-// 1) The push instruction encoding is much smaller than an esp-relative mov
+// 1) The push instruction encoding is much smaller than a stack-ptr-based mov.
 // 2) It is possible to push memory arguments directly. So, if the
-//    the transformation is preformed pre-reg-alloc, it can help relieve
+//    the transformation is performed pre-reg-alloc, it can help relieve
 //    register pressure.
 //
 //===----------------------------------------------------------------------===//
@@ -21,8 +21,8 @@
 
 #include "X86.h"
 #include "X86InstrInfo.h"
-#include "X86Subtarget.h"
 #include "X86MachineFunctionInfo.h"
+#include "X86Subtarget.h"
 #include "llvm/ADT/Statistic.h"
 #include "llvm/CodeGen/MachineFunctionPass.h"
 #include "llvm/CodeGen/MachineInstrBuilder.h"
@@ -55,7 +55,7 @@ private:
   struct CallContext {
     CallContext()
         : FrameSetup(nullptr), Call(nullptr), SPCopy(nullptr), ExpectedDist(0),
-          MovVector(4, nullptr), NoStackParams(false), UsePush(false){}
+          MovVector(4, nullptr), NoStackParams(false), UsePush(false) {}
 
     // Iterator referring to the frame setup instruction
     MachineBasicBlock::iterator FrameSetup;
@@ -75,7 +75,7 @@ private:
     // True if this call site has no stack parameters
     bool NoStackParams;
 
-    // True of this callsite can use push instructions
+    // True if this call site can use push instructions
     bool UsePush;
   };
 
@@ -88,7 +88,7 @@ private:
   void collectCallInfo(MachineFunction &MF, MachineBasicBlock &MBB,
                        MachineBasicBlock::iterator I, CallContext &Context);
 
-  bool adjustCallSequence(MachineFunction &MF, const CallContext &Context);
+  void adjustCallSequence(MachineFunction &MF, const CallContext &Context);
 
   MachineInstr *canFoldIntoRegPush(MachineBasicBlock::iterator FrameSetup,
                                    unsigned Reg);
@@ -105,12 +105,14 @@ private:
   const TargetInstrInfo *TII;
   const X86FrameLowering *TFL;
   const X86Subtarget *STI;
-  const MachineRegisterInfo *MRI;
+  MachineRegisterInfo *MRI;
+  unsigned SlotSize;
+  unsigned Log2SlotSize;
   static char ID;
 };
 
 char X86CallFrameOptimization::ID = 0;
-}
+} // end anonymous namespace
 
 FunctionPass *llvm::createX86CallFrameOptimization() {
   return new X86CallFrameOptimization();
 }
@@ -123,22 +125,24 @@ bool X86CallFrameOptimization::isLegal(MachineFunction &MF) {
   if (NoX86CFOpt.getValue())
     return false;
 
-  // We currently only support call sequences where *all* parameters.
-  // are passed on the stack.
-  // No point in running this in 64-bit mode, since some arguments are
-  // passed in-register in all common calling conventions, so the pattern
-  // we're looking for will never match.
-  if (STI->is64Bit())
+  // Work around LLVM PR30879 (bad interaction between CFO and libunwind)
+  if (STI->isTargetFreeBSD() && STI->is32Bit() &&
+      STI->getTargetTriple().getOSMajorVersion() >= 12)
     return false;
 
   // We can't encode multiple DW_CFA_GNU_args_size or DW_CFA_def_cfa_offset
   // in the compact unwind encoding that Darwin uses. So, bail if there
   // is a danger of that being generated.
-  if (STI->isTargetDarwin() &&
-      (!MF.getMMI().getLandingPads().empty() ||
+  if (STI->isTargetDarwin() &&
+      (!MF.getMMI().getLandingPads().empty() ||
       (MF.getFunction()->needsUnwindTableEntry() && !TFL->hasFP(MF))))
    return false;
 
+  // It is not valid to change the stack pointer outside the prolog/epilog
+  // on 64-bit Windows.
+  if (STI->isTargetWin64())
+    return false;
+
   // You would expect straight-line code between call-frame setup and
   // call-frame destroy. You would be wrong. There are circumstances (e.g.
   // CMOV_GR8 expansion of a select that feeds a function call!) where we can
@@ -169,10 +173,10 @@ bool X86CallFrameOptimization::isLegal(MachineFunction &MF) {
   return true;
 }
 
-// Check whether this trasnformation is profitable for a particular
+// Check whether this transformation is profitable for a particular
 // function - in terms of code size.
-bool X86CallFrameOptimization::isProfitable(MachineFunction &MF,
-                                            ContextVector &CallSeqVector) {
+bool X86CallFrameOptimization::isProfitable(MachineFunction &MF,
+                                            ContextVector &CallSeqVector) {
   // This transformation is always a win when we do not expect to have
   // a reserved call frame. Under other circumstances, it may be either
   // a win or a loss, and requires a heuristic.
@@ -180,10 +184,6 @@ bool X86CallFrameOptimization::isProfitable(MachineFunction &MF,
   if (CannotReserveFrame)
     return true;
 
-  // Don't do this when not optimizing for size.
-  if (!MF.getFunction()->optForSize())
-    return false;
-
   unsigned StackAlign = TFL->getStackAlignment();
 
   int64_t Advantage = 0;
@@ -206,16 +206,16 @@ bool X86CallFrameOptimization::isProfitable(MachineFunction &MF,
       // We can use pushes. First, account for the fixed costs.
       // We'll need a add after the call.
       Advantage -= 3;
-      // If we have to realign the stack, we'll also need and sub before
+      // If we have to realign the stack, we'll also need a sub before
       if (CC.ExpectedDist % StackAlign)
         Advantage -= 3;
       // Now, for each push, we save ~3 bytes. For small constants, we actually,
       // save more (up to 5 bytes), but 3 should be a good approximation.
-      Advantage += (CC.ExpectedDist / 4) * 3;
+      Advantage += (CC.ExpectedDist >> Log2SlotSize) * 3;
     }
   }
 
-  return (Advantage >= 0);
+  return Advantage >= 0;
 }
 
 bool X86CallFrameOptimization::runOnMachineFunction(MachineFunction &MF) {
@@ -224,6 +224,12 @@ bool X86CallFrameOptimization::runOnMachineFunction(MachineFunction &MF) {
   TFL = STI->getFrameLowering();
   MRI = &MF.getRegInfo();
 
+  const X86RegisterInfo &RegInfo =
+      *static_cast<const X86RegisterInfo *>(STI->getRegisterInfo());
+  SlotSize = RegInfo.getSlotSize();
+  assert(isPowerOf2_32(SlotSize) && "Expect power of 2 stack slot size");
+  Log2SlotSize = Log2_32(SlotSize);
+
   if (!isLegal(MF))
     return false;
 
@@ -233,20 +239,23 @@ bool X86CallFrameOptimization::runOnMachineFunction(MachineFunction &MF) {
 
   ContextVector CallSeqVector;
 
-  for (MachineFunction::iterator BB = MF.begin(), E = MF.end(); BB != E; ++BB)
-    for (MachineBasicBlock::iterator I = BB->begin(); I != BB->end(); ++I)
-      if (I->getOpcode() == FrameSetupOpcode) {
+  for (auto &MBB : MF)
+    for (auto &MI : MBB)
+      if (MI.getOpcode() == FrameSetupOpcode) {
         CallContext Context;
-        collectCallInfo(MF, *BB, I, Context);
+        collectCallInfo(MF, MBB, MI, Context);
         CallSeqVector.push_back(Context);
       }
 
   if (!isProfitable(MF, CallSeqVector))
     return false;
 
-  for (auto CC : CallSeqVector)
-    if (CC.UsePush)
-      Changed |= adjustCallSequence(MF, CC);
+  for (auto CC : CallSeqVector) {
+    if (CC.UsePush) {
+      adjustCallSequence(MF, CC);
+      Changed = true;
+    }
+  }
 
   return Changed;
 }
@@ -260,7 +269,8 @@ X86CallFrameOptimization::classifyInstruction(
 
   // The instructions we actually care about are movs onto the stack
   int Opcode = MI->getOpcode();
-  if (Opcode == X86::MOV32mi || Opcode == X86::MOV32mr)
+  if (Opcode == X86::MOV32mi || Opcode == X86::MOV32mr ||
+      Opcode == X86::MOV64mi32 || Opcode == X86::MOV64mr)
     return Convert;
 
   // Not all calling conventions have only stack MOVs between the stack
@@ -315,8 +325,8 @@ void X86CallFrameOptimization::collectCallInfo(MachineFunction &MF,
                                                CallContext &Context) {
   // Check that this particular call sequence is amenable to the
   // transformation.
-  const X86RegisterInfo &RegInfo = *static_cast<const X86RegisterInfo *>(
-      STI->getRegisterInfo());
+  const X86RegisterInfo &RegInfo =
+      *static_cast<const X86RegisterInfo *>(STI->getRegisterInfo());
   unsigned FrameDestroyOpcode = TII->getCallFrameDestroyOpcode();
 
   // We expect to enter this at the beginning of a call sequence
@@ -326,7 +336,8 @@ void X86CallFrameOptimization::collectCallInfo(MachineFunction &MF,
 
   // How much do we adjust the stack? This puts an upper bound on
   // the number of parameters actually passed on it.
-  unsigned int MaxAdjust = FrameSetup->getOperand(0).getImm() / 4;
+  unsigned int MaxAdjust =
+      FrameSetup->getOperand(0).getImm() >> Log2SlotSize;
 
   // A zero adjustment means no stack parameters
   if (!MaxAdjust) {
@@ -340,19 +351,19 @@ void X86CallFrameOptimization::collectCallInfo(MachineFunction &MF,
   while (I->getOpcode() == X86::LEA32r)
     ++I;
 
-  // We expect a copy instruction here.
-  // TODO: The copy instruction is a lowering artifact.
-  // We should also support a copy-less version, where the stack
-  // pointer is used directly.
-  if (!I->isCopy() || !I->getOperand(0).isReg())
-    return;
-  Context.SPCopy = I++;
-
-  unsigned StackPtr = Context.SPCopy->getOperand(0).getReg();
+  unsigned StackPtr = RegInfo.getStackRegister();
+  // SelectionDAG (but not FastISel) inserts a copy of ESP into a virtual
+  // register here. If it's there, use that virtual register as stack pointer
+  // instead.
+  if (I->isCopy() && I->getOperand(0).isReg() && I->getOperand(1).isReg() &&
+      I->getOperand(1).getReg() == StackPtr) {
+    Context.SPCopy = &*I++;
+    StackPtr = Context.SPCopy->getOperand(0).getReg();
+  }
 
   // Scan the call setup sequence for the pattern we're looking for.
-  // We only handle a simple case - a sequence of MOV32mi or MOV32mr
-  // instructions, that push a sequence of 32-bit values onto the stack, with
+  // We only handle a simple case - a sequence of store instructions that
+  // push a sequence of stack-slot-aligned values onto the stack, with
   // no gaps between them.
   if (MaxAdjust > 4)
     Context.MovVector.resize(MaxAdjust, nullptr);
@@ -367,9 +378,9 @@ void X86CallFrameOptimization::collectCallInfo(MachineFunction &MF,
       continue;
     }
 
-    // We know the instruction is a MOV32mi/MOV32mr.
+    // We know the instruction has a supported store opcode.
     // We only want movs of the form:
-    // movl imm/r32, k(%esp)
+    // mov imm/reg, k(%StackPtr)
     // If we run into something else, bail.
     // Note that AddrBaseReg may, counter to its name, not be a register,
     // but rather a frame index.
@@ -390,9 +401,9 @@ void X86CallFrameOptimization::collectCallInfo(MachineFunction &MF,
           "Negative stack displacement when passing parameters");
 
    // We really don't want to consider the unaligned case.
-    if (StackDisp % 4)
+    if (StackDisp & (SlotSize - 1))
      return;
-    StackDisp /= 4;
+    StackDisp >>= Log2SlotSize;
 
    assert((size_t)StackDisp < Context.MovVector.size() &&
           "Function call has more parameters than the stack is adjusted for.");
@@ -400,7 +411,7 @@ void X86CallFrameOptimization::collectCallInfo(MachineFunction &MF,
    // If the same stack slot is being filled twice, something's fishy.
    if (Context.MovVector[StackDisp] != nullptr)
      return;
-    Context.MovVector[StackDisp] = I;
+    Context.MovVector[StackDisp] = &*I;
 
    for (const MachineOperand &MO : I->uses()) {
      if (!MO.isReg())
@@ -418,14 +429,14 @@ void X86CallFrameOptimization::collectCallInfo(MachineFunction &MF,
   if (I == MBB.end() || !I->isCall())
     return;
 
-  Context.Call = I;
+  Context.Call = &*I;
   if ((++I)->getOpcode() != FrameDestroyOpcode)
     return;
 
   // Now, go through the vector, and see that we don't have any gaps,
-  // but only a series of 32-bit MOVs.
+  // but only a series of MOVs.
   auto MMI = Context.MovVector.begin(), MME = Context.MovVector.end();
-  for (; MMI != MME; ++MMI, Context.ExpectedDist += 4)
+  for (; MMI != MME; ++MMI, Context.ExpectedDist += SlotSize)
     if (*MMI == nullptr)
       break;
 
@@ -440,10 +451,9 @@ void X86CallFrameOptimization::collectCallInfo(MachineFunction &MF,
     return;
 
   Context.UsePush = true;
-  return;
 }
 
-bool X86CallFrameOptimization::adjustCallSequence(MachineFunction &MF,
+void X86CallFrameOptimization::adjustCallSequence(MachineFunction &MF,
                                                   const CallContext &Context) {
   // Ok, we can in fact do the transformation for this call.
   // Do not remove the FrameSetup instruction, but adjust the parameters.
@@ -453,15 +463,21 @@ bool X86CallFrameOptimization::adjustCallSequence(MachineFunction &MF,
   FrameSetup->getOperand(1).setImm(Context.ExpectedDist);
 
   DebugLoc DL = FrameSetup->getDebugLoc();
+  bool Is64Bit = STI->is64Bit();
   // Now, iterate through the vector in reverse order, and replace the movs
   // with pushes. MOVmi/MOVmr doesn't have any defs, so no need to
   // replace uses.
-  for (int Idx = (Context.ExpectedDist / 4) - 1; Idx >= 0; --Idx) {
+  for (int Idx = (Context.ExpectedDist >> Log2SlotSize) - 1; Idx >= 0; --Idx) {
     MachineBasicBlock::iterator MOV = *Context.MovVector[Idx];
     MachineOperand PushOp = MOV->getOperand(X86::AddrNumOperands);
     MachineBasicBlock::iterator Push = nullptr;
-    if (MOV->getOpcode() == X86::MOV32mi) {
-      unsigned PushOpcode = X86::PUSHi32;
+    unsigned PushOpcode;
+    switch (MOV->getOpcode()) {
+    default:
+      llvm_unreachable("Unexpected Opcode!");
+    case X86::MOV32mi:
+    case X86::MOV64mi32:
+      PushOpcode = Is64Bit ? X86::PUSH64i32 : X86::PUSHi32;
       // If the operand is a small (8-bit) immediate, we can use a
       // PUSH instruction with a shorter encoding.
       // Note that isImm() may fail even though this is a MOVmi, because
@@ -469,13 +485,27 @@ bool X86CallFrameOptimization::adjustCallSequence(MachineFunction &MF,
       if (PushOp.isImm()) {
         int64_t Val = PushOp.getImm();
         if (isInt<8>(Val))
-          PushOpcode = X86::PUSH32i8;
+          PushOpcode = Is64Bit ? X86::PUSH64i8 : X86::PUSH32i8;
       }
       Push = BuildMI(MBB, Context.Call, DL, TII->get(PushOpcode))
-          .addOperand(PushOp);
-    } else {
+                 .addOperand(PushOp);
+      break;
+    case X86::MOV32mr:
+    case X86::MOV64mr:
       unsigned int Reg = PushOp.getReg();
+
+      // If storing a 32-bit vreg on 64-bit targets, extend to a 64-bit vreg
+      // in preparation for the PUSH64. The upper 32 bits can be undef.
+      if (Is64Bit && MOV->getOpcode() == X86::MOV32mr) {
+        unsigned UndefReg = MRI->createVirtualRegister(&X86::GR64RegClass);
+        Reg = MRI->createVirtualRegister(&X86::GR64RegClass);
+        BuildMI(MBB, Context.Call, DL, TII->get(X86::IMPLICIT_DEF), UndefReg);
+        BuildMI(MBB, Context.Call, DL, TII->get(X86::INSERT_SUBREG), Reg)
+            .addReg(UndefReg)
+            .addOperand(PushOp)
+            .addImm(X86::sub_32bit);
+      }
+
       // If PUSHrmm is not slow on this target, try to fold the source of the
       // push into the instruction.
       bool SlowPUSHrmm = STI->isAtom() || STI->isSLM();
@@ -484,7 +514,8 @@ bool X86CallFrameOptimization::adjustCallSequence(MachineFunction &MF,
       // conservative about that.
       MachineInstr *DefMov = nullptr;
       if (!SlowPUSHrmm && (DefMov = canFoldIntoRegPush(FrameSetup, Reg))) {
-        Push = BuildMI(MBB, Context.Call, DL, TII->get(X86::PUSH32rmm));
+        PushOpcode = Is64Bit ? X86::PUSH64rmm : X86::PUSH32rmm;
+        Push = BuildMI(MBB, Context.Call, DL, TII->get(PushOpcode));
 
         unsigned NumOps = DefMov->getDesc().getNumOperands();
         for (unsigned i = NumOps - X86::AddrNumOperands; i != NumOps; ++i)
@@ -492,33 +523,34 @@ bool X86CallFrameOptimization::adjustCallSequence(MachineFunction &MF,
 
         DefMov->eraseFromParent();
       } else {
-        Push = BuildMI(MBB, Context.Call, DL, TII->get(X86::PUSH32r))
-            .addReg(Reg)
-            .getInstr();
+        PushOpcode = Is64Bit ? X86::PUSH64r : X86::PUSH32r;
+        Push = BuildMI(MBB, Context.Call, DL, TII->get(PushOpcode))
+                   .addReg(Reg)
+                   .getInstr();
       }
+      break;
     }
 
    // For debugging, when using SP-based CFA, we need to adjust the CFA
    // offset after each push.
    // TODO: This is needed only if we require precise CFA.
    if (!TFL->hasFP(MF))
-      TFL->BuildCFI(MBB, std::next(Push), DL,
-                    MCCFIInstruction::createAdjustCfaOffset(nullptr, 4));
+      TFL->BuildCFI(
+          MBB, std::next(Push), DL,
+          MCCFIInstruction::createAdjustCfaOffset(nullptr, SlotSize));
 
    MBB.erase(MOV);
  }
 
  // The stack-pointer copy is no longer used in the call sequences.
  // There should not be any other users, but we can't commit to that, so:
-  if (MRI->use_empty(Context.SPCopy->getOperand(0).getReg()))
+  if (Context.SPCopy && MRI->use_empty(Context.SPCopy->getOperand(0).getReg()))
    Context.SPCopy->eraseFromParent();
 
  // Once we've done this, we need to make sure PEI doesn't assume a reserved
  // frame.
  X86MachineFunctionInfo *FuncInfo = MF.getInfo<X86MachineFunctionInfo>();
  FuncInfo->setHasPushSequences(true);
-
-  return true;
 }
 
 MachineInstr *X86CallFrameOptimization::canFoldIntoRegPush(
@@ -540,19 +572,20 @@ MachineInstr *X86CallFrameOptimization::canFoldIntoRegPush(
   if (!MRI->hasOneNonDBGUse(Reg))
     return nullptr;
 
-  MachineBasicBlock::iterator DefMI = MRI->getVRegDef(Reg);
+  MachineInstr &DefMI = *MRI->getVRegDef(Reg);
 
   // Make sure the def is a MOV from memory.
-  // If the def is an another block, give up.
-  if (DefMI->getOpcode() != X86::MOV32rm ||
-      DefMI->getParent() != FrameSetup->getParent())
+  // If the def is in another block, give up.
+  if ((DefMI.getOpcode() != X86::MOV32rm &&
+       DefMI.getOpcode() != X86::MOV64rm) ||
+      DefMI.getParent() != FrameSetup->getParent())
     return nullptr;
 
   // Make sure we don't have any instructions between DefMI and the
   // push that make folding the load illegal.
-  for (auto I = DefMI; I != FrameSetup; ++I)
+  for (MachineBasicBlock::iterator I = DefMI; I != FrameSetup; ++I)
     if (I->isLoadFoldBarrier())
       return nullptr;
 
-  return DefMI;
+  return &DefMI;
 }
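For readers unfamiliar with the pass being modified above: it rewrites the movs that store outgoing stack arguments into push instructions. A call of the following shape is the kind of code it targets; this is only an illustrative sketch, and the actual lowering depends on the target, the calling convention and the optimization level.

  // With the 32-bit x86 C calling convention, the three arguments are
  // passed on the stack. Without the pass they are materialized as
  // stack-pointer-relative mov instructions; the pass turns the profitable
  // cases into a sequence of push instructions instead.
  extern "C" void callee(int a, int b, int c);

  void caller() {
    callee(1, 2, 3);
  }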