diff options
author | dim <dim@FreeBSD.org> | 2017-09-26 19:56:36 +0000 |
---|---|---|
committer | dim <dim@FreeBSD.org> | 2017-09-26 19:56:36 +0000 |
commit | 12cd91cf4c6b96a24427c0de5374916f2808d263 (patch) | |
tree | 6d243b0ccba6738dbbd30767188e2963f90ef18f /contrib/llvm/lib/CodeGen/MachineBlockPlacement.cpp | |
parent | b60520398f206195e21774c315afb59a0f6d7146 (diff) | |
download | FreeBSD-src-12cd91cf4c6b96a24427c0de5374916f2808d263.zip FreeBSD-src-12cd91cf4c6b96a24427c0de5374916f2808d263.tar.gz |
Merge clang, llvm, lld, lldb, compiler-rt and libc++ 5.0.0 release.
MFC r309126 (by emaste):
Correct lld llvm-tblgen dependency file name
MFC r309169:
Get rid of separate Subversion mergeinfo properties for llvm-dwarfdump
and llvm-lto. The mergeinfo confuses Subversion enormously, and these
directories will just use the mergeinfo for llvm itself.
MFC r312765:
Pull in r276136 from upstream llvm trunk (by Wei Mi):
Use ValueOffsetPair to enhance value reuse during SCEV expansion.
In D12090, the ExprValueMap was added to reuse existing value during
SCEV expansion. However, const folding and sext/zext distribution can
make the reuse still difficult.
A simplified case is: suppose we know S1 expands to V1 in
ExprValueMap, and
S1 = S2 + C_a
S3 = S2 + C_b
where C_a and C_b are different SCEVConstants. Then we'd like to
expand S3 as V1 - C_a + C_b instead of expanding S2 literally. It is
helpful when S2 is a complex SCEV expr and S2 has no entry in
ExprValueMap, which is usually caused by the fact that S3 is
generated from S1 after const folding.
In order to do that, we represent ExprValueMap as a mapping from SCEV
to ValueOffsetPair. We will save both S1->{V1, 0} and S2->{V1, C_a}
into the ExprValueMap when we create SCEV for V1. When S3 is
expanded, it will first expand S2 to V1 - C_a because of S2->{V1,
C_a} in the map, then expand S3 to V1 - C_a + C_b.
Differential Revision: https://reviews.llvm.org/D21313
This should fix assertion failures when building OpenCV >= 3.1.
PR: 215649
MFC r312831:
Revert r312765 for now, since it causes assertions when building
lang/spidermonkey24.
Reported by: antoine
PR: 215649
MFC r316511 (by jhb):
Add an implementation of __ffssi2() derived from __ffsdi2().
Newer versions of GCC include an __ffssi2() symbol in libgcc and the
compiler can emit calls to it in generated code. This is true for at
least GCC 6.2 when compiling world for mips and mips64.
Reviewed by: jmallett, dim
Sponsored by: DARPA / AFRL
Differential Revision: https://reviews.freebsd.org/D10086
MFC r318601 (by adrian):
[libcompiler-rt] add bswapdi2/bswapsi2
This is required for mips gcc 6.3 userland to build/run.
Reviewed by: emaste, dim
Approved by: emaste
Differential Revision: https://reviews.freebsd.org/D10838
MFC r318884 (by emaste):
lldb: map TRAP_CAP to a trace trap
In the absense of a more specific handler for TRAP_CAP (generated by
ENOTCAPABLE or ECAPMODE while in capability mode) treat it as a trace
trap.
Example usage (testing the bug in PR219173):
% proccontrol -m trapcap lldb usr.bin/hexdump/obj/hexdump -- -Cv -s 1 /bin/ls
...
(lldb) run
Process 12980 launching
Process 12980 launched: '.../usr.bin/hexdump/obj/hexdump' (x86_64)
Process 12980 stopped
* thread #1, stop reason = trace
frame #0: 0x0000004b80c65f1a libc.so.7`__sys_lseek + 10
...
In the future we should have LLDB control the trapcap procctl itself
(as it does with ASLR), as well as report a specific stop reason.
This change eliminates an assertion failure from LLDB for now.
MFC r319796:
Remove a few unneeded files from libllvm, libclang and liblldb.
MFC r319885 (by emaste):
lld: ELF: Fix ICF crash on absolute symbol relocations.
If two sections contained relocations to absolute symbols with the same
value we would crash when trying to access their sections. Add a check that
both symbols point to sections before accessing their sections, and treat
absolute symbols as equal if their values are equal.
Obtained from: LLD commit r292578
MFC r319918:
Revert r319796 for now, it can cause undefined references when linking
in some circumstances.
Reported by: Shawn Webb <shawn.webb@hardenedbsd.org>
MFC r319957 (by emaste):
lld: Add armelf emulation mode
Obtained from: LLD r305375
MFC r321369:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
5.0.0 (trunk r308421). Upstream has branched for the 5.0.0 release,
which should be in about a month. Please report bugs and regressions,
so we can get them into the release.
Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11
support to build; see UPDATING for more information.
MFC r321420:
Add a few more object files to liblldb, which should solve errors when
linking the lldb executable in some cases. In particular, when the
-ffunction-sections -fdata-sections options are turned off, or
ineffective.
Reported by: Shawn Webb, Mark Millard
MFC r321433:
Cleanup stale Options.inc files from the previous libllvm build for
clang 4.0.0. Otherwise, these can get included before the two newly
generated ones (which are different) for clang 5.0.0.
Reported by: Mark Millard
MFC r321439 (by bdrewery):
Move llvm Options.inc hack from r321433 for NO_CLEAN to lib/clang/libllvm.
The files are only ever generated to .OBJDIR, not to WORLDTMP (as a
sysroot) and are only ever included from a compilation. So using
a beforebuild target here removes the file before the compilation
tries to include it.
MFC r321664:
Pull in r308891 from upstream llvm trunk (by Benjamin Kramer):
[CodeGenPrepare] Cut off FindAllMemoryUses if there are too many uses.
This avoids excessive compile time. The case I'm looking at is
Function.cpp from an old version of LLVM that still had the giant
memcmp string matcher in it. Before r308322 this compiled in about 2
minutes, after it, clang takes infinite* time to compile it. With
this patch we're at 5 min, which is still bad but this is a
pathological case.
The cut off at 20 uses was chosen by looking at other cut-offs in LLVM
for user scanning. It's probably too high, but does the job and is
very unlikely to regress anything.
Fixes PR33900.
* I'm impatient and aborted after 15 minutes, on the bug report it was
killed after 2h.
Pull in r308986 from upstream llvm trunk (by Simon Pilgrim):
[X86][CGP] Reduce memcmp() expansion to 2 load pairs (PR33914)
D35067/rL308322 attempted to support up to 4 load pairs for memcmp
inlining which resulted in regressions for some optimized libc memcmp
implementations (PR33914).
Until we can match these more optimal cases, this patch reduces the
memcmp expansion to a maximum of 2 load pairs (which matches what we
do for -Os).
This patch should be considered for the 5.0.0 release branch as well
Differential Revision: https://reviews.llvm.org/D35830
These fix a hang (or extremely long compile time) when building older
LLVM ports.
Reported by: antoine
PR: 219139
MFC r321719:
Pull in r309503 from upstream clang trunk (by Richard Smith):
PR33902: Invalidate line number cache when adding more text to
existing buffer.
This led to crashes as the line number cache would report a bogus
line number for a line of code, and we'd try to find a nonexistent
column within the line when printing diagnostics.
This fixes an assertion when building the graphics/champlain port.
Reported by: antoine, kwm
PR: 219139
MFC r321723:
Upgrade our copies of clang, llvm, lld and lldb to r309439 from the
upstream release_50 branch. This is just after upstream's 5.0.0-rc1.
MFC r322320:
Upgrade our copies of clang, llvm and libc++ to r310316 from the
upstream release_50 branch.
MFC r322326 (by emaste):
lldb: Make i386-*-freebsd expression work on JIT path
* Enable i386 ABI creation for freebsd
* Added an extra argument in ABISysV_i386::PrepareTrivialCall for mmap
syscall
* Unlike linux, the last argument of mmap is actually 64-bit(off_t).
This requires us to push an additional word for the higher order bits.
* Prior to this change, ktrace dump will show mmap failures due to
invalid argument coming from the 6th mmap argument.
Submitted by: Karnajit Wangkhem
Differential Revision: https://reviews.llvm.org/D34776
MFC r322360 (by emaste):
lldb: Report inferior signals as signals, not exceptions, on FreeBSD
This is the FreeBSD equivalent of LLVM r238549.
This serves 2 purposes:
* LLDB should handle inferior process signals SIGSEGV/SIGILL/SIGBUS/
SIGFPE the way it is suppose to be handled. Prior to this fix these
signals will neither create a coredump, nor exit from the debugger
or work for signal handling scenario.
* eInvalidCrashReason need not report "unknown crash reason" if we have
a valid si_signo
llvm.org/pr23699
Patch by Karnajit Wangkhem
Differential Revision: https://reviews.llvm.org/D35223
Submitted by: Karnajit Wangkhem
Obtained from: LLVM r310591
MFC r322474 (by emaste):
lld: Add `-z muldefs` option.
Obtained from: LLVM r310757
MFC r322740:
Upgrade our copies of clang, llvm, lld and libc++ to r311219 from the
upstream release_50 branch.
MFC r322855:
Upgrade our copies of clang, llvm, lldb and compiler-rt to r311606 from
the upstream release_50 branch.
As of this version, lib/msun's trig test should also work correctly
again (see bug 220989 for more information).
PR: 220989
MFC r323112:
Upgrade our copies of clang, llvm, lldb and compiler-rt to r312293 from
the upstream release_50 branch. This corresponds to 5.0.0 rc4.
As of this version, the cad/stepcode port should now compile in a more
reasonable time on i386 (see bug 221836 for more information).
PR: 221836
MFC r323245:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
5.0.0 release (upstream r312559).
Release notes for llvm, clang and lld will be available here soon:
<http://releases.llvm.org/5.0.0/docs/ReleaseNotes.html>
<http://releases.llvm.org/5.0.0/tools/clang/docs/ReleaseNotes.html>
<http://releases.llvm.org/5.0.0/tools/lld/docs/ReleaseNotes.html>
Relnotes: yes
Diffstat (limited to 'contrib/llvm/lib/CodeGen/MachineBlockPlacement.cpp')
-rw-r--r-- | contrib/llvm/lib/CodeGen/MachineBlockPlacement.cpp | 1167 |
1 files changed, 941 insertions, 226 deletions
diff --git a/contrib/llvm/lib/CodeGen/MachineBlockPlacement.cpp b/contrib/llvm/lib/CodeGen/MachineBlockPlacement.cpp index 40e3840..447ad62 100644 --- a/contrib/llvm/lib/CodeGen/MachineBlockPlacement.cpp +++ b/contrib/llvm/lib/CodeGen/MachineBlockPlacement.cpp @@ -25,22 +25,23 @@ // //===----------------------------------------------------------------------===// -#include "llvm/CodeGen/Passes.h" -#include "llvm/CodeGen/TargetPassConfig.h" #include "BranchFolding.h" #include "llvm/ADT/DenseMap.h" #include "llvm/ADT/SmallPtrSet.h" #include "llvm/ADT/SmallVector.h" #include "llvm/ADT/Statistic.h" +#include "llvm/Analysis/BlockFrequencyInfoImpl.h" #include "llvm/CodeGen/MachineBasicBlock.h" #include "llvm/CodeGen/MachineBlockFrequencyInfo.h" #include "llvm/CodeGen/MachineBranchProbabilityInfo.h" -#include "llvm/CodeGen/MachineDominators.h" #include "llvm/CodeGen/MachineFunction.h" #include "llvm/CodeGen/MachineFunctionPass.h" #include "llvm/CodeGen/MachineLoopInfo.h" #include "llvm/CodeGen/MachineModuleInfo.h" +#include "llvm/CodeGen/MachinePostDominators.h" +#include "llvm/CodeGen/Passes.h" #include "llvm/CodeGen/TailDuplicator.h" +#include "llvm/CodeGen/TargetPassConfig.h" #include "llvm/Support/Allocator.h" #include "llvm/Support/CommandLine.h" #include "llvm/Support/Debug.h" @@ -49,6 +50,8 @@ #include "llvm/Target/TargetLowering.h" #include "llvm/Target/TargetSubtargetInfo.h" #include <algorithm> +#include <functional> +#include <utility> using namespace llvm; #define DEBUG_TYPE "block-placement" @@ -82,19 +85,6 @@ static cl::opt<unsigned> ExitBlockBias( // Definition: // - Outlining: placement of a basic block outside the chain or hot path. -static cl::opt<bool> OutlineOptionalBranches( - "outline-optional-branches", - cl::desc("Outlining optional branches will place blocks that are optional " - "branches, i.e. branches with a common post dominator, outside " - "the hot path or chain"), - cl::init(false), cl::Hidden); - -static cl::opt<unsigned> OutlineOptionalThreshold( - "outline-optional-threshold", - cl::desc("Don't outline optional branches that are a single block with an " - "instruction count below this threshold"), - cl::init(4), cl::Hidden); - static cl::opt<unsigned> LoopToColdBlockRatio( "loop-to-cold-block-ratio", cl::desc("Outline loop blocks from loop chain if (frequency of loop) / " @@ -136,20 +126,55 @@ BranchFoldPlacement("branch-fold-placement", cl::init(true), cl::Hidden); // Heuristic for tail duplication. -static cl::opt<unsigned> TailDuplicatePlacementThreshold( +static cl::opt<unsigned> TailDupPlacementThreshold( "tail-dup-placement-threshold", cl::desc("Instruction cutoff for tail duplication during layout. " "Tail merging during layout is forced to have a threshold " "that won't conflict."), cl::init(2), cl::Hidden); +// Heuristic for aggressive tail duplication. +static cl::opt<unsigned> TailDupPlacementAggressiveThreshold( + "tail-dup-placement-aggressive-threshold", + cl::desc("Instruction cutoff for aggressive tail duplication during " + "layout. Used at -O3. Tail merging during layout is forced to " + "have a threshold that won't conflict."), cl::init(3), + cl::Hidden); + +// Heuristic for tail duplication. +static cl::opt<unsigned> TailDupPlacementPenalty( + "tail-dup-placement-penalty", + cl::desc("Cost penalty for blocks that can avoid breaking CFG by copying. " + "Copying can increase fallthrough, but it also increases icache " + "pressure. This parameter controls the penalty to account for that. " + "Percent as integer."), + cl::init(2), + cl::Hidden); + +// Heuristic for triangle chains. +static cl::opt<unsigned> TriangleChainCount( + "triangle-chain-count", + cl::desc("Number of triangle-shaped-CFG's that need to be in a row for the " + "triangle tail duplication heuristic to kick in. 0 to disable."), + cl::init(2), + cl::Hidden); + extern cl::opt<unsigned> StaticLikelyProb; extern cl::opt<unsigned> ProfileLikelyProb; +// Internal option used to control BFI display only after MBP pass. +// Defined in CodeGen/MachineBlockFrequencyInfo.cpp: +// -view-block-layout-with-bfi= +extern cl::opt<GVDAGType> ViewBlockLayoutWithBFI; + +// Command line option to specify the name of the function for CFG dump +// Defined in Analysis/BlockFrequencyInfo.cpp: -view-bfi-func-name= +extern cl::opt<std::string> ViewBlockFreqFuncName; + namespace { class BlockChain; /// \brief Type for our function-wide basic block -> block chain mapping. -typedef DenseMap<MachineBasicBlock *, BlockChain *> BlockToChainMapType; +typedef DenseMap<const MachineBasicBlock *, BlockChain *> BlockToChainMapType; } namespace { @@ -193,12 +218,15 @@ public: /// \brief Iterator over blocks within the chain. typedef SmallVectorImpl<MachineBasicBlock *>::iterator iterator; + typedef SmallVectorImpl<MachineBasicBlock *>::const_iterator const_iterator; /// \brief Beginning of blocks within the chain. iterator begin() { return Blocks.begin(); } + const_iterator begin() const { return Blocks.begin(); } /// \brief End of blocks within the chain. iterator end() { return Blocks.end(); } + const_iterator end() const { return Blocks.end(); } bool remove(MachineBasicBlock* BB) { for(iterator i = begin(); i != end(); ++i) { @@ -217,25 +245,26 @@ public: /// updating the block -> chain mapping. It does not free or tear down the /// old chain, but the old chain's block list is no longer valid. void merge(MachineBasicBlock *BB, BlockChain *Chain) { - assert(BB); - assert(!Blocks.empty()); + assert(BB && "Can't merge a null block."); + assert(!Blocks.empty() && "Can't merge into an empty chain."); // Fast path in case we don't have a chain already. if (!Chain) { - assert(!BlockToChain[BB]); + assert(!BlockToChain[BB] && + "Passed chain is null, but BB has entry in BlockToChain."); Blocks.push_back(BB); BlockToChain[BB] = this; return; } - assert(BB == *Chain->begin()); + assert(BB == *Chain->begin() && "Passed BB is not head of Chain."); assert(Chain->begin() != Chain->end()); // Update the incoming blocks to point to this chain, and add them to the // chain structure. for (MachineBasicBlock *ChainBB : *Chain) { Blocks.push_back(ChainBB); - assert(BlockToChain[ChainBB] == Chain && "Incoming blocks not in chain"); + assert(BlockToChain[ChainBB] == Chain && "Incoming blocks not in chain."); BlockToChain[ChainBB] = this; } } @@ -264,12 +293,28 @@ public: namespace { class MachineBlockPlacement : public MachineFunctionPass { /// \brief A typedef for a block filter set. - typedef SmallSetVector<MachineBasicBlock *, 16> BlockFilterSet; + typedef SmallSetVector<const MachineBasicBlock *, 16> BlockFilterSet; + + /// Pair struct containing basic block and taildup profitiability + struct BlockAndTailDupResult { + MachineBasicBlock *BB; + bool ShouldTailDup; + }; + + /// Triple struct containing edge weight and the edge. + struct WeightedEdge { + BlockFrequency Weight; + MachineBasicBlock *Src; + MachineBasicBlock *Dest; + }; /// \brief work lists of blocks that are ready to be laid out SmallVector<MachineBasicBlock *, 16> BlockWorkList; SmallVector<MachineBasicBlock *, 16> EHPadWorkList; + /// Edges that have already been computed as optimal. + DenseMap<const MachineBasicBlock *, BlockAndTailDupResult> ComputedEdges; + /// \brief Machine Function MachineFunction *F; @@ -294,7 +339,7 @@ class MachineBlockPlacement : public MachineFunctionPass { const TargetLoweringBase *TLI; /// \brief A handle to the post dominator tree. - MachineDominatorTree *MDT; + MachinePostDominatorTree *MPDT; /// \brief Duplicator used to duplicate tails during placement. /// @@ -303,10 +348,6 @@ class MachineBlockPlacement : public MachineFunctionPass { /// must be done inline. TailDuplicator TailDup; - /// \brief A set of blocks that are unavoidably execute, i.e. they dominate - /// all terminators of the MachineFunction. - SmallPtrSet<MachineBasicBlock *, 4> UnavoidableBlocks; - /// \brief Allocator and owner of BlockChain structures. /// /// We build BlockChains lazily while processing the loop structure of @@ -322,7 +363,7 @@ class MachineBlockPlacement : public MachineFunctionPass { /// BlockChain it participates in, if any. We use it to, among other things, /// allow implicitly defining edges between chains as the existing edges /// between basic blocks. - DenseMap<MachineBasicBlock *, BlockChain *> BlockToChain; + DenseMap<const MachineBasicBlock *, BlockChain *> BlockToChain; #ifndef NDEBUG /// The set of basic blocks that have terminators that cannot be fully @@ -334,75 +375,107 @@ class MachineBlockPlacement : public MachineFunctionPass { /// Decrease the UnscheduledPredecessors count for all blocks in chain, and /// if the count goes to 0, add them to the appropriate work list. - void markChainSuccessors(BlockChain &Chain, MachineBasicBlock *LoopHeaderBB, - const BlockFilterSet *BlockFilter = nullptr); + void markChainSuccessors( + const BlockChain &Chain, const MachineBasicBlock *LoopHeaderBB, + const BlockFilterSet *BlockFilter = nullptr); /// Decrease the UnscheduledPredecessors count for a single block, and /// if the count goes to 0, add them to the appropriate work list. void markBlockSuccessors( - BlockChain &Chain, MachineBasicBlock *BB, MachineBasicBlock *LoopHeaderBB, + const BlockChain &Chain, const MachineBasicBlock *BB, + const MachineBasicBlock *LoopHeaderBB, const BlockFilterSet *BlockFilter = nullptr); - BranchProbability - collectViableSuccessors(MachineBasicBlock *BB, BlockChain &Chain, - const BlockFilterSet *BlockFilter, - SmallVector<MachineBasicBlock *, 4> &Successors); - bool shouldPredBlockBeOutlined(MachineBasicBlock *BB, MachineBasicBlock *Succ, - BlockChain &Chain, - const BlockFilterSet *BlockFilter, - BranchProbability SuccProb, - BranchProbability HotProb); + collectViableSuccessors( + const MachineBasicBlock *BB, const BlockChain &Chain, + const BlockFilterSet *BlockFilter, + SmallVector<MachineBasicBlock *, 4> &Successors); + bool shouldPredBlockBeOutlined( + const MachineBasicBlock *BB, const MachineBasicBlock *Succ, + const BlockChain &Chain, const BlockFilterSet *BlockFilter, + BranchProbability SuccProb, BranchProbability HotProb); bool repeatedlyTailDuplicateBlock( MachineBasicBlock *BB, MachineBasicBlock *&LPred, - MachineBasicBlock *LoopHeaderBB, + const MachineBasicBlock *LoopHeaderBB, BlockChain &Chain, BlockFilterSet *BlockFilter, MachineFunction::iterator &PrevUnplacedBlockIt); - bool maybeTailDuplicateBlock(MachineBasicBlock *BB, MachineBasicBlock *LPred, - const BlockChain &Chain, - BlockFilterSet *BlockFilter, - MachineFunction::iterator &PrevUnplacedBlockIt, - bool &DuplicatedToPred); - bool - hasBetterLayoutPredecessor(MachineBasicBlock *BB, MachineBasicBlock *Succ, - BlockChain &SuccChain, BranchProbability SuccProb, - BranchProbability RealSuccProb, BlockChain &Chain, - const BlockFilterSet *BlockFilter); - MachineBasicBlock *selectBestSuccessor(MachineBasicBlock *BB, - BlockChain &Chain, - const BlockFilterSet *BlockFilter); - MachineBasicBlock * - selectBestCandidateBlock(BlockChain &Chain, - SmallVectorImpl<MachineBasicBlock *> &WorkList); - MachineBasicBlock * - getFirstUnplacedBlock(const BlockChain &PlacedChain, - MachineFunction::iterator &PrevUnplacedBlockIt, - const BlockFilterSet *BlockFilter); + bool maybeTailDuplicateBlock( + MachineBasicBlock *BB, MachineBasicBlock *LPred, + BlockChain &Chain, BlockFilterSet *BlockFilter, + MachineFunction::iterator &PrevUnplacedBlockIt, + bool &DuplicatedToPred); + bool hasBetterLayoutPredecessor( + const MachineBasicBlock *BB, const MachineBasicBlock *Succ, + const BlockChain &SuccChain, BranchProbability SuccProb, + BranchProbability RealSuccProb, const BlockChain &Chain, + const BlockFilterSet *BlockFilter); + BlockAndTailDupResult selectBestSuccessor( + const MachineBasicBlock *BB, const BlockChain &Chain, + const BlockFilterSet *BlockFilter); + MachineBasicBlock *selectBestCandidateBlock( + const BlockChain &Chain, SmallVectorImpl<MachineBasicBlock *> &WorkList); + MachineBasicBlock *getFirstUnplacedBlock( + const BlockChain &PlacedChain, + MachineFunction::iterator &PrevUnplacedBlockIt, + const BlockFilterSet *BlockFilter); /// \brief Add a basic block to the work list if it is appropriate. /// /// If the optional parameter BlockFilter is provided, only MBB /// present in the set will be added to the worklist. If nullptr /// is provided, no filtering occurs. - void fillWorkLists(MachineBasicBlock *MBB, + void fillWorkLists(const MachineBasicBlock *MBB, SmallPtrSetImpl<BlockChain *> &UpdatedPreds, const BlockFilterSet *BlockFilter); - void buildChain(MachineBasicBlock *BB, BlockChain &Chain, + void buildChain(const MachineBasicBlock *BB, BlockChain &Chain, BlockFilterSet *BlockFilter = nullptr); - MachineBasicBlock *findBestLoopTop(MachineLoop &L, - const BlockFilterSet &LoopBlockSet); - MachineBasicBlock *findBestLoopExit(MachineLoop &L, - const BlockFilterSet &LoopBlockSet); - BlockFilterSet collectLoopBlockSet(MachineLoop &L); - void buildLoopChains(MachineLoop &L); - void rotateLoop(BlockChain &LoopChain, MachineBasicBlock *ExitingBB, - const BlockFilterSet &LoopBlockSet); - void rotateLoopWithProfile(BlockChain &LoopChain, MachineLoop &L, - const BlockFilterSet &LoopBlockSet); - void collectMustExecuteBBs(); + MachineBasicBlock *findBestLoopTop( + const MachineLoop &L, const BlockFilterSet &LoopBlockSet); + MachineBasicBlock *findBestLoopExit( + const MachineLoop &L, const BlockFilterSet &LoopBlockSet); + BlockFilterSet collectLoopBlockSet(const MachineLoop &L); + void buildLoopChains(const MachineLoop &L); + void rotateLoop( + BlockChain &LoopChain, const MachineBasicBlock *ExitingBB, + const BlockFilterSet &LoopBlockSet); + void rotateLoopWithProfile( + BlockChain &LoopChain, const MachineLoop &L, + const BlockFilterSet &LoopBlockSet); void buildCFGChains(); void optimizeBranches(); void alignBlocks(); + /// Returns true if a block should be tail-duplicated to increase fallthrough + /// opportunities. + bool shouldTailDuplicate(MachineBasicBlock *BB); + /// Check the edge frequencies to see if tail duplication will increase + /// fallthroughs. + bool isProfitableToTailDup( + const MachineBasicBlock *BB, const MachineBasicBlock *Succ, + BranchProbability AdjustedSumProb, + const BlockChain &Chain, const BlockFilterSet *BlockFilter); + /// Check for a trellis layout. + bool isTrellis(const MachineBasicBlock *BB, + const SmallVectorImpl<MachineBasicBlock *> &ViableSuccs, + const BlockChain &Chain, const BlockFilterSet *BlockFilter); + /// Get the best successor given a trellis layout. + BlockAndTailDupResult getBestTrellisSuccessor( + const MachineBasicBlock *BB, + const SmallVectorImpl<MachineBasicBlock *> &ViableSuccs, + BranchProbability AdjustedSumProb, const BlockChain &Chain, + const BlockFilterSet *BlockFilter); + /// Get the best pair of non-conflicting edges. + static std::pair<WeightedEdge, WeightedEdge> getBestNonConflictingEdges( + const MachineBasicBlock *BB, + MutableArrayRef<SmallVector<WeightedEdge, 8>> Edges); + /// Returns true if a block can tail duplicate into all unplaced + /// predecessors. Filters based on loop. + bool canTailDuplicateUnplacedPreds( + const MachineBasicBlock *BB, MachineBasicBlock *Succ, + const BlockChain &Chain, const BlockFilterSet *BlockFilter); + /// Find chains of triangles to tail-duplicate where a global analysis works, + /// but a local analysis would not find them. + void precomputeTriangleChains(); public: static char ID; // Pass identification, replacement for typeid @@ -415,7 +488,8 @@ public: void getAnalysisUsage(AnalysisUsage &AU) const override { AU.addRequired<MachineBranchProbabilityInfo>(); AU.addRequired<MachineBlockFrequencyInfo>(); - AU.addRequired<MachineDominatorTree>(); + if (TailDupPlacement) + AU.addRequired<MachinePostDominatorTree>(); AU.addRequired<MachineLoopInfo>(); AU.addRequired<TargetPassConfig>(); MachineFunctionPass::getAnalysisUsage(AU); @@ -425,20 +499,20 @@ public: char MachineBlockPlacement::ID = 0; char &llvm::MachineBlockPlacementID = MachineBlockPlacement::ID; -INITIALIZE_PASS_BEGIN(MachineBlockPlacement, "block-placement", +INITIALIZE_PASS_BEGIN(MachineBlockPlacement, DEBUG_TYPE, "Branch Probability Basic Block Placement", false, false) INITIALIZE_PASS_DEPENDENCY(MachineBranchProbabilityInfo) INITIALIZE_PASS_DEPENDENCY(MachineBlockFrequencyInfo) -INITIALIZE_PASS_DEPENDENCY(MachineDominatorTree) +INITIALIZE_PASS_DEPENDENCY(MachinePostDominatorTree) INITIALIZE_PASS_DEPENDENCY(MachineLoopInfo) -INITIALIZE_PASS_END(MachineBlockPlacement, "block-placement", +INITIALIZE_PASS_END(MachineBlockPlacement, DEBUG_TYPE, "Branch Probability Basic Block Placement", false, false) #ifndef NDEBUG /// \brief Helper to print the name of a MBB. /// /// Only used by debug logging. -static std::string getBlockName(MachineBasicBlock *BB) { +static std::string getBlockName(const MachineBasicBlock *BB) { std::string Result; raw_string_ostream OS(Result); OS << "BB#" << BB->getNumber(); @@ -455,7 +529,7 @@ static std::string getBlockName(MachineBasicBlock *BB) { /// having one fewer active predecessor. It also adds any successors of this /// chain which reach the zero-predecessor state to the appropriate worklist. void MachineBlockPlacement::markChainSuccessors( - BlockChain &Chain, MachineBasicBlock *LoopHeaderBB, + const BlockChain &Chain, const MachineBasicBlock *LoopHeaderBB, const BlockFilterSet *BlockFilter) { // Walk all the blocks in this chain, marking their successors as having // a predecessor placed. @@ -471,8 +545,8 @@ void MachineBlockPlacement::markChainSuccessors( /// and was duplicated into the chain end, we need to redo markBlockSuccessors /// for just that block. void MachineBlockPlacement::markBlockSuccessors( - BlockChain &Chain, MachineBasicBlock *MBB, MachineBasicBlock *LoopHeaderBB, - const BlockFilterSet *BlockFilter) { + const BlockChain &Chain, const MachineBasicBlock *MBB, + const MachineBasicBlock *LoopHeaderBB, const BlockFilterSet *BlockFilter) { // Add any successors for which this is the only un-placed in-loop // predecessor to the worklist as a viable candidate for CFG-neutral // placement. No subsequent placement of this block will violate the CFG @@ -504,7 +578,8 @@ void MachineBlockPlacement::markBlockSuccessors( /// the total branch probability of edges from \p BB to those /// blocks. BranchProbability MachineBlockPlacement::collectViableSuccessors( - MachineBasicBlock *BB, BlockChain &Chain, const BlockFilterSet *BlockFilter, + const MachineBasicBlock *BB, const BlockChain &Chain, + const BlockFilterSet *BlockFilter, SmallVector<MachineBasicBlock *, 4> &Successors) { // Adjust edge probabilities by excluding edges pointing to blocks that is // either not in BlockFilter or is already in the current chain. Consider the @@ -519,8 +594,8 @@ BranchProbability MachineBlockPlacement::collectViableSuccessors( // Assume A->C is very hot (>90%), and C->D has a 50% probability, then after // A->C is chosen as a fall-through, D won't be selected as a successor of C // due to CFG constraint (the probability of C->D is not greater than - // HotProb to break top-order). If we exclude E that is not in BlockFilter - // when calculating the probability of C->D, D will be selected and we + // HotProb to break topo-order). If we exclude E that is not in BlockFilter + // when calculating the probability of C->D, D will be selected and we // will get A C D B as the layout of this loop. auto AdjustedSumProb = BranchProbability::getOne(); for (MachineBasicBlock *Succ : BB->successors()) { @@ -561,46 +636,573 @@ getAdjustedProbability(BranchProbability OrigProb, return SuccProb; } -/// When the option OutlineOptionalBranches is on, this method -/// checks if the fallthrough candidate block \p Succ (of block -/// \p BB) also has other unscheduled predecessor blocks which -/// are also successors of \p BB (forming triangular shape CFG). -/// If none of such predecessors are small, it returns true. -/// The caller can choose to select \p Succ as the layout successors -/// so that \p Succ's predecessors (optional branches) can be -/// outlined. -/// FIXME: fold this with more general layout cost analysis. -bool MachineBlockPlacement::shouldPredBlockBeOutlined( - MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &Chain, - const BlockFilterSet *BlockFilter, BranchProbability SuccProb, - BranchProbability HotProb) { - if (!OutlineOptionalBranches) +/// Check if \p BB has exactly the successors in \p Successors. +static bool +hasSameSuccessors(MachineBasicBlock &BB, + SmallPtrSetImpl<const MachineBasicBlock *> &Successors) { + if (BB.succ_size() != Successors.size()) + return false; + // We don't want to count self-loops + if (Successors.count(&BB)) return false; - // If we outline optional branches, look whether Succ is unavoidable, i.e. - // dominates all terminators of the MachineFunction. If it does, other - // successors must be optional. Don't do this for cold branches. - if (SuccProb > HotProb.getCompl() && UnavoidableBlocks.count(Succ) > 0) { - for (MachineBasicBlock *Pred : Succ->predecessors()) { - // Check whether there is an unplaced optional branch. - if (Pred == Succ || (BlockFilter && !BlockFilter->count(Pred)) || - BlockToChain[Pred] == &Chain) + for (MachineBasicBlock *Succ : BB.successors()) + if (!Successors.count(Succ)) + return false; + return true; +} + +/// Check if a block should be tail duplicated to increase fallthrough +/// opportunities. +/// \p BB Block to check. +bool MachineBlockPlacement::shouldTailDuplicate(MachineBasicBlock *BB) { + // Blocks with single successors don't create additional fallthrough + // opportunities. Don't duplicate them. TODO: When conditional exits are + // analyzable, allow them to be duplicated. + bool IsSimple = TailDup.isSimpleBB(BB); + + if (BB->succ_size() == 1) + return false; + return TailDup.shouldTailDuplicate(IsSimple, *BB); +} + +/// Compare 2 BlockFrequency's with a small penalty for \p A. +/// In order to be conservative, we apply a X% penalty to account for +/// increased icache pressure and static heuristics. For small frequencies +/// we use only the numerators to improve accuracy. For simplicity, we assume the +/// penalty is less than 100% +/// TODO(iteratee): Use 64-bit fixed point edge frequencies everywhere. +static bool greaterWithBias(BlockFrequency A, BlockFrequency B, + uint64_t EntryFreq) { + BranchProbability ThresholdProb(TailDupPlacementPenalty, 100); + BlockFrequency Gain = A - B; + return (Gain / ThresholdProb).getFrequency() >= EntryFreq; +} + +/// Check the edge frequencies to see if tail duplication will increase +/// fallthroughs. It only makes sense to call this function when +/// \p Succ would not be chosen otherwise. Tail duplication of \p Succ is +/// always locally profitable if we would have picked \p Succ without +/// considering duplication. +bool MachineBlockPlacement::isProfitableToTailDup( + const MachineBasicBlock *BB, const MachineBasicBlock *Succ, + BranchProbability QProb, + const BlockChain &Chain, const BlockFilterSet *BlockFilter) { + // We need to do a probability calculation to make sure this is profitable. + // First: does succ have a successor that post-dominates? This affects the + // calculation. The 2 relevant cases are: + // BB BB + // | \Qout | \Qout + // P| C |P C + // = C' = C' + // | /Qin | /Qin + // | / | / + // Succ Succ + // / \ | \ V + // U/ =V |U \ + // / \ = D + // D E | / + // | / + // |/ + // PDom + // '=' : Branch taken for that CFG edge + // In the second case, Placing Succ while duplicating it into C prevents the + // fallthrough of Succ into either D or PDom, because they now have C as an + // unplaced predecessor + + // Start by figuring out which case we fall into + MachineBasicBlock *PDom = nullptr; + SmallVector<MachineBasicBlock *, 4> SuccSuccs; + // Only scan the relevant successors + auto AdjustedSuccSumProb = + collectViableSuccessors(Succ, Chain, BlockFilter, SuccSuccs); + BranchProbability PProb = MBPI->getEdgeProbability(BB, Succ); + auto BBFreq = MBFI->getBlockFreq(BB); + auto SuccFreq = MBFI->getBlockFreq(Succ); + BlockFrequency P = BBFreq * PProb; + BlockFrequency Qout = BBFreq * QProb; + uint64_t EntryFreq = MBFI->getEntryFreq(); + // If there are no more successors, it is profitable to copy, as it strictly + // increases fallthrough. + if (SuccSuccs.size() == 0) + return greaterWithBias(P, Qout, EntryFreq); + + auto BestSuccSucc = BranchProbability::getZero(); + // Find the PDom or the best Succ if no PDom exists. + for (MachineBasicBlock *SuccSucc : SuccSuccs) { + auto Prob = MBPI->getEdgeProbability(Succ, SuccSucc); + if (Prob > BestSuccSucc) + BestSuccSucc = Prob; + if (PDom == nullptr) + if (MPDT->dominates(SuccSucc, Succ)) { + PDom = SuccSucc; + break; + } + } + // For the comparisons, we need to know Succ's best incoming edge that isn't + // from BB. + auto SuccBestPred = BlockFrequency(0); + for (MachineBasicBlock *SuccPred : Succ->predecessors()) { + if (SuccPred == Succ || SuccPred == BB + || BlockToChain[SuccPred] == &Chain + || (BlockFilter && !BlockFilter->count(SuccPred))) + continue; + auto Freq = MBFI->getBlockFreq(SuccPred) + * MBPI->getEdgeProbability(SuccPred, Succ); + if (Freq > SuccBestPred) + SuccBestPred = Freq; + } + // Qin is Succ's best unplaced incoming edge that isn't BB + BlockFrequency Qin = SuccBestPred; + // If it doesn't have a post-dominating successor, here is the calculation: + // BB BB + // | \Qout | \ + // P| C | = + // = C' | C + // | /Qin | | + // | / | C' (+Succ) + // Succ Succ /| + // / \ | \/ | + // U/ =V | == | + // / \ | / \| + // D E D E + // '=' : Branch taken for that CFG edge + // Cost in the first case is: P + V + // For this calculation, we always assume P > Qout. If Qout > P + // The result of this function will be ignored at the caller. + // Let F = SuccFreq - Qin + // Cost in the second case is: Qout + min(Qin, F) * U + max(Qin, F) * V + + if (PDom == nullptr || !Succ->isSuccessor(PDom)) { + BranchProbability UProb = BestSuccSucc; + BranchProbability VProb = AdjustedSuccSumProb - UProb; + BlockFrequency F = SuccFreq - Qin; + BlockFrequency V = SuccFreq * VProb; + BlockFrequency QinU = std::min(Qin, F) * UProb; + BlockFrequency BaseCost = P + V; + BlockFrequency DupCost = Qout + QinU + std::max(Qin, F) * VProb; + return greaterWithBias(BaseCost, DupCost, EntryFreq); + } + BranchProbability UProb = MBPI->getEdgeProbability(Succ, PDom); + BranchProbability VProb = AdjustedSuccSumProb - UProb; + BlockFrequency U = SuccFreq * UProb; + BlockFrequency V = SuccFreq * VProb; + BlockFrequency F = SuccFreq - Qin; + // If there is a post-dominating successor, here is the calculation: + // BB BB BB BB + // | \Qout | \ | \Qout | \ + // |P C | = |P C | = + // = C' |P C = C' |P C + // | /Qin | | | /Qin | | + // | / | C' (+Succ) | / | C' (+Succ) + // Succ Succ /| Succ Succ /| + // | \ V | \/ | | \ V | \/ | + // |U \ |U /\ =? |U = |U /\ | + // = D = = =?| | D | = =| + // | / |/ D | / |/ D + // | / | / | = | / + // |/ | / |/ | = + // Dom Dom Dom Dom + // '=' : Branch taken for that CFG edge + // The cost for taken branches in the first case is P + U + // Let F = SuccFreq - Qin + // The cost in the second case (assuming independence), given the layout: + // BB, Succ, (C+Succ), D, Dom or the layout: + // BB, Succ, D, Dom, (C+Succ) + // is Qout + max(F, Qin) * U + min(F, Qin) + // compare P + U vs Qout + P * U + Qin. + // + // The 3rd and 4th cases cover when Dom would be chosen to follow Succ. + // + // For the 3rd case, the cost is P + 2 * V + // For the 4th case, the cost is Qout + min(Qin, F) * U + max(Qin, F) * V + V + // We choose 4 over 3 when (P + V) > Qout + min(Qin, F) * U + max(Qin, F) * V + if (UProb > AdjustedSuccSumProb / 2 && + !hasBetterLayoutPredecessor(Succ, PDom, *BlockToChain[PDom], UProb, UProb, + Chain, BlockFilter)) + // Cases 3 & 4 + return greaterWithBias( + (P + V), (Qout + std::max(Qin, F) * VProb + std::min(Qin, F) * UProb), + EntryFreq); + // Cases 1 & 2 + return greaterWithBias((P + U), + (Qout + std::min(Qin, F) * AdjustedSuccSumProb + + std::max(Qin, F) * UProb), + EntryFreq); +} + +/// Check for a trellis layout. \p BB is the upper part of a trellis if its +/// successors form the lower part of a trellis. A successor set S forms the +/// lower part of a trellis if all of the predecessors of S are either in S or +/// have all of S as successors. We ignore trellises where BB doesn't have 2 +/// successors because for fewer than 2, it's trivial, and for 3 or greater they +/// are very uncommon and complex to compute optimally. Allowing edges within S +/// is not strictly a trellis, but the same algorithm works, so we allow it. +bool MachineBlockPlacement::isTrellis( + const MachineBasicBlock *BB, + const SmallVectorImpl<MachineBasicBlock *> &ViableSuccs, + const BlockChain &Chain, const BlockFilterSet *BlockFilter) { + // Technically BB could form a trellis with branching factor higher than 2. + // But that's extremely uncommon. + if (BB->succ_size() != 2 || ViableSuccs.size() != 2) + return false; + + SmallPtrSet<const MachineBasicBlock *, 2> Successors(BB->succ_begin(), + BB->succ_end()); + // To avoid reviewing the same predecessors twice. + SmallPtrSet<const MachineBasicBlock *, 8> SeenPreds; + + for (MachineBasicBlock *Succ : ViableSuccs) { + int PredCount = 0; + for (auto SuccPred : Succ->predecessors()) { + // Allow triangle successors, but don't count them. + if (Successors.count(SuccPred)) { + // Make sure that it is actually a triangle. + for (MachineBasicBlock *CheckSucc : SuccPred->successors()) + if (!Successors.count(CheckSucc)) + return false; + continue; + } + const BlockChain *PredChain = BlockToChain[SuccPred]; + if (SuccPred == BB || (BlockFilter && !BlockFilter->count(SuccPred)) || + PredChain == &Chain || PredChain == BlockToChain[Succ]) continue; - // Check whether the optional branch has exactly one BB. - if (Pred->pred_size() > 1 || *Pred->pred_begin() != BB) + ++PredCount; + // Perform the successor check only once. + if (!SeenPreds.insert(SuccPred).second) continue; - // Check whether the optional branch is small. - if (Pred->size() < OutlineOptionalThreshold) + if (!hasSameSuccessors(*SuccPred, Successors)) return false; } - return true; - } else + // If one of the successors has only BB as a predecessor, it is not a + // trellis. + if (PredCount < 1) + return false; + } + return true; +} + +/// Pick the highest total weight pair of edges that can both be laid out. +/// The edges in \p Edges[0] are assumed to have a different destination than +/// the edges in \p Edges[1]. Simple counting shows that the best pair is either +/// the individual highest weight edges to the 2 different destinations, or in +/// case of a conflict, one of them should be replaced with a 2nd best edge. +std::pair<MachineBlockPlacement::WeightedEdge, + MachineBlockPlacement::WeightedEdge> +MachineBlockPlacement::getBestNonConflictingEdges( + const MachineBasicBlock *BB, + MutableArrayRef<SmallVector<MachineBlockPlacement::WeightedEdge, 8>> + Edges) { + // Sort the edges, and then for each successor, find the best incoming + // predecessor. If the best incoming predecessors aren't the same, + // then that is clearly the best layout. If there is a conflict, one of the + // successors will have to fallthrough from the second best predecessor. We + // compare which combination is better overall. + + // Sort for highest frequency. + auto Cmp = [](WeightedEdge A, WeightedEdge B) { return A.Weight > B.Weight; }; + + std::stable_sort(Edges[0].begin(), Edges[0].end(), Cmp); + std::stable_sort(Edges[1].begin(), Edges[1].end(), Cmp); + auto BestA = Edges[0].begin(); + auto BestB = Edges[1].begin(); + // Arrange for the correct answer to be in BestA and BestB + // If the 2 best edges don't conflict, the answer is already there. + if (BestA->Src == BestB->Src) { + // Compare the total fallthrough of (Best + Second Best) for both pairs + auto SecondBestA = std::next(BestA); + auto SecondBestB = std::next(BestB); + BlockFrequency BestAScore = BestA->Weight + SecondBestB->Weight; + BlockFrequency BestBScore = BestB->Weight + SecondBestA->Weight; + if (BestAScore < BestBScore) + BestA = SecondBestA; + else + BestB = SecondBestB; + } + // Arrange for the BB edge to be in BestA if it exists. + if (BestB->Src == BB) + std::swap(BestA, BestB); + return std::make_pair(*BestA, *BestB); +} + +/// Get the best successor from \p BB based on \p BB being part of a trellis. +/// We only handle trellises with 2 successors, so the algorithm is +/// straightforward: Find the best pair of edges that don't conflict. We find +/// the best incoming edge for each successor in the trellis. If those conflict, +/// we consider which of them should be replaced with the second best. +/// Upon return the two best edges will be in \p BestEdges. If one of the edges +/// comes from \p BB, it will be in \p BestEdges[0] +MachineBlockPlacement::BlockAndTailDupResult +MachineBlockPlacement::getBestTrellisSuccessor( + const MachineBasicBlock *BB, + const SmallVectorImpl<MachineBasicBlock *> &ViableSuccs, + BranchProbability AdjustedSumProb, const BlockChain &Chain, + const BlockFilterSet *BlockFilter) { + + BlockAndTailDupResult Result = {nullptr, false}; + SmallPtrSet<const MachineBasicBlock *, 4> Successors(BB->succ_begin(), + BB->succ_end()); + + // We assume size 2 because it's common. For general n, we would have to do + // the Hungarian algorithm, but it's not worth the complexity because more + // than 2 successors is fairly uncommon, and a trellis even more so. + if (Successors.size() != 2 || ViableSuccs.size() != 2) + return Result; + + // Collect the edge frequencies of all edges that form the trellis. + SmallVector<WeightedEdge, 8> Edges[2]; + int SuccIndex = 0; + for (auto Succ : ViableSuccs) { + for (MachineBasicBlock *SuccPred : Succ->predecessors()) { + // Skip any placed predecessors that are not BB + if (SuccPred != BB) + if ((BlockFilter && !BlockFilter->count(SuccPred)) || + BlockToChain[SuccPred] == &Chain || + BlockToChain[SuccPred] == BlockToChain[Succ]) + continue; + BlockFrequency EdgeFreq = MBFI->getBlockFreq(SuccPred) * + MBPI->getEdgeProbability(SuccPred, Succ); + Edges[SuccIndex].push_back({EdgeFreq, SuccPred, Succ}); + } + ++SuccIndex; + } + + // Pick the best combination of 2 edges from all the edges in the trellis. + WeightedEdge BestA, BestB; + std::tie(BestA, BestB) = getBestNonConflictingEdges(BB, Edges); + + if (BestA.Src != BB) { + // If we have a trellis, and BB doesn't have the best fallthrough edges, + // we shouldn't choose any successor. We've already looked and there's a + // better fallthrough edge for all the successors. + DEBUG(dbgs() << "Trellis, but not one of the chosen edges.\n"); + return Result; + } + + // Did we pick the triangle edge? If tail-duplication is profitable, do + // that instead. Otherwise merge the triangle edge now while we know it is + // optimal. + if (BestA.Dest == BestB.Src) { + // The edges are BB->Succ1->Succ2, and we're looking to see if BB->Succ2 + // would be better. + MachineBasicBlock *Succ1 = BestA.Dest; + MachineBasicBlock *Succ2 = BestB.Dest; + // Check to see if tail-duplication would be profitable. + if (TailDupPlacement && shouldTailDuplicate(Succ2) && + canTailDuplicateUnplacedPreds(BB, Succ2, Chain, BlockFilter) && + isProfitableToTailDup(BB, Succ2, MBPI->getEdgeProbability(BB, Succ1), + Chain, BlockFilter)) { + DEBUG(BranchProbability Succ2Prob = getAdjustedProbability( + MBPI->getEdgeProbability(BB, Succ2), AdjustedSumProb); + dbgs() << " Selected: " << getBlockName(Succ2) + << ", probability: " << Succ2Prob << " (Tail Duplicate)\n"); + Result.BB = Succ2; + Result.ShouldTailDup = true; + return Result; + } + } + // We have already computed the optimal edge for the other side of the + // trellis. + ComputedEdges[BestB.Src] = { BestB.Dest, false }; + + auto TrellisSucc = BestA.Dest; + DEBUG(BranchProbability SuccProb = getAdjustedProbability( + MBPI->getEdgeProbability(BB, TrellisSucc), AdjustedSumProb); + dbgs() << " Selected: " << getBlockName(TrellisSucc) + << ", probability: " << SuccProb << " (Trellis)\n"); + Result.BB = TrellisSucc; + return Result; +} + +/// When the option TailDupPlacement is on, this method checks if the +/// fallthrough candidate block \p Succ (of block \p BB) can be tail-duplicated +/// into all of its unplaced, unfiltered predecessors, that are not BB. +bool MachineBlockPlacement::canTailDuplicateUnplacedPreds( + const MachineBasicBlock *BB, MachineBasicBlock *Succ, + const BlockChain &Chain, const BlockFilterSet *BlockFilter) { + if (!shouldTailDuplicate(Succ)) return false; + + // For CFG checking. + SmallPtrSet<const MachineBasicBlock *, 4> Successors(BB->succ_begin(), + BB->succ_end()); + for (MachineBasicBlock *Pred : Succ->predecessors()) { + // Make sure all unplaced and unfiltered predecessors can be + // tail-duplicated into. + // Skip any blocks that are already placed or not in this loop. + if (Pred == BB || (BlockFilter && !BlockFilter->count(Pred)) + || BlockToChain[Pred] == &Chain) + continue; + if (!TailDup.canTailDuplicate(Succ, Pred)) { + if (Successors.size() > 1 && hasSameSuccessors(*Pred, Successors)) + // This will result in a trellis after tail duplication, so we don't + // need to copy Succ into this predecessor. In the presence + // of a trellis tail duplication can continue to be profitable. + // For example: + // A A + // |\ |\ + // | \ | \ + // | C | C+BB + // | / | | + // |/ | | + // BB => BB | + // |\ |\/| + // | \ |/\| + // | D | D + // | / | / + // |/ |/ + // Succ Succ + // + // After BB was duplicated into C, the layout looks like the one on the + // right. BB and C now have the same successors. When considering + // whether Succ can be duplicated into all its unplaced predecessors, we + // ignore C. + // We can do this because C already has a profitable fallthrough, namely + // D. TODO(iteratee): ignore sufficiently cold predecessors for + // duplication and for this test. + // + // This allows trellises to be laid out in 2 separate chains + // (A,B,Succ,...) and later (C,D,...) This is a reasonable heuristic + // because it allows the creation of 2 fallthrough paths with links + // between them, and we correctly identify the best layout for these + // CFGs. We want to extend trellises that the user created in addition + // to trellises created by tail-duplication, so we just look for the + // CFG. + continue; + return false; + } + } + return true; +} + +/// Find chains of triangles where we believe it would be profitable to +/// tail-duplicate them all, but a local analysis would not find them. +/// There are 3 ways this can be profitable: +/// 1) The post-dominators marked 50% are actually taken 55% (This shrinks with +/// longer chains) +/// 2) The chains are statically correlated. Branch probabilities have a very +/// U-shaped distribution. +/// [http://nrs.harvard.edu/urn-3:HUL.InstRepos:24015805] +/// If the branches in a chain are likely to be from the same side of the +/// distribution as their predecessor, but are independent at runtime, this +/// transformation is profitable. (Because the cost of being wrong is a small +/// fixed cost, unlike the standard triangle layout where the cost of being +/// wrong scales with the # of triangles.) +/// 3) The chains are dynamically correlated. If the probability that a previous +/// branch was taken positively influences whether the next branch will be +/// taken +/// We believe that 2 and 3 are common enough to justify the small margin in 1. +void MachineBlockPlacement::precomputeTriangleChains() { + struct TriangleChain { + std::vector<MachineBasicBlock *> Edges; + TriangleChain(MachineBasicBlock *src, MachineBasicBlock *dst) + : Edges({src, dst}) {} + + void append(MachineBasicBlock *dst) { + assert(getKey()->isSuccessor(dst) && + "Attempting to append a block that is not a successor."); + Edges.push_back(dst); + } + + unsigned count() const { return Edges.size() - 1; } + + MachineBasicBlock *getKey() const { + return Edges.back(); + } + }; + + if (TriangleChainCount == 0) + return; + + DEBUG(dbgs() << "Pre-computing triangle chains.\n"); + // Map from last block to the chain that contains it. This allows us to extend + // chains as we find new triangles. + DenseMap<const MachineBasicBlock *, TriangleChain> TriangleChainMap; + for (MachineBasicBlock &BB : *F) { + // If BB doesn't have 2 successors, it doesn't start a triangle. + if (BB.succ_size() != 2) + continue; + MachineBasicBlock *PDom = nullptr; + for (MachineBasicBlock *Succ : BB.successors()) { + if (!MPDT->dominates(Succ, &BB)) + continue; + PDom = Succ; + break; + } + // If BB doesn't have a post-dominating successor, it doesn't form a + // triangle. + if (PDom == nullptr) + continue; + // If PDom has a hint that it is low probability, skip this triangle. + if (MBPI->getEdgeProbability(&BB, PDom) < BranchProbability(50, 100)) + continue; + // If PDom isn't eligible for duplication, this isn't the kind of triangle + // we're looking for. + if (!shouldTailDuplicate(PDom)) + continue; + bool CanTailDuplicate = true; + // If PDom can't tail-duplicate into it's non-BB predecessors, then this + // isn't the kind of triangle we're looking for. + for (MachineBasicBlock* Pred : PDom->predecessors()) { + if (Pred == &BB) + continue; + if (!TailDup.canTailDuplicate(PDom, Pred)) { + CanTailDuplicate = false; + break; + } + } + // If we can't tail-duplicate PDom to its predecessors, then skip this + // triangle. + if (!CanTailDuplicate) + continue; + + // Now we have an interesting triangle. Insert it if it's not part of an + // existing chain. + // Note: This cannot be replaced with a call insert() or emplace() because + // the find key is BB, but the insert/emplace key is PDom. + auto Found = TriangleChainMap.find(&BB); + // If it is, remove the chain from the map, grow it, and put it back in the + // map with the end as the new key. + if (Found != TriangleChainMap.end()) { + TriangleChain Chain = std::move(Found->second); + TriangleChainMap.erase(Found); + Chain.append(PDom); + TriangleChainMap.insert(std::make_pair(Chain.getKey(), std::move(Chain))); + } else { + auto InsertResult = TriangleChainMap.try_emplace(PDom, &BB, PDom); + assert(InsertResult.second && "Block seen twice."); + (void)InsertResult; + } + } + + // Iterating over a DenseMap is safe here, because the only thing in the body + // of the loop is inserting into another DenseMap (ComputedEdges). + // ComputedEdges is never iterated, so this doesn't lead to non-determinism. + for (auto &ChainPair : TriangleChainMap) { + TriangleChain &Chain = ChainPair.second; + // Benchmarking has shown that due to branch correlation duplicating 2 or + // more triangles is profitable, despite the calculations assuming + // independence. + if (Chain.count() < TriangleChainCount) + continue; + MachineBasicBlock *dst = Chain.Edges.back(); + Chain.Edges.pop_back(); + for (MachineBasicBlock *src : reverse(Chain.Edges)) { + DEBUG(dbgs() << "Marking edge: " << getBlockName(src) << "->" << + getBlockName(dst) << " as pre-computed based on triangles.\n"); + + auto InsertResult = ComputedEdges.insert({src, {dst, true}}); + assert(InsertResult.second && "Block seen twice."); + (void)InsertResult; + + dst = src; + } + } } // When profile is not present, return the StaticLikelyProb. // When profile is available, we need to handle the triangle-shape CFG. static BranchProbability getLayoutSuccessorProbThreshold( - MachineBasicBlock *BB) { + const MachineBasicBlock *BB) { if (!BB->getParent()->getFunction()->getEntryCount()) return BranchProbability(StaticLikelyProb, 100); if (BB->succ_size() == 2) { @@ -609,11 +1211,11 @@ static BranchProbability getLayoutSuccessorProbThreshold( if (Succ1->isSuccessor(Succ2) || Succ2->isSuccessor(Succ1)) { /* See case 1 below for the cost analysis. For BB->Succ to * be taken with smaller cost, the following needs to hold: - * Prob(BB->Succ) > 2* Prob(BB->Pred) - * So the threshold T - * T = 2 * (1-Prob(BB->Pred). Since T + Prob(BB->Pred) == 1, - * We have T + T/2 = 1, i.e. T = 2/3. Also adding user specified - * branch bias, we have + * Prob(BB->Succ) > 2 * Prob(BB->Pred) + * So the threshold T in the calculation below + * (1-T) * Prob(BB->Succ) > T * Prob(BB->Pred) + * So T / (1 - T) = 2, Yielding T = 2/3 + * Also adding user specified branch bias, we have * T = (2/3)*(ProfileLikelyProb/50) * = (2*ProfileLikelyProb)/150) */ @@ -625,10 +1227,17 @@ static BranchProbability getLayoutSuccessorProbThreshold( /// Checks to see if the layout candidate block \p Succ has a better layout /// predecessor than \c BB. If yes, returns true. +/// \p SuccProb: The probability adjusted for only remaining blocks. +/// Only used for logging +/// \p RealSuccProb: The un-adjusted probability. +/// \p Chain: The chain that BB belongs to and Succ is being considered for. +/// \p BlockFilter: if non-null, the set of blocks that make up the loop being +/// considered bool MachineBlockPlacement::hasBetterLayoutPredecessor( - MachineBasicBlock *BB, MachineBasicBlock *Succ, BlockChain &SuccChain, - BranchProbability SuccProb, BranchProbability RealSuccProb, - BlockChain &Chain, const BlockFilterSet *BlockFilter) { + const MachineBasicBlock *BB, const MachineBasicBlock *Succ, + const BlockChain &SuccChain, BranchProbability SuccProb, + BranchProbability RealSuccProb, const BlockChain &Chain, + const BlockFilterSet *BlockFilter) { // There isn't a better layout when there are no unscheduled predecessors. if (SuccChain.UnscheduledPredecessors == 0) @@ -689,9 +1298,9 @@ bool MachineBlockPlacement::hasBetterLayoutPredecessor( // | | | | // ---BB | | BB // | | | | - // | pred-- | Succ-- + // | Pred-- | Succ-- // | | | | - // ---succ ---pred-- + // ---Succ ---Pred-- // // cost = freq(S->Pred) + freq(BB->Succ) cost = 2 * freq (S->Pred) // = freq(S->Pred) + freq(S->BB) @@ -734,11 +1343,12 @@ bool MachineBlockPlacement::hasBetterLayoutPredecessor( // | Pred----| | S1---- // | | | | // --(S1 or S2) ---Pred-- + // | + // S2 // // topo-cost = freq(S->Pred) + freq(BB->S1) + freq(BB->S2) // + min(freq(Pred->S1), freq(Pred->S2)) // Non-topo-order cost: - // In the worst case, S2 will not get laid out after Pred. // non-topo-cost = 2 * freq(S->Pred) + freq(BB->S2). // To be conservative, we can assume that min(freq(Pred->S1), freq(Pred->S2)) // is 0. Then the non topo layout is better when @@ -756,13 +1366,15 @@ bool MachineBlockPlacement::hasBetterLayoutPredecessor( for (MachineBasicBlock *Pred : Succ->predecessors()) { if (Pred == Succ || BlockToChain[Pred] == &SuccChain || (BlockFilter && !BlockFilter->count(Pred)) || - BlockToChain[Pred] == &Chain) + BlockToChain[Pred] == &Chain || + // This check is redundant except for look ahead. This function is + // called for lookahead by isProfitableToTailDup when BB hasn't been + // placed yet. + (Pred == BB)) continue; // Do backward checking. // For all cases above, we need a backward checking to filter out edges that - // are not 'strongly' biased. With profile data available, the check is - // mostly redundant for case 2 (when threshold prob is set at 50%) unless S - // has more than two successors. + // are not 'strongly' biased. // BB Pred // \ / // Succ @@ -798,14 +1410,15 @@ bool MachineBlockPlacement::hasBetterLayoutPredecessor( /// breaking CFG structure, but cave and break such structures in the case of /// very hot successor edges. /// -/// \returns The best successor block found, or null if none are viable. -MachineBasicBlock * -MachineBlockPlacement::selectBestSuccessor(MachineBasicBlock *BB, - BlockChain &Chain, - const BlockFilterSet *BlockFilter) { +/// \returns The best successor block found, or null if none are viable, along +/// with a boolean indicating if tail duplication is necessary. +MachineBlockPlacement::BlockAndTailDupResult +MachineBlockPlacement::selectBestSuccessor( + const MachineBasicBlock *BB, const BlockChain &Chain, + const BlockFilterSet *BlockFilter) { const BranchProbability HotProb(StaticLikelyProb, 100); - MachineBasicBlock *BestSucc = nullptr; + BlockAndTailDupResult BestSucc = { nullptr, false }; auto BestProb = BranchProbability::getZero(); SmallVector<MachineBasicBlock *, 4> Successors; @@ -813,22 +1426,45 @@ MachineBlockPlacement::selectBestSuccessor(MachineBasicBlock *BB, collectViableSuccessors(BB, Chain, BlockFilter, Successors); DEBUG(dbgs() << "Selecting best successor for: " << getBlockName(BB) << "\n"); + + // if we already precomputed the best successor for BB, return that if still + // applicable. + auto FoundEdge = ComputedEdges.find(BB); + if (FoundEdge != ComputedEdges.end()) { + MachineBasicBlock *Succ = FoundEdge->second.BB; + ComputedEdges.erase(FoundEdge); + BlockChain *SuccChain = BlockToChain[Succ]; + if (BB->isSuccessor(Succ) && (!BlockFilter || BlockFilter->count(Succ)) && + SuccChain != &Chain && Succ == *SuccChain->begin()) + return FoundEdge->second; + } + + // if BB is part of a trellis, Use the trellis to determine the optimal + // fallthrough edges + if (isTrellis(BB, Successors, Chain, BlockFilter)) + return getBestTrellisSuccessor(BB, Successors, AdjustedSumProb, Chain, + BlockFilter); + + // For blocks with CFG violations, we may be able to lay them out anyway with + // tail-duplication. We keep this vector so we can perform the probability + // calculations the minimum number of times. + SmallVector<std::tuple<BranchProbability, MachineBasicBlock *>, 4> + DupCandidates; for (MachineBasicBlock *Succ : Successors) { auto RealSuccProb = MBPI->getEdgeProbability(BB, Succ); BranchProbability SuccProb = getAdjustedProbability(RealSuccProb, AdjustedSumProb); - // This heuristic is off by default. - if (shouldPredBlockBeOutlined(BB, Succ, Chain, BlockFilter, SuccProb, - HotProb)) - return Succ; - BlockChain &SuccChain = *BlockToChain[Succ]; // Skip the edge \c BB->Succ if block \c Succ has a better layout // predecessor that yields lower global cost. if (hasBetterLayoutPredecessor(BB, Succ, SuccChain, SuccProb, RealSuccProb, - Chain, BlockFilter)) + Chain, BlockFilter)) { + // If tail duplication would make Succ profitable, place it. + if (TailDupPlacement && shouldTailDuplicate(Succ)) + DupCandidates.push_back(std::make_tuple(SuccProb, Succ)); continue; + } DEBUG( dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: " @@ -836,17 +1472,48 @@ MachineBlockPlacement::selectBestSuccessor(MachineBasicBlock *BB, << (SuccChain.UnscheduledPredecessors != 0 ? " (CFG break)" : "") << "\n"); - if (BestSucc && BestProb >= SuccProb) { + if (BestSucc.BB && BestProb >= SuccProb) { DEBUG(dbgs() << " Not the best candidate, continuing\n"); continue; } DEBUG(dbgs() << " Setting it as best candidate\n"); - BestSucc = Succ; + BestSucc.BB = Succ; BestProb = SuccProb; } - if (BestSucc) - DEBUG(dbgs() << " Selected: " << getBlockName(BestSucc) << "\n"); + // Handle the tail duplication candidates in order of decreasing probability. + // Stop at the first one that is profitable. Also stop if they are less + // profitable than BestSucc. Position is important because we preserve it and + // prefer first best match. Here we aren't comparing in order, so we capture + // the position instead. + if (DupCandidates.size() != 0) { + auto cmp = + [](const std::tuple<BranchProbability, MachineBasicBlock *> &a, + const std::tuple<BranchProbability, MachineBasicBlock *> &b) { + return std::get<0>(a) > std::get<0>(b); + }; + std::stable_sort(DupCandidates.begin(), DupCandidates.end(), cmp); + } + for(auto &Tup : DupCandidates) { + BranchProbability DupProb; + MachineBasicBlock *Succ; + std::tie(DupProb, Succ) = Tup; + if (DupProb < BestProb) + break; + if (canTailDuplicateUnplacedPreds(BB, Succ, Chain, BlockFilter) + && (isProfitableToTailDup(BB, Succ, BestProb, Chain, BlockFilter))) { + DEBUG( + dbgs() << " Candidate: " << getBlockName(Succ) << ", probability: " + << DupProb + << " (Tail Duplicate)\n"); + BestSucc.BB = Succ; + BestSucc.ShouldTailDup = true; + break; + } + } + + if (BestSucc.BB) + DEBUG(dbgs() << " Selected: " << getBlockName(BestSucc.BB) << "\n"); return BestSucc; } @@ -862,7 +1529,7 @@ MachineBlockPlacement::selectBestSuccessor(MachineBasicBlock *BB, /// /// \returns The best block found, or null if none are viable. MachineBasicBlock *MachineBlockPlacement::selectBestCandidateBlock( - BlockChain &Chain, SmallVectorImpl<MachineBasicBlock *> &WorkList) { + const BlockChain &Chain, SmallVectorImpl<MachineBasicBlock *> &WorkList) { // Once we need to walk the worklist looking for a candidate, cleanup the // worklist of already placed entries. // FIXME: If this shows up on profiles, it could be folded (at the cost of @@ -881,13 +1548,15 @@ MachineBasicBlock *MachineBlockPlacement::selectBestCandidateBlock( MachineBasicBlock *BestBlock = nullptr; BlockFrequency BestFreq; for (MachineBasicBlock *MBB : WorkList) { - assert(MBB->isEHPad() == IsEHPad); + assert(MBB->isEHPad() == IsEHPad && + "EHPad mismatch between block and work list."); BlockChain &SuccChain = *BlockToChain[MBB]; if (&SuccChain == &Chain) continue; - assert(SuccChain.UnscheduledPredecessors == 0 && "Found CFG-violating block"); + assert(SuccChain.UnscheduledPredecessors == 0 && + "Found CFG-violating block"); BlockFrequency CandidateFreq = MBFI->getBlockFreq(MBB); DEBUG(dbgs() << " " << getBlockName(MBB) << " -> "; @@ -948,16 +1617,19 @@ MachineBasicBlock *MachineBlockPlacement::getFirstUnplacedBlock( } void MachineBlockPlacement::fillWorkLists( - MachineBasicBlock *MBB, + const MachineBasicBlock *MBB, SmallPtrSetImpl<BlockChain *> &UpdatedPreds, const BlockFilterSet *BlockFilter = nullptr) { BlockChain &Chain = *BlockToChain[MBB]; if (!UpdatedPreds.insert(&Chain).second) return; - assert(Chain.UnscheduledPredecessors == 0); + assert( + Chain.UnscheduledPredecessors == 0 && + "Attempting to place block with unscheduled predecessors in worklist."); for (MachineBasicBlock *ChainBB : Chain) { - assert(BlockToChain[ChainBB] == &Chain); + assert(BlockToChain[ChainBB] == &Chain && + "Block in chain doesn't match BlockToChain map."); for (MachineBasicBlock *Pred : ChainBB->predecessors()) { if (BlockFilter && !BlockFilter->count(Pred)) continue; @@ -970,23 +1642,23 @@ void MachineBlockPlacement::fillWorkLists( if (Chain.UnscheduledPredecessors != 0) return; - MBB = *Chain.begin(); - if (MBB->isEHPad()) - EHPadWorkList.push_back(MBB); + MachineBasicBlock *BB = *Chain.begin(); + if (BB->isEHPad()) + EHPadWorkList.push_back(BB); else - BlockWorkList.push_back(MBB); + BlockWorkList.push_back(BB); } void MachineBlockPlacement::buildChain( - MachineBasicBlock *BB, BlockChain &Chain, + const MachineBasicBlock *HeadBB, BlockChain &Chain, BlockFilterSet *BlockFilter) { - assert(BB && "BB must not be null.\n"); - assert(BlockToChain[BB] == &Chain && "BlockToChainMap mis-match.\n"); + assert(HeadBB && "BB must not be null.\n"); + assert(BlockToChain[HeadBB] == &Chain && "BlockToChainMap mis-match.\n"); MachineFunction::iterator PrevUnplacedBlockIt = F->begin(); - MachineBasicBlock *LoopHeaderBB = BB; + const MachineBasicBlock *LoopHeaderBB = HeadBB; markChainSuccessors(Chain, LoopHeaderBB, BlockFilter); - BB = *std::prev(Chain.end()); + MachineBasicBlock *BB = *std::prev(Chain.end()); for (;;) { assert(BB && "null block found at end of chain in loop."); assert(BlockToChain[BB] == &Chain && "BlockToChainMap mis-match in loop."); @@ -995,7 +1667,11 @@ void MachineBlockPlacement::buildChain( // Look for the best viable successor if there is one to place immediately // after this block. - MachineBasicBlock *BestSucc = selectBestSuccessor(BB, Chain, BlockFilter); + auto Result = selectBestSuccessor(BB, Chain, BlockFilter); + MachineBasicBlock* BestSucc = Result.BB; + bool ShouldTailDup = Result.ShouldTailDup; + if (TailDupPlacement) + ShouldTailDup |= (BestSucc && shouldTailDuplicate(BestSucc)); // If an immediate successor isn't available, look for the best viable // block among those we've identified as not violating the loop's CFG at @@ -1016,7 +1692,7 @@ void MachineBlockPlacement::buildChain( // Placement may have changed tail duplication opportunities. // Check for that now. - if (TailDupPlacement && BestSucc) { + if (TailDupPlacement && BestSucc && ShouldTailDup) { // If the chosen successor was duplicated into all its predecessors, // don't bother laying it out, just go round the loop again with BB as // the chain end. @@ -1052,7 +1728,7 @@ void MachineBlockPlacement::buildChain( /// unconditional jump (for the backedge) rotating it in front of the loop /// header is always profitable. MachineBasicBlock * -MachineBlockPlacement::findBestLoopTop(MachineLoop &L, +MachineBlockPlacement::findBestLoopTop(const MachineLoop &L, const BlockFilterSet &LoopBlockSet) { // Placing the latch block before the header may introduce an extra branch // that skips this block the first time the loop is executed, which we want @@ -1116,7 +1792,7 @@ MachineBlockPlacement::findBestLoopTop(MachineLoop &L, /// block to layout at the top of the loop. Typically this is done to maximize /// fallthrough opportunities. MachineBasicBlock * -MachineBlockPlacement::findBestLoopExit(MachineLoop &L, +MachineBlockPlacement::findBestLoopExit(const MachineLoop &L, const BlockFilterSet &LoopBlockSet) { // We don't want to layout the loop linearly in all cases. If the loop header // is just a normal basic block in the loop, we want to look for what block @@ -1235,12 +1911,18 @@ MachineBlockPlacement::findBestLoopExit(MachineLoop &L, /// branches. For example, if the loop has fallthrough into its header and out /// of its bottom already, don't rotate it. void MachineBlockPlacement::rotateLoop(BlockChain &LoopChain, - MachineBasicBlock *ExitingBB, + const MachineBasicBlock *ExitingBB, const BlockFilterSet &LoopBlockSet) { if (!ExitingBB) return; MachineBasicBlock *Top = *LoopChain.begin(); + MachineBasicBlock *Bottom = *std::prev(LoopChain.end()); + + // If ExitingBB is already the last one in a chain then nothing to do. + if (Bottom == ExitingBB) + return; + bool ViableTopFallthrough = false; for (MachineBasicBlock *Pred : Top->predecessors()) { BlockChain *PredChain = BlockToChain[Pred]; @@ -1255,7 +1937,6 @@ void MachineBlockPlacement::rotateLoop(BlockChain &LoopChain, // bottom is a viable exiting block. If so, bail out as rotating will // introduce an unnecessary branch. if (ViableTopFallthrough) { - MachineBasicBlock *Bottom = *std::prev(LoopChain.end()); for (MachineBasicBlock *Succ : Bottom->successors()) { BlockChain *SuccChain = BlockToChain[Succ]; if (!LoopBlockSet.count(Succ) && @@ -1268,6 +1949,36 @@ void MachineBlockPlacement::rotateLoop(BlockChain &LoopChain, if (ExitIt == LoopChain.end()) return; + // Rotating a loop exit to the bottom when there is a fallthrough to top + // trades the entry fallthrough for an exit fallthrough. + // If there is no bottom->top edge, but the chosen exit block does have + // a fallthrough, we break that fallthrough for nothing in return. + + // Let's consider an example. We have a built chain of basic blocks + // B1, B2, ..., Bn, where Bk is a ExitingBB - chosen exit block. + // By doing a rotation we get + // Bk+1, ..., Bn, B1, ..., Bk + // Break of fallthrough to B1 is compensated by a fallthrough from Bk. + // If we had a fallthrough Bk -> Bk+1 it is broken now. + // It might be compensated by fallthrough Bn -> B1. + // So we have a condition to avoid creation of extra branch by loop rotation. + // All below must be true to avoid loop rotation: + // If there is a fallthrough to top (B1) + // There was fallthrough from chosen exit block (Bk) to next one (Bk+1) + // There is no fallthrough from bottom (Bn) to top (B1). + // Please note that there is no exit fallthrough from Bn because we checked it + // above. + if (ViableTopFallthrough) { + assert(std::next(ExitIt) != LoopChain.end() && + "Exit should not be last BB"); + MachineBasicBlock *NextBlockInChain = *std::next(ExitIt); + if (ExitingBB->isSuccessor(NextBlockInChain)) + if (!Bottom->isSuccessor(Top)) + return; + } + + DEBUG(dbgs() << "Rotating loop to put exit " << getBlockName(ExitingBB) + << " at bottom\n"); std::rotate(LoopChain.begin(), std::next(ExitIt), LoopChain.end()); } @@ -1285,7 +1996,8 @@ void MachineBlockPlacement::rotateLoop(BlockChain &LoopChain, /// Therefore, the cost for a given rotation is the sum of costs listed above. /// We select the best rotation with the smallest cost. void MachineBlockPlacement::rotateLoopWithProfile( - BlockChain &LoopChain, MachineLoop &L, const BlockFilterSet &LoopBlockSet) { + BlockChain &LoopChain, const MachineLoop &L, + const BlockFilterSet &LoopBlockSet) { auto HeaderBB = L.getHeader(); auto HeaderIter = find(LoopChain, HeaderBB); auto RotationPos = LoopChain.end(); @@ -1422,7 +2134,7 @@ void MachineBlockPlacement::rotateLoopWithProfile( /// When profile data is available, exclude cold blocks from the returned set; /// otherwise, collect all blocks in the loop. MachineBlockPlacement::BlockFilterSet -MachineBlockPlacement::collectLoopBlockSet(MachineLoop &L) { +MachineBlockPlacement::collectLoopBlockSet(const MachineLoop &L) { BlockFilterSet LoopBlockSet; // Filter cold blocks off from LoopBlockSet when profile data is available. @@ -1459,14 +2171,16 @@ MachineBlockPlacement::collectLoopBlockSet(MachineLoop &L) { /// as much as possible. We can then stitch the chains together in a way which /// both preserves the topological structure and minimizes taken conditional /// branches. -void MachineBlockPlacement::buildLoopChains(MachineLoop &L) { +void MachineBlockPlacement::buildLoopChains(const MachineLoop &L) { // First recurse through any nested loops, building chains for those inner // loops. - for (MachineLoop *InnerLoop : L) + for (const MachineLoop *InnerLoop : L) buildLoopChains(*InnerLoop); - assert(BlockWorkList.empty()); - assert(EHPadWorkList.empty()); + assert(BlockWorkList.empty() && + "BlockWorkList not empty when starting to build loop chains."); + assert(EHPadWorkList.empty() && + "EHPadWorkList not empty when starting to build loop chains."); BlockFilterSet LoopBlockSet = collectLoopBlockSet(L); // Check if we have profile data for this function. If yes, we will rotate @@ -1496,10 +2210,11 @@ void MachineBlockPlacement::buildLoopChains(MachineLoop &L) { // walk the blocks, and use a set to prevent visiting a particular chain // twice. SmallPtrSet<BlockChain *, 4> UpdatedPreds; - assert(LoopChain.UnscheduledPredecessors == 0); + assert(LoopChain.UnscheduledPredecessors == 0 && + "LoopChain should not have unscheduled predecessors."); UpdatedPreds.insert(&LoopChain); - for (MachineBasicBlock *LoopBB : LoopBlockSet) + for (const MachineBasicBlock *LoopBB : LoopBlockSet) fillWorkLists(LoopBB, UpdatedPreds, &LoopBlockSet); buildChain(LoopTop, LoopChain, &LoopBlockSet); @@ -1533,7 +2248,7 @@ void MachineBlockPlacement::buildLoopChains(MachineLoop &L) { if (!LoopBlockSet.empty()) { BadLoop = true; - for (MachineBasicBlock *LoopBB : LoopBlockSet) + for (const MachineBasicBlock *LoopBB : LoopBlockSet) dbgs() << "Loop contains blocks never placed into a chain!\n" << " Loop header: " << getBlockName(*L.block_begin()) << "\n" << " Chain header: " << getBlockName(*LoopChain.begin()) << "\n" @@ -1546,31 +2261,6 @@ void MachineBlockPlacement::buildLoopChains(MachineLoop &L) { EHPadWorkList.clear(); } -/// When OutlineOpitonalBranches is on, this method collects BBs that -/// dominates all terminator blocks of the function \p F. -void MachineBlockPlacement::collectMustExecuteBBs() { - if (OutlineOptionalBranches) { - // Find the nearest common dominator of all of F's terminators. - MachineBasicBlock *Terminator = nullptr; - for (MachineBasicBlock &MBB : *F) { - if (MBB.succ_size() == 0) { - if (Terminator == nullptr) - Terminator = &MBB; - else - Terminator = MDT->findNearestCommonDominator(Terminator, &MBB); - } - } - - // MBBs dominating this common dominator are unavoidable. - UnavoidableBlocks.clear(); - for (MachineBasicBlock &MBB : *F) { - if (MDT->dominates(&MBB, Terminator)) { - UnavoidableBlocks.insert(&MBB); - } - } - } -} - void MachineBlockPlacement::buildCFGChains() { // Ensure that every BB in the function has an associated chain to simplify // the assumptions of the remaining algorithm. @@ -1605,16 +2295,15 @@ void MachineBlockPlacement::buildCFGChains() { } } - // Turned on with OutlineOptionalBranches option - collectMustExecuteBBs(); - // Build any loop-based chains. PreferredLoopExit = nullptr; for (MachineLoop *L : *MLI) buildLoopChains(*L); - assert(BlockWorkList.empty()); - assert(EHPadWorkList.empty()); + assert(BlockWorkList.empty() && + "BlockWorkList should be empty before building final chain."); + assert(EHPadWorkList.empty() && + "EHPadWorkList should be empty before building final chain."); SmallPtrSet<BlockChain *, 4> UpdatedPreds; for (MachineBasicBlock &MBB : *F) @@ -1839,7 +2528,7 @@ void MachineBlockPlacement::alignBlocks() { /// @return true if \p BB was removed. bool MachineBlockPlacement::repeatedlyTailDuplicateBlock( MachineBasicBlock *BB, MachineBasicBlock *&LPred, - MachineBasicBlock *LoopHeaderBB, + const MachineBasicBlock *LoopHeaderBB, BlockChain &Chain, BlockFilterSet *BlockFilter, MachineFunction::iterator &PrevUnplacedBlockIt) { bool Removed, DuplicatedToLPred; @@ -1901,21 +2590,16 @@ bool MachineBlockPlacement::repeatedlyTailDuplicateBlock( /// \return - True if the block was duplicated into all preds and removed. bool MachineBlockPlacement::maybeTailDuplicateBlock( MachineBasicBlock *BB, MachineBasicBlock *LPred, - const BlockChain &Chain, BlockFilterSet *BlockFilter, + BlockChain &Chain, BlockFilterSet *BlockFilter, MachineFunction::iterator &PrevUnplacedBlockIt, bool &DuplicatedToLPred) { - DuplicatedToLPred = false; + if (!shouldTailDuplicate(BB)) + return false; + DEBUG(dbgs() << "Redoing tail duplication for Succ#" << BB->getNumber() << "\n"); - bool IsSimple = TailDup.isSimpleBB(BB); - // Blocks with single successors don't create additional fallthrough - // opportunities. Don't duplicate them. TODO: When conditional exits are - // analyzable, allow them to be duplicated. - if (!IsSimple && BB->succ_size() == 1) - return false; - if (!TailDup.shouldTailDuplicate(IsSimple, *BB)) - return false; + // This has to be a callback because none of it can be done after // BB is deleted. bool Removed = false; @@ -1967,6 +2651,7 @@ bool MachineBlockPlacement::maybeTailDuplicateBlock( llvm::function_ref<void(MachineBasicBlock*)>(RemovalCallback); SmallVector<MachineBasicBlock *, 8> DuplicatedPreds; + bool IsSimple = TailDup.isSimpleBB(BB); TailDup.tailDuplicateAndUpdate(IsSimple, BB, LPred, &DuplicatedPreds, &RemovalCallbackRef); @@ -2006,25 +2691,46 @@ bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) { MLI = &getAnalysis<MachineLoopInfo>(); TII = MF.getSubtarget().getInstrInfo(); TLI = MF.getSubtarget().getTargetLowering(); - MDT = &getAnalysis<MachineDominatorTree>(); + MPDT = nullptr; // Initialize PreferredLoopExit to nullptr here since it may never be set if // there are no MachineLoops. PreferredLoopExit = nullptr; + assert(BlockToChain.empty() && + "BlockToChain map should be empty before starting placement."); + assert(ComputedEdges.empty() && + "Computed Edge map should be empty before starting placement."); + + unsigned TailDupSize = TailDupPlacementThreshold; + // If only the aggressive threshold is explicitly set, use it. + if (TailDupPlacementAggressiveThreshold.getNumOccurrences() != 0 && + TailDupPlacementThreshold.getNumOccurrences() == 0) + TailDupSize = TailDupPlacementAggressiveThreshold; + + TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>(); + // For agressive optimization, we can adjust some thresholds to be less + // conservative. + if (PassConfig->getOptLevel() >= CodeGenOpt::Aggressive) { + // At O3 we should be more willing to copy blocks for tail duplication. This + // increases size pressure, so we only do it at O3 + // Do this unless only the regular threshold is explicitly set. + if (TailDupPlacementThreshold.getNumOccurrences() == 0 || + TailDupPlacementAggressiveThreshold.getNumOccurrences() != 0) + TailDupSize = TailDupPlacementAggressiveThreshold; + } + if (TailDupPlacement) { - unsigned TailDupSize = TailDuplicatePlacementThreshold; + MPDT = &getAnalysis<MachinePostDominatorTree>(); if (MF.getFunction()->optForSize()) TailDupSize = 1; TailDup.initMF(MF, MBPI, /* LayoutMode */ true, TailDupSize); + precomputeTriangleChains(); } - assert(BlockToChain.empty()); - buildCFGChains(); // Changing the layout can create new tail merging opportunities. - TargetPassConfig *PassConfig = &getAnalysis<TargetPassConfig>(); // TailMerge can create jump into if branches that make CFG irreducible for // HW that requires structured CFG. bool EnableTailMerge = !MF.getTarget().requiresStructuredCFG() && @@ -2032,7 +2738,7 @@ bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) { BranchFoldPlacement; // No tail merging opportunities if the block number is less than four. if (MF.size() > 3 && EnableTailMerge) { - unsigned TailMergeSize = TailDuplicatePlacementThreshold + 1; + unsigned TailMergeSize = TailDupSize + 1; BranchFolder BF(/*EnableTailMerge=*/true, /*CommonHoist=*/false, *MBFI, *MBPI, TailMergeSize); @@ -2041,8 +2747,10 @@ bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) { /*AfterBlockPlacement=*/true)) { // Redo the layout if tail merging creates/removes/moves blocks. BlockToChain.clear(); - // Must redo the dominator tree if blocks were changed. - MDT->runOnMachineFunction(MF); + ComputedEdges.clear(); + // Must redo the post-dominator tree if blocks were changed. + if (MPDT) + MPDT->runOnMachineFunction(MF); ChainAllocator.DestroyAll(); buildCFGChains(); } @@ -2052,6 +2760,7 @@ bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) { alignBlocks(); BlockToChain.clear(); + ComputedEdges.clear(); ChainAllocator.DestroyAll(); if (AlignAllBlock) @@ -2067,6 +2776,12 @@ bool MachineBlockPlacement::runOnMachineFunction(MachineFunction &MF) { MBI->setAlignment(AlignAllNonFallThruBlocks); } } + if (ViewBlockLayoutWithBFI != GVDT_None && + (ViewBlockFreqFuncName.empty() || + F->getFunction()->getName().equals(ViewBlockFreqFuncName))) { + MBFI->view("MBP." + MF.getName(), false); + } + // We always return true as we have no way to track whether the final order // differs from the original order. |