summaryrefslogtreecommitdiffstats
path: root/contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
diff options
context:
space:
mode:
authordim <dim@FreeBSD.org>2017-04-02 17:24:58 +0000
committerdim <dim@FreeBSD.org>2017-04-02 17:24:58 +0000
commit60b571e49a90d38697b3aca23020d9da42fc7d7f (patch)
tree99351324c24d6cb146b6285b6caffa4d26fce188 /contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
parentbea1b22c7a9bce1dfdd73e6e5b65bc4752215180 (diff)
downloadFreeBSD-src-60b571e49a90d38697b3aca23020d9da42fc7d7f.zip
FreeBSD-src-60b571e49a90d38697b3aca23020d9da42fc7d7f.tar.gz
Update clang, llvm, lld, lldb, compiler-rt and libc++ to 4.0.0 release:
MFC r309142 (by emaste): Add WITH_LLD_AS_LD build knob If set it installs LLD as /usr/bin/ld. LLD (as of version 3.9) is not capable of linking the world and kernel, but can self-host and link many substantial applications. GNU ld continues to be used for the world and kernel build, regardless of how this knob is set. It is on by default for arm64, and off for all other CPU architectures. Sponsored by: The FreeBSD Foundation MFC r310840: Reapply 310775, now it also builds correctly if lldb is disabled: Move llvm-objdump from CLANG_EXTRAS to installed by default We currently install three tools from binutils 2.17.50: as, ld, and objdump. Work is underway to migrate to a permissively-licensed tool-chain, with one goal being the retirement of binutils 2.17.50. LLVM's llvm-objdump is intended to be compatible with GNU objdump although it is currently missing some options and may have formatting differences. Enable it by default for testing and further investigation. It may later be changed to install as /usr/bin/objdump, it becomes a fully viable replacement. Reviewed by: emaste Differential Revision: https://reviews.freebsd.org/D8879 MFC r312855 (by emaste): Rename LLD_AS_LD to LLD_IS_LD, for consistency with CLANG_IS_CC Reported by: Dan McGregor <dan.mcgregor usask.ca> MFC r313559 | glebius | 2017-02-10 18:34:48 +0100 (Fri, 10 Feb 2017) | 5 lines Don't check struct rtentry on FreeBSD, it is an internal kernel structure. On other systems it may be API structure for SIOCADDRT/SIOCDELRT. Reviewed by: emaste, dim MFC r314152 (by jkim): Remove an assembler flag, which is redundant since r309124. The upstream took care of it by introducing a macro NO_EXEC_STACK_DIRECTIVE. http://llvm.org/viewvc/llvm-project?rev=273500&view=rev Reviewed by: dim MFC r314564: Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to 4.0.0 (branches/release_40 296509). The release will follow soon. Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11 support to build; see UPDATING for more information. Also note that as of 4.0.0, lld should be able to link the base system on amd64 and aarch64. See the WITH_LLD_IS_LLD setting in src.conf(5). Though please be aware that this is work in progress. Release notes for llvm, clang and lld will be available here: <http://releases.llvm.org/4.0.0/docs/ReleaseNotes.html> <http://releases.llvm.org/4.0.0/tools/clang/docs/ReleaseNotes.html> <http://releases.llvm.org/4.0.0/tools/lld/docs/ReleaseNotes.html> Thanks to Ed Maste, Jan Beich, Antoine Brodin and Eric Fiselier for their help. Relnotes: yes Exp-run: antoine PR: 215969, 216008 MFC r314708: For now, revert r287232 from upstream llvm trunk (by Daniil Fukalov): [SCEV] limit recursion depth of CompareSCEVComplexity Summary: CompareSCEVComplexity goes too deep (50+ on a quite a big unrolled loop) and runs almost infinite time. Added cache of "equal" SCEV pairs to earlier cutoff of further estimation. Recursion depth limit was also introduced as a parameter. Reviewers: sanjoy Subscribers: mzolotukhin, tstellarAMD, llvm-commits Differential Revision: https://reviews.llvm.org/D26389 This commit is the cause of excessive compile times on skein_block.c (and possibly other files) during kernel builds on amd64. We never saw the problematic behavior described in this upstream commit, so for now it is better to revert it. An upstream bug has been filed here: https://bugs.llvm.org/show_bug.cgi?id=32142 Reported by: mjg MFC r314795: Reapply r287232 from upstream llvm trunk (by Daniil Fukalov): [SCEV] limit recursion depth of CompareSCEVComplexity Summary: CompareSCEVComplexity goes too deep (50+ on a quite a big unrolled loop) and runs almost infinite time. Added cache of "equal" SCEV pairs to earlier cutoff of further estimation. Recursion depth limit was also introduced as a parameter. Reviewers: sanjoy Subscribers: mzolotukhin, tstellarAMD, llvm-commits Differential Revision: https://reviews.llvm.org/D26389 Pull in r296992 from upstream llvm trunk (by Sanjoy Das): [SCEV] Decrease the recursion threshold for CompareValueComplexity Fixes PR32142. r287232 accidentally increased the recursion threshold for CompareValueComplexity from 2 to 32. This change reverses that change by introducing a separate flag for CompareValueComplexity's threshold. The latter revision fixes the excessive compile times for skein_block.c. MFC r314907 | mmel | 2017-03-08 12:40:27 +0100 (Wed, 08 Mar 2017) | 7 lines Unbreak ARMv6 world. The new compiler_rt library imported with clang 4.0.0 have several fatal issues (non-functional __udivsi3 for example) with ARM specific instrict functions. As temporary workaround, until upstream solve these problems, disable all thumb[1][2] related feature. MFC r315016: Update clang, llvm, lld, lldb, compiler-rt and libc++ to 4.0.0 release. We were already very close to the last release candidate, so this is a pretty minor update. Relnotes: yes MFC r316005: Revert r314907, and pull in r298713 from upstream compiler-rt trunk (by Weiming Zhao): builtins: Select correct code fragments when compiling for Thumb1/Thum2/ARM ISA. Summary: Value of __ARM_ARCH_ISA_THUMB isn't based on the actual compilation mode (-mthumb, -marm), it reflect's capability of given CPU. Due to this: - use __tbumb__ and __thumb2__ insteand of __ARM_ARCH_ISA_THUMB - use '.thumb' directive consistently in all affected files - decorate all thumb functions using DEFINE_COMPILERRT_THUMB_FUNCTION() --------- Note: This patch doesn't fix broken Thumb1 variant of __udivsi3 ! Reviewers: weimingz, rengolin, compnerd Subscribers: aemerson, dim Differential Revision: https://reviews.llvm.org/D30938 Discussed with: mmel
Diffstat (limited to 'contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp')
-rw-r--r--contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp374
1 files changed, 297 insertions, 77 deletions
diff --git a/contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp b/contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
index bfb0980..40537e4 100644
--- a/contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
+++ b/contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp
@@ -21,10 +21,12 @@
#include "llvm/ADT/Statistic.h"
#include "llvm/ADT/StringExtras.h"
#include "llvm/Analysis/ModuleSummaryAnalysis.h"
+#include "llvm/Analysis/ProfileSummaryInfo.h"
#include "llvm/Analysis/TargetLibraryInfo.h"
#include "llvm/Analysis/TargetTransformInfo.h"
+#include "llvm/Bitcode/BitcodeReader.h"
+#include "llvm/Bitcode/BitcodeWriter.h"
#include "llvm/Bitcode/BitcodeWriterPass.h"
-#include "llvm/Bitcode/ReaderWriter.h"
#include "llvm/ExecutionEngine/ObjectMemoryBuffer.h"
#include "llvm/IR/DiagnosticPrinter.h"
#include "llvm/IR/LLVMContext.h"
@@ -38,10 +40,13 @@
#include "llvm/Object/ModuleSummaryIndexObjectFile.h"
#include "llvm/Support/CachePruning.h"
#include "llvm/Support/Debug.h"
+#include "llvm/Support/Error.h"
#include "llvm/Support/Path.h"
#include "llvm/Support/SHA1.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/ThreadPool.h"
+#include "llvm/Support/Threading.h"
+#include "llvm/Support/ToolOutputFile.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Transforms/IPO.h"
#include "llvm/Transforms/IPO/FunctionImport.h"
@@ -59,17 +64,34 @@ using namespace llvm;
namespace llvm {
// Flags -discard-value-names, defined in LTOCodeGenerator.cpp
extern cl::opt<bool> LTODiscardValueNames;
+extern cl::opt<std::string> LTORemarksFilename;
+extern cl::opt<bool> LTOPassRemarksWithHotness;
}
namespace {
-static cl::opt<int> ThreadCount("threads",
- cl::init(std::thread::hardware_concurrency()));
+static cl::opt<int>
+ ThreadCount("threads", cl::init(llvm::heavyweight_hardware_concurrency()));
-static void diagnosticHandler(const DiagnosticInfo &DI) {
- DiagnosticPrinterRawOStream DP(errs());
- DI.print(DP);
- errs() << '\n';
+Expected<std::unique_ptr<tool_output_file>>
+setupOptimizationRemarks(LLVMContext &Ctx, int Count) {
+ if (LTOPassRemarksWithHotness)
+ Ctx.setDiagnosticHotnessRequested(true);
+
+ if (LTORemarksFilename.empty())
+ return nullptr;
+
+ std::string FileName =
+ LTORemarksFilename + ".thin." + llvm::utostr(Count) + ".yaml";
+ std::error_code EC;
+ auto DiagnosticOutputFile =
+ llvm::make_unique<tool_output_file>(FileName, EC, sys::fs::F_None);
+ if (EC)
+ return errorCodeToError(EC);
+ Ctx.setDiagnosticsOutputFile(
+ llvm::make_unique<yaml::Output>(DiagnosticOutputFile->os()));
+ DiagnosticOutputFile->keep();
+ return std::move(DiagnosticOutputFile);
}
// Simple helper to save temporary files for debug.
@@ -78,9 +100,9 @@ static void saveTempBitcode(const Module &TheModule, StringRef TempDir,
if (TempDir.empty())
return;
// User asked to save temps, let dump the bitcode file after import.
- auto SaveTempPath = TempDir + llvm::utostr(count) + Suffix;
+ std::string SaveTempPath = (TempDir + llvm::utostr(count) + Suffix).str();
std::error_code EC;
- raw_fd_ostream OS(SaveTempPath.str(), EC, sys::fs::F_None);
+ raw_fd_ostream OS(SaveTempPath, EC, sys::fs::F_None);
if (EC)
report_fatal_error(Twine("Failed to open ") + SaveTempPath +
" to save optimized bitcode\n");
@@ -128,13 +150,13 @@ static void computePrevailingCopies(
}
static StringMap<MemoryBufferRef>
-generateModuleMap(const std::vector<MemoryBufferRef> &Modules) {
+generateModuleMap(const std::vector<ThinLTOBuffer> &Modules) {
StringMap<MemoryBufferRef> ModuleMap;
for (auto &ModuleBuffer : Modules) {
assert(ModuleMap.find(ModuleBuffer.getBufferIdentifier()) ==
ModuleMap.end() &&
"Expect unique Buffer Identifier");
- ModuleMap[ModuleBuffer.getBufferIdentifier()] = ModuleBuffer;
+ ModuleMap[ModuleBuffer.getBufferIdentifier()] = ModuleBuffer.getMemBuffer();
}
return ModuleMap;
}
@@ -144,22 +166,55 @@ static void promoteModule(Module &TheModule, const ModuleSummaryIndex &Index) {
report_fatal_error("renameModuleForThinLTO failed");
}
+static std::unique_ptr<Module>
+loadModuleFromBuffer(const MemoryBufferRef &Buffer, LLVMContext &Context,
+ bool Lazy, bool IsImporting) {
+ SMDiagnostic Err;
+ Expected<std::unique_ptr<Module>> ModuleOrErr =
+ Lazy
+ ? getLazyBitcodeModule(Buffer, Context,
+ /* ShouldLazyLoadMetadata */ true, IsImporting)
+ : parseBitcodeFile(Buffer, Context);
+ if (!ModuleOrErr) {
+ handleAllErrors(ModuleOrErr.takeError(), [&](ErrorInfoBase &EIB) {
+ SMDiagnostic Err = SMDiagnostic(Buffer.getBufferIdentifier(),
+ SourceMgr::DK_Error, EIB.message());
+ Err.print("ThinLTO", errs());
+ });
+ report_fatal_error("Can't load module, abort.");
+ }
+ return std::move(ModuleOrErr.get());
+}
+
static void
crossImportIntoModule(Module &TheModule, const ModuleSummaryIndex &Index,
StringMap<MemoryBufferRef> &ModuleMap,
const FunctionImporter::ImportMapTy &ImportList) {
- ModuleLoader Loader(TheModule.getContext(), ModuleMap);
+ auto Loader = [&](StringRef Identifier) {
+ return loadModuleFromBuffer(ModuleMap[Identifier], TheModule.getContext(),
+ /*Lazy=*/true, /*IsImporting*/ true);
+ };
+
FunctionImporter Importer(Index, Loader);
- Importer.importFunctions(TheModule, ImportList);
+ Expected<bool> Result = Importer.importFunctions(TheModule, ImportList);
+ if (!Result) {
+ handleAllErrors(Result.takeError(), [&](ErrorInfoBase &EIB) {
+ SMDiagnostic Err = SMDiagnostic(TheModule.getModuleIdentifier(),
+ SourceMgr::DK_Error, EIB.message());
+ Err.print("ThinLTO", errs());
+ });
+ report_fatal_error("importFunctions failed");
+ }
}
-static void optimizeModule(Module &TheModule, TargetMachine &TM) {
+static void optimizeModule(Module &TheModule, TargetMachine &TM,
+ unsigned OptLevel) {
// Populate the PassManager
PassManagerBuilder PMB;
PMB.LibraryInfo = new TargetLibraryInfoImpl(TM.getTargetTriple());
PMB.Inliner = createFunctionInliningPass();
// FIXME: should get it from the bitcode?
- PMB.OptLevel = 3;
+ PMB.OptLevel = OptLevel;
PMB.LoopVectorize = true;
PMB.SLPVectorize = true;
PMB.VerifyInput = true;
@@ -229,25 +284,65 @@ public:
const FunctionImporter::ExportSetTy &ExportList,
const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR,
const GVSummaryMapTy &DefinedFunctions,
- const DenseSet<GlobalValue::GUID> &PreservedSymbols) {
+ const DenseSet<GlobalValue::GUID> &PreservedSymbols, unsigned OptLevel,
+ const TargetMachineBuilder &TMBuilder) {
if (CachePath.empty())
return;
+ if (!Index.modulePaths().count(ModuleID))
+ // The module does not have an entry, it can't have a hash at all
+ return;
+
// Compute the unique hash for this entry
// This is based on the current compiler version, the module itself, the
// export list, the hash for every single module in the import list, the
// list of ResolvedODR for the module, and the list of preserved symbols.
+ // Include the hash for the current module
+ auto ModHash = Index.getModuleHash(ModuleID);
+
+ if (all_of(ModHash, [](uint32_t V) { return V == 0; }))
+ // No hash entry, no caching!
+ return;
+
SHA1 Hasher;
+ // Include the parts of the LTO configuration that affect code generation.
+ auto AddString = [&](StringRef Str) {
+ Hasher.update(Str);
+ Hasher.update(ArrayRef<uint8_t>{0});
+ };
+ auto AddUnsigned = [&](unsigned I) {
+ uint8_t Data[4];
+ Data[0] = I;
+ Data[1] = I >> 8;
+ Data[2] = I >> 16;
+ Data[3] = I >> 24;
+ Hasher.update(ArrayRef<uint8_t>{Data, 4});
+ };
+
// Start with the compiler revision
Hasher.update(LLVM_VERSION_STRING);
#ifdef HAVE_LLVM_REVISION
Hasher.update(LLVM_REVISION);
#endif
- // Include the hash for the current module
- auto ModHash = Index.getModuleHash(ModuleID);
+ // Hash the optimization level and the target machine settings.
+ AddString(TMBuilder.MCpu);
+ // FIXME: Hash more of Options. For now all clients initialize Options from
+ // command-line flags (which is unsupported in production), but may set
+ // RelaxELFRelocations. The clang driver can also pass FunctionSections,
+ // DataSections and DebuggerTuning via command line flags.
+ AddUnsigned(TMBuilder.Options.RelaxELFRelocations);
+ AddUnsigned(TMBuilder.Options.FunctionSections);
+ AddUnsigned(TMBuilder.Options.DataSections);
+ AddUnsigned((unsigned)TMBuilder.Options.DebuggerTuning);
+ AddString(TMBuilder.MAttr);
+ if (TMBuilder.RelocModel)
+ AddUnsigned(*TMBuilder.RelocModel);
+ AddUnsigned(TMBuilder.CGOptLevel);
+ AddUnsigned(OptLevel);
+
Hasher.update(ArrayRef<uint8_t>((uint8_t *)&ModHash[0], sizeof(ModHash)));
for (auto F : ExportList)
// The export list can impact the internalization, be conservative here
@@ -288,10 +383,9 @@ public:
}
// Cache the Produced object file
- std::unique_ptr<MemoryBuffer>
- write(std::unique_ptr<MemoryBuffer> OutputBuffer) {
+ void write(const MemoryBuffer &OutputBuffer) {
if (EntryPath.empty())
- return OutputBuffer;
+ return;
// Write to a temporary to avoid race condition
SmallString<128> TempFilename;
@@ -304,7 +398,7 @@ public:
}
{
raw_fd_ostream OS(TempFD, /* ShouldClose */ true);
- OS << OutputBuffer->getBuffer();
+ OS << OutputBuffer.getBuffer();
}
// Rename to final destination (hopefully race condition won't matter here)
EC = sys::fs::rename(TempFilename, EntryPath);
@@ -314,16 +408,8 @@ public:
if (EC)
report_fatal_error(Twine("Failed to open ") + EntryPath +
" to save cached entry\n");
- OS << OutputBuffer->getBuffer();
+ OS << OutputBuffer.getBuffer();
}
- auto ReloadedBufferOrErr = MemoryBuffer::getFile(EntryPath);
- if (auto EC = ReloadedBufferOrErr.getError()) {
- // FIXME diagnose
- errs() << "error: can't reload cached file '" << EntryPath
- << "': " << EC.message() << "\n";
- return OutputBuffer;
- }
- return std::move(*ReloadedBufferOrErr);
}
};
@@ -336,7 +422,7 @@ ProcessThinLTOModule(Module &TheModule, ModuleSummaryIndex &Index,
const GVSummaryMapTy &DefinedGlobals,
const ThinLTOCodeGenerator::CachingOptions &CacheOptions,
bool DisableCodeGen, StringRef SaveTempsDir,
- unsigned count) {
+ unsigned OptLevel, unsigned count) {
// "Benchmark"-like optimization: single-source case
bool SingleModule = (ModuleMap.size() == 1);
@@ -368,7 +454,7 @@ ProcessThinLTOModule(Module &TheModule, ModuleSummaryIndex &Index,
saveTempBitcode(TheModule, SaveTempsDir, count, ".3.imported.bc");
}
- optimizeModule(TheModule, TM);
+ optimizeModule(TheModule, TM, OptLevel);
saveTempBitcode(TheModule, SaveTempsDir, count, ".4.opt.bc");
@@ -377,8 +463,9 @@ ProcessThinLTOModule(Module &TheModule, ModuleSummaryIndex &Index,
SmallVector<char, 128> OutputBuffer;
{
raw_svector_ostream OS(OutputBuffer);
- ModuleSummaryIndexBuilder IndexBuilder(&TheModule);
- WriteBitcodeToFile(&TheModule, OS, true, &IndexBuilder.getIndex());
+ ProfileSummaryInfo PSI(TheModule);
+ auto Index = buildModuleSummaryIndex(TheModule, nullptr, nullptr);
+ WriteBitcodeToFile(&TheModule, OS, true, &Index);
}
return make_unique<ObjectMemoryBuffer>(std::move(OutputBuffer));
}
@@ -435,18 +522,27 @@ static void initTMBuilder(TargetMachineBuilder &TMBuilder,
} // end anonymous namespace
void ThinLTOCodeGenerator::addModule(StringRef Identifier, StringRef Data) {
- MemoryBufferRef Buffer(Data, Identifier);
+ ThinLTOBuffer Buffer(Data, Identifier);
if (Modules.empty()) {
// First module added, so initialize the triple and some options
LLVMContext Context;
- Triple TheTriple(getBitcodeTargetTriple(Buffer, Context));
+ StringRef TripleStr;
+ ErrorOr<std::string> TripleOrErr = expectedToErrorOrAndEmitErrors(
+ Context, getBitcodeTargetTriple(Buffer.getMemBuffer()));
+ if (TripleOrErr)
+ TripleStr = *TripleOrErr;
+ Triple TheTriple(TripleStr);
initTMBuilder(TMBuilder, Triple(TheTriple));
}
#ifndef NDEBUG
else {
LLVMContext Context;
- assert(TMBuilder.TheTriple.str() ==
- getBitcodeTargetTriple(Buffer, Context) &&
+ StringRef TripleStr;
+ ErrorOr<std::string> TripleOrErr = expectedToErrorOrAndEmitErrors(
+ Context, getBitcodeTargetTriple(Buffer.getMemBuffer()));
+ if (TripleOrErr)
+ TripleStr = *TripleOrErr;
+ assert(TMBuilder.TheTriple.str() == TripleStr &&
"ThinLTO modules with different triple not supported");
}
#endif
@@ -477,6 +573,7 @@ std::unique_ptr<TargetMachine> TargetMachineBuilder::create() const {
SubtargetFeatures Features(MAttr);
Features.getDefaultSubtargetFeatures(TheTriple);
std::string FeatureStr = Features.getString();
+
return std::unique_ptr<TargetMachine>(TheTarget->createTargetMachine(
TheTriple.str(), MCpu, FeatureStr, Options, RelocModel,
CodeModel::Default, CGOptLevel));
@@ -490,13 +587,14 @@ std::unique_ptr<ModuleSummaryIndex> ThinLTOCodeGenerator::linkCombinedIndex() {
std::unique_ptr<ModuleSummaryIndex> CombinedIndex;
uint64_t NextModuleId = 0;
for (auto &ModuleBuffer : Modules) {
- ErrorOr<std::unique_ptr<object::ModuleSummaryIndexObjectFile>> ObjOrErr =
- object::ModuleSummaryIndexObjectFile::create(ModuleBuffer,
- diagnosticHandler);
- if (std::error_code EC = ObjOrErr.getError()) {
+ Expected<std::unique_ptr<object::ModuleSummaryIndexObjectFile>> ObjOrErr =
+ object::ModuleSummaryIndexObjectFile::create(
+ ModuleBuffer.getMemBuffer());
+ if (!ObjOrErr) {
// FIXME diagnose
- errs() << "error: can't create ModuleSummaryIndexObjectFile for buffer: "
- << EC.message() << "\n";
+ logAllUnhandledErrors(
+ ObjOrErr.takeError(), errs(),
+ "error: can't create ModuleSummaryIndexObjectFile for buffer: ");
return nullptr;
}
auto Index = (*ObjOrErr)->takeIndex();
@@ -517,15 +615,23 @@ void ThinLTOCodeGenerator::promote(Module &TheModule,
ModuleSummaryIndex &Index) {
auto ModuleCount = Index.modulePaths().size();
auto ModuleIdentifier = TheModule.getModuleIdentifier();
+
// Collect for each module the list of function it defines (GUID -> Summary).
StringMap<GVSummaryMapTy> ModuleToDefinedGVSummaries;
Index.collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries);
+ // Convert the preserved symbols set from string to GUID
+ auto GUIDPreservedSymbols = computeGUIDPreservedSymbols(
+ PreservedSymbols, Triple(TheModule.getTargetTriple()));
+
+ // Compute "dead" symbols, we don't want to import/export these!
+ auto DeadSymbols = computeDeadSymbols(Index, GUIDPreservedSymbols);
+
// Generate import/export list
StringMap<FunctionImporter::ImportMapTy> ImportLists(ModuleCount);
StringMap<FunctionImporter::ExportSetTy> ExportLists(ModuleCount);
ComputeCrossModuleImport(Index, ModuleToDefinedGVSummaries, ImportLists,
- ExportLists);
+ ExportLists, &DeadSymbols);
// Resolve LinkOnce/Weak symbols.
StringMap<std::map<GlobalValue::GUID, GlobalValue::LinkageTypes>> ResolvedODR;
@@ -534,6 +640,16 @@ void ThinLTOCodeGenerator::promote(Module &TheModule,
thinLTOResolveWeakForLinkerModule(
TheModule, ModuleToDefinedGVSummaries[ModuleIdentifier]);
+ // Promote the exported values in the index, so that they are promoted
+ // in the module.
+ auto isExported = [&](StringRef ModuleIdentifier, GlobalValue::GUID GUID) {
+ const auto &ExportList = ExportLists.find(ModuleIdentifier);
+ return (ExportList != ExportLists.end() &&
+ ExportList->second.count(GUID)) ||
+ GUIDPreservedSymbols.count(GUID);
+ };
+ thinLTOInternalizeAndPromoteInIndex(Index, isExported);
+
promoteModule(TheModule, Index);
}
@@ -549,11 +665,18 @@ void ThinLTOCodeGenerator::crossModuleImport(Module &TheModule,
StringMap<GVSummaryMapTy> ModuleToDefinedGVSummaries(ModuleCount);
Index.collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries);
+ // Convert the preserved symbols set from string to GUID
+ auto GUIDPreservedSymbols = computeGUIDPreservedSymbols(
+ PreservedSymbols, Triple(TheModule.getTargetTriple()));
+
+ // Compute "dead" symbols, we don't want to import/export these!
+ auto DeadSymbols = computeDeadSymbols(Index, GUIDPreservedSymbols);
+
// Generate import/export list
StringMap<FunctionImporter::ImportMapTy> ImportLists(ModuleCount);
StringMap<FunctionImporter::ExportSetTy> ExportLists(ModuleCount);
ComputeCrossModuleImport(Index, ModuleToDefinedGVSummaries, ImportLists,
- ExportLists);
+ ExportLists, &DeadSymbols);
auto &ImportList = ImportLists[TheModule.getModuleIdentifier()];
crossImportIntoModule(TheModule, Index, ModuleMap, ImportList);
@@ -578,7 +701,7 @@ void ThinLTOCodeGenerator::gatherImportedSummariesForModule(
ExportLists);
llvm::gatherImportedSummariesForModule(ModulePath, ModuleToDefinedGVSummaries,
- ImportLists,
+ ImportLists[ModulePath],
ModuleToSummariesForIndex);
}
@@ -601,7 +724,7 @@ void ThinLTOCodeGenerator::emitImports(StringRef ModulePath,
ExportLists);
std::error_code EC;
- if ((EC = EmitImportsFiles(ModulePath, OutputName, ImportLists)))
+ if ((EC = EmitImportsFiles(ModulePath, OutputName, ImportLists[ModulePath])))
report_fatal_error(Twine("Failed to open ") + OutputName +
" to save imports lists\n");
}
@@ -623,11 +746,14 @@ void ThinLTOCodeGenerator::internalize(Module &TheModule,
StringMap<GVSummaryMapTy> ModuleToDefinedGVSummaries(ModuleCount);
Index.collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries);
+ // Compute "dead" symbols, we don't want to import/export these!
+ auto DeadSymbols = computeDeadSymbols(Index, GUIDPreservedSymbols);
+
// Generate import/export list
StringMap<FunctionImporter::ImportMapTy> ImportLists(ModuleCount);
StringMap<FunctionImporter::ExportSetTy> ExportLists(ModuleCount);
ComputeCrossModuleImport(Index, ModuleToDefinedGVSummaries, ImportLists,
- ExportLists);
+ ExportLists, &DeadSymbols);
auto &ExportList = ExportLists[ModuleIdentifier];
// Be friendly and don't nuke totally the module when the client didn't
@@ -654,7 +780,7 @@ void ThinLTOCodeGenerator::optimize(Module &TheModule) {
initTMBuilder(TMBuilder, Triple(TheModule.getTargetTriple()));
// Optimize now
- optimizeModule(TheModule, *TMBuilder.create());
+ optimizeModule(TheModule, *TMBuilder.create(), OptLevel);
}
/**
@@ -665,13 +791,61 @@ std::unique_ptr<MemoryBuffer> ThinLTOCodeGenerator::codegen(Module &TheModule) {
return codegenModule(TheModule, *TMBuilder.create());
}
+/// Write out the generated object file, either from CacheEntryPath or from
+/// OutputBuffer, preferring hard-link when possible.
+/// Returns the path to the generated file in SavedObjectsDirectoryPath.
+static std::string writeGeneratedObject(int count, StringRef CacheEntryPath,
+ StringRef SavedObjectsDirectoryPath,
+ const MemoryBuffer &OutputBuffer) {
+ SmallString<128> OutputPath(SavedObjectsDirectoryPath);
+ llvm::sys::path::append(OutputPath, Twine(count) + ".thinlto.o");
+ OutputPath.c_str(); // Ensure the string is null terminated.
+ if (sys::fs::exists(OutputPath))
+ sys::fs::remove(OutputPath);
+
+ // We don't return a memory buffer to the linker, just a list of files.
+ if (!CacheEntryPath.empty()) {
+ // Cache is enabled, hard-link the entry (or copy if hard-link fails).
+ auto Err = sys::fs::create_hard_link(CacheEntryPath, OutputPath);
+ if (!Err)
+ return OutputPath.str();
+ // Hard linking failed, try to copy.
+ Err = sys::fs::copy_file(CacheEntryPath, OutputPath);
+ if (!Err)
+ return OutputPath.str();
+ // Copy failed (could be because the CacheEntry was removed from the cache
+ // in the meantime by another process), fall back and try to write down the
+ // buffer to the output.
+ errs() << "error: can't link or copy from cached entry '" << CacheEntryPath
+ << "' to '" << OutputPath << "'\n";
+ }
+ // No cache entry, just write out the buffer.
+ std::error_code Err;
+ raw_fd_ostream OS(OutputPath, Err, sys::fs::F_None);
+ if (Err)
+ report_fatal_error("Can't open output '" + OutputPath + "'\n");
+ OS << OutputBuffer.getBuffer();
+ return OutputPath.str();
+}
+
// Main entry point for the ThinLTO processing
void ThinLTOCodeGenerator::run() {
+ // Prepare the resulting object vector
+ assert(ProducedBinaries.empty() && "The generator should not be reused");
+ if (SavedObjectsDirectoryPath.empty())
+ ProducedBinaries.resize(Modules.size());
+ else {
+ sys::fs::create_directories(SavedObjectsDirectoryPath);
+ bool IsDir;
+ sys::fs::is_directory(SavedObjectsDirectoryPath, IsDir);
+ if (!IsDir)
+ report_fatal_error("Unexistent dir: '" + SavedObjectsDirectoryPath + "'");
+ ProducedBinaryFiles.resize(Modules.size());
+ }
+
if (CodeGenOnly) {
// Perform only parallel codegen and return.
ThreadPool Pool;
- assert(ProducedBinaries.empty() && "The generator should not be reused");
- ProducedBinaries.resize(Modules.size());
int count = 0;
for (auto &ModuleBuffer : Modules) {
Pool.async([&](int count) {
@@ -679,10 +853,17 @@ void ThinLTOCodeGenerator::run() {
Context.setDiscardValueNames(LTODiscardValueNames);
// Parse module now
- auto TheModule = loadModuleFromBuffer(ModuleBuffer, Context, false);
+ auto TheModule =
+ loadModuleFromBuffer(ModuleBuffer.getMemBuffer(), Context, false,
+ /*IsImporting*/ false);
// CodeGen
- ProducedBinaries[count] = codegen(*TheModule);
+ auto OutputBuffer = codegen(*TheModule);
+ if (SavedObjectsDirectoryPath.empty())
+ ProducedBinaries[count] = std::move(OutputBuffer);
+ else
+ ProducedBinaryFiles[count] = writeGeneratedObject(
+ count, "", SavedObjectsDirectoryPath, *OutputBuffer);
}, count++);
}
@@ -703,9 +884,6 @@ void ThinLTOCodeGenerator::run() {
WriteIndexToFile(*Index, OS);
}
- // Prepare the resulting object vector
- assert(ProducedBinaries.empty() && "The generator should not be reused");
- ProducedBinaries.resize(Modules.size());
// Prepare the module map.
auto ModuleMap = generateModuleMap(Modules);
@@ -715,17 +893,20 @@ void ThinLTOCodeGenerator::run() {
StringMap<GVSummaryMapTy> ModuleToDefinedGVSummaries(ModuleCount);
Index->collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries);
+ // Convert the preserved symbols set from string to GUID, this is needed for
+ // computing the caching hash and the internalization.
+ auto GUIDPreservedSymbols =
+ computeGUIDPreservedSymbols(PreservedSymbols, TMBuilder.TheTriple);
+
+ // Compute "dead" symbols, we don't want to import/export these!
+ auto DeadSymbols = computeDeadSymbols(*Index, GUIDPreservedSymbols);
+
// Collect the import/export lists for all modules from the call-graph in the
// combined index.
StringMap<FunctionImporter::ImportMapTy> ImportLists(ModuleCount);
StringMap<FunctionImporter::ExportSetTy> ExportLists(ModuleCount);
ComputeCrossModuleImport(*Index, ModuleToDefinedGVSummaries, ImportLists,
- ExportLists);
-
- // Convert the preserved symbols set from string to GUID, this is needed for
- // computing the caching hash and the internalization.
- auto GUIDPreservedSymbols =
- computeGUIDPreservedSymbols(PreservedSymbols, TMBuilder.TheTriple);
+ ExportLists, &DeadSymbols);
// We use a std::map here to be able to have a defined ordering when
// producing a hash for the cache entry.
@@ -764,8 +945,8 @@ void ThinLTOCodeGenerator::run() {
std::iota(ModulesOrdering.begin(), ModulesOrdering.end(), 0);
std::sort(ModulesOrdering.begin(), ModulesOrdering.end(),
[&](int LeftIndex, int RightIndex) {
- auto LSize = Modules[LeftIndex].getBufferSize();
- auto RSize = Modules[RightIndex].getBufferSize();
+ auto LSize = Modules[LeftIndex].getBuffer().size();
+ auto RSize = Modules[RightIndex].getBuffer().size();
return LSize > RSize;
});
@@ -784,17 +965,24 @@ void ThinLTOCodeGenerator::run() {
ModuleCacheEntry CacheEntry(CacheOptions.Path, *Index, ModuleIdentifier,
ImportLists[ModuleIdentifier], ExportList,
ResolvedODR[ModuleIdentifier],
- DefinedFunctions, GUIDPreservedSymbols);
+ DefinedFunctions, GUIDPreservedSymbols,
+ OptLevel, TMBuilder);
+ auto CacheEntryPath = CacheEntry.getEntryPath();
{
auto ErrOrBuffer = CacheEntry.tryLoadingBuffer();
DEBUG(dbgs() << "Cache " << (ErrOrBuffer ? "hit" : "miss") << " '"
- << CacheEntry.getEntryPath() << "' for buffer " << count
- << " " << ModuleIdentifier << "\n");
+ << CacheEntryPath << "' for buffer " << count << " "
+ << ModuleIdentifier << "\n");
if (ErrOrBuffer) {
// Cache Hit!
- ProducedBinaries[count] = std::move(ErrOrBuffer.get());
+ if (SavedObjectsDirectoryPath.empty())
+ ProducedBinaries[count] = std::move(ErrOrBuffer.get());
+ else
+ ProducedBinaryFiles[count] = writeGeneratedObject(
+ count, CacheEntryPath, SavedObjectsDirectoryPath,
+ *ErrOrBuffer.get());
return;
}
}
@@ -802,9 +990,17 @@ void ThinLTOCodeGenerator::run() {
LLVMContext Context;
Context.setDiscardValueNames(LTODiscardValueNames);
Context.enableDebugTypeODRUniquing();
+ auto DiagFileOrErr = setupOptimizationRemarks(Context, count);
+ if (!DiagFileOrErr) {
+ errs() << "Error: " << toString(DiagFileOrErr.takeError()) << "\n";
+ report_fatal_error("ThinLTO: Can't get an output file for the "
+ "remarks");
+ }
// Parse module now
- auto TheModule = loadModuleFromBuffer(ModuleBuffer, Context, false);
+ auto TheModule =
+ loadModuleFromBuffer(ModuleBuffer.getMemBuffer(), Context, false,
+ /*IsImporting*/ false);
// Save temps: original file.
saveTempBitcode(*TheModule, SaveTempsDir, count, ".0.original.bc");
@@ -815,17 +1011,41 @@ void ThinLTOCodeGenerator::run() {
*TheModule, *Index, ModuleMap, *TMBuilder.create(), ImportList,
ExportList, GUIDPreservedSymbols,
ModuleToDefinedGVSummaries[ModuleIdentifier], CacheOptions,
- DisableCodeGen, SaveTempsDir, count);
-
- OutputBuffer = CacheEntry.write(std::move(OutputBuffer));
- ProducedBinaries[count] = std::move(OutputBuffer);
+ DisableCodeGen, SaveTempsDir, OptLevel, count);
+
+ // Commit to the cache (if enabled)
+ CacheEntry.write(*OutputBuffer);
+
+ if (SavedObjectsDirectoryPath.empty()) {
+ // We need to generated a memory buffer for the linker.
+ if (!CacheEntryPath.empty()) {
+ // Cache is enabled, reload from the cache
+ // We do this to lower memory pressuree: the buffer is on the heap
+ // and releasing it frees memory that can be used for the next input
+ // file. The final binary link will read from the VFS cache
+ // (hopefully!) or from disk if the memory pressure wasn't too high.
+ auto ReloadedBufferOrErr = CacheEntry.tryLoadingBuffer();
+ if (auto EC = ReloadedBufferOrErr.getError()) {
+ // On error, keeping the preexisting buffer and printing a
+ // diagnostic is more friendly than just crashing.
+ errs() << "error: can't reload cached file '" << CacheEntryPath
+ << "': " << EC.message() << "\n";
+ } else {
+ OutputBuffer = std::move(*ReloadedBufferOrErr);
+ }
+ }
+ ProducedBinaries[count] = std::move(OutputBuffer);
+ return;
+ }
+ ProducedBinaryFiles[count] = writeGeneratedObject(
+ count, CacheEntryPath, SavedObjectsDirectoryPath, *OutputBuffer);
}, IndexCount);
}
}
CachePruning(CacheOptions.Path)
- .setPruningInterval(CacheOptions.PruningInterval)
- .setEntryExpiration(CacheOptions.Expiration)
+ .setPruningInterval(std::chrono::seconds(CacheOptions.PruningInterval))
+ .setEntryExpiration(std::chrono::seconds(CacheOptions.Expiration))
.setMaxSize(CacheOptions.MaxPercentageOfAvailableSpace)
.prune();
OpenPOWER on IntegriCloud