diff options
author | dim <dim@FreeBSD.org> | 2017-04-02 17:24:58 +0000 |
---|---|---|
committer | dim <dim@FreeBSD.org> | 2017-04-02 17:24:58 +0000 |
commit | 60b571e49a90d38697b3aca23020d9da42fc7d7f (patch) | |
tree | 99351324c24d6cb146b6285b6caffa4d26fce188 /contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp | |
parent | bea1b22c7a9bce1dfdd73e6e5b65bc4752215180 (diff) | |
download | FreeBSD-src-60b571e49a90d38697b3aca23020d9da42fc7d7f.zip FreeBSD-src-60b571e49a90d38697b3aca23020d9da42fc7d7f.tar.gz |
Update clang, llvm, lld, lldb, compiler-rt and libc++ to 4.0.0 release:
MFC r309142 (by emaste):
Add WITH_LLD_AS_LD build knob
If set it installs LLD as /usr/bin/ld. LLD (as of version 3.9) is not
capable of linking the world and kernel, but can self-host and link many
substantial applications. GNU ld continues to be used for the world and
kernel build, regardless of how this knob is set.
It is on by default for arm64, and off for all other CPU architectures.
Sponsored by: The FreeBSD Foundation
MFC r310840:
Reapply 310775, now it also builds correctly if lldb is disabled:
Move llvm-objdump from CLANG_EXTRAS to installed by default
We currently install three tools from binutils 2.17.50: as, ld, and
objdump. Work is underway to migrate to a permissively-licensed
tool-chain, with one goal being the retirement of binutils 2.17.50.
LLVM's llvm-objdump is intended to be compatible with GNU objdump
although it is currently missing some options and may have formatting
differences. Enable it by default for testing and further investigation.
It may later be changed to install as /usr/bin/objdump, it becomes a
fully viable replacement.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D8879
MFC r312855 (by emaste):
Rename LLD_AS_LD to LLD_IS_LD, for consistency with CLANG_IS_CC
Reported by: Dan McGregor <dan.mcgregor usask.ca>
MFC r313559 | glebius | 2017-02-10 18:34:48 +0100 (Fri, 10 Feb 2017) | 5 lines
Don't check struct rtentry on FreeBSD, it is an internal kernel structure.
On other systems it may be API structure for SIOCADDRT/SIOCDELRT.
Reviewed by: emaste, dim
MFC r314152 (by jkim):
Remove an assembler flag, which is redundant since r309124. The upstream
took care of it by introducing a macro NO_EXEC_STACK_DIRECTIVE.
http://llvm.org/viewvc/llvm-project?rev=273500&view=rev
Reviewed by: dim
MFC r314564:
Upgrade our copies of clang, llvm, lld, lldb, compiler-rt and libc++ to
4.0.0 (branches/release_40 296509). The release will follow soon.
Please note that from 3.5.0 onwards, clang, llvm and lldb require C++11
support to build; see UPDATING for more information.
Also note that as of 4.0.0, lld should be able to link the base system
on amd64 and aarch64. See the WITH_LLD_IS_LLD setting in src.conf(5).
Though please be aware that this is work in progress.
Release notes for llvm, clang and lld will be available here:
<http://releases.llvm.org/4.0.0/docs/ReleaseNotes.html>
<http://releases.llvm.org/4.0.0/tools/clang/docs/ReleaseNotes.html>
<http://releases.llvm.org/4.0.0/tools/lld/docs/ReleaseNotes.html>
Thanks to Ed Maste, Jan Beich, Antoine Brodin and Eric Fiselier for
their help.
Relnotes: yes
Exp-run: antoine
PR: 215969, 216008
MFC r314708:
For now, revert r287232 from upstream llvm trunk (by Daniil Fukalov):
[SCEV] limit recursion depth of CompareSCEVComplexity
Summary:
CompareSCEVComplexity goes too deep (50+ on a quite a big unrolled
loop) and runs almost infinite time.
Added cache of "equal" SCEV pairs to earlier cutoff of further
estimation. Recursion depth limit was also introduced as a parameter.
Reviewers: sanjoy
Subscribers: mzolotukhin, tstellarAMD, llvm-commits
Differential Revision: https://reviews.llvm.org/D26389
This commit is the cause of excessive compile times on skein_block.c
(and possibly other files) during kernel builds on amd64.
We never saw the problematic behavior described in this upstream commit,
so for now it is better to revert it. An upstream bug has been filed
here: https://bugs.llvm.org/show_bug.cgi?id=32142
Reported by: mjg
MFC r314795:
Reapply r287232 from upstream llvm trunk (by Daniil Fukalov):
[SCEV] limit recursion depth of CompareSCEVComplexity
Summary:
CompareSCEVComplexity goes too deep (50+ on a quite a big unrolled
loop) and runs almost infinite time.
Added cache of "equal" SCEV pairs to earlier cutoff of further
estimation. Recursion depth limit was also introduced as a parameter.
Reviewers: sanjoy
Subscribers: mzolotukhin, tstellarAMD, llvm-commits
Differential Revision: https://reviews.llvm.org/D26389
Pull in r296992 from upstream llvm trunk (by Sanjoy Das):
[SCEV] Decrease the recursion threshold for CompareValueComplexity
Fixes PR32142.
r287232 accidentally increased the recursion threshold for
CompareValueComplexity from 2 to 32. This change reverses that
change by introducing a separate flag for CompareValueComplexity's
threshold.
The latter revision fixes the excessive compile times for skein_block.c.
MFC r314907 | mmel | 2017-03-08 12:40:27 +0100 (Wed, 08 Mar 2017) | 7 lines
Unbreak ARMv6 world.
The new compiler_rt library imported with clang 4.0.0 have several fatal
issues (non-functional __udivsi3 for example) with ARM specific instrict
functions. As temporary workaround, until upstream solve these problems,
disable all thumb[1][2] related feature.
MFC r315016:
Update clang, llvm, lld, lldb, compiler-rt and libc++ to 4.0.0 release.
We were already very close to the last release candidate, so this is a
pretty minor update.
Relnotes: yes
MFC r316005:
Revert r314907, and pull in r298713 from upstream compiler-rt trunk (by
Weiming Zhao):
builtins: Select correct code fragments when compiling for Thumb1/Thum2/ARM ISA.
Summary:
Value of __ARM_ARCH_ISA_THUMB isn't based on the actual compilation
mode (-mthumb, -marm), it reflect's capability of given CPU.
Due to this:
- use __tbumb__ and __thumb2__ insteand of __ARM_ARCH_ISA_THUMB
- use '.thumb' directive consistently in all affected files
- decorate all thumb functions using
DEFINE_COMPILERRT_THUMB_FUNCTION()
---------
Note: This patch doesn't fix broken Thumb1 variant of __udivsi3 !
Reviewers: weimingz, rengolin, compnerd
Subscribers: aemerson, dim
Differential Revision: https://reviews.llvm.org/D30938
Discussed with: mmel
Diffstat (limited to 'contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp')
-rw-r--r-- | contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp | 374 |
1 files changed, 297 insertions, 77 deletions
diff --git a/contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp b/contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp index bfb0980..40537e4 100644 --- a/contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp +++ b/contrib/llvm/lib/LTO/ThinLTOCodeGenerator.cpp @@ -21,10 +21,12 @@ #include "llvm/ADT/Statistic.h" #include "llvm/ADT/StringExtras.h" #include "llvm/Analysis/ModuleSummaryAnalysis.h" +#include "llvm/Analysis/ProfileSummaryInfo.h" #include "llvm/Analysis/TargetLibraryInfo.h" #include "llvm/Analysis/TargetTransformInfo.h" +#include "llvm/Bitcode/BitcodeReader.h" +#include "llvm/Bitcode/BitcodeWriter.h" #include "llvm/Bitcode/BitcodeWriterPass.h" -#include "llvm/Bitcode/ReaderWriter.h" #include "llvm/ExecutionEngine/ObjectMemoryBuffer.h" #include "llvm/IR/DiagnosticPrinter.h" #include "llvm/IR/LLVMContext.h" @@ -38,10 +40,13 @@ #include "llvm/Object/ModuleSummaryIndexObjectFile.h" #include "llvm/Support/CachePruning.h" #include "llvm/Support/Debug.h" +#include "llvm/Support/Error.h" #include "llvm/Support/Path.h" #include "llvm/Support/SHA1.h" #include "llvm/Support/TargetRegistry.h" #include "llvm/Support/ThreadPool.h" +#include "llvm/Support/Threading.h" +#include "llvm/Support/ToolOutputFile.h" #include "llvm/Target/TargetMachine.h" #include "llvm/Transforms/IPO.h" #include "llvm/Transforms/IPO/FunctionImport.h" @@ -59,17 +64,34 @@ using namespace llvm; namespace llvm { // Flags -discard-value-names, defined in LTOCodeGenerator.cpp extern cl::opt<bool> LTODiscardValueNames; +extern cl::opt<std::string> LTORemarksFilename; +extern cl::opt<bool> LTOPassRemarksWithHotness; } namespace { -static cl::opt<int> ThreadCount("threads", - cl::init(std::thread::hardware_concurrency())); +static cl::opt<int> + ThreadCount("threads", cl::init(llvm::heavyweight_hardware_concurrency())); -static void diagnosticHandler(const DiagnosticInfo &DI) { - DiagnosticPrinterRawOStream DP(errs()); - DI.print(DP); - errs() << '\n'; +Expected<std::unique_ptr<tool_output_file>> +setupOptimizationRemarks(LLVMContext &Ctx, int Count) { + if (LTOPassRemarksWithHotness) + Ctx.setDiagnosticHotnessRequested(true); + + if (LTORemarksFilename.empty()) + return nullptr; + + std::string FileName = + LTORemarksFilename + ".thin." + llvm::utostr(Count) + ".yaml"; + std::error_code EC; + auto DiagnosticOutputFile = + llvm::make_unique<tool_output_file>(FileName, EC, sys::fs::F_None); + if (EC) + return errorCodeToError(EC); + Ctx.setDiagnosticsOutputFile( + llvm::make_unique<yaml::Output>(DiagnosticOutputFile->os())); + DiagnosticOutputFile->keep(); + return std::move(DiagnosticOutputFile); } // Simple helper to save temporary files for debug. @@ -78,9 +100,9 @@ static void saveTempBitcode(const Module &TheModule, StringRef TempDir, if (TempDir.empty()) return; // User asked to save temps, let dump the bitcode file after import. - auto SaveTempPath = TempDir + llvm::utostr(count) + Suffix; + std::string SaveTempPath = (TempDir + llvm::utostr(count) + Suffix).str(); std::error_code EC; - raw_fd_ostream OS(SaveTempPath.str(), EC, sys::fs::F_None); + raw_fd_ostream OS(SaveTempPath, EC, sys::fs::F_None); if (EC) report_fatal_error(Twine("Failed to open ") + SaveTempPath + " to save optimized bitcode\n"); @@ -128,13 +150,13 @@ static void computePrevailingCopies( } static StringMap<MemoryBufferRef> -generateModuleMap(const std::vector<MemoryBufferRef> &Modules) { +generateModuleMap(const std::vector<ThinLTOBuffer> &Modules) { StringMap<MemoryBufferRef> ModuleMap; for (auto &ModuleBuffer : Modules) { assert(ModuleMap.find(ModuleBuffer.getBufferIdentifier()) == ModuleMap.end() && "Expect unique Buffer Identifier"); - ModuleMap[ModuleBuffer.getBufferIdentifier()] = ModuleBuffer; + ModuleMap[ModuleBuffer.getBufferIdentifier()] = ModuleBuffer.getMemBuffer(); } return ModuleMap; } @@ -144,22 +166,55 @@ static void promoteModule(Module &TheModule, const ModuleSummaryIndex &Index) { report_fatal_error("renameModuleForThinLTO failed"); } +static std::unique_ptr<Module> +loadModuleFromBuffer(const MemoryBufferRef &Buffer, LLVMContext &Context, + bool Lazy, bool IsImporting) { + SMDiagnostic Err; + Expected<std::unique_ptr<Module>> ModuleOrErr = + Lazy + ? getLazyBitcodeModule(Buffer, Context, + /* ShouldLazyLoadMetadata */ true, IsImporting) + : parseBitcodeFile(Buffer, Context); + if (!ModuleOrErr) { + handleAllErrors(ModuleOrErr.takeError(), [&](ErrorInfoBase &EIB) { + SMDiagnostic Err = SMDiagnostic(Buffer.getBufferIdentifier(), + SourceMgr::DK_Error, EIB.message()); + Err.print("ThinLTO", errs()); + }); + report_fatal_error("Can't load module, abort."); + } + return std::move(ModuleOrErr.get()); +} + static void crossImportIntoModule(Module &TheModule, const ModuleSummaryIndex &Index, StringMap<MemoryBufferRef> &ModuleMap, const FunctionImporter::ImportMapTy &ImportList) { - ModuleLoader Loader(TheModule.getContext(), ModuleMap); + auto Loader = [&](StringRef Identifier) { + return loadModuleFromBuffer(ModuleMap[Identifier], TheModule.getContext(), + /*Lazy=*/true, /*IsImporting*/ true); + }; + FunctionImporter Importer(Index, Loader); - Importer.importFunctions(TheModule, ImportList); + Expected<bool> Result = Importer.importFunctions(TheModule, ImportList); + if (!Result) { + handleAllErrors(Result.takeError(), [&](ErrorInfoBase &EIB) { + SMDiagnostic Err = SMDiagnostic(TheModule.getModuleIdentifier(), + SourceMgr::DK_Error, EIB.message()); + Err.print("ThinLTO", errs()); + }); + report_fatal_error("importFunctions failed"); + } } -static void optimizeModule(Module &TheModule, TargetMachine &TM) { +static void optimizeModule(Module &TheModule, TargetMachine &TM, + unsigned OptLevel) { // Populate the PassManager PassManagerBuilder PMB; PMB.LibraryInfo = new TargetLibraryInfoImpl(TM.getTargetTriple()); PMB.Inliner = createFunctionInliningPass(); // FIXME: should get it from the bitcode? - PMB.OptLevel = 3; + PMB.OptLevel = OptLevel; PMB.LoopVectorize = true; PMB.SLPVectorize = true; PMB.VerifyInput = true; @@ -229,25 +284,65 @@ public: const FunctionImporter::ExportSetTy &ExportList, const std::map<GlobalValue::GUID, GlobalValue::LinkageTypes> &ResolvedODR, const GVSummaryMapTy &DefinedFunctions, - const DenseSet<GlobalValue::GUID> &PreservedSymbols) { + const DenseSet<GlobalValue::GUID> &PreservedSymbols, unsigned OptLevel, + const TargetMachineBuilder &TMBuilder) { if (CachePath.empty()) return; + if (!Index.modulePaths().count(ModuleID)) + // The module does not have an entry, it can't have a hash at all + return; + // Compute the unique hash for this entry // This is based on the current compiler version, the module itself, the // export list, the hash for every single module in the import list, the // list of ResolvedODR for the module, and the list of preserved symbols. + // Include the hash for the current module + auto ModHash = Index.getModuleHash(ModuleID); + + if (all_of(ModHash, [](uint32_t V) { return V == 0; })) + // No hash entry, no caching! + return; + SHA1 Hasher; + // Include the parts of the LTO configuration that affect code generation. + auto AddString = [&](StringRef Str) { + Hasher.update(Str); + Hasher.update(ArrayRef<uint8_t>{0}); + }; + auto AddUnsigned = [&](unsigned I) { + uint8_t Data[4]; + Data[0] = I; + Data[1] = I >> 8; + Data[2] = I >> 16; + Data[3] = I >> 24; + Hasher.update(ArrayRef<uint8_t>{Data, 4}); + }; + // Start with the compiler revision Hasher.update(LLVM_VERSION_STRING); #ifdef HAVE_LLVM_REVISION Hasher.update(LLVM_REVISION); #endif - // Include the hash for the current module - auto ModHash = Index.getModuleHash(ModuleID); + // Hash the optimization level and the target machine settings. + AddString(TMBuilder.MCpu); + // FIXME: Hash more of Options. For now all clients initialize Options from + // command-line flags (which is unsupported in production), but may set + // RelaxELFRelocations. The clang driver can also pass FunctionSections, + // DataSections and DebuggerTuning via command line flags. + AddUnsigned(TMBuilder.Options.RelaxELFRelocations); + AddUnsigned(TMBuilder.Options.FunctionSections); + AddUnsigned(TMBuilder.Options.DataSections); + AddUnsigned((unsigned)TMBuilder.Options.DebuggerTuning); + AddString(TMBuilder.MAttr); + if (TMBuilder.RelocModel) + AddUnsigned(*TMBuilder.RelocModel); + AddUnsigned(TMBuilder.CGOptLevel); + AddUnsigned(OptLevel); + Hasher.update(ArrayRef<uint8_t>((uint8_t *)&ModHash[0], sizeof(ModHash))); for (auto F : ExportList) // The export list can impact the internalization, be conservative here @@ -288,10 +383,9 @@ public: } // Cache the Produced object file - std::unique_ptr<MemoryBuffer> - write(std::unique_ptr<MemoryBuffer> OutputBuffer) { + void write(const MemoryBuffer &OutputBuffer) { if (EntryPath.empty()) - return OutputBuffer; + return; // Write to a temporary to avoid race condition SmallString<128> TempFilename; @@ -304,7 +398,7 @@ public: } { raw_fd_ostream OS(TempFD, /* ShouldClose */ true); - OS << OutputBuffer->getBuffer(); + OS << OutputBuffer.getBuffer(); } // Rename to final destination (hopefully race condition won't matter here) EC = sys::fs::rename(TempFilename, EntryPath); @@ -314,16 +408,8 @@ public: if (EC) report_fatal_error(Twine("Failed to open ") + EntryPath + " to save cached entry\n"); - OS << OutputBuffer->getBuffer(); + OS << OutputBuffer.getBuffer(); } - auto ReloadedBufferOrErr = MemoryBuffer::getFile(EntryPath); - if (auto EC = ReloadedBufferOrErr.getError()) { - // FIXME diagnose - errs() << "error: can't reload cached file '" << EntryPath - << "': " << EC.message() << "\n"; - return OutputBuffer; - } - return std::move(*ReloadedBufferOrErr); } }; @@ -336,7 +422,7 @@ ProcessThinLTOModule(Module &TheModule, ModuleSummaryIndex &Index, const GVSummaryMapTy &DefinedGlobals, const ThinLTOCodeGenerator::CachingOptions &CacheOptions, bool DisableCodeGen, StringRef SaveTempsDir, - unsigned count) { + unsigned OptLevel, unsigned count) { // "Benchmark"-like optimization: single-source case bool SingleModule = (ModuleMap.size() == 1); @@ -368,7 +454,7 @@ ProcessThinLTOModule(Module &TheModule, ModuleSummaryIndex &Index, saveTempBitcode(TheModule, SaveTempsDir, count, ".3.imported.bc"); } - optimizeModule(TheModule, TM); + optimizeModule(TheModule, TM, OptLevel); saveTempBitcode(TheModule, SaveTempsDir, count, ".4.opt.bc"); @@ -377,8 +463,9 @@ ProcessThinLTOModule(Module &TheModule, ModuleSummaryIndex &Index, SmallVector<char, 128> OutputBuffer; { raw_svector_ostream OS(OutputBuffer); - ModuleSummaryIndexBuilder IndexBuilder(&TheModule); - WriteBitcodeToFile(&TheModule, OS, true, &IndexBuilder.getIndex()); + ProfileSummaryInfo PSI(TheModule); + auto Index = buildModuleSummaryIndex(TheModule, nullptr, nullptr); + WriteBitcodeToFile(&TheModule, OS, true, &Index); } return make_unique<ObjectMemoryBuffer>(std::move(OutputBuffer)); } @@ -435,18 +522,27 @@ static void initTMBuilder(TargetMachineBuilder &TMBuilder, } // end anonymous namespace void ThinLTOCodeGenerator::addModule(StringRef Identifier, StringRef Data) { - MemoryBufferRef Buffer(Data, Identifier); + ThinLTOBuffer Buffer(Data, Identifier); if (Modules.empty()) { // First module added, so initialize the triple and some options LLVMContext Context; - Triple TheTriple(getBitcodeTargetTriple(Buffer, Context)); + StringRef TripleStr; + ErrorOr<std::string> TripleOrErr = expectedToErrorOrAndEmitErrors( + Context, getBitcodeTargetTriple(Buffer.getMemBuffer())); + if (TripleOrErr) + TripleStr = *TripleOrErr; + Triple TheTriple(TripleStr); initTMBuilder(TMBuilder, Triple(TheTriple)); } #ifndef NDEBUG else { LLVMContext Context; - assert(TMBuilder.TheTriple.str() == - getBitcodeTargetTriple(Buffer, Context) && + StringRef TripleStr; + ErrorOr<std::string> TripleOrErr = expectedToErrorOrAndEmitErrors( + Context, getBitcodeTargetTriple(Buffer.getMemBuffer())); + if (TripleOrErr) + TripleStr = *TripleOrErr; + assert(TMBuilder.TheTriple.str() == TripleStr && "ThinLTO modules with different triple not supported"); } #endif @@ -477,6 +573,7 @@ std::unique_ptr<TargetMachine> TargetMachineBuilder::create() const { SubtargetFeatures Features(MAttr); Features.getDefaultSubtargetFeatures(TheTriple); std::string FeatureStr = Features.getString(); + return std::unique_ptr<TargetMachine>(TheTarget->createTargetMachine( TheTriple.str(), MCpu, FeatureStr, Options, RelocModel, CodeModel::Default, CGOptLevel)); @@ -490,13 +587,14 @@ std::unique_ptr<ModuleSummaryIndex> ThinLTOCodeGenerator::linkCombinedIndex() { std::unique_ptr<ModuleSummaryIndex> CombinedIndex; uint64_t NextModuleId = 0; for (auto &ModuleBuffer : Modules) { - ErrorOr<std::unique_ptr<object::ModuleSummaryIndexObjectFile>> ObjOrErr = - object::ModuleSummaryIndexObjectFile::create(ModuleBuffer, - diagnosticHandler); - if (std::error_code EC = ObjOrErr.getError()) { + Expected<std::unique_ptr<object::ModuleSummaryIndexObjectFile>> ObjOrErr = + object::ModuleSummaryIndexObjectFile::create( + ModuleBuffer.getMemBuffer()); + if (!ObjOrErr) { // FIXME diagnose - errs() << "error: can't create ModuleSummaryIndexObjectFile for buffer: " - << EC.message() << "\n"; + logAllUnhandledErrors( + ObjOrErr.takeError(), errs(), + "error: can't create ModuleSummaryIndexObjectFile for buffer: "); return nullptr; } auto Index = (*ObjOrErr)->takeIndex(); @@ -517,15 +615,23 @@ void ThinLTOCodeGenerator::promote(Module &TheModule, ModuleSummaryIndex &Index) { auto ModuleCount = Index.modulePaths().size(); auto ModuleIdentifier = TheModule.getModuleIdentifier(); + // Collect for each module the list of function it defines (GUID -> Summary). StringMap<GVSummaryMapTy> ModuleToDefinedGVSummaries; Index.collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries); + // Convert the preserved symbols set from string to GUID + auto GUIDPreservedSymbols = computeGUIDPreservedSymbols( + PreservedSymbols, Triple(TheModule.getTargetTriple())); + + // Compute "dead" symbols, we don't want to import/export these! + auto DeadSymbols = computeDeadSymbols(Index, GUIDPreservedSymbols); + // Generate import/export list StringMap<FunctionImporter::ImportMapTy> ImportLists(ModuleCount); StringMap<FunctionImporter::ExportSetTy> ExportLists(ModuleCount); ComputeCrossModuleImport(Index, ModuleToDefinedGVSummaries, ImportLists, - ExportLists); + ExportLists, &DeadSymbols); // Resolve LinkOnce/Weak symbols. StringMap<std::map<GlobalValue::GUID, GlobalValue::LinkageTypes>> ResolvedODR; @@ -534,6 +640,16 @@ void ThinLTOCodeGenerator::promote(Module &TheModule, thinLTOResolveWeakForLinkerModule( TheModule, ModuleToDefinedGVSummaries[ModuleIdentifier]); + // Promote the exported values in the index, so that they are promoted + // in the module. + auto isExported = [&](StringRef ModuleIdentifier, GlobalValue::GUID GUID) { + const auto &ExportList = ExportLists.find(ModuleIdentifier); + return (ExportList != ExportLists.end() && + ExportList->second.count(GUID)) || + GUIDPreservedSymbols.count(GUID); + }; + thinLTOInternalizeAndPromoteInIndex(Index, isExported); + promoteModule(TheModule, Index); } @@ -549,11 +665,18 @@ void ThinLTOCodeGenerator::crossModuleImport(Module &TheModule, StringMap<GVSummaryMapTy> ModuleToDefinedGVSummaries(ModuleCount); Index.collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries); + // Convert the preserved symbols set from string to GUID + auto GUIDPreservedSymbols = computeGUIDPreservedSymbols( + PreservedSymbols, Triple(TheModule.getTargetTriple())); + + // Compute "dead" symbols, we don't want to import/export these! + auto DeadSymbols = computeDeadSymbols(Index, GUIDPreservedSymbols); + // Generate import/export list StringMap<FunctionImporter::ImportMapTy> ImportLists(ModuleCount); StringMap<FunctionImporter::ExportSetTy> ExportLists(ModuleCount); ComputeCrossModuleImport(Index, ModuleToDefinedGVSummaries, ImportLists, - ExportLists); + ExportLists, &DeadSymbols); auto &ImportList = ImportLists[TheModule.getModuleIdentifier()]; crossImportIntoModule(TheModule, Index, ModuleMap, ImportList); @@ -578,7 +701,7 @@ void ThinLTOCodeGenerator::gatherImportedSummariesForModule( ExportLists); llvm::gatherImportedSummariesForModule(ModulePath, ModuleToDefinedGVSummaries, - ImportLists, + ImportLists[ModulePath], ModuleToSummariesForIndex); } @@ -601,7 +724,7 @@ void ThinLTOCodeGenerator::emitImports(StringRef ModulePath, ExportLists); std::error_code EC; - if ((EC = EmitImportsFiles(ModulePath, OutputName, ImportLists))) + if ((EC = EmitImportsFiles(ModulePath, OutputName, ImportLists[ModulePath]))) report_fatal_error(Twine("Failed to open ") + OutputName + " to save imports lists\n"); } @@ -623,11 +746,14 @@ void ThinLTOCodeGenerator::internalize(Module &TheModule, StringMap<GVSummaryMapTy> ModuleToDefinedGVSummaries(ModuleCount); Index.collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries); + // Compute "dead" symbols, we don't want to import/export these! + auto DeadSymbols = computeDeadSymbols(Index, GUIDPreservedSymbols); + // Generate import/export list StringMap<FunctionImporter::ImportMapTy> ImportLists(ModuleCount); StringMap<FunctionImporter::ExportSetTy> ExportLists(ModuleCount); ComputeCrossModuleImport(Index, ModuleToDefinedGVSummaries, ImportLists, - ExportLists); + ExportLists, &DeadSymbols); auto &ExportList = ExportLists[ModuleIdentifier]; // Be friendly and don't nuke totally the module when the client didn't @@ -654,7 +780,7 @@ void ThinLTOCodeGenerator::optimize(Module &TheModule) { initTMBuilder(TMBuilder, Triple(TheModule.getTargetTriple())); // Optimize now - optimizeModule(TheModule, *TMBuilder.create()); + optimizeModule(TheModule, *TMBuilder.create(), OptLevel); } /** @@ -665,13 +791,61 @@ std::unique_ptr<MemoryBuffer> ThinLTOCodeGenerator::codegen(Module &TheModule) { return codegenModule(TheModule, *TMBuilder.create()); } +/// Write out the generated object file, either from CacheEntryPath or from +/// OutputBuffer, preferring hard-link when possible. +/// Returns the path to the generated file in SavedObjectsDirectoryPath. +static std::string writeGeneratedObject(int count, StringRef CacheEntryPath, + StringRef SavedObjectsDirectoryPath, + const MemoryBuffer &OutputBuffer) { + SmallString<128> OutputPath(SavedObjectsDirectoryPath); + llvm::sys::path::append(OutputPath, Twine(count) + ".thinlto.o"); + OutputPath.c_str(); // Ensure the string is null terminated. + if (sys::fs::exists(OutputPath)) + sys::fs::remove(OutputPath); + + // We don't return a memory buffer to the linker, just a list of files. + if (!CacheEntryPath.empty()) { + // Cache is enabled, hard-link the entry (or copy if hard-link fails). + auto Err = sys::fs::create_hard_link(CacheEntryPath, OutputPath); + if (!Err) + return OutputPath.str(); + // Hard linking failed, try to copy. + Err = sys::fs::copy_file(CacheEntryPath, OutputPath); + if (!Err) + return OutputPath.str(); + // Copy failed (could be because the CacheEntry was removed from the cache + // in the meantime by another process), fall back and try to write down the + // buffer to the output. + errs() << "error: can't link or copy from cached entry '" << CacheEntryPath + << "' to '" << OutputPath << "'\n"; + } + // No cache entry, just write out the buffer. + std::error_code Err; + raw_fd_ostream OS(OutputPath, Err, sys::fs::F_None); + if (Err) + report_fatal_error("Can't open output '" + OutputPath + "'\n"); + OS << OutputBuffer.getBuffer(); + return OutputPath.str(); +} + // Main entry point for the ThinLTO processing void ThinLTOCodeGenerator::run() { + // Prepare the resulting object vector + assert(ProducedBinaries.empty() && "The generator should not be reused"); + if (SavedObjectsDirectoryPath.empty()) + ProducedBinaries.resize(Modules.size()); + else { + sys::fs::create_directories(SavedObjectsDirectoryPath); + bool IsDir; + sys::fs::is_directory(SavedObjectsDirectoryPath, IsDir); + if (!IsDir) + report_fatal_error("Unexistent dir: '" + SavedObjectsDirectoryPath + "'"); + ProducedBinaryFiles.resize(Modules.size()); + } + if (CodeGenOnly) { // Perform only parallel codegen and return. ThreadPool Pool; - assert(ProducedBinaries.empty() && "The generator should not be reused"); - ProducedBinaries.resize(Modules.size()); int count = 0; for (auto &ModuleBuffer : Modules) { Pool.async([&](int count) { @@ -679,10 +853,17 @@ void ThinLTOCodeGenerator::run() { Context.setDiscardValueNames(LTODiscardValueNames); // Parse module now - auto TheModule = loadModuleFromBuffer(ModuleBuffer, Context, false); + auto TheModule = + loadModuleFromBuffer(ModuleBuffer.getMemBuffer(), Context, false, + /*IsImporting*/ false); // CodeGen - ProducedBinaries[count] = codegen(*TheModule); + auto OutputBuffer = codegen(*TheModule); + if (SavedObjectsDirectoryPath.empty()) + ProducedBinaries[count] = std::move(OutputBuffer); + else + ProducedBinaryFiles[count] = writeGeneratedObject( + count, "", SavedObjectsDirectoryPath, *OutputBuffer); }, count++); } @@ -703,9 +884,6 @@ void ThinLTOCodeGenerator::run() { WriteIndexToFile(*Index, OS); } - // Prepare the resulting object vector - assert(ProducedBinaries.empty() && "The generator should not be reused"); - ProducedBinaries.resize(Modules.size()); // Prepare the module map. auto ModuleMap = generateModuleMap(Modules); @@ -715,17 +893,20 @@ void ThinLTOCodeGenerator::run() { StringMap<GVSummaryMapTy> ModuleToDefinedGVSummaries(ModuleCount); Index->collectDefinedGVSummariesPerModule(ModuleToDefinedGVSummaries); + // Convert the preserved symbols set from string to GUID, this is needed for + // computing the caching hash and the internalization. + auto GUIDPreservedSymbols = + computeGUIDPreservedSymbols(PreservedSymbols, TMBuilder.TheTriple); + + // Compute "dead" symbols, we don't want to import/export these! + auto DeadSymbols = computeDeadSymbols(*Index, GUIDPreservedSymbols); + // Collect the import/export lists for all modules from the call-graph in the // combined index. StringMap<FunctionImporter::ImportMapTy> ImportLists(ModuleCount); StringMap<FunctionImporter::ExportSetTy> ExportLists(ModuleCount); ComputeCrossModuleImport(*Index, ModuleToDefinedGVSummaries, ImportLists, - ExportLists); - - // Convert the preserved symbols set from string to GUID, this is needed for - // computing the caching hash and the internalization. - auto GUIDPreservedSymbols = - computeGUIDPreservedSymbols(PreservedSymbols, TMBuilder.TheTriple); + ExportLists, &DeadSymbols); // We use a std::map here to be able to have a defined ordering when // producing a hash for the cache entry. @@ -764,8 +945,8 @@ void ThinLTOCodeGenerator::run() { std::iota(ModulesOrdering.begin(), ModulesOrdering.end(), 0); std::sort(ModulesOrdering.begin(), ModulesOrdering.end(), [&](int LeftIndex, int RightIndex) { - auto LSize = Modules[LeftIndex].getBufferSize(); - auto RSize = Modules[RightIndex].getBufferSize(); + auto LSize = Modules[LeftIndex].getBuffer().size(); + auto RSize = Modules[RightIndex].getBuffer().size(); return LSize > RSize; }); @@ -784,17 +965,24 @@ void ThinLTOCodeGenerator::run() { ModuleCacheEntry CacheEntry(CacheOptions.Path, *Index, ModuleIdentifier, ImportLists[ModuleIdentifier], ExportList, ResolvedODR[ModuleIdentifier], - DefinedFunctions, GUIDPreservedSymbols); + DefinedFunctions, GUIDPreservedSymbols, + OptLevel, TMBuilder); + auto CacheEntryPath = CacheEntry.getEntryPath(); { auto ErrOrBuffer = CacheEntry.tryLoadingBuffer(); DEBUG(dbgs() << "Cache " << (ErrOrBuffer ? "hit" : "miss") << " '" - << CacheEntry.getEntryPath() << "' for buffer " << count - << " " << ModuleIdentifier << "\n"); + << CacheEntryPath << "' for buffer " << count << " " + << ModuleIdentifier << "\n"); if (ErrOrBuffer) { // Cache Hit! - ProducedBinaries[count] = std::move(ErrOrBuffer.get()); + if (SavedObjectsDirectoryPath.empty()) + ProducedBinaries[count] = std::move(ErrOrBuffer.get()); + else + ProducedBinaryFiles[count] = writeGeneratedObject( + count, CacheEntryPath, SavedObjectsDirectoryPath, + *ErrOrBuffer.get()); return; } } @@ -802,9 +990,17 @@ void ThinLTOCodeGenerator::run() { LLVMContext Context; Context.setDiscardValueNames(LTODiscardValueNames); Context.enableDebugTypeODRUniquing(); + auto DiagFileOrErr = setupOptimizationRemarks(Context, count); + if (!DiagFileOrErr) { + errs() << "Error: " << toString(DiagFileOrErr.takeError()) << "\n"; + report_fatal_error("ThinLTO: Can't get an output file for the " + "remarks"); + } // Parse module now - auto TheModule = loadModuleFromBuffer(ModuleBuffer, Context, false); + auto TheModule = + loadModuleFromBuffer(ModuleBuffer.getMemBuffer(), Context, false, + /*IsImporting*/ false); // Save temps: original file. saveTempBitcode(*TheModule, SaveTempsDir, count, ".0.original.bc"); @@ -815,17 +1011,41 @@ void ThinLTOCodeGenerator::run() { *TheModule, *Index, ModuleMap, *TMBuilder.create(), ImportList, ExportList, GUIDPreservedSymbols, ModuleToDefinedGVSummaries[ModuleIdentifier], CacheOptions, - DisableCodeGen, SaveTempsDir, count); - - OutputBuffer = CacheEntry.write(std::move(OutputBuffer)); - ProducedBinaries[count] = std::move(OutputBuffer); + DisableCodeGen, SaveTempsDir, OptLevel, count); + + // Commit to the cache (if enabled) + CacheEntry.write(*OutputBuffer); + + if (SavedObjectsDirectoryPath.empty()) { + // We need to generated a memory buffer for the linker. + if (!CacheEntryPath.empty()) { + // Cache is enabled, reload from the cache + // We do this to lower memory pressuree: the buffer is on the heap + // and releasing it frees memory that can be used for the next input + // file. The final binary link will read from the VFS cache + // (hopefully!) or from disk if the memory pressure wasn't too high. + auto ReloadedBufferOrErr = CacheEntry.tryLoadingBuffer(); + if (auto EC = ReloadedBufferOrErr.getError()) { + // On error, keeping the preexisting buffer and printing a + // diagnostic is more friendly than just crashing. + errs() << "error: can't reload cached file '" << CacheEntryPath + << "': " << EC.message() << "\n"; + } else { + OutputBuffer = std::move(*ReloadedBufferOrErr); + } + } + ProducedBinaries[count] = std::move(OutputBuffer); + return; + } + ProducedBinaryFiles[count] = writeGeneratedObject( + count, CacheEntryPath, SavedObjectsDirectoryPath, *OutputBuffer); }, IndexCount); } } CachePruning(CacheOptions.Path) - .setPruningInterval(CacheOptions.PruningInterval) - .setEntryExpiration(CacheOptions.Expiration) + .setPruningInterval(std::chrono::seconds(CacheOptions.PruningInterval)) + .setEntryExpiration(std::chrono::seconds(CacheOptions.Expiration)) .setMaxSize(CacheOptions.MaxPercentageOfAvailableSpace) .prune(); |