diff options
Diffstat (limited to 'docs/PCHInternals.html')
-rw-r--r-- | docs/PCHInternals.html | 658 |
1 files changed, 0 insertions, 658 deletions
diff --git a/docs/PCHInternals.html b/docs/PCHInternals.html deleted file mode 100644 index 7fed5ba..0000000 --- a/docs/PCHInternals.html +++ /dev/null @@ -1,658 +0,0 @@ -<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" - "http://www.w3.org/TR/html4/strict.dtd"> -<html> -<head> - <title>Precompiled Header and Modules Internals</title> - <link type="text/css" rel="stylesheet" href="../menu.css"> - <link type="text/css" rel="stylesheet" href="../content.css"> - <style type="text/css"> - td { - vertical-align: top; - } - </style> -</head> - -<body> - -<!--#include virtual="../menu.html.incl"--> - -<div id="content"> - -<h1>Precompiled Header and Modules Internals</h1> - - <p>This document describes the design and implementation of Clang's - precompiled headers (PCH) and modules. If you are interested in the end-user - view, please see the <a - href="UsersManual.html#precompiledheaders">User's Manual</a>.</p> - - <p><b>Table of Contents</b></p> - <ul> - <li><a href="#usage">Using Precompiled Headers with - <tt>clang</tt></a></li> - <li><a href="#philosophy">Design Philosophy</a></li> - <li><a href="#contents">Serialized AST File Contents</a> - <ul> - <li><a href="#metadata">Metadata Block</a></li> - <li><a href="#sourcemgr">Source Manager Block</a></li> - <li><a href="#preprocessor">Preprocessor Block</a></li> - <li><a href="#types">Types Block</a></li> - <li><a href="#decls">Declarations Block</a></li> - <li><a href="#stmt">Statements and Expressions</a></li> - <li><a href="#idtable">Identifier Table Block</a></li> - <li><a href="#method-pool">Method Pool Block</a></li> - </ul> - </li> - <li><a href="#tendrils">AST Reader Integration Points</a></li> - <li><a href="#chained">Chained precompiled headers</a></li> - <li><a href="#modules">Modules</a></li> -</ul> - -<h2 id="usage">Using Precompiled Headers with <tt>clang</tt></h2> - -<p>The Clang compiler frontend, <tt>clang -cc1</tt>, supports two command line -options for generating and using PCH files.<p> - -<p>To generate PCH files using <tt>clang -cc1</tt>, use the option -<b><tt>-emit-pch</tt></b>: - -<pre> $ clang -cc1 test.h -emit-pch -o test.h.pch </pre> - -<p>This option is transparently used by <tt>clang</tt> when generating -PCH files. The resulting PCH file contains the serialized form of the -compiler's internal representation after it has completed parsing and -semantic analysis. The PCH file can then be used as a prefix header -with the <b><tt>-include-pch</tt></b> option:</p> - -<pre> - $ clang -cc1 -include-pch test.h.pch test.c -o test.s -</pre> - -<h2 id="philosophy">Design Philosophy</h2> - -<p>Precompiled headers are meant to improve overall compile times for - projects, so the design of precompiled headers is entirely driven by - performance concerns. The use case for precompiled headers is - relatively simple: when there is a common set of headers that is - included in nearly every source file in the project, we - <i>precompile</i> that bundle of headers into a single precompiled - header (PCH file). Then, when compiling the source files in the - project, we load the PCH file first (as a prefix header), which acts - as a stand-in for that bundle of headers.</p> - -<p>A precompiled header implementation improves performance when:</p> -<ul> - <li>Loading the PCH file is significantly faster than re-parsing the - bundle of headers stored within the PCH file. Thus, a precompiled - header design attempts to minimize the cost of reading the PCH - file. Ideally, this cost should not vary with the size of the - precompiled header file.</li> - - <li>The cost of generating the PCH file initially is not so large - that it counters the per-source-file performance improvement due to - eliminating the need to parse the bundled headers in the first - place. This is particularly important on multi-core systems, because - PCH file generation serializes the build when all compilations - require the PCH file to be up-to-date.</li> -</ul> - -<p>Modules, as implemented in Clang, use the same mechanisms as -precompiled headers to save a serialized AST file (one per module) and -use those AST modules. From an implementation standpoint, modules are -a generalization of precompiled headers, lifting a number of -restrictions placed on precompiled headers. In particular, there can -only be one precompiled header and it must be included at the -beginning of the translation unit. The extensions to the AST file -format required for modules are discussed in the section on <a href="#modules">modules</a>.</p> - -<p>Clang's AST files are designed with a compact on-disk -representation, which minimizes both creation time and the time -required to initially load the AST file. The AST file itself contains -a serialized representation of Clang's abstract syntax trees and -supporting data structures, stored using the same compressed bitstream -as <a href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitcode -file format</a>.</p> - -<p>Clang's AST files are loaded "lazily" from disk. When an -AST file is initially loaded, Clang reads only a small amount of data -from the AST file to establish where certain important data structures -are stored. The amount of data read in this initial load is -independent of the size of the AST file, such that a larger AST file -does not lead to longer AST load times. The actual header data in the -AST file--macros, functions, variables, types, etc.--is loaded only -when it is referenced from the user's code, at which point only that -entity (and those entities it depends on) are deserialized from the -AST file. With this approach, the cost of using an AST file -for a translation unit is proportional to the amount of code actually -used from the AST file, rather than being proportional to the size of -the AST file itself.</p> - -<p>When given the <code>-print-stats</code> option, Clang produces -statistics describing how much of the AST file was actually -loaded from disk. For a simple "Hello, World!" program that includes -the Apple <code>Cocoa.h</code> header (which is built as a precompiled -header), this option illustrates how little of the actual precompiled -header is required:</p> - -<pre> -*** PCH Statistics: - 933 stat cache hits - 4 stat cache misses - 895/39981 source location entries read (2.238563%) - 19/15315 types read (0.124061%) - 20/82685 declarations read (0.024188%) - 154/58070 identifiers read (0.265197%) - 0/7260 selectors read (0.000000%) - 0/30842 statements read (0.000000%) - 4/8400 macros read (0.047619%) - 1/4995 lexical declcontexts read (0.020020%) - 0/4413 visible declcontexts read (0.000000%) - 0/7230 method pool entries read (0.000000%) - 0 method pool misses -</pre> - -<p>For this small program, only a tiny fraction of the source -locations, types, declarations, identifiers, and macros were actually -deserialized from the precompiled header. These statistics can be -useful to determine whether the AST file implementation can -be improved by making more of the implementation lazy.</p> - -<p>Precompiled headers can be chained. When you create a PCH while -including an existing PCH, Clang can create the new PCH by referencing -the original file and only writing the new data to the new file. For -example, you could create a PCH out of all the headers that are very -commonly used throughout your project, and then create a PCH for every -single source file in the project that includes the code that is -specific to that file, so that recompiling the file itself is very fast, -without duplicating the data from the common headers for every -file. The mechanisms behind chained precompiled headers are discussed -in a <a href="#chained">later section</a>. - -<h2 id="contents">AST File Contents</h2> - -<img src="PCHLayout.png" style="float:right" alt="Precompiled header layout"> - -<p>Clang's AST files are organized into several different -blocks, each of which contains the serialized representation of a part -of Clang's internal representation. Each of the blocks corresponds to -either a block or a record within <a - href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitstream -format</a>. The contents of each of these logical blocks are described -below.</p> - -<p>For a given AST file, the <a -href="http://llvm.org/cmds/llvm-bcanalyzer.html"><code>llvm-bcanalyzer</code></a> -utility can be used to examine the actual structure of the bitstream -for the AST file. This information can be used both to help -understand the structure of the AST file and to isolate -areas where AST files can still be optimized, e.g., through -the introduction of abbreviations.</p> - -<h3 id="metadata">Metadata Block</h3> - -<p>The metadata block contains several records that provide -information about how the AST file was built. This metadata -is primarily used to validate the use of an AST file. For -example, a precompiled header built for a 32-bit x86 target cannot be used -when compiling for a 64-bit x86 target. The metadata block contains -information about:</p> - -<dl> - <dt>Language options</dt> - <dd>Describes the particular language dialect used to compile the -AST file, including major options (e.g., Objective-C support) and more -minor options (e.g., support for "//" comments). The contents of this -record correspond to the <code>LangOptions</code> class.</dd> - - <dt>Target architecture</dt> - <dd>The target triple that describes the architecture, platform, and -ABI for which the AST file was generated, e.g., -<code>i386-apple-darwin9</code>.</dd> - - <dt>AST version</dt> - <dd>The major and minor version numbers of the AST file -format. Changes in the minor version number should not affect backward -compatibility, while changes in the major version number imply that a -newer compiler cannot read an older precompiled header (and -vice-versa).</dd> - - <dt>Original file name</dt> - <dd>The full path of the header that was used to generate the -AST file.</dd> - - <dt>Predefines buffer</dt> - <dd>Although not explicitly stored as part of the metadata, the -predefines buffer is used in the validation of the AST file. -The predefines buffer itself contains code generated by the compiler -to initialize the preprocessor state according to the current target, -platform, and command-line options. For example, the predefines buffer -will contain "<code>#define __STDC__ 1</code>" when we are compiling C -without Microsoft extensions. The predefines buffer itself is stored -within the <a href="#sourcemgr">source manager block</a>, but its -contents are verified along with the rest of the metadata.</dd> - -</dl> - -<p>A chained PCH file (that is, one that references another PCH) and a -module (which may import other modules) have additional metadata -containing the list of all AST files that this AST file depends -on. Each of those files will be loaded along with this AST file.</p> - -<p>For chained precompiled headers, the language options, target -architecture and predefines buffer data is taken from the end of the -chain, since they have to match anyway.</p> - -<h3 id="sourcemgr">Source Manager Block</h3> - -<p>The source manager block contains the serialized representation of -Clang's <a - href="InternalsManual.html#SourceLocation">SourceManager</a> class, -which handles the mapping from source locations (as represented in -Clang's abstract syntax tree) into actual column/line positions within -a source file or macro instantiation. The AST file's -representation of the source manager also includes information about -all of the headers that were (transitively) included when building the -AST file.</p> - -<p>The bulk of the source manager block is dedicated to information -about the various files, buffers, and macro instantiations into which -a source location can refer. Each of these is referenced by a numeric -"file ID", which is a unique number (allocated starting at 1) stored -in the source location. Clang serializes the information for each kind -of file ID, along with an index that maps file IDs to the position -within the AST file where the information about that file ID is -stored. The data associated with a file ID is loaded only when -required by the front end, e.g., to emit a diagnostic that includes a -macro instantiation history inside the header itself.</p> - -<p>The source manager block also contains information about all of the -headers that were included when building the AST file. This -includes information about the controlling macro for the header (e.g., -when the preprocessor identified that the contents of the header -dependent on a macro like <code>LLVM_CLANG_SOURCEMANAGER_H</code>) -along with a cached version of the results of the <code>stat()</code> -system calls performed when building the AST file. The -latter is particularly useful in reducing system time when searching -for include files.</p> - -<h3 id="preprocessor">Preprocessor Block</h3> - -<p>The preprocessor block contains the serialized representation of -the preprocessor. Specifically, it contains all of the macros that -have been defined by the end of the header used to build the -AST file, along with the token sequences that comprise each -macro. The macro definitions are only read from the AST file when the -name of the macro first occurs in the program. This lazy loading of -macro definitions is triggered by lookups into the <a - href="#idtable">identifier table</a>.</p> - -<h3 id="types">Types Block</h3> - -<p>The types block contains the serialized representation of all of -the types referenced in the translation unit. Each Clang type node -(<code>PointerType</code>, <code>FunctionProtoType</code>, etc.) has a -corresponding record type in the AST file. When types are deserialized -from the AST file, the data within the record is used to -reconstruct the appropriate type node using the AST context.</p> - -<p>Each type has a unique type ID, which is an integer that uniquely -identifies that type. Type ID 0 represents the NULL type, type IDs -less than <code>NUM_PREDEF_TYPE_IDS</code> represent predefined types -(<code>void</code>, <code>float</code>, etc.), while other -"user-defined" type IDs are assigned consecutively from -<code>NUM_PREDEF_TYPE_IDS</code> upward as the types are encountered. -The AST file has an associated mapping from the user-defined types -block to the location within the types block where the serialized -representation of that type resides, enabling lazy deserialization of -types. When a type is referenced from within the AST file, that -reference is encoded using the type ID shifted left by 3 bits. The -lower three bits are used to represent the <code>const</code>, -<code>volatile</code>, and <code>restrict</code> qualifiers, as in -Clang's <a - href="http://clang.llvm.org/docs/InternalsManual.html#Type">QualType</a> -class.</p> - -<h3 id="decls">Declarations Block</h3> - -<p>The declarations block contains the serialized representation of -all of the declarations referenced in the translation unit. Each Clang -declaration node (<code>VarDecl</code>, <code>FunctionDecl</code>, -etc.) has a corresponding record type in the AST file. When -declarations are deserialized from the AST file, the data -within the record is used to build and populate a new instance of the -corresponding <code>Decl</code> node. As with types, each declaration -node has a numeric ID that is used to refer to that declaration within -the AST file. In addition, a lookup table provides a mapping from that -numeric ID to the offset within the precompiled header where that -declaration is described.</p> - -<p>Declarations in Clang's abstract syntax trees are stored -hierarchically. At the top of the hierarchy is the translation unit -(<code>TranslationUnitDecl</code>), which contains all of the -declarations in the translation unit but is not actually written as a -specific declaration node. Its child declarations (such as -functions or struct types) may also contain other declarations inside -them, and so on. Within Clang, each declaration is stored within a <a -href="http://clang.llvm.org/docs/InternalsManual.html#DeclContext">declaration -context</a>, as represented by the <code>DeclContext</code> class. -Declaration contexts provide the mechanism to perform name lookup -within a given declaration (e.g., find the member named <code>x</code> -in a structure) and iterate over the declarations stored within a -context (e.g., iterate over all of the fields of a structure for -structure layout).</p> - -<p>In Clang's AST file format, deserializing a declaration -that is a <code>DeclContext</code> is a separate operation from -deserializing all of the declarations stored within that declaration -context. Therefore, Clang will deserialize the translation unit -declaration without deserializing the declarations within that -translation unit. When required, the declarations stored within a -declaration context will be deserialized. There are two representations -of the declarations within a declaration context, which correspond to -the name-lookup and iteration behavior described above:</p> - -<ul> - <li>When the front end performs name lookup to find a name - <code>x</code> within a given declaration context (for example, - during semantic analysis of the expression <code>p->x</code>, - where <code>p</code>'s type is defined in the precompiled header), - Clang refers to an on-disk hash table that maps from the names - within that declaration context to the declaration IDs that - represent each visible declaration with that name. The actual - declarations will then be deserialized to provide the results of - name lookup.</li> - - <li>When the front end performs iteration over all of the - declarations within a declaration context, all of those declarations - are immediately de-serialized. For large declaration contexts (e.g., - the translation unit), this operation is expensive; however, large - declaration contexts are not traversed in normal compilation, since - such a traversal is unnecessary. However, it is common for the code - generator and semantic analysis to traverse declaration contexts for - structs, classes, unions, and enumerations, although those contexts - contain relatively few declarations in the common case.</li> -</ul> - -<h3 id="stmt">Statements and Expressions</h3> - -<p>Statements and expressions are stored in the AST file in -both the <a href="#types">types</a> and the <a - href="#decls">declarations</a> blocks, because every statement or -expression will be associated with either a type or declaration. The -actual statement and expression records are stored immediately -following the declaration or type that owns the statement or -expression. For example, the statement representing the body of a -function will be stored directly following the declaration of the -function.</p> - -<p>As with types and declarations, each statement and expression kind -in Clang's abstract syntax tree (<code>ForStmt</code>, -<code>CallExpr</code>, etc.) has a corresponding record type in the -AST file, which contains the serialized representation of -that statement or expression. Each substatement or subexpression -within an expression is stored as a separate record (which keeps most -records to a fixed size). Within the AST file, the -subexpressions of an expression are stored, in reverse order, prior to the expression -that owns those expression, using a form of <a -href="http://en.wikipedia.org/wiki/Reverse_Polish_notation">Reverse -Polish Notation</a>. For example, an expression <code>3 - 4 + 5</code> -would be represented as follows:</p> - -<table border="1"> - <tr><td><code>IntegerLiteral(5)</code></td></tr> - <tr><td><code>IntegerLiteral(4)</code></td></tr> - <tr><td><code>IntegerLiteral(3)</code></td></tr> - <tr><td><code>BinaryOperator(-)</code></td></tr> - <tr><td><code>BinaryOperator(+)</code></td></tr> - <tr><td>STOP</td></tr> -</table> - -<p>When reading this representation, Clang evaluates each expression -record it encounters, builds the appropriate abstract syntax tree node, -and then pushes that expression on to a stack. When a record contains <i>N</i> -subexpressions--<code>BinaryOperator</code> has two of them--those -expressions are popped from the top of the stack. The special STOP -code indicates that we have reached the end of a serialized expression -or statement; other expression or statement records may follow, but -they are part of a different expression.</p> - -<h3 id="idtable">Identifier Table Block</h3> - -<p>The identifier table block contains an on-disk hash table that maps -each identifier mentioned within the AST file to the -serialized representation of the identifier's information (e.g, the -<code>IdentifierInfo</code> structure). The serialized representation -contains:</p> - -<ul> - <li>The actual identifier string.</li> - <li>Flags that describe whether this identifier is the name of a - built-in, a poisoned identifier, an extension token, or a - macro.</li> - <li>If the identifier names a macro, the offset of the macro - definition within the <a href="#preprocessor">preprocessor - block</a>.</li> - <li>If the identifier names one or more declarations visible from - translation unit scope, the <a href="#decls">declaration IDs</a> of these - declarations.</li> -</ul> - -<p>When an AST file is loaded, the AST file reader -mechanism introduces itself into the identifier table as an external -lookup source. Thus, when the user program refers to an identifier -that has not yet been seen, Clang will perform a lookup into the -identifier table. If an identifier is found, its contents (macro -definitions, flags, top-level declarations, etc.) will be -deserialized, at which point the corresponding -<code>IdentifierInfo</code> structure will have the same contents it -would have after parsing the headers in the AST file.</p> - -<p>Within the AST file, the identifiers used to name declarations are represented with an integral value. A separate table provides a mapping from this integral value (the identifier ID) to the location within the on-disk -hash table where that identifier is stored. This mapping is used when -deserializing the name of a declaration, the identifier of a token, or -any other construct in the AST file that refers to a name.</p> - -<h3 id="method-pool">Method Pool Block</h3> - -<p>The method pool block is represented as an on-disk hash table that -serves two purposes: it provides a mapping from the names of -Objective-C selectors to the set of Objective-C instance and class -methods that have that particular selector (which is required for -semantic analysis in Objective-C) and also stores all of the selectors -used by entities within the AST file. The design of the -method pool is similar to that of the <a href="#idtable">identifier -table</a>: the first time a particular selector is formed during the -compilation of the program, Clang will search in the on-disk hash -table of selectors; if found, Clang will read the Objective-C methods -associated with that selector into the appropriate front-end data -structure (<code>Sema::InstanceMethodPool</code> and -<code>Sema::FactoryMethodPool</code> for instance and class methods, -respectively).</p> - -<p>As with identifiers, selectors are represented by numeric values -within the AST file. A separate index maps these numeric selector -values to the offset of the selector within the on-disk hash table, -and will be used when de-serializing an Objective-C method declaration -(or other Objective-C construct) that refers to the selector.</p> - -<h2 id="tendrils">AST Reader Integration Points</h2> - -<p>The "lazy" deserialization behavior of AST files requires -their integration into several completely different submodules of -Clang. For example, lazily deserializing the declarations during name -lookup requires that the name-lookup routines be able to query the -AST file to find entities stored there.</p> - -<p>For each Clang data structure that requires direct interaction with -the AST reader logic, there is an abstract class that provides -the interface between the two modules. The <code>ASTReader</code> -class, which handles the loading of an AST file, inherits -from all of these abstract classes to provide lazy deserialization of -Clang's data structures. <code>ASTReader</code> implements the -following abstract classes:</p> - -<dl> - <dt><code>StatSysCallCache</code></dt> - <dd>This abstract interface is associated with the - <code>FileManager</code> class, and is used whenever the file - manager is going to perform a <code>stat()</code> system call.</dd> - - <dt><code>ExternalSLocEntrySource</code></dt> - <dd>This abstract interface is associated with the - <code>SourceManager</code> class, and is used whenever the - <a href="#sourcemgr">source manager</a> needs to load the details - of a file, buffer, or macro instantiation.</dd> - - <dt><code>IdentifierInfoLookup</code></dt> - <dd>This abstract interface is associated with the - <code>IdentifierTable</code> class, and is used whenever the - program source refers to an identifier that has not yet been seen. - In this case, the AST reader searches for - this identifier within its <a href="#idtable">identifier table</a> - to load any top-level declarations or macros associated with that - identifier.</dd> - - <dt><code>ExternalASTSource</code></dt> - <dd>This abstract interface is associated with the - <code>ASTContext</code> class, and is used whenever the abstract - syntax tree nodes need to loaded from the AST file. It - provides the ability to de-serialize declarations and types - identified by their numeric values, read the bodies of functions - when required, and read the declarations stored within a - declaration context (either for iteration or for name lookup).</dd> - - <dt><code>ExternalSemaSource</code></dt> - <dd>This abstract interface is associated with the <code>Sema</code> - class, and is used whenever semantic analysis needs to read - information from the <a href="#methodpool">global method - pool</a>.</dd> -</dl> - -<h2 id="chained">Chained precompiled headers</h2> - -<p>Chained precompiled headers were initially intended to improve the -performance of IDE-centric operations such as syntax highlighting and -code completion while a particular source file is being edited by the -user. To minimize the amount of reparsing required after a change to -the file, a form of precompiled header--called a precompiled -<i>preamble</i>--is automatically generated by parsing all of the -headers in the source file, up to and including the last -#include. When only the source file changes (and none of the headers -it depends on), reparsing of that source file can use the precompiled -preamble and start parsing after the #includes, so parsing time is -proportional to the size of the source file (rather than all of its -includes). However, the compilation of that translation unit -may already use a precompiled header: in this case, Clang will create -the precompiled preamble as a chained precompiled header that refers -to the original precompiled header. This drastically reduces the time -needed to serialize the precompiled preamble for use in reparsing.</p> - -<p>Chained precompiled headers get their name because each precompiled header -can depend on one other precompiled header, forming a chain of -dependencies. A translation unit will then include the precompiled -header that starts the chain (i.e., nothing depends on it). This -linearity of dependencies is important for the semantic model of -chained precompiled headers, because the most-recent precompiled -header can provide information that overrides the information provided -by the precompiled headers it depends on, just like a header file -<code>B.h</code> that includes another header <code>A.h</code> can -modify the state produced by parsing <code>A.h</code>, e.g., by -<code>#undef</code>'ing a macro defined in <code>A.h</code>.</p> - -<p>There are several ways in which chained precompiled headers -generalize the AST file model:</p> - -<dl> - <dt>Numbering of IDs</dt> - <dd>Many different kinds of entities--identifiers, declarations, - types, etc.---have ID numbers that start at 1 or some other - predefined constant and grow upward. Each precompiled header records - the maximum ID number it has assigned in each category. Then, when a - new precompiled header is generated that depends on (chains to) - another precompiled header, it will start counting at the next - available ID number. This way, one can determine, given an ID - number, which AST file actually contains the entity.</dd> - - <dt>Name lookup</dt> - <dd>When writing a chained precompiled header, Clang attempts to - write only information that has changed from the precompiled header - on which it is based. This changes the lookup algorithm for the - various tables, such as the <a href="#idtable">identifier table</a>: - the search starts at the most-recent precompiled header. If no entry - is found, lookup then proceeds to the identifier table in the - precompiled header it depends on, and so one. Once a lookup - succeeds, that result is considered definitive, overriding any - results from earlier precompiled headers.</dd> - - <dt>Update records</dt> - <dd>There are various ways in which a later precompiled header can - modify the entities described in an earlier precompiled header. For - example, later precompiled headers can add entries into the various - name-lookup tables for the translation unit or namespaces, or add - new categories to an Objective-C class. Each of these updates is - captured in an "update record" that is stored in the chained - precompiled header file and will be loaded along with the original - entity.</dd> -</dl> - -<h2 id="modules">Modules</h2> - -<p>Modules generalize the chained precompiled header model yet -further, from a linear chain of precompiled headers to an arbitrary -directed acyclic graph (DAG) of AST files. All of the same techniques -used to make chained precompiled headers work---ID number, name -lookup, update records---are shared with modules. However, the DAG -nature of modules introduce a number of additional complications to -the model: - -<dl> - <dt>Numbering of IDs</dt> - <dd>The simple, linear numbering scheme used in chained precompiled - headers falls apart with the module DAG, because different modules - may end up with different numbering schemes for entities they - imported from common shared modules. To account for this, each - module file provides information about which modules it depends on - and which ID numbers it assigned to the entities in those modules, - as well as which ID numbers it took for its own new entities. The - AST reader then maps these "local" ID numbers into a "global" ID - number space for the current translation unit, providing a 1-1 - mapping between entities (in whatever AST file they inhabit) and - global ID numbers. If that translation unit is then serialized into - an AST file, this mapping will be stored for use when the AST file - is imported.</dd> - - <dt>Declaration merging</dt> - <dd>It is possible for a given entity (from the language's - perspective) to be declared multiple times in different places. For - example, two different headers can have the declaration of - <tt>printf</tt> or could forward-declare <tt>struct stat</tt>. If - each of those headers is included in a module, and some third party - imports both of those modules, there is a potentially serious - problem: name lookup for <tt>printf</tt> or <tt>struct stat</tt> will - find both declarations, but the AST nodes are unrelated. This would - result in a compilation error, due to an ambiguity in name - lookup. Therefore, the AST reader performs declaration merging - according to the appropriate language semantics, ensuring that the - two disjoint declarations are merged into a single redeclaration - chain (with a common canonical declaration), so that it is as if one - of the headers had been included before the other.</dd> - - <dt>Name Visibility</dt> - <dd>Modules allow certain names that occur during module creation to - be "hidden", so that they are not part of the public interface of - the module and are not visible to its clients. The AST reader - maintains a "visible" bit on various AST nodes (declarations, macros, - etc.) to indicate whether that particular AST node is currently - visible; the various name lookup mechanisms in Clang inspect the - visible bit to determine whether that entity, which is still in the - AST (because other, visible AST nodes may depend on it), can - actually be found by name lookup. When a new (sub)module is - imported, it may make existing, non-visible, already-deserialized - AST nodes visible; it is the responsibility of the AST reader to - find and update these AST nodes when it is notified of the import.</dd> - -</dl> - -</div> - -</body> -</html> |