summaryrefslogtreecommitdiffstats
path: root/docs/PCHInternals.html
diff options
context:
space:
mode:
Diffstat (limited to 'docs/PCHInternals.html')
-rw-r--r--docs/PCHInternals.html658
1 files changed, 0 insertions, 658 deletions
diff --git a/docs/PCHInternals.html b/docs/PCHInternals.html
deleted file mode 100644
index 7fed5ba..0000000
--- a/docs/PCHInternals.html
+++ /dev/null
@@ -1,658 +0,0 @@
-<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
- "http://www.w3.org/TR/html4/strict.dtd">
-<html>
-<head>
- <title>Precompiled Header and Modules Internals</title>
- <link type="text/css" rel="stylesheet" href="../menu.css">
- <link type="text/css" rel="stylesheet" href="../content.css">
- <style type="text/css">
- td {
- vertical-align: top;
- }
- </style>
-</head>
-
-<body>
-
-<!--#include virtual="../menu.html.incl"-->
-
-<div id="content">
-
-<h1>Precompiled Header and Modules Internals</h1>
-
- <p>This document describes the design and implementation of Clang's
- precompiled headers (PCH) and modules. If you are interested in the end-user
- view, please see the <a
- href="UsersManual.html#precompiledheaders">User's Manual</a>.</p>
-
- <p><b>Table of Contents</b></p>
- <ul>
- <li><a href="#usage">Using Precompiled Headers with
- <tt>clang</tt></a></li>
- <li><a href="#philosophy">Design Philosophy</a></li>
- <li><a href="#contents">Serialized AST File Contents</a>
- <ul>
- <li><a href="#metadata">Metadata Block</a></li>
- <li><a href="#sourcemgr">Source Manager Block</a></li>
- <li><a href="#preprocessor">Preprocessor Block</a></li>
- <li><a href="#types">Types Block</a></li>
- <li><a href="#decls">Declarations Block</a></li>
- <li><a href="#stmt">Statements and Expressions</a></li>
- <li><a href="#idtable">Identifier Table Block</a></li>
- <li><a href="#method-pool">Method Pool Block</a></li>
- </ul>
- </li>
- <li><a href="#tendrils">AST Reader Integration Points</a></li>
- <li><a href="#chained">Chained precompiled headers</a></li>
- <li><a href="#modules">Modules</a></li>
-</ul>
-
-<h2 id="usage">Using Precompiled Headers with <tt>clang</tt></h2>
-
-<p>The Clang compiler frontend, <tt>clang -cc1</tt>, supports two command line
-options for generating and using PCH files.<p>
-
-<p>To generate PCH files using <tt>clang -cc1</tt>, use the option
-<b><tt>-emit-pch</tt></b>:
-
-<pre> $ clang -cc1 test.h -emit-pch -o test.h.pch </pre>
-
-<p>This option is transparently used by <tt>clang</tt> when generating
-PCH files. The resulting PCH file contains the serialized form of the
-compiler's internal representation after it has completed parsing and
-semantic analysis. The PCH file can then be used as a prefix header
-with the <b><tt>-include-pch</tt></b> option:</p>
-
-<pre>
- $ clang -cc1 -include-pch test.h.pch test.c -o test.s
-</pre>
-
-<h2 id="philosophy">Design Philosophy</h2>
-
-<p>Precompiled headers are meant to improve overall compile times for
- projects, so the design of precompiled headers is entirely driven by
- performance concerns. The use case for precompiled headers is
- relatively simple: when there is a common set of headers that is
- included in nearly every source file in the project, we
- <i>precompile</i> that bundle of headers into a single precompiled
- header (PCH file). Then, when compiling the source files in the
- project, we load the PCH file first (as a prefix header), which acts
- as a stand-in for that bundle of headers.</p>
-
-<p>A precompiled header implementation improves performance when:</p>
-<ul>
- <li>Loading the PCH file is significantly faster than re-parsing the
- bundle of headers stored within the PCH file. Thus, a precompiled
- header design attempts to minimize the cost of reading the PCH
- file. Ideally, this cost should not vary with the size of the
- precompiled header file.</li>
-
- <li>The cost of generating the PCH file initially is not so large
- that it counters the per-source-file performance improvement due to
- eliminating the need to parse the bundled headers in the first
- place. This is particularly important on multi-core systems, because
- PCH file generation serializes the build when all compilations
- require the PCH file to be up-to-date.</li>
-</ul>
-
-<p>Modules, as implemented in Clang, use the same mechanisms as
-precompiled headers to save a serialized AST file (one per module) and
-use those AST modules. From an implementation standpoint, modules are
-a generalization of precompiled headers, lifting a number of
-restrictions placed on precompiled headers. In particular, there can
-only be one precompiled header and it must be included at the
-beginning of the translation unit. The extensions to the AST file
-format required for modules are discussed in the section on <a href="#modules">modules</a>.</p>
-
-<p>Clang's AST files are designed with a compact on-disk
-representation, which minimizes both creation time and the time
-required to initially load the AST file. The AST file itself contains
-a serialized representation of Clang's abstract syntax trees and
-supporting data structures, stored using the same compressed bitstream
-as <a href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitcode
-file format</a>.</p>
-
-<p>Clang's AST files are loaded "lazily" from disk. When an
-AST file is initially loaded, Clang reads only a small amount of data
-from the AST file to establish where certain important data structures
-are stored. The amount of data read in this initial load is
-independent of the size of the AST file, such that a larger AST file
-does not lead to longer AST load times. The actual header data in the
-AST file--macros, functions, variables, types, etc.--is loaded only
-when it is referenced from the user's code, at which point only that
-entity (and those entities it depends on) are deserialized from the
-AST file. With this approach, the cost of using an AST file
-for a translation unit is proportional to the amount of code actually
-used from the AST file, rather than being proportional to the size of
-the AST file itself.</p>
-
-<p>When given the <code>-print-stats</code> option, Clang produces
-statistics describing how much of the AST file was actually
-loaded from disk. For a simple "Hello, World!" program that includes
-the Apple <code>Cocoa.h</code> header (which is built as a precompiled
-header), this option illustrates how little of the actual precompiled
-header is required:</p>
-
-<pre>
-*** PCH Statistics:
- 933 stat cache hits
- 4 stat cache misses
- 895/39981 source location entries read (2.238563%)
- 19/15315 types read (0.124061%)
- 20/82685 declarations read (0.024188%)
- 154/58070 identifiers read (0.265197%)
- 0/7260 selectors read (0.000000%)
- 0/30842 statements read (0.000000%)
- 4/8400 macros read (0.047619%)
- 1/4995 lexical declcontexts read (0.020020%)
- 0/4413 visible declcontexts read (0.000000%)
- 0/7230 method pool entries read (0.000000%)
- 0 method pool misses
-</pre>
-
-<p>For this small program, only a tiny fraction of the source
-locations, types, declarations, identifiers, and macros were actually
-deserialized from the precompiled header. These statistics can be
-useful to determine whether the AST file implementation can
-be improved by making more of the implementation lazy.</p>
-
-<p>Precompiled headers can be chained. When you create a PCH while
-including an existing PCH, Clang can create the new PCH by referencing
-the original file and only writing the new data to the new file. For
-example, you could create a PCH out of all the headers that are very
-commonly used throughout your project, and then create a PCH for every
-single source file in the project that includes the code that is
-specific to that file, so that recompiling the file itself is very fast,
-without duplicating the data from the common headers for every
-file. The mechanisms behind chained precompiled headers are discussed
-in a <a href="#chained">later section</a>.
-
-<h2 id="contents">AST File Contents</h2>
-
-<img src="PCHLayout.png" style="float:right" alt="Precompiled header layout">
-
-<p>Clang's AST files are organized into several different
-blocks, each of which contains the serialized representation of a part
-of Clang's internal representation. Each of the blocks corresponds to
-either a block or a record within <a
- href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitstream
-format</a>. The contents of each of these logical blocks are described
-below.</p>
-
-<p>For a given AST file, the <a
-href="http://llvm.org/cmds/llvm-bcanalyzer.html"><code>llvm-bcanalyzer</code></a>
-utility can be used to examine the actual structure of the bitstream
-for the AST file. This information can be used both to help
-understand the structure of the AST file and to isolate
-areas where AST files can still be optimized, e.g., through
-the introduction of abbreviations.</p>
-
-<h3 id="metadata">Metadata Block</h3>
-
-<p>The metadata block contains several records that provide
-information about how the AST file was built. This metadata
-is primarily used to validate the use of an AST file. For
-example, a precompiled header built for a 32-bit x86 target cannot be used
-when compiling for a 64-bit x86 target. The metadata block contains
-information about:</p>
-
-<dl>
- <dt>Language options</dt>
- <dd>Describes the particular language dialect used to compile the
-AST file, including major options (e.g., Objective-C support) and more
-minor options (e.g., support for "//" comments). The contents of this
-record correspond to the <code>LangOptions</code> class.</dd>
-
- <dt>Target architecture</dt>
- <dd>The target triple that describes the architecture, platform, and
-ABI for which the AST file was generated, e.g.,
-<code>i386-apple-darwin9</code>.</dd>
-
- <dt>AST version</dt>
- <dd>The major and minor version numbers of the AST file
-format. Changes in the minor version number should not affect backward
-compatibility, while changes in the major version number imply that a
-newer compiler cannot read an older precompiled header (and
-vice-versa).</dd>
-
- <dt>Original file name</dt>
- <dd>The full path of the header that was used to generate the
-AST file.</dd>
-
- <dt>Predefines buffer</dt>
- <dd>Although not explicitly stored as part of the metadata, the
-predefines buffer is used in the validation of the AST file.
-The predefines buffer itself contains code generated by the compiler
-to initialize the preprocessor state according to the current target,
-platform, and command-line options. For example, the predefines buffer
-will contain "<code>#define __STDC__ 1</code>" when we are compiling C
-without Microsoft extensions. The predefines buffer itself is stored
-within the <a href="#sourcemgr">source manager block</a>, but its
-contents are verified along with the rest of the metadata.</dd>
-
-</dl>
-
-<p>A chained PCH file (that is, one that references another PCH) and a
-module (which may import other modules) have additional metadata
-containing the list of all AST files that this AST file depends
-on. Each of those files will be loaded along with this AST file.</p>
-
-<p>For chained precompiled headers, the language options, target
-architecture and predefines buffer data is taken from the end of the
-chain, since they have to match anyway.</p>
-
-<h3 id="sourcemgr">Source Manager Block</h3>
-
-<p>The source manager block contains the serialized representation of
-Clang's <a
- href="InternalsManual.html#SourceLocation">SourceManager</a> class,
-which handles the mapping from source locations (as represented in
-Clang's abstract syntax tree) into actual column/line positions within
-a source file or macro instantiation. The AST file's
-representation of the source manager also includes information about
-all of the headers that were (transitively) included when building the
-AST file.</p>
-
-<p>The bulk of the source manager block is dedicated to information
-about the various files, buffers, and macro instantiations into which
-a source location can refer. Each of these is referenced by a numeric
-"file ID", which is a unique number (allocated starting at 1) stored
-in the source location. Clang serializes the information for each kind
-of file ID, along with an index that maps file IDs to the position
-within the AST file where the information about that file ID is
-stored. The data associated with a file ID is loaded only when
-required by the front end, e.g., to emit a diagnostic that includes a
-macro instantiation history inside the header itself.</p>
-
-<p>The source manager block also contains information about all of the
-headers that were included when building the AST file. This
-includes information about the controlling macro for the header (e.g.,
-when the preprocessor identified that the contents of the header
-dependent on a macro like <code>LLVM_CLANG_SOURCEMANAGER_H</code>)
-along with a cached version of the results of the <code>stat()</code>
-system calls performed when building the AST file. The
-latter is particularly useful in reducing system time when searching
-for include files.</p>
-
-<h3 id="preprocessor">Preprocessor Block</h3>
-
-<p>The preprocessor block contains the serialized representation of
-the preprocessor. Specifically, it contains all of the macros that
-have been defined by the end of the header used to build the
-AST file, along with the token sequences that comprise each
-macro. The macro definitions are only read from the AST file when the
-name of the macro first occurs in the program. This lazy loading of
-macro definitions is triggered by lookups into the <a
- href="#idtable">identifier table</a>.</p>
-
-<h3 id="types">Types Block</h3>
-
-<p>The types block contains the serialized representation of all of
-the types referenced in the translation unit. Each Clang type node
-(<code>PointerType</code>, <code>FunctionProtoType</code>, etc.) has a
-corresponding record type in the AST file. When types are deserialized
-from the AST file, the data within the record is used to
-reconstruct the appropriate type node using the AST context.</p>
-
-<p>Each type has a unique type ID, which is an integer that uniquely
-identifies that type. Type ID 0 represents the NULL type, type IDs
-less than <code>NUM_PREDEF_TYPE_IDS</code> represent predefined types
-(<code>void</code>, <code>float</code>, etc.), while other
-"user-defined" type IDs are assigned consecutively from
-<code>NUM_PREDEF_TYPE_IDS</code> upward as the types are encountered.
-The AST file has an associated mapping from the user-defined types
-block to the location within the types block where the serialized
-representation of that type resides, enabling lazy deserialization of
-types. When a type is referenced from within the AST file, that
-reference is encoded using the type ID shifted left by 3 bits. The
-lower three bits are used to represent the <code>const</code>,
-<code>volatile</code>, and <code>restrict</code> qualifiers, as in
-Clang's <a
- href="http://clang.llvm.org/docs/InternalsManual.html#Type">QualType</a>
-class.</p>
-
-<h3 id="decls">Declarations Block</h3>
-
-<p>The declarations block contains the serialized representation of
-all of the declarations referenced in the translation unit. Each Clang
-declaration node (<code>VarDecl</code>, <code>FunctionDecl</code>,
-etc.) has a corresponding record type in the AST file. When
-declarations are deserialized from the AST file, the data
-within the record is used to build and populate a new instance of the
-corresponding <code>Decl</code> node. As with types, each declaration
-node has a numeric ID that is used to refer to that declaration within
-the AST file. In addition, a lookup table provides a mapping from that
-numeric ID to the offset within the precompiled header where that
-declaration is described.</p>
-
-<p>Declarations in Clang's abstract syntax trees are stored
-hierarchically. At the top of the hierarchy is the translation unit
-(<code>TranslationUnitDecl</code>), which contains all of the
-declarations in the translation unit but is not actually written as a
-specific declaration node. Its child declarations (such as
-functions or struct types) may also contain other declarations inside
-them, and so on. Within Clang, each declaration is stored within a <a
-href="http://clang.llvm.org/docs/InternalsManual.html#DeclContext">declaration
-context</a>, as represented by the <code>DeclContext</code> class.
-Declaration contexts provide the mechanism to perform name lookup
-within a given declaration (e.g., find the member named <code>x</code>
-in a structure) and iterate over the declarations stored within a
-context (e.g., iterate over all of the fields of a structure for
-structure layout).</p>
-
-<p>In Clang's AST file format, deserializing a declaration
-that is a <code>DeclContext</code> is a separate operation from
-deserializing all of the declarations stored within that declaration
-context. Therefore, Clang will deserialize the translation unit
-declaration without deserializing the declarations within that
-translation unit. When required, the declarations stored within a
-declaration context will be deserialized. There are two representations
-of the declarations within a declaration context, which correspond to
-the name-lookup and iteration behavior described above:</p>
-
-<ul>
- <li>When the front end performs name lookup to find a name
- <code>x</code> within a given declaration context (for example,
- during semantic analysis of the expression <code>p-&gt;x</code>,
- where <code>p</code>'s type is defined in the precompiled header),
- Clang refers to an on-disk hash table that maps from the names
- within that declaration context to the declaration IDs that
- represent each visible declaration with that name. The actual
- declarations will then be deserialized to provide the results of
- name lookup.</li>
-
- <li>When the front end performs iteration over all of the
- declarations within a declaration context, all of those declarations
- are immediately de-serialized. For large declaration contexts (e.g.,
- the translation unit), this operation is expensive; however, large
- declaration contexts are not traversed in normal compilation, since
- such a traversal is unnecessary. However, it is common for the code
- generator and semantic analysis to traverse declaration contexts for
- structs, classes, unions, and enumerations, although those contexts
- contain relatively few declarations in the common case.</li>
-</ul>
-
-<h3 id="stmt">Statements and Expressions</h3>
-
-<p>Statements and expressions are stored in the AST file in
-both the <a href="#types">types</a> and the <a
- href="#decls">declarations</a> blocks, because every statement or
-expression will be associated with either a type or declaration. The
-actual statement and expression records are stored immediately
-following the declaration or type that owns the statement or
-expression. For example, the statement representing the body of a
-function will be stored directly following the declaration of the
-function.</p>
-
-<p>As with types and declarations, each statement and expression kind
-in Clang's abstract syntax tree (<code>ForStmt</code>,
-<code>CallExpr</code>, etc.) has a corresponding record type in the
-AST file, which contains the serialized representation of
-that statement or expression. Each substatement or subexpression
-within an expression is stored as a separate record (which keeps most
-records to a fixed size). Within the AST file, the
-subexpressions of an expression are stored, in reverse order, prior to the expression
-that owns those expression, using a form of <a
-href="http://en.wikipedia.org/wiki/Reverse_Polish_notation">Reverse
-Polish Notation</a>. For example, an expression <code>3 - 4 + 5</code>
-would be represented as follows:</p>
-
-<table border="1">
- <tr><td><code>IntegerLiteral(5)</code></td></tr>
- <tr><td><code>IntegerLiteral(4)</code></td></tr>
- <tr><td><code>IntegerLiteral(3)</code></td></tr>
- <tr><td><code>BinaryOperator(-)</code></td></tr>
- <tr><td><code>BinaryOperator(+)</code></td></tr>
- <tr><td>STOP</td></tr>
-</table>
-
-<p>When reading this representation, Clang evaluates each expression
-record it encounters, builds the appropriate abstract syntax tree node,
-and then pushes that expression on to a stack. When a record contains <i>N</i>
-subexpressions--<code>BinaryOperator</code> has two of them--those
-expressions are popped from the top of the stack. The special STOP
-code indicates that we have reached the end of a serialized expression
-or statement; other expression or statement records may follow, but
-they are part of a different expression.</p>
-
-<h3 id="idtable">Identifier Table Block</h3>
-
-<p>The identifier table block contains an on-disk hash table that maps
-each identifier mentioned within the AST file to the
-serialized representation of the identifier's information (e.g, the
-<code>IdentifierInfo</code> structure). The serialized representation
-contains:</p>
-
-<ul>
- <li>The actual identifier string.</li>
- <li>Flags that describe whether this identifier is the name of a
- built-in, a poisoned identifier, an extension token, or a
- macro.</li>
- <li>If the identifier names a macro, the offset of the macro
- definition within the <a href="#preprocessor">preprocessor
- block</a>.</li>
- <li>If the identifier names one or more declarations visible from
- translation unit scope, the <a href="#decls">declaration IDs</a> of these
- declarations.</li>
-</ul>
-
-<p>When an AST file is loaded, the AST file reader
-mechanism introduces itself into the identifier table as an external
-lookup source. Thus, when the user program refers to an identifier
-that has not yet been seen, Clang will perform a lookup into the
-identifier table. If an identifier is found, its contents (macro
-definitions, flags, top-level declarations, etc.) will be
-deserialized, at which point the corresponding
-<code>IdentifierInfo</code> structure will have the same contents it
-would have after parsing the headers in the AST file.</p>
-
-<p>Within the AST file, the identifiers used to name declarations are represented with an integral value. A separate table provides a mapping from this integral value (the identifier ID) to the location within the on-disk
-hash table where that identifier is stored. This mapping is used when
-deserializing the name of a declaration, the identifier of a token, or
-any other construct in the AST file that refers to a name.</p>
-
-<h3 id="method-pool">Method Pool Block</h3>
-
-<p>The method pool block is represented as an on-disk hash table that
-serves two purposes: it provides a mapping from the names of
-Objective-C selectors to the set of Objective-C instance and class
-methods that have that particular selector (which is required for
-semantic analysis in Objective-C) and also stores all of the selectors
-used by entities within the AST file. The design of the
-method pool is similar to that of the <a href="#idtable">identifier
-table</a>: the first time a particular selector is formed during the
-compilation of the program, Clang will search in the on-disk hash
-table of selectors; if found, Clang will read the Objective-C methods
-associated with that selector into the appropriate front-end data
-structure (<code>Sema::InstanceMethodPool</code> and
-<code>Sema::FactoryMethodPool</code> for instance and class methods,
-respectively).</p>
-
-<p>As with identifiers, selectors are represented by numeric values
-within the AST file. A separate index maps these numeric selector
-values to the offset of the selector within the on-disk hash table,
-and will be used when de-serializing an Objective-C method declaration
-(or other Objective-C construct) that refers to the selector.</p>
-
-<h2 id="tendrils">AST Reader Integration Points</h2>
-
-<p>The "lazy" deserialization behavior of AST files requires
-their integration into several completely different submodules of
-Clang. For example, lazily deserializing the declarations during name
-lookup requires that the name-lookup routines be able to query the
-AST file to find entities stored there.</p>
-
-<p>For each Clang data structure that requires direct interaction with
-the AST reader logic, there is an abstract class that provides
-the interface between the two modules. The <code>ASTReader</code>
-class, which handles the loading of an AST file, inherits
-from all of these abstract classes to provide lazy deserialization of
-Clang's data structures. <code>ASTReader</code> implements the
-following abstract classes:</p>
-
-<dl>
- <dt><code>StatSysCallCache</code></dt>
- <dd>This abstract interface is associated with the
- <code>FileManager</code> class, and is used whenever the file
- manager is going to perform a <code>stat()</code> system call.</dd>
-
- <dt><code>ExternalSLocEntrySource</code></dt>
- <dd>This abstract interface is associated with the
- <code>SourceManager</code> class, and is used whenever the
- <a href="#sourcemgr">source manager</a> needs to load the details
- of a file, buffer, or macro instantiation.</dd>
-
- <dt><code>IdentifierInfoLookup</code></dt>
- <dd>This abstract interface is associated with the
- <code>IdentifierTable</code> class, and is used whenever the
- program source refers to an identifier that has not yet been seen.
- In this case, the AST reader searches for
- this identifier within its <a href="#idtable">identifier table</a>
- to load any top-level declarations or macros associated with that
- identifier.</dd>
-
- <dt><code>ExternalASTSource</code></dt>
- <dd>This abstract interface is associated with the
- <code>ASTContext</code> class, and is used whenever the abstract
- syntax tree nodes need to loaded from the AST file. It
- provides the ability to de-serialize declarations and types
- identified by their numeric values, read the bodies of functions
- when required, and read the declarations stored within a
- declaration context (either for iteration or for name lookup).</dd>
-
- <dt><code>ExternalSemaSource</code></dt>
- <dd>This abstract interface is associated with the <code>Sema</code>
- class, and is used whenever semantic analysis needs to read
- information from the <a href="#methodpool">global method
- pool</a>.</dd>
-</dl>
-
-<h2 id="chained">Chained precompiled headers</h2>
-
-<p>Chained precompiled headers were initially intended to improve the
-performance of IDE-centric operations such as syntax highlighting and
-code completion while a particular source file is being edited by the
-user. To minimize the amount of reparsing required after a change to
-the file, a form of precompiled header--called a precompiled
-<i>preamble</i>--is automatically generated by parsing all of the
-headers in the source file, up to and including the last
-#include. When only the source file changes (and none of the headers
-it depends on), reparsing of that source file can use the precompiled
-preamble and start parsing after the #includes, so parsing time is
-proportional to the size of the source file (rather than all of its
-includes). However, the compilation of that translation unit
-may already use a precompiled header: in this case, Clang will create
-the precompiled preamble as a chained precompiled header that refers
-to the original precompiled header. This drastically reduces the time
-needed to serialize the precompiled preamble for use in reparsing.</p>
-
-<p>Chained precompiled headers get their name because each precompiled header
-can depend on one other precompiled header, forming a chain of
-dependencies. A translation unit will then include the precompiled
-header that starts the chain (i.e., nothing depends on it). This
-linearity of dependencies is important for the semantic model of
-chained precompiled headers, because the most-recent precompiled
-header can provide information that overrides the information provided
-by the precompiled headers it depends on, just like a header file
-<code>B.h</code> that includes another header <code>A.h</code> can
-modify the state produced by parsing <code>A.h</code>, e.g., by
-<code>#undef</code>'ing a macro defined in <code>A.h</code>.</p>
-
-<p>There are several ways in which chained precompiled headers
-generalize the AST file model:</p>
-
-<dl>
- <dt>Numbering of IDs</dt>
- <dd>Many different kinds of entities--identifiers, declarations,
- types, etc.---have ID numbers that start at 1 or some other
- predefined constant and grow upward. Each precompiled header records
- the maximum ID number it has assigned in each category. Then, when a
- new precompiled header is generated that depends on (chains to)
- another precompiled header, it will start counting at the next
- available ID number. This way, one can determine, given an ID
- number, which AST file actually contains the entity.</dd>
-
- <dt>Name lookup</dt>
- <dd>When writing a chained precompiled header, Clang attempts to
- write only information that has changed from the precompiled header
- on which it is based. This changes the lookup algorithm for the
- various tables, such as the <a href="#idtable">identifier table</a>:
- the search starts at the most-recent precompiled header. If no entry
- is found, lookup then proceeds to the identifier table in the
- precompiled header it depends on, and so one. Once a lookup
- succeeds, that result is considered definitive, overriding any
- results from earlier precompiled headers.</dd>
-
- <dt>Update records</dt>
- <dd>There are various ways in which a later precompiled header can
- modify the entities described in an earlier precompiled header. For
- example, later precompiled headers can add entries into the various
- name-lookup tables for the translation unit or namespaces, or add
- new categories to an Objective-C class. Each of these updates is
- captured in an "update record" that is stored in the chained
- precompiled header file and will be loaded along with the original
- entity.</dd>
-</dl>
-
-<h2 id="modules">Modules</h2>
-
-<p>Modules generalize the chained precompiled header model yet
-further, from a linear chain of precompiled headers to an arbitrary
-directed acyclic graph (DAG) of AST files. All of the same techniques
-used to make chained precompiled headers work---ID number, name
-lookup, update records---are shared with modules. However, the DAG
-nature of modules introduce a number of additional complications to
-the model:
-
-<dl>
- <dt>Numbering of IDs</dt>
- <dd>The simple, linear numbering scheme used in chained precompiled
- headers falls apart with the module DAG, because different modules
- may end up with different numbering schemes for entities they
- imported from common shared modules. To account for this, each
- module file provides information about which modules it depends on
- and which ID numbers it assigned to the entities in those modules,
- as well as which ID numbers it took for its own new entities. The
- AST reader then maps these "local" ID numbers into a "global" ID
- number space for the current translation unit, providing a 1-1
- mapping between entities (in whatever AST file they inhabit) and
- global ID numbers. If that translation unit is then serialized into
- an AST file, this mapping will be stored for use when the AST file
- is imported.</dd>
-
- <dt>Declaration merging</dt>
- <dd>It is possible for a given entity (from the language's
- perspective) to be declared multiple times in different places. For
- example, two different headers can have the declaration of
- <tt>printf</tt> or could forward-declare <tt>struct stat</tt>. If
- each of those headers is included in a module, and some third party
- imports both of those modules, there is a potentially serious
- problem: name lookup for <tt>printf</tt> or <tt>struct stat</tt> will
- find both declarations, but the AST nodes are unrelated. This would
- result in a compilation error, due to an ambiguity in name
- lookup. Therefore, the AST reader performs declaration merging
- according to the appropriate language semantics, ensuring that the
- two disjoint declarations are merged into a single redeclaration
- chain (with a common canonical declaration), so that it is as if one
- of the headers had been included before the other.</dd>
-
- <dt>Name Visibility</dt>
- <dd>Modules allow certain names that occur during module creation to
- be "hidden", so that they are not part of the public interface of
- the module and are not visible to its clients. The AST reader
- maintains a "visible" bit on various AST nodes (declarations, macros,
- etc.) to indicate whether that particular AST node is currently
- visible; the various name lookup mechanisms in Clang inspect the
- visible bit to determine whether that entity, which is still in the
- AST (because other, visible AST nodes may depend on it), can
- actually be found by name lookup. When a new (sub)module is
- imported, it may make existing, non-visible, already-deserialized
- AST nodes visible; it is the responsibility of the AST reader to
- find and update these AST nodes when it is notified of the import.</dd>
-
-</dl>
-
-</div>
-
-</body>
-</html>
OpenPOWER on IntegriCloud