Diffstat (limited to 'docs/SourceLevelDebugging.html')
-rw-r--r-- | docs/SourceLevelDebugging.html | 1219 |
1 files changed, 1130 insertions, 89 deletions
diff --git a/docs/SourceLevelDebugging.html b/docs/SourceLevelDebugging.html index 75fae6e89..259a259 100644 --- a/docs/SourceLevelDebugging.html +++ b/docs/SourceLevelDebugging.html @@ -53,6 +53,28 @@ <li><a href="#ccxx_composite_types">C/C++ struct/union types</a></li> <li><a href="#ccxx_enumeration_types">C/C++ enumeration types</a></li> </ol></li> + <li><a href="#llvmdwarfextension">LLVM Dwarf Extensions</a> + <ol> + <li><a href="#objcproperty">Debugging Information Extension + for Objective C Properties</a> + <ul> + <li><a href="#objcpropertyintroduction">Introduction</a></li> + <li><a href="#objcpropertyproposal">Proposal</a></li> + <li><a href="#objcpropertynewattributes">New DWARF Attributes</a></li> + <li><a href="#objcpropertynewconstants">New DWARF Constants</a></li> + </ul> + </li> + <li><a href="#acceltable">Name Accelerator Tables</a> + <ul> + <li><a href="#acceltableintroduction">Introduction</a></li> + <li><a href="#acceltablehashes">Hash Tables</a></li> + <li><a href="#acceltabledetails">Details</a></li> + <li><a href="#acceltablecontents">Contents</a></li> + <li><a href="#acceltableextensions">Language Extensions and File Format Changes</a></li> + </ul> + </li> + </ol> + </li> </ul> </td> <td class="right"> @@ -231,8 +253,8 @@ height="369"> for the optimizer to optimize the program and debugging information without necessarily having to know anything about debugging information. In particular, the use of metadata avoids duplicated debugging information from - the beginning, and the global dead code elimination pass automatically - deletes debugging information for a function if it decides to delete the + the beginning, and the global dead code elimination pass automatically + deletes debugging information for a function if it decides to delete the function. 
</p> <p>To do this, most of the debugging information (descriptors for types, @@ -241,9 +263,9 @@ height="369"> <p>Debug information is designed to be agnostic about the target debugger and debugging information representation (e.g. DWARF/Stabs/etc). It uses a - generic pass to decode the information that represents variables, types, - functions, namespaces, etc: this allows for arbitrary source-language - semantics and type-systems to be used, as long as there is a module + generic pass to decode the information that represents variables, types, + functions, namespaces, etc: this allows for arbitrary source-language + semantics and type-systems to be used, as long as there is a module written for the target debugger to interpret the information. </p> <p>To provide basic functionality, the LLVM debugger does have to make some @@ -279,7 +301,7 @@ height="369"> the range 0x1000 through 0x2000 (there is a defined enum DW_TAG_user_base = 0x1000.)</p> -<p>The fields of debug descriptors used internally by LLVM +<p>The fields of debug descriptors used internally by LLVM are restricted to only the simple data types <tt>i32</tt>, <tt>i1</tt>, <tt>float</tt>, <tt>double</tt>, <tt>mdstring</tt> and <tt>mdnode</tt>. </p> @@ -301,7 +323,7 @@ height="369"> with the current debug version (LLVMDebugVersion = 8 << 16 or 0x80000 or 524288.)</a></p> -<p>The details of the various descriptors follow.</p> +<p>The details of the various descriptors follow.</p> <!-- ======================================================================= --> <h4> @@ -313,14 +335,14 @@ height="369"> <div class="doc_code"> <pre> !0 = metadata !{ - i32, ;; Tag = 17 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> + i32, ;; Tag = 17 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> ;; (DW_TAG_compile_unit) - i32, ;; Unused field. - i32, ;; DWARF language identifier (ex. DW_LANG_C89) + i32, ;; Unused field. + i32, ;; DWARF language identifier (ex. 
DW_LANG_C89) metadata, ;; Source file name metadata, ;; Source file directory (includes trailing slash) metadata, ;; Producer (ex. "4.0.1 LLVM (LLVM research group)") - i1, ;; True if this is a main compile unit. + i1, ;; True if this is a main compile unit. i1, ;; True if this is optimized. metadata, ;; Flags i32 ;; Runtime version @@ -340,7 +362,7 @@ height="369"> <p>Compile unit descriptors provide the root context for objects declared in a specific compilation unit. File descriptors are defined using this context. - These descriptors are collected by a named metadata + These descriptors are collected by a named metadata <tt>!llvm.dbg.cu</tt>. A compile unit descriptor keeps track of subprograms, global variables and type information. @@ -356,7 +378,7 @@ height="369"> <div class="doc_code"> <pre> !0 = metadata !{ - i32, ;; Tag = 41 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> + i32, ;; Tag = 41 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> ;; (DW_TAG_file_type) metadata, ;; Source file name metadata, ;; Source file directory (includes trailing slash) @@ -384,7 +406,7 @@ height="369"> <div class="doc_code"> <pre> !1 = metadata !{ - i32, ;; Tag = 52 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> + i32, ;; Tag = 52 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> ;; (DW_TAG_variable) i32, ;; Unused field. metadata, ;; Reference to context descriptor @@ -403,7 +425,8 @@ height="369"> <p>These descriptors provide debug information about global variables. They provide details such as name, type and where the variable is defined.
All -global variables are collected by named metadata <tt>!llvm.dbg.gv</tt>.</p> +global variables are collected inside the named metadata +<tt>!llvm.dbg.cu</tt>.</p> </div> @@ -429,11 +452,12 @@ global variables are collected by named metadata <tt>!llvm.dbg.gv</tt>.</p> metadata, ;; Reference to type descriptor i1, ;; True if the global is local to compile unit (static) i1, ;; True if the global is defined in the compile unit (not extern) + i32, ;; Line number where the scope of the subprogram begins i32, ;; Virtuality, e.g. dwarf::DW_VIRTUALITY_virtual i32, ;; Index into a virtual function - metadata, ;; indicates which base type contains the vtable pointer for the + metadata, ;; indicates which base type contains the vtable pointer for the ;; derived class - i1, ;; isArtificial + i32, ;; Flags - Artificial, Private, Protected, Explicit, Prototyped. i1, ;; isOptimized Function *,;; Pointer to LLVM function metadata, ;; Lists function template parameters @@ -446,8 +470,6 @@ global variables are collected by named metadata <tt>!llvm.dbg.gv</tt>.</p> <p>These descriptors provide debug information about functions, methods and subprograms. They provide details such as name, return types and the source location where the subprogram is defined. - All subprogram descriptors are collected by a named metadata - <tt>!llvm.dbg.sp</tt>.
</p> </div> @@ -501,9 +523,9 @@ global variables are collected by named metadata <tt>!llvm.dbg.gv</tt>.</p> <div class="doc_code"> <pre> !4 = metadata !{ - i32, ;; Tag = 36 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> + i32, ;; Tag = 36 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> ;; (DW_TAG_base_type) - metadata, ;; Reference to context + metadata, ;; Reference to context metadata, ;; Name (may be "" for anonymous types) metadata, ;; Reference to file where defined (may be NULL) i32, ;; Line number where defined (may be 0) @@ -561,9 +583,10 @@ DW_ATE_unsigned_char = 8 i64, ;; Size in bits i64, ;; Alignment in bits i64, ;; Offset in bits + i32, ;; Flags to encode attributes, e.g. private metadata, ;; Reference to type derived from - metadata, ;; (optional) Name of the Objective C property assoicated with - ;; Objective-C an ivar + metadata, ;; (optional) Name of the Objective C property associated with + ;; Objective-C an ivar metadata, ;; (optional) Name of the Objective C property getter selector. metadata, ;; (optional) Name of the Objective C property setter selector. i32 ;; (optional) Objective C property attributes. @@ -597,9 +620,9 @@ DW_TAG_restrict_type = 55 <p><tt>DW_TAG_typedef</tt> is used to provide a name for the derived type.</p> -<p><tt>DW_TAG_pointer_type</tt>,<tt>DW_TAG_reference_type</tt>, - <tt>DW_TAG_const_type</tt>, <tt>DW_TAG_volatile_type</tt> - and <tt>DW_TAG_restrict_type</tt> are used to qualify +<p><tt>DW_TAG_pointer_type</tt>, <tt>DW_TAG_reference_type</tt>, + <tt>DW_TAG_const_type</tt>, <tt>DW_TAG_volatile_type</tt> and + <tt>DW_TAG_restrict_type</tt> are used to qualify the <a href="#format_derived_type">derived type</a>. 
</p> <p><a href="#format_derived_type">Derived type</a> location can be determined @@ -667,7 +690,8 @@ DW_TAG_inheritance = 28 <p>The members of enumeration types (tag = <tt>DW_TAG_enumeration_type</tt>) are <a href="#format_enumeration">enumerator descriptors</a>, each representing the definition of enumeration value for the set. All enumeration type - descriptors are collected by named metadata <tt>!llvm.dbg.enum</tt>.</p> + descriptors are collected inside the named metadata + <tt>!llvm.dbg.cu</tt>.</p> <p>The members of structure (tag = <tt>DW_TAG_structure_type</tt>) or union (tag = <tt>DW_TAG_union_type</tt>) types are any one of @@ -738,7 +762,7 @@ DW_TAG_inheritance = 28 <div class="doc_code"> <pre> !6 = metadata !{ - i32, ;; Tag = 40 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> + i32, ;; Tag = 40 + <a href="#LLVMDebugVersion">LLVMDebugVersion</a> ;; (DW_TAG_enumerator) metadata, ;; Name i64 ;; Value @@ -820,9 +844,9 @@ DW_TAG_return_variable = 258 void %<a href="#format_common_declare">llvm.dbg.declare</a>(metadata, metadata) </pre> -<p>This intrinsic provides information about a local element (ex. variable.) The - first argument is metadata holding alloca for the variable. The - second argument is metadata containing description of the variable. </p> +<p>This intrinsic provides information about a local element (e.g., variable). The + first argument is metadata holding the alloca for the variable. The + second argument is metadata containing a description of the variable.</p> </div> <!-- ======================================================================= --> @@ -838,8 +862,8 @@ DW_TAG_return_variable = 258 <p>This intrinsic provides information when a user source variable is set to a new value. The first argument is the new value (wrapped as metadata). The second argument is the offset in the user source variable where the new value - is written. The third argument is metadata containing description of the - user source variable. 
</p> + is written. The third argument is metadata containing a description of the + user source variable.</p> </div> </div> @@ -906,27 +930,27 @@ entry: declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone -!0 = metadata !{i32 459008, metadata !1, metadata !"X", +!0 = metadata !{i32 459008, metadata !1, metadata !"X", metadata !3, i32 2, metadata !6}; [ DW_TAG_auto_variable ] !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ] -!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo", - metadata !"foo", metadata !3, i32 1, metadata !4, +!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", metadata !"foo", + metadata !"foo", metadata !3, i32 1, metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ] -!3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c", - metadata !"/private/tmp", metadata !"clang 1.1", i1 true, +!3 = metadata !{i32 458769, i32 0, i32 12, metadata !"foo.c", + metadata !"/private/tmp", metadata !"clang 1.1", i1 true, i1 false, metadata !"", i32 0}; [DW_TAG_compile_unit ] -!4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0, +!4 = metadata !{i32 458773, metadata !3, metadata !"", null, i32 0, i64 0, i64 0, i64 0, i32 0, null, metadata !5, i32 0}; [DW_TAG_subroutine_type ] !5 = metadata !{null} -!6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0, +!6 = metadata !{i32 458788, metadata !3, metadata !"int", metadata !3, i32 0, i64 32, i64 32, i64 0, i32 0, i32 5}; [DW_TAG_base_type ] !7 = metadata !{i32 2, i32 7, metadata !1, null} !8 = metadata !{i32 2, i32 3, metadata !1, null} -!9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3, +!9 = metadata !{i32 459008, metadata !1, metadata !"Y", metadata !3, i32 3, metadata !6}; [ DW_TAG_auto_variable ] !10 = metadata !{i32 3, i32 7, metadata !1, null} !11 = metadata !{i32 3, i32 3, metadata !1, null} -!12 = metadata !{i32 459008, metadata !13, metadata !"Z", 
metadata !3, i32 5, +!12 = metadata !{i32 459008, metadata !13, metadata !"Z", metadata !3, i32 5, metadata !6}; [ DW_TAG_auto_variable ] !13 = metadata !{i32 458763, metadata !1}; [DW_TAG_lexical_block ] !14 = metadata !{i32 5, i32 9, metadata !13, null} @@ -946,7 +970,7 @@ declare void @llvm.dbg.declare(metadata, metadata) nounwind readnone <div class="doc_code"> <pre> -call void @llvm.dbg.declare(metadata, metadata !0), !dbg !7 +call void @llvm.dbg.declare(metadata, metadata !0), !dbg !7 </pre> </div> @@ -960,9 +984,9 @@ call void @llvm.dbg.declare(metadata, metadata !0), !dbg !7 <pre> !7 = metadata !{i32 2, i32 7, metadata !1, null} !1 = metadata !{i32 458763, metadata !2}; [DW_TAG_lexical_block ] -!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", - metadata !"foo", metadata !"foo", metadata !3, i32 1, - metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ] +!2 = metadata !{i32 458798, i32 0, metadata !3, metadata !"foo", + metadata !"foo", metadata !"foo", metadata !3, i32 1, + metadata !4, i1 false, i1 true}; [DW_TAG_subprogram ] </pre> </div> @@ -987,7 +1011,7 @@ call void @llvm.dbg.declare(metadata, metadata !12), !dbg !14 <p>The second intrinsic <tt>%<a href="#format_common_declare">llvm.dbg.declare</a></tt> - encodes debugging information for variable <tt>Z</tt>. The metadata + encodes debugging information for variable <tt>Z</tt>. The metadata <tt>!dbg !14</tt> attached to the intrinsic provides scope information for the variable <tt>Z</tt>.</p> @@ -1068,9 +1092,9 @@ int main(int argc, char *argv[]) { i32 524305, ;; Tag i32 0, ;; Unused i32 4, ;; Language Id - metadata !"MySource.cpp", - metadata !"/Users/mine/sources", - metadata !"4.2.1 (Based on Apple Inc. build 5649) (LLVM build 00)", + metadata !"MySource.cpp", + metadata !"/Users/mine/sources", + metadata !"4.2.1 (Based on Apple Inc. 
build 5649) (LLVM build 00)", i1 true, ;; Main Compile Unit i1 false, ;; Optimized compile unit metadata !"", ;; Compiler flags @@ -1081,8 +1105,8 @@ int main(int argc, char *argv[]) { ;; !1 = metadata !{ i32 524329, ;; Tag - metadata !"MySource.cpp", - metadata !"/Users/mine/sources", + metadata !"MySource.cpp", + metadata !"/Users/mine/sources", metadata !2 ;; Compile unit } @@ -1092,7 +1116,7 @@ int main(int argc, char *argv[]) { !3 = metadata !{ i32 524329, ;; Tag metadata !"Myheader.h" - metadata !"/Users/mine/sources", + metadata !"/Users/mine/sources", metadata !2 ;; Compile unit } @@ -1100,9 +1124,9 @@ int main(int argc, char *argv[]) { </pre> </div> -<p>llvm::Instruction provides easy access to metadata attached with an +<p>llvm::Instruction provides easy access to metadata attached with an instruction. One can extract line number information encoded in LLVM IR -using <tt>Instruction::getMetadata()</tt> and +using <tt>Instruction::getMetadata()</tt> and <tt>DILocation::getLineNumber()</tt>. <pre> if (MDNode *N = I->getMetadata("dbg")) { // Here I is an LLVM instruction @@ -1141,44 +1165,79 @@ int MyGlobal = 100; ;; ;; List of debug info of globals ;; -!llvm.dbg.gv = !{!0} +!llvm.dbg.cu = !{!0} -;; -;; Define the global variable descriptor. Note the reference to the global -;; variable anchor and the global variable itself. -;; +;; Define the compile unit. 
!0 = metadata !{ - i32 524340, ;; Tag - i32 0, ;; Unused - metadata !1, ;; Context - metadata !"MyGlobal", ;; Name - metadata !"MyGlobal", ;; Display Name - metadata !"MyGlobal", ;; Linkage Name - metadata !3, ;; Compile Unit - i32 1, ;; Line Number - metadata !4, ;; Type - i1 false, ;; Is a local variable - i1 true, ;; Is this a definition - i32* @MyGlobal ;; The global variable + i32 786449, ;; Tag + i32 0, ;; Context + i32 4, ;; Language + metadata !"foo.cpp", ;; File + metadata !"/Volumes/Data/tmp", ;; Directory + metadata !"clang version 3.1 ", ;; Producer + i1 true, ;; Deprecated field + i1 false, ;; "isOptimized"? + metadata !"", ;; Flags + i32 0, ;; Runtime Version + metadata !1, ;; Enum Types + metadata !1, ;; Retained Types + metadata !1, ;; Subprograms + metadata !3 ;; Global Variables +} ; [ DW_TAG_compile_unit ] + +;; The Array of Global Variables +!3 = metadata !{ + metadata !4 } -;; -;; Define the basic type of 32 bit signed integer. Note that since int is an -;; intrinsic type the source file is NULL and line 0. -;; !4 = metadata !{ - i32 524324, ;; Tag - metadata !1, ;; Context - metadata !"int", ;; Name - metadata !1, ;; File - i32 0, ;; Line number - i64 32, ;; Size in Bits - i64 32, ;; Align in Bits - i64 0, ;; Offset in Bits - i32 0, ;; Flags - i32 5 ;; Encoding + metadata !5 } +;; +;; Define the global variable itself. 
+;; +!5 = metadata !{ + i32 786484, ;; Tag + i32 0, ;; Unused + null, ;; Unused + metadata !"MyGlobal", ;; Name + metadata !"MyGlobal", ;; Display Name + metadata !"", ;; Linkage Name + metadata !6, ;; File + i32 1, ;; Line + metadata !7, ;; Type + i32 0, ;; IsLocalToUnit + i32 1, ;; IsDefinition + i32* @MyGlobal ;; LLVM-IR Value +} ; [ DW_TAG_variable ] + +;; +;; Define the file +;; +!6 = metadata !{ + i32 786473, ;; Tag + metadata !"foo.cpp", ;; File + metadata !"/Volumes/Data/tmp", ;; Directory + null ;; Unused +} ; [ DW_TAG_file_type ] + +;; +;; Define the type +;; +!7 = metadata !{ + i32 786468, ;; Tag + null, ;; Unused + metadata !"int", ;; Name + null, ;; Unused + i32 0, ;; Line + i64 32, ;; Size in Bits + i64 32, ;; Align in Bits + i64 0, ;; Offset + i32 0, ;; Flags + i32 5 ;; Encoding +} ; [ DW_TAG_base_type ] + </pre> </div> @@ -1220,7 +1279,7 @@ int main(int argc, char *argv[]) { metadata !1, ;; File i32 1, ;; Line number metadata !4, ;; Type - i1 false, ;; Is local + i1 false, ;; Is local i1 true, ;; Is definition i32 0, ;; Virtuality attribute, e.g. 
pure virtual function i32 0, ;; Index into virtual table for C++ methods @@ -1314,7 +1373,7 @@ define i32 @main(i32 %argc, i8** %argv) { !2 = metadata !{ i32 524324, ;; Tag metadata !1, ;; Context - metadata !"unsigned char", + metadata !"unsigned char", metadata !1, ;; File i32 0, ;; Line number i64 8, ;; Size in Bits @@ -1803,6 +1862,988 @@ enum Trees { </div> + +<!-- *********************************************************************** --> +<h2> + <a name="llvmdwarfextension">LLVM Dwarf Extensions</a> +</h2> +<!-- *********************************************************************** --> +<div> +<!-- ======================================================================= --> +<h3> + <a name="objcproperty">Debugging Information Extension for Objective C Properties</a> +</h3> +<div> +<!-- *********************************************************************** --> +<h4> + <a name="objcpropertyintroduction">Introduction</a> +</h4> +<!-- *********************************************************************** --> + +<div> +<p>Objective C provides a simpler way to declare and define accessor methods +using declared properties. The language provides features to declare a +property and to let the compiler synthesize accessor methods. +</p> + +<p>The debugger lets developers inspect Objective C interfaces and their +instance variables and class variables. However, the debugger does not know +anything about the properties defined in Objective C interfaces. The debugger +consumes information generated by the compiler in DWARF format. The format does +not support encoding of Objective C properties. This proposal describes DWARF +extensions to encode Objective C properties, which the debugger can use to let +developers inspect Objective C properties.
+</p> + +</div> + + +<!-- *********************************************************************** --> +<h4> + <a name="objcpropertyproposal">Proposal</a> +</h4> +<!-- *********************************************************************** --> + +<div> +<p>Objective C properties exist separately from class members. A property +can be defined only by "setter" and "getter" selectors, and +be calculated anew on each access. Or a property can just be a direct access +to some declared ivar. Finally, it can have an ivar "automatically +synthesized" for it by the compiler, in which case the property can be +referred to in user code directly using the standard C dereference syntax as +well as through the property "dot" syntax, but there is no entry in +the @interface declaration corresponding to this ivar. +</p> +<p> +To facilitate debugging these properties, we will add a new DWARF TAG into the +DW_TAG_structure_type definition for the class to hold the description of a +given property, and a set of DWARF attributes that provide said description. +The property tag will also contain the name and declared type of the property. +</p> +<p> +If there is a related ivar, there will also be a DWARF property attribute placed +in the DW_TAG_member DIE for that ivar referring back to the property TAG for +that property. And in the case where the compiler synthesizes the ivar directly, +the compiler is expected to generate a DW_TAG_member for that ivar (with the +DW_AT_artificial set to 1), whose name will be the name used to access this +ivar directly in code, and with the property attribute pointing back to the +property it is backing.
+</p> +<p> +The following examples will serve as illustration for our discussion: +</p> + +<div class="doc_code"> +<pre> +@interface I1 { + int n2; +} + +@property int p1; +@property int p2; +@end + +@implementation I1 +@synthesize p1; +@synthesize p2 = n2; +@end +</pre> +</div> + +<p> +This produces the following DWARF (this is a "pseudo dwarfdump" output): +</p> +<div class="doc_code"> +<pre> +0x00000100: TAG_structure_type [7] * + AT_APPLE_runtime_class( 0x10 ) + AT_name( "I1" ) + AT_decl_file( "Objc_Property.m" ) + AT_decl_line( 3 ) + +0x00000110: TAG_APPLE_property + AT_name ( "p1" ) + AT_type ( {0x00000150} ( int ) ) + +0x00000120: TAG_APPLE_property + AT_name ( "p2" ) + AT_type ( {0x00000150} ( int ) ) + +0x00000130: TAG_member [8] + AT_name( "_p1" ) + AT_APPLE_property ( {0x00000110} "p1" ) + AT_type( {0x00000150} ( int ) ) + AT_artificial ( 0x1 ) + +0x00000140: TAG_member [8] + AT_name( "n2" ) + AT_APPLE_property ( {0x00000120} "p2" ) + AT_type( {0x00000150} ( int ) ) + +0x00000150: AT_type( ( int ) ) +</pre> +</div> + +<p> Note that the current convention is that the name of the ivar for an +auto-synthesized property is the name of the property from which it derives with +an underscore prepended, as is shown in the example. +But we actually don't need to know this convention, since we are given the name +of the ivar directly. +</p> + +<p> +Also, it is common practice in ObjC to have different property declarations in +the @interface and @implementation - e.g. to provide a read-only property in +the interface, and a read-write property in the implementation. In that case, +the compiler should emit whichever property declaration will be in force in the +current translation unit. +</p> + +<p> Developers can decorate a property with attributes which are encoded using +DW_AT_APPLE_property_attribute.
+</p> + +<div class="doc_code"> +<pre> +@property (readonly, nonatomic) int pr; +</pre> +</div> +<p> +Which produces a property tag: +</p> +<div class="doc_code"> +<pre> +TAG_APPLE_property [8] + AT_name( "pr" ) + AT_type ( {0x00000147} (int) ) + AT_APPLE_property_attribute (DW_APPLE_PROPERTY_readonly, DW_APPLE_PROPERTY_nonatomic) +</pre> +</div> + +<p> The setter and getter method names are attached to the property using +DW_AT_APPLE_property_setter and DW_AT_APPLE_property_getter attributes. +</p> +<div class="doc_code"> +<pre> +@interface I1 +@property (setter=myOwnP3Setter:) int p3; +-(void)myOwnP3Setter:(int)a; +@end + +@implementation I1 +@synthesize p3; +-(void)myOwnP3Setter:(int)a{ } +@end +</pre> +</div> + +<p> +The DWARF for this would be: +</p> +<div class="doc_code"> +<pre> +0x000003bd: TAG_structure_type [7] * + AT_APPLE_runtime_class( 0x10 ) + AT_name( "I1" ) + AT_decl_file( "Objc_Property.m" ) + AT_decl_line( 3 ) + +0x000003cd: TAG_APPLE_property + AT_name ( "p3" ) + AT_APPLE_property_setter ( "myOwnP3Setter:" ) + AT_type( {0x00000147} ( int ) ) + +0x000003f3: TAG_member [8] + AT_name( "_p3" ) + AT_type ( {0x00000147} ( int ) ) + AT_APPLE_property ( {0x000003cd} ) + AT_artificial ( 0x1 ) +</pre> +</div> + +</div> + +<!-- *********************************************************************** --> +<h4> + <a name="objcpropertynewtags">New DWARF Tags</a> +</h4> +<!-- *********************************************************************** --> + +<div> +<table border="1" cellspacing="0"> + <col width="200"> + <col width="200"> + <tr> + <th>TAG</th> + <th>Value</th> + </tr> + <tr> + <td>DW_TAG_APPLE_property</td> + <td>0x4200</td> + </tr> +</table> + +</div> + +<!-- *********************************************************************** --> +<h4> + <a name="objcpropertynewattributes">New DWARF Attributes</a> +</h4> +<!-- *********************************************************************** --> + +<div> +<table border="1" cellspacing="0"> + <col
width="200"> + <col width="200"> + <col width="200"> + <tr> + <th>Attribute</th> + <th>Value</th> + <th>Classes</th> + </tr> + <tr> + <td>DW_AT_APPLE_property</td> + <td>0x3fed</td> + <td>Reference</td> + </tr> + <tr> + <td>DW_AT_APPLE_property_getter</td> + <td>0x3fe9</td> + <td>String</td> + </tr> + <tr> + <td>DW_AT_APPLE_property_setter</td> + <td>0x3fea</td> + <td>String</td> + </tr> + <tr> + <td>DW_AT_APPLE_property_attribute</td> + <td>0x3feb</td> + <td>Constant</td> + </tr> +</table> + +</div> + +<!-- *********************************************************************** --> +<h4> + <a name="objcpropertynewconstants">New DWARF Constants</a> +</h4> +<!-- *********************************************************************** --> + +<div> +<table border="1" cellspacing="0"> + <col width="200"> + <col width="200"> + <tr> + <th>Name</th> + <th>Value</th> + </tr> + <tr> + <td>DW_APPLE_PROPERTY_readonly</td> + <td>0x1</td> + </tr> + <tr> + <td>DW_APPLE_PROPERTY_readwrite</td> + <td>0x2</td> + </tr> + <tr> + <td>DW_APPLE_PROPERTY_assign</td> + <td>0x4</td> + </tr> + <tr> + <td>DW_APPLE_PROPERTY_retain</td> + <td>0x8</td> + </tr> + <tr> + <td>DW_APPLE_PROPERTY_copy</td> + <td>0x10</td> + </tr> + <tr> + <td>DW_APPLE_PROPERTY_nonatomic</td> + <td>0x20</td> + </tr> +</table> + +</div> +</div> + +<!-- ======================================================================= --> +<h3> + <a name="acceltable">Name Accelerator Tables</a> +</h3> +<!-- ======================================================================= --> +<div> +<!-- ======================================================================= --> +<h4> + <a name="acceltableintroduction">Introduction</a> +</h4> +<!-- ======================================================================= --> +<div> +<p>The .debug_pubnames and .debug_pubtypes formats are not what a debugger + needs. The "pub" in the section name indicates that the entries in the + table are publicly visible names only.
This means no static or hidden + functions show up in the .debug_pubnames. No static variables or private class + variables are in the .debug_pubtypes. Many compilers add different things to + these tables, so we can't rely upon the contents between gcc, icc, or clang.</p> + +<p>The typical query given by users tends not to match up with the contents of + these tables. For example, the DWARF spec states that "In the case of the + name of a function member or static data member of a C++ structure, class or + union, the name presented in the .debug_pubnames section is not the simple + name given by the DW_AT_name attribute of the referenced debugging information + entry, but rather the fully qualified name of the data or function member." + So the only name in these tables for a complex C++ entry is its fully + qualified name. Debugger users tend not to enter their search strings as + "a::b::c(int,const Foo&) const", but rather as "c", "b::c", or "a::b::c". So + the name entered in the name table must be demangled in order to chop it up + appropriately, and additional names must be manually entered into the table + to make it effective as a name lookup table for debuggers to use.</p> + +<p>All debuggers currently ignore the .debug_pubnames table as a result of + its inconsistent and useless public-only name content, which makes it a waste + of space in the object file. These tables, when they are written to disk, are + not sorted in any way, leaving every debugger to do its own parsing + and sorting. These tables also include an inlined copy of the string values + in the table itself, making the tables much larger than they need to be on + disk, especially for large C++ programs.</p> + +<p>Can't we just fix the sections by adding all of the names we need to this + table? No, because that is not what the tables are defined to contain and we + won't know the difference between the old bad tables and the new good tables.
+ At best we could make our own renamed sections that contain all of the data + we need.</p> + +<p>These tables are also insufficient for what a debugger like LLDB needs. + LLDB uses clang for its expression parsing where LLDB acts as a PCH. LLDB is + then often asked to look for type "foo" or namespace "bar", or list items in + namespace "baz". Namespaces are not included in the pubnames or pubtypes + tables. Since clang asks a lot of questions when it is parsing an expression, + we need to be very fast when looking up names, as it happens a lot. Having new + accelerator tables that are optimized for very quick lookups will benefit + this type of debugging experience greatly.</p> + +<p>We would like to generate name lookup tables that can be mapped into + memory from disk, and used as is, with little or no up-front parsing. We would + also be able to control the exact content of these different tables so they + contain exactly what we need. The Name Accelerator Tables were designed + to fix these issues. In order to solve these issues we need to:</p> + +<ul> + <li>Have a format that can be mapped into memory from disk and used as is</li> + <li>Lookups should be very fast</li> + <li>Extensible table format so these tables can be made by many producers</li> + <li>Contain all of the names needed for typical lookups out of the box</li> + <li>Strict rules for the contents of tables</li> +</ul> + +<p>Table size is important and the accelerator table format should allow the + reuse of strings from common string tables so the strings for the names are + not duplicated. We also want to make sure the table is ready to be used as-is + by simply mapping the table into memory with minimal header parsing.</p> + +<p>The name lookups need to be fast and optimized for the kinds of lookups + that debuggers tend to do. 
Optimally we would like to touch as few parts of + the mapped table as possible when doing a name lookup and be able to quickly + find the name entry we are looking for, or discover there are no matches. In + the case of debuggers we optimized for lookups that fail most of the time.</p> + +<p>Each table that is defined should have strict rules on exactly what is in + the accelerator tables and documented so clients can rely on the content.</p> + +</div> + +<!-- ======================================================================= --> +<h4> + <a name="acceltablehashes">Hash Tables</a> +</h4> +<!-- ======================================================================= --> + +<div> +<h5>Standard Hash Tables</h5> + +<p>Typical hash tables have a header, buckets, and each bucket points to the +bucket contents: +</p> + +<div class="doc_code"> +<pre> +.------------. +| HEADER | +|------------| +| BUCKETS | +|------------| +| DATA | +`------------' +</pre> +</div> + +<p>The BUCKETS are an array of offsets to DATA for each hash:</p> + +<div class="doc_code"> +<pre> +.------------. +| 0x00001000 | BUCKETS[0] +| 0x00002000 | BUCKETS[1] +| 0x00002200 | BUCKETS[2] +| 0x000034f0 | BUCKETS[3] +| | ... +| 0xXXXXXXXX | BUCKETS[n_buckets] +'------------' +</pre> +</div> + +<p>So for bucket[3] in the example above, we have an offset into the table + 0x000034f0 which points to a chain of entries for the bucket. Each bucket + must contain a next pointer, full 32 bit hash value, the string itself, + and the data for the current string value.</p> + +<div class="doc_code"> +<pre> + .------------. 
+0x000034f0: | 0x00003500 | next pointer + | 0x12345678 | 32 bit hash + | "erase" | string value + | data[n] | HashData for this bucket + |------------| +0x00003500: | 0x00003550 | next pointer + | 0x29273623 | 32 bit hash + | "dump" | string value + | data[n] | HashData for this bucket + |------------| +0x00003550: | 0x00000000 | next pointer + | 0x82638293 | 32 bit hash + | "main" | string value + | data[n] | HashData for this bucket + `------------' +</pre> +</div> + +<p>The problem with this layout for debuggers is that we need to optimize for + the negative lookup case where the symbol we're searching for is not present. + So if we were to look up "printf" in the table above, we would make a 32 bit hash + for "printf", which might match bucket[3]. We would need to go to the offset + 0x000034f0 and start looking to see if our 32 bit hash matches. To do so, we + need to read the next pointer, then read the hash, compare it, and skip to + the next entry in the chain. Each time we are skipping many bytes in memory and touching + new cache pages just to do the compare on the full 32 bit hash. All of these + accesses then tell us that we didn't have a match.</p> + +<h5>Name Hash Tables</h5> + +<p>To solve the issues mentioned above we have structured the hash tables + a bit differently: a header, buckets, an array of all unique 32 bit hash + values, followed by an array of hash value data offsets, one for each hash + value, then the data for all hash values:</p> + +<div class="doc_code"> +<pre> +.-------------. +| HEADER | +|-------------| +| BUCKETS | +|-------------| +| HASHES | +|-------------| +| OFFSETS | +|-------------| +| DATA | +`-------------' +</pre> +</div> + +<p>The BUCKETS in the name tables are indexes into the HASHES array. By + making all of the full 32 bit hash values contiguous in memory, we allow + ourselves to efficiently check for a match while touching as little + memory as possible. Most often checking the 32 bit hash values is as far as + the lookup goes. 
If it does match, it usually is a match with no collisions. + So for a table with "n_buckets" buckets, and "n_hashes" unique 32 bit hash + values, we can clarify the contents of the BUCKETS, HASHES and OFFSETS as:</p> + +<div class="doc_code"> +<pre> +.-------------------------. +| HEADER.magic | uint32_t +| HEADER.version | uint16_t +| HEADER.hash_function | uint16_t +| HEADER.bucket_count | uint32_t +| HEADER.hashes_count | uint32_t +| HEADER.header_data_len | uint32_t +| HEADER_DATA | HeaderData +|-------------------------| +| BUCKETS | uint32_t[n_buckets] // 32 bit hash indexes +|-------------------------| +| HASHES | uint32_t[n_hashes] // 32 bit hash values +|-------------------------| +| OFFSETS | uint32_t[n_hashes] // 32 bit offsets to hash value data +|-------------------------| +| ALL HASH DATA | +`-------------------------' +</pre> +</div> + +<p>So taking the exact same data from the standard hash example above we end up + with:</p> + +<div class="doc_code"> +<pre> + .------------. + | HEADER | + |------------| + | 0 | BUCKETS[0] + | 2 | BUCKETS[1] + | 5 | BUCKETS[2] + | 6 | BUCKETS[3] + | | ... + | ... | BUCKETS[n_buckets] + |------------| + | 0x........ | HASHES[0] + | 0x........ | HASHES[1] + | 0x........ | HASHES[2] + | 0x........ | HASHES[3] + | 0x........ | HASHES[4] + | 0x........ | HASHES[5] + | 0x12345678 | HASHES[6] hash for BUCKETS[3] + | 0x29273623 | HASHES[7] hash for BUCKETS[3] + | 0x82638293 | HASHES[8] hash for BUCKETS[3] + | 0x........ | HASHES[9] + | 0x........ | HASHES[10] + | 0x........ | HASHES[11] + | 0x........ | HASHES[12] + | 0x........ | HASHES[13] + | 0x........ | HASHES[n_hashes] + |------------| + | 0x........ | OFFSETS[0] + | 0x........ | OFFSETS[1] + | 0x........ | OFFSETS[2] + | 0x........ | OFFSETS[3] + | 0x........ | OFFSETS[4] + | 0x........ | OFFSETS[5] + | 0x000034f0 | OFFSETS[6] offset for BUCKETS[3] + | 0x00003500 | OFFSETS[7] offset for BUCKETS[3] + | 0x00003550 | OFFSETS[8] offset for BUCKETS[3] + | 0x........ 
| OFFSETS[9] + | 0x........ | OFFSETS[10] + | 0x........ | OFFSETS[11] + | 0x........ | OFFSETS[12] + | 0x........ | OFFSETS[13] + | 0x........ | OFFSETS[n_hashes] + |------------| + | | + | | + | | + | | + | | + |------------| +0x000034f0: | 0x00001203 | String offset into .debug_str ("erase") + | 0x00000004 | A 32 bit array count - number of HashData with name "erase" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x........ | HashData[3] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + |------------| +0x00003500: | 0x00001203 | String offset into .debug_str ("collision") + | 0x00000002 | A 32 bit array count - number of HashData with name "collision" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x00001203 | String offset into .debug_str ("dump") + | 0x00000003 | A 32 bit array count - number of HashData with name "dump" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + |------------| +0x00003550: | 0x00001203 | String offset into .debug_str ("main") + | 0x00000009 | A 32 bit array count - number of HashData with name "main" + | 0x........ | HashData[0] + | 0x........ | HashData[1] + | 0x........ | HashData[2] + | 0x........ | HashData[3] + | 0x........ | HashData[4] + | 0x........ | HashData[5] + | 0x........ | HashData[6] + | 0x........ | HashData[7] + | 0x........ | HashData[8] + | 0x00000000 | String offset into .debug_str (terminate data for hash) + `------------' +</pre> +</div> + +<p>So we still have all of the same data; we just organize it more efficiently + for debugger lookup. If we repeat the same "printf" lookup from above, we + would hash "printf" and find it matches BUCKETS[3] by taking the 32 bit hash + value modulo n_buckets. BUCKETS[3] contains "6", which is the index + into the HASHES table. 
We would then compare consecutive 32 bit hash + values in the HASHES array for as long as the hashes belong to BUCKETS[3]. We + do this by verifying that each subsequent hash value modulo n_buckets is still + 3. In the case of a failed lookup we would access the memory for BUCKETS[3], and + then compare a few consecutive 32 bit hashes before we know that we have no match. + We don't end up marching through multiple words of memory and we really keep the + number of processor data cache lines being accessed as small as possible.</p> + +<p>The string hash that is used for these lookup tables is the Daniel J. + Bernstein hash which is also used in the ELF GNU_HASH sections. It is a very + good hash for all kinds of names in programs with very few hash collisions.</p> + +<p>Empty buckets are designated by using an invalid hash index of UINT32_MAX.</p> +</div> + +<!-- ======================================================================= --> +<h4> + <a name="acceltabledetails">Details</a> +</h4> +<!-- ======================================================================= --> +<div> +<p>These name hash tables are designed to be generic where specializations of + the table get to define additional data that goes into the header + ("HeaderData"), how the string value is stored ("KeyType") and the content + of the data for each hash value.</p> + +<h5>Header Layout</h5> +<p>The header has a fixed part and a specialized part. 
The exact format of + the header is:</p> +<div class="doc_code"> +<pre> +struct Header +{ + uint32_t magic; // 'HASH' magic value to allow endian detection + uint16_t version; // Version number + uint16_t hash_function; // The hash function enumeration that was used + uint32_t bucket_count; // The number of buckets in this hash table + uint32_t hashes_count; // The total number of unique hash values and hash data offsets in this table + uint32_t header_data_len; // The bytes to skip to get to the hash indexes (buckets) for correct alignment + // Specifically the length of the following HeaderData field - this does not + // include the size of the preceding fields + HeaderData header_data; // Implementation specific header data +}; +</pre> +</div> +<p>The header starts with a 32 bit "magic" value which must be 'HASH' encoded as + an ASCII integer. This allows the detection of the start of the hash table and + also allows the table's byte order to be determined so the table can be + correctly extracted. The "magic" value is followed by a 16 bit version number + which allows the table to be revised and modified in the future. The current + version number is 1. "hash_function" is a uint16_t enumeration that specifies + which hash function was used to produce this table. The current values for the + hash function enumerations include:</p> +<div class="doc_code"> +<pre> +enum HashFunctionType +{ + eHashFunctionDJB = 0u, // Daniel J Bernstein hash function +}; +</pre> +</div> +<p>"bucket_count" is a 32 bit unsigned integer that represents how many buckets + are in the BUCKETS array. "hashes_count" is the number of unique 32 bit hash + values that are in the HASHES array, which is also the number of offsets + contained in the OFFSETS array. 
"header_data_len" specifies the size in + bytes of the HeaderData that is filled in by specialized versions of this + table.</p> + +<h5>Fixed Lookup</h5> +<p>The header is followed by the buckets, hashes, offsets, and hash value + data.</p> +<div class="doc_code"> +<pre> +struct FixedTable +{ + uint32_t buckets[Header.bucket_count]; // An array of hash indexes into the "hashes[]" array below + uint32_t hashes [Header.hashes_count]; // Every unique 32 bit hash for the entire table is in this table + uint32_t offsets[Header.hashes_count]; // An offset that corresponds to each item in the "hashes[]" array above +}; +</pre> +</div> +<p>"buckets" is an array of 32 bit indexes into the "hashes" array. The + "hashes" array contains all of the 32 bit hash values for all names in the + hash table. Each hash in the "hashes" table has an offset in the "offsets" + array that points to the data for the hash value.</p> + +<p>This table setup makes it very easy to repurpose these tables to contain + different data, while keeping the lookup mechanism the same for all tables. + This layout also makes it possible to save the table to disk and map it in + later and do very efficient name lookups with little or no parsing.</p> + +<p>DWARF lookup tables can be implemented in a variety of ways and can store + a lot of information for each name. We want to make the DWARF tables + extensible and able to store the data efficiently so we have used some of the + DWARF features that enable efficient data storage to define exactly what kind + of data we store for each name.</p> + +<p>The "HeaderData" contains a definition of the contents of each HashData + chunk. We might want to store an offset to all of the debug information + entries (DIEs) for each name. To keep things extensible, we create a list of + items, or Atoms, that are contained in the data for each name. 
First comes the + type of the data in each atom:</p> +<div class="doc_code"> +<pre> +enum AtomType +{ + eAtomTypeNULL = 0u, + eAtomTypeDIEOffset = 1u, // DIE offset, check form for encoding + eAtomTypeCUOffset = 2u, // DIE offset of the compile unit header that contains the item in question + eAtomTypeTag = 3u, // DW_TAG_xxx value, should be encoded as DW_FORM_data1 (if no tags exceed 255) or DW_FORM_data2 + eAtomTypeNameFlags = 4u, // Flags from enum NameFlags + eAtomTypeTypeFlags = 5u, // Flags from enum TypeFlags +}; +</pre> +</div> +<p>The enumeration values and their meanings are:</p> +<div class="doc_code"> +<pre> + eAtomTypeNULL - a termination atom that specifies the end of the atom list + eAtomTypeDIEOffset - an offset into the .debug_info section for the DWARF DIE for this name + eAtomTypeCUOffset - an offset into the .debug_info section for the CU that contains the DIE + eAtomTypeTag - The DW_TAG_XXX enumeration value so you don't have to parse the DWARF to see what it is + eAtomTypeNameFlags - Flags for functions and global variables (isFunction, isInlined, isExternal...) + eAtomTypeTypeFlags - Flags for types (isCXXClass, isObjCClass, ...) +</pre> +</div> +<p>Then each atom specifies its type and how the data for + that atom is encoded:</p> +<div class="doc_code"> +<pre> +struct Atom +{ + uint16_t type; // AtomType enum value + uint16_t form; // DWARF DW_FORM_XXX defines +}; +</pre> +</div> +<p>The "form" type above is from the DWARF specification and defines the + exact encoding of the data for the Atom type. See the DWARF specification for + the DW_FORM_ definitions.</p> +<div class="doc_code"> +<pre> +struct HeaderData +{ + uint32_t die_offset_base; + uint32_t atom_count; + Atom atoms[atom_count]; +}; +</pre> +</div> +<p>"HeaderData" defines the base DIE offset that should be added to any atoms + that are encoded using the DW_FORM_ref1, DW_FORM_ref2, DW_FORM_ref4, + DW_FORM_ref8 or DW_FORM_ref_udata. 
It also defines what is contained in + each "HashData" object -- Atom.form tells us how large each field will be in + the HashData and the Atom.type tells us how this data should be interpreted.</p> + +<p>For the current implementations of the ".apple_names" (all functions + globals), + the ".apple_types" (names of all types that are defined), and the + ".apple_namespaces" (all namespaces), we currently set the Atom array to be:</p> +<div class="doc_code"> +<pre> +HeaderData.atom_count = 1; +HeaderData.atoms[0].type = eAtomTypeDIEOffset; +HeaderData.atoms[0].form = DW_FORM_data4; +</pre> +</div> +<p>This defines the contents to be the DIE offset (eAtomTypeDIEOffset) that is + encoded as a 32 bit value (DW_FORM_data4). This allows a single name to have + multiple matching DIEs in a single file, which could come up with an inlined + function for instance. Future tables could include more information about the + DIE such as flags indicating if the DIE is a function, method, block, + or inlined.</p> + +<p>The KeyType for the DWARF table is a 32 bit string table offset into the + ".debug_str" table. The ".debug_str" is the string table for the DWARF which + may already contain copies of all of the strings. This helps make sure, with + help from the compiler, that we reuse the strings between all of the DWARF + sections and keeps the hash table size down. Another benefit to having the + compiler generate all strings as DW_FORM_strp in the debug info, is that + DWARF parsing can be made much faster.</p> + +<p>After a lookup is made, we get an offset into the hash data. The hash data + needs to be able to deal with 32 bit hash collisions, so the chunk of data + at the offset in the hash data consists of a triple:</p> +<div class="doc_code"> +<pre> +uint32_t str_offset +uint32_t hash_data_count +HashData[hash_data_count] +</pre> +</div> +<p>If "str_offset" is zero, then the bucket contents are done. 
99.9% of the + hash data chunks contain a single item (no 32 bit hash collision):</p> +<div class="doc_code"> +<pre> +.------------. +| 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main") +| 0x00000004 | uint32_t HashData count +| 0x........ | uint32_t HashData[0] DIE offset +| 0x........ | uint32_t HashData[1] DIE offset +| 0x........ | uint32_t HashData[2] DIE offset +| 0x........ | uint32_t HashData[3] DIE offset +| 0x00000000 | uint32_t KeyType (end of hash chain) +`------------' +</pre> +</div> +<p>If there are collisions, you will have multiple valid string offsets:</p> +<div class="doc_code"> +<pre> +.------------. +| 0x00001023 | uint32_t KeyType (.debug_str[0x00001023] => "main") +| 0x00000004 | uint32_t HashData count +| 0x........ | uint32_t HashData[0] DIE offset +| 0x........ | uint32_t HashData[1] DIE offset +| 0x........ | uint32_t HashData[2] DIE offset +| 0x........ | uint32_t HashData[3] DIE offset +| 0x00002023 | uint32_t KeyType (.debug_str[0x00002023] => "print") +| 0x00000002 | uint32_t HashData count +| 0x........ | uint32_t HashData[0] DIE offset +| 0x........ | uint32_t HashData[1] DIE offset +| 0x00000000 | uint32_t KeyType (end of hash chain) +`------------' +</pre> +</div> +<p>Current testing with real world C++ binaries has shown that there is around 1 + 32 bit hash collision per 100,000 name entries.</p> +</div> +<!-- ======================================================================= --> +<h4> + <a name="acceltablecontents">Contents</a> +</h4> +<!-- ======================================================================= --> +<div> +<p>As we said, we want to strictly define exactly what is included in the + different tables. 
For DWARF, we have 3 tables: ".apple_names", ".apple_types", + and ".apple_namespaces".</p> + +<p>".apple_names" sections should contain an entry for each DWARF DIE whose + DW_TAG is a DW_TAG_label, DW_TAG_inlined_subroutine, or DW_TAG_subprogram that + has address attributes: DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges or + DW_AT_entry_pc. It also contains DW_TAG_variable DIEs that have a DW_OP_addr + in the location (global and static variables). All global and static variables + should be included, including those scoped within functions and classes. For + example, using the following code:</p> +<div class="doc_code"> +<pre> +static int var = 0; + +void f () +{ + static int var = 0; +} +</pre> +</div> +<p>Both of the static "var" variables would be included in the table. All + functions should emit both their full names and their basenames. For C or C++, + the full name is the mangled name (if available) which is usually in the + DW_AT_MIPS_linkage_name attribute, and the DW_AT_name contains the function + basename. 
If global or static variables have a mangled name in a + DW_AT_MIPS_linkage_name attribute, this should be emitted along with the + simple name found in the DW_AT_name attribute.</p> + +<p>".apple_types" sections should contain an entry for each DWARF DIE whose + tag is one of:</p> +<ul> + <li>DW_TAG_array_type</li> + <li>DW_TAG_class_type</li> + <li>DW_TAG_enumeration_type</li> + <li>DW_TAG_pointer_type</li> + <li>DW_TAG_reference_type</li> + <li>DW_TAG_string_type</li> + <li>DW_TAG_structure_type</li> + <li>DW_TAG_subroutine_type</li> + <li>DW_TAG_typedef</li> + <li>DW_TAG_union_type</li> + <li>DW_TAG_ptr_to_member_type</li> + <li>DW_TAG_set_type</li> + <li>DW_TAG_subrange_type</li> + <li>DW_TAG_base_type</li> + <li>DW_TAG_const_type</li> + <li>DW_TAG_constant</li> + <li>DW_TAG_file_type</li> + <li>DW_TAG_namelist</li> + <li>DW_TAG_packed_type</li> + <li>DW_TAG_volatile_type</li> + <li>DW_TAG_restrict_type</li> + <li>DW_TAG_interface_type</li> + <li>DW_TAG_unspecified_type</li> + <li>DW_TAG_shared_type</li> +</ul> +<p>Only entries with a DW_AT_name attribute are included, and the entry must + not be a forward declaration (DW_AT_declaration attribute with a non-zero value). + For example, using the following code:</p> +<div class="doc_code"> +<pre> +int main () +{ + int *b = 0; + return *b; +} +</pre> +</div> +<p>We get a few type DIEs:</p> +<div class="doc_code"> +<pre> +0x00000067: TAG_base_type [5] + AT_encoding( DW_ATE_signed ) + AT_name( "int" ) + AT_byte_size( 0x04 ) + +0x0000006e: TAG_pointer_type [6] + AT_type( {0x00000067} ( int ) ) + AT_byte_size( 0x08 ) +</pre> +</div> +<p>The DW_TAG_pointer_type is not included because it does not have a DW_AT_name.</p> + +<p>".apple_namespaces" section should contain all DW_TAG_namespace DIEs. If + we run into a namespace that has no name this is an anonymous namespace, + and the name should be output as "(anonymous namespace)" (without the quotes). + Why? 
This matches the output of the abi::__cxa_demangle() that is in the standard + C++ library that demangles mangled names.</p> +</div> + +<!-- ======================================================================= --> +<h4> + <a name="acceltableextensions">Language Extensions and File Format Changes</a> +</h4> +<!-- ======================================================================= --> +<div> +<h5>Objective-C Extensions</h5> +<p>".apple_objc" section should contain all DW_TAG_subprogram DIEs for an + Objective-C class. The name used in the hash table is the name of the + Objective-C class itself. If the Objective-C class has a category, then an + entry is made for both the class name without the category, and for the class + name with the category. So if we have a DIE at offset 0x1234 with a name + of method "-[NSString(my_additions) stringWithSpecialString:]", we would add + an entry for "NSString" that points to DIE 0x1234, and an entry for + "NSString(my_additions)" that points to 0x1234. This allows us to quickly + track down all Objective-C methods for an Objective-C class when doing + expressions. It is needed because of the dynamic nature of Objective-C where + anyone can add methods to a class. The DWARF for Objective-C methods is also + emitted differently from C++ classes: the methods are not usually + contained in the class definition; they are scattered about across one or more + compile units. Categories can also be defined in different shared libraries. + So we need to be able to quickly find all of the methods and class functions + given the Objective-C class name, or quickly find all methods and class + functions for a class + category name. This table does not contain any selector + names, it just maps Objective-C class names (or class names + category) to all + of the methods and class functions. 
The selectors are added as function + basenames in the .debug_names section.</p> + +<p>In the ".apple_names" section for Objective-C functions, the full name is the + entire function name with the brackets ("-[NSString stringWithCString:]") and the + basename is the selector only ("stringWithCString:").</p> + +<h5>Mach-O Changes</h5> +<p>The section names for the apple hash tables are for non mach-o files. For + mach-o files, the sections should be contained in the "__DWARF" segment with + names as follows:</p> +<ul> + <li>".apple_names" -> "__apple_names"</li> + <li>".apple_types" -> "__apple_types"</li> + <li>".apple_namespaces" -> "__apple_namespac" (16 character limit)</li> + <li>".apple_objc" -> "__apple_objc"</li> +</ul> +</div> +</div> +</div> + <!-- *********************************************************************** --> <hr> @@ -1814,7 +2855,7 @@ enum Trees { <a href="mailto:sabre@nondot.org">Chris Lattner</a><br> <a href="http://llvm.org/">LLVM Compiler Infrastructure</a><br> - Last modified: $Date: 2011-10-12 00:59:11 +0200 (Wed, 12 Oct 2011) $ + Last modified: $Date: 2012-04-03 02:43:49 +0200 (Tue, 03 Apr 2012) $ </address> </body> |