This document describes the LLVM bitstream file format and the encoding of the LLVM IR into it.

@@ -58,10 +58,10 @@ the LLVM IR into it.

What is commonly known as the LLVM bitcode file format (also, sometimes

@@ -88,10 +88,10 @@ wrapper format, then describes the record structure used by LLVM IR files.

Bitstream Format

The bitstream format is literally a stream of bits, with a very simple

@@ -114,13 +114,12 @@ The llvm-bcanalyzer tool can be used to dump and inspect arbitrary bitstreams, which is very useful for understanding the encoding.

Magic Numbers

The first two bytes of a bitcode file are 'BC' (0x42, 0x43). The second two bytes are an application-specific magic number. Generic

@@ -130,10 +129,11 @@ bitcode, while application-specific programs will want to look at all four.
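The two-level check described here can be sketched in Python. The function names `is_bitstream` and `app_magic` are illustrative only and do not correspond to any LLVM API; the sample application magic 0xC0DE is the value used by LLVM IR files.

```python
def is_bitstream(data: bytes) -> bool:
    """Generic check: the first two bytes must be 'BC' (0x42, 0x43)."""
    return len(data) >= 4 and data[0] == 0x42 and data[1] == 0x43


def app_magic(data: bytes) -> bytes:
    """Return the two application-specific magic bytes that follow 'BC'."""
    if not is_bitstream(data):
        raise ValueError("not a bitstream: missing 'BC' magic")
    return data[2:4]


header = bytes([0x42, 0x43, 0xC0, 0xDE])  # 'BC' followed by an application magic
print(is_bitstream(header), app_magic(header).hex())
```

A generic tool would stop after `is_bitstream`; an application-specific reader would also compare the result of `app_magic` against its own expected value.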

Primitives

A bitstream literally consists of a stream of bits, which are read in order

@@ -144,13 +144,12 @@ Width Integers or as Variable Width Integers.

Fixed Width Integers

Fixed-width integer values have their low bits emitted directly to the file. For example, a 3-bit integer value encodes 1 as 001. Fixed width integers

@@ -161,10 +160,11 @@ Integers.
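As a rough illustration of reading a fixed-width field, the following Python sketch pulls bits out of a byte string. It assumes bits are consumed starting from the least significant bit of each byte; the `BitReader` class is hypothetical and is not the LLVM reader.

```python
class BitReader:
    """Hypothetical reader: consumes bits LSB-first from each byte."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read_fixed(self, width: int) -> int:
        """Read a fixed-width field; the first bit read is the value's low bit."""
        value = 0
        for i in range(width):
            byte = self.data[self.pos >> 3]
            bit = (byte >> (self.pos & 7)) & 1
            value |= bit << i
            self.pos += 1
        return value


reader = BitReader(bytes([0b00000001]))
print(reader.read_fixed(3))  # the 3-bit field 001 decodes to 1
```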

Variable Width Integers

Variable-width integer (VBR) values encode values of arbitrary size, optimizing for the case where the values are small. Given a 4-bit VBR field,

@@ -182,9 +182,9 @@ value of 24 (011 << 3) with no continuation. The sum (3+24) yields the value
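The 4-bit VBR arithmetic above (3 + 24 = 27) can be reproduced with a small Python sketch. It works on a list of already-extracted chunks rather than on raw bits, and the helper names `encode_vbr` and `decode_vbr` are illustrative assumptions. In each chunk the high bit is the continuation flag and the remaining bits carry the payload, lowest payload bits first.

```python
def encode_vbr(value: int, width: int) -> list[int]:
    """Split a non-negative value into width-bit chunks, low payload bits first."""
    payload_bits = width - 1
    mask = (1 << payload_bits) - 1
    chunks = []
    while True:
        low = value & mask
        value >>= payload_bits
        if value:
            chunks.append(low | (1 << payload_bits))  # set the continuation bit
        else:
            chunks.append(low)
            return chunks


def decode_vbr(chunks: list[int], width: int) -> int:
    """Reassemble a value from width-bit chunks produced by encode_vbr."""
    payload_bits = width - 1
    mask = (1 << payload_bits) - 1
    value, shift = 0, 0
    for chunk in chunks:
        value |= (chunk & mask) << shift
        shift += payload_bits
        if not chunk & (1 << payload_bits):
            break  # no continuation bit: this was the last chunk
    return value


chunks = encode_vbr(27, 4)
print([format(c, "04b") for c in chunks], decode_vbr(chunks, 4))
```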

6-bit characters

6-bit characters encode common characters into a fixed 6-bit field. They represent the following characters with the following 6-bit values:

@@ -206,9 +206,9 @@ characters not in the set.
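The character table itself is not shown in this excerpt. The sketch below assumes the mapping given in the LLVM bitstream documentation: 'a'..'z' map to 0..25, 'A'..'Z' to 26..51, '0'..'9' to 52..61, '.' to 62 and '_' to 63. The helper names are illustrative.

```python
# Assumed char6 alphabet; the index of each character is its 6-bit value.
CHAR6_ALPHABET = (
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789._"
)


def encode_char6(ch: str) -> int:
    """Return the 6-bit value for a single character in the char6 set."""
    if len(ch) != 1 or ch not in CHAR6_ALPHABET:
        raise ValueError(f"{ch!r} is not a char6 character")
    return CHAR6_ALPHABET.index(ch)


def decode_char6(value: int) -> str:
    """Return the character for a 6-bit value."""
    if not 0 <= value < 64:
        raise ValueError("char6 values must fit in 6 bits")
    return CHAR6_ALPHABET[value]


def is_char6_string(text: str) -> bool:
    """True if every character of text can be emitted as char6."""
    return all(c in CHAR6_ALPHABET for c in text)


print([encode_char6(c) for c in "a_Z"], is_char6_string("foo-bar"))
```

A writer would typically use a check like `is_char6_string` to decide whether a string qualifies for the compact encoding or must fall back to a wider one.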

Word Alignment

Occasionally, it is useful to emit zero bits until the bitstream is a multiple of 32 bits. This ensures that the bit position in the stream can be

@@ -216,12 +216,14 @@ represented as a multiple of 32-bit words.
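The amount of padding needed is simple modular arithmetic. A minimal Python sketch, with hypothetical helper names:

```python
def padding_to_word(bit_pos: int, word_bits: int = 32) -> int:
    """Number of zero bits to emit so that bit_pos becomes a multiple of word_bits."""
    return -bit_pos % word_bits


def align_to_word(bit_pos: int, word_bits: int = 32) -> int:
    """Bit position after padding up to the next word boundary."""
    return bit_pos + padding_to_word(bit_pos, word_bits)


print(padding_to_word(14), align_to_word(14))
```

A position that is already aligned needs no padding, so `align_to_word(32)` stays at 32.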

Abbreviation IDs

A bitstream is a sequential series of Blocks and

@@ -253,10 +255,11 @@ an abbreviated record encoding.

Blocks

Blocks in a bitstream denote nested regions of the stream, and are identified by

@@ -297,13 +300,10 @@ its own set of abbreviations, and its own abbrev id width. When a sub-block is popped, the saved values are restored.

ENTER_SUBBLOCK Encoding

[ENTER_SUBBLOCK, blockid_vbr8, newabbrevlen_vbr4, <align32bits>, blocklen_32]

@@ -322,10 +322,9 @@ reader to skip over the entire block in one jump.
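To make the field order concrete, here is a hedged Python sketch that parses one ENTER_SUBBLOCK and computes where the block body ends, which is what allows a reader to skip it. Two details are taken from the LLVM bitstream documentation rather than from this excerpt: ENTER_SUBBLOCK is abbreviation ID 1, and blocklen counts 32-bit words. The `BitReader` class and `read_enter_subblock` are hypothetical helpers.

```python
ENTER_SUBBLOCK = 1  # assumed built-in abbreviation ID


class BitReader:
    """Hypothetical reader: consumes bits LSB-first from each byte."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read_fixed(self, width: int) -> int:
        value = 0
        for i in range(width):
            bit = (self.data[self.pos >> 3] >> (self.pos & 7)) & 1
            value |= bit << i
            self.pos += 1
        return value

    def read_vbr(self, width: int) -> int:
        high = 1 << (width - 1)  # continuation flag of each chunk
        value, shift = 0, 0
        while True:
            chunk = self.read_fixed(width)
            value |= (chunk & (high - 1)) << shift
            if not chunk & high:
                return value
            shift += width - 1

    def align32(self) -> None:
        self.pos = (self.pos + 31) & ~31


def read_enter_subblock(reader: BitReader, abbrev_width: int):
    """Return (block_id, new_abbrev_width, body_start_bit, body_end_bit)."""
    if reader.read_fixed(abbrev_width) != ENTER_SUBBLOCK:
        raise ValueError("expected ENTER_SUBBLOCK")
    block_id = reader.read_vbr(8)
    new_abbrev_width = reader.read_vbr(4)
    reader.align32()
    block_len_words = reader.read_fixed(32)
    body_start = reader.pos
    return block_id, new_abbrev_width, body_start, body_start + 32 * block_len_words


# Abbrev ID 1 (2 bits), block ID 8, new abbrev width 3, padding, then blocklen = 5 words.
data = bytes([0x21, 0x0C, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00])
print(read_enter_subblock(BitReader(data), abbrev_width=2))  # (8, 3, 64, 224)
```

Setting `reader.pos` to the returned end position would skip the whole block without decoding its contents.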

END_BLOCK Encoding

[END_BLOCK, <align32bits>]

@@ -337,13 +336,14 @@ an even multiple of 32-bits.

Data Records

Data records consist of a record code and a number of (up to) 64-bit integer values. The interpretation of the code and values is

@@ -355,13 +355,10 @@ which encodes the target triple of a module. The code is MODULE_CODE_TRIPLE, and the values of the record are the ASCII codes for the characters in the string.

UNABBREV_RECORD Encoding

[UNABBREV_RECORD, code_vbr6, numops_vbr6, op0_vbr6, op1_vbr6, ...]

@@ -385,10 +382,9 @@ bits. This is not an efficient encoding, but it is fully general.
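The cost of the unabbreviated form is easy to see in a small Python sketch that emits a record as a list of bits in stream order. It assumes UNABBREV_RECORD is abbreviation ID 3 (per the LLVM bitstream documentation) and uses a TRIPLE record (code 2) as sample input; all function names are illustrative.

```python
UNABBREV_RECORD = 3  # assumed built-in abbreviation ID


def emit_fixed(bits: list[int], value: int, width: int) -> None:
    """Append a fixed-width field, low bit first."""
    bits.extend((value >> i) & 1 for i in range(width))


def emit_vbr(bits: list[int], value: int, width: int) -> None:
    """Append a VBR field: width-bit chunks whose high bit flags continuation."""
    payload = width - 1
    while value >= (1 << payload):
        emit_fixed(bits, (value & ((1 << payload) - 1)) | (1 << payload), width)
        value >>= payload
    emit_fixed(bits, value, width)


def emit_unabbrev_record(bits: list[int], abbrev_width: int, code: int, ops: list[int]) -> None:
    """Emit [UNABBREV_RECORD, code_vbr6, numops_vbr6, op0_vbr6, ...]."""
    emit_fixed(bits, UNABBREV_RECORD, abbrev_width)
    emit_vbr(bits, code, 6)
    emit_vbr(bits, len(ops), 6)
    for op in ops:
        emit_vbr(bits, op, 6)


bits: list[int] = []
emit_unabbrev_record(bits, abbrev_width=2, code=2, ops=[ord(c) for c in "x86"])
print(len(bits))  # 50 bits for a three-character string
```

Each ASCII character at or above 32 needs two 6-bit chunks here, which is why abbreviations pay off for string-heavy records.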

Abbreviated Record Encoding

[<abbrevid>, fields...]

@@ -409,11 +405,14 @@ operand value).

Abbreviations

Abbreviations are an important form of compression for bitstreams. The idea is to specify a dense encoding for a class of records once, then use that encoding

@@ -431,13 +430,11 @@ As a concrete example, LLVM IR files usually emit an abbreviation for binary operators. If a specific LLVM module contained no or few binary operators, the abbreviation does not need to be emitted.

DEFINE_ABBREV Encoding

[DEFINE_ABBREV, numabbrevops_vbr5, abbrevop0, abbrevop1, ...]

@@ -552,11 +549,14 @@ used for any other string value.

Standard Blocks

In addition to the basic block structure and record encodings, the bitstream

@@ -565,13 +565,10 @@ stream is to be decoded or other metadata. In the future, new standard blocks may be added. Block IDs 0-7 are reserved for standard blocks.

#0 - BLOCKINFO Block

The BLOCKINFO block allows the description of metadata for other

@@ -620,11 +617,15 @@ from the corresponding blocks. It is not safe to skip them.

Bitcode Wrapper Format

Bitcode files for LLVM IR may optionally be wrapped in a simple wrapper

@@ -652,10 +653,10 @@ value that can be used to encode the CPU of the target.
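The wrapper layout is not shown in this excerpt. The sketch below assumes the header described in the LLVM documentation: five 32-bit little-endian fields (magic 0x0B17C0DE, version, offset, size, CPU type), with offset and size locating the embedded bitcode. The `WrapperHeader`, `parse_wrapper` and `unwrap` names are hypothetical.

```python
import struct
from typing import NamedTuple

WRAPPER_MAGIC = 0x0B17C0DE  # assumed wrapper magic number


class WrapperHeader(NamedTuple):
    magic: int
    version: int
    offset: int    # byte offset of the embedded bitcode
    size: int      # byte size of the embedded bitcode
    cpu_type: int  # target-specific CPU value


def parse_wrapper(data: bytes) -> WrapperHeader:
    """Decode the five little-endian 32-bit header fields."""
    if len(data) < 20:
        raise ValueError("too short for a wrapper header")
    header = WrapperHeader(*struct.unpack_from("<5I", data, 0))
    if header.magic != WRAPPER_MAGIC:
        raise ValueError("missing wrapper magic")
    return header


def unwrap(data: bytes) -> bytes:
    """Return the embedded bitcode bytes."""
    header = parse_wrapper(data)
    return data[header.offset:header.offset + header.size]


wrapped = struct.pack("<5I", WRAPPER_MAGIC, 0, 20, 4, 7) + b"BC\xc0\xde"
print(parse_wrapper(wrapped).cpu_type, unwrap(wrapped))
```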

LLVM IR Encoding

LLVM IR is encoded into a bitstream by defining blocks and records. It uses

@@ -666,16 +667,17 @@ that the writer uses, as these are fully self-described in the file, and the reader is not allowed to build in any knowledge of this.

Basics

LLVM IR Magic Number

The magic number for LLVM IR files is:

@@ -695,9 +697,9 @@ When combined with the bitcode magic number and viewed as bytes, this is

Signed VBRs

Variable Width Integer encoding is an efficient way to

@@ -728,9 +730,9 @@ within CONSTANTS_BLOCK blocks.
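The description of the signed encoding is elided in this excerpt. The sketch below assumes the scheme used by LLVM's bitcode writer: the magnitude is shifted left by one bit and the sign is stored in the least significant bit, so that small negative numbers remain small unsigned values before VBR encoding. The function names are illustrative.

```python
def to_signed_vbr(value: int) -> int:
    """Map a signed integer to the unsigned value that is then VBR-encoded."""
    if value >= 0:
        return value << 1
    return (-value << 1) | 1  # magnitude shifted left, sign flag in the low bit


def from_signed_vbr(encoded: int) -> int:
    """Inverse of to_signed_vbr."""
    magnitude = encoded >> 1
    return -magnitude if encoded & 1 else magnitude


print(to_signed_vbr(3), to_signed_vbr(-3), from_signed_vbr(7))
```

Without this transformation, a two's-complement negative number would have all of its high bits set and would need the maximum number of VBR chunks.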

LLVM IR Blocks

LLVM IR is defined with the following blocks:

@@ -758,11 +760,14 @@ LLVM IR is defined with the following blocks:

MODULE_BLOCK Contents

The MODULE_BLOCK block (id 8) is the top-level block for LLVM bitcode files, and each bitcode file must contain exactly one. In

@@ -782,13 +787,10 @@ following sub-blocks:

METADATA_BLOCK

MODULE_CODE_VERSION Record

[VERSION, version#]

@@ -798,10 +800,9 @@ time.

MODULE_CODE_TRIPLE Record

[TRIPLE, ...string...]

The TRIPLE record (code 2) contains a variable number of

@@ -810,10 +811,9 @@ specification string.

MODULE_CODE_DATALAYOUT Record

[DATALAYOUT, ...string...]

The DATALAYOUT record (code 3) contains a variable number of

@@ -822,10 +822,9 @@ specification string.

MODULE_CODE_ASM Record

[ASM, ...string...]

The ASM record (code 4) contains a variable number of

@@ -834,10 +833,9 @@ individual assembly blocks separated by newline (ASCII 10) characters.

MODULE_CODE_SECTIONNAME Record

[SECTIONNAME, ...string...]

The SECTIONNAME record (code 5) contains a variable number

@@ -850,10 +848,9 @@ referenced by the 1-based index in the section fields of

MODULE_CODE_DEPLIB Record

[DEPLIB, ...string...]

The DEPLIB record (code 6) contains a variable number of

@@ -864,10 +861,9 @@ library name referenced.

MODULE_CODE_GLOBALVAR Record

[GLOBALVAR, pointer type, isconst, initid, linkage, alignment, section, visibility, threadlocal]

The GLOBALVAR record (code 7) marks the declaration or

@@ -923,16 +919,15 @@ encoding of the visibility of this variable: is thread_local

unnamed_addr: If present and non-zero, indicates that the variable has unnamed_addr

MODULE_CODE_FUNCTION Record

[FUNCTION, type, callingconv, isproto, linkage, paramattr, alignment, section, visibility, gc]

@@ -980,16 +975,15 @@ index in the table of MODULE_CODE_GCNAME entries.

unnamed_addr: If present and non-zero, indicates that the function has unnamed_addr

MODULE_CODE_ALIAS Record

[ALIAS, alias type, aliasee val#, linkage, visibility]

@@ -1011,10 +1005,9 @@ for this alias

MODULE_CODE_PURGEVALS Record

[PURGEVALS, numvals]

The PURGEVALS record (code 10) resets the module-level

@@ -1025,10 +1018,9 @@ new value indices will start from the given numvals value.

MODULE_CODE_GCNAME Record

[GCNAME, ...string...]

The GCNAME record (code 11) contains a variable number of

@@ -1039,11 +1031,14 @@ the module. These records can be referenced by 1-based index in the gc fields of FUNCTION records.

PARAMATTR_BLOCK Contents

The PARAMATTR_BLOCK block (id 9) contains a table of entries describing the attributes of function parameters. These

@@ -1057,14 +1052,10 @@ INST_CALL records. that each is unique (i.e., no two indices represent equivalent attribute lists).

PARAMATTR_CODE_ENTRY Record

[ENTRY, paramidx0, attr0, paramidx1, attr1...]

@@ -1105,11 +1096,14 @@ the logarithm base 2 of the requested alignment, plus 1

TYPE_BLOCK Contents

The TYPE_BLOCK block (id 10) contains records which constitute a table of type operator entries used to represent types

@@ -1124,13 +1118,10 @@ type operator records. each entry is unique (i.e., no two indices represent structurally equivalent types).

TYPE_CODE_NUMENTRY Record

[NUMENTRY, numentries]

@@ -1142,10 +1133,9 @@ in the block.
TYPE_CODE_VOID Record

[VOID]

@@ -1155,10 +1145,9 @@ type table.

TYPE_CODE_FLOAT Record

[FLOAT]

@@ -1168,10 +1157,9 @@ floating point) type to the type table.

TYPE_CODE_DOUBLE Record

[DOUBLE]

@@ -1181,10 +1169,9 @@ floating point) type to the type table.

TYPE_CODE_LABEL Record

[LABEL]

@@ -1194,10 +1181,9 @@ the type table.

TYPE_CODE_OPAQUE Record

[OPAQUE]

@@ -1208,10 +1194,9 @@ unified.

TYPE_CODE_INTEGER Record

[INTEGER, width]

@@ -1222,10 +1207,9 @@ integer type.

TYPE_CODE_POINTER Record

[POINTER, pointee type, address space]

@@ -1243,10 +1227,9 @@ default address space is zero.

TYPE_CODE_FUNCTION Record

[FUNCTION, vararg, ignored, retty, ...paramty... ]

@@ -1268,10 +1251,9 @@ parameter types of the function

TYPE_CODE_STRUCT Record

[STRUCT, ispacked, ...eltty...]

@@ -1287,10 +1269,9 @@ types of the structure

TYPE_CODE_ARRAY Record

[ARRAY, numelts, eltty]

@@ -1305,10 +1286,9 @@ table. The operand fields are

TYPE_CODE_VECTOR Record

[VECTOR, numelts, eltty]

@@ -1323,10 +1303,9 @@ table. The operand fields are

TYPE_CODE_X86_FP80 Record

[X86_FP80]

@@ -1336,10 +1315,9 @@ floating point) type to the type table.

TYPE_CODE_FP128 Record

[FP128]

@@ -1349,10 +1327,9 @@ floating point) type to the type table.

TYPE_CODE_PPC_FP128 Record

[PPC_FP128]

@@ -1362,10 +1339,9 @@ floating point) type to the type table.

TYPE_CODE_METADATA Record

[METADATA]

@@ -1374,11 +1350,14 @@ type to the type table.

CONSTANTS_BLOCK Contents

The CONSTANTS_BLOCK block (id 11) ...
@@ -1387,10 +1366,11 @@ type to the type table.

FUNCTION_BLOCK Contents

The FUNCTION_BLOCK block (id 12) ...

@@ -1409,23 +1389,21 @@ type to the type table.

TYPE_SYMTAB_BLOCK Contents

The TYPE_SYMTAB_BLOCK block (id 13) contains entries which map between module-level named types and their corresponding type indices.

TST_CODE_ENTRY Record

[ENTRY, typeid, ...string...]

@@ -1436,12 +1414,14 @@ name. Each entry corresponds to a single named type.

VALUE_SYMTAB_BLOCK Contents

The VALUE_SYMTAB_BLOCK block (id 14) ...

@@ -1450,10 +1430,11 @@ name. Each entry corresponds to a single named type.

METADATA_BLOCK Contents

The METADATA_BLOCK block (id 15) ...

@@ -1462,16 +1443,18 @@ name. Each entry corresponds to a single named type.

METADATA_ATTACHMENT Contents

The METADATA_ATTACHMENT block (id 16) ...

@@ -1480,8 +1463,8 @@ name. Each entry corresponds to a single named type.

Chris Lattner

The LLVM Compiler Infrastructure

-Last modified: $Date: 2011-01-08 17:42:36 +0100 (Sat, 08 Jan 2011) $

+Last modified: $Date: 2011-04-23 02:30:22 +0200 (Sat, 23 Apr 2011) $

LLVM Bitcode File Format
