From 7b3392326c40c3c20697816acae597ba7b3144eb Mon Sep 17 00:00:00 2001
From: dim
Date: Thu, 20 Oct 2011 21:10:27 +0000
Subject: Vendor import of llvm release_30 branch r142614: http://llvm.org/svn/llvm-project/llvm/branches/release_30@142614
---
 docs/CodeGenerator.html | 208 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 195 insertions(+), 13 deletions(-)

(limited to 'docs/CodeGenerator.html')

diff --git a/docs/CodeGenerator.html b/docs/CodeGenerator.html
index 29a2cce..e693a22 100644
--- a/docs/CodeGenerator.html
+++ b/docs/CodeGenerator.html
@@ -114,6 +114,7 @@
  • Prolog/Epilog
  • Dynamic Allocation
+ • The PTX backend
@@ -1768,22 +1769,28 @@ bool RegMapping_Fer::compatible_class(MachineFunction &mf,
    different register allocators:

    The type of register allocator used in llc can be chosen with the
@@ -1805,7 +1812,121 @@ $ llc -regalloc=pbqp file.bc -o pbqp.s;

    Prolog/Epilog Code Insertion

    -

    To Be Written

    + + +

    + Compact Unwind +

    + +
    + +

    Throwing an exception requires unwinding out of a function. The information on how to unwind a given function is traditionally expressed in DWARF unwind (a.k.a. frame) info. But that format was originally developed for debuggers to backtrace, and each Frame Description Entry (FDE) requires ~20-30 bytes per function. There is also the cost of mapping from an address in a function to the corresponding FDE at runtime. An alternative unwind encoding, called compact unwind, requires just 4 bytes per function.

    + +

    The compact unwind encoding is a 32-bit value, which is encoded in an architecture-specific way. It specifies which registers to restore and from where, and how to unwind out of the function. When the linker creates a final linked image, it will create a __TEXT,__unwind_info section. This section is a small and fast way for the runtime to access unwind info for any given function. If we emit compact unwind info for the function, that compact unwind info will be encoded in the __TEXT,__unwind_info section. If we emit DWARF unwind info, the __TEXT,__unwind_info section will contain the offset of the FDE in the __TEXT,__eh_frame section in the final linked image.

    + +

    For X86, there are three modes for the compact unwind encoding:

    + +
    +
    Function with a Frame Pointer (EBP or RBP)
    +

    EBP/RBP-based frame, where EBP/RBP is pushed onto the stack immediately after the return address, then ESP/RSP is moved to EBP/RBP. Thus to unwind, ESP/RSP is restored with the current EBP/RBP value, then EBP/RBP is restored by popping the stack, and the return is done by popping the stack once more into the PC. All non-volatile registers that need to be restored must have been saved in a small range on the stack that starts between EBP-4 and EBP-1020 (RBP-8 and RBP-2040). The offset (divided by 4 in 32-bit mode and 8 in 64-bit mode) is encoded in bits 16-23 (mask: 0x00FF0000). The registers saved are encoded in bits 0-14 (mask: 0x00007FFF) as five 3-bit entries from the following table:

    Compact Number    i386 Register    x86-64 Register
    1                 EBX              RBX
    2                 ECX              R12
    3                 EDX              R13
    4                 EDI              R14
    5                 ESI              R15
    6                 EBP              RBP
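
The frame-pointer layout described above can be sketched in C++. This is only an illustration of the two fields the text names (the scaled offset under mask 0x00FF0000 and five 3-bit register entries under mask 0x00007FFF). The helper names `encodeFramePointerUnwind` and `decodeOffsetBytes`, the choice to place the first saved register in the lowest three bits, and the use of 0 for an unused slot are assumptions made for the example, not taken from the LLVM sources. Any other fields of the real encoding are not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Compact register numbers from the table above (i386 / x86-64 names).
enum CompactReg : uint32_t {
  CR_None = 0,     // assumption: 0 marks an unused slot
  CR_EBX_RBX = 1,
  CR_ECX_R12 = 2,
  CR_EDX_R13 = 3,
  CR_EDI_R14 = 4,
  CR_ESI_R15 = 5,
  CR_EBP_RBP = 6
};

// Hypothetical helper: packs the save-area offset and up to five saved
// registers into the two fields described in the text.
// offsetBytes is the distance below EBP/RBP at which the save area starts.
inline uint32_t encodeFramePointerUnwind(uint32_t offsetBytes,
                                         const std::array<CompactReg, 5> &regs,
                                         bool is64Bit) {
  const uint32_t slot = is64Bit ? 8 : 4;
  assert(offsetBytes % slot == 0 && "offset must be a multiple of the slot size");
  const uint32_t scaled = offsetBytes / slot;
  assert(scaled <= 0xFF && "scaled offset must fit in 8 bits");

  uint32_t encoding = (scaled << 16) & 0x00FF0000;
  // Assumption: entry i occupies bits [3*i, 3*i + 2].
  for (unsigned i = 0; i < regs.size(); ++i)
    encoding |= (static_cast<uint32_t>(regs[i]) & 0x7) << (i * 3);
  return encoding;
}

// Hypothetical helper: recovers the byte offset from bits 16-23.
inline uint32_t decodeOffsetBytes(uint32_t encoding, bool is64Bit) {
  return ((encoding & 0x00FF0000) >> 16) * (is64Bit ? 8 : 4);
}
```

For instance, under these assumptions a 64-bit function that saves RBX and R12 starting at RBP-16 would yield a scaled offset of 2 in bits 16-23 and the register numbers 1 and 2 in the two lowest 3-bit slots.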
    + +
    + +
    Frameless with a Small Constant Stack Size (EBP or RBP is not used as a frame pointer)
    +

    To return, a constant (encoded in the compact unwind encoding) is added to the ESP/RSP. Then the return is done by popping the stack into the PC. All non-volatile registers that need to be restored must have been saved on the stack immediately after the return address. The stack size (divided by 4 in 32-bit mode and 8 in 64-bit mode) is encoded in bits 16-23 (mask: 0x00FF0000). There is a maximum stack size of 1024 bytes in 32-bit mode and 2048 in 64-bit mode. The number of registers saved is encoded in bits 10-12 (mask: 0x00001C00). Bits 0-9 (mask: 0x000003FF) encode which registers were saved and their order. (See the encodeCompactUnwindRegistersWithoutFrame() function in lib/Target/X86/X86FrameLowering.cpp for the encoding algorithm.)
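
A minimal C++ sketch of how the three frameless fields could be packed and unpacked, using only the masks quoted above (0x00FF0000, 0x00001C00, 0x000003FF). The struct `FramelessInfo` and the helpers `encodeFrameless` and `decodeFrameless` are hypothetical names introduced for illustration. The 10-bit register permutation is treated as an opaque value here, since its computation lives in encodeCompactUnwindRegistersWithoutFrame() and is not reproduced.

```cpp
#include <cassert>
#include <cstdint>

// Field masks quoted in the text above.
constexpr uint32_t StackSizeMask   = 0x00FF0000; // stack size / slot size
constexpr uint32_t RegCountMask    = 0x00001C00; // number of saved registers
constexpr uint32_t PermutationMask = 0x000003FF; // which registers, and order

// Hypothetical container for the decoded fields.
struct FramelessInfo {
  uint32_t stackBytes;   // constant stack size in bytes
  uint32_t regCount;     // number of saved registers
  uint32_t permutation;  // opaque 10-bit value (not computed here)
};

// Hypothetical helper: packs the three fields into one 32-bit word.
inline uint32_t encodeFrameless(const FramelessInfo &info, bool is64Bit) {
  const uint32_t slot = is64Bit ? 8 : 4;
  assert(info.stackBytes % slot == 0 && "stack size must be slot-aligned");
  const uint32_t scaled = info.stackBytes / slot;
  assert(scaled <= 0xFF && "scaled stack size must fit in 8 bits");
  assert(info.regCount <= 7 && "register count must fit in 3 bits");
  assert(info.permutation <= PermutationMask && "permutation must fit in 10 bits");

  return ((scaled << 16) & StackSizeMask) |
         ((info.regCount << 10) & RegCountMask) |
         (info.permutation & PermutationMask);
}

// Hypothetical helper: the inverse of encodeFrameless.
inline FramelessInfo decodeFrameless(uint32_t encoding, bool is64Bit) {
  FramelessInfo info;
  info.stackBytes  = ((encoding & StackSizeMask) >> 16) * (is64Bit ? 8 : 4);
  info.regCount    = (encoding & RegCountMask) >> 10;
  info.permutation = encoding & PermutationMask;
  return info;
}
```

Keeping the permutation opaque means the sketch only demonstrates the field layout; a real encoder would first derive that value from the saved-register order.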

    + +
    Frameless with a Large Constant Stack Size (EBP or RBP is not used as a frame pointer)
    +

    This case is like the "Frameless with a Small Constant Stack Size" case, but the stack size is too large to encode in the compact unwind encoding. Instead it requires that the function contains "subl $nnnnnn, %esp" in its prolog. The compact encoding contains the offset to the $nnnnnn value in the function in bits 10-12 (mask: 0x00001C00).

    +
    + +
    +

    Late Machine Code Optimizations
@@ -2165,7 +2286,7 @@ is the key:

    - *
    +
@@ -2261,9 +2382,6 @@ disassembling machine opcode bytes into MCInst's.

    This box indicates whether the target supports most popular inline assembly constraints and modifiers.

    -

    X86 lacks reliable support for inline assembly constraints relating to the X86 floating point stack.

    -
@@ -2794,6 +2912,70 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
    +
    +

    + The PTX backend +

    + +
    + +

    The PTX code generator lives in the lib/Target/PTX directory. It is currently a work-in-progress, but already supports most of the code generation functionality needed to generate correct PTX kernels for CUDA devices.

    + +

    The code generator can target PTX 2.0+ and shader model 1.0+. The PTX ISA Reference Manual is used as the primary source of ISA information, though an effort is made to make the output of the code generator match the output of the NVIDIA nvcc compiler whenever possible.

    + +

    Code Generator Options:

    Option              Description
    double              If enabled, the map_f64_to_f32 directive is disabled in the PTX output, allowing native double-precision arithmetic
    no-fma              Disable generation of fused multiply-add instructions, which may be beneficial for some devices
    smxy / computexy    Set shader model/compute capability to x.y, e.g. sm20 or compute13
    + +

    Working:

    + + +

    In Progress:

    + + + +
    +
@@ -2806,7 +2988,7 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
    Chris Lattner
    The LLVM Compiler Infrastructure
    - Last modified: $Date: 2011-05-23 00:28:47 +0200 (Mon, 23 May 2011) $
    + Last modified: $Date: 2011-09-19 20:15:46 +0200 (Mon, 19 Sep 2011) $
-- cgit v1.1