From 7b3392326c40c3c20697816acae597ba7b3144eb Mon Sep 17 00:00:00 2001
From: dim
The type of register allocator used in llc can be chosen with the
@@ -1805,7 +1812,121 @@ $ llc -regalloc=pbqp file.bc -o pbqp.s;
To Be Written
+ Throwing an exception requires unwinding out of a function. The
+ information on how to unwind a given function is traditionally expressed in
+ DWARF unwind (a.k.a. frame) info. But that format was originally developed
+ for debuggers to backtrace, and each Frame Description Entry (FDE) requires
+ ~20-30 bytes per function. There is also the cost of mapping from an address
+ in a function to the corresponding FDE at runtime. An alternative unwind
+ encoding is called compact unwind and requires just 4 bytes per
+ function.
+
+ The compact unwind encoding is a 32-bit value, which is encoded in an
+ architecture-specific way. It specifies which registers to restore and from
+ where, and how to unwind out of the function. When the linker creates a final
+ linked image, it will create a __TEXT,__unwind_info
+ section. This section is a small and fast way for the runtime to access
+ unwind info for any given function. If we emit compact unwind info for the
+ function, that compact unwind info will be encoded in
+ the __TEXT,__unwind_info section. If we emit DWARF unwind info,
+ the __TEXT,__unwind_info section will contain the offset of the
+ FDE in the __TEXT,__eh_frame section in the final linked
+ image.
+
+ For X86, there are three modes for the compact unwind encoding:
+
+ Function with a Frame Pointer (EBP or RBP)
+ EBP/RBP-based frame, where EBP/RBP is pushed
+ onto the stack immediately after the return address,
+ then ESP/RSP is moved to EBP/RBP. Thus to
+ unwind, ESP/RSP is restored with the
+ current EBP/RBP value, then EBP/RBP is restored
+ by popping the stack, and the return is done by popping the stack once
+ more into the PC. All non-volatile registers that need to be restored must
+ have been saved in a small range on the stack that
+ spans EBP-4 to EBP-1020 (RBP-8
+ to RBP-2040). The offset (divided by 4 in 32-bit mode and 8
+ in 64-bit mode) is encoded in bits 16-23 (mask: 0x00FF0000).
+ The registers saved are encoded in bits 0-14
+ (mask: 0x00007FFF) as five 3-bit entries from the following
+ table:
+ Compact Number | i386 Register | x86-64 Register |
+ ---|---|---|
+ 1 | EBX | RBX |
+ 2 | ECX | R12 |
+ 3 | EDX | R13 |
+ 4 | EDI | R14 |
+ 5 | ESI | R15 |
+ 6 | EBP | RBP |
+
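The field layout of the frame-pointer mode can be sketched in C++ as follows. This is only an illustration, not the LLVM implementation: `encodeFrameFields` is a hypothetical helper, it fills in just the offset and saved-register fields (no mode bits), and it assumes that saved-register entry i occupies bits 3*i through 3*i+2, which the text above does not specify.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Field masks quoted from the description of the frame-pointer mode.
constexpr std::uint32_t kFrameOffsetMask = 0x00FF0000; // bits 16-23
constexpr std::uint32_t kFrameRegsMask   = 0x00007FFF; // bits 0-14

// Hypothetical helper (not an LLVM API): packs the offset field and the five
// 3-bit saved-register entries. Register numbers come from the table above;
// 0 is used here to mean "no register in this slot" (an assumption).
// Assumption: entry i is stored in bits 3*i .. 3*i+2.
inline std::uint32_t encodeFrameFields(unsigned offsetBytes, bool is64Bit,
                                       const std::array<std::uint8_t, 5> &regs) {
  const unsigned scale = is64Bit ? 8 : 4;
  assert(offsetBytes % scale == 0 && "offset must be a multiple of the slot size");
  assert(offsetBytes / scale <= 0xFF && "offset does not fit in 8 bits");

  // The offset is stored divided by 4 (32-bit) or 8 (64-bit).
  std::uint32_t enc = ((offsetBytes / scale) << 16) & kFrameOffsetMask;

  // Pack the five register entries, three bits each.
  for (unsigned i = 0; i < regs.size(); ++i)
    enc |= (static_cast<std::uint32_t>(regs[i] & 0x7) << (3 * i)) & kFrameRegsMask;
  return enc;
}
```

Under these assumptions, a 64-bit function with a 16-byte offset that saves RBX (1) and R12 (2) yields the field value 0x00020011.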
+ Frameless with a Small Constant Stack Size (EBP
+ or RBP is not used as a frame pointer)
+ To return, a constant (encoded in the compact unwind encoding) is added
+ to the ESP/RSP. Then the return is done by popping the stack
+ into the PC. All non-volatile registers that need to be restored must have
+ been saved on the stack immediately after the return address. The stack
+ size (divided by 4 in 32-bit mode and 8 in 64-bit mode) is encoded in bits
+ 16-23 (mask: 0x00FF0000). There is a maximum stack size of
+ 1024 bytes in 32-bit mode and 2048 in 64-bit mode. The number of registers
+ saved is encoded in bits 10-12 (mask: 0x00001C00). Bits 0-9
+ (mask: 0x000003FF) contain which registers were saved and
+ their order. (See
+ the encodeCompactUnwindRegistersWithoutFrame() function
+ in lib/Target/X86/X86FrameLowering.cpp for the encoding
+ algorithm.)
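As a rough illustration of the frameless small-stack layout, the following C++ sketch splits an encoding into its three fields using the masks quoted above. `FramelessFields`, `lowBit`, and `decodeFramelessSmall` are hypothetical names introduced here, not LLVM APIs. The sketch ignores any bits outside the three masks and does not decode the register permutation, for which the function referenced above remains the authority.

```cpp
#include <cstdint>

// Masks quoted from the description of the frameless small-stack mode.
constexpr std::uint32_t kStackSizeMask   = 0x00FF0000;
constexpr std::uint32_t kRegCountMask    = 0x00001C00;
constexpr std::uint32_t kPermutationMask = 0x000003FF;

// Hypothetical container for the decoded fields.
struct FramelessFields {
  unsigned stackSizeBytes; // stack size, scaled back to bytes
  unsigned regCount;       // number of saved registers
  unsigned permutation;    // raw permutation value, left undecoded
};

// Index of the lowest set bit of a non-zero mask, so shifts follow the masks.
constexpr unsigned lowBit(std::uint32_t mask) {
  return (mask & 1u) ? 0u : 1u + lowBit(mask >> 1);
}

// Hypothetical helper: extracts the three fields from a 32-bit encoding.
inline FramelessFields decodeFramelessSmall(std::uint32_t enc, bool is64Bit) {
  const unsigned scale = is64Bit ? 8 : 4; // stored size is bytes / scale
  FramelessFields f;
  f.stackSizeBytes = ((enc & kStackSizeMask) >> lowBit(kStackSizeMask)) * scale;
  f.regCount = (enc & kRegCountMask) >> lowBit(kRegCountMask);
  f.permutation = enc & kPermutationMask;
  return f;
}
```

Deriving each shift from its mask keeps the sketch tied to the mask values rather than to hand-counted bit positions.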
+ Frameless with a Large Constant Stack Size (EBP
+ or RBP is not used as a frame pointer)
+ This case is like the "Frameless with a Small Constant Stack Size"
+ case, but the stack size is too large to encode in the compact unwind
+ encoding. Instead it requires that the function contains "subl
+ $nnnnnn, %esp" in its prolog. The compact encoding contains the
+ offset to the $nnnnnn value in the function in bits 16-23
+ (mask: 0x00FF0000).
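To make the large-stack case concrete, here is a minimal C++ sketch of how a stack size could be recovered from the function body once the offset to the immediate is known. `readStackSizeImmediate` is a hypothetical helper, not part of LLVM or any unwinder. It assumes the offset is counted in bytes from the start of the function and that $nnnnnn is a 32-bit little-endian immediate, as in the x86 `subl $imm32, %esp` form.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: reads the 32-bit little-endian immediate of a
// "subl $nnnnnn, %esp" instruction located at byte offset immOffset.
// Assumption: immOffset is relative to the first byte of the function.
inline std::uint32_t readStackSizeImmediate(const std::vector<std::uint8_t> &code,
                                            std::size_t immOffset) {
  assert(immOffset + 4 <= code.size() && "immediate lies outside the function");
  std::uint32_t value = 0;
  // Assemble the four bytes, least significant byte first.
  for (unsigned i = 0; i < 4; ++i)
    value |= static_cast<std::uint32_t>(code[immOffset + i]) << (8 * i);
  return value;
}
```

For example, a prolog consisting of the bytes 81 EC 00 10 00 00 (`subl $0x1000, %esp`) has its immediate at offset 2, and the helper returns 0x1000 for that offset.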
This box indicates whether the target supports most popular inline assembly constraints and modifiers.
-X86 lacks reliable support for inline assembly
-constraints relating to the X86 floating point stack.
-
@@ -2794,6 +2912,70 @@ MOVSX32rm16 -> movsx, 32-bit register, 16-bit memory
+
+ The PTX code generator lives in the lib/Target/PTX directory. It is
+ currently a work-in-progress, but already supports most of the code
+ generation functionality needed to generate correct PTX kernels for
+ CUDA devices.
+
+ The code generator can target PTX 2.0+, and shader model 1.0+. The
+ PTX ISA Reference Manual is used as the primary source of ISA
+ information, though an effort is made to make the output of the code
+ generator match the output of the NVIDIA nvcc compiler, whenever
+ possible.
+
+ Code Generator Options:
+ Option | Description |
+ ---|---|
+ double | If enabled, the map_f64_to_f32 directive is disabled in the PTX output, allowing native double-precision arithmetic |
+ no-fma | Disable generation of Fused-Multiply Add instructions, which may be beneficial for some devices |
+ smxy / computexy | Set shader model/compute capability to x.y, e.g. sm20 or compute13 |
Working:
+In Progress:
+