Diffstat (limited to 'docs/ControlFlowIntegrityDesign.rst')
-rw-r--r-- | docs/ControlFlowIntegrityDesign.rst | 224 |
1 files changed, 224 insertions, 0 deletions
diff --git a/docs/ControlFlowIntegrityDesign.rst b/docs/ControlFlowIntegrityDesign.rst
index 89aa038..b4aacd3 100644
--- a/docs/ControlFlowIntegrityDesign.rst
+++ b/docs/ControlFlowIntegrityDesign.rst
@@ -273,3 +273,227 @@ Eliminating Bit Vector Checks for All-Ones Bit Vectors
 If the bit vector is all ones, the bit vector check is redundant; we simply
 need to check that the address is in range and well aligned. This is more
 likely to occur if the virtual tables are padded.
+
+Forward-Edge CFI for Indirect Function Calls
+============================================
+
+Under forward-edge CFI for indirect function calls, each unique function
+type has its own bit vector, and at each call site we need to check that the
+function pointer is a member of the function type's bit vector. This scheme
+works in a similar way to forward-edge CFI for virtual calls, the distinction
+being that we need to build bit vectors of function entry points rather than
+of virtual tables.
+
+Unlike when re-arranging global variables, we cannot re-arrange functions
+in a particular order and base our calculations on the layout of the
+functions' entry points, as we have no idea how large a particular function
+will end up being (the function sizes could even depend on how we arrange
+the functions). Instead, we build a jump table, which is a block of code
+consisting of one branch instruction for each of the functions in the bit
+set that branches to the target function, and redirect any taken function
+addresses to the corresponding jump table entry. In this way, the distance
+between function entry points is predictable and controllable. In the object
+file's symbol table, the symbols for the target functions also refer to the
+jump table entries, so that addresses taken outside the module will pass
+any verification done inside the module.
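To illustrate why predictable spacing matters, here is a minimal Python sketch of a membership test over a jump table whose entries are evenly spaced. It is a model only, not the code the compiler emits: the function name ``in_jump_table``, the base address, and the 8-byte entry size are assumptions chosen for illustration.

```python
# Illustrative model only: the name, base address and entry size below are
# assumptions, not values taken from the implementation.
ENTRY_SIZE = 8  # assumed power-of-two size of one jump table entry


def in_jump_table(addr, base, count, entry_size=ENTRY_SIZE):
    """Return True if addr is the start of one of `count` jump table entries.

    Because every entry has the same size, membership reduces to a range
    check plus an alignment check on the offset from the table base.
    """
    offset = addr - base
    in_range = 0 <= offset < count * entry_size
    aligned = offset % entry_size == 0
    return in_range and aligned


if __name__ == "__main__":
    base = 0x1000  # hypothetical address of the first entry
    for addr in (0x1000, 0x1008, 0x1010, 0x1004, 0x1018):
        print(hex(addr), in_jump_table(addr, base, count=3))
```

With three entries starting at the hypothetical base ``0x1000``, only ``0x1000``, ``0x1008`` and ``0x1010`` are accepted; an address in the middle of an entry or past the end of the table is rejected.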
+
+In more concrete terms, suppose we have three functions ``f``, ``g``, ``h``
+which are members of a single bitset, and a function ``foo`` that returns
+their addresses:
+
+.. code-block:: none
+
+  f:
+    mov 0, %eax
+    ret
+
+  g:
+    mov 1, %eax
+    ret
+
+  h:
+    mov 2, %eax
+    ret
+
+  foo:
+    mov f, %eax
+    mov g, %edx
+    mov h, %ecx
+    ret
+
+Our jump table will (conceptually) look like this:
+
+.. code-block:: none
+
+  f:
+    jmp .Ltmp0 ; 5 bytes
+    int3       ; 1 byte
+    int3       ; 1 byte
+    int3       ; 1 byte
+
+  g:
+    jmp .Ltmp1 ; 5 bytes
+    int3       ; 1 byte
+    int3       ; 1 byte
+    int3       ; 1 byte
+
+  h:
+    jmp .Ltmp2 ; 5 bytes
+    int3       ; 1 byte
+    int3       ; 1 byte
+    int3       ; 1 byte
+
+  .Ltmp0:
+    mov 0, %eax
+    ret
+
+  .Ltmp1:
+    mov 1, %eax
+    ret
+
+  .Ltmp2:
+    mov 2, %eax
+    ret
+
+  foo:
+    mov f, %eax
+    mov g, %edx
+    mov h, %ecx
+    ret
+
+Because the addresses of ``f``, ``g``, ``h`` are evenly spaced at a power of
+2, and function types do not overlap (unlike class types with base classes),
+we can normally apply the `Alignment`_ and `Eliminating Bit Vector Checks
+for All-Ones Bit Vectors`_ optimizations, thus simplifying the check at each
+call site to a range and alignment check.
+
+Shared library support
+======================
+
+**EXPERIMENTAL**
+
+The basic CFI mode described above assumes that the application is a
+monolithic binary, or at least that all possible virtual/indirect call
+targets and the entire class hierarchy are known at link time. The
+cross-DSO mode, enabled with **-f[no-]sanitize-cfi-cross-dso**, relaxes
+this requirement by allowing virtual and indirect calls to cross the
+DSO boundary.
+
+Assume the following setup: the binary consists of several
+instrumented and several uninstrumented DSOs. Some of them may be
+dlopen-ed/dlclose-d periodically, even frequently.
+
+  - Calls made from uninstrumented DSOs are not checked and just work.
+  - Calls inside any instrumented DSO are fully protected.
+  - Calls between different instrumented DSOs are also protected, with
+    a performance penalty (in addition to the monolithic CFI
+    overhead).
+  - Calls from an instrumented DSO to an uninstrumented one are
+    unchecked and just work, with a performance penalty.
+  - Calls from an instrumented DSO to a target outside of any known DSO
+    are detected as CFI violations.
+
+In the monolithic scheme a call site is instrumented as
+
+.. code-block:: none
+
+   if (!InlinedFastCheck(f))
+     abort();
+   call *f
+
+In the cross-DSO scheme it becomes
+
+.. code-block:: none
+
+   if (!InlinedFastCheck(f))
+     __cfi_slowpath(CallSiteTypeId, f);
+   call *f
+
+CallSiteTypeId
+--------------
+
+``CallSiteTypeId`` is a stable process-wide identifier of the
+call-site type. For a virtual call site, the type in question is the class
+type; for an indirect function call it is the function signature. The
+mapping from a type to an identifier is an ABI detail. In the current,
+experimental, implementation the identifier of type T is calculated as
+follows:
+
+  - Obtain the mangled name for "typeinfo name for T".
+  - Calculate the MD5 hash of the name as a string.
+  - Reinterpret the first 8 bytes of the hash as a little-endian
+    64-bit integer.
+
+It is possible, but unlikely, that collisions in the
+``CallSiteTypeId`` hashing will result in weaker CFI checks that would
+still be conservatively correct.
+
+CFI_Check
+---------
+
+In the general case, only the target DSO knows whether the call to
+function ``f`` with type ``CallSiteTypeId`` is valid or not. To
+export this information, every DSO implements
+
+.. code-block:: none
+
+   void __cfi_check(uint64 CallSiteTypeId, void *TargetAddr)
+
+This function provides external modules with access to CFI checks for
+the targets inside this DSO. For each known ``CallSiteTypeId``, this
+function performs an ``llvm.bitset.test`` with the corresponding bit
+set. It aborts if the type is unknown, or if the check fails.
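The identifier scheme from the CallSiteTypeId section and the per-DSO check described above can be modelled together in a few lines of Python. This is a sketch under stated assumptions, not the implementation: ``call_site_type_id``, ``make_cfi_check`` and ``CfiViolation`` are hypothetical names, the mangled name ``_ZTSFvvE`` is assumed to be the Itanium-ABI "typeinfo name" for the function type ``void()``, and an exception stands in for ``abort()``.

```python
import hashlib


class CfiViolation(Exception):
    """Stands in for abort() in this model (an assumption of the sketch)."""


def call_site_type_id(mangled_typeinfo_name):
    """MD5 the mangled name; read the first 8 bytes as little-endian u64."""
    digest = hashlib.md5(mangled_typeinfo_name.encode("ascii")).digest()
    return int.from_bytes(digest[:8], "little")


def make_cfi_check(valid_targets):
    """Build a model of one DSO's __cfi_check.

    valid_targets maps each type id known to the DSO to the set of
    addresses that are valid call targets for that type.
    """
    def cfi_check(type_id, target_addr):
        targets = valid_targets.get(type_id)
        if targets is None:
            raise CfiViolation("unknown CallSiteTypeId")
        if target_addr not in targets:
            raise CfiViolation("target is not valid for this type")
    return cfi_check


if __name__ == "__main__":
    # "_ZTSFvvE" is assumed to be the typeinfo name for void().
    void_fn = call_site_type_id("_ZTSFvvE")
    check = make_cfi_check({void_fn: {0x1000, 0x1008}})
    check(void_fn, 0x1000)  # valid target: returns normally
    try:
        check(void_fn, 0x2000)
    except CfiViolation as err:
        print("rejected:", err)
```

A set lookup replaces ``llvm.bitset.test`` here purely for readability; the real check operates on a bit vector as described in the earlier sections.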
+
+The basic implementation is a large switch statement over all values
+of ``CallSiteTypeId`` supported by this DSO, and each case is similar to
+the ``InlinedFastCheck()`` in the basic CFI mode.
+
+CFI Shadow
+----------
+
+To route CFI checks to the target DSO's ``__cfi_check`` function, a
+mapping from possible virtual / indirect call targets to
+the corresponding ``__cfi_check`` functions is maintained. This mapping is
+implemented as a sparse array of 2 bytes for every possible page (4096
+bytes) of memory. The table is kept readonly (FIXME: not yet) most of
+the time.
+
+There are three types of shadow values:
+
+  - Address in a CFI-instrumented DSO.
+  - Unchecked address (a “trusted” non-instrumented DSO). Encoded as
+    value 0xFFFF.
+  - Invalid address (everything else). Encoded as value 0.
+
+For a CFI-instrumented DSO, a shadow value encodes the address of the
+``__cfi_check`` function for all call targets in the corresponding memory
+page. If Addr is the target address, and V is the shadow value, then
+the address of ``__cfi_check`` is calculated as
+
+.. code-block:: none
+
+  __cfi_check = AlignUpTo(Addr, 4096) - (V + 1) * 4096
+
+This works as long as ``__cfi_check`` is aligned by 4096 bytes and located
+below any call targets in its DSO, but not more than 256MB apart from
+them.
+
+CFI_SlowPath
+------------
+
+The slow path check is implemented in the compiler-rt library as
+
+.. code-block:: none
+
+  void __cfi_slowpath(uint64 CallSiteTypeId, void *TargetAddr)
+
+This function loads a shadow value for ``TargetAddr``, finds the
+address of ``__cfi_check`` as described above and calls that.
+
+Position-independent executable requirement
+-------------------------------------------
+
+Cross-DSO CFI mode requires that the main executable is built as PIE.
+In non-PIE executables the address of an external function (taken from
+the main executable) is the address of that function’s PLT record in
+the main executable. This would break the CFI checks.
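The shadow lookup from the CFI Shadow and CFI_SlowPath sections can be sketched in Python as follows. This is a model under stated assumptions rather than the compiler-rt code: the shadow is represented as a dictionary keyed by page number, ``checks`` maps a ``__cfi_check`` address to a Python callable, all names are hypothetical, and an exception stands in for reporting a violation. The address arithmetic follows the formula exactly as written in the text.

```python
PAGE_SIZE = 4096
SHADOW_INVALID = 0x0000    # everything else
SHADOW_UNCHECKED = 0xFFFF  # trusted, non-instrumented DSO


class CfiViolation(Exception):
    """Stands in for a reported CFI violation (an assumption of the sketch)."""


def align_up_to(addr, alignment=PAGE_SIZE):
    """Round addr up to the next multiple of a power-of-two alignment."""
    return (addr + alignment - 1) & ~(alignment - 1)


def cfi_check_address(target_addr, shadow_value):
    """Apply the formula from the text to recover the __cfi_check address."""
    return align_up_to(target_addr) - (shadow_value + 1) * PAGE_SIZE


def cfi_slowpath(shadow, checks, type_id, target_addr):
    """Model of __cfi_slowpath: route the check to the owning DSO.

    shadow: dict from page number to 16-bit shadow value (sparse array model)
    checks: dict from __cfi_check address to a callable (hypothetical)
    """
    value = shadow.get(target_addr // PAGE_SIZE, SHADOW_INVALID)
    if value == SHADOW_UNCHECKED:
        return  # unchecked address: the call just works
    if value == SHADOW_INVALID:
        raise CfiViolation("target is outside of any known DSO")
    checks[cfi_check_address(target_addr, value)](type_id, target_addr)
```

For example, with a hypothetical target at ``0x401234`` and a shadow value of 1, the formula yields ``0x402000 - 2 * 4096 = 0x400000``. The largest encodable distance is ``(0xFFFE + 1) * 4096`` bytes, which is just under 256MB and matches the limit stated in the CFI Shadow section.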