Diffstat (limited to 'docs/DataFlowSanitizer.rst')
-rw-r--r-- | docs/DataFlowSanitizer.rst | 158 |
1 files changed, 158 insertions, 0 deletions
diff --git a/docs/DataFlowSanitizer.rst b/docs/DataFlowSanitizer.rst
new file mode 100644
index 0000000..e0e9d74
--- /dev/null
+++ b/docs/DataFlowSanitizer.rst
@@ -0,0 +1,158 @@
+=================
+DataFlowSanitizer
+=================
+
+.. toctree::
+   :hidden:
+
+   DataFlowSanitizerDesign
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+DataFlowSanitizer is a generalised dynamic data flow analysis.
+
+Unlike other Sanitizer tools, this tool is not designed to detect a
+specific class of bugs on its own. Instead, it provides a generic
+dynamic data flow analysis framework to be used by clients to help
+detect application-specific issues within their own code.
+
+Usage
+=====
+
+With no program changes, applying DataFlowSanitizer to a program
+will not alter its behavior. To use DataFlowSanitizer, the program
+uses API functions to apply tags to data to cause it to be tracked, and to
+check the tag of a specific data item. DataFlowSanitizer manages
+the propagation of tags through the program according to its data flow.
+
+The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
+For further information about each function, please refer to the header
+file.
+
+ABI List
+--------
+
+DataFlowSanitizer uses a list of functions known as an ABI list to decide
+whether a call to a specific function should use the operating system's native
+ABI or whether it should use a variant of this ABI that also propagates labels
+through function parameters and return values. The ABI list file also controls
+how labels are propagated in the former case. DataFlowSanitizer comes with a
+default ABI list which is intended to eventually cover the glibc library on
+Linux but it may become necessary for users to extend the ABI list in cases
+where a particular library or function cannot be instrumented (e.g. because
+it is implemented in assembly or another language which DataFlowSanitizer does
+not support) or a function is called from a library or function which cannot
+be instrumented.
+
+DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
+The pass treats every function in the ``uninstrumented`` category in the
+ABI list file as conforming to the native ABI. Unless the ABI list contains
+additional categories for those functions, a call to one of those functions
+will produce a warning message, as the labelling behavior of the function
+is unknown. The other supported categories are ``discard``, ``functional``
+and ``custom``.
+
+* ``discard`` -- To the extent that this function writes to (user-accessible)
+  memory, it also updates labels in shadow memory (this condition is trivially
+  satisfied for functions which do not write to user-accessible memory). Its
+  return value is unlabelled.
+* ``functional`` -- Like ``discard``, except that the label of its return value
+  is the union of the label of its arguments.
+* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
+  is called, where ``F`` is the name of the function. This function may wrap
+  the original function or provide its own implementation. This category is
+  generally used for uninstrumentable functions which write to user-accessible
+  memory or which have more complex label propagation behavior. The signature
+  of ``__dfsw_F`` is based on that of ``F`` with each argument having a
+  label of type ``dfsan_label`` appended to the argument list. If ``F``
+  is of non-void return type a final argument of type ``dfsan_label *``
+  is appended to which the custom function can store the label for the
+  return value. For example:
+
+.. code-block:: c++
+
+  void f(int x);
+  void __dfsw_f(int x, dfsan_label x_label);
+
+  void *memcpy(void *dest, const void *src, size_t n);
+  void *__dfsw_memcpy(void *dest, const void *src, size_t n,
+                      dfsan_label dest_label, dfsan_label src_label,
+                      dfsan_label n_label, dfsan_label *ret_label);
+
+If a function defined in the translation unit being compiled belongs to the
+``uninstrumented`` category, it will be compiled so as to conform to the
+native ABI. Its arguments will be assumed to be unlabelled, but it will
+propagate labels in shadow memory.
+
+For example:
+
+.. code-block:: none
+
+  # main is called by the C runtime using the native ABI.
+  fun:main=uninstrumented
+  fun:main=discard
+
+  # malloc only writes to its internal data structures, not user-accessible memory.
+  fun:malloc=uninstrumented
+  fun:malloc=discard
+
+  # tolower is a pure function.
+  fun:tolower=uninstrumented
+  fun:tolower=functional
+
+  # memcpy needs to copy the shadow from the source to the destination region.
+  # This is done in a custom function.
+  fun:memcpy=uninstrumented
+  fun:memcpy=custom
+
+Example
+=======
+
+The following program demonstrates label propagation by checking that
+the correct labels are propagated.
+
+.. code-block:: c++
+
+  #include <sanitizer/dfsan_interface.h>
+  #include <assert.h>
+
+  int main(void) {
+    int i = 1;
+    dfsan_label i_label = dfsan_create_label("i", 0);
+    dfsan_set_label(i_label, &i, sizeof(i));
+
+    int j = 2;
+    dfsan_label j_label = dfsan_create_label("j", 0);
+    dfsan_set_label(j_label, &j, sizeof(j));
+
+    int k = 3;
+    dfsan_label k_label = dfsan_create_label("k", 0);
+    dfsan_set_label(k_label, &k, sizeof(k));
+
+    dfsan_label ij_label = dfsan_get_label(i + j);
+    assert(dfsan_has_label(ij_label, i_label));
+    assert(dfsan_has_label(ij_label, j_label));
+    assert(!dfsan_has_label(ij_label, k_label));
+
+    dfsan_label ijk_label = dfsan_get_label(i + j + k);
+    assert(dfsan_has_label(ijk_label, i_label));
+    assert(dfsan_has_label(ijk_label, j_label));
+    assert(dfsan_has_label(ijk_label, k_label));
+
+    return 0;
+  }
+
+Current status
+==============
+
+DataFlowSanitizer is a work in progress, currently under development for
+x86\_64 Linux.
+
+Design
+======
+
+Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.