summaryrefslogtreecommitdiffstats
path: root/docs/LibASTMatchersTutorial.rst
diff options
context:
space:
mode:
authordim <dim@FreeBSD.org>2013-04-08 18:45:10 +0000
committerdim <dim@FreeBSD.org>2013-04-08 18:45:10 +0000
commitc72c57c9e9b69944e3e009cd5e209634839581d3 (patch)
tree4fc2f184c499d106f29a386c452b49e5197bf63d /docs/LibASTMatchersTutorial.rst
parent5b20025c30d23d521e12c1f33ec8fa6b821952cd (diff)
downloadFreeBSD-src-c72c57c9e9b69944e3e009cd5e209634839581d3.zip
FreeBSD-src-c72c57c9e9b69944e3e009cd5e209634839581d3.tar.gz
Vendor import of clang trunk r178860:
http://llvm.org/svn/llvm-project/cfe/trunk@178860
Diffstat (limited to 'docs/LibASTMatchersTutorial.rst')
-rw-r--r--docs/LibASTMatchersTutorial.rst538
1 files changed, 538 insertions, 0 deletions
diff --git a/docs/LibASTMatchersTutorial.rst b/docs/LibASTMatchersTutorial.rst
new file mode 100644
index 0000000..ba568e3
--- /dev/null
+++ b/docs/LibASTMatchersTutorial.rst
@@ -0,0 +1,538 @@
+===============================================================
+Tutorial for building tools using LibTooling and LibASTMatchers
+===============================================================
+
+This document is intended to show how to build a useful source-to-source
+translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is
+explicitly aimed at people who are new to Clang, so all you should need
+is a working knowledge of C++ and the command line.
+
+In order to work on the compiler, you need some basic knowledge of the
+abstract syntax tree (AST). To this end, the reader is incouraged to
+skim the :doc:`Introduction to the Clang
+AST <IntroductionToTheClangAST>`
+
+Step 0: Obtaining Clang
+=======================
+
+As Clang is part of the LLVM project, you'll need to download LLVM's
+source code first. Both Clang and LLVM are maintained as Subversion
+repositories, but we'll be accessing them through the git mirror. For
+further information, see the `getting started
+guide <http://llvm.org/docs/GettingStarted.html>`_.
+
+.. code-block:: console
+
+ mkdir ~/clang-llvm && cd ~/clang-llvm
+ git clone http://llvm.org/git/llvm.git
+ cd llvm/tools
+ git clone http://llvm.org/git/clang.git
+
+Next you need to obtain the CMake build system and Ninja build tool. You
+may already have CMake installed, but current binary versions of CMake
+aren't built with Ninja support.
+
+.. code-block:: console
+
+ cd ~/clang-llvm
+ git clone https://github.com/martine/ninja.git
+ cd ninja
+ git checkout release
+ ./bootstrap.py
+ sudo cp ninja /usr/bin/
+
+ cd ~/clang-llvm
+ git clone git://cmake.org/stage/cmake.git
+ cd cmake
+ git checkout next
+ ./bootstrap
+ make
+ sudo make install
+
+Okay. Now we'll build Clang!
+
+.. code-block:: console
+
+ cd ~/clang-llvm
+ mkdir build && cd build
+ cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off.
+ ninja
+ ninja check # Test LLVM only.
+ ninja clang-test # Test Clang only.
+ ninja install
+
+And we're live.
+
+All of the tests should pass, though there is a (very) small chance that
+you can catch LLVM and Clang out of sync. Running ``'git svn rebase'``
+in both the llvm and clang directories should fix any problems.
+
+Finally, we want to set Clang as its own compiler.
+
+.. code-block:: console
+
+ cd ~/clang-llvm/build
+ ccmake ../llvm
+
+The second command will bring up a GUI for configuring Clang. You need
+to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on
+advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to
+``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to
+configure, then ``'g'`` to generate CMake's files.
+
+Finally, run ninja one last time, and you're done.
+
+Step 1: Create a ClangTool
+==========================
+
+Now that we have enough background knowledge, it's time to create the
+simplest productive ClangTool in existence: a syntax checker. While this
+already exists as ``clang-check``, it's important to understand what's
+going on.
+
+First, we'll need to create a new directory for our tool and tell CMake
+that it exists. As this is not going to be a core clang tool, it will
+live in the ``tools/extra`` repository.
+
+.. code-block:: console
+
+ cd ~/clang-llvm/llvm/tools/clang
+ mkdir tools/extra/loop-convert
+ echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt
+ vim tools/extra/loop-convert/CMakeLists.txt
+
+CMakeLists.txt should have the following contents:
+
+::
+
+ set(LLVM_LINK_COMPONENTS support)
+ set(LLVM_USED_LIBS clangTooling clangBasic clangAST)
+
+ add_clang_executable(loop-convert
+ LoopConvert.cpp
+ )
+ target_link_libraries(loop-convert
+ clangTooling
+ clangBasic
+ clangASTMatchers
+ )
+
+With that done, Ninja will be able to compile our tool. Let's give it
+something to compile! Put the following into
+``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of
+why the different parts are needed can be found in the `LibTooling
+documentation <LibTooling.html>`_.
+
+.. code-block:: c++
+
+ // Declares clang::SyntaxOnlyAction.
+ #include "clang/Frontend/FrontendActions.h"
+ #include "clang/Tooling/CommonOptionsParser.h"
+ #include "clang/Tooling/Tooling.h"
+ // Declares llvm::cl::extrahelp.
+ #include "llvm/Support/CommandLine.h"
+
+ using namespace clang::tooling;
+ using namespace llvm;
+
+ // CommonOptionsParser declares HelpMessage with a description of the common
+ // command-line options related to the compilation database and input files.
+ // It's nice to have this help message in all tools.
+ static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
+
+ // A help message for this specific tool can be added afterwards.
+ static cl::extrahelp MoreHelp("\nMore help text...");
+
+ int main(int argc, const char **argv) {
+ CommonOptionsParser OptionsParser(argc, argv);
+ ClangTool Tool(OptionsParser.getCompilations(),
+ OptionsParser.getSourcePathList());
+ return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
+ }
+
+And that's it! You can compile our new tool by running ninja from the
+``build`` directory.
+
+.. code-block:: console
+
+ cd ~/clang-llvm/build
+ ninja
+
+You should now be able to run the syntax checker, which is located in
+``~/clang-llvm/build/bin``, on any source file. Try it!
+
+.. code-block:: console
+
+ cat "void main() {}" > test.cpp
+ bin/loop-convert test.cpp --
+
+Note the two dashes after we specify the source file. The additional
+options for the compiler are passed after the dashes rather than loading
+them from a compilation database - there just aren't any options needed
+right now.
+
+Intermezzo: Learn AST matcher basics
+====================================
+
+Clang recently introduced the :doc:`ASTMatcher
+library <LibASTMatchers>` to provide a simple, powerful, and
+concise way to describe specific patterns in the AST. Implemented as a
+DSL powered by macros and templates (see
+`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're
+curious), matchers offer the feel of algebraic data types common to
+functional programming languages.
+
+For example, suppose you wanted to examine only binary operators. There
+is a matcher to do exactly that, conveniently named ``binaryOperator``.
+I'll give you one guess what this matcher does:
+
+.. code-block:: c++
+
+ binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0))))
+
+Shockingly, it will match against addition expressions whose left hand
+side is exactly the literal 0. It will not match against other forms of
+0, such as ``'\0'`` or ``NULL``, but it will match against macros that
+expand to 0. The matcher will also not match against calls to the
+overloaded operator ``'+'``, as there is a separate ``operatorCallExpr``
+matcher to handle overloaded operators.
+
+There are AST matchers to match all the different nodes of the AST,
+narrowing matchers to only match AST nodes fulfilling specific criteria,
+and traversal matchers to get from one kind of AST node to another. For
+a complete list of AST matchers, take a look at the `AST Matcher
+References <LibASTMatchersReference.html>`_
+
+All matcher that are nouns describe entities in the AST and can be
+bound, so that they can be referred to whenever a match is found. To do
+so, simply call the method ``bind`` on these matchers, e.g.:
+
+.. code-block:: c++
+
+ variable(hasType(isInteger())).bind("intvar")
+
+Step 2: Using AST matchers
+==========================
+
+Okay, on to using matchers for real. Let's start by defining a matcher
+which will capture all ``for`` statements that define a new variable
+initialized to zero. Let's start with matching all ``for`` loops:
+
+.. code-block:: c++
+
+ forStmt()
+
+Next, we want to specify that a single variable is declared in the first
+portion of the loop, so we can extend the matcher to
+
+.. code-block:: c++
+
+ forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl()))))
+
+Finally, we can add the condition that the variable is initialized to
+zero.
+
+.. code-block:: c++
+
+ forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
+ hasInitializer(integerLiteral(equals(0))))))))
+
+It is fairly easy to read and understand the matcher definition ("match
+loops whose init portion declares a single variable which is initialized
+to the integer literal 0"), but deciding that every piece is necessary
+is more difficult. Note that this matcher will not match loops whose
+variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of
+zero besides the integer 0.
+
+The last step is giving the matcher a name and binding the ``ForStmt``
+as we will want to do something with it:
+
+.. code-block:: c++
+
+ StatementMatcher LoopMatcher =
+ forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
+ hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
+
+Once you have defined your matchers, you will need to add a little more
+scaffolding in order to run them. Matchers are paired with a
+``MatchCallback`` and registered with a ``MatchFinder`` object, then run
+from a ``ClangTool``. More code!
+
+Add the following to ``LoopConvert.cpp``:
+
+.. code-block:: c++
+
+ #include "clang/ASTMatchers/ASTMatchers.h"
+ #include "clang/ASTMatchers/ASTMatchFinder.h"
+
+ using namespace clang;
+ using namespace clang::ast_matchers;
+
+ StatementMatcher LoopMatcher =
+ forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl(
+ hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop");
+
+ class LoopPrinter : public MatchFinder::MatchCallback {
+ public :
+ virtual void run(const MatchFinder::MatchResult &Result) {
+ if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop"))
+ FS->dump();
+ };
+
+And change ``main()`` to:
+
+.. code-block:: c++
+
+ int main(int argc, const char **argv) {
+ CommonOptionsParser OptionsParser(argc, argv);
+ ClangTool Tool(OptionsParser.getCompilations(),
+ OptionsParser.getSourcePathList());
+
+ LoopPrinter Printer;
+ MatchFinder Finder;
+ Finder.addMatcher(LoopMatcher, &Printer);
+
+ return Tool.run(newFrontendActionFactory(&Finder));
+ }
+
+Now, you should be able to recompile and run the code to discover for
+loops. Create a new file with a few examples, and test out our new
+handiwork:
+
+.. code-block:: console
+
+ cd ~/clang-llvm/llvm/llvm_build/
+ ninja loop-convert
+ vim ~/test-files/simple-loops.cc
+ bin/loop-convert ~/test-files/simple-loops.cc
+
+Step 3.5: More Complicated Matchers
+===================================
+
+Our simple matcher is capable of discovering for loops, but we would
+still need to filter out many more ourselves. We can do a good portion
+of the remaining work with some cleverly chosen matchers, but first we
+need to decide exactly which properties we want to allow.
+
+How can we characterize for loops over arrays which would be eligible
+for translation to range-based syntax? Range based loops over arrays of
+size ``N`` that:
+
+- start at index ``0``
+- iterate consecutively
+- end at index ``N-1``
+
+We already check for (1), so all we need to add is a check to the loop's
+condition to ensure that the loop's index variable is compared against
+``N`` and another check to ensure that the increment step just
+increments this same variable. The matcher for (2) is straightforward:
+require a pre- or post-increment of the same variable declared in the
+init portion.
+
+Unfortunately, such a matcher is impossible to write. Matchers contain
+no logic for comparing two arbitrary AST nodes and determining whether
+or not they are equal, so the best we can do is matching more than we
+would like to allow, and punting extra comparisons to the callback.
+
+In any case, we can start building this sub-matcher. We can require that
+the increment step be a unary increment like this:
+
+.. code-block:: c++
+
+ hasIncrement(unaryOperator(hasOperatorName("++")))
+
+Specifying what is incremented introduces another quirk of Clang's AST:
+Usages of variables are represented as ``DeclRefExpr``'s ("declaration
+reference expressions") because they are expressions which refer to
+variable declarations. To find a ``unaryOperator`` that refers to a
+specific declaration, we can simply add a second condition to it:
+
+.. code-block:: c++
+
+ hasIncrement(unaryOperator(
+ hasOperatorName("++"),
+ hasUnaryOperand(declRefExpr())))
+
+Furthermore, we can restrict our matcher to only match if the
+incremented variable is an integer:
+
+.. code-block:: c++
+
+ hasIncrement(unaryOperator(
+ hasOperatorName("++"),
+ hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger())))))))
+
+And the last step will be to attach an identifier to this variable, so
+that we can retrieve it in the callback:
+
+.. code-block:: c++
+
+ hasIncrement(unaryOperator(
+ hasOperatorName("++"),
+ hasUnaryOperand(declRefExpr(to(
+ varDecl(hasType(isInteger())).bind("incrementVariable"))))))
+
+We can add this code to the definition of ``LoopMatcher`` and make sure
+that our program, outfitted with the new matcher, only prints out loops
+that declare a single variable initialized to zero and have an increment
+step consisting of a unary increment of some variable.
+
+Now, we just need to add a matcher to check if the condition part of the
+``for`` loop compares a variable against the size of the array. There is
+only one problem - we don't know which array we're iterating over
+without looking at the body of the loop! We are again restricted to
+approximating the result we want with matchers, filling in the details
+in the callback. So we start with:
+
+.. code-block:: c++
+
+ hasCondition(binaryOperator(hasOperatorName("<"))
+
+It makes sense to ensure that the left-hand side is a reference to a
+variable, and that the right-hand side has integer type.
+
+.. code-block:: c++
+
+ hasCondition(binaryOperator(
+ hasOperatorName("<"),
+ hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))),
+ hasRHS(expr(hasType(isInteger())))))
+
+Why? Because it doesn't work. Of the three loops provided in
+``test-files/simple.cpp``, zero of them have a matching condition. A
+quick look at the AST dump of the first for loop, produced by the
+previous iteration of loop-convert, shows us the answer:
+
+::
+
+ (ForStmt 0x173b240
+ (DeclStmt 0x173afc8
+ 0x173af50 "int i =
+ (IntegerLiteral 0x173afa8 'int' 0)")
+ <<>>
+ (BinaryOperator 0x173b060 '_Bool' '<'
+ (ImplicitCastExpr 0x173b030 'int'
+ (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int'))
+ (ImplicitCastExpr 0x173b048 'int'
+ (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int')))
+ (UnaryOperator 0x173b0b0 'int' lvalue prefix '++'
+ (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int'))
+ (CompoundStatement …
+
+We already know that the declaration and increments both match, or this
+loop wouldn't have been dumped. The culprit lies in the implicit cast
+applied to the first operand (i.e. the LHS) of the less-than operator,
+an L-value to R-value conversion applied to the expression referencing
+``i``. Thankfully, the matcher library offers a solution to this problem
+in the form of ``ignoringParenImpCasts``, which instructs the matcher to
+ignore implicit casts and parentheses before continuing to match.
+Adjusting the condition operator will restore the desired match.
+
+.. code-block:: c++
+
+ hasCondition(binaryOperator(
+ hasOperatorName("<"),
+ hasLHS(ignoringParenImpCasts(declRefExpr(
+ to(varDecl(hasType(isInteger())))))),
+ hasRHS(expr(hasType(isInteger())))))
+
+After adding binds to the expressions we wished to capture and
+extracting the identifier strings into variables, we have array-step-2
+completed.
+
+Step 4: Retrieving Matched Nodes
+================================
+
+So far, the matcher callback isn't very interesting: it just dumps the
+loop's AST. At some point, we will need to make changes to the input
+source code. Next, we'll work on using the nodes we bound in the
+previous step.
+
+The ``MatchFinder::run()`` callback takes a
+``MatchFinder::MatchResult&`` as its parameter. We're most interested in
+its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext``
+class to represent contextual information about the AST, as the name
+implies, though the most functionally important detail is that several
+operations require an ``ASTContext*`` parameter. More immediately useful
+is the set of matched nodes, and how we retrieve them.
+
+Since we bind three variables (identified by ConditionVarName,
+InitVarName, and IncrementVarName), we can obtain the matched nodes by
+using the ``getNodeAs()`` member function.
+
+In ``LoopActions.cpp``:
+
+.. code-block:: c++
+
+ #include "clang/AST/ASTContext.h"
+
+ void LoopPrinter::run(const MatchFinder::MatchResult &Result) {
+ ASTContext *Context = Result.Context;
+ const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>(LoopName);
+ // We do not want to convert header files!
+ if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc()))
+ return;
+ const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>(IncrementVarName);
+ const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>(ConditionVarName);
+ const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>(InitVarName);
+
+Now that we have the three variables, represented by their respective
+declarations, let's make sure that they're all the same, using a helper
+function I call ``areSameVariable()``.
+
+.. code-block:: c++
+
+ if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar))
+ return;
+ llvm::outs() << "Potential array-based loop discovered.\n";
+ }
+
+If execution reaches the end of ``LoopPrinter::run()``, we know that the
+loop shell that looks like
+
+.. code-block:: c++
+
+ for (int i= 0; i < expr(); ++i) { ... }
+
+For now, we will just print a message explaining that we found a loop.
+The next section will deal with recursively traversing the AST to
+discover all changes needed.
+
+As a side note, here is the implementation of ``areSameVariable``. Clang
+associates a ``VarDecl`` with each variable to represent the variable's
+declaration. Since the "canonical" form of each declaration is unique by
+address, all we need to do is make sure neither ``ValueDecl`` (base
+class of ``VarDecl``) is ``NULL`` and compare the canonical Decls.
+
+.. code-block:: c++
+
+ static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) {
+ return First && Second &&
+ First->getCanonicalDecl() == Second->getCanonicalDecl();
+ }
+
+It's not as trivial to test if two expressions are the same, though
+Clang has already done the hard work for us by providing a way to
+canonicalize expressions:
+
+.. code-block:: c++
+
+ static bool areSameExpr(ASTContext *Context, const Expr *First,
+ const Expr *Second) {
+ if (!First || !Second)
+ return false;
+ llvm::FoldingSetNodeID FirstID, SecondID;
+ First->Profile(FirstID, *Context, true);
+ Second->Profile(SecondID, *Context, true);
+ return FirstID == SecondID;
+ }
+
+This code relies on the comparison between two
+``llvm::FoldingSetNodeIDs``. As the documentation for
+``Stmt::Profile()`` indicates, the ``Profile()`` member function builds
+a description of a node in the AST, based on its properties, along with
+those of its children. ``FoldingSetNodeID`` then serves as a hash we can
+use to compare expressions. We will need ``areSameExpr`` later. Before
+you run the new code on the additional loops added to
+test-files/simple.cpp, try to figure out which ones will be considered
+potentially convertible.
OpenPOWER on IntegriCloud