diff options
author | dim <dim@FreeBSD.org> | 2013-04-08 18:45:10 +0000 |
---|---|---|
committer | dim <dim@FreeBSD.org> | 2013-04-08 18:45:10 +0000 |
commit | c72c57c9e9b69944e3e009cd5e209634839581d3 (patch) | |
tree | 4fc2f184c499d106f29a386c452b49e5197bf63d /docs/LibASTMatchersTutorial.rst | |
parent | 5b20025c30d23d521e12c1f33ec8fa6b821952cd (diff) | |
download | FreeBSD-src-c72c57c9e9b69944e3e009cd5e209634839581d3.zip FreeBSD-src-c72c57c9e9b69944e3e009cd5e209634839581d3.tar.gz |
Vendor import of clang trunk r178860:
http://llvm.org/svn/llvm-project/cfe/trunk@178860
Diffstat (limited to 'docs/LibASTMatchersTutorial.rst')
-rw-r--r-- | docs/LibASTMatchersTutorial.rst | 538 |
1 files changed, 538 insertions, 0 deletions
diff --git a/docs/LibASTMatchersTutorial.rst b/docs/LibASTMatchersTutorial.rst new file mode 100644 index 0000000..ba568e3 --- /dev/null +++ b/docs/LibASTMatchersTutorial.rst @@ -0,0 +1,538 @@ +=============================================================== +Tutorial for building tools using LibTooling and LibASTMatchers +=============================================================== + +This document is intended to show how to build a useful source-to-source +translation tool based on Clang's `LibTooling <LibTooling.html>`_. It is +explicitly aimed at people who are new to Clang, so all you should need +is a working knowledge of C++ and the command line. + +In order to work on the compiler, you need some basic knowledge of the +abstract syntax tree (AST). To this end, the reader is incouraged to +skim the :doc:`Introduction to the Clang +AST <IntroductionToTheClangAST>` + +Step 0: Obtaining Clang +======================= + +As Clang is part of the LLVM project, you'll need to download LLVM's +source code first. Both Clang and LLVM are maintained as Subversion +repositories, but we'll be accessing them through the git mirror. For +further information, see the `getting started +guide <http://llvm.org/docs/GettingStarted.html>`_. + +.. code-block:: console + + mkdir ~/clang-llvm && cd ~/clang-llvm + git clone http://llvm.org/git/llvm.git + cd llvm/tools + git clone http://llvm.org/git/clang.git + +Next you need to obtain the CMake build system and Ninja build tool. You +may already have CMake installed, but current binary versions of CMake +aren't built with Ninja support. + +.. code-block:: console + + cd ~/clang-llvm + git clone https://github.com/martine/ninja.git + cd ninja + git checkout release + ./bootstrap.py + sudo cp ninja /usr/bin/ + + cd ~/clang-llvm + git clone git://cmake.org/stage/cmake.git + cd cmake + git checkout next + ./bootstrap + make + sudo make install + +Okay. Now we'll build Clang! + +.. code-block:: console + + cd ~/clang-llvm + mkdir build && cd build + cmake -G Ninja ../llvm -DLLVM_BUILD_TESTS=ON # Enable tests; default is off. + ninja + ninja check # Test LLVM only. + ninja clang-test # Test Clang only. + ninja install + +And we're live. + +All of the tests should pass, though there is a (very) small chance that +you can catch LLVM and Clang out of sync. Running ``'git svn rebase'`` +in both the llvm and clang directories should fix any problems. + +Finally, we want to set Clang as its own compiler. + +.. code-block:: console + + cd ~/clang-llvm/build + ccmake ../llvm + +The second command will bring up a GUI for configuring Clang. You need +to set the entry for ``CMAKE_CXX_COMPILER``. Press ``'t'`` to turn on +advanced mode. Scroll down to ``CMAKE_CXX_COMPILER``, and set it to +``/usr/bin/clang++``, or wherever you installed it. Press ``'c'`` to +configure, then ``'g'`` to generate CMake's files. + +Finally, run ninja one last time, and you're done. + +Step 1: Create a ClangTool +========================== + +Now that we have enough background knowledge, it's time to create the +simplest productive ClangTool in existence: a syntax checker. While this +already exists as ``clang-check``, it's important to understand what's +going on. + +First, we'll need to create a new directory for our tool and tell CMake +that it exists. As this is not going to be a core clang tool, it will +live in the ``tools/extra`` repository. + +.. code-block:: console + + cd ~/clang-llvm/llvm/tools/clang + mkdir tools/extra/loop-convert + echo 'add_subdirectory(loop-convert)' >> tools/extra/CMakeLists.txt + vim tools/extra/loop-convert/CMakeLists.txt + +CMakeLists.txt should have the following contents: + +:: + + set(LLVM_LINK_COMPONENTS support) + set(LLVM_USED_LIBS clangTooling clangBasic clangAST) + + add_clang_executable(loop-convert + LoopConvert.cpp + ) + target_link_libraries(loop-convert + clangTooling + clangBasic + clangASTMatchers + ) + +With that done, Ninja will be able to compile our tool. Let's give it +something to compile! Put the following into +``tools/extra/loop-convert/LoopConvert.cpp``. A detailed explanation of +why the different parts are needed can be found in the `LibTooling +documentation <LibTooling.html>`_. + +.. code-block:: c++ + + // Declares clang::SyntaxOnlyAction. + #include "clang/Frontend/FrontendActions.h" + #include "clang/Tooling/CommonOptionsParser.h" + #include "clang/Tooling/Tooling.h" + // Declares llvm::cl::extrahelp. + #include "llvm/Support/CommandLine.h" + + using namespace clang::tooling; + using namespace llvm; + + // CommonOptionsParser declares HelpMessage with a description of the common + // command-line options related to the compilation database and input files. + // It's nice to have this help message in all tools. + static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage); + + // A help message for this specific tool can be added afterwards. + static cl::extrahelp MoreHelp("\nMore help text..."); + + int main(int argc, const char **argv) { + CommonOptionsParser OptionsParser(argc, argv); + ClangTool Tool(OptionsParser.getCompilations(), + OptionsParser.getSourcePathList()); + return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>()); + } + +And that's it! You can compile our new tool by running ninja from the +``build`` directory. + +.. code-block:: console + + cd ~/clang-llvm/build + ninja + +You should now be able to run the syntax checker, which is located in +``~/clang-llvm/build/bin``, on any source file. Try it! + +.. code-block:: console + + cat "void main() {}" > test.cpp + bin/loop-convert test.cpp -- + +Note the two dashes after we specify the source file. The additional +options for the compiler are passed after the dashes rather than loading +them from a compilation database - there just aren't any options needed +right now. + +Intermezzo: Learn AST matcher basics +==================================== + +Clang recently introduced the :doc:`ASTMatcher +library <LibASTMatchers>` to provide a simple, powerful, and +concise way to describe specific patterns in the AST. Implemented as a +DSL powered by macros and templates (see +`ASTMatchers.h <../doxygen/ASTMatchers_8h_source.html>`_ if you're +curious), matchers offer the feel of algebraic data types common to +functional programming languages. + +For example, suppose you wanted to examine only binary operators. There +is a matcher to do exactly that, conveniently named ``binaryOperator``. +I'll give you one guess what this matcher does: + +.. code-block:: c++ + + binaryOperator(hasOperatorName("+"), hasLHS(integerLiteral(equals(0)))) + +Shockingly, it will match against addition expressions whose left hand +side is exactly the literal 0. It will not match against other forms of +0, such as ``'\0'`` or ``NULL``, but it will match against macros that +expand to 0. The matcher will also not match against calls to the +overloaded operator ``'+'``, as there is a separate ``operatorCallExpr`` +matcher to handle overloaded operators. + +There are AST matchers to match all the different nodes of the AST, +narrowing matchers to only match AST nodes fulfilling specific criteria, +and traversal matchers to get from one kind of AST node to another. For +a complete list of AST matchers, take a look at the `AST Matcher +References <LibASTMatchersReference.html>`_ + +All matcher that are nouns describe entities in the AST and can be +bound, so that they can be referred to whenever a match is found. To do +so, simply call the method ``bind`` on these matchers, e.g.: + +.. code-block:: c++ + + variable(hasType(isInteger())).bind("intvar") + +Step 2: Using AST matchers +========================== + +Okay, on to using matchers for real. Let's start by defining a matcher +which will capture all ``for`` statements that define a new variable +initialized to zero. Let's start with matching all ``for`` loops: + +.. code-block:: c++ + + forStmt() + +Next, we want to specify that a single variable is declared in the first +portion of the loop, so we can extend the matcher to + +.. code-block:: c++ + + forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl())))) + +Finally, we can add the condition that the variable is initialized to +zero. + +.. code-block:: c++ + + forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( + hasInitializer(integerLiteral(equals(0)))))))) + +It is fairly easy to read and understand the matcher definition ("match +loops whose init portion declares a single variable which is initialized +to the integer literal 0"), but deciding that every piece is necessary +is more difficult. Note that this matcher will not match loops whose +variables are initialized to ``'\0'``, ``0.0``, ``NULL``, or any form of +zero besides the integer 0. + +The last step is giving the matcher a name and binding the ``ForStmt`` +as we will want to do something with it: + +.. code-block:: c++ + + StatementMatcher LoopMatcher = + forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( + hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); + +Once you have defined your matchers, you will need to add a little more +scaffolding in order to run them. Matchers are paired with a +``MatchCallback`` and registered with a ``MatchFinder`` object, then run +from a ``ClangTool``. More code! + +Add the following to ``LoopConvert.cpp``: + +.. code-block:: c++ + + #include "clang/ASTMatchers/ASTMatchers.h" + #include "clang/ASTMatchers/ASTMatchFinder.h" + + using namespace clang; + using namespace clang::ast_matchers; + + StatementMatcher LoopMatcher = + forStmt(hasLoopInit(declStmt(hasSingleDecl(varDecl( + hasInitializer(integerLiteral(equals(0)))))))).bind("forLoop"); + + class LoopPrinter : public MatchFinder::MatchCallback { + public : + virtual void run(const MatchFinder::MatchResult &Result) { + if (const ForStmt *FS = Result.Nodes.getNodeAs<clang::ForStmt>("forLoop")) + FS->dump(); + }; + +And change ``main()`` to: + +.. code-block:: c++ + + int main(int argc, const char **argv) { + CommonOptionsParser OptionsParser(argc, argv); + ClangTool Tool(OptionsParser.getCompilations(), + OptionsParser.getSourcePathList()); + + LoopPrinter Printer; + MatchFinder Finder; + Finder.addMatcher(LoopMatcher, &Printer); + + return Tool.run(newFrontendActionFactory(&Finder)); + } + +Now, you should be able to recompile and run the code to discover for +loops. Create a new file with a few examples, and test out our new +handiwork: + +.. code-block:: console + + cd ~/clang-llvm/llvm/llvm_build/ + ninja loop-convert + vim ~/test-files/simple-loops.cc + bin/loop-convert ~/test-files/simple-loops.cc + +Step 3.5: More Complicated Matchers +=================================== + +Our simple matcher is capable of discovering for loops, but we would +still need to filter out many more ourselves. We can do a good portion +of the remaining work with some cleverly chosen matchers, but first we +need to decide exactly which properties we want to allow. + +How can we characterize for loops over arrays which would be eligible +for translation to range-based syntax? Range based loops over arrays of +size ``N`` that: + +- start at index ``0`` +- iterate consecutively +- end at index ``N-1`` + +We already check for (1), so all we need to add is a check to the loop's +condition to ensure that the loop's index variable is compared against +``N`` and another check to ensure that the increment step just +increments this same variable. The matcher for (2) is straightforward: +require a pre- or post-increment of the same variable declared in the +init portion. + +Unfortunately, such a matcher is impossible to write. Matchers contain +no logic for comparing two arbitrary AST nodes and determining whether +or not they are equal, so the best we can do is matching more than we +would like to allow, and punting extra comparisons to the callback. + +In any case, we can start building this sub-matcher. We can require that +the increment step be a unary increment like this: + +.. code-block:: c++ + + hasIncrement(unaryOperator(hasOperatorName("++"))) + +Specifying what is incremented introduces another quirk of Clang's AST: +Usages of variables are represented as ``DeclRefExpr``'s ("declaration +reference expressions") because they are expressions which refer to +variable declarations. To find a ``unaryOperator`` that refers to a +specific declaration, we can simply add a second condition to it: + +.. code-block:: c++ + + hasIncrement(unaryOperator( + hasOperatorName("++"), + hasUnaryOperand(declRefExpr()))) + +Furthermore, we can restrict our matcher to only match if the +incremented variable is an integer: + +.. code-block:: c++ + + hasIncrement(unaryOperator( + hasOperatorName("++"), + hasUnaryOperand(declRefExpr(to(varDecl(hasType(isInteger()))))))) + +And the last step will be to attach an identifier to this variable, so +that we can retrieve it in the callback: + +.. code-block:: c++ + + hasIncrement(unaryOperator( + hasOperatorName("++"), + hasUnaryOperand(declRefExpr(to( + varDecl(hasType(isInteger())).bind("incrementVariable")))))) + +We can add this code to the definition of ``LoopMatcher`` and make sure +that our program, outfitted with the new matcher, only prints out loops +that declare a single variable initialized to zero and have an increment +step consisting of a unary increment of some variable. + +Now, we just need to add a matcher to check if the condition part of the +``for`` loop compares a variable against the size of the array. There is +only one problem - we don't know which array we're iterating over +without looking at the body of the loop! We are again restricted to +approximating the result we want with matchers, filling in the details +in the callback. So we start with: + +.. code-block:: c++ + + hasCondition(binaryOperator(hasOperatorName("<")) + +It makes sense to ensure that the left-hand side is a reference to a +variable, and that the right-hand side has integer type. + +.. code-block:: c++ + + hasCondition(binaryOperator( + hasOperatorName("<"), + hasLHS(declRefExpr(to(varDecl(hasType(isInteger()))))), + hasRHS(expr(hasType(isInteger()))))) + +Why? Because it doesn't work. Of the three loops provided in +``test-files/simple.cpp``, zero of them have a matching condition. A +quick look at the AST dump of the first for loop, produced by the +previous iteration of loop-convert, shows us the answer: + +:: + + (ForStmt 0x173b240 + (DeclStmt 0x173afc8 + 0x173af50 "int i = + (IntegerLiteral 0x173afa8 'int' 0)") + <<>> + (BinaryOperator 0x173b060 '_Bool' '<' + (ImplicitCastExpr 0x173b030 'int' + (DeclRefExpr 0x173afe0 'int' lvalue Var 0x173af50 'i' 'int')) + (ImplicitCastExpr 0x173b048 'int' + (DeclRefExpr 0x173b008 'const int' lvalue Var 0x170fa80 'N' 'const int'))) + (UnaryOperator 0x173b0b0 'int' lvalue prefix '++' + (DeclRefExpr 0x173b088 'int' lvalue Var 0x173af50 'i' 'int')) + (CompoundStatement … + +We already know that the declaration and increments both match, or this +loop wouldn't have been dumped. The culprit lies in the implicit cast +applied to the first operand (i.e. the LHS) of the less-than operator, +an L-value to R-value conversion applied to the expression referencing +``i``. Thankfully, the matcher library offers a solution to this problem +in the form of ``ignoringParenImpCasts``, which instructs the matcher to +ignore implicit casts and parentheses before continuing to match. +Adjusting the condition operator will restore the desired match. + +.. code-block:: c++ + + hasCondition(binaryOperator( + hasOperatorName("<"), + hasLHS(ignoringParenImpCasts(declRefExpr( + to(varDecl(hasType(isInteger())))))), + hasRHS(expr(hasType(isInteger()))))) + +After adding binds to the expressions we wished to capture and +extracting the identifier strings into variables, we have array-step-2 +completed. + +Step 4: Retrieving Matched Nodes +================================ + +So far, the matcher callback isn't very interesting: it just dumps the +loop's AST. At some point, we will need to make changes to the input +source code. Next, we'll work on using the nodes we bound in the +previous step. + +The ``MatchFinder::run()`` callback takes a +``MatchFinder::MatchResult&`` as its parameter. We're most interested in +its ``Context`` and ``Nodes`` members. Clang uses the ``ASTContext`` +class to represent contextual information about the AST, as the name +implies, though the most functionally important detail is that several +operations require an ``ASTContext*`` parameter. More immediately useful +is the set of matched nodes, and how we retrieve them. + +Since we bind three variables (identified by ConditionVarName, +InitVarName, and IncrementVarName), we can obtain the matched nodes by +using the ``getNodeAs()`` member function. + +In ``LoopActions.cpp``: + +.. code-block:: c++ + + #include "clang/AST/ASTContext.h" + + void LoopPrinter::run(const MatchFinder::MatchResult &Result) { + ASTContext *Context = Result.Context; + const ForStmt *FS = Result.Nodes.getStmtAs<ForStmt>(LoopName); + // We do not want to convert header files! + if (!FS || !Context->getSourceManager().isFromMainFile(FS->getForLoc())) + return; + const VarDecl *IncVar = Result.Nodes.getNodeAs<VarDecl>(IncrementVarName); + const VarDecl *CondVar = Result.Nodes.getNodeAs<VarDecl>(ConditionVarName); + const VarDecl *InitVar = Result.Nodes.getNodeAs<VarDecl>(InitVarName); + +Now that we have the three variables, represented by their respective +declarations, let's make sure that they're all the same, using a helper +function I call ``areSameVariable()``. + +.. code-block:: c++ + + if (!areSameVariable(IncVar, CondVar) || !areSameVariable(IncVar, InitVar)) + return; + llvm::outs() << "Potential array-based loop discovered.\n"; + } + +If execution reaches the end of ``LoopPrinter::run()``, we know that the +loop shell that looks like + +.. code-block:: c++ + + for (int i= 0; i < expr(); ++i) { ... } + +For now, we will just print a message explaining that we found a loop. +The next section will deal with recursively traversing the AST to +discover all changes needed. + +As a side note, here is the implementation of ``areSameVariable``. Clang +associates a ``VarDecl`` with each variable to represent the variable's +declaration. Since the "canonical" form of each declaration is unique by +address, all we need to do is make sure neither ``ValueDecl`` (base +class of ``VarDecl``) is ``NULL`` and compare the canonical Decls. + +.. code-block:: c++ + + static bool areSameVariable(const ValueDecl *First, const ValueDecl *Second) { + return First && Second && + First->getCanonicalDecl() == Second->getCanonicalDecl(); + } + +It's not as trivial to test if two expressions are the same, though +Clang has already done the hard work for us by providing a way to +canonicalize expressions: + +.. code-block:: c++ + + static bool areSameExpr(ASTContext *Context, const Expr *First, + const Expr *Second) { + if (!First || !Second) + return false; + llvm::FoldingSetNodeID FirstID, SecondID; + First->Profile(FirstID, *Context, true); + Second->Profile(SecondID, *Context, true); + return FirstID == SecondID; + } + +This code relies on the comparison between two +``llvm::FoldingSetNodeIDs``. As the documentation for +``Stmt::Profile()`` indicates, the ``Profile()`` member function builds +a description of a node in the AST, based on its properties, along with +those of its children. ``FoldingSetNodeID`` then serves as a hash we can +use to compare expressions. We will need ``areSameExpr`` later. Before +you run the new code on the additional loops added to +test-files/simple.cpp, try to figure out which ones will be considered +potentially convertible. |