Thursday, July 11, 2013

Clang's AST dump AKA 'WTH is the compiler doing???'

It's no secret that I use the Clang compiler for development. Although GCC is still somewhat better when things like the performance of the resulting code matter, other features matter more during development. And although competition helps here too (it's not difficult to guess where the inspiration for the new error reporting in GCC 4.8 comes from), there are features where I expect it'd be hard for GCC to match Clang. The capabilities of Clang plugins and the ease of writing them are one thing, but there are also more hidden secrets, like the AST dump.

If Clang is additionally invoked with the -Xclang -ast-dump options, it'll dump its internal representation (the abstract syntax tree, AST) of the compiled source. That can be pretty useful when the source code doesn't actually mean what one expects, or when something unexpected from elsewhere interferes. Consider the following (simple, for clarity) example:

#include <iostream>
using namespace std;

class A
    {
    };

class B
    {
    public:
        operator A() const { return A(); } // user-defined conversion from B to A
    };

class C
    : public B
    {
    };

void foo( const A& )
    {
    cout << "A" << endl;
    }

void foo( B& )
    {
    cout << "B" << endl;
    }

int main()
    {
    foo( C());
    }

Looking only at class C, it may come as a surprise to some that this prints "A" and not "B". Overlooking the missing const in foo( B& ), or not knowing that it prevents passing the temporary to that function, certainly adds to the surprise, but even so, what is actually going on? In a larger codebase, finding that out can take a lot of time. But seeing what the compiler thinks about the code can help:

$ clang++ -Wall a.cpp -Xclang -ast-dump
 `-FunctionDecl 0x90ffb90 <line:29:1, line:32:5> main 'int (void)'
  `-CompoundStmt 0x9100078 <line:30:5, line:32:5>
    `-CallExpr 0x90fffb8 <line:31:5, col:13> 'void'
      |-ImplicitCastExpr 0x90fffa8 <col:5> 'void (*)(const class A &)' <FunctionToPointerDecay>
      | `-DeclRefExpr 0x90fff74 <col:5> 'void (const class A &)' lvalue Function 0x90fec80 'foo' 'void (const class A &)'
      `-MaterializeTemporaryExpr 0x9100068 <col:10, col:12> 'const class A' lvalue
        `-ImplicitCastExpr 0x9100058 <col:10, col:12> 'const class A' <NoOp>
          `-ImplicitCastExpr 0x9100048 <col:10, col:12> 'class A' <UserDefinedConversion>
            `-CXXMemberCallExpr 0x9100028 <col:10, col:12> 'class A'
              `-MemberExpr 0x9100008 <col:10, col:12> '<bound member function type>' .operator A 0x90fe740
                `-ImplicitCastExpr 0x90ffff8 <col:10, col:12> 'const class B' <UncheckedDerivedToBase (B)>
                  `-CXXTemporaryObjectExpr 0x90ffdd8 <col:10, col:12> 'class C' 'void (void)' zeroing

Knowing a bit about how compilers work helps a lot, but even without it this is quite simple to read. From the bottom up, a temporary object of class C is created and cast to its base class B. That's the expected part. The unexpected part is the next three AST nodes up, which show that the object is converted to class A by a user-defined conversion using operator A(). Which, as the rest of this part of the AST dump shows, results in calling foo( const A& ). Mystery solved.

(Fun trivia: I once helped a GCC developer to disentangle a problem in a complex C++ testsuite using this. But don't tell anyone ;) . )


  1. Just as a side note... do you know gcc -fdump-tree-original-raw?

    on you can find also a graphviz integration tool...

  2. No, I don't, and after seeing the kind of output -fdump-tree-original or ast2dot generate, I intend to keep it that way. Just like with e.g. compiler plugins, while one could use GCC for this, it is nowhere near what Clang provides.