I've been using Qi and Karma to do some processing on several small languages. Most of the grammars are pretty small (20-40 rules). I've been able to use autorules almost exclusively, so my parse trees consist entirely of variants, structs, and std::vectors.
This setup works great for the common case:
1) parse something (Qi),
2) make minor manipulations to the parse tree (visitor), and
3) output something (Karma).
However, I'm concerned about what will happen if I want to make complex structural changes to a syntax tree, like moving big subtrees around. Consider the following toy example:
A grammar for s-expr-style logical expressions that uses autorules...
// Inside grammar class; rule names match struct names...
pexpr %= pand | por | var | bconst;
pand %= lit("(and ") >> (pexpr % lit(" ")) >> ")";
por %= lit("(or ") >> (pexpr % lit(" ")) >> ")";
pnot %= lit("(not ") >> pexpr >> ")";
... which leads to parse tree representation that looks like this...
struct var {
std::string name;
};
struct bconst {
bool val;
};
struct pand;
struct por;
struct pnot;
typedef boost::variant<bconst,
var,
boost::recursive_wrapper<pand>,
boost::recursive_wrapper<por>,
boost::recursive_wrapper<pnot> > pexpr;
struct pand {
std::vector<pexpr> operands;
};
struct por {
std::vector<pexpr> operands;
};
struct pnot {
pexpr victim;
};
// Many Fusion Macros here
Suppose I have a parse tree that looks something like this:
pand
/ ... \
por por
/ \ / \
var var var var
(The ellipsis means 'many more children of similar shape for pand.')
Now, suppose that I want negate each of the por nodes, so that the end result is:
pand
/ ... \
pnot pnot
| |
por por
/ \ / \
var var var var
The direct approach would be, for each por subtree:
- create pnot node
(copies por in construction);
- re-assign the appropriate vector slot in the pand node
(copies pnot node and its por subtree).
Alternatively, I could construct a separate vector, and then replace (swap) the pand vector wholesale, eliminating a second round of copying.
All of this seems cumbersome compared to a pointer-based tree representation, which would allow for the pnot nodes to be inserted without any copying of existing nodes. My question:
Is there a way to avoid copy-heavy tree manipulations with autorule-compliant data structures? Should I bite the bullet and just use non-autorules to build a pointer-based AST (e.g., http://boost-spirit.com/home/2010/03/11/s-expressions-and-variants/)?