Binary or more generally, N-ary trees are very useful in symbolic computation, predicate calculus and so on.
Suppose we want to implement something simple, like an expression tree that can encode arithmetic formulas, something that could possibly look like this (I’ve added the node outputs as edge labels):

So we have:
- Leaf nodes, or terminals, which generally contain a value (be it random or corresponding to some value from the dataset etc)
- Non-leaf nodes, or functions, which execute an operation on their input
Let’s see how we can express this in C++, but first, since we are also interested in benchmarking our tree interpreter, we introduce an object counter using the Curiously Recurring Template Pattern (CRTP).
// Object counter
template <typename T>
struct ObjectCounter
{
ObjectCounter()
{
++objects_created;
++objects_alive;
}
virtual ~ObjectCounter()
{
--objects_alive;
}
static long int objects_created;
static long int objects_alive;
};
template <typename T> long int ObjectCounter<T>::objects_created(0);
template <typename T> long int ObjectCounter<T>::objects_alive(0);
The different types of nodes should have a base class:
enum {TERMINAL, FUNCTION}; // node types
class Node : public ObjectCounter<Node>
{
public:
explicit Node(int type) : nodeType(type), evaluated(false), evalValue(0.0) {}
virtual ~Node() {}
int Type() const { return nodeType; }
void Type(int type) { nodeType = type; }
bool Evaluated() const { return evaluated; }
void Evaluated(bool value) { evaluated = value; }
double EvalValue() const { return evalValue; }
void EvalValue(double value) { evalValue = value; }
virtual double Evaluate() =0; //!< update eval values and flags
protected:
int nodeType;
bool evaluated;
double evalValue;
};
The
Terminal is easiest to declare:
class Terminal : public Node
{
public:
Terminal() : Node(TERMINAL) {}
~Terminal() {}
double Value() const { return val; }
void Value(double value_) { val = value_; }
double Evaluate() { return val; }
private:
double val;
};
For our
Function class, we need to think about how we design and use our operators. Since I’m lazy, I wanted to use the binary operators from
functional header, but using them raises one important issue: we’d like to store them in a container, like
std::vector, which is not possible for operators like
std::plus,
std::minus,
std::multiplies and
std::divides due to the fact that their base class,
std::binary_function is not a polymorphic base class (and if it were, I would need to store pointers to it, not copies). In short, something like
typedef std::binary_function<double,double,double> binary_op;
std::vector<binary_op> functions;
would not work.
Therefore, for our functions vector we need a function type that can wrap polymorphic function objects.
This is where we consider our options: we want binary operators that accept and return double values.
Before we start considering the alternatives, let’s take a look at how our Function class might look like:
class Function : public Node
{
public:
Function() : Node(FUNCTION) {}
~Function()
{
for (auto i = children.begin(); i != children.end(); ++i)
delete *i;
}
void setOp(binary_op& op_) { op = op_; }
std::vector<Node*>& Children() { return children; }
const std::string& Symbol() const { return symbol; }
void setSymbol(std::string& symbol_) { symbol = symbol_; }
double operator()(double a, double b) {
return op(a,b);
}
double Evaluate()
{
EvalValue( (*this)(Children()[0]->Evaluate(), Children()[1]->Evaluate()) );
return EvalValue();
}
So a
Function uses its () operator to apply a
binary_op to its arguments. The problem is how do we define and use the
binary_op.
-
First and easiest,
boost::function:
typedef boost::function<double(double,double)> binary_op;
-
Second, and a little bit “more lightweight”, is the
std::tr1::function from the new C++0x standard:
typedef std::tr1::function<double(double,double)> binary_op;
Benchmark
At this point, I was already very curious to see how well this tree interpreter performs. I tested its speed against a C# tree interpreter that is part of HeuristicLab, which is a framework for heuristic and evolutionary algorithms developed in C# using .NET 4.0, and also against an older project of mine which has a different (and less generic) design: different function node types which have different methods defining the operations they perform.
So I got the following figures:
- HeuristicLab arithmetic grammar tree interpreter: 3.37e+007 nodes/s
- Other project (more straightforward): 4.49e+007 nodes/s
At this point I was really curious to see how the minimalistic approach would perform, but unfortunately the figures were rather disappointing:
- Using
boost::function: 2.45e+007 nodes/s
- Using
std::tr1::function: 3.1e+007 nodes/s
Surprisingly, boost::function is quite slow, and although std::tr1::function is a bit faster, I still wasn’t happy.
This is when internet research comes in handy, as I have discovered the following way of wrapping function objects:
typedef double (*binary_op)(double,double);
template<typename F> typename F::result_type
functor(typename F::first_argument_type arg1, typename F::second_argument_type arg2)
{
return F()(arg1, arg2);
}
Using the function above requires a small correction to the code. Where we previously had:
std::vector<binary_op> functions;
functions.push_back(std::plus<double>());
now we must have:
std::vector<binary_op> functions;
functions.push_back(functor< std::plus<double> >);
This approach gives a substantial speed boost: 3.54e+007 nodes/s, which is still not great, but, considering the simplicity of the design (a couple of node classes, a tree class and an evaluation function), I think its pretty ok.