by David Vandevoorde, Douglas Gregor, and Nicolai M. Josuttis

Introduction to metaprogramming in C++

Mar 1, 2018

This book excerpt from “C++ Templates, Second Edition” explains how to get more functional, more efficient, and more easily maintained code where user-defined computation happens at translation time


Metaprogramming consists of programming a program. In other words, you lay out code that the programming system executes to generate new code that implements the functionality you really want. Usually, the term metaprogramming implies a reflexive attribute: The metaprogramming component is part of the program for which it generates a bit of code (that is, an additional or different bit of the program).

Why would metaprogramming be desirable? As with most other programming techniques, the goal is to achieve more functionality with less effort, where effort can be measured as code size, maintenance cost, and so forth. What characterizes metaprogramming is that some user-defined computation happens at translation time. The underlying motivation is often performance (things computed at translation time can frequently be optimized away) or interface simplicity (a metaprogram is generally shorter than what it expands to) or both.

Metaprogramming often relies on the concepts of traits and type functions.

The state of modern C++ metaprogramming

C++ metaprogramming techniques evolved over time. Let’s categorize the various approaches to metaprogramming that are in common use in modern C++.

Value metaprogramming

In the first edition of this book, we were limited to the features introduced in the original C++ standard (published in 1998, with minor corrections in 2003). In that world, composing simple compile-time (“meta-”) computations was a minor challenge. We therefore devoted a good chunk of this chapter to doing just that; one fairly advanced example computed the square root of an integer value at compile time using recursive template instantiations. C++ 11 and, especially, C++ 14 removed most of that challenge with the introduction of constexpr functions. The C++ 11 constexpr capabilities were sufficient to solve many common challenges, but the programming model was not always pleasant: for example, no loop statements were available, so iterative computation had to exploit recursive function calls. C++ 14 lifted that restriction by enabling loop statements and various other constructs.

For example, since C++ 14, a compile-time function to compute a square root is easily written as follows:

meta/sqrtconstexpr.hpp
template<typename T>
constexpr T sqrt(T x)
{
   // handle cases where x and its square root are
   // equal as a special case to simplify
   // the iteration criterion for larger x:
   if (x <= 1) {
     return x;
   }

   // repeatedly determine in which half of
   // a [lo, hi] interval the square root of x is located,
   // until the interval is reduced to just one value:
   T lo = 0, hi = x;
   for (;;) {
     auto mid = (hi+lo)/2, midSquared = mid*mid;
     if (lo+1 >= hi || midSquared == x) {
       // mid must be the square root:
       return mid;
     }
     // continue with the higher/lower half-interval:
     if (midSquared < x) {
       lo = mid;
     }
     else {
       hi = mid;
     }
   }
}

This algorithm searches for the answer by repeatedly halving an interval known to contain the square root of x (the roots of 0 and 1 are treated as special cases to keep the convergence criterion simple). This sqrt() function can be evaluated at compile or run time:

static_assert(sqrt(25) == 5, "");  
   // OK (evaluated at compile time)
static_assert(sqrt(40) == 6, "");  
   // OK (evaluated at compile time)
std::array<int, sqrt(40)+1> arr;   
   // declares array of 7 elements (compile time)
long long l = 53478;
std::cout << sqrt(l) << '\n';
   // prints 231 (evaluated at run time)

This function’s implementation may not be the most efficient at run time (where exploiting peculiarities of the machine often pays off), but because it is meant to perform compile-time computations, absolute efficiency is less important than portability. Note that no advanced “template magic” is in sight in that square root example, only the usual template argument deduction for a function template. The code is plain C++ and is not particularly challenging to read.

Value metaprogramming (that is, programming the computation of compile-time values) as we did above is occasionally quite useful, but there are two additional kinds of metaprogramming that can be performed with modern C++ (say, C++ 14 and C++ 17): type metaprogramming and hybrid metaprogramming.

Type metaprogramming

By relying on recursive template instantiation—a mainstay of template-based metaprogramming—we can perform type computations that are considerably more complex.

Consider the following small example:

meta/removeallextents.hpp
// primary template: in general we yield the given type: 
template<typename T>
struct RemoveAllExtentsT {
   using Type = T;
};

// partial specializations for array types (with and without bounds): 
template<typename T, std::size_t SZ>
struct RemoveAllExtentsT<T[SZ]> {
   using Type = typename RemoveAllExtentsT<T>::Type;
};

template<typename T>
struct RemoveAllExtentsT<T[]> {
   using Type = typename RemoveAllExtentsT<T>::Type;
};

template<typename T>
using RemoveAllExtents = typename RemoveAllExtentsT<T>::Type;

Here, RemoveAllExtents is a type metafunction (that is, a computational device that produces a result type) that will remove an arbitrary number of top-level “array layers” from a type. (The C++ standard library provides a corresponding type trait std::remove_all_extents.)

You can use it as follows:

RemoveAllExtents<int[]>          // yields int
RemoveAllExtents<int[5][10]>     // yields int
RemoveAllExtents<int[][10]>      // yields int
RemoveAllExtents<int(*)[5]>      // yields int(*)[5] 

The metafunction performs its task by having the partial specialization that matches the top-level array case recursively “invoke” the metafunction itself.
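The claimed results can be verified at compile time with std::is_same; a self-contained sketch repeating the definitions above:

```cpp
#include <cstddef>
#include <type_traits>

// primary template: in general we yield the given type:
template<typename T>
struct RemoveAllExtentsT {
   using Type = T;
};

// partial specializations for array types (with and without bounds):
template<typename T, std::size_t SZ>
struct RemoveAllExtentsT<T[SZ]> {
   using Type = typename RemoveAllExtentsT<T>::Type;
};

template<typename T>
struct RemoveAllExtentsT<T[]> {
   using Type = typename RemoveAllExtentsT<T>::Type;
};

template<typename T>
using RemoveAllExtents = typename RemoveAllExtentsT<T>::Type;

// each claim from the usage above, checked at compile time:
static_assert(std::is_same<RemoveAllExtents<int[]>, int>::value, "");
static_assert(std::is_same<RemoveAllExtents<int[5][10]>, int>::value, "");
static_assert(std::is_same<RemoveAllExtents<int[][10]>, int>::value, "");
static_assert(std::is_same<RemoveAllExtents<int(*)[5]>, int(*)[5]>::value, "");
```

If any of these checks failed, the program would not compile, which is exactly the kind of feedback type metaprograms give.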

Computing with values would be very limited if all that was available to you were scalar values. Fortunately, just about any programming language has at least one container of values construct that greatly magnifies the power of that language (and most languages have a variety of container kinds, such as arrays/vectors and hash tables). The same is true of type metaprogramming: Adding a “container of types” construct greatly increases the applicability of the discipline. Fortunately, modern C++ includes mechanisms that enable the development of such a container.
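Such a container can be sketched with a variadic class template (the names Typelist, Front, and PopFront below are illustrative, not standard components):

```cpp
#include <type_traits>

// a "container of types": an empty struct whose template
// arguments carry the sequence of types:
template<typename... Elements>
struct Typelist {
};

// metafunction extracting the first element of a typelist:
template<typename List>
struct FrontT;

template<typename Head, typename... Tail>
struct FrontT<Typelist<Head, Tail...>> {
   using Type = Head;
};

template<typename List>
using Front = typename FrontT<List>::Type;

// metafunction removing the first element:
template<typename List>
struct PopFrontT;

template<typename Head, typename... Tail>
struct PopFrontT<Typelist<Head, Tail...>> {
   using Type = Typelist<Tail...>;
};

template<typename List>
using PopFront = typename PopFrontT<List>::Type;

// usage, checked at compile time:
using SignedIntegralTypes = Typelist<signed char, short, int, long, long long>;
static_assert(std::is_same<Front<SignedIntegralTypes>, signed char>::value, "");
static_assert(std::is_same<PopFront<SignedIntegralTypes>,
                           Typelist<short, int, long, long long>>::value, "");
```

More elaborate algorithms (reversing, transforming, or searching a typelist) can be built from such primitives by recursion, just as with RemoveAllExtents.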

Hybrid metaprogramming

With value metaprogramming and type metaprogramming, you can compute values and types at compile time. Ultimately, however, we’re interested in runtime effects, so we use these metaprograms in runtime code in places where types and constants are expected. Metaprogramming can do more than that, however: You can programmatically assemble, at compile time, bits of code with a runtime effect. We call that hybrid metaprogramming.

To illustrate this principle, let’s start with a simple example: computing the dot-product of two std::array values. Recall that std::array is a fixed-length container template declared as follows:

namespace std {
   template<typename T, size_t N> struct array;
}

where N is the number of elements (of type T) in the array. Given two objects of the same array type, their dot-product can be computed as follows:

template<typename T, std::size_t N>
auto dotProduct(std::array<T, N> const& x, std::array<T, N> const& y)
{
   T result{};
   for (std::size_t k = 0; k<N; ++k) {
     result += x[k]*y[k];
   }
   return result;
}

A straightforward compilation of the for loop will produce branching instructions that on some machines may cause some overhead compared to a straight-line execution of

result += x[0]*y[0]; 
result += x[1]*y[1]; 
result += x[2]*y[2]; 
result += x[3]*y[3]; 
...

Fortunately, modern compilers will optimize the loop into whichever form is most efficient for the target platform. For the sake of discussion, however, let’s rewrite the dotProduct() implementation in a way that avoids the loop:

template<typename T, std::size_t N>
struct DotProductT {
   static inline T result(T* a, T* b) {
      return *a * *b  +  DotProductT<T, N-1>::result(a+1, b+1);
   }
};

// partial specialization as end criterion
template<typename T>
struct DotProductT<T, 0> {
   static inline T result(T*, T*) {
      return T{};
   }
};

template<typename T, std::size_t N>
auto dotProduct(std::array<T, N> const& x,
                std::array<T, N> const& y)
{
   return DotProductT<T, N>::result(x.begin(), y.begin());
}

This is known as loop unrolling. We generally recommend against explicitly unrolling loops in portable code because the details that determine the best unrolling strategy depend highly on the target platform and the loop body; the compiler usually does a much better job of taking those considerations into account.

This new implementation delegates the work to a class template DotProductT. That lets you use recursive template instantiation with class template partial specialization to end the recursion. Note how each instantiation of DotProductT produces the sum of one term of the dot-product and the dot-product of the remaining components of the array. For values of type std::array<T,N> there will therefore be N instances of the primary template and one instance of the terminating partial specialization. For this to be efficient, it is critical that the compiler inlines every invocation of the static member functions result(). Fortunately, that is generally true when even a moderate level of compiler optimizations is enabled.

We specify the inline keyword explicitly here because some compilers (notably Clang) take this as a hint to try a little harder to inline calls. From a language point of view, these functions are implicitly inline because they are defined in the body of their enclosing class.

The central observation about this code is that it blends a compile-time computation (achieved here through recursive template instantiation) that determines the overall structure of the code with a runtime computation (calling result()) that determines the specific runtime effect.

We mentioned earlier that type metaprogramming is greatly enhanced by the availability of a “container of types.” We’ve already seen that in hybrid metaprogramming a fixed-length array type can be useful. Nonetheless, the true “hero container” of hybrid metaprogramming is the tuple. A tuple is a sequence of values, each with a selectable type. The C++ standard library includes a std::tuple class template that supports that notion. For example,

std::tuple<int, std::string, bool> tVal{42, "Answer", true};

defines a variable tVal that aggregates three values of types int, std::string, and bool (in that specific order). The type of tVal above is very similar to a simple struct type like:

struct MyTriple {
   int v1;
   std::string v2;
   bool v3; 
};

Given that in std::array and std::tuple you have flexible counterparts to array types and (simple) struct types, it is natural to wonder whether a counterpart to simple union types would also be useful for hybrid computation. The answer is yes. The C++ standard library introduced the std::variant template for this purpose in C++ 17.

Because std::tuple and std::variant, like struct types, are heterogeneous types, hybrid metaprogramming that uses such types is sometimes called heterogeneous metaprogramming.

Hybrid metaprogramming for unit types

Another example demonstrating the power of hybrid metaprogramming is libraries that compute with values of different unit types. The value computation is performed at run time, but the resulting units are determined at compile time.

Let’s illustrate this with a highly simplified example. We are going to keep track of units in terms of their ratio (fraction) of a principal unit. For example, if the principal unit for time is a second, a millisecond is represented with ratio of 1/1000 and a minute with ratio 60/1. The key, then, is to define a ratio type where each value has its own type:

meta/ratio.hpp
template<unsigned N, unsigned D = 1>
struct Ratio {
   static constexpr unsigned num = N; // numerator 
   static constexpr unsigned den = D; // denominator 
   using Type = Ratio<num, den>; 
}; 

Now we can define compile-time computations such as adding two units:

meta/ratioadd.hpp
// implementation of adding two ratios: 
template<typename R1, typename R2>
struct RatioAddImpl
{
   private:
     static constexpr unsigned den = R1::den * R2::den;
     static constexpr unsigned num = R1::num * R2::den + R2::num * R1::den;
   public:
     using Type = Ratio<num, den>;
}; 

// alias template for convenient usage:
template<typename R1, typename R2>
using RatioAdd = typename RatioAddImpl<R1, R2>::Type;

This allows us to compute the sum of two ratios at compile time:

using R1 = Ratio<1,1000>;
using R2 = Ratio<2,3>;
using RS = RatioAdd<R1,R2>;     // RS has type Ratio<2003,3000>
std::cout << RS::num << '/' << RS::den << '\n';
                                // prints 2003/3000
using RA = RatioAdd<Ratio<2,3>,Ratio<5,7>>;
                                // RA has type Ratio<29,21>
std::cout << RA::num << '/' << RA::den << '\n';
                                // prints 29/21

We can now define a class template for durations, parameterized with an arbitrary value type and a unit type that is an instance of Ratio<>:

meta/duration.hpp
// duration type for values of type T with unit type U: 
template<typename T, typename U = Ratio<1>>
class Duration {
   public:
     using ValueType = T;
     using UnitType  = typename U::Type;
   private:
     ValueType val;
   public:
     constexpr Duration(ValueType v = 0)
        : val(v) { } 
     constexpr ValueType value() const {
        return val;
     } 
};

The interesting part is the definition of an operator+ to add two Durations:

meta/durationadd.hpp 
// adding two durations where unit type might differ: 
template<typename T1, typename U1, typename T2, typename U2>
auto constexpr operator+(Duration<T1, U1> const& lhs,
                         Duration<T2, U2> const& rhs)
{ 
   // resulting type is a unit with a numerator of 1 and
   // the denominator that results from adding both unit type fractions:
   using VT = Ratio<1,RatioAdd<U1,U2>::den>;
   // resulting value is the sum of both values
   // converted to the resulting unit type: 
   auto val = lhs.value() * VT::den / U1::den * U1::num +
              rhs.value() * VT::den / U2::den * U2::num;
   return Duration<decltype(val), VT>(val); 
}

We allow the arguments to have different unit types, U1 and U2. And we use these unit types to compute the resulting duration to have a unit type that is the corresponding unit fraction (a fraction where the numerator is 1). With all that in place, we can compile the following code:

int x = 42;
int y = 77;

auto a = Duration<int, Ratio<1,1000>>(x);  // x milliseconds
auto b = Duration<int, Ratio<2,3>>(y);     // y 2/3 seconds
auto c = a + b; // computes resulting unit type 1/3000 seconds 
                // and generates runtime code for c = a*3 + b*2000

The key “hybrid” effect is that for the sum c the compiler determines the resulting unit type Ratio<1,3000> at compile time and generates code to compute at run time the resulting value, which is adjusted for the resulting unit type.

Because the value type is a template parameter, we can use class Duration with value types other than int or even use heterogeneous value types (as long as adding the values of these types is defined):

auto d = Duration<double, Ratio<1,3>>(7.5);   // 7.5 1/3 seconds 
auto e = Duration<int, Ratio<1>>(4);          // 4 seconds 
auto f = d + e;       
           // computes resulting unit type 1/3 seconds 
           // and generates code for f = d + e*3

In addition, the compiler can even perform the value computation at compile time if the values are known at compile time, because operator+ for durations is constexpr.
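To see this, the definitions above can be combined into one self-contained sketch whose sum is checked entirely at compile time (value() is compared directly because this Duration defines no operator==):

```cpp
#include <type_traits>

template<unsigned N, unsigned D = 1>
struct Ratio {
   static constexpr unsigned num = N; // numerator
   static constexpr unsigned den = D; // denominator
   using Type = Ratio<num, den>;
};

template<typename R1, typename R2>
struct RatioAddImpl
{
 private:
   static constexpr unsigned den = R1::den * R2::den;
   static constexpr unsigned num = R1::num * R2::den + R2::num * R1::den;
 public:
   using Type = Ratio<num, den>;
};

template<typename R1, typename R2>
using RatioAdd = typename RatioAddImpl<R1, R2>::Type;

template<typename T, typename U = Ratio<1>>
class Duration {
 public:
   using ValueType = T;
   using UnitType  = typename U::Type;
 private:
   ValueType val;
 public:
   constexpr Duration(ValueType v = 0) : val(v) { }
   constexpr ValueType value() const { return val; }
};

template<typename T1, typename U1, typename T2, typename U2>
constexpr auto operator+(Duration<T1, U1> const& lhs,
                         Duration<T2, U2> const& rhs)
{
   using VT = Ratio<1, RatioAdd<U1, U2>::den>;
   auto val = lhs.value() * VT::den / U1::den * U1::num +
              rhs.value() * VT::den / U2::den * U2::num;
   return Duration<decltype(val), VT>(val);
}

// with compile-time operands, the sum is a constant expression:
constexpr auto c = Duration<int, Ratio<1,1000>>(42)
                 + Duration<int, Ratio<2,3>>(77);
static_assert(c.value() == 154126, "");  // 42*3 + 77*2000, in 1/3000 s units
static_assert(std::is_same<decltype(c)::UnitType, Ratio<1,3000>>::value, "");
```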

The C++ standard library’s class template std::chrono::duration uses this approach with several refinements, such as using predefined units (for example, std::chrono::milliseconds), supporting duration literals (for example, 10ms), and dealing with overflow.

The dimensions of reflective metaprogramming

Previously, we described value metaprogramming based on constexpr evaluation and type metaprogramming based on recursive template instantiation. These two options, both available in modern C++, clearly involve different methods driving the computation. It turns out that value metaprogramming can also be driven in terms of recursive template instantiation, and, before the introduction of constexpr functions in C++ 11, that was the mechanism of choice. For example, the following code computes a square root of an integer using recursive instantiation:

meta/sqrt1.hpp
// primary template to compute sqrt(N) 
template<int N, int LO=1, int HI=N> 
struct Sqrt { 
   // compute the midpoint, rounded up 
   static constexpr auto mid = (LO+HI+1)/2;

   // search a not too large value in a halved interval 
   static constexpr auto value = (N<mid*mid) ? Sqrt<N,LO,mid-1>::value
                                             : Sqrt<N,mid,HI>::value;
}; 
// partial specialization for the case when LO equals HI 
template<int N, int M>
struct Sqrt<N,M,M> { 
   static constexpr auto value = M;
};

This metaprogram uses much the same algorithm as our integer square root constexpr function, successively halving an interval known to contain the square root. However, the input to the metafunction is a nontype template argument instead of a function argument, and the “local variables” tracking the bounds to the interval are also recast as nontype template arguments. Clearly, this is a far less friendly approach than the constexpr function, but we will nevertheless analyze this code later on to examine how it consumes compiler resources.

In any case, you can see that the computational engine of metaprogramming could potentially have many options. That is, however, not the only dimension in which such options may be considered. Instead, we like to think that a comprehensive metaprogramming solution for C++ must make choices along three dimensions:

  • Computation
  • Reflection, the ability to programmatically inspect features of the program
  • Generation, the ability to generate additional code for the program

We have seen two options for computation: recursive instantiation and constexpr evaluation. For reflection, we have found a partial solution in type traits. Although available traits enable quite a few advanced template techniques, they are far from covering all that is wanted from a reflection facility in the language. For example, given a class type, many applications would like to programmatically explore the members of that class.

Our current traits are based on template instantiation, and C++ could conceivably provide additional language facilities or “intrinsic” library components to produce class template instances that contain the reflected information at compile time. (Some of the traits provided in the C++ standard library already rely on some cooperation from the compiler, via nonstandard “intrinsic” operators.) Such an approach is a good match for computations based on recursive template instantiations. Unfortunately, class template instances consume a lot of compiler storage, and that storage cannot be released until the end of the compilation (trying to do otherwise would result in taking significantly more compilation time).

An alternative option, which is expected to pair well with the constexpr evaluation option for the “computation” dimension, is to introduce a new standard type to represent reflected information. (This option is now under active investigation by the C++ standardization committee.)

Creating a flexible, general, and programmer-friendly code generation mechanism in the existing C++ language remains a challenge that is being investigated by various parties. However, it is also true that instantiating templates has always been a “code generation” mechanism of sorts. In addition, compilers have become reliable enough at expanding calls to small functions inline that that mechanism can be used as a vehicle for code generation. Those observations are exactly what underlies the DotProductT example and, combined with more powerful reflection facilities, existing techniques can already achieve remarkable metaprogramming effects.

The cost of recursive instantiation

Let’s analyze the Sqrt<> template further. The primary template is the general recursive computation that is invoked with the template parameter N (the value for which to compute the square root) and two other optional parameters. These optional parameters represent the minimum and maximum values the result can have. If the template is called with only one argument, we know that the square root is at least 1 and at most the value itself.

The recursion then proceeds using a binary search technique (often called method of bisection in this context). Inside the template, we compute whether value is in the first or the second half of the range between LO and HI. This case differentiation is done using the conditional operator ?:. If mid*mid is greater than N, we continue the search in the first half; otherwise, we use the same template for the second half.

The partial specialization ends the recursive process when LO and HI have the same value M, which is the final value.

Template instantiations are not cheap: Even relatively modest class templates can allocate over a kilobyte of storage for every instance, and that storage cannot be reclaimed until compilation has completed. Let’s therefore examine the details of a simple program that uses the Sqrt template:

meta/sqrt1.cpp
#include <iostream>
#include "sqrt1.hpp"

int main() 
{ 
   std::cout << "Sqrt<16>::value = " << Sqrt<16>::value << '\n';
   std::cout << "Sqrt<25>::value = " << Sqrt<25>::value << '\n';
   std::cout << "Sqrt<42>::value = " << Sqrt<42>::value << '\n';
   std::cout << "Sqrt<1>::value =  " << Sqrt<1>::value << '\n';
}

The expression

Sqrt<16>::value

is expanded to

Sqrt<16,1,16>::value

Inside the template, the metaprogram computes Sqrt<16,1,16>::value as follows:

mid   = (1+16+1)/2
      = 9
value = (16<9*9) ? Sqrt<16,1,8>::value
                 : Sqrt<16,9,16>::value
      = (16<81)  ? Sqrt<16,1,8>::value
                 : Sqrt<16,9,16>::value
      = Sqrt<16,1,8>::value

Thus, the result is computed as Sqrt<16,1,8>::value, which is expanded as follows:

mid   = (1+8+1)/2
      = 5
value = (16<5*5) ? Sqrt<16,1,4>::value
                 : Sqrt<16,5,8>::value
      = (16<25)  ? Sqrt<16,1,4>::value
                 : Sqrt<16,5,8>::value
      = Sqrt<16,1,4>::value

Similarly, Sqrt<16,1,4>::value is decomposed as follows:

mid   = (1+4+1)/2
      = 3
value = (16<3*3) ? Sqrt<16,1,2>::value
                 : Sqrt<16,3,4>::value
      = (16<9)   ? Sqrt<16,1,2>::value
                 : Sqrt<16,3,4>::value
      = Sqrt<16,3,4>::value

Finally, Sqrt<16,3,4>::value results in the following:

mid   = (3+4+1)/2
      = 4
value = (16<4*4) ? Sqrt<16,3,3>::value
                 : Sqrt<16,4,4>::value
      = (16<16)  ? Sqrt<16,3,3>::value
                 : Sqrt<16,4,4>::value
      = Sqrt<16,4,4>::value

and Sqrt<16,4,4>::value ends the recursive process because it matches the partial specialization that catches equal high and low bounds. The final result is therefore

value = 4

Tracking all instantiations

This analysis followed the significant instantiations that compute the square root of 16. However, when a compiler evaluates the expression

(16<9*9) ? Sqrt<16,1,8>::value
         : Sqrt<16,9,16>::value

it instantiates not only the templates in the positive branch but also those in the negative branch (Sqrt<16,9,16>). Furthermore, because the code attempts to access a member of the resulting class type using the :: operator, all the members inside that class type are also instantiated. This means that the full instantiation of Sqrt<16,9,16> results in the full instantiation of Sqrt<16,9,12> and Sqrt<16,13,16>. When the whole process is examined in detail, we find that dozens of instantiations end up being generated. The total number is almost twice the value of N.

Fortunately, there are techniques to reduce this explosion in the number of instantiations. To illustrate one such important technique, we rewrite our Sqrt metaprogram as follows:

meta/sqrt2.hpp
#include "ifthenelse.hpp"

// primary template for main recursive step 
template<int N, int LO=1, int HI=N>
struct Sqrt {
   // compute the midpoint, rounded up 
   static constexpr auto mid = (LO+HI+1)/2;

   // search a not too large value in a halved interval 
   using SubT = IfThenElse<(N<mid*mid),
                           Sqrt<N,LO,mid-1>,
                           Sqrt<N,mid,HI>>;
   static constexpr auto value = SubT::value;
}; 
// partial specialization for end of recursion criterion 
template<int N, int S>
struct Sqrt<N, S, S> {
   static constexpr auto value = S;
}; 

The key change here is the use of the IfThenElse template. Remember, the IfThenElse template is a device that selects between two types based on a given Boolean constant. If the constant is true, the first type is type-aliased to Type; otherwise, Type stands for the second type. At this point it is important to remember that defining a type alias for a class template instance does not cause a C++ compiler to instantiate the body of that instance. Therefore, when we write

using SubT = IfThenElse<(N<mid*mid),
                           Sqrt<N,LO,mid-1>,
                           Sqrt<N,mid,HI>>;

neither Sqrt<N,LO,mid-1> nor Sqrt<N,mid,HI> is fully instantiated. Whichever of these two types ends up being a synonym for SubT is fully instantiated when looking up SubT::value. In contrast to our first approach, this strategy leads to a number of instantiations that is proportional to log2(N): a very significant reduction in the cost of metaprogramming when N gets moderately large.
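For reference, IfThenElse itself can be sketched as a small partial-specialization device (a plausible implementation; the book defines it in an earlier chapter), making the lazy Sqrt self-contained:

```cpp
// primary template: yield TrueType when COND is true
template<bool COND, typename TrueType, typename FalseType>
struct IfThenElseT {
   using Type = TrueType;
};

// partial specialization: COND is false, yield FalseType
template<typename TrueType, typename FalseType>
struct IfThenElseT<false, TrueType, FalseType> {
   using Type = FalseType;
};

template<bool COND, typename TrueType, typename FalseType>
using IfThenElse = typename IfThenElseT<COND, TrueType, FalseType>::Type;

// the lazy Sqrt from above:
template<int N, int LO=1, int HI=N>
struct Sqrt {
   static constexpr auto mid = (LO+HI+1)/2;
   // only the selected branch is instantiated when value is accessed:
   using SubT = IfThenElse<(N<mid*mid),
                           Sqrt<N,LO,mid-1>,
                           Sqrt<N,mid,HI>>;
   static constexpr auto value = SubT::value;
};

template<int N, int S>
struct Sqrt<N, S, S> {
   static constexpr auto value = S;
};

static_assert(Sqrt<16>::value == 4, "");
static_assert(Sqrt<42>::value == 6, "");
```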

The Sqrt<> example demonstrates that a template metaprogram can contain:

  • State variables: The template parameters
  • Loop constructs: Through recursion
  • Execution path selection: By using conditional expressions or specializations
  • Integer arithmetic

If there are no limits to the number of recursive instantiations and the number of state variables that are allowed, this is sufficient to compute anything that is computable. However, it may not be convenient to do so using templates. Furthermore, because template instantiation requires substantial compiler resources, extensive recursive instantiation quickly slows down a compiler or even exhausts the resources available. The C++ standard recommends but does not mandate that 1,024 levels of recursive instantiations be allowed as a minimum, which is sufficient for most (but certainly not all) template metaprogramming tasks.

Thus, in practice, template metaprograms should be used sparingly. There are a few situations, however, when they are irreplaceable as a tool to implement convenient templates. In particular, they can sometimes be hidden in the innards of more conventional templates to squeeze more performance out of critical algorithm implementations.

Recursive instantiation versus recursive template arguments

Consider the following recursive template:

template<typename T, typename U>
struct Doublify {
};

template<int N>
struct Trouble {
   using LongType = Doublify<typename Trouble<N-1>::LongType,
                             typename Trouble<N-1>::LongType>;
}; 

template<>
struct Trouble<0> {
   using LongType = double;
};

Trouble<10>::LongType ouch;

The use of Trouble<10>::LongType not only triggers the recursive instantiation of Trouble<9>, Trouble<8>, …, Trouble<0>, but it also instantiates Doublify over increasingly complex types. The first few expansions illustrate how quickly it grows:

Trouble<0>::LongType   // double
Trouble<1>::LongType   // Doublify<double,double>
Trouble<2>::LongType   // Doublify<Doublify<double,double>,
                       //          Doublify<double,double>>

As these expansions show, the complexity of the type description of the expression Trouble<N>::LongType grows exponentially with N. In general, such a situation stresses a C++ compiler even more than do recursive instantiations that do not involve recursive template arguments. One of the problems here is that a compiler keeps a representation of the mangled name for the type. This mangled name encodes the exact template specialization in some way, and early C++ implementations used an encoding that is roughly proportional to the length of the template-id. These compilers then used well over 10,000 characters for Trouble<10>::LongType.

Newer C++ implementations take into account the fact that nested template-ids are fairly common in modern C++ programs and use clever compression techniques to reduce considerably the growth in name encoding (for example, a few hundred characters for Trouble<10>::LongType). These newer compilers also avoid generating a mangled name if none is actually needed because no low-level code is actually generated for the template instance. Still, all other things being equal, it is probably preferable to organize recursive instantiation in such a way that template arguments need not also be nested recursively.

Enumeration values versus static constants

In the early days of C++, enumeration values were the only mechanism to create “true constants” (called constant-expressions) as named members in class declarations. With them, you could, for example, define a Pow3 metaprogram to compute powers of 3 as follows:

meta/pow3enum.hpp
// primary template to compute 3 to the Nth 
template<int N>
struct Pow3 { 
   enum { value = 3 * Pow3<N-1>::value };
};

// full specialization to end the recursion 
template<>
struct Pow3<0> {
   enum { value = 1 };
};

The standardization of C++ 98 introduced the concept of in-class static constant initializers, so that the Pow3 metaprogram could look as follows:

meta/pow3const.hpp
// primary template to compute 3 to the Nth 
template<int N>
struct Pow3 { 
   static int const value = 3 * Pow3<N-1>::value;
};

// full specialization to end the recursion 
template<>
struct Pow3<0> {
   static int const value = 1;
};

However, there is a drawback with this version: Static constant members are lvalues. So, if you have a declaration such as

void foo(int const&);

and you pass it the result of a metaprogram:

foo(Pow3<7>::value);

a compiler must pass the address of Pow3<7>::value, and that forces the compiler to instantiate and allocate the definition for the static member. As a result, the computation is no longer limited to a pure “compile-time” effect.

Enumeration values aren’t lvalues (that is, they don’t have an address). So, when you pass them by reference, no static memory is used. It’s almost exactly as if you passed the computed value as a literal.

C++ 11, however, introduced constexpr static data members, and those are not limited to integral types. They do not solve the address issue raised above, but in spite of that shortcoming they are now a common way to produce results of metaprograms. They have the advantage of having a correct type (as opposed to an artificial enum type), and that type can be deduced when the static member is declared with the auto type specifier. C++ 17 added inline static data members, which do solve the address issue raised above, and can be used with constexpr.
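For example, the Pow3 metaprogram can be recast with constexpr static data members (a sketch; since C++ 17 such members are implicitly inline, so passing them by reference no longer forces a separate out-of-class definition):

```cpp
// primary template to compute 3 to the Nth
template<int N>
struct Pow3 {
   static constexpr int value = 3 * Pow3<N-1>::value;
};

// full specialization to end the recursion
template<>
struct Pow3<0> {
   static constexpr int value = 1;
};

static_assert(Pow3<7>::value == 2187, "");
```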

Metaprogramming history

The earliest documented example of a metaprogram was by Erwin Unruh, then representing Siemens on the C++ standardization committee. He noted the computational completeness of the template instantiation process and demonstrated his point by developing the first metaprogram. He used the Metaware compiler and coaxed it into issuing error messages that would contain successive prime numbers. Here is the code that was circulated at a C++ committee meeting in 1994 (modified so that it now compiles on standard conforming compilers):

meta/unruh.cpp
// prime number computation
// (modified with permission from original from 1994 by Erwin Unruh) 

template<int p, int i>
struct is_prime {
   enum { pri = (p==2) || ((p%i) && is_prime<(i>2?p:0),i-1>::pri) };
};

template<>
struct is_prime<0,0> {
   enum {pri=1};
};

template<>
struct is_prime<0,1> {
   enum {pri=1};
};

template<int i>
struct D {
   D(void*); 
}; 

template<int i>
struct CondNull {
   static int const value = i;
};

template<>
struct CondNull<0> {
   static void* value;
};
void* CondNull<0>::value = 0;

template<int i>
struct Prime_print {      // primary template for loop to print prime numbers
   Prime_print<i-1> a;
   enum { pri = is_prime<i,i-1>::pri };
   void f() {
      D<i> d = CondNull<pri ? 1 : 0>::value;   // 1 is an error, 0 is fine
      a.f();
   }
};

template<>
struct Prime_print<1> {   // full specialization to end the loop
   enum { pri = 0 };
   void f() {
      D<1> d = 0;
   }
};

#ifndef LAST
#define LAST 18
#endif

int main() 
{ 
   Prime_print<LAST> a;
   a.f();
}

If you compile this program, the compiler prints error messages whenever the initialization of d in Prime_print<i>::f() fails. That happens exactly when i is prime: pri is then 1, so CondNull<1>::value is an int, and D<i> has only a constructor taking void*, to which an int does not convert. When pri is 0, CondNull<0>::value is a void* and the initialization succeeds. For example, on one compiler, we get (among several other messages) the following errors:

unruh.cpp:39:14: error: no viable conversion from 'const int' to 'D<17>'
unruh.cpp:39:14: error: no viable conversion from 'const int' to 'D<13>'
unruh.cpp:39:14: error: no viable conversion from 'const int' to 'D<11>'
unruh.cpp:39:14: error: no viable conversion from 'const int' to 'D<7>'
unruh.cpp:39:14: error: no viable conversion from 'const int' to 'D<5>'
unruh.cpp:39:14: error: no viable conversion from 'const int' to 'D<3>'
unruh.cpp:39:14: error: no viable conversion from 'const int' to 'D<2>'

Note: As error handling in compilers differs, some compilers might stop after printing the first error message.

The concept of C++ template metaprogramming as a serious programming tool was first made popular (and somewhat formalized) by Todd Veldhuizen in his paper “Using C++ Template Metaprograms.” Veldhuizen’s work on Blitz++ (a numeric array library for C++) also introduced many refinements and extensions to metaprogramming (and to expression template techniques).

Both the first edition of this book and Andrei Alexandrescu’s Modern C++ Design contributed to an explosion of C++ libraries exploiting template-based metaprogramming by cataloging some of the basic techniques that are still in use today. The Boost project was instrumental in bringing order to this explosion. Early on, it introduced the MPL (metaprogramming library), which defined a consistent framework for type metaprogramming, also popularized through David Abrahams and Aleksey Gurtovoy’s book C++ Template Metaprogramming.

Additional important advances have been made by Louis Dionne in making metaprogramming syntactically more accessible, particularly through his Boost.Hana library. Dionne, along with Andrew Sutton, Herb Sutter, David Vandevoorde, and others, is now spearheading efforts in the standardization committee to give metaprogramming first-class support in the language. An important basis for that work is the exploration of what program properties should be available through reflection; Matúš Chochlík, Axel Naumann, and David Sankel are principal contributors in that area.

John J. Barton and Lee R. Nackman illustrated how to keep track of dimensional units when performing computations. The SIunits library, developed by Walter Brown, was a more comprehensive library for dealing with physical units. The std::chrono component in the standard library, contributed by Howard Hinnant, deals only with time and dates.