Preprocessor

C/C++ Preprocessor Features

The C preprocessor is used by both C and C++. It is an indispensable phase in the compilation of programs in these languages, due to their separate compilation design. Most examples are in C, but this article also mentions the preprocessor in relation to C++.

PREREQUISITES — You should already…
  • know how to compile C/C++ programs & understand the compilation process.
  • have some fundamental understanding of C/C++ structure, statements and types.

Overview

Although the C preprocessor is just a text manipulator, it is still very useful, if not indispensable. It can be used to reduce or eliminate code duplication, and also to create conditional code, which can greatly enhance the portability of a program.

Here are some of the fundamental features:

Technically speaking, the preprocessor is not necessary to create any program. But omitting it from the compilation process will be inconvenient and result in manual duplication of code.

Directive List

For convenience, here is a list of all the directives, with a short description of each. You can also check out the GNU Preprocessor Manual.

Directive Description
#define Macro directive. Macros are expanded, and their existence can be tested. Generally, macros can also be created on the command line, with a compiler driver switch (often: -D‹macro›, or even: -D‹macro›=‹expansion›). The expansion text may optionally contain any number and combination of the preprocessor operators: # and ##.
#undef Removes macros created with #define (deletes them from preprocessor memory). It also applies to macros created on the command line.
#error Reports the existence of mutually exclusive macros, or incorrect values for some macros. Although never a requirement, it is good programming practice to use it when needed. It is only useful inside a conditional directive. The text after #error is not interpreted, but is generally printed by the preprocessor before it terminates the preprocessing (and the compilation as a whole). This can be in the form of a message to the programmer, explaining the “error”.
#include Includes the named file: it replaces the current line with the contents of the named file. The file can be named with paths (absolute or relative), and the path separator should always be a forward slash (/), even on Windows®. Use double quote delimiters for project include files. Use the -I‹path› switch to add additional paths for the preprocessor to search for files.
#pragma Sets compiler-specific options locally, instead of on the command line. Often used to turn warnings off, and back on again, across a range of lines. Compilers that do not recognise the syntax following #pragma should ignore it.
#if Conditional directive start. Can be followed by constant expressions, and the use of the defined preprocessor operator. The constant expressions can be compared using comparison operators, and logical operators are allowed. Can completely replace the use of #ifdef and #ifndef. Must be followed by #endif. When a non-existing macro name appears after #if, it is treated as 0. Like the C language's if(), it treats 0 as logically “false”, and any other value as logically “true”.
#elif “else if” — Has exactly the same features as #if, but cannot appear by itself, and only after a #if or another #elif. Optional.
#else Optional, and can only appear after a #if or #elif. There can be only one #else before the #endif of the matching #if, #ifdef or #ifndef.
#endif Terminates a #if, #ifdef or #ifndef.
#ifdef Checks for the existence of one macro only. Results in “true” if the macro exists.
#ifndef Checks for the absence of one macro only. Results in “true” if the macro does not exist.
#line Changes the line numbers reported by the compiler for error or warning messages. Rarely used by programmers.

NOTE: Conditional Directives

The directives #if, #ifdef, #ifndef, #elif, #else and #endif together constitute a group called the conditional directives, since they are used to conditionally output code (or not). They can nest, and should be indented to make the nesting more easily visible.
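As a minimal sketch of nested, indented conditional directives, consider the following. The names DEMO_LEVEL and DEMO_MODE are hypothetical, invented for this illustration; DEMO_LEVEL would normally be supplied on the command line.

```c
#include <assert.h>
#include <string.h>

/* DEMO_LEVEL is a hypothetical macro; it would normally be supplied
   on the command line, for example with -DDEMO_LEVEL=2. */
#if !defined DEMO_LEVEL
   #define DEMO_LEVEL 2
#endif

#if DEMO_LEVEL < 0
   #error "DEMO_LEVEL must not be negative"
#elif DEMO_LEVEL == 0
   #define DEMO_MODE "off"
#else
   #if DEMO_LEVEL >= 2              /* nested conditional, indented */
      #define DEMO_MODE "verbose"
   #else
      #define DEMO_MODE "normal"
   #endif
#endif

/* Self-check: with the default DEMO_LEVEL of 2, the mode is "verbose". */
static void demo_conditional (void) {
   assert(strcmp(DEMO_MODE, "verbose") == 0);
   }
```

Only one of the three #define DEMO_MODE lines survives preprocessing, and the #error line is skipped unless the value is negative.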

Macro Directive

The #define directive is used to create substitution text, which we call a “macro”. We use the #define macro directive to create an identifier, which is expanded to the substitution text everywhere it is used in the program.

Identifiers are not expanded inside literal strings, nor when part of a larger word (token).

The expansion of a macro is examined by the preprocessor for other macros, and it will expand those macros, whose expansion is also examined, and so forth. What it will not do, is re-expand a starting macro. This avoids recursive macros. The following will therefore not infinitely expand into X.

macros are not recursively expanded
#define X X
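The rescanning rule can be sketched as follows. The names WIDTH, HEIGHT, AREA and level are hypothetical, chosen only for this illustration.

```c
#include <assert.h>

#define WIDTH   4
#define HEIGHT  3
#define AREA    (WIDTH * HEIGHT)     /* rescanned: becomes (4 * 3) */

static int level = 10;               /* defined before the macro below */
#define level   (level + 1)          /* the inner `level` is left alone */

/* Self-check of both expansions. */
static void demo_rescan (void) {
   assert(AREA == 12);
   assert(level == 11);              /* expands once, to (level + 1) */
   }
```

The expansion of AREA is examined again, so WIDTH and HEIGHT are replaced in turn, while the self-referential level macro is expanded exactly once.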

See C Preprocessor tricks, tips and idioms and/or C Preprocessor Magic for ‘tricks’ to force re-evaluation and thus recursion. Do take note that some of their examples may include dependence on non-standard GCC extensions. The Boost Preprocessor Library also provides the means to iterate using the preprocessor.

Symbolic Constants

Since neither C nor C++98 offers proper symbolic constants, we have to abstract the concept by simulating a symbolic constant with a macro, for example:

symbolic constant example
#define PI 3.14159265358979323846

In C++11, you could (and should) use:

C++11 symbolic constant example
constexpr double PI = 3.14159265358979323846;

The C++ version can also be placed in a header file, so there is no downside.

NOTE: Optional Mathematical Constants

Compilers like GCC, Clang and MSVC support a convention that allows you to create a macro called _USE_MATH_DEFINES before you #include <cmath> (in C++, or in C: #include <math.h>). This will provide, for example, M_PI for pi, amongst others.

Inline Generic Functions

Both C99 and C++ offer the inline specifier, which can be used in front of function definitions. However, the languages are statically typed, which means an inline function will only work for arguments of one particular type. In C++, you can use overloading to have functions with the same name, taking different types and/or numbers of arguments (based on the function signature). To write a simple, logically inline generic function in C, you can write macros with zero or more parameters:

macros with zero or more parameters
#define ANSI_CLS() (fprintf(stderr, "\x1B[2J\x1B[H"))
#define SQR(x)     ((x)*(x))

You use these macros as if they were functions:

using macros taking zero or more arguments
ANSI_CLS();
double d = SQR(2.5);
int i = SQR(25);
long l = SQR(25L);

The SQR() macro will work with any arithmetic type, and the result will be the same type as the argument. In C++, we combine inline and template to get the same benefits, but under the auspices of the C++ compiler, which can check syntax:

C++ alternative to macros with parameters
template <typename T>
   inline auto sqr (T x) -> T {
      return x * x;
      }

NOTE: Function Definition Syntax in C++11

Reminder that in C++11, there are two syntax forms to define or declare functions:

  • ‹return-type› ‹ident›(‹param›)
  • auto ‹ident›(‹param›) -> ‹return-type›
int main ()         { ··· }
auto main () -> int { ··· }

Both of the above syntax versions mean the same. We just chose the latter form in our sqr<> template function above, since it is only with template functions that you would ever need to use the newer form.

This is the better solution for C++, but in C you will still need to use macros that look like function calls.

Note the syntax: the opening parenthesis of the macro cannot be separated from the macro name with whitespace.
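The effect of that whitespace can be sketched as follows. The names DOUBLE, PLAIN and n are hypothetical, invented for this illustration.

```c
#include <assert.h>

static int n = 5;

#define DOUBLE(n) ((n) * 2)   /* function-like: `(` directly after the name */
#define PLAIN (n)             /* object-like: expands to the text `(n)`     */

/* Self-check of both kinds of macro. */
static void demo_macro_kinds (void) {
   assert(DOUBLE(3) == 6);    /* parameter n is replaced by 3         */
   assert(PLAIN == 5);        /* becomes (n), the variable above      */
   assert(DOUBLE (4) == 8);   /* whitespace is fine when *invoking*   */
   }
```

The restriction applies only to the #define line itself; when a function-like macro is used, whitespace before the parenthesis is allowed.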

As a GCC/Clang extension, you can create SQR as follows, using a statement expression (a brace-enclosed block inside parentheses, which standard C does not allow):

Safer, but Non-portable SQR macro for GCC/Clang
#define SQR(x) ({ __typeof__(x) x_ = (x); x_ * x_; })

This ensures that in SQR(++i), the ++i is only evaluated once. Note that the parentheses around the curly braces are part of the required syntax that allows curly braces in expressions; and that this is a non-standard extension.
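The double evaluation that the portable SQR() macro suffers from can be observed with a function that counts its own calls. This is only a sketch: next(), calls and sqr_int() are hypothetical names, and sqr_int() is a plain C99 inline function, so it works for int only.

```c
#include <assert.h>

#define SQR(x) ((x) * (x))           /* the portable macro from the text */

static int calls = 0;                /* hypothetical call counter */
static int next (void) { return ++calls; }

/* C99 inline function: evaluates its argument once, but for int only. */
static inline int sqr_int (int x) { return x * x; }

/* Self-check: the macro calls next() twice, the function only once. */
static void demo_sqr (void) {
   int a = SQR(next());              /* next() runs twice: 1 * 2 */
   assert(a == 2 && calls == 2);
   int b = sqr_int(next());          /* next() runs once:  3 * 3 */
   assert(b == 9 && calls == 3);
   }
```

A function call is used instead of ++i here, because SQR(++i) with the portable macro modifies i twice without sequencing, which is undefined behaviour.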

Include/Header Files

The #include directive effectively causes the preprocessor to replace that line, with the contents of a file. The new content is also scanned by the preprocessor for directives. This means that the included file may contain more #include directives, in addition to other directives. This can easily become unmaintainable, and should be avoided, or carefully controlled and documented in a project.

Include Guards

As a consequence of nested inclusions, programmers are taught, almost blindly, to add an include guard to all header files. For example, if your header file is called header.h, it should follow this pattern:

   /*!@file  header.h
   *  @brief Header file with an “include guard” pattern.
   */
   #if !defined _HEADER_H_
       #define  _HEADER_H_
   ··· // “normal” contents of the header file.
   #endif // _HEADER_H_

You may find that people are so used to #ifdef, or #ifndef, that they would write an include guard as follows:

   /*!@file  header.h
   *  @brief Header file with an “include guard” pattern.
   */
   #ifndef _HEADER_H_
   #define _HEADER_H_
   ··· // “normal” contents of the header file.
   #endif // _HEADER_H_
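The effect of a guard can be sketched in a single file by writing the guarded content twice, as if a header had been included twice. DEMO_POINT_H and struct demo_point are hypothetical names for this illustration.

```c
#include <assert.h>

/* --- first "inclusion" of a hypothetical header --- */
#if !defined DEMO_POINT_H
    #define  DEMO_POINT_H
struct demo_point { int x, y; };
#endif

/* --- second "inclusion": the guard macro exists, so this is skipped --- */
#if !defined DEMO_POINT_H
    #define  DEMO_POINT_H
struct demo_point { int x, y; };     /* would otherwise be a redefinition */
#endif

/* Self-check: the structure was defined exactly once. */
static void demo_guard (void) {
   struct demo_point p = { 1, 2 };
   assert(p.x + p.y == 3);
   }
```

Without the guard, the second structure definition would be rejected by the compiler as a redefinition.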

Instead of making up some macro name, you can use a Globally/Universally Unique Identifier (GUID/UUID), which the uuidgen tool can generate on most POSIX systems.

UUIDs and Include Guards

As you know, you should use include guards in header files, using a macro formulated from the filename. Alternatively, the macro could be a UUID (Universally Unique IDentifier). On POSIX systems, you are likely to have uuidgen available. Use either of the command lines below:

$ echo H`uuidgen` | tr -d '-' | tr '[:lower:]' '[:upper:]'
$ echo H`uuidgen` | tr -d '-' | tr 'a-z' 'A-Z'

On Windows 7 and later, you can use PowerShell to create one. The same works in PowerShell 7 (previously called PowerShell Core) on Windows, Linux, or macOS. Here is one possibility:

> "H"+[guid]::NewGuid().ToString().ToUpper().Replace("-", "")
H34C3C14AE5C45CE96C8B31312186B72

Now you can use the above UUID in a header file:

#if !defined H34C3C14AE5C45CE96C8B31312186B72
    #define  H34C3C14AE5C45CE96C8B31312186B72
···
#endif

Just do not use the above UUID! It is just an example. Create your own. Now it would not matter if you change the filename — forever, the UUID can identify this file, regardless.

Header File Places

A filename must appear after the #include directive. It can be delimited with angle brackets: <filename> or double quotes: "filename". Instead of a simple file name, a relative or absolute path can be used, although the latter is strongly discouraged. The path separator character should be a forward slash (/) and not a backslash, even in code written on/for Windows™.

When the ‹filename› is enclosed in angle brackets, the preprocessor will look for it in its standard header directories, and in directories added with a compiler option (-I for GCC). Names enclosed in double quotes should be reserved for header files that are part of the current project.

Header files should not contain code that cannot be repeated in a program. This effectively means no code that will add external definitions to the object file (no definitions with external linkage).

Predefined Macros

A number of predefined preprocessor macros are mentioned in the C99 standard. These macros may not be redefined or undefined. Some are optionally defined, i.e. they may not exist at all under certain circumstances.

Macro Description
__DATE__ Compilation date as literal string in the format: "Sep 9 2017" (not 09).
__FILE__ Literal string containing the name of the current file.
__LINE__ A literal integer (int) representing the current line number in the file.
__STDC__ Literal: 1, if the compiler conforms to the ISO C standard.
__STDC_HOSTED__ Literal: 1, if in a hosted environment, else 0.
__STDC_VERSION__ A long literal representing the current C standard in effect. It will be 199901L, for C99, and 201112L for C11 (2011).
__TIME__ A literal string ("13:05:03"), containing the time of compilation start.
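A short sketch of how these macros might be used together follows. The function describe_build() is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes a short build description into buf; returns snprintf's result. */
static int describe_build (char* buf, size_t size) {
#if defined __STDC_VERSION__
   long version = __STDC_VERSION__;
#else
   long version = 0L;                /* C90, or not a C compiler */
#endif
   return snprintf(buf, size, "%s:%d, built %s %s, C version %ld",
      __FILE__, __LINE__, __DATE__, __TIME__, version);
   }

/* Self-check of the documented string formats. */
static void demo_predefined (void) {
   char buf[512];
   assert(describe_build(buf, sizeof buf) > 0);
   assert(strlen(__DATE__) == 11);   /* "Mmm dd yyyy" */
   assert(strlen(__TIME__) == 8);    /* "hh:mm:ss"    */
   }
```

Since __STDC_VERSION__ may not exist at all, it is tested with defined before it is used.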

Three additional macros are conditionally created under certain circumstances:

Macro Description
__STDC_IEC_559__ Expands to 1, if the compiler's floating point implementation conforms to the IEC 60559 standard.
__STDC_IEC_559_COMPLEX__ Expands to 1, if the complex floating point implementation conforms to the IEC 60559 standard.
__STDC_ISO_10646__ Expands to a long literal, representing a date up to which the compiler conforms to the ISO/IEC 10646 standard for the wchar_t type.

NOTE: C++ Macro

The __cplusplus macro is reserved for use by C++ compilers, and should never be created by your code. You can use it in conditional directives, for example:
#if defined __cplusplus

All compilers create extra, compiler-specific macros, which can be used to identify the compiler being used. This can be used for conditional compilation to support more compilers, which increases portability.
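As a minimal sketch, the following identifies three common compilers by their own predefined macros (__clang__, __GNUC__ and _MSC_VER). The function compiler_name() is a hypothetical helper for this illustration.

```c
#include <assert.h>
#include <string.h>

/* Returns a short name for the compiler that built this code. */
static const char* compiler_name (void) {
#if defined __clang__                /* test first: Clang also defines __GNUC__ */
   return "clang";
#elif defined __GNUC__
   return "gcc";
#elif defined _MSC_VER
   return "msvc";
#else
   return "unknown";
#endif
   }

/* Self-check: some branch is always selected. */
static void demo_compiler (void) {
   assert(strlen(compiler_name()) > 0);
   }
```

The order of the tests matters, because some compilers define the identifying macros of others for compatibility.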

TIP: Predefined Macros with GCC or Clang

With the gcc, g++, clang or clang++ executables, you can use a command line with some options to print out all the predefined macros known by these compiler drivers; we use gcc as an example here, in a POSIX shell:

$ echo | gcc -dM -E -

The trailing dash (-) is important. This command line will not work with Cmd Prompt, but the following will work in PowerShell:

> '' | gcc -dM -E -

Before the vertical bar (or pipe) are two single quotes (an empty string). You could also use double quotes.

Miscellaneous Directives

Here are a number of directives, each specialised for a particular purpose.

Line Directive

The #line directive can be used to change the line numbers reported by the compiler for warning or error messages. This is seldom, if ever, used directly by programmers, but may be emitted by the preprocessor.

Pragma Directive

The pragma directive provides a way to control compiler-specific options. Compiler switches and options can be placed in a source file, instead of only on the command line.

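The C99 standard documents three pragmas, all dealing with floating-point behaviour:

  • #pragma STDC FP_CONTRACT ‹on-off-switch›
  • #pragma STDC FENV_ACCESS ‹on-off-switch›
  • #pragma STDC CX_LIMITED_RANGE ‹on-off-switch›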

They are rather specialised, and not often seen in general code.

NOTE: Warning Directives

Some compilers may provide a non-standard #warning directive, which will simply print a warning message when encountered. GCC and MSVC provide a #pragma message() directive, which can be used for a similar purpose (but they do not have the same syntax).

Pragma Once

The ‘#pragma once’ directive is a popular, yet non-standard pragma. Since it is non-standard, its use is not encouraged. Admittedly, it would have been very convenient if it was part of the C99 standard, since include guards are effectively mandatory anyway.

Pragma Pack

The #pragma pack, or #pragma pack(arg), is another popular, but non-standard, directive. It is supported by MSVC, and by GCC and Clang for compatibility with it. It is not part of the C99 standard, nor does the C++11 standard address packing of structures.
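A sketch of the effect follows, assuming a compiler that honours the push/pop form of #pragma pack (GCC, Clang and MSVC do). The structure names are hypothetical, and the exact size of the unpacked structure depends on the platform.

```c
#include <assert.h>
#include <stdint.h>

#pragma pack(push, 1)                /* non-standard: align members to 1 byte */
struct packed_rec { char tag; int32_t value; };
#pragma pack(pop)                    /* restore the previous setting */

struct plain_rec  { char tag; int32_t value; };

/* Self-check: no padding in the packed version (1 + 4 bytes). */
static void demo_pack (void) {
   assert(sizeof(struct packed_rec) == 5);
   assert(sizeof(struct plain_rec) >= sizeof(struct packed_rec));
   }
```

On typical platforms the unpacked structure is padded to 8 bytes, so that value sits on a 4-byte boundary.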

Prevent Warning Messages

Almost all C compilers support the suppression of warning messages from within the source code. The format, however, is non-standard, so you will have to check the vendor's documentation. GCC's version can push the current settings, and you can later restore them with a pop:

warning state push/pop pattern (gcc)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wformat"
···
#pragma GCC diagnostic pop

Function Name

Although __func__ is defined in the C99 standard, it is not a preprocessor macro. It is a C language feature that effectively creates a static const char[] variable called __func__, which is initialised with the name of the current function. Non-standard pre-C99 support includes GCC's __FUNCTION__ and __PRETTY_FUNCTION__ extension keywords (these are still not macros, since the preprocessor knows nothing about function names; at least it should not). In MSVC++, you can use __FUNCSIG__ or __FUNCDNAME__.

We mention them here, since they are most often used in conjunction with __FILE__ and __LINE__ for conditional diagnostic messages (debugging messages).
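Such a combination might look like the following sketch. WHERE is a hypothetical macro name; it has to be a macro, so that __func__, __FILE__ and __LINE__ refer to the place where it is used.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* WHERE is a hypothetical helper; it formats "function (file:line)". */
#define WHERE(buf, size) \
   snprintf((buf), (size), "%s (%s:%d)", __func__, __FILE__, __LINE__)

/* Self-check: __func__ names the function that uses the macro. */
static void demo_where (void) {
   char buf[512];
   int n = WHERE(buf, sizeof buf);
   assert(n > 0);
   assert(strncmp(buf, "demo_where (", 12) == 0);
   }
```

If WHERE were a function instead, __func__ and __LINE__ would describe the helper itself, not its caller.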

Assertions

Although the assert() macro from <assert.h> is not a built-in macro, it is nevertheless a standard macro, which depends on the NDEBUG convention. If the NDEBUG macro exists, it will expand to no code. Otherwise, it will expand into a conditional statement, and if the condition is “false”, it will call abort() declared in <stdlib.h>.

We use assert() to formalise preconditions, and sometimes postconditions in C. Pre/post-conditions are used to guard or test against programmer errors, which is why we do not want them in release code. Sometimes, we also use assert() to test invariants. Assertions play an important role in creating robust code. C++11 and C11 even added static assertions (compile-time assertions), indicating the importance of this particular aspect of computer science.

pre/post-condition example snippet
void myStrCpy (char* dest, const char* src) {

   // precondition: neither `dest`, nor `src`, can be null pointers.
   assert(dest && src);
   ···
   // postcondition: `dest` and `src` must be identical.
   assert(strcmp(dest, src) == 0);
   }

Multiple assertions are fine. Having assert() in code is often indicative of careful reasoning and planning around an algorithm. It instils confidence in the reader, that the programmer is professional and attentive. In C & C++, we should use every available feature and good coding convention to help ensure verifiable code.

simplification of what happens in assert.h.
#if !defined NDEBUG                         //←“debug compile”
   #define assert(x) ···/* some code */···
#else                                       //←“release compile”
   #define assert(x)
#endif

So, please use assert() freely — it guards against programmer errors, but has no effect in release code.
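A complete version of the pre/post-condition pattern might look like the sketch below. The function copy_text() is a hypothetical stand-in for the myStrCpy() snippet, whose body is not shown in full above.

```c
#include <assert.h>
#include <string.h>

/* copy_text is a hypothetical stand-in for the myStrCpy example. */
static char* copy_text (char* dest, const char* src) {
   assert(dest && src);                    /* precondition  */
   char* out = dest;
   const char* in = src;
   while ((*out++ = *in++) != '\0') { }    /* copy, including terminator */
   assert(strcmp(dest, src) == 0);         /* postcondition */
   return dest;
   }

/* Self-check with a buffer that is large enough. */
static void demo_copy (void) {
   char buf[8];
   assert(strcmp(copy_text(buf, "abc"), "abc") == 0);
   }
```

The function works on copies of its pointer parameters, so that the postcondition can still compare the original dest and src.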

C11 & C++11 Static Assertions

In C++11, the new static_assert keyword allows us to perform invariant checks at compile time, which we call a static assertion. It takes a boolean constant expression as first parameter, and a string literal as second parameter, which will be printed out if the expression results in false. Since C++17, the string literal may be omitted.

It is most commonly used in conjunction with templates found in the type_traits header (a lot of std::is_⋯<> templates). The static_assert declaration itself is very common in templates, as a way to give more informative messages when a parameter type does not qualify for the algorithm or implementation.

This has nothing to do with the preprocessor — it is more of a side note. Since it involves the same topic (assertions), we decided to mention it here.

The C11 version of static_assert from <assert.h>, which effectively just expands to the _Static_assert keyword, is less powerful than the C++11 version, but nevertheless useful… provided you are using C11, or C18.
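In C, a static assertion might be used as in this sketch, assuming a compiler in C11 mode or later. The conditions checked are merely examples.

```c
#include <assert.h>   /* C11: also provides the static_assert macro */
#include <limits.h>

/* Both checks are made by the compiler; no code is generated. */
static_assert(CHAR_BIT == 8, "this code assumes 8-bit bytes");
_Static_assert(sizeof(long long) * CHAR_BIT >= 64,
   "long long must hold at least 64 bits");

/* The run-time counterpart, for comparison. */
static void demo_static_assert (void) {
   assert(CHAR_BIT == 8);
   }
```

Unlike assert(), a static assertion can appear at file scope, and it is not affected by NDEBUG.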

Preprocessor Operators

The preprocessor also provides a limited number of preprocessor-specific operators. These are separate from the arithmetic, comparison and logical operators, which can be utilised by #if and #elif. Apart from defined, they are generally useful in macro expansions:

Operator Description
defined Tests for the existence of macros, and can only be used after #if or #elif. Returns “true” if the operand macro does exist, otherwise “false”. The macro to be tested does not have to appear in parentheses.
# Stringisation operator (also called “stringification operator”). This can only appear in the expansion text of a #define, and places double quotes around the macro argument as written (the argument is not macro-expanded first).
## Token concatenation operator, which can only be used in macro expansion (after #define), and is often used to create new identifiers.
_Pragma Pragma operator, which can be used to create #pragma directives, since macros cannot create other preprocessor directives. _Pragma("args") expands to: #pragma  ‹args›.

NOTE: Charisation Operator

The charisation operator (#@) is a Visual C/C++™ extension, which is used to put single quotes around a macro argument. No loss, as it is not very useful.

Operators are not directives. They were introduced at the same time as the #if directive, since some of them are only useful after #if — which includes the normal logical and comparison operators as found in C and C++. Others are useful in macro expansion, so they are used after #define (the macro directive).

Defined

The defined operator can be used to test for the existence of a macro, after #if or #elif. As such, it replaces the need for #ifdef, or #ifndef, albeit with a slightly longer syntax:

#if !defined vs #ifndef
#if !defined NDEBUG   // vs: `#ifndef NDEBUG`

The advantage over the shorter #ifndef variant is that we can use the ability of #if to combine logical operators with defined:

superiority of #if illustrated
#if !defined NDEBUG || defined DEBUG || defined _DEBUG
···
#if defined __STDC__ && __STDC_VERSION__ < 199901L
···

It is therefore far more flexible than the archaic versions. But we cannot really expect programmers to type a few more characters, in the interests of consistency. Or can we? If you feel this is not significant, is that a pragmatic view, or your personal preference, based on bias and/or habit?

Stringify / Stringisation

Mechanically, this operator is very simple: it puts double quotes around the argument passed for a macro parameter, exactly as it was written (it is not macro-expanded first). Clearly, this is only useful in macros with parameters.
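The mechanics can be sketched with two small macros. STR, XSTR and ANSWER are hypothetical names; the two-level form is a common idiom for when the expansion of a macro, rather than its name, must be quoted.

```c
#include <assert.h>
#include <string.h>

#define STR(x)   #x        /* quotes the argument exactly as written  */
#define XSTR(x)  STR(x)    /* expands the argument first, then quotes */
#define ANSWER   42

/* Self-check of the resulting string literals. */
static void demo_stringise (void) {
   assert(strcmp(STR(ANSWER),  "ANSWER") == 0);
   assert(strcmp(XSTR(ANSWER), "42")     == 0);
   assert(strcmp(STR(a  +  b), "a + b")  == 0);   /* whitespace runs collapse */
   }
```

In XSTR, the parameter is not an operand of #, so ANSWER is expanded to 42 before STR gets to quote it.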

In the following example, the DBGVAR() macro uses stringisation and implicit literal string concatenation to print the name and value of a variable, with a given format, to stderr. It will only do this for a debug compile.

stringification operator in debugging code
#if !defined NDEBUG
   #define DBGVAR(v,f) \
      fprintf(stderr, #v " = %" #f "\n", v) 
#else
   #define DBGVAR(v,f)
#endif
int i = 123;  double d = 4.56;  const char* s = "ABCDEF";
DBGVAR(i,d);  DBGVAR(d,.2f);    DBGVAR(s,s);

We can expand the above example, to also print the current filename and line number:

debugging code with stringification and predefined macros
#if !defined NDEBUG
   #define DBGVAR(v,f)                          \
      fprintf(stderr, #v " = %" #f " (%s:%d)\n", \
         v, __FILE__, __LINE__)
#else
   #define DBGVAR(v,f)
#endif

The following variant also works. The additional benefit is that we can align some of the output:

alternative output with formatting (alignment)
   #define DBGVAR(v,f)                             \
      fprintf(stderr, "%-12s = %" #f " (%s:%d)\n", \
         #v, v, __FILE__, __LINE__)

Although the above examples concern debugging output, the code can be used for any purpose where you want to create a literal string with a macro. If you do not want to use fprintf() in C++, you can use std::cerr and insertion operators. Formatting would not be necessary (but you will have less control):

c++ output and formatting alternative
   #define DBGVAR(v)                                   \
      std::cerr                                        \
         << std::setw(12) << std::left << #v " = "     \
         << v << " (" __FILE__ ", " << __LINE__ << ')' \
         << std::endl

The C++-friendly macro above uses features from <iostream>, and <iomanip>.

Token Concatenation

This is a simple preprocessor operator, which just concatenates two tokens. These tokens are the arguments passed to a macro with parameters. The following example is not useful, but simply illustrates the mechanics:

simple token concatenation illustrated
extern void FA(void);
extern void FB(void);
#define CALL(a,b) a##b()
CALL(F,A);  CALL(F,B);

An argument that is an operand of ## is not macro-expanded before the concatenation takes place. The following will therefore not work: the X is concatenated as written with A, or B, producing the single token XA, or XB, and the X is never expanded to become F:

creation of tokens is not expansion
#define X F
CALL(X,A);  CALL(X,B);
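A common way around this is an extra level of macro, so that the arguments are expanded before they reach ##. The sketch below uses the hypothetical names CAT and CAT_, and gives FA() and FB() bodies so that the result can be checked.

```c
#include <assert.h>

static int FA (void) { return 1; }
static int FB (void) { return 2; }

#define CAT_(a,b)  a##b          /* pastes the arguments as written   */
#define CAT(a,b)   CAT_(a,b)     /* expands a and b, then pastes them */
#define X F

/* Self-check: X becomes F before the concatenation. */
static void demo_cat (void) {
   assert(CAT(X,A)() == 1);      /* X -> F, then F##A -> FA */
   assert(CAT(X,B)() == 2);      /* X -> F, then F##B -> FB */
   }
```

In CAT, the parameters are not operands of ##, so they are fully expanded before CAT_ concatenates them.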

Token concatenation can be useful to create macros that attempt to emulate C++ templates, in particular, for generating the equivalent of template functions. Although the following code is simple, you should recognise that the algorithms are identical. The only differences are in the function names, and in the types of variables and parameters:

repetition of algorithm, with only type changes
void iswap (int* a, int* b) {
   int t = *a; *a = *b; *b = t;
   }
void dswap (double* a, double* b) {
   double t = *a; *a = *b; *b = t;
   }

For every new type we want to swap, we have to duplicate the algorithm, and change the types (and names). If we use a macro to generate the functions, then the algorithm will not be duplicated:

single algorithm instance as macro
#define MAKE_SWAP(name,type)        \
   void name (type* a, type* b) {   \
      type t = *a; *a = *b; *b = t; \
      }
MAKE_SWAP(iswap, int)
MAKE_SWAP(dswap, double)
MAKE_SWAP(llswap, long long)

You could also make the macro expand into an inline function, if the algorithms are as brief as the one above. If not, you would still have to declare the functions, which you can either do manually, or create another macro that will expand into a declaration. For example:

macro to declare “generic” function/algorithm
#define DECL_SWAP(name,type) extern void name (type*, type*)
DECL_SWAP(iswap, int);
DECL_SWAP(dswap, double);
DECL_SWAP(llswap, long long);
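As a variation on the macros above, token concatenation can build the function name from the type name, so that only one argument is needed. DEFINE_SWAP and the swap_ prefix are hypothetical choices for this sketch.

```c
#include <assert.h>

/* DEFINE_SWAP is a hypothetical variation: ## builds the name swap_<type>. */
#define DEFINE_SWAP(type)                        \
   static void swap_##type (type* a, type* b) {  \
      type t = *a; *a = *b; *b = t;              \
      }

DEFINE_SWAP(int)       /* defines swap_int()    */
DEFINE_SWAP(double)    /* defines swap_double() */

/* Self-check of both generated functions. */
static void demo_swap (void) {
   int i = 1, j = 2;
   double x = 1.5, y = 2.5;
   swap_int(&i, &j);
   swap_double(&x, &y);
   assert(i == 2 && j == 1);
   assert(x == 2.5 && y == 1.5);
   }
```

The trade-off is that the type must be a single token: a type like long long would need a typedef first, which is why the two-argument form above is more general.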

This is a very useful technique, but surprisingly, not utilised as much as it probably could be. In C++, however, you should absolutely avoid using this technique, since it has template functions, which are handled by the compiler and not the simplistic text-manipulator which we call the C preprocessor.

Here is a C++ solution, using two features: inline, and template functions. These together absolve us from reliance on the preprocessor, with only advantages and no disadvantages:

template <typename T>
inline auto swap (T& a, T& b) noexcept -> void {
   T t = std::move(a);
   a = std::move(b);
   b = std::move(t);
   }

It requires the std::move() cast from <utility> for maximum benefit across diverse types. Or you could just use the C++11 standard std::swap(), also from <utility>. It is the principle here that is important: in C++, we use inline and template functions to create “fast/generic functions”, not the preprocessor.

Language Extensions

Macro expansions are examined for macros, and if found, further expanded. However, the preprocessor will not recursively expand originating macros (it guards against infinite recursion). The result of the following expansion will simply be MACRO:

non-recursive expansion guaranteed
#define MACRO MACRO
MACRO

The preprocessor can be used to create ‘fake’ keywords and superficial language extensions. This is questionable, but here are a few examples:

example not to be followed
#define if if (
#define then ) {
#define end }
#define else } else {
#define elseif  } else if (
#define repeat do {
#define until(x) } while(!(x))
···
   if x <= 3 then
      int c;
      printf("OK, here goes: ");
      repeat
         c = getchar();
      until (c == '\n' || c == EOF);
   else
      fprintf(stderr, "Bailing.\n");
      exit (EXIT_FAILURE);
   end

Generally speaking, you should not write code like this, appealing as it may seem. One other possibility, slightly less questionable, involves having only a single return, even if your logic requires multiple exit points:

ensure single exit point
#define return EXIT: return
#define RETURN goto EXIT
···
   if (something) RETURN;
   working();
   switch (x) {
      case 1: dothis(); RETURN;
      case 2: dothat(); RETURN;
      ···
      }
   ···
   return;

You could get more elaborate:

single exit point with result
#define RETURN   EXIT: return result
#define LEAVE(x) do{result=x;goto EXIT;}while(0)
···
   int result = 0;
   if (something) LEAVE(1); else otherstuff();
   working();
   switch (x) {
      case 1: dothis(); LEAVE(2);
      case 2: dothat(); LEAVE(3);
      ···
      }
   ···
   RETURN;

IMPORTANT: Macros Expanding Into Blocks

If we did not put a do{⋯}while(0) around the expanded code, we would have encountered a syntax error before the else (caused by the semicolon after the expanded compound statement). This is a very common technique. When macros expand into multiple statements, or even variable definitions, we have to enclose it all in a do{⋯}while(0) statement.
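The technique can be sketched with a small macro that expands into several statements. SWAP_INT and order() are hypothetical names for this illustration.

```c
#include <assert.h>

/* Multiple statements, wrapped so that they act as one statement. */
#define SWAP_INT(a,b) \
   do { int t_ = (a); (a) = (b); (b) = t_; } while (0)

/* Puts the smaller value in *lo; returns 1 if a swap took place. */
static int order (int* lo, int* hi) {
   int swapped = 1;
   if (*lo > *hi)
      SWAP_INT(*lo, *hi);      /* one statement, so `else` still pairs */
   else
      swapped = 0;
   return swapped;
   }

/* Self-check: first call swaps, second call does not. */
static void demo_order (void) {
   int a = 7, b = 3;
   assert(order(&a, &b) == 1 && a == 3 && b == 7);
   assert(order(&a, &b) == 0 && a == 3 && b == 7);
   }
```

With plain braces instead of do{⋯}while(0), the semicolon after SWAP_INT(⋯) would end the if statement, and the else would be a syntax error.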

Variadic Macros

Variadic macros were introduced in C99, mostly to help with creating macros that call functions like printf(). Unfortunately, the standard did not deal with empty variable argument lists properly, leading to portability issues, since different compilers deal with them in a variety of ways.

This involves a new syntax, whereby macros with parameters can have an ellipsis (...) as the last macro parameter. Whatever arguments are passed to the macro from that point onwards can be expanded with a new preprocessor identifier: __VA_ARGS__.

debug message macro using variadic arguments
#define DBGMSG(fmt,...) \
   fprintf(stderr, "(%s:%d) " fmt, __FILE__, __LINE__, __VA_ARGS__)
int i = 123;
double d = 4.56;
DBGMSG("i = %d, d = %f\n", i, d);

This can be very useful, as long as you do pass at least one argument after the format string. If you really have nothing to pass, do the following:

dealing with no argument in variadic expansion
DBGMSG("Whatever here%s\n", ""); // pass `…%s…` and empty string `""`.

Passing no arguments at all, will result in a trailing comma in the argument list of the macro expansion, which will be reported as a syntax error by the compiler. The standard does not specify what preprocessors must do in this situation, and while some implementations will work as expected when no arguments are given, others do not. The most portable solution is thus to always pass at least one argument.
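Another portable option, when nothing has to be inserted after the format string, is to let the format string itself be part of the variadic arguments, so that they are never empty. FORMAT_INTO is a hypothetical macro name, and it writes to a buffer rather than to stderr, only so that the result can be checked.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* FORMAT_INTO is a hypothetical macro: the format string is part of
   __VA_ARGS__, so the variadic part is never empty.
   `buf` must be an array, since sizeof is applied to it. */
#define FORMAT_INTO(buf, ...) \
   snprintf((buf), sizeof (buf), __VA_ARGS__)

/* Self-check with and without extra arguments. */
static void demo_variadic (void) {
   char buf[64];
   FORMAT_INTO(buf, "no arguments");
   assert(strcmp(buf, "no arguments") == 0);
   FORMAT_INTO(buf, "i = %d, d = %.2f", 123, 4.56);
   assert(strcmp(buf, "i = 123, d = 4.56") == 0);
   }
```

The limitation is that the macro cannot place anything between the format string and the remaining arguments.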

Richard Hansen provides a standard-conforming alternative to the GCC extension that removes the comma before an empty __VA_ARGS__. Here is a modified extract that can handle up to 19 arguments:

Richard Hansen empty variadic arguments solution
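/* ARG1 is used further on, but is missing from this extract; the
   definition here is an assumption that follows the same pattern. */
#define ARG1(...) ARG1_(__VA_ARGS__, _)
#define ARG1_(first, ...) first
#define REST(...) REST_(NUM(__VA_ARGS__), __VA_ARGS__)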
#define REST_(qty, ...) REST_2(qty, __VA_ARGS__)
#define REST_2(qty, ...) REST__##qty(__VA_ARGS__)
#define REST__1(first)
#define REST__GE2(first, ...) , __VA_ARGS__
#define NUM(...) SELECT_20TH(__VA_ARGS__,\
   GE2, GE2, GE2, GE2, GE2, GE2, GE2, GE2, GE2, GE2,\
   GE2, GE2, GE2, GE2, GE2, GE2, GE2, GE2,   1,   _)
#define SELECT_20TH(\
   a1,   a2,  a3,  a4,  a5,  a6,  a7,  a8,  a9, a10, \
   a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, ...) a20

With that in hand, we can enhance the DBGMSG macro above to deal with empty variadic arguments in a portable way. The format string now travels as the first variadic argument:

enhanced debug message with variadic arguments
#define DBGMSG(...) printf(ARG1(__VA_ARGS__) " (%s:%d)" \
   REST(__VA_ARGS__), __FILE__, __LINE__)

With the new DBGMSG macro in hand, we do not have to pass an empty string. Consider the following three macro invocations:

example variadic debug macro invocations
DBGMSG();
DBGMSG("Got there\n");
DBGMSG("i = %d, d = %f\n", i, d);

They will result in the following lines of code, all of which are legal and standard C:

example variadic debug macro expansions
printf( " (%s:%d)" , "pputils.hpp", 5);
printf("Got there\n" " (%s:%d)" , "pputils.hpp", 6);
printf("i = %d, d = %f\n" " (%s:%d)" , i, d, "pputils.hpp", 7);
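For reference, here is a self-contained sketch of the same technique that can be compiled and checked. It writes into a buffer with snprintf() instead of printing. LOCMSG is a hypothetical name, and the ARG1 helper macros are an assumption, written to follow the pattern of the extract.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define ARG1(...) ARG1_(__VA_ARGS__, _)       /* assumed helper */
#define ARG1_(first, ...) first
#define REST(...) REST_(NUM(__VA_ARGS__), __VA_ARGS__)
#define REST_(qty, ...) REST_2(qty, __VA_ARGS__)
#define REST_2(qty, ...) REST__##qty(__VA_ARGS__)
#define REST__1(first)
#define REST__GE2(first, ...) , __VA_ARGS__
#define NUM(...) SELECT_20TH(__VA_ARGS__, \
   GE2, GE2, GE2, GE2, GE2, GE2, GE2, GE2, GE2, GE2, \
   GE2, GE2, GE2, GE2, GE2, GE2, GE2, GE2,   1,   _)
#define SELECT_20TH( \
   a1,   a2,  a3,  a4,  a5,  a6,  a7,  a8,  a9, a10, \
   a11, a12, a13, a14, a15, a16, a17, a18, a19, a20, ...) a20

/* LOCMSG is a hypothetical name: format, then " (file:line)". */
#define LOCMSG(buf, size, ...) \
   snprintf((buf), (size), ARG1(__VA_ARGS__) " (%s:%d)" \
      REST(__VA_ARGS__), __FILE__, __LINE__)

/* Self-check with and without arguments after the format string. */
static void demo_locmsg (void) {
   char buf[256];
   LOCMSG(buf, sizeof buf, "i = %d, d = %.1f", 1, 2.5);
   assert(strncmp(buf, "i = 1, d = 2.5 (", 16) == 0);
   LOCMSG(buf, sizeof buf, "hello");
   assert(strncmp(buf, "hello (", 7) == 0);
   }
```

NUM() selects 1 or GE2 depending on the number of arguments, and REST_2 pastes that onto REST__ to pick the helper that either drops everything, or emits a comma followed by the remaining arguments.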

C++ and the Preprocessor

We do not need the preprocessor as much from C++11, since instead of using #define to create symbolic constants, we now have constexpr (and we can even create constexpr functions, which have no real equivalent in the preprocessor). Instead of using macros with parameters to avoid function call overhead, we can use inline functions. When we combine inline with function templates, macros with arguments offer no benefits whatsoever.

This means we mostly use the preprocessor in C++ for file inclusion, and conditional compilation. When we declare C functions in a C library, to be used in C++, we must prevent C++ from decorating the function names. For this, C++ provides a language linkage syntax. The problem is that C does not understand this syntax, so we must only enable it when compiling with C++:

c++ inclusion guard and language linking
///@file someheader.h
#if !defined _SOMEHEADER_H_
    #define  _SOMEHEADER_H_
    #if defined __cplusplus
        extern "C" {
    #endif
    // C function declarations go here.
    #if defined __cplusplus
       } // terminate `extern "C"` block.
    #endif
#endif // _SOMEHEADER_H_

If you also compile the function definitions with C++, this is not necessary. The function definitions could instead start with extern "C", but that gets tedious, since you would have to do it conditionally. So, this pattern is mostly used as suggested: when declaring the functions of a C library in a C header, for use by C++ code. Implementations generally already do this for you in the C standard library headers, whether you #include them as <cstdio> or as <stdio.h>, for example.

Consider the following macro called SQR(), that takes one argument. It can be used with any nu­mer­ic type. Due to the expansion, if you give an int argument, the result is int; if you give a double ar­gu­ment, the result is double. That is very convenient:

traditional macro-as-fast-function
#define SQR(x) ((x) * (x))
···
int i = SQR(5);
double d = SQR(5.0);
long l = SQR(5L);

To avoid such macros in C++, yet still have the same convenience, we must resort to a com­bi­na­tion of inline and function templates. The code would still appear in a header file, as the above macro would:

c++ inline and template function combination
template <typename T>
inline auto SQR(T x) noexcept -> T {
   return x * x;
   }
···
int i = SQR(5);
double d = SQR(5.0);
long l = SQR(5L);

It is better than the macro version, in that the C++ compiler, which uses syntax checking, is involved, and not the preprocessor, which blindly replaces or expands text. Debuggers can show you the lines in the function when single-stepping code (and you can make breakpoints in the function). There is no excuse to not use this pattern — just do not use all-capitals for template functions; only macro names should be all-capital letters in production code.
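One concrete difference is how often the argument is evaluated. The sketch below is illustrative only: SQR_MACRO, sqr, next_value and the two counting functions are hypothetical names, chosen so that the macro and the template can live side by side.

```cpp
// Macro version from the text, renamed so both forms can coexist.
#define SQR_MACRO(x) ((x) * (x))

// Template version, in lower case as recommended above.
template <typename T>
inline auto sqr(T x) noexcept -> T {
   return x * x;
   }

// Hypothetical helper with a visible side effect: it counts its own calls.
int next_calls = 0;
int next_value() {
   ++next_calls;
   return 3;
   }

// How often next_value() runs when squared through the macro.
int calls_via_macro() {
   next_calls = 0;
   int result = SQR_MACRO(next_value());  // ((next_value()) * (next_value()))
   (void)result;
   return next_calls;
   }

// How often next_value() runs when squared through the template.
int calls_via_template() {
   next_calls = 0;
   int result = sqr(next_value());        // argument evaluated once, then copied
   (void)result;
   return next_calls;
   }
```

The macro pastes its argument text twice, so calls_via_macro() reports two calls, while calls_via_template() reports one. With an argument such as i++ the macro form would even modify i twice in one expression, which is undefined behaviour.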

C++ Using Declarations

Before C++17, it is cumbersome to create scoped using std::ident; declarations in C++, since every identifier must have its own using. Only from C++17 onwards is the following comma-separated syntax allowed:

using std::cin, std::cout, std::endl;

With earlier standards, we instead have to write:

using std::cin; using std::cout; using std::endl;
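C++17 accepts a comma-separated list of names in a single using-declaration, so the long form is only needed for earlier standards. As a small sketch, conditional compilation on the standard __cplusplus macro can select the form; join_words is a hypothetical function used purely for demonstration.

```cpp
#include <sstream>
#include <string>

// Hypothetical demo: joins two words, using names brought into
// function scope by using-declarations.
std::string join_words(const std::string& a, const std::string& b) {
#if __cplusplus >= 201703L
   using std::ostringstream, std::string;        // C++17: several names at once
#else
   using std::ostringstream; using std::string;  // earlier: one name per using
#endif
   ostringstream out;
   out << a << ' ' << b;
   string result = out.str();
   return result;
   }
```

Both branches introduce the same names, so the body of the function does not change with the language standard in use.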

We could use the preprocessor to create a macro, e.g. USE or USING, which could be invoked as follows, providing enough abstraction while still saving a significant amount of code:

USE(std::, cin, cout, endl)

Or even more minimalistic maybe:

STD(cin, cout, endl)

Unfortunately, preprocessor recursion is non-trivial, as expounded in C Preprocessor Tricks and C Pre-Processor Magic. Extracting the relevant parts from Paul Fultz II's cloak.h code (with some renaming and small modifications) leads to the following minimal set of macros required to achieve the above:

Paul Fultz II macros to create C++ using declarations
#define EVAL(...) EVAL1024(__VA_ARGS__)
#define EVAL1024(...) EVAL512(EVAL512(__VA_ARGS__))
#define EVAL512(...) EVAL256(EVAL256(__VA_ARGS__))
#define EVAL256(...) EVAL128(EVAL128(__VA_ARGS__))
#define EVAL128(...) EVAL64(EVAL64(__VA_ARGS__))
#define EVAL64(...) EVAL32(EVAL32(__VA_ARGS__))
#define EVAL32(...) EVAL16(EVAL16(__VA_ARGS__))
#define EVAL16(...) EVAL8(EVAL8(__VA_ARGS__))
#define EVAL8(...) EVAL4(EVAL4(__VA_ARGS__))
#define EVAL4(...) EVAL2(EVAL2(__VA_ARGS__))
#define EVAL2(...) EVAL1(EVAL1(__VA_ARGS__))
#define EVAL1(...) __VA_ARGS__

#define PASS(...) __VA_ARGS__
#define NIL()
#define SEMICOLON() ;

#define DEFER1(n) n NIL()
#define DEFER2(n) n NIL NIL()()
#define DEFER3(n) n NIL NIL NIL()()()
#define DEFER4(n) n NIL NIL NIL NIL()()()()
#define DEFER5(n) n NIL NIL NIL NIL NIL()()()()()
#define DEFER6(n) n NIL NIL NIL NIL NIL NIL()()()()()()
#define DEFER7(n) n NIL NIL NIL NIL NIL NIL NIL()()()()()()()
#define DEFER8(n) n NIL NIL NIL NIL NIL NIL NIL NIL()()()()()()()()

#define CAT(a, ...) a ## __VA_ARGS__
#define CAT3(a, b, ...) a ## b ## __VA_ARGS__

#define ARG1(...) ARG1_(__VA_ARGS__, _) /* modified */
#define ARG1_(first, ...) first          /* added */
#define SECOND(a, b, ...) b
#define IS_PROBE(...) SECOND(__VA_ARGS__, 0)
#define PROBE() ~, 1

#define NOT(x) IS_PROBE(CAT(_NOT_, x))
#define _NOT_0 PROBE()
#define BOOL(x) NOT(NOT(x))

#define IF(c) _IF(BOOL(c))
#define _IF(c) CAT(_IF_,c)
#define _IF_0(...)
#define _IF_1(...) __VA_ARGS__

#define HAS_ARGS(...) BOOL(ARG1(_END_OF_ARGS_ __VA_ARGS__)(0))
#define _END_OF_ARGS_(...) BOOL(ARG1(__VA_ARGS__))
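The NOT, BOOL and IF macros all rest on the "probe" trick: pasting a prefix onto the argument yields either an ordinary identifier or a macro that expands to two arguments, and picking the second argument tells the two cases apart. The sketch below repeats that core under hypothetical DEMO_ names so that it can be checked in isolation; the extra ~ in DEMO_IS_PROBE is an addition made here so that the variadic part of DEMO_SECOND is never empty.

```cpp
#define DEMO_CAT(a, ...) a ## __VA_ARGS__
#define DEMO_SECOND(a, b, ...) b
#define DEMO_IS_PROBE(...) DEMO_SECOND(__VA_ARGS__, 0, ~)
#define DEMO_PROBE() ~, 1

// DEMO_NOT(0) pastes to DEMO_NOT_0, which is a probe, giving 1.
// Anything else pastes to an unknown identifier, giving 0.
#define DEMO_NOT(x) DEMO_IS_PROBE(DEMO_CAT(DEMO_NOT_, x))
#define DEMO_NOT_0 DEMO_PROBE()
#define DEMO_BOOL(x) DEMO_NOT(DEMO_NOT(x))

// DEMO_IF(c)(text) keeps `text` when c is non-zero, else drops it.
#define DEMO_IF(c) DEMO_IF_IMPL(DEMO_BOOL(c))
#define DEMO_IF_IMPL(c) DEMO_CAT(DEMO_IF_, c)
#define DEMO_IF_0(...)
#define DEMO_IF_1(...) __VA_ARGS__

static_assert(DEMO_NOT(0) == 1, "zero is negated to one");
static_assert(DEMO_NOT(7) == 0, "anything else is negated to zero");
static_assert(DEMO_BOOL(42) == 1, "non-zero collapses to one");

// Only the first parenthesised fragment survives preprocessing.
constexpr int demo_pick = 10 DEMO_IF(1)(+ 1) DEMO_IF(0)(+ 100);
static_assert(demo_pick == 11, "10 + 1");
```

Because everything is resolved by the preprocessor, the static_assert lines double as the test: the file only compiles if the macros behave as described.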

We need one more set of macros to create the using declarations. Again heavily based on the cloak.h code, we end up with:

macro mapping a list of identifiers to a single namespace
#define MAP_WITH_NS(op,ns,...) \
  IF(HAS_ARGS(__VA_ARGS__))(EVAL(MAP_WITH_NS_(op,ns, __VA_ARGS__)))
#define MAP_WITH_NS_(op,ns,cur_val, ...) \
  op(cur_val,ns) \
  IF(HAS_ARGS(__VA_ARGS__))( \
    DEFER2(_MAP_WITH_NS_)()(op, ns, __VA_ARGS__) \
  )
#define _MAP_WITH_NS_() MAP_WITH_NS_

Only now can we create macros like USE and STD below:

macros to expand using declarations
#define MAKE_USE(id,ns) using ns id;
#define USE(ns, ...) MAP_WITH_NS(MAKE_USE, ns, __VA_ARGS__)
#define STD(...) USE(std ::, __VA_ARGS__)

With the above macros in hand, this:

   USE(std::, cin, cout, endl)
   STD(cin, cout, endl)

expands to the following lines which, apart from whitespace, are semantically identical:

   using std:: cin; using std:: cout; using std:: endl;
   using std :: cin; using std :: cout; using std :: endl;

You can find a complete listing of the above macros in pputils.hpp.

TIP: Use GCC to Only Preprocess a File

The -E option to gcc (or g++, or clang) stops after preprocessing and writes the result to standard output; adding -P suppresses the linemarkers in that output. To test pputils.hpp, run: gcc -DTEST_PPUTILS -E -P pputils.hpp

Miscellaneous Ideas

We can do interesting things with the preprocessor, like abstracting a function call as a variable:

abstracting function call as a variable (C)
#define VAR (*varfunc())
int* varfunc() {
   static int data = 0;
   ···// do anything here
   return &data;
   }
VAR = 123;             *varfunc() = 123;
int* p = &VAR;         int* q = &*varfunc();
printf("%d\n", *p);    printf("%d\n", *q);

For all practical purposes, VAR is a variable (lvalue, to be exact). We can even “take its address”. But since it involves a function in implementation, we can do any additional tasks we desire when the “variable” is ac­ces­sed. In C++, we can go one better: return an int& (int reference):

abstracting function call as a variable (C++)
#define VAR (varfunc())
int& varfunc () {
   static int data{};
   ···// do anything here
   return data;
   }
VAR = 123;             varfunc() = 123;
int* p = &VAR;         int* q = &varfunc();
cout << *p << '\n';    cout << *q << '\n';
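As one possible "additional task", the function behind the macro could count how often the pseudo-variable is touched. This is only a sketch; COUNTED_VAR, counted_varfunc, access_count and demo_accesses are hypothetical names.

```cpp
#define COUNTED_VAR (counted_varfunc())

int access_count = 0;      // how often the "variable" was accessed

int& counted_varfunc() {
   static int data{};
   ++access_count;         // the extra work done on every access
   return data;
   }

// Writes once and reads once through the macro, then reports the
// number of accesses (or -1 if the value did not round-trip).
int demo_accesses() {
   access_count = 0;
   COUNTED_VAR = 123;         // counted_varfunc() = 123;
   int copy = COUNTED_VAR;    // int copy = counted_varfunc();
   return copy == 123 ? access_count : -1;
   }
```

Here demo_accesses() returns 2: one access for the assignment and one for the read. Code that merely uses COUNTED_VAR never sees the bookkeeping.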

We could also, for example, create “safe arrays”, provided we are content with treating the pa­ren­the­ses as subscript. Although the code below will always check for out of bounds indexes, it could have been enclosed in conditional compilation directives, so that after testing, and in release compiles, the check will not be performed.

abstracting range-checked arrays
#define SZE 4
#define ARR(n) (*arrfunc(n))

int* arrfunc(size_t n) {
   static int data[SZE] = { 11, 22, 33, 44 };
   if (n >= sizeof(data)/sizeof(*data)) {
      static int dummy;
      fprintf(stderr, "index out of bounds\n");
      return &dummy;
      }
   return data + n;
   }
···
for (int i = 0; i < SZE * 2; ++i)
   ARR(i) = i * 2;
for (int i = 0; i < SZE * 2; ++i)
   printf("ARR(%d) = %d\n", i, ARR(i));
int *p = &ARR(2);
*p = 123;
printf("ARR(2) = %d\n", ARR(2));
for (int i = 0; i < SZE * 2; ++i)
   *p++ = i; // might crash; still *BAD*.
for (int i = 0; i < SZE * 2; ++i)
   (&ARR(0))[i] = i; // might crash; still *BAD*.

Again, as long as we can look past ARR(n) = X; (treating parentheses as subscript), ARR can be treat­ed as an array, except that if we want a pointer to an element, we have to explicitly take the ad­dress of it: &ARR(i). And once you have a pointer, all bets are off, like with any pointer — no safety checks any more.
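The conditional variant mentioned earlier could look roughly like the following sketch, which keeps the range check only while NDEBUG is not defined. CHECKED_ARR and checked_arrfunc are hypothetical names, and the choice of NDEBUG as the switch is an assumption made here; any project-specific macro would work the same way.

```cpp
#include <cstddef>
#include <cstdio>

#define CHECKED_ARR(n) (*checked_arrfunc(n))

int* checked_arrfunc(std::size_t n) {
   static int data[4] = { 11, 22, 33, 44 };
#ifndef NDEBUG
   // Debug compiles only: divert out-of-bounds accesses to a dummy element.
   if (n >= sizeof(data) / sizeof(*data)) {
      static int dummy;
      std::fprintf(stderr, "index %lu out of bounds\n",
                   static_cast<unsigned long>(n));
      return &dummy;
      }
#endif
   return data + n;   // with -DNDEBUG, the check above is not compiled at all
   }
```

In a release compile the function reduces to plain pointer arithmetic, so an out-of-bounds index is then undefined behaviour again, exactly as with a raw array.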

There are much better ways to achieve this in C++, using custom classes, or the standard con­tain­ers. But, even so, we could improve the above code again with references, using several C++11 features. This approach is shown below as a com­plete pro­gram:

arrfunc.cpp: Function as Array
/*!@file  arrfunc.cpp
*  @brief Function as Array
*/
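#include <cstddef>   // size_t
#include <cstdio>    // printf
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <stdexcept> // std::runtime_error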

#define ARR(n) (arrfunc(n))

template <typename T, size_t sz>
inline constexpr auto SIZE (T(&)[sz]) noexcept -> size_t {
   return sz;
   }

auto arrfunc (size_t n) -> int& {
   static int data[]{ 11, 22, 33, 44 };
   if (n >= SIZE(data))
      throw std::runtime_error("Index out of bounds");  
   return data[n];
   }

int main () {
   using std::cout; using std::endl;

   // We cannot statically, nor dynamically, determine the size
   // of the array inside `arrfunc()`, so we hard-code it here.
   //
   constexpr auto SZE = 4;

   try{ // deliberately go out of bounds
      for (int i = 0; i < SZE * 2; ++i)
         ARR(i) = i * 2;
      }
   catch (std::exception& ex) {
      std::cerr << "(1) " << ex.what() << endl;
      }
   try { // deliberately go out of bounds
      for (int i = 0; i < SZE * 2; ++i)
         printf("ARR(%d) = %d\n", i, ARR(i));
      }
   catch (std::exception& ex) {
      std::cerr << "(2) " <<  ex.what() << endl;
      }
   int *p = &ARR(2);                        //← violating our own convention,
   *p = 123;                                //  since we cannot control pointers.
   cout << "ARR(2) = " << ARR(2) << endl;

   return EXIT_SUCCESS;
   }

As a bonus, we provided a generic, constant-expression, inline function called SIZE, which returns the size of any C-style array whose definition is in scope. The name of the parameter is not used, so we omitted it (this is legal). You can, of course, also use std::extent and std::rank from the <type_traits> meta-programming header in the C++ standard library, but they only accept types, not variables or expressions:

example of std::rank and std::extent
cout << "\"Dimensions\" : " << std::rank<int[5]>::value << endl;
cout << "No. Elements : " << std::extent<int[5]>::value << endl;
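When only a variable is at hand, decltype can supply the type that std::extent and std::rank need. The sketch below places that next to a SIZE-style function; array_size, sample and grid are hypothetical names introduced for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

// Same idea as SIZE above, under a lower-case name.
template <typename T, std::size_t sz>
constexpr auto array_size(T (&)[sz]) noexcept -> std::size_t {
   return sz;
   }

int sample[5];      // hypothetical one-dimensional array
int grid[3][7];     // hypothetical two-dimensional array

// The function takes the variable itself...
static_assert(array_size(sample) == 5, "elements in sample");
static_assert(array_size(grid) == 3, "rows in grid");

// ...while the type traits take its type, obtained with decltype.
static_assert(std::extent<decltype(sample)>::value == 5, "");
static_assert(std::rank<decltype(grid)>::value == 2, "");
static_assert(std::extent<decltype(grid), 1>::value == 7, "");
```

The second template argument of std::extent selects the dimension, so the last line reports the number of columns in grid.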

NOTE: Alternative Member Access

The at() member function, found in many C++ collection classes that also provide an over­load­ed subscript operator, works just like the arrfunc() above, and range-checks indexes.
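A brief sketch of that behaviour with std::array follows: at() throws std::out_of_range for an invalid index, whereas operator[] performs no check. The function name at_rejects is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Returns true when at() rejects the index by throwing std::out_of_range.
bool at_rejects(std::size_t index) {
   static std::array<int, 4> data{{ 11, 22, 33, 44 }};
   try {
      data.at(index) += 0;   // range-checked access
      return false;
      }
   catch (const std::out_of_range&) {
      return true;
      }
   }
```

Index 3 is the last valid element here, so at_rejects(3) is false and at_rejects(4) is true.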

Summary

The C preprocessor, apart from being a convenient tool to avoid duplication, allows for certain abstractions. Some of these have been superseded in C++ with better direct language support. Mechanically, at all times, the preprocessor is no more than a “text in, text out” manipulator; but it is a useful tool nonetheless.

Non-Recursive Macro Expansion

Although the expansion of a macro is examined for other macros, the initial macro will not be re-expanded. This deliberately avoids the possibility of recursive macros, since there would be no easy way to stop the recursion.
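A self-referential macro shows the rule in action. In this sketch (base_value and read_base_value are hypothetical names, and the macro is deliberately lower-case so that it can shadow the variable), the name inside the expansion is left alone instead of being expanded again.

```cpp
int base_value = 10;

// From here on, `base_value` in the source is replaced by the text
// `(base_value + 1)`; the inner `base_value` is NOT expanded again.
#define base_value (base_value + 1)

int read_base_value() {
   return base_value;   // preprocessed to: return (base_value + 1);
   }
```

If the preprocessor did re-expand the macro, the replacement would never terminate; as it stands, read_base_value() simply returns 11.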

Code Duplication

Since C/C++ compilers can only compile one file at a time (called separate compilation), it is often necessary to repeat declarations, type specifications, and symbolic constants in a program consisting of multiple source files. We place such code in header files, so that we can “repeat” it where necessary by #include'ing it.

This is still just text manipulation, and programmers should not attribute some higher function or meaning to the process: it has no connection to the concept of libraries, except by convention.

Symbolic Constants

In C, #define is still required for this, since const means “read-only variable”, and does not facilitate immediate values in assembler. From C++11, programmers should exclusively use:
constexpr ‹type› ‹ident› = ‹const-expr›;
which can still be placed in header files, and has no disadvantage.

symbolic constant in c
#define PI 3.14159

symbolic constant in c++11
constexpr double PI = 3.14159;

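Using const in C will not provide you with a “proper” and efficient symbolic constant. In C++, a const integer initialised with a constant expression does qualify as one, but constexpr states the intent explicitly and also works for other types, such as double.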
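To illustrate what constexpr offers beyond a macro, the sketch below uses a constexpr constant as an array bound and inside a constexpr function that is evaluated at compile time. The names circle_area, BUFFER_SIZE and buffer are hypothetical.

```cpp
constexpr double PI = 3.14159;

// A constexpr function: usable at compile time and at run time.
constexpr double circle_area(double radius) {
   return PI * radius * radius;
   }

constexpr int BUFFER_SIZE = 16;
char buffer[BUFFER_SIZE];            // a constexpr value as an array bound

// Both checks are evaluated by the compiler, not at run time.
static_assert(circle_area(2.0) > 12.56 && circle_area(2.0) < 12.57,
              "area of a circle with radius 2");
static_assert(sizeof(buffer) == 16, "one char per element");
```

Unlike a #define, both constants have a type and obey scope rules, so they can also be placed inside a namespace or a class.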

Fast/Inline/Generic Functions

Although C99 does have inline, C (unlike C++) does not have function templates, and thus macros with parameters are still useful in C. In C++, we combine inline and function templates to give us language-supported “fast” functions with no function call overhead, and that can accept different types of arguments (within reason, since the types allowed will depend on the operations used in the body).

Conditional Compilation

This is so useful that even C# has adopted (only) that part of the C preprocessor. By changing our compilation command line (either directly on the command line, in Makefile rules, or in the IDE configuration), we can compile different parts of a program. Most commonly, we want debugging code compiled in during development, while in a “release” compile this code must not appear.

The NDEBUG macro is the only standard way to distinguish between “debug” compiles and “release” compiles. If this macro is defined in the compilation, it means “no debugging”, i.e., “release”. The assert() macro expands to no code when this macro is defined.
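A minimal sketch of debug-only code keyed on NDEBUG might look as follows. DEMO_TRACE, debug_build and checked_divide are hypothetical names; only NDEBUG and assert() come from the standard.

```cpp
#include <cassert>
#include <cstdio>

#ifndef NDEBUG
   // "Debug" compile: tracing is active.
   #define DEMO_TRACE(msg) std::fprintf(stderr, "trace: %s\n", msg)
   constexpr bool debug_build = true;
#else
   // "Release" compile (e.g. -DNDEBUG): tracing expands to nothing useful.
   #define DEMO_TRACE(msg) ((void)0)
   constexpr bool debug_build = false;
#endif

int checked_divide(int a, int b) {
   assert(b != 0);           // disappears when NDEBUG is defined
   DEMO_TRACE("dividing");   // likewise
   return a / b;
   }
```

Compiling the same file with and without -DNDEBUG yields the two variants; the calling code does not change.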

Conditional compilation is crucial in creating portable code, to deal with variations in architectures and operating systems when we write non-trivial C/C++ programs.
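For example, compilers predefine macros that identify the target platform, and a program can branch on them. The sketch below relies on the commonly predefined _WIN32, __APPLE__ and __linux__ macros; platform_name is a hypothetical function, and real projects usually test for more platforms than these.

```cpp
// Reports the platform the code was compiled for, based on
// macros predefined by the compiler.
const char* platform_name() {
#if defined _WIN32
   return "Windows";
#elif defined __APPLE__
   return "Apple platform";
#elif defined __linux__
   return "Linux";
#else
   return "unknown";
#endif
   }
```

Only one of the return statements survives preprocessing, so the compiled function contains no run-time test at all.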


2019-11-12: Added a ‘better’ SQR macro for GCC/Clang. [brx]
2019-04-08: Links to preprocessor articles and C++ using directives generator macros. [brx]
2019-02-26: Fixed missing closing parenthesis in ANSI_CLS macro. [brx]
2019-02-19: Added tip for listing predefined macros in GCC and CLang. [brx]
2019-02-18: Added reference to the term ‘compilation unit’. [brx]
2018-11-13: Add note about _USE_MATH_DEFINES. [brx]
2018-09-12: More about operators and the assert() macro. [brx]
2018-07-25: Edited. [jjc]
2018-07-24: Replaced cout in C99 example. Thanks Privilege Mendes. [brx]
2018-07-09: Created. Modified from previous Course Notes. [brx]