Preprocessor

C/C++ Preprocessor Features

The C preprocessor is used by both C and C++. It is an indispensable phase in the compilation of programs in these languages, due to their separate compilation design. Most examples are in C, but this article also mentions the preprocessor in relation to C++.
PREREQUISITES — You should already…
  • know how to compile C/C++ programs & understand the compilation process.
  • have some fundamental understanding of C/C++ structure, statements and types.

Overview

Although the C preprocessor is just a text manipulator, it is still very useful, if not indispensable. It can be used to reduce or eliminate code duplication, and also to create conditional code, which can greatly enhance the portability of a program.

Here are some of the fundamental features:

Technically speaking, the preprocessor is not necessary to create any program. But omitting it from the compilation process will be inconvenient and result in manual duplication of code.

Directive List

For convenience, here is a list of all the directives, with a short description of each.

Directive Description
#define Macro directive. Macros are expanded, and their existence can be tested. Generally, macros can also be created on the command line with a compiler driver switch (often -D‹macro›, or even -D‹macro›=‹expansion›). The expansion text may optionally contain any number of the preprocessor operators # and ##.
#undef Removes macros created with #define (deletes them from preprocessor memory). It also applies to macros created on the command line.
#error Reports the existence of mutually exclusive macros, or incorrect values for some macros. Although never a requirement, it is good programming practice to use it when needed. Only useful after a conditional directive. The text after #error is not interpreted, but is generally printed by the preprocessor before it terminates the preprocessing (and the compilation as a whole). It can take the form of a message to the programmer, explaining the “error”.
#include Includes the named file: the current line is replaced with the contents of the named file. The file can be named with a path (absolute or relative), and the path separator should always be a forward slash (/), even on Windows®. Use double quote delimiters for project include files. Use the -I‹path› switch to add paths for the preprocessor to search for files.
#pragma Sets compiler-specific options locally, instead of on the command line. Often used to turn warnings off, and back on again, across a range of lines. Compilers that do not recognise the syntax following #pragma should ignore it.
#if Conditional directive start. Can be followed by constant expressions, and the use of the defined preprocessor operator. The constant expressions can be compared using comparison operators, and logical operators are allowed. Can completely replace the use of #ifdef and #ifndef. Must be terminated by a matching #endif. When a non-existing macro name appears after #if, it is treated as 0. Like the C language's if(), it treats 0 as logically “false”, and any other value as logically “true”.
#elif “else if”. Has exactly the same features as #if, but cannot appear by itself; it can only follow a #if, #ifdef, #ifndef or another #elif. Optional.
#else Optional, and can only appear after a #if, #ifdef, #ifndef or #elif. There can be only one #else before the #endif of the matching #if, #ifdef or #ifndef.
#endif Terminates a #if, #ifdef or #ifndef.
#ifdef Checks for the existence of one macro only. Results in “true” if the macro exists.
#ifndef Checks for the absence of one macro only. Results in “true” if the macro does not exist.
#line Changes the line numbers reported by the compiler for error or warning messages. Rarely used by programmers.

NOTE: Conditional Directives

The directives #if, #ifdef, #ifndef, #elif, #else and #endif together constitute a group called conditional directives, since they are used to conditionally output code (or not). They can nest, and should be indented to make the nesting more easily visible.

Macro Directive

The #define directive is used to create substitution text, which we call a “macro”. We use the #define macro directive to create an identifier, which is expanded to the substitution text everywhere it is used in the program.

Identifiers are not expanded inside literal strings, nor when part of a larger word (token).

The expansion of a macro is examined by the preprocessor for other macros, and it will expand those macros, whose expansion is also examined, and so forth. What it will not do, is re-expand a starting macro. This avoids recursive macros. The following will therefore not infinitely expand into X.

macros are not recursively expanded
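As a minimal sketch of this rule (not necessarily the listing this caption belonged to), assume a variable and a macro that are both named X. The names and values are hypothetical:

```c
#include <assert.h>

static int X = 20;     /* plain variable, declared before the macro exists */
#define X (X + 1)      /* hypothetical macro that mentions its own name */

/* X expands once to (X + 1). The inner X is not expanded again,
   so it names the variable declared above. */
static int expand_once(void)
{
    return X;
}

static void check_expand_once(void)
{
    assert(expand_once() == 21);
}
```

Without the rule, the preprocessor would keep replacing the inner X and never finish.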

Symbolic Constants

Since neither C nor C++98 offers proper symbolic constants, we have to simulate a symbolic constant with a macro, for example:

symbolic constant example
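A minimal sketch of macro-based symbolic constants in C. The names BUFFER_SIZE and MAX_USERS, and their values, are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define BUFFER_SIZE 256          /* hypothetical names and values */
#define MAX_USERS   (4 * 8)      /* parenthesise compound expansions */

static char buffer[BUFFER_SIZE]; /* valid: the macro expands to a constant expression */

static size_t buffer_capacity(void)
{
    buffer[0] = '\0';
    return sizeof buffer;
}

static void check_constants(void)
{
    assert(buffer_capacity() == 256);
    assert(MAX_USERS == 32);
}
```

Because the macro is replaced by its text before compilation, it can be used wherever the language demands a constant expression, such as an array size.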

In C++11, you could (and should) use:

C++11 symbolic constant example

The C++ version can also be placed in a header file, so there is no downside.

NOTE: Optional Mathematical Constants

Compilers like GCC, Clang and MSVC support a convention whereby you define a macro called _USE_MATH_DEFINES before you #include <cmath> (in C++) or <math.h> (in C). This provides, for example, M_PI for pi, amongst others.

Inline Generic Functions

Both C99 and C++ offer the inline modifier, which can be used in front of function definitions. However, the languages are statically typed, which means an inline function only works for arguments of one particular type. In C++, you can use overloading to have functions with the same name that take different types and/or numbers of arguments (based on the function signature). To write a simple, logically inline, generic function in C, you can write macros with zero or more parameters:

macros with zero or more parameters
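The following sketch shows macros with zero, one and two parameters, together with a small function that uses them. ANSWER and MAX are hypothetical names chosen for illustration; SQR is the macro the text refers to:

```c
#include <assert.h>

#define ANSWER()   (42)                        /* zero parameters */
#define SQR(x)     ((x) * (x))                 /* one parameter */
#define MAX(a, b)  ((a) > (b) ? (a) : (b))     /* two parameters */

static void use_macros(void)
{
    int    i = SQR(3 + 1);     /* ((3 + 1) * (3 + 1)) */
    double d = SQR(1.5);       /* the result type follows the argument type */

    assert(ANSWER() == 42);
    assert(i == 16);
    assert(d == 2.25);
    assert(MAX(2, 7) == 7);
}
```

The parentheses around each parameter and around the whole expansion keep operator precedence intact. An argument may be evaluated more than once, so arguments with side effects, such as SQR(i++), are best avoided.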

You use these macros as if they were functions:

using macros taking zero or more arguments

The SQR() macro will work with any arithmetic type, and the result will be the same type as the argument. In C++, we combine inline and template to get the same benefits, but under the auspices of the C++ compiler, which can check syntax:

C++ alternative to macros with parameters

NOTE: Function Definition Syntax in C++11

Remember that C++11 has two syntax forms to define or declare functions:

  • ‹return-type› ‹ident›(‹param›)
  • auto ‹ident›(‹param›) -> ‹return-type›

Both forms mean the same. We chose the latter form in our sqr<> template function above, since it is only with template functions that you would ever need to use the newer form.

This is the better solution for C++, but in C you will still need to use macros that look like function calls.

Note the syntax: in the #define, the opening parenthesis of the parameter list cannot be separated from the macro name with whitespace.
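A small sketch of why the whitespace matters. INC and TWICE are hypothetical names:

```c
#include <assert.h>

#define INC(n) ((n) + 1)   /* function-like: '(' directly follows the name */
#define TWICE (2 * 21)     /* object-like: after the space, "(2 * 21)" is the expansion */

static void check_macro_kinds(void)
{
    assert(INC(41) == 42);
    assert(TWICE == 42);
}
```

Had INC been written as `#define INC (n) ((n) + 1)`, it would be an object-like macro whose expansion text is `(n) ((n) + 1)`, which is almost certainly not what was intended.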

Include/Header Files

The #include directive effectively causes the preprocessor to replace that line, with the contents of a file. The new content is also scanned by the preprocessor for directives. This means that the included file may contain more #include directives, in addition to other directives. This can easily become unmaintainable, and should be avoided, or carefully controlled and documented in a project.

Include Guards

As a consequence of nested inclusions, programmers are taught, almost blindly, to add an include guard to all header files. For example, if your header file is called header.h, it should follow this pattern:
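A sketch of the pattern, assuming a guard macro named H_HEADER_H (the name is hypothetical, as is the point type). To show the effect in a single listing, the header text is written out twice, as if header.h had been included twice:

```c
#include <assert.h>

/* --- contents of header.h, first inclusion --- */
#if !defined H_HEADER_H
#define H_HEADER_H
typedef struct point { int x, y; } point;
#endif /* H_HEADER_H */

/* --- contents of header.h, second inclusion: skipped by the guard --- */
#if !defined H_HEADER_H
#define H_HEADER_H
typedef struct point { int x, y; } point;   /* would redefine struct point */
#endif /* H_HEADER_H */

static void check_guard(void)
{
    point p = { 1, 2 };
    assert(p.x + p.y == 3);
}
```

Without the guard, the second copy would redefine struct point and the compilation would fail.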

You may find that people are so used to #ifdef, or #ifndef, that they would write an include guard as follows:

Instead of making up some macro name, you can use a Globally/Universally Unique Identifier (GUID/UUID), which the uuidgen tool can generate on POSIX systems.

UUIDs and Include Guards

As you know, you should use include guards in header files, using a macro formulated from the filename. Alternatively, the macro could be a UUID (Universally Unique IDentifier). On POSIX systems, you are bound to have uuidgen available. Use either of the command lines below (tr is used without -s, since squeezing repeated characters would shorten the identifier):

> echo H`uuidgen` | tr -d '-' | tr '[:lower:]' '[:upper:]'
> echo H`uuidgen` | tr -d '-' | tr '[a-z]' '[A-Z]'

On Windows 7 and later, you can use PowerShell to create one. The same works in PowerShell 6 (PowerShell Core) on Windows, Linux or macOS. Here is one possibility:

> "H"+[guid]::NewGuid().ToString().ToUpper().Replace("-", "")
H34C3C14AE5C45CE96C8B31312186B72

Now you can use the above UUID in a header file:
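A sketch of a header guarded by the example identifier generated above. The contents of the header (HEADER_VERSION) are hypothetical:

```c
#include <assert.h>

#if !defined H34C3C14AE5C45CE96C8B31312186B72
#define H34C3C14AE5C45CE96C8B31312186B72

/* hypothetical header contents */
enum { HEADER_VERSION = 1 };

#endif /* H34C3C14AE5C45CE96C8B31312186B72 */

static void check_uuid_guard(void)
{
    assert(HEADER_VERSION == 1);
}
```

The leading H matters: a UUID may start with a digit, and a macro name may not.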

Just do not use the above UUID! It is only an example. Create your own. It would then not matter if you change the filename: the UUID identifies this file regardless.

Header File Places

A filename must appear after the #include directive. It can be delimited with angle brackets: <filename> or double quotes: "filename". Instead of a simple file name, a relative or absolute path can be used, although the latter is strongly discouraged. The path separator character should be a forward slash (/) and not a backslash, even in code written on/for Windows™.

When the ‹filename› is enclosed in angle brackets, the preprocessor will look for it in its standard header directories, and in directories added with a compiler option (-I for GCC). Names enclosed in double quotes should be reserved for header files that are part of the current project.

Header files should not contain code that cannot be repeated in a program. This effectively means no code that will add external definitions to the object file (no definitions with external linkage).

Predefined Macros

A number of predefined preprocessor macros are mentioned in the C99 standard. These macros may not be redefined or undefined. Some are optionally defined, i.e. they may not exist at all under certain circumstances.

Macro Description
__DATE__ Compilation date as a literal string in the format "Sep  9 2017" (a single-digit day is padded with a space, not a 0).
__FILE__ Literal string containing the name of the current file.
__LINE__ A literal integer (int) representing the current line number in the file.
__STDC__ Literal: 1, if the compiler conforms to the ISO C standard.
__STDC_HOSTED__ Literal: 1, if in a hosted environment, else 0.
__STDC_VERSION__ A long literal representing the current C standard in effect. It will be 199901L, for C99, and 201112L for C11 (2011).
__TIME__ A literal string ("13:05:03"), containing the time of compilation start.
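A short sketch of how these macros might be used together. PRINT_LOCATION is a hypothetical macro, not something provided by the standard:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro: prints the location of the line it is used on. */
#define PRINT_LOCATION() \
    fprintf(stderr, "%s, line %d (built %s %s)\n", \
            __FILE__, __LINE__, __DATE__, __TIME__)

static void check_predefined(void)
{
    int printed = PRINT_LOCATION();

    assert(printed > 0);
    assert(strlen(__DATE__) == 11);    /* "Mmm dd yyyy" */
    assert(strlen(__TIME__) == 8);     /* "hh:mm:ss" */
    assert(__LINE__ > 0);
}
```

Because __FILE__ and __LINE__ are expanded where the macro is used, the message reports the caller's location, not the location of the #define.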

Three additional macros are conditionally created under certain circumstances:

Macro Description
__STDC_IEC_559__ Expands to 1, if the compiler's floating point implementation conforms to the IEC 60559 standard.
__STDC_IEC_559_COMPLEX__ Expands to 1, if the complex floating point implementation conforms to the IEC 60559 standard.
__STDC_ISO_10646__ Expands to a long literal (yyyymmL), representing the date up to which the compiler conforms to the ISO/IEC 10646 standard for the wchar_t type.

NOTE: C++ Macro

The __cplusplus macro is reserved for use by C++ compilers, and should never be created by your code. You can use it in conditional directives, for example: #if defined __cplusplus

All compilers create extra, compiler-specific macros, which can be used to identify the compiler being used. This can be used for conditional compilation to support more compilers, which increases portability.

Miscellaneous Directives

Here are a number of directives, each specialised for a particular purpose.

Line Directive

The #line directive can be used to change the line numbers reported by the compiler for warning or error messages. This is seldom, if ever, used directly by programmers, but may be emitted by the preprocessor.
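A minimal sketch of the effect. The file name "generated.c" is hypothetical:

```c
#include <assert.h>
#include <string.h>

static int reported_line(void)
{
    /* The source line following the directive is numbered 100. */
#line 100 "generated.c"
    return __LINE__;
}

static const char *reported_file(void)
{
    return __FILE__;    /* the presumed file name set by #line */
}

static void check_line_directive(void)
{
    assert(reported_line() == 100);
    assert(strcmp(reported_file(), "generated.c") == 0);
}
```

Code generators use this so that diagnostics point at the original input file rather than at the generated C source.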

Pragma Directive

The pragma directive provides a way to control compiler-specific options. Compiler switches and options can be placed in a source file, instead of only on the command line.

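The C99 standard documents three pragmas, all dealing with the floating point environment:

  • #pragma STDC FP_CONTRACT ‹on-off-switch›
  • #pragma STDC FENV_ACCESS ‹on-off-switch›
  • #pragma STDC CX_LIMITED_RANGE ‹on-off-switch›

Here ‹on-off-switch› is one of ON, OFF or DEFAULT. They are rather specialised, and not often seen in general code.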

NOTE: Warning Directives

Some compilers may provide a non-standard #warning directive, which will simply print a warning message when encountered. GCC and MSVC provide a #pragma message() directive, which can be used for a similar purpose (but they do not have the same syntax).

Pragma Once

The ‘#pragma once’ directive is a popular, yet non-standard pragma. Since it is non-standard, its use is not encouraged. Admittedly, it would have been very convenient if it was part of the C99 standard, since include guards are effectively mandatory anyway.

Pragma Pack

The #pragma pack, or #pragma pack(arg), is another popular, but non-standard, directive. It is supported by MSVC, and by GCC for compatibility with it. It is not part of the C99 standard, nor does the C++11 standard address packing of structures.

Prevent Warning Messages

Almost all the preprocessors of C compilers support the suppression of warning messages. The format, however, is non-standard, so you will have to check the vendor's documentation. GCC's version can push the current settings, and you can later restore them with a pop:

warning state push/pop pattern (gcc)
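A sketch of the push/pop pattern using GCC's diagnostic pragmas. The function and the choice of warning (-Wunused-parameter) are only for illustration, and the pragmas are guarded so that other compilers skip them:

```c
#include <assert.h>

#if defined __GNUC__
#  pragma GCC diagnostic push                           /* save the warning state */
#  pragma GCC diagnostic ignored "-Wunused-parameter"   /* silence one warning */
#endif

static int ignore_argument(int unused)
{
    return 42;
}

#if defined __GNUC__
#  pragma GCC diagnostic pop                            /* restore the saved state */
#endif

static void check_ignore_argument(void)
{
    assert(ignore_argument(0) == 42);
}
```

Keeping the push and the pop close together limits the suppression to the code that really needs it.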

Function Name

Although __func__ is defined in the C99 standard, it is not a preprocessor macro. It is a C language feature that effectively creates a static const char[] variable called __func__, which is initialised with the name of the current function. Non-standard pre-C99 support includes GCC's __FUNCTION__ and __PRETTY_FUNCTION__ extension keywords (these are still not macros, since the preprocessor knows nothing about function names; at least, it should not). In MSVC++, you can use __FUNCSIG__ or __FUNCDNAME__.

We mention them here, since they are most often used in conjunction with __FILE__ and __LINE__ for conditional diagnostic messages (debugging messages).
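A sketch of such a diagnostic message in C99. TRACE and whoami are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical diagnostic macro: prints file, line and function to stderr. */
#define TRACE(msg) \
    fprintf(stderr, "%s:%d: %s(): %s\n", __FILE__, __LINE__, __func__, (msg))

static const char *whoami(void)
{
    TRACE("entered");
    return __func__;    /* a static array holding "whoami" */
}

static void check_whoami(void)
{
    assert(strcmp(whoami(), "whoami") == 0);
}
```

Since __func__ is a variable and not a string literal, it cannot take part in literal string concatenation; it has to be passed as an argument, as with %s above.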

Assertions

Although the assert() macro from <assert.h> is not a built-in macro, it is nevertheless a standard macro, which depends on the NDEBUG convention. If the NDEBUG macro exists, it will expand to no code. Otherwise, it will expand into a conditional statement, and if the condition is “false”, it will call abort() declared in <stdlib.h>.

We use assert() to formalise preconditions, and sometimes postconditions in C. Pre/post-conditions are used to guard or test against programmer errors, which is why we do not want them in release code. Sometimes, we also use assert() to test invariants. Assertions play an important role in creating robust code. C++11 even added static assertions (compile-time assertions), indicating the importance of this particular aspect of computer science.

pre/post-condition example snippet
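A sketch of preconditions and a postcondition in a small C99 function. The function largest() is hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function: returns the largest of count values. */
static int largest(const int *values, size_t count)
{
    assert(values != NULL);          /* precondition */
    assert(count > 0);               /* precondition */

    int result = values[0];
    for (size_t i = 1; i < count; ++i)
        if (values[i] > result)
            result = values[i];

    assert(result >= values[0]);     /* postcondition */
    return result;
}
```

The assertions document what the caller must guarantee. They are checked in debug compiles and vanish when NDEBUG is defined.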

Multiple assertions are fine. Having assert() in code is often indicative of careful reasoning and planning around an algorithm. It instils confidence in the reader that the programmer is professional and attentive. In C & C++, we should use every available feature and good coding convention to help ensure verifiable code.

simplification of what happens in assert.h.
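The following is only a sketch of the idea, not the contents of any real <assert.h>. MY_ASSERT is a hypothetical stand-in for assert():

```c
#include <stdio.h>
#include <stdlib.h>

#if defined NDEBUG
#  define MY_ASSERT(cond) ((void)0)      /* release compile: no code */
#else
#  define MY_ASSERT(cond)                                            \
      ((cond) ? (void)0                                              \
              : (fprintf(stderr, "%s:%d: assertion failed: %s\n",    \
                         __FILE__, __LINE__, #cond),                 \
                 abort()))
#endif

static int checked_half(int n)
{
    MY_ASSERT(n % 2 == 0);    /* precondition: n must be even */
    return n / 2;
}
```

The stringisation operator (#cond) lets the message show the failed condition exactly as the programmer wrote it.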

So, please use assert() freely — it guards against programmer errors, but has no effect in release code.

C++ Static Assertions

In C++11, the new static_assert keyword allows us to perform invariant checks at compile time, which we call a static assertion. It takes a boolean constant expression as first parameter, and a string literal as second parameter, which will be printed out if the expression results in false. From C++17, the string literal may be omitted.

It is most commonly used in conjunction with templates found in the type_traits header (a lot of std::is_⋯<> templates). The static_assert expression itself is very common in templates, as a way to give more informative messages when a parameter type does not qualify for the algorithm or implementation.

This has nothing to do with the preprocessor — it is more of a side note. Since it involves the same topic (assertions), we decided to mention it here.

Preprocessor Operators

The preprocessor also provides a limited number of preprocessor-specific operators. They exclude the arithmetic, comparison and logical operators, which can be utilised by #if and #elif. Consequently, these are generally useful in macro expansions:

Operator Description
defined Tests for the existence of macros, and can only be used after #if or #elif. Returns “true” if the operand macro does exist, otherwise “false”. The macro to be tested does not have to appear in parentheses.
# Stringisation operator (also called “stringification operator”). This can only appear in the expansion text of a macro with parameters (after #define), and places double quotes around the macro argument as written; the argument is not macro-expanded first.
## Token concatenation operator, which can only be used in macro expansion (after #define), and is often used to create new identifiers.
_Pragma Pragma operator, which can be used to create #pragma directives, since macros cannot create other preprocessor directives. _Pragma("args") expands to: #pragma ‹args›.

NOTE: Charisation Operator

The charisation operator is a Visual C/C++™ extension, which is used to put single quotes around an expanded macro argument. No loss, as it is not very useful.

Operators are not directives. They were introduced at the same time as the #if directive, since some of them are only useful after #if — which includes the normal logical and comparison operators as found in C and C++. Others are useful in macro expansion, so they are used after #define (the macro directive).

Defined

The defined operator can be used to test for the existence of a macro, after #if or #elif. As such, it replaces the need for #ifdef, or #ifndef, albeit with a slightly longer syntax:

#if !defined vs #ifndef
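A sketch of the two equivalent forms. LIMIT and COUNT are hypothetical macro names:

```c
#include <assert.h>

#ifndef LIMIT              /* traditional form */
#define LIMIT 10
#endif

#if !defined COUNT         /* equivalent form, using the defined operator */
#define COUNT 20
#endif

#if !defined(LIMIT)        /* parentheses are optional; skipped, since LIMIT exists */
#error "LIMIT should exist by now"
#endif

static void check_defaults(void)
{
    assert(LIMIT == 10);
    assert(COUNT == 20);
}
```

Both forms provide a default only when the macro has not already been created, for instance with -D on the command line.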

The advantage over the shorter #ifndef variant is that we can use the #if's ability to use logical operators in conjunction with defined:

superiority of #if illustrated
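A sketch that combines defined with logical and comparison operators in one test. All macro names here are hypothetical:

```c
#include <assert.h>

#define USE_FAST_PATH            /* hypothetical feature macros */
#define CACHE_LINES 64

#if defined USE_FAST_PATH && !defined USE_SAFE_PATH && CACHE_LINES >= 32
#  define STRATEGY 1
#elif defined USE_SAFE_PATH
#  define STRATEGY 2
#else
#  define STRATEGY 0
#endif

static void check_strategy(void)
{
    assert(STRATEGY == 1);
}
```

Expressing the same condition with #ifdef and #ifndef alone would require three nested directives.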

It is therefore far more flexible than the archaic versions. But we cannot really expect programmers to type a few more characters, in the interests of consistency. Or can we? If you feel this is not significant, is that a pragmatic view, or your personal preference, based on bias and/or habit?

Stringify / Stringisation

Mechanically, this operator is very simple: it puts double quotes around the argument passed for a macro parameter, exactly as written (the argument is not macro-expanded first). Clearly, this is only useful in macros with parameters.

In the following example, the DBGVAR() macro uses stringisation and implicit literal string concatenation to print the name and value of a variable, with a given format, to stderr. It will only do this for a debug compile.

stringification operator in debugging code
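A sketch of what such a DBGVAR() macro could look like. The parameter order (variable first, then format) is an assumption:

```c
#include <assert.h>
#include <stdio.h>

#if !defined NDEBUG
   /* #var yields the argument's spelling as a string literal; the
      adjacent literals are then concatenated into one format string. */
#  define DBGVAR(var, fmt) fprintf(stderr, #var " = " fmt "\n", (var))
#else
#  define DBGVAR(var, fmt) ((void)0)
#endif

static int demo_dbgvar(void)
{
    int count = 3;
    double ratio = 0.5;

    DBGVAR(count, "%d");       /* debug compile prints: count = 3 */
    DBGVAR(ratio, "%.2f");     /* debug compile prints: ratio = 0.50 */
    assert(count == 3);        /* the macro only reads its argument */
    return count;
}
```

In a release compile (NDEBUG defined), every use of DBGVAR() expands to an expression that does nothing.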

We can expand the above example, to also print the current filename and line number:

debugging code with stringification and predefined macros

This variant will also work well. The additional benefit is that we can align some output:

alternative output with formatting (alignment)

Although the above examples concern debugging output, the code can be used for any purpose where you want to create a literal string with a macro. If you do not want to use fprintf() in C++, you can use std::cerr and insertion operators. Formatting would not be necessary (but you will have less control):

c++ output and formatting alternative

The C++-friendly macro above uses features from <iostream>, and <iomanip>.

Token Concatenation

This is a simple preprocessor operator, which just concatenates two tokens. These tokens are the arguments passed to a macro with parameters. The following example is not useful, but simply illustrates the mechanics:

simple token concatenation illustrated
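A minimal sketch of the mechanics. JOIN and counter_total are hypothetical names:

```c
#include <assert.h>

#define JOIN(a, b) a ## b                /* pastes two tokens into one */

static int JOIN(counter, _total) = 5;    /* defines the variable counter_total */

static void check_join(void)
{
    assert(counter_total == 5);
    assert(JOIN(1, 0) == 10);            /* the tokens 1 and 0 become the number 10 */
}
```

The result of the paste must be a single valid token, such as an identifier or a number.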

Since a new token is created, the parts it was made from are no longer examined as separate macros. The following will not work, since the X is concatenated with A, or B, and becomes one token, and the X will not be expanded to become F:

creation of tokens is not expansion
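A sketch of this behaviour, assuming a macro X that expands to F and a hypothetical macro MAKE that pastes X onto a suffix:

```c
#include <assert.h>

#define X F
#define MAKE(suffix) X ## suffix     /* yields XA or XB, never FA or FB */

static int MAKE(A) = 1;              /* defines XA */
static int MAKE(B) = 2;              /* defines XB */

static void check_make(void)
{
    assert(XA == 1);
    assert(XB == 2);
}
```

The names FA and FB never come into existence; referring to them would be a compile error.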

Token concatenation can be useful to create macros that attempt to emulate C++ templates, in particular, for generating the equivalent of template functions. Although the following code is simple, you should recognise that the algorithms are identical. The only differences are in the function names, and in the types of variables and parameters:

repetition of algorithm, with only type changes

For every new type we want to swap, we have to duplicate the algorithm, and change the types (and names). If we use a macro to generate the functions, then the algorithm will not be duplicated:

single algorithm instance as macro
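A sketch of a generator macro for swap functions. DEFINE_SWAP and the swap_‹tag› naming scheme are assumptions made for this illustration:

```c
#include <assert.h>

/* Generates a function named swap_<tag> for the given type. */
#define DEFINE_SWAP(tag, type)                     \
    static void swap_ ## tag(type *a, type *b)     \
    {                                              \
        type tmp = *a;                             \
        *a = *b;                                   \
        *b = tmp;                                  \
    }

DEFINE_SWAP(int, int)          /* defines swap_int() */
DEFINE_SWAP(double, double)    /* defines swap_double() */

static void check_swaps(void)
{
    int a = 1, b = 2;
    double x = 1.5, y = 2.5;

    swap_int(&a, &b);
    swap_double(&x, &y);
    assert(a == 2 && b == 1);
    assert(x == 2.5 && y == 1.5);
}
```

A separate tag parameter is used because a type such as unsigned long is not a single token and could not be pasted into a function name.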

You could also make the macro expand into an inline function, if the algorithms are as brief as the one above. If not, you would still have to declare the functions, which you can either do manually, or create another macro that will expand into a declaration. For example:

macro to declare “generic” function/algorithm

This is a very useful technique, but surprisingly, not utilised as much as it probably could be. In C++, however, you should absolutely avoid using this technique, since it has template functions, which are handled by the compiler and not the simplistic text-manipulator which we call the C preprocessor.

Here is a C++ solution, using two features: inline, and template functions. These together absolve us from reliance on the preprocessor, with only advantages and no disadvantages:

It requires the std::move() cast from <utility> for maximum benefit across diverse types. Or you could just use the C++11 standard std::swap(), also from <utility>. It is the principle here that is important: in C++, we use inline and template functions to create “fast/generic functions”, not the preprocessor.

Language Extensions

Macro expansions are examined for macros, and if found, further expanded. However, the preprocessor will not recursively expand originating macros (it guards against infinite recursion). The result of the following expansion will simply be MACRO:

non-recursive expansion guaranteed

The preprocessor can be used to create “fake” keywords and superficial language extensions. This is questionable, but here are a few examples:

example not to be followed

Generally speaking, you should not write code like this, appealing as it may seem. One other possibility, slightly less questionable, involves having only a single return, even if your logic requires multiple exit points:

ensure single exit point

You could get more elaborate:

single exit point with result
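A sketch of a single exit point that also carries a result. The macro LEAVE, the label done and the function classify() are hypothetical:

```c
#include <assert.h>

/* Hypothetical macro: records the result and jumps to the single exit point. */
#define LEAVE(value) do { result = (value); goto done; } while (0)

static int classify(int n)
{
    int result = 0;

    if (n < 0)
        LEAVE(-1);      /* the ';' completes the do-while, so 'else' stays legal */
    else if (n > 0)
        LEAVE(1);

done:
    /* single exit point: common clean-up code would go here */
    return result;
}

static void check_classify(void)
{
    assert(classify(-5) == -1);
    assert(classify(0) == 0);
    assert(classify(3) == 1);
}
```

The macro relies on the enclosing function declaring a variable called result and a label called done, which is part of what makes such macros questionable.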

IMPORTANT: Macros Expanding Into Blocks

If we did not put a do{⋯}while(0) around the expanded code, we would have encountered a syntax error before the else (caused by the semicolon after the expanded compound statement). This is a very common technique. When macros expand into multiple statements, even variable definitions, we have to enclose it all in a compound statement.

Variadic Macros

Variadic macros were introduced in C99, mostly to help with creating macros that call functions like printf(). Unfortunately, the standard did not deal with empty variable argument lists properly, leading to portability issues, since different compilers deal with it in a variety of ways.

This involves a new syntax, whereby macros with parameters can have an ellipsis (...) as the last macro parameter. Whatever arguments are passed to the macro from that point onwards can be expanded with a new preprocessor identifier: __VA_ARGS__.

debug message macro using variadic arguments
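A sketch of a debug message macro built on __VA_ARGS__. DBGMSG and add() are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>

#if !defined NDEBUG
#  define DBGMSG(fmt, ...) \
      fprintf(stderr, "%s:%d: " fmt "\n", __FILE__, __LINE__, __VA_ARGS__)
#else
#  define DBGMSG(fmt, ...) ((void)0)
#endif

static int add(int a, int b)
{
    DBGMSG("add(%d, %d)", a, b);    /* at least one argument follows the format */
    return a + b;
}

static void check_add(void)
{
    assert(add(2, 3) == 5);
}
```

The format must be a string literal here, since it is concatenated with the literals around it.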

This can be very useful, as long as you do pass at least one argument after the format string. If you really have nothing to pass, do the following:

dealing with no argument in variadic expansion
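One portable way to do this, shown here as a sketch with a hypothetical DBGMSG macro, is to pass "%s" as the format and the message text as the single required argument:

```c
#include <assert.h>
#include <stdio.h>

#if !defined NDEBUG
#  define DBGMSG(fmt, ...) fprintf(stderr, fmt "\n", __VA_ARGS__)
#else
#  define DBGMSG(fmt, ...) ((void)0)
#endif

static int start_up(void)
{
    /* DBGMSG("starting up"); would leave a trailing comma after the format. */
    DBGMSG("%s", "starting up");    /* the text itself is the one required argument */
    return 0;
}

static void check_start_up(void)
{
    assert(start_up() == 0);
}
```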

Passing no arguments at all, will result in a trailing comma in the argument list of the macro expansion, which will be reported as a syntax error by the compiler. The standard does not specify what preprocessors must do in this situation, and while some implementations will work as expected when no arguments are given, others do not. The most portable solution is thus to always pass at least one argument.

C++ and the Preprocessor

We do not need the preprocessor as much from C++11, since instead of using #define to create symbolic constants, we now have constexpr (and we can even create constexpr functions, which have no real equivalent in the preprocessor). Instead of using macros with parameters to avoid function call overhead, we can use inline functions. When we combine inline with function templates, macros with arguments offer no benefits whatsoever.

This means we mostly use the preprocessor in C++ for file inclusion, and conditional compilation. When we declare C functions in a C library, to be used in C++, we must prevent C++ from decorating the function names. For this, C++ provides a language linkage syntax. The problem is that C does not understand this syntax, so we must only enable it when compiling with C++:

c++ inclusion guard and language linking
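A sketch of a C header that can be included from both C and C++. The file names, the guard macro H_MYLIB_H and the function mylib_add() are hypothetical. The header and the source file are shown in one listing:

```c
#include <assert.h>

/* --- hypothetical header: mylib.h --- */
#if !defined H_MYLIB_H
#define H_MYLIB_H

#if defined __cplusplus
extern "C" {                /* seen by C++ compilers only */
#endif

int mylib_add(int a, int b);

#if defined __cplusplus
}                           /* closes the extern "C" block */
#endif

#endif /* H_MYLIB_H */

/* --- hypothetical source: mylib.c, compiled as C --- */
int mylib_add(int a, int b)
{
    return a + b;
}

static void check_mylib(void)
{
    assert(mylib_add(2, 3) == 5);
}
```

A C compiler never sees the extern "C" lines, while a C++ compiler sees the declarations wrapped in a linkage block.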

If you must also compile the function definitions with C++, this would not be necessary, although the function definitions could start with extern "C", but then it gets tedious, since you will have to do that conditionally. So, this is mostly used as suggested: when declaring C functions, in a C header, for functions in a C library, for use by C++ code. This is already done for you when you #include C standard library headers as <cstdio>, instead of <stdio.h>, for example.

Consider the following macro called SQR(), that takes one argument. It can be used with any numeric type. Due to the expansion, if you give an int argument, the result is int; if you give a double argument, the result is double. That is very convenient:

traditional macro-as-fast-function

To avoid such macros in C++, yet still have the same convenience, we must resort to a combination of inline and function templates. The code would still appear in a header file, as the above macro would:

c++ inline and template function combination

It is better than the macro version, in that the C++ compiler, which uses syntax checking, is involved, and not the preprocessor, which blindly replaces or expands text. Debuggers can show you the lines in the function when single-stepping code (and you can make breakpoints in the function). There is no excuse to not use this pattern — just do not use all-capitals for template functions; only macro names should be all-capital letters in production code.

Miscellaneous Ideas

We can do interesting things with the preprocessor, like abstracting a function call as a variable:

abstracting function call as a variable (C)
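A sketch of the idea in C. The macro name VAR comes from the text; the function var_location() and the access counter are hypothetical:

```c
#include <assert.h>

static int access_count = 0;      /* hypothetical bookkeeping */

static int *var_location(void)
{
    static int storage = 0;

    ++access_count;               /* extra work done on every access */
    return &storage;
}

#define VAR (*var_location())     /* an lvalue: usable on either side of '=' */

static void check_var(void)
{
    VAR = 10;
    VAR += 5;
    int *p = &VAR;

    assert(*p == 15);
    assert(access_count == 3);
}
```

Every use of VAR calls the function, which is where logging, locking or validation could be added.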

For all practical purposes, VAR is a variable (lvalue, to be exact). We can even “take its address”. But since it involves a function in implementation, we can do any additional tasks we desire when the “variable” is accessed. In C++, we can go one better: return an int& (int reference):

abstracting function call as a variable (C++)

We could also, for example, create “safe arrays”, provided we are content with treating the parentheses as subscript. Although the code below will always check for out of bounds indexes, it could have been enclosed in conditional compilation directives, so that after testing, and in release compiles, the check will not be performed.

abstracting range-checked arrays
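A sketch of a range-checked “array” in C. The macro name ARR comes from the text; ARR_SIZE, arr_storage and arr_at() are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define ARR_SIZE 8
static int arr_storage[ARR_SIZE];

/* Returns the address of element index, or aborts if index is out of bounds. */
static int *arr_at(size_t index, const char *file, int line)
{
    if (index >= ARR_SIZE) {
        fprintf(stderr, "%s:%d: index %lu out of bounds\n",
                file, line, (unsigned long)index);
        abort();
    }
    return &arr_storage[index];
}

#define ARR(i) (*arr_at((i), __FILE__, __LINE__))

static void check_arr(void)
{
    ARR(2) = 7;
    int *p = &ARR(3);

    *p = 9;
    assert(ARR(2) == 7 && ARR(3) == 9);
}
```

Because the index parameter is unsigned, a negative index converts to a very large value and is rejected by the same test.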

Again, as long as we can look past ARR(n) = X; (treating parentheses as subscript), ARR can be treated as an array, except that if we want a pointer to an element, we have to explicitly take the address of it: &ARR(i). And once you have a pointer, all bets are off, like with any pointer: no safety checks any more.

There are much better ways to achieve this in C++, using custom classes, or the standard containers. But, even so, we could improve the above code again with references, using several C++11 features. This approach is shown below as a complete program:

arrfunc.cpp: Function as Array

As a bonus, we provided a generic, constant expression, inline function called SIZE, that will return the size of any C-style array whose definition is in scope. The name of the parameter was not used, so we omitted it (this is legal). You can, of course, also use std::extent and std::rank from the <type_traits> meta-programming header in the C++ standard library, but they can only accept types, not variables or expressions:

example of std::rank and std::extent

NOTE: Alternative Member Access

The at() member function, found in many C++ collection classes that also provide an overloaded subscript operator, works just like the arrfunc() above, and range-checks indexes.

Summary

The C preprocessor, apart from being a convenient tool to avoid duplication, allows for certain abstractions. Some of these have been superseded in C++ with better direct language support. Mechanically, at all times, the preprocessor is no more than a “text in, text out” manipulator; but it is a useful tool nonetheless.

Non-Recursive Macro Expansion

Although the expansion of a macro is examined for other macros, the initial macro will not be re-expanded. This deliberately avoids the possibility of recursive macros, since there would be no easy way to stop the recursion.

Code Duplication

Since C/C++ compilers can only compile one file at a time (called separate compilation), it is often necessary to repeat declarations, type specifications, and symbolic constants in a program consisting of multiple source files. We place such code in header files, so that we can “repeat” it where necessary by #include'ing it.

This is still just text manipulation, and programmers should not award some higher function or meaning to the process — it has no connection to the concept of libraries, except by convention.

Symbolic Constants

In C, this is still required, since const means “read-only variable”, and does not facilitate immediate values in assembler. From C++11, programmers should exclusively use:
constexpr ‹type› ‹ident› = ‹const-expr›;
which can still be placed in header files, and has no disadvantage.

symbolic constant in c
symbolic constant in c++11

Using const in either language, will not provide you with a “proper” and efficient symbolic constant.

Fast/Inline/Generic Functions

Although C99 does have inline, unlike C++, it does not have template functions, and thus macros with parameters are still useful in C. In C++, we combine inline and template functions to give us language-supported “fast” functions with no function call overhead, and that can accept different types of arguments — within reason, since the types allowed will depend on the operations used in the body.

Conditional Compilation

This is so useful that even C# has adopted (only) that part of the C preprocessor. By changing our compilation command line (either directly on the command line, in Makefile rules, or IDE configuration), we can compile different parts of a program. Most commonly, we sometimes want to compile debugging code, but in a “release” compile, this code must not appear.

The NDEBUG macro is the only standard way to distinguish between “debug” compiles versus “release” compiles. If this macro is present in the compilation, it means “no debugging”, i.e., “release”. The assert() macro will expand into no code when this macro is present.

Conditional compilation is crucial in creating portable code, to deal with variations in architectures and operating systems when we write non-trivial C/C++ programs.


2018-11-13: Add note about _USE_MATH_DEFINES. [brx]
2018-09-12: More about operators and the assert() macro. [brx]
2018-07-25: Edited. [jjc]
2018-07-24: Replaced cout in C99 example. Thanks Privilege Mendes. [brx]
2018-07-09: Created. Modified from previous Course Notes. [brx]