# Concepts of, and Operations on, Pointers

Pointer types and operations are among the most fundamental, important, and pervasive features of C. Incorrect application of the rules that govern them is a major cause of failing programs. A complete and unambiguous understanding of these topics is, arguably, the most valuable skill any C/C++ programmer can master. All applicable C99 rules and supportive background are presented here, but not at an introductory level. These rules also apply to C++, except that references are not covered.

Readers are expected to:

• have some programming experience, preferably in C.
• understand the concept and purpose of types in a statically typed language.
• understand expressions, operators and precedence.

# Background

The model of memory, as represented to programs in most CPU architectures, is linear. To pro­grams, memory is a sequential arrangement of bytes. Each of these bytes can be ac­ces­sed by ad­dress, which is another way to say: “every byte has an address”.

All examples and values that depend on architecture and compiler depict typical results for a 32-bit, little-endian architecture with an int size of 4 bytes. The numerical values for addresses are arbitrary examples, used only for reference. This is just a matter of consistency, and does not affect the concepts and rules discussed in any way.

#### Typographical Conventions

Only a small number of unusual typographical conventions are employed here, in particular:

• Guillemets (French quotes) are used to delimit descriptions in syntax, so for example: ‹ident› means “identifier”, and must be replaced with a legal identifier.
• A long left arrow (⟵) followed by text describes the syntax on the left of the arrow.
• This sequence: -t→ is shorthand for “has the type”.
• This sequence: -r→ is shorthand for “read as”, or “is pronounced”.
• The math symbol: ≡ is shorthand for “is equivalent to”.

The other conventions are common, like code using a monospaced font.

## Memory Layout

The following assembler code extract (Intel® syntax) allocates 4 variables (C, S, I and L), each having different sizes:

C     DB  052h               ; BYTE  (82 in decimal)
S     DW  -1                 ; SHORT (FFFF in hex, 2's complement)
I     DD  123456             ; DWORD (0001E240 in hex)
L     DQ  0102030405060708h  ; QWORD

An example memory layout model is represented in the following diagram. Note that the ad­dress of each variable, is the address of the first byte of the sequence that comprises its val­ue. This is the byte with the lowest address, regardless of the endian­ness of the ar­chi­tec­ture.

The equivalent example in C is shown below. Unlike assembler, C affords little control over the order in which variables are allocated, or where they are allocated. The memory layout may not exactly match the model above.

For performance reasons, variables may be aligned on 2-byte, 4-byte or 8-byte boundaries, so gaps may exist between the variables (padding). This is possible in assembler as well, but must be di­rect­ly con­trolled by the programmer. C compilers generally default to per­form­ance align­ment, but can often be con­trolled with a compiler switch / option.

// variable definitions --------------------- typical size in bytes
char  C     = 'R';                      // 1 (BYTE)
short S     = -1;                       // 2 (WORD)
int   I     = 123456;                   // 4 (DWORD)
long long L = 0x0102030405060708LL;     // 8 (QWORD)

// output statements ------------------------ outputs (values only)
printf("&C = %p, C = %c\n",   &C, C);   // R
printf("&S = %p, S = %d\n",   &S, S);   // -1
printf("&I = %p, I = %d\n",   &I, I);   // 123456
printf("&L = %p, L = %lld\n", &L, L);   // 72623859790382856

For brevity, the addresses passed to printf() were not cast to void*, which is what "…%p…" ex­pects. This is seldom an issue, apart from a potential compiler warning. The ad­dress-of op­e­ra­tor (&) used in the example, is discussed later.

#### Memory Divisions

The space for the machine code, stack, static memory, and dynamic memory are generally in se­pa­rate sections. That does not necessarily prohibit a program from accessing any of the mem­o­ry; it is simply how memory is organised. Some architectures or operating systems, for se­cu­ri­ty rea­sons, may prevent programs from accessing the machine code division as data (read or write).

This does not affect operations on pointers, only what is allowed to be accessed, and it fa­cil­i­tates a better understanding of the C memory environment. The stack and dynamic memory of­ten grow to­wards each other, while static memory is fixed at compile time. The program mem­o­ry is also fix­ed, but is of little con­cern, since it is not directly ma­nip­u­la­ted in pro­grams.

Many machine instructions allow for the provision of arguments, which they will treat as ad­dres­ses. Some machine instructions, for example, can use such a number to locate (address) a byte in mem­o­ry; fetch a fixed number of bytes, starting at the first byte; and store it in a reg­is­ter. De­pen­ding on the en­di­an­ness of the CPU, the bytes may be swapped around when stored in a reg­is­ter. En­di­an­ness does not affect our treatment of addresses, and even less, our un­der­stand­ing of point­ers from a C perspective.

 An address is a number that represents the location of a single byte in memory. It pro­vides no se­man­tics regarding the number of bytes that comprise the value stored there.

In assembler, an address is just a number. Programmers decide when a number should be treat­ed as an ad­dress. Other times, a number might represent the count of items in a col­lec­tion, or the age of a per­son, or the code of a character. Without context, a number is just a number, es­pe­cial­ly as far as ma­chine code is concerned. How the number is applied, or which machine code instructions are util­ised, determines its meaning from a prob­lem-solv­ing per­spec­tive.

Any non-trivial machine code program, regardless of how it was written, will be littered with ad­dress values, used immediately, or indirectly, to access other values in memory. It is there­fore fun­da­men­tal to the operation of programs in a computer.

 In assembler, values that form part of the machine instructions (op-codes), are called im­me­di­ate values. This means they exist in the code segment, as opposed to values (vari­ables, even read-only vari­ables), that reside in one of the data segments.

Macro assemblers allow programmers to name memory locations, but even so, the names simp­ly result in addresses. This does not absolve the programmer from using these named ad­dres­ses cor­rect­ly. The number of bytes comprising the values fetched (or stored), de­pends on the in­struc­tions chosen.

## Indirection

Some machine instructions can be supplied with a register or memory location, and use the val­ue stored there as an address, which in turn indicates the location where a BYTE (1-byte), WORD (2-byte), DWORD (4-byte), QWORD (8-byte), or longer value, can be fetched or stored. This pro­cess is called de­ref­er­enc­ing, or in­di­rec­tion.

 When a value in memory is accessed by first obtaining its address from another value, the pro­cess is called indirection, and will require knowledge of the number of bytes that com­prise the value.

Although simplistic, the following assembler extract shows one variation on an in­di­rect mem­o­ry fetching instruction. It loads the value of V into a CPU register called EAX, by getting the ad­dress of V from another register, called EBX. Obviously, the correct address of V must be in EBX. The same in­di­rec­tion in­struc­tion can therefore fetch different variables, by simply changing the ad­dress value in EBX.

V        DD 123456         ; DWORD variable (0001E240h).
···
mov EAX, dword ptr [EBX]   ; indirectly, via EBX, load the
                           ; value of V into EAX.

The diagram below attempts to illustrate the operation. In the PDF versions of this material, it may be on the next page.

The address of V could have been stored in another memory location, for example named P, and an­other instruction to fetch the value of V, indirectly through P, could have been used. This has no ef­fect on the principle outlined here.

Especially in architectures with operating systems, a user program is not loaded at address 0 (the beginning of memory). Practically, this means that address 0 is an illegal address, for all intents and purposes. So much so, that it is often represented with a keyword, or special value in programming languages, like nil, null, nullptr, or NULL. More sophisticated architectures will allow an operating system to let a program think it has access to all of memory, starting from address 0, while that memory is actually mapped to another location in physical memory.

## Data Movement

In assembler, programmers explicitly choose whether to move data to the memory location of a vari­able, or whether the address of the variable should be loaded. In a pro­gram­ming lang­uage like C, where variables are generally represented by a variable identifier, the com­pi­ler will emit one of two kinds of in­struc­tions, depending on context.

• When the variable is an operand to an operator that modifies memory, the rel­e­vant ma­chine code instructions are emitted. Such operators are often categorised as side-effect op­e­ra­tors.

• When the variable is an operand to an operator that does not modify memory, the value is placed as a result in the expression where it was used.

In either case, the name of the variable itself, does not automatically cause it to be fetched. It de­pends entirely on the operators in the same expression. If an operator with a side-ef­fect (one that mo­di­fies memory) is used on the variable, that instruction is performed. On the other hand, if the vari­able is used in conjunction with other operators, the val­ue is fetched and placed in the ex­pres­sion as a temporary.

### Lvalues

C uses the term lvalue to refer to expressions, like variables, that represent memory. Several operators will report that an lvalue is required when an incorrect operand is supplied. Strictly, the C standard calls array variables and const-qualified variables non-modifiable lvalues; for brevity, this text uses “lvalue” to mean a modifiable lvalue. In that sense, not all variables are lvalues: array variables are not, nor are const storage class variables. Lvalues will always have an address.

Some operators can also represent memory, and their results qualify as lvalues. These are the in­di­rec­tion, subscript, and indirect member selection operators. As we shall see later, they all re­sult in in­di­rec­tion expressions. If the indirection expression represents an array, or const stor­age class mem­o­ry, the result is not an lvalue.

### Rvalues

The term rvalue refers to expressions that are not necessarily required to represent memory. It means the operators applied to it only require a value, and need not modify memory. Vari­ables can be rvalues, but so can literals, and the results of expressions.

Rvalue expressions need not have addresses. For example, literals and enumerated values of­ten re­sult in immediate values in the machine code, and their addresses cannot be ob­tain­ed. Any temporary results of expressions, like return values of functions, are rvalues, often called temporaries.

### Assignment

The most obvious data movement is performed with the assignment operator. This includes all the com­pound assignment operators. The assignment operator is not to be confused with ini­tial­i­sa­tion syn­tax, which also uses the equals sign. The same type conversion rules apply how­ever.

Assignment operators, apart from the results they place in an expression (as is the case with all operators), also modify memory; the assignment operator is therefore one of the side-effect operators mentioned above.

### Argument Passing

When an argument is passed to a function, it serves as initialisation of a parameter, which is a special kind of local variable to the function being called. Parameters have the same scope and lifetime as local variables; for all practical purposes they are local variables, except that the caller can initialise them.

Although we use phrases like “pass an argument”, that is really an abstraction for “initialise a special parameter variable”, so that given a function F, returning void, with one parameter param of type int, i.e.:

void F (int param)

then calling the function as: F(123); is equivalent to this C pseudocode:

F(int param=123)

Syntax resembling this can be seen in languages that allow the arguments to be named when calling a function.

### Function Returns

The return statement conceptually assigns the result of an expression to a temporary anon­y­mous “function result” variable. Abstraction: given for example, a function that returns a type T, then the statement:

return ‹expr›;

effectively results in this pseudocode:

T retvar = ‹expr›;
assembler: reclaim local variable memory (stack)
assembler: restore saved registers

The result of retvar will be available in the calling expression as the result of the function call operator, as a temporary variable. Often retvar may be a register for the sake of efficiency, but the abstraction remains valid.

 The point about data movement is that, regardless of which kind of move­ment is used, a source type and a destination type are involved. In all these cases, if the source and des­ti­na­tion types do not match, the compiler will, if possible, provide implicit type con­ver­sion on the source type to match the destination type.

The C/C++ languages, like many others, abstract machine code operations by allowing the programmer to choose types for variables and values. The compiler will then emit the appropriate instructions to access 1, 2, 4 or 8-byte values, based on the type. It also uses the type of a value or variable to allocate the correct amount of space. This is a compile-time operation; once compiled, there are no more types, just pure machine code.

## Types & Pointers

Abstractions ease the cognitive process in programming environments, and a type is an abstraction — it makes a programmer's life easier. Different languages have dif­fer­ent levels, and kinds of, abstractions. Some higher level languages tend to hide the numeric na­ture of ad­dres­ses behind abstractions called references.

### Pointers

In C, the abstraction of an address is elementary, and is called a pointer. Formally, we say that a pointer type is a derived type, simply because the word “pointer” by itself is just a concept. Application of the concept involves other, existing, types. This means that, in a program, we can have “pointer to some type”, not just “pointer”. So saying “pointer” is like saying “carnivore”: it categorises broadly, but does not qualify unambiguously.

 The term pointer represents a classification for a group of derived types that ab­stract the con­cept: “the ad­dress of some type of value”.

### Pointer Types

The pattern: T*, is read as “T pointer”, where T can be any existing type — hence the as­ser­tion that pointers are derived types. Consequently, we then read, or pronounce, T** as “T pointer pointer”.

Some might prefer to pronounce it as “pointer to a pointer to a T”, which is slight­ly clos­er to its most verbose explanation: “the type of a value, that is the address of a value, which has the address of a T value”.

Since “T pointer pointer” correlates with the read­ing di­rec­tion, maps one-to-one to the to­kens, and is the most succinct, we will persist with that pro­nun­ci­a­tion, except in one early example.

 The type T*, pronounced “T pointer”, is a type that depicts the address of a T-type value, or in­formally: “means the address of a T value”.

Syntactically, whitespace, or the absence thereof, between the T and the *, or after the *, is not sig­nif­i­cant. However, we consciously join them together to help emphasise that it is a singular type.

Ex­am­ples of pointer types include the following: (we use the long­er de­scrip­tion in the com­ments, hoping that in the short term: “pointer to an int-type value” might have more value than “int point­er”).

table: Pointer Type Descriptions

| Type | Description / Verbalisation |
|---|---|
| int* | “pointer to an int-type value”, or: “int pointer”. |
| double* | “pointer to a double-type value”, or: “double pointer”. |
| unsigned long long* | “pointer to an unsigned long long-type value”, or: “unsigned long long pointer”. |
| int** | “pointer to an int*-type value”; same as: “pointer to a pointer to an int-type value”, or: “int pointer pointer”. |

Since int, long, etc., are existing types, we could derive some pointer types from them. Whether we use existing types, or derive some types, variables can be created with the type:

int V;               // V has type int.
int* P;              // P has type int* (“int pointer”).
int** Q;             // Q has type int** (“int pointer pointer”).

Considering that int* is a type, albeit a derived type, we can in turn derive a pointer type from it: int**. And since that is also a type, we can derive another pointer type from it as well: int***, ad in­fini­tum.

### Variables

The C programming language provides syntax to represent a value in memory. The rep­re­sen­ta­tion al­lows operators to access the value, either by reading it, or changing it. The most com­mon method is to use a named variable to represent a piece of memory. The size of the memory is de­ter­mined by the type of the variable. This premise holds for any other ex­pres­sions that also re­present memory.

• Given: T V = X; ⟵ variable called V, of type T, containing X.
• Then: In expressions, V represents the memory containing X, with type T.

REMINDER: A variable in C is a chunk of memory, whose size is determined by its type, and is either re­pre­sen­ted by name, or is reachable via a correctly typed pointer.

### Lvalues & Rvalues

An expression, like a variable name, which represents memory, does not by itself affect the mem­o­ry; operators that utilise the expression, however, can read from, or write to, the mem­o­ry so re­pre­sented. In fact, operators that modify memory, require expressions that represent memory.

In C, we formally state that operators that write to memory, require lvalues, and those that read from memory, require rvalues. Lvalues can always be used as rvalues, but an rvalue, on the other hand, is not automatically an lvalue.

For example, a literal value 123 cannot be as­sign­ed to (it is not an lvalue), but its value can be used in an expression (it is an rvalue). Con­verse­ly, regarding V from the definition above, V can be assigned to, because it is an l­val­ue. V can be used as an r­val­ue (value is fetched). All variables, except array variables, are l­val­ues by default.

### Constness

Normally, as we have seen, non-array variables are lvalues by default. We can override this by prefixing a variable's definition with the const modifier. This gives the variable what this text calls the const storage class (formally, const is a type qualifier), which for practical purposes makes it a read-only variable.

This sounds like a contradiction, but is not: it is handled and stored in exactly the same way as other variables, except that the compiler does its best to see that you do not create instructions to modify it. In other words, it checks that you do not use it as an lvalue. And yes, you can circumvent the compiler with a cast if you really want to, but modifying an object that was defined as const has undefined behaviour.

### Void Pointers

The closest we can get to an address as just a number, in C, is by using a void pointer. This has type void*, which fortuitously, we pronounce “void pointer”. It is simply a pointer type that very few op­e­ra­tors can work with, but the few operations that are legal, can be valuable. These in­clude: as­sign­ment, passing as argument, the result of function return, and casting to any other pointer type.

No arithmetic or indirection can be performed on void pointers in standard C; some compilers, such as GCC, permit arithmetic on them as an extension. They are most commonly used for functions that can work with any type of memory, like memcpy(), and memset(). Writing a function taking a void pointer as parameter means it is easy to call, since you can pass it any object address without a cast. However, inside the function, the void pointer will have to be converted to another pointer type, by cast or by assignment, before anything practical can be done with it.

## Obtaining Pointers in C

C provides four ways in which a programmer can manifest values of a pointer type:

• Explicitly, with the address-of operator.
• Implicitly, by string literal.
• Implicitly, by array expression.
• Implicitly, by function name.

The last two are special cases; they have their own specialised rules, and even in­volve ex­cep­tions to established rules.

### Pointer Type Categories

C provides three categories of pointer types, which affect syntax, and the operators which can be used:

• T* ⟵ pointer to a T value.
• T(*)[N] ⟵ pointer-to-array of N elements of type T.
• T(*)(P) ⟵ pointer-to-function, taking P parameters, returning T.

The second occurs whenever we take the address of an array. The third is simply the type that all function names result in. As always, any T can be a combination of other types, in­clud­ing more point­er types.

#### Pointer & Type Syntax Complexity

The following is entirely legal, but not considered recreational reading, and should be skipped on first reading. It is shown here simply to highlight the deficiencies in the syntax cho­sen for types, which are fine at a simple level, until they are combined.

double*(*(*X[3])(int(*)(void)))(long);      // define X
double d = *X[i](f)(123L);                  // use X, store result.

X is an array of 3 elements, which are pointers to functions, each taking a pa­ra­me­ter of (point­er to func­tion, taking void parameters, returning int), returning a pointer to a function taking a long as pa­ra­me­ter, returning a double*.

Of course, some functions must still be defined, and the actual elements of X given values, but it is workable, and conceptually, not so complex — the type syntax is the problem:

• X[i] ⟵ Select one of the elements in X, which is a pointer to a function.

• X[i](f) ⟵ Call the function, passing it a function pointer to f. It returns another function pointer.

• X[i](f)(123L) ⟵ Call the function pointer returned from the previous step, passing it 123L. It returns a double*.

• *X[i](f)(123L) ⟵ Use indirection to represent the double from the double* returned in the previous step.

• double d = *X[i](f)(123L) ⟵ Define and initialise d with the result of the previous step.

Pure poetry, but complex enough that the use of such code should be discouraged (at least when not using typedef to simplify the syntax).

### Address-of Operator

The most obvious way to obtain a pointer value is by using the “address-of” operator: & (ampersand). It is a prefix unary operator. Its operand can be any expression that has an address. All lvalues have addresses.

Even though an array expression is not an lvalue, it nev­er­the­less has an ad­dress. An array's el­e­ments, unless const has been applied to them, are lvalues. Lit­er­als, on the other hand, or the results of most operators, cannot be used as op­er­ands to the ad­dress-of operator.

• Given: T V; ⟵ variable named V of type T.
• and: T A[N]; ⟵ array of N values of type T.
• Then: &V -t→ T* -r→ “address of V has type T pointer”.
• and: &A -t→ T(*)[N] -r→ “address of A has type pointer-to-array of N values of T”.
• Also: &A[0] -t→ T* ⇒ address of first element.

Take note that &A (address of array) is not equivalent to &A[0] (address of first element) — they do not have the same type, even though they have the same value. Also, one can take the ad­dress of a const variable. Given: const T V = Ec;, then the result of &V will be: const T* (or T const * if you like), instead of T* as above.

 Any arbitrarily complex expression, that produces a value of a pointer type, is called a pointer expression. This includes literal strings and const pointers.

Since int* means “address of an int”, it seems reasonable that taking the address of an int variable with the address-of operator should produce an int*; and it does, which is why the assignment to P below is valid. The same goes for the assignment to Q, a variable that can store an int** value: because P has type int*, taking its address yields an int** value, which can be stored in Q.

int V = 12345; // variable V, storing 12345, of type int.
int* P = &V;   // variable P, storing ADR_OF_V, of type int*.
int** Q = &P;  // variable Q, storing ADR_OF_P, of type int**.

In the illustration below, the variable P occupies 4 bytes, and by coincidence, so does the vari­able V — they are not necessarily the same size on all architectures. A lit­tle-en­di­an ar­chi­tec­ture stores the low byte of a value, in low memory, so the hexadecimal value in P is 0x0000014A, and that of V, is 0x00003039. In a big endian architecture, the bytes in memory would have been re­versed, but the values would remain the same.

The actual addresses of the variables, i.e., the results of the address-of operators, depend on the com­pi­ler, architecture, compiler options, memory model, and numerous other fac­tors, maybe even the time of day, or phase of the moon. In C, we rarely care about the ac­tu­al nu­mer­i­cal value of an ad­dress — just that it is a valid numerical value, and in this case, will be the correct address for the first byte of the variable.

#### Printing Pointers

The only portable way to print the value of a pointer with printf requires the %p formatting sequence. It expects a value of type void*, though, so in order to avoid warnings, cast the pointer expression passed as argument to void*:

   printf("&I = %p\n", (void*)&I);

Remember that “pointer expression” means: an arbitrarily complex expression, resulting in a value of a pointer type.

### Array Variables and Expressions

Array variables are somewhat special — any expression representing an array, by array vari­able name, or other means, will result in the ad­dress of the first element. As a con­se­quen­ce, an array nev­er represents the complete collection of values that it contains. The result is that arrays can­not be moved around as a whole (cannot be assigned, re­turned from func­tions, or passed as ar­gu­ments). It is not as limiting as it sounds — pointer operations are powerful.

• Given: T A[N]; ⟵ array variable A, containing N values of type T.
• Then: A ≡ &A[0] ⟵ decays to the address of the first element.
• Thus: A -t→ T*.
• Note: T[N] ⟵ used for memory allocation only; becomes T* in expressions.

The type of A is still T[N], meaning A is an array, as evidenced by taking its size: sizeof(A) will be equal to sizeof(T[N]). When represented in an expression, except as operand to the sizeof or address-of operators, it results in a pointer, in this case: T*.

The first element of an array A can be expressed in code as A[0]. It represents the first T value. The address of the first element can thus be expressed as: &A[0]. In other words, as an expression, A is equivalent to &(A[0]), yielding a T* value, where T is the type of the elements of A. Due to operator precedence, we can shorten it to: A ≡ &A[0]. Both are legal expressions, but it is never necessary to write &A[0].

int A[3] = { 11, 22, 33 };    // initialised sequence of int values.
int* P = A;                   // P = address of first element of A.
int* Q = &A[0];               // redundant, since A ≡ &A[0]
if (A != &A[0])
    printf("It's the end of days!\n");
if (sizeof(A) != sizeof(int[3]))
    printf("Life as we know it, has ceased.\n");

In short: The type of the variable A above is int[3]; the type of the expression A is int*.

The same operations applicable to A, above, are also applicable, and work exactly the same, when ap­plied to P, since both A and P have the same type and value in an expression. They are not re­mote­ly the same kind of variable: the one is an array variable, the other is a point­er vari­able, by clas­si­fi­ca­tion.

Operators never care where the value of their operands orig­i­nat­ed; whether it was a literal, a variable, or the result of a previous operator — they only care about the type.

#### Sizeof Operator and Arrays

The sizeof operator does not treat the expression A, as a pointer. It is an exception, and on­ly be­cause the definition of A is within scope. It is fortuitous and for our convenience. For ex­amp­le, the ex­pres­sion: sizeof(A)/sizeof(A[0]) will result in the number of elements in the array. It is often wrapped in a macro:

#define ARRAY_SIZE(arr) (sizeof(arr)/sizeof(*(arr)))
···
int total = 0, data[] = { 11, 22, 33 };
for (int i = 0; i < ARRAY_SIZE(data); ++i)
    total += data[i];
printf("Sum of data = %d\n", total);

Because we did not specify the size of the array called data, we can easily add more ini­tia­li­sers, or re­move some, without worrying about maintaining a macro with the size of the array, which is what of­ten happens.

### Function Pointers

Although function types are not data types, the name of a function results in the address of the func­tion in the code, which is a data type. This is called a function pointer, or verbosely, pointer to a func­tion. Since we have defined the term pointer to be synonymous with an as­so­ci­at­ed type, it follows that “function pointer” is only a classification; it be­comes con­crete when it has an actual type.

• Given: T F (P) ⟵ function F, taking P parameters, returning T.
• Then: F ≡ &F ≡ *F -t→ T(*)(P) -r→ “pointer to function taking P parameters, returning a T”.
• Thus: Functions represented in an expression, result in function pointers.

The only legitimate way, therefore, to obtain a function pointer in C, is to use the name of a func­tion. Other languages, like C++11, allow for anonymous functions as expressions. These ex­pres­sions are for­mal­ly called lambdas, so you might hear that some language supports, or does not sup­port, lamb­das. Well, C does not support lambdas, which is a shame, but only a minor in­con­ven­ience.

The names of the parameters are not relevant in the function pointer type. Point­er arith­metic can­not be performed on function pointers.

##### fpdemo01.c — Function Pointer Example 01
/*!@file  fpdemo01.c
*  @brief Function Pointer Demonstration 01
*/
#include <stdio.h>
#include <stdlib.h>   // for EXIT_SUCCESS

// divide an int by 10 with rounding, returning a long.
//
long F (int p) {
    return ((long)p * 10L + 50L) / 100L;
}

// divide an int by 10 without rounding, returning a long.
//
long G (int p) {
    return (long)p / 10L;
}

int main (void) {

    long (*P)(int);

    P = F;   printf("%ld\n", P(95));   // P(95) calls F(95).
    P = G;   printf("%ld\n", P(95));   // P(95) calls G(95).

    return EXIT_SUCCESS;
}

The simplistic example above shows that the variable P can store any value, as long as the value has type long(*)(int), i.e., “pointer to function taking one int parameter, returning a long”.

Code involving function pointers is significantly simplified by using a typedef. Here is the same program, with the addition of two user-defined types: PF_t, which is simply a synonym for long(*)(int), and FP, a synonym for the function type long(int), so that FP* is again long(*)(int):

##### fpdemo02.c — Function Pointer Example 02
/*!@file  fpdemo02.c
*  @brief Function Pointer Demonstration 02
*/
#include <stdio.h>
#include <stdlib.h>   // for EXIT_SUCCESS

typedef long (*PF_t)(int);  //← PF_t ≡ long(*)(int)
typedef long FP(int);       //← FP*  ≡ long(*)(int)

// divide an int by 10 with rounding, returning a long.
//
long F (int p) {
    return ((long)p * 10L + 50L) / 100L;
}

// divide an int by 10 without rounding, returning a long.
//
long G (int p) {
    return (long)p / 10L;
}

int main (void) {

    PF_t P;
    FP*  Q;

    Q = P = F;   printf("%ld,%ld\n", P(95), Q(95)); //← calls F(95).
    Q = P = G;   printf("%ld,%ld\n", P(95), Q(95)); //← calls G(95).

    return EXIT_SUCCESS;
}

Using the PF_t type, it becomes trivial to pass function pointers, return function pointers, or store them in arrays:

typedef long (*PF_t)(int);  //← PF_t ≡ long(*)(int) (1)
typedef long FT(int);       //← FT*  ≡ long(*)(int) (2)
···
// function returning a function pointer.
PF_t FRP (void);                    // using typedef (1).
FT*  FRP (void);                    // using typedef (2)
long (*FRP(void))(int);             // not using typedef.

// function taking a function pointer as argument.
void FAP (PF_t parm);               // using typedef (1).
void FAP (FT*  parm);               // using typedef (2).
void FAP (long (*parm)(int));       // not using typedef.

// array of function pointers.
PF_t AFP[2] = { F, G };             // using typedef (1).
FT*  AFP[2] = { F, G };             // using typedef (2).
long (*AFP[2])(int) = { F, G };     // not using typedef.

// function returning, and accepting a function pointer.
PF_t FFF (PF_t parm);               // using typedef (1).
FT*  FFF (FT*  parm);               // using typedef (2).
long (*FFF(long (*parm)(int)))(int); // ouch.

Like all operators, the function call operator has no idea from where its function pointer op­er­and orig­i­nat­es. It simply does its job, regardless. Some programmers, with convoluted ra­tio­nal­i­sa­tions, will write: P = &F; instead of P = F;, as if they are somehow different. The com­pi­ler will allow this syntax, but it is ignored. Similarly, they write: (*P)(95), instead of P(95), as if it's special. Again, it is al­lowed, but the com­pi­ler completely ignores it. In fact, you can even write: (*F)(95), and it will also be ignored, and be treated as F(95).

If you're not convinced, consider what set of rules, other than mentioned, explain why this will com­pile, and call F(), every time:

long F (int parm) { ··· }
long (*P)(int) = F;                  // or … = &F; if you like.
···
(*****F)(123);   P(123);          // calls F().
(***F)(123);  (*P)(123);          // calls F().
(*F)(123);  (***P)(123);          // calls F().
F(123);   (*****P)(123);          // calls F().
• The name F, represents a function, and results in type: long(*)(int).
• The variable P, contains the address of F, and has type: long(*)(int).
• Expressions (*F) and (*P) both represent functions with type: long(*)(int).

The function call operator gets the same value, and the same type for any of these ex­pres­sions, and will perform the same task on each — call F().

If you would like to see even more pathologically insane variants of the above, consider the following, which assumes that the definitions of F and P above are still in scope:

   (****(***(**(*F))))(123);        // still calls F().
(****(***(**(*P))))(123);        // still calls F().
(&*&*&*&*&*&*&*&F)(123);         // still calls F().
(*&*&*&*&*&*&*&P)(123);          // still calls F(). (&P alone is not callable.)

It does not matter how you represent a function, by name or pointer variable, it will always just be a function pointer — which can only be called with the function call operator, stored in a variable, passed as argument, or returned from a function; whether you unnecessarily ap­ply in­di­rec­tion to it or not.

### Literal Strings

Literal strings (e.g. "ABC" or L"ABC") result in pointers to their first characters. Since each char­ac­ter has type char, or wchar_t, that means the type is char* or wchar_t* respectively. In C++, they de­cay to have types const char* and const wchar_t* respectively. The fact that they are not const point­ers in C, does not mean it is portable to write to the string location.

Fur­ther­more, duplicate & iden­ti­cal literal strings are allowed to share the same space. Al­though it has no real impact on pro­grams, a literal string can be considered an array of chars or wchar_ts, but since representing it will result in a pointer, it is rather a moot point.

The important point here is that, conceptually, programmers tend not to think of a literal string as a pointer. They think of it as a “string”, which it is, and which it isn't, depending on your per­spec­tive. But regardless of perspective or assumption, it results in a pointer, and hence all op­e­ra­tors that work with pointers, will work with literal strings.

   putchar(     *"ABCDEF"     );     // indirection,                  ⇒ A
putchar(      "ABCDEF"[2]  );     // subscript (indirection),      ⇒ C
putchar(    2["ABCDEF"]    );     // subscript (indirection),      ⇒ C
putchar(    *("ABCDEF" + 2));     // ptr arithmetic & indirection. ⇒ C
putchar(*(2 + "ABCDEF")    );     // ptr arithmetic & indirection: ⇒ C
char* P =     "ABCDEF";           // store pointer in P variable.
char S[] =    "ABCDEF";           // exception. not a literal string:
char T[] = {'A', 'B', 'C', 'D', 'E', 'F', '\0'};  // equivalent to this.

We have not yet discussed the intricacies of indirection, pointer arithmetic, and the sub­script op­er­a­tor, so do not get too concerned about that. The code is only to prove that literal strings result in pointers — in other words, the code is syntactically correct, will compile, run and pro­duce the same results on all conformant C compilers.

In C++, the type of a literal string is: const char[N], decaying to: const char*. As a good convention, you should also treat literal strings in C as if they are const pointers. In other words, rather use:

const char* ident = "ABC";, than:
char* ident = "ABC";.

## Operations on Pointers

Pointer values are often passed to functions as arguments, so that functions may have the opt­ion to modify the value at that address, without the function needing to know the name of that value. In some other languages, the compiler may do it automatically, and then it is called “pass by ref­er­ence”. C has no syntax for this feature; we must manually pass addresses, should we need to.

### Indirection Operator

In order to represent the value a pointer “points to”, we can use the indirection operator. Any ex­pres­sion, where the last operator to be performed is the indirection operator, is called an “in­di­rec­tion ex­pres­sion”.

• Given: E -t\rightarrow T* \longleftarrow any expression E of type T*.
• Then: *E -r\rightarrow “indirect E”.
• Means: “represent the T value at address E”.
• Type: *E -t\rightarrow T

Assuming the runtime value of the expression E above is ADR_OF_V, which is an address, we of­ten say “E points to ADR_OF_V”. Generally however, we know ADR_OF_V is the address of some variable, say V, in which case we will say: “E points to V”. Hence the term “pointer”.

Since we have defined a variable name to represent a value in memory, and we have also defined an indirection expression, like *E above, to represent a value in memory, it means that the same operations can be applied to either. Given the right values, it will be possible for an indirection to be a practical alias, in almost every respect, for some variable:

#include <stdio.h>
int main (void) {
int V = 123;                          // value 123. assume &V is XXX.
int* P = &V;                          // value XXX. assume &P is YYY.
int** Q = &P;                         // value YYY. assume &Q is ZZZ.

#define _p(expr) printf("%*s = %p\n", -5, #expr, ((void*)(expr)))
_p( &V   );                           //⇒ &V   = XXX
_p( &P   );                           //⇒ &P   = YYY
_p( &Q   );                           //⇒ &Q   = ZZZ
_p( &*P  );                           //⇒ &*P  = XXX
_p( &*Q  );                           //⇒ &*Q  = YYY
_p( &**Q );                           //⇒ &**Q = XXX
_p( P    );                           //⇒ P    = XXX
#undef _p

#define _p(expr) printf("%*s = %d\n", -5, #expr, expr)
_p( V    );                           //⇒ V    = 123
_p( *P   );                           //⇒ *P   = 123
_p( **Q  );                           //⇒ **Q  = 123
#undef _p
}

From the C precedence table, notice that the address-of and indirection operators have the same level of precedence, but they associate from right to left. Consequently, in the ex­pres­sion &*P, in­di­rec­tion is performed first, which represents memory, and is exactly what the ad­dress-of op­e­ra­tor re­quires: it can only take the address of an expression that re­pre­sents memory.

Empirically, one can see by the output that *P is effectively (not entirely), an alias for V, just like *Q is effectively an alias for P, and **Q consequently also an alias for V. That supports the above definitions, so it should be no surprise. This will remain true for as long as P contains the address of V. If P is modified to contain the address of another variable, say W of type int, then from that point on, *P will be an alias for W.

The example above is not practical code, since there is no point in taking the address of a variable, and storing it in another variable in the same scope. This is not illegal though, so we used it to illustrate the fundamentals of indirection. In practical programs, the most common reason for taking an address, is to pass it to a function, so that the function may modify the variable via indirection.

extern void tripple_it (int* parm);
···
int var1 = 12, var2 = 20;
printf("var1 = %d\n", var1);  //⇒ 12
tripple_it (&var1);
printf("var1 = %d\n", var1);  //⇒ 36
printf("var2 = %d\n", var2);  //⇒ 20
tripple_it (&var2);
printf("var2 = %d\n", var2);  //⇒ 60
···
void tripple_it (int* parm) {
*parm *= 3; // *parm is an “alias” for value at the address in parm.
}

The same machine code in tripple_it() can now modify any variable whose address has been passed, by using the indirection operator, without ever knowing the names of the vari­ables, or em­ploy­ing vari­ables defined on the external level.

Another reason we pass pointers to functions, is when we have no choice. This is when, con­cep­tu­al­ly, we want to pass an array. We say conceptually, because we have now established that an array can­not be used as a complete unit, or “chunk” of memory.

#### Const Pointers

The problem now is that we may want to pass the array so that the function can read from it; we don't want the function to write to the array. Of course, we can hope the function will not mo­di­fy it. But better to be safe than sorry, which brings us to the concept of “read-on­ly point­ers” or, more for­mal­ly: “const pointers”.

• Given: const T* P; \longleftarrow pointer to a value of type const T.
• Same: T const* P; \longleftarrow pointer to a value of type const T.
• Then: *P -t\rightarrow const T \longleftarrow means it can only be read.

The following definition of P has the same effect: T const *P;, but most programmers use the first ver­sion in the definition above. Here, this pattern is used in an example:

#define ARRSZ(a) (sizeof(a)/sizeof(*(a)))
extern int sum (const int* beg, const int* end);
···
int data[] = { 11, 22, 33, 44 };
int total = sum(data, data + ARRSZ(data));
printf("Sum of data = %d\n", total);
···
int sum (const int* beg, const int* end) {
assert(beg < end);   // bad args if beg >= end.
int result = *beg++;
while (beg != end)
result += *beg++;
return result;
}

Of course, you could have used another algorithm and parameters for sum(), but you will still have at least one parameter of type const int*. We show such an example below, but also explain one more rule:

#### Array Parameter Optional Syntax

To aid readability, C allows one to optionally define pointer parameters as if they are arrays.

When, and only when, a parameter ‹param› of a function is declared or defined, then:

• ‘T* ‹param›’ \equiv ‘T ‹param›[]’, and ‘const T* ‹param›’ \equiv ‘const T ‹param›[]’.
• ‘T (*‹param›)[N]’ \equiv ‘T ‹param›[][N]’. Also ‘const T (* ‹param›)[N]’ \equiv ‘const T ‹param›[][N]’.
• Constant integer expressions between the square brackets are allowed, but ignored. The N for the pointer-to-array type is required, however.

This rule exists solely so that programmers may convey more meaning: “abstractly, passing/expecting an array”. It does not change any be­ha­viour. If a function is expecting an ar­ray, seeing that a parameter is of type T [], is more mean­ing­ful to a reader than seeing it is of type T*, which could just as well be the ad­dress of a single val­ue. It is considered a good coding convention to use this rule, where ap­plic­able:

#define ARRSZ(a) (sizeof(a)/sizeof(*(a)))
extern int sum (const int arr[], size_t count);
···
int data[] = { 11, 22, 33, 44 };
int total = sum(data, ARRSZ(data));
printf("Sum of data = %d\n", total);
···
int sum (const int arr[], size_t count) {
int result = 0;
for (size_t i = 0; i < count; ++i)
result += arr[i];
return result;
}

If, for some reason, you wanted a pointer variable or parameter, to also be const, not just what it is pointing to, you can use const twice:

   int const * const p = some_value;

Now both p and *p result in const types, and that is why p has to be initialised with some value — last chance you'll get. It could also have been writ­ten as follows, with the same semantics:

   const int* const p = some_value;

If only p has to be const, and not what it points to, i.e. *p:

   int* const p = some_value;

In the last case, p must be initialised, otherwise it will result in a compilation error.

#### Passing Pointers for Speed

The only other reason we pass pointers to functions, is when passing by value would be too ex­pen­sive, as may be the case for large structure type values. Passing an address could be much faster. Again, we should make it a const pointer, if we do not want the func­tion to mo­di­fy the mem­bers of the struct.

typedef struct S_t {
int member;
} S_t;

void F (const S_t* parm) {
printf("(*parm).member = %d\n", (*parm).member);
printf("parm->member   = %d\n", parm->member  );
}
···
S_t V = { 123 };
F(&V);                  //⇒ 123

We have to parenthesise the (*parm) expression, which represents the actual value parm points to, since the indirection operator (*) has lower precedence than the mem­ber se­lec­tion op­e­ra­tor (.). The member se­lec­tion operation will not work on pointers, hence the need to represent the struct first, then we can select a member from the representation.

Indirect member selection, like subscript, is a shortcut operator. The ex­pres­sion S->M is preferred, but is synonymous with (*S).M. It is therefore also an in­di­rec­tion ex­pres­sion, and if S is an lvalue, so is S->M, assuming M complies with lvalue rules (not an array).

### Pointer Arithmetic

It is possible to add or subtract (+/-) an integer type value to, or from, a pointer type value. This is a special case, and is called pointer arithmetic. In this “arithmetic”, 1+1 will not ne­ces­sa­ri­ly be equal to 2, nor will 2-1 necessarily be equal to 1.

• Given: E -t\rightarrow T*, and I -t\rightarrow any integer type.
• Then: E + I \equiv I + E, E - I -t\rightarrow T* \longleftarrow commutativity¹ of + holds.
• Value: E ± I * sizeof(T) -r\rightarrow “value of E plus-or-minus I times sizeof(T)”

From the definition, we can see that - or + will not change the type of the pointer expression oper­and, so that the result will still be T*.

Conceptually, this means that a T*, can only point to T values, so when incremented, for ex­amp­le, it can only point to the next T value. If two T values are adjacent in memory, and E results in the ad­dress of the first, then E+1, will give us the next T, i.e., the address of the sec­ond T value, re­gard­less of the size of a T value. This can be extended to any se­quen­ce, so that E+5, for ex­amp­le, will re­sult in the address of the sixth T. Of course, although not ne­ces­sa­ry, E+0 is legal, and equal to E.

Pointer arithmetic exposes one of the biggest dangers in C. By using pointer arithmetic, any ad­dress can effectively be reached, whether that address is valid, or whether that address actually con­tains a value of type T, or not. The compiler cannot help us to stay within the bounds of our se­quen­ce of T values — it is just performing arithmetic. In the example below, we com­bine point­er arith­me­tic and in­di­rec­tion, which allows us to access elements of the ar­ray, using a point­er to the first element, stored in P, and an offset.

   int* P = (int[3]){ 11, 22, 33 };        // ptr. to seq. of 3 ints.
printf("*(P + 0) = %d\n", *(P + 0));    //⇒ 11
printf("*(P + 1) = %d\n", *(P + 1));    //⇒ 22
printf("*(P + 2) = %d\n", *(P + 2));    //⇒ 33
printf("*(0 + P) = %d\n", *(0 + P));    //⇒ 11
printf("*(1 + P) = %d\n", *(1 + P));    //⇒ 22
printf("*(2 + P) = %d\n", *(2 + P));    //⇒ 33

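The first statement uses a C99 “compound literal” to create an unnamed array of 3 int values, and assigns the address of the first element to P. Inside a function body, as here, the unnamed array has automatic storage duration, tied to the enclosing block; only a compound literal at file scope has static storage duration. An array compound literal acts just like an array represented by name: it results in the address of the first element.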

#### Arrays of Arrays

An element of an array could be another array, also known as “an array of arrays”, more com­mon­ly mis­re­pre­sented as a “multi-dimensional array”, leading to all kinds of mis­con­cep­tions. Talking of a multi-dimensional array, is useful only as an algorithmic abstraction — it does not represent C se­man­tics, which only involves types, and pointer arithmetic.

In the example below, you can easily replace ROW with T, and see how it compares with the de­f­i­n­i­tions above. Ultimately, M + 1 must result, as we have seen, in the address of the next ROW type (2nd element). For that to work, the compiler uses the type to calculate that offset. And the type of the el­e­ment is ROW, which is a synonym for int[3] (array of 3 ints). Thus, M + 1 gets calculated as:

M + 1 * sizeof(ROW), or, since ROW is a synonym for int[3], as:
M + 1 * sizeof(int[3]), which provides the correct address.
/* typedef version
*/ {
typedef int ROW[3];        // ROW is synonym for int[3].
ROW M[4] = {               // M stores 4 ROW values.
{ 11, 12, 13 },         // values for M[0] “first ROW”.
{ 21, 22, 23 },         // values for M[1] “second ROW”.
{ 31, 32, 33 },         // values for M[2] “third ROW”.
{ 41, 42, 43 }};        // values for M[3] “fourth ROW”.

ROW* P = M;                // M === &M[0]
ROW (*Q)[4] = &M;          // &M is “ptr-to-array of 4 ROWs”.

printf("M    %p\n", (void*) M);
printf("&M   %p\n", (void*) &M);
printf("P    %p\n", (void*) P);
printf("&P   %p\n", (void*) &P);

printf("%d %d\n", *(*(M + 1) + 2)       , M[1][2]   );  //⇒ 23 23
printf("%d %d\n", *(*(P + 1) + 2)       , P[1][2]   );  //⇒ 23 23
printf("%d %d\n", *(*(*(Q + 0) + 1) + 2), Q[0][1][2]);  //⇒ 23 23
}

Without using the ROW user-defined typedef, no operators need to change, only the syntax for the de­f­i­n­i­tion of the M variable:

/* fundamental type version
*/ {
int M[4][3] = {
{ 11, 12, 13 },         // values for M[0] “first int[3]”.
{ 21, 22, 23 },         // values for M[1] “second int[3]”.
{ 31, 32, 33 },         // values for M[2] “third int[3]”.
{ 41, 42, 43 }};        // values for M[3] “fourth int[3]”.

int (*P)[3] = M;           // M === &M[0]
int (*Q)[4][3] = &M;       // &M is “ptr-to-array of 4 int[3]s”.

printf("M    %p\n", (void*) M);
printf("&M   %p\n", (void*) &M);
printf("P    %p\n", (void*) P);
printf("&P   %p\n", (void*) &P);

printf("%d %d\n", *(*(M + 1) + 2)       , M[1][2]   );  //⇒ 23 23
printf("%d %d\n", *(*(P + 1) + 2)       , P[1][2]   );  //⇒ 23 23
printf("%d %d\n", *(*(*(Q + 0) + 1) + 2), Q[0][1][2]);  //⇒ 23 23
}

As arrays of arrays can become large quite quickly, and because the stack can be very lim­it­ed in some environments, it is often more convenient to allocate the memory at run­time (dy­nam­i­cal­ly), using the standard library, or a custom library. Here is a program similar to the above ex­am­ples, but em­ploy­ing dynamic memory:

   int (*M)[3] = (int(*)[3])malloc(4 * 3 * sizeof(int));
// or: …malloc(4 * sizeof(int[3]));.
if (!M) { // malloc returns null pointer on failure.
fprintf(stderr, "No memory.");
exit(EXIT_FAILURE);
}
M[0][0] = 11; M[0][1] = 12; M[0][2] = 13;
M[1][0] = 21; M[1][1] = 22; M[1][2] = 23;
M[2][0] = 31; M[2][1] = 32; M[2][2] = 33;
M[3][0] = 41; M[3][1] = 42; M[3][2] = 43;

printf("M[1][2] = %d\n", M[1][2]);          // tidy expression, but is
printf("        = %d\n", *(*(M + 1) + 2));  // calculated like this.
···
free(M); // important.

Since C does not store metadata anywhere for arrays, it follows that, when M is an array (as in the int M[4][3] examples earlier, not the malloc() version, where M is a pointer variable), M, &M, and &M[0] will all produce the same address. The only difference is in the type of the address &M produces, as opposed to the type of M and &M[0], which in turn affects any pointer arithmetic applied to it.

One of the biggest problems with dynamic memory allocation, is to remember to free() the memory once done with it. This is easy to forget, or to miss on a return path, and the result is called a “memory leak”. Similarly, one must check the return value of malloc() for a failure to allocate memory. To continue without error checking, is looking for trouble, and indicates sloppy or lazy programming.

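An alternative for C99 is to use a compound literal. The compiler simply creates an unnamed array, and returns a pointer to the first element. When the compound literal appears at file scope, the array is in static memory (global lifetime), and uses neither the stack nor dynamic memory. Inside a function body, however, it has automatic storage duration, and only lives until the enclosing block ends: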

   int (*M)[3] = (int[4][3]){
{ 11, 12, 13 },
{ 21, 22, 23 },
{ 31, 32, 33 },
{ 41, 42, 43 }};
// now we can use M algorithmically like a 2D array. the 1
// is the “row” offset, and the 2 is the “column” offset:
printf("M{Row2,Col3} = %d\n", *(*(M + 1) + 2));    //⇒ 23

All the arrays of arrays examples above use the same pointer arithmetic. The ex­po­si­tion be­low refers to any one of the above arrays of arrays examples, all referenced by M; since the types are the same, the same operators will produce the same results.

Instead of using the above C99/11 compound literal syntax, we could use the type int[4][3] with malloc(), then the code might look as follows:

   int (*M)[3] = (int(*)[3])malloc(sizeof(int[4][3]));
int i, j;
M[i=0][j=0] = 11;  M[i][++j] = 12;  M[i][++j] = 13;
M[i=1][j=0] = 21;  M[i][++j] = 22;  M[i][++j] = 23;
M[i=2][j=0] = 31;  M[i][++j] = 32;  M[i][++j] = 33;
M[i=3][j=0] = 41;  M[i][++j] = 42;  M[i][++j] = 43;
···
free(M); // release the dynamically allocated memory.

To place initial values in the M “array”, is now more cumbersome, explaining the addition of com­pound literals to the C language.

#### Arrays of Arrays Pointer Arithmetic

Since M, in an expression, is pointing to the first element, which is an array, the type is a point­er-to-array, which we write as: int(*)[3], or ROW*, if using the synonym.

So, *(M + 1) represents the 2nd row, but since the second row is an array, it must result in a pointer to the first element: int*.

Assuming the result of *(M + 1) == R, then (R + 2) is the address of the 3rd element, and thus *(R + 2), i.e., *(*(M + 1) + 2) represents the 3rd element: 23, of the second element of M.

The following example does not add much more, but does try to show that an int[2][3] array (like A below), will result in a pointer-to-array: int(*)[3]. Given a variable of that type, like Q, the same op­e­ra­tors will give the same result on both A and Q; they are different kinds of vari­ables, but they have the same type, and in the example, the same val­ue in an ex­pres­sion.

   // “P is a ptr-to-array of 3 elements of type int”, and the compound
// initialiser, whose result is assigned to P, is:
// “an array of 2 elements, each being an array of 3 elements of int.”
//
int (*P)[3] = (int[2][3]){ { 11, 12, 13 }, { 21, 22, 23 } };

// in an expression, “A *results* in a ptr-to-array of 3 elements,
// of type int”. Or: a pointer to an int[3], which we cannot write
// as int[3]*, we must write it as int(*)[3].
//
int A[2][3] = { { 11, 12, 13 }, { 21, 22, 23 } };

int (*Q)[3] = A;          // A has type int(*)[3] here.

// all printfs below, output 23.
//
printf("*(*(A + 1) + 2) = %d\n", *(*(A + 1) + 2) );
printf("*(*(Q + 1) + 2) = %d\n", *(*(Q + 1) + 2) );

// A[1], for example, represents an array (the second element), so it
// must result in an int*, because the first element of the second
// array, is an int.

int* L = *(A + 1);         // all good.
//int** M = A;             // illegal. will not compile. wrong types.

Pointer arithmetic is at the core of all array operations. Fortunately, as shown later, C pro­vides the sub­script operator, which allows for more concise expressions.

#### Pointer Difference

As a matter of interest, pointers can be subtracted from each other. The result has the type ptrdiff_t (from <stddef.h>), which is not an intrinsic type. It is “implementation-defined”, which means a compiler implementer can decide about the size, and therefore range, of the value. It is always a signed integer type. Not all possible differences may be representable, i.e., a result may be bigger than PTRDIFF_MAX (from <stdint.h>), in which case the behaviour is undefined.

Caveat: pointer differences are only well-defined when pointers to different elements in the same array are subtracted, and includes the pointer that is one past the end of the array. That also im­plies that the pointers must be of the same type.

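Void pointers cannot be subtracted from one another in standard C, since void is an incomplete type with no size. Some compilers, such as GCC, allow it as an extension, and treat the size of void as 1. For a portable difference in bytes, cast both pointers to char* first, and keep the above caveat in mind.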

### Subscript Operator

The subscript operator, or index operator, is actually simply shorthand, or “syntactic sugar”. In fact, it is very superficial shorthand, since a subscript operation is simply phys­i­cal­ly re­arranged into a pointer arithmetic and indirection expression before types are checked, or machine code gen­er­at­ed. This is crucial to accept and understand, otherwise the following will not make sense.

• Given: X[Y] \longleftarrow i.e., any pattern in this form, it is…
• Rewritten: *(X + Y) \longleftarrow before type checking and compilation.
• Meaning: X[Y] \equiv *(X + Y) \equiv *(Y + X) \equiv Y[X]

This is transformed literally, so that X[Y] is equal to Y[X], just like *(X + Y) is equal to *(Y + X), which is what the first two patterns are translated to respectively, anyway². This is only a problem if you have preconceived ideas about the subscript operator. Since most C programmers are not aware of this definition, the convention is to persevere with the most “natural-looking” version.

   int A[3] = { 11, 22, 33 };
int* P = A;                             // &A[0] stored in P.
printf("A[2]     = %d\n", A[2]     );   // recommended pattern.
printf("2[A]     = %d\n", 2[A]     );
printf("*(A + 2) = %d\n", *(A + 2) );
printf("*(2 + A) = %d\n", *(2 + A) );
printf("P[2]     = %d\n", P[2]     );   // recommended pattern.
printf("2[P]     = %d\n", 2[P]     );
printf("*(P + 2) = %d\n", *(P + 2) );
printf("*(2 + P) = %d\n", *(2 + P) );

Now, when most junior C programmers are asked to find the address of the third element, they most likely will write: &A[2] (hopefully, they are not so junior, that they will write &A[3]). But it should be clear that it will give the same result as: A + 2.

The only reason you may want to use &A[2], as opposed to A + 2, is your conviction that it pro­vides more information to a reader or maintainer of your code. Also, prefer A[2] over *(A + 2), even if you know that is what C compiles, regardless of your abstraction.

#### Subscript Operator and Arrays of Arrays

Since the subscript operator is shorthand for pointer arithmetic, we can avoid man­u­al­ly ap­ply­ing pointer arithmetic. Consider a previous arrays of arrays example, rewritten to use the sub­script op­e­ra­tor:

   int M[4][3] = {
{ 11, 12, 13 },
{ 21, 22, 23 },
{ 31, 32, 33 },
{ 41, 42, 43 }};
// now we can use M algorithmically like a 2D array. the 1
// is the “row” offset, and the 2 is the “column” offset:
printf("M{Row2,Col3} = %d\n", M[1][2]);    //⇒ 23

Since M[1][2] is firstly translated to: *(M[1] + 2), and M[1] subsequently rewritten as well, it leaves us with: *(*(M + 1) + 2), which is the expression that the previous example used to se­lect the 3rd element from the 2nd “row”. This is clearly the preferred syntax to use, as long as there is no doubt, that this is not abstract, but simply disguised pointer arithmetic.

### Arrays of Arrays Alternative

Arrays of arrays are not convenient, mostly because the number of “columns” must be con­stant, and part of the type. This makes it difficult to write generic functions with such types.

   int _mem[4][3] = {
{ 11, 12, 13 },
{ 21, 22, 23 },
{ 31, 32, 33 },
{ 41, 42, 43 }};

int* M[4] = { _mem[0], _mem[1], _mem[2], _mem[3] };  // array of int*.
int** P = M;

printf("M{Row2,Col3} = %d\n", M[1][2]);      //⇒ 23
printf("P{Row2,Col3} = %d\n", P[1][2]);      //⇒ 23

The addresses of the “rows” stored in M, could have been dynamically allocated. To keep the code small, the example uses _mem to “allocate” and initialise the memory. We could have used C99's compound literals instead, but the “rows” would not be guaranteed to be contiguous, which could be problematic for certain algorithms:

   int* M[4] = {
(int[3]){ 11, 12, 13 },
(int[3]){ 21, 22, 23 },
(int[3]){ 31, 32, 33 },
(int[3]){ 41, 42, 43 }};

The point is that M is not an array of arrays — it is simply an array which happens to contain a list of int pointers. Consequently, selecting an element yields an int*, which we arranged to be the ad­dress of a sequence of 3 int values. Now we can use an additional subscript op­e­ra­tor to re­pre­sent an element in the “row”: M[row][col].

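Memory for the array of pointers, and the elements can be allocated as one block. In this case, the number of “rows” and the number of “columns” can be determined dynamically, depending on runtime requirements. We can arrange memory so that the array of “row” pointers comes first, followed by the memory for the actual elements, with each pointer holding the address of the first element of its “row”.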

There may be a gap between the array of pointers, and the memory for the actual elements, with­out affecting operations. Since both the number of “rows” and the number of “columns” can vary, both values must be transmitted to a function taking such a construct as parameter:

int sum2d (int* arr[], size_t rows, size_t cols) {
int total = 0;
for (size_t r = 0; r < rows; ++r)
for (size_t c = 0; c < cols; ++c)
total += arr[r][c];
return total;
}

// dynamically allocate the “2D” array. R and C can be variables
// determined at runtime from other sources. it takes a few liberties
// regarding the alignment of int* and int, and not checking if
// the memory allocation succeeded. also set values for elements.

size_t R = 4, C = 3;
int** M = (int**)malloc(R * sizeof(int*) + R * C * sizeof(int));
for (size_t r = 0; r < R; ++r) {
M[r] = (int*)(M + R) + r * C;
for (size_t c = 0; c < C; ++c)
M[r][c] = (r + 1) * 10 + (c + 1);
}
printf("M[1][2] = %d\n", M[1][2]);
printf("sum2d(M, R, C) = %d\n", sum2d(M, R, C));

Remember that, in this context, int* arr[] is equivalent to: int** arr, but is more de­scrip­tive for this particular situation. For better portability, the space for the “row” pointer could have been se­pa­rate­ly allocated from the memory for the actual elements. The only danger with that option, is re­mem­ber­ing to free() two blocks of memory. C allows us to decide on trade­offs.

If we wanted to protect the array elements from accidental modification in a function like sum2d(), we could have defined it as:

int sum2d (const int * const arr[], size_t rows, size_t cols) {
···
arr[1][2] = 99; // will fail to compile
···
}
… sum2d((const int * const *)M, R, C) …

Unfortunately, because of some C inadequacies, we have to cast the passed parameter to the cor­rect type. When safety is paramount, however, it is a small price to pay.

Because of the memory layout achieved, we could also treat the actual elements as a con­tigu­ous ar­ray of int values, and pass it to functions that can work with a “normal” array:

int sum (const int* beg, const int* end) {
int total = *beg++;
while (beg != end)
total += *beg++;
return total;
}
···
int t = sum(M[0], M[0] + R * C);

This version also does not suffer from having to cast the argument passed to the sum() function.

### Pointer Type Conversions

Simplistically: any pointer type can be converted to any other pointer type with an explicit cast. The only implicit pointer type cast, is converting from any pointer type to a void*. In C, the re­verse is also au­to­ma­tic, but should not be depended on, since it is not true in C++.

Casting a const T* to T* (a pointer to const, to a pointer to non-const) is allowed, but modifying an object that was defined const through the resulting pointer is undefined behaviour. It is generally just bad practice.

Casting function pointers is possible, but very dubious: calling a function through a pointer whose type is not compatible with the function's actual type is undefined behaviour.

Converting an integer type to a pointer type, and vice versa, is allowed. But the result is im­ple­men­ta­tion defined, and thus not necessarily very portable.
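One conversion of this kind that can be relied upon, where the implementation provides the type, uses uintptr_t from <stdint.h>. This is a minimal sketch: the function name roundtrip_demo() is hypothetical, and uintptr_t is an optional type in C99, so its availability is an assumption.

```c
#include <stdint.h>

/* Returns 1 if a pointer survives a round trip through an integer. */
int roundtrip_demo(void)
{
    int value = 42;
    int *p = &value;

    uintptr_t bits = (uintptr_t)(void *)p;   /* pointer -> integer */
    int *q = (int *)(void *)bits;            /* integer -> pointer */

    return q == p && *q == 42;
}
```

The numeric value held in bits is still implementation defined; only the round trip back to an equal pointer is what the sketch depends on.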

Assuming data is properly aligned in memory, a pointer to the first byte, e.g. char*, can be cast to a pointer of any type, e.g. long*. Applying the indirection operation to the result means that, ef­fec­tive­ly, we can treat any piece of memory as any type of value. Again, we try to avoid this as much as possible.

   unsigned char _mem[] = { 0x41, 0x42, 0x43, 0x44, 0x00, 0x11, 0x22, 0x00 };
printf("%c\n",  *(char*)_mem);           //⇒ A
printf("%s\n",   (char*)_mem);           //⇒ ABCD
printf("%d\n", *(short*)_mem);           //⇒ 16961
printf("%d\n",   *(int*)_mem);           //⇒ 1145258561
printf("%ld\n", *(long*)_mem);           //⇒ 9588842051093057
printf("%c\n", *(char*)(_mem+2));        //⇒ C

The values used are not significant. We only wanted to show that the same sequence of memory can be treated as different types, and hence can produce different values, since a different number of bytes is involved in each value. Also note that, if sizeof(long) is not 8 bytes, the value displayed will differ from the one indicated in the relevant comment.
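Besides alignment, reading an unsigned char array through an int or long lvalue also runs up against the aliasing rules of C99, so the standard does not guarantee the casts above to work, even though they typically do. A commonly used alternative is to copy the bytes into an object of the desired type with memcpy(). The sketch below assumes uint32_t is available, and the name load_u32() is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any byte address,
   aligned or not, by copying its object representation. */
uint32_t load_u32(const unsigned char *bytes)
{
    uint32_t v;
    memcpy(&v, bytes, sizeof v);
    return v;
}
```

The value obtained still depends on byte order, exactly as with the casts; only the alignment and aliasing concerns are avoided.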

Any pointer type value can be converted implicitly to void*, as mentioned before. The reverse is true in C, but not in C++, so it is better to cast explicitly. The return of malloc(), for example, is a void*, and this extract follows the suggestion:

   int* P = (int*)malloc(N * sizeof(int));

## Supplementary Topics

A few topics, although not directly related to pointers, are lightly covered below, because they are often not well understood, and this may aid comprehension.

### Implicit Data Movement

Some operations are implied, in other words, the operation takes place because of the con­text in which an expression is used, not because of a physical operator. In particular, when passing ar­gu­ments to functions, no operator is required to facilitate the movement of the expression's val­ue to the special local variable of the function, which we call a parameter.

Here is a summary of all occasions where data movement takes place, in other words, where a source and a destination are involved. The source and destination types should either be the same, or the source type must be implicitly convertible to the destination type.

• Variable initialisation: optional clause of the variable definition syntax.

• Lvalue assignment: use of side-effect operators on expressions that represent modifiable memory.

• Parameter initialisation: passing arguments to a function effectively initialises parameters, which are, in all other respects, local variables of the function being called.

• Function returns: conceptually, every non-void function has an anonymous variable that temporarily holds its return value when the return ‹expression›; statement is executed.

When we say a language has “pass by reference”, it means the language has a syntax whereby a programmer can specify that an argument must be passed automatically (and transparently) as an address, and that access to the parameter will automatically and transparently indirect through that address.

In C, we must do all that explicitly, by declaring the parameter as a pointer type, by explicitly ob­tain­ing the address of the argument, and by explicitly applying the indirection operators — no syntax, nothing automatic, no transparency. Even in languages that support it, “passing a ref­er­ence” is dif­fer­ent from “pass by reference”.
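The three explicit steps can be seen together in a small sketch. The function div_rem() is a hypothetical example, not taken from the text: its last two parameters are declared as pointers, the caller takes addresses with &, and the function indirects with *.

```c
/* Hypothetical example: return two results through pointer parameters.
   Returns 0 on success, -1 if den is zero. */
int div_rem(int num, int den, int *quot, int *rem)
{
    if (den == 0)
        return -1;
    *quot = num / den;    /* explicit indirection on the parameter */
    *rem  = num % den;
    return 0;
}
```

A call looks like div_rem(17, 5, &q, &r), after which q is 3 and r is 2. The pointers themselves are still passed by value, like any other argument.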

Implicit data movement takes place when a value is returned from a function. It is returned as a tem­po­rary, anonymous, variable. Practically, for efficiency, smaller values may be re­turned in a register, but this does not affect the principle. The same rules that apply to as­sign­ment, not only apply to ar­gu­ment pass­ing, but also to function returns.

### Concept of a Singular Type / Value

A value in C does not have to map cleanly to assembler types (integers of varying sizes and float­ing point values). A single value in C can be compound; in other words, not a scalar. There is only one way, how­ever, to get a compound, arbitrarily sized value, and that is with a struc­tured type value.

Regardless of size, a structured type value can be moved around (assigned, passed as ar­gu­ment, or returned from a function), as ef­fort­less­ly as an int. We can therefore treat a struc­tured type value as sin­gu­lar, when we require — it is just another T. Technically, however, we cannot call it scalar.
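To illustrate, here is a minimal sketch with a made-up struct Vec3 and function vec3_scale(). The structure is passed, modified locally, and returned, all by value, just as an int would be.

```c
/* Hypothetical structured type used only for this illustration. */
struct Vec3 { double x, y, z; };

/* v is a copy of the argument; changing it does not affect the caller. */
struct Vec3 vec3_scale(struct Vec3 v, double k)
{
    v.x *= k;
    v.y *= k;
    v.z *= k;
    return v;             /* the whole value is returned */
}
```

Given struct Vec3 a = { 1.0, 2.0, 3.0 }; the statement struct Vec3 b = vec3_scale(a, 2.0); leaves a unchanged and initialises b with the scaled copy.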

For types that depict singular types, the syntax for definition is simple: T V;. The variable V fol­lows the type T. This is also true for pointer types (scalar values): T* P;, but this is not sur­pris­ing, since we have al­ready ascertained that T* is “just another” Type.

For contrast: With arrays, the syntax requires that the type enfolds the object of the type: T A[N];. The type of A is T[N], but we must write the type around the A. If, however, we create a synonym for this type, we can use it as a singular type:

typedef int T[3]; // T is synonym for int[3].
···
T A       = { 11, 22, 33 };   // A has type T, i.e. int[3].
int B[3]  = { 11, 22, 33 };   // B has type int[3]
T* P = &A;                    // T* is synonym for int(*)[3].
int(*Q)[3] = &A;              // P and Q have the same type.

printf("%d %d %d %d\n", A[1], B[1], P[0][1], Q[0][1]);

The output will be 22 for all expressions passed to printf(). This is not useful for function types, which also enfold the subject of the type, but conversely, it is very useful for function pointer types.
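For function pointer types, a synonym lets the declaration follow the simple T V; shape. The following sketch uses hypothetical names (BinOp, add, mul, apply, apply_demo) chosen only for this illustration.

```c
/* BinOp is a synonym for int(*)(int, int). */
typedef int (*BinOp)(int, int);

int add(int a, int b) { return a + b; }
int mul(int a, int b) { return a * b; }

/* The parameter reads like any singular type: BinOp op. */
int apply(BinOp op, int a, int b)
{
    return op(a, b);
}

int apply_demo(void)
{
    BinOp table[] = { add, mul };   /* an array of function pointers */
    return apply(table[0], 3, 4) + apply(table[1], 3, 4);   /* 7 + 12 */
}
```

Without the synonym, the array would have to be written as int (*table[])(int, int), which obscures the pattern.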

### Type Syntax Variations

Here is a complete summary of the shapes (syntax) for various categories of types in C. The ◎ symbol indicates the subject of the type, i.e., the language element to which the type in the declaration or definition is applied:

• T ◎ \longleftarrow singular type. Includes structured and union types, or type synonyms.
• T* ◎ \longleftarrow pointer type. A derived type (see footnote 3).
• T ◎ [N] \longleftarrow array type. A derived type.
• T ◎ (P) \longleftarrow function type. Not a data type. Define or declare functions only.
• T(* ◎ )[M] \longleftarrow pointer-to-array type.
• T(* ◎ )(P) \longleftarrow pointer-to-function type.
• void \longleftarrow abstract type.
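The shapes listed above can be seen side by side in a short sketch. All names here (Pair, twice, shapes_demo and the local variables) are hypothetical and exist only to give each shape a subject.

```c
struct Pair { int a, b; };

int twice(int x) { return 2 * x; }        /* T name(P): function type */

int shapes_demo(void)
{
    struct Pair s = { 1, 2 };             /* T name: singular (structured) type */
    int *p = &s.b;                        /* T* name: pointer type */
    int a[3] = { 10, 20, 30 };            /* T name[N]: array type */
    int (*pa)[3] = &a;                    /* T(*name)[M]: pointer-to-array */
    int (*pf)(int) = twice;               /* T(*name)(P): pointer-to-function */
    void *v = p;                          /* void*: pointer to the abstract type */
    int *back = (int *)v;                 /* cast back before any indirection */

    return *back + (*pa)[1] + pf(a[0]);   /* 2 + 20 + 20 */
}
```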

Optional keywords used together with types, as storage class specifiers or type qualifiers, which affect the storage class, linkage, constness, and volatility:

extern static volatile register const

These have no effect on the operators discussed, except that the address of a variable declared with the register storage class cannot be taken.

Performing indirection on a const T* pointer yields a const T, which is an lvalue, but not a modifiable one. Ordinarily, indirection yields a modifiable lvalue.

Remember that subscript and indirect member selection are also indirection expressions (or involve indirection, in the case of the indirect member selection operator).
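A brief sketch of the effect of const on all three forms of indirection follows. The names Point and const_demo are hypothetical; the assignments that a compiler would reject are left in a comment so that the fragment still compiles.

```c
struct Point { int x, y; };

int const_demo(void)
{
    int a[3] = { 1, 2, 3 };
    struct Point pt = { 5, 6 };

    const int *cp = a;                 /* pointer to const int */
    const struct Point *cpt = &pt;     /* pointer to const struct */

    /* Each of these would fail to compile, since the result of the
       indirection is const:   *cp = 9;   cp[1] = 9;   cpt->x = 9;   */

    a[1] = 20;                         /* the objects themselves are not const */
    pt.x = 50;

    return cp[1] + cpt->x;             /* reading through the pointers is fine */
}
```

The const applies to access through the pointer only; here the function returns 70, because the changes made directly to a and pt are visible through cp and cpt.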

# Summary

All aspects of pointers, and indirection, are supported by a handful of rules. An (admittedly intimate) understanding of this handful of core rules is all that is required. This means mastery is eminently achievable. It does not preclude the requirement of understanding the other rules of C, but considering these are arguably the most complex, the premise should hold.

It should be apparent that arrays are only superficially supported in C, and that pointers, and the op­e­ra­tors that employ them, play an indispensable role.

## Pointers and Operators Summary

Summarized in the points below, T is any type, P has type T*, and A is an array of N elements of type T. V is a variable of type T, and F is a function taking parm parameters (any), returning a T value. X can be any integer type (int used here). In short, assume these definitions and declarations as “given”, and properly initialised:

T V; T* P; T A[N]; T F(parm); int X;

1. An address is a number identifying a byte in memory.
2. A pointer type, e.g., T*, depicts the address of a T value.
3. &V (address of V) results in a value of type T*.
4. *P (indirect P) means “represent a T at address P” and is an lvalue by default.
5. P + X is commutative, and P +/- X yields the address P +/- X * sizeof(T), in bytes.
6. A[X] or X[A] is rewritten as *(A + X) or *(X + A) respectively, before compilation. This is legal, but given A is the pointer operand, using X[A] is discouraged.
7. A results in a pointer to the first element, and thus has type T*. This applies to any expression representing an array, not just array names.
8. &A has type: T(*)[N] (pointer-to-array).
9. F has type: T(*)(parm) (pointer-to-function).
10. F(arg) has type T (result of function call operator).
11. P->M is syntactic sugar (shorthand) for (*P).M.

The last rule requires that P is a pointer to a structured type, and that M is a member of that struc­tured type.
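Several of these rules can be checked mechanically. The sketch below picks int for T and 4 for N; the names S and rules_demo are hypothetical. The rule numbers in the comments refer to the list above.

```c
#include <assert.h>
#include <stddef.h>

struct S { int m; };

int rules_demo(void)
{
    int a[4] = { 10, 20, 30, 40 };
    int *p = a;                        /* rule 7: array yields int* */
    int (*pa)[4] = &a;                 /* rule 8: &a is pointer-to-array */
    int x = 2;
    struct S s = { 7 };
    struct S *ps = &s;                 /* rule 3: &s has type struct S* */

    assert(p + x == x + p);                                   /* rule 5 */
    assert((char *)(p + x) == (char *)p + x * sizeof(int));   /* rule 5 */
    assert(a[x] == *(a + x) && x[a] == a[x]);                 /* rule 6 */
    assert(sizeof *pa == 4 * sizeof(int));                    /* rule 8 */
    assert(ps->m == (*ps).m);                                 /* rule 11 */

    *p = 11;                           /* rule 4: *p is an lvalue */
    return a[0];
}
```

The second assertion for rule 5 converts to char* so that the scaling by sizeof(int) becomes visible as a byte offset.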

## Obtaining Pointers Summary

Although some of the following points are implied by the previous summary, the focus here is on ob­tain­ing pointer type values only. Programmers can obtain a pointer type value by:

1. Using the address-of operator on lvalues.
2. Representing an array in an expression.
3. Using the name of a function.
4. Using a string literal.
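The four ways listed above, in one short sketch. The names square and obtain_demo are hypothetical, and each pointer is used once so that the result can be checked.

```c
#include <string.h>

int square(int n) { return n * n; }

int obtain_demo(void)
{
    int v = 3;
    int a[2] = { 4, 5 };

    int *p1 = &v;                 /* 1. address-of operator on an lvalue */
    int *p2 = a;                  /* 2. an array represented in an expression */
    int (*p3)(int) = square;      /* 3. the name of a function */
    const char *p4 = "six";       /* 4. a string literal */

    return *p1 + p2[1] + p3(2) + (int)strlen(p4);   /* 3 + 5 + 4 + 3 */
}
```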

## Last Words

There are not many rules. Nor complicated ones. The main problem is the syntax chosen for de­c­la­ra­tions, especially since the types can be combined in endless combinations. Gen­er­al­ly, many ex­am­ples and hours of practice are required before most programmers feel com­plete­ly com­fort­able with all these rules. But it is entirely possible. A suggested course of action is to be­come familiar with these rules, before trying to combine several types, since that tends to ob­scure the patterns. The liberal use of typedef is another feasible technique.

1. When overloading + on pointers in C++, commutativity is not preserved.↩︎

2. If the operator is overloaded in C++, this rule does not apply.↩︎

3. In C++, we also have T::*, or “pointer to member”, but it acts more like an offset to a member in a struct or class, and thus is not technically an address.↩︎

2021-11-26: Fixed int* where char* should have been. [brx]
2020-05-29: Formatting (due to Ockert van Schalkwyk's advice). [brx]
2019-02-18: Changed type alias FP to FT. [brx]
2018-10-17: Additional typedef for function pointer types. [brx]
2018-08-10: Code corrections, typography changes, small additions & editing. [brx]
2018-05-24: Modified some output examples for arrays-of-arrays. [brx]
2018-04-12: Fixed reported typos. [brx]
2017-11-16: Update to new admonitions. [brx]
2017-09-22: Editing and formatting. [jjc]
2017-03-11: Created. [brx]