C Pointer and Array Summary

C Types and Operators Related to Pointers

Pointers (variables of a pointer type) and arrays (which result in a pointer type) are often mentioned together. This is not because they are similar, but because certain rules make it possible to apply the same operators to both. This is a summary of the main points.
PREREQUISITES — You should already…
  • have some C programming experience.
  • understand the concept and purpose of types in a statically typed language.
  • understand expressions, operators and precedence.

Background

Variables of an array type behave differently from other variables. In most expressions they decay (degenerate) into a pointer expression, i.e., an expression whose resulting value is of a pointer type. This creates some confusion, since arrays result in pointer values, but are not pointer variables.

Addresses in General

In assembler, an address is just a number that represents the location of a byte in memory. Assembler programs must remember how many bytes they stored in memory, starting at that byte location, and choose the appropriate machine instruction to fetch the correct number of bytes.

In C, although we must be aware that all variables have an address, the type we choose for the variable will determine the machine instructions the compiler generates when we access the variable. This is much simpler: we only have to consider the types of our variables, and the compiler will determine the size and machine instructions.

Pointers

The term pointer type is a categorisation, or classification, of a potentially infinite list of possible types, which all share the same fundamental characteristic. It is an abstraction for the concept of “address of a particular type of value”. There is no type called “pointer”. Pointer types are derived from other, existing types.

Given that T represents any type, built-in or user-defined, T* is a pointer type derived from T, which we pronounce as “T pointer”. It means “address of a T value”. So, int* means the address of an int value, double* means the address of a double value, and so forth.

Obtaining Addresses

The address of any lvalue expression can be obtained with the address-of operator (&). An lvalue expression is an expression that represents (generally) modifiable memory. As variables are lvalues, we can take the addresses of variables with the address-of operator. The results of most operators are not lvalues, so you cannot take the address of, for example, the return value of the function call operator.

When an array appears in an expression, it automatically results in “the address of the first element” (the exceptions are the operands of sizeof and of the address-of operator). If A[0] represents the first element of an array A of T values, then &A[0] is a legal expression, but it is unnecessary, because A by itself already results in that address.

The type of an expression that takes the address of an array variable must mean “address of an array of N elements of type T”. So, for an array A of 3 elements of type int, the type of the expression &A must mean “address of an array of 3 elements of type int”, and the syntax for such a special type is int(*)[3], which is unfortunate (strange, obscure and limited). We pronounce it “pointer to array of 3 elements of int”.

Another instance where pointers are automatically generated is the literal string. A literal string ultimately results in a pointer to its first character. For simple literal strings, e.g. "ABC", the result will be a char*, whereas for L"ABC" (a wide character literal string), the result will be a wchar_t*.

Indirection

Pointers would not be very useful if we did not have any operators that could work with them. There are not many of these operators, but they are crucial. The most important is the indirection operator. The indirection operator can represent memory, with exactly the same consequences and abilities as a variable (which is the traditional way to represent memory).

DEFINITION: Indirection

Given: E –t→ T* (any expression E, of type T*);
Then: *E is read as indirect E, and
Means: represent the T value at address E, and so, has
Type: T.

Remember that the value of E is an address.

Suppose I is an int variable and P is an int* variable that holds the address of I. Then *P represents the same memory as I, but only because P currently contains the address of I. If we later put the address of some other variable J into P, then *P will represent the same memory as J.

Indirection expressions are expressions where the last operator evaluated is the indirection operator. Indirection expressions produce lvalues (like non-const variables). Hence the indirection operator is one of the few operators that produce a result which can be assigned to (for example, *P = 456;). This also means that you can take the address of the result of an indirection operator, like &*P (the precedence rules will evaluate the indirection first). This is of course pointless, since the result is exactly the same as just P by itself.

Pointer values are most useful when the name of a variable is not available (not in scope). This is why they are most commonly used as the type for some parameters: if we want a function that can modify a variable, even if it is not in the scope of the function, we can simply pass the address of the variable.

A function that swaps the values of any two int variables shows how this works: we pass it the addresses of the two relevant variables, and it modifies the variables through those addresses.

Pointer Arithmetic

A surprisingly useful feature in C is that of pointer arithmetic, where 1+1 does not necessarily equal 2. The rule states that we can add or subtract integer type values to or from pointer type values. The result is calculated using pointer arithmetic, without changing the type of the pointer. The result is determined as follows:

DEFINITION: Pointer Arithmetic

Given: E –t→ T* (any expression E, of type T*), and any integer expression I;
Then: E + I, I + E and E - I are legal (I - E is not), and the result is
Calculated as: the address in E, plus or minus I * sizeof(T) bytes, with
Type: T*

Addition is commutative, so I + E will produce the same result as E + I. Subtraction is not: an integer minus a pointer (I - E) is rejected by the compiler. Pointer arithmetic also applies to the increment and decrement operators, which move a pointer by one element of type T.

NOTE: Void Expressions

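Apart from assignment (including argument passing and function returns), comparison, and casting to another pointer type, no operators work with void* expressions. This also applies to pointer arithmetic, which needs sizeof(T) and therefore cannot be applied to a void* expression in standard C.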

Subscript Subterfuge

The subscript operator is not a real operator. It just represents a particular (easy-to-read) pattern, which is rewritten upon compilation to a more fundamental expression, and only then compiled.

Given the pattern A[I], it is translated to *(A+I), which is a combination of pointer arithmetic and indirection, in that order. The pattern I[A] is, by the same rule, translated to *(I+A). Because addition is commutative, all four expressions produce exactly the same result: if A[I] is 22, then so are *(A+I), I[A] and *(I+A). However, this is not understood by many programmers, so one should rather use the A[I] form, which is better understood by most, albeit superficially.

In other words, the subscript operator is simply disguised pointer arithmetic and indirection. These operators do not care where the operands originate from: an array expression, a pointer variable, a function return, or anything else.

Indirect Member Selection

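The indirect member selection operator is another non-operator. Given a pointer to a structured type, e.g. P of type S*, a member M of the structure can be selected with the expression (*P).M. The parentheses are required, because member selection binds more tightly than indirection.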

Since that seems a bit verbose, the arrow operator can be used instead, as in P->M, but it will be translated to the expression (*P).M.

Because the indirect member selection operator is more concise, it is recommended that you use it instead, even if it is not a real operator [1].

Concluding Remarks

All rules that apply to indirection expressions also apply to the array subscript operator and to the indirect member selection operator.

Pointer arithmetic and indirection are at the heart of all C code — not much can be accomplished without them. This is exactly the same in assembler, except that we do not have the convenience of types, and automatic pointer arithmetic based on types.

These types and operators are fundamentally simple, but not often clearly explained. Considering that they are the cause of many bugs, C programmers should strive to completely understand these rules and operators. The result will be more than worth the effort.


  1. In C++, the indirect member selection operator can be overloaded, in which case it becomes very real, and will not be translated.


2018-05-24: Added note about the pointer-to-array type. [brx]
2017-11-18: Update to new admonitions. [brx]
2017-09-23: Edited. [jjc]