Scalar Literals and Variables

Values and Variable Definitions in Perl

After Getting Started with Perl, the next step is to learn about constants, literals and variable definitions. This part focuses on scalar (single value) literals and variables, using numbers and strings.
PREREQUISITES — You should already…

Concepts

Programs use values of different types and characteristics. Perl has very few types. It is not strongly-typed, in that it does minimal compile-time type checking, and performs implicit conversions at runtime.

Type Concepts

In Perl’s terminology, there are only three types: scalars, arrays and hashes. That is not very helpful when considering the differences between numbers and strings, for example. In Perl’s view, these are simply different kinds of scalar values, which can be stored in variables of type scalar, array, or hash.

Type Sigils

Variables in Perl do not have an absolute, or pre-defined, preference for the kind of data you store in them. Their names incorporate their type by virtue of a sigil in the form of a prefix character. This is ‘$’ for scalars, ‘@’ for arrays, and ‘%’ for hashes (dictionaries). The sigil cannot be modified. One consequence of this arrangement is that a scalar, an array and a hash with the same name, such as $var, @var and %var, are three different variables.

Name variables distinctly; do not depend on sigils as the only differentiation.
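
As a minimal sketch of the point about sigils, the following defines three unrelated variables that share one name. The name item and the values are our own, chosen only for illustration.

```perl
use v5.16;
use warnings;

# Three unrelated variables that happen to share the name "item".
my $item = 'a single value';     # scalar
my @item = (10, 20, 30);         # array
my %item = (colour => 'red');    # hash

say $item;            # the scalar $item
say $item[1];         # element 1 of the array @item
say $item{colour};    # the value stored under 'colour' in the hash %item
```

Note that a single element of an array or hash is itself a scalar, which is why it is written with the ‘$’ sigil.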

Arrays and Lists

Although Perl has an array type, the term is only applied to variables. An array can be created with a list of values, often called a literal list. Wherever documentation refers to ‘LIST’, you can use either a literal list, or an array variable.

Type Context

Many of Perl’s functions and operators expect a certain type as argument or operand. The position where a scalar, or list (literal array) is expected, is called a context, i.e., list context, or scalar context.

Certain operations and functions will behave differently when used in a list context, as opposed to a scalar context. This is always documented, and we will point this out where appropriate. At this point, we just want to make you aware of the concept and the terminology around it.
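
To make the idea of context a little more concrete, here is a small sketch in which the same array behaves differently depending on what the left-hand side of the assignment expects. The variable names are arbitrary.

```perl
use v5.16;
use warnings;

my @letters = ('a', 'b', 'c');

my @copy    = @letters;   # list context: all elements are copied
my $count   = @letters;   # scalar context: the number of elements
my ($first) = @letters;   # parentheses give list context: first element

say "@copy";    # a b c
say $count;     # 3
say $first;     # a
```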

Although not officially a “context”, some operators need logical (boolean) values as one or more operands. There are no true or false constants in Perl, which means that values are treated logically, depending on certain rules.

References in General

Perl references are not often introduced to beginners, but they will eventually become useful. We take this opportunity merely to introduce the topic, and establish some terminology.

References are scalar values, and abstractly are addresses, or “pointers” (if you are familiar with that term). The ref(expr) function can be used to determine the type of an ‹expr›ession, as long as it is a reference expression. This suggests that a reference can store the location of (or “a reference to”) any kind of Perl object:

   SCALAR   CODE   LVALUE   VSTRING
   ARRAY    REF    FORMAT   Regexp 
   HASH     GLOB   IO

Since the above list is from Perl’s documentation, it would suggest that Perl has more than three types. And yes, it does, but formalism in open-source documentation cannot always be rigorously effected. Scalars, arrays and hashes are by far the most common types, and all that many programmers need to be comfortable with.

Type Conversion

Type conversions are implicit. When an operator expects a number, it will convert the expression used as operand to a number automatically. There are no cast, or type conversion, operators in Perl. The closest is the int function, which truncates any decimal parts from a number (no rounding).

One consequence of this design choice is that the comparison operators for numbers and strings are different. For example: == tests for equality on numbers, but eq tests for equality between strings. Operators therefore drive implicit type conversion: they will automatically convert operands as required.
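
The following sketch shows operators driving the conversion. The values are arbitrary; the point is that the same operands give different results under numeric and string operators.

```perl
use v5.16;
use warnings;

my $text   = '42';         # a string
my $sum    = $text + 8;    # + wants numbers: '42' is converted to 42
my $joined = $text . 8;    # . wants strings: 8 is converted to '8'

say $sum;       # 50
say $joined;    # 428

my $numeric = ('1.0' == 1)   ? 'equal' : 'different';   # compared as numbers
my $string  = ('1.0' eq '1') ? 'equal' : 'different';   # compared as strings
say $numeric;   # equal
say $string;    # different

say int(7.9);   # 7
say int(-7.9);  # -7 (truncation towards zero, not rounding)
```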

Undefined Values

It is possible for a value to be undefined. This may arise from the result of an operation or function call. This value is represented by the undef function, and its result can be assigned to a variable. To test for an undef value, use the defined function; this is the only way to distinguish an undef value from other “false” values.
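
A short sketch of undef and defined follows. The variable names are hypothetical; note that 0 is false but still defined.

```perl
use v5.16;
use warnings;

my $value;      # no initialiser, so it contains undef
my @report;

push @report, defined $value ? 'defined' : 'undefined';
$value = 0;     # false, but defined
push @report, defined $value ? 'defined' : 'undefined';
$value = undef; # back to undefined
push @report, defined $value ? 'defined' : 'undefined';

say "@report";  # undefined defined undefined
```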

Truthiness

Since Perl does not have a boolean type, it must treat expressions as either ‘true’ or ‘false’, in any context where it needs to treat values as logical. The treatment is based on certain criteria. We list the conditions under which an expression will be treated as ‘false’.

Definition: Truthiness

Only the following values are ever treated as false: if the expression…

  • results in the number ‘0’, or ‘0.0’.
  • is an empty string: ‘''’, or ‘""’.
  • is the string: ‘'0'’, or ‘"0"’.
  • is an empty list: ‘()’.
  • results in ‘undef’.

All other values will be treated as true.

Operators like the comparison operators, which must report whether some condition is ‘true’ or ‘false’, will return ‘1’ for ‘true’, and for ‘false’ a special value that acts as ‘0’ in numeric context and as the empty string in string context. The logical operators, on the other hand, return the value of the last operand evaluated.
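
The rules can be tried out with a sketch like the one below. The contents of the @tests list are our own choice: the first five values are false according to the definition, and the remainder are true, including the strings '0.0' and '00'.

```perl
use v5.16;
use warnings;

# Hypothetical test values: five false ones, followed by six true ones.
my @tests = (0, 0.0, '', '0', undef, 1, -1, '0.0', '00', ' ', 'abc');

my $true_count = 0;
for (@tests) {
    $true_count++ if $_;         # $_ holds the current element
    say "'$_' is true" if $_;    # only printed for true values
}
say "$true_count of ", scalar(@tests), ' values are true';
```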

Do not worry too much about the ‘for’ loop, or the ‘if’ statement modifier, for now. The code above will iterate through all the elements in the @tests list, and successively store the current element in the default variable: ‘$_’. This variable is tested for “truthiness” during each iteration.

IMPORTANT: Default Variable $_

Many Perl constructs, functions and statements require a value clause or argument. If none is provided, they will simply use the $_ “default variable”. If you are not aware of this, some code may not make sense. For example: print; actually results in: print $_;.

Whenever Perl documentation refers to a “Boolean” value as returned by operators and functions, it depends entirely on the relevant operator or function exactly what form the value will take. What you can be sure of is that the value returned will be treatable as true or false.

Constants

Very little introductory Perl material ever mentions the constant module. The concept of symbolic constants in programming seems to be past its prime. However, if you want to be “old school”, and value the concept of symbolic constants (values with a name), you can use the following as a pattern, without deep understanding. The pattern is very simple.

Syntax: Symbolic Constants

use constant ‹NAME› => ‹expr›;

The ‹expr›ession can be a string, a number, or a calculation. In this form the result is a single scalar value. The constant module can also define list constants, which we do not cover here, but it cannot create true array constants.

Consider a situation where you want to have your own constant for π (pi). As a convention, you may decide on some name consisting of only capital letters, e.g.: PI. Then you can create your constant as follows:
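
A minimal sketch of such a constant is shown here. The number of digits in the value, and the $radius and $area variables, are our own assumptions for the sake of the example.

```perl
use v5.16;
use warnings;

use constant PI => 3.14159265358979;   # value typed in by hand

my $radius = 2;
my $area   = PI * $radius ** 2;        # ** binds tighter than *

say 'Area: ', $area;
say "ABC" . PI . "DEF";                # concatenation, since PI cannot be interpolated
```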

From this point on, you can use PI just like any other literal. It has no sigil, which means it will not be interpolated in strings, but string concatenation can be used: "ABC".PI."DEF".

Literals

Literals are values without names; sometimes called “magic numbers” (assuming they are numbers, otherwise they are strings or other magic values). Literals have a syntax notation, which determines what value will eventually end up in memory.

Numeric Literals

There is no distinction in memory between integer types and floating point types: in Perl, there are just “numbers”. Notation-wise, you may distinguish, and certain operators will disregard (truncate) decimals, but their type will remain the same.

Numeric Notation

Numbers are by default considered to be in base 10 (decimal) format. They may optionally be prefixed with a + (plus) or - (minus). If the number contains no decimal point, we may call it an “integer literal”. Underscores may be placed at strategic positions in any numeric literal. Here are some examples of legal, base 10, integer literals:

A base other than decimal can be used. For octal, you merely start the integer literal with a 0 digit. The following digits are then treated as base 8, meaning that the largest legal digit will be 7, so: 0678 is illegal (since 8 is not an octal digit). The following example contains only octal integer literals and will print very different numbers to the decimal literals above:

Hexadecimal integer literals are indicated with a leading: 0x, where the x may also be a capital letter. As before, you can use underscores at any point.

And finally, you can use the 0b prefix for binary (base 2) numbers. The b may be a capital letter B. In the following example, we use all the available notations to represent the decimal value 1234:
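
As a sketch, every literal in the list below denotes the decimal value 1234, written in decimal, octal, hexadecimal and binary, with and without underscores. The array name @same is arbitrary.

```perl
use v5.16;
use warnings;

my @same = (
    1234,               # decimal
    1_234,              # decimal, with an underscore
    02322,              # octal: leading 0
    0x4D2,              # hexadecimal: leading 0x
    0x4d2,              # hexadecimal digits may be lower case
    0b10011010010,      # binary: leading 0b
    0b100_1101_0010,    # binary, with underscores
);

say for @same;          # prints 1234 seven times
```

Regardless of the notation used in the source, the output is always in decimal.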

Floating point literals contain a decimal point, which may or may not be followed by a digit. This is fixed-point notation: 123.456. Alternatively, regardless of the presence of a decimal point, exponential notation can be used (1.23456×10²): 1.23456e2. The e can also be a capital E. The exponent can be explicitly positive: 1.23456e+2, or negative: 123.456e-2.
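
The notations above can be compared with a short sketch; the array name @floats is our own.

```perl
use v5.16;
use warnings;

my @floats = (
    123.456,       # fixed-point notation
    1.23456e2,     # exponential notation
    1.23456E+2,    # capital E, explicit positive exponent
    123.456e-2,    # negative exponent
    12e3,          # no decimal point needed with an exponent
);

say for @floats;   # 123.456, 123.456, 123.456, 1.23456, 12000
```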

Basic Numeric Operations

Regardless of numeric literal base notation, all numerical operations produce Perl numbers as represented internally. Some operators will disregard the fractional part (decimals).

Arithmetic. As one may expect, Perl provides the traditional arithmetic operators: + (addition), - (subtraction), * (multiplication) and / (division). There is no concept of integer division: the result is always a Perl number, which may have decimals, so: say 123/50; will print 2.46. You can use the int function to truncate the decimals: say int(123/50); will print 2.

Remainder. To get the remainder (modulus) of a division, you can use the modulus operator: %. It truncates any decimals before the operation. The code: say 123%50; will thus print 23.

To the Power. Numbers can be raised to a power with the ** operator, which has high precedence. So, the equation πr² can be expressed in Perl as: PI * $radius ** 2, given the appropriate variable and constant.

Bitwise. If you have need for bitwise operations, these mirror those of C: & (bitwise AND), | (bitwise OR), ^ (bitwise XOR), ~ (bitwise complement or negation). They will all disregard any decimals present in operands. You also have the << (bitwise shift left), and >> (bitwise shift right) operators at your disposal.
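
The operators described in this section can be exercised with a sketch such as the following. The operands are arbitrary small numbers, and the complement operator ~ is left out because its result depends on the integer size of your Perl build.

```perl
use v5.16;
use warnings;

my @results = (
    123 / 50,         # 2.46  division keeps decimals
    int(123 / 50),    # 2     truncated
    123 % 50,         # 23    remainder
    2 ** 10,          # 1024  power
    2 * 3 ** 2,       # 18    ** is evaluated before *
    12 & 10,          # 8     1100 AND 1010 = 1000
    12 | 10,          # 14    1100 OR  1010 = 1110
    12 ^ 10,          # 6     1100 XOR 1010 = 0110
    1 << 4,           # 16    shift left
    256 >> 2,         # 64    shift right
);

say for @results;
```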

Numeric Output

Simply passing a numeric value to either the say function, or the print function, will output it in decimal, with up to about 15 significant digits. If the value has no decimal digits, it will not print any.

The only way to control numerical output appearance and base, is with the printf function (the sprintf function works the same, except it returns a formatted string). We will only show a couple of common patterns here:

The % (percentage) character is the start of a format specification. It represents a placeholder, which will be replaced by the corresponding argument following the initial format string. Any other text is output verbatim (unless string interpolation is used, which is a Perl operation, and nothing to do with printf).

Syntax: Print Format Specifier

  • %[‹width›][.‹prec›]‹spec›
  • ‹width› total output width; fill with spaces on left; if negative, fill with spaces on right.
  • ‹prec› precision: number of decimal digits, rounded; only valid on floating point.
  • ‹spec› format specifier; see examples and documentation.

Both the ‹width› and ‹prec›ision parts can be an asterisk (*), in which case, for each asterisk, an additional integer parameter must be passed, which will be used for the ‹width› or ‹prec›ision.

To actually output a percentage sign, use two consecutive signs: %%. It does not matter whether you use double quotes for the formatting string literal, or single quotes. The format string can be a variable.
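
Here is a sketch of some common numeric patterns. We use sprintf so that the results can be collected and shown in brackets, which makes the padding visible; the values are arbitrary.

```perl
use v5.16;
use warnings;

my @lines = (
    sprintf('%d',       1234.56),             # integer part only: 1234
    sprintf('%8.2f',    3.14159),             # width 8, 2 decimals, padded on the left
    sprintf('%-8.2f',   3.14159),             # negative width: padded on the right
    sprintf('%x %o %b', 1234, 1234, 1234),    # hexadecimal, octal, binary
    sprintf('%*.*f',    10, 3, 3.14159),      # width and precision passed as arguments
    sprintf('%d%%',     50),                  # a literal percentage sign
);

say "[$_]" for @lines;

# printf prints directly instead of returning the string.
printf "%.3f\n", 2 / 3;    # 0.667
```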

Rounding

A common beginner’s question involves rounding of floating point values, in particular the rounding to a set number of decimals. There is no round function. You can use math functions from the core POSIX module, in particular floor() and ceil(), to convert down or up to the closest integer respectively. However, here is a pattern that rounds a floating point number to any number of decimal digits:

Pattern: Rounding to N Decimal Digits

  • ‹var1› = 1.0 * sprintf("%.‹N›f", ‹var2›);
  • ‹var1› can be the same variable as ‹var2›.

As an example, consider this snippet, using the above pattern to round $num to 2 decimal digits:
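
A minimal version of such a snippet could look like this; the starting value 123.456 is our assumption.

```perl
use v5.16;
use warnings;

my $num = 123.456;
$num = 1.0 * sprintf("%.2f", $num);   # round to 2 decimal digits

say $num;
```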

The output will be: 123.46. The ‘1.0 * …’ part exists solely to convert the result back to a number, since the sprintf function returns a string. This part is completely optional.

String Literals

To represent a sequence of characters as fixed, constant values, many languages including Perl provide string literals, normally enclosed in either double quotes ("⋯"), or single quotes ('⋯'). In Perl, you can use either, but double-quoted string literals are special, in that they are parsed and processed at runtime, just like an expression containing operators.

Perl has no “character literal” syntax. To represent a single character, use a string containing only one character.

String Literal Notation

The simplest and most efficient form for a string literal is one enclosed in single quotes. The only special sequences inside a single-quoted string literal are ‘\'’, to represent a single quote inside the string, and ‘\\’, to represent a backslash. A single quote needs no escaping inside a double-quoted string, but a double-quoted string must in turn use \" to contain a double quote character.

Both single-quoted and double-quoted literal strings can enclose nothing. This is called an “empty string”, and when treated as logical, will be treated as “false”. The length of an empty string is therefore 0. The backslash inside a literal string is called an “escape character”, and is the start of an “escape sequence”.

Although it is really not recommended for use, you should probably know that string literals may span multiple lines. This will embed newlines in the string, but will mess with indentation, and is thus not considered good programming practice. The following example will output 3 separate lines because of the embedded newlines.
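
A sketch of such a literal follows. The text itself is an assumption on our part; any three lines would do.

```perl
use v5.16;
use warnings;

# A single-quoted literal spanning three source lines.
my $text = 'ABCDEF
GHIJKLMNOP
QRSTUVWXYZ';

say $text;    # three lines of output
```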

Any indentation you apply before the GH… and QR… lines, will become spaces in the string. This also works for double-quoted string literals.

Here Strings / Documents

In Perl, long strings that span several lines can also be created without ordinary quoted literals. They are called “here documents” (heredocs) or “here strings”. You can use a here document anywhere a string is expected: you can assign it to a variable, pass it to a function, or pass it to ‘say’ or ‘print’.

Syntax: Here Strings / Documents

  • <<‹IDENT›; notice trailing semicolon.
    ⋯⋯⋯⋯⋯   string, line(s) treated as double-quoted string literal.
    ‹IDENT›        termination. notice absence of semicolon.
  • ‹IDENT› typically, an all-capitals identifier.

By default ‘<<IDENT;’ is treated as ‘<<"IDENT";’, although you can explicitly write it like that. The quotes must not be repeated on the terminating ‹IDENT›. Using: ‘<<'IDENT';’, will result in the here document being treated as a single-quoted string literal, so interpolation and escape sequences will not work.

The trailing ‹IDENT› must be on a line by itself, without leading spaces, and no trailing semicolon.
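
The following sketch shows both flavours. The identifiers MSG and RAW, the variable names and the text are all hypothetical.

```perl
use v5.16;
use warnings;

my $name = 'World';

# Unquoted identifier: treated like a double-quoted string.
my $greeting = <<MSG;
Hello, $name!
    This line keeps its four leading spaces.
MSG

# Single-quoted identifier: the text is taken literally.
my $literal = <<'RAW';
Hello, $name!\n
RAW

print $greeting;
print $literal;
```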

Notice that the leading spaces are part of the resulting string. It is not much better than a string literal spanning many lines. You can also use here documents like this:

It is a little bit weird, we must admit. But that is Perl for you. Have fun; amuse your bosses and colleagues; they will appreciate your insight and mastery.

Escape Sequences

As we have seen before, the backslash inside string literals is treated as special: in this context, it is the escape character, starting an escape sequence. Apart from ‘\'’ and ‘\\’ in single-quoted literals, escape sequences have meaning only inside double-quoted string literals.

Following a single backslash, you have several syntax options:

The conversion sequences are only useful when combined with string interpolation below.

To represent a single backslash, you have to use two: \\.
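
A few of the more common escape sequences are sketched below. This is a selection of our own, not a complete list.

```perl
use v5.16;
use warnings;

my $tab    = "A\tB";          # tab character between A and B
my $hex    = "\x41";          # code point 0x41: 'A'
my $octal  = "\101";          # octal 101: also 'A'
my $dollar = "\$5";           # literal dollar sign, no interpolation
my $quoted = "say \"hi\"";    # embedded double quotes
my $slash  = "C:\\temp";      # one backslash in the result
my $upper  = "\Uabc\E def";   # conversion sequence: 'ABC def'
my $single = 'It\'s \n';      # single quotes: \n stays as two characters

say for $tab, $hex, $octal, $dollar, $quoted, $slash, $upper, $single;
```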

String Interpolation

Only double-quoted string literals support variable interpolation, and the special escape sequences. Variables can be interpolated (expanded in the string), by simply referencing them: ‘"⋯$var⋯"’. If there is potential for ambiguity, enclose the variable’s name in curly braces: ‘"⋯${var}⋯"’.
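
As a sketch of interpolation inside a here document, consider the listing below. It is one possible listing consistent with the output shown further down; the variable names $var and $message are assumptions.

```perl
use v5.16;
use warnings;

my $var = 'Some more words';

# The statement continues after <<MSG1; the body starts on the next line.
my $message = <<MSG1 . "\nMessage complete. ${var}.\n";
This string can contain embedded newlines, but otherwise
will act like a double-quoted string, so interpolation
will work as expected. \U$var\E.
MSG1

print $message;
```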

The name MSG1 in this example is just an arbitrary name you choose to indicate the start and end of the here document. Note that there cannot be a trailing ‘;’ after the terminating MSG1. If you inserted indentation before the lines, those spaces will be part of the strings on each line.

This string can contain embedded newlines, but otherwise
will act like a double-quoted string, so interpolation
will work as expected. SOME MORE WORDS.

Message complete. Some more words.

Basic String Operations

Strings are concatenated (joined) with the . (period character) operator. If either of its two operands is not a string, it will be converted to a string.

The x (yes, the ‘x’ character), is a string repetition operator. The left-hand operand must be a string, or convertible to a string. The right-hand operand must be an integer expression (decimals are discarded), and can be a literal, variable, or an expression containing other operators, but do take precedence into account.

String operations include the ‘index’ function (find first substring within another), and for finding the last match, the ‘rindex’ function. The most powerful is probably the ‘substr’ function, since it can also appear on the left of assignment, which means it can be used to insert, delete and replace parts of a string. The ‘sprintf’ function can format strings.

You can also represent a string with one character from a code point with the chr function, and get the code point (ordinal) of the first character in a string with the ord function.
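
The operators and functions mentioned in this section are sketched below, using arbitrary strings of our own.

```perl
use v5.16;
use warnings;

my $joined = 'abc' . 123;            # 'abc123': the number becomes a string
my $rule   = '-' x 10;               # '----------'
my $triple = 'ab' x (1 + 2);         # 'ababab': parentheses because of precedence

my $text    = 'hello world';
my $first_o = index($text, 'o');     # 4: position of the first 'o'
my $last_o  = rindex($text, 'o');    # 7: position of the last 'o'
my $word    = substr($text, 6, 5);   # 'world': 5 characters from offset 6

my $letter = chr(65);                # 'A'
my $code   = ord('ABC');             # 65: only the first character counts

say for $joined, $rule, $triple, $first_o, $last_o, $word, $letter, $code;
```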

String Output

The printf function also provides for the %s (string), and %c (character) format specifiers. The %s specifier will format its argument as a string, while the %c specifier requires an integer value; this value must be the code point of a character.
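
A small sketch with two ‘%s’ specifiers and one ‘%c’ specifier follows; the label text is our own.

```perl
use v5.16;
use warnings;

# sprintf returns what printf would have printed.
my $line = sprintf "%s %c %s", 'Letter:', 65, 0x7B;
say $line;                                # Letter: A 123

printf "%-10s|%5s|\n", 'left', 'right';   # width works for strings too
```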

The ‘%c’ was replaced with ‘A’, because the code point for the letter ‘A’ is 65. The second ‘%s’ caused the 0x7B (123 in decimal) to be converted to a string, and then printed. You can verify that the code point for ‘A’ is 65 with the ord function (returning a number), or manually perform the task of ‘%c’ with the chr function (which returns a string):

UTF-8

While Perl is fully Unicode compatible, you may sometimes get a “Wide character…” message during output. That is because the ‘STDOUT’ and ‘STDERR’ file handles are not initially set to UTF-8 encoding. You can stop the warnings by adding ‘no warnings 'utf8';’ at the top of your scripts (not recommended), or you can set the file handle encoding:
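
A minimal sketch of setting the encoding with binmode is shown here. We assume the ‘:utf8’ layer; the test character is written as an escape sequence so that the encoding of the source file does not matter.

```perl
use v5.16;
use warnings;

# Tell Perl to encode everything written to these handles as UTF-8.
binmode STDOUT, ':utf8';
binmode STDERR, ':utf8';

say "Pi is written as: \x{3C0}";   # no "Wide character" warning
```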

NOTE: Text Encoding

You can also use ‘":encoding(UTF-8)"’, in which case Perl will perform UTF-8 validity checking on the input and output strings. This is generally preferable, but you can use ‘":utf8"’ if you feel that your text is properly encoded, and you do not want the overhead.

You can check out the Unicode Introduction at perldoc, and/or the Unicode FAQ.

Generally speaking, you should not use Unicode for variable and function names in Perl, just to avoid potential problems. But your source file should be UTF-8 encoded, as good convention. To tell Perl that you are using a UTF-8 source file, add: ‘use utf8;’ at the top of your scripts.

UTF-8 is useful and ubiquitous. By all means, embrace it and use it. All modern POSIX terminals support it as default. The Windows Command Prompt Console supports it to some degree with the: “change codepage command” (chcp 65001). XML and HTML use UTF-8. Decent editors support UTF-8. The list goes on.

Here is a code snippet from Getting Started with Perl that works portably across most, if not all operating systems that Perl supports and where UTF-8 terminals are used:

With the above code you are not reliant on the user setting the Console code page to 65001; the script does that itself. As bonus, it ensures that VT100/ANSI terminal escape sequences are processed by the Console.

Variables

Literals by themselves are of little use; at some point we would like to store values in variables. Traditionally, you only had to assign a value to a non-existent variable, and it would be automatically defined. This can cause some difficult bugs, so this is deprecated, even if it is the default behaviour for legacy scripts.

Because of our recommendation to use strict; or implicitly: use v5.16; as minimum, you will have to define your variables before you can use them. You can initialise them at the same time as you define them.

Scalar variables can store single values of any kind: number, string, reference, etc. They can be modified to store any other kind of value at any time. You can start by storing a number in a variable, and later store a string in the same variable; Perl will not mind.

Variables can be interpolated in double-quoted string literals, and here documents / strings, as we mentioned before.

Variable Definitions

Although Perl has a number of ways to define variables, the most common and useful are: my and our. The latter is only useful when you start writing modules, so we will ignore it for now; which leaves the my function (as Perl documentation calls it). References to local should be avoided in the short term, and should not be used even when you understand it — fair warning, local does not work as you would expect, or even as the name suggests; especially if you have experience with other programming languages.

To define a single scalar variable is simple: ‘my $‹ident›;’. It can optionally be initialised at the same time: ‘my $‹ident› = ‹expr›;’. The ‹ident›ifier must start with an underscore or an alphabetic character, not a digit; the characters that follow may include digits.

Syntax: Scalar Variable Definitions

  • my $‹ident›; define scalar variable ‹ident› containing undef.
  • my ($‹ident›); same as above, thus parentheses are optional.
  • my $‹ident› = ‹expr›; define & initialise scalar variable ‹ident›.
  • my ($‹ident›) = (‹expr›); same as above; parentheses thus optional.
  • my ($‹ident1›, $‹ident2›, ⋯); define arbitrary number of variables, all containing undef.
  • my ($‹ident1›, $‹ident2›, ⋯) = (‹expr1›, ‹expr2›, ⋯); define & initialise arbitrary number of variables; too few initialisers will cause undef to be placed in remaining variables.

Parentheses are not optional when defining several variables at the same time.
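
The forms listed above can be sketched as follows, with variable names of our own choosing.

```perl
use v5.16;
use warnings;

my $count;                        # contains undef
my $total = 2 + 3;                # defined and initialised
my ($width, $height) = (4, 3);    # two variables, two initialisers
my ($x, $y, $z) = (1, 2);         # too few initialisers: $z is undef

say $total;                                   # 5
say $width * $height;                         # 12
say defined $z ? 'defined' : 'undefined';     # undefined
```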

Scope

The term scope refers to an area inside a program where a user-defined identifier (like a variable name) is visible. If a variable is visible, we say it is “in scope”; if not visible, we say it is “out of scope”. Scopes can nest, and are introduced by curly-brace delimited blocks.

Every Perl script has a scope that is global to that script. Any block will create a nested scope. Inside a block you can have further blocks. For humans to keep track of the nesting, it is crucial that consistent indentation is used.

Identifiers defined on a higher level are visible in all nested blocks. A name can be redefined in a nested block, in which case the new variable hides, or shadows, the higher-level variable with the same name. Until the new name goes out of scope at the end of the nested block, the original value cannot be accessed by that name.
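
A short sketch of nesting and shadowing follows; the variable names are hypothetical.

```perl
use v5.16;
use warnings;

my $level = 'outer';
my @seen;

{
    my $level = 'inner';      # shadows the outer $level
    push @seen, $level;
    {
        push @seen, $level;   # nested deeper: still sees 'inner'
    }
}
push @seen, $level;           # the block has ended: 'outer' is visible again

say "@seen";                  # inner inner outer
```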

Variable Assignment

Apart from the my prefix, the same syntax as above can be used to assign the results of expressions to existing variables, even multiple variables at the same time. In multiple assignments, the left-hand variable list can contain undef, in which case the corresponding right-hand expression will be ignored.
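
As a sketch, with hypothetical variable names and values:

```perl
use v5.16;
use warnings;

my ($first, $second, $third);

($first, $second, $third) = (1, 2, 3);     # assign three existing variables
say "$first $second $third";               # 1 2 3

($first, undef, $third) = (10, 20, 30);    # 20 is ignored; $second is untouched
say "$first $second $third";               # 10 2 30

my @values = (7, 8, 9);
($first, $second) = @values;               # the surplus 9 is discarded
say "$first $second $third";               # 7 8 30
```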

Although not shown in the example code above, it is not illegal to have too many expressions in the right-hand list — the unused ones will simply be discarded. Notice two further points:

Although this is more a quality of operators, and in particular the assignment operator, multiple variables can be given the same value with a simple pattern. Assuming the variables above still exist, we can set them all to 0 as follows:

This works because the assignment operator associates with its operands from right to left, so it is executed as if you parenthesised the expression as follows:

This is a common idiom, and well-defined behaviour, so you are welcome to employ it liberally (without the superfluous parentheses, of course).
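
The chained assignment, with and without the explanatory parentheses, can be sketched like this; the three variables are defined here only to keep the example self-contained.

```perl
use v5.16;
use warnings;

my ($first, $second, $third) = (1, 2, 3);

$first = $second = $third = 0;        # evaluated from right to left
say "$first $second $third";          # 0 0 0

$first = ($second = ($third = 5));    # the same grouping, spelled out
say "$first $second $third";          # 5 5 5
```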

String Modifications

Parts of a string variable can be extracted with the substr function, which can also modify parts when appearing on the left of assignment.
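
The listing below is a sketch that would produce the five lines of output shown next. The variable name $str, and the choice between the assignment form and the four-argument form of substr, are our assumptions.

```perl
use v5.16;
use warnings;

my $str = 'ABC-DEF-GHI-JKL-MNO-PQR-STU-VWX-YZ';
say $str;

substr($str, 12, 4) = '';            # delete 4 characters at offset 12
say $str;

substr($str, 12, 0) = '#######-';    # insert at offset 12 (length 0)
say $str;

substr($str, 12, 8, '');             # four-argument form: replace 8 characters
say $str;

$str = substr($str, 4);              # extract everything from offset 4 onwards
say $str;
```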

ABC-DEF-GHI-JKL-MNO-PQR-STU-VWX-YZ
ABC-DEF-GHI-MNO-PQR-STU-VWX-YZ
ABC-DEF-GHI-#######-MNO-PQR-STU-VWX-YZ
ABC-DEF-GHI-MNO-PQR-STU-VWX-YZ
DEF-GHI-MNO-PQR-STU-VWX-YZ

Unlike some other languages, you cannot subscript a string to obtain single characters. You will have to use substr.

Variable References

A reference is like an address, in that it indicates the location of a value. This means that you may have two or more references to the same location. Accessing the value at that location indirectly via the reference will be an alias for the value stored there.

To take the reference of a variable, the backslash (\) is used as prefix operator.

Given the above variable definitions, we can access the values the references represent, by using an extra $ sigil (because the values are all scalars):
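
A minimal sketch of taking and following references is given here. The names $var, $ref and $rrf follow the discussion in this section; the value 123 is an arbitrary choice.

```perl
use v5.16;
use warnings;

my $var = 123;
my $ref = \$var;     # reference to $var
my $rrf = \$ref;     # reference to the reference

say $$ref;           # 123: one extra $ follows $ref to $var
say $$$rrf;          # 123: two extra $ follow $rrf to $ref to $var
say ref($ref);       # SCALAR
say ref($rrf);       # REF
```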

As we can see from the last line, a REF (reference) is a special type in Perl, but still a “kind of scalar” value.

Without further Perl knowledge, like arrays, hashes, subroutines and object-oriented programming, references are of little use. Do take note of the fact that assigning to an indirection changes the original value. Again, assuming the above variables are still in effect:
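
As a sketch, with the variables defined again so that the example stands on its own (the values are arbitrary):

```perl
use v5.16;
use warnings;

my $var = 123;
my $ref = \$var;
my $rrf = \$ref;

$$ref = 456;         # assigns through the reference
say $var;            # 456

$$$rrf = 789;        # assigns through both levels of reference
say $var;            # 789
```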

So, $$ref represents the exact same location and value as $var, and is in all respects, an alias for $var. Just like $$rrf is an alias for $ref, and thus $$$rrf is an alias for $var as well.

Variable Summary

Variables can be initialised when they are defined, and assigned new values at any later time. The contents of variables can be expanded in double-quoted string literals, using interpolation. The location where a variable is defined determines its scope.

You can assign undef to a variable, or use ‘undef $‹ident›’ to “clear” the current value stored in a variable.

Parts of a string variable can be modified with the powerful substr function (which can also extract parts).

The readline function, or the <STDIN> operator (equivalent to readline(STDIN)), can be used to read strings from standard input, which can then be saved in a variable. When a read is the only condition of a while loop, as in while (<STDIN>), and no assignment is made, the line is stored in the default variable: $_. Passing no arguments to the print or say functions will print the contents of $_. Lines read from input are often trimmed with the chomp function.
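
Reading lines can be sketched as follows. So that the example runs unattended, it reads from an in-memory file handle ($in, our own stand-in) rather than from STDIN; the calls work the same way on STDIN.

```perl
use v5.16;
use warnings;

# In-memory stand-in for standard input.
my $input = "first line\nsecond line\nthird line\n";
open my $in, '<', \$input or die "Cannot open in-memory file: $!";

my $line = readline $in;    # same as <$in>; the newline is included
chomp $line;                # remove the trailing newline
say "Got: $line";

my @rest;
while (<$in>) {             # sole condition of while: the line goes to $_
    chomp;                  # no argument: works on $_
    push @rest, $_;
}
close $in;

say "Rest: @rest";
```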


2017-12-22: Edited. [jjc]
2017-12-19: Created. [brx]