Arrays and Collections

Creating and Using Aggregate Data Structures

Non-trivial programs often need to work with collections of values. Java offers relatively simple arrays, and a veritable multitude of collection classes. Files can be written from these collections, and text files read into collections.

Statements and Functions

Arrays and Collections

PREREQUISITES — You should already…

Arrays

java-logo

Arrays affords a simple shorthand syntax to create a mutable collection of data, all having the same type. The items in an array, are traditionally called elements, though the Java Specification on arrays prefers the term components). Al­though arrays are reference types, you can avoid calling the new operator under some circumstance supported by shorthand syntax.

Elements in an array can be accessed in constant time. This means you can access any element in the same time, regardless of the size of the array. Arrays, like any class, have constructors, methods and a few public fields. Although the elements are mutable, the size of an array cannot change after created. The java.util.Array class, is part of the Java Collections Framework, and provides are large number of static functions that perform various operations on arrays.

Technically arrays are not a class type, but acts as if, for all intents and purposes, it inherits from Object. Arrays have the same element type, shares the same internal Class object.

Defining Array Variables

Like any variable definition in Java, the type of the array must appear before the array variable identifier. In addition to a ‹type›, an empty set of square brackes ([]) are required to indicate that this is any array of ‹type›; meaning each element is of a single ‹type›.

SyntaxArray Variable Definition

  • type[]ident; define an array variable.
  • type› ‹ident[]; alternative syntax to define an array variable.
  • type[]ident= {init-list}; define, allocate and initialise array variable.
  • type[]ident= newtype[size]; define variable; allocate space.
  • init-list initialiser list; series of values, separated by comma.
  • size integral number of elements to create space for.

When space for an array is created with ‘newtype[size]’, the elements are initialised to default values for the type. For the primitive types, this means the space is filled with zero, and for class types (reference type), the values will be null.

Array variable definition observations
  • Variables ia1 and ia2 are uninitialised and do not reference any elements.
  • Variable ia3 uses a shorthand syntax to call new and initialise elements.
  • For ia4, Java provides no syntax to also initialise the elements. They must be individually assigned as necessary.
  • An initialiser list may optionally have a comma after the last expression (see ia3).

At any point, you can allocate a new array object with ‘newtype[size]’, where ‹size› can be any expression producing an int value. Normally, you would assign the result to an array variable of the right type (‹type[]). Each element will be zero’d by new, or made null for reference types.

Arrays can be defined final inside a function, which means it must be initialised. This does not mean the elements are final; it just means that you cannot let the array variable reference an­other array object. At class level, an array member can also be final, and even static, or a com­bi­na­tion of both.

Arrays Are Reference Types

Logically, we visualise an array as a contiguous collection of values in memory. The number of values cannot change, but the elements can change. Once an element is selected from a ‹type[] array, the resulting type is ‹type›. You can do anything with this result, that you could do with a simple ‹type› variable.

ia Array Memory Representation
ia Array Memory Representation

It does not matter which syntax was chosen to define and populate the elements of ia above, the memory representation and layout will remain the same. We really must understand this well: the variable ia, which you may abstractly think of as your array, is not in reality — it only stores a reference to the real array in memory. Consider the following:

ia & ib Reference Same Memory
ia & ib Reference Same Memory

From above, you can see that ib[1] and ia[1], for example, reference the exact same value in memory; that is because both ia and ib contain the same value, and thus reference the same memory.

Subscripting/Indexing

As we have seen by example, array variable can use a subscript operator to select (or reference) a single element (or component) in the array object. The integer operand between the square brackets, must be positive, and is treated as an offset, although many call it an index, or subscript.

Offset 0 references the first element, when used with the subscript operator. It stands to reason then, that the last element is at offset ‘‹arr.length-1’, where ‹arr› is any array expression, like an array variable. We can use this fact in a for loop, to iterate through the elements of an arbitrary-sized array:

The code above increases the values of each element by 50%, which represents the rate of in­fla­tion in some locales.

If you were to use a negative offset, or an offset that is bigger or equal to an array’s length, Java will throw and IndexOutOfBoundsException. This does add some runtime overhead, but is a fair price, considering the disasters that will result otherwise — just ask the C/C++ programmers.

For-Each Loop and Arrays

We can also use the “For-Each Loop” to iterate through the elements. However, this kind of loop cannot be used to modify the elements, since the for-each loop makes successive copies of each element. It is also not useful, if you need to keep track of the offset of each element. However, it is the most generic pattern to use, since the for-each loop not only work with arrays, but with all Java collections, whereas the subscript operator is limited to arrays.

For arrays then, a for-each loop is not that useful, but slightly more succinct, especially if you do not have a use for the offset. Even with an array of reference types, this statement still applies.

Pass/Return Arrays to/from Functions

We can pass arrays to functions. The code extract above can be placed in a function, so that the function can be used to arbitrarily modify each element, of any array, by some percentage, which we can pass as an extra argument:

ArrayFunction.javaPassing Array to a Function
da[0] = 1.80           │    db[0] = 0.60
da[1] = 3.45           │    db[1] = 1.15
da[2] = 5.10           │    db[2] = 1.70
da[3] = 6.75           │    db[3] = 2.25
                       │    db[4] = 2.80
                       │    db[5] = 3.35

It should be clear that a copy of a reference to an array is passed to the applyPercentage function, and be­cause the elements are mutable, we can modify the elements referenced via the parameter (arr), which contains the copy of the reference.

WARNINGArray Elements Modifiable in Functions

Because passing and array, means passing a reference, a function can modify the elements of the array. This can be useful, but is a bug when elements are unintentionally modified. If a function is designed not to modify elements, care should be taken in its implementation. Similarly, if a function is designed to modify an array, the caller should manually make a copy of the original elements if needed.

Returning Arrays

Just as we can pass arrays, we can return arrays, but should remember, that we will be returning a reference. This is not as big an issue as it may sound, if you are not yet that comfortable with references.

ReturnArray.javaFunction Returning an Array

On a practical note, depending on what you want to do with an array of char values, you do not necessarily have to convert it to a String as we have done above. Both out.println statements above, will output the same value, since println has an overloaded version that can take a char array as argument.

Select Array Operations

Although you write any algorithm possible using subscripting and iteration, the java.util.Arrays class may have what you need. There may appear to be many static functions in this class, but considering that almost all have overloaded versions for different ‹type› of arrays, it is not really overwhelming.

An easy function, which is useful for learning and debugging, is toString. It will return a formatted String containing all the array elements, enclosed in square brackets.

ia = [11, 22, 33, 44, 55, 66]
[I@42a57993

As you can see, the toString that arrays inherit from Object, does not produce anything useful, unlike …Arrays.toString(). If the formatting suits you, or for debugging, you may find this use­ful. We used an int[] (int array), but instead of int, you could use any of the primitive types, or an array of any ‹type› really, even Object[]:

oa = [ABC, 22, D, 44.5, 55]
sa = [GHI, JKL, MNO]

Instead of writing a loop to copy elements from one array to another, we can simply use one of the copyOf functions.

da = [1.2, 2.3, 3.4, 4.5, 5.6, 6.7]
db = [1.2, 2.3, 3.4, 4.5, 5.6, 6.7]
dc = [1.2, 2.3, 3.4, 4.5, 5.6, 6.7]
dd = [4.5, 5.6, 6.7]
da == db: true

Note that for the copyOfRange versions, the upper bound is exclusive; the last offset used, will be one less than the upper bound. Notice that we also used another useful function: equals, that can compare two arrays. You can also use fill to set all the elements of an existing array to a par­ti­cu­lar value.

Two other potentially useful functions, are sort to sort arrays in-place, and binarySearch to ef­fi­ci­ent­ly find a key in a sorted array.

NOTEImporting java.util.Arrays

If you work with java.util.Arrays a lot, you may want to add import java.util.Arrays; at the top of your source file; then you can use: Arrays.toString(…), for example.

Or, less commendable, add: import java.util.* to use all java.util classes without the java.util prefix. You can go further, and add: import static java.util.Arrays.*; (but still will have to add import java.util.Array; at least before), and then you can use most of the functions without prefix:

You cannot use toString or equals without qualification, as shown.

Collections

There are so many collections, the Java documentation calls it the Collection Framework. The rea­son we need more than arrays, is a matter of performance related to the kind of operations your algorithm require to be efficient. For example, you can insert an element in the beginning of an array (losing the last one), but the larger the array, the slower the process will be; an array is therefore not the best solution if your algorithm depends fast insertions in a list of values.

ArrayList

An java.util.ArrayList<E> is a generic type (which effectively can work with any elements of type E), is very efficient with arbitrary deletions and insertions of items in a collection. Furthermore, un­like an array, which is of a fixed size, the number of items in an ArrayList can in­crease or de­crease ar­bi­tra­ri­ly.

One major caveat: generic types, including ArrayList does not work well with primitive types: you have to use primitive wrappers, or if speed is important, look for a third party library, like trove4j.

To define an ArrayList<E> variable, you must replace E with a (non-primitive) ‹type›, for ex­am­ple: Integer, String or Double, etc. To add elements, you call the .add(item) instance method. You cannot use subscript.

To continue from above, to change an existing element, the .set(offset,‹item›) method can overwrite and existing item with the new ‹item› at the specified ‹offset›.

As with arrays, if the ‹offset› is out of bounds, an exception will be thrown. The current number of items in the collection, can be determined with the .size() method — which can be 0 for an ob­ject containing no items. You can also use the .isEmpty() method to determine if a list is empty.

As promised, the for-each loop works well with any collection, include ArrayList expressions:

Although slightly advanced, we do show two examples using the .forEach(…) method as an al­ter­na­tive to physical for(…) statements:

You can easily .remove() items from a collection by offset, or by value. You can find the offset of an item with .indeOf() — it finds the first, while lastIndexOf() will search from the back.

From java.util.Collection, ArrayList inherits the .stream() method, which is the closest equi­va­lent to LINQ (Language INtegrated Query) in C# (fluent syntax). The java.util.stream in­ter­face is implemented by ArrayList, and allows you to chain method calls together, since many methods return a stream.

ArrayListStream.javaChaining Stream Methods on ArrayList
Original ArrayList : [66, 22, 77, 44, 11, 33, 88, 55]
Sorted even numbers: [22, 44, 66, 88]

We could have called .sorted() before .mapToInt(), and would still have achieved the desired result. Afficionados were very exited when streams appeared in Java 8, and you can see why.

Although an ArrayList can grow and shrink (via trimToSize()) arbitrarily, and reasonably efficiently, it is on par with normal arrays as far as constant time item access is concercerned. It is a good all-rounder, since it has a good selection of list and array characteristics. An ArrayList’s main weakness, is inserting items at the start, which is still slow, relatively speaking.

LinkedList

When arbitrary insertions at the beginning or at the end of a collection are common tasks for a certain algorithm, a java.util.LinkList<E> might be the best option. Like ArrayList, it also inherits from java.util.Collection, implements similar inferfaces, and as a result have very similar operations.

Accessing items in the List<E> by offset, gets slower the larger the number of items, since it has to count from the start of the list. In general then, you can easily replace an ArrayList with a List and vice versa, without affecting the rest of your code. So in ArrayListStream.java above, you only have to replace the variable type ArrayList, with LinkedList, while the rest of the code remains the same:

Just for interest’s sake, here is an alternative solution to initialising a collection, making use of the fact that most collections have a constructor that can accept a list of some kind:

It does beat having a series of .add(…) method calls.

Sets

The main characteristic of any set, is that it can contain no duplicates, and that is exactly what many programmers use it for: to eliminate duplicates, for adding an item that already exists, has no effect. There are different implementations of sets, but like ArrayList and LinkedList, they all inherit from java.util.Collection, and implement the java.util.Set<E> interface.

HashSet vs SortedSet

These are two implementations of java.util.Set<E>, each having different char­ac­te­ris­tics. The im­ple­men­ta­tion of java.util.HashSet<E>, uses hash table, that over­all makes it the best per­for­mer. The advantage of java.util.SortedSet<E>, is that the ele­ments are in a natural or­der­ing; this does require more work, but the ad­van­tage is that your items in the set are always sorted, re­gard­less of the order in which you added the items.

Intersection and Union

Now, if your were looking for set intersection, union, difference, etc. methods, you will be dis­ap­point­ed. At least we can si­mu­late a union with .addAll() and an in­ter­sec­tion with .retainAll() from Collection (which all col­lec­tion clas­ses in­her­it).

If you are to use those statements a lot, you could of course wrap it in a function. This answer on Stack Overflow has examples. Here is a number of so­lu­tions to ob­tain a union and an in­ter­sec­tion be­tween two sets:

SetUnionIntersect.javaSet Union and Intersection Examples
import static java.lang.System.*;

public class SetUnionIntersect {

public static void main (String[] args) {

   java.util.LinkedList<String> tennis
      = new java.util.LinkedList<String>(java.util.Arrays.asList(
         "petra", "garry", "john", "peter", "vera"));
   java.util.ArrayList<String> golf
      = new java.util.ArrayList<String>(java.util.Arrays.asList(
         "sandra", "louis", "john", "garry"));

   out.println("Who plays tennis: " + tennis);
   out.println("Who plays golf  : " + golf);

   /* 1a) find persons playing both sports (intersection)
   */ {
      java.util.List<String> both_sports =
         new java.util.ArrayList<String>();
      for (String s : tennis)
         if (golf.contains(s))
            both_sports.add(s);
      out.println("1a) Both sports : " + both_sports);
      }

   /* 1b) find persons playing *any* sport (union)
   */ {
      java.util.Set<String> play_sport 
         = new java.util.HashSet<String>();
      play_sport.addAll(tennis);
      play_sport.addAll(golf);
      out.println("1b) Play a sport: " + play_sport);
      }

   /* 2a) find persons playing both sports (intersection)
   */ {
      java.util.Collection<String> both_sports =
         new java.util.TreeSet<String>(tennis);
      both_sports.retainAll(golf);
      out.println("2a) Both sports : " + both_sports);
      }

   /* 3a) find persons playing both sports (intersection)
   */ {
      java.util.List<String> both_sports = tennis.stream()
         .filter(golf::contains)
         .collect(java.util.stream.Collectors.toList());
      out.println("3a) Both sports : " + both_sports);
      }

   /* 3b) find persons playing *any* sport (union)
   */ {
      java.util.List<String> play_sport 
         = java.util.stream.Stream.concat(tennis.stream(), golf.stream())
            .distinct()
            .collect(java.util.stream.Collectors.toList());
      out.println("3b) Play a sport: " + play_sport);
      }

   exit(0);
   }

}//class

We have fully qualified all the names, but you are of course welcome to add import statements to reduce the vebosity considerably. At least this way, you do not have to guess where each method or class resides.

The one advantage is that the sets do not have to be the same type. We made one an ArrayList and the other a LinkedList, yet it caused no problems. By the same token, the results were different types of collections, just to show how generic the classes can be.

Maps or Dictionaries

Java prefer the term “map”, but you may know the concept as “dictionary”, or a collection of “key-value pairs”. Java’s java.util.Map<K,V> interface is implemented by a number of classes, which means they work similarly, but have different performance characteristics for the same operations.

The java.util.HashMap<K,V> class uses a hash table, which is overall quite efficient, but the order in which the keys are retrieved is not specified. The java.util.TreeMap<K,V> uses a red-black tree, which keeps the keys ordered, with slight overhead. The K represents the type of the key, and V the type of the value. For example, if we wanted to associated keys of type Integer to values of type String, we could define a variable ishm (integer-string hash map) as follows:

Instead of and .add() method as with sets, all maps provide the .put(K,V) method to either mo­di­fy the value for an existing key, or add a new entry in the map. We could add a num­ber of en­tries (“items” if you like), into the ishm map we defined above, and print out all the values with a for-each loop, using a set of keys obtained from .entrySet():

1040: Value for 1040
1010: Value for 1010
1030: Value for 1030
1000: Value for 1000
1020: New value for 1020

As you can see, the second ishm.put(1020,…) modifies the original value for key 1020.

Instead of: …Map.Entry<?,?>…, you may want to use: …Map.Entry<Integer,String>…, but for the purpose of our code above, the simpler version is fine. The java.util.Map.Entry interface re­pre­sents a single key-value pair (an “entry” or “item”).

You can obtain the value for a given key with the .get(K) method. This method re­turns null if ei­ther the val­ue for the key is null, or the key does not exist. To check if a key does not exist, you can use the .containsKey(Object) to check if a key exists.

To delete an entry, you can use .remove(Object), or remove all entries with .clear(). Checking if a map is empty, requires the .isEmpty() method, while .size() returns the number of entries in the map. Below, we remove the entry with the key 1020, and check if it still present:

A list of keys can be obtained with the .keySet() method. For a HashMap, the keys are retrieved in arbitrary order, and for a [TreeHash][od-util-hashtree], the keys are in sorted order. The type of value returned by the .keySet() method is java.util.Set<K>. We can use this to iterate through the map, using the set of keys:

Map in arbitrary order:
1040: Value for 1040
1010: Value for 1010
1030: Value for 1030
1000: Value for 1000

Often, we want the keys in sorted order. There are several ways to accomplish this, but there is no method that returns ordered keys; we have to either use a TreeMap from the start, or assign the set to some other type which can be sorted with, for example, java.util.Collections.sort(). We show three possibilities below:

This about covers the major tasks we tend to perform on maps/dictionaries. Java provides nu­me­rous other collections, and if high-performance is important, you can try the collection utilities set from the highly-acclaimed Guava Core Libraries.

File Input/Ouput

Before long you may be required to read and/or write files. This requires iteration statements, special classes and a handful of concepts. Understand that reading an writing files are not related to “file handling”, which concerns delete, renaming, copying and moving files.

Prior to Java 8, file IO used to be somewhat cumbersome, but now much improved and more concise. However, you should unfortunately still have some grasp of the older NIO and NIO2 classes, since a) they are not deprecated, and b) you will still find it in older programs which you may have to maintain.

Portability is a concern when dealing with files, because different operating systems have in­com­pa­ti­ble file naming rules, different path separators, different permissions, etc. This you have to deal with to some extent in your programs; Java does not entirely shield you from it.

The java.io package represents the original file I/O classes, but has mostly (though not entirely) su­per­se­ded by the java.nio package since Java 1.4. Subsequent versions of the JDK added incremental enhancements to java.nio. A full treatment would involve combining these I/O classes with the Java 8 java.util.stream package, and lambdas (although older code may use anonymous classes). For beginners, this is quite overwhelming, so this section will focus on practical and typical use, not all the fancy detail.

Dealing With Paths

The java.nio.file package houses, amongst others, the Files class, which provides a number of static methods to manipulate files and di­rec­to­ries. Many of these methods return or require as ar­gu­ments, a Path interface, im­ple­men­ted by the Paths class, all in the same package.

A Path is an abstraction of file system naming. It may be absolute, which means it has a root, or re­la­tive, which represents a relation to another directory, commonly the current working directory. You can see several examples in the Path Operations section of the official Java Tutorials.

Current Working Directory

When your program starts, either is was opened from the command line, or via a shortcut. Either way, the directory in the shortcut, or the current working directory in your shell, becomes your programs current directory. This is not necessarily the same directory where your program code resides. The following seems to reliably do the trick:

But you can also just query a system property with java.utils.Properties.getProperty(). The re­le­vant property is "user.dir", but you should also be aware of

Other mechanisms are available, but one of the above two should suffice for most purposes.

Application Directory

The get the application’s directory, we need the location of the class that contains the main function. This ‘.class’ file could reside in a ‘.jar’ file, and we generally do not care about the .jar file name, for this purpose.

First, we have to obtain an object of a java.lang.Class<T> type, which we can get from the getClass method, inherited by all types from java.lang.Object, if we had an instance of our main class. Or we can hard code the class name, which is generally not acceptable. The alternative is a bit more convoluted, and involves even more reflection. Either way, now it is possible for exceptions to be thrown, which means we must deal with it.

You must replace the ‹main-class-name› class literal part below, with your main class name (the one containing the main function):

To get a Class<T> object, without hard coding a name, we can use a “trick”: get the main function by inspecting the current stack trace, an select the last item in the list (practically, that means the first stack trace element).

This should ultimately be wrapped in some function, probably in some utility class, since one would normally only call it once during application initialisation. We do that in the following complete listing, and also add a function to return the name of the executable (.class or .jar file).

AppDir.javaFind Application Directory & Name

File Operations

Statements and Functions

Arrays and Collections

2018-01-02: Created. [brx]