Use Java 8's forEach() method and Streams API to fine-tune and parallelize the behavior of Java iterators Credit: Felix Brönnimann Anytime you have a collection of things you will need some mechanism to systematically step though the items in that collection. As an everyday example, consider the television remote control, which lets us iterate through various television channels. Similarly, in the programming world, we need a mechanism to systematically iterate through a collection of software objects. Java includes various mechanisms for iteration, including index (for iterating over an array), cursor (for iterating over the results of a database query), enumeration (in early versions of Java), and iterator (in more recent versions of Java). The Iterator pattern An iterator is a mechanism that permits all elements of a collection to be accessed sequentially, with some operation being performed on each element. Essentually, an iterator provides a means of “looping” over an encapsulated collection of objects. Examples of using iterators include Visit each file in a directory (aka folder) and display its name. Visit each node in a graph and determine whether it is reachable from a given node. Visit each customer in a queue (for instance, simulating a line in a bank) and find out how long he or she has been waiting. Visit each node in a compiler’s abstract syntax tree (which is produced by the parser) and perform semantic checking or code generation. (You could also use the Visitor pattern in this context.) Certain principles hold for the use of iterators: In general, you should be able to have multiple traversals in progress at the same time; that is, an iterator should allow for the concept of nested looping. An iterator should also be nondestructive in the sense that the act of iteration should not, by itself, change the collection. Of course the operation being performed on the elements in a collection could possibly change some of the elements. It might also be possible for an iterator to support removing an element from a collection or inserting a new element at a particular point in the collection, but such changes should be explicit within the program and not a byproduct of the iteration. In some cases, you will also need to have iterators with different traversal methods; for instance, preorder and postorder traversal of a tree, or depth-first and breadth-first traversal of a graph. Iterators and the Gang of Four design patterns According to the Gang of Four (see below), the Iterator design pattern is a behavioral pattern, whose key idea is “to take the responsibility for access and traversal out of the list [ed. think collection] object and put it into an iterator object.” This article isn’t as much about the Iterator pattern as it is about how iterators are used in practice. To fully cover the pattern would require discussing how an iterator would be designed, participants (objects and classes) in the design, possible alternative designs, and tradeoffs of different design alternatives. I’d rather focus on how iterators are used in practice, but I’ll point you to a few resources for investigating the Iterator pattern and design patterns generally: Design Patterns: Elements of Reusable Object-Oriented Software (Addison-Wesley Professional, 1994) written by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (also known as the Gang of Four or simply GoF) is the definitive resource for learning about design patterns. Although the book was first published in 1994, it remains a classic, as evidenced by the fact that there have been more than 40 printings. Bob Tarr, a lecturer at University of Maryland Baltimore County, has an excellent set of slides for his course on design patterns, including his introduction to the Iterator pattern. David Geary’s JavaWorld series Java Design Patterns introduces many of the Gang of Four design patterns, including the Singleton, Observer, and Composite patterns. Also on JavaWorld, Jeff Friesen’s more recent three-part overview of design patterns includes a guide to the GoF patterns. Active iterators vs passive iterators There are two general approaches to implementing an iterator depending on who controls the iteration. For an active iterator (also known as explicit iterator or external iterator), the client controls the iteration in the sense that the client creates the iterator, tells it when to advance to the next element, tests to see if every element has been visited, and so on. This approach is common in languages like C++, and it is the approach that receives the most attention in the GoF book. Although iterators in Java have taken different forms, using an active iterator was essentially the only viable option prior to Java 8. For a passive iterator (also known as an implicit iterator, internal iterator, or callback iterator), the iterator itself controls the iteration. The client essentially says to the iterator, “perform this operation on the elements in the collection.” This approach is common in languages like LISP that provide anonymous functions or closures. With the release of Java 8, this approach to iteration is now a reasonable alternative for Java programmers. To illustrate the various approaches to iteration in Java, I need an example of a collection and something that needs to be done with its elements. For the initial part of this article I’ll use a collection of strings representing names of things. For each name in the collection, I will simply print its value to standard output. These basic ideas are easily extended to collections of more complicated objects (such as employees), and where the processing for each object is a little more involved (like giving each highly rated employee a 4.5 percent raise). Iteration with the Enumeration class In Java 1.0 and 1.1, the two primary collection classes were Vector and Hashtable, and the Iterator design pattern was implemented in a class called Enumeration. In retrospect this was a bad name for the class. Do not confuse the class Enumeration with the concept of enum types, which didn’t appear until Java 5. Today both Vector and Hashtable are generic classes, but back then generics were not part of the Java language. The code to process a vector of strings using Enumeration would look something like Listing 1. Listing 1. Using enumeration to iterate over a vector of strings Vector names = new Vector(); // ... add some names to the collection Enumeration e = names.elements(); while (e.hasMoreElements()) { String name = (String) e.nextElement(); System.out.println(name); } Iteration with the Iterator class Java 1.2 introduced the collection classes that we all know and love, and the Iterator design pattern was implemented in a class appropriately named Iterator. Because we didn’t yet have generics in Java 1.2, casting an object returned from an Iterator was still necessary. For Java versions 1.2 through 1.4, iterating over a list of strings might resemble Listing 2. Listing 2. Using an Iterator to iterate over a list of strings List names = new LinkedList(); // ... add some names to the collection Iterator i = names.iterator(); while (i.hasNext()) { String name = (String) i.next(); System.out.println(name); } Iteration with generics and the enhanced for-loop Java 5 gave us generics, the interface Iterable, and the enhanced for-loop. The enhanced for-loop is one of my all-time-favorite small additions to Java. The creation of the iterator and calls to its hasNext() and next() methods are not expressed explicitly in the code, but they still take place behind the scenes. Thus, even though the code is more compact, we are still using an active iterator. Using Java 5, our example would look something like what you see in Listing 3. Listing 3. Using generics and the enhanced for-loop to iterate over a list of strings List<String> names = new LinkedList<String>(); // ... add some names to the collection for (String name : names) System.out.println(name); Java 7 gave us the diamond operator, which reduces the verbosity of generics. Gone were the days of having to repeat the type used to instantiate the generic class after invoking the new operator! In Java 7 we could simplify the first line in Listing 3 above to the following: List<String> names = new LinkedList<>(); Iteration with the forEach() method Before delving into Java 8 iteration features, let’s reflect on what’s wrong with the code shown in the previous listings–which is, well, nothing really. There are millions of lines of Java code in currently deployed applications that use active iterators similar to those shown in my listings. Java 8 simply provides additional capabilities and new ways of performing iteration. For some scenarios, the new ways can be better. The major new features in Java 8 center on lambda expressions, along with related features such as streams, method references, and functional interfaces. These new features in Java 8 allow us to seriously consider using passive iterators instead of the more conventional active iterators. In particular, the Iterable interface provides a passive iterator in the form of a default method called forEach(). A default method, another new feature in Java 8, is a method in an interface with a default implementation. In this case, the forEach() method is actually implemented using an active iterator in a manner similar to what you saw in Listing 3. Collection classes that implement Iterable (for example, all list and set classes) now have a forEach() method. This method takes a single parameter that is a functional interface. Therefore the actual parameter passed to the forEach() method is a candidate for a lambda expression. Using the features of Java 8, our running example would evolve to the form shown in Listing 4. Listing 4. Iteration in Java 8 using the forEach() method List<String> names = new LinkedList<>(); // ... add some names to the collection names.forEach(name -> System.out.println(name)); Note the difference between the passive iterator in Listing 4 and the active iterator in the previous three listings. In the first three listings, the loop structure controls the iteration, and during each pass through the loop, an object is retrieved from the list and then printed. In Listing 4, there is no explicit loop. We simply tell the forEach() method what to do with the objects in the list — in this case we simply print the object. Control of the iteration resides within the forEach() method. Iteration with Java streams Now let’s consider doing something slightly more involved than simply printing the names in our list. Suppose, for example, that we want to count the number of names that begin with the letter A. We could implement the more complicated logic as part of the lambda expression, or we could use . Let’s take the latter approach. Collections and arrays can be used to generate streams–but beware: Streams are not collections that store elements! Rather, think of a stream as a mechanism for carrying a sequence of values from a source through a pipeline of operations. A stream pipeline consists of the stream source, intermediate operations that transform the stream and produce a new stream, and a terminal operation that either produces a result or calls the forEach() method. In general, stream operations are lazy, in the sense that they are performed only when the result is needed. An example should help clarify these concepts. As illustrated in Listing 5, the list names is used to create the stream source for the operation pipeline. Then a filter is applied to transform the source stream into a stream consisting only of those names that start with the letter A. The filter() method has a single parameter of type Predicate, which is simply a boolean-valued function. A lambda expression is the perfect way to express the actual parameter. Finally, the stream’s count() method is applied as the terminal operation that yields the result. Listing 5. Using a stream pipeline of operations List<String> names = new LinkedList<>(); // ... add some names to the collection long count = names.stream() .filter(name -> name.startsWith("A")) .count(); Why Java streams are important To understand why streams matter, compare the code in Listing 5 with the code in Listing 6 below, which implements the same functionality using an active iterator. Listing 6. Using active iterators in Java 7 List<String> names = new LinkedList<>(); // ... add some names to the collection long count = 0; for (String name : names) { if (name.startsWith("A")) ++count; } You might still wonder why anyone would prefer the code in Listing 5 over Listing 6, which probably looks more familiar to most Java developers. It can be argued that the declarative approach in Listing 5 is actually more readable and less error prone, at least once you understand the basic concepts of streams and stream operations. In particular, depending on the context, the logic in Listing 6 might not be thread-safe, while Listing 5 is thread safe. It is also easier to parallelize the operations shown in Listing 5. Parallel streams Collection classes do not only have a stream() method, which returns a sequential stream, but they also have a parallelStream() method, which returns a parallel stream. Parallel streams have the potential to improve performance by allowing pipeline operations to be executed concurrently in separate Java threads; but note that the order in which collection elements are processed can change. When I used a parallel stream to simply print the names in a list, the order that they were printed in was not the order that they appeared in, in the list. Parallel streams are implemented using the Fork/Join framework that was added to Java 7. Parallelizing the operations in a stream pipeline is usually as simple as replacing the call to stream() in Listing 5 with a call to parallelStream(). Benchmarking performance: Active vs passive iterators While there are several advantages to using the passive iterators now available in Java 8, I thought that it would be interesting to test whether or not they provide a performance improvement. In order to compare the performance of active iterators, streams, and parallel streams, I created a simple, non-scientific benchmark that allowed me to test all three iteration approaches with several different collection classes. Similar to Listings 5 and 6 above, the benchmark code counts the number of names in a collection that begin with the letter A. Listing 7 shows the actual benchmark methods that I used to test the three approaches. Listing 7. Benchmarking code private static long testActiveIterator(Collection<String> names) { long count = 0; for (String name : names) { if (name.startsWith("A")) ++count; } return count; } private static long testSequential(Collection<String> names) { return names.stream() .filter(name -> name.startsWith("A")) .count(); } private static long testParallel(Collection<String> names) { return names.parallelStream() .filter(name -> name.startsWith("A")) .count(); } On my computer, which has a dual-core Pentium processor, 4G RAM, the 64-bit version of Windows 7, and a 64-bit version of Java 8, I timed the calls to each of these three methods using five different collection classes (LinkedList, ArrayList, HashSet, LinkedHashSet, and TreeSet and two different collection sizes (1,000,000 and 2,000,000). For each collection class and size, I ran the benchmark several times and summed the results, which are summarized in Tables 1 and 2. In some cases, the benchmark results were not what I expected. More extensive benchmarking should be performed using different computer configurations in a production or simulated production scenarios before drawing any definitive conclusions. But based on this simple benchmark and the results shown, the following statements appear to be true: For the linked data structures LinkedList and LinkedHashSet, active iterators and streams appear to have similar performance, and both outperformed parallel streams on this benchmark. For ArrayList and TreeSet, active iterators and streams appear to have similar performance, but parallel streams outperformed both on this benchmark. For HashSet, parallel streams outperformed active iterators, and active iterators outperformed streams on this benchmark. Conclusion New features in Java 8 support a more declarative approach to iteration, in that we specify the result we want rather than how to compute it. Benefits of the new approach are that it can be more readable, less error prone, and more easily parallelized. Simple benchmarks suggest that the parallel versions are not necessarily faster for every collection class, however. About the author John I. Moore, Jr., Professor of Mathematics and Computer Science at The Citadel, has a wide range of experience in both industry and academia, with specific expertise in the areas of object-oriented technology, software engineering, and applied mathematics. For more than three decades he has designed and developed software using relational databases and several high-order languages, and he has worked extensively in Java since version 1.1. In addition, he has developed and taught numerous academic courses and industrial seminars on advanced topics in computer science. Learn more about this topic For another look at benchmarking parallel streams, see “From Imperative Programming to Fork/Join to Parallel Streams in Java 8” by Raoul-Gabriel Urma and Mario Fusco (InfoQ, February 2014). To learn more about Java’s Fork/Join framework, see the Java Tutorial section on Fork/Join or the article “A Java Fork/Join Framework” by Doug Lea, the principal designer of the framework. Also see Edward Harned’s critical review of the Fork/Join framework, “A Java Fork-Join Calamity” (Cooperative Software Systems, last updated July 2014). JavaProgramming LanguagesDesign PatternsSoftware Development