Java Stream Collect

Advanced collect

We already saw how we used Collectors.toList() to get the list out of the stream. Let’s now see a few more ways to collect elements from the stream.

joining

@Test
public void whenCollectByJoining_thenGetJoinedString() {
    String empNames = empList.stream()
      .map(Employee::getName)
      .collect(Collectors.joining(", "))
      .toString();
    
    assertEquals(empNames, "Jeff Bezos, Bill Gates, Mark Zuckerberg");
}

Collectors.joining() will insert the delimiter between the two String elements of the stream. It internally uses a java.util.StringJoiner to perform the joining operation.

toSet

We can also use toSet() to get a set out of stream elements:

@Test
public void whenCollectBySet_thenGetSet() {
    Set<String> empNames = empList.stream()
            .map(Employee::getName)
            .collect(Collectors.toSet());
    
    assertEquals(empNames.size(), 3);
}

toCollection

We can use Collectors.toCollection() to extract the elements into any other collection by passing in a Supplier<Collection>. We can also use a constructor reference for the Supplier:

@Test
public void whenToVectorCollection_thenGetVector() {
    Vector<String> empNames = empList.stream()
            .map(Employee::getName)
            .collect(Collectors.toCollection(Vector::new));
    
    assertEquals(empNames.size(), 3);
}

Here, an empty collection is created internally, and its add() method is called on each element of the stream.

summarizingDouble

summarizingDouble() is another interesting collector—which applies a double-producing mapping function to each input element and returns a special class containing statistical information for the resulting values:

@Test
public void whenApplySummarizing_thenGetBasicStats() {
    DoubleSummaryStatistics stats = empList.stream()
      .collect(Collectors.summarizingDouble(Employee::getSalary));

    assertEquals(stats.getCount(), 3);
    assertEquals(stats.getSum(), 600000.0, 0);
    assertEquals(stats.getMin(), 100000.0, 0);
    assertEquals(stats.getMax(), 300000.0, 0);
    assertEquals(stats.getAverage(), 200000.0, 0);
}

Notice how we can analyze the salary of each employee and get statistical information on that data—such as min, max, average etc.

summaryStatistics() can be used to generate similar results when we’re using one of the specialized streams:

@Test
public void whenApplySummaryStatistics_thenGetBasicStats() {
    DoubleSummaryStatistics stats = empList.stream()
      .mapToDouble(Employee::getSalary)
      .summaryStatistics();

    assertEquals(stats.getCount(), 3);
    assertEquals(stats.getSum(), 600000.0, 0);
    assertEquals(stats.getMin(), 100000.0, 0);
    assertEquals(stats.getMax(), 300000.0, 0);
    assertEquals(stats.getAverage(), 200000.0, 0);
}

partitioningBy

We can partition a stream into two—based on whether the elements satisfy certain criteria or not.

Let’s split our List of numerical data, into even and odds:

@Test
public void whenStreamPartition_thenGetMap() {
    List<Integer> intList = Arrays.asList(2, 4, 5, 6, 8);
    Map<Boolean, List<Integer>> isEven = intList.stream().collect(
      Collectors.partitioningBy(i -> i % 2 == 0));
    
    assertEquals(isEven.get(true).size(), 4);
    assertEquals(isEven.get(false).size(), 1);
}

Here, the stream is partitioned into a Map, with even and odds stored as true and false keys.

groupingBy

groupingBy() offers advanced partitioning—where we can partition the stream into more than just two groups.

It takes a classification function as its parameter. This classification function is applied to each element of the stream.

The value returned by the function is used as a key to the map that we get from the groupingBy collector:

@Test
public void whenStreamGroupingBy_thenGetMap() {
    Map<Character, List<Employee>> groupByAlphabet = empList.stream().collect(
      Collectors.groupingBy(e -> new Character(e.getName().charAt(0))));

    assertEquals(groupByAlphabet.get('B').get(0).getName(), "Bill Gates");
    assertEquals(groupByAlphabet.get('J').get(0).getName(), "Jeff Bezos");
    assertEquals(groupByAlphabet.get('M').get(0).getName(), "Mark Zuckerberg");
}

In this quick example, we grouped the employees based on the initial character of their first name.

mapping

groupingBy() discussed in the section above, groups elements of the stream with the use of a Map.

However, sometimes we might need to group data into a type other than the element type.

Here’s how we can do that; we can use mapping(), which can adapt the collector to a different type—using a mapping function:

@Test
public void whenStreamMapping_thenGetMap() {
    Map<Character, List<Integer>> idGroupedByAlphabet = empList.stream().collect(
      Collectors.groupingBy(e -> new Character(e.getName().charAt(0)),
        Collectors.mapping(Employee::getId, Collectors.toList())));

    assertEquals(idGroupedByAlphabet.get('B').get(0), new Integer(2));
    assertEquals(idGroupedByAlphabet.get('J').get(0), new Integer(1));
    assertEquals(idGroupedByAlphabet.get('M').get(0), new Integer(3));
}

Here mapping() maps the stream element Employee into just the employee ID—which is an Integer—using the getId() mapping function. These IDs are still grouped based on the initial character of employee first name.

reducing

reducing() is similar to reduce() – which we explored before. It simply returns a collector which performs a reduction of its input elements:

@Test
public void whenStreamReducing_thenGetValue() {
    Double percentage = 10.0;
    Double salIncrOverhead = empList.stream().collect(Collectors.reducing(
        0.0, e -> e.getSalary() * percentage / 100, (s1, s2) -> s1 + s2));

    assertEquals(salIncrOverhead, 60000.0, 0);
}

Here, reducing() gets the salary increment of each employee and returns the sum.

reducing() is most useful when used in a multi-level reduction, downstream of groupingBy() or partitioningBy(). To perform a simple reduction on a stream, use reduce() instead.

For example, let’s see how we can use reducing() with groupingBy():

@Test
public void whenStreamGroupingAndReducing_thenGetMap() {
    Comparator<Employee> byNameLength = Comparator.comparing(Employee::getName);
    
    Map<Character, Optional<Employee>> longestNameByAlphabet = empList.stream().collect(
      Collectors.groupingBy(e -> new Character(e.getName().charAt(0)),
        Collectors.reducing(BinaryOperator.maxBy(byNameLength))));

    assertEquals(longestNameByAlphabet.get('B').get().getName(), "Bill Gates");
    assertEquals(longestNameByAlphabet.get('J').get().getName(), "Jeff Bezos");
    assertEquals(longestNameByAlphabet.get('M').get().getName(), "Mark Zuckerberg");
}

Here, we group the employees based on the initial character of their first name. Within each group, we find the employee with the longest name.

Parallel Streams

Using the support for parallel streams, we can perform stream operations in parallel without having to write any boilerplate code; we just have to designate the stream as parallel:

@Test
public void whenParallelStream_thenPerformOperationsInParallel() {
    Employee[] arrayOfEmps = {
      new Employee(1, "Jeff Bezos", 100000.0), 
      new Employee(2, "Bill Gates", 200000.0), 
      new Employee(3, "Mark Zuckerberg", 300000.0)
    };

    List<Employee> empList = Arrays.asList(arrayOfEmps);
    
    empList.stream().parallel().forEach(e -> e.salaryIncrement(10.0));
    
    assertThat(empList, contains(
      hasProperty("salary", equalTo(110000.0)),
      hasProperty("salary", equalTo(220000.0)),
      hasProperty("salary", equalTo(330000.0))
    ));
}

Here salaryIncrement() would get executed in parallel on multiple elements of the stream, by simply adding the parallel() syntax.

This functionality can, of course, be tuned and configured further, if you need more control over the performance characteristics of the operation.

As is the case when writing multi-threaded code. We need to be aware of a few things while using parallel streams:

We need to ensure that the code is thread-safe. Take special care if the operations performed in parallel modify shared data.
We should not use parallel streams if the order in which operations are performed or the order returned in the output stream matters. For example operations like findFirst() may generate different results in the case of parallel streams.
Also, we should ensure that it’s worth making the code execute in parallel. Understanding the performance characteristics of the operation in particular, but also of the system as a whole – is naturally very important here.