Two characteristics of Java Stream data processing: delayed execution and immutability

While writing business code at work recently, I suddenly could not remember how to write an accumulation with Stream.

With no other option, I resorted to Google-oriented programming. Three precious minutes later I had relearned it, and it turned out to be very simple.

Since I started using JDK 8, Stream has been the feature I use most, for all kinds of stream operations. After this incident, however, I suddenly felt that Stream was actually quite unfamiliar to me.

Perhaps it is the same for everyone: the things we use most are the easiest to overlook. Even when preparing for an interview, you probably would not think of reviewing something like Stream.

But now that I have noticed the gap, I want to go through Stream again, as a way of checking for holes in my overall knowledge.

Writing up Stream took a lot of effort. I hope you will relearn it together with me, both the API and its internal characteristics. There is always more to understand, and always more satisfaction in understanding it.

In this article, I divide the content of Stream into the following parts:

At first glance at this map, you may be a little confused by the terms "conversion stream operation" and "terminal stream operation". I divide all the APIs in Stream into two categories, and each category has a corresponding name (following the Java 8 books listed at the end of the article):

Conversion stream operations (called intermediate operations in the Java documentation): methods such as filter and map, which convert one Stream into another Stream; the return value is a Stream.
Terminal stream operations: methods such as count and collect, which summarize a Stream into the result we need; the return value is not a Stream.

I further divide the conversion operations into two categories. Detailed examples come later in the article; for now, here are the definitions to give a general impression:

Stateless: the method does not depend on the result set of the previous method; each element can be processed on its own.
Stateful: the method depends on the result set of the previous method; it needs to know about other elements before it can produce its output.

Because there is so much to cover, I split Stream into two articles. This is the first one, with detailed content and simple but plentiful examples.

The second article covers only one topic, terminal operations, but that API is relatively complex, so it is just as substantial and also comes with simple, plentiful examples. The two articles are about the same length, so stay tuned.

Note: my local machine runs JDK 11 and I forgot to switch to JDK 8 while writing, so the List.of() calls that appear throughout the examples are not available in JDK 8 (List.of() was added in JDK 9). In JDK 8, Arrays.asList() can be used in their place.

Note: while writing, I read a lot of the Stream source code and several Java 8 books (listed at the end of the article). Writing takes effort; once this article passes a hundred likes, I will publish the second one soon.

1. Why use Stream?

Everything stems from the release of JDK 8. At a time when functional programming languages were flourishing, Java was criticized for its verbosity (its strongly object-oriented style), and the community urgently wanted functional language features to improve the situation. Finally, in 2014, Java released JDK 8.

In JDK8, I think the biggest new feature is the addition of functional interfaces and lambda expressions, which are taken from functional programming.

These two features make Java more concise and elegant. Using functional features to compete with functional languages and consolidate Java's leading position is, quite literally, learning the rival's skills in order to compete with the rival.

Stream is a class library that JDK 8 built for the collection classes on top of these two features. It lets us process the data in a collection in a more concise, pipelined way using lambda expressions, and makes operations such as filtering, grouping, collecting, and reducing easy to complete. That is why I like to call Stream the best practice for functional interfaces.

1.1 Clearer code structure

Stream has a clearer code structure. In order to better explain how Stream makes the code clearer, here we assume that we have a very simple requirement: Find all elements greater than 2 in a collection.

Let's first look at the code without Stream:

    List<Integer> list = List.of(1, 2, 3);

    List<Integer> filterList = new ArrayList<>();

    for (Integer i : list) {
        if (i > 2) {
            filterList.add(i);
        }
    }

    System.out.println(filterList);

The code above is easy to understand, so I will not explain it much. It is actually fine here, because the requirement is simple. But what if there are more requirements?

Each additional requirement adds another condition to the if. In real development, objects often have many fields, so there may be four or five conditions, and the code may end up like this:

    List<Integer> list = List.of(1, 2, 3);

    List<Integer> filterList = new ArrayList<>();

    for (Integer i : list) {
        if (i > 2 && i < 10 && (i % 2 == 0)) {
            filterList.add(i);
        }
    }

    System.out.println(filterList);

The if now holds a lot of conditions and looks messy. Even that is tolerable. The worst part is that a project often has many similar requirements that differ in only one condition. You then copy a large chunk of code, change it, and ship it, which leaves a lot of duplicated code in the code base.

Everything becomes clear and readable with Stream:

    // assumes: import static java.util.stream.Collectors.toList;
    List<Integer> list = List.of(1, 2, 3).stream()
            .filter(i -> i > 2)
            .filter(i -> i < 10)
            .filter(i -> i % 2 == 0)
            .collect(toList());

You only need to pay attention to what matters most in this code: the filter conditions. The method name filter makes it clear that it is a filter condition, and the method name collect tells you that it is a collector that gathers the final results into a List.

You may also wonder why the code above does not need a loop.

Because Stream performs an implicit loop for us. This is called internal iteration, as opposed to the external iteration we usually write.

So even though you do not write a loop, a loop still runs.
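To make the difference between external and internal iteration concrete, here is a minimal sketch. The class and method names (IterationDemo, externalIteration, internalIteration) are made up for this illustration, and List.of() assumes JDK 9 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class IterationDemo {

    // External iteration: the caller writes the loop and drives it.
    static List<Integer> externalIteration(List<Integer> source) {
        List<Integer> result = new ArrayList<>();
        for (Integer i : source) {
            if (i > 2) {
                result.add(i);
            }
        }
        return result;
    }

    // Internal iteration: the caller only states the condition;
    // the Stream implementation walks over the elements.
    static List<Integer> internalIteration(List<Integer> source) {
        return source.stream()
                .filter(i -> i > 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = List.of(1, 2, 3, 4);
        System.out.println(externalIteration(source)); // [3, 4]
        System.out.println(internalIteration(source)); // [3, 4]
    }
}
```

Both methods return the same elements; only the party that owns the loop differs.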

1.2 Don’t care about variable state

Stream was designed to be immutable from the start, and its immutability has two meanings:

Each Stream operation returns a new Stream, so a Stream is immutable in the same sense as String.
A Stream only holds a reference to the original collection. Operations that transform elements produce new elements from the original ones, so Stream operations do not affect the original collection.

The first meaning is what makes chained calls possible, and we chain calls all the time when using Stream. The second meaning is a key trait of functional programming: state is not modified.

No matter which operations are applied to the Stream, the original collection is not affected in the end (as long as the lambdas passed in do not themselves modify the elements), and the return value is computed from the original collection.

So with Stream we do not have to worry about side effects on the original collection; we can simply use it.
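A small sketch of the second meaning: the source list is left as it was, even after mapping and sorting. The names ImmutabilityDemo and timesTenDescending are hypothetical, chosen only for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class ImmutabilityDemo {

    // Returns a NEW list: every element times 10, in descending order.
    // The source list itself is never modified.
    static List<Integer> timesTenDescending(List<Integer> source) {
        return source.stream()
                .map(i -> i * 10)
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>(List.of(3, 1, 2));
        List<Integer> result = timesTenDescending(source);

        System.out.println(source); // [3, 1, 2]
        System.out.println(result); // [30, 20, 10]
    }
}
```

Note that this guarantee concerns the collection, not the objects inside it: a lambda that calls a setter on an element would still change that element.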

For functional programming, please refer to Ruan Yifeng’s A Preliminary Study of Functional Programming.

1.3 Delayed execution and optimization

A Stream is only executed when it encounters a terminal operation. For example:

    List.of(1, 2, 3).stream()
            .filter(i -> i > 2)
            .peek(System.out::println);

Code like this does not execute anything. The peek method can be thought of as a forEach that passes the elements along; here I use it to print the elements in the Stream.

Because filter and peek are both conversion operations, they do not trigger execution.

If we append a count method, the pipeline is executed:

    List.of(1, 2, 3).stream()
            .filter(i -> i > 2)
            .peek(System.out::println)
            .count();

The count method is a terminal operation that counts the elements in the Stream; its return value is a long.

This property of a Stream, not executing until a terminal operation is called, is called delayed execution (also known as lazy evaluation).
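Delayed execution can be observed by counting how often the filter lambda runs before and after the terminal operation. This is a minimal sketch; LazyDemo and its method name are invented for the illustration.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

final class LazyDemo {

    // Returns {filter calls before count(), filter calls after count()}.
    static int[] filterCallsBeforeAndAfterTerminal() {
        AtomicInteger calls = new AtomicInteger();

        // Building the pipeline runs nothing yet.
        Stream<Integer> pipeline = List.of(1, 2, 3).stream()
                .filter(i -> {
                    calls.incrementAndGet();
                    return i > 2;
                });

        int before = calls.get();
        pipeline.count(); // the terminal operation triggers execution
        int after = calls.get();

        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] calls = filterCallsBeforeAndAfterTerminal();
        System.out.println("before terminal operation: " + calls[0]); // 0
        System.out.println("after terminal operation:  " + calls[1]); // 3
    }
}
```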

Stream also optimizes the stateless methods in the API, an optimization called loop merging; see section 3 for a concrete example.

2. Create Stream

For completeness, I decided to add a section on creating a Stream. It introduces the common ways to create one, which generally fall into two cases:

Created using the Stream interface
Created through the collection class library

I will also cover parallel streams and stream concatenation. Both create a Stream, but with different characteristics.

2.1 Created through the Stream interface

As an interface, Stream defines several static methods in the interface to provide us with an API to create Stream:

    public static<T> Stream<T> of(T... values) {
        return Arrays.stream(values);
    }

The first is the of method, which takes a generic varargs parameter and creates a typed Stream for us. If the arguments are primitive values, autoboxing wraps them in the corresponding wrapper type:

    Stream<Integer> integerStream = Stream.of(1, 2, 3);

    Stream<Double> doubleStream = Stream.of(1.1d, 2.2d, 3.3d);

    Stream<String> stringStream = Stream.of("1", "2", "3");

Of course, you can also create an empty Stream directly by calling another static method, empty(). Its element type is inferred from the context; in the example below it is Object:

    Stream<Object> empty = Stream.empty();

The methods above are easy to understand. There is also a way to create a Stream with an unlimited number of elements, generate():

    public static<T> Stream<T> generate(Supplier<? extends T> s) {
        Objects.requireNonNull(s);
        return StreamSupport.stream(
                new StreamSpliterators.InfiniteSupplyingSpliterator.OfRef<>(Long.MAX_VALUE, s), false);
    }

Its parameter is the functional interface Supplier, an interface used to create objects. You can think of it as an object factory; Stream puts the objects produced by this factory into the Stream:

    Stream<String> generate = Stream.generate(() -> "Supplier");

    Stream<Integer> generateInteger = Stream.generate(() -> 123);

Here I construct the Supplier with a lambda for convenience. You can also pass in a Supplier object directly; the Stream obtains its elements through the Supplier's get() method.
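Because generate() produces an unlimited number of elements, in practice it is combined with a short-circuiting operation such as limit(); collecting it directly would never finish. A minimal sketch follows; GenerateDemo and firstN are hypothetical names used only here.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class GenerateDemo {

    // Takes the first n elements produced by the Supplier.
    // Without limit(n) the collect call would not terminate.
    static <T> List<T> firstN(Supplier<T> supplier, int n) {
        return Stream.generate(supplier)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstN(() -> "Supplier", 3)); // [Supplier, Supplier, Supplier]
        System.out.println(firstN(() -> 123, 2));        // [123, 123]
    }
}
```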

2.2 Create through collection class library

Compared with the above one, the second method is more commonly used. We often perform Stream operations on collections instead of manually constructing a Stream:

    Stream<Integer> integerStreamList = List.of(1, 2, 3).stream();

    Stream<String> stringStreamList = List.of("1", "2", "3").stream();

In Java 8, a new default method, stream(), was added to Collection, the top-level interface of the collection framework. Through this method, every Collection implementation can conveniently create a Stream:

    Stream<Integer> listStream = List.of(1, 2, 3).stream();

    Stream<Integer> setStream = Set.of(1, 2, 3).stream();

By consulting the source code, you can find that the stream() method essentially creates a Stream by calling a Stream tool class:

    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

2.3 Creating parallel streams

All Streams in the examples above are sequential. In some scenarios, to get the most out of a multi-core CPU, we can use parallel streams, which run in parallel on the fork/join framework introduced in JDK 7. Parallel streams can be created as follows:

    Stream<Integer> integerParallelStream = Stream.of(1, 2, 3).parallel();

    Stream<String> stringParallelStream = Stream.of("1", "2", "3").parallel();

    Stream<Integer> integerParallelStreamList = List.of(1, 2, 3).parallelStream();

    Stream<String> stringParallelStreamList = List.of("1", "2", "3").parallelStream();

As you can see, Stream has no static method that creates a parallel stream directly. We construct a Stream first and then call parallel() on it. Calling parallel() does not create a new parallel stream object; it sets a parallel flag on the existing Stream object.

Parallel streams can also be created directly from the Collection interface by calling parallelStream(), the counterpart of stream(). As just described, the only difference between the two is one parameter:

    default Stream<E> stream() {
        return StreamSupport.stream(spliterator(), false);
    }

    default Stream<E> parallelStream() {
        return StreamSupport.stream(spliterator(), true);
    }

Under normal circumstances, however, we do not need parallel streams. As a rough rule of thumb, if the Stream has fewer than a thousand elements, performance will not improve much, because distributing the elements across CPU cores has a cost of its own.

The advantage of parallelism is that it makes full use of multi-core CPUs, but the data usually has to be split first and then spread across the cores for processing. Array-backed data is easy to split; linked-list or hash-based data is clearly less convenient to split than an array.

So, again as a rule of thumb, parallel streams only bring a noticeable performance improvement when the Stream holds upwards of ten thousand elements.

Finally, when you have a parallel stream, you can also conveniently convert it to a serial stream with sequential():

    Stream.of(1, 2, 3).parallel().sequential();
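The parallel flag can be inspected with isParallel(), which makes the effect of parallel() and sequential() visible. A minimal sketch; the class and method names are made up for this example.

```java
import java.util.stream.Stream;

final class ParallelFlagDemo {

    // A Stream created with Stream.of() starts out sequential.
    static boolean plainIsParallel() {
        return Stream.of(1, 2, 3).isParallel();
    }

    // parallel() switches the flag on.
    static boolean afterParallel() {
        return Stream.of(1, 2, 3).parallel().isParallel();
    }

    // sequential() switches it off again.
    static boolean afterParallelThenSequential() {
        return Stream.of(1, 2, 3).parallel().sequential().isParallel();
    }

    public static void main(String[] args) {
        System.out.println(plainIsParallel());             // false
        System.out.println(afterParallel());               // true
        System.out.println(afterParallelThenSequential()); // false
    }
}
```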

2.4 Concatenating Streams

If you have constructed two Streams in two places and want to combine them when using them, you can use concat():

    Stream<Integer> concat = Stream
            .concat(Stream.of(1, 2, 3), Stream.of(4, 5, 6));

If two streams with different element types are combined, type inference picks a common supertype of the two:

    Stream<Integer> integerStream = Stream.of(1, 2, 3);

    Stream<String> stringStream = Stream.of("1", "2", "3");

    Stream<? extends Serializable> stream = Stream.concat(integerStream, stringStream);
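As a runnable sketch of concat(): the elements of the first stream come first, followed by those of the second. The second method shows that you can also name the common supertype yourself (Object here) instead of relying on what the compiler infers. ConcatDemo and its methods are hypothetical names.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ConcatDemo {

    // Same element type on both sides.
    static List<Integer> concatInts() {
        return Stream.concat(Stream.of(1, 2, 3), Stream.of(4, 5, 6))
                .collect(Collectors.toList());
    }

    // Different element types: the declared target type picks the supertype.
    static List<Object> concatMixed() {
        Stream<Integer> ints = Stream.of(1, 2, 3);
        Stream<String> strings = Stream.of("1", "2", "3");
        Stream<Object> mixed = Stream.concat(ints, strings);
        return mixed.collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(concatInts());  // [1, 2, 3, 4, 5, 6]
        System.out.println(concatMixed()); // [1, 2, 3, 1, 2, 3]
    }
}
```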

3. Stateless method of Stream conversion operation

Stateless method: the execution of this method does not need to depend on the result set of the previous method execution.

There are three commonly used stateless APIs in Stream:

map() method: its parameter is a Function, which lets you apply a custom operation to each element and keeps the result of that operation.
filter() method: its parameter is a Predicate, whose result is a boolean. The method keeps only the elements for which the Predicate returns true, so, as the name says, we can use it for filtering.
flatMap() method: its parameter is a Function, as with map(), but the Function must return a Stream. The method gathers the elements of all these Streams into a single Stream and returns it.

Let’s take a look at an example of the map() method:

    Stream<Integer> integerStreamList = List.of(1, 2, 3).stream();

    Stream<Integer> mapStream = integerStreamList.map(i -> i * 10);

We have a List and want to multiply each element by 10, which the code above does. Here i is the variable name for each element of the List, and the expression after -> is the operation applied to that element. The logic is passed in concisely and clearly, and the code returns a new Stream containing the results.

Here, in order to better help everyone understand, I drew a simple diagram:

Next is an example of the filter() method:

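    Stream<Integer> integerStreamList = List.of(1, 2, 3).stream();

    // Keeps only the elements for which i >= 20 is true;
    // with the elements 1, 2, 3 the resulting Stream is empty.
    Stream<Integer> filterStream = integerStreamList.filter(i -> i >= 20);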

This code evaluates i >= 20 for each element and returns a new Stream containing the elements for which the result is true.

Here I also have a simple diagram:

The flatMap() method was described above, but the description is rather abstract. I looked up many examples while learning this method before I understood it well.

According to the official documentation, this method flattens one-to-many relationships:

    List<Order> orders = List.of(new Order(), new Order());

    Stream<Item> itemStream = orders.stream()
            .flatMap(order -> order.getItemList().stream());

Here I use orders to illustrate the method. Each order contains a list of items. To combine the item lists of both orders into a single item list, I need flatMap().

In the code above, each order is mapped to a Stream of its items. There are only two orders in this example, so two item Streams are produced. flatMap() extracts the elements of these two Streams and puts them into one new Stream.

As usual, here is a simple diagram to illustrate:

In the legend, I use cyan to represent Stream. In the final output, you can see that flatMap() converts two streams into one stream for output. This is very useful in some scenarios, such as my order example above.
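The order example can be made runnable with two minimal model classes. Order and Item below are hypothetical stand-ins written only for this sketch (the article does not show the real classes); only getItemList() is taken from the snippet above.

```java
import java.util.List;
import java.util.stream.Collectors;

final class FlatMapDemo {

    // Hypothetical model classes, just enough for the example.
    static final class Item {
        final String name;
        Item(String name) { this.name = name; }
    }

    static final class Order {
        private final List<Item> itemList;
        Order(List<Item> itemList) { this.itemList = itemList; }
        List<Item> getItemList() { return itemList; }
    }

    // Flattens the item lists of all orders into one list of item names.
    static List<String> allItemNames(List<Order> orders) {
        return orders.stream()
                .flatMap(order -> order.getItemList().stream())
                .map(item -> item.name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order(List.of(new Item("apple"), new Item("pear"))),
                new Order(List.of(new Item("milk"))));
        System.out.println(allItemNames(orders)); // [apple, pear, milk]
    }
}
```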

There is also a less commonly used stateless method, peek():

    Stream<T> peek(Consumer<? super T> action);

The peek method accepts a Consumer as its parameter, a function that takes an element and returns nothing. We can use peek for operations such as printing elements:

    Stream<Integer> peekStream = integerStreamList.peek(i -> System.out.println(i));

However, if you are not familiar with it, it is not recommended to use it, and it will not take effect in some cases, such as:

    List.of(1, 2, 3).stream()
            .map(i -> i * 10)
            .peek(System.out::println)
            .count();

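The API documentation also notes that this method exists mainly for debugging. In my experience, peek only runs when the Stream actually has to traverse its elements.

In the example above, count only needs the number of elements. Since JDK 9, count() may obtain that number directly from the source when no operation in the pipeline can change it, so the pipeline is not traversed and peek is not executed; on JDK 8 the same code does run peek. Replace count with collect and peek is executed.

If the pipeline contains a filtering operation such as filter, the number of elements can no longer be derived from the source, so peek is executed as well. The same holds when the terminal operation is one of the match methods.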
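The following sketch counts how many elements peek() observes in both situations. With a filter in the pipeline the result is reliably 3. Without one, the number depends on the JDK: the count() documentation allows the implementation to skip the pipeline, so the second value is printed without a fixed expectation. PeekCountDemo is a made-up name.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

final class PeekCountDemo {

    // filter() makes the element count unknown up front,
    // so count() has to traverse the elements and peek() sees all of them.
    static int peekCallsWithFilter() {
        AtomicInteger seen = new AtomicInteger();
        List.of(1, 2, 3).stream()
                .peek(i -> seen.incrementAndGet())
                .filter(i -> i > 1)
                .count();
        return seen.get();
    }

    // map() and peek() cannot change the number of elements,
    // so count() is allowed to skip the traversal entirely.
    static int peekCallsWithoutFilter() {
        AtomicInteger seen = new AtomicInteger();
        List.of(1, 2, 3).stream()
                .map(i -> i * 10)
                .peek(i -> seen.incrementAndGet())
                .count();
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("with filter:    " + peekCallsWithFilter()); // 3
        System.out.println("without filter: " + peekCallsWithoutFilter());
    }
}
```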

3.1 Basic type Stream

The previous section covered the three most commonly used stateless methods of Stream. Stream also has several stateless methods that correspond to map() and flatMap():

mapToInt
mapToLong
mapToDouble
flatMapToInt
flatMapToLong
flatMapToDouble

As the names suggest, these six methods only change the return type of map() or flatMap(). That alone might not seem to justify separate methods, but the return type is exactly the point:

The return value of mapToInt is IntStream
The return value of mapToLong is LongStream
The return value of mapToDouble is DoubleStream
The return value of flatMapToInt is IntStream
The return value of flatMapToLong is LongStream
The return value of flatMapToDouble is DoubleStream

Each of Java's eight primitive types has a corresponding wrapper class. JDK 5 added autoboxing and unboxing, which automatically call the wrapper classes' conversion methods when a primitive value is used where an object is expected, and vice versa.

For example, in the first example, I used this example:

 Stream<Integer> integerStream = Stream.of(1, 2, 3);

I passed primitive values when creating the Stream, and they were automatically boxed into Integer. It is easy to forget that autoboxing and unboxing have a cost. To avoid this cost, we can convert the Stream into one of the Streams designed for primitive types:

IntStream: for int; short, char, and byte values are usually handled through IntStream as well (boolean has no primitive stream of its own and would have to be converted to int explicitly)
LongStream: for long
DoubleStream: for double and float

These interfaces can also be constructed with the of method, as in the example above, and no boxing or unboxing takes place.

So the six methods mentioned above convert an ordinary Stream into one of these primitive streams, which can be more efficient when we need it.

The primitive streams offer largely the same API as Stream, so once you know how to use Stream, the primitive streams work the same way.

Note: IntStream, LongStream and DoubleStream are all interfaces, but they are not inherited from the Stream interface.
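A short sketch of moving between Stream<Integer> and IntStream. It also covers the accumulation from the introduction: summing can be written with reduce() on the boxed stream or with sum() on the primitive stream. PrimitiveStreamDemo and its method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

final class PrimitiveStreamDemo {

    // Accumulation on a Stream<Integer>: every step works with boxed Integer values.
    static int sumBoxed(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // mapToInt() switches to IntStream, which offers sum() directly on int values.
    static int sumPrimitive(List<Integer> values) {
        return values.stream().mapToInt(Integer::intValue).sum();
    }

    // boxed() converts an IntStream back into a Stream<Integer>.
    static List<Integer> rangeAsList(int fromInclusive, int toInclusive) {
        return IntStream.rangeClosed(fromInclusive, toInclusive)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3);
        System.out.println(sumBoxed(values));      // 6
        System.out.println(sumPrimitive(values));  // 6
        System.out.println(rangeAsList(1, 3));     // [1, 2, 3]
    }
}
```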

3.2 Loop Merging of Stateless Methods

After covering these stateless methods, let's return to the example from section 1.1:

    List<Integer> list = List.of(1, 2, 3).stream()
            .filter(i -> i > 2)
            .filter(i -> i < 10)
            .filter(i -> i % 2 == 0)
            .collect(toList());

In this example, I used the filter method three times, so do you think Stream will loop three times for filtering?

If one of the filters is replaced with a map, how many times do you think it will loop?

    List<Integer> list = List.of(1, 2, 3).stream()
            .map(i -> i * 10)
            .filter(i -> i < 10)
            .filter(i -> i % 2 == 0)
            .collect(toList());

Intuitively, we would process all elements with map first and then filter them, so three loops would be needed.

But looking back at the definition of a stateless method, all three operations can be done in a single loop: each filter depends only on the result of map for the current element, not on the complete result set after map has finished. As long as map is applied before filter for each element, everything can be completed in one loop. This optimization is called loop merging.

All stateless methods can be executed in the same loop, and they can also be easily distributed across multiple CPUs with parallel streams.
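Loop merging can be observed by logging each call: on a sequential stream, every element passes through map and filter before the next element is touched. A minimal sketch with an invented class name, LoopMergeDemo:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class LoopMergeDemo {

    // Records the order in which map and filter are invoked.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List.of(1, 2, 3).stream()
                .map(i -> {
                    log.add("map " + i);
                    return i * 10;
                })
                .filter(i -> {
                    log.add("filter " + i);
                    return i > 10;
                })
                .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        // Prints: [map 1, filter 10, map 2, filter 20, map 3, filter 30]
        System.out.println(trace());
    }
}
```

If map and filter each ran their own loop, the log would list all three map calls first; the interleaved order shows a single pass.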

4. Stateful method of Stream conversion operation

After talking about the stateless method, the stateful method is relatively simple. You can know its function just by looking at the name:

Method name	Method result
distinct()	Removes duplicate elements.
sorted()	Sorts the elements. There are two overloads; one of them accepts a Comparator when needed.
limit(long maxSize)	Keeps only the first maxSize elements.
skip(long n)	Skips the first n elements and keeps the rest.
takeWhile(Predicate predicate)	Added in JDK 9. Keeps elements while the predicate is true and stops at the first element for which it is false.
dropWhile(Predicate predicate)	Added in JDK 9. Drops elements while the predicate is true and keeps everything from the first element for which it is false.
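The table can be tried out on one small list. This is a sketch only: StatefulOpsDemo and its apply helper are hypothetical, and takeWhile/dropWhile require JDK 9 or later.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StatefulOpsDemo {

    static final List<Integer> SOURCE = List.of(3, 1, 3, 2, 5, 4);

    // Applies one conversion operation to a fresh stream over SOURCE.
    static List<Integer> apply(Function<Stream<Integer>, Stream<Integer>> op) {
        return op.apply(SOURCE.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(apply(s -> s.distinct()));              // [3, 1, 2, 5, 4]
        System.out.println(apply(s -> s.sorted()));                // [1, 2, 3, 3, 4, 5]
        System.out.println(apply(s -> s.limit(2)));                // [3, 1]
        System.out.println(apply(s -> s.skip(4)));                 // [5, 4]
        System.out.println(apply(s -> s.takeWhile(i -> i != 2)));  // [3, 1, 3]
        System.out.println(apply(s -> s.dropWhile(i -> i != 2)));  // [2, 5, 4]
    }
}
```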

All of the above are stateful methods: they rely on the result set of the previous method. Sorting, for example, needs the complete result set of the previous method before it can sort.

limit and takeWhile are also short-circuiting operations, which can mean higher efficiency, because the elements we want may already have been selected before the internal loop reaches the end.
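Short-circuiting is easiest to see on an infinite stream, where the pipeline can only finish because limit or takeWhile stops it early. A minimal sketch; ShortCircuitDemo and its methods are made-up names, and takeWhile requires JDK 9 or later.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ShortCircuitDemo {

    // limit(3) stops the infinite stream 1, 2, 3, ... after three elements.
    // The counter records how many elements were actually examined.
    static List<Integer> firstThree(AtomicInteger examined) {
        return Stream.iterate(1, i -> i + 1)
                .peek(i -> examined.incrementAndGet())
                .limit(3)
                .collect(Collectors.toList());
    }

    // takeWhile stops at the first element for which the predicate is false.
    static List<Integer> belowFour() {
        return Stream.iterate(1, i -> i + 1)
                .takeWhile(i -> i < 4)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        AtomicInteger examined = new AtomicInteger();
        System.out.println(firstThree(examined));          // [1, 2, 3]
        System.out.println("examined: " + examined.get()); // a small number, not infinite
        System.out.println(belowFour());                   // [1, 2, 3]
    }
}
```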

Therefore, stateful methods cannot always be merged into a single loop the way stateless methods are. A method such as sorted() has to collect the output of the previous step before it can pass anything on. As a result, the order in which you chain the methods affects both the result and the performance of the program, so please keep this in mind during development.
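The barrier that sorted() introduces shows up when the calls before and after it are logged: every element passes the first peek before any element reaches the second one. A minimal sketch with an invented class name, SortedBarrierDemo:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class SortedBarrierDemo {

    // Records when elements are seen before and after sorted().
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        List.of(3, 1, 2).stream()
                .peek(i -> log.add("before " + i))
                .sorted()
                .peek(i -> log.add("after " + i))
                .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        // Prints: [before 3, before 1, before 2, after 1, after 2, after 3]
        System.out.println(trace());
    }
}
```

This is also why placing filter before sorted is usually the better order: the sort then has fewer elements to handle.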

5. Summary

This article gives an overview of Stream and describes its two characteristics:

Immutable: it does not affect the original collection, and every call returns a new Stream.
Delayed execution: the Stream does not execute until it encounters a terminal operation.

The Stream API is also divided into conversion operations and terminal operations, and all the commonly used conversion operations have been explained. The next article will focus on terminal operations.

While reading the Stream source code, I found something interesting. In the ReferencePipeline class (the implementation class of Stream), the methods appear from top to bottom in exactly this order: stateless methods → stateful methods → aggregation methods.

After this article, I think you have a clear overall picture of Stream and a good grasp of the conversion operation API; there are not that many methods, after all. Java 8 has many more powerful features. Let's talk again next time~