[Cloud Computing | AWS Practice] List all AWS S3 objects in a bucket using Java

This article is included in the column [#Cloud Computing Introduction and Practice – AWS], which collects blog posts on getting started with AWS and putting it into practice.

This article is synchronized with my personal public account: [Cloud Computing Insights]

For more on cloud computing technology, please follow the CSDN column [#Cloud Computing Introduction and Practice – AWS].

This series has updated blog posts:

  • [Cloud Computing | AWS Practice] A complete guide to using Amazon S3 for bucket and object operations in Java applications
  • [Cloud Computing | AWS Practice] How to rename files and folders in Amazon S3 in Java
  • [Cloud Computing | AWS Practice] List all AWS S3 objects in a bucket using Java
  • [Cloud Computing | AWS Practice] Updating existing Amazon S3 objects using Java
  • [Cloud Computing | AWS Practice] Build a personal cloud storage service based on the Amazon S3 protocol
  • [Cloud Computing | AWS Practice] Check if a specified key exists in a given Amazon S3 bucket using Java

Article directory

    • 1. Introduction
    • 2. Preparation
    • 3. List the objects in the S3 bucket
    • 4. Use continuation tokens for pagination
    • 5. Use ListObjectsV2Iterable for pagination
    • 6. List objects using prefixes
    • 7. Summary

1. Introduction

In this article, we will focus on how to list all objects in an S3 bucket using Java. We’ll discuss how to interact with S3 using the AWS SDK for Java and look at examples of different use cases.

The focus is on using the AWS SDK for Java V2, which has several improvements over previous versions, such as enhanced performance, non-blocking I/O, and user-friendly API design.

2. Preparation

To list all objects in an S3 bucket, we can leverage the S3Client class provided by the AWS SDK for Java.

First, let’s create a new Java project and add the following Maven dependencies to the pom.xml file:

<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>s3</artifactId>
    <version>2.21.0</version>
</dependency>

For the examples in this article, we will use version 2.21.0. To see the latest version, we can check the Maven Repository.

We also need to set up an AWS account, install the AWS CLI, and configure it with our AWS credentials (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY) to be able to access AWS resources programmatically. All the steps for doing this can be found in the AWS documentation.

Finally, we need to create an AWS S3 bucket and upload some files. As shown in the image below, for our example we created a bucket called baeldung-tutorial-s3 and uploaded some test files to it:

3. List the objects in the S3 bucket

Let’s use the AWS SDK for Java V2 and create a method that reads objects from the bucket:

static final String AWS_BUCKET = "baeldung-tutorial-s3";
static final Region AWS_REGION = Region.EU_CENTRAL_1;

void listObjectsInBucket() {
    S3Client s3Client = S3Client.builder()
      .region(AWS_REGION)
      .build();

    ListObjectsV2Request listObjectsV2Request = ListObjectsV2Request.builder()
      .bucket(AWS_BUCKET)
      .build();
    ListObjectsV2Response listObjectsV2Response = s3Client.listObjectsV2(listObjectsV2Request);

    List<S3Object> contents = listObjectsV2Response.contents();

    System.out.println("Number of objects in the bucket: " + contents.size());
    contents.forEach(System.out::println);

    s3Client.close();
}

To list objects in an AWS S3 bucket, we first need to create a ListObjectsV2Request instance and specify the bucket name. We then call the listObjectsV2 method on the s3Client object, passing the request as a parameter. This method returns a ListObjectsV2Response which contains information about the objects in the bucket.

Finally, we use the contents() method to access the S3 object list and write the number of retrieved objects as output. We also define two static properties for the bucket name and corresponding AWS region.

After executing the method, we get the following results:

Number of objects in the bucket: 1000
S3Object(Key=file_0.txt, LastModified=2023-11-01T11:35:06Z, ETag="b9ece18c950afbfa6b0fdbfa4ff731d3", Size=1, StorageClass=STANDARD)
S3Object(Key=file_1.txt, LastModified=2023-11-01T11:35:07Z, ETag="97a6dd4c45b23db9c5d603ce161b8cab", Size=1, StorageClass=STANDARD)
S3Object(Key=file_10.txt, LastModified=2023-11-01T11:35:07Z, ETag="3406877694691ddd1dfb0aca54681407", Size=1, StorageClass=STANDARD)
S3Object(Key=file_100.txt, LastModified=2023-11-01T11:35:15Z, ETag="b99834bc19bbad24580b3adfa04fb947", Size=1, StorageClass=STANDARD)
S3Object(Key=file_1000.txt, LastModified=2023-08-01T18:54:31Z, ETag="47ed733b8d10be225eceba344d533586", Size=1, StorageClass=STANDARD)
[...]

As we can see, we don’t get all the uploaded objects.

It’s worth noting that a single listObjectsV2 call returns at most 1,000 objects. If the bucket contains more than 1,000 objects, we must implement pagination using the nextContinuationToken() method of the ListObjectsV2Response object.

4. Use continuation tokens for pagination

If our AWS S3 bucket contains more than 1,000 objects, we need to implement pagination using the nextContinuationToken() method.

Let’s look at an example showing how to handle this situation:

void listAllObjectsInBucket() {
    S3Client s3Client = S3Client.builder()
      .region(AWS_REGION)
      .build();
    String nextContinuationToken = null;
    long totalObjects = 0;

    do {
        ListObjectsV2Request.Builder requestBuilder = ListObjectsV2Request.builder()
          .bucket(AWS_BUCKET)
          .continuationToken(nextContinuationToken);

        ListObjectsV2Response response = s3Client.listObjectsV2(requestBuilder.build());
        nextContinuationToken = response.nextContinuationToken();

        totalObjects += response.contents().stream()
          .peek(System.out::println)
          .reduce(0, (subtotal, element) -> subtotal + 1, Integer::sum);
    } while (nextContinuationToken != null);
    System.out.println("Number of objects in the bucket: " + totalObjects);

    s3Client.close();
}

Here, we use a do-while loop to paginate through all the objects in the bucket. The loop continues until the response no longer contains a continuation token, indicating that we have retrieved all objects.

Therefore, we get the following output:

Number of objects in the bucket: 1060

Using this approach, we manage pagination explicitly. We check if the continuation token exists and use it in the following request. This gives us full control over when and how the next page is requested. It allows for greater flexibility in handling the paging process.
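The token loop above can be exercised without AWS at all. The sketch below is a hypothetical stand-in (the Page class, token values, and in-memory map are not SDK types) that replays the same do-while pattern against a fake paged source:

```java
import java.util.List;
import java.util.Map;

public class TokenPagingDemo {

    // A tiny stand-in for a ListObjectsV2 response: one page of keys plus
    // the continuation token for the next page (null on the last page).
    static class Page {
        final List<String> keys;
        final String nextToken;
        Page(List<String> keys, String nextToken) {
            this.keys = keys;
            this.nextToken = nextToken;
        }
    }

    // Hypothetical paged source standing in for S3, keyed by token.
    // "first" plays the role of the initial request with no token set.
    static final Map<String, Page> PAGES = Map.of(
        "first", new Page(List.of("file_0", "file_1"), "t1"),
        "t1",    new Page(List.of("file_2", "file_3"), "t2"),
        "t2",    new Page(List.of("file_4"), null));

    // Same shape as the article's do-while loop: request a page, remember
    // the token from the response, stop once the token is null.
    static long countAll() {
        long total = 0;
        String token = "first";
        do {
            Page page = PAGES.get(token);
            total += page.keys.size();
            token = page.nextToken;
        } while (token != null);
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Number of objects: " + countAll()); // prints 5
    }
}
```

The shape of the loop is identical to the real one; only the page source differs, which makes this pattern easy to unit-test before pointing it at a live bucket.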

By default, the maximum number of objects returned in a response is 1,000. A response may contain fewer keys, but never more. We can change this limit through the ListObjectsV2Request‘s maxKeys() method.

5. Use ListObjectsV2Iterable for pagination

We can let the AWS SDK handle pagination through the ListObjectsV2Iterable class and the listObjectsV2Paginator() method. This simplifies things because we don’t need to manage the pagination process manually, which makes the code more concise, readable, and easier to maintain.

The implemented code is as follows:

void listAllObjectsInBucketPaginated(int pageSize) {
    S3Client s3Client = S3Client.builder()
      .region(AWS_REGION)
      .build();

    ListObjectsV2Request listObjectsV2Request = ListObjectsV2Request.builder()
      .bucket(AWS_BUCKET)
      .maxKeys(pageSize) // Set the maxKeys parameter to control the page size
      .build();

    ListObjectsV2Iterable listObjectsV2Iterable = s3Client.listObjectsV2Paginator(listObjectsV2Request);
    long totalObjects = 0;

    for (ListObjectsV2Response page : listObjectsV2Iterable) {
        long retrievedPageSize = page.contents().stream()
          .peek(System.out::println)
          .reduce(0, (subtotal, element) -> subtotal + 1, Integer::sum);
        totalObjects += retrievedPageSize;
        System.out.println("Page size: " + retrievedPageSize);
    }
    System.out.println("Total objects in the bucket: " + totalObjects);

    s3Client.close();
}

This is the output we get when we call the method with a pageSize of 500:

S3Object(Key=file_0.txt, LastModified=2023-08-01T11:35:06Z, ETag="b9ece18c950afbfa6b0fdbfa4ff731d3", Size=1, StorageClass=STANDARD)
S3Object(Key=file_1.txt, LastModified=2023-08-01T11:35:07Z, ETag="97a6dd4c45b23db9c5d603ce161b8cab", Size=1, StorageClass=STANDARD)
S3Object(Key=file_10.txt, LastModified=2023-08-01T11:35:07Z, ETag="3406877694691ddd1dfb0aca54681407", Size=1, StorageClass=STANDARD)
[..]
S3Object(Key=file_494.txt, LastModified=2023-11-01T18:53:56Z, ETag="69b7a7308ee1b065aa308e63c44ae0f3", Size=1, StorageClass=STANDARD)
Page size: 500
S3Object(Key=file_495.txt, LastModified=2023-11-01T18:53:57Z, ETag="83acb6e67e50e31db6ed341dd2de1595", Size=1, StorageClass=STANDARD)
S3Object(Key=file_496.txt, LastModified=2023-11-01T18:53:57Z, ETag="3beb9cf0eab8cbf2215990b4a6bdc271", Size=1, StorageClass=STANDARD)
S3Object(Key=file_497.txt, LastModified=2023-11-01T18:53:57Z, ETag="69691c7bdcc3ce6d5d8a1361f22d04ac", Size=1, StorageClass=STANDARD)
[..]
S3Object(Key=file_944.txt, LastModified=2023-11-01T18:54:27Z, ETag="f623e75af30e62bbd73d6df5b50bb7b5", Size=1, StorageClass=STANDARD)
Page size: 500
S3Object(Key=file_945.txt, LastModified=2023-11-01T18:54:27Z, ETag="55a54008ad1ba589aa210d2629c1df41", Size=1, StorageClass=STANDARD)
S3Object(Key=file_946.txt, LastModified=2023-11-01T18:54:27Z, ETag="ade7a0dcf4ddc0673ed48b70a4a340d6", Size=1, StorageClass=STANDARD)
S3Object(Key=file_947.txt, LastModified=2023-11-01T18:54:27Z, ETag="0a476d83ef9cef4bce7f9025522be3b5", Size=1, StorageClass=STANDARD)
[..]
S3Object(Key=file_999.txt, LastModified=2023-11-01T18:54:31Z, ETag="5e732a1878be2342dbfeff5fe3ca5aa3", Size=1, StorageClass=STANDARD)
Page size: 60
Total objects in the bucket: 1060

The AWS SDK delays pagination by retrieving the next page as we iterate over the pages in the for loop. Only when we reach the end of the current page does it fetch the next page, meaning pages are loaded on demand rather than all at once.
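This on-demand behavior can be illustrated without the SDK. The sketch below (class name, page data, and the fetch counter are all hypothetical, not SDK internals) builds an Iterable that only "fetches" a page when the for-each loop actually reaches it, so breaking out early skips the remaining pages:

```java
import java.util.Iterator;
import java.util.List;

public class LazyPagesDemo {

    static int pagesFetched = 0; // counts simulated service calls

    // Hypothetical backing data, already split into pages of up to two keys.
    static final List<List<String>> DATA = List.of(
        List.of("file_0", "file_1"),
        List.of("file_2", "file_3"),
        List.of("file_4"));

    // An Iterable that, like ListObjectsV2Iterable, retrieves a page only
    // when the iteration actually reaches it.
    static Iterable<List<String>> pages() {
        return () -> new Iterator<List<String>>() {
            private int next = 0;
            public boolean hasNext() { return next < DATA.size(); }
            public List<String> next() {
                pagesFetched++; // a real SDK would call the service here
                return DATA.get(next++);
            }
        };
    }

    public static void main(String[] args) {
        // Stop after the first page: the remaining pages are never fetched.
        for (List<String> page : pages()) {
            System.out.println("Page size: " + page.size());
            break;
        }
        System.out.println("Pages fetched: " + pagesFetched); // prints 1
    }
}
```

Because pages are loaded lazily, stopping the iteration early (for example, once a particular key has been found) avoids paying for the remaining requests.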

6. List objects using prefixes

In some cases we only want to list objects with a common prefix, for example, all objects starting with backup.

To demonstrate this use case, we upload a file named backup1.txt to the bucket, create a folder named backup, and move five files into it. Together with the zero-byte backup/ folder marker, the bucket now contains seven objects whose keys start with backup.

This is what our bucket looks like, pictured below:

Next, we change the function to return only objects with a common prefix:

void listAllObjectsInBucketPaginatedWithPrefix(int pageSize, String prefix) {
    S3Client s3Client = S3Client.builder()
      .region(AWS_REGION)
      .build();
    ListObjectsV2Request listObjectsV2Request = ListObjectsV2Request.builder()
      .bucket(AWS_BUCKET)
      .maxKeys(pageSize) // Set the maxKeys parameter to control the page size
      .prefix(prefix) // Set the prefix
      .build();

    ListObjectsV2Iterable listObjectsV2Iterable = s3Client.listObjectsV2Paginator(listObjectsV2Request);
    long totalObjects = 0;

    for (ListObjectsV2Response page : listObjectsV2Iterable) {
        long retrievedPageSize = page.contents().stream().count();
        totalObjects += retrievedPageSize;
        System.out.println("Page size: " + retrievedPageSize);
    }
    System.out.println("Total objects in the bucket: " + totalObjects);

    s3Client.close();
}

We just need to call the prefix() method on the ListObjectsV2Request builder. If we call the function with the prefix parameter set to backup, it counts all objects in the bucket whose keys start with backup.

Both keys “backup1.txt” and “backup/file1.txt” will match:

listAllObjectsInBucketPaginatedWithPrefix(10, "backup");

This is what we get back:

S3Object(Key=backup/, LastModified=2023-11-01T17:47:33Z, ETag="d41d8cd98f00b204e9800998ecf8427e", Size=0, StorageClass=STANDARD)
S3Object(Key=backup/file_0.txt, LastModified=2023-11-01T17:48:13Z, ETag="a87ff679a2f3e71d9181a67b7542122c", Size=1, StorageClass=STANDARD)
S3Object(Key=backup/file_1.txt, LastModified=2023-11-01T17:48:13Z, ETag="9eecb7db59d16c80417c72d1e1f4fbf1", Size=1, StorageClass=STANDARD)
S3Object(Key=backup/file_2.txt, LastModified=2023-11-01T17:48:13Z, ETag="800618943025315f869e4e1f09471012", Size=1, StorageClass=STANDARD)
S3Object(Key=backup/file_3.txt, LastModified=2023-11-01T17:48:13Z, ETag="8666683506aacd900bbd5a74ac4edf68", Size=1, StorageClass=STANDARD)
S3Object(Key=backup/file_4.txt, LastModified=2023-11-01T17:49:05Z, ETag="f95b70fdc3088560732a5ac135644506", Size=1, StorageClass=STANDARD)
S3Object(Key=backup1.txt, LastModified=2023-05-04T13:29:23Z, ETag="ec631d7335abecd318f09f56515ed63c", Size=1, StorageClass=STANDARD)
Page size: 7
Total objects in the bucket: 7

If we don’t want to count objects that sit directly at the bucket root (such as backup1.txt), we need to add a slash after the prefix:

listAllObjectsInBucketPaginatedWithPrefix(10, "backup/");

Now we only get the objects in the backup/ folder:

S3Object(Key=backup/, LastModified=2023-11-01T17:47:33Z, ETag="d41d8cd98f00b204e9800998ecf8427e", Size=0, StorageClass=STANDARD)
S3Object(Key=backup/file_0.txt, LastModified=2023-11-01T17:48:13Z, ETag="a87ff679a2f3e71d9181a67b7542122c", Size=1, StorageClass=STANDARD)
S3Object(Key=backup/file_1.txt, LastModified=2023-11-01T17:48:13Z, ETag="9eecb7db59d16c80417c72d1e1f4fbf1", Size=1, StorageClass=STANDARD)
S3Object(Key=backup/file_2.txt, LastModified=2023-11-01T17:48:13Z, ETag="800618943025315f869e4e1f09471012", Size=1, StorageClass=STANDARD)
S3Object(Key=backup/file_3.txt, LastModified=2023-11-01T17:48:13Z, ETag="8666683506aacd900bbd5a74ac4edf68", Size=1, StorageClass=STANDARD)
S3Object(Key=backup/file_4.txt, LastModified=2023-11-01T17:49:05Z, ETag="f95b70fdc3088560732a5ac135644506", Size=1, StorageClass=STANDARD)
Page size: 6
Total objects in the bucket: 6
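Under the hood, the prefix filter is a plain string match against the beginning of the full object key; S3 keys have no real folder hierarchy. We can verify the two counts above locally with a small sketch (the class and method names are illustrative, and the key list mirrors the example bucket):

```java
import java.util.List;

public class PrefixDemo {

    // S3's prefix filter behaves like String.startsWith on the full key.
    static long countWithPrefix(List<String> keys, String prefix) {
        return keys.stream().filter(k -> k.startsWith(prefix)).count();
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
            "backup/", "backup/file_0.txt", "backup/file_1.txt",
            "backup/file_2.txt", "backup/file_3.txt", "backup/file_4.txt",
            "backup1.txt", "file_0.txt");

        System.out.println(countWithPrefix(keys, "backup"));  // prints 7
        System.out.println(countWithPrefix(keys, "backup/")); // prints 6
    }
}
```

This also explains why the trailing slash matters: "backup" matches both the folder contents and backup1.txt, while "backup/" restricts the match to keys inside the folder.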

7. Summary

This article describes different ways of listing objects in an AWS S3 bucket: a single listObjectsV2 call, explicit pagination with continuation tokens, pagination with ListObjectsV2Iterable, and filtering by prefix. With these methods, you can manage objects in S3 buckets more effectively and retrieve data more efficiently. The article also covers the preparation and background knowledge needed to apply them. I hope this information is helpful in your work with AWS S3 buckets.

[Author of this article] bluetata
[Original link] https://bluetata.blog.csdn.net/article/details/134174962
[Last updated] 11/01/2023 2:31
[Copyright Statement] If you see this line on a non-CSDN website, a web crawler may have grabbed this article before it was fully published. The content may be incomplete; please view the original text at the link above.