Hot questions on using Amazon S3 TransferManager


Question:

When I try to download all the files under a specific folder from S3 using a key prefix, only the directory structure is downloaded, not the files inside it.

Below is the code:

    TransferManager xfer_mgr = TransferManagerBuilder.standard().build();
    File a = new File("./");
    try {
        MultipleFileDownload xfer = xfer_mgr.downloadDirectory(
                bucketName, "folder3", a);
    } catch (AmazonServiceException e) {
        System.err.println(e.getErrorMessage());
        System.exit(1);
    }
    System.out.println("done...............");
    xfer_mgr.shutdownNow();

Am I missing anything in the code, or do any permissions need to be added? Any suggestions would be really helpful.


Answer:

Solved it. TransferManager creates the folder structure first and then downloads the files into it asynchronously, so shutting down immediately leaves only the empty folders.

So the solution is: make the MultipleFileDownload xfer wait, via waitForCompletion().

Adding the line xfer.waitForCompletion() before shutting down the TransferManager solved the problem.
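A minimal sketch of the fixed program, assuming a placeholder bucket name "my-bucket" and the same "folder3" prefix and current-directory target as in the question. The key point is that downloadDirectory() only schedules the transfers, so the caller must block on waitForCompletion() before shutting the manager down:

```java
import java.io.File;
import com.amazonaws.AmazonServiceException;
import com.amazonaws.services.s3.transfer.MultipleFileDownload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class DownloadFolder {
    public static void main(String[] args) {
        TransferManager xferMgr = TransferManagerBuilder.standard().build();
        File target = new File("./");
        try {
            // Schedules the downloads asynchronously...
            MultipleFileDownload xfer =
                    xferMgr.downloadDirectory("my-bucket", "folder3", target);
            // ...so block here until every file is actually on disk.
            xfer.waitForCompletion();
        } catch (AmazonServiceException | InterruptedException e) {
            System.err.println(e.getMessage());
        } finally {
            xferMgr.shutdownNow();
        }
        System.out.println("done");
    }
}
```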

Question:

Let's say that I have an S3 bucket named bucketSample.

And I have different folders like abc, def and xyz.

Now I have multiple files with the prefix hij_ in all the above-mentioned folders.

I want to download all the files having the prefix hij_ (for example, hij_qwe.txt, hij_rty.pdf, etc.).

I have gone through various ways but for GetObject I have to provide specific object names and I only know the prefix.

And using TransferManager I can download all files of folder abc but not the files with the specific prefix only.

So is there any way that I can only download all the files with prefix hij_?


Answer:

public void getFiles(final String bucketName, final Set<String> keys, final Set<String> prefixes) {
    try {
        // Lists the objects in the bucket, one page at a time
        ObjectListing objectListing = s3Client.listObjects(bucketName);
        while (true) {
            for (S3ObjectSummary summary : objectListing.getObjectSummaries()) {
                for (String key : keys) {
                    for (String prefix : prefixes) {
                        if (summary.getKey().startsWith(key + "/" + prefix)) {
                            // HERE YOU CAN GET THE FULL KEY NAME AND HENCE DOWNLOAD IT
                            // INTO A NEW FILE USING THE TRANSFER MANAGER
                        }
                    }
                }
            }
            if (objectListing.isTruncated()) {
                objectListing = s3Client.listNextBatchOfObjects(objectListing);
            } else {
                break;
            }
        }
    } catch (AmazonServiceException e) {
        e.printStackTrace(); // don't swallow the error silently
    }
}

Read about the AWS S3 directory structure here: How does AWS S3 store files? (directory structure)

Therefore, for your use case, key + "/" + prefix acts as the prefix of the objects stored in the S3 bucket. By comparing this prefix with all the objects in the bucket, you can get the full key names.
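A self-contained sketch of this approach, using the question's bucketSample, folders abc/def/xyz and prefix hij_, and a hypothetical local downloads/ directory. The matching step is factored into a pure helper, matchesAnyPrefix(), and each matching key is handed to TransferManager:

```java
import java.io.File;
import java.util.Set;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class PrefixDownloader {
    // True when the key lives in one of the folders and its file name
    // starts with one of the prefixes, i.e. it begins with "<folder>/<prefix>".
    static boolean matchesAnyPrefix(String key, Set<String> folders, Set<String> prefixes) {
        for (String folder : folders) {
            for (String prefix : prefixes) {
                if (key.startsWith(folder + "/" + prefix)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        TransferManager tm = TransferManagerBuilder.standard().withS3Client(s3).build();
        ObjectListing listing = s3.listObjects("bucketSample");
        while (true) {
            for (S3ObjectSummary summary : listing.getObjectSummaries()) {
                if (matchesAnyPrefix(summary.getKey(),
                        Set.of("abc", "def", "xyz"), Set.of("hij_"))) {
                    // Download each matching object under downloads/, keeping its key path
                    tm.download("bucketSample", summary.getKey(),
                            new File("downloads/" + summary.getKey()));
                }
            }
            if (!listing.isTruncated()) break;
            listing = s3.listNextBatchOfObjects(listing); // next page of results
        }
        tm.shutdownNow();
    }
}
```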

Question:

I am currently downloading the S3 folders whose names match prefix_ to a local TMP_FOLDER via the following code snippet:

    TransferManager transferManager = TransferManagerBuilder.standard().withS3Client(s3Instance).build();
    MultipleFileDownload multipleFileDownload = transferManager.downloadDirectory(S3_BUCKET, "prefix_", TMP_FOLDER);
    multipleFileDownload.waitForCompletion();
    transferManager.shutdownNow();

The problem is that my machine runs out of disk space, because the folders I am trying to download are huge (they contain many tiny text files, but in total each folder is up to several GiB).

So, my question is: is there a way to limit TransferManager or MultipleFileDownload to download only up to a specific amount of data?

Ideally it should download the tiny files one by one, checking whether the next file would exceed the predefined limit and stopping if it would (this avoids ending up with an incomplete file).

(Of course, I am assuming that no single text file is bigger than the established limit.)


Answer:

You can use getProgress().getBytesTransferred() to track how many bytes have been transferred to or from S3. When you reach the limit, you can pause or abort the transfer and resume it later.
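An alternative sketch along the lines the question itself suggests, assuming the question's S3_BUCKET, prefix_ and TMP_FOLDER names and a hypothetical DISK_LIMIT_BYTES constant. Instead of pausing a MultipleFileDownload mid-flight, it downloads the small files one at a time, using each object's listed size to decide (via the pure helper fitsWithinLimit()) whether the next file still fits, so no file is ever left incomplete:

```java
import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.S3ObjectSummary;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class LimitedDownloader {
    static final long DISK_LIMIT_BYTES = 2L * 1024 * 1024 * 1024; // 2 GiB, hypothetical

    // Pure decision step: true when downloading the next file
    // keeps the running total within the limit.
    static boolean fitsWithinLimit(long bytesSoFar, long nextFileSize, long limit) {
        return bytesSoFar + nextFileSize <= limit;
    }

    public static void main(String[] args) throws InterruptedException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        TransferManager tm = TransferManagerBuilder.standard().withS3Client(s3).build();
        long downloaded = 0;
        for (S3ObjectSummary summary :
                s3.listObjects("S3_BUCKET", "prefix_").getObjectSummaries()) {
            if (!fitsWithinLimit(downloaded, summary.getSize(), DISK_LIMIT_BYTES)) {
                break; // the next file would not fit; stop cleanly
            }
            // Download one file at a time and wait for it, so the
            // byte count is exact before the next decision.
            tm.download("S3_BUCKET", summary.getKey(),
                    new File("TMP_FOLDER/" + summary.getKey())).waitForCompletion();
            downloaded += summary.getSize();
        }
        tm.shutdownNow();
    }
}
```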

Question:

I am trying to implement a pause and resume mechanism with the Amazon S3 SDK.

I am using TransferManager to start the download and to resume it, as in the example in the link.

There is a problem with the downloadInstance.pause() method.

It does not pause the download the way uploadInstance.tryPause(true) does.

I have attached a progress tracker to the download instance as follows:

TransferProgress progress = myDownload.getProgress();

I have tried to pause the download as follows:

PersistableDownload persistableDownload = myDownload.pause();

After this point I checked the progress instance in the debugger and saw that it kept changing, so the download is not being paused as intended.

To resume the download from the persistableDownload instance, I have tried the following:

Download resumedDownload = transferManager.resumeDownload(persistableDownload);

I have also attached a different TransferProgress instance to the resumedDownload instance, and I have seen that it starts from the beginning instead of resuming.


Answer:

I have found the root cause of the problem. TransferManager performs parallel (multipart) downloads by default, and a download that is fetched in parallel parts cannot be resumed after a pause.
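A possible workaround, assuming a recent AWS SDK for Java v1 that exposes withDisableParallelDownloads on TransferManagerBuilder, plus hypothetical "my-bucket"/"big-object.bin" names: with parallel downloads turned off, pause() can produce a PersistableDownload that resumeDownload() continues from where it stopped rather than restarting:

```java
import java.io.File;
import com.amazonaws.services.s3.transfer.Download;
import com.amazonaws.services.s3.transfer.PersistableDownload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class PausableDownload {
    public static void main(String[] args) throws Exception {
        TransferManager tm = TransferManagerBuilder.standard()
                .withDisableParallelDownloads(true) // serial download => pausable
                .build();
        Download download = tm.download("my-bucket", "big-object.bin",
                new File("big-object.bin"));
        Thread.sleep(5_000); // let some bytes transfer (illustration only)

        // Capture the resumable state; this stops the in-flight download.
        PersistableDownload state = download.pause();

        // ... later (the state can also be serialized and reloaded) ...
        Download resumed = tm.resumeDownload(state);
        resumed.waitForCompletion();
        tm.shutdownNow();
    }
}
```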