Hot questions for using Amazon S3 with the AWS Java SDK


Question:

I know that with version 1.x of the SDK it's as simple as the following, per the docs:

java.util.Date expiration = new java.util.Date();
long msec = expiration.getTime();
msec += 1000 * 60 * 60; // Add 1 hour.
expiration.setTime(msec);

GeneratePresignedUrlRequest generatePresignedUrlRequest = new GeneratePresignedUrlRequest(bucketName, objectKey);
generatePresignedUrlRequest.setMethod(HttpMethod.PUT); 
generatePresignedUrlRequest.setExpiration(expiration);

URL s = s3client.generatePresignedUrl(generatePresignedUrlRequest); 

However, looking at the 2.0 docs, I can't find anything close to GeneratePresignedUrlRequest.

Hopefully there is another simple pattern for this?


Answer:

This is now supported for S3's GetObject. See here.

// Create an S3Presigner using the default region and credentials.
// This is usually done at application startup, because creating a presigner can be expensive.
S3Presigner presigner = S3Presigner.create();

// Create a GetObjectRequest to be pre-signed
GetObjectRequest getObjectRequest =
        GetObjectRequest.builder()
                        .bucket("my-bucket")
                        .key("my-key")
                        .build();

// Create a GetObjectPresignRequest to specify the signature duration
GetObjectPresignRequest getObjectPresignRequest =
        GetObjectPresignRequest.builder()
                               .signatureDuration(Duration.ofMinutes(10))
                               .getObjectRequest(getObjectRequest)
                               .build();

// Generate the presigned request
PresignedGetObjectRequest presignedGetObjectRequest =
        presigner.presignGetObject(getObjectPresignRequest);

// Log the presigned URL, for example.
System.out.println("Presigned URL: " + presignedGetObjectRequest.url());

// It is recommended to close the S3Presigner when it is done being used, because some credential
// providers (e.g. if your AWS profile is configured to assume an STS role) require system resources
// that need to be freed. If you are using one S3Presigner per application (as recommended), this
// usually is not needed.
presigner.close();

This is also now supported for S3's PutObject. Example here.

S3Presigner presigner = S3Presigner.create();
PresignedPutObjectRequest presignedRequest =
    presigner.presignPutObject(r -> r.signatureDuration(Duration.ofMinutes(5))
                                        .putObjectRequest(por -> por.bucket(testBucket).key(objectKey)));

System.out.println("Pre-signed URL to upload a file to: " + 
                   presignedRequest.url());
System.out.println("Which HTTP method needs to be used when uploading a file: " + 
                   presignedRequest.httpRequest().method());
System.out.println("Which headers need to be sent with the upload: " + 
                   presignedRequest.signedHeaders());

Here's an example of uploading to S3 using a PresignedPutObjectRequest:

PresignedPutObjectRequest presignedRequest = ...;
SdkHttpClient httpClient = ApacheHttpClient.builder().build();

ContentStreamProvider contentToUpload = () -> new StringInputStream("Hello, World!");
HttpExecuteRequest uploadRequest = HttpExecuteRequest.builder()
                                                     .request(presignedRequest.httpRequest())
                                                     .contentStreamProvider(contentToUpload)
                                                     .build();

HttpExecuteResponse response = httpClient.prepareRequest(uploadRequest).call();
Validate.isTrue(response.httpResponse().isSuccessful());

Question:

I am trying to use the aws-sdk-java AmazonS3 client to talk to a MinIO storage. From the CLI I am able to do:

aws --profile=minioplay  --endpoint-url https://play.minio.io:9000 s3 cp logback.xml s3://miniohstest-jixusroqeb --debug

thus using a non-default profile and a custom endpoint. I am not sure how to do this (or whether I even can) from the Java SDK. I roughly translated the above awscli command to this Scala snippet:

val cred = ...
val endpoint = "https://play.minio.io:9000"
val client = AmazonS3ClientBuilder
      .standard()
      .withCredentials(cred)
      .withEndpointConfiguration(
        new EndpointConfiguration(
          endpoint,
          AwsHostNameUtils.parseRegion(endpoint, AmazonS3Client.S3_SERVICE_NAME)
        )
      )
      .build()

Using the above client I am only able to make very simple requests, such as:

client.listBuckets().asScala.foreach(println(_))

which works. But when I try something more advanced, such as:

val listRequest = new ListObjectsRequest()
      .withBucketName("miniohstest-jixusroqeb")
      //.withPrefix(r.getURI.getPath)
      //.withDelimiter(delimiter)

val res = client.listObjects(listRequest)
res.getObjectSummaries.forEach(x => println(x.getKey))

it throws the following exception:

Exception in thread "main" com.amazonaws.SdkClientException: Unable to execute HTTP request: miniohstest-jixusroqeb.play.minio.io
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleRetryableException(AmazonHttpClient.java:1114)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1064)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)

What am I doing wrong?


Answer:

I resolved this by setting withPathStyleAccessEnabled(true).
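
For reference, a minimal sketch (in Java, SDK v1) of the builder from the question with path-style access enabled; the "us-east-1" signing region is just a placeholder, since MinIO does not care about it:

AmazonS3 client = AmazonS3ClientBuilder
        .standard()
        .withCredentials(cred)
        // Use path-style requests (https://play.minio.io:9000/bucket-name/key) instead of
        // virtual-hosted-style (https://bucket-name.play.minio.io:9000/key), which DNS cannot resolve here.
        .withPathStyleAccessEnabled(true)
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(endpoint, "us-east-1"))
        .build();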

Question:

My goal is to fetch an object (image) from S3, change the metadata of the file, and replace it with a new file that has the changed metadata.

For changing the metadata I am using the Commons Imaging library. I have written the sample below, which works as expected but does not deal with S3.

File newFile = new File("newImage2.jpg");
OutputStream os = new BufferedOutputStream(new FileOutputStream(newFile));
InputStream isNew = new BufferedInputStream(new FileInputStream(newFile));
InputStream is = new BufferedInputStream(new FileInputStream(new File("newImage.jpg")));
try {
    String xmpXml = "<x:xmpmeta>" +
        "\n<Lifeshare>" +
        "\n\t<Date>" + "some date" + "</Date>" +
        "\n\t<Latitude>" + "somelat" + "</Latitude>" +
        "\n\t<Longitude>" + "somelong" + "</Longitude>" +
        "\n\t<Altitude>" + "somealt" + "</Altitude>" +
        "\n\t<Z>" + "someZ" + "</Z>" +
        "\n\t<X>" + "someX" + "</X>" +
        "\n\t<Y>" + "Some y" + "</Y>" +
        "\n</Lifeshare>" +
        "\n</x:xmpmeta>";
    JpegXmpRewriter rewriter = new JpegXmpRewriter();
    rewriter.updateXmpXml(is, os, xmpXml);
    String newXmpXml = Imaging.getXmpXml(isNew, "newImage2.jpg");
    System.out.println(newXmpXml);
} finally {
    is.close();
    os.close();
}

The above works, since I can run exiftool on newImage2.jpg and see the metadata properties that were set:

$ exiftool newImage2.jpg | grep "Lifeshare"
Lifeshare Date                  : some date
Lifeshare Latitude              : somelat
Lifeshare Longitude             : somelong
Lifeshare Altitude              : somealt
Lifeshare Z                     : someZ
Lifeshare X                     : someX
Lifeshare Y                     : Some y

Question

How can I do the same with an object on S3 using the AWS S3 SDK? The updateXmpXml method above requires an OutputStream as its second parameter, but I don't see any OutputStream class in the AWS SDK: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/allclasses-noframe.html


Answer:

http://docs.aws.amazon.com/AmazonS3/latest/dev/RetrievingObjectUsingJava.html

Using Apache Commons IOUtils:

OutputStream out = new ByteArrayOutputStream();

AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));
InputStream in = object.getObjectContent();
IOUtils.copy(in, out);

in.close();
out.close();
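
To complete the round trip, here is a minimal sketch (SDK v1, untested) that downloads the object, rewrites the XMP into an in-memory buffer, and uploads the result back to S3; bucketName, key, and xmpXml are assumed to be the values from the question:

AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());
S3Object object = s3Client.getObject(new GetObjectRequest(bucketName, key));

// Rewrite the XMP into memory rather than into a local file.
ByteArrayOutputStream rewritten = new ByteArrayOutputStream();
InputStream in = object.getObjectContent();
try {
    new JpegXmpRewriter().updateXmpXml(in, rewritten, xmpXml);
} finally {
    in.close();
}

// Upload the modified image back under the same key.
byte[] bytes = rewritten.toByteArray();
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(bytes.length);
metadata.setContentType("image/jpeg");
s3Client.putObject(new PutObjectRequest(bucketName, key, new ByteArrayInputStream(bytes), metadata));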

Question:

I'm using the AWS Java SDK v1 and I'm trying to create an AmazonS3 instance using the builder. I'm following the guide in the AWS documentation here and passing explicit credentials using a BasicAWSCredentials object. When the builder builds, I get a null client back in s3Client, the variable where I'm storing the instance, and I'm not sure why. There are no errors thrown or caught by the try-catch blocks. Here is how I am making the connection:

    BasicAWSCredentials awsCreds = null;
    try {
        awsCreds = new BasicAWSCredentials(settings.getAccessKey(), settings.getSecretKey());
    } catch (Exception e) {
        e.printStackTrace();
        throw new IOException(e.getMessage());
    }

    try {
        AmazonS3 s3Client = AmazonS3Client.builder()
                .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                .withRegion(Regions.US_EAST_1)
                .build();
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println(s3Client);

I have stepped through it with the Eclipse debugger and it looks like the build() call is actually returning a valid AmazonS3Client instance, but then before this reference is returned back to the s3Client variable, there's a step in which checkPackageAccess() is called, and this return value is what is being returned back for some reason. On further inspection I found that checkPackageAccess() is a method in the java.lang.ClassLoader that's being called by the JVM, and it has a void return type. In my application, it looks like there's no default SecurityManager set, so there's no other function called inside the ClassLoader's checkPackageAccess() method.

I'm a bit confused by this behavior. My JRE is 1.8. So far as I understand, the ClassLoader is always called when looking for a class definition, but why would it be called after a class has already been instantiated, and why is the original return value of the build() function not coming back to the caller context? In this case, the debugger shows a valid AmazonS3Client object exists and is returned by the call to build() even before the checkPackageAccess call.

I've made an AmazonRedshift instance with almost identical code in the same project previously, and that worked without a hitch, so I'm quite positive the problem is with the AmazonS3 class or the builder, but I'm not sure where and I'm not seeing any errors or strange prints.

Code used for making a similar connection to Amazon Redshift:

    BasicAWSCredentials awsCreds = null;
    try {
        awsCreds = new BasicAWSCredentials(ACCESS_KEY, SECRET_KEY);
    } catch (Exception e) {
        e.printStackTrace();
        throw new IOException(e.getMessage());
    }
    try {
        client = AmazonRedshiftClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                .withRegion(Regions.US_EAST_1)
                .build();
    } catch (Exception e) {
        e.printStackTrace();
    }
    System.out.println(client);

Has anyone debugged this kind of issue before? What can I do to resolve this and get back a valid instance?


Answer:

The problem ended up being far more fundamental. I feel almost embarrassed to write this. I had to get a second pair of eyes on my code to find it, but I was redeclaring the s3Client within the try block.

private AmazonS3 s3Client = null;

...

BasicAWSCredentials awsCreds = null;
try {
    awsCreds = new BasicAWSCredentials(settings.getAccessKey(), settings.getSecretKey());
} catch (Exception e) {
    e.printStackTrace();
    throw new IOException(e.getMessage());
}

try {
    AmazonS3 s3Client = AmazonS3Client.builder()
            .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
            .withRegion(Regions.US_EAST_1)
            .build();
} catch (Exception e) {
    e.printStackTrace();
}
System.out.println(s3Client);

By the time it reached the code that actually used the AmazonS3 object, the local variable holding a reference to it had already passed out of scope. Can't believe I didn't catch this.
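
For completeness, the fix is simply to assign to the existing field instead of declaring a new local variable inside the try block:

try {
    s3Client = AmazonS3Client.builder()
            .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
            .withRegion(Regions.US_EAST_1)
            .build();
} catch (Exception e) {
    e.printStackTrace();
}
System.out.println(s3Client);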

Question:

Is there any way to specify to only include the latest version of each object in the ListVersionsRequest? I need the version value, so a simple AmazonS3Client.listObjects(...) will not suffice because S3ObjectSummary has no version information.

I am creating a utility that pings S3 for all objects in a versioned bucket and compares the latest version value to what the utility is already tracking. The only solution I can think of right now is to call AmazonS3Client.listVersions(...), iterate through the S3VersionSummary list, handle the first (most recent) version of each key, then manually skip all older versions of that object until it gets to the next key. Is this the best solution?

Thanks


Answer:

I only see two options:

  1. Do what you described and list all versions of the data that you have. You'll get back the versions grouped by key, newest first within each key, so you need to know which entries to keep. This can be done by iterating through the version summaries and keeping only those for which isLatest() returns true (see the sketch after this list).

  2. You can list the objects and then request each object's metadata, which will contain the version ID.
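
A minimal sketch (SDK v1) of option 1, keeping only the summaries flagged as the latest version of their key and following the pagination markers; s3Client and bucketName are assumed from your utility:

VersionListing listing = s3Client.listVersions(new ListVersionsRequest().withBucketName(bucketName));
while (true) {
    for (S3VersionSummary summary : listing.getVersionSummaries()) {
        // Versions of each key are returned newest first; isLatest() marks the current one.
        if (summary.isLatest()) {
            System.out.println(summary.getKey() + " -> " + summary.getVersionId());
        }
    }
    if (!listing.isTruncated()) {
        break;
    }
    listing = s3Client.listNextBatchOfVersions(listing);
}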

Question:

From a Web API, I receive the following information about an Amazon S3 Bucket I am allowed to upload a File to:

s3_bucket (the Bucket name)
s3_key (the Bucket key)
s3_policy (the Bucket policy)
s3_signature (the Bucket signature)

Because I am not the owner of the Bucket, I am provided with the s3_policy and s3_signature values, which, according to the AWS Upload Examples, can be used to authenticate a Put request to a Bucket.

However, in AWS's official Java SDK I'm using, I can't seem to find a way to perform this authentication. My code:

PutObjectRequest putObjectRequest = new PutObjectRequest(s3_bucket, s3_key, fileToUpload);
s3Client.putObject(putObjectRequest);

I do understand that I need to use the s3_signature and s3_policy I'm given at some point, but how do I do so to authenticate my PutObjectRequest?

Thanks in advance,

CrushedPixel


Answer:

I don't think you're going to use the SDK for this operation. It's possible that the SDK will do what you need at this step, but it seems unlikely, since the SDK would typically take the access key and secret as arguments, and generate the signature, rather than accepting the signature as an argument.

What you describe is an upload policy document, not a bucket policy. That policy, the signature, and your file, would all go into an HTTP POST (not PUT) request of type multipart/form-data -- a form post -- as shown in the documentation page you cited. All you should need is an HTTP user agent.

You'd also need to craft the rest of the form, including all of the other fields in the policy, which you should be able to access by base64-decoding it.

The form also requires the AWSAccessKeyId, which looks something like "AKIAWTFBBQEXAMPLE", which is -- maybe -- what you are calling the "s3_key," although in S3 terminology, the "(object) key" refers to the path and filename.

This seems like an odd set of parameters to receive from a web API, particularly if they are expecting you to generate the form yourself.
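
As a rough illustration only (not SDK code), a browser-style form POST can be assembled with Apache HttpClient plus the httpmime module (MultipartEntityBuilder). The field names follow the S3 POST documentation; accessKeyId and fileToUpload are assumptions about what your web API and application provide, and which of your parameters maps to which form field depends on what the API actually returns:

HttpPost post = new HttpPost("https://" + s3_bucket + ".s3.amazonaws.com/");
post.setEntity(MultipartEntityBuilder.create()
        .addTextBody("key", s3_key)                 // the object key (path/filename) inside the bucket
        .addTextBody("AWSAccessKeyId", accessKeyId) // the access key id of whoever signed the policy
        .addTextBody("policy", s3_policy)           // the base64-encoded policy document
        .addTextBody("signature", s3_signature)     // the signature over the policy
        .addBinaryBody("file", fileToUpload)        // the file part must come last
        .build());

try (CloseableHttpClient http = HttpClients.createDefault()) {
    HttpResponse response = http.execute(post);
    // S3 replies 204 No Content on success unless success_action_status says otherwise.
    System.out.println(response.getStatusLine());
}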

Question:

I would like to think the answer is a big YES, but I prefer to ask rather than assume. Do you know whether the AWS SDK for Java always uses a secure channel when I download/upload files from/to S3 buckets? Or is this something that should be configured when I write the code, or on the S3 buckets themselves?


Answer:

Amazon S3 endpoints support both HTTP and HTTPS (http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region).

When you use the Java SDK you create an AmazonS3Client, and if you do not specifically ask for the HTTP protocol it will use HTTPS by default (see http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Client.html#setEndpoint(java.lang.String)):

Callers can pass in just the endpoint (ex: "ec2.amazonaws.com") or a full URL, including the protocol (ex: "https://ec2.amazonaws.com"). If the protocol is not specified here, the default protocol from this client's ClientConfiguration will be used, which by default is HTTPS.
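
If you want to make the choice explicit rather than rely on the default, a minimal sketch (SDK v1; Protocol.HTTPS is already the default, shown here only to document the decision in code):

ClientConfiguration config = new ClientConfiguration().withProtocol(Protocol.HTTPS);

AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withClientConfiguration(config)
        .withRegion(Regions.US_EAST_1)
        .build();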

Question:

I am looking at the usage example provided in the AWS SDK docs for TransferManager, in particular the following code:

TransferManager tx = new TransferManager(credentialProviderChain.getCredentials());
Upload myUpload = tx.upload(myBucket, myFile.getName(), myFile);

// Transfers also allow you to set a ProgressListener to receive
// asynchronous notifications about your transfer's progress.
myUpload.addProgressListener(myProgressListener);

and I am wondering whether we don't have a race condition here. AFAIU, TransferManager works asynchronously; it may start uploading the file straight away after the Upload object is created, even before we add the listener. So if we use the snippet as provided in the docs, it seems possible that we won't receive all notifications. I've looked briefly into addProgressListener and I don't see that past events are replayed when a new listener is attached. Am I wrong? Am I missing something?


Answer:

If you need to get ALL events, I imagine this can be achieved by using a different upload method that takes a ProgressListener as a parameter. Of course, using this method will require encapsulating your bucket name, key, and file into an instance of PutObjectRequest.
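
A minimal sketch of that approach, reusing the names from the question and assuming myProgressListener is a com.amazonaws.event.ProgressListener; because the listener travels with the request, no progress events can be missed between starting the upload and attaching the listener:

PutObjectRequest request = new PutObjectRequest(myBucket, myFile.getName(), myFile);
// Attach the listener to the request itself, before the transfer is started.
request.setGeneralProgressListener(myProgressListener);

Upload myUpload = tx.upload(request);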

Question:

I'm pushing files to an S3 bucket, and the bucket owner cannot see the files even though my role is granted access to write to the bucket. I'm not sure why this is, and I am wondering whether I need to programmatically force bucket-owner-full-control.

Simple Code Blob:
      ObjectMetadata metadata = constructMetadata();
      PutObjectRequest request = new PutObjectRequest(bucketName, filename, data, metadata);
      s3Supplier.get().putObject(request);

It uploads successfully, but the object is not visible to the bucket owner. Any reason why this would be?


Answer:

S3 objects are owned by the AWS account that writes them. By default, only the object owner has full privileges on the object; it is not accessible to others, including the bucket owner, unless access is explicitly granted (via an ACL).

This example from the AWS documentation explains how to grant access to the bucket owner for such objects.
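
If you control the writer, the simplest option is usually to attach the bucket-owner-full-control canned ACL to the upload. A minimal sketch based on the code in the question:

ObjectMetadata metadata = constructMetadata();
PutObjectRequest request = new PutObjectRequest(bucketName, filename, data, metadata)
        // Grant the bucket owner full control over the new object.
        .withCannedAcl(CannedAccessControlList.BucketOwnerFullControl);
s3Supplier.get().putObject(request);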

Question:

How can I get config values using the Java SDK, equivalent to a CLI command like this: aws configure get s3.multipart_chunksize --profile profile1?

I don't see anything in the docs. I am using the S3 API, AmazonS3.

I am looking to store certain information from the config as object metadata during upload (aws s3 cp), so that if my configuration changes down the road, I know what config was used for a specific object.


Answer:

aws configure get pulls values from your local AWS configuration file. I would recommend storing the values that you need in environment variables instead. Then you can read them in Java via System.getenv and still use them in your CLI commands by referencing the ENV vars instead of pulling values from the config file. This will make your code more portable, since it won't depend on having an AWS configuration file packaged with it.

If you need to access the values both from the configuration file and from code, you could write a quick script in e.g. bash to copy the values from the config and export them as ENV vars. Another option would be Java system properties, if you want to set the values at run time.
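
A minimal sketch of the environment-variable approach; the variable name and the user-metadata key are assumptions for illustration, not anything the SDK defines:

// Read the chunk size that was exported as an environment variable, e.g. S3_MULTIPART_CHUNKSIZE=8MB.
String chunkSize = System.getenv("S3_MULTIPART_CHUNKSIZE");

ObjectMetadata metadata = new ObjectMetadata();
if (chunkSize != null) {
    // Record it as user metadata so the value travels with the uploaded object.
    metadata.addUserMetadata("multipart-chunksize", chunkSize);
}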

Question:

To get an S3 client object I am using the code below.

BasicAWSCredentials creds = new BasicAWSCredentials(key, S3secretKey); 
AmazonS3 s3Client =AmazonS3ClientBuilder.standard().withCredentials(new AWSStaticCredentialsProvider(creds)).build();

I am getting the error below:

Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.


Answer:

I had to change to:

AmazonS3 client = AmazonS3ClientBuilder.standard()
                         .withRegion(Regions.US_EAST_1)
                         .withForceGlobalBucketAccessEnabled(true)
                         .build();

to emulate the "old" way (i.e. new AmazonS3Client() )

Question:

I know there is a doesObjectExist method to check if an object exists in a specified bucket, but how do I check if an object with a specific version exists in an S3 bucket?

I want to call doesObjectExist(bucketName, objectName, s3Version).

Is there any way I can do this, or do I need to call listVersions first and check if the version exists using the VersionListing? This approach seems a lot more verbose.


Answer:

There is no one-step check in the current API. You could try something like:

s3Client.getObjectMetadata(
  new GetObjectMetadataRequest(bucketName, key, versionId)
)

But then I don't see any reliable way to know when such an object doesn't exist (because there is no special "object doesn't exist" exception for this case). So after it fails you should check that the object exists with doesObjectExist. Or the other way round: check that it exists, then query the metadata with the version; if the object exists but the metadata request fails, that version of the object doesn't exist.
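
A minimal sketch of the metadata probe, treating a 404 from getObjectMetadata as "this version does not exist"; that the service reports a missing version as a 404 on this call is an assumption worth verifying against your bucket:

boolean versionExists;
try {
    s3Client.getObjectMetadata(new GetObjectMetadataRequest(bucketName, key, versionId));
    versionExists = true;
} catch (AmazonS3Exception e) {
    if (e.getStatusCode() == 404) {
        versionExists = false;   // no object, or no such version of it
    } else {
        throw e;                 // some other failure (permissions, throttling, ...)
    }
}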

Question:

I want to set object-specific retention, but I have not been successful with the Java code below.

When I try to lock the object I get the following error: Bucket is missing ObjectLockConfiguration

Note: I have already given the user full S3 access.

    ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(bytes.length);
    metadata.setContentType(contentType);
    metadata.setExpirationTime(DateTime.now().toDate());
    metadata.setHeader("x-amz-bucket-object-lock-enabled", true);
    //metadata.setHeader("expires", expirationTime);
    //metadata.setHttpExpiresDate(expirationTime);
    ObjectLockConfiguration oc = new ObjectLockConfiguration();

    PutObjectRequest putRequest = new PutObjectRequest(targetBucketName, objectName, baInputStream, metadata);

    putRequest.setObjectLockRetainUntilDate(DateTime.now().plusDays(2).toDate());
    s3client.putObject(putRequest);

Answer:

Setting the retain-until date and lock mode as request headers worked, together with the Content-MD5 header that Object Lock requires. Note that these headers are only accepted on a bucket created with Object Lock enabled, which is what the "Bucket is missing ObjectLockConfiguration" error is complaining about.

    ObjectMetadata metadata = new ObjectMetadata();
    System.out.println("size:" + bytes.length);
    metadata.setContentLength(bytes.length);
    metadata.setContentType(contentType);
    metadata.setExpirationTime(DateTime.now().toDate());
    // The retain-until date must be an ISO-8601 timestamp, e.g. "2025-05-10T00:00:00.000Z".
    String retainUntilDate = "2025-05-10";
    metadata.setHeader("x-amz-object-lock-retain-until-date", retainUntilDate + "T00:00:00.000Z");
    metadata.setHeader("x-amz-object-lock-mode", "COMPLIANCE");
    // Object Lock requests must carry a Content-MD5 header.
    byte[] md5 = Md5Utils.computeMD5Hash(baInputStream);
    String md5Base64 = BinaryUtils.toBase64(md5);
    metadata.setHeader("Content-MD5", md5Base64);
    baInputStream.reset();
    PutObjectRequest putRequest = new PutObjectRequest(targetBucketName, objectName, baInputStream, metadata);
    s3client.putObject(putRequest);

Question:

My Spark application is not able to load the AWSCredentials class and displays the message: Failed to load com.pipeline.ana.SparkApp: com/amazonaws/auth/AWSCredentials

I have these imports:

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.AmazonS3;

I created the s3Client like this:

AWSCredentials credentials = new BasicAWSCredentials("access_key", "secret_key");
AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                    .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration("https://10.10.1.225:19443", "region1"))
                    .withCredentials(new AWSStaticCredentialsProvider(credentials))
                    .build();

This client is used to put the object (a string from inside the Java RDD). When I do mvn clean package the build is successful, but spark-submit is not able to find the AWS classes.

My Maven dependency is com.amazonaws:aws-java-sdk:1.11.595. How can this be resolved?


Answer:

Since I am using the IntelliJ IDE, it is sometimes a caching issue. Simply delete the project metadata directories from the project folder and re-import the project. The build file (build.sbt in my case) will then reload the dependencies without any failure. My issue was solved with this naive approach.

Question:

How can I get a list of all regions to show to the user to choose from? Basically I want to show a dropdown with all possible regions.

This is for S3 bucket configuration (I'm not sure whether the EC2 describeRegions call would work?).

It should all be dynamic and not tied to the SDK, so that if AWS creates a new region the user is able to select it. (So I don't want to use the Regions enum.)

I am using the AWS Java SDK.


Answer:

I used an EC2 client with the describeRegions API to get the eligible regions:

    BasicAWSCredentials credentials = new BasicAWSCredentials(awsAcessKeyId, awsSecretKey);

    AmazonEC2 ec2Client = AmazonEC2ClientBuilder.standard()
            .withCredentials(new AWSStaticCredentialsProvider(credentials)).build();
    DescribeRegionsResult describeRegionsResult = ec2Client.describeRegions();
    for (com.amazonaws.services.ec2.model.Region region : describeRegionsResult.getRegions()) {
        System.out.println(region.getRegionName());
    }

Question:

I am working on a project where I need to download the keys from an Amazon S3 bucket that has more than 1 billion objects. I wrote code using the Java V2 API, but it doesn't help, as it downloads only 1,000 keys at a time. It takes days to get the list of all keys from this bucket. Is there any faster way to list all the keys?

I have checked other answers related to this topic and it didn't help.

Thanks


Answer:

We had the same issue with a large number of objects.

We followed a pattern of including a timestamp, in fixed increments, in the object names. It looks like this:

s3://bucket-name/timestamp/actualobject.extension

E.g.,
s3://mys3bucket/1506237300/datafile001.json

When iterating through, I had parallel threads running, one per timestamp increment, and everything was read very fast.

The key to solving this is to find the pattern you used when storing those objects and list the object names based on those prefixes, as in the sketch below.
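
A minimal sketch (SDK v2) of that idea, assuming the timestamp prefixes described above; s3 is a software.amazon.awssdk.services.s3.S3Client and prefixes is a List<String> of the prefixes you derived from your naming pattern. Each prefix is paginated on its own thread, so the 1,000-keys-per-page limit is paid in parallel rather than serially:

// prefixes = the list of timestamp prefixes used in the object names, e.g. "1506237300/"
ExecutorService pool = Executors.newFixedThreadPool(16);
for (String prefix : prefixes) {
    pool.submit(() -> s3.listObjectsV2Paginator(
                    ListObjectsV2Request.builder().bucket(bucketName).prefix(prefix).build())
            .contents()
            .forEach(obj -> System.out.println(obj.key())));
}
// Stop accepting new work; the submitted listings keep running until they finish.
pool.shutdown();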

Hope it helps.