Hot questions for using Amazon S3 on Amazon EC2

Question:

I've already seen this, but there was no answer there that explained my problem. I first used the sample provided here (the GetObject class), and it worked immediately on my desktop. However, my friend could not get it to work on his machine, nor would it work on our EC2 instance.

It was mentioned that a credentials file needs to be specified, which makes sense, but I never had to do that, and I'm pretty sure the default permissions were set to allow access to this bucket.

Here's the stacktrace:

Exception in thread "main" java.lang.IllegalArgumentException: profile file cannot be null
    at com.amazonaws.util.ValidationUtils.assertNotNull(ValidationUtils.java:37)
    at com.amazonaws.auth.profile.ProfilesConfigFile.<init>(ProfilesConfigFile.java:142)
    at com.amazonaws.auth.profile.ProfilesConfigFile.<init>(ProfilesConfigFile.java:133)
    at com.amazonaws.auth.profile.ProfilesConfigFile.<init>(ProfilesConfigFile.java:100)
    at com.amazonaws.auth.profile.ProfileCredentialsProvider.getCredentials(ProfileCredentialsProvider.java:135)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1029)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1049)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:949)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:662)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:636)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:619)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$300(AmazonHttpClient.java:587)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:574)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:446)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4035)
    at com.amazonaws.services.s3.AmazonS3Client.getBucketRegionViaHeadRequest(AmazonS3Client.java:4474)
    at com.amazonaws.services.s3.AmazonS3Client.fetchRegionFromCache(AmazonS3Client.java:4448)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4020)
    at com.amazonaws.services.s3.AmazonS3Client.getObject(AmazonS3Client.java:1307)
    at GetObject.main(GetObject.java:26)

I can guarantee that neither the bucketName nor the key param in the GetObjectRequest is null. What's the discrepancy here? Why might it succeed only on my PC? Is this related to the fact that I had to supplement numerous jars that the aws-sdk jar was supposed to include already (jackson-databind, jackson-core, jackson-annotations, httpclient, httpcore, commons-logging, and joda-time)? It seems similar: otherwise inexplicable errors (I pass non-null params, yet something in aws-sdk says they're null).


Answer:

It looks like you solved this in the comments, but I got burned by this too and want to leave a clearer answer for future readers. To be clear: the error has nothing to do with files in S3, neither a file on your hard drive nor the object you're trying to push/pull. The problem is that you're initializing S3 with something like:

AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());

When you do that, the SDK looks in ~/.aws/credentials for a list of profiles. This might work great on your computer, but it won't work anywhere that you get AWS access through an IAM role (e.g. Lambda, Docker, an EC2 instance, etc.). The fix is to initialize the AmazonS3Client like:

AmazonS3 s3Client = new AmazonS3Client();

If you're using code that requires some kind of credentials provider, you can also do:

AmazonS3 s3Client = new AmazonS3Client(DefaultAWSCredentialsProviderChain.getInstance());

Hopefully that helps the next person. In my case I was using DynamoDB and SQS, but I hit the same error. I originally ignored this question because I thought your problem was S3-related and was super confused. Wrist slapped.

Question:

I am new to S3. One of our vendors is sharing a bucket and its objects with us. We created an AWS account and added our team members as users. We can access the data in the bucket via the AWS CLI. Now I am looking for a Java API to download the data programmatically.

My code is:

/*
 * Create your credentials file at ~/.aws/credentials (C:\Users\USER_NAME\.aws\credentials for Windows users) 
 * and save the following lines after replacing the underlined values with your own.
 *
 * [default]
 * aws_access_key_id = YOUR_ACCESS_KEY_ID
 * aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
 */
AmazonS3 s3 = new AmazonS3Client();
Region usEast1 = Region.getRegion(Regions.US_EAST_1);
s3.setRegion(usEast1);
System.out.println("Downloading an object");
S3Object object = s3.getObject(new GetObjectRequest("exports.xyz.t-z", "abc/2015/12/07/62542f4f0164689f5d18cf6-2c324750-6c47-11e5-0e29-00deb82fd81f"));
System.out.println("Content-Type: "  + object.getObjectMetadata().getContentType());

The error message is:

Downloading an object
Caught an AmazonServiceException, which means your request made it to Amazon S3, but was rejected with an error response for some reason.
Error Message:    The specified key does not exist. (Service: Amazon S3; Status Code: 404; Error Code: NoSuchKey; Request ID: 3DD21D3934A4456D)
HTTP Status Code: 404
AWS Error Code:   NoSuchKey
Error Type:       Client
Request ID:       3DD21D3934A4456D

Now, as the error message says, the key does not exist, but I am not sure what the key is. The data was placed in S3 by the vendor.

However, I can see what is listed in the S3 directory using the following piece of code. (You can also refer to my previous post.)

System.out.println("Listing objects");
ObjectListing objectListing = s3.listObjects(
    new ListObjectsRequest().withBucketName("exports.xyz.t-z")
                            .withPrefix("abc/2015/12/07/62542f4f0164689f5d18cf6-2c324750-6c47-11e5-0e29-00deb82fd81f")
);
for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
    System.out.println(" - " + objectSummary.getKey() + "  " +
            "(size = " + objectSummary.getSize() + ")");
}

What would be the recommended way to download the data?


Answer:

It looks like you are trying to download the "directory" rather than a particular key in that directory.

You already have code that lists the objects with that prefix. When you download a file using getObject(new GetObjectRequest(...)), the second parameter to the GetObjectRequest constructor must be one of the values returned by objectSummary.getKey().
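
For illustration, here is a minimal sketch that combines the two steps: it reuses the bucket and prefix from the question, then downloads each matching object using the exact key the listing returned.

AmazonS3 s3 = new AmazonS3Client();
ObjectListing listing = s3.listObjects(
    new ListObjectsRequest().withBucketName("exports.xyz.t-z")
                            .withPrefix("abc/2015/12/07/"));
for (S3ObjectSummary summary : listing.getObjectSummaries()) {
    String key = summary.getKey(); // this exact value is what getObject expects
    S3Object object = s3.getObject(new GetObjectRequest("exports.xyz.t-z", key));
    // Stream the content to a local file named after the key's last segment
    try (S3ObjectInputStream in = object.getObjectContent()) {
        Files.copy(in, Paths.get(key.substring(key.lastIndexOf('/') + 1)),
                StandardCopyOption.REPLACE_EXISTING);
    }
}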

Question:

I created an IAM role associated with an EC2 instance on Amazon, and as I understood from the Amazon docs, I can retrieve temporary AWS credentials with it and do some stuff with that. I read that the EC2 metadata API (which is used internally by InstanceProfileCredentialsProvider) is only available to calls from within the instance, not from the outside world. What does this mean? How can I communicate securely with AWS while developing the app on a local Tomcat server?


Answer:

You should use the default provider chain and EC2 instance profiles. In your case, since you've already attached the role to your instance, and considering you are using the Java SDK, you need to call:

// Resolves temporary credentials via the EC2 instance metadata service
InstanceProfileCredentialsProvider provider = new InstanceProfileCredentialsProvider();
AWSCredentials credentials = provider.getCredentials();

Or, if you are using a specific service, such as Amazon S3, you can directly call:

AmazonS3 s3Client = new AmazonS3Client(new DefaultAWSCredentialsProviderChain());
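
For development on your local Tomcat server, this is exactly what the default chain is for: it checks environment variables, Java system properties, the ~/.aws/credentials profile file, and finally the EC2 instance profile, in that order. So the same code picks up a credentials file on your dev machine and the IAM role on the instance, with no code changes.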

For more information: http://docs.aws.amazon.com/java-sdk/latest/developer-guide/java-dg-roles.html

And just a reminder: you should NEVER hard-code your Access Key ID and Secret Access Key in your code.

Question:

I am trying to download a large number of files (~50 terabytes) into an S3 bucket. The problem is that these files are only accessible through various download links located on a website (they aren't already on my hard drive). I could just download a small portion of the data directly onto the hard drive of my own computer, upload it to the S3 bucket, delete it from my hard drive, and repeat with another portion, but I'm worried that doing so would take far too long and use too much bandwidth. Instead, I was hoping I could use an EC2 instance to do the same thing, as the answerer of this question suggested, but I'm having trouble figuring out how I would go about doing this with Java.

With Java, requesting and starting EC2 instances seems pretty clear; however, actually using the instance gets kind of blurry. I understand that you can use the EC2 Management Console to connect to an instance directly, and I could manually run a script while connected that downloads and uploads the files, though I would prefer a script run from my computer that creates the EC2 instance and then uses that instance to accomplish my goal. This is because later in my project I will be downloading a file from the same website daily, and using the Windows Task Scheduler on my computer to run a script is cheaper than leaving the EC2 instance running 24/7 and doing it there.

Simply put, how do I use Java to use an EC2 instance?


Answer:

There are two distinct phases that your solution would require:

  1. Obtain a list of files to download
  2. Download the files

I would recommend separating these two tasks because an error in the logic for listing the files could stop the download process mid-way, making it difficult to resume once the problem is corrected.

Listing the files is probably best done on your local computer, which would be easy to debug and track progress. The result would be a text file with lots of links. (This is similar in concept to a lot of scraper utilities.)

The second portion (downloading the files) could be done on either Amazon EC2 or via AWS Lambda functions.

Using Amazon EC2

This would be a straightforward app that reads your text file, loops through the links and downloads the files. If this is a one-off requirement, I wouldn't invest too much time in multi-threading the app. However, that means you won't take full advantage of the network bandwidth, and Amazon EC2 is charged per hour.

Therefore, I would recommend using fairly small instance types (each with limited network bandwidth that you can saturate), but running multiple instances in parallel, each with a portion of your list of links. This way you can divide and conquer.

If something goes wrong mid-way, you can always tweak the code, manually edit the text file to remove the entries already completed, then continue. This is fairly quick-and-dirty, but fine if this is just a one-off requirement.
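
For illustration, here is a minimal single-threaded sketch of such a downloader. The file name urls.txt, the bucket name, and the key-naming scheme are all assumptions, not something from the question:

AmazonS3 s3 = new AmazonS3Client(); // on EC2, picks up the instance profile credentials
String bucket = "my-download-bucket"; // placeholder

for (String link : Files.readAllLines(Paths.get("urls.txt"))) {
    URL url = new URL(link.trim());
    // Use the last path segment of the URL as the S3 key (an assumption)
    String key = url.getPath().substring(url.getPath().lastIndexOf('/') + 1);

    Path tmp = Files.createTempFile("download-", ".tmp");
    try (InputStream in = url.openStream()) {
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
    }
    s3.putObject(bucket, key, tmp.toFile());
    Files.delete(tmp); // free local disk before the next file
    System.out.println("Uploaded " + key);
}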

Additionally, I would recommend using Amazon EC2 Spot Instances, which can save up to 90% of the cost of Amazon EC2. There is a risk of an instance being terminated if the Spot Price rises, which would cause you some extra work to determine where to resume, so simply bid a price equal to the normal On-Demand price and it will be unlikely (but not guaranteed) that your instances will be terminated.
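
If you go the Spot route, here is a rough sketch of requesting an instance from Java; the AMI ID, bid price, instance type and user-data script below are all placeholders:

AmazonEC2 ec2 = new AmazonEC2Client(); // uses the default credential chain
String userData = "#!/bin/bash\n"
        + "java -jar /home/ec2-user/downloader.jar part-1.txt\n"; // hypothetical startup script
RequestSpotInstancesRequest spotRequest = new RequestSpotInstancesRequest()
        .withSpotPrice("0.10") // bid roughly the On-Demand price, per the advice above
        .withInstanceCount(1)
        .withLaunchSpecification(new LaunchSpecification()
                .withImageId("ami-xxxxxxxx") // placeholder AMI with Java installed
                .withInstanceType(InstanceType.M3Medium)
                .withUserData(Base64.getEncoder().encodeToString(
                        userData.getBytes(StandardCharsets.UTF_8))));
ec2.requestSpotInstances(spotRequest);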

Using AWS Lambda functions

Each AWS Lambda function can run for a maximum of 5 minutes and can store only 512 MB of data locally. Fortunately, functions can run in parallel.

Therefore, to use AWS Lambda, you would need to write a controlling app that calls an AWS Lambda function for each file in your list. If any of the files exceed 512 MB, this would need special handling.

Writing, debugging and monitoring a parallel, distributed application like this probably isn't worth the effort for a one-off task. It would be much harder to debug any problems and recover from errors. (It would, however, be an ideal way to do continuous downloads if you have a continuing business need for this process.)

Bottom line: I would recommend writing and debugging the downloader app on your local computer (with a small list of test files), then using multiple Amazon EC2 Spot Instances running in parallel to download the files and upload them to Amazon S3. Start with one instance and a small list to test the setup, then go parallel with bigger lists. Have fun!

Question:

I want to change the default domain for aws-java-sdk-s3. I need to send objects not to:

http://s3.amazonaws.com/mybucket

but to:

http://my_own_domain.com/mybucket

How to change it?

Here is an example:

BasicAWSCredentials awsCreds = new BasicAWSCredentials("my_access", "my_secret");

AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
                .build();

But it uses Amazon's domain.


Answer:

If you really want to, you can call setEndpoint() on the AmazonS3Client object: http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3Client.html#setEndpoint-java.lang.String-
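
For example, a sketch of both styles; the signing region ("us-east-1") below is an assumption:

// Legacy style: point an existing client at a custom endpoint
AmazonS3 s3Client = new AmazonS3Client(new AWSStaticCredentialsProvider(awsCreds));
s3Client.setEndpoint("http://my_own_domain.com");

// Builder style, matching the code in the question
AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withCredentials(new AWSStaticCredentialsProvider(awsCreds))
        .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                "http://my_own_domain.com", "us-east-1"))
        .build();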

Question:

I am using Amazon S3 in my Android app for uploading files to cloud storage. Every user can upload files to and download files from the cloud storage, i.e. S3. I want to keep track of every user's uploads and downloads, e.g. user abc uploaded 26MB and downloaded 94MB.

One solution is to implement this in my mobile app: store/track the size while uploading and downloading.

Does AWS expose this via any analytics, or is there any third-party API that gives transfer details?


Answer:

The AWS Console does not provide any useful method to track a user's uploads and downloads. One suggestion is to store this information locally if you have a local database; if you have a remote database to save your records, you can send this information to your server using a web service.
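
If you do track it in the app, the Java SDK fires progress events you can hook into. A rough sketch follows; recordTransfer and the surrounding variables are hypothetical, and on Android the Mobile SDK's TransferUtility exposes a similar TransferListener:

PutObjectRequest request = new PutObjectRequest(bucket, key, file);
request.setGeneralProgressListener(new ProgressListener() {
    @Override
    public void progressChanged(ProgressEvent event) {
        // Accumulate the bytes moved and persist per-user totals in your own database
        recordTransfer(userId, event.getBytesTransferred()); // hypothetical helper
    }
});
s3Client.putObject(request);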

Question:

This is my first time using Amazon S3, and I want to store PDF files that I create using iText in Java Spring.

The code (hosted on an EC2 instance) creates a PDF that I would like to store somewhere. I am exploring whether Amazon S3 can hold those files. Eventually I would like to retrieve them as well. Can this be done using iText and Java Spring? Any examples would be great.


Answer:

To upload files to Amazon S3, you need to use the putObject method of the AmazonS3Client class, like this:

    AWSCredentials credentials = new BasicAWSCredentials(appId, appSecret);
    AmazonS3 s3Client = new AmazonS3Client(credentials);

    String bucketPath = "YOUR_BUCKET_NAME/FOLDER_INSIDE_BUCKET";
    File pdfFile = new File("YOUR_PDF_FILE_PATH");
    InputStream is = new FileInputStream(pdfFile);
    ObjectMetadata meta = new ObjectMetadata();
    // Use the file's length; InputStream.available() is not a reliable size
    meta.setContentLength(pdfFile.length());
    s3Client.putObject(new PutObjectRequest(bucketPath, "YOUR_FILE.pdf", is, meta)
            .withCannedAcl(CannedAccessControlList.Private));

And to get a file back from S3, you need to generate a pre-signed URL to access a private file. If your files are public, you can access a file directly by opening its link in your browser; the link is available in the AWS S3 console.

Also, we specified CannedAccessControlList.Private in the upload code above, which makes the file private, so we need to generate a pre-signed URL to access the file, like this:

    AWSCredentials credentials = new BasicAWSCredentials(appId, appSecret);
    AmazonS3 s3Client = new AmazonS3Client(credentials);

    GeneratePresignedUrlRequest generatePresignedUrlRequest =
            new GeneratePresignedUrlRequest("YOUR_BUCKET_NAME", "FOLDER_INSIDE_BUCKET/YOUR_FILE.pdf");
    generatePresignedUrlRequest.setMethod(HttpMethod.GET);

    Date expiration = new Date();
    long milliSeconds = expiration.getTime();
    milliSeconds += 1000 * 60 * 60; // Add 1 hour.
    expiration.setTime(milliSeconds);
    generatePresignedUrlRequest.setExpiration(expiration);

    URL url = s3Client.generatePresignedUrl(generatePresignedUrlRequest);
    String finalUrl = url.toString();

Question:


Answer:

What I understand is that you have some files which are modified from day to day and then uploaded to S3.

  1. You can keep uploading the files to S3 from your Java app. AWS automatically maintains high availability across multiple Availability Zones, so you don't need to worry about data loss.

  2. Assume you have a file abc.html which you have uploaded to S3. The next time you modify the file and upload it to S3, AWS will overwrite/replace that file, and you will have the latest content in it.

  3. In case you want to maintain different versions of the same file, you can go ahead and use AWS S3 versioning; a minimal sketch of enabling it follows below. You can read more about that here: http://docs.aws.amazon.com/AmazonS3/latest/dev/Versioning.html
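
A sketch of enabling versioning on a bucket with the Java SDK (the bucket name is a placeholder, and the client here relies on the default credential chain):

AmazonS3 s3 = new AmazonS3Client();
s3.setBucketVersioningConfiguration(new SetBucketVersioningConfigurationRequest(
        "YOUR_BUCKET_NAME",
        new BucketVersioningConfiguration().withStatus(BucketVersioningConfiguration.ENABLED)));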

Also, since you're using S3, you can enable Amazon S3 Transfer Acceleration for faster transfers to and from your application: http://docs.aws.amazon.com/AmazonS3/latest/dev/transfer-acceleration.html