Hot questions for using Amazon S3 in Gradle

Question:

This may be a very simple question, but I've been stuck on it for hours.

I'm trying to add Amazon cloud integrations to my project, but I can't get the project to recognize the jar files. For now I'm simply trying to instantiate an AmazonS3 client:

AmazonS3 s3 = new AmazonS3Client();

I've added this to my build.gradle under dependencies{ }:

compile('com.amazonaws:aws-java-sdk:1.10.6')

When I run gradle build, it appears to download a number of jar files from Maven, but when I compile my project, I get "cannot find symbol" errors.

error: cannot find symbol
        AmazonS3 s3 = new AmazonS3Client();
        ^

It seems like Gradle isn't adding the classes to my classpath. Is there a plugin I need? Do I need to add the jars to my project manually?

Thanks

Edit: I'm using IntelliJ IDEA to manage the project.


Answer:

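I'm not entirely sure what the original problem was, but I was able to resolve it by regenerating the IntelliJ project files (the cleanIdea and idea tasks come from Gradle's idea plugin) and adding the missing imports: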

gradle clean
gradle cleanIdea
gradle idea
...
<import required classes>
...
gradle build

This resolved it, so something was wrong with the project setup.
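
When it is unclear whether the fault lies with the dependency or with the IDE project, a small runtime check can help separate the two. The sketch below is a hypothetical diagnostic (the class and method names are made up for illustration) that reports whether a class is visible on the classpath. It assumes the fully qualified name com.amazonaws.services.s3.AmazonS3Client from the version 1 SDK. It does not replace the import statements that the compiler needs.

```java
// Hypothetical diagnostic helper; the class and method names are illustrative only.
final class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the S3 client class from the question; pass another name as an argument.
        String name = args.length > 0 ? args[0] : "com.amazonaws.services.s3.AmazonS3Client";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

If this reports true when run with the Gradle runtime classpath while the IDE still shows "cannot find symbol", the dependency itself is probably fine and the IDE project or the imports are the more likely cause.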

Question:

I want to create a private Maven repository on AWS S3 from an existing jar, because I need to manage a jar that is not published to Maven Central.

I searched Google for a way to do this, but I only found instructions for publishing from .java sources with Maven or Gradle, such as this one (https://medium.com/@JacobASeverson/s3-maven-repositories-and-gradle-911c25cebeeb).

How do I create a private Maven repository on S3 from a jar?


Answer:

You will need an Artifactory instance with a private repository. The quickest way to set one up is with jfrog-artifactory-amazon-ec2.

You can then add your private artifacts to a private repository (see creating a repository).

Then you can create a virtual repository that combines the public Maven repositories with your private repository.

Use this virtual repository in your settings.xml, and it will see both private and public artifacts.
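
Whether the jar ends up in an Artifactory repository as suggested above or somewhere else, Maven clients locate it by a conventional path derived from its coordinates. The following is a minimal sketch of that standard Maven 2 repository layout. The helper name and the coordinates com.example:mylib:1.0.0 are hypothetical and only serve as an illustration.

```java
// Sketch of the standard Maven 2 repository layout; names and coordinates are illustrative.
final class MavenLayout {

    /** Builds the repository-relative path of an artifact file from its coordinates. */
    static String artifactPath(String groupId, String artifactId, String version, String extension) {
        // Dots in the groupId become directory separators; the file is named artifactId-version.extension
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + "." + extension;
    }

    public static void main(String[] args) {
        // Hypothetical coordinates for a jar that is not on Maven Central
        System.out.println(artifactPath("com.example", "mylib", "1.0.0", "jar"));
        // A repository normally also holds a matching .pom next to the jar
        System.out.println(artifactPath("com.example", "mylib", "1.0.0", "pom"));
    }
}
```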

Question:

I have been using the Beam pipeline examples as a guide while trying to load files from S3 in my pipeline. As in the examples, I have defined my own PipelineOptions interface that also extends S3Options, and I am attempting to use the DefaultAWSCredentialsProviderChain. The code to configure this is:

MyPipelineOptions options = PipelineOptionsFactory.fromArgs(args).as(MyPipelineOptions.class);

options.setAwsCredentialsProvider(new DefaultAWSCredentialsProviderChain());
options.setAwsRegion("us-east-1");

runPipeline(options);

When I run it from IntelliJ with the Direct Runner it works fine, but when I package it as a jar and execute it (also with the Direct Runner) I see:

Exception in thread "main" java.lang.IllegalArgumentException: PipelineOptions specified failed to serialize to JSON.
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:166)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
    at a.b.c.beam.CleanSkeleton.runPipeline(CleanSkeleton.java:69)
    at a.b.c.beam.CleanSkeleton.main(CleanSkeleton.java:53)
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unexpected IOException (of type java.io.IOException): Failed to serialize and deserialize property 'awsCredentialsProvider' with value 'com.amazonaws.auth.DefaultAWSCredentialsProviderChain@40f33492'
    at com.fasterxml.jackson.databind.JsonMappingException.fromUnexpectedIOE(JsonMappingException.java:338)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsBytes(ObjectMapper.java:3247)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:163)
    ... 5 more

I am using Gradle to build my jar with the following task:

jar {
    manifest {
        attributes (
                'Main-Class': 'a.b.c.beam.CleanSkeleton'
        )
    }
    from {
        configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
    }
    from('src') {
        include '/main/resources/*'
    }



    zip64 true
    exclude 'META-INF/*.RSA', 'META-INF/*.SF', 'META-INF/*.DSA'
}

Answer:

The problem occurred because, when the fat/uber jar was created, files in META-INF/services were being overwritten by duplicate files. Specifically, the com.fasterxml.jackson.databind.Module file should have listed a number of Jackson modules, but they were missing. These include org.apache.beam.sdk.io.aws.options.AwsModule and com.fasterxml.jackson.datatype.joda.JodaModule. The code in the DirectRunner instantiates the ObjectMapper like so:

new ObjectMapper()
      .registerModules(ObjectMapper.findModules(ReflectHelpers.findClassLoader()));

ObjectMapper::findModules relies on java.util.ServiceLoader, which locates services from META-INF/services/ files.
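
To check which providers a packaged jar actually declares, the service files can be read directly from the classpath. This is a rough diagnostic sketch using only the JDK. The class and method names are hypothetical, and it only lists what the files declare without instantiating anything.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical diagnostic; lists provider class names declared in META-INF/services files.
final class ServiceFileLister {

    /** Reads every META-INF/services/<serviceInterface> file visible to the loader. */
    static List<String> declaredProviders(String serviceInterface, ClassLoader loader) {
        List<String> providers = new ArrayList<>();
        try {
            String resource = "META-INF/services/" + serviceInterface;
            for (URL url : Collections.list(loader.getResources(resource))) {
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        // Service files allow '#' comments and blank lines
                        int hash = line.indexOf('#');
                        String name = (hash >= 0 ? line.substring(0, hash) : line).trim();
                        if (!name.isEmpty()) {
                            providers.add(name);
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return providers;
    }

    public static void main(String[] args) {
        // The service interface named in the answer above
        String service = "com.fasterxml.jackson.databind.Module";
        for (String provider : declaredProviders(service, ServiceFileLister.class.getClassLoader())) {
            System.out.println(provider);
        }
    }
}
```

Running this against the fat jar's classpath should show whether AwsModule and JodaModule are listed. If they are absent, the service file was most likely lost during packaging.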

The solution was to use the Gradle Shadow plugin to build the fat/uber jar and configure it to merge the service files:

apply plugin: 'com.github.johnrengelman.shadow'
shadowJar {
    mergeServiceFiles()
    zip64 true
}
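
The difference between the two packaging strategies can be modelled in a few lines. This is a conceptual sketch only, not the Shadow plugin's actual implementation. It assumes each dependency jar contributes its own copy of the same service file, and it keeps the last copy in the non-merging case purely for illustration, since which copy survives depends on the build tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Conceptual model of service-file handling in a fat jar; names are illustrative.
final class ServiceFileMerge {

    /** Only one copy of the file survives (here: the last one), so earlier entries are lost. */
    static List<String> keepOnlyOne(List<List<String>> perJarEntries) {
        List<String> survivor = Collections.emptyList();
        for (List<String> entries : perJarEntries) {
            survivor = entries;
        }
        return new ArrayList<>(survivor);
    }

    /** All copies are combined, keeping each provider once in first-seen order. */
    static List<String> merge(List<List<String>> perJarEntries) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> entries : perJarEntries) {
            merged.addAll(entries);
        }
        return new ArrayList<>(merged);
    }

    public static void main(String[] args) {
        // Two jars, each shipping META-INF/services/com.fasterxml.jackson.databind.Module
        List<List<String>> jars = Arrays.asList(
                Arrays.asList("org.apache.beam.sdk.io.aws.options.AwsModule"),
                Arrays.asList("com.fasterxml.jackson.datatype.joda.JodaModule"));
        System.out.println("one copy kept: " + keepOnlyOne(jars));
        System.out.println("merged:        " + merge(jars));
    }
}
```

With only one copy kept, AwsModule is never registered, which matches the failure to serialize awsCredentialsProvider described in the question.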

Question:

We recently started using the AWS API for S3 and SES, so we added this to our dependencies as required by the docs:

compile group: 'com.amazonaws', name: 'aws-java-sdk', version: '1.11.48'

But our WAR file grew from a mere 66 MB to almost 150 MB. Is there a way to cut down the overhead of the Amazon code? It has been flooding our PermGen and causing OutOfMemoryErrors. I have temporarily increased our PermGen size, but if I could remove the unnecessary code I might be able to lower it again.

Is there an official way to trim down the dependencies?


Answer:

That pulls in the entire AWS SDK. If you only want to use specific services, you can include just those SDK components.

For example, to include just the S3 and DynamoDB components of the SDK:

compile group: 'com.amazonaws', name: 'aws-java-sdk-s3', version: '1.11.48'
compile group: 'com.amazonaws', name: 'aws-java-sdk-dynamodb', version: '1.11.48'

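You can see the different components by looking at the project on GitHub. For the S3 and SES use case in the question, the matching artifacts would be aws-java-sdk-s3 and aws-java-sdk-ses.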