Hot questions for using Amazon S3 in the cloud

Question:

I have a website (running within Tomcat on Elastic Beanstalk) that generates artist discographies (a single page per artist). This can be resource intensive, and since the artist pages don't change over a month-long period, I put a CloudFront distribution in front of it.

I thought this would mean no artist request ever had to be served more than once by my server; however, it's not quite as good as that. This post explains that every edge location (Europe, US, etc.) will get a miss the first time it looks up the resource, and that there is a limit to how many resources are kept in the CloudFront cache, so they can be dropped.

So to counter this I have changed my server code to store a copy of the webpage in an S3 bucket AND to check this first when a request comes in; if the artist page already exists in S3, the server retrieves it and returns its contents as the webpage. This greatly reduces the processing, as it only constructs a webpage for a particular artist once.

However:

  1. The request still has to go to the server to check if the artist page exists.
  2. If the artist page exists, then the webpage (and they can sometimes be large, up to 20 MB) is first downloaded to the server, and then the server returns the page.

So I wanted to know if I could improve this - I know you can configure an S3 bucket to redirect to another website. Is there a per-page way I could get the artist request to go to the S3 bucket and have it return the page if it exists, or call the server if it does not?

Alternatively, could I get the server to check whether the page exists and then redirect to the S3 page, rather than downloading the page to the server first?


Answer:

OP says:

they can sometimes be large, up to 20 MB

Since the volume of data you serve can be pretty large, I think it is feasible for you to do this in 2 requests instead of one, decoupling content generation from content serving. The reason to do this is to minimize the time and resources the server spends fetching data from S3 and serving it.

AWS supports pre-signed URLs, which can be valid for a short amount of time; we can use them here to avoid security issues.
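
For reference, a minimal sketch of issuing such a short-lived URL with the v1 Java SDK (the bucket, key, and the estimate T discussed in the first approach below are placeholders of mine, not from the original answer):

import java.net.URL;
import java.util.Date;
import com.amazonaws.HttpMethod;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GeneratePresignedUrlRequest;

public class PresignedUrlSketch {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        long estimatedCompletionMillis = 30_000; // "T": estimated page-generation time for this artist

        // The URL must still be valid when the client comes back after waiting T, so add some slack.
        Date expiry = new Date(System.currentTimeMillis() + estimatedCompletionMillis + 60_000);
        URL url = s3.generatePresignedUrl(
                new GeneratePresignedUrlRequest("artist-pages", "artists/12345.html", HttpMethod.GET)
                        .withExpiration(expiry));
        System.out.println(url);
    }
}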

Currently, your architecture looks something like the diagram below: the client initiates a request, you check whether the requested data exists on S3 and fetch and serve it if it is there; otherwise you generate the content and save it to S3:

                           if exists on S3
client --------> server --------------------> fetch from s3 and serve
                    |
                    |else
                    |------> generate content -------> save to S3 and serve

In terms of network resources, you always consume 2X the bandwidth and time here. If the data exists, you have to pull it from S3 to the server and then serve it to the customer (so it is 2X). If the data doesn't exist, you send it to the customer and to S3 (so again it is 2X).


Instead, you can try the 2 approaches below, both of which assume that you have some base template and that the other data can be fetched via AJAX calls, and both of which bring down that 2X factor in the overall architecture.

  1. Serve the content from S3 only. This calls for changes to the way your product is designed, and hence may not be that easily integrable.

    Basically, for every incoming request, return the S3 URL for it if the data already exists; otherwise create a task for it in SQS (a minimal SQS sketch follows the two approaches), generate the data and push it to S3. Based on your usage patterns for different artists, you should have an estimate of how long it takes to pull the data together on average, and so return a URL that would be valid given the estimated_time_for_completion (T) of the task.

    The client waits for time T, then makes the request to the URL returned earlier. It makes up to, say, 3 attempts to fetch this data in case of failure. In fact, the data already existing on S3 can be thought of as the base case where T = 0.

    In this case, you make 2-4 network requests from the client, but only the first of those requests comes to your server. You transmit the data to S3 once, only in the case where it doesn't exist, and the client always pulls it from S3.

                               if exists on S3, return URL
    client --------> server --------------------------------> s3
                        |
                        |else SQS task
                        |---------------> generate content -------> save to S3 
                         return pre-computed url
    
    
               wait for time `T`
    client  -------------------------> s3
    

  2. Check if the data already exists, and make the second network call accordingly.

    This is similar to what you currently do when serving data from the server when it doesn't already exist. Again, we make 2 requests here; however, this time we serve the data synchronously from the server when it doesn't exist.

    So, on the first hit, we check whether the content has ever been generated previously; we get back either a URL (success) or an error message. When successful, the next hit goes to S3.

    If the data doesn't exist on S3, we make a fresh request (to a different POST URL); on receiving it, the server computes the data and serves it, while adding an asynchronous task to push it to S3.

                               if exists on S3, return URL
    client --------> server --------------------------------> s3
    
    client --------> server ---------> generate content -------> serve it
                                           |
                                           |---> add SQS task to push to S3
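
Where both approaches say "add a task to SQS", a minimal sketch of enqueuing such a generation task with the v1 Java SDK (the queue URL and message body are illustrative placeholders, not from the original answer):

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.SendMessageRequest;

public class EnqueueGenerationTask {
    public static void main(String[] args) {
        AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
        String queueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/artist-page-generation"; // hypothetical queue
        // The worker that consumes this message generates the page and pushes it to S3.
        sqs.sendMessage(new SendMessageRequest(queueUrl, "artistId=12345"));
    }
}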
    

Question:

I have a typical stateless Java application which provides a REST API and performs updates (CRUD) in a Postgresql Database.

However the number of clients is growing and I feel the need to

  • Increase redundancy, so that if one instance fails another takes its place
  • For this I will probably need a load balancer?
  • Increase response speed by not flooding the network and the CPU of just one server (however, how will the load balancer itself not get flooded?)
  • Maybe I will need to distribute the database?
  • I want to be able to update my app seamlessly (I have seen a tool called Kubernetes doing this): kill each redundant node one by one and immediately replace it with an updated version
  • My app also stores some image files, which grow fast in disk size; I need to be able to distribute them
  • All of this must be backup-able

This is the diagram of what I have now (both Java app and DB are on the same server):

What is the best/correct way of scaling this?

Thanks!


Answer:

Web Servers:

Run your app on multiple servers, behind a load balancer. Use AWS Elastic Beanstalk or roll your own solution with EC2 + Autoscaling Groups + ELB.

You mentioned a concern about "flooding" of the load balancer, but if you use Amazon's Elastic Load Balancer service it scales automatically to handle whatever traffic you get, so you don't need to worry about it.

Database Servers:

Move your database to RDS and enable Multi-AZ failover. This will create a hot standby server that your database will automatically fail over to if there are issues with your primary server. Optionally, add read replicas to scale out your database capacity.

Start caching your database queries in Redis if you aren't already. There are plugins out there to do this with Hibernate fairly easily. This will take a huge load off your database servers if your app performs the same queries regularly. Use AWS ElastiCache or RedisLabs for your Redis server(s).

Images:

Stop storing your image files on your web servers! That creates lots of scalability issues. Move those to S3 and serve them directly from S3. S3 gives you unlimited storage space, automated backups, and the ability to serve the images directly from S3 which reduces the load on your web servers.
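
As a rough sketch with the v1 Java SDK (the bucket name, key, and local file are placeholders of mine, and it assumes the bucket allows public-read ACLs), uploading an image once and handing clients the S3 URL looks like this:

import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.CannedAccessControlList;
import com.amazonaws.services.s3.model.PutObjectRequest;

public class ImageUploadExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        String bucket = "my-app-images"; // hypothetical bucket
        String key = "users/42/avatar.jpg";

        // Upload once, then serve the object directly from S3 instead of from the web servers.
        s3.putObject(new PutObjectRequest(bucket, key, new File("avatar.jpg"))
                .withCannedAcl(CannedAccessControlList.PublicRead));
        String publicUrl = s3.getUrl(bucket, key).toString();
        System.out.println("Serve this URL to clients: " + publicUrl);
    }
}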

Deployments:

There are so many solutions here that it just becomes a question about which method someone prefers. If you use Elastic Beanstalk then it provides a solution for deployments. If you don't use EB, then there are hundreds of solutions to pick from. I'd recommend designing your environment first, then choosing an automated deployment solution that will work with the environment you have designed.

Backups:

If you do this right you shouldn't have much on your web servers to backup. With Elastic Beanstalk all you will need in order to rebuild your web servers is the code and configuration files you have checked into Git. If you end up having to backup EC2 servers you will want to look into EBS snapshots.

For database backups, RDS will perform a daily backup automatically. If you want backups outside RDS you can schedule those yourself using pg_dump with a cron job.

For images, you can enable S3 versioning and multi-region replication.
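
As a sketch (placeholder bucket name, my own choice of API), versioning can also be switched on programmatically with the v1 Java SDK:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.BucketVersioningConfiguration;
import com.amazonaws.services.s3.model.SetBucketVersioningConfigurationRequest;

public class EnableVersioningExample {
    public static void main(String[] args) {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        // Keep every version of every image so overwrites and deletes can be recovered.
        s3.setBucketVersioningConfiguration(new SetBucketVersioningConfigurationRequest(
                "my-app-images", // hypothetical bucket
                new BucketVersioningConfiguration().withStatus(BucketVersioningConfiguration.ENABLED)));
    }
}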

CDN:

You didn't mention this, but you should look into a CDN. This will allow your application to be served faster while reducing the load on your servers. AWS provides the CloudFront CDN, and I would also recommend looking at CloudFlare.

Question:

I'd like to upload an image to S3 via CloudFront. If you look at the CloudFront documentation, you can see that CloudFront offers a PUT method for uploading through CloudFront. Some might ask why I would use CloudFront for uploading to S3; if you search around, you can find the reasons.

What I want to ask is whether or not there is a method in the SDK for uploading via CloudFront. As you know, there is the putObject method for uploading directly to S3, but I can't find one for uploading via CloudFront...

please help me..


Answer:

Data can be sent through Amazon CloudFront to the back-end "origin". This is used, for example, when a web form POSTs information back to a web server. It can also be used to POST data to Amazon S3.

If you would rather use an SDK to upload data to Amazon S3, there is no benefit in sending it "via CloudFront". Instead, use the Amazon S3 APIs to upload the data directly to S3.

So, bottom line:

  • If you're uploading from a web page that was initially served via CloudFront, send it through CloudFront to S3
  • If you're calling an API, call S3 directly

Question:

I'm developing a Spring Boot application in which I'm integrating Amazon S3 service. This class is my repository to access the S3 bucket :

public class S3FileRepository implements ImageRepository {

    private String bucket;
    private AmazonS3 s3Client;
    private ResourceLoader resourceLoader;

    public S3FileRepository(ResourceLoader resourceLoader, AmazonS3 s3Client, String bucket) {
        this.resourceLoader = resourceLoader;
        this.s3Client = s3Client;
        this.bucket = bucket;
    }

    private static String toS3Uri(String bucket, String imageName) {
        return String.format("s3://%s/%s", bucket, imageName);
    }

    @Override
    public Resource getImage(String name) {
        return resourceLoader.getResource(S3FileRepository.toS3Uri(this.bucket, name).concat(this.IMAGE_EXTENSION));
    }
}

I'm using Spring Boot autoconfiguration as suggested. So in my pom.xml, among other things, I have:

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-aws-autoconfigure</artifactId>
    <version>2.1.1.RELEASE</version>
</dependency>

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-aws-context</artifactId>
    <version>2.1.1.RELEASE</version>
</dependency>

Moreover, I have an application.properties like this:

cloud.aws.credentials.accessKey= (mykey)
cloud.aws.credentials.secretKey= (mysecret)
cloud.aws.credentials.instanceProfile=true
cloud.aws.region.static=eu-west-2
cloud.aws.stack.auto=false

The problem:

Everything works fine if I compile my project and then simply run the JAR with java -jar target/myproject.jar: I correctly get the image that I ask for and everything is fine.

Instead, if I run the project with the IDE default mvn spring-boot:run, when I try to get an image (present in the bucket) an exception occurs saying the following:

    ServletContext resource [/s3://mybucket/test.jpeg] cannot be resolved to URL because it does not exist
java.io.FileNotFoundException: ServletContext resource [/s3://mybucket/test.jpeg] cannot be resolved to URL because it does not exist

So what I think is that it throws an exception because it behaves as if it looks inside the servlet context for something that matches s3://mybucket/test.jpeg, but I can't work out why, or why it happens only when running the project with mvn spring-boot:run and not when running the jar.


Answer:

You're likely hitting spring-cloud-aws issue #384, whereby the spring-boot-devtools dependency, which is activated when you start the application from the IDE, triggers a different code path in resource loading.

You can test whether you're hitting this issue by removing the spring-boot-devtools dependency from your pom.xml file, reloading the project in your IDE, and running the same test.
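
For reference, the devtools dependency you would be removing (or excluding) typically looks like this in pom.xml; the exact scope and flags vary by project:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-devtools</artifactId>
    <optional>true</optional>
</dependency>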

Question:

I found that AWS CloudWatch collects billing metrics, but couldn't find any API references for using them programmatically. I want them as metrics, just like volume metrics and instance metrics, not as CSV files in an S3 bucket. Is there any way to achieve this?


Answer:

First, two things to keep in mind:

  1. According to this documentation, the maximum number of data points returned from a single GetMetricStatistics request is 1,440. So you can't, for example, query data for a week in periods of 5 minutes (because that would be 2,016 data points).
  2. To get the billing metrics you can request the total estimated charges across all services, or the estimated charges per service, as stated here.

This Java 8 sample retrieves the total estimated charges across all services for the last two weeks, in periods of twelve hours.

import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClient;
import com.amazonaws.services.cloudwatch.model.Datapoint;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsResult;
import java.util.Collections;
import java.util.Date;

public class AWSCloudWatchBillingService { 

    public static void main(String[] args) {
        final String awsAccessKey = "<YOUR_AWS_ACCESS_KEY>";
        final String awsSecretKey = "<YOUR_AWS_SECRET_ACCESS_KEY>";

        final AmazonCloudWatchClient client = client(awsAccessKey, awsSecretKey);
        final GetMetricStatisticsRequest request = request(); 
        final GetMetricStatisticsResult result = result(client, request);
        printIt(result);   
    }

    private static AmazonCloudWatchClient client(final String awsAccessKey, final String awsSecretKey) {
        final AmazonCloudWatchClient client = new AmazonCloudWatchClient(new BasicAWSCredentials(awsAccessKey, awsSecretKey));
        client.setEndpoint("http://monitoring.us-east-1.amazonaws.com/");
        return client;
    }

    private static GetMetricStatisticsRequest request() {
        final long twoWeeks = 1000L * 60 * 60 * 24 * 14; // 14 days in milliseconds
        final int twelveHours = 60 * 60 * 12;
        return new GetMetricStatisticsRequest()
            .withStartTime(new Date(new Date().getTime() - twoWeeks))
            .withNamespace("AWS/Billing")
            .withPeriod(twelveHours)
            .withDimensions(new Dimension().withName("Currency").withValue("USD"))
            .withMetricName("EstimatedCharges")
            .withStatistics("Average", "Maximum")
            .withEndTime(new Date());
    }

    private static GetMetricStatisticsResult result(
            final AmazonCloudWatchClient client, final GetMetricStatisticsRequest request) {
         return client.getMetricStatistics(request);
    }

    private static void printIt(final GetMetricStatisticsResult result) {
        Collections.sort(result.getDatapoints(), (Datapoint dp1, Datapoint dp2) -> dp1.getTimestamp().compareTo(dp2.getTimestamp()));
        System.out.println("**************************************"); 
        System.out.println(result);
    }
}

Question:

Is there any official java.nio.file implementation for AWS?

I found one for GoogleCloudStorage here, and need similar for AWS and Azure.


Answer:

You can try the Amazon AWS S3 FileSystem Provider, a JSR-203 (NIO.2) implementation for Java 7+.

Download from Maven Central

<dependency>
    <groupId>com.upplication</groupId>
    <artifactId>s3fs</artifactId>
    <version>2.2.2</version>
</dependency>

Add a new line to META-INF/services/java.nio.file.spi.FileSystemProvider (create the file if it does not exist yet) with this content: com.upplication.s3fs.S3FileSystemProvider.

Use this code to create the file system and point it at a concrete endpoint:

FileSystems.newFileSystem("s3:///", new HashMap<String,Object>(), Thread.currentThread().getContextClassLoader());

How to use in Apache MINA

public FileSystemFactory createFileSystemFactory(String bucketName) throws IOException, URISyntaxException {
    Map<String, Object> env = new HashMap<>(); // provider configuration (e.g. credentials), if any
    FileSystem fileSystem = FileSystems.newFileSystem(new URI("s3:///"), env, Thread.currentThread().getContextClassLoader());
    Path bucketPath = fileSystem.getPath("/" + bucketName);

    return new VirtualFileSystemFactory(bucketPath);
}

How to use in Spring

Add to classpath and configure:

@Configuration
public class AwsConfig {

    @Value("${upplication.aws.accessKey}")
    private String accessKey;

    @Value("${upplication.aws.secretKey}")
    private String secretKey;

    @Bean
    public FileSystem s3FileSystem() throws IOException {
        Map<String, String> env = new HashMap<>();
        env.put(com.upplication.s3fs.AmazonS3Factory.ACCESS_KEY, accessKey);
        env.put(com.upplication.s3fs.AmazonS3Factory.SECRET_KEY, secretKey);

        return FileSystems.newFileSystem(URI.create("s3:///"), env, Thread.currentThread().getContextClassLoader());
    }
}

Inject in any spring component:

@Autowired
private FileSystem s3FileSystem;

Question:

In Amazon S3, I have created one bucket and, under that bucket, multiple subfolders like <bucket_name>/<year>/<month>/<day>/files (i.e. objects).

I want functionality where, on request, I can bulk-download objects by year/month/day, with all the files in a zip.

Is there any way I can do this with the Amazon Java SDK?


Answer:

The Transfer Manager library provides a MultipleFileDownload operation that downloads an entire virtual directory. The contents, however, are not zipped.

See: MultipleFileDownload javadoc

Since your objects are organized by year/month/day prefixes, you could use this method to download all the files under a specific prefix, as sketched below; the files would still not be zipped.
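
A minimal sketch of that with the v1 Java SDK (the bucket, prefix, and target directory are placeholders of mine); any zipping would have to be done locally after the download completes:

import java.io.File;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.transfer.MultipleFileDownload;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

public class BulkDownloadExample {
    public static void main(String[] args) throws InterruptedException {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        TransferManager tm = TransferManagerBuilder.standard().withS3Client(s3).build();
        try {
            // Downloads every object under <bucket>/<year>/<month>/<day>/ into the local directory,
            // preserving the key structure as sub-directories.
            MultipleFileDownload download =
                    tm.downloadDirectory("my-bucket", "2017/05/31", new File("/tmp/s3-export"));
            download.waitForCompletion();
        } finally {
            tm.shutdownNow();
        }
    }
}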

Question:

I have a project written in Java with multiple Beam pipelines in it that I compile to a JAR file for execution on a server. Everything currently works when I'm just reading from GCP resources, but I just added a pipeline that writes to S3. The S3 part works independently, but now when I try to run the other pipelines that only use GCP, it throws an exception because I'm not providing S3 options (even though I don't need them) - see the error message below. It seems a little off that I need to specify an AWS region when I'm only using GCP resources (or maybe I'm doing something wrong). Is there a way to only register the filesystems that I'm using for a specific pipeline, rather than blanket-registering all filesystems on initialization?

INFO: The AWS S3 Beam extension was included in this build, but the awsRegion flag was not specified. If you don't plan to use S3, then ignore this message.

It logs this (above) info message as if it were possible to ignore the missing AWS region, but then throws an exception (below).

Exception in thread "main" com.amazonaws.SdkClientException: Could not find region information for 'null' in SDK metadata.

I'm packaging my JAR file using Maven, then I execute a pipeline by passing in the specific main class for that pipeline (i.e. ). Here is the stack trace I get when I try to run my pipeline that does not use AWS or S3 at all, only GCP.

Jan 08, 2019 4:14:00 PM org.apache.beam.sdk.io.aws.s3.S3FileSystem <init>
INFO: The AWS S3 Beam extension was included in this build, but the awsRegion flag was not specified. If you don't plan to use S3, then ignore this message.
Exception in thread "main" com.amazonaws.SdkClientException: Could not find region information for 'null' in SDK metadata.
    at com.amazonaws.client.builder.AwsClientBuilder.getRegionObject(AwsClientBuilder.java:256)
    at com.amazonaws.client.builder.AwsClientBuilder.withRegion(AwsClientBuilder.java:243)
    at org.apache.beam.sdk.io.aws.s3.DefaultS3ClientBuilderFactory.createBuilder(DefaultS3ClientBuilderFactory.java:42)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystem.<init>(S3FileSystem.java:112)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystemRegistrar.fromOptions(S3FileSystemRegistrar.java:39)
    at org.apache.beam.sdk.io.FileSystems.verifySchemesAreUnique(FileSystems.java:489)
    at org.apache.beam.sdk.io.FileSystems.setDefaultPipelineOptions(FileSystems.java:479)
    at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:47)
    at org.apache.beam.sdk.Pipeline.create(Pipeline.java:145)
    at foo.GCSPipeline.runGCSPipeline(GCSPipeline.java:192)
    at foo.GCSPipeline.main(GCSPipeline.java:239)

Answer:

This is a bug, which is being tracked here: https://issues.apache.org/jira/browse/BEAM-6266
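
The JIRA issue above is the authoritative reference. As a possible stopgap while it is open (my assumption, not part of the linked report), supplying a placeholder region through Beam's AWS options may let the GCP-only pipelines start, since the failure is just the S3 filesystem registrar seeing a null region:

import org.apache.beam.sdk.io.aws.options.AwsOptions;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class PipelineOptionsWorkaround {
    // Sketch only: give the S3 filesystem registrar a region so it can be constructed,
    // even though this pipeline never touches S3.
    public static PipelineOptions withDummyAwsRegion(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
        options.as(AwsOptions.class).setAwsRegion("us-east-1"); // any valid region string
        return options;
    }
}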

Question:

This code shows how I create a CloudFront origin access identity (OAI), a bucket that will hold my webapp, and how I assign the bucket policy in order to only allow access to the bucket from the CloudFront distribution.

Given this scenario, what is really surprising (and annoying) is that this code works if I debug it line by line within Eclipse, but if I try to launch it without going line by line (i.e. setting a breakpoint just after the policy assignment), the exception below appears...

Hope someone can help!

String myBucket = transferManager.getAmazonS3Client().createBucket(new CreateBucketRequest("my-bucket-name")).getName();

CloudFrontOriginAccessIdentity myOAI = cloudFrontClient.createCloudFrontOriginAccessIdentity(
                        new CreateCloudFrontOriginAccessIdentityRequest().withCloudFrontOriginAccessIdentityConfig(
                                new CloudFrontOriginAccessIdentityConfig().withCallerReference(UUID.randomUUID().toString()).withComment("myOAI"))).getCloudFrontOriginAccessIdentity();

//*ATTEMPT 1: Using canonical user Id*
transferManager.getAmazonS3Client().setBucketPolicy(myBucketName, new Policy().
withId("MyPolicyForCloudFrontPrivateContent").
withStatements(new Statement(Effect.Allow).
withId("Grant CloudFront Origin Identity access to support private content").
withActions(S3Actions.GetObject).
withPrincipals(new Principal("CanonicalUser:" + myOAI.getS3CanonicalUserId())).
withResources(new S3ObjectResource(myBucketName,"*"))).toJson());

//*ATTEMPT 2: Using OAI id*
transferManager.getAmazonS3Client().setBucketPolicy(myBucketName, new Policy().
withId("MyPolicyForCloudFrontPrivateContent").
withStatements(new Statement(Effect.Allow).
withActions(S3Actions.GetObject).
withPrincipals(new Principal("arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity " + myOAI.getId())).
withResources(new S3ObjectResource(myBucketName,"*"))).toJson());

//*ATTEMPT 3: HARDCODING THE POLICY*
String myPolicy = "{\"Version\":\"2012-10-17\",\"Id\":\"PolicyForCloudFrontPrivateContent\",\"Statement\":[{\"Sid\":\" Grant a CloudFront Origin Identity access to support private content\",\"Effect\":\"Allow\",\"Principal\":{\"CanonicalUser\":\"" + myOAI.getS3CanonicalUserId() + "\"},\"Action\":\"s3:GetObject\",\"Resource\":\"arn:aws:s3:::" + myBucketName + "/*\"}]}";
transferManager.getAmazonS3Client().setBucketPolicy(myBucketName, myPolicy);


//*ERROR MESSAGE*

Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Invalid principal in policy (Service: Amazon S3; Status Code: 400; Error Code: MalformedPolicy; Request ID: XXXXXXXXXXXXX), S3 Extended Request ID: YYYYYYYYYYYYYYYYYYYYYY+XXXXXXXXXXXXXXXXXXXXXXX=
    at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1088)
    at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:735)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:461)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:296)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3737)
    at com.amazonaws.services.s3.AmazonS3Client.setBucketPolicy(AmazonS3Client.java:2372)
    at com.myapp.services.DeploymentService.applyVersion(DeploymentService.java:234)
    at com.myapp.services.DeploymentService.launch(DeploymentService.java:3553)
    at com.myapp.EntryPoint.main(EntryPoint.java:35)

Answer:

Found the problem...

It looks like when you create a CloudFront Origin Access Identity (OAI) and try to reference it in a bucket policy immediately, the error appears because the new OAI has not been propagated yet.

A valid workaround is to implement a retry condition:

class CloudFrontRetryCondition implements RetryCondition {
    @Override
    public boolean shouldRetry(AmazonWebServiceRequest originalRequest, AmazonClientException exception, int retriesAttempted) {
        if(exception instanceof AmazonS3Exception) {
            final AmazonS3Exception s3Exception = (AmazonS3Exception) exception;
            return  s3Exception.getStatusCode() == 400 &&
                    s3Exception.getErrorCode().equals("MalformedPolicy") &&
                    s3Exception.getErrorMessage().equals("Invalid principal in policy") &&
                    s3Exception.getAdditionalDetails().get("Detail").contains("arn:aws:iam::cloudfront:user/CloudFront Origin Access Identity");
        } else {
            return false;
        }
    }
}
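
To take effect, the condition has to be wired into the S3 client's retry policy. A sketch (the retry count, back-off choice, and use of the client builder are my own choices, not from the original answer):

import com.amazonaws.ClientConfiguration;
import com.amazonaws.retry.PredefinedRetryPolicies;
import com.amazonaws.retry.RetryPolicy;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3ClientWithOaiRetry {
    public static AmazonS3 build() {
        // Retry setBucketPolicy while the freshly created OAI propagates.
        RetryPolicy retryPolicy = new RetryPolicy(
                new CloudFrontRetryCondition(),
                PredefinedRetryPolicies.DEFAULT_BACKOFF_STRATEGY,
                10,    // number of retries before giving up
                true);
        ClientConfiguration config = new ClientConfiguration().withRetryPolicy(retryPolicy);
        return AmazonS3ClientBuilder.standard()
                .withClientConfiguration(config)
                .build();
    }
}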

Question:

To get a message when a key is not present in the S3 bucket, I am retrieving all the objects in that bucket and matching their keys against the given search key. If a match is found I return the URL string, otherwise I return the message 'The specified key does not exist'.

Is there any other way to improve performance when accessing a key that is not available in the S3 bucket?

Here is my Code:

public class S3Objects {
    static Properties props = new Properties();
    static InputStream resourceAsStream;
    static {
        ClassLoader classLoader = new S3Objects().getClass().getClassLoader();
        resourceAsStream = classLoader.getResourceAsStream("aws.properties");
        try {
            props.load(resourceAsStream);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws IOException, AmazonServiceException, AmazonClientException, InterruptedException {
        AWSCredentials awsCreds = new 
                        BasicAWSCredentials(props.getProperty("accessKey"), props.getProperty("secretKey"));
                        // PropertiesCredentials(resourceAsStream);
        AmazonS3 s3Client = new AmazonS3Client( awsCreds );

        String s3_BucketName = props.getProperty("bucketname");
        String folderPath_fileName = props.getProperty("path");

        //uploadObject(s3Client, s3_BucketName, folderPath_fileName);
        //downloadObject(s3Client, s3_BucketName, folderPath_fileName);
        //getSignedURLforS3File(s3Client, s3_BucketName, folderPath_fileName);
        String url = getSingnedURLKey(s3Client, s3_BucketName, folderPath_fileName);
        System.out.println("Received response:"+url);
    }
    //  <MaxKeys>1000</MaxKeys>
    private static String getSingnedURLKey(AmazonS3 s3Client, String s3_BucketName, String folderPath_fileName) {
        String folderPath = folderPath_fileName.substring(0,folderPath_fileName.lastIndexOf("/"));      
        ObjectListing folderPath_Objects = s3Client.listObjects(s3_BucketName, folderPath);

        List<S3ObjectSummary> listObjects = folderPath_Objects.getObjectSummaries();
        for(S3ObjectSummary object : listObjects){
            if(object.getKey().equalsIgnoreCase(folderPath_fileName)){
                return getSignedURLforS3File(s3Client, s3_BucketName, folderPath_fileName);
            }
        }
        return "The specified key does not exist.";
    }

    //  providing pre-signed URL to access an object w/o any AWS security credentials.
   //   Pre-Signed URL = s3_BucketName.s3.amazonaws.com/folderPath_fileName?AWSAccessKeyId=XX&Expires=XX&Signature=XX
    public static String getSignedURLforS3File(AmazonS3 s3Client, String s3_BucketName, String folderPath_fileName){
        GeneratePresignedUrlRequest request = new GeneratePresignedUrlRequest(s3_BucketName, folderPath_fileName, HttpMethod.GET);
        request.setExpiration( new Date(System.currentTimeMillis() + 1000 * 60 * 15) ); // Default 15 min

        String url = s3Client.generatePresignedUrl( request ).toString();
        System.out.println("Pre-Signed URL = " + url);
        return url;
    }

    public static void uploadObject(AmazonS3 s3Client, String s3_BucketName, String folderPath_fileName) 
            throws AmazonServiceException, AmazonClientException, InterruptedException{
        TransferManager tm = new TransferManager(s3Client);

        PutObjectRequest putObjectRequest = 
                new PutObjectRequest(s3_BucketName, folderPath_fileName, new File("newImg.jpg"));
        Upload myUpload = tm.upload( putObjectRequest );
        myUpload.waitForCompletion();//block the current thread and wait for your transfer to complete.
        tm.shutdownNow();            //to release the resources once the transfer is complete.
    }
   //   When accessing a key which is not available in S3, it throws an exception The specified key does not exist.
    public static void downloadObject(AmazonS3 s3Client, String s3_BucketName, String folderPath_fileName) throws IOException{
        GetObjectRequest request = new GetObjectRequest(s3_BucketName,folderPath_fileName);
        try{
            S3Object s3object = s3Client.getObject( request );
            System.out.println("Content-Type: " + s3object.getObjectMetadata().getContentType());
            S3ObjectInputStream objectContent = s3object.getObjectContent();

            FileUtils.copyInputStreamToFile(objectContent, new File("targetFile.jpg"));
        }catch(AmazonS3Exception s3){
            System.out.println("Received error response:"+s3.getMessage());
        }
    }

}

aws.properties

accessKey   =XXXXXXXXX
secretKey   =XXXXXXXXX

bucketname  =examplebucket
path        =/photos/2006/February/sample.jpg

Please let me know whether there is any other way to reduce the number of iterations over all the keys and still get a 'key does not exist' message.

When I am requesting a key to generate a pre-signed URL:

  • Key present: return the signed URL.
  • Key not present: return a message that the key is not available.

Answer:

Use getObjectMetadata to quickly determine whether an object exists, given the key. If it succeeds, the object exists. If it doesn't, inspect the error to confirm it wasn't a transient error that needs to be retried. If not, there's no such key.

Iterating through the objects as you are doing not only doesn't scale, it's also substantially more expensive, since list requests carry a higher price per request than getting an object or getting its metadata, which should be very fast. This operation sends S3 an HTTP HEAD request, which returns 200 OK only if the object is there.
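
As a minimal sketch, a method like this could replace getSingnedURLKey in the S3Objects class above (the method name and the 404 handling are my assumptions about how you want to surface "not found"):

// Pre-signs the URL only if the object exists; one HEAD request instead of listing the bucket.
private static String getSignedURLIfPresent(AmazonS3 s3Client, String s3_BucketName, String folderPath_fileName) {
    try {
        s3Client.getObjectMetadata(s3_BucketName, folderPath_fileName); // HTTP HEAD
        return getSignedURLforS3File(s3Client, s3_BucketName, folderPath_fileName);
    } catch (AmazonS3Exception e) {
        if (e.getStatusCode() == 404) {
            return "The specified key does not exist.";
        }
        throw e; // other status codes may be transient and worth retrying
    }
}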

However, I would argue from a design perspective that this service shouldn't really care whether the object exists. Why would you receive requests for objects that don't exist? Who's asking for that? That should be the caller's problem -- and if you generate a signed URL for an object that doesn't exist, the request will fail with an error, when the caller tries to use the URL... But generating a signed URL for a non-existent object is a perfectly valid operation. The URL can be signed before the object is actually uploaded, and, as long as the URL hasn't expired, it will still work once the object is created, if it's created later.

Question:

I recently upgraded my SpringCloud project from Brixton to Finchley and everything was working just fine. I was working on Finchley.SR2 and I had no problems, but whenever I upgrade my project to Finchley.RELEASE (and this is the only change I make), the project fails to start.

The reason is that the project cannot find the AmazonS3Client bean:

...Unsatisfied dependency expressed through constructor parameter 0; 
nested exception is 
  org.springframework.beans.factory.NoSuchBeanDefinitionException: 
    No qualifying bean of type 'com.amazonaws.services.s3.AmazonS3Client' available: 
      expected at least 1 bean which qualifies as autowire candidate. 
        Dependency annotations: {}

These are my previous relevant configurations and classes:

build.gradle

buildscript {
    ext {
        springBootVersion = '2.0.2.RELEASE'
    }

    ...

    dependencies {
        classpath("org.springframework.boot:spring-boot-gradle-plugin:${springBootVersion}")
        classpath('io.spring.gradle:dependency-management-plugin:1.0.5.RELEASE')
    }
}

apply plugin: 'java'
apply plugin: 'org.springframework.boot'
apply plugin: 'io.spring.dependency-management'

dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Finchley.SR2"
    }
}

dependencies {
    ...
    compile('org.springframework.boot:spring-boot-starter-web')
    compile('org.springframework.cloud:spring-cloud-starter-aws')
    compile('org.springframework.cloud:spring-cloud-starter-config')
    ...
}

...

S3Config.java (The class that creates the AmazonS3/AmazonS3Client Bean)

...

@Configuration
public class S3Config {

    @Bean
    public AmazonS3 amazonS3() {
        return AmazonS3ClientBuilder.standard()
                .withCredentials(new DefaultAWSCredentialsProviderChain())
                .build();
    }
}

StorageService (the class that fails to find the Bean)

...

@Service
public class StorageService {

    private final AmazonS3Client amazonS3Client;

    @Autowired
    public StorageService(AmazonS3Client amazonS3Client) {
        this.amazonS3Client = amazonS3Client;
    }

    ...
}

And this is the only change I make to the build.gradle file when upgrading to Finchley.RELEASE:

dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Finchley.RELEASE"
    }
}

I've tried looking for any missing library and tweaking all the configurations I can find, but none seem to take any effect.


Answer:

After a brief talk with the Spring maintainers, a solution was found.

It seems I was at fault by assuming that a Bean of AmazonS3 should always be found as an AmazonS3Client Bean just because one implements the other. It was just pure luck that it worked on previous Spring versions.

The proper way to create an AmazonS3Client would be the following:

@Configuration
public class S3Config {

    @Bean
    public static AmazonS3Client amazonS3Client() {
        return (AmazonS3Client) AmazonS3ClientBuilder.standard()
                .withCredentials(new DefaultAWSCredentialsProviderChain())
                .build();
    }
}
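
An alternative that avoids the cast entirely (not from the original answer, just the flip side of the same insight) is to keep the @Bean returning AmazonS3 and depend on that interface in the service:

@Service
public class StorageService {

    private final AmazonS3 amazonS3; // the interface type actually exposed by the @Bean

    @Autowired
    public StorageService(AmazonS3 amazonS3) {
        this.amazonS3 = amazonS3;
    }

    ...
}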

Question:

How do I upload a document from Amazon S3 to CloudSearch using the AWS Java SDK, and make it indexable as well?


Answer:

I did not find any direct way within the SDK to index a document from S3 into CloudSearch, so I did it like this.

Question:

I have a Spring Batch Job that reads a bunch of files from an S3 bucket, processes them and then sends the data to a database, all in a multi-threaded configuration. The application.properties file contains this:

cloud.aws.credentials.accessKey=accessKey 
cloud.aws.credentials.secretKey=secret
cloud.aws.region.static=us-east-1
cloud.aws.credentials.instanceProfile=true 
cloud.aws.stack.auto=false

My ItemReader:

@Bean
ItemReader<DataRecord> itemReader() {
    FlatFileItemReader<DataRecord> flatFileItemReader = new FlatFileItemReader<>();
    flatFileItemReader.setLinesToSkip(0);
    flatFileItemReader.setLineMapper(new DataRecord.DataRecordLineMapper());
    flatFileItemReader.setSaveState(false);

    MultiResourceItemReader<DataRecord> multiResourceItemReader = new MultiResourceItemReader<>();
    multiResourceItemReader.setDelegate(flatFileItemReader);
    multiResourceItemReader.setResources(loadS3Resources(null, null));
    multiResourceItemReader.setSaveState(false);

    SynchronizedItemStreamReader<DataRecord> itemStreamReader = new SynchronizedItemStreamReader<>();
    itemStreamReader.setDelegate(multiResourceItemReader);
    return itemStreamReader;
}

And my TaskExecutor:

@Bean
TaskExecutor taskExecutor() {
    ThreadPoolTaskExecutor threadPoolTaskExecutor = new ThreadPoolTaskExecutor();
    threadPoolTaskExecutor.setCorePoolSize(Runtime.getRuntime().availableProcessors());
    return threadPoolTaskExecutor;
}

The Job consists of only one Step, which reads from the files, processes them and then writes to the DB. Under this configuration, the resources are loaded, the Job starts and the step processes the first ~240k lines of the first Resource (there are 7 Resources, each with 1.2M lines). Then I get the following exception:

org.springframework.batch.item.file.NonTransientFlatFileException: Unable to read from resource: [Amazon s3 resource [bucket='my-bucket' and object='output/part-r-00000']]
at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:220) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.item.file.FlatFileItemReader.doRead(FlatFileItemReader.java:173) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.item.support.AbstractItemCountingItemStreamItemReader.read(AbstractItemCountingItemStreamItemReader.java:88) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.item.file.MultiResourceItemReader.readFromDelegate(MultiResourceItemReader.java:140) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.item.file.MultiResourceItemReader.readNextItem(MultiResourceItemReader.java:119) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.item.file.MultiResourceItemReader.read(MultiResourceItemReader.java:108) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.item.support.SynchronizedItemStreamReader.read(SynchronizedItemStreamReader.java:55) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider.doRead(SimpleChunkProvider.java:91) ~[spring-batch-core-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider.read(SimpleChunkProvider.java:157) ~[spring-batch-core-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider$1.doInIteration(SimpleChunkProvider.java:116) ~[spring-batch-core-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:374) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:215) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:144) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.core.step.item.SimpleChunkProvider.provide(SimpleChunkProvider.java:110) ~[spring-batch-core-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.core.step.item.ChunkOrientedTasklet.execute(ChunkOrientedTasklet.java:69) ~[spring-batch-core-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:406) ~[spring-batch-core-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:330) ~[spring-batch-core-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:133) ~[spring-tx-4.3.9.RELEASE.jar!/:4.3.9.RELEASE]
at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:271) ~[spring-batch-core-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:81) ~[spring-batch-core-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at org.springframework.batch.repeat.support.TaskExecutorRepeatTemplate$ExecutingRunnable.run(TaskExecutorRepeatTemplate.java:262) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_65]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_65]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
Caused by: javax.net.ssl.SSLException: SSL peer shut down incorrectly
at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:596) ~[na:1.8.0_65]
at sun.security.ssl.InputRecord.read(InputRecord.java:532) ~[na:1.8.0_65]
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973) ~[na:1.8.0_65]
at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930) ~[na:1.8.0_65]
at sun.security.ssl.AppInputStream.read(AppInputStream.java:105) ~[na:1.8.0_65]
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137) ~[httpcore-4.4.6.jar!/:4.4.6]
at org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:198) ~[httpcore-4.4.6.jar!/:4.4.6]
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176) ~[httpcore-4.4.6.jar!/:4.4.6]
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135) ~[httpclient-4.5.3.jar!/:4.5.3]
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) ~[aws-java-sdk-core-1.11.125.jar!/:na]
at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180) ~[aws-java-sdk-core-1.11.125.jar!/:na]
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) ~[aws-java-sdk-core-1.11.125.jar!/:na]
at com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:117) ~[aws-java-sdk-s3-1.11.125.jar!/:na]
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) ~[aws-java-sdk-core-1.11.125.jar!/:na]
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) ~[aws-java-sdk-core-1.11.125.jar!/:na]
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) ~[aws-java-sdk-core-1.11.125.jar!/:na]
at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180) ~[aws-java-sdk-core-1.11.125.jar!/:na]
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) ~[aws-java-sdk-core-1.11.125.jar!/:na]
at com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107) ~[aws-java-sdk-core-1.11.125.jar!/:na]
at com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72) ~[aws-java-sdk-core-1.11.125.jar!/:na]
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) ~[na:1.8.0_65]
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) ~[na:1.8.0_65]
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[na:1.8.0_65]
at java.io.InputStreamReader.read(InputStreamReader.java:184) ~[na:1.8.0_65]
at java.io.BufferedReader.fill(BufferedReader.java:161) ~[na:1.8.0_65]
at java.io.BufferedReader.readLine(BufferedReader.java:324) ~[na:1.8.0_65]
at java.io.BufferedReader.readLine(BufferedReader.java:389) ~[na:1.8.0_65]
at org.springframework.batch.item.file.FlatFileItemReader.readLine(FlatFileItemReader.java:201) ~[spring-batch-infrastructure-3.0.7.RELEASE.jar!/:3.0.7.RELEASE]
... 23 common frames omitted

I would like to know if there's a simple way to fix this. Currently I'm thinking of just making a local copy of the files and then reading from those, but I would like to know if this exception can be avoided by some configuration.

Thanks!


Answer:

My guess would be one thread closing the S3 (HTTPS) connection while another thread is still reading from it.

Better to use the MultiResourcePartitioner to create a partition per resource (file) and then have the reader pick up each file separately as its own partition. With that configuration, you don't need the MultiResourceItemReader anymore either (you can go straight to the delegate).

Refer example here https://github.com/spring-projects/spring-batch/blob/master/spring-batch-samples/src/main/resources/jobs/partitionFileJob.xml

Also refer How to apply partitioned count for MultiResourceItemReader?
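
A rough Java-config sketch of that layout (bean names are mine, it assumes the question's loadS3Resources helper and line mapper can be reused, and imports are omitted):

@Bean
public Partitioner filePartitioner() {
    MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
    partitioner.setResources(loadS3Resources(null, null)); // same S3 resources as before
    return partitioner;
}

@Bean
public Step masterStep(StepBuilderFactory steps, Step workerStep, TaskExecutor taskExecutor) {
    return steps.get("masterStep")
            .partitioner("workerStep", filePartitioner())
            .step(workerStep)              // the existing chunk-oriented step, run once per file
            .taskExecutor(taskExecutor)
            .build();
}

@Bean
@StepScope
public FlatFileItemReader<DataRecord> reader(
        @Value("#{stepExecutionContext['fileName']}") Resource resource) {
    FlatFileItemReader<DataRecord> reader = new FlatFileItemReader<>();
    reader.setResource(resource);          // each partition reads exactly one S3 object
    reader.setLineMapper(new DataRecord.DataRecordLineMapper());
    return reader;
}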

Question:


Answer:

Finally, I got the answer. I'm using an HTTP PUT request through CloudFront to upload the image/document into the S3 bucket via CloudFront. Here is my code:

URL url;
try {
    url = new URL("**cloudfront URL***" + imagePath);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setDoOutput(true);
    connection.setDoInput(true);
    connection.setRequestMethod("PUT");
    connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
    connection.setRequestProperty("charset", "UTF-8");
    connection.setRequestProperty("Content-Length", imageByteArray.length + "");
    DataOutputStream wr = new DataOutputStream(connection.getOutputStream());
    wr.write(imageByteArray);
    wr.flush();
    wr.close();
    connection.disconnect();
    // Check the HTTP response code. To complete the upload and make the object available,
    // you must interact with the connection object in some way.
    connection.getResponseCode();
    System.out.println("HTTP response code: " + connection.getResponseCode());
} catch (Exception e) {
    e.printStackTrace();
}

Question:

I am currently trying to create a Java application that can call and reference the Amazon AWS API through the AWS Java SDK. I have been able to make calls directly to services like S3 and EC2 but when I try to pull data from Cloudwatch, I am unable to get any datapoints.

I have tried adjusting different variables (dimensions, time ranges) and I have tried pulling the data through the CLI. When I request the data through the CLI, I am able to get data points, but the Java app does not get the same data. Here is my CLI command:

aws cloudwatch get-metric-statistics --metric-name BucketSizeBytes --namespace "AWS/S3" --start-time 2019-06-21T00:00:00Z --end-time 2019-06-22T00:00:00Z --period 3600 --statistics Average --unit Bytes --output json --region us-east-1 --dimensions Name=BucketName,Value=XXXXX Name=StorageType,Value=StandardStorage

Here is what I am using on the Java side. The variable namespace is equal to the string "AWS/S3" and the variable region is set to Region.US_EAST_1

Setting up CloudWatch Client

 private CloudWatchClient cwClient = CloudWatchClient.builder().region(region).build();

Calling the Data

public S3 individualS3BucketSize(S3 s3) {
        Instant now = Instant.now();
        Dimension dimensions = Dimension.builder().name("BucketName").value("XXXXX").name("StorageType").value("StandardStorage").build();

        GetMetricStatisticsRequest request = GetMetricStatisticsRequest.builder().namespace(namespace).metricName("BucketSizeBytes")
            .statistics(Statistic.AVERAGE)
            .startTime(now.minus(Duration.ofDays(1))).endTime(now).period(3600)
            .dimensions(dimensions)
            .build();

        GetMetricStatisticsResponse response;

        response = cwClient.getMetricStatistics(request);
        System.out.println(response.toString());
}

When the method is called and the print method is run, I get:

GetMetricStatisticsResponse(Label=BucketSizeBytes, Datapoints=[])

Any thoughts as to why it is coming back blank in the Java app but not the CLI?


Answer:

The problem with the above code lies in the time range. This specific CloudWatch metric (BucketSizeBytes) is reported only once per day, so it does not return data unless the query spans at least a 1-day window. If you go to the CloudWatch web dashboard, no data is shown unless the time range is set to 1d.

Since the above code had the start and end time within 24 hours of each other, no data points were going to appear. I have revised the code for slight readability improvements and correct functionality.

public S3 individualS3BucketSize(S3 s3) {
        Instant now = Instant.now();
        Instant earlier = now.minusSeconds(259201); //3 Days in the past in seconds
        Statistic stat = Statistic.AVERAGE;
        GetMetricStatisticsResponse response;

        Dimension dimensionsName = Dimension.builder().name("BucketName").value("XXXXX").build();
        Dimension dimensionsStorage = Dimension.builder().name("StorageType").value("StandardStorage").build();

        Collection<Dimension> dimensions = new ArrayList<>();
        dimensions.add(dimensionsName);
        dimensions.add(dimensionsStorage);

        GetMetricStatisticsRequest request = GetMetricStatisticsRequest.builder().namespace(namespace).metricName("BucketSizeBytes")
            .dimensions(dimensions)
            .startTime(earlier).endTime(now).period(3600)
            .unit("Bytes").statistics(stat).build();

        response = cwClient.getMetricStatistics(request);
        System.out.println(response.toString());
        return null;
    }

Question:

I am writing a Java app which sends email via the Amazon SES service, and that works fine. But now I need to retrieve email-sending statistics as granular as a per-email-ID basis.

So I use CloudWatch and pass the notifications to SNS. Yet I cannot figure out how to get the statistics on explicit request to the web service. The SNS endpoints can push the data on an as-needed basis, whereas I want to make an explicit request for the stats from my app.

The S3 service is for storage. Do I need to store the stats on it somehow, so that I can query them later? Any resolutions and details are appreciated.


Answer:

For your requirement, as I understand it, Amazon DynamoDB is the best way. DynamoDB is a NoSQL database. After sending an email you can store the result (email ID and, if you want, a timestamp, etc.) in DynamoDB using SNS or a Lambda function. You can then query DynamoDB to get the statistics.

If you want to go the S3 bucket way, you would have to maintain a single JSON file and overwrite it each time.

Question:

I've got a frontend built with angularJS that speaks to a backend in Java.

I have all code up and running which uploads and downloads images from S3 to my app.

I can also access videos in S3 and display them with the videojs library.

Performance-wise, I've been reading a bit about Amazon CloudFront. I've already implemented the ETag cache in my code and it works well.

However, what is my next step to integrate CloudFront? All the tutorials I've looked into only show static files. Would I need to make additional code changes, or is it only a matter of configuration on my AWS stack? One of the key points I want to achieve and learn is how to make the videos stream instead of having them downloaded fully to the client.


Answer:

CloudFront is a Content Delivery Network (CDN). You create a distribution and you tell it which S3 bucket holds the files you want to serve. Then, if you have a domain name you can add a CNAME cdn.yourdomain.com and map it to the domain offered to you by CloudFront.

Users visiting cdn.yourdomain.com/yourfile will effectively get the cached versions instead of downloading it from S3.

You can also create an RTMP distribution for streaming, which allows users to play the video while it's being downloaded; it uses the Adobe Flash Media RTMP protocol.