An efficient way to upload and download large files with the AWS S3 library on the client side, at blazing-fast speed


In a typical client-server application, uploads to and downloads from S3 are performed by the client application via the backend application. The AWS library is never used in the client application because it requires an access key and secret key, which cannot be shared with the client. The backend application holds these keys and exposes APIs to perform such operations.

At a high level, the following architecture is used to solve this problem.

Disadvantages of the above architecture:

  1. File content streams to S3 through the backend application, consuming additional network and compute resources.
  2. File size is limited to a maximum of 5 GB, because the backend application uses the single-request AWS S3 PUT API.
  3. A single stream, hence slower uploads and downloads.
  4. No support for AWS S3 Transfer Acceleration.

To avoid these disadvantages, we can use the multipart upload/download feature of the AWS S3 library in the client-side application. This can be done securely: a temporary access key and secret key are generated per operation, so no keys are stored on the client side.

Using AWS S3 library at client-side

AWS provides multipart upload/download functionality through the AWS S3 library and REST APIs. With multipart upload/download, the file is split into multiple parts that are uploaded or downloaded in parallel.
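
As a rough illustration of how a multipart transfer divides the work, the sketch below splits a file into byte ranges (the 8 MiB part size is an assumption; S3 requires every part except the last to be at least 5 MiB):

```python
# Sketch: split a file of a given size into the byte ranges that a
# multipart transfer would move as independent parts, in parallel.
PART_SIZE = 8 * 1024 * 1024  # assumed part size; S3 minimum is 5 MiB

def part_ranges(total_size: int, part_size: int = PART_SIZE):
    """Yield (part_number, start, end) tuples covering the whole file.

    Each range can be fetched with an HTTP Range header (download) or
    sent as one part of a multipart upload.
    """
    part_number = 1
    start = 0
    while start < total_size:
        end = min(start + part_size, total_size) - 1  # inclusive end byte
        yield part_number, start, end
        part_number += 1
        start = end + 1
```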

To use the AWS S3 library on the client side, an access key and secret key are required. However, for security reasons, we cannot ship these keys to the client.

This can be solved if the backend application generates one-time-use access and secret keys with permissions scoped to a specific operation.

AWS provides the Security Token Service (STS) to generate temporary keys for a specific operation. The backend application can leverage STS to generate a one-time-use access key and secret key and share them with the client application. The client application can use these single-use keys to upload/download the file using the AWS S3 library.
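
A backend-side sketch of this idea in Python with boto3. The bucket, key, and session name are placeholders, and `get_federation_token` is one possible STS call (AssumeRole with a session policy would work similarly); 900 seconds is the minimum duration STS accepts:

```python
import json

def scoped_upload_policy(bucket: str, key: str) -> dict:
    """Inline IAM policy allowing only a (multipart) upload of one object."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:AbortMultipartUpload"],
            "Resource": f"arn:aws:s3:::{bucket}/{key}",
        }],
    }

def issue_temporary_keys(bucket: str, key: str, duration: int = 900) -> dict:
    """Call STS from the backend and return the temporary credentials
    (AccessKeyId, SecretAccessKey, SessionToken, Expiration) to hand
    to the client application."""
    import boto3  # AWS SDK; runs with the backend's own credentials
    sts = boto3.client("sts")
    resp = sts.get_federation_token(
        Name="client-upload",  # placeholder session name
        Policy=json.dumps(scoped_upload_policy(bucket, key)),
        DurationSeconds=duration,  # 900 s (15 min) is the STS minimum
    )
    return resp["Credentials"]
```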

At a high level, the following architecture will provide a solution.

Steps to upload/download a file using temporary keys from the client application:

  • The client application requests the backend application to generate temporary keys for a specific operation.
  • The backend application will leverage AWS STS and generate temporary keys for a specific operation.
  • The client application will use temporary keys and the AWS library to perform the upload/download.
  • Configure S3 to invoke a Lambda function for any post-processing, such as checksum calculation or file type detection.
  • On successful upload, Lambda pushes a message to SQS if any post-processing is required.
  • The backend application listens to SQS and performs the post-processing.

Note: Steps 4, 5, and 6 are optional.
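
On the client side, the temporary keys plug straight into the SDK. A Python/boto3 sketch (bucket, path, and key are placeholders; the threshold, chunk size, and concurrency values are assumptions): once the file exceeds the configured threshold, the SDK performs the multipart split and parallel transfer itself.

```python
def session_kwargs(creds: dict) -> dict:
    """Map the STS credential fields returned by the backend to the
    argument names a boto3 client expects."""
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }

def upload_multipart(creds: dict, bucket: str, path: str, key: str) -> None:
    """Upload one file using the SDK's built-in multipart machinery."""
    import boto3
    from boto3.s3.transfer import TransferConfig
    config = TransferConfig(
        multipart_threshold=8 * 1024 * 1024,  # use multipart above 8 MiB
        multipart_chunksize=8 * 1024 * 1024,  # size of each part
        max_concurrency=10,                   # parts uploaded in parallel
    )
    s3 = boto3.client("s3", **session_kwargs(creds))
    s3.upload_file(path, bucket, key, Config=config)
```

The same `TransferConfig` also applies to `download_file`, which is how the multipart download side of this architecture works.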

Sequence diagram:

Advantages of the above architecture:

  1. A secure way to use AWS keys in the client application.
  2. The client application can leverage the multipart upload/download feature available in the AWS S3 library.
  3. Files up to 5 TB can be uploaded.
  4. Fast uploads and downloads using multipart transfer.

Refreshing temporary keys:

When generating temporary keys, the backend application sets the duration for which the keys are valid. Ideally, the backend should keep this duration as low as possible. The minimum value is 15 minutes, which works in most scenarios. However, for very large files (100 GB+), an upload can sometimes take longer than 15 minutes, and the client then needs to refresh the keys and pass the new ones to the AWS client.

Steps to upload/download a file with key refresh from the client application:

  • The client application requests the backend application to generate temporary keys.
  • The backend application will leverage AWS STS and generate temporary keys with 15 minutes of validity.
  • The client application will use temporary keys and the AWS library to perform the upload/download.
  • If upload takes more than 15 minutes, the client application will request the backend application to refresh keys.
  • The backend application will generate new keys and return them to the client.  
  • The client replaces the old keys with the new ones in the signature generation process and continues the upload operation.
  • Configure S3 to invoke a Lambda function for any post-processing, such as checksum calculation or file type detection.
  • On successful upload, Lambda pushes a message to SQS if any post-processing is required.
  • The backend listens to SQS and performs the post-processing.

Having built several such applications, this approach by far ticks all the boxes for me. I hope you find it useful in your upload/download scenarios.



2 Comments

  1. Can we use multipart upload with pre-signed URLs having an expiration time, instead of creating a temporary access key and secret key every time?
    Are there any bottlenecks in multipart upload with pre-signed URLs?
    Thanks

    1. Technically, we can leverage pre-signed URLs for multipart upload. However, the client application needs a pre-signed URL for each part, which can be an overhead for large file uploads. Also, pre-signed URLs are typically used for non-programmatic access or external clients.
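
For reference, a sketch of the pre-signed-URL route discussed above (Python/boto3; the bucket, key, and upload ID are placeholders). The backend has to mint one URL per part of an already-created multipart upload, which is the overhead mentioned in the reply:

```python
def presigned_part_urls(s3_client, bucket: str, key: str,
                        upload_id: str, part_count: int,
                        expires: int = 900) -> list:
    """One pre-signed URL per part. The client then PUTs each part's
    bytes to its URL, and the backend completes the multipart upload
    with the collected ETags."""
    return [
        s3_client.generate_presigned_url(
            "upload_part",
            Params={
                "Bucket": bucket,
                "Key": key,
                "UploadId": upload_id,
                "PartNumber": n,
            },
            ExpiresIn=expires,  # seconds until the URL expires
        )
        for n in range(1, part_count + 1)
    ]
```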
