[REPORT] Multicloud and on-premises data transfers at scale with AWS DataSync #AWSreInvent #STG353


I participated in the Builders' Session for AWS DataSync. In this post, I will briefly introduce the session. Here is the official session abstract:



Join this builders’ session to immerse yourself in the world of multi-cloud and on-premises data transfers. Learn how to configure and perform a data transfer from an on-premises NFS server and a publicly accessible Google Cloud Storage bucket that is hosting a public dataset to Amazon S3. AWS DataSync makes it fast and simple to migrate your data from other clouds or on-premises NFS servers to AWS as part of your business workflow. Walk away with a step-by-step guide on how to scale out DataSync tasks using multiple DataSync agents. You must bring your laptop to participate.




The session covered three scaling patterns, each illustrated with an architecture diagram:

  • Single DataSync task and agent: Google Cloud Storage to Amazon S3, and on premises to Amazon S3
  • Multiple agents for a single task: multiple agents per task
  • Multiple tasks: maximize bandwidth and copy large datasets by scaling out agents across tasks



The environment was prepared in advance of the workshop with CloudFormation. I started by allowing inbound HTTP (port 80) from My IP to the DataSync agent's security group, which is required for DataSync agent activation.
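The same security group change can be scripted. A minimal sketch, assuming a placeholder security group ID (the real ID comes from the CloudFormation stack outputs):

```shell
# Allow inbound HTTP (port 80) from my current IP so the DataSync agent
# can be activated. The security group ID below is a placeholder.
MY_IP=$(curl -s https://checkip.amazonaws.com)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 80 \
  --cidr "${MY_IP}/32"
```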


Activate DataSync agents

DataSync > Agents > Create agent


Two agents were created, but I did not have time to run a task using both of them.
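For reference, agent activation can also be done from the CLI instead of the console. A sketch assuming a placeholder agent IP address and us-east-1 as the activation Region:

```shell
# Fetch the activation key from the agent over HTTP, then register the agent.
# AGENT_IP is a placeholder for the agent VM's address.
AGENT_IP=203.0.113.10
ACTIVATION_KEY=$(curl -s "http://${AGENT_IP}/?gatewayType=SYNC&activationRegion=us-east-1&no_redirect")
aws datasync create-agent \
  --agent-name Agent-1 \
  --activation-key "${ACTIVATION_KEY}"
```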


Data transfer to AWS from Google Cloud Storage

In this case, we will transfer data from Google Cloud Storage to Amazon S3. We will use a single DataSync agent to start the DataSync task and observe the task metrics.


Check the Google Cloud Storage bucket


These are the files we will transfer.

Create DataSync task

DataSync > Tasks > Create task


Configure source location

  • Source location options: Create a new location
  • Location type: Object storage
  • Agents: Agent-1
  • Server: storage.googleapis.com
  • Bucket name: gcp-public-data-arco-era5
  • Folder: /co/single-level-reanalysis.zarr/
  • Authentication: Requires credentials is unchecked
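The same source location can be defined from the CLI. A sketch with a placeholder agent ARN; since the bucket is public, no access key is passed:

```shell
# Object storage location pointing at the public Google Cloud Storage bucket.
# The agent ARN is a placeholder.
aws datasync create-location-object-storage \
  --server-hostname storage.googleapis.com \
  --bucket-name gcp-public-data-arco-era5 \
  --subdirectory /co/single-level-reanalysis.zarr/ \
  --agent-arns arn:aws:datasync:us-east-1:111122223333:agent/agent-0example1234567890
```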

Configure destination location

  • Destination location options: Create a new location
  • Location type: Amazon S3
  • S3 bucket: datasync-s3-workshop
  • S3 storage class: Standard
  • Folder: gcp-to-s3-with-single-agent/
  • IAM role: Click the Autogenerate button
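The destination location maps to a CLI call as well. A sketch where the account ID and role name are placeholders (the console's Autogenerate button creates the bucket access role for you):

```shell
# S3 destination location. The IAM role ARN is a placeholder standing in
# for the role the console autogenerates.
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::datasync-s3-workshop \
  --subdirectory /gcp-to-s3-with-single-agent/ \
  --s3-storage-class STANDARD \
  --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/datasync-s3-access-role
```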

Configure settings

  • Task Name: gcp-to-s3-with-single-agent
  • Verify data: Verify only the data transferred
  • Set bandwidth limit: Use available

Configure the data transfer as follows.

Under Specific files and folders, use Add pattern to copy only the files that begin with a specific folder and file name.

  • Copy object tags: OFF

In Logging, click Autogenerate to create a CloudWatch log group and a resource policy that allows DataSync to write to it.

Check the contents and create a task with Create task.
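Putting the pieces together, task creation corresponds to a single CLI call. A sketch where the location ARNs are placeholders returned by the create-location calls, and the include filter is a hypothetical example (the actual workshop pattern is not reproduced here):

```shell
# Create the task from the two locations. ARNs are placeholders;
# "Verify only the data transferred" maps to VerifyMode=ONLY_FILES_TRANSFERRED.
# The include filter below is a hypothetical example pattern.
aws datasync create-task \
  --name gcp-to-s3-with-single-agent \
  --source-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-0example1111111111 \
  --destination-location-arn arn:aws:datasync:us-east-1:111122223333:location/loc-0example2222222222 \
  --options VerifyMode=ONLY_FILES_TRANSFERRED \
  --includes FilterType=SIMPLE_PATTERN,Value="/example-prefix*"
```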

Execute the DataSync task

When the task status becomes Available, click Start, and then choose Start with defaults.


Once the task has been executed, we can check its progress in History.
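The execution and progress check can also be done from the CLI. A sketch with a placeholder task ARN:

```shell
# Start the task, capture the execution ARN, and check its status.
# The task ARN is a placeholder.
TASK_ARN=arn:aws:datasync:us-east-1:111122223333:task/task-0example1234567890
EXEC_ARN=$(aws datasync start-task-execution \
  --task-arn "${TASK_ARN}" \
  --query TaskExecutionArn --output text)
aws datasync describe-task-execution \
  --task-execution-arn "${EXEC_ARN}" \
  --query '{Status:Status,Files:FilesTransferred,Bytes:BytesTransferred}'
```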



We can see that the data throughput was approximately 202 MB/second. Additionally, the file transfer took about 6 minutes, and it was copied at a rate of 209 files/second.
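As a rough sanity check, the reported rates over about 6 minutes (~360 seconds) imply these totals:

```shell
# Back-of-the-envelope totals implied by the reported transfer rates.
DURATION=360   # ~6 minutes, per the task history
echo "approx MB copied:    $((202 * DURATION))"   # 72720 MB, roughly 73 GB
echo "approx files copied: $((209 * DURATION))"   # 75240 files
```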

Next, check that the data has been transferred to the S3 bucket.



We found that the data was transferred as configured.
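The same check can be done from the CLI, using the bucket and folder configured above:

```shell
# List the destination prefix and show the object count and total size.
aws s3 ls s3://datasync-s3-workshop/gcp-to-s3-with-single-agent/ \
  --recursive --summarize | tail -n 2
```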


A builders' session is a 60-minute workshop where you can easily get hands-on experience with AWS services, so when I attend re:Invent, I always choose services that I don't usually work with or ones I want to catch up on. The DataSync session had many repeat offerings, which suggests strong interest from people who want to learn about migration services. Using AWS DataSync, we were able to experience a data transfer in just a few steps.



Scale out data migrations to AWS Storage using AWS DataSync