[REPORT] Multicloud and on-premises data transfers at scale with AWS DataSync #AWSreInvent #STG353

2023.12.08

I participated in a builders' session on AWS DataSync. In this post, I will briefly introduce the session.

Overview

image

Join this builders’ session to immerse yourself in the world of multi-cloud and on-premises data transfers. Learn how to configure and perform a data transfer from an on-premises NFS server and a publicly accessible Google Cloud Storage bucket that is hosting a public dataset to Amazon S3. AWS DataSync makes it fast and simple to migrate your data from other clouds or on-premises NFS servers to AWS as part of your business workflow. Walk away with a step-by-step guide on how to scale out DataSync tasks using multiple DataSync agents. You must bring your laptop to participate.

REPORT

Agenda

image

Single DataSync task and agent

image

Google Cloud Storage to Amazon S3

image

On premises to Amazon S3

image

Multiple agents for a single task

image

Multiple agents per task

image

Maximize bandwidth and copy large datasets with multiple tasks

image

Multiple tasks scale out agents

image

Workshop

The environment had been prepared in advance by CloudFormation. I started by allowing inbound HTTP (port 80) from my IP to the DataSync agent's security group, which is required for DataSync agent activation.

image
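The same rule can be added outside the console as well. Below is a minimal boto3 sketch; the security group ID and client IP are hypothetical placeholders for the values from the workshop environment.

import boto3

ec2 = boto3.client("ec2")

# Allow inbound HTTP (port 80) from my IP to the agent's security group.
# DataSync uses this port to retrieve the agent's activation key.
ec2.authorize_security_group_ingress(
    GroupId="sg-0123456789abcdef0",  # hypothetical: the agent's security group
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 80,
            "ToPort": 80,
            "IpRanges": [{"CidrIp": "203.0.113.10/32"}],  # hypothetical: MyIP
        }
    ],
)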

Activate DataSync agents

DataSync > Agents > Create agent

image image

Two agents were created, but I did not have time to run the task with both of them.

image
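For reference, the console flow corresponds to a single CreateAgent API call once the activation key has been retrieved from the agent VM. A minimal boto3 sketch with a hypothetical activation key:

import boto3

datasync = boto3.client("datasync")

# Activate the agent with the key obtained from the agent VM over port 80.
response = datasync.create_agent(
    ActivationKey="AAAAA-1BBBB-2CCCC-3DDDD-4EEEE",  # hypothetical key
    AgentName="Agent-1",
)
print(response["AgentArn"])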

Data transfer to AWS from Google Cloud Storage

In this case, we will transfer data from Google Cloud Storage to Amazon S3. We will use a single DataSync agent to start the DataSync task and observe the task metrics.

image

Check the Google Cloud Storage bucket

image

We will transfer these files.

Create DataSync task

DataSync > Tasks > Create task

image

Configure source location

  • Source location options: Create a new location
  • Location type: Object storage
  • Agents: Agent-1
  • Server: storage.googleapis.com
  • Bucket name: gcp-public-data-arco-era5
  • Folder: /co/single-level-reanalysis.zarr/
  • Authentication: Requires credentials unchecked (the bucket is public)
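This form maps to a single CreateLocationObjectStorage API call; Google Cloud Storage is treated as a generic object storage server via its XML API endpoint. A boto3 sketch, where the agent ARN is a hypothetical placeholder:

import boto3

datasync = boto3.client("datasync")

# The public dataset requires no credentials, so AccessKey/SecretKey are omitted.
source = datasync.create_location_object_storage(
    ServerHostname="storage.googleapis.com",
    BucketName="gcp-public-data-arco-era5",
    Subdirectory="/co/single-level-reanalysis.zarr/",
    AgentArns=["arn:aws:datasync:us-west-2:111122223333:agent/agent-0example"],  # hypothetical
)
print(source["LocationArn"])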

Configure destination location

  • Destination location options: Create a new location
  • Location type: Amazon S3
  • S3 bucket: datasync-s3-workshop
  • S3 storage class: Standard
  • Folder: gcp-to-s3-with-single-agent/
  • IAM role: Click Autogenerate button
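The destination side is likewise one CreateLocationS3 call; the account ID and the role created by Autogenerate are hypothetical placeholders:

import boto3

datasync = boto3.client("datasync")

destination = datasync.create_location_s3(
    S3BucketArn="arn:aws:s3:::datasync-s3-workshop",
    Subdirectory="gcp-to-s3-with-single-agent/",
    S3StorageClass="STANDARD",
    S3Config={
        # hypothetical: the role the Autogenerate button creates
        "BucketAccessRoleArn": "arn:aws:iam::111122223333:role/datasync-s3-access"
    },
)
print(destination["LocationArn"])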

Configure settings

  • Task Name: gcp-to-s3-with-single-agent
  • Verify data: Verify only the data transferred
  • Set bandwidth limit: Use available

Configure the data transfer as follows.

Under Specific files and folders, use Add pattern to copy only the files under specific folders whose names begin with a specific prefix:

/stl1/10*
/stl2/10*
/stl3/10*
/stl4/10*
  • Copy object tags: OFF

In Logging, click Autogenerate to create a CloudWatch log group and a resource policy that allows DataSync to write to it.

Review the settings and create the task with Create task.
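Taken together, the settings above correspond to one CreateTask call. A boto3 sketch, where the location, account, and log group ARNs are hypothetical placeholders for the resources created in the previous steps:

import boto3

datasync = boto3.client("datasync")

task = datasync.create_task(
    Name="gcp-to-s3-with-single-agent",
    SourceLocationArn="arn:aws:datasync:us-west-2:111122223333:location/loc-0source",  # hypothetical
    DestinationLocationArn="arn:aws:datasync:us-west-2:111122223333:location/loc-0dest",  # hypothetical
    # "Verify only the data transferred", "Use available" bandwidth, object tags off
    Options={
        "VerifyMode": "ONLY_FILES_TRANSFERRED",
        "BytesPerSecond": -1,
        "ObjectTags": "NONE",
    },
    # One include filter; multiple patterns are delimited by "|"
    Includes=[
        {
            "FilterType": "SIMPLE_PATTERN",
            "Value": "/stl1/10*|/stl2/10*|/stl3/10*|/stl4/10*",
        }
    ],
    CloudWatchLogGroupArn="arn:aws:logs:us-west-2:111122223333:log-group:/aws/datasync",  # hypothetical
)
print(task["TaskArn"])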

Execute the DataSync task

When the task status becomes Available, click Start, and then choose Start with defaults.

image
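Start with defaults is equivalent to StartTaskExecution without any option overrides; a minimal sketch with a hypothetical task ARN:

import boto3

datasync = boto3.client("datasync")

execution = datasync.start_task_execution(
    TaskArn="arn:aws:datasync:us-west-2:111122223333:task/task-0example"  # hypothetical
)
print(execution["TaskExecutionArn"])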

Once the task has been executed, we can check its progress in History.

image

image
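The numbers shown in History can also be polled via DescribeTaskExecution; a sketch with a hypothetical execution ARN:

import time

import boto3

datasync = boto3.client("datasync")

# Poll until the execution finishes; bytes and files transferred are
# the same figures the History tab displays.
arn = "arn:aws:datasync:us-west-2:111122223333:task/task-0example/execution/exec-0example"  # hypothetical
while True:
    ex = datasync.describe_task_execution(TaskExecutionArn=arn)
    print(ex["Status"], ex.get("BytesTransferred"), ex.get("FilesTransferred"))
    if ex["Status"] in ("SUCCESS", "ERROR"):
        break
    time.sleep(30)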

We can see that the data throughput was approximately 202 MB/second. The transfer took about 6 minutes at a rate of about 209 files/second, which works out to roughly 70 GB copied in total.

Let's check whether the data was transferred to the S3 bucket.

image

image

We found that the data was transferred as configured.
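The same check can be done with a quick boto3 listing against the destination folder configured earlier:

import boto3

s3 = boto3.client("s3")

# List a sample of the transferred objects under the destination folder.
resp = s3.list_objects_v2(
    Bucket="datasync-s3-workshop",
    Prefix="gcp-to-s3-with-single-agent/",
    MaxKeys=10,
)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])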

Conclusion

A builders' session is a 60-minute workshop where you can get hands-on with AWS services in a casual way, so when I attend re:Invent, I always choose services that I don't usually work with or ones I want to catch up on. The DataSync session was offered many times as a repeat session, which suggests strong interest from people who want to learn about migration services. Using AWS DataSync, we were able to experience a data transfer in just a few steps.

Resources

image

Scale out data migrations to AWS Storage using AWS DataSync