AWS DataSync

Abimuktheeswaran Chidambaram
4 min read · Jul 3, 2023


AWS DataSync is a secure, automated online data movement service. It moves data from cloud to cloud or from on-premises to the cloud, archives data to free up on-premises storage capacity, and replicates data to the cloud for business continuity.

In this article, we will briefly cover the following topics:

1. Terminologies in DataSync

2. How the AWS DataSync Discovery works

3. How the DataSync Transfer works

4. Features of DataSync

1. Terminologies in DataSync

A DataSync agent is a virtual machine appliance that you download from AWS and install in your on-premises environment. It collects information about your on-premises data for migration.

A DataSync discovery job is the process of collecting that migration information.

DataSync Discovery analyzes the data collected about your on-premises storage, matches it against AWS storage services, and recommends where to migrate the data.

A task describes where and how AWS DataSync transfers data.

A task execution is an individual run of a task. Its status shows how far that run has progressed.

2. How the AWS DataSync Discovery works

The AWS DataSync agent collects data from your on-premises storage systems and sends that information to AWS DataSync Discovery through a public service endpoint. You do not need to deploy an agent to move data between AWS services within the same account. AWS DataSync Discovery analyzes the performance and capacity of the storage systems, the resources shared in their file systems, the data transfer protocols in use, and utilization metrics, and then recommends a migration plan. A discovery job can run for anywhere from 1 hour to 31 days, and AWS recommends running it for at least 14 days. The collected details are kept for up to 60 days and can be viewed in the AWS DataSync console, through the SDK, or through the CLI.
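The duration limits above are easy to get wrong when a job is started programmatically, so a small helper can check them first. This is a minimal sketch: the helper names are hypothetical, and it assumes the request uses the `StorageSystemArn` and `CollectionDurationMinutes` parameters of the boto3 `start_discovery_job` call. It only builds the request and does not contact AWS.

```python
# Limits taken from the article: a discovery job runs from 1 hour to 31 days,
# and at least 14 days is recommended. Helper names are hypothetical.
MIN_MINUTES = 60                     # 1 hour
MAX_MINUTES = 31 * 24 * 60           # 31 days
RECOMMENDED_MINUTES = 14 * 24 * 60   # 14 days


def collection_duration_minutes(days=14.0):
    """Convert a duration in days to minutes and check the allowed range."""
    minutes = int(round(days * 24 * 60))
    if not MIN_MINUTES <= minutes <= MAX_MINUTES:
        raise ValueError(
            f"duration must be between {MIN_MINUTES} and {MAX_MINUTES} minutes, "
            f"got {minutes}"
        )
    return minutes


def discovery_job_request(storage_system_arn, days=14.0):
    """Build the keyword arguments for a discovery job request (no AWS call)."""
    return {
        "StorageSystemArn": storage_system_arn,
        "CollectionDurationMinutes": collection_duration_minutes(days),
    }


if __name__ == "__main__":
    # The ARN below is a placeholder, not a real resource.
    print(discovery_job_request("arn:aws:datasync:region:account:system/example"))
```

With boto3 installed and credentials configured, the resulting dictionary could presumably be passed as `boto3.client("datasync").start_discovery_job(**request)`.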

3. How the DataSync Transfer works

DataSync can transfer data at up to 10 Gbps between on-premises and AWS. If a task is interrupted or fails, the agent restarts and transfers the missing files through an incremental copy, so all files end up transferred correctly. When transferring files, AWS DataSync creates the same directory structure on the destination as on the source. It can run optional verification and integrity checks to ensure that the source and destination hold the same data at the end of the transfer. You can configure what to copy, such as certain folders or specific file types. A recurring transfer moves a defined set of data on a planned schedule, and it may overwrite destination data if the source data has changed. You can schedule a task to run hourly, daily, on specific days of the week, weekly, or on a custom schedule.
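The options described above (what to copy, whether to verify, whether to overwrite, and when to run) map onto the settings of a task. The sketch below shows one way they might be expressed as a request for the boto3 `create_task` call. The function name and the example ARNs are hypothetical, and the chosen defaults are assumptions for illustration, not recommendations. It builds the request only and makes no AWS call.

```python
def build_task_request(source_arn, destination_arn, include_patterns=(),
                       schedule_expression=None, verify=True, overwrite=True):
    """Build keyword arguments for a DataSync task (hypothetical helper)."""
    request = {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Options": {
            # Copy only data that differs between source and destination.
            "TransferMode": "CHANGED",
            # Optional integrity check at the end of the transfer.
            "VerifyMode": "ONLY_FILES_TRANSFERRED" if verify else "NONE",
            # Whether changed source files replace destination files.
            "OverwriteMode": "ALWAYS" if overwrite else "NEVER",
        },
    }
    if include_patterns:
        # Include filters are a single string with patterns separated by "|".
        request["Includes"] = [
            {"FilterType": "SIMPLE_PATTERN", "Value": "|".join(include_patterns)}
        ]
    if schedule_expression:
        request["Schedule"] = {"ScheduleExpression": schedule_expression}
    return request


if __name__ == "__main__":
    # Placeholder ARNs; a daily run at 02:00 UTC limited to two folders.
    print(build_task_request(
        "arn:aws:datasync:region:account:location/source-example",
        "arn:aws:datasync:region:account:location/destination-example",
        include_patterns=["/reports", "/invoices"],
        schedule_expression="cron(0 2 * * ? *)",
    ))
```

Keeping the request construction separate from the API call makes the configuration easy to review before anything is created in an account.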

4. Features of DataSync

For monitoring, you can use Amazon CloudWatch. It collects raw data from AWS DataSync and processes it into metrics every 5 minutes. These statistics are retained for 15 months. You can monitor the performance of your tasks in the DataSync console as well as in the CloudWatch console.
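As a rough illustration of reading those metrics outside the console, the sketch below builds a query for the CloudWatch `get_metric_statistics` call with a 300-second period to match the 5-minute interval. The function name and the task ID are hypothetical, and the `AWS/DataSync` namespace, `BytesTransferred` metric, and `TaskId` dimension are assumed from the DataSync metric names. It builds the query only and makes no AWS call.

```python
from datetime import datetime, timedelta, timezone


def bytes_transferred_query(task_id, hours=1, now=None):
    """Build a CloudWatch query for bytes moved by one task (hypothetical helper)."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/DataSync",
        "MetricName": "BytesTransferred",
        "Dimensions": [{"Name": "TaskId", "Value": task_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,          # 5 minutes, matching the collection interval
        "Statistics": ["Sum"],
    }


if __name__ == "__main__":
    # "task-example" is a placeholder task ID.
    print(bytes_transferred_query("task-example", hours=6))
```

With boto3 available, the dictionary could be passed as `boto3.client("cloudwatch").get_metric_statistics(**query)`.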

You can use Amazon EventBridge events, which describe changes in DataSync resources. DataSync transfer events notify you of changes in the agent state, task state, task execution state, and location state. DataSync Discovery events notify you of changes in the discovery job state and the on-premises storage system state.
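To react to these events, an EventBridge rule needs an event pattern. The sketch below builds one for the transfer events listed above. The function name is hypothetical, and the `detail-type` strings are written from memory of the DataSync event names, so treat them as assumptions and confirm them against the current documentation before relying on them.

```python
import json

# Assumed detail-type strings for DataSync transfer events (verify before use).
TRANSFER_DETAIL_TYPES = {
    "agent": "DataSync Agent State Change",
    "location": "DataSync Location State Change",
    "task": "DataSync Task State Change",
    "task_execution": "DataSync Task Execution State Change",
}


def transfer_event_pattern(*kinds):
    """Return a JSON event pattern for the chosen kinds (all kinds if none given)."""
    selected = kinds or tuple(TRANSFER_DETAIL_TYPES)
    pattern = {
        "source": ["aws.datasync"],
        "detail-type": [TRANSFER_DETAIL_TYPES[kind] for kind in selected],
    }
    return json.dumps(pattern)


if __name__ == "__main__":
    # Match only task execution state changes.
    print(transfer_event_pattern("task_execution"))
```

The resulting string is in the form expected by the `EventPattern` parameter of the EventBridge `put_rule` call.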

For data protection, data in transit is protected with TLS, and data at rest is protected with encryption keys.

If you need a secure and private connection, you can use AWS Direct Connect along with DataSync. If your destination is within a VPC, you can use a VPC endpoint along with AWS PrivateLink, so that the network traffic stays within Amazon VPC.

Last Updated: 07-Jan-2024
