Data is one of the most valuable raw materials in the world, and it’s not hard to see why. From marketing to genomics, the analysis of large datasets leads to predictive models, which in turn lead to better outcomes for the business. The more data you use, the better these models tend to be, and the better the results they can produce. This means that moving data from one place to another is a crucial skill for any engineer, but it’s not always as easy as it sounds.
For example, if you use AWS S3 bucket storage, copying data to another S3 bucket takes a single CLI command,
aws s3 cp s3://SourceBucket/ s3://DestinationBucket/ --recursive. Moving the same files to a different cloud provider, such as Microsoft Azure or Google Cloud Platform, requires a completely different tool.
By the end of this tutorial, you will be able to sync files from an AWS S3 bucket to an Azure Blob storage container using rclone, an open source data synchronization tool that works with most cloud providers and local file systems.
To follow along, you will need the following:
- An AWS S3 bucket
- An Azure Blob Storage Container
- AWS Access Keys and Azure Storage Account Access Keys
- A computer with any modern operating system (the screenshots are from Windows 10 with WSL)
- Some files to copy
How to configure rclone
Installing rclone is different for each operating system, but once it is installed, the instructions are the same: run
rclone config.
Running the config command prompts you to link your cloud provider accounts to rclone. The rclone term for each linked account is a remote. When you run the config command, enter
n to create a new remote. You’ll need one for both AWS and Azure, but there are plenty of other providers to choose from as well.
After choosing Azure Blob Storage, you will need:
- A name for the remote. (In this demo, it’s “Azure”.)
- The name of the storage account
- One of the storage account access keys
You will be prompted for a Shared Access Signature URL, and while the remote can be configured with one, this demo only uses an access key. After accepting the default for the rest of the values by pressing Enter through the remaining prompts, you should be able to start using the remote.
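The same remote can also be created without the interactive prompts. The following is a minimal sketch, assuming rclone's config create subcommand and the account and key options of the azureblob backend. The function name create_azure_remote and the RCLONE override are illustrative and not part of rclone; the account name and key you pass in are your own values.

```shell
# Sketch: create the Azure Blob Storage remote non-interactively.
# Set RCLONE=echo to print the command instead of running it (illustrative override).
create_azure_remote() {
    # $1 = remote name (e.g. "Azure", as in this demo)
    # $2 = storage account name
    # $3 = one of the storage account access keys
    "${RCLONE:-rclone}" config create "$1" azureblob account "$2" key "$3"
}
```

A call might look like create_azure_remote Azure mystorageaccount "$AZURE_STORAGE_KEY", where mystorageaccount and AZURE_STORAGE_KEY are placeholders. Reading the key from an environment variable keeps it out of your shell history.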
To list the remotes configured on your system, enter
rclone listremotes, which will show the remotes available. You can also list the blob storage containers in a remote by running
rclone lsd <remote>: (for example, rclone lsd Azure:). Be sure to include a
: at the end of the remote name when executing these commands, because this is how rclone distinguishes a remote from a local path. You can run
rclone --help at any time to get the list of available commands.
Listing commands with an rclone remote.
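Because a forgotten colon makes rclone treat the name as a local path, a small wrapper can add it for you. This is a sketch only: remote_root and list_containers are hypothetical helper names, and the RCLONE override exists just so the command can be previewed with echo.

```shell
# Sketch: make sure a bare remote name ends with the colon rclone expects.
remote_root() {
    case "$1" in
        *:*) printf '%s\n' "$1" ;;   # already in remote:path form, leave as-is
        *)   printf '%s:\n' "$1" ;;  # bare name, append the colon
    esac
}

# List the containers (or buckets) at the top level of a remote.
list_containers() {
    "${RCLONE:-rclone}" lsd "$(remote_root "$1")"
}
```

With the demo's remote name, list_containers Azure runs rclone lsd Azure: even though the colon was left off.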
Setting up the remote for an S3 bucket is very similar to setting up the Azure Blob storage container, with only a few minor differences. Because rclone treats several other cloud storage providers as S3 compatible, you may also get some extra prompts while running
rclone config. You will need:
- A name for the remote. (In this demo, it’s “AWS.”)
- An AWS access key and a corresponding secret access key
- The AWS Region in which the bucket is located
The rest of the prompts configure options for creating other buckets or performing other operations, but for copying you can skip them by pressing Enter.
If the user to whom the access keys belong has access to the bucket, you will be able to access it with the same commands you used with the Azure remote.
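The S3 remote can be scripted in the same way. This sketch assumes rclone's config create subcommand and the provider, access_key_id, secret_access_key, and region options of the s3 backend. The function name create_s3_remote and the RCLONE override are illustrative, not part of rclone.

```shell
# Sketch: create the AWS S3 remote non-interactively.
# Set RCLONE=echo to print the command instead of running it (illustrative override).
create_s3_remote() {
    # $1 = remote name (e.g. "AWS", as in this demo)
    # $2 = AWS access key ID
    # $3 = AWS secret access key
    # $4 = region the bucket lives in
    "${RCLONE:-rclone}" config create "$1" s3 \
        provider AWS \
        access_key_id "$2" \
        secret_access_key "$3" \
        region "$4"
}
```

For example, create_s3_remote AWS "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" us-east-1, where the region is only an example value and the two variables are assumed to hold your own keys.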
You can confirm the type of each remote by adding the
--long flag to the
rclone listremotes command.
Now that the remotes are set up, you can transfer files, create new buckets, or manipulate files in any way you want using a standard set of commands. Instead of relying on knowing how to work with the AWS S3 CLI or Azure PowerShell, you can communicate between both storage buckets with rclone.
Some common useful commands to get started are:
- rclone tree: Lists the contents of the bucket in a tree format. Add the -C flag to add color to the output.
- rclone size: Shows the number of files in the bucket and their total size.
- rclone sync: Makes the destination match the source, changing only the destination. Source and destination can be local file paths or rclone remote paths. Add the -P flag to view progress interactively.
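Before syncing, it can help to look at what a bucket holds. The sketch below combines the tree and size commands described above; inspect_bucket is a hypothetical helper name, and the RCLONE override is only there so the commands can be previewed with echo.

```shell
# Sketch: show a bucket's layout, then its object count and total size.
inspect_bucket() {
    # $1 = remote path such as AWS:bucket-name (bucket name is a placeholder)
    "${RCLONE:-rclone}" tree -C "$1"   # tree-style listing with color
    "${RCLONE:-rclone}" size "$1"      # number of objects and total size
}
```

Running inspect_bucket on both the source and the destination before and after a transfer gives a quick check that the counts and sizes line up.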
In the following example, the AWS S3 bucket is synced to the Azure remote, which deletes the existing file in the Azure container and then copies the data from S3. If you need to keep the files already in the destination, use the
rclone copyto command instead.
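Because sync can delete files in the destination, previewing it first is a reasonable habit. The sketch below uses rclone's --dry-run flag for the preview and shows rclone copy as the non-deleting alternative. It uses copy rather than the copyto named in the text; for a directory or bucket source the two behave the same. The function names and the RCLONE override are illustrative, and the bucket and container names in the usage note are placeholders.

```shell
# Sketch: preview what a sync would copy or delete, without changing anything.
preview_sync() {
    "${RCLONE:-rclone}" sync --dry-run "$1" "$2"
}

# Make the destination match the source, showing progress.
apply_sync() {
    "${RCLONE:-rclone}" sync -P "$1" "$2"
}

# Copy new and changed files but leave everything else in the destination alone.
copy_keep() {
    "${RCLONE:-rclone}" copy -P "$1" "$2"
}
```

A typical sequence would be preview_sync AWS:source-bucket Azure:target-container, a look at the reported deletions, and then either apply_sync or copy_keep with the same two arguments.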
At this point, you should be comfortable installing rclone and configuring remotes, as well as using those remotes to copy data between different clouds. rclone is an extremely flexible tool and isn’t limited to just AWS and Azure, so if you’re using another cloud provider, try setting up remotes for them as well.