Automate File Handling using Python and Boto3 in AWS

What are we trying to do?

We might have a directory that's very cluttered, and we need to organise it and back it up at the same time. We can do this by writing a script that checks each file's extension and uploads the file to a matching subdirectory within a bucket. So, to get started, log in to your AWS account.

The first thing we need is a user that has API keys to access S3. Let's make that user first. To do this we'll create an IAM user. Go to Services and search for IAM. Then click on Users and add a new user. Be sure to give this user programmatic access. This will give us API keys that we can use to access AWS resources.

Now, click on Permissions. We have to add this user to a group with certain permissions attached. Permissions are simply what allows your user to do certain things within the AWS platform. In this example, since our user needs to access S3 buckets, they need full access to S3. So, let's add our user to an S3-Full-Access group that has the AmazonS3FullAccess policy attached.
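If you'd rather script these console steps, the same setup can be done through the IAM API with boto3. This is just a sketch: it assumes you already have credentials with IAM permissions configured locally (for example via the AWS CLI), and the user name 'file-handling-user' is a made-up example.

import boto3

iam = boto3.client('iam')

# Create a group and attach the AWS-managed S3 full access policy to it.
iam.create_group(GroupName='S3-Full-Access')
iam.attach_group_policy(
    GroupName='S3-Full-Access',
    PolicyArn='arn:aws:iam::aws:policy/AmazonS3FullAccess',
)

# Create the user and put it in the group.
iam.create_user(UserName='file-handling-user')
iam.add_user_to_group(GroupName='S3-Full-Access', UserName='file-handling-user')

# Generate the programmatic access keys for the new user.
response = iam.create_access_key(UserName='file-handling-user')
print(response['AccessKey']['AccessKeyId'])
print(response['AccessKey']['SecretAccessKey'])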

Skip the tags and move to the final page. Now you have your access key ID and secret access key. Copy these into a Python script called 'secrets.py'.

access_key = ''         # paste your access key ID between the quotes
secret_access_key = ''  # paste your secret access key between the quotes

Now create a new Python script and import the variables created in secrets.py. I named mine 'automatic_s3_uploader.py'. (Note that a local file called secrets.py shadows Python's standard-library secrets module, which is fine here as long as nothing else in your project needs that module.)

from secrets import access_key, secret_access_key

Import some other useful packages: boto3 and os. Boto3 is the software development kit (SDK) for working with Amazon Web Services from Python, while the os module, part of Python's standard library, provides functions for interacting with the operating system.

import boto3
import os

Just a few more lines of code. Our first step is to create a client so that we can talk to the S3 API.

client = boto3.client('s3',
                      aws_access_key_id=access_key,
                      aws_secret_access_key=secret_access_key)

'aws_access_key_id' and 'aws_secret_access_key' are the keyword arguments that boto3.client takes, and 'access_key' and 'secret_access_key' are the variables that we created in secrets.py. With this, we can now access all the S3 resources that this access key has permission to use, which in our case is S3 full access. So we're able to upload and download files.
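As a quick sanity check, you can list the buckets this key pair can see; AmazonS3FullAccess includes that permission. This assumes at least one bucket already exists in the account.

# List every bucket visible to these credentials to confirm the client works.
response = client.list_buckets()
for bucket in response['Buckets']:
    print(bucket['Name'])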

Now we just need to walk through a directory and, depending on the extension of each file, upload it to S3.

for file in os.listdir():
    if file.endswith('.py'):  # only match files whose extension is .py
        upload_file_bucket = 'file-handling-bucket'
        upload_file_key = 'python/' + file  # path of the object inside the bucket
        client.upload_file(file, upload_file_bucket, upload_file_key)

This means that if a file has the .py extension, it gets uploaded to a bucket we've already created; in my case, I created a bucket named 'file-handling-bucket'. We also need to pass a key, which is just the path of the file inside the bucket. So we are uploading each file under a 'python' subdirectory followed by the file name; for example, a script called example.py would end up at s3://file-handling-bucket/python/example.py.

Finally, we upload the files using the client's upload_file method, which takes three arguments: the first is the local file name, the second is the bucket, and the last is the key.
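If it helps to keep the parameter roles straight, the same call can also be written with boto3's keyword arguments:

client.upload_file(
    Filename=file,              # local path of the file to upload
    Bucket=upload_file_bucket,  # name of the target bucket
    Key=upload_file_key,        # object key, i.e. the path inside the bucket
)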

We only have one if statement here, i.e. for '.py' files. But if your directory contains many different extensions, you can add more branches like this inside the for loop, or drive the loop from a mapping, as sketched below.
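One way to handle many extensions without stacking if statements is a small dictionary mapping extensions to key prefixes. This reuses the client from above, and the extensions and prefixes shown are just examples:

# Hypothetical extension-to-prefix mapping; adjust it to your own files.
extension_map = {
    '.py': 'python/',
    '.txt': 'text/',
    '.jpg': 'images/',
}

bucket = 'file-handling-bucket'

for file in os.listdir():
    _, ext = os.path.splitext(file)  # split the name from its extension
    prefix = extension_map.get(ext)
    if prefix:  # skip files whose extension we don't map
        client.upload_file(file, bucket, prefix + file)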

I hope you like this blog!