DynamoDB Queries and Scanning using Python Boto3


I was working with DynamoDB using Python Boto3 and thought I would share the concepts of 'Query' and 'Scan' with you.

Before we dive into DynamoDB queries and scanning, let's make sure we have the necessary tools and credentials to interact with DynamoDB using Python Boto3.

First, you'll need to have Python installed on your machine. You can download the latest version of Python from the official website https://www.python.org/downloads/.

Next, you'll need to install the Boto3 library, which is the Python SDK for AWS. You can install Boto3 using pip, which is a package manager for Python. Open up your terminal or command prompt and run the following command:

pip install boto3

Once Boto3 is installed, you'll need to configure your AWS credentials. Boto3 reads them from a credentials file and a config file in the .aws directory in your home directory. If you have the AWS CLI installed, you can create both files by running aws configure in your terminal.

Wherever the placeholders YOUR_ACCESS_KEY_ID and YOUR_SECRET_ACCESS_KEY appear in these files, make sure to replace them with your own AWS credentials.

Now that we have Python, Boto3, and our AWS credentials set up, we can start querying and scanning data in DynamoDB.

We'll start by creating a table from Python.

import boto3

dynamodb = boto3.resource('dynamodb')

# Create the DynamoDB table.
table = dynamodb.create_table(
    TableName='employees_total',
    KeySchema=[
        {
            'AttributeName': 'emp_id',
            'KeyType': 'HASH'
        },
        {
            'AttributeName': 'entry_time',
            'KeyType': 'RANGE'
        }
    ],
    AttributeDefinitions=[
        {
            'AttributeName': 'emp_id',
            'AttributeType': 'S'
        },
        {
            'AttributeName': 'entry_time',
            'AttributeType': 'S'
        },
    ],
    ProvisionedThroughput={
        'ReadCapacityUnits': 5,
        'WriteCapacityUnits': 5
    }
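)

# Table creation is asynchronous; wait until the table exists before using it.
table.wait_until_exists()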

In the snippet above, KeySchema refers to the primary key structure of the DynamoDB table being created. The primary key uniquely identifies each item in the table and consists of one or two attributes.

In this example, the KeySchema has two attributes, emp_id and entry_time. 'emp_id' is defined as the HASH key (the partition key), whose value determines which partition an item is stored in. 'entry_time' is defined as the RANGE key (the sort key), which orders the items that share the same partition key.

Together, the HASH and RANGE keys form a composite primary key that uniquely identifies each item in the table. The AttributeDefinitions parameter is used to define the data types of these attributes, and the ProvisionedThroughput parameter is used to specify the read and write capacity units for the table.

To work with an existing table, use the following code:

import boto3

# Get the service resource.
dynamodb = boto3.resource('dynamodb')

emp_table = dynamodb.Table('employees_total')

print(emp_table.creation_date_time)

Now, you can enter the data in your table.

emp_table.put_item(
    Item={
        'emp_id': 'CM101',
        'emp_name': 'Jane Doe',
        'entry_time': '16:00:00',
        'exit_time': '17:00:00'
    }
)

You can add more items of your own to the table in the same way.
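To load several items at once, here is a minimal sketch that uses the table's batch_writer() context manager, which buffers the puts and sends them in batches. The load_employees helper and the EMPLOYEES list are hypothetical names introduced for illustration; the sample rows are the ones that appear in the expected outputs later in this post.

```python
def load_employees(table, employees):
    """Write each employee dict to the table through a batch writer.

    `table` is assumed to be a boto3 DynamoDB Table resource such as
    emp_table; `load_employees` is a hypothetical helper name.
    """
    # batch_writer() buffers the puts and flushes them in batches,
    # including once more when the `with` block exits.
    with table.batch_writer() as batch:
        for employee in employees:
            batch.put_item(Item=employee)
    return len(employees)


# Sample rows matching the expected outputs shown later in this post.
EMPLOYEES = [
    {'emp_id': 'CM103', 'emp_name': 'Ken',
     'entry_time': '19:00:00', 'exit_time': '19:02:00'},
    {'emp_id': 'CM104', 'emp_name': 'Max',
     'entry_time': '20:00:00', 'exit_time': '20:01:00'},
    {'emp_id': 'CM105', 'emp_name': 'Kasa',
     'entry_time': '21:00:00', 'exit_time': '21:15:00'},
    {'emp_id': 'CM110', 'emp_name': 'Manji',
     'entry_time': '14:00:00', 'exit_time': '14:35:00'},
]
```

With a real table you would call load_employees(emp_table, EMPLOYEES).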

Queries and Scans

Now, let's start with Queries and Scans. Before starting with the code, let's first understand what they mean.

Queries retrieve the items that match a specific partition key value, optionally narrowed by a condition on the sort key. A query uses the table's primary key (or a secondary index, if you specify one) to go directly to the matching items, so it reads only the items stored under that partition key.

Scans, on the other hand, read every item in the table and can optionally apply a filter expression to the results. The filter is applied after the items are read, so a filtered scan still consumes read capacity for the whole table. This makes scans less efficient than queries and potentially very expensive for large tables.
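The difference in cost can be pictured with a small in-memory model. This is a toy illustration only, not how DynamoDB is implemented: a dictionary stands in for the partitions, and toy_query and toy_scan are hypothetical names. Each function reports how many items it had to read to produce its results.

```python
from collections import defaultdict

# Sample rows taken from the expected outputs in this post.
ITEMS = [
    {'emp_id': 'CM103', 'emp_name': 'Ken', 'entry_time': '19:00:00'},
    {'emp_id': 'CM104', 'emp_name': 'Max', 'entry_time': '20:00:00'},
    {'emp_id': 'CM105', 'emp_name': 'Kasa', 'entry_time': '21:00:00'},
    {'emp_id': 'CM110', 'emp_name': 'Manji', 'entry_time': '14:00:00'},
]

# Group items by partition key, ordered by sort key within each group.
partitions = defaultdict(list)
for item in ITEMS:
    partitions[item['emp_id']].append(item)
for group in partitions.values():
    group.sort(key=lambda it: it['entry_time'])


def toy_query(emp_id):
    """Read only the items stored under one partition key."""
    matched = list(partitions.get(emp_id, []))
    return matched, len(matched)  # (results, items read)


def toy_scan(predicate):
    """Read every item, then keep the ones that pass the filter."""
    items_read = 0
    matched = []
    for group in partitions.values():
        for item in group:
            items_read += 1
            if predicate(item):
                matched.append(item)
    return matched, items_read


results, read = toy_query('CM105')
print(len(results), read)  # prints: 1 1

results, read = toy_scan(lambda it: it['emp_name'].startswith('M'))
print(len(results), read)  # prints: 2 4
```

The query touches one item to return one, while the scan touches all four to return two.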

With items in the table, you can query or scan them using the DynamoDB.Table.query() or DynamoDB.Table.scan() methods respectively. To add conditions to a query or scan, import the boto3.dynamodb.conditions.Key and boto3.dynamodb.conditions.Attr classes. Use boto3.dynamodb.conditions.Key when the condition is on a key of the item, and boto3.dynamodb.conditions.Attr when the condition is on any other attribute of the item. Import them as follows:

from boto3.dynamodb.conditions import Key, Attr

For example, this queries for the employee whose emp_id key equals 'CM105':

response = emp_table.query(
    KeyConditionExpression=Key('emp_id').eq('CM105')
)
items = response['Items']
print(items)

Expected Output:

[{'exit_time': '21:15:00', 'entry_time': '21:00:00', 'emp_name': 'Kasa', 'emp_id': 'CM105'}]

Similarly, you can scan the table based on attributes of the items. For example, this scans for all the employees whose name starts with 'M':

response = emp_table.scan(
    FilterExpression=Attr('emp_name').begins_with('M')
)
items = response['Items']
print(items)

Expected Output:

[{'exit_time': '20:01:00', 'entry_time': '20:00:00', 'emp_name': 'Max', 'emp_id': 'CM104'}, {'exit_time': '14:35:00', 'entry_time': '14:00:00', 'emp_name': 'Manji', 'emp_id': 'CM110'}]
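One caveat applies to both methods: a single scan() or query() call returns at most 1 MB of data. When more items remain, the response includes a LastEvaluatedKey, which you pass back as ExclusiveStartKey to fetch the next page. Below is a minimal sketch of that loop. fetch_all is a hypothetical helper name, and operation is assumed to be a bound method such as emp_table.scan or emp_table.query.

```python
def fetch_all(operation, **kwargs):
    """Call a scan or query method repeatedly and collect every page.

    `operation` is assumed to be a bound Table method such as
    emp_table.scan; `fetch_all` is a hypothetical helper name.
    """
    items = []
    while True:
        response = operation(**kwargs)
        items.extend(response.get('Items', []))
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            # No LastEvaluatedKey means this was the final page.
            return items
        # Resume the next call from where the previous page stopped.
        kwargs['ExclusiveStartKey'] = last_key
```

For example, fetch_all(emp_table.scan, FilterExpression=Attr('emp_name').begins_with('M')) would gather the matching items from every page rather than only the first.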

You can also chain conditions together using the logical operators: & (and), | (or), and ~ (not). For example, this scans for all employees whose emp_name starts with 'K' and whose entry_time equals '19:00:00':

response = emp_table.scan(
    FilterExpression=(
        Attr('emp_name').begins_with('K')
        & Attr('entry_time').eq('19:00:00')
    )
)
items = response['Items']
print(items)

Expected Output:

[{'exit_time': '19:02:00', 'entry_time': '19:00:00', 'emp_name': 'Ken', 'emp_id': 'CM103'}]

There are many other DynamoDB conditions available for queries and scans. You can experiment with all of them to learn more. Thank you for your time.

Happy Learning :)