DynamoDB with AWS CLI : scan, get-item and query

2022.09.28


Amazon DynamoDB is a NoSQL key-value store from AWS that is quite popular among its users. Using DynamoDB has a lot of advantages, and there are various ways of performing CRUD operations on it. In this article I would like to walk you through creating some DynamoDB tables, inserting data into them, and various ways of reading that data.

First configure your AWS CLI with the IAM user which you would like to use to run these commands.

aws sts get-caller-identity

This command displays the identity (IAM user or role) that the CLI uses to send requests. If the output is not as expected, configure your credentials again, for example with aws configure.

Creating Tables

Now, let's get some data to insert in the DB. For this demo we'll use sample data provided by AWS, which is already formatted for loading into DynamoDB.

wget https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/samples/sampledata.zip

unzip sampledata.zip

This downloads and unzips the data. You will see 4 files: ProductCatalog.json, Forum.json, Thread.json and Reply.json. Now, let us create some tables to insert all this data into.

aws dynamodb create-table --table-name ProductCatalog --attribute-definitions AttributeName=Id,AttributeType=N --key-schema AttributeName=Id,KeyType=HASH --provisioned-throughput ReadCapacity=10,WriteCapacity=5

aws dynamodb create-table --table-name Forum --attribute-definitions AttributeName=Name,AttributeType=S --key-schema AttributeName=Name,KeyType=HASH --provisioned-throughput ReadCapacity=10,WriteCapacity=5

aws dynamodb create-table --table-name Thread --attribute-definitions AttributeName=ForumName,AttributeType=S AttributeName=Subject,AttributeType=S --key-schema AttributeName=ForumName,KeyType=HASH AttributeName=Subject,KeyType=RANGE --provisioned-throughput ReadCapacity=10,WriteCapacity=5

aws dynamodb create-table --table-name Reply --attribute-definitions AttributeName=Id,AttributeType=S AttributeName=ReplyDateTime,AttributeType=S --key-schema AttributeName=Id,KeyType=HASH AttributeName=ReplyDateTime,KeyType=RANGE --provisioned-throughput ReadCapacity=10,WriteCapacity=5

Notice how the partition key and sort key are defined using the HASH and RANGE reserved keywords respectively.

(Figure: Creating ProductCatalog)

Inserting Data

Now that we have the data and tables, all that's left is to insert some data in these tables.

aws dynamodb batch-write-item --request-items file://ProductCatalog.json

aws dynamodb batch-write-item --request-items file://Forum.json

aws dynamodb batch-write-item --request-items file://Thread.json

aws dynamodb batch-write-item --request-items file://Reply.json

These commands write the items to their respective tables. A single batch-write-item call can carry up to 25 put or delete requests with a combined size of up to 16 MB.
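As a rough illustration of the 25-request limit, the sketch below splits a longer list of items into request-items payloads that would each fit in one batch-write-item call. The helper name chunk_put_requests and the sample items are made up for this example; only the limit of 25 requests per call comes from DynamoDB.

```python
import json

MAX_REQUESTS_PER_BATCH = 25  # BatchWriteItem limit on put/delete requests per call


def chunk_put_requests(table_name, items, batch_size=MAX_REQUESTS_PER_BATCH):
    """Yield request-items payloads, each holding at most batch_size PutRequests.

    Hypothetical helper: `items` is a list of items already in DynamoDB
    attribute-value format, e.g. {"Id": {"N": "1"}}.
    """
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        yield {table_name: [{"PutRequest": {"Item": item}} for item in batch]}


# 60 made-up items are split into batches of 25, 25 and 10
sample_items = [{"Id": {"N": str(i)}} for i in range(60)]
payloads = list(chunk_put_requests("ProductCatalog", sample_items))
batch_sizes = [len(p["ProductCatalog"]) for p in payloads]
print(batch_sizes)

# Each payload could be written to its own file and passed via --request-items file://...
first_payload_json = json.dumps(payloads[0])
```

The sample files used in this demo are small enough that each fits in a single call, so no splitting is needed here.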

Reading Table

scan

The scan operation reads every item in a DynamoDB table and returns it to the requester. This is inefficient when the table holds a huge amount of data, since we end up consuming a lot of RCUs (read capacity units).

aws dynamodb scan --table-name ProductCatalog

By default, read operations on DynamoDB are eventually consistent, which costs half as much read capacity as a strongly consistent read.
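To make the cost difference concrete, here is a minimal sketch of how read capacity could be estimated for a scan. It assumes the documented rule that one read capacity unit covers a strongly consistent read of up to 4 KB and that an eventually consistent read costs half of that. The function name estimate_read_capacity is hypothetical and the rounding is a simplification, so treat the numbers as an approximation rather than a billing formula.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # one read capacity unit covers up to 4 KB


def estimate_read_capacity(scanned_bytes, consistent_read=False):
    """Rough RCU estimate for a read that evaluates `scanned_bytes` of data.

    Simplified model: round the scanned size up to the next 4 KB,
    assume a minimum of one 4 KB unit, and halve the cost for
    eventually consistent reads.
    """
    units = max(math.ceil(scanned_bytes / READ_UNIT_BYTES), 1)
    return units if consistent_read else units / 2


small_scan = estimate_read_capacity(1024)                          # tiny table, default consistency
strong_scan = estimate_read_capacity(10 * 1024, consistent_read=True)
eventual_scan = estimate_read_capacity(10 * 1024)
print(small_scan, strong_scan, eventual_scan)
```

Comparing an estimate like this with the ConsumedCapacity reported by --return-consumed-capacity TOTAL is a quick sanity check on a real table.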

To limit the data a scan returns, we can use --filter-expression together with --expression-attribute-values. The filter is applied after the items are read, so it does not reduce the read capacity consumed by the command. Let's read some data from the Forum table.

aws dynamodb scan --table-name Forum --filter-expression 'Threads > :threads and #c > :v' --expression-attribute-values '{":threads" : {"N" : "1"}, ":v" : {"N": "1"}}' --expression-attribute-names '{"#c" : "Views"}' --return-consumed-capacity TOTAL

(Figure: Scan with a filter)

Notice that Count is 1 but ScannedCount is 2. The scan read both items in the table and only then applied the filter, which shows why scan is inefficient.

This scan returns all forums that have more than 1 thread and more than 1 view. You might be wondering why the filter expression contains #c, which is not an attribute of the table. Views is a reserved word in DynamoDB, so using it directly in an expression makes the request fail instead of producing the desired output. Hence we use a placeholder in the expression and map it to the real attribute name with the --expression-attribute-names option. To know more, see the list of reserved words in the DynamoDB Developer Guide.
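The gap between Count and ScannedCount can be pictured with a small in-memory model. This is only a sketch: the function scan_with_filter and the two rows are invented for illustration and merely resemble the Forum table, but the order of operations (read everything, then filter) is the point.

```python
def scan_with_filter(items, predicate):
    """Mimic Scan: every item is read (ScannedCount), then the filter is applied (Count)."""
    matched = [item for item in items if predicate(item)]
    return {"Items": matched, "Count": len(matched), "ScannedCount": len(items)}


# Made-up rows loosely shaped like the Forum table
forum = [
    {"Name": "Amazon DynamoDB", "Threads": 2, "Views": 1000},
    {"Name": "Amazon S3"},  # no Threads or Views attribute, so it cannot match
]


def more_than_one_thread_and_view(item):
    # Equivalent in spirit to: Threads > :threads and #c > :v
    return item.get("Threads", 0) > 1 and item.get("Views", 0) > 1


result = scan_with_filter(forum, more_than_one_thread_and_view)
print(result["Count"], result["ScannedCount"])
```

Both rows are counted as scanned even though only one is returned, which mirrors why the filter does not lower the capacity consumed.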

get-item

When we want to get a single item from the table, the most suitable operation is get-item. Let's read an item from the Thread table.

aws dynamodb get-item --table-name Thread --key '{"ForumName" : {"S" : "Amazon S3"}, "Subject" : {"S" : "S3 Thread 1"}}' --return-consumed-capacity INDEXES

A table may have both a partition key and a sort key. Every get-item call must specify the full primary key, that is, both keys when the table has a sort key, so that exactly one item can match.
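Writing the --key JSON by hand is error-prone, so a small helper can build it and refuse incomplete keys. The following is a sketch under assumptions: build_key_argument and the thread_schema mapping are hypothetical names, and only the "S" and "N" type codes used in this article are considered.

```python
import json
import shlex


def build_key_argument(key_schema, values):
    """Return a shell-quoted JSON string for --key; raise if a key attribute is missing.

    key_schema maps attribute name -> DynamoDB type code ("S" or "N").
    DynamoDB expects every value, including numbers, as a JSON string.
    """
    missing = [name for name in key_schema if name not in values]
    if missing:
        raise ValueError(f"missing key attribute(s): {', '.join(missing)}")
    key = {name: {type_code: str(values[name])} for name, type_code in key_schema.items()}
    return shlex.quote(json.dumps(key))


# Key schema of the Thread table created earlier: partition key plus sort key
thread_schema = {"ForumName": "S", "Subject": "S"}
key_arg = build_key_argument(
    thread_schema, {"ForumName": "Amazon S3", "Subject": "S3 Thread 1"}
)
print(f"aws dynamodb get-item --table-name Thread --key {key_arg}")
```

Passing only ForumName to this helper raises a ValueError, which is the local equivalent of DynamoDB rejecting a get-item call with an incomplete key.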

(Figure: Reading item from Thread)

query

When we want to get one or more items that satisfy a certain condition, query is the most suitable operation. This command is very versatile and offers loads of options.

aws dynamodb query --table-name Reply --key-condition-expression 'Id=:Id' --expression-attribute-values '{":Id" : {"S" : "Amazon DynamoDB#DynamoDB Thread 2"}}' --return-consumed-capacity TOTAL

A query must specify the partition key with an equality condition, and it returns all the items that share that partition key value. Additionally, we can add a condition on the sort key (comparison operators, BETWEEN or begins_with) to narrow the result.
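The shape of a key condition can be sketched in plain Python: a query always matches the partition key exactly, and a condition on the sort key is optional. The function query below and the three replies are made up for illustration; real Reply items use the Id and ReplyDateTime attributes in the same way.

```python
def query(items, partition_value, sort_condition=None,
          partition_key="Id", sort_key="ReplyDateTime"):
    """Mimic Query: exact match on the partition key, optional condition on the sort key.

    Results are ordered by sort key, as DynamoDB returns them by default.
    """
    matches = [item for item in items if item[partition_key] == partition_value]
    if sort_condition is not None:
        matches = [item for item in matches if sort_condition(item[sort_key])]
    return sorted(matches, key=lambda item: item[sort_key])


# Made-up replies; ISO-style dates sort correctly as plain strings
replies = [
    {"Id": "Thread A", "ReplyDateTime": "2022-09-15", "Message": "second"},
    {"Id": "Thread A", "ReplyDateTime": "2022-09-01", "Message": "first"},
    {"Id": "Thread B", "ReplyDateTime": "2022-09-10", "Message": "other thread"},
]

all_a = query(replies, "Thread A")
recent_a = query(replies, "Thread A", sort_condition=lambda ts: ts > "2022-09-10")
print([r["Message"] for r in all_a], [r["Message"] for r in recent_a])
```

On the CLI, the second call would correspond to a key condition expression along the lines of 'Id = :Id and ReplyDateTime > :since', with both values supplied through --expression-attribute-values.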

(Figure: Query on Reply)

Sometimes a query returns so many items that the output is difficult to read and expensive to process. You can use the --max-items option to limit the number of items displayed. If more items satisfy the condition(s) than the limit allows, a NextToken is returned along with the result. You can send this NextToken in another query to get the next batch of items from the same table. Note that each follow-up call is a new read request and consumes read capacity like any other query.

aws dynamodb query --table-name Reply --key-condition-expression 'Id=:Id' --expression-attribute-values '{":Id" : {"S" : "Amazon DynamoDB#DynamoDB Thread 1"}}' --max-items 1 --return-consumed-capacity TOTAL

To send the NextToken with the query, you just use the --starting-token option.

aws dynamodb query --table-name Reply --key-condition-expression 'Id=:Id' --expression-attribute-values '{":Id" : {"S" : "Amazon DynamoDB#DynamoDB Thread 1"}}' --max-items 1 --starting-token "$STARTING_TOKEN" --return-consumed-capacity TOTAL

(Figure: max-items and starting-token)
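The loop of passing each NextToken back as the starting token can be sketched with an in-memory stand-in. Everything here is illustrative: query_page is a made-up function, and its token is just a list offset, whereas the token returned by the real CLI is an opaque string that should be passed back unchanged.

```python
def query_page(items, max_items, starting_token=None):
    """Return up to max_items results plus a NextToken when more remain.

    The token in this model is a list offset encoded as a string.
    """
    start = int(starting_token) if starting_token is not None else 0
    page = items[start:start + max_items]
    next_start = start + len(page)
    response = {"Items": page}
    if next_start < len(items):
        response["NextToken"] = str(next_start)
    return response


replies = ["reply 1", "reply 2", "reply 3"]  # made-up query results

collected = []
token = None
while True:
    response = query_page(replies, max_items=1, starting_token=token)
    collected.extend(response["Items"])
    token = response.get("NextToken")
    if token is None:  # no NextToken means the last page was reached
        break
print(collected)
```

The same pattern applies to a shell script around the CLI: keep calling with --starting-token until the output no longer contains a NextToken.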

Well, that's it for this demo. I hope you learned something new today. For more information on DynamoDB's scan, get-item and query, check out their respective AWS CLI reference pages.