DynamoDB with the AWS CLI: scan, get-item and query
Amazon DynamoDB is a NoSQL key-value database offered by AWS, and it is popular among its users. It is a fully managed service, and there are several ways to perform CRUD operations on it. In this article I walk you through creating a few DynamoDB tables, inserting data into them and reading that data back in different ways.
First, configure the AWS CLI (for example by running aws configure) with the credentials of the IAM user you want to run these commands as, then check which identity the CLI is using:
aws sts get-caller-identity
This command displays the account ID, user ID and ARN of the IAM user or role whose credentials are signing the requests. If the output is not what you expect, configure your credentials again.
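If you script the rest of this walkthrough, a small guard can stop it from running under the wrong credentials. The following is a minimal sketch, not an official tool: check_identity is a hypothetical helper, and the ARN fragment you pass to it is an assumption about your own setup. It relies on the CLI's global --query (JMESPath) and --output options to print only the Arn field.

```shell
# check_identity EXPECTED: succeed only if the caller's ARN contains EXPECTED.
# check_identity is a hypothetical helper, not part of the AWS CLI.
check_identity() {
  local expected arn
  expected=$1
  # --query picks the Arn field; --output text prints it without JSON quotes
  arn=$(aws sts get-caller-identity --query Arn --output text) || return 1
  case "$arn" in
    *"$expected"*)
      echo "Using identity: $arn"
      ;;
    *)
      echo "Unexpected identity: $arn" >&2
      return 1
      ;;
  esac
}
```

For example, check_identity user/dynamodb-demo would succeed only if the ARN contains that (hypothetical) user name.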
Creating Tables
Now, let's get some data to insert into the tables. For this demo we'll use sample data provided by AWS, which is already formatted for DynamoDB's batch-write-item command.
wget https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/samples/sampledata.zip
unzip sampledata.zip
This downloads and unzips the data. You should now see four files: ProductCatalog.json, Forum.json, Thread.json and Reply.json. Next, let's create the tables that will hold this data:
aws dynamodb create-table --table-name ProductCatalog --attribute-definitions AttributeName=Id,AttributeType=N --key-schema AttributeName=Id,KeyType=HASH --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=5
aws dynamodb create-table --table-name Forum --attribute-definitions AttributeName=Name,AttributeType=S --key-schema AttributeName=Name,KeyType=HASH --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=5
aws dynamodb create-table --table-name Thread --attribute-definitions AttributeName=ForumName,AttributeType=S AttributeName=Subject,AttributeType=S --key-schema AttributeName=ForumName,KeyType=HASH AttributeName=Subject,KeyType=RANGE --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=5
aws dynamodb create-table --table-name Reply --attribute-definitions AttributeName=Id,AttributeType=S AttributeName=ReplyDateTime,AttributeType=S --key-schema AttributeName=Id,KeyType=HASH AttributeName=ReplyDateTime,KeyType=RANGE --provisioned-throughput ReadCapacityUnits=10,WriteCapacityUnits=5
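Notice how the partition key and sort key are declared with the key types HASH and RANGE respectively. Only key attributes are listed in --attribute-definitions; the other attributes in the data don't have to be declared when the table is created.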
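create-table returns as soon as the request is accepted, while the table is still in the CREATING state, so writing to it immediately can fail. Below is a minimal sketch of waiting for the tables and then checking their key schemas; wait_for_tables and show_key_schema are hypothetical helper names, while aws dynamodb wait table-exists and describe-table are standard CLI commands.

```shell
# wait_for_tables TABLE...: block until each table has finished creating.
wait_for_tables() {
  local table
  for table in "$@"; do
    # 'wait table-exists' polls describe-table until the status is ACTIVE,
    # and exits non-zero if it gives up first
    aws dynamodb wait table-exists --table-name "$table" || return 1
    echo "$table is ready"
  done
}

# show_key_schema TABLE: one line per key attribute, with its key type
show_key_schema() {
  aws dynamodb describe-table --table-name "$1" \
    --query 'Table.KeySchema[].[AttributeName, KeyType]' --output text
}
```

Run wait_for_tables ProductCatalog Forum Thread Reply before loading data; show_key_schema Thread should then list ForumName with HASH and Subject with RANGE.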
Inserting Data
Now that we have both the data and the tables, all that's left is to load the data into them:
aws dynamodb batch-write-item --request-items file://ProductCatalog.json
aws dynamodb batch-write-item --request-items file://Forum.json
aws dynamodb batch-write-item --request-items file://Thread.json
aws dynamodb batch-write-item --request-items file://Reply.json
These commands write the items to their respective tables. A single batch-write-item call can contain up to 25 put or delete requests, with a total request size of up to 16 MB and no individual item larger than 400 KB.
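batch-write-item can succeed as a whole while leaving some requests unprocessed, for example when the table's write capacity is exceeded. Those requests come back in the UnprocessedItems field, in the same format as --request-items, and it is up to the caller to resend them. Here is a minimal retry sketch; the helper name, the five-attempt limit and the linear delay are arbitrary choices (AWS generally recommends exponential backoff).

```shell
# batch_write_with_retry FILE: send FILE's requests, then resend any
# UnprocessedItems, for at most 5 attempts in total.
batch_write_with_retry() {
  local pending result attempt
  attempt=1
  pending=$(mktemp) || return 1
  cp "$1" "$pending" || { rm -f "$pending"; return 1; }
  while :; do
    # Keep only UnprocessedItems; it has the same shape as --request-items
    if ! aws dynamodb batch-write-item --request-items "file://$pending" \
        --query UnprocessedItems --output json > "$pending.next"; then
      rm -f "$pending" "$pending.next"
      return 1
    fi
    result=$(cat "$pending.next")
    case "$result" in
      '{}'|'null'|'')
        # Nothing left to resend: every request was processed
        rm -f "$pending" "$pending.next"
        return 0
        ;;
    esac
    # Resend only the leftovers, after a delay that grows with each attempt
    mv "$pending.next" "$pending"
    if [ "$attempt" -ge 5 ]; then
      break
    fi
    sleep "$attempt"
    attempt=$((attempt + 1))
  done
  echo "Unprocessed items remain in $pending" >&2
  return 1
}
```

For example, batch_write_with_retry Reply.json loads the Reply table and only returns success once nothing is left unprocessed.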
Reading Tables
scan
The scan operation reads every item in a DynamoDB table and returns the results to the requester. This is inefficient for tables holding a lot of data, because every item read consumes read capacity units (RCUs).
aws dynamodb scan --table-name ProductCatalog
By default, read operations such as scan, get-item and query use eventually consistent reads, which cost half as much as strongly consistent reads.
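To see the difference in cost, you can run the same scan with and without --consistent-read and ask for --return-consumed-capacity TOTAL. A strongly consistent read of the same data consumes twice the RCUs of an eventually consistent one. This is a minimal sketch; scan_capacity is a hypothetical helper, and --no-paginate keeps the reported figure tied to a single Scan request.

```shell
# scan_capacity TABLE [extra scan options...]: print the capacity units
# consumed by one Scan request on TABLE (hypothetical helper).
scan_capacity() {
  local table
  table=$1
  shift
  # Any remaining arguments, e.g. --consistent-read, are passed to scan
  aws dynamodb scan --table-name "$table" --no-paginate \
    --return-consumed-capacity TOTAL \
    --query ConsumedCapacity.CapacityUnits --output text "$@"
}
```

Compare scan_capacity ProductCatalog with scan_capacity ProductCatalog --consistent-read; the second should report about twice as many capacity units.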
To limit the items we receive, we can add a --filter-expression, with --expression-attribute-values supplying the values for its placeholders. The filter is applied after the items have been read, so it does not reduce the read capacity consumed by the command. Let's read some data from the Forum table.
aws dynamodb scan --table-name Forum --filter-expression 'Threads> :threads and #c > :v' --expression-attribute-values '{":threads" : {"N" : "1"}, ":v" : {"N": "1"}}' --expression-attribute-names '{"#c" : "Views"}' --return-consumed-capacity TOTAL
Notice that the returned Count is 1 while the ScannedCount is 2: DynamoDB read two items and discarded one of them after reading it, which shows why a filtered scan is inefficient.
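If you only care about how much a filter discards, --select COUNT returns the counts without the items themselves. It does not make the scan any cheaper, since every item is still read. A minimal sketch with a hypothetical scan_counts helper:

```shell
# scan_counts TABLE FILTER VALUES NAMES: print Count and ScannedCount for a
# filtered scan (hypothetical helper).
scan_counts() {
  # --select COUNT returns Count and ScannedCount but no items
  aws dynamodb scan --table-name "$1" \
    --filter-expression "$2" \
    --expression-attribute-values "$3" \
    --expression-attribute-names "$4" \
    --select COUNT \
    --query '[Count, ScannedCount]' --output text
}
```

Running scan_counts Forum 'Threads > :threads and #c > :v' '{":threads": {"N": "1"}, ":v": {"N": "1"}}' '{"#c": "Views"}' should print the same two numbers as the Count and ScannedCount above.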
This scan returns the forums that have more than 1 thread and more than 1 view. You might be wondering why the filter expression refers to #c when the table has no attribute with that name. Views is a reserved word in DynamoDB, which means it already has a predefined meaning, and using it directly in an expression causes a validation error instead of the desired output. Hence we use a placeholder in the expression and define what it stands for with the --expression-attribute-names option. To learn more, see the list of reserved words in the DynamoDB Developer Guide.
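The same placeholder mechanism works anywhere an expression names an attribute, for example in --projection-expression, which limits the attributes returned for each item. In the sketch below both Name (also a reserved word) and Views go through placeholders; forum_views is a hypothetical helper name.

```shell
# forum_views: return only the forum name and view count of every Forum item
# (hypothetical helper). #n and #v stand in for the reserved words Name and
# Views, and are defined in --expression-attribute-names.
forum_views() {
  aws dynamodb scan --table-name Forum \
    --projection-expression '#n, #v' \
    --expression-attribute-names '{"#n": "Name", "#v": "Views"}'
}
```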
get-item
When we want to get a single item from a table, the most suitable operation is get-item. Let's read an item from the Thread table.
aws dynamodb get-item --table-name Thread --key '{"ForumName" : {"S" : "Amazon S3"}, "Subject" : {"S" : "S3 Thread 1"}}' --return-consumed-capacity INDEXES
get-item identifies exactly one item, so you must supply the full primary key: the partition key and, if the table has one, the sort key. The Thread table has both, which is why the command above specifies ForumName as well as Subject.
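Because the full key is mandatory, building it from arguments avoids hand-editing JSON for every lookup. This is a minimal sketch; thread_key and get_thread are hypothetical helpers, and the optional --consistent-read flag requests a strongly consistent read instead of the eventually consistent default.

```shell
# thread_key FORUM SUBJECT: build the --key JSON for the Thread table, which
# needs both its partition key (ForumName) and sort key (Subject).
# Values are inserted verbatim, so they must not contain double quotes.
thread_key() {
  printf '{"ForumName": {"S": "%s"}, "Subject": {"S": "%s"}}' "$1" "$2"
}

# get_thread FORUM SUBJECT: fetch one Thread item with a strongly consistent read
get_thread() {
  aws dynamodb get-item --table-name Thread \
    --key "$(thread_key "$1" "$2")" \
    --consistent-read
}
```

get_thread 'Amazon S3' 'S3 Thread 1' fetches the same item as the command above.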
query
When we want to get one or more items that share a partition key value, query is the most suitable operation. It is very versatile and supports a lot of options.
aws dynamodb query --table-name Reply --key-condition-expression 'Id=:Id' --expression-attribute-values '{":Id" : {"S" : "Amazon DynamoDB#DynamoDB Thread 2"}}' --return-consumed-capacity TOTAL
Because query can return multiple items, the key condition must specify the partition key with an equality test, while a condition on the sort key is optional; query then returns every item that matches. On the sort key you can also use comparison operators such as <, >, BETWEEN and begins_with to narrow the result further.
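A sort key condition narrows a query further. The sketch below returns the replies in one thread that were posted after a given time; replies_after is a hypothetical helper, and it assumes the ReplyDateTime values are ISO 8601 strings (the attribute is type S), which compare chronologically as plain strings.

```shell
# replies_after ID TIMESTAMP: replies in thread ID posted after TIMESTAMP
# (hypothetical helper). The partition key (Id) must use '='; the sort key
# (ReplyDateTime) may use comparisons such as >, <, BETWEEN or begins_with.
replies_after() {
  aws dynamodb query --table-name Reply \
    --key-condition-expression 'Id = :id AND ReplyDateTime > :t' \
    --expression-attribute-values \
    "{\":id\": {\"S\": \"$1\"}, \":t\": {\"S\": \"$2\"}}"
}
```

For example, replies_after 'Amazon DynamoDB#DynamoDB Thread 1' 2015-09-01T00:00:00.000Z uses an arbitrary cut-off date.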
Sometimes query returns a large number of items, which makes the output hard to read and work with. You can use --max-items to limit the number of items the command returns. If more items match the condition(s) than --max-items allows, a NextToken is returned along with the result; you can send this NextToken in a follow-up command to get the next batch of items from the same query. Note that each follow-up request is a new read and consumes read capacity units of its own.
aws dynamodb query --table-name Reply --key-condition-expression 'Id=:Id' --expression-attribute-values '{":Id" : {"S" : "Amazon DynamoDB#DynamoDB Thread 1"}}' --max-items 1 --return-consumed-capacity TOTAL
To send the NextToken with the next query, pass it to the --starting-token option and keep every other parameter the same. Here STARTING_TOKEN is a shell variable holding the NextToken value from the previous output:
aws dynamodb query --table-name Reply --key-condition-expression 'Id=:Id' --expression-attribute-values '{":Id" : {"S" : "Amazon DynamoDB#DynamoDB Thread 1"}}' --max-items 1 --starting-token "$STARTING_TOKEN" --return-consumed-capacity TOTAL
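Passing tokens by hand gets tedious, so the loop can be scripted. This is a rough sketch under stated assumptions: query_all_pages is a hypothetical helper, and it pulls NextToken out of the pretty-printed JSON output with sed rather than a JSON parser such as jq, so treat it as illustrative. If you simply omit --max-items, the CLI follows the pages for you automatically.

```shell
# query_all_pages ID: page through every Reply item for thread ID, one item
# per page, printing each page's JSON (hypothetical helper).
query_all_pages() {
  local id token page
  id=$1
  token=""
  while :; do
    # Re-send identical parameters; add --starting-token after the first page
    if [ -n "$token" ]; then
      set -- --starting-token "$token"
    else
      set --
    fi
    page=$(aws dynamodb query --table-name Reply \
      --key-condition-expression 'Id = :id' \
      --expression-attribute-values "{\":id\": {\"S\": \"$id\"}}" \
      --max-items 1 --output json "$@") || return 1
    printf '%s\n' "$page"
    # NextToken only appears in the output when more items remain
    token=$(printf '%s\n' "$page" |
      sed -n 's/^ *"NextToken": "\([^"]*\)".*/\1/p')
    [ -n "$token" ] || break
  done
}
```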
Well, that's it for this demo. I hope you learned something new today! For more information on scan, get-item and query, see their respective pages in the AWS CLI Command Reference.