[New features] Advanced Query Accelerator (AQUA) for Amazon Redshift is now generally available!

2021.04.23

この記事は公開されてから1年以上経過しています。情報が古い可能性がありますので、ご注意ください。

(The original Japanese article is available here.)

Amazon Redshift using RA3 instances, called Advanced Query Accelerator (AQUA) for Amazon Redshift was initially announced in AWS re:Invent 2019 and is now finally generally available. I am going to try it out and show you its process in this article.

Mr. Ozono from AWS Japan (AWSJ) confirms it in the tweet too, so it must be real.

AQUA is available with the latest node family, RA3 (ra3.4xl or ra3.16xl) and with release version 1.0.24421 or later in the following AWS Regions:

  • US East (N. Virginia) Region (us-east-1)
  • US East (Ohio) Region (us-east-2)
  • US West (Oregon) Region (us-west-2)
  • Asia Pacific (Tokyo) Region (ap-northeast-1)
  • Europe (Ireland) Region (eu-west-1)

So glad that it’s already available in the Tokyo Region as well!

What is AQUA (Advanced Query Accelerator)?

AQUA serves as a powerful query accelerator that is a hardware provided between the RA3 nodes (ra3.4xl or ra3.16xl) and S3 managed storage.

Here is the introduction by AWS Blog.

The storage system uses multiple cues, including data block temperature, data blockage, and workload patterns, to manage the cache for high performance

...

Building on the caches that I told you about earlier, and taking advantage of the AWS Nitro System and custom FPGA-based acceleration, AQUA pushes the computation needed to handle reduction and aggregation queries closer to the data. This reduces network traffic, offloads work from the CPUs in the RA3 nodes, and allows AQUA to improve the performance of those queries by up to 10x, at no extra cost and without any code changes. AQUA also makes use of a fast, high-bandwidth connection to Amazon Simple Storage Service (S3).

Creating a cluster using AQUA from a snapshot

This time, I tried creating a cluster which uses AQUA by restoring from a snapshot. You can choose either ra3.4xlarge or ra3.16xlarge as a node type. If you already have a cluster with either of these node types, the AQUA is configured as ‘Automatic’ by default. To start using AQUA, choose [Actions]-[Configure AQUA] and change the configuration from Automatic to Turn On in the following dialogue.

I created a cluster with the default configuration, Automatic. You have the following choices when you configure AQUA:

  • Automatic (default)
    • Redshift determines whether to use AQUA.
    • As of today, Working with AQUA (Advanced Query Accelerator) states “Currently, AQUA isn't activated with this option, but this behavior is subject to change.” At this moment, this option means the same as Turn Off; AQUA won’t be activated.
  • Turn On
    • You choose to always use AQUA. AQUA can only be activated in certain AWS Regions and for ra3.4xlarge and ra3.16xlarge node types.
  • Turn Off
    • You choose not to use AQUA.

It turned Available after about 5 minutes. You can see the cluster is launched with AQUA of ‘Automatic’.

Creating test data

AQUA seems to excel at the acceleration of LIKE and SIMILAR TO in particular, so I prepared approximately 300 million of data.

dev=> create table lineitem (
dev(>   l_orderkey bigint not null,
dev(>   l_partkey bigint,
dev(>   l_suppkey bigint,
dev(>   l_linenumber integer not null,
dev(>   l_quantity decimal(18,4),
dev(>   l_extendedprice decimal(18,4),
dev(>   l_discount decimal(18,4),
dev(>   l_tax decimal(18,4),
dev(>   l_returnflag varchar(1),
dev(>   l_linestatus varchar(1),
dev(>   l_shipdate date,
dev(>   l_commitdate date,
dev(>   l_receiptdate date,
dev(>   l_shipinstruct varchar(25),
dev(>   l_shipmode varchar(10),
dev(>   l_comment varchar(44))
dev-> distkey (l_orderkey)
dev-> sortkey (l_receiptdate);
CREATE TABLE
dev=> copy lineitem from 's3://cm-bucket/redshift-immersionday-labs/data/lineitem-part/'
dev-> iam_role 'arn:aws:iam::123456789012:role/AmazonRedshiftRole'
dev-> region 'ap-northeast-1' gzip delimiter '|' compupdate preset;

INFO:  Load into table 'lineitem' completed, 303008217 record(s) loaded successfully.
COPY

dev=> select * from lineitem limit 1;
-[ RECORD 1 ]---+----------------------------------------
l_orderkey      | 7428384
l_partkey       | 9121341
l_suppkey       | 621360
l_linenumber    | 4
l_quantity      | 23.0000
l_extendedprice | 31323.4700
l_discount      | 0.0900
l_tax           | 0.0500
l_returnflag    | R
l_linestatus    | F
l_shipdate      | 1992-01-02
l_commitdate    | 1992-03-22
l_receiptdate   | 1992-01-03
l_shipinstruct  | DELIVER IN PERSON
l_shipmode      | FOB
l_comment       | haggle carefully about the furiously ir

Performance test of AQUA

I reviewed the behavior by explicitly changing Turn On/Off. To update the AQUA configuration, you can click [Actions]-[Configure AQUA].

Note:

After changing Turn ON/Off in the dialogue and clicking on [Save changes], the cluster is immediately rebooted to apply the change.

Testing queries

I executed the following SIMILAR TO and LIKE sample queries and measured each response time.

SIMILAR TO sample query

-- explain
select sum(l_orderkey), count(*) 
from lineitem 
where
  l_comment similar to 'slyly %' or
  l_comment similar to 'plant %' or
  l_comment similar to 'fina %' or
  l_comment similar to 'quick %' or
  l_comment similar to 'slyly %' or
  l_comment similar to 'quickly %' or
  l_comment similar to ' %about%' or
  l_comment similar to ' final%' or
  l_comment similar to ' %final%' or
  l_comment similar to ' breach%' or
  l_comment similar to ' egular%' or
  l_comment similar to ' %closely%' or
  l_comment similar to ' closely%' or
  l_comment similar to ' %idea%' or
  l_comment similar to ' idea%' ;

LIKE sample query

-- explain
select sum(l_orderkey), count(*) 
from lineitem 
where
  l_comment like 'slyly %' or
  l_comment like 'plant %' or
  l_comment like 'fina %' or
  l_comment like 'quick %' or
  l_comment like 'slyly %' or
  l_comment like 'quickly %' or
  l_comment like ' %about%' or
  l_comment like ' final%' or
  l_comment like ' %final%' or
  l_comment like ' breach%' or
  l_comment like ' egular%' or
  l_comment like ' %closely%' or
  l_comment like ' closely%' or
  l_comment like ' %idea%' or
  l_comment like ' idea%' ;

I searched strings contained in ‘l_comment’ and aggregated the respective records. I felt really guilty of making such queries with SIMILAR TO and LIKE just by connecting with or, but I believed in AQUA’s capacity.

In the following results, the result cache was disabled to measure the processing time.

set enable_result_cache_for_session to off;

Comparison of SIMILAR TO

AQUA deactivated (Turn Off)

dev=> select sum(l_orderkey), count(*)
dev-> from lineitem
dev-> where
dev->   l_comment similar to 'slyly %' or
dev->   l_comment similar to 'plant %' or
dev->   l_comment similar to 'fina %' or
dev->   l_comment similar to 'quick %' or
dev->   l_comment similar to 'slyly %' or
dev->   l_comment similar to 'quickly %' or
dev->   l_comment similar to ' %about%' or
dev->   l_comment similar to ' final%' or
dev->   l_comment similar to ' %final%' or
dev->   l_comment similar to ' breach%' or
dev->   l_comment similar to ' egular%' or
dev->   l_comment similar to ' %closely%' or
dev->   l_comment similar to ' closely%' or
dev->   l_comment similar to ' %idea%' or
dev->   l_comment similar to ' idea%' ;

       sum        |  count
------------------+---------
 1440371216714447 | 9496106
(1 row)
Time: 215896.819 ms

select sum(l_orderkey), count(*)
from lineitem
where
  l_comment similar to 'slyly %' or
  l_comment similar to 'plant %' or
  l_comment similar to 'fina %' or
  l_comment similar to 'quick %' or
  l_comment similar to 'slyly %' or
  l_comment similar to 'quickly %' or
  l_comment similar to ' %about%' or
  l_comment similar to ' final%' or
  l_comment similar to ' %final%' or
  l_comment similar to ' breach%' or
  l_comment similar to ' egular%' or
  l_comment similar to ' %closely%' or
  l_comment similar to ' closely%' or
  l_comment similar to ' %idea%' or
  l_comment similar to ' idea%' ;
       sum        |  count
------------------+---------
 1440371216714447 | 9496106
(1 row)
Time: 211313.374 ms

Below is the execution plan.

dev=> explain
select sum(l_orderkey), count(*)
from lineitem
where
  l_comment similar to 'slyly %' or
  l_comment similar to 'plant %' or
  l_comment similar to 'fina %' or
  l_comment similar to 'quick %' or
  l_comment similar to 'slyly %' or
  l_comment similar to 'quickly %' or
  l_comment similar to ' %about%' or
  l_comment similar to ' final%' or
  l_comment similar to ' %final%' or
  l_comment similar to ' breach%' or
  l_comment similar to ' egular%' or
  l_comment similar to ' %closely%' or
  l_comment similar to ' closely%' or
  l_comment similar to ' %idea%' or
  l_comment similar to ' idea%' ;
                                                                                                                                                                                                                                                                                                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 XN Aggregate  (cost=13830214.62..13830214.62 rows=1 width=8)
   ->  XN Seq Scan on lineitem  (cost=0.00..13635370.08 rows=38968908 width=8)
         Filter: (((l_comment)::text ~ '^( .*idea.*)$'::text) OR ((l_comment)::text ~ '^( idea.*)$'::text) OR ((l_comment)::text ~ '^(fina .*)$'::text) OR ((l_comment)::text ~ '^( .*about.*)$'::text) OR ((l_comment)::text ~ '^( .*final.*)$'::text) OR ((l_comment)::text ~ '^( final.*)$'::text) OR ((l_comment)::text ~ '^(plant .*)$'::text) OR ((l_comment)::text ~ '^(quick .*)$'::text) OR ((l_comment)::text ~ '^(slyly .*)$'::text) OR ((l_comment)::text ~ '^( breach.*)$'::text) OR ((l_comment)::text ~ '^( egular.*)$'::text) OR ((l_comment)::text ~ '^( .*closely.*)$'::text) OR ((l_comment)::text ~ '^( closely.*)$'::text) OR ((l_comment)::text ~ '^(quickly .*)$'::text))
(3 rows)

Time: 8.506 ms

AQUA activated (Turn On)

dev=> select sum(l_orderkey), count(*)
dev-> from lineitem
dev-> where
dev->   l_comment similar to 'slyly %' or
dev->   l_comment similar to 'plant %' or
dev->   l_comment similar to 'fina %' or
dev->   l_comment similar to 'quick %' or
dev->   l_comment similar to 'slyly %' or
dev->   l_comment similar to 'quickly %' or
dev->   l_comment similar to ' %about%' or
dev->   l_comment similar to ' final%' or
dev->   l_comment similar to ' %final%' or
dev->   l_comment similar to ' breach%' or
dev->   l_comment similar to ' egular%' or
dev->   l_comment similar to ' %closely%' or
dev->   l_comment similar to ' closely%' or
dev->   l_comment similar to ' %idea%' or
dev->   l_comment similar to ' idea%' ;
       sum        |  count
------------------+---------
 1440371216714447 | 9496106
(1 row)
Time: 29191.625 ms

dev=> select sum(l_orderkey), count(*)
dev-> from lineitem
dev-> where
dev->   l_comment similar to 'slyly %' or
dev->   l_comment similar to 'plant %' or
dev->   l_comment similar to 'fina %' or
dev->   l_comment similar to 'quick %' or
dev->   l_comment similar to 'slyly %' or
dev->   l_comment similar to 'quickly %' or
dev->   l_comment similar to ' %about%' or
dev->   l_comment similar to ' final%' or
dev->   l_comment similar to ' %final%' or
dev->   l_comment similar to ' breach%' or
dev->   l_comment similar to ' egular%' or
dev->   l_comment similar to ' %closely%' or
dev->   l_comment similar to ' closely%' or
dev->   l_comment similar to ' %idea%' or
dev->   l_comment similar to ' idea%' ;
       sum        |  count
------------------+---------
 1440371216714447 | 9496106
(1 row)
Time: 7512.982 ms

Below is the execution plan.

dev=> explain
select sum(l_orderkey), count(*)
from lineitem
where
  l_comment similar to 'slyly %' or
  l_comment similar to 'plant %' or
  l_comment similar to 'fina %' or
  l_comment similar to 'quick %' or
  l_comment similar to 'slyly %' or
  l_comment similar to 'quickly %' or
  l_comment similar to ' %about%' or
  l_comment similar to ' final%' or
  l_comment similar to ' %final%' or
  l_comment similar to ' breach%' or
  l_comment similar to ' egular%' or
  l_comment similar to ' %closely%' or
  l_comment similar to ' closely%' or
  l_comment similar to ' %idea%' or
  l_comment similar to ' idea%' ;
                                                                                                                                                                                                                                                                                                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 XN Aggregate  (cost=13830214.62..13830214.62 rows=1 width=8)
   ->  XN Seq Scan on lineitem  (cost=0.00..13635370.08 rows=38968908 width=8)
         Filter: (((l_comment)::text ~ '^( .*idea.*)$'::text) OR ((l_comment)::text ~ '^( idea.*)$'::text) OR ((l_comment)::text ~ '^(fina .*)$'::text) OR ((l_comment)::text ~ '^( .*about.*)$'::text) OR ((l_comment)::text ~ '^( .*final.*)$'::text) OR ((l_comment)::text ~ '^( final.*)$'::text) OR ((l_comment)::text ~ '^(plant .*)$'::text) OR ((l_comment)::text ~ '^(quick .*)$'::text) OR ((l_comment)::text ~ '^(slyly .*)$'::text) OR ((l_comment)::text ~ '^( breach.*)$'::text) OR ((l_comment)::text ~ '^( egular.*)$'::text) OR ((l_comment)::text ~ '^( .*closely.*)$'::text) OR ((l_comment)::text ~ '^( closely.*)$'::text) OR ((l_comment)::text ~ '^(quickly .*)$'::text))
(3 rows)

Time: 8.683 ms

Results

SIMILAR TO queries with AQUA activated significantly improved the performance: 7.4 times in the first time, 28.1 times in the second and later times. The query plans were the same regardless of AQUA, in which the conditions were converted to regular expressions. (The unit of the following table is “seconds”.)

AQUA (Off) AQUA (On) Improvement by AQUA
Round 1 215.896 29.191 7.4 times faster
Round 2 211.313 7.512 28.1 times faster
First run ratio 1 3.9 -

Comparison of LIKE

AQUA deactivated (Turn Off)

dev=> select sum(l_orderkey), count(*)
dev-> from lineitem
dev-> where
dev->   l_comment like 'slyly %' or
dev->   l_comment like 'plant %' or
dev->   l_comment like 'fina %' or
dev->   l_comment like 'quick %' or
dev->   l_comment like 'slyly %' or
dev->   l_comment like 'quickly %' or
dev->   l_comment like ' %about%' or
dev->   l_comment like ' final%' or
dev->   l_comment like ' %final%' or
dev->   l_comment like ' breach%' or
dev->   l_comment like ' egular%' or
dev->   l_comment like ' %closely%' or
dev->   l_comment like ' closely%' or
dev->   l_comment like ' %idea%' or
dev->   l_comment like ' idea%' ;

       sum        |  count
------------------+---------
 1440371216714447 | 9496106
(1 row)

Time: 10276.394 ms
dev=>
dev=>
dev=> select sum(l_orderkey), count(*)
from lineitem
where
  l_comment like 'slyly %' or
  l_comment like 'plant %' or
  l_comment like 'fina %' or
  l_comment like 'quick %' or
  l_comment like 'slyly %' or
  l_comment like 'quickly %' or
  l_comment like ' %about%' or
  l_comment like ' final%' or
  l_comment like ' %final%' or
  l_comment like ' breach%' or
  l_comment like ' egular%' or
  l_comment like ' %closely%' or
  l_comment like ' closely%' or
  l_comment like ' %idea%' or
  l_comment like ' idea%' ;
       sum        |  count
------------------+---------
 1440371216714447 | 9496106
(1 row)

Time: 6921.963 ms

Below is the execution plan.

dev=> explain
select sum(l_orderkey), count(*)
from lineitem
where
  l_comment like 'slyly %' or
  l_comment like 'plant %' or
  l_comment like 'fina %' or
  l_comment like 'quick %' or
  l_comment like 'slyly %' or
  l_comment like 'quickly %' or
  l_comment like ' %about%' or
  l_comment like ' final%' or
  l_comment like ' %final%' or
  l_comment like ' breach%' or
  l_comment like ' egular%' or
  l_comment like ' %closely%' or
  l_comment like ' closely%' or
  l_comment like ' %idea%' or
  l_comment like ' idea%' ;
                                                                                                                                                                                                                                                                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 XN Aggregate  (cost=13688958.11..13688958.11 rows=1 width=8)
   ->  XN Seq Scan on lineitem  (cost=0.00..13635370.08 rows=10717605 width=8)
         Filter: (((l_comment)::text ~~ ' %idea%'::text) OR ((l_comment)::text ~~ ' %about%'::text) OR ((l_comment)::text ~~ ' %final%'::text) OR ((l_comment)::text ~~ ' %closely%'::text) OR ((l_comment)::text ~~ ' breach%'::text) OR ((l_comment)::text ~~ ' closely%'::text) OR ((l_comment)::text ~~ ' egular%'::text) OR ((l_comment)::text ~~ ' final%'::text) OR ((l_comment)::text ~~ ' idea%'::text) OR ((l_comment)::text ~~ 'fina %'::text) OR ((l_comment)::text ~~ 'plant %'::text) OR ((l_comment)::text ~~ 'quick %'::text) OR ((l_comment)::text ~~ 'quickly %'::text) OR ((l_comment)::text ~~ 'slyly %'::text))
(3 rows)

Time: 7.985 ms

AQUA activated (Turn On)

dev=> select sum(l_orderkey), count(*)
dev-> from lineitem
dev-> where
dev->   l_comment like 'slyly %' or
dev->   l_comment like 'plant %' or
dev->   l_comment like 'fina %' or
dev->   l_comment like 'quick %' or
dev->   l_comment like 'slyly %' or
dev->   l_comment like 'quickly %' or
dev->   l_comment like ' %about%' or
dev->   l_comment like ' final%' or
dev->   l_comment like ' %final%' or
dev->   l_comment like ' breach%' or
dev->   l_comment like ' egular%' or
dev->   l_comment like ' %closely%' or
dev->   l_comment like ' closely%' or
dev->   l_comment like ' %idea%' or
dev->   l_comment like ' idea%' ;

       sum        |  count
------------------+---------
 1440371216714447 | 9496106
(1 row)

Time: 11116.387 ms
dev=>
dev=> select sum(l_orderkey), count(*)
from lineitem
where
  l_comment like 'slyly %' or
  l_comment like 'plant %' or
  l_comment like 'fina %' or
  l_comment like 'quick %' or
  l_comment like 'slyly %' or
  l_comment like 'quickly %' or
  l_comment like ' %about%' or
  l_comment like ' final%' or
  l_comment like ' %final%' or
  l_comment like ' breach%' or
  l_comment like ' egular%' or
  l_comment like ' %closely%' or
  l_comment like ' closely%' or
  l_comment like ' %idea%' or
  l_comment like ' idea%' ;
       sum        |  count
------------------+---------
 1440371216714447 | 9496106
(1 row)

Time: 7526.141 ms

Below is the execution plan.

dev=> explain
select sum(l_orderkey), count(*)
from lineitem
where
  l_comment like 'slyly %' or
  l_comment like 'plant %' or
  l_comment like 'fina %' or
  l_comment like 'quick %' or
  l_comment like 'slyly %' or
  l_comment like 'quickly %' or
  l_comment like ' %about%' or
  l_comment like ' final%' or
  l_comment like ' %final%' or
  l_comment like ' breach%' or
  l_comment like ' egular%' or
  l_comment like ' %closely%' or
  l_comment like ' closely%' or
  l_comment like ' %idea%' or
  l_comment like ' idea%' ;
                                                                                                                                                                                                                                                                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 XN Aggregate  (cost=13688958.11..13688958.11 rows=1 width=8)
   ->  XN Seq Scan on lineitem  (cost=0.00..13635370.08 rows=10717605 width=8)
         Filter: (((l_comment)::text ~~ ' %idea%'::text) OR ((l_comment)::text ~~ ' %about%'::text) OR ((l_comment)::text ~~ ' %final%'::text) OR ((l_comment)::text ~~ ' %closely%'::text) OR ((l_comment)::text ~~ ' breach%'::text) OR ((l_comment)::text ~~ ' closely%'::text) OR ((l_comment)::text ~~ ' egular%'::text) OR ((l_comment)::text ~~ ' final%'::text) OR ((l_comment)::text ~~ ' idea%'::text) OR ((l_comment)::text ~~ 'fina %'::text) OR ((l_comment)::text ~~ 'plant %'::text) OR ((l_comment)::text ~~ 'quick %'::text) OR ((l_comment)::text ~~ 'quickly %'::text) OR ((l_comment)::text ~~ 'slyly %'::text))
(3 rows)

Time: 8.096 ms

Results

LIKE queries with AQUA activated resulted in slightly lower performance: 0.9 times both in the first time and after the second time. (The unit of the following table is “seconds”.)

AQUA (Off) AQUA (On) Improvement by AQUA
Round 1 10.276 11.116 0.9 times (10% slower)
Round 2 6.921 7.557 0.9 times (10% slower)
First run ratio 1.5 1.5 -

Discussion

By using AQUA, queries filtered with SIMILAR TO showed 7-28 times of performance improvement, whereas queries with LIKE were executed a bit slower. AQUA might cause a certain degree of overhead.

In this test, there was a gap in processing time between the first attempt and later attempts, even though I had disabled the result cache. This is possibly due to some causes:

The first run of a query without AQUA includes time to compile a record and to load data from managed S3 to the local storage. Both of these are cached after the second runs. On the other hand, the first run of a query with AQUA includes time to compile a record and presumably to load data from managed S3 to AQUA. This data loading to AQUA might be cached too, but currently, the conditions or forms where AQUA caches are not officially indicated, and we cannot expect how much it caches.

Pricing of AQUA

It is completely free of charge!!!

Conclusion

Through these tests, I could confirm 7-28 times of performance enhancement in the queries filtered with SIMILAR TO by activating AQUA.

I conducted the test by explicitly turning AQUA on/off, also with other different data. Depending on the queries and workloads, the query execution time got faster when AQUA was not used. I hope the default setting, ‘Automatic’ will be in force soon so that Redshift can automatically decide if AQUA is necessary or not.

Since not all workloads are generally suitable for AQUA, it is recommended to determine whether to activate AQUA based on your use case. At least, the default AQUA setting of ‘Automatic’ is already configured in the clusters which use the supported node types. If the features are further developed such as automating AQUA activation, we could be benefitting from the work of AQUA without even realizing it. Let us be patient and look forward to seeing how Redshift evolves and AQUA advances.

If there is a way to invalidate compile cache and cache of data which is loaded from managed S3 to local storage, a more fine-grained test will be possible. This would then help us consider the issues, challenges, and workarounds that we may have. It is not only about Redshift, but cloud DWHs are getting more and more complex and making it difficult for humans to grasp their behavior. I wish the time comes when we don’t even need to understand their operations but they can keep running always in good shape.

-Satoru Ishikawa

(This article was originally written in Japanese and has been translated by Ito and Norita.)