Compression Encoding – Amazon Redshift

2021.09.15

この記事は公開されてから1年以上経過しています。情報が古い可能性がありますので、ご注意ください。

What Is Compression Encoding

A Compression Encoding specifies the type of compression that is applied to a column of data values as rows are added to a table. Data compression reduces the size of the data, which directly reduces storage costs and improves query performance. Amazon Redshift provide two types of compression automatic compression and manual compression. Compression Encodings can be applied to the columns in tables automatically and manually, based on the own evaluation of the data.

Automatic Compression

ENCODE AUTO

Amazon Redshift automatically manages compression encoding for all columns in the table. ENCODE AUTO is the default for tables. You can specify the ENCODE AUTO option for the table to enable Automatic Compression. It requires enough rows in the sample data to decide appropriate column encoding as per the data. Automatic Compression selects most appropriate encoding for each column of the data.

If you specify compression encoding for any column in the table, the table is no longer set to ENCODE AUTO. if you don't specify the ENCODE AUTO option for the table, Amazon Redshift automatically select compression encoding for columns as follows:

  • Columns that are defined as sort keys are assigned RAW compression.
  • Columns that are defined as BOOLEAN, REAL, or DOUBLE PRECISION data types are assigned RAW compression.
  • Columns that are defined as SMALLINT, INTEGER, BIGINT, DECIMAL, DATE, TIMESTAMP, or TIMESTAMPTZ data types are assigned AZ64 compression.
  • Columns that are defined as CHAR or VARCHAR data types are assigned LZO compression.

COPY Command

The COPY command analyzes the input data and applies compression encodings to an empty table automatically as part of the load operation. Amazon strongly recommends using the COPY command to apply automatic compression.

  • Automatic compression can be use while creating a new table. The COPY command will perform a compression analysis.
  • To apply automatic compression to an empty table, regardless of its current compression encodings, execute the COPY command with the COMPUPDATE option set to ON.
  • To disable automatic compression, run the COPY command with the COMPUPDATE option set to OFF.

Example

Copy Table_Name from ‘s3://****' 
iam_role ‘arn:aws:iam::**** '
delimiter '|' COMPUPDATE ON;

Manual Compression

CREATE TABLE Statement

Manual compression can be apply by specifying the compression encodings for individual columns as part of the CREATE TABLE statement.

Syntax

create table table_name (column_name 
data_type ENCODE encoding-type)[, ...]

Here, encoding-type is taken from the following table.

Example

create table model(
model_id int encode raw,
model_name char(10) encode bytedict);

ALTER TABLE Statement.

Amazon Redshift allows users to modify the compression encoding of existing tables and columns with a single statement.This statement updates the values and properties set by CREATE TABLE or CREATE EXTERNAL TABLE. Data is changes with the time so this feature makes it easier to modify column compression encodings and maintain the optimal compression encodings in Amazon Redshift to achieve high performance and reduce the storage utilization.

Compression encoding can be applied for a column when it is added to a table using the ALTER TABLE command.

Syntax

ALTER TABLE Table_Name ADD [ COLUMN ] column_name column_type ENCODE encoding type

Compression encoding can be applied to the existing table.

For existing table , the following statement alters the encoding to AUTO.

Example

alter table model alter encode auto;

Compression encoding can be applied to the column of an existing table.

Example

alter table model alter column c3 encode bytedict;

Analyze Compression

Amazone Redshift tries to analyze the data and select best possible encoding. It also suggest various options of encoding types that can be suitable for the existing data and selected manually. Analyze Compression is use to view the suggested encodings for existing data in the table.

Analyze Compression does not take sort keys into account. Analyze Compression changes the encoding to RAW for sort keys. In manual compression, Analyze Compression command can be run against an existing table to view the suggested encoding options. Result of Analyze Compression command helps to choose compression encodings.

Syntax

ANALYZE COMPRESSION 
[ [ table_name ]
[ ( column_name [, ...] ) ] ] 
[COMPROWS numrows]

Example

analyze compression model;

Verify the encoding type of Redshift table

PG_TABLE_DEF

PG_TABLE_DEF table contains meta data about the tables in the database. PG_TABLE_DEF contains information about all the schemas, tables, columns and also provides the relationship between them. PG_TABLE_DEF view provide information about how data can be store in the column and also for exploring the Redshift data warehouse.

To select all the rows from pg_table_def.

select *from pg_table_def

This will select all the columns from all the tables in all the schema of data warehouse.

PG_TABLE_DEF only returns information about tables that are visible to the user. If PG_TABLE_DEF does not return the expected results, verify that the search_path parameter is set correctly to include the relevant schemas. PG_TABLE_DEF will only return information for tables in schemas that are included in the search path. To limit the results to user-defined tables, it’s important to specify the schemaname column or table name column to return only results which are Required :

Example

select *from pg_table_def where schemaname = 'public';

This shows all the tables and their associated columns information exist in the public schema.

select *from pg_table_def where tablename = 'model';

This shows all the columns and their types, encoding, sortkey, distkey exist in the table 'model'.

Conclusion

I think Compression encoding is the amazing feature provided by Amazon Redshift. When the data grow exponentially, it becomes very difficult to optimise data storage. Encoded data can have huge impact on data storage and query performance. Because the size of the data doesn’t just impact storage size and costs, it also affects query performance. It is important to select right encoding option for your data. Compression encodings in Amazon Redshift helps to reduces the size of the data, which directly reduces storage costs and achieve high query performance.