こんにちは、データアナリティクス事業本部コンサルティンングチームの八木です。
先週、AWS re:Invent 2022にて、Amazon Redshift auto-copy from S3という機能(プレビュー)が発表されました。
この機能はS3にアップロードしたファイルを、自動的にRedshiftにロードしてくれる機能です。
何が嬉しいの?
元々、S3からRedshiftへのロードはCOPYコマンドを利用する手法がありました。
ただし、このコマンドはRedshiftで都度実行する必要があります。S3へのファイルアップロードをトリガーにRedshiftへのロードを自動化するには、S3イベント通知とLambda等を使い、自動実行環境を自分で構築する必要がありました。
今回の発表された機能により、これらのアーキテクチャが必要なくなりました。私たちが行うのはauto-copyの設定のみです。
神アプデです。ありがとう、AWS。ありがとう、Redshift。
やってみた
とういことで、早速やってみました。
S3バケットの作成
データをアップロードするS3バケットを作成します。今回は全てデフォルトの設定で作成しました。
Redshiftクラスターの作成
auto-copyの機能はプレビューのため、専用のクラスターを作成する必要があります。
Redshiftのコンソールにログインすると、上部にTry new Amazon Redshift features in preview.
という案内がが出てきます。Create preview cluster
を選択し、プレビュー専用のクラスターを作成します。
Preview track
でpreview_autocopy_2022
を選択します。
今回はクラスターサイズはdc2.large、ノード数は1にしました。
関連付けられたIAMロールには、先ほど作成したS3バケットへの権限を持つIAMロールを指定します。
なお、公式ドキュメントによると、プレビュー終了(2023/2/28予定)から2週間後に、プレビュークラスターは削除する、との記載があります。ここで作成したクラスターはそのまま本番移行できなそうなので、ご注意ください。
データロード先のテーブル作成
Redshiftに接続し、以下のコマンドでテーブルを作成します。
CREATE TABLE public.cities(
latd integer,
latm integer,
lats integer,
ns varchar(16),
lond integer,
lonm integer,
lons integer,
ew varchar(16),
city varchar(64),
state varchar(16)
);
COPY JOB(auto-copy)の作成
続いて、以下のコマンドでCOPY JOB(auto-copy)を作成します。
COPY public.cities FROM 's3://<your-bucket-name>/dir/'
IAM_ROLE 'arn:aws:iam::<aws-account-id>:role/<role-name>'
FORMAT CSV
IGNOREHEADER 1
JOB CREATE "job_name"
AUTO ON;
作成したジョブを確認します。
COPY JOB LIST;
これでS3バケットのdir
パス配下にアップロードされるファイルは自動的にロードされるようになりました。
COPY JOBの作成コマンドは、基本的にCOPYコマンドにJOB CREATE job-name [AUTO ON | OFF]
をつけたSQL文になります。
AUTO
オプションは自動的にジョブを実行(=自動的にS3からロード)を行う設定です。OFFにした場合、自動的にロードは行われません。デフォルトはONです。
S3バケットにCSVファイルをアップロードしてみる
設定が完了したので、S3にファイルをアップロードし、自動的にRedshiftにロードされるか、確認してみます。
S3バケットのdir
パス配下に以下のCSVファイルをアップロードします。
cities.csv(長いので折りたたんでいます)
LatD,LatM,LatS,NS,LonD,LonM,"LonS","EW","City","State"
41,5,59,"N",80,39,0,"W","Youngstown",OH
42,52,48,"N",97,23,23,"W","Yankton",SD
46,35,59,"N",120,30,36,"W","Yakima",WA
42,16,12,"N",71,48,0,"W","Worcester",MA
43,37,48,"N",89,46,11,"W","WisconsinDells",WI
36,5,59,"N",80,15,0,"W","Winston-Salem",NC
49,52,48,"N",97,9,0,"W","Winnipeg",MB
39,11,23,"N",78,9,36,"W","Winchester",VA
34,14,24,"N",77,55,11,"W","Wilmington",NC
39,45,0,"N",75,33,0,"W","Wilmington",DE
48,9,0,"N",103,37,12,"W","Williston",ND
41,15,0,"N",77,0,0,"W","Williamsport",PA
37,40,48,"N",82,16,47,"W","Williamson",WV
33,54,0,"N",98,29,23,"W","WichitaFalls",TX
37,41,23,"N",97,20,23,"W","Wichita",KS
40,4,11,"N",80,43,12,"W","Wheeling",WV
26,43,11,"N",80,3,0,"W","WestPalmBeach",FL
47,25,11,"N",120,19,11,"W","Wenatchee",WA
41,25,11,"N",122,23,23,"W","Weed",CA
31,13,11,"N",82,20,59,"W","Waycross",GA
44,57,35,"N",89,38,23,"W","Wausau",WI
42,21,36,"N",87,49,48,"W","Waukegan",IL
44,54,0,"N",97,6,36,"W","Watertown",SD
43,58,47,"N",75,55,11,"W","Watertown",NY
42,30,0,"N",92,20,23,"W","Waterloo",IA
41,32,59,"N",73,3,0,"W","Waterbury",CT
38,53,23,"N",77,1,47,"W","Washington",DC
41,50,59,"N",79,8,23,"W","Warren",PA
46,4,11,"N",118,19,48,"W","WallaWalla",WA
31,32,59,"N",97,8,23,"W","Waco",TX
38,40,48,"N",87,31,47,"W","Vincennes",IN
28,48,35,"N",97,0,36,"W","Victoria",TX
32,20,59,"N",90,52,47,"W","Vicksburg",MS
49,16,12,"N",123,7,12,"W","Vancouver",BC
46,55,11,"N",98,0,36,"W","ValleyCity",ND
30,49,47,"N",83,16,47,"W","Valdosta",GA
43,6,36,"N",75,13,48,"W","Utica",NY
39,54,0,"N",79,43,48,"W","Uniontown",PA
32,20,59,"N",95,18,0,"W","Tyler",TX
42,33,36,"N",114,28,12,"W","TwinFalls",ID
33,12,35,"N",87,34,11,"W","Tuscaloosa",AL
34,15,35,"N",88,42,35,"W","Tupelo",MS
36,9,35,"N",95,54,36,"W","Tulsa",OK
32,13,12,"N",110,58,12,"W","Tucson",AZ
37,10,11,"N",104,30,36,"W","Trinidad",CO
40,13,47,"N",74,46,11,"W","Trenton",NJ
44,45,35,"N",85,37,47,"W","TraverseCity",MI
43,39,0,"N",79,22,47,"W","Toronto",ON
39,2,59,"N",95,40,11,"W","Topeka",KS
41,39,0,"N",83,32,24,"W","Toledo",OH
33,25,48,"N",94,3,0,"W","Texarkana",TX
39,28,12,"N",87,24,36,"W","TerreHaute",IN
27,57,0,"N",82,26,59,"W","Tampa",FL
30,27,0,"N",84,16,47,"W","Tallahassee",FL
47,14,24,"N",122,25,48,"W","Tacoma",WA
43,2,59,"N",76,9,0,"W","Syracuse",NY
32,35,59,"N",82,20,23,"W","Swainsboro",GA
33,55,11,"N",80,20,59,"W","Sumter",SC
40,59,24,"N",75,11,24,"W","Stroudsburg",PA
37,57,35,"N",121,17,24,"W","Stockton",CA
44,31,12,"N",89,34,11,"W","StevensPoint",WI
40,21,36,"N",80,37,12,"W","Steubenville",OH
40,37,11,"N",103,13,12,"W","Sterling",CO
38,9,0,"N",79,4,11,"W","Staunton",VA
39,55,11,"N",83,48,35,"W","Springfield",OH
37,13,12,"N",93,17,24,"W","Springfield",MO
42,5,59,"N",72,35,23,"W","Springfield",MA
39,47,59,"N",89,39,0,"W","Springfield",IL
47,40,11,"N",117,24,36,"W","Spokane",WA
41,40,48,"N",86,15,0,"W","SouthBend",IN
43,32,24,"N",96,43,48,"W","SiouxFalls",SD
42,29,24,"N",96,23,23,"W","SiouxCity",IA
32,30,35,"N",93,45,0,"W","Shreveport",LA
33,38,23,"N",96,36,36,"W","Sherman",TX
44,47,59,"N",106,57,35,"W","Sheridan",WY
35,13,47,"N",96,40,48,"W","Seminole",OK
32,25,11,"N",87,1,11,"W","Selma",AL
38,42,35,"N",93,13,48,"W","Sedalia",MO
47,35,59,"N",122,19,48,"W","Seattle",WA
41,24,35,"N",75,40,11,"W","Scranton",PA
41,52,11,"N",103,39,36,"W","Scottsbluff",NB
42,49,11,"N",73,56,59,"W","Schenectady",NY
32,4,48,"N",81,5,23,"W","Savannah",GA
46,29,24,"N",84,20,59,"W","SaultSainteMarie",MI
27,20,24,"N",82,31,47,"W","Sarasota",FL
38,26,23,"N",122,43,12,"W","SantaRosa",CA
35,40,48,"N",105,56,59,"W","SantaFe",NM
34,25,11,"N",119,41,59,"W","SantaBarbara",CA
33,45,35,"N",117,52,12,"W","SantaAna",CA
37,20,24,"N",121,52,47,"W","SanJose",CA
37,46,47,"N",122,25,11,"W","SanFrancisco",CA
41,27,0,"N",82,42,35,"W","Sandusky",OH
32,42,35,"N",117,9,0,"W","SanDiego",CA
34,6,36,"N",117,18,35,"W","SanBernardino",CA
29,25,12,"N",98,30,0,"W","SanAntonio",TX
31,27,35,"N",100,26,24,"W","SanAngelo",TX
40,45,35,"N",111,52,47,"W","SaltLakeCity",UT
38,22,11,"N",75,35,59,"W","Salisbury",MD
36,40,11,"N",121,39,0,"W","Salinas",CA
38,50,24,"N",97,36,36,"W","Salina",KS
38,31,47,"N",106,0,0,"W","Salida",CO
44,56,23,"N",123,1,47,"W","Salem",OR
44,57,0,"N",93,5,59,"W","SaintPaul",MN
38,37,11,"N",90,11,24,"W","SaintLouis",MO
39,46,12,"N",94,50,23,"W","SaintJoseph",MO
42,5,59,"N",86,28,48,"W","SaintJoseph",MI
44,25,11,"N",72,1,11,"W","SaintJohnsbury",VT
45,34,11,"N",94,10,11,"W","SaintCloud",MN
29,53,23,"N",81,19,11,"W","SaintAugustine",FL
43,25,48,"N",83,56,24,"W","Saginaw",MI
38,35,24,"N",121,29,23,"W","Sacramento",CA
43,36,36,"N",72,58,12,"W","Rutland",VT
33,24,0,"N",104,31,47,"W","Roswell",NM
35,56,23,"N",77,48,0,"W","RockyMount",NC
41,35,24,"N",109,13,48,"W","RockSprings",WY
42,16,12,"N",89,5,59,"W","Rockford",IL
43,9,35,"N",77,36,36,"W","Rochester",NY
44,1,12,"N",92,27,35,"W","Rochester",MN
37,16,12,"N",79,56,24,"W","Roanoke",VA
37,32,24,"N",77,26,59,"W","Richmond",VA
39,49,48,"N",84,53,23,"W","Richmond",IN
38,46,12,"N",112,5,23,"W","Richfield",UT
45,38,23,"N",89,25,11,"W","Rhinelander",WI
39,31,12,"N",119,48,35,"W","Reno",NV
50,25,11,"N",104,39,0,"W","Regina",SA
40,10,48,"N",122,14,23,"W","RedBluff",CA
40,19,48,"N",75,55,48,"W","Reading",PA
41,9,35,"N",81,14,23,"W","Ravenna",OH
Redshiftのテーブルに自動的にロードされたか、確認してみます。
SELECT * FROM public.cities;
SELECT count(*) FROM public.cities;
無事ロードされていました。
フォーマットがマッチしないファイルをアップロードすると、どうなるのか?
Redshfitのテーブルとフォーマットが異なるCSVファイルをアップロードすると、どうなるのでしょうか?
試しに、以下のようなCSVファイルをS3にアップロードしてみます。
id,name
1,yagi
2,tanaka
3,suzuki
Redshiftでテーブルを確認しましたが、ロードされていないようです。
SELECT count(*) FROM public.cities;
システムビューを確認してみると、エラーが発生したことがわかりました。
SELECT * FROM STL_LOAD_ERRORS ORDER BY starttime DESC LIMIT 5;
userid | slice | tbl | starttime | session | query | copy_job_id | filename | line_number | colname | type | col_length | position | raw_line | raw_field_value | err_code | err_reason | is_partial | start_offset
--------+-------+--------+----------------------------+------------+--------+-------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+---------------------------------------------------------------------------------------------------------------------------------+------------+------------+----------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------------------------------------------------------------------------------------------------------+------------+--------------
100 | 1 | 110827 | 2022-12-04 12:12:43.754541 | 1073774984 | 200735 | 110823 | s3://sample-bucket-auto-copy-123456789012/dir/users.csv | 2 | latm | int4 | 0 | 0 | 1,yagi | yagi | 1214 | Delimiter not found | 0 | 0
エラーが発生したCOPY JOBのIDも確認することができました。
最後に
今回はre:Invent 2022で発表された、Redshiftのauto-copy機能を試してみました。
S3のアップロードされたデータを自動的にRedshiftに読み込んでくれるため、オペレーションコストが大幅に軽減されそうです。
GAが待ち遠しいですね!
以上、八木でした!
参考リンク
COPY JOB (preview) - Amazon Redshift
Amazon Redshift now supports auto-copy from Amazon S3