RedshiftのSUPER型でUNNESTと大文字フィールド名を扱ってみる

2021.12.27

この記事は公開されてから1年以上経過しています。情報が古い可能性がありますので、ご注意ください。

はじめに

データアナリティクス事業本部のkobayashiです。

RedshiftのSUPER型を使用する案件がありその中でSUPER型のデータを扱う場合にいくつか気になる点があり、ドキュメントを調べながら対応しましたのでその内容をまとめます。

RedshiftのSUPER型

RedshfitのSUPER型は半構造化データをRedshiftへ取り込むことができる型で、主にJSONデータをRedshiftへ格納する際に使います。

SUPER型の説明は弊社ブログにて詳しい記事がありますのでそちらの記事をお読みください。

実際にSUPER型を使っている中でよく使う操作で気になった以下の2点について実際のデータを使いながら試してみます。

  • SUPER型データのUNNEST
  • 大文字フィールドが含まれているSUPER型データの取り扱い

なお、使用するデータは公式ドキュメントである「半構造化データを Amazon Redshift にロードする - Amazon Redshift 」のnationsテーブルのデータを使います。

"regionkey","name","comment","nations"
"0","AFRICA","""lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to ""","[{""n_comment"":"" haggle. carefully final deposits detect slyly agai"",""n_nationkey"":11,""n_name"":""ALGERIA""},{""n_comment"":""ven packages wake quickly. regu"",""n_nationkey"":5,""n_name"":""ETHIOPIA""},{""n_comment"":"" pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t"",""n_nationkey"":14,""n_name"":""KENYA""},{""n_comment"":""rns. blithely bold courts among the closely regular packages use furiously bold platelets?"",""n_nationkey"":15,""n_name"":""MOROCCO""},{""n_comment"":""s. ironic, unusual asymptotes wake blithely r"",""n_nationkey"":16,""n_name"":""MOZAMBIQUE""}]"
"1","AMERICA","""hs use ironic, even requests. s""","[{""n_comment"":""al foxes promise slyly according to the regular accounts. bold requests alon"",""n_nationkey"":1,""n_name"":""ARGENTINA""},{""n_comment"":""y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special "",""n_nationkey"":2,""n_name"":""BRAZIL""},{""n_comment"":""eas hang ironic, silent packages. slyly regular packages are furiously over the tithes. fluffily bold"",""n_nationkey"":3,""n_name"":""CANADA""},{""n_comment"":""platelets. blithely pending dependencies use fluffily across the even pinto beans. carefully silent accoun"",""n_nationkey"":17,""n_name"":""PERU""},{""n_comment"":""y final packages. slow foxes cajole quickly. quickly silent platelets breach ironic accounts. unusual pinto be"",""n_nationkey"":24,""n_name"":""UNITED STATES""}]"
"2","ASIA","""ges. thinly even pinto beans ca""","[{""n_comment"":""ss excuses cajole slyly across the packages. deposits print aroun"",""n_nationkey"":8,""n_name"":""INDIA""},{""n_comment"":"" slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull"",""n_nationkey"":9,""n_name"":""INDONESIA""},{""n_comment"":""ously. final, express gifts cajole a"",""n_nationkey"":12,""n_name"":""JAPAN""},{""n_comment"":""c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos"",""n_nationkey"":18,""n_name"":""CHINA""},{""n_comment"":""hely enticingly express accounts. even, final "",""n_nationkey"":21,""n_name"":""VIETNAM""}]"
"3","EUROPE","""ly final courts cajole furiously final excuse""","[{""n_comment"":""refully final requests. regular, ironi"",""n_nationkey"":6,""n_name"":""FRANCE""},{""n_comment"":""l platelets. regular accounts x-ray: unusual, regular acco"",""n_nationkey"":13,""n_name"":""GERMANY""},{""n_comment"":""ular asymptotes are about the furious multipliers. express dependencies nag above the ironically ironic account"",""n_nationkey"":19,""n_name"":""ROMANIA""},{""n_comment"":"" requests against the platelets use never according to the quickly regular pint"",""n_nationkey"":22,""n_name"":""RUSSIA""},{""n_comment"":""eans boost carefully special requests. accounts are. carefull"",""n_nationkey"":23,""n_name"":""UNITED KINGDOM""}]"
"4","MIDDLE EAST","""uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl""","[{""n_comment"":""y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d"",""n_nationkey"":17,""n_name"":""EGYPT""},{""n_comment"":""efully alongside of the slyly final dependencies. "",""n_nationkey"":10,""n_name"":""IRAN""},{""n_comment"":""nic deposits boost atop the quickly final requests? quickly regula"",""n_nationkey"":11,""n_name"":""IRAQ""},{""n_comment"":""ic deposits are blithely about the carefully regular pa"",""n_nationkey"":13,""n_name"":""JORDAN""},{""n_comment"":""ts. silent requests haggle. closely express packages sleep across the blithely"",""n_nationkey"":20,""n_name"":""SAUDI ARABIA""}]"

SUPER型データのUNNEST

Redshiftは PartiQL 構文を使ってSUPER型のデータを取り扱うことができます。これにより例えばnationsのテーブルでSUPER型のカラムnationのデータを展開してフラット化することができますので試してみます。

はじめにnationsのデータをUNNESTしない状態でデータを取得します。

SELECT c.regionkey, c.name, c.comment, c.nations
FROM nations c;
regionkey name comment nations
0 AFRICA "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " [{"n_comment":" haggle. carefully final deposits detect slyly agai",....
1 AMERICA "hs use ironic, even requests. s" [{"n_comment":"al foxes promise slyly according to the regular accounts. bold requests alon","n_nationkey":1,"n_name":"ARGENTINA"},....
2 ASIA "ges. thinly even pinto beans ca" [{"n_comment":"ss excuses cajole slyly across the packages. deposits print aroun","n_nationkey":8,"n_name":"INDIA"},....
3 EUROPE "ly final courts cajole furiously final excuse" [{"n_comment":"refully final requests. regular, ironi","n_nationkey":6,"n_name":"FRANCE"},....
4 MIDDLE EAST "uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl" [{"n_comment":"y above the carefully unusual theodolites. final dugouts are quickly across the furiously regular d","n_nationkey":17,"n_name":"EGYPT"},....

これに対してnationカラムをUNNESTしてデータを取得します。

SELECT c.regionkey, c.name, c.comment,
       n.n_name, n.n_nationkey, n.n_comment
FROM nations c,
     c.nations n;
regionkey name comment n_name n_nationkey n_comment
0 AFRICA "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " "ALGERIA" 11 " haggle. carefully final deposits detect slyly agai"
0 AFRICA "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " "ETHIOPIA" 5 "ven packages wake quickly. regu"
1 AMERICA "hs use ironic, even requests. s" "ARGENTINA" 1 "al foxes promise slyly according to the regular accounts. bold requests alon"
2 ASIA "ges. thinly even pinto beans ca" "INDIA" 8 "ss excuses cajole slyly across the packages. deposits print aroun"
2 ASIA "ges. thinly even pinto beans ca" "INDONESIA" 9 " slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull"

上記のようにRedshiftのSUPER型のUNNESTはFROM句の拡張で扱うことができます。nationsのデータですと、SUPRE型であるnationのカラムの配列データを展開してフラットなデータを取得することができています。

またATキーワードを使うことでSUPER型データの配列インデックスも取得することができます。

SELECT c.regionkey, c.name, c.comment,
       n.n_name, n.n_nationkey, n.n_comment, index n_index
FROM nations c,
     c.nations n AT index;
regionkey name comment n_name n_nationkey n_comment n_index
0 AFRICA "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " "ALGERIA" 11 " haggle. carefully final deposits detect slyly agai" 0
0 AFRICA "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " "ETHIOPIA" 5 "ven packages wake quickly. regu" 1
0 AFRICA "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " "KENYA" 14 " pending excuses haggle furiously deposits. pending, express pinto beans wake fluffily past t" 2
0 AFRICA "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " "MOROCCO" 15 "rns. blithely bold courts among the closely regular packages use furiously bold platelets?" 3
0 AFRICA "lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to " "MOZAMBIQUE" 16 "s. ironic, unusual asymptotes wake blithely r" 4
1 AMERICA "hs use ironic, even requests. s" "ARGENTINA" 1 "al foxes promise slyly according to the regular accounts. bold requests alon" 0
1 AMERICA "hs use ironic, even requests. s" "BRAZIL" 2 "y alongside of the pending deposits. carefully special packages are about the ironic forges. slyly special " 1

このようにUNNESTを使うことで簡単にSUPER型のデータをフラットにすることができます。JSONのデータを一旦RedshiftのSUPER型で取り込んだ後でSQLで展開して必要なデータにするといった形でELT処理を行えば半構造化データから必要なデータを迅速に取り出すことができるので大変便利です。

大文字フィールドが含まれているSUPER型データの取り扱い

JSONデータのフィールド名が大文字であったり、大文字と小文字が混ざったりしていることはよくあると思います。その際にはRedshiftの識別子の大文字・小文字区別の有無を設定するenable_case_sensitive_identifierTRUEにすることでクエリが可能になります。

SET enable_case_sensitive_identifier to TRUE;

nations.nationに含まれるデータのn_commentN_COMMENTへ、n_nationkeyN_Nationkeyへ更新したデータで試してみます。大文字を含むフィールドを指定する場合はフィールド名をダブルクオート"で囲うことで指定が可能になります。

SELECT c.regionkey, c.name, c.comment,
       n.n_name, n."N_Nationkey", n."N_COMMENT"
FROM nations c,
     c.nations n;
regionkey name comment n_name N_Nationkey N_COMMENT
2 ASIA ges. thinly even pinto beans ca "INDIA" 8 "ss excuses cajole slyly across the packages. deposits print aroun"
2 ASIA ges. thinly even pinto beans ca "INDONESIA" 9 " slyly express asymptotes. regular deposits haggle slyly. carefully ironic hockey players sleep blithely. carefull"
2 ASIA ges. thinly even pinto beans ca "JAPAN" 12 "ously. final, express gifts cajole a"
2 ASIA ges. thinly even pinto beans ca "CHINA" 18 "c dependencies. furiously express notornis sleep slyly regular accounts. ideas sleep. depos"

SUPER 設定 - Amazon Redshift

まとめ

RedshiftのSUPER型の操作でよく使う操作のUNNESTとフィールドの大文字対応を試してみました。どちらも特に難しい操作はなく簡単に半構造化データを扱うことができました。SUPER型が出るまではRedshiftで直接半構造化データを扱うことができないため事前のETL処理が必要でしたがSUPER型を使うことで非常に簡易に扱えるようになりました。

最後まで読んで頂いてありがとうございました。