AWS Glue Now Supports "Continuous Logging" to Track Real-Time Progress of Spark ETL Jobs

AWS Glue now supports "Continuous Logging", which tracks the real-time progress of the Apache Spark stages run by Spark ETL jobs. With real-time Driver/Executor logs and custom log messages, monitoring and debugging should become much easier. This post covers the newly added logs, the Progress Bar, and the Custom Script logger.

What Is Continuous Logging?

Continuous Logging tracks the real-time progress of the Apache Spark stages a job executes. Once an AWS Glue job starts running, real-time log information is sent to CloudWatch every 5 seconds. The logs can be viewed in the AWS Glue console or the CloudWatch console dashboard.

In Amazon CloudWatch you can access separate log streams for the Apache Spark Driver and Executors, and filter out highly verbose Apache Spark log messages, which makes monitoring and debugging ETL jobs easier.

Continuous Logging provides the following features:

  • Log filtering
    • Standard filter: continuous logging with a default filter that reduces verbosity in the logs
    • No filter: continuous logging without any filtering
  • Custom Script logger: log application-specific messages
  • Progress bar: track the progress of the currently running AWS Glue job

Three kinds of log streams are added to the CloudWatch Logs log group /aws-glue/jobs/logs-v2:

  • The first is the Executor log stream
  • The second is the Driver log stream
  • The third is the Progress Bar log stream

Note that the conventional "Logs" still go to the log group /aws-glue/jobs/output, and the "Error logs" to the log group /aws-glue/jobs/error.
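To make the log-group layout above concrete, here is a small sketch that lists the continuous-logging streams belonging to one job run. The function and the assumption that stream names begin with the job run ID (e.g. "jr_...") are illustrative, not part of the original post; `describe_log_streams` is the standard CloudWatch Logs API call.

```python
# Log groups used by AWS Glue, per this post
CONTINUOUS_LOG_GROUP = "/aws-glue/jobs/logs-v2"  # Driver / Executor / Progress Bar
OUTPUT_LOG_GROUP = "/aws-glue/jobs/output"       # conventional "Logs"
ERROR_LOG_GROUP = "/aws-glue/jobs/error"         # conventional "Error logs"

def continuous_streams(job_run_id, client=None):
    """Return continuous-logging stream names for one job run.

    Assumes (illustratively) that stream names start with the job run ID.
    """
    if client is None:
        import boto3  # only needed when talking to real AWS
        client = boto3.client("logs")
    resp = client.describe_log_streams(
        logGroupName=CONTINUOUS_LOG_GROUP,
        logStreamNamePrefix=job_run_id,
    )
    return [s["logStreamName"] for s in resp["logStreams"]]
```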

Configuring Continuous Logging and Log Output

Configuration

On the job create/edit screen, set the job property [Continuous Logging] to "Enable". You can additionally specify [Log filtering]. "Standard filter" (the default) filters out highly verbose logs; choose "No filter" if you do not want any filtering.

To enable Continuous Logging for existing jobs in bulk, check the jobs you want to configure in the job list and choose [User settings]; the same dialog appears and you can apply the same settings. This lets you migrate existing jobs all at once.
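The console toggle corresponds to AWS Glue's special job parameters, so existing jobs can also be updated programmatically. A minimal sketch building the DefaultArguments (the helper function is mine; the two parameter names are Glue's documented continuous-logging flags):

```python
def continuous_logging_args(standard_filter=True):
    """DefaultArguments that turn on Continuous Logging for a Glue Spark ETL job."""
    return {
        "--enable-continuous-cloudwatch-log": "true",
        # "true" applies the Standard filter; "false" corresponds to No filter
        "--enable-continuous-log-filter": "true" if standard_filter else "false",
    }
```

These arguments would then be merged into the job's existing DefaultArguments, for example via the Glue `update_job` API.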

Viewing the Logs

In the Glue console, clicking the [Driver and executor log streams] link below [Edit script] takes you to the log stream view for /aws-glue/jobs/logs-v2 in CloudWatch Logs. You can also browse CloudWatch Logs directly instead.

Driver and Executor Logs

Sample Driver and Executor logs are long, so they are included in the Appendix at the end of this post.

Displaying the Progress Bar

When you run a job from the [Edit script] screen in the Glue console, a progress bar like the one below is displayed in real time, updated with the latest progress every 5 seconds. Note that the Progress Bar is not shown for jobs written in plain Spark that never initialize a glueContext.

The progress shown above follows this format:

[Stage Number (Stage Name): ===> (numCompletedTasks + numActiveTasks) / totalNumOfTasksInThisStage]
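The formula above can be sketched as a small rendering function. This is illustrative only; Spark's actual ConsoleProgressBar differs in formatting details, and the function name and bar width are my own choices:

```python
def render_progress(stage, completed, active, total, width=20):
    """Render a Spark-style progress line: completed + active tasks over the total."""
    done = completed + active
    filled = int(width * done / total) if total else 0  # proportion of bar to fill
    return f"[Stage {stage}:{'=' * filled}> ({completed} + {active}) / {total}]"
```

For example, a 20-task stage with 4 completed and 2 active tasks renders as `[Stage 0:======> (4 + 2) / 20]`.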

Logging Application-Specific Messages with the Custom Script Logger

You could always leave messages in the logs by writing to standard output, but with the AWS Glue Logger you can now log application-specific messages from your script to the Driver log stream in real time.

Use the logger obtained from the glueContext for log output. Three log levels are available: info, warn, and error.

from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Initialize the Glue context and obtain its logger
sc = SparkContext()
glueContext = GlueContext(sc)
logger = glueContext.get_logger()

# Messages are written to the Driver log stream in real time
logger.info("info message")
logger.warn("warn message")
logger.error("error message")

Writing the code above in a PySpark script and running it produces messages in the Driver log like this:

:
19/06/15 08:38:48 INFO GlueLogger: info message
19/06/15 08:38:48 WARN GlueLogger: warn message
19/06/15 08:38:48 ERROR GlueLogger: error message
:

You can also hook the real-time logs written by the Custom Script logger with a CloudWatch Logs metric filter.
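As a sketch of that hook, here are parameters for the CloudWatch Logs `put_metric_filter` API that count the `ERROR GlueLogger` lines shown in the sample output above. The helper function, filter name, and metric namespace are illustrative names of my own, not Glue defaults:

```python
def error_metric_filter_params(job_name):
    """Parameters for logs.put_metric_filter() counting GlueLogger ERROR lines."""
    return {
        "logGroupName": "/aws-glue/jobs/logs-v2",   # continuous-logging log group
        "filterName": f"{job_name}-glue-errors",     # illustrative name
        "filterPattern": '"ERROR GlueLogger"',       # matches logger.error() output
        "metricTransformations": [{
            "metricName": f"{job_name}GlueErrors",
            "metricNamespace": "CustomGlue",         # illustrative namespace
            "metricValue": "1",                      # count one per matching line
        }],
    }
```

Applying it would look like `boto3.client("logs").put_metric_filter(**error_metric_filter_params("myjob"))`, after which you can alarm on the resulting metric.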

Summary

Because the Driver and Executor logs are now separated and emitted in real time, they should help with debugging during development and with monitoring in production. You can also hook the real-time logs with CloudWatch Logs metric filters. For example, when a job that has been running for more than five hours dies from running out of memory, this information is essential for judging how far it got, and whether to add DPUs, scale up the DPU worker type, or rethink the job's design.

Appendix

The logs below are from a job that reads a table and converts it to Parquet.

Sample Driver Log

19/06/15 06:43:08 INFO ApplicationMaster: Preparing Local resources
19/06/15 06:43:09 INFO ApplicationMaster: ApplicationAttemptId: appattempt_1560580476388_0001_000001
19/06/15 06:43:09 INFO ApplicationMaster: Starting the user application in a separate Thread
19/06/15 06:43:09 INFO ApplicationMaster: Waiting for spark context initialization...
19/06/15 06:43:10 INFO Utils: Successfully started service 'sparkDriver' on port 40183.
19/06/15 06:43:11 WARN Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
19/06/15 06:43:12 WARN Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
19/06/15 06:43:12 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
19/06/15 06:43:12 INFO ApplicationMaster: 
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> ./*:/usr/lib/hadoop-lzo/lib/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/glue/etl/jars/aws-glue-datacatalog-spark-client-1.8.0-SNAPSHOT.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>$HADOOP_CONF_DIR<CPS>$HADOOP_COMMON_HOME/*<CPS>$HADOOP_COMMON_HOME/lib/*<CPS>$HADOOP_HDFS_HOME/*<CPS>$HADOOP_HDFS_HOME/lib/*<CPS>$HADOOP_MAPRED_HOME/*<CPS>$HADOOP_MAPRED_HOME/lib/*<CPS>$HADOOP_YARN_HOME/*<CPS>$HADOOP_YARN_HOME/lib/*<CPS>/usr/lib/hadoop-lzo/lib/*<CPS>/usr/share/aws/emr/emrfs/conf<CPS>/usr/share/aws/emr/emrfs/lib/*<CPS>/usr/share/aws/emr/emrfs/auxlib/*<CPS>/usr/share/aws/emr/lib/*<CPS>/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar<CPS>/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar<CPS>/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar<CPS>/usr/share/aws/emr/cloudwatch-sink/lib/*<CPS>/usr/share/aws/aws-java-sdk/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>/usr/lib/hadoop-lzo/lib/*<CPS>/usr/share/aws/emr/emrfs/conf<CPS>/usr/share/aws/emr/emrfs/lib/*<CPS>/usr/share/aws/emr/emrfs/auxlib/*<CPS>/usr/share/aws/emr/lib/*<CPS>/usr/share/aws/emr/ddb/lib/emr-ddb-hadoop.jar<CPS>/usr/share/aws/emr/goodies/lib/emr-hadoop-goodies.jar<CPS>/usr/share/aws/emr/kinesis/lib/emr-kinesis-hadoop.jar<CPS>/usr/share/aws/emr/cloudwatch-sink/lib/*<CPS>/usr/share/aws/aws-java-sdk/*
SPARK_YARN_STAGING_DIR -> *********(redacted)
SPARK_USER -> *********(redacted)
SPARK_YARN_MODE -> true
PYTHONPATH -> {{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.4-src.zip<CPS>{{PWD}}/PyGlue.zip

command:
LD_LIBRARY_PATH="/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:$LD_LIBRARY_PATH" \ 
{{JAVA_HOME}}/bin/java \ 
-server \ 
-Xmx5120m \ 
'-XX:+UseConcMarkSweepGC' \ 
'-XX:CMSInitiatingOccupancyFraction=70' \ 
'-XX:MaxHeapFreeRatio=70' \ 
'-XX:+CMSClassUnloadingEnabled' \ 
'-XX:OnOutOfMemoryError=kill -9 %p' \ 
'-XX:+UseCompressedOops' \ 
'-Djavax.net.ssl.trustStore=ExternalAndAWSTrustStore.jks' \ 
'-Djavax.net.ssl.trustStoreType=JKS' \ 
'-Djavax.net.ssl.trustStorePassword=amazon' \ 
'-DRDS_ROOT_CERT_PATH=rds-combined-ca-bundle.pem' \ 
'-DREDSHIFT_ROOT_CERT_PATH=redshift-ssl-ca-cert.pem' \ 
'-DRDS_TRUSTSTORE_URL=file:RDSTrustStore.jks' \ 
'-Dlog4j.configuration=log4j' \ 
-Djava.io.tmpdir={{PWD}}/tmp \ 
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \ 
org.apache.spark.executor.CoarseGrainedExecutorBackend \ 
--driver-url \ 
spark://CoarseGrainedScheduler@172.31.0.37:40183 \ 
--executor-id \ 
<executorId> \ 
--hostname \ 
<hostname> \ 
--cores \ 
4 \ 
--app-id \ 
application_1560580476388_0001 \ 
--user-class-path \ 
file:$PWD/__app__.jar \ 
--user-class-path \ 
file:$PWD/glue-assembly.jar \ 
--user-class-path \ 
file:$PWD/glueml-assembly.jar \ 
1><LOG_DIR>/stdout \ 
2><LOG_DIR>/stderr

resources:
rds-combined-ca-bundle.pem -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/rds-combined-ca-bundle.pem" } size: 31848 timestamp: 1560580981641 type: FILE visibility: PRIVATE
py4j-0.10.4-src.zip -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/py4j-0.10.4-src.zip" } size: 74096 timestamp: 1560580982168 type: FILE visibility: PRIVATE
glue-assembly.jar -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/glue-assembly.jar" } size: 428309025 timestamp: 1560580980732 type: FILE visibility: PRIVATE
glue-override.conf -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/glue-override.conf" } size: 279 timestamp: 1560580981610 type: FILE visibility: PRIVATE
log4j -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/log4j.properties" } size: 6654 timestamp: 1560580982115 type: FILE visibility: PRIVATE
ExternalAndAWSTrustStore.jks -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/ExternalAndAWSTrustStore.jks" } size: 118406 timestamp: 1560580981627 type: FILE visibility: PRIVATE
__spark_conf__ -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/__spark_conf__.zip" } size: 9902 timestamp: 1560580982215 type: ARCHIVE visibility: PRIVATE
glueml-assembly.jar -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/glueml-assembly.jar" } size: 46708803 timestamp: 1560580981577 type: FILE visibility: PRIVATE
RDSTrustStore.jks -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/RDSTrustStore.jks" } size: 19135 timestamp: 1560580981671 type: FILE visibility: PRIVATE
redshift-ssl-ca-cert.pem -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/redshift-ssl-ca-cert.pem" } size: 8621 timestamp: 1560580981656 type: FILE visibility: PRIVATE
pyspark.zip -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/pyspark.zip" } size: 482687 timestamp: 1560580982152 type: FILE visibility: PRIVATE
PyGlue.zip -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/PyGlue.zip" } size: 110536 timestamp: 1560580982184 type: FILE visibility: PRIVATE
__spark_libs__ -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/__spark_libs__6825946872772345135.zip" } size: 220406929 timestamp: 1560580976626 type: ARCHIVE visibility: PRIVATE
script_2019-06-15-06-42-39.py -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/script_2019-06-15-06-42-39.py" } size: 2573 timestamp: 1560580981699 type: FILE visibility: PRIVATE
glue-default.conf -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/glue-default.conf" } size: 382 timestamp: 1560580981594 type: FILE visibility: PRIVATE
image-creation-time -> resource { scheme: "hdfs" host: "ip-172-31-10-217.ap-northeast-1.compute.internal" port: 8020 file: "/user/root/.sparkStaging/application_1560580476388_0001/image-creation-time" } size: 11 timestamp: 1560580981685 type: FILE visibility: PRIVATE

===============================================================================
19/06/15 06:43:13 WARN Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
19/06/15 06:43:13 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
19/06/15 06:43:18 INFO GlueContext: GlueMetrics configured and enabled
19/06/15 06:43:19 INFO GlueContext: classification csv
19/06/15 06:43:19 INFO GlueContext: location s3://cm-user/customer/
19/06/15 06:43:22 INFO MultipartUploadOutputStream: close closed:false s3://aws-glue-temporary-123456789012-ap-northeast-1/admin/partitionlisting/customer/jr_f0405dc86c4c0fe5b53cc30a663c082d650f2a7f6844ba9f2fd8966a332fc5e2/datasource0.input-files.json
19/06/15 06:43:24 INFO DAGScheduler: Got job 0 (save at DataSink.scala:126) with 20 output partitions
19/06/15 06:43:26 WARN ServletHandler: Error for /api/v1/applications/application_1560580476388_0001
19/06/15 06:45:04 INFO DAGScheduler: Job 0 finished: save at DataSink.scala:126, took 99.705935 s
19/06/15 06:45:04 INFO ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0

Sample Executor Log

19/06/15 06:43:17 INFO Executor: Starting executor ID 1 on host ip-172-31-0-37.ap-northeast-1.compute.internal
19/06/15 06:43:25 INFO NewHadoopRDD: Input split: s3://cm-user/customer/customer0002_part_00.gz:0+105338147
19/06/15 06:44:21 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064413_0001_m_000000_0/part-00000-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:21 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064413_0001_m_000003_0/part-00003-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:21 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064413_0001_m_000002_0/part-00002-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:22 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064413_0001_m_000001_0/part-00001-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:22 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064413_0001_m_000000_0' to s3://cm-user/datalake/customer
19/06/15 06:44:22 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064413_0001_m_000002_0' to s3://cm-user/datalake/customer
19/06/15 06:44:22 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064413_0001_m_000001_0' to s3://cm-user/datalake/customer
19/06/15 06:44:23 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064413_0001_m_000003_0' to s3://cm-user/datalake/customer
19/06/15 06:44:29 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064422_0001_m_000004_0/part-00004-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:30 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064422_0001_m_000004_0' to s3://cm-user/datalake/customer
19/06/15 06:44:31 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064424_0001_m_000006_0/part-00006-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:31 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064424_0001_m_000005_0/part-00005-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:31 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064424_0001_m_000007_0/part-00007-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:31 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064424_0001_m_000006_0' to s3://cm-user/datalake/customer
19/06/15 06:44:31 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064424_0001_m_000007_0' to s3://cm-user/datalake/customer
19/06/15 06:44:31 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064424_0001_m_000005_0' to s3://cm-user/datalake/customer
19/06/15 06:44:37 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064430_0001_m_000008_0/part-00008-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:38 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064430_0001_m_000008_0' to s3://cm-user/datalake/customer
19/06/15 06:44:39 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064431_0001_m_000010_0/part-00010-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:39 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064431_0001_m_000009_0/part-00009-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:39 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064432_0001_m_000011_0/part-00011-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:39 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064431_0001_m_000010_0' to s3://cm-user/datalake/customer
19/06/15 06:44:40 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064431_0001_m_000009_0' to s3://cm-user/datalake/customer
19/06/15 06:44:40 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064432_0001_m_000011_0' to s3://cm-user/datalake/customer
19/06/15 06:44:45 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064438_0001_m_000012_0/part-00012-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:46 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064438_0001_m_000012_0' to s3://cm-user/datalake/customer
19/06/15 06:44:47 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064439_0001_m_000013_0/part-00013-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:47 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064440_0001_m_000014_0/part-00014-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:47 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064440_0001_m_000015_0/part-00015-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:47 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064439_0001_m_000013_0' to s3://cm-user/datalake/customer
19/06/15 06:44:48 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064440_0001_m_000014_0' to s3://cm-user/datalake/customer
19/06/15 06:44:48 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064440_0001_m_000015_0' to s3://cm-user/datalake/customer
19/06/15 06:44:53 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064446_0001_m_000016_0/part-00016-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:54 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064447_0001_m_000017_0/part-00017-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:55 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064448_0001_m_000019_0/part-00019-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:55 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064447_0001_m_000017_0' to s3://cm-user/datalake/customer
19/06/15 06:44:55 INFO MultipartUploadOutputStream: close closed:false s3://cm-user/datalake/customer/_temporary/0/_temporary/attempt_20190615064448_0001_m_000018_0/part-00018-cc6d2e58-28f5-4ea0-9e46-75cc588b51b6-c000.snappy.parquet
19/06/15 06:44:55 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064448_0001_m_000019_0' to s3://cm-user/datalake/customer
19/06/15 06:44:55 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064448_0001_m_000018_0' to s3://cm-user/datalake/customer
19/06/15 06:45:04 INFO FileOutputCommitter: Saved output of task 'attempt_20190615064446_0001_m_000016_0' to s3://cm-user/datalake/customer