こんにちは!AWSエンジニアのおつまみです!
先日fluentdでEC2のログをS3に出力する際に困った内容をシェアします!
困っていた内容
- fluentdの起動で
Job for td-agent.service failed because the control process exited with error code.
エラーが発生。 - fluentdが起動できず、S3にログが出力できなかった。
いきなり結論
td-agent --dry-run -c /etc/td-agent/td-agent.conf
コマンドで設定ファイルの構文エラーを確認しましょう。
背景
fluentdインストール後、設定ファイル/etc/td-agent/td-agent.conf
を以下のように設定しました。
# read apache logs
<source>
@type tail
<parse>
@type none
</parse>
path /var/log/httpd/access_log
pos_file /var/log/td-agent/tmp/access.log.pos
tag os.apache.access
</source>
# read secure logs
<source>
@type tail
<parse>
@type none
</parse>
path /var/log/secure
pos_file /var/log/td-agent/tmp/access.log.pos
tag os.secure
</source>
# send to S3
<match os.**>
@type s3
s3_bucket yourbucketname
path ${tag}/
s3_region ap-northeast-1
time_slice_format %Y/%m/%d
s3_object_key_format %{path}%{time_slice}/%{index}.%{file_extension}
<format>
@type single_value
</format>
<buffer tag,time>
@type file
path /var/log/td-agent/s3/${tag}
timekey 30
timekey_wait 30
timekey_use_utc true
</buffer>
</match>
その後、systemctl start td-agent.service
コマンドでサービスを起動しようとすると、以下のエラーが発生しました。
$ systemctl status td-agent.service
Job for td-agent.service failed because the control process exited with error code.
See "systemctl status td-agent.service" and "journalctl -xeu td-agent.service" for details.
エラーに表示されている通りのコマンドを実行しましたが、詳細な原因がわかりませんでした。
$ systemctl status td-agent.service
× td-agent.service - td-agent: Fluentd based data collector for Treasure Data
Loaded: loaded (/usr/lib/systemd/system/td-agent.service; enabled; preset: disabled)
Active: failed (Result: exit-code) since Mon 2023-08-28 11:28:59 UTC; 35s ago
Duration: 2h 5min 57.904s
Docs: https://docs.treasuredata.com/display/public/PD/About+Treasure+Data%27s+Server-Side+Agent
Process: 1935 ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=1/FAILURE)
CPU: 1.201s
Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Control process exited, code=exited, status=1/FAILURE
Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Failed with result 'exit-code'.
Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Scheduled restart job, restart counter is at 5.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Stopped td-agent: Fluentd based data collector for Treasure Data.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Start request repeated too quickly.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Failed with result 'exit-code'.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
$ journalctl -xeu td-agent.service
░░ Subject: A start job for unit td-agent.service has failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A start job for unit td-agent.service has finished with a failure.
░░
░░ The job identifier is 3441 and the job result is failed.
Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ The unit td-agent.service completed and consumed the indicated resources.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Scheduled restart job, restart counter is at 5.
░░ Subject: Automatic restarting of a unit has been scheduled
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ Automatic restarting of the unit td-agent.service has been scheduled, as the result for
░░ the configured Restart= setting for the unit.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Stopped td-agent: Fluentd based data collector for Treasure Data.
░░ Subject: A stop job for unit td-agent.service has finished
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A stop job for unit td-agent.service has finished.
░░
░░ The job identifier is 3526 and the job result is done.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ The unit td-agent.service completed and consumed the indicated resources.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Start request repeated too quickly.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ The unit td-agent.service has entered the 'failed' state with result 'exit-code'.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
░░ Subject: A start job for unit td-agent.service has failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░
░░ A start job for unit td-agent.service has finished with a failure.
░░
░░ The job identifier is 3526 and the job result is failed.
対処方法
冒頭にも記載したとおり、fluentd --dry-run -c /etc/td-agent/td-agent.conf
コマンドで設定ファイルの構文エラーを確認します。
2023-08-28 12:05:21 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-08-28 12:05:21 +0000 [info]: parsing config file is succeeded path="/etc/td-agent/td-agent.conf"
2023-08-28 12:05:21 +0000 [info]: gem 'fluentd' version '1.16.1'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-calyptia-monitoring' version '0.1.3'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.3.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-flowcounter-simple' version '0.1.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-kafka' version '0.19.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-metrics-cmetrics' version '0.1.2'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.1.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.3'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-prometheus_pushgateway' version '0.1.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-s3' version '1.7.2'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-sd-dns' version '0.1.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-td' version '1.2.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-utmpx' version '0.5.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'
2023-08-28 12:05:21 +0000 [info]: starting fluentd-1.16.1 as dry run mode ruby="3.1.4"
2023-08-28 12:05:22 +0000 [error]: config error file="/etc/td-agent/td-agent.conf" error_class=Fluent::ConfigError error="Other 'in_tail' plugin already use same pos_file path: plugin_id = object:cd0, pos_file path = /var/log/td-agent/tmp/access.log.pos"
すると、一番下にerror
メッセージが表示されていました。
内容を確認すると、pos_file
で指定したpath
が重複していることが原因でした。
そのため、設定ファイルを修正します。
# read apache logs
<source>
@type tail
<parse>
@type none
</parse>
path /var/log/httpd/access_log
pos_file /var/log/td-agent/tmp/access.log.pos
tag os.apache.access
</source>
# read secure logs
<source>
@type tail
<parse>
@type none
</parse>
path /var/log/secure
pos_file /var/log/td-agent/tmp/secure.log.pos
tag os.secure
</source>
# send to S3
<match os.**>
@type s3
s3_bucket yourbucketname
path ${tag}/
s3_region ap-northeast-1
time_slice_format %Y/%m/%d
s3_object_key_format %{path}%{time_slice}/%{index}.%{file_extension}
<format>
@type single_value
</format>
<buffer tag,time>
@type file
path /var/log/td-agent/s3/${tag}
timekey 30
timekey_wait 30
timekey_use_utc true
</buffer>
</match>
再度、fluentd --dry-run -c /etc/td-agent/td-agent.conf
コマンドで設定ファイルの構文エラーを確認します。
2023-08-28 12:08:11 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-08-28 12:08:11 +0000 [info]: parsing config file is succeeded path="/etc/td-agent/td-agent.conf"
2023-08-28 12:08:11 +0000 [info]: gem 'fluentd' version '1.16.1'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-calyptia-monitoring' version '0.1.3'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.3.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-flowcounter-simple' version '0.1.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-kafka' version '0.19.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-metrics-cmetrics' version '0.1.2'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.1.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.3'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-prometheus_pushgateway' version '0.1.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-s3' version '1.7.2'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-sd-dns' version '0.1.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-td' version '1.2.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-utmpx' version '0.5.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'
2023-08-28 12:08:11 +0000 [info]: starting fluentd-1.16.1 as dry run mode ruby="3.1.4"
2023-08-28 12:08:12 +0000 [info]: using configuration file: <ROOT>
<source>
@type tail
path "/var/log/httpd/access_log"
pos_file "/var/log/td-agent/tmp/access.log.pos"
tag "os.apache.access"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<source>
@type tail
path "/var/log/secure"
pos_file "/var/log/td-agent/tmp/secure.log.pos"
tag "os.secure"
<parse>
@type "none"
unmatched_lines
</parse>
</source>
<match os.**>
@type s3
s3_bucket "td-agent-test-o2mami-bucket"
path "${tag}/"
s3_region "ap-northeast-1"
time_slice_format %Y/%m/%d
s3_object_key_format "%{path}%{time_slice}/%{index}.%{file_extension}"
<format>
@type "single_value"
</format>
<buffer tag,time>
@type "file"
path "/var/log/td-agent/s3/${tag}"
timekey 30
timekey_wait 30
timekey_use_utc true
</buffer>
</match>
</ROOT>
2023-08-28 12:08:12 +0000 [info]: finished dry run mode
error
メッセージが表示されず、設定ファイルが正しく修正されたようでした。
この状態でサービスを起動してみます。
$ systemctl start td-agent
$ systemctl status td-agent.service
● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
Loaded: loaded (/usr/lib/systemd/system/td-agent.service; enabled; preset: disabled)
Active: active (running) since Mon 2023-08-28 12:11:44 UTC; 6s ago
Docs: https://docs.treasuredata.com/display/public/PD/About+Treasure+Data%27s+Server-Side+Agent
Process: 2065 ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 2071 (fluentd)
Tasks: 8 (limit: 4421)
Memory: 109.6M
CPU: 2.481s
CGroup: /system.slice/td-agent.service
├─2071 /opt/td-agent/bin/ruby /opt/td-agent/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid
└─2074 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /opt/td-agent/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid --un>
Aug 28 12:11:43 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Starting td-agent: Fluentd based data collector for Treasure Data...
Aug 28 12:11:44 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Started td-agent: Fluentd based data collector for Treasure Data.
無事、active
状態となり、S3にログが出力されました!
さいごに
今回はfluentdの起動に失敗した時の対処方法をお伝えしました。
最後までお読みいただきありがとうございました!
どなたかのお役に立てれば幸いです。
以上、おつまみ(@AWS11077)でした!