[小ネタ]fluentdの起動でJob for td-agent.service failed because the control process exited with error code.エラーが発生した時の対処方法

td-agent --dry-run -c /etc/td-agent/td-agent.confコマンドでエラー原因を特定しましょう。
2023.08.28

こんにちは!AWSエンジニアのおつまみです!

先日fluentdでEC2のログをS3に出力する際に困った内容をシェアします!

困っていた内容

  • fluentdの起動でJob for td-agent.service failed because the control process exited with error code.エラーが発生。
  • fluentdが起動できず、S3にログが出力できなかった。

いきなり結論

  • td-agent --dry-run -c /etc/td-agent/td-agent.confコマンドで設定ファイルの構文エラーを確認しましょう。

背景

fluentdインストール後、設定ファイル/etc/td-agent/td-agent.confを以下のように設定しました。

# read apache logs
<source>
  @type tail
  <parse>
    @type none
  </parse>
  path /var/log/httpd/access_log
  pos_file /var/log/td-agent/tmp/access.log.pos
  tag os.apache.access
</source>

# read secure logs
<source>
  @type tail
  <parse>
    @type none
  </parse>
  path /var/log/secure
  pos_file /var/log/td-agent/tmp/access.log.pos
  tag os.secure
</source>

# send to S3
<match os.**>
  @type s3
  s3_bucket yourbucketname
  path ${tag}/
  s3_region ap-northeast-1
  time_slice_format %Y/%m/%d
  s3_object_key_format %{path}%{time_slice}/%{index}.%{file_extension}
  <format>
    @type single_value
  </format>
  <buffer tag,time>
    @type file
    path /var/log/td-agent/s3/${tag}
    timekey 30
    timekey_wait 30
    timekey_use_utc true
  </buffer>
</match>

その後、systemctl start td-agent.serviceコマンドでサービスを起動しようとすると、以下のエラーが発生しました。

$ systemctl status td-agent.service
Job for td-agent.service failed because the control process exited with error code.
See "systemctl status td-agent.service" and "journalctl -xeu td-agent.service" for details.

エラーに表示されている通りのコマンドを実行しましたが、詳細な原因がわかりませんでした。

$ systemctl status td-agent.service
× td-agent.service - td-agent: Fluentd based data collector for Treasure Data
     Loaded: loaded (/usr/lib/systemd/system/td-agent.service; enabled; preset: disabled)
     Active: failed (Result: exit-code) since Mon 2023-08-28 11:28:59 UTC; 35s ago
   Duration: 2h 5min 57.904s
       Docs: https://docs.treasuredata.com/display/public/PD/About+Treasure+Data%27s+Server-Side+Agent
    Process: 1935 ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=1/FAILURE)
        CPU: 1.201s

Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Control process exited, code=exited, status=1/FAILURE
Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Failed with result 'exit-code'.
Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Scheduled restart job, restart counter is at 5.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Stopped td-agent: Fluentd based data collector for Treasure Data.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Start request repeated too quickly.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Failed with result 'exit-code'.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
$ journalctl -xeu td-agent.service
░░ Subject: A start job for unit td-agent.service has failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░ 
░░ A start job for unit td-agent.service has finished with a failure.
░░ 
░░ The job identifier is 3441 and the job result is failed.
Aug 28 11:28:58 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░ 
░░ The unit td-agent.service completed and consumed the indicated resources.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Scheduled restart job, restart counter is at 5.
░░ Subject: Automatic restarting of a unit has been scheduled
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░ 
░░ Automatic restarting of the unit td-agent.service has been scheduled, as the result for
░░ the configured Restart= setting for the unit.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Stopped td-agent: Fluentd based data collector for Treasure Data.
░░ Subject: A stop job for unit td-agent.service has finished
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░ 
░░ A stop job for unit td-agent.service has finished.
░░ 
░░ The job identifier is 3526 and the job result is done.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Consumed 1.201s CPU time.
░░ Subject: Resources consumed by unit runtime
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░ 
░░ The unit td-agent.service completed and consumed the indicated resources.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Start request repeated too quickly.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: td-agent.service: Failed with result 'exit-code'.
░░ Subject: Unit failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░ 
░░ The unit td-agent.service has entered the 'failed' state with result 'exit-code'.
Aug 28 11:28:59 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Failed to start td-agent: Fluentd based data collector for Treasure Data.
░░ Subject: A start job for unit td-agent.service has failed
░░ Defined-By: systemd
░░ Support: https://access.redhat.com/support
░░ 
░░ A start job for unit td-agent.service has finished with a failure.
░░ 
░░ The job identifier is 3526 and the job result is failed.

対処方法

冒頭にも記載したとおり、fluentd --dry-run -c /etc/td-agent/td-agent.confコマンドで設定ファイルの構文エラーを確認します。

2023-08-28 12:05:21 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-08-28 12:05:21 +0000 [info]: parsing config file is succeeded path="/etc/td-agent/td-agent.conf"
2023-08-28 12:05:21 +0000 [info]: gem 'fluentd' version '1.16.1'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-calyptia-monitoring' version '0.1.3'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.3.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-flowcounter-simple' version '0.1.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-kafka' version '0.19.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-metrics-cmetrics' version '0.1.2'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.1.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.3'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-prometheus_pushgateway' version '0.1.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-s3' version '1.7.2'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-sd-dns' version '0.1.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-td' version '1.2.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-utmpx' version '0.5.0'
2023-08-28 12:05:21 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'
2023-08-28 12:05:21 +0000 [info]: starting fluentd-1.16.1 as dry run mode ruby="3.1.4"
2023-08-28 12:05:22 +0000 [error]: config error file="/etc/td-agent/td-agent.conf" error_class=Fluent::ConfigError error="Other 'in_tail' plugin already use same pos_file path: plugin_id = object:cd0, pos_file path = /var/log/td-agent/tmp/access.log.pos"

すると、一番下にerrorメッセージが表示されていました。 内容を確認すると、pos_fileで指定したpathが重複していることが原因でした。

そのため、設定ファイルを修正します。

# read apache logs
<source>
  @type tail
  <parse>
    @type none
  </parse>
  path /var/log/httpd/access_log
  pos_file /var/log/td-agent/tmp/access.log.pos
  tag os.apache.access
</source>

# read secure logs
<source>
  @type tail
  <parse>
    @type none
  </parse>
  path /var/log/secure
  pos_file /var/log/td-agent/tmp/secure.log.pos
  tag os.secure
</source>

# send to S3
<match os.**>
  @type s3
  s3_bucket yourbucketname
  path ${tag}/
  s3_region ap-northeast-1
  time_slice_format %Y/%m/%d
  s3_object_key_format %{path}%{time_slice}/%{index}.%{file_extension}
  <format>
    @type single_value
  </format>
  <buffer tag,time>
    @type file
    path /var/log/td-agent/s3/${tag}
    timekey 30
    timekey_wait 30
    timekey_use_utc true
  </buffer>
</match>

再度、fluentd --dry-run -c /etc/td-agent/td-agent.confコマンドで設定ファイルの構文エラーを確認します。

2023-08-28 12:08:11 +0000 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2023-08-28 12:08:11 +0000 [info]: parsing config file is succeeded path="/etc/td-agent/td-agent.conf"
2023-08-28 12:08:11 +0000 [info]: gem 'fluentd' version '1.16.1'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-calyptia-monitoring' version '0.1.3'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-elasticsearch' version '5.3.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-flowcounter-simple' version '0.1.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-kafka' version '0.19.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-metrics-cmetrics' version '0.1.2'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-opensearch' version '1.1.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-prometheus' version '2.0.3'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-prometheus_pushgateway' version '0.1.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-record-modifier' version '2.1.1'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-s3' version '1.7.2'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-sd-dns' version '0.1.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-systemd' version '1.0.5'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-td' version '1.2.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-utmpx' version '0.5.0'
2023-08-28 12:08:11 +0000 [info]: gem 'fluent-plugin-webhdfs' version '1.5.0'
2023-08-28 12:08:11 +0000 [info]: starting fluentd-1.16.1 as dry run mode ruby="3.1.4"
2023-08-28 12:08:12 +0000 [info]: using configuration file: <ROOT>
  <source>
    @type tail
    path "/var/log/httpd/access_log"
    pos_file "/var/log/td-agent/tmp/access.log.pos"
    tag "os.apache.access"
    <parse>
      @type "none"
      unmatched_lines 
    </parse>
  </source>
  <source>
    @type tail
    path "/var/log/secure"
    pos_file "/var/log/td-agent/tmp/secure.log.pos"
    tag "os.secure"
    <parse>
      @type "none"
      unmatched_lines 
    </parse>
  </source>
  <match os.**>
    @type s3
    s3_bucket "td-agent-test-o2mami-bucket"
    path "${tag}/"
    s3_region "ap-northeast-1"
    time_slice_format %Y/%m/%d
    s3_object_key_format "%{path}%{time_slice}/%{index}.%{file_extension}"
    <format>
      @type "single_value"
    </format>
    <buffer tag,time>
      @type "file"
      path "/var/log/td-agent/s3/${tag}"
      timekey 30
      timekey_wait 30
      timekey_use_utc true
    </buffer>
  </match>
</ROOT>
2023-08-28 12:08:12 +0000 [info]: finished dry run mode

errorメッセージが表示されず、設定ファイルが正しく修正されたようでした。

この状態でサービスを起動してみます。

$  systemctl start td-agent
$  systemctl status td-agent.service
● td-agent.service - td-agent: Fluentd based data collector for Treasure Data
     Loaded: loaded (/usr/lib/systemd/system/td-agent.service; enabled; preset: disabled)
     Active: active (running) since Mon 2023-08-28 12:11:44 UTC; 6s ago
       Docs: https://docs.treasuredata.com/display/public/PD/About+Treasure+Data%27s+Server-Side+Agent
    Process: 2065 ExecStart=/opt/td-agent/bin/fluentd --log $TD_AGENT_LOG_FILE --daemon /var/run/td-agent/td-agent.pid $TD_AGENT_OPTIONS (code=exited, status=0/SUCCESS)
   Main PID: 2071 (fluentd)
      Tasks: 8 (limit: 4421)
     Memory: 109.6M
        CPU: 2.481s
     CGroup: /system.slice/td-agent.service
             ├─2071 /opt/td-agent/bin/ruby /opt/td-agent/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid
             └─2074 /opt/td-agent/bin/ruby -Eascii-8bit:ascii-8bit /opt/td-agent/bin/fluentd --log /var/log/td-agent/td-agent.log --daemon /var/run/td-agent/td-agent.pid --un>

Aug 28 12:11:43 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Starting td-agent: Fluentd based data collector for Treasure Data...
Aug 28 12:11:44 ip-10-234-112-8.ap-northeast-1.compute.internal systemd[1]: Started td-agent: Fluentd based data collector for Treasure Data.

無事、active状態となり、S3にログが出力されました!

さいごに

今回はfluentdの起動に失敗した時の対処方法をお伝えしました。

最後までお読みいただきありがとうございました!
どなたかのお役に立てれば幸いです。

以上、おつまみ(@AWS11077)でした!

参考資料

fluentdでOSのいろんなログをまとめてS3に出力する設定考えてみた | DevelopersIO