Setting up Fluentd on Amazon Linux: A step-by-step guide

2023.04.14

Hello, my name is Aayush. I recently discovered Fluentd, a service that collects log data and sends it to a variety of destinations, so in this post I will walk through the steps to set it up.

What is Fluentd?

Fluentd is a free and open-source data gathering and forwarding programme that can be used for log processing, metric collection, and other data aggregation operations. It is intended to be highly scalable and adaptable, capable of processing massive amounts of data from numerous sources and routing it to various destinations such as Elasticsearch, Hadoop, or Amazon S3. Fluentd supports over 500 plugins and can be modified to include custom plugins for various data sources and destinations. It is often used to gather and manage log data created by multiple containers and microservices in cloud-native systems such as Kubernetes and Docker.

Here are the steps to configure Fluentd on Amazon Linux:

Install Fluentd:

The first step is to download and install Fluentd on your computer or server. Fluentd is compatible with a number of operating systems, including Linux, macOS, and Windows. Fluentd can be installed via package managers such as yum, apt-get, or Homebrew, or it can be downloaded and installed manually from the Fluentd website.

Update the system packages:

$ sudo yum update

Install the required dependencies:

$ sudo yum install -y ruby-devel gcc gcc-c++ make

Install Fluentd using RubyGems:

$ sudo gem install fluentd
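To confirm that the gem installed correctly, you can check that the fluentd command is on your PATH and print its version. This is only a minimal sketch; the fluentd_status variable is an illustrative name and not part of Fluentd.

```shell
# Check whether the fluentd executable installed by the gem is on PATH.
if command -v fluentd >/dev/null 2>&1; then
  fluentd_status="installed"
  # Print the installed Fluentd version.
  fluentd --version
else
  fluentd_status="missing"
  echo "fluentd was not found on PATH" >&2
fi
echo "fluentd: $fluentd_status"
```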

td-agent

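td-agent is the packaged distribution of Fluentd, and it ships td-agent-gem for managing plugin gems. If you installed td-agent, the following command installs the plugin for connecting to Amazon S3. With the RubyGems installation shown above, use fluent-gem in place of /usr/sbin/td-agent-gem.

$ sudo /usr/sbin/td-agent-gem install fluent-plugin-s3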

Configure Fluentd:

After installing Fluentd, you must create a configuration file that describes the input, filter, and output plugins you wish to use. The configuration file is written in Fluentd's own directive-based syntax, with XML-like sections such as <source>, <filter>, and <match>, and can be saved in the /etc/fluentd directory or elsewhere. The file can become complicated depending on the number and complexity of the plugins used.

Below are the steps:

$ sudo mkdir /etc/fluentd
$ sudo touch /etc/fluentd/fluent.conf

Set up input plugins: Fluentd input plugins are used to collect data from different sources, such as log files, system logs, or network protocols like syslog or TCP. You can configure the input plugins in your Fluentd configuration file to read data from your desired source.

Set up output plugins: Fluentd output plugins are used to forward data to various destinations, such as Elasticsearch, Hadoop, or cloud storage services like Amazon S3. You can configure the output plugins in your Fluentd configuration file to send data to your desired destination.

Set up filtering plugins: Fluentd filtering plugins are used to manipulate and transform data before it is forwarded to the output plugins. You can configure the filtering plugins in your Fluentd configuration file to apply filters to the data collected from the input plugins.
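As a minimal sketch of how the three plugin types fit together, the following configuration accepts events over the forward protocol, adds a field to each record, and prints the result to standard output. It uses only plugins bundled with Fluentd; the tag pattern test.** and the hostname field are illustrative choices, not something Fluentd requires.

```
<source>
  @type forward
  port 24224
</source>

<filter test.**>
  @type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

<match test.**>
  @type stdout
</match>
```

With this configuration running, a command such as echo '{"message":"hello"}' | fluent-cat test.hello should print the event, including the added hostname field, to Fluentd's standard output.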

For example, to collect logs from the Apache web server and forward them to Amazon S3, you can use the following configuration:

<source>
  @type tail
  path /var/log/httpd/access_log
  pos_file /var/log/td-agent/httpd-access.pos
  tag apache.access
  # Keep each raw line in the "message" field; the parser filter below extracts the fields.
  format none
</source>

<filter apache.access>
  @type parser
  key_name message
  format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>[^ ]*) (?<path>[^ ]*) [^"]*" (?<code>[^ ]*) (?<size>[^ ]*) "([^"]*)" "([^"]*)"$/
  time_key time
  time_format %d/%b/%Y:%H:%M:%S %z
  reserve_data true
</filter>

<match apache.access>
  @type s3
  aws_key_id YOUR_AWS_KEY_ID
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket YOUR_S3_BUCKET_NAME
  s3_region ap-northeast-1
  path logs/
  # if you want to use ${tag} or %Y/%m/%d/ like syntax in path / s3_object_key_format,
  # need to specify tag for ${tag} and time for %Y/%m/%d in <buffer> argument.
  <buffer tag,time>
    @type file
    path /var/log/fluent/s3
    timekey 3600 # 1 hour partition
    timekey_wait 10m
    timekey_use_utc true # use utc
    chunk_limit_size 256m
  </buffer>
</match>
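To get a feel for what the regular expression in the filter extracts, the sketch below applies an equivalent pattern to one sample line using sed. This is an approximation for illustration only: Fluentd evaluates the pattern with Ruby's regex engine and named groups, whereas this version uses sed's extended regular expressions with numbered groups. The sample log line is made up.

```shell
# A made-up access log line in Apache combined format.
line='192.168.0.1 - alice [14/Apr/2023:10:00:00 +0900] "GET /index.html HTTP/1.1" 200 1234 "-" "curl/7.88.1"'

# Same structure as the filter's pattern, with the named groups
# replaced by numbered groups (\1 = host, \2 = user, and so on).
parsed=$(printf '%s\n' "$line" | sed -E 's/^([^ ]*) [^ ]* ([^ ]*) \[([^]]*)] "([^ ]*) ([^ ]*) [^"]*" ([^ ]*) ([^ ]*) "[^"]*" "[^"]*"$/host=\1 user=\2 time=\3 method=\4 path=\5 code=\6 size=\7/')

printf '%s\n' "$parsed"
# prints: host=192.168.0.1 user=alice time=14/Apr/2023:10:00:00 +0900 method=GET path=/index.html code=200 size=1234
```

In the Fluentd configuration, the time value is parsed with time_format and used as the event timestamp, and the remaining named groups become fields of the record.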

Save and close the configuration file.
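Once the file is saved, it is worth checking the configuration before starting Fluentd. The sketch below assumes the RubyGems installation and the /etc/fluentd/fluent.conf path used in this post. The --dry-run option only checks the configuration and does not start collecting logs. The CONF and config_check variables are illustrative names for this example.

```shell
# Path of the configuration file created earlier (override by exporting CONF).
CONF="${CONF:-/etc/fluentd/fluent.conf}"

if ! command -v fluentd >/dev/null 2>&1; then
  config_check="skipped"
  echo "fluentd was not found on PATH; skipping the check" >&2
elif fluentd --dry-run -c "$CONF"; then
  # The configuration was loaded without errors.
  config_check="ok"
else
  config_check="failed"
fi
echo "config check: $config_check"
```

If the check passes, Fluentd can be started in the foreground with fluentd -c /etc/fluentd/fluent.conf. Reading /var/log/httpd/access_log and writing under /var/log/fluent typically requires root privileges, so you may need to run these commands with sudo.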

Conclusion

With Fluentd's flexibility and versatility, you can easily customize it to meet your specific data collection and forwarding needs. If you are looking for an open-source data collection tool, you should give Fluentd a try.

References

Fluentd documentation