A fully managed, agentless collector for Prometheus has been announced! #AWSreInvent

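Until now, collecting Prometheus metrics from EKS required users to deploy an agent themselves. With this update, a fully managed, agentless collector has been announced, which should reduce operational overhead.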

AWS re:Invent 2023

2023.12.04


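Are you using Amazon Managed Service for Prometheus to visualize metrics from your EKS environments? Until now, you had to install an agent to collect metrics for Prometheus, but with this update a fully managed, agentless collector has been announced.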

I'll cover an overview of the update and show how I actually set it up, so if you've been curious about this update, be sure to check it out!!

Wait, did something that makes life easier just land...!?

　 ( ﾟдﾟ)　ｶﾞﾀｯ
　 /　　 ヾ
＿_L| /￣￣￣/＿
　 ＼/　　　/

There are some limitations, though, YO

Overview of the update

Here is the official announcement.

Amazon Managed Service for Prometheus launches an agentless collector for Prometheus metrics from Amazon EKS

Amazon Managed Service for Prometheus collector, a fully-managed agentless collector customers can use to collect Prometheus metrics from their workloads running on Amazon EKS. Customers can now enable the discovery and collection of Prometheus metrics from their Amazon EKS applications and infrastructure through the EKS console or through an API call, without having to self-manage agents.

Previously, collecting Prometheus metrics from EKS workloads required installing a Prometheus agent separately. With this update, you can collect Prometheus metrics through a managed, agentless mechanism without managing that agent yourself.

For details, see the official documentation below.

AWS managed collectors - Amazon Managed Service for Prometheus

Available Regions

It is available in all Regions where Amazon Managed Service for Prometheus is available.

Main limitations

The agentless collector has the following limitations, as described in the official documentation below.

Scraper limitations

Region
- The EKS cluster, the Amazon Managed Service for Prometheus scraper, and the Amazon Managed Service for Prometheus workspace must all be in the same AWS Region
Account
- The EKS cluster, the Amazon Managed Service for Prometheus scraper, and the Amazon Managed Service for Prometheus workspace must all be in the same AWS account
Collectors
- You can have at most 10 Amazon Managed Service for Prometheus scrapers per Region per account
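Before creating anything, you can sanity-check the Region/account constraints just by comparing ARNs. The Python sketch below splits the EKS cluster ARN and the workspace ARN into their fields and reports which constraints are violated. The function names are hypothetical, it does not call AWS, and it assumes the scraper is created in the same Region and account as the cluster (i.e. you run create-scraper with those credentials and that Region).

```python
"""Pre-flight check for the scraper limitations listed above (sketch)."""
from typing import Dict, List

MAX_SCRAPERS_PER_REGION = 10  # per account, per Region, per the limitations above


def parse_arn(arn: str) -> Dict[str, str]:
    # ARN layout: arn:partition:service:region:account-id:resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn}")
    return {"service": parts[2], "region": parts[3], "account": parts[4]}


def check_scraper_constraints(cluster_arn: str, workspace_arn: str,
                              existing_scrapers: int) -> List[str]:
    """Return the violated constraints; an empty list means all checks passed.

    Assumes the scraper itself is created in the cluster's Region and account,
    so only the cluster and workspace ARNs need to be compared.
    """
    cluster, workspace = parse_arn(cluster_arn), parse_arn(workspace_arn)
    problems = []
    if cluster["region"] != workspace["region"]:
        problems.append("different Regions")
    if cluster["account"] != workspace["account"]:
        problems.append("different accounts")
    if existing_scrapers >= MAX_SCRAPERS_PER_REGION:
        problems.append("scraper limit reached")
    return problems
```

For example, a Tokyo cluster and a Tokyo workspace in the same account with one existing scraper returns an empty list; the scraper count can be taken from the aws amp list-scrapers output.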

Pricing

Amazon Managed Service for Prometheus Pricing

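As described on the official page above, Amazon Managed Service for Prometheus charges are based on ingested metric samples, metric storage, query samples processed, and collector hours. This can be hard to grasp if you're not used to it, so I recommend also reading the official documentation below, which explains how to optimize costs.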

Understand and optimize costs
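To get a feel for how the charges add up, here is a rough Python sketch that multiplies four usage dimensions (ingested samples, storage, query samples processed, collector hours) by unit prices. The unit prices are deliberately left as inputs (look them up on the pricing page for your Region), any tiering is ignored, and the 730-hour month (24 × 365 / 12) is my assumption. The point it illustrates: collector hours accrue for as long as a scraper exists, even if nothing is queried.

```python
"""Rough monthly cost estimate for Amazon Managed Service for Prometheus (sketch).

Unit prices are caller-supplied placeholders, not real AWS prices.
Tiered pricing is ignored: every unit is billed at one flat rate.
"""
from dataclasses import dataclass

HOURS_PER_MONTH = 24 * 365 / 12  # 730 hours in an average month


@dataclass
class UnitPrices:
    per_ingested_sample: float
    per_gb_month_storage: float
    per_query_sample: float
    per_collector_hour: float


def estimate_monthly_cost(prices: UnitPrices, ingested_samples: float,
                          storage_gb: float, query_samples: float,
                          collectors: int = 1) -> float:
    # A scraper accrues collector hours all month long while it exists.
    collector_hours = collectors * HOURS_PER_MONTH
    return (ingested_samples * prices.per_ingested_sample
            + storage_gb * prices.per_gb_month_storage
            + query_samples * prices.per_query_sample
            + collector_hours * prices.per_collector_hour)
```

Plugging in zero samples and zero storage still leaves the collector-hour term, which is exactly the "charged after deleting the workspace" situation covered next.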

If you are still charged for Prometheus after deleting your workspace

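As of May 13, 2024, these scrapers cannot be seen in the AWS Management Console. If Prometheus charges continue even after you have deleted the workspace, a scraper may still exist and keep incurring charges. Use the commands below to delete any remaining scrapers.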

Check whether any scrapers exist.

$ aws amp list-scrapers

If there are any, delete each one with the following command.

$ aws amp delete-scraper --scraper-id <scraper-id>

Deleting a scraper takes longer than you might expect (about 10 minutes in my case). While deletion is in progress, the list-scrapers output should show statusCode as DELETING, so just wait a while.

This is also covered in the official documentation below.

I deleted all my Amazon Managed Service for Prometheus workspaces, but I still seem to be getting charged. What might be happening?
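If several scrapers are left over, a small script can turn the list-scrapers JSON into the matching delete commands. This Python sketch assumes the output has the shape {"scrapers": [{"scraperId": ..., "status": {"statusCode": ...}}]}; the file names scrapers.json and cleanup.py are hypothetical, and the script only prints commands rather than running them.

```python
"""List leftover scrapers from `aws amp list-scrapers` output (sketch)."""
import json
import sys
from typing import List, Tuple


def scraper_states(list_output: str) -> List[Tuple[str, str]]:
    """Return (scraperId, statusCode) pairs; empty input yields an empty list."""
    data = json.loads(list_output) if list_output.strip() else {}
    return [(s["scraperId"], s.get("status", {}).get("statusCode", "UNKNOWN"))
            for s in data.get("scrapers", [])]


def delete_commands(states: List[Tuple[str, str]]) -> List[str]:
    # Scrapers already in DELETING need no second delete call.
    return [f"aws amp delete-scraper --scraper-id {scraper_id}"
            for scraper_id, code in states if code != "DELETING"]


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: aws amp list-scrapers > scrapers.json && python3 cleanup.py scrapers.json
    with open(sys.argv[1]) as f:
        states = scraper_states(f.read())
    for scraper_id, code in states:
        print(f"{scraper_id}\t{code}")
    print("\n".join(delete_commands(states)) or "nothing to delete")
```

Re-running it while you wait shows the remaining scrapers and their status, so you can confirm the DELETING ones eventually disappear.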

Trying out the agentless collector

Now let's actually set up and use the agentless collector. The tool versions are as follows.

$ aws --version
aws-cli/2.14.5 Python/3.11.6 Darwin/22.5.0 exe/x86_64 prompt/off
$ kubectl version --client
Client Version: v1.28.3-eks-e71965b
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
$ eksctl version
0.165.0

Creating a Prometheus workspace

First, create a new Prometheus workspace. Here, I create it with the alias test-managed-collector-workspace.

$ aws amp create-workspace --alias=test-managed-collector-workspace
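Later steps need the workspace ARN (scraper destination) and the workspace ID (query endpoint). The AWS CLI's own --query option can print one field directly (for example --query arn --output text), but if you save the whole JSON response, this Python sketch pulls both out. It assumes the response contains arn, workspaceId and status.statusCode fields; the function name is hypothetical.

```python
"""Extract the values later steps need from `aws amp create-workspace` output (sketch)."""
import json
from typing import Dict, Optional


def workspace_ids(create_output: str) -> Dict[str, Optional[str]]:
    data = json.loads(create_output)
    return {
        "workspace_id": data["workspaceId"],  # goes into the query endpoint URL
        "workspace_arn": data["arn"],         # goes into the scraper destination
        "status": data.get("status", {}).get("statusCode"),
    }
```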

Creating the EKS cluster

Create an EKS cluster named test-managed-collector-cluster in the Tokyo Region. The EKS cluster creation wizard in the AWS Management Console already includes Managed Collector settings, so here I create the cluster from the console.

Open EKS in the Management Console and choose [Add Cluster] -> [Create].

Name:test-managed-collector-cluster
Kubernetes version:1.28
Cluster service role: select the default role

Click [Next] to open the Specify networking page. Leave the defaults and click [Next].

The Configure observability page appears. This is where you configure metrics.

Prometheus: click [Send Prometheus metrics to Amazon Managed Service for Prometheus]
Scraper alias: leave the default
Destination: specify where the new scraper sends metrics. Click [Select existing workspace] and choose the Prometheus workspace created in the previous step

Leave the other settings at their defaults and click [Next]. On the Select add-ons page, keep the defaults (kube-proxy, Amazon VPC CNI, CoreDNS, and Amazon EKS Pod Identity Agent are selected) and click [Next].

Leave the Configure selected add-ons settings page as it is and click [Next].

On the Review and create page, click [Create Cluster] and wait for the cluster to be created. Once its Status shows Active, you're done.

Creating the Prometheus scraper configuration file

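Before setting up the managed collector, create a scraper configuration file for Prometheus. Here, I take the configuration provided by the GetDefaultScraperConfiguration operation of the Amazon Managed Service for Prometheus API, base64-decode it, and write it to default.yml. For now, I use this default file as is.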

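The get-default-scraper-configuration command returns the configuration base64-encoded, so decode it before saving it to a file. (base64 -D is the macOS option; with GNU coreutils on Linux, use base64 -d.)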

$ aws amp get-default-scraper-configuration --output text | base64 -D > default.yml
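Because the base64 decode flag differs by platform, a portable alternative is to do the conversion in Python. This sketch assumes you first saved the raw --output text result to a file (default.b64 is a hypothetical name); it also provides the reverse, single-line encoding that create-scraper's configurationBlob needs later.

```python
"""Decode/encode the scraper configuration portably (sketch)."""
import base64
import sys


def decode_config(b64_text: str) -> str:
    # Remove any whitespace or line breaks before decoding.
    return base64.b64decode("".join(b64_text.split())).decode("utf-8")


def encode_config(yaml_text: str) -> str:
    # Single-line base64 (no wrapping), suitable for configurationBlob=...
    return base64.b64encode(yaml_text.encode("utf-8")).decode("ascii")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: aws amp get-default-scraper-configuration --output text > default.b64
    #        python3 decode_config.py default.b64 default.yml
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.write(decode_config(src.read()))
```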

The contents of default.yml are as follows.

global:
  scrape_interval: 30s
  # external_labels:
    # clusterArn: <REPLACE_ME>
scrape_configs:
  # pod metrics
  - job_name: pod_exporter
    kubernetes_sd_configs:
      - role: pod
  # container metrics
  - job_name: cadvisor
    scheme: https
    authorization:
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - replacement: kubernetes.default.svc:443
        target_label: __address__
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
  # apiserver metrics
  - bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    job_name: kubernetes-apiservers
    kubernetes_sd_configs:
    - role: endpoints
    relabel_configs:
    - action: keep
      regex: default;kubernetes;https
      source_labels:
      - __meta_kubernetes_namespace
      - __meta_kubernetes_service_name
      - __meta_kubernetes_endpoint_port_name
    scheme: https
  # kube proxy metrics
  - job_name: kube-proxy
    honor_labels: true
    kubernetes_sd_configs:
    - role: pod
    relabel_configs:
    - action: keep
      source_labels:
      - __meta_kubernetes_namespace
      - __meta_kubernetes_pod_name
      separator: '/'
      regex: 'kube-system/kube-proxy.+'
    - source_labels:
      - __address__
      action: replace
      target_label: __address__
      regex: (.+?)(\\:\\d+)?
      replacement: $1:10249
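To see what some of the relabel_configs above actually do, here is a small Python simulation of three of them: the kube-proxy keep rule (namespace and pod name joined with '/'), the kubernetes-apiservers keep rule (three labels joined with the default ';'), and the cadvisor rule that rewrites __metrics_path__. It follows Prometheus relabeling semantics as I understand them (regexes are fully anchored, source label values are joined with the separator, $1 refers to the first capture group); it is an illustration, not Prometheus' own implementation, and the label values used are made up.

```python
"""Simulate a few relabel_configs from default.yml (sketch, not Prometheus itself)."""
import re
from typing import Dict, List

Labels = Dict[str, str]


def _joined(labels: Labels, source_labels: List[str], separator: str) -> str:
    # Missing labels count as empty strings.
    return separator.join(labels.get(name, "") for name in source_labels)


def keep(labels: Labels, source_labels: List[str], regex: str,
         separator: str = ";") -> bool:
    # action: keep -> the target survives only if the whole joined value matches.
    return re.fullmatch(regex, _joined(labels, source_labels, separator)) is not None


def replace(labels: Labels, source_labels: List[str], regex: str,
            target_label: str, replacement: str, separator: str = ";") -> Labels:
    # action: replace -> on a match, write the expanded replacement to target_label.
    m = re.fullmatch(regex, _joined(labels, source_labels, separator))
    if m is None:
        return labels
    value = re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))) or "", replacement)
    return {**labels, target_label: value}
```

For example, a pod kube-system/kube-proxy-abcde passes the kube-proxy keep rule while kube-system/coredns-123 is dropped, and a node name is turned into /api/v1/nodes/<node>/proxy/metrics/cadvisor.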

Creating the scraper for metric collection

An Amazon Managed Service for Prometheus collector consists of scrapers that discover and collect metrics from an EKS cluster. Using the scraper configuration file created in the previous step, configure a scraper for the cluster.

Use the AWS CLI create-scraper command for this.

Create the scraper using the following values.

EKS cluster ARN
EKS cluster security group ID
EKS cluster subnet IDs
Managed Prometheus workspace ARN
--scrape-configuration: default.yml from the previous step, base64-encoded
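The first three values can be read from aws eks describe-cluster --name test-managed-collector-cluster, whose response carries cluster.arn and cluster.resourcesVpcConfig (subnetIds, securityGroupIds, clusterSecurityGroupId). This Python sketch extracts them from the saved JSON. Which security groups to pass is an assumption on my part: it uses the EKS-created cluster security group plus any additional ones, without duplicates; the function name is hypothetical.

```python
"""Collect create-scraper inputs from `aws eks describe-cluster` output (sketch)."""
import json
from typing import Dict, List


def scraper_source_from_cluster(describe_output: str) -> Dict[str, object]:
    cluster = json.loads(describe_output)["cluster"]
    vpc = cluster["resourcesVpcConfig"]
    # Assumption: cluster security group first, then any additional groups, deduplicated.
    groups: List[str] = []
    for sg in [vpc.get("clusterSecurityGroupId")] + list(vpc.get("securityGroupIds", [])):
        if sg and sg not in groups:
            groups.append(sg)
    return {
        "clusterArn": cluster["arn"],
        "securityGroupIds": groups,
        "subnetIds": list(vpc.get("subnetIds", [])),
    }
```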

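Create the scraper with the aws amp create-scraper command, using the security group, subnet IDs, and Prometheus workspace ARN of your EKS cluster. Replace the placeholder values in the command below with your own parameters before running it (the example ARNs use us-west-2; for the Tokyo Region, use ap-northeast-1).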

aws amp create-scraper \
  --source eksConfiguration="{clusterArn='arn:aws:eks:us-west-2:account-id:cluster/cluster-name', securityGroupIds=['sg-security-group-id'],subnetIds=['subnet-subnet-id-1', 'subnet-subnet-id-2']}" \
  --destination ampConfiguration="{workspaceArn='arn:aws:aps:us-west-2:account-id:workspace/ws-workspace-id'}" \
  --scrape-configuration configurationBlob="$(base64 < default.yml | tr -d '\n')"
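Shell quoting of the shorthand syntax is easy to get wrong, so if you drive the CLI from a script, you can pass the same structures as JSON instead. This Python sketch builds the argument list for subprocess.run; it assumes the AWS CLI accepts JSON for these structure parameters and AWS CLI v2's default base64 handling of blob parameters. The function name is hypothetical.

```python
"""Build the `aws amp create-scraper` argument list without shell quoting (sketch)."""
import base64
import json
from typing import List


def create_scraper_argv(cluster_arn: str, security_group_ids: List[str],
                        subnet_ids: List[str], workspace_arn: str,
                        scrape_config_yaml: str) -> List[str]:
    source = {"eksConfiguration": {
        "clusterArn": cluster_arn,
        "securityGroupIds": security_group_ids,
        "subnetIds": subnet_ids,
    }}
    destination = {"ampConfiguration": {"workspaceArn": workspace_arn}}
    # Blob parameter passed as single-line base64 (AWS CLI v2 default binary format).
    blob = base64.b64encode(scrape_config_yaml.encode("utf-8")).decode("ascii")
    return [
        "aws", "amp", "create-scraper",
        "--source", json.dumps(source),
        "--destination", json.dumps(destination),
        "--scrape-configuration", json.dumps({"configurationBlob": blob}),
    ]
```

Usage would be subprocess.run(create_scraper_argv(...), check=True) after importing subprocess; each JSON string is passed as a single argument, so no shell escaping is involved.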

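If registration succeeds, the ID of the new scraper (scraperId) is displayed. The scraper also appears on the cluster's [Observability] tab in the Management Console. At this point the role configuration is not yet complete, so its Status is Creating.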

Configuring the EKS cluster

Configure the EKS cluster by following Configuring your Amazon EKS cluster in the official documentation.

Create the following clusterrole-binding.yml file.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: aps-collector-role
rules:
  - apiGroups: [""]
    resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods", "ingresses", "configmaps"]
    verbs: ["describe", "get", "list", "watch"]
  - apiGroups: ["extensions", "networking.k8s.io"]
    resources: ["ingresses/status", "ingresses"]
    verbs: ["describe", "get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: aps-collector-user-role-binding
subjects:
- kind: User
  name: aps-collector-user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: aps-collector-role
  apiGroup: rbac.authorization.k8s.io

Apply it to the cluster.

$ kubectl apply -f clusterrole-binding.yml

Use the ID of the scraper created earlier to check its details.

$ aws amp describe-scraper --scraper-id <scraper-id>

Find roleArn in the output and map that IAM role to the Kubernetes user aps-collector-user with the following command.

$ eksctl create iamidentitymapping --cluster test-managed-collector-cluster --arn <roleArn> --username aps-collector-user

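This command allows the scraper to access the cluster as aps-collector-user, the user bound to the ClusterRole created in clusterrole-binding.yml. Once the permission is granted successfully, the scraper's Status changes from Creating to Active.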
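One detail worth checking: the roleArn returned by describe-scraper is, as far as I know, a service-linked role whose ARN includes a path (aws-service-role/scraper.aps.amazonaws.com/), while the EKS aws-auth mapping that eksctl writes does not accept role ARNs containing a path. This Python sketch strips the path, builds the eksctl command above, and checks the scraper status; the function names are hypothetical, so confirm the expected ARN format in the official Configuring your Amazon EKS cluster page.

```python
"""Prepare the roleArn from `aws amp describe-scraper` for eksctl (sketch)."""
import json


def strip_role_path(role_arn: str) -> str:
    # arn:aws:iam::ACCOUNT:role/path/to/RoleName -> arn:aws:iam::ACCOUNT:role/RoleName
    prefix, _, resource = role_arn.partition(":role/")
    if not resource:
        raise ValueError(f"not an IAM role ARN: {role_arn}")
    return f"{prefix}:role/{resource.split('/')[-1]}"


def eksctl_mapping_command(describe_output: str, cluster: str) -> str:
    scraper = json.loads(describe_output)["scraper"]
    role_arn = strip_role_path(scraper["roleArn"])
    return (f"eksctl create iamidentitymapping --cluster {cluster} "
            f"--arn {role_arn} --username aps-collector-user")


def is_active(describe_output: str) -> bool:
    # Poll describe-scraper until this returns True after the mapping is in place.
    return json.loads(describe_output)["scraper"]["status"]["statusCode"] == "ACTIVE"
```

A role ARN without a path is returned unchanged, so the helper is safe to apply either way.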

Checking the collected metrics

The quickest way to check whether Managed Service for Prometheus is collecting metrics is to use awscurl.

Reference: Query using Prometheus-compatible APIs - Amazon Managed Service for Prometheus

On a Mac, install awscurl with the following command.

$ brew install awscurl

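Set the QUERY endpoint of the Prometheus workspace you created earlier. Replace Region (for example, ap-northeast-1) and Workspace-id with the values for your own Prometheus workspace. You can also find this QUERY endpoint in the Management Console.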

$ export AMP_QUERY_ENDPOINT=https://aps-workspaces.Region.amazonaws.com/workspaces/<Workspace-id>/api/v1/query
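The query string must be URL-encoded once PromQL contains braces, quotes, or spaces, for example up{job="cadvisor"} using a job name from default.yml. This Python sketch builds the full query URL from a Region and workspace ID following the endpoint format above; the function name is hypothetical.

```python
"""Build the workspace query URL for a PromQL expression (sketch)."""
from urllib.parse import urlencode


def amp_query_url(region: str, workspace_id: str, promql: str) -> str:
    base = (f"https://aps-workspaces.{region}.amazonaws.com"
            f"/workspaces/{workspace_id}/api/v1/query")
    # urlencode percent-encodes {, }, = and quotes, and turns spaces into '+'.
    return f"{base}?{urlencode({'query': promql})}"
```

The resulting URL can be passed to awscurl in place of "$AMP_QUERY_ENDPOINT?query=up".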

From a client with AWS credentials, run the PromQL query up. If you get back JSON with "status": "success", as in the example response below, it's working!

$ awscurl -X POST --region ap-northeast-1 --service aps "$AMP_QUERY_ENDPOINT?query=up"

{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "up",
          "instance": "localhost:9090",
          "job": "prometheus",
          "monitor": "monitor"
        },
        "value": [
          1652452637.636,
          "1"
        ]
      },
      ...
    ]
  }
}
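In an instant-vector result like this, each series carries its labels in metric and a [timestamp, "value"] pair in value; for up, "1" means the last scrape of that target succeeded and "0" means it failed. This Python sketch summarizes a saved response per job and instance and raises on a non-success status; the function name is hypothetical.

```python
"""Summarize the `up` query response returned by awscurl (sketch)."""
import json
from typing import List, Tuple


def up_targets(response_text: str) -> List[Tuple[str, str, bool]]:
    """Return (job, instance, is_up) for each series in an instant-vector result."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body.get('error', body)}")
    targets = []
    for series in body["data"]["result"]:
        metric = series["metric"]
        _timestamp, value = series["value"]  # [unix time, value as a string]
        targets.append((metric.get("job", ""), metric.get("instance", ""), value == "1"))
    return targets
```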

For details on Prometheus query expressions, see the official documentation below.

Querying basics | Prometheus

If metrics are not being collected correctly, review your scraper configuration using the official documentation below.

Troubleshooting scraper configuration

(Optional) Visualizing Prometheus metrics with Managed Grafana

In practice, Prometheus metrics are usually visualized in Grafana. If you don't have a Grafana environment yet, refer to the following to set up Grafana and configure the Prometheus workspace above as a data source.

Official: Set up Amazon Managed Grafana for use with Amazon Managed Service for Prometheus - Amazon Managed Service for Prometheus
If you manage users with IAM Identity Center
- Reference: Setting up and trying out Amazon Managed Grafana | DevelopersIO
If you manage users with SAML (Auth0)
- Reference: Letting admin and regular users access an Amazon Managed Grafana (AMG) workspace via SAML (Auth0) | DevelopersIO

Looking forward to broader use of Prometheus metrics for EKS monitoring with the agentless collector

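That wraps up how to configure the AWS managed collector for Prometheus, walking through the official documentation. Compared with the previous approach of installing agents on nodes with a Helm chart, there are fewer resources to create and fewer things to manage.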

As mentioned earlier, the following limitations apply, so first check whether the collector fits your environment, then try it out in a development environment.

Scraper limitations

That's all for today. This was Hamada (@hamako9999).