Trying CloudWatch Application Signals on EKS with Auto Monitor enabled


Hello, this is Masukawa from the Cloud Business Division.
Auto Monitor can now be enabled when using Application Signals on EKS, so I'm going to try it out.

https://aws.amazon.com/about-aws/whats-new/2025/05/amazon-cloudwatch-signals-auto-monitor-eks-workloads/

What is CloudWatch Application Signals?

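It is a service for OpenTelemetry-compatible application performance monitoring.
It makes it relatively easy to follow best practices through the whole flow: setting up auto-instrumentation for your application, getting dashboards for analyzing the resulting metrics, defining SLOs from those metrics, and setting alarms on them.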

https://dev.classmethod.jp/articles/amazon-cloudwatch-application-signals-ga/

What exactly does Auto Monitor automate?

When you use Application Signals on EKS, the auto-instrumentation libraries and the environment variables needed for instrumentation are injected into your pods.

https://dev.classmethod.jp/articles/eks-cloudwatch-application-signals/

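Until now, this required either adding an annotation to the manifest file or selecting the Kubernetes resources in the management console and enabling them there.
With Auto Monitor, each Kubernetes resource can become an Application Signals monitoring target without any additional configuration.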

Configuring from the console (without Auto Monitor)

You have to specify the services or namespaces each time.

app-sig.png

The advantage is that monitoring targets can be managed centrally from the AWS Management Console.

Adding an annotation to the manifest

You need to add an annotation such as instrumentation.opentelemetry.io/inject-xxx: "true" to the target resource.
For Java, for example, it looks like this.

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: spring-boot
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: spring-boot
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/name: spring-boot
      annotations:
        # Opt this pod template in to Java auto-instrumentation
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
        - image: xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app:v3
          imagePullPolicy: Always
          name: spring-boot
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "0.5"
          env:
            - name: DATABASE_HOST
              value: "sample-aurora-postgres-cluster.cluster-xxxxxxxxxxxx.ap-northeast-1.rds.amazonaws.com"
            - name: DATABASE_NAME
              value: "postgres"
            - name: DATABASE_USER
              value: "postgres"
            - name: DATABASE_PASSWORD
              value: "password"

This method is a good fit when you want fine-grained control over which resources are monitored.
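If you would rather not edit the manifest file, the same annotation can also be added to the pod template of an existing Deployment with `kubectl patch`. The following is a minimal sketch, assuming the `spring-boot` Deployment in the `default` namespace from the manifest above. The helper name `inject_patch` is hypothetical and only used for illustration.

```shell
# Build a JSON merge patch that adds the inject annotation for one language
# to a workload's pod template. The helper name is illustrative only.
inject_patch() {
  lang="$1"  # e.g. java, python, nodejs, dotnet
  printf '{"spec":{"template":{"metadata":{"annotations":{"instrumentation.opentelemetry.io/inject-%s":"true"}}}}}' "$lang"
}

# Usage (requires cluster access). Changing the pod template triggers a rollout,
# so the new pods are created with the annotation in place:
#   kubectl patch deployment spring-boot -n default --type merge -p "$(inject_patch java)"
```

Keeping the patch in a small function makes it easy to reuse for other languages, but for anything long-lived it is probably better to keep the annotation in the manifest so that it stays under version control.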

Using Auto Monitor

This is the newly available method.

auto-monitor.png

As noted in the management console, the Observability add-on must be v4.0.0 or later.

https://github.com/aws-observability/helm-charts/releases/tag/amazon-cloudwatch-observability-4.0.0
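Before enabling Auto Monitor, it may be worth checking which add-on version is installed. The sketch below assumes the add-on is named `amazon-cloudwatch-observability` and that its version string looks like `v4.0.0-eksbuild.1`. The cluster name `my-cluster` and the helper name `addon_supports_auto_monitor` are placeholders, not taken from the article.

```shell
# Succeed if an add-on version string such as "v4.0.0-eksbuild.1"
# has a major version of 4 or higher. The helper name is illustrative only.
addon_supports_auto_monitor() {
  major="${1#v}"        # drop the leading "v"
  major="${major%%.*}"  # keep only the part before the first dot
  [ "$major" -ge 4 ]
}

# Usage (requires AWS credentials; "my-cluster" is a placeholder):
#   version="$(aws eks describe-addon --cluster-name my-cluster \
#     --addon-name amazon-cloudwatch-observability \
#     --query 'addon.addonVersion' --output text)"
#   addon_supports_auto_monitor "$version" && echo "Auto Monitor should be available"
```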

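If you don't need to pick monitored resources one by one, this simplifies the setup.
There is also an Auto restart option. Enabling it together with Auto Monitor appears to restart all pods automatically so that they are auto-instrumented.
I'll deploy a Java application without any annotations and then enable both options.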

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: spring-boot
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: spring-boot
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/name: spring-boot
    spec:
      containers:
        - image: xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app:v3
          imagePullPolicy: Always
          name: spring-boot
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "0.5"
          env:
            - name: DATABASE_HOST
              value: "sample-aurora-postgres-cluster.cluster-xxxxxxxxxxxx.ap-northeast-1.rds.amazonaws.com"
            - name: DATABASE_NAME
              value: "postgres"
            - name: DATABASE_USER
              value: "postgres"
            - name: DATABASE_PASSWORD
              value: "password"

The EKS cluster was created in the same way as in the following article.

https://dev.classmethod.jp/articles/cloudwatch-observability-addon-pod-identity/

Enable Auto Monitor and Auto restart.

auto-monitor2.png

After enabling them, the application container was restarted.
At that point, annotations for every language supported by Application Signals appear to be added automatically.

% kubectl describe pod spring-boot-77d66d46fc-8fcsc
Name:             spring-boot-77d66d46fc-8fcsc
Namespace:        default
Priority:         0
Service Account:  default
Node:             i-0cfcb3f270d724f2c/10.0.101.165
Start Time:       Sun, 06 Jul 2025 14:55:21 +0900
Labels:           app.kubernetes.io/name=spring-boot
                  pod-template-hash=77d66d46fc
Annotations:      cloudwatch.aws.amazon.com/auto-annotate-dotnet: true
                  cloudwatch.aws.amazon.com/auto-annotate-java: true
                  cloudwatch.aws.amazon.com/auto-annotate-nodejs: true
                  cloudwatch.aws.amazon.com/auto-annotate-python: true
                  instrumentation.opentelemetry.io/inject-dotnet: true
                  instrumentation.opentelemetry.io/inject-java: true
                  instrumentation.opentelemetry.io/inject-nodejs: true
                  instrumentation.opentelemetry.io/inject-python: true
Status:           Running
IP:               10.0.101.81
IPs:
  IP:           10.0.101.81
Controlled By:  ReplicaSet/spring-boot-77d66d46fc
Init Containers:
  opentelemetry-auto-instrumentation-java:
    Container ID:  containerd://746b6ff59be48d39eba589e37a9f5997f07e6f625acf159ac22505f601d7882c
    Image:         602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-java:v2.10.0
    Image ID:      602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-java@sha256:d16db829c68a6826c2ae28cba0feb063b48dd9c7ff434a8a4ecf4753d6d30dea
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      /javaagent.jar
      /otel-auto-instrumentation-java/javaagent.jar
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 06 Jul 2025 14:55:23 +0900
      Finished:     Sun, 06 Jul 2025 14:55:23 +0900
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  64Mi
    Requests:
      cpu:        50m
      memory:     64Mi
    Environment:  <none>
    Mounts:
      /otel-auto-instrumentation-java from opentelemetry-auto-instrumentation-java (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6w75l (ro)
  opentelemetry-auto-instrumentation-nodejs:
    Container ID:  containerd://90cbf1aba62bd341a9e720927a2c382f51f59b6d9842f936140cc5f67b862fd9
    Image:         602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-node:v0.6.0
    Image ID:      602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-node@sha256:bbc64bc498525678047f95e50734a2f027811787842848f3e7480011a94349fa
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -a
      /autoinstrumentation/.
      /otel-auto-instrumentation-nodejs
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 06 Jul 2025 14:55:26 +0900
      Finished:     Sun, 06 Jul 2025 14:55:28 +0900
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:        50m
      memory:     128Mi
    Environment:  <none>
    Mounts:
      /otel-auto-instrumentation-nodejs from opentelemetry-auto-instrumentation-nodejs (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6w75l (ro)
  opentelemetry-auto-instrumentation-python:
    Container ID:  containerd://e630ef46258c8d472b8cd67f7e51dc836ded4400ddd6a5c00c22d08c290b8da6
    Image:         602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-python:v0.9.0
    Image ID:      602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-python@sha256:3d579f46ac74eb2e6eee168b531f7b9357b45cf7328efd8c77fe8459670533d4
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -a
      /autoinstrumentation/.
      /otel-auto-instrumentation-python
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 06 Jul 2025 14:55:30 +0900
      Finished:     Sun, 06 Jul 2025 14:55:31 +0900
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  32Mi
    Requests:
      cpu:        50m
      memory:     32Mi
    Environment:  <none>
    Mounts:
      /otel-auto-instrumentation-python from opentelemetry-auto-instrumentation-python (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6w75l (ro)
  opentelemetry-auto-instrumentation-dotnet:
    Container ID:  containerd://c62272ed4ad0f47d05e57f9d1b6e2654365164a6545e15376022c68d3595d864
    Image:         602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-dotnet:v1.7.0
    Image ID:      602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-dotnet@sha256:e8e72b4a9f31b0d530286facc86f9e1f7aaecddfaa333b625ba79094a0b68262
    Port:          <none>
    Host Port:     <none>
    Command:
      cp
      -r
      /autoinstrumentation/.
      /otel-auto-instrumentation-dotnet
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Sun, 06 Jul 2025 14:55:35 +0900
      Finished:     Sun, 06 Jul 2025 14:55:35 +0900
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:        50m
      memory:     128Mi
    Environment:  <none>
    Mounts:
      /otel-auto-instrumentation-dotnet from opentelemetry-auto-instrumentation-dotnet (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6w75l (ro)
Containers:
  spring-boot:
    Container ID:   containerd://f53af8c0eb98ee024dec37f2ae53ebda4dab0284a8ae833a8b910a35776f3549
    Image:          xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app:v3
    Image ID:       xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app@sha256:6b8391a294eaab0fe8f421caa53026121f2d8cd2918004429e341fe33693855e
    Port:           80/TCP
    Host Port:      0/TCP
    State:          Running
      Started:      Sun, 06 Jul 2025 14:55:35 +0900
    Ready:          True
    Restart Count:  0
    Requests:
      cpu:  500m
    Environment:
      DATABASE_HOST:                                   sample-aurora-postgres-cluster.cluster-cx4ayeauo8zn.ap-northeast-1.rds.amazonaws.com
      DATABASE_NAME:                                   postgres
      DATABASE_USER:                                   postgres
      DATABASE_PASSWORD:                               password
      OTEL_EXPORTER_OTLP_PROTOCOL:                     http/protobuf
      OTEL_METRICS_EXPORTER:                           none
      OTEL_LOGS_EXPORTER:                              none
      OTEL_AWS_APP_SIGNALS_ENABLED:                    true
      OTEL_AWS_APPLICATION_SIGNALS_ENABLED:            true
      OTEL_TRACES_SAMPLER_ARG:                         endpoint=http://cloudwatch-agent.amazon-cloudwatch:2000
      OTEL_TRACES_SAMPLER:                             xray
      OTEL_EXPORTER_OTLP_TRACES_ENDPOINT:              http://cloudwatch-agent.amazon-cloudwatch:4316/v1/traces
      OTEL_AWS_APP_SIGNALS_EXPORTER_ENDPOINT:          http://cloudwatch-agent.amazon-cloudwatch:4316/v1/metrics
      OTEL_AWS_APPLICATION_SIGNALS_EXPORTER_ENDPOINT:  http://cloudwatch-agent.amazon-cloudwatch:4316/v1/metrics
      OTEL_AWS_APPLICATION_SIGNALS_RUNTIME_ENABLED:    true
      JAVA_TOOL_OPTIONS:                                -javaagent:/otel-auto-instrumentation-java/javaagent.jar
      OTEL_SERVICE_NAME:                               spring-boot
      OTEL_RESOURCE_ATTRIBUTES_POD_NAME:               spring-boot-77d66d46fc-8fcsc (v1:metadata.name)
      OTEL_RESOURCE_ATTRIBUTES_NODE_NAME:               (v1:spec.nodeName)
      OTEL_PROPAGATORS:                                tracecontext,baggage,b3,xray
      NODE_OPTIONS:                                     --require /otel-auto-instrumentation-nodejs/autoinstrumentation.js
      OTEL_PYTHON_DISTRO:                              aws_distro
      OTEL_PYTHON_CONFIGURATOR:                        aws_configurator
      PYTHONPATH:                                      /otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation:/otel-auto-instrumentation-python
      OTEL_TRACES_EXPORTER:                            otlp
      OTEL_EXPORTER_OTLP_TRACES_PROTOCOL:              http/protobuf
      OTEL_EXPORTER_OTLP_METRICS_PROTOCOL:             http/protobuf
      OTEL_EXPORTER_OTLP_ENDPOINT:                     http://cloudwatch-agent.amazon-cloudwatch:4316
      OTEL_DOTNET_DISTRO:                              aws_distro
      OTEL_DOTNET_CONFIGURATOR:                        aws_configurator
      OTEL_DOTNET_AUTO_PLUGINS:                        AWS.Distro.OpenTelemetry.AutoInstrumentation.Plugin, AWS.Distro.OpenTelemetry.AutoInstrumentation
      CORECLR_ENABLE_PROFILING:                        1
      CORECLR_PROFILER:                                {918728DD-259F-4A6A-AC2B-B85E1B658318}
      CORECLR_PROFILER_PATH:                           /otel-auto-instrumentation-dotnet/linux-x64/OpenTelemetry.AutoInstrumentation.Native.so
      DOTNET_STARTUP_HOOKS:                            /otel-auto-instrumentation-dotnet/net/OpenTelemetry.AutoInstrumentation.StartupHook.dll
      DOTNET_ADDITIONAL_DEPS:                          /otel-auto-instrumentation-dotnet/AdditionalDeps
      OTEL_DOTNET_AUTO_HOME:                           /otel-auto-instrumentation-dotnet
      DOTNET_SHARED_STORE:                             /otel-auto-instrumentation-dotnet/store
      OTEL_RESOURCE_ATTRIBUTES:                        com.amazonaws.cloudwatch.entity.internal.service.name.source=K8sWorkload,k8s.container.name=spring-boot,k8s.deployment.name=spring-boot,k8s.namespace.name=default,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.replicaset.name=spring-boot-77d66d46fc,service.version=v3
    Mounts:
      /otel-auto-instrumentation-dotnet from opentelemetry-auto-instrumentation-dotnet (rw)
      /otel-auto-instrumentation-java from opentelemetry-auto-instrumentation-java (rw)
      /otel-auto-instrumentation-nodejs from opentelemetry-auto-instrumentation-nodejs (rw)
      /otel-auto-instrumentation-python from opentelemetry-auto-instrumentation-python (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6w75l (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  kube-api-access-6w75l:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
  opentelemetry-auto-instrumentation-java:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  200Mi
  opentelemetry-auto-instrumentation-nodejs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  200Mi
  opentelemetry-auto-instrumentation-python:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  200Mi
  opentelemetry-auto-instrumentation-dotnet:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:   200Mi
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  83s   default-scheduler  Successfully assigned default/spring-boot-77d66d46fc-8fcsc to i-0cfcb3f270d724f2c
  Normal  Pulling    82s   kubelet            Pulling image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-java:v2.10.0"
  Normal  Pulled     81s   kubelet            Successfully pulled image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-java:v2.10.0" in 975ms (975ms including waiting). Image size: 30171277 bytes.
  Normal  Created    81s   kubelet            Created container: opentelemetry-auto-instrumentation-java
  Normal  Started    81s   kubelet            Started container opentelemetry-auto-instrumentation-java
  Normal  Pulling    80s   kubelet            Pulling image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-node:v0.6.0"
  Normal  Pulled     78s   kubelet            Successfully pulled image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-node:v0.6.0" in 2.359s (2.359s including waiting). Image size: 6979839 bytes.
  Normal  Created    78s   kubelet            Created container: opentelemetry-auto-instrumentation-nodejs
  Normal  Started    78s   kubelet            Started container opentelemetry-auto-instrumentation-nodejs
  Normal  Pulling    75s   kubelet            Pulling image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-python:v0.9.0"
  Normal  Pulled     74s   kubelet            Successfully pulled image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-python:v0.9.0" in 1.252s (1.252s including waiting). Image size: 7545298 bytes.
  Normal  Created    74s   kubelet            Created container: opentelemetry-auto-instrumentation-python
  Normal  Started    74s   kubelet            Started container opentelemetry-auto-instrumentation-python
  Normal  Pulling    72s   kubelet            Pulling image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-dotnet:v1.7.0"
  Normal  Pulled     70s   kubelet            Successfully pulled image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-dotnet:v1.7.0" in 2.39s (2.39s including waiting). Image size: 50068755 bytes.
  Normal  Created    70s   kubelet            Created container: opentelemetry-auto-instrumentation-dotnet
  Normal  Started    69s   kubelet            Started container opentelemetry-auto-instrumentation-dotnet
  Normal  Pulling    69s   kubelet            Pulling image "xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app:v3"
  Normal  Pulled     69s   kubelet            Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app:v3" in 151ms (151ms including waiting). Image size: 236889469 bytes.
  Normal  Created    69s   kubelet            Created container: spring-boot
  Normal  Started    69s   kubelet            Started container spring-boot

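The environment variables and related settings were injected as expected.
The workload was also registered as an Application Signals service, so I can now manage SLOs and inspect traces for it.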
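The full `kubectl describe` output is long, so a narrower query can be handy when you only want to know which language agents were injected. This is a minimal sketch, assuming the init containers keep the `opentelemetry-auto-instrumentation-` name prefix seen in the output above. The helper name `injected_languages` is hypothetical.

```shell
# Print the language part of each auto-instrumentation init container name,
# one per line, and ignore any other init containers.
injected_languages() {
  for name in "$@"; do
    case "$name" in
      opentelemetry-auto-instrumentation-*)
        printf '%s\n' "${name#opentelemetry-auto-instrumentation-}" ;;
    esac
  done
}

# Usage (requires cluster access). The jsonpath query prints the init
# container names separated by spaces, which the shell splits into arguments:
#   injected_languages $(kubectl get pod spring-boot-77d66d46fc-8fcsc -n default \
#     -o jsonpath='{.spec.initContainers[*].name}')
```

For the pod shown above, this should list java, nodejs, python, and dotnet, matching the four init containers in the describe output.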

appsig3.png

appsig4.png

This time I configured it from the console, but Auto Monitor and Auto restart can also be set through the EKS add-on configuration.

https://docs.aws.amazon.com/ja_jp/AmazonCloudWatch/latest/monitoring/CloudWatch-Application-Signals-Enable-EKS.html
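As a rough sketch of the add-on route, the configuration values could be passed with `aws eks update-addon`. This assumes the keys `manager.applicationSignals.autoMonitor.monitorAllServices` and `restartPods` match the linked documentation, so please verify them there before applying anything. The cluster name `my-cluster` and the helper name `auto_monitor_config` are placeholders.

```shell
# Print add-on configuration values intended to enable Auto Monitor for all
# services and restart pods automatically. The key names are an assumption
# based on the AWS documentation linked above and should be verified first.
auto_monitor_config() {
  cat <<'EOF'
{
  "manager": {
    "applicationSignals": {
      "autoMonitor": {
        "monitorAllServices": true,
        "restartPods": true
      }
    }
  }
}
EOF
}

# Usage (requires AWS credentials; "my-cluster" is a placeholder):
#   aws eks update-addon --cluster-name my-cluster \
#     --addon-name amazon-cloudwatch-observability \
#     --configuration-values "$(auto_monitor_config)"
```

Managing the setting through the add-on configuration keeps it alongside the rest of the cluster definition, which may be easier to review than a console toggle.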

Also, resources that already exist to keep Kubernetes itself running, such as CoreDNS, were excluded from monitoring.

appsig5.png

Conclusion

If you find it tedious to select each new service on the Application Signals screen or to add annotations to your manifest files, give this a try!


© Classmethod, Inc. All rights reserved.