Trying CloudWatch Application Signals on EKS with Auto Monitor enabled
Hello, this is Masukawa from the Cloud Business Division.
Auto Monitor can now be enabled when using Application Signals on EKS, so I tried it out.
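What is CloudWatch Application Signals?
It is a service for OpenTelemetry-compatible application performance monitoring.
It lets you set up automatic instrumentation for your applications, get dashboards for analyzing the resulting metrics, define SLOs on those metrics, and configure alarms, all following best practices and with relatively little effort.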
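What exactly does Auto Monitor automate?
When you use Application Signals on EKS, the auto-instrumentation libraries and the environment variables required for instrumentation are injected into your Pods.
Until now, this required either annotating the manifest file or selecting the Kubernetes resources in the Management Console to enable it.
With Auto Monitor, every Kubernetes resource can be brought under Application Signals monitoring with no additional configuration.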
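Configuring from the console (without Auto Monitor)
You have to specify the services or namespaces each time.
The advantage is that monitoring targets can be managed centrally from the AWS Management Console.
Adding annotations to the manifest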
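You need to add an annotation of the following form to the target resource, where xxx is the language to instrument:
instrumentation.opentelemetry.io/inject-xxx: "true"
For Java, for example, the manifest looks like this: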
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: spring-boot
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: spring-boot
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/name: spring-boot
      annotations:
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
        - image: xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app:v3
          imagePullPolicy: Always
          name: spring-boot
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "0.5"
          env:
            - name: DATABASE_HOST
              value: "sample-aurora-postgres-cluster.cluster-xxxxxxxxxxxx.ap-northeast-1.rds.amazonaws.com"
            - name: DATABASE_NAME
              value: "postgres"
            - name: DATABASE_USER
              value: "postgres"
            - name: DATABASE_PASSWORD
              value: "password"
This approach is the easiest to use when you want fine-grained control over which resources are monitored.
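If manifests are generated or patched by a script, the same annotation can be added programmatically. The following is a minimal sketch in Python that works on a manifest already loaded into a dict. The function name `with_inject_annotation` and the trimmed-down sample manifest are illustrative assumptions and are not part of any AWS or Kubernetes tooling.

```python
import copy

# Annotation key prefix described above; the suffix is the language name.
INJECT_PREFIX = "instrumentation.opentelemetry.io/inject-"


def with_inject_annotation(deployment, language):
    """Return a copy of a Deployment manifest whose pod template carries
    the inject annotation for the given language (hypothetical helper)."""
    patched = copy.deepcopy(deployment)
    template_meta = patched["spec"]["template"].setdefault("metadata", {})
    annotations = template_meta.setdefault("annotations", {})
    # Annotation values must be strings, hence "true" rather than True.
    annotations[INJECT_PREFIX + language] = "true"
    return patched


# Trimmed-down sample manifest (assumption: only the fields needed here).
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"namespace": "default", "name": "spring-boot"},
    "spec": {
        "template": {
            "metadata": {"labels": {"app.kubernetes.io/name": "spring-boot"}}
        }
    },
}

patched = with_inject_annotation(deployment, "java")
print(patched["spec"]["template"]["metadata"]["annotations"])
```

The annotation goes on the pod template rather than on the Deployment's own metadata, matching the Java manifest in this section. The input dict is left unmodified.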
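Using Auto Monitor
This is the newly available method.
As noted in the Management Console, the Observability add-on must be v4.0.0 or later.
If you do not need to pick monitoring targets individually, it simplifies the setup.
There is also an Auto restart option. Enabling it as well appears to restart all Pods automatically so that they get instrumented.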
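Because Auto Monitor depends on the add-on version, it can be worth checking the installed version before enabling it. Here is a minimal sketch, assuming the version string has the form `v<major>.<minor>.<patch>` optionally followed by a build suffix such as `-eksbuild.1`. The helper name and the sample version strings are hypothetical.

```python
import re

# Minimum add-on version stated above for Auto Monitor.
MIN_VERSION = (4, 0, 0)


def supports_auto_monitor(addon_version):
    """Return True if the add-on version is v4.0.0 or later.

    Assumes a leading 'v<major>.<minor>.<patch>'; anything after the
    patch number (for example a build suffix) is ignored.
    """
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", addon_version)
    if match is None:
        raise ValueError(f"unrecognized add-on version: {addon_version!r}")
    return tuple(int(part) for part in match.groups()) >= MIN_VERSION


# Sample version strings, for illustration only.
print(supports_auto_monitor("v4.0.0-eksbuild.1"))
print(supports_auto_monitor("v3.7.0-eksbuild.1"))
```

Comparing integer tuples avoids the pitfall of comparing version strings lexically, where "v10.0.0" would sort before "v4.0.0".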
I will deploy a Java application without any annotations and then enable Auto Monitor.
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: spring-boot
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: spring-boot
  replicas: 1
  template:
    metadata:
      labels:
        app.kubernetes.io/name: spring-boot
    spec:
      containers:
        - image: xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app:v3
          imagePullPolicy: Always
          name: spring-boot
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "0.5"
          env:
            - name: DATABASE_HOST
              value: "sample-aurora-postgres-cluster.cluster-xxxxxxxxxxxx.ap-northeast-1.rds.amazonaws.com"
            - name: DATABASE_NAME
              value: "postgres"
            - name: DATABASE_USER
              value: "postgres"
            - name: DATABASE_PASSWORD
              value: "password"
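The EKS cluster was created in the same way as in a previous article.
I enable both Auto Monitor and Auto restart.
After enabling them, the application container was restarted.
At this point, annotations for every language that Application Signals supports appear to be added automatically.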
% kubectl describe pod spring-boot-77d66d46fc-8fcsc
Name: spring-boot-77d66d46fc-8fcsc
Namespace: default
Priority: 0
Service Account: default
Node: i-0cfcb3f270d724f2c/10.0.101.165
Start Time: Sun, 06 Jul 2025 14:55:21 +0900
Labels: app.kubernetes.io/name=spring-boot
pod-template-hash=77d66d46fc
Annotations: cloudwatch.aws.amazon.com/auto-annotate-dotnet: true
cloudwatch.aws.amazon.com/auto-annotate-java: true
cloudwatch.aws.amazon.com/auto-annotate-nodejs: true
cloudwatch.aws.amazon.com/auto-annotate-python: true
instrumentation.opentelemetry.io/inject-dotnet: true
instrumentation.opentelemetry.io/inject-java: true
instrumentation.opentelemetry.io/inject-nodejs: true
instrumentation.opentelemetry.io/inject-python: true
Status: Running
IP: 10.0.101.81
IPs:
IP: 10.0.101.81
Controlled By: ReplicaSet/spring-boot-77d66d46fc
Init Containers:
opentelemetry-auto-instrumentation-java:
Container ID: containerd://746b6ff59be48d39eba589e37a9f5997f07e6f625acf159ac22505f601d7882c
Image: 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-java:v2.10.0
Image ID: 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-java@sha256:d16db829c68a6826c2ae28cba0feb063b48dd9c7ff434a8a4ecf4753d6d30dea
Port: <none>
Host Port: <none>
Command:
cp
/javaagent.jar
/otel-auto-instrumentation-java/javaagent.jar
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 06 Jul 2025 14:55:23 +0900
Finished: Sun, 06 Jul 2025 14:55:23 +0900
Ready: True
Restart Count: 0
Limits:
cpu: 500m
memory: 64Mi
Requests:
cpu: 50m
memory: 64Mi
Environment: <none>
Mounts:
/otel-auto-instrumentation-java from opentelemetry-auto-instrumentation-java (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6w75l (ro)
opentelemetry-auto-instrumentation-nodejs:
Container ID: containerd://90cbf1aba62bd341a9e720927a2c382f51f59b6d9842f936140cc5f67b862fd9
Image: 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-node:v0.6.0
Image ID: 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-node@sha256:bbc64bc498525678047f95e50734a2f027811787842848f3e7480011a94349fa
Port: <none>
Host Port: <none>
Command:
cp
-a
/autoinstrumentation/.
/otel-auto-instrumentation-nodejs
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 06 Jul 2025 14:55:26 +0900
Finished: Sun, 06 Jul 2025 14:55:28 +0900
Ready: True
Restart Count: 0
Limits:
cpu: 500m
memory: 128Mi
Requests:
cpu: 50m
memory: 128Mi
Environment: <none>
Mounts:
/otel-auto-instrumentation-nodejs from opentelemetry-auto-instrumentation-nodejs (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6w75l (ro)
opentelemetry-auto-instrumentation-python:
Container ID: containerd://e630ef46258c8d472b8cd67f7e51dc836ded4400ddd6a5c00c22d08c290b8da6
Image: 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-python:v0.9.0
Image ID: 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-python@sha256:3d579f46ac74eb2e6eee168b531f7b9357b45cf7328efd8c77fe8459670533d4
Port: <none>
Host Port: <none>
Command:
cp
-a
/autoinstrumentation/.
/otel-auto-instrumentation-python
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 06 Jul 2025 14:55:30 +0900
Finished: Sun, 06 Jul 2025 14:55:31 +0900
Ready: True
Restart Count: 0
Limits:
cpu: 500m
memory: 32Mi
Requests:
cpu: 50m
memory: 32Mi
Environment: <none>
Mounts:
/otel-auto-instrumentation-python from opentelemetry-auto-instrumentation-python (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6w75l (ro)
opentelemetry-auto-instrumentation-dotnet:
Container ID: containerd://c62272ed4ad0f47d05e57f9d1b6e2654365164a6545e15376022c68d3595d864
Image: 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-dotnet:v1.7.0
Image ID: 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-dotnet@sha256:e8e72b4a9f31b0d530286facc86f9e1f7aaecddfaa333b625ba79094a0b68262
Port: <none>
Host Port: <none>
Command:
cp
-r
/autoinstrumentation/.
/otel-auto-instrumentation-dotnet
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 06 Jul 2025 14:55:35 +0900
Finished: Sun, 06 Jul 2025 14:55:35 +0900
Ready: True
Restart Count: 0
Limits:
cpu: 500m
memory: 128Mi
Requests:
cpu: 50m
memory: 128Mi
Environment: <none>
Mounts:
/otel-auto-instrumentation-dotnet from opentelemetry-auto-instrumentation-dotnet (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6w75l (ro)
Containers:
spring-boot:
Container ID: containerd://f53af8c0eb98ee024dec37f2ae53ebda4dab0284a8ae833a8b910a35776f3549
Image: xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app:v3
Image ID: xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app@sha256:6b8391a294eaab0fe8f421caa53026121f2d8cd2918004429e341fe33693855e
Port: 80/TCP
Host Port: 0/TCP
State: Running
Started: Sun, 06 Jul 2025 14:55:35 +0900
Ready: True
Restart Count: 0
Requests:
cpu: 500m
Environment:
DATABASE_HOST: sample-aurora-postgres-cluster.cluster-cx4ayeauo8zn.ap-northeast-1.rds.amazonaws.com
DATABASE_NAME: postgres
DATABASE_USER: postgres
DATABASE_PASSWORD: password
OTEL_EXPORTER_OTLP_PROTOCOL: http/protobuf
OTEL_METRICS_EXPORTER: none
OTEL_LOGS_EXPORTER: none
OTEL_AWS_APP_SIGNALS_ENABLED: true
OTEL_AWS_APPLICATION_SIGNALS_ENABLED: true
OTEL_TRACES_SAMPLER_ARG: endpoint=http://cloudwatch-agent.amazon-cloudwatch:2000
OTEL_TRACES_SAMPLER: xray
OTEL_EXPORTER_OTLP_TRACES_ENDPOINT: http://cloudwatch-agent.amazon-cloudwatch:4316/v1/traces
OTEL_AWS_APP_SIGNALS_EXPORTER_ENDPOINT: http://cloudwatch-agent.amazon-cloudwatch:4316/v1/metrics
OTEL_AWS_APPLICATION_SIGNALS_EXPORTER_ENDPOINT: http://cloudwatch-agent.amazon-cloudwatch:4316/v1/metrics
OTEL_AWS_APPLICATION_SIGNALS_RUNTIME_ENABLED: true
JAVA_TOOL_OPTIONS: -javaagent:/otel-auto-instrumentation-java/javaagent.jar
OTEL_SERVICE_NAME: spring-boot
OTEL_RESOURCE_ATTRIBUTES_POD_NAME: spring-boot-77d66d46fc-8fcsc (v1:metadata.name)
OTEL_RESOURCE_ATTRIBUTES_NODE_NAME: (v1:spec.nodeName)
OTEL_PROPAGATORS: tracecontext,baggage,b3,xray
NODE_OPTIONS: --require /otel-auto-instrumentation-nodejs/autoinstrumentation.js
OTEL_PYTHON_DISTRO: aws_distro
OTEL_PYTHON_CONFIGURATOR: aws_configurator
PYTHONPATH: /otel-auto-instrumentation-python/opentelemetry/instrumentation/auto_instrumentation:/otel-auto-instrumentation-python
OTEL_TRACES_EXPORTER: otlp
OTEL_EXPORTER_OTLP_TRACES_PROTOCOL: http/protobuf
OTEL_EXPORTER_OTLP_METRICS_PROTOCOL: http/protobuf
OTEL_EXPORTER_OTLP_ENDPOINT: http://cloudwatch-agent.amazon-cloudwatch:4316
OTEL_DOTNET_DISTRO: aws_distro
OTEL_DOTNET_CONFIGURATOR: aws_configurator
OTEL_DOTNET_AUTO_PLUGINS: AWS.Distro.OpenTelemetry.AutoInstrumentation.Plugin, AWS.Distro.OpenTelemetry.AutoInstrumentation
CORECLR_ENABLE_PROFILING: 1
CORECLR_PROFILER: {918728DD-259F-4A6A-AC2B-B85E1B658318}
CORECLR_PROFILER_PATH: /otel-auto-instrumentation-dotnet/linux-x64/OpenTelemetry.AutoInstrumentation.Native.so
DOTNET_STARTUP_HOOKS: /otel-auto-instrumentation-dotnet/net/OpenTelemetry.AutoInstrumentation.StartupHook.dll
DOTNET_ADDITIONAL_DEPS: /otel-auto-instrumentation-dotnet/AdditionalDeps
OTEL_DOTNET_AUTO_HOME: /otel-auto-instrumentation-dotnet
DOTNET_SHARED_STORE: /otel-auto-instrumentation-dotnet/store
OTEL_RESOURCE_ATTRIBUTES: com.amazonaws.cloudwatch.entity.internal.service.name.source=K8sWorkload,k8s.container.name=spring-boot,k8s.deployment.name=spring-boot,k8s.namespace.name=default,k8s.node.name=$(OTEL_RESOURCE_ATTRIBUTES_NODE_NAME),k8s.pod.name=$(OTEL_RESOURCE_ATTRIBUTES_POD_NAME),k8s.replicaset.name=spring-boot-77d66d46fc,service.version=v3
Mounts:
/otel-auto-instrumentation-dotnet from opentelemetry-auto-instrumentation-dotnet (rw)
/otel-auto-instrumentation-java from opentelemetry-auto-instrumentation-java (rw)
/otel-auto-instrumentation-nodejs from opentelemetry-auto-instrumentation-nodejs (rw)
/otel-auto-instrumentation-python from opentelemetry-auto-instrumentation-python (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-6w75l (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
kube-api-access-6w75l:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
opentelemetry-auto-instrumentation-java:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: 200Mi
opentelemetry-auto-instrumentation-nodejs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: 200Mi
opentelemetry-auto-instrumentation-python:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: 200Mi
opentelemetry-auto-instrumentation-dotnet:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: 200Mi
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 83s default-scheduler Successfully assigned default/spring-boot-77d66d46fc-8fcsc to i-0cfcb3f270d724f2c
Normal Pulling 82s kubelet Pulling image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-java:v2.10.0"
Normal Pulled 81s kubelet Successfully pulled image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-java:v2.10.0" in 975ms (975ms including waiting). Image size: 30171277 bytes.
Normal Created 81s kubelet Created container: opentelemetry-auto-instrumentation-java
Normal Started 81s kubelet Started container opentelemetry-auto-instrumentation-java
Normal Pulling 80s kubelet Pulling image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-node:v0.6.0"
Normal Pulled 78s kubelet Successfully pulled image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-node:v0.6.0" in 2.359s (2.359s including waiting). Image size: 6979839 bytes.
Normal Created 78s kubelet Created container: opentelemetry-auto-instrumentation-nodejs
Normal Started 78s kubelet Started container opentelemetry-auto-instrumentation-nodejs
Normal Pulling 75s kubelet Pulling image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-python:v0.9.0"
Normal Pulled 74s kubelet Successfully pulled image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-python:v0.9.0" in 1.252s (1.252s including waiting). Image size: 7545298 bytes.
Normal Created 74s kubelet Created container: opentelemetry-auto-instrumentation-python
Normal Started 74s kubelet Started container opentelemetry-auto-instrumentation-python
Normal Pulling 72s kubelet Pulling image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-dotnet:v1.7.0"
Normal Pulled 70s kubelet Successfully pulled image "602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/observability/adot-autoinstrumentation-dotnet:v1.7.0" in 2.39s (2.39s including waiting). Image size: 50068755 bytes.
Normal Created 70s kubelet Created container: opentelemetry-auto-instrumentation-dotnet
Normal Started 69s kubelet Started container opentelemetry-auto-instrumentation-dotnet
Normal Pulling 69s kubelet Pulling image "xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app:v3"
Normal Pulled 69s kubelet Successfully pulled image "xxxxxxxxxxxx.dkr.ecr.ap-northeast-1.amazonaws.com/spring-boot-sample-app:v3" in 151ms (151ms including waiting). Image size: 236889469 bytes.
Normal Created 69s kubelet Created container: spring-boot
Normal Started 69s kubelet Started container spring-boot
The various environment variables have been set as expected.
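To check which languages were injected without reading the whole `kubectl describe` output, the Pod's annotations can be filtered by prefix. This is a minimal sketch, assuming the annotations have already been loaded into a dict (for example from the `metadata.annotations` field of `kubectl get pod <name> -o json`). The function name is hypothetical, and the sample dict simply mirrors the annotations shown in the output above.

```python
INJECT_PREFIX = "instrumentation.opentelemetry.io/inject-"


def injected_languages(annotations):
    """Return the sorted list of languages whose inject annotation is "true"."""
    return sorted(
        key[len(INJECT_PREFIX):]
        for key, value in annotations.items()
        if key.startswith(INJECT_PREFIX) and value == "true"
    )


# Annotations copied from the describe output above.
pod_annotations = {
    "cloudwatch.aws.amazon.com/auto-annotate-dotnet": "true",
    "cloudwatch.aws.amazon.com/auto-annotate-java": "true",
    "cloudwatch.aws.amazon.com/auto-annotate-nodejs": "true",
    "cloudwatch.aws.amazon.com/auto-annotate-python": "true",
    "instrumentation.opentelemetry.io/inject-dotnet": "true",
    "instrumentation.opentelemetry.io/inject-java": "true",
    "instrumentation.opentelemetry.io/inject-nodejs": "true",
    "instrumentation.opentelemetry.io/inject-python": "true",
}

print(injected_languages(pod_annotations))
# ['dotnet', 'java', 'nodejs', 'python']
```

The four languages match the four init containers in the output, one per auto-instrumentation image.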
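The application was also registered as a service in Application Signals, so I can now manage SLOs and inspect traces.
I configured it from the console this time, but Auto Monitor and Auto restart can also be set through the EKS add-on configuration.
In addition, pre-existing resources that Kubernetes itself needs in order to run, such as CoreDNS, were excluded from monitoring.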
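For the add-on-side configuration mentioned above, the settings are passed as JSON configuration values. The sketch below builds such a document in Python. The key names (`manager.applicationSignals.autoMonitor` with `monitorAllServices` and `restartPods`) reflect my understanding of the add-on's configuration schema and should be treated as assumptions to verify against the AWS documentation for your add-on version. The function name is hypothetical.

```python
import json


def auto_monitor_config(monitor_all=True, restart_pods=True):
    """Build add-on configuration values for Auto Monitor as a JSON string.

    Key names are assumptions about the add-on schema; verify them
    against the AWS documentation before use.
    """
    config = {
        "manager": {
            "applicationSignals": {
                "autoMonitor": {
                    # Assumed to correspond to the Auto Monitor toggle.
                    "monitorAllServices": monitor_all,
                    # Assumed to correspond to the Auto restart toggle.
                    "restartPods": restart_pods,
                }
            }
        }
    }
    return json.dumps(config, indent=2)


print(auto_monitor_config())
```

The resulting JSON would be supplied as the add-on's configuration values when creating or updating the add-on, which keeps the setting in code alongside the rest of the cluster definition instead of in the console.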
In closing
If you have been selecting services on the Application Signals screen every time you create one, or you find adding annotations to manifest files tedious, give this a try!