[Update] Amazon EKS now natively supports Auto Scaling of CoreDNS Pods

CoreDNS Auto Scaling can now be configured through the EKS add-on
2024.05.27

I want an easy way to make CoreDNS Auto Scale according to load

Hello, this is のんピ (@non____97).

Have you ever wished you could easily configure CoreDNS to Auto Scale according to load? I have.

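If CoreDNS goes down, Pods can no longer resolve names, which affects both Pod-to-Pod communication and communication to destinations outside the cluster. A mechanism that keeps CoreDNS highly available is therefore needed.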

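With this update, EKS now natively supports Auto Scaling of CoreDNS Pods.

This means Auto Scaling can be configured through the EKS add-on settings. You no longer need to set up a separate tool such as Cluster Proportional Autoscaler.

I was curious how it behaves, so I tried it out.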

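Summary up front

  • Auto Scaling of CoreDNS Pods can now be configured through the EKS add-on
  • The replica count can be set from a minimum of 2 up to a maximum of 1,000
  • The cluster must meet the supported EKS cluster version, platform version, and CoreDNS EKS add-on version
  • The Auto Scaling conditions include the number of Nodes and the number of CPU cores on the Nodes
    • The specific conditions could not be confirmed during this test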

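Configuration

The configuration steps are described in the official AWS documentation.

The prerequisites are as follows.

  • You must use the CoreDNS EKS add-on
  • The EKS cluster must be running a supported cluster version and platform version
  • The EKS cluster must be running a supported version of the CoreDNS EKS add-on

The minimum supported cluster versions and the corresponding CoreDNS EKS add-on versions are as follows.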

Kubernetes version   Platform version   CoreDNS EKS add-on version
1.29.3               eks.7              v1.11.1-eksbuild.9
1.28.8               eks.13             v1.10.1-eksbuild.11
1.27.12              eks.17             v1.10.1-eksbuild.11
1.26.15              eks.18             v1.9.3-eksbuild.15
1.25.16              eks.19             v1.9.3-eksbuild.15

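This feature cannot be used on EKS clusters running a Kubernetes version earlier than 1.25, the lowest version listed in the table above.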
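The version requirement above can be checked mechanically. The following is a minimal sketch, not an official tool: the function names `min_addon_version` and `addon_version_ge` are hypothetical, the minimum versions are copied from the table in this article (so Kubernetes versions not listed there, such as 1.30, are simply reported as unknown), and the comparison assumes the `vX.Y.Z-eksbuild.N` naming seen in this article.

```shell
# Minimum CoreDNS EKS add-on version per Kubernetes minor version,
# copied from the table in this article (hypothetical helper).
min_addon_version() {
  case "$1" in
    1.29) echo "v1.11.1-eksbuild.9" ;;
    1.28|1.27) echo "v1.10.1-eksbuild.11" ;;
    1.26|1.25) echo "v1.9.3-eksbuild.15" ;;
    *) return 1 ;;  # not listed in the table
  esac
}

# Succeeds when version $1 >= version $2.
# Assumes both look like vX.Y.Z-eksbuild.N and compares the four numbers in order.
addon_version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    sub(/^v/, "", a); sub(/^v/, "", b)
    sub(/-eksbuild\./, ".", a); sub(/-eksbuild\./, ".", b)
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 4; i++) {
      if (x[i] + 0 > y[i] + 0) exit 0
      if (x[i] + 0 < y[i] + 0) exit 1
    }
    exit 0
  }'
}

# Example:
# addon_version_ge v1.11.1-eksbuild.8 "$(min_addon_version 1.29)" \
#   || echo "add-on version is too old for Auto Scaling"
```

The numeric comparison matters because a plain string comparison would rank `eksbuild.9` above `eksbuild.11`.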

What I want to know is what triggers Auto Scaling. Skimming the documentation, it appears to scale when you add Nodes or increase the number of CPU cores on the Nodes.

This CoreDNS autoscaler continuously monitors the cluster state, including the number of nodes and CPU cores. Based on that information, the controller will dynamically adapt the number of replicas of the CoreDNS deployment in an EKS cluster.
.
.
(omitted)
.
.
As you change the number of nodes and CPU cores of nodes in the cluster, Amazon EKS scales the number of replicas of the CoreDNS deployment.

Autoscaling CoreDNS - Amazon EKS
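The quoted documentation names two inputs: the number of nodes and the number of CPU cores. The sketch below only surfaces those two numbers for a cluster; it does not reproduce the autoscaler's formula, which the quoted text does not state. The helper name `summarize_nodes` is hypothetical, and it assumes each input line has the form "NAME CPU" with CPU given as a whole number of cores.

```shell
# Reads "NAME CPU" lines on stdin and prints "<node count> <total CPU cores>".
summarize_nodes() {
  awk 'NF >= 2 { nodes++; cores += $2 } END { printf "%d %d\n", nodes, cores }'
}

# Usage against a live cluster (requires kubectl access):
# kubectl get nodes --no-headers \
#   -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu \
#   | summarize_nodes
```

Recording these two numbers before and after changing the node group makes it easier to relate any change in CoreDNS replicas to the inputs the documentation mentions.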

So is scaling not based on the CPU load of the CoreDNS Pods themselves? I will check this hands-on as well.

Creating the EKS cluster

First, I create a throwaway EKS cluster with eksctl.

EKS started supporting Kubernetes 1.30 just the other day, so I create the cluster with 1.30.

$ eksctl create cluster \
  --name=non-97-eks \
  --version 1.30 \
  --nodes=2 \
  --node-volume-size=2 \
  --node-volume-type=gp3 \
  --node-ami-family=Bottlerocket \
  --instance-types=t4g.small \
  --spot \
  --managed \
  --region us-east-1
2024-05-26 09:29:19 [ℹ]  eksctl version 0.179.0-dev+b8f1ac4d7.2024-05-24T09:39:53Z
2024-05-26 09:29:19 [ℹ]  using region us-east-1
2024-05-26 09:29:20 [ℹ]  skipping us-east-1e from selection because it doesn't support the following instance type(s): t4g.small
2024-05-26 09:29:20 [ℹ]  setting availability zones to [us-east-1c us-east-1a]
2024-05-26 09:29:20 [ℹ]  subnets for us-east-1c - public:192.168.0.0/19 private:192.168.64.0/19
2024-05-26 09:29:20 [ℹ]  subnets for us-east-1a - public:192.168.32.0/19 private:192.168.96.0/19
2024-05-26 09:29:20 [ℹ]  nodegroup "ng-bf93e531" will use "" [Bottlerocket/1.30]
2024-05-26 09:29:20 [ℹ]  using Kubernetes version 1.30
2024-05-26 09:29:20 [ℹ]  creating EKS cluster "non-97-eks" in "us-east-1" region with managed nodes
2024-05-26 09:29:20 [ℹ]  will create 2 separate CloudFormation stacks for cluster itself and the initial managed nodegroup
2024-05-26 09:29:20 [ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-1 --cluster=non-97-eks'
2024-05-26 09:29:20 [ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "non-97-eks" in "us-east-1"
2024-05-26 09:29:20 [ℹ]  CloudWatch logging will not be enabled for cluster "non-97-eks" in "us-east-1"
2024-05-26 09:29:20 [ℹ]  you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-east-1 --cluster=non-97-eks'
2024-05-26 09:29:20 [ℹ]
2 sequential tasks: { create cluster control plane "non-97-eks",
    2 sequential sub-tasks: {
        wait for control plane to become ready,
        create managed nodegroup "ng-bf93e531",
    }
}
2024-05-26 09:29:20 [ℹ]  building cluster stack "eksctl-non-97-eks-cluster"
2024-05-26 09:29:22 [ℹ]  deploying stack "eksctl-non-97-eks-cluster"
2024-05-26 09:29:52 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:30:22 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:31:23 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:32:24 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:33:25 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:34:26 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:35:27 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:36:28 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:37:28 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:38:29 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:39:30 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:41:36 [ℹ]  building managed nodegroup stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:41:38 [ℹ]  deploying stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:41:38 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:42:09 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:42:50 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:43:26 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:43:26 [ℹ]  waiting for the control plane to become ready
2024-05-26 09:43:27 [✔]  saved kubeconfig as "/<home directory path>/.kube/config"
2024-05-26 09:43:27 [ℹ]  no tasks
2024-05-26 09:43:27 [✔]  all EKS cluster resources for "non-97-eks" have been created
2024-05-26 09:43:27 [✔]  created 0 nodegroup(s) in cluster "non-97-eks"
2024-05-26 09:43:27 [ℹ]  nodegroup "ng-bf93e531" has 2 node(s)
2024-05-26 09:43:27 [ℹ]  node "ip-192-168-4-116.ec2.internal" is ready
2024-05-26 09:43:27 [ℹ]  node "ip-192-168-47-147.ec2.internal" is ready
2024-05-26 09:43:27 [ℹ]  waiting for at least 2 node(s) to become ready in "ng-bf93e531"
2024-05-26 09:43:28 [ℹ]  nodegroup "ng-bf93e531" has 2 node(s)
2024-05-26 09:43:28 [ℹ]  node "ip-192-168-4-116.ec2.internal" is ready
2024-05-26 09:43:28 [ℹ]  node "ip-192-168-47-147.ec2.internal" is ready
2024-05-26 09:43:28 [✔]  created 1 managed nodegroup(s) in cluster "non-97-eks"
2024-05-26 09:43:35 [ℹ]  kubectl command should work with "/<home directory path>/.kube/config", try 'kubectl get nodes'
2024-05-26 09:43:35 [✔]  EKS cluster "non-97-eks" in "us-east-1" region is ready

Confirm that two CoreDNS Pods are running by default.

$ kubectl get pod -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE
aws-node-p6kvd             2/2     Running   0          11m
aws-node-wk7dv             2/2     Running   0          11m
coredns-586b798467-cntvg   1/1     Running   0          17m
coredns-586b798467-rf77f   1/1     Running   0          17m
kube-proxy-ml8jb           1/1     Running   0          11m
kube-proxy-nzqwd           1/1     Running   0          11m

$ kubectl get service -n kube-system
NAME       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.100.0.10   <none>        53/UDP,53/TCP,9153/TCP   19m

$ kubectl describe pod -n kube-system coredns-586b798467-cntvg
Name:                 coredns-586b798467-cntvg
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 ip-192-168-4-116.ec2.internal/192.168.4.116
Start Time:           Sun, 26 May 2024 09:42:49 +0900
Labels:               eks.amazonaws.com/component=coredns
                      k8s-app=kube-dns
                      pod-template-hash=586b798467
Annotations:          <none>
Status:               Running
IP:                   192.168.24.128
IPs:
  IP:           192.168.24.128
Controlled By:  ReplicaSet/coredns-586b798467
Containers:
  coredns:
    Container ID:  containerd://539adf8aa70da096a5512f0b9782cab2877ed9eee8087718004994504c3c922e
    Image:         602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8
    Image ID:      602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns@sha256:d21885a6632343ecd25d468b54681a0bd512055174bb17bc35a08cb38a965f12
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Sun, 26 May 2024 09:42:50 +0900
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zfvvp (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-zfvvp:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:        <nil>
    DownwardAPI:              true
QoS Class:                    Burstable
Node-Selectors:               <none>
Tolerations:                  CriticalAddonsOnly op=Exists
                              node-role.kubernetes.io/control-plane:NoSchedule
                              node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                              node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  13m (x34 over 18m)  default-scheduler  no nodes available to schedule pods
  Normal   Pulling           12m                 kubelet            Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8"
  Normal   Pulled            12m                 kubelet            Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8" in 869ms (869ms including waiting). Image size: 17282732 bytes.
  Normal   Created           12m                 kubelet            Created container coredns
  Normal   Started           12m                 kubelet            Started container coredns

Installing metrics-server

I want to check the CPU usage of Pods and Nodes, so I install metrics-server.

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

$ kubectl get deployment metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           74s

$ kubectl top pod -n kube-system
NAME                             CPU(cores)   MEMORY(bytes)
aws-node-p6kvd                   3m           41Mi
aws-node-wk7dv                   2m           41Mi
coredns-586b798467-cntvg         1m           12Mi
coredns-586b798467-rf77f         2m           12Mi
kube-proxy-ml8jb                 1m           11Mi
kube-proxy-nzqwd                 1m           13Mi
metrics-server-7ffbc6d68-49bvd   3m           17Mi

The Pod metrics are now available.

Adding the CoreDNS EKS add-on

Next, I add the CoreDNS EKS add-on.

By default, the CoreDNS EKS add-on is not configured.

Initial state of the EKS add-ons

$ aws eks describe-addon \
  --cluster-name non-97-eks \
  --addon-name coredns
An error occurred (ResourceNotFoundException) when calling the DescribeAddon operation: No addon: coredns found in cluster: non-97-eks

I start by adding the CoreDNS EKS add-on.

The procedure is described in the official AWS documentation.

This time I use the AWS CLI.

Check the version of CoreDNS currently installed on the EKS cluster.

$ kubectl describe deployment coredns \
  --namespace kube-system \
  | grep coredns: \
  | cut -d : -f 3
v1.11.1-eksbuild.8
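The `grep` and `cut` pipeline above depends on the layout of the `kubectl describe` output. As an alternative sketch, the image reference can be read directly with a JSONPath query and the tag taken as everything after the last colon. The helper name `image_tag` is hypothetical, and the sketch assumes the image reference ends in `:<tag>` (not a digest) and that CoreDNS is the first container in the Deployment, as in the output shown in this article.

```shell
# Prints the tag portion of an image reference (everything after the last ":").
image_tag() {
  printf '%s\n' "${1##*:}"
}

# Usage against a live cluster (requires kubectl access):
# image_tag "$(kubectl get deployment coredns -n kube-system \
#   -o jsonpath='{.spec.template.spec.containers[0].image}')"
```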

The version can also be confirmed from the Deployment manifest.

$ kubectl get deployment coredns -n kube-system -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2024-05-26T00:37:01Z"
  generation: 1
  labels:
    eks.amazonaws.com/component: coredns
    k8s-app: kube-dns
    kubernetes.io/name: CoreDNS
  name: coredns
  namespace: kube-system
  resourceVersion: "1681"
  uid: 5e7ba4c0-3a91-4f07-870d-56a513f5c1f0
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      eks.amazonaws.com/component: coredns
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        eks.amazonaws.com/component: coredns
        k8s-app: kube-dns
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: k8s-app
                  operator: In
                  values:
                  - kube-dns
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: coredns
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/coredns
          name: config-volume
          readOnly: true
      dnsPolicy: Default
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
      - key: CriticalAddonsOnly
        operator: Exists
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            k8s-app: kube-dns
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: coredns
        name: config-volume
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2024-05-26T00:42:50Z"
    lastUpdateTime: "2024-05-26T00:42:50Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-05-26T00:37:01Z"
    lastUpdateTime: "2024-05-26T00:42:51Z"
    message: ReplicaSet "coredns-586b798467" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2

Add the CoreDNS EKS add-on at the same version as the CoreDNS that is currently running.

$ aws eks create-addon \
  --cluster-name non-97-eks \
  --addon-name coredns \
  --addon-version v1.11.1-eksbuild.8
{
    "addon": {
        "addonName": "coredns",
        "clusterName": "non-97-eks",
        "status": "CREATING",
        "addonVersion": "v1.11.1-eksbuild.8",
        "health": {
            "issues": []
        },
        "addonArn": "arn:aws:eks:us-east-1:<AWS account ID>:addon/non-97-eks/coredns/a2c7d93c-d864-1896-c7ee-065e272910f9",
        "createdAt": "2024-05-26T10:17:50.822000+09:00",
        "modifiedAt": "2024-05-26T10:17:50.842000+09:00",
        "tags": {}
    }
}

$ aws eks describe-addon \
  --cluster-name non-97-eks \
  --addon-name coredns
{
    "addon": {
        "addonName": "coredns",
        "clusterName": "non-97-eks",
        "status": "ACTIVE",
        "addonVersion": "v1.11.1-eksbuild.8",
        "health": {
            "issues": []
        },
        "addonArn": "arn:aws:eks:us-east-1:<AWS account ID>:addon/non-97-eks/coredns/a2c7d93c-d864-1896-c7ee-065e272910f9",
        "createdAt": "2024-05-26T10:17:50.822000+09:00",
        "modifiedAt": "2024-05-26T10:18:05.190000+09:00",
        "tags": {}
    }
}

I also confirmed in the management console that the CoreDNS EKS add-on had been added.

Configuring CoreDNS Auto Scaling

Now I configure Auto Scaling for CoreDNS.

As a trial, I set it to Auto Scale between a minimum of 2 and a maximum of 10 replicas.

$ aws eks update-addon \
  --cluster-name non-97-eks \
  --addon-name coredns \
  --resolve-conflicts PRESERVE \
  --configuration-values '{"autoScaling":{"enabled":true}, "minReplicas": 2, "maxReplicas": 10}'

An error occurred (InvalidParameterException) when calling the UpdateAddon operation: ConfigurationValue provided in request is not supported: Json schema validation failed with error: [$.autoScaling: is not defined in the schema and the schema does not allow additional properties, $.minReplicas: is not defined in the schema and the schema does not allow additional properties, $.maxReplicas: is not defined in the schema and the schema does not allow additional properties]

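The API rejected the request, saying that no such parameters exist. Note that this command also placed minReplicas and maxReplicas at the top level instead of inside autoScaling, which is why they are reported separately in the error.

Checking the add-on configuration schema for v1.11.1-eksbuild.8, autoScaling is indeed absent.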

{
  "$ref": "#/definitions/Coredns",
  "$schema": "http://json-schema.org/draft-06/schema#",
  "definitions": {
    "Coredns": {
      "additionalProperties": false,
      "properties": {
        "affinity": {
          "default": {
            "affinity": {
              "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                  "nodeSelectorTerms": [
                    {
                      "matchExpressions": [
                        {
                          "key": "kubernetes.io/os",
                          "operator": "In",
                          "values": [
                            "linux"
                          ]
                        },
                        {
                          "key": "kubernetes.io/arch",
                          "operator": "In",
                          "values": [
                            "amd64",
                            "arm64"
                          ]
                        }
                      ]
                    }
                  ]
                }
              },
              "podAntiAffinity": {
                "preferredDuringSchedulingIgnoredDuringExecution": [
                  {
                    "podAffinityTerm": {
                      "labelSelector": {
                        "matchExpressions": [
                          {
                            "key": "k8s-app",
                            "operator": "In",
                            "values": [
                              "kube-dns"
                            ]
                          }
                        ]
                      },
                      "topologyKey": "kubernetes.io/hostname"
                    },
                    "weight": 100
                  }
                ]
              }
            }
          },
          "description": "Affinity of the coredns pods",
          "type": [
            "object",
            "null"
          ]
        },
        "computeType": {
          "type": "string"
        },
        "corefile": {
          "description": "Entire corefile contents to use with installation",
          "type": "string"
        },
        "nodeSelector": {
          "additionalProperties": {
            "type": "string"
          },
          "type": "object"
        },
        "podAnnotations": {
          "properties": {},
          "title": "The podAnnotations Schema",
          "type": "object"
        },
        "podDisruptionBudget": {
          "description": "podDisruptionBudget configurations",
          "enabled": {
            "default": true,
            "description": "the option to enable managed PDB",
            "type": "boolean"
          },
          "maxUnavailable": {
            "anyOf": [
              {
                "pattern": ".*%$",
                "type": "string"
              },
              {
                "type": "integer"
              }
            ],
            "default": 1,
            "description": "minAvailable value for managed PDB, can be either string or integer; if it's string, should end with %"
          },
          "minAvailable": {
            "anyOf": [
              {
                "pattern": ".*%$",
                "type": "string"
              },
              {
                "type": "integer"
              }
            ],
            "description": "maxUnavailable value for managed PDB, can be either string or integer; if it's string, should end with %"
          },
          "type": "object"
        },
        "podLabels": {
          "properties": {},
          "title": "The podLabels Schema",
          "type": "object"
        },
        "replicaCount": {
          "type": "integer"
        },
        "resources": {
          "$ref": "#/definitions/Resources"
        },
        "tolerations": {
          "default": [
            {
              "key": "CriticalAddonsOnly",
              "operator": "Exists"
            },
            {
              "effect": "NoSchedule",
              "key": "node-role.kubernetes.io/control-plane"
            }
          ],
          "description": "Tolerations of the coredns pod",
          "items": {
            "type": "object"
          },
          "type": "array"
        },
        "topologySpreadConstraints": {
          "description": "The coredns pod topology spread constraints",
          "type": "array"
        }
      },
      "title": "Coredns",
      "type": "object"
    },
    "Limits": {
      "additionalProperties": false,
      "properties": {
        "cpu": {
          "type": "string"
        },
        "memory": {
          "type": "string"
        }
      },
      "title": "Limits",
      "type": "object"
    },
    "Resources": {
      "additionalProperties": false,
      "properties": {
        "limits": {
          "$ref": "#/definitions/Limits"
        },
        "requests": {
          "$ref": "#/definitions/Limits"
        }
      },
      "title": "Resources",
      "type": "object"
    }
  }
}
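Before calling update-addon, it can help to check whether a given add-on version's schema defines autoScaling at all. The article does not show how the schema above was retrieved; one way, assumed here, is the AWS CLI `aws eks describe-addon-configuration` command, whose `configurationSchema` field holds the schema as a JSON string. The helper name `schema_has_autoscaling` is hypothetical, and it uses a plain text match rather than a real JSON parse to avoid depending on extra tools.

```shell
# Reads an add-on configuration schema (JSON text) on stdin and succeeds
# if it mentions an "autoScaling" key. This is a text heuristic, not a JSON parse.
schema_has_autoscaling() {
  grep -q '"autoScaling"[[:space:]]*:'
}

# Usage (requires AWS credentials; retrieval command is an assumption,
# not taken from the article):
# aws eks describe-addon-configuration \
#   --addon-name coredns \
#   --addon-version v1.11.1-eksbuild.8 \
#   --query configurationSchema --output text \
#   | schema_has_autoscaling \
#   && echo "autoScaling is in the schema" \
#   || echo "autoScaling is not in the schema"
```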

Let's update to the latest version, v1.11.1-eksbuild.9.

$ aws eks update-addon \
  --cluster-name non-97-eks \
  --addon-name coredns \
  --resolve-conflicts PRESERVE \
  --addon-version v1.11.1-eksbuild.9
{
    "update": {
        "id": "212a6290-6a61-357a-b2e6-0637660c6d6f",
        "status": "InProgress",
        "type": "AddonUpdate",
        "params": [
            {
                "type": "AddonVersion",
                "value": "v1.11.1-eksbuild.9"
            },
            {
                "type": "ResolveConflicts",
                "value": "PRESERVE"
            }
        ],
        "createdAt": "2024-05-26T10:31:49.570000+09:00",
        "errors": []
    }
}

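Checking the schema after the update, an autoScaling property has appeared. According to the schema, both minReplicas and maxReplicas accept values from 2 to 1,000, so CoreDNS can apparently scale up to 1,000 Pods.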

{
  "$ref": "#/definitions/Coredns",
  "$schema": "http://json-schema.org/draft-06/schema#",
  "definitions": {
    "Coredns": {
      "additionalProperties": false,
      "properties": {
        "affinity": {
          "default": {
            "affinity": {
              "nodeAffinity": {
                "requiredDuringSchedulingIgnoredDuringExecution": {
                  "nodeSelectorTerms": [
                    {
                      "matchExpressions": [
                        {
                          "key": "kubernetes.io/os",
                          "operator": "In",
                          "values": [
                            "linux"
                          ]
                        },
                        {
                          "key": "kubernetes.io/arch",
                          "operator": "In",
                          "values": [
                            "amd64",
                            "arm64"
                          ]
                        }
                      ]
                    }
                  ]
                }
              },
              "podAntiAffinity": {
                "preferredDuringSchedulingIgnoredDuringExecution": [
                  {
                    "podAffinityTerm": {
                      "labelSelector": {
                        "matchExpressions": [
                          {
                            "key": "k8s-app",
                            "operator": "In",
                            "values": [
                              "kube-dns"
                            ]
                          }
                        ]
                      },
                      "topologyKey": "kubernetes.io/hostname"
                    },
                    "weight": 100
                  }
                ]
              }
            }
          },
          "description": "Affinity of the coredns pods",
          "type": [
            "object",
            "null"
          ]
        },
        "autoScaling": {
          "additionalProperties": false,
          "description": "autoScaling configurations",
          "properties": {
            "enabled": {
              "default": false,
              "description": "the option to enable eks managed autoscaling for coredns",
              "type": "boolean"
            },
            "maxReplicas": {
              "description": "the max value that autoscaler can scale up the coredns replicas to",
              "maximum": 1000,
              "minimum": 2,
              "type": "integer"
            },
            "minReplicas": {
              "default": 2,
              "description": "the min value that autoscaler can scale down the coredns replicas to",
              "maximum": 1000,
              "minimum": 2,
              "type": "integer"
            }
          },
          "required": [
            "enabled"
          ],
          "type": "object"
        },
        "computeType": {
          "type": "string"
        },
        "corefile": {
          "description": "Entire corefile contents to use with installation",
          "type": "string"
        },
        "nodeSelector": {
          "additionalProperties": {
            "type": "string"
          },
          "type": "object"
        },
        "podAnnotations": {
          "properties": {},
          "title": "The podAnnotations Schema",
          "type": "object"
        },
        "podDisruptionBudget": {
          "description": "podDisruptionBudget configurations",
          "properties": {
            "enabled": {
              "default": true,
              "description": "the option to enable managed PDB",
              "type": "boolean"
            },
            "maxUnavailable": {
              "anyOf": [
                {
                  "pattern": ".*%$",
                  "type": "string"
                },
                {
                  "type": "integer"
                }
              ],
              "default": 1,
              "description": "maxUnavailable value for managed PDB, can be either string or integer; if it's string, should end with %"
            },
            "minAvailable": {
              "anyOf": [
                {
                  "pattern": ".*%$",
                  "type": "string"
                },
                {
                  "type": "integer"
                }
              ],
              "description": "minAvailable value for managed PDB, can be either string or integer; if it's string, should end with %"
            }
          },
          "type": "object"
        },
        "podLabels": {
          "properties": {},
          "title": "The podLabels Schema",
          "type": "object"
        },
        "replicaCount": {
          "type": "integer"
        },
        "resources": {
          "$ref": "#/definitions/Resources"
        },
        "tolerations": {
          "default": [
            {
              "key": "CriticalAddonsOnly",
              "operator": "Exists"
            },
            {
              "effect": "NoSchedule",
              "key": "node-role.kubernetes.io/control-plane"
            }
          ],
          "description": "Tolerations of the coredns pod",
          "items": {
            "type": "object"
          },
          "type": "array"
        },
        "topologySpreadConstraints": {
          "description": "The coredns pod topology spread constraints",
          "type": "array"
        }
      },
      "title": "Coredns",
      "type": "object"
    },
    "Limits": {
      "additionalProperties": false,
      "properties": {
        "cpu": {
          "type": "string"
        },
        "memory": {
          "type": "string"
        }
      },
      "title": "Limits",
      "type": "object"
    },
    "Resources": {
      "additionalProperties": false,
      "properties": {
        "limits": {
          "$ref": "#/definitions/Limits"
        },
        "requests": {
          "$ref": "#/definitions/Limits"
        }
      },
      "title": "Resources",
      "type": "object"
    }
  }
}
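In the schema above, minReplicas and maxReplicas are nested under autoScaling, and each is limited to the range 2 to 1,000. The following sketch builds the value for --configuration-values with that nesting and rejects out-of-range bounds before the API is called. The function name `build_autoscaling_config` is hypothetical, and the check that the minimum does not exceed the maximum is my own assumption; the schema itself only bounds each value individually.

```shell
# Prints the configuration JSON for CoreDNS Auto Scaling.
# $1 = minReplicas, $2 = maxReplicas.
# Fails when a bound is outside 2..1000 (from the schema) or
# when min > max (an extra sanity check assumed here).
build_autoscaling_config() {
  if [ "$1" -lt 2 ] || [ "$2" -gt 1000 ] || [ "$1" -gt "$2" ]; then
    echo "invalid bounds: min=$1 max=$2" >&2
    return 1
  fi
  printf '{"autoScaling":{"enabled":true,"minReplicas":%d,"maxReplicas":%d}}\n' "$1" "$2"
}

# Usage (requires AWS credentials):
# aws eks update-addon \
#   --cluster-name non-97-eks \
#   --addon-name coredns \
#   --resolve-conflicts PRESERVE \
#   --configuration-values "$(build_autoscaling_config 2 10)"
```

Generating the JSON in one place avoids the nesting mistake from the first attempt, where minReplicas and maxReplicas ended up outside autoScaling.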

Now let's try the configuration again, this time with minReplicas and maxReplicas nested inside autoScaling.

$ aws eks update-addon \
  --cluster-name non-97-eks \
  --addon-name coredns \
  --resolve-conflicts PRESERVE \
  --configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 2, "maxReplicas": 10}}'
{
    "update": {
        "id": "99e3ba65-bc09-3b3b-a7f1-5149884a3864",
        "status": "InProgress",
        "type": "AddonUpdate",
        "params": [
            {
                "type": "ResolveConflicts",
                "value": "PRESERVE"
            },
            {
                "type": "ConfigurationValues",
                "value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
            }
        ],
        "createdAt": "2024-05-26T10:39:38.591000+09:00",
        "errors": []
    }
}

$ aws eks describe-addon \
  --cluster-name non-97-eks \
  --addon-name coredns
{
    "addon": {
        "addonName": "coredns",
        "clusterName": "non-97-eks",
        "status": "ACTIVE",
        "addonVersion": "v1.11.1-eksbuild.9",
        "health": {
            "issues": []
        },
        "addonArn": "arn:aws:eks:us-east-1:<AWS_ACCOUNT_ID>:addon/non-97-eks/coredns/a2c7d93c-d864-1896-c7ee-065e272910f9",
        "createdAt": "2024-05-26T10:17:50.822000+09:00",
        "modifiedAt": "2024-05-26T10:39:41.867000+09:00",
        "tags": {},
        "configurationValues": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
    }
}

The setting was applied.
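
The JSON passed to `--configuration-values` is easy to mistype when written inline. Below is a minimal Python sketch that builds the same value. The helper name `build_autoscaling_values` is hypothetical, and the bounds of 2 and 1,000 are an assumption taken from this article's summary, not something enforced by a client library.

```python
import json


def build_autoscaling_values(min_replicas: int, max_replicas: int, enabled: bool = True) -> str:
    """Return a JSON string for `aws eks update-addon --configuration-values`."""
    # Assumption: minimum 2 and maximum 1,000, as stated in this article's summary.
    if not 2 <= min_replicas <= max_replicas <= 1000:
        raise ValueError("expected 2 <= min_replicas <= max_replicas <= 1000")
    return json.dumps(
        {
            "autoScaling": {
                "enabled": enabled,
                "minReplicas": min_replicas,
                "maxReplicas": max_replicas,
            }
        }
    )
```

Calling `build_autoscaling_values(2, 10)` yields a string equivalent to the one used in the command above, differing only in whitespace.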

Note that nothing in particular changed at this point.

$ kubectl get deployments -n kube-system coredns
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
coredns   2/2     2            2           66m

$ kubectl describe deployments coredns -n kube-system
Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Sun, 26 May 2024 09:37:01 +0900
Labels:                 eks.amazonaws.com/component=coredns
                        k8s-app=kube-dns
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               eks.amazonaws.com/component=coredns,k8s-app=kube-dns
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 25% max surge
Pod Template:
  Labels:           eks.amazonaws.com/component=coredns
                    k8s-app=kube-dns
  Service Account:  coredns
  Containers:
   coredns:
    Image:       602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:                       ConfigMap (a volume populated by a ConfigMap)
    Name:                       coredns
    Optional:                   false
  Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
  Priority Class Name:          system-cluster-critical
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  coredns-586b798467 (0/0 replicas created)
NewReplicaSet:   coredns-86d5d9b668 (2/2 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  11m   deployment-controller  Scaled up replica set coredns-86d5d9b668 to 1
  Normal  ScalingReplicaSet  11m   deployment-controller  Scaled down replica set coredns-586b798467 to 1 from 2
  Normal  ScalingReplicaSet  11m   deployment-controller  Scaled up replica set coredns-86d5d9b668 to 2 from 1
  Normal  ScalingReplicaSet  11m   deployment-controller  Scaled down replica set coredns-586b798467 to 0 from 1
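
The describe output is long when only the replica counts matter. Here is a minimal sketch, assuming `kubectl` is on the PATH and already pointed at the cluster, that pulls just those counts out of the Deployment's JSON. `replica_summary` and `fetch_coredns_deployment` are hypothetical helper names.

```python
import json
import subprocess


def replica_summary(deployment_json: str) -> dict:
    """Extract desired and available replica counts from Deployment JSON."""
    doc = json.loads(deployment_json)
    return {
        "desired": doc.get("spec", {}).get("replicas", 0),
        "available": doc.get("status", {}).get("availableReplicas", 0),
    }


def fetch_coredns_deployment() -> str:
    """Run kubectl and return the raw JSON of the coredns Deployment."""
    return subprocess.run(
        ["kubectl", "get", "deployment", "coredns", "-n", "kube-system", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
```

Against a live cluster, `replica_summary(fetch_coredns_deployment())` can be called before and after each change to see whether the replica count moved.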

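Confirming that Auto Scaling happens

Increasing the number of Nodes to 10

Let's confirm that Auto Scaling actually happens.

As mentioned earlier, the documentation reads as though CoreDNS scales according to the number of Nodes, so I increase the Node count to 10 and watch what happens.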

$ eksctl scale nodegroup \
  --cluster=non-97-eks \
  --name=ng-bf93e531 \
  --nodes=10 \
  --nodes-min=10 \
  --nodes-max=10
2024-05-26 10:47:06 [ℹ]  scaling nodegroup "ng-bf93e531" in cluster non-97-eks
2024-05-26 10:47:08 [ℹ]  initiated scaling of nodegroup
2024-05-26 10:47:08 [ℹ]  to see the status of the scaling run `eksctl get nodegroup --cluster non-97-eks --region us-east-1 --name ng-bf93e531`

Check the number of Nodes and Pods.

$ kubectl get node
NAME                             STATUS   ROLES    AGE   VERSION
ip-192-168-12-9.ec2.internal     Ready    <none>   15s   v1.30.0-eks-fff26e3
ip-192-168-14-243.ec2.internal   Ready    <none>   17s   v1.30.0-eks-fff26e3
ip-192-168-18-3.ec2.internal     Ready    <none>   17s   v1.30.0-eks-fff26e3
ip-192-168-23-86.ec2.internal    Ready    <none>   12s   v1.30.0-eks-fff26e3
ip-192-168-32-210.ec2.internal   Ready    <none>   17s   v1.30.0-eks-fff26e3
ip-192-168-38-149.ec2.internal   Ready    <none>   16s   v1.30.0-eks-fff26e3
ip-192-168-4-116.ec2.internal    Ready    <none>   65m   v1.30.0-eks-fff26e3
ip-192-168-42-185.ec2.internal   Ready    <none>   17s   v1.30.0-eks-fff26e3
ip-192-168-47-147.ec2.internal   Ready    <none>   65m   v1.30.0-eks-fff26e3
ip-192-168-61-1.ec2.internal     Ready    <none>   17s   v1.30.0-eks-fff26e3

$ kubectl get pod -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
aws-node-5hn7v                   2/2     Running   0          32s
aws-node-895dk                   2/2     Running   0          35s
aws-node-gwk9x                   2/2     Running   0          37s
aws-node-nlmqb                   2/2     Running   0          37s
aws-node-p6kvd                   2/2     Running   0          65m
aws-node-ppxpj                   2/2     Running   0          37s
aws-node-qbnsr                   2/2     Running   0          36s
aws-node-vn45f                   2/2     Running   0          37s
aws-node-w2cq2                   2/2     Running   0          36s
aws-node-wk7dv                   2/2     Running   0          65m
coredns-86d5d9b668-rhvrg         1/1     Running   0          16m
coredns-86d5d9b668-tqc5b         1/1     Running   0          16m
kube-proxy-4vxl5                 1/1     Running   0          37s
kube-proxy-8jcp5                 1/1     Running   0          37s
kube-proxy-8w7lw                 1/1     Running   0          37s
kube-proxy-9t5z2                 1/1     Running   0          36s
kube-proxy-gjnqx                 1/1     Running   0          35s
kube-proxy-gz6h6                 1/1     Running   0          36s
kube-proxy-ml8jb                 1/1     Running   0          65m
kube-proxy-nzqwd                 1/1     Running   0          65m
kube-proxy-xcnhd                 1/1     Running   0          37s
kube-proxy-z4bb5                 1/1     Running   0          32s
metrics-server-7ffbc6d68-49bvd   1/1     Running   0          44m

$ kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP               NODE                             NOMINATED NODE   READINESS GATES
coredns-86d5d9b668-rhvrg   1/1     Running   0          20m   192.168.3.147    ip-192-168-4-116.ec2.internal    <none>           <none>
coredns-86d5d9b668-tqc5b   1/1     Running   0          20m   192.168.38.204   ip-192-168-47-147.ec2.internal   <none>           <none>

$ kubectl describe deployments coredns -n kube-system
Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Sun, 26 May 2024 09:37:01 +0900
Labels:                 eks.amazonaws.com/component=coredns
                        k8s-app=kube-dns
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               eks.amazonaws.com/component=coredns,k8s-app=kube-dns
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 25% max surge
Pod Template:
  Labels:           eks.amazonaws.com/component=coredns
                    k8s-app=kube-dns
  Service Account:  coredns
  Containers:
   coredns:
    Image:       602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:                       ConfigMap (a volume populated by a ConfigMap)
    Name:                       coredns
    Optional:                   false
  Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
  Priority Class Name:          system-cluster-critical
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  coredns-586b798467 (0/0 replicas created)
NewReplicaSet:   coredns-86d5d9b668 (2/2 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  17m   deployment-controller  Scaled up replica set coredns-86d5d9b668 to 1
  Normal  ScalingReplicaSet  17m   deployment-controller  Scaled down replica set coredns-586b798467 to 1 from 2
  Normal  ScalingReplicaSet  17m   deployment-controller  Scaled up replica set coredns-86d5d9b668 to 2 from 1
  Normal  ScalingReplicaSet  17m   deployment-controller  Scaled down replica set coredns-586b798467 to 0 from 1

The Node count increased to 10, but the number of CoreDNS Pods did not.
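
Since the documentation names the number of nodes and CPU cores as inputs, it helps to have both totals at hand before changing the cluster further. Here is a minimal sketch, assuming the JSON comes from `kubectl get nodes -o json` and that `status.capacity.cpu` is either a whole number of cores or a millicore value. The function names are hypothetical.

```python
import json


def parse_cpu(quantity: str) -> float:
    """Convert a Kubernetes CPU quantity such as "2" or "1930m" to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def node_and_core_totals(nodes_json: str) -> tuple:
    """Return (node_count, total_cpu_cores) from `kubectl get nodes -o json` output."""
    items = json.loads(nodes_json)["items"]
    cores = sum(parse_cpu(node["status"]["capacity"]["cpu"]) for node in items)
    return len(items), cores
```

Saving the `kubectl get nodes -o json` output and passing its contents to `node_and_core_totals` returns the node count and the total core count as a pair.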

Adding 20 t4g.micro Nodes

Is the total number of CPU cores across the Nodes insufficient?

Let's try adding 20 t4g.micro Nodes.

$ eksctl create nodegroup \
  --cluster=non-97-eks \
  --node-type=t4g.nano \
  --nodes=20 \
  --nodes-min=20 \
  --nodes-max=20 \
  --node-volume-size=2 \
  --node-volume-type=gp3 \
  --node-ami-family=Bottlerocket \
  --spot \
  --managed
2024-05-26 11:04:21 [ℹ]  will use version 1.30 for new nodegroup(s) based on control plane version
2024-05-26 11:04:27 [ℹ]  nodegroup "ng-4a521135" will use "" [Bottlerocket/1.30]
2024-05-26 11:04:29 [ℹ]  1 existing nodegroup(s) (ng-bf93e531) will be excluded
2024-05-26 11:04:29 [ℹ]  1 nodegroup (ng-4a521135) was included (based on the include/exclude rules)
2024-05-26 11:04:29 [ℹ]  will create a CloudFormation stack for each of 1 managed nodegroups in cluster "non-97-eks"
2024-05-26 11:04:30 [ℹ]
2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "ng-4a521135" } }
}
2024-05-26 11:04:30 [ℹ]  checking cluster stack for missing resources
2024-05-26 11:04:31 [ℹ]  cluster stack has all required resources
2024-05-26 11:04:33 [ℹ]  building managed nodegroup stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:04:33 [ℹ]  deploying stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:04:34 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:05:04 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:05:56 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:07:09 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:08:52 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:10:05 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:11:47 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:13:41 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:14:13 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:15:41 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:17:32 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:17:33 [ℹ]  no tasks
2024-05-26 11:17:33 [✔]  created 0 nodegroup(s) in cluster "non-97-eks"
2024-05-26 11:17:34 [ℹ]  nodegroup "ng-4a521135" has 20 node(s)
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-14-142.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-14-214.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-19-169.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-20-134.ec2.internal" is ready
.
.
(omitted)
.
.
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-6-253.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-60-173.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-62-70.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-9-217.ec2.internal" is ready
2024-05-26 11:17:34 [✔]  created 1 managed nodegroup(s) in cluster "non-97-eks"
2024-05-26 11:17:36 [ℹ]  checking security group configuration for all nodegroups
2024-05-26 11:17:36 [ℹ]  all nodegroups have up-to-date cloudformation templates

Check the number of Nodes and Pods.

$ kubectl get node
NAME                             STATUS   ROLES    AGE     VERSION
ip-192-168-12-9.ec2.internal     Ready    <none>   29m     v1.30.0-eks-fff26e3
ip-192-168-14-142.ec2.internal   Ready    <none>   9m18s   v1.30.0-eks-fff26e3
ip-192-168-14-214.ec2.internal   Ready    <none>   61s     v1.30.0-eks-fff26e3
ip-192-168-14-243.ec2.internal   Ready    <none>   29m     v1.30.0-eks-fff26e3
ip-192-168-18-3.ec2.internal     Ready    <none>   29m     v1.30.0-eks-fff26e3
ip-192-168-19-169.ec2.internal   Ready    <none>   52s     v1.30.0-eks-fff26e3
ip-192-168-20-134.ec2.internal   Ready    <none>   11m     v1.30.0-eks-fff26e3
ip-192-168-20-208.ec2.internal   Ready    <none>   51s     v1.30.0-eks-fff26e3
ip-192-168-22-87.ec2.internal    Ready    <none>   9m9s    v1.30.0-eks-fff26e3
ip-192-168-23-86.ec2.internal    Ready    <none>   29m     v1.30.0-eks-fff26e3
ip-192-168-25-248.ec2.internal   Ready    <none>   48s     v1.30.0-eks-fff26e3
ip-192-168-30-238.ec2.internal   Ready    <none>   53s     v1.30.0-eks-fff26e3
ip-192-168-32-210.ec2.internal   Ready    <none>   29m     v1.30.0-eks-fff26e3
ip-192-168-34-209.ec2.internal   Ready    <none>   6m55s   v1.30.0-eks-fff26e3
ip-192-168-38-149.ec2.internal   Ready    <none>   29m     v1.30.0-eks-fff26e3
ip-192-168-4-116.ec2.internal    Ready    <none>   94m     v1.30.0-eks-fff26e3
ip-192-168-4-197.ec2.internal    Ready    <none>   11m     v1.30.0-eks-fff26e3
ip-192-168-4-37.ec2.internal     Ready    <none>   9m6s    v1.30.0-eks-fff26e3
ip-192-168-40-63.ec2.internal    Ready    <none>   5m25s   v1.30.0-eks-fff26e3
ip-192-168-42-155.ec2.internal   Ready    <none>   5m12s   v1.30.0-eks-fff26e3
ip-192-168-42-185.ec2.internal   Ready    <none>   29m     v1.30.0-eks-fff26e3
ip-192-168-42-213.ec2.internal   Ready    <none>   5m16s   v1.30.0-eks-fff26e3
ip-192-168-47-147.ec2.internal   Ready    <none>   94m     v1.30.0-eks-fff26e3
ip-192-168-47-240.ec2.internal   Ready    <none>   5m13s   v1.30.0-eks-fff26e3
ip-192-168-58-141.ec2.internal   Ready    <none>   5m13s   v1.30.0-eks-fff26e3
ip-192-168-6-253.ec2.internal    Ready    <none>   48s     v1.30.0-eks-fff26e3
ip-192-168-60-173.ec2.internal   Ready    <none>   5m13s   v1.30.0-eks-fff26e3
ip-192-168-61-1.ec2.internal     Ready    <none>   29m     v1.30.0-eks-fff26e3
ip-192-168-62-70.ec2.internal    Ready    <none>   5m12s   v1.30.0-eks-fff26e3
ip-192-168-9-217.ec2.internal    Ready    <none>   11m     v1.30.0-eks-fff26e3

$ kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP               NODE                             NOMINATED NODE   READINESS GATES
coredns-86d5d9b668-rhvrg   1/1     Running   0          45m   192.168.3.147    ip-192-168-4-116.ec2.internal    <none>           <none>
coredns-86d5d9b668-tqc5b   1/1     Running   0          45m   192.168.38.204   ip-192-168-47-147.ec2.internal   <none>           <none>

$ kubectl describe deployments coredns -n kube-system
Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Sun, 26 May 2024 09:37:01 +0900
Labels:                 eks.amazonaws.com/component=coredns
                        k8s-app=kube-dns
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               eks.amazonaws.com/component=coredns,k8s-app=kube-dns
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 25% max surge
Pod Template:
  Labels:           eks.amazonaws.com/component=coredns
                    k8s-app=kube-dns
  Service Account:  coredns
  Containers:
   coredns:
    Image:       602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:                       ConfigMap (a volume populated by a ConfigMap)
    Name:                       coredns
    Optional:                   false
  Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
  Priority Class Name:          system-cluster-critical
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  coredns-586b798467 (0/0 replicas created)
NewReplicaSet:   coredns-86d5d9b668 (2/2 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  47m   deployment-controller  Scaled up replica set coredns-86d5d9b668 to 1
  Normal  ScalingReplicaSet  47m   deployment-controller  Scaled down replica set coredns-586b798467 to 1 from 2
  Normal  ScalingReplicaSet  47m   deployment-controller  Scaled up replica set coredns-86d5d9b668 to 2 from 1
  Normal  ScalingReplicaSet  47m   deployment-controller  Scaled down replica set coredns-586b798467 to 0 from 1

The number of CoreDNS Pods is unchanged.

Setting the minimum to 3 and the maximum to 1,000

Next, I set the Auto Scaling minimum to 3 and the maximum to 1,000 and check the behavior.

$ aws eks update-addon \
  --cluster-name non-97-eks \
  --addon-name coredns \
  --resolve-conflicts PRESERVE \
  --configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 3, "maxReplicas": 1000}}'
{
    "update": {
        "id": "b0625d52-3a49-3d93-94d3-049ce5e98ff5",
        "status": "InProgress",
        "type": "AddonUpdate",
        "params": [
            {
                "type": "ResolveConflicts",
                "value": "PRESERVE"
            },
            {
                "type": "ConfigurationValues",
                "value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 3, \"maxReplicas\": 1000}}"
            }
        ],
        "createdAt": "2024-05-26T11:34:14.326000+09:00",
        "errors": []
    }
}

Check the number of Pods.

$ kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP               NODE                             NOMINATED NODE   READINESS GATES
coredns-86d5d9b668-65xcp   1/1     Running   0          29s   192.168.9.74     ip-192-168-23-86.ec2.internal    <none>           <none>
coredns-86d5d9b668-rhvrg   1/1     Running   0          64m   192.168.3.147    ip-192-168-4-116.ec2.internal    <none>           <none>
coredns-86d5d9b668-tqc5b   1/1     Running   0          64m   192.168.38.204   ip-192-168-47-147.ec2.internal   <none>           <none>

$ kubectl rollout history deployment/coredns -n kube-system
deployment.apps/coredns
REVISION  CHANGE-CAUSE
1         <none>
2         <none>

$ kubectl describe deployments coredns -n kube-system

Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Sun, 26 May 2024 09:37:01 +0900
Labels:                 eks.amazonaws.com/component=coredns
                        k8s-app=kube-dns
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               eks.amazonaws.com/component=coredns,k8s-app=kube-dns
Replicas:               3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 25% max surge
Pod Template:
  Labels:           eks.amazonaws.com/component=coredns
                    k8s-app=kube-dns
  Service Account:  coredns
  Containers:
   coredns:
    Image:       602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:                       ConfigMap (a volume populated by a ConfigMap)
    Name:                       coredns
    Optional:                   false
  Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
  Priority Class Name:          system-cluster-critical
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  coredns-586b798467 (0/0 replicas created)
NewReplicaSet:   coredns-86d5d9b668 (3/3 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  70s   deployment-controller  Scaled up replica set coredns-86d5d9b668 to 3 from 2
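
The event above shows the Deployment going from 2 to 3 replicas right after `minReplicas` was raised to 3. That is consistent with the configured bounds acting as a clamp on whatever count the autoscaler would otherwise choose. The sketch below illustrates only that clamping idea. How EKS computes the recommended count is not shown in this article, and `clamp_replicas` is a hypothetical name.

```python
def clamp_replicas(recommended: int, min_replicas: int, max_replicas: int) -> int:
    """Clamp a recommended replica count into [min_replicas, max_replicas]."""
    # Assumption: the recommended value is whatever the autoscaler computes internally.
    return max(min_replicas, min(recommended, max_replicas))
```

Under this reading, a recommendation of 2 with bounds of 3 and 1,000 comes out as 3, which matches the observed scale-up.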

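Putting load on CoreDNS with name resolution: dig edition

Next, I check the behavior when CoreDNS is put under load by name resolution.

I prepare a container that runs a shell script calling dig at a regular interval. The Dockerfile and script are as follows.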

Dockerfile

FROM alpine:latest
RUN apk add --no-cache bind-tools bash
COPY ./dns-resolution.sh /usr/local/bin/
CMD ["/bin/bash", "/usr/local/bin/dns-resolution.sh"]

./dns-resolution.sh

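#!/bin/bash

# -x traces each command; -u treats unset variables as errors.
set -xu

# Both values can be overridden through environment variables.
DOMAIN="${DOMAIN:-www.non-97.net}"
INTERVAL="${INTERVAL:-5}"

# Resolve the name forever, pausing INTERVAL seconds between queries.
while true; do
  dig "${DOMAIN}" +short

  sleep "${INTERVAL}"
done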

Build the container image and push it to a newly created ECR repository.

$ docker build -t dns-resolution .
[+] Building 2.8s (8/8) FINISHED
 => [internal] load build definition from Dockerfile                                                                                                                           0.0s
 => => transferring dockerfile: 191B                                                                                                                                           0.0s
 => [internal] load .dockerignore                                                                                                                                              0.0s
 => => transferring context: 2B                                                                                                                                                0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                                                                               2.7s
 => [1/3] FROM docker.io/library/alpine:latest@sha256:77726ef6b57ddf65bb551896826ec38bc3e53f75cdde31354fbffb4f25238ebd                                                         0.0s
 => CACHED [2/3] RUN apk add --no-cache bind-tools bash                                                                                                                        0.0s
 => [internal] load build context                                                                                                                                              0.0s
 => => transferring context: 39B                                                                                                                                               0.0s
 => [3/3] COPY ./dns-resolution.sh /usr/local/bin/                                                                                                                             0.0s
 => exporting to image                                                                                                                                                         0.1s
 => => exporting layers                                                                                                                                                        0.1s
 => => writing image sha256:2db16b22688d82039680917bfca20c70a811f8c18f49d51bd1c4caa7a5587873                                                                                   0.0s
 => => naming to docker.io/library/dns-resolution                                                                                                                              0.0s


$ set AWS_ACCOUNT_ID (aws sts get-caller-identity --output text --query Account)
$ set AWS_REGION (aws configure get region)
$ aws ecr get-login-password \
         | docker login \
           --username AWS \
           --password-stdin https://$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
Login Succeeded

$ aws ecr create-repository --repository-name dns-resolution
{
    "repository": {
        "repositoryArn": "arn:aws:ecr:us-east-1:<AWS_ACCOUNT_ID>:repository/dns-resolution",
        "registryId": "<AWS_ACCOUNT_ID>",
        "repositoryName": "dns-resolution",
        "repositoryUri": "<AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution",
        "createdAt": "2024-05-26T16:02:59.144000+09:00",
        "imageTagMutability": "MUTABLE",
        "imageScanningConfiguration": {
            "scanOnPush": false
        },
        "encryptionConfiguration": {
            "encryptionType": "AES256"
        }
    }
}

$ set dns_resolution_repo (aws ecr describe-repositories \
  --repository-names dns-resolution \
  --query 'repositories[0].repositoryUri' \
  --output text
)

$ docker tag dns-resolution:latest $dns_resolution_repo:latest

$ docker image ls | grep dns-resolution
<AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution            latest                 2db16b22688d   33 seconds ago   22.3MB
dns-resolution                                                             latest                 2db16b22688d   33 seconds ago   22.3MB

$ docker push $dns_resolution_repo:latest
The push refers to repository [<AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution]
b51587f43b2b: Pushed
5d35fe5c895f: Pushed
50171d1acbd5: Pushed
latest: digest: sha256:28fb6d58c37b6cb43ccb10cbe82d3565306c7dd9addb33678a3cb08bd7990101 size: 946

Create a manifest file that runs the prepared container.

./dns-resolution-deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-resolution
  namespace: default
spec:
  selector:
    matchLabels:
      app: dns-resolution
  replicas: 2
  template:
    metadata:
      labels:
        app: dns-resolution
    spec:
      containers:
        - name: dns-resolution
          image: <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
          env:
            - name: DOMAIN
              value: www.non-97.net
            - name: INTERVAL
              value: "3"

Deploy it.

$ kubectl apply -f ./dns-resolution-deployment.yml
deployment.apps/dns-resolution configured

$ kubectl get pod -n default
NAME                              READY   STATUS    RESTARTS         AGE
dns-resolution-7cd64b54cf-bkwr4   1/1     Running   0                45s
dns-resolution-7cd64b54cf-vsk79   1/1     Running   11 (6m15s ago)   32m

$ stern dns-resolution -n default
+ dns-resolution-7cd64b54cf-bkwr4 › dns-resolution
+ dns-resolution-7cd64b54cf-vsk79 › dns-resolution
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + DOMAIN=www.non-97.net
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + INTERVAL=3
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + true
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + dig www.non-97.net +short
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + sleep 3
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + true
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + dig www.non-97.net +short
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + sleep 3
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + true
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + dig www.non-97.net +short
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + sleep 3
.
.
(rest omitted)
.
.

$ kubectl top pod -n kube-system
NAME                             CPU(cores)   MEMORY(bytes)
aws-node-8c5v7                   3m           41Mi
aws-node-f9lrm                   3m           42Mi
aws-node-nnjln                   3m           42Mi
coredns-86d5d9b668-j76c6         2m           12Mi
coredns-86d5d9b668-jltmm         2m           12Mi
coredns-86d5d9b668-vqgqp         1m           12Mi
kube-proxy-62hmp                 1m           11Mi
kube-proxy-gncj8                 1m           11Mi
kube-proxy-wx7hr                 1m           12Mi
metrics-server-7ffbc6d68-4srp5   3m           18Mi

As expected, this is nowhere near enough to put load on the CoreDNS Pods.
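
A rough back-of-the-envelope calculation shows why the load is negligible. The sketch below assumes each Pod issues one query, waits `INTERVAL` seconds, and repeats. `per_query_s` (the time spent in `dig` itself) is a hypothetical parameter that defaults to zero, so the result is an upper bound.

```python
def approx_queries_per_second(pods: int, interval_s: float, per_query_s: float = 0.0) -> float:
    """Upper-bound estimate: each Pod issues one query per (interval + query time)."""
    cycle = interval_s + per_query_s
    if cycle <= 0:
        raise ValueError("interval_s + per_query_s must be positive")
    return pods / cycle
```

With the manifest above (2 replicas and an `INTERVAL` of 3), `approx_queries_per_second(2, 3)` is about 0.67 queries per second for the whole cluster.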

Let's run 100 of them at once. The manifest file is as follows.

./dns-resolution-deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-resolution
  namespace: default
spec:
  selector:
    matchLabels:
      app: dns-resolution
  replicas: 100
  template:
    metadata:
      labels:
        app: dns-resolution
    spec:
      containers:
        - name: dns-resolution
          image: <AWS_ACCOUNT_ID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
          env:
            - name: DOMAIN
              value: www.non-97.net
            - name: INTERVAL
              value: "0.000001"

The existing Nodes are unlikely to be enough, so I prepare 10 Nodes.

$ eksctl scale nodegroup \
  --cluster=non-97-eks \
  --name=ng-b0fc3fde \
  --nodes=10 \
  --nodes-min=10 \
  --nodes-max=10
2024-05-26 17:00:40 [ℹ]  scaling nodegroup "ng-b0fc3fde" in cluster non-97-eks
2024-05-26 17:00:43 [ℹ]  initiated scaling of nodegroup
2024-05-26 17:00:43 [ℹ]  to see the status of the scaling run `eksctl get nodegroup --cluster non-97-eks --region us-east-1 --name ng-b0fc3fde`

$ kubectl get deployment -n default
NAME             READY    UP-TO-DATE   AVAILABLE   AGE
dns-resolution   86/100   100          86          52m

$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME                       CPU(cores)   MEMORY(bytes)
coredns-86d5d9b668-j76c6   53m          15Mi
coredns-86d5d9b668-jltmm   55m          15Mi
coredns-86d5d9b668-vqgqp   53m          15Mi

86 Pods running dig have started, but at roughly 53m to 55m of CPU per CoreDNS Pod the load is still not enough.
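
To put those numbers in context, the usage can be compared with the 100m CPU request shown in the earlier `kubectl describe` output. Here is a minimal sketch that parses `kubectl top pod` text output, assuming the CPU column is always reported in millicores. `cpu_vs_request` is a hypothetical name.

```python
def cpu_vs_request(top_output: str, request_millicores: int = 100) -> float:
    """Return the average CPU usage of the listed Pods as a fraction of the CPU request."""
    usages = []
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        cpu = line.split()[1]  # second column, e.g. "53m"
        usages.append(int(cpu.rstrip("m")))
    return sum(usages) / len(usages) / request_millicores
```

Fed the output above, it returns roughly 0.54, meaning the Pods average a little over half of their CPU request.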

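Putting load on CoreDNS with name resolution: dnsperf edition

Instead of dig, I switch to running dnsperf, a DNS benchmarking tool.

Queries are sent to the cluster IP address of the CoreDNS Service.

As preparation, I change the CoreDNS Pod minimum to 2 and the maximum to 10 so that the conditions match the dig edition.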

$ aws eks update-addon \
  --cluster-name non-97-eks \
  --addon-name coredns \
  --resolve-conflicts PRESERVE \
  --configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 2, "maxReplicas": 10}}'
{
    "update": {
        "id": "eabb479c-769f-371e-869a-3abeeb23cfbc",
        "status": "InProgress",
        "type": "AddonUpdate",
        "params": [
            {
                "type": "ResolveConflicts",
                "value": "PRESERVE"
            },
            {
                "type": "ConfigurationValues",
                "value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
            }
        ],
        "createdAt": "2024-05-26T17:09:47.277000+09:00",
        "errors": []
    }
}
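
The same `update-addon` call can be wrapped so that the replica bounds become parameters and the script blocks until the add-on is active again. This is a minimal sketch: the function names are hypothetical, and it assumes an AWS CLI version that provides the `aws eks wait addon-active` waiter.

```shell
# Build the add-on configuration JSON for CoreDNS Auto Scaling.
# Arguments: <minReplicas> <maxReplicas>
coredns_autoscaling_config() {
  local min="$1" max="$2"
  if [ "$min" -gt "$max" ]; then
    echo "minReplicas ($min) must not exceed maxReplicas ($max)" >&2
    return 1
  fi
  printf '{"autoScaling":{"enabled":true,"minReplicas":%d,"maxReplicas":%d}}\n' "$min" "$max"
}

# Apply the configuration, then wait until the add-on reports ACTIVE again.
# Arguments: <cluster> <minReplicas> <maxReplicas>
set_coredns_autoscaling() {
  local cluster="$1" config
  config="$(coredns_autoscaling_config "$2" "$3")" || return 1
  aws eks update-addon \
    --cluster-name "$cluster" \
    --addon-name coredns \
    --resolve-conflicts PRESERVE \
    --configuration-values "$config" &&
    aws eks wait addon-active --cluster-name "$cluster" --addon-name coredns
}

# Example (requires AWS credentials):
#   set_coredns_autoscaling non-97-eks 2 10
```

Waiting for the add-on to become active avoids starting the load test while the update is still `InProgress`.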

The Dockerfile and shell script are as follows.

Dockerfile

FROM  --platform=linux/arm64 ubuntu:latest AS build
RUN apt-get update && apt-get install -y dnsperf

FROM  --platform=linux/arm64 ubuntu:latest
COPY --from=build /usr/bin/dnsperf /usr/bin/
COPY --from=build /usr/lib/aarch64-linux-gnu/libldns.so.3 /usr/lib/
COPY --from=build /usr/lib/aarch64-linux-gnu/libnghttp2.so.14 /usr/lib/
COPY ./dns-resolution.sh /usr/local/bin/
CMD ["/bin/bash", "/usr/local/bin/dns-resolution.sh"]

./dns-resolution.sh

#!/bin/bash

set -u

DOMAIN="${DOMAIN:-www.non-97.net}"
SERVER_ADDR="${SERVER_ADDR:-10.100.0.10}"
MAXRUNS="${MAXRUNS:-5}"
CLIENTS="${CLIENTS:-1}"

echo "${DOMAIN} A" >"query_random_list.txt"

while true; do
  dnsperf -d query_random_list.txt -l "${MAXRUNS}" -s "${SERVER_ADDR}" -c "${CLIENTS}"
done

After building the image and pushing it to ECR, I launch it with the following manifest file.

./dns-resolution-deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-resolution
  namespace: default
spec:
  selector:
    matchLabels:
      app: dns-resolution
  replicas: 2
  template:
    metadata:
      labels:
        app: dns-resolution
    spec:
      containers:
        - name: dns-resolution
          image: <AWS account ID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
          env:
            - name: DOMAIN  # dns-resolution.sh reads DOMAIN; a variable named FQDN is ignored by the script
              value: www.non-97.net
            - name: SERVER_ADDR
              value: 10.100.0.10
            - name: MAXRUNS
              value: "10"
            - name: CLIENTS
              value: "10"
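
For reference, the build-and-push step mentioned above might look like the following. This is a sketch under assumptions: the function names are hypothetical, the ECR repository `dns-resolution` is assumed to exist already, and Docker with buildx is assumed to be installed on the build machine.

```shell
# Compose the ECR image URI referenced by the manifest.
# Arguments: <account-id> <region> <repository> [tag]
ecr_image_uri() {
  local account_id="$1" region="$2" repo="$3" tag="${4:-latest}"
  printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$account_id" "$region" "$repo" "$tag"
}

# Log in to ECR, build the arm64 image from the current directory, and push it.
# Arguments: <account-id> <region> <repository>
build_and_push() {
  local account_id="$1" region="$2" repo="$3" image registry
  image="$(ecr_image_uri "$account_id" "$region" "$repo")"
  registry="${image%%/*}"   # host part of the URI, up to the first slash
  aws ecr get-login-password --region "$region" |
    docker login --username AWS --password-stdin "$registry" &&
    docker buildx build --platform linux/arm64 -t "$image" --push .
}

# Example (requires AWS credentials and Docker):
#   build_and_push 123456789012 us-east-1 dns-resolution
```

The arm64 platform matches the Dockerfile and the Graviton (t4g) Nodes used in this test.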

Check the state of the Pods.

$ kubectl apply -f ./dns-resolution-deployment.yml
deployment.apps/dns-resolution configured

$ kubectl get pod -n default
NAME                              READY   STATUS    RESTARTS   AGE
dns-resolution-55c8658865-mp7sj   1/1     Running   0          13s
dns-resolution-55c8658865-t7qth   1/1     Running   0          13s

$ stern dns-resolution -n default
+ dns-resolution-55c8658865-mp7sj › dns-resolution
+ dns-resolution-55c8658865-t7qth › dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution DNS Performance Testing Tool
dns-resolution-55c8658865-mp7sj dns-resolution Version 2.14.0
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Command line: dnsperf -d query_random_list.txt -l 10 -s 10.100.0.10 -c 10
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Sending queries (to 10.100.0.10:53)
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Started at: Sun May 26 23:05:24 2024
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Stopping after 10.000000 seconds
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Testing complete (time limit)
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution Statistics:
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution   Queries sent:         244371
dns-resolution-55c8658865-mp7sj dns-resolution   Queries completed:    244371 (100.00%)
dns-resolution-55c8658865-mp7sj dns-resolution   Queries lost:         0 (0.00%)
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution   Response codes:       SERVFAIL 244371 (100.00%)
dns-resolution-55c8658865-mp7sj dns-resolution   Average packet size:  request 32, response 32
dns-resolution-55c8658865-mp7sj dns-resolution   Run time (s):         10.008380
dns-resolution-55c8658865-mp7sj dns-resolution   Queries per second:   24416.638857
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution   Average Latency (s):  0.003310 (min 0.000047, max 0.059284)
dns-resolution-55c8658865-mp7sj dns-resolution   Latency StdDev (s):   0.002724
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution DNS Performance Testing Tool
dns-resolution-55c8658865-mp7sj dns-resolution Version 2.14.0
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Command line: dnsperf -d query_random_list.txt -l 10 -s 10.100.0.10 -c 10
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Sending queries (to 10.100.0.10:53)
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Started at: Sun May 26 23:05:34 2024
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Stopping after 10.000000 seconds
dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 5432
dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 5438
(output omitted)
dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 40701
dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 40428
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Testing complete (time limit)
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution Statistics:
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution   Queries sent:         411875
dns-resolution-55c8658865-mp7sj dns-resolution   Queries completed:    411852 (99.99%)
dns-resolution-55c8658865-mp7sj dns-resolution   Queries lost:         23 (0.01%)
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution   Response codes:       SERVFAIL 411852 (100.00%)
dns-resolution-55c8658865-mp7sj dns-resolution   Average packet size:  request 32, response 32
dns-resolution-55c8658865-mp7sj dns-resolution   Run time (s):         10.006569
dns-resolution-55c8658865-mp7sj dns-resolution   Queries per second:   41158.163203
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution   Average Latency (s):  0.001978 (min 0.000038, max 0.044901)
dns-resolution-55c8658865-mp7sj dns-resolution   Latency StdDev (s):   0.001715
dns-resolution-55c8658865-mp7sj dns-resolution

$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME                       CPU(cores)   MEMORY(bytes)
coredns-86d5d9b668-jc4cx   1413m        19Mi
coredns-86d5d9b668-nd792   1549m        20Mi

$ kubectl top node
NAME                            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-192-168-0-143.ec2.internal   28m          1%     363Mi           26%
ip-192-168-51-53.ec2.internal   1847m        95%    396Mi           29%
ip-192-168-8-43.ec2.internal    1873m        97%    388Mi           28%

The CPU load on the CoreDNS Pods has risen considerably.
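
Alongside CPU usage, the dnsperf statistics in the log above can be reduced to one line per run, which makes it easier to compare runs. The filter below is a minimal sketch with a hypothetical function name; it only looks at the `Queries lost`, `Response codes`, and `Queries per second` lines, and it reports only the first response code listed.

```shell
# Summarize dnsperf statistics blocks read from stdin, one line per run.
# Works with or without the "<pod> <container>" prefix that stern adds.
summarize_dnsperf() {
  awk '
    {
      # Pick up the value that follows the "lost:" and "codes:" labels.
      for (i = 1; i < NF; i++) {
        if ($i == "lost:")  lost  = $(i + 1)
        if ($i == "codes:") rcode = $(i + 1)
      }
    }
    /Queries per second:/ {
      # The QPS line comes last in each block, so emit the summary here.
      printf "qps=%.0f lost=%d rcode=%s\n", $NF, lost, rcode
      lost = 0; rcode = ""
    }
  '
}

# Example:
#   stern dns-resolution -n default | summarize_dnsperf
```

When several Pods log at once, their lines interleave, so it is safer to feed the filter one Pod at a time, for example with `kubectl logs`.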

However, even after leaving it running like this, the number of CoreDNS Pods did not change.

Increasing the Node count to 4 and running 10 dnsperf Pods made no difference either.

$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME                       CPU(cores)   MEMORY(bytes)
coredns-86d5d9b668-jc4cx   1411m        23Mi
coredns-86d5d9b668-nd792   1369m        17Mi

$ kubectl top node
NAME                            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-192-168-0-143.ec2.internal   1094m        56%    409Mi           29%
ip-192-168-33-71.ec2.internal   1202m        62%    362Mi           26%
ip-192-168-51-53.ec2.internal   1869m        96%    418Mi           30%
ip-192-168-8-43.ec2.internal    1913m        99%    423Mi           31%
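
Rather than checking by hand, the replica count can be sampled at a fixed interval while the load test runs. This is a minimal sketch; the function names and the default sample count and interval are arbitrary choices, not part of the EKS feature.

```shell
# Print "<desired> <ready>" replica counts of the CoreDNS Deployment.
coredns_replicas() {
  kubectl get deployment coredns -n kube-system \
    -o jsonpath='{.spec.replicas} {.status.readyReplicas}'
}

# Sample the replica counts <count> times, <interval> seconds apart.
# Arguments: [count=30] [interval=60]
watch_coredns_replicas() {
  local count="${1:-30}" interval="${2:-60}" i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s %s\n' "$(date -u +%H:%M:%S)" "$(coredns_replicas)"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"
    fi
  done
}

# Example: one sample per minute for 30 minutes
#   watch_coredns_replicas 30 60
```

For interactive use, `kubectl get deployment coredns -n kube-system -w` gives a similar view without a script.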

Retrying with Kubernetes 1.29

Creating the EKS cluster

Could the problem be that I am using Kubernetes 1.30?

The official AWS documentation indeed makes no mention of Kubernetes 1.30.

Minimum cluster version

Autoscaling CoreDNS - Amazon EKS

I retry with Kubernetes 1.29.

I recreate the EKS cluster.

$ eksctl create cluster \
  --name=non-97-eks-129 \
  --version 1.29 \
  --nodes=4 \
  --nodes-min=4 \
  --nodes-max=4 \
  --node-volume-size=0 \
  --node-volume-type=gp3 \
  --node-ami-family=Bottlerocket \
  --instance-types=t4g.small \
  --spot \
  --managed \
  --region us-east-1

$ aws eks describe-cluster \
  --name non-97-eks-129 \
  --query cluster.version
"1.29"

$ aws eks describe-cluster \
  --name non-97-eks-129 \
  --query cluster.platformVersion
"eks.7"

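The version check above can be automated against the minimum platform versions listed in the table earlier in this article. This is a sketch with hypothetical function names; the mapping only covers the versions in that table, so a version such as 1.30 is reported as unknown rather than guessed.

```shell
# Minimum platform version per Kubernetes minor version,
# taken from the table quoted earlier in this article.
min_platform_version() {
  case "$1" in
    1.29) echo eks.7 ;;
    1.28) echo eks.13 ;;
    1.27) echo eks.17 ;;
    1.26) echo eks.18 ;;
    1.25) echo eks.19 ;;
    *) return 1 ;;   # not listed in the table
  esac
}

# Succeed if <actual> (e.g. "eks.7") is at least <minimum>.
platform_version_at_least() {
  local actual="${1#eks.}" minimum="${2#eks.}"
  [ "$actual" -ge "$minimum" ]
}

# Example (requires AWS credentials):
#   version="$(aws eks describe-cluster --name non-97-eks-129 --query cluster.version --output text)"
#   platform="$(aws eks describe-cluster --name non-97-eks-129 --query cluster.platformVersion --output text)"
#   platform_version_at_least "$platform" "$(min_platform_version "$version")" && echo supported
```
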
Install metrics-server.

$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

$ kubectl get deployment metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           15m

Configuring CoreDNS Auto Scaling

Configure Auto Scaling for CoreDNS.

$ aws eks create-addon \
  --cluster-name non-97-eks-129 \
  --addon-name coredns \
  --addon-version v1.11.1-eksbuild.9
{
    "addon": {
        "addonName": "coredns",
        "clusterName": "non-97-eks-129",
        "status": "CREATING",
        "addonVersion": "v1.11.1-eksbuild.9",
        "health": {
            "issues": []
        },
        "addonArn": "arn:aws:eks:us-east-1:<AWSアカウントID>:addon/non-97-eks-129/coredns/c4c7dc2a-46ad-4837-0397-57b8ef4fa96a",
        "createdAt": "2024-05-27T13:35:00.355000+09:00",
        "modifiedAt": "2024-05-27T13:35:00.389000+09:00",
        "tags": {}
    }
}

$ aws eks update-addon \
  --cluster-name non-97-eks-129 \
  --addon-name coredns \
  --resolve-conflicts PRESERVE \
  --configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 2, "maxReplicas": 10}}'
{
    "update": {
        "id": "8b3ac3f8-819a-3f57-9fd2-97389feed79c",
        "status": "InProgress",
        "type": "AddonUpdate",
        "params": [
            {
                "type": "ResolveConflicts",
                "value": "PRESERVE"
            },
            {
                "type": "ConfigurationValues",
                "value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
            }
        ],
        "createdAt": "2024-05-27T13:35:59.154000+09:00",
        "errors": []
    }
}

$ aws eks describe-addon \
  --cluster-name non-97-eks-129 \
  --addon-name coredns
{
    "addon": {
        "addonName": "coredns",
        "clusterName": "non-97-eks-129",
        "status": "ACTIVE",
        "addonVersion": "v1.11.1-eksbuild.9",
        "health": {
            "issues": []
        },
        "addonArn": "arn:aws:eks:us-east-1:<AWSアカウントID>:addon/non-97-eks-129/coredns/c4c7dc2a-46ad-4837-0397-57b8ef4fa96a",
        "createdAt": "2024-05-27T13:35:00.355000+09:00",
        "modifiedAt": "2024-05-27T13:36:02.438000+09:00",
        "tags": {},
        "configurationValues": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
    }
}

Check the number of CoreDNS Pods before making any changes.

$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME                     CPU(cores)   MEMORY(bytes)
coredns-bf47b49b-kzxdh   1m           11Mi
coredns-bf47b49b-qf9jm   1m           11Mi

Adding 10 t4g.micro Nodes

Add 10 t4g.micro Nodes.

$ eksctl create nodegroup \
  --cluster=non-97-eks-129 \
  --node-type=t4g.micro \
  --nodes=10 \
  --nodes-min=10 \
  --nodes-max=10 \
  --node-volume-size=2 \
  --node-volume-type=gp3 \
  --node-ami-family=Bottlerocket \
  --spot \
  --managed

$ kubectl top node
NAME                             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-192-168-1-0.ec2.internal      22m          1%     277Mi           54%
ip-192-168-11-78.ec2.internal    17m          0%     272Mi           53%
ip-192-168-12-236.ec2.internal   22m          1%     454Mi           33%
ip-192-168-14-155.ec2.internal   17m          0%     273Mi           53%
ip-192-168-24-76.ec2.internal    33m          1%     270Mi           52%
ip-192-168-3-239.ec2.internal    19m          0%     282Mi           55%
ip-192-168-38-149.ec2.internal   29m          1%     266Mi           51%
ip-192-168-41-178.ec2.internal   17m          0%     262Mi           51%
ip-192-168-48-176.ec2.internal   16m          0%     363Mi           26%
ip-192-168-48-185.ec2.internal   22m          1%     407Mi           29%
ip-192-168-49-7.ec2.internal     22m          1%     274Mi           53%
ip-192-168-57-3.ec2.internal     24m          1%     276Mi           53%
ip-192-168-59-122.ec2.internal   19m          0%     420Mi           30%
ip-192-168-61-175.ec2.internal   20m          1%     283Mi           55%

$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME                     CPU(cores)   MEMORY(bytes)
coredns-bf47b49b-f7rzg   1m           12Mi
coredns-bf47b49b-kzxdh   1m           12Mi

The number of CoreDNS Pods remained at 2.

I left it for a little under an hour, but it stayed at 2.
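
Since the documentation says the autoscaler looks at the number of Nodes and CPU cores, it helps to record both figures for the cluster at the time of each observation. The following is a minimal sketch with a hypothetical function name; it assumes each Node reports its CPU capacity as a whole number of cores.

```shell
# Print the number of Nodes and the total CPU cores across the cluster.
cluster_nodes_and_cores() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.status.capacity.cpu}{"\n"}{end}' |
    awk 'NF { nodes++; cores += $1 } END { printf "nodes=%d cores=%d\n", nodes, cores }'
}

# Example:
#   cluster_nodes_and_cores
```
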

Putting load on CoreDNS with name resolution

Use dnsperf to put load on CoreDNS.

The manifest file I used is as follows.

./dns-resolution-deployment.yml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-resolution
  namespace: default
spec:
  selector:
    matchLabels:
      app: dns-resolution
  replicas: 5
  template:
    metadata:
      labels:
        app: dns-resolution
    spec:
      containers:
        - name: dns-resolution
          image: <AWS account ID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
          env:
            - name: DOMAIN  # dns-resolution.sh reads DOMAIN; a variable named FQDN is ignored by the script
              value: www.non-97.net
            - name: SERVER_ADDR
              value: 10.100.0.10
            - name: MAXRUNS
              value: "10"
            - name: CLIENTS
              value: "15"

Deploy it and check the number of CoreDNS Pods.

$ kubectl apply -f ./dns-resolution-deployment.yml
deployment.apps/dns-resolution configured

$ kubectl get pod -n default
NAME                              READY   STATUS    RESTARTS   AGE
dns-resolution-75985fd469-99mqj   1/1     Running   0          75s
dns-resolution-75985fd469-k5g4v   1/1     Running   0          75s
dns-resolution-75985fd469-v7h8w   1/1     Running   0          75s
dns-resolution-75985fd469-z4hsv   1/1     Running   0          75s
dns-resolution-75985fd469-zvv9c   1/1     Running   0          75s

$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME                     CPU(cores)   MEMORY(bytes)
coredns-bf47b49b-f7rzg   1530m        22Mi
coredns-bf47b49b-w9r87   1511m        19Mi

$ kubectl top node
NAME                             CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-192-168-1-0.ec2.internal      478m         24%    301Mi           58%
ip-192-168-11-78.ec2.internal    16m          0%     269Mi           52%
ip-192-168-12-236.ec2.internal   1854m        96%    495Mi           36%
ip-192-168-14-155.ec2.internal   17m          0%     261Mi           51%
ip-192-168-19-132.ec2.internal   487m         25%    412Mi           30%
ip-192-168-24-76.ec2.internal    26m          1%     265Mi           51%
ip-192-168-3-239.ec2.internal    236m         12%    275Mi           53%
ip-192-168-38-149.ec2.internal   20m          1%     262Mi           51%
ip-192-168-41-178.ec2.internal   20m          1%     266Mi           52%
ip-192-168-48-176.ec2.internal   543m         28%    415Mi           30%
ip-192-168-48-185.ec2.internal   1849m        95%    452Mi           33%
ip-192-168-49-7.ec2.internal     23m          1%     269Mi           52%
ip-192-168-57-3.ec2.internal     15m          0%     273Mi           53%
ip-192-168-61-175.ec2.internal   24m          1%     264Mi           51%

Still 2. I waited several minutes, but nothing changed.

In the end, I could not determine the specific conditions under which the CoreDNS Pods auto scale.
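
One possible point of comparison, offered as an assumption rather than an explanation: the open-source cluster-proportional-autoscaler in linear mode derives the replica count from the total number of cores and Nodes. Whether the EKS-managed autoscaler uses the same formula is not documented. The sketch below implements that linear formula with a hypothetical function name; the ratios `coresPerReplica=256` and `nodesPerReplica=16` used in the example are the sample values from the Kubernetes DNS autoscaling documentation, not EKS settings.

```shell
# Linear mode of cluster-proportional-autoscaler:
#   replicas = max(ceil(cores / coresPerReplica), ceil(nodes / nodesPerReplica))
# clamped to [min, max].
# Arguments: <cores> <nodes> <coresPerReplica> <nodesPerReplica> <min> <max>
cpa_linear_replicas() {
  local cores="$1" nodes="$2" cores_per_replica="$3" nodes_per_replica="$4"
  local min="$5" max="$6" by_cores by_nodes replicas
  # Integer ceiling division: (a + b - 1) / b
  by_cores=$(( (cores + cores_per_replica - 1) / cores_per_replica ))
  by_nodes=$(( (nodes + nodes_per_replica - 1) / nodes_per_replica ))
  if [ "$by_cores" -gt "$by_nodes" ]; then
    replicas="$by_cores"
  else
    replicas="$by_nodes"
  fi
  if [ "$replicas" -lt "$min" ]; then replicas="$min"; fi
  if [ "$replicas" -gt "$max" ]; then replicas="$max"; fi
  echo "$replicas"
}

# Example with the assumed ratios and the bounds configured in this article:
#   cpa_linear_replicas 28 14 256 16 2 10
```

With the 14 Nodes in this test (28 cores, assuming 2 vCPUs per t4g instance), the call in the example yields the minimum of 2. Under these assumed ratios, a cluster of this size would not scale out regardless of the DNS load.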

Incidentally, kubectl rollout history showed no particular activity either.

$ kubectl rollout history deployment/coredns -n kube-system
deployment.apps/coredns
REVISION  CHANGE-CAUSE
1         <none>
2         <none>

$ kubectl rollout history deployment/coredns -n kube-system --revision 1
deployment.apps/coredns with revision #1
Pod Template:
  Labels:	eks.amazonaws.com/component=coredns
	k8s-app=kube-dns
	pod-template-hash=54d6f577c6
  Service Account:	coredns
  Containers:
   coredns:
    Image:	602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.4
    Ports:	53/UDP, 53/TCP, 9153/TCP
    Host Ports:	0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:	170Mi
    Requests:
      cpu:	100m
      memory:	70Mi
    Liveness:	http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:	http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:	<none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:	ConfigMap (a volume populated by a ConfigMap)
    Name:	coredns
    Optional:	false
  Priority Class Name:	system-cluster-critical

$ kubectl rollout history deployment/coredns -n kube-system --revision 2
deployment.apps/coredns with revision #2
Pod Template:
  Labels:	eks.amazonaws.com/component=coredns
	k8s-app=kube-dns
	pod-template-hash=bf47b49b
  Service Account:	coredns
  Containers:
   coredns:
    Image:	602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
    Ports:	53/UDP, 53/TCP, 9153/TCP
    Host Ports:	0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:	170Mi
    Requests:
      cpu:	100m
      memory:	70Mi
    Liveness:	http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:	http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:	<none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:	ConfigMap (a volume populated by a ConfigMap)
    Name:	coredns
    Optional:	false
  Topology Spread Constraints:	topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
  Priority Class Name:	system-cluster-critical
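
Note that `kubectl rollout history` only records changes to the Pod template, so a replica-count change made by an autoscaler would not appear as a new revision. Revision 2 above corresponds to the add-on image moving from `v1.11.1-eksbuild.4` to `v1.11.1-eksbuild.9`. Scaling activity is more likely to show up as Deployment events, which could be listed as in this minimal sketch (the function name is hypothetical, and events are only retained for a limited time):

```shell
# List scaling events recorded for the CoreDNS Deployment.
coredns_scaling_events() {
  kubectl get events -n kube-system \
    --field-selector involvedObject.kind=Deployment,involvedObject.name=coredns,reason=ScalingReplicaSet \
    -o custom-columns=TIME:.lastTimestamp,MESSAGE:.message
}

# Example:
#   coredns_scaling_events
```
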

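CoreDNS Auto Scaling can now be configured through the EKS add-on

This post introduced the update in which Amazon EKS added native support for Auto Scaling of CoreDNS Pods.

It is great that CoreDNS Auto Scaling can now be configured through the EKS add-on.

That said, it bothers me that I could not identify the specific conditions that trigger scaling of the CoreDNS Pods. If I find time, I will test again and update this post.

I hope this article helps someone.

That's all from のんピ (@non____97) of the Consulting Department, AWS Business Division!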