[Update] Amazon EKS now natively supports autoscaling of CoreDNS Pods
I want an easy way to make CoreDNS scale automatically with load
Hello, this is のんピ (@non____97).
Have you ever wished there were an easy way to make CoreDNS scale automatically with load? I have.
If CoreDNS goes down, Pods can no longer resolve names, which affects Pod-to-Pod communication as well as traffic to destinations outside the cluster. You therefore need a mechanism that keeps CoreDNS highly available.
With this update, EKS now natively supports autoscaling of CoreDNS Pods.
You can now configure this directly in the EKS add-on settings, so there is no longer any need to separately set up a tool such as Cluster Proportional Autoscaler.
I was curious how it behaves, so I tried it out.
Summary up front
- CoreDNS Pod autoscaling can now be configured through the EKS add-on
- The minimum replica count is 2 and the maximum is 1,000
- Requires a supported EKS cluster version, platform version, and CoreDNS EKS add-on version
- The scaling conditions include the number of Nodes and the number of Node CPU cores
- I could not pin down the exact conditions during testing
Configuration
The setup procedure is documented in the following AWS documentation.
The prerequisites are as follows:
- You must use the CoreDNS EKS add-on
- The EKS cluster must be running a supported cluster version and platform version
- The cluster must be running a CoreDNS EKS add-on version supported by that EKS cluster
The minimum supported cluster versions and their corresponding CoreDNS EKS add-on versions are:
| Kubernetes version | Platform version | CoreDNS EKS add-on version |
|---|---|---|
| 1.29.3 | eks.7 | v1.11.1-eksbuild.9 |
| 1.28.8 | eks.13 | v1.10.1-eksbuild.11 |
| 1.27.12 | eks.17 | v1.10.1-eksbuild.11 |
| 1.26.15 | eks.18 | v1.9.3-eksbuild.15 |
| 1.25.16 | eks.19 | v1.9.3-eksbuild.15 |
EKS clusters running Kubernetes versions earlier than 1.25 cannot use this feature.
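For illustration, the prerequisite check can be expressed in code. This is a hypothetical helper (the function and its names are my own, not an AWS API); the data comes from the table above, and the "1.30 and later support it from launch" shortcut is based on the fact that the hands-on below uses a 1.30 cluster:

```python
# Minimum supported versions per the table above:
# Kubernetes minor -> (cluster version, platform version, CoreDNS add-on version)
MINIMUMS = {
    "1.29": ("1.29.3", "eks.7", "v1.11.1-eksbuild.9"),
    "1.28": ("1.28.8", "eks.13", "v1.10.1-eksbuild.11"),
    "1.27": ("1.27.12", "eks.17", "v1.10.1-eksbuild.11"),
    "1.26": ("1.26.15", "eks.18", "v1.9.3-eksbuild.15"),
    "1.25": ("1.25.16", "eks.19", "v1.9.3-eksbuild.15"),
}

def _semver(v):
    # "1.29.3" -> (1, 29, 3)
    return tuple(int(p) for p in v.split("."))

def _addon(v):
    # "v1.11.1-eksbuild.9" -> (1, 11, 1, 9)
    base, build = v.lstrip("v").split("-eksbuild.")
    return _semver(base) + (int(build),)

def supports_autoscaling(minor, cluster_version, platform_version, addon_version):
    if minor not in MINIMUMS:
        # Assumption: 1.30+ supports the feature from launch;
        # anything older than 1.25 does not support it at all.
        return _semver(minor) >= (1, 30)
    min_cluster, min_platform, min_addon = MINIMUMS[minor]
    return (
        _semver(cluster_version) >= _semver(min_cluster)
        and int(platform_version.split(".")[1]) >= int(min_platform.split(".")[1])
        and _addon(addon_version) >= _addon(min_addon)
    )
```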
What I really wanted to know is what triggers the autoscaling. Skimming the documentation, it appears to scale when you add Nodes or increase the number of Node CPU cores:
> This CoreDNS autoscaler continuously monitors the cluster state, including the number of nodes and CPU cores. Based on that information, the controller will dynamically adapt the number of replicas of the CoreDNS deployment in an EKS cluster.
>
> (snip)
>
> As you change the number of nodes and CPU cores of nodes in the cluster, Amazon EKS scales the number of replicas of the CoreDNS deployment.
So it is not based on the CPU load of the CoreDNS Pods themselves? I will verify this hands-on as well.
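The inputs the documentation describes (node count and CPU cores) are the same ones used by the linear mode of Cluster Proportional Autoscaler, the tool this feature replaces. EKS does not publish its exact formula, so purely as a point of comparison, here is a sketch of the CPA-style linear calculation; the `nodes_per_replica` and `cores_per_replica` defaults are illustrative values, not EKS's actual parameters:

```python
import math

def linear_replicas(nodes, cores, nodes_per_replica=16, cores_per_replica=256,
                    min_replicas=2, max_replicas=10):
    """Cluster-proportional linear scaling: one replica per N nodes or
    per M cores, whichever demands more, clamped to [min, max]."""
    by_nodes = math.ceil(nodes / nodes_per_replica)
    by_cores = math.ceil(cores / cores_per_replica)
    return max(min_replicas, min(max_replicas, max(by_nodes, by_cores)))
```

With defaults like these, the small clusters in the test below (10 nodes, 2 vCPUs each) would never exceed the minimum of 2 replicas, which is consistent with what I observe later.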
Creating the EKS cluster
First, create a throwaway EKS cluster with eksctl.
EKS added support for Kubernetes 1.30 just the other day, so I will create the cluster on 1.30.
```
$ eksctl create cluster \
  --name=non-97-eks \
  --version 1.30 \
  --nodes=2 \
  --node-volume-size=2 \
  --node-volume-type=gp3 \
  --node-ami-family=Bottlerocket \
  --instance-types=t4g.small \
  --spot \
  --managed \
  --region us-east-1
2024-05-26 09:29:19 [ℹ]  eksctl version 0.179.0-dev+b8f1ac4d7.2024-05-24T09:39:53Z
2024-05-26 09:29:19 [ℹ]  using region us-east-1
2024-05-26 09:29:20 [ℹ]  skipping us-east-1e from selection because it doesn't support the following instance type(s): t4g.small
2024-05-26 09:29:20 [ℹ]  setting availability zones to [us-east-1c us-east-1a]
2024-05-26 09:29:20 [ℹ]  subnets for us-east-1c - public:192.168.0.0/19 private:192.168.64.0/19
2024-05-26 09:29:20 [ℹ]  subnets for us-east-1a - public:192.168.32.0/19 private:192.168.96.0/19
2024-05-26 09:29:20 [ℹ]  nodegroup "ng-bf93e531" will use "" [Bottlerocket/1.30]
2024-05-26 09:29:20 [ℹ]  using Kubernetes version 1.30
2024-05-26 09:29:20 [ℹ]  creating EKS cluster "non-97-eks" in "us-east-1" region with managed nodes
2024-05-26 09:29:20 [ℹ]  will create 2 separate CloudFormation stacks for cluster itself and the initial managed nodegroup
2024-05-26 09:29:20 [ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-1 --cluster=non-97-eks'
2024-05-26 09:29:20 [ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "non-97-eks" in "us-east-1"
2024-05-26 09:29:20 [ℹ]  CloudWatch logging will not be enabled for cluster "non-97-eks" in "us-east-1"
2024-05-26 09:29:20 [ℹ]  you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-east-1 --cluster=non-97-eks'
2024-05-26 09:29:20 [ℹ]  2 sequential tasks: { create cluster control plane "non-97-eks", 2 sequential sub-tasks: { wait for control plane to become ready, create managed nodegroup "ng-bf93e531", } }
2024-05-26 09:29:20 [ℹ]  building cluster stack "eksctl-non-97-eks-cluster"
2024-05-26 09:29:22 [ℹ]  deploying stack "eksctl-non-97-eks-cluster"
2024-05-26 09:29:52 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:30:22 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:31:23 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:32:24 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:33:25 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:34:26 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:35:27 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:36:28 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:37:28 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:38:29 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:39:30 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:41:36 [ℹ]  building managed nodegroup stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:41:38 [ℹ]  deploying stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:41:38 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:42:09 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:42:50 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:43:26 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:43:26 [ℹ]  waiting for the control plane to become ready
2024-05-26 09:43:27 [✔]  saved kubeconfig as "/<home directory path>/.kube/config"
2024-05-26 09:43:27 [ℹ]  no tasks
2024-05-26 09:43:27 [✔]  all EKS cluster resources for "non-97-eks" have been created
2024-05-26 09:43:27 [✔]  created 0 nodegroup(s) in cluster "non-97-eks"
2024-05-26 09:43:27 [ℹ]  nodegroup "ng-bf93e531" has 2 node(s)
2024-05-26 09:43:27 [ℹ]  node "ip-192-168-4-116.ec2.internal" is ready
2024-05-26 09:43:27 [ℹ]  node "ip-192-168-47-147.ec2.internal" is ready
2024-05-26 09:43:27 [ℹ]  waiting for at least 2 node(s) to become ready in "ng-bf93e531"
2024-05-26 09:43:28 [ℹ]  nodegroup "ng-bf93e531" has 2 node(s)
2024-05-26 09:43:28 [ℹ]  node "ip-192-168-4-116.ec2.internal" is ready
2024-05-26 09:43:28 [ℹ]  node "ip-192-168-47-147.ec2.internal" is ready
2024-05-26 09:43:28 [✔]  created 1 managed nodegroup(s) in cluster "non-97-eks"
2024-05-26 09:43:35 [ℹ]  kubectl command should work with "/<home directory path>/.kube/config", try 'kubectl get nodes'
2024-05-26 09:43:35 [✔]  EKS cluster "non-97-eks" in "us-east-1" region is ready
```
Confirm that two CoreDNS Pods are running by default.
```
$ kubectl get pod -n kube-system
NAME                       READY   STATUS    RESTARTS   AGE
aws-node-p6kvd             2/2     Running   0          11m
aws-node-wk7dv             2/2     Running   0          11m
coredns-586b798467-cntvg   1/1     Running   0          17m
coredns-586b798467-rf77f   1/1     Running   0          17m
kube-proxy-ml8jb           1/1     Running   0          11m
kube-proxy-nzqwd           1/1     Running   0          11m

$ kubectl get service -n kube-system
NAME       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.100.0.10   <none>        53/UDP,53/TCP,9153/TCP   19m

$ kubectl describe pod -n kube-system coredns-586b798467-cntvg
Name:                 coredns-586b798467-cntvg
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      coredns
Node:                 ip-192-168-4-116.ec2.internal/192.168.4.116
Start Time:           Sun, 26 May 2024 09:42:49 +0900
Labels:               eks.amazonaws.com/component=coredns
                      k8s-app=kube-dns
                      pod-template-hash=586b798467
Annotations:          <none>
Status:               Running
IP:                   192.168.24.128
IPs:
  IP:           192.168.24.128
Controlled By:  ReplicaSet/coredns-586b798467
Containers:
  coredns:
    Container ID:  containerd://539adf8aa70da096a5512f0b9782cab2877ed9eee8087718004994504c3c922e
    Image:         602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8
    Image ID:      602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns@sha256:d21885a6632343ecd25d468b54681a0bd512055174bb17bc35a08cb38a965f12
    Ports:         53/UDP, 53/TCP, 9153/TCP
    Host Ports:    0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    State:          Running
      Started:      Sun, 26 May 2024 09:42:50 +0900
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zfvvp (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  kube-api-access-zfvvp:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 CriticalAddonsOnly op=Exists
                             node-role.kubernetes.io/control-plane:NoSchedule
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
Events:
  Type     Reason            Age                 From               Message
  ----     ------            ----                ----               -------
  Warning  FailedScheduling  13m (x34 over 18m)  default-scheduler  no nodes available to schedule pods
  Normal   Pulling           12m                 kubelet            Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8"
  Normal   Pulled            12m                 kubelet            Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8" in 869ms (869ms including waiting). Image size: 17282732 bytes.
  Normal   Created           12m                 kubelet            Created container coredns
  Normal   Started           12m                 kubelet            Started container coredns
```
Installing metrics-server
I want to check Pod and Node CPU utilization, so install metrics-server.
```
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

$ kubectl get deployment metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           74s

$ kubectl top pod -n kube-system
NAME                             CPU(cores)   MEMORY(bytes)
aws-node-p6kvd                   3m           41Mi
aws-node-wk7dv                   2m           41Mi
coredns-586b798467-cntvg         1m           12Mi
coredns-586b798467-rf77f         2m           12Mi
kube-proxy-ml8jb                 1m           11Mi
kube-proxy-nzqwd                 1m           13Mi
metrics-server-7ffbc6d68-49bvd   3m           17Mi
```
Pod metrics can now be retrieved.
Adding the CoreDNS EKS add-on
Next, add the CoreDNS EKS add-on. By default it is not configured.
```
$ aws eks describe-addon \
  --cluster-name non-97-eks \
  --addon-name coredns

An error occurred (ResourceNotFoundException) when calling the DescribeAddon operation: No addon: coredns found in cluster: non-97-eks
```
So we start by adding the CoreDNS EKS add-on.
The procedure is described in the following AWS documentation.
This time I will use the AWS CLI.
First, check the version of the CoreDNS add-on currently installed on the EKS cluster.
```
$ kubectl describe deployment coredns \
  --namespace kube-system \
  | grep coredns: \
  | cut -d : -f 3
v1.11.1-eksbuild.8
```
You can also check it from the Deployment.
```
$ kubectl get deployment coredns -n kube-system -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2024-05-26T00:37:01Z"
  generation: 1
  labels:
    eks.amazonaws.com/component: coredns
    k8s-app: kube-dns
    kubernetes.io/name: CoreDNS
  name: coredns
  namespace: kube-system
  resourceVersion: "1681"
  uid: 5e7ba4c0-3a91-4f07-870d-56a513f5c1f0
spec:
  progressDeadlineSeconds: 600
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      eks.amazonaws.com/component: coredns
      k8s-app: kube-dns
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        eks.amazonaws.com/component: coredns
        k8s-app: kube-dns
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
              - key: kubernetes.io/arch
                operator: In
                values:
                - amd64
                - arm64
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: k8s-app
                  operator: In
                  values:
                  - kube-dns
              topologyKey: kubernetes.io/hostname
            weight: 100
      containers:
      - args:
        - -conf
        - /etc/coredns/Corefile
        image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 5
          httpGet:
            path: /health
            port: 8080
            scheme: HTTP
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: coredns
        ports:
        - containerPort: 53
          name: dns
          protocol: UDP
        - containerPort: 53
          name: dns-tcp
          protocol: TCP
        - containerPort: 9153
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /ready
            port: 8181
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            memory: 170Mi
          requests:
            cpu: 100m
            memory: 70Mi
        securityContext:
          allowPrivilegeEscalation: false
          capabilities:
            add:
            - NET_BIND_SERVICE
            drop:
            - ALL
          readOnlyRootFilesystem: true
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /etc/coredns
          name: config-volume
          readOnly: true
      dnsPolicy: Default
      priorityClassName: system-cluster-critical
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: coredns
      serviceAccountName: coredns
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/control-plane
      - key: CriticalAddonsOnly
        operator: Exists
      topologySpreadConstraints:
      - labelSelector:
          matchLabels:
            k8s-app: kube-dns
        maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: Corefile
            path: Corefile
          name: coredns
        name: config-volume
status:
  availableReplicas: 2
  conditions:
  - lastTransitionTime: "2024-05-26T00:42:50Z"
    lastUpdateTime: "2024-05-26T00:42:50Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2024-05-26T00:37:01Z"
    lastUpdateTime: "2024-05-26T00:42:51Z"
    message: ReplicaSet "coredns-586b798467" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 2
  replicas: 2
  updatedReplicas: 2
```
Add the CoreDNS EKS add-on at the same version as the CoreDNS currently running.
```
$ aws eks create-addon \
  --cluster-name non-97-eks \
  --addon-name coredns \
  --addon-version v1.11.1-eksbuild.8
{
    "addon": {
        "addonName": "coredns",
        "clusterName": "non-97-eks",
        "status": "CREATING",
        "addonVersion": "v1.11.1-eksbuild.8",
        "health": {
            "issues": []
        },
        "addonArn": "arn:aws:eks:us-east-1:<AWS account ID>:addon/non-97-eks/coredns/a2c7d93c-d864-1896-c7ee-065e272910f9",
        "createdAt": "2024-05-26T10:17:50.822000+09:00",
        "modifiedAt": "2024-05-26T10:17:50.842000+09:00",
        "tags": {}
    }
}

$ aws eks describe-addon \
  --cluster-name non-97-eks \
  --addon-name coredns
{
    "addon": {
        "addonName": "coredns",
        "clusterName": "non-97-eks",
        "status": "ACTIVE",
        "addonVersion": "v1.11.1-eksbuild.8",
        "health": {
            "issues": []
        },
        "addonArn": "arn:aws:eks:us-east-1:<AWS account ID>:addon/non-97-eks/coredns/a2c7d93c-d864-1896-c7ee-065e272910f9",
        "createdAt": "2024-05-26T10:17:50.822000+09:00",
        "modifiedAt": "2024-05-26T10:18:05.190000+09:00",
        "tags": {}
    }
}
```
The management console also shows that the CoreDNS EKS add-on has been added.
Configuring CoreDNS autoscaling
Now configure CoreDNS autoscaling.
As a test, configure it to autoscale with a minimum of 2 and a maximum of 10 replicas.
```
$ aws eks update-addon \
  --cluster-name non-97-eks \
  --addon-name coredns \
  --resolve-conflicts PRESERVE \
  --configuration-values '{"autoScaling":{"enabled":true}, "minReplicas": 2, "maxReplicas": 10}'

An error occurred (InvalidParameterException) when calling the UpdateAddon operation: ConfigurationValue provided in request is not supported: Json schema validation failed with error: [$.autoScaling: is not defined in the schema and the schema does not allow additional properties, $.minReplicas: is not defined in the schema and the schema does not allow additional properties, $.maxReplicas: is not defined in the schema and the schema does not allow additional properties]
```
It complained that no such parameters exist.
Checking the add-on configuration schema for v1.11.1-eksbuild.8, there is indeed no autoScaling property.
```json
{
  "$ref": "#/definitions/Coredns",
  "$schema": "http://json-schema.org/draft-06/schema#",
  "definitions": {
    "Coredns": {
      "additionalProperties": false,
      "properties": {
        "affinity": { "default": { "affinity": { "nodeAffinity": { "requiredDuringSchedulingIgnoredDuringExecution": { "nodeSelectorTerms": [ { "matchExpressions": [ { "key": "kubernetes.io/os", "operator": "In", "values": [ "linux" ] }, { "key": "kubernetes.io/arch", "operator": "In", "values": [ "amd64", "arm64" ] } ] } ] } }, "podAntiAffinity": { "preferredDuringSchedulingIgnoredDuringExecution": [ { "podAffinityTerm": { "labelSelector": { "matchExpressions": [ { "key": "k8s-app", "operator": "In", "values": [ "kube-dns" ] } ] }, "topologyKey": "kubernetes.io/hostname" }, "weight": 100 } ] } } }, "description": "Affinity of the coredns pods", "type": [ "object", "null" ] },
        "computeType": { "type": "string" },
        "corefile": { "description": "Entire corefile contents to use with installation", "type": "string" },
        "nodeSelector": { "additionalProperties": { "type": "string" }, "type": "object" },
        "podAnnotations": { "properties": {}, "title": "The podAnnotations Schema", "type": "object" },
        "podDisruptionBudget": { "description": "podDisruptionBudget configurations", "enabled": { "default": true, "description": "the option to enable managed PDB", "type": "boolean" }, "maxUnavailable": { "anyOf": [ { "pattern": ".*%$", "type": "string" }, { "type": "integer" } ], "default": 1, "description": "minAvailable value for managed PDB, can be either string or integer; if it's string, should end with %" }, "minAvailable": { "anyOf": [ { "pattern": ".*%$", "type": "string" }, { "type": "integer" } ], "description": "maxUnavailable value for managed PDB, can be either string or integer; if it's string, should end with %" }, "type": "object" },
        "podLabels": { "properties": {}, "title": "The podLabels Schema", "type": "object" },
        "replicaCount": { "type": "integer" },
        "resources": { "$ref": "#/definitions/Resources" },
        "tolerations": { "default": [ { "key": "CriticalAddonsOnly", "operator": "Exists" }, { "effect": "NoSchedule", "key": "node-role.kubernetes.io/control-plane" } ], "description": "Tolerations of the coredns pod", "items": { "type": "object" }, "type": "array" },
        "topologySpreadConstraints": { "description": "The coredns pod topology spread constraints", "type": "array" }
      },
      "title": "Coredns",
      "type": "object"
    },
    "Limits": { "additionalProperties": false, "properties": { "cpu": { "type": "string" }, "memory": { "type": "string" } }, "title": "Limits", "type": "object" },
    "Resources": { "additionalProperties": false, "properties": { "limits": { "$ref": "#/definitions/Limits" }, "requests": { "$ref": "#/definitions/Limits" } }, "title": "Resources", "type": "object" }
  }
}
```
Let's update to the latest version, v1.11.1-eksbuild.9.
```
$ aws eks update-addon \
  --cluster-name non-97-eks \
  --addon-name coredns \
  --resolve-conflicts PRESERVE \
  --addon-version v1.11.1-eksbuild.9
{
    "update": {
        "id": "212a6290-6a61-357a-b2e6-0637660c6d6f",
        "status": "InProgress",
        "type": "AddonUpdate",
        "params": [
            {
                "type": "AddonVersion",
                "value": "v1.11.1-eksbuild.9"
            },
            {
                "type": "ResolveConflicts",
                "value": "PRESERVE"
            }
        ],
        "createdAt": "2024-05-26T10:31:49.570000+09:00",
        "errors": []
    }
}
```
Checking the schema after the update, an autoScaling property has appeared. It can apparently scale up to 1,000 Pods.
```json
{
  "$ref": "#/definitions/Coredns",
  "$schema": "http://json-schema.org/draft-06/schema#",
  "definitions": {
    "Coredns": {
      "additionalProperties": false,
      "properties": {
        "affinity": { "default": { "affinity": { "nodeAffinity": { "requiredDuringSchedulingIgnoredDuringExecution": { "nodeSelectorTerms": [ { "matchExpressions": [ { "key": "kubernetes.io/os", "operator": "In", "values": [ "linux" ] }, { "key": "kubernetes.io/arch", "operator": "In", "values": [ "amd64", "arm64" ] } ] } ] } }, "podAntiAffinity": { "preferredDuringSchedulingIgnoredDuringExecution": [ { "podAffinityTerm": { "labelSelector": { "matchExpressions": [ { "key": "k8s-app", "operator": "In", "values": [ "kube-dns" ] } ] }, "topologyKey": "kubernetes.io/hostname" }, "weight": 100 } ] } } }, "description": "Affinity of the coredns pods", "type": [ "object", "null" ] },
        "autoScaling": {
          "additionalProperties": false,
          "description": "autoScaling configurations",
          "properties": {
            "enabled": { "default": false, "description": "the option to enable eks managed autoscaling for coredns", "type": "boolean" },
            "maxReplicas": { "description": "the max value that autoscaler can scale up the coredns replicas to", "maximum": 1000, "minimum": 2, "type": "integer" },
            "minReplicas": { "default": 2, "description": "the min value that autoscaler can scale down the coredns replicas to", "maximum": 1000, "minimum": 2, "type": "integer" }
          },
          "required": [ "enabled" ],
          "type": "object"
        },
        "computeType": { "type": "string" },
        "corefile": { "description": "Entire corefile contents to use with installation", "type": "string" },
        "nodeSelector": { "additionalProperties": { "type": "string" }, "type": "object" },
        "podAnnotations": { "properties": {}, "title": "The podAnnotations Schema", "type": "object" },
        "podDisruptionBudget": { "description": "podDisruptionBudget configurations", "properties": { "enabled": { "default": true, "description": "the option to enable managed PDB", "type": "boolean" }, "maxUnavailable": { "anyOf": [ { "pattern": ".*%$", "type": "string" }, { "type": "integer" } ], "default": 1, "description": "maxUnavailable value for managed PDB, can be either string or integer; if it's string, should end with %" }, "minAvailable": { "anyOf": [ { "pattern": ".*%$", "type": "string" }, { "type": "integer" } ], "description": "minAvailable value for managed PDB, can be either string or integer; if it's string, should end with %" } }, "type": "object" },
        "podLabels": { "properties": {}, "title": "The podLabels Schema", "type": "object" },
        "replicaCount": { "type": "integer" },
        "resources": { "$ref": "#/definitions/Resources" },
        "tolerations": { "default": [ { "key": "CriticalAddonsOnly", "operator": "Exists" }, { "effect": "NoSchedule", "key": "node-role.kubernetes.io/control-plane" } ], "description": "Tolerations of the coredns pod", "items": { "type": "object" }, "type": "array" },
        "topologySpreadConstraints": { "description": "The coredns pod topology spread constraints", "type": "array" }
      },
      "title": "Coredns",
      "type": "object"
    },
    "Limits": { "additionalProperties": false, "properties": { "cpu": { "type": "string" }, "memory": { "type": "string" } }, "title": "Limits", "type": "object" },
    "Resources": { "additionalProperties": false, "properties": { "limits": { "$ref": "#/definitions/Limits" }, "requests": { "$ref": "#/definitions/Limits" } }, "title": "Resources", "type": "object" }
  }
}
```
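A configuration can be sanity-checked locally against the autoScaling sub-schema before calling update-addon. This is a minimal hand-rolled sketch (no JSON Schema library; `validate_autoscaling` is a hypothetical helper of mine) that enforces the constraints visible in the schema: `enabled` is required, and the replica bounds are integers in [2, 1000]:

```python
def validate_autoscaling(cfg):
    """Return None if cfg looks valid per the autoScaling sub-schema,
    else a human-readable error string."""
    auto = cfg.get("autoScaling")
    if not isinstance(auto, dict):
        return "autoScaling must be a top-level object"
    if "enabled" not in auto:
        return "autoScaling.enabled is required"
    for key in ("minReplicas", "maxReplicas"):
        if key in auto:
            v = auto[key]
            if not isinstance(v, int) or not (2 <= v <= 1000):
                return f"autoScaling.{key} must be an integer in [2, 1000]"
    return None
```

Note that this catches the earlier mistake: placing minReplicas/maxReplicas at the top level instead of inside autoScaling fails validation, just as the API did.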
Now let's try the configuration again.
```
$ aws eks update-addon \
  --cluster-name non-97-eks \
  --addon-name coredns \
  --resolve-conflicts PRESERVE \
  --configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 2, "maxReplicas": 10}}'
{
    "update": {
        "id": "99e3ba65-bc09-3b3b-a7f1-5149884a3864",
        "status": "InProgress",
        "type": "AddonUpdate",
        "params": [
            {
                "type": "ResolveConflicts",
                "value": "PRESERVE"
            },
            {
                "type": "ConfigurationValues",
                "value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
            }
        ],
        "createdAt": "2024-05-26T10:39:38.591000+09:00",
        "errors": []
    }
}

$ aws eks describe-addon \
  --cluster-name non-97-eks \
  --addon-name coredns
{
    "addon": {
        "addonName": "coredns",
        "clusterName": "non-97-eks",
        "status": "ACTIVE",
        "addonVersion": "v1.11.1-eksbuild.9",
        "health": {
            "issues": []
        },
        "addonArn": "arn:aws:eks:us-east-1:<AWS account ID>:addon/non-97-eks/coredns/a2c7d93c-d864-1896-c7ee-065e272910f9",
        "createdAt": "2024-05-26T10:17:50.822000+09:00",
        "modifiedAt": "2024-05-26T10:39:41.867000+09:00",
        "tags": {},
        "configurationValues": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
    }
}
```
The configuration succeeded.
Note that nothing in particular changed at this point.
```
$ kubectl get deployments -n kube-system coredns
NAME      READY   UP-TO-DATE   AVAILABLE   AGE
coredns   2/2     2            2           66m

$ kubectl describe deployments coredns -n kube-system
Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Sun, 26 May 2024 09:37:01 +0900
Labels:                 eks.amazonaws.com/component=coredns
                        k8s-app=kube-dns
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               eks.amazonaws.com/component=coredns,k8s-app=kube-dns
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 25% max surge
Pod Template:
  Labels:           eks.amazonaws.com/component=coredns
                    k8s-app=kube-dns
  Service Account:  coredns
  Containers:
   coredns:
    Image:       602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
  Priority Class Name:          system-cluster-critical
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  coredns-586b798467 (0/0 replicas created)
NewReplicaSet:   coredns-86d5d9b668 (2/2 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  11m   deployment-controller  Scaled up replica set coredns-86d5d9b668 to 1
  Normal  ScalingReplicaSet  11m   deployment-controller  Scaled down replica set coredns-586b798467 to 1 from 2
  Normal  ScalingReplicaSet  11m   deployment-controller  Scaled up replica set coredns-86d5d9b668 to 2 from 1
  Normal  ScalingReplicaSet  11m   deployment-controller  Scaled down replica set coredns-586b798467 to 0 from 1
```
Confirming that it autoscales
Increasing the Node count to 10
Let's confirm that autoscaling actually happens.
As noted earlier, the documentation says it scales with the number of Nodes, so increase the Node count to 10 and watch what happens.
```
$ eksctl scale nodegroup \
  --cluster=non-97-eks \
  --name=ng-bf93e531 \
  --nodes=10 \
  --nodes-min=10 \
  --nodes-max=10
2024-05-26 10:47:06 [ℹ]  scaling nodegroup "ng-bf93e531" in cluster non-97-eks
2024-05-26 10:47:08 [ℹ]  initiated scaling of nodegroup
2024-05-26 10:47:08 [ℹ]  to see the status of the scaling run `eksctl get nodegroup --cluster non-97-eks --region us-east-1 --name ng-bf93e531`
```
Check the Node and Pod counts.
```
$ kubectl get node
NAME                             STATUS   ROLES    AGE   VERSION
ip-192-168-12-9.ec2.internal     Ready    <none>   15s   v1.30.0-eks-fff26e3
ip-192-168-14-243.ec2.internal   Ready    <none>   17s   v1.30.0-eks-fff26e3
ip-192-168-18-3.ec2.internal     Ready    <none>   17s   v1.30.0-eks-fff26e3
ip-192-168-23-86.ec2.internal    Ready    <none>   12s   v1.30.0-eks-fff26e3
ip-192-168-32-210.ec2.internal   Ready    <none>   17s   v1.30.0-eks-fff26e3
ip-192-168-38-149.ec2.internal   Ready    <none>   16s   v1.30.0-eks-fff26e3
ip-192-168-4-116.ec2.internal    Ready    <none>   65m   v1.30.0-eks-fff26e3
ip-192-168-42-185.ec2.internal   Ready    <none>   17s   v1.30.0-eks-fff26e3
ip-192-168-47-147.ec2.internal   Ready    <none>   65m   v1.30.0-eks-fff26e3
ip-192-168-61-1.ec2.internal     Ready    <none>   17s   v1.30.0-eks-fff26e3

$ kubectl get pod -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
aws-node-5hn7v                   2/2     Running   0          32s
aws-node-895dk                   2/2     Running   0          35s
aws-node-gwk9x                   2/2     Running   0          37s
aws-node-nlmqb                   2/2     Running   0          37s
aws-node-p6kvd                   2/2     Running   0          65m
aws-node-ppxpj                   2/2     Running   0          37s
aws-node-qbnsr                   2/2     Running   0          36s
aws-node-vn45f                   2/2     Running   0          37s
aws-node-w2cq2                   2/2     Running   0          36s
aws-node-wk7dv                   2/2     Running   0          65m
coredns-86d5d9b668-rhvrg         1/1     Running   0          16m
coredns-86d5d9b668-tqc5b         1/1     Running   0          16m
kube-proxy-4vxl5                 1/1     Running   0          37s
kube-proxy-8jcp5                 1/1     Running   0          37s
kube-proxy-8w7lw                 1/1     Running   0          37s
kube-proxy-9t5z2                 1/1     Running   0          36s
kube-proxy-gjnqx                 1/1     Running   0          35s
kube-proxy-gz6h6                 1/1     Running   0          36s
kube-proxy-ml8jb                 1/1     Running   0          65m
kube-proxy-nzqwd                 1/1     Running   0          65m
kube-proxy-xcnhd                 1/1     Running   0          37s
kube-proxy-z4bb5                 1/1     Running   0          32s
metrics-server-7ffbc6d68-49bvd   1/1     Running   0          44m

$ kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide
NAME                       READY   STATUS    RESTARTS   AGE   IP               NODE                             NOMINATED NODE   READINESS GATES
coredns-86d5d9b668-rhvrg   1/1     Running   0          20m   192.168.3.147    ip-192-168-4-116.ec2.internal    <none>           <none>
coredns-86d5d9b668-tqc5b   1/1     Running   0          20m   192.168.38.204   ip-192-168-47-147.ec2.internal   <none>           <none>

$ kubectl describe deployments coredns -n kube-system
Name:                   coredns
Namespace:              kube-system
CreationTimestamp:      Sun, 26 May 2024 09:37:01 +0900
Labels:                 eks.amazonaws.com/component=coredns
                        k8s-app=kube-dns
                        kubernetes.io/name=CoreDNS
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               eks.amazonaws.com/component=coredns,k8s-app=kube-dns
Replicas:               2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  1 max unavailable, 25% max surge
Pod Template:
  Labels:           eks.amazonaws.com/component=coredns
                    k8s-app=kube-dns
  Service Account:  coredns
  Containers:
   coredns:
    Image:       602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
    Ports:       53/UDP, 53/TCP, 9153/TCP
    Host Ports:  0/UDP, 0/TCP, 0/TCP
    Args:
      -conf
      /etc/coredns/Corefile
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
  Volumes:
   config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      coredns
    Optional:  false
  Topology Spread Constraints:  topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
  Priority Class Name:          system-cluster-critical
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
  Progressing    True    NewReplicaSetAvailable
OldReplicaSets:  coredns-586b798467 (0/0 replicas created)
NewReplicaSet:   coredns-86d5d9b668 (2/2 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  17m   deployment-controller  Scaled up replica set coredns-86d5d9b668 to 1
  Normal  ScalingReplicaSet  17m   deployment-controller  Scaled down replica set coredns-586b798467 to 1 from 2
  Normal  ScalingReplicaSet  17m   deployment-controller  Scaled up replica set coredns-86d5d9b668 to 2 from 1
  Normal  ScalingReplicaSet  17m   deployment-controller  Scaled down replica set coredns-586b798467 to 0 from 1
```
The Nodes increased to 10, but the number of CoreDNS Pods did not.
Adding 20 t4g.nano Nodes
Is the total number of CPU cores across the Nodes still too small?
Let's add 20 t4g.nano Nodes.
```
$ eksctl create nodegroup \
  --cluster=non-97-eks \
  --node-type=t4g.nano \
  --nodes=20 \
  --nodes-min=20 \
  --nodes-max=20 \
  --node-volume-size=2 \
  --node-volume-type=gp3 \
  --node-ami-family=Bottlerocket \
  --spot \
  --managed
2024-05-26 11:04:21 [ℹ]  will use version 1.30 for new nodegroup(s) based on control plane version
2024-05-26 11:04:27 [ℹ]  nodegroup "ng-4a521135" will use "" [Bottlerocket/1.30]
2024-05-26 11:04:29 [ℹ]  1 existing nodegroup(s) (ng-bf93e531) will be excluded
2024-05-26 11:04:29 [ℹ]  1 nodegroup (ng-4a521135) was included (based on the include/exclude rules)
2024-05-26 11:04:29 [ℹ]  will create a CloudFormation stack for each of 1 managed nodegroups in cluster "non-97-eks"
2024-05-26 11:04:30 [ℹ]  2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "ng-4a521135" } } }
2024-05-26 11:04:30 [ℹ]  checking cluster stack for missing resources
2024-05-26 11:04:31 [ℹ]  cluster stack has all required resources
2024-05-26 11:04:33 [ℹ]  building managed nodegroup stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:04:33 [ℹ]  deploying stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:04:34 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:05:04 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:05:56 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:07:09 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:08:52 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:10:05 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:11:47 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:13:41 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:14:13 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:15:41 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:17:32 [ℹ]  waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:17:33 [ℹ]  no tasks
2024-05-26 11:17:33 [✔]  created 0 nodegroup(s) in cluster "non-97-eks"
2024-05-26 11:17:34 [ℹ]  nodegroup "ng-4a521135" has 20 node(s)
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-14-142.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-14-214.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-19-169.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-20-134.ec2.internal" is ready
(snip)
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-6-253.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-60-173.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-62-70.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ]  node "ip-192-168-9-217.ec2.internal" is ready
2024-05-26 11:17:34 [✔]  created 1 managed nodegroup(s) in cluster "non-97-eks"
2024-05-26 11:17:36 [ℹ]  checking security group configuration for all nodegroups
2024-05-26 11:17:36 [ℹ]  all nodegroups have up-to-date cloudformation templates
```
Let's check the node and Pod counts.
$ kubectl get node NAME STATUS ROLES AGE VERSION ip-192-168-12-9.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3 ip-192-168-14-142.ec2.internal Ready <none> 9m18s v1.30.0-eks-fff26e3 ip-192-168-14-214.ec2.internal Ready <none> 61s v1.30.0-eks-fff26e3 ip-192-168-14-243.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3 ip-192-168-18-3.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3 ip-192-168-19-169.ec2.internal Ready <none> 52s v1.30.0-eks-fff26e3 ip-192-168-20-134.ec2.internal Ready <none> 11m v1.30.0-eks-fff26e3 ip-192-168-20-208.ec2.internal Ready <none> 51s v1.30.0-eks-fff26e3 ip-192-168-22-87.ec2.internal Ready <none> 9m9s v1.30.0-eks-fff26e3 ip-192-168-23-86.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3 ip-192-168-25-248.ec2.internal Ready <none> 48s v1.30.0-eks-fff26e3 ip-192-168-30-238.ec2.internal Ready <none> 53s v1.30.0-eks-fff26e3 ip-192-168-32-210.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3 ip-192-168-34-209.ec2.internal Ready <none> 6m55s v1.30.0-eks-fff26e3 ip-192-168-38-149.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3 ip-192-168-4-116.ec2.internal Ready <none> 94m v1.30.0-eks-fff26e3 ip-192-168-4-197.ec2.internal Ready <none> 11m v1.30.0-eks-fff26e3 ip-192-168-4-37.ec2.internal Ready <none> 9m6s v1.30.0-eks-fff26e3 ip-192-168-40-63.ec2.internal Ready <none> 5m25s v1.30.0-eks-fff26e3 ip-192-168-42-155.ec2.internal Ready <none> 5m12s v1.30.0-eks-fff26e3 ip-192-168-42-185.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3 ip-192-168-42-213.ec2.internal Ready <none> 5m16s v1.30.0-eks-fff26e3 ip-192-168-47-147.ec2.internal Ready <none> 94m v1.30.0-eks-fff26e3 ip-192-168-47-240.ec2.internal Ready <none> 5m13s v1.30.0-eks-fff26e3 ip-192-168-58-141.ec2.internal Ready <none> 5m13s v1.30.0-eks-fff26e3 ip-192-168-6-253.ec2.internal Ready <none> 48s v1.30.0-eks-fff26e3 ip-192-168-60-173.ec2.internal Ready <none> 5m13s v1.30.0-eks-fff26e3 ip-192-168-61-1.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3 ip-192-168-62-70.ec2.internal Ready <none> 
5m12s v1.30.0-eks-fff26e3 ip-192-168-9-217.ec2.internal Ready <none> 11m v1.30.0-eks-fff26e3 $ kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES coredns-86d5d9b668-rhvrg 1/1 Running 0 45m 192.168.3.147 ip-192-168-4-116.ec2.internal <none> <none> coredns-86d5d9b668-tqc5b 1/1 Running 0 45m 192.168.38.204 ip-192-168-47-147.ec2.internal <none> <none> $ kubectl describe deployments coredns -n kube-system Name: coredns Namespace: kube-system CreationTimestamp: Sun, 26 May 2024 09:37:01 +0900 Labels: eks.amazonaws.com/component=coredns k8s-app=kube-dns kubernetes.io/name=CoreDNS Annotations: deployment.kubernetes.io/revision: 2 Selector: eks.amazonaws.com/component=coredns,k8s-app=kube-dns Replicas: 2 desired | 2 updated | 2 total | 2 available | 0 unavailable StrategyType: RollingUpdate MinReadySeconds: 0 RollingUpdateStrategy: 1 max unavailable, 25% max surge Pod Template: Labels: eks.amazonaws.com/component=coredns k8s-app=kube-dns Service Account: coredns Containers: coredns: Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9 Ports: 53/UDP, 53/TCP, 9153/TCP Host Ports: 0/UDP, 0/TCP, 0/TCP Args: -conf /etc/coredns/Corefile Limits: memory: 170Mi Requests: cpu: 100m memory: 70Mi Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5 Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3 Environment: <none> Mounts: /etc/coredns from config-volume (ro) Volumes: config-volume: Type: ConfigMap (a volume populated by a ConfigMap) Name: coredns Optional: false Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns Priority Class Name: system-cluster-critical Conditions: Type Status Reason ---- ------ ------ Available True MinimumReplicasAvailable Progressing True NewReplicaSetAvailable OldReplicaSets: coredns-586b798467 
(0/0 replicas created) NewReplicaSet: coredns-86d5d9b668 (2/2 replicas created) Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal ScalingReplicaSet 47m deployment-controller Scaled up replica set coredns-86d5d9b668 to 1 Normal ScalingReplicaSet 47m deployment-controller Scaled down replica set coredns-586b798467 to 1 from 2 Normal ScalingReplicaSet 47m deployment-controller Scaled up replica set coredns-86d5d9b668 to 2 from 1 Normal ScalingReplicaSet 47m deployment-controller Scaled down replica set coredns-586b798467 to 0 from 1
The number of CoreDNS Pods has not changed.
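Since the documentation talks about node count and node CPU cores, the scaling rule is presumably linear, similar to what Cluster Proportional Autoscaler does. EKS does not publish the actual coefficients, so the `CORES_PER_REPLICA` and `NODES_PER_REPLICA` values below are purely illustrative assumptions; this sketch only shows the shape such a rule would take, not EKS's real parameters.

```shell
#!/bin/bash
# Hypothetical sketch of a cluster-proportional scaling rule.
# NOTE: these coefficient values are assumptions for illustration;
# EKS does not document the actual parameters it uses.
CORES_PER_REPLICA=256
NODES_PER_REPLICA=16
MIN_REPLICAS=2
MAX_REPLICAS=1000

estimate_replicas() {
  local nodes=$1 cores=$2
  # ceil(cores / CORES_PER_REPLICA) and ceil(nodes / NODES_PER_REPLICA)
  local by_cores=$(( (cores + CORES_PER_REPLICA - 1) / CORES_PER_REPLICA ))
  local by_nodes=$(( (nodes + NODES_PER_REPLICA - 1) / NODES_PER_REPLICA ))
  # take the larger of the two estimates, then clamp to [min, max]
  local replicas=$(( by_cores > by_nodes ? by_cores : by_nodes ))
  if (( replicas < MIN_REPLICAS )); then replicas=$MIN_REPLICAS; fi
  if (( replicas > MAX_REPLICAS )); then replicas=$MAX_REPLICAS; fi
  echo "$replicas"
}

# ~30 nodes x 2 vCPUs: roughly this cluster after adding the 20 small nodes
estimate_replicas 30 60
```

Under these assumed coefficients, 30 small nodes still only yields 2 replicas, which would be consistent with the Pod count not changing here.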
Setting the minimum to 3 and the maximum to 1,000
Let's configure auto scaling with a minimum of 3 and a maximum of 1,000 and observe the behavior.
$ aws eks update-addon \ --cluster-name non-97-eks \ --addon-name coredns \ --resolve-conflicts PRESERVE \ --configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 3, "maxReplicas": 1000}}' { "update": { "id": "b0625d52-3a49-3d93-94d3-049ce5e98ff5", "status": "InProgress", "type": "AddonUpdate", "params": [ { "type": "ResolveConflicts", "value": "PRESERVE" }, { "type": "ConfigurationValues", "value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 3, \"maxReplicas\": 1000}}" } ], "createdAt": "2024-05-26T11:34:14.326000+09:00", "errors": [] } }
Check the Pod count.
$ kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES coredns-86d5d9b668-65xcp 1/1 Running 0 29s 192.168.9.74 ip-192-168-23-86.ec2.internal <none> <none> coredns-86d5d9b668-rhvrg 1/1 Running 0 64m 192.168.3.147 ip-192-168-4-116.ec2.internal <none> <none> coredns-86d5d9b668-tqc5b 1/1 Running 0 64m 192.168.38.204 ip-192-168-47-147.ec2.internal <none> <none> $ kubectl rollout history deployment/coredns -n kube-system deployment.apps/coredns REVISION CHANGE-CAUSE 1 <none> 2 <none> $ kubectl describe deployments coredns -n kube-system Name: coredns Namespace: kube-system CreationTimestamp: Sun, 26 May 2024 09:37:01 +0900 Labels: eks.amazonaws.com/component=coredns k8s-app=kube-dns kubernetes.io/name=CoreDNS Annotations: deployment.kubernetes.io/revision: 2 Selector: eks.amazonaws.com/component=coredns,k8s-app=kube-dns Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable StrategyType: RollingUpdate MinReadySeconds: 0 RollingUpdateStrategy: 1 max unavailable, 25% max surge Pod Template: Labels: eks.amazonaws.com/component=coredns k8s-app=kube-dns Service Account: coredns Containers: coredns: Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9 Ports: 53/UDP, 53/TCP, 9153/TCP Host Ports: 0/UDP, 0/TCP, 0/TCP Args: -conf /etc/coredns/Corefile Limits: memory: 170Mi Requests: cpu: 100m memory: 70Mi Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5 Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3 Environment: <none> Mounts: /etc/coredns from config-volume (ro) Volumes: config-volume: Type: ConfigMap (a volume populated by a ConfigMap) Name: coredns Optional: false Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns Priority Class Name: system-cluster-critical Conditions: Type Status Reason 
---- ------ ------ Available True MinimumReplicasAvailable Progressing True NewReplicaSetAvailable OldReplicaSets: coredns-586b798467 (0/0 replicas created) NewReplicaSet: coredns-86d5d9b668 (3/3 replicas created) Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal ScalingReplicaSet 70s deployment-controller Scaled up replica set coredns-86d5d9b668 to 3 from 2
Putting load on CoreDNS with name resolution: dig edition
Let's observe how CoreDNS behaves when we generate name-resolution load against it.
We'll prepare a container that runs a shell script calling dig periodically. The Dockerfile and script are as follows.
FROM alpine:latest

RUN apk add --no-cache bind-tools bash

COPY ./dns-resolution.sh /usr/local/bin/

CMD ["/bin/bash", "/usr/local/bin/dns-resolution.sh"]
#!/bin/bash

set -xu

DOMAIN="${DOMAIN:-www.non-97.net}"
INTERVAL="${INTERVAL:-5}"

while true; do
  dig "${DOMAIN}" +short
  sleep "${INTERVAL}"
done
Build the container image and push it to a newly created ECR repository.
$ docker build -t dns-resolution . [+] Building 2.8s (8/8) FINISHED => [internal] load build definition from Dockerfile 0.0s => => transferring dockerfile: 191B 0.0s => [internal] load .dockerignore 0.0s => => transferring context: 2B 0.0s => [internal] load metadata for docker.io/library/alpine:latest 2.7s => [1/3] FROM docker.io/library/alpine:latest@sha256:77726ef6b57ddf65bb551896826ec38bc3e53f75cdde31354fbffb4f25238ebd 0.0s => CACHED [2/3] RUN apk add --no-cache bind-tools bash 0.0s => [internal] load build context 0.0s => => transferring context: 39B 0.0s => [3/3] COPY ./dns-resolution.sh /usr/local/bin/ 0.0s => exporting to image 0.1s => => exporting layers 0.1s => => writing image sha256:2db16b22688d82039680917bfca20c70a811f8c18f49d51bd1c4caa7a5587873 0.0s => => naming to docker.io/library/dns-resolution 0.0s $ set AWS_ACCOUNT_ID (aws sts get-caller-identity --output text --query Account) $ set AWS_REGION (aws configure get region) $ aws ecr get-login-password \ | docker login \ --username AWS \ --password-stdin https://$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com Login Succeeded $ aws ecr create-repository --repository-name dns-resolution { "repository": { "repositoryArn": "arn:aws:ecr:us-east-1:<AWSアカウントID>:repository/dns-resolution", "registryId": "<AWSアカウントID>", "repositoryName": "dns-resolution", "repositoryUri": "<AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution", "createdAt": "2024-05-26T16:02:59.144000+09:00", "imageTagMutability": "MUTABLE", "imageScanningConfiguration": { "scanOnPush": false }, "encryptionConfiguration": { "encryptionType": "AES256" } } } $ set dns_resolution_repo (aws ecr describe-repositories \ --repository-names dns-resolution \ --query 'repositories[0].repositoryUri' \ --output text ) $ docker tag dns-resolution:latest $dns_resolution_repo:latest $ docker image ls | grep dns-resolution <AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution latest 2db16b22688d 33 seconds ago 22.3MB dns-resolution latest 
2db16b22688d 33 seconds ago 22.3MB $ docker push $dns_resolution_repo:latest The push refers to repository [<AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution] b51587f43b2b: Pushed 5d35fe5c895f: Pushed 50171d1acbd5: Pushed latest: digest: sha256:28fb6d58c37b6cb43ccb10cbe82d3565306c7dd9addb33678a3cb08bd7990101 size: 946
Create a manifest file that runs the container we prepared.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-resolution
  namespace: default
spec:
  selector:
    matchLabels:
      app: dns-resolution
  replicas: 2
  template:
    metadata:
      labels:
        app: dns-resolution
    spec:
      containers:
        - name: dns-resolution
          image: <AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
          env:
            - name: DOMAIN
              value: www.non-97.net
            - name: INTERVAL
              value: "3"
Deploy it.
$ kubectl apply -f ./dns-resolution-deployment.yml deployment.apps/dns-resolution configured $ kubectl get pod -n default NAME READY STATUS RESTARTS AGE dns-resolution-7cd64b54cf-bkwr4 1/1 Running 0 45s dns-resolution-7cd64b54cf-vsk79 1/1 Running 11 (6m15s ago) 32m $ stern dns-resolution -n default + dns-resolution-7cd64b54cf-bkwr4 › dns-resolution + dns-resolution-7cd64b54cf-vsk79 › dns-resolution dns-resolution-7cd64b54cf-bkwr4 dns-resolution + DOMAIN=www.non-97.net dns-resolution-7cd64b54cf-bkwr4 dns-resolution + INTERVAL=3 dns-resolution-7cd64b54cf-bkwr4 dns-resolution + true dns-resolution-7cd64b54cf-bkwr4 dns-resolution + dig www.non-97.net +short dns-resolution-7cd64b54cf-bkwr4 dns-resolution + sleep 3 dns-resolution-7cd64b54cf-bkwr4 dns-resolution + true dns-resolution-7cd64b54cf-bkwr4 dns-resolution + dig www.non-97.net +short dns-resolution-7cd64b54cf-bkwr4 dns-resolution + sleep 3 dns-resolution-7cd64b54cf-bkwr4 dns-resolution + true dns-resolution-7cd64b54cf-bkwr4 dns-resolution + dig www.non-97.net +short dns-resolution-7cd64b54cf-bkwr4 dns-resolution + sleep 3 . . (以下略) . . $ kubectl top pod -n kube-system NAME CPU(cores) MEMORY(bytes) aws-node-8c5v7 3m 41Mi aws-node-f9lrm 3m 42Mi aws-node-nnjln 3m 42Mi coredns-86d5d9b668-j76c6 2m 12Mi coredns-86d5d9b668-jltmm 2m 12Mi coredns-86d5d9b668-vqgqp 1m 12Mi kube-proxy-62hmp 1m 11Mi kube-proxy-gncj8 1m 11Mi kube-proxy-wx7hr 1m 12Mi metrics-server-7ffbc6d68-4srp5 3m 18Mi
As expected, this level of traffic puts almost no load on the CoreDNS Pods.
Let's run 100 of them concurrently. The manifest file is as follows.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-resolution
  namespace: default
spec:
  selector:
    matchLabels:
      app: dns-resolution
  replicas: 100
  template:
    metadata:
      labels:
        app: dns-resolution
    spec:
      containers:
        - name: dns-resolution
          image: <AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
          env:
            - name: DOMAIN
              value: www.non-97.net
            - name: INTERVAL
              value: "0.000001"
Since the nodes will surely fall short, let's provision 10 nodes.
$ eksctl scale nodegroup \ --cluster=non-97-eks \ --name=ng-b0fc3fde \ --nodes=10 \ --nodes-min=10 \ --nodes-max=10 2024-05-26 17:00:40 [ℹ] scaling nodegroup "ng-b0fc3fde" in cluster non-97-eks 2024-05-26 17:00:43 [ℹ] initiated scaling of nodegroup 2024-05-26 17:00:43 [ℹ] to see the status of the scaling run `eksctl get nodegroup --cluster non-97-eks --region us-east-1 --name ng-b0fc3fde` $ kubectl get deployment -n default NAME READY UP-TO-DATE AVAILABLE AGE dns-resolution 86/100 100 86 52m $ kubectl top pod -l k8s-app=kube-dns -n kube-system NAME CPU(cores) MEMORY(bytes) coredns-86d5d9b668-j76c6 53m 15Mi coredns-86d5d9b668-jltmm 55m 15Mi coredns-86d5d9b668-vqgqp 53m 15Mi
86 dig Pods started, but the load on CoreDNS is still not enough.
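For a rough sense of why this doesn't register as load: the loop forks a fresh dig process for every query, so per-Pod throughput is bounded by process startup plus query latency rather than the sleep interval. The 20 ms per iteration used below is an assumed, optimistic figure for illustration; the observed CoreDNS CPU (~53m per Pod) suggests actual throughput was far lower still.

```shell
#!/bin/bash
# Back-of-the-envelope estimate of the aggregate query rate from the dig loop.
# MS_PER_ITERATION is an assumed per-query cost (fork + exec + DNS round trip).
PODS=86
MS_PER_ITERATION=20

per_pod_qps=$(( 1000 / MS_PER_ITERATION ))   # queries per second per Pod
total_qps=$(( PODS * per_pod_qps ))          # across all running Pods
echo "~${total_qps} qps"
```

Even under this optimistic assumption, 86 Pods generate only a few thousand queries per second in total, an order of magnitude below what a single dnsperf run achieves later.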
Putting load on CoreDNS with name resolution: dnsperf edition
Instead of dig, let's switch to dnsperf, a DNS benchmarking tool.
It will resolve names against the cluster IP address of the CoreDNS Service.
As preparation, set the CoreDNS Pod auto scaling back to a minimum of 2 and a maximum of 10.
$ aws eks update-addon \ --cluster-name non-97-eks \ --addon-name coredns \ --resolve-conflicts PRESERVE \ --configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 2, "maxReplicas": 10}}' { "update": { "id": "eabb479c-769f-371e-869a-3abeeb23cfbc", "status": "InProgress", "type": "AddonUpdate", "params": [ { "type": "ResolveConflicts", "value": "PRESERVE" }, { "type": "ConfigurationValues", "value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}" } ], "createdAt": "2024-05-26T17:09:47.277000+09:00", "errors": [] } }
The Dockerfile and shell script are as follows.
FROM --platform=linux/arm64 ubuntu:latest AS build

RUN apt-get update && apt-get install -y dnsperf

FROM --platform=linux/arm64 ubuntu:latest

COPY --from=build /usr/bin/dnsperf /usr/bin/
COPY --from=build /usr/lib/aarch64-linux-gnu/libldns.so.3 /usr/lib/
COPY --from=build /usr/lib/aarch64-linux-gnu/libnghttp2.so.14 /usr/lib/
COPY ./dns-resolution.sh /usr/local/bin/

CMD ["/bin/bash", "/usr/local/bin/dns-resolution.sh"]
#!/bin/bash

set -u

DOMAIN="${DOMAIN:-www.non-97.net}"
SERVER_ADDR="${SERVER_ADDR:-10.100.0.10}"
MAXRUNS="${MAXRUNS:-5}"
CLIENTS="${CLIENTS:-1}"

echo "${DOMAIN} A" > "query_random_list.txt"

while true; do
  dnsperf -d query_random_list.txt -l "${MAXRUNS}" -s "${SERVER_ADDR}" -c "${CLIENTS}"
done
After building and pushing to ECR, launch it with the following manifest file.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-resolution
  namespace: default
spec:
  selector:
    matchLabels:
      app: dns-resolution
  replicas: 2
  template:
    metadata:
      labels:
        app: dns-resolution
    spec:
      containers:
        - name: dns-resolution
          image: <AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
          env:
            - name: DOMAIN
              value: www.non-97.net
            - name: SERVER_ADDR
              value: 10.100.0.10
            - name: MAXRUNS
              value: "10"
            - name: CLIENTS
              value: "10"
Check on the Pods.
$ kubectl apply -f ./dns-resolution-deployment.yml deployment.apps/dns-resolution configured $ kubectl get pod -n default NAME READY STATUS RESTARTS AGE dns-resolution-55c8658865-mp7sj 1/1 Running 0 13s dns-resolution-55c8658865-t7qth 1/1 Running 0 13s $ stern dns-resolution -n default + dns-resolution-55c8658865-mp7sj › dns-resolution + dns-resolution-55c8658865-t7qth › dns-resolution dns-resolution-55c8658865-mp7sj dns-resolution DNS Performance Testing Tool dns-resolution-55c8658865-mp7sj dns-resolution Version 2.14.0 dns-resolution-55c8658865-mp7sj dns-resolution dns-resolution-55c8658865-mp7sj dns-resolution [Status] Command line: dnsperf -d query_random_list.txt -l 10 -s 10.100.0.10 -c 10 dns-resolution-55c8658865-mp7sj dns-resolution [Status] Sending queries (to 10.100.0.10:53) dns-resolution-55c8658865-mp7sj dns-resolution [Status] Started at: Sun May 26 23:05:24 2024 dns-resolution-55c8658865-mp7sj dns-resolution [Status] Stopping after 10.000000 seconds dns-resolution-55c8658865-mp7sj dns-resolution [Status] Testing complete (time limit) dns-resolution-55c8658865-mp7sj dns-resolution dns-resolution-55c8658865-mp7sj dns-resolution Statistics: dns-resolution-55c8658865-mp7sj dns-resolution dns-resolution-55c8658865-mp7sj dns-resolution Queries sent: 244371 dns-resolution-55c8658865-mp7sj dns-resolution Queries completed: 244371 (100.00%) dns-resolution-55c8658865-mp7sj dns-resolution Queries lost: 0 (0.00%) dns-resolution-55c8658865-mp7sj dns-resolution dns-resolution-55c8658865-mp7sj dns-resolution Response codes: SERVFAIL 244371 (100.00%) dns-resolution-55c8658865-mp7sj dns-resolution Average packet size: request 32, response 32 dns-resolution-55c8658865-mp7sj dns-resolution Run time (s): 10.008380 dns-resolution-55c8658865-mp7sj dns-resolution Queries per second: 24416.638857 dns-resolution-55c8658865-mp7sj dns-resolution dns-resolution-55c8658865-mp7sj dns-resolution Average Latency (s): 0.003310 (min 0.000047, max 0.059284) 
dns-resolution-55c8658865-mp7sj dns-resolution Latency StdDev (s): 0.002724 dns-resolution-55c8658865-mp7sj dns-resolution dns-resolution-55c8658865-mp7sj dns-resolution DNS Performance Testing Tool dns-resolution-55c8658865-mp7sj dns-resolution Version 2.14.0 dns-resolution-55c8658865-mp7sj dns-resolution dns-resolution-55c8658865-mp7sj dns-resolution [Status] Command line: dnsperf -d query_random_list.txt -l 10 -s 10.100.0.10 -c 10 dns-resolution-55c8658865-mp7sj dns-resolution [Status] Sending queries (to 10.100.0.10:53) dns-resolution-55c8658865-mp7sj dns-resolution [Status] Started at: Sun May 26 23:05:34 2024 dns-resolution-55c8658865-mp7sj dns-resolution [Status] Stopping after 10.000000 seconds dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 5432 dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 5438 . . (中略) . . dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 40701 dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 40428 dns-resolution-55c8658865-mp7sj dns-resolution [Status] Testing complete (time limit) dns-resolution-55c8658865-mp7sj dns-resolution dns-resolution-55c8658865-mp7sj dns-resolution Statistics: dns-resolution-55c8658865-mp7sj dns-resolution dns-resolution-55c8658865-mp7sj dns-resolution Queries sent: 411875 dns-resolution-55c8658865-mp7sj dns-resolution Queries completed: 411852 (99.99%) dns-resolution-55c8658865-mp7sj dns-resolution Queries lost: 23 (0.01%) dns-resolution-55c8658865-mp7sj dns-resolution dns-resolution-55c8658865-mp7sj dns-resolution Response codes: SERVFAIL 411852 (100.00%) dns-resolution-55c8658865-mp7sj dns-resolution Average packet size: request 32, response 32 dns-resolution-55c8658865-mp7sj dns-resolution Run time (s): 10.006569 dns-resolution-55c8658865-mp7sj dns-resolution Queries per second: 41158.163203 dns-resolution-55c8658865-mp7sj dns-resolution dns-resolution-55c8658865-mp7sj 
dns-resolution Average Latency (s): 0.001978 (min 0.000038, max 0.044901) dns-resolution-55c8658865-mp7sj dns-resolution Latency StdDev (s): 0.001715 dns-resolution-55c8658865-mp7sj dns-resolution $ kubectl top pod -l k8s-app=kube-dns -n kube-system NAME CPU(cores) MEMORY(bytes) coredns-86d5d9b668-jc4cx 1413m 19Mi coredns-86d5d9b668-nd792 1549m 20Mi $ kubectl top node NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-192-168-0-143.ec2.internal 28m 1% 363Mi 26% ip-192-168-51-53.ec2.internal 1847m 95% 396Mi 29% ip-192-168-8-43.ec2.internal 1873m 97% 388Mi 28%
The CPU load on the CoreDNS Pods is now quite high.
However, even after leaving things in this state, the number of CoreDNS Pods did not change.
Increasing the node count to 4 and the number of dnsperf Pods to 10 made no difference either.
$ kubectl top pod -l k8s-app=kube-dns -n kube-system NAME CPU(cores) MEMORY(bytes) coredns-86d5d9b668-jc4cx 1411m 23Mi coredns-86d5d9b668-nd792 1369m 17Mi $ kubectl top node NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-192-168-0-143.ec2.internal 1094m 56% 409Mi 29% ip-192-168-33-71.ec2.internal 1202m 62% 362Mi 26% ip-192-168-51-53.ec2.internal 1869m 96% 418Mi 30% ip-192-168-8-43.ec2.internal 1913m 99% 423Mi 31%
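One takeaway from these numbers: each CoreDNS Pod is burning roughly 1.4 cores against a CPU request of only 100m (visible in the Deployment description earlier). If the autoscaler reacted to CPU utilization relative to requests, as a HorizontalPodAutoscaler would, it would have scaled long ago; this supports the documentation's statement that only node count and node CPU cores are considered. A quick sanity check on the ratio, using the 1411m figure from the kubectl top output above:

```shell
#!/bin/bash
# CPU utilization of a CoreDNS Pod relative to its request, in percent.
# 1411m observed usage vs the add-on's default 100m CPU request.
USAGE_MILLICORES=1411
REQUEST_MILLICORES=100

utilization_pct=$(( USAGE_MILLICORES * 100 / REQUEST_MILLICORES ))
echo "${utilization_pct}%"
```

That is more than fourteen times the requested CPU, far beyond any typical HPA target, yet the replica count stayed put.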
Trying again with Kubernetes 1.29
Creating the EKS cluster
Could it be that using Kubernetes 1.30 is the problem?
Indeed, the official AWS documentation makes no mention of Kubernetes 1.30.
Autoscaling CoreDNS - Amazon EKS
Let's try again with Kubernetes 1.29.
Recreate the EKS cluster.
$ eksctl create cluster \ --name=non-97-eks-129 \ --version 1.29 \ --nodes=4 \ --nodes-min=4 \ --nodes-max=4 \ --node-volume-size=0 \ --node-volume-type=gp3 \ --node-ami-family=Bottlerocket \ --instance-types=t4g.small \ --spot \ --managed \ --region us-east-1 $ aws eks describe-cluster \ --name non-97-eks-129 \ --query cluster.version "1.29" $ aws eks describe-cluster \ --name non-97-eks-129 \ --query cluster.platformVersion "eks.7"
Install metrics-server.
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml serviceaccount/metrics-server created clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created clusterrole.rbac.authorization.k8s.io/system:metrics-server created rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created service/metrics-server created deployment.apps/metrics-server created apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created kubectl get deployment metrics-server -n kube-system NAME READY UP-TO-DATE AVAILABLE AGE metrics-server 1/1 1 1 15m
Configuring CoreDNS auto scaling
Now let's configure auto scaling for CoreDNS.
$ aws eks create-addon \ --cluster-name non-97-eks-129 \ --addon-name coredns \ --addon-version v1.11.1-eksbuild.9 { "addon": { "addonName": "coredns", "clusterName": "non-97-eks-129", "status": "CREATING", "addonVersion": "v1.11.1-eksbuild.9", "health": { "issues": [] }, "addonArn": "arn:aws:eks:us-east-1:<AWSアカウントID>:addon/non-97-eks-129/coredns/c4c7dc2a-46ad-4837-0397-57b8ef4fa96a", "createdAt": "2024-05-27T13:35:00.355000+09:00", "modifiedAt": "2024-05-27T13:35:00.389000+09:00", "tags": {} } } $ aws eks update-addon \ --cluster-name non-97-eks-129 \ --addon-name coredns \ --resolve-conflicts PRESERVE \ --configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 2, "maxReplicas": 10}}' { "update": { "id": "8b3ac3f8-819a-3f57-9fd2-97389feed79c", "status": "InProgress", "type": "AddonUpdate", "params": [ { "type": "ResolveConflicts", "value": "PRESERVE" }, { "type": "ConfigurationValues", "value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}" } ], "createdAt": "2024-05-27T13:35:59.154000+09:00", "errors": [] } } $ aws eks describe-addon \ --cluster-name non-97-eks-129 \ --addon-name coredns { "addon": { "addonName": "coredns", "clusterName": "non-97-eks-129", "status": "ACTIVE", "addonVersion": "v1.11.1-eksbuild.9", "health": { "issues": [] }, "addonArn": "arn:aws:eks:us-east-1:<AWSアカウントID>:addon/non-97-eks-129/coredns/c4c7dc2a-46ad-4837-0397-57b8ef4fa96a", "createdAt": "2024-05-27T13:35:00.355000+09:00", "modifiedAt": "2024-05-27T13:36:02.438000+09:00", "tags": {}, "configurationValues": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}" } }
Let's check the CoreDNS Pods.
$ kubectl top pod -l k8s-app=kube-dns -n kube-system NAME CPU(cores) MEMORY(bytes) coredns-bf47b49b-kzxdh 1m 11Mi coredns-bf47b49b-qf9jm 1m 11Mi
Adding 10 t4g.micro nodes
Let's add 10 t4g.micro nodes.
$ eksctl create nodegroup \ --cluster=non-97-eks-129 \ --node-type=t4g.micro \ --nodes=10 \ --nodes-min=10 \ --nodes-max=10 \ --node-volume-size=2 \ --node-volume-type=gp3 \ --node-ami-family=Bottlerocket \ --spot \ --managed $ kubectl top node NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-192-168-1-0.ec2.internal 22m 1% 277Mi 54% ip-192-168-11-78.ec2.internal 17m 0% 272Mi 53% ip-192-168-12-236.ec2.internal 22m 1% 454Mi 33% ip-192-168-14-155.ec2.internal 17m 0% 273Mi 53% ip-192-168-24-76.ec2.internal 33m 1% 270Mi 52% ip-192-168-3-239.ec2.internal 19m 0% 282Mi 55% ip-192-168-38-149.ec2.internal 29m 1% 266Mi 51% ip-192-168-41-178.ec2.internal 17m 0% 262Mi 51% ip-192-168-48-176.ec2.internal 16m 0% 363Mi 26% ip-192-168-48-185.ec2.internal 22m 1% 407Mi 29% ip-192-168-49-7.ec2.internal 22m 1% 274Mi 53% ip-192-168-57-3.ec2.internal 24m 1% 276Mi 53% ip-192-168-59-122.ec2.internal 19m 0% 420Mi 30% ip-192-168-61-175.ec2.internal 20m 1% 283Mi 55% $ kubectl top pod -l k8s-app=kube-dns -n kube-system NAME CPU(cores) MEMORY(bytes) coredns-bf47b49b-f7rzg 1m 12Mi coredns-bf47b49b-kzxdh 1m 12Mi
The CoreDNS Pod count stayed at 2.
I left it alone for just under an hour, but it remained at 2.
Putting load on CoreDNS with name resolution
We'll use `dnsperf` to put load on CoreDNS.
The manifest file used is as follows.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dns-resolution
  namespace: default
spec:
  selector:
    matchLabels:
      app: dns-resolution
  replicas: 5
  template:
    metadata:
      labels:
        app: dns-resolution
    spec:
      containers:
        - name: dns-resolution
          image: <AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
          env:
            - name: DOMAIN
              value: www.non-97.net
            - name: SERVER_ADDR
              value: 10.100.0.10
            - name: MAXRUNS
              value: "10"
            - name: CLIENTS
              value: "15"
Deploy it and check the CoreDNS Pod count.
$ kubectl apply -f ./dns-resolution-deployment.yml deployment.apps/dns-resolution configured $ kubectl get pod -n default NAME READY STATUS RESTARTS AGE dns-resolution-75985fd469-99mqj 1/1 Running 0 75s dns-resolution-75985fd469-k5g4v 1/1 Running 0 75s dns-resolution-75985fd469-v7h8w 1/1 Running 0 75s dns-resolution-75985fd469-z4hsv 1/1 Running 0 75s dns-resolution-75985fd469-zvv9c 1/1 Running 0 75s $ kubectl top pod -l k8s-app=kube-dns -n kube-system NAME CPU(cores) MEMORY(bytes) coredns-bf47b49b-f7rzg 1530m 22Mi coredns-bf47b49b-w9r87 1511m 19Mi $ kubectl top node NAME CPU(cores) CPU% MEMORY(bytes) MEMORY% ip-192-168-1-0.ec2.internal 478m 24% 301Mi 58% ip-192-168-11-78.ec2.internal 16m 0% 269Mi 52% ip-192-168-12-236.ec2.internal 1854m 96% 495Mi 36% ip-192-168-14-155.ec2.internal 17m 0% 261Mi 51% ip-192-168-19-132.ec2.internal 487m 25% 412Mi 30% ip-192-168-24-76.ec2.internal 26m 1% 265Mi 51% ip-192-168-3-239.ec2.internal 236m 12% 275Mi 53% ip-192-168-38-149.ec2.internal 20m 1% 262Mi 51% ip-192-168-41-178.ec2.internal 20m 1% 266Mi 52% ip-192-168-48-176.ec2.internal 543m 28% 415Mi 30% ip-192-168-48-185.ec2.internal 1849m 95% 452Mi 33% ip-192-168-49-7.ec2.internal 23m 1% 269Mi 52% ip-192-168-57-3.ec2.internal 15m 0% 273Mi 53% ip-192-168-61-175.ec2.internal 24m 1% 264Mi 51%
Still 2, unchanged. I waited several minutes, but nothing happened.
In the end, the specific conditions that trigger CoreDNS Pod auto scaling remained unclear.
Incidentally, `kubectl rollout history` showed no particular activity either.
kubectl rollout history deployment/coredns -n kube-system deployment.apps/coredns REVISION CHANGE-CAUSE 1 <none> 2 <none> $ kubectl rollout history deployment/coredns -n kube-system --revision 1 deployment.apps/coredns with revision #1 Pod Template: Labels: eks.amazonaws.com/component=coredns k8s-app=kube-dns pod-template-hash=54d6f577c6 Service Account: coredns Containers: coredns: Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.4 Ports: 53/UDP, 53/TCP, 9153/TCP Host Ports: 0/UDP, 0/TCP, 0/TCP Args: -conf /etc/coredns/Corefile Limits: memory: 170Mi Requests: cpu: 100m memory: 70Mi Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5 Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3 Environment: <none> Mounts: /etc/coredns from config-volume (ro) Volumes: config-volume: Type: ConfigMap (a volume populated by a ConfigMap) Name: coredns Optional: false Priority Class Name: system-cluster-critical $ kubectl rollout history deployment/coredns -n kube-system --revision 2 deployment.apps/coredns with revision #2 Pod Template: Labels: eks.amazonaws.com/component=coredns k8s-app=kube-dns pod-template-hash=bf47b49b Service Account: coredns Containers: coredns: Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9 Ports: 53/UDP, 53/TCP, 9153/TCP Host Ports: 0/UDP, 0/TCP, 0/TCP Args: -conf /etc/coredns/Corefile Limits: memory: 170Mi Requests: cpu: 100m memory: 70Mi Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5 Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3 Environment: <none> Mounts: /etc/coredns from config-volume (ro) Volumes: config-volume: Type: ConfigMap (a volume populated by a ConfigMap) Name: coredns Optional: false Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector 
k8s-app=kube-dns Priority Class Name: system-cluster-critical
CoreDNS auto scaling can now be configured via the EKS add-on
I've walked through the update in which Amazon EKS natively supports auto scaling of CoreDNS Pods.
Being able to configure CoreDNS auto scaling through the EKS add-on is a welcome improvement.
That said, I couldn't pin down the specific conditions under which the CoreDNS Pods actually scale, which still bothers me. If I find the time, I'll test again and update this post.
I hope this article helps someone.
That's all from のんピ (@non____97), Consulting Department, AWS Business Division!