I want an easy way to configure CoreDNS to Auto Scale based on load
Hello, this is のんピ (@non____97).
Have you ever wished you could easily configure CoreDNS to Auto Scale based on load? I have.
If CoreDNS goes down, Pods can no longer resolve names, which affects Pod-to-Pod communication and traffic to destinations outside the cluster. A mechanism that keeps CoreDNS highly available is therefore necessary.
With a recent update, EKS now natively supports Auto Scaling of CoreDNS Pods.
This can now be done entirely through the EKS add-on configuration, with no need to separately set up a tool such as Cluster Proportional Autoscaler.
I was curious how it behaves, so I tried it out.
Summary up front
- CoreDNS Pod Auto Scaling can now be configured via the EKS add-on
- The minimum is 2 replicas and the maximum is 1,000
- You must meet the supported EKS cluster version, platform version, and CoreDNS EKS add-on version
- The scaling conditions include the number of Nodes and the number of Node CPU cores
- I could not pin down the exact conditions during testing
Configuration
The configuration steps are documented in the following AWS official documentation.
The prerequisites are as follows:
- You must use the CoreDNS EKS add-on
- The EKS cluster must be running a supported cluster version and platform version
- The cluster must be running a CoreDNS EKS add-on version supported for that EKS cluster
The minimum supported cluster versions and the corresponding CoreDNS EKS add-on versions are as follows:
| Kubernetes version | Platform version | CoreDNS EKS add-on version |
| --- | --- | --- |
| 1.29.3 | eks.7 | v1.11.1-eksbuild.9 |
| 1.28.8 | eks.13 | v1.10.1-eksbuild.11 |
| 1.27.12 | eks.17 | v1.10.1-eksbuild.11 |
| 1.26.15 | eks.18 | v1.9.3-eksbuild.15 |
| 1.25.16 | eks.19 | v1.9.3-eksbuild.15 |
This feature cannot be used on EKS clusters running Kubernetes versions older than those listed above.
What I was curious about is what triggers the Auto Scaling. Skimming the documentation, it appears to scale when you increase the number of Nodes or the number of Node CPU cores.
This CoreDNS autoscaler continuously monitors the cluster state, including the number of nodes and CPU cores. Based on that information, the controller will dynamically adapt the number of replicas of the CoreDNS deployment in an EKS cluster.
(snip)
As you change the number of nodes and CPU cores of nodes in the cluster, Amazon EKS scales the number of replicas of the CoreDNS deployment.
So it is not based on the CPU load of the CoreDNS Pods themselves? I will verify this hands-on as well.
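EKS does not publish the exact formula it uses, but the wording above (number of nodes and CPU cores) matches the linear mode of the Cluster Proportional Autoscaler that this feature replaces. Here is a minimal sketch of that style of calculation; the parameter names and default values are my own illustration, not EKS's actual settings:

```python
import math

def proportional_replicas(nodes: int, cores: int,
                          cores_per_replica: int = 256,
                          nodes_per_replica: int = 16,
                          min_replicas: int = 2,
                          max_replicas: int = 10) -> int:
    """Replica count in the style of Cluster Proportional Autoscaler's
    linear mode: scale with whichever of node count or total CPU core
    count demands more replicas, clamped to [min_replicas, max_replicas].
    The threshold defaults here are illustrative, not EKS's real values."""
    by_cores = math.ceil(cores / cores_per_replica)
    by_nodes = math.ceil(nodes / nodes_per_replica)
    return max(min_replicas, min(max_replicas, max(by_cores, by_nodes)))

# 2 nodes x 2 vCPUs (t4g.small): stays at the floor of 2 replicas
print(proportional_replicas(nodes=2, cores=4))      # -> 2
# 200 nodes x 4 vCPUs: the node count dominates and hits the cap of 10
print(proportional_replicas(nodes=200, cores=800))  # -> 10
```

Under this model, replicas track cluster capacity rather than CoreDNS Pod CPU load, which is consistent with the documentation quoted above.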
Creating the EKS cluster
First, I create an EKS cluster with eksctl.
EKS added support for Kubernetes 1.30 just the other day, so I create the cluster with 1.30.
$ eksctl create cluster \
--name=non-97-eks \
--version 1.30 \
--nodes=2 \
--node-volume-size=2 \
--node-volume-type=gp3 \
--node-ami-family=Bottlerocket \
--instance-types=t4g.small \
--spot \
--managed \
--region us-east-1
2024-05-26 09:29:19 [ℹ] eksctl version 0.179.0-dev+b8f1ac4d7.2024-05-24T09:39:53Z
2024-05-26 09:29:19 [ℹ] using region us-east-1
2024-05-26 09:29:20 [ℹ] skipping us-east-1e from selection because it doesn't support the following instance type(s): t4g.small
2024-05-26 09:29:20 [ℹ] setting availability zones to [us-east-1c us-east-1a]
2024-05-26 09:29:20 [ℹ] subnets for us-east-1c - public:192.168.0.0/19 private:192.168.64.0/19
2024-05-26 09:29:20 [ℹ] subnets for us-east-1a - public:192.168.32.0/19 private:192.168.96.0/19
2024-05-26 09:29:20 [ℹ] nodegroup "ng-bf93e531" will use "" [Bottlerocket/1.30]
2024-05-26 09:29:20 [ℹ] using Kubernetes version 1.30
2024-05-26 09:29:20 [ℹ] creating EKS cluster "non-97-eks" in "us-east-1" region with managed nodes
2024-05-26 09:29:20 [ℹ] will create 2 separate CloudFormation stacks for cluster itself and the initial managed nodegroup
2024-05-26 09:29:20 [ℹ] if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-1 --cluster=non-97-eks'
2024-05-26 09:29:20 [ℹ] Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "non-97-eks" in "us-east-1"
2024-05-26 09:29:20 [ℹ] CloudWatch logging will not be enabled for cluster "non-97-eks" in "us-east-1"
2024-05-26 09:29:20 [ℹ] you can enable it with 'eksctl utils update-cluster-logging --enable-types={SPECIFY-YOUR-LOG-TYPES-HERE (e.g. all)} --region=us-east-1 --cluster=non-97-eks'
2024-05-26 09:29:20 [ℹ]
2 sequential tasks: { create cluster control plane "non-97-eks",
2 sequential sub-tasks: {
wait for control plane to become ready,
create managed nodegroup "ng-bf93e531",
}
}
2024-05-26 09:29:20 [ℹ] building cluster stack "eksctl-non-97-eks-cluster"
2024-05-26 09:29:22 [ℹ] deploying stack "eksctl-non-97-eks-cluster"
2024-05-26 09:29:52 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:30:22 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:31:23 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:32:24 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:33:25 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:34:26 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:35:27 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:36:28 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:37:28 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:38:29 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:39:30 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-cluster"
2024-05-26 09:41:36 [ℹ] building managed nodegroup stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:41:38 [ℹ] deploying stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:41:38 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:42:09 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:42:50 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:43:26 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-bf93e531"
2024-05-26 09:43:26 [ℹ] waiting for the control plane to become ready
2024-05-26 09:43:27 [✔] saved kubeconfig as "/<home directory path>/.kube/config"
2024-05-26 09:43:27 [ℹ] no tasks
2024-05-26 09:43:27 [✔] all EKS cluster resources for "non-97-eks" have been created
2024-05-26 09:43:27 [✔] created 0 nodegroup(s) in cluster "non-97-eks"
2024-05-26 09:43:27 [ℹ] nodegroup "ng-bf93e531" has 2 node(s)
2024-05-26 09:43:27 [ℹ] node "ip-192-168-4-116.ec2.internal" is ready
2024-05-26 09:43:27 [ℹ] node "ip-192-168-47-147.ec2.internal" is ready
2024-05-26 09:43:27 [ℹ] waiting for at least 2 node(s) to become ready in "ng-bf93e531"
2024-05-26 09:43:28 [ℹ] nodegroup "ng-bf93e531" has 2 node(s)
2024-05-26 09:43:28 [ℹ] node "ip-192-168-4-116.ec2.internal" is ready
2024-05-26 09:43:28 [ℹ] node "ip-192-168-47-147.ec2.internal" is ready
2024-05-26 09:43:28 [✔] created 1 managed nodegroup(s) in cluster "non-97-eks"
2024-05-26 09:43:35 [ℹ] kubectl command should work with "/<home directory path>/.kube/config", try 'kubectl get nodes'
2024-05-26 09:43:35 [✔] EKS cluster "non-97-eks" in "us-east-1" region is ready
Confirm that two CoreDNS Pods are running by default.
$ kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
aws-node-p6kvd 2/2 Running 0 11m
aws-node-wk7dv 2/2 Running 0 11m
coredns-586b798467-cntvg 1/1 Running 0 17m
coredns-586b798467-rf77f 1/1 Running 0 17m
kube-proxy-ml8jb 1/1 Running 0 11m
kube-proxy-nzqwd 1/1 Running 0 11m
$ kubectl get service -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.100.0.10 <none> 53/UDP,53/TCP,9153/TCP 19m
$ kubectl describe pod -n kube-system coredns-586b798467-cntvg
Name: coredns-586b798467-cntvg
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: coredns
Node: ip-192-168-4-116.ec2.internal/192.168.4.116
Start Time: Sun, 26 May 2024 09:42:49 +0900
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
pod-template-hash=586b798467
Annotations: <none>
Status: Running
IP: 192.168.24.128
IPs:
IP: 192.168.24.128
Controlled By: ReplicaSet/coredns-586b798467
Containers:
coredns:
Container ID: containerd://539adf8aa70da096a5512f0b9782cab2877ed9eee8087718004994504c3c922e
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8
Image ID: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns@sha256:d21885a6632343ecd25d468b54681a0bd512055174bb17bc35a08cb38a965f12
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
State: Running
Started: Sun, 26 May 2024 09:42:50 +0900
Ready: True
Restart Count: 0
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-zfvvp (ro)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
kube-api-access-zfvvp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly op=Exists
node-role.kubernetes.io/control-plane:NoSchedule
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 13m (x34 over 18m) default-scheduler no nodes available to schedule pods
Normal Pulling 12m kubelet Pulling image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8"
Normal Pulled 12m kubelet Successfully pulled image "602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8" in 869ms (869ms including waiting). Image size: 17282732 bytes.
Normal Created 12m kubelet Created container coredns
Normal Started 12m kubelet Started container coredns
Installing metrics-server
I want to check Pod and Node CPU utilization, so I install metrics-server.
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
$ kubectl get deployment metrics-server -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 74s
$ kubectl top pod -n kube-system
NAME CPU(cores) MEMORY(bytes)
aws-node-p6kvd 3m 41Mi
aws-node-wk7dv 2m 41Mi
coredns-586b798467-cntvg 1m 12Mi
coredns-586b798467-rf77f 2m 12Mi
kube-proxy-ml8jb 1m 11Mi
kube-proxy-nzqwd 1m 13Mi
metrics-server-7ffbc6d68-49bvd 3m 17Mi
Pod metrics are now available.
Adding the CoreDNS EKS add-on
Next, I add the CoreDNS EKS add-on.
By default, the CoreDNS EKS add-on is not installed.
$ aws eks describe-addon \
--cluster-name non-97-eks \
--addon-name coredns
An error occurred (ResourceNotFoundException) when calling the DescribeAddon operation: No addon: coredns found in cluster: non-97-eks
So I start by adding the CoreDNS EKS add-on.
The procedure is described in the following AWS official documentation.
This time I will use the AWS CLI.
First, check the version of CoreDNS currently running in the EKS cluster.
$ kubectl describe deployment coredns \
--namespace kube-system \
| grep coredns: \
| cut -d : -f 3
v1.11.1-eksbuild.8
You can also check it from the Deployment.
$ kubectl get deployment coredns -n kube-system -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
creationTimestamp: "2024-05-26T00:37:01Z"
generation: 1
labels:
eks.amazonaws.com/component: coredns
k8s-app: kube-dns
kubernetes.io/name: CoreDNS
name: coredns
namespace: kube-system
resourceVersion: "1681"
uid: 5e7ba4c0-3a91-4f07-870d-56a513f5c1f0
spec:
progressDeadlineSeconds: 600
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
eks.amazonaws.com/component: coredns
k8s-app: kube-dns
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
eks.amazonaws.com/component: coredns
k8s-app: kube-dns
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
- key: kubernetes.io/arch
operator: In
values:
- amd64
- arm64
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- podAffinityTerm:
labelSelector:
matchExpressions:
- key: k8s-app
operator: In
values:
- kube-dns
topologyKey: kubernetes.io/hostname
weight: 100
containers:
- args:
- -conf
- /etc/coredns/Corefile
image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.8
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 5
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: coredns
ports:
- containerPort: 53
name: dns
protocol: UDP
- containerPort: 53
name: dns-tcp
protocol: TCP
- containerPort: 9153
name: metrics
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /ready
port: 8181
scheme: HTTP
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 1
resources:
limits:
memory: 170Mi
requests:
cpu: 100m
memory: 70Mi
securityContext:
allowPrivilegeEscalation: false
capabilities:
add:
- NET_BIND_SERVICE
drop:
- ALL
readOnlyRootFilesystem: true
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /etc/coredns
name: config-volume
readOnly: true
dnsPolicy: Default
priorityClassName: system-cluster-critical
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
serviceAccount: coredns
serviceAccountName: coredns
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
- key: CriticalAddonsOnly
operator: Exists
topologySpreadConstraints:
- labelSelector:
matchLabels:
k8s-app: kube-dns
maxSkew: 1
topologyKey: topology.kubernetes.io/zone
whenUnsatisfiable: ScheduleAnyway
volumes:
- configMap:
defaultMode: 420
items:
- key: Corefile
path: Corefile
name: coredns
name: config-volume
status:
availableReplicas: 2
conditions:
- lastTransitionTime: "2024-05-26T00:42:50Z"
lastUpdateTime: "2024-05-26T00:42:50Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
- lastTransitionTime: "2024-05-26T00:37:01Z"
lastUpdateTime: "2024-05-26T00:42:51Z"
message: ReplicaSet "coredns-586b798467" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
observedGeneration: 1
readyReplicas: 2
replicas: 2
updatedReplicas: 2
Add the CoreDNS EKS add-on at the same version as the CoreDNS currently running in the cluster.
$ aws eks create-addon \
--cluster-name non-97-eks \
--addon-name coredns \
--addon-version v1.11.1-eksbuild.8
{
"addon": {
"addonName": "coredns",
"clusterName": "non-97-eks",
"status": "CREATING",
"addonVersion": "v1.11.1-eksbuild.8",
"health": {
"issues": []
},
"addonArn": "arn:aws:eks:us-east-1:<AWS account ID>:addon/non-97-eks/coredns/a2c7d93c-d864-1896-c7ee-065e272910f9",
"createdAt": "2024-05-26T10:17:50.822000+09:00",
"modifiedAt": "2024-05-26T10:17:50.842000+09:00",
"tags": {}
}
}
$ aws eks describe-addon \
--cluster-name non-97-eks \
--addon-name coredns
{
"addon": {
"addonName": "coredns",
"clusterName": "non-97-eks",
"status": "ACTIVE",
"addonVersion": "v1.11.1-eksbuild.8",
"health": {
"issues": []
},
"addonArn": "arn:aws:eks:us-east-1:<AWS account ID>:addon/non-97-eks/coredns/a2c7d93c-d864-1896-c7ee-065e272910f9",
"createdAt": "2024-05-26T10:17:50.822000+09:00",
"modifiedAt": "2024-05-26T10:18:05.190000+09:00",
"tags": {}
}
}
I could also confirm in the management console that the CoreDNS EKS add-on had been added.
Configuring CoreDNS Auto Scaling
Now let's configure CoreDNS Auto Scaling.
As a test, I configure it to Auto Scale between a minimum of 2 and a maximum of 10 replicas.
$ aws eks update-addon \
--cluster-name non-97-eks \
--addon-name coredns \
--resolve-conflicts PRESERVE \
--configuration-values '{"autoScaling":{"enabled":true}, "minReplicas": 2, "maxReplicas": 10}'
An error occurred (InvalidParameterException) when calling the UpdateAddon operation: ConfigurationValue provided in request is not supported: Json schema validation failed with error: [$.autoScaling: is not defined in the schema and the schema does not allow additional properties, $.minReplicas: is not defined in the schema and the schema does not allow additional properties, $.maxReplicas: is not defined in the schema and the schema does not allow additional properties]
I got an error saying, in effect, "there is no such parameter."
Checking the configuration schema for the v1.11.1-eksbuild.8 add-on, autoScaling is indeed not defined.
{
"$ref": "#/definitions/Coredns",
"$schema": "http://json-schema.org/draft-06/schema#",
"definitions": {
"Coredns": {
"additionalProperties": false,
"properties": {
"affinity": {
"default": {
"affinity": {
"nodeAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": {
"nodeSelectorTerms": [
{
"matchExpressions": [
{
"key": "kubernetes.io/os",
"operator": "In",
"values": [
"linux"
]
},
{
"key": "kubernetes.io/arch",
"operator": "In",
"values": [
"amd64",
"arm64"
]
}
]
}
]
}
},
"podAntiAffinity": {
"preferredDuringSchedulingIgnoredDuringExecution": [
{
"podAffinityTerm": {
"labelSelector": {
"matchExpressions": [
{
"key": "k8s-app",
"operator": "In",
"values": [
"kube-dns"
]
}
]
},
"topologyKey": "kubernetes.io/hostname"
},
"weight": 100
}
]
}
}
},
"description": "Affinity of the coredns pods",
"type": [
"object",
"null"
]
},
"computeType": {
"type": "string"
},
"corefile": {
"description": "Entire corefile contents to use with installation",
"type": "string"
},
"nodeSelector": {
"additionalProperties": {
"type": "string"
},
"type": "object"
},
"podAnnotations": {
"properties": {},
"title": "The podAnnotations Schema",
"type": "object"
},
"podDisruptionBudget": {
"description": "podDisruptionBudget configurations",
"enabled": {
"default": true,
"description": "the option to enable managed PDB",
"type": "boolean"
},
"maxUnavailable": {
"anyOf": [
{
"pattern": ".*%$",
"type": "string"
},
{
"type": "integer"
}
],
"default": 1,
"description": "minAvailable value for managed PDB, can be either string or integer; if it's string, should end with %"
},
"minAvailable": {
"anyOf": [
{
"pattern": ".*%$",
"type": "string"
},
{
"type": "integer"
}
],
"description": "maxUnavailable value for managed PDB, can be either string or integer; if it's string, should end with %"
},
"type": "object"
},
"podLabels": {
"properties": {},
"title": "The podLabels Schema",
"type": "object"
},
"replicaCount": {
"type": "integer"
},
"resources": {
"$ref": "#/definitions/Resources"
},
"tolerations": {
"default": [
{
"key": "CriticalAddonsOnly",
"operator": "Exists"
},
{
"effect": "NoSchedule",
"key": "node-role.kubernetes.io/control-plane"
}
],
"description": "Tolerations of the coredns pod",
"items": {
"type": "object"
},
"type": "array"
},
"topologySpreadConstraints": {
"description": "The coredns pod topology spread constraints",
"type": "array"
}
},
"title": "Coredns",
"type": "object"
},
"Limits": {
"additionalProperties": false,
"properties": {
"cpu": {
"type": "string"
},
"memory": {
"type": "string"
}
},
"title": "Limits",
"type": "object"
},
"Resources": {
"additionalProperties": false,
"properties": {
"limits": {
"$ref": "#/definitions/Limits"
},
"requests": {
"$ref": "#/definitions/Limits"
}
},
"title": "Resources",
"type": "object"
}
}
}
Let's update to the latest version, v1.11.1-eksbuild.9.
$ aws eks update-addon \
--cluster-name non-97-eks \
--addon-name coredns \
--resolve-conflicts PRESERVE \
--addon-version v1.11.1-eksbuild.9
{
"update": {
"id": "212a6290-6a61-357a-b2e6-0637660c6d6f",
"status": "InProgress",
"type": "AddonUpdate",
"params": [
{
"type": "AddonVersion",
"value": "v1.11.1-eksbuild.9"
},
{
"type": "ResolveConflicts",
"value": "PRESERVE"
}
],
"createdAt": "2024-05-26T10:31:49.570000+09:00",
"errors": []
}
}
Checking the schema after the update, an autoScaling property has appeared. It looks like it can scale up to a maximum of 1,000 Pods.
{
"$ref": "#/definitions/Coredns",
"$schema": "http://json-schema.org/draft-06/schema#",
"definitions": {
"Coredns": {
"additionalProperties": false,
"properties": {
"affinity": {
"default": {
"affinity": {
"nodeAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": {
"nodeSelectorTerms": [
{
"matchExpressions": [
{
"key": "kubernetes.io/os",
"operator": "In",
"values": [
"linux"
]
},
{
"key": "kubernetes.io/arch",
"operator": "In",
"values": [
"amd64",
"arm64"
]
}
]
}
]
}
},
"podAntiAffinity": {
"preferredDuringSchedulingIgnoredDuringExecution": [
{
"podAffinityTerm": {
"labelSelector": {
"matchExpressions": [
{
"key": "k8s-app",
"operator": "In",
"values": [
"kube-dns"
]
}
]
},
"topologyKey": "kubernetes.io/hostname"
},
"weight": 100
}
]
}
}
},
"description": "Affinity of the coredns pods",
"type": [
"object",
"null"
]
},
"autoScaling": {
"additionalProperties": false,
"description": "autoScaling configurations",
"properties": {
"enabled": {
"default": false,
"description": "the option to enable eks managed autoscaling for coredns",
"type": "boolean"
},
"maxReplicas": {
"description": "the max value that autoscaler can scale up the coredns replicas to",
"maximum": 1000,
"minimum": 2,
"type": "integer"
},
"minReplicas": {
"default": 2,
"description": "the min value that autoscaler can scale down the coredns replicas to",
"maximum": 1000,
"minimum": 2,
"type": "integer"
}
},
"required": [
"enabled"
],
"type": "object"
},
"computeType": {
"type": "string"
},
"corefile": {
"description": "Entire corefile contents to use with installation",
"type": "string"
},
"nodeSelector": {
"additionalProperties": {
"type": "string"
},
"type": "object"
},
"podAnnotations": {
"properties": {},
"title": "The podAnnotations Schema",
"type": "object"
},
"podDisruptionBudget": {
"description": "podDisruptionBudget configurations",
"properties": {
"enabled": {
"default": true,
"description": "the option to enable managed PDB",
"type": "boolean"
},
"maxUnavailable": {
"anyOf": [
{
"pattern": ".*%$",
"type": "string"
},
{
"type": "integer"
}
],
"default": 1,
"description": "maxUnavailable value for managed PDB, can be either string or integer; if it's string, should end with %"
},
"minAvailable": {
"anyOf": [
{
"pattern": ".*%$",
"type": "string"
},
{
"type": "integer"
}
],
"description": "minAvailable value for managed PDB, can be either string or integer; if it's string, should end with %"
}
},
"type": "object"
},
"podLabels": {
"properties": {},
"title": "The podLabels Schema",
"type": "object"
},
"replicaCount": {
"type": "integer"
},
"resources": {
"$ref": "#/definitions/Resources"
},
"tolerations": {
"default": [
{
"key": "CriticalAddonsOnly",
"operator": "Exists"
},
{
"effect": "NoSchedule",
"key": "node-role.kubernetes.io/control-plane"
}
],
"description": "Tolerations of the coredns pod",
"items": {
"type": "object"
},
"type": "array"
},
"topologySpreadConstraints": {
"description": "The coredns pod topology spread constraints",
"type": "array"
}
},
"title": "Coredns",
"type": "object"
},
"Limits": {
"additionalProperties": false,
"properties": {
"cpu": {
"type": "string"
},
"memory": {
"type": "string"
}
},
"title": "Limits",
"type": "object"
},
"Resources": {
"additionalProperties": false,
"properties": {
"limits": {
"$ref": "#/definitions/Limits"
},
"requests": {
"$ref": "#/definitions/Limits"
}
},
"title": "Resources",
"type": "object"
}
}
}
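Before calling update-addon, it can help to see how the constraints in the schema above behave. Here is a small sketch (my own helper, not an AWS API) that mirrors the autoScaling rules from the v1.11.1-eksbuild.9 schema: enabled is required, and minReplicas/maxReplicas must be integers between 2 and 1,000.

```python
def validate_autoscaling(config: dict) -> list:
    """Check the 'autoScaling' block of a configuration-values payload
    against the constraints in the v1.11.1-eksbuild.9 schema shown above.
    Returns a list of violation messages (an empty list means valid)."""
    errors = []
    auto = config.get("autoScaling")
    if auto is None:
        return errors  # the autoScaling block itself is optional
    if "enabled" not in auto:
        errors.append("autoScaling.enabled is required")
    for key in ("minReplicas", "maxReplicas"):
        value = auto.get(key)
        if value is None:
            continue  # both replica fields are optional
        if not isinstance(value, int) or not 2 <= value <= 1000:
            errors.append(f"{key} must be an integer between 2 and 1000")
    return errors

# The value used later in this post passes validation:
print(validate_autoscaling(
    {"autoScaling": {"enabled": True, "minReplicas": 2, "maxReplicas": 10}}))
# An out-of-range maximum is rejected:
print(validate_autoscaling({"autoScaling": {"enabled": True, "maxReplicas": 1001}}))
```

This mirrors the server-side JSON Schema validation that produced the InvalidParameterException earlier, just runnable locally.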
Now, let's try the configuration again.
$ aws eks update-addon \
--cluster-name non-97-eks \
--addon-name coredns \
--resolve-conflicts PRESERVE \
--configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 2, "maxReplicas": 10}}'
{
"update": {
"id": "99e3ba65-bc09-3b3b-a7f1-5149884a3864",
"status": "InProgress",
"type": "AddonUpdate",
"params": [
{
"type": "ResolveConflicts",
"value": "PRESERVE"
},
{
"type": "ConfigurationValues",
"value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
}
],
"createdAt": "2024-05-26T10:39:38.591000+09:00",
"errors": []
}
}
$ aws eks describe-addon \
--cluster-name non-97-eks \
--addon-name coredns
{
"addon": {
"addonName": "coredns",
"clusterName": "non-97-eks",
"status": "ACTIVE",
"addonVersion": "v1.11.1-eksbuild.9",
"health": {
"issues": []
},
"addonArn": "arn:aws:eks:us-east-1:<AWS account ID>:addon/non-97-eks/coredns/a2c7d93c-d864-1896-c7ee-065e272910f9",
"createdAt": "2024-05-26T10:17:50.822000+09:00",
"modifiedAt": "2024-05-26T10:39:41.867000+09:00",
"tags": {},
"configurationValues": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
}
}
The configuration succeeded.
Note that nothing visibly changed at this point.
$ kubectl get deployments -n kube-system coredns
NAME READY UP-TO-DATE AVAILABLE AGE
coredns 2/2 2 2 66m
$ kubectl describe deployments coredns -n kube-system
Name: coredns
Namespace: kube-system
CreationTimestamp: Sun, 26 May 2024 09:37:01 +0900
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
kubernetes.io/name=CoreDNS
Annotations: deployment.kubernetes.io/revision: 2
Selector: eks.amazonaws.com/component=coredns,k8s-app=kube-dns
Replicas: 2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 25% max surge
Pod Template:
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
Service Account: coredns
Containers:
coredns:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
Priority Class Name: system-cluster-critical
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: coredns-586b798467 (0/0 replicas created)
NewReplicaSet: coredns-86d5d9b668 (2/2 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 11m deployment-controller Scaled up replica set coredns-86d5d9b668 to 1
Normal ScalingReplicaSet 11m deployment-controller Scaled down replica set coredns-586b798467 to 1 from 2
Normal ScalingReplicaSet 11m deployment-controller Scaled up replica set coredns-86d5d9b668 to 2 from 1
Normal ScalingReplicaSet 11m deployment-controller Scaled down replica set coredns-586b798467 to 0 from 1
Confirming that it Auto Scales
Increasing the Node count to 10
Let's confirm that Auto Scaling actually happens.
As mentioned earlier, the documentation says it scales based on the number of Nodes, so I increase the Node count to 10 and observe.
$ eksctl scale nodegroup \
--cluster=non-97-eks \
--name=ng-bf93e531 \
--nodes=10 \
--nodes-min=10 \
--nodes-max=10
2024-05-26 10:47:06 [ℹ] scaling nodegroup "ng-bf93e531" in cluster non-97-eks
2024-05-26 10:47:08 [ℹ] initiated scaling of nodegroup
2024-05-26 10:47:08 [ℹ] to see the status of the scaling run `eksctl get nodegroup --cluster non-97-eks --region us-east-1 --name ng-bf93e531`
Check the number of Nodes and Pods.
$ kubectl get node
NAME STATUS ROLES AGE VERSION
ip-192-168-12-9.ec2.internal Ready <none> 15s v1.30.0-eks-fff26e3
ip-192-168-14-243.ec2.internal Ready <none> 17s v1.30.0-eks-fff26e3
ip-192-168-18-3.ec2.internal Ready <none> 17s v1.30.0-eks-fff26e3
ip-192-168-23-86.ec2.internal Ready <none> 12s v1.30.0-eks-fff26e3
ip-192-168-32-210.ec2.internal Ready <none> 17s v1.30.0-eks-fff26e3
ip-192-168-38-149.ec2.internal Ready <none> 16s v1.30.0-eks-fff26e3
ip-192-168-4-116.ec2.internal Ready <none> 65m v1.30.0-eks-fff26e3
ip-192-168-42-185.ec2.internal Ready <none> 17s v1.30.0-eks-fff26e3
ip-192-168-47-147.ec2.internal Ready <none> 65m v1.30.0-eks-fff26e3
ip-192-168-61-1.ec2.internal Ready <none> 17s v1.30.0-eks-fff26e3
$ kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
aws-node-5hn7v 2/2 Running 0 32s
aws-node-895dk 2/2 Running 0 35s
aws-node-gwk9x 2/2 Running 0 37s
aws-node-nlmqb 2/2 Running 0 37s
aws-node-p6kvd 2/2 Running 0 65m
aws-node-ppxpj 2/2 Running 0 37s
aws-node-qbnsr 2/2 Running 0 36s
aws-node-vn45f 2/2 Running 0 37s
aws-node-w2cq2 2/2 Running 0 36s
aws-node-wk7dv 2/2 Running 0 65m
coredns-86d5d9b668-rhvrg 1/1 Running 0 16m
coredns-86d5d9b668-tqc5b 1/1 Running 0 16m
kube-proxy-4vxl5 1/1 Running 0 37s
kube-proxy-8jcp5 1/1 Running 0 37s
kube-proxy-8w7lw 1/1 Running 0 37s
kube-proxy-9t5z2 1/1 Running 0 36s
kube-proxy-gjnqx 1/1 Running 0 35s
kube-proxy-gz6h6 1/1 Running 0 36s
kube-proxy-ml8jb 1/1 Running 0 65m
kube-proxy-nzqwd 1/1 Running 0 65m
kube-proxy-xcnhd 1/1 Running 0 37s
kube-proxy-z4bb5 1/1 Running 0 32s
metrics-server-7ffbc6d68-49bvd 1/1 Running 0 44m
$ kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-86d5d9b668-rhvrg 1/1 Running 0 20m 192.168.3.147 ip-192-168-4-116.ec2.internal <none> <none>
coredns-86d5d9b668-tqc5b 1/1 Running 0 20m 192.168.38.204 ip-192-168-47-147.ec2.internal <none> <none>
$ kubectl describe deployments coredns -n kube-system
Name: coredns
Namespace: kube-system
CreationTimestamp: Sun, 26 May 2024 09:37:01 +0900
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
kubernetes.io/name=CoreDNS
Annotations: deployment.kubernetes.io/revision: 2
Selector: eks.amazonaws.com/component=coredns,k8s-app=kube-dns
Replicas: 2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 25% max surge
Pod Template:
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
Service Account: coredns
Containers:
coredns:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
Priority Class Name: system-cluster-critical
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: coredns-586b798467 (0/0 replicas created)
NewReplicaSet: coredns-86d5d9b668 (2/2 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 17m deployment-controller Scaled up replica set coredns-86d5d9b668 to 1
Normal ScalingReplicaSet 17m deployment-controller Scaled down replica set coredns-586b798467 to 1 from 2
Normal ScalingReplicaSet 17m deployment-controller Scaled up replica set coredns-86d5d9b668 to 2 from 1
Normal ScalingReplicaSet 17m deployment-controller Scaled down replica set coredns-586b798467 to 0 from 1
The Nodes increased to 10, but the number of CoreDNS Pods has not changed.
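One possible explanation: if the controller uses thresholds similar to the Cluster Proportional Autoscaler's commonly used linear-mode settings (16 nodes per replica, 256 cores per replica — an assumption on my part, since EKS does not publish its actual values), then 10 small Nodes would still only justify the minimum of 2 replicas:

```python
import math

# Hypothetical thresholds borrowed from Cluster Proportional Autoscaler's
# typical linear-mode settings; EKS does not document its real values.
NODES_PER_REPLICA = 16
CORES_PER_REPLICA = 256

nodes, cores = 10, 10 * 2  # ten t4g.small Nodes with 2 vCPUs each
needed = max(math.ceil(nodes / NODES_PER_REPLICA),
             math.ceil(cores / CORES_PER_REPLICA))
print(max(2, needed))  # -> 2: still at the floor of 2 replicas
```

Under these assumed thresholds, neither 10 Nodes nor 20 cores is enough to push the replica count above the minimum, which matches the observed behavior.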
Adding 20 t4g.nano Nodes
Could it be that the total number of CPU cores across the Nodes is insufficient?
Let's add 20 t4g.nano Nodes.
$ eksctl create nodegroup \
--cluster=non-97-eks \
--node-type=t4g.nano \
--nodes=20 \
--nodes-min=20 \
--nodes-max=20 \
--node-volume-size=2 \
--node-volume-type=gp3 \
--node-ami-family=Bottlerocket \
--spot \
--managed
2024-05-26 11:04:21 [ℹ] will use version 1.30 for new nodegroup(s) based on control plane version
2024-05-26 11:04:27 [ℹ] nodegroup "ng-4a521135" will use "" [Bottlerocket/1.30]
2024-05-26 11:04:29 [ℹ] 1 existing nodegroup(s) (ng-bf93e531) will be excluded
2024-05-26 11:04:29 [ℹ] 1 nodegroup (ng-4a521135) was included (based on the include/exclude rules)
2024-05-26 11:04:29 [ℹ] will create a CloudFormation stack for each of 1 managed nodegroups in cluster "non-97-eks"
2024-05-26 11:04:30 [ℹ]
2 sequential tasks: { fix cluster compatibility, 1 task: { 1 task: { create managed nodegroup "ng-4a521135" } }
}
2024-05-26 11:04:30 [ℹ] checking cluster stack for missing resources
2024-05-26 11:04:31 [ℹ] cluster stack has all required resources
2024-05-26 11:04:33 [ℹ] building managed nodegroup stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:04:33 [ℹ] deploying stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:04:34 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:05:04 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:05:56 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:07:09 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:08:52 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:10:05 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:11:47 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:13:41 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:14:13 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:15:41 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:17:32 [ℹ] waiting for CloudFormation stack "eksctl-non-97-eks-nodegroup-ng-4a521135"
2024-05-26 11:17:33 [ℹ] no tasks
2024-05-26 11:17:33 [✔] created 0 nodegroup(s) in cluster "non-97-eks"
2024-05-26 11:17:34 [ℹ] nodegroup "ng-4a521135" has 20 node(s)
2024-05-26 11:17:34 [ℹ] node "ip-192-168-14-142.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ] node "ip-192-168-14-214.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ] node "ip-192-168-19-169.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ] node "ip-192-168-20-134.ec2.internal" is ready
.
.
(中略)
.
.
2024-05-26 11:17:34 [ℹ] node "ip-192-168-6-253.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ] node "ip-192-168-60-173.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ] node "ip-192-168-62-70.ec2.internal" is ready
2024-05-26 11:17:34 [ℹ] node "ip-192-168-9-217.ec2.internal" is ready
2024-05-26 11:17:34 [✔] created 1 managed nodegroup(s) in cluster "non-97-eks"
2024-05-26 11:17:36 [ℹ] checking security group configuration for all nodegroups
2024-05-26 11:17:36 [ℹ] all nodegroups have up-to-date cloudformation templates
Node数とPod数を確認します。
$ kubectl get node
NAME STATUS ROLES AGE VERSION
ip-192-168-12-9.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3
ip-192-168-14-142.ec2.internal Ready <none> 9m18s v1.30.0-eks-fff26e3
ip-192-168-14-214.ec2.internal Ready <none> 61s v1.30.0-eks-fff26e3
ip-192-168-14-243.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3
ip-192-168-18-3.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3
ip-192-168-19-169.ec2.internal Ready <none> 52s v1.30.0-eks-fff26e3
ip-192-168-20-134.ec2.internal Ready <none> 11m v1.30.0-eks-fff26e3
ip-192-168-20-208.ec2.internal Ready <none> 51s v1.30.0-eks-fff26e3
ip-192-168-22-87.ec2.internal Ready <none> 9m9s v1.30.0-eks-fff26e3
ip-192-168-23-86.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3
ip-192-168-25-248.ec2.internal Ready <none> 48s v1.30.0-eks-fff26e3
ip-192-168-30-238.ec2.internal Ready <none> 53s v1.30.0-eks-fff26e3
ip-192-168-32-210.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3
ip-192-168-34-209.ec2.internal Ready <none> 6m55s v1.30.0-eks-fff26e3
ip-192-168-38-149.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3
ip-192-168-4-116.ec2.internal Ready <none> 94m v1.30.0-eks-fff26e3
ip-192-168-4-197.ec2.internal Ready <none> 11m v1.30.0-eks-fff26e3
ip-192-168-4-37.ec2.internal Ready <none> 9m6s v1.30.0-eks-fff26e3
ip-192-168-40-63.ec2.internal Ready <none> 5m25s v1.30.0-eks-fff26e3
ip-192-168-42-155.ec2.internal Ready <none> 5m12s v1.30.0-eks-fff26e3
ip-192-168-42-185.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3
ip-192-168-42-213.ec2.internal Ready <none> 5m16s v1.30.0-eks-fff26e3
ip-192-168-47-147.ec2.internal Ready <none> 94m v1.30.0-eks-fff26e3
ip-192-168-47-240.ec2.internal Ready <none> 5m13s v1.30.0-eks-fff26e3
ip-192-168-58-141.ec2.internal Ready <none> 5m13s v1.30.0-eks-fff26e3
ip-192-168-6-253.ec2.internal Ready <none> 48s v1.30.0-eks-fff26e3
ip-192-168-60-173.ec2.internal Ready <none> 5m13s v1.30.0-eks-fff26e3
ip-192-168-61-1.ec2.internal Ready <none> 29m v1.30.0-eks-fff26e3
ip-192-168-62-70.ec2.internal Ready <none> 5m12s v1.30.0-eks-fff26e3
ip-192-168-9-217.ec2.internal Ready <none> 11m v1.30.0-eks-fff26e3
$ kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-86d5d9b668-rhvrg 1/1 Running 0 45m 192.168.3.147 ip-192-168-4-116.ec2.internal <none> <none>
coredns-86d5d9b668-tqc5b 1/1 Running 0 45m 192.168.38.204 ip-192-168-47-147.ec2.internal <none> <none>
$ kubectl describe deployments coredns -n kube-system
Name: coredns
Namespace: kube-system
CreationTimestamp: Sun, 26 May 2024 09:37:01 +0900
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
kubernetes.io/name=CoreDNS
Annotations: deployment.kubernetes.io/revision: 2
Selector: eks.amazonaws.com/component=coredns,k8s-app=kube-dns
Replicas: 2 desired | 2 updated | 2 total | 2 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 25% max surge
Pod Template:
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
Service Account: coredns
Containers:
coredns:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
Priority Class Name: system-cluster-critical
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: coredns-586b798467 (0/0 replicas created)
NewReplicaSet: coredns-86d5d9b668 (2/2 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 47m deployment-controller Scaled up replica set coredns-86d5d9b668 to 1
Normal ScalingReplicaSet 47m deployment-controller Scaled down replica set coredns-586b798467 to 1 from 2
Normal ScalingReplicaSet 47m deployment-controller Scaled up replica set coredns-86d5d9b668 to 2 from 1
Normal ScalingReplicaSet 47m deployment-controller Scaled down replica set coredns-586b798467 to 0 from 1
CoreDNSのPod数は変わりありません。
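参考までに、従来のCluster Proportional Autoscaler(linearモード)は、Node数とCPUコア数から `max(ceil(cores/coresPerReplica), ceil(nodes/nodesPerReplica))` でレプリカ数を計算します。EKSネイティブのAuto Scalingが同じ式・同じパラメータを使うかは未確認ですが、仮にDNS向けによく使われる `coresPerReplica=256` / `nodesPerReplica=16` を当てはめて計算してみると、今回の構成(Node 30台、計60コア)では2レプリカのままでも計算は合います。

```shell
#!/bin/bash
# Cluster Proportional Autoscaler (linearモード) のレプリカ数計算のスケッチ
# coresPerReplica=256 / nodesPerReplica=16 はあくまで仮定値
nodes=30
cores=60
coresPerReplica=256
nodesPerReplica=16

# ceil(a / b) は (a + b - 1) / b の整数除算で計算できる
by_cores=$(( (cores + coresPerReplica - 1) / coresPerReplica ))
by_nodes=$(( (nodes + nodesPerReplica - 1) / nodesPerReplica ))

# 2つのうち大きい方が想定レプリカ数
replicas=$(( by_cores > by_nodes ? by_cores : by_nodes ))
echo "${replicas}"  # 2
```

この仮定のパラメータであれば、Nodeが16台を超えたあたりでようやく2レプリカになる計算なので、30台程度ではスケールしないのも説明がつきます。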
最小を3、最大を1,000に設定してみる
Auto Scalingの最小を3、最大を1,000に設定して挙動を確認します。
$ aws eks update-addon \
--cluster-name non-97-eks \
--addon-name coredns \
--resolve-conflicts PRESERVE \
--configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 3, "maxReplicas": 1000}}'
{
"update": {
"id": "b0625d52-3a49-3d93-94d3-049ce5e98ff5",
"status": "InProgress",
"type": "AddonUpdate",
"params": [
{
"type": "ResolveConflicts",
"value": "PRESERVE"
},
{
"type": "ConfigurationValues",
"value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 3, \"maxReplicas\": 1000}}"
}
],
"createdAt": "2024-05-26T11:34:14.326000+09:00",
"errors": []
}
}
Pod数を確認します。
$ kubectl get pods -l k8s-app=kube-dns -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-86d5d9b668-65xcp 1/1 Running 0 29s 192.168.9.74 ip-192-168-23-86.ec2.internal <none> <none>
coredns-86d5d9b668-rhvrg 1/1 Running 0 64m 192.168.3.147 ip-192-168-4-116.ec2.internal <none> <none>
coredns-86d5d9b668-tqc5b 1/1 Running 0 64m 192.168.38.204 ip-192-168-47-147.ec2.internal <none> <none>
$ kubectl rollout history deployment/coredns -n kube-system
deployment.apps/coredns
REVISION CHANGE-CAUSE
1 <none>
2 <none>
$ kubectl describe deployments coredns -n kube-system
Name: coredns
Namespace: kube-system
CreationTimestamp: Sun, 26 May 2024 09:37:01 +0900
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
kubernetes.io/name=CoreDNS
Annotations: deployment.kubernetes.io/revision: 2
Selector: eks.amazonaws.com/component=coredns,k8s-app=kube-dns
Replicas: 3 desired | 3 updated | 3 total | 3 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 1 max unavailable, 25% max surge
Pod Template:
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
Service Account: coredns
Containers:
coredns:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
Priority Class Name: system-cluster-critical
Conditions:
Type Status Reason
---- ------ ------
Available True MinimumReplicasAvailable
Progressing True NewReplicaSetAvailable
OldReplicaSets: coredns-586b798467 (0/0 replicas created)
NewReplicaSet: coredns-86d5d9b668 (3/3 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 70s deployment-controller Scaled up replica set coredns-86d5d9b668 to 3 from 2
CoreDNSで名前解決をして負荷をかける dig 編
CoreDNSで名前解決をして負荷をかけた時の挙動を確認します。
定期的にdigを叩くシェルスクリプトを実行するコンテナを用意します。Dockerfileとスクリプトは以下のとおりです。
Dockerfile
FROM alpine:latest
RUN apk add --no-cache bind-tools bash
COPY ./dns-resolution.sh /usr/local/bin/
CMD ["/bin/bash", "/usr/local/bin/dns-resolution.sh"]
./dns-resolution.sh
#!/bin/bash
set -xu
# マニフェスト側ではFQDNという環境変数名で渡しているため、FQDNもフォールバックとして参照する
DOMAIN="${DOMAIN:-${FQDN:-www.non-97.net}}"
INTERVAL="${INTERVAL:-5}"
while true; do
dig "${DOMAIN}" +short
sleep "${INTERVAL}"
done
コンテナイメージをビルドして、作成したECRリポジトリにPushします。
$ docker build -t dns-resolution .
[+] Building 2.8s (8/8) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 191B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/library/alpine:latest 2.7s
=> [1/3] FROM docker.io/library/alpine:latest@sha256:77726ef6b57ddf65bb551896826ec38bc3e53f75cdde31354fbffb4f25238ebd 0.0s
=> CACHED [2/3] RUN apk add --no-cache bind-tools bash 0.0s
=> [internal] load build context 0.0s
=> => transferring context: 39B 0.0s
=> [3/3] COPY ./dns-resolution.sh /usr/local/bin/ 0.0s
=> exporting to image 0.1s
=> => exporting layers 0.1s
=> => writing image sha256:2db16b22688d82039680917bfca20c70a811f8c18f49d51bd1c4caa7a5587873 0.0s
=> => naming to docker.io/library/dns-resolution 0.0s
$ set AWS_ACCOUNT_ID (aws sts get-caller-identity --output text --query Account)
$ set AWS_REGION (aws configure get region)
$ aws ecr get-login-password \
| docker login \
--username AWS \
--password-stdin https://$AWS_ACCOUNT_ID.dkr.ecr.$AWS_REGION.amazonaws.com
Login Succeeded
$ aws ecr create-repository --repository-name dns-resolution
{
"repository": {
"repositoryArn": "arn:aws:ecr:us-east-1:<AWSアカウントID>:repository/dns-resolution",
"registryId": "<AWSアカウントID>",
"repositoryName": "dns-resolution",
"repositoryUri": "<AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution",
"createdAt": "2024-05-26T16:02:59.144000+09:00",
"imageTagMutability": "MUTABLE",
"imageScanningConfiguration": {
"scanOnPush": false
},
"encryptionConfiguration": {
"encryptionType": "AES256"
}
}
}
$ set dns_resolution_repo (aws ecr describe-repositories \
--repository-names dns-resolution \
--query 'repositories[0].repositoryUri' \
--output text
)
$ docker tag dns-resolution:latest $dns_resolution_repo:latest
$ docker image ls | grep dns-resolution
<AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution latest 2db16b22688d 33 seconds ago 22.3MB
dns-resolution latest 2db16b22688d 33 seconds ago 22.3MB
$ docker push $dns_resolution_repo:latest
The push refers to repository [<AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution]
b51587f43b2b: Pushed
5d35fe5c895f: Pushed
50171d1acbd5: Pushed
latest: digest: sha256:28fb6d58c37b6cb43ccb10cbe82d3565306c7dd9addb33678a3cb08bd7990101 size: 946
用意したコンテナを実行するマニフェストファイルを作成します。
./dns-resolution-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: dns-resolution
namespace: default
spec:
selector:
matchLabels:
app: dns-resolution
replicas: 2
template:
metadata:
labels:
app: dns-resolution
spec:
containers:
- name: dns-resolution
image: <AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
env:
- name: FQDN
value: www.non-97.net
- name: INTERVAL
value: "3"
デプロイします。
$ kubectl apply -f ./dns-resolution-deployment.yml
deployment.apps/dns-resolution configured
$ kubectl get pod -n default
NAME READY STATUS RESTARTS AGE
dns-resolution-7cd64b54cf-bkwr4 1/1 Running 0 45s
dns-resolution-7cd64b54cf-vsk79 1/1 Running 11 (6m15s ago) 32m
$ stern dns-resolution -n default
+ dns-resolution-7cd64b54cf-bkwr4 › dns-resolution
+ dns-resolution-7cd64b54cf-vsk79 › dns-resolution
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + DOMAIN=www.non-97.net
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + INTERVAL=3
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + true
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + dig www.non-97.net +short
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + sleep 3
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + true
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + dig www.non-97.net +short
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + sleep 3
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + true
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + dig www.non-97.net +short
dns-resolution-7cd64b54cf-bkwr4 dns-resolution + sleep 3
.
.
(以下略)
.
.
$ kubectl top pod -n kube-system
NAME CPU(cores) MEMORY(bytes)
aws-node-8c5v7 3m 41Mi
aws-node-f9lrm 3m 42Mi
aws-node-nnjln 3m 42Mi
coredns-86d5d9b668-j76c6 2m 12Mi
coredns-86d5d9b668-jltmm 2m 12Mi
coredns-86d5d9b668-vqgqp 1m 12Mi
kube-proxy-62hmp 1m 11Mi
kube-proxy-gncj8 1m 11Mi
kube-proxy-wx7hr 1m 12Mi
metrics-server-7ffbc6d68-4srp5 3m 18Mi
流石にこの程度ではCoreDNSのPodに負荷はかかっていません。
100個同時に実行してみます。マニフェストファイルは以下のとおりです。
./dns-resolution-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: dns-resolution
namespace: default
spec:
selector:
matchLabels:
app: dns-resolution
replicas: 100
template:
metadata:
labels:
app: dns-resolution
spec:
containers:
- name: dns-resolution
image: <AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
env:
- name: FQDN
value: www.non-97.net
- name: INTERVAL
value: "0.000001"
流石にNodeが足りないと思うので、Nodeを10個用意します。
$ eksctl scale nodegroup \
--cluster=non-97-eks \
--name=ng-b0fc3fde \
--nodes=10 \
--nodes-min=10 \
--nodes-max=10
2024-05-26 17:00:40 [ℹ] scaling nodegroup "ng-b0fc3fde" in cluster non-97-eks
2024-05-26 17:00:43 [ℹ] initiated scaling of nodegroup
2024-05-26 17:00:43 [ℹ] to see the status of the scaling run `eksctl get nodegroup --cluster non-97-eks --region us-east-1 --name ng-b0fc3fde`
$ kubectl get deployment -n default
NAME READY UP-TO-DATE AVAILABLE AGE
dns-resolution 86/100 100 86 52m
$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-86d5d9b668-j76c6 53m 15Mi
coredns-86d5d9b668-jltmm 55m 15Mi
coredns-86d5d9b668-vqgqp 53m 15Mi
digを叩くPodが86個起動しましたが、まだCoreDNSへの負荷は足りないようです。
CoreDNSで名前解決をして負荷をかける dnsperf 編
digではなく、DNSのベンチマークツールであるdnsperfを実行するように変更します。
CoreDNSのサービスのクラスターIPアドレスに対して名前解決をします。
事前準備として、dig編と同じ状態になるようにCoreDNSのPodの最小を2、最大を10に変更しておきます。
$ aws eks update-addon \
--cluster-name non-97-eks \
--addon-name coredns \
--resolve-conflicts PRESERVE \
--configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 2, "maxReplicas": 10}}'
{
"update": {
"id": "eabb479c-769f-371e-869a-3abeeb23cfbc",
"status": "InProgress",
"type": "AddonUpdate",
"params": [
{
"type": "ResolveConflicts",
"value": "PRESERVE"
},
{
"type": "ConfigurationValues",
"value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
}
],
"createdAt": "2024-05-26T17:09:47.277000+09:00",
"errors": []
}
}
Dockerfileとシェルスクリプトは以下のとおりです。
Dockerfile
FROM --platform=linux/arm64 ubuntu:latest AS build
RUN apt-get update && apt-get install -y dnsperf
FROM --platform=linux/arm64 ubuntu:latest
COPY --from=build /usr/bin/dnsperf /usr/bin/
COPY --from=build /usr/lib/aarch64-linux-gnu/libldns.so.3 /usr/lib/
COPY --from=build /usr/lib/aarch64-linux-gnu/libnghttp2.so.14 /usr/lib/
COPY ./dns-resolution.sh /usr/local/bin/
CMD ["/bin/bash", "/usr/local/bin/dns-resolution.sh"]
./dns-resolution.sh
#!/bin/bash
set -u
# マニフェスト側ではFQDNという環境変数名で渡しているため、FQDNもフォールバックとして参照する
DOMAIN="${DOMAIN:-${FQDN:-www.non-97.net}}"
SERVER_ADDR="${SERVER_ADDR:-10.100.0.10}"
MAXRUNS="${MAXRUNS:-5}"
CLIENTS="${CLIENTS:-1}"
echo "${DOMAIN} A" >"query_random_list.txt"
while true; do
dnsperf -d query_random_list.txt -l "${MAXRUNS}" -s "${SERVER_ADDR}" -c "${CLIENTS}"
done
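なお、上記のクエリリストは同じレコード1件の繰り返しなので、CoreDNSのキャッシュの影響を避けたい場合は、以下のようにランダムなサブドメインのリストを生成する方法もあります(ドメイン名や件数はあくまで例です)。

```shell
#!/bin/bash
# dnsperf用にランダムなサブドメインのクエリリストを生成する例
DOMAIN="www.non-97.net"
COUNT=100

# リストを空にしてから生成する
: > query_random_list.txt
for i in $(seq 1 "${COUNT}"); do
  # $RANDOMと連番を組み合わせて重複しにくい名前を作る
  echo "host-${RANDOM}-${i}.${DOMAIN} A" >> query_random_list.txt
done

wc -l < query_random_list.txt  # 100
```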
ビルドしてECRにPushしたのち、以下マニフェストファイルで起動します。
./dns-resolution-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: dns-resolution
namespace: default
spec:
selector:
matchLabels:
app: dns-resolution
replicas: 2
template:
metadata:
labels:
app: dns-resolution
spec:
containers:
- name: dns-resolution
image: <AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
env:
- name: FQDN
value: www.non-97.net
- name: SERVER_ADDR
value: 10.100.0.10
- name: MAXRUNS
value: "10"
- name: CLIENTS
value: "10"
Podの様子を確認します。
$ kubectl apply -f ./dns-resolution-deployment.yml
deployment.apps/dns-resolution configured
$ kubectl get pod -n default
NAME READY STATUS RESTARTS AGE
dns-resolution-55c8658865-mp7sj 1/1 Running 0 13s
dns-resolution-55c8658865-t7qth 1/1 Running 0 13s
$ stern dns-resolution -n default
+ dns-resolution-55c8658865-mp7sj › dns-resolution
+ dns-resolution-55c8658865-t7qth › dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution DNS Performance Testing Tool
dns-resolution-55c8658865-mp7sj dns-resolution Version 2.14.0
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Command line: dnsperf -d query_random_list.txt -l 10 -s 10.100.0.10 -c 10
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Sending queries (to 10.100.0.10:53)
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Started at: Sun May 26 23:05:24 2024
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Stopping after 10.000000 seconds
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Testing complete (time limit)
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution Statistics:
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution Queries sent: 244371
dns-resolution-55c8658865-mp7sj dns-resolution Queries completed: 244371 (100.00%)
dns-resolution-55c8658865-mp7sj dns-resolution Queries lost: 0 (0.00%)
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution Response codes: SERVFAIL 244371 (100.00%)
dns-resolution-55c8658865-mp7sj dns-resolution Average packet size: request 32, response 32
dns-resolution-55c8658865-mp7sj dns-resolution Run time (s): 10.008380
dns-resolution-55c8658865-mp7sj dns-resolution Queries per second: 24416.638857
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution Average Latency (s): 0.003310 (min 0.000047, max 0.059284)
dns-resolution-55c8658865-mp7sj dns-resolution Latency StdDev (s): 0.002724
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution DNS Performance Testing Tool
dns-resolution-55c8658865-mp7sj dns-resolution Version 2.14.0
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Command line: dnsperf -d query_random_list.txt -l 10 -s 10.100.0.10 -c 10
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Sending queries (to 10.100.0.10:53)
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Started at: Sun May 26 23:05:34 2024
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Stopping after 10.000000 seconds
dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 5432
dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 5438
.
.
(中略)
.
.
dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 40701
dns-resolution-55c8658865-mp7sj dns-resolution [Timeout] Query timed out: msg id 40428
dns-resolution-55c8658865-mp7sj dns-resolution [Status] Testing complete (time limit)
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution Statistics:
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution Queries sent: 411875
dns-resolution-55c8658865-mp7sj dns-resolution Queries completed: 411852 (99.99%)
dns-resolution-55c8658865-mp7sj dns-resolution Queries lost: 23 (0.01%)
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution Response codes: SERVFAIL 411852 (100.00%)
dns-resolution-55c8658865-mp7sj dns-resolution Average packet size: request 32, response 32
dns-resolution-55c8658865-mp7sj dns-resolution Run time (s): 10.006569
dns-resolution-55c8658865-mp7sj dns-resolution Queries per second: 41158.163203
dns-resolution-55c8658865-mp7sj dns-resolution
dns-resolution-55c8658865-mp7sj dns-resolution Average Latency (s): 0.001978 (min 0.000038, max 0.044901)
dns-resolution-55c8658865-mp7sj dns-resolution Latency StdDev (s): 0.001715
dns-resolution-55c8658865-mp7sj dns-resolution
$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-86d5d9b668-jc4cx 1413m 19Mi
coredns-86d5d9b668-nd792 1549m 20Mi
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-192-168-0-143.ec2.internal 28m 1% 363Mi 26%
ip-192-168-51-53.ec2.internal 1847m 95% 396Mi 29%
ip-192-168-8-43.ec2.internal 1873m 97% 388Mi 28%
かなりCoreDNSのPodのCPU負荷が高まってきました。
しかし、このまま放置してもCoreDNSのPod数は変わりありませんでした。
Node数を4つに増やし、dnsperfのPodを10個実行するようにしても変わりませんでした。
$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-86d5d9b668-jc4cx 1411m 23Mi
coredns-86d5d9b668-nd792 1369m 17Mi
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-192-168-0-143.ec2.internal 1094m 56% 409Mi 29%
ip-192-168-33-71.ec2.internal 1202m 62% 362Mi 26%
ip-192-168-51-53.ec2.internal 1869m 96% 418Mi 30%
ip-192-168-8-43.ec2.internal 1913m 99% 423Mi 31%
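ちなみに、複数実行分のdnsperfの出力から合計QPSを集計したい場合は、以下のようにawkで抽出できます。入力はdnsperf 2.14.0の出力フォーマットを想定したサンプルです。

```shell
#!/bin/bash
# dnsperfの出力からQueries per secondを合計する例(入力はサンプル)
cat <<'EOF' > dnsperf_sample.txt
  Queries per second:   24416.638857
  Queries per second:   41158.163203
EOF

# "Queries per second" 行の4カラム目を合計して整数に丸める
awk '/Queries per second/ {sum += $4} END {printf "%.0f\n", sum}' dnsperf_sample.txt  # 65575
```

実際には `stern dns-resolution -n default` の出力をパイプで渡すような使い方を想定しています。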
Kubernetes 1.29で再チャレンジ
EKSクラスターの作成
もしかすると、Kubernetes 1.30を使用しているのが良くないのでしょうか。
AWS公式ドキュメントには確かにKubernetes 1.30への言及はありませんね。
Autoscaling CoreDNS - Amazon EKS
Kubernetes 1.29で再チャレンジします。
EKSクラスターを再作成します。
$ eksctl create cluster \
--name=non-97-eks-129 \
--version 1.29 \
--nodes=4 \
--nodes-min=4 \
--nodes-max=4 \
--node-volume-size=0 \
--node-volume-type=gp3 \
--node-ami-family=Bottlerocket \
--instance-types=t4g.small \
--spot \
--managed \
--region us-east-1
$ aws eks describe-cluster \
--name non-97-eks-129 \
--query cluster.version
"1.29"
$ aws eks describe-cluster \
--name non-97-eks-129 \
--query cluster.platformVersion
"eks.7"
metrics-serverをインストールします。
$ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
$ kubectl get deployment metrics-server -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
metrics-server 1/1 1 1 15m
CoreDNSのAuto Scaling設定
CoreDNSのAuto Scaling設定をします。
$ aws eks create-addon \
--cluster-name non-97-eks-129 \
--addon-name coredns \
--addon-version v1.11.1-eksbuild.9
{
"addon": {
"addonName": "coredns",
"clusterName": "non-97-eks-129",
"status": "CREATING",
"addonVersion": "v1.11.1-eksbuild.9",
"health": {
"issues": []
},
"addonArn": "arn:aws:eks:us-east-1:<AWSアカウントID>:addon/non-97-eks-129/coredns/c4c7dc2a-46ad-4837-0397-57b8ef4fa96a",
"createdAt": "2024-05-27T13:35:00.355000+09:00",
"modifiedAt": "2024-05-27T13:35:00.389000+09:00",
"tags": {}
}
}
$ aws eks update-addon \
--cluster-name non-97-eks-129 \
--addon-name coredns \
--resolve-conflicts PRESERVE \
--configuration-values '{"autoScaling":{"enabled":true, "minReplicas": 2, "maxReplicas": 10}}'
{
"update": {
"id": "8b3ac3f8-819a-3f57-9fd2-97389feed79c",
"status": "InProgress",
"type": "AddonUpdate",
"params": [
{
"type": "ResolveConflicts",
"value": "PRESERVE"
},
{
"type": "ConfigurationValues",
"value": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
}
],
"createdAt": "2024-05-27T13:35:59.154000+09:00",
"errors": []
}
}
$ aws eks describe-addon \
--cluster-name non-97-eks-129 \
--addon-name coredns
{
"addon": {
"addonName": "coredns",
"clusterName": "non-97-eks-129",
"status": "ACTIVE",
"addonVersion": "v1.11.1-eksbuild.9",
"health": {
"issues": []
},
"addonArn": "arn:aws:eks:us-east-1:<AWSアカウントID>:addon/non-97-eks-129/coredns/c4c7dc2a-46ad-4837-0397-57b8ef4fa96a",
"createdAt": "2024-05-27T13:35:00.355000+09:00",
"modifiedAt": "2024-05-27T13:36:02.438000+09:00",
"tags": {},
"configurationValues": "{\"autoScaling\":{\"enabled\":true, \"minReplicas\": 2, \"maxReplicas\": 10}}"
}
}
CoreDNSのPod数を確認しておきます。
$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-bf47b49b-kzxdh 1m 11Mi
coredns-bf47b49b-qf9jm 1m 11Mi
t4g.micro のNodeを10個追加
t4g.micro のNodeを10個追加します。
$ eksctl create nodegroup \
--cluster=non-97-eks-129 \
--node-type=t4g.micro \
--nodes=10 \
--nodes-min=10 \
--nodes-max=10 \
--node-volume-size=2 \
--node-volume-type=gp3 \
--node-ami-family=Bottlerocket \
--spot \
--managed
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-192-168-1-0.ec2.internal 22m 1% 277Mi 54%
ip-192-168-11-78.ec2.internal 17m 0% 272Mi 53%
ip-192-168-12-236.ec2.internal 22m 1% 454Mi 33%
ip-192-168-14-155.ec2.internal 17m 0% 273Mi 53%
ip-192-168-24-76.ec2.internal 33m 1% 270Mi 52%
ip-192-168-3-239.ec2.internal 19m 0% 282Mi 55%
ip-192-168-38-149.ec2.internal 29m 1% 266Mi 51%
ip-192-168-41-178.ec2.internal 17m 0% 262Mi 51%
ip-192-168-48-176.ec2.internal 16m 0% 363Mi 26%
ip-192-168-48-185.ec2.internal 22m 1% 407Mi 29%
ip-192-168-49-7.ec2.internal 22m 1% 274Mi 53%
ip-192-168-57-3.ec2.internal 24m 1% 276Mi 53%
ip-192-168-59-122.ec2.internal 19m 0% 420Mi 30%
ip-192-168-61-175.ec2.internal 20m 1% 283Mi 55%
$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-bf47b49b-f7rzg 1m 12Mi
coredns-bf47b49b-kzxdh 1m 12Mi
CoreDNSのPod数は2つのままでした。
このまま1時間弱放置しましたが、2つのままでした。
CoreDNSで名前解決をして負荷をかける
`dnsperf`を使ってCoreDNSに負荷をかけます。
使用したマニフェストファイルは以下のとおりです。
./dns-resolution-deployment.yml
apiVersion: apps/v1
kind: Deployment
metadata:
name: dns-resolution
namespace: default
spec:
selector:
matchLabels:
app: dns-resolution
replicas: 5
template:
metadata:
labels:
app: dns-resolution
spec:
containers:
- name: dns-resolution
image: <AWSアカウントID>.dkr.ecr.us-east-1.amazonaws.com/dns-resolution:latest
env:
- name: FQDN
value: www.non-97.net
- name: SERVER_ADDR
value: 10.100.0.10
- name: MAXRUNS
value: "10"
- name: CLIENTS
value: "15"
デプロイしてCoreDNSのPod数を確認します。
$ kubectl apply -f ./dns-resolution-deployment.yml
deployment.apps/dns-resolution configured
$ kubectl get pod -n default
NAME READY STATUS RESTARTS AGE
dns-resolution-75985fd469-99mqj 1/1 Running 0 75s
dns-resolution-75985fd469-k5g4v 1/1 Running 0 75s
dns-resolution-75985fd469-v7h8w 1/1 Running 0 75s
dns-resolution-75985fd469-z4hsv 1/1 Running 0 75s
dns-resolution-75985fd469-zvv9c 1/1 Running 0 75s
$ kubectl top pod -l k8s-app=kube-dns -n kube-system
NAME CPU(cores) MEMORY(bytes)
coredns-bf47b49b-f7rzg 1530m 22Mi
coredns-bf47b49b-w9r87 1511m 19Mi
$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-192-168-1-0.ec2.internal 478m 24% 301Mi 58%
ip-192-168-11-78.ec2.internal 16m 0% 269Mi 52%
ip-192-168-12-236.ec2.internal 1854m 96% 495Mi 36%
ip-192-168-14-155.ec2.internal 17m 0% 261Mi 51%
ip-192-168-19-132.ec2.internal 487m 25% 412Mi 30%
ip-192-168-24-76.ec2.internal 26m 1% 265Mi 51%
ip-192-168-3-239.ec2.internal 236m 12% 275Mi 53%
ip-192-168-38-149.ec2.internal 20m 1% 262Mi 51%
ip-192-168-41-178.ec2.internal 20m 1% 266Mi 52%
ip-192-168-48-176.ec2.internal 543m 28% 415Mi 30%
ip-192-168-48-185.ec2.internal 1849m 95% 452Mi 33%
ip-192-168-49-7.ec2.internal 23m 1% 269Mi 52%
ip-192-168-57-3.ec2.internal 15m 0% 273Mi 53%
ip-192-168-61-175.ec2.internal 24m 1% 264Mi 51%
CoreDNSのPod数は変わらず2つのままです。数分時間を置いても変化はありませんでした。
結局、具体的にどのような条件でCoreDNSのPodがAuto Scalingするのかは不明でした。
ちなみに`kubectl rollout history`を確認しても、動きは特にありませんでした。
$ kubectl rollout history deployment/coredns -n kube-system
deployment.apps/coredns
REVISION CHANGE-CAUSE
1 <none>
2 <none>
$ kubectl rollout history deployment/coredns -n kube-system --revision 1
deployment.apps/coredns with revision #1
Pod Template:
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
pod-template-hash=54d6f577c6
Service Account: coredns
Containers:
coredns:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.4
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
Priority Class Name: system-cluster-critical
$ kubectl rollout history deployment/coredns -n kube-system --revision 2
deployment.apps/coredns with revision #2
Pod Template:
Labels: eks.amazonaws.com/component=coredns
k8s-app=kube-dns
pod-template-hash=bf47b49b
Service Account: coredns
Containers:
coredns:
Image: 602401143452.dkr.ecr.us-east-1.amazonaws.com/eks/coredns:v1.11.1-eksbuild.9
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
Args:
-conf
/etc/coredns/Corefile
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/etc/coredns from config-volume (ro)
Volumes:
config-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: coredns
Optional: false
Topology Spread Constraints: topology.kubernetes.io/zone:ScheduleAnyway when max skew 1 is exceeded for selector k8s-app=kube-dns
Priority Class Name: system-cluster-critical
EKSアドオンでCoreDNSのAuto Scaling設定ができるようになりました
Amazon EKS が CoreDNS PodのAuto Scalingをネイティブサポートしたアップデートを紹介しました。
EKSアドオンでCoreDNSのAuto Scaling設定ができるようになったのは嬉しいですね。
ただ、個人的にはどのタイミングでCoreDNSのPodがスケールするのか具体的な条件が分からなかったのが気になります。時間があれば再度検証して追記しようと思います。
この記事が誰かの助けになれば幸いです。
以上、AWS事業本部 コンサルティング部の のんピ(@non____97)でした!