I Tried Migrating from an EKS Managed Node Group to Auto Mode

I tried an in-place migration from an EKS managed node group to Auto Mode.

https://docs.aws.amazon.com/eks/latest/userguide/auto-migrate-mng.html

For this kind of migration, standing up a new cluster alongside and doing a Blue/Green-style cutover looks safer where feasible, but since an in-place migration is also supported, I decided to give it a try!

Creating an EKS Cluster with a Managed Node Group

Using the AWS EKS Terraform module, I create an EKS cluster (without Auto Mode) with a managed node group attached.

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.17.0"

  name = "eks-vpc"

  cidr = "10.0.0.0/16"

  azs             = ["ap-northeast-1a", "ap-northeast-1c", "ap-northeast-1d"]
  public_subnets  = ["10.0.0.0/24", "10.0.1.0/24", "10.0.2.0/24"]
  private_subnets = ["10.0.100.0/24", "10.0.101.0/24", "10.0.102.0/24"]

  enable_nat_gateway = true
  single_nat_gateway = true

  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }

}


module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.35.0"

  cluster_name                    = local.cluster_name
  cluster_version                 = "1.32"
  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  enable_cluster_creator_admin_permissions = true

  eks_managed_node_groups = {
    default = {
      name = "default"

      instance_types = ["t3.small"]

      min_size     = 1
      max_size     = 3
      desired_size = 1
    }
  }

  bootstrap_self_managed_addons = false

  cluster_addons = {
    coredns = {
      version = "v1.11.4-eksbuild.2"
    }
    kube-proxy = {
      version = "v1.32.0-eksbuild.2"
    }
    vpc-cni = {
      version        = "v1.19.3-eksbuild.1"
      before_compute = true
    }
    eks-pod-identity-agent = {
      version = "v1.3.5-eksbuild.2"
    }
  }
}

[Note]
This is specific to the AWS EKS Terraform module: setting bootstrap_self_managed_addons = false prevents the built-in networking add-ons from being created.
This attribute defaults to true when Auto Mode is off, but the default flips to false once Auto Mode is enabled.
On top of that, changing it forces a replacement, so I set it explicitly to avoid an unwanted cluster replacement.

https://github.com/terraform-aws-modules/terraform-aws-eks/issues/3266

If you are using EKS add-ons in the first place, the built-in networking add-ons are unnecessary anyway.
Note that bootstrap_self_managed_addons only takes effect at cluster creation time.
If you never set it and it was therefore treated as true, explicitly specifying bootstrap_self_managed_addons = true lets you enable Auto Mode without replacing the cluster.

https://dev.classmethod.jp/articles/eks-built-in-networiking-addons/
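As a sketch of that last pattern (a hypothetical fragment of the eks module body, assuming a cluster that was originally created with the attribute left unset, i.e. true):

# The cluster was created with the implicit default (true); pin it explicitly
# before adding cluster_compute_config, since the default flip to false would
# otherwise register as a change that forces cluster replacement.
bootstrap_self_managed_addons = true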

Deploying and Checking Kubernetes Resources

Deploy an application using the manifest below.

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nginx
    spec:
      containers:
        - image: nginx
          imagePullPolicy: Always
          name: nginx
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "0.5"

At this point, let's also check the state of the Kubernetes resources on the cluster.
Here, aws-node (the VPC CNI driver), CoreDNS, kube-proxy, and the Pod Identity Agent are installed via EKS add-ons; after the migration to EKS Auto Mode, these components become AWS-managed.
When installed as EKS add-ons, only CoreDNS is deployed as a Deployment, while the rest run as DaemonSets.
The CoreDNS Pods are reachable through a ClusterIP, and that ClusterIP is allocated from the service IPv4 range (the IP range used on the control-plane side).

eks1.png
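The service IPv4 range itself can be checked with the AWS CLI (a quick sketch; the cluster name matches this walkthrough). The 172.20.x.x ClusterIPs below fall inside the CIDR this prints.

% aws eks describe-cluster --name test-cluster \
    --query 'cluster.kubernetesNetworkConfig.serviceIpv4Cidr' --output text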

Checking /etc/resolv.conf in an Nginx Pod shows 172.20.0.10 as the nameserver.

# cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local ap-northeast-1.compute.internal
nameserver 172.20.0.10
options ndots:5

This IP corresponds to the ClusterIP Service named kube-dns.

% kubectl get svc kube-dns -n kube-system
NAME       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   172.20.0.10   <none>        53/UDP,53/TCP,9153/TCP   26m

The Endpoints list multiple IPs.

% kubectl describe service/kube-dns -n kube-system
Name:                     kube-dns
Namespace:                kube-system
Labels:                   eks.amazonaws.com/component=kube-dns
                          k8s-app=kube-dns
                          kubernetes.io/cluster-service=true
                          kubernetes.io/name=CoreDNS
Annotations:              prometheus.io/port: 9153
                          prometheus.io/scrape: true
Selector:                 k8s-app=kube-dns
Type:                     ClusterIP
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.20.0.10
IPs:                      172.20.0.10
Port:                     dns  53/UDP
TargetPort:               53/UDP
Endpoints:                10.0.100.25:53,10.0.100.40:53
Port:                     dns-tcp  53/TCP
TargetPort:               53/TCP
Endpoints:                10.0.100.25:53,10.0.100.40:53
Port:                     metrics  9153/TCP
TargetPort:               9153/TCP
Endpoints:                10.0.100.25:9153,10.0.100.40:9153
Session Affinity:         None
Internal Traffic Policy:  Cluster
Events:                   <none>

These are the IPs of the Pods deployed via the CoreDNS add-on.

% kubectl get pod -o wide -n kube-system
NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE                                            NOMINATED NODE   READINESS GATES
aws-node-brh56                 2/2     Running   0          40m   10.0.100.9    ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>
coredns-6d78c58c9f-hzrmg       1/1     Running   0          39m   10.0.100.40   ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>
coredns-6d78c58c9f-kqzd4       1/1     Running   0          39m   10.0.100.25   ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>
eks-pod-identity-agent-5r76h   1/1     Running   0          37m   10.0.100.9    ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>
kube-proxy-9n95p               1/1     Running   0          39m   10.0.100.9    ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>

Essentially, kube-proxy rewrites iptables rules so that each Pod reaches CoreDNS through this ClusterIP.

eks2.png

https://kubernetes.io/ja/docs/concepts/services-networking/service/#proxy-mode-iptables
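If you get a shell on the node (via SSM Session Manager, for example), the rules kube-proxy maintains for the kube-dns ClusterIP are visible in the nat table. A rough sketch:

# list the KUBE-SERVICES entries that match the kube-dns ClusterIP
sudo iptables -t nat -L KUBE-SERVICES -n | grep 172.20.0.10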

Enabling Auto Mode

Now, let's enable Auto Mode.
With the AWS EKS Terraform module, Auto Mode can be enabled by adding the cluster_compute_config attribute.
When EKS Auto Mode provisions nodes it works with node pools, through which you can specify instance types, CPU architecture (ARM64/AMD64), capacity type (Spot/On-Demand), and so on.
You can also create custom node pools yourself, but for a typical use case I'll stick with the built-in general-purpose pool.

https://dev.classmethod.jp/articles/eks-auto-mode-custom-node-pool/

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.35.0"

  cluster_name                    = local.cluster_name
  cluster_version                 = "1.32"
  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  enable_cluster_creator_admin_permissions = true

  eks_managed_node_groups = {
    default = {
      name = "default"

      instance_types = ["t3.small"]

      min_size     = 1
      max_size     = 3
      desired_size = 1
    }
  }

  bootstrap_self_managed_addons = false

  # Enable Auto Mode
  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose"]
  }

  cluster_addons = {
    coredns = {
      version = "v1.11.4-eksbuild.2"
    }
    kube-proxy = {
      version = "v1.32.0-eksbuild.2"
    }
    vpc-cni = {
      version        = "v1.19.3-eksbuild.1"
      before_compute = true
    }
    eks-pod-identity-agent = {
      version = "v1.3.5-eksbuild.2"
    }
  }
}

After adding the cluster_compute_config attribute, terraform plan produced the following:

Terraform will perform the following actions:

  # module.eks.data.aws_eks_addon_version.this["coredns"] will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_eks_addon_version" "this" {
      + addon_name         = "coredns"
      + id                 = (known after apply)
      + kubernetes_version = "1.32"
      + version            = (known after apply)
    }

  # module.eks.data.aws_eks_addon_version.this["eks-pod-identity-agent"] will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_eks_addon_version" "this" {
      + addon_name         = "eks-pod-identity-agent"
      + id                 = (known after apply)
      + kubernetes_version = "1.32"
      + version            = (known after apply)
    }

  # module.eks.data.aws_eks_addon_version.this["kube-proxy"] will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_eks_addon_version" "this" {
      + addon_name         = "kube-proxy"
      + id                 = (known after apply)
      + kubernetes_version = "1.32"
      + version            = (known after apply)
    }

  # module.eks.data.aws_eks_addon_version.this["vpc-cni"] will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "aws_eks_addon_version" "this" {
      + addon_name         = "vpc-cni"
      + id                 = (known after apply)
      + kubernetes_version = "1.32"
      + version            = (known after apply)
    }

  # module.eks.data.tls_certificate.this[0] will be read during apply
  # (depends on a resource or a module with changes pending)
 <= data "tls_certificate" "this" {
      + certificates = (known after apply)
      + id           = (known after apply)
      + url          = "https://oidc.eks.ap-northeast-1.amazonaws.com/id/XXXXXXXXXXXXXXXXXXXXXXXX"
    }

  # module.eks.aws_eks_addon.before_compute["vpc-cni"] will be updated in-place
  ~ resource "aws_eks_addon" "before_compute" {
      ~ addon_version               = "v1.19.2-eksbuild.1" -> (known after apply)
        id                          = "test-cluster:vpc-cni"
        tags                        = {}
        # (11 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.eks.aws_eks_addon.this["coredns"] will be updated in-place
  ~ resource "aws_eks_addon" "this" {
      ~ addon_version               = "v1.11.4-eksbuild.2" -> (known after apply)
        id                          = "test-cluster:coredns"
        tags                        = {}
        # (11 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.eks.aws_eks_addon.this["eks-pod-identity-agent"] will be updated in-place
  ~ resource "aws_eks_addon" "this" {
      ~ addon_version               = "v1.3.4-eksbuild.1" -> (known after apply)
        id                          = "test-cluster:eks-pod-identity-agent"
        tags                        = {}
        # (11 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.eks.aws_eks_addon.this["kube-proxy"] will be updated in-place
  ~ resource "aws_eks_addon" "this" {
      ~ addon_version               = "v1.32.0-eksbuild.2" -> (known after apply)
        id                          = "test-cluster:kube-proxy"
        tags                        = {}
        # (11 unchanged attributes hidden)

        # (1 unchanged block hidden)
    }

  # module.eks.aws_eks_cluster.this[0] will be updated in-place
  ~ resource "aws_eks_cluster" "this" {
        id                            = "test-cluster"
        name                          = "test-cluster"
        tags                          = {
            "terraform-aws-modules" = "eks"
        }
        # (12 unchanged attributes hidden)

      + compute_config {
          + enabled       = true
          + node_pools    = [
              + "general-purpose",
            ]
          + node_role_arn = (known after apply)
        }

      ~ kubernetes_network_config {
            # (3 unchanged attributes hidden)

          ~ elastic_load_balancing {
              ~ enabled = false -> true
            }
        }

      + storage_config {
          + block_storage {
              + enabled = true
            }
        }

        # (5 unchanged blocks hidden)
    }

  # module.eks.aws_iam_openid_connect_provider.oidc_provider[0] will be updated in-place
  ~ resource "aws_iam_openid_connect_provider" "oidc_provider" {
        id              = "arn:aws:iam::xxxxxxxxxxxx:oidc-provider/oidc.eks.ap-northeast-1.amazonaws.com/id/XXXXXXXXXXXXXXXXXXXXXXXX"
        tags            = {
            "Name" = "test-cluster-eks-irsa"
        }
      ~ thumbprint_list = [
          - "9e99a48a9960b14926bb7f3b02e22da2b0ab7280",
        ] -> (known after apply)
        # (4 unchanged attributes hidden)
    }

  # module.eks.aws_iam_role.eks_auto[0] will be created
  + resource "aws_iam_role" "eks_auto" {
      + arn                   = (known after apply)
      + assume_role_policy    = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = [
                          + "sts:TagSession",
                          + "sts:AssumeRole",
                        ]
                      + Effect    = "Allow"
                      + Principal = {
                          + Service = "ec2.amazonaws.com"
                        }
                      + Sid       = "EKSAutoNodeAssumeRole"
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + create_date           = (known after apply)
      + force_detach_policies = true
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = (known after apply)
      + name_prefix           = "test-cluster-eks-auto-"
      + path                  = "/"
      + tags_all              = (known after apply)
      + unique_id             = (known after apply)

      + inline_policy (known after apply)
    }

  # module.eks.aws_iam_role_policy_attachment.eks_auto["AmazonEC2ContainerRegistryPullOnly"] will be created
  + resource "aws_iam_role_policy_attachment" "eks_auto" {
      + id         = (known after apply)
      + policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryPullOnly"
      + role       = (known after apply)
    }

  # module.eks.aws_iam_role_policy_attachment.eks_auto["AmazonEKSWorkerNodeMinimalPolicy"] will be created
  + resource "aws_iam_role_policy_attachment" "eks_auto" {
      + id         = (known after apply)
      + policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodeMinimalPolicy"
      + role       = (known after apply)
    }

  # module.eks.aws_iam_role_policy_attachment.this["AmazonEKSBlockStoragePolicy"] will be created
  + resource "aws_iam_role_policy_attachment" "this" {
      + id         = (known after apply)
      + policy_arn = "arn:aws:iam::aws:policy/AmazonEKSBlockStoragePolicy"
      + role       = "test-cluster-cluster-20250503052606002600000003"
    }

  # module.eks.aws_iam_role_policy_attachment.this["AmazonEKSComputePolicy"] will be created
  + resource "aws_iam_role_policy_attachment" "this" {
      + id         = (known after apply)
      + policy_arn = "arn:aws:iam::aws:policy/AmazonEKSComputePolicy"
      + role       = "test-cluster-cluster-20250503052606002600000003"
    }

  # module.eks.aws_iam_role_policy_attachment.this["AmazonEKSLoadBalancingPolicy"] will be created
  + resource "aws_iam_role_policy_attachment" "this" {
      + id         = (known after apply)
      + policy_arn = "arn:aws:iam::aws:policy/AmazonEKSLoadBalancingPolicy"
      + role       = "test-cluster-cluster-20250503052606002600000003"
    }

  # module.eks.aws_iam_role_policy_attachment.this["AmazonEKSNetworkingPolicy"] will be created
  + resource "aws_iam_role_policy_attachment" "this" {
      + id         = (known after apply)
      + policy_arn = "arn:aws:iam::aws:policy/AmazonEKSNetworkingPolicy"
      + role       = "test-cluster-cluster-20250503052606002600000003"
    }

  # module.eks.aws_iam_role_policy_attachment.this["AmazonEKSVPCResourceController"] will be destroyed
  # (because key ["AmazonEKSVPCResourceController"] is not in for_each map)
  - resource "aws_iam_role_policy_attachment" "this" {
      - id         = "test-cluster-cluster-20250503052606002600000003-20250503052607818000000008" -> null
      - policy_arn = "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController" -> null
      - role       = "test-cluster-cluster-20250503052606002600000003" -> null
    }

Per the AWS documentation, the cluster role needs the IAM policies below; AmazonEKSClusterPolicy was already attached, and the plan attaches the remaining four (a CLI cross-check follows the list):

  • AmazonEKSComputePolicy
  • AmazonEKSBlockStoragePolicy
  • AmazonEKSLoadBalancingPolicy
  • AmazonEKSNetworkingPolicy
  • AmazonEKSClusterPolicy
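To cross-check from the CLI, you can list the managed policies attached to the cluster role (role name taken from the plan output above):

% aws iam list-attached-role-policies \
    --role-name test-cluster-cluster-20250503052606002600000003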

Right after Auto Mode is enabled, the Pods are still sitting on the original node.

% kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP             NODE                                            NOMINATED NODE   READINESS GATES
nginx-7c64dfbfdc-5hxn5   1/1     Running   0          52m   10.0.100.137   ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>
nginx-7c64dfbfdc-hhvdr   1/1     Running   0          52m   10.0.100.43    ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>

Also, immediately after enablement there are no Auto Mode-managed nodes yet.

% kubectl get node
NAME                                            STATUS   ROLES    AGE   VERSION
ip-10-0-100-9.ap-northeast-1.compute.internal   Ready    <none>   59m   v1.32.3-eks-473151a

The DNS configuration is unchanged as well.
It seems the cluster is now merely in a state where it can launch nodes via Auto Mode.

% kubectl describe service/kube-dns -n kube-system
Name:                     kube-dns
Namespace:                kube-system
Labels:                   eks.amazonaws.com/component=kube-dns
                          k8s-app=kube-dns
                          kubernetes.io/cluster-service=true
                          kubernetes.io/name=CoreDNS
Annotations:              prometheus.io/port: 9153
                          prometheus.io/scrape: true
Selector:                 k8s-app=kube-dns
Type:                     ClusterIP
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.20.0.10
IPs:                      172.20.0.10
Port:                     dns  53/UDP
TargetPort:               53/UDP
Endpoints:                10.0.100.25:53,10.0.100.40:53
Port:                     dns-tcp  53/TCP
TargetPort:               53/TCP
Endpoints:                10.0.100.25:53,10.0.100.40:53
Port:                     metrics  9153/TCP
TargetPort:               9153/TCP
Endpoints:                10.0.100.25:9153,10.0.100.40:9153
Session Affinity:         None
Internal Traffic Policy:  Cluster
Events:                   <none>
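Incidentally, one way to see that the compute side really is ready is that the built-in node pool now shows up as a NodePool custom resource (a quick check):

% kubectl get nodepools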

Now, let's move the application onto Auto Mode-managed nodes.
Add eks.amazonaws.com/compute-type: auto to nodeSelector and run kubectl apply.

apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: default
  name: nginx
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: nginx
  replicas: 2
  template:
    metadata:
      labels:
        app.kubernetes.io/name: nginx
    spec:
      containers:
        - image: nginx
          imagePullPolicy: Always
          name: nginx
          ports:
            - containerPort: 80
          resources:
            requests:
              cpu: "0.5"
      nodeSelector:
        eks.amazonaws.com/compute-type: auto

A new Pod sits in Pending while waiting for an Auto Mode-managed node to launch.

% kubectl get pod
NAME                     READY   STATUS    RESTARTS   AGE
nginx-66d8bcc68c-rdj7f   0/1     Pending   0          12s
nginx-7c64dfbfdc-5hxn5   1/1     Running   0          55m
nginx-7c64dfbfdc-hhvdr   1/1     Running   0          55m

An Auto Mode-managed node has launched (note that its node name is the EC2 instance ID rather than a private DNS name).

% kubectl get node
NAME                                            STATUS   ROLES    AGE   VERSION
i-016c762ee61d543dd                             Ready    <none>   16s   v1.32.2-eks-677bac1
ip-10-0-100-9.ap-northeast-1.compute.internal   Ready    <none>   62m   v1.32.3-eks-473151a

The Pods have successfully moved to the Auto Mode-managed node.

% kubectl get pod -o wide
NAME                     READY   STATUS    RESTARTS   AGE   IP            NODE                  NOMINATED NODE   READINESS GATES
nginx-66d8bcc68c-rdj7f   1/1     Running   0          71s   10.0.101.80   i-016c762ee61d543dd   <none>           <none>
nginx-66d8bcc68c-zjtr4   1/1     Running   0          30s   10.0.101.81   i-016c762ee61d543dd   <none>           <none>

Deleting the Existing Managed Node Group

With the Pods migrated, let's delete the now-unneeded managed node group.
I'd like to just remove it right away, but quite a few add-on Pods are still running on the existing node.

% kubectl get pod -o wide -A
NAMESPACE     NAME                           READY   STATUS    RESTARTS   AGE   IP            NODE                                            NOMINATED NODE   READINESS GATES
default       nginx-66d8bcc68c-rdj7f         1/1     Running   0          15m   10.0.101.80   i-016c762ee61d543dd                             <none>           <none>
default       nginx-66d8bcc68c-zjtr4         1/1     Running   0          15m   10.0.101.81   i-016c762ee61d543dd                             <none>           <none>
kube-system   aws-node-brh56                 2/2     Running   0          78m   10.0.100.9    ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>
kube-system   coredns-6d78c58c9f-hzrmg       1/1     Running   0          76m   10.0.100.40   ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>
kube-system   coredns-6d78c58c9f-kqzd4       1/1     Running   0          76m   10.0.100.25   ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>
kube-system   eks-pod-identity-agent-5r76h   1/1     Running   0          75m   10.0.100.9    ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>
kube-system   kube-proxy-9n95p               1/1     Running   0          76m   10.0.100.9    ip-10-0-100-9.ap-northeast-1.compute.internal   <none>           <none>

Namely aws-node (AWS VPC CNI), CoreDNS, kube-proxy, and the Pod Identity Agent.

% kubectl get all -A
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
default       pod/nginx-66d8bcc68c-rdj7f         1/1     Running   0          16m
default       pod/nginx-66d8bcc68c-zjtr4         1/1     Running   0          15m
kube-system   pod/aws-node-brh56                 2/2     Running   0          78m
kube-system   pod/coredns-6d78c58c9f-hzrmg       1/1     Running   0          76m
kube-system   pod/coredns-6d78c58c9f-kqzd4       1/1     Running   0          76m
kube-system   pod/eks-pod-identity-agent-5r76h   1/1     Running   0          75m
kube-system   pod/kube-proxy-9n95p               1/1     Running   0          76m

NAMESPACE     NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes                  ClusterIP   172.20.0.1       <none>        443/TCP                  83m
kube-system   service/eks-extension-metrics-api   ClusterIP   172.20.148.179   <none>        443/TCP                  83m
kube-system   service/kube-dns                    ClusterIP   172.20.0.10      <none>        53/UDP,53/TCP,9153/TCP   76m

NAMESPACE     NAME                                    DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
kube-system   daemonset.apps/aws-node                 1         1         1       1            1           <none>          79m
kube-system   daemonset.apps/eks-pod-identity-agent   1         1         1       1            1           <none>          75m
kube-system   daemonset.apps/kube-proxy               1         1         1       1            1           <none>          76m

NAMESPACE     NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
default       deployment.apps/nginx     2/2     2            2           71m
kube-system   deployment.apps/coredns   2/2     2            2           76m

NAMESPACE     NAME                                 DESIRED   CURRENT   READY   AGE
default       replicaset.apps/nginx-66d8bcc68c     2         2         2       16m
default       replicaset.apps/nginx-7c64dfbfdc     0         0         0       71m
kube-system   replicaset.apps/coredns-6d78c58c9f   2         2         2       76m

For now (?), let's try removing the add-ons by deleting the cluster_addons attribute.

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.35.0"

  cluster_name                    = local.cluster_name
  cluster_version                 = "1.32"
  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  enable_cluster_creator_admin_permissions = true

  eks_managed_node_groups = {
    default = {
      name = "default"

      instance_types = ["t3.small"]

      min_size     = 1
      max_size     = 3
      desired_size = 1
    }
  }

  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose"]
  }

  bootstrap_self_managed_addons = false
}

The diff came out as follows:

Terraform will perform the following actions:

  # module.eks.aws_eks_addon.before_compute["vpc-cni"] will be destroyed
  # (because key ["vpc-cni"] is not in for_each map)
  - resource "aws_eks_addon" "before_compute" {
      - addon_name                  = "vpc-cni" -> null
      - addon_version               = "v1.19.2-eksbuild.1" -> null
      - arn                         = "arn:aws:eks:ap-northeast-1:xxxxxxxxxxxx:addon/test-cluster/vpc-cni/20cb4a51-f67e-c71b-a374-2fe82b1d8630" -> null
      - cluster_name                = "test-cluster" -> null
      - created_at                  = "2025-05-03T05:34:53Z" -> null
      - id                          = "test-cluster:vpc-cni" -> null
      - modified_at                 = "2025-05-03T05:35:02Z" -> null
      - preserve                    = true -> null
      - resolve_conflicts_on_create = "NONE" -> null
      - resolve_conflicts_on_update = "OVERWRITE" -> null
      - tags                        = {} -> null
      - tags_all                    = {} -> null
        # (2 unchanged attributes hidden)

      - timeouts {}
    }

  # module.eks.aws_eks_addon.this["coredns"] will be destroyed
  # (because key ["coredns"] is not in for_each map)
  - resource "aws_eks_addon" "this" {
      - addon_name                  = "coredns" -> null
      - addon_version               = "v1.11.4-eksbuild.2" -> null
      - arn                         = "arn:aws:eks:ap-northeast-1:xxxxxxxxxxxx:addon/test-cluster/coredns/d2cb4a53-115c-f5d4-ecab-72b43e54990b" -> null
      - cluster_name                = "test-cluster" -> null
      - created_at                  = "2025-05-03T05:37:18Z" -> null
      - id                          = "test-cluster:coredns" -> null
      - modified_at                 = "2025-05-03T05:38:02Z" -> null
      - preserve                    = true -> null
      - resolve_conflicts_on_create = "NONE" -> null
      - resolve_conflicts_on_update = "OVERWRITE" -> null
      - tags                        = {} -> null
      - tags_all                    = {} -> null
        # (2 unchanged attributes hidden)

      - timeouts {}
    }

  # module.eks.aws_eks_addon.this["eks-pod-identity-agent"] will be destroyed
  # (because key ["eks-pod-identity-agent"] is not in for_each map)
  - resource "aws_eks_addon" "this" {
      - addon_name                  = "eks-pod-identity-agent" -> null
      - addon_version               = "v1.3.4-eksbuild.1" -> null
      - arn                         = "arn:aws:eks:ap-northeast-1:xxxxxxxxxxxx:addon/test-cluster/eks-pod-identity-agent/94cb4a54-041d-9329-299b-f4074a46a77b" -> null
      - cluster_name                = "test-cluster" -> null
      - created_at                  = "2025-05-03T05:39:22Z" -> null
      - id                          = "test-cluster:eks-pod-identity-agent" -> null
      - modified_at                 = "2025-05-03T05:39:58Z" -> null
      - preserve                    = true -> null
      - resolve_conflicts_on_create = "NONE" -> null
      - resolve_conflicts_on_update = "OVERWRITE" -> null
      - tags                        = {} -> null
      - tags_all                    = {} -> null
        # (2 unchanged attributes hidden)

      - timeouts {}
    }

  # module.eks.aws_eks_addon.this["kube-proxy"] will be destroyed
  # (because key ["kube-proxy"] is not in for_each map)
  - resource "aws_eks_addon" "this" {
      - addon_name                  = "kube-proxy" -> null
      - addon_version               = "v1.32.0-eksbuild.2" -> null
      - arn                         = "arn:aws:eks:ap-northeast-1:xxxxxxxxxxxx:addon/test-cluster/kube-proxy/46cb4a53-1165-2e29-2150-bba6b8a475c7" -> null
      - cluster_name                = "test-cluster" -> null
      - created_at                  = "2025-05-03T05:37:18Z" -> null
      - id                          = "test-cluster:kube-proxy" -> null
      - modified_at                 = "2025-05-03T05:38:25Z" -> null
      - preserve                    = true -> null
      - resolve_conflicts_on_create = "NONE" -> null
      - resolve_conflicts_on_update = "OVERWRITE" -> null
      - tags                        = {} -> null
      - tags_all                    = {} -> null
        # (2 unchanged attributes hidden)

      - timeouts {}
    }

Plan: 0 to add, 0 to change, 4 to destroy.

Deleting the add-ons did not remove the corresponding Kubernetes resources (they were created with preserve = true, as the plan output above shows), so I removed them forcibly via kubectl.

kubectl delete deployment coredns -n kube-system
kubectl delete daemonset aws-node -n kube-system
kubectl delete daemonset eks-pod-identity-agent -n kube-system
kubectl delete daemonset kube-proxy -n kube-system

Now, I deleted the add-ons somewhat on impulse; was that actually fine?
The DaemonSet-based components, aws-node (the VPC CNI driver), kube-proxy, and the Pod Identity Agent, should be OK, but CoreDNS, which was deployed as a Deployment, is the one that worries me.
/etc/resolv.conf inside the Pods still points at the same IP as before Auto Mode was enabled.

% kubectl exec nginx-66d8bcc68c-rdj7f -it -- /bin/sh
# cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local ap-northeast-1.compute.internal
nameserver 172.20.0.10
options ndots:5

Checking the CoreDNS ClusterIP Service at this point, the Endpoints have disappeared.

% kubectl describe service/kube-dns -n kube-system
Name:                     kube-dns
Namespace:                kube-system
Labels:                   eks.amazonaws.com/component=kube-dns
                          k8s-app=kube-dns
                          kubernetes.io/cluster-service=true
                          kubernetes.io/name=CoreDNS
Annotations:              prometheus.io/port: 9153
                          prometheus.io/scrape: true
Selector:                 k8s-app=kube-dns
Type:                     ClusterIP
IP Family Policy:         SingleStack
IP Families:              IPv4
IP:                       172.20.0.10
IPs:                      172.20.0.10
Port:                     dns  53/UDP
TargetPort:               53/UDP
Endpoints:
Port:                     dns-tcp  53/TCP
TargetPort:               53/TCP
Endpoints:
Port:                     metrics  9153/TCP
TargetPort:               9153/TCP
Endpoints:
Session Affinity:         None
Internal Traffic Policy:  Cluster
Events:                   <none>

On the other hand, after creating a Service, exec-ing into an Nginx Pod and resolving names works without any problem.
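The Service itself isn't shown above; a minimal sketch of what I applied looks like this (name and namespace match the dig output below, selector and ports assumed):

apiVersion: v1
kind: Service
metadata:
  namespace: default
  name: nginx
spec:
  selector:
    app.kubernetes.io/name: nginx
  ports:
    - port: 80
      targetPort: 80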

# dig nginx.default.svc.cluster.local

; <<>> DiG 9.18.33-1~deb12u2-Debian <<>> nginx.default.svc.cluster.local
;; global options: +cmd
;; Got answer:
;; WARNING: .local is reserved for Multicast DNS
;; You are currently testing what happens when an mDNS query is leaked to DNS
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42133
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 5686dda4de381016 (echoed)
;; QUESTION SECTION:
;nginx.default.svc.cluster.local. IN    A

;; ANSWER SECTION:
nginx.default.svc.cluster.local. 5 IN   A       172.20.53.13

;; Query time: 0 msec
;; SERVER: 172.20.0.10#53(172.20.0.10) (UDP)
;; WHEN: Sat May 03 07:17:37 UTC 2025
;; MSG SIZE  rcvd: 119

In other words, containers on Auto Mode-managed nodes apparently do not use the CoreDNS deployed via the add-on.
In Auto Mode, CoreDNS also runs on each node as a systemd service.

auto-mode.png

EKS Auto Mode!

Dumping the node logs and inspecting system/ps.txt reveals the processes corresponding to kube-proxy and CoreDNS.

kube-proxy

root        1170  0.0  1.4 1738012 56396 ?       Ssl  07:53   0:01 /usr/bin/kube-proxy --hostname-override i-04239f7bdcfa45781 --config=/usr/share/kube-proxy/kube-proxy-config --kubeconfig=/etc/kubernetes/kube-proxy/kubeconfig

CoreDNS

coredns     1604  0.1  1.5 1813796 59840 ?       Ssl  07:53   0:10 /usr/bin/coredns -conf=/etc/coredns/Corefile

https://dev.classmethod.jp/articles/eks-auto-mode-get-node-logs/
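For reference, per the linked article, the dump is taken by applying a NodeDiagnostic resource whose name matches the target node and whose destination is a presigned S3 upload URL (the URL below is a placeholder):

apiVersion: eks.amazonaws.com/v1alpha1
kind: NodeDiagnostic
metadata:
  name: i-016c762ee61d543dd   # must match the node name
spec:
  logCapture:
    destination: https://<presigned-s3-upload-url>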

So Pods started on Auto Mode-managed nodes are presumably routed to the node-local CoreDNS when resolving names.
Indeed, in a cluster created with Auto Mode from the start, there is no CoreDNS ClusterIP Service at all.

% kubectl get all -A
NAMESPACE     NAME                                TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
default       service/kubernetes                  ClusterIP   172.20.0.1       <none>        443/TCP   8m45s
kube-system   service/eks-extension-metrics-api   ClusterIP   172.20.213.192   <none>        443/TCP   8m42s

/etc/resolv.conf and /etc/nsswitch.conf also look no different from non-Auto Mode, so the routing is presumably handled by iptables on the node.

resolv.conf

% kubectl exec nginx-66d8bcc68c-rdj7f -it -- /bin/sh
# cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local ap-northeast-1.compute.internal
nameserver 172.20.0.10
options ndots:5

nsswitch.conf

% kubectl exec nginx-66d8bcc68c-rdj7f -it -- /bin/sh
# cat /etc/nsswitch.conf
# /etc/nsswitch.conf
#
# Example configuration of GNU Name Service Switch functionality.
# If you have the `glibc-doc-reference' and `info' packages installed, try:
# `info libc "Name Service Switch"' for information about this file.

passwd:         files
group:          files
shadow:         files
gshadow:        files

hosts:          files dns
networks:       files

protocols:      db files
services:       db files
ethers:         db files
rpc:            db files

netgroup:       nis
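If you wanted to verify the iptables theory, Auto Mode nodes don't allow direct SSH, but a privileged node debug Pod could peek at the host's nat table. An untested sketch (image and profile are illustrative):

% kubectl debug node/i-016c762ee61d543dd -it \
    --image=nicolaka/netshoot --profile=sysadmin \
    -- sh -c 'iptables -t nat -S | grep 172.20.0.10'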

Having confirmed that removing the add-ons is fine, let's finally delete the node group.
The EKS cluster definition now looks like this.
With neither add-ons nor a node group to manage, it has gotten much leaner.

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.35.0"

  cluster_name                    = local.cluster_name
  cluster_version                 = "1.32"
  cluster_endpoint_public_access  = true
  cluster_endpoint_private_access = true

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  enable_cluster_creator_admin_permissions = true
  bootstrap_self_managed_addons = false

  # Enable Auto Mode
  cluster_compute_config = {
    enabled    = true
    node_pools = ["general-purpose"]
  }
}

The diff came out as follows:

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated
with the following symbols:
  - destroy

Terraform will perform the following actions:

  # module.eks.module.eks_managed_node_group["default"].aws_eks_node_group.this[0] will be destroyed
  # (because module.eks.module.eks_managed_node_group["default"] is not in configuration)
  - resource "aws_eks_node_group" "this" {
      - ami_type               = "AL2023_x86_64_STANDARD" -> null
      - arn                    = "arn:aws:eks:ap-northeast-1:xxxxxxxxxxxx:nodegroup/test-cluster/default-20250503053528850800000011/00cb4a52-3c48-71f3-7020-b72076da380d" -> null
      - capacity_type          = "ON_DEMAND" -> null
      - cluster_name           = "test-cluster" -> null
      - disk_size              = 0 -> null
      - id                     = "test-cluster:default-20250503053528850800000011" -> null
      - instance_types         = [
          - "t3.small",
        ] -> null
      - labels                 = {} -> null
      - node_group_name        = "default-20250503053528850800000011" -> null
      - node_group_name_prefix = "default-" -> null
      - node_role_arn          = "arn:aws:iam::xxxxxxxxxxxx:role/default-eks-node-group-20250503052606002600000002" -> null
      - release_version        = "1.32.3-20250501" -> null
      - resources              = [
          - {
              - autoscaling_groups              = [
                  - {
                      - name = "eks-default-20250503053528850800000011-00cb4a52-3c48-71f3-7020-b72076da380d"
                    },
                ]
                # (1 unchanged attribute hidden)
            },
        ] -> null
      - status                 = "ACTIVE" -> null
      - subnet_ids             = [
          - "subnet-011609187505c89a3",
          - "subnet-0302d2f4c14d12022",
          - "subnet-09181196696186fc9",
        ] -> null
      - tags                   = {
          - "Name" = "default"
        } -> null
      - tags_all               = {
          - "Name" = "default"
        } -> null
      - version                = "1.32" -> null

      - launch_template {
          - id      = "lt-0fbede7110502107a" -> null
          - name    = "default-2025050305352315330000000f" -> null
          - version = "1" -> null
        }

      - scaling_config {
          - desired_size = 1 -> null
          - max_size     = 3 -> null
          - min_size     = 1 -> null
        }

      - timeouts {}

      - update_config {
          - max_unavailable            = 0 -> null
          - max_unavailable_percentage = 33 -> null
        }
    }

  # module.eks.module.eks_managed_node_group["default"].aws_iam_role.this[0] will be destroyed
  # (because module.eks.module.eks_managed_node_group["default"] is not in configuration)
  - resource "aws_iam_role" "this" {
      - arn                   = "arn:aws:iam::xxxxxxxxxxxx:role/default-eks-node-group-20250503052606002600000002" -> null
      - assume_role_policy    = jsonencode(
            {
              - Statement = [
                  - {
                      - Action    = "sts:AssumeRole"
                      - Effect    = "Allow"
                      - Principal = {
                          - Service = "ec2.amazonaws.com"
                        }
                      - Sid       = "EKSNodeAssumeRole"
                    },
                ]
              - Version   = "2012-10-17"
            }
        ) -> null
      - create_date           = "2025-05-03T05:26:06Z" -> null
      - description           = "EKS managed node group IAM role" -> null
      - force_detach_policies = true -> null
      - id                    = "default-eks-node-group-20250503052606002600000002" -> null
      - managed_policy_arns   = [
          - "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
          - "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy",
          - "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy",
        ] -> null
      - max_session_duration  = 3600 -> null
      - name                  = "default-eks-node-group-20250503052606002600000002" -> null
      - name_prefix           = "default-eks-node-group-" -> null
      - path                  = "/" -> null
      - tags                  = {} -> null
      - tags_all              = {} -> null
      - unique_id             = "AROAW3MEE5OCLYIPBF6OZ" -> null
        # (1 unchanged attribute hidden)
    }

  # module.eks.module.eks_managed_node_group["default"].aws_iam_role_policy_attachment.this["AmazonEC2ContainerRegistryReadOnly"] will be destroyed
  # (because module.eks.module.eks_managed_node_group["default"] is not in configuration)
  - resource "aws_iam_role_policy_attachment" "this" {
      - id         = "default-eks-node-group-20250503052606002600000002-2025050305260809140000000b" -> null
      - policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly" -> null
      - role       = "default-eks-node-group-20250503052606002600000002" -> null
    }

  # module.eks.module.eks_managed_node_group["default"].aws_iam_role_policy_attachment.this["AmazonEKSWorkerNodePolicy"] will be destroyed
  # (because module.eks.module.eks_managed_node_group["default"] is not in configuration)
  - resource "aws_iam_role_policy_attachment" "this" {
      - id         = "default-eks-node-group-20250503052606002600000002-2025050305260796910000000a" -> null
      - policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy" -> null
      - role       = "default-eks-node-group-20250503052606002600000002" -> null
    }

  # module.eks.module.eks_managed_node_group["default"].aws_iam_role_policy_attachment.this["AmazonEKS_CNI_Policy"] will be destroyed
  # (because module.eks.module.eks_managed_node_group["default"] is not in configuration)
  - resource "aws_iam_role_policy_attachment" "this" {
      - id         = "default-eks-node-group-20250503052606002600000002-20250503052607932300000009" -> null
      - policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy" -> null
      - role       = "default-eks-node-group-20250503052606002600000002" -> null
    }

  # module.eks.module.eks_managed_node_group["default"].aws_launch_template.this[0] will be destroyed
  # (because module.eks.module.eks_managed_node_group["default"] is not in configuration)
  - resource "aws_launch_template" "this" {
      - arn                                  = "arn:aws:ec2:ap-northeast-1:xxxxxxxxxxxx:launch-template/lt-0fbede7110502107a" -> null
      - default_version                      = 1 -> null
      - description                          = "Custom launch template for default EKS managed node group" -> null
      - disable_api_stop                     = false -> null
      - disable_api_termination              = false -> null
      - id                                   = "lt-0fbede7110502107a" -> null
      - latest_version                       = 1 -> null
      - name                                 = "default-2025050305352315330000000f" -> null
      - name_prefix                          = "default-" -> null
      - security_group_names                 = [] -> null
      - tags                                 = {} -> null
      - tags_all                             = {} -> null
      - update_default_version               = true -> null
      - vpc_security_group_ids               = [
          - "sg-0d629394303b2d94b",
        ] -> null
        # (8 unchanged attributes hidden)

      - metadata_options {
          - http_endpoint               = "enabled" -> null
          - http_put_response_hop_limit = 2 -> null
          - http_tokens                 = "required" -> null
            # (2 unchanged attributes hidden)
        }

      - monitoring {
          - enabled = true -> null
        }

      - tag_specifications {
          - resource_type = "instance" -> null
          - tags          = {
              - "Name" = "default"
            } -> null
        }
      - tag_specifications {
          - resource_type = "network-interface" -> null
          - tags          = {
              - "Name" = "default"
            } -> null
        }
      - tag_specifications {
          - resource_type = "volume" -> null
          - tags          = {
              - "Name" = "default"
            } -> null
        }
    }

  # module.eks.module.eks_managed_node_group["default"].module.user_data.null_resource.validate_cluster_service_cidr will be destroyed
  # (because module.eks.module.eks_managed_node_group["default"].module.user_data is not in configuration)
  - resource "null_resource" "validate_cluster_service_cidr" {
      - id = "2710373779183779823" -> null
    }

Plan: 0 to add, 0 to change, 7 to destroy.

─────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Note: You didn't use the -out option to save this plan, so Terraform can't guarantee to take exactly these
actions if you run "terraform apply" now.

And with that, we've migrated to Auto Mode in place and removed the existing managed node group!

Summary

I tried an in-place migration from an EKS managed node group to Auto Mode.
Doing it in the following order should cause no problems:

  • Enable Auto Mode
  • Migrate the applications
    • Including any add-ons not built into Auto Mode (such as the EFS CSI driver)
  • Delete the add-ons that Auto Mode makes unnecessary
  • Delete the existing node group

Things look a bit more involved once Ingress and PersistentVolumes are in the mix, so I plan to test that eventually.
