Installing Metrics Server and Trying Out HPA (Horizontal Pod Autoscaler)

2022.09.16

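What is Metrics Server?

Metrics Server is a component that collects resource usage data across the whole cluster. Once it is installed, you can check the resource usage (CPU and memory) of each Node and Pod with kubectl top node and kubectl top pod.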

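Why install Metrics Server

Because I want to use HPA (Horizontal Pod Autoscaler). HPA automatically scales the number of Pods of a Deployment (or a similar workload) based on resource utilization such as CPU. It obtains CPU and memory usage through Metrics Server and scales out or in based on those values.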
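To make "scales based on utilization" concrete: the Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The following is only a minimal sketch of that formula in shell integer arithmetic. The function name desired_replicas is hypothetical, and the sketch deliberately ignores the controller's tolerance, stabilization window, and the min/max replica clamp.

```shell
# Hypothetical helper, not part of kubectl.
# usage: desired_replicas CURRENT_REPLICAS CURRENT_CPU_PERCENT TARGET_CPU_PERCENT
# Computes ceil(current * currentMetric / target) using integer arithmetic.
desired_replicas() {
  echo $(( ($1 * $2 + $3 - 1) / $3 ))
}

# One Pod running at 150% of its CPU request with a 50% target:
desired_replicas 1 150 50   # prints 3
```

With a 50% target, a single Pod at 150% of its CPU request would therefore be scaled towards three Pods, subject to the configured maximum.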

Defining an HPA without Metrics Server

Create the HPA with the following command. It scales based on CPU utilization.

% kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
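The same HPA can also be written declaratively. The sketch below prints a manifest that should be equivalent to the kubectl autoscale command above, assuming the cluster serves the autoscaling/v2 API (Kubernetes 1.23 or later; on older clusters autoscaling/v2beta2 uses the same fields). The function name hpa_manifest is hypothetical. Note that the utilization target is a percentage of the Pods' CPU requests, so the Deployment needs CPU requests set.

```shell
# Hypothetical helper: print an autoscaling/v2 manifest equivalent to
#   kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
hpa_manifest() {
  cat <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
EOF
}
```

To apply it, pipe the output to kubectl: hpa_manifest | kubectl apply -f -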

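Running describe shows a number of messages saying that CPU utilization cannot be retrieved, because the resource metrics API (metrics.k8s.io) is not available.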

% kubectl describe hpa php-apache   
Name:                                                  php-apache
Namespace:                                             hoge
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Wed, 14 Sep 2022 15:14:13 +0900
Reference:                                             Deployment/php-apache
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  <unknown> / 50%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 0 desired
Conditions:
  Type           Status  Reason                   Message
  ----           ------  ------                   -------
  AbleToScale    True    SucceededGetScale        the HPA controller was able to get the target's current scale
  ScalingActive  False   FailedGetResourceMetric  the HPA was unable to compute the replica count: failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Events:
  Type     Reason                        Age                  From                       Message
  ----     ------                        ----                 ----                       -------
  Warning  FailedGetResourceMetric       6s (x11 over 2m37s)  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
  Warning  FailedComputeMetricsReplicas  6s (x11 over 2m37s)  horizontal-pod-autoscaler  invalid metrics (1 invalid out of 1), first error is: failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
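The error refers to pods.metrics.k8s.io, so one way to confirm the cause is to ask the API server directly whether the metrics API is registered. This is a minimal sketch; the function name check_metrics_api is hypothetical, while the kubectl subcommands themselves are standard.

```shell
# Hypothetical helper: check whether the resource metrics API is registered and answering.
check_metrics_api() {
  # The APIService object exists only once something (such as Metrics Server) registers it.
  kubectl get apiservice v1beta1.metrics.k8s.io || return 1
  # Query the API group that the HPA error message refers to.
  kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
}
```

Without Metrics Server the first command is expected to fail with a NotFound error, which matches the HPA events above.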

Installing Metrics Server

This time I install it with the Terraform Helm provider.

Provider configuration

I am using EKS.

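terraform {
  required_version = "= 1.2.1"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "4.20.1"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "2.12.1"
    }
    helm = {
      source  = "hashicorp/helm"
      version = "2.4.1"
    }
  }
}

# Assumption: the original post does not show these data sources.
# var.cluster_name is a hypothetical variable holding the EKS cluster name.
variable "cluster_name" {
  type = string
}

data "aws_eks_cluster" "cluster" {
  name = var.cluster_name
}

data "aws_eks_cluster_auth" "cluster" {
  name = var.cluster_name
}

provider "helm" {
  kubernetes {
    host                   = data.aws_eks_cluster.cluster.endpoint
    cluster_ca_certificate = base64decode(data.aws_eks_cluster.cluster.certificate_authority[0].data)
    token                  = data.aws_eks_cluster_auth.cluster.token
  }
}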

Installing with the default settings

resource "helm_release" "metrics_server" {
  name      = "metrics-server"
  namespace = "kube-system"

  repository = "https://kubernetes-sigs.github.io/metrics-server"
  chart      = "metrics-server"
  version    = "3.8.2"

  recreate_pods = true
}
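After terraform apply, it helps to confirm that the Deployment has rolled out and that metrics are actually being served. The sketch below assumes the chart names its Deployment metrics-server when the release name is metrics-server; the function name verify_metrics_server is hypothetical.

```shell
# Hypothetical helper: wait for the Deployment created by the Helm release, then query metrics.
verify_metrics_server() {
  # Assumption: the Deployment is called "metrics-server" in kube-system.
  kubectl -n kube-system rollout status deployment/metrics-server --timeout=120s || return 1
  # Metrics may take a short while to appear after the Pod becomes ready.
  kubectl top node
}
```

If kubectl top node still reports that metrics are not available right after the rollout, retrying after a minute or so is usually enough.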

High-availability configuration

I adapted the high-availability manifest published in the Metrics Server GitHub repository, with a few changes.

+ locals {
+   metrics_server_values = <<EOT
+ replicas: 2
+ updateStrategy:
+   rollingUpdate:
+     maxUnavailable: 1
+ affinity:
+   podAntiAffinity:
+     requiredDuringSchedulingIgnoredDuringExecution:
+     - labelSelector:
+         matchLabels:
+           # The Helm chart labels its Pods with app.kubernetes.io/name
+           # (the raw manifests in the repository use k8s-app instead).
+           app.kubernetes.io/name: metrics-server
+       namespaces:
+       - kube-system
+       topologyKey: kubernetes.io/hostname
+ podDisruptionBudget:
+   enabled: true
+   minAvailable: 1
+ EOT
+ }
+
  resource "helm_release" "metrics_server" {
    name      = "metrics-server"
    namespace = "kube-system"
  
    repository = "https://kubernetes-sigs.github.io/metrics-server"
    chart      = "metrics-server"
    version    = "3.8.2"

    recreate_pods = true

+   values = [local.metrics_server_values]
  }
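To check that the high-availability settings took effect, the replicas should be running on different nodes and a PodDisruptionBudget should exist. This is a sketch under the assumption that the chart labels its Pods with app.kubernetes.io/name=metrics-server; the function name check_metrics_server_ha is hypothetical.

```shell
# Hypothetical helper: show where the replicas landed and list the PodDisruptionBudgets.
check_metrics_server_ha() {
  # -o wide adds the NODE column; with the anti-affinity rule each replica
  # is expected to be scheduled on a different node.
  kubectl -n kube-system get pods -l app.kubernetes.io/name=metrics-server -o wide || return 1
  kubectl -n kube-system get poddisruptionbudget
}
```

If both replicas show the same node, the anti-affinity label selector most likely does not match the Pods' actual labels.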

Checking the HPA

Metrics are now being retrieved. In the Conditions section, ScalingActive has also changed from False to True.

% kubectl describe hpa php-apache
Name:                                                  php-apache
Namespace:                                             hoge
Labels:                                                <none>
Annotations:                                           <none>
CreationTimestamp:                                     Wed, 14 Sep 2022 15:14:13 +0900
Reference:                                             Deployment/php-apache
Metrics:                                               ( current / target )
  resource cpu on pods  (as a percentage of request):  0% (1m) / 50%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason            Message
  ----            ------  ------            -------
  AbleToScale     True    ReadyForNewScale  recommended size matches current size
  ScalingActive   True    ValidMetricFound  the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
  ScalingLimited  True    TooFewReplicas    the desired replica count is less than the minimum replica count
Events:
  Type     Reason                   Age                     From                       Message
  ----     ------                   ----                    ----                       -------
  Warning  FailedGetResourceMetric  2m55s (x441 over 113m)  horizontal-pod-autoscaler  failed to get cpu utilization: unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)

Let's apply some load.

% kubectl run -i \
    --tty load-generator \
    --rm --image=busybox \
    --restart=Never \
    -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"

I was able to confirm that the Pods scaled out automatically!

% kubectl get pod -w
NAME                                         READY   STATUS    RESTARTS   AGE
php-apache-779cd44bdc-lrjpt                  1/1     Running   0          115m
load-generator                               0/1     Pending   0          0s
load-generator                               0/1     Pending   0          1s
load-generator                               0/1     Pending   0          54s
load-generator                               0/1     ContainerCreating   0          55s
load-generator                               1/1     Running             0          61s
load-generator                               1/1     Terminating         0          61s
php-apache-779cd44bdc-htrgl                  0/1     Pending             0          0s
php-apache-779cd44bdc-2m242                  0/1     Pending             0          0s
php-apache-779cd44bdc-htrgl                  0/1     Pending             0          1s
php-apache-779cd44bdc-2m242                  0/1     Pending             0          1s
load-generator                               0/1     Terminating         0          94s
load-generator                               0/1     Terminating         0          94s
load-generator                               0/1     Terminating         0          94s
php-apache-779cd44bdc-htrgl                  0/1     Pending             0          53s
php-apache-779cd44bdc-htrgl                  0/1     ContainerCreating   0          53s
php-apache-779cd44bdc-2m242                  0/1     Pending             0          56s
php-apache-779cd44bdc-2m242                  0/1     ContainerCreating   0          56s
php-apache-779cd44bdc-htrgl                  1/1     Running             0          93s
php-apache-779cd44bdc-2m242                  1/1     Running             0          101s
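For a scripted check instead of watching the output by hand, the replica count can be polled until it reaches an expected value. The following is only a sketch: the function name wait_for_replicas, the 5-second interval, and the default of 60 attempts are all arbitrary choices, not something taken from the steps above.

```shell
# Hypothetical helper: poll until the php-apache Deployment reports at least WANT replicas.
# usage: wait_for_replicas WANT [MAX_ATTEMPTS]
wait_for_replicas() {
  want=$1
  attempts=${2:-60}
  while [ "$attempts" -gt 0 ]; do
    # .status.replicas is the number of Pods the Deployment currently has.
    have=$(kubectl get deployment php-apache -o jsonpath='{.status.replicas}')
    if [ "${have:-0}" -ge "$want" ]; then
      echo "php-apache has $have replicas"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 5
  done
  echo "timed out waiting for $want replicas" >&2
  return 1
}
```

For example, wait_for_replicas 3 returns successfully once the HPA has scaled the Deployment to three or more Pods, and fails after the attempts run out.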

References