ElastiCache replication groups (Redis) can now "maintain" their distribution across multiple AZs

You can now set an attribute called MultiAZEnabled on a replication group. This post explains what that attribute means.
2020.07.02

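This is Nakayama.

Today I have written up the news that ElastiCache replication groups (Redis) can now "maintain" their distribution across multiple AZs.

The problem until now

ElastiCache (Redis) has always let you distribute nodes across multiple AZs. Specifically, you could do so either by registering subnets from different AZs in the SubnetGroup and setting ReplicasPerNodeGroup to 1 or more, or by specifying an AZ different from the primary's when adding a node manually.

However, nothing ruled out the nodes ending up concentrated in a single AZ afterwards, for example when someone deleted a node manually. Deliberately consolidating nodes in one AZ for performance reasons is a separate matter.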

New feature

Enabling the newly added MultiAZEnabled attribute prevents the nodes in a Node Group from becoming concentrated in a particular AZ. In terms of behavior, it may be easiest to picture something like Deletion Protection for an EC2 Instance or a CloudFormation Stack.

Minimizing Downtime in ElastiCache for Redis with Multi-AZ

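Incidentally, if you already use ElastiCache (Redis), you have probably learned of this change through the Personal Health Dashboard or a similar channel.

This update appears to have been released around 2020/6/6. As the following shows, the documentation has been rewritten almost in its entirety.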

doc_source/redis/AutoFailover.md

Trying it out

Preparation

Let's check the actual behavior. I will use the Default VPC.

First, create a Subnet Group. This time I use ap-northeast-1a and ap-northeast-1c.

aws elasticache create-cache-subnet-group \
  --cache-subnet-group-name "test-sng" \
  --cache-subnet-group-description "test" \
  --subnet-ids subnet-87af54ce subnet-2edde176
{
    "CacheSubnetGroup": {
        "VpcId": "vpc-44200c20",
        "CacheSubnetGroupDescription": "test",
        "Subnets": [
            {
                "SubnetIdentifier": "subnet-87af54ce",
                "SubnetAvailabilityZone": {
                    "Name": "ap-northeast-1a"
                }
            },
            {
                "SubnetIdentifier": "subnet-2edde176",
                "SubnetAvailabilityZone": {
                    "Name": "ap-northeast-1c"
                }
            }
        ],
        "CacheSubnetGroupName": "test-sng"
    }
}
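
As a quick sanity check before moving on, the AZ coverage of the Subnet Group can be read straight from the response above. The following is a minimal sketch in Python that works on a trimmed copy of that JSON; the function name subnet_group_azs is my own and is not part of any AWS SDK.

```python
import json

# Trimmed copy of the create-cache-subnet-group response shown above.
SUBNET_GROUP_JSON = """
{
    "CacheSubnetGroup": {
        "CacheSubnetGroupName": "test-sng",
        "Subnets": [
            {"SubnetIdentifier": "subnet-87af54ce",
             "SubnetAvailabilityZone": {"Name": "ap-northeast-1a"}},
            {"SubnetIdentifier": "subnet-2edde176",
             "SubnetAvailabilityZone": {"Name": "ap-northeast-1c"}}
        ]
    }
}
"""


def subnet_group_azs(response):
    """Return the set of AZ names covered by the subnets in the response.

    Hypothetical helper for illustration only.
    """
    subnets = response["CacheSubnetGroup"]["Subnets"]
    return {s["SubnetAvailabilityZone"]["Name"] for s in subnets}


azs = subnet_group_azs(json.loads(SUBNET_GROUP_JSON))
# Nodes can only be spread across AZs if the Subnet Group covers at least two.
print(sorted(azs), "covers 2+ AZs:", len(azs) >= 2)
```

If the set contained only one AZ, the nodes could not be spread across AZs with this Subnet Group.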

Creating the Replication Group

Provision the Replication Group with the following CloudFormation Template.

AWSTemplateFormatVersion: "2010-09-09"
Description: A sample template
Resources:
  myReplicationGroup:
    Type: 'AWS::ElastiCache::ReplicationGroup'
    Properties:
      ReplicationGroupDescription: description
      NumNodeGroups: 2
      ReplicasPerNodeGroup: 1
      CacheNodeType: cache.t3.micro
      AutomaticFailoverEnabled: true
      MultiAZEnabled: true
      CacheSubnetGroupName: test-sng
      Engine: redis
      EngineVersion: 5.0.6
      ReplicationGroupId: test-rg

Here are the details of the resulting Replication Group. You can see that the nodes are distributed across multiple AZs.

aws elasticache describe-replication-groups \
  --replication-group-id test-rg
{
    "ReplicationGroups": [
        {
            "Status": "available",
            "MultiAZ": "enabled",
            "Description": "description",
            "NodeGroups": [
                {
                    "Status": "available",
                    "Slots": "0-8191",
                    "NodeGroupId": "0001",
                    "NodeGroupMembers": [
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1a",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0001-001"
                        },
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1c",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0001-002"
                        }
                    ]
                },
                {
                    "Status": "available",
                    "Slots": "8192-16383",
                    "NodeGroupId": "0002",
                    "NodeGroupMembers": [
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1c",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0002-001"
                        },
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1a",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0002-002"
                        }
                    ]
                }
            ],
            "ConfigurationEndpoint": {
                "Port": 6379,
                "Address": "test-rg.s1jbux.clustercfg.apne1.cache.amazonaws.com"
            },
            "AtRestEncryptionEnabled": false,
            "ClusterEnabled": true,
            "ReplicationGroupId": "test-rg",
            "GlobalReplicationGroupInfo": {},
            "SnapshotRetentionLimit": 0,
            "AutomaticFailover": "enabled",
            "TransitEncryptionEnabled": false,
            "SnapshotWindow": "15:00-16:00",
            "AuthTokenEnabled": false,
            "MemberClusters": [
                "test-rg-0001-001",
                "test-rg-0001-002",
                "test-rg-0002-001",
                "test-rg-0002-002"
            ],
            "CacheNodeType": "cache.t3.micro",
            "PendingModifiedValues": {}
        }
    ]
}
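
Reading the AZ layout out of that JSON by eye gets tedious as node groups grow, so here is a small Python sketch that summarizes it. It assumes the response has the same shape as the describe-replication-groups output above (only the fields it needs are copied); azs_by_node_group is a hypothetical helper name, not an AWS API.

```python
import json

# Only the fields needed for the summary, copied from the output above.
REPLICATION_GROUP_JSON = """
{
    "ReplicationGroups": [
        {
            "ReplicationGroupId": "test-rg",
            "NodeGroups": [
                {
                    "NodeGroupId": "0001",
                    "NodeGroupMembers": [
                        {"CacheClusterId": "test-rg-0001-001",
                         "PreferredAvailabilityZone": "ap-northeast-1a"},
                        {"CacheClusterId": "test-rg-0001-002",
                         "PreferredAvailabilityZone": "ap-northeast-1c"}
                    ]
                },
                {
                    "NodeGroupId": "0002",
                    "NodeGroupMembers": [
                        {"CacheClusterId": "test-rg-0002-001",
                         "PreferredAvailabilityZone": "ap-northeast-1c"},
                        {"CacheClusterId": "test-rg-0002-002",
                         "PreferredAvailabilityZone": "ap-northeast-1a"}
                    ]
                }
            ]
        }
    ]
}
"""


def azs_by_node_group(response):
    """Map each NodeGroupId to the sorted list of AZs its members live in."""
    summary = {}
    for group in response["ReplicationGroups"]:
        for node_group in group["NodeGroups"]:
            azs = {m["PreferredAvailabilityZone"]
                   for m in node_group["NodeGroupMembers"]}
            summary[node_group["NodeGroupId"]] = sorted(azs)
    return summary


summary = azs_by_node_group(json.loads(REPLICATION_GROUP_JSON))
for node_group_id, azs in summary.items():
    print(node_group_id, azs, "spread" if len(azs) >= 2 else "single AZ")
```

In practice the same function could be fed the parsed output of the CLI command instead of the embedded string.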

Verifying the behavior

Now let's verify the behavior.

The plan is to add a node in ap-northeast-1a and then delete the node in ap-northeast-1c.

First, add the node.

aws elasticache increase-replica-count \
  --replication-group-id test-rg \
  --replica-configuration NodeGroupId=0001,NewReplicaCount=2,PreferredAvailabilityZones=ap-northeast-1a,ap-northeast-1c,ap-northeast-1a \
  --apply-immediately
{
    "ReplicationGroup": {
        "Status": "modifying",
        "MultiAZ": "enabled",
        "Description": "description",
        "NodeGroups": [
            {
                "Status": "modifying",
                "Slots": "0-8191",
                "NodeGroupId": "0001",
                "NodeGroupMembers": [
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1a",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0001-001"
                    },
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1c",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0001-002"
                    }
                ]
            },
            {
                "Status": "modifying",
                "Slots": "8192-16383",
                "NodeGroupId": "0002",
                "NodeGroupMembers": [
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1c",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0002-001"
                    },
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1a",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0002-002"
                    }
                ]
            }
        ],
        "ConfigurationEndpoint": {
            "Port": 6379,
            "Address": "test-rg.s1jbux.clustercfg.apne1.cache.amazonaws.com"
        },
        "AtRestEncryptionEnabled": false,
        "ClusterEnabled": true,
        "ReplicationGroupId": "test-rg",
        "GlobalReplicationGroupInfo": {},
        "SnapshotRetentionLimit": 0,
        "AutomaticFailover": "enabled",
        "TransitEncryptionEnabled": false,
        "SnapshotWindow": "15:00-16:00",
        "MemberClusters": [
            "test-rg-0001-001",
            "test-rg-0001-002",
            "test-rg-0001-003",
            "test-rg-0002-001",
            "test-rg-0002-002"
        ],
        "CacheNodeType": "cache.t3.micro",
        "PendingModifiedValues": {}
    }
}

Once the node has been added, check the state of the Replication Group.

aws elasticache describe-replication-groups \
  --replication-group-id test-rg
{
    "ReplicationGroups": [
        {
            "Status": "available",
            "MultiAZ": "enabled",
            "Description": "description",
            "NodeGroups": [
                {
                    "Status": "available",
                    "Slots": "0-8191",
                    "NodeGroupId": "0001",
                    "NodeGroupMembers": [
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1a",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0001-001"
                        },
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1c",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0001-002"
                        },
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1a",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0001-003"
                        }
                    ]
                },
                {
                    "Status": "available",
                    "Slots": "8192-16383",
                    "NodeGroupId": "0002",
                    "NodeGroupMembers": [
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1c",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0002-001"
                        },
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1a",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0002-002"
                        }
                    ]
                }
            ],
            "ConfigurationEndpoint": {
                "Port": 6379,
                "Address": "test-rg.s1jbux.clustercfg.apne1.cache.amazonaws.com"
            },
            "AtRestEncryptionEnabled": false,
            "ClusterEnabled": true,
            "ReplicationGroupId": "test-rg",
            "GlobalReplicationGroupInfo": {},
            "SnapshotRetentionLimit": 0,
            "AutomaticFailover": "enabled",
            "TransitEncryptionEnabled": false,
            "SnapshotWindow": "15:00-16:00",
            "AuthTokenEnabled": false,
            "MemberClusters": [
                "test-rg-0001-001",
                "test-rg-0001-002",
                "test-rg-0001-003",
                "test-rg-0002-001",
                "test-rg-0002-002"
            ],
            "CacheNodeType": "cache.t3.micro",
            "PendingModifiedValues": {}
        }
    ]
}

Next, try deleting the node in ap-northeast-1c so that the nodes of NodeGroupId=0001 remain in ap-northeast-1a only.

aws elasticache decrease-replica-count \
  --replication-group-id test-rg \
  --replicas-to-remove test-rg-0001-002 \
  --apply-immediately
An error occurred (InvalidParameterValue) when calling the DecreaseReplicaCount operation: Cannot delete the given replicas as there needs to be nodes in two Availability Zones for this Multi-AZ enabled Replication Group

The command fails with the error above, which confirms that the node cannot be deleted (it is protected).
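
To make the rule behind that error concrete, here is a rough Python model of it. This is only my reading of the error message ("there needs to be nodes in two Availability Zones"), not the actual check ElastiCache runs, and both function names are hypothetical.

```python
def azs_after_removal(members, replicas_to_remove):
    """Return the AZs still occupied once the given replicas are removed.

    members maps CacheClusterId -> AZ for a single node group.
    """
    return {az for cluster_id, az in members.items()
            if cluster_id not in replicas_to_remove}


def removal_keeps_two_azs(members, replicas_to_remove):
    """Assumed rule: the remaining nodes must occupy at least two AZs."""
    return len(azs_after_removal(members, replicas_to_remove)) >= 2


# NodeGroupId=0001 as it looked after the node was added.
NODE_GROUP_0001 = {
    "test-rg-0001-001": "ap-northeast-1a",
    "test-rg-0001-002": "ap-northeast-1c",
    "test-rg-0001-003": "ap-northeast-1a",
}

# Removing the only node in ap-northeast-1c leaves a single AZ.
print(removal_keeps_two_azs(NODE_GROUP_0001, {"test-rg-0001-002"}))  # False
# Removing the extra node in ap-northeast-1a keeps both AZs occupied.
print(removal_keeps_two_azs(NODE_GROUP_0001, {"test-rg-0001-003"}))  # True
```

Under this model, the request above is rejected because test-rg-0001-002 is the last node of the group in ap-northeast-1c.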

Behavior when MultiAZEnabled is disabled

Let's also check what happens when MultiAZEnabled is disabled.

First, disable MultiAZEnabled.

aws elasticache modify-replication-group \
  --replication-group-id test-rg \
  --no-multi-az-enabled
{
    "ReplicationGroup": {
        "Status": "available",
        "MultiAZ": "disabled",
        "Description": "description",
        "NodeGroups": [
            {
                "Status": "available",
                "Slots": "0-8191",
                "NodeGroupId": "0001",
                "NodeGroupMembers": [
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1a",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0001-001"
                    },
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1c",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0001-002"
                    },
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1a",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0001-003"
                    }
                ]
            },
            {
                "Status": "available",
                "Slots": "8192-16383",
                "NodeGroupId": "0002",
                "NodeGroupMembers": [
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1c",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0002-001"
                    },
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1a",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0002-002"
                    }
                ]
            }
        ],
        "ConfigurationEndpoint": {
            "Port": 6379,
            "Address": "test-rg.s1jbux.clustercfg.apne1.cache.amazonaws.com"
        },
        "AtRestEncryptionEnabled": false,
        "ClusterEnabled": true,
        "ReplicationGroupId": "test-rg",
        "GlobalReplicationGroupInfo": {},
        "SnapshotRetentionLimit": 0,
        "AutomaticFailover": "enabled",
        "TransitEncryptionEnabled": false,
        "SnapshotWindow": "15:00-16:00",
        "MemberClusters": [
            "test-rg-0001-001",
            "test-rg-0001-002",
            "test-rg-0001-003",
            "test-rg-0002-001",
            "test-rg-0002-002"
        ],
        "CacheNodeType": "cache.t3.micro",
        "PendingModifiedValues": {}
    }
}

Delete the node with the same command as before.

aws elasticache decrease-replica-count \
  --replication-group-id test-rg \
  --replicas-to-remove test-rg-0001-002 \
  --apply-immediately
{
    "ReplicationGroup": {
        "Status": "modifying",
        "MultiAZ": "disabled",
        "Description": "description",
        "NodeGroups": [
            {
                "Status": "modifying",
                "Slots": "0-8191",
                "NodeGroupId": "0001",
                "NodeGroupMembers": [
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1a",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0001-001"
                    },
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1c",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0001-002"
                    },
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1a",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0001-003"
                    }
                ]
            },
            {
                "Status": "modifying",
                "Slots": "8192-16383",
                "NodeGroupId": "0002",
                "NodeGroupMembers": [
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1c",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0002-001"
                    },
                    {
                        "PreferredAvailabilityZone": "ap-northeast-1a",
                        "CacheNodeId": "0001",
                        "CacheClusterId": "test-rg-0002-002"
                    }
                ]
            }
        ],
        "ConfigurationEndpoint": {
            "Port": 6379,
            "Address": "test-rg.s1jbux.clustercfg.apne1.cache.amazonaws.com"
        },
        "AtRestEncryptionEnabled": false,
        "ClusterEnabled": true,
        "ReplicationGroupId": "test-rg",
        "GlobalReplicationGroupInfo": {},
        "SnapshotRetentionLimit": 0,
        "AutomaticFailover": "enabled",
        "TransitEncryptionEnabled": false,
        "SnapshotWindow": "15:00-16:00",
        "MemberClusters": [
            "test-rg-0001-001",
            "test-rg-0001-002",
            "test-rg-0001-003",
            "test-rg-0002-001",
            "test-rg-0002-002"
        ],
        "CacheNodeType": "cache.t3.micro",
        "PendingModifiedValues": {}
    }
}

Once the deletion has completed, check the state of the Replication Group.

aws elasticache describe-replication-groups \
  --replication-group-id test-rg
{
    "ReplicationGroups": [
        {
            "Status": "available",
            "MultiAZ": "disabled",
            "Description": "description",
            "NodeGroups": [
                {
                    "Status": "available",
                    "Slots": "0-8191",
                    "NodeGroupId": "0001",
                    "NodeGroupMembers": [
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1a",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0001-001"
                        },
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1a",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0001-003"
                        }
                    ]
                },
                {
                    "Status": "available",
                    "Slots": "8192-16383",
                    "NodeGroupId": "0002",
                    "NodeGroupMembers": [
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1c",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0002-001"
                        },
                        {
                            "PreferredAvailabilityZone": "ap-northeast-1a",
                            "CacheNodeId": "0001",
                            "CacheClusterId": "test-rg-0002-002"
                        }
                    ]
                }
            ],
            "ConfigurationEndpoint": {
                "Port": 6379,
                "Address": "test-rg.s1jbux.clustercfg.apne1.cache.amazonaws.com"
            },
            "AtRestEncryptionEnabled": false,
            "ClusterEnabled": true,
            "ReplicationGroupId": "test-rg",
            "GlobalReplicationGroupInfo": {},
            "SnapshotRetentionLimit": 0,
            "AutomaticFailover": "enabled",
            "TransitEncryptionEnabled": false,
            "SnapshotWindow": "15:00-16:00",
            "AuthTokenEnabled": false,
            "MemberClusters": [
                "test-rg-0001-001",
                "test-rg-0001-003",
                "test-rg-0002-001",
                "test-rg-0002-002"
            ],
            "CacheNodeType": "cache.t3.micro",
            "PendingModifiedValues": {}
        }
    ]
}

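As you can see, with the attribute disabled I was able to deliberately create a state in which the nodes of NodeGroupId=0001 are concentrated in a single AZ.

(Supplement) The ElastiCache SLA

The ElastiCache SLA states the following.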

Service Commitment AWS will use commercially reasonable efforts to make ElastiCache for Memcached Cross-AZ Configurations and ElastiCache for Redis Multi-AZ Configurations available with a Monthly Uptime Percentage of at least 99.9% during any monthly billing cycle (the "Service Commitment"). In the event ElastiCache does not meet the Monthly Uptime Percentage commitment, you will be eligible to receive a Service Credit as described below.

Amazon ElastiCache Service Level Agreement Last Updated: March 20, 2019

As you can see, the SLA presupposes distribution across multiple AZs. If you cannot afford to have that precondition break, enable MultiAZEnabled.
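
For a sense of scale, the 99.9% figure can be turned into a monthly downtime budget with simple arithmetic. The sketch below assumes the whole month counts toward the calculation and defaults to a 30-day month; the SLA document itself defines exactly how Monthly Uptime Percentage is measured, so treat this as an approximation. The function name is hypothetical.

```python
def monthly_downtime_budget_minutes(uptime_percent, days_in_month=30):
    """Minutes of downtime allowed per month at the given uptime percentage."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


# 30 days is 43,200 minutes, and 0.1% of that is about 43.2 minutes.
print(round(monthly_downtime_budget_minutes(99.9), 1))
# A 31-day month allows slightly more.
print(round(monthly_downtime_budget_minutes(99.9, days_in_month=31), 2))
```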

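Summary

As mentioned at the beginning, existing users have already been sent a notification via the Personal Health Dashboard urging them to enable this attribute. Enable it to keep your environment in an appropriate state.

I also recommend enabling it whenever you build a new environment.

That's all from the field.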