I created a CloudFormation template that monitors AWS service quota limits and notifies Slack, then deployed it with CloudFormation StackSets
Previously, I built a mechanism that monitors AWS service quota limits and sends notifications to Slack.
I needed to deploy it to multiple accounts with CloudFormation StackSets, so I created a CloudFormation version.
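Trying it out
Preparing the CloudFormation template
Prepare the following CloudFormation template.
The layout of the resources it creates and the reasons for choosing each service are explained in the earlier article mentioned at the top, so please refer to it.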
AWSTemplateFormatVersion: '2010-09-09'
Description: 'AWS Service Quota Monitoring with Trusted Advisor, EventBridge, Step Functions, and Slack webhook integration'
Parameters:
  SlackWebhookUrl:
    Type: String
    Description: 'Slack webhook URL for notifications'
    NoEcho: true
Resources:
  # IAM Role for Step Functions
  # Reference: https://docs.aws.amazon.com/step-functions/latest/dg/cw-logs.html
  StepFunctionsServiceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: ServiceQuotaMonitoring-StepFunctions-Role
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: states.amazonaws.com
            Action: sts:AssumeRole
      Policies:
        - PolicyName: ServiceQuotaMonitoring-StepFunctions-Policy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              # CloudWatch Logs permissions for Step Functions logging
              # Reference: https://docs.aws.amazon.com/step-functions/latest/dg/cw-logs.html
              # Note: Resource must be "*" as these API actions don't support specific resource ARNs
              - Effect: Allow
                Action:
                  - logs:CreateLogDelivery
                  - logs:CreateLogStream
                  - logs:GetLogDelivery
                  - logs:UpdateLogDelivery
                  - logs:DeleteLogDelivery
                  - logs:ListLogDeliveries
                  - logs:PutLogEvents
                  - logs:PutResourcePolicy
                  - logs:DescribeResourcePolicies
                  - logs:DescribeLogGroups
                Resource: '*'
              - Effect: Allow
                Action:
                  - states:InvokeHTTPEndpoint
                Resource: '*'
                Condition:
                  StringEquals:
                    'states:HTTPEndpoint': !Ref SlackWebhookUrl
              - Effect: Allow
                Action:
                  - events:RetrieveConnectionCredentials
                Resource: !GetAtt SlackWebhookConnection.Arn
              - Effect: Allow
                Action:
                  - secretsmanager:GetSecretValue
                  - secretsmanager:DescribeSecret
                Resource: !GetAtt SlackWebhookConnection.SecretArn
  # IAM Role for EventBridge
  EventBridgeServiceRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: ServiceQuotaMonitoring-EventBridge-Role
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service: events.amazonaws.com
            Action: sts:AssumeRole
            Condition:
              StringEquals:
                aws:SourceArn: !Sub 'arn:aws:events:${AWS::Region}:${AWS::AccountId}:rule/ServiceQuotaMonitoring-Alert'
                aws:SourceAccount: !Ref AWS::AccountId
      Policies:
        - PolicyName: ServiceQuotaMonitoring-EventBridge-Policy
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - states:StartExecution
                Resource: !Ref SlackNotificationStateMachine
  # EventBridge Connection for Slack webhook
  SlackWebhookConnection:
    Type: AWS::Events::Connection
    Properties:
      Name: ServiceQuotaMonitoring-SlackWebhook
      Description: 'Connection for Slack webhook'
      AuthorizationType: API_KEY
      AuthParameters:
        ApiKeyAuthParameters:
          ApiKeyName: 'Authorization'
          ApiKeyValue: 'Bearer dummy'
  # CloudWatch Log Group for Step Functions
  StepFunctionsLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName: '/aws/stepfunctions/ServiceQuotaMonitoring-SlackNotification'
      RetentionInDays: 14
  # Step Functions State Machine
  SlackNotificationStateMachine:
    Type: AWS::StepFunctions::StateMachine
    Properties:
      StateMachineName: ServiceQuotaMonitoring-SlackNotification
      RoleArn: !GetAtt StepFunctionsServiceRole.Arn
      LoggingConfiguration:
        Level: ALL
        IncludeExecutionData: true
        Destinations:
          - CloudWatchLogsLogGroup:
              LogGroupArn: !GetAtt StepFunctionsLogGroup.Arn
      DefinitionString: !Sub |
        {
          "Comment": "Send service quota alerts to Slack",
          "StartAt": "SendSlackNotification",
          "States": {
            "SendSlackNotification": {
              "Type": "Task",
              "Resource": "arn:aws:states:::http:invoke",
              "Parameters": {
                "ApiEndpoint": "${SlackWebhookUrl}",
                "Method": "POST",
                "Headers": {
                  "Content-Type": "application/json"
                },
                "RequestBody": {
                  "text.$": "States.Format('*Trusted Advisor Service Quota*\n\n*Check*: {}\n*Status*: {}\n*Account*: {}\n\n*Console Link*: https://{}.signin.aws.amazon.com/console/trustedadvisor?region={}', $.checkname, $.status, $.account, $.account, $.region)"
                },
                "InvocationConfig": {
                  "ConnectionArn": "${SlackWebhookConnection.Arn}"
                }
              },
              "End": true,
              "Retry": [
                {
                  "ErrorEquals": ["States.Http.HttpRequestFailed"],
                  "IntervalSeconds": 5,
                  "MaxAttempts": 3,
                  "BackoffRate": 2.0
                }
              ],
              "Catch": [
                {
                  "ErrorEquals": ["States.ALL"],
                  "Next": "NotificationFailed"
                }
              ]
            },
            "NotificationFailed": {
              "Type": "Fail",
              "Cause": "Failed to send Slack notification"
            }
          }
        }
  # EventBridge Rule for Service Quota Monitoring
  ServiceQuotaMonitoringRule:
    Type: AWS::Events::Rule
    Properties:
      Name: ServiceQuotaMonitoring-Alert
      Description: 'Monitor AWS service quotas via Trusted Advisor alerts'
      EventPattern:
        source:
          - aws.trustedadvisor
          # - custom.testing # For testing: uncomment only while testing
        detail-type:
          - Trusted Advisor Check Item Refresh Notification
        detail:
          status:
            - ERROR
            - WARN
          check-name:
            - VPC
            - VPC Internet Gateways
            - EC2-VPC Elastic IP Address
            - Underutilized Amazon EBS Volumes
            - Amazon Route 53 Latency Resource Record Sets
            - Amazon EC2 Reserved Instances Optimization
            - Low Utilization Amazon EC2 Instances
      State: ENABLED
      Targets:
        - Arn: !Ref SlackNotificationStateMachine
          Id: ServiceQuotaMonitoringStepFunctionsTarget
          RoleArn: !GetAtt EventBridgeServiceRole.Arn
          InputTransformer:
            InputPathsMap:
              account: $.account
              checkname: $.detail.check-name
              status: $.detail.status
              region: $.region
            InputTemplate: |
              {
                "account": "<account>",
                "checkname": "<checkname>",
                "status": "<status>",
                "region": "<region>"
              }
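To make the data flow in the template easier to follow, here is a minimal Python sketch of what the InputTransformer and the States.Format call produce for a single event. It only mimics those two steps locally and does not call AWS or Slack. The helper names `transform_input` and `render_slack_text` are hypothetical, and the sample event values are made up for illustration.

```python
# Local illustration only: mimics the InputTransformer and States.Format steps
# from the template. Function names and the sample event are hypothetical.

SLACK_TEMPLATE = (
    "*Trusted Advisor Service Quota*\n\n"
    "*Check*: {}\n*Status*: {}\n*Account*: {}\n\n"
    "*Console Link*: https://{}.signin.aws.amazon.com/console/trustedadvisor?region={}"
)


def transform_input(event: dict) -> dict:
    """Pick the same four fields as InputPathsMap in the EventBridge target."""
    return {
        "account": event["account"],
        "checkname": event["detail"]["check-name"],
        "status": event["detail"]["status"],
        "region": event["region"],
    }


def render_slack_text(state_input: dict) -> str:
    """Fill the placeholders in the same order as the States.Format arguments."""
    return SLACK_TEMPLATE.format(
        state_input["checkname"],
        state_input["status"],
        state_input["account"],
        state_input["account"],
        state_input["region"],
    )


# Made-up event carrying only the fields the template reads.
sample_event = {
    "account": "123456789012",
    "region": "us-east-1",
    "detail": {"check-name": "VPC", "status": "WARN"},
}

message = render_slack_text(transform_input(sample_event))
print(message)
```

The account ID appears twice in the argument list because it is used both in the message body and in the console sign-in URL.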
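Deploying via a CloudFormation stack set
Assuming Terraform is your main tool, the stack set is created with Terraform.
Replace this with creation via the Management Console as needed.
Because the following is for verification only, no remote backend is configured for the state file. For production use, I recommend adding a backend configuration that suits your environment.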
terraform {
  required_version = ">= 1.13.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 6.19.0"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

variable "aws_ou_id" {
  description = "The ID of the OU to deploy the stack set to"
  type        = string
}

variable "slack_webhook_url" {
  description = "The Slack webhook URL for notifications"
  type        = string
  sensitive   = true
}

resource "aws_cloudformation_stack_set" "this" {
  name             = "ServiceQuotaMonitoring"
  permission_model = "SERVICE_MANAGED"

  auto_deployment {
    enabled                          = true
    retain_stacks_on_account_removal = false
  }

  parameters = {
    SlackWebhookUrl = var.slack_webhook_url
  }

  template_body = file("${path.module}/files/service-quota-monitoring.yaml")
  capabilities  = ["CAPABILITY_NAMED_IAM"]
}

resource "aws_cloudformation_stack_set_instance" "this" {
  stack_set_name = aws_cloudformation_stack_set.this.name

  deployment_targets {
    organizational_unit_ids = [var.aws_ou_id]
  }
}
Set the target OU ID and the Slack webhook URL in terraform.tfvars.
aws_ou_id = "ou-XXXXXX"
slack_webhook_url = "https://hooks.slack.com/services/XXXXXX/XXXXXX/XXXXXX"
Once everything is ready, run Terraform to create the resources.
terraform init
terraform plan
terraform apply
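Verification
Now let's verify that it works.
Creating more resources than a service quota allows is one way to check, but this time I create custom events with the AWS CLI and confirm that the notifications arrive in Slack.
For the test, uncomment the custom.testing entry under ServiceQuotaMonitoringRule.Properties.EventPattern.source in the template.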
   ServiceQuotaMonitoringRule:
     Type: AWS::Events::Rule
     Properties:
       Name: ServiceQuotaMonitoring-Alert
       Description: 'Monitor AWS service quotas via Trusted Advisor alerts'
       EventPattern:
         source:
           - aws.trustedadvisor
-          # - custom.testing # For testing: uncomment only while testing
+          - custom.testing # For testing: uncomment only while testing
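The source aws.trustedadvisor can only be used by events that AWS itself generates, so a custom source is defined for testing.
Next, prepare a file of test events.
I defined four events.
The rule is configured to notify on ERROR and WARN events.
To confirm that no notification is sent for a status: OK event, I also included Amazon EC2 Reserved Instances Optimization as a status: OK event.
If everything works, three events are notified to Slack.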
[
{
"Source": "custom.testing",
"DetailType": "Trusted Advisor Check Item Refresh Notification",
"Detail": "{\"check-name\":\"Amazon Route 53 Latency Resource Record Sets\",\"check-item-detail\":{\"Hosted Zone Name\":\"latency.blahblahblah.com.\",\"Hosted Zone ID\":\"Z3NGWQ075IS9UX\",\"Resource Record Set Type\":\"A\",\"Resource Record Set Name\":\"latency.latency.blahblahblah.com.\"},\"status\":\"WARN\",\"resource_id\":\"arn:aws:route53:::hostedzone/Z3NGWQ075IS9UX\",\"uuid\":\"c66e03e1-64e2-4eb4-b9b7-5081972afeb6\"}",
"Time": "2016-11-13T13:31:34Z",
"Resources": []
},
{
"Source": "custom.testing",
"DetailType": "Trusted Advisor Check Item Refresh Notification",
"Detail": "{\"check-name\":\"Underutilized Amazon EBS Volumes\",\"check-item-detail\":{\"Volume Type\":\"General purpose(SSD)\",\"Volume ID\":\"vol-69a6546e\",\"Volume Size\":\"1024\",\"Snapshot Name\":null,\"Region\":\"ap-southeast-2\",\"Snapshot ID\":null,\"Monthly Storage Cost\":\"$122.88\",\"Volume Name\":null,\"Snapshot Age\":null},\"status\":\"WARN\",\"resource_id\":\"arn:aws:ec2:ap-southeast-2:123456789012:volume/vol-69a6546e\",\"uuid\":\"28945e59-c856-440f-979c-1c12d6999615\"}",
"Time": "2016-11-13T13:31:35Z",
"Resources": []
},
{
"Source": "custom.testing",
"DetailType": "Trusted Advisor Check Item Refresh Notification",
"Detail": "{\"check-name\":\"Amazon EC2 Reserved Instances Optimization\",\"check-item-detail\":{\"Recommended Additional 1-Year RIs\":\"+1\",\"Estimated Bill \\n(Current RIs)\":\"$126.00\",\"Current RIs \\n(1-Year and 3-Year)\":\"0\",\"Upfront Cost\":\"$750.00\",\"Region / AZ\":\"us-west-2a\",\"Instance Type\":\"m1.large\",\"Estimated Bill \\n(Optimized RIs)\":\"$46.13\",\"Operating System\":\"Linux/Unix\",\"Hourly Instance Usage Max/Average/Min\":\"1/0/0\",\"Estimated Monthly Savings\":\"$79.87\",\"Recommended Additional 3-Year RIs\":\"+1\"},\"status\":\"OK\",\"resource_id\":\"\",\"uuid\":\"7952e954-4408-4088-bdea-f9b17d9d1305\"}",
"Time": "2016-11-13T13:31:36Z",
"Resources": []
},
{
"Source": "custom.testing",
"DetailType": "Trusted Advisor Check Item Refresh Notification",
"Detail": "{\"check-name\":\"Low Utilization Amazon EC2 Instances\",\"check-item-detail\":{\"Day 1\":\"0.0% 0.00MB\",\"Day 2\":\"0.0% 0.00MB\",\"Day 3\":\"0.0% 0.00MB\",\"Region/AZ\":\"eu-central-1a\",\"Estimated Monthly Savings\":\"$10.80\",\"14-Day Average CPU Utilization\":\"0.0%\",\"Day 14\":\"0.0% 0.00MB\",\"Day 13\":\"0.0% 0.00MB\",\"Day 12\":\"0.0% 0.00MB\",\"Day 11\":\"0.0% 0.00MB\",\"Day 10\":\"0.0% 0.00MB\",\"14-Day Average Network I/O\":\"0.00MB\",\"Number of Days Low Utilization\":\"14 days\",\"Instance Type\":\"t2.micro\",\"Instance ID\":\"i-917b1a5f\",\"Day 8\":\"0.0% 0.00MB\",\"Instance Name\":null,\"Day 9\":\"0.0% 0.00MB\",\"Day 4\":\"0.0% 0.00MB\",\"Day 5\":\"0.0% 0.00MB\",\"Day 6\":\"0.0% 0.00MB\",\"Day 7\":\"0.0% 0.00MB\"},\"status\":\"WARN\",\"resource_id\":\"arn:aws:ec2:eu-central-1:123456789012:instance/i-917b1a5f\",\"uuid\":\"6ba6d96a-d3dd-4fca-8020-350bbee4719c\"}",
"Time": "2016-11-13T13:31:37Z",
"Resources": []
}
]
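The expected result (three notifications out of four entries) can be checked locally before sending anything. The following is a rough Python sketch that applies only the exact-value parts of the rule's EventPattern. It is not a reimplementation of EventBridge matching. `matches_rule` and `make_entry` are hypothetical helpers, the entries are trimmed to the two detail fields the pattern inspects, and `custom.testing` is assumed to be enabled as a source.

```python
import json

# Values copied from the EventPattern in the template, with custom.testing
# assumed to be uncommented for the test. Only exact-value matching is modeled.
ALLOWED_SOURCES = {"aws.trustedadvisor", "custom.testing"}
ALLOWED_DETAIL_TYPES = {"Trusted Advisor Check Item Refresh Notification"}
ALLOWED_STATUSES = {"ERROR", "WARN"}
ALLOWED_CHECK_NAMES = {
    "VPC",
    "VPC Internet Gateways",
    "EC2-VPC Elastic IP Address",
    "Underutilized Amazon EBS Volumes",
    "Amazon Route 53 Latency Resource Record Sets",
    "Amazon EC2 Reserved Instances Optimization",
    "Low Utilization Amazon EC2 Instances",
}


def matches_rule(entry: dict) -> bool:
    """Return True if a put-events entry satisfies every field of the pattern."""
    detail = json.loads(entry["Detail"])  # Detail is a JSON string in put-events entries
    return (
        entry["Source"] in ALLOWED_SOURCES
        and entry["DetailType"] in ALLOWED_DETAIL_TYPES
        and detail.get("status") in ALLOWED_STATUSES
        and detail.get("check-name") in ALLOWED_CHECK_NAMES
    )


def make_entry(check_name: str, status: str) -> dict:
    """Build a trimmed entry with only the fields the pattern looks at."""
    return {
        "Source": "custom.testing",
        "DetailType": "Trusted Advisor Check Item Refresh Notification",
        "Detail": json.dumps({"check-name": check_name, "status": status}),
    }


entries = [
    make_entry("Amazon Route 53 Latency Resource Record Sets", "WARN"),
    make_entry("Underutilized Amazon EBS Volumes", "WARN"),
    make_entry("Amazon EC2 Reserved Instances Optimization", "OK"),
    make_entry("Low Utilization Amazon EC2 Instances", "WARN"),
]

notified = [json.loads(e["Detail"])["check-name"] for e in entries if matches_rule(e)]
print(len(notified), notified)
```

The Reserved Instances entry is filtered out by its status alone, even though its check name is on the list, which is the behavior the OK event is meant to confirm.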
The test event contents are taken from the sample events available in the Management Console under EventBridge.

In an account where the stack was deployed via the stack set, run the following command to create the events.
aws events put-events \
  --entries file://test-event.json \
  --region us-east-1
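Before running a command like this, it can help to sanity-check the entries file locally. The sketch below is a hypothetical helper, not part of the AWS CLI. `validate_entries` checks only three things: that there are between 1 and 10 entries (the PutEvents per-request limit), that each Detail string parses as a JSON object, and that no Source starts with `aws.`. The last check generalizes the remark above about `aws.trustedadvisor` and should be treated as an assumption.

```python
import json

MAX_ENTRIES_PER_CALL = 10  # PutEvents accepts at most 10 entries per request


def validate_entries(entries: list) -> list:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if not 1 <= len(entries) <= MAX_ENTRIES_PER_CALL:
        problems.append(
            f"expected 1-{MAX_ENTRIES_PER_CALL} entries, got {len(entries)}"
        )
    for index, entry in enumerate(entries):
        # Assumption: sources with the aws. prefix are reserved for AWS services.
        if entry.get("Source", "").startswith("aws."):
            problems.append(f"entry {index}: Source must not start with 'aws.'")
        try:
            detail = json.loads(entry.get("Detail", ""))
        except json.JSONDecodeError:
            problems.append(f"entry {index}: Detail is not valid JSON")
            continue
        if not isinstance(detail, dict):
            problems.append(f"entry {index}: Detail must be a JSON object")
    return problems


# Two tiny made-up inputs: one clean, one with two problems.
good = [{"Source": "custom.testing", "DetailType": "t", "Detail": "{\"status\": \"WARN\"}"}]
bad = [{"Source": "aws.trustedadvisor", "DetailType": "t", "Detail": "not json"}]

print(validate_entries(good))
print(validate_entries(bad))
```

To check the real file, the list could be loaded with `json.load` from test-event.json and passed to the same function.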
Checking Slack, the three events were notified as expected.

Cleanup
If the environment is no longer needed, delete the resources with Terraform.
terraform destroy






