I tried Multi-contributor Alarm with the BY clause in CloudWatch Log Based Alarm

2026.07.02

Introduction

On July 1, 2026, AWS published a What's New post and the official documentation for CloudWatch Log Based Alarm.

https://aws.amazon.com/jp/about-aws/whats-new/2026/07/amazon-cloudwatch-log-alarms/

In a previous article, we verified the basic functionality of the PutLogAlarm API by referencing the CLI documentation.

https://dev.classmethod.jp/articles/cloudwatch-log-based-alarm-putlogalarm-try/

With this documentation, the specification for Multi-contributor Alarm (independent evaluation per group using the BY clause), which the previous article did not verify, is now formally documented. This article focuses on verifying that feature.

| Item | Single-value Alarm (previous article) | Multi-contributor Alarm (this article) |
| --- | --- | --- |
| AggregationExpression | count(*), avg(duration), etc. | count(*) by svc, avg(duration) by endpoint, etc. |
| Evaluation unit | A single value for the entire query | Independent evaluation per BY clause group (contributor) |
| Threshold evaluation | Overall aggregated value vs. threshold | Each contributor's value vs. threshold |
| SNS notification | One notification for the entire alarm | Individual notification per contributor that meets the threshold condition |
| Information included in notification | AlarmName, NewStateReason | AlarmName, NewStateReason, plus AlarmContributorId and AlarmContributorAttributes |

What is Multi-contributor Alarm

When a BY clause is added to AggregationExpression, the query results are grouped, and each group is evaluated against the threshold independently as a "contributor." For example, with count(*) by serviceName, a count is calculated for each serviceName value. Each contributor that meets the threshold condition is evaluated as ALARM, and a notification is fired for each such contributor.
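The grouping and per-contributor evaluation described above can be pictured with a small local simulation. This is only a sketch of the idea, not how CloudWatch implements it; the sample events, the threshold, and the function name evaluate_contributors are assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample: one entry per log event that matched the query.
events = [
    {"serviceName": "auth-service"},
    {"serviceName": "auth-service"},
    {"serviceName": "auth-service"},
    {"serviceName": "order-service"},
]

THRESHOLD = 2  # illustrative value


def evaluate_contributors(events, by_field, threshold):
    """Mimic count(*) by <by_field> with GreaterThanOrEqualToThreshold.

    Returns {contributor: (count, state)}; each group is judged on its own.
    """
    counts = Counter(event[by_field] for event in events)
    return {
        key: (count, "ALARM" if count >= threshold else "OK")
        for key, count in counts.items()
    }


result = evaluate_contributors(events, "serviceName", THRESHOLD)
for name, (value, state) in sorted(result.items()):
    print(f"{name}: count={value} -> {state}")
```

In this sample, auth-service reaches the threshold while order-service does not, so only auth-service would be treated as an ALARM contributor.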

The constraints listed in the official documentation are as follows:

  • Number of fields that can be specified in the BY clause: Maximum 5
  • Number of contributors per query: Maximum 500
  • Number of contributors tracked by the alarm: Maximum 100

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/alarm-log.html

Verification: Creating and Confirming Behavior of Multi-contributor Alarm

For verification, structured logs in the following JSON format were ingested into a log group.

{"level":"error","serviceName":"auth-service","environment":"prod","endpoint":"/api/login","duration":1500,"msg":"connection refused"}
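For anyone reproducing the verification, test logs in this format could be prepared along the following lines. This is a sketch under assumptions: the helper name build_log_events and the second record are made up for illustration, and the output only mirrors the {timestamp, message} shape that the CloudWatch Logs PutLogEvents API accepts. The actual ingestion call is left out.

```python
import json
import time

# The first record is the sample shown above; the second is a hypothetical extra.
records = [
    {"level": "error", "serviceName": "auth-service", "environment": "prod",
     "endpoint": "/api/login", "duration": 1500, "msg": "connection refused"},
    {"level": "error", "serviceName": "payment-service", "environment": "prod",
     "endpoint": "/api/charge", "duration": 2900, "msg": "timeout"},
]


def build_log_events(records, now_ms=None):
    """Convert dict records into a list of {timestamp, message} log events.

    Timestamps are in milliseconds and strictly increasing, so the events
    are already in chronological order.
    """
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return [
        {"timestamp": now_ms + i,
         "message": json.dumps(record, separators=(",", ":"))}
        for i, record in enumerate(records)
    ]


log_events = build_log_events(records)
print(log_events[0]["message"])
```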

Multi-contributor Alarms were created with three patterns of AggregationExpression against these logs.

Error count by service (basic BY clause form)

An alarm was created to count errors per serviceName and trigger ALARM when there are 2 or more.

aws cloudwatch put-log-alarm \
  --alarm-name "multi-contributor-case1-error-by-service" \
  --scheduled-query-configuration '{
    "QueryString": "filter level = \"error\" | fields serviceName as svc",
    "LogGroupIdentifiers": ["/test/multi-contributor-alarm"],
    "ScheduledQueryRoleARN": "arn:aws:iam::123456789012:role/log-alarm-scheduled-query-role",
    "ScheduleConfiguration": {
      "ScheduleExpression": "rate(1 minute)",
      "StartTimeOffset": 360,
      "EndTimeOffset": 0
    },
    "AggregationExpression": "count(*) by svc"
  }' \
  --query-results-to-evaluate 1 \
  --query-results-to-alarm 1 \
  --threshold 2 \
  --comparison-operator "GreaterThanOrEqualToThreshold" \
  --alarm-actions "arn:aws:sns:ap-northeast-1:123456789012:my-topic"

Here is the result of describe-alarms.

{
  "AlarmName": "multi-contributor-case1-error-by-service",
  "StateValue": "ALARM",
  "StateReason": "4 out of 4 contributors evaluated to ALARM",
  "ScheduledQueryConfiguration": {
    "AggregationExpression": "count(*) by svc"
  }
}

The StateReason takes the format "4 out of 4 contributors evaluated to ALARM," which shows the number of contributors in ALARM state.

The evaluation result for each contributor is as follows.

| svc | count(*) | Evaluation |
| --- | --- | --- |
| auth-service | 3.0 | ALARM |
| payment-service | 3.0 | ALARM |
| notification-service | 2.0 | ALARM |
| order-service | 2.0 | ALARM |

All 4 contributors met the threshold (2 or more), so the alarm transitioned to ALARM. In this verification, logs for 4 types of serviceName were ingested, all of which contained 2 or more errors.

SNS notifications were fired individually per contributor. The structure of the notification message received via SQS is as follows.

{
  "AlarmName": "multi-contributor-case1-error-by-service",
  "NewStateValue": "ALARM",
  "NewStateReason": "Threshold Crossed: 1 out of the last 1 query results [3.0 (02/07/26 00:54:47)] was greater than or equal to the threshold (2.0)",
  "AlarmContributorId": "79e1d0e2f3955827",
  "AlarmContributorAttributes": {
    "svc": "payment-service"
  },
  "LogGroups": ["/test/multi-contributor-alarm"],
  "QueryString": "filter level = \"error\" | fields serviceName as svc",
  "AggregationExpression": "count(*) by svc",
  "QueryExecutionId": "4015767e-3a01-4c88-b245-062cd6ffb129"
}

  • AlarmContributorId: An ID that identifies the contributor (in this verification, a 16-character hexadecimal string was observed)
  • AlarmContributorAttributes: The values of the fields specified in the BY clause are stored as a map
  • NewStateReason: Contains the actual measured value (3.0) and threshold (2.0) for that contributor
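A consumer of these notifications can use the fields above to tell which contributor fired. The following is a minimal sketch, assuming the alarm message has already been extracted as a JSON string (an SNS-to-SQS delivery without raw message delivery wraps it in an envelope that has to be unwrapped first). The function name summarize is hypothetical, and the message is an abbreviated copy of the one shown above.

```python
import json

# Abbreviated copy of the notification shown above (QueryString etc. omitted).
message_json = """
{
  "AlarmName": "multi-contributor-case1-error-by-service",
  "NewStateValue": "ALARM",
  "NewStateReason": "Threshold Crossed: 1 out of the last 1 query results [3.0 (02/07/26 00:54:47)] was greater than or equal to the threshold (2.0)",
  "AlarmContributorId": "79e1d0e2f3955827",
  "AlarmContributorAttributes": {"svc": "payment-service"}
}
"""


def summarize(message):
    """Build a one-line summary naming the contributor behind a notification."""
    attrs = message.get("AlarmContributorAttributes", {})
    labels = ", ".join(f"{key}={value}" for key, value in sorted(attrs.items()))
    return (
        f"{message['AlarmName']} [{labels}] "
        f"(contributor {message.get('AlarmContributorId', 'n/a')}) "
        f"-> {message['NewStateValue']}"
    )


print(summarize(json.loads(message_json)))
```

Using .get for the contributor fields means the same function also works for a Single-value Alarm notification, which would not carry them.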

Latency by endpoint (avg + sort desc)

An alarm was created using avg(duration) by endpoint | sort desc to evaluate average latency per endpoint in descending order.

aws cloudwatch put-log-alarm \
  --alarm-name "multi-contributor-case2-latency-by-endpoint" \
  --scheduled-query-configuration '{
    "QueryString": "fields duration, endpoint",
    "LogGroupIdentifiers": ["/test/multi-contributor-alarm"],
    "ScheduledQueryRoleARN": "arn:aws:iam::123456789012:role/log-alarm-scheduled-query-role",
    "ScheduleConfiguration": {
      "ScheduleExpression": "rate(1 minute)",
      "StartTimeOffset": 360,
      "EndTimeOffset": 0
    },
    "AggregationExpression": "avg(duration) by endpoint | sort desc"
  }' \
  --query-results-to-evaluate 1 \
  --query-results-to-alarm 1 \
  --threshold 1000 \
  --comparison-operator "GreaterThanOrEqualToThreshold" \
  --alarm-actions "arn:aws:sns:ap-northeast-1:123456789012:my-topic"

Of the 6 endpoints, 5 with an average latency of 1000ms or more transitioned to ALARM.

The evaluation result for each contributor is as follows. Since only AlarmActions was configured this time, the values for ALARM contributors were confirmed via SNS notification, and the OK result for /api/refund was obtained from the query results.

| endpoint | avg(duration) | Evaluation |
| --- | --- | --- |
| /api/send | 4750.0 | ALARM |
| /api/charge | 2900.0 | ALARM |
| /api/token | 1800.0 | ALARM |
| /api/login | 1750.0 | ALARM |
| /api/orders | 1150.0 | ALARM |
| /api/refund | 500.0 | OK |

When sort desc is specified, contributors are ordered from largest to smallest value in the query results. In cases like this where the number of contributors is within the limit, each contributor is independently evaluated against the threshold, and the presence or absence of sort did not affect the ALARM/OK evaluation results.
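The observation that sorting does not change the outcome can be checked locally against the values in the table. This is a sketch that only models the threshold comparison; the function name evaluate is an assumption, and it does not model what happens when the number of contributors exceeds the limits.

```python
# Average latency per endpoint, taken from the table above.
averages = {
    "/api/login": 1750.0,
    "/api/send": 4750.0,
    "/api/refund": 500.0,
    "/api/charge": 2900.0,
    "/api/orders": 1150.0,
    "/api/token": 1800.0,
}
THRESHOLD = 1000


def evaluate(rows, threshold):
    """Evaluate each (endpoint, value) row independently against the threshold."""
    return {
        endpoint: "ALARM" if value >= threshold else "OK"
        for endpoint, value in rows
    }


as_returned = evaluate(averages.items(), THRESHOLD)
sorted_desc = evaluate(
    sorted(averages.items(), key=lambda row: row[1], reverse=True), THRESHOLD
)

print(as_returned == sorted_desc)  # same verdict per endpoint either way
print(sum(state == "ALARM" for state in sorted_desc.values()), "of", len(sorted_desc))
```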

Multiple fields in BY clause (serviceName, environment)

The contributor key structure when multiple fields are specified in the BY clause was verified.

aws cloudwatch put-log-alarm \
  --alarm-name "multi-contributor-case3-by-service-env" \
  --scheduled-query-configuration '{
    "QueryString": "filter level = \"error\" | fields serviceName, environment",
    "LogGroupIdentifiers": ["/test/multi-contributor-alarm"],
    "ScheduledQueryRoleARN": "arn:aws:iam::123456789012:role/log-alarm-scheduled-query-role",
    "ScheduleConfiguration": {
      "ScheduleExpression": "rate(1 minute)",
      "StartTimeOffset": 360,
      "EndTimeOffset": 0
    },
    "AggregationExpression": "count(*) by serviceName, environment"
  }' \
  --query-results-to-evaluate 1 \
  --query-results-to-alarm 1 \
  --threshold 2 \
  --comparison-operator "GreaterThanOrEqualToThreshold" \
  --alarm-actions "arn:aws:sns:ap-northeast-1:123456789012:my-topic"

The result was "4 out of 5 contributors evaluated to ALARM." There were 5 combinations of serviceName and environment, of which 4 had 2 or more errors.

Checking AlarmContributorAttributes in the SNS notification, multiple fields are stored as a map.

{
  "AlarmContributorId": "07416b2411a9b6ab",
  "AlarmContributorAttributes": {
    "serviceName": "auth-service",
    "environment": "prod"
  }
}

When multiple fields are specified in the BY clause, each combination of field values becomes one contributor. This allows "auth-service in prod environment" and "auth-service in staging environment" to be independently evaluated as separate contributors.
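How a combination of field values forms one contributor can be sketched as follows. The events here are hypothetical and are not the data set used in the verification; the names BY_FIELDS and contributors are assumptions for illustration.

```python
from collections import Counter

# Hypothetical error events; NOT the data set used in the verification.
events = [
    {"serviceName": "auth-service", "environment": "prod"},
    {"serviceName": "auth-service", "environment": "prod"},
    {"serviceName": "auth-service", "environment": "staging"},
    {"serviceName": "payment-service", "environment": "prod"},
    {"serviceName": "payment-service", "environment": "prod"},
]
BY_FIELDS = ("serviceName", "environment")
THRESHOLD = 2

# Each distinct tuple of BY-field values is one contributor key.
counts = Counter(tuple(event[field] for field in BY_FIELDS) for event in events)

contributors = [
    {
        # Same shape as AlarmContributorAttributes: field name -> value.
        "attributes": dict(zip(BY_FIELDS, key)),
        "value": count,
        "state": "ALARM" if count >= THRESHOLD else "OK",
    }
    for key, count in sorted(counts.items())
]

in_alarm = sum(1 for c in contributors if c["state"] == "ALARM")
print(f"{in_alarm} out of {len(contributors)} contributors evaluated to ALARM")
```

With this sample, auth-service in prod and auth-service in staging end up as two separate contributors with different verdicts.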

Verification: Creating with CloudFormation

A Multi-contributor Alarm was created using the AWS::CloudWatch::LogAlarm resource type.

AWSTemplateFormatVersion: '2010-09-09'
Description: Multi-contributor Log Alarm via CloudFormation

Resources:
  MultiContributorLogAlarm:
    Type: AWS::CloudWatch::LogAlarm
    Properties:
      AlarmName: multi-contributor-cfn-case4
      AlarmDescription: "Multi-contributor Alarm created with CloudFormation"
      ScheduledQueryConfiguration:
        QueryString: 'filter level = "error" | fields serviceName as svc'
        LogGroupIdentifiers:
          - /test/multi-contributor-alarm
        ScheduledQueryRoleARN: arn:aws:iam::123456789012:role/log-alarm-scheduled-query-role
        ScheduleConfiguration:
          ScheduleExpression: rate(1 minute)
          StartTimeOffset: 360
          EndTimeOffset: 0
        AggregationExpression: "count(*) by svc"
      QueryResultsToEvaluate: 1
      QueryResultsToAlarm: 1
      Threshold: 2
      ComparisonOperator: GreaterThanOrEqualToThreshold
      AlarmActions:
        - arn:aws:sns:ap-northeast-1:123456789012:my-topic
      ActionsEnabled: true

Outputs:
  AlarmArn:
    Value: !GetAtt MultiContributorLogAlarm.Arn

After deployment, checking with describe-alarms confirmed it behaved the same as when created via CLI, transitioning to "3 out of 3 contributors evaluated to ALARM."

Supplementary: Information Formally Documented in Official Documentation

Log Based Alarm uses the following IAM roles, one for each purpose.

  • ScheduledQueryRoleARN: Used for executing scheduled queries. Requires permissions for logs:StartQuery, logs:StopQuery, and logs:GetQueryResults
  • ActionLogLineRoleArn: Used to retrieve log lines to include in notifications when ActionLogLineCount is configured. Requires permission for logs:GetLogEvents

It is recommended to restrict permissions to the minimum necessary, limiting them to only the target log group ARN.
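A permissions policy for the scheduled query role might look like the sketch below, built as a Python dict for readability. Several points are assumptions and should be checked against the Service Authorization Reference before use: the log group ARN form with the trailing ":*", and the split into two statements, which reflects our understanding that logs:StopQuery and logs:GetQueryResults do not accept resource-level scoping while logs:StartQuery does. The trust policy for the role is not covered here.

```python
import json

REGION = "ap-northeast-1"
ACCOUNT_ID = "123456789012"
LOG_GROUP = "/test/multi-contributor-alarm"

# Assumption: log group ARN with the ":*" suffix; verify the form you need.
log_group_arn = f"arn:aws:logs:{REGION}:{ACCOUNT_ID}:log-group:{LOG_GROUP}:*"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Scoped to the single log group the alarm queries.
            "Sid": "StartQueryOnTargetLogGroup",
            "Effect": "Allow",
            "Action": "logs:StartQuery",
            "Resource": log_group_arn,
        },
        {
            # Assumption: these actions cannot be limited to a log group ARN.
            "Sid": "QueryLifecycle",
            "Effect": "Allow",
            "Action": ["logs:StopQuery", "logs:GetQueryResults"],
            "Resource": "*",
        },
    ],
}

print(json.dumps(policy, indent=2))
```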

TreatMissingData Behavior

The behavior when query results are empty (no contributors returned) is controlled by TreatMissingData. The results of verifying four options against an empty log group (/test/multi-contributor-alarm-empty) are as follows.

| Option | Alarm state | Use case |
| --- | --- | --- |
| missing (default) | INSUFFICIENT_DATA | When you want to explicitly detect insufficient data |
| notBreaching | OK | When no logs = considered normal |
| breaching | ALARM | When the absence of logs is itself abnormal (e.g., missing health check logs) |
| ignore | Maintains previous state | When you don't want the alert state to change due to temporary data loss |

Since the setting value is included in StateReason, it is clear why the alarm entered that state.

Note that in this verification, an empty log group was used to confirm the behavior when "the entire query result is empty." The handling of cases where "only logs for a specific contributor are absent" when using the BY clause is outside the scope of this verification.
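The four outcomes observed for an empty query result can be summarized as a small lookup. This is only a restatement of the table as code, covering the "entire query result is empty" case verified here; the function name state_for_empty_result is hypothetical.

```python
def state_for_empty_result(treat_missing_data, previous_state):
    """Return the alarm state observed when the query returns no contributors.

    treat_missing_data: "missing", "notBreaching", "breaching", or "ignore".
    previous_state: the state the alarm was in before this evaluation.
    """
    fixed_states = {
        "missing": "INSUFFICIENT_DATA",
        "notBreaching": "OK",
        "breaching": "ALARM",
    }
    if treat_missing_data == "ignore":
        return previous_state  # the alarm keeps whatever state it had
    if treat_missing_data not in fixed_states:
        raise ValueError(f"unknown TreatMissingData value: {treat_missing_data}")
    return fixed_states[treat_missing_data]


for option in ("missing", "notBreaching", "breaching", "ignore"):
    print(option, "->", state_for_empty_result(option, previous_state="ALARM"))
```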

EvaluationState

When the scheduled query execution itself fails, rather than a normal state transition, the EvaluationState field is set to EVALUATION_ERROR. This was verified by specifying an IAM role without logs:StartQuery permission as the ScheduledQueryRoleARN.

{
  "StateValue": "INSUFFICIENT_DATA",
  "EvaluationState": "EVALUATION_ERROR",
  "StateReason": "User with accountId: 123456789012 is not authorized to perform StartQuery on resources /test/multi-contributor-alarm."
}

If EvaluationState appears in the describe-alarms response with the value EVALUATION_ERROR, check the permissions of the IAM role configured in ScheduledQueryRoleARN.
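Checking for this condition across several alarms could be scripted roughly as follows. This sketch works on a plain list of alarm dicts with the fields shown above; how that list is obtained from the describe-alarms response is deliberately left out, and the alarm names and the function name find_evaluation_errors are hypothetical.

```python
def find_evaluation_errors(alarms):
    """Return (alarm name, reason) for alarms whose scheduled query failed."""
    return [
        (alarm.get("AlarmName", "(unnamed)"), alarm.get("StateReason", ""))
        for alarm in alarms
        if alarm.get("EvaluationState") == "EVALUATION_ERROR"
    ]


# Sample input: the failing alarm uses the values shown above; names are made up.
alarms = [
    {
        "AlarmName": "example-alarm-with-bad-role",
        "StateValue": "INSUFFICIENT_DATA",
        "EvaluationState": "EVALUATION_ERROR",
        "StateReason": "User with accountId: 123456789012 is not authorized "
                       "to perform StartQuery on resources /test/multi-contributor-alarm.",
    },
    {"AlarmName": "example-healthy-alarm", "StateValue": "OK"},
]

for name, reason in find_evaluation_errors(alarms):
    print(f"{name}: {reason}")
```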

Other Constraints

  • Concurrent execution limit for Scheduled Queries: 100 per account
  • EndTimeOffset: Set this when accounting for log ingestion delays. With 0 (default), logs that have not yet finished being ingested may be excluded from evaluation
  • M-out-of-N evaluation: With QueryResultsToAlarm (M) out of QueryResultsToEvaluate (N), the alarm transitions to ALARM when M out of the last N evaluations meet the threshold condition. For example, setting M=2, N=3 means the alarm triggers when 2 or more of the last 3 evaluations are breaching, which helps suppress false positives from temporary spikes
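The M-out-of-N rule in the last item can be expressed in a few lines. This is a sketch of the rule as described here, not the service's actual implementation; the function name should_alarm and the sample histories are assumptions.

```python
def should_alarm(breaching_results, m, n):
    """Apply M-out-of-N evaluation.

    breaching_results: booleans, oldest first; True means that evaluation
    met the threshold condition.
    m: QueryResultsToAlarm, n: QueryResultsToEvaluate.
    """
    if not 1 <= m <= n:
        raise ValueError("expected 1 <= m <= n")
    recent = breaching_results[-n:]  # only the last N evaluations count
    return sum(recent) >= m


# A single spike does not trigger with M=2, N=3 ...
print(should_alarm([False, True, False], m=2, n=3))
# ... but two breaches within the last three evaluations do.
print(should_alarm([True, False, True], m=2, n=3))
```

With M=1 and N=1, as used throughout this verification, every breaching evaluation triggers immediately.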

Summary

In the previous article, we confirmed that Log Based Alarm can evaluate CloudWatch Logs search results as a single aggregated value and create log-based alarms without metric filters.

With the Multi-contributor Alarm verified this time, adding a BY clause to AggregationExpression allows the evaluation unit to be extended from the entire query to the contributor level. By specifying something like count(*) by serviceName or avg(duration) by endpoint, threshold evaluation is performed independently per service name or endpoint. We confirmed that notifications are sent per contributor that meets the threshold condition.

This allows multiple services and multiple endpoints to be monitored comprehensively with a single Log Based Alarm. The target group can be confirmed from AlarmContributorAttributes included in the notification, and within the scope of this verification, it was also possible to create alarms the same way with CloudFormation.

With Multi-contributor Alarm, a common threshold is applied to all contributors within the same BY clause alarm. Therefore, a practical approach would be to consolidate groups that can be evaluated with the same threshold into a BY clause alarm, and define groups that require different thresholds as separate alarms.
