I tried creating an alarm from Logs Insights query results using CloudWatch's new feature "Log Based Alarm"

I tried out "Log Based Alarm", a new alarm type added to CloudWatch that turns Logs Insights query results into alarms. I verified how to create one via the PutLogAlarm API, notifications that include log lines (ActionLogLineCount), EventBridge integration, and how to implement ratio alarms using the `if`/`case` functions.
2026.06.30

Introduction

On June 29, 2026, a new alarm type called "Log Based Alarm" was added to CloudWatch. The AWS CLI v2 Changelog contains the following entry:

2.35.12

api-change:cloudwatch: This release adds the API (PutLogAlarm) to manage a new CloudWatch resource, Log Based Alarms. Log Based Alarms allows customers to alarm directly on CloudWatch Logs query results.

https://raw.githubusercontent.com/aws/aws-cli/v2/CHANGELOG.rst

The CLI reference for PutLogAlarm is published here:

https://docs.aws.amazon.com/cli/latest/reference/cloudwatch/put-log-alarm.html

Previously, alarms based on CloudWatch Logs content required creating custom metrics via metric filters. With Log Based Alarms, this is no longer necessary: results extracted and processed by a Logs Insights query are aggregated with AggregationExpression, and that value is used directly for alarm evaluation. The main differences from the traditional approach are as follows:

| Item | Traditional (Metric Filter + MetricAlarm) | New (Log Based Alarm) |
| --- | --- | --- |
| Setup steps | Create metric filter → Verify metric → Create alarm | Completed in 1 API call with PutLogAlarm |
| Query flexibility | Fixed pattern matching only | Filter and field processing with Logs Insights queries |
| Include log lines in notifications | Not possible | Up to 50 lines with ActionLogLineCount |
| Logs Insights link | None | Direct link to query results included in notification |
| Evaluation method | EvaluationPeriods / DatapointsToAlarm | QueryResultsToEvaluate / QueryResultsToAlarm |
| Chatbot support | Supported | Not supported (as of 2026-06-30) |
| Cost | Metric filter: Free / Custom metric: $0.30/month / Alarm: $0.10/month | Alarm: $0.10/month + Logs Insights query: $0.005/GB scanned |
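
To get a feel for the cost row, here is a rough Python sketch of the monthly arithmetic. It uses only the prices listed in the table. The 30-day month, the fixed amount of data scanned per run, and the function names are my own assumptions for illustration, so treat the result as an estimate rather than a bill.

```python
# Prices copied from the comparison table above; actual charges may differ by Region.
ALARM_USD_PER_MONTH = 0.10
CUSTOM_METRIC_USD_PER_MONTH = 0.30
QUERY_USD_PER_GB_SCANNED = 0.005


def traditional_monthly_cost() -> float:
    """Metric filter (free) + one custom metric + one alarm."""
    return CUSTOM_METRIC_USD_PER_MONTH + ALARM_USD_PER_MONTH


def log_alarm_monthly_cost(gb_scanned_per_run: float,
                           interval_minutes: int = 5,
                           days: int = 30) -> float:
    """One alarm + the Logs Insights scan charge for every scheduled run.

    gb_scanned_per_run is a hypothetical input: it depends on how much log
    data falls inside each query window.
    """
    runs = days * 24 * 60 // interval_minutes
    return ALARM_USD_PER_MONTH + runs * gb_scanned_per_run * QUERY_USD_PER_GB_SCANNED


if __name__ == "__main__":
    print(f"traditional:              ${traditional_monthly_cost():.2f}")
    print(f"log alarm at 0.01 GB/run: ${log_alarm_monthly_cost(0.01):.2f}")
```

With rate(5 minutes) there are 8,640 runs in 30 days, so the scan charge scales directly with how much data each window contains. For high-volume log groups this can outweigh the fixed cost of a custom metric.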

How Log Based Alarms Work

Log Based Alarms operate on a pipeline of "scheduled query execution → result aggregation → threshold evaluation."

Scheduled Query

When PutLogAlarm is executed, a Scheduled Query is automatically created in CloudWatch Logs. A Logs Insights query is executed periodically on the specified schedule (e.g., rate(5 minutes)), and the results are used for alarm evaluation.

The time range targeted by the query is controlled by StartTimeOffset and EndTimeOffset.

AggregationExpression

This is an expression that aggregates query results into a single numeric value. You specify one of the following functions: count(*), sum(fieldName), avg(fieldName), min(fieldName), or max(fieldName). This aggregated result is compared against the threshold.
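
As a mental model, the aggregation step can be sketched locally in Python. This is only an illustration of what "reduce the query rows to one number" means. The `aggregate` function and the `rt` field are hypothetical, and this is not how the service itself is implemented.

```python
from typing import Dict, List, Optional

Row = Dict[str, float]


def aggregate(expression: str, rows: List[Row]) -> Optional[float]:
    """Reduce query result rows to one number, mimicking AggregationExpression."""
    name, _, rest = expression.strip().partition("(")
    arg = rest.rstrip(")").strip()
    name = name.strip().lower()

    if name == "count" and arg == "*":
        return float(len(rows))

    reducers = {
        "sum": sum,
        "avg": lambda values: sum(values) / len(values),
        "min": min,
        "max": max,
    }
    if name not in reducers:
        raise ValueError(f"unsupported aggregation: {expression!r}")

    values = [row[arg] for row in rows if arg in row]
    if not values:
        return None  # no row carries the field; the service's behavior here is not modeled
    return float(reducers[name](values))


# Three hypothetical result rows with a numeric field named rt.
rows = [{"rt": 120.0}, {"rt": 80.0}, {"rt": 400.0}]
print(aggregate("count(*)", rows))
print(aggregate("avg(rt)", rows))
print(aggregate("max(rt)", rows))
```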

M-of-N Evaluation

The evaluation logic is controlled by QueryResultsToEvaluate (the N most recent query results to evaluate) and QueryResultsToAlarm (ALARM triggers when M of those results satisfy the threshold condition).
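
The M-of-N rule can be sketched in a few lines of Python. This is a simplified model under my own assumptions: it handles only GreaterThanThreshold, it does not model TreatMissingData, and returning INSUFFICIENT_DATA when fewer than N results exist is a simplification, not confirmed service behavior.

```python
from typing import Sequence


def evaluate_m_of_n(results: Sequence[float], threshold: float, n: int, m: int) -> str:
    """Return the alarm state from the N most recent aggregated query results.

    n corresponds to QueryResultsToEvaluate and m to QueryResultsToAlarm.
    """
    if not 1 <= m <= n:
        raise ValueError("expected 1 <= m <= n")
    recent = list(results)[-n:]
    if len(recent) < n:
        return "INSUFFICIENT_DATA"  # simplification: missing-data handling is not modeled
    breaching = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaching >= m else "OK"


# 1-of-1, as used later in this article: a single result of 34 against a threshold of 20.
print(evaluate_m_of_n([34.0], threshold=20.0, n=1, m=1))
# 2-of-3: two of the last three results breach the threshold.
print(evaluate_m_of_n([5.0, 25.0, 30.0], threshold=20.0, n=3, m=2))
```

Raising N and M above 1 trades detection speed for fewer alarms on one-off spikes, the same trade-off as EvaluationPeriods / DatapointsToAlarm on metric alarms.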

Verification: Creating an Alarm

Let's actually create an alarm using PutLogAlarm.

Preparing the IAM Role

Log Based Alarms require an IAM role for executing scheduled queries. Both logs.amazonaws.com and cloudwatch.amazonaws.com must be trusted.

aws iam create-role \
  --role-name "log-alarm-query-role" \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {"Service": "logs.amazonaws.com"},
        "Action": "sts:AssumeRole"
      },
      {
        "Effect": "Allow",
        "Principal": {"Service": "cloudwatch.amazonaws.com"},
        "Action": "sts:AssumeRole"
      }
    ]
  }'

Grant query permissions for the target log group using an inline policy.

aws iam put-role-policy \
  --role-name "log-alarm-query-role" \
  --policy-name "LogAlarmQueryPermissions" \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Action": [
          "logs:StartQuery",
          "logs:GetQueryResults",
          "logs:StopQuery",
          "logs:FilterLogEvents",
          "logs:GetLogEvents"
        ],
        "Resource": "arn:aws:logs:ap-northeast-1:123456789012:log-group:/aws/ecs/your-app:*"
      }
    ]
  }'

Executing PutLogAlarm

aws cloudwatch put-log-alarm \
  --alarm-name "timeout-log-alarm" \
  --alarm-description "ECS timeout errors - 5min aggregation, threshold 20" \
  --scheduled-query-configuration '{
    "QueryString": "filter @message like /timeout request to/",
    "LogGroupIdentifiers": ["/aws/ecs/your-app"],
    "ScheduledQueryRoleARN": "arn:aws:iam::123456789012:role/log-alarm-query-role",
    "ScheduleConfiguration": {
      "ScheduleExpression": "rate(5 minutes)",
      "StartTimeOffset": 360,
      "EndTimeOffset": 60
    },
    "AggregationExpression": "count(*)"
  }' \
  --action-log-line-count 10 \
  --action-log-line-role-arn "arn:aws:iam::123456789012:role/log-alarm-query-role" \
  --actions-enabled \
  --alarm-actions "arn:aws:sns:ap-northeast-1:123456789012:your-alarm-topic" \
  --ok-actions "arn:aws:sns:ap-northeast-1:123456789012:your-alarm-topic" \
  --query-results-to-evaluate 1 \
  --query-results-to-alarm 1 \
  --threshold 20.0 \
  --comparison-operator "GreaterThanThreshold" \
  --treat-missing-data "notBreaching" \
  --region ap-northeast-1

Key parameters:

  • QueryString: Logs Insights query. Describes log filtering and field definitions (final numeric aggregation is specified in AggregationExpression)
  • LogGroupIdentifiers: Log groups to query
  • ScheduleExpression: Query execution interval
  • StartTimeOffset / EndTimeOffset: The query time range, given as seconds before the execution time. With 360 / 60 as specified here, each run targets the 5-minute window from 6 minutes before to 1 minute before execution time
  • AggregationExpression: Aggregation function for query results
  • ActionLogLineCount: Number of log lines to include in alarm notifications (maximum 50)
  • QueryResultsToEvaluate / QueryResultsToAlarm: N and M for M-of-N evaluation
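
The StartTimeOffset / EndTimeOffset arithmetic is easy to get backwards, so here is a small Python sketch of the window calculation as I understand it from the description above. The `query_window` function is hypothetical, and the requirement that the start offset be larger than the end offset is my own sanity check.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def query_window(executed_at: datetime,
                 start_offset_s: int,
                 end_offset_s: int) -> Tuple[datetime, datetime]:
    """Return (start, end) of the log range a scheduled run would query."""
    if start_offset_s <= end_offset_s:
        raise ValueError("StartTimeOffset must be larger than EndTimeOffset")
    start = executed_at - timedelta(seconds=start_offset_s)
    end = executed_at - timedelta(seconds=end_offset_s)
    return start, end


# A run at 12:00:00 UTC with offsets 360 / 60 covers 11:54:00 to 11:59:00.
run_time = datetime(2026, 6, 30, 12, 0, 0, tzinfo=timezone.utc)
start, end = query_window(run_time, 360, 60)
print(start.isoformat(), "->", end.isoformat())
```

Because the window length (300 seconds) equals the rate(5 minutes) interval, consecutive runs cover adjacent windows without gaps or overlap.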

Confirming Automatic Scheduled Query Creation

After creation, you can check the Scheduled Query ARN using describe-alarms.

aws cloudwatch describe-alarms \
  --alarm-names "timeout-log-alarm" \
  --region ap-northeast-1
Response (LogAlarms section)
{
  "LogAlarms": [
    {
      "AlarmName": "timeout-log-alarm",
      "AlarmArn": "arn:aws:cloudwatch:ap-northeast-1:123456789012:alarm:timeout-log-alarm",
      "AlarmDescription": "ECS timeout errors - 5min aggregation, threshold 20",
      "ActionsEnabled": true,
      "OKActions": ["arn:aws:sns:ap-northeast-1:123456789012:your-alarm-topic"],
      "AlarmActions": ["arn:aws:sns:ap-northeast-1:123456789012:your-alarm-topic"],
      "InsufficientDataActions": [],
      "StateValue": "INSUFFICIENT_DATA",
      "ScheduledQueryConfiguration": {
        "QueryString": "filter @message like /timeout request to/",
        "LogGroupIdentifiers": ["/aws/ecs/your-app"],
        "QueryARN": "arn:aws:logs:ap-northeast-1:123456789012:scheduled-query:e026155f-ae01-4d43-bde5-024d4ea39f45",
        "ScheduledQueryRoleARN": "arn:aws:iam::123456789012:role/log-alarm-query-role",
        "ScheduleConfiguration": {
          "ScheduleExpression": "rate(5 minutes)",
          "StartTimeOffset": 360,
          "EndTimeOffset": 60
        },
        "AggregationExpression": "count(*)"
      },
      "QueryResultsToEvaluate": 1,
      "QueryResultsToAlarm": 1,
      "Threshold": 20.0,
      "ComparisonOperator": "GreaterThanThreshold",
      "TreatMissingData": "notBreaching",
      "ActionLogLineCount": 10,
      "ActionLogLineRoleArn": "arn:aws:iam::123456789012:role/log-alarm-query-role"
    }
  ]
}

The response from describe-alarms is returned under the LogAlarms key, separate from the traditional MetricAlarms.
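
If you post-process this output in a script, remember to read the LogAlarms key. Here is a minimal Python sketch that pulls the Scheduled Query ARN out of a saved response. The embedded JSON is a trimmed copy of the response above, and `scheduled_query_arns` is a hypothetical helper name.

```python
import json
from typing import Dict

# Trimmed copy of the describe-alarms response shown above.
SAMPLE_RESPONSE = """
{
  "LogAlarms": [
    {
      "AlarmName": "timeout-log-alarm",
      "StateValue": "INSUFFICIENT_DATA",
      "ScheduledQueryConfiguration": {
        "QueryARN": "arn:aws:logs:ap-northeast-1:123456789012:scheduled-query:e026155f-ae01-4d43-bde5-024d4ea39f45",
        "AggregationExpression": "count(*)"
      }
    }
  ]
}
"""


def scheduled_query_arns(response: dict) -> Dict[str, str]:
    """Map each Log Based Alarm name to the ARN of its scheduled query."""
    arns = {}
    for alarm in response.get("LogAlarms", []):
        config = alarm.get("ScheduledQueryConfiguration", {})
        if "QueryARN" in config:
            arns[alarm["AlarmName"]] = config["QueryARN"]
    return arns


print(scheduled_query_arns(json.loads(SAMPLE_RESPONSE)))
```

Scripts that only look at MetricAlarms would silently skip these alarms, which is worth checking in existing alarm inventory tooling.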

Verification: Alarm Firing and Notification Confirmation

When the scheduled query executes, a Logs Insights query runs against the target log group. In this verification, 34 timeout log entries in the target log group matched. Since this exceeded the threshold of 20, the alarm transitioned to ALARM state.

StateReason: "Threshold Crossed: 1 out of the last 1 query results [34.0 (30/06/26 11:43:54)]
was greater than the threshold (20.0)
(minimum 1 datapoint for OK -> ALARM transition)."

SNS Notification Content

The SNS message when an ALARM fires includes the following fields that are not present in traditional MetricAlarms:

{
  "AlarmName": "timeout-log-alarm",
  "NewStateValue": "ALARM",
  "NewStateReason": "Threshold Crossed: 1 out of the last 1 query results [34.0 (30/06/26 11:43:54)] was greater than the threshold (20.0) ...",
  "LogGroups": ["/aws/ecs/your-app"],
  "QueryString": "filter @message like /timeout request to/",
  "AggregationExpression": "count(*)",
  "QueryExecutionId": "627920a2-af1e-4d79-a866-04488dbebd55"
}

The Trigger field present in traditional MetricAlarm messages is not included; instead, Log Alarm-specific fields are added.
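
A subscriber that receives both kinds of alarm on the same topic can tell them apart from these fields. The following Python sketch relies only on the differences observed above (no Trigger, presence of QueryExecutionId). The function name and the trimmed sample messages are my own, and the check is a heuristic rather than a documented contract.

```python
import json

# Trimmed version of the Log Based Alarm message shown above.
LOG_ALARM_MESSAGE = json.dumps({
    "AlarmName": "timeout-log-alarm",
    "NewStateValue": "ALARM",
    "LogGroups": ["/aws/ecs/your-app"],
    "QueryString": "filter @message like /timeout request to/",
    "AggregationExpression": "count(*)",
    "QueryExecutionId": "627920a2-af1e-4d79-a866-04488dbebd55",
})

# Hypothetical, heavily trimmed metric alarm message for comparison.
METRIC_ALARM_MESSAGE = json.dumps({
    "AlarmName": "cpu-high",
    "NewStateValue": "ALARM",
    "Trigger": {"MetricName": "CPUUtilization", "Namespace": "AWS/EC2"},
})


def is_log_alarm_message(sns_message: str) -> bool:
    """Heuristic: Log Based Alarm messages lack Trigger and carry QueryExecutionId."""
    body = json.loads(sns_message)
    return "Trigger" not in body and "QueryExecutionId" in body


print(is_log_alarm_message(LOG_ALARM_MESSAGE))
print(is_log_alarm_message(METRIC_ALARM_MESSAGE))
```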

Log Lines Bundled via ActionLogLineCount

The log lines included in the email notification were as follows:

Log Lines:
2026-06-30 11:41:59.111 timeout request to https://example.cloudfront.net/spaces/.../entries?...
2026-06-30 11:38:44.108 timeout request to https://example.io/api/stats/user-30.json.gz
2026-06-30 11:38:44.008 timeout request to https://example.io/api/stats/user-29.json.gz
2026-06-30 11:38:43.908 timeout request to https://example.io/api/stats/user-28.json.gz
2026-06-30 11:38:43.808 timeout request to https://example.io/api/stats/user-27.json.gz
2026-06-30 11:38:43.708 timeout request to https://example.io/api/stats/user-26.json.gz
2026-06-30 11:38:43.608 timeout request to https://example.io/api/stats/user-25.json.gz
2026-06-30 11:38:43.508 timeout request to https://example.io/api/stats/user-24.json.gz
2026-06-30 11:38:43.408 timeout request to https://example.io/api/stats/user-23.json.gz
2026-06-30 11:38:43.308 timeout request to https://example.io/api/stats/user-22.json.gz

The email notification also includes a direct link to the Logs Insights query results.

View query results in Logs Insights:
https://ap-northeast-1.console.aws.amazon.com/cloudwatch/home?region=ap-northeast-1#logsV2:logs-insights$3FqueryId$3D627920a2-af1e-4d79-a866-04488dbebd55
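
In this link the query ID matches the QueryExecutionId from the SNS message, so the same URL can be rebuilt in a custom notification. The Python sketch below simply reproduces the pattern of the one link I observed. Console URL formats are not a documented contract, so this may stop working if the console changes.

```python
def logs_insights_link(region: str, query_execution_id: str) -> str:
    """Rebuild the console link using the pattern seen in the notification email.

    "$3F" and "$3D" are kept exactly as they appear in the observed link.
    """
    return (
        f"https://{region}.console.aws.amazon.com/cloudwatch/home"
        f"?region={region}#logsV2:logs-insights$3FqueryId$3D{query_execution_id}"
    )


print(logs_insights_link("ap-northeast-1", "627920a2-af1e-4d79-a866-04488dbebd55"))
```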

Verification: EventBridge Integration

State changes in Log Based Alarms can also be captured with EventBridge. The event pattern uses the same "CloudWatch Alarm State Change" detail-type as regular CloudWatch alarms.

aws events put-rule \
  --name "log-alarm-state-change-capture" \
  --event-pattern '{
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "resources": ["arn:aws:cloudwatch:ap-northeast-1:123456789012:alarm:timeout-log-alarm"]
  }' \
  --state ENABLED \
  --region ap-northeast-1

The structure of the captured event is as follows:

EventBridge Event (OK → ALARM transition)
{
  "version": "0",
  "id": "a644916b-0f3e-7687-4a60-3e0b6b639f81",
  "detail-type": "CloudWatch Alarm State Change",
  "source": "aws.cloudwatch",
  "time": "2026-06-30T12:13:00Z",
  "region": "ap-northeast-1",
  "resources": [
    "arn:aws:cloudwatch:ap-northeast-1:123456789012:alarm:timeout-log-alarm"
  ],
  "detail": {
    "alarmName": "timeout-log-alarm",
    "state": {
      "value": "ALARM",
      "reason": "Threshold Crossed: 1 out of the last 1 query results [34.0] was greater than the threshold (20.0)...",
      "timestamp": "2026-06-30T12:13:00.795+0000"
    },
    "previousState": {
      "value": "OK",
      "reason": "...",
      "timestamp": "2026-06-30T12:12:51.478+0000"
    },
    "configuration": {
      "logGroupIdentifiers": ["/aws/ecs/your-app"],
      "queryString": "filter @message like /timeout request to/",
      "aggregationExpression": "count(*)",
      "scheduledQueryRoleARN": "arn:aws:iam::123456789012:role/log-alarm-query-role",
      "actionLogLineRoleArn": "arn:aws:iam::123456789012:role/log-alarm-query-role",
      "actionLogLineCount": 10,
      "schedule": {
        "expression": "rate(5 minutes)",
        "startTimeOffset": 360,
        "endTimeOffset": 60
      },
      "threshold": 20.0,
      "comparisonOperator": "GreaterThanThreshold",
      "treatMissingData": "notBreaching",
      "queryResultsToEvaluate": 1,
      "queryResultsToAlarm": 1
    }
  }
}

detail.configuration contains Log Alarm-specific fields (logGroupIdentifiers, queryString, aggregationExpression, schedule, etc.).
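
A consumer such as a Lambda function can use these fields to pick out Log Based Alarm events and summarize them. Here is a minimal Python sketch. The function name is hypothetical, and treating "queryString is present in detail.configuration" as the marker of a Log Based Alarm is my own heuristic based on the event above.

```python
from typing import Any, Dict, Optional


def summarize_log_alarm_event(event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return a compact summary, or None if the event is not from a Log Based Alarm."""
    if event.get("detail-type") != "CloudWatch Alarm State Change":
        return None
    detail = event.get("detail", {})
    config = detail.get("configuration", {})
    if "queryString" not in config:
        return None  # heuristic: other alarm types do not carry a queryString
    return {
        "alarm": detail["alarmName"],
        "transition": f'{detail["previousState"]["value"]} -> {detail["state"]["value"]}',
        "query": config["queryString"],
        "aggregation": config.get("aggregationExpression"),
        "threshold": config.get("threshold"),
    }


# Trimmed version of the captured event shown above.
sample_event = {
    "detail-type": "CloudWatch Alarm State Change",
    "source": "aws.cloudwatch",
    "detail": {
        "alarmName": "timeout-log-alarm",
        "state": {"value": "ALARM"},
        "previousState": {"value": "OK"},
        "configuration": {
            "queryString": "filter @message like /timeout request to/",
            "aggregationExpression": "count(*)",
            "threshold": 20.0,
        },
    },
}
print(summarize_log_alarm_event(sample_event))
```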

Advanced: Combining with Logs Insights Functions

The AggregationExpression in Log Based Alarms can reference numeric fields defined using fields or parse within the QueryString, using sum(), avg(), min(), or max(). When numerically aggregating values extracted as strings using parse, convert them with toNumber(). Combining with the new Logs Insights functions added in May–June 2026 (if, toNumber, isPrivateIP, etc.) enables alarms based not just on simple counts, but also on ratios and weighted scores.

https://dev.classmethod.jp/articles/cloudwatch-logs-insights-new-commands-functions-2026/

https://dev.classmethod.jp/articles/cloudwatch-logs-insights-new-commands-functions-2026-june/

Ratio Alarm: Fire When Timeout Rate Exceeds Threshold

Use the if function to flag matching lines as 1 and all others as 0, then take the average with avg() to get the ratio.

aws cloudwatch put-log-alarm \
  --alarm-name "timeout-ratio-alarm" \
  --alarm-description "Timeout ratio exceeds 50% of all log lines" \
  --scheduled-query-configuration '{
    "QueryString": "fields if(@message like /timeout request to/, 1, 0) as is_timeout",
    "LogGroupIdentifiers": ["/aws/ecs/your-app"],
    "ScheduledQueryRoleARN": "arn:aws:iam::123456789012:role/log-alarm-query-role",
    "ScheduleConfiguration": {
      "ScheduleExpression": "rate(5 minutes)",
      "StartTimeOffset": 360,
      "EndTimeOffset": 60
    },
    "AggregationExpression": "avg(is_timeout)"
  }' \
  --actions-enabled \
  --query-results-to-evaluate 1 \
  --query-results-to-alarm 1 \
  --threshold 0.5 \
  --comparison-operator "GreaterThanThreshold" \
  --treat-missing-data "notBreaching" \
  --region ap-northeast-1

The evaluation result of the alarm I actually created was as follows:

StateReason: "Threshold Crossed: 1 out of the last 1 query results [0.03333333333333333 (30/06/26 12:58:25)]
was not greater than the threshold (0.5)
(minimum 1 datapoint for INSUFFICIENT_DATA -> OK transition)."

The timeout ratio was calculated as approximately 3.3% of the query result rows in the evaluation period, which is below the 50% threshold, so the state is OK. This confirms that avg(is_timeout) averages the is_timeout field (0 or 1) across each row of the query results and evaluates it as a ratio.
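
The reason avg over a 0/1 flag equals a ratio can be checked locally. The Python sketch below mimics the query on made-up log lines. The split of 1 timeout line out of 30 is a hypothetical example that happens to produce the same value; the actual row counts in my test are not shown in the alarm output.

```python
import re
from typing import List, Optional


def timeout_ratio(messages: List[str]) -> Optional[float]:
    """Flag each line as 1/0 like if(@message like /.../, 1, 0), then average."""
    flags = [1 if re.search(r"timeout request to", m) else 0 for m in messages]
    if not flags:
        return None  # no rows in the window
    return sum(flags) / len(flags)


# Hypothetical window: 1 timeout line and 29 other lines.
messages = ["timeout request to https://example.io/api"] + ["request ok"] * 29
ratio = timeout_ratio(messages)
print(ratio)
print(ratio > 0.5)
```

Note that the denominator is every row the query returns, so adding a filter to the QueryString changes what the ratio means.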

Weighted Score: Alarm Based on Severity

The case function (added in May 2026) enables multi-branch scoring. It is more readable than nested if statements.

aws cloudwatch put-log-alarm \
  --alarm-name "severity-score-alarm" \
  --alarm-description "Weighted severity score exceeds threshold" \
  --scheduled-query-configuration '{
    "QueryString": "fields case(@message like /timeout/, 2, @message like /error/, 1, 0) as severity_score",
    "LogGroupIdentifiers": ["/aws/ecs/your-app"],
    "ScheduledQueryRoleARN": "arn:aws:iam::123456789012:role/log-alarm-query-role",
    "ScheduleConfiguration": {
      "ScheduleExpression": "rate(5 minutes)",
      "StartTimeOffset": 360,
      "EndTimeOffset": 60
    },
    "AggregationExpression": "avg(severity_score)"
  }' \
  --actions-enabled \
  --query-results-to-evaluate 1 \
  --query-results-to-alarm 1 \
  --threshold 1.0 \
  --comparison-operator "GreaterThanThreshold" \
  --treat-missing-data "notBreaching" \
  --region ap-northeast-1

The syntax is case(condition1, value1, condition2, value2, ..., default_value), supporting up to 10 branches.
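
To see what the alarm above would evaluate, here is a Python sketch of the same scoring on made-up log lines. I am assuming that case checks its conditions in order and returns the value of the first match. The sample messages and function names are hypothetical.

```python
import re
from typing import List


def severity_score(message: str) -> int:
    """Mirror case(@message like /timeout/, 2, @message like /error/, 1, 0).

    Assumed semantics: branches are checked in order and the first match wins.
    """
    if re.search(r"timeout", message):
        return 2
    if re.search(r"error", message):
        return 1
    return 0


def average_score(messages: List[str]) -> float:
    """Equivalent of avg(severity_score) over the query rows."""
    scores = [severity_score(m) for m in messages]
    return sum(scores) / len(scores)


messages = ["timeout request to https://example.io", "error: boom", "ok", "ok"]
avg = average_score(messages)
print(avg)        # (2 + 1 + 0 + 0) / 4
print(avg > 1.0)  # compared against --threshold 1.0
```

Because the score is averaged, many ordinary lines dilute a few severe ones. Using sum(severity_score) instead would make the alarm sensitive to absolute volume rather than the mix.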

Other Combination Patterns

Average response time
  QueryString: parse @message /took (?<ms>\d+)ms/ | fields toNumber(ms) as rt
  AggregationExpression: avg(rt)

Ratio of slow requests
  QueryString: fields if(toNumber(response_time) > 1000, 1, 0) as is_slow
  AggregationExpression: avg(is_slow)

External IP access ratio
  QueryString: fields if(isPrivateIP(src_ip), 0, 1) as is_external
  AggregationExpression: avg(is_external)

Detection of oversized log lines
  QueryString: fields if(strlen(@message) > 10000, 1, 0) as is_oversized
  AggregationExpression: sum(is_oversized)

The key point is to define the fields with the fields command in QueryString and not to use stats. Fields produced by stats inside the query cannot be referenced from AggregationExpression.

Notes

AggregationExpression Constraints

Specify an aggregation function directly in AggregationExpression. The available functions are count(*), sum(fieldName), avg(fieldName), min(fieldName), and max(fieldName).

Fields defined with fields or parse can be referenced, but fields aggregated using stats within the QueryString cannot be referenced.

# Not OK: QueryString contains stats, AggregationExpression references the field
QueryString: "filter @message like /error/ | stats count(*) as errorCount"
AggregationExpression: "errorCount"

# OK: QueryString is filter only, aggregation is handled by AggregationExpression
QueryString: "filter @message like /error/"
AggregationExpression: "count(*)"

# OK: Field defined with fields is aggregated by AggregationExpression
QueryString: "fields if(@message like /error/, 1, 0) as is_error"
AggregationExpression: "avg(is_error)"
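
These constraints can be checked before calling the API. Below is a rough Python sketch of such a pre-flight check. The function name and regular expressions are my own, and they only approximate the rules described in this section (for example, a regex literal containing "| stats" would be a false positive).

```python
import re
from typing import List

# count(*) or one of sum/avg/min/max applied to a single field name.
ALLOWED_AGGREGATION = re.compile(
    r"^(count\(\*\)|(sum|avg|min|max)\(\s*[A-Za-z_@][\w.@]*\s*\))$"
)
# A stats command at the start of the query or after a pipe.
STATS_COMMAND = re.compile(r"(^|\|)\s*stats\b", re.IGNORECASE)


def check_config(query_string: str, aggregation_expression: str) -> List[str]:
    """Return a list of problems; an empty list means the pair looks fine."""
    problems = []
    if STATS_COMMAND.search(query_string):
        problems.append("QueryString uses stats; aggregate in AggregationExpression instead")
    if not ALLOWED_AGGREGATION.match(aggregation_expression.strip()):
        problems.append("AggregationExpression must be count(*), sum(f), avg(f), min(f) or max(f)")
    return problems


# The three examples from this section.
print(check_config("filter @message like /error/ | stats count(*) as errorCount", "errorCount"))
print(check_config("filter @message like /error/", "count(*)"))
print(check_config("fields if(@message like /error/, 1, 0) as is_error", "avg(is_error)"))
```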

AWS Chatbot Not Supported (as of 2026-06-30)

The SNS message format for Log Based Alarms differs from traditional MetricAlarms (as mentioned earlier, the Trigger field is absent and the message is composed of Log Alarm-specific fields), so AWS Chatbot does not appear to support it yet. In my verification, the SNS Publish succeeded but the notification did not appear in Slack.

If you want to send notifications to Slack via Chatbot, one option is an EventBridge → Lambda configuration in which the Lambda function reshapes the event into the custom notification format and then publishes it to SNS.
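
As a starting point, the reshaping step might look like the Python sketch below. The payload follows the Chatbot custom notification schema as I understand it (version "1.0", source "custom", and a content object with textType, title, and description), so please check it against the current documentation. The function name is hypothetical, and the SNS Publish call itself is left out.

```python
import json
from typing import Any, Dict


def to_custom_notification(event: Dict[str, Any]) -> str:
    """Build a custom-notification JSON string from an alarm state change event."""
    detail = event["detail"]
    config = detail.get("configuration", {})
    description = "\n".join([
        f'Reason: {detail["state"].get("reason", "n/a")}',
        f'Query: {config.get("queryString", "n/a")}',
        f'Aggregation: {config.get("aggregationExpression", "n/a")}',
        f'Threshold: {config.get("threshold", "n/a")}',
    ])
    payload = {
        "version": "1.0",
        "source": "custom",
        "content": {
            "textType": "client-markdown",
            "title": f'{detail["alarmName"]}: {detail["state"]["value"]}',
            "description": description,
        },
    }
    return json.dumps(payload)


# Trimmed EventBridge event for a Log Based Alarm state change.
example_event = {
    "detail": {
        "alarmName": "timeout-log-alarm",
        "state": {"value": "ALARM", "reason": "Threshold Crossed"},
        "configuration": {
            "queryString": "filter @message like /timeout request to/",
            "aggregationExpression": "count(*)",
            "threshold": 20.0,
        },
    }
}
print(to_custom_notification(example_event))
```

In a real Lambda function, the returned string would be passed as the message body when publishing to the SNS topic that the Chatbot channel subscribes to.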

https://dev.classmethod.jp/articles/sns-amazon-q-developer-aws-chatbot/

Resource-Based Policies for EventBridge Targets

If you specify CloudWatch Logs as an EventBridge target, a resource-based policy is required on the log group. Allow logs:CreateLogStream and logs:PutLogEvents for events.amazonaws.com.
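
A policy document along these lines can be generated with a short Python sketch. The log group name /aws/events/log-alarm-capture and the Sid are hypothetical. The principal and actions are the ones named above.

```python
import json


def eventbridge_log_group_policy(region: str, account_id: str, log_group_name: str) -> str:
    """Build a resource policy that lets EventBridge write to one log group."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowEventBridgeToWriteLogs",  # hypothetical statement ID
                "Effect": "Allow",
                "Principal": {"Service": "events.amazonaws.com"},
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": f"arn:aws:logs:{region}:{account_id}:log-group:{log_group_name}:*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(eventbridge_log_group_policy("ap-northeast-1", "123456789012", "/aws/events/log-alarm-capture"))
```

The resulting JSON can be supplied as the policy document to aws logs put-resource-policy.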

Summary

Log Based Alarms have made it possible to greatly simplify CloudWatch Logs monitoring configuration. Without creating metric filters, you can combine filtering and field processing via Logs Insights queries with aggregation using AggregationExpression to treat log search results directly as alarm conditions.

In this verification, we confirmed that not only simple count thresholds, but also ratio and weighted value-based alarms such as timeout rate, severity score, and average response time can be expressed by using functions like if / case / toNumber / isPrivateIP. Configurations that previously required combining metric filters, custom metrics, and MetricAlarms, or performing conditional evaluation in Lambda, can now be achieved with less configuration.

On the notification side, the ability to include matched log lines via ActionLogLineCount and the inclusion of a direct link to Logs Insights query results are convenient features. EventBridge integration was also usable with the same event pattern as traditional alarms.

At this time, direct notifications to AWS Chatbot do not work, and AggregationExpression has constraints to watch out for, but Log Based Alarms look like a strong option when you simply want to create alarms based on log content.
