I verified 23 new commands and functions added to CloudWatch Logs Insights (June 2026 edition)


In June 2026, 23 new commands and functions were added to CloudWatch Logs Insights. We ran actual queries against the hash functions, IP classification, type conversion, CSV/XML parsing, histogram, multi-stage stats pipelines, and more, and summarized the confirmed behavior and the points to watch out for.
2026.06.10


Introduction

On June 8, 2026, 23 new commands and functions were added to Amazon CloudWatch Logs Insights.

https://aws.amazon.com/jp/about-aws/whats-new/2026/06/amazon-cloudwatch-logs-insights-new/

The previous article covered the 13 additions announced on May 21. Less than a month later, another large batch has arrived.

https://dev.classmethod.jp/articles/cloudwatch-logs-insights-new-commands-functions-2026/

| Announcement Date | Number Added | Main Categories |
| --- | --- | --- |
| 5/21 | 13 | String operations, encode/decode, logfmt parsing, coordinate calculation |
| 6/8 | 23 | Hash, IP classification, type conversion, time series analysis, CSV/XML parsing, command extensions |

Comparing the trends of features added in May and June reveals the direction of the query language.

| Aspect | May (13) | June (23) |
| --- | --- | --- |
| Character | Log viewing and formatting tool | Log analysis platform |
| Main work | "Read and search" | "Aggregate and determine" |
| What it replaced | Manual inspection, manual decoding | Post-processing in Athena/pandas/Splunk |
| Typical use cases | Initial investigation during incidents | Security audits, traffic analysis, SLO aggregation |

While May's additions were focused on "making logs easier to read," June's additions can be described as "getting answers from logs."

The list of new features added this time is as follows.

| Category | Items |
| --- | --- |
| Hash functions | md5, sha256 |
| String functions | strcontains (case-insensitive support), split |
| Conditional logic | if |
| Conversion functions | toNumber, toInt, toLong, toDouble |
| IP functions | ipv4ToNumber, isPrivateIP, isPublicIP, isReservedIP |
| Analytics functions | rate, count_over_time, sum_over_time, offset, histogram |
| Parse functions | parse CSV, parse XML, parse multi, values, addtotals |
| Other | limit any N, stats command up to 10 stages |

Note: The official announcement states "23 new query commands and functions." The table above lists parse CSV, parse XML, parse multi, and similar items individually, so it contains 25 entries rather than 23. The official announcement counts syntax extensions together, and this article uses the official count of 23 as-is.

This article runs each of these in practice and reports the results.

Verification Environment

  • Region: us-east-1
  • Log group: /test/insights-new-2026-06
  • Test data: JSON format, CSV format, XML format, time series data. Fictional user IDs, service names, and IP addresses are used (IPs consist of RFC 5737 TEST-NET, RFC 1918 private ranges, and well-known addresses such as 8.8.8.8)

Hash Functions (md5, sha256)

md5 and sha256 are functions that generate hash values from field values. They can be used when you want to display hashed versions of user IDs and similar values in logs.

fields user_id, md5(user_id) as md5_hash, sha256(user_id) as sha256_hash
| limit 3

| user_id | md5_hash | sha256_hash |
| --- | --- | --- |
| usr_010 | 2fb6c8adce410db19ed04a7157b1ebd0 | 34f0365ae65242b4664ad6a1e4fe941c77caf56d7bd5aca88f1e9c6927012207 |
| usr_009 | 3b24508aecac732b2dbf6d4e4bf9c4c2 | 8c231f143bc453047665350be67bc92029fe34375ddd45bee71164fcb278315c |
| usr_008 | 136ea0f2a0158fcbbd873dead2d60963 | a8d79cc679fa29d08b483641f5d4b01faea2e81aa44edc7684fc2b936f32437c |

md5 returned a 128-bit (32 hexadecimal characters) string, and sha256 returned a 256-bit (64 hexadecimal characters) string.
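To cross-check the digest lengths outside Logs Insights, the same two digests can be computed locally. The following is a minimal Python sketch, not the Logs Insights implementation. It assumes the field value is hashed as UTF-8 bytes, which this verification did not confirm, and hash_field is a hypothetical helper name.

```python
import hashlib


def hash_field(value: str) -> dict:
    """Return hex digests comparable to md5(field) and sha256(field).

    Assumption: the string is encoded as UTF-8 before hashing.
    """
    data = value.encode("utf-8")
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


digests = hash_field("usr_010")
# 128 bits -> 32 hex characters, 256 bits -> 64 hex characters
print(len(digests["md5"]), len(digests["sha256"]))  # 32 64
```

If the locally computed digest differs from the query result, the encoding assumption is the first thing to check.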

String Functions (strcontains, split)

split

split splits a string into an array using a specified delimiter.

fields tags, split(tags, ',') as tag_array
| limit 3

| tags | tag_array |
| --- | --- |
| prod,warning,us-east-1 | ["prod","warning","us-east-1"] |
| prod,normal,ap-northeast-1 | ["prod","normal","ap-northeast-1"] |
| prod,critical,ap-northeast-1 | ["prod","critical","ap-northeast-1"] |

The comma-delimited tag string was expanded into array format (["prod","warning","us-east-1"]).

strcontains

strcontains determines whether a string contains a specific substring. According to the documentation, specifying true as the third argument enables case-insensitive search.

fields service,
  strcontains(service, 'auth') as has_auth_lower,
  strcontains(service, 'AUTH') as has_AUTH_upper,
  strcontains(service, 'AUTH', true) as has_AUTH_ci
| filter ispresent(service)
| sort service
| limit 5

| service | has_auth_lower | has_AUTH_upper | has_AUTH_ci |
| --- | --- | --- | --- |
| api-gateway | 0 | 0 | 0 |
| auth-service | 1 | 0 | 0 |
| cdn-edge | 0 | 0 | 0 |
| data-pipeline | 0 | 0 | 0 |
| geo-service | 0 | 0 | 0 |

The basic operation (first and second arguments only) is working correctly. strcontains(service, 'auth') returns 1 for auth-service.

However, in the author's environment, the effect of the third argument true (case-insensitive mode) could not be confirmed. strcontains(service, 'AUTH', true) returned 0 for auth-service, appearing to still behave in a case-sensitive manner. The argument itself was accepted without a syntax error, but specifying it did not change the result.
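To make the gap between the documented and the observed behavior concrete, here is a small Python sketch of what a case-insensitive substring check would be expected to return for auth-service. The contains helper is hypothetical and only models the documented semantics. It says nothing about how Logs Insights implements strcontains.

```python
def contains(text: str, sub: str, ignore_case: bool = False) -> int:
    """Return 1 if sub occurs in text, else 0 (mirroring the 1/0 results above)."""
    if ignore_case:
        text, sub = text.casefold(), sub.casefold()
    return int(sub in text)


row = (
    contains("auth-service", "auth"),
    contains("auth-service", "AUTH"),
    contains("auth-service", "AUTH", ignore_case=True),
)
# Documented behavior would give (1, 0, 1); the query above returned 1, 0, 0.
print(row)  # (1, 0, 1)
```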

Conditional Logic (if)

if is a function that returns a value based on a condition. It uses the syntax if(condition, value_if_true, value_if_false) and can be used like a ternary operator.

fields service, response_time,
  if(toNumber(response_time) > 1000, 'slow', 'fast') as speed
| limit 5

| service | response_time | speed |
| --- | --- | --- |
| queue-worker | 2045 | slow |
| search-service | 156 | fast |
| auth-service | 8901 | slow |
| notification | 67 | fast |
| geo-service | 312 | fast |

Requests with a response time exceeding 1000ms were successfully classified as slow.

Compared to the case function added in the previous (May) update, if is suited for a single condition with two choices, while case is suited for multiple branches.

Conversion Functions (toNumber, toInt, toLong, toDouble)

Four functions for converting string fields to numeric types have been added. Let's verify the differences between each type conversion.

Converting Integer Values

fields response_time,
  toInt(response_time) as rt_int,
  toLong(response_time) as rt_long,
  toDouble(response_time) as rt_double,
  toNumber(response_time) as rt_number
| filter ispresent(response_time)
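| limit 5

| response_time | rt_int | rt_long | rt_double | rt_number |
| --- | --- | --- | --- | --- |
| 234 | 234 | 234 | 234 | 234 |
| 89 | 89 | 89 | 89 | 89 |
| 1523 | 1523 | 1523 | 1523 | 1523 |
| 5002 | 5002 | 5002 | 5002 | 5002 |
| not_a_number | (null) | (null) | (null) | (null) |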

For integer values, all four functions returned the same result. When conversion fails (e.g., not_a_number), null is returned without an error.

Converting Decimal Values (where type differences are prominent)

parse @message /latency=(?<lat_val>[\d.]+)/
| display lat_val,
    toInt(lat_val) as lat_int,
    toLong(lat_val) as lat_long,
    toDouble(lat_val) as lat_double,
    toNumber(lat_val) as lat_number
| filter ispresent(lat_val)

| lat_val | lat_int | lat_long | lat_double | lat_number |
| --- | --- | --- | --- | --- |
| 890.12 | 890 | 890 | 890.12 | 890.12 |
| 45.7 | 45 | 45 | 45.7 | 45.7 |
| 123.456 | 123 | 123 | 123.456 | 123.456 |

Differences appeared with values containing decimals.

  • toInt / toLong: Truncated the decimal portion (verified for positive values only; whether negative values are floored or truncated toward zero was not confirmed)
  • toDouble / toNumber: Retained the decimal portion, and yielded the same results within the scope of this verification
  • Based on the type names, the difference between toInt and toLong may appear for values exceeding the 32-bit integer range (approximately 2.1 billion), but no difference was confirmed with the test data in this verification

For practical use, toInt/toLong can be used for aggregations where decimal precision is not needed, while toDouble/toNumber can be used for calculations where decimals need to be retained.
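The observed behavior (decimals dropped by toInt/toLong, kept by toDouble/toNumber, and null on failure) can be modeled in a few lines of Python. This is a sketch under assumptions: to_int and to_double are hypothetical names, and Python's int() truncates toward zero, which was not confirmed for Logs Insights with negative values.

```python
import math
from typing import Optional

# Upper bound of a signed 32-bit integer, the "approximately 2.1 billion" limit
INT32_MAX = 2**31 - 1


def to_double(raw: str) -> Optional[float]:
    """Model of toDouble/toNumber: keep decimals, return None when parsing fails."""
    try:
        return float(raw)
    except ValueError:
        return None


def to_int(raw: str) -> Optional[int]:
    """Model of toInt/toLong for positive values: drop the decimal portion."""
    value = to_double(raw)
    if value is None or not math.isfinite(value):
        return None
    return int(value)  # Python truncates toward zero (unverified for Logs Insights)


print(to_int("890.12"), to_double("890.12"))  # 890 890.12
print(to_int("not_a_number"))  # None
```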

IP Functions (ipv4ToNumber, isPrivateIP, isPublicIP, isReservedIP)

Four functions for numeric conversion and classification of IP addresses have been added. They are useful for analyzing VPC flow logs and ALB access logs.

fields ip,
  ipv4ToNumber(ip) as ip_num,
  isPrivateIP(ip) as is_private,
  isPublicIP(ip) as is_public,
  isReservedIP(ip) as is_reserved
| limit 10

| ip | ip_num | is_private | is_public | is_reserved |
| --- | --- | --- | --- | --- |
| 10.255.255.1 | 184549121 | 1 | 0 | 0 |
| 52.194.x.x | 885131796 | 0 | 1 | 0 |
| 192.0.2.1 | 3221225985 | 0 | 0 | 1 |
| 169.254.169.254 | 2852039166 | 0 | 0 | 1 |
| 100.64.0.1 | 1681915905 | 0 | 0 | 1 |
| 8.8.8.8 | 134744072 | 0 | 1 | 0 |
| 172.16.0.10 | 2886729738 | 1 | 0 | 0 |
| 10.0.0.55 | 167772215 | 1 | 0 | 0 |
| 203.0.113.50 | 3405803826 | 0 | 0 | 1 |
| 192.168.1.100 | 3232235876 | 1 | 0 | 0 |

Note: The lower octets of some public IPs are masked in this article (complete IP addresses were used in the actual verification).

For the IP addresses prepared this time, we confirmed that the expected classification results were obtained. The classification criteria that can be read from the results are as follows.

  • Private (isPrivateIP = 1): Private address ranges defined in RFC 1918
    • 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
  • Reserved (isReservedIP = 1): Address ranges reserved for special purposes
    • 169.254.0.0/16 (link-local), 100.64.0.0/10 (CGN / Shared Address)
    • 192.0.2.0/24 (TEST-NET-1), 203.0.113.0/24 (TEST-NET-3)
  • Public (isPublicIP = 1): Within the range tested, ordinary public IP addresses that fall under neither Private nor Reserved returned 1

For every IPv4 address tested, exactly one of the three classifications returned 1, so within this verification they were mutually exclusive. ipv4ToNumber converts an IP address to its 32-bit integer representation. It can be used to filter by IP range (ipv4ToNumber(ip) >= X and ipv4ToNumber(ip) <= Y).
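The X and Y bounds for a range filter can be computed ahead of time. The Python sketch below uses the standard ipaddress module to reproduce the ip_num values and to derive the bounds for a CIDR block. The classify function only encodes the ranges listed above, so it is an approximation of the observed results and not the actual Logs Insights rule set. All function names are hypothetical.

```python
import ipaddress

# Only the ranges observed in this verification; the real lists may be longer.
PRIVATE = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
RESERVED = [
    ipaddress.ip_network(n)
    for n in ("169.254.0.0/16", "100.64.0.0/10", "192.0.2.0/24", "203.0.113.0/24")
]


def ipv4_to_number(ip: str) -> int:
    """Convert dotted-quad IPv4 text to its 32-bit integer value."""
    return int(ipaddress.IPv4Address(ip))


def classify(ip: str) -> str:
    """Return 'private', 'reserved', or 'public' based on the ranges above."""
    addr = ipaddress.IPv4Address(ip)
    if any(addr in net for net in PRIVATE):
        return "private"
    if any(addr in net for net in RESERVED):
        return "reserved"
    return "public"


def cidr_bounds(cidr: str) -> tuple[int, int]:
    """Return (X, Y) for a filter of the form X <= ipv4ToNumber(ip) <= Y."""
    net = ipaddress.ip_network(cidr)
    return int(net.network_address), int(net.broadcast_address)


print(ipv4_to_number("8.8.8.8"))  # 134744072
print(classify("100.64.0.1"))  # reserved
print(cidr_bounds("10.0.0.0/8"))  # (167772160, 184549375)
```

Note that Python's built-in is_private attribute uses a broader definition than the one observed here (it also covers the TEST-NET ranges), which is why the sketch lists the networks explicitly.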

Analytics Functions (rate, count_over_time, sum_over_time, offset, histogram)

Five new features related to time series data analysis have been added.

count_over_time

count_over_time counts the number of records per time range.

filter metric = 'cpu_usage'
| stats count_over_time(*) as cot by bin(2m)

| bin(2m) | cot |
| --- | --- |
| 2026-06-09 16:20:00.000 | 3 |
| 2026-06-09 16:18:00.000 | 4 |
| 2026-06-09 16:16:00.000 | 4 |
| 2026-06-09 16:14:00.000 | 4 |
| 2026-06-09 16:12:00.000 | 4 |
| 2026-06-09 16:10:00.000 | 1 |

Under the bin aggregation conditions of this verification, the result was equivalent to count(*).

sum_over_time

sum_over_time calculates the total value per time range.

filter metric = 'cpu_usage'
| stats sum_over_time(value) as sot by bin(2m)

| bin(2m) | sot |
| --- | --- |
| 2026-06-09 16:20:00.000 | 277 |
| 2026-06-09 16:18:00.000 | 447 |
| 2026-06-09 16:16:00.000 | 383 |
| 2026-06-09 16:14:00.000 | 408 |
| 2026-06-09 16:12:00.000 | 384 |
| 2026-06-09 16:10:00.000 | 143 |

This also matched sum(value). The names suggest that count_over_time and sum_over_time are intended for time series analysis. Under the bin aggregation used in this verification, no difference from count(*) or sum() was observed, but that alone is not enough to conclude that they always behave identically.

histogram

histogram is used as a grouping function in the by clause, and aggregates by dividing a numeric field into buckets.

filter metric = 'cpu_usage'
| stats count(*) as cnt by histogram(value, 50)

| histogram(value, 50) | cnt |
| --- | --- |
| 50 | 10 |
| 100 | 10 |

The second argument is the bucket width, and the lower bound of the bucket is displayed in the results. Changing the bucket width to 25 shows a more detailed distribution.

filter metric = 'cpu_usage'
| stats count(*) as cnt by histogram(value, 25)

| histogram(value, 25) | cnt |
| --- | --- |
| 50 | 2 |
| 75 | 8 |
| 100 | 6 |
| 125 | 4 |

Note that histogram is a grouping function for the by clause, not an aggregation function for stats. While bin is for bucketing on the time axis, histogram is for bucketing on the numeric axis.
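Since the results show the lower bound of each bucket, the grouping key can be modeled as the value rounded down to a multiple of the width. The Python sketch below illustrates that model. The sample values are hypothetical, and whether a value exactly on a boundary belongs to the upper bucket is an assumption that was not tested here.

```python
import math
from collections import Counter


def bucket_lower_bound(value: float, width: int) -> int:
    """Lower bound of the fixed-width bucket that contains value."""
    return math.floor(value / width) * width


values = [52, 80, 99, 101, 130]  # hypothetical cpu_usage samples
counts = Counter(bucket_lower_bound(v, 25) for v in values)
print(sorted(counts.items()))  # [(50, 1), (75, 2), (100, 1), (125, 1)]
```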

offset

offset is a modifier for bin() that shifts the alignment (starting position) of bin boundaries.

filter metric = 'cpu_usage'
| stats count(*) as cnt by bin(5m) offset 5m

| bin(5m) | cnt |
| --- | --- |
| 2026-06-09 16:20:00.000 | 3 |
| 2026-06-09 16:15:00.000 | 10 |
| 2026-06-09 16:10:00.000 | 7 |

The syntax is by bin(5m) offset 5m, placed after bin. It is a modifier, not a function. Using offset allows you to shift the starting position of bin boundaries, enabling you to set aggregation intervals aligned with business hours.
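The result above shows bin boundaries at :10, :15, and :20, which are the same boundaries an unshifted bin(5m) produces. One possible explanation, which is an assumption and was not confirmed in this verification, is that an offset equal to the bin width moves every boundary by exactly one bin and so leaves the alignment unchanged. The Python sketch below models a generic offset bin to show that effect. bin_start is a hypothetical helper, not the Logs Insights implementation.

```python
def bin_start(ts: int, width: int, offset: int = 0) -> int:
    """Start of the bin containing ts; all arguments are in seconds."""
    return (ts - offset) // width * width + offset


ts = 16 * 3600 + 17 * 60  # 16:17:00 expressed as seconds since midnight

print(bin_start(ts, 300))       # 58500 -> 16:15:00
print(bin_start(ts, 300, 300))  # 58500 -> 16:15:00 (offset equals width, no visible shift)
print(bin_start(ts, 300, 120))  # 58620 -> 16:17:00 (boundaries move to :02, :07, :12, :17)
```

Under this model, an offset smaller than the bin width is needed to see a shifted boundary.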

rate

rate is a function that calculates the rate of change of a numeric field within a bin. It uses the syntax rate(field, period), where the second argument specifies the time unit (1s, 1m, 2m, etc.).

filter metric = 'cpu_usage'
| stats rate(value, 1s) as rate_1s, rate(value, 1m) as rate_1m, rate(value, 2m) as rate_2m by bin(5m)

| bin(5m) | rate_1s | rate_1m | rate_2m |
| --- | --- | --- | --- |
| 2026-06-09 18:35:00.000 | 20 | 0.3333 | 0.1667 |
| 2026-06-09 18:30:00.000 | 20 | 0.3333 | 0.1667 |
| 2026-06-09 18:25:00.000 | 20 | 0.3333 | 0.1667 |

The test data is a time series whose value increases by 10 every 2 minutes. The results are in the ratio rate_1s : rate_1m : rate_2m = 60 : 1 : 0.5, which is consistent with rate returning the total change in the field within a bin divided by the number of seconds in the period. As a result, the shorter the period, the larger the value.

Note that specifying a numeric value (e.g., 60) as the second argument results in an error. It must be specified as a time unit such as 1m.
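The three columns can be reproduced with one division each. The Python sketch below assumes the total change within each 5-minute bin was 20, a figure back-derived from rate_1s = 20 and not read from the raw data. The rate helper is hypothetical and models only the relationship inferred from the results.

```python
def rate(total_change: float, period_seconds: int) -> float:
    """Inferred model: change within the bin divided by the period in seconds."""
    return total_change / period_seconds


change_in_bin = 20  # assumption: back-derived from rate_1s = 20

for label, seconds in (("1s", 1), ("1m", 60), ("2m", 120)):
    print(label, round(rate(change_in_bin, seconds), 4))
# 1s 20.0
# 1m 0.3333
# 2m 0.1667
```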

Parse Syntax (parse CSV, parse XML, parse multi, values, addtotals)

Log parsing capabilities have been significantly expanded. Three new parse modes (CSV, XML, and multi-match) have been added, along with values and addtotals.

parse CSV

The syntax parse @message CSV as alias1, alias2, ... splits CSV-formatted logs by column.

filter @logStream = 'test-csv-stream'
| parse @message CSV as ts, lvl, svc, val
| display ts, lvl, svc, val

| ts | lvl | svc | val |
| --- | --- | --- | --- |
| 2026-06-10T00:04:00Z | DEBUG | cache | 10 |
| 2026-06-10T00:03:00Z | INFO | search | 50 |
| 2026-06-10T00:02:00Z | WARN | payment | 175 |
| 2026-06-10T00:01:00Z | ERROR | auth-service | 250 |

Each comma-delimited value was stored in order into the aliases. There is no distinction between header rows and data rows; all rows are parsed.
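The positional mapping from columns to aliases can be illustrated with Python's csv module. This is only a sketch of the idea. The raw log line is reconstructed from the result row above and is therefore an assumption, and parse_csv_line is a hypothetical helper.

```python
import csv


def parse_csv_line(message: str, aliases: list[str]) -> dict[str, str]:
    """Assign each CSV column of one log line to the alias at the same position."""
    row = next(csv.reader([message]))
    return dict(zip(aliases, row))


# Reconstructed from the result table; the actual raw line was not shown.
message = "2026-06-10T00:01:00Z,ERROR,auth-service,250"
record = parse_csv_line(message, ["ts", "lvl", "svc", "val"])
print(record["svc"], record["val"])  # auth-service 250
```

As in the query result, a header row would be treated like any other row, because nothing in the line marks it as a header.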

parse XML

For XML-formatted logs, fields are extracted using XPath-style path expressions.

filter @message like /<event>/
| parse @message XML '/event/level' as xlevel
| parse @message XML '/event/service' as xsvc
| parse @message XML '/event/code' as xcode
| display xlevel, xsvc, xcode

| xlevel | xsvc | xcode |
| --- | --- | --- |
| WARN | payment | 429 |
| INFO | api-gw | 200 |
| ERROR | auth | 401 |

The syntax is parse @message XML '/element/path' as alias. Within the scope of this verification, values could be retrieved using simple element paths like /event/level. Retrieving the entire document object with parse @message XML as doc or accessing it with dot notation resulted in errors. To extract multiple fields, use multiple parse statements chained with pipes as shown above.
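For comparison, the same per-path extraction can be sketched in Python with xml.etree.ElementTree. The sample message is a hypothetical reconstruction from the first result row, and extract is a hypothetical helper. ElementTree resolves paths relative to the root element, so the sketch strips the leading /event segment first.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract(message: str, path: str) -> Optional[str]:
    """Resolve an absolute path such as '/event/level' against one XML message."""
    root = ET.fromstring(message)
    parts = path.strip("/").split("/")
    if parts[0] != root.tag:
        return None  # the path does not start at this document's root element
    if len(parts) == 1:
        return root.text
    return root.findtext("/".join(parts[1:]))


# Hypothetical message reconstructed from the WARN / payment / 429 row
sample = "<event><level>WARN</level><service>payment</service><code>429</code></event>"
print(extract(sample, "/event/level"), extract(sample, "/event/code"))  # WARN 429
```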

parse multi

parse multi expands every regular-expression match within a single log line into its own record. It is powerful for parsing key=value formatted logs.

filter @message like /^level=/
| parse @message /(?<kname>\w+)=(?<kval>\S+)/ multi
| stats count(*) by kname

| kname | count(*) |
| --- | --- |
| level | 3 |
| service | 3 |
| latency | 3 |
| request_id | 3 |

The source data has 3 lines, each containing 4 key=value pairs. Adding multi expanded them into 3 lines × 4 pairs = 12 records, which made aggregation by key name possible.

Without multi, only the first match per line is extracted.

parse @message /(?<kname>\w+)=(?<kval>\S+)/
| display kname, kval

| kname | kval |
| --- | --- |
| level | WARN |
| level | INFO |
| level | ERROR |

With the regex-based parse multi tested this time, extraction worked as expected using named capture groups (?<name>...). On the other hand, the as alias multi syntax described in the documentation resulted in a syntax error in the author's environment.
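The difference between first-match and all-match extraction is the same as the difference between search and finditer in Python's re module. The sketch below uses a hypothetical log line with the four keys seen in the results. Note that Python writes named groups as (?P<name>...), whereas the query above uses (?<name>...).

```python
import re

PAIR = re.compile(r"(?P<kname>\w+)=(?P<kval>\S+)")

# Hypothetical line containing the four keys seen in the results
line = "level=WARN service=payment latency=890.12 request_id=req-001"

# Without multi: only the first match in the line
first = PAIR.search(line)
print(first["kname"], first["kval"])  # level WARN

# With multi: every match becomes its own record
all_pairs = [(m["kname"], m["kval"]) for m in PAIR.finditer(line)]
print(len(all_pairs))  # 4
```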

values

values is an aggregation function that returns the distinct values in each group, combined into a single field.

filter ispresent(service) and ispresent(level)
| stats values(service) as services by level

| level | services |
| --- | --- |
| INFO | test-convert, api-gateway, geo-service, notification, search-service |
| WARN | payment-service, queue-worker |
| ERROR | auth-service, cdn-edge |
| DEBUG | data-pipeline |

In the API response obtained in this verification, the result was a comma-delimited string. It is useful for listing the unique values within a group.

addtotals

addtotals is a command that adds a column with the sum of numeric fields for each row. Note that not only the displayed columns but also all numeric fields present in the query are included in the sum, so the Total value may not match the simple sum of displayed columns.

filter ispresent(response_time)
| fields toNumber(response_time) as rt, toNumber(response_time) * 2 as rt2
| addtotals
| limit 5

| rt | rt2 | Total |
| --- | --- | --- |
| 2045 | 4090 | 8180 |
| 156 | 312 | 624 |
| 8901 | 17802 | 35604 |
| 67 | 134 | 268 |
| 234 | 468 | 936 |

By default, a row total is added under the column name Total. The column name can be customized with addtotals fieldname=RowSum.

Specifying col=true should also add a column total row, but the column total row was not included in the get-query-results API response. It may only be displayed in the console UI.
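The Total values in the table are larger than rt + rt2 (for the first row, 2045 + 4090 = 6135, not 8180). They are consistent with the original response_time field, which has the same value as rt, also being added. The Python sketch below checks that reading against all five rows. It is an interpretation of the results, not a documented rule, and explains_total is a hypothetical helper.

```python
# (rt, rt2, Total) rows copied from the result table above
ROWS = [
    (2045, 4090, 8180),
    (156, 312, 624),
    (8901, 17802, 35604),
    (67, 134, 268),
    (234, 468, 936),
]


def explains_total(rt: int, rt2: int, total: int) -> bool:
    """Check Total == rt + rt2 + response_time, where response_time equals rt."""
    response_time = rt  # rt is toNumber(response_time), so the values match
    return rt + rt2 + response_time == total


print(all(explains_total(*row) for row in ROWS))  # True
```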

Other (limit any, stats command up to 10 stages)

limit any

limit any returns an arbitrary N records with no guaranteed order. The regular limit returns the top N records in the order of the preceding sort (or, when no sort is specified, in what appears to be descending timestamp order), whereas limit any may return results faster when ordering does not matter.

fields service, level, ip | limit any 2

| service | level | ip |
| --- | --- | --- |
| search-service | INFO | 10.255.255.1 |
| cdn-edge | ERROR | 52.194.x.x |

This is useful when you want to quickly obtain samples from log groups with large amounts of logs.

stats command up to 10 stages

The range in which stats can be chained with pipes has been expanded, allowing up to 10 stages in the Standard log class.

filter ispresent(service)
| stats count(*) as cnt by service, level
| stats sum(cnt) as level_total by level
| stats max(level_total) as max_level_total, min(level_total) as min_level_total

| max_level_total | min_level_total |
| --- | --- |
| 5 | 1 |

Three stages of stats are connected with pipes. The first stage counts by service × level, the second stage sums by level, and the third stage calculates the maximum and minimum.

According to the documentation, up to 10 stages can be used in the Standard log class, while up to 2 stages can be used in the Infrequent Access log class. Subsequent stats stages can only reference fields defined in the previous stage, and sort and limit must be placed after the last stats.
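The three-stage query can be read as three successive aggregations, each consuming only the output of the previous one. The Python sketch below mirrors that flow. The records are hypothetical: they use the service and level names from the values example, with one record per pair, which is an assumption chosen so that the output matches the 5 and 1 shown above.

```python
from collections import Counter

# Hypothetical (service, level) records, one per pair
records = [
    ("api-gateway", "INFO"), ("geo-service", "INFO"), ("notification", "INFO"),
    ("search-service", "INFO"), ("test-convert", "INFO"),
    ("payment-service", "WARN"), ("queue-worker", "WARN"),
    ("auth-service", "ERROR"), ("cdn-edge", "ERROR"),
    ("data-pipeline", "DEBUG"),
]

# Stage 1: stats count(*) as cnt by service, level
cnt = Counter(records)

# Stage 2: stats sum(cnt) as level_total by level (sees only stage 1 output)
level_total = Counter()
for (_service, level), n in cnt.items():
    level_total[level] += n

# Stage 3: stats max(level_total), min(level_total) (sees only stage 2 output)
max_level_total = max(level_total.values())
min_level_total = min(level_total.values())
print(max_level_total, min_level_total)  # 5 1
```

After stage 2 the service field no longer exists, which matches the rule that a later stats stage can only reference fields defined in the previous stage.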

As a practical example, here is a pattern for understanding trends from aggregated message character counts by time period.

fields strlen(@message) as msg_len
| stats sum(msg_len) as total_chars by bin(5m)
| stats max(total_chars) as peak, min(total_chars) as lowest, avg(total_chars) as average

| peak | lowest | average |
| --- | --- | --- |
| 1586 | 431 | 935.5 |

This calculates the total message character count per 5 minutes and then retrieves the peak, minimum, and average from those results. If logs are primarily ASCII, this can also be used to get a rough sense of message size trends. Patterns like "aggregating aggregated results" can now be completed in a single query.

Summary

With 13 additions in May and 23 in June, the CloudWatch Logs Insights query language has expanded significantly in a short period of time.

The features added this time included IP classification, CSV/XML parsing, histogram, and multi-stage stats. It has become easier to perform aggregation, classification, and trend analysis within queries, not just log searching. Within the scope of the verification, it appears there will be more situations where analysis that was previously post-processed in Athena or external tools can be completed entirely within Logs Insights.

On the other hand, at the time of writing, case-insensitive search using the third argument of strcontains did not work as expected in the author's environment.

CloudWatch Logs Insights gives the impression of evolving from "a tool for searching and reviewing logs" into a query environment that is more useful for analytical purposes as well. With these additions, there seem to be even more situations where it can be used for daily investigations and ad hoc analysis.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_QuerySyntax.html
