I verified 23 new commands and functions added to CloudWatch Logs Insights (June 2026 edition)
Introduction
On June 8, 2026, 23 new commands and functions were added to Amazon CloudWatch Logs Insights.
The previous article covered the 13 additions announced on May 21. Less than a month later, another substantial batch has been added.
| Announcement Date | Number Added | Main Categories |
|---|---|---|
| 5/21 | 13 | String operations, encode/decode, logfmt parsing, coordinate calculation |
| 6/8 | 23 | Hash, IP classification, type conversion, time series analysis, CSV/XML parsing, command extensions |
Comparing the trends of features added in May and June reveals the direction of the query language.
| | May (13) | June (23) |
|---|---|---|
| Character | Log viewing and formatting tool | Log analysis platform |
| Main work | "Read and search" | "Aggregate and determine" |
| What it replaced | Manual inspection, manual decoding | Post-processing in Athena/pandas/Splunk |
| Typical use cases | Initial investigation during incidents | Security audits, traffic analysis, SLO aggregation |
While May's additions were focused on "making logs easier to read," June's additions can be described as "getting answers from logs."
The list of new features added this time is as follows.
| Category | Items |
|---|---|
| Hash functions | md5, sha256 |
| String functions | strcontains (case-insensitive support), split |
| Conditional logic | if |
| Conversion functions | toNumber, toInt, toLong, toDouble |
| IP functions | ipv4ToNumber, isPrivateIP, isPublicIP, isReservedIP |
| Analytics functions | rate, count_over_time, sum_over_time, offset, histogram |
| Parse functions | parse CSV, parse XML, parse multi, values, addtotals |
| Other | limit any N, stats command up to 10 stages |
Note: The official announcement states "23 new query commands and functions." The table above lists items such as parse CSV, parse XML, and parse multi individually, so its item count differs from 23. The official announcement counts syntax extensions together, and this article uses the official count of 23 as-is.
This article will run these in practice and confirm the operation results.
Verification Environment
- Region: us-east-1
- Log group: /test/insights-new-2026-06
- Test data: JSON format, CSV format, XML format, time series data. Fictional user IDs, service names, and IP addresses are used (IPs consist of RFC 5737 TEST-NET, RFC 1918 private ranges, and well-known addresses such as 8.8.8.8)
Hash Functions (md5, sha256)
md5 and sha256 are functions that generate hash values from field values. They can be used when you want to display hashed versions of user IDs and similar values in logs.
fields user_id, md5(user_id) as md5_hash, sha256(user_id) as sha256_hash
| limit 3
| user_id | md5_hash | sha256_hash |
|---|---|---|
| usr_010 | 2fb6c8adce410db19ed04a7157b1ebd0 | 34f0365ae65242b4664ad6a1e4fe941c77caf56d7bd5aca88f1e9c6927012207 |
| usr_009 | 3b24508aecac732b2dbf6d4e4bf9c4c2 | 8c231f143bc453047665350be67bc92029fe34375ddd45bee71164fcb278315c |
| usr_008 | 136ea0f2a0158fcbbd873dead2d60963 | a8d79cc679fa29d08b483641f5d4b01faea2e81aa44edc7684fc2b936f32437c |
md5 returned a 128-bit (32 hexadecimal characters) string, and sha256 returned a 256-bit (64 hexadecimal characters) string.
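The digest lengths can be cross-checked outside Logs Insights. The following is a minimal Python sketch, assuming the functions hash the UTF-8 bytes of the field value. The article does not confirm the encoding, so the digests are not guaranteed to match the table above; only the lengths are checked here. The helper names `md5_hex` and `sha256_hex` are illustrative.

```python
import hashlib


def md5_hex(value: str) -> str:
    # Assumption: the field value is hashed as UTF-8 bytes.
    return hashlib.md5(value.encode("utf-8")).hexdigest()


def sha256_hex(value: str) -> str:
    # Same encoding assumption as md5_hex.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


user_id = "usr_010"
# A hex digit carries 4 bits, so 128 bits -> 32 chars and 256 bits -> 64 chars.
print(len(md5_hex(user_id)), len(sha256_hex(user_id)))
```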
String Functions (strcontains, split)
split
split splits a string into an array using a specified delimiter.
fields tags, split(tags, ',') as tag_array
| limit 3
| tags | tag_array |
|---|---|
| prod,warning,us-east-1 | ["prod","warning","us-east-1"] |
| prod,normal,ap-northeast-1 | ["prod","normal","ap-northeast-1"] |
| prod,critical,ap-northeast-1 | ["prod","critical","ap-northeast-1"] |
The comma-delimited tag string was expanded into array format (["prod","warning","us-east-1"]).
strcontains
strcontains determines whether a string contains a specific substring. According to the documentation, specifying true as the third argument enables case-insensitive search.
fields service,
strcontains(service, 'auth') as has_auth_lower,
strcontains(service, 'AUTH') as has_AUTH_upper,
strcontains(service, 'AUTH', true) as has_AUTH_ci
| filter ispresent(service)
| sort service
| limit 5
| service | has_auth_lower | has_AUTH_upper | has_AUTH_ci |
|---|---|---|---|
| api-gateway | 0 | 0 | 0 |
| auth-service | 1 | 0 | 0 |
| cdn-edge | 0 | 0 | 0 |
| data-pipeline | 0 | 0 | 0 |
| geo-service | 0 | 0 | 0 |
The basic operation (first and second arguments only) is working correctly. strcontains(service, 'auth') returns 1 for auth-service.
However, in the author's environment, the effect of the third argument true (case-insensitive mode) could not be confirmed. strcontains(service, 'AUTH', true) returned 0 for auth-service, appearing to still behave in a case-sensitive manner. The argument itself was accepted without a syntax error, but specifying it did not change the result.
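For comparison, the behavior the documentation describes can be modeled locally. The sketch below is a hypothetical Python reference model, not the Logs Insights implementation: it returns 1 or 0 like the query results above and shows what the third argument would be expected to return for auth-service.

```python
def strcontains(text: str, needle: str, ignore_case: bool = False) -> int:
    """Hypothetical model: 1 if needle occurs in text, otherwise 0."""
    if ignore_case:
        # casefold() is Python's aggressive lowercase for caseless matching.
        text, needle = text.casefold(), needle.casefold()
    return 1 if needle in text else 0


service = "auth-service"
print(strcontains(service, "auth"))        # case-sensitive, lowercase needle
print(strcontains(service, "AUTH"))        # case-sensitive, uppercase needle
print(strcontains(service, "AUTH", True))  # documented case-insensitive mode
```

Under this model the third call returns 1, whereas the verification above observed 0.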
Conditional Logic (if)
if is a function that returns a value based on a condition. It uses the syntax if(condition, value_if_true, value_if_false) and can be used like a ternary operator.
fields service, response_time,
if(toNumber(response_time) > 1000, 'slow', 'fast') as speed
| limit 5
| service | response_time | speed |
|---|---|---|
| queue-worker | 2045 | slow |
| search-service | 156 | fast |
| auth-service | 8901 | slow |
| notification | 67 | fast |
| geo-service | 312 | fast |
Requests with a response time exceeding 1000ms were successfully classified as slow.
Compared to the case function added in the previous (May) update, if is suited for a single condition with two choices, while case is suited for multiple branches.
Conversion Functions (toNumber, toInt, toLong, toDouble)
Four functions for converting string fields to numeric types have been added. Let's verify the differences between each type conversion.
Converting Integer Values
fields response_time,
toInt(response_time) as rt_int,
toLong(response_time) as rt_long,
toDouble(response_time) as rt_double,
toNumber(response_time) as rt_number
| filter ispresent(response_time)
| limit 5
| response_time | rt_int | rt_long | rt_double | rt_number |
|---|---|---|---|---|
| 234 | 234 | 234 | 234 | 234 |
| 89 | 89 | 89 | 89 | 89 |
| 1523 | 1523 | 1523 | 1523 | 1523 |
| 5002 | 5002 | 5002 | 5002 | 5002 |
| not_a_number | (null) | (null) | (null) | (null) |
For integer values, all four functions returned the same result. When conversion fails (e.g., not_a_number), null is returned without an error.
Converting Decimal Values (where type differences are prominent)
parse @message /latency=(?<lat_val>[\d.]+)/
| display lat_val,
toInt(lat_val) as lat_int,
toLong(lat_val) as lat_long,
toDouble(lat_val) as lat_double,
toNumber(lat_val) as lat_number
| filter ispresent(lat_val)
| lat_val | lat_int | lat_long | lat_double | lat_number |
|---|---|---|---|---|
| 890.12 | 890 | 890 | 890.12 | 890.12 |
| 45.7 | 45 | 45 | 45.7 | 45.7 |
| 123.456 | 123 | 123 | 123.456 | 123.456 |
Differences appeared with values containing decimals.
- toInt / toLong: Dropped the decimal portion (verified for positive values only; whether negative values are truncated toward zero or floored was not confirmed)
- toDouble / toNumber: Retained the decimal portion, and yielded the same results within the scope of this verification
- Based on the type names, the difference between toInt and toLong may appear for values exceeding the 32-bit integer range (approximately 2.1 billion), but no difference was confirmed with the test data in this verification
For practical use, toInt/toLong can be used for aggregations where decimal precision is not needed, while toDouble/toNumber can be used for calculations where decimals need to be retained.
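The two open points above (truncation versus floor, and the 32-bit boundary) are easy to see in a local model. This Python sketch is only an illustration of the two candidate behaviors. It does not claim which one Logs Insights uses, and the helper names are hypothetical.

```python
import math
from typing import Optional


def to_int_truncate(text: str) -> Optional[int]:
    """Candidate behavior A: drop the fraction, rounding toward zero."""
    try:
        return math.trunc(float(text))
    except ValueError:
        return None  # mirrors the null seen for not_a_number


def to_int_floor(text: str) -> Optional[int]:
    """Candidate behavior B: round toward negative infinity."""
    try:
        return math.floor(float(text))
    except ValueError:
        return None


# Largest signed 32-bit integer, the "approximately 2.1 billion" boundary.
INT32_MAX = 2**31 - 1

for sample in ["890.12", "-890.12", "not_a_number"]:
    print(sample, to_int_truncate(sample), to_int_floor(sample))
```

The two candidates agree for positive inputs and differ only for negative ones, which is why a test with negative values would be needed to tell them apart.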
IP Functions (ipv4ToNumber, isPrivateIP, isPublicIP, isReservedIP)
Four functions for numeric conversion and classification of IP addresses have been added. They are useful for analyzing VPC flow logs and ALB access logs.
fields ip,
ipv4ToNumber(ip) as ip_num,
isPrivateIP(ip) as is_private,
isPublicIP(ip) as is_public,
isReservedIP(ip) as is_reserved
| limit 10
| ip | ip_num | is_private | is_public | is_reserved |
|---|---|---|---|---|
| 10.255.255.1 | 184549121 | 1 | 0 | 0 |
| 52.194.x.x | 885131796 | 0 | 1 | 0 |
| 192.0.2.1 | 3221225985 | 0 | 0 | 1 |
| 169.254.169.254 | 2852039166 | 0 | 0 | 1 |
| 100.64.0.1 | 1681915905 | 0 | 0 | 1 |
| 8.8.8.8 | 134744072 | 0 | 1 | 0 |
| 172.16.0.10 | 2886729738 | 1 | 0 | 0 |
| 10.0.0.55 | 167772215 | 1 | 0 | 0 |
| 203.0.113.50 | 3405803826 | 0 | 0 | 1 |
| 192.168.1.100 | 3232235876 | 1 | 0 | 0 |
Note: The lower octets of some public IPs are masked (complete IP addresses were used in the actual verification).
For the IP addresses prepared this time, we confirmed that the expected classification results were obtained. The classification criteria that can be read from the results are as follows.
- Private (isPrivateIP = 1): Private address ranges defined in RFC 1918
  - 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
- Reserved (isReservedIP = 1): Address ranges reserved for special purposes
  - 169.254.0.0/16 (link-local), 100.64.0.0/10 (CGN / Shared Address)
  - 192.0.2.0/24 (TEST-NET-1), 203.0.113.0/24 (TEST-NET-3)
- Public (isPublicIP = 1): Within the range tested, general public IP addresses that do not fall under Private or Reserved returned 1
For every IPv4 address tested, exactly one of the three classification functions returned 1, so the classifications were mutually exclusive within this data set. ipv4ToNumber converts an IP address to a 32-bit integer, which makes it usable for filtering by IP range (ipv4ToNumber(ip) >= X and ipv4ToNumber(ip) <= Y).
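The integer conversion and the observed classification can be reproduced locally with Python's standard ipaddress module. This is a sketch under stated assumptions: the range lists contain only the ranges observed in this verification (the real functions may cover more), and the names `ipv4_to_number`, `classify`, and `in_range` are illustrative.

```python
import ipaddress

# Only the ranges observed in this verification; the real functions may cover more.
PRIVATE = [ipaddress.ip_network(n) for n in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
RESERVED = [ipaddress.ip_network(n) for n in
            ("169.254.0.0/16", "100.64.0.0/10", "192.0.2.0/24", "203.0.113.0/24")]


def ipv4_to_number(ip: str) -> int:
    # a.b.c.d -> a*2^24 + b*2^16 + c*2^8 + d
    return int(ipaddress.IPv4Address(ip))


def classify(ip: str) -> str:
    addr = ipaddress.IPv4Address(ip)
    if any(addr in net for net in PRIVATE):
        return "private"
    if any(addr in net for net in RESERVED):
        return "reserved"
    return "public"


def in_range(ip: str, low: str, high: str) -> bool:
    # Same idea as ipv4ToNumber(ip) >= X and ipv4ToNumber(ip) <= Y.
    return ipv4_to_number(low) <= ipv4_to_number(ip) <= ipv4_to_number(high)


print(ipv4_to_number("8.8.8.8"), classify("8.8.8.8"))
print(ipv4_to_number("100.64.0.1"), classify("100.64.0.1"))
```

The integers produced this way agree with the ip_num column for the unmasked addresses in the table.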
Analytics Functions (rate, count_over_time, sum_over_time, offset, histogram)
Five new features related to time series data analysis have been added.
count_over_time
count_over_time counts the number of records per time range.
filter metric = 'cpu_usage'
| stats count_over_time(*) as cot by bin(2m)
| bin(2m) | cot |
|---|---|
| 2026-06-09 16:20:00.000 | 3 |
| 2026-06-09 16:18:00.000 | 4 |
| 2026-06-09 16:16:00.000 | 4 |
| 2026-06-09 16:14:00.000 | 4 |
| 2026-06-09 16:12:00.000 | 4 |
| 2026-06-09 16:10:00.000 | 1 |
Under the bin aggregation conditions of this verification, the result was equivalent to count(*).
sum_over_time
sum_over_time calculates the total value per time range.
filter metric = 'cpu_usage'
| stats sum_over_time(value) as sot by bin(2m)
| bin(2m) | sot |
|---|---|
| 2026-06-09 16:20:00.000 | 277 |
| 2026-06-09 16:18:00.000 | 447 |
| 2026-06-09 16:16:00.000 | 383 |
| 2026-06-09 16:14:00.000 | 408 |
| 2026-06-09 16:12:00.000 | 384 |
| 2026-06-09 16:10:00.000 | 143 |
This also matched the result of sum(value). count_over_time and sum_over_time appear to be named for time series analysis. Under the bin aggregation conditions of this verification, no behavioral difference from count(*) and sum() was observed, but that alone does not establish that they behave identically under all conditions.
histogram
histogram is used as a grouping function in the by clause, and aggregates by dividing a numeric field into buckets.
filter metric = 'cpu_usage'
| stats count(*) as cnt by histogram(value, 50)
| histogram(value, 50) | cnt |
|---|---|
| 50 | 10 |
| 100 | 10 |
The second argument is the bucket width, and the lower bound of the bucket is displayed in the results. Changing the bucket width to 25 shows a more detailed distribution.
filter metric = 'cpu_usage'
| stats count(*) as cnt by histogram(value, 25)
| histogram(value, 25) | cnt |
|---|---|
| 50 | 2 |
| 75 | 8 |
| 100 | 6 |
| 125 | 4 |
Note that histogram is a grouping function for the by clause, not an aggregation function for stats. While bin is for bucketing on the time axis, histogram is for bucketing on the numeric axis.
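The lower-bound labeling can be checked against the two result tables. The Python sketch below assumes a bucket is labeled by floor(value / width) * width, which is consistent with the results but not stated in the documentation quoted here. It re-buckets the width-25 counts into width-50 buckets.

```python
import math
from collections import Counter


def bucket_lower_bound(value: float, width: float) -> float:
    # Assumed labeling rule: the lower edge of the bucket containing value.
    return math.floor(value / width) * width


# Counts from the histogram(value, 25) result table.
counts_25 = Counter({50: 2, 75: 8, 100: 6, 125: 4})

# Merge each width-25 bucket into the width-50 bucket that contains it.
counts_50 = Counter()
for lower, cnt in counts_25.items():
    counts_50[bucket_lower_bound(lower, 50)] += cnt

print(dict(counts_50))  # {50: 10, 100: 10}
```

The merged counts equal the histogram(value, 50) table, so the two results are mutually consistent under this rule.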
offset
offset is a modifier for bin() that shifts the alignment (starting position) of bin boundaries.
filter metric = 'cpu_usage'
| stats count(*) as cnt by bin(5m) offset 5m
| bin(5m) | cnt |
|---|---|
| 2026-06-09 16:20:00.000 | 3 |
| 2026-06-09 16:15:00.000 | 10 |
| 2026-06-09 16:10:00.000 | 7 |
The syntax is by bin(5m) offset 5m, placed after bin. It is a modifier, not a function. Using offset allows you to shift the starting position of bin boundaries, enabling you to set aggregation intervals aligned with business hours.
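One plausible way to picture the shift is sketched below in Python. The alignment rule is an assumption, since the article does not describe it: bins are anchored at the Unix epoch, and the offset moves every boundary by the same amount. The function name `bin_start` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_start(ts: datetime, width: timedelta,
              offset: timedelta = timedelta(0)) -> datetime:
    # Assumed model: shift back by offset, floor to the bin width, shift forward.
    shifted = ts - EPOCH - offset
    return EPOCH + (shifted // width) * width + offset


ts = datetime(2026, 6, 9, 16, 17, 30, tzinfo=timezone.utc)
five_min = timedelta(minutes=5)

print(bin_start(ts, five_min))                        # no offset
print(bin_start(ts, five_min, timedelta(minutes=5)))  # offset equal to the width
print(bin_start(ts, five_min, timedelta(minutes=2)))  # offset smaller than the width
```

Under this model an offset equal to the bin width leaves the boundaries where they were, which would be consistent with the 16:10, 16:15, and 16:20 bins in the result above. An offset smaller than the width is what would visibly move them.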
rate
rate is a function that calculates the rate of change of a numeric field within a bin. It uses the syntax rate(field, period), where the second argument specifies the time unit (1s, 1m, 2m, etc.).
filter metric = 'cpu_usage'
| stats rate(value, 1s) as rate_1s, rate(value, 1m) as rate_1m, rate(value, 2m) as rate_2m by bin(5m)
| bin(5m) | rate_1s | rate_1m | rate_2m |
|---|---|---|---|
| 2026-06-09 18:35:00.000 | 20 | 0.3333 | 0.1667 |
| 2026-06-09 18:30:00.000 | 20 | 0.3333 | 0.1667 |
| 2026-06-09 18:25:00.000 | 20 | 0.3333 | 0.1667 |
The test data is a time series in which the value increases by 10 every 2 minutes. In the results, rate_1s : rate_1m : rate_2m = 60 : 1 : 0.5. This is consistent with rate returning the total change in the field value within a bin divided by the number of seconds in the period, so the shorter the period, the larger the value.
Note that specifying a numeric value (e.g., 60) as the second argument results in an error. It must be specified as a time unit such as 1m.
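The arithmetic behind the three columns can be reproduced in a few lines. This Python sketch assumes the formula inferred above (change within the bin divided by the period in seconds) and a per-bin change of 20, a value read off the rate_1s column rather than taken from the documentation.

```python
def rate(total_change: float, period_seconds: float) -> float:
    # Inferred formula: change within the bin / period length in seconds.
    return total_change / period_seconds


change_per_bin = 20  # inferred from rate_1s = 20 in the result table

for label, seconds in [("1s", 1), ("1m", 60), ("2m", 120)]:
    print(label, round(rate(change_per_bin, seconds), 4))
```

The rounded values are 20.0, 0.3333, and 0.1667, matching the rate_1s, rate_1m, and rate_2m columns.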
Parse Syntax (parse CSV, parse XML, parse multi, values, addtotals)
Log parsing capabilities have been significantly expanded. Three new parse modes (CSV, XML, and multi-match) have been added, along with the aggregation helpers values and addtotals.
parse CSV
The syntax parse @message CSV as alias1, alias2, ... splits CSV-formatted logs by column.
filter @logStream = 'test-csv-stream'
| parse @message CSV as ts, lvl, svc, val
| display ts, lvl, svc, val
| ts | lvl | svc | val |
|---|---|---|---|
| 2026-06-10T00:04:00Z | DEBUG | cache | 10 |
| 2026-06-10T00:03:00Z | INFO | search | 50 |
| 2026-06-10T00:02:00Z | WARN | payment | 175 |
| 2026-06-10T00:01:00Z | ERROR | auth-service | 250 |
Each comma-delimited value was stored in order into the aliases. There is no distinction between header rows and data rows; all rows are parsed.
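The positional mapping from columns to aliases can be illustrated with Python's csv module. This is a sketch of the idea, not of the Logs Insights parser: the sample line is reconstructed from the result table, and `parse_csv` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, List


def parse_csv(message: str, aliases: List[str]) -> Dict[str, str]:
    # Read one CSV record and assign values to aliases by position.
    row = next(csv.reader(io.StringIO(message)))
    return dict(zip(aliases, row))


# Sample line reconstructed from the result table (not the original log).
message = "2026-06-10T00:01:00Z,ERROR,auth-service,250"
print(parse_csv(message, ["ts", "lvl", "svc", "val"]))
```

As in the query, a header row would be mapped the same way as a data row, so headers need to be filtered out separately if they exist.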
parse XML
For XML-formatted logs, fields are extracted using XPath-style path expressions.
filter @message like /<event>/
| parse @message XML '/event/level' as xlevel
| parse @message XML '/event/service' as xsvc
| parse @message XML '/event/code' as xcode
| display xlevel, xsvc, xcode
| xlevel | xsvc | xcode |
|---|---|---|
| WARN | payment | 429 |
| INFO | api-gw | 200 |
| ERROR | auth | 401 |
The syntax is parse @message XML '/element/path' as alias. Within the scope of this verification, values could be retrieved using simple element paths like /event/level. Retrieving the entire document object with parse @message XML as doc or accessing it with dot notation resulted in errors. To extract multiple fields, use multiple parse statements chained with pipes as shown above.
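The simple element paths used above behave like the following local extraction. This Python sketch uses the standard xml.etree.ElementTree module. The sample document is hypothetical (reconstructed from the result table), and `extract` only handles plain paths such as /event/level.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def extract(message: str, path: str) -> Optional[str]:
    # Resolve an absolute path like /event/level against the root element.
    root = ET.fromstring(message)
    parts = path.strip("/").split("/")
    if not parts or parts[0] != root.tag:
        return None
    if len(parts) == 1:
        return root.text
    return root.findtext("/".join(parts[1:]))


# Hypothetical sample shaped like the logs in this verification.
message = "<event><level>WARN</level><service>payment</service><code>429</code></event>"
for path in ("/event/level", "/event/service", "/event/code"):
    print(path, extract(message, path))
```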
parse multi
parse multi expands every match of a regular expression within a single line into its own record. It is well suited to parsing key=value formatted logs.
filter @message like /^level=/
| parse @message /(?<kname>\w+)=(?<kval>\S+)/ multi
| stats count(*) by kname
| kname | count(*) |
|---|---|
| level | 3 |
| service | 3 |
| latency | 3 |
| request_id | 3 |
The source data has 3 lines, each containing 4 key=value pairs. With multi, these were expanded into 3 lines × 4 pairs = 12 records, enabling aggregation by key name.
Without multi, only the first match per line is extracted.
parse @message /(?<kname>\w+)=(?<kval>\S+)/
| display kname, kval
| kname | kval |
|---|---|
| level | WARN |
| level | INFO |
| level | ERROR |
With the regex-based parse multi tested this time, extraction worked as expected using named capture groups (?<name>...). On the other hand, the as alias multi syntax described in the documentation resulted in a syntax error in the author's environment.
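The difference between the two modes corresponds to "first match" versus "all matches" in a regex library. A minimal Python sketch follows. The log line is hypothetical, and Python spells named groups as (?P<name>...) rather than (?<name>...).

```python
import re

# Same pattern as the query, in Python's named-group syntax.
PATTERN = re.compile(r"(?P<kname>\w+)=(?P<kval>\S+)")

# Hypothetical line with 4 key=value pairs, like the test data.
line = "level=WARN service=payment latency=890.12 request_id=req-001"

# Without multi: only the first match on the line.
first = PATTERN.search(line)
first_only = (first.group("kname"), first.group("kval"))

# With multi: every match becomes its own record.
all_matches = [(m.group("kname"), m.group("kval")) for m in PATTERN.finditer(line)]

print(first_only)
print(all_matches)
```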
values
values is an aggregation function that returns distinct values per group combined together.
filter ispresent(service) and ispresent(level)
| stats values(service) as services by level
| level | services |
|---|---|
| INFO | test-convert, api-gateway, geo-service, notification, search-service |
| WARN | payment-service, queue-worker |
| ERROR | auth-service, cdn-edge |
| DEBUG | data-pipeline |
In the API results of this verification, it was confirmed as a comma-delimited string representation. It is useful for listing unique values within a group.
addtotals
addtotals is a command that adds a column with the sum of numeric fields for each row. Note that not only the displayed columns but also all numeric fields present in the query are included in the sum, so the Total value may not match the simple sum of displayed columns.
filter ispresent(response_time)
| fields toNumber(response_time) as rt, toNumber(response_time) * 2 as rt2
| addtotals
| limit 5
| rt | rt2 | Total |
|---|---|---|
| 2045 | 4090 | 8180 |
| 156 | 312 | 624 |
| 8901 | 17802 | 35604 |
| 67 | 134 | 268 |
| 234 | 468 | 936 |
By default, a row total is added under the column name Total. The column name can be customized with addtotals fieldname=RowSum.
Specifying col=true should also add a column total row, but the column total row was not included in the get-query-results API response. It may only be displayed in the console UI.
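The Total values in the table above are not simply rt + rt2. They are consistent with the original response_time field being summed as well, which fits the note that every numeric field in the query is included. This is an inference from the numbers, not documented behavior. The Python check below uses the five rows from the table.

```python
def row_total(response_time: int) -> int:
    rt = response_time       # toNumber(response_time) as rt
    rt2 = response_time * 2  # toNumber(response_time) * 2 as rt2
    # Assumed: the source field response_time is also counted.
    return response_time + rt + rt2


# (response_time, Total) pairs taken from the result table.
rows = [(2045, 8180), (156, 624), (8901, 35604), (67, 268), (234, 936)]

for response_time, total in rows:
    print(response_time, row_total(response_time), total)
```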
Other (limit any, stats command up to 10 stages)
limit any
limit any is a syntax that returns any N records without guaranteed ordering. While the regular limit returns the top N records in the order of the preceding sort (or what appears to be descending timestamp order when unspecified), limit any may return results faster when ordering is not needed.
fields service, level, ip | limit any 2
| service | level | ip |
|---|---|---|
| search-service | INFO | 10.255.255.1 |
| cdn-edge | ERROR | 52.194.x.x |
This is useful when you want to quickly obtain samples from log groups with large amounts of logs.
stats command up to 10 stages
The range in which stats can be chained with pipes has been expanded, allowing up to 10 stages in the Standard log class.
filter ispresent(service)
| stats count(*) as cnt by service, level
| stats sum(cnt) as level_total by level
| stats max(level_total) as max_level_total, min(level_total) as min_level_total
| max_level_total | min_level_total |
|---|---|
| 5 | 1 |
Three stages of stats are connected with pipes. The first stage counts by service × level, the second stage sums by level, and the third stage calculates the maximum and minimum.
According to the documentation, up to 10 stages can be used in the Standard log class, while up to 2 stages can be used in the Infrequent Access log class. Subsequent stats stages can only reference fields defined in the previous stage, and sort and limit must be placed after the last stats.
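The three stages can be traced with plain Python collections. The records below are hypothetical: one record per service, with levels taken from the values(service) result earlier in this article. The actual record counts in the test data are not shown, so this is only an illustration of how each stage consumes the previous one.

```python
from collections import Counter

# Hypothetical data: one record per (service, level) pair.
records = [
    ("test-convert", "INFO"), ("api-gateway", "INFO"), ("geo-service", "INFO"),
    ("notification", "INFO"), ("search-service", "INFO"),
    ("payment-service", "WARN"), ("queue-worker", "WARN"),
    ("auth-service", "ERROR"), ("cdn-edge", "ERROR"),
    ("data-pipeline", "DEBUG"),
]

# Stage 1: stats count(*) as cnt by service, level
stage1 = Counter(records)

# Stage 2: stats sum(cnt) as level_total by level
stage2 = Counter()
for (_service, level), cnt in stage1.items():
    stage2[level] += cnt

# Stage 3: stats max(level_total), min(level_total)
stage3 = {"max_level_total": max(stage2.values()),
          "min_level_total": min(stage2.values())}
print(stage3)
```

With this assumed data the final stage yields a maximum of 5 and a minimum of 1, the same shape as the result above. Each stage only sees the fields produced by the one before it.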
As a practical example, the following pattern totals message character counts per time period and then summarizes the trend across periods.
fields strlen(@message) as msg_len
| stats sum(msg_len) as total_chars by bin(5m)
| stats max(total_chars) as peak, min(total_chars) as lowest, avg(total_chars) as average
| peak | lowest | average |
|---|---|---|
| 1586 | 431 | 935.5 |
This calculates the total message character count per 5 minutes and then retrieves the peak, minimum, and average from those results. If logs are primarily ASCII, this can also be used to get a rough sense of message size trends. Patterns like "aggregating aggregated results" can now be completed in a single query.
Summary
With 13 additions in May and 23 in June, the CloudWatch Logs Insights query language has expanded significantly in a short period of time.
The features added this time included IP classification, CSV/XML parsing, histogram, and multi-stage stats. It has become easier to perform aggregation, classification, and trend analysis within queries, not just log searching. Within the scope of the verification, it appears there will be more situations where analysis that was previously post-processed in Athena or external tools can be completed entirely within Logs Insights.
On the other hand, at the time of writing, case-insensitive search using the third argument of strcontains did not work as expected in the author's environment.
CloudWatch Logs Insights gives the impression of evolving from "a tool for searching and reviewing logs" into a query environment that is more useful for analytical purposes as well. With these additions, there seem to be even more situations where it can be used for daily investigations and ad hoc analysis.