Kiro CLI now supports Claude Opus 4.8. I compared the Opus models.

Kiro CLI now supports Anthropic's latest model, Claude Opus 4.8. Using Kiro CLI in headless mode, we had Opus 4.8, 4.7, and 4.6 each analyze a CloudFormation template with embedded security risks five times, then compared detection results, speed, and credit consumption.

suzuki.ryo

2026.05.30

This page has been machine-translated.

Introduction

On May 29, 2026, Claude Opus 4.8, Anthropic's latest model, became selectable in Kiro CLI (version 2.5.0 or later, as an experimental preview).
https://kiro.dev/blog/opus-4-8/
Opus 4.8 became available on Amazon Bedrock and Claude Platform on AWS on May 28, 2026, and support was announced for Kiro (IDE / CLI / Web) the following day, May 29.
https://dev.classmethod.jp/articles/20260529-amazon-bedrock-claude-opus-4-8/
This article confirms that Opus 4.8 can actually be selected and run in Kiro CLI 2.5.0, and measures the differences when changing model generations (4.8 / 4.7 / 4.6) and effort (low to max) on the same CloudFormation template analysis task.
Model List and Credit Multipliers on Kiro

Here is an excerpt of the model list shown by the /model command. All Opus 4.x models share the same credit multiplier of 2.20x.

Model	Credit Multiplier	Notes
claude-opus-4.8	2.20x	Experimental preview / 1M context
claude-opus-4.7	2.20x	Experimental preview / 1M context
claude-opus-4.6	2.20x	Claude Opus 4.6
claude-sonnet-4.6	1.30x	Latest Sonnet / 1M context

According to the official Kiro blog, Opus 4.8 is available on the Kiro Pro, Pro+, and Power plans. It has a 1M-token context window and a maximum output of 128K tokens. It is served from the AWS US East (N. Virginia) and Europe (Frankfurt) regions, with cross-region inference support.
Verification Details

Task and Scoring Method

A CloudFormation template (eval_template.yaml) with 10 intentionally planted issues was passed to Kiro CLI. Kiro CLI was asked to give an architecture overview and to point out security and best-practice issues.
eval_template.yaml (planted CFn template)
AWSTemplateFormatVersion: '2010-09-09'
Description: Sample stack for analysis (intentionally contains ~10 planted issues)

Resources:
  DataBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-app-data-bucket
      AccessControl: PublicRead

  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: app sg
      VpcId: vpc-0abc1234
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 3306
          ToPort: 3306
          CidrIp: 0.0.0.0/0

  AppRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: { Service: ec2.amazonaws.com }
            Action: sts:AssumeRole
      Policies:
        - PolicyName: full-access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action: '*'
                Resource: '*'

  AppInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles: [ !Ref AppRole ]

  AppServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0123456789abcdef0
      InstanceType: t3.large
      IamInstanceProfile: !Ref AppInstanceProfile
      SecurityGroupIds: [ !Ref AppSecurityGroup ]

  AppDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.medium
      AllocatedStorage: '20'
      MasterUsername: admin
      MasterUserPassword: Password123!
      PubliclyAccessible: true
      StorageEncrypted: false
      VPCSecurityGroups: [ !Ref AppSecurityGroup ]

Outputs:
  DBEndpoint:
    Value: !GetAtt AppDatabase.Endpoint.Address
The 10 scored items are as follows.

1. S3 public (PublicRead)
2. S3 no encryption
3. S3 PublicAccessBlock not configured
4. SSH (port 22) open to all
5. DB port (3306) open to all
6. IAM full permissions (Action / Resource = *)
7. DB password hardcoded in plaintext
8. RDS PubliclyAccessible
9. RDS storage encryption disabled
10. Data retention (deletion protection / automatic backup / snapshots, etc.) not configured
Scoring used keyword matching. Misjudgments caused by differences in wording were corrected by reading the responses manually. Item 10 uses an OR condition: it counts as detected if the response mentions any of deletion protection, backups, or snapshots.
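The OR judgment can be sketched as follows. This is a minimal illustration, not the author's scoring code. The three items and their keyword lists are a hypothetical, trimmed excerpt, and the helper name `score` is an assumption. The sketch returns the names of missed items rather than just a count, which makes the manual re-check of the response text easier.

```python
def score(response: str, checks: dict[str, list[str]]) -> tuple[int, list[str]]:
    """Return (number detected, names of missed items).

    An item counts as detected if ANY of its keywords appears in the
    lower-cased response (OR judgment); keywords must be lower case.
    """
    text = response.lower()
    missed = [item for item, kws in checks.items() if not any(k in text for k in kws)]
    return len(checks) - len(missed), missed


# Hypothetical, trimmed keyword table for illustration only.
CHECKS = {
    "SSH (port 22) open to all": ["ssh", "port 22"],
    "RDS PubliclyAccessible": ["publiclyaccessible", "publicly accessible"],
    # Item 10: any one of these mentions counts as a detection.
    "Data retention not configured": ["deletion protection", "backup", "snapshot"],
}

sample = "SSH is open to 0.0.0.0/0. Automated backups are not configured."
print(score(sample, CHECKS))  # (2, ['RDS PubliclyAccessible'])
```

Plain substring matching like this is cheap but can over- or under-count. That is why the article pairs it with a manual read of each response.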
Execution Method

Each condition was run in headless mode.
kiro-cli chat --no-interactive --trust-tools=read --model <model> "<prompt>"
Effort was switched by changing the chat.modelDefaults setting:
# Explicit specification
kiro-cli settings chat.modelDefaults '{"claude-opus-4.8":{"output_config":{"effort":"high"}}}'

# Restore to default (delete setting)
kiro-cli settings --delete chat.modelDefaults
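The setting value is a JSON string nested inside shell quotes; bench.sh escapes every double quote by hand. Building the value programmatically avoids quoting mistakes. Below is a minimal Python sketch, assuming the chat.modelDefaults JSON shape from the command above. The helper name `model_defaults_cmd` is hypothetical, and the accepted effort values are simply the ones bench.sh iterates over.

```python
import json

# Effort values used in bench.sh; treating these as the only valid ones is an assumption.
EFFORTS = ("low", "medium", "high", "xhigh", "max")


def model_defaults_cmd(model: str, effort: str) -> list[str]:
    """Build argv for `kiro-cli settings chat.modelDefaults <json>` (hypothetical helper)."""
    if effort not in EFFORTS:
        raise ValueError(f"unknown effort: {effort!r}")
    # Compact separators reproduce the exact string used in the shell example.
    payload = json.dumps({model: {"output_config": {"effort": effort}}}, separators=(",", ":"))
    return ["kiro-cli", "settings", "chat.modelDefaults", payload]


print(model_defaults_cmd("claude-opus-4.8", "high")[-1])
# {"claude-opus-4.8":{"output_config":{"effort":"high"}}}
```

The returned list could be passed to `subprocess.run(cmd, check=True)`, which hands each element to kiro-cli as a separate argument with no shell quoting involved.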
In the results tables, Time is the value Kiro displays at the end of each run, and Credits is likewise Kiro's displayed value, used as-is. Each condition was run five times, and the tables show the averages.
bench.sh (iterative measurement script)
#!/usr/bin/env bash
# Runs the same prompt for each condition REPS times, preserving raw output and index.
# Usage: ./bench.sh [REPS] [MODE]
#   REPS : Number of repetitions per condition (default 5)
#   MODE : effort | models | both   (default effort)
set -uo pipefail

DIR="$(cd "$(dirname "$0")" && pwd)"
REPS="${1:-5}"
MODE="${2:-effort}"
MODEL="claude-opus-4.8"
EFFORTS=(low medium high xhigh max default)
MODELS=(claude-opus-4.8 claude-opus-4.7 claude-opus-4.6)

OUT="$DIR/bench/out"; ERR="$DIR/bench/err"; IDX="$DIR/bench/runs.tsv"
mkdir -p "$OUT" "$ERR"
[ -f "$IDX" ] || printf "cond\trun\twall_sec\toutfile\terrfile\n" > "$IDX"

TPL=$(cat "$DIR/eval_template.yaml")
PROMPT="Please reverse-analyze the following CloudFormation template and list (1) the architecture overview and (2) security/best practice issues exhaustively in bullet points.

\`\`\`yaml
$TPL
\`\`\`"

set_effort() {
  if [ "$1" = "default" ]; then kiro-cli settings --delete chat.modelDefaults >/dev/null 2>&1
  else kiro-cli settings chat.modelDefaults "{\"$MODEL\":{\"output_config\":{\"effort\":\"$1\"}}}" >/dev/null; fi
}

one_run() {
  local cond="$1" model="$2" r o e s en wall
  for r in $(seq 1 "$REPS"); do
    o="$OUT/${cond}_${r}.txt"; e="$ERR/${cond}_${r}.txt"
    s=$(date +%s.%N)
    timeout 600 kiro-cli chat --no-interactive --trust-tools=read --model "$model" "$PROMPT" > "$o" 2> "$e"
    en=$(date +%s.%N); wall=$(echo "$en - $s" | bc)
    printf "%s\t%s\t%s\t%s\t%s\n" "$cond" "$r" "$wall" "$o" "$e" >> "$IDX"
    echo "[$(date +%T)] $cond run $r/$REPS  wall=${wall}s"
  done
}

if [ "$MODE" = "effort" ] || [ "$MODE" = "both" ]; then
  for ef in "${EFFORTS[@]}"; do set_effort "$ef"; one_run "effort_${ef}" "$MODEL"; done
fi
if [ "$MODE" = "models" ] || [ "$MODE" = "both" ]; then
  kiro-cli settings --delete chat.modelDefaults >/dev/null 2>&1
  for m in "${MODELS[@]}"; do one_run "model_${m}" "$m"; done
fi

kiro-cli settings --delete chat.modelDefaults >/dev/null 2>&1
echo "[$(date +%T)] DONE. Analysis: python3 analyze.py"
analyze.py (statistics aggregation script)
#!/usr/bin/env python3
"""Analyzes bench.sh output. Outputs mean, median, and outliers per condition."""
import re, os, statistics as st

DIR = os.path.dirname(os.path.abspath(__file__))
IDX = os.path.join(DIR, "bench", "runs.tsv")
ANSI = re.compile(r'\x1b\[[0-9;?]*[a-zA-Z]')

# Keyword lists per scoring item; an item counts as detected if ANY keyword matches (OR).
CHECKS = {
    "S3 public":            ["publicread", "public", "exposed"],
    "S3 no encryption":     ["bucketencryption", "not encrypted", "no encryption",
                             "encryption not configured", "encryption not set",
                             "encryption disabled", "sse-", "enable encryption", "data stored in plaintext"],
    "S3 no block":          ["publicaccessblock", "public access block"],
    "port 22 open":         ["22", "ssh"],
    "port 3306 open":       ["3306"],
    "IAM full access":      ["action: '*'", "wildcard", "full access", "'*'", "administrator"],
    "DB password plaintext": ["plaintext", "hardcoded", "password123", "secrets manager", "inline"],
    "RDS public":           ["publiclyaccessible", "internet", "publicly exposed", "public ip"],
    "RDS no encryption":    ["storageencrypted", "at rest", "storage encryption", "data at rest not encrypted"],
    "no deletion/backup":   ["deletionpolicy", "deletion protection", "snapshot",
                             "backup", "multiaz", "multi-az"],
}

def strip(p):
    """Read a file and remove ANSI escape sequences."""
    with open(p, encoding="utf-8", errors="replace") as f:
        return ANSI.sub("", f.read())

def parse_run(outf, errf):
    body = strip(outf); lo = body.lower()
    detect = sum(any(w in lo for w in ks) for ks in CHECKS.values())
    chars = len(re.sub(r"\s", "", body))
    err = strip(errf) if os.path.exists(errf) else ""
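    # Assumes a summary line like "Credits: 0.65 ... Time: 36s" (or "Time: 1m 5s");
    # every h/m/s component is summed so that "1m 5s" becomes 65 seconds.
    m = re.search(r'Credits:\s*([\d.]+).*?Time:\s*((?:\d+\s*[hms]\s*)+)', err)
    credits = float(m.group(1)) if m else None
    app_t = None
    if m:
        unit = {"h": 3600, "m": 60, "s": 1}
        app_t = sum(int(v) * unit[u] for v, u in re.findall(r'(\d+)\s*([hms])', m.group(2)))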
    return credits, app_t, detect, chars

def iqr_outliers(vals):
    xs = sorted(v for v in vals if v is not None)
    if len(xs) < 4:
        return set()
    q1 = st.quantiles(xs, n=4)[0]; q3 = st.quantiles(xs, n=4)[2]; iqr = q3 - q1
    lo, hi = q1 - 1.5*iqr, q3 + 1.5*iqr
    return {v for v in xs if v < lo or v > hi}

def stats(vals):
    xs = [v for v in vals if v is not None]
    if not xs:
        return None
    return dict(n=len(xs), mean=st.mean(xs), median=st.median(xs),
                stdev=(st.pstdev(xs) if len(xs) > 1 else 0.0),
                min=min(xs), max=max(xs))

runs = {}
if not os.path.exists(IDX):
    raise SystemExit(f"not found: {IDX}  (run ./bench.sh first)")
for line in open(IDX):
    if line.startswith("cond\t"):
        continue
    cond, r, wall, outf, errf = line.rstrip("\n").split("\t")
    cr, at, det, ch = parse_run(outf, errf)
    runs.setdefault(cond, []).append(
        dict(run=int(r), wall=float(wall), credits=cr, app_t=at, detect=det, chars=ch))

metrics = [("credits", "cr"), ("app_t", "s"), ("wall", "s"), ("detect", "/10"), ("chars", "")]
for cond in sorted(runs):
    rs = runs[cond]; n = len(rs)
    print(f"\n===== {cond}  (n={n}) =====")
    print(f"{'metric':9}{'mean':>9}{'median':>9}{'stdev':>9}{'min':>8}{'max':>8}{'  outliers(run#)'}")
    for key, unit in metrics:
        s = stats([x[key] for x in rs])
        if not s:
            print(f"{key:9}{'n/a':>9}"); continue
        outs = iqr_outliers([x[key] for x in rs])
        orun = [f"#{x['run']}({x[key]})" for x in rs
                if x[key] in outs and x[key] is not None]
        print(f"{key:9}{s['mean']:>9.2f}{s['median']:>9.2f}{s['stdev']:>9.2f}"
              f"{s['min']:>8.2f}{s['max']:>8.2f}  {', '.join(orun) if orun else '-'}")
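analyze.py calls `statistics.quantiles` with its default method, `'exclusive'`. With only five runs per condition, that method places Q3 halfway between the fourth and fifth values. As a result, one extreme run widens its own upper fence and may go unflagged. The sketch below shows the same Tukey fences (Q1 - 1.5*IQR, Q3 + 1.5*IQR) under both methods. It uses hypothetical wall times, not measured data. Which method suits a given benchmark is a judgment call.

```python
import statistics as st


def tukey_outliers(vals, method="exclusive"):
    """Values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], quartiles via statistics.quantiles."""
    xs = sorted(vals)
    q1, _, q3 = st.quantiles(xs, n=4, method=method)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {v for v in xs if v < lo or v > hi}


times = [34, 35, 36, 37, 60]  # hypothetical wall times (s) for five runs
print(tukey_outliers(times, "exclusive"))  # set()  (Q1=34.5, Q3=48.5 -> fences 13.5..69.5)
print(tukey_outliers(times, "inclusive"))  # {60}   (Q1=35, Q3=37 -> fences 32..40)
```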
Results

Comparison by Model (no effort specified, i.e. each model's default; n=5 average)

Model	Detection (avg.)	Credits (avg.)	Time shown by Kiro (avg.)	Output characters (avg.)
claude-opus-4.8	10/10	0.65cr	36.6s	approx. 1,940
claude-opus-4.7	10/10	0.98cr	50.0s	approx. 3,000
claude-opus-4.6	9.2/10	0.32cr	20.8s	approx. 1,260

The clearest differences came from the model generation. 4.7 used the most Credits and time and also produced the longest responses. Averaged over five runs, 4.8 used fewer Credits and less time than 4.7 (0.65 vs 0.98cr, 36.6 vs 50.0s), and both detected 10/10. 4.6 used the fewest Credits and the least time (0.32cr / 20.8s), with detection at 9.2/10.
Over five runs, 9.2/10 corresponds to 46 of 50 possible detections, i.e. four misses in total. The gap with 4.6 came mainly from the "S3 PublicAccessBlock not configured" item. The recommended practice is to enable PublicAccessBlock on S3. Even when content must be public, it should be served through something like CloudFront with OAC rather than by exposing the bucket directly. For that reason, this was included as a best-practice scoring item. 4.7 and 4.8 pointed it out, while 4.6 mentioned it less often.
Even so, 4.6 still detected the critical risks:

- SSH and DB ports open to the world
- IAM full permissions
- the plaintext DB password
- RDS public access
- unencrypted RDS storage

It can be a perfectly adequate choice for use cases that prioritize conciseness, speed, and lower Credits.
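To put the model-table averages in relative terms, the small calculation below uses the five-run averages copied from the table; character counts are approximate. By this measure, 4.8 used about 34% fewer Credits, 27% less time, and roughly 35% fewer output characters than 4.7. 4.6 used about half the Credits of 4.8. These are ratios of averages, not per-run statistics.

```python
# Five-run averages copied from the model comparison table (chars are approximate).
AVG = {
    "claude-opus-4.8": {"credits": 0.65, "time_s": 36.6, "chars": 1940},
    "claude-opus-4.7": {"credits": 0.98, "time_s": 50.0, "chars": 3000},
    "claude-opus-4.6": {"credits": 0.32, "time_s": 20.8, "chars": 1260},
}


def pct_lower(new: float, base: float) -> float:
    """Percentage by which `new` is below `base` (negative if it is higher)."""
    return (base - new) / base * 100


for metric in ("credits", "time_s", "chars"):
    d = pct_lower(AVG["claude-opus-4.8"][metric], AVG["claude-opus-4.7"][metric])
    print(f"4.8 vs 4.7, {metric}: {d:.1f}% lower")
# 4.8 vs 4.7, credits: 33.7% lower
# 4.8 vs 4.7, time_s: 26.8% lower
# 4.8 vs 4.7, chars: 35.3% lower

d46 = pct_lower(AVG["claude-opus-4.6"]["credits"], AVG["claude-opus-4.8"]["credits"])
print(f"4.6 vs 4.8, credits: {d46:.1f}% lower")  # 4.6 vs 4.8, credits: 50.8% lower
```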
Comparison by Effort (claude-opus-4.8, n=5 average)

Effort	Detection (avg.)	Credits (avg.)	Time shown by Kiro (avg.)
low	10/10	0.67cr	35.4s
medium	10/10	0.66cr	35.6s
high	10/10	0.67cr	34.6s
xhigh	9.6/10	0.61cr	36.6s
max	10/10	0.62cr	33.6s
not specified (default)	10/10	0.64cr	34.8s

No clear trend from low to max appeared in detection, Credits, or time. At least for this CloudFormation analysis task, changing effort made little difference to the final output.
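As a rough check of that statement, the calculation below takes the averages from the effort table. It compares how far they move across all six settings with the gap between 4.7 and 4.8 at default effort. Credit averages stay within 0.06cr and time within 3.0s, against a 0.33cr / 13.4s gap between the two models. With five runs per condition, this is a comparison of scale, not a significance test.

```python
# Five-run averages (credits, seconds) copied from the effort table for claude-opus-4.8.
EFFORT_AVG = {
    "low": (0.67, 35.4), "medium": (0.66, 35.6), "high": (0.67, 34.6),
    "xhigh": (0.61, 36.6), "max": (0.62, 33.6), "default": (0.64, 34.8),
}


def spread(values):
    """Max minus min: a crude measure of how far the condition averages move."""
    return max(values) - min(values)


credit_spread = spread([c for c, _ in EFFORT_AVG.values()])
time_spread = spread([t for _, t in EFFORT_AVG.values()])
# Gap between 4.7 and 4.8 at default effort, from the model table, for scale.
model_gap = (0.98 - 0.65, 50.0 - 36.6)

print(f"effort spread: {credit_spread:.2f}cr, {time_spread:.1f}s")  # effort spread: 0.06cr, 3.0s
print(f"4.7 vs 4.8 gap: {model_gap[0]:.2f}cr, {model_gap[1]:.1f}s")  # 4.7 vs 4.8 gap: 0.33cr, 13.4s
```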
Summary

Claude Opus 4.8 is now selectable in Kiro CLI 2.5.0, albeit as a preview. In this task, 4.8 matched 4.7's detection while using fewer Credits and less time. At default settings, it looked like the better choice of the two. Meanwhile, 4.6 is concise, fast, and low in Credits, and it still catches the critical risks. It looks set to remain a strong option where speed or Credits are the priority.
Reference Links

https://aws.amazon.com/blogs/machine-learning/claude-opus-4-8-is-now-available-on-aws/
