
Kiro CLI now supports Claude Opus 4.8. I checked the differences between each Opus model.
Introduction
On May 29, 2026, Claude Opus 4.8, Anthropic's latest model, became available for selection in Kiro CLI (Kiro CLI 2.5.0 or later, treated as experimental preview).
Opus 4.8 became available on Amazon Bedrock and Claude Platform on AWS on May 28, 2026, and support was announced for Kiro (IDE / CLI / Web) the following day, May 29.
This article confirms that Opus 4.8 can actually be selected and run in Kiro CLI 2.5.0, and measures the differences when changing model generations (4.8 / 4.7 / 4.6) and effort (low to max) on the same CloudFormation template analysis task.
Model List and Credit Multipliers on Kiro
Here is an excerpt of the model list that can be confirmed with the /model command. The credit multiplier for all Opus 4.x models is the same at 2.20x.
| Model | Credit Multiplier | Notes |
|---|---|---|
| claude-opus-4.8 | 2.20x | Experimental preview / 1M context |
| claude-opus-4.7 | 2.20x | Experimental preview / 1M context |
| claude-opus-4.6 | 2.20x | Claude Opus 4.6 |
| claude-sonnet-4.6 | 1.30x | Latest Sonnet / 1M context |
According to the official Kiro blog, the supported plans are Kiro Pro / Pro+ / Power, with a context of 1M tokens and a maximum output of 128K tokens. Available regions are US East (N. Virginia, us-east-1) and Europe (Frankfurt, eu-central-1), with cross-region inference support.
Verification Details
Task and Scoring Method
A CloudFormation template (eval_template.yaml) intentionally containing 10 issues was fed to Kiro CLI, which was asked to provide an architecture overview and point out security/best practice issues.
eval_template.yaml (planted CFn template)
AWSTemplateFormatVersion: '2010-09-09'
Description: Sample stack for analysis (intentionally contains ~10 planted issues)
Resources:
  DataBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-app-data-bucket
      AccessControl: PublicRead
  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: app sg
      VpcId: vpc-0abc1234
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 3306
          ToPort: 3306
          CidrIp: 0.0.0.0/0
  AppRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: { Service: ec2.amazonaws.com }
            Action: sts:AssumeRole
      Policies:
        - PolicyName: full-access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action: '*'
                Resource: '*'
  AppInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles: [ !Ref AppRole ]
  AppServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0123456789abcdef0
      InstanceType: t3.large
      IamInstanceProfile: !Ref AppInstanceProfile
      SecurityGroupIds: [ !Ref AppSecurityGroup ]
  AppDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.medium
      AllocatedStorage: '20'
      MasterUsername: admin
      MasterUserPassword: Password123!
      PubliclyAccessible: true
      StorageEncrypted: false
      VPCSecurityGroups: [ !Ref AppSecurityGroup ]
Outputs:
  DBEndpoint:
    Value: !GetAtt AppDatabase.Endpoint.Address
The 10 items subject to scoring are as follows.
- S3 public (PublicRead)
- S3 no encryption
- S3 PublicAccessBlock not configured
- SSH (port 22) open to all
- DB port (3306) open to all
- IAM full permissions (Action / Resource = *)
- DB password hardcoded in plaintext
- RDS PubliclyAccessible
- RDS storage encryption disabled
- Data retention (deletion protection / automatic backup / snapshots, etc.) not configured
Scoring was based on keyword matching; misjudgments caused by variations in wording were corrected by manually reviewing the response text. Item 10 uses OR logic: it counts as detected if any of deletion protection, backups, or snapshots is mentioned.
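The keyword-based scoring with OR logic can be sketched as follows. This is a minimal illustration only: the trimmed-down keyword table and the sample sentence are assumptions made for the example, not the full table used in the measurement (that one is in analyze.py in this article).

```python
# Hypothetical, trimmed-down keyword table for illustration.
# An item counts as detected if ANY of its keywords appears in the response.
CHECKS = {
    "port 3306 open": ["3306"],
    "RDS no encryption": ["storageencrypted", "storage encryption"],
    # Item 10 style OR judgment: any one of these mentions is enough.
    "no deletion/backup": ["deletion protection", "backup", "snapshot"],
}


def score(text: str) -> dict[str, bool]:
    """Return, per item, whether any of its keywords occurs in the lowercased text."""
    lowered = text.lower()
    return {item: any(k in lowered for k in kws) for item, kws in CHECKS.items()}


# Made-up response fragment, not actual model output.
sample = "Port 3306 is open to 0.0.0.0/0. Automated backups are not configured."
result = score(sample)
print(result)
print(sum(result.values()), "of", len(CHECKS), "items detected")
```

Substring matching of this kind is simple but can both over-match and miss paraphrases, which is why the manual review step matters.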
Execution Method
Each condition was run in headless mode.
kiro-cli chat --no-interactive --trust-tools=read --model <model> "<prompt>"
Effort switching was configured as follows.
# Explicit specification
kiro-cli settings chat.modelDefaults '{"claude-opus-4.8":{"output_config":{"effort":"high"}}}'
# Restore to default (delete setting)
kiro-cli settings --delete chat.modelDefaults
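When switching effort from a script, quoting the JSON value by hand is error-prone. The sketch below builds the same JSON shape as the command above and prints a shell-quoted command line. The helper name effort_setting is hypothetical; only the JSON structure and the settings key are taken from the commands shown in this article.

```python
import json
import shlex


def effort_setting(model: str, effort: str) -> str:
    """Build the JSON value in the shape used by the settings command above."""
    return json.dumps({model: {"output_config": {"effort": effort}}},
                      separators=(",", ":"))


value = effort_setting("claude-opus-4.8", "high")
print(value)
# shlex.quote wraps the value so it survives the shell as a single argument.
print("kiro-cli settings chat.modelDefaults " + shlex.quote(value))
```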
The time in the results tables is the Time value displayed by Kiro, and Credits is likewise the value displayed by Kiro, used as-is. Each condition was run 5 times and the average is reported.
bench.sh (iterative measurement script)
#!/usr/bin/env bash
# Runs the same prompt for each condition REPS times, preserving raw output and index.
# Usage: ./bench.sh [REPS] [MODE]
#   REPS : Number of repetitions per condition (default 5)
#   MODE : effort | models | both (default effort)
set -uo pipefail
DIR="$(cd "$(dirname "$0")" && pwd)"
REPS="${1:-5}"
MODE="${2:-effort}"
MODEL="claude-opus-4.8"
EFFORTS=(low medium high xhigh max default)
MODELS=(claude-opus-4.8 claude-opus-4.7 claude-opus-4.6)
OUT="$DIR/bench/out"; ERR="$DIR/bench/err"; IDX="$DIR/bench/runs.tsv"
mkdir -p "$OUT" "$ERR"
[ -f "$IDX" ] || printf "cond\trun\twall_sec\toutfile\terrfile\n" > "$IDX"
TPL=$(cat "$DIR/eval_template.yaml")
PROMPT="Please reverse-analyze the following CloudFormation template and list (1) the architecture overview and (2) security/best practice issues exhaustively in bullet points.
\`\`\`yaml
$TPL
\`\`\`"
# Set the effort for $MODEL, or delete the setting to fall back to the default.
set_effort() {
  if [ "$1" = "default" ]; then kiro-cli settings --delete chat.modelDefaults >/dev/null 2>&1
  else kiro-cli settings chat.modelDefaults "{\"$MODEL\":{\"output_config\":{\"effort\":\"$1\"}}}" >/dev/null; fi
}
# Run one condition REPS times, saving stdout/stderr and appending to the index.
one_run() {
  local cond="$1" model="$2" r o e s en wall
  for r in $(seq 1 "$REPS"); do
    o="$OUT/${cond}_${r}.txt"; e="$ERR/${cond}_${r}.txt"
    s=$(date +%s.%N)
    timeout 600 kiro-cli chat --no-interactive --trust-tools=read --model "$model" "$PROMPT" > "$o" 2> "$e"
    en=$(date +%s.%N); wall=$(echo "$en - $s" | bc)
    printf "%s\t%s\t%s\t%s\t%s\n" "$cond" "$r" "$wall" "$o" "$e" >> "$IDX"
    echo "[$(date +%T)] $cond run $r/$REPS wall=${wall}s"
  done
}
if [ "$MODE" = "effort" ] || [ "$MODE" = "both" ]; then
  for ef in "${EFFORTS[@]}"; do set_effort "$ef"; one_run "effort_${ef}" "$MODEL"; done
fi
if [ "$MODE" = "models" ] || [ "$MODE" = "both" ]; then
  kiro-cli settings --delete chat.modelDefaults >/dev/null 2>&1
  for m in "${MODELS[@]}"; do one_run "model_${m}" "$m"; done
fi
kiro-cli settings --delete chat.modelDefaults >/dev/null 2>&1
echo "[$(date +%T)] DONE. Analysis: python3 analyze.py"
analyze.py (statistics aggregation script)
#!/usr/bin/env python3
"""Analyzes bench.sh output. Outputs mean, median, and outliers per condition."""
import re, os, statistics as st
DIR = os.path.dirname(os.path.abspath(__file__))
IDX = os.path.join(DIR, "bench", "runs.tsv")
ANSI = re.compile(r'\x1b\[[0-9;?]*[a-zA-Z]')
# Keywords per scoring item; an item is detected if any keyword appears.
CHECKS = {
    "S3 public": ["publicread", "public", "exposed"],
    "S3 no encryption": ["bucketencryption", "not encrypted", "no encryption",
                         "encryption not configured", "encryption not set",
                         "encryption disabled", "sse-", "enable encryption", "data stored in plaintext"],
    "S3 no block": ["publicaccessblock", "public access block"],
    "port 22 open": ["22", "ssh"],
    "port 3306 open": ["3306"],
    "IAM full access": ["action: '*'", "wildcard", "full access", "'*'", "administrator"],
    "DB password plaintext": ["plaintext", "hardcoded", "password123", "secrets manager", "inline"],
    "RDS public": ["publiclyaccessible", "internet", "publicly exposed", "public ip"],
    "RDS no encryption": ["storageencrypted", "at rest", "storage encryption", "data at rest not encrypted"],
    "no deletion/backup": ["deletionpolicy", "deletion protection", "snapshot",
                           "backup", "multiaz", "multi-az"],
}
def strip(p):
    # Read a file and remove ANSI escape sequences.
    return ANSI.sub("", open(p, encoding="utf-8", errors="replace").read())
def parse_run(outf, errf):
    body = strip(outf); lo = body.lower()
    detect = sum(any(w in lo for w in ks) for ks in CHECKS.values())
    chars = len(re.sub(r"\s", "", body))
    err = strip(errf) if os.path.exists(errf) else ""
    m = re.search(r'Credits:\s*([\d.]+).*?Time:\s*(\d+)\s*([hms])', err)
    credits = float(m.group(1)) if m else None
    app_t = None
    if m:
        v = int(m.group(2))
        app_t = v*60 if m.group(3) == "m" else (v*3600 if m.group(3) == "h" else v)
    return credits, app_t, detect, chars
def iqr_outliers(vals):
    xs = sorted(v for v in vals if v is not None)
    if len(xs) < 4:
        return set()
    q1 = st.quantiles(xs, n=4)[0]; q3 = st.quantiles(xs, n=4)[2]; iqr = q3 - q1
    lo, hi = q1 - 1.5*iqr, q3 + 1.5*iqr
    return {v for v in xs if v < lo or v > hi}
def stats(vals):
    xs = [v for v in vals if v is not None]
    if not xs:
        return None
    return dict(n=len(xs), mean=st.mean(xs), median=st.median(xs),
                stdev=(st.pstdev(xs) if len(xs) > 1 else 0.0),
                min=min(xs), max=max(xs))
runs = {}
if not os.path.exists(IDX):
    raise SystemExit(f"not found: {IDX} (run ./bench.sh first)")
for line in open(IDX):
    if line.startswith("cond\t"):
        continue
    cond, r, wall, outf, errf = line.rstrip("\n").split("\t")
    cr, at, det, ch = parse_run(outf, errf)
    runs.setdefault(cond, []).append(
        dict(run=int(r), wall=float(wall), credits=cr, app_t=at, detect=det, chars=ch))
metrics = [("credits", "cr"), ("app_t", "s"), ("wall", "s"), ("detect", "/10"), ("chars", "")]
for cond in sorted(runs):
    rs = runs[cond]; n = len(rs)
    print(f"\n===== {cond} (n={n}) =====")
    print(f"{'metric':9}{'mean':>9}{'median':>9}{'stdev':>9}{'min':>8}{'max':>8}{' outliers(run#)'}")
    for key, unit in metrics:
        s = stats([x[key] for x in rs])
        if not s:
            print(f"{key:9}{'n/a':>9}"); continue
        outs = iqr_outliers([x[key] for x in rs])
        orun = [f"#{x['run']}({x[key]})" for x in rs
                if x[key] in outs and x[key] is not None]
        print(f"{key:9}{s['mean']:>9.2f}{s['median']:>9.2f}{s['stdev']:>9.2f}"
              f"{s['min']:>8.2f}{s['max']:>8.2f} {', '.join(orun) if orun else '-'}")
Results
Comparison by Model (no effort specified = each model's default, n=5 average)
| Model | Detection (average) | Credits (average) | Kiro displayed Time (average) | Output character count (average) |
|---|---|---|---|---|
| claude-opus-4.8 | 10/10 | 0.65cr | 36.6s | approx. 1,940 |
| claude-opus-4.7 | 10/10 | 0.98cr | 50.0s | approx. 3,000 |
| claude-opus-4.6 | 9.2/10 | 0.32cr | 20.8s | approx. 1,260 |
The clearest difference was between model generations. 4.7 had the highest Credits and the longest time, and also produced the longest responses. In the 5-run average for this test, 4.8 came out lower than 4.7 in both Credits and time (0.65 vs 0.98cr, 36.6 vs 50.0s), while detection was 10/10 for both. 4.6 had the lowest Credits and the shortest time (0.32cr / 20.8s), with detection at 9.2/10.
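To put the gaps in relative terms, the averages from the table can be compared against 4.7 as a baseline. The following is a small worked calculation, not part of the original measurement scripts; the choice of 4.7 as the baseline and the helper name pct_change are assumptions made for the example.

```python
# Averages copied from the model comparison table (n=5 per model).
results = {
    "claude-opus-4.8": {"credits": 0.65, "time_s": 36.6},
    "claude-opus-4.7": {"credits": 0.98, "time_s": 50.0},
    "claude-opus-4.6": {"credits": 0.32, "time_s": 20.8},
}


def pct_change(new: float, base: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


base = results["claude-opus-4.7"]
for model, r in results.items():
    dc = pct_change(r["credits"], base["credits"])
    dt = pct_change(r["time_s"], base["time_s"])
    print(f"{model}: credits {dc:+.1f}%, time {dt:+.1f}% vs claude-opus-4.7")
```

On these averages, 4.8 used roughly a third fewer Credits and about a quarter less time than 4.7. With only 5 runs per model, these percentages should be read as rough indications.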
The gap for 4.6 came mainly from mentions of "S3 PublicAccessBlock not configured." This was included as a best-practice scoring item because enabling PublicAccessBlock on S3 is recommended, and even when public access is needed, a configuration such as CloudFront OAC is preferred over exposing the bucket directly. 4.7 and 4.8 pointed out this issue, while 4.6 mentioned it less often.
On the other hand, 4.6 was still able to detect critical risks such as SSH / DB port fully open, IAM full permissions, DB password in plaintext, RDS public access, and RDS no encryption. It can be a sufficient choice for use cases that prioritize conciseness, speed, and lower Credits.
Comparison by Effort (claude-opus-4.8, n=5 average)
| effort | Detection (average) | Credits (average) | Kiro displayed Time (average) |
|---|---|---|---|
| low | 10/10 | 0.67cr | 35.4s |
| medium | 10/10 | 0.66cr | 35.6s |
| high | 10/10 | 0.67cr | 34.6s |
| xhigh | 9.6/10 | 0.61cr | 36.6s |
| max | 10/10 | 0.62cr | 33.6s |
| not specified (default) | 10/10 | 0.64cr | 34.8s |
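No clear trend was observed in detection, Credits, or time across low to max. The only detection score below 10/10 was xhigh at 9.6/10, and neither Credits nor time rose with effort. At least for this CloudFormation analysis task, changing effort made no significant difference to the final output.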
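The size of the effort-to-effort variation can be checked directly from the table. The sketch below computes the spread (max minus min) of the averaged Credits and time across the six effort conditions; it is an illustrative calculation on the published averages, not a statistical test.

```python
# Averages copied from the effort comparison table (claude-opus-4.8, n=5 each).
# Each value is (credits, kiro_displayed_time_seconds).
efforts = {
    "low": (0.67, 35.4),
    "medium": (0.66, 35.6),
    "high": (0.67, 34.6),
    "xhigh": (0.61, 36.6),
    "max": (0.62, 33.6),
    "default": (0.64, 34.8),
}

credits = [c for c, _ in efforts.values()]
times = [t for _, t in efforts.values()]
credit_spread = max(credits) - min(credits)
time_spread = max(times) - min(times)

print(f"credits: min={min(credits)} max={max(credits)} spread={credit_spread:.2f}cr")
print(f"time:    min={min(times)} max={max(times)} spread={time_spread:.1f}s")
```

The spread across efforts (0.06cr, 3.0s) is small next to the spread across model generations in the previous table (0.66cr, 29.2s), which is consistent with the reading that model choice mattered far more than effort for this task.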
Summary
Claude Opus 4.8 became available for selection in Kiro CLI 2.5.0, albeit as a preview. In this task, 4.8 matched 4.7 in detection while using fewer Credits and less time, so when comparing default settings it looked like the better choice of the two. Meanwhile, 4.6 is concise, fast, and low in Credits, still detects the critical risks, and looks set to remain a strong option where speed or Credits are the priority.