Kiro CLI now supports Claude Opus 4.8. I checked the differences between each Opus model.

Kiro CLI now supports Claude Opus 4.8, Anthropic's latest model. Using headless Kiro CLI, I ran a diagnosis of a CloudFormation template with planted security risks five times each on Opus 4.8, 4.7, and 4.6, and compared the trends in detection results, speed, and credit consumption.
2026.05.30

Introduction

On May 29, 2026, Claude Opus 4.8, Anthropic's latest model, became selectable in Kiro CLI (version 2.5.0 or later) as an experimental preview.

https://kiro.dev/blog/opus-4-8/

Opus 4.8 became available on Amazon Bedrock and Claude Platform on AWS on May 28, 2026, and support was announced for Kiro (IDE / CLI / Web) the following day, May 29.

https://dev.classmethod.jp/articles/20260529-amazon-bedrock-claude-opus-4-8/

This article confirms that Opus 4.8 can actually be selected and run in Kiro CLI 2.5.0, and measures how results differ across model generations (4.8 / 4.7 / 4.6) and effort levels (low to max) on the same CloudFormation template analysis task.

Model List and Credit Multipliers on Kiro

Below is an excerpt of the model list shown by the /model command. The three Opus models listed all share the same credit multiplier of 2.20x.

| Model | Credit Multiplier | Notes |
| --- | --- | --- |
| claude-opus-4.8 | 2.20x | Experimental preview / 1M context |
| claude-opus-4.7 | 2.20x | Experimental preview / 1M context |
| claude-opus-4.6 | 2.20x | Claude Opus 4.6 |
| claude-sonnet-4.6 | 1.30x | Latest Sonnet / 1M context |

According to the official Kiro blog, Opus 4.8 is offered on the Kiro Pro / Pro+ / Power plans, with a 1M-token context and a maximum output of 128K tokens. It is available in the US East (N. Virginia) and Europe (Frankfurt) regions, with cross-region inference support.

Verification Details

Task and Scoring Method

A CloudFormation template (eval_template.yaml) intentionally containing 10 issues was fed to Kiro CLI, which was asked to provide an architecture overview and point out security/best practice issues.

eval_template.yaml (planted CFn template)
AWSTemplateFormatVersion: '2010-09-09'
Description: Sample stack for analysis (intentionally contains ~10 planted issues)

Resources:
  DataBucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: my-app-data-bucket
      AccessControl: PublicRead

  AppSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: app sg
      VpcId: vpc-0abc1234
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 3306
          ToPort: 3306
          CidrIp: 0.0.0.0/0

  AppRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: { Service: ec2.amazonaws.com }
            Action: sts:AssumeRole
      Policies:
        - PolicyName: full-access
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action: '*'
                Resource: '*'

  AppInstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles: [ !Ref AppRole ]

  AppServer:
    Type: AWS::EC2::Instance
    Properties:
      ImageId: ami-0123456789abcdef0
      InstanceType: t3.large
      IamInstanceProfile: !Ref AppInstanceProfile
      SecurityGroupIds: [ !Ref AppSecurityGroup ]

  AppDatabase:
    Type: AWS::RDS::DBInstance
    Properties:
      Engine: mysql
      DBInstanceClass: db.t3.medium
      AllocatedStorage: '20'
      MasterUsername: admin
      MasterUserPassword: Password123!
      PubliclyAccessible: true
      StorageEncrypted: false
      VPCSecurityGroups: [ !Ref AppSecurityGroup ]

Outputs:
  DBEndpoint:
    Value: !GetAtt AppDatabase.Endpoint.Address

The 10 items subject to scoring are as follows.

  1. S3 public (PublicRead)
  2. S3 no encryption
  3. S3 PublicAccessBlock not configured
  4. SSH (port 22) open to all
  5. DB port (3306) open to all
  6. IAM full permissions (Action / Resource = *)
  7. DB password hardcoded in plaintext
  8. RDS PubliclyAccessible
  9. RDS storage encryption disabled
  10. Data retention (deletion protection / automatic backup / snapshots, etc.) not configured

Scoring was based on keyword matching; misjudgments caused by variations in wording were corrected by manually reviewing the response text. Item 10 is scored as a logical OR: it counts as detected if the response mentions any of deletion protection, backups, or snapshots.
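
The scoring rule can be sketched in a few lines of Python. This is a minimal illustration rather than the scorer actually used (that is analyze.py, listed later): the two keyword lists are copied from its CHECKS table, while the `detected` helper and the sample response strings are hypothetical. It shows the OR judgment, and also why manual review was needed, since plain substring matching can fire on unrelated text.

```python
# Keyword lists copied from two entries of CHECKS in analyze.py.
CHECKS = {
    "port 22 open": ["22", "ssh"],
    "no deletion/backup": ["deletionpolicy", "deletion protection", "snapshot",
                           "backup", "multiaz", "multi-az"],
}

def detected(item: str, response: str) -> bool:
    """OR judgment: True if any keyword for the item occurs in the response."""
    text = response.lower()
    return any(keyword in text for keyword in CHECKS[item])

# Hypothetical model responses, written only for this illustration.
mentions_backup = "RDS has no automated backup configured."
mentions_year = "This template targets the 2022 baseline."

print(detected("no deletion/backup", mentions_backup))  # True: one keyword is enough
print(detected("port 22 open", mentions_year))          # True: false positive, "22" in "2022"
```

The second call is the kind of misjudgment that keyword matching alone cannot rule out.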

Execution Method

Each condition was run in headless mode.

kiro-cli chat --no-interactive --trust-tools=read --model <model> "<prompt>"

The effort level was switched with the following settings commands.

# Explicit specification
kiro-cli settings chat.modelDefaults '{"claude-opus-4.8":{"output_config":{"effort":"high"}}}'

# Restore to default (delete setting)
kiro-cli settings --delete chat.modelDefaults
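
When the effort setting is generated from a script, building the JSON with a serializer avoids hand-escaping quotes. The sketch below only assembles the argument list for the `kiro-cli settings chat.modelDefaults` command shown above and does not execute it; `build_effort_command` is a hypothetical helper name, not part of Kiro CLI.

```python
import json

def build_effort_command(model: str, effort: str) -> list[str]:
    """Return argv for the settings command shown in the article (not executed here)."""
    payload = {model: {"output_config": {"effort": effort}}}
    # separators=(",", ":") yields the same compact JSON as the hand-written example.
    value = json.dumps(payload, separators=(",", ":"))
    return ["kiro-cli", "settings", "chat.modelDefaults", value]

argv = build_effort_command("claude-opus-4.8", "high")
print(argv[-1])  # {"claude-opus-4.8":{"output_config":{"effort":"high"}}}
```

The resulting list can be handed to `subprocess.run(argv, check=True)`, which passes the JSON as a single argument without any shell quoting.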

The times in the results tables are the Time values displayed by Kiro, and Credits are likewise Kiro's displayed values, used as-is. Each condition was measured 5 times and is reported as the average.

bench.sh (iterative measurement script)
#!/usr/bin/env bash
# Runs the same prompt for each condition REPS times, preserving raw output and index.
# Usage: ./bench.sh [REPS] [MODE]
#   REPS : Number of repetitions per condition (default 5)
#   MODE : effort | models | both   (default effort)
set -uo pipefail

DIR="$(cd "$(dirname "$0")" && pwd)"
REPS="${1:-5}"
MODE="${2:-effort}"
MODEL="claude-opus-4.8"
EFFORTS=(low medium high xhigh max default)
MODELS=(claude-opus-4.8 claude-opus-4.7 claude-opus-4.6)

OUT="$DIR/bench/out"; ERR="$DIR/bench/err"; IDX="$DIR/bench/runs.tsv"
mkdir -p "$OUT" "$ERR"
[ -f "$IDX" ] || printf "cond\trun\twall_sec\toutfile\terrfile\n" > "$IDX"

TPL=$(cat "$DIR/eval_template.yaml")
PROMPT="Please reverse-analyze the following CloudFormation template and list (1) the architecture overview and (2) security/best practice issues exhaustively in bullet points.

\`\`\`yaml
$TPL
\`\`\`"

set_effort() {
  if [ "$1" = "default" ]; then kiro-cli settings --delete chat.modelDefaults >/dev/null 2>&1
  else kiro-cli settings chat.modelDefaults "{\"$MODEL\":{\"output_config\":{\"effort\":\"$1\"}}}" >/dev/null; fi
}

one_run() {
  local cond="$1" model="$2" r o e s en wall
  for r in $(seq 1 "$REPS"); do
    o="$OUT/${cond}_${r}.txt"; e="$ERR/${cond}_${r}.txt"
    s=$(date +%s.%N)
    timeout 600 kiro-cli chat --no-interactive --trust-tools=read --model "$model" "$PROMPT" > "$o" 2> "$e"
    en=$(date +%s.%N); wall=$(echo "$en - $s" | bc)
    printf "%s\t%s\t%s\t%s\t%s\n" "$cond" "$r" "$wall" "$o" "$e" >> "$IDX"
    echo "[$(date +%T)] $cond run $r/$REPS  wall=${wall}s"
  done
}

if [ "$MODE" = "effort" ] || [ "$MODE" = "both" ]; then
  for ef in "${EFFORTS[@]}"; do set_effort "$ef"; one_run "effort_${ef}" "$MODEL"; done
fi
if [ "$MODE" = "models" ] || [ "$MODE" = "both" ]; then
  kiro-cli settings --delete chat.modelDefaults >/dev/null 2>&1
  for m in "${MODELS[@]}"; do one_run "model_${m}" "$m"; done
fi

kiro-cli settings --delete chat.modelDefaults >/dev/null 2>&1
echo "[$(date +%T)] DONE. Analysis: python3 analyze.py"
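
bench.sh measures wall time with `date +%s.%N`, which relies on the `%N` extension of GNU date. As a portable alternative, the same measurement can be sketched in Python. `timed_run` is a hypothetical helper, and the demo times the Python interpreter itself as a stand-in so that it runs without kiro-cli installed.

```python
import subprocess
import sys
import time

def timed_run(argv: list[str], timeout_sec: float = 600.0) -> tuple[float, int, str, str]:
    """Run argv and return (wall_seconds, returncode, stdout, stderr).

    Raises subprocess.TimeoutExpired if the command exceeds timeout_sec.
    """
    start = time.perf_counter()
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout_sec)
    wall = time.perf_counter() - start
    return wall, proc.returncode, proc.stdout, proc.stderr

# Stand-in command; for the benchmark, argv would be the kiro-cli chat command line.
wall, code, out, err = timed_run([sys.executable, "-c", "print('ok')"])
print(f"wall={wall:.3f}s code={code} out={out.strip()}")
```
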

analyze.py (statistics aggregation script)
#!/usr/bin/env python3
"""Analyzes bench.sh output. Outputs mean, median, and outliers per condition."""
import re, os, statistics as st

DIR = os.path.dirname(os.path.abspath(__file__))
IDX = os.path.join(DIR, "bench", "runs.tsv")
ANSI = re.compile(r'\x1b\[[0-9;?]*[a-zA-Z]')

# Keywords are matched as lowercase substrings; one hit per item counts as detected.
CHECKS = {
    "S3 public":            ["publicread", "public", "exposed"],
    "S3 no encryption":     ["bucketencryption", "not encrypted", "no encryption",
                             "encryption not configured", "encryption not set",
                             "encryption disabled", "sse-", "enable encryption", "data stored in plaintext"],
    "S3 no block":          ["publicaccessblock", "public access block"],
    "port 22 open":         ["22", "ssh"],
    "port 3306 open":       ["3306"],
    "IAM full access":      ["action: '*'", "wildcard", "full access", "'*'", "administrator"],
    "DB password plaintext": ["plaintext", "hardcoded", "password123", "secrets manager", "inline"],
    "RDS public":           ["publiclyaccessible", "internet", "publicly exposed", "public ip"],
    "RDS no encryption":    ["storageencrypted", "at rest", "storage encryption", "data at rest not encrypted"],
    "no deletion/backup":   ["deletionpolicy", "deletion protection", "snapshot",
                             "backup", "multiaz", "multi-az"],
}

def strip(p):
    """Read a file and remove ANSI escape sequences from its contents."""
    with open(p, encoding="utf-8", errors="replace") as f:
        return ANSI.sub("", f.read())

def parse_run(outf, errf):
    body = strip(outf); lo = body.lower()
    detect = sum(any(w in lo for w in ks) for ks in CHECKS.values())
    chars = len(re.sub(r"\s", "", body))
    err = strip(errf) if os.path.exists(errf) else ""
    m = re.search(r'Credits:\s*([\d.]+).*?Time:\s*(\d+)\s*([hms])', err)
    credits = float(m.group(1)) if m else None
    app_t = None
    if m:
        v = int(m.group(2))
        app_t = v*60 if m.group(3) == "m" else (v*3600 if m.group(3) == "h" else v)
    return credits, app_t, detect, chars

def iqr_outliers(vals):
    xs = sorted(v for v in vals if v is not None)
    if len(xs) < 4:
        return set()
    q1 = st.quantiles(xs, n=4)[0]; q3 = st.quantiles(xs, n=4)[2]; iqr = q3 - q1
    lo, hi = q1 - 1.5*iqr, q3 + 1.5*iqr
    return {v for v in xs if v < lo or v > hi}

def stats(vals):
    xs = [v for v in vals if v is not None]
    if not xs:
        return None
    return dict(n=len(xs), mean=st.mean(xs), median=st.median(xs),
                stdev=(st.pstdev(xs) if len(xs) > 1 else 0.0),
                min=min(xs), max=max(xs))

runs = {}
if not os.path.exists(IDX):
    raise SystemExit(f"not found: {IDX}  (run ./bench.sh first)")
for line in open(IDX):
    if line.startswith("cond\t"):
        continue
    cond, r, wall, outf, errf = line.rstrip("\n").split("\t")
    cr, at, det, ch = parse_run(outf, errf)
    runs.setdefault(cond, []).append(
        dict(run=int(r), wall=float(wall), credits=cr, app_t=at, detect=det, chars=ch))

metrics = [("credits", "cr"), ("app_t", "s"), ("wall", "s"), ("detect", "/10"), ("chars", "")]
for cond in sorted(runs):
    rs = runs[cond]; n = len(rs)
    print(f"\n===== {cond}  (n={n}) =====")
    print(f"{'metric':9}{'mean':>9}{'median':>9}{'stdev':>9}{'min':>8}{'max':>8}{'  outliers(run#)'}")
    for key, unit in metrics:
        s = stats([x[key] for x in rs])
        if not s:
            print(f"{key:9}{'n/a':>9}"); continue
        outs = iqr_outliers([x[key] for x in rs])
        orun = [f"#{x['run']}({x[key]})" for x in rs
                if x[key] in outs and x[key] is not None]
        print(f"{key:9}{s['mean']:>9.2f}{s['median']:>9.2f}{s['stdev']:>9.2f}"
              f"{s['min']:>8.2f}{s['max']:>8.2f}  {', '.join(orun) if orun else '-'}")

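One property of `iqr_outliers` is worth checking before reading its outliers column. With five values, `statistics.quantiles` in its default "exclusive" method places Q3 halfway between the two largest values, so a single slow run widens the upper fence along with itself. The sketch below reproduces the function from analyze.py with a `method` parameter added for comparison; the sample timings are hypothetical.

```python
import statistics as st

def iqr_outliers(vals, method="exclusive"):
    """Same 1.5*IQR rule as analyze.py; `method` is added here for comparison."""
    xs = sorted(v for v in vals if v is not None)
    if len(xs) < 4:
        return set()
    q1, _, q3 = st.quantiles(xs, n=4, method=method)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {v for v in xs if v < lo or v > hi}

# Hypothetical Time values (seconds) for five runs, one of them much slower.
times = [34, 35, 35, 36, 60]

print(st.quantiles(times, n=4))                 # [34.5, 35.0, 48.0]
print(iqr_outliers(times))                      # set()
print(iqr_outliers(times, method="inclusive"))  # {60}
```

With n=5, an empty outliers column therefore does not by itself show that the runs were consistent, and the min and max columns are still worth reading.
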
Results

Comparison by Model (no effort specified = each model's default, n=5 average)

| Model | Detection (average) | Credits (average) | Kiro displayed Time (average) | Output character count (average) |
| --- | --- | --- | --- | --- |
| claude-opus-4.8 | 10/10 | 0.65cr | 36.6s | approx. 1,940 |
| claude-opus-4.7 | 10/10 | 0.98cr | 50.0s | approx. 3,000 |
| claude-opus-4.6 | 9.2/10 | 0.32cr | 20.8s | approx. 1,260 |

The clearest difference was between model generations. 4.7 had the highest Credits and time, and also produced the longest responses. Averaged over 5 runs in this test, 4.8 came in below 4.7 on both Credits and time (0.65 vs 0.98cr, 36.6 vs 50.0s), while both scored 10/10 on detection. 4.6 had the lowest Credits and time (0.32cr / 20.8s), with detection at 9.2/10.

The gap for 4.6 came mainly from item 3, "S3 PublicAccessBlock not configured." Enabling PublicAccessBlock is recommended for S3, and even when content must be public it is better served through a configuration such as CloudFront OAC than by exposing the bucket directly, so this was included as a best-practice scoring item. 4.7 and 4.8 pointed out this issue, while 4.6 mentioned it less often.

On the other hand, 4.6 still detected the critical risks: SSH and DB ports open to all, IAM full permissions, the plaintext DB password, RDS public access, and unencrypted RDS storage. It can be a sufficient choice for use cases that prioritize conciseness, speed, and lower Credits.
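
The averages in the model comparison table can be turned into relative figures with a little arithmetic. The sketch below uses only the published averages; `reduction_pct` is a helper introduced here, and since per-run values are not published, the numbers summarize this one test rather than establishing a general difference.

```python
# Averages copied from the model comparison table (n=5 each).
RUNS = 5
ITEMS = 10
results = {
    "claude-opus-4.8": {"detect": 10.0, "credits": 0.65, "time_s": 36.6},
    "claude-opus-4.7": {"detect": 10.0, "credits": 0.98, "time_s": 50.0},
    "claude-opus-4.6": {"detect": 9.2, "credits": 0.32, "time_s": 20.8},
}

def reduction_pct(new: float, old: float) -> float:
    """Percentage by which `new` is lower than `old`."""
    return (1 - new / old) * 100

v48, v47 = results["claude-opus-4.8"], results["claude-opus-4.7"]
print(f"credits: {reduction_pct(v48['credits'], v47['credits']):.1f}% lower")  # 33.7% lower
print(f"time:    {reduction_pct(v48['time_s'], v47['time_s']):.1f}% lower")    # 26.8% lower

# An average of 9.2/10 over 5 runs corresponds to 4 missed items out of 50 checks.
for name, r in results.items():
    missed = round((ITEMS - r["detect"]) * RUNS)
    print(f"{name}: {missed} missed of {ITEMS * RUNS}")
```
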

Comparison by Effort (claude-opus-4.8, n=5 average)

| effort | Detection (average) | Credits (average) | Kiro displayed Time (average) |
| --- | --- | --- | --- |
| low | 10/10 | 0.67cr | 35.4s |
| medium | 10/10 | 0.66cr | 35.6s |
| high | 10/10 | 0.67cr | 34.6s |
| xhigh | 9.6/10 | 0.61cr | 36.6s |
| max | 10/10 | 0.62cr | 33.6s |
| not specified (default) | 10/10 | 0.64cr | 34.8s |

No clear trend was observed in detection, Credits, or time across low to max. At least for this CloudFormation analysis task, the difference in final output from changing effort was not significant.
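
To put "no clear trend" in numbers, the spread of the effort averages can be compared with their mean. This sketch uses only the table values; because per-run data is not published, it is a summary of the averages and not a significance test.

```python
import statistics as st

# Averages copied from the effort comparison table (claude-opus-4.8, n=5 each).
effort = {
    "low": (0.67, 35.4), "medium": (0.66, 35.6), "high": (0.67, 34.6),
    "xhigh": (0.61, 36.6), "max": (0.62, 33.6), "default": (0.64, 34.8),
}
credits = [c for c, _ in effort.values()]
times = [t for _, t in effort.values()]

for label, vals, unit in (("credits", credits, "cr"), ("time", times, "s")):
    mean = st.mean(vals)
    spread = max(vals) - min(vals)
    print(f"{label}: mean={mean:.3f}{unit} range={spread:.2f}{unit} "
          f"({spread / mean * 100:.1f}% of mean)")
```

Both ranges come to under 10% of the mean, and neither credits nor time rises steadily from low to max, whereas the gap between the 4.8 and 4.7 defaults in the model comparison was roughly a third in Credits.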

Summary

Claude Opus 4.8 is now selectable in Kiro CLI 2.5.0, albeit as a preview. In this task, 4.8 matched 4.7 in detection while using fewer Credits and less time, so at default settings it looks like a better alternative to 4.7. Meanwhile, 4.6 is concise, fast, and low in Credits while still detecting the critical risks, and it looks set to remain a strong option where speed or Credits are the priority.

https://aws.amazon.com/blogs/machine-learning/claude-opus-4-8-is-now-available-on-aws/

