I tried visualizing Claude Code's OTel logs on a CloudWatch dashboard


2026.06.13


Introduction

I'm kasama from the Data Business Division.
This time, I'll send Claude Code's OpenTelemetry (OTel) telemetry directly to CloudWatch Logs and visualize team usage on a CloudWatch dashboard. Because dashboards tend to go unvisited once they're built, I'll also build a Skill that generates and posts a weekly Slack digest as part of the same setup.

Overview

Claude Code has built-in functionality to send metrics and logs via OpenTelemetry. Details are described in the Anthropic official documentation.

https://code.claude.com/docs/en/monitoring-usage

On the AWS side, CloudWatch natively provides an OTLP endpoint, and using Bearer Token authentication allows direct log transmission from workloads outside AWS without a collector.

https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/CWL_HTTP_Endpoints_OTLP.html

The basic steps for sending OTel logs from Claude Code / Cowork to CloudWatch Logs without a collector are explained in detail in the following article. This article builds on that configuration and adds a Sanitizer Lambda for secret removal, a dashboard, and a weekly Slack digest.

https://dev.classmethod.jp/articles/claude-otel-cloudwatch-otlp-wo-collector/

Combining these two approaches allows telemetry from each member's Claude Code to be aggregated into the team's AWS account without running a container or EC2 instance for an OTel Collector. The only additional resource is a single serverless Sanitizer Lambda.

Architecture

[Figure: claude-otel-architecture]

The processing flow is as follows.

  1. Each member's Claude Code sends logs directly to the CloudWatch Logs OTLP endpoint via OTLP HTTPS + Bearer Token
  2. Events arrive at the Raw LogGroup (1-day retention)
  3. The Sanitizer Lambda is triggered via Subscription Filter and discards sensitive fields using an allowlist approach
  4. The Lambda writes the filtered events to the Sanitized LogGroup (60-day retention)
  5. The CloudWatch dashboard and weekly digest Skill reference only Sanitized data
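Step 3 relies on the standard CloudWatch Logs subscription-filter payload: the Lambda receives `event["awslogs"]["data"]`, a base64-encoded, gzip-compressed JSON document whose `logEvents` array carries the raw log lines. The following is a minimal Python sketch of that decoding step, assuming each Raw log line is one JSON-encoded OTel record like the sample shown later in this article. `decode_subscription_payload` and `encode_for_test` are hypothetical names, not the repository's handler.

```python
import base64
import gzip
import json


def decode_subscription_payload(event: dict) -> list[dict]:
    """Decode a CloudWatch Logs subscription-filter invocation into parsed log records."""
    # awslogs.data is base64(gzip(JSON)); json.loads accepts the decompressed bytes directly
    payload = json.loads(gzip.decompress(base64.b64decode(event["awslogs"]["data"])))
    # CONTROL_MESSAGE deliveries are connectivity checks from CloudWatch Logs and carry no data
    if payload.get("messageType") != "DATA_MESSAGE":
        return []
    records = []
    for log_event in payload.get("logEvents", []):
        try:
            # Assumption: each Raw line is one JSON-encoded OTel log record
            records.append(json.loads(log_event["message"]))
        except json.JSONDecodeError:
            # Skip non-JSON lines instead of failing the whole batch
            continue
    return records


def encode_for_test(payload: dict) -> dict:
    """Build a fake invocation the way CloudWatch Logs does: JSON -> gzip -> base64."""
    data = base64.b64encode(gzip.compress(json.dumps(payload).encode("utf-8")))
    return {"awslogs": {"data": data.decode("ascii")}}


if __name__ == "__main__":
    fake = encode_for_test({
        "messageType": "DATA_MESSAGE",
        "logGroup": "/demo-team/claude-otel",
        "logEvents": [{"id": "1", "timestamp": 0, "message": '{"body": "claude_code.api_request"}'}],
    })
    print(decode_subscription_payload(fake))  # [{'body': 'claude_code.api_request'}]
```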

Why a two-stage Raw and Sanitized configuration

When OTEL_LOG_TOOL_DETAILS=1 is enabled, Claude Code's OTel logs include not only skill names but also Bash command text and file paths as tool details. Skill names are needed for per-skill usage aggregation, but command text and paths can reveal repository structure, which we don't want stored long-term in the team's shared account. Therefore, the Raw side is kept at 1-day retention, and the Lambda passes only allowed fields (cost, token count, model name, skill name, subagent type, MCP server name, etc.) to the Sanitized side. Prompt text is sent as <REDACTED> unless OTEL_LOG_USER_PROMPTS=1 is set.

Differences from existing verification methods

Method | Target | Form
/usage command | Individual | On-the-spot check within a session
/insights command | Individual | Local HTML report
OTel + CloudWatch dashboard (this article) | Team | Continuous collection + dashboard + weekly Slack digest

Built-in commands are sufficient for individual usage, but aggregating via OTel is necessary to continuously track "who on the team is using which skills, subagents, and MCP, and how much."

Constraints

  • Bearer Token authentication for the CloudWatch Logs OTLP endpoint is limited to US regions (us-east-1, us-east-2, us-west-1, us-west-2). Since telemetry is stored in the US, please verify in advance if you have data residency requirements.

Prerequisites

  • Claude Code must be installed
  • AWS CLI v2, SAM CLI, Python 3.13, and pytest must be installed
  • Authentication for the destination AWS account must be configured (able to run aws ... --profile <your-profile>)

Implementation

The implementation code is stored on GitHub.

https://github.com/cm-yoshikikasama/blog_code/tree/main/70_claude_code_otel_dashboard

The project structure is as follows.

70_claude_code_otel_dashboard/
├── cfn/
│   └── claude-otel-logs.yml          # LogGroup, IAM User, dashboard
├── sam/
│   └── claude-otel-sanitizer/
│       ├── template.yaml             # Sanitizer Lambda + Subscription Filter
│       ├── samconfig.toml
│       ├── src/handler.py            # allowlist filter
│       └── tests/test_handler.py
└── .claude/skills/claude-usage-digest/   # Weekly Slack digest Skill
    ├── SKILL.md
    ├── references/
    ├── scripts/queries.txt           # Logs Insights query collection
    └── templates/digest-template.txt

CloudFormation

https://github.com/cm-yoshikikasama/blog_code/blob/main/70_claude_code_otel_dashboard/cfn/claude-otel-logs.yml

claude-otel-logs.yml defines two LogGroups (Raw / Sanitized), a write-only IAM User, and the dashboard. The IAM User is limited to logs:CallWithBearerToken plus PutLogEvents / CreateLogStream on the Raw LogGroup only, so even a leaked Bearer Token can do nothing but write to that LogGroup.
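To make that scope concrete, here is a hypothetical Python rendering of such a write-only policy as a dict. `writer_policy` and the account ID are illustrative, the template's actual policy may differ, and scoping `logs:CallWithBearerToken` to `"*"` is an assumption of this sketch rather than something the article specifies.

```python
import json


def writer_policy(region: str, account_id: str, raw_log_group: str) -> dict:
    """Hypothetical write-only policy for the bearer-token IAM User (not the template's exact policy)."""
    group_arn = f"arn:aws:logs:{region}:{account_id}:log-group:{raw_log_group}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Assumption: scoped to "*" here; check the CloudWatch Logs docs for narrower resources
                "Sid": "AllowBearerTokenCalls",
                "Effect": "Allow",
                "Action": "logs:CallWithBearerToken",
                "Resource": "*",
            },
            {
                # Writes are limited to the Raw LogGroup and its streams; no read or delete actions
                "Sid": "WriteRawLogGroupOnly",
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                "Resource": [group_arn, f"{group_arn}:log-stream:*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(writer_policy("us-east-1", "123456789012", "/demo-team/claude-otel"), indent=2))
```

Because the Sanitized LogGroup never appears in any Resource, a leaked token cannot touch the long-retention data.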

The dashboard is composed of Logs Insights queries to the Sanitized LogGroup and includes the following widgets.

  • Team totals: Team-wide totals (cost, message count, skill / subagent / MCP usage count)
  • Daily cost per user: Daily cost trend per user (only the top 10 users by cost within the display period are shown for readability)
  • Per-user activity: Per-user summary (session count, model-specific invocations, cost per message, cache rate)
  • Top skills / subagents / MCP servers / models: Rankings of what is being used by whom

Two points are worth noting: dashboard variables let you switch the column the tables are sorted by, and cost, message count, and skill count are aggregated in a single stats command. As an example, here is an excerpt of the core of the Per-user activity query.

SOURCE '/demo-team/claude-otel-sanitized'
| filter ispresent(attributes.user.email)
| parse attributes.user.email /(?<user>[^@]+)/
| stats sum(coalesce(attributes.cost_usd, 0)) as cost,
        count_distinct(attributes.session.id) as sessions,
        sum(if(body = "claude_code.user_prompt", 1, 0)) as messages,
        sum(if(body = "claude_code.skill_activated", 1, 0)) as skills,
        ...
        by user

SAM (Sanitizer Lambda)

https://github.com/cm-yoshikikasama/blog_code/blob/main/70_claude_code_otel_dashboard/sam/claude-otel-sanitizer/template.yaml

template.yaml defines the Sanitizer Lambda and the Subscription Filter. Using SAM's CloudWatchLogs event source automatically creates the Subscription Filter and grants CloudWatch Logs permission to invoke the Lambda. The Lambda's IAM policy grants write access to the Sanitized LogGroup only; no read permission on the Raw side is needed, because the Subscription Filter pushes events to the Lambda.
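On the write side, the Lambda only needs to call PutLogEvents on the Sanitized LogGroup. Here is a sketch of that step, assuming the destination log stream already exists and a boto3 `logs` client is passed in. `to_batches` and `write_sanitized` are hypothetical names. Events are sorted chronologically and split to respect the PutLogEvents limits of 10,000 events and 1,048,576 bytes per call, where each event counts as its UTF-8 message size plus 26 bytes. Events from a single subscription delivery are also assumed to fall within PutLogEvents' 24-hour span limit.

```python
import json

MAX_BATCH_EVENTS = 10_000     # PutLogEvents limit on events per call
MAX_BATCH_BYTES = 1_048_576   # PutLogEvents limit: sum of UTF-8 message sizes + 26 bytes per event
EVENT_OVERHEAD_BYTES = 26


def to_batches(records: list[tuple[int, dict]]) -> list[list[dict]]:
    """Turn (timestamp_ms, record) pairs into PutLogEvents-sized batches in chronological order."""
    events = [
        {"timestamp": ts, "message": json.dumps(rec, separators=(",", ":"))}
        for ts, rec in sorted(records, key=lambda pair: pair[0])
    ]
    batches: list[list[dict]] = []
    current: list[dict] = []
    size = 0
    for event in events:
        event_size = len(event["message"].encode("utf-8")) + EVENT_OVERHEAD_BYTES
        # Start a new batch when adding this event would exceed either limit
        if current and (len(current) >= MAX_BATCH_EVENTS or size + event_size > MAX_BATCH_BYTES):
            batches.append(current)
            current, size = [], 0
        current.append(event)
        size += event_size
    if current:
        batches.append(current)
    return batches


def write_sanitized(client, log_group: str, log_stream: str, records: list[tuple[int, dict]]) -> int:
    """Write sanitized records with a boto3 "logs" client; returns the number of events written."""
    written = 0
    for batch in to_batches(records):
        client.put_log_events(logGroupName=log_group, logStreamName=log_stream, logEvents=batch)
        written += len(batch)
    return written
```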

https://github.com/cm-yoshikikasama/blog_code/blob/main/70_claude_code_otel_dashboard/sam/claude-otel-sanitizer/src/handler.py

handler.py is the body of the allowlist filter. Allowed fields are declared in the ATTRS / RES_ATTRS sets, and everything else (tool_input, prompt, bash_command, etc.) is discarded. tool_parameters is handled specially, extracting only the subagent_type from Agent tools and mcp_server_name from MCP tools. The reason for using an allowlist rather than a denylist is to fail safely even if new attributes are added by a Claude Code update.
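The allowlist idea can be sketched in a few lines of Python. The set contents below are illustrative guesses (the repository's handler.py defines the real `ATTRS` / `RES_ATTRS`), and `tool_parameters` is assumed to arrive as a JSON string or dict. The simplified version keeps `subagent_type` or `mcp_server_name` whenever present, without checking which tool produced them.

```python
import json

# Hypothetical allowlists for illustration; the repository's handler.py defines the real ATTRS / RES_ATTRS
ATTRS = frozenset({
    "user.email", "session.id", "model", "cost_usd",
    "input_tokens", "output_tokens", "cache_read_tokens", "cache_creation_tokens",
    "skill_name", "tool_name",
})
RES_ATTRS = frozenset({"user.plan", "service.name", "service.version"})
# The only keys ever copied out of tool_parameters
TOOL_PARAM_KEYS = ("subagent_type", "mcp_server_name")


def _extract_tool_parameters(raw) -> dict:
    """Pull allowed keys out of tool_parameters (assumed to be a JSON string or a dict)."""
    if isinstance(raw, str):
        try:
            raw = json.loads(raw)
        except json.JSONDecodeError:
            return {}
    if not isinstance(raw, dict):
        return {}
    return {key: raw[key] for key in TOOL_PARAM_KEYS if key in raw}


def sanitize(record: dict) -> dict:
    """Rebuild a record from allowlisted fields only; anything unknown is dropped by default."""
    attrs = record.get("attributes") or {}
    clean_attrs = {k: v for k, v in attrs.items() if k in ATTRS}
    clean_attrs.update(_extract_tool_parameters(attrs.get("tool_parameters")))
    res_attrs = (record.get("resource") or {}).get("attributes") or {}
    return {
        "body": record.get("body"),
        "attributes": clean_attrs,
        "resource": {"attributes": {k: v for k, v in res_attrs.items() if k in RES_ATTRS}},
    }
```

Because the output is built only from listed keys, an attribute introduced by a future Claude Code release is dropped until someone deliberately adds it to the allowlist, which is the fail-safe property described above.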

https://github.com/cm-yoshikikasama/blog_code/blob/main/70_claude_code_otel_dashboard/sam/claude-otel-sanitizer/tests/test_handler.py

test_handler.py uses fixtures that mimic real events and checks with exact matching that PII (user.id, organization.id, request_id, etc.) does not leak. Whenever the allowlist changes, the tests are updated along with it.

Skill (Weekly Slack Digest)

https://github.com/cm-yoshikikasama/blog_code/tree/main/70_claude_code_otel_dashboard/.claude/skills/claude-usage-digest

Since dashboards tend to go unvisited after they're created, a digest generation Skill for weekly Slack posting is provided. When run without arguments, it retrieves the last 7 days of aggregates (team totals, who used which skills, heavy users, MCP usage) from Logs Insights and assembles a Slack message. When run with arguments, it operates as a natural language usage analysis (e.g., "analyze why costs spiked last week").

A key feature is that it includes AI insights beyond just aggregated values. Before generating insights, it fetches the Claude Code official documentation (changelog, etc.) live via WebFetch, and proposes a flow of "observed usage → latest official features → next steps," making the digest itself a driver of team adoption. Note that the digest only includes positive framing that celebrates active users and skills; it does not call out inactive users by name (the purpose is horizontal adoption, not surveillance).
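The aggregation-to-message step can be pictured with a small formatter. This is a hypothetical layout in Slack mrkdwn, not the Skill's actual templates/digest-template.txt. It assumes the weekly totals and a list of (skill, run count) pairs have already been fetched. In keeping with the positive framing, it lists only what was used and never names inactive users.

```python
def format_digest(totals: dict, top_skills: list[tuple[str, int]], limit: int = 5) -> str:
    """Render a Slack mrkdwn digest from weekly aggregates (hypothetical layout)."""
    lines = [
        "*Claude Code weekly digest (last 7 days)*",
        f"Team cost: ${totals['cost']:.2f} | messages: {totals['messages']} | skill runs: {totals['skills']}",
        "",
        "*Most-used skills this week*",
    ]
    # Rank by run count; only skills that were actually used appear in the input
    ranked = sorted(top_skills, key=lambda item: item[1], reverse=True)[:limit]
    for rank, (name, count) in enumerate(ranked, start=1):
        lines.append(f"{rank}. {name}: {count} runs")
    return "\n".join(lines)
```

The AI-insights section would be appended after these lines; keeping the numeric part deterministic makes week-over-week comparisons easy to read.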

I considered scheduling it with Claude Desktop routines, but the IAM role used for the Logs Insights queries requires MFA, which doesn't work with headless execution, so for now I run it manually once a week.

Deployment

1. Deploy the CloudFormation Stack

This creates the LogGroups, the IAM User, and the dashboard.

cd 70_claude_code_otel_dashboard
aws cloudformation deploy \
  --stack-name demo-team-claude-otel-logs \
  --template-file cfn/claude-otel-logs.yml \
  --parameter-overrides ProjectName=demo-team LogRetentionDays=60 RawLogRetentionDays=1 \
  --capabilities CAPABILITY_NAMED_IAM \
  --region us-east-1 \
  --profile <your-profile>

2. Deploy the Sanitizer Lambda

Before deployment, confirm the allowlist behavior and absence of PII leakage with unit tests.

cd sam/claude-otel-sanitizer
pytest tests/ -v
sam build
sam deploy --profile <your-profile>

The Lambda execution logs (/aws/lambda/demo-team-claude-otel-sanitizer) are retained indefinitely by default with this SAM setup, so set a retention period after deployment.

aws logs put-retention-policy \
  --log-group-name /aws/lambda/demo-team-claude-otel-sanitizer \
  --retention-in-days 60 --region us-east-1 --profile <your-profile>

3. Enable Bearer Token Authentication and Issue API Key (Manual)

This step cannot be completed with CloudFormation and requires manual work in the console. A BearerTokenAuthenticationEnabled property does exist for LogGroups, but as of April 2026 enabling it sometimes failed with an invalid security token error, so this template has you enable it from the console instead.

  1. CloudWatch console (us-east-1) → Log groups → /demo-team/claude-otel → Actions → Enable bearer token authentication
  2. IAM console → Users → demo-team-claude-otel-writer → Security credentials → API keys → Create API key (note down the displayed secret, as it cannot be shown again)
  3. Save to SSM Parameter Store as SecureString /demo-team/claude-otel/bearer-token for team distribution

4. Claude Code Configuration for Each Member

Each member adds the following to the env section of ~/.claude/settings.json. Since it contains the Bearer Token, do not write this in the repository-side .claude/settings.json.

~/.claude/settings.json
{
  "env": {
    "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
    "OTEL_LOGS_EXPORTER": "otlp",
    "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
    "OTEL_EXPORTER_OTLP_ENDPOINT": "https://logs.us-east-1.amazonaws.com",
    "OTEL_EXPORTER_OTLP_HEADERS": "Authorization=Bearer <token>,x-aws-log-group=/demo-team/claude-otel,x-aws-log-stream=code",
    "OTEL_LOG_TOOL_DETAILS": "1",
    "OTEL_RESOURCE_ATTRIBUTES": "user.plan=pro"
  }
}

Setting OTEL_LOG_TOOL_DETAILS=1 records the actual skill name in claude_code.skill_activated, enabling per-skill aggregation in the dashboard. user.plan in OTEL_RESOURCE_ATTRIBUTES is optional; when set, a plan column appears in the dashboard.

To stop sending telemetry from a specific repository, override CLAUDE_CODE_ENABLE_TELEMETRY to 0 in that repository's .claude/settings.json. For customer-confidential repositories, it is safer to commit this override to the repository-side .claude/settings.json so that it applies to everyone on the team.
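Distributing the token and writing the env block by hand is error-prone, so a small setup script can help. This is a sketch under these assumptions: the token lives in the SSM SecureString from step 3, the caller passes in a boto3 `ssm` client, and the values mirror the settings.json example above. `fetch_bearer_token`, `build_otel_env`, and `merge_user_settings` are hypothetical helpers. The region check encodes the US-only constraint noted earlier.

```python
import json
from pathlib import Path
from typing import Optional

# Regions where the CloudWatch Logs OTLP endpoint accepts Bearer Token auth (per the Constraints section)
BEARER_TOKEN_REGIONS = frozenset({"us-east-1", "us-east-2", "us-west-1", "us-west-2"})


def fetch_bearer_token(ssm_client, name: str = "/demo-team/claude-otel/bearer-token") -> str:
    """Read the SecureString saved in step 3 using a boto3 "ssm" client."""
    return ssm_client.get_parameter(Name=name, WithDecryption=True)["Parameter"]["Value"]


def build_otel_env(token: str, region: str = "us-east-1", log_group: str = "/demo-team/claude-otel",
                   log_stream: str = "code", plan: str = "") -> dict:
    """Build the env block for ~/.claude/settings.json, mirroring the example above."""
    if region not in BEARER_TOKEN_REGIONS:
        raise ValueError(f"Bearer Token auth is not available in {region}")
    env = {
        "CLAUDE_CODE_ENABLE_TELEMETRY": "1",
        "OTEL_LOGS_EXPORTER": "otlp",
        "OTEL_EXPORTER_OTLP_PROTOCOL": "http/protobuf",
        "OTEL_EXPORTER_OTLP_ENDPOINT": f"https://logs.{region}.amazonaws.com",
        "OTEL_EXPORTER_OTLP_HEADERS": (
            f"Authorization=Bearer {token},x-aws-log-group={log_group},x-aws-log-stream={log_stream}"
        ),
        "OTEL_LOG_TOOL_DETAILS": "1",
    }
    if plan:
        # Optional: adds the plan column to the dashboard
        env["OTEL_RESOURCE_ATTRIBUTES"] = f"user.plan={plan}"
    return env


def merge_user_settings(env: dict, path: Optional[Path] = None) -> None:
    """Merge env into a settings file, keeping any other existing settings."""
    path = path or Path.home() / ".claude" / "settings.json"
    settings = json.loads(path.read_text()) if path.exists() else {}
    settings.setdefault("env", {}).update(env)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2) + "\n")
```

Only the user-level file should ever receive the token. For the per-repository opt-out, the same merge could be pointed at `<repo>/.claude/settings.json` with `{"CLAUDE_CODE_ENABLE_TELEMETRY": "0"}`, which contains no secret and is safe to commit.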

What I Tried

Ingesting Validation Data

For validation, I injected OTel events built from the last 7 days of my own Claude Code usage history (local session logs) into the Raw LogGroup to exercise the entire pipeline. Email addresses were replaced with a sample value (kasama@example.com).

Verifying Sanitizer Behavior

After injection into the Raw LogGroup, the events that passed the allowlist were written to the Sanitized LogGroup within a few seconds. Events on the Sanitized side look like the following, confirming that tool details and prompt-related fields have been dropped.

{
  "body": "claude_code.api_request",
  "attributes": {
    "user.email": "kasama@example.com",
    "session.id": "<session-uuid>",
    "model": "claude-opus-4-8",
    "cost_usd": "0.1000455",
    "input_tokens": "10",
    "cache_read_tokens": "190383"
  },
  "resource": {
    "attributes": {
      "user.plan": "max-5x",
      "service.name": "claude-code",
      "service.version": "2.1.121"
    }
  }
}

Verifying the Dashboard

Open the dashboard from the URL in the CloudFormation stack's Outputs. The first text widget explains how to read each widget and what the columns mean (for example, how to interpret cost_per_msg_usd and cache_pct), so anyone who opens the dashboard can understand it on the spot.
[Screenshot: daily_cost_per_user]

Team totals shows the team-wide totals.

[Screenshot: team_totals]

Daily cost per user shows the daily cost trend per user (only the top 10 users by cost are shown for readability).
[Screenshot]

Per-user activity shows each user's cost, session count, message count, skill / subagent / MCP count, model-specific invocations, and cache rate in a single row. In the validation data, cost came to approximately $888 over 7 days (the API-equivalent cost on the Max plan; because Max is a flat-rate subscription, this is not the amount actually billed), and cache_pct was 99.9%, showing that prompt caching was effective.

[Screenshot: per_user_activity]
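The article does not spell out how cache_pct and cost_per_msg_usd are computed. One plausible definition, which is an assumption rather than the dashboard's actual formula, is the share of input-side tokens read from cache and the cost divided by user prompts. Applied to the token counts in the sample Sanitized event above, it gives a value in line with the 99.9% reported here.

```python
def cache_pct(input_tokens: int, cache_read_tokens: int, cache_creation_tokens: int = 0) -> float:
    """Share of input-side tokens served from the prompt cache, in percent (assumed definition)."""
    total = input_tokens + cache_read_tokens + cache_creation_tokens
    return 100.0 * cache_read_tokens / total if total else 0.0


def cost_per_msg_usd(cost_usd: float, messages: int) -> float:
    """Average API-equivalent cost per user prompt; 0 when there were no prompts."""
    return cost_usd / messages if messages else 0.0


if __name__ == "__main__":
    # Token counts from the sample Sanitized event earlier in this article
    print(f"{cache_pct(input_tokens=10, cache_read_tokens=190383):.2f}%")  # 99.99%
```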

These are the Top skills / subagents / MCP servers / models rankings. Daily-note and PR-creation skills rank at the top, subagent use centers on general-purpose, and the most-used MCP servers are Slack and Google Calendar, which directly reflects my own usage patterns.

[Screenshots: top_skills, top_subagents, top_mcp_servers, top_models]

Posting the Weekly Digest to Slack

Running /claude-usage-digest without arguments aggregates the last 7 days with Logs Insights, fetches the official changelog via WebFetch, and assembles a Slack message. For this test, I sent it to my own DMs via Slack MCP. Here is the message as actually posted.

[Screenshot: slack_weekly_digest]

Because the "to make more use of it" part of the AI insights is refreshed weekly from the official changelog, the digest also serves to keep the team informed about new features.

Cost

The following figures were measured over about one month of operation (31 days, 1,238 sessions, 7,121 messages) by a 15-person team. Infrastructure costs (ingestion, Lambda, storage) came to just under $0.20 per month.

Item | Actual (15 people, 31 days)
Ingestion (Raw + Sanitized + Lambda logs) | $0.17
Storage (Sanitized and others) | $0.001
Lambda (approx. 40,000 executions, mostly within the free tier) | $0.008
Subtotal | approx. $0.18

The main cost item is ingestion, which is billed twice for Raw and Sanitized (the cost of the two-stage configuration; Lambda stays within the free tier even with many executions).

Logs Insights scan fees ($0.005/GB) apply separately. This is a variable cost: each time the dashboard is opened, every widget scans the data in its time range, so the scanned volume is roughly "number of widgets × data volume in the time range." For the 15-person, one-month scenario above, it came to $0.26 per month. As the team grows and the dashboard is opened more often, this can reach tens of dollars per month depending on assumptions (e.g., 100 people each opening it once a week with the default 31-day range). Shortening the time range (for example, from the default 31 days to 7 days) is the most effective way to cut this cost. Because there is no collector to keep running, fixed costs are nearly zero and the whole setup is pay-as-you-go.
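The scan-cost model above reduces to a one-line formula, which makes it easy to see why the time range matters most. Here is a sketch using the quoted $0.005/GB price. `monthly_scan_cost` is a hypothetical helper, and it assumes a roughly constant daily ingest volume into the Sanitized LogGroup.

```python
SCAN_PRICE_USD_PER_GB = 0.005  # Logs Insights scan price quoted above


def monthly_scan_cost(widgets: int, gb_per_day: float, range_days: int,
                      views_per_month: int, price_per_gb: float = SCAN_PRICE_USD_PER_GB) -> float:
    """Estimate dashboard scan cost: every view makes each widget scan the whole time range."""
    gb_per_view = widgets * gb_per_day * range_days
    return gb_per_view * views_per_month * price_per_gb
```

With purely hypothetical inputs (10 widgets, 0.5 GB/day, 20 views per month), a 31-day range costs about $15.50 per month, while a 7-day range costs about $3.50. Cost scales linearly with `range_days`, so narrowing the range cuts it proportionally.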

Closing

I set up a system that sends Claude Code's OTel telemetry directly to the CloudWatch Logs OTLP endpoint, strips sensitive data with a Sanitizer Lambda, visualizes team usage on a dashboard, and generates a weekly Slack digest with a Skill. Eliminating the collector keeps fixed costs near zero, the allowlist approach keeps sensitive data from being retained, and having the digest "arrive" instead of asking people to "go look at" the dashboard turns sharing the team's AI usage into an ongoing practice. I hope this is helpful.


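For Claude, leave it to Classmethod

Classmethod has a reseller agreement with Anthropic. Our service support pages bring together everything from product guides to industry-specific use cases and solutions for challenges at each phase. Please take a look, and feel free to contact us.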
