I Tried Having Claude Code's Responses Read Aloud Locally with Kokoro TTS

2026.03.24
Introduction
I'm kasama from the Data Business Division.

I recently watched the movie Project Hail Mary. In one scene, the protagonist Grace communicates with an alien called "Rocky." Rocky communicates through sound waves, so Grace records and analyzes Rocky's sounds and writes a program that maps them to meanings. I was especially impressed by how he eventually generates speech from text in order to talk to Rocky.
https://projecthm.movie/
While watching this, I thought, "Maybe Claude Code's responses could also be converted from text to speech." In this article, I'll introduce a setup that combines Claude Code's Stop hook with Kokoro TTS to read each response aloud locally once it is complete. Speech synthesis runs on Apple Silicon.
Prerequisites
Environment
macOS (Apple Silicon / M1 or later)
Claude Code installed
Python 3.12 or later
uv (Python package manager) installed
What are Claude Code Hooks?
Claude Code Hooks let you run shell commands at specific lifecycle events (before or after tool execution, when a response completes, and so on). They are defined in the hooks field of ~/.claude/settings.json.
The Stop event used here fires when Claude Code finishes responding, and the hook receives JSON like the following on standard input:
{
  "session_id": "abc123",
  "transcript_path": "~/.claude/projects/.../00893aaf.jsonl",
  "cwd": "/Users/...",
  "hook_event_name": "Stop",
  "stop_hook_active": true,
  "last_assistant_message": "Refactoring completed. ..."
}
The last_assistant_message field contains Claude's final response text, which we'll pass to the TTS engine for reading aloud.
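As a minimal sketch of how a hook script might read this payload, the snippet below parses standard input and returns the last_assistant_message text. The function name read_last_message is my own for illustration, and returning an empty string on malformed input is an assumption rather than what say-response.py necessarily does.

```python
import io
import json
import sys


def read_last_message(stream=sys.stdin) -> str:
    """Return the assistant's final text from a Stop hook payload, or ""."""
    try:
        payload = json.load(stream)
    except json.JSONDecodeError:
        return ""  # assumption: stay silent on malformed input
    if not isinstance(payload, dict):
        return ""
    message = payload.get("last_assistant_message")
    return message.strip() if isinstance(message, str) else ""


# Example with an in-memory stream instead of real stdin.
sample = io.StringIO('{"hook_event_name": "Stop", "last_assistant_message": " Refactoring completed. "}')
print(read_last_message(sample))
```

Accepting the stream as a parameter makes the function easy to try without launching Claude Code.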
Please refer to the official documentation for Hooks:
https://code.claude.com/docs/en/hooks#stops
Kokoro TTS and mlx-audio
Kokoro is a lightweight TTS model with 82M parameters. Through the mlx-audio library it runs on Apple's MLX framework, which uses Apple Silicon's unified memory so that data does not have to be copied between the CPU and GPU. We'll use the Japanese male voice jm_kumo.
https://huggingface.co/mlx-community/Kokoro-82M-bf16
Process Flow
When Claude Code completes its response, the following steps run in order:
Claude Code completes its response and the Stop hook fires
If the environment variable CLAUDE_VOICE=1 is set, the TTS script runs asynchronously
The TTS script removes markdown, converts English to katakana, and synthesizes speech
TTS runs asynchronously with "async": true, so it doesn't block Claude Code operations.
Implementation
The implementation code is available on GitHub.
https://github.com/cm-yoshikikasama/blog_code/tree/main/68_claude_code_tts_hook
68_claude_code_tts_hook/
├── hooks/
│   ├── kokoro-tts/
│   │   ├── pyproject.toml       # uv project definition (TTS dependencies)
│   │   └── uv.lock              # Fixed dependency versions
│   └── say-response.py          # TTS main script
├── settings-example.json        # settings.json example
└── README.md
settings.json
https://github.com/cm-yoshikikasama/blog_code/blob/main/68_claude_code_tts_hook/settings-example.json
In settings-example.json, we define a hook for TTS reading on the Stop event. By combining a conditional branch that only executes when the environment variable CLAUDE_VOICE is 1 with asynchronous execution via "async": true, you can continue operating Claude Code while voice synthesis is in progress.
CLAUDE_VOICE is not an official Claude Code environment variable; it is a flag I defined myself. Enable it by launching with CLAUDE_VOICE=1 claude. Without the flag there is no voice output, which keeps multiple sessions from all speaking at once. The model is also downloaded the first time TTS runs, so I recommend enabling it only when needed rather than leaving it on all the time.
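The check itself lives in the hook command in settings-example.json. Purely as an illustration, the same opt-in test could be expressed in Python like this; the helper name voice_enabled is hypothetical, and placing the check inside the script is my assumption, not the article's implementation.

```python
import os


def voice_enabled(env=os.environ) -> bool:
    """True only when the self-defined CLAUDE_VOICE flag is exactly "1"."""
    return env.get("CLAUDE_VOICE") == "1"


if __name__ == "__main__":
    # A script could exit quietly here when the flag is not set.
    print("voice on" if voice_enabled() else "voice off")
```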
TTS Script
https://github.com/cm-yoshikikasama/blog_code/blob/main/68_claude_code_tts_hook/hooks/say-response.py
The say-response.py script operates in four steps.
Step 1 reads the JSON passed by the Stop hook from standard input and retrieves Claude's response text from the last_assistant_message field.
Step 2, markdown removal, uses regular expressions to strip code blocks, tables, list syntax, heading symbols, parenthetical notes, URLs, and so on. An in_code flag tracks where each code block starts and ends, and everything inside a block is skipped, because Claude's responses often contain code and reading it aloud is not useful. Finally, the text is truncated to the first 600 characters.
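A rough sketch of this kind of cleanup is shown below. The regular expressions are my own approximations and not the ones in say-response.py; only the in_code flag and the 600-character limit come from the description above.

```python
import re

MAX_CHARS = 600  # the article trims the text to the first 600 characters


def strip_markdown(text: str, limit: int = MAX_CHARS) -> str:
    """Drop code blocks, tables, and common markdown syntax, then truncate."""
    kept = []
    in_code = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("```"):
            in_code = not in_code  # toggles on both opening and closing fences
            continue
        if in_code or stripped.startswith("|"):
            continue  # skip code lines and table rows entirely
        stripped = re.sub(r"^#{1,6}\s+", "", stripped)                # heading symbols
        stripped = re.sub(r"^(?:[-*+]|\d+\.)\s+", "", stripped)       # list markers
        stripped = re.sub(r"\[([^\]]*)\]\([^)]*\)", r"\1", stripped)  # links -> label only
        stripped = re.sub(r"https?://\S+", "", stripped)              # bare URLs
        stripped = re.sub(r"[((][^()()]*[))]", "", stripped)      # parenthetical notes
        stripped = re.sub(r"[*`]+", "", stripped)                     # bold/italic stars, backticks
        stripped = re.sub(r"\s+", " ", stripped).strip()              # collapse leftover spaces
        if stripped:
            kept.append(stripped)
    return " ".join(kept)[:limit]
```

Processing line by line keeps the code-block tracking simple: a fence line only flips the flag, and no regex ever sees the code itself.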
Step 3, English to katakana conversion, improves pronunciation accuracy in Japanese TTS. The CUSTOM dictionary defines readings for technical terms, and longer entries are replaced first (so CloudFormation is not split into Cloud + Formation). General English words that are not in the dictionary are converted to katakana with the alkana library.
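The longest-first replacement can be sketched as follows. The dictionary entries here are illustrative examples I made up, not the real CUSTOM contents, and the function name to_katakana is hypothetical. The fallback assumes alkana exposes get_kana(word), which returns None for unknown words as far as I know; if alkana is not installed, the sketch simply leaves such words unchanged.

```python
import re

# Illustrative entries only; the real CUSTOM dictionary lives in say-response.py.
CUSTOM = {
    "CloudFormation": "クラウドフォーメーション",
    "Cloud": "クラウド",
    "AWS": "エーダブリューエス",
    "CDK": "シーディーケー",
}

try:
    import alkana  # optional dependency

    def _lookup(word):
        return alkana.get_kana(word.lower())
except ImportError:
    def _lookup(word):
        return None  # without alkana, unknown words stay as they are


def to_katakana(text, custom=CUSTOM, lookup=_lookup):
    """Replace dictionary terms (longest first), then remaining English words."""
    for term in sorted(custom, key=len, reverse=True):
        text = text.replace(term, custom[term])

    def convert(match):
        # Keep the original word when the lookup has no reading for it.
        return lookup(match.group(0)) or match.group(0)

    return re.sub(r"[A-Za-z]+", convert, text)
```

Sorting the keys by length means CloudFormation is consumed before the shorter Cloud entry gets a chance to match part of it.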
In Step 4, we load the Kokoro model with mlx-audio's load_model and generate audio with the generate method. Parameters like model ID and voice name are managed by constants at the top of the file. The generated WAV is written to a temporary file and played using the macOS standard afplay command. After playback, a threading.Thread automatically deletes the temporary file.
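The playback and cleanup part can be sketched with the standard library alone. Synthesis with mlx-audio is deliberately left out; write_silence_wav is a hypothetical stand-in that writes a short silent WAV in place of generated audio, and play_then_delete is my own name. Running playback inside the thread as well is a choice made for this sketch and may differ from say-response.py.

```python
import os
import subprocess
import tempfile
import threading
import wave


def write_silence_wav(seconds: float = 0.1, rate: int = 24000) -> str:
    """Stand-in for synthesized audio: write a silent mono 16-bit WAV file."""
    fd, path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(rate)   # sample rate chosen for this placeholder
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return path


def play_then_delete(path: str, player=("afplay",)) -> threading.Thread:
    """Play the file in a background thread, then remove it."""

    def worker():
        try:
            subprocess.run([*player, path], check=False)
        except OSError:
            pass  # player not available (for example, not on macOS)
        finally:
            try:
                os.unlink(path)
            except FileNotFoundError:
                pass

    thread = threading.Thread(target=worker)
    thread.start()
    return thread
```

Because the thread is not a daemon thread, the interpreter waits for playback and cleanup to finish before the script exits.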
uv Project
https://github.com/cm-yoshikikasama/blog_code/blob/main/68_claude_code_tts_hook/hooks/kokoro-tts/pyproject.toml
pyproject.toml pins misaki[ja]==0.7.4. Misaki is the phoneme conversion library that Kokoro uses internally, and versions 0.8 and later have compatibility issues with unidic, so the version is pinned. espeakng-loader is required as a backend for phonemizer. alkana is the library used for English to katakana conversion.
Setup
1. Clone the Repository and Place Files
Clone the repository from GitHub and copy the hook files into ~/.claude/hooks/.
git clone https://github.com/cm-yoshikikasama/blog_code.git
mkdir -p ~/.claude/hooks
cp -r blog_code/68_claude_code_tts_hook/hooks/* ~/.claude/hooks/
2. Install Dependencies
cd ~/.claude/hooks/kokoro-tts
uv sync
3. Edit settings.json
Merge the contents of settings-example.json into the hooks field of your ~/.claude/settings.json.
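If you would rather not merge by hand, a small helper along these lines could do it. This is a sketch under assumptions: it assumes settings-example.json has a top-level hooks object keyed by event name, each holding a list of entries, and the function names merge_hooks and merge_files are my own. Back up your settings.json before trying it.

```python
import json
from pathlib import Path


def merge_hooks(settings: dict, example: dict) -> dict:
    """Return a copy of settings with the example's hook entries appended."""
    merged = json.loads(json.dumps(settings))  # simple deep copy for JSON data
    hooks = merged.setdefault("hooks", {})
    for event, entries in example.get("hooks", {}).items():
        existing = hooks.setdefault(event, [])
        for entry in entries:
            if entry not in existing:  # skip exact duplicates so reruns are safe
                existing.append(entry)
    return merged


def merge_files(settings_path: Path, example_path: Path) -> None:
    """Merge the example file into the settings file in place."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    example = json.loads(example_path.read_text())
    merged = merge_hooks(settings, example)
    settings_path.write_text(json.dumps(merged, indent=2, ensure_ascii=False) + "\n")
```

For example, merge_files(Path.home() / ".claude" / "settings.json", Path("settings-example.json")) would append the Stop hook while leaving existing hooks and other settings untouched.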
4. Set Environment Variable
To enable TTS, specify the environment variable when launching Claude Code.
CLAUDE_VOICE=1 claude
This way, TTS will only be enabled for that session.
Trying It Out
Verification
Let's ask Claude Code a question with CLAUDE_VOICE=1 set.
CLAUDE_VOICE=1 claude
Here's a video of my test. I think Kokoro TTS's Japanese voice is natural and easy to understand.

For technical terms, abbreviations like AWS and CDK defined in the CUSTOM dictionary are correctly pronounced in katakana. English words not registered in the dictionary are automatically converted with alkana, but proper nouns may not be read accurately. Add entries to the CUSTOM dictionary as needed.

https://youtu.be/Zqa14Yf5Gso
Conclusion
We're still far from Rocky's voice, but there are TTS systems that can generate a voice from voice samples, and I'd like to try those in the future. I also felt that conversing in English would give me more chances to engage with English in daily work, which could make this useful for language learning. The day Anthropic officially provides a voice response feature may be near, but until then I'd like to keep using this setup for dialogue-based tasks. Note that using it in multiple sessions at the same time makes the audio overlap, so I recommend enabling it in just one session.
If introducing voice models is difficult or restricted by company policies, you can also implement voice responses using macOS's standard say command. This would require extracting text suitable for reading by removing markdown and code blocks from response text and passing it to say, but it's easier to implement compared to setting up Kokoro TTS, so you might want to try that approach first.
The movie "Project Hail Mary" was also very interesting, so if you haven't seen it yet, please do.