
AI co-authored 40% of the codebase: A retrospective on 155 commits over 4 months
This page has been translated by machine translation. View original
Introduction
I spent 4 months developing a RAG system for inquiry handling almost entirely on my own. It's a full-stack application built with Next.js + FastAPI + Amazon Bedrock Knowledge Bases. 155 commits, 39 design documents — 62 of those commits, or 40%, were collaborative work with Claude Code (an AI agent).
This isn't a story about "getting some help from AI on a few things" — it's a record of 4 months of collaborating with AI across every phase: design, implementation, debugging, and refactoring. I'll look back at what AI-assisted development was actually like, through the lens of the commit history.
The Big Picture in Numbers
First, here are the facts extracted from the git history.
| Item | Value |
|---|---|
| Period | 2026-03-12 to 2026-06-18 (approx. 14 weeks) |
| Total commits | 155 |
| Claude co-authored commits | 62 (40%) |
| Human-only commits | 76 (49%) |
| Dependabot (automated) | 17 (11%) |
Design documents (.plans/) |
39 files |
| Python source code | approx. 4,500 lines |
| TypeScript/TSX source code | approx. 3,000 lines |
| Commit type breakdown | feat: 37, fix: 27, chore: 29, build: 27, docs: 9, refactor: 8 |

Whether a commit was AI co-authored is determined mechanically by checking whether the commit message trailer contains Co-Authored-By: Claude. Claude Code automatically appends this trailer at commit time, so there are no gaps from manual recording errors.
Opus → Sonnet: AI Model Switches Preserved in History
The breakdown of the 62 AI co-authored commits was as follows.
| Model | Commit count | Period |
|---|---|---|
| Claude Opus 4.6 | 13 | 2026-03-12 to 03-18 (first week) |
| Claude Sonnet 4.6 | 49 | 2026-03-24 to 06-18 (everything after) |
Development started with Opus (the top-tier model) for the first week, then switched to Sonnet.
What's interesting is that this switch is naturally recorded in the commit history. Since only the model name in the Co-Authored-By trailer changes, looking back you can clearly see "ah, this is where the model changed."
Controlling Co-Authored-By: Switchable via settings.json
This Co-Authored-By trailer can actually be controlled via ~/.claude/settings.json (or .claude/settings.json in the project root).
{
"attribution": {
"commit": "",
"pr": ""
}
}
commit specifies the trailer text added to commit messages, and pr specifies the trailer text when creating PRs. Setting an empty string "" disables the trailer. By default, Co-Authored-By: Claude <noreply@anthropic.com> is added automatically.
Some teams may want to explicitly mark which commits involved AI, while others may prefer not to distinguish them from regular commits. This setting allows flexible switching.
On this project I disabled it partway through using this setting, but for retrospective analysis it would have been more useful to keep the trailers. Being able to quantitatively analyze the reality of AI co-development after the fact is only possible because of those trailers.
The .plans/ Directory: 39 Design Sessions with AI
The most unique artifact in this project is the 39 design documents accumulated in the .plans/ directory.
These are design documents created in "planning mode" with Claude Code before implementing each feature. Each document follows a consistent structure.
# Title
**Status:** Complete
**Date:** 2026-XX-XX
## Background
Why this change is needed. Current problems.
## Approach
How to solve it. Rationale for technical choices.
## Changes
What specifically is being changed and how.
Here are a few examples.
Example 1: Structured Output via Bedrock Tool Use
There was a bug where, in places that parsed LLM output using json.loads + regex, the token limit was being hit due to the high number of Japanese tokens, causing JSON to be truncated mid-output. In a design session with Claude Code, we switched to an approach using toolConfig + toolChoice in the Bedrock Converse API to guarantee valid JSON at the schema level.
Example 2: Direct OpenSearch NextGen Queries
In response to a bug where Bedrock Knowledge Bases' Retrieve API was returning 403 errors on OpenSearch NextGen (an AWS-side issue), we implemented a bypass using opensearch-py to perform KNN searches directly. The design uses the existing Bedrock KB index as-is and routes traffic via a single environment variable switch.
These design documents aren't "throwaway" artifacts — they function as a log of the project's decision-making. Why this approach was chosen, what other options existed, what should be reconsidered in the future — they serve a role similar to ADRs (Architecture Decision Records).
Where AI Helped: Analysis by Pattern
Categorizing the 62 AI co-authored commits reveals where AI was particularly strong.
1. Initial Implementation of Full-Stack Features (feat type: approx. 20 commits)
Where AI showed the most power was in implementing backend API + frontend UI + tests all in one go.
For example, the following were completed in a single work session:
- Adding a FastAPI endpoint
- Defining Pydantic models
- Creating React components on the frontend
- Adding SSE events
- Updating the OpenAPI spec
Doing this as a single human means your concentration gets worn down by context switching (bouncing between Python → TypeScript → CSS), but AI can traverse all layers while maintaining context.
2. Debugging: Forming and Testing Hypotheses (fix type: approx. 15 commits)
When you hand AI a "investigate this bug" task, it reads the codebase, forms multiple hypotheses, and starts verifying from the most likely one. In cases like the SSE streaming issue mentioned earlier — where what looked like one problem turned out to be a compound of four independent causes — where a human alone would keep cycling through "fixed it... wait, no I didn't," AI was able to analyze it structurally.
3. Refactoring and Migration (refactor type: approx. 8 commits)
boto3 → anthropic SDK migration, setting up Docker infrastructure, removing legacy code — these "replace without breaking" tasks are AI's strength. Swapping out internal implementations while maintaining existing interfaces requires accurately understanding the scope of impact, and AI can take the entire codebase into context while working.
4. Documentation and Configuration (docs/chore type: approx. 15 commits)
Auto-generating OpenAPI specs, updating user manuals, cleaning up .env, optimizing Dockerfiles — the barrier to asking AI to handle these "should do but easy to put off" tasks is extremely low.
Where AI Struggled
On the other hand, there were situations where handing things to AI didn't go as expected.
1. Company-Specific Domain Knowledge
Internal rules like "exclude tickets if their category ID is a specific value" are things AI simply won't know unless you tell it. In early data pipeline implementations, aligning on this kind of domain logic took time.
2. Troubleshooting Infrastructure and Network Issues
SOCKS proxy configuration via VPN, EC2 instance role permissions, nginx buffering behavior — environment-dependent issues that you can only understand by actually running things cannot be solved by AI alone. It ends up becoming an interactive debugging session where you show it logs.
3. Fine-Grained UI Adjustments
Sensory UI tweaks like "this button placement feels slightly off..." or "this color tone..." have a high cost to verbalize, and there were many cases where it was faster to just fix them myself.
Actual Development Speed
Looking at the monthly commit distribution, you can see the rhythm of development.
| Month | Commit count | Main work |
|---|---|---|
| March | 24 | Prototype construction, data pipeline foundation |
| April | 55 | UI overhaul, SSE streaming |
| May | 60 | Structured output, SDK migration, OpenAPI setup |
| June | 16 | OpenSearch NextGen migration, API Gateway integration |

The plan was March for PoC and technical validation (0.4 person-months), then full development from April (0.8 person-months), but commit counts surged in April and May. This coincides with the period when I got the hang of the AI co-development tempo.
Particularly noteworthy is the fact that a single engineer shipped 14 versions across a full stack (frontend + backend + data pipeline + infrastructure + documentation) in 4 months. Without AI co-development, I don't think this pace would have been possible.
The AI Co-Development Workflow
Here's what the actual development flow looked like.

1. Planning Phase
Human: "I want to add a ○○ feature. Requirements are △△"
Claude Code (plan mode): Creates design document with background, approach, and changes
Human: Reviews and gives revision instructions
→ Saved as .plans/xxx.md
2. Implementation Phase
Claude Code: Implements code based on design document
Human: Verifies behavior and gives feedback
Claude Code: Makes corrections and additional implementations
→ Commit (Co-Authored-By: Claude trailer added automatically)
3. Review Phase
Human: Reviews git diff
Claude Code: Makes corrections based on code review feedback
→ Final commit
What's important is that the human always makes the final judgment on reviews and merges. AI makes proposals and handles implementation, but the "this is good" decision is made by the human.
CLAUDE.md: "Development Guidelines" for AI
The project root's CLAUDE.md contains development rules for the AI.
## LLM Output Structured Format
Always use Bedrock tool use to enforce structured JSON output.
Never use regex or json.loads to parse raw LLM text responses.
## Scope Rules
- Never generate code for features not explicitly requested.
- Never add fields to Pydantic models unless referenced in both route and frontend.
- If a function has no callers after implementation, delete it before committing.
This is the same concept as handing coding guidelines to a human developer. Because AI has a tendency to "helpfully" add extra features or generate unused helper functions, explicitly constraining scope maintains quality.
By accumulating lessons learned from past failures here, we prevent the same mistakes from recurring.
Conclusion
Looking back on 4 months of AI co-development, there are a few things I can say with confidence.
What AI changed:
- The elimination of context switching — In work that spans Python ↔ TypeScript ↔ CSS ↔ Docker, AI completes the implementation before human concentration gives out
- The elimination of "it's a hassle, I'll do it later" — Documentation, tests, configuration file cleanup, and other work that would be better done but tends to get put off can be easily requested
- Design documentation — The 39 files in
.plans/are artifacts that wouldn't have existed without dialogue with AI. They're records that wouldn't have survived with the old way of deciding "let's do it this way" verbally and jumping into implementation
What AI didn't change:
- Design decisions are still human work — What to build, why to build it, and which option to choose are all decisions made by humans
- Domain knowledge still needs to be taught — Company-specific rules and tacit knowledge require the cost of verbalizing and conveying them to AI
- Reviews can't be skipped — Whether code was written by AI or a human, the importance of review doesn't change
The 62/155 figure is not a record of "handing things off to AI" — it's a record of "going through the design, implementation, and review cycle 155 times together with AI."