AI co-authored 40% of the codebase: A retrospective on 155 commits over 4 months

62 of 155 commits (40%) were collaborative work with Claude Code. This article looks back at the reality of AI co-development through the actual git history of a RAG system built with Next.js and FastAPI over four months.
2026.06.25

Introduction

I spent 4 months developing a RAG system for inquiry handling almost entirely on my own. It's a full-stack application built with Next.js + FastAPI + Amazon Bedrock Knowledge Bases. 155 commits, 39 design documents — 62 of those commits, or 40%, were collaborative work with Claude Code (an AI agent).

This isn't a story about "getting some help from AI on a few things" — it's a record of 4 months of collaborating with AI across every phase: design, implementation, debugging, and refactoring. I'll look back at what AI-assisted development was actually like, through the lens of the commit history.

The Big Picture in Numbers

First, here are the facts extracted from the git history.

| Item | Value |
| --- | --- |
| Period | 2026-03-12 to 2026-06-18 (approx. 14 weeks) |
| Total commits | 155 |
| Claude co-authored commits | 62 (40%) |
| Human-only commits | 76 (49%) |
| Dependabot (automated) | 17 (11%) |
| Design documents (.plans/) | 39 files |
| Python source code | approx. 4,500 lines |
| TypeScript/TSX source code | approx. 3,000 lines |
| Commit type breakdown | feat: 37, fix: 27, chore: 29, build: 27, docs: 9, refactor: 8 |

[Figure: commit breakdown]

Whether a commit was AI co-authored is determined mechanically by checking whether the commit message trailer contains Co-Authored-By: Claude. Claude Code automatically appends this trailer at commit time, so there are no gaps from manual recording errors.
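The classification rule can be sketched in a few lines of Python. This is a minimal illustration, not the script used for this article: the function names `classify_commit` and `summarize` are hypothetical, and identifying Dependabot commits by author name is an assumption. Commit messages are passed in as plain strings, so how they are read from git is left open.

```python
from collections import Counter

TRAILER_PREFIX = "co-authored-by: claude"


def classify_commit(author: str, message: str) -> str:
    """Return 'dependabot', 'claude', or 'human' for one commit."""
    # Assumption: automated dependency bumps are recognizable by author name.
    if "dependabot" in author.lower():
        return "dependabot"
    # Case-insensitive, line-based check for the Co-Authored-By trailer.
    for line in message.splitlines():
        if line.strip().lower().startswith(TRAILER_PREFIX):
            return "claude"
    return "human"


def summarize(commits: list[tuple[str, str]]) -> dict[str, tuple[int, int]]:
    """Map each category to (count, rounded percentage of all commits)."""
    counts = Counter(classify_commit(author, msg) for author, msg in commits)
    total = sum(counts.values())
    return {kind: (n, round(100 * n / total)) for kind, n in counts.items()}
```

Fed 62 trailer-bearing commits, 76 plain commits, and 17 Dependabot commits, `summarize` reproduces the 40% / 49% / 11% split in the table.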

Opus → Sonnet: AI Model Switches Preserved in History

The breakdown of the 62 AI co-authored commits was as follows.

| Model | Commit count | Period |
| --- | --- | --- |
| Claude Opus 4.6 | 13 | 2026-03-12 to 03-18 (first week) |
| Claude Sonnet 4.6 | 49 | 2026-03-24 to 06-18 (everything after) |

Development started with Opus (the top-tier model) for the first week, then switched to Sonnet.

What's interesting is that this switch is naturally recorded in the commit history. Since only the model name in the Co-Authored-By trailer changes, looking back you can clearly see "ah, this is where the model changed."
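As a rough sketch of how the model switch can be read back out of the history, the snippet below pulls the model name from the trailer and records the first and last commit date per model. It assumes the trailer has the shape `Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>` and that dates are ISO strings (which sort correctly as text); `model_from_message` and `model_periods` are hypothetical names.

```python
import re
from typing import Iterable, Optional

# Assumed trailer shape: "Co-Authored-By: Claude <model> <email>"
TRAILER_RE = re.compile(
    r"^Co-Authored-By:\s*(Claude[^<\n]*?)\s*(?:<[^>\n]*>)?\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def model_from_message(message: str) -> Optional[str]:
    """Return the co-author name (e.g. 'Claude Opus 4.6'), or None."""
    match = TRAILER_RE.search(message)
    return match.group(1) if match else None


def model_periods(
    commits: Iterable[tuple[str, str]],
) -> dict[str, tuple[int, str, str]]:
    """Map model name to (commit count, first date, last date)."""
    periods: dict[str, tuple[int, str, str]] = {}
    for date, message in commits:
        model = model_from_message(message)
        if model is None:
            continue  # human-only or bot commit
        count, first, last = periods.get(model, (0, date, date))
        periods[model] = (count + 1, min(first, date), max(last, date))
    return periods
```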

Controlling Co-Authored-By: Switchable via settings.json

This Co-Authored-By trailer can actually be controlled via ~/.claude/settings.json (or .claude/settings.json in the project root).

{
  "attribution": {
    "commit": "",
    "pr": ""
  }
}

commit specifies the trailer text added to commit messages, and pr specifies the trailer text when creating PRs. Setting an empty string "" disables the trailer. By default, Co-Authored-By: Claude <noreply@anthropic.com> is added automatically.

Some teams may want to explicitly mark which commits involved AI, while others may prefer not to distinguish them from regular commits. This setting allows flexible switching.

On this project I disabled it partway through using this setting, but for retrospective analysis it would have been more useful to keep the trailers. Being able to quantitatively analyze the reality of AI co-development after the fact is only possible because of those trailers.

The .plans/ Directory: 39 Design Sessions with AI

The most distinctive artifact in this project is the set of 39 design documents accumulated in the .plans/ directory.

These are design documents created in plan mode with Claude Code before implementing each feature. Each document follows a consistent structure.

# Title
**Status:** Complete
**Date:** 2026-XX-XX

## Background
Why this change is needed. Current problems.

## Approach
How to solve it. Rationale for technical choices.

## Changes
What specifically is being changed and how.
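Because every plan shares this layout, the documents are easy to index mechanically. The following is a minimal sketch under the assumption that each file follows the template exactly (one `# ` title, `**Status:**` and `**Date:**` lines before the first `## ` section); `parse_plan` is a hypothetical helper, not part of the project.

```python
import re

FIELD_RE = re.compile(r"^\*\*(Status|Date):\*\*\s*(.+)$")


def parse_plan(text: str) -> dict:
    """Split a plan document into title, status, date, and named sections."""
    doc = {"title": None, "status": None, "date": None, "sections": {}}
    current = None  # name of the "## " section being collected
    for line in text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif current is None and (m := FIELD_RE.match(line)):
            doc[m.group(1).lower()] = m.group(2).strip()
        elif current is not None:
            doc["sections"][current].append(line)
    # Join collected lines and trim surrounding blank lines.
    doc["sections"] = {k: "\n".join(v).strip() for k, v in doc["sections"].items()}
    return doc
```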

Here are a few examples.

Example 1: Structured Output via Bedrock Tool Use

One bug came from code that parsed LLM output with json.loads plus regex: because Japanese text consumes many tokens, responses hit the token limit and the JSON was truncated mid-output. In a design session with Claude Code, we switched to an approach using toolConfig + toolChoice in the Bedrock Converse API to enforce valid JSON at the schema level.
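The general shape of this approach might look like the sketch below. It is not the project's code: the tool name `record_answer`, its schema, and the `ask_structured` helper are all hypothetical, and `client` is assumed to be a boto3 `bedrock-runtime` client. Forcing `toolChoice` to a single tool makes the model return its answer as the tool's input, which arrives as an already-parsed dict rather than raw text. The `stopReason` check is an added precaution for responses cut off by the token limit.

```python
from typing import Any

TOOL_NAME = "record_answer"  # hypothetical tool name

# Hypothetical schema for the structured answer.
ANSWER_SCHEMA: dict[str, Any] = {
    "type": "object",
    "properties": {
        "category": {"type": "string"},
        "summary": {"type": "string"},
    },
    "required": ["category", "summary"],
}


def build_tool_config(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Build a Converse toolConfig that forces the model to call one tool."""
    return {
        "tools": [{
            "toolSpec": {
                "name": name,
                "description": "Record the structured answer.",
                "inputSchema": {"json": schema},
            }
        }],
        "toolChoice": {"tool": {"name": name}},
    }


def ask_structured(client: Any, model_id: str, text: str) -> dict[str, Any]:
    """Call Converse and return the forced tool's input as a dict."""
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": text}]}],
        toolConfig=build_tool_config(TOOL_NAME, ANSWER_SCHEMA),
    )
    if response.get("stopReason") == "max_tokens":
        raise RuntimeError("response was cut off by the token limit")
    for block in response["output"]["message"]["content"]:
        tool_use = block.get("toolUse")
        if tool_use and tool_use["name"] == TOOL_NAME:
            return tool_use["input"]
    raise ValueError("model did not return the expected tool call")
```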

Example 2: Direct OpenSearch NextGen Queries

In response to a bug where Bedrock Knowledge Bases' Retrieve API was returning 403 errors on OpenSearch NextGen (an AWS-side issue), we implemented a bypass using opensearch-py to perform KNN searches directly. The design uses the existing Bedrock KB index as-is and routes traffic via a single environment variable switch.
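A bypass of this kind could be sketched as follows. The environment variable name `RETRIEVAL_BACKEND`, the helper names, and the vector field name `bedrock-knowledge-base-default-vector` are assumptions for illustration; the real index may use different names. `client` is assumed to be an opensearch-py `OpenSearch` instance, whose `search(index=..., body=...)` method runs the query.

```python
import os
from typing import Any, Mapping, Sequence

# Assumed field name; check the actual index mapping before relying on it.
DEFAULT_VECTOR_FIELD = "bedrock-knowledge-base-default-vector"


def use_direct_opensearch(env: Mapping[str, str] = os.environ) -> bool:
    """Single switch: RETRIEVAL_BACKEND=opensearch enables the bypass."""
    return env.get("RETRIEVAL_BACKEND", "bedrock").lower() == "opensearch"


def build_knn_query(
    vector: Sequence[float],
    k: int = 5,
    vector_field: str = DEFAULT_VECTOR_FIELD,
) -> dict[str, Any]:
    """Build an OpenSearch k-NN query body for the given embedding."""
    return {
        "size": k,
        "query": {"knn": {vector_field: {"vector": list(vector), "k": k}}},
    }


def search_direct(
    client: Any, index: str, vector: Sequence[float], k: int = 5
) -> list[dict[str, Any]]:
    """Run the k-NN query and return (score, source) pairs as dicts."""
    response = client.search(index=index, body=build_knn_query(vector, k))
    return [
        {"score": hit["_score"], "source": hit["_source"]}
        for hit in response["hits"]["hits"]
    ]
```

The calling code would check `use_direct_opensearch()` once per request and fall back to the Retrieve API otherwise, so the existing index stays untouched.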

These design documents aren't "throwaway" artifacts — they function as a log of the project's decision-making. Why this approach was chosen, what other options existed, what should be reconsidered in the future — they serve a role similar to ADRs (Architecture Decision Records).

Where AI Helped: Analysis by Pattern

Categorizing the 62 AI co-authored commits reveals where AI was particularly strong.

1. Initial Implementation of Full-Stack Features (feat type: approx. 20 commits)

AI was most effective when implementing backend API + frontend UI + tests all in one go.

For example, the following were completed in a single work session:

  • Adding a FastAPI endpoint
  • Defining Pydantic models
  • Creating React components on the frontend
  • Adding SSE events
  • Updating the OpenAPI spec

For a solo developer, context switching (bouncing between Python → TypeScript → CSS) wears down concentration, but AI can traverse all layers while maintaining context.
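To make one of these layers concrete, here is a small sketch of the SSE side using only the standard library. The event names `token` and `done` and the payload shape are hypothetical; the project's actual events are not shown in this article. In a FastAPI app, a generator like `stream_answer` would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json
from typing import Any, Iterable, Iterator


def format_sse(event: str, data: Any) -> str:
    """Encode one Server-Sent Events frame with a JSON payload."""
    # json.dumps escapes newlines, so the payload always fits on one data line.
    payload = json.dumps(data, ensure_ascii=False)
    return f"event: {event}\ndata: {payload}\n\n"


def stream_answer(chunks: Iterable[str]) -> Iterator[str]:
    """Yield one 'token' frame per text chunk, then a final 'done' frame."""
    for chunk in chunks:
        yield format_sse("token", {"text": chunk})
    yield format_sse("done", {})
```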

2. Debugging: Forming and Testing Hypotheses (fix type: approx. 15 commits)

When you hand AI an "investigate this bug" task, it reads the codebase, forms multiple hypotheses, and starts verifying from the most likely one. The SSE streaming issue is a good example: what looked like one problem turned out to be a compound of four independent causes. Where a human alone would keep cycling through "fixed it... wait, no I didn't," AI was able to analyze it structurally.

3. Refactoring and Migration (refactor type: approx. 8 commits)

The boto3 → anthropic SDK migration, setting up Docker infrastructure, removing legacy code: these "replace without breaking" tasks are AI's strength. Swapping out internal implementations while maintaining existing interfaces requires accurately understanding the scope of impact, and AI can take the entire codebase into context while working.

4. Documentation and Configuration (docs/chore type: approx. 15 commits)

Auto-generating OpenAPI specs, updating user manuals, cleaning up .env, optimizing Dockerfiles — the barrier to asking AI to handle these "should do but easy to put off" tasks is extremely low.

Where AI Struggled

On the other hand, there were situations where handing things to AI didn't go as expected.

1. Company-Specific Domain Knowledge

Internal rules like "exclude tickets if their category ID is a specific value" are things AI simply won't know unless you tell it. In early data pipeline implementations, aligning on this kind of domain logic took time.

2. Troubleshooting Infrastructure and Network Issues

SOCKS proxy configuration via VPN, EC2 instance role permissions, nginx buffering behavior — environment-dependent issues that you can only understand by actually running things cannot be solved by AI alone. It ends up becoming an interactive debugging session where you show it logs.

3. Fine-Grained UI Adjustments

Sensory UI tweaks like "this button placement feels slightly off..." or "this color tone..." have a high cost to verbalize, and there were many cases where it was faster to just fix them myself.

Actual Development Speed

Looking at the monthly commit distribution, you can see the rhythm of development.

| Month | Commit count | Main work |
| --- | --- | --- |
| March | 24 | Prototype construction, data pipeline foundation |
| April | 55 | UI overhaul, SSE streaming |
| May | 60 | Structured output, SDK migration, OpenAPI setup |
| June | 16 | OpenSearch NextGen migration, API Gateway integration |

[Figure: monthly commits]

The plan was PoC and technical validation in March (0.4 person-months), then full development from April (0.8 person-months). Commit counts surged in April and May, which coincides with the period when I got the hang of the AI co-development tempo.

Particularly noteworthy is the fact that a single engineer shipped 14 versions across a full stack (frontend + backend + data pipeline + infrastructure + documentation) in 4 months. Without AI co-development, I don't think this pace would have been possible.

The AI Co-Development Workflow

Here's what the actual development flow looked like.

[Figure: AI co-development workflow]

1. Planning Phase

Human: "I want to add a ○○ feature. Requirements are △△"
Claude Code (plan mode): Creates design document with background, approach, and changes
Human: Reviews and gives revision instructions
→ Saved as .plans/xxx.md

2. Implementation Phase

Claude Code: Implements code based on design document
Human: Verifies behavior and gives feedback
Claude Code: Makes corrections and additional implementations
→ Commit (Co-Authored-By: Claude trailer added automatically)

3. Review Phase

Human: Reviews git diff
Claude Code: Makes corrections based on code review feedback
→ Final commit

What's important is that the human always makes the final judgment on reviews and merges. AI makes proposals and handles implementation, but the "this is good" decision is made by the human.

CLAUDE.md: "Development Guidelines" for AI

The project root's CLAUDE.md contains development rules for the AI.

## LLM Output Structured Format
Always use Bedrock tool use to enforce structured JSON output.
Never use regex or json.loads to parse raw LLM text responses.

## Scope Rules
- Never generate code for features not explicitly requested.
- Never add fields to Pydantic models unless referenced in both route and frontend.
- If a function has no callers after implementation, delete it before committing.

This is the same concept as handing coding guidelines to a human developer. Because AI has a tendency to "helpfully" add extra features or generate unused helper functions, explicitly constraining scope maintains quality.

By accumulating lessons learned from past failures here, we prevent the same mistakes from recurring.

Conclusion

Looking back on 4 months of AI co-development, there are a few things I can say with confidence.

What AI changed:

  • The elimination of context switching — In work that spans Python ↔ TypeScript ↔ CSS ↔ Docker, AI completes the implementation before human concentration gives out
  • The elimination of "it's a hassle, I'll do it later" — Documentation, tests, configuration file cleanup, and other work that would be better done but tends to get put off can be easily requested
  • Design documentation — The 39 files in .plans/ are artifacts that wouldn't have existed without dialogue with AI. They're records that wouldn't have survived with the old way of deciding "let's do it this way" verbally and jumping into implementation

What AI didn't change:

  • Design decisions are still human work — What to build, why to build it, and which option to choose are all decisions made by humans
  • Domain knowledge still needs to be taught — Company-specific rules and tacit knowledge require the cost of verbalizing and conveying them to AI
  • Reviews can't be skipped — Whether code was written by AI or a human, the importance of review doesn't change

The 62/155 figure is not a record of "handing things off to AI" — it's a record of "going through the design, implementation, and review cycle 155 times together with AI."

