
If I ask Claude Code to implement the same task in C/C++/Rust/Zig, will there be differences in behavior across languages?

I compared LRU cache and Thread Pool implementations generated by Claude Code (Opus 4.6) in C, C++, Rust, and Zig, evaluating the generation quality, repair process, and code understanding across these languages.

越井琢巳 (Koshii Takumi)

2026.03.02


Introduction

Does the choice of programming language affect generation quality when an AI writes the code? Does Rust's borrow checker help the AI repair code, or does C's simplicity work in its favor? To find out, I had Claude Code (Opus 4.6) implement the same tasks in C, C++, Rust, and Zig and observed the differences between the languages.

What is Rust

Rust is a systems programming language initially developed at Mozilla. It is now developed by the community under the Rust Foundation. Its ownership system and borrow checker enforce memory safety and data-race freedom at compile time.

What is Zig

Zig is a systems programming language designed as an alternative to C. It keeps manual memory management but reinforces safety with compile-time computation (comptime) and runtime safety checks.

Target Audience

Software engineers interested in AI code generation
Developers interested in programming language safety mechanisms
Those wanting to know the practicality of Claude Code

Test Environment

OS: Windows 11 Home (Build 26200)
CPU: Intel Core i7-11700F, RAM: 32 GB
C/C++: MSVC v143 (Visual Studio 2022), CMake 4.2.1; the C implementation uses Win32 API threads (CreateThread)
Rust: rustc 1.93.1
Zig: 0.15.2
AI: Claude Code (Opus 4.6)

References

Claude Code
The Rust Programming Language
Zig Programming Language

Test Plan

If language safety mechanisms affect the AI, differences are most likely to appear in memory management and concurrent processing. These are the areas where the languages diverge most: C and C++ rely on manual management, Rust checks strictly at compile time, and Zig adds runtime checks.
I considered the following hypotheses:
1. Rust's borrow checker helps with repairs, minimizing the number of repair rounds
2. Conversely, the borrow checker gets in the AI's way, and rounds are spent going back and forth with compiler errors
3. C's simplicity works in the AI's favor, and the code passes on the first try even with manual memory management
4. There is little difference between languages, and generation quality is language-independent

Testing Method

For this test, I prepared two tasks:
Task A: LRU Cache
Focuses on memory management: tests ownership, lifetimes, and responsibility for releasing memory.
Task B: Thread Pool
Focuses on concurrent processing: tests shutdown, join, and condition variable synchronization.
For each task, I defined L1 tests (12 specification tests) and L2 stress tests (fixed seed, repeated execution). The AI was allowed a maximum of 5 repair rounds to get its generated code to pass these tests.
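A fixed-seed stress test might be structured along these lines. This is a hedged sketch rather than the article's L2 harness: the xorshift64 generator, the Op enum, the ModelLru reference model, and replay are all assumptions introduced for illustration.

```rust
/// xorshift64: a tiny deterministic PRNG, so a given seed always replays
/// the same operation sequence. (Hypothetical helper, not from the article.)
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Op {
    Put(i32, u8),
    Get(i32),
}

/// Generates `count` operations over keys in `0..key_space`.
pub fn generate_ops(seed: u64, count: usize, key_space: u64) -> Vec<Op> {
    assert!(key_space > 0, "key_space must be non-zero");
    let mut rng = XorShift64(seed.max(1)); // xorshift state must never be 0
    (0..count)
        .map(|_| {
            let r = rng.next_u64();
            let key = ((r >> 8) % key_space) as i32;
            if r & 1 == 0 {
                Op::Put(key, (r >> 32) as u8)
            } else {
                Op::Get(key)
            }
        })
        .collect()
}

/// Naive reference model: a Vec ordered from least to most recently used.
pub struct ModelLru {
    capacity: usize,
    items: Vec<(i32, u8)>,
}

impl ModelLru {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        ModelLru { capacity, items: Vec::new() }
    }

    /// Applies one operation; `Get` returns the stored value, if any.
    pub fn apply(&mut self, op: Op) -> Option<u8> {
        match op {
            Op::Put(key, value) => {
                self.items.retain(|&(k, _)| k != key);
                if self.items.len() == self.capacity {
                    self.items.remove(0); // evict the least recently used
                }
                self.items.push((key, value));
                None
            }
            Op::Get(key) => {
                let pos = self.items.iter().position(|&(k, _)| k == key)?;
                let item = self.items.remove(pos);
                self.items.push(item); // mark as most recently used
                Some(item.1)
            }
        }
    }
}

/// Replays the operation stream for `seed` and records every result.
pub fn replay(seed: u64, count: usize) -> Vec<Option<u8>> {
    let mut model = ModelLru::new(8);
    generate_ops(seed, count, 32)
        .into_iter()
        .map(|op| model.apply(op))
        .collect()
}
```

In a real harness, each operation would also be applied to the implementation under test and its result compared with the model's. Because the seed is fixed, a failing run can be replayed exactly.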

Control of Confounding Variables

To prevent learning effects from carrying over between languages, each language was implemented in an independent Claude Code session. Each session was given only the specifications, the test code, and the build commands.
I also ran comprehension tests on the completed code. If the readability of AI-generated code differs by language, differences should also appear when another AI later maintains that code. To check this, I had separate sessions read only the code, without the specifications, and assigned three tasks: providing an overview, identifying issues, and adding functionality.

Result 1: No Difference in Memory Management Tasks

Results for the LRU cache:

Metric	C	C++	Rust	Zig
First-pass result	All pass	All pass	All pass	All pass
Rounds to green	0	0	0	0
Compile-fix cycles	0	0	0	0
Stress survival	All pass	All pass	All pass	All pass

All 4 languages passed the L1 tests (12/12) on the first try and also passed the stress test (10,000 iterations). No repair rounds were needed. The choice of algorithm was the same in all 4 languages: the standard LRU cache pattern of a hash map plus a doubly linked list. Only Rust adapted the data structure to accommodate the borrow checker.

Rust's Index-Based List

In the C implementation, nodes are linked with pointers:
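#include <stddef.h>  /* size_t */
#include <stdint.h>  /* int32_t, uint8_t */

typedef struct Entry {
    int32_t key;
    uint8_t *value;
    size_t value_len;
    struct Entry *prev;       /* LRU list links */
    struct Entry *next;
    struct Entry *hash_next;  /* hash-bucket chain link */
} Entry;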
In contrast, the Rust implementation links nodes using Vec indices instead of pointers:
use std::collections::HashMap;

struct Node {
    key: i32,
    value: Vec<u8>,
    prev: usize,  // index instead of pointer
    next: usize,
}

pub struct LRUCache {
    capacity: usize,
    map: HashMap<i32, usize>,
    nodes: Vec<Node>,  // node arena
    head: usize,       // sentinel (index 0)
    tail: usize,       // sentinel (index 1)
}
Implementing a doubly linked list with plain references in Rust would require each node to be mutably reachable from both its predecessor and its successor, which the borrow checker rejects. The usual ways around this are Rc<RefCell<...>> or raw pointers in unsafe code. The AI sidestepped the problem by using Vec<Node> as an arena and indices as pseudo-pointers, and the implementation contains 0 unsafe blocks. This is a well-known pattern in the Rust community, and it is noteworthy that the AI chose it on its own.
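The arena approach can be sketched as follows. This is a minimal illustration, not the code Claude Code generated: the names LruCache, HEAD, TAIL, unlink, and push_front are assumptions, and the head and tail fields of the struct above are replaced here by two constants for the sentinel indices.

```rust
use std::collections::HashMap;

const HEAD: usize = 0; // sentinel on the most recently used side
const TAIL: usize = 1; // sentinel on the least recently used side

struct Node {
    key: i32,
    value: Vec<u8>,
    prev: usize,
    next: usize,
}

pub struct LruCache {
    capacity: usize,
    map: HashMap<i32, usize>, // key -> index into `nodes`
    nodes: Vec<Node>,         // arena; indices act as pseudo-pointers
}

impl LruCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        let sentinel = |prev, next| Node { key: 0, value: Vec::new(), prev, next };
        LruCache {
            capacity,
            map: HashMap::new(),
            // HEAD.next == TAIL and TAIL.prev == HEAD means the list is empty.
            nodes: vec![sentinel(HEAD, TAIL), sentinel(HEAD, TAIL)],
        }
    }

    /// Detaches node `i` from the list by rewiring its neighbours' indices.
    fn unlink(&mut self, i: usize) {
        let (p, n) = (self.nodes[i].prev, self.nodes[i].next);
        self.nodes[p].next = n;
        self.nodes[n].prev = p;
    }

    /// Inserts node `i` right after the HEAD sentinel.
    fn push_front(&mut self, i: usize) {
        let first = self.nodes[HEAD].next;
        self.nodes[i].prev = HEAD;
        self.nodes[i].next = first;
        self.nodes[first].prev = i;
        self.nodes[HEAD].next = i;
    }

    pub fn get(&mut self, key: i32) -> Option<&[u8]> {
        let i = *self.map.get(&key)?;
        self.unlink(i);
        self.push_front(i);
        Some(self.nodes[i].value.as_slice())
    }

    pub fn put(&mut self, key: i32, value: Vec<u8>) {
        if let Some(&i) = self.map.get(&key) {
            self.nodes[i].value = value;
            self.unlink(i);
            self.push_front(i);
            return;
        }
        let i = if self.map.len() == self.capacity {
            // Evict the least recently used node and reuse its slot.
            let lru = self.nodes[TAIL].prev;
            self.unlink(lru);
            self.map.remove(&self.nodes[lru].key);
            self.nodes[lru].key = key;
            self.nodes[lru].value = value;
            lru
        } else {
            self.nodes.push(Node { key, value, prev: HEAD, next: TAIL });
            self.nodes.len() - 1
        };
        self.map.insert(key, i);
        self.push_front(i);
    }
}
```

With capacity 2, inserting keys 1 and 2, reading key 1, and then inserting key 3 evicts key 2, because key 1 was touched more recently. Evicted slots are reused in place, so the Vec never grows beyond capacity + 2 entries.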

Result 2: Differences Appeared in Concurrent Processing Tasks

Results for the Thread Pool are shown below. The metrics are defined as follows:
Rounds to green: Number of repair rounds needed to pass all tests (generation → build → test → failure report → fix counts as 1 round)
Compile-fix cycles: Number of times a compile error stopped the process before the tests could run and had to be fixed
Regression count: Number of times a fix caused another test to newly fail

Metric	C	C++	Rust	Zig
First-pass result	T08 fail	Build error	T08 fail	Build error
Rounds to green	1	2	2	3
Compile-fix cycles	0	1	0	1
Regression count	0	0	1	0
Stress survival	All pass	All pass	All pass	All pass

In contrast to Task A, every language needed repairs: C took 1 round, C++ and Rust took 2 each, and Zig took 3.

Common Wall Across All Languages: Timing Race in Tests

Test T08 (shutdown is called while submit is blocking) failed in all 4 languages. The cause was not faulty implementation logic but a timing dependency in the test code. The AI generated mostly correct concurrent processing logic, but on the first try it could not correctly handle the constraint of waiting for submit to complete before calling shutdown.
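One way to remove this kind of timing dependency is to let the test observe that submit is really blocked, instead of sleeping and hoping. The sketch below is an assumption about what such a test could look like, not the article's T08: BoundedQueue, blocked_submitters, wait_for_blocked, and shutdown_unblocks_submit are hypothetical names, and the queue stores plain u32 values instead of jobs.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

struct State {
    queue: VecDeque<u32>,
    capacity: usize,
    shutting_down: bool,
    blocked_submitters: usize, // test hook: how many submit calls are waiting
}

pub struct BoundedQueue {
    state: Mutex<State>,
    not_full: Condvar,        // wakes blocked submitters
    blocked_changed: Condvar, // wakes a test waiting on the hook
}

impl BoundedQueue {
    pub fn new(capacity: usize) -> Self {
        BoundedQueue {
            state: Mutex::new(State {
                queue: VecDeque::new(),
                capacity,
                shutting_down: false,
                blocked_submitters: 0,
            }),
            not_full: Condvar::new(),
            blocked_changed: Condvar::new(),
        }
    }

    /// Blocks while the queue is full. Returns false if shutdown was requested.
    pub fn submit(&self, job: u32) -> bool {
        let mut s = self.state.lock().unwrap();
        while s.queue.len() >= s.capacity && !s.shutting_down {
            s.blocked_submitters += 1;
            self.blocked_changed.notify_all();
            s = self.not_full.wait(s).unwrap(); // releases the lock while waiting
            s.blocked_submitters -= 1;
        }
        if s.shutting_down {
            return false;
        }
        s.queue.push_back(job);
        true
    }

    /// Removes the oldest item, making room for one blocked submitter.
    pub fn pop(&self) -> Option<u32> {
        let job = self.state.lock().unwrap().queue.pop_front();
        if job.is_some() {
            self.not_full.notify_one();
        }
        job
    }

    pub fn shutdown(&self) {
        self.state.lock().unwrap().shutting_down = true;
        self.not_full.notify_all();
    }

    /// Test helper: returns once at least `n` submit calls are blocked.
    pub fn wait_for_blocked(&self, n: usize) {
        let mut s = self.state.lock().unwrap();
        while s.blocked_submitters < n {
            s = self.blocked_changed.wait(s).unwrap();
        }
    }
}

/// Hypothetical T08-style check: shutdown must unblock a blocked submit.
pub fn shutdown_unblocks_submit() -> bool {
    let q = Arc::new(BoundedQueue::new(1));
    assert!(q.submit(1)); // fill the queue
    let q2 = Arc::clone(&q);
    let blocked = thread::spawn(move || q2.submit(2));
    q.wait_for_blocked(1); // deterministic: no sleep-based guess
    q.shutdown();
    !blocked.join().unwrap() // the blocked submit should report failure
}
```

Because the counter is only read and written under the mutex, seeing blocked_submitters reach 1 means the submitting thread has already released the lock inside wait, so shutdown cannot run before submit is blocked. The cost is a test-only hook in the production type, which is a design trade-off rather than a requirement.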

Different Types of Failures by Language

The test-side race in T08 was common to all languages, but the other failures differed by language. C was done after a single round that fixed the test-side race, while the other 3 languages had additional problems:
C++: worker_loop was defined as a free function and could not access the private Impl struct, causing a compile error (C2248)
Zig: A compile error occurred because a Zig API change had moved std.time.sleep to std.Thread.sleep. In addition, returning ThreadPool by value left worker threads referencing dangling pointers, resulting in a runtime panic
Rust: The T08 fix caused a regression in T09, making Rust the only language in which a repair broke another test

Case Where Zig's Runtime Detection Helped AI

Zig's round 2 (R2) is an interesting case. Returning ThreadPool by value moves the struct to a different stack location, invalidating the pointers held by worker threads. Zig's runtime bounds check immediately reported this as an index out of bounds error, which allowed the AI to complete the fix in one round by placing SharedState on the heap.
pub const ThreadPool = struct {
    state: *SharedState,  // Pointer to heap (not value embedding)
    workers: []std.Thread,

    pub fn init(thread_count: usize, queue_capacity: usize) ThreadPool {
        const alloc = std.heap.page_allocator;
        // Place SharedState on heap to ensure stable address
        const state = alloc.create(SharedState) catch @panic("alloc failed");
        // ...
    }
};
In Rust, this kind of problem would be caught at compile time, whereas in C it could silently proceed as undefined behavior. Zig's runtime detection is a good example of catching errors early even in a language without compile-time guarantees.
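For comparison, the same fix can be sketched in Rust, where shared state is typically placed behind Arc so that its heap address stays stable no matter how the pool handle is moved. The code is an illustration under assumptions, not the generated implementation: SharedState and ThreadPool mirror the Zig names, while the shape of worker_loop, the run_jobs helper, and the unbounded queue are hypothetical simplifications.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

struct SharedState {
    queue: Mutex<(VecDeque<Job>, bool)>, // (pending jobs, shutdown requested)
    wake: Condvar,
}

pub struct ThreadPool {
    state: Arc<SharedState>, // heap allocation: unaffected by moves of ThreadPool
    workers: Vec<JoinHandle<()>>,
}

impl ThreadPool {
    pub fn new(thread_count: usize) -> ThreadPool {
        let state = Arc::new(SharedState {
            queue: Mutex::new((VecDeque::new(), false)),
            wake: Condvar::new(),
        });
        let workers: Vec<JoinHandle<()>> = (0..thread_count)
            .map(|_| {
                let state = Arc::clone(&state);
                thread::spawn(move || worker_loop(&state))
            })
            .collect();
        ThreadPool { state, workers } // returned by value; workers keep their Arc
    }

    pub fn submit<F: FnOnce() + Send + 'static>(&self, f: F) {
        self.state.queue.lock().unwrap().0.push_back(Box::new(f));
        self.state.wake.notify_one();
    }

    /// Lets workers drain the remaining jobs, then joins every worker.
    pub fn shutdown(self) {
        self.state.queue.lock().unwrap().1 = true;
        self.state.wake.notify_all();
        for worker in self.workers {
            worker.join().unwrap();
        }
    }
}

fn worker_loop(state: &SharedState) {
    loop {
        let job = {
            let mut guard = state.queue.lock().unwrap();
            loop {
                if let Some(job) = guard.0.pop_front() {
                    break job;
                }
                if guard.1 {
                    return; // queue is empty and shutdown was requested
                }
                guard = state.wake.wait(guard).unwrap();
            }
        };
        job(); // run the job outside the lock
    }
}

/// Builds a pool, moves the handle, and counts completed jobs.
pub fn run_jobs(n: usize) -> usize {
    let done = Arc::new(AtomicUsize::new(0));
    let pool = ThreadPool::new(4);
    let moved = pool; // moving the handle does not move SharedState
    for _ in 0..n {
        let done = Arc::clone(&done);
        moved.submit(move || {
            done.fetch_add(1, Ordering::SeqCst);
        });
    }
    moved.shutdown();
    done.load(Ordering::SeqCst)
}
```

Handing workers a plain reference to a SharedState embedded in a local ThreadPool would not compile here, because thread::spawn requires its closure to be 'static. That requirement is what turns the Zig runtime panic into a compile-time error in Rust.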

Result 3: All Languages' Code Was Accurately Understood in Comprehension Tests

For the completed code, I assigned 3 tasks to a separate AI session without providing the specifications:
Task 1: Overview
Evaluated understanding of the implementation structure, synchronization method, and lifecycle across 7 items.
Task 2: Issue identification
Analysis of potential risks in specified functions.
Task 3: Feature addition
Implementation of a new function, checking that it does not break the existing tests.
For both Task A and Task B, the results in all 4 languages were Task 1: 7/7, Task 2: accurate, Task 3: 0 repair rounds. Although differences between languages appeared while implementing Task B, none appeared in reading comprehension or modification.
However, the Task 2 findings differed qualitatively. For reference invalidation in Task A's cache_get, the C, C++, and Zig sessions identified a use-after-free risk, while the Rust session recognized that the borrow checker prevents reference invalidation and instead identified a constraint on API usability caused by &mut self. The AI reinterpreted the same problem in each language's own context, demonstrating an understanding of each language's safety model.
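The usability constraint the Rust session pointed out can be sketched as follows. TinyCache, get_cloned, and demo are hypothetical names, and the hits counter merely stands in for the recency update that forces get to take &mut self; the real cache_get signature is not shown in the article.

```rust
use std::collections::HashMap;

pub struct TinyCache {
    map: HashMap<i32, Vec<u8>>,
    hits: u64, // stand-in for the LRU recency update
}

impl TinyCache {
    pub fn new() -> Self {
        TinyCache { map: HashMap::new(), hits: 0 }
    }

    pub fn put(&mut self, key: i32, value: Vec<u8>) {
        self.map.insert(key, value); // drops (frees) any previous value
    }

    /// Borrowed view: the cache stays mutably borrowed while the slice lives.
    pub fn get(&mut self, key: i32) -> Option<&[u8]> {
        self.hits += 1;
        self.map.get(&key).map(|v| v.as_slice())
    }

    /// Owned copy: costs an allocation but releases the borrow immediately.
    pub fn get_cloned(&mut self, key: i32) -> Option<Vec<u8>> {
        self.get(key).map(|v| v.to_vec())
    }
}

pub fn demo() -> Vec<u8> {
    let mut cache = TinyCache::new();
    cache.put(1, vec![10, 20]);

    // The C-style use-after-free pattern is rejected at compile time:
    // let view = cache.get(1).unwrap();
    // cache.put(1, vec![99]); // error[E0499]: second mutable borrow of `cache`
    // println!("{:?}", view); // ...while `view` is still in use

    let copy = cache.get_cloned(1).unwrap(); // borrow ends with this statement
    cache.put(1, vec![99]); // fine: the old buffer is freed, `copy` is unaffected
    copy
}
```

The price of the safety is visible in the API: a caller who wants to keep a value across a later put has to copy it, which is the usability cost the comprehension session identified.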

Findings from the Test

Hypotheses 1, 2: Borrow Checker Influence

Neither hypothesis was supported: the borrow checker neither clearly helped with repairs nor got in the AI's way. Rust had 0 compilation errors and showed no sign of struggling with the borrow checker. In Task A, the AI chose an index-based list and so accommodated the borrow checker from the start. However, Rust did not have the fewest repair rounds (C had the minimum, 1) and was the only language in which a regression occurred.

Hypothesis 3: C's Simplicity Is Advantageous

This was partially supported. In Task B, C passed in the minimum of 1 round with no compilation errors or runtime panics, requiring only the test-side race fix. In Task A, however, there was no difference between the 4 languages, so the advantage was limited.

Hypothesis 4: No Language Difference in Generation Quality

This was closest to reality. It was fully supported in Task A, and even in Task B the dominant factor was the test-side race common to all languages, with language-specific differences being relatively minor.

Unexpected Findings

C++ and Zig detected errors at compile time, and Zig also caught dangling pointers at runtime. Rust guaranteed thread safety without unsafe code. However, no safety mechanism could prevent the timing race in the tests. The effect of language safety mechanisms showed up in how quickly and how specifically errors were reported.
Note: This test was conducted once per language (N=1), so no statistical generalization can be made. The results are also trends for one specific LLM, and other models may produce different results.
It is also important to note that Zig likely has less training data than the other 3 languages. This test cannot determine whether Zig's highest number of repair rounds is due to the language's characteristics or to insufficient training data.

Conclusion

The choice of language has a slight influence on AI generation quality, but that influence is small compared to the complexity of the task. No differences appeared for known patterns like memory management, and differences only became apparent in concurrent processing. Language safety mechanisms contribute to early error detection, but the timing constraints in the test design required repairs in all 4 languages and were a challenge that language choice could not avoid.
