
Game development data layout strategy: Switching from node-based to SoA made updates 3 times faster
Introduction
When you increase the number of on-screen objects to hundreds or thousands in a game or physics simulation, the frame rate drops sharply beyond a certain point.
With a straightforward design following object-oriented principles, each object is generated as a class instance with its own update and drawing processes. While the code is clear, this design is not suitable for processing large numbers of objects every frame. The issue isn't with algorithmic complexity but with how data is arranged in memory.
This article analyzes this problem from a Data-Oriented Design perspective and explains improvement methods using the SoA (Structure of Arrays) pattern. In the latter half, we'll measure actual effects using Godot Engine 4.6 and confirm the results with numbers.
Target Audience
- Engineers who need to handle large numbers of objects in games or physics simulations
- Those who have hit performance walls with object-oriented designs
- People interested in Data-Oriented Design
References
- Richard Fabian, Data-Oriented Design — An explanatory book on data-oriented design (free online version)
- Godot Engine Documentation, PackedFloat32Array / PackedVector2Array
- Unity Technologies, Unity DOTS / Entities overview
CPU Cache and Memory Layout
To understand performance issues, you need to know how CPUs read data from memory.
How Cache Lines Work
CPUs don't read data from main memory one byte at a time, but in fixed-size blocks called cache lines (typically 64 bytes on many CPUs). Accessing data within an already loaded cache line is fast. Conversely, accessing an address not in a cache line triggers a load from main memory, causing a wait of tens to hundreds of cycles. This is a cache miss.
In other words, processing that accesses contiguous memory areas sequentially is fast, while processing that jumps between distant memory addresses is slow.
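As an illustrative sketch, you can compare sequential and shuffled access over the same packed array. The timings are machine-dependent, and GDScript's interpreter overhead partially masks the effect, so treat this as a demonstration of the idea rather than a rigorous benchmark:

```gdscript
extends Node

func _ready() -> void:
	# Sum 1,000,000 floats sequentially vs. in shuffled order.
	# Sequential access walks cache lines front to back; shuffled
	# access causes far more cache misses once the array is larger
	# than the CPU cache.
	var data := PackedFloat32Array()
	data.resize(1_000_000)

	var order: Array = range(data.size())
	order.shuffle()

	var t0 := Time.get_ticks_usec()
	var sum_seq := 0.0
	for i in data.size():
		sum_seq += data[i]
	var seq_us := Time.get_ticks_usec() - t0

	t0 = Time.get_ticks_usec()
	var sum_rnd := 0.0
	for i in order:
		sum_rnd += data[i]
	var rnd_us := Time.get_ticks_usec() - t0

	print("sequential: %d us, shuffled: %d us" % [seq_us, rnd_us])
```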
AoS: The Natural Layout in Object-Oriented Design
In object-oriented design, each object contains all its fields like position, velocity, color, and HP. An array of this structure is called AoS (Array of Structures).
Even if movement processing only uses position and velocity, the cache line will also load color and HP data. The larger the object size, the fewer objects fit into a single cache line, worsening cache efficiency.
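In GDScript terms, AoS looks like an array of objects, each instance carrying every field (a sketch; the class and field names are illustrative):

```gdscript
# AoS: one object per entity; each instance carries every field.
class Entity:
	var position: Vector2
	var velocity: Vector2
	var color: Color
	var hp: int

var entities: Array[Entity] = []

func move_all(delta: float) -> void:
	# Movement only needs position and velocity, but each cache line
	# loaded also drags in color and hp.
	for e in entities:
		e.position += e.velocity * delta
```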
SoA: Separating Arrays by Field
SoA (Structure of Arrays) groups the same fields into continuous arrays.
Movement processing traverses only the position and velocity arrays. The cache lines are filled with values of the same field, greatly reducing cache misses.
What Happens With Node-Based Design in Game Engines
In typical game engines, each object is managed as a node in a scene graph. In Unity this would be GameObject, and in Godot, Node2D.
Node-based design excels at intuitively representing game structure. However, when generating 1000 objects of the same type, the following overheads become significant:
- Repeated virtual function calls: the engine calls each node's update function individually every frame. Beyond the cost of the calls themselves, cache misses occur because the nodes may be scattered far apart in memory.
- Scene tree management costs: internal engine work such as maintaining parent-child relationships and propagating coordinate transforms grows in proportion to the node count.
- Memory fragmentation: since nodes are allocated individually on the heap, a contiguous memory layout isn't guaranteed.
Nodes are suitable for objects that are few in number and different in type, like players and bosses. For handling large numbers of similar objects like bullets or particles, batch management with SoA is more efficient.
SoA + Batch Processing Design Pattern
Let's look at specific design patterns when managing data with SoA.
Data Structure
Position, velocity, and color are maintained as independent arrays.
```
positions:  [pos₀, pos₁, pos₂, ...]   # Vector2 array
velocities: [vel₀, vel₁, vel₂, ...]   # Vector2 array
colors:     [c₀, c₁, c₂, ...]         # Color array
count: int                            # number of valid entities
```
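In Godot, this maps naturally onto packed arrays (a sketch; the `spawn()` helper is illustrative):

```gdscript
# SoA: one contiguous packed array per field; index i identifies entity i.
var positions := PackedVector2Array()
var velocities := PackedVector2Array()
var colors := PackedColorArray()
var count: int = 0

func spawn(pos: Vector2, vel: Vector2, col: Color) -> void:
	positions.append(pos)
	velocities.append(vel)
	colors.append(col)
	count += 1
```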
Batch Update
In node-based systems, the engine calls the update function for each of N nodes.
With SoA, all entities are updated in a single loop.
```gdscript
for i in range(count):
	positions[i] += velocities[i] * delta
```
Accessing arrays sequentially from start to end results in high cache efficiency. The function call overhead is also reduced to just once.
Batch Drawing
Similarly, drawing processes all entities in a loop within a single draw function.
```gdscript
for i in range(count):
	draw_circle(positions[i], radius, colors[i])
```
While node-based engines call the drawing function of each of N nodes, SoA draws N shapes together in a loop within a single drawing function.
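Putting both loops together, a single Node2D can own the entire batch (a minimal sketch, assuming the arrays are filled elsewhere and a fixed radius):

```gdscript
extends Node2D

var positions := PackedVector2Array()
var velocities := PackedVector2Array()
var colors := PackedColorArray()
var radius := 4.0

func _process(delta: float) -> void:
	for i in positions.size():
		positions[i] += velocities[i] * delta
	queue_redraw()  # _draw() only runs when a redraw is requested

func _draw() -> void:
	for i in positions.size():
		draw_circle(positions[i], radius, colors[i])
```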
Measuring with Godot Engine 4.6
Based on these concepts, I tested the actual difference in Godot Engine 4.6.
Benchmark Design
I created a minimal scene where N circles move in random directions and reflect off screen edges.
The two patterns compared were:
- Node-based: each circle is generated as a child node of a Node2D, and each node has its own `_process()` and `_draw()`. Since Godot's `_draw()` is only called when a redraw is requested with `queue_redraw()`, this test calls `queue_redraw()` every frame.
- SoA: all positions, velocities, and colors are managed in PackedArrays, and batch processing is done in a single `_process()` and `_draw()`.
After a 60-frame warm-up, I recorded the arithmetic mean over 180 frames. The total frame time (frame_ms), update processing time (update_ms), and drawing processing time (draw_ms) were measured separately. The test environment was Windows, Godot 4.6.1 (GL Compatibility renderer), with VSync OFF.
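The per-phase timings can be collected with Godot's microsecond clock (a sketch of the measurement pattern, not the exact benchmark code):

```gdscript
extends Node2D

var positions := PackedVector2Array()
var velocities := PackedVector2Array()
var update_ms := 0.0  # per-frame update cost in milliseconds

func _process(delta: float) -> void:
	var t0 := Time.get_ticks_usec()
	for i in positions.size():
		positions[i] += velocities[i] * delta
	update_ms = (Time.get_ticks_usec() - t0) / 1000.0
```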

Capture: Node-based / entity count: 5,000

Capture: SoA / entity count: 5,000
Measurement Results
| Entity count | Node FPS | SoA FPS | Node frame (ms) | SoA frame (ms) | Node update (ms) | SoA update (ms) |
|---|---|---|---|---|---|---|
| 100 | 468 | 566 | 2.14 | 1.77 | 0.06 | 0.02 |
| 500 | 101 | 126 | 9.90 | 7.92 | 0.33 | 0.11 |
| 1,000 | 49 | 56 | 20.43 | 17.78 | 0.65 | 0.19 |
| 2,000 | 25 | 30 | 40.10 | 33.58 | 1.30 | 0.40 |
| 5,000 | 10 | 12 | 99.71 | 83.28 | 3.29 | 1.00 |
| 10,000 | 5 | 6 | 211.91 | 165.30 | 6.49 | 1.97 |


Findings from the Test
- Update processing is about 3.3 times faster with SoA. With 10,000 entities, Node-based took 6.49 ms versus 1.97 ms for SoA, thanks to the cache efficiency of scanning PackedArrays sequentially and to reduced function call overhead.
- Drawing occupies most of the frame time. Drawing accounted for 42-48% of the total frame time. Since drawing includes issuing draw commands to the GPU, there's a limit to what CPU-side data layout alone can improve; Node-based is still slower, though, because of the overhead of N individual `_draw()` calls.
- The overall FPS improvement is limited to about 1.2 times. While the update-processing gain is large, most of the frame is spent on drawing and engine overhead, so the end-to-end improvement is only about 1.2 times. SoA is a CPU-side data access optimization; drawing-pipeline bottlenecks call for other techniques such as GPU instancing.
Which Should You Use?
SoA isn't always the right answer. SoA is effective when you have hundreds to thousands of the same type of object and need to traverse all entities every frame. Bullets, particles, and crowd simulations are typical examples. For objects that are few in number and diverse in type, node-based is more suitable. Players, UI elements, and boss enemies benefit from node features (using engine physics and animation capabilities) and have little motivation to be converted to SoA. In practical game design, it's realistic to mix approaches—using node-based for players and SoA for bullets.
Summary
Changing the data layout from AoS to SoA can significantly speed up processing of large numbers of objects. While game engine node systems improve code clarity, it's worth switching to data-oriented design when dealing with large numbers of similar objects. In this article's measurements, we confirmed about 3.3 times improvement in update processing and about 1.2 times improvement in overall frame time. If drawing is a bottleneck, further improvements can be expected by combining with other techniques like GPU instancing.