Game Development Data Layout Strategy: Switching from Node-based to SoA Made Updates 3 Times Faster

Moving large numbers of objects in a game can make the frame rate drop, and the cause is often the object-oriented data layout. This article explains the SoA (Structure of Arrays) pattern from the fundamentals, based on Data-Oriented Design, and verifies its effect with benchmark results from Godot Engine.
2026.02.22

Introduction

When the number of objects on screen in a game or physics simulation grows into the hundreds or thousands, there comes a point where the frame rate drops sharply.

In a straightforward object-oriented design, each object is created as a class instance with its own update and draw logic. The code is easy to understand, but the design is poorly suited to processing large numbers of objects every frame. The cause is not the computational complexity of the algorithm but the arrangement of data in memory.

This article analyzes this problem from a Data-Oriented Design perspective and explains the improvement method using the SoA (Structure of Arrays) pattern. In the latter half, we'll measure the actual effect in Godot Engine 4.6 and confirm the results numerically.

Target Audience

  • Engineers who need to handle large numbers of objects in games or physics simulations
  • Those who have hit performance walls with object-oriented design
  • Those interested in Data-Oriented Design

CPU Cache and Memory Layout

To understand performance issues, you need to know how the CPU reads data from memory.

How Cache Lines Work

The CPU loads data from main memory not byte by byte but in fixed-size blocks called cache lines (64 bytes on many CPUs). Accessing data in a cache line that is already loaded is fast. Accessing an address that is not in the cache forces a load from main memory and a wait of tens to hundreds of cycles. This is a cache miss.

In other words, processing that accesses contiguous memory regions in sequence is fast, while processing that jumps between distant addresses in memory is slow.

AoS: The Natural Layout of Object-Oriented Design

In object-oriented design, all fields such as position, velocity, color, and HP are grouped within one object. An array of these structures is called AoS (Array of Structures).

Even if the movement process uses only position and velocity, every cache line it loads also carries color and HP data it never reads. The larger each object is, the fewer objects fit in a single cache line, and the lower the cache efficiency.
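To put rough numbers on this, here is a small sketch in plain Python (not GDScript). The field sizes are assumptions chosen for illustration (32-bit floats, two components each for position and velocity, four for color, a 32-bit HP). Real engine objects also carry headers and padding, so treat the result as an optimistic estimate.

```python
CACHE_LINE_BYTES = 64  # typical on many CPUs, as noted above

# Assumed field sizes in bytes (illustrative, not taken from a real engine)
FIELD_BYTES = {
    "position": 2 * 4,  # x, y as 32-bit floats
    "velocity": 2 * 4,
    "color": 4 * 4,     # r, g, b, a
    "hp": 4,
}

def useful_fraction(used_fields):
    """Share of each AoS object's bytes that a pass over used_fields reads."""
    used = sum(FIELD_BYTES[name] for name in used_fields)
    return used / sum(FIELD_BYTES.values())

object_bytes = sum(FIELD_BYTES.values())
objects_per_line = CACHE_LINE_BYTES / object_bytes
movement_fraction = useful_fraction(["position", "velocity"])

print(object_bytes, round(objects_per_line, 2), round(movement_fraction, 2))
```

Under these assumptions one object takes 36 bytes, so fewer than two objects fit in a 64-byte line, and less than half of the bytes loaded are ones the movement pass actually uses.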

SoA: Separating Arrays by Field

SoA (Structure of Arrays) groups the same fields into continuous arrays.

The movement process scans only position and velocity arrays. Cache lines are filled with values of the same field, greatly reducing cache misses.
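As a minimal sketch of that layout, the following Python uses the standard `array` module, which stores values of one type contiguously. Splitting each Vector2 into separate x and y arrays is an assumption made here to keep the example in plain Python. The names `pos_x`, `vel_x`, `hp`, and `move` are hypothetical.

```python
from array import array

# One contiguous typed array per field component ('f' = 32-bit float).
pos_x = array("f", [0.0, 10.0, 20.0])
pos_y = array("f", [0.0, 0.0, 0.0])
vel_x = array("f", [1.0, 2.0, 3.0])
vel_y = array("f", [0.5, 0.5, 0.5])
hp = array("i", [100, 100, 100])  # never touched by the movement pass

def move(delta):
    """Advance every entity; only the position and velocity arrays are read."""
    for i in range(len(pos_x)):
        pos_x[i] += vel_x[i] * delta
        pos_y[i] += vel_y[i] * delta

move(2.0)

# Number of values of a single field that fit in one 64-byte cache line.
floats_per_line = 64 // pos_x.itemsize
```

Because `hp` lives in its own array, the movement pass never pulls it into the cache at all.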

What Happens in Node-Based Design in Game Engines

In typical game engines, each object is managed as a node in the scene graph. In Unity this would be GameObject, in Godot it would be Node2D.

Node-based design excels at intuitively representing game structure. However, when generating 1000 objects of the same type, the following overheads become significant:

  • Repeated virtual function calls
    The engine calls the update function of each node individually every frame. On top of the cost of the calls themselves, the nodes sit at scattered memory addresses, so visiting them induces cache misses.
  • Scene tree management cost
    Engine internal processes like maintaining parent-child relationships and propagating coordinate transformations increase proportionally with the number of nodes
  • Memory fragmentation
    Nodes are allocated individually on the heap, so contiguous memory layout is not guaranteed

Nodes are suitable for a small number of different types of objects like players and bosses. For handling large numbers of similar objects like bullets or particles, batch management with SoA is more efficient.

SoA + Batch Processing Design Pattern

Let's look at specific design patterns for managing data with SoA.

Data Structure

Position, velocity, and color are held as independent arrays.

positions:  [pos₀, pos₁, pos₂, ...]    # Vector2 array
velocities: [vel₀, vel₁, vel₂, ...]    # Vector2 array
colors:     [c₀, c₁, c₂, ...]          # Color array
count:      int                         # Number of valid entities
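One way to keep `count` meaningful is to preallocate the arrays and treat only the first `count` slots as live. The sketch below is an assumption about how that could look, not the article's actual implementation. `EntityStore`, `spawn`, and `despawn` are hypothetical names, colors are omitted for brevity, and Vector2 values are split into x and y components.

```python
from array import array

class EntityStore:
    """SoA container; only indices [0, count) hold live entities."""

    def __init__(self, capacity):
        zeros = [0.0] * capacity
        self.pos_x = array("f", zeros)
        self.pos_y = array("f", zeros)
        self.vel_x = array("f", zeros)
        self.vel_y = array("f", zeros)
        self.count = 0

    def _arrays(self):
        return (self.pos_x, self.pos_y, self.vel_x, self.vel_y)

    def spawn(self, x, y, vx, vy):
        """Write a new entity into the first free slot and return its index."""
        if self.count == len(self.pos_x):
            raise IndexError("store is full")
        i = self.count
        for arr, value in zip(self._arrays(), (x, y, vx, vy)):
            arr[i] = value
        self.count += 1
        return i

    def despawn(self, i):
        """Swap-remove: copy the last live entity into slot i, then shrink."""
        if not 0 <= i < self.count:
            raise IndexError("no live entity at this index")
        last = self.count - 1
        for arr in self._arrays():
            arr[i] = arr[last]
        self.count = last

store = EntityStore(capacity=4)
a = store.spawn(1.0, 2.0, 3.0, 4.0)
b = store.spawn(5.0, 6.0, 7.0, 8.0)
store.despawn(a)  # the entity spawned as b now lives at index 0
```

Swap-remove keeps the live range dense so loops never skip holes, at the cost that indices are not stable identifiers.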

Batch Update

In node-based systems, the engine calls the update function for each of N nodes.

With SoA, all entities are updated in a single loop.

for i in range(count):
    positions[i] += velocities[i] * delta

Because the arrays are accessed sequentially from beginning to end, cache efficiency is high. The function call overhead is also incurred only once per frame instead of once per entity.
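The difference in call count can be made visible with a toy comparison in plain Python. This is only an illustration of the two calling patterns, with a hypothetical `Mover` class standing in for a node and one dimension instead of two. It says nothing about Godot's internals.

```python
class Mover:
    """Node-style object: one update() call per object per frame."""
    calls = 0

    def __init__(self, x, vx):
        self.x, self.vx = x, vx

    def update(self, delta):
        Mover.calls += 1
        self.x += self.vx * delta

def batch_update(xs, vxs, delta):
    """SoA-style: a single call updates every entity."""
    for i in range(len(xs)):
        xs[i] += vxs[i] * delta

N, delta = 1000, 0.5

movers = [Mover(float(i), 2.0) for i in range(N)]
for m in movers:               # N separate calls
    m.update(delta)

xs = [float(i) for i in range(N)]
vxs = [2.0] * N
batch_update(xs, vxs, delta)   # 1 call
```

Both versions compute the same positions. Only the number of calls and the memory access pattern differ.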

Batch Drawing

Similarly for drawing, all entities are drawn in a loop within a single draw function.

for i in range(count):
    draw_circle(positions[i], radius, colors[i])

In node-based systems, the engine calls the draw function of each of N nodes, but with SoA, N shapes are drawn together in a loop within a single draw function.

Measurement in Godot Engine 4.6

Based on these concepts, I tested how much difference there actually is in Godot Engine 4.6.

Benchmark Design

I created a minimal scene where N circles move in random directions and reflect off screen edges.

The two comparison patterns are:

  • Node-based
    Each circle is created as a child node of Node2D, and each node has its own _process() and _draw(). Because Godot calls _draw() only when a redraw is requested with queue_redraw(), this test calls queue_redraw() every frame.
  • SoA
    All positions, velocities, and colors are managed in PackedArrays, with batch processing in a single _process() and _draw()
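The movement rule described above (random directions, reflection off the screen edges) could be written as follows. The article does not show its benchmark code, so this Python sketch is an assumption about the logic. The 1280x720 screen size, the speed of 200, and the function names are all hypothetical.

```python
import math
import random

def make_entities(n, width, height, speed, seed=0):
    """Random start positions and directions; every entity has the same speed."""
    rng = random.Random(seed)
    px, py, vx, vy = [], [], [], []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        px.append(rng.uniform(0.0, width))
        py.append(rng.uniform(0.0, height))
        vx.append(speed * math.cos(angle))
        vy.append(speed * math.sin(angle))
    return px, py, vx, vy

def step(px, py, vx, vy, delta, width, height):
    """Move every entity, then reflect any that crossed a screen edge."""
    for i in range(len(px)):
        px[i] += vx[i] * delta
        py[i] += vy[i] * delta
        if px[i] < 0.0 or px[i] > width:
            vx[i] = -vx[i]                        # reverse horizontal direction
            px[i] = min(max(px[i], 0.0), width)   # clamp back inside the screen
        if py[i] < 0.0 or py[i] > height:
            vy[i] = -vy[i]                        # reverse vertical direction
            py[i] = min(max(py[i], 0.0), height)

px, py, vx, vy = make_entities(1000, 1280.0, 720.0, speed=200.0)
for _ in range(300):
    step(px, py, vx, vy, 1.0 / 60.0, 1280.0, 720.0)
```

Flipping one velocity component changes direction but not speed, so every entity keeps moving at its initial speed.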

After a 60-frame warm-up, the arithmetic mean over the next 180 frames was recorded. Total frame time (frame_ms), update time (update_ms), and draw time (draw_ms) were measured separately. The measurement environment was Windows with Godot 4.6.1 (GL Compatibility renderer) and VSync off.
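The warm-up-then-average procedure can be sketched as a small accumulator. This is a Python illustration of the measurement idea only, with a hypothetical `FrameStats` class and a dummy workload. The actual benchmark ran inside Godot and its code is not shown in the article.

```python
import time

class FrameStats:
    """Discard the first `warmup` samples, then average the next `samples`."""

    def __init__(self, warmup=60, samples=180):
        self.warmup = warmup
        self.samples = samples
        self.seen = 0
        self.values = []

    def add(self, ms):
        """Record one frame's duration; return True once enough are collected."""
        self.seen += 1
        if self.seen > self.warmup and len(self.values) < self.samples:
            self.values.append(ms)
        return len(self.values) >= self.samples

    def mean(self):
        return sum(self.values) / len(self.values)

stats = FrameStats()
done = False
while not done:
    start = time.perf_counter()
    sum(range(1000))  # stand-in for one frame of update work
    done = stats.add((time.perf_counter() - start) * 1000.0)

print(f"mean over {len(stats.values)} frames: {stats.mean():.4f} ms")
```

Skipping the first frames keeps one-time costs such as allocation and cache warm-up out of the average.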

[Figure: Node-based / Entity count: 5000 (screen capture)]

[Figure: SoA / Entity count: 5000 (screen capture)]

Measurement Results

| Entity count | Node FPS | SoA FPS | Node frame (ms) | SoA frame (ms) | Node update (ms) | SoA update (ms) |
|---|---|---|---|---|---|---|
| 100 | 468 | 566 | 2.14 | 1.77 | 0.06 | 0.02 |
| 500 | 101 | 126 | 9.90 | 7.92 | 0.33 | 0.11 |
| 1,000 | 49 | 56 | 20.43 | 17.78 | 0.65 | 0.19 |
| 2,000 | 25 | 30 | 40.10 | 33.58 | 1.30 | 0.40 |
| 5,000 | 10 | 12 | 99.71 | 83.28 | 3.29 | 1.00 |
| 10,000 | 5 | 6 | 211.91 | 165.30 | 6.49 | 1.97 |

[Chart: frame time (frame_ms), Node-based vs. SoA]

[Chart: update time (update_ms), Node-based vs. SoA]

Findings

  • Update processing is about 3.3 times faster with SoA
    With 10,000 entities, SoA took 1.97 ms compared to 6.49 ms for Node-based. The likely causes are better cache efficiency from scanning PackedArrays sequentially and lower function call overhead, although the benchmark does not separate the two.

  • Drawing takes up a large share of the frame time
    Drawing time accounted for 42-48% of the total frame time. Because drawing involves issuing draw commands to the GPU, there is a limit to what CPU-side data layout can improve. Even so, Node-based is slower because of the overhead of individual _draw() calls.

  • Overall FPS improvement is limited to about 1.2 times
    While the update process improvement rate is significant, most of the frame time is occupied by drawing and engine overhead, resulting in only about a 1.2 times overall improvement. SoA is only an optimization of CPU-side data access, and different approaches like GPU instancing are needed for bottlenecks in the rendering pipeline.
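The ratios quoted in these findings can be recomputed directly from the results table. The following Python sketch copies the table values and derives the frame speedup, the update speedup, and the share of node-based frame time spent in the update step. The names `RESULTS` and `summarize` are hypothetical.

```python
# Rows copied from the results table:
# (entities, node_frame_ms, soa_frame_ms, node_update_ms, soa_update_ms)
RESULTS = [
    (100, 2.14, 1.77, 0.06, 0.02),
    (500, 9.90, 7.92, 0.33, 0.11),
    (1000, 20.43, 17.78, 0.65, 0.19),
    (2000, 40.10, 33.58, 1.30, 0.40),
    (5000, 99.71, 83.28, 3.29, 1.00),
    (10000, 211.91, 165.30, 6.49, 1.97),
]

def summarize(row):
    """Derive speedups and the update step's share of node-based frame time."""
    n, node_frame, soa_frame, node_update, soa_update = row
    return {
        "entities": n,
        "frame_speedup": node_frame / soa_frame,
        "update_speedup": node_update / soa_update,
        "update_share_pct": 100.0 * node_update / node_frame,
    }

summary = [summarize(row) for row in RESULTS]
for s in summary:
    print(f"{s['entities']:>6}  frame x{s['frame_speedup']:.2f}  "
          f"update x{s['update_speedup']:.2f}  "
          f"update share {s['update_share_pct']:.1f}%")
```

In these measurements the update step is only around 3% of the node-based frame time, which is why a roughly 3.3x faster update cannot by itself move the overall frame rate much. Most of the frame-time difference comes from drawing and engine overhead.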

Which Should You Use?

SoA is not always the right answer. It is effective when you have hundreds to thousands of objects of the same type and process all of them every frame. Bullets, particles, and crowd simulations are typical examples. Node-based design, on the other hand, suits objects that are few in number but varied in type. Players, UI elements, and boss enemies benefit from what nodes offer, such as the engine's physics and animation features, and there is little reason to convert them to SoA. Within a single game it is practical to mix the two, for example nodes for the player and SoA for bullets.

Summary

By changing the data layout from AoS to SoA, you can significantly speed up the processing of large numbers of objects. Game engine node systems improve code clarity, but it's worth switching to data-oriented design when handling large numbers of similar objects. In this article's measurements, we confirmed about a 3.3 times improvement in update processing and about a 1.2 times improvement in overall frame rate. If drawing becomes a bottleneck, further improvements can be expected by combining with other techniques such as GPU instancing.
