Query Benchmark

This page summarizes the query performance benchmark in src/Test/Ghost.Entities.Test/QueryBenchmark.cs.

Goal

Compare update throughput between:

Classic managed object array iteration (GameObject[]).
ECS chunk iteration through EntityQuery.

Both paths update one million items per benchmark invocation, adding the same Vector4 delta to each position.
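The difference between the two paths is mostly one of memory layout. The sketch below is a minimal illustration and is not taken from QueryBenchmark.cs: GameObjectSketch, PositionSketch and LayoutSketch are hypothetical stand-ins, assuming the object path stores references to heap objects while the component path stores values inline.

```csharp
using System;
using System.Numerics;

// Hypothetical stand-in for the benchmark's GameObject (assumed to be a class).
public sealed class GameObjectSketch
{
    public Vector4 Position;
}

// Hypothetical stand-in for the Position component (assumed to be a struct).
public struct PositionSketch
{
    public Vector4 Value;
}

public static class LayoutSketch
{
    // Object path: the array holds references, so every slot needs its own heap object.
    public static GameObjectSketch[] CreateObjects(int count)
    {
        var objects = new GameObjectSketch[count];
        for (var i = 0; i < objects.Length; i++)
        {
            objects[i] = new GameObjectSketch();
        }
        return objects;
    }

    // Each update dereferences a pointer before touching the position.
    public static void UpdateObjects(GameObjectSketch[] objects, float dt)
    {
        var delta = new Vector4(dt, dt, dt, 0);
        for (var i = 0; i < objects.Length; i++)
        {
            objects[i].Position += delta;
        }
    }

    // Component path: values sit next to each other, so the loop walks memory linearly.
    public static void UpdateComponents(Span<PositionSketch> positions, float dt)
    {
        var delta = new Vector4(dt, dt, dt, 0);
        for (var i = 0; i < positions.Length; i++)
        {
            positions[i].Value += delta;
        }
    }
}
```

Both update methods perform the same arithmetic; only the way the data is reached differs, which is what the benchmark is meant to isolate.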

Benchmark Setup

[GlobalSetup]
public void Setup()
{
    _world = World.Create(entityCapacity: 1_000_000);
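    _gameObjects = new GameObject[1_000_000];
    // Assumption: GameObject is a class with a parameterless constructor. Without this loop
    // the array would hold only null references and QueryGameObjects would throw.
    for (var i = 0; i < _gameObjects.Length; i++)
    {
        _gameObjects[i] = new GameObject();
    }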

    using var scope = AllocationManager.CreateStackScope();
    var componentSet = new ComponentSet(scope.AllocationHandle, ComponentTypeID<Position>.Value);
    _world.EntityManager.CreateEntities(1_000_000, componentSet);

    _queryIdentifier = new QueryBuilder().WithAllRW<Position>().Build(_world);
}

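The class defines two benchmark methods. QueryEntities is marked as the baseline, so BenchmarkDotNet reports QueryGameObjects relative to it: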

[Benchmark]
public void QueryGameObjects()
{
    for (var i = 0; i < _gameObjects.Length; i++)
    {
        _gameObjects[i].Position += new Vector4(_dt, _dt, _dt, 0);
    }
}

[Benchmark(Baseline = true)]
public void QueryEntities()
{
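    // Resolve the query built in Setup, then walk every chunk that matches it.
    ref var query = ref _world.ComponentManager.GetEntityQueryReference(_queryIdentifier);
    foreach (var chunkView in query.GetChunkIterator())
    {
        // Read-write view over this chunk's Position components.
        var positions = chunkView.GetComponentDataRW<Position>();
        // Reference to the first element, used for unchecked indexing below.
        ref var address = ref MemoryMarshal.GetReference(positions);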

        for (var i = 0; i < positions.Length; i++)
        {
            Unsafe.Add(ref address, i).value += new Vector4(_dt, _dt, _dt, 0);
        }
    }
}

Notes on Measurement

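The benchmark enables hardware counters (CacheMisses, LlcReference, InstructionRetired) for deeper analysis. BenchmarkDotNet collects these through ETW, so they generally require Windows and an elevated process.
The ECS path traverses chunks and reads components from contiguous storage.
Unsafe.Add indexes from a reference to the first element, which skips the span's bounds checks inside the inner loop.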
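The bounds-check note can be illustrated with a small standalone sketch that uses only the .NET base class library. PositionValue and UnsafeAddSketch are hypothetical names, assuming the component is a struct wrapping a Vector4. Both methods produce the same result; the second mirrors the pattern used in QueryEntities.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical component type used only for this illustration.
public struct PositionValue
{
    public Vector4 Value;
}

public static class UnsafeAddSketch
{
    // Checked form: the span indexer validates i against Length on each access,
    // unless the JIT can prove the check is redundant.
    public static void AddIndexed(Span<PositionValue> positions, Vector4 delta)
    {
        for (var i = 0; i < positions.Length; i++)
        {
            positions[i].Value += delta;
        }
    }

    // Unchecked form: take a reference to element 0 and offset from it.
    // No bounds check is performed, so the loop condition must guarantee i < Length.
    public static void AddUnchecked(Span<PositionValue> positions, Vector4 delta)
    {
        ref var first = ref MemoryMarshal.GetReference(positions);
        for (var i = 0; i < positions.Length; i++)
        {
            Unsafe.Add(ref first, i).Value += delta;
        }
    }
}
```

The JIT can often remove the check in the indexed form when the loop bound is the span's own Length, so the gain from the unchecked form may be small. Measuring both is the only reliable way to tell.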

Running the Benchmark

From src/:

dotnet run --project Test/Ghost.Entities.Test/Ghost.Entities.Test.csproj -c Release

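The current Program.cs already executes BenchmarkRunner.Run<QueryBenchmark>(). The -c Release flag matters: by default BenchmarkDotNet refuses to run benchmarks from a non-optimized (Debug) build.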

Interpreting Results

When comparing output, evaluate both runtime and counters:

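Execution time: a lower mean time for the same one-million-item update is better.
Cache behavior: fewer cache misses relative to LLC references, for the same amount of work, indicates better locality.
Instruction count: both paths do equivalent work, so a large gap in retired instructions points to overhead rather than useful computation.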
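To compare the two paths on equal terms, it can help to normalize the reported values. The helper below is a minimal sketch under two assumptions: the counters are reported per benchmark operation, and CacheMisses counts last-level cache misses so that dividing by LlcReference gives a miss ratio. CounterMath and its method names are hypothetical and not part of the benchmark project.

```csharp
using System;

public static class CounterMath
{
    // Converts a per-operation value (time or counter) to a per-item value,
    // where one operation updates itemCount items.
    public static double PerItem(double perOperation, int itemCount)
    {
        if (itemCount <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(itemCount));
        }
        return perOperation / itemCount;
    }

    // Fraction of last-level cache references that missed.
    // Returns 0 when no references were recorded, to avoid dividing by zero.
    public static double MissRatio(double cacheMisses, double llcReferences)
    {
        return llcReferences > 0 ? cacheMisses / llcReferences : 0.0;
    }
}
```

Feeding the values from each benchmark row into PerItem with an item count of 1_000_000 gives per-entity figures that are easier to compare than raw totals.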