refactor project structure and add documents.
@@ -0,0 +1,115 @@

# Best Practices and API Selection

## Which job type to use

| If you need | Use |
|---|---|
| Run one piece of work once | `IJob` |
| Run the same operation across many independent elements | `IJobParallelFor` |
| Run a parallel operation with per-batch setup overhead | `IJobParallel` |
| Full control over execution and cleanup, or dynamic dispatch | `ICustomJob<TSelf>` |
| Debug or test a job without threading overhead | `Run` / `RunRef` |

## IJob

Use `IJob` for any unit of work that can't be broken into smaller parallel pieces. Examples:

- Apply velocity to a single entity
- Compute a sum, product, or other aggregate over data that's already been processed
- Trigger an action after dependencies complete

`IJob` runs once on one worker thread. If you find yourself scheduling many `IJob` instances that perform the same operation on different elements, consider merging them into a single `IJobParallelFor`.

## IJobParallelFor

Use `IJobParallelFor` when you need to apply the same transformation to every element of an array or buffer. The system distributes indices across worker threads in batches.

**Choose the right batch size:**

- Small batches (1-16): Best load balancing, more stealing overhead. Use when work per element varies.
- Medium batches (32-128): Good balance. A reasonable default for most workloads.
- Large batches (256+): Less overhead, but can cause uneven distribution. Use when work per element is uniform.

A good starting point is `batchSize = 64`. Profile and adjust from there.
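
To see how batch size changes the number of work units available for balancing, here is a small standalone sketch in plain C# (no job system involved) that splits an index range into contiguous batches. `SplitIntoBatches` is a hypothetical helper, and the assumption that the scheduler cuts the range into contiguous batches of at most `batchSize` indices is ours; the library's actual partitioning may differ.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: splits [0, totalLength) into contiguous batches of at most batchSize indices.
List<(int Start, int End)> SplitIntoBatches(int totalLength, int batchSize)
{
    if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
    var batches = new List<(int Start, int End)>();
    for (int start = 0; start < totalLength; start += batchSize)
    {
        // The last batch may be shorter than batchSize.
        batches.Add((start, Math.Min(start + batchSize, totalLength)));
    }
    return batches;
}

// 10,000 elements: smaller batches give more units to balance (and to steal).
foreach (int batchSize in new[] { 16, 64, 256 })
{
    var batches = SplitIntoBatches(10_000, batchSize);
    Console.WriteLine($"batchSize {batchSize,3}: {batches.Count} batches, last = {batches[^1]}");
}
```

With 8 workers, 157 batches of 64 leave plenty of units for idle workers to pick up, while 40 batches of 256 give each worker only about five.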

**Avoid writing to overlapping indices.** Each index should be independent. If two indices write to the same location, you have a race condition.

## IJobParallel

Use `IJobParallel` when each batch of work has setup cost that you want to amortize. For example:

- Processing chunks of data where each chunk requires preparing local state
- Operations where processing a contiguous range in one loop is cheaper than handling each index in a separate call

Scheduling works the same way as for `IJobParallelFor` (a total length plus a batch size), but `Execute` receives `(startIndex, endIndex)` instead of a single `index`, with `endIndex` exclusive. This lets you write loops with local accumulators or per-batch initialization.
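
The per-range pattern can be tried without the library. The standalone sketch below uses the standard .NET `Partitioner.Create(fromInclusive, toExclusive, rangeSize)` and `Parallel.ForEach` purely as a stand-in for `IJobParallel`: each body call receives a `[start, end)` range, keeps a local accumulator, and touches shared state only once per range. `SumOfSquares` is a hypothetical name; this is not how the job system itself dispatches ranges.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Stand-in for an IJobParallel-style job: each body call receives a [start, end)
// range, accumulates locally, and writes to shared state once per range.
long SumOfSquares(int[] data, int rangeSize)
{
    long total = 0;
    var ranges = Partitioner.Create(0, data.Length, rangeSize); // ranges of at most rangeSize indices
    Parallel.ForEach(ranges, range =>
    {
        long local = 0;                               // per-range setup, paid once per range
        for (int i = range.Item1; i < range.Item2; i++)
            local += (long)data[i] * data[i];
        Interlocked.Add(ref total, local);            // one shared write per range
    });
    return total;
}

int[] values = new int[1000];
for (int i = 0; i < values.Length; i++)
    values[i] = i + 1;

Console.WriteLine(SumOfSquares(values, 64)); // 333833500
```

With a per-index body, the same sum would need one synchronized update per element; here it needs one per range.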

## ICustomJob

Use `ICustomJob<TSelf>` when you need:

- A job type that isn't known at compile time (dynamic dispatch via function pointers)
- Custom cleanup logic that runs after the job completes
- To control `JobRanges` directly for non-standard iteration patterns

The overhead is slightly higher than with the standard interfaces because of the function pointer indirection. Only use it when the standard interfaces don't fit.

## Scheduler configuration

**ThreadCount:** Set to `Environment.ProcessorCount` for general use. The scheduler caps the worker count at the number of logical processors. For workloads that share cores with rendering or other systems, consider leaving one or two cores free.

**DependencyChainCapacity:** The total number of dependency edges the scheduler can track at once. Set it to cover your peak number of concurrent dependencies. If you run out, jobs still run, but dependency enforcement may be incomplete. When in doubt, set it higher: the edge pool is allocated once at scheduler creation, so unused capacity costs only that up-front memory.

**ThreadPriority:** Use `Normal` for most cases. Use `AboveNormal` if the job system is the primary consumer of CPU time and you want its workers to take precedence over other threads.

## Memory and allocation

- **Everything is pre-allocated.** The scheduler allocates all internal structures (queues, edge pool, slot maps) at creation. No per-job GC allocations occur during scheduling or execution.
- **Job data is copied.** When you schedule a struct job, its data is copied into an internal pool. Pointers stored in the job are copied by value, so the memory they point to must stay valid until the job completes.
- **Managed payloads work.** Unlike many job systems, this library supports class-based jobs and jobs holding managed types (`List`, `string`, arrays). The same zero-allocation guarantees apply.
- **Free custom resources in `ICustomJob.Free`.** If your custom job allocates unmanaged memory, the `Free` callback is the right place to release it.

## Schedule and wait timing

It's best practice to call `Schedule` on a job as soon as you have the data it needs, and not to call `Wait` on it until you need the results.

You can schedule less important jobs in a part of the frame where they aren't competing with more important jobs.

For example, if there is a period between the end of one frame and the beginning of the next where no jobs are running, and one frame of latency is acceptable, you can schedule the job toward the end of a frame and use its results in the following frame. Alternatively, if your application saturates that changeover period with other jobs, and there's an under-utilized period somewhere else in the frame, it's more efficient to schedule your job there instead.

## Dependencies

- **Prefer multiple dependencies over deep chains.** When jobs don't actually depend on each other, have the consumer wait on all of their handles directly instead of chaining them. A job that waits on 10 independent handles is better than a chain of 10 jobs each waiting on the previous one, because the scheduler can run the 10 jobs in parallel.
- **Use `CombineDependencies` for large dependency sets.** If a job depends on more than a handful of other jobs, combine their handles to reduce scheduling overhead.
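
One way to see why fan-in beats chaining is to compare critical-path lengths: the longest sequence of jobs that must run one after another. The standalone sketch below uses a hypothetical `CriticalPath` helper and assumes every job costs 1 unit and that enough workers are idle. It only applies when the 10 jobs really are independent.

```csharp
using System;
using System.Linq;

// Longest chain of dependent work in a DAG. durations[i] is job i's cost and
// deps[i] lists the jobs it waits on. Jobs are assumed to be listed so that
// every dependency has a lower index than the job that waits on it.
double CriticalPath(double[] durations, int[][] deps)
{
    var finish = new double[durations.Length];
    for (int i = 0; i < durations.Length; i++)
    {
        // A job can start once its slowest dependency has finished.
        double start = deps[i].Length == 0 ? 0 : deps[i].Max(d => finish[d]);
        finish[i] = start + durations[i];
    }
    return finish.Max();
}

// Chain: job i waits on job i - 1.
double[] chainCosts = Enumerable.Repeat(1.0, 10).ToArray();
int[][] chainDeps = Enumerable.Range(0, 10)
    .Select(i => i == 0 ? Array.Empty<int>() : new[] { i - 1 })
    .ToArray();

// Fan-in: jobs 0..9 are independent; job 10 waits on all of them.
double[] fanInCosts = Enumerable.Repeat(1.0, 11).ToArray();
int[][] fanInDeps = Enumerable.Range(0, 11)
    .Select(i => i < 10 ? Array.Empty<int>() : Enumerable.Range(0, 10).ToArray())
    .ToArray();

Console.WriteLine(CriticalPath(chainCosts, chainDeps)); // 10
Console.WriteLine(CriticalPath(fanInCosts, fanInDeps)); // 2
```

With `W` workers, the fan-in shape still needs roughly `ceil(10 / W) + 1` units, so the gain shrinks as workers run out.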

## Avoid long-running jobs

Unlike threads, jobs don't yield execution. Once a job starts, its worker thread commits to completing it before running any other job. It's therefore best practice to break up long-running jobs into smaller jobs that depend on one another, instead of submitting jobs that take a long time to complete relative to other jobs in the system.

The job system usually runs multiple chains of job dependencies, so if you break up long-running tasks into multiple pieces, multiple job chains get a chance to progress. If instead the job system is filled with long-running jobs, they might consume all worker threads and block independent jobs from executing. This can push out the completion time of important jobs that the main thread explicitly waits for, causing main-thread stalls that otherwise wouldn't exist.

Long-running `IJobParallelFor` jobs are especially harmful, because this job type intentionally tries to run on as many worker threads as its batch size allows. If you can't break up a long parallel job, consider increasing its batch size when scheduling it, which limits how many workers pick it up.

## Priorities

- **Reserve High for critical-path work.** Jobs on the critical path (the chain that the main thread is waiting on) benefit most from High priority.
- **Use Low for background tasks.** Deferred work like cleanup, analytics, or pre-computation that isn't needed this frame should use Low priority.
- **Most jobs should be Normal.** Overusing High priority dilutes its effectiveness.

## Inline execution

By default, `Wait` has the calling thread help execute the job while it waits. This reduces latency because the calling thread contributes CPU time to the work it needs. Leave this enabled unless:

- The calling thread has other work to do while waiting (use the async variants instead)
- You're relying on thread-local storage and can't have an external thread execute jobs

To disable it, pass `inlineExecution: false` to `Wait`.
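
The idea behind inline execution can be modeled in a few lines of plain C#. In this standalone sketch, which is not the library's implementation, `Schedule` and `Wait` are hypothetical local functions over a `ConcurrentQueue`. A waiting caller pulls queued jobs and runs them itself until the job it needs has finished.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

var queue = new ConcurrentQueue<(int Id, Action Work)>();
var done = new ConcurrentDictionary<int, bool>();

void Schedule(int id, Action work) => queue.Enqueue((id, work));

// Hypothetical inline-execution wait: rather than only blocking, the caller
// runs queued jobs itself until the job it is waiting for has finished.
void Wait(int id)
{
    while (!done.ContainsKey(id))
    {
        if (queue.TryDequeue(out var job))
        {
            job.Work();           // the calling thread contributes CPU time
            done[job.Id] = true;
        }
        else
        {
            Thread.Yield();       // a real scheduler would spin, then block on a signal
        }
    }
}

int callerThread = Environment.CurrentManagedThreadId;
int ranOnCaller = 0;
void CountIfCaller()
{
    if (Environment.CurrentManagedThreadId == callerThread) ranOnCaller++;
}

Schedule(1, CountIfCaller);
Schedule(2, CountIfCaller);

// No worker threads exist in this model, so without inline execution this call
// would never return; with it, the caller runs jobs 1 and 2 itself.
Wait(2);
Console.WriteLine(ranOnCaller); // 2
```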

## Thread safety

- **No two threads should write to the same memory at the same time.** Use dependencies to serialize writes.
- **Multiple readers are safe.** Any number of jobs can read the same data concurrently, as long as no job writes to it at the same time. In `IJobParallelFor`, keep each index's writes confined to its own location.
- **Don't access mutable static data from jobs.** The job system can't protect against race conditions on static fields.

## Additional resources

- [Creating and Scheduling Jobs](creating-jobs.md)
- [Job Dependencies and Coordination](job-dependencies.md)
- [Threading Fundamentals](threading-fundamentals.md)

@@ -0,0 +1,277 @@

# Creating and Scheduling Jobs

To create and run a job, you must:

1. Define a struct or class that implements one of the job interfaces.
2. Schedule the job on a `JobScheduler` instance.
3. Wait for the job to complete before accessing its results.

## Job types

| Type | Description |
|---|---|
| `IJob` | A single job that runs once on one worker thread |
| `IJobParallelFor` | A parallel job that runs `Execute` once per index |
| `IJobParallel` | A parallel job that runs `Execute` once per range segment |
| `ICustomJob<TSelf>` | A job with user-defined execution and cleanup function pointers |

## Managed and unmanaged jobs

Jobs can hold both **unmanaged** data (pointers, primitive types, blittable structs) and **managed** data (arrays, `List<T>`, strings, class references). The job scheduler copies the job data into an internal pool at schedule time and frees it after completion.

```csharp
// Managed job: holds an array reference.
// `unsafe` is required because the job also stores a raw pointer.
public unsafe struct ArraySumJob : IJob
{
    public int[] data;   // managed array
    public int* result;  // pointer to output; must stay valid until the job completes

    public void Execute(ref readonly JobExecutionContext ctx)
    {
        int sum = 0;
        for (int i = 0; i < data.Length; i++)
            sum += data[i];
        *result = sum;
    }
}
```

This differs from many job systems that restrict payloads to blittable types only. However, using managed references inside jobs means standard threading rules still apply: multiple jobs must not write to the same managed object simultaneously without proper synchronization.

## Create a scheduler

Before you can schedule jobs, create a `JobScheduler` with a description that defines the thread count and dependency capacity.

```csharp
using System;
using System.Threading; // ThreadPriority
using Misaki.HighPerformance.Jobs;

JobSchedulerDesc desc = new JobSchedulerDesc
{
    ThreadCount = Environment.ProcessorCount,
    ThreadPriority = ThreadPriority.Normal,
    DependencyChainCapacity = 64,
};

JobScheduler scheduler = new JobScheduler(in desc);
```

`ThreadCount` controls how many worker threads the scheduler spawns. `DependencyChainCapacity` is the maximum number of dependency edges the scheduler can track simultaneously. Set it to a value that covers the peak number of outstanding dependencies in your workload.

The scheduler also reserves one helper thread slot for external threads. Use `scheduler.ThreadLocalCount` when allocating thread-local storage to ensure every possible executor has a valid slot.
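
The sizing rule can be illustrated without the library. In this standalone sketch, `threadLocalCount` is a plain variable standing in for `scheduler.ThreadLocalCount` (assumed here to equal workers + 1 helper, as described above). Each executor, including the external helper, writes only to its own slot, so no synchronization is needed until the final combine.

```csharp
using System;
using System.Threading;

int workerCount = 4;
int threadLocalCount = workerCount + 1; // stand-in for scheduler.ThreadLocalCount: workers + 1 helper

// One slot per possible executor, indexed by that executor's thread index.
// (Adjacent slots share cache lines; real code often pads them to avoid false sharing.)
long[] partialSums = new long[threadLocalCount];

var threads = new Thread[workerCount];
for (int w = 0; w < workerCount; w++)
{
    int index = w; // worker slots: 0 .. workerCount - 1
    threads[w] = new Thread(() =>
    {
        for (int i = 0; i < 1000; i++) partialSums[index]++;
    });
    threads[w].Start();
}

// An external thread executing jobs inline (the helper) uses the extra last slot.
int helperIndex = threadLocalCount - 1;
for (int i = 0; i < 1000; i++) partialSums[helperIndex]++;

foreach (Thread t in threads) t.Join();

long total = 0;
foreach (long s in partialSums) total += s;
Console.WriteLine(total); // 5000
```

Sizing the array by worker count alone would leave the helper without a slot, which is the failure the guide's advice avoids.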

## IJob

`IJob` runs a single unit of work once on one worker thread. Implement the `Execute` method with the work you want to perform.

```csharp
using System.Numerics; // assuming Vector3 is System.Numerics.Vector3

public unsafe struct ApplyVelocityJob : IJob
{
    public Vector3* position;
    public Vector3 velocity;
    public float deltaTime;

    public void Execute(ref readonly JobExecutionContext ctx)
    {
        *position += velocity * deltaTime;
    }
}
```

To schedule the job, call `Schedule` on the scheduler. This returns a `JobHandle` that you can use to track completion.

```csharp
Vector3 pos = new Vector3(0, 0, 0);
Vector3 vel = new Vector3(10, 0, 0);

// Taking the address of a local requires an unsafe context.
// `pos` must stay alive until the job completes, which Wait guarantees here.
ApplyVelocityJob job = new ApplyVelocityJob
{
    position = &pos,
    velocity = vel,
    deltaTime = 0.016f,
};

JobHandle handle = scheduler.Schedule(ref job);
scheduler.Wait(handle);

// Result: pos is approximately (0.16, 0, 0)
```

## IJobParallelFor

`IJobParallelFor` runs the same operation across a range of indices in parallel. Each worker thread picks up batches of indices and processes them; workers that run out of batches steal remaining ones from other workers.

This is useful for updating arrays of entities, processing particle data, or any operation where each element is independent.

```csharp
public unsafe struct UpdatePositionJob : IJobParallelFor
{
    public Vector3* positions;
    public Vector3 velocity;
    public float deltaTime;

    public void Execute(int index, ref readonly JobExecutionContext ctx)
    {
        // Each index touches only its own element, so no synchronization is needed.
        positions[index] += velocity * deltaTime;
    }
}
```

Schedule a parallel-for job with the total iteration count and the batch size. The batch size controls how many indices each worker claims at once. Smaller batches give better load balancing; larger batches reduce stealing overhead.

```csharp
const int entityCount = 10000;

UpdatePositionJob job = new UpdatePositionJob
{
    positions = positionsPtr, // Vector3* to at least entityCount elements, allocated elsewhere
    velocity = new Vector3(10, 0, 0),
    deltaTime = 0.016f,
};

JobHandle handle = scheduler.ScheduleParallelFor(ref job, entityCount, 64);
scheduler.Wait(handle);
```

## IJobParallel

`IJobParallel` is similar to `IJobParallelFor`, but `Execute` receives a start index and an exclusive end index instead of a single index. This is useful when the work per batch has setup overhead that you want to amortize across multiple elements.

```csharp
public unsafe struct ProcessChunkJob : IJobParallel
{
    public float* data;
    public float* output; // at least totalLength elements; one slot per chunk is written

    public void Execute(int startIndex, int endIndex, ref readonly JobExecutionContext ctx)
    {
        float sum = 0;
        for (int i = startIndex; i < endIndex; i++)
        {
            sum += data[i];
        }
        // Store the per-chunk result. Chunks don't overlap, so startIndex is a unique slot.
        output[startIndex] = sum;
    }
}
```

Schedule it with `ScheduleParallel`, which takes the same arguments as `ScheduleParallelFor`:

```csharp
JobHandle handle = scheduler.ScheduleParallel(ref job, totalLength, batchSize);
scheduler.Wait(handle);
```

## ICustomJob

`ICustomJob<TSelf>` gives you full control over execution and cleanup by letting you provide function pointers. This is useful when you need custom resource management or when the job's execution logic isn't known until runtime.

```csharp
public unsafe struct MyCustomJob : ICustomJob<MyCustomJob>
{
    public int* value;

    public static void Execute(ref MyCustomJob job, ref JobRanges jobRanges, ref readonly JobExecutionContext ctx)
    {
        *job.value += 1;
    }

    public static void Free(ref MyCustomJob job)
    {
        // Clean up any unmanaged resources here
    }
}
```

Schedule it using `ScheduleCustom` with a `CustomJobDesc`:

```csharp
int value = 0;

MyCustomJob customJob = new MyCustomJob { value = &value };

CustomJobDesc<MyCustomJob> desc = new CustomJobDesc<MyCustomJob>
{
    data = ref customJob,
    pExecutionFunc = &MyCustomJob.Execute,
    pFreeFunc = &MyCustomJob.Free,
    jobRanges = JobRanges.Single,
    priority = JobPriority.Normal,
};

JobHandle handle = scheduler.ScheduleCustom(ref desc);
scheduler.Wait(handle);
```

## Run inline

You can also run a job immediately on the calling thread. This is useful for debugging or when the work is too small to justify threading overhead.

```csharp
// IJob
job.Run(default);

// IJobParallelFor
job.Run(totalIterations, default);

// IJobParallel
job.Run(totalIterations, default);
```

For struct jobs, use `RunRef` to avoid a copy:

```csharp
ref MyJob jobRef = ref someJob;
jobRef.RunRef(default);
```

## Priority

You can assign a priority when scheduling a job.

```csharp
JobHandle handle = scheduler.Schedule(ref job, JobPriority.High);
```

For more information on how priorities affect scheduling, see [Threading Fundamentals](threading-fundamentals.md).

## preferLocal

When you schedule with `preferLocal: true`, the scheduler pushes the job onto the calling thread's local queue first. This keeps the job's data hot in the CPU cache for that thread.

```csharp
JobHandle handle = scheduler.Schedule(ref job, preferLocal: true);
```

Use this when the calling thread is likely to be the one that executes the job, such as when scheduling from a dedicated system thread.

## Wait for completion

After scheduling, call one of the wait methods to block until the job finishes:

```csharp
// Block until a single job completes
scheduler.Wait(handle);

// Block until all specified jobs complete
scheduler.WaitAll(handle1, handle2);

// Block until any of the specified jobs completes
JobHandle completed = scheduler.WaitAny(handle1, handle2);
```

By default, `Wait` helps execute the job inline while waiting. Pass `inlineExecution: false` to disable this.

## Dispose

When you no longer need the scheduler, call `Dispose` to stop all worker threads and release resources:

```csharp
scheduler.Dispose();
```

## Additional resources

- [Threading Fundamentals](threading-fundamentals.md)
- [Job Dependencies and Coordination](job-dependencies.md)
- [Best Practices and API Selection](best-practices.md)

@@ -0,0 +1,94 @@

# Introduction

The job system lets you write safe, multithreaded code so your application can use all available CPU cores efficiently. It provides a zero-allocation, lock-free scheduling layer designed for game engines, simulations, and any high-throughput runtime.

## Why a dedicated job system?

Standard .NET primitives weren't designed for fine-grained game workloads:

- `Task` allocates on every invocation, and dependency chains built from continuations (`ContinueWith`, `Task.WhenAll`) add further allocations.
- `Parallel.For` allocates per call and offers no dependency or priority control.
- `ThreadPool` isn't built for low-latency job dispatch or batch-aware scheduling.

This library addresses these problems with pre-allocated memory pools, lock-free scheduling, full DAG-based dependency tracking, and automatic work stealing across all CPU cores.

## Feature highlights

| Feature | Description |
|---|---|
| Zero allocation | All memory for scheduling and execution is pre-allocated at scheduler creation |
| Lock-free scheduling | No `lock` statements or `Monitor` enters on the hot path |
| Job dependencies | Dependencies form a full directed acyclic graph (DAG), bounded only by the global capacity set at scheduler creation |
| Work stealing | Idle workers pull work from busy workers, naturally balancing load across P-cores and E-cores |
| Priority scheduling | High (50%), Normal (37.5%), and Low (12.5%) dispatch ratios |
| Managed + unmanaged | Supports both struct jobs and class-based jobs |
| Three job contracts | `IJob`, `IJobParallelFor`, `IJobParallel`, plus `ICustomJob<TSelf>` for custom execution logic |
| Async wait | `WaitAsync`, `WaitAllAsync`, `WaitAnyAsync` for non-blocking coordination |
| Inline execution | Calling threads can help execute the job they're waiting on, reducing latency |

## Basic usage

```csharp
using System;
using System.Threading;
using Misaki.HighPerformance.Jobs;

JobSchedulerDesc desc = new JobSchedulerDesc
{
    ThreadCount = Environment.ProcessorCount,
    ThreadPriority = ThreadPriority.Normal,
    DependencyChainCapacity = 64,
};

JobScheduler jobScheduler = new JobScheduler(in desc);

int a = 5;
int b = 10;
int result = 0;

// Top-level statements need an explicit unsafe block to take the address of locals.
unsafe
{
    AddJob job = new AddJob
    {
        pA = &a,
        pB = &b,
        pResult = &result,
    };

    JobHandle handle = jobScheduler.Schedule(ref job);
    jobScheduler.Wait(handle);
}

Console.WriteLine($"Result: {result}"); // Output: Result: 15

jobScheduler.Dispose();

// In a file with top-level statements, type declarations must come after the statements.
public unsafe struct AddJob : IJob
{
    public int* pA;
    public int* pB;
    public int* pResult;

    public void Execute(ref readonly JobExecutionContext ctx)
    {
        *pResult = *pA + *pB;
    }
}
```

## Who this is for

- Custom game engine developers who need a scheduling backbone without GC pauses
- Simulation and batch-processing authors who need predictable parallelism
- .NET developers who have hit the limits of `Task`-based approaches in tight loops

## Requirements

- .NET 10.0 or later
- `unsafe` code enabled

## Install

```bash
dotnet add package Misaki.HighPerformance.Jobs
```

## Additional resources

- [Threading Fundamentals](threading-fundamentals.md)
- [Creating and Scheduling Jobs](creating-jobs.md)
- [Job Dependencies and Coordination](job-dependencies.md)
- [Best Practices and API Selection](best-practices.md)

@@ -0,0 +1,178 @@

# Job Dependencies and Coordination

Often, one job depends on the results of another job. For example, job A might write velocity data that job B reads to update positions. You must tell the scheduler about such a dependency when you schedule the dependent job. The scheduler won't run the dependent job until all jobs it depends on have finished.

A job can depend on any number of other jobs. You can also create chains of jobs where each job depends on the previous one. However, every dependency delays the dependent job until its prerequisites finish, so design your dependency graph to let independent chains run in parallel.

## Dependencies on completed jobs

If the job you're depending on has already completed by the time you schedule the dependent job, the scheduler detects this and skips the wait, so the dependent job becomes eligible to run immediately. There's no penalty for passing handles that are already complete, which makes it safe to pass handles whose completion timing isn't guaranteed.
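
A common way for a scheduler to enforce dependencies, and to skip ones that are already complete, is a per-job counter of unfinished dependencies. The sketch below is a single-threaded model of that idea in plain C#. The names (`Schedule`, `RunAll`, integer job IDs) are hypothetical, and it doesn't describe this library's internals. A concurrent scheduler would also need the "already done?" check and the edge registration to happen atomically, and would decrement counters with `Interlocked.Decrement`.

```csharp
using System;
using System.Collections.Generic;

const int MaxJobs = 16;
int[] pending = new int[MaxJobs];          // unfinished dependencies per job
var dependents = new List<int>[MaxJobs];   // edges: job -> jobs waiting on it
bool[] done = new bool[MaxJobs];
var ready = new Queue<int>();              // jobs whose dependencies are all complete
var order = new List<int>();               // execution order, for inspection

void Schedule(int job, params int[] deps)
{
    dependents[job] = new List<int>();
    foreach (int d in deps)
    {
        if (done[d]) continue;             // already complete: no edge, no wait
        dependents[d].Add(job);
        pending[job]++;
    }
    if (pending[job] == 0) ready.Enqueue(job); // eligible to run immediately
}

void RunAll()
{
    while (ready.Count > 0)
    {
        int job = ready.Dequeue();
        order.Add(job);                    // "execute" the job
        done[job] = true;
        foreach (int waiter in dependents[job])
            if (--pending[waiter] == 0)    // its last unfinished dependency just completed
                ready.Enqueue(waiter);
    }
}

Schedule(0);           // no dependencies
Schedule(1);           // no dependencies
Schedule(2, 0, 1);     // waits for jobs 0 and 1
RunAll();

Schedule(3, 2);        // job 2 is already done, so job 3 is ready at once
Console.WriteLine(pending[3]);                 // 0
RunAll();
Console.WriteLine(string.Join(" -> ", order)); // 0 -> 1 -> 2 -> 3
```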

## Single dependency

Pass a `JobHandle` from one job's schedule call as a dependency to the next.

```csharp
using Misaki.HighPerformance.Jobs;

public unsafe struct AddJob : IJob
{
    public float* result;

    public void Execute(ref readonly JobExecutionContext ctx)
    {
        *result += 1;
    }
}

float result = 0;

AddJob jobA = new AddJob { result = &result };
JobHandle handleA = scheduler.Schedule(ref jobA);

AddJob jobB = new AddJob { result = &result };
JobHandle handleB = scheduler.Schedule(ref jobB, handleA);

scheduler.Wait(handleB);
// result == 2
```

Job B won't start until job A completes. Because both jobs write to the same data, the dependency ensures there is no race condition.

## Multiple dependencies

A job can wait on several jobs at once. Pass multiple handles to `Schedule`.

```csharp
JobHandle handle1 = scheduler.Schedule(ref job1);
JobHandle handle2 = scheduler.Schedule(ref job2);

// Job 3 waits for both job1 and job2 to finish
JobHandle handle3 = scheduler.Schedule(ref job3, handle1, handle2);
scheduler.Wait(handle3);
```

## Combined dependencies

For a large number of dependencies, use `CombineDependencies` to create a single handle that represents all of them. This avoids deep dependency chains and reduces scheduling overhead.

```csharp
// Collect handles from many scheduled jobs
JobHandle handle1 = scheduler.Schedule(ref job1);
JobHandle handle2 = scheduler.Schedule(ref job2);
JobHandle handle3 = scheduler.Schedule(ref job3);

// Combine into one handle, then pass as a single dependency
JobHandle combined = scheduler.CombineDependencies(handle1, handle2, handle3);
JobHandle finalHandle = scheduler.Schedule(ref finalJob, combined);
scheduler.Wait(finalHandle);
```

## Full example

The following example chains three jobs together: add a value to each element of an array, multiply each element, then compute the sum.

```csharp
using System;
using Misaki.HighPerformance.Jobs;

public unsafe struct ParallelAddJob : IJobParallel
{
    public float value;
    public float* inout;

    public void Execute(int startIndex, int endIndex, ref readonly JobExecutionContext ctx)
    {
        for (int i = startIndex; i < endIndex; i++)
            inout[i] += value;
    }
}

public unsafe struct ParallelMultiplyJob : IJobParallel
{
    public float multiplier;
    public float* inout;

    public void Execute(int startIndex, int endIndex, ref readonly JobExecutionContext ctx)
    {
        for (int i = startIndex; i < endIndex; i++)
            inout[i] *= multiplier;
    }
}

public unsafe struct SumJob : IJob
{
    public float* input;
    public int length;
    public float* output;

    public void Execute(ref readonly JobExecutionContext ctx)
    {
        float sum = 0;
        for (int i = 0; i < length; i++)
            sum += input[i];
        *output = sum;
    }
}

const int arraySize = 10000;
// `data` lives on this thread's stack; it stays valid because we Wait before leaving scope.
float* data = stackalloc float[arraySize];
new Span<float>(data, arraySize).Clear(); // don't rely on zeroed stackalloc memory ([SkipLocalsInit])
float result = 0;

// Jobs are passed by ref, so each one needs its own variable.
ParallelAddJob addJob = new ParallelAddJob { value = 10f, inout = data };
ParallelMultiplyJob multiplyJob = new ParallelMultiplyJob { multiplier = 2f, inout = data };
SumJob sumJob = new SumJob { input = data, length = arraySize, output = &result };

// Chain: add -> multiply -> sum
JobHandle handle1 = scheduler.ScheduleParallel(ref addJob, arraySize, 64);
JobHandle handle2 = scheduler.ScheduleParallel(ref multiplyJob, arraySize, 64, handle1);
JobHandle handle3 = scheduler.Schedule(ref sumJob, handle2);

scheduler.Wait(handle3);
// Each element is (0 + 10) * 2 = 20, so result == 200000
```

## Async wait

The scheduler provides async variants that offload the wait to the thread pool. This lets the calling thread continue other work while waiting.

```csharp
// Wait asynchronously for a single job
await scheduler.WaitAsync(handle);

// Wait asynchronously for all jobs
await scheduler.WaitAllAsync(new Memory<JobHandle>(new[] { handle1, handle2 }));

// Wait asynchronously for any job to complete
JobHandle completed = await scheduler.WaitAnyAsync(new ReadOnlyMemory<JobHandle>(new[] { handle1, handle2 }));
```

Unlike synchronous `Wait`, the async variants do **not** execute the job inline on the calling thread. The wait is fully offloaded to the thread pool, so the calling thread is free to continue other work, but it contributes no CPU time to the job's completion.

Each async method accepts an optional `CancellationToken` to cancel the wait.

```csharp
using var cts = new CancellationTokenSource();
await scheduler.WaitAsync(handle, cts.Token);
```
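
To make the offloading behavior concrete, here is a standalone model using only standard .NET types. A `ManualResetEventSlim` stands in for a job's completion, and the hypothetical `WaitAsync` local function parks a thread-pool thread on the blocking wait via `Task.Run`, so the awaiting caller's thread is released and does no job work. In this model, cancelling the token ends only the wait, not the work being waited on; how the library implements its async variants may differ.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical model: the blocking wait runs on a thread-pool thread, so the
// caller can await it without blocking and without executing any job work.
Task WaitAsync(ManualResetEventSlim completion, CancellationToken ct = default) =>
    Task.Run(() => completion.Wait(ct), ct);

var jobDone = new ManualResetEventSlim(false);

// Stand-in for a worker thread that finishes the job after a short delay.
_ = Task.Run(async () => { await Task.Delay(50); jobDone.Set(); });

await WaitAsync(jobDone);
Console.WriteLine(jobDone.IsSet); // True

// Cancelling stops the wait; it does nothing to the work being waited on.
var neverDone = new ManualResetEventSlim(false);
bool cancelled = false;
using (var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(50)))
{
    try
    {
        await WaitAsync(neverDone, cts.Token);
    }
    catch (OperationCanceledException)
    {
        cancelled = true; // the canceled task surfaces as OperationCanceledException
    }
}
Console.WriteLine(cancelled); // True
```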

## WaitAll and WaitAny

The synchronous variants reorder the collection in place, moving completed handles to the front. This allows you to efficiently check which handles are still pending.

```csharp
// After WaitAll, completed handles are at the front of the span
scheduler.WaitAll(handles);

// WaitAny returns the first handle that completed
JobHandle firstCompleted = scheduler.WaitAny(handle1, handle2);
```
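
The reordering can be sketched in plain C#: completed entries are swapped to the front of the buffer in place, the function returns how many are complete, and the pending ones remain as the tail of the same buffer. Integer IDs stand in for `JobHandle`s and `PartitionCompleted` is a hypothetical name; the library may order entries differently within each group.

```csharp
using System;

// Hypothetical model: swap completed handles to the front of the buffer in place
// and return how many there are. Order within each group is not preserved.
int PartitionCompleted(Span<int> handles, Func<int, bool> isCompleted)
{
    int front = 0;
    for (int i = 0; i < handles.Length; i++)
    {
        if (isCompleted(handles[i]))
        {
            (handles[front], handles[i]) = (handles[i], handles[front]);
            front++;
        }
    }
    return front;
}

int[] handles = { 10, 11, 12, 13, 14 };
bool[] finished = new bool[20];
finished[12] = true;
finished[14] = true;

int completedCount = PartitionCompleted(handles, h => finished[h]);
Span<int> stillPending = handles.AsSpan(completedCount); // no copy: a view of the tail

Console.WriteLine(completedCount);                            // 2
Console.WriteLine(string.Join(", ", handles));                // 12, 14, 10, 13, 11
Console.WriteLine(string.Join(", ", stillPending.ToArray())); // 10, 13, 11
```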

## Get job status

You can check a job's current state without waiting.

```csharp
JobState state = scheduler.GetJobStatus(handle);
// Returns Created, Scheduled, Running, Completed, or Invalid
```

## Additional resources

- [Creating and Scheduling Jobs](creating-jobs.md)
- [Threading Fundamentals](threading-fundamentals.md)
- [Best Practices and API Selection](best-practices.md)

@@ -0,0 +1,50 @@

# Threading Fundamentals

The job system uses multiple worker threads to execute your code across all available CPU cores. Each worker thread picks up jobs, executes them, and coordinates with other workers through a lock-free scheduling layer.

## Multithreading

When you use the job system, your code executes on worker threads running in parallel across multiple CPU cores. Instead of tasks running one after another on the main thread, they run simultaneously on separate cores. The worker threads run in parallel to one another and synchronize their results with the calling thread once completed.

The scheduler never creates more worker threads than the CPU has logical processors. This means you can schedule as many jobs as you need without having to know how many CPU cores are available.

## Worker threads

When you create a `JobScheduler`, it spawns a configurable number of worker threads. These threads form the backbone of the system. Each worker thread runs a continuous loop:

1. Attempt to find a job to execute.
2. If no job is immediately available, spin-wait briefly.
3. If still no work, wait for a signal that a new job has been scheduled.
4. Execute the found job, then repeat.

The scheduler also reserves one **helper thread** slot for external threads that call `Wait()` with inline execution enabled. The `WorkerCount` property reports the number of managed worker threads, while `ThreadLocalCount` returns the total (workers + helper). Use `ThreadLocalCount` when allocating thread-local storage to ensure every possible executor has a valid slot.
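
The four-step worker loop can be modeled with standard .NET primitives. This standalone sketch uses `ConcurrentQueue<Action>`, `SpinWait`, and `SemaphoreSlim` for clarity. It is not lock-free and is not the scheduler's actual loop; the spin budget and the 50 ms timeout are arbitrary illustrative values.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

var queue = new ConcurrentQueue<Action>();
var workAvailable = new SemaphoreSlim(0);
bool running = true;
int executed = 0;

void WorkerLoop()
{
    while (Volatile.Read(ref running))
    {
        // 1. Attempt to find a job; 4. execute it, then repeat.
        if (queue.TryDequeue(out var job))
        {
            job();
            continue;
        }

        // 2. Spin briefly in case work shows up very soon.
        var spinner = new SpinWait();
        while (!spinner.NextSpinWillYield && queue.IsEmpty)
            spinner.SpinOnce();
        if (!queue.IsEmpty) continue;

        // 3. Block until a producer signals new work (the timeout lets shutdown be noticed).
        workAvailable.Wait(TimeSpan.FromMilliseconds(50));
    }
}

var worker = new Thread(WorkerLoop) { IsBackground = true };
worker.Start();

for (int i = 0; i < 100; i++)
{
    queue.Enqueue(() => Interlocked.Increment(ref executed));
    workAvailable.Release(); // wake the worker if it is blocked in step 3
}

SpinWait.SpinUntil(() => Volatile.Read(ref executed) == 100, TimeSpan.FromSeconds(5));
Volatile.Write(ref running, false);
worker.Join();
Console.WriteLine(executed); // 100
```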

## Thread-local queues

Each worker thread has its own set of thread-local queues. When a job is scheduled with `preferLocal: true`, the scheduler pushes the job onto the calling thread's local queue first. Workers pop from their local queue in last-in-first-out (LIFO) order, which keeps the most recently scheduled job hot in the CPU cache.

If a worker's local queues are empty, it looks to the global queues or steals work from other workers.
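
The LIFO behavior can be shown with a single-threaded model of one worker's local queue. The owner pops the newest job. The thief side taking the oldest job is an assumption on our part, based on the usual design of work-stealing deques; the guide doesn't state which end this library steals from. The model is not thread-safe and only shows which end each party uses.

```csharp
using System;
using System.Collections.Generic;

// Single-threaded model of one worker's local deque. Not thread-safe: it only
// shows which end each party uses. Newest jobs sit at the back.
var deque = new LinkedList<string>();

void Push(string job) => deque.AddLast(job);   // owner schedules with preferLocal

string? PopLocal()                              // owner: LIFO, newest first
{
    if (deque.Count == 0) return null;
    string job = deque.Last!.Value;
    deque.RemoveLast();
    return job;
}

string? Steal()                                 // thief: FIFO, oldest first (assumed)
{
    if (deque.Count == 0) return null;
    string job = deque.First!.Value;
    deque.RemoveFirst();
    return job;
}

Push("A");
Push("B");
Push("C");
Console.WriteLine(PopLocal()); // C: most recently scheduled, likely still cache-hot
Console.WriteLine(Steal());    // A: oldest job, taken by another worker
Console.WriteLine(PopLocal()); // B
```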
|
||||
|
||||
## Work stealing

The job system uses work stealing to even out the number of tasks handled by each worker thread. Some worker threads finish their tasks faster than others, so once a worker thread has processed all of its own tasks, it looks at the other worker threads' queues and takes over tasks that were assigned to them.

On CPUs with a mix of performance cores and efficiency cores, faster cores naturally end up stealing more work, which keeps the overall workload balanced without manual partitioning.

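The following sketch shows a local queue that the owner pops LIFO while idle workers steal from the other end. The text above doesn't say which end this library steals from. Stealing from the opposite (oldest) end is the classic work-stealing arrangement, so treat that choice as an assumption. For clarity, the sketch also uses a lock and a `LinkedList<int>`, whereas the library describes its scheduling path as lock-free. `Push`, `TryPop`, and `TrySteal` are hypothetical names.

```csharp
using System.Collections.Generic;

// Hypothetical, lock-based stand-in for one worker's local queue.
// A production work-stealing deque does this without locks.
var deque = new LinkedList<int>();
var gate = new object();

void Push(int job)
{
    lock (gate) deque.AddLast(job);
}

// The owning worker takes the newest job (LIFO): its data is likely still cached.
bool TryPop(out int job)
{
    lock (gate)
    {
        var node = deque.Last;
        if (node is null) { job = 0; return false; }
        deque.RemoveLast();
        job = node.Value;
        return true;
    }
}

// An idle worker steals the oldest job from the opposite end,
// so it rarely contends with the owner working at the back.
bool TrySteal(out int job)
{
    lock (gate)
    {
        var node = deque.First;
        if (node is null) { job = 0; return false; }
        deque.RemoveFirst();
        job = node.Value;
        return true;
    }
}
```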
## Priority scheduling

Jobs can be assigned one of three priority levels. The scheduler divides dispatch slots to give each priority an appropriate share of execution time:

| Priority | Share | Use case |
|---|---|---|
| High | 50% | Critical-path work that must complete quickly |
| Normal | 37.5% | Default priority for most jobs |
| Low | 12.5% | Background tasks with no immediate deadline |

The scheduler probes queues in a cascade pattern that respects these ratios. Within each priority tier, the worker checks its local queue first, then the global queue, then attempts to steal from other workers before moving to the next tier.

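One way to realize the 50% / 37.5% / 12.5% split is an eight-slot dispatch table with four High, three Normal, and one Low slot. When the chosen tier is empty, the worker falls through to the other tiers. The slot layout and the `ProbeOrder` helper are hypothetical. The sketch also leaves out the per-tier local, global, and steal probes.

```csharp
using System;
using System.Linq;

// Hypothetical 8-slot dispatch table: 4 High, 3 Normal, 1 Low,
// which yields the 50% / 37.5% / 12.5% split.
const int High = 0, Normal = 1, Low = 2;
int[] slots = { High, Normal, High, Normal, High, Normal, High, Low };

// For the n-th dispatch, probe the slot's tier first, then cascade through
// the remaining tiers so an idle worker never ignores available work.
int[] ProbeOrder(int dispatch)
{
    int first = slots[dispatch % slots.Length];
    return new[] { first }
        .Concat(new[] { High, Normal, Low }.Where(tier => tier != first))
        .ToArray();
}
```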
## Lock-free scheduling

All scheduling operations (state transitions, dependency registration, and job dispatch) use lock-free techniques such as compare-and-swap (CAS) and other interlocked operations. There are no `lock` statements or `Monitor.Enter` calls on the hot path.

This design keeps overhead minimal. Thousands of jobs can be scheduled, executed, and completed per frame without kernel transitions, heap allocations, or garbage collection pauses.

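A lock-free state transition of this kind comes down to a single `Interlocked.CompareExchange`. The state names and the `TryTransition` helper below are hypothetical. They illustrate the technique, not the library's actual job state machine.

```csharp
using System.Threading;

// Hypothetical job states.
const int Pending = 0, Running = 1, Completed = 2;
int state = Pending;

// The transition succeeds only if the state still holds the expected value.
// A losing thread sees the failure immediately and moves on instead of blocking.
bool TryTransition(int from, int to) =>
    Interlocked.CompareExchange(ref state, to, from) == from;
```

A thread that loses the race never waits. It simply observes that another thread already claimed the job.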
10
docs/documents/articles/Misaki.HighPerformance.Jobs/toc.yml
Normal file
@@ -0,0 +1,10 @@
- name: Introduction
  href: introduction.md
- name: Threading Fundamentals
  href: threading-fundamentals.md
- name: Creating and Scheduling Jobs
  href: creating-jobs.md
- name: Job Dependencies and Coordination
  href: job-dependencies.md
- name: Best Practices and API Selection
  href: best-practices.md
@@ -0,0 +1,278 @@
# Allocators

Every allocation in this library goes through an `AllocationHandle`. The handle determines where memory comes from, how it's organized, and when it's reclaimed. There is no default allocator; you always choose one explicitly.

## AllocationHandle

`AllocationHandle` is a struct with a state pointer and three function pointers:

```
AllocationHandle
    _state   : void*     — allocator-specific context
    _alloc   : delegate  — allocate
    _realloc : delegate  — reallocate
    _free    : delegate  — free
```

The function pointers let any allocator implementation be wrapped without boxing, virtual dispatch, or GC pressure. To create a custom allocator, you only need to populate this struct.

Every collection stores the `AllocationHandle` it was created with and uses it for all internal memory operations.

## Built-in allocators

The library ships with allocators managed by `AllocationManager`:

| Allocator | Handle access | Backing | Lifetime | Reuse | Thread safety |
|---|---|---|---|---|---|
| Heap | `AllocationHandle.Persistent` | Heap (replaced by mimalloc if `MHP_ENABLE_MIMALLOC` is defined) | Until freed | Controlled by malloc or mimalloc | Yes |
| FreeList | `AllocationHandle.FreeList` | Heap | Until freed | Yes — reuses freed blocks | Yes (remote-free queue) |
| TLSF | `AllocationHandle.TLSF` | Virtual memory chunks | Until freed | Yes — low fragmentation | Yes (lock-protected) |
| VirtualArena | `AllocationHandle.Temp` | Virtual memory (reserve on init, commit on demand) | Until `ResetTempAllocator()` | No — bulk reset only | Yes |
| VirtualStack | `AllocationManager.CreateStackScope().AllocationHandle` | Virtual memory (reserve on init, commit on demand) | Via `VirtualStack.Scope` | Yes when scope disposed | No — thread-local |

The library also provides heap-based variants of `Arena` and `Stack` for direct use via `MemoryPool`:

| Allocator | Backing | Lifetime | Reuse |
|---|---|---|---|
| Arena | Heap (`NativeMemory.Alloc`) | Until `Reset()` or dispose | No — bulk reset only |
| Stack | Heap (`NativeMemory.Alloc`) | Via `Stack.Scope` | Yes when scope disposed |

### VirtualArena (Temp)

VirtualArena reserves a large virtual address range on initialization and commits physical memory on demand as allocations are made. Allocation bumps an offset pointer — there is no free-list walk, no block splitting, and no metadata search. This makes it the fastest allocator in the library.

**Free is a no-op.** Individual frees do nothing. The entire arena is reset at once by calling `AllocationManager.ResetTempAllocator()`, which rewinds the offset back to zero. This makes the arena ideal for frame-scoped or phase-scoped work where you want to allocate freely and discard everything at once.

```csharp
// Temp allocations are freed collectively.
var a = new UnsafeArray<int>(10, AllocationHandle.Temp);
var b = new UnsafeList<int>(AllocationHandle.Temp);

// Reset everything.
AllocationManager.ResetTempAllocator();

// Both a and b are now invalid.
```

Under the hood, `Temp` uses a `MemoryPool<VirtualArena>`. The arena uses 64 KB pages for OS-level commit granularity and is thread-safe with a lock-free bump-allocate path.

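The bump-allocation path can be sketched as follows. To run without `unsafe` code, the sketch uses offsets into a fixed 1 MB range in place of pointers into reserved virtual memory, and it omits page commits. `Allocate` and `Reset` are hypothetical names. The compare-and-swap loop shows one way a bump allocator can stay thread-safe without a lock.

```csharp
using System.Threading;

// Hypothetical stand-in: offsets into a fixed 1 MB range instead of
// pointers into reserved virtual memory.
const long Capacity = 1 << 20;
long offset = 0;

// Returns the start offset of the block, or -1 if the range is exhausted.
// alignment must be a power of two.
long Allocate(long size, long alignment)
{
    while (true)
    {
        long current = Volatile.Read(ref offset);
        long start = (current + alignment - 1) & ~(alignment - 1); // round up
        long end = start + size;
        if (end > Capacity) return -1;
        // Lock-free bump: publish the new end only if no other thread moved it first.
        if (Interlocked.CompareExchange(ref offset, end, current) == current)
            return start;
    }
}

// Individual frees are no-ops; a reset rewinds the whole arena at once.
void Reset() => Volatile.Write(ref offset, 0);
```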
### FreeList (FreeList)

The free-list allocator reclaims and reuses individual blocks. When you free a block, it is returned to a size-bucketed free list and can satisfy a future allocation from the same bucket. This avoids requesting fresh memory from the backing heap for every allocation while keeping fragmentation low.

FreeList uses per-thread caches for the hot path and a remote-free queue for cross-thread deallocation. This means threads can free each other's allocations without taking a global lock on the common path.

```csharp
// FreeList allocations can be freed and reused independently.
var a = new UnsafeArray<int>(10, AllocationHandle.FreeList);
a.Dispose(); // Memory goes back to the free list.

var b = new UnsafeArray<int>(10, AllocationHandle.FreeList);
// Likely backed by the same memory as a.
```

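Size-bucketed reuse can be sketched with power-of-two buckets. Everything here is hypothetical:

- Blocks are offsets rather than pointers.
- `Free` takes the size explicitly, where a real allocator would typically keep it in a block header.
- The per-thread caches and remote-free queue described above are left out.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical sketch: blocks are offsets into a backing heap and
// buckets are power-of-two size classes starting at 16 bytes.
var buckets = new Stack<long>[32];
for (int i = 0; i < buckets.Length; i++) buckets[i] = new Stack<long>();
long heapTop = 0;

int BucketOf(int size) =>
    BitOperations.Log2(BitOperations.RoundUpToPowerOf2((uint)Math.Max(size, 16)));

long Allocate(int size)
{
    int bucket = BucketOf(size);
    if (buckets[bucket].TryPop(out long block))
        return block;               // reuse a previously freed block
    long start = heapTop;           // otherwise carve a new block
    heapTop += 1L << bucket;
    return start;
}

// The caller passes the size here; a real allocator would read it from a header.
void Free(long block, int size) => buckets[BucketOf(size)].Push(block);
```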
### TLSF (TLSF)

The Two-Level Segregated Fit allocator guarantees O(1) allocation and deallocation with very low external fragmentation. It organizes free blocks by size class in a two-level bitmap index, which lets it find a suitable (good-fit) block in constant time. TLSF backs its memory pool with virtual memory chunks allocated via `Mmap`.

`AllocationHandle.TLSF` maps to the manager's internal TLSF allocator. Use it for long-lived allocations where fragmentation matters and where you need consistent O(1) performance.

The TLSF implementation is single-threaded internally, so the manager wraps it in a lock. Concurrent calls from multiple threads are serialized through that lock.

```csharp
// TLSF for long-lived allocations.
var cache = new UnsafeHashMap<int, EntityData>(
    AllocationHandle.TLSF
);

// ... use for the lifetime of the application ...

cache.Dispose();
```

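The two-level index can be sketched with the mapping commonly used in TLSF implementations. The first level is the power-of-two range of the size, and the second level splits that range into 2^SLI linear slices. The following are assumptions for illustration, not this library's implementation:

- The value `SLI = 4`.
- The `Mapping`, `MarkFree`, and `FindSuitable` helpers.
- The omission of block splitting and coalescing.

```csharp
using System.Numerics;

// Hypothetical parameters: each power-of-two range is split into 2^SLI = 16
// linear subclasses. Sizes must be at least 16 bytes and below 2^31.
const int SLI = 4;
uint firstLevel = 0;             // bit f set => some class in range f has a free block
var secondLevel = new uint[32];  // bit s of secondLevel[f] set => class (f, s) is non-empty

(int fl, int sl) Mapping(uint size)
{
    int fl = BitOperations.Log2(size);                // which power-of-two range
    int sl = (int)(size >> (fl - SLI)) ^ (1 << SLI);  // which linear slice inside it
    return (fl, sl);
}

void MarkFree(uint size)
{
    var (fl, sl) = Mapping(size);
    firstLevel |= 1u << fl;
    secondLevel[fl] |= 1u << sl;
}

// Round the request up to the next class boundary so any block found is big
// enough, then locate a non-empty class with at most two bit scans.
(int fl, int sl)? FindSuitable(uint size)
{
    uint rounded = size + (1u << (BitOperations.Log2(size) - SLI)) - 1;
    var (fl, sl) = Mapping(rounded);
    uint slMap = secondLevel[fl] & (~0u << sl);
    if (slMap == 0)
    {
        uint flMap = fl >= 31 ? 0 : firstLevel & (~0u << (fl + 1));
        if (flMap == 0) return null;
        fl = BitOperations.TrailingZeroCount(flMap);
        slMap = secondLevel[fl];
    }
    return (fl, BitOperations.TrailingZeroCount(slMap));
}
```

The search never walks a list of free blocks, which is where the constant-time bound comes from. Rounding up before the scan means the search may skip a block in the exact class that would have fit. TLSF accepts this trade for guaranteed O(1) behavior.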
### VirtualStack (Stack)

VirtualStack is a LIFO allocator backed by a reserved virtual address range that commits physical memory on demand. It allocates by bumping an offset (like VirtualArena) but adds a scope mechanism that rewinds to a saved position on dispose.

The stack is **not thread-safe** and is designed for single-threaded or thread-local contexts. Each thread gets its own stack through `AllocationManager.CreateStackScope()`:

```csharp
// Creates a thread-local stack scope.
// The scope's AllocationHandle can be used for allocations
// that are automatically reclaimed when the scope ends.
using var scope = AllocationManager.CreateStackScope();

// Allocate from the stack.
var temp = new UnsafeArray<int>(10, scope.AllocationHandle);

// When scope is disposed, all allocations are rewound.
```

The stack uses 64 KB commit granularity and supports scope nesting. The scope records the offset at creation and rewinds to it on dispose — allocations from an inner scope are always reclaimed before the outer scope.

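The scope mechanism amounts to saving the offset and restoring it later. The sketch below makes three simplifications: it uses offsets into a fixed range instead of committed virtual memory, it ignores alignment, and it uses hypothetical `BeginScope` and `EndScope` helpers where the library uses a disposable scope with `using`.

```csharp
using System;

// Hypothetical stand-in: offsets into a fixed range, no thread safety
// (matching the single-thread contract described above).
const int Capacity = 1 << 20;
int top = 0;

int Allocate(int size)
{
    if (top + size > Capacity) throw new OutOfMemoryException();
    int start = top;
    top += size;
    return start;
}

// A scope is just the saved offset; ending it rewinds to that point,
// reclaiming everything allocated since the scope began.
int BeginScope() => top;
void EndScope(int mark) => top = mark;
```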
### Arena (heap)

`Arena` is the heap-based counterpart of `VirtualArena`. It uses `NativeMemory.Alloc` to allocate a fixed-size buffer on the heap and bumps an offset pointer for allocations. It supports the same bulk-reset pattern but without virtual memory reservation.

```csharp
using var arena = new MemoryPool<Arena, Arena.CreationOptions>(
    new Arena.CreationOptions { size = 1024 * 1024 });

var arr = new UnsafeArray<int>(10, arena.AllocationHandle);

// Reset rewinds the offset. Memory stays allocated.
arena.Allocator.Reset();
```

`Arena` is thread-safe with a lock-free bump-allocate path.

### Stack (heap)

`Stack` is the heap-based counterpart of `VirtualStack`. It uses `NativeMemory.Alloc` to allocate a fixed-size buffer on the heap and uses the same scope mechanism to rewind allocations on scope dispose.

```csharp
using var stack = new MemoryPool<Stack, Stack.CreationOptions>(
    new Stack.CreationOptions { size = 1024 * 1024 });

using (var scope = stack.Allocator.CreateScope(stack.AllocationHandle))
{
    var arr = new UnsafeArray<int>(10, scope.AllocationHandle);
} // Scope dispose rewinds all allocations.
```

`Stack` is **not thread-safe** and is designed for single-threaded contexts.

## AllocationManager configuration

`AllocationManager` can be configured with an `AllocationManagerDesc` to control the capacity and alignment of each built-in allocator:

```csharp
var desc = new AllocationManagerDesc
{
    ArenaCapacity = 1024 * 1024 * 1024,     // 1 GB virtual reservation
    StackCapacity = 32 * 1024 * 1024,       // 32 MB per thread
    FreeListChunkSize = 64 * 1024,          // 64 KB chunks
    FreeListDefaultAlignment = 16,          // 16-byte alignment
    TLSFAlignment = 16,                     // 16-byte alignment
    TLSFInitialChunkSize = 64 * 1024 * 1024 // 64 MB initial chunk
};

AllocationManager.Initialize(desc);
```

Calling `Initialize()` with no arguments uses these same defaults.

## MemoryPool for scoped allocators

`MemoryPool<TAllocator, TOpts>` creates a standalone allocator outside of `AllocationManager`. This is useful when you want:

- An allocator type not available in the built-in set
- An allocator scoped to a single method or algorithm
- Isolation from the global allocation state

```csharp
using var pool = new MemoryPool<TLSF, TLSF.CreationOptions>(
    new TLSF.CreationOptions
    {
        alignment = 16,
        initialChunkSize = 1024 * 1024
    });

using var array = new UnsafeArray<int>(10, pool.AllocationHandle);

// When pool is disposed, all TLSF memory is released.
```

The pool wraps any type implementing `IMemoryAllocator<TSelf, TOpts>`. This includes `Arena`, `VirtualArena`, `Stack`, `VirtualStack`, `TLSF`, and `DynamicArena`:

```csharp
// Heap-based arena.
using var arenaPool = new MemoryPool<Arena, Arena.CreationOptions>(
    new Arena.CreationOptions { size = 1024 * 1024 });

// Virtual-memory-based stack.
using var stackPool = new MemoryPool<VirtualStack, VirtualStack.CreationOptions>(
    new VirtualStack.CreationOptions { reserveCapacity = 1024 * 1024 });

// Dynamically growing arena (heap).
using var dynamicPool = new MemoryPool<DynamicArena, DynamicArena.CreationOptions>(
    new DynamicArena.CreationOptions { initialSize = 4096 });
```

`DynamicArena` creates linked arenas that grow automatically when full, with no virtual address reservation upfront.

## Custom allocators

Creating a custom allocator requires populating an `AllocationHandle` with your own allocate, reallocate, and free functions:

```csharp
using System.Runtime.InteropServices;

static void* MyAlloc(void* state, nuint size, nuint alignment, AllocationOption option)
{
    // Your allocation logic. As an example, forward to the native aligned heap.
    void* ptr = NativeMemory.AlignedAlloc(size, alignment);
    if ((option & AllocationOption.Clear) != 0)
        NativeMemory.Clear(ptr, size);
    return ptr;
}

static void* MyRealloc(void* state, void* ptr, nuint oldSize, nuint newSize, nuint alignment, AllocationOption option)
{
    // Your reallocation logic. The example zeroes only the newly grown tail.
    void* result = NativeMemory.AlignedRealloc(ptr, newSize, alignment);
    if ((option & AllocationOption.Clear) != 0 && newSize > oldSize)
        NativeMemory.Clear((byte*)result + oldSize, newSize - oldSize);
    return result;
}

static void MyFree(void* state, void* ptr)
{
    // Your deallocation logic. It must match the allocation function above.
    NativeMemory.AlignedFree(ptr);
}

var handle = new AllocationHandle(
    myAllocatorState, // your allocator's context pointer
    &MyAlloc,
    &MyRealloc,
    &MyFree
);

var array = new UnsafeArray<int>(10, handle);
```

For more structured custom allocators, implement `IMemoryAllocator<TSelf, TOpts>` and use `MemoryPool<TAllocator, TOpts>`:

```csharp
public unsafe struct MyAllocator
    : IMemoryAllocator<MyAllocator, MyAllocator.CreationOptions>
{
    public struct CreationOptions { /* ... */ }

    public static MyAllocator Create(in CreationOptions opts) { /* ... */ }
    public void* Allocate(nuint size, nuint alignment, AllocationOption option) { /* ... */ }
    public void* Reallocate(void* ptr, nuint oldSize, nuint newSize, nuint alignment, AllocationOption option) { /* ... */ }
    public void Free(void* ptr) { /* ... */ }
    public void Dispose() { /* ... */ }
}

using var pool = new MemoryPool<MyAllocator, MyAllocator.CreationOptions>(/* ... */);
```

## AllocationOption

`AllocationOption` is a flags enum that controls per-allocation behavior:

| Value | Behavior |
|---|---|
| `None` | Memory is returned as-is, contents are undefined |
| `Clear` | All allocated bytes are zeroed before returning |

```csharp
// Request zeroed memory.
var ptr = handle.Alloc(1024, 16, AllocationOption.Clear);
```

`Clear` is useful for security-sensitive data or when you need deterministic initialization. Omitting it avoids the cost of touching every page.

## Enable Mimalloc

You can define `MHP_ENABLE_MIMALLOC` to use mimalloc as the underlying allocator for `AllocationHandle.Persistent` and `MemoryUtility.Malloc` instead of the default C allocator.

> Using mimalloc requires installing the `TerraFX.Interop.Mimalloc` package.

## Additional resources

- [Introduction](introduction.md) — install, first steps, and safety checks
- [Architecture overview](architecture-overview.md) — layering, MemoryHandle, and struct semantics
- [Collection types](collection-types.md) — all available data structures
@@ -0,0 +1,123 @@
# Architecture overview

The library is structured as a stack of explicit layers. Each layer has a single responsibility, and you can work at any level depending on how much control you need.

```
User Code
    ↓
Unsafe collections (UnsafeArray, UnsafeList, UnsafeHashMap, ...)
    ↓
AllocationHandle (function-pointer-based "interface")
    ↓
Allocators (Arena / FreeList / TLSF / Stack)
    ↓
OS memory (VirtualAlloc / malloc / mimalloc)
```

## Design philosophy

The library is built around four principles:

**Explicit over implicit.** Every allocation requires an `AllocationHandle`. There is no default allocator, no hidden malloc, and no GC fallback. If memory is allocated, you chose exactly where it came from.

**Unsafe-first.** All collections work with raw pointers internally. Safety checks are optional and compiled away in release. The library trusts you — and lets you prove you can be trusted.

**Struct-only collections.** No managed objects, no handles to GC-tracked state, no hidden heap allocations. A collection is just a pointer + metadata. Copy it, inline it, store it in unmanaged memory.

**Zero overhead by default.** Safety checks, stack traces, tracking — all opt-in via compile-time constants. Release builds produce the same code as hand-written pointer manipulation.

## AllocationHandle pattern

`AllocationHandle` is the central abstraction. It is a struct containing a state pointer and three function pointers:

```
AllocationHandle
    _state   : void*     — allocator-specific context
    _alloc   : delegate  — allocate
    _realloc : delegate  — reallocate
    _free    : delegate  — free
```

Because it uses function pointers instead of virtual interfaces, an `AllocationHandle` call is a single indirect call through a function pointer, with no boxing, no vtable lookup, and no GC pressure. Any combination of alloc, realloc, and free functions can be composed into a handle, so a custom allocator is just a matter of filling in the struct.

Every collection stores its `AllocationHandle` and uses it for all internal memory operations.

## MemoryHandle tracking

`MemoryHandle` is a safety-only struct that pairs an allocation ID with a generation counter:

```csharp
public readonly struct MemoryHandle
{
    public readonly int ID;
    public readonly int Generation;
}
```

When `MHP_ENABLE_SAFETY_CHECKS` is defined, every allocation is registered in `AllocationManager`'s tracking database with its address and size. Operations such as dispose verify that the handle is still valid, catching double-free and use-after-free errors. The generation counter prevents a stale handle from matching a newer allocation that reuses the same ID.

When `MHP_ENABLE_SAFETY_CHECKS` is not defined, as in typical release builds, the `MemoryHandle` fields compile away to nothing.

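The ID-plus-generation check can be illustrated with a small tracking table. The table layout and the `Register`, `IsValid`, and `Release` helpers are hypothetical. The library's tracking database also records addresses, sizes, and optionally stack traces.

```csharp
using System.Collections.Generic;

// Hypothetical tracking table: generations[id] is the current generation of that ID.
var generations = new List<int>();
var freeIds = new Stack<int>();   // IDs whose allocation was released

(int ID, int Generation) Register()
{
    if (freeIds.TryPop(out int id))
        return (id, generations[id]);   // reuse the ID under its bumped generation
    generations.Add(0);
    return (generations.Count - 1, 0);
}

// A handle is valid only while its generation matches the table.
bool IsValid((int ID, int Generation) h) =>
    h.ID >= 0 && h.ID < generations.Count && generations[h.ID] == h.Generation;

bool Release((int ID, int Generation) h)
{
    if (!IsValid(h)) return false;      // double free or stale handle
    generations[h.ID]++;                // invalidates every outstanding copy
    freeIds.Push(h.ID);
    return true;
}
```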
## AllocationManager

`AllocationManager` serves two roles:

- **Registry.** It owns the global instances of the built-in allocators (`Temp`, `FreeList`, `Persistent`, `TLSF`). For example, `AllocationHandle.TLSF` returns a handle to the manager's internal TLSF allocator instance.

- **Safety database.** When safety checks are enabled, the manager tracks every live allocation. Diagnostics, snapshot inspection, and leak detection all go through this system.

The manager must be initialized before any allocation and disposed at shutdown:

```csharp
AllocationManager.Initialize();
// ... use collections ...
AllocationManager.Dispose();
```

## MemoryPool for scoped allocators

`MemoryPool<TAllocator, TOpts>` creates a standalone allocator scoped to a method or algorithm, independent of `AllocationManager`. This is useful when you want an allocator that doesn't exist in the built-in set, or you want to isolate allocations from the global state:

```csharp
using var pool = new MemoryPool<TLSF, TLSF.CreationOptions>(
    new TLSF.CreationOptions { alignment = 16 });

using var array = new UnsafeArray<int>(10, pool.AllocationHandle);
```

Collections work with any `AllocationHandle`, regardless of whether it came from `AllocationManager` or a `MemoryPool`. You do not need to initialize `AllocationManager` when using only standalone pools. The allocator lives as long as the pool, and all its memory is released when the pool is disposed.

## Struct semantics

All collections are structs with no managed references. This has two important consequences:

- **No GC overhead.** The struct itself is stack-allocated or inlineable. The memory it points to is unmanaged and outside the GC's view.

- **Pass by value copies the struct.** If you pass an `UnsafeList<T>` to a method without `ref`, the method operates on a copy. Additions, removals, and resizes on that copy are invisible to the caller. Always use `ref` for mutation:

```csharp
public void Process(ref UnsafeList<int> list) { /* ... */ }
```

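The copy hazard is easy to reproduce without the library. In this sketch, a value tuple stands in for a struct collection: a buffer reference plus a `Count` field. The `AddByValue` and `AddByRef` helpers are hypothetical.

```csharp
// A value tuple stands in for a struct collection: buffer reference + Count.
(int[] Items, int Count) list = (new int[4], 0);

void AddByValue((int[] Items, int Count) l, int value)
{
    l.Items[l.Count] = value; // writes into the shared buffer...
    l.Count++;                // ...but only the copy's Count changes
}

void AddByRef(ref (int[] Items, int Count) l, int value)
{
    l.Items[l.Count] = value;
    l.Count++;                // the caller's Count changes
}
```

The by-value call still writes into the shared buffer, and only the length update is lost. With a real unmanaged collection, a resize on the copy could also reallocate the buffer that the caller still points to.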
## Safety checks system

The library supports two compile-time safety levels:

- `MHP_ENABLE_SAFETY_CHECKS` — enables bounds checking, use-after-free detection, double-free detection, and `IsCreated` validity verification (checks that the internal memory handle is still registered in the tracking database).

- `MHP_ENABLE_STACKTRACE` — adds stack trace capture on every allocation, enabling precise leak investigation. Requires `MHP_ENABLE_SAFETY_CHECKS`.

When `MHP_ENABLE_SAFETY_CHECKS` is not defined, the safety fields compile away and `IsCreated` only checks whether the internal pointer is non-null, without verifying that the memory is still valid. This matches the performance of raw pointer code.

## AllocationOption

Allocation operations can take an optional `AllocationOption`:

| Value | Behavior |
|---|---|
| `None` | Default — memory is returned as-is |
| `Clear` | Zero the allocated memory before returning |

## Content files packaging

The library is packaged as content files rather than a traditional assembly. Source files are embedded directly into the consuming project at build time. This enables the AOT compiler to see through every call site, inline aggressively, and strip unused code paths — including the entire safety check infrastructure when the relevant constants are undefined.
@@ -0,0 +1,77 @@
# Collection types

All collection types in this library are structs that wrap unmanaged memory allocated through an `AllocationHandle`. They follow the same general API patterns as the BCL collections but operate entirely outside the GC heap.

## Array-like types

| Data structure | Description |
|---|---|
| `UnsafeArray<T>` | A fixed-size array. Supports resize via `Resize()`. |
| `UnsafeList<T>` | A dynamically resizing list. |
| `UnsafeQueue<T>` | A FIFO queue. |
| `UnsafeStack<T>` | A LIFO stack. |
| `UnsafeChunkedList<T>` | A list that stores elements in fixed-size chunks. Adding elements never moves existing ones, providing stable element addresses. |

## Map and set types

| Data structure | Description |
|---|---|
| `UnsafeHashMap<TKey, TValue>` | An unordered associative array of key-value pairs. |
| `UnsafeHashSet<T>` | A set of unique values. |
| `UnsafeMultiHashMap<TKey, TValue>` | An unordered associative array where keys don't have to be unique. Multiple values can share the same key. |

## Sparse types

| Data structure | Description |
|---|---|
| `UnsafeSparseSet<T>` | A sparse set that provides O(1) insertion, deletion, and lookup. Uses the dense/sparse array pattern. Sparse indices work like entity IDs and are automatically generated. |
| `UnsafeSlotMap<T>` | A slot map with generation counters. Fast insertion, removal, and lookup by slot index. The generation counter prevents stale index access to data that has been replaced. |

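The dense/sparse pattern can be sketched with BCL lists. This is a hypothetical illustration, not the library's `UnsafeSparseSet<T>`. Values are strings, IDs are recycled through a free-ID stack, and there is no generation counter, which is what `UnsafeSlotMap<T>` adds.

```csharp
using System.Collections.Generic;

// Hypothetical dense/sparse set with auto-generated IDs.
var dense = new List<string>();   // packed values, iterated contiguously
var denseToId = new List<int>();  // which ID owns each dense slot
var sparse = new List<int>();     // ID -> dense index, or -1 when absent
var freeIds = new Stack<int>();

int Add(string value)
{
    int id;
    if (!freeIds.TryPop(out id)) { id = sparse.Count; sparse.Add(-1); }
    sparse[id] = dense.Count;
    dense.Add(value);
    denseToId.Add(id);
    return id;
}

bool Contains(int id) => id >= 0 && id < sparse.Count && sparse[id] != -1;

// O(1) removal: move the last dense element into the hole and patch its entry.
bool Remove(int id)
{
    if (!Contains(id)) return false;
    int hole = sparse[id], last = dense.Count - 1;
    dense[hole] = dense[last];
    denseToId[hole] = denseToId[last];
    sparse[denseToId[hole]] = hole;
    dense.RemoveAt(last);
    denseToId.RemoveAt(last);
    sparse[id] = -1;
    freeIds.Push(id);
    return true;
}
```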
## String and text types

| Data structure | Description |
|---|---|
| `FixedString32` | A 32-byte UTF-16 string (16 characters max). |
| `FixedString64` | A 64-byte UTF-16 string (32 characters max). |
| `FixedString128` | A 128-byte UTF-16 string (64 characters max). |
| `FixedString256` | A 256-byte UTF-16 string (128 characters max). |
| `FixedString512` | A 512-byte UTF-16 string (256 characters max). |
| `FixedString1024` | A 1024-byte UTF-16 string (512 characters max). |
| `FixedString2048` | A 2048-byte UTF-16 string (1024 characters max). |
| `FixedString4096` | A 4096-byte UTF-16 string (2048 characters max). |
| `FixedText32` | A 32-byte UTF-8 encoded string (30 bytes max). |
| `FixedText64` | A 64-byte UTF-8 encoded string (62 bytes max). |
| `FixedText128` | A 128-byte UTF-8 encoded string (126 bytes max). |
| `FixedText256` | A 256-byte UTF-8 encoded string (254 bytes max). |
| `FixedText512` | A 512-byte UTF-8 encoded string (510 bytes max). |
| `FixedText1024` | A 1024-byte UTF-8 encoded string (1022 bytes max). |
| `FixedText2048` | A 2048-byte UTF-8 encoded string (2046 bytes max). |
| `FixedText4096` | A 4096-byte UTF-8 encoded string (4094 bytes max). |

All fixed string and text types are stack-only. Every copy duplicates the underlying data.

## Parallel types

| Data structure | Description |
|---|---|
| `UnsafeParallelQueue<T>` | A dynamically resizing, lock-free queue. Provides `ParallelProducer` and `ParallelConsumer` views for safe concurrent access. Uses a spin lock only during chunk allocation. |
| `UnsafeParallelHashMap<TKey, TValue>` | A parallel hash map. Provides a `ParallelWriter` for concurrent insertions from multiple threads. Does not resize concurrently — pre-allocate enough capacity. |

## Bit structures

| Data structure | Description |
|---|---|
| `UnsafeBitSet` | An arbitrary-sized array of bits with set, test, clear, and search operations. |

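Bit searches over an arbitrary-sized bit array are typically done one 64-bit word at a time with a trailing-zero count. The sketch below is a hypothetical illustration built on `System.Numerics.BitOperations`. It is not `UnsafeBitSet`'s actual API.

```csharp
using System.Numerics;

// Hypothetical bit set over 64-bit words.
var words = new ulong[4]; // 256 bits

void Set(int bit) => words[bit >> 6] |= 1UL << (bit & 63);
void Clear(int bit) => words[bit >> 6] &= ~(1UL << (bit & 63));
bool Test(int bit) => (words[bit >> 6] & (1UL << (bit & 63))) != 0;

// Finds the first set bit at or after `from`, skipping 64 bits per empty word.
int FindNextSet(int from)
{
    for (int w = from >> 6; w < words.Length; w++)
    {
        ulong word = words[w];
        if (w == from >> 6) word &= ~0UL << (from & 63); // ignore bits before `from`
        if (word != 0) return (w << 6) + BitOperations.TrailingZeroCount(word);
    }
    return -1;
}
```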
## Utility types

| Type | Description |
|---|---|
| `ReadOnlyUnsafeCollection<T>` | A read-only view over a pointer and count. Implicitly converts to `ReadOnlySpan<T>`. Useful for passing collection data to APIs that expect spans. |
| `DisposablePtr<T>` | A pointer wrapper that calls `Dispose` on the pointed-to value when disposed. Used by allocate-on-heap factory methods like `UnsafeParallelQueue<T>.Allocate()`. |

## Additional resources

- [Introduction](introduction.md) — install, first steps, and safety checks
- [Architecture overview](architecture-overview.md) — layering, AllocationHandle, and struct semantics
- [Allocators](allocators.md) — built-in allocators, MemoryPool, and custom allocators
@@ -0,0 +1,66 @@
# Introduction

The low-level library provides unsafe collections, allocators, and memory-management primitives for high-performance C#. It gives you explicit control over allocation, layout, and ownership so you can build systems that run without GC interference.

## Why a dedicated low-level library?

Standard .NET memory management wasn't designed for allocation-heavy game and simulation workloads:

- `NativeMemory.Alloc` and `Marshal.AllocHGlobal` provide raw allocation but no collection types, no lifetime tracking, and no safety checks.
- The BCL collections (`List<T>`, `Dictionary<K,V>`) allocate on the managed heap, producing GC pressure in tight loops.
- `Span<T>` and `Memory<T>` avoid allocations but don't own their memory and can't manage lifetimes across asynchronous boundaries.

This library solves these problems with pluggable allocators, unsafe collections that wrap raw pointers, and a safety check system that can be compiled away in release builds.

## Feature highlights

| Feature | Description |
|---|---|
| Pluggable allocators | Every allocation passes through an `AllocationHandle` — choose the right allocator per use case |
| Built-in allocators | `Temp`, `FreeList`, `Persistent`, `TLSF` — or use `MemoryPool` with heap-based `Arena` and `Stack` |
| Unsafe collections | Arrays, lists, queues, stacks, hash maps, hash sets, sparse sets, slot maps, chunked lists |
| Parallel-aware types | Lock-free queue and concurrent hash map with parallel reader/writer views |
| Fixed-size text | Stack-only `FixedString` (UTF-16) and `FixedText` (UTF-8) for zero-allocation string operations |
| Compile-time safety | `MHP_ENABLE_SAFETY_CHECKS` enables bounds checking, use-after-free detection, and leak tracking — compiled away in release |
| Custom allocators | Implement your own allocator by populating an `AllocationHandle` with function pointers |
| `MemoryPool<TAllocator, TOpts>` | Scope allocators to a method or algorithm without touching the global state |
| Struct semantics | All collections are structs with no managed handles — no GC overhead, pass by `ref` for mutation |

## Basic usage

```csharp
using Misaki.HighPerformance.LowLevel.Buffer;

AllocationManager.Initialize();

var array = new UnsafeArray<int>(10, AllocationHandle.Persistent);

array[0] = 42;
Console.WriteLine(array[0]); // Output: 42

array.Dispose();
AllocationManager.Dispose();
```

## Who this is for

- Custom game engine developers who need allocation control without GC pauses
- Systems programmers building runtime components, job schedulers, or custom allocators
- .NET developers who have hit performance limits with managed collections in hot paths

## Requirements

- .NET 10.0 or later
- `unsafe` code enabled (`<AllowUnsafeBlocks>true</AllowUnsafeBlocks>`)

## Install

```bash
dotnet add package Misaki.HighPerformance.LowLevel
```

## Additional resources

- [Architecture overview](architecture-overview.md) — understand the allocation model and design philosophy
- [Allocators](allocators.md) — learn about each built-in allocator and how to create custom ones
- [Collection types](collection-types.md) — explore all available data structures
@@ -0,0 +1,8 @@
- name: Introduction
  href: introduction.md
- name: Architecture overview
  href: architecture-overview.md
- name: Allocators
  href: allocators.md
- name: Collection types
  href: collection-types.md
7
docs/documents/articles/toc.yml
Normal file
@@ -0,0 +1,7 @@
- name: Misaki.HighPerformance.Jobs
  href: Misaki.HighPerformance.Jobs/toc.yml
  topicHref: Misaki.HighPerformance.Jobs/introduction.md

- name: Misaki.HighPerformance.LowLevel
  href: Misaki.HighPerformance.LowLevel/toc.yml
  topicHref: Misaki.HighPerformance.LowLevel/introduction.md