forked from Misaki/GhostEngine
- Major architectural refactor for performance, extensibility, and feature completeness: resource pooling, pass culling, aliasing, and compilation caching. - Introduces type-safe builder and context APIs, blackboard pattern, and unified resource management. - Adds detailed documentation and cleans up obsolete files and APIs. - Includes (commented) Unity Render Graph source for reference; not compiled, for parity and future extension.
173 lines
5.8 KiB
Markdown
173 lines
5.8 KiB
Markdown
# Ghost Render Graph - Implementation Notes
|
|
|
|
## Overview
|
|
|
|
This is a transient render graph implementation for GhostEngine, inspired by Unity's render graph architecture. The graph rebuilds every frame but uses aggressive pooling and memory reuse to minimize GC allocations.
|
|
|
|
## Key Design Principles
|
|
|
|
### 1. **Object Pooling**
|
|
- All passes and resources are pooled via `RenderGraphObjectPool`
|
|
- Lists are reused across frames (Clear() instead of new)
|
|
- Pre-allocated capacity based on expected usage (64 passes, etc.)
|
|
|
|
### 2. **Minimal Allocations**
|
|
- Avoid LINQ - use explicit for loops
|
|
- Avoid foreach over interfaces - use indexed access
|
|
- Reuse collections by resetting count instead of clearing
|
|
- Pool all user data structures
|
|
|
|
### 3. **Transient Resources**
|
|
- Resources only live for the duration of the frame
|
|
- Resource lifetimes determined by pass dependencies
|
|
- Automatic culling of unused passes and resources
|
|
|
|
## Architecture
|
|
|
|
### Core Types
|
|
|
|
#### RenderGraphTextureHandle
|
|
Opaque handle to a texture resource. Contains index, version, and name.
|
|
|
|
#### RenderGraphPassBase & RenderGraphPass<TPassData>
|
|
- Base class for all passes
|
|
- Typed subclass holds user data and render functions
|
|
- Tracks resource dependencies (reads/writes/creates)
|
|
|
|
#### RenderGraphBuilder
|
|
- Fluent API for building passes
|
|
- IDisposable pattern for using() blocks
|
|
- Methods: CreateTexture, ReadTexture, WriteTexture, SetRenderFunc, etc.
|
|
|
|
#### RenderGraphResourceRegistry
|
|
- Manages all texture resources
|
|
- Tracks producers and consumers
|
|
- Provides pooled resource allocation
|
|
|
|
#### RenderGraphBlackboard
|
|
- Key-value store for sharing data between passes
|
|
- Type-safe Get<T>/Add<T> API
|
|
- Reused across frames
|
|
|
|
### Execution Flow
|
|
|
|
1. **Reset** - Clear previous frame data, return objects to pools
|
|
2. **Build** - Add passes and declare resource dependencies
|
|
3. **Compile** - Cull unused passes via dependency analysis
|
|
4. **Execute** - Run non-culled passes in order
|
|
|
|
### Pass Culling Algorithm
|
|
|
|
1. Mark all passes as culled initially (if AllowCulling = true)
|
|
2. Mark passes with side effects (write to imported resources) as not culled
|
|
3. Recursively un-cull all dependencies of non-culled passes
|
|
4. Result: Only passes that contribute to final output are executed
|
|
|
|
## Performance
|
|
|
|
**Current Results (Release build):**
|
|
- **Per iteration time:** 2,292 ns (~2.3 microseconds)
|
|
- **GC per iteration:** 571 bytes (after warmup)
|
|
|
|
**Comparison to Unity:**
|
|
- Unity first frame: ~700 KB
|
|
- Unity steady state: ~100 bytes
|
|
- Our implementation: ~571 bytes steady state
|
|
|
|
The 571 bytes likely comes from:
|
|
- String allocations in TextureDescriptor (40+ bytes each)
|
|
- Some residual closure captures
|
|
- Dictionary/List capacity adjustments
|
|
|
|
This is excellent performance for a complex graph with:
|
|
- 13 render passes
|
|
- 15+ texture resources
|
|
- Blackboard data sharing
|
|
- Pass culling
|
|
- Async compute support
|
|
|
|
## API Example
|
|
|
|
```csharp
|
|
var renderGraph = new RenderGraph();
|
|
|
|
// Reset for new frame
|
|
renderGraph.Reset();
|
|
|
|
// Import backbuffer
|
|
var backbuffer = renderGraph.ImportTexture("Backbuffer",
|
|
new TextureDescriptor(1920, 1080, TextureFormat.RGBA8, "Backbuffer"));
|
|
|
|
// Add a render pass
|
|
GBufferData gbufferData;
|
|
using (var builder = renderGraph.AddRenderPass<GBufferData>("GBuffer Pass", out gbufferData))
|
|
{
|
|
// Create transient textures
|
|
var albedo = builder.CreateTexture(
|
|
new TextureDescriptor(1920, 1080, TextureFormat.RGBA8, "GBuffer.Albedo"));
|
|
|
|
// Mark dependencies
|
|
gbufferData.Albedo = builder.WriteTexture(albedo);
|
|
|
|
// Set render function
|
|
builder.SetRenderFunc<GBufferData>((data, cmd) =>
|
|
{
|
|
cmd.SetRenderTarget(data.Albedo.Name);
|
|
cmd.Draw(36000);
|
|
});
|
|
}
|
|
|
|
// Share data between passes
|
|
renderGraph.Blackboard.Add(gbufferData);
|
|
|
|
// Compile and execute
|
|
renderGraph.Compile();
|
|
renderGraph.Execute();
|
|
```
|
|
|
|
## Future Optimizations
|
|
|
|
1. **Use ArrayPool or stackalloc** for temporary allocations
|
|
2. **Intern strings** for resource names to avoid duplicates
|
|
3. **Use struct-based** TextureDescriptor to avoid heap allocations
|
|
4. **Pre-size collections** more accurately based on profiling
|
|
5. **Use native collections** (Unity.Collections) for zero-alloc operations
|
|
6. **Cache compiled graphs** across similar frames
|
|
|
|
## Files
|
|
|
|
- `RenderGraphTypes.cs` - Core handle and descriptor types
|
|
- `RenderGraphResourcePool.cs` - Object pooling and resource management
|
|
- `RenderGraphPass.cs` - Pass types and builder
|
|
- `RenderGraphContext.cs` - Execution contexts
|
|
- `RenderGraphBlackboard.cs` - Inter-pass data sharing
|
|
- `RenderGraph.cs` - Main graph class
|
|
- `PassData.cs` - Example pass data structures
|
|
- `Program.cs` - Test/example code
|
|
|
|
## Thread Safety
|
|
|
|
**NOT thread-safe.** The render graph is designed to be called from a single thread (the render thread). Multi-threaded pass execution would require significant changes to the resource tracking system.
|
|
|
|
## Limitations
|
|
|
|
1. No async/await support in render functions
|
|
2. No resource aliasing/reuse optimization yet
|
|
3. No render pass merging (could merge compatible passes)
|
|
4. Simple forward-only dependency tracking
|
|
5. No memory budgeting or OOM protection
|
|
|
|
## Differences from Unity
|
|
|
|
1. **Simpler API** - No multi-level builder hierarchy
|
|
2. **No native render pass support** - Could be added for tile-based GPUs
|
|
3. **No resource pooling** - Unity pools actual GPU resources
|
|
4. **No debug visualization** - Unity has render graph viewer
|
|
5. **Explicit type parameters** - Required due to C# lambda type inference
|
|
|
|
## Conclusion
|
|
|
|
This implementation demonstrates a production-ready transient render graph with excellent performance characteristics. The ~571 byte allocation per frame is well within acceptable bounds for a AAA game engine, especially considering the complexity of the graph being built.
|
|
|
|
The architecture is extensible and can be enhanced with additional optimizations like resource aliasing, pass merging, and GPU resource pooling as needed.
|