Metal by Tutorials

Third Edition · macOS 12 · iOS 15 · Swift 5.5 · Xcode 13

26. GPU-Driven Rendering
Written by Marius Horga & Caroline Begbie

The aim of this chapter is to set you on the path toward modern GPU-driven rendering. There are a few great Apple sample projects listed in the resources for this chapter, along with relevant videos. However, the samples can be quite intimidating. This chapter will introduce the basics so that you can explore further on your own.

The GPU requires a lot of information to be able to render a model. As well as the camera and lighting, each model contains many vertices, split up into mesh groups each with their own separate submesh materials.

A house model with submeshes expanded

The scene you’ll render, in contrast, contains only two static models, each with one mesh and one submesh. Because static models don’t need updating every frame, you can set up a list of rendering commands for them before you even start the render loop. Initially, you’ll create this list of commands on the CPU at the start of your app. Later, you’ll call a GPU kernel function that creates the list during the render loop, giving you a fully GPU-driven pipeline.
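
To make the idea of a pre-built command list concrete, here is a minimal sketch in plain Swift with no Metal calls. DrawCommand, buildCommandList and renderFrame are hypothetical names invented for this illustration, and the model names and index counts are made up. The real project records its commands into an MTLIndirectCommandBuffer, which comes later in the chapter.

```swift
// A simplified stand-in for one recorded draw call. Hypothetical type:
// the real commands live inside an MTLIndirectCommandBuffer.
struct DrawCommand: Equatable {
  let modelName: String
  let indexCount: Int
}

// Runs once, before the render loop starts. Static models never change,
// so there is nothing to rebuild later.
func buildCommandList(
  for models: [(name: String, indexCount: Int)]
) -> [DrawCommand] {
  models.map { DrawCommand(modelName: $0.name, indexCount: $0.indexCount) }
}

// Runs every frame. It replays the recorded list instead of encoding
// each draw call again, and returns the number of indices submitted.
func renderFrame(executing commands: [DrawCommand]) -> Int {
  commands.reduce(0) { $0 + $1.indexCount }
}

// Made-up models and index counts, for illustration only.
let commandList = buildCommandList(for: [
  (name: "house", indexCount: 36),
  (name: "ground", indexCount: 6)
])

// Three frames reuse the same list without touching it.
let indicesPerFrame = (0..<3).map { _ in renderFrame(executing: commandList) }
```

The point of the sketch is the split in cost: the list is built once, and each frame only replays it.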

With this simple project, you may not see the immediate gains. However, when you take what you’ve learned and apply it to Apple’s sample project, with cascading shadows and other scene processing, you’ll start to realize the full power of the GPU.

You’ll need recent hardware to run the code in this chapter. Techniques involved include:

  • Non-uniform threadgroups: Supported on Apple GPU Family 4 and later (A11).
  • Indirect command buffers: Supported on iOS devices with an Apple A9 or later, iMac models from 2015, and MacBook and MacBook Pro models from 2016.
  • Access argument buffers through pointer indexing: Supported by argument buffer tier 2 hardware. This includes Apple GPU Family 6 and up (A13 and Apple silicon). The app doesn’t work on my 2019 Intel MacBook Pro, but it currently does on my 2018 A12X iPad Pro, so you may find that it works for you too.
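
If you’d rather ask the device than rely on model lists, the following sketch shows one way to query these features at run time. It needs an Apple platform with Metal available. The function name missingFeatures(on:) is hypothetical, and treating Mac family 2 GPUs as equivalent is an assumption that goes beyond the list above.

```swift
import Metal

// Lists the techniques from this chapter that `device` is missing.
// An empty result means all three checks passed.
func missingFeatures(on device: MTLDevice) -> [String] {
  var missing: [String] = []

  // Assumption: Mac family 2 GPUs are treated as equivalent here.
  // The list above only names Apple GPU families and device models.
  let isMac2 = device.supportsFamily(.mac2)

  // Non-uniform threadgroups: Apple GPU family 4 (A11) and later.
  if !(device.supportsFamily(.apple4) || isMac2) {
    missing.append("non-uniform threadgroups")
  }
  // Indirect command buffers: the A9 belongs to Apple GPU family 3.
  if !(device.supportsFamily(.apple3) || isMac2) {
    missing.append("indirect command buffers")
  }
  // Pointer indexing into argument buffers needs tier 2 support.
  if device.argumentBuffersSupport != .tier2 {
    missing.append("tier 2 argument buffers")
  }
  return missing
}

if let device = MTLCreateSystemDefaultDevice() {
  let missing = missingFeatures(on: device)
  print(missing.isEmpty ? "All features available" : "Missing: \(missing)")
}
```

Querying argumentBuffersSupport directly can give a different answer from a family table, which may explain why the app runs on some devices you wouldn’t expect.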

The Starter Project

➤ In Xcode, open the starter project, and build and run the app.

The starter app

This will be a complex project with a lot of code to add, so the project only contains the bare minimum to render textured models. All shadows, transparency and lighting have been removed.

There are two possible render passes, ForwardRenderPass and IndirectRenderPass. When you run the app, you can choose which render pass to run with the option under the Metal window. Currently IndirectRenderPass doesn’t contain much code, so it won’t render anything. IndirectRenderPass.swift is where you’ll add most of the CPU code in this chapter. You’ll change the GPU shader functions in Shaders/Indirect.metal.

➤ Open ForwardRenderPass.swift, and examine draw(commandBuffer:scene:uniforms:params:).

Instead of rendering the model in Model, the rendering code is all here. You can see each render encoder command listed in this one method. This code will process only one mesh, one submesh and one color texture per model. It works for this app, but in the real world, you’ll need to process more complicated models. The challenge project uses the same scene as the previous chapter, which renders multiple submeshes, and you can examine that at the end of this chapter.

Indirect Command Buffers

In the previous chapter, you created argument buffers for your textures. These argument buffers point to textures in a texture heap.

Your render loop

Indirect rendering

1. Initializing the Uniform Buffers

➤ In the Render Passes group, open IndirectRenderPass.swift.

var uniformsBuffer: MTLBuffer!
var modelParamsBuffer: MTLBuffer!
typedef struct {
  matrix_float4x4 modelMatrix;
  matrix_float3x3 normalMatrix;
  uint tiling;
} ModelParams;
mutating func initializeUniforms(_ models: [Model]) {
  let bufferLength = MemoryLayout<Uniforms>.stride
  uniformsBuffer =
    Renderer.device.makeBuffer(length: bufferLength, options: [])
  uniformsBuffer.label = "Uniforms"

  var modelParams: [ModelParams] = models.map { model in
    var modelParams = ModelParams()
    modelParams.modelMatrix = model.transform.modelMatrix
    modelParams.normalMatrix = modelParams.modelMatrix.upperLeft
    modelParams.tiling = model.tiling
    return modelParams
  }
  modelParamsBuffer = Renderer.device.makeBuffer(
    bytes: &modelParams,
    length: MemoryLayout<ModelParams>.stride * models.count,
    options: [])
  modelParamsBuffer.label = "Model Transforms Array"
}
func updateUniforms(scene: GameScene, uniforms: Uniforms) {
  var uniforms = uniforms
  uniformsBuffer.contents().copyMemory(
    from: &uniforms,
    byteCount: MemoryLayout<Uniforms>.stride)
}
updateUniforms(scene: scene, uniforms: uniforms)
mutating func initialize(models: [Model]) {
  initializeUniforms(models)
}
indirectRenderPass.initialize(models: scene.models)

2. Setting up an Indirect Command Buffer

You’re now ready to create some indirect commands.

var icb: MTLIndirectCommandBuffer!
mutating func initializeICBCommands(_ models: [Model]) {
  let icbDescriptor = MTLIndirectCommandBufferDescriptor()
  icbDescriptor.commandTypes = [.drawIndexed]
  icbDescriptor.inheritBuffers = false
  icbDescriptor.maxVertexBufferBindCount = 25
  icbDescriptor.maxFragmentBufferBindCount = 25
  icbDescriptor.inheritPipelineState = true
}
guard let icb = Renderer.device.makeIndirectCommandBuffer(
  descriptor: icbDescriptor,
  maxCommandCount: models.count,
  options: []) else { fatalError("Failed to create ICB") }
self.icb = icb

3. Setting up the Indirect Commands

Now that you’ve set up an indirect command buffer, you’ll add the list of commands to it.

for (modelIndex, model) in models.enumerated() {
  let mesh = model.meshes[0]
  let submesh = mesh.submeshes[0]
  let icbCommand = icb.indirectRenderCommandAt(modelIndex)
  icbCommand.setVertexBuffer(
    uniformsBuffer, offset: 0, at: UniformsBuffer.index)
  icbCommand.setVertexBuffer(
    modelParamsBuffer, offset: 0, at: ModelParamsBuffer.index)
  icbCommand.setFragmentBuffer(
    modelParamsBuffer, offset: 0, at: ModelParamsBuffer.index)
  icbCommand.setVertexBuffer(
    mesh.vertexBuffers[VertexBuffer.index],
    offset: 0,
    at: VertexBuffer.index)
  icbCommand.setVertexBuffer(
    mesh.vertexBuffers[UVBuffer.index],
    offset: 0,
    at: UVBuffer.index)
  icbCommand.setFragmentBuffer(
    submesh.argumentBuffer!, offset: 0, at: MaterialBuffer.index)
}
Vertex buffer layouts

icbCommand.drawIndexedPrimitives(
  .triangle,
  indexCount: submesh.indexCount,
  indexType: submesh.indexType,
  indexBuffer: submesh.indexBuffer,
  indexBufferOffset: submesh.indexBufferOffset,
  instanceCount: 1,
  baseVertex: 0,
  baseInstance: modelIndex)
initializeICBCommands(models)

4. Updating the Render Loop

Currently none of your resources are making their way to the GPU.

func useResources(
  encoder: MTLRenderCommandEncoder, models: [Model]
) {
  encoder.pushDebugGroup("Using resources")
  encoder.useResource(uniformsBuffer, usage: .read)
  encoder.useResource(modelParamsBuffer, usage: .read)
  if let heap = TextureController.heap {
    encoder.useHeap(heap)
  }
  for model in models {
    let mesh = model.meshes[0]
    let submesh = mesh.submeshes[0]
    encoder.useResource(
      mesh.vertexBuffers[VertexBuffer.index], usage: .read)
    encoder.useResource(
      mesh.vertexBuffers[UVBuffer.index], usage: .read)
    encoder.useResource(
      submesh.indexBuffer, usage: .read)
    encoder.useResource(
      submesh.argumentBuffer!, usage: .read)
  }
  encoder.popDebugGroup()
}
useResources(encoder: renderEncoder, models: scene.models)
Loaded indirect resources

5. Updating the Shader Functions

➤ In the Shaders group, open Indirect.metal.

constant ModelParams *modelParams [[buffer(ModelParamsBuffer)]],
uint modelIndex [[base_instance]]
ModelParams model = modelParams[modelIndex];
model.modelMatrix
uint modelIndex [[flat]];
.modelIndex = modelIndex
constant ModelParams *modelParams [[buffer(ModelParamsBuffer)]]
ModelParams model = modelParams[in.modelIndex];
model.tiling
constant Params &params [[buffer(ParamsBuffer)]],

6. Execute the Command List

All the code you have written in this chapter so far has been building up to one command.

renderEncoder.executeCommandsInBuffer(
  icb, range: 0..<scene.models.count)
The indirect command buffer inherits pipelines ( inheritPipelineState = YES) but the render pipeline set on this encoder does not support indirect command buffers ( supportIndirectCommandBuffers = NO )
pipelineDescriptor.supportIndirectCommandBuffers = true
Indirect encoding

Execute indirect commands

The indirect command list

GPU-Driven Rendering

You’ve achieved indirect CPU rendering by setting up a command list and rendering it. However, you can go one better and get the GPU to create this command list.

GPU command creation

Creating commands per thread

1. Creating the Kernel Function

You’ll start by creating the kernel function compute shader so that you can see what data you have to pass. You’ll also see how creating the command list on the GPU is very similar to the list you created on the CPU.

#import "Common.h"

struct ICBContainer {
  command_buffer icb [[id(0)]];
};

struct Model {
  constant float *vertexBuffer;
  constant float *uvBuffer;
  constant uint *indexBuffer;
  constant float *materialBuffer;
};
kernel void encodeCommands(
  // 1
  uint modelIndex [[thread_position_in_grid]],
  // 2
  device ICBContainer *icbContainer [[buffer(ICBBuffer)]],
  constant Uniforms &uniforms [[buffer(UniformsBuffer)]],
  // 3
  constant Model *models [[buffer(ModelsBuffer)]],
  constant ModelParams *modelParams [[buffer(ModelParamsBuffer)]],
  constant MTLDrawIndexedPrimitivesIndirectArguments
    *drawArgumentsBuffer [[buffer(DrawArgumentsBuffer)]])
{
}
// 1
Model model = models[modelIndex];
MTLDrawIndexedPrimitivesIndirectArguments drawArguments
  = drawArgumentsBuffer[modelIndex];
// 2
render_command cmd(icbContainer->icb, modelIndex);
// 3
cmd.set_vertex_buffer  (&uniforms,       UniformsBuffer);
cmd.set_vertex_buffer  (model.vertexBuffer,   VertexBuffer);
cmd.set_vertex_buffer  (model.uvBuffer,  UVBuffer);
cmd.set_vertex_buffer  (modelParams,     ModelParamsBuffer);
cmd.set_fragment_buffer(modelParams,     ModelParamsBuffer);
cmd.set_fragment_buffer(model.materialBuffer, MaterialBuffer);
cmd.draw_indexed_primitives(
  primitive_type::triangle,
  drawArguments.indexCount,
  model.indexBuffer + drawArguments.indexStart,
  drawArguments.instanceCount,
  drawArguments.baseVertex,
  drawArguments.baseInstance);

2. The Compute Pipeline State

➤ Open IndirectRenderPass.swift, and create these new properties in IndirectRenderPass:

let icbPipelineState: MTLComputePipelineState
let icbComputeFunction: MTLFunction
icbComputeFunction =
  Renderer.library.makeFunction(name: "encodeCommands")!
icbPipelineState = PipelineStates.createComputePSO(
  function: "encodeCommands")

3. Setting Up the Argument Buffers

The encodeCommands kernel function requires two structures as input: one for the ICB, and one for the model.

var icbBuffer: MTLBuffer!
var modelsBuffer: MTLBuffer!
let icbEncoder = icbComputeFunction.makeArgumentEncoder(
  bufferIndex: ICBBuffer.index)
icbBuffer = Renderer.device.makeBuffer(
  length: icbEncoder.encodedLength,
  options: [])
icbEncoder.setArgumentBuffer(icbBuffer, offset: 0)
icbEncoder.setIndirectCommandBuffer(icb, index: 0)
mutating func initializeModels(_ models: [Model]) {
  // 1
  let encoder = icbComputeFunction.makeArgumentEncoder(
    bufferIndex: ModelsBuffer.index)
  // 2
  modelsBuffer = Renderer.device.makeBuffer(
    length: encoder.encodedLength * models.count, options: [])
  // 3
  for (index, model) in models.enumerated() {
    let mesh = model.meshes[0]
    let submesh = mesh.submeshes[0]
    encoder.setArgumentBuffer(
      modelsBuffer, startOffset: 0, arrayElement: index)
    encoder.setBuffer(
      mesh.vertexBuffers[VertexBuffer.index], offset: 0, index: 0)
    encoder.setBuffer(
      mesh.vertexBuffers[UVBuffer.index],
      offset: 0,
      index: 1)
    encoder.setBuffer(
      submesh.indexBuffer,
      offset: submesh.indexBufferOffset,
      index: 2)
    encoder.setBuffer(submesh.argumentBuffer!, offset: 0, index: 3)
  }
}
initializeModels(models)

4. Setting Up the Draw Arguments

The encodeCommands kernel function takes in an array of draw arguments that it uses for each draw call. You’ll now set these up into a buffer.

var drawArgumentsBuffer: MTLBuffer!
mutating func initializeDrawArguments(models: [Model]) {
  let drawLength = models.count *
    MemoryLayout<MTLDrawIndexedPrimitivesIndirectArguments>.stride
  drawArgumentsBuffer = Renderer.device.makeBuffer(
    length: drawLength, options: [])
  drawArgumentsBuffer.label = "Draw Arguments"
  var drawPointer =
    drawArgumentsBuffer.contents().bindMemory(
      to: MTLDrawIndexedPrimitivesIndirectArguments.self,
      capacity: models.count)
}
for (modelIndex, model) in models.enumerated() {
  let mesh = model.meshes[0]
  let submesh = mesh.submeshes[0]
  var drawArgument = MTLDrawIndexedPrimitivesIndirectArguments()
  drawArgument.indexCount = UInt32(submesh.indexCount)
  drawArgument.indexStart = UInt32(submesh.indexBufferOffset)
  drawArgument.instanceCount = 1
  drawArgument.baseVertex = 0
  drawArgument.baseInstance = UInt32(modelIndex)
  drawPointer.pointee = drawArgument
  drawPointer = drawPointer.advanced(by: 1)
}
initializeDrawArguments(models: models)
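
To see what ends up in that buffer, here is a sketch in plain Swift that packs the same records into raw memory without importing Metal. DrawArguments is a hypothetical mirror of MTLDrawIndexedPrimitivesIndirectArguments with the same five 32-bit fields, packDrawArguments is an invented helper, and the index counts are made up.

```swift
// Hypothetical mirror of MTLDrawIndexedPrimitivesIndirectArguments so the
// example runs without importing Metal.
struct DrawArguments {
  var indexCount: UInt32 = 0
  var instanceCount: UInt32 = 0
  var indexStart: UInt32 = 0
  var baseVertex: Int32 = 0
  var baseInstance: UInt32 = 0
}

// Writes one record per model into a raw allocation, then copies the
// records out so the caller can inspect them.
func packDrawArguments(indexCounts: [UInt32]) -> [DrawArguments] {
  let count = indexCounts.count
  // Stand-in for MTLBuffer.contents(): one allocation for all records.
  let raw = UnsafeMutableRawPointer.allocate(
    byteCount: MemoryLayout<DrawArguments>.stride * count,
    alignment: MemoryLayout<DrawArguments>.alignment)
  defer { raw.deallocate() }

  let pointer = raw.bindMemory(to: DrawArguments.self, capacity: count)
  for (modelIndex, indexCount) in indexCounts.enumerated() {
    var arguments = DrawArguments()
    arguments.indexCount = indexCount
    arguments.instanceCount = 1
    // The vertex function reads this back through [[base_instance]]
    // to find the matching entry in the model parameters array.
    arguments.baseInstance = UInt32(modelIndex)
    (pointer + modelIndex).initialize(to: arguments)
  }
  return Array(UnsafeBufferPointer(start: pointer, count: count))
}

let packed = packDrawArguments(indexCounts: [36, 6])  // made-up counts
let recordStride = MemoryLayout<DrawArguments>.stride
```

Each record occupies 20 bytes, so the buffer for two models is 40 bytes long. Indexing the pointer by modelIndex has the same effect as advancing it by one after each write.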

5. Completing the Compute Command Encoder

You’ve done all the preamble and setup code. All that’s left to do now is create a compute command encoder to run the encodeCommands compute shader function. The function will create a render command to render every model.

guard
  let computeEncoder = commandBuffer.makeComputeCommandEncoder()
  else { return }
encodeDraw(encoder: computeEncoder)
useResources(encoder: computeEncoder, models: scene.models)
dispatchThreads(
  encoder: computeEncoder, drawCount: scene.models.count)
computeEncoder.endEncoding()
func encodeDraw(encoder: MTLComputeCommandEncoder) {
  encoder.setComputePipelineState(icbPipelineState)
  encoder.setBuffer(
    icbBuffer, offset: 0, index: ICBBuffer.index)
  encoder.setBuffer(
    uniformsBuffer, offset: 0, index: UniformsBuffer.index)
  encoder.setBuffer(
    modelsBuffer, offset: 0, index: ModelsBuffer.index)
  encoder.setBuffer(
    modelParamsBuffer, offset: 0, index: ModelParamsBuffer.index)
  encoder.setBuffer(
    drawArgumentsBuffer, offset: 0, index: DrawArgumentsBuffer.index)
}
  func useResources(
    encoder: MTLComputeCommandEncoder, models: [Model]
  ) {
encoder.useResource(icb, usage: .write)
useResources(encoder: renderEncoder, models: scene.models)
func dispatchThreads(
  encoder: MTLComputeCommandEncoder,
  drawCount: Int
) {
  let threadExecutionWidth = icbPipelineState.threadExecutionWidth
  let threads = MTLSize(width: drawCount, height: 1, depth: 1)
  let threadsPerThreadgroup = MTLSize(
    width: threadExecutionWidth, height: 1, depth: 1)
  encoder.dispatchThreads(
    threads,
    threadsPerThreadgroup: threadsPerThreadgroup)
}
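
The dispatchThreads call above depends on non-uniform threadgroups, which is why the draw count doesn’t have to be a multiple of the threadgroup width. As a point of comparison, here is a small sketch in plain Swift of the arithmetic you’d need if you dispatched whole threadgroups instead, for example with dispatchThreadgroups(_:threadsPerThreadgroup:). The helper names and the width of 32 are assumptions for illustration, and they aren’t part of the chapter’s project.

```swift
// Number of threadgroups needed to cover `threadCount` threads when each
// group holds `groupWidth` threads, rounded up.
func threadgroupCount(threadCount: Int, groupWidth: Int) -> Int {
  precondition(groupWidth > 0, "groupWidth must be positive")
  return (threadCount + groupWidth - 1) / groupWidth
}

// Threads launched beyond `threadCount`. A kernel dispatched in whole
// threadgroups must skip these.
func surplusThreads(threadCount: Int, groupWidth: Int) -> Int {
  let groups = threadgroupCount(
    threadCount: threadCount, groupWidth: groupWidth)
  let launched = groups * groupWidth
  return launched - threadCount
}

let drawCount = 2        // two models, as in this chapter's scene
let executionWidth = 32  // assumed value; query threadExecutionWidth at run time
let groups = threadgroupCount(
  threadCount: drawCount, groupWidth: executionWidth)
let surplus = surplusThreads(
  threadCount: drawCount, groupWidth: executionWidth)
```

With two models and a width of 32, one threadgroup launches 30 surplus threads, so a kernel dispatched this way would need to return early whenever its thread index is at or beyond the model count.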
The final render

The down-sized render pass

Challenge

In the challenge folder for this chapter, you’ll find an app similar to the one in the previous chapter that includes rendering multiple submeshes. Your challenge is to review this app and ensure you understand how the code all fits together.

Key Points

  • Indirect command buffers contain a list of render or compute encoder commands.
  • You can create the list of commands on the CPU at the start of your app. For simple static rendering work, this will be fine.
  • Argument buffers should match your shader function parameters. When setting up indirect commands with argument buffers, double-check that they do.
  • Argument buffers point to other resources. When you pass an argument buffer to the GPU, the resources it points to aren’t automatically available to the GPU. You must also make them resident with useResource. If you don’t, you’ll get unexpected rendering results.
  • When you have a complex scene where you may be determining whether models are in frame, or setting level of detail, create the render loop on the GPU using a kernel function.
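
As a rough illustration of the last point, the sketch below shows the kind of per-model decision a kernel thread could make before encoding a draw. It’s plain Swift rather than a Metal kernel, every name in it is hypothetical, and the test is deliberately crude: it only checks depth against the near and far planes, whereas a real frustum test checks all six planes.

```swift
// Hypothetical bounding volume for one model, in view space.
struct BoundingSphere {
  var centerZ: Float  // distance in front of the camera
  var radius: Float
}

// Crude depth-only visibility test: does the sphere overlap the range
// between the near and far planes?
func isInDepthRange(
  _ sphere: BoundingSphere, near: Float, far: Float
) -> Bool {
  let nearestPoint = sphere.centerZ - sphere.radius
  let farthestPoint = sphere.centerZ + sphere.radius
  return farthestPoint >= near && nearestPoint <= far
}

// One test per model index, mirroring one GPU thread per model.
// Only the returned indices would have a draw command encoded.
func visibleModelIndices(
  _ spheres: [BoundingSphere], near: Float, far: Float
) -> [Int] {
  spheres.indices.filter {
    isInDepthRange(spheres[$0], near: near, far: far)
  }
}

// Made-up bounds: one model in range, one too far, one behind the camera.
let spheres = [
  BoundingSphere(centerZ: 5, radius: 1),
  BoundingSphere(centerZ: 200, radius: 2),
  BoundingSphere(centerZ: -3, radius: 1)
]
let visible = visibleModelIndices(spheres, near: 0.1, far: 100)
```

Because each model is tested independently, the work maps naturally onto one thread per model, which is the same layout the encodeCommands kernel uses.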

Where to Go From Here?

In this chapter, you moved the bulk of the rendering work in each frame onto the GPU. The GPU is now responsible for creating render commands and determining which objects you actually render. Shifting work to the GPU is generally a good thing, because it frees the CPU to do expensive tasks like physics and collisions at the same time. Even so, you should follow up with performance analysis to see where the bottlenecks are. You can read more about this in Chapter 31, “Performance Optimization”.

Apple sample: Modern Rendering With Metal

© 2025 Kodeco Inc.
