Chapters

Hide chapters

Metal by Tutorials

Second Edition · iOS 13 · Swift 5.1 · Xcode 11

Before You Begin

Section 0: 3 chapters
Show chapters Hide chapters

Section I: The Player

Section 1: 8 chapters
Show chapters Hide chapters

Section III: The Effects

Section 3: 10 chapters
Show chapters Hide chapters

21. Metal Performance Shaders
Written by Marius Horga

Heads up... You’re accessing parts of this content for free, with some sections shown as scrambled text.

Heads up... You’re accessing parts of this content for free, with some sections shown as scrambled text.

Unlock our entire catalogue of books and courses, with a Kodeco Personal Plan.

Unlock now

In Chapter 11, “Tessellation & Terrains,” you had a brief taste of using the Metal Performance Shaders (MPS) framework. MPS consists of low-level, fine-tuned, high-performance kernels that run off the shelf with minimal configuration. In this chapter, you’ll dive a bit deeper into the world of MPS.

Overview

The MPS kernels make use of data-parallel primitives that are written in such a way that they can take advantage of each GPU family’s characteristics. The developer doesn’t have to care about which GPU the code needs to run on, because the MPS kernels have multiple versions of the same kernel written for every GPU you might use. Think of MPS kernels as convenient black boxes that work efficiently and seamlessly with your command buffer. Simply give it the desired effect, a source and destination resource (buffer or texture), and then encode GPU commands on the fly!

The Sobel filter is a great way to detect edges in an image. In the projects folder for this chapter, open and run sobel.playground and you’ll see such an effect (left: original image, right: Sobel filter applied):

Assuming you already created a device object, a command queue, a command buffer and a texture object for the input image, there are only two more lines of code you need to apply the Sobel filter to your input image:

let shader = MPSImageSobel(device: device)
shader.encode(commandBuffer: commandBuffer, 
              sourceTexture: inputImage,
              destinationTexture: drawable.texture)

MPS kernels are not thread-safe, so it’s not recommended to run the same kernel on multiple threads that are all writing to the same command buffer concurrently.

Moreover, you should always allocate your kernel to only one device, because the kernel’s init(device:) method could allocate resources that are held by the current device and might not be available to another device.

Note: MPS kernels provide a copy(with:device:) method that allows them to be copied to another device.

The MPS framework serves a variety of purposes beyond image filters. One of those areas is Neural Networks, which is not covered in this book. Instead, you’ll stay focused on:

  1. Image processing
  2. Matrix/vector mathematics
  3. Ray tracing

Image processing

There are a few dozen MPS image filters, among the most common being:

(6 * 1  +  7 * 2  +  3 * 1  + 
 4 * 2  +  9 * 4  +  8 * 2  + 
 9 * 1  +  2 * 2  +  3 * 1) / 16 = 6

(0 * 1  +  0 * 2  +  0 * 1  + 
 0 * 2  +  6 * 4  +  7 * 2  + 
 0 * 1  +  4 * 2  +  9 * 1) / 9 = 6

Bloom

The bloom effect is quite a spectacular one. It amplifies the brightness of objects in the scene and makes them look luminous as if they’re emitting light themselves.

The project

In the starter folder, open the Bloom project. Build and run, and you’ll see a familiar scene from previous chapters.

import MetalPerformanceShaders
var outputTexture: MTLTexture!
var finalTexture: MTLTexture!
outputTexture = 
    createTexture(pixelFormat: view.colorPixelFormat,
                  size: size)
finalTexture = 
    createTexture(pixelFormat: view.colorPixelFormat,
                  size: size)

Image Threshold To Zero

Using the following test, MPSImageThresholdToZero is a filter that returns either the original value for each pixel having a value greater than a specified brightness threshold or 0:

destinationColor = sourceColor > thresholdValue 
                     ? sourceColor : 0
let brightness =
    MPSImageThresholdToZero(device: Renderer.device,
                            thresholdValue: 0.2,
                            linearGrayColorTransform: nil)
brightness.label = "MPS brightness"
brightness.encode(commandBuffer: commandBuffer,
                  sourceTexture: drawable.texture,
                  destinationTexture: outputTexture)
metalView.framebufferOnly = false
finalTexture = outputTexture
guard let blitEncoder = commandBuffer.makeBlitCommandEncoder() 
      else { return }
let origin = MTLOriginMake(0, 0, 0)
let size = MTLSizeMake(drawable.texture.width, 
                       drawable.texture.height, 
					   1)
blitEncoder.copy(from: finalTexture, sourceSlice: 0, 
                 sourceLevel: 0,
                 sourceOrigin: origin, sourceSize: size,
                 to: drawable.texture, destinationSlice: 0,
                 destinationLevel: 0, destinationOrigin: origin)
blitEncoder.endEncoding()

Gaussian blur

MPSImageGaussianBlur is a filter that convolves an image with a Gaussian blur with a given sigma value (the amount of blur) in both the X and Y directions.

let blur = MPSImageGaussianBlur(device: Renderer.device,
                                sigma: 9.0)
blur.label = "MPS blur"
blur.encode(commandBuffer: commandBuffer,
            inPlaceTexture: &outputTexture,
            fallbackCopyAllocator: nil)

Image add

The final part of creating the bloom effect is to add the pixels of this blurred image to the pixels of the original render.

let add = MPSImageAdd(device: Renderer.device)
add.encode(commandBuffer: commandBuffer, 
           primaryTexture: drawable.texture, 
           secondaryTexture: outputTexture, 
           destinationTexture: finalTexture)
finalTexture = outputTexture

Matrix/vector mathematics

You learned in the previous section how you could quickly apply a series of MPS filters that are provided by the framework. But what if you wanted to make your own filters?

import MetalPerformanceShaders

guard let device = MTLCreateSystemDefaultDevice(),
      let commandQueue = device.makeCommandQueue() 
else { fatalError() }

let size = 4
let count = size * size

guard let commandBuffer = commandQueue.makeCommandBuffer() 
else { fatalError() }

commandBuffer.commit()
commandBuffer.waitUntilCompleted()
func createMPSMatrix(withRepeatingValue: Float) -> MPSMatrix {
  // 1
  let rowBytes = MPSMatrixDescriptor.rowBytes(
                                   forColumns: size,
                                   dataType: .float32)
  // 2
  let array = [Float](repeating: withRepeatingValue, 
                      count: count)
  // 3
  guard let buffer = device.makeBuffer(bytes: array,
                                       length: size * rowBytes,
                                       options: []) 
  else { fatalError() }
  // 4
  let matrixDescriptor = MPSMatrixDescriptor(
                                   rows: size,
                                   columns: size,
                                   rowBytes: rowBytes,
                                   dataType: .float32)
                                             
  return MPSMatrix(buffer: buffer, descriptor: matrixDescriptor)
}
let A = createMPSMatrix(withRepeatingValue: 3)
let B = createMPSMatrix(withRepeatingValue: 2)
let C = createMPSMatrix(withRepeatingValue: 1)
let multiplicationKernel = MPSMatrixMultiplication(
                              device: device,
                              transposeLeft: false,
                              transposeRight: false,
                              resultRows: size,
                              resultColumns: size,
                              interiorColumns: size,
                              alpha: 1.0,
                              beta: 0.0)
multiplicationKernel.encode(commandBuffer:commandBuffer,
                            leftMatrix: A,
                            rightMatrix: B,
                            resultMatrix: C)
// 1
let contents = C.data.contents()
let pointer = contents.bindMemory(to: Float.self, 
                                  capacity: count)
// 2
(0..<count).map {
  pointer.advanced(by: $0).pointee
}

Ray tracing

In Chapter 18, “Rendering with Rays,” you looked briefly at ray tracing and path tracing. In this section of the chapter, you’re going to implement an MPS-accelerated raytracer, which is, in fact, a path tracer variant using the Monte Carlo integration.

For each pixel on the screen:
  Reset the pixel color C.
    For each sample (random direction): 
      Shoot a ray and trace its path.
      C += incoming radiance from ray.
    C /= number of samples 

1. Primary rays

Primary rays render the equivalent of a rasterized scene, but the most expensive part of ray tracing is finding all of the intersections between rays and triangles in the scene.

1.0 The starter app

Time for some coding! Open the starter project named Raytracing, build and run it. Although you won’t see anything but a dull solid color, the starter project contains much of the setup needed.

1.1 Create the render target

As you go through the various passes, you’ll write to a render target texture. You’ll accumulate values, and this will be the texture you render onto the screen quad.

var renderTarget: MTLTexture!
let renderTargetDescriptor = MTLTextureDescriptor()
renderTargetDescriptor.pixelFormat = .rgba32Float
renderTargetDescriptor.textureType = .type2D
renderTargetDescriptor.width = Int(size.width)
renderTargetDescriptor.height = Int(size.height)
renderTargetDescriptor.storageMode = .private
renderTargetDescriptor.usage = [.shaderRead, .shaderWrite]
renderTarget = device.makeTexture(descriptor: renderTargetDescriptor)

1.2 Create the Ray Intersector

When you generate the primary rays in a kernel, you send the results to a Ray struct array of a particular format. The ray intersector decides this format.

var intersector: MPSRayIntersector!
let rayStride = 
MemoryLayout<MPSRayOriginMinDistanceDirectionMaxDistance>.stride 
  + MemoryLayout<float3>.stride
func buildIntersector() {
  intersector = MPSRayIntersector(device: device)
  intersector?.rayDataType 
      = .originMinDistanceDirectionMaxDistance
  intersector?.rayStride = rayStride
}
buildIntersector()

1.3 Generate primary rays

Before you can generate the primary rays, you need to create a new compute pipeline state and a buffer to hold the generated rays. At the top of Renderer, add this code:

var rayPipeline: MTLComputePipelineState!
var rayBuffer: MTLBuffer!
var shadowRayBuffer: MTLBuffer!
let rayCount = Int(size.width * size.height)
rayBuffer = device.makeBuffer(length: rayStride * rayCount,
                              options: .storageModePrivate)
shadowRayBuffer = 
    device.makeBuffer(length: rayStride * rayCount,
                      options: .storageModePrivate)
let computeDescriptor = MTLComputePipelineDescriptor()
computeDescriptor.threadGroupSizeIsMultipleOfThreadExecutionWidth 
    = true
computeDescriptor.computeFunction = library.makeFunction(
                                           name: "primaryRays")
rayPipeline = try device.makeComputePipelineState(
                                 descriptor: computeDescriptor,
                                 options: [],
                                 reflection: nil)
// 1
let width = Int(size.width)
let height = Int(size.height)
let threadsPerGroup = MTLSizeMake(8, 8, 1)
let threadGroups = 
    MTLSizeMake((width + threadsPerGroup.width - 1)
                                  / threadsPerGroup.width,
                (height + threadsPerGroup.height - 1)
                                  / threadsPerGroup.height,
                 1)
// 2
var computeEncoder = commandBuffer.makeComputeCommandEncoder()
computeEncoder?.label = "Generate Rays"
computeEncoder?.setBuffer(uniformBuffer, 
                          offset: uniformBufferOffset,
                          index: 0)
computeEncoder?.setBuffer(rayBuffer, offset: 0, index: 1)
computeEncoder?.setBuffer(randomBuffer, 
                          offset: randomBufferOffset,
                          index: 2)
computeEncoder?.setTexture(renderTarget, index: 0)
computeEncoder?.setComputePipelineState(rayPipeline)
computeEncoder?.dispatchThreadgroups(threadGroups,
  threadsPerThreadgroup: threadsPerGroup)
computeEncoder?.endEncoding()
struct Ray {
  packed_float3 origin;
  float minDistance;
  packed_float3 direction;
  float maxDistance;
  float3 color;
};
kernel void 
    primaryRays(constant Uniforms & uniforms [[buffer(0)]],
             device Ray *rays [[buffer(1)]],
             device float2 *random [[buffer(2)]],
             texture2d<float, access::write> t [[texture(0)]],
             uint2 tid [[thread_position_in_grid]]) {
  // 1
  if (tid.x < uniforms.width && tid.y < uniforms.height) {
    // 2
    float2 pixel = (float2)tid;
    float2 r = random[(tid.y % 16) * 16 + (tid.x % 16)];
    pixel += r;
    float2 uv = 
        (float2)pixel / float2(uniforms.width, uniforms.height);
    uv = uv * 2.0 - 1.0;
    // 3
    constant Camera & camera = uniforms.camera;
    unsigned int rayIdx = tid.y * uniforms.width + tid.x;
    device Ray & ray = rays[rayIdx];
    ray.origin = camera.position;
    ray.direction = 
        normalize(uv.x * camera.right + uv.y * camera.up 
                     + camera.forward);
    ray.minDistance = 0;
    ray.maxDistance = INFINITY;
    ray.color = float3(1.0);
    // 4
    t.write(float4(0.0), tid);
  }
}

1.4 Accumulation

You’re writing to the render target texture that you’ll combine with the other textures; you’ll create these other textures later for shadows and secondary rays, and then render to the background quad. You’ll set this render up now so that you can see your progress.

var accumulatePipeline: MTLComputePipelineState!
var accumulationTarget: MTLTexture!
computeDescriptor.computeFunction = library.makeFunction(
  name: "accumulateKernel")
accumulatePipeline = try device.makeComputePipelineState(
  descriptor: computeDescriptor, options: [], reflection: nil)
accumulationTarget = device.makeTexture(
  descriptor: renderTargetDescriptor)
computeEncoder = commandBuffer.makeComputeCommandEncoder()
computeEncoder?.label = "Accumulation"
computeEncoder?.setBuffer(uniformBuffer, 
                          offset: uniformBufferOffset,
                          index: 0)
computeEncoder?.setTexture(renderTarget, index: 0)
computeEncoder?.setTexture(accumulationTarget, index: 1)
computeEncoder?.setComputePipelineState(accumulatePipeline)
computeEncoder?.dispatchThreadgroups(threadGroups,
  threadsPerThreadgroup: threadsPerGroup)
computeEncoder?.endEncoding()
renderEncoder.setFragmentTexture(accumulationTarget, index: 0)
kernel void accumulateKernel(constant Uniforms & uniforms,
                   texture2d<float> renderTex,
                   texture2d<float, access::read_write> t,
                   uint2 tid [[thread_position_in_grid]])
{
  if (tid.x < uniforms.width && tid.y < uniforms.height) {
    // 1
    float3 color = renderTex.read(tid).xyz;
    if (uniforms.frameIndex > 0) {
      // 2
      float3 prevColor = t.read(tid).xyz;
      prevColor *= uniforms.frameIndex;
      color += prevColor;
      color /= (uniforms.frameIndex + 1);
    }
    t.write(float4(color, 1.0), tid);
  }
}
constexpr sampler s(min_filter::nearest,
                    mag_filter::nearest,
                    mip_filter::none);
float3 color = tex.sample(s, in.uv).xyz;
return float4(color, 1.0);

1.5 Create the acceleration structure

In Renderer.swift, at the top of the class, declare the acceleration structure object:

var accelerationStructure: MPSTriangleAccelerationStructure!
func buildAccelerationStructure() {
  accelerationStructure = 
    MPSTriangleAccelerationStructure(device: device)
  accelerationStructure?.vertexBuffer = vertexPositionBuffer
  accelerationStructure?.triangleCount = vertices.count / 3
  accelerationStructure?.rebuild()
}
buildAccelerationStructure()

1.6 Intersect Rays with the Scene

The next stage is to take the generated rays, and the acceleration structure, and use the intersector to combine them into an intersection buffer that contains all of the hits where a ray coincides with a triangle.

var intersectionBuffer: MTLBuffer!
let intersectionStride = 
MemoryLayout<MPSIntersectionDistancePrimitiveIndexCoordinates>.stride
intersectionBuffer = device.makeBuffer(
  length: intersectionStride * rayCount,
  options: .storageModePrivate)
intersector?.intersectionDataType = .distancePrimitiveIndexCoordinates
intersector?.encodeIntersection(
  commandBuffer: commandBuffer,
  intersectionType: .nearest,
  rayBuffer: rayBuffer,
  rayBufferOffset: 0,
  intersectionBuffer: intersectionBuffer,
  intersectionBufferOffset: 0,
  rayCount: width * height,
  accelerationStructure: accelerationStructure)

1.7 Use intersections for shading

The last step in casting primary rays is shading. This depends on intersection points and vertex attributes, so yet another compute kernel applies the lighting based on this information. At the top of Renderer, add a new pipeline for this new kernel:

var shadePipelineState: MTLComputePipelineState!
computeDescriptor.computeFunction = library.makeFunction(
  name: "shadeKernel")
shadePipelineState = try device.makeComputePipelineState(
  descriptor: computeDescriptor,options: [], reflection: nil)
computeEncoder = commandBuffer.makeComputeCommandEncoder()
computeEncoder?.label = "Shading"
computeEncoder?.setBuffer(uniformBuffer, 
                          offset: uniformBufferOffset,
                          index: 0)
computeEncoder?.setBuffer(rayBuffer, offset: 0, index: 1)
computeEncoder?.setBuffer(shadowRayBuffer, offset: 0, index: 2)
computeEncoder?.setBuffer(intersectionBuffer, offset: 0, 
                          index: 3)
computeEncoder?.setBuffer(vertexColorBuffer, offset: 0, 
                          index: 4)
computeEncoder?.setBuffer(vertexNormalBuffer, offset: 0, 
                          index: 5)
computeEncoder?.setBuffer(randomBuffer, 
                          offset: randomBufferOffset,
                          index: 6)
computeEncoder?.setTexture(renderTarget, index: 0)
computeEncoder?.setComputePipelineState(shadePipelineState!)
computeEncoder?.dispatchThreadgroups(threadGroups,
  threadsPerThreadgroup: threadsPerGroup)
computeEncoder?.endEncoding()
struct Intersection {
  float distance;
  int primitiveIndex;
  float2 coordinates;
};
template<typename T>
inline T interpolateVertexAttribute(device T *attributes, 
                     Intersection intersection) {
  // 1
  float3 uvw;
  uvw.xy = intersection.coordinates;
  uvw.z = 1.0 - uvw.x - uvw.y;
  // 2
  unsigned int triangleIndex = intersection.primitiveIndex;
  T T0 = attributes[triangleIndex * 3 + 0];
  T T1 = attributes[triangleIndex * 3 + 1];
  T T2 = attributes[triangleIndex * 3 + 2];
  return uvw.x * T0 + uvw.y * T1 + uvw.z * T2;
}
kernel void shadeKernel(uint2 tid [[thread_position_in_grid]],
                        constant Uniforms & uniforms,
                        device Ray *rays,
                        device Ray *shadowRays,
                        device Intersection *intersections,
                        device float3 *vertexColors,
                        device float3 *vertexNormals,
                        device float2 *random,
                        texture2d<float, access::write> renderTarget)
{
  if (tid.x < uniforms.width && tid.y < uniforms.height) {
	
  }
}
unsigned int rayIdx = tid.y * uniforms.width + tid.x;
device Ray & ray = rays[rayIdx];
device Ray & shadowRay = shadowRays[rayIdx];
device Intersection & intersection = intersections[rayIdx];
float3 color = ray.color;
// 1
if (ray.maxDistance >= 0.0 && intersection.distance >= 0.0) {
  float3 intersectionPoint = ray.origin + ray.direction
                              * intersection.distance;
  float3 surfaceNormal = 
      interpolateVertexAttribute(vertexNormals,
                                 intersection);
  surfaceNormal = normalize(surfaceNormal);
  // 2
  float2 r = random[(tid.y % 16) * 16 + (tid.x % 16)];
  float3 lightDirection;
  float3 lightColor;
  float lightDistance;
  sampleAreaLight(uniforms.light, r, intersectionPoint,
                  lightDirection, lightColor, lightDistance);
  // 3                
  lightColor *= saturate(dot(surfaceNormal, lightDirection));
  color *= interpolateVertexAttribute(vertexColors, 
                                      intersection);
}
else {
  ray.maxDistance = -1.0;
}
// 4
renderTarget.write(float4(color, 1.0), tid);

renderTarget.write(float4(color, 1.0), tid);

2. Shadow rays

As well as calculating the color of the pixel in the final texture, you’ll need to check if the point is in shadow.

shadowRay.origin = intersectionPoint + surfaceNormal * 1e-3;
shadowRay.direction = lightDirection;
shadowRay.maxDistance = lightDistance - 1e-3;
shadowRay.color = lightColor * color;
shadowRay.maxDistance = -1.0;

var shadowPipeline: MTLComputePipelineState!
computeDescriptor.computeFunction = library.makeFunction(
                                           name: "shadowKernel")
shadowPipeline = 
    try device.makeComputePipelineState(
                      descriptor: computeDescriptor,
                      options: [],
                      reflection: nil)
intersector?.label = "Shadows Intersector"
intersector?.intersectionDataType = .distance
intersector?.encodeIntersection(
                commandBuffer: commandBuffer,
                intersectionType: .any,
                rayBuffer: shadowRayBuffer,
                rayBufferOffset: 0,
                intersectionBuffer: intersectionBuffer!,
                intersectionBufferOffset: 0,
                rayCount: width * height,
                accelerationStructure: accelerationStructure!)
computeEncoder = commandBuffer.makeComputeCommandEncoder()
computeEncoder?.label = "Shadows"
computeEncoder?.setBuffer(uniformBuffer, 
                          offset: uniformBufferOffset,
                          index: 0)
computeEncoder?.setBuffer(shadowRayBuffer, offset: 0, index: 1)
computeEncoder?.setBuffer(intersectionBuffer, offset: 0, 
                          index: 2)
computeEncoder?.setTexture(renderTarget, index: 0)
computeEncoder?.setComputePipelineState(shadowPipeline!)
computeEncoder?.dispatchThreadgroups(
                   threadGroups,
                   threadsPerThreadgroup: threadsPerGroup)
computeEncoder?.endEncoding()
kernel void shadowKernel(uint2 tid [[thread_position_in_grid]],
             constant Uniforms & uniforms,
             device Ray *shadowRays,
             device float *intersections,
             texture2d<float, access::read_write> renderTarget)
{
  if (tid.x < uniforms.width && tid.y < uniforms.height) {
    // 1
    unsigned int rayIdx = tid.y * uniforms.width + tid.x;
    device Ray & shadowRay = shadowRays[rayIdx];
    float intersectionDistance = intersections[rayIdx];
    // 2
    if (shadowRay.maxDistance >= 0.0 
          && intersectionDistance < 0.0) {
      float3 color = shadowRay.color;
      color += renderTarget.read(tid).xyz;
      renderTarget.write(float4(color, 1.0), tid);
    }
  }
}

3. Secondary rays

This scene looks quite dark because you’re not bouncing any light around. In the real world, light bounces off all surfaces in all directions.

for _ in 0..<3 {
    // MARK: generate intersections between rays and model triangles
    // MARK: shading
    // MARK: shadows
}  
// MARK: accumulation
float3 sampleDirection = sampleCosineWeightedHemisphere(r);
sampleDirection = alignHemisphereWithNormal(sampleDirection,
                                            surfaceNormal);
ray.origin = intersectionPoint + surfaceNormal * 1e-3f;
ray.direction = sampleDirection;
ray.color = color;

Where to go from here?

What a great journey this has been. In this chapter, you were able to use the MPS framework to:

Have a technical question? Want to report a bug? You can ask questions and report bugs to the book authors in our official book forum here.
© 2024 Kodeco Inc.

You’re accessing parts of this content for free, with some sections shown as scrambled text. Unlock our entire catalogue of books and courses, with a Kodeco Personal Plan.

Unlock now