Metal by Tutorials

Second Edition · iOS 13 · Swift 5.1 · Xcode 11

3. The Rendering Pipeline
Written by Marius Horga

In this chapter, you’ll take a deep dive through the rendering pipeline and create a Metal app that renders a red cube. Along the way, you’ll discover all of the hardware chips responsible for taking the 3D objects and turning them into the gorgeous pixels that you see on the screen.

The GPU and the CPU

All computers have a Central Processing Unit (CPU) that drives the operations and manages the resources on a computer. They also have a Graphics Processing Unit (GPU).

A GPU is a specialized hardware component that can process images, videos and massive amounts of data really fast. This capability is called throughput, and it’s measured by the amount of data processed in a specific unit of time.

A CPU, on the other hand, can’t handle massive amounts of data really fast, but it can process many sequential tasks (one after another) really fast. The time necessary to process a task is called latency.

The ideal setup includes low latency and high throughput. Low latency allows for the serial execution of queued tasks so the CPU can execute the commands without the system becoming slow or unresponsive; and high throughput lets the GPU render videos and games asynchronously without stalling the CPU. Because the GPU has a highly parallelized architecture, specialized in doing the same task repeatedly, and with little or no data transfers, it’s able to process larger amounts of data.

The following diagram shows the major differences between the CPU and the GPU.

The CPU has a large cache memory and a few Arithmetic Logic Unit (ALU) cores. The low latency cache memory on the CPU is used for fast access to temporary resources. The GPU does not have much cache memory and there’s room for more ALU cores which only do calculations without saving partial results to memory.

Also, the CPU typically only has a handful of cores while the GPU has hundreds, or even thousands, of cores. With more cores, the GPU can split the problem into many smaller parts, each running on a separate core in parallel, thus hiding latency. At the end of processing, the partial results are combined and the final result is returned to the CPU. But cores aren’t the only thing that matters!

Besides being slimmed down, GPU cores also have special circuitry for processing geometry and are often called shader cores. These shader cores are responsible for the beautiful colors you see on the screen. The GPU writes a whole frame at a time to fit the entire rendering window. It will then proceed to rendering the next frame as quickly as possible to maintain a good frame rate.

The CPU continues to issue commands to the GPU to keep it busy, but at some point, either the CPU will finish sending commands or the GPU will finish processing the commands it received. To avoid stalling, Metal on the CPU queues up multiple commands in command buffers and will issue new commands, sequentially, for the next frame without having to wait for the GPU to finish the first frame. This way, no matter who finishes the work first, there will be more work available to do.

The GPU part of the graphics pipeline starts once it’s received all of the commands and resources.

The Metal project

You’ve been using Playgrounds to learn about Metal. Playgrounds are great for testing and learning new concepts. In fact, you’ll use Playgrounds occasionally throughout this book. However, it’s important to understand how to set up a full Metal project. Since the Metal framework is almost identical on macOS and iOS, you’ll create a macOS app.

Note: The project files for this chapter’s challenge project also include an iOS target, so you can see the few differences between iOS and macOS. The Metal files will all be shared between the two targets.

Create a new macOS project using the macOS App template.

Name your project Pipeline and ensure the User Interface dropdown is set to Storyboard. Leave all the checkbox options unchecked.

Open Main.storyboard and select View under the View Controller Scene.

In the Identity inspector, change the view from NSView to MTKView.

This sets up the main view as a MetalKit View.

If ViewController.swift doesn’t exist in your project, create a new Cocoa Class file and name the class ViewController. Make it a subclass of NSViewController. Uncheck the XIB file option. An NSViewController is the macOS equivalent to iOS’s UIViewController.

In ViewController.swift, at the top of the file, import the MetalKit framework:

import MetalKit

Then, add this code to viewDidLoad():

guard let metalView = view as? MTKView else {
  fatalError("metal view not set up in storyboard")
}

You now have a choice. You can subclass MTKView and use this view in the storyboard. In that case, the subclass’s draw(_:) will be called every frame and you’d put your drawing code in that method. However, in this book, you’ll set up a Renderer class that conforms to MTKViewDelegate and sets Renderer as a delegate of MTKView. MTKView calls a delegate method every frame, and this is where you’ll place the necessary drawing code.

Note: If you’re coming from a different API world, you might be looking for a game loop construct. You do have the option of extending CAMetalLayer instead of creating the MTKView. You can then use CADisplayLink for the timing; but Apple introduced MetalKit with its protocols to manage the game loop more easily.

The Renderer class

Create a new Swift file named Renderer.swift and replace its contents with the following code:

import MetalKit

class Renderer: NSObject {
  init(metalView: MTKView) {
    super.init()
  }
}

extension Renderer: MTKViewDelegate {
  func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
  }
  
  func draw(in view: MTKView) {
    print("draw")
  }
}

Here, you create an initializer and make Renderer conform to MTKViewDelegate with the two MTKView delegate methods:

  • mtkView(_:drawableSizeWillChange:): Gets called every time the size of the window changes. This allows you to update the render coordinate system.
  • draw(in:): Gets called every frame.

In ViewController.swift, add a property to hold the renderer:

var renderer: Renderer?

At the end of viewDidLoad(), initialize the renderer:

renderer = Renderer(metalView: metalView)

Initialization

Just as you did in the first chapter, you need to set up the Metal environment.

Metal has a major advantage over OpenGL in that you’re able to instantiate some objects up-front rather than create them during each frame. The following diagram indicates some of the objects you can create at the start of the app.

  • MTLDevice: The software reference to the GPU hardware device.
  • MTLCommandQueue: Responsible for creating and organizing MTLCommandBuffers each frame.
  • MTLLibrary: Contains the source code from your vertex and fragment shader functions.
  • MTLRenderPipelineState: Sets the information for the draw, such as which shader functions to use, what depth and color settings to use and how to read the vertex data.
  • MTLBuffer: Holds data, such as vertex information, in a form that you can send to the GPU.

Typically, you’ll have one MTLDevice, one MTLCommandQueue and one MTLLibrary object in your app. You’ll also have several MTLRenderPipelineState objects that will define the various pipeline states, as well as several MTLBuffers to hold the data.

Before you can use these objects, however, you need to initialize them. Add these properties to the Renderer class:

static var device: MTLDevice!
static var commandQueue: MTLCommandQueue!
var mesh: MTKMesh!
var vertexBuffer: MTLBuffer!
var pipelineState: MTLRenderPipelineState!

These are the properties you need to keep references to the different objects. They are currently all implicitly unwrapped optionals for convenience, but you can change this after you’ve completed the initialization. Also, you won’t need to keep a reference to the MTLLibrary, so there’s no need to create a property for it.

You’re using class properties for the device and the command queue to ensure that only one of each exists. In rare cases, you may require more than one — but in most apps, one will be plenty.

Still in Renderer.swift, add this code to init(metalView:), before super.init():

guard
  let device = MTLCreateSystemDefaultDevice(),
  let commandQueue = device.makeCommandQueue() else {
    fatalError("GPU not available")
}
Renderer.device = device
Renderer.commandQueue = commandQueue
metalView.device = device

This initializes the GPU and creates the command queue.

Finally, after super.init(), add this:

metalView.clearColor = MTLClearColor(red: 1.0, green: 1.0,
                                     blue: 0.8, alpha: 1.0)
metalView.delegate = self

This sets metalView.clearColor to a cream color. It also sets Renderer as the delegate for metalView so that the view will call the MTKViewDelegate drawing methods.

Build and run the app to make sure everything’s set up and working. If all’s well, you should see a plain gray window. In the debug console, you’ll see the word “draw” repeatedly. Use this to verify that your app is calling draw(in:) for every frame.

Note: You won’t see metalView’s cream color because you’re not asking the GPU to do any drawing yet.

Set up the data

A class to build 3D primitive meshes is always useful. In the previous chapter, you created a sphere and a cone. In this chapter, you’ll set up a class for creating 3D shape primitives, and you’ll add a cube to it.

Create a new Swift file named Primitive.swift and replace the default code with this:

import MetalKit

class Primitive {
  static func makeCube(device: MTLDevice, size: Float) -> MDLMesh {
    let allocator = MTKMeshBufferAllocator(device: device)
    let mesh = MDLMesh(boxWithExtent: [size, size, size], 
                       segments: [1, 1, 1],
                       inwardNormals: false, 
                       geometryType: .triangles,
                       allocator: allocator)
    return mesh
  }
}

This method returns a cube and is similar to the code used to draw a sphere and a cone from the previous chapter.

In Renderer.swift, in init(metalView:), before calling super.init(), set up the mesh:

let mdlMesh = Primitive.makeCube(device: device, size: 1)
do {
  mesh = try MTKMesh(mesh: mdlMesh, device: device)
} catch let error {
  // Without a mesh there is nothing to render, so stop here.
  fatalError(error.localizedDescription)
}

Then, set up the MTLBuffer that contains the vertex data you’ll send to the GPU.

vertexBuffer = mesh.vertexBuffers[0].buffer

This puts the mesh data in an MTLBuffer. Now, you need to set up the pipeline state so that the GPU will know how to render the data.

First, set up the MTLLibrary and ensure that the vertex and fragment shader functions are present.

Continue adding code before super.init():

let library = device.makeDefaultLibrary()
let vertexFunction = library?.makeFunction(name: "vertex_main")
let fragmentFunction = 
    library?.makeFunction(name: "fragment_main")

You’ll create these shader functions later in this chapter. Unlike OpenGL shaders, these are compiled when you compile your project, which is more efficient than compiling on the fly. The result is stored in the library.

Now, create the pipeline state:

let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
pipelineDescriptor.vertexDescriptor = 
      MTKMetalVertexDescriptorFromModelIO(mdlMesh.vertexDescriptor)
pipelineDescriptor.colorAttachments[0].pixelFormat = metalView.colorPixelFormat
do {
  pipelineState = 
    try device.makeRenderPipelineState(descriptor: pipelineDescriptor)
} catch let error {
  fatalError(error.localizedDescription)
}

This sets up a potential state for the GPU. The GPU needs to know its complete state before it can start managing vertices. You set the two shader functions the GPU will call, and you also set the pixel format for the texture to which the GPU will write.

You also set the pipeline’s vertex descriptor. This is how the GPU will know how to interpret the vertex data that you’ll present in the mesh data MTLBuffer.

If you need to call different vertex or fragment functions, or use a different data layout, then you’ll need more pipeline states. Creating pipeline states is relatively time-consuming, which is why you do it up-front, but switching pipeline states during frames is fast and efficient.

The initialization is complete and your project will compile. However, if you try to run it, you’ll get an error because you haven’t yet set up the shader functions.

Render frames

In Renderer.swift, using the same commands as in Chapter 1, “Hello Metal!” replace the print statement in draw(in:) with this code:

guard 
  let descriptor = view.currentRenderPassDescriptor,
  let commandBuffer = Renderer.commandQueue.makeCommandBuffer(),
  let renderEncoder = 
    commandBuffer.makeRenderCommandEncoder(
        descriptor: descriptor) else {
    return
}

// drawing code goes here

renderEncoder.endEncoding()
guard let drawable = view.currentDrawable else {
  return
}
commandBuffer.present(drawable)
commandBuffer.commit()

This sets up the render command encoder and presents the view’s drawable texture to the GPU.

Drawing

On the CPU side, to prepare the GPU, you need to give it the data and the pipeline state. Then, you will issue the draw call.

Still in draw(in:), replace the comment:

// drawing code goes here

With:

renderEncoder.setRenderPipelineState(pipelineState)
renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)
for submesh in mesh.submeshes {
  renderEncoder.drawIndexedPrimitives(type: .triangle,
              indexCount: submesh.indexCount,
              indexType: submesh.indexType,
              indexBuffer: submesh.indexBuffer.buffer,
              indexBufferOffset: submesh.indexBuffer.offset)
}

You’ve now set up the GPU commands to set the pipeline state, the vertex buffer, and perform the draw calls on the mesh’s submeshes. When you commit the command buffer at the end of draw(in:), this indicates to the GPU that all the data and the pipeline are ready and the GPU can take over.

The rendering pipeline

You finally get to investigate the GPU pipeline! In the following diagram, you can see the stages of the pipeline.

The graphics pipeline takes the vertices through multiple stages during which the vertices have their coordinates transformed between various spaces. You’ll read more about coordinate spaces in Chapter 4, “Coordinate Spaces.”

As a Metal programmer, you’re only concerned about the Vertex and Fragment Processing stages since they’re the only two programmable stages. Later in the chapter, you’ll write both a vertex shader and a fragment shader. For all the non-programmable pipeline stages, such as Vertex Fetch, Primitive Assembly and Rasterization, the GPU has specially designed hardware units to serve those stages.

1 - Vertex Fetch

The name of this stage varies among various graphics Application Programming Interfaces (APIs). For example, DirectX calls it Input Assembler.

To start rendering 3D content, you first need a scene. A scene consists of models that have meshes of vertices. One of the simplest models is the cube which has 6 faces (12 triangles).

As you saw in the previous chapter, you use a vertex descriptor to define the way vertices will be read in along with their attributes such as position, texture coordinates, normal and color.

You do have the option not to use a vertex descriptor and just send an array of vertices in an MTLBuffer. However, if you decide not to use one, you’ll need to know how the vertex buffer is organized ahead of time.

When the GPU fetches the vertex buffer, the MTLRenderCommandEncoder draw call tells the GPU whether the buffer is indexed.

If the buffer is not indexed, the GPU assumes the buffer is an array and reads in one element at a time in order.

In the previous chapter, you saw how Model I/O imports .obj files and sets up their buffers indexed by submesh. This indexing is important because vertices are cached for reuse.

For example, a cube has twelve triangles and eight vertices (at the corners). If you don’t index, you’ll have to specify the vertices for each triangle and send thirty-six vertices to the GPU.

This may not sound like a lot, but in a model that has several thousand vertices, vertex caching is important!
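To make the saving concrete, here’s a minimal sketch in plain Swift that compares the two layouts for a cube. It only models positions, and the corner order, the UInt16 index type and the names corners and indices are assumptions made for illustration; they aren’t what Model I/O generates for you.

```swift
// Illustrative only: positions for the eight corners of a unit cube.
let h: Float = 0.5
let corners: [SIMD3<Float>] = [
  [-h, -h, -h], [ h, -h, -h], [ h,  h, -h], [-h,  h, -h],
  [-h, -h,  h], [ h, -h,  h], [ h,  h,  h], [-h,  h,  h]
]

// Twelve triangles, each described by three indices into `corners`.
let indices: [UInt16] = [
  4, 5, 6,  4, 6, 7,   // front
  1, 0, 3,  1, 3, 2,   // back
  0, 4, 7,  0, 7, 3,   // left
  5, 1, 2,  5, 2, 6,   // right
  7, 6, 2,  7, 2, 3,   // top
  0, 1, 5,  0, 5, 4    // bottom
]

// Without indexing, every triangle carries its own three vertices.
let unindexedVertices = indices.map { corners[Int($0)] }

let vertexStride = MemoryLayout<SIMD3<Float>>.stride
let indexedBytes = corners.count * vertexStride
  + indices.count * MemoryLayout<UInt16>.stride
let unindexedBytes = unindexedVertices.count * vertexStride

print("indexed: \(corners.count) vertices, \(indexedBytes) bytes")
print("unindexed: \(unindexedVertices.count) vertices, \(unindexedBytes) bytes")
```

The indexed version stores each corner once and lets the index list describe the triangles, which is also what allows the GPU to reuse a vertex it has already fetched.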

There is also a second cache for shaded vertices so that vertices that are accessed multiple times are only shaded once. A shaded vertex is one to which color was already applied. But that happens in the next stage.

A special hardware unit called the Scheduler sends the vertices and their attributes on to the Vertex Processing stage.

2 - Vertex Processing

In this stage, vertices are processed individually. You write code to calculate per-vertex lighting and color. More importantly, you send vertex coordinates through various coordinate spaces to reach their position in the final framebuffer.

You learned briefly about shader functions and about the Metal Shading Language (MSL) in Chapter 1, “Hello Metal!” Now it’s time to see what happens under the hood at the hardware level.

Take a look at this modern architecture of an AMD GPU:

Going top-down, the GPU has:

  • 1 Graphics Command Processor: This coordinates the work processes.
  • 4 Shader Engines (SE): An SE is an organizational unit on the GPU that can serve an entire pipeline. Each SE has a geometry processor, a rasterizer and Compute Units.
  • 9 Compute Units (CU) per SE: A CU is nothing more than a group of shader cores.
  • 64 shader cores per CU: A shader core is the basic building block of the GPU where all of the shading work is done.

In total, the 36 CUs have 2304 shader cores. Compare that to the number of cores in your quad-core CPU. Not fair, I know! :]

For mobile devices, the story is a little different. For comparison, take a look at the following image showing a GPU similar to those in recent iOS devices. Instead of having SEs and CUs, the PowerVR GPU has Unified Shading Clusters (USC).

This particular GPU model has 6 USCs and 32 cores per USC for a total of only 192 cores.
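If you’d like to check the core counts quoted for both GPUs, the arithmetic is short enough to run in a playground. The constant names here are made up for this sketch; the figures are the ones given above.

```swift
// Figures taken from the two GPU descriptions above.
let shaderEngines = 4
let computeUnitsPerEngine = 9
let coresPerComputeUnit = 64
let desktopComputeUnits = shaderEngines * computeUnitsPerEngine
let desktopCores = desktopComputeUnits * coresPerComputeUnit

let shadingClusters = 6
let coresPerCluster = 32
let mobileCores = shadingClusters * coresPerCluster

print(desktopComputeUnits, desktopCores, mobileCores) // 36 2304 192
```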

Note: The iPhone X had the first mobile GPU entirely designed in-house by Apple. Unfortunately, Apple has not made the GPU hardware specifications public.

So what can you do with that many cores? Since these cores are specialized in both vertex and fragment shading, one obvious thing to do is give all the cores work to do in parallel so that the processing of vertices or fragments is done faster. There are a few rules, though.

Inside a CU, you can only process either vertices or fragments at one time. Good thing there are thirty-six of those! Another rule is that you can only process one shader function per SE. Having four SEs lets you combine work in interesting and useful ways. For example, you can run one fragment shader on one SE and a second fragment shader on a second SE at one time. Or you can separate your vertex shader from your fragment shader and have them run in parallel but on different SEs.

It’s now time to see vertex processing in action! The vertex shader you’re about to write is minimal but encapsulates most of the necessary vertex shader syntax you’ll need in this and subsequent chapters.

Create a new file using the Metal File template and name it Shaders.metal. Then, add this code at the end of the file:

// 1
struct VertexIn {
  float4 position [[attribute(0)]];
};

// 2
vertex float4 vertex_main(const VertexIn vertexIn [[stage_in]]) {
  return vertexIn.position;
}

Going through this code:

  1. Create a struct VertexIn to describe the vertex attributes that match the vertex descriptor you set up earlier. In this case, just position.
  2. Implement a vertex shader, vertex_main, that takes in VertexIn structs and returns vertex positions as float4 types.

Remember that vertices are indexed in the vertex buffer. The vertex shader gets the current index via the [[stage_in]] attribute and unpacks the VertexIn struct cached for the vertex at the current index.

Compute Units can process (at one time) batches of vertices up to their maximum number of shader cores. This batch can fit entirely in the CU cache and vertices can thus be reused as needed. The batch will keep the CU busy until the processing is done but other CUs should become available to process the next batch.

As soon as the vertex processing is done, the cache is cleared for the next batches of vertices. At this point, vertices are now ordered and grouped, ready to be sent to the primitive assembly stage.

To recap, the CPU sent the GPU a vertex buffer that you created from the model’s mesh. You configured the vertex buffer using a vertex descriptor that tells the GPU how the vertex data is structured. On the GPU, you created a struct to encapsulate the vertex attributes. The vertex shader takes in this struct, as a function argument, and through the [[stage_in]] qualifier, acknowledges that position comes from the CPU via the [[attribute(0)]] position in the vertex buffer. The vertex shader then processes all the vertices and returns their positions as a float4.

Note that when you use a vertex descriptor with attributes, you don’t have to match types. The MTLBuffer position is a float3, whereas the VertexIn struct can read the position as a float4.
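As a rough CPU-side picture of that conversion, the following sketch expands a three-component position into a four-component one. It assumes the missing w component is filled in with 1, and expandPosition is a hypothetical helper written for this illustration, not a Metal API.

```swift
// Hypothetical helper that mimics reading a three-component position
// into a float4. Assumption: the missing w component becomes 1.
func expandPosition(_ position: SIMD3<Float>) -> SIMD4<Float> {
  return SIMD4<Float>(position, 1)
}

let stored = SIMD3<Float>(0.5, -0.5, 0.5)   // as laid out in the vertex buffer
let fetched = expandPosition(stored)         // as seen by the vertex function
print(fetched.w)
```

A w of 1 marks the value as a position rather than a direction, which becomes important once you start multiplying positions by matrices.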

A special hardware unit called the Distributer sends the grouped blocks of vertices on to the Primitive Assembly stage.

3 - Primitive Assembly

The previous stage sent processed vertices grouped into blocks of data to this stage. The important thing to keep in mind is that vertices belonging to the same geometrical shape (primitive) are always in the same block. That means that the one vertex of a point, or the two vertices of a line, or the three vertices of a triangle, will always be in the same block, hence a second block fetch will never be necessary.

Along with vertices, the CPU also sends vertex connectivity information when it issues the draw call command, like this:

renderEncoder.drawIndexedPrimitives(type: .triangle,
                  indexCount: submesh.indexCount,
                  indexType: submesh.indexType,
                  indexBuffer: submesh.indexBuffer.buffer,
                  indexBufferOffset: submesh.indexBuffer.offset)

The first argument of the draw function contains the most important information about vertex connectivity. In this case, it tells the GPU that it should draw triangles from the vertex buffer it sent.

The Metal API provides five primitive types:

  • point: For each vertex, rasterize a point. You can specify the size of the point by marking a vertex shader output with the attribute [[point_size]].
  • line: For each pair of vertices, rasterize a line between them. If a vertex was already included in a line, it cannot be included again in other lines. The last vertex is ignored if there are an odd number of vertices.
  • lineStrip: Same as a simple line, except that the line strip connects all adjacent vertices and forms a poly-line. Each vertex (except the first) is connected to the previous vertex.
  • triangle: For every sequence of three vertices, rasterize a triangle. The last vertices are ignored if they cannot form another triangle.
  • triangleStrip: Same as a simple triangle, except adjacent vertices can be connected to other triangles as well.
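The rules in this list can be summarized as a small counting function. This is only a sketch: PrimitiveKind is a hypothetical stand-in for the five types above (in a Metal app you’d use MTLPrimitiveType), and primitiveCount(for:vertexCount:) is a name invented for this example.

```swift
// A stand-in for the five primitive types listed above.
enum PrimitiveKind {
  case point, line, lineStrip, triangle, triangleStrip
}

// Returns how many primitives are assembled from `vertexCount` vertices.
func primitiveCount(for kind: PrimitiveKind, vertexCount: Int) -> Int {
  switch kind {
  case .point:
    return vertexCount
  case .line:
    return vertexCount / 2            // an odd leftover vertex is ignored
  case .lineStrip:
    return max(vertexCount - 1, 0)    // each vertex connects to the previous one
  case .triangle:
    return vertexCount / 3            // leftover vertices are ignored
  case .triangleStrip:
    return max(vertexCount - 2, 0)    // each new vertex adds one triangle
  }
}

print(primitiveCount(for: .triangle, vertexCount: 36)) // 12, the cube's triangles
```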

There is one more primitive type called a patch but this needs special treatment. You will read more about patches in Chapter 11, “Tessellation and Terrains.”

As you read in the previous chapter, the pipeline specifies the winding order of the vertices. If the winding order is counter-clockwise, and the triangle vertex order is counter-clockwise, it means they are front-faced. Otherwise, they are back-faced and can be culled since we cannot see their color and lighting.
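One common way to tell which way a triangle winds is to look at the sign of its area once it’s projected to 2D. The sketch below shows the idea on the CPU, assuming a Y axis that points up; it isn’t a description of how any particular GPU implements the test, and the function names are invented for this example.

```swift
// Twice the signed area of a 2D triangle. With the Y axis pointing up,
// a positive result means the vertices run counter-clockwise.
func signedDoubleArea(_ a: SIMD2<Float>,
                      _ b: SIMD2<Float>,
                      _ c: SIMD2<Float>) -> Float {
  let ab = b - a
  let ac = c - a
  return ab.x * ac.y - ab.y * ac.x
}

func isCounterClockwise(_ a: SIMD2<Float>,
                        _ b: SIMD2<Float>,
                        _ c: SIMD2<Float>) -> Bool {
  return signedDoubleArea(a, b, c) > 0
}

let a = SIMD2<Float>(0, 0)
let b = SIMD2<Float>(1, 0)
let c = SIMD2<Float>(0, 1)
print(isCounterClockwise(a, b, c)) // true
print(isCounterClockwise(a, c, b)) // false: same triangle, opposite order
```

Swapping any two vertices flips the sign, which is why the same triangle can be front-facing from one side and back-facing from the other.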

Primitives will be culled when they are totally off-screen; however, when they are only partially off-screen, they’ll be clipped.

For efficiency, you should set the winding order and enable back-face culling using setFrontFacing(_:) and setCullMode(_:) on the render command encoder.

At this point, primitives are fully assembled from connected vertices and they move on to the rasterizer.

4 - Rasterization

There are two modern rendering techniques currently evolving on separate paths but sometimes used together: ray tracing and rasterization. They are quite different; both have pros and cons.

Ray tracing — which you’ll read more about in Chapter 18, “Rendering with Rays” — is preferred when rendering content that is static and far away, while rasterization is preferred when the content is closer to the camera and more dynamic.

With ray tracing, for each pixel on the screen, you send a ray into the scene to see if there’s an intersection with an object. If there is, you change the pixel color to that object’s color, but only if the object is closer to the screen than the previously saved object for the current pixel.

Rasterization works the other way around: for each object in the scene, send rays back into the screen and check which pixels are covered by the object. Depth information is kept the same way as for ray tracing, so it will update the pixel color if the current object is closer than the previously saved one.

At this point, all connected vertices sent from the previous stage need to be represented on a two-dimensional grid using their X and Y coordinates. This step is known as the triangle setup.

Here is where the rasterizer needs to calculate the slope or steepness of the line segments between any two vertices. When the slopes of the three segments connecting the three vertices are known, the triangle can be formed from these three edges.

Next, a process called scan conversion runs on each line of the screen to look for intersections and to determine what is visible and what is not. To draw on the screen at this point, only the vertices and the slopes they determine are needed.

The scan algorithm determines if all the points on a line segment, or all the points inside of a triangle are visible, in which case the triangle is filled with color entirely.
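To get a feel for what “deciding which pixels a triangle covers” means, here’s a tiny software rasterizer sketch. Note that it uses edge functions rather than the slope-based scan conversion described above; that’s a swapped-in technique chosen because it fits in a few lines. The 8x8 grid, the sample triangle and all names are assumptions for illustration.

```swift
typealias Point = SIMD2<Float>

// Edge function: non-negative when `p` lies on or to the left of the edge a -> b.
func edge(_ a: Point, _ b: Point, _ p: Point) -> Float {
  return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x)
}

// A pixel is covered when its center is on the inner side of all three edges.
// Assumes the vertices are given in counter-clockwise order.
func covers(_ a: Point, _ b: Point, _ c: Point,
            pixelX: Int, pixelY: Int) -> Bool {
  let p = Point(Float(pixelX) + 0.5, Float(pixelY) + 0.5)
  return edge(a, b, p) >= 0 && edge(b, c, p) >= 0 && edge(c, a, p) >= 0
}

// Rasterize one triangle onto an 8x8 grid and print the coverage mask.
let a = Point(1, 1), b = Point(7, 1), c = Point(1, 7)
for y in 0..<8 {
  var row = ""
  for x in 0..<8 {
    row += covers(a, b, c, pixelX: x, pixelY: y) ? "#" : "."
  }
  print(row)
}
```

Each covered pixel in the printout corresponds to a fragment that the later stages of the pipeline would go on to shade.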

For mobile devices, the rasterization takes advantage of the tiled architecture of PowerVR GPUs by rasterizing the primitives on a 32x32 tile grid in parallel. In this case, 32 is the number of screen pixels assigned to a tile, and this size perfectly fits the number of cores in a USC.

What if one object is behind another object? How can the rasterizer determine which object to render? This hidden surface removal problem can be solved by using stored depth information (early-Z testing) to determine whether each point is in front of other points in the scene.
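The idea behind depth testing can be sketched in a few lines of plain Swift. This example assumes depth values between 0 and 1 where smaller means closer, and it uses strings in place of real colors; Fragment and DepthTestedTarget are hypothetical types made up for the illustration.

```swift
struct Fragment {
  let x: Int
  let y: Int
  let depth: Float      // assumption: smaller means closer to the camera
  let color: String     // placeholder for a real color value
}

struct DepthTestedTarget {
  let width: Int
  let height: Int
  var depth: [Float]
  var color: [String]

  init(width: Int, height: Int) {
    self.width = width
    self.height = height
    // Clear depth to the farthest possible value.
    depth = Array(repeating: 1.0, count: width * height)
    color = Array(repeating: "clear", count: width * height)
  }

  // Keeps the fragment only if it is closer than what is already stored.
  mutating func write(_ fragment: Fragment) {
    let i = fragment.y * width + fragment.x
    guard fragment.depth < depth[i] else { return }
    depth[i] = fragment.depth
    color[i] = fragment.color
  }
}

var target = DepthTestedTarget(width: 2, height: 2)
target.write(Fragment(x: 0, y: 0, depth: 0.8, color: "blue"))
target.write(Fragment(x: 0, y: 0, depth: 0.3, color: "red"))    // closer: replaces blue
target.write(Fragment(x: 0, y: 0, depth: 0.6, color: "green"))  // farther: rejected
print(target.color[0]) // red
```

Because the test compares against stored depth, the result doesn’t depend on the order in which the objects are drawn.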

After rasterization is finished, three more specialized hardware units take the stage:

  • A buffer called Hierarchical-Z is responsible for removing fragments that were marked for culling by the rasterizer.
  • The Z and Stencil Test unit then removes non-visible fragments by comparing them against the depth and stencil buffer.
  • Finally, the Interpolator unit takes the remaining visible fragments and generates fragment attributes from the assembled triangle attributes.

At this point, the Scheduler unit again dispatches work to the shader cores, but this time it’s the rasterized fragments sent for Fragment Processing.

5 - Fragment Processing

Time for a quick review of the pipeline.

  • The Vertex Fetch unit grabs vertices from the memory and passes them to the Scheduler unit.
  • The Scheduler unit knows which shader cores are available so it dispatches work on them.
  • After work is done, the Distributer unit knows if this work was Vertex or Fragment Processing.
  • If it was Vertex Processing work, it sends the result to the Primitive Assembly unit. This path continues to the Rasterization unit and then back to the Scheduler unit.
  • If it was Fragment Processing work, it sends the result to the Color Writing unit.
  • Finally, the colored pixels are sent back to the memory.

The primitive processing in the previous stages was sequential because there is only one Primitive Assembly unit and one Rasterization unit. However, as soon as fragments reach the Scheduler unit, work can be forked (divided) into many tiny parts, and each part is given to an available shader core.

Hundreds or even thousands of cores are now doing parallel processing. When the work is finished, the results will be joined (merged) and sent to the memory, again sequentially.

The fragment processing stage is another programmable stage. You create a fragment shader function that will receive the lighting, texture coordinate, depth and color information that the vertex function outputs.

The fragment shader output is a single color for that fragment. Each of these fragments will contribute to the color of the final pixel in the framebuffer. All the attributes are interpolated for each fragment.

For example, to render this triangle, the vertex function would process three vertices with the colors red, green and blue. As the diagram shows, each fragment that makes up this triangle is interpolated from these three colors. Linear interpolation simply averages the color at each point on the line between two endpoints. If one endpoint has red color, and the other has green color, the midpoint on the line between them will be yellow. And so on.

The interpolation equation is parametric and has this form, where parameter p is the percentage (or a range from 0 to 1) of a color’s presence:

newColor = p * oldColor1 + (1 - p) * oldColor2

Color is easy to visualize, but all the other vertex function outputs are also similarly interpolated for each fragment.
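You can try the equation yourself in a playground. This sketch stores a color as a SIMD3<Float> of red, green and blue; the Color alias and the interpolate function are names invented for the example.

```swift
typealias Color = SIMD3<Float>   // red, green, blue

// newColor = p * oldColor1 + (1 - p) * oldColor2
func interpolate(_ color1: Color, _ color2: Color, p: Float) -> Color {
  return p * color1 + (1 - p) * color2
}

let red = Color(1, 0, 0)
let green = Color(0, 1, 0)

// Halfway between the two endpoints: equal parts red and green.
let midpoint = interpolate(red, green, p: 0.5)
print(midpoint)
```

With p set to 1 you get the first color back unchanged, and with p set to 0 you get the second.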

Note: If you don’t want a vertex output to be interpolated, add the attribute [[flat]] to its definition.

In Shaders.metal, add the fragment function to the end of the file:

fragment float4 fragment_main() {
  return float4(1, 0, 0, 1);
}

This is the simplest fragment function possible. You return the color red in the form of a float4, so all the fragments that make up the cube will be red. The GPU takes the fragments and does a series of post-processing tests:

  • alpha-testing determines which opaque objects are drawn and which are not based on depth testing.
  • In the case of translucent objects, alpha-blending will combine the color of the new object with that already saved in the color buffer previously.
  • scissor testing checks whether a fragment is inside of a specified rectangle; this test is useful for masked rendering.
  • stencil testing checks how the stencil value stored in the framebuffer at the fragment’s location compares to a specified value you choose.
  • In the previous stage early-Z testing ran; now a late-Z testing is done to solve more visibility issues; stencil and depth tests are also useful for ambient occlusion and shadows.
  • Finally, antialiasing is also calculated here so that final images that get to the screen do not look jagged.

You will learn more about post-processing tests in Chapter 10, “Fragment Post-Processing.”

6 - Framebuffer

As soon as fragments have been processed into pixels, the Distributer unit sends them to the Color Writing unit. This unit is responsible for writing the final color in a special memory location called the framebuffer. From here, the view gets its colored pixels refreshed every frame. But does that mean the color is written to the framebuffer while being displayed on the screen?

A technique called double-buffering is used to solve this situation. While the first buffer is being displayed on the screen, the second one is updated in the background. Then, the two buffers are swapped, and the second one is displayed on the screen while the first one is updated, and the cycle continues.
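The swap is easy to model on the CPU. In the sketch below, DoubleBuffer is a hypothetical type that stores pixels as plain bytes; in a real Metal app, the view and its layer manage the drawable textures for you, so treat this purely as an illustration of the idea.

```swift
struct DoubleBuffer {
  private(set) var front: [UInt8]   // what the screen is showing
  private(set) var back: [UInt8]    // what the renderer is writing

  init(pixelCount: Int) {
    front = Array(repeating: 0, count: pixelCount)
    back = Array(repeating: 0, count: pixelCount)
  }

  // Draw into the hidden buffer only.
  mutating func render(value: UInt8) {
    for i in back.indices {
      back[i] = value
    }
  }

  // Once the frame is complete, exchange the two buffers.
  mutating func present() {
    (front, back) = (back, front)
  }
}

var buffers = DoubleBuffer(pixelCount: 4)
buffers.render(value: 255)
print(buffers.front) // [0, 0, 0, 0]
buffers.present()
print(buffers.front) // [255, 255, 255, 255]
```

The screen never shows a half-finished frame because rendering only ever touches the buffer that isn’t being displayed.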

Whew! That was a lot of hardware information to take in. However, the code you’ve written is what every Metal renderer uses, and despite just starting out, you should begin to recognize the rendering process when you look at Apple’s sample code.

Build and run the app, and your app will render this red cube:

Notice how the cube is not square. Remember that Metal uses Normalized Device Coordinates (NDC), which range from -1 to 1 on both the X and Y axes regardless of the window’s shape. Resize your window, and the cube will maintain a size relative to the size of the window. In the next chapter, you’ll be able to position objects precisely on the screen.
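A quick calculation shows why the cube looks stretched. The sketch below maps NDC values to pixels for a hypothetical 800 by 600 drawable; the drawable size and the function name pixels(fromNDC:extent:) are assumptions for this example. Since the cube has a size of 1 and is centered at the origin, its front face spans -0.5 to 0.5 on both axes.

```swift
// Maps a normalized device coordinate in -1...1 to a pixel offset.
func pixels(fromNDC value: Float, extent: Float) -> Float {
  return (value + 1) / 2 * extent
}

// Hypothetical drawable size.
let drawableWidth: Float = 800
let drawableHeight: Float = 600

let cubeWidth = pixels(fromNDC: 0.5, extent: drawableWidth)
  - pixels(fromNDC: -0.5, extent: drawableWidth)
let cubeHeight = pixels(fromNDC: 0.5, extent: drawableHeight)
  - pixels(fromNDC: -0.5, extent: drawableHeight)

print(cubeWidth, cubeHeight) // 400.0 300.0
```

The face always covers half the drawable in each direction, so it’s only square on screen when the window itself is square.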

Send data to the GPU

Metal is all about gorgeous graphics and fast and smooth animation. As a next step, you’ll make your cube move up and down the screen. To do this, you’ll have a timer that updates every frame and the cube’s position will depend on this timer. The vertex function is where you update vertex positions so you’ll send the timer data to the GPU.

At the top of Renderer, add the timer property:

var timer: Float = 0

In draw(in:), just before:

renderEncoder.setRenderPipelineState(pipelineState)

add:

// 1
timer += 0.05
var currentTime = sin(timer)
// 2
renderEncoder.setVertexBytes(&currentTime, 
                              length: MemoryLayout<Float>.stride, 
                              index: 1)

  1. Every frame you update the timer. You want your cube to move up and down the screen, so you’ll use a value between -1 and 1. Using sin() is a great way to achieve this as sine values are always between -1 and 1.
  2. If you’re only sending a small amount of data (less than 4 KB) to the GPU, setVertexBytes(_:length:index:) is an alternative to setting up an MTLBuffer. Here, you set currentTime to be at index 1 in the buffer argument table.
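If you’d like to convince yourself that the value you send stays in range, you can simulate the timer outside of the render loop. This sketch assumes 600 frames (about ten seconds at 60 frames per second, a figure picked for illustration) and also prints the number of bytes that the length argument describes.

```swift
import Foundation

var timer: Float = 0
var lowest: Float = 0
var highest: Float = 0

// Simulate 600 frames of the update in draw(in:).
for _ in 0..<600 {
  timer += 0.05
  let currentTime = sin(timer)
  lowest = min(lowest, currentTime)
  highest = max(highest, currentTime)
}

// The extremes never leave the -1...1 range.
print(lowest, highest)

// The number of bytes setVertexBytes copies for a single Float.
print(MemoryLayout<Float>.stride)
```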

In Shaders.metal, replace the vertex_main function with:

vertex float4 vertex_main(const VertexIn vertexIn [[ stage_in ]],
                          constant float &timer [[ buffer(1) ]]) {
  float4 position = vertexIn.position;
  position.y += timer;
  return position;
}

Here, the function receives the timer as a float in buffer 1. You add the timer value to the y position and return the new position from the function.

Build and run the app, and you now have an animated cube!

What an incredible journey. You learned how pipelines work and you even added a little animation. In the next chapter, you’ll move to another dimension and render models in 3D rather than 2D.

Challenge

Using the train.obj model in resources ▸ Models, replace the cube with this train. When importing the model, make sure Create Groups is selected. Refer back to the previous chapter for the vertex descriptor code.

Once you’ve done this, animate the train from side-to-side, instead of up and down.

Finally, color your train blue.

The challenge sample code project for this chapter contains both macOS and iOS targets. Take a look at this project to see how it’s constructed.

Pipeline-macOS contains files for only the macOS target; Pipeline-iOS contains files for only the iOS target.

Pipeline contains files common to both targets. The Metal and shader code will all go in this folder. To select which target to run, choose the scheme at the top left of the Xcode window.

Note: Most of the sample code from now on will contain both macOS and iOS targets. Be sure when you create new files that you add the new file to both targets. You choose the target for a file in the File inspector.

If you have any difficulties, the full code is in the project challenge directory for this chapter.

© 2025 Kodeco Inc.