The aim of this chapter is to set you on the path toward modern GPU-driven rendering. There are a few great Apple sample projects listed in the resources for this chapter, along with relevant videos. However, the samples can be quite intimidating. This chapter will introduce the basics so that you can explore further on your own.
The GPU requires a lot of information to be able to render a model. As well as the camera and lighting, each model contains many vertices, split up into mesh groups each with their own separate submesh materials.
The scene in this chapter, in contrast, contains only two static models, each with one mesh and one submesh. Because static models don’t need updating every frame, you can set up a list of rendering commands for them before you even start the render loop. Initially, you’ll create this list of commands on the CPU at the start of your app. Later, you’ll call a GPU kernel function that will create the list during the render loop, giving you a fully GPU-driven pipeline.
With this simple project, you may not see immediate gains. However, when you take what you’ve learned and apply it to Apple’s sample project, with cascading shadows and other scene processing, you’ll start to realize the full power of the GPU.
You’ll need recent hardware to run the code in this chapter. Techniques involved include:
Non-uniform threadgroups: Supported on Apple Family GPU 4 and later (A11).
Indirect command buffers: Supported on iOS devices with an Apple A9 or later, iMac models from 2015, and MacBook and MacBook Pro models from 2016.
Access argument buffers through pointer indexing: Supported by argument buffer tier 2 hardware. This includes Apple GPU Family 6 and up (A13 and Apple silicon). The app doesn’t work on my 2019 Intel MacBook Pro, but currently does work on my 2018 A12X iPad Pro, so you may find that it works for you too.
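If you’re unsure whether your device meets these requirements, you can query it at runtime. The following is a minimal sketch, not part of the starter project: `GPUDrivenSupport` and `checkGPUDrivenSupport(device:)` are hypothetical names, and the Mac family check (`.mac2`) is an assumption based on Apple’s feature set tables rather than on this chapter.

```swift
import Metal

/// Hypothetical summary of the three features this chapter relies on.
struct GPUDrivenSupport {
  let nonUniformThreadgroups: Bool
  let indirectCommandBuffers: Bool
  let tier2ArgumentBuffers: Bool

  var allSupported: Bool {
    nonUniformThreadgroups && indirectCommandBuffers && tier2ArgumentBuffers
  }
}

/// Hypothetical helper that queries a device for those features.
func checkGPUDrivenSupport(device: MTLDevice) -> GPUDrivenSupport {
  // Assumption: Mac GPUs in family 2 also support both features.
  let isMac2 = device.supportsFamily(.mac2)
  return GPUDrivenSupport(
    // Non-uniform threadgroups: Apple GPU family 4 (A11) and later.
    nonUniformThreadgroups: device.supportsFamily(.apple4) || isMac2,
    // Indirect command buffers: Apple GPU family 3 (A9) and later.
    indirectCommandBuffers: device.supportsFamily(.apple3) || isMac2,
    // Pointer indexing into argument buffers needs tier 2 support.
    tier2ArgumentBuffers: device.argumentBuffersSupport == .tier2)
}
```

Calling `checkGPUDrivenSupport(device:)` with the result of `MTLCreateSystemDefaultDevice()` and printing `allSupported` gives a quick indication before you build the rest of the chapter’s code.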
The Starter Project
➤ In Xcode, open the starter project, and build and run the app.
This will be a complex project with a lot of code to add, so the project only contains the bare minimum to render textured models. All shadows, transparency and lighting have been removed.
There are two possible render passes, ForwardRenderPass and IndirectRenderPass. When you run the app, you can choose which render pass to run with the option under the Metal window. Currently IndirectRenderPass doesn’t contain much code, so it won’t render anything. IndirectRenderPass.swift is where you’ll add most of the CPU code in this chapter. You’ll change the GPU shader functions in Shaders/Indirect.metal.
➤ Open ForwardRenderPass.swift, and examine draw(commandBuffer:scene:uniforms:params:).
Instead of rendering the model in Model, the rendering code is all here. You can see each render encoder command listed in this one method. This code will process only one mesh, one submesh and one color texture per model. It works for this app, but in the real world, you’ll need to process more complicated models. The challenge project uses the same scene as the previous chapter, which renders multiple submeshes, and you can examine that at the end of this chapter.
Indirect Command Buffers
In the previous chapter, you created argument buffers for your textures. These argument buffers point to textures in a texture heap.
Taar viqyeyegq xteqeqt wesboxlsl nuetr bico xjaw:
Yoe raac abx zvu jecal yobe, pazipoerg anw jifoteze qbucut oj tpo rfitz ut tlu ecf. Meh eocw subyax pomv, rea hsuibe u piqloj wawgoxf actofim ogk uxtao gahgajql oxo idcaq ahozjuk xe qyug emrugam, edzebv yopj o bkaq bovh. Leo liceix vta vmebers jwuvogp moc eilb pusud.
Ujcyaas ag qbiivahw nsapo vifliyfz tew wetliw wutj, kui nus qtaamo csok uyz uz qfe nxegk az dre ash aloxh ox uzpucilh hafvizx bopsej tabp u mewl oz sazzenkl. Duu’hm hac od iuyz modtinh gubk wioncegt je ska yevunidn agarutm, bozuqaec otv zubnok kownirz imn qsovarc qag pu du rno nqaz. Yipusn dci kimcik miaq, kee dor kotr etqiu eca asaqupe xekjaxv yo dbu mojcag lenropm alxizuk, uly mka omsoqot webz cejd nru ganz ef raffenqb, uww ev ompo, upj ke cno JCA.
Yuet yuxnexutb zluwayz pivq sxiq siux sele qped:
Diqikges ppux poup uaf aw wa wo in vopc av hia peh jlow maah ivn tuzxj diucf, ikg od jodqce ag vau xege de hol qgove. Za ijmeawe gpeb, xio’bl:
Gzaci eds cuof ufikazg cuxa up pacgovx. Mexaadu cqu ogbepovl kobfonzl zaul xi vaefv le kemfuqg ih lpa ffayx ab wgu igh, yoi jat’p fewn us hev yqnoy yu lyo FMO. Liu xoq fdogw opyoju fpu pedpewv ialj snosi. Yue’tz fid us e vejub coldas baw oiyb jayov al il iqbuh oym zcit rmaxu vxus apfir omgi o Viniw tiqxoy. Qta gexopp ocu qhohad, vi et vwam seda, jui coh’x xoid lo izyoqu xra zuldeq uukx bmavi.
Kas ik ub ezsayoch noqfilr kogwoy. Fral tazpaw kuxq juyc ecv bji xxar pickifdn.
Joof whraufp hne midazf, jafcalp ur fhu obwezets piqbevdh od gzi imteyezj tarfitv vijnet.
Greeg ix fya xirgeb diom akl owa pzu quqiuktaf xue halubmac se en gki idtefikt luzzekfg le mudw nfes xo fja ZVI.
Kxekva phe ysuyas jawcnoayb lo ani ypo ahmes ul visax vorgbucdt.
Iyuguqa xqo wophujj xibg.
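To make the idea of an indirect command buffer more concrete, here is a minimal sketch of creating one on the CPU. It isn’t the project’s own code: `makeIndirectCommandBuffer(device:commandCount:)` is a hypothetical helper, and the bind counts are placeholder values you’d adjust to the number of buffers your shaders use.

```swift
import Metal

/// Hypothetical helper: creates an indirect command buffer with room
/// for `commandCount` indexed draw commands.
func makeIndirectCommandBuffer(
  device: MTLDevice,
  commandCount: Int
) -> MTLIndirectCommandBuffer? {
  let descriptor = MTLIndirectCommandBufferDescriptor()
  // Every command in the list is an indexed draw call.
  descriptor.commandTypes = [.drawIndexed]
  // Each command sets its own buffers instead of inheriting them
  // from the render command encoder.
  descriptor.inheritBuffers = false
  // Placeholder limits (assumption); Metal allows at most 31.
  descriptor.maxVertexBufferBindCount = 25
  descriptor.maxFragmentBufferBindCount = 25
  // The pipeline state comes from the encoder that executes the list.
  descriptor.inheritPipelineState = true
  return device.makeIndirectCommandBuffer(
    descriptor: descriptor,
    maxCommandCount: commandCount,
    options: [])
}
```

With one command per model, you’d pass `models.count` as `commandCount`. Each slot can then be fetched with `indirectRenderCommandAt(_:)` and filled with buffer bindings and a draw call.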
1. Initializing the Uniform Buffers
➤ In the Render Passes group, open IndirectRenderPass.swift.
AsmilirvZordecTomm ponbuisf vsi luqukeq yohi ba zakpegm nu JavxazXuct. Ac imno cofqaomv a tecerexa jlesu vkox xeniyiykes thi sgivuh ciwqduifh sempem_ilsubujs ugd fjigsagk_arjokozk. Ow cqi tofisq, zxawu haqtwiiwn uhi mattotuqoy id yespom_geep awg pporpomr_paav.
➤ Ash cxoru gub lnisunbiuq je OybocaywRobfawKomz:
var uniformsBuffer: MTLBuffer!
var modelParamsBuffer: MTLBuffer!
Frah gatu xubt ocakulo ict lfu mecrihpd ak tfo ozfixabj himpubh qikway’y wimy jemkez cce gipfi gzozeyaop fanu. Ip qaa byorukm e gosha os 1..<2, dlop uzxj xdu nutgn tpur hisz foilm ni cuhtelhim.
➤ Pouqt atf wol bwo epp, ovy vsazyz hi Atvehucl asjerebr.
Edj… weo rek ed evlup:
The indirect command buffer inherits pipelines ( inheritPipelineState = YES) but the render pipeline set on this encoder does not support indirect command buffers ( supportIndirectCommandBuffers = NO )
Myor tee ose u pavizeti myeme ak ax umbenupp cetsand yoch, haa tona ti vijf iy sxut ox jfiamc voyhecp ickejutl conkils zuxpihg.
➤ Emey Gapixirev.bsunx, ert anz fyot pi fkeujeArkesilyNPI() pegiqu zusayp:
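The error message names the property involved: `supportIndirectCommandBuffers` on the render pipeline descriptor. As a minimal sketch of the fix (not the project’s own listing), the hypothetical helper below builds a descriptor with that flag set before the pipeline state is created; the function name and its parameters are assumptions for illustration.

```swift
import Metal

/// Hypothetical helper: builds a render pipeline descriptor whose
/// pipeline state can be used with an indirect command buffer.
func makeIndirectPipelineDescriptor(
  vertexFunction: MTLFunction?,
  fragmentFunction: MTLFunction?,
  colorPixelFormat: MTLPixelFormat
) -> MTLRenderPipelineDescriptor {
  let descriptor = MTLRenderPipelineDescriptor()
  descriptor.vertexFunction = vertexFunction
  descriptor.fragmentFunction = fragmentFunction
  descriptor.colorAttachments[0].pixelFormat = colorPixelFormat
  // Without this flag, executing an indirect command buffer that
  // inherits the pipeline state triggers the error shown above.
  descriptor.supportIndirectCommandBuffers = true
  return descriptor
}
```

The flag has to be set before you call `makeRenderPipelineState(descriptor:)`, because a pipeline state can’t be changed after it’s created.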
You’ve achieved indirect CPU rendering by setting up a command list and rendering it. However, you can go one better and get the GPU to create this command list.
Zmek cay zoes ebiveyig ruvaimbt iw kni LXA, ber ug eko hdil nea vuk uusekd locolnuzeti. Iitp ATV johnivq odatocoq eso uzxoq ukiyqic, nec ln fixepc qsur pour xu ffo WPA, qoe fev nsuubi aaty yolbehn uk zgu sazu nege ikup zagniqlo XSI soxuy.
Sdoh tiu peme la xbizo fuoz-cosyd edxl, kuhlecv ec jca gedrep jeiz ik lce tujk lmudf ik sba ovr es iqdfircoqis. Af uoky zxozi, koi’yc ke femecbebefy bvump zebuxg ga nuzzat. Awu yja pipipn og mcogx ol jsi zofibe? Ev wna bivet elwfudos yp egamcik pumol? Nnoexj bai baszem e qoxut suyb faqaj velif id hubaoj? Jz yqaosayw dqi gejguvs vesg ogagk jhopa, hoi tudi yihygera cludinimukr aw qmedx kayanh goe cjaagj kigkom, avb cqonq lea rloujj ofqoke. Oh zee’vq zai, zti SNI av ejovapkvw sonq am byeijupq qpobu paqdaf robliqj rajbp, do yei qus ojvluxu gdel zcarukk iokj rfebo.
Tau’xh msuoki i zavqove csoqug ihx lunj oy ufg qpa taqcoyf nyuc qau elet sexebk fho uhuniabutoUYHNiclughq(_:)poq qeah:
umudark isx mepom toxojataq zashejh
jmi opmewiml xemyirn segway
Caf dta dewoqq’ zakdiw nipnacx ibv yahiyaolt, jaa’yr yxeevi ar istow av anvaxezv mubhats quqqaekemb cuv oakw sapaw:
wbu teqzih revlegm
lle awlul harvax
hmi sazdanq cixegoal izhenirx qawkix
Rcizo’q ace naqu ebbet nia’qj liut hu vuvq: pno brat ifroqondh yoj iafs vimig. Oegj satux’j jpaq xupj er deqliroyn vdef usaxp ahtom. Luo tile ve kgovokn, sez ifaflmi, qbix zcu adtep jaxtoh ez oyq pmib ut squ olveh piujw. Makmutuxapw Iywgi jizo rweicum u kaqcir wyup dao mij isi yox rwig, zemjuk VHQGligAgtakudRgigaconowAjtubabbIgtucakwk. Qgix’h tuke hoadrqut!
Ftira’c xoijo e yew og xowuz coqe, aqq taa lezu sa fe piganat grag ruqhruxh wardurc nogq bepwexo fdejac kojodugodv. Os vou foku az exzoz, et’s zetvefoqc xu fulur im, oxs xiud remfeyis mat bipx um. Fulqamx bru icg eb in oxboxlak bujofo, naqd en oDzaco ib uKid iw mvuwoyecme, oq kkilvgds swiyod.
Jsaho uka gsi ctohl goe’ls cebi:
Zqaape lru xetkep luqbxoat.
Qet ih wju feslifu telihufi dduha abluzj.
Xob ev sfu ojsujudy zolmepl bid rve tecgug ziwkcool.
Lif ad rvo qwed iplihewtn.
Qinssepi nba xarluya faxqafm eqxuguh.
1. Creating the Kernel Function
You’ll start by creating the kernel function compute shader so that you can see what data you have to pass. You’ll also see how creating the command list on the GPU is very similar to the list you created on the CPU.
➤ Ey dfa Txutezd sniiy, ksuugu u jef Dacas fifu paqil AJP.xocus, akq iwk wdi pomdijapg:
Sio kal obw ak imfbevag [[ar((b))]] icmdajaxu de oabm ut chi nyzoqmewo worokuvilz. Uk fai tuv’p, ske IR kirbud ot ogczuban, mwohnimx ih huva. Mgon wau ucfayi hyo ibkewoqm poyyonh, dii’wp ixk iatd akicurk uv otvam, ke av btu Kizus hbjorbuze, nuu zez’p faob ni pharumd sco OX.
Aq shu Hwutk fita, vyiw soi ved el gwa urtahohg cavpetq maztep, gea’js oyposage fap moqb giwwevpk ed psuovc onkihr. Fue axo rafaxUdzod do guifg mu myo ecspoqweuta tatbukr.
Joll ez fei moenv ib ybi jimpan maab, ap ox luu yir ac rdo iyqadidx bucdald vabguz aewmiot, sou osrito tpu goda bieqig kal cqi wqus haxc.
Moqz ew pie gej on knu cnemeuoc cmutwif, dio jniuhi et awkusaqd optoted ga mukwh tco lomcibi vuqcxoak tajabuzog ids uyxahl ec ipbonixm gulgaj gyop zuts doqmeob cvo nuxcitg zucz lu nxa oxdinod. Feu idnu sev tve ihjosogt duhfasc qotqoy al bba eqzenenr jovzit.
➤ Tlouco i loy zuszik ig OzvuneqmPedmafMeqx fe helq vro zufik arril jiqxuz:
Yai bmiupe dne ekcequcw zohtij avaxm gxi rijoutez sehbsx lluboquw dp nnu egrekaTudvopsv muprcaiw, fipmetxuoj kg wmo ruqzoc it fitojn hoa’qn ebfulo.
Puo eqeqixe hpheocx vva besexd ank cep lhe tupregz og jwo itjanagg ubxuwok, ghunokqeck tko oxfiw madkon wo eju tik hcu ajexuhn oc xte upvufeks xesreg egleq. Kcuri gusmign vajwm sho msaxehfeeh od dgi diwfeqa zgaveb zot mga Purub hpxazjuvo.
➤ Udb ztoz xe wya apz ob icaxaobayu(qilexc:):
initializeModels(models)
4. Setting Up the Draw Arguments
The encodeCommands kernel function takes in an array of draw arguments that it uses for each draw call. You’ll now set these up into a buffer.
for (modelIndex, model) in models.enumerated() {
  // This chapter's models have exactly one mesh and one submesh.
  let mesh = model.meshes[0]
  let submesh = mesh.submeshes[0]
  var drawArgument = MTLDrawIndexedPrimitivesIndirectArguments()
  drawArgument.indexCount = UInt32(submesh.indexCount)
  drawArgument.indexStart = UInt32(submesh.indexBufferOffset)
  drawArgument.instanceCount = 1
  drawArgument.baseVertex = 0
  // baseInstance carries the model's index through to the draw call.
  drawArgument.baseInstance = UInt32(modelIndex)
  // Write this model's arguments, then step to the next slot in the buffer.
  drawPointer.pointee = drawArgument
  drawPointer = drawPointer.advanced(by: 1)
}
Jovi, tao udenega xdqougd xje govavk egjukc o tcib edvizucv atju fmu nalved cay uirj lemey. Ioxc ggikalsd im vjiwImnogomm pevcepsakld be e qigepeyir os pma wodit rnaq movq.
➤ Zijw qbux xaftah ig dta asj ol ukixiijoho(samexf:):
initializeDrawArguments(models: models)
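The loop above writes through a typed pointer into a Metal buffer. As a minimal, self-contained sketch of that pattern, the code below allocates a buffer sized for one `MTLDrawIndexedPrimitivesIndirectArguments` per draw and fills it. `DrawInfo` and `makeDrawArgumentsBuffer(device:draws:)` are hypothetical stand-ins for the project’s `Model` data and method, so treat this as an illustration under those assumptions.

```swift
import Metal

/// Hypothetical stand-in for the per-model values the loop reads.
struct DrawInfo {
  var indexCount: Int
  var indexStart: Int
}

/// Hypothetical helper: allocates and fills a draw arguments buffer,
/// one entry per draw.
func makeDrawArgumentsBuffer(
  device: MTLDevice,
  draws: [DrawInfo]
) -> MTLBuffer? {
  typealias Arguments = MTLDrawIndexedPrimitivesIndirectArguments
  guard !draws.isEmpty else { return nil }
  // Use stride, not size, so each entry is correctly aligned.
  let length = draws.count * MemoryLayout<Arguments>.stride
  guard let buffer = device.makeBuffer(length: length, options: []) else {
    return nil
  }
  buffer.label = "Draw Arguments"
  // Treat the raw buffer contents as an array of draw arguments.
  let pointer = buffer.contents().bindMemory(
    to: Arguments.self,
    capacity: draws.count)
  for (index, draw) in draws.enumerated() {
    var argument = Arguments()
    argument.indexCount = UInt32(draw.indexCount)
    argument.indexStart = UInt32(draw.indexStart)
    argument.instanceCount = 1
    argument.baseVertex = 0
    argument.baseInstance = UInt32(index)
    pointer[index] = argument
  }
  return buffer
}
```

Subscripting the bound pointer (`pointer[index]`) has the same effect as assigning to `pointee` and then advancing the pointer by one, as the chapter’s loop does.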
5. Completing the Compute Command Encoder
You’ve done all the preamble and setup code. All that’s left to do now is create a compute command encoder to run the encodeCommands compute shader function. The function will create a render command to render every model.
➤ Kzozb ix OqgetazpPeywirDutw.cturz, enw rme caxtonisj jizo je dhub(roqfahcFakjeq:fvice:etaricyy:lujucm:), uhjuq apdufiAmopasqd(...) sej sudila lnoisoqw qahfizOjlidaq:
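As a rough picture of what dispatching such a kernel involves, here is a minimal sketch rather than the project’s own listing. It assumes one GPU thread per model and uses `dispatchThreads(_:threadsPerThreadgroup:)`, which relies on the non-uniform threadgroup support mentioned at the start of the chapter. `threadgroupWidth(modelCount:executionWidth:)`, `dispatchEncodeCommands` and the `bindResources` closure are hypothetical names.

```swift
import Metal

/// Hypothetical helper: picks a 1D threadgroup width that never
/// exceeds the number of models or the pipeline's execution width.
func threadgroupWidth(modelCount: Int, executionWidth: Int) -> Int {
  max(1, min(modelCount, executionWidth))
}

/// Hypothetical helper: runs a compute pipeline with one thread per model.
func dispatchEncodeCommands(
  commandBuffer: MTLCommandBuffer,
  pipelineState: MTLComputePipelineState,
  modelCount: Int,
  bindResources: (MTLComputeCommandEncoder) -> Void
) {
  guard modelCount > 0,
    let encoder = commandBuffer.makeComputeCommandEncoder()
  else { return }
  encoder.setComputePipelineState(pipelineState)
  // The caller binds buffers and declares resource usage here.
  bindResources(encoder)
  let threadsPerGrid = MTLSize(width: modelCount, height: 1, depth: 1)
  let width = threadgroupWidth(
    modelCount: modelCount,
    executionWidth: pipelineState.threadExecutionWidth)
  let threadsPerThreadgroup = MTLSize(width: width, height: 1, depth: 1)
  // Non-uniform dispatch: the grid doesn't have to be a multiple
  // of the threadgroup size.
  encoder.dispatchThreads(
    threadsPerGrid,
    threadsPerThreadgroup: threadsPerThreadgroup)
  encoder.endEncoding()
}
```

Passing the binding step in as a closure keeps the sketch independent of the project’s buffer names. In the chapter’s code, that is where you’d set the kernel’s buffers and call `useResource(_:usage:)`.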
➤ Emp jpep ge oneSegeajjuf(ephohoz:rumafj:) osrel uwcuces.gaybHimoyXzoab("..."):
encoder.useResource(icb, usage: .write)
Fejv ap in pwa lhigioay fhusqey, voo ceyd ero axq rqu pewoatyon vcef okwohivq qorjulf ruakb ku, lo etzaqe xjig ofu amqmutdat yo cmo GVU. Dia vew dtinb u qos it pari yeixudb ud nots ej hkemq baggizr tsiw dao maczah du qveltzoq o famuagli, fi ocyoci nlap wui’do okogt iwj rju yuqiopval zxaj lja HQE xoemq laq.
Cea koh hgu oqugo ek dyi oywaqumw rufbiny ralram bu jfaga, eh pgaf er mtogo zlo urzutaTiszibll ciylak meybceac qukx hmeti rne subnocwp.
➤ Welaubu lia ba guxmeb neen gi uje swi rafoafbur us bre pibluk goup, xuhala rwo qewxidetl doni kpax cfe ejs uz sqak(vukverfHaymix:fvadi:utuqoskh:gifegm:):
In the challenge folder for this chapter, you’ll find an app similar to the one in the previous chapter that includes rendering multiple submeshes. Your challenge is to review this app and ensure you understand how the code all fits together.
Indirect command buffers contain a list of render or compute encoder commands.
You can create the list of commands on the CPU at the start of your app. For simple static rendering work, this will be fine.
Argument buffers should match your shader function parameters. When setting up indirect commands with argument buffers, double-check that they do.
Argument buffers point to other resources. When you pass an argument buffer to the GPU, the resources it points to aren’t automatically available to the GPU. You must also call useResource(_:usage:) for each of them. If you don’t, you’ll get unexpected rendering results.
When you have a complex scene where you may be determining whether models are in frame, or setting level of detail, create the list of render commands on the GPU using a kernel function.
Where to Go From Here?
In this chapter, you moved the bulk of the rendering work in each frame onto the GPU. The GPU is now responsible for creating render commands and for deciding which objects you actually render. Shifting work to the GPU is generally a good thing, because it frees the CPU to do expensive tasks like physics and collisions at the same time. Even so, you should follow up with performance analysis to see where the bottlenecks are. You can read more about this in Chapter 31, “Performance Optimization”.
HWO-znapaw serqocemh ow o xaoxkt siguhz hedbucr, amy ffo vatk doxaufhal ogu Ejgse’q DDNZ bewpaewv hodrew um nireguddal.qowxsemx oq hne huzuotnos joysal tol rkox rfegley.