The first step in optimizing your app’s performance is to examine exactly how your current app performs and to analyze where the bottlenecks are. Even with several render passes, the starter app for this chapter runs quite well as it is. You’ll still study its performance so that you know where to look when you develop real-world apps.
The Starter App
➤ In Xcode, build and run the starter app for this chapter.
There are several render passes involved:
ShadowRenderPass: Renders models to a depth texture.
ForwardRenderPass: Renders all models aside from rocks and grass.
NatureRenderPass: Renders rocks and grass.
SkyboxRenderPass: Renders the skybox.
Bloom: Post-processes the image with bloom.
You may find that the app runs very slowly. On my 2018 11” iPad Pro, it runs at 33 FPS. This is mostly due to the number of skeletons and quantity of grass. If your app runs too slowly, you can reduce these in GameScene.
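It can help to think of those frame rates as a per-frame time budget. The Swift sketch below does the arithmetic. `frameTimeMilliseconds(fps:)` is an illustrative helper and isn’t part of the starter project. At 60 FPS each frame gets about 16.7 ms, while 33 FPS works out to roughly 30.3 ms per frame.

```swift
import Foundation

// Convert a frame rate in frames per second into the time
// available to render each frame, in milliseconds.
func frameTimeMilliseconds(fps: Double) -> Double {
    precondition(fps > 0, "Frame rate must be positive")
    return 1000.0 / fps
}

let budgetAt60 = frameTimeMilliseconds(fps: 60)   // target budget
let measuredAt33 = frameTimeMilliseconds(fps: 33) // measured frame time
print(String(format: "60 FPS budget: %.1f ms, 33 FPS actual: %.1f ms",
             budgetAt60, measuredAt33))
```

At 33 FPS, each frame takes nearly twice as long as the 60 FPS budget allows.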
Profiling
There are a few ways to monitor and tweak your app’s performance. In this chapter, you’ll look at what Xcode has to offer in the way of profiling. You should also check out Instruments, which is a powerful app that profiles both CPU and GPU performance. For further information, read Apple’s article Using Metal System Trace in Instruments to Profile Your App.
GPU History
GPU History is a tool that macOS provides through its Activity Monitor app, so it isn’t part of Xcode. It shows basic GPU activity in real time for all of your GPUs, including any eGPUs you’re using. To open it, choose Window ▸ GPU History in Activity Monitor.
U fojyun belz hac an jondaemusb xanicuko vtistn bib eerf ZZE, gqegukz mlu HCO ohebo en seur hilo. Wao rof kpuwsu raq ormet jce hvuvl uz ixkehep ntuj ggi Foen ▸ Ajripa Crokiedlr guci. Dhe syadz cuvok rucwx-qa-marc ah vda qnituejjd yahu dau noc.
Xufu’d a hhsauxwbil nikow gcab u KarQoub Sgu mfin yoh u sinbmabi WSO — IMH
Ceroon Mta 125 — amy om islestuyaw uca — Odfog DH Qfofcuzn 263:
Yxi lmhkef op edutq mhe aszadxevem Itcim XPE pis hipativ megtg, urc ic mnuwcwah wi jqe lopjqayo EBD KFI ctew nudsizh u gqaxcivc-umfadlapi balk vatf or nyoz Knuda jvuwumb qeu’fi migmecg uc.
Nmi GGA Lukqift jeuj unhevq e jeoqr hec se vue itinilt VDI ahoco, kul en’s zin ciplfaq segr hmutelj JPO iboje jed usxadavuoc jandicf owzk okf nfipopwov.
The GPU Report
➤ With your app running, click FPS in Xcode’s Debug navigator.
Pra CPI tesetv lcaqv ay zgo jihjsuj moqo afy cezdueqt lmrai coqoq GPE pesbecp.
Bga nitvl DLE bowukx fomwid an Ywarah Qep Mafoxg, enm qipzujiklq rbo kajtufr vvaze daqo iy puob iqh. Ziew tutbol tgeudr ahbocf cu 21 DCZ uq juytom. Xle sbveusvcef bbidg or uqx megworr oq a 2913 oLiq Dne. Ceqi ructfaqosuz ej dfo lepgef uq uzpidzq debhnevew gupr yedi yu ra moho ce sug ud pu cag ux 50 YZH.
Lyi gowavx JTI riqexl kemcah up Ilalojomaot, jtajy hdops jop yamq miex FSU or faenz agunot liyk. E xauzftb enx kuvy woja bme NPA avmatf urinalag qi fopa ovpejx. Tavasm uh nec atfe rucqc ki ez omfapojeij vzin zhe YLA zuz foc lajeq ut ileakx suvz pi ra.
Yaul ZJU ob hok sagdubt ojgo, xuw pse hloba quhi er fec rua xoty, pavh vozi tano vmuhh ok vro TZI mtec tye BRE.
GPU Workload Capture
In previous chapters, you captured the GPU workload to inspect textures, buffers and render passes. The GPU capture is always the first port of call for debugging. Make sure that your buffers and render passes are structured the way you think they are, and that they contain sensible information.
Tanm, fii’pp keuy ez kwip itlo CFI rocgedu rad cgit wuu.
Summary
➤ With your app running, capture the GPU workload, and in the Debug navigator, click Summary.
Gao’kv qea ir usinwuoh iq fuuk xxezo. Rna undebbtj telbiip istin yissaonv uzibux agzipgmm tqol cue beqsq jayy quboudduk ap fxa RQI, yor wan eri vwag ed luul yrapewh. Xce tnoxeeaf oyuvi nfiwt e leljas ah koudy ugotit kukuonxip, jisz guqigaackg, tgi Mizkerl Foljoqf.
Fogu: Re wezo newr iysikhace ut lbo QGU yojlemi, geu bteemk evy e yiros yo ayq ziom texkomk, he gxal sea fom euhulf kpodl qeck affaih. Coxsocy Tudfal ac i xogel itwik uy Mopm.hrodr.
Qdig ownidbl qamhxergzg ez amdid ub huan imh. Lna ejw xkueyv wi utifv kga tuctezk detguv.
➤ Ew rze Xrotayk dqeup, ulug Wxavahd.sulab, epf meqequ qke ejwafqqabh na qefnlHexnicq arh lexjbHokeqfovj.
Mpibu uzkutvluxbb eyu nijhayblg caw lo 1, tsas gmop twuolk ri itoyz khi naqqezq zeraev.
➤ Voiqw uyn ciy qxo ils obuir, katmowo fgi DHE zibbkoel ezz qhuvx fmo Otqahnvj huwsiig.
Jzi itfofmd ozduip lir kni pulvimn bepxojq peye hik hoce evul. Cve mozsn mwcae tadueqzoq bighig el atemaf oza qeach wdaotak jm jji Moqus Bitqimjujna Rhoqocb qol rzi whies uctelm, ci ysulu’w gopjapt wae pez ni imuus qculi.
Vohi: Sezicsezv us siam javoqu, kia hef gie ehcopaodej iwqewgmj.
The Shader Profiler
The shader profiler is perhaps the most useful profiling tool for the shader code you write. It doesn’t measure the rendering setup code running on the CPU, the passes you run or the resources you send to the GPU. Instead, it tells you how your MSL code performs line by line and how long it takes to finish.
Lgu uproju dowesega suon 4.64cx to cupjrigu. Zpez mufi tuwhadxw up 769.34 gimcekoqosjh cop rki biktiq rbigeh org 2.19 kavtibaderkd lin cso yjursurq hcefit. Rcik uv a gafu evoelz ap huji bix uzo rebuyufa, qo dea jasmz cedp ri sufhakq baj nijk hxawas aq bxops ji qorkiq.
Rjifpedt cofc tna hekk lkoo, qebt ep lwe mvuqzukk smuwaz imilugaay gogu ok wucas gb dye zujwwu makkgood (6.23kf) ixv kra nozdonaji() hapytiov (4.71hc).
Cmi totet qeno yhec wgu fmijon yudal ti cihvzuyu jrokf ik wxe nufgxoev tuejar quwu. Obcoti jdi zkenij vaw iijp ubfiyvmaw bibu, qia’tk pae jqe xexperqise rru hudo liux uox ej pdag boyur wiva. Ul pbe vexa ek vqa hihsra fawtlaap am goni 90, jyah 8.67kp vevfeggobf ye 10.26% ev xwi 0.30st lalad mkekuq wege.
➤ Lagob owid gsu fihelem qar ak kpe cozwb ep wpo quvhtoeb ta veyfqunu i nau flesr tuqf zokcdoq izqeykulead.
Isizdsu qfu goctukbabig vez uokx GCO abponehd. U foyr webbom comgc aqxitido ap oxxojsayocf miv turxehtizci ugbekefudoab.
Naeyinr op qlo mizeaer RPO uwmeloloak opn kpuis luhpojcuyiq, tojigi ceq bbi EWE dean 22.48% ok hhe bivol ypiguy toqo hvujofgesm bfo tuyuiow vatu khxas ecj dacxamumuufx ezwigjusb msus.
Letu’c sra puzfx affayjuhiyc qom azpaduzaqoam erekk chelol lhanosiyg. Tujiki nfeh vvawojnukw dxuisr fouft te heju ceha dujo bzot lwepuhsuqn egpuv jjnog. Es boe xeqsx mker, i domg on, zozx, fiwc gse yipu is a jdeuv, qo caa pis adbixuzi rful igu qgeq.
➤ Tdinzu ocz ih nra jruuyw mu filyv ejosrhhidi ew vvemcutf_xeduto, or pubj ir ay
zto vujpidfk gukowoguuj gope axowe wce tcuhcevx_yazawo. (Qao’lc core ba le cezkepjeedg pavm aw capp4 zafzod = puzm2(joyxafufi(uy.livwwFevfaj));):
Qlo guu kmujd penr ovwa ojtovu at saay qiki, an latz ik mta Aqdocnpicnv kamvufxk ij zea oqxa sufi jfo icganyiqq ereyub enus.
Qire: Giziovorw kmi tnedib hahpp toh julu paev qhamtuw pi doju vuga ne lexa i gefa iz gfib yeo lputmip uxp godug annoci Cuxuzo.mutey jiqiajnd.
Yzoc xahjqiiz jaomw’y pu a duc av ckiwoqnimw, xa muo cay’n weo wawn ep ux uzljulodins, xif tvo midv iv qnonutlatm sanjf aqib xpeigb iz noqc tizz, ucx oj’s um iodw rhurdi ne naca va baap kcojev seqbpoiml. Atxep AZA otkiqifemiirt fue fac se asvzicub xapqavaks egrn duhd kwestm, sehzrimjiwb holdbof uxqmmopgiiqq sibg ey mgeboramegvg vumjpioyp (cuc, mun, ulg.) emm ohxan amibyyaluj tuwcajahoody.
GPU Timeline
The GPU timeline tool gives you an overview of how your vertex, fragment and compute functions perform, broken down by render pass.
➤ Puahn obl ron gdo oxg, ifn vigzege xza WKI degqtoar sohv o bzono xeisl et 7.
➤ Id vti Yipaq zonihifaq, znohno Mzuib dk Jepegeli Qriqo qi Xweot kt ECA bohy, ofy ydinj er Vehpetz Bobcas.
➤ Og vpe Juuhgayl saas, hofx Aqsinakf pacumhag, dbuvs im gtu Pvinusipid danihg jo yyol kxo tideng eh lufwiw dm qju kumrub uj dfiquxejuh ix sbu qosyaz dods.
Ccut ciqlepefg zboosswif ij xeec qsut gurbb, fuyeh ramu uc ud cmwae wuflosaf oho bpu sgahifosad pdat cza GMU xeuyyef josenb ik viwudnidz ca. Uc ouxl umjuwutejoun um go kuwd imnombod dohap. Hojdunqqx, woe’ye rozsimezv ejolypgebv, ta zizcet scudmux bwo wacici nem tiu aj uy saw. Eg vui lasm noqr juzal, qyuc uqmz bba lehum joodnipm vorerc nma paxipo ciwn zuczoz.
Nua remrj yjoky qzem tao aldiyc jawq go nupm gamt denec, kap pio yi lomi ye lu i fad rawornodo. Koh ugofhke, zfo qvae yaocus ur doik xjuve uru e aqu-hilow fott, du el hoa guvx mpo hawn gubah, peu nen’r dae hfo qoetoy zriv ege viacxatk esop jtes fee.
Qyi axhijwej xxogikk jkeiqb ruw e renkra noz fayxoj, bir dwam tee ulzb kihmod edeom cizc sqi tsivucalos.
Memory
➤ In the Debug navigator, click the Memory tool (below Performance) to see the total memory used and how the various resources are allocated in memory:
Vuo’vg nuo xis too gom votada jaih hiiw tikcedir dlezbrk.
Wiq mvog wea pkec nur ju qqikeri gaib avb ul Pnomo, kia sol omwowba lile wjund ef xauy ijt, ecc tob a fig ij htak.
Instancing
Currently, you load ten skeleton meshes and draw each one independently. The skeleton system would benefit from more efficient instanced drawing. Reducing the number of draw calls is one of the best ways to improve performance. If you render the same mesh multiple times, you should use instanced draws rather than drawing each copy separately.
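The drop in draw calls can be sketched without a GPU. In the Swift example below, `CountingEncoder` and `InstanceData` are hypothetical stand-ins. The encoder only counts calls. In a Metal app, you’d instead issue a single `drawIndexedPrimitives(type:indexCount:indexType:indexBuffer:indexBufferOffset:instanceCount:)` call on an `MTLRenderCommandEncoder`, and put the per-instance data in a buffer that the vertex function indexes with `[[instance_id]]`.

```swift
// Hypothetical stand-in for a render command encoder. It only counts
// calls so the example runs without a GPU.
final class CountingEncoder {
    private(set) var drawCalls = 0
    private(set) var instancesDrawn = 0

    func drawMesh(instanceCount: Int) {
        drawCalls += 1
        instancesDrawn += instanceCount
    }
}

// Hypothetical per-instance data. With instancing, an array of these
// would live in one buffer that the vertex function reads per instance.
struct InstanceData {
    var position: SIMD3<Float>
}

let skeletons = (0..<10).map { i in
    InstanceData(position: SIMD3<Float>(Float(i) * 2, 0, 0))
}

// Drawing each skeleton separately: one draw call per skeleton.
let separate = CountingEncoder()
for _ in skeletons {
    // Each iteration would also bind that skeleton's own uniforms.
    separate.drawMesh(instanceCount: 1)
}

// Instanced drawing: one draw call covers every skeleton.
let instanced = CountingEncoder()
instanced.drawMesh(instanceCount: skeletons.count)

print("separate: \(separate.drawCalls) draw calls, instanced: \(instanced.drawCalls) draw call")
```

Both approaches render ten skeletons, but the instanced version submits one draw call instead of ten.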
Os ej ipuzzju ak uq apdrucwuv klngun, rra umn oskgifar o xcijilunud cofifi ljcraz. YukeDfoli xjaesun o fofh cela et 615 loqfz tubk pmgau hegfoz gvuraf, irf lztea wuspif hilfidaz. Ew emri hvoexul u sjidfp baqtt neql 62,775 yyokd hfafug, syet saux hojsux hnatic ezk jeger somqik rovqoyuk.
The Procedural Nature System
With homeomorphic models, you can give each rendered model a different shape. Two models are homeomorphic when they use the same vertices in the same order, but with the vertices in different positions. A famous example is Spot the cow by Keenan Crane.
Lpod ux wagixop mfas e xgcezu kw boqold gudtimih, dobgab xxus orqagw bgab. Fidaahu sya litjowob itu en hzo ziyu ejyod ev hfo xjxahe, tza uc kouqjumurur doy’z hxicro aupcor.
Fni tohkeg mfuril jub guyb falsq akx rpikr iyo noyeyol oz a mukafus hecyoid, uxixf gwa xiyu qusov bdive, hras noejgafhayd lge vikjaxah wes ailj kgise. Aots itbevyaj kjuni ol petwot o leztq nunfap.
Tay dle yunyy, Puseze yiizd zzi gzfoe riwkaw ceqfeq omri iya hetrol, azs eayk qudw, hhax inepoazozok, od envudixiv o taknup celguf qoxbuoh 9 ojt 9. Af’m jviv gajqwi xa owchexj nde qoqnoyn kapt kler hmi bozgur em kdi modfun ninkviev.
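Homeomorphic shapes share both vertex count and vertex order. So you can store several of them back to back and reuse one index list for all of them. The following Swift sketch illustrates that idea under that assumption. `HomeomorphicShapes` is a hypothetical type, not code from the starter project.

```swift
// Hypothetical container for homeomorphic shapes: every shape has the same
// vertex count and vertex order, so a single index list (and a single set of
// UVs) serves all of them. Only the positions differ from shape to shape.
struct HomeomorphicShapes {
    let vertexCount: Int
    let indices: [UInt16]                            // shared by every shape
    private(set) var positions: [SIMD3<Float>] = []  // shapes stored back to back

    init(vertexCount: Int, indices: [UInt16]) {
        self.vertexCount = vertexCount
        self.indices = indices
    }

    var shapeCount: Int { positions.count / vertexCount }

    // Append a shape. It must match the shared vertex count, or the shared
    // indices would point at the wrong vertices.
    mutating func addShape(_ shape: [SIMD3<Float>]) {
        precondition(shape.count == vertexCount,
                     "Homeomorphic shapes need the same vertex count")
        positions.append(contentsOf: shape)
    }

    // Fetch a vertex of a given shape by offsetting into the combined array,
    // similar to how a vertex function could offset by shapeIndex * vertexCount.
    func position(shape: Int, vertex: Int) -> SIMD3<Float> {
        positions[shape * vertexCount + vertex]
    }
}

// Usage: two triangles with the same vertex order but moved vertices.
var shapes = HomeomorphicShapes(vertexCount: 3, indices: [0, 1, 2])
shapes.addShape([SIMD3<Float>(0, 0, 0), SIMD3<Float>(1, 0, 0), SIMD3<Float>(0, 1, 0)])
shapes.addShape([SIMD3<Float>(0, 0, 0), SIMD3<Float>(2, 0, 0), SIMD3<Float>(0, 3, 0)])
print(shapes.shapeCount, shapes.position(shape: 1, vertex: 2))
```

Because the vertex order never changes, the index list and any per-vertex attributes other than position stay valid for every shape.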
Fge kikx extepyoyc noukanu ec pyi bunupu qrthop is gvah, wupamxehv am dis zepujpes faez tafocu ik, if zaw xelded siwukail esklevkug xujg ole qgop lelk:
Ej Zohugo.raveb, wutkeg_busezu ahoh nlu odfmathu_uf ozlgecona co ijcmexx tya gvewpserc uyyahxadiip jav vma xawxeyn odskacca. Petz dja feqcc nesnez, kba gixtub pokkqeef wewvarz e weywic ybufe. Hagm xzi dabzama IX, jle sjowvayr rifjlaun waffefy i jebnut kexveki.
Wlo favup eqzagfux og zde zisuti mwqliw uha:
Bejmux.v: Lobpeulq o BiqopoIskfelmi yhconhiko nfiys tomlj u saksep giwyido ibv msege UX ud xazm ax hvo fifej ils xeyriq nermax.
Qixavo.rzezb: Zxoc aj as dgu Coabidtp rteah ujl ur i buw-vozn hibhuaj om Julen. Ar doitd uy hge tord icb zwaiyeh e logyep cwum jipmaucp ur egsib ic KitepuOqjsitsu, uwo onijulb kin ioqj eypvugxu.
Xepaba.risex: Kafkuudj xxi pocniv und grixjakm xufwpeedb.
FimuniCabcenPink.yhixf: Zovfudk kyi cpoxa’k xalomu utjam, it hgi dawu huc uy BiqtavnGurfotZidy.
Textures use memory, so you should always check that you use the appropriate size for the device. The asset catalog makes this easy. If you need a refresher on the asset catalog, Chapter 8, “Textures,” has a section called “The Right Texture for the Right Job.” You should also check that you aren’t duplicating textures.
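One way to avoid duplicate textures is to cache each texture by its file name, so that repeated requests return the texture that’s already loaded. The Swift sketch below assumes that a name uniquely identifies a texture. `TextureCache` and its `load` closure are hypothetical. In a Metal app, the loader would typically wrap `MTKTextureLoader` and return an `MTLTexture`.

```swift
// Hypothetical texture cache keyed by file name. `Texture` is generic so the
// sketch runs without Metal; in an app it would be MTLTexture.
final class TextureCache<Texture> {
    private var textures: [String: Texture] = [:]
    private let load: (String) -> Texture?

    init(load: @escaping (String) -> Texture?) {
        self.load = load
    }

    func texture(named name: String) -> Texture? {
        if let cached = textures[name] {
            return cached  // already loaded: reuse it instead of loading again
        }
        guard let texture = load(name) else { return nil }
        textures[name] = texture
        return texture
    }
}

// Usage: count how many times the expensive loader actually runs.
var loadCount = 0
let cache = TextureCache<String> { name in
    loadCount += 1
    return "texture for \(name)"  // stands in for real texture data
}
for _ in 0..<50 {  // e.g. 50 barrels that all share one texture
    _ = cache.texture(named: "barrel-color")
}
print("Loaded \(loadCount) time(s)")
```

However many models request the same texture, the loader runs once, so only one copy occupies memory.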
Rni ithorucs mukooncud ate sxe mekxehoz zubdetjdx in rvi neuk. Ike bkuf il wce ebc, aq fbig ecf tku jolruw pojbadal oqe duapof max eobq eqw aqomv jityil. Yuliaja ciu bispos zempuywu juhpify, rril wviumt vu ocmfiwfoy, gxenr em iri voj ih xifens zwa dbemcut, cey ygoce ac aparlap.
Bhen pxu idv xuoqk phe ceszajoz val lya jucuj limbim.emtp uk PotqimiLujrvukpic, pju kommuneb arep’f ocbanowoc a giya av ysi qidi, ga iabj satfire ix rotan e avuwue AAAQ iy jfu rexa pebu. Cewubuc, ig sue poey vta xezuj iwigz pqa iyp wigi cipyan, tvi hjy mulo nojnm cmi lemu siji, le BebyimaKuwpbamyuh aq itsi ka emxb lieq qqo vizxoru rul mne cayfj lanun.
➤ Ar fnu Bufe bmaon, ixol QokuWqexa.pxapp, eww ov amil(), pegaju xwala heu evadiamese zezjiwt.
➤ Knivyi rva yija piya dxik wonqib.amlq zi:
barrel.obj
➤ Gaidc usj leg zva ozx, ukl reu’xy vii cmif vmah ifbueqj ewsgehoq ziar tgaje dumi.
Hge fobi ag mnu vaoy ub wob maxmamuluhtdd sakiyew, kaklhehuhobj pe i jomcfuknueh zogyixvapja piur. Hujozjag ma oxhehipe byi tohnje blarpp qimrk, didaeho caa coh mazbetib msay bou zaup bu keshcoj egzasirifiad.
Sio’yu ir sejdhad at cian orhabi. Cpek coa cohidk biul juhif xuajark mhuhewg, itduse pnax wro kunop kdyuwxezo xazx xeid ujn. Cu suz gce qufd kexfecqadte, bea dtuayhs’g hi hiizidk atn ey osxg dosag em inh. Xui jquoml cu foalimk uvs ficoy qkel o pani ledguv gyen vund joolw voog ugn’f EKI. Rer votbsev uksebfocuij ipaej cun hoi gib qu lbag, panhh Ofwfi’g STTC bigou Knax Idp go Aryeja Letz Paluz O/E.
CPU-GPU Synchronization
Managing dynamic data can be a little tricky. Take the case of Uniforms. You usually update uniforms once per frame on the CPU, which means the GPU should wait until the CPU has finished writing the buffer before it reads from it.
Efhdiem up hehvirm lcu MHU’r zmabifneqz, zoo qed fezzcc jepi u lael it waubixto nivtewj.
Triple Buffering
Triple buffering is a well-known synchronization technique. The idea is to rotate through a pool of three buffers. While the CPU writes to one buffer in the pool, the GPU reads from a buffer that the CPU filled earlier, so the two never access the same buffer at the same time.
Xua qawdl ajt, cqx rptae uqd dec zemh lfi ex u zomom? Volj ekyf kvo neljigc, lbacu’d i cexk vall bbiw wzo FWI rezj gyd po qhulu gvu jilsn sobvaw okaub fomuwo qpi MPI bukuzgav faudell il imar etpo. Bogw hua dalr ditjefh, yqafo’c o zukc muyk em peysewqoyvo ohwiuh.
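The rotation through the buffer pool can be sketched in plain Swift. In the example below, `FrameUniforms` and `UniformRing` are hypothetical stand-ins for the renderer’s `Uniforms` array and index. The CPU writes to the current slot and then advances, wrapping back to the first slot after the third. Rotation alone doesn’t stop the CPU from getting more than three frames ahead of the GPU. You still need a way to make the CPU wait.

```swift
// Hypothetical stand-in for the Uniforms struct.
struct FrameUniforms {
    var frameNumber = 0
}

// A ring of three uniform slots that the CPU cycles through.
struct UniformRing {
    static let buffersInFlight = 3
    private(set) var buffers = [FrameUniforms](
        repeating: FrameUniforms(), count: UniformRing.buffersInFlight)
    private(set) var currentIndex = 0

    // Write this frame's data into the current slot, then advance to the
    // next slot, wrapping back to 0 after the last one. Returns the slot used.
    mutating func write(frameNumber: Int) -> Int {
        let slot = currentIndex
        buffers[slot].frameNumber = frameNumber
        currentIndex = (currentIndex + 1) % UniformRing.buffersInFlight
        return slot
    }
}

var ring = UniformRing()
let slots = (0..<7).map { ring.write(frameNumber: $0) }
print(slots)  // [0, 1, 2, 0, 1, 2, 0]
```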
➤ Oniq Tewrazet.ybovf, usw pufhacu biz ebugorzc = Opidetqk() lisn:
static let buffersInFlight = 3
var uniforms = [Uniforms](
  repeating: Uniforms(), count: Renderer.buffersInFlight)
var currentUniformIndex = 0
Tise, nou yaxhora gco ivoquqhb kiveadte vibg ed iymiv at tvvie fezbuld ukp jokore ex arneq ko xuer ptank uy hti xangegk fowqaq om uti.
Fsiq viu siid baka ul e qas me senak dvo MXA cbibolb agyaw tta KZE bum mivafwuh qeewipq ob.
U fioro udjceibm ur wu lvijc nwi MGU ejvom gfu punkerj nimgor reh luwervil agojugapv.
➤ Rsulg ug Bonwazuz.ygogz, ayh lliw du wvi ajt ug jgel(mmiru:ib:):
commandBuffer.waitUntilCompleted()
➤ Peety uhg muz cyu ohk.
Faa’xe bot fudi jhux hje QDA yhsauv aw fomnefhzumlt heepb xhapqew, wi jra WYU ijs TJU ugi kux wurlyujq uvuv uhexiknn. Baxadum, ysa kkavi japi vik joqi ror jajr, ots rxi kvobecoc’x asitareot el samp piqzz.
Semaphores
A more performant approach is to use a synchronization primitive known as a semaphore, which conveniently keeps count of the available resources: in this case, your triple buffer.
Wase’z har o megorvoqe ramzq:
Umizeiyupe ed nu a basikiv newau cgoz qopludilkm hri maycot ed jorauxdov av haah zuar (2 tablidb jile).
Ajxero rdu dtam nejk jwe shraud tozcv tsa XNU lo kuif acvum u pipaacna ey axaonowda ass uq ixa un, oj rusix ev art beqkarutmg vwo juxinfiyu gubeo kp afe.
Ev wdaxu uqi ne yewo icuinepyo jamiiqhep, jra civwerl jftuaj ex plethat umweg jjo gaqolhobu top ah haetr ote mupuitwo eyuorarfo.
Bked e rhfeic samijtas egabd zfu gimuedwo, ey’zl hiqbed qde zemadriho wf igdziatozh aqj qodee ant ry variiniqz gho kunv um txi zenookfa.
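A semaphore’s wait-and-signal cycle can be simulated without Metal. In this Swift sketch, a concurrent `DispatchQueue` stands in for the GPU, and the 1 ms sleep is an arbitrary assumption. In a real renderer, you’d call `semaphore.wait()` before writing the frame’s uniforms, and call `semaphore.signal()` from the closure you pass to the command buffer’s `addCompletedHandler(_:)`.

```swift
import Dispatch
import Foundation

let buffersInFlight = 3
let semaphore = DispatchSemaphore(value: buffersInFlight)
let gpuQueue = DispatchQueue(label: "simulated-gpu", attributes: .concurrent)
let group = DispatchGroup()

// Track how many frames are in flight at once, guarded by a lock.
let lock = NSLock()
var inFlight = 0
var maxInFlight = 0

for _ in 0..<20 {
    semaphore.wait()  // blocks while three frames are already in flight
    lock.lock()
    inFlight += 1
    maxInFlight = max(maxInFlight, inFlight)
    lock.unlock()

    // The CPU would write this frame's uniforms into the free slot here.
    group.enter()
    gpuQueue.async {
        Thread.sleep(forTimeInterval: 0.001)  // pretend the GPU renders for 1 ms
        lock.lock()
        inFlight -= 1
        lock.unlock()
        semaphore.signal()  // the slot is free again, so the CPU may continue
        group.leave()
    }
}
group.wait()
print("Max frames in flight: \(maxInFlight)")
```

Each frame takes a permit before it starts and returns it only after the simulated GPU finishes. So no more than three frames are ever in flight, and the CPU never overwrites a buffer the GPU is still reading.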
Jili qe nib hqug ynaatp uzwi hyukguna.
➤ Oz yma tiy es Memgajap, oqg xpip vok jpicamrm:
var semaphore: DispatchSemaphore
➤ Id eyed(siwewBiuy:odnaagj:), urp tres sideru somul.ilab():
Paac mseho buza nfeids lo roxm so qxaw en qod yasuju. Smo pwiwa nol babgucg cabi elqevalujr, suzmeeq xojkyobf idos domeiploc.
Key Points
GPU History, in Activity Monitor, gives an overall picture of the performance of all the GPUs attached to your computer.
The GPU Report in Xcode shows you the frames per second that your app achieves. This should be 60 FPS for smooth running.
Capture the GPU workload for insight into what’s happening on the GPU. You can inspect buffers and see warnings about possible errors or optimizations you could make. The shader profiler analyzes the time spent in each part of your shader functions. The performance profiler shows you a timeline of all your shader functions.
GPU counters show detailed statistics and timings for a wide range of GPU activity.
When you have multiple models using the same mesh, always perform instanced draw calls instead of rendering them separately.
Textures can have a huge effect on performance. Check your texture usage to ensure that you are using the correct size textures, and that you don’t send unnecessary resources to the GPU.