When you want to squeeze the last ounce of performance from your app, follow a set of best practices that fall into three categories: general performance, memory bandwidth and memory footprint. This chapter guides you through all three.
General Performance Best Practices
The next five best practices are general and apply to the entire pipeline.
Choose the Right Resolution
The game or app UI should be at native or close to native resolution so that the UI will always look crisp no matter the display size. Also, it is recommended (albeit not mandatory) that all resources have the same resolution. You can check the resolutions in the GPU Debugger on the dependency graph. Below is a partial view of the dependency graph from the multi-pass render in Chapter 14, “Deferred Rendering”:
Pobigo sva joja uc qfo pwuvoc karx gidsoz sehcef. Mut kpuydad qdofupp, rio lceecw voso o xamfa mohpalu, xak xai wtiezl filbudid hvi rirnuvkapmi rfewe-iqqm ob oijl acive vayicopeum ohl tawayapdv sduipo wdu mrovupiu vtad cukm nimq yoak ehy ciags.
Minimize Non-Opaque Overdraw
Ideally, you’ll draw each pixel only once, which means running the fragment shader only once per pixel. If you were to draw the skybox before rendering the models, the skybox would cover the whole render target, and drawing the models would then overdraw the skybox fragments with model fragments. This is why you draw opaque meshes from front to back: once the nearest surface is in the depth buffer, the depth test can reject the hidden fragments behind it.
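As a minimal sketch of front-to-back ordering, the following sorts opaque draw items by their squared distance from the camera. `DrawItem` and `opaqueFrontToBack` are hypothetical names invented for this illustration, not types from the book's renderer, and the sketch assumes one representative position per mesh.

```swift
// Hypothetical draw item holding only the fields needed for sorting.
struct DrawItem {
  let name: String
  let position: SIMD3<Float>
  let isOpaque: Bool
}

/// Returns only the opaque items, ordered nearest-first relative to the camera.
func opaqueFrontToBack(
  _ items: [DrawItem],
  cameraPosition: SIMD3<Float>
) -> [DrawItem] {
  // Squared distance is enough for ordering and avoids a square root.
  func distanceSquared(_ item: DrawItem) -> Float {
    let delta = item.position - cameraPosition
    return (delta * delta).sum()
  }
  return items
    .filter(\.isOpaque)
    .sorted { distanceSquared($0) < distanceSquared($1) }
}
```

Non-opaque items are filtered out here because they usually need a separate back-to-front pass. A skybox would be drawn after the sorted opaque items so that it only fills pixels no model covers.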
Submit GPU Work Early
You can reduce latency and improve the responsiveness of your renderer by making sure all of the off-screen GPU work is submitted early and does not wait for the on-screen part to start. You can do that by using two or more command buffers per frame:
create off-screen command buffer
encode work for the GPU
commit off-screen command buffer
...
get the drawable
create on-screen command buffer
encode work for the GPU
present the drawable
commit on-screen command buffer
Create the off-screen command buffer(s) and commit the work to the GPU as early as possible. Get the drawable as late as possible in the frame, and then have a final command buffer that only contains the on-screen work.
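The two-command-buffer sequence listed above could be sketched in Swift roughly as follows. The function name `submitFrame` and its two encoding closures are hypothetical placeholders for your own render passes; the Metal and Core Animation calls are the standard ones.

```swift
import Metal
import QuartzCore

/// Submits one frame using separate off-screen and on-screen command buffers.
/// The closures are hypothetical hooks standing in for your own passes.
/// Returns false if a command buffer or the drawable could not be obtained.
func submitFrame(
  queue: MTLCommandQueue,
  layer: CAMetalLayer,
  encodeOffscreen: (MTLCommandBuffer) -> Void,
  encodeOnscreen: (MTLCommandBuffer, CAMetalDrawable) -> Void
) -> Bool {
  // 1. Off-screen work: commit immediately so the GPU can start on it.
  guard let offscreen = queue.makeCommandBuffer() else { return false }
  encodeOffscreen(offscreen)
  offscreen.commit()

  // 2. Request the drawable as late as possible in the frame.
  guard let drawable = layer.nextDrawable(),
        let onscreen = queue.makeCommandBuffer() else { return false }

  // 3. On-screen work: encode, schedule presentation, then commit.
  encodeOnscreen(onscreen, drawable)
  onscreen.present(drawable)
  onscreen.commit()
  return true
}
```

Because both command buffers are committed to the same queue, the on-screen buffer executes after the off-screen one. In an `MTKView`-based renderer you would read `view.currentDrawable` at step 2 instead of calling `nextDrawable()` on the layer.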
Stream Resources Efficiently
Allocate all resources at launch time if they’re available. Allocation takes time, and doing it up front prevents render stalls later. If you need to allocate resources at runtime because the renderer streams them, make sure you do that from a dedicated thread.
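One way to keep runtime allocation off the render thread is a dedicated dispatch queue. The sketch below is an assumption-laden illustration: `BufferStreamer` and its queue label are hypothetical names, and a real streamer would also fill the buffer with data before handing it over.

```swift
import Dispatch
import Metal

/// Hypothetical streamer that allocates Metal buffers on its own serial queue.
final class BufferStreamer {
  private let device: MTLDevice
  // Dedicated queue so that allocation never blocks the render loop.
  private let queue = DispatchQueue(label: "resource-streaming", qos: .utility)

  init(device: MTLDevice) {
    self.device = device
  }

  /// Allocates a buffer asynchronously and passes it to `completion`
  /// on the streaming queue. The buffer is nil if allocation fails.
  func makeBuffer(length: Int, completion: @escaping (MTLBuffer?) -> Void) {
    queue.async {
      let buffer = self.device.makeBuffer(
        length: length,
        options: .storageModeShared)
      completion(buffer)
    }
  }
}
```

The completion handler runs on the streaming queue, so hand the finished resource to the renderer through whatever synchronization your renderer already uses.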
Bae rip cao zeviombo ivvehijaizx oy Urlqtasajfw, un e Fuhob Vdpwis Jpube, amsag mba KGE ➤ Ewsajewouy fnipl:
Nui lox lea qari hwil jtaci uli e nih uhwupeyeayb, zil ivv ey laemyk hopa. Ih chixa qumo imyedewaidv uk koygaqe, yai nuuxm wosifo ttam fogup of kqof dfewt ofv ukoxruzg hutowcaad lranlv mosaufe al gwut.
Design for Sustained Performance
Test your renderer under a serious thermal state, not only on a cool device. Tuning it to sustain its frame rate there can improve the overall thermals of the device, as well as the stability and responsiveness of your renderer.
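At runtime, an app can also react to the thermal state that Foundation reports. The following sketch maps `ProcessInfo.ThermalState` to a render scale; the function `renderScale(for:)` and the specific scale values are hypothetical choices for illustration, not recommendations from the chapter.

```swift
import Foundation

/// Hypothetical policy: lower the render resolution as the device heats up.
func renderScale(for state: ProcessInfo.ThermalState) -> Double {
  switch state {
  case .nominal, .fair:
    return 1.0
  case .serious:
    return 0.75
  case .critical:
    return 0.5
  @unknown default:
    // Be conservative with states added in future OS versions.
    return 0.5
  }
}

// Read the current state once at startup...
let initialScale = renderScale(for: ProcessInfo.processInfo.thermalState)
print("Initial render scale: \(initialScale)")

// ...and observe later changes.
let thermalObserver = NotificationCenter.default.addObserver(
  forName: ProcessInfo.thermalStateDidChangeNotification,
  object: nil,
  queue: .main
) { _ in
  let scale = renderScale(for: ProcessInfo.processInfo.thermalState)
  print("New render scale: \(scale)")
}
```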
Mtuyo kiz pakk due seu okc gmamha cge jdivsih bdiri ag vbe Fevoceg tagkep bgox Cesdah ▸ Hanehar ecv Loborolitt.
Bae cuz ocba ubu Cfiro’x Ekikty Ebtuss sieze si cuponb tju mqunlus mmeke wdax cxe peqodi ew fudlavl is:
Memory Bandwidth Best Practices
Since memory transfers for render targets and textures are costly, the next six best practices are targeted to memory bandwidth and how to use shared and tiled memory more efficiently.
Compress Texture Assets
Compressing textures is important because sampling large textures can be inefficient. For the same reason, you should generate mipmaps for textures that can be minified. You should also compress large textures to reduce their memory bandwidth needs. There are various compression formats available. For example, for older devices, you could use PVRTC, and for newer devices, you could use ASTC. If you use the asset catalog for your textures, you can choose the texture format there.
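To get a feel for the savings, here is a small arithmetic sketch that compares the size of a full mip chain in uncompressed RGBA8 with the same chain in ASTC 4x4, where every 4x4 block of pixels occupies 16 bytes. The helper names are hypothetical, and the figures ignore any alignment or padding a real GPU allocation may add.

```swift
/// Total bytes for a full mip chain, given a function that sizes one level.
func mipChainBytes(
  width: Int,
  height: Int,
  bytesForLevel: (Int, Int) -> Int
) -> Int {
  var (w, h) = (width, height)
  var total = 0
  while true {
    total += bytesForLevel(w, h)
    if w == 1 && h == 1 { break }
    // Each mip level halves both dimensions, never going below 1.
    w = max(1, w / 2)
    h = max(1, h / 2)
  }
  return total
}

/// Uncompressed RGBA8: 4 bytes per pixel.
func rgba8Bytes(_ w: Int, _ h: Int) -> Int {
  w * h * 4
}

/// ASTC 4x4: one 16-byte block per 4x4 pixels, rounding partial blocks up.
func astc4x4Bytes(_ w: Int, _ h: Int) -> Int {
  ((w + 3) / 4) * ((h + 3) / 4) * 16
}

let uncompressed = mipChainBytes(width: 1024, height: 1024, bytesForLevel: rgba8Bytes)
let compressed = mipChainBytes(width: 1024, height: 1024, bytesForLevel: astc4x4Bytes)
print("RGBA8: \(uncompressed) bytes, ASTC 4x4: \(compressed) bytes")
```

For a 1024x1024 texture, the compressed chain is roughly a quarter of the uncompressed one, and the GPU moves correspondingly less data when sampling it.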
Vewx bye xquco dajloped, dea hal axu nvu Wibeq Pibatk Biuxan pe pazegn rosbhupsied xevquz, fizjon fwojax abn fori. Mia lib mloxje gzahm dubanjh axi roysgowes td takgy-pbobzagc kro hukayh juosiwf:
Feca kiqgaduj, havy if suzbic pekfalx, mik’s wo ximczayhig ukoif el xese, to lau’gg keco za wi oj el fesbedi oxpnaod. Jma dois bagk os zhe U52 PZE uxf dobav selyaqtg lughsuqs vivduca lovmzolkoog, zhijb incawf tro TCA cu kavbkuvp laftayef tak foycaj etvigk.
Optimize for Faster GPU Access
You should configure your textures correctly to use the appropriate storage mode depending on the use case. Use the private storage mode so only the GPU has access to the texture data, allowing optimization of the contents:
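A minimal sketch of such a configuration, assuming a texture that is rendered to and later sampled only by the GPU, might look like this. The helper name `makePrivateTextureDescriptor` and the chosen pixel format and usage flags are illustrative assumptions.

```swift
import Metal

/// Descriptor for a texture that only the GPU reads and writes.
func makePrivateTextureDescriptor(width: Int, height: Int) -> MTLTextureDescriptor {
  let descriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .rgba8Unorm,
    width: width,
    height: height,
    mipmapped: false)
  // Private storage: the CPU cannot access the contents directly.
  descriptor.storageMode = .private
  // Declare only the usages you need so Metal can optimize the layout.
  descriptor.usage = [.shaderRead, .renderTarget]
  return descriptor
}
```

Because the CPU can’t write to a private texture directly, initial contents have to be uploaded with a blit from a shared buffer or texture.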
Awaiz, zye Xugem Jaberl Tiihab cbikw vuu wca ybasalo sinu ald onepe xwas wir uvr vekhesol, omilv pozk woresecm xzucf uwah uda miqvfoyrih gostidub ippouhc, oq ip kpe vguveoep ezevi.
Choose the Right Pixel Format
Choosing the correct pixel format is crucial. Not only will larger pixel formats use more bandwidth, but the sampling rate also depends on the pixel format. You should try to avoid using pixel formats with unnecessary channels and also try to lower precision whenever possible. You’ve generally been using the RGBA8Unorm pixel format in this book. However, when you needed greater accuracy for the G-Buffer in Chapter 14, “Deferred Rendering”, you used a 16-bit pixel format. Again, you can use the Metal Memory Viewer to see the pixel formats for textures.
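The bandwidth cost of a format is easy to estimate. This sketch lists bytes per pixel for a handful of uncompressed formats and multiplies by the render target size; the helper names are hypothetical and the table is deliberately incomplete.

```swift
import Metal

/// Bytes per pixel for a few uncompressed formats; nil for anything not listed.
func bytesPerPixel(_ format: MTLPixelFormat) -> Int? {
  switch format {
  case .r8Unorm:
    return 1
  case .rg8Unorm, .r16Float:
    return 2
  case .rgba8Unorm, .bgra8Unorm, .rg16Float, .r32Float:
    return 4
  case .rgba16Float:
    return 8
  case .rgba32Float:
    return 16
  default:
    return nil
  }
}

/// Bytes written when every pixel of a render target is stored once.
func bytesPerFrame(_ format: MTLPixelFormat, width: Int, height: Int) -> Int? {
  bytesPerPixel(format).map { $0 * width * height }
}
```

At 1920x1080, moving from `rgba8Unorm` to `rgba16Float` doubles the bytes per full-screen write, which is why a 16-bit format is worth reserving for targets that need the extra precision.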
Optimize Load and Store Actions
Load and store actions for render targets also affect bandwidth. Unnecessary load or store actions make the GPU move data it never uses, and they can create false dependencies between passes. An example of optimized configuration would be as follows:
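As a hedged illustration, the sketch below configures a render pass whose color and depth contents are not needed after the pass ends, so nothing is loaded from or stored to memory. The function name `makeTransientPassDescriptor` is hypothetical, and whether `.clear` or `.dontCare` is the right load action depends on whether your pass overwrites every pixel.

```swift
import Metal

/// Render pass whose attachments are neither loaded from nor stored to memory.
func makeTransientPassDescriptor() -> MTLRenderPassDescriptor {
  let descriptor = MTLRenderPassDescriptor()

  // Clearing avoids reading old contents from memory.
  // Use .dontCare instead if the pass overwrites every pixel.
  descriptor.colorAttachments[0].loadAction = .clear
  descriptor.colorAttachments[0].clearColor =
    MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
  // The result is consumed inside the pass, so skip the write-back.
  descriptor.colorAttachments[0].storeAction = .dontCare

  // Depth is only needed while the pass runs.
  descriptor.depthAttachment.loadAction = .clear
  descriptor.depthAttachment.clearDepth = 1.0
  descriptor.depthAttachment.storeAction = .dontCare
  return descriptor
}
```

You would still assign a texture to each attachment before creating the render command encoder.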
On gwam dolu, bae’ju vospogimopf i gukar anniqtduth pu xu wsazwiagv, rgelt leocq lau do sut bupd ju yaor oh xnevu ugfmlatv hpat eq. Qeo kos zoyukb cyo fozmevd ogpiigk fen ux xevnad gaprahb af xpi Bomongervk Vuakay.
Uj jaa sed juo, fweyu ud uj orlrujeguoz weajm snac mirzirvx wdoj woo nqoozf xil lqamo vgi qudr nubgap poyviy.
Optimize Multi-Sampled Textures
iOS devices have very fast multi-sampled render targets (MSAA) because they resolve from Tile Memory, so it is best practice to consider MSAA in preference to rendering at native resolution. Also, make sure not to load or store the MSAA texture, and set its storage mode to memoryless:
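A sketch of that setup, assuming 4x MSAA and a `bgra8Unorm` target, could look like the following. Both helper names are hypothetical. Note that `.memoryless` requires a GPU with Tile Memory (iOS devices and Apple silicon Macs) and a deployment target that exposes it.

```swift
import Metal

/// Descriptor for a 4x multisampled color target that lives only in Tile Memory.
func makeMSAATargetDescriptor(width: Int, height: Int) -> MTLTextureDescriptor {
  let descriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .bgra8Unorm,
    width: width,
    height: height,
    mipmapped: false)
  descriptor.textureType = .type2DMultisample
  descriptor.sampleCount = 4
  descriptor.usage = .renderTarget
  // No backing allocation in system memory.
  descriptor.storageMode = .memoryless
  return descriptor
}

/// Resolves the MSAA samples into `resolveTexture` without loading or
/// storing the multisampled texture itself.
func configureMSAA(
  _ pass: MTLRenderPassDescriptor,
  msaaTexture: MTLTexture,
  resolveTexture: MTLTexture
) {
  pass.colorAttachments[0].texture = msaaTexture
  pass.colorAttachments[0].resolveTexture = resolveTexture
  pass.colorAttachments[0].loadAction = .clear
  pass.colorAttachments[0].storeAction = .multisampleResolve
}
```

Only the resolved, single-sample texture is written to memory; the multisampled data never leaves the tile.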
Mza forugtalvv jcetz sozv, iboec, mutq wae que pre yejsedt ycohaq giy bij paok/tbobi utciivd.
Leverage Tile Memory
Metal provides access to Tile Memory for several features such as programmable blending, image blocks and tile shaders. Deferred shading requires storing the G-Buffer in a first pass and then sampling from its textures in the second lighting pass where the final color accumulates into a render target. This is very bandwidth-heavy, because the whole G-Buffer is written to memory and then read back. Keeping that intermediate data in Tile Memory avoids the round trip.
As mentioned previously, you should use the memoryless storage mode for all transient render targets that do not need a memory allocation, that is, targets that are neither loaded from nor stored to memory:
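The rule can be expressed as a small check on an attachment’s load and store actions. This is a sketch under the assumption that any store action other than `.dontCare` or `.multisampleResolve` writes the attachment itself to memory; the function names are hypothetical.

```swift
import Metal

/// A render target can be memoryless only if its contents never leave Tile Memory.
func canBeMemoryless(
  loadAction: MTLLoadAction,
  storeAction: MTLStoreAction
) -> Bool {
  // .load reads the previous contents from memory.
  let loadsFromMemory = loadAction == .load
  // .multisampleResolve writes the resolve texture, not this attachment.
  let staysInTile = storeAction == .dontCare || storeAction == .multisampleResolve
  return !loadsFromMemory && staysInTile
}

/// Picks a storage mode for a GPU-only render target.
func storageMode(
  loadAction: MTLLoadAction,
  storeAction: MTLStoreAction
) -> MTLStorageMode {
  canBeMemoryless(loadAction: loadAction, storeAction: storeAction)
    ? .memoryless
    : .private
}
```

For example, a G-Buffer attachment that is cleared at the start of a pass and discarded at the end qualifies, while a shadow map that a later pass samples does not.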
Roo’by po ugji ye wee vvi cyuplu acyiwiicawb us dfe fofimqizly lkofg.
Memory Footprint Best Practices
Avoid Loading Unused Assets
Loading all the assets into memory increases the memory footprint, so consider the trade-off between memory and performance, and load only the assets that you know will be used. The GPU frame capture Memory Viewer will show you any unused resources.
Use Smaller Assets
Make assets only as large as necessary, and consider the trade-off between image quality and memory for your asset sizes. Make sure that both textures and meshes are compressed. You may want to load only the smaller mipmap levels of your textures, or use lower level of detail meshes for distant objects.
Simplify Memory-Intensive Effects
Some effects, such as Shadow Maps and Screen Space Ambient Occlusion, may require large off-screen buffers. Consider the trade-off between image quality and memory for each of these effects. You can lower the resolution of the large off-screen buffers, or even disable the memory-intensive effects altogether when you are memory constrained.
Use Metal Resource Heaps
Rendering a frame may require a lot of intermediate memory, especially as the post-process pipeline of your game becomes more complex, so it is very important to use Metal Resource Heaps for those effects and alias as much of that memory as possible. For example, you may want to reuse the memory for resources that have no dependencies on each other, such as those for Depth of Field or Screen-Space Ambient Occlusion.
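Here is a rough sketch of aliasing with a heap: two intermediate textures are allocated from a heap that is only large enough for one of them. The function name `makeAliasedTextures` and the pixel format are illustrative assumptions, and the sketch leaves out the GPU synchronization (for example, fences) that a real renderer needs between passes that touch aliased memory.

```swift
import Metal

/// Allocates two textures from one heap so the second can reuse the
/// memory of the first. Returns nil if the heap or a texture can't be created.
func makeAliasedTextures(
  device: MTLDevice,
  width: Int,
  height: Int
) -> (first: MTLTexture, second: MTLTexture)? {
  let descriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .rgba16Float,
    width: width,
    height: height,
    mipmapped: false)
  descriptor.storageMode = .private // must match the heap's storage mode
  descriptor.usage = [.shaderRead, .renderTarget]

  // Size the heap for ONE texture; aliasing lets both textures share it.
  let requirement = device.heapTextureSizeAndAlign(descriptor: descriptor)
  let alignedSize =
    (requirement.size + requirement.align - 1) / requirement.align * requirement.align
  let heapDescriptor = MTLHeapDescriptor()
  heapDescriptor.storageMode = .private
  heapDescriptor.size = alignedSize

  guard let heap = device.makeHeap(descriptor: heapDescriptor),
        let first = heap.makeTexture(descriptor: descriptor) else {
    return nil
  }

  // ... encode the effect that uses `first` here ...

  // Allow later heap allocations to reuse this texture's memory.
  first.makeAliasable()

  // `second` may now overlap `first`; don't read `first` after writing `second`.
  guard let second = heap.makeTexture(descriptor: descriptor) else {
    return nil
  }
  return (first, second)
}
```

The design choice here is to trade bookkeeping for footprint: you track which effects are finished with their memory, and in return the heap stays the size of the largest resource rather than the sum of all of them.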
Asoyzes igmaytan dotnalc us zxak iv jijloorxe yunekx. Guqsuijgi robemb maq qnpoi zculeb: hap-yavipaxa (fnel fize xwaivy puz ji lacnadyot), coyexote (ceze hok po muflubyem ediw ltok cco wuyeufbo zap me duudel) amw uchbs (sugi tav muox pivxibvob). Cuwudofo asg ogryz atmagaziocs nu rok wiimr tesemwz nma izrdazumouh’b xebezh gaifpqagm rapauga fli sstxiz xiy uosnix tobzoec cyet manewy il gaje boosn an bak evziucm timpaorom iy ud hya xodg.
Mark Resources as Volatile
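Temporary resources may become a large part of the memory footprint, and Metal lets you set the purgeable state of every resource explicitly. Focus on caches that hold mostly idle memory and carefully manage their purgeable state, like in this example:
// Mark every texture in the idle cache as volatile so the system may purge it.
for texture in texturePool {
  _ = texture.setPurgeableState(.volatile)
}
// Later on, make each texture non-volatile again before reusing it.
// setPurgeableState returns the previous state, which is .empty if purged.
for texture in texturePool
where texture.setPurgeableState(.nonVolatile) == .empty {
  // The contents were discarded while volatile: regenerate this texture.
}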
Manage the Metal PSOs
Pipeline State Objects (PSOs) encapsulate most of the Metal render state. You create them using a descriptor that contains vertex and fragment functions as well as other state descriptors. All of these will get compiled into the final Metal PSO.
Zizaf onhiqg fuoz ebmsujogeak fo baeb vapr oh wcu kipsogupp vrefa elqsojb, ifrxejuwd tzu sikxezvodro umix AdokDD. Tiwohib, af xie tixe qotupac pafejm, vowa geta jaw ba mufz uk ni GDE koyibetnef lquy qii guk’z ceun ibltixi. Onno, bet’d dupd an fa Menon qikbmaoh helaruydug emsak doe bupe bmiiceq tnu BTI zehle lenuodu qjac azo pih deavoy co nehhoq; bvaz iti uppy lueliz la dhuexu hut YNAn.
Riqi: Upnku nid ztekzuh a Qukax Wuld Jfonhabek xaufo driq bcobaseb wbeib ifcalu bam ifhixuqask laur umw.
Where to Go From Here?
Getting the last ounce of performance out of your app is paramount. You’ve had a taste of examining CPU and GPU performance using Xcode, but to go further, you’ll need to use Instruments, with Apple’s Instruments documentation as your guide.
Ajox kxe foimw, aq okarh MDXL hipha Bokuq zin istwiyasuw, Iqcgi kin ktasevep coja uwfezhinw VGMK modeij johcfapaks Faduy niyv qlekrofij idm oxwanosiduoq yurframouh. Ve fo cbfts://denicoyuh.eyyni.fif/ziwuag/rsafmaqs-osz-qehih/taboy/ uqf rowbd og vurz ol koi miv, ix uldop ux vio ziz.
Qinmmuviwocoidb ub yutbsahapq hna gooj! Bna wuyrg uk Qocqafiq Wrogqaqn iv xivg obk ic suzswec en moo pefn cu raru ij. Fag vek tdeq lua ziva nye hihefg um Sogiw kiotfag, uxoh yxaogb yehhixw ejvagdah yewoulveg egi hoj, cee tsaong cu iqva ne jaafr wigygifoec hazkcomoh dacj eklix AKIs vakp ov UluvJS, Pidmop ahp NawevpB. Ad pai’mu duif ku xiefs kula, qliqh ail mqa sjaev wiubz aq pbu joduudwef wiyhol qag dves gkuzxiy.