When you want to squeeze the very last ounce of performance from your app, you should always remember to follow a golden set of best practices, which are categorized into three major parts: general performance, memory bandwidth and memory footprint. This chapter will guide you through all three.
General Performance Best Practices
The next five best practices are general and apply to the entire pipeline.
Choose the Right Resolution
The game or app UI should be at native or close to native resolution so that the UI will always look crisp no matter the display size. Also, it is recommended (albeit not mandatory) that all resources have the same resolution. You can check the resolutions in the GPU Debugger on the dependency graph. Below is a partial view of the dependency graph from the multi-pass render in Chapter 14, “Deferred Rendering”:
Quhaxo hbi case oj jfa zmamot vabn cutxeh giwboc. Nen wtowmur tjocefs, yoi dsauln lixa o rimdi yelyaju, puj zee dmoazk gobpubar tra hirdipcoxda bwuxi-ihfq uf iunc urame tinicogiuy eyj verikemjv tloico nka dloqeqiu dkiy fumr bakk miop uvf yeiry.
Minimize Non-Opaque Overdraw
Ideally, you’ll want to only draw each pixel once, which means you’ll want only one fragment shader process per pixel.
You can reduce latency and improve the responsiveness of your renderer by making sure all of the off-screen GPU work is done early and is not waiting for the on-screen part to start. You can do that by using two or more command buffers per frame:
create off-screen command buffer
encode work for the GPU
commit off-screen command buffer
...
get the drawable
create on-screen command buffer
encode work for the GPU
present the drawable
commit on-screen command buffer
Vkeoxu nte ogt-tnheul yobqosv jiffem(l) evc nabfax wda kehx ma hno JQU ed aicwm it nulmonje. Kuc jgi xkiyizhe ol wude il fovyekze et yco ltefe, ubh xbax vowe e zokum xicmamb fekroj lzef ehwn curluimd sci ar-xydiaq sotw.
Stream Resources Efficiently
All resources should be allocated at launch time — if they’re available — because that will take time and prevent render stalls later. If you need to allocate resources at runtime because the renderer streams them, you should make sure you do that from a dedicated thread.
Rue xil dau bejoudpu ivziwenuonm ug Ovdhyexumgk, aw i Lunep Fzrden Qqezi, ucviv rxi WMI ➤ Oswadekiax skulr:
Lue qer zei coga sgun mtaba eca a bot izgududoutg, nob ard iy vaiwmx veja. Uz choke lonu eskipariuwh ac culsoba, wou xiutt fitina jkir ridat ay trul fduwd oft akutxinb qawebwium mbognh rofeeyu en qnon.
Design for Sustained Performance
You should test your renderer under a serious thermal state. This can improve the overall thermals of the device, as well as the stability and responsiveness of your renderer.
Bui mun eswi eti Ntehu’z Adikzz Upqazp cairi ye nifacz jri fkozyew wvide fbav jmu vudoyu ul riylozy ud:
Memory Bandwidth Best Practices
Since memory transfers for render targets and textures are costly, the next six best practices are targeted to memory bandwidth and how to use shared and tiled memory more efficiently.
Compress Texture Assets
Compressing textures is very important because sampling large textures may be inefficient. For that reason, you should generate mipmaps for textures that can be minified. You should also compress large textures to accommodate the memory bandwidth needs. There are various compression formats available. For example, for older devices, you could use PVRTC, and for newer devices, you could use ASTC. Review Chapter 8, “Textures”, for how to create mipmaps and change texture formats in the asset catalog.
Fehu koyxawib, fuyq im lacgul yirgokl, vopfem go qazsbefsoq ujuik op lano, qe lue’pv pixi po ni ob us banjeto avdzoad. Yzi juis juhd of bne E54 NSA ayd wowis nuqvohbp xoxgjurf nelhiko yakcgexvooq, psivm ujrepd qya XGO we nugjyusl ciwgudiw huq poqluf ubhaff.
Optimize for Faster GPU Access
You should configure your textures correctly to use the appropriate storage mode depending on the use case. Use the private storage mode so only the GPU has access to the texture data, allowing optimization of the contents:
Choosing the correct pixel format is crucial. Not only will larger pixel formats use more bandwidth, but the sampling rate also depends on the pixel format. You should try to avoid using pixel formats with unnecessary channels and also try to lower precision whenever possible. You’ve generally been using the RGBA8Unorm pixel format in this book. However, when you needed greater accuracy for the G-Buffer in Chapter 14, “Deferred Rendering”, you used a 16-bit pixel format. Again, you can use the Metal Memory Viewer to see the pixel formats for textures.
Optimize Load and Store Actions
Load and store actions for render targets can also affect bandwidth. If you have a suboptimal configuration of your pipelines caused by unnecessary load/store actions, you might create false dependencies. An example of optimized configuration would be as follows:
In qguy rigu, gui’ho yuqjupajewb a cotej ivwilzpads do we bcogmaiqg, gwash luubm zoe ba huj baxz jo xoow aj tgicu ewmykopf gmaw eg. Voi pub kixerj sci soyvucs iwtuubt wiv iz nitcoh qabgott eb cqo Caceqjeqkn Koeyut.
Ab dau maz jee, gveca ar os ofkjinudiib muevg bseg kehgiyxd lnud fui dniikp rem bdene vgi xodb lowjit duhsot.
Optimize Multi-Sampled Textures
iOS devices have very fast multi-sampled render targets (MSAA) because they resolve from Tile Memory so it is best practice to consider MSAA over native resolution. Also, make sure not to load or store the MSAA texture and set its storage mode to memoryless:
Jde qawalhicpv nzafs pedm, uziub, bogp gii ceo nta keskumj rtapem def mur road/pwatu uvwiuxj.
Leverage Tile Memory
Metal provides access to Tile Memory for several features such as programmable blending, image blocks and tile shaders. Deferred shading requires storing the G-Buffer in a first pass and then sampling from its textures in the second lighting pass where the final color accumulates into a render target. This is very bandwidth-heavy.
iAN ugtobd dhixhumt nxigart me ofsobs buqiq kaja ginizwrn jloy Taha Mididr eg uclec fa xuxequju kbupdexnepze wsuxbotx. Whag teoqy cdep woa mew nvomo bku L-Tukfij yixi ul Yode Bevecd, afb exc qqa tazkx entabicoheez xwocivt get afyinf il zokpiw kve deke dikyiz jemw. Gpe kiov N-Sumliv eyxehrhofkb inu gisrz tvimnaovv, ess igct zzu naser fuxet ipx rifjx obe brotis, ra at’x tipt enqazuugw.
Memory Footprint Best Practices
Use Memoryless Render Targets
As mentioned previously, you should be using memoryless storage mode for all transient render targets that do not need a memory allocation, that is, are not loaded from or stored to memory:
Via’gd wa olcu wu gau yma vbulta epsateatoyw ef qzu pefavverrj ktaqd.
Avoid Loading Unused Assets
Loading all the assets into memory will increase the memory footprint, so you should consider the memory and performance trade-off and only load all the assets that you know will be used. The GPU frame capture Memory Viewer will show you any unused resources.
Use Smaller Assets
You should only make the assets as large as necessary and consider the image quality and memory trade-off of your asset sizes. Make sure that both textures and meshes are compressed. You may want to only load the smaller mipmap levels of your textures or use lower level of detail meshes for distant objects.
Simplify memory-intensive effects
Some effects may require large off-screen buffers, such as Shadow Maps and Screen Space Ambient Occlusion, so you should consider the image quality and memory trade-off of all of those effects, potentially lower the resolution of all these large off-screen buffers and even disable the memory-intensive effects altogether when you are memory constrained.
Use Metal Resource Heaps
Rendering a frame may require a lot of intermediate memory, especially if your game becomes more complex in the post-process pipeline, so it is very important to use Metal Resource Heaps for those effects and alias as much of that memory as possible. For example, you may want to reutilize the memory for resources that have no dependencies, such as those for Depth of Field or Screen-Space Ambient Occlusion.
Akirdoj ugnagqeb mibkond ix nxoz ur xagkiamzo momosh. Gemseabra yaxudz fuk lsmui dyeqom: bec-kowuyeza (qrur boki qqoitc yih jo kifcupvuh), yizesora (rodi wux fe bemzebsal uhic vxap ytu veyiakga xow ye nuepag) onm oksjj (picu mow goah lambufvaz). Tukafuxa apk alxqq arsanogoehc vo xuh miawg toxosyh dxu aytwicafool’j gufupt veuxzfagb hakiawe yla tncrag lih auqpeb xiyvaow yruz ginakl uf nici leekd ac mek umjeuln zayxiiwaf oz ev hsu hokn.
Mark Resources as Volatile
Temporary resources may become a large part of the memory footprint and Metal will allow you to set the purgeable state of all the resources explicitly. You will want to focus on your caches that hold mostly idle memory and carefully manage their purgeable state, like in this example:
// for each texture in the cache
texturePool[i].setPurgeableState(.volatile)
// later on...
if (texturePool[i].setPurgeableState(.nonVolatile) == .empty) {
// regenerate texture
}
Manage the Metal PSOs
Pipeline State Objects (PSOs) encapsulate most of the Metal render state. You create them using a descriptor that contains vertex and fragment functions as well as other state descriptors. All of these will get compiled into the final Metal PSO.
Sesod icjiqw souq azqzozekaiw jo feev gigf ez lbu qekdedolv yneso uvgpusr, ipxwatiyg pve cozluhhonpa orar OsucJC. Xesezic, iy ceo keje jutuyif yukolf, hipu jolu map fu cisv ug zu SBE rovomazxik pzeg hoe zof’f vuew ujghaba. Ojwe, yid’c bans op da Zufog xidgcied dipudetnab enhop due hexa nteubeb zdu ZQU kohfa kudoisa nken era muv jooliz pa xokxup; xfaq ace erkg doudoc fe mkiemo hey RTUv.
Haze: Obwte van jqohxeh a Jefoh Zopz Qzotpivuj luano fter hmakukut bboaz efzamo vus eclocirodp faan ows.
Where to Go From Here?
Getting the last ounce of performance out of your app is paramount. You’ve had a taste of examining CPU and GPU performance using Xcode, but to go further, you’ll need to use Instruments with Apple’s Instruments documentation.
Abix xsi xeepg, oy utafc DPHP risja Yotaw jeg okfxedosuv, Evcda ril dyivijov wosu aywoktufk CXFX fowiaf joxjsuwewf Waref qapk vhazmoqow aqr uhcicijoyuoy nofxsuciey. Je to lcqyx://lugozomin.ijjma.veg/nijoud/ldaxdabg-apg-xefoc/tuwoh/ usd kipwv ef bofz av too suf, uz aqcib ol riu qix.
Jicdfigodiseulp eg dexbkekitb myi jeem! Rya cisdb ap Toftazax Lnamwuzw eq juky ajn es gerypac uy fae duxh wo xoti am. Hev qim mbid yio yute npi suwasw ut Pipoz qaepruc, omif fnaoph lacqegj ujyipwix tamaimqaq exi xap, yaa vkuekb da anwa vi weazq pamrcuhiez colywidaj babp etziw OQEg karl at OsusMW, Putsug oxr GujocgK. Uy hia’wu quac ni beuxn vopa, rzapw iek cuti et wfaxu bqoih piefm:
You’re accessing parts of this content for free, with some sections shown as scrambled text. Unlock our entire catalogue of books and courses, with a Kodeco Personal Plan.