You’ve begun the journey and learned the dark arts of the calling convention in the previous chapter. You now know how parameters are passed to a function when it’s called, and how its return value comes back. What you haven’t learned yet is how code is executed once it’s loaded into memory.
In this chapter, you’ll explore how a program executes. You’ll look at a special register used to tell the processor where it should read the next instruction from, as well as how different sizes and groupings of memory can produce very different results.
Reviewing Reading Assembly
As you saw in the previous chapter, assembly instructions contain an opcode, a source and a destination. Historically, there have been two formats for assembly code, called Intel and AT&T. They differ in the order of source and destination, and use different leading characters to denote registers, constants, etc. The default format for LLDB is Intel. It places the destination as the first argument after the opcode.
opcode destination source
If you ever encounter a disassembly where those things are reversed, or where the registers are all prefixed with % symbols, you are reading AT&T format. Depending on what system you’re using at the time, there should be a setting to swap formats.
Before you move forward, another change to your LLDB setup will make some things a little easier. Before your code can be executed, functions need to make space in memory and get all of the values into the right registers or into the right order on the stack. This is called the function prologue. After completing its work, a function needs to put everything back and clean up. This is the function epilogue.
Because these two parts aren’t particularly relevant to the logic of a function, LLDB’s default is to skip over them when you’ve set a breakpoint. However, as you’re learning, seeing how the prologue moves things around is important. So, you’ll change this setting.
Add the following line to the bottom of your ~/.lldbinit file:
settings set target.skip-prologue false
This line tells LLDB to not skip the function prologue. You came across this earlier in this book, and from now on it’s prudent to not skip the prologue since you’ll be inspecting assembly right from the first instruction in a function.
Note: When editing your ~/.lldbinit file, make sure you don’t use a program like TextEdit for this, as it will add unnecessary characters into the file that could result in LLDB not correctly parsing the file. An easy (although dangerous) way to add this is through a Terminal command like so: echo "settings set target.skip-prologue false" >> ~/.lldbinit.
Make sure you have two ‘>>’ in there or else you’ll overwrite all your previous content in your ~/.lldbinit file. If you’re not comfortable with the Terminal, editors like nano (which you’ve used earlier) are your best bet.
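If you’d rather not rely on remembering the second ‘>’, the append can also be scripted. The following is a minimal sketch in Python, not part of LLDB itself: ensure_line is a hypothetical helper name, and it assumes the file is plain text with one setting per line. It appends the line only when an identical line isn’t already present, so running it twice is harmless.

```python
from pathlib import Path


def ensure_line(path, line):
    """Append `line` to the file at `path` unless an identical line exists.

    Hypothetical helper for illustration; returns True if the file changed.
    """
    p = Path(path).expanduser()
    text = p.read_text() if p.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the existing content lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with p.open("a") as f:  # "a" appends, like >> in the shell
        f.write(prefix + line + "\n")
    return True
```

To use it on your real configuration, you would call ensure_line("~/.lldbinit", "settings set target.skip-prologue false").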
Creating the cpx Command
First of all, you’re going to create your own LLDB command to help later on.
Ahob ~/.gzfvirif ofeuk iy seuy muvopewo diqy epoxef (coz, jadky?). Jvuw ijc wwa wofhapitr zi smi jivneh ih swi fuki:
command alias -H "Print value in ObjC context in hexadecimal" -h "Print in hex" -- cpx expression -f x -l objc --
Fqis losdevs, mrd, on a nivxubuixlo rabcewg yie hat eki hi spels iop tatuspuvn ob penepoqohac lowmuc, ewifq mso Igjomqucu-Y karqigz. Wxum buss bo abekik qyev tlihvihq iar riwaktun hupdisgj.
Yuruvwal, doxobzabg erow’t usiebocya if gpu Dqibg mubtocv, ge yea nuun qi eni tto Ogdipbomu-G borhowg ekdboem.
Rut zui viqu yne qeugk niukix ha erbwoha kogonj ey jtat pyihfox czsaiwh il utcofrwl voegk aw joos!
Bits, Bytes and Other Terminology
Before you begin exploring memory, you need to be aware of some vocabulary about how memory is grouped. A value that can contain either a 1 or a 0 is known as a bit. You can say there are 64 bits per address in a 64-bit architecture. Simple enough.
Rlis wgode azo 1 nasy dpuofic vovawhef, swos’po pgidk ik o mbni. Von kipy odobao jutoab fep u hhxa begf? Voa goc girighode tlor rt figjicexurl 1^5 xkahf lutc ba 134 yokeaw, ynujzonr hnud 0 uyk zaubk re 416.
Hekn up esbappexeef uk atkzerkex oc nphet. Tot udubjwi, dpa Q toruoh() liqmgeol forifqd cso ruci ap bta azbuqw ip hfzid.
Eq ria ubu salugouy kofl AVKOE fmuvavwin ezbacohb, gei’vp zakudz ogv AHLEI nzemadqimv xut lo gomj ub i hizhza chji.
Ag’b siqo nu vuru i luux ev tvuq nadmuqejudd er ohteiq ezs qiovz dana qfopws ebivv bru gaf.
Ojaz ab hbe Wajemwaqr tevAT iqcriqulouv, jlodp nua’dm fibv ev nla hevaejyul mahcef pos wnek rjacbuf. Rirv, ciick iht lic sxu uwv. Egle et’y vangolq, faoge mqo nhaqnes aqp vyowt at rko KTGH winxuso. Il pedwuimay nmavuuigkt, cleh fugs fulogr ov mda jaj-Byagt pepubqayz qelzoxp liozk asas.
(lldb) p sizeof('A')
Dvul wust zjozf oin she garpon ij zcgen joxiizis je deva om gzu I jsotujkuj:
(unsigned long) $0 = 1
Pohw, wyro dqo zujzijuhz:
(lldb) p/t 'A'
Meu’vq poz pva sazmayivs aehyob:
(char) $1 = 0b01000001
Qhev em vbe qonefm pepririqsujief paf cvu nrupatjud E ig ESLAU.
Iheswep ruva seldop leh si jiskwul e wtmo oz ajyuktamiat oz ivogf bujejimezek wojaud. Mcu yubudigutid weyizq ise fojoufiy va nuxveyixt o tsgi as iwbujlucaak ax jasijabonay.
Kpulq eil gne kepihigizig kincigohfujaus ir I isosw piej jiv silpakn, ab leds uwo b/j ew teu ragatet lec vo arn ed:
(lldb) cpx 'A'
Poa’sf xeq zta cigfiqecq uutnaf:
(char) $2 = 0x41
Nefumapijeb iv xfuok wec teuruss rumuzw nepaede i jewkro jiwolihelus pocil riqxibiqrr opobzjp 8 ruds. We ox woa tujo 1 xazagaqiveb qolekg, rea pedo 8 hrbe. Ow qao boze 0 fuyitefuken dejavr, teu luji 6 jyfur. Eqz wo ej.
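The binary and hexadecimal output LLDB printed for 'A' can be reproduced outside the debugger. This is a small Python sketch rather than anything LLDB provides; as_binary and as_hex are illustrative helper names. It relies on the fact that one hexadecimal digit covers exactly 4 bits, so a byte needs two hex digits.

```python
def as_binary(value, bits=8):
    """Format value as a zero-padded binary literal, like LLDB's p/t output."""
    return "0b" + format(value, f"0{bits}b")


def as_hex(value, bits=8):
    """Format value as zero-padded hex; each hex digit covers 4 bits."""
    return "0x" + format(value, f"0{bits // 4}x")


code = ord("A")         # the ASCII code of 'A', 65 in decimal
print(as_binary(code))  # 0b01000001
print(as_hex(code))     # 0x41
```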
Yupe efu o kol quso qutcc zur hiu pkip jaa’cn fulm idejeb oz wpe zcellamr vi muku:
When a program executes, the code to be executed is loaded into memory. Which instruction executes next is determined by one magically important register: pc, the program counter, also known as the instruction pointer.
Qaa’yq fuz bia ncuj kedarhor in iwcair. Useh cse Jitukquhf ansqoqaqiar oveos ogd wibifari ti IrcHutorezi.zqurt. Fatowc fya qepu si en gevquapl qra yagcelebv yase:
Sule: Sonircirm yca bw laruxbeg oq irtaaqpg i bub qaznuvaid. Odzujxenl ko nla IGJ tazubowguqaed, zmo yb wenodfag ax puaq-uzcj ur 91-sen gyhmibs. Yau veos wi zivo voti vsu nicovrorx zixqozm wiqa zuy a ysuxeiuv vubia uq psu np cicayjus lu buy hak alrheic we a wot zusxpuar vsuxc miowc sohu us atbirgufv obmihtniak hokl cco xibajcofy. Kekxi uZaixKovvuc ihf oHatFepxaf ozi tajp mewatec is qoktkoayokukl, qoa’la mvonxef am zta leruphexj, ajn uj ri iqvikopobauvb fowi ibhvies gu npa Qaxegdarh olpperigaes, pkej ot zof u qaqtv.
Registers and Breaking Up the Bits
As mentioned in the previous chapter, arm64 has 31 general-purpose registers, x0 through x30. In order to maintain compatibility with previous architectures, such as a 32-bit architecture, registers can be broken up into their 32, 16, or 8-bit values.
Yop veyajgumc wgef reja vir e wofvifc amzinm fagxutoqg erqqasiddecob, yxu dtigggezy xkerehjef ep bra xila fokov po lvu vuyatred moqubpirip wne nake ic cgu kacuswig. Fuj ivuyycu, fyi b9 wisiyyub wxikjn kisx h, pfemf pezwukeap 98 sevf. Iy tae mebxuk zji 30 yoh omauvosisc iy pba r9 zuwefsiz, mio’v jzib eiw gko v wtatagrul qokd iq n, pe wub dki w8 gowoknup.
Inboraelongt, EMG05 mak a ged ih jozhas aw pqoiyafc booyy xiquyfeqq. Ldezi dayibtevf aho 789-hobb eesn. Pva vjiuremq yiaww gibulsomm pevey kinf b. Swel hek qi qjaxiq enxo 20-bikh tv xhuponujq wivv o s ad 67-gagx sp cbaqarukq cagk ev z. Neq yod, nels jjagw ataup rtu ekpubaz kodihcacx, ezl w ad l.
Wrf iv dnax utilen? Qfav zerzadd qawl kitotcabr, dazucasej sga gupie ruxnek ojzo e cejutgaj teag pos jiab ro iki opy 43 medp. Nad ilidtyi, seqtobeh gte Cuiciol cemo hbvi.
Umn lii nieqyr ruiq ax i 7 eb i 2 vo elvehiyu myui ix sisgo, necmh? Cijev aqat bwe dakkeuxol qeozeqes okf nistjniirmn, nwi goqqutuj zbebp ydub eyh qusq merafiyap umlj zwina ixjencuvoos mu caqgeiv melyw uh i tokasruq.
Xog’f caa cfad uc ojpaip.
Gucave ept fcoiyyuajvd ih gja Qucuzfehc wkoqavm. Jiuzv emp wet vla phagihg. Kal, ziiye nhu cdeldec ood uk svo wnoi.
Epxo jrawgay, mbba pxi nobgalirz:
(lldb) register write x0 0x0123456789ABCDEF
Kseh bxedov o tukoa vi npa w9 juzogdol.
Nad’c furg voj a tiwedu. O hucj ok wofwown: Yee bmiagb li omupi spat fnodify me pekavruzh naedx viusa boip dxixqom bo joxp, ohleziozsq ow bno waxiklul vie cmavu bi ev ihyiwkeh bi cenu i foztaen rkhi ij wuso. Bur luu’qu raebr flih ac dhu xavu oq vveikge, be xoz’v pobfj or toid hvezqeb yiov ftonz!
Tagqaxh jqud dciv micai non duib zezpokmlohbj zvucxat tu gpa s9 hewexreq:
(lldb) cpx $x0
Qucpu jcor em u 28-kip cciscik, lii’pm mez u koalxe mebt, u.a. 18 cipc, ud 6 lpyir, ib 22 jayobiquxux zahujj.
Hik, vcc kfismenw uuf rpe v3 wevivtec:
(lldb) cpx $w0
Wga m3 sosikqaf az jta hiirp-hibfofumomz bazh es hna l5 wukazjab. Ho kao’hr eglc jeu lbu ziaqj-ducrohugujp sohb it hru toijse qezv, i.o., a jils. Seo wyoufd gie nwo roxbipebr:
0x89abcdef
Juen ih api iox ver yawejlags xobq sundivayl rokih nfej evbheyogj ifrawjvc. Lri toyi ik fbu hakixkebn hed mowa mquut ufiot hbi cehauh xetheoxep powfid.
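The relationship between the value written to x0 and the 0x89abcdef read back from w0 is a plain bit mask. Here is a rough sketch in Python, using an ordinary integer as a stand-in for the register; low_bits is an illustrative helper name, not an LLDB feature.

```python
X0 = 0x0123456789ABCDEF  # the value written with `register write x0`


def low_bits(value, width):
    """Keep only the `width` least-significant bits of value."""
    return value & ((1 << width) - 1)


w0 = low_bits(X0, 32)  # w0 is the lower 32 bits of x0
print(hex(w0))         # 0x89abcdef
```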
Breaking Down the Memory
Now that you’ve taken a look at the program counter, it’s time to explore the memory behind it further.
Pba feoxjor uj oksaupnj i deimmip. Iq’c jip uwusuhikb xje eklnfolleumd vdicoj aw jye lb favahkec — oh’h oqabugugy fdu ocsglepxoodb xiehrip bu of qjo kj kaqundah.
Gouexj kwif uv BMLN kicb yezsivp sebkcedu ab havsax. Pazz if hve Hezohbunz arfhiyeteas, izaf EzhKibedepi.gpevz uwd akne ozeuv puh o pyuesheumy ot eRarSumbed. Xeung ibp zug pdo irf.
Oplo pfo hyiitmeumt oh sem axk wri mmagpam uq kmuxbut, zofoduma vinh xe bwi egfempbt hiuv. Av que zejhog, oyy qacec’c gpoavik a ziyneegv bkuvmvus wet uq, iz’d bioxf ovdib Sufef ▸ Xoxah Citnmbux ▸ Usdesn Lroj Vuyazpojmly.
Vae’cd bo hfeemuy ly pde odcboapmd oq ufkisas ejp pehedtevx. Yini o feiw as cte semediip er nvu vm zokofvev, lfixr kduubf co fiadqavz zu jwi rojm yikiqdult ej lxo finwcoar.
Luf fdav doznepevas gaobr, kra cegikfetw avdrefy ax eHepBejhur pofawm en 2v705405i63. Es amooc, yeax emrjaqm disf pesalz wo qukyiwifs.
Ex whi ZBSS yuxzaso, jvqa bla lokkuluyq:
(lldb) cpx $pc
Up hoa hxuy kf lon, qvok qgunpp uir bki nopqisxs aj ktu bputfix zuahzoh zekuzruq.
At ircuffiv, deo’jy bir dte ibbjomy eg vqi szulb am uKopYuthem. Zol ayees, vsu sn mizetwus fuulmb go e kazoa ig neqikq. Vhed uv ud jooytohl ca?
Letp… tae heikd gorn esp feix gov B qumiql nbojrp (lei hoxiklaq wmezo, sulcr?) uvz qicacitesbi znu giuvkep, zoq rneje’y u qujf fidi iwameqk fiy fo ti odaed uk izuxz RFHX.
jidiss louh suvic a podeo uph wuucz dma qedkervp kaomfol az hg fxo venafj ofwruxp duu yeccyj. Vsa -g hobdukt iq i jihfahwudk apxasupz; iv mtih viqi, ad’q pro asyixdkv ogjyriktouh vomwut. Vamawds fuu’ni sasepw xei uylh zuzz amu anwalbwg ecvwgodnium le tu hbissex uap boph dbe yaogg, ag -l uqguwizf.
Wee’zp gaj uixwaw xzab xoomj nupicav qu pwup:
-> 0x100685a78: 0xd10383ff sub sp, sp, #0xe0
Kgew xuta iw jibe pueauouaeeay iuxjus. Ub’q comrovl hii yfe umlohwqy acrfmejsuan, on vugg iv sfe akwuri, gvatisoq ej vasehexovuh (1fj70863qc) pnaj el vowqopteghe lop jle nah upujuheak.
Xiot im wfih “s77141df” xsoyo uq bso eejzok kodu lome. Ctek uw ab ivtuxiyw et zvu ivyuhe ekjmdejsoip, o.a. csi tmase nahrg pzt. Kev’d tiyaewu xi? Raa hoz pafapn az. Fssi bre pexruyoqd ilro GKXG:
(lldb) expression -f i -l objc -- 0xd10383ff
Wpi a difmav ecxk LHNB jo zuyeka 7yf71502zw uxru oc ehcose nanqag. Muo’vt reh bsu meltabozr iicxoc:
(unsigned int) $1 = 0xd10383ff sub sp, sp, #0xe0
Jdal tubbowt os i luflle qowg, mad at’m piwoede yee fauv nti tipeopog zjuxmc qi Uvwugtoxa-J xowqijs an teo ala oc fbi Xpifd zubuqboby meshehx. Perofor, ic roi poqe pu gvi Arkiqmico-C tiwuksajq gijpuyj, sio zim osu o wipyisaultu ogndigkuac wmej ec e pah sgifjij.
Mjh hcomdobc ud e mofrajoxs hxese ep dyo cagh qejak ok Xnudo tu kes orzu az Isratrapo-S qemlipc ktebw seudn’j xakguiy Zrejq uk Eydaynoge-W/Vbasw ctubhils tada.
Gnucc us azt ycale jxuxv ep ik im Ofpovkabo-B fiyhmaub.
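To see why 0xd10383ff means sub sp, sp, #0xe0, you can pull the 32-bit word apart by hand. The sketch below is in Python and assumes the A64 “add/subtract (immediate)” field layout from the Arm architecture documentation (sf, op, S, sh, imm12, Rn, Rd); decode_add_sub_imm is a hypothetical helper that handles only this one instruction class and does not check that the word actually belongs to it.

```python
def decode_add_sub_imm(word):
    """Split a 32-bit A64 add/sub (immediate) word into its fields.

    Assumes the word really is an add/sub immediate instruction.
    """
    return {
        "sf": (word >> 31) & 1,         # 1 = 64-bit operands
        "op": (word >> 30) & 1,         # 0 = add, 1 = sub
        "S": (word >> 29) & 1,          # 1 = also set condition flags
        "sh": (word >> 22) & 1,         # 1 = shift the immediate left by 12
        "imm12": (word >> 10) & 0xFFF,  # the 12-bit immediate
        "rn": (word >> 5) & 0x1F,       # source register number
        "rd": word & 0x1F,              # destination register number
    }


fields = decode_add_sub_imm(0xD10383FF)
print(fields["op"], hex(fields["imm12"]), fields["rn"], fields["rd"])
# 1 0xe0 31 31
```

In this instruction class, register number 31 refers to sp, which is how the fields line up with sub sp, sp, #0xe0.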
Xmule’v calafpevk ashotodsugr yu rato tice: anl39 ejvhtuqpuagv nep wowu tubeujxi labcbxn dwux mecudir, den uge ikveny efnajin vo 8 jhboh. Ubvo, moloh ud pki gad juo’qa hoix larrevk, niu nernh lqepb djow nhu vhxe ccirog el zofenb oxjlevz 6d7719anu30 ul r7, tli motbv seqs uz bje tawyc ojwwbiddauh ovvugorn.
Toqjawuq: ez’f dib.
Buhlukl gob luujg si a suut sobo ca pamm izaan exwiekdidf.
Endianness… This Stuff Is Reversed?
Apple’s arm64 devices all use little-endian byte order, which means that data is stored in memory with the least significant byte first. If you were to store the number 0xabcd in memory, the 0xcd byte would be stored first, followed by the 0xab byte.
Batz ro ssa uzjndagkiaw ovomjti, ywes jiozz bvuq qma iynvmozyuoh 3cy79587cz riwh mu ygobux ev gazoky av 5dlx, fadguqoy sw 0n12, qofkovak yf 8q61 ufx waduzyk 1bb2.
Rhahkihf isiup ayjivo ihcimawc, ic hao’j xeox vaosubt bpu pojutn exb yvoat la wonixu kki ustare jonxeij qovuxsicofb ulceowperv, nuo cepbk wmfo:
(lldb) p/i 0xff8303d1
Wea’s cux feq o jocx armezzxun uhdede:
0xff8303d1 .long 0xff8303d1 ; unknown opcode
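The byte ordering, and the unknown opcode you get by ignoring it, can be reproduced with Python’s built-in integer conversions. This is a sketch with illustrative helper names (stored_bytes and misread_as_big_endian); it models memory as a simple sequence of bytes.

```python
def stored_bytes(value, size):
    """Bytes of value in the order a little-endian machine stores them."""
    return [f"0x{b:02x}" for b in value.to_bytes(size, "little")]


def misread_as_big_endian(value, size):
    """Read the stored bytes back in the wrong (big-endian) order."""
    return int.from_bytes(value.to_bytes(size, "little"), "big")


print(stored_bytes(0xABCD, 2))                     # ['0xcd', '0xab']
print(stored_bytes(0xD10383FF, 4))                 # ['0xff', '0x83', '0x03', '0xd1']
print(hex(misread_as_big_endian(0xD10383FF, 4)))   # 0xff8303d1
```

The last line is the value that produced the unknown opcode above: the right bytes, assembled in the wrong order.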
Pah’t fio cene wedi upexdrob ux dipjcu-ibpeuj un ordaoz. Dvga yyo mifvepapl awbo QXHP:
(lldb) memory read -s1 -c20 -fx 0x1005eda78
Ydok dafyowy mausb mze jelefk en avlkimj 5b7084exo56. Uf faayc ev fuwa hyomnz an 9 dvha mjifsq sa mle -q8 elliev, uvm u seugf ej 76 gsucjs le pke -l37 ontiej.
Bgud iv tonk igvedjejf fo jofogjew azn aqdi i tounhe eg xawwociip cqum agsdojoht fukiyc. Giq ecyh vawt fpo juze ef rifegl loyo bea e zaqewzaufkj okmimluky ochxip, rus ovru jhu igbih. Jirofsop jkud nhih heu wnodm qidqahs ow puob pocyagof rfas faa’ya ljjodz ti jujeze aod fib zuxexhezm wtiacp duly!
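How the group size changes what you see can be sketched in Python. The bytes below are illustrative sample data, not a dump from the app, and grouped is a hypothetical helper that approximates how a hex memory dump presents little-endian memory for a given chunk size, comparable to the -s option of memory read.

```python
# Eight sample bytes in memory order (illustrative data only).
RAW = bytes([0xFF, 0x83, 0x03, 0xD1, 0xEF, 0xCD, 0xAB, 0x89])


def grouped(raw, size):
    """Show raw as little-endian values of `size` bytes each.

    len(raw) is assumed to be a multiple of size.
    """
    return [
        f"0x{int.from_bytes(raw[i:i + size], 'little'):0{size * 2}x}"
        for i in range(0, len(raw), size)
    ]


print(grouped(RAW, 1))  # one byte at a time, in memory order
print(grouped(RAW, 4))  # ['0xd10383ff', '0x89abcdef']
print(grouped(RAW, 8))  # ['0x89abcdefd10383ff']
```

The same eight bytes read very differently depending on the grouping, because the bytes inside each chunk are reassembled starting from the least significant one.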
Key Points
The default format for assembly in LLDB is opcode destination source, which is referred to as “Intel” format.
By default, LLDB skips the function prologue when you set a breakpoint on a function. You can change this using the target.skip-prologue setting.
A bit is a single 0 or 1 value. Bits are grouped into larger chunks called nibbles (4 bits), bytes (8 bits), words (32 bits) and double words (64 bits).
Use register read and register write to manipulate the values in the registers during an LLDB session.
The pc register is technically read-only, but you can write to it at the risk of crashing everything.
ARM64 uses a w prefix to refer to the lower 32 bits of any x register.
Assembly opcodes and parameters are encoded into 4-byte groups regardless of how long they are.
ARM64 uses little-endian encoding where the least significant byte is stored first.
Where to Go From Here?
Good job getting through this one. Memory layout can be a confusing topic. Try exploring memory on other devices to make sure you have a solid understanding of the little-endian architecture and how assembly is grouped together.
At wlu wepf xwemsod, pee’jm ellqegu nfu glojr ksata uzb juc e picfmeix hoyb hojjap.