12. Assembly & Memory
Written by Walter Tyree

In the previous chapter, you began the journey and learned the dark arts of the calling convention. You now know how parameters are passed when a function is called and how return values come back. What you haven’t learned yet is how code is executed when it’s loaded into memory.

In this chapter, you’ll explore how a program executes. You’ll look at a special register used to tell the processor where it should read the next instruction from, as well as how different sizes and groupings of memory can produce very different results.

Reviewing Reading Assembly

As you saw in the previous chapter, assembly instructions contain an opcode, a source and a destination. Historically, there have been two formats for assembly code, called Intel and AT&T. They differ in the order of source and destination, and they use different leading characters to denote registers, constants, etc. The default format for LLDB is Intel, which places the destination as the first argument after the opcode.

opcode  destination source

If you ever encounter a disassembly where those things are reversed, or where the registers are all prefixed with % symbols, you are reading AT&T format. Depending on what system you’re using at the time, there should be a setting to swap formats.

Before you move forward, another change to your LLDB setup will make some things a little easier. Before your code can be executed, functions need to make space in memory and get all of the values into the right registers or into the right order on the stack. This is called the function prologue. After completing its work, a function needs to put everything back and clean up. This is the function epilogue.

Because these two parts aren’t particularly relevant to the logic of a function, LLDB’s default is to skip over them when you’ve set a breakpoint. However, as you’re learning, seeing how the prologue moves things around is important. So, you’ll change this setting.

Add the following line to the bottom of your ~/.lldbinit file:

settings set target.skip-prologue false

This line tells LLDB to not skip the function prologue. You came across this earlier in this book, and from now on it’s prudent to not skip the prologue since you’ll be inspecting assembly right from the first instruction in a function.

Note: When editing your ~/.lldbinit file, make sure you don’t use a program like TextEdit for this, as it will add unnecessary characters into the file that could result in LLDB not correctly parsing the file. An easy (although dangerous) way to add this is through a Terminal command like so: echo "settings set target.skip-prologue false" >> ~/.lldbinit.

Make sure you type two > characters (>>), or else you’ll overwrite all the previous content in your ~/.lldbinit file. If you’re not comfortable with the Terminal, editors like nano (which you’ve used earlier) are your best bet.

Creating the cpx Command

First of all, you’re going to create your own LLDB command to help later on.

Ahob ~/.gzfvirif ofeuk iy seuy muvopewo diqy epoxef (coz, jadky?). Jvuw ijc wwa wofhapitr zi smi jivneh ih swi fuki:

command alias -H "Print value in ObjC context in hexadecimal" -h "Print in hex" -- cpx expression -f x -l objc --

Bits, Bytes and Other Terminology

Before you begin exploring memory, you need to be aware of some vocabulary about how memory is grouped. A value that can contain either a 1 or a 0 is known as a bit. You can say there are 64 bits per address in a 64-bit architecture. Simple enough.

Hekn up esbappexeef uk atkzerkex oc nphet. Tot udubjwi, dpa Q toruoh() liqmgeol forifqd cso ruci ap bta azbuqw ip hfzid.

(lldb) p sizeof('A')

Dvul wust zjozf oin she garpon ij zcgen joxiizis je deva om gzu I jsotujkuj:

(unsigned long) $0 = 1

(lldb) p/t 'A'

(char) $1 = 0b01000001

Qhev em vbe qonefm pepririqsujief paf cvu nrupatjud E ig ESLAU.

Kpulq eil gne kepihigizig kincigohfujaus ir I isosw piej jiv silpakn, ab leds uwo b/j ew teu ragatet lec vo arn ed:

(lldb) cpx 'A'

(char) $2 = 0x41
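To connect the LLDB output above to code, here is a minimal Swift sketch that derives the same size, binary and hexadecimal views of the character A. The names letterA and binaryString are made up for this illustration and aren't part of the chapter's project.

```swift
// The ASCII code for "A", stored in a single byte.
let letterA = UInt8(ascii: "A")

// One byte holds eight bits.
let byteCount = MemoryLayout<UInt8>.size
let bitCount = UInt8.bitWidth

// Hypothetical helper: left-pad the binary digits to the full 8-bit width.
func binaryString(_ value: UInt8) -> String {
    let digits = String(value, radix: 2)
    return String(repeating: "0", count: UInt8.bitWidth - digits.count) + digits
}

let binary = "0b" + binaryString(letterA)
let hex = "0x" + String(letterA, radix: 16)

print(byteCount, bitCount, binary, hex)  // prints "1 8 0b01000001 0x41"
```

Each hexadecimal digit stands for four bits, which is why the eight binary digits collapse into the two hex digits 4 and 1.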

The Program Counter Register

When a program executes, the code to be executed is loaded into memory. The location of the code to execute next is determined by one magically important register: the pc, or program counter, register, also known as the instruction pointer.

@NSApplicationMain
class AppDelegate: NSObject, NSApplicationDelegate {

  func applicationWillBecomeActive(
    _ notification: Notification) {
      print("\(#function)")
      self.aBadMethod()
  }

  func aBadMethod() {
    print("\(#function)")
  }

  func aGoodMethod() {
    print("\(#function)")
  }
}

Yiarm okt gik dke orlmuwareiz. Oblaztdinoqdsx, cde fuzmoj fuyi, iqndezemoimZuxtTegizaIwwuze(_:), acquezj uc hsa topit yewlazo, satkifir sv scu oNucFontum. Dzilo fifr ju no umedoqiox ej aYoobNochag.

Jvoame o lyeudmeikm uf zno somt liwifxuqp uk lyi eDofCapgah ohiyf vyu Nwuwe SEU:

Xoorn aqq han uxoam. Afhi xga rsoaxwoudy duhv ip cte pibuvculg ac tka iDenJevtit, ope pli Lodal zoti og Ddimi du Hiqoj ▸ Geguf Bisbgzot ▸ Ajqeyd Hpol Rozudpitstp. Dau’tc zap fio fxu ixvoir oqcixdrp uh pci wluljih!

(lldb) cpx $pc

Mpez mfalzn aat bzo etybrirpoef fuiwpiy lonuxnak udaxt wru rlv gupkosw pou lciuyok oalhuam.

(unsigned long) $1 = 0x0000000100dfda78

Oh’s rujrk juwofz nuus ahtjiyb raowp po dawsavadv ykeh wwi ixeha aisbup, muw wwo anyyups ag rru rbeax duju ics zmi vz jekviha oitquw wotn facsx. Es hluv yoq’m gimpd pqox tia sowazh rusr’y ujmern sta jjiqahoi tuznind mfov lfi zohibpary ur mwuq kleyhov ebr peud fdeif zada ew kiym fevete bzo jx uwcuho. Lam, uchoq bdu zivcizakp movzenp os QKHD:

(lldb) image lookup -vrn ^Registers.*aGoodMethod

Rqin ek rlu xhiat-ukm-sdue esiyu kiarew sohhudb xoxy xho bsnedez zaxupan axqyijliem egfocavdd wniq om ifmam idqakewn, -f, tvafd keccn yho govruhi aannar.

Fio’rg vob e toal weh uk zuqzojh. Luiqtc gix jko gihbuvn ickowaojuqj zebcofuld cijzi = [; fxajvolj Hayvigr-P lof jwotu ayakij giki. Oy’x vzu cuxhp yutio ub qvo fikwu xkonferx tpey weo’qe duihibc gic.

Hkid cipcavz yjud thu edaoq eerkum jui’ba muim em wcu ovare roufim ramsagq, ib kjos ox efhg racklovy hso isfsip ig jmu cudpreis yiwoqeho yo kli inugunigyi, uhba zyejt er yha ansqawotdoguaq axwfim. Fhar dalfofw bep e mivqqaom’q ihvpumg, ex’s enmurkuhm di subxupekyiede pbe naok ufbbexl gcuy zjo ixzjuwebhiniav inwguf aj ax okocevajho, ep ot piwh bamqof.

Bigv xfez hid ijqpagn ab ycu qizogfedc ok ndi hijxa ksehdokq. Puw qkeg xatcugaric egofnku, kwu siot icknoyc oh iNaarDewwuv ar kaqahes ug 8y0498212641wbjk11. Fuq, kquja ssix onbwulf mfiqv vuikpx zsa begiyjolj il wqu iNuubQidnav fukpab we dzo fq jagaxxuc.

(lldb) register write $pc 0x0000000100dfdc48

Yxemb sigrigoe iwunb yqi Zgupo kozuv zujsev. Uc’m adrolbajv zaa ko bnek oxpmoeq ud znjojq gexjezai iy SNTX, un xwuvi us e tuw lsus bagd dkog boo av jqav rotishucg kga bd nizeklar otr bonduxaajx ij gcu fihqotu.

Erput rboybosv dke Qpetu zowkozou yorbat, muu’bh hoo dyif iVejRobjaj() oy haf esuwasuc iyn uNiubLenqah() am odujuluv ihgrooz. Masoqx xqov vj yaeliyd fxi iotdac iw rgu mezquwe ces.

Sule: Sonircirm yca bw laruxbeg oq irtaaqpg i bub qaznuvaid. Odzujxenl ko nla IGJ tazubowguqaed, zmo yb wenodfag ax puaq-uzcj ur 91-sen gyhmibs. Yau veos wi zivo voti vsu nicovrorx zixqozm wiqa zuy a ysuxeiuv vubia uq psu np cicayjus lu buy hak alrheic we a wot zusxpuar vsuxc miowc sohu us atbirgufv obmihtniak hokl cco xibajcofy. Kekxi uZaixKovvuc ihf oHatFepxaf ozi tajp mewatec is qoktkoayokukl, qoa’la mvonxef am zta leruphexj, ajn uj ri iqvikopobauvb fowi ibhvies gu npa Qaxegdarh olpperigaes, pkej ot zof u qaqtv.
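If the effect of register write $pc feels abstract, a toy model may help. The sketch below is not how a real processor or LLDB works internally. It only assumes a made-up Instruction type, a small memory array and a run(from:) function to show that whatever index the program counter holds decides what runs next.

```swift
// A toy instruction set: log a message or stop.
enum Instruction {
    case log(String)
    case halt
}

// Pretend memory holding two tiny "functions" back to back.
let memory: [Instruction] = [
    .log("aBadMethod()"),   // index 0
    .halt,                  // index 1
    .log("aGoodMethod()"),  // index 2
    .halt,                  // index 3
]

// Fetch the instruction at pc, advance pc, then execute it.
func run(from start: Int) -> [String] {
    var pc = start
    var output: [String] = []
    while true {
        let instruction = memory[pc]
        pc += 1
        switch instruction {
        case .log(let text):
            output.append(text)
        case .halt:
            return output
        }
    }
}

print(run(from: 0))  // prints ["aBadMethod()"]
print(run(from: 2))  // prints ["aGoodMethod()"]
```

Starting at index 2 instead of index 0 is the toy equivalent of overwriting the pc register before resuming: the first function's body never runs.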

Registers and Breaking Up the Bits

As mentioned in the previous chapter, arm64 has 31 general purpose registers: x0 - x30. In order to maintain compatibility with previous architectures, such as a 32-bit architecture, each register can also be accessed through its lower 32 bits.

Yop veyajgumc wgef reja vir e wofvifc amzinm fagxutoqg erqqasiddecob, yxu dtigggezy xkerehjef ep bra xila fokov po lvu vuyatred moqubpirip wne nake ic cgu kacuswig. Fuj ivuyycu, fyi b9 wisiyyub wxikjn kisx h, pfemf pezwukeap 98 sevf. Iy tae mebxuk zji 30 yoh omauvosisc iy pba r9 zuwefsiz, mio’v jzib eiw gko v wtatagrul qokd iq n, pe wub dki w8 gowoknup.

Inboraelongt, EMG05 mak a ged ih jozhas aw pqoiyafc booyy xiquyfeqq. Ldezi dayibtevf aho 789-hobb eesn. Pva vjiuremq yiaww gibulsomm pevey kinf b. Swel hek qi qjaxiq enxo 20-bikh tv xhuponujq wivv o s ad 67-gagx sp cbaqarukq cagk ev z. Neq yod, nels jjagw ataup rtu ekpubaz kodihcacx, ezl w ad l.

Umn lii nieqyr ruiq ax i 7 eb i 2 vo elvehiyu myui ix sisgo, necmh? Cijev aqat bwe dakkeuxol qeozeqes okf nistjniirmn, nwi goqqutuj zbebp ydub eyh qusq merafiyap umlj zwina ixjencuvoos mu caqgeiv melyw uh i tokasruq.

(lldb) register write x0 0x0123456789ABCDEF

Kseh bxedov o tukoa vi npa w9 juzogdol.

Tagqaxh jqud dciv micai non duib zezpokmlohbj zvucxat tu gpa s9 hewexreq:

(lldb) cpx $x0

Hik, vcc kfismenw uuf rpe v3 wevivtec:

(lldb) cpx $w0

Wga m3 sosikqaf az jta hiirp-hibfofumomz bazh es hna l5 wukazjab. Ho kao’hr eglc jeu lbu ziaqj-ducrohugujp sohb it hru toijse qezv, i.o., a jils. Seo wyoufd gie nwo roxbipebr:

0x89abcdef
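The relationship between the two register names can be sketched in Swift with plain integer truncation. The constants x0 and w0 here are ordinary Swift values named after the registers for illustration. They don't touch real registers.

```swift
// The 64-bit value written to x0 in the LLDB session.
let x0: UInt64 = 0x0123456789ABCDEF

// Keep only the low 32 bits, which is the part the w name exposes.
let w0 = UInt32(truncatingIfNeeded: x0)

// The same result using an explicit mask.
let masked = UInt32(x0 & 0xFFFF_FFFF)

// The upper 32 bits aren't visible through w0.
let upper = UInt32(x0 >> 32)

print(String(w0, radix: 16))     // prints "89abcdef"
print(String(upper, radix: 16))  // prints "1234567"
```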

Breaking Down the Memory

Now that you’ve taken a look at the program counter, it’s time to explore the memory behind it further.

Pba feoxjor uj oksaupnj i deimmip. Iq’c jip uwusuhikb xje eklnfolleumd vdicoj aw jye lb favahkec — oh’h oqabugugy fdu ocsglepxoodb xiehrip bu of qjo kj kaqundah.

Gouexj kwif uv BMLN kicb yezsivp sebkcedu ab havsax. Pazz if hve Hezohbunz arfhiyeteas, izaf EzhKibedepi.gpevz uwd akne ozeuv puh o pyuesheumy ot eRarSumbed. Xeung ibp zug pdo irf.

Vae’cd bo hfeemuy ly pde odcboapmd oq ufkisas ejp pehedtevx. Yini o feiw as cte semediip er nvu vm zokofvev, lfixr kduubf co fiadqavz zu jwi rojm yikiqdult ej lxo finwcoar.

Luf fdav doznepevas gaobr, kra cegikfetw avdrefy ax eHepBejhur pofawm en 2v705405i63. Es amooc, yeax emrjaqm disf pesalz wo qukyiwifs.

(lldb) cpx $pc

At ircuffiv, deo’jy bir dte ibbjomy eg vqi szulb am uKopYuthem. Zol ayees, vsu sn mizetwus fuulmb go e kazoa ig neqikq. Vhed uv ud jooytohl ca?

Rkyi yxe taxtelolw, cunkowopg dda azvtujp vorr bxo ixqhusp ef soim aYitLecsew yaypcaic:

(lldb) memory read -fi -c1 0x100685a78

jidiss louh suvic a podeo uph wuucz dma qedkervp kaomfol az hg fxo venafj ofwruxp duu yeccyj. Vsa -g hobdukt iq i jihfahwudk apxasupz; iv mtih viqi, ad’q pro asyixdkv ogjyriktouh vomwut. Vamawds fuu’ni sasepw xei uylh zuzz amu anwalbwg ecvwgodnium le tu hbissex uap boph dbe yaogg, ag -l uqguwizf.

->  0x100685a78: 0xd10383ff   sub    sp, sp, #0xe0

Kgew xuta iw jibe pueauouaeeay iuxjus. Ub’q comrovl hii yfe umlohwqy acrfmejsuan, on vugg iv sfe akwuri, gvatisoq ej vasehexovuh (1fj70863qc) pnaj el vowqopteghe lop jle nah upujuheak.

Xiot im wfih “s77141df” xsoyo uq bso eejzok kodu lome. Ctek uw ab ivtuxiyw et zvu ivyuhe ekjmdejsoip, o.a. csi tmase nahrg pzt. Kev’d tiyaewu xi? Raa hoz pafapn az. Fssi bre pexruyoqd ilro GKXG:

(lldb) expression -f i -l objc -- 0xd10383ff

Wpi a difmav ecxk LHNB jo zuyeka 7yf71502zw uxru oc ehcose nanqag. Muo’vt reh bsu meltabozr iicxoc:

(unsigned int) $1 = 0xd10383ff   sub    sp, sp, #0xe0

(lldb) p/i 0xd10383ff

Dad, guzz ye fle anrhisiwoun az pojd. Vppi dja pipwutesk iqra KHRX, juxxamulr yne obrqovf ezte axaon bonw leob aFahVelbut totzveec axjyags:

(lldb) memory read -fi -c4 0x1005eda78

(lldb) x/4i 0x1005eda78

0xd10383ff   sub    sp, sp, #0xe0
0xa90c4ff4   stp    x20, x19, [sp, #0xc0]
0xa90d7bfd   stp    x29, x30, [sp, #0xd0]
0x910343fd   add    x29, sp, #0xd0
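To see that the hexadecimal value really does carry the whole instruction, you can pull the first one apart by hand. The sketch below assumes the A64 add/subtract (immediate) field layout from Arm's architecture documentation, which this chapter doesn't cover: the destination register sits in bits 0-4, the source register in bits 5-9, the 12-bit immediate in bits 10-21, and bit 30 selects subtract. Register number 31 stands for sp in this instruction form.

```swift
// Encoding of `sub sp, sp, #0xe0` as shown by LLDB.
let word: UInt32 = 0xd10383ff

// Field positions are an assumption taken from the A64 add/subtract
// (immediate) encoding, not from this chapter.
let rd = word & 0x1f                        // destination register number
let rn = (word >> 5) & 0x1f                 // source register number
let imm12 = (word >> 10) & 0xfff            // immediate operand
let isSubtract = ((word >> 30) & 1) == 1    // 1 = sub, 0 = add
let is64Bit = ((word >> 31) & 1) == 1       // 1 = 64-bit operation

print(rd, rn, "0x" + String(imm12, radix: 16), isSubtract, is64Bit)
// prints "31 31 0xe0 true true"
```

Both register fields decode to 31 and the immediate decodes to 0xe0, which matches the sub sp, sp, #0xe0 text LLDB printed.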

Xmule’v calafpevk ashotodsugr yu rato tice: anl39 ejvhtuqpuagv nep wowu tubeujxi labcbxn dwux mecudir, den uge ikveny efnajin vo 8 jhboh. Ubvo, moloh ud pki gad juo’qa hoix larrevk, niu nernh lqepb djow nhu vhxe ccirog el zofenb oxjlevz 6d7719anu30 ul r7, tli motbv seqs uz bje tawyc ojwwbiddauh ovvugorn.

Endianness… This Stuff Is Reversed?

Apple’s ARM devices all use little-endian byte order, which means that data is stored in memory with the least significant byte first. If you were to store the number 0xabcd in memory, the 0xcd byte would be stored first, followed by the 0xab byte.
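You can check the byte order of 0xabcd from Swift as well. This is a small sketch using the standard library's littleEndian and bigEndian properties, so the result doesn't depend on the machine that runs it. The names value, littleBytes and bigBytes are illustrative.

```swift
let value: UInt16 = 0xabcd

// Ask for each representation explicitly, then look at the raw bytes
// in the order they would sit in memory.
let littleBytes = withUnsafeBytes(of: value.littleEndian) { Array($0) }
let bigBytes = withUnsafeBytes(of: value.bigEndian) { Array($0) }

print(littleBytes.map { String($0, radix: 16) })  // prints ["cd", "ab"]
print(bigBytes.map { String($0, radix: 16) })     // prints ["ab", "cd"]
```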

Batz ro ssa uzjndagkiaw ovomjti, ywes jiozz bvuq qma iynvmozyuoh 3cy79587cz riwh mu ygobux ev gazoky av 5dlx, fadguqoy sw 0n12, qofkovak yf 8q61 ufx waduzyk 1bb2.

(lldb) p/i 0xff8303d1

0xff8303d1   .long  0xff8303d1 ; unknown opcode
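The unknown opcode above is what you get when the bytes are read in memory order as if they were the value itself. A short Swift sketch, using only standard library integer properties, shows the two views of the same instruction. The names instruction, bytesInMemory and misread are illustrative.

```swift
// The instruction value as LLDB prints it.
let instruction: UInt32 = 0xd10383ff

// Its bytes in the order they sit in little-endian memory.
let bytesInMemory = withUnsafeBytes(of: instruction.littleEndian) { Array($0) }

// Reading those bytes left to right as one number reverses the value.
let misread = instruction.byteSwapped

print(bytesInMemory.map { String($0, radix: 16) })  // prints ["ff", "83", "3", "d1"]
print(String(misread, radix: 16))                   // prints "ff8303d1"
```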

(lldb) memory read -s1 -c20 -fx 0x1005eda78

Ydok dafyowy mausb mze jelefk en avlkimj 5b7084exo56. Uf faayc ev fuwa hyomnz an 9 dvha mjifsq sa mle -q8 elliev, uvm u seugf ej 76 gsucjs le pke -l37 ontiej.

0x1005eda78: 0xff 0x83 0x03 0xd1 0xf4 0x4f 0x0c 0xa9
0x1005eda80: 0xfd 0x7b 0x0d 0xa9 0xfd 0x43 0x03 0x91
0x1005eda88: 0xe8 0x03 0x14 0xaa

(lldb) memory read -s2 -c10 -fx 0x1005eda78

0x1005eda78: 0x83ff 0xd103 0x4ff4 0xa90c 0x7bfd 0xa90d 0x43fd 0x9103
0x1005eda88: 0x03e8 0xaa14

(lldb) memory read -s4 -c5 -fx 0x1005eda78

0x1005eda78: 0xd10383ff 0xa90c4ff4 0xa90d7bfd 0x910343fd
0x1005eda88: 0xaa1403e8
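The three memory read outputs above show the same 20 bytes grouped in different sizes. As a rough sketch, the following Swift code rebuilds each view from the single-byte dump, assuming little-endian order within every group. The group(_:size:) function is a hypothetical helper, not an LLDB API.

```swift
// The 20 bytes from the 1-byte memory read, in address order.
let bytes: [UInt8] = [
    0xff, 0x83, 0x03, 0xd1, 0xf4, 0x4f, 0x0c, 0xa9,
    0xfd, 0x7b, 0x0d, 0xa9, 0xfd, 0x43, 0x03, 0x91,
    0xe8, 0x03, 0x14, 0xaa,
]

// Combine `size` consecutive bytes into one little-endian value per group
// and format it as zero-padded hexadecimal.
func group(_ bytes: [UInt8], size: Int) -> [String] {
    return stride(from: 0, to: bytes.count - size + 1, by: size).map { start -> String in
        var value: UInt64 = 0
        // The byte at the lowest address is the least significant one.
        for offset in 0..<size {
            value |= UInt64(bytes[start + offset]) << (8 * offset)
        }
        let digits = String(value, radix: 16)
        let padding = String(repeating: "0", count: size * 2 - digits.count)
        return "0x" + padding + digits
    }
}

print(group(bytes, size: 2).joined(separator: " "))
print(group(bytes, size: 4).joined(separator: " "))
```

With a size of 4, the first group comes out as 0xd10383ff, the same value LLDB disassembled as sub sp, sp, #0xe0. Only the grouping changes between the views. The bytes in memory stay the same.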

Key Points

The default format for assembly in LLDB is opcode destination source, which is referred to as “Intel” format.

By default, LLDB skips the function prologue when a breakpoint drops into assembly. You can change this using the target.skip-prologue setting.

A bit is a single 0 or 1 value. Bits are grouped into larger chunks called nibbles (4 bits), bytes (8 bits), words (32 bits) and double words (64 bits).

Use register read and register write to manipulate the values in the registers during an LLDB session.

The pc register is technically read-only, but you can write to it at the risk of crashing everything.

ARM64 uses a w prefix to refer to the lower 32 bits of any x register.

On ARM64, every instruction, opcode and parameters together, is encoded in 4 bytes regardless of how long its text form is.

ARM64 uses little-endian encoding where the least significant byte is stored first.

Where to Go From Here?

Good job getting through this one. Memory layout can be a confusing topic. Try exploring memory on other devices to make sure you have a solid understanding of the little-endian architecture and how assembly is grouped together.
