iOS Assembly Tutorial: Understanding ARM

Learn how to read assembly in iOS – a useful skill when debugging your code or diagnosing why a crash has occurred. By Matt Galloway.

Do you speak assembly?

When you write Objective-C code, it eventually turns into machine code – the raw 1s and 0s that the ARM CPU understands. In between Objective-C code and machine code, though, is the still human-readable assembly language.

Understanding assembly gives you insight into your code for debugging and optimizing, helps you decipher the Objective-C runtime, and also satisfies that inner nerd curiosity.

In this iOS assembly tutorial, you’ll learn:

  • What assembly is – and why you should care about it.
  • How to read assembly – in particular, the assembly generated for Objective-C methods.
  • How to use the assembly view while debugging – useful to see what is going on and why a bug or crash has occurred.

To get the most out of this tutorial, you should already be familiar with Objective-C programming. You should also understand some simple computer science concepts such as the stack, the CPU and how they work. If you are not at all familiar with CPUs, you might need a little pre-reading before continuing.

Warm up your copy of Xcode 4 and get ready to dive into the innards of ARM!

Getting Started: What is Assembly?

Objective-C is what is known as a high-level language. Your Objective-C code is compiled by a compiler into assembly language: low-level, but still not the lowest level.

This assembly is then assembled by an assembler (say that three times fast!) into machine code, the raw 1s and 0s that the CPU reads. Fortunately, you don’t ever need to worry about machine code, but understanding assembly in detail is sometimes extremely useful.

Each assembly instruction is designed to tell the CPU to perform a task such as “add these two numbers” or “load the contents of this portion of memory.”

Aside from main memory – the 1GB on an iPhone 5 or the 8GB you might have on a Mac, for example – CPUs also have a little bit of working memory that can be accessed very quickly. This working memory is divided up into registers, which are like variables that can hold a single value.

All iOS devices (in fact, pretty much all mobile devices out there these days) use CPUs based on the ARM architecture. Fortunately, this is a fairly easy-to-read instruction set, not least because it’s what is known as RISC (Reduced Instruction Set Computing), meaning that there are fewer instructions available. It is much easier to read than x86, anyway!

An assembly instruction (or statement) looks something like this:

mov r0, #42

There are many commands, or opcodes, in assembly language. One of them, mov, moves data around. In ARM assembly, the destination comes first, so the above instruction moves the value 42 into register r0. Consider this next example:

ldr	r2, [r0]
ldr	r3, [r1]
add	r4, r2, r3

Don’t worry, I’m not expecting you to understand what that means straight away. But you might be able to roughly figure out what’s going on. The instructions load two values from memory into registers 2 and 3, then add the numbers together and store the result in register 4.
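If you think more easily in C, the three instructions above correspond roughly to the sketch below. The function name loadAndAdd and its parameter names are hypothetical, chosen only for illustration. The comments pair each C statement with the instruction it mirrors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical C equivalent of the ldr/ldr/add sequence.
 * addr0 and addr1 play the roles of r0 and r1, which hold memory addresses. */
int loadAndAdd(const int *addr0, const int *addr1) {
    /* Loading from a null address would crash, so check first. */
    assert(addr0 != NULL && addr1 != NULL);

    int r2 = *addr0;   /* ldr r2, [r0]   : load the value that r0 points at */
    int r3 = *addr1;   /* ldr r3, [r1]   : load the value that r1 points at */
    int r4 = r2 + r3;  /* add r4, r2, r3 : r4 = r2 + r3 */
    return r4;
}
```

The local variables are named after the registers purely to make the mapping easy to follow; a real compiler is free to pick different registers.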

Now that you’ve seen it’s not so intimidating, let’s get a little more detailed.

Calling Conventions

The first and most important thing to understand about reading assembly is the way in which code interacts with other code. By this, I mean the way functions “call” other functions. This includes how parameters are passed to functions and how values are returned from functions.

The way these things are done makes up what is known as the calling convention. Compilers must stick to this defined standard so that code compiled with one compiler can interact with code compiled with a different compiler. Without this standard, compilers could generate incompatible code.

As discussed above, registers are bits of memory very close to the CPU that are used to hold the data currently being acted upon. ARM CPUs contain 16 registers numbered r0 to r15, each of which is 32 bits wide. The calling convention dictates that some of these registers have a special purpose. They are as follows:

  • r0 - r3: These hold parameters passed to a function.
  • r4 - r11: These hold a function’s local variables.
  • r12: This is the intra-procedure-call scratch register. This register is special in that it can be changed across a function call.
  • r13: The stack pointer. The stack is a very important concept in computer science. This register holds a pointer to the top of the stack. See Wikipedia for more information about stacks.
  • r14: The link register. This holds the address of the next instruction to execute when returning from the current function.
  • r15: The program counter. This holds the address of the currently executing instruction. It is automatically incremented after each instruction is executed.

You can read more about the ARM calling convention in this document from ARM. Apple also has a document outlining further details about the calling convention used for iOS development.
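To make the register assignments concrete, here is a small C sketch. The function sumFive is hypothetical and exists only for illustration. The comments describe where the convention outlined above would typically place each argument on 32-bit ARM; the exact code generated depends on the compiler and its settings.

```c
#include <stdint.h>

/* Hypothetical function with five 32-bit parameters.
 * Under the convention described above:
 *   a -> r0, b -> r1, c -> r2, d -> r3
 *   e -> passed via the stack, because r0 - r3 are already taken
 * The caller's return address is held in lr (r14). */
int32_t sumFive(int32_t a, int32_t b, int32_t c, int32_t d, int32_t e) {
    return a + b + c + d + e;
}
```

A call such as sumFive(1, 2, 3, 4, 5) would therefore load the first four values into registers and write the fifth to the stack before branching to the function.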

Right, enough etiquette – time to get started with some real coding!

Creating the Project

In this iOS assembly tutorial, you won’t create an app, but you’ll still use an Xcode project to illustrate what’s going on. Start Xcode and go to File\New\New Project, select iOS\Application\Single View Application and click Next. Set up the project like so:

01 - Create the project

  • Product name: ARMAssembly
  • Company Identifier: Your usual reverse DNS identifier
  • Class Prefix: Leave blank
  • Devices: iPhone
  • Use Storyboards: No
  • Use Automatic Reference Counting: Yes
  • Include Unit Tests: No

Click Next and finally, choose a location to save your project.

One Plus One

The first thing you’ll do is look at a very simple function that adds two numbers and returns the result. You can’t get much easier than that!

Actually, you can, by starting with a simple C function, because Objective-C adds a little more complexity. Open main.m in the project’s Supporting Files folder and paste the following function at the top of the file:

int addFunction(int a, int b) {
    int c = a + b;
    return c;
}

Now make sure that the scheme is set to build for a device by selecting iOS Device as the scheme target (or it may say <Your_Device_Name>, such as “Matt Galloway’s iPhone 5”, if you have a device plugged in). You want to build for a device so that the assembly generated is ARM, rather than the x86 that the Simulator uses. The scheme selector in Xcode should look like this:

02 - Select iOS Device scheme

Now go to Product\Generate Output\Assembly File. After some thinking time, Xcode should land you with a file that contains a lot of strange-looking lines. At the top, you’ll see a lot of lines starting with .section. That means you’ve got the right thing! Now select Running from the Show Assembly Output For selector.

Note: You are selecting the Running scheme because by default, it uses the debug scheme settings. In debug mode, absolutely no optimizations are done by the compiler. You want to see the assembly without optimizations at first, so that you can see exactly what’s happening.

Search in the generated file for _addFunction. You should find something that looks like the following:

	.globl	_addFunction
	.align	2
	.code	16                      @ @addFunction
	.thumb_func	_addFunction
_addFunction:
	.cfi_startproc
Lfunc_begin0:
	.loc	1 13 0                  @ main.m:13:0
@ BB#0:
	sub	sp, #12
	str	r0, [sp, #8]
	str	r1, [sp, #4]
	.loc	1 14 18 prologue_end    @ main.m:14:18
Ltmp0:
	ldr	r0, [sp, #8]
	ldr	r1, [sp, #4]
	add	r0, r1
	str	r0, [sp]
	.loc	1 15 5                  @ main.m:15:5
	ldr	r0, [sp]
	add	sp, #12
	bx	lr
Ltmp1:
Lfunc_end0:
	.cfi_endproc

That may look a bit daunting, but it’s really not that hard to read what’s happening. First, all the lines that begin with a period are not assembly instructions but commands to the assembler itself. You can ignore all of those for now.

The lines that end with a colon, such as _addFunction: and Ltmp0:, are known as labels. These give names to parts of the assembly. The label called _addFunction: is, in fact, the entry point to the function.

This label is required so that other code can call the addFunction routine without having to know exactly where it is, simply by giving the symbolic name, or label. It is the linker’s job to then convert this label into the actual memory address when the final app binary is generated.

Note that the compiler always adds an underscore to the front of function names – this is purely a convention. The other labels all begin with L. These are known as local labels and are only used within the function itself. In this simple example, none of the local labels are actually used but the compiler still generates them, because it is not performing any optimizations at all.
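In C terms, the function's label is what lets its name stand in for its address. The sketch below is only an illustration: callThroughPointer is a hypothetical helper, not part of the tutorial project. It shows that the name addFunction can be passed around as an address and called through it.

```c
/* Same function as the one pasted into main.m. */
int addFunction(int a, int b) {
    int c = a + b;
    return c;
}

/* Hypothetical helper: calls whatever function the pointer refers to.
 * The pointer value is the address that the _addFunction label resolves to. */
int callThroughPointer(int (*fn)(int, int), int a, int b) {
    return fn(a, b);
}
```

Calling callThroughPointer(addFunction, 2, 3) passes the function's address as the first argument and then calls it with 2 and 3.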

Comments start with the @ character. Note that the compiler helpfully annotates sections of the assembly with their corresponding line numbers in main.m.

So, ignoring comments and labels, the important bits are as follows:

_addFunction:
@ 1:
	sub	sp, #12
@ 2:
	str	r0, [sp, #8]
	str	r1, [sp, #4]
@ 3:
	ldr	r0, [sp, #8]
	ldr	r1, [sp, #4]
@ 4:
	add	r0, r1
@ 5:
	str	r0, [sp]
	ldr	r0, [sp]
@ 6:
	add	sp, #12
@ 7:
	bx	lr

And this is what each part of that is doing:

  1. First, room on the stack is created for any temporary storage. The stack is a big blob of memory that functions can use as they wish. The stack in ARM extends downward, meaning to create some space on it, you must subtract (sub) from the stack pointer. In this case, 12 bytes are reserved.
  2. r0 and r1 hold the values passed to the function. If the function took four parameters, then r2 and r3 would hold the third and fourth parameters. If the function took more than four parameters, or took parameters that don’t fit into 32-bit registers such as large structures, then parameters could be passed via the stack.

    Here, the two parameters are saved to the stack. This is achieved by the store register (str) instruction. The first parameter is the register to store and the second parameter is the address at which to store it. The square brackets indicate that the value is a memory address.

    The instruction allows you to specify an offset to apply to the value, so [sp, #8] means to store at “the address held in the stack pointer register, plus 8.” Likewise, str r0, [sp, #8] means “store the contents of register 0 into the memory address of stack pointer, plus 8.”

  3. The values just saved to the stack are read back out into the same registers they were in already. As an opposite of the str instruction, ldr (load register) loads data from a memory location into a register. The syntax is very similar. So ldr r0, [sp, #8] means “load the contents at the memory address of stack pointer plus 8 and put the value into register 0.”

    If you’re wondering why r0 and r1 are being stored and then immediately reloaded, the answer is: yes, these two lines along with the two above are redundant! If the compiler were allowed to perform even basic optimizations, then this redundancy would be eliminated.

  4. This is the most important instruction of the function, and performs the addition. It means add the contents of r0 and r1 and put the result back into r0.

    The add instruction can either take two parameters like this, or three. If three are given, then the first is the destination register and the remaining two are the source registers. So the instruction here could instead have been written as add r0, r0, r1.

  5. Once again, the compiler has generated some redundant code where the result of the addition is stored to the stack and immediately read back out.
  6. The function is about to terminate, so the stack pointer is put back to where it was originally. The function started by subtracting 12 from sp to reserve 12 bytes. Now it adds the 12 back. Functions must ensure they balance any stack pointer operations, otherwise the stack pointer would drift, eventually overrunning the allocated stack space. You really don’t want to do that…
  7. Finally, the branch and exchange instruction (bx) is executed to go back to the calling function. Recall that the register lr is the “link register”, which holds the address of the next instruction to execute in the function that called the current function. Notice that after the addFunction routine returns, r0 will hold the result of the addition. This is another part of the calling convention: the return value from a function will always be in r0, unless it can’t fit into a single register, in which case r1 - r3 can also be used.
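The seven steps can also be traced in plain C. The sketch below is a toy model, not a description of how the hardware works: stack is an ordinary array standing in for stack memory, sp is a byte offset into it, and the name simulateAddFunction is hypothetical. It assumes 4-byte values, matching the 32-bit registers described earlier.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { STACK_BYTES = 64 };

/* Toy model of the unoptimized _addFunction, one statement per instruction. */
int32_t simulateAddFunction(int32_t a, int32_t b) {
    uint8_t stack[STACK_BYTES];       /* stand-in for stack memory */
    size_t sp = STACK_BYTES;          /* the stack starts high and grows downward */
    const size_t originalSp = sp;
    int32_t r0 = a;                   /* first parameter arrives in r0 */
    int32_t r1 = b;                   /* second parameter arrives in r1 */

    sp -= 12;                         /* 1: sub sp, #12 */
    memcpy(&stack[sp + 8], &r0, 4);   /* 2: str r0, [sp, #8] */
    memcpy(&stack[sp + 4], &r1, 4);   /*    str r1, [sp, #4] */
    memcpy(&r0, &stack[sp + 8], 4);   /* 3: ldr r0, [sp, #8] */
    memcpy(&r1, &stack[sp + 4], 4);   /*    ldr r1, [sp, #4] */
    r0 = r0 + r1;                     /* 4: add r0, r1 */
    memcpy(&stack[sp], &r0, 4);       /* 5: str r0, [sp] */
    memcpy(&r0, &stack[sp], 4);       /*    ldr r0, [sp] */
    sp += 12;                         /* 6: add sp, #12 */
    assert(sp == originalSp);         /* the stack pointer must end up balanced */
    return r0;                        /* 7: bx lr, with the result left in r0 */
}
```

Deleting the store and load pairs in steps 2, 3 and 5 leaves the result unchanged, which is exactly the redundancy described above.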

That wasn’t all that complicated, was it? To get more information about each of these instructions, see the instruction set chart found on the ARM website.

You saw that much of the above function is redundant. As stated, this is because the compiler is in debug mode, meaning no optimizations are made. If you turn optimizations on, then you’ll see a much smaller function generated.

Change the Show Assembly Output For selector to Archiving. Now search for _addFunction: again and you’ll see the following (only instructions shown):

_addFunction:
	add	r0, r1
	bx	lr

That is much more concise! Notice how that add function can be done with just two instructions. You might not have expected that a function could be just two instructions, but there you have it. Of course, your own functions are likely to be much longer and do more interesting things. :]

Now you have a function that ends with a branch back to the caller. What about the other half of the equation, the part where the function gets called?