iOS Assembly Tutorial: Understanding ARM

Learn how to read assembly in iOS – a useful skill when debugging your code or diagnosing why a crash has occurred. By Matt Galloway.

Leave a rating/review
Save for later
Share
You are currently viewing page 3 of 4 of this article. Click here to view the first page.

Objective-C Assembly

Up until now, the functions you’ve seen have been C functions. Objective-C adds a bit more complexity on top, and let’s examine that now. Open ViewController.m and add the following method inside the class implementation:

- (int)addValue:(int)a toValue:(int)b {
    int c = a + b;
    return c;
}

Once again, go to Product\Generate Output\Assembly File to view the assembly. Make sure Archiving is set for the output type, then search for addValue:toValue: and find the assembly that looks like this:

"-[ViewController addValue:toValue:]":
	adds	r0, r3, r2
	bx	lr

The first thing you’ll notice is the label name. This time the name is a string that contains the class name and the full Objective-C method name.

If you look back at the assembly for addFunction and compare, then you’ll also notice is that the two values added together are in r2 and r3 rather than r0 and r1. That must mean that the two parameters to the method are in r2 and r3. Why is that?

Well, it’s because all Objective-C methods are really just C functions with two implicit parameters passed before the rest of the method’s parameters. The addValue:toValue: method is semantically equivalent to the following C function:

int ViewController_addValue_toValue(id self, SEL _cmd, int a, int b) {
    int c = a + b;
    return c;
}

This is why the parameters a and b appear in r2 and r3, respectively. You are probably already aware of the first of the two implicit parameters. You make use of self all the time.

However, _cmd is something you might not have seen before. Like self, it is available inside all Objective-C methods and contains the selector of the currently-executing method. You generally never need to access this, though, which is why you may not have ever heard of it!

To see how Objective-C methods are called, add the following method to ViewController:

- (void)foo {
    int add = [self addValue:12 toValue:34];
    NSLog(@"add = %i", add);
}

Generate the assembly file again and find this method. You should see the following:

"-[ViewController foo]":
@ 1:
	push	{r7, lr}
@ 2:
	movw	r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC1_0+4))
	movt	r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC1_0+4))
LPC1_0:
	add	r1, pc
@ 3:
	ldr	r1, [r1]
@ 4:
	movs	r2, #12
	movs	r3, #34
@ 5:
	mov	r7, sp
@ 6:
	blx	_objc_msgSend
@ 7:
	mov	r1, r0
@ 8:
	movw	r0, :lower16:(L__unnamed_cfstring_-(LPC1_1+4))
	movt	r0, :upper16:(L__unnamed_cfstring_-(LPC1_1+4))
LPC1_1:
	add	r0, pc
@ 9:
	blx	_NSLog
@ 10:
	pop	{r7, pc}

Once again, this is extremely similar to the plain C equivalent you saw earlier. Breaking it down, it does this:

This means that L_OBJC_SELECTOR_REFERENCES_, and hence r1 at this point, contain the address of the label L_OBJC_METH_VAR_NAME_. If you look at that label, you’ll find the string addValue:toValue:.

This instruction, ldr r1, [r1], is loading the value stored at the memory address held in r1 and putting the value back into r1. It is “dereferencing” r1. In pseudo-C-code this looks like: r1 = *r1. If you think about it carefully, this means that r1 will now contain a pointer to the addValue:toValue: string.

The parameters are the same as those that eventually get passed to the method. So r0 is self, r1 is _cmd, r2 and r3 are the remaining parameters. This is why the selector is loaded into r1 and the parameters to pass are loaded into r2 and r3. r0 is not explicitly loaded because it already holds the correct self variable.

  1. Push r7 and lr onto the stack.
  2. Load the value at the label L_OBJC_SELECTOR_REFERENCES_ into r1 using the same program counter relative addressing as seen earlier. The name gives away what this is. It is a reference to a selector. Selectors are just strings, really, stored in the same way in the data segment.
  3. If you look up what L_OBJC_SELECTOR_REFERENCES_ is in the assembly, you’ll see the following:
    
L_OBJC_SELECTOR_REFERENCES_:
    .long	L_OBJC_METH_VAR_NAME_


    

    This means that L_OBJC_SELECTOR_REFERENCES_, and hence r1 at this point, contain the address of the label L_OBJC_METH_VAR_NAME_. If you look at that label, you’ll find the string addValue:toValue:.

    This instruction, ldr r1, [r1], is loading the value stored at the memory address held in r1 and putting the value back into r1. It is “dereferencing” r1. In pseudo-C-code this looks like: r1 = *r1. If you think about it carefully, this means that r1 will now contain a pointer to the addValue:toValue: string.

  4. Load the constants into r2 and r3.
  5. Save the stack pointer.
  6. Branch, with link and exchange, to objc_msgSend. This is the function that is at the heart of the Objective-C runtime. It calls the implementation associated with the required selector.

    The parameters are the same as those that eventually get passed to the method. So r0 is self, r1 is _cmd, r2 and r3 are the remaining parameters. This is why the selector is loaded into r1 and the parameters to pass are loaded into r2 and r3. r0 is not explicitly loaded because it already holds the correct self variable.

  7. The result of the call to addValue:toValue: at this point is, as usual, in r0. This instruction moves the value into r1, since that’s where it’ll need to be for the call to NSLog, a C function.
  8. This loads a pointer to the string parameter to NSLog into r0, just as in the printf call in the C function example.
  9. Branch, with link and exchange to the NSLog function implementation.
  10. Two values are popped from the stack, one into r7 and one into the program counter. Just like before, this will perform the return from the foo method.

L_OBJC_SELECTOR_REFERENCES_:
    .long	L_OBJC_METH_VAR_NAME_



As you can see, there’s not all that much difference between plain C and Objective-C when it comes to the generated assembly. The extra things for which to be on the lookout are the implicit two parameters passed to a method implementation, and selectors being referenced by strings in the data segment.

Obj-C Msg Send What to the Who?

You saw above the function objc_msgSend made an appearance. You have probably seen this before in crash logs. This function is at the core of the Objective-C runtime. The runtime is the code that glues together an Objective-C application, including all the memory management methods and handling of classes.

Every time an Objective-C method is called, objc_msgSend is the C function that handles the message dispatching. It looks up the implementation for the method that’s been called by inspecting the type of object being messaged and finding the implementation for the method in the class’s method list. The signature for objc_msgSend looks like this:

id objc_msgSend(id self, SEL _cmd, ...)

The first parameter is the object that will be self during the method’s execution. So when you write something like self.someProperty, this is where the self is coming from.

The second parameter is a lesser-known, hidden parameter. Try it for yourself: write something like NSLog(@"%@", NSStringFromSelector(_cmd)); in an Objective-C method and you’ll see the current selector printed out. Neat, eh?

The remaining parameters are the parameters to the method itself. So a method that takes two parameters, like addValue:toValue: above, takes two extra parameters. Therefore, instead of calling it via Objective-C, you could, in fact, do the following:

- (void)foo {
    int add = (int)objc_msgSend(self, NSSelectorFromString(@"addValue:toValue:"), 12, 34);
    NSLog(@"add = %i", add);
}

Note: The return type of objc_msgSend is id but it has been cast to an int. This is fine because the size of each is the same. If the method returns something of a different size, then it’s actually another method that gets called. You can read more about that here. Similarly, if the return type is floating point, another variant of objc_msgSend gets called.

Note: The return type of objc_msgSend is id but it has been cast to an int. This is fine because the size of each is the same. If the method returns something of a different size, then it’s actually another method that gets called. You can read more about that here. Similarly, if the return type is floating point, another variant of objc_msgSend gets called.

Recall from above that the C-equivalent function that gets created when an Objective-C method is compiled has a signature that looks like this:

int ViewController_addValue_toValue(id self, SEL _cmd, int a, int b)

It should now be no surprise as to why that is. Notice that the signature matches objc_msgSend! That means that all the parameters will already be in the right place for when objc_msgSend finds the implementation for a method and jumps to it.

You can read more about objc_msgSend in these excellent posts.