iOS Assembly Tutorial: Understanding ARM
Learn how to read assembly in iOS – a useful skill when debugging your code or diagnosing why a crash has occurred. By Matt Galloway.
Objective-C Assembly
Up until now, the functions you’ve seen have been C functions. Objective-C adds a bit more complexity on top, so let’s examine that now. Open ViewController.m and add the following method inside the class implementation:
- (int)addValue:(int)a toValue:(int)b {
    int c = a + b;
    return c;
}
Once again, go to Product\Generate Output\Assembly File to view the assembly. Make sure Archiving is set for the output type, then search for addValue:toValue: and find the assembly that looks like this:
"-[ViewController addValue:toValue:]":
    adds r0, r3, r2
    bx lr
The first thing you’ll notice is the label name. This time the name is a string that contains the class name and the full Objective-C method name.
If you look back at the assembly for addFunction and compare, you’ll also notice that the two values added together are in r2 and r3 rather than r0 and r1. That must mean that the two parameters to the method are in r2 and r3. Why is that?
Well, it’s because all Objective-C methods are really just C functions with two implicit parameters passed before the rest of the method’s parameters. The addValue:toValue: method is semantically equivalent to the following C function:
int ViewController_addValue_toValue(id self, SEL _cmd, int a, int b) {
    int c = a + b;
    return c;
}
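To make the equivalence concrete, here is a minimal sketch in plain C. The id and SEL typedefs and the call_example helper are hypothetical stand-ins, introduced only so the example compiles without the Objective-C runtime; the real types come from the runtime headers.

```c
/* Hypothetical stand-ins so the sketch compiles as plain C.
   The real id and SEL types are declared by the Objective-C runtime. */
typedef void *id;
typedef const char *SEL;

/* The body never touches self or _cmd, but they still occupy the first
   two argument slots, so a and b become the third and fourth arguments. */
int ViewController_addValue_toValue(id self, SEL _cmd, int a, int b) {
    (void)self;
    (void)_cmd;
    int c = a + b;
    return c;
}

/* Hypothetical helper showing a call with all four arguments spelled out. */
int call_example(void) {
    int placeholder = 0; /* stands in for a ViewController instance */
    return ViewController_addValue_toValue(&placeholder, "addValue:toValue:", 12, 34);
}
```

Here call_example returns 46. The point of the sketch is the argument order: the two implicit parameters come first, and the method's own parameters follow.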
With self in r0 and _cmd in r1, the parameters a and b land in r2 and r3, respectively. You are probably already aware of the first of the two implicit parameters. You make use of self all the time.
However, _cmd is something you might not have seen before. Like self, it is available inside all Objective-C methods and contains the selector of the currently-executing method. You generally never need to access this, though, which is why you may not have ever heard of it!
To see how Objective-C methods are called, add the following method to ViewController:
- (void)foo {
    int add = [self addValue:12 toValue:34];
    NSLog(@"add = %i", add);
}
Generate the assembly file again and find this method. You should see the following:
"-[ViewController foo]":
    @ 1:
    push {r7, lr}
    @ 2:
    movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC1_0+4))
    movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC1_0+4))
LPC1_0:
    add r1, pc
    @ 3:
    ldr r1, [r1]
    @ 4:
    movs r2, #12
    movs r3, #34
    @ 5:
    mov r7, sp
    @ 6:
    blx _objc_msgSend
    @ 7:
    mov r1, r0
    @ 8:
    movw r0, :lower16:(L__unnamed_cfstring_-(LPC1_1+4))
    movt r0, :upper16:(L__unnamed_cfstring_-(LPC1_1+4))
LPC1_1:
    add r0, pc
    @ 9:
    blx _NSLog
    @ 10:
    pop {r7, pc}
Once again, this is extremely similar to the plain C equivalent you saw earlier. Breaking it down, it does this:
1. Push r7 and lr onto the stack.

2. Load the address of the label L_OBJC_SELECTOR_REFERENCES_ into r1 using the same program counter relative addressing as seen earlier. The name gives away what this is. It is a reference to a selector. Selectors are just strings, really, stored in the same way in the data segment.

3. If you look up what L_OBJC_SELECTOR_REFERENCES_ is in the assembly, you’ll see the following:

    L_OBJC_SELECTOR_REFERENCES_:
        .long L_OBJC_METH_VAR_NAME_

   This means that the memory at L_OBJC_SELECTOR_REFERENCES_, which r1 points to at this stage, contains the address of the label L_OBJC_METH_VAR_NAME_. If you look at that label, you’ll find the string addValue:toValue:.

   This instruction, ldr r1, [r1], is loading the value stored at the memory address held in r1 and putting the value back into r1. It is “dereferencing” r1. In pseudo-C-code this looks like: r1 = *r1. If you think about it carefully, this means that r1 will now contain a pointer to the addValue:toValue: string.

4. Load the constants into r2 and r3.

5. Save the stack pointer in r7.

6. Branch, with link and exchange, to objc_msgSend. This is the function that is at the heart of the Objective-C runtime. It calls the implementation associated with the required selector.

   The parameters are the same as those that eventually get passed to the method. So r0 is self, r1 is _cmd, and r2 and r3 are the remaining parameters. This is why the selector is loaded into r1 and the parameters to pass are loaded into r2 and r3. r0 is not explicitly loaded because it already holds the correct self variable.

7. The result of the call to addValue:toValue: at this point is, as usual, in r0. This instruction moves the value into r1, since that’s where it’ll need to be for the call to NSLog, a C function.

8. This loads a pointer to the string parameter to NSLog into r0, just as in the printf call in the C function example.

9. Branch, with link and exchange, to the NSLog function implementation.

10. Two values are popped from the stack, one into r7 and one into the program counter. Just like before, this will perform the return from the foo method.
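The two-level lookup performed at the @ 2 and @ 3 markers in the listing can be sketched in plain C. The names method_name, selector_reference and load_selector are hypothetical and simply mirror the roles of the assembly labels. This is an illustration of the pointer indirection only, not what the compiler actually emits.

```c
/* Plays the role of L_OBJC_METH_VAR_NAME_: the selector's name as a C string. */
static const char method_name[] = "addValue:toValue:";

/* Plays the role of L_OBJC_SELECTOR_REFERENCES_: one pointer-sized slot
   holding the address of the name above. */
static const char *const selector_reference = method_name;

/* Mirrors the @ 2 and @ 3 markers: form the slot's address, then load through it. */
const char *load_selector(void) {
    const char *const *r1 = &selector_reference; /* movw/movt/add: address of the slot */
    const char *loaded = *r1;                    /* ldr r1, [r1], i.e. r1 = *r1 */
    return loaded;
}
```

After the load, the returned pointer refers to the addValue:toValue: string, which is what ends up in r1 before the call to objc_msgSend.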
As you can see, there’s not all that much difference between plain C and Objective-C when it comes to the generated assembly. The extra things to look out for are the two implicit parameters passed to a method implementation, and selectors being referenced by strings in the data segment.
Obj-C Msg Send What to the Who?
You saw above that the function objc_msgSend made an appearance. You have probably seen this before in crash logs. This function is at the core of the Objective-C runtime. The runtime is the code that glues together an Objective-C application, including all the memory management methods and handling of classes.
Every time an Objective-C method is called, objc_msgSend is the C function that handles the message dispatching. It looks up the implementation for the method that’s been called by inspecting the type of object being messaged and finding the implementation for the method in the class’s method list. The signature for objc_msgSend looks like this:
id objc_msgSend(id self, SEL _cmd, ...)
The first parameter is the object that will be self during the method’s execution. So when you write something like self.someProperty, this is where the self is coming from.
The second parameter is a lesser-known, hidden parameter. Try it for yourself: write something like NSLog(@"%@", NSStringFromSelector(_cmd)); in an Objective-C method and you’ll see the current selector printed out. Neat, eh?
The remaining parameters are the parameters to the method itself. So a method that takes two parameters, like addValue:toValue: above, takes two extra parameters. Therefore, instead of calling it via Objective-C, you could, in fact, do the following:
- (void)foo {
    // objc_msgSend is declared in <objc/message.h>, which must be imported.
    int add = (int)objc_msgSend(self, NSSelectorFromString(@"addValue:toValue:"), 12, 34);
    NSLog(@"add = %i", add);
}
Note: The return type of objc_msgSend is id but it has been cast to an int. This is fine on 32-bit ARM because the size of each is the same. If the method returns something of a different size, then it’s actually another function that gets called. You can read more about that here. Similarly, if the return type is floating point, another variant of objc_msgSend gets called.
Recall from above that the C-equivalent function that gets created when an Objective-C method is compiled has a signature that looks like this:
int ViewController_addValue_toValue(id self, SEL _cmd, int a, int b)
It should now be no surprise as to why that is. Notice that the signature matches objc_msgSend! That means that all the parameters will already be in the right place for when objc_msgSend finds the implementation for a method and jumps to it.
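The idea of finding an implementation and passing the arguments straight through can be sketched as a toy dispatcher in plain C. Everything here is a hypothetical model: the toy_ types, the fixed four-argument signature and the string comparison are simplifying assumptions. The real objc_msgSend is variadic in its declaration and its lookup is far more involved.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the runtime's types. */
typedef const char *SEL;
typedef struct toy_object *id;
typedef int (*IMP)(id self, SEL _cmd, int a, int b);

struct toy_method { SEL name; IMP imp; };
struct toy_class  { const struct toy_method *methods; size_t count; };
struct toy_object { const struct toy_class *isa; };

static int ViewController_addValue_toValue(id self, SEL _cmd, int a, int b) {
    (void)self;
    (void)_cmd;
    return a + b;
}

static const struct toy_method view_controller_methods[] = {
    { "addValue:toValue:", ViewController_addValue_toValue },
};
static const struct toy_class view_controller_class = { view_controller_methods, 1 };

/* Find the implementation for _cmd in the receiver's class, then call it
   with the arguments unchanged and in the order they arrived. */
int toy_msgSend(id self, SEL _cmd, int a, int b) {
    const struct toy_class *cls = self->isa;
    for (size_t i = 0; i < cls->count; i++) {
        if (strcmp(cls->methods[i].name, _cmd) == 0) {
            return cls->methods[i].imp(self, _cmd, a, b);
        }
    }
    return 0; /* simplification: the real runtime handles unknown selectors differently */
}

/* Hypothetical helper: send addValue:toValue: to a toy ViewController. */
int toy_send_example(void) {
    struct toy_object controller = { &view_controller_class };
    return toy_msgSend(&controller, "addValue:toValue:", 12, 34);
}
```

Because toy_msgSend and the implementation share the same parameter list, the dispatcher never has to rearrange anything before the call. That is the property the matching signatures give the real runtime.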
You can read more about objc_msgSend in these excellent posts.