Lab4 - Assembly in x86_64 and aarch64

In this post we'll write a simple loop that prints a message once for each counter value from 0 to 30. We'll write the program in assembly for both x86_64 and aarch64 systems.

The program prints 'Loop: #', where # goes from 0 to 30.

x86-64
First, let's write the loop for x86-64.
Source Code
Let's take a look at what's happening here.
The loop counter is initialized in _start:
Register r15 holds the loop counter so we can increment it after each print.
Then we set r10 to 10, the divisor. r10 is used to calculate the quotient and remainder for two-digit numbers.
rax is set to the loop counter and rdx is set to 0, then we divide (div) by r10. div treats rdx:rax as the dividend, so rdx has to be cleared first; the quotient ends up in rax.
Then we compare (cmp) the quotient with 0. If the quotient is not 0, we jump to the label 'double'. If it is 0, execution continues on.
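
To make the role of rdx concrete, here is a small Python model of what div does with a 64-bit operand. It is only a sketch of the instruction's arithmetic, not the lab's code; the name div64 and the sample values are made up for illustration.

```python
MASK64 = (1 << 64) - 1

def div64(rdx, rax, divisor):
    """Model of x86-64 `div` with a 64-bit operand.

    The dividend is the 128-bit value rdx:rax.
    Returns (quotient, remainder), i.e. the new (rax, rdx).
    """
    if divisor == 0:
        raise ZeroDivisionError("real hardware raises a divide error")
    dividend = ((rdx & MASK64) << 64) | (rax & MASK64)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK64:
        raise OverflowError("quotient does not fit in rax (divide error)")
    return quotient, remainder

# rdx cleared first, as in the walkthrough: 27 / 10
print(div64(0, 27, 10))          # (2, 7)
# a quotient of 0 means the counter is a single digit
print(div64(0, 7, 10))           # (0, 7)
# a stale rdx (here a leftover 7) becomes the high half of the dividend
print(div64(7, 8, 10)[0] == 0)   # False
```

The last call shows why rdx is zeroed before every div: a value left over from an earlier division silently changes the dividend.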

Quotient is 0:
Here the loop counter is copied into r14, then 0x30 is added to r14 to convert the iterator (0-9) to its ASCII digit.
Then we add 0x20 (the ASCII code for a space) to r8 to use as a prefix when printing the numbers 0-9.
To print the numbers 0-9 with the space prefix, we move the low 8 bits of r8 (r8b) into the message at byte 5 and the low 8 bits of r14 (r14b) into the message at byte 6.

Quotient is not 0:
Here rdx is reset to 0 before div r10, because div uses rdx:rax as its dividend.
After div r10, rax holds the quotient and rdx holds the remainder.
Convert rax and rdx into ASCII by adding 0x30 to each.
Store the low byte of rax (al) in the message at byte 5 and the low byte of rdx (dl) in the message at byte 6.
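
The two branches above can be modelled in a few lines of Python. This is a sketch under assumptions: the template layout (an 8-byte buffer whose bytes 5 and 6 are the digit slots) is my guess based on the offsets mentioned in the walkthrough, and TEMPLATE and patch_message are hypothetical names.

```python
# Assumption: bytes 5 and 6 of the message are the two digit slots.
TEMPLATE = b"Loop:##\n"

def patch_message(counter):
    """Return the message bytes for one iteration (counter in 0..99)."""
    msg = bytearray(TEMPLATE)
    quotient, remainder = divmod(counter, 10)
    if quotient == 0:
        # single digit: space prefix at byte 5, digit at byte 6
        msg[5] = 0x20
        msg[6] = counter + 0x30
    else:
        # two digits: tens digit at byte 5, ones digit at byte 6
        msg[5] = quotient + 0x30
        msg[6] = remainder + 0x30
    return bytes(msg)

print(patch_message(7))    # b'Loop: 7\n'
print(patch_message(27))   # b'Loop:27\n'
```

Adding 0x30 works because the ASCII digits '0' to '9' are the consecutive codes 0x30 to 0x39.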

Print:
Print the message as set in loop or in double.
Check loop condition - is loop counter equal to max? If not jump back to loop, else exit.
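
The loop control itself is a "run the body, then test" structure. Here is a rough Python model of it, assuming start is 0 and max is 31 (one past the last value printed); those values and the names run_loop and visited are my assumptions, not taken from the source code.

```python
START = 0   # assumed value of the 'start' symbol
MAX = 31    # assumed value of the 'max' symbol

def run_loop(body):
    counter = START              # counter lives in r15 in the x86 version
    while True:
        body(counter)            # format and print the message
        counter += 1             # increment the loop counter
        if counter == MAX:       # compare the counter with max
            break                # equal: fall through to exit
        # not equal: jump back to the top of the loop

visited = []
run_loop(visited.append)
print(visited[0], visited[-1], len(visited))   # 0 30 31
```

Because the test comes after the body, the body always runs at least once, and the last value processed is max - 1.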

The result:


aarch64
Source code:
Let's take a look at this code.
The concept is the same as in the x86 version.
Here we initialize the iterator and the divisor in _start:.
--Notice that we do not have the 'start' and 'max' symbols that we had in the x86 version.--
Then we set up our loop condition.
Divide the iterator by the divisor and store the quotient in x4.
x5 is set to ASCII '0' (0x30), which can be used to pad the numbers 0-9.
Compare the quotient with 0. If the quotient is not equal to 0, jump to the label double.

Quotient is 0:
Convert the iterator into ASCII by adding 0x30 and store the converted value in x3.
Load the address of the message into x15 and store the ASCII value of the iterator in the message at byte 6.
--Notice that the line using x5 is commented out. This shows 0-9 instead of 00-09.--
Compare the quotient with 0 again and, if it is 0, jump to print so the two-digit code is skipped.

Quotient is not 0:
Get the remainder as iterator - (quotient * divisor) and store it in x6.
Convert quotient and remainder into ascii by adding 0x30.
Load message into x15.
Insert converted quotient into message at byte 5.
Insert converted remainder into message at byte 6.
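
Since aarch64 division only produces a quotient, the remainder has to be rebuilt from it. Below is a Python sketch of that arithmetic, assuming the common udiv plus msub pairing; the original code may well have used a separate multiply and subtract instead. The function names are only labels for this model.

```python
MASK64 = (1 << 64) - 1

def udiv(xn, xm):
    """Model of `udiv xd, xn, xm`: unsigned quotient only.

    On aarch64, dividing by zero yields 0 instead of trapping.
    """
    return 0 if xm == 0 else (xn & MASK64) // (xm & MASK64)

def msub(xn, xm, xa):
    """Model of `msub xd, xn, xm, xa`: xd = xa - xn * xm (mod 2**64)."""
    return (xa - xn * xm) & MASK64

def split_digits(iterator, divisor=10):
    quotient = udiv(iterator, divisor)             # x4 in the walkthrough
    remainder = msub(quotient, divisor, iterator)  # x6 in the walkthrough
    return quotient, remainder

print(split_digits(27))   # (2, 7)
print(split_digits(9))    # (0, 9)
```

Compared with x86, nothing is implicit here: every source and destination register is named in the instruction.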

Print:
Print out the message as set in loop or double.
Check loop condition - is loop counter equal to 31? If it is then exit, else jump back to loop.

Result:


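Writing and debugging assembly is harder than working in other languages like C, C++, or Java.
After writing the code in both x86 and aarch64 assembly, what I noticed is that the assignment order (the order of the operands) is reversed between the two. This comes from the AT&T syntax that the GNU assembler uses for x86, which puts the source first; aarch64 syntax puts the destination first.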
Ex.
x86: mov $start, %r15
This is read as 'move value of start into register r15'.
The assignment order: assign left to right

aarch64: mov x19, 0
This is read as 'move 0 into register x19'
The assignment order: assign right to left

Also, when it comes to calculation, x86 felt easier than aarch64. In x86, div automatically stores the quotient in rax and the remainder in rdx, but in aarch64 not only do I have to specify where the quotient is stored, the remainder also has to be calculated separately.

The same goes for message manipulation. I found it easier to modify the message in x86 than in aarch64.

In the end they both follow the same concept of execution, and once I get used to the syntax it should get easier to write in both.
But I still prefer to write code in higher-level languages rather than in assembly.
