Lab 5 - 64-Bit Assembly Language Lab

Introduction

In this lab, I explored how assembly language works on two different architectures: x86_64 and AArch64. The goal was to understand how simple programs like "Hello, World!" and loops are written, compiled, and executed in assembly for these architectures.

I started by running a C program, then analyzed how it translates into assembly. Finally, I modified and tested an AArch64 assembly program to print numbers in a loop.

Main Process

Step 1: Setting Up the Environment

First, I connected to the x86_64 server via SSH. Then, I extracted the provided lab files:

cd ~
tar xvf /public/spo600-assembler-lab-examples.tgz

This created a directory structure with example files for both architectures.

Step 2: Running the C Program

Before diving into assembly, I compiled and ran the C version of the program to understand its behaviour:

cd ~/spo600/examples/hello/c
make  # Compiles the C files
./hello  # Runs the compiled program

Step 3: Disassembling the Executable

To analyze how the C program translates into assembly, I used objdump:

objdump -d hello

This showed the assembly instructions inside the executable, especially for the main function.

Key Takeaways (from the disassembled output):

The printf("Hello, World!") call in C shows up as a call instruction that targets a PLT entry such as printf@plt, which resolves the function in the shared C library at run time. Depending on the compiler and its settings, a printf of a constant string may appear as puts@plt instead.
The mov instructions load values into registers, while push and pop save and restore registers as part of the function's stack frame.
The compiled C binary contains much more than the code that prints the text: startup code, function prologues and epilogues, and the stubs used to call library functions.
None of that setup and cleanup code is needed in the hand-written assembly version, which talks to the kernel directly, so the compiled C binary is larger than the assembly one.

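Step 4: Running the Assembly Versions

Next, I built and ran the hand-written assembly version of the program. The commands below show the AArch64 build, which lives in the aarch64 directory of the lab files: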

tar xvf /public/spo600-assembler-lab-examples.tgz

ls ~/spo600/examples/hello/assembler/aarch64

cd ~/spo600/examples/hello/assembler/aarch64
make
./hello

The output was the same: "Hello, world!"

objdump -d hello

Disassembling the result confirmed that the program produces the same output on both architectures while using different instructions to do it.

Step 5: Modifying AArch64 Assembly and Debugging Issues

Initial Code

The original AArch64 assembly program was a simple implementation of Hello, World!, structured as follows:

.text
.globl _start
_start:
    mov     x0, 1
    adr     x1, msg
    mov     x2, len
    mov     x8, 64
    svc     0

    mov     x0, 0
    mov     x8, 93
    svc     0

.data
msg:    .ascii  "Hello, world!\n"
len=    . - msg
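For comparison, the same two system calls can be reached from C without going through printf. This is only a sketch: the function names print_bytes and say_hello are hypothetical and not part of the lab files. It shows that x0, x1 and x2 in the assembly correspond to the three arguments of write.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: thin wrapper around the write system call.
   fd 1 (STDOUT_FILENO) matches `mov x0, 1` in the assembly. */
ssize_t print_bytes(const char *buf, size_t len)
{
    assert(buf != NULL);
    return write(STDOUT_FILENO, buf, len);
}

/* Hypothetical helper: prints the same message as the assembly program. */
ssize_t say_hello(void)
{
    static const char msg[] = "Hello, world!\n";
    /* sizeof counts the trailing NUL byte, so subtract 1 to get the
       same value the assembler computes with `len = . - msg`. */
    return print_bytes(msg, sizeof msg - 1);
}
```

A complete program would call say_hello() and then _exit(0), which corresponds to the exit system call (number 93 on AArch64) at the end of the assembly version.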

First Experiment: Single-Digit Loop Counter

The first attempt was a simple loop that printed the numbers 0 to 5, each as a single ASCII character:

.text
.globl _start
_start:
    mov     x19, #0             // Loop counter (starts at 0)

loop:
    mov     x0, 1               // stdout file descriptor
    adr     x1, loop_msg        // Load "Loop: " message address
    mov     x2, loop_len        // Load message length
    mov     x8, 64              // write syscall
    svc     0                   // Call kernel

    // Convert loop index (x19) to ASCII and print
    add     x20, x19, #48       // Convert number to ASCII ('0' = 48)
    adr     x1, num_char        // Address of character buffer
    strb    w20, [x1]           // Store ASCII character in buffer

    mov     x0, 1               // stdout file descriptor
    adr     x1, num_char        // Load buffer address
    mov     x2, #1              // Print one character
    mov     x8, 64              // write syscall
    svc     0                   // Call kernel

    // Print newline
    mov     x0, 1
    adr     x1, newline
    mov     x2, #1
    mov     x8, 64
    svc     0

    add     x19, x19, #1        // Increment counter
    cmp     x19, #6             // Stop at 6
    b.ne    loop                // Repeat if x19 < 6

    // Exit program
    mov     x0, 0
    mov     x8, 93              // exit syscall
    svc     0

.data
loop_msg:   .ascii  "Loop: "    // Prefix message
loop_len=   . - loop_msg        // Message length
num_char:   .ascii  "0"         // Placeholder for loop index
newline:    .ascii  "\n"        // Newline character
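The conversion step in this loop is just "add 48 to the counter". A rough C equivalent, with hypothetical function names chosen only for illustration, looks like this:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper mirroring `add x20, x19, #48`. */
char digit_to_ascii(unsigned value)
{
    assert(value <= 9);           /* only 0..9 map onto '0'..'9' */
    return (char)('0' + value);   /* '0' is 48 in ASCII */
}

/* Hypothetical helper: builds one output line such as "Loop: 5\n" in out,
   which must have room for 8 bytes. Returns the number of bytes written. */
size_t format_loop_line(unsigned value, char *out)
{
    static const char prefix[] = "Loop: ";
    size_t n = sizeof prefix - 1;  /* length without the trailing NUL */

    memcpy(out, prefix, n);
    out[n++] = digit_to_ascii(value);
    out[n++] = '\n';
    return n;
}
```

The assembly version issues three separate write calls per line instead of building the whole line first, but the bytes that reach stdout are the same.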

Second Experiment: Two-Digit Loop Counter (0-32)

After successfully printing single digits, I modified the code to print numbers from 00 to 32 using two-digit formatting:

.text
.globl _start
_start:
    mov     x19, #0             // Loop counter (starts at 0)
    mov     x22, #10            // Store 10 in register for division

loop:
    // Print "Loop: "
    mov     x0, 1
    ldr     x1, =loop_msg
    mov     x2, loop_len
    mov     x8, 64
    svc     0

    // Convert x19 (0-32) to two-digit ASCII
    udiv    x20, x19, x22       // x20 = x19 / 10 (quotient, tens digit)
    msub    x21, x20, x22, x19  // x21 = x19 - (x20 * 10) (remainder, ones digit)
    add     x20, x20, #48       // Convert quotient to ASCII ('0'-'9')
    add     x21, x21, #48       // Convert remainder to ASCII ('0'-'9')

    // Store digits in buffer
    ldr     x1, =num_chars      // Address of buffer
    strb    w20, [x1]           // Store tens digit
    strb    w21, [x1, #1]       // Store ones digit

    // Print two-digit number
    mov     x0, 1
    ldr     x1, =num_chars
    mov     x2, #2
    mov     x8, 64
    svc     0

    // Print newline
    mov     x0, 1
    ldr     x1, =newline
    mov     x2, #1
    mov     x8, 64
    svc     0

    add     x19, x19, #1        // Increment counter
    cmp     x19, #33            // Stop at 33 (prints 00 to 32)
    b.ne    loop                // Repeat if x19 < 33

    // Exit program
    mov     x0, 0
    mov     x8, 93
    svc     0

.data
.align 4
loop_msg:   .ascii  "Loop: "    // .ascii, not .asciz, so loop_len excludes a NUL byte
loop_len=   . - loop_msg
.align 4
num_chars:  .asciz  "00"        // Placeholder for 2-digit number
.align 4
newline:    .asciz  "\n"

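While writing this version, the ASCII conversion was initially incorrect and unintended characters appeared in the output. After debugging, I found that I had not properly handled the division and remainder when splitting the number into digits. On AArch64, `udiv` returns only the quotient, so the remainder has to be computed separately. Using `udiv` for the tens digit and `msub` (which computes x19 - x20 * 10) for the ones digit gave the expected output.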
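The digit split can be written out in C to make the arithmetic easier to check. This is a sketch with a hypothetical function name; the comments point at the matching instructions in the program above.

```c
#include <assert.h>

/* Hypothetical helper: splits a value in the range 0..99 into two
   ASCII digits, tens first, the same way the loop body does. */
void split_two_digits(unsigned value, char out[2])
{
    assert(value <= 99);

    unsigned tens = value / 10;          /* udiv x20, x19, x22       */
    unsigned ones = value - tens * 10;   /* msub x21, x20, x22, x19  */

    out[0] = (char)('0' + tens);         /* add x20, x20, #48 */
    out[1] = (char)('0' + ones);         /* add x21, x21, #48 */
}
```

For example, a counter value of 32 gives tens = 3 and ones = 32 - 30 = 2, so the buffer holds the characters '3' and '2'.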

Third Experiment: Hexadecimal Loop Counter

.text
.globl _start
_start:
    mov     x19, #0             // loop counter

loop:
    // print "Loop: 0x"
    mov     x0, #1
    ldr     x1, =prefix
    mov     x2, #8
    mov     x8, #64
    svc     0

    // high nibble = x19 >> 4
    mov     x20, x19
    lsr     x21, x20, #4
    bl      to_hexchar
    ldr     x1, =hexbuf         // Load hexbuf address
    strb    w0, [x1]            // Store high nibble as ASCII

    // low nibble = x19 & 0x0F
    and     x21, x20, #0x0F
    bl      to_hexchar
    strb    w0, [x1, #1]        // Store low nibble as ASCII

    // print hexbuf (2 chars)
    mov     x0, #1
    ldr     x1, =hexbuf
    mov     x2, #2
    mov     x8, #64
    svc     0

    // newline
    mov     x0, #1
    ldr     x1, =newline
    mov     x2, #1
    mov     x8, #64
    svc     0

    add     x19, x19, #1
    cmp     x19, #33
    b.ne    loop

    // exit
    mov     x0, #0
    mov     x8, #93
    svc     0

// Subroutine to convert number (0-15) to hex ASCII character
to_hexchar:
    cmp     x21, #10
    blt     .digit
    add     x0, x21, #55        // 'A' = 65 = 10 + 55
    ret
.digit:
    add     x0, x21, #48        // '0' = 48
    ret

.data
prefix:     .asciz  "Loop: 0x"
hexbuf:     .space  2
newline:    .asciz  "\n"
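The nibble handling may be easier to follow in C. The sketch below uses hypothetical function names and mirrors the to_hexchar subroutine and the shift and mask steps around it.

```c
#include <assert.h>

/* Hypothetical helper mirroring to_hexchar: maps 0..15 to '0'..'9', 'A'..'F'. */
char nibble_to_hex(unsigned nibble)
{
    assert(nibble <= 15);
    if (nibble < 10)
        return (char)('0' + nibble);      /* add x0, x21, #48 */
    return (char)('A' + (nibble - 10));   /* same as adding 55 */
}

/* Hypothetical helper: writes the two hex digits of a byte into out. */
void byte_to_hex(unsigned value, char out[2])
{
    assert(value <= 0xFF);
    out[0] = nibble_to_hex(value >> 4);    /* lsr x21, x20, #4    */
    out[1] = nibble_to_hex(value & 0x0F);  /* and x21, x20, #0x0F */
}
```

With the loop stopping at 33, the last value printed is 32, which comes out as the digits 2 and 0 after the "0x" prefix.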

Step 6: Repeating the Step 5 Experiments on x86_64

First Experiment: Single-Digit Loop Counter

.section .data
prefix:     .ascii "Loop: "
newline:    .ascii "\n"
digit:      .byte 0

.section .text
.globl _start
_start:
    mov     $0, %r15

loop:
    # print prefix
    mov     $1, %rax
    mov     $1, %rdi
    lea     prefix(%rip), %rsi
    mov     $6, %rdx
    syscall

    # convert number to ASCII
    mov     %r15b, %al
    add     $'0', %al
    mov     %al, digit(%rip)

    # print digit
    mov     $1, %rax
    mov     $1, %rdi
    lea     digit(%rip), %rsi
    mov     $1, %rdx
    syscall

    # newline
    mov     $1, %rax
    mov     $1, %rdi
    lea     newline(%rip), %rsi
    mov     $1, %rdx
    syscall

    inc     %r15
    cmp     $6, %r15
    jne     loop

    mov     $60, %rax
    xor     %rdi, %rdi
    syscall

Second Experiment: Two-Digit Loop Counter (0-32)

.section .data
prefix:     .ascii "Loop: "
newline:    .ascii "\n"
digits:     .space 2

.section .text
.globl _start
_start:
    mov     $0, %r15            # loop counter

loop:
    # print prefix
    mov     $1, %rax
    mov     $1, %rdi
    lea     prefix(%rip), %rsi
    mov     $6, %rdx
    syscall

    # move loop counter to %rax for division
    mov     %r15, %rax
    xor     %rdx, %rdx          # clear remainder
    mov     $10, %rbx
    div     %rbx                # quotient = %rax, remainder = %rdx

    # convert to ASCII
    add     $'0', %al           # tens digit ASCII
    add     $'0', %dl           # ones digit ASCII

    # store digits
    mov     %al, digits(%rip)
    mov     %dl, digits+1(%rip)

    # print digits
    mov     $1, %rax
    mov     $1, %rdi
    lea     digits(%rip), %rsi
    mov     $2, %rdx
    syscall

    # print newline
    mov     $1, %rax
    mov     $1, %rdi
    lea     newline(%rip), %rsi
    mov     $1, %rdx
    syscall

    inc     %r15
    cmp     $33, %r15
    jne     loop

    # exit
    mov     $60, %rax
    xor     %rdi, %rdi
    syscall

Third Experiment: Hexadecimal Loop Counter

.section .data
prefix:     .ascii "Loop: 0x"
newline:    .ascii "\n"
hexbuf:     .space 2            # 2 hex digits will go here

.section .text
.globl _start
_start:
    mov     $0, %r15            # loop counter = 0

loop:
    # print "Loop: 0x"
    mov     $1, %rax
    mov     $1, %rdi
    lea     prefix(%rip), %rsi
    mov     $8, %rdx
    syscall

    # convert to hex digits
    mov     %r15b, %al          # copy counter to AL
    mov     %al, %bl            # backup for low nibble
    shr     $4, %al             # high nibble
    call    hexchar
    mov     %al, hexbuf(%rip)

    mov     %bl, %al
    and     $0x0F, %al          # low nibble
    call    hexchar
    mov     %al, hexbuf+1(%rip)

    # print hex digits
    mov     $1, %rax
    mov     $1, %rdi
    lea     hexbuf(%rip), %rsi
    mov     $2, %rdx
    syscall

    # print newline
    mov     $1, %rax
    mov     $1, %rdi
    lea     newline(%rip), %rsi
    mov     $1, %rdx
    syscall

    inc     %r15
    cmp     $0x21, %r15
    jne     loop

    # exit
    mov     $60, %rax
    xor     %rdi, %rdi
    syscall

# Subroutine: Convert AL to ASCII hex character
hexchar:
    cmp     $10, %al
    jl      .digit
    add     $55, %al            # 'A' = 65 = 10 + 55
    ret
.digit:
    add     $48, %al            # '0' = 48
    ret

6502 vs x86_64 vs AArch64 Assembly

6502 Assembly

Limited registers: The 6502 has only three main 8-bit registers (A, X, and Y), making it much more constrained than x86_64 and AArch64.
No direct multiplication or division: Unlike x86_64 and AArch64, the 6502 lacks built-in multiplication and division instructions, so programmers must implement them manually.
Memory addressing is more restricted: The 6502 uses 16-bit addresses, so it can reach only 64 KB of memory, which makes large data sets much harder to work with.
More manual memory management: The 6502 does have JSR/RTS and a small 256-byte hardware stack, but passing arguments and saving registers around a call is left entirely to the programmer, which adds complexity.
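To give an idea of what "implement them manually" means, here is the usual shift-and-add approach to multiplication, written in C rather than 6502 assembly. The function name is hypothetical, and this is a sketch of the general technique rather than code from the lab.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiplies two 8-bit values using only shifts,
   adds and bit tests, the operations a 6502 routine has available. */
uint16_t mul8_shift_add(uint8_t a, uint8_t b)
{
    uint16_t product = 0;
    uint16_t addend = a;   /* a shifted left once per bit of b examined */
    uint8_t bits = b;

    while (bits != 0) {
        if (bits & 1)                        /* lowest bit of b set? */
            product = (uint16_t)(product + addend);
        addend = (uint16_t)(addend << 1);    /* next power of two */
        bits >>= 1;
    }

    /* Sanity check against the built-in multiply the 6502 does not have. */
    assert(product == (uint16_t)(a * b));
    return product;
}
```

On x86_64 or AArch64 the whole function collapses into a single multiply instruction, which is the difference the list above describes.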

x86_64 Assembly

Complex instruction set (CISC): x86_64 has a large number of instructions and addressing modes, and its instructions vary in length, making it very powerful but also complicated.
Function calls use registers first, then the stack: On Linux, the first six integer arguments are passed in rdi, rsi, rdx, rcx, r8, and r9, and any further arguments go on the stack. The return address is always pushed on the stack by call.
Support for floating-point operations and SIMD: Unlike the 6502, x86_64 has built-in support for floating-point arithmetic and vectorized operations.

AArch64 Assembly

More regular than x86_64: AArch64 has a uniform set of 31 general-purpose registers (x0 to x30) and a fixed 32-bit instruction length, which makes the code easier to read and to optimize.
Register-based calling convention: The first eight integer arguments are passed in x0 to x7. x86_64 on Linux also passes arguments in registers, but has fewer registers available for that purpose, so it falls back to the stack sooner.
Well suited to modern mobile and embedded devices: Many smartphones and IoT devices use AArch64 because of its power efficiency.

Reflection

This lab provided hands-on experience with assembly programming on x86_64 and AArch64, along with a chance to compare both against the 6502. I learned how C code translates into machine instructions, how different architectures handle function calls, and how to manipulate registers and memory directly.

Debugging assembly was challenging, especially with incorrect ASCII conversions in AArch64, but fixing these issues helped reinforce my understanding of low-level programming.

Overall, this was a challenging but rewarding exercise in low-level programming.

Reference

Comparing x86_64 and aarch64 assembly, which do you prefer?

https://paracr4ckbeginnings.wordpress.com/2014/10/03/comparing-x86_64-and-aarch64-assembly-which-do-you-prefer/

Comparison of Assemblers

https://en.wikipedia.org/wiki/Comparison_of_assemblers
