Lab 5 - 64-Bit Assembly Language Lab

Introduction

In this lab, I explored how assembly language works on two different architectures: x86_64 and AArch64. The goal was to understand how simple programs like "Hello, World!" and loops are written, compiled, and executed in assembly for these architectures.

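I started by compiling and running a C version of "Hello World", then disassembled it to see how it translates into assembly. Next, I ran the assembly version. Finally, I wrote AArch64 assembly programs that print a loop counter (as a single digit, as a two-digit decimal number, and in hexadecimal) and ported them to x86_64.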


Main Process

Step 1: Setting Up the Environment

First, I connected to the x86_64 server via SSH. Then, I extracted the provided lab files:

cd ~
tar xvf /public/spo600-assembler-lab-examples.tgz

This created a directory structure with example files for both architectures.


Step 2: Running the C Program

Before diving into assembly, I compiled and ran the C version of the program to understand its behaviour:

cd ~/spo600/examples/hello/c
make  # Compiles the C files
./hello  # Runs the compiled program

Step 3: Disassembling the Executable

To analyze how the C program translates into assembly, I used objdump:

objdump -d hello

This showed the assembly instructions inside the executable, especially for the main function.

Key takeaways from the disassembled output:

  • The printf call in C compiles to a call printf@plt instruction. The @plt suffix means the call goes through the procedure linkage table (PLT), a small stub that jumps to printf in the dynamically linked C library.

  • mov copies values into registers (or memory). push and pop manage the function's stack frame, for example saving and restoring the frame pointer %rbp.

  • The compiled binary contains far more code than the few instructions needed to print text. This includes each function's prologue and epilogue, the C runtime startup code that runs before main, and PLT stubs for library calls. None of this is present in handwritten assembly, so the compiled binary is larger than the assembly version.

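Step 4: Running the Assembly Code

Next, I ran the assembly version of the program. The commands below extract the same archive again and build the AArch64 version, which must run on an AArch64 machine: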

tar xvf /public/spo600-assembler-lab-examples.tgz
ls ~/spo600/examples/hello/assembler/aarch64
cd ~/spo600/examples/hello/assembler/aarch64
make
./hello
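The output was the same "Hello World!" message.

objdump -d hello

Disassembling this binary shows that the program produces the same output using different instructions. On AArch64, the system call number goes in x8 and the kernel is entered with svc 0. On x86_64, the number goes in %rax and the kernel is entered with syscall.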


Step 5: Modifying AArch64 Assembly and Debugging Issues

Initial Code

The original AArch64 assembly program was a simple "Hello, world!" program:

.text
.globl _start
_start:
    mov     x0, 1
    adr     x1, msg
    mov     x2, len
    mov     x8, 64
    svc     0

    mov     x0, 0
    mov     x8, 93
    svc     0

.data
msg:    .ascii  "Hello, world!\n"
len=    . - msg
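
This program skips the C library and talks to the kernel directly. It makes a write system call (number 64 in x8, arguments in x0 to x2) and then an exit call (number 93). Below is a rough C equivalent, sketched with glibc's syscall() wrapper. SYS_write resolves to the running architecture's number: 64 on AArch64, 1 on x86_64. The function name hello_write is hypothetical, and the exit call is left out so that the function can return.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper mirroring the first half of the AArch64 program:
   x0 = 1 (stdout), x1 = msg, x2 = len, x8 = SYS_write, then svc 0. */
long hello_write(void) {
    static const char msg[] = "Hello, world!\n";
    /* The assembler computes len = . - msg. sizeof also counts C's
       trailing NUL, so subtract 1 to write the same 14 bytes. */
    return syscall(SYS_write, 1, msg, sizeof msg - 1);
}

/* The second half (x0 = 0, x8 = 93, svc 0) corresponds to
   syscall(SYS_exit, 0), which never returns. */
```

The return value is the number of bytes written, so a successful call returns 14.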


First Experiment: Single-Digit Loop Counter

The first attempt was a simple loop that printed the numbers 0 to 5, each as a single ASCII digit:

.text
.globl _start

_start:
    mov     x19, #0         // Loop counter (starts at 0)

loop:
    mov     x0, 1           // stdout file descriptor
    adr     x1, loop_msg    // Load "Loop: " message address
    mov     x2, loop_len    // Load message length
    mov     x8, 64          // write syscall
    svc     0               // Call kernel

    // Convert loop index (x19) to ASCII and print
    add     x20, x19, #48   // Convert number to ASCII ('0' = 48)
    adr     x1, num_char    // Address of character buffer
    strb    w20, [x1]       // Store ASCII character in buffer
    mov     x0, 1           // stdout file descriptor
    adr     x1, num_char    // Load buffer address
    mov     x2, #1          // Print one character
    mov     x8, 64          // write syscall
    svc     0               // Call kernel

    // Print newline
    mov     x0, 1
    adr     x1, newline
    mov     x2, #1
    mov     x8, 64
    svc     0

    add     x19, x19, #1    // Increment counter
    cmp     x19, #6         // Stop at 6
    b.ne    loop            // Repeat if x19 < 6

    // Exit program
    mov     x0, 0
    mov     x8, 93          // exit syscall
    svc     0

.data
loop_msg:   .ascii  "Loop: "  // Prefix message
loop_len=   . - loop_msg      // Message length
num_char:   .ascii  "0"       // Placeholder for loop index
newline:    .ascii  "\n"      // Newline character
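
The conversion step works because ASCII places the digits '0' to '9' at codes 48 to 57, so adding 48 to a value from 0 to 9 gives its character. Here is a minimal C sketch of one loop iteration. It builds the same eight bytes that the three write calls send. The name format_loop_line is hypothetical. The sketch also shows why this approach cannot go past 9: 10 + 48 = 58, which is ':' rather than "10".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes "Loop: <digit>\n" into out (at least 8 bytes)
   and returns the byte count, matching the three write calls per iteration.
   Only correct for n in 0..9, the same limit as the assembly version. */
size_t format_loop_line(unsigned n, char *out) {
    memcpy(out, "Loop: ", 6);        /* loop_msg, loop_len = 6 */
    out[6] = (char)('0' + n);        /* add x20, x19, #48 */
    out[7] = '\n';                   /* newline */
    return 8;
}
```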

Second Experiment: Two-Digit Loop Counter (0-32)

After successfully printing single digits, I modified the code to print numbers from 00 to 32 using two-digit formatting:

.text
.globl _start

_start:
    mov     x19, #0         // Loop counter (starts at 0)
    mov     x22, #10        // Store 10 in register for division

loop:
    // Print "Loop: "
    mov     x0, 1
    ldr     x1, =loop_msg
    mov     x2, loop_len
    mov     x8, 64
    svc     0

    // Convert x19 (0-32) to two-digit ASCII
    udiv    x20, x19, x22       // x20 = x19 / 10 (quotient, tens digit)
    msub    x21, x20, x22, x19  // x21 = x19 - (x20 * 10) (remainder, ones digit)
    add     x20, x20, #48   // Convert quotient to ASCII ('0'-'9')
    add     x21, x21, #48   // Convert remainder to ASCII ('0'-'9')

    // Store digits in buffer
    ldr     x1, =num_chars  // Address of buffer
    strb    w20, [x1]       // Store tens digit
    strb    w21, [x1, #1]   // Store ones digit

    // Print two-digit number
    mov     x0, 1
    ldr     x1, =num_chars
    mov     x2, #2
    mov     x8, 64
    svc     0

    // Print newline
    mov     x0, 1
    ldr     x1, =newline
    mov     x2, #1
    mov     x8, 64
    svc     0

    add     x19, x19, #1    // Increment counter
    cmp     x19, #33        // Stop at 33 (prints 00 to 32)
    b.ne    loop            // Repeat if x19 < 33

    // Exit program
    mov     x0, 0
    mov     x8, 93
    svc     0

.data
.align 4
loop_msg:   .ascii  "Loop: "    // .ascii, not .asciz: loop_len must not count a NUL byte
loop_len=   . - loop_msg

.align 4
num_chars:  .asciz  "00"    // Placeholder for 2-digit number

.align 4
newline:    .asciz  "\n"
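I ran into a problem where the ASCII conversion was wrong and unexpected characters appeared in the output. After debugging, I found that I had not split the number into its tens and ones digits correctly. AArch64 has no integer remainder instruction, so the split takes two steps. First, udiv x20, x19, x22 computes the quotient (the tens digit). Then msub x21, x20, x22, x19 computes x19 - x20 * 10 (the ones digit). Using these two instructions correctly produced the expected output.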
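
The same digit split can be sketched in C, with each line mapped to the instruction it mirrors. The name split_two_digits is hypothetical. The sketch assumes n is between 0 and 99, so the tens digit is a single character.

```c
/* Hypothetical helper mirroring the udiv/msub/add sequence for n in 0..99. */
void split_two_digits(unsigned n, char out[2]) {
    unsigned tens = n / 10;          /* udiv x20, x19, x22 */
    unsigned ones = n - tens * 10;   /* msub x21, x20, x22, x19 */
    out[0] = (char)('0' + tens);     /* add x20, x20, #48 */
    out[1] = (char)('0' + ones);     /* add x21, x21, #48 */
}
```

For example, 32 becomes '3' and '2', and 7 becomes '0' and '7'. That leading zero is why the program prints 00 to 09.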

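Third Experiment: Hexadecimal Loop Counter

The third version prints the same counter (0 to 32) in hexadecimal, from 0x00 to 0x20. A small subroutine, to_hexchar, converts one 4-bit value (a nibble) into its ASCII character.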

.text
.globl _start

_start:
    mov     x19, #0              // loop counter

loop:
    // print "Loop: 0x"
    mov     x0, #1
    ldr     x1, =prefix
    mov     x2, #8
    mov     x8, #64
    svc     0

    // high nibble = x19 >> 4
    mov     x20, x19
    lsr     x21, x20, #4
    bl      to_hexchar
    ldr     x1, =hexbuf          // Load hexbuf address
    strb    w0, [x1]             // Store high nibble as ASCII

    // low nibble = x19 & 0x0F
    and     x21, x20, #0x0F
    bl      to_hexchar
    strb    w0, [x1, #1]         // Store low nibble as ASCII

    // print hexbuf (2 chars)
    mov     x0, #1
    ldr     x1, =hexbuf
    mov     x2, #2
    mov     x8, #64
    svc     0

    // newline
    mov     x0, #1
    ldr     x1, =newline
    mov     x2, #1
    mov     x8, #64
    svc     0

    add     x19, x19, #1
    cmp     x19, #33
    b.ne    loop

    // exit
    mov     x0, #0
    mov     x8, #93
    svc     0

// Subroutine to convert number (0-15) to hex ASCII character
to_hexchar:
    cmp     x21, #10
    b.lt    .digit
    add     x0, x21, #55     // 'A' = 65 = 10 + 55
    ret
.digit:
    add     x0, x21, #48     // '0' = 48
    ret

.data
prefix:     .asciz "Loop: 0x"
hexbuf:     .space 2
newline:    .asciz "\n"
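
to_hexchar relies on the gap in the ASCII table between '9' (57) and 'A' (65). Values 0 to 9 get 48 added, and values 10 to 15 get 55 added so that 10 maps to 'A'. Below is a C sketch of the subroutine, plus the shift and mask that split the counter into two nibbles. format_hex_byte is a hypothetical name. Values above 0xFF would need more digits.

```c
/* Mirrors to_hexchar: v must be 0..15. */
char to_hexchar(unsigned v) {
    return (char)(v < 10 ? v + 48 : v + 55);   /* '0' = 48, 'A' = 65 = 10 + 55 */
}

/* Hypothetical helper: two hex digits for n in 0..255. */
void format_hex_byte(unsigned n, char out[2]) {
    out[0] = to_hexchar((n >> 4) & 0x0F);   /* lsr x21, x20, #4 (mask only matters above 255) */
    out[1] = to_hexchar(n & 0x0F);          /* and x21, x20, #0x0F */
}
```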

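Step 6: Repeating the Step 5 Experiments on x86_64

I then rewrote the same three programs in x86_64 assembly (GNU as, AT&T syntax). Here the write system call is number 1 and exit is number 60. The call number goes in %rax and the arguments go in %rdi, %rsi, and %rdx.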

First Experiment: Single-Digit Loop Counter

.section .data
prefix:     .ascii "Loop: "
newline:    .ascii "\n"
digit:      .byte 0

.section .text
.globl _start
_start:
    mov     $0, %r15

loop:
    # print prefix
    mov     $1, %rax
    mov     $1, %rdi
    lea     prefix(%rip), %rsi
    mov     $6, %rdx
    syscall

    # convert number to ASCII
    mov     %r15b, %al
    add     $'0', %al
    mov     %al, digit(%rip)

    # print digit
    mov     $1, %rax
    mov     $1, %rdi
    lea     digit(%rip), %rsi
    mov     $1, %rdx
    syscall

    # newline
    mov     $1, %rax
    mov     $1, %rdi
    lea     newline(%rip), %rsi
    mov     $1, %rdx
    syscall

    inc     %r15
    cmp     $6, %r15
    jne     loop

    mov     $60, %rax
    xor     %rdi, %rdi
    syscall

Second Experiment: Two-Digit Loop Counter (0-32)

.section .data
prefix:     .ascii "Loop: "
newline:    .ascii "\n"
digits:     .space 2

.section .text
.globl _start
_start:
    mov     $0, %r15         # loop counter

loop:
    # print prefix
    mov     $1, %rax
    mov     $1, %rdi
    lea     prefix(%rip), %rsi
    mov     $6, %rdx
    syscall

    # move loop counter to %rax for division
    mov     %r15, %rax
    xor     %rdx, %rdx       # clear RDX: div divides the 128-bit value RDX:RAX
    mov     $10, %rbx
    div     %rbx             # quotient -> %rax, remainder -> %rdx

    # convert to ASCII
    add     $'0', %al        # tens digit ASCII
    add     $'0', %dl        # ones digit ASCII

    # store digits
    mov     %al, digits(%rip)
    mov     %dl, digits+1(%rip)

    # print digits
    mov     $1, %rax
    mov     $1, %rdi
    lea     digits(%rip), %rsi
    mov     $2, %rdx
    syscall

    # print newline
    mov     $1, %rax
    mov     $1, %rdi
    lea     newline(%rip), %rsi
    mov     $1, %rdx
    syscall

    inc     %r15
    cmp     $33, %r15
    jne     loop

    # exit
    mov     $60, %rax
    xor     %rdi, %rdi
    syscall
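
The xor %rdx, %rdx before div is required because div %rbx divides the 128-bit value in RDX:RAX, not just RAX. The sketch below models this in C. It assumes the unsigned __int128 type, a GCC/Clang extension available on 64-bit targets. x86_div is a hypothetical name. If RDX is not cleared, its leftover bits become the upper half of the dividend and the digits come out wrong. If the quotient does not fit in 64 bits, the real instruction raises a divide error (#DE); the sketch reports that case by returning 0.

```c
#include <stdint.h>

/* Hypothetical model of "div %rbx": divides RDX:RAX by divisor.
   Returns 0 where the CPU would raise #DE (divisor 0 or quotient too big). */
int x86_div(uint64_t rdx, uint64_t rax, uint64_t divisor,
            uint64_t *quot, uint64_t *rem) {
    if (divisor == 0 || rdx >= divisor)   /* quotient would need more than 64 bits */
        return 0;
    unsigned __int128 dividend = ((unsigned __int128)rdx << 64) | rax;
    *quot = (uint64_t)(dividend / divisor);   /* ends up in %rax */
    *rem  = (uint64_t)(dividend % divisor);   /* ends up in %rdx */
    return 1;
}
```

With RDX cleared, x86_div(0, 32, 10, ...) gives quotient 3 and remainder 2, as the loop expects. With RDX = 1, the dividend becomes 2^64 and the result is unrelated to the counter.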

Third Experiment: Hexadecimal Loop Counter

.section .data
prefix:     .ascii "Loop: 0x"
newline:    .ascii "\n"
hexbuf:     .space 2         # 2 hex digits will go here

.section .text
.globl _start
_start:
    mov     $0, %r15         # loop counter = 0

loop:
    # print "Loop: 0x"
    mov     $1, %rax
    mov     $1, %rdi
    lea     prefix(%rip), %rsi
    mov     $8, %rdx
    syscall

    # convert to hex digits
    mov     %r15b, %al       # copy counter to AL
    mov     %al, %bl         # backup for low nibble

    shr     $4, %al          # high nibble
    call    hexchar
    mov     %al, hexbuf(%rip)

    mov     %bl, %al
    and     $0x0F, %al       # low nibble
    call    hexchar
    mov     %al, hexbuf+1(%rip)

    # print hex digits
    mov     $1, %rax
    mov     $1, %rdi
    lea     hexbuf(%rip), %rsi
    mov     $2, %rdx
    syscall

    # print newline
    mov     $1, %rax
    mov     $1, %rdi
    lea     newline(%rip), %rsi
    mov     $1, %rdx
    syscall

    inc     %r15
    cmp     $0x21, %r15
    jne     loop

    # exit
    mov     $60, %rax
    xor     %rdi, %rdi
    syscall

# Subroutine: Convert AL to ASCII hex character
hexchar:
    cmp     $10, %al
    jl      .digit
    add     $55, %al         # 'A' = 65 = 10 + 55
    ret
.digit:
    add     $48, %al         # '0' = 48
    ret

6502 vs x86_64 vs AArch64 Assembly

6502 Assembly

  • Limited registers: the 6502 has only three 8-bit general-purpose registers (the accumulator A and the index registers X and Y). That is far fewer than x86_64 (16) or AArch64 (31).

  • No multiplication or division instructions: unlike x86_64 and AArch64, the 6502 has no built-in multiply or divide, so programmers must implement them with shifts, additions, and subtraction loops.

  • Restricted addressing: the 6502 uses 16-bit addresses, so it can address only 64 KiB of memory. This makes large data sets difficult to handle.

  • Small, fixed stack: the hardware stack is 256 bytes in page one ($0100-$01FF). JSR/RTS save only the return address, so passing arguments and preserving registers is left to the programmer.
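
Without a divide instruction, printing a two-digit number on the 6502 means computing the tens digit yourself, typically by repeatedly subtracting 10 and counting. The C sketch below illustrates that approach using only comparison, subtraction, and increment on 8-bit values. The mapping to 6502 instructions in the comments is approximate, and div10_by_subtraction is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical helper: 8-bit divide by 10 using only compare, subtract,
   and increment, the way a 6502 loop might use CMP, SBC, and INX. */
uint8_t div10_by_subtraction(uint8_t n, uint8_t *remainder) {
    uint8_t quotient = 0;
    while (n >= 10) {      /* CMP #10, then branch out when below 10 */
        n -= 10;           /* SEC / SBC #10 */
        quotient++;        /* INX */
    }
    *remainder = n;        /* what is left is the ones digit */
    return quotient;
}
```

The loop runs once per multiple of 10. This is cheap for the 0 to 32 range used in this lab, but it scales with the size of the value, unlike a single udiv or div instruction.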

x86_64 Assembly

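  • Complex instruction set (CISC): x86_64 has a large number of instructions and addressing modes with variable-length encodings, making it very powerful but also complicated. Some instructions use implicit registers. For example, div divides RDX:RAX and returns the quotient in RAX and the remainder in RDX, as in the two-digit program above.

  • Register-based function calls on Linux: the System V AMD64 ABI passes the first six integer arguments in %rdi, %rsi, %rdx, %rcx, %r8, and %r9, and uses the stack only for additional arguments. The call instruction pushes the return address onto the stack.

  • Support for floating-point operations and SIMD: unlike the 6502, x86_64 has built-in support for floating-point arithmetic and vectorized (SIMD) operations.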

AArch64 Assembly

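  • More regular than x86_64: AArch64 has 31 general-purpose 64-bit registers (x0-x30) and a fixed 32-bit instruction length, which makes decoding and optimization simpler. It is a load/store architecture, so memory is accessed only through instructions such as ldr and strb.

  • Register-based calling convention: the AArch64 procedure call standard passes up to eight integer arguments in x0-x7. Instead of pushing the return address on the stack, bl stores it in the link register x30, for example in bl to_hexchar in the hexadecimal program.

  • Common in mobile and embedded devices: many smartphones and IoT devices use AArch64 because of its power efficiency.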

Reflection

This lab provided hands-on experience with assembly programming on x86_64 and AArch64, along with a comparison to the 6502. I learned how C code translates into machine instructions, how different architectures handle system calls and function calls, and how to manipulate registers and memory directly.

Debugging assembly was difficult, especially the incorrect ASCII conversions in the AArch64 two-digit program, but fixing these issues reinforced my understanding of how registers and memory work at the instruction level.

Overall, this was a challenging but rewarding exercise in low-level programming.
