Lab 1 - 6502 Assembly Language
Introduction
In this post, I'll share my experience working on a fun and challenging lab about 6502 assembly language. Assembly language is a low-level programming language, and in this lab, I explored how it works by writing code to manipulate a simple pixel display. My goal was to learn how to calculate performance, optimize code, and modify it to create interesting visual effects on the display.
Bitmap Code
This code:
- Starts at memory address $0200 (the beginning of the display)
- Loops through all 1024 pixels (4 pages of 256 pixels each, $0200 to $05FF)
- Sets each pixel to yellow ($07)
lda #$00 ; set a pointer in memory location $40 to point to $0200
sta $40 ; ... low byte ($00) goes in address $40
lda #$02
sta $41 ; ... high byte ($02) goes into address $41
lda #$07 ; colour number
ldy #$00 ; set index to 0
loop: sta ($40),y ; set pixel colour at the address (pointer)+Y
iny ; increment index
bne loop ; continue until done the page (256 pixels)
inc $41 ; increment the page
ldx $41 ; get the current page number
cpx #$06 ; compare with 6
bne loop ; continue until done all pages
Code Explanation Line by Line
1. lda #$00
- Load the number $00 into the A register
- This will later set part of the memory address to point to the start of the display
2. sta $40
- Store the value in A register ($00) into memory address $40
- This sets the low byte of the pointer
3. lda #$02
- Load the number $02 into the A register
- This will set the next part of the memory address
4. sta $41
- Store the value in A register ($02) into memory address $41
- This sets the high byte of the pointer
- Together, $40 and $41 now point to $0200 (the starting address of the display)
5. lda #$07
- Load the number $07 into the A register
- $07 is the colour code for yellow
6. ldy #$00
- Load the number $00 into the Y register
- This is the starting index for looping through the pixels
> Loop
7. sta ($40),y
- Write the colour from the A register ($07) into the memory address pointed to by $40 and $41, plus the value in the Y register
- This colours one pixel
8. iny
- increase the Y register by 1
- This moves to the next pixel
9. bne loop
- Check if Y is not zero. If true, go back to loop:
- This repeats until all 256 pixels in the current page are filled
10. inc $41
- Increase the value in memory address $41 by 1
- This moves to the next page of the display (next 256 pixels)
11. ldx $41
- Load the value of $41 into the X register
- This keeps track of which page you are working on
12. cpx #$06
- Compare the value in the X register with $06
- $06 is the page just after the display's last page ($05), so reaching it means every page has been filled
13. bne loop
- If the value in the X register is not equal to 6, go back to loop
- This repeats until all pages (all pixels) are filled
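To double-check which addresses the loop touches, here is a small Python sketch that mimics the pointer logic. It is only a model of the listing, not an emulator: the function name and the dictionary standing in for memory are made up for illustration, and it assumes the pointer's low byte stays at $00.

```python
def fill_display(colour=0x07, start_page=0x02, end_page=0x06):
    """Mimic the pointer loop: write `colour` to every address on each page."""
    memory = {}
    page = start_page                 # high byte of the pointer ($41)
    while page != end_page:           # cpx #$06 / bne loop
        for y in range(256):          # iny / bne loop covers Y = 0..255
            memory[(page << 8) + y] = colour   # sta ($40),y with low byte $00
        page += 1                     # inc $41
    return memory

pixels = fill_display()
print(len(pixels), hex(min(pixels)), hex(max(pixels)))  # 1024 0x200 0x5ff
```

The model writes 1024 bytes, from $0200 through $05FF, which matches a 32x32 display.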
Result
Code Execution Time Calculation
Step 1: Instruction Cycle Times
Each instruction in the code takes a specific number of cycles to execute.
- LDA #value: 2 cycles
- STA $address (zero page): 3 cycles
- STA ($address),Y: 6 cycles
- INY: 2 cycles
- BNE label: 3 cycles (when the branch succeeds), 2 cycles (when the branch fails)
Step 2: Analyzing the Code
Inner Loop: Filling One Page (256 pixels)
- The loop fills the 256 pixels of one page. Each pixel is processed by:
- STA ($40),Y -> 6 cycles
- INY -> 2 cycles
- BNE loop -> 3 cycles (for 255 iterations), 2 cycles (for the final iteration)
- For 256 pixels, the total cycles are:
- STA ($40),Y: 6 * 256 = 1536 cycles
- INY: 2 * 256 = 512 cycles
- BNE loop: 3 * 255 + 2 = 767 cycles
- Total for one page: 2815 cycles
Outer Loop: Moving Between Pages
- The code processes 4 pages (256 pixels per page, high byte $02 to $05). After each page, these instructions execute:
- INC $41 -> 5 cycles
- LDX $41 -> 3 cycles
- CPX #$06 -> 2 cycles
- BNE loop -> 3 cycles (when the branch succeeds), 2 cycles (after the final page)
- These instructions are executed 4 times, once after each page; the BNE loops back 3 times and falls through after the last page
- Total for page switching: (5 + 3 + 2) * 4 + 3 * 3 + 2 = 51 cycles
Step 3: Total Execution Time
- Setup before the loop (three LDA, two STA, one LDY): 2 + 3 + 2 + 3 + 2 + 2 = 14 cycles
- For 4 pages: 2815 cycles per page * 4 pages = 11,260 cycles
- Page switching: 51 cycles
- Total cycles: 14 + 11,260 + 51 = 11,325 cycles
Assuming a 1 MHz clock, each cycle takes 1 microsecond, so the total time is:
11,325 microseconds (about 11.3 ms)
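The arithmetic can be cross-checked with a few lines of Python. This is a sketch under stated assumptions: standard 6502 timings for the zero-page forms, no page-crossing penalty on the branches, and a loop that covers the four pages $02 to $05 (which is what starting at $02 and stopping at cpx #$06 implies). The constant names are invented for the sketch.

```python
# Cycle costs from the standard 6502 timing tables (zero-page forms).
LDA_IMM, LDY_IMM, STA_ZP = 2, 2, 3
STA_IND_Y, INY = 6, 2
INC_ZP, LDX_ZP, CPX_IMM = 5, 3, 2
BNE_TAKEN, BNE_NOT_TAKEN = 3, 2

PAGES = 4  # assumption: high byte runs $02..$05 before cpx #$06 ends the loop

# Three lda #, two sta zero page, one ldy # before the loop starts.
setup = 3 * LDA_IMM + 2 * STA_ZP + LDY_IMM

# 256 stores and increments; the branch is taken 255 times, then falls through.
inner_per_page = 256 * (STA_IND_Y + INY) + 255 * BNE_TAKEN + BNE_NOT_TAKEN

# inc/ldx/cpx run after every page; the branch falls through only after the last.
outer = PAGES * (INC_ZP + LDX_ZP + CPX_IMM) + (PAGES - 1) * BNE_TAKEN + BNE_NOT_TAKEN

total_cycles = setup + PAGES * inner_per_page + outer
print(inner_per_page, outer, total_cycles)  # 2815 51 11325
```

At 1 MHz that total corresponds to roughly 11.3 ms.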
Memory Usage Calculation
Step 1: Code Size
- Each instruction in assembly language uses a specific number of bytes
- I checked the size of each instruction in the 6502 instruction set table and added them together
- Here is the breakdown of my program
- lda (immediate mode): 2 bytes each
- sta (zero page mode): 2 bytes each
- sta (indirect indexed mode): 2 bytes
- ldy (immediate mode): 2 bytes
- iny (implied mode): 1 byte
- bne (relative branch): 2 bytes each
- inc (zero page mode): 2 bytes
- ldx (zero page mode): 2 bytes
- cpx (immediate mode): 2 bytes
- Because $40 and $41 are below $0100, sta, inc, and ldx can use the 2-byte zero page forms instead of the 3-byte absolute forms
- The program contains the following instructions:
- lda (immediate mode): 3 occurrences * 2 bytes = 6 bytes
- sta (zero page mode): 2 occurrences * 2 bytes = 4 bytes
- sta (indirect indexed mode): 1 occurrence * 2 bytes = 2 bytes
- ldy (immediate mode): 1 occurrence * 2 bytes = 2 bytes
- iny (implied mode): 1 occurrence * 1 byte = 1 byte
- bne (relative branch): 2 occurrences * 2 bytes = 4 bytes
- inc (zero page mode): 1 occurrence * 2 bytes = 2 bytes
- ldx (zero page mode): 1 occurrence * 2 bytes = 2 bytes
- cpx (immediate mode): 1 occurrence * 2 bytes = 2 bytes
- Total code size: 6 + 4 + 2 + 2 + 1 + 4 + 2 + 2 + 2 = 25 bytes
Step 2: Pointer and Variable Size
- I used two zero page memory locations ($40 and $41) to hold one 16-bit pointer:
- Each location holds 1 byte of the pointer (low byte and high byte)
- Total pointer size: 1 byte * 2 locations = 2 bytes
- Other temporary variables or flags were not used in this specific program
Step 3: Total Memory Usage
- Adding the code size and pointer size:
- 25 bytes (code) + 2 bytes (pointer) = 27 bytes
So, the total memory used by my program is:
27 bytes
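One way to sanity-check a code-size figure is to write out the machine code by hand and count the bytes. The sketch below assumes the assembler picks the zero-page encodings for `sta $40`, `sta $41`, `inc $41`, and `ldx $41` (the usual choice for addresses below $0100); the opcode bytes are the standard 6502 ones, and the variable names are invented for illustration.

```python
# Hand-assembled bytes for the fill routine, one instruction per line.
machine_code = bytes.fromhex(
    "A9 00"   # lda #$00
    "85 40"   # sta $40      (zero page)
    "A9 02"   # lda #$02
    "85 41"   # sta $41      (zero page)
    "A9 07"   # lda #$07
    "A0 00"   # ldy #$00
    "91 40"   # sta ($40),y
    "C8"      # iny
    "D0 FB"   # bne loop     (offset -5)
    "E6 41"   # inc $41      (zero page)
    "A6 41"   # ldx $41      (zero page)
    "E0 06"   # cpx #$06
    "D0 F3"   # bne loop     (offset -13)
)

code_size = len(machine_code)
pointer_size = 2                      # $40 and $41 hold one 16-bit pointer
print(code_size, code_size + pointer_size)  # 25 27
```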
Optimization and Performance Comparison
1. Identifying Bottlenecks
In the original code, the main bottleneck is caused by processing one pixel at a time inside the loop. This results in:
- Loop Overhead:
- Each pixel requires multiple instructions like STA ($40),Y, INY, and BNE loop
- These repetitive instructions increase the total execution time
- High Loop Count:
- For each page (256 pixels), the loop runs 256 times, which adds unnecessary overhead
2. Optimization Strategies
- Loop Unrolling:
- Process multiple pixels in a single loop iteration to reduce the number of loop executions
- I tried unrolling the loop to process 2 pixels and 4 pixels at once
- Reducing Loop Overhead:
- By unrolling, fewer instructions like BNE loop and INY are executed
- This makes the loop more efficient without changing the overall behaviour
3. Code Explanation and Optimization Reasoning
(1) 2-Pixel Loop Unrolling
lda #$00 ; Set pointer in memory location $40 to point to $0200
sta $40 ; Low byte ($00) goes in address $40
lda #$02
sta $41 ; High byte ($02) goes into address $41
lda #$07 ; Load the fixed colour number
ldy #$00 ; Reset Y index
loop_unrolled:
sta ($40),y ; Store colour at (pointer)+Y
iny
sta ($40),y ; Store colour at (pointer)+Y+1
iny
bne loop_unrolled ; Repeat until 256 pixels filled
inc $41 ; Move to the next page by incrementing the high byte
ldx $41 ; Load the current high byte into X register
cpx #$06 ; Compare with the final page value (6)
bne loop_unrolled ; Continue until all pages are filled
rts ; Return from subroutine
- Processes 2 pixels per iteration by executing two STA instructions in one loop cycle
- Reduces the number of loop iterations by half (from 256 to 128)
- Only the BNE overhead is halved, because every pixel still needs its own 6-cycle STA and 2-cycle INY, so by cycle count the speed improvement is about 1.16x
(2) 4-Pixel Loop Unrolling
lda #$00 ; Set pointer in memory location $40 to point to $0200
sta $40 ; Low byte ($00) goes in address $40
lda #$02
sta $41 ; High byte ($02) goes into address $41
lda #$07 ; Load the fixed colour number
fill_page:
ldy #$00 ; Reset Y index
loop_unrolled:
sta ($40),y ; Store colour at (pointer)+Y
iny
sta ($40),y ; Store colour at (pointer)+Y+1
iny
sta ($40),y ; Store colour at (pointer)+Y+2
iny
sta ($40),y ; Store colour at (pointer)+Y+3
iny
bne loop_unrolled ; Repeat until 256 pixels filled
inc $41 ; Move to the next page by incrementing the high byte
ldx $41 ; Load the current high byte into X register
cpx #$06 ; Compare with the final page value (6)
bne fill_page ; Continue until all pages are filled
rts ; Return from subroutine
4. Performance Comparison
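Code Version | Total Execution Time (cycles) | Relative Performance
Original Code | 11,325 cycles | Baseline (1x)
2-Pixel Unrolled | 9,789 cycles | About 1.16x faster
4-Pixel Unrolled | 9,027 cycles | About 1.25x faster
Cycle counts are calculated from the instruction timings for the 4 display pages ($02 to $05), including the setup instructions and excluding the final rts.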
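The effect of unrolling can be estimated with a small cycle model. The sketch below is an approximation under stated assumptions: standard 6502 timings, four display pages, no page-crossing branch penalties, and the final rts left out. The function name and the `ldy_inside_page_loop` flag are invented here; the flag reflects that the 4-pixel listing resets Y once per page while the other two versions set it once during setup.

```python
def fill_cycles(unroll, ldy_inside_page_loop=False, pages=4):
    """Estimate total cycles for the fill routine with `unroll` stores per loop pass."""
    iterations = 256 // unroll                  # inner-loop passes per page
    inner = 256 * (6 + 2)                       # sta ($40),y + iny for every pixel
    inner += (iterations - 1) * 3 + 2           # bne taken, then falls through once
    outer = pages * (5 + 3 + 2) + (pages - 1) * 3 + 2   # inc/ldx/cpx + bne per page
    setup = 3 * 2 + 2 * 3                       # three lda #, two sta zero page
    ldy = 2 * pages if ldy_inside_page_loop else 2      # ldy #$00 per page, or once
    return setup + ldy + pages * inner + outer

baseline = fill_cycles(1)
two_pixel = fill_cycles(2)
four_pixel = fill_cycles(4, ldy_inside_page_loop=True)

print(baseline, two_pixel, four_pixel)          # 11325 9789 9027
print(round(baseline / two_pixel, 2), round(baseline / four_pixel, 2))  # 1.16 1.25
```

Since each pixel always costs 8 cycles for the store and increment, removing the branch entirely could at best take 11 cycles per pixel down to 8, so unrolling alone cannot reach 2x on this loop.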
5. Conclusion and Reflection
Through this process, I explored loop unrolling as an optimization strategy for the given assembly code. Both 2-pixel unrolling and 4-pixel unrolling improved performance:
• 2-Pixel Unrolling: A good balance between simplicity and performance (about 1.16x faster).
• 4-Pixel Unrolling: A larger improvement (about 1.25x faster) but increased code length.
Unrolling only removes branch overhead, since every pixel still costs a 6-cycle STA and a 2-cycle INY. I still wonder if there is a way to achieve a bigger improvement without further increasing code length.
Modifying the Code
1. Filling the Display with Light Blue ($e)
2. Different Colours per Page
lda #$00 ; Set pointer to $0200
sta $40 ; Store low byte in $40
lda #$02 ; Set high byte
sta $41 ; Store high byte in $41
ldy #$00 ; Reset index
set_color:
ldx $41 ; Load current page
cpx #$02 ; Compare to $02
bne page_2 ; If not $02, jump to page 2
lda #$02 ; Page 1: Red
jmp fill_page
page_2:
cpx #$03 ; Compare to $03
bne page_3 ; If not $03, jump to page 3
lda #$03 ; Page 2: Cyan
jmp fill_page
page_3:
cpx #$04 ; Compare to $04
bne page_4 ; If not $04, jump to page 4
lda #$07 ; Page 3: Yellow
jmp fill_page
page_4:
lda #$0A ; Page 4: Light red
fill_page:
ldy #$00 ; Reset Y index
loop:
sta ($40),y ; Store color at (pointer)+Y
iny ; Increment Y
bne loop ; Continue until done
inc $41 ; Increment page
ldx $41 ; Check current page number
cpx #$06 ; If less than $06, repeat
bne set_color
rts ; Return when done
3. Random Colours for Each Pixel
lda #$00 ; Set pointer to $0200
sta $40 ; Store low byte in $40
lda #$02 ; Set high byte
sta $41 ; Store high byte in $41
ldy #$00 ; Reset index
loop:
lda $fe ; Read random number from PRNG
and #$0F ; Mask to get lower 4 bits (colour code)
sta ($40),y ; Store random colour at (pointer)+Y
iny ; Increment Y
bne loop ; Continue for 256 pixels
inc $41 ; Increment page
ldx $41 ; Check current page
cpx #$06 ; If less than $06, repeat
bne loop
rts ; Return when done

Experiment
1. Add tya
- Add tya above iny
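- The screen shows vertical stripes of colour that repeat across each row. Each column has a single colour.
- How many colours? 16 different colours. The Y register counts from 0 to 255, but only the lower 4 bits of each byte determine the colour, so the colours repeat every 16 pixels.
- Why? The tya instruction copies the Y index register to the A register, which is used to set the pixel colour. Since Y increments for each pixel, the colour changes with every pixel, and because each row is 32 pixels wide (a multiple of 16), the same colours line up in columns.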
2. Add lsr
- Add lsr under tya
- The gradient appears blocky, with colours repeating in groups of two pixels
- How many colours? 16 colours. The lsr halves the Y value to a range of 0 to 127, and only the lower 4 bits of that value determine the colour.
- Why? The lsr divides the Y value by 2, reducing the range of values and causing multiple pixels to share the same colour.
3. Multiple lsr Instruction
- Adding multiple lsr instructions (2-5) results in the screen displaying bands of colour. As the number of lsr instructions increases, these bands become progressively wider, and the transitions between colours become less frequent.
- How many colours? With 2 to 4 lsr instructions, all 16 colours remain visible on the screen, but their arrangement changes. With 5 lsr instructions, the value only ranges from 0 to 7, so only 8 colours appear.
- Why? Each lsr instruction shifts the bits in the accumulator (A register) to the right, dividing the current value by 2. Each extra lsr doubles the number of neighbouring pixels that share the same value, which results in wider bands of the same colour. As long as the shifted value still reaches 15 over the full range of Y (up to 4 lsr instructions), the full 16-colour palette is maintained.
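The colour counts for tya followed by some number of lsr instructions can be checked with a short Python model. It assumes that A holds Y shifted right and that only the low 4 bits of the stored byte select the colour; the function name is invented for the sketch.

```python
def colours_after_lsr(shifts):
    """Distinct colour codes when A = Y >> shifts and the low 4 bits pick the colour."""
    return sorted({(y >> shifts) & 0x0F for y in range(256)})

for shifts in range(6):
    band_width = 2 ** shifts          # neighbouring pixels sharing one value
    print(shifts, band_width, len(colours_after_lsr(shifts)))
```

With zero shifts (tya alone) through four shifts the model gives 16 colours; with five shifts the value never exceeds 7, so it gives 8.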
4. Replace lsr with asl
- Add asl 1 to 4 times and observe the result each time. You may need to adjust the speed for observation.
- The screen begins with vertical stripes (when using asl once). As more asl instructions are added, fewer colours are displayed across the screen, because each asl shifts a 0 into the lowest bit and leaves fewer of the 4 colour bits free to change.
- How many colours?
- 1 asl : 8 colours (only the even colour codes, since the lowest bit is always 0)
- 2 asl : 4 colours
- 3 asl : 2 colours
- 4 asl : 1 colour (I saw black)
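The asl counts follow from which colour bits can still change after the shift. A minimal Python check, assuming A is an 8-bit register (so bits shifted past bit 7 are lost) and that the low 4 bits select the colour; the function name is made up for illustration:

```python
def colours_after_asl(shifts):
    """Distinct colour codes when A = (Y << shifts) kept to 8 bits, low 4 bits used."""
    return sorted({((y << shifts) & 0xFF) & 0x0F for y in range(256)})

for shifts in range(1, 5):
    print(shifts, colours_after_asl(shifts))
# 1 [0, 2, 4, 6, 8, 10, 12, 14]
# 2 [0, 4, 8, 12]
# 3 [0, 8]
# 4 [0]
```

Colour 0 is black, which is why four shifts leave a black screen.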
5. Multiple iny
1 iny Instruction
- Visual Effect: The screen fills line by line from top to bottom with yellow. Each row is entirely yellow, and this process repeats until the entire screen is filled.
- Reason: The iny instruction increments the Y register by 1, moving sequentially to the next pixel in the same row and filling all rows and columns.
- Colors: Only 1 colour (yellow)
- Pattern: A solid yellow screen
2 iny Instructions
- Visual Effect: Similar to 1 iny, the screen fills from top to bottom, but the final result is alternating vertical stripes of yellow and black.
- Reason: With 2 iny instructions, every other memory address is skipped, resulting in alternating yellow and black pixels, creating vertical stripes.
- Colours: 2 colours (yellow and black)
- Pattern: Alternating vertical stripes of yellow and black
3 iny Instructions
- Visual Effect: The screen fills diagonally from the top-left to the bottom-right, creating a yellow and black checkerboard pattern. Eventually, all black spaces are filled, leaving the entire screen yellow.
- Reason: With 3 iny instructions, two addresses are skipped for every pixel filled, creating a checkerboard effect. Over time, the skipped black pixels are also filled as the loop repeats.
- Colours: Initially, 2 colours (yellow and black); ultimately, only yellow.
- Pattern: Diagonal checkerboard -> Full yellow screen
4 iny Instructions
- Visual Effect: Similar to 2 iny, the screen fills from top to bottom with vertical yellow stripes, but the yellow stripes are much thinner, and the black spaces are much wider.
- Reason: With 4 iny instructions, 3 addresses are skipped for every pixel filled, resulting in a pattern with thin yellow stripes and thick black spaces.
- Colours: 2 Colours (yellow and black)
- Pattern: Thin yellow vertical stripes alternating with thick black spaces.
5 iny Instructions
- Visual Effect: The screen fills diagonally from the top-left to the bottom-right with a checkerboard-like pattern. However, the pattern consists of larger blocks compared to 3 iny. Each page is fully yellow after 5 passes, because Y only returns to 0 after wrapping around 5 times.
- Reason: With 5 iny instructions, larger gaps are skipped for each pixel filled. Over time, the skipped spaces are filled in larger blocks until the screen becomes fully yellow.
- Colours: Initially, 2 colours (yellow and black); ultimately, only yellow.
- Pattern: Large-block checkerboard -> Fully yellow screen
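Whether a page ends up striped or fully yellow depends on whether the step size shares a factor with 256. The Python sketch below follows Y through one page, assuming the bne only tests Y after the last iny in the group; the function name is invented for illustration.

```python
def fill_pattern(step):
    """Follow Y as it grows by `step` per store until it wraps back to exactly 0."""
    y, visited, wraps = 0, set(), 0
    while True:
        visited.add(y)                # sta ($40),y colours this offset
        nxt = y + step                # `step` iny instructions in a row
        if nxt > 0xFF:
            wraps += 1                # Y overflowed past 255
        y = nxt & 0xFF
        if y == 0:                    # bne falls through only when Y is 0
            return len(visited), wraps

for step in range(1, 6):
    print(step, fill_pattern(step))
# 1 (256, 1)
# 2 (128, 1)
# 3 (256, 3)
# 4 (64, 1)
# 5 (256, 5)
```

Steps of 2 and 4 divide 256 evenly, so Y comes back to 0 after one pass and the skipped pixels stay black. Steps of 3 and 5 land on a different offset after each wrap, so every pixel is eventually coloured.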
Challenges
1. Set all of the display pixels to the same colour, except for the middle four pixels, which will be drawn in another colour.
The approach I chose was:
- Keep the existing code that fills the entire screen as is
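- Find the memory addresses for the 4 centre pixels, the 2x2 block at rows 15-16 and columns 15-16 ($03EF, $03F0, $040F, $0410), and change their colour
lda #$00 ; set a pointer at $40 to point to $0200
sta $40
lda #$02
sta $41
lda #$07 ; background colour (yellow)
ldy #$00
loop:
sta ($40),y ; fill the whole display, one page at a time
iny
bne loop
inc $41
ldx $41
cpx #$06
bne loop
lda #$08 ; centre colour (orange)
sta $03EF ; row 15, column 15
sta $03F0 ; row 15, column 16
sta $040F ; row 16, column 15
sta $0410 ; row 16, column 16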
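Pixel addresses on the display can be worked out from the row and column. The sketch below assumes a 32x32 display starting at $0200 with one byte per pixel, stored row by row, and takes "middle four pixels" to mean the 2x2 block at rows 15-16 and columns 15-16. The helper name is invented for illustration.

```python
WIDTH = 32        # pixels per row (assumed 32x32 display)
BASE = 0x0200     # first byte of display memory

def pixel_address(row, col):
    """Address of the pixel at (row, col), counting both from 0."""
    return BASE + row * WIDTH + col

centre = [pixel_address(row, col) for row in (15, 16) for col in (15, 16)]
print([f"${address:04X}" for address in centre])
# ['$03EF', '$03F0', '$040F', '$0410']
```

The block straddles the boundary between pages $03 and $04, which is why two of the addresses start with $03 and two with $04.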
2. Write a program which draws lines around the edge of the display:
A red line across the top

lda #$00
sta $40
lda #$02
sta $41
lda #$02
ldy #$00
loop1:
sta ($40),y
iny
cpy #$20
bne loop1
lda #$07
loop2:
sta ($40),y
iny
bne loop2
inc $41
ldx $41
cpx #$06
bne loop2
This code repeatedly stores colour values at specified memory addresses to fill the screen. It uses loop1 and loop2 to fill the first row with red, and the subsequent areas with yellow. The memory addresses are incremented using the y register, setting the color for each pixel, and the page is changed to fill the entire screen.
A green line across the bottom
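lda #$00 ; set a pointer at $40 to point to $0200
sta $40
lda #$02
sta $41
lda #$07 ; yellow, kept in $42
sta $42
lda #$05 ; green, kept in $43
sta $43
ldy #$00
loop:
ldx $41 ; current page
cpx #$05 ; last page of the display?
bne setYellow
cpy #$E0 ; the last row starts at offset $E0
bcc setYellow ; before the last row: yellow
lda $43 ; last row: green
setPixel:
sta ($40),y
iny
bne loop
inc $41 ; next page
ldx $41 ; reload the page number before comparing
cpx #$06 ; stop after page $05
bne loop
rts
setYellow:
lda $42
jmp setPixel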
This code fills the screen starting from memory address $0200. Most of the screen is filled with yellow ($07), and the last row of the last page ($E0~$FF) is filled with green ($05). It uses a pointer and a loop to go through the memory and changes the colour based on conditions.
At first, there were issues with the green row spilling into other rows or only colouring one pixel. This was fixed by adjusting the starting position of the last row to $E0 and fixing the conditions. The program now works as intended and fills the screen with the correct colours.
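The $E0 starting offset for the bottom row can be derived from the row number. A small sketch, assuming a 32x32 display based at $0200; the function name is invented for illustration:

```python
WIDTH = 32        # pixels per row (assumed 32x32 display)
BASE = 0x0200     # first byte of display memory

def row_start(row):
    """Return (page, offset within page) for the first pixel of `row`."""
    address = BASE + row * WIDTH
    return address >> 8, address & 0xFF

page, offset = row_start(31)          # bottom row of the display
print(f"page ${page:02X}, offsets ${offset:02X}-${offset + WIDTH - 1:02X}")
# page $05, offsets $E0-$FF
```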
A blue line across the right side
lda #$00
sta $40
lda #$02
sta $41
lda #$07
ldy #$00
loop:
cpy #$1F
bne notLast
lda #$06
jmp setPixel
notLast:
lda #$07
setPixel:
sta ($40),y
iny
cpy #$20
bne loop
clc
lda $40
adc #$20
sta $40
bcc noCarry
inc $41
noCarry:
ldy #$00
lda $41
cmp #$06 ; stop once the pointer moves past the last display page ($05)
bne loop
rts
This code fills each row of the screen memory with yellow and sets the last pixel of each row to blue. A memory pointer is used to navigate the screen, processing one row at a time. After completing a row, the pointer moves to the next row.
The main challenge was correctly handling the end of a row and moving to the next one without leaving black areas. This issue was caused by not properly updating the memory pointer and handling page boundaries. To fix this, the pointer was increased by 32 bytes after each row, and the carry was handled to update the high byte when crossing page boundaries.
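The row-by-row pointer update can be modelled in Python to confirm that it visits every row exactly once. This sketch mirrors the clc / adc #$20 / bcc / inc $41 sequence and assumes the loop stops once the high byte reaches $06, the first page past the display. The names are invented for illustration.

```python
def next_row(lo, hi):
    """Add 32 to a 16-bit pointer held as separate low and high bytes."""
    total = lo + 0x20                 # clc / adc #$20
    carry = total > 0xFF              # carry set when the low byte overflows
    lo = total & 0xFF                 # sta $40
    if carry:
        hi += 1                       # inc $41 (skipped by bcc noCarry otherwise)
    return lo, hi

lo, hi = 0x00, 0x02                   # pointer starts at $0200
right_edge = []
while hi != 0x06:                     # stop past the last display page ($05)
    right_edge.append((hi << 8) + lo + 0x1F)   # address written when Y = $1F
    lo, hi = next_row(lo, hi)

print(len(right_edge), hex(right_edge[0]), hex(right_edge[-1]))  # 32 0x21f 0x5ff
```

The carry fires once every 8 rows, which is where the high byte steps to the next page.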
A purple line across the left side
lda #$00
sta $40
lda #$02
sta $41
lda #$07
ldy #$00
loop:
cpy #$00
bne notFirst
lda #$04
jmp setPixel
notFirst:
lda #$07
setPixel:
sta ($40),y
iny
cpy #$20
bne loop
clc
lda $40
adc #$20
sta $40
bcc noCarry
inc $41
noCarry:
ldy #$00
lda $41
cmp #$06 ; stop once the pointer moves past the last display page ($05)
bne loop
rts
Reflection
1. Understanding the Basics of Assembly Language
At first, assembly language felt very complicated and hard to understand. But by writing and running code myself, I gradually learned how basic instructions work. It was fascinating to see how computers process data and operate at such a low level.
2. Improving Problem-Solving Skills
When the code didn’t work as expected, I had to figure out the cause and debug it. Fixing errors, like changing pixel colours or drawing something in the centre, helped me think logically and improved my troubleshooting skills.
3. Being Creative with Code
I realized there are many ways to solve the same problem. Optimizing the code or making it simpler taught me how important it is to write efficient programs. Experimenting with different instructions and seeing their results was also a creative and fun process.
4. Strengths and Challenges of Assembly Language
I learned that assembly language allows precise control over hardware, which is very powerful. But it also takes a lot of effort to write and debug the code. This made me appreciate higher-level programming languages while understanding why low-level languages are still important.
5. The Importance of Patience and Attention to Detail
Even a small mistake in assembly code can stop the program from working. This lab taught me to focus on the details and have patience throughout the process.
Overall, this lab wasn’t just about learning how to code. It helped me solve problems, understand how programming works at a fundamental level, and develop a deeper appreciation for how computers operate. It was a challenging but rewarding experience.