Chapter 1: Introduction to Assemblers and the 6502
Welcome to the first chapter of our journey into building a 6502 assembler! In this chapter, we’ll establish the foundational concepts you need before diving into code.
What is an Assembler?
An assembler is a program that translates assembly language source code into machine code that a processor can execute directly.
The Translation Pipeline
Source Code (text) → Assembler → Machine Code (bytes)
Consider this simple example:
lda #0x42 ; Load 0x42 into accumulator
sta 0x00 ; Store to address 0x00
The assembler transforms this into:
A9 42 85 00
These four bytes are the actual instructions the 6502 CPU will execute.
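To make the translation concrete, here is a minimal Python sketch that encodes just these two instructions. The `OPCODES` table and the `assemble_line` helper are hypothetical names invented for this illustration. They are not the design of the assembler we build later, and the sketch only handles single-byte operands.

```python
# Hypothetical two-entry opcode table: (mnemonic, addressing mode) -> opcode byte.
OPCODES = {
    ("lda", "immediate"): 0xA9,
    ("sta", "zero_page"): 0x85,
}


def assemble_line(mnemonic: str, operand: str) -> bytes:
    """Encode one instruction whose operand fits in a single byte."""
    if operand.startswith("#"):
        mode, text = "immediate", operand[1:]
    else:
        mode, text = "zero_page", operand
    value = int(text, 16)  # int() accepts the 0x prefix when the base is 16
    if not 0 <= value <= 0xFF:
        raise ValueError(f"operand {operand} does not fit in one byte")
    return bytes([OPCODES[(mnemonic, mode)], value])


program = assemble_line("lda", "#0x42") + assemble_line("sta", "0x00")
print(program.hex(" ").upper())  # A9 42 85 00
```

A real assembler needs a full opcode table, labels, and 16-bit operands, which is what the rest of this book works toward.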
Assembly vs Machine Code vs High-Level Languages
Machine Code: Raw bytes that the CPU understands. Each byte has a specific meaning - opcodes, operands, data. Writing directly in machine code is tedious and error-prone.
Assembly Language: A human-readable representation of machine code. Each instruction has a mnemonic (like LDA for “Load Accumulator”) that, together with its addressing mode, selects a specific opcode. Every assembly instruction corresponds to exactly one machine instruction.
High-Level Languages: Languages like C, Rust, or Python abstract away machine details. One line of high-level code might compile to dozens of machine instructions.
The 6502 Microprocessor
A Brief History
The 6502, designed in 1975 by MOS Technology, became one of the most influential processors in computing history. It, or a close variant of it, powered:
- Apple II (1977)
- Commodore 64 (1982)
- Nintendo Entertainment System (1983)
- Atari 2600 (1977)
Its simplicity and low cost made it ideal for early personal computers and game consoles.
Architecture Overview
The 6502 is an 8-bit processor with a 16-bit address bus:
- 8-bit: It processes data 8 bits (1 byte) at a time
- 16-bit address bus: It can address up to 64KB of memory (2^16 = 65,536 bytes)
Registers
The 6502 has a small set of registers:
| Register | Size | Purpose |
|---|---|---|
| A (Accumulator) | 8-bit | Main register for arithmetic and logic operations |
| X | 8-bit | Index register, used for addressing and counting |
| Y | 8-bit | Index register, similar to X |
| SP (Stack Pointer) | 8-bit | Offset of the next free stack location within page 0x01 (0x0100-0x01FF) |
| PC (Program Counter) | 16-bit | Address of the next instruction to execute |
| P (Status/Flags) | 8-bit | Processor status flags |
Status Flags
The P register contains flags that reflect the result of operations:
7 6 5 4 3 2 1 0
N V - B D I Z C
| Flag | Name | Meaning |
|---|---|---|
| N | Negative | Set if result is negative (bit 7 is 1) |
| V | Overflow | Set on signed arithmetic overflow |
| B | Break | Set in the status byte that BRK pushes to the stack, which distinguishes BRK from a hardware interrupt |
| D | Decimal | Enables BCD arithmetic mode |
| I | Interrupt | Disables IRQ when set |
| Z | Zero | Set if result is zero |
| C | Carry | Set when an addition carries out of bit 7; cleared when a subtraction borrows |
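The bit layout above can be sketched as a small decoder. This is an illustrative helper, not part of the assembler itself: `FLAG_BITS`, `decode_status`, and `nz_flags` are names made up for this example. Bit 5 is unused and is ignored.

```python
# Bit position of each flag in the P register, matching the layout above.
FLAG_BITS = {"N": 7, "V": 6, "B": 4, "D": 3, "I": 2, "Z": 1, "C": 0}


def decode_status(p: int) -> dict:
    """Return a flag-name -> bool mapping for a status byte."""
    return {name: bool((p >> bit) & 1) for name, bit in FLAG_BITS.items()}


def nz_flags(result: int) -> tuple:
    """Compute (N, Z) for an 8-bit result, as a load or ALU operation would."""
    result &= 0xFF  # keep only the low 8 bits
    return bool(result & 0x80), result == 0


flags = decode_status(0x81)  # bits 7 and 0 set: N and C
print(flags["N"], flags["C"], flags["Z"])  # True True False
print(nz_flags(0x00))  # (False, True)
```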
6502 Instruction Format
Each 6502 instruction consists of:
- Opcode (1 byte): Identifies the instruction and addressing mode
- Operand (0, 1, or 2 bytes): The data or address the instruction operates on
Instruction Sizes
- 1 byte: Opcode only (implied and accumulator modes)
- 2 bytes: Opcode + 8-bit operand (immediate, zero page and its indexed forms, indexed indirect, indirect indexed, relative)
- 3 bytes: Opcode + 16-bit operand (absolute and its indexed forms, indirect)
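Because the size depends only on the addressing mode, an assembler can compute it from a small lookup table. The sketch below is one possible shape for that table. The mode names and the `instruction_size` helper are assumptions chosen for this example.

```python
# Operand width in bytes for each addressing mode named in the list above.
OPERAND_BYTES = {
    "implied": 0,
    "accumulator": 0,
    "immediate": 1,
    "zero_page": 1,
    "relative": 1,
    "absolute": 2,
}


def instruction_size(mode: str) -> int:
    """Total encoded size: one opcode byte plus the operand bytes."""
    return 1 + OPERAND_BYTES[mode]


for mode in ("implied", "immediate", "absolute"):
    print(mode, instruction_size(mode))
```

Knowing the size of every instruction before emitting any bytes is what lets an assembler assign addresses to labels that appear later in the source.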
Little-Endian Byte Order
The 6502 uses little-endian byte order for 16-bit values. The low byte comes first:
Address 0x1234 is stored as: 34 12
Our assembler must emit every 16-bit operand and every .dw value in this order.
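A minimal sketch of emitting a word in little-endian order follows. The name `emit_word` is hypothetical. The standard library call `int.to_bytes(2, "little")` produces the same result.

```python
def emit_word(value: int) -> bytes:
    """Encode a 16-bit value as two bytes, low byte first."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"{value:#x} does not fit in 16 bits")
    return bytes([value & 0xFF, value >> 8])


print(emit_word(0x1234).hex(" ").upper())  # 34 12
```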
Addressing Modes
The 6502 supports multiple addressing modes that determine how the operand is interpreted. Each combination of instruction and addressing mode has a unique opcode.
Implied Mode
No operand - the instruction operates on a specific register or performs a fixed action.
nop ; No operation (1 byte: EA)
clc ; Clear carry flag (1 byte: 18)
rts ; Return from subroutine (1 byte: 60)
Accumulator Mode
Operates directly on the A register.
asl a ; Arithmetic shift left on A (1 byte: 0A)
ror a ; Rotate right on A (1 byte: 6A)
Immediate Mode
The operand is the actual value to use.
lda #0xFF ; Load 0xFF into A (2 bytes: A9 FF)
ldx #0x10 ; Load 0x10 into X (2 bytes: A2 10)
The # prefix indicates immediate mode.
Zero Page Mode
The operand is an 8-bit address in the first 256 bytes of memory (page zero).
lda 0x80 ; Load from address 0x0080 (2 bytes: A5 80)
sta 0x00 ; Store to address 0x0000 (2 bytes: 85 00)
Zero page instructions are one byte shorter than their absolute equivalents and typically execute one cycle faster.
Absolute Mode
The operand is a full 16-bit address.
lda 0x2000 ; Load from address 0x2000 (3 bytes: AD 00 20)
jmp 0x8000 ; Jump to address 0x8000 (3 bytes: 4C 00 80)
Indexed Modes
Add an index register to the address:
lda 0x2000,x ; Load from 0x2000 + X (Absolute,X)
lda 0x2000,y ; Load from 0x2000 + Y (Absolute,Y)
lda 0x80,x ; Load from 0x80 + X (Zero Page,X)
Indirect Modes
Use an address stored in memory:
jmp (0x2000) ; Jump to address stored at 0x2000-0x2001 (Indirect)
lda (0x80,x) ; Indexed Indirect: address at (0x80+X)
lda (0x80),y ; Indirect Indexed: (address at 0x80) + Y
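The difference between the indexed and indirect modes is easiest to see by computing the effective address each one produces. The following Python sketch models memory as a 64KB `bytearray`. The function names are invented for this illustration, and the sketch reflects how the CPU resolves addresses at run time, not anything the assembler itself has to do.

```python
def zero_page_x(base: int, x: int) -> int:
    """Zero Page,X: the sum wraps around within page zero."""
    return (base + x) & 0xFF


def absolute_x(base: int, x: int) -> int:
    """Absolute,X: a plain 16-bit add."""
    return (base + x) & 0xFFFF


def indexed_indirect(mem: bytearray, zp: int, x: int) -> int:
    """(zp,x): add X first, then read a little-endian pointer from page zero."""
    ptr = (zp + x) & 0xFF
    return mem[ptr] | (mem[(ptr + 1) & 0xFF] << 8)


def indirect_indexed(mem: bytearray, zp: int, y: int) -> int:
    """(zp),y: read the pointer from page zero first, then add Y."""
    base = mem[zp] | (mem[(zp + 1) & 0xFF] << 8)
    return (base + y) & 0xFFFF


mem = bytearray(0x10000)
mem[0x80], mem[0x81] = 0x00, 0x20  # pointer 0x2000 stored at 0x80
mem[0x84], mem[0x85] = 0x34, 0x12  # pointer 0x1234 stored at 0x84

print(hex(indirect_indexed(mem, 0x80, 0x05)))  # 0x2005
print(hex(indexed_indirect(mem, 0x80, 0x04)))  # 0x1234
print(hex(zero_page_x(0x80, 0x90)))            # 0x10
```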
Relative Mode
Used only for branch instructions. The operand is a signed 8-bit offset from the next instruction.
beq label ; Branch if zero flag set
bne loop ; Branch if zero flag clear
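Since the offset is measured from the instruction after the branch, and a branch is always 2 bytes long, the operand can be computed as sketched below. `branch_offset` is a hypothetical helper name, and the addresses in the usage lines are arbitrary examples.

```python
def branch_offset(branch_addr: int, target: int) -> int:
    """Return the operand byte for a branch located at branch_addr."""
    offset = target - (branch_addr + 2)  # relative to the next instruction
    if not -128 <= offset <= 127:
        raise ValueError(f"branch target out of range: offset {offset}")
    return offset & 0xFF  # two's complement encoding of the signed offset


print(hex(branch_offset(0x8005, 0x8000)))  # 0xf9 (backward by 7)
print(hex(branch_offset(0x8000, 0x8010)))  # 0xe  (forward by 14)
```

The range check matters in practice: a target more than 127 bytes ahead or 128 bytes behind cannot be encoded, so an assembler should report it as an error instead of silently truncating.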
The Byte Fantasy Console Memory Map
Our assembler targets the Byte Fantasy Console, which has a specific memory layout:
0x0000 - 0x00FF : Zero Page (fast access RAM)
0x0100 - 0x01FF : Stack
0x0200 - 0x0FFF : General RAM
0x1000 - 0x1FFF : Video RAM (64x64 pixels)
...
0x8000 - 0xFFFB : Program ROM
0xFFFC - 0xFFFD : Reset Vector (16-bit address)
0xFFFE - 0xFFFF : IRQ Vector (16-bit address)
Special Registers
| Address | Name | Purpose |
|---|---|---|
| 0xFD | VID_PTR | Video page pointer (high byte of VRAM address) |
| 0xFE | RANDOM | Random number generator |
| 0xFF | INPUT | Controller input state |
Reset and IRQ Vectors
When the console powers on:
- It reads the 16-bit address at 0xFFFC-0xFFFD
- Jumps to that address (your reset/init code)
When an IRQ occurs (like VBLANK):
- It reads the address at 0xFFFE-0xFFFF
- Jumps to that address (your interrupt handler)
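The vector lookup can be sketched in a few lines of Python. This models only the address fetch, using a 64KB `bytearray` as memory. `read_vector` and the constant names are assumptions made for this example, not part of the console or its emulator.

```python
RESET_VECTOR = 0xFFFC
IRQ_VECTOR = 0xFFFE


def read_vector(memory: bytearray, addr: int) -> int:
    """Read the little-endian 16-bit address stored at addr and addr + 1."""
    return memory[addr] | (memory[addr + 1] << 8)


memory = bytearray(0x10000)
memory[0xFFFC:0xFFFE] = bytes([0x00, 0x80])  # reset vector -> 0x8000

pc = read_vector(memory, RESET_VECTOR)
print(hex(pc))  # 0x8000
```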
What We’ll Build
Our assembler will:
- Scan source code into tokens (lexical analysis)
- Parse tokens into an Abstract Syntax Tree (syntax analysis)
- Resolve labels and forward references (symbol table)
- Generate machine code bytes (code generation)
- Output a binary file ready for the emulator
Example Input
.org 0x8000
start:
lda #0x42
sta 0x00
jmp start
.org 0xFFFC
.dw start
Example Output
A binary file containing:
- At offset 0x0000 (address 0x8000): A9 42 85 00 4C 00 80
- At offset 0x7FFC (address 0xFFFC): 00 80
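The offsets above follow from subtracting the ROM base address from each target address. Below is a rough sketch of laying out such an image. It assumes the file covers exactly 0x8000-0xFFFF (32KB) and that unwritten bytes are zero-filled. Both are assumptions for this example, as are the names `build_image`, `ROM_BASE`, and `ROM_SIZE`.

```python
ROM_BASE = 0x8000  # address that maps to file offset 0 (assumed)
ROM_SIZE = 0x8000  # 32KB image covering 0x8000-0xFFFF (assumed)


def build_image(segments: dict) -> bytes:
    """Place each {address: bytes} segment at offset address - ROM_BASE."""
    image = bytearray(ROM_SIZE)  # zero-filled by default
    for address, data in segments.items():
        offset = address - ROM_BASE
        if offset < 0 or offset + len(data) > ROM_SIZE:
            raise ValueError(f"segment at {address:#06x} falls outside the ROM")
        image[offset:offset + len(data)] = data
    return bytes(image)


image = build_image({
    0x8000: bytes.fromhex("A9 42 85 00 4C 00 80"),  # code at start:
    0xFFFC: bytes.fromhex("00 80"),                 # .dw start
})
print(len(image))                            # 32768
print(image[0x7FFC:0x7FFE].hex(" ").upper()) # 00 80
```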
Summary
In this chapter, we learned:
- An assembler translates human-readable assembly into machine code
- The 6502 is an 8-bit processor with a 16-bit address space
- Instructions consist of opcodes and operands
- Different addressing modes determine how operands are interpreted
- The Byte Fantasy Console has a specific memory map with special registers
In the next chapter, we’ll start building our scanner to tokenize assembly source code.