Chapter 1: Introduction to Assemblers and the 6502

Welcome to the first chapter of our journey into building a 6502 assembler! In this chapter, we’ll establish the foundational concepts you need before diving into code.

What is an Assembler?

An assembler is a program that translates assembly language source code into machine code that a processor can execute directly.

The Translation Pipeline

Source Code  →  Assembler  →  Machine Code
  (text)                       (bytes)

Consider this simple example:

lda #0x42    ; Load 0x42 into accumulator
sta 0x00     ; Store to address 0x00

The assembler transforms this into:

A9 42 85 00

These four bytes encode the two instructions above: A9 42 is lda #0x42 and 85 00 is sta 0x00. They are exactly what the 6502 CPU will fetch and execute.
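To make the translation concrete, here is a minimal sketch of a table-driven lookup. It is written in Python purely for illustration, and the `OPCODES` table and `encode` helper are hypothetical names, not the design we build in later chapters. Only the two opcodes from the example are included.

```python
# Hypothetical opcode table: (mnemonic, addressing mode) -> opcode byte.
# Only the two entries needed for the example above are listed.
OPCODES = {
    ("lda", "immediate"): 0xA9,
    ("sta", "zero_page"): 0x85,
}


def encode(mnemonic, mode, operand):
    """Return the bytes for one instruction that takes a single-byte operand."""
    opcode = OPCODES[(mnemonic, mode)]
    return bytes([opcode, operand & 0xFF])


program = encode("lda", "immediate", 0x42) + encode("sta", "zero_page", 0x00)
print(program.hex(" ").upper())  # A9 42 85 00
```

The key of the table is the pair of mnemonic and addressing mode, because the same mnemonic has a different opcode in each mode.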

Assembly vs Machine Code vs High-Level Languages

Machine Code: Raw bytes that the CPU understands. Each byte has a specific meaning - opcodes, operands, data. Writing directly in machine code is tedious and error-prone.

Assembly Language: A human-readable representation of machine code. Each instruction has a mnemonic (like LDA for “Load Accumulator”) that, together with its addressing mode, maps to a specific opcode. Each assembly instruction corresponds to exactly one machine instruction.

High-Level Languages: Languages like C, Rust, or Python abstract away machine details. One line of high-level code might compile to dozens of machine instructions.

The 6502 Microprocessor

A Brief History

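The 6502, designed in 1975 by MOS Technology, became one of the most influential processors in computing history. It, or a close variant, powered:

Atari 2600 (1977, using the cut-down 6507)
Apple II (1977)
Commodore 64 (1982, using the 6510)
Nintendo Entertainment System (1983, launched in Japan as the Famicom, using the Ricoh 2A03)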

Its simplicity and low cost made it ideal for early personal computers and game consoles.

Architecture Overview

The 6502 is an 8-bit processor with a 16-bit address bus:

8-bit: It processes data 8 bits (1 byte) at a time
16-bit address bus: It can address up to 64KB of memory (2^16 = 65,536 bytes)

Registers

The 6502 has a small set of registers:

Register	Size	Purpose
A (Accumulator)	8-bit	Main register for arithmetic and logic operations
X	8-bit	Index register, used for addressing and counting
Y	8-bit	Index register, similar to X
SP (Stack Pointer)	8-bit	Points to the current position in the stack (page 0x01, addresses 0x0100-0x01FF)
PC (Program Counter)	16-bit	Address of the next instruction to execute
P (Status/Flags)	8-bit	Processor status flags

Status Flags

The P register contains flags that reflect the result of operations:

7 6 5 4 3 2 1 0
N V - B D I Z C

Flag	Name	Meaning
N	Negative	Set if result is negative (bit 7 is 1)
V	Overflow	Set on signed arithmetic overflow
B	Break	Set in the copy of P that BRK pushes to the stack
D	Decimal	Enables BCD arithmetic mode
I	Interrupt	Disables IRQ when set
Z	Zero	Set if result is zero
C	Carry	Set when an addition carries out of bit 7; cleared when a subtraction borrows
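As a rough illustration of how these flags follow from a result, the sketch below models an 8-bit binary addition with carry and reports the N, V, Z, and C bits. It is Python for illustration only; `flags_after_add` and the `FLAG_*` constants are hypothetical names, and decimal (BCD) mode is ignored.

```python
# Bit masks for the P register, matching the layout above (bit 7 = N, bit 0 = C).
FLAG_N = 0x80
FLAG_V = 0x40
FLAG_Z = 0x02
FLAG_C = 0x01


def flags_after_add(a, b, carry_in=0):
    """Model an 8-bit binary add with carry; return (result, flag bits)."""
    total = a + b + carry_in
    result = total & 0xFF
    p = 0
    if result & 0x80:
        p |= FLAG_N  # bit 7 of the result is 1
    if result == 0:
        p |= FLAG_Z
    if total > 0xFF:
        p |= FLAG_C  # carry out of bit 7
    # Signed overflow: both inputs share a sign that the result does not.
    if (~(a ^ b) & (a ^ result)) & 0x80:
        p |= FLAG_V
    return result, p


print(flags_after_add(0x50, 0x50))  # (160, 192): result 0xA0, N and V set
print(flags_after_add(0xFF, 0x01))  # (0, 3): result 0x00, Z and C set
```

An assembler never computes flags itself, but knowing what they mean helps when reading the branch instructions that test them.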

6502 Instruction Format

Each 6502 instruction consists of:

Opcode (1 byte): Identifies the instruction and addressing mode
Operand (0, 1, or 2 bytes): The data or address the instruction operates on

Instruction Sizes

1 byte: Opcode only (implied and accumulator modes)
2 bytes: Opcode + 8-bit operand (immediate, zero page and its indexed forms, the two indirect zero page modes, relative)
3 bytes: Opcode + 16-bit operand (absolute and its indexed forms, indirect)
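Because the size depends only on the addressing mode, an assembler can compute it before it knows any label values. A minimal sketch, in Python for illustration, with hypothetical mode names (the indexed and indirect modes are described later in this chapter):

```python
# Hypothetical mode names mapped to the number of operand bytes they need.
OPERAND_BYTES = {
    "implied": 0,
    "accumulator": 0,
    "immediate": 1,
    "zero_page": 1,
    "zero_page_x": 1,
    "zero_page_y": 1,
    "indexed_indirect": 1,  # (zp,x)
    "indirect_indexed": 1,  # (zp),y
    "relative": 1,
    "absolute": 2,
    "absolute_x": 2,
    "absolute_y": 2,
    "indirect": 2,          # jmp (addr)
}


def instruction_size(mode):
    """Total size in bytes: one opcode byte plus the operand bytes."""
    return 1 + OPERAND_BYTES[mode]


print(instruction_size("implied"))    # 1
print(instruction_size("immediate"))  # 2
print(instruction_size("absolute"))   # 3
```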

Little-Endian Byte Order

The 6502 uses little-endian byte order for 16-bit values. The low byte comes first:

Address 0x1234 is stored as: 34 12

This is important when emitting word values in our assembler.
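A small sketch of what emitting a word might look like, in Python for illustration. The `emit_word` name is hypothetical; Python's built-in `int.to_bytes` with `"little"` produces the same result.

```python
def emit_word(value):
    """Return a 16-bit value as two bytes, low byte first."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"value out of 16-bit range: {value:#x}")
    low = value & 0xFF
    high = value >> 8
    return bytes([low, high])


print(emit_word(0x1234).hex(" ").upper())  # 34 12
```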

Addressing Modes

The 6502 supports multiple addressing modes that determine how the operand is interpreted. Each combination of instruction and addressing mode has a unique opcode.

Implied Mode

No operand - the instruction operates on a specific register or performs a fixed action.

nop         ; No operation (1 byte: EA)
clc         ; Clear carry flag (1 byte: 18)
rts         ; Return from subroutine (1 byte: 60)

Accumulator Mode

Operates directly on the A register.

asl a       ; Arithmetic shift left on A (1 byte: 0A)
ror a       ; Rotate right on A (1 byte: 6A)

Immediate Mode

The operand is the actual value to use.

lda #0xFF   ; Load 0xFF into A (2 bytes: A9 FF)
ldx #0x10   ; Load 0x10 into X (2 bytes: A2 10)

The # prefix indicates immediate mode.

Zero Page Mode

The operand is an 8-bit address in the first 256 bytes of memory (page zero).

lda 0x80    ; Load from address 0x0080 (2 bytes: A5 80)
sta 0x00    ; Store to address 0x0000 (2 bytes: 85 00)

A zero page instruction is one byte shorter than its absolute equivalent and typically one cycle faster.

Absolute Mode

The operand is a full 16-bit address.

lda 0x2000  ; Load from address 0x2000 (3 bytes: AD 00 20)
jmp 0x8000  ; Jump to address 0x8000 (3 bytes: 4C 00 80)
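Since lda has both a zero page and an absolute encoding, the assembler has to pick one from the operand's value. The sketch below shows one plausible rule (use zero page whenever the address fits in a byte), in Python for illustration. `encode_lda` is a hypothetical helper, and a real assembler may need a different rule for forward references whose value is not yet known.

```python
LDA_ZERO_PAGE = 0xA5  # opcode from the zero page example above
LDA_ABSOLUTE = 0xAD   # opcode from the absolute example above


def encode_lda(address):
    """Encode `lda address`, preferring the shorter zero page form."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError(f"address out of range: {address:#x}")
    if address <= 0xFF:
        return bytes([LDA_ZERO_PAGE, address])
    # Absolute operands are emitted low byte first.
    return bytes([LDA_ABSOLUTE, address & 0xFF, address >> 8])


print(encode_lda(0x80).hex(" ").upper())    # A5 80
print(encode_lda(0x2000).hex(" ").upper())  # AD 00 20
```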

Indexed Modes

Add an index register to the address:

lda 0x2000,x    ; Load from 0x2000 + X (Absolute,X)
lda 0x2000,y    ; Load from 0x2000 + Y (Absolute,Y)
lda 0x80,x      ; Load from 0x80 + X (Zero Page,X)

Indirect Modes

Use an address stored in memory:

jmp (0x2000)    ; Jump to address stored at 0x2000-0x2001 (Indirect)
lda (0x80,x)    ; Indexed Indirect: load from the address stored at 0x80 + X
lda (0x80),y    ; Indirect Indexed: load from (the address stored at 0x80) + Y
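The two indexed forms are easy to confuse, so here is a sketch of how the CPU arrives at the final address in each case. It is Python for illustration; the function names are hypothetical, and `memory` is assumed to be a 64 KB byte array. The zero page pointer wraps around within page zero.

```python
def indexed_indirect(memory, zp, x):
    """(zp,x): add X to the zero page address first, then read the pointer."""
    ptr = (zp + x) & 0xFF
    low = memory[ptr]
    high = memory[(ptr + 1) & 0xFF]
    return low | (high << 8)


def indirect_indexed(memory, zp, y):
    """(zp),y: read the pointer first, then add Y to it."""
    low = memory[zp]
    high = memory[(zp + 1) & 0xFF]
    return ((low | (high << 8)) + y) & 0xFFFF


memory = bytearray(0x10000)
memory[0x80:0x82] = bytes([0x00, 0x30])  # pointer 0x3000 stored at 0x80
memory[0x84:0x86] = bytes([0x00, 0x20])  # pointer 0x2000 stored at 0x84

print(hex(indexed_indirect(memory, 0x80, 4)))  # 0x2000
print(hex(indirect_indexed(memory, 0x80, 5)))  # 0x3005
```

For the assembler both modes are just an opcode plus a one-byte operand; the difference only matters at run time.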

Relative Mode

Used only for branch instructions. The operand is a signed 8-bit offset relative to the address of the next instruction, so a branch can reach from -128 to +127 bytes away.

beq label       ; Branch if zero flag set
bne loop        ; Branch if zero flag clear
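The assembler has to turn a label into that offset. A minimal sketch of the arithmetic, in Python for illustration, with `branch_offset` as a hypothetical helper:

```python
def branch_offset(branch_address, target_address):
    """Return the operand byte for a branch located at branch_address."""
    next_instruction = branch_address + 2  # a branch is always 2 bytes long
    offset = target_address - next_instruction
    if not -128 <= offset <= 127:
        raise ValueError(f"branch target out of range: {offset}")
    return offset & 0xFF  # two's complement encoding of the signed offset


print(hex(branch_offset(0x8000, 0x8005)))  # 0x3  (forward by 3)
print(hex(branch_offset(0x8000, 0x8000)))  # 0xfe (back by 2: branch to itself)
```

The range check matters in practice: a target that is too far away cannot be encoded and should be reported as an error.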

The Byte Fantasy Console Memory Map

Our assembler targets the Byte Fantasy Console, which has a specific memory layout:

0x0000 - 0x00FF : Zero Page (fast access RAM)
0x0100 - 0x01FF : Stack
0x0200 - 0x0FFF : General RAM
0x1000 - 0x1FFF : Video RAM (64x64 pixels)
...
0x8000 - 0xFFFB : Program ROM
0xFFFC - 0xFFFD : Reset Vector (16-bit address)
0xFFFE - 0xFFFF : IRQ Vector (16-bit address)
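One way to keep this layout handy in tooling is a small lookup table. The sketch below is Python for illustration; `REGIONS` and `region_of` are hypothetical names, and the ranges elided above are deliberately left out, so addresses in them return None.

```python
# (start, end, name) for each range listed in the memory map above.
REGIONS = [
    (0x0000, 0x00FF, "Zero Page"),
    (0x0100, 0x01FF, "Stack"),
    (0x0200, 0x0FFF, "General RAM"),
    (0x1000, 0x1FFF, "Video RAM"),
    (0x8000, 0xFFFB, "Program ROM"),
    (0xFFFC, 0xFFFD, "Reset Vector"),
    (0xFFFE, 0xFFFF, "IRQ Vector"),
]


def region_of(address):
    """Return the name of the region containing address, or None if unlisted."""
    for start, end, name in REGIONS:
        if start <= address <= end:
            return name
    return None


print(region_of(0x0042))  # Zero Page
print(region_of(0x1000))  # Video RAM
print(region_of(0xFFFC))  # Reset Vector
```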

Special Registers

Address	Name	Purpose
0xFD	VID_PTR	Video page pointer (high byte of VRAM address)
0xFE	RANDOM	Random number generator
0xFF	INPUT	Controller input state

Reset and IRQ Vectors

When the console powers on:

It reads the 16-bit address at 0xFFFC-0xFFFD
Jumps to that address (your reset/init code)

When an IRQ occurs (like VBLANK):

It pushes the program counter and status register onto the stack
It reads the 16-bit address at 0xFFFE-0xFFFF
Jumps to that address (your interrupt handler)
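Reading a vector is just a little-endian word read. The sketch below shows it against a ROM image, in Python for illustration. It assumes the image is a 32 KB file mapped at 0x8000; `ROM_BASE` and `read_vector` are hypothetical names.

```python
ROM_BASE = 0x8000  # assumption: the ROM image starts at this address


def read_vector(rom, vector_address):
    """Read the 16-bit little-endian address stored at vector_address."""
    offset = vector_address - ROM_BASE
    return rom[offset] | (rom[offset + 1] << 8)


rom = bytearray(0x8000)
rom[0x7FFC:0x7FFE] = bytes([0x00, 0x80])  # reset vector -> 0x8000
rom[0x7FFE:0x8000] = bytes([0x10, 0x80])  # IRQ vector -> 0x8010 (made-up value)

print(hex(read_vector(rom, 0xFFFC)))  # 0x8000
print(hex(read_vector(rom, 0xFFFE)))  # 0x8010
```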

What We’ll Build

Our assembler will:

Scan source code into tokens (lexical analysis)
Parse tokens into an Abstract Syntax Tree (syntax analysis)
Resolve labels and forward references (symbol table)
Generate machine code bytes (code generation)
Output a binary file ready for the emulator
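Label resolution is the step that most often surprises newcomers, so here is a rough sketch of the idea: walk the program once, tracking the current address, and record where each label lands. It is Python for illustration only; the item format and `collect_labels` are hypothetical and far simpler than the structures we build in later chapters.

```python
def collect_labels(items, origin):
    """First pass: map each label name to the address it marks.

    Each item is either ("label", name) or ("instr", size_in_bytes).
    """
    labels = {}
    address = origin
    for kind, value in items:
        if kind == "label":
            labels[value] = address
        else:
            address += value  # advance past the instruction
    return labels


items = [
    ("label", "start"),
    ("instr", 2),  # lda #0x42
    ("instr", 2),  # sta 0x00
    ("instr", 3),  # jmp start
    ("label", "end"),
]
labels = collect_labels(items, 0x8000)
print({name: hex(addr) for name, addr in labels.items()})
# {'start': '0x8000', 'end': '0x8007'}
```

Once every label has an address, a second pass can fill in operands that refer to labels defined later in the file.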

Example Input

.org 0x8000

start:
    lda #0x42
    sta 0x00
    jmp start

.org 0xFFFC
.dw start

Example Output

A binary file containing:

At offset 0x0000 (address 0x8000): A9 42 85 00 4C 00 80
At offset 0x7FFC (address 0xFFFC): 00 80
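The offsets follow from subtracting the ROM's base address from each target address. A sketch of that placement, in Python for illustration, assuming a 32 KB image filled with zero bytes (the fill value and the `place` helper are assumptions, not something specified above):

```python
ROM_BASE = 0x8000  # address of the first byte in the image
ROM_SIZE = 0x8000  # 0x8000-0xFFFF is 32 KB


def place(image, address, data):
    """Copy data into the image at the file offset for a CPU address."""
    offset = address - ROM_BASE
    image[offset:offset + len(data)] = data


image = bytearray(ROM_SIZE)
place(image, 0x8000, bytes([0xA9, 0x42, 0x85, 0x00, 0x4C, 0x00, 0x80]))
place(image, 0xFFFC, bytes([0x00, 0x80]))  # .dw start, low byte first

print(image[0x0000:0x0007].hex(" ").upper())  # A9 42 85 00 4C 00 80
print(image[0x7FFC:0x7FFE].hex(" ").upper())  # 00 80
```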

Summary

In this chapter, we learned:

An assembler translates human-readable assembly into machine code
The 6502 is an 8-bit processor with a 16-bit address space
Instructions consist of opcodes and operands
Different addressing modes determine how operands are interpreted
The Byte Fantasy Console has a specific memory map with special registers

In the next chapter, we’ll start building our scanner to tokenize assembly source code.

Next: Chapter 2 - Building the Scanner
