Chapter 1: Introduction to Assemblers and the 6502

Welcome to the first chapter of our journey into building a 6502 assembler! In this chapter, we’ll establish the foundational concepts you need before diving into code.

What is an Assembler?

An assembler is a program that translates assembly language source code into machine code that a processor can execute directly.

The Translation Pipeline

Source Code  →  Assembler  →  Machine Code
  (text)                       (bytes)

Consider this simple example:

lda #0x42    ; Load 0x42 into accumulator
sta 0x00     ; Store to address 0x00

The assembler transforms this into:

A9 42 85 00

These four bytes are the actual instructions the 6502 CPU will execute.
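
To make the translation concrete, here is a minimal Python sketch that encodes just these two lines. The `OPCODES` table and the `assemble_line` helper are illustrative names rather than part of the assembler we build later, and the table holds only the two opcodes this example needs.

```python
# Illustrative opcode table: (mnemonic, addressing mode) -> opcode byte.
# Only the two entries needed for the example are included.
OPCODES = {
    ("lda", "immediate"): 0xA9,
    ("sta", "zero_page"): 0x85,
}


def assemble_line(line: str) -> bytes:
    """Encode one `mnemonic operand` line that takes a one-byte operand."""
    code = line.split(";", 1)[0].strip()   # drop the comment
    mnemonic, operand = code.split()
    if operand.startswith("#"):
        mode, text = "immediate", operand[1:]
    else:
        mode, text = "zero_page", operand
    value = int(text, 16)                  # base 16 accepts the 0x prefix
    return bytes([OPCODES[(mnemonic.lower(), mode)], value])


source = [
    "lda #0x42    ; Load 0x42 into accumulator",
    "sta 0x00     ; Store to address 0x00",
]
machine_code = b"".join(assemble_line(line) for line in source)
print(machine_code.hex(" ").upper())  # A9 42 85 00
```

A real assembler needs a full opcode table, error handling, and labels, which is what the rest of the book works toward.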

Assembly vs Machine Code vs High-Level Languages

Machine Code: Raw bytes that the CPU understands. Each byte is an opcode, an operand, or data. Writing directly in machine code is tedious and error-prone.

Assembly Language: A human-readable representation of machine code. Each instruction has a mnemonic (like LDA for “Load Accumulator”) that, together with its addressing mode, maps to a specific opcode. Each assembly instruction corresponds to exactly one machine instruction.

High-Level Languages: Languages like C, Rust, or Python abstract away machine details. One line of high-level code might compile to dozens of machine instructions.

The 6502 Microprocessor

A Brief History

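The 6502, introduced in 1975 by MOS Technology, became one of the most influential processors in computing history. The 6502 or a close variant powered:

  • Apple II (1977)
  • Atari 2600 (1977, using the cut-down 6507)
  • Commodore 64 (1982, using the 6510)
  • Nintendo Entertainment System (1983 in Japan as the Famicom, using the 6502-based Ricoh 2A03)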

Its simplicity and low cost made it ideal for early personal computers and game consoles.

Architecture Overview

The 6502 is an 8-bit processor with a 16-bit address bus:

  • 8-bit: It processes data 8 bits (1 byte) at a time
  • 16-bit address bus: It can address up to 64KB of memory (2^16 = 65,536 bytes)

Registers

The 6502 has a small set of registers:

Register              Size    Purpose
A (Accumulator)       8-bit   Main register for arithmetic and logic operations
X                     8-bit   Index register, used for addressing and counting
Y                     8-bit   Index register, similar to X
SP (Stack Pointer)    8-bit   Points to current position in the stack (page $01)
PC (Program Counter)  16-bit  Address of the next instruction to execute
P (Status/Flags)      8-bit   Processor status flags

Status Flags

The P register contains flags that reflect the result of operations:

7 6 5 4 3 2 1 0
N V - B D I Z C

Flag  Name       Meaning
N     Negative   Set if result is negative (bit 7 is 1)
V     Overflow   Set on signed arithmetic overflow
B     Break      Set in the copy of P that BRK pushes to the stack
D     Decimal    Enables BCD arithmetic mode
I     Interrupt  Disables IRQ when set
Z     Zero       Set if result is zero
C     Carry      Set on unsigned overflow in addition; cleared when a subtraction borrows
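
As a rough illustration of how these bits are used, the Python sketch below defines one mask per flag, following the bit layout above, and updates N and Z from an 8-bit result. The constant names and the `update_nz` helper are assumptions made for this example.

```python
# Bit masks for the P register (bit 7 = N ... bit 0 = C; bit 5 is unused).
FLAG_N = 0x80
FLAG_V = 0x40
FLAG_B = 0x10
FLAG_D = 0x08
FLAG_I = 0x04
FLAG_Z = 0x02
FLAG_C = 0x01


def update_nz(p: int, result: int) -> int:
    """Return P with N and Z set from an 8-bit result; other bits are kept."""
    result &= 0xFF
    p &= ~(FLAG_N | FLAG_Z) & 0xFF   # clear the old N and Z
    if result & 0x80:
        p |= FLAG_N
    if result == 0:
        p |= FLAG_Z
    return p


print(f"{update_nz(0x00, 0x00):08b}")              # 00000010 (Z set)
print(f"{update_nz(0x00, 0x80):08b}")              # 10000000 (N set)
print(f"{update_nz(FLAG_C | FLAG_Z, 0x42):08b}")   # 00000001 (C kept, Z cleared)
```

An assembler never touches these flags itself, but knowing which bits an instruction changes helps when reading the branch instructions later in this chapter.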

6502 Instruction Format

Each 6502 instruction consists of:

  1. Opcode (1 byte): Identifies the instruction and addressing mode
  2. Operand (0, 1, or 2 bytes): The data or address the instruction operates on

Instruction Sizes

  • 1 byte: Instructions with no operand (implied, accumulator modes)
  • 2 bytes: Opcode + 8-bit operand (immediate, zero page, zero page indexed, indexed indirect, indirect indexed, relative)
  • 3 bytes: Opcode + 16-bit operand (absolute, absolute indexed, indirect)
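
Because the size depends only on the addressing mode, an assembler can compute it with a simple lookup. The sketch below is one way to express that in Python; the mode names are illustrative labels chosen for this example.

```python
# Operand width in bytes for each addressing mode (names are illustrative).
OPERAND_BYTES = {
    "implied": 0, "accumulator": 0,
    "immediate": 1, "zero_page": 1, "zero_page_x": 1, "zero_page_y": 1,
    "indexed_indirect": 1, "indirect_indexed": 1, "relative": 1,
    "absolute": 2, "absolute_x": 2, "absolute_y": 2, "indirect": 2,
}


def instruction_size(mode: str) -> int:
    """Total instruction length: one opcode byte plus the operand bytes."""
    return 1 + OPERAND_BYTES[mode]


print(instruction_size("implied"))    # 1
print(instruction_size("immediate"))  # 2
print(instruction_size("absolute"))   # 3
```

Knowing each instruction's size before emitting any bytes is what later lets us assign an address to every label.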

Little-Endian Byte Order

The 6502 uses little-endian byte order for 16-bit values. The low byte comes first:

Address $1234 is stored as: 34 12

This is important when emitting word values in our assembler.
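
A minimal Python sketch of emitting a 16-bit word in this order is shown below. The `emit_word` name is hypothetical, and the final line cross-checks it against Python's built-in `int.to_bytes`.

```python
def emit_word(value: int) -> bytes:
    """Encode a 16-bit value as two bytes, low byte first."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"value out of 16-bit range: {value:#x}")
    return bytes([value & 0xFF, value >> 8])


print(emit_word(0x1234).hex(" "))  # 34 12
assert emit_word(0x8000) == (0x8000).to_bytes(2, "little")
```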

Addressing Modes

The 6502 supports multiple addressing modes that determine how the operand is interpreted. Each combination of instruction and addressing mode has a unique opcode.

Implied Mode

No operand - the instruction operates on a specific register or performs a fixed action.

nop         ; No operation (1 byte: EA)
clc         ; Clear carry flag (1 byte: 18)
rts         ; Return from subroutine (1 byte: 60)

Accumulator Mode

Operates directly on the A register.

asl a       ; Arithmetic shift left on A (1 byte: 0A)
ror a       ; Rotate right on A (1 byte: 6A)

Immediate Mode

The operand is the actual value to use.

lda #0xFF   ; Load 0xFF into A (2 bytes: A9 FF)
ldx #0x10   ; Load 0x10 into X (2 bytes: A2 10)

The # prefix indicates immediate mode.

Zero Page Mode

The operand is an 8-bit address in the first 256 bytes of memory (page zero).

lda 0x80    ; Load from address 0x0080 (2 bytes: A5 80)
sta 0x00    ; Store to address 0x0000 (2 bytes: 85 00)

Zero page access is faster and uses fewer bytes than absolute addressing.

Absolute Mode

The operand is a full 16-bit address.

lda 0x2000  ; Load from address 0x2000 (3 bytes: AD 00 20)
jmp 0x8000  ; Jump to address 0x8000 (3 bytes: 4C 00 80)
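
Zero page and absolute modes share the same source syntax, so the assembler has to decide between them. One common strategy, assumed here rather than taken from this book's implementation, is to pick zero page whenever the address fits in one byte. The sketch uses the two LDA opcodes from the examples above; `encode_lda` is a hypothetical helper.

```python
LDA_ZERO_PAGE = 0xA5
LDA_ABSOLUTE = 0xAD


def encode_lda(address: int) -> bytes:
    """Encode `lda address`, preferring the shorter zero page form."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError(f"address out of range: {address:#x}")
    if address <= 0xFF:
        return bytes([LDA_ZERO_PAGE, address])
    # Absolute mode: 16-bit operand, low byte first.
    return bytes([LDA_ABSOLUTE, address & 0xFF, address >> 8])


print(encode_lda(0x80).hex(" ").upper())    # A5 80
print(encode_lda(0x2000).hex(" ").upper())  # AD 00 20
```

Not every instruction has a zero page form (JMP, for instance, is always absolute or indirect), so a full assembler has to check which modes each mnemonic supports.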

Indexed Modes

Add an index register to the address:

lda 0x2000,x    ; Load from 0x2000 + X (Absolute,X)
lda 0x2000,y    ; Load from 0x2000 + Y (Absolute,Y)
lda 0x80,x      ; Load from 0x80 + X (Zero Page,X)
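
The effective address is the base plus the index register, with one detail worth knowing: on the 6502, a zero page indexed access wraps around inside page zero instead of crossing into page one. The Python sketch below models both cases; the function names are illustrative.

```python
def absolute_indexed(base: int, index: int) -> int:
    """Effective address for Absolute,X or Absolute,Y (16-bit result)."""
    return (base + index) & 0xFFFF


def zero_page_indexed(base: int, index: int) -> int:
    """Effective address for Zero Page,X: the sum wraps within page zero."""
    return (base + index) & 0xFF


print(hex(absolute_indexed(0x2000, 0x10)))   # 0x2010
print(hex(zero_page_indexed(0x80, 0x10)))    # 0x90
print(hex(zero_page_indexed(0x80, 0x90)))    # 0x10 (wrapped, not 0x110)
```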

Indirect Modes

Use an address stored in memory:

jmp (0x2000)    ; Indirect: jump to the address stored at 0x2000-0x2001
lda (0x80,x)    ; Indexed Indirect: load from the address stored at 0x80+X
lda (0x80),y    ; Indirect Indexed: load from (the address stored at 0x80) + Y
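
The two indexed forms are easy to confuse, so here is a small Python model of how each one arrives at its effective address. Memory is a plain `bytearray`, and the helper names are assumptions made for this example.

```python
def read_word_zp(memory: bytearray, zp_addr: int) -> int:
    """Read a little-endian pointer from page zero (wrapping at 0xFF)."""
    lo = memory[zp_addr & 0xFF]
    hi = memory[(zp_addr + 1) & 0xFF]
    return lo | (hi << 8)


def indexed_indirect(memory: bytearray, zp: int, x: int) -> int:
    """(zp,x): add X first, then fetch the pointer."""
    return read_word_zp(memory, zp + x)


def indirect_indexed(memory: bytearray, zp: int, y: int) -> int:
    """(zp),y: fetch the pointer first, then add Y."""
    return (read_word_zp(memory, zp) + y) & 0xFFFF


memory = bytearray(0x10000)
memory[0x80:0x82] = bytes([0x00, 0x20])   # pointer to 0x2000 stored at 0x80
memory[0x84:0x86] = bytes([0x00, 0x30])   # pointer to 0x3000 stored at 0x84

print(hex(indexed_indirect(memory, 0x80, 4)))   # 0x3000
print(hex(indirect_indexed(memory, 0x80, 4)))   # 0x2004
```

With the same operand and index value, the two modes reach different addresses: the first selects which pointer to follow, the second offsets from a single pointer.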

Relative Mode

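Used only for branch instructions. The operand is a signed 8-bit offset from the next instruction, so a branch can reach from 128 bytes before to 127 bytes after the address of the instruction that follows it.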

beq label       ; Branch if zero flag set
bne loop        ; Branch if zero flag clear
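
Since the source names a label but the machine code stores an offset, the assembler has to do the subtraction. The following Python sketch shows that calculation for a two-byte branch instruction; `branch_offset` is a hypothetical helper name.

```python
def branch_offset(branch_addr: int, target_addr: int) -> int:
    """Return the operand byte for a branch located at branch_addr."""
    # The offset is measured from the instruction after the 2-byte branch.
    offset = target_addr - (branch_addr + 2)
    if not -128 <= offset <= 127:
        raise ValueError(f"branch target out of range: offset {offset}")
    return offset & 0xFF   # two's complement encoding of a negative offset


print(f"{branch_offset(0x8000, 0x8005):02X}")  # 03 (forward branch)
print(f"{branch_offset(0x8005, 0x8000):02X}")  # F9 (backward branch, -7)
```

An out-of-range target is an error the assembler should report, since the instruction cannot encode it.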

The Byte Fantasy Console Memory Map

Our assembler targets the Byte Fantasy Console, which has a specific memory layout:

0x0000 - 0x00FF : Zero Page (fast access RAM)
0x0100 - 0x01FF : Stack
0x0200 - 0x0FFF : General RAM
0x1000 - 0x1FFF : Video RAM (64x64 pixels)
...
0x8000 - 0xFFFB : Program ROM
0xFFFC - 0xFFFD : Reset Vector (16-bit address)
0xFFFE - 0xFFFF : IRQ Vector (16-bit address)
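
For tooling such as a disassembler or a debugger, it can help to have this map as data. The Python sketch below encodes only the ranges listed above; addresses in the elided part of the map return `None`, since their layout is not given here. `REGIONS` and `region_of` are illustrative names.

```python
# (start, end, name) for each range listed in the memory map above.
REGIONS = [
    (0x0000, 0x00FF, "Zero Page"),
    (0x0100, 0x01FF, "Stack"),
    (0x0200, 0x0FFF, "General RAM"),
    (0x1000, 0x1FFF, "Video RAM"),
    (0x8000, 0xFFFB, "Program ROM"),
    (0xFFFC, 0xFFFD, "Reset Vector"),
    (0xFFFE, 0xFFFF, "IRQ Vector"),
]


def region_of(address: int):
    """Return the region name for an address, or None if it is not listed."""
    for start, end, name in REGIONS:
        if start <= address <= end:
            return name
    return None


print(region_of(0x0080))  # Zero Page
print(region_of(0x1234))  # Video RAM
print(region_of(0x8000))  # Program ROM
print(region_of(0x4000))  # None
```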

Special Registers

Address  Name     Purpose
0xFD     VID_PTR  Video page pointer (high byte of VRAM address)
0xFE     RANDOM   Random number generator
0xFF     INPUT    Controller input state

Reset and IRQ Vectors

When the console powers on:

  1. It reads the 16-bit address at 0xFFFC-0xFFFD
  2. Jumps to that address (your reset/init code)

When an IRQ occurs (like VBLANK) and the I flag is clear:

  1. It pushes the program counter and status flags onto the stack
  2. It reads the address at 0xFFFE-0xFFFF
  3. Jumps to that address (your interrupt handler)
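
Both vectors are ordinary little-endian words in memory. The Python sketch below shows how an emulator might fetch one; the 64KB `bytearray` and the `read_vector` helper are assumptions for illustration, not the console's actual implementation.

```python
RESET_VECTOR = 0xFFFC
IRQ_VECTOR = 0xFFFE


def read_vector(memory: bytearray, vector_addr: int) -> int:
    """Read the 16-bit little-endian address stored at a vector."""
    return memory[vector_addr] | (memory[vector_addr + 1] << 8)


memory = bytearray(0x10000)
memory[0xFFFC] = 0x00   # low byte of the reset address
memory[0xFFFD] = 0x80   # high byte of the reset address

pc = read_vector(memory, RESET_VECTOR)
print(hex(pc))  # 0x8000
```

This is why the assembler must be able to place a word at 0xFFFC: without it, the console has no entry point.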

What We’ll Build

Our assembler will:

  1. Scan source code into tokens (lexical analysis)
  2. Parse tokens into an Abstract Syntax Tree (syntax analysis)
  3. Resolve labels and forward references (symbol table)
  4. Generate machine code bytes (code generation)
  5. Output a binary file ready for the emulator
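
Step 3 deserves a quick preview, because a label can be used before it is defined. One common approach, sketched here in Python as an assumption about the general technique rather than this book's final design, is a first pass that only adds up instruction sizes to learn each label's address. The `program` representation below is invented for the example.

```python
# Each entry is a label ("name:") or a tuple (mnemonic, size_in_bytes, operand).
program = [
    "start:",
    ("lda", 2, None),
    ("jmp", 3, "end"),   # forward reference: "end" is defined further down
    ("nop", 1, None),
    "end:",
    ("rts", 1, None),
]


def collect_labels(program, origin: int) -> dict:
    """First pass: assign an address to every label without emitting bytes."""
    labels = {}
    address = origin
    for item in program:
        if isinstance(item, str):
            labels[item.rstrip(":")] = address
        else:
            address += item[1]
    return labels


labels = collect_labels(program, 0x8000)
print(f"{labels['start']:#06x}")  # 0x8000
print(f"{labels['end']:#06x}")    # 0x8006
```

With the table filled in, a second pass can emit bytes and substitute 0x8006 wherever `end` appears as an operand.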

Example Input

.org 0x8000

start:
    lda #0x42
    sta 0x00
    jmp start

.org 0xFFFC
.dw start

Example Output

A binary file containing:

  • At offset 0x0000 (address 0x8000): A9 42 85 00 4C 00 80
  • At offset 0x7FFC (address 0xFFFC): 00 80
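
The file offset is the address minus 0x8000, the start of Program ROM. The Python sketch below builds such an image from (address, bytes) chunks. The 32KB image size and the zero fill for unwritten bytes are assumptions made for this example, and `build_image` is a hypothetical helper.

```python
ROM_BASE = 0x8000
ROM_SIZE = 0x8000   # assumed: the image covers 0x8000-0xFFFF


def build_image(chunks) -> bytes:
    """Place each (address, data) chunk at offset address - ROM_BASE."""
    image = bytearray(ROM_SIZE)   # unwritten bytes stay 0x00 (an assumption)
    for address, data in chunks:
        offset = address - ROM_BASE
        if offset < 0 or offset + len(data) > ROM_SIZE:
            raise ValueError(f"chunk at {address:#06x} does not fit in ROM")
        image[offset:offset + len(data)] = data
    return bytes(image)


image = build_image([
    (0x8000, bytes.fromhex("A9 42 85 00 4C 00 80")),
    (0xFFFC, bytes.fromhex("00 80")),
])
print(image[0x0000:0x0007].hex(" ").upper())  # A9 42 85 00 4C 00 80
print(image[0x7FFC:0x7FFE].hex(" ").upper())  # 00 80
```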

Summary

In this chapter, we learned:

  • An assembler translates human-readable assembly into machine code
  • The 6502 is an 8-bit processor with a 16-bit address space
  • Instructions consist of opcodes and operands
  • Different addressing modes determine how operands are interpreted
  • The Byte Fantasy Console has a specific memory map with special registers

In the next chapter, we’ll start building our scanner to tokenize assembly source code.


Next: Chapter 2 - Building the Scanner