Chapter 7: Two-Pass Assembly

In this chapter, we’ll implement the two-pass assembly process that transforms our AST into machine code.

Why Two Passes?

Consider this program:

    jmp end
    nop
end:
    rts

When we reach jmp end, we need to know the address of end. But we haven’t seen end yet! This is the forward reference problem.

The solution is two passes:

Pass 1: Collect all labels and calculate their addresses
Pass 2: Generate code using the complete symbol table

The Assembler Structure

#![allow(unused)]
fn main() {
pub struct Assembler {
    symbols: SymbolTable,
    current_address: u16,
    origin: u16,
    output: Vec<u8>,
    errors: Vec<CodeGenError>,
    current_file: Option<String>,
}

impl Assembler {
    pub fn new() -> Self {
        Self {
            symbols: SymbolTable::new(),
            current_address: 0,
            origin: 0,
            output: Vec::new(),
            errors: Vec::new(),
            current_file: None,
        }
    }
}
}

The Main Assemble Function

#![allow(unused)]
fn main() {
pub fn assemble(&mut self, program: &Program) -> Result<Vec<u8>, AssemblerError> {
    self.current_file = program.source_file.clone();

    // Pass 1: Collect symbols
    self.pass1(program)?;

    // Pass 2: Generate code
    self.pass2(program)?;

    if !self.errors.is_empty() {
        return Err(AssemblerError::Multiple(
            self.errors.iter().cloned().map(AssemblerError::CodeGen).collect(),
        ));
    }

    Ok(std::mem::take(&mut self.output))
}
}

Pass 1: Symbol Collection

In Pass 1, we walk through the program and:

Record label addresses
Process constants from .equ
Calculate instruction sizes to track the current address

#![allow(unused)]
fn main() {
fn pass1(&mut self, program: &Program) -> Result<(), AssemblerError> {
    self.current_address = self.origin;

    for stmt in &program.statements {
        match stmt {
            Statement::Label(label) => {
                self.pass1_label(label)?;
            }
            Statement::Instruction(instr) => {
                self.pass1_instruction(instr)?;
            }
            Statement::Directive(dir) => {
                self.pass1_directive(dir)?;
            }
        }
    }

    Ok(())
}
}

Processing Labels

#![allow(unused)]
fn main() {
fn pass1_label(&mut self, label: &LabelDef) -> Result<(), AssemblerError> {
    let name = if label.is_local {
        self.symbols.qualify_local_label(&label.name)
    } else {
        // Update parent for subsequent local labels
        self.symbols.set_parent(Some(label.name.clone()));
        label.name.clone()
    };

    if let Err(e) = self.symbols.define_label(&name, self.current_address, label.location) {
        self.errors.push(e);
    }

    Ok(())
}
}

Calculating Instruction Size

We need to determine how many bytes an instruction will take:

#![allow(unused)]
fn main() {
fn pass1_instruction(&mut self, instr: &InstructionStmt) -> Result<(), AssemblerError> {
    let size = self.instruction_size(instr);
    self.current_address = self.current_address.wrapping_add(size as u16);
    Ok(())
}

fn instruction_size(&self, instr: &InstructionStmt) -> u8 {
    match self.determine_addressing_mode(instr.mnemonic, &instr.operand) {
        Ok(mode) => {
            if let Some(opcode) = get_opcode(instr.mnemonic, mode) {
                opcode.size
            } else {
                1  // Invalid, will error in pass 2
            }
        }
        Err(_) => 1,
    }
}
}

Processing Directives in Pass 1

#![allow(unused)]
fn main() {
fn pass1_directive(&mut self, directive: &DirectiveStmt) -> Result<(), AssemblerError> {
    match directive {
        DirectiveStmt::Org { address, .. } => {
            let addr = self.evaluate(address)? as u16;
            self.origin = addr;
            self.current_address = addr;
        }

        DirectiveStmt::Db { values, .. } => {
            for value in values {
                match value {
                    DataValue::Byte(_) => {
                        self.current_address = self.current_address.wrapping_add(1);
                    }
                    DataValue::String(s) => {
                        self.current_address = self.current_address.wrapping_add(s.len() as u16);
                    }
                }
            }
        }

        DirectiveStmt::Dw { values, .. } => {
            self.current_address = self.current_address.wrapping_add((values.len() * 2) as u16);
        }

        DirectiveStmt::Equ { name, value, location } => {
            let val = self.evaluate(value)?;
            self.symbols.define_constant(name, val, *location)?;
        }

        DirectiveStmt::Include { .. } => {
            // Handle includes (recursive assembly)
        }
    }
    Ok(())
}
}

Pass 2: Code Generation

In Pass 2, we generate the actual machine code:

#![allow(unused)]
fn main() {
fn pass2(&mut self, program: &Program) -> Result<(), AssemblerError> {
    self.current_address = self.origin;
    self.output.clear();

    for stmt in &program.statements {
        match stmt {
            Statement::Label(label) => {
                // Update parent for local label resolution
                if !label.is_local {
                    self.symbols.set_parent(Some(label.name.clone()));
                }
            }

            Statement::Instruction(instr) => {
                if let Err(e) = self.emit_instruction(instr) {
                    self.errors.push(e);
                }
            }

            Statement::Directive(dir) => {
                if let Err(e) = self.emit_directive(dir) {
                    self.errors.push(e);
                }
            }
        }
    }

    Ok(())
}
}

Determining Addressing Mode

The same operand syntax can map to different addressing modes depending on the value:

#![allow(unused)]
fn main() {
fn determine_addressing_mode(
    &self,
    mnemonic: Mnemonic,
    operand: &Option<Operand>,
) -> Result<AddressingMode, CodeGenError> {
    match operand {
        None => Ok(AddressingMode::Implied),
        Some(Operand::Immediate(_)) => Ok(AddressingMode::Immediate),
        Some(Operand::Accumulator) => Ok(AddressingMode::Accumulator),
        Some(Operand::IndirectX(_)) => Ok(AddressingMode::IndirectX),
        Some(Operand::IndirectY(_)) => Ok(AddressingMode::IndirectY),
        Some(Operand::Indirect(_)) => Ok(AddressingMode::Indirect),

        Some(Operand::Address(expr)) => {
            // Branches always use relative
            if is_branch_instruction(mnemonic) {
                return Ok(AddressingMode::Relative);
            }

            // Try to evaluate; if small enough, use zero page
            match self.evaluate(expr) {
                Ok(value) if value >= 0 && value <= 0xFF => {
                    if get_opcode(mnemonic, AddressingMode::ZeroPage).is_some() {
                        Ok(AddressingMode::ZeroPage)
                    } else {
                        Ok(AddressingMode::Absolute)
                    }
                }
                _ => Ok(AddressingMode::Absolute),
            }
        }

        Some(Operand::IndexedX(expr)) => {
            match self.evaluate(expr) {
                Ok(value) if value >= 0 && value <= 0xFF => {
                    if get_opcode(mnemonic, AddressingMode::ZeroPageX).is_some() {
                        Ok(AddressingMode::ZeroPageX)
                    } else {
                        Ok(AddressingMode::AbsoluteX)
                    }
                }
                _ => Ok(AddressingMode::AbsoluteX),
            }
        }

        Some(Operand::IndexedY(expr)) => {
            match self.evaluate(expr) {
                Ok(value) if value >= 0 && value <= 0xFF => {
                    if get_opcode(mnemonic, AddressingMode::ZeroPageY).is_some() {
                        Ok(AddressingMode::ZeroPageY)
                    } else {
                        Ok(AddressingMode::AbsoluteY)
                    }
                }
                _ => Ok(AddressingMode::AbsoluteY),
            }
        }
    }
}
}

Zero Page Optimization

The 6502 has faster, shorter instructions for zero page addresses (0x00-0xFF):

lda 0x80       ; Zero Page: A5 80 (2 bytes, 3 cycles)
lda 0x0180     ; Absolute:  AD 80 01 (3 bytes, 4 cycles)

Our assembler automatically uses zero page mode when:

The address fits in 8 bits (0x00-0xFF)
A zero page variant exists for that instruction

Branch Instructions

Branches use relative addressing. The helper function identifies branch mnemonics:

#![allow(unused)]
fn main() {
fn is_branch_instruction(mnemonic: Mnemonic) -> bool {
    matches!(
        mnemonic,
        Mnemonic::BCC
            | Mnemonic::BCS
            | Mnemonic::BEQ
            | Mnemonic::BMI
            | Mnemonic::BNE
            | Mnemonic::BPL
            | Mnemonic::BVC
            | Mnemonic::BVS
    )
}
}

Output Helpers

#![allow(unused)]
fn main() {
fn emit_byte(&mut self, byte: u8) {
    self.output.push(byte);
    self.current_address = self.current_address.wrapping_add(1);
}

fn emit_word(&mut self, word: u16) {
    self.emit_byte((word & 0xFF) as u8);         // Low byte first
    self.emit_byte(((word >> 8) & 0xFF) as u8);  // High byte second
}

fn pad_to(&mut self, address: u16) {
    while self.current_address < address {
        self.emit_byte(0);
    }
}
}

Assembly Trace Example

Let’s trace assembling:

.org 0x8000
start:
    lda #0x42
    jmp start

Pass 1

Statement	Action	current_address
`.org 0x8000`	Set origin	0x8000
`start:`	Define start = 0x8000	0x8000
`lda #0x42`	Size = 2 bytes	0x8002
`jmp start`	Size = 3 bytes	0x8005

Symbol Table: { start: Address(0x8000) }

Pass 2

Statement	Output	Description
`.org 0x8000`	(reset to 0x8000)
`start:`	(nothing)	Just a label
`lda #0x42`	`A9 42`	LDA immediate = 0xA9
`jmp start`	`4C 00 80`	JMP absolute = 0x4C, addr = 0x8000 (little-endian)

Final output: A9 42 4C 00 80

Summary

In this chapter, we implemented two-pass assembly:

Pass 1: Collect labels and calculate addresses
- Process .equ constants
- Calculate instruction sizes
- Handle .org to set addresses
Pass 2: Generate machine code
- Look up all labels
- Emit opcode and operand bytes
- Handle zero page optimization

In the next chapter, we’ll implement the code generation details.

Previous: Chapter 6 - The Symbol Table | Next: Chapter 8 - Code Generation

Keyboard shortcuts

ByteASM Tutorial