Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Chapter 7: Two-Pass Assembly

In this chapter, we’ll implement the two-pass assembly process that transforms our AST into machine code.

Why Two Passes?

Consider this program:

    jmp end
    nop
end:
    rts

When we reach jmp end, we need to know the address of end. But we haven’t seen end yet! This is the forward reference problem.

The solution is two passes:

  1. Pass 1: Collect all labels and calculate their addresses
  2. Pass 2: Generate code using the complete symbol table

The Assembler Structure

#![allow(unused)]
fn main() {
pub struct Assembler {
    symbols: SymbolTable,
    current_address: u16,
    origin: u16,
    output: Vec<u8>,
    errors: Vec<CodeGenError>,
    current_file: Option<String>,
}

impl Assembler {
    pub fn new() -> Self {
        Self {
            symbols: SymbolTable::new(),
            current_address: 0,
            origin: 0,
            output: Vec::new(),
            errors: Vec::new(),
            current_file: None,
        }
    }
}
}

The Main Assemble Function

#![allow(unused)]
fn main() {
pub fn assemble(&mut self, program: &Program) -> Result<Vec<u8>, AssemblerError> {
    self.current_file = program.source_file.clone();

    // Pass 1: Collect symbols
    self.pass1(program)?;

    // Pass 2: Generate code
    self.pass2(program)?;

    if !self.errors.is_empty() {
        return Err(AssemblerError::Multiple(
            self.errors.iter().cloned().map(AssemblerError::CodeGen).collect(),
        ));
    }

    Ok(std::mem::take(&mut self.output))
}
}

Pass 1: Symbol Collection

In Pass 1, we walk through the program and:

  1. Record label addresses
  2. Process constants from .equ
  3. Calculate instruction sizes to track the current address
#![allow(unused)]
fn main() {
fn pass1(&mut self, program: &Program) -> Result<(), AssemblerError> {
    self.current_address = self.origin;

    for stmt in &program.statements {
        match stmt {
            Statement::Label(label) => {
                self.pass1_label(label)?;
            }
            Statement::Instruction(instr) => {
                self.pass1_instruction(instr)?;
            }
            Statement::Directive(dir) => {
                self.pass1_directive(dir)?;
            }
        }
    }

    Ok(())
}
}

Processing Labels

#![allow(unused)]
fn main() {
fn pass1_label(&mut self, label: &LabelDef) -> Result<(), AssemblerError> {
    let name = if label.is_local {
        self.symbols.qualify_local_label(&label.name)
    } else {
        // Update parent for subsequent local labels
        self.symbols.set_parent(Some(label.name.clone()));
        label.name.clone()
    };

    if let Err(e) = self.symbols.define_label(&name, self.current_address, label.location) {
        self.errors.push(e);
    }

    Ok(())
}
}

Calculating Instruction Size

We need to determine how many bytes an instruction will take:

#![allow(unused)]
fn main() {
fn pass1_instruction(&mut self, instr: &InstructionStmt) -> Result<(), AssemblerError> {
    let size = self.instruction_size(instr);
    self.current_address = self.current_address.wrapping_add(size as u16);
    Ok(())
}

fn instruction_size(&self, instr: &InstructionStmt) -> u8 {
    match self.determine_addressing_mode(instr.mnemonic, &instr.operand) {
        Ok(mode) => {
            if let Some(opcode) = get_opcode(instr.mnemonic, mode) {
                opcode.size
            } else {
                1  // Invalid, will error in pass 2
            }
        }
        Err(_) => 1,
    }
}
}

Processing Directives in Pass 1

#![allow(unused)]
fn main() {
fn pass1_directive(&mut self, directive: &DirectiveStmt) -> Result<(), AssemblerError> {
    match directive {
        DirectiveStmt::Org { address, .. } => {
            let addr = self.evaluate(address)? as u16;
            self.origin = addr;
            self.current_address = addr;
        }

        DirectiveStmt::Db { values, .. } => {
            for value in values {
                match value {
                    DataValue::Byte(_) => {
                        self.current_address = self.current_address.wrapping_add(1);
                    }
                    DataValue::String(s) => {
                        self.current_address = self.current_address.wrapping_add(s.len() as u16);
                    }
                }
            }
        }

        DirectiveStmt::Dw { values, .. } => {
            self.current_address = self.current_address.wrapping_add((values.len() * 2) as u16);
        }

        DirectiveStmt::Equ { name, value, location } => {
            let val = self.evaluate(value)?;
            self.symbols.define_constant(name, val, *location)?;
        }

        DirectiveStmt::Include { .. } => {
            // Handle includes (recursive assembly)
        }
    }
    Ok(())
}
}

Pass 2: Code Generation

In Pass 2, we generate the actual machine code:

#![allow(unused)]
fn main() {
fn pass2(&mut self, program: &Program) -> Result<(), AssemblerError> {
    self.current_address = self.origin;
    self.output.clear();

    for stmt in &program.statements {
        match stmt {
            Statement::Label(label) => {
                // Update parent for local label resolution
                if !label.is_local {
                    self.symbols.set_parent(Some(label.name.clone()));
                }
            }

            Statement::Instruction(instr) => {
                if let Err(e) = self.emit_instruction(instr) {
                    self.errors.push(e);
                }
            }

            Statement::Directive(dir) => {
                if let Err(e) = self.emit_directive(dir) {
                    self.errors.push(e);
                }
            }
        }
    }

    Ok(())
}
}

Determining Addressing Mode

The same operand syntax can map to different addressing modes depending on the value:

#![allow(unused)]
fn main() {
fn determine_addressing_mode(
    &self,
    mnemonic: Mnemonic,
    operand: &Option<Operand>,
) -> Result<AddressingMode, CodeGenError> {
    match operand {
        None => Ok(AddressingMode::Implied),
        Some(Operand::Immediate(_)) => Ok(AddressingMode::Immediate),
        Some(Operand::Accumulator) => Ok(AddressingMode::Accumulator),
        Some(Operand::IndirectX(_)) => Ok(AddressingMode::IndirectX),
        Some(Operand::IndirectY(_)) => Ok(AddressingMode::IndirectY),
        Some(Operand::Indirect(_)) => Ok(AddressingMode::Indirect),

        Some(Operand::Address(expr)) => {
            // Branches always use relative
            if is_branch_instruction(mnemonic) {
                return Ok(AddressingMode::Relative);
            }

            // Try to evaluate; if small enough, use zero page
            match self.evaluate(expr) {
                Ok(value) if value >= 0 && value <= 0xFF => {
                    if get_opcode(mnemonic, AddressingMode::ZeroPage).is_some() {
                        Ok(AddressingMode::ZeroPage)
                    } else {
                        Ok(AddressingMode::Absolute)
                    }
                }
                _ => Ok(AddressingMode::Absolute),
            }
        }

        Some(Operand::IndexedX(expr)) => {
            match self.evaluate(expr) {
                Ok(value) if value >= 0 && value <= 0xFF => {
                    if get_opcode(mnemonic, AddressingMode::ZeroPageX).is_some() {
                        Ok(AddressingMode::ZeroPageX)
                    } else {
                        Ok(AddressingMode::AbsoluteX)
                    }
                }
                _ => Ok(AddressingMode::AbsoluteX),
            }
        }

        Some(Operand::IndexedY(expr)) => {
            match self.evaluate(expr) {
                Ok(value) if value >= 0 && value <= 0xFF => {
                    if get_opcode(mnemonic, AddressingMode::ZeroPageY).is_some() {
                        Ok(AddressingMode::ZeroPageY)
                    } else {
                        Ok(AddressingMode::AbsoluteY)
                    }
                }
                _ => Ok(AddressingMode::AbsoluteY),
            }
        }
    }
}
}

Zero Page Optimization

The 6502 has faster, shorter instructions for zero page addresses (0x00-0xFF):

lda 0x80       ; Zero Page: A5 80 (2 bytes, 3 cycles)
lda 0x0180     ; Absolute:  AD 80 01 (3 bytes, 4 cycles)

Our assembler automatically uses zero page mode when:

  1. The address fits in 8 bits (0x00-0xFF)
  2. A zero page variant exists for that instruction

Branch Instructions

Branches use relative addressing. The helper function identifies branch mnemonics:

#![allow(unused)]
fn main() {
fn is_branch_instruction(mnemonic: Mnemonic) -> bool {
    matches!(
        mnemonic,
        Mnemonic::BCC
            | Mnemonic::BCS
            | Mnemonic::BEQ
            | Mnemonic::BMI
            | Mnemonic::BNE
            | Mnemonic::BPL
            | Mnemonic::BVC
            | Mnemonic::BVS
    )
}
}

Output Helpers

#![allow(unused)]
fn main() {
fn emit_byte(&mut self, byte: u8) {
    self.output.push(byte);
    self.current_address = self.current_address.wrapping_add(1);
}

fn emit_word(&mut self, word: u16) {
    self.emit_byte((word & 0xFF) as u8);         // Low byte first
    self.emit_byte(((word >> 8) & 0xFF) as u8);  // High byte second
}

fn pad_to(&mut self, address: u16) {
    while self.current_address < address {
        self.emit_byte(0);
    }
}
}

Assembly Trace Example

Let’s trace assembling:

.org 0x8000
start:
    lda #0x42
    jmp start

Pass 1

StatementActioncurrent_address
.org 0x8000Set origin0x8000
start:Define start = 0x80000x8000
lda #0x42Size = 2 bytes0x8002
jmp startSize = 3 bytes0x8005

Symbol Table: { start: Address(0x8000) }

Pass 2

StatementOutputDescription
.org 0x8000(reset to 0x8000)
start:(nothing)Just a label
lda #0x42A9 42LDA immediate = 0xA9
jmp start4C 00 80JMP absolute = 0x4C, addr = 0x8000 (little-endian)

Final output: A9 42 4C 00 80

Summary

In this chapter, we implemented two-pass assembly:

  • Pass 1: Collect labels and calculate addresses

    • Process .equ constants
    • Calculate instruction sizes
    • Handle .org to set addresses
  • Pass 2: Generate machine code

    • Look up all labels
    • Emit opcode and operand bytes
    • Handle zero page optimization

In the next chapter, we’ll implement the code generation details.


Previous: Chapter 6 - The Symbol Table | Next: Chapter 8 - Code Generation