Chapter 7: Two-Pass Assembly
In this chapter, we’ll implement the two-pass assembly process that transforms our AST into machine code.
Why Two Passes?
Consider this program:
    jmp end
    nop
end:
    rts
When we reach jmp end, we need to know the address of end. But we haven’t seen end yet! This is the forward reference problem.
The solution is two passes:
- Pass 1: Collect all labels and calculate their addresses
- Pass 2: Generate code using the complete symbol table
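The two passes can be sketched on a toy program before we look at the real assembler. The `Stmt` enum, `size_of`, and `assemble_toy` below are hypothetical names invented for this illustration; they are not the chapter's AST or API, and the sketch only handles three instructions.

```rust
use std::collections::HashMap;

/// A toy statement type for illustration only (not the chapter's AST).
enum Stmt {
    Label(&'static str),
    Jmp(&'static str), // 3 bytes: opcode 0x4C plus a 16-bit address
    Nop,               // 1 byte: 0xEA
    Rts,               // 1 byte: 0x60
}

/// Number of bytes each statement occupies in the output.
fn size_of(stmt: &Stmt) -> u16 {
    match stmt {
        Stmt::Label(_) => 0,
        Stmt::Jmp(_) => 3,
        Stmt::Nop | Stmt::Rts => 1,
    }
}

/// Assemble `program` starting at `origin`.
/// Returns None if a jump names a label that is never defined.
fn assemble_toy(program: &[Stmt], origin: u16) -> Option<Vec<u8>> {
    // Pass 1: record every label's address without emitting anything.
    let mut symbols: HashMap<&str, u16> = HashMap::new();
    let mut address = origin;
    for stmt in program {
        if let Stmt::Label(name) = stmt {
            symbols.insert(*name, address);
        }
        address = address.wrapping_add(size_of(stmt));
    }

    // Pass 2: emit bytes; forward references now resolve.
    let mut output = Vec::new();
    for stmt in program {
        match stmt {
            Stmt::Label(_) => {}
            Stmt::Jmp(target) => {
                let addr = *symbols.get(target)?;
                output.push(0x4C);
                output.extend_from_slice(&addr.to_le_bytes()); // low byte first
            }
            Stmt::Nop => output.push(0xEA),
            Stmt::Rts => output.push(0x60),
        }
    }
    Some(output)
}
```

For the program above assembled at 0x8000, pass 1 places `end` at 0x8004 (three bytes for `jmp`, one for `nop`), so pass 2 can encode the jump even though the label appears later in the source.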
The Assembler Structure
pub struct Assembler {
    symbols: SymbolTable,
    current_address: u16,
    origin: u16,
    output: Vec<u8>,
    errors: Vec<CodeGenError>,
    current_file: Option<String>,
}

impl Assembler {
    pub fn new() -> Self {
        Self {
            symbols: SymbolTable::new(),
            current_address: 0,
            origin: 0,
            output: Vec::new(),
            errors: Vec::new(),
            current_file: None,
        }
    }
}
The Main Assemble Function
pub fn assemble(&mut self, program: &Program) -> Result<Vec<u8>, AssemblerError> {
    self.current_file = program.source_file.clone();

    // Pass 1: Collect symbols
    self.pass1(program)?;

    // Pass 2: Generate code
    self.pass2(program)?;

    // Errors are accumulated during both passes and reported together.
    if !self.errors.is_empty() {
        return Err(AssemblerError::Multiple(
            self.errors.iter().cloned().map(AssemblerError::CodeGen).collect(),
        ));
    }

    // Hand the bytes to the caller and leave an empty buffer behind.
    Ok(std::mem::take(&mut self.output))
}
Pass 1: Symbol Collection
In Pass 1, we walk through the program and:
- Record label addresses
- Process constants from .equ
- Calculate instruction sizes to track the current address
fn pass1(&mut self, program: &Program) -> Result<(), AssemblerError> {
    self.current_address = self.origin;

    for stmt in &program.statements {
        match stmt {
            Statement::Label(label) => {
                self.pass1_label(label)?;
            }
            Statement::Instruction(instr) => {
                self.pass1_instruction(instr)?;
            }
            Statement::Directive(dir) => {
                self.pass1_directive(dir)?;
            }
        }
    }

    Ok(())
}
Processing Labels
fn pass1_label(&mut self, label: &LabelDef) -> Result<(), AssemblerError> {
    let name = if label.is_local {
        // Local labels are qualified with the most recent global label
        self.symbols.qualify_local_label(&label.name)
    } else {
        // Update parent for subsequent local labels
        self.symbols.set_parent(Some(label.name.clone()));
        label.name.clone()
    };

    // Record the error (for example, a duplicate label) and keep going,
    // so that one run can report several problems.
    if let Err(e) = self.symbols.define_label(&name, self.current_address, label.location) {
        self.errors.push(e);
    }

    Ok(())
}
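To make the parent/local relationship concrete, here is a minimal sketch of a symbol table that qualifies local labels. `ToySymbols` is a hypothetical stand-in, not the `SymbolTable` from Chapter 6, and the `parent.name` naming scheme is an assumption made for this example; the real table may qualify names differently.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the symbol table (the real one is in Chapter 6).
#[derive(Default)]
struct ToySymbols {
    parent: Option<String>,
    labels: HashMap<String, u16>,
}

impl ToySymbols {
    /// Remember the most recent global label.
    fn set_parent(&mut self, parent: Option<String>) {
        self.parent = parent;
    }

    /// Assumed scheme: prefix the local name with the enclosing global label.
    fn qualify_local_label(&self, name: &str) -> String {
        match &self.parent {
            Some(parent) => format!("{}.{}", parent, name),
            None => name.to_string(),
        }
    }

    /// Define a label, rejecting a second definition of the same name.
    fn define_label(&mut self, name: &str, address: u16) -> Result<(), String> {
        if self.labels.contains_key(name) {
            return Err(format!("duplicate label: {}", name));
        }
        self.labels.insert(name.to_string(), address);
        Ok(())
    }
}
```

Under this scheme, two routines can each contain a local label called `loop` without colliding, because each one is stored under a different qualified name.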
Calculating Instruction Size
We need to determine how many bytes an instruction will take:
fn pass1_instruction(&mut self, instr: &InstructionStmt) -> Result<(), AssemblerError> {
    // Pass 1 emits nothing; it only advances the address by the instruction's size.
    let size = self.instruction_size(instr);
    self.current_address = self.current_address.wrapping_add(u16::from(size));
    Ok(())
}

fn instruction_size(&self, instr: &InstructionStmt) -> u8 {
    match self.determine_addressing_mode(instr.mnemonic, &instr.operand) {
        // An invalid mnemonic/mode pair counts as 1 byte; pass 2 reports the error.
        Ok(mode) => get_opcode(instr.mnemonic, mode).map_or(1, |opcode| opcode.size),
        Err(_) => 1,
    }
}
Processing Directives in Pass 1
fn pass1_directive(&mut self, directive: &DirectiveStmt) -> Result<(), AssemblerError> {
    match directive {
        DirectiveStmt::Org { address, .. } => {
            let addr = self.evaluate(address)? as u16;
            self.origin = addr;
            self.current_address = addr;
        }
        DirectiveStmt::Db { values, .. } => {
            for value in values {
                match value {
                    DataValue::Byte(_) => {
                        self.current_address = self.current_address.wrapping_add(1);
                    }
                    DataValue::String(s) => {
                        // len() counts bytes, not characters
                        self.current_address =
                            self.current_address.wrapping_add(s.len() as u16);
                    }
                }
            }
        }
        DirectiveStmt::Dw { values, .. } => {
            // Each word occupies two bytes
            self.current_address =
                self.current_address.wrapping_add((values.len() * 2) as u16);
        }
        DirectiveStmt::Equ { name, value, location } => {
            let val = self.evaluate(value)?;
            self.symbols.define_constant(name, val, *location)?;
        }
        DirectiveStmt::Include { .. } => {
            // Handle includes (recursive assembly)
        }
    }

    Ok(())
}
Pass 2: Code Generation
In Pass 2, we generate the actual machine code:
fn pass2(&mut self, program: &Program) -> Result<(), AssemblerError> {
    self.current_address = self.origin;
    self.output.clear();

    for stmt in &program.statements {
        match stmt {
            Statement::Label(label) => {
                // Update parent for local label resolution
                if !label.is_local {
                    self.symbols.set_parent(Some(label.name.clone()));
                }
            }
            Statement::Instruction(instr) => {
                if let Err(e) = self.emit_instruction(instr) {
                    self.errors.push(e);
                }
            }
            Statement::Directive(dir) => {
                if let Err(e) = self.emit_directive(dir) {
                    self.errors.push(e);
                }
            }
        }
    }

    Ok(())
}
Determining Addressing Mode
The same operand syntax can map to different addressing modes depending on the value:
fn determine_addressing_mode(
    &self,
    mnemonic: Mnemonic,
    operand: &Option<Operand>,
) -> Result<AddressingMode, CodeGenError> {
    match operand {
        None => Ok(AddressingMode::Implied),
        Some(Operand::Immediate(_)) => Ok(AddressingMode::Immediate),
        Some(Operand::Accumulator) => Ok(AddressingMode::Accumulator),
        Some(Operand::IndirectX(_)) => Ok(AddressingMode::IndirectX),
        Some(Operand::IndirectY(_)) => Ok(AddressingMode::IndirectY),
        Some(Operand::Indirect(_)) => Ok(AddressingMode::Indirect),
        Some(Operand::Address(expr)) => {
            // Branches always use relative
            if is_branch_instruction(mnemonic) {
                return Ok(AddressingMode::Relative);
            }
            // Try to evaluate; if small enough, use zero page
            match self.evaluate(expr) {
                Ok(value) if (0..=0xFF).contains(&value) => {
                    if get_opcode(mnemonic, AddressingMode::ZeroPage).is_some() {
                        Ok(AddressingMode::ZeroPage)
                    } else {
                        Ok(AddressingMode::Absolute)
                    }
                }
                // Too large, or not evaluable yet (e.g. an undefined symbol)
                _ => Ok(AddressingMode::Absolute),
            }
        }
        Some(Operand::IndexedX(expr)) => match self.evaluate(expr) {
            Ok(value) if (0..=0xFF).contains(&value) => {
                if get_opcode(mnemonic, AddressingMode::ZeroPageX).is_some() {
                    Ok(AddressingMode::ZeroPageX)
                } else {
                    Ok(AddressingMode::AbsoluteX)
                }
            }
            _ => Ok(AddressingMode::AbsoluteX),
        },
        Some(Operand::IndexedY(expr)) => match self.evaluate(expr) {
            Ok(value) if (0..=0xFF).contains(&value) => {
                if get_opcode(mnemonic, AddressingMode::ZeroPageY).is_some() {
                    Ok(AddressingMode::ZeroPageY)
                } else {
                    Ok(AddressingMode::AbsoluteY)
                }
            }
            _ => Ok(AddressingMode::AbsoluteY),
        },
    }
}
Zero Page Optimization
The 6502 has faster, shorter instructions for zero page addresses (0x00-0xFF):
lda 0x80 ; Zero Page: A5 80 (2 bytes, 3 cycles)
lda 0x0180 ; Absolute: AD 80 01 (3 bytes, 4 cycles)
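Our assembler automatically uses zero page mode when:
- The operand can be evaluated and the address fits in 8 bits (0x00-0xFF)
- A zero page variant exists for that instruction
If the operand cannot be evaluated, for example because it names a symbol that is not yet defined, the assembler falls back to absolute mode.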
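The selection rule can be isolated in a small standalone sketch. `Mode` and `choose_mode` are hypothetical names for this illustration; `value` stands in for the result of evaluating the operand, with `None` meaning the expression could not be evaluated.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
enum Mode {
    ZeroPage,
    Absolute,
}

impl Mode {
    /// Total instruction size in bytes (opcode plus operand).
    fn size(self) -> u8 {
        match self {
            Mode::ZeroPage => 2,
            Mode::Absolute => 3,
        }
    }
}

/// Pick zero page only when the value is known, fits in 8 bits,
/// and the instruction actually has a zero page form.
fn choose_mode(value: Option<i64>, has_zero_page_form: bool) -> Mode {
    match value {
        Some(v) if (0..=0xFF).contains(&v) && has_zero_page_form => Mode::ZeroPage,
        // Unknown value, large value, or no zero page form: use absolute.
        _ => Mode::Absolute,
    }
}
```

One consequence worth keeping in mind: because an unevaluable operand is sized as absolute, an operand that is unknown in Pass 1 but resolves to a zero page address in Pass 2 could be given different sizes in the two passes. Defining zero page symbols before their first use is one way to avoid that situation.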
Branch Instructions
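Branches use relative addressing: the operand is a signed 8-bit offset measured from the address of the instruction that follows the branch. The helper function identifies branch mnemonics: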
fn is_branch_instruction(mnemonic: Mnemonic) -> bool {
    matches!(
        mnemonic,
        Mnemonic::BCC
            | Mnemonic::BCS
            | Mnemonic::BEQ
            | Mnemonic::BMI
            | Mnemonic::BNE
            | Mnemonic::BPL
            | Mnemonic::BVC
            | Mnemonic::BVS
    )
}
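As a rough sketch of what relative addressing implies for code generation, the offset of a 6502 branch is computed from the address of the next instruction (the branch itself is 2 bytes) and must fit in a signed byte. `branch_offset` is a hypothetical helper for illustration, not a function from this assembler, and it ignores wrap-around at the ends of the address space.

```rust
use std::convert::TryFrom;

/// Compute the signed 8-bit displacement for a branch located at
/// `branch_addr` that targets `target`.
fn branch_offset(branch_addr: u16, target: u16) -> Result<i8, String> {
    // The CPU adds the offset to the address of the instruction
    // after the 2-byte branch.
    let next = i32::from(branch_addr) + 2;
    let delta = i32::from(target) - next;
    // Fails when delta is outside -128..=127.
    i8::try_from(delta).map_err(|_| format!("branch out of range: {}", delta))
}
```

A branch to its own address yields -2 (encoded as 0xFE), and a target more than 127 bytes ahead of the next instruction is rejected rather than silently truncated.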
Output Helpers
fn emit_byte(&mut self, byte: u8) {
    self.output.push(byte);
    self.current_address = self.current_address.wrapping_add(1);
}

fn emit_word(&mut self, word: u16) {
    // The 6502 is little-endian
    self.emit_byte((word & 0xFF) as u8); // Low byte first
    self.emit_byte(((word >> 8) & 0xFF) as u8); // High byte second
}

fn pad_to(&mut self, address: u16) {
    // Fill with zero bytes up to `address`; does nothing if we are already past it
    while self.current_address < address {
        self.emit_byte(0);
    }
}
Assembly Trace Example
Let’s trace assembling:
    .org 0x8000
start:
    lda #0x42
    jmp start
Pass 1
| Statement | Action | current_address |
|---|---|---|
| .org 0x8000 | Set origin | 0x8000 |
| start: | Define start = 0x8000 | 0x8000 |
| lda #0x42 | Size = 2 bytes | 0x8002 |
| jmp start | Size = 3 bytes | 0x8005 |
Symbol Table: { start: Address(0x8000) }
Pass 2
| Statement | Output | Description |
|---|---|---|
| .org 0x8000 | (reset to 0x8000) | |
| start: | (nothing) | Just a label |
| lda #0x42 | A9 42 | LDA immediate = 0xA9 |
| jmp start | 4C 00 80 | JMP absolute = 0x4C, addr = 0x8000 (little-endian) |
Final output: A9 42 4C 00 80
Summary
In this chapter, we implemented two-pass assembly:
- Pass 1: Collect labels and calculate addresses
  - Process .equ constants
  - Calculate instruction sizes
  - Handle .org to set addresses
- Pass 2: Generate machine code
  - Look up all labels
  - Emit opcode and operand bytes
  - Handle zero page optimization
In the next chapter, we’ll implement the code generation details.