
Chapter 8: Code Generation

In this chapter, we’ll implement the code generation phase that converts AST nodes into actual machine code bytes.

Instruction Encoding Overview

Each 6502 instruction consists of:

  1. Opcode byte: Identifies the instruction and addressing mode
  2. Operand bytes: 0, 1, or 2 bytes depending on addressing mode

Addressing Mode    Size (bytes)  Example        Encoding
Implied            1             NOP            EA
Accumulator        1             ASL A          0A
Immediate          2             LDA #$42       A9 42
Zero Page          2             LDA $80        A5 80
Zero Page,X        2             LDA $80,X      B5 80
Zero Page,Y        2             LDX $80,Y      B6 80
Absolute           3             LDA $2000      AD 00 20
Absolute,X         3             LDA $2000,X    BD 00 20
Absolute,Y         3             LDA $2000,Y    B9 00 20
Indirect           3             JMP ($2000)    6C 00 20
Indexed Indirect   2             LDA ($80,X)    A1 80
Indirect Indexed   2             LDA ($80),Y    B1 80
Relative           2             BEQ label      F0 offset
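
The size column follows one rule: a single opcode byte plus zero, one, or two operand bytes. A minimal sketch of that rule, using a local `AddressingMode` enum that mirrors the table rows (the project's own enum may differ in naming or layout):

```rust
/// Local stand-in for the addressing modes listed in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AddressingMode {
    Implied,
    Accumulator,
    Immediate,
    ZeroPage,
    ZeroPageX,
    ZeroPageY,
    Absolute,
    AbsoluteX,
    AbsoluteY,
    Indirect,
    IndirectX,
    IndirectY,
    Relative,
}

/// Number of operand bytes that follow the opcode byte.
fn operand_size(mode: AddressingMode) -> usize {
    match mode {
        AddressingMode::Implied | AddressingMode::Accumulator => 0,
        // Immediate values, zero-page addresses, and branch offsets fit in one byte
        AddressingMode::Immediate
        | AddressingMode::ZeroPage
        | AddressingMode::ZeroPageX
        | AddressingMode::ZeroPageY
        | AddressingMode::IndirectX
        | AddressingMode::IndirectY
        | AddressingMode::Relative => 1,
        // Full 16-bit addresses need two bytes
        AddressingMode::Absolute
        | AddressingMode::AbsoluteX
        | AddressingMode::AbsoluteY
        | AddressingMode::Indirect => 2,
    }
}

/// Total instruction size: one opcode byte plus the operand bytes.
fn instruction_size(mode: AddressingMode) -> usize {
    1 + operand_size(mode)
}
```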

The emit_instruction Function

pub fn emit_instruction(&mut self, instr: &InstructionStmt) -> Result<(), CodeGenError> {
    // Save instruction start address for $ evaluation
    let instr_start = self.current_address;

    // Determine addressing mode
    let mode = self.determine_addressing_mode(instr.mnemonic, &instr.operand)?;

    // Look up the opcode
    let opcode = get_opcode(instr.mnemonic, mode).ok_or_else(|| {
        CodeGenError::InvalidAddressingMode {
            mnemonic: instr.mnemonic,
            mode,
            location: instr.location,
        }
    })?;

    // Emit opcode byte
    self.emit_byte(opcode.code);

    // Emit operand bytes
    self.emit_operand_bytes(&instr.operand, mode, instr.location, instr_start)?;

    Ok(())
}
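
`emit_instruction` relies on `self.emit_byte` and `self.current_address`, neither of which is shown in this chapter. The sketch below assumes a hypothetical `CodeGen` struct holding an output buffer and a location counter. It illustrates why `instr_start` has to be captured before the opcode is emitted: every `emit_byte` call moves the counter forward.

```rust
/// Hypothetical minimal code-generator state: the bytes emitted so far
/// and the address that the next byte will occupy.
struct CodeGen {
    output: Vec<u8>,
    current_address: u16,
}

impl CodeGen {
    /// Start emitting at `origin` (the value set by `.org`).
    fn new(origin: u16) -> Self {
        CodeGen {
            output: Vec::new(),
            current_address: origin,
        }
    }

    /// Append one byte and advance the location counter.
    fn emit_byte(&mut self, byte: u8) {
        self.output.push(byte);
        // wrapping_add avoids a debug-mode overflow panic at 0xFFFF
        self.current_address = self.current_address.wrapping_add(1);
    }
}
```

After emitting `A9 42` from origin 0x8000, `current_address` is 0x8002, while the saved `instr_start` still holds 0x8000, which is the value a `$` in the operand should see.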

Emitting Operand Bytes

Each addressing mode requires different operand handling:

fn emit_operand_bytes(
    &mut self,
    operand: &Option<Operand>,
    mode: AddressingMode,
    location: Location,
    instr_start: u16,
) -> Result<(), CodeGenError> {
    match mode {
        AddressingMode::Implied | AddressingMode::Accumulator => {
            // No operand bytes
        }

        AddressingMode::Immediate => {
            if let Some(Operand::Immediate(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }

        AddressingMode::ZeroPage => {
            if let Some(Operand::Address(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }

        AddressingMode::ZeroPageX => {
            if let Some(Operand::IndexedX(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }

        AddressingMode::ZeroPageY => {
            if let Some(Operand::IndexedY(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }

        AddressingMode::Absolute => {
            if let Some(Operand::Address(expr)) = operand {
                let value = self.evaluate_word_at(expr, location, instr_start)?;
                self.emit_word(value);
            }
        }

        AddressingMode::AbsoluteX => {
            if let Some(Operand::IndexedX(expr)) = operand {
                let value = self.evaluate_word_at(expr, location, instr_start)?;
                self.emit_word(value);
            }
        }

        AddressingMode::AbsoluteY => {
            if let Some(Operand::IndexedY(expr)) = operand {
                let value = self.evaluate_word_at(expr, location, instr_start)?;
                self.emit_word(value);
            }
        }

        AddressingMode::Indirect => {
            if let Some(Operand::Indirect(expr)) = operand {
                let value = self.evaluate_word_at(expr, location, instr_start)?;
                self.emit_word(value);
            }
        }

        AddressingMode::IndirectX => {
            if let Some(Operand::IndirectX(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }

        AddressingMode::IndirectY => {
            if let Some(Operand::IndirectY(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }

        AddressingMode::Relative => {
            if let Some(Operand::Address(expr)) = operand {
                let offset = self.calculate_branch_offset(expr, instr_start, location)?;
                self.emit_byte(offset as u8);
            }
        }
    }

    Ok(())
}
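
The `evaluate_byte_at` and `evaluate_word_at` helpers are not shown in this chapter; presumably they evaluate the expression and reject values that do not fit, using `ValueOutOfRange`. A hedged sketch of only the narrowing step, with hypothetical `to_byte`/`to_word` helpers and a simplified error type. It assumes negative values are rejected, although some assemblers accept -128..=-1 for immediates and encode them in two's complement.

```rust
use std::convert::TryFrom;

/// Simplified stand-in for CodeGenError::ValueOutOfRange (no location).
#[derive(Debug, PartialEq)]
struct ValueOutOfRange {
    value: i64,
    max: i64,
}

/// Narrow an evaluated expression to a one-byte operand (0..=0xFF).
fn to_byte(value: i64) -> Result<u8, ValueOutOfRange> {
    u8::try_from(value).map_err(|_| ValueOutOfRange { value, max: 0xFF })
}

/// Narrow an evaluated expression to a two-byte operand (0..=0xFFFF).
fn to_word(value: i64) -> Result<u16, ValueOutOfRange> {
    u16::try_from(value).map_err(|_| ValueOutOfRange { value, max: 0xFFFF })
}
```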

Relative Branch Calculation

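Branch instructions encode a signed 8-bit offset measured from the address of the instruction that follows the branch (the branch's own address + 2), so a branch can reach at most 128 bytes backward or 127 bytes forward: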

pub fn calculate_branch_offset(
    &self,
    target_expr: &Expression,
    from_address: u16,
    location: Location,
) -> Result<i8, CodeGenError> {
    let target = self.evaluate_with_location(target_expr, location)?;

    // Branch is relative to PC after the 2-byte branch instruction (PC + 2)
    let offset = target - (from_address as i64 + 2);

    // A signed 8-bit offset can only express -128..=127
    if !(-128..=127).contains(&offset) {
        return Err(CodeGenError::BranchOutOfRange { offset, location });
    }

    Ok(offset as i8)
}
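
The same arithmetic, pulled out of the code generator so it can be checked in isolation. This hypothetical `branch_offset` takes plain addresses instead of expressions and lets `i8::try_from` perform the -128..=127 range check:

```rust
use std::convert::TryFrom;

/// Offset for a 2-byte branch at `branch_addr` jumping to `target`.
/// On failure, returns the out-of-range offset so it can be reported.
fn branch_offset(branch_addr: u16, target: u16) -> Result<i8, i64> {
    // The CPU adds the offset to the address of the next instruction
    let next_pc = branch_addr as i64 + 2;
    let offset = target as i64 - next_pc;
    // i8::try_from fails for anything outside -128..=127
    i8::try_from(offset).map_err(|_| offset)
}
```

A backward branch from 0x8001 to 0x8000 gives -3, which is emitted as the byte 0xFD; a forward branch from 0x8000 can reach at most 0x8081.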

Branch Offset Example

.org 0x8000
loop:           ; Address 0x8000
    dex         ; Address 0x8000 (1 byte)
    bne loop    ; Address 0x8001 (2 bytes)

For bne loop:

  • Address of the bne instruction: 0x8001
  • Target: 0x8000
  • Offset = target - (address + 2) = 0x8000 - 0x8003 = -3, encoded as 0xFD (two's complement)

Output: D0 FD
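
A quick sanity check for an encoded branch is to decode it the way the CPU does: reinterpret the operand byte as signed and add it to the address of the following instruction. The `branch_target` helper below is illustrative only and not part of the assembler:

```rust
/// Where a taken 2-byte branch at `branch_addr` with operand byte
/// `operand` will land.
fn branch_target(branch_addr: u16, operand: u8) -> u16 {
    // 0xFD -> -3 as i8, then sign-extended to i16
    let offset = operand as i8 as i16;
    // Add the offset to the next instruction's address, wrapping at 16 bits
    branch_addr.wrapping_add(2).wrapping_add(offset as u16)
}
```

Decoding `D0 FD` at 0x8001 lands back on 0x8000, confirming the encoding above.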

Little-Endian Word Emission

The 6502 stores 16-bit values in little-endian order, low byte first:

fn emit_word(&mut self, word: u16) {
    self.emit_byte((word & 0xFF) as u8);         // Low byte
    self.emit_byte(((word >> 8) & 0xFF) as u8);  // High byte
}

So address 0x2000 becomes bytes 00 20.
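
Rust's standard library also provides `u16::to_le_bytes`, which yields the same two bytes as the shift-and-mask version. The two hypothetical helpers below compare the approaches:

```rust
/// Little-endian split using explicit masks and shifts, as in emit_word.
fn word_bytes_manual(word: u16) -> [u8; 2] {
    [(word & 0xFF) as u8, (word >> 8) as u8]
}

/// The same split using the standard library.
fn word_bytes_std(word: u16) -> [u8; 2] {
    word.to_le_bytes()
}
```

With that, `emit_word` could equivalently loop over `word.to_le_bytes()` and call `emit_byte` on each element; which form to use is a matter of taste.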

Opcode Lookup

We use the byte_common crate’s opcode table:

fn main() {
    use byte_common::opcode::{get_opcode, AddressingMode, Mnemonic, Opcode};

    // get_opcode returns Option<&'static Opcode>; None means the mnemonic
    // has no encoding for that addressing mode.
    let opcode: &'static Opcode = get_opcode(Mnemonic::LDA, AddressingMode::Immediate)
        .expect("LDA supports immediate mode");
    // opcode.code = 0xA9
    // opcode.size = 2
}

The Opcode structure provides:

  • code: The actual opcode byte (0x00-0xFF)
  • size: Instruction size in bytes
  • tick: Cycle count (for emulation)
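
How `byte_common` builds its table is not covered here. As an illustration of the lookup idea only, here is a tiny hand-written table keyed on (mnemonic, mode) pairs, covering a few rows of the reference table at the end of this chapter. Unlike the real function, it returns `Option<Opcode>` by value rather than `Option<&'static Opcode>`. The cycle counts are the standard NMOS 6502 figures. Returning `None` for unsupported pairings is what lets `emit_instruction` report `InvalidAddressingMode`.

```rust
/// Hypothetical subsets of the real Mnemonic and AddressingMode enums.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mnemonic {
    LDA,
    JMP,
    NOP,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AddressingMode {
    Implied,
    Immediate,
    ZeroPage,
    Absolute,
    Indirect,
}

/// Same fields as described above: opcode byte, size, and cycle count.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Opcode {
    code: u8,
    size: u8,
    tick: u8,
}

fn get_opcode(mnemonic: Mnemonic, mode: AddressingMode) -> Option<Opcode> {
    let (code, size, tick) = match (mnemonic, mode) {
        (Mnemonic::LDA, AddressingMode::Immediate) => (0xA9, 2, 2),
        (Mnemonic::LDA, AddressingMode::ZeroPage) => (0xA5, 2, 3),
        (Mnemonic::LDA, AddressingMode::Absolute) => (0xAD, 3, 4),
        (Mnemonic::JMP, AddressingMode::Absolute) => (0x4C, 3, 3),
        (Mnemonic::JMP, AddressingMode::Indirect) => (0x6C, 3, 5),
        (Mnemonic::NOP, AddressingMode::Implied) => (0xEA, 1, 2),
        // Any other pairing has no encoding in this table
        _ => return None,
    };
    Some(Opcode { code, size, tick })
}
```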

Code Generation Examples

Example 1: NOP (Implied)

nop
  1. Mnemonic: NOP
  2. Operand: None
  3. Mode: Implied
  4. Opcode: 0xEA
  5. Output: EA

Example 2: LDA Immediate

lda #0x42
  1. Mnemonic: LDA
  2. Operand: Immediate(0x42)
  3. Mode: Immediate
  4. Opcode: 0xA9
  5. Operand byte: 0x42
  6. Output: A9 42

Example 3: LDA Zero Page

lda 0x80
  1. Mnemonic: LDA
  2. Operand: Address(0x80)
  3. Value fits in 8 bits → Mode: Zero Page
  4. Opcode: 0xA5
  5. Address byte: 0x80
  6. Output: A5 80

Example 4: LDA Absolute

lda 0x2000
  1. Mnemonic: LDA
  2. Operand: Address(0x2000)
  3. Value > 0xFF → Mode: Absolute
  4. Opcode: 0xAD
  5. Address word: 0x2000 → 00 20 (little-endian)
  6. Output: AD 00 20
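
Examples 3 and 4 differ only in how large the operand value is. One common policy for choosing between them, sketched here under assumptions: use zero page when the value fits in a byte and the instruction has a zero-page form (JMP, for instance, does not). The `has_zero_page_form` flag and `select_address_mode` function are hypothetical. A two-pass assembler must also make the same choice in both passes, so operands that are still unresolved forward references in pass 1 are often treated as absolute.

```rust
/// Hypothetical subset of the addressing modes relevant to this choice.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AddressingMode {
    ZeroPage,
    Absolute,
}

/// Pick the shortest encoding for a plain address operand.
fn select_address_mode(value: u16, has_zero_page_form: bool) -> AddressingMode {
    if value <= 0xFF && has_zero_page_form {
        // One operand byte: e.g. LDA 0x80 -> A5 80
        AddressingMode::ZeroPage
    } else {
        // Two operand bytes: e.g. LDA 0x2000 -> AD 00 20
        AddressingMode::Absolute
    }
}
```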

Example 5: JMP Indirect

jmp (0x2000)
  1. Mnemonic: JMP
  2. Operand: Indirect(0x2000)
  3. Mode: Indirect
  4. Opcode: 0x6C
  5. Address word: 00 20
  6. Output: 6C 00 20

Example 6: Branch Instruction

.org 0x8000
start:
    ldx #5
loop:
    dex
    bne loop

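At bne loop:

  • Address of bne: 0x8003 (ldx #5 occupies 0x8000-0x8001 and dex sits at 0x8002)
  • Target: 0x8002 (loop)
  • Offset: 0x8002 - 0x8005 = -3 = 0xFD

Output for bne loop: D0 FD. The complete program assembles to A2 05 CA D0 FD.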
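
To tie the pieces together, here is a toy two-pass assembler hard-wired to the three instructions in Example 6. It is a sketch, not the chapter's implementation: the `Line` enum and `assemble` function are hypothetical, while the encodings for LDX immediate (A2), DEX (CA), and BNE (D0) are the standard 6502 opcodes. Pass 1 assigns label addresses from instruction sizes; pass 2 emits bytes and resolves the branch.

```rust
use std::collections::HashMap;

/// Hypothetical parsed lines for just the instructions Example 6 needs.
enum Line {
    Label(&'static str),
    LdxImm(u8),        // LDX #nn -> A2 nn
    Dex,               // DEX     -> CA
    Bne(&'static str), // BNE rel -> D0 offset
}

fn assemble(origin: u16, program: &[Line]) -> Result<Vec<u8>, String> {
    // Pass 1: assign an address to every label using instruction sizes
    let mut labels = HashMap::new();
    let mut pc = origin;
    for line in program {
        match line {
            Line::Label(name) => {
                labels.insert(*name, pc);
            }
            Line::LdxImm(_) | Line::Bne(_) => pc += 2,
            Line::Dex => pc += 1,
        }
    }

    // Pass 2: emit bytes, resolving the branch target via the label map
    let mut out = Vec::new();
    let mut pc = origin;
    for line in program {
        match line {
            Line::Label(_) => {}
            Line::LdxImm(value) => {
                out.extend_from_slice(&[0xA2, *value]);
                pc += 2;
            }
            Line::Dex => {
                out.push(0xCA);
                pc += 1;
            }
            Line::Bne(name) => {
                let target = *labels
                    .get(name)
                    .ok_or_else(|| format!("undefined label {}", name))?;
                // Relative to the instruction after the 2-byte branch
                let offset = target as i64 - (pc as i64 + 2);
                if !(-128..=127).contains(&offset) {
                    return Err(format!("branch out of range: {}", offset));
                }
                out.extend_from_slice(&[0xD0, offset as i8 as u8]);
                pc += 2;
            }
        }
    }
    Ok(out)
}
```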

Error Handling

Several errors can occur during code generation:

pub enum CodeGenError {
    // Invalid mnemonic + mode combination
    InvalidAddressingMode {
        mnemonic: Mnemonic,
        mode: AddressingMode,
        location: Location,
    },

    // Branch target too far
    BranchOutOfRange {
        offset: i64,
        location: Location,
    },

    // Value too large for operand
    ValueOutOfRange {
        value: i64,
        max: i64,
        location: Location,
    },

    // Undefined symbol
    UndefinedSymbol {
        name: String,
        location: Location,
    },
}
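
When these errors reach the user they need readable messages. A sketch of a `Display` implementation for a cut-down version of the enum, assuming a hypothetical `Location` with `line` and `column` fields (the project's real `Location` may carry different information):

```rust
use std::fmt;

/// Hypothetical source position.
#[derive(Debug, Clone, Copy)]
struct Location {
    line: usize,
    column: usize,
}

/// Cut-down error enum with two of the variants above.
#[derive(Debug)]
enum CodeGenError {
    BranchOutOfRange { offset: i64, location: Location },
    UndefinedSymbol { name: String, location: Location },
}

impl fmt::Display for CodeGenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            CodeGenError::BranchOutOfRange { offset, location } => write!(
                f,
                "{}:{}: branch target out of range (offset {}, allowed -128..=127)",
                location.line, location.column, offset
            ),
            CodeGenError::UndefinedSymbol { name, location } => write!(
                f,
                "{}:{}: undefined symbol `{}`",
                location.line, location.column, name
            ),
        }
    }
}

// Lets the error be used with `?` in functions returning Box<dyn Error>
impl std::error::Error for CodeGenError {}
```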

Complete Instruction Table

For reference, here’s how common instructions encode:

Instruction   Mode               Opcode (hex)
LDA #nn       Immediate          A9
LDA zp        Zero Page          A5
LDA zp,x      Zero Page,X        B5
LDA abs       Absolute           AD
LDA abs,x     Absolute,X         BD
LDA abs,y     Absolute,Y         B9
LDA (zp,x)    Indexed Indirect   A1
LDA (zp),y    Indirect Indexed   B1
STA zp        Zero Page          85
STA abs       Absolute           8D
JMP abs       Absolute           4C
JMP (abs)     Indirect           6C
JSR abs       Absolute           20
RTS           Implied            60
NOP           Implied            EA
BEQ rel       Relative           F0
BNE rel       Relative           D0

Summary

In this chapter, we implemented code generation:

  • Opcode lookup using mnemonic + addressing mode
  • Operand byte emission for each addressing mode
  • Little-endian word encoding
  • Relative branch offset calculation
  • Error handling for invalid combinations

In the next chapter, we’ll implement expression evaluation for calculating addresses and values.

