Chapter 8: Code Generation
In this chapter, we’ll implement the code generation phase that converts AST nodes into actual machine code bytes.
Instruction Encoding Overview
Each 6502 instruction consists of:
- Opcode byte: Identifies the instruction and addressing mode
- Operand bytes: 0, 1, or 2 bytes depending on addressing mode
| Addressing Mode | Size (bytes) | Example (source → encoded bytes) |
|---|---|---|
| Implied | 1 | NOP → EA |
| Accumulator | 1 | ASL A → 0A |
| Immediate | 2 | LDA #$42 → A9 42 |
| Zero Page | 2 | LDA $80 → A5 80 |
| Zero Page,X | 2 | LDA $80,X → B5 80 |
| Zero Page,Y | 2 | LDX $80,Y → B6 80 |
| Absolute | 3 | LDA $2000 → AD 00 20 |
| Absolute,X | 3 | LDA $2000,X → BD 00 20 |
| Absolute,Y | 3 | LDA $2000,Y → B9 00 20 |
| Indirect | 3 | JMP ($2000) → 6C 00 20 |
| Indexed Indirect | 2 | LDA ($80,X) → A1 80 |
| Indirect Indexed | 2 | LDA ($80),Y → B1 80 |
| Relative | 2 | BEQ label → F0 offset |
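The size column follows directly from the addressing mode: one opcode byte plus zero, one, or two operand bytes. A minimal sketch of that rule, using a hypothetical `Mode` enum and method names that stand in for whatever the assembler actually defines:

```rust
/// Hypothetical stand-in for the assembler's addressing-mode enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Implied,
    Accumulator,
    Immediate,
    ZeroPage,
    ZeroPageX,
    ZeroPageY,
    Absolute,
    AbsoluteX,
    AbsoluteY,
    Indirect,
    IndirectX, // (zp,X), "Indexed Indirect"
    IndirectY, // (zp),Y, "Indirect Indexed"
    Relative,
}

impl Mode {
    /// Number of operand bytes that follow the opcode byte.
    pub fn operand_len(self) -> u16 {
        match self {
            Mode::Implied | Mode::Accumulator => 0,
            Mode::Immediate
            | Mode::ZeroPage
            | Mode::ZeroPageX
            | Mode::ZeroPageY
            | Mode::IndirectX
            | Mode::IndirectY
            | Mode::Relative => 1,
            Mode::Absolute | Mode::AbsoluteX | Mode::AbsoluteY | Mode::Indirect => 2,
        }
    }

    /// Total instruction size: the opcode byte plus its operand bytes.
    pub fn instruction_len(self) -> u16 {
        1 + self.operand_len()
    }
}
```

Spelling out every variant instead of using a wildcard arm means the compiler flags this function if a new mode is ever added.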
The emit_instruction Function
pub fn emit_instruction(&mut self, instr: &InstructionStmt) -> Result<(), CodeGenError> {
    // Save instruction start address for $ evaluation
    let instr_start = self.current_address;

    // Determine addressing mode
    let mode = self.determine_addressing_mode(instr.mnemonic, &instr.operand)?;

    // Look up the opcode
    let opcode = get_opcode(instr.mnemonic, mode).ok_or_else(|| {
        CodeGenError::InvalidAddressingMode {
            mnemonic: instr.mnemonic,
            mode,
            location: instr.location,
        }
    })?;

    // Emit opcode byte
    self.emit_byte(opcode.code);

    // Emit operand bytes
    self.emit_operand_bytes(&instr.operand, mode, instr.location, instr_start)?;

    Ok(())
}
Emitting Operand Bytes
Each addressing mode requires different operand handling:
fn emit_operand_bytes(
    &mut self,
    operand: &Option<Operand>,
    mode: AddressingMode,
    location: Location,
    instr_start: u16,
) -> Result<(), CodeGenError> {
    match mode {
        AddressingMode::Implied | AddressingMode::Accumulator => {
            // No operand bytes
        }
        AddressingMode::Immediate => {
            if let Some(Operand::Immediate(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }
        AddressingMode::ZeroPage => {
            if let Some(Operand::Address(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }
        AddressingMode::ZeroPageX => {
            if let Some(Operand::IndexedX(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }
        AddressingMode::ZeroPageY => {
            if let Some(Operand::IndexedY(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }
        AddressingMode::Absolute => {
            if let Some(Operand::Address(expr)) = operand {
                let value = self.evaluate_word_at(expr, location, instr_start)?;
                self.emit_word(value);
            }
        }
        AddressingMode::AbsoluteX => {
            if let Some(Operand::IndexedX(expr)) = operand {
                let value = self.evaluate_word_at(expr, location, instr_start)?;
                self.emit_word(value);
            }
        }
        AddressingMode::AbsoluteY => {
            if let Some(Operand::IndexedY(expr)) = operand {
                let value = self.evaluate_word_at(expr, location, instr_start)?;
                self.emit_word(value);
            }
        }
        AddressingMode::Indirect => {
            if let Some(Operand::Indirect(expr)) = operand {
                let value = self.evaluate_word_at(expr, location, instr_start)?;
                self.emit_word(value);
            }
        }
        AddressingMode::IndirectX => {
            if let Some(Operand::IndirectX(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }
        AddressingMode::IndirectY => {
            if let Some(Operand::IndirectY(expr)) = operand {
                let value = self.evaluate_byte_at(expr, location, instr_start)?;
                self.emit_byte(value);
            }
        }
        AddressingMode::Relative => {
            if let Some(Operand::Address(expr)) = operand {
                let offset = self.calculate_branch_offset(expr, instr_start, location)?;
                self.emit_byte(offset as u8);
            }
        }
    }
    Ok(())
}
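Although the match has one arm per addressing mode, every arm ends in one of three shapes: no operand, a single byte, or a 16-bit word. The sketch below isolates that idea in a free function; `OperandBytes` and `encode` are illustrative names, not part of the assembler shown in this chapter.

```rust
/// Hypothetical summary of an already-evaluated operand.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OperandBytes {
    None,      // Implied, Accumulator
    Byte(u8),  // Immediate, zero-page forms, (zp,X), (zp),Y, Relative
    Word(u16), // Absolute forms, Indirect
}

/// Encode one instruction as its opcode byte followed by its operand bytes.
pub fn encode(opcode: u8, operand: OperandBytes) -> Vec<u8> {
    let mut out = vec![opcode];
    match operand {
        OperandBytes::None => {}
        OperandBytes::Byte(b) => out.push(b),
        // 16-bit operands are stored low byte first.
        OperandBytes::Word(w) => out.extend_from_slice(&w.to_le_bytes()),
    }
    out
}
```

For instance, `encode(0xAD, OperandBytes::Word(0x2000))` produces the three bytes `AD 00 20` listed for `LDA $2000` in the overview table.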
Relative Branch Calculation
Branch instructions take a signed 8-bit offset measured from the address of the instruction that follows the branch, i.e. the branch address plus 2:
pub fn calculate_branch_offset(
    &self,
    target_expr: &Expression,
    from_address: u16,
    location: Location,
) -> Result<i8, CodeGenError> {
    let target = self.evaluate_with_location(target_expr, location)?;

    // Branch is relative to PC after the branch instruction (PC + 2)
    let offset = target - (from_address as i64 + 2);

    if offset < -128 || offset > 127 {
        return Err(CodeGenError::BranchOutOfRange { offset, location });
    }

    Ok(offset as i8)
}
Branch Offset Example
.org 0x8000
loop: ; Address 0x8000
dex ; Address 0x8000 (1 byte)
bne loop ; Address 0x8001 (2 bytes)
For bne loop:
- Current address when encoding: 0x8001
- Target: 0x8000
- Offset = target - (current + 2) = 0x8000 - 0x8003 = -3 = 0xFD
Output: D0 FD
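The same arithmetic can be checked in isolation. This is a simplified sketch that takes plain addresses instead of expressions; the function name and the use of the raw offset as the error value are assumptions made for the example.

```rust
/// Hypothetical, expression-free version of the branch-offset calculation.
/// `from` is the address of the branch opcode, `target` the destination.
/// On failure the out-of-range offset is returned as the error value.
pub fn branch_offset(from: u16, target: u16) -> Result<i8, i64> {
    // The CPU adds the offset to the address of the next instruction (from + 2).
    let offset = i64::from(target) - (i64::from(from) + 2);
    if !(-128..=127).contains(&offset) {
        return Err(offset);
    }
    Ok(offset as i8)
}
```

With the listing above, `branch_offset(0x8001, 0x8000)` gives `Ok(-3)`, and casting `-3i8` to `u8` gives the emitted byte `0xFD`. The reachable window is asymmetric around the branch: targets from `from - 126` through `from + 129` are in range.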
Little-Endian Word Emission
The 6502 uses little-endian byte order:
fn emit_word(&mut self, word: u16) {
    self.emit_byte((word & 0xFF) as u8); // Low byte
    self.emit_byte(((word >> 8) & 0xFF) as u8); // High byte
}
So address 0x2000 becomes bytes 00 20.
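As an alternative to the manual shift and mask, the standard library's `u16::to_le_bytes` yields the same two bytes. The sketch below writes into a plain `Vec<u8>` rather than the assembler's own output buffer, which is an assumption made to keep it self-contained.

```rust
/// Append a 16-bit word to `out` in little-endian order (low byte first).
pub fn emit_word_le(out: &mut Vec<u8>, word: u16) {
    out.extend_from_slice(&word.to_le_bytes());
}

/// The manual version, equivalent to the shift-and-mask code in the text.
pub fn emit_word_manual(out: &mut Vec<u8>, word: u16) {
    out.push((word & 0xFF) as u8); // Low byte
    out.push((word >> 8) as u8);   // High byte
}
```

Both functions append `00 20` for `0x2000` and `34 12` for `0x1234`.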
Opcode Lookup
We use the byte_common crate’s opcode table:
use byte_common::opcode::{get_opcode, AddressingMode, Mnemonic, Opcode};

// Returns Option<&'static Opcode>
let opcode = get_opcode(Mnemonic::LDA, AddressingMode::Immediate);
// For this mnemonic/mode pair the lookup yields an opcode with:
//   code = 0xA9
//   size = 2
The Opcode structure provides:
- code: The actual opcode byte (0x00-0xFF)
- size: Instruction size in bytes
- tick: Cycle count (for emulation)
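To make the lookup concrete without depending on the crate, here is a minimal sketch of a table keyed by mnemonic and addressing mode. It is not the `byte_common` implementation: the enums, the `lookup` function, and the small table are hypothetical, and the `tick` field is omitted.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mnemonic { Lda, Sta, Jmp, Nop }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode { Implied, Immediate, ZeroPage, Absolute, Indirect }

/// Hypothetical opcode record with the `code` and `size` fields described above.
#[derive(Debug, PartialEq, Eq)]
pub struct Opcode {
    pub code: u8,
    pub size: u8,
}

/// A few real 6502 encodings; a full table would cover every legal pair.
static TABLE: &[(Mnemonic, Mode, Opcode)] = &[
    (Mnemonic::Lda, Mode::Immediate, Opcode { code: 0xA9, size: 2 }),
    (Mnemonic::Lda, Mode::ZeroPage, Opcode { code: 0xA5, size: 2 }),
    (Mnemonic::Lda, Mode::Absolute, Opcode { code: 0xAD, size: 3 }),
    (Mnemonic::Sta, Mode::ZeroPage, Opcode { code: 0x85, size: 2 }),
    (Mnemonic::Sta, Mode::Absolute, Opcode { code: 0x8D, size: 3 }),
    (Mnemonic::Jmp, Mode::Absolute, Opcode { code: 0x4C, size: 3 }),
    (Mnemonic::Jmp, Mode::Indirect, Opcode { code: 0x6C, size: 3 }),
    (Mnemonic::Nop, Mode::Implied, Opcode { code: 0xEA, size: 1 }),
];

/// Return the opcode for a mnemonic/mode pair, or None if the pair is illegal.
pub fn lookup(mnemonic: Mnemonic, mode: Mode) -> Option<&'static Opcode> {
    TABLE
        .iter()
        .find(|(m, md, _)| *m == mnemonic && *md == mode)
        .map(|(_, _, op)| op)
}
```

A `None` result is what drives the invalid-addressing-mode error: `lookup(Mnemonic::Sta, Mode::Immediate)` finds nothing because the 6502 has no immediate form of `STA`. A linear scan keeps the sketch short; a real table would more likely be indexed or hashed.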
Code Generation Examples
Example 1: NOP (Implied)
nop
- Mnemonic: NOP
- Operand: None
- Mode: Implied
- Opcode: 0xEA
- Output: EA
Example 2: LDA Immediate
lda #0x42
- Mnemonic: LDA
- Operand: Immediate(0x42)
- Mode: Immediate
- Opcode: 0xA9
- Operand byte: 0x42
- Output: A9 42
Example 3: LDA Zero Page
lda 0x80
- Mnemonic: LDA
- Operand: Address(0x80)
- Value fits in 8 bits → Mode: Zero Page
- Opcode: 0xA5
- Address byte: 0x80
- Output: A5 80
Example 4: LDA Absolute
lda 0x2000
- Mnemonic: LDA
- Operand: Address(0x2000)
- Value > 0xFF → Mode: Absolute
- Opcode: 0xAD
- Address word: 0x2000 → 00 20 (little-endian)
- Output: AD 00 20
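Examples 3 and 4 differ only in how large the address is. A minimal sketch of that decision, assuming the operand value is already known; `select_mode` and its two-variant `Mode` enum are hypothetical and not the chapter's `determine_addressing_mode`:

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    ZeroPage,
    Absolute,
}

/// Pick the shortest encoding that can hold `value`.
/// Returns None when the value does not fit in a 16-bit address.
pub fn select_mode(value: i64) -> Option<Mode> {
    match value {
        0..=0xFF => Some(Mode::ZeroPage),
        0x100..=0xFFFF => Some(Mode::Absolute),
        _ => None,
    }
}
```

This ignores two things a full implementation has to handle: symbols whose value is not yet known when the mode is chosen, and instructions such as `JMP` that have no zero-page form and must use Absolute even for small addresses.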
Example 5: JMP Indirect
jmp (0x2000)
- Mnemonic: JMP
- Operand: Indirect(0x2000)
- Mode: Indirect
- Opcode: 0x6C
- Address word: 00 20
- Output: 6C 00 20
Example 6: Branch Instruction
.org 0x8000
start:
ldx #5
loop:
dex
bne loop
At bne loop:
- Current address: 0x8003
- Target: 0x8002 (loop)
- Offset: 0x8002 - 0x8005 = -3 = 0xFD
Output for bne loop: D0 FD
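To see where the addresses in Example 6 come from, the whole listing can be hand-assembled in a few lines. This sketch hard-codes the three opcodes (`A2` for `LDX #`, `CA` for `DEX`, `D0` for `BNE`) and tracks addresses from the output length; `assemble_example` is a name made up for the illustration.

```rust
/// Hand-assemble the Example 6 listing, starting at origin 0x8000.
pub fn assemble_example() -> Vec<u8> {
    let origin: u16 = 0x8000;
    let mut out: Vec<u8> = Vec::new();

    // start: ldx #5  ->  A2 05 (2 bytes at 0x8000)
    out.extend_from_slice(&[0xA2, 0x05]);

    // loop: the label takes the address of the next byte to be emitted.
    let loop_addr = origin + out.len() as u16; // 0x8002

    // dex  ->  CA (1 byte at 0x8002)
    out.push(0xCA);

    // bne loop: offset is relative to the byte after the 2-byte branch.
    let bne_addr = origin + out.len() as u16; // 0x8003
    let offset = i32::from(loop_addr) - (i32::from(bne_addr) + 2); // -3
    out.extend_from_slice(&[0xD0, offset as i8 as u8]);

    out
}
```

The result is the five bytes `A2 05 CA D0 FD`, with the last two matching the output shown for `bne loop`.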
Error Handling
Several errors can occur during code generation:
pub enum CodeGenError {
    // Invalid mnemonic + mode combination
    InvalidAddressingMode {
        mnemonic: Mnemonic,
        mode: AddressingMode,
        location: Location,
    },
    // Branch target too far
    BranchOutOfRange {
        offset: i64,
        location: Location,
    },
    // Value too large for operand
    ValueOutOfRange {
        value: i64,
        max: i64,
        location: Location,
    },
    // Undefined symbol
    UndefinedSymbol {
        name: String,
        location: Location,
    },
}
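As an illustration of where a value-out-of-range error could originate, the sketch below narrows an evaluated `i64` to an operand of the required width. It uses a cut-down error type without `location`, and it assumes negative values are rejected; the chapter's `evaluate_byte_at` and `evaluate_word_at` may behave differently.

```rust
/// Cut-down, hypothetical error type mirroring the ValueOutOfRange variant.
#[derive(Debug, PartialEq, Eq)]
pub enum RangeError {
    ValueOutOfRange { value: i64, max: i64 },
}

/// Narrow an evaluated expression to a one-byte operand.
/// Assumption: only 0..=0xFF is accepted; negative values are errors.
pub fn to_byte(value: i64) -> Result<u8, RangeError> {
    if (0..=0xFF).contains(&value) {
        Ok(value as u8)
    } else {
        Err(RangeError::ValueOutOfRange { value, max: 0xFF })
    }
}

/// Narrow an evaluated expression to a two-byte operand.
pub fn to_word(value: i64) -> Result<u16, RangeError> {
    if (0..=0xFFFF).contains(&value) {
        Ok(value as u16)
    } else {
        Err(RangeError::ValueOutOfRange { value, max: 0xFFFF })
    }
}
```

Carrying both `value` and `max` in the error lets the message state exactly how far out of range the operand was.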
Complete Instruction Table
For reference, here’s how common instructions encode:
| Instruction | Mode | Opcode |
|---|---|---|
| LDA #nn | Immediate | A9 |
| LDA zp | Zero Page | A5 |
| LDA zp,x | Zero Page,X | B5 |
| LDA abs | Absolute | AD |
| LDA abs,x | Absolute,X | BD |
| LDA abs,y | Absolute,Y | B9 |
| LDA (zp,x) | Indexed Indirect | A1 |
| LDA (zp),y | Indirect Indexed | B1 |
| STA zp | Zero Page | 85 |
| STA abs | Absolute | 8D |
| JMP abs | Absolute | 4C |
| JMP (abs) | Indirect | 6C |
| JSR abs | Absolute | 20 |
| RTS | Implied | 60 |
| NOP | Implied | EA |
| BEQ rel | Relative | F0 |
| BNE rel | Relative | D0 |
Summary
In this chapter, we implemented code generation:
- Opcode lookup using mnemonic + addressing mode
- Operand byte emission for each addressing mode
- Little-endian word encoding
- Relative branch offset calculation
- Error handling for invalid combinations
In the next chapter, we’ll implement expression evaluation for calculating addresses and values.