Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Chapter 6: The Symbol Table

In this chapter, we’ll build the symbol table - the data structure that tracks labels and constants during assembly.

What Symbols Do We Track?

The symbol table stores:

  1. Labels: Names that refer to addresses in the program

    • main: → address where code follows
    • .loop: → local label within a function
  2. Constants: Named values defined with .equ

    • .equ SCREEN 0x1000 → SCREEN = 4096

Symbol Structure

#![allow(unused)]
fn main() {
pub struct Symbol {
    pub name: String,
    pub value: SymbolValue,
    pub defined_at: Location,
    pub referenced: bool,
}

pub enum SymbolValue {
    Address(u16),      // Label pointing to an address
    Constant(i64),     // Constant value from .equ
    Undefined,         // Forward reference not yet resolved
}
}

Symbol Constructors

#![allow(unused)]
fn main() {
impl Symbol {
    pub fn address(name: impl Into<String>, address: u16, location: Location) -> Self {
        Self {
            name: name.into(),
            value: SymbolValue::Address(address),
            defined_at: location,
            referenced: false,
        }
    }

    pub fn constant(name: impl Into<String>, value: i64, location: Location) -> Self {
        Self {
            name: name.into(),
            value: SymbolValue::Constant(value),
            defined_at: location,
            referenced: false,
        }
    }

    pub fn is_defined(&self) -> bool {
        !matches!(self.value, SymbolValue::Undefined)
    }

    pub fn numeric_value(&self) -> Option<i64> {
        match self.value {
            SymbolValue::Address(addr) => Some(addr as i64),
            SymbolValue::Constant(val) => Some(val),
            SymbolValue::Undefined => None,
        }
    }
}
}

The Symbol Table

#![allow(unused)]
fn main() {
pub struct SymbolTable {
    symbols: HashMap<String, Symbol>,
    current_parent: Option<String>,
}
}

The current_parent tracks the most recent global label for local label resolution.

Basic Operations

#![allow(unused)]
fn main() {
impl SymbolTable {
    pub fn new() -> Self {
        Self {
            symbols: HashMap::new(),
            current_parent: None,
        }
    }

    pub fn set_parent(&mut self, parent: Option<String>) {
        self.current_parent = parent;
    }

    pub fn parent(&self) -> Option<&str> {
        self.current_parent.as_deref()
    }
}
}

Defining Symbols

#![allow(unused)]
fn main() {
pub fn define(&mut self, symbol: Symbol) -> Result<(), CodeGenError> {
    let name = symbol.name.clone();
    let location = symbol.defined_at;

    if let Some(existing) = self.symbols.get(&name) {
        // If existing symbol is undefined (forward reference), update it
        if !existing.is_defined() {
            self.symbols.insert(name, symbol);
            return Ok(());
        }

        // Already defined - that's an error
        return Err(CodeGenError::DuplicateSymbol {
            name,
            first: existing.defined_at,
            second: location,
        });
    }

    self.symbols.insert(name, symbol);
    Ok(())
}
}

Convenience Methods

#![allow(unused)]
fn main() {
pub fn define_label(
    &mut self,
    name: impl Into<String>,
    address: u16,
    location: Location,
) -> Result<(), CodeGenError> {
    self.define(Symbol::address(name, address, location))
}

pub fn define_constant(
    &mut self,
    name: impl Into<String>,
    value: i64,
    location: Location,
) -> Result<(), CodeGenError> {
    self.define(Symbol::constant(name, value, location))
}
}

Looking Up Symbols

#![allow(unused)]
fn main() {
pub fn lookup(&self, name: &str) -> Option<&Symbol> {
    // Try direct lookup
    if let Some(sym) = self.symbols.get(name) {
        return Some(sym);
    }

    // For local labels, try with parent prefix
    if (name.starts_with('.') || name.starts_with('@'))
        && self.current_parent.is_some()
    {
        let qualified = self.qualify_local_label(name);
        return self.symbols.get(&qualified);
    }

    None
}

pub fn lookup_value(&self, name: &str) -> Option<i64> {
    self.lookup(name).and_then(|s| s.numeric_value())
}

pub fn is_defined(&self, name: &str) -> bool {
    self.lookup(name)
        .map(|s| s.is_defined())
        .unwrap_or(false)
}
}

Local Label Handling

Local labels are scoped to their parent global label:

#![allow(unused)]
fn main() {
pub fn qualify_local_label(&self, name: &str) -> String {
    if let Some(parent) = &self.current_parent {
        if name.starts_with('.') {
            // .loop -> parent.loop
            format!("{}{}", parent, name)
        } else if name.starts_with('@') {
            // @loop -> parent.loop
            format!("{}.{}", parent, &name[1..])
        } else {
            name.to_string()
        }
    } else {
        name.to_string()
    }
}
}

Example

main:           ; current_parent = "main"
.loop:          ; stored as "main.loop"
    bne .loop   ; resolved to "main.loop"

other:          ; current_parent = "other"
.loop:          ; stored as "other.loop"
    bne .loop   ; resolved to "other.loop"

When we encounter main:, we call set_parent(Some("main")). When we encounter .loop:, we store it as main.loop. When we reference .loop, we look up main.loop.

The Forward Reference Problem

Consider this code:

    jmp end     ; 'end' not defined yet!
    nop
end:
    rts

When we encounter jmp end, the label end hasn’t been defined yet. This is called a forward reference.

The Two-Pass Solution

We solve this with two passes:

  1. Pass 1: Scan through the code, recording where each label is defined
  2. Pass 2: Generate code, now that all labels are known

In Pass 1, when we see jmp end:

  • We don’t know end’s address
  • We just calculate that this instruction takes 3 bytes
  • We move on

In Pass 2, when we generate code for jmp end:

  • We look up end in the symbol table
  • We now have its address
  • We emit the correct bytes

Tracking References

We track whether symbols are referenced:

#![allow(unused)]
fn main() {
pub fn mark_referenced(&mut self, name: &str) {
    if let Some(sym) = self.symbols.get_mut(name) {
        sym.referenced = true;
    } else if (name.starts_with('.') || name.starts_with('@'))
        && self.current_parent.is_some()
    {
        let qualified = self.qualify_local_label(name);
        if let Some(sym) = self.symbols.get_mut(&qualified) {
            sym.referenced = true;
        }
    }
}
}

This allows us to warn about unused labels:

#![allow(unused)]
fn main() {
pub fn unreferenced_symbols(&self) -> Vec<&Symbol> {
    self.symbols
        .values()
        .filter(|s| !s.referenced && s.is_defined())
        .collect()
}
}

Finding Undefined Symbols

After Pass 1, we can check for undefined symbols:

#![allow(unused)]
fn main() {
pub fn undefined_symbols(&self) -> Vec<&Symbol> {
    self.symbols
        .values()
        .filter(|s| !s.is_defined())
        .collect()
}
}

Complete Example

Let’s trace symbol table operations for:

.equ SCREEN 0x1000

.org 0x8000
start:
    lda #>SCREEN
.loop:
    jmp .loop

.dw start

Pass 1

OperationSymbol Table
.equ SCREEN 0x1000{ SCREEN: Constant(4096) }
.org 0x8000(no change, just sets address)
start:{ SCREEN, start: Address(0x8000) }
lda #>SCREEN(no change, just advances address)
.loop:{ SCREEN, start, start.loop: Address(0x8002) }
jmp .loop(no change)
.dw start(no change)

Pass 2

When generating code:

  • lda #>SCREEN → look up SCREEN → 0x1000 → high byte is 0x10 → emit A9 10
  • jmp .loop → look up start.loop → 0x8002 → emit 4C 02 80
  • .dw start → look up start → 0x8000 → emit 00 80

Summary

In this chapter, we built a symbol table that:

  • Stores labels (addresses) and constants (values)
  • Handles local labels scoped to parent global labels
  • Detects duplicate symbol definitions
  • Supports forward references via undefined symbols
  • Tracks which symbols are referenced

In the next chapter, we’ll implement the two-pass assembly process that uses this symbol table.


Previous: Chapter 5 - Parser Implementation | Next: Chapter 7 - Two-Pass Assembly