Chapter 6: The Symbol Table
In this chapter, we’ll build the symbol table - the data structure that tracks labels and constants during assembly.
What Symbols Do We Track?
The symbol table stores:
-
Labels: Names that refer to addresses in the program
main:→ address where code follows.loop:→ local label within a function
-
Constants: Named values defined with
.equ.equ SCREEN 0x1000→ SCREEN = 4096
Symbol Structure
#![allow(unused)]
fn main() {
pub struct Symbol {
pub name: String,
pub value: SymbolValue,
pub defined_at: Location,
pub referenced: bool,
}
pub enum SymbolValue {
Address(u16), // Label pointing to an address
Constant(i64), // Constant value from .equ
Undefined, // Forward reference not yet resolved
}
}
Symbol Constructors
#![allow(unused)]
fn main() {
impl Symbol {
pub fn address(name: impl Into<String>, address: u16, location: Location) -> Self {
Self {
name: name.into(),
value: SymbolValue::Address(address),
defined_at: location,
referenced: false,
}
}
pub fn constant(name: impl Into<String>, value: i64, location: Location) -> Self {
Self {
name: name.into(),
value: SymbolValue::Constant(value),
defined_at: location,
referenced: false,
}
}
pub fn is_defined(&self) -> bool {
!matches!(self.value, SymbolValue::Undefined)
}
pub fn numeric_value(&self) -> Option<i64> {
match self.value {
SymbolValue::Address(addr) => Some(addr as i64),
SymbolValue::Constant(val) => Some(val),
SymbolValue::Undefined => None,
}
}
}
}
The Symbol Table
#![allow(unused)]
fn main() {
pub struct SymbolTable {
symbols: HashMap<String, Symbol>,
current_parent: Option<String>,
}
}
The current_parent tracks the most recent global label for local label resolution.
Basic Operations
#![allow(unused)]
fn main() {
impl SymbolTable {
pub fn new() -> Self {
Self {
symbols: HashMap::new(),
current_parent: None,
}
}
pub fn set_parent(&mut self, parent: Option<String>) {
self.current_parent = parent;
}
pub fn parent(&self) -> Option<&str> {
self.current_parent.as_deref()
}
}
}
Defining Symbols
#![allow(unused)]
fn main() {
pub fn define(&mut self, symbol: Symbol) -> Result<(), CodeGenError> {
let name = symbol.name.clone();
let location = symbol.defined_at;
if let Some(existing) = self.symbols.get(&name) {
// If existing symbol is undefined (forward reference), update it
if !existing.is_defined() {
self.symbols.insert(name, symbol);
return Ok(());
}
// Already defined - that's an error
return Err(CodeGenError::DuplicateSymbol {
name,
first: existing.defined_at,
second: location,
});
}
self.symbols.insert(name, symbol);
Ok(())
}
}
Convenience Methods
#![allow(unused)]
fn main() {
pub fn define_label(
&mut self,
name: impl Into<String>,
address: u16,
location: Location,
) -> Result<(), CodeGenError> {
self.define(Symbol::address(name, address, location))
}
pub fn define_constant(
&mut self,
name: impl Into<String>,
value: i64,
location: Location,
) -> Result<(), CodeGenError> {
self.define(Symbol::constant(name, value, location))
}
}
Looking Up Symbols
#![allow(unused)]
fn main() {
pub fn lookup(&self, name: &str) -> Option<&Symbol> {
// Try direct lookup
if let Some(sym) = self.symbols.get(name) {
return Some(sym);
}
// For local labels, try with parent prefix
if (name.starts_with('.') || name.starts_with('@'))
&& self.current_parent.is_some()
{
let qualified = self.qualify_local_label(name);
return self.symbols.get(&qualified);
}
None
}
pub fn lookup_value(&self, name: &str) -> Option<i64> {
self.lookup(name).and_then(|s| s.numeric_value())
}
pub fn is_defined(&self, name: &str) -> bool {
self.lookup(name)
.map(|s| s.is_defined())
.unwrap_or(false)
}
}
Local Label Handling
Local labels are scoped to their parent global label:
#![allow(unused)]
fn main() {
pub fn qualify_local_label(&self, name: &str) -> String {
if let Some(parent) = &self.current_parent {
if name.starts_with('.') {
// .loop -> parent.loop
format!("{}{}", parent, name)
} else if name.starts_with('@') {
// @loop -> parent.loop
format!("{}.{}", parent, &name[1..])
} else {
name.to_string()
}
} else {
name.to_string()
}
}
}
Example
main: ; current_parent = "main"
.loop: ; stored as "main.loop"
bne .loop ; resolved to "main.loop"
other: ; current_parent = "other"
.loop: ; stored as "other.loop"
bne .loop ; resolved to "other.loop"
When we encounter main:, we call set_parent(Some("main")).
When we encounter .loop:, we store it as main.loop.
When we reference .loop, we look up main.loop.
The Forward Reference Problem
Consider this code:
jmp end ; 'end' not defined yet!
nop
end:
rts
When we encounter jmp end, the label end hasn’t been defined yet. This is called a forward reference.
The Two-Pass Solution
We solve this with two passes:
- Pass 1: Scan through the code, recording where each label is defined
- Pass 2: Generate code, now that all labels are known
In Pass 1, when we see jmp end:
- We don’t know end’s address
- We just calculate that this instruction takes 3 bytes
- We move on
In Pass 2, when we generate code for jmp end:
- We look up
endin the symbol table - We now have its address
- We emit the correct bytes
Tracking References
We track whether symbols are referenced:
#![allow(unused)]
fn main() {
pub fn mark_referenced(&mut self, name: &str) {
if let Some(sym) = self.symbols.get_mut(name) {
sym.referenced = true;
} else if (name.starts_with('.') || name.starts_with('@'))
&& self.current_parent.is_some()
{
let qualified = self.qualify_local_label(name);
if let Some(sym) = self.symbols.get_mut(&qualified) {
sym.referenced = true;
}
}
}
}
This allows us to warn about unused labels:
#![allow(unused)]
fn main() {
pub fn unreferenced_symbols(&self) -> Vec<&Symbol> {
self.symbols
.values()
.filter(|s| !s.referenced && s.is_defined())
.collect()
}
}
Finding Undefined Symbols
After Pass 1, we can check for undefined symbols:
#![allow(unused)]
fn main() {
pub fn undefined_symbols(&self) -> Vec<&Symbol> {
self.symbols
.values()
.filter(|s| !s.is_defined())
.collect()
}
}
Complete Example
Let’s trace symbol table operations for:
.equ SCREEN 0x1000
.org 0x8000
start:
lda #>SCREEN
.loop:
jmp .loop
.dw start
Pass 1
| Operation | Symbol Table |
|---|---|
.equ SCREEN 0x1000 | { SCREEN: Constant(4096) } |
.org 0x8000 | (no change, just sets address) |
start: | { SCREEN, start: Address(0x8000) } |
lda #>SCREEN | (no change, just advances address) |
.loop: | { SCREEN, start, start.loop: Address(0x8002) } |
jmp .loop | (no change) |
.dw start | (no change) |
Pass 2
When generating code:
lda #>SCREEN→ look up SCREEN → 0x1000 → high byte is 0x10 → emitA9 10jmp .loop→ look up start.loop → 0x8002 → emit4C 02 80.dw start→ look up start → 0x8000 → emit00 80
Summary
In this chapter, we built a symbol table that:
- Stores labels (addresses) and constants (values)
- Handles local labels scoped to parent global labels
- Detects duplicate symbol definitions
- Supports forward references via undefined symbols
- Tracks which symbols are referenced
In the next chapter, we’ll implement the two-pass assembly process that uses this symbol table.
Previous: Chapter 5 - Parser Implementation | Next: Chapter 7 - Two-Pass Assembly