Chapter 5: Building the Parser - Implementation

In this chapter, we’ll implement the complete parser for ByteASM, building on the infrastructure from Chapter 4.

Parsing Labels

Labels are identifiers followed by a colon:

fn parse_label(&mut self) -> ParseResult<Statement> {
    let token = self.advance();
    let name = token.text(self.source).to_string();
    let is_local = token.kind == TokenKind::LocalLabel;
    let location = token.location;

    // Expect colon after label name
    if !self.check(TokenKind::Colon) {
        return Err(ParseError::InvalidLabel {
            message: "expected ':' after label name".to_string(),
            location,
        });
    }
    self.advance(); // consume colon

    Ok(Statement::Label(LabelDef {
        name,
        is_local,
        location,
    }))
}

Examples

Input     → Result
──────────────────────────────────────────────────────
main:     → LabelDef { name: "main", is_local: false }
.loop:    → LabelDef { name: ".loop", is_local: true }
@temp:    → LabelDef { name: "@temp", is_local: true }
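
The examples above can be reproduced with a small string-based sketch. It is not ByteASM's implementation: the real parser reads `is_local` from the token kind, whereas this version assumes (from the examples) that a leading `.` or `@` marks a local label. The function name `label_from_line` and the trimmed-down `LabelDef` without a location are hypothetical.

```rust
/// Simplified stand-in for the chapter's LabelDef (no source location).
#[derive(Debug, PartialEq)]
struct LabelDef {
    name: String,
    is_local: bool,
}

/// Hypothetical helper: turns a line such as "main:" into a LabelDef.
/// Returns None when the trailing colon or the name is missing.
fn label_from_line(line: &str) -> Option<LabelDef> {
    // A label is a name followed by a colon
    let name = line.trim().strip_suffix(':')?;
    if name.is_empty() {
        return None;
    }
    // Assumption: '.' and '@' are the local-label prefixes
    let is_local = name.starts_with('.') || name.starts_with('@');
    Some(LabelDef {
        name: name.to_string(),
        is_local,
    })
}
```

A line without the trailing colon, such as `main`, yields `None`, which corresponds to the `InvalidLabel` error in `parse_label`.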

Parsing Instructions

Instructions have an optional operand:

fn parse_instruction(&mut self) -> ParseResult<Statement> {
    let token = self.advance();
    let mnemonic = token.mnemonic().unwrap();
    let location = token.location;

    // Check if there's an operand
    let operand = if self.has_operand() {
        Some(self.parse_operand()?)
    } else {
        None
    };

    Ok(Statement::Instruction(InstructionStmt {
        mnemonic,
        operand,
        location,
    }))
}

fn has_operand(&self) -> bool {
    matches!(
        self.current.kind,
        TokenKind::Hash
            | TokenKind::OpenParen
            | TokenKind::Identifier
            | TokenKind::LocalLabel
            | TokenKind::Number
            | TokenKind::Dollar
            | TokenKind::LessThan
            | TokenKind::GreaterThan
            | TokenKind::Register
    )
}

Parsing Operands

The operand determines the addressing mode. Here’s the decision tree:

Token           → Operand Type
──────────────────────────────────
#               → Immediate
a (register)    → Accumulator
(               → Indirect, IndirectX, or IndirectY
other           → Address, IndexedX, or IndexedY

fn parse_operand(&mut self) -> ParseResult<Operand> {
    // Immediate: #expr
    if self.check(TokenKind::Hash) {
        self.advance();
        let expr = self.parse_expression()?;
        return Ok(Operand::Immediate(expr));
    }

    // Accumulator: a
    if self.check(TokenKind::Register) {
        let text = self.current.text(self.source).to_lowercase();
        if text == "a" {
            self.advance();
            return Ok(Operand::Accumulator);
        }
    }

    // Indirect modes: (...)
    if self.check(TokenKind::OpenParen) {
        return self.parse_indirect_operand();
    }

    // Address or indexed: expr or expr,x or expr,y
    self.parse_address_operand()
}

Parsing Address Operands

fn parse_address_operand(&mut self) -> ParseResult<Operand> {
    let expr = self.parse_expression()?;

    // Check for indexing
    if self.check(TokenKind::Comma) {
        self.advance(); // consume ','

        // A comma must be followed by an index register; without this check
        // the comma would be consumed silently and the operand accepted
        if !self.check(TokenKind::Register) {
            return Err(ParseError::InvalidOperand {
                message: "expected 'x' or 'y' after ','".to_string(),
                location: self.current.location,
            });
        }

        let reg = self.current.text(self.source).to_lowercase();
        self.advance();

        return match reg.as_str() {
            "x" => Ok(Operand::IndexedX(expr)),
            "y" => Ok(Operand::IndexedY(expr)),
            _ => Err(ParseError::InvalidOperand {
                message: format!("expected 'x' or 'y', found '{}'", reg),
                location: self.previous.location,
            }),
        };
    }

    Ok(Operand::Address(expr))
}

Parsing Indirect Operands

Indirect operands have three forms:

  • (addr) - Indirect
  • (zp,x) - Indexed Indirect
  • (zp),y - Indirect Indexed

fn parse_indirect_operand(&mut self) -> ParseResult<Operand> {
    self.advance(); // consume '('

    let expr = self.parse_expression()?;

    // (zp,x) - Indexed Indirect
    if self.check(TokenKind::Comma) {
        self.advance(); // consume ','

        // A comma inside the parentheses must be followed by a register
        if !self.check(TokenKind::Register) {
            return Err(ParseError::InvalidOperand {
                message: "expected X register after ','".to_string(),
                location: self.current.location,
            });
        }

        let reg = self.current.text(self.source).to_lowercase();
        self.advance();

        if reg != "x" {
            return Err(ParseError::InvalidOperand {
                message: "indexed indirect only supports X register".to_string(),
                location: self.previous.location,
            });
        }

        self.expect(TokenKind::CloseParen, "')'")?;
        return Ok(Operand::IndirectX(expr));
    }

    self.expect(TokenKind::CloseParen, "')'")?;

    // (addr),y - Indirect Indexed
    if self.check(TokenKind::Comma) {
        self.advance(); // consume ','

        // A comma after ')' must be followed by a register; without this
        // check the comma would be consumed silently
        if !self.check(TokenKind::Register) {
            return Err(ParseError::InvalidOperand {
                message: "expected Y register after ','".to_string(),
                location: self.current.location,
            });
        }

        let reg = self.current.text(self.source).to_lowercase();
        self.advance();

        if reg != "y" {
            return Err(ParseError::InvalidOperand {
                message: "indirect indexed only supports Y register".to_string(),
                location: self.previous.location,
            });
        }

        return Ok(Operand::IndirectY(expr));
    }

    // (addr) - Plain Indirect
    Ok(Operand::Indirect(expr))
}
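
To see all eight operand shapes side by side, here is a rough sketch that classifies an operand by its outer shape alone. It is a string-based simplification, not ByteASM's token-based parser: the `Mode` enum and the `classify` function are hypothetical names, and the expression inside the operand is never validated.

```rust
/// Hypothetical mirror of the chapter's Operand variants, without payloads.
#[derive(Debug, PartialEq)]
enum Mode {
    Immediate,
    Accumulator,
    Indirect,
    IndirectX,
    IndirectY,
    Address,
    IndexedX,
    IndexedY,
}

/// Classifies an operand by its outer shape only (sketch, not the real parser).
fn classify(operand: &str) -> Option<Mode> {
    // Drop whitespace and case so "(0x80), Y" and "(0x80),y" look alike
    let text: String = operand
        .chars()
        .filter(|c| !c.is_whitespace())
        .collect::<String>()
        .to_lowercase();

    if text.is_empty() {
        return None;
    }
    if text.starts_with('#') {
        return Some(Mode::Immediate);
    }
    if text == "a" {
        return Some(Mode::Accumulator);
    }
    if text.starts_with('(') {
        // Order matters: "(zp,x)" also ends with ')'
        return if text.ends_with(",x)") {
            Some(Mode::IndirectX)
        } else if text.ends_with("),y") {
            Some(Mode::IndirectY)
        } else if text.ends_with(",y)") {
            None // indexed indirect only supports X
        } else if text.ends_with(')') {
            Some(Mode::Indirect)
        } else {
            None // missing ')' or an unsupported register after it
        };
    }
    if text.ends_with(",x") {
        Some(Mode::IndexedX)
    } else if text.ends_with(",y") {
        Some(Mode::IndexedY)
    } else {
        Some(Mode::Address)
    }
}
```

For instance, `classify("(0x80),y")` returns `Some(Mode::IndirectY)` and `classify("(0x80),x")` returns `None`, matching the cases that `parse_indirect_operand` accepts and rejects.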

Parsing Expressions

Expressions follow standard precedence rules:

  • *, / bind tighter than +, -
  • Unary operators (-, <, >) bind tightest

Expression Grammar

expression   → additive
additive     → multiplicative ( ('+' | '-') multiplicative )*
multiplicative → unary ( ('*' | '/') unary )*
unary        → ('-' | '<' | '>') unary | primary
primary      → NUMBER | IDENTIFIER | LOCAL_LABEL | '$' | '(' expression ')'

Implementation

pub fn parse_expression(&mut self) -> ParseResult<Expression> {
    self.parse_additive()
}

fn parse_additive(&mut self) -> ParseResult<Expression> {
    let mut left = self.parse_multiplicative()?;

    while self.check(TokenKind::Plus) || self.check(TokenKind::Minus) {
        let op = if self.check(TokenKind::Plus) {
            self.advance();
            BinaryOp::Add
        } else {
            self.advance();
            BinaryOp::Sub
        };

        let right = self.parse_multiplicative()?;
        left = Expression::binary(left, op, right);
    }

    Ok(left)
}

fn parse_multiplicative(&mut self) -> ParseResult<Expression> {
    let mut left = self.parse_unary()?;

    while self.check(TokenKind::Star) || self.check(TokenKind::Slash) {
        let op = if self.check(TokenKind::Star) {
            self.advance();
            BinaryOp::Mul
        } else {
            self.advance();
            BinaryOp::Div
        };

        let right = self.parse_unary()?;
        left = Expression::binary(left, op, right);
    }

    Ok(left)
}

fn parse_unary(&mut self) -> ParseResult<Expression> {
    if self.check(TokenKind::Minus) {
        self.advance();
        let operand = self.parse_unary()?;
        return Ok(Expression::unary(UnaryOp::Neg, operand));
    }

    if self.check(TokenKind::LessThan) {
        self.advance();
        let operand = self.parse_unary()?;
        return Ok(Expression::unary(UnaryOp::LoByte, operand));
    }

    if self.check(TokenKind::GreaterThan) {
        self.advance();
        let operand = self.parse_unary()?;
        return Ok(Expression::unary(UnaryOp::HiByte, operand));
    }

    self.parse_primary()
}

fn parse_primary(&mut self) -> ParseResult<Expression> {
    // Number literal
    if self.check(TokenKind::Number) {
        let token = self.advance();
        let value = token.number().unwrap_or(0);
        return Ok(Expression::Number(value as i64));
    }

    // Identifier
    if self.check(TokenKind::Identifier) {
        let token = self.advance();
        let name = token.text(self.source).to_string();
        return Ok(Expression::Identifier(name));
    }

    // Local label
    if self.check(TokenKind::LocalLabel) {
        let token = self.advance();
        let name = token.text(self.source).to_string();
        return Ok(Expression::LocalIdentifier(name));
    }

    // Current address ($)
    if self.check(TokenKind::Dollar) {
        self.advance();
        return Ok(Expression::CurrentAddress);
    }

    // Parenthesized expression
    if self.check(TokenKind::OpenParen) {
        self.advance();
        let expr = self.parse_expression()?;
        self.expect(TokenKind::CloseParen, "')'")?;
        return Ok(expr);
    }

    Err(ParseError::InvalidExpression {
        message: format!("expected expression, found {:?}", self.current.kind),
        location: self.current.location,
    })
}
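
To watch the precedence rules at work without the rest of the parser, the grammar can be mirrored in a toy evaluator. This is a sketch under several assumptions: `Calc` and `eval` are hypothetical names, the input is raw text with decimal numbers only (no labels or `$`), values are computed directly instead of building an AST, and `<` and `>` are taken to mean the low and high byte of a 16-bit value, as the names `LoByte` and `HiByte` suggest.

```rust
/// Toy evaluator with one method per grammar rule (hypothetical, not ByteASM).
struct Calc<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Calc<'a> {
    /// Skip spaces and return the next byte without consuming it.
    fn peek(&mut self) -> Option<u8> {
        while self.src.get(self.pos) == Some(&b' ') {
            self.pos += 1;
        }
        self.src.get(self.pos).copied()
    }

    // additive → multiplicative ( ('+' | '-') multiplicative )*
    fn additive(&mut self) -> Option<i64> {
        let mut left = self.multiplicative()?;
        while let Some(op @ (b'+' | b'-')) = self.peek() {
            self.pos += 1;
            let right = self.multiplicative()?;
            left = if op == b'+' { left + right } else { left - right };
        }
        Some(left)
    }

    // multiplicative → unary ( ('*' | '/') unary )*
    fn multiplicative(&mut self) -> Option<i64> {
        let mut left = self.unary()?;
        while let Some(op @ (b'*' | b'/')) = self.peek() {
            self.pos += 1;
            let right = self.unary()?;
            // checked_div turns division by zero into None
            left = if op == b'*' { left * right } else { left.checked_div(right)? };
        }
        Some(left)
    }

    // unary → ('-' | '<' | '>') unary | primary
    fn unary(&mut self) -> Option<i64> {
        match self.peek()? {
            b'-' => { self.pos += 1; Some(-self.unary()?) }
            b'<' => { self.pos += 1; Some(self.unary()? & 0xFF) } // low byte
            b'>' => { self.pos += 1; Some((self.unary()? >> 8) & 0xFF) } // high byte
            _ => self.primary(),
        }
    }

    // primary → NUMBER | '(' expression ')'
    fn primary(&mut self) -> Option<i64> {
        match self.peek()? {
            b'(' => {
                self.pos += 1;
                let value = self.additive()?;
                if self.peek()? != b')' {
                    return None;
                }
                self.pos += 1;
                Some(value)
            }
            b'0'..=b'9' => {
                let start = self.pos;
                while self.src.get(self.pos).map_or(false, |b| b.is_ascii_digit()) {
                    self.pos += 1;
                }
                std::str::from_utf8(&self.src[start..self.pos]).ok()?.parse::<i64>().ok()
            }
            _ => None,
        }
    }
}

/// Evaluate a whole expression; None on a syntax error or trailing input.
fn eval(text: &str) -> Option<i64> {
    let mut calc = Calc { src: text.as_bytes(), pos: 0 };
    let value = calc.additive()?;
    if calc.peek().is_some() {
        return None;
    }
    Some(value)
}
```

Because unary operators bind tightest, `eval("<4863 + 1")` takes the low byte of 4863 (0x12FF) first and yields 256, while `eval("<(4863 + 1)")` yields 0. Likewise `eval("2 + 3 * 4")` is 14 and `eval("10 - 4 - 3")` is 3, since the loops in `additive` and `multiplicative` associate to the left.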

Parsing Directives

Each directive has its own syntax:

fn parse_directive(&mut self) -> ParseResult<Statement> {
    let token = self.advance();
    let directive = token.directive().unwrap();
    let location = token.location;

    match directive {
        Directive::ORG => {
            let address = self.parse_expression()?;
            Ok(Statement::Directive(DirectiveStmt::Org { address, location }))
        }

        Directive::DB => {
            let values = self.parse_db_values()?;
            Ok(Statement::Directive(DirectiveStmt::Db { values, location }))
        }

        Directive::DW => {
            let values = self.parse_expression_list()?;
            Ok(Statement::Directive(DirectiveStmt::Dw { values, location }))
        }

        Directive::EQU => {
            // .equ NAME value
            if !self.check(TokenKind::Identifier) {
                return Err(ParseError::InvalidDirective {
                    message: "expected identifier after .equ".to_string(),
                    location: self.current.location,
                });
            }
            let name = self.advance().text(self.source).to_string();
            let value = self.parse_expression()?;
            Ok(Statement::Directive(DirectiveStmt::Equ { name, value, location }))
        }

        Directive::INCLUDE => {
            // .include "filename"
            if !self.check(TokenKind::String) {
                return Err(ParseError::InvalidDirective {
                    message: "expected string after .include".to_string(),
                    location: self.current.location,
                });
            }
            let path = self.advance().string().unwrap_or("").to_string();
            Ok(Statement::Directive(DirectiveStmt::Include { path, location }))
        }
    }
}

Parsing .db Values

The .db directive accepts bytes and strings:

fn parse_db_values(&mut self) -> ParseResult<Vec<DataValue>> {
    let mut values = Vec::new();

    loop {
        if self.check(TokenKind::String) {
            let token = self.advance();
            let s = token.string().unwrap_or("").to_string();
            values.push(DataValue::String(s));
        } else if can_start_expression(self.current.kind) {
            let expr = self.parse_expression()?;
            values.push(DataValue::Byte(expr));
        } else {
            break;
        }

        if !self.check(TokenKind::Comma) {
            break;
        }
        self.advance(); // consume comma
    }

    if values.is_empty() {
        return Err(ParseError::InvalidDirective {
            message: "expected at least one value for .db".to_string(),
            location: self.current.location,
        });
    }

    Ok(values)
}

Parsing Expression Lists

Used by .dw:

pub fn parse_expression_list(&mut self) -> ParseResult<Vec<Expression>> {
    let mut exprs = vec![self.parse_expression()?];

    while self.check(TokenKind::Comma) {
        self.advance();
        exprs.push(self.parse_expression()?);
    }

    Ok(exprs)
}

Local Label Resolution

Local labels are scoped to their parent global label, so the same local name can be reused under different global labels:

main:
.loop:
    bne .loop

other:
.loop:          ; different from main.loop
    bne .loop

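Each local label belongs to the most recent global label, so the two .loop labels above are distinct. Qualifying a local label with its parent's name makes that explicit:

  • First .loop → main.loop
  • Second .loop → other.loop

The parser does not perform this qualification. It records only the label's name and its is_local flag; tracking the current global label and building the qualified names is handled in the symbol table (next chapter).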
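
The qualification step can be sketched in a few lines. This is only an illustration of the idea, not the symbol table from the next chapter: the function name `qualify_labels` is hypothetical, labels are passed as plain strings in source order, and a leading `.` or `@` is assumed to mark a local label.

```rust
/// Hypothetical sketch: prefix each local label with the most recent global label.
/// Returns an error message if a local label appears before any global label.
fn qualify_labels(labels: &[&str]) -> Result<Vec<String>, String> {
    let mut current_global: Option<&str> = None;
    let mut qualified = Vec::with_capacity(labels.len());

    for &label in labels {
        if label.starts_with('.') || label.starts_with('@') {
            // A local label needs an enclosing global label
            let parent = current_global
                .ok_or_else(|| format!("local label '{}' has no parent", label))?;
            qualified.push(format!("{}{}", parent, label));
        } else {
            // A global label opens a new scope for the locals that follow
            current_global = Some(label);
            qualified.push(label.to_string());
        }
    }

    Ok(qualified)
}
```

Given the labels `main`, `.loop`, `other`, `.loop` in that order, the sketch produces `main`, `main.loop`, `other`, `other.loop`.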

Complete Parsing Example

Let’s trace the parser through a short program:

.org 0x8000
start:
    lda (0x80),y

  1. Parse .org 0x8000

    • Token: Directive(ORG)
    • Parse expression: Number(0x8000)
    • Result: Directive(Org { address: Number(32768) })
  2. Parse start:

    • Token: Identifier
    • Text: “start”
    • Token: Colon
    • Result: Label(LabelDef { name: "start", is_local: false })
  3. Parse lda (0x80),y

    • Token: Instruction(LDA)
    • Has operand: yes (the next token is OpenParen)
    • Parse indirect operand:
      • Token: OpenParen
      • Parse expression: Number(0x80)
      • Token: CloseParen
      • Token: Comma
      • Token: Register (y)
    • Result: Instruction(InstructionStmt { mnemonic: LDA, operand: IndirectY(Number(128)) })

Summary

In this chapter, we implemented:

  • Label parsing: identifier + colon, detecting local labels
  • Instruction parsing: mnemonic + optional operand
  • Operand parsing: immediate, accumulator, address, indexed, indirect modes
  • Expression parsing: with operator precedence
  • Directive parsing: .org, .db, .dw, .equ, .include

The parser now produces a complete AST from source code. In the next chapter, we’ll build the symbol table to track labels and constants.


Previous: Chapter 4 - Parser Structure | Next: Chapter 6 - The Symbol Table