Chapter 5: Building the Parser - Implementation
In this chapter, we’ll implement the complete parser for ByteASM, building on the infrastructure from Chapter 4.
Parsing Labels
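A label is an identifier followed by a colon. Local labels (names starting with `.` or `@`) arrive from the lexer as `LocalLabel` tokens, so `parse_label` only needs to check the token kind: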
```rust
fn parse_label(&mut self) -> ParseResult<Statement> {
    let token = self.advance();
    let name = token.text(self.source).to_string();
    let is_local = token.kind == TokenKind::LocalLabel;
    let location = token.location;

    // Expect a colon after the label name
    if !self.check(TokenKind::Colon) {
        return Err(ParseError::InvalidLabel {
            message: "expected ':' after label name".to_string(),
            location,
        });
    }
    self.advance(); // consume ':'

    Ok(Statement::Label(LabelDef {
        name,
        is_local,
        location,
    }))
}
```
Examples
| Input | Result |
|---|---|
| `main:` | `LabelDef { name: "main", is_local: false }` |
| `.loop:` | `LabelDef { name: ".loop", is_local: true }` |
| `@temp:` | `LabelDef { name: "@temp", is_local: true }` |
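To make the rule in the table concrete without the lexer and token types from Chapter 4, here is a minimal string-level sketch. `parse_label_text` and `LabelInfo` are hypothetical names used only for illustration; the real parser works on tokens, and treating a leading `.` or `@` as the local marker is inferred from the examples above.

```rust
/// Hypothetical result type for this sketch (not the book's `LabelDef`).
#[derive(Debug, PartialEq)]
pub struct LabelInfo {
    pub name: String,
    pub is_local: bool,
}

/// Sketch: recognize `name:` on a single line of text.
/// Returns `None` if the text is not a label definition.
pub fn parse_label_text(line: &str) -> Option<LabelInfo> {
    // A label definition is a name followed by ':'
    let name = line.trim().strip_suffix(':')?.trim_end();
    // Reject empty names and anything with spaces (e.g. "lda foo:")
    if name.is_empty() || name.contains(char::is_whitespace) {
        return None;
    }
    // Assumption: names starting with '.' or '@' are local
    let is_local = name.starts_with('.') || name.starts_with('@');
    Some(LabelInfo {
        name: name.to_string(),
        is_local,
    })
}
```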
Parsing Instructions
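An instruction is a mnemonic followed by an optional operand. `has_operand` looks at the token after the mnemonic to decide whether an operand follows: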
```rust
fn parse_instruction(&mut self) -> ParseResult<Statement> {
    let token = self.advance();
    // Only called on Instruction tokens, so a mnemonic is present
    let mnemonic = token.mnemonic().unwrap();
    let location = token.location;

    // Check if there's an operand
    let operand = if self.has_operand() {
        Some(self.parse_operand()?)
    } else {
        None
    };

    Ok(Statement::Instruction(InstructionStmt {
        mnemonic,
        operand,
        location,
    }))
}

fn has_operand(&self) -> bool {
    matches!(
        self.current.kind,
        TokenKind::Hash
            | TokenKind::OpenParen
            | TokenKind::Identifier
            | TokenKind::LocalLabel
            | TokenKind::Number
            | TokenKind::Dollar
            | TokenKind::Minus // unary minus can start an expression
            | TokenKind::LessThan
            | TokenKind::GreaterThan
            | TokenKind::Register
    )
}
```
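One way to keep `has_operand` in sync with the expression grammar later in this chapter is to derive it from `can_start_expression`, the helper `parse_db_values` calls. The sketch below assumes a trimmed-down `TokenKind` and guesses a definition of `can_start_expression` from the grammar's `unary` and `primary` rules; neither is ByteASM's actual code.

```rust
/// Hypothetical, trimmed-down token kinds for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum TokenKind {
    Hash, OpenParen, CloseParen, Comma, Identifier, LocalLabel, Number,
    Dollar, Minus, LessThan, GreaterThan, Register, Newline, Eof,
}

/// Tokens that can begin an expression: the first tokens of the
/// `unary` rule ('-', '<', '>') and the `primary` rule.
pub fn can_start_expression(kind: TokenKind) -> bool {
    matches!(
        kind,
        TokenKind::Minus
            | TokenKind::LessThan
            | TokenKind::GreaterThan
            | TokenKind::Number
            | TokenKind::Identifier
            | TokenKind::LocalLabel
            | TokenKind::Dollar
            | TokenKind::OpenParen
    )
}

/// An operand starts with '#', a register, or anything that starts an expression.
pub fn has_operand(kind: TokenKind) -> bool {
    matches!(kind, TokenKind::Hash | TokenKind::Register) || can_start_expression(kind)
}
```

With this arrangement, adding a new unary operator to the grammar only requires updating `can_start_expression`.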
Parsing Operands
The operand's first token determines the addressing mode. Here’s the decision tree:

```
Token          → Operand Type
──────────────────────────────────
#              → Immediate
a (register)   → Accumulator
(              → Indirect, IndirectX, or IndirectY
anything else  → Address, IndexedX, or IndexedY
```
```rust
fn parse_operand(&mut self) -> ParseResult<Operand> {
    // Immediate: #expr
    if self.check(TokenKind::Hash) {
        self.advance();
        let expr = self.parse_expression()?;
        return Ok(Operand::Immediate(expr));
    }

    // Accumulator: a
    if self.check(TokenKind::Register) {
        let text = self.current.text(self.source).to_lowercase();
        if text == "a" {
            self.advance();
            return Ok(Operand::Accumulator);
        }
    }

    // Indirect modes: (...)
    if self.check(TokenKind::OpenParen) {
        return self.parse_indirect_operand();
    }

    // Address or indexed: expr or expr,x or expr,y
    self.parse_address_operand()
}
```
Parsing Address Operands
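A comma after the address must be followed by an index register; otherwise the operand is reported as invalid rather than silently dropping the comma:

```rust
fn parse_address_operand(&mut self) -> ParseResult<Operand> {
    let expr = self.parse_expression()?;

    // No comma: plain address
    if !self.check(TokenKind::Comma) {
        return Ok(Operand::Address(expr));
    }
    self.advance(); // consume ','

    // A comma must be followed by an index register
    if !self.check(TokenKind::Register) {
        return Err(ParseError::InvalidOperand {
            message: "expected 'x' or 'y' after ','".to_string(),
            location: self.current.location,
        });
    }
    let reg = self.current.text(self.source).to_lowercase();
    self.advance();

    match reg.as_str() {
        "x" => Ok(Operand::IndexedX(expr)),
        "y" => Ok(Operand::IndexedY(expr)),
        _ => Err(ParseError::InvalidOperand {
            message: format!("expected 'x' or 'y', found '{}'", reg),
            location: self.previous.location,
        }),
    }
}
```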
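The decision tree and indexing rules above can be tried out on raw operand text. This sketch covers only the non-parenthesized forms; `Mode` and `classify_operand` are hypothetical, and because it inspects strings rather than tokens it illustrates the decisions without replacing `parse_operand`.

```rust
/// Hypothetical addressing modes for non-parenthesized operands.
#[derive(Debug, PartialEq)]
pub enum Mode {
    Immediate,
    Accumulator,
    Address,
    IndexedX,
    IndexedY,
}

/// Sketch: classify operand text such as "#10", "a", "table,x".
pub fn classify_operand(operand: &str) -> Result<Mode, String> {
    let op = operand.trim();
    // '#' prefix: immediate value
    if op.starts_with('#') {
        return Ok(Mode::Immediate);
    }
    // A lone 'a' names the accumulator
    if op.eq_ignore_ascii_case("a") {
        return Ok(Mode::Accumulator);
    }
    // Otherwise an expression, optionally followed by ",x" or ",y"
    match op.rsplit_once(',') {
        None => Ok(Mode::Address),
        Some((_, reg)) => match reg.trim().to_ascii_lowercase().as_str() {
            "x" => Ok(Mode::IndexedX),
            "y" => Ok(Mode::IndexedY),
            other => Err(format!("expected 'x' or 'y', found '{}'", other)),
        },
    }
}
```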
Parsing Indirect Operands
Indirect operands have three forms:
- `(addr)`: Indirect
- `(zp,x)`: Indexed Indirect
- `(zp),y`: Indirect Indexed
```rust
fn parse_indirect_operand(&mut self) -> ParseResult<Operand> {
    self.advance(); // consume '('
    let expr = self.parse_expression()?;

    // (zp,x) - Indexed Indirect
    if self.check(TokenKind::Comma) {
        self.advance(); // consume ','
        if !self.check(TokenKind::Register) {
            return Err(ParseError::InvalidOperand {
                message: "expected 'x' after ','".to_string(),
                location: self.current.location,
            });
        }
        let reg = self.current.text(self.source).to_lowercase();
        self.advance();
        if reg != "x" {
            return Err(ParseError::InvalidOperand {
                message: "indexed indirect only supports X register".to_string(),
                location: self.previous.location,
            });
        }
        self.expect(TokenKind::CloseParen, "')'")?;
        return Ok(Operand::IndirectX(expr));
    }

    self.expect(TokenKind::CloseParen, "')'")?;

    // (addr),y - Indirect Indexed
    if self.check(TokenKind::Comma) {
        self.advance(); // consume ','
        if !self.check(TokenKind::Register) {
            return Err(ParseError::InvalidOperand {
                message: "expected 'y' after ','".to_string(),
                location: self.current.location,
            });
        }
        let reg = self.current.text(self.source).to_lowercase();
        self.advance();
        if reg != "y" {
            return Err(ParseError::InvalidOperand {
                message: "indirect indexed only supports Y register".to_string(),
                location: self.previous.location,
            });
        }
        return Ok(Operand::IndirectY(expr));
    }

    // (addr) - Plain Indirect
    Ok(Operand::Indirect(expr))
}
```
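The three indirect forms can also be told apart from the operand text alone, which is a handy way to check the rules above. This is a hypothetical string-level sketch (`IndirectMode`, `classify_indirect`); it returns the inner expression text unparsed and, like `parse_indirect_operand`, rejects `(zp,y)`.

```rust
/// Hypothetical indirect addressing modes for this sketch.
#[derive(Debug, PartialEq)]
pub enum IndirectMode {
    Indirect,
    IndirectX,
    IndirectY,
}

/// Sketch: classify "(addr)", "(zp,x)" or "(zp),y" and return the inner expression text.
pub fn classify_indirect(operand: &str) -> Result<(IndirectMode, String), String> {
    // Drop whitespace so "( ptr , x )" and "(ptr,x)" are treated alike
    let compact: String = operand.chars().filter(|c| !c.is_whitespace()).collect();
    // ASCII lowercasing keeps byte offsets, so indices into `lower` are valid for `compact`
    let lower = compact.to_ascii_lowercase();
    let n = compact.len();
    if !lower.starts_with('(') {
        return Err("indirect operand must start with '('".to_string());
    }
    // (zp,x): the register sits inside the parentheses
    if lower.ends_with(",x)") {
        return Ok((IndirectMode::IndirectX, compact[1..n - 3].to_string()));
    }
    // (zp),y: the register follows the closing parenthesis
    if lower.ends_with("),y") {
        return Ok((IndirectMode::IndirectY, compact[1..n - 3].to_string()));
    }
    // (addr): plain indirect, no comma allowed inside
    if lower.ends_with(')') {
        let inner = &compact[1..n - 1];
        if inner.contains(',') {
            return Err("indexed indirect only supports X register".to_string());
        }
        return Ok((IndirectMode::Indirect, inner.to_string()));
    }
    Err(format!("unrecognized indirect operand '{}'", operand))
}
```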
Parsing Expressions
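Expressions follow standard precedence rules:
- `*` and `/` bind tighter than `+` and `-`
- Unary operators (`-` negation, `<` low byte, `>` high byte) bind tightest
- Binary operators are left-associative, so `10 - 4 - 3` means `(10 - 4) - 3`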
Expression Grammar
```
expression     → additive
additive       → multiplicative ( ('+' | '-') multiplicative )*
multiplicative → unary ( ('*' | '/') unary )*
unary          → ('-' | '<' | '>') unary | primary
primary        → NUMBER | IDENTIFIER | LOCAL_LABEL | '$' | '(' expression ')'
```
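To see the grammar's precedence and associativity in action, here is a self-contained recursive-descent evaluator with one function per rule. It is a sketch, not ByteASM's code: it evaluates directly to `i64` instead of building `Expression` nodes, reads characters instead of tokens, accepts only decimal and `0x` hex numbers, looks identifiers up in a caller-supplied map, omits local labels, and assumes `<`/`>` take the low/high byte of a 16-bit value.

```rust
use std::collections::HashMap;

/// Hypothetical evaluator state: input characters, position, current address, symbols.
struct Eval<'a> {
    chars: Vec<char>,
    pos: usize,
    pc: i64,
    symbols: &'a HashMap<String, i64>,
}

impl<'a> Eval<'a> {
    /// Skip whitespace, then look at the next character without consuming it.
    fn peek(&mut self) -> Option<char> {
        while self.pos < self.chars.len() && self.chars[self.pos].is_whitespace() {
            self.pos += 1;
        }
        self.chars.get(self.pos).copied()
    }

    /// Consume `c` if it is the next character.
    fn eat(&mut self, c: char) -> bool {
        if self.peek() == Some(c) {
            self.pos += 1;
            true
        } else {
            false
        }
    }

    // additive → multiplicative ( ('+' | '-') multiplicative )*
    fn additive(&mut self) -> Result<i64, String> {
        let mut left = self.multiplicative()?;
        loop {
            if self.eat('+') {
                left += self.multiplicative()?;
            } else if self.eat('-') {
                left -= self.multiplicative()?;
            } else {
                return Ok(left); // folding in the loop makes operators left-associative
            }
        }
    }

    // multiplicative → unary ( ('*' | '/') unary )*
    fn multiplicative(&mut self) -> Result<i64, String> {
        let mut left = self.unary()?;
        loop {
            if self.eat('*') {
                left *= self.unary()?;
            } else if self.eat('/') {
                let right = self.unary()?;
                if right == 0 {
                    return Err("division by zero".to_string());
                }
                left /= right;
            } else {
                return Ok(left);
            }
        }
    }

    // unary → ('-' | '<' | '>') unary | primary
    fn unary(&mut self) -> Result<i64, String> {
        if self.eat('-') {
            return Ok(-self.unary()?);
        }
        if self.eat('<') {
            return Ok(self.unary()? & 0xFF); // low byte
        }
        if self.eat('>') {
            return Ok((self.unary()? >> 8) & 0xFF); // high byte
        }
        self.primary()
    }

    // primary → NUMBER | IDENTIFIER | '$' | '(' expression ')'
    fn primary(&mut self) -> Result<i64, String> {
        match self.peek() {
            Some('$') => {
                self.pos += 1;
                Ok(self.pc)
            }
            Some('(') => {
                self.pos += 1;
                let value = self.additive()?;
                if self.eat(')') { Ok(value) } else { Err("expected ')'".to_string()) }
            }
            Some(c) if c.is_ascii_alphanumeric() || c == '_' => {
                let start = self.pos;
                while self.pos < self.chars.len()
                    && (self.chars[self.pos].is_ascii_alphanumeric() || self.chars[self.pos] == '_')
                {
                    self.pos += 1;
                }
                let word: String = self.chars[start..self.pos].iter().collect();
                if c.is_ascii_digit() {
                    let parsed = match word.strip_prefix("0x") {
                        Some(hex) => i64::from_str_radix(hex, 16),
                        None => word.parse::<i64>(),
                    };
                    parsed.map_err(|_| format!("bad number '{}'", word))
                } else {
                    self.symbols.get(&word).copied().ok_or_else(|| format!("undefined symbol '{}'", word))
                }
            }
            other => Err(format!("expected expression, found {:?}", other)),
        }
    }
}

/// Evaluate `src` with `pc` as the current address (`$`).
pub fn eval(src: &str, pc: i64, symbols: &HashMap<String, i64>) -> Result<i64, String> {
    let mut e = Eval { chars: src.chars().collect(), pos: 0, pc, symbols };
    let value = e.additive()?;
    match e.peek() {
        None => Ok(value),
        Some(c) => Err(format!("unexpected '{}'", c)),
    }
}
```

Because `unary` binds tightest, `>base + 1` evaluates as `(>base) + 1`, not `>(base + 1)`.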
Implementation
```rust
pub fn parse_expression(&mut self) -> ParseResult<Expression> {
    self.parse_additive()
}

fn parse_additive(&mut self) -> ParseResult<Expression> {
    let mut left = self.parse_multiplicative()?;
    while self.check(TokenKind::Plus) || self.check(TokenKind::Minus) {
        let op = if self.check(TokenKind::Plus) {
            self.advance();
            BinaryOp::Add
        } else {
            self.advance();
            BinaryOp::Sub
        };
        let right = self.parse_multiplicative()?;
        left = Expression::binary(left, op, right);
    }
    Ok(left)
}

fn parse_multiplicative(&mut self) -> ParseResult<Expression> {
    let mut left = self.parse_unary()?;
    while self.check(TokenKind::Star) || self.check(TokenKind::Slash) {
        let op = if self.check(TokenKind::Star) {
            self.advance();
            BinaryOp::Mul
        } else {
            self.advance();
            BinaryOp::Div
        };
        let right = self.parse_unary()?;
        left = Expression::binary(left, op, right);
    }
    Ok(left)
}

fn parse_unary(&mut self) -> ParseResult<Expression> {
    if self.check(TokenKind::Minus) {
        self.advance();
        let operand = self.parse_unary()?;
        return Ok(Expression::unary(UnaryOp::Neg, operand));
    }
    if self.check(TokenKind::LessThan) {
        self.advance();
        let operand = self.parse_unary()?;
        return Ok(Expression::unary(UnaryOp::LoByte, operand));
    }
    if self.check(TokenKind::GreaterThan) {
        self.advance();
        let operand = self.parse_unary()?;
        return Ok(Expression::unary(UnaryOp::HiByte, operand));
    }
    self.parse_primary()
}

fn parse_primary(&mut self) -> ParseResult<Expression> {
    // Number literal
    if self.check(TokenKind::Number) {
        let token = self.advance();
        let value = token.number().unwrap_or(0);
        return Ok(Expression::Number(value as i64));
    }
    // Identifier
    if self.check(TokenKind::Identifier) {
        let token = self.advance();
        let name = token.text(self.source).to_string();
        return Ok(Expression::Identifier(name));
    }
    // Local label
    if self.check(TokenKind::LocalLabel) {
        let token = self.advance();
        let name = token.text(self.source).to_string();
        return Ok(Expression::LocalIdentifier(name));
    }
    // Current address ($)
    if self.check(TokenKind::Dollar) {
        self.advance();
        return Ok(Expression::CurrentAddress);
    }
    // Parenthesized expression
    if self.check(TokenKind::OpenParen) {
        self.advance();
        let expr = self.parse_expression()?;
        self.expect(TokenKind::CloseParen, "')'")?;
        return Ok(expr);
    }
    Err(ParseError::InvalidExpression {
        message: format!("expected expression, found {:?}", self.current.kind),
        location: self.current.location,
    })
}
```
Parsing Directives
Each directive has its own operand syntax, so `parse_directive` dispatches on the directive kind:
```rust
fn parse_directive(&mut self) -> ParseResult<Statement> {
    let token = self.advance();
    let directive = token.directive().unwrap();
    let location = token.location;

    match directive {
        Directive::ORG => {
            let address = self.parse_expression()?;
            Ok(Statement::Directive(DirectiveStmt::Org { address, location }))
        }
        Directive::DB => {
            let values = self.parse_db_values()?;
            Ok(Statement::Directive(DirectiveStmt::Db { values, location }))
        }
        Directive::DW => {
            let values = self.parse_expression_list()?;
            Ok(Statement::Directive(DirectiveStmt::Dw { values, location }))
        }
        Directive::EQU => {
            // .equ NAME value
            if !self.check(TokenKind::Identifier) {
                return Err(ParseError::InvalidDirective {
                    message: "expected identifier after .equ".to_string(),
                    location: self.current.location,
                });
            }
            let name = self.advance().text(self.source).to_string();
            let value = self.parse_expression()?;
            Ok(Statement::Directive(DirectiveStmt::Equ { name, value, location }))
        }
        Directive::INCLUDE => {
            // .include "filename"
            if !self.check(TokenKind::String) {
                return Err(ParseError::InvalidDirective {
                    message: "expected string after .include".to_string(),
                    location: self.current.location,
                });
            }
            let path = self.advance().string().unwrap_or("").to_string();
            Ok(Statement::Directive(DirectiveStmt::Include { path, location }))
        }
    }
}
```
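`parse_directive` relies on `token.directive()` to turn a token into a `Directive`. Chapter 4's lexer presumably performs that lookup; the sketch below is one hypothetical way to write it (`lookup_directive` is an illustrative name), assuming directive names are matched case-insensitively and that an unknown dotted name such as `.loop` is left to the label rules.

```rust
/// The directive kinds handled by `parse_directive` in this chapter.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Directive {
    ORG,
    DB,
    DW,
    EQU,
    INCLUDE,
}

/// Sketch: map directive text (with its leading '.') to a `Directive`.
pub fn lookup_directive(text: &str) -> Option<Directive> {
    // Directives are written with a leading '.'
    let name = text.strip_prefix('.')?;
    // Assumption: directive names are case-insensitive
    match name.to_ascii_lowercase().as_str() {
        "org" => Some(Directive::ORG),
        "db" => Some(Directive::DB),
        "dw" => Some(Directive::DW),
        "equ" => Some(Directive::EQU),
        "include" => Some(Directive::INCLUDE),
        // Anything else (for example `.loop`) is not a directive
        _ => None,
    }
}
```

Returning an `Option` lets the caller report an unknown directive as an error before the `unwrap()` in `parse_directive` is ever reached.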
Parsing .db Values
The .db directive accepts a comma-separated mix of byte expressions and string literals, and requires at least one value:
```rust
fn parse_db_values(&mut self) -> ParseResult<Vec<DataValue>> {
    let mut values = Vec::new();
    loop {
        if self.check(TokenKind::String) {
            let token = self.advance();
            let s = token.string().unwrap_or("").to_string();
            values.push(DataValue::String(s));
        } else if can_start_expression(self.current.kind) {
            let expr = self.parse_expression()?;
            values.push(DataValue::Byte(expr));
        } else {
            break;
        }
        if !self.check(TokenKind::Comma) {
            break;
        }
        self.advance(); // consume comma
    }

    if values.is_empty() {
        return Err(ParseError::InvalidDirective {
            message: "expected at least one value for .db".to_string(),
            location: self.current.location,
        });
    }
    Ok(values)
}
```
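The same loop structure (value, then an optional comma, then another value) can be sketched on raw text. This hypothetical `parse_db_line` accepts only quoted strings and decimal or `0x` hex numbers, not full expressions, and mirrors `parse_db_values` in rejecting an empty list.

```rust
/// Hypothetical data values for this sketch (numbers only, no expressions).
#[derive(Debug, PartialEq)]
pub enum DataValue {
    String(String),
    Byte(i64),
}

/// Parse a decimal or `0x` hex number.
fn parse_number(s: &str) -> Result<i64, String> {
    let parsed = match s.strip_prefix("0x") {
        Some(hex) => i64::from_str_radix(hex, 16),
        None => s.parse::<i64>(),
    };
    parsed.map_err(|_| format!("expected a number, found '{}'", s))
}

/// Sketch: parse the operand text of a `.db` line, e.g. `"Hi", 0x0A, 0`.
pub fn parse_db_line(src: &str) -> Result<Vec<DataValue>, String> {
    let mut values = Vec::new();
    let mut rest = src.trim();
    while !rest.is_empty() {
        if let Some(after_quote) = rest.strip_prefix('"') {
            // String literal: everything up to the closing quote
            let end = after_quote.find('"').ok_or("unterminated string")?;
            values.push(DataValue::String(after_quote[..end].to_string()));
            rest = after_quote[end + 1..].trim_start();
        } else {
            // Number: everything up to the next comma
            let end = rest.find(',').unwrap_or(rest.len());
            values.push(DataValue::Byte(parse_number(rest[..end].trim())?));
            rest = &rest[end..];
        }
        // Values are separated by commas; anything else is an error
        match rest.strip_prefix(',') {
            Some(after_comma) => rest = after_comma.trim_start(),
            None if rest.is_empty() => break,
            None => return Err(format!("unexpected '{}'", rest)),
        }
    }
    if values.is_empty() {
        return Err("expected at least one value for .db".to_string());
    }
    Ok(values)
}
```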
Parsing Expression Lists
Used by .dw, which takes one or more comma-separated expressions:
```rust
pub fn parse_expression_list(&mut self) -> ParseResult<Vec<Expression>> {
    let mut exprs = vec![self.parse_expression()?];
    while self.check(TokenKind::Comma) {
        self.advance();
        exprs.push(self.parse_expression()?);
    }
    Ok(exprs)
}
```
Local Label Resolution
Local labels are scoped to the most recent global label. Consider:

```asm
main:
.loop:
    bne .loop
other:
.loop:      ; different from main.loop
    bne .loop
```

To tell the two apart, the assembler tracks the current global label and qualifies each local label with it:
- The first `.loop` becomes `main.loop`
- The second `.loop` becomes `other.loop`

This qualification happens in the symbol table (next chapter), not in the parser: the parser only records `is_local` and keeps the name exactly as written.
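The qualification rule can be sketched as a small piece of state that is updated in source order. `Scope`, `define`, and `qualify` are hypothetical names, not the symbol table built in the next chapter; the sketch assumes qualified names are formed by plain concatenation (`main` + `.loop`) and that a local label before any global label is an error.

```rust
/// Hypothetical scope tracker: remembers the most recent global label.
pub struct Scope {
    current_global: Option<String>,
}

impl Scope {
    pub fn new() -> Self {
        Scope { current_global: None }
    }

    /// Call for every label definition, in source order; returns the qualified name.
    pub fn define(&mut self, name: &str, is_local: bool) -> Result<String, String> {
        if is_local {
            self.qualify(name)
        } else {
            // A global label opens a new scope for the local labels that follow
            self.current_global = Some(name.to_string());
            Ok(name.to_string())
        }
    }

    /// Qualify a local name such as `.loop` with the current global label.
    pub fn qualify(&self, name: &str) -> Result<String, String> {
        match &self.current_global {
            Some(global) => Ok(format!("{}{}", global, name)),
            None => Err(format!("local label '{}' appears before any global label", name)),
        }
    }
}
```

Qualifying references with the same state (at the point where they appear) makes `bne .loop` resolve to the `.loop` in its own scope.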
Complete Parsing Example
Let’s trace parsing this input (location fields are omitted from the results):

```asm
.org 0x8000
start:
    lda (0x80),y
```

- Parse `.org 0x8000`
  - Token: Directive(ORG)
  - Parse expression: Number(0x8000)
  - Result: `Directive(Org { address: Number(32768) })`
- Parse `start:`
  - Token: Identifier, text "start"
  - Token: Colon
  - Result: `Label(LabelDef { name: "start", is_local: false })`
- Parse `lda (0x80),y`
  - Token: Instruction(LDA)
  - Has operand: yes (the next token is `(`)
  - Parse indirect operand:
    - Token: OpenParen
    - Parse expression: Number(0x80)
    - Token: CloseParen
    - Token: Comma
    - Token: Register (y)
  - Result: `Instruction(InstructionStmt { mnemonic: LDA, operand: IndirectY(Number(128)) })`
Summary
In this chapter, we implemented:
- Label parsing: identifier + colon, detecting local labels
- Instruction parsing: mnemonic + optional operand
- Operand parsing: immediate, accumulator, address, indexed, indirect modes
- Expression parsing: with operator precedence
- Directive parsing: .org, .db, .dw, .equ, .include
The parser now produces a complete AST from source code. In the next chapter, we’ll build the symbol table to track labels and constants.
Previous: Chapter 4 - Parser Structure | Next: Chapter 6 - The Symbol Table