Strongly typed, C-like systems programming language built for resource-constrained 8-bit microprocessors.
- Source Tracking Tokenizer: Maps characters to discrete tokens while maintaining source locations (file, line, column) for robust compilation errors.
- Recursive Descent Parser: Transforms the token stream into a structured AST, treating hardware registers and standard controls as first-class grammatical constructs.
- Lexically Scoped Semantic Analyzer: Two-pass validation engine over the AST. Pass 1 registers all top-level declarations (functions, structs, registers, globals) into the global symbol table. Pass 2 walks function bodies with a scoped symbol table, checking undeclared identifiers, type mismatches, argument counts/types, struct field access, lvalue validity, and return-type consistency. Invalid declarations are poisoned to prevent cascading diagnostics.
- IR Generator: Lowers the analysed AST into a self-contained three-address code (TAC) intermediate representation. The IR module contains struct layouts with computed field offsets, global/register definitions with hardware addresses baked in, and one flat instruction stream per function, so codegen can emit target code from the IR alone, without consulting the AST or symbol table. Supports incremental compilation: `-c` serializes the IR to a `.o` file that can be loaded back to skip the frontend entirely.
- Code Generator: Emits valid 65C02 ROM binaries (32K) with a bootstrap runtime, interrupt vectors, and flat zero-page register allocation. Avoids slow stack-based execution by mapping local variables, temporaries, and parameters directly onto zero-page slots. Globals are allocated in RAM ($0200+) and initialized in the bootstrap before `JSR main`. String literals are placed in a ROM data section with backpatching fixups. Supports arithmetic (`+`, `-`, unary `-`) for all integer types (u8/i8/u16/i16), comparisons across all widths and signedness (unsigned via carry-flag, signed via N⊕V), and pointer dereference. Programs compile and run on real hardware.
- Disassembler: Decodes compiled `.bin` files back into annotated 65C02 assembly, resolving jump targets to named labels for readability. Supports section-aware output (`.text`/`.data` split), hex dumps with ASCII, and ROM usage summaries. See c02-objdump for more information.
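
The source tracking described for the tokenizer can be illustrated with a minimal sketch. The `SrcLoc` type and both function names are hypothetical and are not taken from the C02 sources; the sketch only assumes 1-based lines and columns and that a newline starts a new line.

```c
#include <stddef.h>

/* Hypothetical source location; names are illustrative, not C02's own types. */
typedef struct {
    const char *file;
    unsigned line;    /* 1-based */
    unsigned column;  /* 1-based */
} SrcLoc;

/* Advance a location past one consumed character. */
void srcloc_advance(SrcLoc *loc, char c) {
    if (c == '\n') {
        loc->line += 1;
        loc->column = 1;
    } else {
        loc->column += 1;
    }
}

/* Compute the location of the character at `offset` within `src`. */
SrcLoc srcloc_at(const char *file, const char *src, size_t offset) {
    SrcLoc loc = { file, 1, 1 };
    for (size_t i = 0; i < offset && src[i] != '\0'; i++) {
        srcloc_advance(&loc, src[i]);
    }
    return loc;
}
```

A tokenizer built this way would typically copy the current `SrcLoc` into each token as it is created, so later stages can report errors without re-scanning the file.
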
C02 is under active, early development. The complete frontend (tokenizer, parser, semantic analyzer), IR generation, and code generator are functional and tested — simple programs compile to valid 65C02 ROMs and run on real hardware.
- Data movement: variable copies, constant stores, hardware register writes.
- Control flow: `if`/`else`, `while`, `for` loops via label/jump/conditional-jump.
- Arithmetic: `+`, `-`, and unary `-` for all integer types (u8, i8, u16, i16). Width-aware multi-byte emission for 16-bit operations with carry/borrow propagation.
- Comparisons: all six relational operators (`<`, `<=`, `==`, `!=`, `>=`, `>`) for all widths (u8, u16) and signedness (unsigned via carry-flag, signed via N⊕V). 16-bit comparisons use a high-byte-first pattern.
- Increment/decrement: `++`/`--` for both u8 and 16-bit values (pointers, u16), including globals.
- Pointer dereference: `*p` via indirect indexed addressing (`LDA ($nn),Y`).
- Global variables: RAM-allocated globals with bootstrap initialization, correctly accessed via absolute addressing throughout all codegen paths. String literals placed in a ROM data section with backpatching fixups.
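
The comparison rules in the list above (unsigned via the carry flag, signed via N⊕V, 16-bit values high byte first) can be modelled in C. This is a sketch of the flag logic only, assuming the flags are taken after an 8-bit subtraction `a - b`; the function names are hypothetical and the instruction sequences cc02 actually emits may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unsigned a < b: after subtracting b from a, carry is clear when a borrow occurred. */
bool u8_less_than(uint8_t a, uint8_t b) {
    bool carry = a >= b;   /* C flag as the 65C02 leaves it after a - b */
    return !carry;
}

/* Signed a < b: true when exactly one of N and V is set after a - b. */
bool i8_less_than(uint8_t a, uint8_t b) {
    uint8_t diff = (uint8_t)(a - b);
    bool n = (diff & 0x80) != 0;                    /* N: sign bit of the result */
    bool v = (((a ^ b) & (a ^ diff)) & 0x80) != 0;  /* V: signed overflow of a - b */
    return n != v;                                  /* N xor V */
}

/* Unsigned 16-bit a < b: compare high bytes first, low bytes only break a tie. */
bool u16_less_than(uint16_t a, uint16_t b) {
    uint8_t a_hi = (uint8_t)(a >> 8);
    uint8_t b_hi = (uint8_t)(b >> 8);
    if (a_hi != b_hi) {
        return u8_less_than(a_hi, b_hi);
    }
    return u8_less_than((uint8_t)a, (uint8_t)b);
}
```

On the 65C02 itself, `CMP` does not update V, so a signed comparison has to go through `SEC` / `SBC` to obtain a meaningful overflow flag.
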
- Function calls: the ABI zone ($EF–$FF) is reserved, but `JSR`/parameter passing is not wired up yet.
- Multiplication, division, modulo (`*`, `/`, `%`): no native 6502 instructions; needs runtime helper routines.
- Struct field access (`s.field` codegen).
- Type casts: implicit widening (u8→u16) reads a garbage high byte; needs explicit zero-extension.
- Address-of (`&x`): parsed and analysed but no codegen.
- Pointer store (`*p = val`): parsed and analysed but no codegen for variable-destination stores.
- Arrays: no array type or subscript syntax (`a[i]`).
- Missing-return detection is shallow. A non-void function with no `return` at the end is flagged, but the analyzer does not perform full path-coverage analysis.
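
Because the 6502 has no multiply or divide instructions, runtime helpers for `*`, `/`, and `%` are usually built from shifts, adds, and subtracts. The C model below sketches one common approach (shift-and-add multiplication and restoring division) for unsigned 8-bit operands. The names `mul_u8`, `divmod_u8`, and `DivResult` are hypothetical; this is not the helper design C02 has committed to.

```c
#include <stdint.h>

/* 8-bit x 8-bit -> 16-bit product: add the shifted multiplicand for each set bit of b. */
uint16_t mul_u8(uint8_t a, uint8_t b) {
    uint16_t product = 0;
    uint16_t addend = a;
    for (int bit = 0; bit < 8; bit++) {
        if (b & 1) {
            product = (uint16_t)(product + addend);
        }
        addend = (uint16_t)(addend << 1);
        b = (uint8_t)(b >> 1);
    }
    return product;
}

typedef struct {
    uint8_t quot;
    uint8_t rem;
} DivResult;

/* Restoring division, one quotient bit per step. The divisor must be nonzero. */
DivResult divmod_u8(uint8_t n, uint8_t d) {
    DivResult r = { 0, 0 };
    uint16_t rem = 0;
    for (int bit = 7; bit >= 0; bit--) {
        rem = (uint16_t)((rem << 1) | ((n >> bit) & 1));  /* bring down the next bit */
        r.quot = (uint8_t)(r.quot << 1);
        if (rem >= d) {            /* divisor fits: subtract and record a 1 bit */
            rem = (uint16_t)(rem - d);
            r.quot |= 1;
        }
    }
    r.rem = (uint8_t)rem;
    return r;
}
```

Both loops run a fixed eight iterations, which maps naturally onto a counted loop in 65C02 assembly using shift and rotate instructions.
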
If you're exploring the codebase: the parser (parser.c), the analyzer (analyzer.c), the IR generator (ir.c), and the code generator (generator.c) are the main files. Issues and PRs are welcome.
sudo apt install build-essential curl python3 python3-pip -y
# Official Rust install script (for c02-objdump)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# py65 6502 emulator (for runtime tests)
pip install py65
git clone https://github.com/jackwthake/C02.git
cd C02
make

cc02 [OPTIONS] <FILE>

- `<FILE>`: Input file (`.c02` source or `.o`/`.out` IR object)
- `-h, --help`: Show help message
- `-c`: Incremental compile, emitting a `.o` IR object file instead of a final binary
- `-o, --output`: Specify output file
- `--token-dump`: Dump the token list after tokenization
- `--ast-dump`: Dump the AST after parsing
- `--symbol-dump`: Dump the global symbol table after analysis
- `--ir-dump`: Dump the IR (TAC instructions) after lowering
- `--syntax-check-only`: Stop after syntax and semantic checks
- `--time-report`: Print a report showing how long each stage of compilation took

Incremental compilation:
cc02 -c hello_world.c02 -o hello_world.o # compile to IR object
cc02 --ir-dump hello_world.o # inspect the IR from the object file

All generated error messages are presented in a clang-like format with concise source locations. The printed file locations use an editor-friendly format, so you can click a location to open the affected file.

The grammar below reflects what the tokenizer and parser currently accept. Semantic analysis validates the full AST after parsing, IR generation lowers it to TAC, and the code generator emits 65C02 machine code — see Getting Started and Current Status for what's working today.
- `u8`/`i8`: 8-bit integers (unsigned / signed)
- `u16`/`i16`: 16-bit integers (unsigned / signed)
- `void`: Function return types with no payload.
- `struct` names: a bare identifier in type position resolves to a struct type (e.g. `Point p;`).
- Pointer types: any base type followed by one or more `*` (e.g. `u8 *msg`, `u16 **pp`).

// single-line comment
/*
block comment
*/

A `.c02` file is a sequence of top-level declarations: functions, `reg` declarations, `struct` declarations, global variables, and forward declarations (`decl`).

fn name(u8 a, u16 *b) -> void {
// body
}

- Parameter list is `(type name, type name, ...)`, can be empty: `()`.
- Return type is required, introduced with `->`.

Hardware interface registers are pinned directly to absolute memory addresses.
reg u8 PORTA @ 0x6001;
reg u8 PORTB @ 0x6000;

struct Point {
u8 x;
u8 y;
}

- Body is a sequence of `type name;` fields, no nested initialisers.
- A trailing `;` after the closing `}` is optional.

u8 *msg = "Hello C02!";
u16 counter;
Point origin;

- Same form as a local variable declaration: `type name;` or `type name = expr;`.
- Struct-typed globals are supported (`Point p;`).

Forward declarations introduce the signature of a function or global defined in another translation unit, allowing cross-file references with incremental compilation (-c).
decl fn send_byte(u8 b) -> void;
decl u8 counter;

- A `decl` for a function uses the same signature syntax as `fn` but has no body.
- A `decl` for a global is `decl type name;` with no initialiser.
- Redeclaring a name that already exists in the same file is an error.

// variable declaration (local)
u8 x = 5;
Point p; // struct-typed declaration
p = Point{ .x = x, .y = 10 }; // struct with initializer
p = Point{}; // zero initialized struct
Point *p2; // or p2 = null; pointer to a Point struct, uninitialized
Point *p2 = &p; // pointer to a Point struct, initialized
// assignment (also: += -= *= /= %=)
x = x + 1;
x += 1;
// return
return;
return x;
// if / else if / else
if (x > 0) {
// ...
} else if (true) { // `true` and `false` are accepted keywords
// ...
} else {
// ...
}
// while
while (x < 10) {
x += 1;
}
// for (any of the three clauses may be empty)
for (u8 i = 0; i < 10; i += 1) {
// ...
}
// function call statement
do_thing(a, b);

Precedence, lowest to highest:

`|| && | ^ & == != < > <= >= << >> + - * / % (unary) (postfix)`

- Unary (prefix): `!` (logical not), `-` (negate), `&` (address-of), `~` (bitwise not), `++`/`--`, `*` and `@` (dereference).
- Postfix: `.field` field access, chainable (`a.b.c`). Auto-dereferences struct pointers (`ptr.field` where `ptr` is a `Struct*`).
- Calls: `name(arg1, arg2, ...)`.
- Casts: `(type)expr`, e.g. `(u16)x`.
- Grouping: `(expr)`.
- Literals: decimal/hex integers (`l_num`), string literals (`l_string`), identifiers.

This program cycles LEDs connected to PORTB on a 65C02 breadboard — counting up from 0 to 255 and back down in an infinite loop. It compiles to a valid 32K ROM and runs on real hardware.
reg u8 PORTB @ 0x6000;
reg u8 DDRB @ 0x6002;
fn main() -> void {
DDRB = 0xFF; // Set all pins of PORTB as output
while(true) {
u8 i = 0;
for (; i < 255; ++i) {
PORTB = i;
}
PORTB = i;
for (; i > 0; --i) {
PORTB = i;
}
}
}

cc02 led_counter.c02 -o led_counter.bin # compile to 32K ROM
c02-objdump led_counter.bin # disassemble to inspect the output

To maximize compilation density and execution speed, the code generator reserves and maps lower RAM ($0000–$00FF, the zero page) to form a virtual register file:

| Address Range | Identifier | Purpose |
|---|---|---|
| $00 | FP | Frame Pointer: Tracks multi-byte local variable frames in main RAM. |
| $02 | RET | Return Register: Where every function or conditional puts its return value. |
| $04 – $EE | r0 – r117 | Scratch Registers: Compiler-managed 16-bit scratchpads for expression temporaries, local variables, and globals. Allocated per-function from $04 upward. |
| $EF – $FF | a0 – a8 | Function ABI Zone: Rapid parameter passing without stack overhead. Supports up to 8 sixteen-bit parameters. |
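
The scratch-register row can be read as a simple index-to-address mapping. The sketch below assumes that each scratch register occupies two bytes and that the listed range ($04 to $EE) gives each register's starting (low-byte) address; both points are inferred from the table rather than confirmed against generator.c, and the function name is hypothetical.

```c
#include <stdint.h>

enum {
    ZP_SCRATCH_BASE  = 0x04,  /* r0 starts here (from the table) */
    ZP_SCRATCH_COUNT = 118,   /* r0 through r117 (from the table) */
    ZP_REG_WIDTH     = 2      /* assumption: every scratch register is 16 bits wide */
};

/* Return the zero-page address of the low byte of rN, or -1 if N is out of range. */
int scratch_reg_addr(int n) {
    if (n < 0 || n >= ZP_SCRATCH_COUNT) {
        return -1;
    }
    return ZP_SCRATCH_BASE + n * ZP_REG_WIDTH;
}
```

Under these assumptions r0 maps to $04, r1 to $06, and r117 to $EE, matching the endpoints in the table.
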
| Dependency | License | Used By |
|---|---|---|
| clap | MIT / Apache-2.0 | c02-objdump CLI argument parsing |
| py65 | BSD | Test harness 65C02 emulator for runtime verification |
