Derive Handlers: Parser Generation from Event Schemas
The Idea
Yesterday we had event declarations. Today we have this:

```
~[derive(parser)]event token {}
| number u64[\d+]
| plus void[\+]
| eof
```

That `[\d+]` isn’t decoration. It’s a regex pattern embedded in a phantom type. The `derive(parser)` annotation triggers a transform that reads those patterns and generates a complete lexer.
Calling the generated `token.parse` event:

```
~token.parse(stream: "123+456")
| number n |> std.io:print.ln("number: {{ n.value }}")
| plus _ |> std.io:print.ln("plus")
| eof _ |> std.io:print.ln("eof")
| error e |> std.io:print.ln("error: {{ e.message }}")
```

Output:

```
number: 123
```
Ten lines. A lexer. That’s it.
How Derive Works
Derive handlers are transforms that operate on declarations, not invocations:

```
~[comptime]pub event parser {
    event_decl: *const EventDecl,  // The declaration being derived
    program: *const Program,       // The full AST
    allocator: std.mem.Allocator
}
| transformed { program: *const Program }
```

When the compiler sees `~[derive(parser)]event token {}`, it:
- Finds the `parser` derive handler
- Passes it the `token` event declaration
- The handler reads phantom types (`[\d+]`) from the branches
- Generates new AST nodes (the `token.parse` event + proc)
- Returns the modified program with the new declarations
The handler is just Koru code. User-space. No compiler modifications needed.
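To make that flow concrete, here is a minimal, self-contained Zig sketch of the mechanism. The `EventDecl` and `Branch` shapes below are illustrative stand-ins, not Koru’s actual compiler API:

```zig
const std = @import("std");

// Illustrative stand-ins for the compiler's AST types. The real EventDecl
// and Program shapes belong to Koru's compiler; these fields are assumptions.
const Branch = struct {
    name: []const u8,
    phantom: ?[]const u8, // the opaque annotation text, e.g. "\d+"
};
const EventDecl = struct {
    name: []const u8,
    branches: []const Branch,
};

// The core of a derive handler: walk the declaration's branches, read each
// phantom annotation, and emit matching code for the generated proc.
fn generateLexer(decl: EventDecl) void {
    std.debug.print("// Generated parser for {s}\n", .{decl.name});
    for (decl.branches) |branch| {
        // Branches without a phantom (like `eof`) fall through to default handling.
        const pattern = branch.phantom orelse continue;
        std.debug.print("// Try matching: {s} with pattern: {s}\n", .{ branch.name, pattern });
    }
}

pub fn main() void {
    generateLexer(.{ .name = "token", .branches = &.{
        .{ .name = "number", .phantom = "\\d+" },
        .{ .name = "plus", .phantom = "\\+" },
        .{ .name = "eof", .phantom = null },
    } });
}
```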
Phantom Types Are The Key
Here’s the insight: phantom types are opaque strings. The compiler doesn’t interpret them—it just preserves them.
```
| number u64[\d+]
```

The parser sees `u64[\d+]`, extracts `\d+` as the phantom annotation, and stores it. Later, the derive handler reads that string and decides what to do with it.
Want regex patterns? Interpret them as regex. Want validation rules? Interpret them as validators. Want ORM mappings? Interpret them as column definitions.
The phantom is just data. The derive handler gives it meaning.
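As a toy illustration of that point (both interpreter functions are hypothetical), two different derive handlers can read the very same preserved string and assign it unrelated meanings:

```zig
const std = @import("std");

// Two hypothetical derive handlers reading the same opaque phantom string.
// The compiler stored "\d+" verbatim; only the handler decides what it means.
fn lexerDerive(phantom: []const u8) []const u8 {
    // A parser derive treats the phantom as a regex pattern.
    return if (std.mem.eql(u8, phantom, "\\d+")) "match one or more digits" else "unknown pattern";
}

fn ormDerive(phantom: []const u8) []const u8 {
    // An ORM derive could treat the same string as a column definition instead.
    return if (std.mem.eql(u8, phantom, "\\d+")) "INTEGER column" else "TEXT column";
}

pub fn main() void {
    const phantom = "\\d+"; // preserved by the compiler, never interpreted by it
    std.debug.print("parser derive: {s}\n", .{lexerDerive(phantom)});
    std.debug.print("orm derive:    {s}\n", .{ormDerive(phantom)});
}
```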
The Generated Code
The derive handler generates a proc body with pattern matching:
```zig
// Generated parser for token
const input = stream;
if (input.len == 0) return .{ .@"error" = ... };

// Try matching: number with pattern: \d+
{
    var i: usize = 0;
    while (i < input.len and input[i] >= '0' and input[i] <= '9') : (i += 1) {}
    if (i > 0) {
        const parsed_value = @import("std").fmt.parseInt(u64, input[0..i], 10) catch 0;
        return .{ .number = .{ .value = parsed_value, .remaining = input[i..] } };
    }
}

// Try matching: plus with pattern: \+
if (input[0] == '+') return .{ .plus = .{ .value = {}, .remaining = input[1..] } };
```

Native Zig. Zero overhead. The derive handler is compile-time code generation.
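Since each generated branch returns the matched token plus the unconsumed `remaining` slice, driving a full tokenizer is just a loop. Here is a self-contained Zig sketch, with a hand-written `Token` union standing in for the generated event result (error handling elided):

```zig
const std = @import("std");

// Stand-in for the result shape the generated parser returns:
// each variant carries the parsed value and the unconsumed input.
const Token = union(enum) {
    number: struct { value: u64, remaining: []const u8 },
    plus: struct { remaining: []const u8 },
    eof,
};

// Mirrors the generated matching logic: longest digit run first, then '+'.
// (The error branch from the post is elided for brevity.)
fn parseOne(input: []const u8) Token {
    if (input.len == 0) return .eof;
    var i: usize = 0;
    while (i < input.len and input[i] >= '0' and input[i] <= '9') : (i += 1) {}
    if (i > 0) {
        const value = std.fmt.parseInt(u64, input[0..i], 10) catch 0;
        return .{ .number = .{ .value = value, .remaining = input[i..] } };
    }
    return .{ .plus = .{ .remaining = input[1..] } };
}

pub fn main() void {
    var rest: []const u8 = "123+456";
    // Repeatedly parse one token and continue from `.remaining` until eof.
    while (true) {
        switch (parseOne(rest)) {
            .number => |n| {
                std.debug.print("number: {d}\n", .{n.value});
                rest = n.remaining;
            },
            .plus => |p| {
                std.debug.print("plus\n", .{});
                rest = p.remaining;
            },
            .eof => {
                std.debug.print("eof\n", .{});
                break;
            },
        }
    }
}
```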
Why This Matters
This is the third pillar of Koru metaprogramming:
| Mechanism | Operates On | Example |
|---|---|---|
| Transforms | Invocations | `~if(cond) { }` → generated branches |
| Expand | Templates | `~sql.query [SQL]{ SELECT * }` → interpolated code |
| Derive | Declarations | `~[derive(X)]event foo` → generated events/procs |
All three are user-space. All three compose. All three use the same AST manipulation primitives.
The Vision
This is a minimal lexer. But the architecture supports:
- Full grammars with precedence and associativity
- External regex libraries via `compiler:requires`
- Source block integration for compile-time DSL parsing
```
~[derive(parser)]event expr {}
| add { left: Expr, right: Expr }[prec:1, left]
| mul { left: Expr, right: Expr }[prec:2, left]
| num u64[\d+]

~parse.expr [expr] {
    1 + 2 * 3
}
| add e |> // e.left = 1, e.right = mul(2,3)
```

Parser generation from event schemas. The event IS the grammar. The branches ARE the AST nodes.
Metacircular all the way down.
Try It
~import "$std/parser_generator"
~import "$std/io"
~[derive(parser)]event token {}
| number u64[\d+]
| plus void[\+]
| eof
~token.parse(stream: "123+456")
| number n |> std.io:print.ln("number: {{ n.value }}")
| plus _ |> std.io:print.ln("plus")
| eof _ |> std.io:print.ln("eof")
| error e |> std.io:print.ln("error: {{ e.message }}") Pure Koru. No Zig boilerplate. Define your tokens, parse your input, handle your branches.
That’s the language we’re building.