Derive Handlers: Parser Generation from Event Schemas


The Idea

Yesterday we had event declarations. Today we have this:

~[derive(parser)]event token {}
| number u64[\d+]
| plus void[\+]
| eof

That [\d+] isn’t decoration. It’s a regex pattern embedded in a phantom type. The derive(parser) annotation triggers a transform that reads those patterns and generates a complete lexer.

The generated token.parse event:

~token.parse(stream: "123+456")
| number n |> std.io:print.ln("number: {{ n.value }}")
| plus _ |> std.io:print.ln("plus")
| eof _ |> std.io:print.ln("eof")
| error e |> std.io:print.ln("error: {{ e.message }}")

Output: number: 123. That's the first token of the stream; the rest stays in the remaining field.

Ten lines. A lexer. That’s it.


How Derive Works

Derive handlers are transforms that operate on declarations, not invocations:

~[comptime]pub event parser {
    event_decl: *const EventDecl,  // The declaration being derived
    program: *const Program,       // The full AST
    allocator: std.mem.Allocator
}
| transformed { program: *const Program }

When the compiler sees ~[derive(parser)]event token {}, it:

  1. Finds the parser derive handler
  2. Passes it the token event declaration
  3. The handler reads the phantom types ([\d+]) from the branches
  4. The handler generates new AST nodes (the token.parse event plus its proc)
  5. The handler returns the modified program with the new declarations

The handler is just Koru code. User-space. No compiler modifications needed.
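
Because the handler is ordinary code, its core loop is easy to picture. Here's a minimal standalone Zig sketch of the walk-branches-and-read-phantoms step. The Branch and EventDecl shapes are invented for illustration; Koru's real AST types aren't shown in this post and are certainly richer.

const std = @import("std");

// Hypothetical, simplified shapes for the declaration the handler
// receives; they exist only so this sketch compiles on its own.
const Branch = struct { name: []const u8, phantom: ?[]const u8 };
const EventDecl = struct { name: []const u8, branches: []const Branch };

// The core loop of a parser-style derive handler: read each branch's
// phantom annotation as an opaque string and emit code for it. A real
// handler appends AST nodes to the program; this sketch just prints
// the matcher headers it would generate.
fn deriveParser(decl: EventDecl) void {
    std.debug.print("// Generated parser for {s}\n", .{decl.name});
    for (decl.branches) |branch| {
        if (branch.phantom) |pattern| {
            std.debug.print("// Try matching: {s} with pattern: {s}\n", .{ branch.name, pattern });
        }
    }
}

pub fn main() void {
    deriveParser(.{ .name = "token", .branches = &.{
        .{ .name = "number", .phantom = "\\d+" },
        .{ .name = "plus", .phantom = "\\+" },
        .{ .name = "eof", .phantom = null },
    } });
}

Steps 4 and 5 are then ordinary AST construction, presumably using the allocator the compiler passes in.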


Phantom Types Are The Key

Here’s the insight: phantom types are opaque strings. The compiler doesn’t interpret them—it just preserves them.

| number u64[\d+]

The parser sees u64[\d+], extracts \d+ as the phantom annotation, stores it. Later, the derive handler reads that string and decides what to do with it.

Want regex patterns? Interpret them as regex. Want validation rules? Interpret them as validators. Want ORM mappings? Interpret them as column definitions.

The phantom is just data. The derive handler gives it meaning.
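
The same machinery stretches past lexing. As a hypothetical illustration (there is no derive(model) in this post; the annotations below are invented), the same branch-plus-phantom shape could drive an ORM:

~[derive(model)]event user {}
| id u64[pk, auto]
| email string[unique]
| name string[max:64]

A model derive handler would read pk, unique, and max:64 as column definitions, exactly the way the parser handler reads \d+ as a regex.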


The Generated Code

The derive handler generates a proc body with pattern matching:

// Generated parser for token
const input = stream;
if (input.len == 0) return .{ .@"error" = ... };

// Try matching: number with pattern: \d+
{
    var i: usize = 0;
    while (i < input.len and input[i] >= '0' and input[i] <= '9') : (i += 1) {}
    if (i > 0) {
        const parsed_value = @import("std").fmt.parseInt(u64, input[0..i], 10) catch 0;
        return .{ .number = .{ .value = parsed_value, .remaining = input[i..] } };
    }
}

// Try matching: plus with pattern: \+
if (input[0] == '+') return .{ .plus = .{ .value = {}, .remaining = input[1..] } };

Native Zig. Zero overhead. The derive handler is compile-time code generation.
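
If you want to run the output shape without the Koru toolchain, here is a hand-written, self-contained Zig approximation of the full generated lexer. The Token union layout and the eof-on-empty behavior are assumptions filling in what the excerpt above elides.

const std = @import("std");

const Token = union(enum) {
    number: struct { value: u64, remaining: []const u8 },
    plus: struct { value: void, remaining: []const u8 },
    eof: void,
    @"error": struct { message: []const u8 },
};

fn parse(input: []const u8) Token {
    // Assumption: empty input means eof (the excerpt elides this case).
    if (input.len == 0) return .eof;

    // Try matching: number with pattern: \d+
    var i: usize = 0;
    while (i < input.len and input[i] >= '0' and input[i] <= '9') : (i += 1) {}
    if (i > 0) {
        const value = std.fmt.parseInt(u64, input[0..i], 10) catch 0;
        return .{ .number = .{ .value = value, .remaining = input[i..] } };
    }

    // Try matching: plus with pattern: \+
    if (input[0] == '+') return .{ .plus = .{ .value = {}, .remaining = input[1..] } };

    return .{ .@"error" = .{ .message = "unexpected character" } };
}

pub fn main() void {
    var rest: []const u8 = "123+456";
    while (true) {
        switch (parse(rest)) {
            .number => |n| {
                std.debug.print("number: {d}\n", .{n.value});
                rest = n.remaining;
            },
            .plus => |p| {
                std.debug.print("plus\n", .{});
                rest = p.remaining;
            },
            .eof => {
                std.debug.print("eof\n", .{});
                break;
            },
            .@"error" => |e| {
                std.debug.print("error: {s}\n", .{e.message});
                break;
            },
        }
    }
}

Looping it over "123+456" prints number: 123, plus, number: 456, then eof: exactly the branches the Koru handlers above dispatch on.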


Why This Matters

This is the third pillar of Koru metaprogramming:

Mechanism    Operates On     Example
Transforms   Invocations     ~if(cond) { } → generated branches
Expand       Templates       ~sql.query [SQL]{ SELECT * } → interpolated code
Derive       Declarations    ~[derive(X)]event foo → generated events/procs

All three are user-space. All three compose. All three use the same AST manipulation primitives.


The Vision

This is a minimal lexer. But the architecture supports:

  • Full grammars with precedence and associativity
  • External regex libraries via compiler:requires
  • Source block integration for compile-time DSL parsing

Here's a sketch of where that leads:
~[derive(parser)]event expr {}
| add { left: Expr, right: Expr }[prec:1, left]
| mul { left: Expr, right: Expr }[prec:2, left]
| num u64[\d+]

~parse.expr [expr] {
    1 + 2 * 3
}
| add e |> // e.left = 1, e.right = mul(2,3)

Parser generation from event schemas. The event IS the grammar. The branches ARE the AST nodes.

Metacircular all the way down.


Try It

~import "$std/parser_generator"
~import "$std/io"

~[derive(parser)]event token {}
| number u64[\d+]
| plus void[\+]
| eof

~token.parse(stream: "123+456")
| number n |> std.io:print.ln("number: {{ n.value }}")
| plus _ |> std.io:print.ln("plus")
| eof _ |> std.io:print.ln("eof")
| error e |> std.io:print.ln("error: {{ e.message }}")
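
As in the opening example, running this should print number: 123; the parse event matches the first token and dispatches to the number branch.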

Pure Koru. No Zig boilerplate. Define your tokens, parse your input, handle your branches.

That’s the language we’re building.