Koru Compiler Architecture

The compiler that bootstraps itself

The Bootstrap Machine

Most compilers are static programs. The compiler reads your source, applies a fixed pipeline, and produces output. The pipeline was written by the language authors. You can’t change it. Every program you compile goes through the same one.

Koru’s frontend doesn’t compile your program. It generates a compiler for your program, and that generated compiler is then compiled and run.

This is not a subtle distinction. It changes what compilation can do.

The Pipeline

The frontend — what you invoke when you run koruc — is the bootstrap machine. It reads your source once, determines what kind of compiler your program needs, and generates that compiler as a Zig source file. This generated backend is then compiled into an executable and run. The backend is what actually does the compilation work: running transforms, analysis passes, dead-stripping, and emitting the final output.

Every program gets its own compiler, assembled specifically from what that program declared.

Annotations Are Instructions to the Bootstrap Machine

The frontend reads your event declarations and uses their annotations to decide what infrastructure to generate. Each annotation is a different instruction:

  • [transform] — generate dispatch stubs that pass the live AST to this event’s proc and replace matching invocations with its output. This is how ~if becomes a real if with zero overhead.
  • [comptime] — compile these procs into the backend, not the output binary. They run during compilation, with full access to the AST.
  • [norun] — this flow is data, never a call. Extract its source block as a raw string. Used for declaring build dependencies, route metadata, schemas — anything you want to sit in the AST for other passes to read.
  • [abstract] — this event has a default implementation that user code can replace. Wire up the override mechanism. Route to the user’s version if one is present.

You declare an event with the right annotation and the frontend generates the plumbing. No registration. No compiler modification required. The declaration is sufficient.

The standard library’s control flow — if, for, capture — is implemented as [transform] events in std.control, a Koru source file like any other. There is nothing special about them. Any user can declare their own [transform] event and get the same treatment.
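As a sketch of what that might look like, using the declaration syntax from the requires example later in this post: the unless event, the Expr and Block types, and the ast helpers are all invented for illustration, and whether transform bodies are written in Koru or Zig is not specified here.

~[transform] pub event unless { cond: Expr, body: Block }

~unless {
    // Hypothetical body: receives live AST nodes, returns replacement AST.
    // The frontend generates the dispatch stub that wires this in.
    return ast.if_node(ast.negate(cond), body);
}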

What the Generated Backend Can Do

The generated backend is a compiled Zig program with a main(). When it runs, it can do anything a program can do.

This is worth dwelling on, because it is genuinely different from every other compile-time evaluation mechanism we know of.

Zig’s comptime is powerful, but its restrictions are not a technical limitation — they are a design choice. Zig comptime cannot do I/O, cannot make system calls, cannot allocate with a general-purpose allocator. These restrictions exist specifically to make compilation deterministic and reproducible. Same source, same output. Always. The Zig team made that trade deliberately.

Rust’s proc macros are confined to a tokens-in, tokens-out interface, and sandboxing their execution is a long-standing goal of the ecosystem. C++ template metaprogramming is effectively a pure functional language with no side effects. Lisp macros can run arbitrary code at expansion time, but convention keeps them working on syntax rather than on the outside world. The pattern across mainstream languages is consistent: compile-time evaluation is constrained, by design or by convention, to keep it safe and predictable.

Koru makes the opposite bet. The backend can read files, make network calls, spawn subprocesses, query a database, print to stdout, read environment variables, and ask the user a question. The ceiling is whatever the operating system allows.

This is not an oversight. It is the point.

A compiler for a program that embeds API types can fetch a live OpenAPI spec during compilation. A compiler for a program that generates routing tables can query the database schema. A compiler for a program that targets a GPU can compile and validate shaders as part of the build. None of this requires language extensions or special compiler support. It requires declaring [comptime] procs that do the work, and the bootstrap machine handles the rest.
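A sketch of the first example, on the assumption that [comptime] procs are declared like other events: the event name, the http.get call, and the ast.splice and openapi.emit_types helpers are invented for illustration, not real Koru or Zig API.

~[comptime] pub event embed_api { url: Str }

~embed_api {
    // Runs inside the generated backend, during compilation, with full OS
    // access: this fetch is an ordinary network call from an ordinary program.
    const spec = http.get(url);
    ast.splice(openapi.emit_types(spec));
}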

Determinism Is a Non-Goal

Most build systems treat reproducible builds as a fundamental property. Same source, same environment, same output — every time.

Determinism in the Koru compilation pipeline is an explicit non-goal.

A backend that queries a live database during compilation should produce different output when the database changes. A backend that reads environment variables to configure codegen should produce different output in different environments. Restricting this in the name of reproducibility would be restricting what programs can do — trading capability for a property that not every program needs.

If you want deterministic output, write deterministic backend procs. The language will not take the capability away from you.

And here is the thing: getting a fully deterministic, reproducible pipeline is not a deep architectural commitment. It is a few lines of code. A coordinate proc that is a pure function of the AST — no I/O, no external calls, no environment reads — gives you everything a traditional compiler guarantees. You opt into it by writing it that way. You are not opting out of something the system enforces.
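What those few lines might look like, assuming a coordinate override can be written as a flow block in the same style as the requires example later in this post (the pass names and block syntax are hypothetical):

~std.compiler:coordinate = pure_pipeline(...)

~pure_pipeline {
    // A pure function of the AST: no file reads, no network calls, no
    // environment reads anywhere in these passes. Same source, same output.
    run_transforms();
    run_analysis();
    dead_strip();
    emit();
}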

This is the same philosophy as the rest of the language. The compiler doesn’t decide what your compilation pipeline does. You do.

The Compilation Pipeline Is User Code

The compiler passes themselves are orchestrated by a [comptime|abstract] event in the standard library: compiler:coordinate. The default coordinate pipeline runs the transform, analysis, dead-stripping, and emission passes in order. Because coordinate is [abstract], any source file can replace it:

~std.compiler:coordinate = my_pipeline(...)

The override is parsed by the frontend, compiled into the backend, and routed by the abstract/impl mechanism at runtime. You can add passes, reorder them, skip the optimizer, run additional analysis, or target a completely different output format.

The compilation pipeline is not inside the compiler. It is user code the bootstrap machine compiled into your compiler.
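A fuller override might add, drop, or reorder passes. The sketch below uses the same assumed flow-block syntax as the requires example later in this post, with invented pass names:

~std.compiler:coordinate = my_pipeline(...)

~my_pipeline {
    run_transforms();    // expand [transform] events first
    run_analysis();      // then validate the transformed program
    extra_lint();        // a user-defined pass: just another proc
    emit_wasm();         // or target a completely different output format
}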

Note also that the analysis passes — structure checking, purity analysis, phantom type validation — run in the backend after transforms have been applied. This is not incidental. You cannot check branch coverage on an ~if before it has been transformed into a conditional node. The architecture forces the right order: transform first, then validate the transformed program.

The Build Graph Is Self-Describing

The backend for each program needs to know which Zig modules to link. These dependencies are declared in compiler.kz itself, using [norun] flows that contain raw Zig build.zig fragments:

~[comptime|norun] pub event requires { source: Source }

~requires {
    // b, exe, and REL_TO_ROOT come from the backend build file that the
    // frontend assembles around these extracted fragments.
    const ast_module = b.createModule(.{
        .root_source_file = .{ .cwd_relative = REL_TO_ROOT ++ "/src/ast.zig" },
    });
    exe.addImport("ast", ast_module);
}

The frontend scans the AST for compiler:requires flows, extracts their source blocks, and assembles them into the backend’s build file. The compiler declares its own build dependencies in the same language it compiles.

Any module — including user modules — can declare backend dependencies the same way. If your [comptime] proc needs a Zig library, you declare it with ~compiler:requires and the build system discovers it by parsing your source.
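For instance, a user module whose [comptime] procs depend on a JSON library might declare the following, mirroring the compiler.kz fragment above (the library path and module name are hypothetical):

~compiler:requires {
    const json_module = b.createModule(.{
        .root_source_file = .{ .cwd_relative = "vendor/zig-json/src/json.zig" },
    });
    exe.addImport("json", json_module);
}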

Interactive Compilation

One consequence of the backend being a full program: it can be interactive.

Koru ships --inter, which launches a live session during compilation. The backend pauses at defined points — specified inside compiler:coordinate as a regular pipeline step — and opens an interface to the compilation context: the AST in its current state, the passes completed, the data accumulated so far.

This is not a debugger attached to the compiler from the outside. The interactive session is declared inside the pipeline alongside the analysis and emission passes. It is just another thing the backend can do, because the backend is a program.

What This Architecture Enables

The consequence of generating a per-program compiler is that the boundary between “the compiler” and “user code” does not exist at the structural level.

The standard library’s if and for transforms, the compiler’s own analysis passes, user-defined code generators — all are declared the same way, compiled by the same bootstrap machine, and run in the same backend. The standard library is written in the same Koru source format user modules use. The compilation pipeline uses the same event/flow system application code uses.

Adding a new compiler pass, a new transform, or a new code generation target is the same operation as adding any other event. What makes something “part of the compiler” is only which module it lives in — not how it participates in the system, and not any special privilege granted by the language.

The bootstrap machine makes it all possible. It reads what you declared, builds exactly the compiler those declarations require, and gets out of the way.