From Magic to AST: Zero-Cost Event Taps

The breakthrough: Event taps are no longer magical runtime injection. They’re regular AST nodes that the optimizer can see, inline, and eliminate. This transforms taps from a debugging curiosity into a foundational tool for progressive optimization.


The Problem: Magic Code Generation

Originally, event taps were implemented at emission time: when the compiler generated Zig code, it injected tap invocations directly into the output. This worked, but it had serious limitations:

1. The Optimizer Was Blind

// What the emitter generated:
pub fn add_five(input: AddFiveInput) AddFiveOutput {
    const result = input.x + 5;

    // TAP INJECTED HERE (invisible to optimizer!)
    log_arithmetic(.{ .operation = "add", .value = result });

    return .{ .done = .{ .result = result } };
}

The Zig optimizer never saw this as part of the AST. It couldn’t:

  • Inline the tap call
  • Constant-fold the arguments
  • Eliminate dead taps
  • Reason about side effects

Worse: Opaque Code Blocks Optimization

This wasn’t just “taps can’t be optimized.” Opaque code prevents the optimizer from optimizing anything around it.

When the optimizer encounters a function call, it has to ask: “What does this do? Does it modify global state? Does it have side effects? Is it pure?”

With emission-time taps, the optimizer saw:

pub fn compute() i32 {
    const x = expensive_calculation();

    // OPAQUE CALL - optimizer knows NOTHING about this!
    unknown_tap_function(x);

    const y = expensive_calculation();  // Could this be CSE'd with x?
    return x + y;
}

The optimizer has to assume the worst:

  • “Maybe unknown_tap_function() modified global state?”
  • “Maybe it invalidated the result of expensive_calculation()?”
  • “I can’t eliminate redundant calculations because I don’t know if that tap changed things!”

So it couldn’t apply any of the following (see the sketch after this list):

  • Common Subexpression Elimination (CSE): Can’t reuse x for y because tap might have side effects
  • Loop-Invariant Code Motion: Can’t hoist calculations out of loops if taps are inside
  • Function Inlining: Harder to inline functions that contain opaque taps
  • Dead Store Elimination: Can’t eliminate unused variables if taps might read them

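For contrast, here is a minimal sketch, in plain Zig with made-up names, of what the optimizer is free to emit once the tap call is visible and known to be pure. The second expensive_calculation() simply collapses into a reuse of x:

fn expensive_calculation() i32 {
    return 21; // stand-in for real work the optimizer would love to reuse
}

fn known_pure_tap(value: i32) void {
    _ = value; // observation only: no globals touched, no side effects
}

pub fn compute() i32 {
    const x = expensive_calculation();
    known_pure_tap(x); // visible and provably pure, so it blocks nothing
    return x + x;      // the redundant second call was CSE'd into a reuse of x
}
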
Real example where taps broke optimization:

~event fibonacci { n: i32 } | done { result: i32 }

~proc fibonacci {
    if (n <= 1) return .{ .done = .{ .result = n } };
    const a = fibonacci(n: n - 1);  // Recursive call
    const b = fibonacci(n: n - 2);  // Another recursive call
    return .{ .done = .{ .result = a.done.result + b.done.result } };
}

// Tap: Profile fibonacci calls
~fibonacci -> *
| done d |> profile(n: d.result)

With emission-time taps, the optimizer couldn’t memoize the fibonacci results because:

  1. Each call had an opaque profile() tap injection
  2. Optimizer didn’t know if profile() was pure
  3. Had to assume it might modify state or have side effects
  4. Couldn’t eliminate redundant calculations

Result: Fibonacci stayed O(2^n) even when it should have been memoizable!

2. Hard to Reason About

When you read Koru source code, you saw:

~proc add_five {
    const result = x + 5;
    return .{ .done = .{ .result = result } };
}

But when you ran it, there was invisible code executing. The gap between source and execution was hidden in the emitter’s magic.

3. Not Composable

Other compiler passes couldn’t work with taps because they didn’t exist in the AST. Want to analyze tap invocations? Transform them? Optimize based on them? You couldn’t - they were emission-time phantoms.


The Breakthrough: Taps as AST Nodes

Now, event taps are transformed into regular AST nodes before emission. This happens as a dedicated compiler pass:

~compiler.coordinate.default =
    compiler.context.create(ast: ast, allocator: allocator)
    | created c0 |> compiler.coordinate.frontend(ctx: c0.ctx)
      | continued c1 |> compiler.coordinate.transform_taps(ctx: c1.ctx)  // ← HERE!
        | continued c2 |> compiler.coordinate.analysis(ctx: c2.ctx)
          | continued c3 |> compiler.coordinate.optimize(ctx: c3.ctx)    // ← Can see taps!
            | continued c4 |> compiler.coordinate.emission(ctx: c4.ctx)
              | continued c5 |> ...

The transform_taps pass:

  1. Scans the AST for tap declarations
  2. Builds a tap registry (which taps apply to which events/branches)
  3. Walks all flows and inserts tap invocations as regular flow steps
  4. Returns a transformed AST with taps fully materialized

After this pass, taps are no longer special. They’re just regular event invocations in the flow, visible to every subsequent pass.
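
As a rough sketch of the matching step (the types and field names below are illustrative assumptions, not Koru’s actual registry API), a tap entry matches a continuation when the canonical event path agrees and the tap’s branch is either the same branch or the * wildcard:

const std = @import("std");

// Illustrative stand-ins for registry entries; Koru's real types will differ.
const TapEntry = struct {
    event_path: []const u8, // canonical, e.g. "main:add_five"
    branch: ?[]const u8,    // e.g. "done", or null for the "*" wildcard
};

fn tapMatches(entry: TapEntry, event_path: []const u8, branch: []const u8) bool {
    if (!std.mem.eql(u8, entry.event_path, event_path)) return false;
    // A null branch stands in for "*": the tap fires on every branch.
    return entry.branch == null or std.mem.eql(u8, entry.branch.?, branch);
}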


The Code: How It Works

Here’s the core transformation algorithm from src/tap_transformer.zig:

pub fn transformAst(
    source_ast: *const ast.Program,
    tap_registry: *TapRegistry,
    allocator: std.mem.Allocator,
) !*ast.Program {
    // 1. Build tap registry from tap declarations
    // 2. Walk all flows and subflows
    // 3. For each continuation, check if any taps match
    // 4. Insert matching taps as regular Steps in the continuation pipeline

    const transformed_items = try transformItems(
        source_ast.items,
        tap_registry,
        allocator,
        source_ast.main_module_name,
    );

    return create_program_with(transformed_items);
}

Key insight: After transformation, a tap invocation is indistinguishable from a regular event call. It’s just Step{ .invocation = tap_event } in the continuation pipeline.
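
Splicing is equally mundane. Here is a hedged sketch, assuming a simplified Step type rather than Koru’s real AST node, of prepending matched tap steps to the user’s continuation pipeline:

const std = @import("std");

const Step = struct { invocation: []const u8 }; // simplified stand-in

fn spliceTaps(
    allocator: std.mem.Allocator,
    tap_steps: []const Step,
    original_steps: []const Step,
) ![]Step {
    // Tap steps run first, then the user's original continuation steps.
    const pipeline = try allocator.alloc(Step, tap_steps.len + original_steps.len);
    @memcpy(pipeline[0..tap_steps.len], tap_steps);
    @memcpy(pipeline[tap_steps.len..], original_steps);
    return pipeline;
}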


Example: Before and After

Original Koru source:

~event add_five { x: i32 } | done { result: i32 }

~proc add_five {
    const result = x + 5;
    return .{ .done = .{ .result = result } };
}

// Tap: Log when add_five completes
~add_five -> *
| done d |> log_arithmetic(operation: "add", value: d.result)

// User code
~add_five(x: 10)
| done d |> println(text: "Result: {d}", d.result)

After AST transformation (conceptual representation):

~add_five(x: 10)
| done d |>
    // TAP INSERTED AS REGULAR FLOW STEP!
    log_arithmetic(operation: "add", value: d.result) |>
    // Original continuation
    println(text: "Result: {d}", d.result)

After optimization (what Zig sees):

pub fn add_five(input: AddFiveInput) AddFiveOutput {
    const result = input.x + 5;
    return .{ .done = .{ .result = result } };
}

pub fn main_flow() void {
    const add_result = add_five(.{ .x = 10 });

    // Optimizer inlined the tap!
    log_arithmetic(.{
        .operation = "add",
        .value = add_result.done.result  // Constant folded to 15!
    });

    println(.{ .text = "Result: 15" });
}

The tap is now just a regular function call that the Zig optimizer can see and optimize.


Why This Is Powerful: Zero-Cost Abstraction

With taps as AST nodes, the optimizer can finally see them and reason about their purity. This unblocks optimizations that were previously impossible:

The optimizer now knows:

  • “This tap is pure (no side effects) → I can eliminate redundant calls”
  • “This tap doesn’t modify state → I can reorder operations”
  • “This tap is unused → I can eliminate it entirely”
  • “This tap is simple → I can inline it”

What Now Works That Was Broken Before

Remember that fibonacci example? Now the optimizer can see the profile() tap is pure (it just logs data). So it can:

// After AST transformation, optimizer sees:
pub fn fibonacci(n: i32) i32 {
    if (n <= 1) return n;

    // TAP VISIBLE AS REGULAR CALL!
    const result = fibonacci(n - 1) + fibonacci(n - 2);
    profile(.{ .n = result });  // ← Optimizer knows this is pure!

    return result;
}

Because profile() is visible and marked pure, the optimizer can now:

  • Memoize fibonacci results (tap doesn’t invalidate cache)
  • Inline the profile call (it’s just a print statement)
  • Eliminate the tap entirely in release builds (dead code elimination)

Result: Fibonacci goes from O(2^n) → O(n) with memoization, and the profiling tap costs ZERO in production!
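
Here is a sketch of the memoized shape this enables. This is hand-written illustration, not actual compiler output; the fixed-size cache, the profile stub, and the bound on n are assumptions:

const std = @import("std");

var memo = [_]i64{-1} ** 64; // toy cache; assumes n < 64

fn profile(n: i64) void {
    // Stand-in for the tap body: observation only, nothing the math depends on.
    std.debug.print("fib result: {d}\n", .{n});
}

pub fn fibonacci(n: usize) i64 {
    if (n <= 1) return @intCast(n);
    if (memo[n] >= 0) return memo[n]; // cache hit: no recomputation, no re-profiling
    const result = fibonacci(n - 1) + fibonacci(n - 2);
    memo[n] = result;
    profile(result); // the tap, now an ordinary, inlinable, eliminable call
    return result;
}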

Specific Optimizations That Now Work

1. Inline Tap Invocations

If a tap just logs a value, the optimizer can inline it:

// Before optimization
tap_log(value: result);

// After inlining
std.debug.print("Value: {}
", .{result});

2. Constant Fold Tap Arguments

If tap arguments are compile-time known:

// Before
tap_profile(operation: "add", value: 10 + 5);

// After constant folding
tap_profile(operation: "add", value: 15);

3. Dead Code Elimination

If a tap’s effects are unused, eliminate it entirely:

// The tap call is guarded at compile time; when no profile handler is
// registered, the whole branch folds away and the tap costs nothing:
if (comptime has_profile_handler()) {
    tap_profile(operation: "add", value: 15);
}

4. Conditional Tap Execution

Taps with when clauses become regular conditional blocks in the AST:

~compute -> *
| done when (done.value > 100) |> log_expensive()

Transforms to:

if (compute_result.done.value > 100) {
    log_expensive();
}

The optimizer can reason about the condition and potentially eliminate the branch.


Technical Deep Dive

Binding Rewrite

When a tap references the continuation binding, we rewrite it:

// Tap declaration uses 'd' binding
~add -> *
| done d |> log(value: d.result)

// User code uses 'result' binding
~add(x: 5)
| done result |> process(result)

The transformer rewrites tap arguments:

  • d.result → result.result
  • Ensures tap code uses the actual continuation binding

Code (src/tap_transformer.zig:483):

fn rewriteBindingInValue(
    value: []const u8,
    old_binding: []const u8,
    new_binding: []const u8,
    allocator: std.mem.Allocator,
) ![]const u8 {
    // Replace "old_binding." with "new_binding."
    // Example: "d.result" → "result.result"
}
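
A minimal sketch of one way such a prefix rewrite could be implemented, using Zig’s standard allocator-based string helpers. This is illustrative, not the body of Koru’s rewriteBindingInValue:

const std = @import("std");

fn rewriteBindingSketch(
    allocator: std.mem.Allocator,
    value: []const u8,
    old_binding: []const u8,
    new_binding: []const u8,
) ![]u8 {
    // Rewrite member accesses: replace every "old_binding." with "new_binding.".
    // A real implementation would also guard against matches inside longer identifiers.
    const old_prefix = try std.fmt.allocPrint(allocator, "{s}.", .{old_binding});
    defer allocator.free(old_prefix);
    const new_prefix = try std.fmt.allocPrint(allocator, "{s}.", .{new_binding});
    defer allocator.free(new_prefix);
    return std.mem.replaceOwned(u8, allocator, value, old_prefix, new_prefix);
}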

Conditional Blocks for When Clauses

Taps with when clauses are wrapped in conditional_block AST nodes:

// Tap with when clause
new_pipeline[idx] = ast.Step{ .conditional_block = .{
    .condition = rewritten_condition,  // "done.value > 100"
    .steps = rewritten_tap_steps,      // [log_expensive()]
}};

This is a regular AST node that:

  • The type checker validates
  • The optimizer can analyze
  • The emitter generates as if (condition) { ... }

Module Canonicalization

Before tap transformation, all event paths are canonicalized with module qualifiers:

// Before: Ambiguous
~add() | done |> ...

// After canonicalization: Unambiguous
~main:add() | done |> ...

This allows taps to match correctly across module boundaries:

~module:event -> *
| branch |> tap_handler()

Code (src/tap_transformer.zig:534):

fn pathToString(path: ast.DottedPath, allocator: std.mem.Allocator) ![]const u8 {
    // After canonicalization, module_qualifier is ALWAYS present
    const mq = path.module_qualifier orelse {
        return error.MissingModuleQualifier;  // Should never happen!
    };

    // Build: "module:event.path"
    return module_qualified_path;
}
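
A hedged sketch of assembling the "module:event.path" string; the field names come from the snippet above, and std.mem.join plus std.fmt.allocPrint stand in for whatever the real pass uses:

const std = @import("std");

fn buildQualifiedPath(
    allocator: std.mem.Allocator,
    module_qualifier: []const u8,
    segments: []const []const u8,
) ![]u8 {
    // Join the event path segments with '.', then prefix "module:".
    const event_path = try std.mem.join(allocator, ".", segments);
    defer allocator.free(event_path);
    return std.fmt.allocPrint(allocator, "{s}:{s}", .{ module_qualifier, event_path });
}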

The Vision: Everything Composes

This architectural change embodies a core Koru principle: the compiler is just event flows.

Before: Taps Were Special

Parser → AST → Analysis → [MAGIC TAP EMISSION] → Optimizer → Emitter
                                    ↑ Invisible to everything

After: Taps Are Regular Code

Parser → AST → Tap Transform → Analysis → Optimizer → Emitter
                     ↓              ↓           ↓
                 (AST nodes)  (validates) (optimizes)

Every pass sees the same thing. Taps are just event invocations. The optimizer works on real AST nodes, not emission-time ghosts.

This means:

  • Custom compiler passes can analyze taps
  • Optimization passes can transform taps
  • Type checker validates tap arguments
  • Everything is explicit, nothing is magic

Progressive Optimization + Taps = POWER

Combine this with progressive optimization and you get something extraordinary:

// ROM translation event with profile tap
~6502:LDA -> *
| done d |> profile_instruction(op: "LDA", cycles: d.cycles)

// Week 1: Interpreter baseline
~proc 6502:LDA {
    cpu.accumulator = read_memory(cpu.pc + 1);
    return .{ .done = .{ .cycles = 2 } };
}

// Week 4: Profiler shows "LDA from zero page is 60% of runtime"
// Add optimized variant:
~proc 6502:LDA|zig(zero_page_hot) {
    // Hand-written native code for hot path
    @as(*u8, @ptrFromInt(cpu.accumulator_addr)).* =
        @as(*u8, @ptrFromInt(cpu.zero_page_addr)).*;
    return .{ .done = .{ .cycles = 2 } };
}

The profile tap:

  1. Fires automatically (inserted by AST transformation)
  2. Costs nothing in release (optimizer eliminates when profiling is disabled)
  3. Guides optimization (shows you where to focus)
  4. Validates equivalence (both variants fire same taps, so property testing can compare)

Implementation Status

Completed (as of October 2025):

  • ✅ Core AST transformation pass
  • ✅ Binding rewrite system
  • ✅ Conditional block support for when clauses
  • ✅ Module canonicalization pass
  • ✅ Integration with compiler pipeline
  • ✅ Nested continuation support

In Progress:

  • 🔨 Tap matching improvements (destination event awareness)
  • 🔨 Property testing framework for optimization validation
  • 🔨 Profile-guided optimization hooks

Future:

  • 📋 Structural geohashing for tap call sites (refactor-resistant optimization tracking)
  • 📋 Multi-variant compilation with automatic property testing
  • 📋 Tap-based runtime verification system

The Lesson: Make Abstractions Visible

The old approach was magical - taps appeared at runtime without being in the source. Magic is fun, but it doesn’t compose.

The new approach is explicit - taps are AST nodes that every pass can see and work with. Explicitness enables composition.

Zero-cost abstraction isn’t just about optimizing the abstraction itself. It’s about making abstractions visible to the optimizer so they don’t block optimization of everything around them.

The critical insight: Opaque code isn’t just unoptimizable - it actively prevents optimization. When the optimizer can’t see what a function does, it has to assume the worst case, blocking CSE, loop hoisting, memoization, and inlining.

Making taps visible as AST nodes means:

  1. The tap itself can be optimized (inlined, eliminated, constant-folded)
  2. Code around the tap can ALSO be optimized (CSE, memoization, reordering)
  3. The optimizer can reason about purity and side effects
  4. Everything composes because everything is explicit

Koru’s event taps are now true zero-cost abstractions:

  • No runtime cost when optimized away
  • No conceptual cost because they’re explicit in the AST
  • No composition cost because every pass sees them
  • No optimization cost because they don’t block other optimizations


What’s Next?

This architectural foundation enables:

  1. Progressive Optimization: Use taps to profile hot paths, then optimize incrementally
  2. Property Testing: Validate optimized variants against baseline by comparing tap traces
  3. Runtime Verification: Keep lightweight taps in production to detect anomalies
  4. Emulation Performance: Translate ROM → Koru events → profile → optimize → native speed

Koru: Where compiler coding is application coding, and everything is just event flows.

Published October 29, 2025