Koru's Interpreter is 5x Slower Than Python


We built an interpreter for Koru. Here are the numbers:

| Runtime          | Time  | Correct |
|------------------|-------|---------|
| Python           | 26ms  | Yes     |
| Koru interpreter | 123ms | Yes     |

Koru is ~5x slower than Python.

Python. The language famous for being slow. And we’re 5x slower than that.

This is embarrassing. We’re not going to pretend otherwise.

The Benchmark

Both runtimes execute the same task: nested for loops performing 100,000 add operations (1,000 outer × 100 inner iterations).

Python:

global_sum = 0

def add(a: str, b: str):
    global global_sum
    global_sum += int(a) + int(b)

for _ in range(1000):
    for i in range(100):
        add("0", str(i))

# Result: 4,950,000

Koru (interpreted at runtime):

~for(0..1000)
| each _ |> for(0..100)
    | each i |> add(a: "0", b: i)
| done |> _

Same syntax as compiled Koru. Same parser. Just interpreted instead of compiled.

Both produce 4,950,000. Both verified correct. We made sure we’re comparing apples to apples.
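For reference, here is how the Python side can be timed. This is a minimal sketch using `time.perf_counter`; the 26ms figure above came from the project's test suite, not from this script, so treat it as illustrative only:

```python
import time

global_sum = 0

def add(a: str, b: str):
    global global_sum
    global_sum += int(a) + int(b)

start = time.perf_counter()
for _ in range(1000):
    for i in range(100):
        add("0", str(i))
elapsed_ms = (time.perf_counter() - start) * 1000

# sum(0..99) = 4,950, times 1,000 outer iterations = 4,950,000
assert global_sum == 4_950_000
print(f"{elapsed_ms:.1f}ms, sum = {global_sum}")
```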

The API

Here’s a complete example — define events, register them, interpret a flow:

~import "$std/runtime"
~import "$std/interpreter"
~import "$std/io"

// Define your events
~pub event produce {}
| value { num: []const u8 }

~proc produce {
    return .{ .value = .{ .num = "42" } };
}

~pub event consume { n: []const u8 }
| echoed { got: []const u8 }

// Pure Koru flow handler — no Zig proc needed!
~consume = std.io:print.ln("[consume] received n = '{{n:s}}'")
    |> echoed { got: n }

// Register for runtime dispatch
~std.runtime:register(scope: "api") {
    produce
    consume
}

// The flow to interpret at runtime
const FLOW = "~produce() | value v |> consume(n: v.num)";

// Run it!
~std.interpreter:run(source: FLOW, dispatcher: dispatch_api)
| result r |> std.io:print.ln("Result: {{ r.value.branch:s }}")
| parse_error e |> std.io:print.ln("Parse error: {{ e.message }}")

That’s it. ~40 lines. Define events, register them, interpret any flow string at runtime.

Why We Built It

Koru is a compiled language. Our compiled numbers are fast — Zig-level performance.

But sometimes you need to run code that doesn’t exist at compile time:

  • User input — accept Koru flows over HTTP
  • Configuration — load behavior from files
  • Scripting — let users extend your application
  • AI agents — generate and execute flows dynamically

So we built an interpreter. This is our first attempt.

What We Built in 24 Hours

Let’s be clear: this is a 24-hour effort to see what we could do. It works. It’s slow. There are known limitations.

String-Only Inputs

The current dispatcher only handles string inputs:

@field(input, field.name) = getArg(args, field.name) orelse "";

This means you can’t define ~event add { a: i32, b: i32 } for the interpreter. Everything has to be []const u8, parsed at runtime.

Both Python and Koru are doing string→int conversion in the benchmark, so it’s apples-to-apples. But native integer dispatch would be faster.
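The cost difference is easy to see in miniature. A hypothetical Python illustration (not the actual Zig dispatcher): the string-only path pays a parse on every call, while typed dispatch is a single add.

```python
# String-only dispatch: every call parses its arguments.
def add_str(a: str, b: str) -> int:
    return int(a) + int(b)   # parse overhead on every call

# What native integer dispatch would do instead.
def add_int(a: int, b: int) -> int:
    return a + b             # no conversion, just the add

assert add_str("0", "41") == add_int(0, 41) == 41
```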

Where’s the Time Going?

1. Full Compiler Parser (~35μs per parse)

We’re using the exact same parser as the compiler. It’s built for:

  • Full error recovery with detailed messages
  • IDE support
  • Complete AST with all metadata

Overkill for runtime interpretation.

2. AST Walking

Each continuation requires:

  • Looking up the branch in the result
  • Binding captured values to the environment
  • Recursively executing the next flow
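The steps above can be sketched as a tiny tree-walker. This is a hypothetical Python model of one continuation step, not the actual Zig implementation; the flow shape and dispatch table are invented for illustration:

```python
# Hypothetical sketch of one continuation step: dispatch the event,
# match the result's branch, bind its payload, recurse into the next flow.
def run_flow(step, env, dispatch):
    result = dispatch[step["event"]](env, step["args"])   # dynamic dispatch
    branch, payload = result                              # e.g. ("value", {"num": "42"})
    cont = step["branches"].get(branch)                   # look up the branch
    if cont is None:
        return payload                                    # no continuation: done
    child_env = dict(env)                                 # new environment per step
    child_env[cont["binding"]] = payload                  # bind the captured value
    return run_flow(cont["next"], child_env, dispatch)    # recursively execute

# Modeling "~produce() | value v |> consume(n: v.num)":
dispatch = {
    "produce": lambda env, args: ("value", {"num": "42"}),
    "consume": lambda env, args: ("echoed", {"got": env["v"]["num"]}),
}
flow = {"event": "produce", "args": {},
        "branches": {"value": {"binding": "v",
                               "next": {"event": "consume", "args": {},
                                        "branches": {}}}}}
assert run_flow(flow, {}, dispatch) == {"got": "42"}
```

Every step allocates a fresh environment and walks the branch table, which is exactly where the later optimizations aim.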

3. HashMap-Based Environment

Variable bindings use std.StringHashMap. Every lookup is a hash + comparison.

4. Dynamic Dispatch

Events dispatch through a registry. Flexible, but not free.

How We’ll Fix It

This is version one. Here’s what’s next:

Lightweight Runtime Parser

Build a fast parser that:

  • Fails fast (no error recovery)
  • Builds minimal AST
  • Skips source location tracking
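A fail-fast parser can be a fraction of the compiler's size. A hypothetical Python sketch of the idea: raise on the first error, build only the fields the interpreter needs, track no source locations (the regex and output shape are invented for illustration):

```python
import re

# Hypothetical fail-fast parser for the head of a "~event(args)" flow step.
# No error recovery, no locations, no metadata -- just enough to execute.
STEP = re.compile(r"~(\w+)\((.*?)\)")

def parse_flow(source: str):
    m = STEP.match(source)
    if not m:
        raise ValueError("expected '~event(...)'")  # fail fast, no recovery
    return {"event": m.group(1), "args": m.group(2)}

assert parse_flow("~produce()") == {"event": "produce", "args": ""}
```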

Pre-compiled Bytecode

Parse once, execute many:

~std.interpreter:compile(source: FLOW)
| bytecode bc |> cache.store(key: "my_flow", value: bc)

Skip parsing entirely for repeated executions.
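The caching pattern itself is simple. A hypothetical Python sketch, where `compile_flow` stands in for the proposed `std.interpreter:compile`:

```python
# Hypothetical parse-once/execute-many cache, keyed by flow source.
_cache: dict[str, object] = {}

def compile_flow(source: str) -> object:
    return ("bytecode", source)     # placeholder for real compilation

def get_compiled(source: str):
    bc = _cache.get(source)
    if bc is None:                  # compile only on first sight
        bc = compile_flow(source)
        _cache[source] = bc
    return bc

first = get_compiled("~produce() | value v |> consume(n: v.num)")
second = get_compiled("~produce() | value v |> consume(n: v.num)")
assert first is second              # parsing skipped on repeat executions
```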

Environment Pooling

Reuse structures instead of allocating per-execution:

env.clear();  // Reset, don't reallocate
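The same idea in a hypothetical Python sketch: keep a pool of environments and reset them on release instead of allocating fresh ones per execution.

```python
# Hypothetical pooling sketch: reuse environment dicts across executions.
_pool: list[dict] = []

def acquire_env() -> dict:
    return _pool.pop() if _pool else {}

def release_env(env: dict) -> None:
    env.clear()                 # reset, don't reallocate
    _pool.append(env)

env = acquire_env()
env["user"] = "alice"
release_env(env)
assert acquire_env() is env     # the same dict comes back, already cleared
```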

Stack-Based Binding Lookup

For small scopes (most flows have fewer than 10 bindings), array scan beats hash lookup.
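A hypothetical sketch of the shape this takes: parallel arrays and a linear scan, avoiding per-lookup hashing entirely (illustrative Python, not the planned Zig structure).

```python
# Hypothetical small-scope environment: linear scan over parallel arrays.
class SmallEnv:
    def __init__(self):
        self.names: list[str] = []
        self.values: list[object] = []

    def set(self, name: str, value: object) -> None:
        self.names.append(name)
        self.values.append(value)

    def get(self, name: str) -> object:
        for i, n in enumerate(self.names):  # O(n), but n < 10 in practice
            if n == name:
                return self.values[i]
        raise KeyError(name)

env = SmallEnv()
env.set("user", {"id": 1})
assert env.get("user") == {"id": 1}
```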

Native Integer Dispatch

Fix the dispatcher to handle typed inputs, not just strings. Avoid the parse/format overhead entirely.

Pre-allocated Flow State (The Big One)

This is Koru-specific and potentially game-changing.

When we parse a flow, we know ALL the bindings from the | branch binding |> syntax. And the dispatch table knows ALL the return types.

So instead of HashMap lookups at runtime:

// Current: slow
const user = env.get("user");  // hash, lookup, box/unbox

We compute a memory layout at parse time:

// Proposed: fast
const user = @ptrCast(*User, state[user_offset..]);  // direct memory

One allocation for the entire flow. Direct offset access. No hashing. No string comparison. Native types.

Python can’t do this: it doesn’t know types until runtime. We know them at parse time from the dispatch table. This is a Koru superpower.
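The idea translated into a hypothetical Python sketch: assign each binding a slot index at parse time, then index a flat pre-allocated state array at runtime. No hashing, no string comparison (the helper names here are invented for illustration):

```python
# Hypothetical parse-time slot assignment: bindings become integer
# offsets into one flat, pre-allocated state array.
def assign_slots(bindings: list[str]) -> dict[str, int]:
    return {name: i for i, name in enumerate(bindings)}  # done once, at parse time

slots = assign_slots(["v", "user"])   # known from "| branch binding |>" syntax
state = [None] * len(slots)           # one allocation for the entire flow

USER = slots["user"]                  # bake the offset into the "bytecode"
state[USER] = {"id": 1}
assert state[USER] == {"id": 1}       # direct indexed access at runtime
```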

JIT Compilation

The nuclear option: detect hot flows, compile them at runtime.

The Honest Story

We built this in 24 hours. We have:

  • A working interpreter — nested for loops, continuations, dispatch
  • Same syntax as compiled Koru
  • Verified correct results
  • String-only inputs (limitation)
  • Shamefully slow performance

We’re showing these numbers because we’re not hiding. This is day one. We’ll keep working on it.

The interpreter exists. It works. Now we make it fast.


The benchmark is in the test suite: 430_038_interpreter_nested_loop_benchmark