# Koru's Interpreter is 5x Slower Than Python
We built an interpreter for Koru. Here are the numbers:
| Runtime | Time | Correct |
|---|---|---|
| Python | 26ms | Yes |
| Koru interpreter | 123ms | Yes |
Koru is ~5x slower than Python.
Python. The language famous for being slow. And we’re 5x slower than that.
This is embarrassing. We’re not going to pretend otherwise.
## The Benchmark
Both runtimes execute the same task: nested for loops with 100,000 add operations.
Python:
```python
global_sum = 0

def add(a: str, b: str):
    global global_sum
    global_sum += int(a) + int(b)

for _ in range(1000):
    for i in range(100):
        add("0", str(i))

# Result: 4,950,000
```

Koru (interpreted at runtime):

```
~for(0..1000)
| each _ |> for(0..100)
| each i |> add(a: "0", b: i)
| done |> _
```

Same syntax as compiled Koru. Same parser. Just interpreted instead of compiled.
Both produce 4,950,000. Both verified correct. We made sure we’re comparing apples to apples.
## The API
Here’s a complete example — define events, register them, interpret a flow:
~import "$std/runtime"
~import "$std/interpreter"
~import "$std/io"
// Define your events
~pub event produce {}
| value { num: []const u8 }
~proc produce {
return .{ .value = .{ .num = "42" } };
}
~pub event consume { n: []const u8 }
| echoed { got: []const u8 }
// Pure Koru flow handler — no Zig proc needed!
~consume = std.io:print.ln("[consume] received n = '{{n:s}}'")
|> echoed { got: n }
// Register for runtime dispatch
~std.runtime:register(scope: "api") {
produce
consume
}
// The flow to interpret at runtime
const FLOW = "~produce() | value v |> consume(n: v.num)";
// Run it!
~std.interpreter:run(source: FLOW, dispatcher: dispatch_api)
| result r |> std.io:print.ln("Result: {{ r.value.branch:s }}")
| parse_error e |> std.io:print.ln("Parse error: {{ e.message }}") That’s it. ~40 lines. Define events, register them, interpret any flow string at runtime.
## Why We Built It
Koru is a compiled language. Our compiled numbers are fast — Zig-level performance.
But sometimes you need to run code that doesn’t exist at compile time:
- User input — accept Koru flows over HTTP
- Configuration — load behavior from files
- Scripting — let users extend your application
- AI agents — generate and execute flows dynamically
So we built an interpreter. This is our first attempt.
## What We Built in 24 Hours
Let’s be clear: this is a 24-hour effort to see what we could do. It works. It’s slow. There are known limitations.
### String-Only Inputs
The current dispatcher only handles string inputs:
```zig
@field(input, field.name) = getArg(args, field.name) orelse "";
```

This means you can’t define `~event add { a: i32, b: i32 }` for the interpreter. Everything has to be `[]const u8`, parsed at runtime.
Both Python and Koru are doing string→int conversion in the benchmark, so it’s apples-to-apples. But native integer dispatch would be faster.
## Where’s the Time Going?

### 1. Full Compiler Parser (~35μs per parse)
We’re using the exact same parser as the compiler. It’s built for:
- Full error recovery with detailed messages
- IDE support
- Complete AST with all metadata
Overkill for runtime interpretation.
### 2. AST Walking

Each continuation requires (sketched below):
- Looking up the branch in the result
- Binding captured values to the environment
- Recursively executing the next flow
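Roughly, the walk looks like this. A minimal sketch, assuming simplified node shapes: `FlowNode`, `Continuation`, and the `dispatch` stub are illustrative, not the interpreter's actual types.

```zig
const std = @import("std");

// Hypothetical, simplified node shapes, not Koru's actual AST.
const Result = struct { branch: []const u8, value: []const u8 };

const Continuation = struct {
    branch: []const u8, // which result branch this arm matches
    binding: []const u8, // name the captured value is bound to
    next: ?*const FlowNode,
};

const FlowNode = struct {
    event: []const u8,
    continuations: []const Continuation,
};

// One step of the walk: dispatch the event, find the matching branch,
// bind the captured value into the environment, recurse into the next flow.
fn execFlow(node: *const FlowNode, env: *std.StringHashMap([]const u8)) !Result {
    const result = dispatch(node.event);
    for (node.continuations) |cont| {
        if (std.mem.eql(u8, cont.branch, result.branch)) {
            try env.put(cont.binding, result.value); // hash-map insert on every step
            if (cont.next) |next| return execFlow(next, env);
            return result;
        }
    }
    return result; // no matching continuation: the flow is done
}

// Stub standing in for registry dispatch (see the next section).
fn dispatch(event: []const u8) Result {
    _ = event;
    return .{ .branch = "value", .value = "42" };
}
```

Every step pays for the branch scan, the environment insert, and the recursive call.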
### 3. HashMap-Based Environment

Variable bindings use `std.StringHashMap`. Every lookup pays for a hash plus a key comparison.

### 4. Dynamic Dispatch

Events dispatch through a registry, sketched below. Flexible, but not free.
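Registry-based dispatch looks something like this. A sketch only: `Handler`, `Result`, and the argument shape are illustrative, not the interpreter's real signatures.

```zig
const std = @import("std");

// Illustrative shapes, not the interpreter's real signatures.
const Result = struct { branch: []const u8, value: []const u8 };
const Handler = *const fn (args: []const []const u8) Result;

// Event name -> handler, initialized at startup. Every event invocation
// pays a hash lookup plus an indirect call the optimizer can't inline.
var registry: std.StringHashMap(Handler) = undefined;

fn dispatch(event: []const u8, args: []const []const u8) !Result {
    const handler = registry.get(event) orelse return error.UnknownEvent;
    return handler(args);
}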
## How We’ll Fix It
This is version one. Here’s what’s next:
### Lightweight Runtime Parser

Build a fast parser that (see the sketch after this list):
- Fails fast (no error recovery)
- Builds minimal AST
- Skips source location tracking
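For instance, a fail-fast scanner for flow strings can be little more than one switch, with no recovery paths and no location tracking. The token set and function shape here are illustrative, not the planned design.

```zig
const std = @import("std");

// Illustrative token set; a real scanner would also handle strings, numbers, etc.
const Token = union(enum) {
    tilde,
    pipe,
    arrow, // |>
    ident: []const u8,
    eof,
};

fn next(src: []const u8, pos: *usize) !Token {
    while (pos.* < src.len and src[pos.*] == ' ') pos.* += 1;
    if (pos.* >= src.len) return .eof;
    switch (src[pos.*]) {
        '~' => {
            pos.* += 1;
            return .tilde;
        },
        '|' => {
            pos.* += 1;
            if (pos.* < src.len and src[pos.*] == '>') {
                pos.* += 1;
                return .arrow;
            }
            return .pipe;
        },
        'a'...'z', 'A'...'Z', '_' => {
            const start = pos.*;
            while (pos.* < src.len and
                (std.ascii.isAlphanumeric(src[pos.*]) or src[pos.*] == '_')) pos.* += 1;
            return .{ .ident = src[start..pos.*] };
        },
        else => return error.UnexpectedCharacter, // fail fast: no recovery
    }
}
```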
### Pre-compiled Bytecode
Parse once, execute many:
```
~std.interpreter:compile(source: FLOW)
| bytecode bc |> cache.store(key: "my_flow", value: bc)
```

Skip parsing entirely for repeated executions.
### Environment Pooling

Reuse structures instead of allocating per-execution:

```zig
env.clearRetainingCapacity(); // Reset, don't reallocate
```
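A minimal sketch of what pooling could look like, assuming one reusable `std.StringHashMap` per worker (`EnvPool` is a hypothetical name):

```zig
const std = @import("std");

// Keep one environment alive across executions and reset it between runs,
// so repeated flows stop paying for bucket (re)allocation.
const EnvPool = struct {
    env: std.StringHashMap([]const u8),

    fn init(allocator: std.mem.Allocator) EnvPool {
        return .{ .env = std.StringHashMap([]const u8).init(allocator) };
    }

    fn acquire(self: *EnvPool) *std.StringHashMap([]const u8) {
        self.env.clearRetainingCapacity(); // drop entries, keep capacity
        return &self.env;
    }
};
```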
### Stack-Based Binding Lookup

For small scopes (most flows have fewer than 10 bindings), a linear array scan beats a hash lookup:
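A sketch under the assumption that scopes stay small enough for a fixed array; `SmallScope` and the 16-slot cap are illustrative, and a real implementation would spill to a map past the cap.

```zig
const std = @import("std");

const Binding = struct { name: []const u8, value: []const u8 };

// A fixed-capacity scope searched linearly. For a handful of bindings,
// scanning contiguous memory beats hashing a key and probing a table.
const SmallScope = struct {
    bindings: [16]Binding = undefined,
    len: usize = 0,

    fn put(self: *SmallScope, name: []const u8, value: []const u8) void {
        self.bindings[self.len] = .{ .name = name, .value = value };
        self.len += 1;
    }

    fn get(self: *const SmallScope, name: []const u8) ?[]const u8 {
        for (self.bindings[0..self.len]) |b| {
            if (std.mem.eql(u8, b.name, name)) return b.value;
        }
        return null;
    }
};
```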
### Native Integer Dispatch
Fix the dispatcher to handle typed inputs, not just strings. Avoid the parse/format overhead entirely.
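What typed binding could look like, as a hedged sketch: `Value` and `bind` are illustrative stand-ins, not the dispatcher's real API.

```zig
const std = @import("std");

// A tagged union the dispatcher would carry instead of raw strings.
const Value = union(enum) {
    int: i64,
    str: []const u8,
};

// Fill a typed input struct from named values: no parseInt at call time.
// Assumes the caller passed matching tags; a real dispatcher would check.
fn bind(comptime Input: type, names: []const []const u8, values: []const Value) !Input {
    var input: Input = undefined;
    inline for (std.meta.fields(Input)) |field| {
        var found = false;
        for (names, values) |name, value| {
            if (std.mem.eql(u8, name, field.name)) {
                @field(input, field.name) = switch (field.type) {
                    i64 => value.int,
                    []const u8 => value.str,
                    else => @compileError("unsupported input type"),
                };
                found = true;
            }
        }
        if (!found) return error.MissingArgument;
    }
    return input;
}
```

With this shape, the benchmark's add inputs would arrive as `.int` values and the per-call string-to-integer parse in the hot loop disappears.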
### Pre-allocated Flow State (The Big One)
This is Koru-specific and potentially game-changing.
When we parse a flow, we know ALL the bindings from the | branch binding |> syntax. And the dispatch table knows ALL the return types.
So instead of HashMap lookups at runtime:
```zig
// Current: slow
const user = env.get("user"); // hash, lookup, box/unbox
```

We compute a memory layout at parse time:
```zig
// Proposed: fast
const user: *User = @alignCast(@ptrCast(state.ptr + user_offset)); // direct memory
```

One allocation for the entire flow. Direct offset access. No hashing. No string comparison. Native types.
Python can’t do this: it doesn’t know types until runtime. We know them at parse time from the dispatch table. This is a Koru superpower.
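A minimal sketch of the idea, assuming fixed 8-byte slots for simplicity; `Slot`, `assignOffsets`, and `readI64` are illustrative names.

```zig
const std = @import("std");

// Illustrative only: real slots would have per-type sizes and alignment.
const Slot = struct { name: []const u8, offset: usize };

// Parse time: every binding the flow can produce gets a fixed offset
// into a single state buffer (8 bytes per slot here).
fn assignOffsets(names: []const []const u8, slots: []Slot) usize {
    var offset: usize = 0;
    for (names, slots) |name, *slot| {
        slot.* = .{ .name = name, .offset = offset };
        offset += 8;
    }
    return offset; // total size: one allocation for the whole flow
}

// Run time: direct offset access, no hashing, no string comparison.
fn readI64(state: []const u8, slot: Slot) i64 {
    return std.mem.readInt(i64, state[slot.offset..][0..8], .little);
}
```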
### JIT Compilation
The nuclear option: detect hot flows, compile them at runtime.
## The Honest Story
We built this in 24 hours. We have:
- A working interpreter — nested for loops, continuations, dispatch
- Same syntax as compiled Koru
- Verified correct results
- String-only inputs (limitation)
- Shamefully slow performance
We’re showing these numbers because we’re not hiding. This is day one. We’ll keep working on it.
The interpreter exists. It works. Now we make it fast.
The benchmark is in the test suite: `430_038_interpreter_nested_loop_benchmark`.