The Producer Doesn't Need To Know
The benchmark results that started this conversation:
Koru Taps: 8.5ms
Zig MPMC Ring: 88ms
Go Channels: 681ms
Rust Crossbeam: 157ms
Event taps are roughly 90% faster than a hand-tuned lock-free MPMC ring.
Before you tweet that: this is not an apples-to-apples comparison. And that’s exactly the point.
The Honest Disclaimer
Let’s be completely transparent about what these benchmarks actually measure:
| Taps Version | Ring Version |
|---|---|
| Single-threaded | Multi-threaded |
| No synchronization | Lock-free atomics |
| Inline function calls | Cross-thread communication |
| ~80 lines of Koru | ~230 lines of Koru+Zig |
The taps version doesn’t DO what the ring version does. It’s like saying “a bicycle is faster than a truck” - true, but you can’t haul cargo on a bicycle.
So why are we even comparing them?
They Solve The Same Pattern
Both implementations solve the Producer/Consumer Observation Pattern:
- Something happens (producer emits values)
- Something else reacts (consumer accumulates them)
- At the end, we validate the result
The ring version assumes you need:
- Thread separation
- Buffering for backpressure
- Lock-free synchronization
- Cross-thread communication
But what if you don’t? What if your observer can run inline? What if you just want to watch what’s happening without introducing all that machinery?
The insight: We’ve been conflating “observation” with “transport” for decades.
The Revolutionary Inversion
Here’s how traditional concurrent programming works:
Producer decides → "I'll put values on a channel"
Consumer must adapt → "Okay, I'll read from that channel"
Want to change transport? → Rewrite the producer!
This couples your producer to a specific communication mechanism. Every producer has to decide: channels? Rings? Callbacks? Message queues?
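As a minimal sketch of that coupled shape (in Python, not Koru - `produce` and the sentinel protocol are invented for illustration), notice how the producer itself owns the queue and the end-of-stream convention, so swapping the transport means rewriting the producer:

```python
import queue

# The traditionally coupled shape: the producer owns the transport.
# Changing from a queue to, say, a callback means rewriting produce().
def produce(q: queue.Queue, messages: int) -> None:
    for i in range(messages):
        q.put(i)       # transport decision baked into the producer
    q.put(None)        # "done" sentinel: also a transport detail

q: queue.Queue = queue.Queue()
produce(q, 5)
values = []
while (v := q.get()) is not None:
    values.append(v)
print(values)  # [0, 1, 2, 3, 4]
```

Every consumer of `produce` must now speak "queue with a None sentinel," whether or not it needs buffering at all.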
Koru taps invert this entirely:
Producer just emits → ~count() returns .next or .done
Observer decides → "I'll tap that inline / via threadpool / via channel"
Want to change transport? → Change the tap, not the producer!
Here's what this looks like in practice:
// The producer - IDENTICAL in all cases
~event count { i: u64 }
| next { value: u64 }
| done {}
~proc count {
if (i >= MESSAGES) return .{ .done = .{} };
return .{ .next = .{ .value = i } };
}
// Observer Option A: Inline (what we benchmarked - zero overhead)
~count -> * | next v |> accumulate(value: v.value)
// Observer Option B: Threadpool (async processing)
~count -> * | next v |> pool.submit(work: v)
// Observer Option C: Channel (buffered, decoupled)
~count -> * | next v |> ring.enqueue(value: v.value)
The producer code is identical in all three cases. The observer makes the transport decision.
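The same inversion can be sketched in Python (a hypothetical analogue, not Koru): the producer is a plain generator that only emits, and each observer independently decides whether to consume inline or through a cross-thread queue:

```python
import queue
import threading

# Stand-in for the Koru producer: it only emits values and knows
# nothing about how they are consumed.
def count(messages):
    for i in range(messages):
        yield i  # plays the role of the "next" branch
    # generator exhaustion plays the role of "done"

# Observer Option A: inline accumulation - no machinery at all
def observe_inline(messages):
    total = 0
    for v in count(messages):
        total += v
    return total

# Observer Option C: buffered and decoupled via a queue on another thread
def observe_via_queue(messages):
    q = queue.Queue()
    total = 0

    def consumer():
        nonlocal total
        while True:
            v = q.get()
            if v is None:  # sentinel stands in for "done"
                return
            total += v

    t = threading.Thread(target=consumer)
    t.start()
    for v in count(messages):  # the SAME producer, untouched
        q.put(v)
    q.put(None)
    t.join()
    return total

print(observe_inline(1000))     # 499500
print(observe_via_queue(1000))  # 499500
```

Both observers produce the same result; only the second pays for threads and buffering, and the producer never changed.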
Back To The Benchmark
Now the benchmark makes sense. We compared:
- Taps version: Observer says “I’ll just accumulate inline”
- Ring version: Observer says “I need cross-thread buffering”
The taps version is faster because it’s doing less work - and that’s exactly right. When you don’t need threading, you shouldn’t pay for it.
The benchmark isn’t proving taps are “better than channels.” It’s demonstrating what happens when the observer can choose the minimum viable transport.
Observation Fidelity: Choose Your Level
Taps don’t just let you choose transport. They let you choose how much information you want to observe:
// TRANSITION: Just the facts (source, destination, branch as enums)
// Zero string allocations, maximum performance
~count -> * | Transition t |> stats.increment(event: t.source)
// PROFILE: Timing information (strings, timestamps)
// For profiling and tracing
~count -> * | Profile p |> trace.record(name: p.source, ts: p.timestamp_ns)
// AUDIT: Full payload access (complete event data)
// For logging, debugging, event sourcing
~count -> * | Audit a |> log.write(event: a.source, data: a.payload)
This creates a 2D matrix of observation strategies:
| | Inline | Threadpool | Channel |
|---|---|---|---|
| Transition | Counters | Async metrics | Buffered stats |
| Profile | Inline profiler | Async tracing | Log aggregation |
| Audit | Debug logging | Async audit | Full event sourcing |
The producer is unaware of ALL of this.
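To make the fidelity levels concrete, here is a hypothetical Python mirror of the three views (the names Transition, Profile, and Audit follow the article; the field shapes and observer functions are invented):

```python
import time
from dataclasses import dataclass
from enum import Enum, auto

class Source(Enum):
    COUNT = auto()
    DONE = auto()

@dataclass
class Transition:        # just the facts: enums, no string allocations
    source: Source

@dataclass
class Profile:           # adds timing information
    source: str
    timestamp_ns: int

@dataclass
class Audit:             # full payload access
    source: str
    payload: dict

stats: dict = {}
def on_transition(t: Transition):
    stats[t.source] = stats.get(t.source, 0) + 1

trace: list = []
def on_profile(p: Profile):
    trace.append((p.source, p.timestamp_ns))

log: list = []
def on_audit(a: Audit):
    log.append((a.source, a.payload))

# The same emissions, observed at three fidelities
for i in range(3):
    on_transition(Transition(Source.COUNT))
    on_profile(Profile("count", time.monotonic_ns()))
    on_audit(Audit("count", {"value": i}))

print(stats[Source.COUNT])  # 3
print(log[-1])              # ('count', {'value': 2})
```

Each observer sees only as much as it asked for: the counter never touches a string or a payload, while the audit log keeps everything.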
Real Example: Full Profiler in 3 Taps
Here’s profiler.kz from the Koru standard library - a complete Chrome Tracing profiler:
// Start profiling when program starts
~[opaque]tap(koru:start -> *)
| done |> write_header()
| done |> _
// Profile EVERY event transition in the entire program
~[opaque]tap(* -> *)
| Profile p |> write_event(source: p.source, timestamp_ns: p.timestamp_ns)
| done |> _
// End profiling when program ends
~[opaque]tap(koru:end -> *)
| done |> write_footer()
| done |> _
Three taps. That's the whole profiler.
The ~* -> * syntax means “tap ALL event transitions.” Every event in your entire program fires through this tap.
Usage:
~import "$std/profiler"
// Your entire app is now profiled for Chrome Tracing
The application code doesn't know it's being profiled. The profiler just observes.
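For intuition about what write_header/write_event/write_footer would emit, here is a minimal Python sketch of the Chrome Trace Event Format - a JSON array of events with name/ph/ts/pid/tid fields. The TraceWriter class and its method bodies are invented; only the output format follows Chrome's spec:

```python
import json
import time

class TraceWriter:
    def __init__(self):
        self.events = []

    def write_header(self):
        self.events = []  # start a fresh trace

    def write_event(self, source: str, timestamp_ns: int):
        self.events.append({
            "name": source,
            "ph": "i",                  # "instant" event phase
            "ts": timestamp_ns / 1000,  # Chrome expects microseconds
            "pid": 1,
            "tid": 1,
        })

    def write_footer(self) -> str:
        # The resulting JSON can be loaded in chrome://tracing
        return json.dumps(self.events)

w = TraceWriter()
w.write_header()
w.write_event("count", time.monotonic_ns())
w.write_event("count", time.monotonic_ns())
out = w.write_footer()
print(json.loads(out)[0]["name"])  # count
```

The point stands regardless of the writer's internals: the application emits events, and this observer happens to serialize them into a trace file.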
Opting Out: [opaque]
Notice the [opaque] annotation on the profiler taps. This serves two purposes:
1. Prevents infinite recursion - The profiler's own events (write_header, write_event, write_footer) can't be tapped by other observers, including itself.
2. Privacy control - Any event can opt out of being observed:
// This event cannot be tapped by external observers
~[opaque] event internal_crypto_operation { key: []u8 }
Use cases:
- Security-sensitive operations
- Performance-critical hot paths
- Internal implementation details
- Preventing observation loops
Why Taps Are Truly Zero-Cost: AST Rewriting
This isn’t some runtime hook system. Taps rewrite your AST at compile time.
When you write:
~count -> * | next v |> accumulate(value: v.value)
The compiler's tap transformer pass inserts the tap invocation directly into the flow's AST:
BEFORE tap injection:
~count() | next n |> @loop(...)
AFTER tap injection:
~count() | next n |> accumulate(value: n.value) |> @loop(...)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                     COMPILER INSERTED THIS
This means taps:
- Participate in all optimization passes (purity checking, fusion, dead code elimination)
- Are visible to the type checker (arguments are validated)
- Can be inlined (the optimizer sees them as regular code)
- Are eliminated when unused (dead code elimination works normally)
Compare this to traditional observer patterns where callbacks are opaque function pointers the compiler can’t reason about.
The Generated Code
Here’s what our tap benchmark actually compiled to (from output_emitted.zig):
var loop_i: u64 = 0;
var result_1 = main_module.count_event.handler(.{ .i = loop_i });
loop: while (result_1 == .next) {
const n = result_1.next;
// TAP INLINED HERE - just a function call!
const result_2 = main_module.accumulate_event.handler(.{ .value = n.value });
_ = &result_2;
loop_i = n.value + 1;
result_1 = main_module.count_event.handler(.{ .i = loop_i });
continue :loop;
}
// TAP FOR DONE BRANCH
_ = main_module.validate_event.handler(.{});
No vtables. No dispatch. No runtime registration. Just a function call that the Zig optimizer can inline further.
Greppability and Tooling
Every tap in your codebase is discoverable:
# Find all taps
grep '~.*->' **/*.kz
# Find all opaque events
grep '\[opaque\]' **/*.kz
# Find all taps on a specific event
grep '~count ->' **/*.kz
Because taps are AST nodes, tooling can:
- Show “who’s watching this event?”
- Warn about unused taps
- Verify tap chains for correctness
- Visualize observation topology
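A toy sketch of such tooling, in Python: scan Koru source text for tap declarations and answer "who's watching this event?". The regex is a rough approximation of the syntax shown in this article, not a real Koru parser:

```python
import re

# Matches patterns like "~count -> *" and "~[opaque]tap(* -> *)"
TAP_RE = re.compile(r"~(?:\[opaque\])?(?:tap\()?([\w*]+)\s*->\s*([\w*]+)")

source = """\
~count -> * | next v |> accumulate(value: v.value)
~[opaque]tap(* -> *) | Profile p |> write_event(source: p.source)
"""

def watchers(event: str, text: str):
    """Return every tap line observing `event` (including * wildcards)."""
    hits = []
    for line in text.splitlines():
        m = TAP_RE.search(line)
        if m and m.group(1) in (event, "*"):
            hits.append(line.strip())
    return hits

for line in watchers("count", source):
    print(line)
```

Because every observation site matches a static textual pattern, even this naive scanner finds both the direct tap on count and the wildcard profiler tap; a real tool working on the AST could do strictly better.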
The Bigger Picture
The benchmark wasn’t about proving taps are “faster than channels.” It was about demonstrating a different way of thinking about producer/consumer relationships.
Traditional: Producer picks the transport. Consumer adapts.
Koru: Producer emits events. Observer picks everything.
When you separate observation from transport:
- You can start with inline taps (maximum performance)
- Add threading only when profiling shows you need it
- The producer code never changes
- Observation strategies are greppable and explicit
What This Enables
- Progressive Optimization: Start inline, add threading only for proven hot paths
- Flexible Profiling: Profile everything with ~* -> *, at no cost in release builds
- Clean Architecture: Producers focus on logic, observers handle cross-cutting concerns
- Security Boundaries: [opaque] for events that shouldn't be observable
Conclusion
The headline “taps are 90% faster than rings” is technically true but misses the point.
The real story is: the producer doesn’t need to know.
It doesn’t need to know if you’re observing. It doesn’t need to know how you’re observing. It doesn’t need to know what transport you’re using. It doesn’t need to know anything about your observation strategy.
That’s not just faster. That’s a fundamentally different way to think about concurrent systems.
Published November 21, 2025
Koru: Where the observer decides, and the producer just produces.