Your Regex Is a Branch, Not a Library Call

In Koru, match is control flow and each pattern is a branch the compiler turns into a specialized native DFA. Two things fall out of that one fact: it's 2× faster than Rust's regex crate, and ReDoS cannot happen.

· 13 min read

Here is a regex match in basically every language you’ve used:

const re = /[a-z]+@[a-z]+/;
if (re.test(input)) { ... }

You build an object. At runtime it parses the pattern, builds some internal machine, and walks your input through it. The pattern is data that a general engine interprets.

Here is the same thing in Koru:

import std/regex
import std/io

std/regex:match("foo@bar")
| `[a-z]+@[a-z]+` _ |> std/io:print.ln("matched-email")
| `[0-9]+`        _ |> std/io:print.ln("matched-number")
| no-match          |> std/io:print.ln("no-match")

That isn’t a function call with a string argument. match is control flow — the same shape as any other branching construct in the language — and each backtick pattern is a branch. The compiler reads those patterns at compile time and turns each one into its own specialized native matcher, baked straight into the dispatch. There is no Regex object at runtime. There is no pattern stored anywhere. There is no general engine interpreting anything.

This is the whole post. Everything below is a consequence of that one move.

(The snippet above is tests/regression/600_STDLIB/640_REGEX/640_001_match_basic, verbatim, green. Its expected output is matched-email.)

What the compiler actually does

When the std/regex transform sees a match, it walks the branches and hands each backtick pattern to an in-house regex engine. That engine is small and it is honest about what it is:

pattern string → regex AST → NFA (Thompson) → DFA (subset construction)
              → emitted as a specialized native matcher

Parser, NFA builder, subset construction, code emitter — about 1,200 lines of Zig, 25 unit tests, zig build test-regex green. It’s the same family as RE2, the Rust regex crate, and Go’s regexp: a finite automaton, not a backtracker. The difference is when it runs. Theirs build the automaton at program startup. Koru’s builds it during compilation and emits the result as a function — a flat table walk, one branch per input byte, that the Zig optimizer then inlines into your dispatch like any other code.

So at runtime, std/regex:match("foo@bar") is not “look up a regex and run it.” It’s a handful of specialized functions that exist because of the patterns you wrote, called in first-match-wins order, falling through to no-match. The patterns have evaporated into native code.

Consequence A: it’s 2× faster than Rust

We benchmarked it. The workload: classify 3,000,000 inputs (rotating through an email, a number, and a non-matching string) against full-match patterns [a-z]+@[a-z]+ then [0-9]+, first match wins, count the buckets. Every language runs the same dispatch shape; the runtime engines hoist their compiled patterns out of the loop, as is idiomatic, so we’re measuring steady-state per-dispatch cost, not startup.

LanguageEngineMedian wallvs Koru
Korucompile-time DFA, native45 ms1.0×
Rustregex crate (linear, SIMD prefilters)93 ms2.0×
JavaScriptV8 Irregexp151 ms3.4×
Goregexp (RE2)324 ms7.2×
Pythonre917 ms20×

That’s about 15 nanoseconds per dispatch for Koru — a table walk plus a branch — against 31 ns for the Rust regex crate, which is genuinely fast and genuinely linear.

Be clear about why Koru wins here, because it’s not magic and it’s not “Koru beats Rust.” Koru wins because it does strictly less. The Rust crate is a general engine: it does search, captures, Unicode, patterns it only learns at runtime. The matcher this workload exercises is full-match, ASCII, capture-free, and the patterns are known at compile time — so it can monomorphize each one into straight-line code with no engine indirection and no pattern it has to stay flexible about. Compile-time specialization beats a runtime engine 2:1 here precisely because Koru threw away the generality this job wasn’t using. That’s the trade, stated plainly.

Consequence B: ReDoS cannot happen

This is the one that matters more than the speed.

Take the textbook catastrophic pattern, (a+)+b, and feed it a string of n as with no b. A backtracking engine tries every way to split the a-run between the two nested quantifiers before it can conclude “no match” — an exponential number of ways. The wall time goes vertical:

n (length of the a-run)JavaScript (V8)Python (re)KoruRust
1081 ms42 ms
20120 ms73 ms
251.36 s1.03 s
289.9 s7.9 s
3039.7 s(didn’t run)26 ms25 ms
1,000,00024 ms25 ms
10,000,00030 ms30 ms

Read the bottom-right corner against the JavaScript column. Koru full-matches ten million characters in less time than V8 needs to fail on thirty. Those 26–30 ms are almost entirely process startup — the match itself never registers. The Koru and Rust lines are flat from thirty characters to ten million.

Rust and Go are flat here too, and they earned it the hard way: their engines are deliberately engineered around backtracking blowup at runtime. Koru is flat for a different reason. It’s flat by construction — there is no backtracking code path to fall into, because the emitted matcher is a DFA table walk and a DFA has exactly one next state per byte. The blowup isn’t avoided; it’s unrepresentable.

And here’s the design consequence that makes this stick: the engine rejects backreferences. Not “doesn’t support yet” — rejects, on purpose. A backreference is the feature that forces backtracking and reopens the catastrophic case. Excluding it is the same decision as the DFA: the regular subset is exactly the subset that’s safe, and Koru stays inside it. You cannot write the ReDoS pattern in the first place.

It nests, because it’s just control flow

If match is real control flow, it should compose with the rest of the language’s control flow. It does:

import std/regex
import std/io

for(0..3)
! each _ |> std/regex:match("foo@bar")
            | `[a-z]+@[a-z]+` _ |> std/io:print.ln("hit")
            | no-match          |> std/io:print.ln("miss")

A match sitting inside a for, each iteration dispatching through the compiled DFAs. Output: hit three times. This is 640_002_match_nested_in_for, and it’s also the exact shape of the throughput benchmark’s hot loop. Getting here took a real design step — match declares its user surface as an ordinary event contract: any raw-name branch with a bindable payload, plus an optional void no-match. The checker validates a nested match exactly the way it validates a top-level one, exactly the way it validates any event: against the declared branches. There is no “trust the transform” exemption anywhere in the pipeline. The point: match isn’t a special form bolted on; it earns its place by composing like everything else.

The match is a value, not just a destination

A branch that fires because something matched should be able to hold the thing that matched. It does — the pattern branch’s binding receives the matched text:

import std/regex
import std/io

std/regex:match("foo@bar")
| `[a-z]+@[a-z]+` m |> std/io:print.ln("matched: {{ m:s }}")
| `[0-9]+` _ |> std/io:print.ln("number")
| no-match |> std/io:print.ln("no-match")

Prints matched: foo@bar (this is 640_003_match_binding, green in the suite). And notice something about every Koru snippet in this post: they’re .k files — pure Koru, top to bottom. Koru programs that mix in host-language code mark the boundary with a ~ host-switch glyph; none of these programs have one, because nothing here needs the host. Regex dispatch, iteration, binding, printing — not one escape. Bind the text with a name, or discard it with _ — the linear rule demands one or the other, because a payload you silently ignore is a bug you haven’t met yet. no-match carries nothing, so it binds nothing: the same rule’s void half.

Under full-match semantics the matched text is the whole input, so the binding is a fat-pointer copy of a slice that already exists — zero allocation, zero bytes copied. The benchmark agrees: binding the match in every branch and accumulating its length on all three million hits moves the median from 45.6 ms to 47.0 ms, and that delta is the added accumulation work, not the binding. Still under half of Rust’s 92.7 ms for the discard-only version of the workload.

The pattern is the payload schema

Whole-input text is the weak version. The strong version: a pattern with named groups delivers structured data into the branch, and the destructure at the binding position is where you say what shape you want back.

import std/io
import std/regex

std/regex:match("2x3x4")
| `(?<l>[0-9]+)x(?<w>[0-9]+)x(?<h>[0-9]+)` { l: i64, w: i64, h: i64 } |> std/io:print.ln("{{ l:d }} {{ w:d }} {{ h:d }}")
| no-match |> std/io:print.ln("no")

Prints 2 3 4. Look at what the branch head is doing. The pattern names three groups — l, w, h — and the { l: i64, w: i64, h: i64 } is the same shape-destructure Koru uses on any event payload. The pattern is the payload schema. And the i64 on each field is not an assertion that the text happens to be a number — it’s a conversion. The matched text dissolves into an actual integer at the splice, so inside the body l, w, and h are numbers you can do arithmetic on, not strings you have to parse in a second step. (This is 640_005_match_groups_destructure, verbatim, green.)

Don’t want to name every field? Bind the whole payload and read the groups off it as fields — untyped, so they stay as matched slices:

std/regex:match("21,9")
| `(?<row>[0-9]+),(?<col>[0-9]+)` pos |> std/io:print.ln("row={{ pos.row:s }} col={{ pos.col:s }}")
| no-match |> std/io:print.ln("no")

Prints row=21 col=9 (640_006_match_groups_whole_binding). Same pattern, two ways to receive it — destructure the fields you want with the types you want, or take the struct whole.

Because the transform owns the pattern and the destructure at the same site, it checks them against each other and the match is strict, both ways: every named group must be destructured and every destructured field must name a real group. Capture a group you don’t consume and it’s a compile error, not a silent leak — `(?<l>...)x(?<w>...)` { l: i64 } is rejected because w was captured and never destructured (640_007_reject_groups_field_mismatch). If you want the structural grouping without the capture, spell it bare: (...) groups for the engine, (?<name>...) captures for you.

Two restrictions ride along with captures, and they’re the same doctrine as the backreference rejection — refuse the cases where a captured span would be a lie. A group under a quantifier (`(?<x>a)+`) has no single span — which a did you mean? — so it’s rejected (640_008). A group under alternation (`a|(?<x>b)`) is empty on the branch that wasn’t taken, and “empty when skipped” is exactly the silent hole; rejected too (640_009). Loud at the match site, both of them.

Captures are fast for the same reason

Pulling three integers out of a string three million times — the same LxWxH pattern, every language matching the named groups and parsing them to ints and summing the volumes:

LanguageEngineMedian wallvs Koru
Korucompile-time tagged DFA, native111 ms1.0×
Rustregex crate (named captures)301 ms2.7×
JavaScriptV8 Irregexp324 ms2.9×
Goregexp (RE2)401 ms3.6×
Pythonre1,680 ms15×

Same shape as the discard benchmark, same winner — but notice the lead widened. Plain matching was 2.0× Rust; with captures it’s 2.7×. Captures are where Koru pulls further ahead, and that’s the opposite of what you’d expect, because captures are exactly where this design should have broken.

To know where each group matched and not just whether the input matched, the textbook move is to simulate the NFA thread by thread, carrying capture positions along — which is the per-state-per-byte work a DFA throws away to go fast. The first cut of this did exactly that, and on the benchmark above it was the slowest language here — behind V8, behind Go. A capture engine that falls back to NFA simulation isn’t the compile-time story; it’s a runtime engine wearing a hat.

Koru doesn’t have to simulate, and the reason is a restriction it already imposes. Because a capture group can’t sit under a quantifier or an alternation, every group is a clean once-entered bracket — so where each one opens and closes is structurally determined, with no ambiguity left for the engine to resolve at runtime. That is precisely the condition under which subset construction can bake the capture bookkeeping straight onto the DFA’s transitions (RE2 calls it the “one-pass” case). So (?<l>[0-9]+)x(?<w>[0-9]+)x(?<h>[0-9]+) compiles to the same flat table walk as [a-z]+@[a-z]+ did — one transition per byte — that also writes down six byte offsets as it passes them. The restriction that looked like a limitation is what buys the speed. The i64 conversions are the only runtime arithmetic left on top.

(Benchmark, all five implementations, and raw CSV: benchmarks/workloads/regex_match_groups. Same machine and method as the discard numbers above.)

What it can’t do yet — said plainly

This is cut 2. The honest gaps:

  • Full-match only. The whole input must match the pattern. No search, no “find the first occurrence.” ^ and $ parse but are no-ops, because the match is already whole-string-anchored.
  • Captures are full-span only. Named groups extract and convert (above), but a group under a quantifier or under an alternation is rejected — those are the cases where a captured span isn’t a single well-defined slice. No group-counted repetition extraction yet.
  • No backreferences — and as above, that one is permanent by design, not a TODO.
  • ASCII. No Unicode classes yet.

None of that is hidden behind a flag or a “coming soon” page. The grammar is a clean regular subset — alternation, concatenation, the three quantifiers, character classes, ., grouping, and named-group capture — and it is complete and correct within that subset. Search semantics are a later cut, and they’ll be built the same way: as automaton states, never as a backtracking escape hatch.

The shape of the idea

Treating a backend as a compile-time target instead of a runtime dependency is the recurring Koru move — we’ve made the same argument about JavaScript and about the compiler being just event flows. Regex is the same idea aimed at a thing nobody thinks of as a backend. Your patterns are known when you compile. So compile them. What you get back is a branch that runs at native speed and a whole category of vulnerability that you are structurally unable to write.

Numbers measured on an Apple Silicon M-series Mac, Zig 0.15.2, all AOT languages in ReleaseFast. Workloads, raw CSVs, and the green regression tests are in the repo.