Taint Tracking, For Free
Every secure code review starts with the same question. Some senior engineer points at a line and says “where did this value come from?”, and what they actually mean is “is this tainted?” Did it enter the program through a boundary the attacker controls? Has it been through a sanitizer? If yes, which one, and is it the right one for this sink?
The answer, in every language I know, is “I’ll have to read the code.” A few files, a few hops through function boundaries, sometimes a journey through a framework’s middleware stack. The information you actually want — did this string get sanitized between source and sink — isn’t in the type. It’s in the call graph. You reconstruct it by hand, every review, for every PR.
Languages have tried to fix this. Perl’s taint mode was 1993, runtime, dies on use. Ruby copied it. Research languages (FlowCaml, Jif) built whole information-flow type systems. Rust crates implement it with newtypes and discipline. None of these landed in mainstream practice because each one asked you to adopt a whole new mechanism: a runtime mode, an extra dimension in the type system, or a convention that holds only as long as the discipline does.
Koru got it for free this week. Phantom labels learned to ride on primitive
types — that’s the whole change. Once a label can sit on []const u8, and
the existing obligation marker ! already enforces “must be discharged
before scope exit,” the two compose. The result is taint tracking.
What it looks like
Mark the boundary that produces tainted strings. Mark the sanitizer that consumes them. The flow type-checks only when the sanitizer sits between them.
~import std/io
~pub event get_input {}
| line []const u8<unsanitized!>
~get_input = line "user input data"
~event sanitize { input: []const u8<!unsanitized> }
| clean []const u8
~sanitize = clean input
~get_input()
| line s |> sanitize(input: s)
| clean c |> std/io:print.ln(c)

user input data

Two annotations doing the work:
- | line []const u8<unsanitized!>: the input event produces a string with an obligation. The ! is the same obligation marker the phantom checker uses for *File<opened!>; the value cannot leave its scope with the obligation undischarged.
- { input: []const u8<!unsanitized> }: the sanitize event declares it consumes the obligation. The ! flipped to the front means “discharge it on the way in.” Output is bare []const u8: clean, can flow to any sink that accepts a plain string.
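To make the marker positions concrete, here is a minimal Python sketch that classifies an annotation by where the ! sits. It is a toy model, not Koru's parser: the names Annotation and parse_annotation and the role strings are invented for illustration. The only facts taken from the post are that a trailing ! produces an obligation, a leading ! discharges it, and a bare type carries nothing.

```python
import re
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    carrier: str          # underlying type, e.g. "[]const u8"
    label: Optional[str]  # phantom label, e.g. "unsanitized"; None if bare
    role: str             # "produces", "discharges", "state", or "plain"


# A carrier followed by an optional <label>; "!" may lead or trail the label.
_PATTERN = re.compile(r"^(?P<carrier>[^<>]+?)\s*(?:<(?P<body>[^<>]+)>)?$")


def parse_annotation(text: str) -> Annotation:
    """Split an annotation into carrier, label, and obligation role."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized annotation: {text!r}")
    carrier, body = match.group("carrier"), match.group("body")
    if body is None:
        return Annotation(carrier, None, "plain")          # bare type
    if body.endswith("!"):
        return Annotation(carrier, body[:-1], "produces")  # <label!>
    if body.startswith("!"):
        return Annotation(carrier, body[1:], "discharges")  # <!label>
    return Annotation(carrier, body, "state")              # <label>, no obligation
```

In this model the carrier is whatever remains once the label is dropped, so the producing and discharging forms of the same string type share one carrier and differ only in role.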
The flow between them — ~get_input | line s |> sanitize(input: s) | clean c |> print.ln(c) —
is the entire safety statement. The compiler reads it as: produce a string
with the <unsanitized!> obligation, hand it to a discharger, then use the
clean output. Every link in that chain is type-checked.
What it catches
Skip the sanitizer.
~import std/io
~pub event get_input {}
| line []const u8<unsanitized!>
~get_input = line "user input data"
~get_input()
| line s |> std/io:print.ln(s)

error[KORU030]: Resource 's' with phantom state <unsanitized!> was not discharged. No event accepts <!unsanitized>.
--> auto_discharge:8:0

KORU030, at compile time. The tainted string flowed into a sink that
doesn’t accept <!unsanitized>. No event in scope discharges the
obligation. The binary won’t build. The endpoint can’t be deployed without
this being fixed first.
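The check behind that error can be approximated in a few lines of Python. This is a toy model under stated assumptions, not the Koru checker: a flow is a straight chain of events threading a single value, each event may discharge one label on its input and produce one on its output, and the result is reported at the end of the chain instead of at the offending sink. Event and undischarged are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    name: str
    discharges: Optional[str]  # label consumed on the input, like <!unsanitized>
    produces: Optional[str]    # obligation on the output, like <unsanitized!>


def undischarged(flow: List[Event]) -> List[str]:
    """Return the obligations still pending when the flow ends."""
    pending: List[str] = []
    for event in flow:
        # A discharger removes the matching obligation on the way in.
        if event.discharges in pending:
            pending.remove(event.discharges)
        # A producer attaches a new obligation on the way out.
        if event.produces is not None:
            pending.append(event.produces)
    return pending


get_input = Event("get_input", discharges=None, produces="unsanitized")
sanitize = Event("sanitize", discharges="unsanitized", produces=None)
print_ln = Event("print.ln", discharges=None, produces=None)

ok_flow = undischarged([get_input, sanitize, print_ln])  # sanitizer present
bad_flow = undischarged([get_input, print_ln])           # sanitizer skipped
```

Here ok_flow comes back empty, while bad_flow still holds the unsanitized obligation, which is the situation the compiler rejects.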
This is the SQL-injection shape in miniature. A database driver’s query event declares { sql: []const u8 }, a bare string that discharges nothing. A web framework’s
request handler produces request.body | body []const u8<unsanitized!>.
Wire them together without a sanitize in the middle and the application
won’t compile. Not “fails a fuzzer.” Not “the WAF rejects it at runtime.” Doesn’t compile.
The same shape works for XSS (HTML output sinks demand <!html_escaped>), command injection (shell-invoke events demand <!shell_escaped>), path traversal (filesystem events demand <!path_canonicalized>), and any other “tainted source flows into
sensitive sink” pattern. Library authors declare which obligations their
event signatures consume; the compiler enforces it across every call site.
What changed
Nothing taint-specific in the phantom checker. Nothing taint-specific in the compiler. The taint shape was sitting on the other side of the change that let phantom labels ride on primitives. The post just before this one (Units of Measure, For Free) shows that change in the units-of-measure context. Same change, different domain.
The obligation marker ! already existed for resources. The phantom label
mechanism already existed for state. Pointing them at []const u8 instead
of *File was a carrier widening, not a new feature.
Three domains, one checker
| Domain | Carrier | Obligation? | Example |
|---|---|---|---|
| Resources | *T | yes | *File<opened!> → *File<!opened> |
| Units | primitive | no | f32<celsius>, i32<meter/second> |
| Taint | []const u8 (or any primitive) | yes | []const u8<unsanitized!> → []const u8<!unsanitized> |
Same phantom checker. Same auto-discharge inserter that ensures cleanup runs before scope exit. Same zero runtime cost — the labels are stripped before Zig emission; no taint flag is allocated; no runtime check fires. The checking happens once, at compile time, against the AST.
The thing nobody writing taint-tracking systems wants to admit is that taint tracking and resource cleanup are the same problem. A file that must be closed before scope exit and a string that must be sanitized before reaching a sink are the same shape: a value carries an obligation, and the compiler enforces that some event discharges it along every path that lets the value escape.
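The “every path” part of that claim can be sketched without caring which domain the label comes from. The Python below is an illustration only: the DISCHARGERS table, the event names in it, and the leaking_paths function are assumptions made for the example, and each path is simply the list of events a value passes through before it escapes.

```python
from typing import Dict, List, Set

# Hypothetical table: which events discharge which obligation.
DISCHARGERS: Dict[str, Set[str]] = {
    "opened": {"close"},          # resource domain
    "unsanitized": {"sanitize"},  # taint domain
}


def leaking_paths(label: str, paths: List[List[str]]) -> List[List[str]]:
    """Return every path on which no event discharges `label`."""
    accepted = DISCHARGERS.get(label, set())
    return [
        path for path in paths
        if not any(step in accepted for step in path)
    ]


# A file that is closed on both branches: nothing leaks.
file_leaks = leaking_paths("opened", [["read", "close"], ["close"]])

# A string that is sanitized on one branch but logged raw on the other.
string_leaks = leaking_paths("unsanitized", [["sanitize", "print"], ["log"]])
```

The function never inspects whether the label names a file state or a taint state. It only asks whether some discharger appears on each path, which is the sense in which the two problems share a shape.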
Phantom labels on primitives are the smallest piece of language design we shipped this week. Units of measure was one corollary. Taint tracking is another. The checker doesn’t care which.
Tests
- 330_068_phantom_obligation_on_primitive_string: produce <unsanitized!>, sanitize via <!unsanitized>, print the clean output
- 330_069_reject_undischarged_string_obligation: skip the sanitize, KORU030 at compile time
Related
- Units of Measure, For Free — the sibling post on what else the same change unlocked
- Phantom Types in Koru: Resource Safety Without Runtime Cost — the original obligation-marker post that taint tracking inherits its enforcement from