Taint Tracking, For Free


Every secure code review starts with the same question. Some senior engineer points at a line and says “where did this value come from?”, and what they actually mean is “is this tainted?” Did it enter the program through a boundary the attacker controls? Has it been through a sanitizer? If yes, which one, and is it the right one for this sink?

The answer, in every language I know, is “I’ll have to read the code.” A few files, a few hops through function boundaries, sometimes a journey through a framework’s middleware stack. The information you actually want — did this string get sanitized between source and sink — isn’t in the type. It’s in the call graph. You reconstruct it by hand, every review, for every PR.

Languages have tried to fix this. Perl’s taint mode was 1993, runtime, dies on use. Ruby copied it. Research languages (FlowCaml, Jif) built whole information-flow type systems. Rust crates implement it with newtypes and discipline. None of these landed in mainstream practice because each one required adopting an entirely new dimension to the type system.

Koru got it for free this week. Phantom labels learned to ride on primitive types — that’s the whole change. Once a label can sit on []const u8, and the existing obligation marker ! already enforces “must be discharged before scope exit,” the two compose. The result is taint tracking.

What it looks like

Mark the boundary that produces tainted strings. Mark the sanitizer that consumes them. The flow type-checks only when the sanitizer sits between them.

~import std/io

~pub event get_input {}
| line []const u8<unsanitized!>

~get_input = line "user input data"

~event sanitize { input: []const u8<!unsanitized> }
| clean []const u8

~sanitize = clean input

~get_input()
| line s |> sanitize(input: s)
    | clean c |> std/io:print.ln(c)

Output:
user input data

Two annotations doing the work:

  • | line []const u8<unsanitized!> — the input event produces a string with an obligation. The ! is the same obligation marker the phantom checker uses for *File<opened!> — the value cannot leave its scope with the obligation undischarged.
  • { input: []const u8<!unsanitized> } — the sanitize event declares it consumes the obligation. The ! flipped to the front means “discharge it on the way in.” Output is bare []const u8 — clean, can flow to any sink that accepts a plain string.

The flow between them — ~get_input | line s |> sanitize(input: s) | clean c |> print.ln(c) — is the entire safety statement. The compiler reads it as: produce a string with the <unsanitized!> obligation, hand it to a discharger, then use the clean output. Every link in that chain is type-checked.
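For readers who think in mainstream languages, here is a rough Rust analogue of the same source, sanitizer, sink flow, using the newtype approach mentioned earlier. It is a sketch under assumptions, not how Koru implements anything: `Tainted`, `get_input`, `sanitize`, `sink`, and `run` are hypothetical names, and the sanitizing policy is a toy.

```rust
mod taint {
    /// A string that entered through an attacker-controlled boundary.
    /// The field is private to this module, so outside code cannot unwrap it.
    pub struct Tainted(String);

    /// Source: plays the role of `get_input` and produces a tainted value.
    pub fn get_input() -> Tainted {
        Tainted(String::from("user <input> data"))
    }

    /// Sanitizer: the only way to turn `Tainted` into a plain `String`.
    /// Toy policy (an assumption): keep ASCII alphanumerics and spaces only.
    pub fn sanitize(input: Tainted) -> String {
        input
            .0
            .chars()
            .filter(|c| c.is_ascii_alphanumeric() || *c == ' ')
            .collect()
    }
}

/// Sink: accepts only plain strings, so `sink(&taint::get_input())`
/// is rejected by the type checker.
pub fn sink(clean: &str) -> String {
    format!("printed: {}", clean)
}

/// The well-typed flow: source, then sanitizer, then sink.
pub fn run() -> String {
    sink(&taint::sanitize(taint::get_input()))
}
```

Calling `run()` from a `main` function yields the sanitized text. One difference is worth noting: Rust will happily let a `Tainted` value be dropped without ever being sanitized, whereas the Koru obligation marker described above requires the value to be discharged before it leaves scope.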

What it catches

Skip the sanitizer.

~import std/io

~pub event get_input {}
| line []const u8<unsanitized!>

~get_input = line "user input data"

~get_input()
| line s |> std/io:print.ln(s)

Output:
error[KORU030]: Resource 's' with phantom state <unsanitized!> was not discharged. No event accepts <!unsanitized>.
  --> auto_discharge:8:0

KORU030, at compile time. The tainted string flowed into a sink that doesn’t accept <!unsanitized>. No event in scope discharges the obligation. The binary won’t build. The endpoint can’t be deployed without this being fixed first.

This is the SQL-injection shape in miniature. A database driver’s query event declares a plain { sql: []const u8 }, with no <!unsanitized> marker, so it cannot discharge the obligation. A web framework’s request handler produces request.body | body []const u8<unsanitized!>. Wire them together without a sanitize in the middle and the application won’t compile. Not “fails a fuzzer.” Not “the WAF rejects it at runtime.” Doesn’t compile.

The same shape works for XSS (HTML output sinks demand <!html_escaped>), command injection (shell-invoke events demand <!shell_escaped>), path traversal (filesystem events demand <!path_canonicalized>), and any other “tainted source flows into sensitive sink” pattern. Library authors declare which obligations their event signatures consume; the compiler enforces it across every call site.
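The "right sanitizer for this sink" idea can be sketched in Rust with a phantom type parameter, where each sink names the label it demands. This is only an analogue of the pattern, not Koru code; the label types, `Labeled`, `request_body`, the two escapers, and the two sinks are all hypothetical, and the escaping routines are minimal illustrations rather than production-grade.

```rust
use std::marker::PhantomData;

// Hypothetical labels. They are never constructed, only named in signatures.
pub struct Raw;
pub struct HtmlEscaped;
pub struct ShellEscaped;

/// A string tagged with a compile-time label `L`.
/// In a real crate, `text` and `new` would be private to their module.
pub struct Labeled<L> {
    text: String,
    _label: PhantomData<L>,
}

impl<L> Labeled<L> {
    fn new(text: String) -> Self {
        Labeled { text, _label: PhantomData }
    }
}

/// Source: everything arriving from a request starts out `Raw`.
pub fn request_body(body: &str) -> Labeled<Raw> {
    Labeled::new(body.to_string())
}

/// Minimal HTML escaping for illustration (covers &, <, >, ", ').
pub fn html_escape(input: Labeled<Raw>) -> Labeled<HtmlEscaped> {
    let mut out = String::new();
    for c in input.text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            _ => out.push(c),
        }
    }
    Labeled::new(out)
}

/// POSIX-style single-quote escaping: wrap in quotes, rewrite ' as '\''.
pub fn shell_escape(input: Labeled<Raw>) -> Labeled<ShellEscaped> {
    Labeled::new(format!("'{}'", input.text.replace('\'', "'\\''")))
}

/// HTML sink: only accepts strings labeled `HtmlEscaped`.
pub fn render_html(fragment: &Labeled<HtmlEscaped>) -> String {
    format!("<p>{}</p>", fragment.text)
}

/// Shell sink: only accepts strings labeled `ShellEscaped`.
/// Passing a `Labeled<HtmlEscaped>` here is a mismatched-types error.
pub fn shell_argument(arg: &Labeled<ShellEscaped>) -> String {
    format!("echo {}", arg.text)
}
```

With these definitions, `render_html(&html_escape(request_body("<b>hi</b>")))` type-checks, while `render_html(&shell_escape(request_body("x")))` does not: the wrong sanitizer is a compile error, not a review comment.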

What changed

Nothing taint-specific in the phantom checker. Nothing taint-specific in the compiler. The taint shape was sitting on the other side of the change that let phantom labels ride on primitives; the post just before this one (Units of Measure, For Free) shows that change in the units-of-measure context. Same change, different domain.

The obligation marker ! already existed for resources. The phantom label mechanism already existed for state. Pointing them at []const u8 instead of *File was a carrier widening, not a new feature.

Three domains, one checker

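| Domain    | Carrier                       | Obligation? | Example                                            |
|-----------|-------------------------------|-------------|----------------------------------------------------|
| Resources | *T                            | yes         | *File<opened!>, *File<!opened>                     |
| Units     | primitive                     | no          | f32<celsius>, i32<meter/second>                    |
| Taint     | []const u8 (or any primitive) | yes         | []const u8<unsanitized!>, []const u8<!unsanitized> |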

Same phantom checker. Same auto-discharge inserter that ensures cleanup runs before scope exit. Same zero runtime cost — the labels are stripped before Zig emission; no taint flag is allocated; no runtime check fires. The checking happens once, at compile time, against the AST.
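The zero-cost claim has a close cousin in Rust's `PhantomData`, which makes it easy to see what "the label exists only at compile time" means. The sketch below is an analogy and says nothing about Koru's internals; `Labeled`, `Celsius`, `Unsanitized`, and `label_overhead` are hypothetical names.

```rust
use std::marker::PhantomData;
use std::mem::size_of;

// Hypothetical labels from two different domains: units and taint.
pub struct Celsius;
pub struct Unsanitized;

/// One generic carrier: a value `T` plus a label `L` that lives only in the type.
/// `PhantomData<L>` is zero-sized, so the wrapper has the same layout as `T`.
#[repr(transparent)]
pub struct Labeled<T, L> {
    pub value: T,
    _label: PhantomData<L>,
}

impl<T, L> Labeled<T, L> {
    pub fn new(value: T) -> Self {
        Labeled { value, _label: PhantomData }
    }
}

/// Returns (labeled size, bare size) pairs for a unit-labeled float
/// and a taint-labeled string slice.
pub fn label_overhead() -> [(usize, usize); 2] {
    [
        (size_of::<Labeled<f32, Celsius>>(), size_of::<f32>()),
        (
            size_of::<Labeled<&'static str, Unsanitized>>(),
            size_of::<&'static str>(),
        ),
    ]
}
```

Both pairs returned by `label_overhead()` have equal members: the label adds no bytes and no runtime check. The same wrapper serves the units row and the taint row of the table, which is the "one checker, several domains" point in miniature.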

The thing nobody writing taint-tracking systems wants to admit is that taint tracking and resource cleanup are the same problem. A file that must be closed before scope exit and a string that must be sanitized before reaching a sink are the same shape: a value carries an obligation, and the compiler enforces that an event discharges the obligation along every path that lets the value escape.

Phantom labels on primitives are the smallest piece of language design we shipped this week. Units of measure was one corollary. Taint tracking is another. The checker doesn’t care which.
