Orisha: Faster Than nginx (The Honest Benchmark)
Earlier today we posted about comptime AST injection - how any [comptime] event can walk the full program AST. That was the foundation. This is what we built with it.
Orisha, a compile-time web framework, serves static files faster than nginx - even when nginx has all the optimizations enabled and all cores available.
Here are the numbers:
Per-thread comparison (Orisha 1 thread vs nginx 1 worker, nginx with optimizations):
| Connections | Orisha (1 thread) | nginx optimized (1 worker) | Ratio |
|---|---|---|---|
| 50 | 138,157 req/s | 70,989 req/s | 1.95x |
Real-world comparison (Orisha 1 thread vs nginx all cores, nginx with optimizations):
| Connections | Orisha (1 thread) | nginx optimized (14 workers) | Ratio |
|---|---|---|---|
| 10 | 131,317 req/s | 90,839 req/s | 1.45x |
| 50 | 138,157 req/s | 116,642 req/s | 1.18x |
UPDATE: Multi-threaded comparison (Orisha 4 workers vs nginx all cores):
| Connections | Orisha (4 workers) | nginx optimized (all cores) | Ratio |
|---|---|---|---|
| 200 | 148,359 req/s | 98,180 req/s | 1.51x |
Read that again: Orisha with 4 workers beats fully-optimized nginx with sendfile, tcp_nodelay, tcp_nopush, and open_file_cache enabled across all cores by 51%.
Same hardware. Same content. Both serving a 2,807-byte HTML file with keep-alive connections.
The Architecture
Here’s a typical web server at runtime:
Request arrives
→ Parse HTTP headers
→ Extract path
→ Check filesystem for file
→ Read file (even if cached, still a syscall)
→ Format HTTP headers (Content-Type, Content-Length, ETag...)
→ Concatenate headers + body
→ Write to socket
Here’s Orisha:
Request arrives
→ Parse path (one strstr)
→ Lookup pre-computed blob
→ Write to socket
That’s it. The HTTP response - status line, headers, body - is a single compile-time constant. No formatting. No file reads. No string concatenation. Just write(fd, blob, len).
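To make that concrete, here is a minimal sketch in plain Zig (recent std; illustrative, not Orisha’s generated output) - the body and Content-Length below are placeholders:
const std = @import("std");
// Minimal sketch (not Orisha's generated code): the whole HTTP response is a
// single compile-time constant, so serving it is one write() with no
// formatting and no file I/O.
const index_blob =
    "HTTP/1.1 200 OK\r\n" ++
    "Content-Type: text/html; charset=utf-8\r\n" ++
    "Content-Length: 14\r\n" ++
    "Connection: keep-alive\r\n" ++
    "\r\n" ++
    "<h1>hello</h1>";
fn serve(fd: std.posix.fd_t) !void {
    // one syscall per response
    _ = try std.posix.write(fd, index_blob);
}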
How It Works
Routes are declared with [norun] events (metadata-only, no codegen):
~[norun]pub event route { route: Expression, source: Source }
~route(GET /) {
"file": "public/index.html"
}
~route(GET /about) {
"file": "public/about.html"
}
~route(GET /api/health) {
"content-type": "application/json",
"body": "OK"
}
At compile time, a [comptime] event walks the AST and collects these:
~[comptime] proc collect_routes {
for (program.items) |item| {
if (item == .flow) {
const inv = item.flow.invocation;
if (isRouteEvent(inv)) {
// Parse config from Source block
const config = parseConfig(inv);
// Read file content
const content = std.fs.cwd().readFileAlloc(
allocator, config.file, 10_000_000
);
// Compute SHA256 etag
var hash: [32]u8 = undefined;
Sha256.hash(content, &hash, .{});
const etag = hexEncode(hash[0..16]);
// Build COMPLETE HTTP response
const blob = buildHttpResponse(content, etag);
routes.append(.{
.method = method,
.path = path,
.response = blob
});
}
}
}
// Generate routes.zig with all blobs embedded
generateRoutesFile(routes);
}
The generated routes.zig looks like:
pub const routes = [_]Route{
.{
.method = "GET",
.path = "/",
.response = "HTTP/1.1 200 OK\r\n" ++
"Content-Type: text/html; charset=utf-8\r\n" ++
"Content-Length: 2807\r\n" ++
"ETag: "f49fe6a903aeffc0"\r\n" ++
"Connection: keep-alive\r\n" ++
"\r\n" ++
"<!DOCTYPE html>..." // 2807 bytes of HTML
},
// ... more routes
};
pub fn lookup(method: []const u8, path: []const u8) ?[]const u8 {
for (routes) |route| {
if (eql(route.method, method) and eql(route.path, path)) {
return route.response;
}
}
return null;
}
At runtime, the server just:
const response = routes.lookup("GET", path) orelse not_found_blob;
_ = posix.write(fd, response);
One syscall. Zero formatting. The bytes are just… there.
The Server
The runtime is a tight kqueue event loop:
while (true) {
const n = kevent(kq, &events);
for (events[0..n]) |event| {
if (event.fd == listen_fd) {
// Accept new connection
const conn = server.accept();
// TCP_NODELAY - critical for latency
setsockopt(conn.fd, TCP_NODELAY, 1);
// Register for read events
kevent_add(kq, conn.fd, EVFILT_READ);
} else {
// Client request
const request = read(event.fd, &buffer);
const path = extractPath(request);
const response = routes.lookup("GET", path);
write(event.fd, response);
// Keep-alive: wait for next request
kevent_add(kq, event.fd, EVFILT_READ);
}
}
}
Key optimizations:
- TCP_NODELAY: Disable Nagle’s algorithm. Send immediately.
- Keep-alive: Reuse TCP connections. No handshake per request.
- ONESHOT events: Re-register after each request to avoid thundering herd.
- Pre-computed blobs: Zero runtime work per response.
Why nginx Can’t Do This
nginx is highly optimized. It uses sendfile(), it caches aggressively, it’s been tuned for 20 years. But it has a fundamental constraint: it doesn’t know your files at compile time.
nginx at runtime must:
- Check if the file exists
- Get file metadata (size, mtime for ETag)
- Format HTTP headers
- Either read the file or use sendfile()
Even with kernel page cache, there’s overhead. File descriptors. Stat calls. String formatting.
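For contrast, here is a hedged Zig sketch of the per-request work a conventional file server does. This is illustrative only - nginx’s real path uses sendfile() and aggressive caching - but the categories of work per request are the same:
const std = @import("std");
// Rough sketch of a conventional per-request file-serving path (not nginx's
// actual implementation): open, stat, format headers, read, write.
fn serveFile(fd: std.posix.fd_t, path: []const u8) !void {
    var file = try std.fs.cwd().openFile(path, .{}); // syscall: open
    defer file.close();                              // syscall: close
    const stat = try file.stat();                    // syscall: fstat
    var header_buf: [256]u8 = undefined;
    const headers = try std.fmt.bufPrint(&header_buf,
        "HTTP/1.1 200 OK\r\nContent-Length: {d}\r\nConnection: keep-alive\r\n\r\n",
        .{stat.size});
    var body_buf: [4096]u8 = undefined;
    const n = try file.readAll(body_buf[0..]);       // syscall: read
    _ = try std.posix.write(fd, headers);            // syscall: write
    _ = try std.posix.write(fd, body_buf[0..n]);     // syscall: write
}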
Orisha compiles the entire response into the binary. The ETag is pre-computed. The Content-Length is pre-computed. The headers and body are concatenated at compile time into a single string literal.
There’s no work left to do at runtime.
Why This Is Possible
Zig has comptime, but it’s intentionally sandboxed - no file I/O, no network, no side effects. This is a design choice for reproducibility and security.
Koru’s comptime is different. It’s full Zig runtime - not Zig comptime. A [comptime] proc can:
- Read files from disk
- Make network calls
- Compute SHA256 hashes
- Generate code
- Do anything Zig can do
The trade-off: your build depends on filesystem state. But for a web server that embeds its content, that’s exactly what you want.
The Heritage
This architecture didn’t appear from nowhere. It’s the culmination of months of research:
- beist-zig - Cell-based architecture with context injection and zero-cost composition. Proved that “no imports” isolation works.
- beist-image - Template-driven container generation. Proved that sub-100KB Docker images serving 65K+ req/s are achievable.
- beist-zig-http-perf - Performance testing ground. Proved zero-allocation HTTP handling.
Each project validated a piece of the puzzle. But they all required external tooling: Node.js generators, Liquid templates, CLI orchestration, build pipelines.
Koru collapses all of that into one compiler. The [comptime] system replaces template engines. AST walking replaces metadata scanning. Source blocks replace config files. What took a toolchain now takes a proc.
Benchmark Methodology
For reproducibility, here’s exactly how we tested:
Hardware: MacBook (Apple Silicon), macOS
Tool: wrk - modern HTTP benchmarking tool
Commands:
# 1 connection
wrk -t1 -c1 -d10s http://localhost:3000/
# 10 connections
wrk -t2 -c10 -d10s http://localhost:3000/
# 50 connections (extended test)
wrk -t2 -c50 -d20s --latency http://localhost:3000/
nginx configuration (fully optimized):
worker_processes 1; # For per-thread comparison
# OR
worker_processes auto; # For real-world comparison (14 workers on test machine)
daemon off;
error_log /dev/null;
events {
worker_connections 1024;
use kqueue;
}
http {
access_log off;
# TCP optimizations
tcp_nodelay on;
tcp_nopush on;
# File serving optimizations
sendfile on;
# File descriptor cache
open_file_cache max=1000 inactive=20s;
open_file_cache_valid 30s;
open_file_cache_min_uses 2;
open_file_cache_errors on;
server {
listen 3001;
location / {
root /path/to/public;
index index.html;
}
}
}
We ran both configurations with all optimizations enabled. This is nginx at its best for static file serving.
Content: Both servers serve the same 2,807-byte HTML file.
Caveats (in the interest of honesty)
- Orisha multi-threading not tested: UPDATE: Multi-threading is now implemented - 4 workers with independent kqueue instances, round-robin connection distribution. Results above.
- Localhost only: This is loopback benchmarking on macOS. Real network latency and NIC behavior would be different.
- Tiny file bias: 2.8KB files favor the pre-computed blob approach. Larger files might shift toward nginx’s sendfile/zero-copy path.
Orisha configuration: Multi-threaded (4 workers with independent kqueue instances), TCP_NODELAY enabled, HTTP keep-alive enabled.
Verification
We verified this is real, not a benchmark artifact:
# During 136K req/s stress test, make manual requests
$ curl http://localhost:3000/
<!DOCTYPE html>
<html>
<head>
<title>Orisha - Koru Web Framework</title>
...
Responses are correct. Content is complete. No errors in wrk output. The server actually serves pages under load.
Latency comparison:
| Percentile | Orisha | nginx |
|---|---|---|
| 50th | 351μs | 970μs |
| 99th | 733μs | 4,990μs |
Orisha’s 99th percentile is better than nginx’s median.
What’s Next
This is a static file server. Impressive, but limited. The roadmap:
Smarter route tables
Right now, route lookup is a linear scan. For 3 routes, who cares. For 300 routes, we’d want:
- Perfect hash maps - Generate a minimal perfect hash at compile time. O(1) lookup.
- Static string maps - Zig’s std.StaticStringMap with compile-time optimization.
- Log-driven reordering - Feed HTTP access logs into the build. Put hot routes first. The 90th percentile of traffic often hits 3 routes - make those a single comparison.
This is something nginx can’t do. nginx discovers routes at runtime from config files. Orisha knows every route at compile time and can optimize the lookup structure accordingly.
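As a hedged sketch of where this could go (an assumed shape, not current Orisha output): because the full route set is known at build time, the generator could emit a length-switched lookup so hot paths cost one comparison. The index_blob, about_blob, and health_blob constants below are placeholders for the pre-computed responses shown earlier:
const std = @import("std");
// Placeholder response blobs; in Orisha these would be the full pre-computed
// HTTP responses embedded at compile time.
const index_blob = "HTTP/1.1 200 OK\r\n\r\nindex";
const about_blob = "HTTP/1.1 200 OK\r\n\r\nabout";
const health_blob = "HTTP/1.1 200 OK\r\n\r\nOK";
// Hypothetical generated lookup (assumed shape, not current Orisha output):
// a switch on path length means most requests resolve with one length check
// plus at most one comparison.
fn lookup(path: []const u8) ?[]const u8 {
    switch (path.len) {
        1 => if (path[0] == '/') return index_blob,
        6 => if (std.mem.eql(u8, path, "/about")) return about_blob,
        11 => if (std.mem.eql(u8, path, "/api/health")) return health_blob,
        else => {},
    }
    return null;
}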
Compile-time gzip
Pre-compress files at build time. Serve even smaller blobs. Should widen the gap further.
Multi-threading ✅ Done!
Originally single-threaded. UPDATE: Now running with 4 workers, each with an independent kqueue instance. The main thread handles accept() and distributes connections round-robin. Result: 148K req/s - 51% faster than nginx on all cores.
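A hedged sketch of that accept/distribute loop - Worker and addConnection() are hypothetical names for illustration, not Orisha’s actual API:
const std = @import("std");
// Hypothetical worker handle: in the real server, addConnection() would
// register the fd with this worker's private kqueue.
const Worker = struct {
    pub fn addConnection(self: *Worker, conn_fd: std.posix.socket_t) void {
        _ = self;
        _ = conn_fd;
    }
};
// Sketch of round-robin connection distribution (assumed shape, not Orisha's
// actual code): the main thread accepts, the workers do everything else.
fn acceptLoop(listen_fd: std.posix.socket_t, workers: []Worker) !void {
    var next: usize = 0;
    while (true) {
        const conn_fd = try std.posix.accept(listen_fd, null, null, 0);
        workers[next].addConnection(conn_fd);
        next = (next + 1) % workers.len;
    }
}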
Dynamic content
The real goal: Koru flows handling requests at runtime.
~route(GET /api/users/:id)
| request r |>
db:query(sql: "SELECT * FROM users WHERE id = ?", params: [r.id])
| rows data |> respond(json: data)
| error e |> respond(status: 500, body: e.msg)
Docker integration
Declare your container image in Koru:
~docker:image {
FROM scratch
COPY ./a.out /server
EXPOSE 3000
ENTRYPOINT ["/server"]
}
Compile to a FROM scratch image containing just the binary. Based on our work with beist-image, these images can be under 100KB - orders of magnitude smaller than Alpine-based containers.
The Lesson
Everyone serves static files with nginx. It’s the default. But nginx doesn’t know your content at compile time - it discovers files at runtime.
When you control the compiler, you can move work from runtime to compile time. And compile time is free - it happens once, on your machine, before deployment.
The fastest code is the code that doesn’t run. Orisha just proved it.
The Thesis
After publishing this post, we got feedback that crystallized what we’re actually doing:
“Performance comes from deleting runtime decisions, not accelerating them.”
That’s it. That’s Koru.
We’re not trying to win “fastest for-loop.” We’re winning “least work per request.” The advantage isn’t instruction throughput - it’s that:
- The compiler knows the app shape
- The runtime does almost nothing
- Whole classes of decisions disappear before runtime even exists
This isn’t an optimization. It’s an architectural bet - and it lines up perfectly with SPAs, edge workloads, and tiny container images.
And Koru’s runtime isn’t adding overhead either. In our ~capture vs foldl’ benchmark, Koru matched hand-written Zig within measurement noise - 17.0ms vs 17.3ms for 100 million iterations. The abstractions compile away completely. In Koru, the obvious code IS the fast code.
Coherence beats raw speed every time.
Orisha is part of the Koru ecosystem. The source is available at github.com/korulang/orisha.