Orisha: Faster Than nginx (The Honest Benchmark)

Orisha - Compile-time Web Framework

Earlier today we posted about comptime AST injection - how any [comptime] event can walk the full program AST. That was the foundation. This is what we built with it.

Orisha, a compile-time web framework, serves static files faster than nginx - even when nginx has all the optimizations enabled and all cores available.

Here are the numbers:

Per-thread comparison (1 worker vs 1 thread, nginx with optimizations):

Connections | Orisha (1 thread) | nginx optimized (1 worker) | Ratio
50 | 138,157 req/s | 70,989 req/s | 1.95x

Real-world comparison (Orisha 1 thread vs nginx all cores, nginx with optimizations):

Connections | Orisha (1 thread) | nginx optimized (14 workers) | Ratio
10 | 131,317 req/s | 90,839 req/s | 1.45x
50 | 138,157 req/s | 116,642 req/s | 1.18x

UPDATE: Multi-threaded comparison (Orisha 4 workers vs nginx all cores):

Connections | Orisha (4 workers) | nginx optimized (all cores) | Ratio
200 | 148,359 req/s | 98,180 req/s | 1.51x

Read that again: Orisha with 4 workers beats fully-optimized nginx with sendfile, tcp_nodelay, tcp_nopush, and open_file_cache enabled across all cores by 51%.

Same hardware. Same content. Both serving a 2,807-byte HTML file with keep-alive connections.

The Architecture

Here’s a typical web server at runtime:

Request arrives
→ Parse HTTP headers
→ Extract path
→ Check filesystem for file
→ Read file (even if cached, still a syscall)
→ Format HTTP headers (Content-Type, Content-Length, ETag...)
→ Concatenate headers + body
→ Write to socket

Here’s Orisha:

Request arrives
→ Parse path (one strstr)
→ Lookup pre-computed blob
→ Write to socket

That’s it. The HTTP response - status line, headers, body - is a single compile-time constant. No formatting. No file reads. No string concatenation. Just write(fd, blob, len).

How It Works

Routes are declared with [norun] events (metadata-only, no codegen):

~[norun]pub event route { route: Expression, source: Source }

~route(GET /) {
    "file": "public/index.html"
}

~route(GET /about) {
    "file": "public/about.html"
}

~route(GET /api/health) {
    "content-type": "application/json",
    "body": "OK"
}

At compile time, a [comptime] event walks the AST and collects these:

~[comptime] proc collect_routes {
    for (program.items) |item| {
        if (item == .flow) {
            const inv = item.flow.invocation;
            if (isRouteEvent(inv)) {
                // Parse config from Source block
                const config = parseConfig(inv);

                // Read file content
                const content = try std.fs.cwd().readFileAlloc(
                    allocator, config.file, 10_000_000
                );

                // Compute ETag from SHA-256 (Sha256 = std.crypto.hash.sha2.Sha256)
                var hash: [32]u8 = undefined;
                Sha256.hash(content, &hash, .{});
                const etag = hexEncode(hash[0..16]);

                // Build COMPLETE HTTP response
                const blob = buildHttpResponse(content, etag);

                routes.append(.{
                    .method = method,
                    .path = path,
                    .response = blob
                });
            }
        }
    }

    // Generate routes.zig with all blobs embedded
    generateRoutesFile(routes);
}
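
The buildHttpResponse call above is Orisha's own helper. As a rough sketch of what it does (hypothetical Zig with assumed headers matching the generated output below, not Orisha's actual source):

const std = @import("std");

// Hypothetical buildHttpResponse-style helper: format the status line,
// headers, and body into one buffer, so nothing is formatted at runtime.
fn buildHttpResponse(
    allocator: std.mem.Allocator,
    content: []const u8,
    etag: []const u8,
) ![]u8 {
    return std.fmt.allocPrint(
        allocator,
        "HTTP/1.1 200 OK\r\n" ++
            "Content-Type: text/html; charset=utf-8\r\n" ++
            "Content-Length: {d}\r\n" ++
            "ETag: \"{s}\"\r\n" ++
            "Connection: keep-alive\r\n" ++
            "\r\n{s}",
        .{ content.len, etag, content },
    );
}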

The generated routes.zig looks like:

pub const routes = [_]Route{
    .{
        .method = "GET",
        .path = "/",
        .response = "HTTP/1.1 200 OK\r\n" ++
                    "Content-Type: text/html; charset=utf-8\r\n" ++
                    "Content-Length: 2807\r\n" ++
                    "ETag: "f49fe6a903aeffc0"\r\n" ++
                    "Connection: keep-alive\r\n" ++
                    "\r\n" ++
                    "<!DOCTYPE html>..."  // 2807 bytes of HTML
    },
    // ... more routes
};

pub fn lookup(method: []const u8, path: []const u8) ?[]const u8 {
    for (routes) |route| {
        if (std.mem.eql(u8, route.method, method) and std.mem.eql(u8, route.path, path)) {
            return route.response;
        }
    }
    return null;
}

At runtime, the server just:

const response = routes.lookup("GET", path) orelse not_found_blob;
_ = try posix.write(fd, response);

One syscall. Zero formatting. The bytes are just… there.

The Server

The runtime is a tight kqueue event loop:

while (true) {
    const n = kevent(kq, &events);
    for (events[0..n]) |event| {
        if (event.fd == listen_fd) {
            // Accept new connection
            const conn = server.accept();
            // TCP_NODELAY - critical for latency
            setsockopt(conn.fd, TCP_NODELAY, 1);
            // Register for read events
            kevent_add(kq, conn.fd, EVFILT_READ);
        } else {
            // Client request
            const n_read = read(event.fd, &buffer);
            const path = extractPath(buffer[0..n_read]);
            const response = routes.lookup("GET", path) orelse not_found_blob;
            write(event.fd, response);
            // Keep-alive: wait for next request
            kevent_add(kq, event.fd, EVFILT_READ);
        }
    }
}

Key optimizations:

  • TCP_NODELAY: Disable Nagle’s algorithm. Send immediately.
  • Keep-alive: Reuse TCP connections. No handshake per request.
  • EV_ONESHOT events: Each read event fires once and is re-armed after the response, so only one worker wakes per request (no thundering herd).
  • Pre-computed blobs: Zero runtime work per response.
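
A sketch of the first and third items in Zig, with the Darwin constant values inlined for illustration (the helper name is hypothetical, not Orisha's API):

const std = @import("std");
const posix = std.posix;

// Darwin values, inlined for illustration.
const IPPROTO_TCP = 6;
const TCP_NODELAY = 1;
const EVFILT_READ = -1;
const EV_ADD = 0x0001;
const EV_ONESHOT = 0x0010;

// Hypothetical helper: tune and arm a freshly accepted connection.
fn armConnection(kq: i32, fd: posix.socket_t) !void {
    // TCP_NODELAY: send each response immediately instead of letting
    // Nagle's algorithm buffer it.
    try posix.setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &std.mem.toBytes(@as(c_int, 1)));

    // EV_ONESHOT: the read event fires once and must be re-armed after
    // handling, so only one worker wakes per request.
    const change = [_]posix.Kevent{.{
        .ident = @intCast(fd),
        .filter = EVFILT_READ,
        .flags = EV_ADD | EV_ONESHOT,
        .fflags = 0,
        .data = 0,
        .udata = 0,
    }};
    var no_events: [0]posix.Kevent = .{};
    _ = try posix.kevent(kq, &change, &no_events, null);
}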

Why nginx Can’t Do This

nginx is highly optimized. It uses sendfile(), it caches aggressively, it’s been tuned for 20 years. But it has a fundamental constraint: it doesn’t know your files at compile time.

nginx at runtime must:

  1. Check if the file exists
  2. Get file metadata (size, mtime for ETag)
  3. Format HTTP headers
  4. Either read the file or use sendfile()

Even with kernel page cache, there’s overhead. File descriptors. Stat calls. String formatting.

Orisha compiles the entire response into the binary. The ETag is pre-computed. The Content-Length is pre-computed. The headers and body are concatenated at compile time into a single string literal.

There’s no work left to do at runtime.

Why This Is Possible

Zig has comptime, but it’s intentionally sandboxed - no file I/O, no network, no side effects. This is a design choice for reproducibility and security.

Koru’s comptime is different. It’s full Zig runtime - not Zig comptime. A [comptime] proc can:

  • Read files from disk
  • Make network calls
  • Compute SHA256 hashes
  • Generate code
  • Do anything Zig can do

The trade-off: your build depends on filesystem state. But for a web server that embeds its content, that’s exactly what you want.
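
For contrast, the closest sandboxed Zig comptime gets to this is the compiler-mediated @embedFile builtin; arbitrary I/O is simply not expressible there. A minimal illustration (the path is hypothetical):

const std = @import("std");

// Allowed: the compiler reads the file and embeds its bytes in the binary.
const html = @embedFile("public/index.html");

// Allowed: pure computation over the embedded bytes at compile time.
const length_header = std.fmt.comptimePrint("Content-Length: {d}\r\n", .{html.len});

comptime {
    // NOT allowed: std.fs is runtime-only, so this would fail to compile.
    // _ = std.fs.cwd().openFile("public/index.html", .{});
}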

The Heritage

This architecture didn’t appear from nowhere. It’s the culmination of months of research:

  • beist-zig - Cell-based architecture with context injection and zero-cost composition. Proved that “no imports” isolation works.
  • beist-image - Template-driven container generation. Proved that sub-100KB Docker images serving 65K+ req/s is achievable.
  • beist-zig-http-perf - Performance testing ground. Proved zero-allocation HTTP handling.

Each project validated a piece of the puzzle. But they all required external tooling: Node.js generators, Liquid templates, CLI orchestration, build pipelines.

Koru collapses all of that into one compiler. The [comptime] system replaces template engines. AST walking replaces metadata scanning. Source blocks replace config files. What took a toolchain now takes a proc.

Benchmark Methodology

For reproducibility, here’s exactly how we tested:

Hardware: MacBook (Apple Silicon), macOS

Tool: wrk - modern HTTP benchmarking tool

Commands:

# 1 connection
wrk -t1 -c1 -d10s http://localhost:3000/

# 10 connections
wrk -t2 -c10 -d10s http://localhost:3000/

# 50 connections (extended test)
wrk -t2 -c50 -d20s --latency http://localhost:3000/

nginx configuration (fully optimized):

worker_processes 1;    # For per-thread comparison
# OR
worker_processes auto; # For real-world comparison (14 workers on test machine)

daemon off;
error_log /dev/null;

events {
    worker_connections 1024;
    use kqueue;
}

http {
    access_log off;

    # TCP optimizations
    tcp_nodelay on;
    tcp_nopush on;

    # File serving optimizations
    sendfile on;

    # File descriptor cache
    open_file_cache max=1000 inactive=20s;
    open_file_cache_valid 30s;
    open_file_cache_min_uses 2;
    open_file_cache_errors on;

    server {
        listen 3001;
        location / {
            root /path/to/public;
            index index.html;
        }
    }
}

We ran both configurations with all optimizations enabled. This is nginx at its best for static file serving.

Content: Both servers served the identical 2,807-byte HTML file.

Caveats (in the interest of honesty)

  • Multi-threading (UPDATE): Initially untested; now implemented - 4 workers with independent kqueue instances and round-robin connection distribution. Results above.
  • Localhost only: This is loopback benchmarking on macOS. Real network latency and NIC behavior would be different.
  • Tiny file bias: 2.8KB files favor the pre-computed blob approach. Larger files might shift toward nginx’s sendfile/zero-copy path.

Orisha configuration: Multi-threaded (4 workers with independent kqueue instances), TCP_NODELAY enabled, HTTP keep-alive enabled.

Verification

We verified this is real, not a benchmark artifact:

# During 136K req/s stress test, make manual requests
$ curl http://localhost:3000/
<!DOCTYPE html>
<html>
<head>
    <title>Orisha - Koru Web Framework</title>
...

Responses are correct. Content is complete. No errors in wrk output. The server actually serves pages under load.

Latency comparison:

Percentile | Orisha | nginx
50th | 351μs | 970μs
99th | 733μs | 4,990μs

Orisha’s 99th percentile is better than nginx’s median.

What’s Next

This is a static file server. Impressive, but limited. The roadmap:

Smarter route tables

Right now, route lookup is a linear scan. For 3 routes, who cares. For 300 routes, we’d want:

  • Perfect hash maps - Generate a minimal perfect hash at compile time. O(1) lookup.
  • Static string maps - Zig’s std.StaticStringMap with compile-time optimization.
  • Log-driven reordering - Feed HTTP access logs into the build. Put hot routes first. The 90th percentile of traffic often hits 3 routes - make those a single comparison.
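
A rough sketch of the second option, keying the map on method plus path (placeholder strings stand in for the real response blobs):

const std = @import("std");

// std.StaticStringMap builds its lookup table at compile time,
// replacing the linear scan with length-bucketed string matching.
const route_table = std.StaticStringMap([]const u8).initComptime(.{
    .{ "GET /", "<precomputed blob for />" },
    .{ "GET /about", "<precomputed blob for /about>" },
    .{ "GET /api/health", "<precomputed blob for /api/health>" },
});

pub fn lookup(method_and_path: []const u8) ?[]const u8 {
    return route_table.get(method_and_path);
}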

This is something nginx can’t do. nginx discovers routes at runtime from config files. Orisha knows every route at compile time and can optimize the lookup structure accordingly.

Compile-time gzip

Pre-compress files at build time. Serve even smaller blobs. Should widen the gap further.
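
One way this could look, assuming a build step that emits public/index.html.gz (names illustrative, not a committed design):

const std = @import("std");

// The build pipeline gzips the file once; the generated code embeds the
// compressed bytes and bakes matching headers around them.
const gz_body = @embedFile("public/index.html.gz");

pub const response =
    "HTTP/1.1 200 OK\r\n" ++
    "Content-Type: text/html; charset=utf-8\r\n" ++
    "Content-Encoding: gzip\r\n" ++
    std.fmt.comptimePrint("Content-Length: {d}\r\n", .{gz_body.len}) ++
    "Connection: keep-alive\r\n" ++
    "\r\n" ++
    gz_body;

A real deployment would also keep an identity-encoded blob for clients that don't send Accept-Encoding: gzip.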

Multi-threading ✅ Done!

Originally single-threaded. UPDATE: Now running with 4 workers, each with an independent kqueue instance. The main thread handles accept() and distributes connections round-robin. Result: 148K req/s - 51% faster than nginx on all cores.

Dynamic content

The real goal: Koru flows handling requests at runtime.

~route(GET /api/users/:id)
| request r |>
    db:query(sql: "SELECT * FROM users WHERE id = ?", params: [r.id])
    | rows data |> respond(json: data)
    | error e |> respond(status: 500, body: e.msg)

Docker integration

Declare your container image in Koru:

~docker:image {
    FROM scratch
    COPY ./a.out /server
    EXPOSE 3000
    ENTRYPOINT ["/server"]
}

Compile to a FROM scratch image containing just the binary. Based on our work with beist-image, these images can be under 100KB - orders of magnitude smaller than Alpine-based containers.

The Lesson

Everyone serves static files with nginx. It’s the default. But nginx doesn’t know your content at compile time - it discovers files at runtime.

When you control the compiler, you can move work from runtime to compile time. And compile time is free - it happens once, on your machine, before deployment.

The fastest code is the code that doesn’t run. Orisha just proved it.

The Thesis

After publishing this post, we got feedback that crystallized what we’re actually doing:

“Performance comes from deleting runtime decisions, not accelerating them.”

That’s it. That’s Koru.

We’re not trying to win “fastest for-loop.” We’re winning “least work per request.” The advantage isn’t instruction throughput - it’s that:

  • The compiler knows the app shape
  • The runtime does almost nothing
  • Whole classes of decisions disappear before runtime even exists

This isn’t an optimization. It’s an architectural bet - and it lines up perfectly with SPAs, edge workloads, and tiny container images.

And Koru’s runtime isn’t adding overhead either. In our ~capture vs foldl’ benchmark, Koru matched hand-written Zig within measurement noise - 17.0ms vs 17.3ms for 100 million iterations. The abstractions compile away completely. In Koru, the obvious code IS the fast code.

Coherence beats raw speed every time.


Orisha is part of the Koru ecosystem. The source is available at github.com/korulang/orisha.