zero-io

The thinnest network between your code and the kernel

0 alloc 0 lock 0 copy
The contract

Three zeros
verified by CI

Every commit on the hot path is gated against three counting tests. Not a marketing claim — a build invariant.

0 allocations
Zero malloc/free per packet, per request, per tick. Pre-allocated pools, RAII slot leases, stack-resident state.
CI gate · zero_alloc_proof — counting global allocator

0 locks
No mutex, rwlock, or spinlock on the hot path. Single-threaded shards, atomics for cross-thread coordination, lock-free SPSC rings.
CI gate · loom-verified atomics + 3-state futex protocol

0 copies
The TX path writes encrypted QUIC packets directly into kernel-bound buffers. No staging, no to_vec, no memcpy. Only DMA touches the bytes after that.
CI gate · perf gate "memcpy/pkt TX = 0"
Zero memcpy — with receipts. Turn on linux-af-xdp or land on kernel ≥ 6.18 for io_uring ZCRX — both shipped features — and the transport-path memcpys are gone. Only the two hardware DMAs remain, because that's how Ethernet moves bytes. Broadcast stays at 0 memcpy when producers write through SendBuffer; response stays at 0 memcpy with ZeroResponse native builders. End-to-end floor under AF_XDP: 2 DMAs, 0 application memcpy. Default io_uring without ZC modes is the portable fallback — 2 kernel memcpys above the DMAs, clean, labeled, predictable.
Architecture

One shard, one CPU
one destiny

Each shard owns its sockets, io_uring ring, payload pool, and connection table. Nothing is shared on the hot path. Tokio still drives application code via a deliberate async bridge.

[Architecture diagram] Kernel layer (NIC, DMA; io_uring · AF_XDP · kqueue · RIO) feeding per-CPU shards. Shards 0 and 1: UringBackend, PacketBufPool, QuicHandler, connection table, futex2 wakeup, SO_INCOMING_CPU pinned. Shard 2: XdpBackend, UMEM frames, QuicHandler, connection table, busy-poll / NAPI, XDP_ZC_MODE with NIC DMA into UMEM. Shard N: Tokio multi-thread runtime, async-tower bridge, application code.

Backends are per-shard on the same Io instance — run io_uring everywhere, turn on linux-af-xdp on the shards that carry the hottest UDP traffic, mix freely. The protocol handlers above see the same RecvPacket / TxSink.

Routing · SO_REUSEPORT + eBPF, or DCID hash dispatcher
DCID · 14 b · server_id + shard · 16 384 cluster slots, partitioned
CPU pinning · pthread_setaffinity_np · cpuset for tokio + alerts
NUMA · first-touch on shard thread · local-node pages
Inter-thread · rtrb SPSC rings · fixed at boot, no MPSC contention
Wakeup · 3 strategies · see "Three ways to wait"
Memory

Memory you understand
memory you control

You configure the pools. Any count, any slot size, as many tiers as you want. RAII checkout, lifetime-scoped read guards, atomic refcounts. Allocations happen at boot — never in poll().

Pool A · example
small slots · tuned to your packet size · e.g. 2 KB × N for QUIC datagrams
Pool B · example
large slots · optional, for jumbo / bodies · e.g. 256 KB for H3 response bodies

Single pool, two pools, or any tiered layout. Config { pool_slot_count, pool_slot_size, ... } on Io::new; multiple pools via IoBuilder::pool(slot_size, count). The example above is one reasonable shape.
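A minimal sketch of both shapes. The Config field names come from the Reference section below; IoBuilder::new() and .build() are assumed constructor names around the pool(slot_size, count) call mentioned above, so treat this as a shape, not the exact surface.

use zero_io::{Config, Io};

// Single pool: mutate the defaults (Config is #[non_exhaustive]).
let mut cfg = Config::default();
cfg.pool_slot_count = 4096;   // slots allocated once at boot
cfg.pool_slot_size = 2048;    // ≥ 1200 bytes for QUIC (RFC 9000)
let mut io = Io::new(cfg)?;

// Two tiers via the builder (IoBuilder::new / .build assumed):
let mut tiered = zero_io::IoBuilder::new()
    .pool(2 * 1024, 4096)     // small slots for QUIC datagrams
    .pool(256 * 1024, 64)     // large slots for H3 response bodies
    .build()?;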

Slot lifecycle

1 · checkout: shard requests a slot, gets PayloadSlotReserved · 0 alloc · 1 atomic CAS
2 · commit: quiche::stream_recv writes into slot.as_mut_slice() · 0 alloc · 1 memcpy (DMA)
3 · lease handed to dispatcher: PayloadLease = 16 B Copy index, crosses the SPSC ring · 0 alloc · 1 atomic store
4 · read guard acquired: PayloadReadGuard<'a> increments the refcount, &[u8] exposed · 0 alloc · 1 atomic add
5 · drop & recycle: refcount → 0, slot pushed back onto the free stack · 0 alloc · 1 atomic CAS
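Step 3 is the only cross-thread hop, and only the 16-byte index crosses. A hedged sketch of that handoff: the PayloadLease struct here is a stand-in, not zero-io's real type; rtrb is the SPSC ring crate named in the routing table above.

// A 16-byte, Copy index that identifies a slot without carrying its bytes.
#[derive(Clone, Copy)]
struct PayloadLease { pool: u32, slot: u32, len: u32, gen: u32 }

// Fixed-capacity SPSC ring, created once at boot.
let (mut producer, mut consumer) = rtrb::RingBuffer::<PayloadLease>::new(1024);

// Shard thread: push the lease, never the payload.
producer.push(PayloadLease { pool: 0, slot: 42, len: 1200, gen: 7 }).ok();

// Dispatcher / worker thread: pop the lease, then acquire a read guard
// over the shared pool slot it points at.
if let Ok(lease) = consumer.pop() {
    let _ = lease; // resolve lease.slot against the pool here
}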

Stack-first

ArrayString, ArrayVec, SmallVec. If the size is bounded, it lives on the stack. Heap is a deliberate decision.

HugePages

MAP_HUGETLB on Linux. VM_FLAGS_SUPERPAGE_2MB on Intel macOS only — Apple Silicon's 16 KB native page already cuts TLB pressure. TLB misses eliminated on pools > 2 MB.

memfd_secret

TLS keys live in pages no other process can map, no swap, zeroized on drop via Zeroizing<T>.

NUMA-local

Heavy allocations happen on the shard thread after CPU pin. Linux first-touch places pages on the local NUMA node.

Syscalls

Five became one
one becomes none

The traditional epoll path costs five user/kernel transitions per request. io_uring collapses them into one. AF_XDP goes further: shared-memory rings, zero syscalls per packet in steady state.

Traditional · epoll
5
epoll_wait()
recvmsg()
process
sendmsg()
epoll_ctl()

Each syscall: TLB flush, user/kernel switch, msghdr copy.

io_uring · linked SQEs
1
io_uring_enter()
RECVMSG_MULTI + pbuf ring
SENDMSG_ZC linked
FUTEX_WAIT coalesced
SQPOLL · kernel-driven

With SQPOLL: zero io_uring_enter per tick — a kernel thread polls the SQ for you.

AF_XDP · shared-memory rings
0 per packet, steady state
TX/RX/FILL/COMPLETION rings in UMEM
NIC writes direct to UMEM (ZC_MODE)
Busy-poll loop — no epoll, no enter
sendto() only when ring empty (NEED_WAKEUP)

Optional feature linux-af-xdp. Driver-dependent ZC_MODE; kernel-copy fallback otherwise.

NAPI busy-poll

Kernel ≥ 6.9 lets the io_uring driver poll the NIC directly. Eliminates softirq latency under contended load.

Registered buffers

Pre-pin pool pages once. Skip get_user_pages on every recv: ~200 ns saved per packet.

UDP_GSO + GRO

Segment a 64 KB superpacket into 1500-byte frames in the NIC. One sendmsg, N wire packets.

TCP Fast Open

Piggyback request data on the SYN. First HTTP byte arrives in 1 RTT instead of 2.

splice / IORING_OP_SPLICE

File → socket without ever touching userspace. The page cache moves directly to the NIC.

kTLS

TLS encrypt offloaded to the kernel. Plaintext comes from the page cache, ciphertext goes straight to the NIC.

Message flow

The bytes never move
only indices do

A packet arrives, lives in one pool slot. Sync and async handlers both read from that same slot. The response is written into another slot. What crosses threads is a 16-byte lease or a 16-byte Arc pointer — not the data.

[Message-flow diagram] Inbound: NIC hardware RX → 1 DMA transfer into an RX pool slot (PacketBufPool[n], registered buffer — the same physical page for the whole trip). Sync path (run_sync): the handler matches the event on the same thread, borrows &[u8] straight from the RX slot, checks out a TX slot with io.send_buffer(n)?, writes the response directly, and queues it with io.stream_write_buffer(conn, stream, buf)? — 0 memcpy, 0 alloc, 0 cross-thread sync; reads one slot, writes another, nothing is moved; best for CPU-bound handlers, RPC, fast paths. Async path (run_async / run_tower): io.detach_http_request()? promotes the slot to an Arc (1 atomic refcount bump), a 112 B HttpOwnedRequest crosses an rtrb SPSC ring to a worker — the struct crosses, not the data — the worker reads the same slot via req.path() / .header(), awaits handler.handle(req), and writes once into handle.http_respond() — 0 memcpy, ~64 B alloc (FU node); the slot stays in the shard's pool, read across threads via the Arc; best for DB queries, long handlers, cooperative multitasking. Outbound: TX pool slot written once by SendBuffer / ZeroResponse, registered for DMA egress → 1 DMA transfer → NIC hardware TX.
Read it twice. Two pool slots appear in this diagram. Both are pre-allocated, registered with the kernel for zero-copy DMA, and written exactly once each. Between them live the handlers — sync reads the RX slot directly, async crosses to a worker via an Arc pointer over the same slot. At no point does a memcpy happen in the application path. The two DMAs at top and bottom are hardware transfers, not copies.
Wakeup

Three ways to wait
pick your latency / CPU trade

Between two messages, the shard has to wait. The Wakeup trait is a ZST at runtime — the strategy is monomorphized into the loop, no virtual dispatch. Mix per shard: futex on the API tier, spin on the order-book.

FutexWakeup

Futex

The default. IORING_OP_FUTEX_WAIT on Linux ≥ 6.7. Three-state coalesced protocol. 0 file descriptors per shard, 1 syscall per wakeup cycle, kernel handles fairness. ~700 ns wakeup latency.

Best for: API tier · general workloads · per-CPU shard with mixed load
XdpPollMode::Auto

Adaptive

Best of both. AF_XDP's 4-tier ladder — Hot · Warm · Cool · Idle — promotes to busy-poll during traffic bursts, falls back to interrupt-driven sleep when idle. Tier eval every 100 ms, hysteresis to prevent flapping.

Best for: bursty UDP feeds · market data · DNS resolvers · NTP fleets
SpinWakeup

Spin

Cores you own. Pure std::hint::spin_loop(). 100 % CPU, no syscall, no kernel involvement. Sub-100 ns wakeup. Pairs naturally with AF_XDP busy-poll for sub-microsecond end-to-end latency.

Best for: HFT order books · market makers · ultra-low-latency RPC tier
Why 3, not 1. A REST tier handling 10 K conn/s wants Futex — sleep cheaply between bursts, share the CPU. A market-data multicast feeder wants Adaptive — busy-poll during the open, sleep after-hours. An order-router wants Spin — never sleep, never miss a quote. zero-io lets you choose per shard, the trait is monomorphized so the cold paths cost zero in the hot one.
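A hedged sketch of what "monomorphized, no virtual dispatch" means in practice. This is the general Rust pattern, not zero-io's actual trait definition; the strategy names mirror the cards above, and the futex body is a placeholder.

// Illustrative trait shape: each strategy is a zero-sized type.
trait Wakeup {
    fn wait(&self);
}

struct SpinWakeup; // ZST: disappears at runtime
impl Wakeup for SpinWakeup {
    fn wait(&self) { std::hint::spin_loop(); }
}

struct FutexWakeup; // the real impl would issue IORING_OP_FUTEX_WAIT
impl Wakeup for FutexWakeup {
    fn wait(&self) { std::thread::yield_now(); } // placeholder for the futex path
}

// Generic over W: each shard's loop compiles with its strategy inlined.
// No dyn Wakeup, no vtable, so the strategies you don't use cost nothing.
fn shard_loop<W: Wakeup>(wakeup: W, mut has_work: impl FnMut() -> bool) {
    loop {
        while !has_work() {
            wakeup.wait();
        }
        // ... poll, drain events, flush TX ...
        break; // keep the sketch finite
    }
}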
Safety

The compiler is the contract
use-after-invalidate cannot compile

Zero-copy in C and C++ requires runtime discipline, README warnings, and code reviews. zero-io encodes the invariant in the type system: Event<'poll> ties every borrowed &[u8] to the &mut Io from next_event(). The next io.poll() call mutably borrows Io — and the borrow checker rejects it as long as one byte of slot data is still in scope.

cargo build compile error
let Some(event) = io.next_event() else { return };
let data: &[u8] = match event {
    Event::StreamFrame { data, .. } => data,
    _ => return,
};

io.poll(Duration::from_millis(10))?;
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// error[E0502]: cannot borrow `io` as mutable
//   because it is also borrowed as immutable
//   first borrow occurred here, used by `data`

println!("{}", data[0]); // dead code — compiler stopped you

The lifetime IS the contract

No runtime check, no allocator inspection, no test that hopes to catch it. The annotation 'poll on the Event tells rustc exactly when the bytes die — and rustc enforces it before your code reaches a CPU.

Cross-thread? Use OwnedSlot

Need the bytes after poll or on another thread? io.detach_event_data() hands you an OwnedSlotSend + Sync, Arc-counted over the same slot. The compiler still tracks it; the slot is dropped only when the last reference goes away.
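A hedged sketch of that detach path, assuming an io in scope and using a std mpsc channel purely for illustration (zero-io's own cross-thread path is the SPSC ring behind run_async). How OwnedSlot exposes its bytes on the worker side is not shown here.

use std::sync::mpsc;

let (worker_tx, worker_rx) = mpsc::channel::<zero_io::OwnedSlot>();

std::thread::spawn(move || {
    while let Ok(slot) = worker_rx.recv() {
        // read the payload here; dropping the last reference recycles the slot
        drop(slot);
    }
});

while let Some(event) = io.next_event() {
    let wants_detach = matches!(event, Event::StreamFrame { .. });
    if wants_detach {
        if let Some(owned) = io.detach_event_data() { // Arc bump, no copy
            worker_tx.send(owned).ok();               // OwnedSlot is Send + Sync
        }
    }
}
// The next io.poll() is fine: no borrowed &[u8] is still alive.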

What this doesn't need

No reference counting on the hot path. No bounds-check after a memcpy. No "did the kernel still own this buffer?" question. No // SAFETY: comment in handler code. The unsafe primitives live in PoolFreeStack, audited once, behind a typestate façade.

Slot<S> typestate

Pool slots transition Empty → Reserved → Committed → Released through generic state types. Slot<Reserved> has no .read() method; Slot<Released> has no .commit(). Use-after-release, double-commit, write-without-acquire — none of them type-checks. Wrong code refuses to compile.
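An illustrative typestate sketch of that idea. The state markers and method names mirror the transitions above; they are not zero-io's internal definitions.

use std::marker::PhantomData;

struct Empty;
struct Reserved;
struct Committed;
struct Released;

struct Slot<S> { index: u32, _state: PhantomData<S> }

impl Slot<Empty> {
    fn reserve(self) -> Slot<Reserved> {
        Slot { index: self.index, _state: PhantomData }
    }
}
impl Slot<Reserved> {
    // only a Reserved slot can be written and committed
    fn commit(self, _bytes: &[u8]) -> Slot<Committed> {
        Slot { index: self.index, _state: PhantomData }
    }
}
impl Slot<Committed> {
    // only a Committed slot can be read; releasing consumes it
    fn read(&self) -> &[u8] { &[] }
    fn release(self) -> Slot<Released> {
        Slot { index: self.index, _state: PhantomData }
    }
}
// Slot<Released> has no methods: use-after-release simply does not type-check.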

Why this changes the game

Every other zero-copy network library — DPDK, userspace TCP stacks, custom C kernels — relies on documentation and reviews to keep callers honest. Rust's borrow checker turns "don't read this after the next poll" from a comment into a compile error. The result: zero-copy without the footgun tax.

Performance

What it costs
what it doesn't

Architectural targets, not field-measured numbers. Production benchmarks land with the 1.0 release.

30–55 M pps · Throughput · per-shard, AF_XDP ZC_MODE. io_uring path: 5–15 M.
15–25 ns/pkt · Latency · RX-to-handler hot path, AF_XDP. io_uring path: 50–100 ns.
0 allocs/req · HTTP path · warm pool · run_sync · match-arm router
19 protocols · Coverage · QUIC, H3, WT, TCP, UDS, WebSocket, HTTP/1.1+2, REST, gRPC, MQTT, Redis, FIX, SBE, SMTP, FTP, DNS, mDNS, NTP, SOCKS5

Throughput · packets per second per shard

DPDK reference · kernel-bypass, no protocols · 50–80 M pps
zero-io · AF_XDP · ZC_MODE, busy-poll · 30–55 M pps *
zero-io · io_uring · default, SQPOLL on · 5–15 M pps
nginx-quic · workers + reuseport · ~3 M pps
tokio-quiche · Cloudflare wrapper · ~2 M pps
Quinn + Tokio · multi-thread runtime · ~1.5 M pps

Per-packet latency · nanoseconds

DPDK reference · 5–10 ns
zero-io · AF_XDP · 15–25 ns *
zero-io · io_uring · 50–100 ns
Tokio · mio + epoll · 500–2000 ns

* AF_XDP ZC_MODE — driver dependent; fallback XDP_COPY_MODE matches io_uring. See Backends for supported NICs.

HTTP allocations · per request, warm pool

run_sync · match-arm router · 0 B
run_tower + matchit · generic Tower layers · 64 B
zero_io_axum · drop-in axum + HeaderMap pool * · 200 B
hyper + axum · stock Tokio stack · ~3,500 B · 5–7 allocs
reqwest GET · popular client · ~6,000 B · 10–15 allocs

* header-map-pool feature flag · reclaims axum's HeaderMap after the response chain so a per-shard pool can re-issue it. Steady-state floor ~200 B per request; first request still pays the initial allocation.

Backends

One API
the right backend for your box

From an embedded gateway in a forklift to a CDN edge node to an HFT order router — same code, different config. Memory tuned to the box (64 KB pool slots × 16 frames on a Pi, 64 MB UMEM × 16 K frames on a 100 G NIC), wakeup tuned to the workload (futex on the API tier, spin on the order book), backend tuned to the kernel (io_uring everywhere, AF_XDP where the driver supports zero-copy).

Embedded · IoT gateways

Single-shard Io, pool_slot_count = 16, slot_size = 1024. FutexWakeup. ~64 KB total. ARMv7+ / Raspberry Pi class.

Generic apps · APIs

Single shard or 2–4 shards via IoCluster, default config. io_uring. The right tier for a REST/gRPC service that just needs "not Tokio's perf cliff".

HPC · CDN edge

Cluster of N shards = N CPUs, NUMA-pinned. AF_XDP on dedicated NIC queues. Adaptive busy-poll. 64 MB UMEM × 16 K frames. Many millions pps per box.

HFT · order routers

Per-CPU shard pinned isolated, SpinWakeup, AF_XDP ZC_MODE, FreeBSD userspace TCP on listen ports. Hugepages on. p99 sub-microsecond.
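Taking the embedded tier above as a concrete example, a hedged config sketch: the field names come from the Config table in the Reference section, max_connections = 64 is purely illustrative, and FutexWakeup needs no knob here because it is the stated default.

// Embedded / IoT gateway shape: 16 × 1 KB slots ≈ 16 KB of payload pool,
// well inside the ~64 KB total footprint quoted above.
let mut cfg = zero_io::Config::default();
cfg.pool_slot_count = 16;
cfg.pool_slot_size = 1024;
cfg.max_connections = 64;   // illustrative
let mut io = zero_io::Io::new(cfg)?;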

Per-OS backends · same API, native kernel

OS · Kernel backend · Status · Notes
Linux ≥ 6.7 · io_uring · tier 1 · Production default. Futex2 wakeup, SQPOLL, registered buffers, GSO/GRO, linked SQEs, optional ZCRX (kernel ≥ 6.18) *.
macOS ≥ 14 · kqueue · tier 1 · Identical API surface. EVFILT_USER fflags for targeted wakeups, sendmsg_x/recvmsg_x batch syscalls, hugepages on x86.
Windows ≥ 10 1809 · RIO + IOCP · tier 1 · Registered buffers give zero-copy TX on par with SENDMSG_ZC. Dedicated completion queues per listener.

Linux I/O strategies · pick per shard

Strategy · Protocols · TLS · Why pick it
io_uring (default) · UDP, TCP, all protocols via the kernel stack · kTLS, including NIC HW offload (mlx5 ConnectX-6+, etc.) · Portable, CI-gated, runs every protocol. 5–15 M pps, 50–100 ns/pkt. The right default unless you have a specific reason.
AF_XDP (linux-af-xdp) · UDP native (kernel-bypass via UMEM); TCP via the FreeBSD userspace stack on opt-in ports — the BSD TCP state machine ported to run on top of AF_XDP frames, replacing the kernel TCP path for those listen ports only * · Software only, rustls on CPU (AES-NI accelerated); no NIC HW TLS offload — kTLS is a kernel feature and AF_XDP bypasses the kernel TCP stack by design · UDP-heavy hot paths (market data, DNS at scale, NTP fleets) and HFT-tier TCP on specific ports. 30–55 M pps, 15–25 ns/pkt.

Driver-support footnote (canonical) — AF_XDP XDP_ZC_MODE requires zero-copy support in the NIC driver: mlx5 (Mellanox/NVIDIA ConnectX-4 Lx and later), i40e / ice (Intel X710/E810), ena (AWS Nitro), virtio-net (recent kernels). Fallback XDP_COPY_MODE works on every driver but adds one kernel memcpy (matches io_uring's cost). The FreeBSD userspace TCP path is opt-in per listen port via [userspace_tcp] enabled_ports = […]; other ports keep kernel TCP through XDP_PASS. TSO / GRO / LRO / kTLS are unavailable in the AF_XDP path by design — the cost of bypassing the kernel TCP stack.

Tier note — the per-OS table above is about API parity; the strategies table above is about Linux I/O backend maturity. Linux ≥ 6.7 ships at OS-tier 1 (full API), but its AF_XDP strategy is opt-in and CI-gated only when the feature is on. macOS and Windows are tier 1 for the OS surface (kqueue / RIO have parity); Linux's I/O strategies have their own tiering.

Protocols

Every protocol
one library

Pluggable ProtocolHandler trait. Each protocol is a feature gate. Pay only for what you use.

QUIC · tier 1
HTTP/3 · tier 1
WebTransport · tier 1
TCP · tier 1
UDS · tier 1
WebSocket · tier 1
HTTP/1.1 + 2 · tier 1
REST · tier 1
gRPC · tier 1
MQTT 3.1.1 / 5 · tier 2
Redis · RESP2/3 · tier 2
FIX 4.4 · text · tier 2
SBE · CME MDP3 · tier 2
SMTP · MIME · tier 2
FTP · FTPS · tier 2
DNS · DoT/DoH · tier 2
mDNS · RFC 6762 · tier 2
NTP · SNTP · tier 2
SOCKS5 · tier 2
Async bridge

Tokio when you want it
not when you don't

Four runtime modes. Each picks a different point on the latency / ergonomics curve. The hot path stays sync; the application stays async.

Mode · Allocs · Body stream · Use case
run_sync · 0 B · no · Inline handler, sync. Zero allocations. CPU-bound RPC, parsing, transform.
run_per_core · 0 / 64 B · yes · Tokio runtime per shard. .await inline; 64 B for streaming bodies via FuturesUnordered.
run_async · ~64 B · yes · Cross-thread dispatch via SPSC ring. Database queries, slow handlers, isolated workers.
run_tower · ~64 B · yes · Direct tower::Service. S::Future concrete, no Box::pin.

axum migration cost

Path · Allocs/req · Compatibility · Effort
run_sync + match-arm · 0 B · none — write your own · ~30 LOC. Best for small services and RPC.
run_tower + matchit · 64 B · generic Tower (Timeout, Retry, …) · ~50 LOC. Recommended for production HTTP perf.
run_tower + zero-io natives · 64 B · 7 native middleware (CORS, Auth, Trace, Compress, RequestId, NormalizePath, SensitiveHeaders) · Zero alloc per layer. Covers ~90% of tower-http use.
tower-http-compat · ~640 B · full tower-http · Use real tower-http layers via HttpOwnedRequest → http::Request.
zero_io_axum + pool · ~200 B · full axum + tower-http · HeaderMap reclaim. Same compat at lower cost.
zero_io_axum::serve · ~640 B · full axum + tower-http · Two-line migration: axum::serve → zero_io_axum::serve.
vs. the ecosystem

Row by row
the API surface

Thread-per-core, plug-in protocol matrix, CI-enforced zero-alloc — the combination doesn't exist anywhere else.

Feature | zero-io | tokio-quiche | Glommio | monoio | Quinn | neqo | s2n-quic | smoltcp | nginx-quic
Thread model | thread-per-core | Tokio MT | thread-per-core | thread-per-core | sans-io / Tokio | sans-io | Tokio | sync no_std | workers
io_uring | yes | no | required | yes | no | no | no | no | no
AF_XDP kernel-bypass | yes (opt-in) | no | no | no | no | no | no | no | no
Cross-platform | Linux · macOS · Win | cross | Linux only | cross (varies) | cross | cross | cross | bare-metal | cross
0-alloc hot path | CI gate | none | none | none | none | none | none | strict | n/a (C)
0-lock hot path | lock-free | Mutex | shard-local | shard-local | Mutex | sans-io | Mutex | single-thread | per-worker
0-copy TX | yes * | feature-gated | none | ownership API | GSO/sendmmsg | none | GSO | n/a | UDP GSO
QUIC | yes (quiche) | yes | no | no | yes | yes | yes | no | yes
H3 + WebTransport | yes | H3 only | no | no | via h3 crate | yes | partial | no | yes
TCP | yes | no | yes | yes | no | no | no | yes | yes
Plugin protocol model | yes | trait | no | no | sans-io | sans-io | provider | sockets | C modules
Tokio bridge | yes · 4 modes | is Tokio | no | partial | yes | n/a | yes | n/a | n/a

* with AF_XDP ZC or io_uring ZCRX (kernel ≥ 6.18). Default io_uring path uses borrowed-send + zero-alloc QPACK vendor patches.

Code

Beautiful by design
server · client

Same Io, same event loop. _listen for servers, _connect for clients. Zero allocations on the hot path either way.

Server

examples/server/quic_echo.rs Rust
use std::time::Duration;
use zero_io::{Io, Config, Event};

fn main() -> std::io::Result<()> {
    let mut io = Io::new(Config::default())?;
    io.quic_listen("0.0.0.0:4433".parse()?, &CERT_PEM, &KEY_PEM)?;

    loop {
        io.poll(Duration::from_millis(10))?;
        while let Some(event) = io.next_event() {
            match event {
                Event::StreamFrame { conn, stream, data, .. } => {
                    // `data` is &[u8] borrowed from a pool slot
                    // invalidated on the next poll() — process now or detach
                    io.stream_write(conn, stream, data)?;
                }
                _ => {}
            }
        }
    }
}
examples/server/axum_drop_in.rs Rust
// Before: tokio + hyper
let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await?;
axum::serve(listener, app).await?;

// After: zero-io. Two lines changed. Same axum router.
let mut io = Io::new(Config::default())?;
io.http_listen("0.0.0.0:8080".parse()?)?;
zero_io_axum::serve(io, app)?.wait()?;
examples/server/tower_perf.rs Rust
// 64 B / req. Generic Tower + zero-io native middleware. 0 alloc per layer.
use tower::ServiceBuilder;
use zero_io_async::{ZeroRuntime, ZeroCorsLayer, ZeroTraceLayer, ZeroCompressionLayer};

let svc = ServiceBuilder::new()
    .layer(ZeroTraceLayer::new())
    .layer(ZeroCorsLayer::permissive())
    .layer(ZeroCompressionLayer::zstd())
    .layer(tower::timeout::TimeoutLayer::new(Duration::from_secs(10)))
    .service(my_handler);

ZeroRuntime::new(io).run_tower(svc)?.wait()?;

Client

examples/client/quic_connect.rs Rust
use std::time::Duration;
use zero_io::{Io, Config, Event, StreamId};

fn main() -> std::io::Result<()> {
    let mut io = Io::new(Config::default())?;
    let conn = io.quic_connect("203.0.113.1:4433".parse()?)?;

    loop {
        io.poll(Duration::from_millis(10))?;
        while let Some(event) = io.next_event() {
            match event {
                Event::Connected { .. } => {
                    // 1-RTT ready · session ticket cached for next 0-RTT
                    io.stream_write(conn, StreamId(0), b"GET /quote HTTP/3\n")?;
                }
                Event::StreamFrame { data, .. } => {
                    // `data` borrows the same pool slot the NIC DMA'd into
                    process(data);
                }
                _ => {}
            }
        }
    }
}
examples/client/http_get.rs Rust
// 0 allocs/req on warm pool · LIFO connection reuse · TFO + Happy Eyeballs +
// TLS 0-RTT + Alt-Svc h3 auto-upgrade — all transparent.
use zero_io::http_client::{HttpClient, HttpClientConfig};

// Async wrapper: drop-in replacement for reqwest at ~10× the throughput.
#[tokio::main]
async fn main() -> std::io::Result<()> {
    let client = HttpClient::new(HttpClientConfig::default())?;

    let resp = client
        .get("https://api.example.com/users/42")
        .header("authorization", "Bearer …")
        .send().await?;

    match resp.status() {
        200 => println!("{}", resp.text()?),
        s   => eprintln!("http {}", s),
    }
    Ok(())
}
examples/client/redis_pipeline.rs Rust
// 100 SET + 100 GET in one round-trip via writev. RESP3 inline.
use zero_io::redis::{RedisClient, RedisConfig};
use zero_io::{Config, Io};

let mut io = Io::new(Config::default())?;
let client = RedisClient::connect(&mut io, RedisConfig::localhost())?;

let mut pipe = client.pipeline();
for i in 0..100 {
    pipe.set(&format!("k:{}", i), format!("v:{}", i).as_bytes());
}
let results = pipe.exec(&mut io)?; // 1 round-trip, not 100
Backpressure

It bends before it breaks
cascade, observed

One cascade state machine, four states. As pool utilization climbs, each step costs a little more — until the last one closes idle connections to recover. Every transition is observable, every drop is counted.

Pool utilization · shard hot path: Healthy < 60 % used → Warning 60–80 % → Critical 80–95 % (drop bulk TX) → Drain 95 %+ (close idle)

Seven pools feed the cascade

FreeStack · pool slots · main LIFO free list
FillRing · RX descriptors · kernel reads, driver fills
RxRing · RX completions · packets ready to process
TxRing · TX descriptors · outbound queue
CompletionRing · TX done · buffers reclaim here
ScratchPool · staging · short-lived intermediates
PerConn · per-connection · streams, h3, WT state
Budget partition. 60 % RX · 10 % TX critical · 25 % TX bulk · 5 % scratch. Critical TX never starves (handshake packets, ACKs, FIN). Bulk TX is the first to be shed when Critical hits. Idle connections close last — only once we cross 95 %.
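A hedged sketch of how a handler can observe the cascade. The method name and the Event::PoolPressure variant come from the Reference section; the PoolPressureInfo field shown in the comment is an assumption for illustration.

// Poll-time snapshot before queueing more bulk work:
if let Some(pressure) = io.pool_pressure() {
    // e.g. stop queueing bulk responses once the shard leaves Healthy
    // if pressure.utilization_pct >= 60 { stop_accepting_bulk_tx(); }
    let _ = pressure;
}

// Or react to the event emitted on state transitions:
while let Some(event) = io.next_event() {
    if let Event::PoolPressure { .. } = event {
        // Warning / Critical / Drain transition: shed bulk TX first;
        // critical TX (ACKs, handshakes, FIN) keeps its reserved budget.
    }
}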
Deployment

Embedded or daemon
your call, same API

Two ways to ship zero-io. Pick once at compile time. The protocol handlers, the Io surface, the event loop — identical. The only thing that changes is who owns the privileged file descriptors.

Embedded · default

Linked in

One process. zero-io is a library inside your binary. No IPC, no daemon, no extra moving parts. Simplest possible deployment.

Restart drops in-flight connections — fine for stateless tiers, dev environments, embedded targets, or anything where a cold cycle is acceptable.

cargo add zero-io
Daemon · production tier

Separated

Two processes. A privileged daemon owns the dangerous FDs (BPF, raw sockets, UMEM, TLS keys); your unprivileged app talks to it through a sealed memfd.

Hot-reload the app binary without dropping a packet — the daemon keeps everything alive across the execve. Privilege isolation, zero-downtime upgrades, multi-tenant safety.

cargo add zero-io --features daemon-client

How daemon mode works

The daemon holds CAP_NET_ADMIN, CAP_BPF, CAP_SYS_RESOURCE. The app runs as a regular user, zero caps. They share state through a sealed memfd mapping — UDS carries control only.

[Daemon-mode diagram] zero-io-daemon (privileged, long-lived: CAP_NET_ADMIN · CAP_BPF · CAP_SYS_RESOURCE) owns the BPF programs, xsk_map, UMEM memfd, io_uring rings, NIC sockets, and TLS keys, and survives app restarts for hot-reload. Your app (unprivileged, ephemeral: setuid(nobody), seccomp + landlock) holds the protocol handlers and business logic with no caps, no raw sockets, no BPF load; an execve replacement gives a zero-downtime upgrade. A UDS carries control plus SCM_RIGHTS FD passing. A shared memfd (ShardLayout, sealed with F_SEAL_GROW + F_SEAL_SHRINK + F_SEAL_WRITE) holds the connection table, UMEM frames, pool slots, cascade state, and stats counters; the daemon writes and seals it at boot, app v1 maps it MAP_SHARED, app v2 remaps it after execve via recover_from_exec(). Hot-reload timeline: app v1 running with live connections → prepare_hot_reload() clears close-on-exec on the memfd → execve(new_binary), ~5 ms, daemon FDs survive → recover_from_exec() remaps the memfd and resumes polling.

Privilege isolation

Daemon holds the dangerous bits. App runs setuid(nobody) + seccomp + landlock. A handler bug never escalates to BPF / raw socket reach. Compatible with Kubernetes securityContext.

Shared memory · perf-optimal

Connection state, UMEM, pool slots all live in a sealed memfd. App and daemon both mmap MAP_SHARED — same physical pages, no syscall on the data path. UDS only carries setup commands.

Binary hot-reload · zero packet drop

Ship a new binary, signal the supervisor, app execve's the replacement. Daemon-owned BPF / sockets / UMEM persist across the boundary. The new binary calls recover_from_exec() and rebinds — in-flight TCP and QUIC connections continue without notice.
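A heavily hedged sketch of that sequence. Only the call order, prepare_hot_reload() before the execve and recover_from_exec() after it, comes from the text above; where those functions live (on Io, on a daemon-client handle) and their signatures are assumptions.

use std::os::unix::process::CommandExt;

// Old binary, on receiving the reload signal:
io.prepare_hot_reload()?;   // keep the shared memfd across the exec boundary
let _err = std::process::Command::new("/proc/self/exe").exec(); // returns only on failure

// New binary, early in main():
let mut io = zero_io::Io::recover_from_exec()?; // remap the sealed memfd, resume polling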

Secrets that can't leak

TLS keys live in memfd_secret pages — no other process can map them, no swap, no /proc/<pid>/mem exposure. Wrapped in Zeroizing<T> so they're scrubbed on drop. Even a kernel exploit on the app side never reaches them.

Cert hot-reload, no restart

Rotate TLS certificates without dropping a single connection. CertReloadHandle::reload_from_pem(...) swaps via arc-swap — old connections finish on the old cert, new connections pick up the new one. Sub-microsecond cutover.

Audit trail · structured

Every privileged action — BPF load, cert reload, ops command, signal handler — emits a structured event on target = "audit::*". Actor uid/pid, before / after state, monotonic timestamp. Compatible with NIST SP 800-53 AU-2/3, OWASP ASVS §1.4.

Ops surface, locked down

Force actions (drain-tx, conn-kill, cert-reload, BPF reload) speak through a UDS /run/...ops.sock at mode 0600. Mandatory HMAC, two-phase commit for destructives, per-class rate limit, profile allowlist (dev / staging / prod), sealed-token for prod-only operations.

Multi-tenant safe

Per-shard connection tables, per-shard pools, no cross-tenant pointers. Tenant identity bound at handshake, enforced through the audit chain. Acceptable for cooperative tenants today; adversarial multi-tenancy = run multiple daemons.

Read the API

Every public type. Every protocol method. Every config knob.

Reference

Public API

Every type, trait, and method exposed at the crate boundary. Architectural plan: PLAN-STEP173 → STEP199.

Quick start

Three primitives. Io owns the shard, poll() drives one tick, next_event() drains the queue.

main.rs · Rust
let mut io = Io::new(Config::default())?;
io.quic_listen("0.0.0.0:4433".parse()?, &cert, &key)?;
loop {
    io.poll(Duration::from_millis(10))?;
    while let Some(ev) = io.next_event() { handle(ev); }
}

Feature gates

Compile only what you use. Defaults: quic + tcp. Everything else is opt-in.

Feature · Enables · Implies
quic · QUIC listen/connect, datagrams, streams
tcp · TCP listen/connect, WriteBufferPool
websocket · WebSocket listen/connect, masking, ping/pong · implies tcp
websocket-deflate · permessage-deflate compression · implies websocket
http · HTTP/1.1 + HTTP/2 server · implies tcp
http-client · HTTP client + HttpPool · implies http
webtransport · H3 CONNECT + WT sessions · implies quic
tls-ktls · kernel TLS offload · implies tcp / http
tower · tower::Service<HttpOwnedRequest> impls
tower-http-compat · HttpOwnedRequest → http::Request adapter · implies tower
linux-af-xdp · AF_XDP backend (tier 2)
linux-userspace-tcp · FreeBSD userspace TCP stack on AF_XDP (tier 3, experimental)
socks5 / dns / ntp / mdns · opt-in feature gates per protocol

Io

The shard handle. Owns sockets, the io_uring ring, the payload pool, and the connection table. Single-threaded; do not Send.

Method · Purpose
fn new(config: Config) -> io::Result<Self> · Construct a single-shard Io. Allocates pools, opens io_uring.
fn poll(&mut self, timeout: Duration) -> io::Result<()> · One tick: drain CQEs → process_dirty → flush TX → fire timers.
fn next_event(&mut self) -> Option<Event<'_>> · Drain the per-tick event queue. Borrows &mut self.
fn detach_event_data(&mut self) -> Option<OwnedSlot> · Promote the current event's payload to an owned, Send + Sync slot.
fn detach_http_request(&mut self) -> Option<HttpOwnedRequest> · HTTP-only. Detach the current request as a 96–112 B owned struct.
fn send_buffer(&mut self, min: usize) -> io::Result<SendBuffer> · Check out a writable pool slot for zero-copy TX.
fn close(&mut self, conn: ConnId) -> io::Result<()> · Immediate close.
fn close_graceful(&mut self, conn: ConnId, timeout: Duration) · Drain in-flight, then close.
fn pool_pressure(&self) -> Option<PoolPressureInfo> · Snapshot of pool utilization for back-pressure checks.
fn pool_stats(&self) -> PoolStats · Current / peak / capacity per pool.
fn conn_stats(&self, conn) -> io::Result<ConnStats> · RTT, cwnd, bytes, packets-lost per connection.
fn handle(&self) -> IoHandle · Cross-thread send-side handle. Cheaply cloneable.

Event<'poll>

Borrowed from the current poll. Invalidated by the next poll(). Process synchronously or call detach_event_data() for cross-thread.

Variant · Carries
UdpRecv · endpoint, from, data: &'poll [u8]
Connected · conn, peer, protocol: Protocol
Disconnected · conn, error_code: u64, reason: &'poll [u8]
Datagram · conn, data: &'poll [u8]
StreamFrame · conn, stream, kind: MessageKind, data: &'poll [u8]
StreamReset · StopSending · conn, stream, error_code
SessionReady · conn · WebTransport CONNECT 200
PathMigration · conn, old_peer, new_peer
HttpRequest · HttpBodyChunk · HttpResponse · HTTP feature only
PoolPressure · DnsResolved · MqttEvent · per-feature

MessageKind

Typed message discriminator on StreamFrame and ConnDatagram. Replaces the old opaque msg_type: u8.

  • Binary · WsText · WsBinary · MqttPacket · GrpcFrame · FixText · Sbe · User(u8)

Config

Per-shard knobs. #[non_exhaustive] — extend without breaking semver.

Field · Default
pool_slot_count: usize · 4096
pool_slot_size: usize · 2048 (≥ 1200, RFC 9000)
huge_pages: Toggle · Auto
max_connections: usize · 1024
max_events_per_poll: usize · 256
pool_pressure_pct: u8 · 80
compression_threshold: usize · 128 B
uring: UringConfig (Linux) · auto-tuned
debug: DebugConfig · disabled

IoHandle

Send-side handle obtained via io.handle(). Send + Sync + Clone. Cross-thread paths funnel through this — workers send, the shard wakes and writes.

  • fn send_datagram(&self, conn, data: &[u8]) -> io::Result<()>
  • fn stream_write(&self, conn, stream, data: &[u8]) -> io::Result<()>
  • fn send_datagram_buffer(&self, conn, buf: SendBuffer) -> io::Result<()>
  • fn stream_write_buffer(&self, conn, stream, buf: SendBuffer) -> io::Result<()>
  • fn http_respond(&self, conn, request_id, response: ZeroResponse)
  • fn close(&self, conn) · close_graceful(&self, conn, timeout)

OwnedSlot · SendBuffer

OwnedSlot is a payload pulled out of the per-poll lifetime and made Send + Sync. Internally an Arc over a pool slot — refcounted, recycled on drop.

SendBuffer is a writable pool slot for zero-copy TX. Acquire via io.send_buffer(n), write into as_mut_slice(), hand to send_datagram_buffer / stream_write_buffer.
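A hedged sketch of that TX path, assuming io, conn, and stream are in scope. How the written length is recorded on the SendBuffer (a truncate or commit step) is not specified here, so the snippet elides it.

let mut buf = io.send_buffer(4)?;                  // PoolExhausted error if no slot is free
buf.as_mut_slice()[..4].copy_from_slice(b"PONG");  // write the payload once, in place
io.stream_write_buffer(conn, stream, buf)?;        // slot queued for send, recycled after TX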

IoCluster · multi-shard

Production entry point for > 1 shard. Owns N reuseport sockets (or one shared socket + DCID dispatcher) and exposes the same listen / connect surface, fanned out across shards.

Item · Purpose
ClusterConfig · shard_count, routing, cpu_affinity, expected_protocols
RoutingStrategy · ReusePortCbpf · ReusePortEbpf · DcidDispatch
ScidGenerator · 14 bits encode (server_id, shard_id) jointly in QUIC SCIDs — 16 384 cluster slots partitioned across the two fields. SERVER_ID_MASK = 0x3FFF, shard count is a power of 2 within each server.
ShardIo · Per-shard handle; identical surface to Io.
Pick one. Io for tests, single-core deploys, tools. IoCluster for production servers. Don't roll your own N Io instances — you'll miss the routing.
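A hedged bring-up sketch for the multi-shard path. The ClusterConfig field names and RoutingStrategy variants come from the table above; ClusterConfig::default() and IoCluster::new() are assumed constructor shapes.

use zero_io::{ClusterConfig, IoCluster, RoutingStrategy};

let mut cluster_cfg = ClusterConfig::default();
cluster_cfg.shard_count = 4;                          // one shard per pinned CPU
cluster_cfg.routing = RoutingStrategy::ReusePortEbpf; // or DcidDispatch for QUIC
let mut cluster = IoCluster::new(cluster_cfg)?;
// Each ShardIo exposes the same listen / connect surface as a single Io.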

UDP

  • fn udp_bind(&mut self, addr: SocketAddr) -> io::Result<EndpointId>
  • fn udp_bind_with(&mut self, config: UdpEndpointConfig) -> io::Result<EndpointId>
  • fn udp_send(&mut self, endpoint, to, buf: SendBuffer) -> io::Result<()>
  • fn udp_send_bytes(&mut self, endpoint, to, data: &[u8]) -> io::Result<()> · convenience, 1 memcpy
  • fn multicast_join · multicast_leave (group: MulticastGroup)

MulticastGroup is typed: AnySource (mDNS, RFC 1112) or SourceSpecific (CME / Eurex feeds, RFC 4607).

QUIC

  • fn quic_listen(&mut self, addr, cert, key) -> io::Result<EndpointId>
  • fn quic_listen_with(&mut self, config: QuicListenConfig)
  • fn quic_connect(&mut self, addr) -> io::Result<ConnId>
  • fn quic_connect_with(&mut self, config: QuicConnectConfig) · supports Happy Eyeballs (RFC 8305) when HostOrAddr::Host.
  • fn send_datagram(&mut self, conn, data: &[u8])
  • fn stream_write(&mut self, conn, stream, data) -> io::Result<usize>
  • fn stream_read(&mut self, conn, stream, &mut [u8]) · QUIC / WT only · returns StreamNotPullable on TCP/WS
  • fn early_data_send · session_ticket · set_session_ticket · 0-RTT

QuicListenConfig covers idle timeout, stream / data limits, congestion (Reno · Cubic · BBRv2), DPLPMTUD, retry tokens, ECN, allowed origins.

TCP · Unix Domain Sockets

  • fn tcp_listen · tcp_listen_with
  • fn tcp_connect · tcp_connect_with · Happy Eyeballs supported
  • fn uds_listen(&mut self, path: &str)
  • fn uds_connect(&mut self, path: &str)

TCP RX is push-only. Data arrives via Event::StreamFrame { kind: MessageKind::Binary }. There is no tcp_stream_read; calling stream_read on a TCP ConnId returns IoError::StreamNotPullable.

WebSocket

  • fn ws_listen · ws_listen_tls · ws_listen_with
  • fn ws_connect · ws_connect_with
  • fn ws_send(&mut self, conn, data: &[u8], text: bool)
  • fn ws_send_buffer(&mut self, conn, buf: SendBuffer, text: bool)
  • fn ws_close(&mut self, conn, code: u16, reason: &str)

Frames arrive as Event::StreamFrame { kind: WsText | WsBinary }. Ping/pong handled internally.

HTTP · HTTP/2

  • fn http_listen · http_listen_tls · http_listen_with(HttpListenConfig)
  • fn http_respond(&mut self, conn, request_id, response: ZeroResponse)
  • fn http_request(&mut self, …) -> io::Result<RequestId> · client

HttpListenConfig: max_header_count, max_header_size, max_body_inline, request_timeout_ms, H/2 streams / window / frame / header-list, compression threshold.

WebTransport

  • fn wt_connect(&mut self, addr, path: &str)
  • fn wt_connect_with(WtConnectConfig)
  • Server: quic_listen_with(QuicListenConfig { enable_webtransport: true, allowed_origins, … })
  • Event::SessionReady · H3 CONNECT 200 accepted

One session per connection. Datagrams + streams over the H3 CONNECT.

TLS · STARTTLS · hot-reload

  • fn tls_upgrade(&mut self, conn, config: TlsClientConfig) · client STARTTLS
  • fn tls_accept_upgrade(&mut self, conn, config: TlsServerConfig) · server STARTTLS
  • fn enable_cert_hot_reload(&mut self, endpoint) -> CertReloadHandle
  • CertReloadHandle::reload_from_pem · reload_from_bytes · reload_quic_from_pem · Send + Sync + Clone, atomic swap via arc-swap

Auto-attempts kTLS after handshake if available (Linux ≥ 6.7). Falls back to rustls in-process if not.
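A minimal sketch of the cert-rotation handle listed above; endpoint is the EndpointId returned by the TLS listen call, and the argument shape of reload_from_pem (cert PEM plus key PEM) is an assumption.

let reload = io.enable_cert_hot_reload(endpoint);   // CertReloadHandle: Send + Sync + Clone

// Later, e.g. from a SIGHUP handler or a file watcher on another thread:
reload.reload_from_pem(&new_cert_pem, &new_key_pem)?;
// Old connections finish on the old cert; new handshakes pick up the new one.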

Multicast · DNS · NTP · mDNS · SOCKS5

Method · Purpose
fn dns_init · dns_resolve · dns_result · Async DNS via UDP, optional TCP fallback, optional DoT/DoH.
fn ntp_init(NtpConfig) · ntp_offset_us · ntp_now_us · SNTP / NTP, multi-server, KoD.
fn mdns_init · mdns_register(MdnsService) · mdns_discover · mdns_resolve · RFC 6762 / 6763, ASM 224.0.0.251.
fn tcp_connect_socks5(proxy, dest, auth) · RFC 1928. Universal.
fn quic_connect_socks5(proxy, dest, auth) · UDP ASSOCIATE. Best-effort, server allowlist required, MTU auto-adjusted, migration disabled.

ZeroRuntime · async bridge

Wraps Io with a Tokio-friendly driver. Four modes pick a different point on the latency / ergonomics curve.

Method · Allocs · Best for
fn run_sync<H: SyncHandler>(self, handler) -> io::Result<ShutdownHandle> · 0 B · CPU-bound inline handlers
fn run_async<H: AsyncHandler + Clone>(self, handler) · ~64 B · DB queries, slow handlers
fn run_tower<S: tower::Service<HttpOwnedRequest>>(self, svc) · ~64 B · Tower middleware, generic Tower
fn run_per_core<H: AsyncHandler + Clone>(self, cluster, handler) · 0 / 64 B · Per-shard tokio runtime, mixed inline + streaming

HttpOwnedRequest · BodyStream

~96–112 B owned struct. Send + Sync. Path / headers / body offsets stored as a 12-byte table inside the pool slot. Zero-copy accessors return &str slices into the slot.

  • fn path(&self) -> &str
  • fn header(&self, name: &str) -> Option<&str>
  • fn method(&self) -> HttpMethod
  • fn body(&self) -> &[u8] · inline body
  • fn body_stream(&mut self) -> Option<BodyStream> · streaming uploads, 8-slot SPSC ring per request

ZeroResponse mirrors this on the response side. Builders: ZeroResponse::ok().json(&value), ZeroResponse::not_found(), etc.
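A hedged handler sketch over these two types. The accessor and builder names come from the lists above; the handler signature, the HttpMethod::Get variant name, and the serde_json dependency are assumptions for illustration.

use zero_io::{HttpMethod, HttpOwnedRequest, ZeroResponse};

async fn handle(req: HttpOwnedRequest) -> ZeroResponse {
    // path() and header() are zero-copy &str slices into the pool slot
    match (req.method(), req.path()) {
        (HttpMethod::Get, "/healthz") => {
            ZeroResponse::ok().json(&serde_json::json!({ "ok": true }))
        }
        _ => ZeroResponse::not_found(),
    }
}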

Native middleware (zero alloc)

Layer · tower-http equivalent
ZeroCorsLayer · tower_http::cors::CorsLayer
ZeroAuthLayer · tower_http::auth::ValidateRequestHeader
ZeroTraceLayer · tower_http::trace::TraceLayer
ZeroCompressionLayer · tower_http::compression::CompressionLayer
ZeroRequestIdLayer · tower_http::request_id::SetRequestIdLayer
ZeroNormalizePathLayer · tower_http::normalize_path::NormalizePathLayer
ZeroSensitiveHeadersLayer · tower_http::sensitive_headers::SetSensitiveHeadersLayer

For anything outside this list, use tower-http-compat at a 640 B / req cost.

zero-io-axum

  • fn serve(io: Io, app: axum::Router) -> io::Result<ShutdownHandle>

Two-line migration from axum::serve. Cost: ~200 B steady-state with the default header-map-pool feature (which reclaims axum's HeaderMap per request); 640 B without the pool. The HeaderMap itself is structural — axum's signature requires it.

REST · gRPC

Higher-level crates building on the HTTP base.

  • zero-rest: Router, RestRequest, RestResponse, PathParams, optional CacheMiddleware.
  • zero-grpc: GrpcService, ServerStream, ClientStream, BidiStream, Code, Status. Code generated from .proto via zero-grpc-build.

MQTT · Redis

  • zero-mqtt: MqttClient, MqttBroker, QoS 0/1/2, MQTT 3.1.1 + 5, trie-based topic match.
  • zero-redis: RedisClient, RedisPipeline, RESP2 / RESP3, pub/sub.

FIX · SBE

  • zero-fix: zero-copy text FIX 4.4 parser/builder, session FSM. Persistence in SessionWal (PLAN-STEP193b) — append-only WAL per session, CRC32C, atomic checkpoint.
  • zero-sbe: flyweight SBE decoder for CME MDP 3.0 / Eurex T7. Multicast feed handler with explicit gap-recovery FSM (T1..T14 transitions, I1..I5 invariants).

SMTP · FTP

  • zero-smtp: SMTP client + server, STARTTLS, AUTH PLAIN / LOGIN / XOAUTH2, MIME, DKIM (Ed25519 / RSA), pipelining.
  • zero-ftp: FTP client + server, AUTH TLS (FTPS), passive / EPSV, splice / mmap for transfers.

Ops CLI

charting-status binary. UDS at /run/charting-server/ops.sock (mode 0600). Mandatory HMAC. Two-phase commit for destructive actions. Profile-based allowlist (dev / staging / prod). Sealed-token for prod-restricted operations.

  • charting-status snapshot · healthz · readyz · read-only, no auth needed beyond peer-cred
  • charting-status drain-tx · conn-kill · bpf reload-ports · cert-reload · reset-peaks · privileged, audit-logged

IoError

Structured. Variants pinned to #[non_exhaustive]. Diagnostic strings are actionable — they name the syscall, the cause, and the fix.

  • KernelTooOld { required: KernelVersion, found: KernelVersion }
  • StreamNotPullable { protocol: Protocol } · use Event::StreamFrame
  • NotSupportedOnPlatform { platform }
  • NotSupportedOnBackend { backend, feature }
  • PoolExhausted · DnsError · ConnectTimeout · TlsHandshakeFailed
  • HmacMismatch · NonceReplay · ProfileForbidden · ops API

Backpressure cascade

Seven pools (FreeStack · FillRing · RxRing · TxRing · CompletionRing · ScratchPool · PerConn) feed one cascade state: Healthy → Warning → Critical → Drain. Each transition has a budget partition (60% RX / 10% TX critical / 25% TX bulk / 5% scratch) and a drop policy. Live snapshot via Io::pool_pressure() or the ops endpoint.
