zero-io

The thinnest network between your code and the kernel

0 alloc 0 lock 0 copy
The contract

Three zeros
verified by CI

Every commit on the hot path is gated against three counting tests. Not a marketing claim — a build invariant.

0 allocations
Zero malloc/free per packet, per request, per tick. Pre-allocated pools, RAII slot leases, stack-resident state.
CI gate · zero_alloc_proof — counting global allocator

0 locks
No mutex, rwlock, or spinlock on the hot path. Single-threaded shards, atomics for cross-thread coordination, lock-free SPSC rings.
CI gate · loom-verified atomics + 3-state futex protocol

0 copies
The TX path writes encrypted QUIC packets directly into kernel-bound buffers. No staging, no to_vec, no memcpy. Only DMA touches the bytes after that.
CI gate · perf gate "memcpy/pkt TX = 0"
Zero memcpy — with receipts. Turn on linux-af-xdp or land on kernel ≥ 6.18 for io_uring ZCRX — both shipped features — and the transport-path memcpys are gone. Only the two hardware DMAs remain, because that's how Ethernet moves bytes. Broadcast stays at 0 memcpy when producers write through SendBuffer; response stays at 0 memcpy with ZeroResponse native builders. End-to-end floor under AF_XDP: 2 DMAs, 0 application memcpy. Default io_uring without ZC modes is the portable fallback — 2 kernel memcpys above the DMAs, clean, labeled, predictable.
Architecture

One shard, one CPU
one destiny

Each shard owns its sockets, io_uring ring, payload pool, and connection table. Nothing is shared on the hot path. Tokio still drives application code via a deliberate async bridge.

[Architecture diagram] Kernel layer (NIC, DMA; io_uring · AF_XDP · kqueue · RIO) feeding per-CPU shards. Shards 0 and 1: UringBackend, PacketBufPool, QuicHandler, connection table, futex2 wakeup, SO_INCOMING_CPU pinned. Shard 2: XdpBackend, UMEM frames, QuicHandler, connection table, busy-poll / NAPI, XDP_ZC_MODE with NIC DMA into UMEM. Shard N: Tokio multi-thread runtime, async-tower bridge, application code.

Backends are per-shard on the same Io instance — run io_uring everywhere, turn on linux-af-xdp on the shards that carry the hottest UDP traffic, mix freely. The protocol handlers above see the same RecvPacket / TxSink.

Routing · SO_REUSEPORT + eBPF, or DCID hash dispatcher
DCID · 14 b · server_id + shard · 16 384 cluster slots, partitioned
CPU pinning · pthread_setaffinity_np · cpuset for tokio + alerts
NUMA · first-touch on shard thread · local-node pages
Inter-thread · rtrb SPSC rings · fixed at boot, no MPSC contention
Wakeup · 3 strategies · see "Three ways to wait"
Memory

Memory you understand
memory you control

You configure the pools. Any count, any slot size, as many tiers as you want. RAII checkout, lifetime-scoped read guards, atomic refcounts. Allocations happen at boot — never in poll().

Pool A · example
small slots · tuned to your packet size · e.g. 2 KB × N for QUIC datagrams
Pool B · example
large slots · optional, for jumbo / bodies · e.g. 256 KB for H3 response bodies

Single pool, two pools, or any tiered layout. Config { pool_slot_count, pool_slot_size, ... } on Io::new; multiple pools via IoBuilder::pool(slot_size, count). The example above is one reasonable shape.
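A minimal sketch of both shapes. The Config field names come from the Reference section below; IoBuilder::new() and .build() are assumed constructor names around the pool(slot_size, count) call mentioned above, so treat this as a shape, not the exact surface.

use zero_io::{Config, Io};

// Single pool: mutate the defaults (Config is #[non_exhaustive]).
let mut cfg = Config::default();
cfg.pool_slot_count = 4096;   // slots allocated once at boot
cfg.pool_slot_size = 2048;    // ≥ 1200 bytes for QUIC (RFC 9000)
let mut io = Io::new(cfg)?;

// Two tiers via the builder (IoBuilder::new / .build assumed):
let mut tiered = zero_io::IoBuilder::new()
    .pool(2 * 1024, 4096)     // small slots for QUIC datagrams
    .pool(256 * 1024, 64)     // large slots for H3 response bodies
    .build()?;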

Slot lifecycle

1 · checkout: shard requests a slot, gets PayloadSlotReserved · 0 alloc · 1 atomic CAS
2 · commit: quiche::stream_recv writes into slot.as_mut_slice() · 0 alloc · 1 memcpy (DMA)
3 · lease handed to dispatcher: PayloadLease = 16 B Copy index, crosses the SPSC ring · 0 alloc · 1 atomic store
4 · read guard acquired: PayloadReadGuard<'a> increments the refcount, &[u8] exposed · 0 alloc · 1 atomic add
5 · drop & recycle: refcount → 0, slot pushed back onto the free stack · 0 alloc · 1 atomic CAS
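Step 3 is the only cross-thread hop, and only the 16-byte index crosses. A hedged sketch of that handoff: the PayloadLease struct here is a stand-in, not zero-io's real type; rtrb is the SPSC ring crate named in the routing table above.

// A 16-byte, Copy index that identifies a slot without carrying its bytes.
#[derive(Clone, Copy)]
struct PayloadLease { pool: u32, slot: u32, len: u32, gen: u32 }

// Fixed-capacity SPSC ring, created once at boot.
let (mut producer, mut consumer) = rtrb::RingBuffer::<PayloadLease>::new(1024);

// Shard thread: push the lease, never the payload.
producer.push(PayloadLease { pool: 0, slot: 42, len: 1200, gen: 7 }).ok();

// Dispatcher / worker thread: pop the lease, then acquire a read guard
// over the shared pool slot it points at.
if let Ok(lease) = consumer.pop() {
    let _ = lease; // resolve lease.slot against the pool here
}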

Stack-first

ArrayString, ArrayVec, SmallVec. If the size is bounded, it lives on the stack. Heap is a deliberate decision.

HugePages

MAP_HUGETLB on Linux. VM_FLAGS_SUPERPAGE_2MB on Intel macOS only — Apple Silicon's 16 KB native page already cuts TLB pressure. TLB misses eliminated on pools > 2 MB.

memfd_secret

TLS keys live in pages no other process can map, no swap, zeroized on drop via Zeroizing<T>.

NUMA-local

Heavy allocations happen on the shard thread after CPU pin. Linux first-touch places pages on the local NUMA node.

Syscalls

Five became one
one becomes none

The traditional epoll path costs five user/kernel transitions per request. io_uring collapses them into one. AF_XDP goes further: shared-memory rings, zero syscalls per packet in steady state.

Traditional · epoll
5
epoll_wait()
recvmsg()
process
sendmsg()
epoll_ctl()

Each syscall: TLB flush, user/kernel switch, msghdr copy.

io_uring · linked SQEs
1
io_uring_enter()
RECVMSG_MULTI + pbuf ring
SENDMSG_ZC linked
FUTEX_WAIT coalesced
SQPOLL · kernel-driven

With SQPOLL: zero io_uring_enter per tick — a kernel thread polls the SQ for you.

AF_XDP · shared-memory rings
0 per packet, steady state
TX/RX/FILL/COMPLETION rings in UMEM
NIC writes direct to UMEM (ZC_MODE)
Busy-poll loop — no epoll, no enter
sendto() only when ring empty (NEED_WAKEUP)

Optional feature linux-af-xdp. Driver-dependent ZC_MODE; kernel-copy fallback otherwise.

NAPI busy-poll

Kernel ≥ 6.9 lets the io_uring driver poll the NIC directly. Eliminates softirq latency under contended load.

Registered buffers

Pre-pin pool pages once. Skip get_user_pages on every recv: ~200 ns saved per packet.

UDP_GSO + GRO

Segment a 64 KB superpacket into 1500-byte frames in the NIC. One sendmsg, N wire packets.

TCP Fast Open

Piggyback request data on the SYN. First HTTP byte arrives in 1 RTT instead of 2.

splice / IORING_OP_SPLICE

File → socket without ever touching userspace. The page cache moves directly to the NIC.

kTLS

TLS encrypt offloaded to the kernel. Plaintext comes from the page cache, ciphertext goes straight to the NIC.

Message flow

The bytes never move
only indices do

A packet arrives, lives in one pool slot. Sync and async handlers both read from that same slot. The response is written into another slot. What crosses threads is a 16-byte lease or a 16-byte Arc pointer — not the data.

[Message-flow diagram] Inbound: NIC hardware RX → 1 DMA transfer into an RX pool slot (PacketBufPool[n], registered buffer — the same physical page for the whole trip). Sync path (run_sync): the handler matches the event on the same thread, borrows &[u8] straight from the RX slot, checks out a TX slot with io.send_buffer(n)?, writes the response directly, and queues it with io.stream_write_buffer(conn, stream, buf)? — 0 memcpy, 0 alloc, 0 cross-thread sync; reads one slot, writes another, nothing is moved; best for CPU-bound handlers, RPC, fast paths. Async path (run_async / run_tower): io.detach_http_request()? promotes the slot to an Arc (1 atomic refcount bump), a 112 B HttpOwnedRequest crosses an rtrb SPSC ring to a worker — the struct crosses, not the data — the worker reads the same slot via req.path() / .header(), awaits handler.handle(req), and writes once into handle.http_respond() — 0 memcpy, ~64 B alloc (FU node); the slot stays in the shard's pool, read across threads via the Arc; best for DB queries, long handlers, cooperative multitasking. Outbound: TX pool slot written once by SendBuffer / ZeroResponse, registered for DMA egress → 1 DMA transfer → NIC hardware TX.
Read it twice. Two pool slots appear in this diagram. Both are pre-allocated, registered with the kernel for zero-copy DMA, and written exactly once each. Between them live the handlers — sync reads the RX slot directly, async crosses to a worker via an Arc pointer over the same slot. At no point does a memcpy happen in the application path. The two DMAs at top and bottom are hardware transfers, not copies.
Wakeup

Three ways to wait
pick your latency / CPU trade

Between two messages, the shard has to wait. The Wakeup trait is a ZST at runtime — the strategy is monomorphized into the loop, no virtual dispatch. Mix per shard: futex on the API tier, spin on the order-book.

FutexWakeup

Futex

The default. IORING_OP_FUTEX_WAIT on Linux ≥ 6.7. Three-state coalesced protocol. 0 file descriptors per shard, 1 syscall per wakeup cycle, kernel handles fairness. ~700 ns wakeup latency.

Best for: API tier · general workloads · per-CPU shard with mixed load
XdpPollMode::Auto

Adaptive

Best of both. AF_XDP's 4-tier ladder — Hot · Warm · Cool · Idle — promotes to busy-poll during traffic bursts, falls back to interrupt-driven sleep when idle. Tier eval every 100 ms, hysteresis to prevent flapping.

Best for: bursty UDP feeds · market data · DNS resolvers · NTP fleets
SpinWakeup

Spin

Cores you own. Pure std::hint::spin_loop(). 100 % CPU, no syscall, no kernel involvement. Sub-100 ns wakeup. Pairs naturally with AF_XDP busy-poll for sub-microsecond end-to-end latency.

Best for: HFT order books · market makers · ultra-low-latency RPC tier
Why 3, not 1. A REST tier handling 10 K conn/s wants Futex — sleep cheaply between bursts, share the CPU. A market-data multicast feeder wants Adaptive — busy-poll during the open, sleep after-hours. An order-router wants Spin — never sleep, never miss a quote. zero-io lets you choose per shard, the trait is monomorphized so the cold paths cost zero in the hot one.
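A hedged sketch of what "monomorphized, no virtual dispatch" means in practice. This is the general Rust pattern, not zero-io's actual trait definition; the strategy names mirror the cards above, and the futex body is a placeholder.

// Illustrative trait shape: each strategy is a zero-sized type.
trait Wakeup {
    fn wait(&self);
}

struct SpinWakeup; // ZST: disappears at runtime
impl Wakeup for SpinWakeup {
    fn wait(&self) { std::hint::spin_loop(); }
}

struct FutexWakeup; // the real impl would issue IORING_OP_FUTEX_WAIT
impl Wakeup for FutexWakeup {
    fn wait(&self) { std::thread::yield_now(); } // placeholder for the futex path
}

// Generic over W: each shard's loop compiles with its strategy inlined.
// No dyn Wakeup, no vtable, so the strategies you don't use cost nothing.
fn shard_loop<W: Wakeup>(wakeup: W, mut has_work: impl FnMut() -> bool) {
    loop {
        while !has_work() {
            wakeup.wait();
        }
        // ... poll, drain events, flush TX ...
        break; // keep the sketch finite
    }
}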
Safety

The compiler is the contract
use-after-invalidate cannot compile

Zero-copy in C and C++ requires runtime discipline, README warnings, and code reviews. zero-io encodes the invariant in the type system: Event<'poll> ties every borrowed &[u8] to the &mut Io from next_event(). The next io.poll() call mutably borrows Io — and the borrow checker rejects it as long as one byte of slot data is still in scope.

cargo build compile error
let Some(event) = io.next_event() else { return };
let data: &[u8] = match event {
    Event::StreamFrame { data, .. } => data,
    _ => return,
};

io.poll(Duration::from_millis(10))?;
// ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
// error[E0502]: cannot borrow `io` as mutable
//   because it is also borrowed as immutable
//   first borrow occurred here, used by `data`

println!("{}", data[0]); // dead code — compiler stopped you

The lifetime IS the contract

No runtime check, no allocator inspection, no test that hopes to catch it. The annotation 'poll on the Event tells rustc exactly when the bytes die — and rustc enforces it before your code reaches a CPU.

Cross-thread? Use OwnedSlot

Need the bytes after poll or on another thread? io.detach_event_data() hands you an OwnedSlotSend + Sync, Arc-counted over the same slot. The compiler still tracks it; the slot is dropped only when the last reference goes away.
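A hedged sketch of that detach path, assuming an io in scope and using a std mpsc channel purely for illustration (zero-io's own cross-thread path is the SPSC ring behind run_async). How OwnedSlot exposes its bytes on the worker side is not shown here.

use std::sync::mpsc;

let (worker_tx, worker_rx) = mpsc::channel::<zero_io::OwnedSlot>();

std::thread::spawn(move || {
    while let Ok(slot) = worker_rx.recv() {
        // read the payload here; dropping the last reference recycles the slot
        drop(slot);
    }
});

while let Some(event) = io.next_event() {
    let wants_detach = matches!(event, Event::StreamFrame { .. });
    if wants_detach {
        if let Some(owned) = io.detach_event_data() { // Arc bump, no copy
            worker_tx.send(owned).ok();               // OwnedSlot is Send + Sync
        }
    }
}
// The next io.poll() is fine: no borrowed &[u8] is still alive.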

What this doesn't need

No reference counting on the hot path. No bounds-check after a memcpy. No "did the kernel still own this buffer?" question. No // SAFETY: comment in handler code. The unsafe primitives live in PoolFreeStack, audited once, behind a typestate façade.

Slot<S> typestate

Pool slots transition Empty → Reserved → Committed → Released through generic state types. Slot<Reserved> has no .read() method; Slot<Released> has no .commit(). Use-after-release, double-commit, write-without-acquire — none of them type-checks. Wrong code refuses to compile.
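An illustrative typestate sketch of that idea. The state markers and method names mirror the transitions above; they are not zero-io's internal definitions.

use std::marker::PhantomData;

struct Empty;
struct Reserved;
struct Committed;
struct Released;

struct Slot<S> { index: u32, _state: PhantomData<S> }

impl Slot<Empty> {
    fn reserve(self) -> Slot<Reserved> {
        Slot { index: self.index, _state: PhantomData }
    }
}
impl Slot<Reserved> {
    // only a Reserved slot can be written and committed
    fn commit(self, _bytes: &[u8]) -> Slot<Committed> {
        Slot { index: self.index, _state: PhantomData }
    }
}
impl Slot<Committed> {
    // only a Committed slot can be read; releasing consumes it
    fn read(&self) -> &[u8] { &[] }
    fn release(self) -> Slot<Released> {
        Slot { index: self.index, _state: PhantomData }
    }
}
// Slot<Released> has no methods: use-after-release simply does not type-check.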

Why this changes the game

Every other zero-copy network library — DPDK, userspace TCP stacks, custom C kernels — relies on documentation and reviews to keep callers honest. Rust's borrow checker turns "don't read this after the next poll" from a comment into a compile error. The result: zero-copy without the footgun tax.

Performance

What it costs
what it doesn't

Architectural targets, not field-measured numbers. Production benchmarks land with the 1.0 release.

30–55 M pps · Throughput · per-shard, AF_XDP ZC_MODE. io_uring path: 5–15 M.
15–25 ns/pkt · Latency · RX-to-handler hot path, AF_XDP. io_uring path: 50–100 ns.
0 allocs/req · HTTP path · warm pool · run_sync · match-arm router
19 protocols · Coverage · QUIC, H3, WT, TCP, UDS, WebSocket, HTTP/1.1+2, REST, gRPC, MQTT, Redis, FIX, SBE, SMTP, FTP, DNS, mDNS, NTP, SOCKS5

Throughput · packets per second per shard

DPDK reference · kernel-bypass, no protocols · 50–80 M pps
zero-io · AF_XDP · ZC_MODE, busy-poll · 30–55 M pps *
zero-io · io_uring · default, SQPOLL on · 5–15 M pps
nginx-quic · workers + reuseport · ~3 M pps
tokio-quiche · Cloudflare wrapper · ~2 M pps
Quinn + Tokio · multi-thread runtime · ~1.5 M pps

Per-packet latency · nanoseconds

DPDK reference · 5–10 ns
zero-io · AF_XDP · 15–25 ns *
zero-io · io_uring · 50–100 ns
Tokio · mio + epoll · 500–2000 ns

* AF_XDP ZC_MODE — driver dependent; fallback XDP_COPY_MODE matches io_uring. See Backends for supported NICs.

HTTP allocations · per request, warm pool

run_sync · match-arm router · 0 B
run_tower + matchit · generic Tower layers · 64 B
zero_io_axum · drop-in axum + HeaderMap pool * · 200 B
hyper + axum · stock Tokio stack · ~3,500 B · 5–7 allocs
reqwest GET · popular client · ~6,000 B · 10–15 allocs

* header-map-pool feature flag · reclaims axum's HeaderMap after the response chain so a per-shard pool can re-issue it. Steady-state floor ~200 B per request; first request still pays the initial allocation.

Backends

One API
the right backend for your box

From an embedded gateway in a forklift to a CDN edge node to an HFT order router — same code, different config. Memory tuned to the box (64 KB pool slots × 16 frames on a Pi, 64 MB UMEM × 16 K frames on a 100 G NIC), wakeup tuned to the workload (futex on the API tier, spin on the order book), backend tuned to the kernel (io_uring everywhere, AF_XDP where the driver supports zero-copy).

Embedded · IoT gateways

Single-shard Io, pool_slot_count = 16, slot_size = 1024. FutexWakeup. ~64 KB total. ARMv7+ / Raspberry Pi class.

Generic apps · APIs

Single shard or 2–4 shards via IoCluster, default config. io_uring. The right tier for a REST/gRPC service that just needs "not Tokio's perf cliff".

HPC · CDN edge

Cluster of N shards = N CPUs, NUMA-pinned. AF_XDP on dedicated NIC queues. Adaptive busy-poll. 64 MB UMEM × 16 K frames. Many millions pps per box.

HFT · order routers

Per-CPU shard pinned isolated, SpinWakeup, AF_XDP ZC_MODE, FreeBSD userspace TCP on listen ports. Hugepages on. p99 sub-microsecond.
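Taking the embedded tier above as a concrete example, a hedged config sketch: the field names come from the Config table in the Reference section, max_connections = 64 is purely illustrative, and FutexWakeup needs no knob here because it is the stated default.

// Embedded / IoT gateway shape: 16 × 1 KB slots ≈ 16 KB of payload pool,
// well inside the ~64 KB total footprint quoted above.
let mut cfg = zero_io::Config::default();
cfg.pool_slot_count = 16;
cfg.pool_slot_size = 1024;
cfg.max_connections = 64;   // illustrative
let mut io = zero_io::Io::new(cfg)?;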

Per-OS backends · same API, native kernel

OS · Kernel backend · Status · Notes
Linux ≥ 6.7 · io_uring · tier 1 · Production default. Futex2 wakeup, SQPOLL, registered buffers, GSO/GRO, linked SQEs, optional ZCRX (kernel ≥ 6.18) *.
macOS ≥ 14 · kqueue · tier 1 · Identical API surface. EVFILT_USER fflags for targeted wakeups, sendmsg_x/recvmsg_x batch syscalls, hugepages on x86.
Windows ≥ 10 1809 · RIO + IOCP · tier 1 · Registered buffers give zero-copy TX on par with SENDMSG_ZC. Dedicated completion queues per listener.

Linux I/O strategies · pick per shard

Strategy · Protocols · TLS · Why pick it
io_uring (default) · UDP, TCP, all protocols via the kernel stack · kTLS, including NIC HW offload (mlx5 ConnectX-6+, etc.) · Portable, CI-gated, runs every protocol. 5–15 M pps, 50–100 ns/pkt. The right default unless you have a specific reason.
AF_XDP (linux-af-xdp) · UDP native (kernel-bypass via UMEM); TCP via the FreeBSD userspace stack on opt-in ports — the BSD TCP state machine ported to run on top of AF_XDP frames, replacing the kernel TCP path for those listen ports only * · Software only, rustls on CPU (AES-NI accelerated); no NIC HW TLS offload — kTLS is a kernel feature and AF_XDP bypasses the kernel TCP stack by design · UDP-heavy hot paths (market data, DNS at scale, NTP fleets) and HFT-tier TCP on specific ports. 30–55 M pps, 15–25 ns/pkt.

Driver-support footnote (canonical) — AF_XDP XDP_ZC_MODE requires zero-copy support in the NIC driver: mlx5 (Mellanox/NVIDIA ConnectX-4 Lx and later), i40e / ice (Intel X710/E810), ena (AWS Nitro), virtio-net (recent kernels). Fallback XDP_COPY_MODE works on every driver but adds one kernel memcpy (matches io_uring's cost). The FreeBSD userspace TCP path is opt-in per listen port via [userspace_tcp] enabled_ports = […]; other ports keep kernel TCP through XDP_PASS. TSO / GRO / LRO / kTLS are unavailable in the AF_XDP path by design — the cost of bypassing the kernel TCP stack.

Tier note — the per-OS table above is about API parity; the strategies table above is about Linux I/O backend maturity. Linux ≥ 6.7 ships at OS-tier 1 (full API), but its AF_XDP strategy is opt-in and CI-gated only when the feature is on. macOS and Windows are tier 1 for the OS surface (kqueue / RIO have parity); Linux's I/O strategies have their own tiering.

Protocols

Every protocol
one library

Pluggable ProtocolHandler trait. Each protocol is a feature gate. Pay only for what you use.

QUIC · tier 1
HTTP/3 · tier 1
WebTransport · tier 1
TCP · tier 1
UDS · tier 1
WebSocket · tier 1
HTTP/1.1 + 2 · tier 1
REST · tier 1
gRPC · tier 1
MQTT 3.1.1 / 5 · tier 2
Redis · RESP2/3 · tier 2
FIX 4.4 · text · tier 2
SBE · CME MDP3 · tier 2
SMTP · MIME · tier 2
FTP · FTPS · tier 2
DNS · DoT/DoH · tier 2
mDNS · RFC 6762 · tier 2
NTP · SNTP · tier 2
SOCKS5 · tier 2
Async bridge

Tokio when you want it
not when you don't

Four runtime modes. Each picks a different point on the latency / ergonomics curve. The hot path stays sync; the application stays async.

Mode · Allocs · Body stream · Use case
run_sync · 0 B · no · Inline handler, sync. Zero allocations. CPU-bound RPC, parsing, transform.
run_per_core · 0 / 64 B · yes · Tokio runtime per shard. .await inline; 64 B for streaming bodies via FuturesUnordered.
run_async · ~64 B · yes · Cross-thread dispatch via SPSC ring. Database queries, slow handlers, isolated workers.
run_tower · ~64 B · yes · Direct tower::Service. S::Future concrete, no Box::pin.

axum migration cost

Path · Allocs/req · Compatibility · Effort
run_sync + match-arm · 0 B · none — write your own · ~30 LOC. Best for small services and RPC.
run_tower + matchit · 64 B · generic Tower (Timeout, Retry, …) · ~50 LOC. Recommended for production HTTP perf.
run_tower + zero-io natives · 64 B · 7 native middleware (CORS, Auth, Trace, Compress, RequestId, NormalizePath, SensitiveHeaders) · Zero alloc per layer. Covers ~90% of tower-http use.
tower-http-compat · ~640 B · full tower-http · Use real tower-http layers via HttpOwnedRequest → http::Request.
zero_io_axum + pool · ~200 B · full axum + tower-http · HeaderMap reclaim. Same compat at lower cost.
zero_io_axum::serve · ~640 B · full axum + tower-http · Two-line migration: axum::serve → zero_io_axum::serve.
vs. the ecosystem

Row by row
the API surface

Thread-per-core, plug-in protocol matrix, CI-enforced zero-alloc — the combination doesn't exist anywhere else.

Feature | zero-io | tokio-quiche | Glommio | monoio | Quinn | neqo | s2n-quic | smoltcp | nginx-quic
Thread model | thread-per-core | Tokio MT | thread-per-core | thread-per-core | sans-io / Tokio | sans-io | Tokio | sync no_std | workers
io_uring | yes | no | required | yes | no | no | no | no | no
AF_XDP kernel-bypass | yes (opt-in) | no | no | no | no | no | no | no | no
Cross-platform | Linux · macOS · Win | cross | Linux only | cross (varies) | cross | cross | cross | bare-metal | cross
0-alloc hot path | CI gate | none | none | none | none | none | none | strict | n/a (C)
0-lock hot path | lock-free | Mutex | shard-local | shard-local | Mutex | sans-io | Mutex | single-thread | per-worker
0-copy TX | yes * | feature-gated | none | ownership API | GSO/sendmmsg | none | GSO | n/a | UDP GSO
QUIC | yes (quiche) | yes | no | no | yes | yes | yes | no | yes
H3 + WebTransport | yes | H3 only | no | no | via h3 crate | yes | partial | no | yes
TCP | yes | no | yes | yes | no | no | no | yes | yes
Plugin protocol model | yes | trait | no | no | sans-io | sans-io | provider | sockets | C modules
Tokio bridge | yes · 4 modes | is Tokio | no | partial | yes | n/a | yes | n/a | n/a

* with AF_XDP ZC or io_uring ZCRX (kernel ≥ 6.18). Default io_uring path uses borrowed-send + zero-alloc QPACK vendor patches.

Code

Beautiful by design
server · client

Same Io, same event loop. _listen for servers, _connect for clients. Zero allocations on the hot path either way.

Server

examples/server/quic_echo.rs Rust
use std::time::Duration;
use zero_io::{Io, Config, Event};

fn main() -> std::io::Result<()> {
    let mut io = Io::new(Config::default())?;
    io.quic_listen("0.0.0.0:4433".parse()?, &CERT_PEM, &KEY_PEM)?;

    loop {
        io.poll(Duration::from_millis(10))?;
        while let Some(event) = io.next_event() {
            match event {
                Event::StreamFrame { conn, stream, data, .. } => {
                    // `data` is &[u8] borrowed from a pool slot
                    // invalidated on the next poll() — process now or detach
                    io.stream_write(conn, stream, data)?;
                }
                _ => {}
            }
        }
    }
}
examples/server/axum_drop_in.rs Rust
// Before: tokio + hyper
let listener = tokio::net::TcpListener::bind("0.0.0.0:8080").await?;
axum::serve(listener, app).await?;

// After: zero-io. Two lines changed. Same axum router.
let mut io = Io::new(Config::default())?;
io.http_listen("0.0.0.0:8080".parse()?)?;
zero_io_axum::serve(io, app)?.wait()?;
examples/server/tower_perf.rs Rust
// 64 B / req. Generic Tower + zero-io native middleware. 0 alloc per layer.
use tower::ServiceBuilder;
use zero_io_async::{ZeroRuntime, ZeroCorsLayer, ZeroTraceLayer, ZeroCompressionLayer};

let svc = ServiceBuilder::new()
    .layer(ZeroTraceLayer::new())
    .layer(ZeroCorsLayer::permissive())
    .layer(ZeroCompressionLayer::zstd())
    .layer(tower::timeout::TimeoutLayer::new(Duration::from_secs(10)))
    .service(my_handler);

ZeroRuntime::new(io).run_tower(svc)?.wait()?;

Client

examples/client/quic_connect.rs Rust
use std::time::Duration;
use zero_io::{Io, Config, Event, StreamId};

fn main() -> std::io::Result<()> {
    let mut io = Io::new(Config::default())?;
    let conn = io.quic_connect("203.0.113.1:4433".parse()?)?;

    loop {
        io.poll(Duration::from_millis(10))?;
        while let Some(event) = io.next_event() {
            match event {
                Event::Connected { .. } => {
                    // 1-RTT ready · session ticket cached for next 0-RTT
                    io.stream_write(conn, StreamId(0), b"GET /quote HTTP/3\n")?;
                }
                Event::StreamFrame { data, .. } => {
                    // `data` borrows the same pool slot the NIC DMA'd into
                    process(data);
                }
                _ => {}
            }
        }
    }
}
examples/client/http_get.rs Rust
// 0 allocs/req on warm pool · LIFO connection reuse · TFO + Happy Eyeballs +
// TLS 0-RTT + Alt-Svc h3 auto-upgrade — all transparent.
use zero_io::http_client::{HttpClient, HttpClientConfig};

// Async wrapper: drop-in replacement for reqwest at ~10× the throughput.
#[tokio::main]
async fn main() -> std::io::Result<()> {
    let client = HttpClient::new(HttpClientConfig::default())?;

    let resp = client
        .get("https://api.example.com/users/42")
        .header("authorization", "Bearer …")
        .send().await?;

    match resp.status() {
        200 => println!("{}", resp.text()?),
        s   => eprintln!("http {}", s),
    }
    Ok(())
}
examples/client/redis_pipeline.rs Rust
// 100 SET + 100 GET in one round-trip via writev. RESP3 inline.
use zero_io::redis::{RedisClient, RedisConfig};
use zero_io::{Config, Io};

let mut io = Io::new(Config::default())?;
let client = RedisClient::connect(&mut io, RedisConfig::localhost())?;

let mut pipe = client.pipeline();
for i in 0..100 {
    pipe.set(&format!("k:{}", i), format!("v:{}", i).as_bytes());
}
let results = pipe.exec(&mut io)?; // 1 round-trip, not 100
Backpressure

It bends before it breaks
cascade, observed

One cascade state machine, four states. As pool utilization climbs, each step costs a little more — until the last one closes idle connections to recover. Every transition is observable, every drop is counted.

Pool utilization · shard hot path: Healthy < 60 % used → Warning 60–80 % → Critical 80–95 % (drop bulk TX) → Drain 95 %+ (close idle)

Seven pools feed the cascade

FreeStack · pool slots · main LIFO free list
FillRing · RX descriptors · kernel reads, driver fills
RxRing · RX completions · packets ready to process
TxRing · TX descriptors · outbound queue
CompletionRing · TX done · buffers reclaim here
ScratchPool · staging · short-lived intermediates
PerConn · per-connection · streams, h3, WT state
Budget partition. 60 % RX · 10 % TX critical · 25 % TX bulk · 5 % scratch. Critical TX never starves (handshake packets, ACKs, FIN). Bulk TX is the first to be shed when Critical hits. Idle connections close last — only once we cross 95 %.
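A hedged sketch of how a handler can observe the cascade. The method name and the Event::PoolPressure variant come from the Reference section; the PoolPressureInfo field shown in the comment is an assumption for illustration.

// Poll-time snapshot before queueing more bulk work:
if let Some(pressure) = io.pool_pressure() {
    // e.g. stop queueing bulk responses once the shard leaves Healthy
    // if pressure.utilization_pct >= 60 { stop_accepting_bulk_tx(); }
    let _ = pressure;
}

// Or react to the event emitted on state transitions:
while let Some(event) = io.next_event() {
    if let Event::PoolPressure { .. } = event {
        // Warning / Critical / Drain transition: shed bulk TX first;
        // critical TX (ACKs, handshakes, FIN) keeps its reserved budget.
    }
}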
Deployment

Embedded or daemon
your call, same API

Two ways to ship zero-io. Pick once at compile time. The protocol handlers, the Io surface, the event loop — identical. The only thing that changes is who owns the privileged file descriptors.

Embedded · default

Linked in

One process. zero-io is a library inside your binary. No IPC, no daemon, no extra moving parts. Simplest possible deployment.

Restart drops in-flight connections — fine for stateless tiers, dev environments, embedded targets, or anything where a cold cycle is acceptable.

cargo add zero-io
Daemon · production tier

Separated

Two processes. A privileged daemon owns the dangerous FDs (BPF, raw sockets, UMEM, TLS keys); your unprivileged app talks to it through a sealed memfd.

Hot-reload the app binary without dropping a packet — the daemon keeps everything alive across the execve. Privilege isolation, zero-downtime upgrades, multi-tenant safety.

cargo add zero-io --features daemon-client

How daemon mode works

The daemon holds CAP_NET_ADMIN, CAP_BPF, CAP_SYS_RESOURCE. The app runs as a regular user, zero caps. They share state through a sealed memfd mapping — UDS carries control only.

[Daemon-mode diagram] zero-io-daemon (privileged, long-lived: CAP_NET_ADMIN · CAP_BPF · CAP_SYS_RESOURCE) owns the BPF programs, xsk_map, UMEM memfd, io_uring rings, NIC sockets, and TLS keys, and survives app restarts for hot-reload. Your app (unprivileged, ephemeral: setuid(nobody), seccomp + landlock) holds the protocol handlers and business logic with no caps, no raw sockets, no BPF load; an execve replacement gives a zero-downtime upgrade. A UDS carries control plus SCM_RIGHTS FD passing. A shared memfd (ShardLayout, sealed with F_SEAL_GROW + F_SEAL_SHRINK + F_SEAL_WRITE) holds the connection table, UMEM frames, pool slots, cascade state, and stats counters; the daemon writes and seals it at boot, app v1 maps it MAP_SHARED, app v2 remaps it after execve via recover_from_exec(). Hot-reload timeline: app v1 running with live connections → prepare_hot_reload() clears close-on-exec on the memfd → execve(new_binary), ~5 ms, daemon FDs survive → recover_from_exec() remaps the memfd and resumes polling.

Privilege isolation

Daemon holds the dangerous bits. App runs setuid(nobody) + seccomp + landlock. A handler bug never escalates to BPF / raw socket reach. Compatible with Kubernetes securityContext.

Shared memory · perf-optimal

Connection state, UMEM, pool slots all live in a sealed memfd. App and daemon both mmap MAP_SHARED — same physical pages, no syscall on the data path. UDS only carries setup commands.

Binary hot-reload · zero packet drop

Ship a new binary, signal the supervisor, app execve's the replacement. Daemon-owned BPF / sockets / UMEM persist across the boundary. The new binary calls recover_from_exec() and rebinds — in-flight TCP and QUIC connections continue without notice.
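A heavily hedged sketch of that sequence. Only the call order, prepare_hot_reload() before the execve and recover_from_exec() after it, comes from the text above; where those functions live (on Io, on a daemon-client handle) and their signatures are assumptions.

use std::os::unix::process::CommandExt;

// Old binary, on receiving the reload signal:
io.prepare_hot_reload()?;   // keep the shared memfd across the exec boundary
let _err = std::process::Command::new("/proc/self/exe").exec(); // returns only on failure

// New binary, early in main():
let mut io = zero_io::Io::recover_from_exec()?; // remap the sealed memfd, resume polling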

Secrets that can't leak

TLS keys live in memfd_secret pages — no other process can map them, no swap, no /proc/<pid>/mem exposure. Wrapped in Zeroizing<T> so they're scrubbed on drop. Even a kernel exploit on the app side never reaches them.

Cert hot-reload, no restart

Rotate TLS certificates without dropping a single connection. CertReloadHandle::reload_from_pem(...) swaps via arc-swap — old connections finish on the old cert, new connections pick up the new one. Sub-microsecond cutover.

Audit trail · structured

Every privileged action — BPF load, cert reload, ops command, signal handler — emits a structured event on target = "audit::*". Actor uid/pid, before / after state, monotonic timestamp. Compatible with NIST SP 800-53 AU-2/3, OWASP ASVS §1.4.

Ops surface, locked down

Force actions (drain-tx, conn-kill, cert-reload, BPF reload) speak through a UDS /run/...ops.sock at mode 0600. Mandatory HMAC, two-phase commit for destructives, per-class rate limit, profile allowlist (dev / staging / prod), sealed-token for prod-only operations.

Multi-tenant safe

Per-shard connection tables, per-shard pools, no cross-tenant pointers. Tenant identity bound at handshake, enforced through the audit chain. Acceptable for cooperative tenants today; adversarial multi-tenancy = run multiple daemons.

Read the API

Every public type. Every protocol method. Every config knob.

Reference

Public API

Every type, trait, and method exposed at the crate boundary. Architectural plan: PLAN-STEP173 → STEP199.

Quick start

Three primitives. Io owns the shard, poll() drives one tick, next_event() drains the queue.

main.rs · Rust
let mut io = Io::new(Config::default())?;
io.quic_listen("0.0.0.0:4433".parse()?, &cert, &key)?;
loop {
    io.poll(Duration::from_millis(10))?;
    while let Some(ev) = io.next_event() { handle(ev); }
}

Feature gates

Compile only what you use. Defaults: quic + tcp. Everything else is opt-in.

Feature · Enables · Implies
quic · QUIC listen/connect, datagrams, streams
tcp · TCP listen/connect, WriteBufferPool
websocket · WebSocket listen/connect, masking, ping/pong · implies tcp
websocket-deflate · permessage-deflate compression · implies websocket
http · HTTP/1.1 + HTTP/2 server · implies tcp
http-client · HTTP client + HttpPool · implies http
webtransport · H3 CONNECT + WT sessions · implies quic
tls-ktls · kernel TLS offload · implies tcp / http
tower · tower::Service<HttpOwnedRequest> impls
tower-http-compat · HttpOwnedRequest → http::Request adapter · implies tower
linux-af-xdp · AF_XDP backend (tier 2)
linux-userspace-tcp · FreeBSD userspace TCP stack on AF_XDP (tier 3, experimental)
socks5 / dns / ntp / mdns · opt-in feature gates per protocol

Io

The shard handle. Owns sockets, the io_uring ring, the payload pool, and the connection table. Single-threaded; do not Send.

Method · Purpose
fn new(config: Config) -> io::Result<Self> · Construct a single-shard Io. Allocates pools, opens io_uring.
fn poll(&mut self, timeout: Duration) -> io::Result<()> · One tick: drain CQEs → process_dirty → flush TX → fire timers.
fn next_event(&mut self) -> Option<Event<'_>> · Drain the per-tick event queue. Borrows &mut self.
fn detach_event_data(&mut self) -> Option<OwnedSlot> · Promote the current event's payload to an owned, Send + Sync slot.
fn detach_http_request(&mut self) -> Option<HttpOwnedRequest> · HTTP-only. Detach the current request as a 96–112 B owned struct.
fn send_buffer(&mut self, min: usize) -> io::Result<SendBuffer> · Check out a writable pool slot for zero-copy TX.
fn close(&mut self, conn: ConnId) -> io::Result<()> · Immediate close.
fn close_graceful(&mut self, conn: ConnId, timeout: Duration) · Drain in-flight, then close.
fn pool_pressure(&self) -> Option<PoolPressureInfo> · Snapshot of pool utilization for back-pressure checks.
fn pool_stats(&self) -> PoolStats · Current / peak / capacity per pool.
fn conn_stats(&self, conn) -> io::Result<ConnStats> · RTT, cwnd, bytes, packets-lost per connection.
fn handle(&self) -> IoHandle · Cross-thread send-side handle. Cheaply cloneable.

Event<'poll>

Borrowed from the current poll. Invalidated by the next poll(). Process synchronously or call detach_event_data() for cross-thread.

Variant · Carries
UdpRecv · endpoint, from, data: &'poll [u8]
Connected · conn, peer, protocol: Protocol
Disconnected · conn, error_code: u64, reason: &'poll [u8]
Datagram · conn, data: &'poll [u8]
StreamFrame · conn, stream, kind: MessageKind, data: &'poll [u8]
StreamReset · StopSending · conn, stream, error_code
SessionReady · conn · WebTransport CONNECT 200
PathMigration · conn, old_peer, new_peer
HttpRequest · HttpBodyChunk · HttpResponse · HTTP feature only
PoolPressure · DnsResolved · MqttEvent · per-feature

MessageKind

Typed message discriminator on StreamFrame and ConnDatagram. Replaces the old opaque msg_type: u8.

  • Binary · WsText · WsBinary · MqttPacket · GrpcFrame · FixText · Sbe · User(u8)

Config

Per-shard knobs. #[non_exhaustive] — extend without breaking semver.

Field · Default
pool_slot_count: usize · 4096
pool_slot_size: usize · 2048 (≥ 1200, RFC 9000)
huge_pages: Toggle · Auto
max_connections: usize · 1024
max_events_per_poll: usize · 256
pool_pressure_pct: u8 · 80
compression_threshold: usize · 128 B
uring: UringConfig (Linux) · auto-tuned
debug: DebugConfig · disabled

IoHandle

Send-side handle obtained via io.handle(). Send + Sync + Clone. Cross-thread paths funnel through this — workers send, the shard wakes and writes.

  • fn send_datagram(&self, conn, data: &[u8]) -> io::Result<()>
  • fn stream_write(&self, conn, stream, data: &[u8]) -> io::Result<()>
  • fn send_datagram_buffer(&self, conn, buf: SendBuffer) -> io::Result<()>
  • fn stream_write_buffer(&self, conn, stream, buf: SendBuffer) -> io::Result<()>
  • fn http_respond(&self, conn, request_id, response: ZeroResponse)
  • fn close(&self, conn) · close_graceful(&self, conn, timeout)

OwnedSlot · SendBuffer

OwnedSlot is a payload pulled out of the per-poll lifetime and made Send + Sync. Internally an Arc over a pool slot — refcounted, recycled on drop.

SendBuffer is a writable pool slot for zero-copy TX. Acquire via io.send_buffer(n), write into as_mut_slice(), hand to send_datagram_buffer / stream_write_buffer.
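A hedged sketch of that TX path, assuming io, conn, and stream are in scope. How the written length is recorded on the SendBuffer (a truncate or commit step) is not specified here, so the snippet elides it.

let mut buf = io.send_buffer(4)?;                  // PoolExhausted error if no slot is free
buf.as_mut_slice()[..4].copy_from_slice(b"PONG");  // write the payload once, in place
io.stream_write_buffer(conn, stream, buf)?;        // slot queued for send, recycled after TX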

IoCluster · multi-shard

Production entry point for > 1 shard. Owns N reuseport sockets (or one shared socket + DCID dispatcher) and exposes the same listen / connect surface, fanned out across shards.

Item · Purpose
ClusterConfig · shard_count, routing, cpu_affinity, expected_protocols
RoutingStrategy · ReusePortCbpf · ReusePortEbpf · DcidDispatch
ScidGenerator · 14 bits encode (server_id, shard_id) jointly in QUIC SCIDs — 16 384 cluster slots partitioned across the two fields. SERVER_ID_MASK = 0x3FFF, shard count is a power of 2 within each server.
ShardIo · Per-shard handle; identical surface to Io.
Pick one. Io for tests, single-core deploys, tools. IoCluster for production servers. Don't roll your own N Io instances — you'll miss the routing.
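A hedged bring-up sketch for the multi-shard path. The ClusterConfig field names and RoutingStrategy variants come from the table above; ClusterConfig::default() and IoCluster::new() are assumed constructor shapes.

use zero_io::{ClusterConfig, IoCluster, RoutingStrategy};

let mut cluster_cfg = ClusterConfig::default();
cluster_cfg.shard_count = 4;                          // one shard per pinned CPU
cluster_cfg.routing = RoutingStrategy::ReusePortEbpf; // or DcidDispatch for QUIC
let mut cluster = IoCluster::new(cluster_cfg)?;
// Each ShardIo exposes the same listen / connect surface as a single Io.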

UDP

  • fn udp_bind(&mut self, addr: SocketAddr) -> io::Result<EndpointId>
  • fn udp_bind_with(&mut self, config: UdpEndpointConfig) -> io::Result<EndpointId>
  • fn udp_send(&mut self, endpoint, to, buf: SendBuffer) -> io::Result<()>
  • fn udp_send_bytes(&mut self, endpoint, to, data: &[u8]) -> io::Result<()> · convenience, 1 memcpy
  • fn multicast_join · multicast_leave (group: MulticastGroup)

MulticastGroup is typed: AnySource (mDNS, RFC 1112) or SourceSpecific (CME / Eurex feeds, RFC 4607).

QUIC

  • fn quic_listen(&mut self, addr, cert, key) -> io::Result<EndpointId>
  • fn quic_listen_with(&mut self, config: QuicListenConfig)
  • fn quic_connect(&mut self, addr) -> io::Result<ConnId>
  • fn quic_connect_with(&mut self, config: QuicConnectConfig) · supports Happy Eyeballs (RFC 8305) when HostOrAddr::Host.
  • fn send_datagram(&mut self, conn, data: &[u8])
  • fn stream_write(&mut self, conn, stream, data) -> io::Result<usize>
  • fn stream_read(&mut self, conn, stream, &mut [u8]) · QUIC / WT only · returns StreamNotPullable on TCP/WS
  • fn early_data_send · session_ticket · set_session_ticket · 0-RTT

QuicListenConfig covers idle timeout, stream / data limits, congestion (Reno · Cubic · BBRv2), DPLPMTUD, retry tokens, ECN, allowed origins.

TCP · Unix Domain Sockets

  • fn tcp_listen · tcp_listen_with
  • fn tcp_connect · tcp_connect_with · Happy Eyeballs supported
  • fn uds_listen(&mut self, path: &str)
  • fn uds_connect(&mut self, path: &str)

TCP RX is push-only. Data arrives via Event::StreamFrame { kind: MessageKind::Binary }. There is no tcp_stream_read; calling stream_read on a TCP ConnId returns IoError::StreamNotPullable.

WebSocket

  • fn ws_listen · ws_listen_tls · ws_listen_with
  • fn ws_connect · ws_connect_with
  • fn ws_send(&mut self, conn, data: &[u8], text: bool)
  • fn ws_send_buffer(&mut self, conn, buf: SendBuffer, text: bool)
  • fn ws_close(&mut self, conn, code: u16, reason: &str)

Frames arrive as Event::StreamFrame { kind: WsText | WsBinary }. Ping/pong handled internally.

HTTP · HTTP/2

  • fn http_listen · http_listen_tls · http_listen_with(HttpListenConfig)
  • fn http_respond(&mut self, conn, request_id, response: ZeroResponse)
  • fn http_request(&mut self, …) -> io::Result<RequestId> · client

HttpListenConfig: max_header_count, max_header_size, max_body_inline, request_timeout_ms, H/2 streams / window / frame / header-list, compression threshold.

WebTransport

  • fn wt_connect(&mut self, addr, path: &str)
  • fn wt_connect_with(WtConnectConfig)
  • Server: quic_listen_with(QuicListenConfig { enable_webtransport: true, allowed_origins, … })
  • Event::SessionReady · H3 CONNECT 200 accepted

One session per connection. Datagrams + streams over the H3 CONNECT.

TLS · STARTTLS · hot-reload

  • fn tls_upgrade(&mut self, conn, config: TlsClientConfig) · client STARTTLS
  • fn tls_accept_upgrade(&mut self, conn, config: TlsServerConfig) · server STARTTLS
  • fn enable_cert_hot_reload(&mut self, endpoint) -> CertReloadHandle
  • CertReloadHandle::reload_from_pem · reload_from_bytes · reload_quic_from_pem · Send + Sync + Clone, atomic swap via arc-swap

Auto-attempts kTLS after handshake if available (Linux ≥ 6.7). Falls back to rustls in-process if not.
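A minimal sketch of the cert-rotation handle listed above; endpoint is the EndpointId returned by the TLS listen call, and the argument shape of reload_from_pem (cert PEM plus key PEM) is an assumption.

let reload = io.enable_cert_hot_reload(endpoint);   // CertReloadHandle: Send + Sync + Clone

// Later, e.g. from a SIGHUP handler or a file watcher on another thread:
reload.reload_from_pem(&new_cert_pem, &new_key_pem)?;
// Old connections finish on the old cert; new handshakes pick up the new one.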

Multicast · DNS · NTP · mDNS · SOCKS5

Method · Purpose
fn dns_init · dns_resolve · dns_result · Async DNS via UDP, optional TCP fallback, optional DoT/DoH.
fn ntp_init(NtpConfig) · ntp_offset_us · ntp_now_us · SNTP / NTP, multi-server, KoD.
fn mdns_init · mdns_register(MdnsService) · mdns_discover · mdns_resolve · RFC 6762 / 6763, ASM 224.0.0.251.
fn tcp_connect_socks5(proxy, dest, auth) · RFC 1928. Universal.
fn quic_connect_socks5(proxy, dest, auth) · UDP ASSOCIATE. Best-effort, server allowlist required, MTU auto-adjusted, migration disabled.

ZeroRuntime · async bridge

Wraps Io with a Tokio-friendly driver. Four modes pick a different point on the latency / ergonomics curve.

Method · Allocs · Best for
fn run_sync<H: SyncHandler>(self, handler) -> io::Result<ShutdownHandle> · 0 B · CPU-bound inline handlers
fn run_async<H: AsyncHandler + Clone>(self, handler) · ~64 B · DB queries, slow handlers
fn run_tower<S: tower::Service<HttpOwnedRequest>>(self, svc) · ~64 B · Tower middleware, generic Tower
fn run_per_core<H: AsyncHandler + Clone>(self, cluster, handler) · 0 / 64 B · Per-shard tokio runtime, mixed inline + streaming

HttpOwnedRequest · BodyStream

~96–112 B owned struct. Send + Sync. Path / headers / body offsets stored as a 12-byte table inside the pool slot. Zero-copy accessors return &str slices into the slot.

  • fn path(&self) -> &str
  • fn header(&self, name: &str) -> Option<&str>
  • fn method(&self) -> HttpMethod
  • fn body(&self) -> &[u8] · inline body
  • fn body_stream(&mut self) -> Option<BodyStream> · streaming uploads, 8-slot SPSC ring per request

ZeroResponse mirrors this on the response side. Builders: ZeroResponse::ok().json(&value), ZeroResponse::not_found(), etc.
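A hedged handler sketch over these two types. The accessor and builder names come from the lists above; the handler signature, the HttpMethod::Get variant name, and the serde_json dependency are assumptions for illustration.

use zero_io::{HttpMethod, HttpOwnedRequest, ZeroResponse};

async fn handle(req: HttpOwnedRequest) -> ZeroResponse {
    // path() and header() are zero-copy &str slices into the pool slot
    match (req.method(), req.path()) {
        (HttpMethod::Get, "/healthz") => {
            ZeroResponse::ok().json(&serde_json::json!({ "ok": true }))
        }
        _ => ZeroResponse::not_found(),
    }
}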

Native middleware (zero alloc)

Layer · tower-http equivalent
ZeroCorsLayer · tower_http::cors::CorsLayer
ZeroAuthLayer · tower_http::auth::ValidateRequestHeader
ZeroTraceLayer · tower_http::trace::TraceLayer
ZeroCompressionLayer · tower_http::compression::CompressionLayer
ZeroRequestIdLayer · tower_http::request_id::SetRequestIdLayer
ZeroNormalizePathLayer · tower_http::normalize_path::NormalizePathLayer
ZeroSensitiveHeadersLayer · tower_http::sensitive_headers::SetSensitiveHeadersLayer

For anything outside this list, use tower-http-compat at a 640 B / req cost.

zero-io-axum

  • fn serve(io: Io, app: axum::Router) -> io::Result<ShutdownHandle>

Two-line migration from axum::serve. Cost: ~200 B steady-state with the default header-map-pool feature (which reclaims axum's HeaderMap per request); 640 B without the pool. The HeaderMap itself is structural — axum's signature requires it.

REST · gRPC

Higher-level crates building on the HTTP base.

  • zero-rest: Router, RestRequest, RestResponse, PathParams, optional CacheMiddleware.
  • zero-grpc: GrpcService, ServerStream, ClientStream, BidiStream, Code, Status. Code generated from .proto via zero-grpc-build.

MQTT · Redis

  • zero-mqtt: MqttClient, MqttBroker, QoS 0/1/2, MQTT 3.1.1 + 5, trie-based topic match.
  • zero-redis: RedisClient, RedisPipeline, RESP2 / RESP3, pub/sub.

FIX · SBE

  • zero-fix: zero-copy text FIX 4.4 parser/builder, session FSM. Persistence in SessionWal (PLAN-STEP193b) — append-only WAL per session, CRC32C, atomic checkpoint.
  • zero-sbe: flyweight SBE decoder for CME MDP 3.0 / Eurex T7. Multicast feed handler with explicit gap-recovery FSM (T1..T14 transitions, I1..I5 invariants).

SMTP · FTP

  • zero-smtp: SMTP client + server, STARTTLS, AUTH PLAIN / LOGIN / XOAUTH2, MIME, DKIM (Ed25519 / RSA), pipelining.
  • zero-ftp: FTP client + server, AUTH TLS (FTPS), passive / EPSV, splice / mmap for transfers.

Ops CLI

charting-status binary. UDS at /run/charting-server/ops.sock (mode 0600). Mandatory HMAC. Two-phase commit for destructive actions. Profile-based allowlist (dev / staging / prod). Sealed-token for prod-restricted operations.

  • charting-status snapshot · healthz · readyz · read-only, no auth needed beyond peer-cred
  • charting-status drain-tx · conn-kill · bpf reload-ports · cert-reload · reset-peaks · privileged, audit-logged

IoError

Structured. Variants pinned to #[non_exhaustive]. Diagnostic strings are actionable — they name the syscall, the cause, and the fix.

  • KernelTooOld { required: KernelVersion, found: KernelVersion }
  • StreamNotPullable { protocol: Protocol } · use Event::StreamFrame
  • NotSupportedOnPlatform { platform }
  • NotSupportedOnBackend { backend, feature }
  • PoolExhausted · DnsError · ConnectTimeout · TlsHandshakeFailed
  • HmacMismatch · NonceReplay · ProfileForbidden · ops API

Backpressure cascade

Seven pools (FreeStack · FillRing · RxRing · TxRing · CompletionRing · ScratchPool · PerConn) feed one cascade state: Healthy → Warning → Critical → Drain. Each transition has a budget partition (60% RX / 10% TX critical / 25% TX bulk / 5% scratch) and a drop policy. Live snapshot via Io::pool_pressure() or the ops endpoint.
