43. Spin or Sleep
In a co-located HFT gateway, latency during microbursts dictates execution quality. Blocking on epoll/kqueue saves CPU but adds wakeup latency to the first packet after an idle period; busy-polling removes that wakeup at the cost of a core spent spinning. Consider this minimal backoff helper used inside a packet pump.
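// Spin budget: `spins` counts consecutive empty polls, `max` caps them.
struct Backoff { int spins = 0; int max; };

// Polls once. Returns true to keep spinning, false once the budget is spent.
bool tick(Backoff& b, int (*recv)()) noexcept {
    int n = recv();                            // non-blocking receive; > 0 means data arrived
    if (n > 0) { b.spins = 0; return true; }   // progress: reset the budget
    if (b.spins++ < b.max) return true;        // empty poll, budget remains
    return false;                              // budget exhausted: caller should block
}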
Part 1.
Using tick, describe how you would build a hybrid loop: busy-spin while it returns true, then block on epoll/kqueue when it returns false. Explain how you’d tune and adapt Backoff::max for bursty feeds to minimize tail while controlling CPU.
Part 2.
(1) Edge-triggered vs level-triggered: impact on latency, syscalls, and risk of missed readiness?
(2) How would you size Backoff::max under bursty traffic to minimize tail while limiting CPU burn?
(3) When would you prefer SO_BUSY_POLL or a busy-spin over epoll_wait timeouts on modern NICs?
(4) How would you avoid cross-core contention and cache thrash while spinning?
(5) Would noexcept (and adding inline) here affect inlining or codegen on the hot path?
Answer
Answer (Part 1)
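Busy-spin on the hot path while tick(b, try_recv) returns true; when it returns false, perform one bounded wait (epoll_wait/kevent with a small timeout) and then reset the spin budget. The reset is required because tick leaves b.spins above b.max once the budget is spent, so without it every later empty poll would fall straight through to the blocking path. Example shape: for(;;){ if (tick(b, try_recv)) continue; /* kevent/epoll_wait with small timeout */ b.spins = 0; }. Choose Backoff::max from profiling: start small to keep CPU low, then increase it until P99 latency stops improving. Adapt it per core from recent history: raise it when blocking waits keep returning almost immediately (a burst is in progress) and lower it when they run to the timeout (the feed is idle).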
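The loop shape above can be sketched as follows. This is a minimal illustration, not a production pump: try_recv, block_until_ready (standing in for epoll_wait/kevent with a small timeout), keep_running, and the PumpStats counters are all hypothetical names introduced here so the spin/block ratio can be observed.

```cpp
#include <cassert>

// Backoff and tick exactly as given in the question.
struct Backoff { int spins = 0; int max; };

bool tick(Backoff& b, int (*recv)()) noexcept {
    int n = recv();
    if (n > 0) { b.spins = 0; return true; }
    if (b.spins++ < b.max) return true;
    return false;
}

// Hypothetical counters: how many polls ran and how many times we blocked.
struct PumpStats { long polls = 0; long blocks = 0; };

// Hybrid pump: spin while tick() returns true, otherwise block once and
// reset the spin budget.
//   try_recv          - assumed non-blocking receive, > 0 when data arrived
//   block_until_ready - assumed wrapper around epoll_wait/kevent with a timeout
//   keep_running      - assumed shutdown check, evaluated once per iteration
inline PumpStats pump(Backoff& b,
                      int (*try_recv)(),
                      void (*block_until_ready)(),
                      bool (*keep_running)()) {
    assert(b.max >= 0);            // a negative budget would never spin
    PumpStats s;
    while (keep_running()) {
        ++s.polls;
        if (tick(b, try_recv)) continue;   // data arrived or budget remains
        block_until_ready();               // budget spent: sleep until readiness
        ++s.blocks;
        b.spins = 0;                       // start the next spin phase fresh
    }
    return s;
}
```

With max = 3 and a feed that never delivers data, the pump blocks on every fourth poll; the ratio of blocks to polls is a cheap signal for whether the budget is too small for the feed.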
Answer (Part 2)
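(1) Edge-triggered (EPOLLET, or EV_CLEAR on kqueue) reports readiness only on a transition, so it reduces wakeups and syscalls, but the handler must drain the socket until EAGAIN. If it stops early, the remaining data sits unreported until the next packet arrives, which shows up as a stall. Level-triggered is simpler and cannot miss readiness, because it keeps reporting while data is pending, but an undrained socket makes every wait return immediately and can cause wakeup storms.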
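(2) Calibrate against measured inter-arrival percentiles and the CPU budget: the spin window (max times the cost of one poll) should cover the typical gap between packets inside a burst. Sweep max and pick the point where P99/P999 stops improving. Adapt dynamically: grow on bursts, shrink during idle.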
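(3) Prefer them when queues are shallow, microbursts dominate, and the NIC driver supports busy polling; polling removes the interrupt and scheduler wakeup from the receive path. Avoid them under CPU pressure or on shared cores, where the spinning thread steals time from other work and can itself be preempted.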
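(4) Pin threads and isolate cores, avoid shared atomics, and use per-core rings. Align per-thread state such as Backoff to a cache line (alignas(64) on typical x86 parts) so neighbouring threads do not false-share it, and issue a PAUSE hint in the spin loop to ease pressure on an SMT sibling and on the memory pipeline.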
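(5) noexcept mainly helps callers: they need no unwind or cleanup path around calls to tick. Inside tick the effect is smaller, because recv is a plain function pointer that may throw as far as the compiler can tell, so a path to std::terminate is still needed; since C++17 the parameter can be declared int (*recv)() noexcept to remove it. The inline keyword is about linkage rather than a command to inline; what matters is that the body is visible at the call site (header definition or LTO). Once tick is inlined with a known function pointer, the indirect call can become a direct, inlinable one and checks can be hoisted. Watch code size.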
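One way to read the "grow on bursts, shrink during idle" rule from (2) is a multiplicative-increase, gradual-decay update applied after each blocking wait. The sketch below is an assumption, not a prescribed policy: adapt, AdaptLimits, the woke_quickly signal, and the doubling and quarter-decay factors are all hypothetical and would need to be fitted by profiling.

```cpp
#include <algorithm>
#include <cassert>

// Backoff as given in the question.
struct Backoff { int spins = 0; int max; };

// Hypothetical tuning bounds for the spin budget.
struct AdaptLimits { int min_spins; int max_spins; };

// Adjust the spin budget after a blocking wait.
//   woke_quickly == true : the wait returned data almost at once, so a
//                          slightly longer spin would have avoided the
//                          sleep; double the budget.
//   woke_quickly == false: the wait ran long (idle feed); decay the budget
//                          by a quarter to give CPU back.
// The result is always clamped to [min_spins, max_spins].
inline void adapt(Backoff& b, bool woke_quickly, AdaptLimits lim) noexcept {
    assert(lim.min_spins >= 1 && lim.min_spins <= lim.max_spins);
    int next;
    if (woke_quickly) {
        // Saturate instead of doubling when doubling would pass the cap;
        // this also avoids signed overflow for large budgets.
        next = (b.max > lim.max_spins / 2) ? lim.max_spins : b.max * 2;
    } else {
        next = b.max - b.max / 4;
    }
    b.max = std::max(lim.min_spins, std::min(next, lim.max_spins));
    b.spins = 0;   // the next spin phase starts with a full budget
}
```

A caller would time each blocking wait and pass woke_quickly = true when it returned within some threshold. Fast growth with slow decay biases the loop toward low tail latency during bursts while still releasing CPU when the feed goes quiet.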