Plast interview

43. Spin or Sleep

In a co-located HFT gateway, latency during microbursts dictates execution quality. Choosing between blocking on epoll/kqueue and busy-polling trades CPU for tail latency: spinning burns a core but removes the wakeup from the receive path. Consider this minimal backoff helper used inside a packet pump.

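struct Backoff { int spins = 0; int max = 0; };  // spins: empty polls since last data; max: spin budget
// Returns true while the caller should keep polling, false once the spin budget is spent.
bool tick(Backoff& b, int (*recv)()) noexcept {
  int n = recv();                           // non-blocking receive; n > 0 means data was handled
  if (n > 0) { b.spins = 0; return true; }  // progress: reset the budget
  if (b.spins++ < b.max) return true;       // empty poll, budget remains
  return false;                             // budget exhausted: caller should block
}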

Part 1.

Using tick, describe how you would build a hybrid loop: busy-spin while it returns true, then block on epoll/kqueue when it returns false. Explain how you’d tune and adapt Backoff::max for bursty feeds to minimize tail while controlling CPU.

Part 2.

(1) Edge-triggered vs level-triggered: impact on latency, syscalls, and risk of missed readiness?

(2) How would you size Backoff::max under bursty traffic to minimize tail while limiting CPU burn?

(3) When would you prefer SO_BUSY_POLL or busy-spin over epoll_wait timeouts on modern NICs?

(4) How would you avoid cross-core contention and cache thrash while spinning?

(5) Would noexcept here, or adding inline, affect inlining or codegen on the hot path?

Answer

Answer (Part 1)

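Busy-spin on the hot path while tick(b, try_recv) returns true; once it returns false, fall back to a bounded wait (epoll_wait/kevent with a short timeout), then reset the spin budget on wakeup. Example shape: for(;;){ if (tick(b, try_recv)) continue; /* kevent/epoll_wait with small timeout */ b.spins = 0; }. With a level-triggered registration, data that lands between the last empty poll and the wait is still reported, so no wakeup is lost. Choose Backoff::max from profiling: start small to keep CPU low, increase until P99 latency flattens, and stop there. Then adapt per core from recent history: raise it when wakeups follow shortly after a give-up, lower it after long idle periods.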
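The loop shape described above can be sketched as follows. This is a minimal illustration, not a production pump: pump, wait, and stop are hypothetical names, and wait stands in for an epoll_wait/kevent call with a short timeout so the sketch stays portable.

```cpp
#include <cassert>

struct Backoff { int spins = 0; int max = 0; };

// Same contract as tick in the question: true means "poll again now".
inline bool tick(Backoff& b, int (*recv)()) noexcept {
  int n = recv();
  if (n > 0) { b.spins = 0; return true; }
  if (b.spins++ < b.max) return true;
  return false;
}

// Hypothetical driver. wait() is an assumed stand-in for epoll_wait/kevent
// with a short timeout; stop() lets the caller end the loop.
// Returns how many times the loop had to fall back to blocking.
template <class Wait, class Stop>
long pump(Backoff& b, int (*recv)(), Wait wait, Stop stop) {
  assert(b.max >= 0);
  long blocks = 0;
  while (!stop()) {
    if (tick(b, recv)) continue;  // hot path: data arrived or budget remains
    wait();                       // cold path: park until readiness or timeout
    b.spins = 0;                  // fresh spin budget after waking
    ++blocks;
  }
  return blocks;
}
```

The returned block count is a cheap signal for tuning: a high ratio of blocks to received packets suggests the spin budget is too small for the feed.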

Answer (Part 2)

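(1) Edge-triggered reports each readiness transition once, so it cuts wakeups and syscalls and lowers jitter, but the handler must drain the socket until EAGAIN; any bytes left behind produce no further event and the connection stalls. Level-triggered is simpler and cannot miss readiness, because it keeps reporting while data remains, but it can cause repeated wakeups (wakeup storms) if the handler does not drain.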
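The drain requirement for edge-triggered mode can be sketched like this, assuming a POSIX non-blocking descriptor. The helper names drain and set_nonblocking are hypothetical; a real handler would parse the bytes rather than discard them.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Puts fd into non-blocking mode; returns false on failure.
inline bool set_nonblocking(int fd) {
  int flags = ::fcntl(fd, F_GETFL, 0);
  return flags != -1 && ::fcntl(fd, F_SETFL, flags | O_NONBLOCK) != -1;
}

// Reads until the kernel reports EAGAIN/EWOULDBLOCK, which is the point at
// which an edge-triggered registration will fire again on new data.
// Returns total bytes consumed, or -1 on a hard error. EOF also ends the drain.
inline long drain(int fd) {
  assert(fd >= 0);
  char buf[2048];
  long total = 0;
  for (;;) {
    ssize_t n = ::read(fd, buf, sizeof buf);
    if (n > 0) { total += n; continue; }  // more may be queued: keep reading
    if (n == 0) return total;             // peer closed
    if (errno == EINTR) continue;         // interrupted: retry
    if (errno == EAGAIN || errno == EWOULDBLOCK) return total;  // queue empty
    return -1;
  }
}
```

Stopping after a single read is the classic edge-triggered bug: the remaining bytes sit in the socket buffer and no new event arrives until the peer sends more.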

(2) Calibrate against the feed's inter-arrival percentiles and the CPU budget per core; sweep Backoff::max and pick the smallest value at which P99/P99.9 stops improving. Adapt dynamically: grow during bursts, shrink during idle.
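One possible adaptation policy, offered as an assumption rather than a recommended setting: treat a very short park as a sign that the spin budget expired just before data arrived, and a long park as idle time. AdaptLimits, adapt, and the thresholds are hypothetical names and knobs to be chosen from profiling.

```cpp
#include <algorithm>
#include <cassert>

struct Backoff { int spins = 0; int max = 0; };

// Hypothetical tuning knobs: bounds for max and the "near miss" threshold.
struct AdaptLimits { int lo; int hi; long near_miss_ns; };

// Call after each blocking wait with the time the thread spent parked.
inline void adapt(Backoff& b, long parked_ns, const AdaptLimits& lim) {
  assert(lim.lo <= lim.hi);
  if (parked_ns < lim.near_miss_ns) {
    // Data arrived right after we gave up spinning: budget was too small.
    b.max = std::min(lim.hi, std::max(1, b.max) * 2);
  } else {
    // Long park means the feed was idle: hand CPU back gradually.
    b.max = std::max(lim.lo, b.max - b.max / 4);
  }
}
```

Doubling on a near miss and decaying by a quarter on idle makes the budget react quickly to bursts and release CPU slowly, which favors tail latency over CPU savings; the opposite bias is equally easy to express.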

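(3) Prefer them when queues are shallow, microbursts dominate, and the thread owns a dedicated core; SO_BUSY_POLL additionally needs NIC/driver support for busy polling. Both remove the interrupt and scheduler wakeup from the receive path. Avoid them under CPU pressure or on shared cores, where the spinner steals cycles from other work.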
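Enabling SO_BUSY_POLL on a socket might look like the sketch below. The option takes a duration in microseconds and is Linux-specific, so the code guards on the macro. enable_busy_poll is a hypothetical helper name, and the caveat that raising the value typically requires CAP_NET_ADMIN is worth verifying on the target kernel.

```cpp
#include <cassert>
#include <sys/socket.h>

// Asks the kernel to busy-poll the device queue for up to `usec` microseconds
// on receives for this socket. Returns false if the option is unsupported,
// refused (for example, insufficient privilege), or fd is invalid.
inline bool enable_busy_poll(int fd, int usec) noexcept {
  assert(usec >= 0);
#ifdef SO_BUSY_POLL
  return ::setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usec, sizeof usec) == 0;
#else
  (void)fd;
  (void)usec;
  return false;  // option not available on this platform
#endif
}
```

A false return should be treated as "fall back to the user-space spin or plain epoll path", not as a fatal error.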

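(4) Pin each polling thread to an isolated core and give it its own ring, so no atomics or queues are shared across cores on the hot path. Align per-thread state such as Backoff to a cache line (alignas(64) on typical x86) to prevent false sharing, and issue a PAUSE hint in the spin loop to ease pressure on the sibling hyperthread and cut power.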
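The alignment and PAUSE points can be sketched as follows. The 64-byte line size is an assumption that holds on typical x86 parts but not everywhere; PaddedBackoff, cpu_relax, and spin are hypothetical names. Thread pinning is platform-specific and left out.

```cpp
#include <cassert>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <immintrin.h>
#endif

// Spin-loop hint: PAUSE on x86, YIELD on AArch64, nothing elsewhere.
inline void cpu_relax() noexcept {
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
  _mm_pause();
#elif defined(__aarch64__)
  __asm__ __volatile__("yield");
#endif
}

// One instance per polling thread; padding keeps neighbors in an array
// on separate cache lines (assumed 64 bytes).
struct alignas(64) PaddedBackoff { int spins = 0; int max = 0; };
static_assert(sizeof(PaddedBackoff) == 64, "one cache line per instance");

// Polls up to the remaining budget, relaxing between empty polls.
// Returns true as soon as poll() reports data, false when the budget is spent.
template <class Poll>
bool spin(PaddedBackoff& b, Poll poll) {
  assert(b.max >= 0);
  while (b.spins++ < b.max) {
    if (poll()) { b.spins = 0; return true; }
    cpu_relax();
  }
  return false;
}
```

Because each PaddedBackoff occupies its own line, one thread's writes to spins never invalidate the line holding another thread's counter.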

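(5) noexcept lets callers omit unwind paths for the call, which can shrink hot code slightly, but since recv is a plain function pointer that may throw, the compiler must still arrange for std::terminate inside tick; the gain is modest. inline is mainly a linkage hint; the optimizer decides on its own. The larger win is making the callback visible: with LTO or a template parameter the indirect call through recv can be resolved and inlined. Watch code size.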
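A template variant makes the callee visible at the call site, which is usually what allows inlining. The sketch below is one way to write it, with tick_t as a hypothetical name; it also propagates noexcept from the callable instead of asserting it unconditionally.

```cpp
#include <cassert>

struct Backoff { int spins = 0; int max = 0; };

// The callable's type is part of the instantiation, so the compiler sees the
// body of recv and can inline it; no indirect call remains to resolve.
// noexcept is conditional: it holds only if calling recv cannot throw.
template <class Recv>
inline bool tick_t(Backoff& b, Recv&& recv) noexcept(noexcept(recv())) {
  assert(b.max >= 0);
  int n = recv();
  if (n > 0) { b.spins = 0; return true; }
  if (b.spins++ < b.max) return true;
  return false;
}
```

Whether this actually changes the generated code depends on the compiler and flags, so it is worth confirming in the disassembly of the real pump rather than assuming.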