Plast interview

25. Release/Acquire Flag

In a tick-to-trade pipeline, one thread publishes decoded data while another consumes it immediately. Latency pressure pushes toward a spin-based handoff without locks. Correctness under the C++ memory model is critical to avoid heisenbugs at market open.

#include <atomic>
struct Msg{int v;};
std::atomic<bool> ready{false};
Msg m{0};
void publish(){ m.v=42; ready.store(true, std::memory_order_relaxed); }
int consume(){ while(!ready.load(std::memory_order_relaxed)){} return m.v; }

Part 1.

Assuming publish() runs on one thread and consume() on another, is this correct under the C++ memory model? What values can consume() observe, and what is the lowest-overhead fix?

Part 2.

(1) Is there a data race on m.v? Why or why not?

(2) Can a release store on ready alone establish ordering? Explain.

(3) Would a std::atomic_thread_fence solution be equivalent here?

(4) When is memory_order_seq_cst justified over acquire-release?

(5) How would you mitigate false sharing between ready and m?


Answer (Part 1)

As written, it is incorrect: m.v is written on one thread and read on another without a happens-before relationship, and the relaxed flag does not order those accesses. That is a data race, so the behavior is undefined; consume() may return 0, 42, or any stale value. The lowest-overhead fix is a release/acquire pair on the flag: the writer uses ready.store(true, std::memory_order_release) and the reader uses ready.load(std::memory_order_acquire). Once the acquire load observes true, it synchronizes-with the release store, so the subsequent read of m.v is guaranteed to see 42.
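
A minimal sketch of the corrected handoff, keeping the names from the question (the busy-wait and single-shot flag are kept only to mirror the original snippet):

#include <atomic>

struct Msg { int v; };
std::atomic<bool> ready{false};
Msg m{0};

// Writer: the release store makes the preceding write to m.v visible
// to any thread whose acquire load observes ready == true.
void publish() {
    m.v = 42;
    ready.store(true, std::memory_order_release);
}

// Reader: the acquire load synchronizes-with the release store, so the
// read of m.v after the loop is guaranteed to see 42.
int consume() {
    while (!ready.load(std::memory_order_acquire)) { /* spin */ }
    return m.v;
}

On x86, for example, the release store and acquire load compile to plain MOV instructions, so the only added cost over relaxed is the restriction on compiler reordering.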

Answer (Part 2)

(1) Yes. Non-atomic m.v is accessed by multiple threads without synchronization, creating a data race and UB. The relaxed flag provides no ordering.

(2) No, not by itself. A release store orders prior writes only for a thread whose acquire load reads the value written; without a matching acquire on the consumer side there is no synchronizes-with edge, so the relaxed load in consume() gains nothing from the writer's release.

(3) Yes, with care. The writer issues std::atomic_thread_fence(std::memory_order_release) before a relaxed store to ready; the reader issues an acquire fence after its relaxed load observes true. Fence-based synchronization then establishes the same happens-before as the release/acquire pair.
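
A sketch of the fence-based variant (function names are mine for illustration; the declarations are repeated so the snippet stands alone):

#include <atomic>

struct Msg { int v; };
std::atomic<bool> ready{false};
Msg m{0};

// Writer: the release fence orders the store to m.v before the relaxed
// flag store for any reader that pairs it with an acquire fence.
void publish_fence() {
    m.v = 42;
    std::atomic_thread_fence(std::memory_order_release);
    ready.store(true, std::memory_order_relaxed);
}

// Reader: once the relaxed load sees true, the acquire fence makes the
// writer's earlier stores (including m.v = 42) visible.
int consume_fence() {
    while (!ready.load(std::memory_order_relaxed)) { /* spin */ }
    std::atomic_thread_fence(std::memory_order_acquire);
    return m.v;
}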

(4) When reasoning requires a single total order across several atomic variables (e.g. independent flags observed by different threads), or when simplicity matters more than cost while debugging. On weakly ordered hardware seq_cst can emit extra fences, so acquire-release is preferred on the hot path.
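
For illustration only, a store-buffering sketch (not part of the original pipeline) where acquire/release permits an outcome that seq_cst forbids:

#include <atomic>

std::atomic<int> x{0}, y{0};
int r1, r2;

// Run a() and b() on two threads. With acquire/release (or weaker),
// r1 == 0 && r2 == 0 is a permitted outcome. With seq_cst on all four
// operations, the single total order forbids it: one of the stores must
// come first and be seen by the other thread's load.
void a() {
    x.store(1, std::memory_order_seq_cst);
    r1 = y.load(std::memory_order_seq_cst);
}

void b() {
    y.store(1, std::memory_order_seq_cst);
    r2 = x.load(std::memory_order_seq_cst);
}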

(5) Place ready and m on separate cache lines using alignas(64) or padding. This reduces invalidation traffic and tail latency.
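
A sketch of the layout fix, assuming a 64-byte cache line; std::hardware_destructive_interference_size from <new> (C++17) is the portable alternative where the toolchain provides it:

#include <atomic>

struct Msg { int v; };

// Keep the flag and the payload on separate cache lines so stores to one
// do not invalidate the line the other thread is touching.
struct SharedState {
    alignas(64) std::atomic<bool> ready{false};  // flag on its own line
    alignas(64) Msg m{0};                        // payload on a separate line
};

SharedState state;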