25. Release/Acquire Flag
In a tick-to-trade pipeline, one thread publishes decoded data while another consumes it immediately. Latency pressure pushes for spin-based handoff without locks. Correctness under the C++ memory model is critical to avoid heisenbugs at market open.
#include <atomic>

struct Msg { int v; };

std::atomic<bool> ready{false};
Msg m{0};

void publish() { m.v = 42; ready.store(true, std::memory_order_relaxed); }
int consume() { while (!ready.load(std::memory_order_relaxed)) {} return m.v; }
Part 1.
Assuming publish() runs on one thread and consume() on another, is this correct under the C++ memory model? What values can consume() observe, and what is the lowest-overhead fix?
Part 2.
(1) Is there a data race on m.v? Why or why not?
(2) Can a release store on ready alone establish ordering? Explain.
(3) Would a std::atomic_thread_fence solution be equivalent here?
(4) When is memory_order_seq_cst justified over acquire-release?
(5) How to mitigate false sharing between ready and m fields?
Answer (Part 1)
As written, it is incorrect: m.v is written by one thread and read by another with no happens-before relationship between the two accesses. The relaxed flag does not order them, so this is a data race and the behavior is undefined; in practice consume() may return 0, 42, or a stale value, and the compiler and hardware are free to do anything at all. The lowest-overhead fix is a release/acquire pair on the flag: the writer uses ready.store(true, std::memory_order_release) and the reader spins on ready.load(std::memory_order_acquire). Once the acquire load observes true, it synchronizes-with the release store, so the subsequent read of m.v is guaranteed to see 42.
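The fix can be sketched as follows, reusing the names from the question's snippet; the only change is the memory orders on the flag:

```cpp
#include <atomic>
#include <thread>

struct Msg { int v; };

std::atomic<bool> ready{false};
Msg m{0};

void publish() {
    m.v = 42;                                      // plain (non-atomic) write
    ready.store(true, std::memory_order_release);  // release: all prior writes visible to a matching acquire
}

int consume() {
    while (!ready.load(std::memory_order_acquire)) {}  // acquire: synchronizes-with the release store
    return m.v;                                        // happens-before established; must read 42
}
```

On x86 both orders compile to plain loads and stores (the ordering is free); on weakly ordered hardware they emit the minimal barriers required.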
Answer (Part 2)
(1) Yes. Non-atomic m.v is accessed by multiple threads without synchronization, creating a data race and UB. The relaxed flag provides no ordering.
(2) No, not by itself. A release store establishes ordering only when paired with an acquire load (or acquire fence) that reads the stored value; with a relaxed load on the consumer side, no synchronizes-with edge is formed and the race on m.v remains.
(3) Yes, with care. The writer issues std::atomic_thread_fence(std::memory_order_release) before a relaxed store to ready; the reader issues an acquire fence after its relaxed load observes true. The fences synchronize through the relaxed flag, producing the same happens-before edge as the release/acquire pair.
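A minimal sketch of the fence-based variant, assuming the same Msg/ready layout as the question's snippet:

```cpp
#include <atomic>
#include <thread>

struct Msg { int v; };

std::atomic<bool> ready{false};
Msg m{0};

void publish() {
    m.v = 42;
    std::atomic_thread_fence(std::memory_order_release);  // orders the write above
    ready.store(true, std::memory_order_relaxed);         // relaxed store after the fence
}

int consume() {
    while (!ready.load(std::memory_order_relaxed)) {}     // relaxed spin
    std::atomic_thread_fence(std::memory_order_acquire);  // fence after observing true
    return m.v;                                           // ordered by fence-to-fence synchronization
}
```

The fence form can be cheaper when one fence covers several relaxed flag operations, but for a single flag the release/acquire pair is usually simpler and no slower.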
(4) When reasoning requires a single total order across operations on multiple atomics, as in Dekker-style flag checks, or for simplicity while debugging. On weakly ordered hardware it emits extra full fences that acquire-release avoids, so it should be reserved for cases that genuinely need the total order.
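A minimal sketch of a case where acquire-release is insufficient: the classic Dekker-style check, with illustrative function names. With only acquire/release, both threads may observe the other's flag as false (the store-load pair can reorder); seq_cst imposes a single total order that forbids that outcome.

```cpp
#include <atomic>

std::atomic<bool> flag0{false}, flag1{false};

// Each thread raises its own flag, then checks its peer's.
// Under seq_cst at least one thread must see the other's flag set.
bool thread0_sees_peer() {
    flag0.store(true, std::memory_order_seq_cst);
    return flag1.load(std::memory_order_seq_cst);
}

bool thread1_sees_peer() {
    flag1.store(true, std::memory_order_seq_cst);
    return flag0.load(std::memory_order_seq_cst);
}
```

If both calls run concurrently, the outcome "both return false" is impossible under seq_cst but permitted (and observed on real hardware) with acquire/release.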
(5) Place ready and m on separate cache lines using alignas(64) or padding. This reduces invalidation traffic and tail latency.
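One way to sketch the layout fix, assuming a 64-byte cache line (the common x86 size; std::hardware_destructive_interference_size is the portable constant where the implementation provides it):

```cpp
#include <atomic>

struct Msg { int v; };

// Keep the flag and the payload on separate cache lines so the consumer's
// spin on `ready` does not bounce the line holding `m` between cores while
// the producer is still writing it.
struct alignas(64) PaddedFlag {
    std::atomic<bool> ready{false};
};

struct alignas(64) PaddedMsg {
    Msg m{0};
};

PaddedFlag flag;
PaddedMsg payload;
```

alignas(64) both aligns each object to a line boundary and pads its size up to 64 bytes, so no other hot data can share the line.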