35. Flags That Bite
In an HFT decode path, compiler flags and warnings can determine both correctness and nanosecond-level latency. Code containing undefined behavior (UB) that looks harmless at -O0 can be silently miscompiled at -O3 with LTO. You need configurations that are fast yet safe under aggressive optimization.
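#include <cstdint>
#include <cstring>

// Returns the float whose object representation equals the bits of x.
float as_float(std::uint32_t x) {
    static_assert(sizeof(float) == sizeof x, "float must be 32 bits wide");
    float f;
    std::memcpy(&f, &x, sizeof f);  // byte-wise copy into the float object
    return f;
}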
Part 1.
A teammate proposes replacing the std::memcpy bit-cast with reinterpret_cast<float*>(&x) to "go faster." Explain the behavioral and performance consequences under -O3 with and without -fstrict-aliasing, and how you would enforce the correct approach in CMake.
Part 2.
(1) When is -Ofast acceptable over -O3 in pricing code, and what correctness risks does it introduce?
(2) Pros/cons of LTO/IPO for a matching engine; how to toggle per-target in CMake?
(3) Which warning flags help catch dangerous casts/aliasing, and should they be errors?
(4) How would you structure sanitizer builds in CMake to avoid impacting release performance?
(5) When and why use -march=native versus a portable baseline; deployment implications?
Answer
Answer (Part 1)
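Type-punning via reinterpret_cast<float*>(&x) and then reading through the resulting pointer is undefined behavior in C++. Under -O3 with strict aliasing enabled (GCC turns -fstrict-aliasing on at -O2 and above), the compiler may assume that a std::uint32_t object and a float lvalue never alias, so it can hoist the float load above the integer store or eliminate the store entirely, yielding stale or nonsense values. Adding -fno-strict-aliasing usually makes the cast appear to work, but that is a compiler-specific dialect: the code is still undefined in ISO C++, it breaks as soon as the flag is dropped or another toolchain builds the file, and the flag can slow down unrelated code because the optimizer must assume that more pointers alias.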
Keep the defined approach: std::memcpy, or std::bit_cast from <bit> in C++20. For trivially copyable types of equal size, both typically compile to a single register move at -O3, so the cast buys no speed. In CMake, prefer correctness by construction: request C++20 where available (target_compile_features(t PRIVATE cxx_std_20)), do not add -fno-strict-aliasing to “fix” UB, build Release with -O3 and LTO plus an explicit -march choice (see Part 2, item 5), and enforce warnings (-Wall -Wextra -Werror plus aliasing/cast warnings) per target with target_compile_options.
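The defined approach above can be wrapped in one small helper so that call sites never reach for reinterpret_cast. The sketch below assumes at least C++17 and an IEEE 754 binary32 float; the names bit_pun and as_float_checked are hypothetical and not part of the original question. It uses std::bit_cast when the standard library advertises it and falls back to std::memcpy otherwise.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>
#if __has_include(<bit>)
#include <bit>  // defines __cpp_lib_bit_cast when std::bit_cast is available
#endif

// Hypothetical helper: copies the object representation of src into a To.
// Both paths are defined behavior, unlike reading through a reinterpret_cast pointer.
template <class To, class From>
To bit_pun(const From& src) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable_v<To>, "To must be trivially copyable");
    static_assert(std::is_trivially_copyable_v<From>, "From must be trivially copyable");
#if defined(__cpp_lib_bit_cast)
    return std::bit_cast<To>(src);        // C++20 path
#else
    To dst;
    std::memcpy(&dst, &src, sizeof dst);  // pre-C++20 fallback
    return dst;
#endif
}

// Hypothetical wrapper mirroring as_float from the question.
inline float as_float_checked(std::uint32_t x) noexcept {
    return bit_pun<float>(x);
}
```

With an IEEE 754 float, as_float_checked(0x3f800000u) yields 1.0f. The static_asserts turn a size or type mismatch into a compile error instead of a silent miscompile.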
Answer (Part 2)
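(1) -Ofast adds -ffast-math to -O3, which drops strict IEEE 754 semantics: the compiler may reassociate floating-point operations and assume that NaNs, infinities, and signed zeros do not occur. Results can then differ between compilers, versions, and optimization decisions, so pricing code risks numerical divergence between builds. Use it only with proven invariants (for example, inputs known to be finite) and exhaustive validation against a strict -O3 build.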
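(2) LTO enables cross-TU inlining and dead-code stripping, which reduces call overhead on the hot path. The costs are longer link times and, because the optimizer now sees across TUs, latent UB that separate compilation happened to hide can surface. Toggle it per target with set_property(TARGET t PROPERTY INTERPROCEDURAL_OPTIMIZATION_RELEASE TRUE), which restricts it to the Release configuration, and guard the call with check_ipo_supported() from the CheckIPOSupported module.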
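(3) Use -Wall -Wextra -Wconversion -Wcast-align -Wold-style-cast and GCC’s -Wstrict-aliasing (-Wfloat-conversion is already implied by -Wconversion). -Wstrict-aliasing is only active when -fstrict-aliasing is on and catches only some violations, so it complements the memcpy/bit_cast rule rather than replacing it. Treat these warnings as errors in CI to prevent risky merges.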
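(4) Create separate sanitizer configurations or targets (ASan/UBSan/TSan) and pass -fsanitize=... through both target_compile_options and target_link_options; never enable them in Release. ASan and TSan cannot be combined in one binary, so TSan needs its own build. Keep feature flags orthogonal via presets or cache variables.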
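(5) -march=native maximizes throughput by using every ISA extension of the build host, but the binary can fault with an illegal-instruction error on any machine lacking those extensions, and the artifact depends on where it was built, which hurts reproducibility. Prefer an explicit baseline such as -march=x86-64-v3 where the toolchain supports it, and tune per deployment (for example with -mtune for the production CPU).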
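To make the reassociation risk from item (1) concrete, the following sketch evaluates the same three doubles in two groupings. Under strict IEEE 754 semantics (plain -O3) the two functions return different values; under -ffast-math the compiler is free to pick either grouping for either function, so the result is no longer determined by the source. The function names are illustrative only.

```cpp
// Evaluates the sum in source order: (a + b) + c.
double sum_left(double a, double b, double c) noexcept {
    return (a + b) + c;
}

// The same three terms with the grouping a reassociating compiler might choose.
double sum_right(double a, double b, double c) noexcept {
    return a + (b + c);
}

// With a = 1e20, b = -1e20, c = 1.0 and strict IEEE 754 doubles:
//   sum_left  -> (1e20 - 1e20) + 1.0 = 1.0
//   sum_right -> 1e20 + (-1e20 + 1.0) = 0.0, because 1.0 is below the
//                rounding granularity of a double near 1e20 and is lost.
```

A validation suite for an -Ofast build would compare such results against a strict build rather than assume they match.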
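For the deployment concern in item (5), one possible safeguard is a startup check that compares the ISA extensions the binary was compiled for against what the host CPU reports. This is only a sketch: host_supports_build_isa is a hypothetical name, it checks just a subset of the x86-64-v3 feature set (AVX2, FMA, BMI2), and it relies on the GCC/Clang builtins __builtin_cpu_init and __builtin_cpu_supports, so on other compilers or architectures it performs no check at all.

```cpp
// Hypothetical startup guard: returns false if this translation unit was built
// for an x86 extension that the running CPU does not report.
inline bool host_supports_build_isa() noexcept {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();  // populate the CPU feature data used by the checks below
#if defined(__AVX2__)
    if (!__builtin_cpu_supports("avx2")) return false;
#endif
#if defined(__FMA__)
    if (!__builtin_cpu_supports("fma")) return false;
#endif
#if defined(__BMI2__)
    if (!__builtin_cpu_supports("bmi2")) return false;
#endif
    return true;
#else
    return true;  // no check implemented for this compiler or architecture
#endif
}
```

Calling this first thing in main and exiting with a clear message turns a late illegal-instruction crash into an immediate, diagnosable failure. Ideally the file containing the guard is itself compiled for the portable baseline, so the check cannot fault before it reports.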