Plast interview

35. Flags That Bite

In an HFT decode path, compiler flags and warnings can determine both correctness and nanosecond-level latency. Code with undefined behavior (UB) that seems harmless at -O0 can be silently miscompiled at -O3 with LTO. You need configurations that are fast yet safe under aggressive optimization.

#include <cstring>
#include <cstdint>
float as_float(std::uint32_t x){
  float f;
  std::memcpy(&f, &x, sizeof f);
  return f;
}

Part 1.

A teammate proposes replacing the std::memcpy bit-cast with reinterpret_cast<float*>(&x) to "go faster." Explain the behavioral and performance consequences under -O3 with and without -fstrict-aliasing, and how you would enforce the correct approach in CMake.

Part 2.

(1) When is -Ofast acceptable over -O3 in pricing code, and what correctness risks does it introduce?

(2) Pros/cons of LTO/IPO for a matching engine; how to toggle per-target in CMake?

(3) Which warning flags help catch dangerous casts/aliasing, and should they be errors?

(4) How would you structure sanitizer builds in CMake to avoid impacting release performance?

(5) When and why use -march=native versus a portable baseline; deployment implications?

Answer

Answer (Part 1)

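The reinterpret_cast<float*>(&x) itself compiles, but reading the std::uint32_t object through the resulting float* violates the strict-aliasing rule and is undefined behavior in C++. With strict aliasing enabled (GCC turns on -fstrict-aliasing at -O2 and above, so it is active at -O3), the compiler may assume that a std::uint32_t and a float never alias, then move the load ahead of the store to x or eliminate the store, yielding stale or nonsense values. Inlining and LTO raise the risk because more code becomes visible to the optimizer. Adding -fno-strict-aliasing turns off that type-based alias analysis, so the cast usually appears to work, but the code is still undefined by the standard, remains non-portable, and the flag can cost performance elsewhere by blocking alias-based optimizations.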

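Keep the defined approach: std::memcpy, or std::bit_cast in C++20. For small trivially copyable types, mainstream compilers reduce the fixed-size memcpy to a register move once optimization is on, so the cast has no speed advantage. In CMake, prefer correctness by construction: request C++20 where the toolchain supports it (target_compile_features(t PRIVATE cxx_std_20)), do not add -fno-strict-aliasing to “fix” UB, and enforce warnings per target with target_compile_options (-Wall -Wextra -Werror plus the aliasing and cast warnings). For Release, keep -O3, enable LTO through the INTERPROCEDURAL_OPTIMIZATION property rather than a raw -flto, and choose -march to match the deployment hosts; -march=native is safe only when the build machine and the production machines have the same CPU.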
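For toolchains without C++20, the defined approach can be wrapped once and reused. The following is a minimal sketch, not a drop-in replacement for std::bit_cast: the names bit_cast_compat and as_bits are hypothetical, and the values in the usage note assume float is IEEE 754 binary32.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical pre-C++20 stand-in for std::bit_cast (name is an assumption).
// The static_asserts reject mismatched or non-trivially-copyable types at
// compile time, so misuse fails the build instead of miscompiling.
template <class To, class From>
To bit_cast_compat(const From& src) noexcept {
  static_assert(sizeof(To) == sizeof(From), "source and destination sizes must match");
  static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
  static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
  static_assert(std::is_default_constructible<To>::value, "To must be default constructible");
  To dst;
  std::memcpy(&dst, &src, sizeof(To));  // defined behavior, unlike reading through a cast pointer
  return dst;
}

// Same role as the as_float function in the question, built on the helper.
inline float as_float(std::uint32_t x) noexcept { return bit_cast_compat<float>(x); }

// Inverse direction, handy for round-trip tests.
inline std::uint32_t as_bits(float f) noexcept { return bit_cast_compat<std::uint32_t>(f); }
```

With binary32 floats, as_float(0x3f800000u) yields 1.0f and as_bits(1.0f) yields 0x3f800000u. Passing a double to as_float's helper with a 32-bit destination fails to compile, which is the enforcement the answer asks for.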

Answer (Part 2)

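(1) -Ofast is -O3 plus -ffast-math and other options that disregard strict standards compliance. It relaxes IEEE 754 semantics: the compiler may reassociate floating-point expressions, assume NaNs and infinities never occur, and ignore the sign of zero. Results can then diverge numerically between builds or compiler versions, and NaN checks may be optimized away. It is acceptable only for kernels whose inputs are proven finite and whose tolerance to reordering has been validated against an -O3 reference over the expected input range; pricing code that must be reproducible stays on -O3.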

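(2) Pros: LTO enables cross-TU inlining, constant propagation, and dead-code stripping, which reduces call overhead on the hot path. Cons: link times and link-time memory use grow, and the wider optimization scope can expose latent UB that per-TU builds happened to hide. Toggle it per target with set_property(TARGET t PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE); to restrict it to Release, set INTERPROCEDURAL_OPTIMIZATION_RELEASE instead, and guard it with check_ipo_supported() from the CheckIPOSupported module.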

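(3) Use -Wall -Wextra -Wconversion -Wfloat-conversion -Wcast-align, plus -Wold-style-cast to make C-style casts visible, and GCC’s -Wstrict-aliasing. The aliasing warning is heuristic and only active when -fstrict-aliasing is in effect, so it supplements code review rather than replacing it. Treat these warnings as errors (-Werror) on your own targets so risky code cannot merge; apply them per target so third-party headers do not break the build.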

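(4) Create separate sanitizer configurations or presets, for example ASan with UBSan in one and TSan in another, since ASan and TSan cannot be combined in the same binary. Apply -fsanitize=... through both target_compile_options and target_link_options, and never enable it in Release. Give each sanitizer build its own build directory so instrumented objects cannot leak into the shipped binary, and keep the feature flags orthogonal via presets or cache variables.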

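(5) -march=native lets the compiler use every ISA extension of the build host, which maximizes throughput there. The cost is portability and reproducibility: the binary can die with an illegal-instruction fault on an older CPU, and the artifact depends on which machine built it. Use it only when build and production hosts are identical. Otherwise prefer a portable baseline such as -march=x86-64-v3 (the AVX2 generation, available in recent GCC and Clang releases), optionally with -mtune for the deployment CPU, and build per-deployment variants where the extra ISA pays for itself.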
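To make the reassociation risk in answer (1) concrete, here is a small sketch with two hypothetical functions that differ only in how the sum is grouped. Under strict IEEE 754 double arithmetic they disagree on the inputs in the usage note; a fast-math build is allowed to turn either one into the other.

```cpp
// Grouping as written in the source: (a + b) + c.
double sum_left(double a, double b, double c) {
  return (a + b) + c;
}

// The regrouping a fast-math compiler is permitted to choose: a + (b + c).
double sum_right(double a, double b, double c) {
  return a + (b + c);
}
```

With a = 1e20, b = -1e20, c = 1.0 and no fast-math flags, sum_left returns 1.0 because the large terms cancel first, while sum_right returns 0.0 because 1.0 is absorbed when added to -1e20. A validation suite for -Ofast candidates can compare such pairs against an -O3 reference build.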
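For answer (3), the pattern that -Wcast-align and the aliasing warnings push you toward in a decode path looks roughly like this. It is a sketch only: load_u32 is a hypothetical helper name, and it reads the field in native byte order, so any wire-format byte swapping is left out.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical decode helper: read a 32-bit field from any byte offset.
// A reinterpret_cast<const std::uint32_t*>(p) here would risk both a
// misaligned access and an aliasing violation; memcpy has neither problem
// and is typically lowered to a single load when optimization is enabled.
inline std::uint32_t load_u32(const unsigned char* p) noexcept {
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}
```

Because the helper takes a byte pointer, callers can pass buffer + offset for any offset without a cast, which keeps the warning set quiet for the right reason.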
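For answer (5), one way to catch a -march mismatch early is a startup check that compares what the binary was compiled for with what the host CPU reports. The sketch below is best-effort and rests on assumptions: both function names are hypothetical, the macros __AVX512F__, __AVX2__, and __SSE4_2__ are the usual compiler-predefined ones, and the runtime query uses the GCC/Clang builtin __builtin_cpu_supports on x86 only. Code that runs before the check could already have executed an unsupported instruction.

```cpp
// Name of the widest x86 SIMD level this translation unit was compiled for,
// derived from compiler-predefined macros (suitable for a startup log line).
const char* compiled_isa_level() noexcept {
#if defined(__AVX512F__)
  return "avx512f";
#elif defined(__AVX2__)
  return "avx2";
#elif defined(__SSE4_2__)
  return "sse4.2";
#else
  return "baseline";
#endif
}

// Returns false when the binary was built for AVX-512F or AVX2 but the host
// CPU does not report that feature. On other compilers or architectures this
// sketch performs no check and returns true.
bool host_supports_compiled_isa() noexcept {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#if defined(__AVX512F__)
  return __builtin_cpu_supports("avx512f") != 0;
#elif defined(__AVX2__)
  return __builtin_cpu_supports("avx2") != 0;
#else
  return true;  // nothing beyond the checked levels was enabled
#endif
#else
  return true;  // no runtime query available in this sketch
#endif
}
```

A service could log compiled_isa_level() at startup and refuse to start when host_supports_compiled_isa() is false, turning a mid-session illegal-instruction crash into an immediate deployment error.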