Plast interview

38. Crash Backtrace

In a low-latency gateway, rare crashes must be root-caused quickly without adding runtime overhead. Postmortem debugging with core dumps and precise backtraces is essential to minimize downtime. Consider this minimal reproducer for an intermittent SIGSEGV.

const char* sym() { char buf[4] = "AB"; return buf; }
int main() { auto p = sym(); return p[0]; }

Part 1.

You observe sporadic crashes in production matching this pattern. How do you capture a Linux core dump, extract a symbolized backtrace with gdb, and identify the defect in this snippet?

Part 2.

(1) Why enable frame pointers in production, and what are the tradeoffs?

(2) How do inlining and LTO affect gdb backtraces and symbol resolution?

(3) Which Linux core-dump settings matter most: size limits, core_pattern routing, and per-process naming?

(4) How does ASLR/PIE influence symbolization and reproducibility during postmortem debugging?

(5) When would you prefer live gdb attach over cores, and why in low-latency systems?

Answer

Answer (Part 1)

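Enable core dumps by raising RLIMIT_CORE (ulimit -c unlimited in the launching shell) and build with debug info (-g), keeping either an unstripped binary or separate debug files that match the deployed build. Verify that /proc/sys/kernel/core_pattern writes cores to a retrievable path or pipes them to a collector. After a crash, load the pair with gdb /path/to/binary /path/to/core and run bt full, info frame, and disassemble /m to inspect the faulting frame. The root cause is that sym() returns a pointer to stack storage (buf), whose lifetime ends when the function returns. That leaves p dangling, and dereferencing it is undefined behavior, which is why the failure is sporadic rather than deterministic. GCC's -Wreturn-local-addr and Clang's -Wreturn-stack-address diagnose this at compile time. Fix it by returning an owning object by value (e.g., std::array<char, 4> or std::string), or by using storage that outlives the call.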
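A minimal sketch of the by-value fix, assuming the caller only needs the bytes and not a stable address. The names sym_fixed and first_byte are hypothetical, chosen to distinguish them from the original sym() and main().

```cpp
#include <array>
#include <cassert>

// Returns the buffer by value, so the caller owns its own copy and
// nothing refers to the callee's stack frame after the return.
// sym_fixed is a hypothetical name used for illustration.
std::array<char, 4> sym_fixed() {
    std::array<char, 4> buf{'A', 'B', '\0', '\0'};
    return buf;
}

// Mirrors the original main(): reads the first byte of the result.
int first_byte() {
    const std::array<char, 4> p = sym_fixed();
    assert(p[2] == '\0');  // still NUL-terminated, like the original "AB"
    return p[0];
}
```

Returning std::string instead would work the same way; std::array avoids a heap allocation, which may matter on a hot path.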

Answer (Part 2)

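(1) Frame pointers (-fno-omit-frame-pointer) give fast, reliable stack unwinds for gdb and perf, even where unwind tables are missing or unusable, which improves postmortems. The cost is a few extra prologue and epilogue instructions per function and one fewer general-purpose register, which is usually a small overhead.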

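(2) Inlining removes call frames, and LTO amplifies this by inlining across translation units, so the physical call chain in a backtrace can be shorter than the source suggests. With -g, the debug info records inlined call sites and gdb shows them as inlined frames, but line attribution in optimized code is noisier.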
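The effect of inlining on the physical call chain can be sketched with glibc's backtrace(), which counts the frames the unwinder sees. This assumes GCC or Clang on Linux with glibc. All function names here are hypothetical, and the attributes are used only to force the two cases.

```cpp
#include <cassert>
#include <execinfo.h>  // backtrace(), a glibc extension

// Counts the frames the unwinder can see from this point.
[[gnu::noinline]] int frames_visible() {
    void* pcs[64];
    const int n = backtrace(pcs, 64);
    assert(n >= 0);
    return n;
}

// Forced inline: the body is merged into the caller, so this function
// contributes no physical frame of its own.
[[gnu::always_inline]] inline int through_inlined() {
    return frames_visible();
}

// Never inlined: this function keeps its own frame.
[[gnu::noinline]] int through_real_call() {
    const int n = frames_visible();
    asm volatile("" ::: "memory");  // keep the call out of tail position
    return n;
}
```

Called from the same function, through_real_call() would typically report one more frame than through_inlined(). gdb can still display the inlined function as a frame when debug info is present, but a raw unwinder like this one cannot.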

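(3) RLIMIT_CORE caps the core size (a limit of 0 disables dumps); /proc/sys/kernel/core_pattern controls the destination, either as a file-name template or as a pipe to a collector. Use per-process naming such as %e.%p.%t (executable name, PID, timestamp) so cores do not overwrite each other, and set fs.suid_dumpable if the process is setuid or changes credentials, since such processes do not dump core by default.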
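The size limit can also be raised from inside the process at startup rather than relying on the launching shell. The following is a minimal sketch using getrlimit/setrlimit. The helper names are hypothetical, and it assumes the hard limit is already high enough for a full dump.

```cpp
#include <cassert>
#include <fstream>
#include <optional>
#include <string>
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_CORE

// Raises the soft core-size limit to the hard limit; returns false on failure.
bool raise_core_limit() {
    rlimit lim{};
    if (getrlimit(RLIMIT_CORE, &lim) != 0) return false;
    lim.rlim_cur = lim.rlim_max;  // an unprivileged process may go up to the hard limit
    return setrlimit(RLIMIT_CORE, &lim) == 0;
}

// Reads the current core_pattern, or nullopt if /proc is unavailable.
std::optional<std::string> read_core_pattern() {
    std::ifstream in("/proc/sys/kernel/core_pattern");
    std::string line;
    if (!in || !std::getline(in, line)) return std::nullopt;
    return line;
}

// Hypothetical startup check: fail loudly in debug builds if the soft
// limit could not be raised.
void check_core_setup() {
    const bool raised = raise_core_limit();
    assert(raised && "could not raise RLIMIT_CORE");
    (void)raised;
}
```

Logging the result of read_core_pattern() at startup is a cheap way to confirm where cores from this host will actually land.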

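(4) ASLR randomizes load addresses on every run, so raw addresses captured in logs differ between runs and must be translated back to module-relative addresses before they can be symbolized. Symbolization needs the exact binaries and shared libraries that crashed, matched by build-id. PIE additionally randomizes the executable's own text base; gdb recovers the load addresses from the core and relocates symbols from the ELF files and debug info accordingly.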
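To illustrate the translation step, the sketch below maps a runtime address back to its link-time address by subtracting the module's load base, using glibc's dl_iterate_phdr. It assumes Linux with glibc and C++17; the names Query, visit_module, and link_time_address are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <link.h>    // dl_iterate_phdr, dl_phdr_info, ElfW (glibc)
#include <optional>

namespace {
struct Query {
    std::uintptr_t addr;                   // runtime address to look up
    std::optional<std::uintptr_t> offset;  // result: address minus load base
};

// Called once per loaded module; stops at the module that contains addr.
int visit_module(dl_phdr_info* info, std::size_t, void* data) {
    assert(data != nullptr);
    auto* q = static_cast<Query*>(data);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD) continue;
        const std::uintptr_t begin = info->dlpi_addr + ph.p_vaddr;
        if (q->addr >= begin && q->addr < begin + ph.p_memsz) {
            q->offset = q->addr - info->dlpi_addr;  // undo the ASLR/PIE load bias
            return 1;                               // non-zero stops the iteration
        }
    }
    return 0;
}
}  // namespace

// Returns the link-time virtual address for a runtime address, or nullopt
// if no loaded module contains it.
std::optional<std::uintptr_t> link_time_address(std::uintptr_t runtime_addr) {
    Query q{runtime_addr, std::nullopt};
    dl_iterate_phdr(visit_module, &q);
    return q.offset;
}
```

For a non-PIE executable the load base is zero and the address comes back unchanged; for PIE binaries and shared libraries the result is the value that offline symbolizers expect.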

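(5) Live attach captures ephemeral state that never reaches a core, such as a hang or deadlock in progress, but it stops the process, perturbs timing, and requires ptrace permissions. Cores are captured only at crash time, can be analyzed offline and repeatedly, and are safer for hot paths, so in low-latency systems live attach is best kept for states that do not crash on their own.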
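On many distributions the ptrace permission in question is governed by the Yama LSM. The sketch below reads its setting, assuming the standard /proc/sys/kernel/yama/ptrace_scope path; the helper names are hypothetical, and other restrictions (dumpable flag, capabilities, container policy) are not covered.

```cpp
#include <cassert>
#include <fstream>
#include <optional>

// Returns the Yama ptrace_scope value, or nullopt if the file is absent
// (for example, when Yama is not enabled in the kernel).
std::optional<int> ptrace_scope() {
    std::ifstream in("/proc/sys/kernel/yama/ptrace_scope");
    int value = 0;
    if (!(in >> value)) return std::nullopt;
    return value;
}

// True when Yama does not restrict a same-UID process from attaching
// to a process that is not its descendant.
bool unprivileged_attach_allowed() {
    const std::optional<int> scope = ptrace_scope();
    assert(!scope || (*scope >= 0 && *scope <= 3));  // documented range is 0..3
    return !scope || *scope == 0;
}
```

A value of 1 limits attach to descendants (or an explicitly declared tracer), 2 requires CAP_SYS_PTRACE, and 3 disables attach entirely, which is worth knowing before planning a live session on a production host.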