Digging deep with heaptrack/massif

Debugging Memory Growth on a TI Sitara AArch64 ECU with Heaptrack

A while back I had to track down a recurring heap growth issue on a customer’s automotive ECU based on a TI Sitara Cortex-A53 SoC. The system ran multiple ML models for ADAS feature extraction, each at its own frequency. Every model ran in its own POSIX preemptive real-time thread, pinned to a core, with priorities such as 120, 110, 80 and 60. When switching scheduling policies (for example from SCHED_OTHER to SCHED_RR or SCHED_FIFO under PREEMPT_RT), the visible priority ranges changed. That is expected behavior on a real-time patched kernel, since each policy has its own priority range and different tools report priorities on different scales.
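To make the "different scales" concrete, here is a small sketch assuming mainline Linux conventions: SCHED_FIFO/SCHED_RR take a sched_priority of 1..99, the kernel's internal prio is 0..99 for real-time tasks and 100..139 (default 120) for SCHED_OTHER, and the priority field of /proc/<pid>/stat (field 18 in proc(5)) is prio minus 100. The helper name `prio_scales` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map a user-space priority to the kernel's internal
# "prio" and to the "priority" field of /proc/<pid>/stat (proc(5) field 18).
# Usage: prio_scales POLICY VALUE
#   POLICY fifo|rr : VALUE is sched_priority, 1..99
#   POLICY other   : VALUE is the nice value, -20..19
prio_scales() {
  policy=$1 value=$2
  case "$policy" in
    fifo|rr)
      if [ "$value" -lt 1 ] || [ "$value" -gt 99 ]; then
        echo "rt priority must be 1..99" >&2; return 1
      fi
      kprio=$((99 - value)) ;;   # higher RT priority -> numerically lower prio
    other)
      if [ "$value" -lt -20 ] || [ "$value" -gt 19 ]; then
        echo "nice must be -20..19" >&2; return 1
      fi
      kprio=$((120 + value)) ;;  # nice 0 -> prio 120
    *)
      echo "unknown policy: $policy" >&2; return 1 ;;
  esac
  # /proc/<pid>/stat reports prio - 100: -2..-100 for RT, 0..39 for nice
  echo "kernel_prio=$kprio stat_priority=$((kprio - 100))"
}
```

For example, `prio_scales fifo 80` prints `kernel_prio=19 stat_priority=-81`, while `prio_scales other 0` prints `kernel_prio=120 stat_priority=20`. Values such as 120 and 110 are rejected for fifo/rr because sched_priority only spans 1..99, so priorities like these presumably come from one of the other scales.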


The Problem

Over time, memory usage crept up until the system became unstable. The workload made this tricky to diagnose: multiple ML models meant irregular allocation patterns with frequent buffer churn. I needed something lightweight that could run directly on the target and pinpoint the source.
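The growth itself can be confirmed cheaply before any profiler is involved. A minimal sketch, assuming only a Linux /proc filesystem: the hypothetical `rss_trend` function samples VmRSS from /proc/<pid>/status and reports the change between the first and last sample. RSS also covers thread stacks, mapped libraries and files, so a rising trend shows that memory grows, not where.

```shell
#!/bin/sh
# Hypothetical helper: sample a process's resident set size (VmRSS, in kB)
# COUNT times, INTERVAL seconds apart, then print the first-to-last delta.
# Usage: rss_trend PID COUNT INTERVAL
rss_trend() {
  pid=$1 count=$2 interval=$3
  i=0 first= last=
  while [ "$i" -lt "$count" ]; do
    last=$(awk '/^VmRSS:/ { print $2 }' "/proc/$pid/status") || return 1
    [ -n "$first" ] || first=$last   # remember the first sample
    echo "sample $i: ${last} kB"
    i=$((i + 1))
    if [ "$i" -lt "$count" ]; then
      sleep "$interval"              # no sleep after the last sample
    fi
  done
  echo "delta: $((last - first)) kB"
}
```

On the target, something like `rss_trend "$(pidof xMSRawFileAppDemo)" 60 10` (assuming pidof is available) would sample every 10 seconds for roughly ten minutes.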


Why Heaptrack

Heaptrack turned out to be the best fit. It hooks into memory allocations through a preload library (libheaptrack_preload.so) and records a backtrace for each allocation. The raw trace is then converted with heaptrack_interpret and can be analyzed with heaptrack_print or visualized in heaptrack_gui. Compared to Valgrind’s Massif, Heaptrack had much lower runtime overhead because the application runs natively rather than under Valgrind’s instruction-level instrumentation. On an embedded ARM target, that difference was essential.


Making Heaptrack Work on an Embedded Target

Because the ECU used a custom glibc runtime, I had to modify Heaptrack’s launcher scripts. The default wrapper didn’t handle the target’s ld.so and library paths correctly. The workaround was to call the dynamic loader directly with --library-path and --preload options.

Here’s the adapted command that worked reliably:

/home/root/glibc/usr/aarch64-linux-gnu/usr/bin/ld.so \
  --library-path /home/root/glibc/usr/aarch64-linux-gnu/lib/ \
  --preload /usr/lib/heaptrack/libheaptrack_preload.so \
  ./xMSRawFileAppDemo <app params>
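When the loader is invoked by hand like this, it is worth confirming that the preload library actually ended up in the process: if a path is wrong, ld.so only prints a warning and the application runs untraced. A small sketch, assuming the target exposes /proc/<pid>/maps (standard on Linux); `is_mapped` and `APP_PID` are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any mapping of process PID comes from a
# file whose path contains PATTERN (matched as a fixed string, not a regex).
# Usage: is_mapped PID PATTERN
is_mapped() {
  grep -qF -- "$2" "/proc/$1/maps"
}

# Example on the target, with APP_PID being the traced application's PID:
#   is_mapped "$APP_PID" libheaptrack_preload.so && echo "heaptrack preload active"
```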

After running, the trace file (heaptrack.<pid>) could be interpreted and compressed on the device:

/home/root/glibc/usr/aarch64-linux-gnu/usr/bin/ld.so \
  --library-path /home/root/glibc/usr/aarch64-linux-gnu/lib/ \
  /usr/lib/heaptrack/libexec/heaptrack_interpret < heaptrack.11632 | gzip -c > ht.gz

For more control, I also adapted the heaptrack wrapper script itself, so that its named-pipe setup and output redirection go through the target’s loader. The relevant excerpt:

# Target paths. $pipe, $output, $client and $write_raw_data are set
# earlier in the heaptrack script.
GLIBC_ROOT="/home/root/glibc/usr/aarch64-linux-gnu"
LDSO="$GLIBC_ROOT/usr/bin/ld.so"
GLIBC_LIBDIR="$GLIBC_ROOT/lib/"
INTERPRETER="/usr/lib/heaptrack/libexec/heaptrack_interpret"
LIBHEAPTRACK_PRELOAD="/usr/lib/heaptrack/libheaptrack_preload.so"

# Interpret the trace on the fly: read raw events from the FIFO in the background.
if [ -z "$write_raw_data" ]; then
  "$LDSO" --library-path "$GLIBC_LIBDIR" "$INTERPRETER" < "$pipe" > "$output" &
fi

echo "starting application, this might take some time..."
# Start the client through the target's loader with the preload library;
# the library writes its raw trace to the path in DUMP_HEAPTRACK_OUTPUT.
DUMP_HEAPTRACK_OUTPUT="$pipe" "$LDSO" --library-path "$GLIBC_LIBDIR" \
  --preload "$LIBHEAPTRACK_PRELOAD" "$client" "$@"
EXIT_CODE=$?

These tweaks made Heaptrack fully functional on the aarch64 ECU without needing to rebuild glibc or modify the target filesystem.
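The named pipe is what lets the script stream the trace instead of storing it: the preload library writes raw events into the FIFO named in DUMP_HEAPTRACK_OUTPUT, while the interpreter, started in the background, consumes them concurrently. The self-contained sketch below reproduces the same producer/consumer hand-off, with `printf` standing in for the traced application and `gzip` standing in for heaptrack_interpret. Both substitutions and the function name `fifo_demo` are purely illustrative.

```shell
#!/bin/sh
# Demo of the FIFO hand-off used by the adapted heaptrack script.
# Stand-ins (illustrative only): printf = traced app, gzip = interpreter.
fifo_demo() {
  workdir=$(mktemp -d) || return 1
  pipe="$workdir/trace.fifo"
  mkfifo "$pipe" || return 1

  # Consumer first, in the background: opening the FIFO for reading
  # blocks until a writer opens it, so the ordering cannot deadlock.
  gzip -c < "$pipe" > "$workdir/out.gz" &
  consumer=$!

  # Producer: write the "raw events", then close the FIFO (consumer sees EOF).
  printf 'alloc 64\nfree 64\n' > "$pipe"

  wait "$consumer"            # let the consumer drain the pipe and exit
  gzip -dc "$workdir/out.gz"  # show what the consumer received
  rm -rf "$workdir"
}
```

Because the raw event stream only ever passes through the pipe, only the consumer's output has to be stored on the device.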


The Findings

The analysis showed two main causes:

  1. A buffer in one ML feature extractor wasn’t freed in an error path.

  2. A cleanup thread was starved under certain real-time priority combinations, leaving temporary allocations cached longer than expected.

Fixing these resolved the recurring heap growth entirely.


Key Takeaways

  • Heaptrack works well on embedded targets if you adapt its preload and interpreter scripts for custom glibc paths.

  • Always validate real-time thread priorities after changing scheduling policies; under PREEMPT_RT the effective ranges can shift.

  • Memory leaks in mixed-frequency ML workloads often hide behind thread timing and priority interactions.
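Validating thread priorities does not require extra tools on the target. The sketch below assumes the field layout documented in proc(5), where field 40 of /proc/<pid>/task/<tid>/stat is rt_priority and field 41 is the scheduling policy. `list_thread_sched` is a hypothetical name. Where util-linux is installed, `chrt -p <tid>` reports the same information for a single thread.

```shell
#!/bin/sh
# Hypothetical helper: print policy and RT priority of every thread of PID,
# read from /proc/<pid>/task/<tid>/stat (proc(5): field 40 = rt_priority,
# field 41 = policy). Usage: list_thread_sched PID
list_thread_sched() {
  pid=$1
  for stat in /proc/"$pid"/task/*/stat; do
    [ -r "$stat" ] || continue            # thread may have exited meanwhile
    tid=${stat%/stat}
    tid=${tid##*/}
    comm=$(cat "/proc/$pid/task/$tid/comm" 2>/dev/null)
    # Strip "pid (comm) " so that $1 is field 3; comm may contain spaces.
    fields=$(sed 's/^.*) //' "$stat") || continue
    set -- $fields
    shift 37                              # now $1 = field 40, $2 = field 41
    case "$2" in
      0) policy=SCHED_OTHER ;;
      1) policy=SCHED_FIFO ;;
      2) policy=SCHED_RR ;;
      3) policy=SCHED_BATCH ;;
      5) policy=SCHED_IDLE ;;
      6) policy=SCHED_DEADLINE ;;
      *) policy="unknown($2)" ;;
    esac
    echo "$tid $comm $policy rt_priority=$1"
  done
}
```

Running it on the application's PID after every policy change gives a quick check that each model thread and any cleanup thread still sit where they were intended to.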


References:

  • Valgrind Massif Manual

  • Oracle Linux Blog: Understanding Task Priority (https://blogs.oracle.com/linux/post/task-priority)
