Digging deep with heaptrack/massif
Debugging Memory Growth on a TI Sitara AArch64 ECU with Heaptrack
A while back I had to track down a recurring heap growth issue on a customer’s automotive ECU based on a TI Sitara Cortex-A53. The system ran multiple ML models for ADAS feature extraction at different frequencies. Each model ran in its own POSIX preemptive real-time thread, pinned to cores with priorities like 120, 110, 80, 60. When switching scheduling policies (for example SCHED_OTHER → SCHED_RR or SCHED_FIFO under PREEMPT_RT), the visible priority ranges changed, which is expected behavior on a real-time patched kernel.
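To make the shifting ranges concrete, here is a minimal sketch of how Linux maps user-visible priorities onto its internal scale, assuming the usual kernel constants (MAX_RT_PRIO = 100, default priority 120). The function names are hypothetical and exist only for illustration.

```shell
#!/bin/sh
# Kernel-internal priority for a real-time thread (SCHED_FIFO / SCHED_RR).
# Lower internal value means higher priority: rtprio 1..99 maps to 98..0.
rt_to_kernel_prio() {
    echo $((99 - $1))
}

# Kernel-internal priority for a normal (SCHED_OTHER) thread.
# nice -20..19 maps to 100..139, so nice 0 becomes 120.
nice_to_kernel_prio() {
    echo $((120 + $1))
}

rt_to_kernel_prio 80     # SCHED_FIFO thread with rtprio 80
nice_to_kernel_prio 0    # default SCHED_OTHER thread
```

Under this convention, real-time threads occupy 0 to 99 and normal threads 100 to 139, which is why the same thread can report a very different number after a policy switch. On the target, `chrt -m` prints the valid minimum and maximum priority for each policy, and `chrt -p <tid>` shows the current policy and priority of a thread.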
The Problem
Over time, memory usage crept up until the system became unstable. The workload made it tricky — multiple ML models meant irregular allocation patterns with frequent buffer churn. I needed something lightweight that could run directly on the target to pinpoint the source.
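Before reaching for a profiler, it helps to confirm the growth with numbers. The following is a rough sketch, not what was used on the project: it samples the resident set size from /proc, and the function names and the 60-second default interval are assumptions made for illustration.

```shell
#!/bin/sh
# Resident set size in kB, read from the VmRSS line of /proc/<pid>/status.
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print "<epoch seconds> <rss kB>" every <interval> seconds.
# The loop ends once the process is gone or no longer reports VmRSS.
watch_rss() {
    pid=$1
    interval=${2:-60}
    while rss=$(rss_kb "$pid" 2>/dev/null) && [ -n "$rss" ]; do
        echo "$(date +%s) $rss"
        sleep "$interval"
    done
}
```

Running something like `watch_rss <pid> 60 > rss.log` alongside the application gives a simple time series. A steadily rising curve points to a leak, while a sawtooth suggests buffer churn.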
Why Heaptrack
Heaptrack turned out to be the best fit. It hooks into memory allocations using a preload library (libheaptrack_preload.so) and logs backtraces for each event. Later, you can interpret or visualize those traces with heaptrack_print, heaptrack_interpret, or heaptrack_gui. Compared to Valgrind’s Massif, Heaptrack had much lower runtime overhead, which was essential on an embedded ARM target.
Making Heaptrack Work on an Embedded Target
Because the ECU used a custom glibc runtime, I had to modify Heaptrack’s launcher scripts. The default wrapper didn’t handle the target’s ld.so and library paths correctly. The workaround was to call the dynamic loader directly with the --library-path and --preload options.
Here’s the adapted command that worked reliably:
/home/root/glibc/usr/aarch64-linux-gnu/usr/bin/ld.so \
--library-path /home/root/glibc/usr/aarch64-linux-gnu/lib/ \
--preload /usr/lib/heaptrack/libheaptrack_preload.so \
./xMSRawFileAppDemo <app params>
After running, the trace file (heaptrack.<pid>) could be interpreted and compressed on the device:
/home/root/glibc/usr/aarch64-linux-gnu/usr/bin/ld.so \
--library-path /home/root/glibc/usr/aarch64-linux-gnu/lib/ \
/usr/lib/heaptrack/libexec/heaptrack_interpret < heaptrack.11632 | gzip -c > ht.gz
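Since both commands repeat the same loader prefix, a small wrapper keeps them readable. This is a sketch under assumptions: the paths are copied from the commands above, while the variable names, function names, and the DRY_RUN switch are hypothetical additions.

```shell
#!/bin/sh
# Loader and library paths are copied from the commands above;
# the variable and function names are hypothetical.
TARGET_LD="/home/root/glibc/usr/aarch64-linux-gnu/usr/bin/ld.so"
TARGET_LIBS="/home/root/glibc/usr/aarch64-linux-gnu/lib/"
HEAPTRACK_PRELOAD="/usr/lib/heaptrack/libheaptrack_preload.so"
HEAPTRACK_INTERPRET="/usr/lib/heaptrack/libexec/heaptrack_interpret"

# Run a program through the target's dynamic loader.
# With DRY_RUN=1 the command line is printed instead of executed.
with_target_loader() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$TARGET_LD" --library-path "$TARGET_LIBS" "$@"
    else
        "$TARGET_LD" --library-path "$TARGET_LIBS" "$@"
    fi
}

# Trace an application: preload heaptrack, then pass the app and its arguments.
trace_app() {
    with_target_loader --preload "$HEAPTRACK_PRELOAD" "$@"
}

# Interpret a raw trace read from stdin.
interpret_trace() {
    with_target_loader "$HEAPTRACK_INTERPRET"
}
```

With these helpers the two steps become `trace_app ./xMSRawFileAppDemo <app params>` and `interpret_trace < heaptrack.11632 | gzip -c > ht.gz`. Setting `DRY_RUN=1` first is a cheap way to check the composed command line before running it on the target.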
For more control, I also extended the Heaptrack launcher script to handle named pipes and output redirection, adapting it to the embedded environment:
INTERPRETER="/home/root/glibc/usr/aarch64-linux-gnu/usr/bin/ld.so --library-path /home/root/glibc/usr/aarch64-linux-gnu/lib/ /usr/lib/heaptrack/libexec/heaptrack_interpret"
LIBHEAPTRACK_PRELOAD="/home/root/glibc/usr/aarch64-linux-gnu/usr/bin/ld.so --library-path /home/root/glibc/usr/aarch64-linux-gnu/lib/ --preload /usr/lib/heaptrack/libheaptrack_preload.so"
#$(readlink -f "$LIBHEAPTRACK_PRELOAD")

# Raw-data handling is not shown in this excerpt.
if [ -z "$write_raw_data" ]; then
    # Start the interpreter in the background; it reads the raw trace from the named pipe.
    /home/root/glibc/usr/aarch64-linux-gnu/usr/bin/ld.so --library-path /home/root/glibc/usr/aarch64-linux-gnu/lib/ /usr/lib/heaptrack/libexec/heaptrack_interpret < "$pipe" > "$output" &
fi

echo "starting application, this might take some time..."
# Launch the application through the target loader with the heaptrack preload library;
# DUMP_HEAPTRACK_OUTPUT tells the library to write its trace into the pipe.
DUMP_HEAPTRACK_OUTPUT="$pipe" /home/root/glibc/usr/aarch64-linux-gnu/usr/bin/ld.so --library-path /home/root/glibc/usr/aarch64-linux-gnu/lib/ --preload /usr/lib/heaptrack/libheaptrack_preload.so "$client" "$@"
EXIT_CODE=$?
These tweaks made Heaptrack fully functional on the aarch64 ECU without needing to rebuild glibc or modify the target filesystem.
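The named-pipe hand-off that the script relies on can be tried in isolation. In this sketch, `cat` stands in for heaptrack_interpret and `printf` stands in for the traced application; the file names and the two fake trace lines are made up for illustration.

```shell
#!/bin/sh
# Stand-alone sketch of the FIFO hand-off used by the launcher script.
workdir=$(mktemp -d)
pipe="$workdir/heaptrack_fifo"
output="$workdir/trace.out"

mkfifo "$pipe"

# Reader: started first, in the background. It blocks until a writer opens the FIFO.
cat < "$pipe" > "$output" &
reader=$!

# Writer: in the real script this is the preloaded application,
# which writes its trace to the path given in DUMP_HEAPTRACK_OUTPUT.
printf 'alloc 64\nfree 64\n' > "$pipe"
EXIT_CODE=$?

# The reader sees EOF once the writer closes the FIFO; wait for it, then clean up.
wait "$reader"
rm -f "$pipe"
```

Starting the reader before the writer matters: opening a FIFO for writing blocks until something has it open for reading, so launching the application first with no interpreter attached would hang at the first trace write.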
The Findings
The analysis showed two main causes:
- A buffer in one ML feature extractor wasn’t freed in an error path.
- A cleanup thread was starved under certain real-time priority combinations, leaving temporary allocations cached longer than expected.
Fixing these resolved the recurring heap growth entirely.
Key Takeaways
- Heaptrack works well on embedded targets if you adapt its preload and interpreter scripts for custom glibc paths.
- Always validate real-time thread priorities after changing scheduling policies, since PREEMPT_RT can shift effective ranges.
- Memory leaks in mixed-frequency ML workloads often hide behind thread timing and priority interactions.
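One way to validate thread priorities after a policy change is to read them straight from /proc. This is a minimal sketch, assuming the field layout documented in proc(5), where rt_priority is field 40 and policy is field 41 of the per-thread stat file; the function name is hypothetical.

```shell
#!/bin/sh
# Print "tid rt_priority policy comm" for every thread of a process.
# Policy values: 0 = SCHED_OTHER, 1 = SCHED_FIFO, 2 = SCHED_RR.
list_thread_sched() {
    pid=$1
    for task in /proc/"$pid"/task/*; do
        [ -r "$task/stat" ] || continue
        tid=${task##*/}
        comm=$(cat "$task/comm")
        # Strip "tid (comm) " first, because comm may contain spaces.
        # The remainder starts at field 3, so fields 40 and 41 become $38 and $39.
        sed 's/.*) //' "$task/stat" |
            awk -v tid="$tid" -v comm="$comm" '{ print tid, $38, $39, comm }'
    done
}
```

Running `list_thread_sched <pid>` before and after a policy switch makes it easy to spot a cleanup thread that ended up below the workers it is supposed to keep up with.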
References:
- Valgrind Massif Manual
- Oracle Linux: Understanding Task Priority, https://blogs.oracle.com/linux/post/task-priority