perf report shows that the function “__memset_avx2_unaligned_erms” has overhead. Does this mean the memory is unaligned?
I am trying to profile my C++ code using the perf tool. The implementation contains code with SSE/AVX/AVX2 instructions, and the code is compiled with the -O3 -mavx2 -march=native flags. I believe the __memset_avx2_unaligned_erms function is a libc implementation of memset. perf shows that this function has considerable overhead. The function name suggests that the memory is unaligned; however, in the code I am explicitly aligning the memory using the GCC attribute __attribute__((aligned (x))).
What might be the reason for this function to have significant overhead, and why is the unaligned version called even though the memory is explicitly aligned?
I have attached the sample report as a picture.
memset happens to be implemented in a way that does not require alignment.
– Marc Glisse
56 secs ago
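To illustrate the comment above: as far as I know, the "unaligned" in the glibc symbol name describes how the routine is written (it uses unaligned vector stores, so it accepts any pointer), not the alignment of the buffer that was passed in, and "erms" refers to the CPU's Enhanced REP MOVSB/STOSB feature. The following is only a minimal sketch under that assumption. The helpers is_aligned and fill_and_check and the buffer size are hypothetical names chosen for illustration, and alignas(32) stands in as the standard C++ counterpart of __attribute__((aligned(32))). Which memset variant actually gets dispatched depends on the CPU and the glibc version, so the symbol shown by perf may differ on another machine.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if p is a multiple of `alignment` bytes.
bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Hypothetical helper: fill n bytes at dst via memset, then verify every byte.
bool fill_and_check(unsigned char* dst, std::size_t n, unsigned char value) {
    std::memset(dst, value, n);  // the same libc entry point for any pointer
    for (std::size_t i = 0; i < n; ++i) {
        if (dst[i] != value) return false;
    }
    return true;
}

// 32-byte aligned buffer (the width of an AVX2 register).
// Assumption: 4096 bytes is an arbitrary size picked for the sketch.
alignas(32) unsigned char buffer[4096];
```

Calling fill_and_check(buffer, sizeof buffer, 0xAB) and fill_and_check(buffer + 1, 100, 0xCD) both go through the same memset call; the caller does not choose between an "aligned" and an "unaligned" version. If the overhead in the profile is large, the usual next step is to look at the callers in perf report (for example by recording with call graphs) to see which code is clearing so much memory.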