perf report shows that “__memset_avx2_unaligned_erms” has overhead. Does this mean memory is unaligned?






I am trying to profile my C++ code with the perf tool. The implementation contains code with SSE/AVX/AVX2 instructions, and it is compiled with the -O3 -mavx2 -march=native flags. I believe __memset_avx2_unaligned_erms is a libc implementation of memset. perf shows that this function has considerable overhead. The function name suggests that the memory is unaligned, but in the code I explicitly align the memory using the GCC attribute __attribute__((aligned (x))). What might be the reason for this function's significant overhead, and why is the unaligned version called although the memory is explicitly aligned?





I have attached the sample report as a picture.





memset happens to be implemented in a way that does not require alignment.
– Marc Glisse
56 secs ago
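To make the comment concrete, here is a minimal sketch of a 32-byte-aligned buffer being cleared with memset. The names AlignedBuffer, is_aligned and clear are hypothetical and do not come from the question's code, and the 4096-byte size is arbitrary. The sketch assumes glibc on x86-64. As far as I know, glibc picks its memset variant once at load time from CPU features, not from the pointer passed in. On that reading, "unaligned" in the symbol name describes the stores the routine uses, which tolerate any alignment, and not the caller's buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical buffer type: 32 bytes matches the width of one AVX2 register.
struct alignas(32) AlignedBuffer {
    unsigned char bytes[4096];
};

// Returns true when p is a multiple of `alignment` (expected to be a power of two).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Zeroes the buffer through the ordinary memset interface.
// Assumption: the alignment checked here has no influence on which libc
// memset variant ends up being executed.
inline void clear(AlignedBuffer& buf) {
    assert(is_aligned(buf.bytes, 32));
    std::memset(buf.bytes, 0, sizeof buf.bytes);
}
```

Calling clear on an AlignedBuffer from main and profiling the result is one way to check which symbol perf attributes the time to on a given machine. Note that the compiler may expand a fixed-size memset inline, in which case no libc symbol appears at all.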








