Load 256 bits of integer data from unaligned memory into dst. This intrinsic may perform better than _mm256_loadu_si256 when the data crosses a cache line boundary.
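The text above does not name the intrinsic. The sketch below assumes it is `_mm256_lddqu_si256` from `<immintrin.h>`, whose description in the Intel Intrinsics Guide matches this wording. The helper `load_unaligned_32` is a hypothetical name used only for illustration; it loads 32 bytes from an address with no alignment requirement and writes them back out so the result can be inspected. The `target("avx")` attribute is a GCC/Clang extension assumed here so the function compiles without `-mavx`; the CPU must still support AVX at run time.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of any library): load 32 bytes from a
 * possibly unaligned address and store them, unchanged, into out. */
__attribute__((target("avx")))
void load_unaligned_32(const uint8_t *src, uint8_t out[32])
{
    /* Assumed intrinsic: unaligned 256-bit integer load. */
    __m256i v = _mm256_lddqu_si256((const __m256i *)src);

    /* Unaligned store, so out needs no particular alignment either. */
    _mm256_storeu_si256((__m256i *)out, v);
}
```

Calling `load_unaligned_32(buf + 1, out)` on a byte buffer reads from an odd address, which an aligned load such as `_mm256_load_si256` would not permit. The loaded value is the same as with `_mm256_loadu_si256`; any difference between the two is in performance only, and it depends on the processor.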