Load 128 bits of integer data from unaligned memory.
Alternately subtract and add packed double-precision (64-bit) floating-point elements of a and b: even-indexed lanes hold a - b and odd-indexed lanes hold a + b.
Alternately subtract and add packed single-precision (32-bit) floating-point elements of a and b: even-indexed lanes hold a - b and odd-indexed lanes hold a + b.
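
The lane pattern of the two add/subtract operations is easy to misread, so here is a minimal scalar sketch in C. It is a reference model of the semantics, not the intrinsics themselves. The helper names `addsub_ps_model` and `addsub_pd_model` are hypothetical, and the operations are assumed to correspond to `_mm_addsub_ps` and `_mm_addsub_pd` from `<pmmintrin.h>`.

```c
#include <stddef.h>

/* Scalar model (hypothetical helper) of the 4-lane single-precision
   add/subtract: even-indexed lanes hold a[i] - b[i], odd-indexed
   lanes hold a[i] + b[i]. */
void addsub_ps_model(const float a[4], const float b[4], float dst[4]) {
    for (size_t i = 0; i < 4; ++i) {
        dst[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
    }
}

/* Same pattern for the 2-lane double-precision form. */
void addsub_pd_model(const double a[2], const double b[2], double dst[2]) {
    dst[0] = a[0] - b[0]; /* lane 0: subtract */
    dst[1] = a[1] + b[1]; /* lane 1: add */
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the single-precision model yields {-9, 22, -27, 44}.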
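Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in a and b, and pack the results: the sum from a goes in the low element, the sum from b in the high element.
Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in a and b, and pack the results: the sums from a fill the low half, the sums from b fill the high half.
Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in a and b (lower-indexed element minus higher-indexed element), and pack the results: the difference from a goes in the low element, the difference from b in the high element.
Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in a and b (lower-indexed element minus higher-indexed element), and pack the results: the differences from a fill the low half, the differences from b fill the high half.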
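
"Horizontal" here means the arithmetic happens between neighbouring lanes of one input rather than between matching lanes of two inputs. The scalar sketch below models the four horizontal operations. The `*_model` names are hypothetical helpers, and the operations are assumed to correspond to `_mm_hadd_ps`, `_mm_hsub_ps`, `_mm_hadd_pd`, and `_mm_hsub_pd`.

```c
/* Scalar models (hypothetical helpers). dst must not alias a or b. */

/* Horizontal add, single precision: pair sums from a fill the low
   half of dst, pair sums from b fill the high half. */
void hadd_ps_model(const float a[4], const float b[4], float dst[4]) {
    dst[0] = a[0] + a[1];
    dst[1] = a[2] + a[3];
    dst[2] = b[0] + b[1];
    dst[3] = b[2] + b[3];
}

/* Horizontal subtract, single precision: within each pair, the
   lower-indexed element minus the higher-indexed one. */
void hsub_ps_model(const float a[4], const float b[4], float dst[4]) {
    dst[0] = a[0] - a[1];
    dst[1] = a[2] - a[3];
    dst[2] = b[0] - b[1];
    dst[3] = b[2] - b[3];
}

/* Double-precision forms: one pair per input. */
void hadd_pd_model(const double a[2], const double b[2], double dst[2]) {
    dst[0] = a[0] + a[1];
    dst[1] = b[0] + b[1];
}

void hsub_pd_model(const double a[2], const double b[2], double dst[2]) {
    dst[0] = a[0] - a[1];
    dst[1] = b[0] - b[1];
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, `hadd_ps_model` yields {3, 7, 30, 70} and `hsub_ps_model` yields {-1, -1, -10, -10}.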
Load a double-precision (64-bit) floating-point element from memory into both elements of the result.
Duplicate the low double-precision (64-bit) floating-point element from a into both elements of the result.
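Duplicate odd-indexed single-precision (32-bit) floating-point elements from a, so that each even-indexed lane holds a copy of the odd-indexed lane above it.
Duplicate even-indexed single-precision (32-bit) floating-point elements from a, so that each odd-indexed lane holds a copy of the even-indexed lane below it.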
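
The duplication operations differ only in which source lane each output lane copies. The scalar sketch below models that mapping. The `*_model` names are hypothetical helpers, and the operations are assumed to correspond to `_mm_loaddup_pd`, `_mm_movedup_pd`, `_mm_movehdup_ps`, and `_mm_moveldup_ps`.

```c
/* Scalar models (hypothetical helpers) of the duplication operations. */

/* Load one double from memory and broadcast it to both lanes. */
void loaddup_pd_model(const double *mem, double dst[2]) {
    dst[0] = *mem;
    dst[1] = *mem;
}

/* Broadcast the low double of a to both lanes. */
void movedup_pd_model(const double a[2], double dst[2]) {
    double low = a[0];
    dst[0] = low;
    dst[1] = low;
}

/* Odd-indexed lanes are copied down: {a1, a1, a3, a3}. */
void movehdup_ps_model(const float a[4], float dst[4]) {
    float a1 = a[1], a3 = a[3];
    dst[0] = a1;
    dst[1] = a1;
    dst[2] = a3;
    dst[3] = a3;
}

/* Even-indexed lanes are copied up: {a0, a0, a2, a2}. */
void moveldup_ps_model(const float a[4], float dst[4]) {
    float a0 = a[0], a2 = a[2];
    dst[0] = a0;
    dst[1] = a0;
    dst[2] = a2;
    dst[3] = a2;
}
```

With a = {1, 2, 3, 4}, `movehdup_ps_model` yields {2, 2, 4, 4} and `moveldup_ps_model` yields {1, 1, 3, 3}.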
SSE3 intrinsics. https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=SSE3