Add packed double-precision (64-bit) floating-point elements in a and b.
Add packed single-precision (32-bit) floating-point elements in a and b.
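Alternately subtract and add packed double-precision (64-bit) floating-point elements in b from/to packed elements in a: even-indexed elements are subtracted, odd-indexed elements are added.
Alternately subtract and add packed single-precision (32-bit) floating-point elements in b from/to packed elements in a: even-indexed elements are subtracted, odd-indexed elements are added.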
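The add and add/subtract descriptions above can be illustrated with a short C sketch. The helper names (add_and_addsub_pd, add_and_addsub_example) are hypothetical, and the target attribute assumes GCC or Clang on an x86-64 CPU with AVX; only the _mm256_* calls are actual intrinsics.

```c
#include <assert.h>
#include <immintrin.h>

/* Assumption: GCC or Clang on x86-64. The target attribute lets this
   function use AVX even if the file is not compiled with -mavx. */
__attribute__((target("avx")))
void add_and_addsub_pd(const double a[4], const double b[4],
                       double sum[4], double addsub[4])
{
    const __m256d va = _mm256_loadu_pd(a);
    const __m256d vb = _mm256_loadu_pd(b);

    /* Element-wise a + b. */
    _mm256_storeu_pd(sum, _mm256_add_pd(va, vb));

    /* Even-indexed elements: a - b. Odd-indexed elements: a + b. */
    _mm256_storeu_pd(addsub, _mm256_addsub_pd(va, vb));
}

/* Hypothetical usage example. */
void add_and_addsub_example(void)
{
    const double a[4] = {10.0, 20.0, 30.0, 40.0};
    const double b[4] = {1.0, 2.0, 3.0, 4.0};
    double sum[4], addsub[4];

    add_and_addsub_pd(a, b, sum, addsub);

    assert(sum[0] == 11.0 && sum[3] == 44.0);
    assert(addsub[0] == 9.0 && addsub[1] == 22.0);
    assert(addsub[2] == 27.0 && addsub[3] == 44.0);
}
```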
Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b.
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b.
Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in a and then AND with b.
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b.
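Because the NOT applies to the first operand, a common use of the AND-NOT operation is clearing bits selected by a mask. The sketch below clears the sign bit of eight floats to compute absolute values. The helper names (abs8_ps, abs8_example) are hypothetical, and GCC or Clang on x86-64 with AVX is assumed.

```c
#include <assert.h>
#include <immintrin.h>

/* Assumption: GCC or Clang on x86-64; see the target attribute. */
__attribute__((target("avx")))
void abs8_ps(const float in[8], float out[8])
{
    /* -0.0f has only the sign bit set in each 32-bit element. */
    const __m256 sign_bit = _mm256_set1_ps(-0.0f);
    const __m256 v = _mm256_loadu_ps(in);

    /* (~sign_bit) & v keeps every bit of v except the sign bit. */
    _mm256_storeu_ps(out, _mm256_andnot_ps(sign_bit, v));
}

/* Hypothetical usage example. */
void abs8_example(void)
{
    const float in[8] = {-1.5f, 2.0f, -3.0f, 4.0f, -0.5f, 0.0f, -8.0f, 16.0f};
    float out[8];

    abs8_ps(in, out);

    assert(out[0] == 1.5f && out[1] == 2.0f && out[2] == 3.0f);
    assert(out[4] == 0.5f && out[6] == 8.0f && out[7] == 16.0f);
}
```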
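Blend packed double-precision (64-bit) floating-point elements from a and b using control mask imm8; bit i of imm8 selects element i from b when set and from a when clear.
Blend packed single-precision (32-bit) floating-point elements from a and b using control mask imm8; bit i of imm8 selects element i from b when set and from a when clear.
Blend packed double-precision (64-bit) floating-point elements from a and b using mask; the sign bit of each 64-bit element of mask selects b when set and a when clear.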
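The two blend forms differ in where the selection comes from: an immediate constant or a vector. A minimal sketch follows; the helper names (blend_examples, blend_usage) are hypothetical, and GCC or Clang on x86-64 with AVX is assumed. In both forms a set selector bit picks the element from b.

```c
#include <assert.h>
#include <immintrin.h>

/* Assumption: GCC or Clang on x86-64; see the target attribute. */
__attribute__((target("avx")))
void blend_examples(const double a[4], const double b[4],
                    double by_imm[4], double by_mask[4])
{
    const __m256d va = _mm256_loadu_pd(a);
    const __m256d vb = _mm256_loadu_pd(b);

    /* imm8 = 0x5 (binary 0101): elements 0 and 2 come from b,
       elements 1 and 3 come from a. */
    _mm256_storeu_pd(by_imm, _mm256_blend_pd(va, vb, 0x5));

    /* Variable blend: the sign bit of each mask element selects b.
       -0.0 has the sign bit set, 0.0 does not. */
    const __m256d mask = _mm256_setr_pd(-0.0, 0.0, -0.0, 0.0);
    _mm256_storeu_pd(by_mask, _mm256_blendv_pd(va, vb, mask));
}

/* Hypothetical usage example. */
void blend_usage(void)
{
    const double a[4] = {1.0, 2.0, 3.0, 4.0};
    const double b[4] = {10.0, 20.0, 30.0, 40.0};
    double by_imm[4], by_mask[4];

    blend_examples(a, b, by_imm, by_mask);

    assert(by_imm[0] == 10.0 && by_imm[1] == 2.0);
    assert(by_imm[2] == 30.0 && by_imm[3] == 4.0);
    assert(by_mask[0] == 10.0 && by_mask[1] == 2.0);
    assert(by_mask[2] == 30.0 && by_mask[3] == 4.0);
}
```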
Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements. This effectively duplicates the 128-bit vector.
Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements. This effectively duplicates the 128-bit vector.
Broadcast a single-precision (32-bit) floating-point element from memory to all elements.
Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined.
Cast vector of type __m256d to type __m128d; the upper 128 bits of a are lost.
Cast vector of type __m256d to type __m256.
Cast vector of type __m256d to type __m256i.
Cast vector of type __m128 to type __m256; the upper 128 bits of the result are undefined.
Cast vector of type __m256 to type __m256d.
Cast vector of type __m256 to type __m256i.
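The casts above reinterpret bits and do not convert values. The sketch below shows this by viewing floats as integer bit patterns and by narrowing a 256-bit vector to its lower 128 bits. The helper names (float_bits, lower_half_pd, cast_usage) are hypothetical, and GCC or Clang on x86-64 with AVX is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>

/* Assumption: GCC or Clang on x86-64; see the target attribute. */
__attribute__((target("avx")))
void float_bits(const float in[8], uint32_t bits[8])
{
    const __m256 v = _mm256_loadu_ps(in);
    /* Same 256 bits, now typed as integer data. No conversion happens. */
    const __m256i vi = _mm256_castps_si256(v);
    _mm256_storeu_si256((__m256i *)bits, vi);
}

__attribute__((target("avx")))
void lower_half_pd(const double in[4], double out[2])
{
    const __m256d v = _mm256_loadu_pd(in);
    /* Keeps elements 0 and 1; the upper 128 bits are dropped. */
    const __m128d lo = _mm256_castpd256_pd128(v);
    _mm_storeu_pd(out, lo);
}

/* Hypothetical usage example. */
void cast_usage(void)
{
    const float f[8] = {1.0f, 2.0f, -2.0f, 0.0f, 0.0f, 0.0f, 0.0f, 0.0f};
    uint32_t bits[8];
    float_bits(f, bits);
    assert(bits[0] == 0x3f800000u);  /* IEEE 754 bit pattern of 1.0f */
    assert(bits[1] == 0x40000000u);  /* 2.0f */
    assert(bits[2] == 0xc0000000u);  /* -2.0f */
    assert(bits[3] == 0u);           /* 0.0f */

    const double d[4] = {1.0, 2.0, 3.0, 4.0};
    double lo[2];
    lower_half_pd(d, lo);
    assert(lo[0] == 1.0 && lo[1] == 2.0);
}
```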
Extract a 32-bit integer from a, selected with imm8.
Load 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory. mem_addr must be aligned on a 32-byte boundary, or a general-protection exception may be generated.
Load 256 bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory. mem_addr must be aligned on a 32-byte boundary, or a general-protection exception may be generated.
Load 256 bits of integer data from memory. mem_addr must be aligned on a 32-byte boundary, or a general-protection exception may be generated.
Load 256 bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory. mem_addr does not need to be aligned on any particular boundary.
Load 256 bits of integer data from memory. mem_addr does not need to be aligned on any particular boundary.
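The difference between the aligned and unaligned loads can be sketched as follows. The helper names (load_examples, load_usage) are hypothetical; the code assumes a C11 compiler (for _Alignas) and GCC or Clang on x86-64 with AVX.

```c
#include <assert.h>
#include <immintrin.h>

/* Assumption: GCC or Clang on x86-64; see the target attribute. */
__attribute__((target("avx")))
void load_examples(double aligned_out[4], double unaligned_out[4])
{
    /* _Alignas(32) guarantees the 32-byte alignment the aligned load needs. */
    _Alignas(32) const double data[8] = {0.0, 1.0, 2.0, 3.0,
                                         4.0, 5.0, 6.0, 7.0};

    /* Aligned load: the address must be a multiple of 32 bytes. */
    const __m256d first = _mm256_load_pd(data);

    /* data + 1 is only 8-byte aligned, so the unaligned load is required. */
    const __m256d shifted = _mm256_loadu_pd(data + 1);

    /* The output pointers carry no alignment guarantee: store unaligned. */
    _mm256_storeu_pd(aligned_out, first);
    _mm256_storeu_pd(unaligned_out, shifted);
}

/* Hypothetical usage example. */
void load_usage(void)
{
    double first[4], shifted[4];

    load_examples(first, shifted);

    assert(first[0] == 0.0 && first[3] == 3.0);
    assert(shifted[0] == 1.0 && shifted[3] == 4.0);
}
```

When alignment cannot be guaranteed, the unaligned form is the safe choice.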
Multiply packed double-precision (64-bit) floating-point elements in a and b.
Multiply packed single-precision (32-bit) floating-point elements in a and b.
Compute the bitwise NOT of 256 bits in a. #BONUS
Broadcast 16-bit integer a to all elements of the return value.
Broadcast 32-bit integer a to all elements of the return value.
Broadcast 64-bit integer a to all elements of the return value.
Broadcast 8-bit integer a to all elements of the return value.
Broadcast double-precision (64-bit) floating-point value a to all elements of the return value.
Broadcast single-precision (32-bit) floating-point value a to all elements of the return value.
Set packed double-precision (64-bit) floating-point elements with the supplied values.
Set packed single-precision (32-bit) floating-point elements with the supplied values.
Set packed 16-bit integers with the supplied values in reverse order.
Set packed 32-bit integers with the supplied values in reverse order.
Set packed 8-bit integers with the supplied values in reverse order.
Set packed double-precision (64-bit) floating-point elements with the supplied values in reverse order.
Set packed single-precision (32-bit) floating-point elements with the supplied values in reverse order.
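The "reverse order" wording is easiest to see side by side. In the sketch below, the plain set form takes its arguments from the highest element down to the lowest, while the reversed form takes them in memory order, so both calls build the same vector. The helper names (set_vs_setr, set_usage) are hypothetical, and GCC or Clang on x86-64 with AVX is assumed.

```c
#include <assert.h>
#include <immintrin.h>

/* Assumption: GCC or Clang on x86-64; see the target attribute. */
__attribute__((target("avx")))
void set_vs_setr(double from_set[4], double from_setr[4])
{
    /* set: first argument becomes the highest element (index 3). */
    _mm256_storeu_pd(from_set, _mm256_set_pd(4.0, 3.0, 2.0, 1.0));

    /* setr: first argument becomes the lowest element (index 0). */
    _mm256_storeu_pd(from_setr, _mm256_setr_pd(1.0, 2.0, 3.0, 4.0));
}

/* Hypothetical usage example. */
void set_usage(void)
{
    double a[4], b[4];

    set_vs_setr(a, b);

    /* Both arrays hold 1.0, 2.0, 3.0, 4.0 in memory order. */
    for (int i = 0; i < 4; i++) {
        assert(a[i] == (double)(i + 1));
        assert(b[i] == a[i]);
    }
}
```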
Return vector of type __m256d with all elements set to zero.
Return vector of type __m256 with all elements set to zero.
Return vector of type __m256i with all elements set to zero.
Store 256 bits of integer data from a into memory. mem_addr does not need to be aligned on any particular boundary.
Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a.
Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a.
Return vector of type __m256d with undefined elements.
Return vector of type __m256 with undefined elements.
Return vector of type __m256i with undefined elements.
Broadcast a single-precision (32-bit) floating-point element from memory to all elements.
AVX intrinsics. https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=AVX