Convert packed signed 32-bit integers in b to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements, and copy the upper 2 packed elements from a to the upper elements of result.
Load a single-precision (32-bit) floating-point element from memory into all elements.
Get the exception mask bits from the MXCSR control and status register. The exception mask may contain any of the following flags: _MM_MASK_INVALID, _MM_MASK_DIV_ZERO, _MM_MASK_DENORM, _MM_MASK_OVERFLOW, _MM_MASK_UNDERFLOW, _MM_MASK_INEXACT. Note: this won't correspond to reality on non-x86 targets, where MXCSR is emulated.
Get the exception state bits from the MXCSR control and status register. The exception state may contain any of the following flags: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO, _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_INEXACT. Note: this won't correspond to reality on non-x86 targets, where MXCSR is emulated and no exception is reported.
Get the flush zero bits from the MXCSR control and status register. The flush zero may contain any of the following flags: _MM_FLUSH_ZERO_ON or _MM_FLUSH_ZERO_OFF.
Get the rounding mode bits from the MXCSR control and status register. The rounding mode may contain any of the following flags: _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO.
Set the exception mask bits of the MXCSR control and status register to the value in unsigned 32-bit integer _MM_MASK_xxxx. The exception mask may contain any of the following flags: _MM_MASK_INVALID, _MM_MASK_DIV_ZERO, _MM_MASK_DENORM, _MM_MASK_OVERFLOW, _MM_MASK_UNDERFLOW, _MM_MASK_INEXACT.
Set the exception state bits of the MXCSR control and status register to the value in unsigned 32-bit integer _MM_EXCEPT_xxxx. The exception state may contain any of the following flags: _MM_EXCEPT_INVALID, _MM_EXCEPT_DIV_ZERO, _MM_EXCEPT_DENORM, _MM_EXCEPT_OVERFLOW, _MM_EXCEPT_UNDERFLOW, _MM_EXCEPT_INEXACT.
Set the flush zero bits of the MXCSR control and status register to the value in unsigned 32-bit integer _MM_FLUSH_xxxx. The flush zero may contain any of the following flags: _MM_FLUSH_ZERO_ON or _MM_FLUSH_ZERO_OFF.
Set the rounding mode bits of the MXCSR control and status register to the value in unsigned 32-bit integer _MM_ROUND_xxxx. The rounding mode may contain any of the following flags: _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP, _MM_ROUND_TOWARD_ZERO.
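The effect of the rounding mode flags is easiest to see on a float-to-integer conversion. The sketch below is written in C against Intel's `<xmmintrin.h>` on an x86 target, which is an assumption: the library documented here may spell or expose these names differently. `convert_with_rounding` and `rounding_demo` are hypothetical helper names, and the `volatile` qualifiers are only there to discourage the compiler from folding or reordering the conversion around the MXCSR writes.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Convert x to a 32-bit integer under the given MXCSR rounding mode,
   then restore whatever mode was active before the call. */
static int convert_with_rounding(float x, unsigned int mode)
{
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    volatile float in = x;   /* volatile: force a real load after the mode change */
    volatile int out;        /* volatile: force the store before the mode is restored */

    _MM_SET_ROUNDING_MODE(mode);
    out = _mm_cvtss_si32(_mm_set_ss(in));   /* this conversion honours MXCSR */
    _MM_SET_ROUNDING_MODE(saved);

    return out;
}

/* Small self-check of the four modes. */
static void rounding_demo(void)
{
    assert(convert_with_rounding(2.5f, _MM_ROUND_NEAREST) == 2);      /* ties go to even */
    assert(convert_with_rounding(3.5f, _MM_ROUND_NEAREST) == 4);
    assert(convert_with_rounding(2.5f, _MM_ROUND_UP) == 3);
    assert(convert_with_rounding(-2.5f, _MM_ROUND_DOWN) == -3);
    assert(convert_with_rounding(-2.5f, _MM_ROUND_TOWARD_ZERO) == -2);
}
```

Saving and restoring the previous mode keeps the change local, which matters because MXCSR is per-thread state that every later floating-point operation reads.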
Transpose the 4x4 matrix formed by the 4 rows of single-precision (32-bit) floating-point elements in row0, row1, row2, and row3, and store the transposed matrix in these vectors (row0 now contains column 0, etc.).
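As an illustration of the in-place transpose described above, here is a minimal C sketch using the `_MM_TRANSPOSE4_PS` macro from Intel's `<xmmintrin.h>`. It assumes an x86 target and a row-major `float[16]` matrix; `transpose4x4` and `transpose_demo` are hypothetical names introduced for the example.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Transpose a row-major 4x4 float matrix in place. */
static void transpose4x4(float m[16])
{
    __m128 row0 = _mm_loadu_ps(m + 0);
    __m128 row1 = _mm_loadu_ps(m + 4);
    __m128 row2 = _mm_loadu_ps(m + 8);
    __m128 row3 = _mm_loadu_ps(m + 12);

    /* The macro overwrites its four arguments: row0 now holds column 0, etc. */
    _MM_TRANSPOSE4_PS(row0, row1, row2, row3);

    _mm_storeu_ps(m + 0, row0);
    _mm_storeu_ps(m + 4, row1);
    _mm_storeu_ps(m + 8, row2);
    _mm_storeu_ps(m + 12, row3);
}

/* Self-check: element (r, c) must end up at (c, r). */
static void transpose_demo(void)
{
    float m[16];
    for (int i = 0; i < 16; ++i)
        m[i] = (float)i;

    transpose4x4(m);

    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            assert(m[r * 4 + c] == (float)(c * 4 + r));
}
```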
Add packed single-precision (32-bit) floating-point elements in a and b.
Add the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.
Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b.
Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b.
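The operand order of the AND-NOT operation is easy to get backwards: it is the first argument that is inverted. A common use is clearing the sign bit to obtain absolute values. The following C sketch assumes Intel's `<xmmintrin.h>` on x86 and that the line above describes `_mm_andnot_ps`; `abs4` and `abs_demo` are hypothetical helper names.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Write |in[i]| to out[i] for four floats by clearing each sign bit. */
static void abs4(const float in[4], float out[4])
{
    /* -0.0f has only the sign bit set, so it serves as a per-lane sign mask. */
    const __m128 sign_mask = _mm_set1_ps(-0.0f);
    __m128 x = _mm_loadu_ps(in);

    /* Computes (~sign_mask) & x: every bit of x except the sign bit survives. */
    _mm_storeu_ps(out, _mm_andnot_ps(sign_mask, x));
}

static void abs_demo(void)
{
    const float in[4] = { -1.5f, 2.0f, -0.0f, -3.25f };
    float out[4];

    abs4(in, out);

    assert(out[0] == 1.5f);
    assert(out[1] == 2.0f);
    assert(out[2] == 0.0f);
    assert(out[3] == 3.25f);
}
```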
Average packed unsigned 16-bit integers in a and b.
Average packed unsigned 8-bit integers in a and b.
Compare packed single-precision (32-bit) floating-point elements in a and b for equality.
Compare the lower single-precision (32-bit) floating-point elements in a and b for equality, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed single-precision (32-bit) floating-point elements in a and b for greater-than-or-equal.
Compare the lower single-precision (32-bit) floating-point elements in a and b for greater-than-or-equal, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed single-precision (32-bit) floating-point elements in a and b for greater-than.
Compare the lower single-precision (32-bit) floating-point elements in a and b for greater-than, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed single-precision (32-bit) floating-point elements in a and b for less-than-or-equal.
Compare the lower single-precision (32-bit) floating-point elements in a and b for less-than-or-equal, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed single-precision (32-bit) floating-point elements in a and b for less-than.
Compare the lower single-precision (32-bit) floating-point elements in a and b for less-than, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed single-precision (32-bit) floating-point elements in a and b for not-equal.
Compare the lower single-precision (32-bit) floating-point elements in a and b for not-equal, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed single-precision (32-bit) floating-point elements in a and b for not-greater-than-or-equal.
Compare the lower single-precision (32-bit) floating-point elements in a and b for not-greater-than-or-equal, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed single-precision (32-bit) floating-point elements in a and b for not-greater-than.
Compare the lower single-precision (32-bit) floating-point elements in a and b for not-greater-than, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than-or-equal.
Compare the lower single-precision (32-bit) floating-point elements in a and b for not-less-than-or-equal, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed single-precision (32-bit) floating-point elements in a and b for not-less-than.
Compare the lower single-precision (32-bit) floating-point elements in a and b for not-less-than, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed single-precision (32-bit) floating-point elements in a and b to see if neither is NaN.
Compare the lower single-precision (32-bit) floating-point elements in a and b to see if neither is NaN, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed single-precision (32-bit) floating-point elements in a and b to see if either is NaN.
Compare the lower single-precision (32-bit) floating-point elements in a and b to see if either is NaN, and copy the upper 3 packed elements from a to the upper elements of result.
Compare the lower single-precision (32-bit) floating-point element in a and b for equality, and return the boolean result (0 or 1).
Compare the lower single-precision (32-bit) floating-point element in a and b for greater-than-or-equal, and return the boolean result (0 or 1).
Compare the lower single-precision (32-bit) floating-point element in a and b for greater-than, and return the boolean result (0 or 1).
Compare the lower single-precision (32-bit) floating-point element in a and b for less-than-or-equal, and return the boolean result (0 or 1).
Compare the lower single-precision (32-bit) floating-point element in a and b for less-than, and return the boolean result (0 or 1).
Compare the lower single-precision (32-bit) floating-point element in a and b for not-equal, and return the boolean result (0 or 1).
Convert 2 lower packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers.
Convert the signed 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element, and copy the upper 3 packed elements from a to the upper elements of the result.
Convert packed 16-bit integers in a to packed single-precision (32-bit) floating-point elements.
Convert packed signed 32-bit integers in b to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements, and copy the upper 2 packed elements from a to the upper elements of result.
Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements, store the results in the lower 2 elements, then convert the packed signed 32-bit integers in b to single-precision (32-bit) floating-point elements, and store the results in the upper 2 elements.
Convert the lower packed 8-bit integers in a to packed single-precision (32-bit) floating-point elements.
Convert packed single-precision (32-bit) floating-point elements in a to packed 16-bit integers. Note: this intrinsic will generate 0x7FFF, rather than 0x8000, for input values between 0x7FFF and 0x7FFFFFFF.
Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers.
Convert packed single-precision (32-bit) floating-point elements in a to packed 8-bit integers, and store the results in lower 4 elements. Note: this intrinsic will generate 0x7F, rather than 0x80, for input values between 0x7F and 0x7FFFFFFF.
Convert packed unsigned 16-bit integers in a to packed single-precision (32-bit) floating-point elements.
Convert the lower packed unsigned 8-bit integers in a to packed single-precision (32-bit) floating-point elements.
Convert the signed 32-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element, and copy the upper 3 packed elements from a to the upper elements of result.
Convert the signed 64-bit integer b to a single-precision (32-bit) floating-point element, store the result in the lower element, and copy the upper 3 packed elements from a to the upper elements of result.
Take the lower single-precision (32-bit) floating-point element of a.
Convert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer.
Convert the lower single-precision (32-bit) floating-point element in a to a 64-bit integer.
Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation.
Convert the lower single-precision (32-bit) floating-point element in a to a 32-bit integer with truncation.
Convert the lower single-precision (32-bit) floating-point element in a to a 64-bit integer with truncation.
Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b.
Divide the lower single-precision (32-bit) floating-point element in a by the lower single-precision (32-bit) floating-point element in b, store the result in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.
Extract a 16-bit unsigned integer from a, selected with imm8. Zero-extended.
Free aligned memory that was allocated with _mm_malloc.
Get the unsigned 32-bit value of the MXCSR control and status register. Note: this is emulated on ARM, which has no MXCSR register.
Insert a 16-bit integer i inside a at the location specified by imm8.
Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory.
Load a single-precision (32-bit) floating-point element from memory into all elements.
Load a single-precision (32-bit) floating-point element from memory into the lower element of result, and zero the upper 3 elements. mem_addr does not need to be aligned on any particular boundary.
Load 2 single-precision (32-bit) floating-point elements from memory into the upper 2 elements of result, and copy the lower 2 elements from a to result. mem_addr does not need to be aligned on any particular boundary.
Load 2 single-precision (32-bit) floating-point elements from memory into the lower 2 elements of result, and copy the upper 2 elements from a to result. mem_addr does not need to be aligned on any particular boundary.
Load 4 single-precision (32-bit) floating-point elements from memory in reverse order. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
Load 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory. mem_addr does not need to be aligned on any particular boundary.
Load unaligned 16-bit integer from memory into the first element, fill with zeroes otherwise.
Load unaligned 64-bit integer from memory into the first element of result. Upper 64-bit is zeroed.
Allocate size bytes of memory, aligned to the alignment specified in align, and return a pointer to the allocated memory. _mm_free should be used to free memory that is allocated with _mm_malloc.
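The aligned allocator pairs naturally with the aligned load and store intrinsics, which require a 16-byte boundary. A minimal C sketch, assuming Intel's `<xmmintrin.h>` on x86 (where `_mm_malloc` and `_mm_free` are made available by that header in GCC and Clang); `alloc_filled` and `malloc_demo` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Allocate n floats on a 16-byte boundary and fill them with `value`.
   n must be a multiple of 4 so the buffer can be written in whole vectors.
   Returns NULL if the allocation fails; release the buffer with _mm_free. */
static float *alloc_filled(size_t n, float value)
{
    assert(n % 4 == 0);

    float *p = (float *)_mm_malloc(n * sizeof(float), 16);
    if (p == NULL)
        return NULL;

    for (size_t i = 0; i < n; i += 4)
        _mm_store_ps(p + i, _mm_set1_ps(value));   /* aligned store is safe here */

    return p;
}

static void malloc_demo(void)
{
    float *p = alloc_filled(8, 1.5f);
    assert(p != NULL);
    assert((uintptr_t)p % 16 == 0);   /* the requested alignment holds */
    assert(p[0] == 1.5f && p[7] == 1.5f);
    _mm_free(p);                      /* never free() a _mm_malloc pointer */
}
```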
Conditionally store 8-bit integer elements from a into memory using mask (elements are not stored when the highest bit is not set in the corresponding element) and a non-temporal memory hint.
Compare packed signed 16-bit integers in a and b, and return packed maximum values.
Compare packed single-precision (32-bit) floating-point elements in a and b, and return packed maximum values.
Compare packed unsigned 8-bit integers in a and b, and return packed maximum values.
Compare the lower single-precision (32-bit) floating-point elements in a and b, store the maximum value in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.
Compare packed signed 16-bit integers in a and b, and return packed minimum values.
Compare packed single-precision (32-bit) floating-point elements in a and b, and return packed minimum values.
Compare packed unsigned 8-bit integers in a and b, and return packed minimum values.
Compare the lower single-precision (32-bit) floating-point elements in a and b, store the minimum value in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.
Move the lower single-precision (32-bit) floating-point element from b to the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.
Move the upper 2 single-precision (32-bit) floating-point elements from b to the lower 2 elements of result, and copy the upper 2 elements from a to the upper 2 elements of result.
Move the lower 2 single-precision (32-bit) floating-point elements from b to the upper 2 elements of result, and copy the lower 2 elements from a to the lower 2 elements of result.
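The half-moving operations above are often used to fold a vector onto itself. The C sketch below sums the four lanes of a vector with one such move. It assumes Intel's `<xmmintrin.h>` on x86 and that the "upper 2 to lower 2" line describes `_mm_movehl_ps`; `hsum4` and `hsum_demo` are hypothetical names, and this is one possible reduction sequence rather than the canonical one.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Return v[0] + v[1] + v[2] + v[3], added as (v0 + v2) + (v1 + v3). */
static float hsum4(const float v[4])
{
    __m128 x    = _mm_loadu_ps(v);          /* {v0, v1, v2, v3}          */
    __m128 hi   = _mm_movehl_ps(x, x);      /* {v2, v3, v2, v3}          */
    __m128 sum2 = _mm_add_ps(x, hi);        /* {v0+v2, v1+v3, ...}       */

    /* Copy lane 1 of sum2 into every lane so it can be added to lane 0. */
    __m128 odd  = _mm_shuffle_ps(sum2, sum2, _MM_SHUFFLE(1, 1, 1, 1));

    return _mm_cvtss_f32(_mm_add_ss(sum2, odd));
}

static void hsum_demo(void)
{
    const float v[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    assert(hsum4(v) == 10.0f);
}
```

Note that the association order differs from a left-to-right scalar loop, so results can differ in the last bits for inputs that do not add exactly.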
Create mask from the most significant bit of each 8-bit element in a.
Set each bit of result based on the most significant bit of the corresponding packed single-precision (32-bit) floating-point element in a.
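Extracting sign bits is most useful right after a packed comparison, because each comparison lane is either all ones or all zeros. A minimal C sketch, assuming Intel's `<xmmintrin.h>` on x86 and the names `_mm_cmplt_ps` and `_mm_movemask_ps` for the operations described; `less_than_mask` and `movemask_demo` are hypothetical helpers.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Return a 4-bit mask whose bit i is set when a[i] < b[i]. */
static int less_than_mask(const float a[4], const float b[4])
{
    /* Each lane becomes all ones where a < b and all zeros elsewhere. */
    __m128 lt = _mm_cmplt_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));

    /* Bit i of the result is the most significant bit of lane i. */
    return _mm_movemask_ps(lt);
}

static void movemask_demo(void)
{
    const float a[4] = { 1.0f, 5.0f, 2.0f, 8.0f };
    const float b[4] = { 4.0f, 4.0f, 4.0f, 4.0f };

    /* Lanes 0 and 2 are below 4, so bits 0 and 2 are set. */
    assert(less_than_mask(a, b) == 0x5);
}
```

A mask of 0 means no lane matched and 0xF means all four did, which makes the result convenient for early-exit branches.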
Multiply packed single-precision (32-bit) floating-point elements in a and b.
Multiply the lower single-precision (32-bit) floating-point element in a and b, store the result in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.
Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and return the high 16 bits of the intermediate integers.
Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b, and return the result.
Fetch the line of data from memory that contains address p to a location in the cache hierarchy specified by the locality hint i.
Compute the approximate reciprocal of packed single-precision (32-bit) floating-point elements in a, and return the results. The maximum relative error for this approximation is less than 1.5*2^-12.
Compute the approximate reciprocal of the lower single-precision (32-bit) floating-point element in a, store it in the lower element of the result, and copy the upper 3 packed elements from a to the upper elements of result. The maximum relative error for this approximation is less than 1.5*2^-12.
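When the stated error bound of 1.5*2^-12 is too loose, one Newton-Raphson step, r' = r * (2 - a * r), roughly squares the relative error. The C sketch below is an illustration under assumptions: Intel's `<xmmintrin.h>` on x86, the name `_mm_rcp_ss` for the scalar approximation, and hypothetical helpers `rcp_approx`, `rcp_refined`, `rel_error`, and `rcp_demo`.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Hardware approximation of 1/a (lower lane only). */
static float rcp_approx(float a)
{
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(a)));
}

/* Approximation followed by one Newton-Raphson step: r' = r * (2 - a * r). */
static float rcp_refined(float a)
{
    __m128 x   = _mm_set_ss(a);
    __m128 r   = _mm_rcp_ss(x);
    __m128 two = _mm_set_ss(2.0f);

    r = _mm_mul_ss(r, _mm_sub_ss(two, _mm_mul_ss(x, r)));
    return _mm_cvtss_f32(r);
}

/* Relative error of r as an estimate of 1/a, i.e. |a * r - 1|. */
static double rel_error(float a, float r)
{
    double e = (double)a * (double)r - 1.0;
    return e < 0.0 ? -e : e;
}

static void rcp_demo(void)
{
    const double bound = 1.5 / 4096.0;   /* 1.5 * 2^-12 */

    assert(rel_error(3.0f, rcp_approx(3.0f)) < bound);
    assert(rel_error(3.0f, rcp_refined(3.0f)) < 1e-6);
}
```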
Reallocate size bytes of memory, aligned to the alignment specified in alignment, and return a pointer to the newly allocated memory. _mm_free or alignedRealloc with size 0 should be used to free memory that is allocated with _mm_malloc or _mm_realloc. Previous data is preserved.
Reallocate size bytes of memory, aligned to the alignment specified in alignment, and return a pointer to the newly allocated memory. _mm_free or alignedRealloc with size 0 should be used to free memory that is allocated with _mm_malloc or _mm_realloc. Previous data is discarded.
Compute the approximate reciprocal square root of packed single-precision (32-bit) floating-point elements in a. The maximum relative error for this approximation is less than 1.5*2^-12.
Compute the approximate reciprocal square root of the lower single-precision (32-bit) floating-point element in a, store the result in the lower element. Copy the upper 3 packed elements from a to the upper elements of result. The maximum relative error for this approximation is less than 1.5*2^-12.
Compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of result.
Broadcast single-precision (32-bit) floating-point value a to all elements.
Set packed single-precision (32-bit) floating-point elements with the supplied values.
Copy single-precision (32-bit) floating-point element a to the lower element of result, and zero the upper 3 elements.
Set the MXCSR control and status register with the value in unsigned 32-bit integer controlWord.
Set packed single-precision (32-bit) floating-point elements with the supplied values in reverse order.
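The difference between the two "set" variants is only the argument order: the plain form takes the highest element first, the reversed form takes elements in memory order. A short C sketch, assuming Intel's `<xmmintrin.h>` on x86 and the names `_mm_set_ps` and `_mm_setr_ps`; `set_order_demo` is a hypothetical name.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Build the same vector both ways and compare the stored lanes. */
static void set_order_demo(void)
{
    float high_first[4];
    float low_first[4];

    /* _mm_set_ps takes (e3, e2, e1, e0): the last argument is lane 0. */
    _mm_storeu_ps(high_first, _mm_set_ps(3.0f, 2.0f, 1.0f, 0.0f));

    /* _mm_setr_ps takes (e0, e1, e2, e3): arguments follow memory order. */
    _mm_storeu_ps(low_first, _mm_setr_ps(0.0f, 1.0f, 2.0f, 3.0f));

    for (int i = 0; i < 4; ++i) {
        assert(high_first[i] == (float)i);
        assert(low_first[i] == (float)i);
    }
}
```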
Return vector of type __m128 with all elements set to zero.
Perform a serializing operation on all store-to-memory instructions that were issued prior to this instruction. Guarantees that every store instruction that precedes, in program order, is globally visible before any store instruction which follows the fence in program order.
Warning: the immediate shuffle value imm8 is given at compile-time instead of runtime.
Warning: the immediate shuffle value imm8 is given at compile-time instead of runtime.
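To make the compile-time requirement concrete: in C the shuffle selector must be an integer constant expression, usually built with the `_MM_SHUFFLE` macro. The sketch below assumes Intel's `<xmmintrin.h>` on x86 and that one of the two warnings above belongs to `_mm_shuffle_ps`; `reverse4` and `shuffle_demo` are hypothetical names.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Write the four floats of `in` to `out` in reverse order. */
static void reverse4(const float in[4], float out[4])
{
    __m128 v = _mm_loadu_ps(in);

    /* _MM_SHUFFLE(z, y, x, w) selects: lane0 = a[w], lane1 = a[x],
       lane2 = b[y], lane3 = b[z]. With a == b == v this reverses v.
       The selector is a constant; it cannot be a runtime variable. */
    __m128 r = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 1, 2, 3));

    _mm_storeu_ps(out, r);
}

static void shuffle_demo(void)
{
    const float in[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float out[4];

    reverse4(in, out);

    assert(out[0] == 4.0f);
    assert(out[1] == 3.0f);
    assert(out[2] == 2.0f);
    assert(out[3] == 1.0f);
}
```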
Compute the square root of packed single-precision (32-bit) floating-point elements in a.
Compute the square root of the lower single-precision (32-bit) floating-point element in a, store it in the lower element, and copy the upper 3 packed elements from a to the upper elements of result.
Store the lower single-precision (32-bit) floating-point element from a into 4 contiguous elements in memory. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
Store the lower single-precision (32-bit) floating-point element from a into memory. mem_addr does not need to be aligned on any particular boundary.
Store the upper 2 single-precision (32-bit) floating-point elements from a into memory.
Store the lower 2 single-precision (32-bit) floating-point elements from a into memory.
Store 4 single-precision (32-bit) floating-point elements from a into memory in reverse order. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.
Store 64-bits of integer data from a into memory using a non-temporal memory hint.
Store 128-bits (composed of 4 packed single-precision (32-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 16-byte boundary or a general-protection exception may be generated.
Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a.
Subtract the lower single-precision (32-bit) floating-point element in b from the lower single-precision (32-bit) floating-point element in a, store the subtraction result in the lower element of result, and copy the upper 3 packed elements from a to the upper elements of result.
Return vector of type __m128 with undefined elements.
Unpack and interleave single-precision (32-bit) floating-point elements from the high half of a and b.
Unpack and interleave single-precision (32-bit) floating-point elements from the low half of a and b.
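"Interleave" here means alternating elements from a and b, starting with a. A short C sketch of both halves, assuming Intel's `<xmmintrin.h>` on x86 and the names `_mm_unpacklo_ps` and `_mm_unpackhi_ps`; `interleave` and `unpack_demo` are hypothetical helpers.

```c
#include <assert.h>
#include <xmmintrin.h>

/* Interleave a and b: lo = {a0, b0, a1, b1}, hi = {a2, b2, a3, b3}. */
static void interleave(const float a[4], const float b[4],
                       float lo[4], float hi[4])
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);

    _mm_storeu_ps(lo, _mm_unpacklo_ps(va, vb));   /* low halves of a and b  */
    _mm_storeu_ps(hi, _mm_unpackhi_ps(va, vb));   /* high halves of a and b */
}

static void unpack_demo(void)
{
    const float a[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    const float b[4] = { 10.0f, 20.0f, 30.0f, 40.0f };
    float lo[4], hi[4];

    interleave(a, b, lo, hi);

    assert(lo[0] == 1.0f && lo[1] == 10.0f && lo[2] == 2.0f && lo[3] == 20.0f);
    assert(hi[0] == 3.0f && hi[1] == 30.0f && hi[2] == 4.0f && hi[3] == 40.0f);
}
```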
Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b.
MXCSR Exception states.
MXCSR Exception states.
MXCSR Exception states mask.
MXCSR Exception states.
MXCSR Denormal flush to zero mask.
MXCSR Denormal flush to zero modes.
MXCSR Denormal flush to zero modes.
MXCSR Exception masks.
MXCSR Exception masks.
MXCSR Exception masks mask.
MXCSR Exception masks.
MXCSR Rounding mode.
MXCSR Rounding mode mask.
MXCSR Rounding mode.
SSE intrinsics. https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=SSE