inteli.avx2intrin

Public Imports

inteli.types
public import inteli.types;
Undocumented in source.
inteli.avxintrin
public import inteli.avxintrin;
Undocumented in source.

Members

Aliases

_mm256_broadcastsi128_si256
alias _mm256_broadcastsi128_si256 = _mm_broadcastsi128_si256

Broadcast 128 bits of integer data from a to all 128-bit lanes in result. Note: this is an alias of _mm_broadcastsi128_si256, which is identical.

_mm256_slli_si256
alias _mm256_slli_si256 = _mm256_bslli_epi128

Shift 128-bit lanes in a left by bytes bytes while shifting in zeroes.

_mm256_srli_si256
alias _mm256_srli_si256 = _mm256_bsrli_epi128

Shift 128-bit lanes in a right by bytes bytes while shifting in zeroes.

Functions

_mm256_abs_epi16
__m256i _mm256_abs_epi16(__m256i a)

Compute the absolute value of packed signed 16-bit integers in a.

_mm256_abs_epi32
__m256i _mm256_abs_epi32(__m256i a)

Compute the absolute value of packed signed 32-bit integers in a.

_mm256_abs_epi8
__m256i _mm256_abs_epi8(__m256i a)

Compute the absolute value of packed signed 8-bit integers in a.
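The absolute-value functions have one edge case worth seeing: the most negative value has no positive counterpart in the same width, so it comes back unchanged. A minimal sketch, assuming `_mm256_set1_epi16` is available through the public import of inteli.avxintrin and that `short16` and its `.array` accessor come from inteli.types:

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

void main()
{
    short16 five = cast(short16) _mm256_abs_epi16(_mm256_set1_epi16(-5));
    short16 edge = cast(short16) _mm256_abs_epi16(_mm256_set1_epi16(short.min));

    assert(five.array[0] == 5);
    // -32768 has no positive 16-bit counterpart, so it is returned as-is.
    assert(edge.array[0] == short.min);
}
```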

_mm256_add_epi16
__m256i _mm256_add_epi16(__m256i a, __m256i b)

Add packed 16-bit integers in a and b.

_mm256_add_epi32
__m256i _mm256_add_epi32(__m256i a, __m256i b)

Add packed 32-bit integers in a and b.

_mm256_add_epi64
__m256i _mm256_add_epi64(__m256i a, __m256i b)

Add packed 64-bit integers in a and b.

_mm256_add_epi8
__m256i _mm256_add_epi8(__m256i a, __m256i b)

Add packed 8-bit integers in a and b.

_mm256_adds_epi16
__m256i _mm256_adds_epi16(__m256i a, __m256i b)

Add packed 16-bit signed integers in a and b using signed saturation.

_mm256_adds_epi8
__m256i _mm256_adds_epi8(__m256i a, __m256i b)

Add packed 8-bit signed integers in a and b using signed saturation.

_mm256_adds_epu16
__m256i _mm256_adds_epu16(__m256i a, __m256i b)

Add packed 16-bit unsigned integers in a and b using unsigned saturation.

_mm256_adds_epu8
__m256i _mm256_adds_epu8(__m256i a, __m256i b)

Add packed 8-bit unsigned integers in a and b using unsigned saturation.
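The difference between the plain and saturating additions shows up only on overflow. The sketch below adds 200 and 100 as unsigned bytes both ways; it assumes `_mm256_set1_epi8` (from inteli.avxintrin) and the `byte32` vector type with an `.array` accessor (from inteli.types).

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

void main()
{
    // The set1 helper is assumed to take a signed byte, hence the casts.
    __m256i a = _mm256_set1_epi8(cast(byte) 200);
    __m256i b = _mm256_set1_epi8(cast(byte) 100);

    byte32 wrapped   = cast(byte32) _mm256_add_epi8(a, b);
    byte32 saturated = cast(byte32) _mm256_adds_epu8(a, b);

    assert(cast(ubyte) wrapped.array[0] == 44);    // (200 + 100) mod 256
    assert(cast(ubyte) saturated.array[0] == 255); // clamped to the unsigned maximum
}
```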

_mm256_alignr_epi8
__m256i _mm256_alignr_epi8(__m256i a, __m256i b)

Concatenate pairs of 16-byte blocks in a and b into a 32-byte temporary result, shift the result right by imm8 bytes, and return the low 16 bytes of that in each lane.

_mm256_and_si256
__m256i _mm256_and_si256(__m256i a, __m256i b)

Compute the bitwise AND of 256 bits (representing integer data) in a and b.

_mm256_andnot_si256
__m256i _mm256_andnot_si256(__m256i a, __m256i b)

Compute the bitwise NOT of 256 bits (representing integer data) in a and then AND with b.
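It is easy to misremember which operand gets inverted. The sketch below checks that the result is (NOT a) AND b; `_mm256_set1_epi32` and the `int8` vector type are assumed to come from the publicly imported modules.

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

void main()
{
    __m256i a = _mm256_set1_epi32(0x0F0F0F0F);
    __m256i b = _mm256_set1_epi32(0x00FF00FF);

    int8 r = cast(int8) _mm256_andnot_si256(a, b);

    // ~0x0F0F0F0F = 0xF0F0F0F0, and 0xF0F0F0F0 & 0x00FF00FF = 0x00F000F0
    assert(r.array[0] == 0x00F000F0);
}
```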

_mm256_avg_epu16
__m256i _mm256_avg_epu16(__m256i a, __m256i b)

Average packed unsigned 16-bit integers in a and b.

_mm256_avg_epu8
__m256i _mm256_avg_epu8(__m256i a, __m256i b)

Average packed unsigned 8-bit integers in a and b.

_mm256_blend_epi16
__m256i _mm256_blend_epi16(__m256i a, __m256i b)

Blend packed 16-bit integers from a and b within 128-bit lanes using 8-bit control mask imm8, in each of the two lanes. Note: this is functionally equivalent to two _mm_blend_epi16.

_mm256_blend_epi32
__m256i _mm256_blend_epi32(__m256i a, __m256i b)

Blend packed 32-bit integers from a and b using 8-bit control mask imm8.

_mm256_blendv_epi8
__m256i _mm256_blendv_epi8(__m256i a, __m256i b, __m256i mask)

Blend packed 8-bit integers from a and b using mask. Select from b if the high-order bit of the corresponding 8-bit element in mask is set, else select from a.
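A common use of a variable blend is to select between two vectors with a comparison result as the mask. As an illustration, the sketch below rebuilds a per-byte signed maximum from _mm256_cmpgt_epi8 and _mm256_blendv_epi8 and compares it with _mm256_max_epi8. `maxViaBlend` is a hypothetical helper name, and `_mm256_set1_epi8` and `byte32` are assumed to come from the publicly imported modules.

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

// Hypothetical helper: per-byte signed maximum built from compare + blend.
__m256i maxViaBlend(__m256i a, __m256i b)
{
    __m256i bIsGreater = _mm256_cmpgt_epi8(b, a); // all ones where b > a, else zero
    return _mm256_blendv_epi8(a, b, bIsGreater);  // take b where the mask's high bit is set
}

void main()
{
    __m256i a = _mm256_set1_epi8(-5);
    __m256i b = _mm256_set1_epi8(7);

    byte32 viaBlend = cast(byte32) maxViaBlend(a, b);
    byte32 viaMax   = cast(byte32) _mm256_max_epi8(a, b);

    assert(viaBlend.array == viaMax.array);
    assert(viaBlend.array[0] == 7);
}
```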

_mm256_broadcastb_epi8
__m256i _mm256_broadcastb_epi8(__m128i a)

Broadcast the low packed 8-bit integer from a to all elements of result.

_mm256_broadcastd_epi32
__m256i _mm256_broadcastd_epi32(__m128i a)

Broadcast the low packed 32-bit integer from a to all elements of result.

_mm256_broadcastq_epi64
__m256i _mm256_broadcastq_epi64(__m128i a)

Broadcast the low packed 64-bit integer from a to all elements of result.

_mm256_broadcastsd_pd
__m256d _mm256_broadcastsd_pd(__m128d a)

Broadcast the low double-precision (64-bit) floating-point element from a to all elements of result.

_mm256_broadcastss_ps
__m256 _mm256_broadcastss_ps(__m128 a)

Broadcast the low single-precision (32-bit) floating-point element from a to all elements of result.

_mm256_broadcastw_epi16
__m256i _mm256_broadcastw_epi16(__m128i a)

Broadcast the low packed 16-bit integer from a to all elements of result.

_mm256_bslli_epi128
__m256i _mm256_bslli_epi128(__m256i a)

Shift 128-bit lanes in a left by bytes bytes while shifting in zeroes.

_mm256_bsrli_epi128
__m256i _mm256_bsrli_epi128(__m256i a)

Shift 128-bit lanes in a right by bytes bytes while shifting in zeroes.

_mm256_cmpeq_epi16
__m256i _mm256_cmpeq_epi16(__m256i a, __m256i b)

Compare packed 16-bit integers in a and b for equality.

_mm256_cmpeq_epi32
__m256i _mm256_cmpeq_epi32(__m256i a, __m256i b)

Compare packed 32-bit integers in a and b for equality.

_mm256_cmpeq_epi64
__m256i _mm256_cmpeq_epi64(__m256i a, __m256i b)

Compare packed 64-bit integers in a and b for equality.

_mm256_cmpeq_epi8
__m256i _mm256_cmpeq_epi8(__m256i a, __m256i b)

Compare packed 8-bit integers in a and b for equality.

_mm256_cmpgt_epi16
__m256i _mm256_cmpgt_epi16(__m256i a, __m256i b)

Compare packed signed 16-bit integers in a and b for greater-than.

_mm256_cmpgt_epi32
__m256i _mm256_cmpgt_epi32(__m256i a, __m256i b)

Compare packed signed 32-bit integers in a and b for greater-than.

_mm256_cmpgt_epi64
__m256i _mm256_cmpgt_epi64(__m256i a, __m256i b)

Compare packed signed 64-bit integers in a and b for greater-than.

_mm256_cmpgt_epi8
__m256i _mm256_cmpgt_epi8(__m256i a, __m256i b)

Compare packed signed 8-bit integers in a and b for greater-than.

_mm256_cvtepi16_epi32
__m256i _mm256_cvtepi16_epi32(__m128i a)

Sign extend packed 16-bit integers in a to packed 32-bit integers.

_mm256_cvtepi16_epi64
__m256i _mm256_cvtepi16_epi64(__m128i a)

Sign extend packed 16-bit integers in a to packed 64-bit integers.

_mm256_cvtepi32_epi64
__m256i _mm256_cvtepi32_epi64(__m128i a)

Sign extend packed 32-bit integers in a to packed 64-bit integers.

_mm256_cvtepi8_epi16
__m256i _mm256_cvtepi8_epi16(__m128i a)

Sign extend packed 8-bit integers in a to packed 16-bit integers.

_mm256_cvtepi8_epi32
__m256i _mm256_cvtepi8_epi32(__m128i a)

Sign extend packed 8-bit integers in a to packed 32-bit integers.

_mm256_cvtepi8_epi64
__m256i _mm256_cvtepi8_epi64(__m128i a)

Sign extend packed 8-bit integers in the low 8 bytes of a to packed 64-bit integers.

_mm256_cvtepu16_epi32
__m256i _mm256_cvtepu16_epi32(__m128i a)

Zero-extend packed unsigned 16-bit integers in a to packed 32-bit integers.

_mm256_cvtepu16_epi64
__m256i _mm256_cvtepu16_epi64(__m128i a)

Zero-extend packed unsigned 16-bit integers in a to packed 64-bit integers.

_mm256_cvtepu32_epi64
__m256i _mm256_cvtepu32_epi64(__m128i a)

Zero-extend packed unsigned 32-bit integers in a to packed 64-bit integers.

_mm256_cvtepu8_epi16
__m256i _mm256_cvtepu8_epi16(__m128i a)

Zero-extend packed unsigned 8-bit integers in a to packed 16-bit integers.

_mm256_cvtepu8_epi32
__m256i _mm256_cvtepu8_epi32(__m128i a)

Zero-extend packed unsigned 8-bit integers in a to packed 32-bit integers.

_mm256_cvtepu8_epi64
__m256i _mm256_cvtepu8_epi64(__m128i a)

Zero-extend packed unsigned 8-bit integers in a to packed 64-bit integers.
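The cvtepi and cvtepu families differ only in how the upper bits of each widened element are filled. A small sketch with the byte 0xFF, assuming `_mm_set1_epi8` from inteli.emmintrin and the `short16` type from inteli.types:

```d
import inteli.emmintrin;  // assumed home of _mm_set1_epi8
import inteli.avx2intrin;

void main()
{
    __m128i bytes = _mm_set1_epi8(cast(byte) 0xFF); // every byte is 0xFF

    short16 signExt = cast(short16) _mm256_cvtepi8_epi16(bytes);
    short16 zeroExt = cast(short16) _mm256_cvtepu8_epi16(bytes);

    assert(signExt.array[0] == -1);  // 0xFF read as signed: -1
    assert(zeroExt.array[0] == 255); // 0xFF read as unsigned: 255
}
```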

_mm256_extract_epi16
int _mm256_extract_epi16(__m256i a, int index)

Extract a 16-bit integer from a, selected with index.

_mm256_extract_epi8
int _mm256_extract_epi8(__m256i a, int index)

Extract an 8-bit integer from a, selected with index.

_mm256_extracti128_si256
__m128i _mm256_extracti128_si256(__m256i a)

Extract 128 bits (composed of integer data) from a, selected with imm8.

_mm256_hadd_epi16
__m256i _mm256_hadd_epi16(__m256i a, __m256i b)

Horizontally add adjacent pairs of 16-bit integers in a and b, and pack the signed 16-bit results.

_mm256_hadd_epi32
__m256i _mm256_hadd_epi32(__m256i a, __m256i b)

Horizontally add adjacent pairs of 32-bit integers in a and b, and pack the signed 32-bit results.
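The horizontal operations work within each 128-bit lane, so the output order is not simply "all of a, then all of b". The sketch below shows the layout for _mm256_hadd_epi32, assuming `_mm256_setr_epi32` (lowest element first) and the `int8` type come from the publicly imported modules.

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

void main()
{
    __m256i a = _mm256_setr_epi32(1, 2, 3, 4, 5, 6, 7, 8);
    __m256i b = _mm256_setr_epi32(10, 20, 30, 40, 50, 60, 70, 80);

    int8 r = cast(int8) _mm256_hadd_epi32(a, b);

    // Low lane:  a0+a1, a2+a3, b0+b1, b2+b3
    // High lane: a4+a5, a6+a7, b4+b5, b6+b7
    int[8] expected = [3, 7, 30, 70, 11, 15, 110, 150];
    assert(r.array == expected);
}
```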

_mm256_hadds_epi16
__m256i _mm256_hadds_epi16(__m256i a, __m256i b)

Horizontally add adjacent pairs of signed 16-bit integers in a and b using saturation, and pack the signed 16-bit results.

_mm256_hsub_epi16
__m256i _mm256_hsub_epi16(__m256i a, __m256i b)

Horizontally subtract adjacent pairs of 16-bit integers in a and b, and pack the signed 16-bit results.

_mm256_hsub_epi32
__m256i _mm256_hsub_epi32(__m256i a, __m256i b)

Horizontally subtract adjacent pairs of 32-bit integers in a and b, and pack the signed 32-bit results.

_mm256_hsubs_epi16
__m256i _mm256_hsubs_epi16(__m256i a, __m256i b)

Horizontally subtract adjacent pairs of signed 16-bit integers in a and b using saturation, and pack the signed 16-bit results.

_mm256_i32gather_epi32
__m256i _mm256_i32gather_epi32(const(int)* base_addr, __m256i vindex)

Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.
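The role of scale is easiest to see in scalar form: each index is multiplied by scale to form a byte offset from base_addr. The following is a plain-D model of that addressing rule for eight 32-bit elements. It is only an illustration of the description above, not the library's implementation, and `gatherModel` is a hypothetical name.

```d
// Scalar model of a gather of eight 32-bit integers with 32-bit indices.
// Hypothetical helper for illustration; it does not call the intrinsic.
int[8] gatherModel(const(int)* base, const int[8] vindex, int scale)
{
    int[8] result;
    foreach (i; 0 .. 8)
    {
        // Byte address = base + index * scale
        const(ubyte)* p = cast(const(ubyte)*) base + cast(ptrdiff_t) vindex[i] * scale;
        result[i] = *cast(const(int)*) p;
    }
    return result;
}

void main()
{
    int[10] table = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90];
    int[8] idx = [9, 0, 3, 3, 1, 7, 2, 5];

    // scale = 4 (int.sizeof) turns element indices into byte offsets.
    int[8] got = gatherModel(table.ptr, idx, 4);
    int[8] expected = [90, 0, 30, 30, 10, 70, 20, 50];
    assert(got == expected);
}
```

With scale equal to the element size, vindex behaves like ordinary array indices; smaller scales address at byte granularity.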

_mm256_i32gather_epi64
__m256i _mm256_i32gather_epi64(const(long)* base_addr, __m128i vindex)

Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm256_i32gather_pd
__m256d _mm256_i32gather_pd(const(double)* base_addr, __m128i vindex)

Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm256_i32gather_ps
__m256 _mm256_i32gather_ps(const(float)* base_addr, __m256i vindex)

Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm256_i64gather_epi32
__m128i _mm256_i64gather_epi32(const(int)* base_addr, __m256i vindex)

Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm256_i64gather_epi64
__m256i _mm256_i64gather_epi64(const(long)* base_addr, __m256i vindex)

Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm256_i64gather_pd
__m256d _mm256_i64gather_pd(const(double)* base_addr, __m256i vindex)

Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm256_i64gather_ps
__m128 _mm256_i64gather_ps(const(float)* base_addr, __m256i vindex)

Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm256_inserti128_si256
__m256i _mm256_inserti128_si256(__m256i a, __m128i b, int imm8)

Copy a to result, then insert 128 bits from b into result at the location specified by imm8.

_mm256_madd_epi16
__m256i _mm256_madd_epi16(__m256i a, __m256i b)

Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Horizontally add adjacent pairs of intermediate 32-bit integers, and pack the results in destination.
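The multiply-add is the usual building block for 16-bit dot products: every pair of adjacent products collapses into one 32-bit sum. A minimal sketch, assuming `_mm256_setr_epi16` and `_mm256_set1_epi16` are available from inteli.avxintrin:

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

void main()
{
    __m256i a = _mm256_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8,
                                  9, 10, 11, 12, 13, 14, 15, 16);
    __m256i b = _mm256_set1_epi16(10);

    int8 r = cast(int8) _mm256_madd_epi16(a, b);

    // Element i holds a[2i]*b[2i] + a[2i+1]*b[2i+1], e.g. 1*10 + 2*10 = 30.
    int[8] expected = [30, 70, 110, 150, 190, 230, 270, 310];
    assert(r.array == expected);
}
```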

_mm256_maddubs_epi16
__m256i _mm256_maddubs_epi16(__m256i a, __m256i b)

Vertically multiply each unsigned 8-bit integer from a with the corresponding signed 8-bit integer from b, producing intermediate signed 16-bit integers. Horizontally add adjacent pairs of intermediate signed 16-bit integers, and pack the saturated results.

_mm256_mask_i32gather_epi32
__m256i _mm256_mask_i32gather_epi32(__m256i src, const(int)* base_addr, __m256i vindex, __m256i mask)

Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm256_mask_i32gather_epi64
__m256i _mm256_mask_i32gather_epi64(__m256i src, const(long)* base_addr, __m128i vindex, __m256i mask)

Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm256_mask_i32gather_pd
__m256d _mm256_mask_i32gather_pd(__m256d src, const(double)* base_addr, __m128i vindex, __m256d mask)

Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm256_mask_i32gather_ps
__m256 _mm256_mask_i32gather_ps(__m256 src, const(float)* base_addr, __m256i vindex, __m256 mask)

Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm256_mask_i64gather_epi32
__m128i _mm256_mask_i64gather_epi32(__m128i src, const(int)* base_addr, __m256i vindex, __m128i mask)

Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm256_mask_i64gather_epi64
__m256i _mm256_mask_i64gather_epi64(__m256i src, const(long)* base_addr, __m256i vindex, __m256i mask)

Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm256_mask_i64gather_pd
__m256d _mm256_mask_i64gather_pd(__m256d src, const(double)* base_addr, __m256i vindex, __m256d mask)

Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm256_mask_i64gather_ps
__m128 _mm256_mask_i64gather_ps(__m128 src, const(float)* base_addr, __m256i vindex, __m128 mask)

Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm256_maskload_epi32
__m256i _mm256_maskload_epi32(const(int)* mem_addr, __m256i mask)

Load packed 32-bit integers from memory using mask (elements are zeroed out when the highest bit is not set in the corresponding element). Warning: See "Note about mask load/store" to know why you must address valid memory only.

_mm256_maskload_epi64
__m256i _mm256_maskload_epi64(const(long)* mem_addr, __m256i mask)

Load packed 64-bit integers from memory using mask (elements are zeroed out when the highest bit is not set in the corresponding element). Warning: See "Note about mask load/store" to know why you must address valid memory only.
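A masked load keeps only the elements whose mask element has its highest bit set. The sketch below loads the first five of eight integers. In line with the warning above, the whole 32-byte source is valid memory even though three elements are masked out. `_mm256_setr_epi32` and `int8` are assumed to come from the publicly imported modules.

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

void main()
{
    int[8] data = [1, 2, 3, 4, 5, 6, 7, 8]; // all eight elements are addressable

    // -1 has its highest bit set (load); 0 does not (element becomes zero).
    __m256i mask = _mm256_setr_epi32(-1, -1, -1, -1, -1, 0, 0, 0);

    int8 r = cast(int8) _mm256_maskload_epi32(data.ptr, mask);

    int[8] expected = [1, 2, 3, 4, 5, 0, 0, 0];
    assert(r.array == expected);
}
```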

_mm256_max_epi16
__m256i _mm256_max_epi16(__m256i a, __m256i b)

Compare packed signed 16-bit integers in a and b, and return packed maximum values.

_mm256_max_epi32
__m256i _mm256_max_epi32(__m256i a, __m256i b)

Compare packed signed 32-bit integers in a and b, and return packed maximum values.

_mm256_max_epi8
__m256i _mm256_max_epi8(__m256i a, __m256i b)

Compare packed signed 8-bit integers in a and b, and return packed maximum values.

_mm256_max_epu16
__m256i _mm256_max_epu16(__m256i a, __m256i b)

Compare packed unsigned 16-bit integers in a and b, and return packed maximum values.

_mm256_max_epu32
__m256i _mm256_max_epu32(__m256i a, __m256i b)

Compare packed unsigned 32-bit integers in a and b, and return packed maximum values.

_mm256_max_epu8
__m256i _mm256_max_epu8(__m256i a, __m256i b)

Compare packed unsigned 8-bit integers in a and b, and return packed maximum values.

_mm256_min_epi16
__m256i _mm256_min_epi16(__m256i a, __m256i b)

Compare packed signed 16-bit integers in a and b, and return packed minimum values.

_mm256_min_epi32
__m256i _mm256_min_epi32(__m256i a, __m256i b)

Compare packed signed 32-bit integers in a and b, and return packed minimum values.

_mm256_min_epi8
__m256i _mm256_min_epi8(__m256i a, __m256i b)

Compare packed signed 8-bit integers in a and b, and return packed minimum values.

_mm256_min_epu16
__m256i _mm256_min_epu16(__m256i a, __m256i b)

Compare packed unsigned 16-bit integers in a and b, and return packed minimum values.

_mm256_min_epu32
__m256i _mm256_min_epu32(__m256i a, __m256i b)

Compare packed unsigned 32-bit integers in a and b, and return packed minimum values.

_mm256_min_epu8
__m256i _mm256_min_epu8(__m256i a, __m256i b)

Compare packed unsigned 8-bit integers in a and b, and return packed minimum values.

_mm256_movemask_epi8
int _mm256_movemask_epi8(__m256i a)

Create mask from the most significant bit of each 8-bit element in a.
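Combined with a byte comparison, the mask gives one bit per byte, which makes simple searches straightforward. The sketch below finds the first occurrence of a byte value in a 32-byte block. `indexOfByte` is a hypothetical helper, and `_mm256_loadu_si256` and `_mm256_set1_epi8` are assumed to be provided by inteli.avxintrin.

```d
import core.bitop : bsf;
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

// Hypothetical helper: index of the first byte equal to needle in a
// 32-byte block, or -1 when there is no match.
int indexOfByte(const(ubyte)* block32, ubyte needle)
{
    __m256i data = _mm256_loadu_si256(cast(const(__m256i)*) block32);
    __m256i hits = _mm256_cmpeq_epi8(data, _mm256_set1_epi8(cast(byte) needle));
    // Bit i of the mask is set when byte i matched; cast so bit 31 is not a sign bit.
    uint mask = cast(uint) _mm256_movemask_epi8(hits);
    return mask == 0 ? -1 : bsf(mask);
}

void main()
{
    ubyte[32] buf; // zero-initialized
    buf[13] = 42;
    buf[31] = 9;

    assert(indexOfByte(buf.ptr, 42) == 13);
    assert(indexOfByte(buf.ptr, 9) == 31);
    assert(indexOfByte(buf.ptr, 7) == -1);
}
```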

_mm256_mpsadbw_epu8
__m256i _mm256_mpsadbw_epu8(__m256i a, __m256i b)

Basically 2x _mm_mpsadbw_epu8 in parallel, over the two lanes.

_mm256_mul_epi32
__m256i _mm256_mul_epi32(__m256i a, __m256i b)

Multiply the low signed 32-bit integers from each packed 64-bit element in a and b, and return the signed 64-bit results.

_mm256_mul_epu32
__m256i _mm256_mul_epu32(__m256i a, __m256i b)

Multiply the low unsigned 32-bit integers from each packed 64-bit element in a and b, and return the unsigned 64-bit results.
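The signed and unsigned widening multiplies read the same input bits differently. With every 32-bit element set to 0xFFFFFFFF, the signed version sees -1 and the unsigned version sees 4294967295. A minimal sketch, assuming `_mm256_set1_epi32` and the `long4` type come from the publicly imported modules:

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

void main()
{
    __m256i a = _mm256_set1_epi32(-1); // every 32-bit element is 0xFFFFFFFF

    long4 s = cast(long4) _mm256_mul_epi32(a, a);
    long4 u = cast(long4) _mm256_mul_epu32(a, a);

    assert(s.array[0] == 1);                                   // (-1) * (-1)
    assert(cast(ulong) u.array[0] == 0xFFFF_FFFE_0000_0001UL); // 4294967295 squared
}
```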

_mm256_mulhi_epi16
__m256i _mm256_mulhi_epi16(__m256i a, __m256i b)

Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and return the high 16 bits of the intermediate integers.

_mm256_mulhi_epu16
__m256i _mm256_mulhi_epu16(__m256i a, __m256i b)

Multiply the packed unsigned 16-bit integers in a and b, producing intermediate 32-bit integers, and return the high 16 bits of the intermediate integers.

_mm256_mulhrs_epi16
__m256i _mm256_mulhrs_epi16(__m256i a, __m256i b)

Multiply packed signed 16-bit integers in a and b, producing intermediate signed 32-bit integers. Truncate each intermediate integer to the 18 most significant bits, round by adding 1, and return bits [16:1] to dst.
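The rounding rule is compact but easy to misread, so here is a scalar model of a single element that follows the description step by step, checked against the intrinsic. `mulhrsModel` is a hypothetical name. The example values treat the inputs as Q15 fixed point (0x4000 = 0.5), which is one common use of this operation rather than something the description requires.

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

// Hypothetical scalar model of one 16-bit element.
short mulhrsModel(short a, short b)
{
    int product = a * b;                   // signed 32-bit intermediate
    int top18   = product >> 14;           // keep the 18 most significant bits
    return cast(short) ((top18 + 1) >> 1); // round by adding 1, take bits [16:1]
}

void main()
{
    // In Q15, 0.5 * 0.5 = 0.25, that is 0x4000 * 0x4000 -> 0x2000.
    assert(mulhrsModel(0x4000, 0x4000) == 0x2000);

    short16 r = cast(short16) _mm256_mulhrs_epi16(_mm256_set1_epi16(0x4000),
                                                  _mm256_set1_epi16(0x4000));
    assert(r.array[0] == 0x2000);
}
```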

_mm256_mullo_epi16
__m256i _mm256_mullo_epi16(__m256i a, __m256i b)

Multiply the packed signed 16-bit integers in a and b, producing intermediate 32-bit integers, and return the low 16 bits of the intermediate integers.

_mm256_mullo_epi32
__m256i _mm256_mullo_epi32(__m256i a, __m256i b)

Multiply the packed signed 32-bit integers in a and b, producing intermediate 64-bit integers, and return the low 32 bits of the intermediate integers.

_mm256_or_si256
__m256i _mm256_or_si256(__m256i a, __m256i b)

Compute the bitwise OR of 256 bits (representing integer data) in a and b.

_mm256_packs_epi16
__m256i _mm256_packs_epi16(__m256i a, __m256i b)

Convert packed signed 16-bit integers from a and b to packed 8-bit integers using signed saturation. Warning: a and b are interleaved per-lane. Result has: a lane 0, b lane 0, a lane 1, b lane 1.
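The per-lane interleaving in the warning is worth seeing once with concrete values. The sketch below packs a vector of 1s with a vector of 300s (which saturate to 127), so the origin of each output byte is visible. `_mm256_set1_epi16` and `byte32` are assumed to come from the publicly imported modules.

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

void main()
{
    __m256i a = _mm256_set1_epi16(1);
    __m256i b = _mm256_set1_epi16(300); // above byte.max, saturates to 127

    byte32 r = cast(byte32) _mm256_packs_epi16(a, b);

    byte[32] expected = [
        1, 1, 1, 1, 1, 1, 1, 1,                 // a, lane 0
        127, 127, 127, 127, 127, 127, 127, 127, // b, lane 0
        1, 1, 1, 1, 1, 1, 1, 1,                 // a, lane 1
        127, 127, 127, 127, 127, 127, 127, 127, // b, lane 1
    ];
    assert(r.array == expected);
}
```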

_mm256_packs_epi32
__m256i _mm256_packs_epi32(__m256i a, __m256i b)

Convert packed signed 32-bit integers from a and b to packed 16-bit integers using signed saturation. Warning: a and b are interleaved per-lane. Result has: a lane 0, b lane 0, a lane 1, b lane 1.

_mm256_packus_epi16
__m256i _mm256_packus_epi16(__m256i a, __m256i b)

Convert packed signed 16-bit integers from a and b to packed 8-bit integers using unsigned saturation. Warning: a and b are interleaved per-lane. Result has: a lane 0, b lane 0, a lane 1, b lane 1.

_mm256_packus_epi32
__m256i _mm256_packus_epi32(__m256i a, __m256i b)

Convert packed signed 32-bit integers from a and b to packed 16-bit integers using unsigned saturation. Warning: a and b are interleaved per-lane. Result has: a lane 0, b lane 0, a lane 1, b lane 1.

_mm256_permute2x128_si256
__m256i _mm256_permute2x128_si256(__m256i a, __m256i b)

Shuffle 128-bits (composed of 2 packed (128-bit) integer elements) selected by imm8 from a and b. See the documentation as the imm8 format is quite complex.

_mm256_permute4x64_epi64
__m256i _mm256_permute4x64_epi64(__m256i a)

Shuffle 64-bit integers in a across lanes using the control in imm8.

_mm256_permute4x64_pd
__m256d _mm256_permute4x64_pd(__m256d a)

Shuffle double-precision (64-bit) floating-point elements in a across lanes using the control in imm8.

_mm256_permutevar8x32_epi32
__m256i _mm256_permutevar8x32_epi32(__m256i a, __m256i idx)

Shuffle 32-bit integers in a across lanes using the corresponding index in idx.

_mm256_permutevar8x32_ps
__m256 _mm256_permutevar8x32_ps(__m256 a, __m256i idx)

Shuffle single-precision (32-bit) floating-point elements in a across lanes using the corresponding index in idx.

_mm256_sad_epu8
__m256i _mm256_sad_epu8(__m256i a, __m256i b)

Compute the absolute differences of packed unsigned 8-bit integers in a and b, then horizontally sum each consecutive 8 differences to produce four unsigned 16-bit integers, and pack these unsigned 16-bit integers in the low 16 bits of 64-bit elements in result.
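A quick numeric check of the sum of absolute differences: with every byte of a equal to 10 and every byte of b equal to 3, each group of 8 bytes contributes 8 * 7 = 56. A minimal sketch, assuming `_mm256_set1_epi8` and `long4` come from the publicly imported modules:

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

void main()
{
    __m256i a = _mm256_set1_epi8(10);
    __m256i b = _mm256_set1_epi8(3);

    long4 r = cast(long4) _mm256_sad_epu8(a, b);

    // One sum per 64-bit element: |10 - 3| added over 8 bytes.
    long[4] expected = [56, 56, 56, 56];
    assert(r.array == expected);
}
```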

_mm256_shuffle_epi32
__m256i _mm256_shuffle_epi32(__m256i a)

Shuffle 32-bit integers in a within 128-bit lanes using the control in imm8, and return the results.

_mm256_shuffle_epi8
__m256i _mm256_shuffle_epi8(__m256i a, __m256i b)

Shuffle 8-bit integers in a within 128-bit lanes according to shuffle control mask in the corresponding 8-bit element of b.

_mm256_shufflehi_epi16
__m256i _mm256_shufflehi_epi16(__m256i a)

Shuffle 16-bit integers in the high 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the high 64 bits of 128-bit lanes of result, with the low 64 bits of 128-bit lanes being copied from a. See also: _MM_SHUFFLE.

_mm256_shufflelo_epi16
__m256i _mm256_shufflelo_epi16(__m256i a)

Shuffle 16-bit integers in the low 64 bits of 128-bit lanes of a using the control in imm8. Store the results in the low 64 bits of 128-bit lanes of result, with the high 64 bits of 128-bit lanes being copied from a. See also: _MM_SHUFFLE.

_mm256_sign_epi16
__m256i _mm256_sign_epi16(__m256i a, __m256i b)

Negate packed signed 16-bit integers in a when the corresponding signed 16-bit integer in b is negative. Elements in result are zeroed out when the corresponding element in b is zero.

_mm256_sign_epi32
__m256i _mm256_sign_epi32(__m256i a, __m256i b)

Negate packed signed 32-bit integers in a when the corresponding signed 32-bit integer in b is negative. Elements in result are zeroed out when the corresponding element in b is zero.

_mm256_sign_epi8
__m256i _mm256_sign_epi8(__m256i a, __m256i b)

Negate packed signed 8-bit integers in a when the corresponding signed 8-bit integer in b is negative. Elements in result are zeroed out when the corresponding element in b is zero.

_mm256_sll_epi16
__m256i _mm256_sll_epi16(__m256i a, __m128i count)

Shift packed 16-bit integers in a left by count while shifting in zeroes. Bit-shift is a single value in the low-order 64-bit of count. If bit-shift > 15, result is defined to be all zeroes. Note: prefer _mm256_slli_epi16, less of a trap.

_mm256_sll_epi32
__m256i _mm256_sll_epi32(__m256i a, __m128i count)

Shift packed 32-bit integers in a left by count while shifting in zeroes. Bit-shift is a single value in the low-order 64-bit of count. If bit-shift > 31, result is defined to be all zeroes. Note: prefer _mm256_slli_epi32, less of a trap.

_mm256_sll_epi64
__m256i _mm256_sll_epi64(__m256i a, __m128i count)

Shift packed 64-bit integers in a left by count while shifting in zeroes. Bit-shift is a single value in the low-order 64-bit of count. If bit-shift > 63, result is defined to be all zeroes. Note: prefer _mm256_slli_epi64, less of a trap.
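The "trap" mentioned in these notes comes from how count is read: the shift amount is the whole low 64 bits of count, not its low 32-bit element. The sketch below contrasts a correctly built count with a broadcast one, using _mm256_sll_epi16. It assumes `_mm_setr_epi32` and `_mm_set1_epi32` from inteli.emmintrin.

```d
import inteli.emmintrin;  // assumed home of _mm_setr_epi32 and _mm_set1_epi32
import inteli.avx2intrin;

void main()
{
    __m256i a = _mm256_set1_epi16(1);

    __m128i good = _mm_setr_epi32(3, 0, 0, 0); // low 64 bits hold the value 3
    __m128i bad  = _mm_set1_epi32(3);          // low 64 bits hold 0x0000_0003_0000_0003

    short16 r1 = cast(short16) _mm256_sll_epi16(a, good);
    short16 r2 = cast(short16) _mm256_sll_epi16(a, bad);
    short16 r3 = cast(short16) _mm256_slli_epi16(a, 3);

    assert(r1.array[0] == 8); // 1 << 3
    assert(r2.array[0] == 0); // the 64-bit count is far above 15, so all zeroes
    assert(r3.array[0] == 8); // the immediate form has no such pitfall
}
```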

_mm256_slli_epi16
__m256i _mm256_slli_epi16(__m256i a, int imm8)

Shift packed 16-bit integers in a left by imm8 while shifting in zeros.

_mm256_slli_epi32
__m256i _mm256_slli_epi32(__m256i a, int imm8)

Shift packed 32-bit integers in a left by imm8 while shifting in zeros.

_mm256_slli_epi64
__m256i _mm256_slli_epi64(__m256i a, int imm8)

Shift packed 64-bit integers in a left by imm8 while shifting in zeros.

_mm256_sllv_epi32
__m256i _mm256_sllv_epi32(__m256i a, __m256i count)

Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeroes.

_mm256_sllv_epi64
__m256i _mm256_sllv_epi64(__m256i a, __m256i count)

Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeroes.
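Unlike the sll and slli forms, the sllv forms take an independent shift amount for every element. A minimal sketch with in-range counts only, assuming `_mm256_set1_epi32`, `_mm256_setr_epi32` and `int8` come from the publicly imported modules:

```d
import inteli.avx2intrin; // assumed to re-export inteli.avxintrin and inteli.types

void main()
{
    __m256i a     = _mm256_set1_epi32(1);
    __m256i count = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);

    int8 r = cast(int8) _mm256_sllv_epi32(a, count);

    // Element i is shifted by count[i].
    int[8] expected = [1, 2, 4, 8, 16, 32, 64, 128];
    assert(r.array == expected);
}
```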

_mm256_sra_epi16
__m256i _mm256_sra_epi16(__m256i a, __m128i count)

Shift packed 16-bit integers in a right by count while shifting in sign bits. Bit-shift is a single value in the low-order 64-bit of count. If bit-shift > 15, result is defined to be all sign bits. Warning: prefer _mm256_srai_epi16, less of a trap.

_mm256_sra_epi32
__m256i _mm256_sra_epi32(__m256i a, __m128i count)

Shift packed 32-bit integers in a right by count while shifting in sign bits. Bit-shift is a single value in the low-order 64-bit of count. If bit-shift > 31, result is defined to be all sign bits. Warning: prefer _mm256_srai_epi32, less of a trap.

_mm256_srai_epi16
__m256i _mm256_srai_epi16(__m256i a, int imm8)

Shift packed 16-bit integers in a right by imm8 while shifting in sign bits.

_mm256_srai_epi32
__m256i _mm256_srai_epi32(__m256i a, int imm8)

Shift packed 32-bit integers in a right by imm8 while shifting in sign bits.

_mm256_srav_epi32
__m256i _mm256_srav_epi32(__m256i a, __m256i count)

Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in sign bits.

_mm256_srl_epi16
__m256i _mm256_srl_epi16(__m256i a, __m128i count)

Shift packed 16-bit integers in a right by count while shifting in zeroes. Bit-shift is a single value in the low-order 64-bit of count. If bit-shift > 15, result is defined to be all zeroes. Note: prefer _mm256_srli_epi16, less of a trap.

_mm256_srl_epi32
__m256i _mm256_srl_epi32(__m256i a, __m128i count)

Shift packed 32-bit integers in a right by count while shifting in zeroes. Bit-shift is a single value in the low-order 64-bit of count. If bit-shift > 31, result is defined to be all zeroes. Note: prefer _mm256_srli_epi32, less of a trap.

_mm256_srl_epi64
__m256i _mm256_srl_epi64(__m256i a, __m128i count)

Shift packed 64-bit integers in a right by count while shifting in zeroes. Bit-shift is a single value in the low-order 64-bit of count. If bit-shift > 63, result is defined to be all zeroes. Note: prefer _mm256_srli_epi64, less of a trap.

_mm256_srli_epi16
__m256i _mm256_srli_epi16(__m256i a, int imm8)

Shift packed 16-bit integers in a right by imm8 while shifting in zeros.

_mm256_srli_epi32
__m256i _mm256_srli_epi32(__m256i a, int imm8)

Shift packed 32-bit integers in a right by imm8 while shifting in zeros.

_mm256_srli_epi64
__m256i _mm256_srli_epi64(__m256i a, int imm8)

Shift packed 64-bit integers in a right by imm8 while shifting in zeros.

_mm256_srlv_epi32
__m256i _mm256_srlv_epi32(__m256i a, __m256i count)

Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeroes.

_mm256_srlv_epi64
__m256i _mm256_srlv_epi64(__m256i a, __m256i count)

Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeroes.

_mm256_stream_load_si256
__m256i _mm256_stream_load_si256(const(__m256i)* mem_addr)

Load 256-bits of integer data from memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_sub_epi16
__m256i _mm256_sub_epi16(__m256i a, __m256i b)

Subtract packed 16-bit integers in b from packed 16-bit integers in a.

_mm256_sub_epi32
__m256i _mm256_sub_epi32(__m256i a, __m256i b)

Subtract packed 32-bit integers in b from packed 32-bit integers in a.

_mm256_sub_epi64
__m256i _mm256_sub_epi64(__m256i a, __m256i b)

Subtract packed 64-bit integers in b from packed 64-bit integers in a.

_mm256_sub_epi8
__m256i _mm256_sub_epi8(__m256i a, __m256i b)

Subtract packed 8-bit integers in b from packed 8-bit integers in a.

_mm256_subs_epi16
__m256i _mm256_subs_epi16(__m256i a, __m256i b)

Subtract packed signed 16-bit integers in b from packed signed 16-bit integers in a using signed saturation.

_mm256_subs_epi8
__m256i _mm256_subs_epi8(__m256i a, __m256i b)

Subtract packed signed 8-bit integers in b from packed signed 8-bit integers in a using signed saturation.

_mm256_subs_epu16
__m256i _mm256_subs_epu16(__m256i a, __m256i b)

Subtract packed unsigned 16-bit integers in b from packed unsigned 16-bit integers in a using saturation.

_mm256_subs_epu8
__m256i _mm256_subs_epu8(__m256i a, __m256i b)

Subtract packed unsigned 8-bit integers in b from packed unsigned 8-bit integers in a using saturation.

_mm256_unpackhi_epi16
__m256i _mm256_unpackhi_epi16(__m256i a, __m256i b)

Unpack and interleave 16-bit integers from the high half of each 128-bit lane in a and b.

_mm256_unpackhi_epi32
__m256i _mm256_unpackhi_epi32(__m256i a, __m256i b)

Unpack and interleave 32-bit integers from the high half of each 128-bit lane in a and b.

_mm256_unpackhi_epi64
__m256i _mm256_unpackhi_epi64(__m256i a, __m256i b)

Unpack and interleave 64-bit integers from the high half of each 128-bit lane in a and b.

_mm256_unpackhi_epi8
__m256i _mm256_unpackhi_epi8(__m256i a, __m256i b)

Unpack and interleave 8-bit integers from the high half of each 128-bit lane in a and b.

_mm256_unpacklo_epi16
__m256i _mm256_unpacklo_epi16(__m256i a, __m256i b)

Unpack and interleave 16-bit integers from the low half of each 128-bit lane in a and b.

_mm256_unpacklo_epi32
__m256i _mm256_unpacklo_epi32(__m256i a, __m256i b)

Unpack and interleave 32-bit integers from the low half of each 128-bit lane in a and b.

_mm256_unpacklo_epi64
__m256i _mm256_unpacklo_epi64(__m256i a, __m256i b)

Unpack and interleave 64-bit integers from the low half of each 128-bit lane in a and b.

_mm256_unpacklo_epi8
__m256i _mm256_unpacklo_epi8(__m256i a, __m256i b)

Unpack and interleave 8-bit integers from the low half of each 128-bit lane in a and b.
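
Because the unpack operations work on each 128-bit lane separately, the result is not a plain interleave of the whole 256-bit vector. The following sketch illustrates this for 32-bit elements; interleaveLow32 is a hypothetical helper name, and the code assumes _mm256_loadu_si256 and the int8 vector type are reachable through this module's public imports.

```d
import inteli.avx2intrin;

/// Hypothetical helper (not part of inteli): interleaves the low halves
/// of each 128-bit lane of a and b, viewed as eight 32-bit integers.
int[8] interleaveLow32(int[8] a, int[8] b)
{
    __m256i va = _mm256_loadu_si256(cast(const(__m256i)*) a.ptr);
    __m256i vb = _mm256_loadu_si256(cast(const(__m256i)*) b.ptr);
    // Lane 0 yields a[0], b[0], a[1], b[1]; lane 1 yields a[4], b[4], a[5], b[5].
    int8 r = cast(int8) _mm256_unpacklo_epi32(va, vb);
    return r.array;
}
```

For a = [0, 1, 2, 3, 4, 5, 6, 7] and b = [8, 9, 10, 11, 12, 13, 14, 15] this returns [0, 8, 1, 9, 4, 12, 5, 13]; elements 2, 3, 6 and 7 of each input are the ones the unpackhi variants would pick up.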

_mm256_xor_si256
__m256i _mm256_xor_si256(__m256i a, __m256i b)

Compute the bitwise XOR of 256 bits (representing integer data) in a and b.

_mm_blend_epi32
__m128i _mm_blend_epi32(__m128i a, __m128i b)

Blend packed 32-bit integers from a and b using 4-bit control mask imm8.
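
The listed signature has no runtime imm8 parameter. The sketch below assumes imm8 is passed as a compile-time template argument, as this library does for other immediate operands; check the source before relying on it. blendEvenFromB is a hypothetical helper name.

```d
import inteli.emmintrin;  // _mm_setr_epi32
import inteli.avx2intrin;

/// Hypothetical helper (not part of inteli): takes elements 0 and 2 from b
/// and elements 1 and 3 from a.
int[4] blendEvenFromB(int[4] a, int[4] b)
{
    __m128i va = _mm_setr_epi32(a[0], a[1], a[2], a[3]);
    __m128i vb = _mm_setr_epi32(b[0], b[1], b[2], b[3]);
    // Assumption: imm8 is a template argument. Bit i set selects b[i], clear selects a[i].
    __m128i r = _mm_blend_epi32!0b0101(va, vb);
    return r.array;
}
```

With a = [1, 2, 3, 4] and b = [10, 20, 30, 40] the result is [10, 2, 30, 4].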

_mm_broadcastb_epi8
__m128i _mm_broadcastb_epi8(__m128i a)

Broadcast the low packed 8-bit integer from a to all elements of result.

_mm_broadcastd_epi32
__m128i _mm_broadcastd_epi32(__m128i a)

Broadcast the low packed 32-bit integer from a to all elements of result.

_mm_broadcastq_epi64
__m128i _mm_broadcastq_epi64(__m128i a)

Broadcast the low packed 64-bit integer from a to all elements of result.

_mm_broadcastsd_pd
__m128d _mm_broadcastsd_pd(__m128d a)

Broadcast the low double-precision (64-bit) floating-point element from a to all elements of result.

_mm_broadcastsi128_si256
__m256i _mm_broadcastsi128_si256(__m128i a)

Broadcast 128 bits of integer data from a to all 128-bit lanes in result. Note: also exists under the name _mm256_broadcastsi128_si256, which is identical.

_mm_broadcastss_ps
__m128 _mm_broadcastss_ps(__m128 a)

Broadcast the low single-precision (32-bit) floating-point element from a to all elements of result.

_mm_broadcastw_epi16
__m128i _mm_broadcastw_epi16(__m128i a)

Broadcast the low packed 16-bit integer from a to all elements of result.

_mm_i32gather_epi32
__m128i _mm_i32gather_epi32(const(int)* base_addr, __m128i vindex)

Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm_i32gather_epi64
__m128i _mm_i32gather_epi64(const(long)* base_addr, __m128i vindex)

Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm_i32gather_pd
__m128d _mm_i32gather_pd(const(double)* base_addr, __m128i vindex)

Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm_i32gather_ps
__m128 _mm_i32gather_ps(const(float)* base_addr, __m128i vindex)

Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm_i64gather_epi32
__m128i _mm_i64gather_epi32(const(int)* base_addr, __m128i vindex)

Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm_i64gather_epi64
__m128i _mm_i64gather_epi64(const(long)* base_addr, __m128i vindex)

Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm_i64gather_pd
__m128d _mm_i64gather_pd(const(double)* base_addr, __m128i vindex)

Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.

_mm_i64gather_ps
__m128 _mm_i64gather_ps(const(float)* base_addr, __m128i vindex)

Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are returned. scale should be 1, 2, 4 or 8.
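
The gather descriptions above mention scale, which is absent from the listed runtime parameters. The sketch below assumes it is supplied as a compile-time template argument; that is an assumption to verify against the source. gatherFour is a hypothetical helper name.

```d
import inteli.emmintrin;  // _mm_setr_epi32
import inteli.avx2intrin;

/// Hypothetical helper (not part of inteli): reads table[i0], table[i1],
/// table[i2] and table[i3] with a single gather. Every index must be in bounds.
int[4] gatherFour(const(int)[] table, int i0, int i1, int i2, int i3)
{
    __m128i idx = _mm_setr_epi32(i0, i1, i2, i3);
    // Assumption: scale is a template argument. It is 4 here because the
    // indices count int elements, and each int occupies 4 bytes.
    __m128i r = _mm_i32gather_epi32!4(table.ptr, idx);
    return r.array;
}
```

For table = [10, 20, 30, 40, 50, 60] and indices 5, 0, 3, 3 this returns [60, 10, 40, 40]. A scale of 1 would instead treat the indices as byte offsets.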

_mm_mask_i32gather_epi32
__m128i _mm_mask_i32gather_epi32(__m128i src, const(int)* base_addr, __m128i vindex, __m128i mask)

Gather 32-bit integers from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm_mask_i32gather_epi64
__m128i _mm_mask_i32gather_epi64(__m128i src, const(long)* base_addr, __m128i vindex, __m128i mask)

Gather 64-bit integers from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm_mask_i32gather_pd
__m128d _mm_mask_i32gather_pd(__m128d src, const(double)* base_addr, __m128i vindex, __m128d mask)

Gather double-precision (64-bit) floating-point elements from memory using 32-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm_mask_i32gather_ps
__m128 _mm_mask_i32gather_ps(__m128 src, const(float)* base_addr, __m128i vindex, __m128 mask)

Gather single-precision (32-bit) floating-point elements from memory using 32-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 32-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm_mask_i64gather_epi32
__m128i _mm_mask_i64gather_epi32(__m128i src, const(int)* base_addr, __m128i vindex, __m128i mask)

Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm_mask_i64gather_epi64
__m128i _mm_mask_i64gather_epi64(__m128i src, const(long)* base_addr, __m128i vindex, __m128i mask)

Gather 64-bit integers from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm_mask_i64gather_pd
__m128d _mm_mask_i64gather_pd(__m128d src, const(double)* base_addr, __m128i vindex, __m128d mask)

Gather double-precision (64-bit) floating-point elements from memory using 64-bit indices. 64-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.

_mm_mask_i64gather_ps
__m128 _mm_mask_i64gather_ps(__m128 src, const(float)* base_addr, __m128i vindex, __m128 mask)

Gather single-precision (32-bit) floating-point elements from memory using 64-bit indices. 32-bit elements are loaded from addresses starting at base_addr and offset by each 64-bit element in vindex (each index is scaled by the factor in scale). Gathered elements are merged using mask (elements are copied from src when the highest bit is not set in the corresponding element). scale should be 1, 2, 4 or 8.
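
The masked variants merge gathered elements with src one element at a time. A minimal sketch follows, again assuming scale is a compile-time template argument; gatherWhere is a hypothetical helper name. All indices are kept in bounds here, including those of masked-out elements, as a conservative choice.

```d
import inteli.emmintrin;  // _mm_setr_epi32, _mm_set1_epi32
import inteli.avx2intrin;

/// Hypothetical helper (not part of inteli): gathers table[indices[i]] where
/// wanted[i] is true, and yields fallback for the other elements.
int[4] gatherWhere(const(int)[] table, int[4] indices, bool[4] wanted, int fallback)
{
    __m128i src = _mm_set1_epi32(fallback);
    __m128i idx = _mm_setr_epi32(indices[0], indices[1], indices[2], indices[3]);
    // Only the highest bit of each mask element matters; -1 sets every bit.
    __m128i mask = _mm_setr_epi32(wanted[0] ? -1 : 0, wanted[1] ? -1 : 0,
                                  wanted[2] ? -1 : 0, wanted[3] ? -1 : 0);
    // Assumption: scale (4 bytes per int) is passed as a template argument.
    __m128i r = _mm_mask_i32gather_epi32!4(src, table.ptr, idx, mask);
    return r.array;
}
```

For table = [10, 20, 30, 40], indices [3, 2, 1, 0], wanted [true, false, true, false] and fallback -1, the result is [40, -1, 20, -1].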

_mm_maskload_epi32
__m128i _mm_maskload_epi32(const(int)* mem_addr, __m128i mask)

Load packed 32-bit integers from memory using mask (elements are zeroed out when the highest bit is not set in the corresponding element). Warning: See "Note about mask load/store" to know why you must address valid memory only.

_mm_maskload_epi64
__m128i _mm_maskload_epi64(const(long)* mem_addr, __m128i mask)

Load packed 64-bit integers from memory using mask (elements are zeroed out when the highest bit is not set in the corresponding element). Warning: See "Note about mask load/store" to know why you must address valid memory only.
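
As an illustration of the mask convention used by the masked loads, here is a small sketch. loadSelected is a hypothetical helper name. In line with the warning above, it always points at a full, valid four-element array.

```d
import inteli.emmintrin;  // _mm_setr_epi32
import inteli.avx2intrin;

/// Hypothetical helper (not part of inteli): loads data[i] where keep[i] is true
/// and produces 0 for the other elements.
int[4] loadSelected(ref const(int[4]) data, bool[4] keep)
{
    // An element is loaded only when the highest bit of its mask element is set.
    __m128i mask = _mm_setr_epi32(keep[0] ? -1 : 0, keep[1] ? -1 : 0,
                                  keep[2] ? -1 : 0, keep[3] ? -1 : 0);
    __m128i r = _mm_maskload_epi32(data.ptr, mask);
    return r.array;
}
```

For data = [10, 20, 30, 40] and keep = [true, false, true, false] the result is [10, 0, 30, 0].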

_mm_sllv_epi32
__m128i _mm_sllv_epi32(__m128i a, __m128i count)

Shift packed 32-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeroes.

_mm_sllv_epi64
__m128i _mm_sllv_epi64(__m128i a, __m128i count)

Shift packed 64-bit integers in a left by the amount specified by the corresponding element in count while shifting in zeroes.

_mm_srav_epi32
__m128i _mm_srav_epi32(__m128i a, __m128i count)
Undocumented in source. Be warned that the author may not have intended to support it.
_mm_srlv_epi32
__m128i _mm_srlv_epi32(__m128i a, __m128i count)

Shift packed 32-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeroes.

_mm_srlv_epi64
__m128i _mm_srlv_epi64(__m128i a, __m128i count)

Shift packed 64-bit integers in a right by the amount specified by the corresponding element in count while shifting in zeroes.
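
Unlike the shifts that apply one count to the whole vector, the variable shifts above use a separate count for each element. A brief sketch of the 32-bit left shift, with the hypothetical helper name shiftLeftPerLane:

```d
import inteli.emmintrin;  // _mm_loadu_si128
import inteli.avx2intrin;

/// Hypothetical helper (not part of inteli): shifts values[i] left by counts[i]
/// bits, shifting in zeroes.
int[4] shiftLeftPerLane(int[4] values, int[4] counts)
{
    __m128i v = _mm_loadu_si128(cast(const(__m128i)*) values.ptr);
    __m128i c = _mm_loadu_si128(cast(const(__m128i)*) counts.ptr);
    __m128i r = _mm_sllv_epi32(v, c); // each element uses its own count
    return r.array;
}
```

For values = [1, 1, 1, 1] and counts = [0, 1, 4, 8] the result is [1, 2, 16, 256].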

Meta