inteli.avxintrin

Public Imports

inteli.types
public import inteli.types;
Undocumented in source.
inteli.tmmintrin
public import inteli.tmmintrin;
Undocumented in source.

Members

Functions

_mm256_add_pd
__m256d _mm256_add_pd(__m256d a, __m256d b)

Add packed double-precision (64-bit) floating-point elements in a and b.

_mm256_add_ps
__m256 _mm256_add_ps(__m256 a, __m256 b)

Add packed single-precision (32-bit) floating-point elements in a and b.

_mm256_addsub_pd
__m256d _mm256_addsub_pd(__m256d a, __m256d b)

Alternately subtract and add packed double-precision (64-bit) floating-point elements in b from/to packed elements in a (even-indexed elements are subtracted, odd-indexed elements are added).

_mm256_addsub_ps
__m256 _mm256_addsub_ps(__m256 a, __m256 b)

Alternately subtract and add packed single-precision (32-bit) floating-point elements in b from/to packed elements in a (even-indexed elements are subtracted, odd-indexed elements are added).
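
The alternating pattern is easiest to see on concrete values. A minimal sketch, assuming the inteli package is available to the compiler; the helper name addsubExample is hypothetical and only serves the illustration.

```d
import inteli.avxintrin;

// Hypothetical helper: applies _mm256_addsub_pd to fixed inputs.
double[4] addsubExample()
{
    __m256d a = _mm256_setr_pd(1.0, 2.0, 3.0, 4.0);
    __m256d b = _mm256_setr_pd(10.0, 20.0, 30.0, 40.0);

    // Even-indexed elements: a - b. Odd-indexed elements: a + b.
    __m256d r = _mm256_addsub_pd(a, b);
    return r.array; // [-9, 22, -27, 44]
}
```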

_mm256_and_pd
__m256d _mm256_and_pd(__m256d a, __m256d b)

Compute the bitwise AND of packed double-precision (64-bit) floating-point elements in a and b.

_mm256_and_ps
__m256 _mm256_and_ps(__m256 a, __m256 b)

Compute the bitwise AND of packed single-precision (32-bit) floating-point elements in a and b.

_mm256_andnot_pd
__m256d _mm256_andnot_pd(__m256d a, __m256d b)

Compute the bitwise NOT of packed double-precision (64-bit) floating-point elements in a and then AND with b.

_mm256_andnot_ps
__m256 _mm256_andnot_ps(__m256 a, __m256 b)

Compute the bitwise NOT of packed single-precision (32-bit) floating-point elements in a and then AND with b.
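
Note that the NOT applies to the first operand only. One common use, sketched here under the assumption that the inteli package is available, is clearing the sign bit to get an absolute value; the helper name absPd is hypothetical.

```d
import inteli.avxintrin;

// Hypothetical helper: absolute value of each element.
__m256d absPd(__m256d x)
{
    // -0.0 has only the sign bit set in each 64-bit element.
    __m256d signMask = _mm256_set1_pd(-0.0);

    // (~signMask) & x keeps every bit of x except the sign bit.
    return _mm256_andnot_pd(signMask, x);
}
```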

_mm256_blend_pd
__m256d _mm256_blend_pd(__m256d a, __m256d b)

Blend packed double-precision (64-bit) floating-point elements from a and b using control mask imm8.

_mm256_blend_ps
__m256 _mm256_blend_ps(__m256 a, __m256 b)

Blend packed single-precision (32-bit) floating-point elements from a and b using control mask imm8.

_mm256_blendv_pd
__m256d _mm256_blendv_pd(__m256d a, __m256d b, __m256d mask)

Blend packed double-precision (64-bit) floating-point elements from a and b using mask.

_mm256_blendv_ps
__m256 _mm256_blendv_ps(__m256 a, __m256 b, __m256 mask)

Blend packed single-precision (32-bit) floating-point elements from a and b using mask.
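
A short sketch of both blend flavours may help. It assumes that imm8 is passed as a compile-time template argument (the listed signatures show no runtime imm8 parameter, which suggests this but does not confirm it), and that the variable form selects b wherever the sign bit of the mask element is set, as the underlying AVX instruction does. The helper names are hypothetical.

```d
import inteli.avxintrin;

// Hypothetical helper: immediate blend, bit i of imm8 selects b for element i.
double[4] blendImmExample()
{
    __m256d a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
    __m256d b = _mm256_setr_pd(8.0, 9.0, 10.0, 11.0);
    // 0x06 = 0b0110: elements 1 and 2 come from b. Template syntax is an assumption.
    return _mm256_blend_pd!0x06(a, b).array; // [0, 9, 10, 3]
}

// Hypothetical helper: variable blend driven by the sign bit of each mask element.
double[4] blendMaskExample()
{
    __m256d a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
    __m256d b = _mm256_setr_pd(8.0, 9.0, 10.0, 11.0);
    __m256d mask = _mm256_setr_pd(1.0, -1.0, -1.0, 1.0);
    return _mm256_blendv_pd(a, b, mask).array; // [0, 9, 10, 3]
}
```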

_mm256_broadcast_pd
__m256d _mm256_broadcast_pd(const(__m128d)* mem_addr)

Broadcast 128 bits from memory (composed of 2 packed double-precision (64-bit) floating-point elements) to all elements. This effectively duplicates the 128-bit vector.

_mm256_broadcast_ps
__m256 _mm256_broadcast_ps(const(__m128)* mem_addr)

Broadcast 128 bits from memory (composed of 4 packed single-precision (32-bit) floating-point elements) to all elements. This effectively duplicates the 128-bit vector.
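
To illustrate the duplication, here is a minimal sketch assuming the inteli package is available; broadcastExample is a hypothetical helper name.

```d
import inteli.avxintrin;

// Hypothetical helper: duplicates a 128-bit pair into both halves.
double[4] broadcastExample()
{
    __m128d pair = _mm_setr_pd(1.0, 2.0);

    // The 128-bit value is read through a pointer and copied to both lanes.
    __m256d r = _mm256_broadcast_pd(&pair);
    return r.array; // [1, 2, 1, 2]
}
```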

_mm256_broadcast_sd
__m256d _mm256_broadcast_sd(const(double)* mem_addr)

Broadcast a double-precision (64-bit) floating-point element from memory to all elements.

_mm256_broadcast_ss
__m256 _mm256_broadcast_ss(const(float)* mem_addr)

Broadcast a single-precision (32-bit) floating-point element from memory to all elements.

_mm256_castpd128_pd256
__m256d _mm256_castpd128_pd256(__m128d a)

Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are undefined.

_mm256_castpd256_pd128
__m128d _mm256_castpd256_pd128(__m256d a)

Cast vector of type __m256d to type __m128d; the upper 128 bits of a are lost.

_mm256_castpd_ps
__m256 _mm256_castpd_ps(__m256d a)

Cast vector of type __m256d to type __m256.

_mm256_castpd_si256
__m256i _mm256_castpd_si256(__m256d a)

Cast vector of type __m256d to type __m256i.

_mm256_castps128_ps256
__m256 _mm256_castps128_ps256(__m128 a)

Cast vector of type __m128 to type __m256; the upper 128 bits of the result are undefined.

_mm256_castps256_ps128
__m128 _mm256_castps256_ps128(__m256 a)

Cast vector of type __m256 to type __m128; the upper 128 bits of a are lost.

_mm256_castps_pd
__m256d _mm256_castps_pd(__m256 a)

Cast vector of type __m256 to type __m256d.

_mm256_castps_si256
__m256i _mm256_castps_si256(__m256 a)

Cast vector of type __m256 to type __m256i.

_mm256_castsi128_si256
__m256i _mm256_castsi128_si256(__m128i a)

Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are undefined.

_mm256_castsi256_pd
__m256d _mm256_castsi256_pd(__m256i a)

Cast vector of type __m256i to type __m256d.

_mm256_castsi256_ps
__m256 _mm256_castsi256_ps(__m256i a)

Cast vector of type __m256i to type __m256.

_mm256_castsi256_si128
__m128i _mm256_castsi256_si128(__m256i a)

Cast vector of type __m256i to type __m128i; the upper 128 bits of a are lost.

_mm256_cvtepi32_pd
__m256d _mm256_cvtepi32_pd(__m128i a)

Convert packed signed 32-bit integers in a to packed double-precision (64-bit) floating-point elements.

_mm256_cvtepi32_ps
__m256 _mm256_cvtepi32_ps(__m256i a)

Convert packed signed 32-bit integers in a to packed single-precision (32-bit) floating-point elements.

_mm256_cvtpd_ps
__m128 _mm256_cvtpd_ps(__m256d a)

Convert packed double-precision (64-bit) floating-point elements in a to packed single-precision (32-bit) floating-point elements.

_mm256_cvtps_pd
__m256d _mm256_cvtps_pd(__m128 a)

Convert packed single-precision (32-bit) floating-point elements in a to packed double-precision (64-bit) floating-point elements.

_mm256_cvtsd_f64
double _mm256_cvtsd_f64(__m256d a)

Return the lower double-precision (64-bit) floating-point element of a.

_mm256_cvtsi256_si32
int _mm256_cvtsi256_si32(__m256i a)

Return the lower 32-bit integer in a.

_mm256_cvtss_f32
float _mm256_cvtss_f32(__m256 a)

Return the lower single-precision (32-bit) floating-point element of a.

_mm256_cvttpd_epi32
__m128i _mm256_cvttpd_epi32(__m256d a)

Convert packed double-precision (64-bit) floating-point elements in a to packed 32-bit integers with truncation.

_mm256_cvttps_epi32
__m256i _mm256_cvttps_epi32(__m256 a)

Convert packed single-precision (32-bit) floating-point elements in a to packed 32-bit integers with truncation.
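
"With truncation" means rounding toward zero, regardless of the current rounding mode. A minimal sketch, assuming the inteli package is available; truncateExample is a hypothetical helper name.

```d
import inteli.avxintrin;

// Hypothetical helper: truncating conversion of four doubles to four ints.
int[4] truncateExample()
{
    __m256d v = _mm256_setr_pd(1.9, -1.9, 2.5, -2.5);

    // The fractional part is discarded: values move toward zero.
    __m128i r = _mm256_cvttpd_epi32(v);
    return r.array; // [1, -1, 2, -2]
}
```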

_mm256_div_pd
__m256d _mm256_div_pd(__m256d a, __m256d b)

Divide packed double-precision (64-bit) floating-point elements in a by packed elements in b.

_mm256_div_ps
__m256 _mm256_div_ps(__m256 a, __m256 b)

Divide packed single-precision (32-bit) floating-point elements in a by packed elements in b.

_mm256_dp_ps
__m256 _mm256_dp_ps(__m256 a, __m256 b)

Conditionally multiply the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sum the four products, and conditionally store the sum using the low 4 bits of imm8.

_mm256_extract_epi32
int _mm256_extract_epi32(__m256i a, int imm8)

Extract a 32-bit integer from a, selected with imm8.

_mm256_extract_epi64
long _mm256_extract_epi64(__m256i a, int index)

Extract a 64-bit integer from a, selected with index.

_mm256_extractf128_pd
__m128d _mm256_extractf128_pd(__m256d a)
_mm256_extractf128_ps
__m128 _mm256_extractf128_ps(__m256 a)
_mm256_extractf128_si256
__m128i _mm256_extractf128_si256(__m256i a)

Extract a 128-bit lane from a, selected with index (0 or 1).
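
A sketch of lane extraction follows. It assumes the lane index is a compile-time template argument, which the parameterless signatures above suggest but which should be checked against the module source; extractExample is a hypothetical helper name.

```d
import inteli.avxintrin;

// Hypothetical helper: pulls the upper 128-bit lane out of a 256-bit vector.
double[2] extractExample()
{
    __m256d v = _mm256_setr_pd(1.0, 2.0, 3.0, 4.0);

    // Index 1 selects the upper lane (elements 2 and 3). Template syntax is an assumption.
    __m128d hi = _mm256_extractf128_pd!1(v);
    return hi.array; // [3, 4]
}
```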

_mm256_hadd_pd
__m256d _mm256_hadd_pd(__m256d a, __m256d b)

Horizontally add adjacent pairs of double-precision (64-bit) floating-point elements in a and b.

_mm256_hadd_ps
__m256 _mm256_hadd_ps(__m256 a, __m256 b)

Horizontally add adjacent pairs of single-precision (32-bit) floating-point elements in a and b.
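
The horizontal add works per 128-bit lane and interleaves results from a and b, which is easy to get wrong. A minimal sketch, assuming the inteli package is available; haddExample is a hypothetical helper name.

```d
import inteli.avxintrin;

// Hypothetical helper: horizontal add on fixed inputs.
double[4] haddExample()
{
    __m256d a = _mm256_setr_pd(1.0, 2.0, 3.0, 4.0);
    __m256d b = _mm256_setr_pd(10.0, 20.0, 30.0, 40.0);

    // Result layout: [a0+a1, b0+b1, a2+a3, b2+b3].
    return _mm256_hadd_pd(a, b).array; // [3, 30, 7, 70]
}
```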

_mm256_hsub_pd
__m256d _mm256_hsub_pd(__m256d a, __m256d b)

Horizontally subtract adjacent pairs of double-precision (64-bit) floating-point elements in a and b.

_mm256_hsub_ps
__m256 _mm256_hsub_ps(__m256 a, __m256 b)

Horizontally subtract adjacent pairs of single-precision (32-bit) floating-point elements in a and b.

_mm256_insertf128_pd
__m256d _mm256_insertf128_pd(__m256d a, __m128d b)

Copy a, then insert 128 bits (composed of 2 packed double-precision (64-bit) floating-point elements) from b at the location specified by imm8.

_mm256_insertf128_ps
__m256 _mm256_insertf128_ps(__m256 a, __m128 b)

Copy a, then insert 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from b at the location specified by imm8.

_mm256_insertf128_si256
__m256i _mm256_insertf128_si256(__m256i a, __m128i b)

Copy a, then insert 128 bits from b at the location specified by imm8.
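
A sketch of lane insertion, assuming imm8 is a compile-time template argument (the signatures above list no runtime imm8 parameter, so this is an inference rather than a confirmed API detail); insertExample is a hypothetical helper name.

```d
import inteli.avxintrin;

// Hypothetical helper: overwrites the upper lane of a with b.
double[4] insertExample()
{
    __m256d a = _mm256_setr_pd(1.0, 2.0, 3.0, 4.0);
    __m128d b = _mm_setr_pd(9.0, 10.0);

    // Location 1 is the upper 128 bits; the lower lane of a is kept.
    return _mm256_insertf128_pd!1(a, b).array; // [1, 2, 9, 10]
}
```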

_mm256_lddqu_si256
__m256i _mm256_lddqu_si256(const(__m256i)* mem_addr)

Load 256-bits of integer data from unaligned memory into dst. This intrinsic may perform better than _mm256_loadu_si256 when the data crosses a cache line boundary.

_mm256_load_pd
__m256d _mm256_load_pd(const(double)* mem_addr)

Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_load_ps
__m256 _mm256_load_ps(const(float)* mem_addr)

Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_load_si256
__m256i _mm256_load_si256(const(void)* mem_addr)

Load 256-bits of integer data from memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_loadu2_m128
__m256 _mm256_loadu2_m128(const(float)* hiaddr, const(float)* loaddr)

Load two 128-bit values (composed of 4 packed single-precision (32-bit) floating-point elements) from memory, and combine them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_loadu2_m128d
__m256d _mm256_loadu2_m128d(const(double)* hiaddr, const(double)* loaddr)

Load two 128-bit values (composed of 2 packed double-precision (64-bit) floating-point elements) from memory, and combine them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_loadu2_m128i
__m256i _mm256_loadu2_m128i(const(__m128i)* hiaddr, const(__m128i)* loaddr)

Load two 128-bit values (composed of integer data) from memory, and combine them into a 256-bit value. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_loadu_pd
__m256d _mm256_loadu_pd(const(void)* mem_addr)

Load 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from memory. mem_addr does not need to be aligned on any particular boundary.

_mm256_loadu_ps
__m256 _mm256_loadu_ps(const(float)* mem_addr)

Load 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from memory. mem_addr does not need to be aligned on any particular boundary.

_mm256_loadu_si256
__m256i _mm256_loadu_si256(const(__m256i)* mem_addr)

Load 256-bits of integer data from memory. mem_addr does not need to be aligned on any particular boundary.
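
The difference between the aligned and unaligned loads can be sketched as follows, assuming the compiler honours align(32) on local arrays; loadExample is a hypothetical helper name.

```d
import inteli.avxintrin;

// Hypothetical helper: one aligned load, one unaligned load, then a sum.
double[4] loadExample()
{
    // Aligned load: the source must sit on a 32-byte boundary.
    align(32) double[4] aligned = [1.0, 2.0, 3.0, 4.0];
    __m256d x = _mm256_load_pd(aligned.ptr);

    // Unaligned load: any address is acceptable, here offset by one element.
    double[5] buffer = [0.0, 1.0, 2.0, 3.0, 4.0];
    __m256d y = _mm256_loadu_pd(buffer.ptr + 1);

    return _mm256_add_pd(x, y).array; // [2, 4, 6, 8]
}
```

When alignment cannot be guaranteed, the unaligned variants are the safer default.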

_mm256_max_pd
__m256d _mm256_max_pd(__m256d a, __m256d b)

Compare packed double-precision (64-bit) floating-point elements in a and b, and return packed maximum values.

_mm256_max_ps
__m256 _mm256_max_ps(__m256 a, __m256 b)

Compare packed single-precision (32-bit) floating-point elements in a and b, and return packed maximum values.

_mm256_min_pd
__m256d _mm256_min_pd(__m256d a, __m256d b)

Compare packed double-precision (64-bit) floating-point elements in a and b, and return packed minimum values.

_mm256_min_ps
__m256 _mm256_min_ps(__m256 a, __m256 b)

Compare packed single-precision (32-bit) floating-point elements in a and b, and return packed minimum values.

_mm256_mul_pd
__m256d _mm256_mul_pd(__m256d a, __m256d b)

Multiply packed double-precision (64-bit) floating-point elements in a and b.

_mm256_mul_ps
__m256 _mm256_mul_ps(__m256 a, __m256 b)

Multiply packed single-precision (32-bit) floating-point elements in a and b.

_mm256_not_si256
__m256i _mm256_not_si256(__m256i a)

Compute the bitwise NOT of 256 bits in a. #BONUS

_mm256_or_pd
__m256d _mm256_or_pd(__m256d a, __m256d b)

Compute the bitwise OR of packed double-precision (64-bit) floating-point elements in a and b.

_mm256_or_ps
__m256 _mm256_or_ps(__m256 a, __m256 b)

Compute the bitwise OR of packed single-precision (32-bit) floating-point elements in a and b.

_mm256_set1_epi16
__m256i _mm256_set1_epi16(short a)

Broadcast 16-bit integer a to all elements of the return value.

_mm256_set1_epi32
__m256i _mm256_set1_epi32(int a)

Broadcast 32-bit integer a to all elements.

_mm256_set1_epi64x
__m256i _mm256_set1_epi64x(long a)

Broadcast 64-bit integer a to all elements of the return value.

_mm256_set1_epi8
__m256i _mm256_set1_epi8(byte a)

Broadcast 8-bit integer a to all elements of the return value.

_mm256_set1_pd
__m256d _mm256_set1_pd(double a)

Broadcast double-precision (64-bit) floating-point value a to all elements of the return value.

_mm256_set1_ps
__m256 _mm256_set1_ps(float a)

Broadcast single-precision (32-bit) floating-point value a to all elements of the return value.

_mm256_set_epi16
__m256i _mm256_set_epi16(short e15, short e14, short e13, short e12, short e11, short e10, short e9, short e8, short e7, short e6, short e5, short e4, short e3, short e2, short e1, short e0)

Set packed 16-bit integers with the supplied values.

_mm256_set_epi32
__m256i _mm256_set_epi32(int e7, int e6, int e5, int e4, int e3, int e2, int e1, int e0)

Set packed 32-bit integers with the supplied values.

_mm256_set_epi64x
__m256i _mm256_set_epi64x(long e3, long e2, long e1, long e0)

Set packed 64-bit integers with the supplied values.

_mm256_set_epi8
__m256i _mm256_set_epi8(byte e31, byte e30, byte e29, byte e28, byte e27, byte e26, byte e25, byte e24, byte e23, byte e22, byte e21, byte e20, byte e19, byte e18, byte e17, byte e16, byte e15, byte e14, byte e13, byte e12, byte e11, byte e10, byte e9, byte e8, byte e7, byte e6, byte e5, byte e4, byte e3, byte e2, byte e1, byte e0)

Set packed 8-bit integers with the supplied values.

_mm256_set_m128
__m256 _mm256_set_m128(__m128 hi, __m128 lo)

Set packed __m256 vector with the supplied values.

_mm256_set_m128d
__m256d _mm256_set_m128d(__m128d hi, __m128d lo)

Set packed __m256d vector with the supplied values.

_mm256_set_m128i
__m256i _mm256_set_m128i(__m128i hi, __m128i lo)

Set packed __m256i vector with the supplied values.

_mm256_set_pd
__m256d _mm256_set_pd(double e3, double e2, double e1, double e0)

Set packed double-precision (64-bit) floating-point elements with the supplied values.

_mm256_set_ps
__m256 _mm256_set_ps(float e7, float e6, float e5, float e4, float e3, float e2, float e1, float e0)

Set packed single-precision (32-bit) floating-point elements with the supplied values.

_mm256_setr_epi16
__m256i _mm256_setr_epi16(short e15, short e14, short e13, short e12, short e11, short e10, short e9, short e8, short e7, short e6, short e5, short e4, short e3, short e2, short e1, short e0)

Set packed 16-bit integers with the supplied values in reverse order.

_mm256_setr_epi32
__m256i _mm256_setr_epi32(int e7, int e6, int e5, int e4, int e3, int e2, int e1, int e0)

Set packed 32-bit integers with the supplied values in reverse order.

_mm256_setr_epi64x
__m256i _mm256_setr_epi64x(long e3, long e2, long e1, long e0)

Set packed 64-bit integers with the supplied values in reverse order.

_mm256_setr_epi8
__m256i _mm256_setr_epi8(byte e31, byte e30, byte e29, byte e28, byte e27, byte e26, byte e25, byte e24, byte e23, byte e22, byte e21, byte e20, byte e19, byte e18, byte e17, byte e16, byte e15, byte e14, byte e13, byte e12, byte e11, byte e10, byte e9, byte e8, byte e7, byte e6, byte e5, byte e4, byte e3, byte e2, byte e1, byte e0)

Set packed 8-bit integers with the supplied values in reverse order.

_mm256_setr_m128
__m256 _mm256_setr_m128(__m128 lo, __m128 hi)

Set packed __m256 vector with the supplied values.

_mm256_setr_m128d
__m256d _mm256_setr_m128d(__m128d lo, __m128d hi)

Set packed __m256d vector with the supplied values.

_mm256_setr_m128i
__m256i _mm256_setr_m128i(__m128i lo, __m128i hi)

Set packed __m256i vector with the supplied values.

_mm256_setr_pd
__m256d _mm256_setr_pd(double e3, double e2, double e1, double e0)

Set packed double-precision (64-bit) floating-point elements with the supplied values in reverse order.

_mm256_setr_ps
__m256 _mm256_setr_ps(float e7, float e6, float e5, float e4, float e3, float e2, float e1, float e0)

Set packed single-precision (32-bit) floating-point elements with the supplied values in reverse order.
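
The set and setr families differ only in argument order: set takes the highest element first, setr takes the lowest element first. A minimal sketch, assuming the inteli package is available; setOrderExample is a hypothetical helper name.

```d
import inteli.avxintrin;

// Hypothetical helper: checks that set and setr build the same vector
// when given the same values in opposite order.
bool setOrderExample()
{
    __m256d hiFirst = _mm256_set_pd(3.0, 2.0, 1.0, 0.0);  // e3, e2, e1, e0
    __m256d loFirst = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0); // memory order

    // Element 0 holds 0.0 in both vectors.
    return hiFirst.array == loFirst.array && loFirst.array[0] == 0.0;
}
```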

_mm256_setzero_pd
__m256d _mm256_setzero_pd()

Return vector of type __m256d with all elements set to zero.

_mm256_setzero_ps
__m256 _mm256_setzero_ps()

Return vector of type __m256 with all elements set to zero.

_mm256_setzero_si256
__m256i _mm256_setzero_si256()

Return vector of type __m256i with all elements set to zero.

_mm256_shuffle_pd
__m256d _mm256_shuffle_pd(__m256d a, __m256d b)

Shuffle double-precision (64-bit) floating-point elements within 128-bit lanes using the control in imm8.

_mm256_shuffle_ps
__m256 _mm256_shuffle_ps(__m256 a, __m256 b)

Shuffle single-precision (32-bit) floating-point elements from a and b within 128-bit lanes using the control in imm8.
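
A sketch of the double-precision shuffle, assuming imm8 is a compile-time template argument (inferred from the parameterless signature, not confirmed here). Each control bit picks the low or high element of a lane: bits 0 and 2 select from a, bits 1 and 3 select from b. shuffleExample is a hypothetical helper name.

```d
import inteli.avxintrin;

// Hypothetical helper: shuffle with control 0b0101.
double[4] shuffleExample()
{
    __m256d a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
    __m256d b = _mm256_setr_pd(10.0, 11.0, 12.0, 13.0);

    // bit0=1 -> a1, bit1=0 -> b0, bit2=1 -> a3, bit3=0 -> b2
    return _mm256_shuffle_pd!0b0101(a, b).array; // [1, 10, 3, 12]
}
```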

_mm256_sqrt_pd
__m256d _mm256_sqrt_pd(__m256d a)

Compute the square root of packed double-precision (64-bit) floating-point elements in a.

_mm256_sqrt_ps
__m256 _mm256_sqrt_ps(__m256 a)

Compute the square root of packed single-precision (32-bit) floating-point elements in a.

_mm256_store_pd
void _mm256_store_pd(double* mem_addr, __m256d a)

Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_store_ps
void _mm256_store_ps(float* mem_addr, __m256 a)

Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_store_si256
void _mm256_store_si256(__m256i* mem_addr, __m256i a)

Store 256-bits of integer data from a into memory. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated.

_mm256_storeu2_m128
void _mm256_storeu2_m128(float* hiaddr, float* loaddr, __m256 a)

Store the high and low 128-bit halves (each composed of 4 packed single-precision (32-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_storeu2_m128d
void _mm256_storeu2_m128d(double* hiaddr, double* loaddr, __m256d a)

Store the high and low 128-bit halves (each composed of 2 packed double-precision (64-bit) floating-point elements) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_storeu2_m128i
void _mm256_storeu2_m128i(__m128i* hiaddr, __m128i* loaddr, __m256i a)

Store the high and low 128-bit halves (each composed of integer data) from a into two different 128-bit memory locations. hiaddr and loaddr do not need to be aligned on any particular boundary.

_mm256_storeu_pd
void _mm256_storeu_pd(double* mem_addr, __m256d a)

Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.

_mm256_storeu_ps
void _mm256_storeu_ps(float* mem_addr, __m256 a)

Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory. mem_addr does not need to be aligned on any particular boundary.

_mm256_storeu_si256
void _mm256_storeu_si256(__m256i* mem_addr, __m256i a)

Store 256-bits of integer data from a into memory. mem_addr does not need to be aligned on any particular boundary.

_mm256_stream_pd
void _mm256_stream_pd(double* mem_addr, __m256d a)

Store 256-bits (composed of 4 packed double-precision (64-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated. Note: non-temporal stores should be followed by _mm_sfence() for reader threads.

_mm256_stream_ps
void _mm256_stream_ps(float* mem_addr, __m256 a)

Store 256-bits (composed of 8 packed single-precision (32-bit) floating-point elements) from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated. Note: non-temporal stores should be followed by _mm_sfence() for reader threads.

_mm256_stream_si256
void _mm256_stream_si256(__m256i* mem_addr, __m256i a)

Store 256-bits of integer data from a into memory using a non-temporal memory hint. mem_addr must be aligned on a 32-byte boundary or a general-protection exception may be generated. Note: there isn't any particular instruction in AVX to do that. It just defers to SSE2. Note: non-temporal stores should be followed by _mm_sfence() for reader threads.
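
The notes about _mm_sfence() can be sketched as a fill loop that issues the fence once after the last streaming store. This assumes dst is 32-byte aligned and count is a multiple of 4; streamFill is a hypothetical helper name.

```d
import inteli.avxintrin;

// Hypothetical helper: fills a buffer with non-temporal stores.
// Assumptions: dst is 32-byte aligned, count is a multiple of 4.
void streamFill(double* dst, size_t count, double value)
{
    __m256d v = _mm256_set1_pd(value);

    for (size_t i = 0; i < count; i += 4)
        _mm256_stream_pd(dst + i, v);

    // Make the streamed data visible before other threads read it.
    _mm_sfence();
}
```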

_mm256_sub_pd
__m256d _mm256_sub_pd(__m256d a, __m256d b)

Subtract packed double-precision (64-bit) floating-point elements in b from packed double-precision (64-bit) floating-point elements in a.

_mm256_sub_ps
__m256 _mm256_sub_ps(__m256 a, __m256 b)

Subtract packed single-precision (32-bit) floating-point elements in b from packed single-precision (32-bit) floating-point elements in a.

_mm256_undefined_pd
__m256d _mm256_undefined_pd()

Return vector of type __m256d with undefined elements.

_mm256_undefined_ps
__m256 _mm256_undefined_ps()

Return vector of type __m256 with undefined elements.

_mm256_undefined_si256
__m256i _mm256_undefined_si256()

Return vector of type __m256i with undefined elements.

_mm256_unpackhi_pd
__m256d _mm256_unpackhi_pd(__m256d a, __m256d b)

Unpack and interleave double-precision (64-bit) floating-point elements from the high half of each 128-bit lane in a and b.

_mm256_unpackhi_ps
__m256 _mm256_unpackhi_ps(__m256 a, __m256 b)

Unpack and interleave single-precision (32-bit) floating-point elements from the high half of each 128-bit lane in a and b.

_mm256_unpacklo_pd
__m256d _mm256_unpacklo_pd(__m256d a, __m256d b)

Unpack and interleave double-precision (64-bit) floating-point elements from the low half of each 128-bit lane in a and b.

_mm256_unpacklo_ps
__m256 _mm256_unpacklo_ps(__m256 a, __m256 b)

Unpack and interleave single-precision (32-bit) floating-point elements from the low half of each 128-bit lane in a and b.
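
Because the unpack operations act on each 128-bit lane independently, the result is not a simple interleave of the whole vectors. A minimal sketch, assuming the inteli package is available; both helper names are hypothetical.

```d
import inteli.avxintrin;

// Hypothetical helper: low halves of each lane, interleaved as [a0, b0, a2, b2].
double[4] unpackLoExample()
{
    __m256d a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
    __m256d b = _mm256_setr_pd(10.0, 11.0, 12.0, 13.0);
    return _mm256_unpacklo_pd(a, b).array; // [0, 10, 2, 12]
}

// Hypothetical helper: high halves of each lane, interleaved as [a1, b1, a3, b3].
double[4] unpackHiExample()
{
    __m256d a = _mm256_setr_pd(0.0, 1.0, 2.0, 3.0);
    __m256d b = _mm256_setr_pd(10.0, 11.0, 12.0, 13.0);
    return _mm256_unpackhi_pd(a, b).array; // [1, 11, 3, 13]
}
```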

_mm256_xor_pd
__m256d _mm256_xor_pd(__m256d a, __m256d b)

Compute the bitwise XOR of packed double-precision (64-bit) floating-point elements in a and b.

_mm256_xor_ps
__m256 _mm256_xor_ps(__m256 a, __m256 b)

Compute the bitwise XOR of packed single-precision (32-bit) floating-point elements in a and b.

_mm256_zeroall
void _mm256_zeroall()
Undocumented in source. Be warned that the author may not have intended to support it.
_mm256_zeroupper
void _mm256_zeroupper()
Undocumented in source. Be warned that the author may not have intended to support it.
_mm256_zextpd128_pd256
__m256d _mm256_zextpd128_pd256(__m128d a)

Cast vector of type __m128d to type __m256d; the upper 128 bits of the result are zeroed.

_mm256_zextps128_ps256
__m256 _mm256_zextps128_ps256(__m128 a)

Cast vector of type __m128 to type __m256; the upper 128 bits of the result are zeroed.

_mm256_zextsi128_si256
__m256i _mm256_zextsi128_si256(__m128i a)

Cast vector of type __m128i to type __m256i; the upper 128 bits of the result are zeroed.
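
Unlike the _mm256_cast*128_*256 functions, whose upper bits are undefined, the zext variants give a result that can be relied on in full. A minimal sketch, assuming the inteli package is available; zextExample is a hypothetical helper name.

```d
import inteli.avxintrin;

// Hypothetical helper: widens a 128-bit vector with a zeroed upper half.
double[4] zextExample()
{
    __m128d lo = _mm_setr_pd(1.0, 2.0);

    // The lower lane is copied, the upper lane is guaranteed to be zero.
    return _mm256_zextpd128_pd256(lo).array; // [1, 2, 0, 0]
}
```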

_mm_broadcast_ss
__m128 _mm_broadcast_ss(const(float)* mem_addr)

Broadcast a single-precision (32-bit) floating-point element from memory to all elements.

Meta