_mm256_dp_ps

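Conditionally multiply the packed single-precision (32-bit) floating-point elements in a and b using the high 4 bits in imm8, sum the four products, and conditionally store the sum using the low 4 bits of imm8. The operation is applied independently to each 128-bit lane of the 256-bit inputs, with the same imm8 controlling both lanes; destination elements whose store bit is clear are set to zero.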
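The masking behaviour can be easier to follow as plain scalar code. The sketch below is a portable model of the semantics, not the intrinsic itself: the function names `dp_lane_model` and `dp_ps_256_model` are hypothetical, and the pairwise summation order is an assumption based on the pseudocode Intel publishes for this operation. It needs no AVX support to compile or run.

```c
#include <stdint.h>

/* Hypothetical scalar model of the dot product on one 128-bit lane
   (four floats). Bits 4..7 of imm8 select which products take part
   in the sum; bits 0..3 select which destination elements receive
   the sum. Unselected destination elements are written as 0.0f. */
static void dp_lane_model(const float a[4], const float b[4],
                          uint8_t imm8, float dst[4])
{
    float t[4];
    for (int i = 0; i < 4; i++) {
        /* High nibble: multiply only where the bit is set. */
        t[i] = ((imm8 >> (4 + i)) & 1) ? a[i] * b[i] : 0.0f;
    }

    /* Assumed pairwise summation order. */
    float sum = (t[0] + t[1]) + (t[2] + t[3]);

    for (int i = 0; i < 4; i++) {
        /* Low nibble: store the sum only where the bit is set. */
        dst[i] = ((imm8 >> i) & 1) ? sum : 0.0f;
    }
}

/* Hypothetical model of the 256-bit form: the same imm8 is applied
   to the low lane (elements 0..3) and the high lane (elements 4..7),
   and no sum crosses the lane boundary. */
void dp_ps_256_model(const float a[8], const float b[8],
                     uint8_t imm8, float dst[8])
{
    dp_lane_model(a,     b,     imm8, dst);
    dp_lane_model(a + 4, b + 4, imm8, dst + 4);
}
```

With imm8 = 0xF1, for example, all four products in each lane are summed and the result is written only to element 0 of that lane, so the output holds two separate 4-element dot products rather than a single 8-element one.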

Meta