_mm_maskload_ps

Load packed single-precision (32-bit) floating-point elements from memory using a mask: an element is loaded only when the high bit of the corresponding mask element is set, and is zeroed otherwise. Note: emulating this instruction is not efficient, because memory must be accessed only for the elements the mask selects. See "Note about mask load/store" to learn why you must address valid memory only.
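The per-lane behaviour described above can be sketched in plain scalar C. This is only an illustrative reference, not this library's implementation: the helper name `maskload_ps_ref` and its array-based signature are hypothetical, chosen so that the example compiles without AVX support.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar reference for a 4-lane masked load.
   Illustrates the documented semantics only; not the real intrinsic. */
static void maskload_ps_ref(float dst[4], const float *mem_addr,
                            const int32_t mask[4])
{
    assert(dst && mem_addr && mask);
    for (int i = 0; i < 4; ++i) {
        /* Only the high (sign) bit of each 32-bit mask element matters. */
        if (mask[i] < 0)
            dst[i] = mem_addr[i]; /* lane selected: memory is read */
        else
            dst[i] = 0.0f;        /* lane not selected: zeroed, memory not read */
    }
}
```

With source values `{1, 2, 3, 4}` and mask `{-1, 0, INT32_MIN, 1}`, this sketch yields `{1, 0, 3, 0}`: the mask value `1` has its high bit clear, so that lane is zeroed. A per-lane branch like this is what makes a faithful emulation slow. An emulation that instead loads all four lanes unconditionally and then blends with zero would read the full 16 bytes, which is presumably why the note asks that the whole address range be valid.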

Meta