Load 128 bits (composed of 4 packed single-precision (32-bit) floating-point elements) from memory. mem_addr does not need to be aligned on any particular boundary.
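
The description above matches the SSE unaligned-load intrinsic `_mm_loadu_ps` from `<xmmintrin.h>`; the text does not name the intrinsic, so that mapping is an assumption. A minimal sketch of such a load on an x86 target follows. The helper name `load4_unaligned` is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_storeu_ps */

/* Hypothetical helper (not part of any library): load 4 consecutive floats
   starting at src, which may sit at any alignment, then write them to out.
   Assumes the documented operation is _mm_loadu_ps. */
static void load4_unaligned(const float *src, float out[4]) {
    assert(src != NULL && out != NULL);

    /* Unaligned 128-bit load: src need not be on a 16-byte boundary. */
    __m128 v = _mm_loadu_ps(src);

    /* Unaligned store, so out has no alignment requirement either. */
    _mm_storeu_ps(out, v);
}
```

For example, given a `float data[8]` holding 0.0 through 7.0, calling `load4_unaligned(data + 1, out)` would read from an address that is not guaranteed to be 16-byte aligned and leave `out` holding `data[1]` through `data[4]`. The aligned counterpart, `_mm_load_ps`, requires a 16-byte-aligned address and may fault without one, which is why the unaligned form is the safer choice when alignment is unknown.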