Compute the square root of the lower single-precision (32-bit) floating-point element in a, store it in the lower element, and copy the upper 3 packed elements from a to the upper elements of result.
See Implementation
Compute the square root of the lower single-precision (32-bit) floating-point element in a, store it in the lower element, and copy the upper 3 packed elements from a to the upper elements of result.