Gather 32-bit integers from memory using 64-bit indices. 32-bit elements are loaded
from addresses starting at base_addr and offset by each 64-bit element in vindex
(each index is scaled by the factor in scale). Gathered elements are merged into dst
using mask (elements are copied from src when the highest bit is not set in the
corresponding element of mask). scale should be 1, 2, 4 or 8.
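
The merge rule can be illustrated with a scalar model. This is only a sketch of the semantics described above, not the intrinsic itself: the function name masked_gather_i64_epi32_ref and the lanes parameter are hypothetical, introduced because the description does not state how many elements the vector holds. It assumes each scaled index is a byte offset from base_addr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a masked gather of 32-bit integers using
   64-bit indices. `lanes` (the element count) is an assumption; the
   description above does not specify it. */
static void masked_gather_i64_epi32_ref(int32_t *dst, const int32_t *src,
                                        const void *base_addr,
                                        const int64_t *vindex,
                                        const int32_t *mask,
                                        int scale, size_t lanes)
{
    const unsigned char *base = (const unsigned char *)base_addr;

    /* scale should be 1, 2, 4 or 8 */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    for (size_t i = 0; i < lanes; i++) {
        if (mask[i] < 0) {
            /* Highest bit of the mask element is set: load 32 bits from
               base_addr + vindex[i] * scale (a byte offset). */
            int32_t value;
            memcpy(&value, base + vindex[i] * scale, sizeof value);
            dst[i] = value;
        } else {
            /* Highest bit is not set: copy the element from src. */
            dst[i] = src[i];
        }
    }
}
```

In this model, a scale of 4 makes each index select a whole int32_t entry of an array passed as base_addr, while a scale of 1 treats the indices as raw byte offsets.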