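Some AVX intrinsics take a floating-point comparison predicate constant.
When labelled "ordered", it means "AND ordered": the result is true only if the comparison holds and neither operand is NaN.
When labelled "unordered", it means "OR unordered": the result is true if the comparison holds or if either operand is NaN.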
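The two labels can be modeled with a small scalar sketch in C. The function names are hypothetical and not part of intel-intrinsics. Each one is meant to match the per-lane result of an Intel predicate constant (`_CMP_EQ_OQ`, `_CMP_EQ_UQ`, `_CMP_LT_OQ`, `_CMP_NGE_UQ`). The sketch ignores the quiet/signaling distinction, which only affects floating-point exception behavior.

```c
#include <math.h>
#include <stdbool.h>

/* True if at least one operand is NaN, i.e. the pair is "unordered". */
static bool is_unordered(float a, float b) {
    return isnan(a) || isnan(b);
}

/* Like _CMP_EQ_OQ: equal AND ordered. A NaN operand gives false. */
static bool eq_ordered(float a, float b) {
    return !is_unordered(a, b) && a == b;
}

/* Like _CMP_EQ_UQ: equal OR unordered. A NaN operand gives true. */
static bool eq_unordered(float a, float b) {
    return is_unordered(a, b) || a == b;
}

/* Like _CMP_LT_OQ: less-than AND ordered. */
static bool lt_ordered(float a, float b) {
    return !is_unordered(a, b) && a < b;
}

/* Like _CMP_NGE_UQ: less-than OR unordered ("not greater-or-equal"). */
static bool lt_unordered(float a, float b) {
    return is_unordered(a, b) || a < b;
}
```

The two labels give the same result when both operands are ordinary numbers. They differ only when a NaN is involved. For example, `eq_ordered(NAN, 1.0f)` is false while `eq_unordered(NAN, 1.0f)` is true.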
IMPORTANT NOTE ABOUT MASK LOAD/STORE:
In theory, a masked load/store can address unaddressable memory, provided the mask bits for those elements are zero. In practice, this cannot be relied upon, for the following reasons:
- The AMD manual says: "Exception and trap behavior for elements not selected for loading or storing from/to memory is implementation dependent. For instance, a given implementation may signal a data breakpoint or a page fault for doublewords that are zero-masked and not actually written."
- Intel fetches the whole cache line anyway: https://erik.science/2019/06/21/AVX-fun.html
  "Even if the mask is stored in the special mask registers, it will still first fetch the data before checking the mask."
So intel-intrinsics adopted tightened semantics: masked loads and stores may only address fully addressable memory. Every element covered by the vector must be valid to access, whether its mask bit is set or not.
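A scalar C model shows what these tightened semantics ask of callers. Both helpers below are hypothetical and not part of intel-intrinsics.

- `maskload8_ps` mimics the lane selection of `_mm256_maskload_ps`. A lane is loaded when the sign bit of its 32-bit mask element is set, and zeroed otherwise.
- `load_tail8_ps` sketches one way to load the tail of an array without addressing memory past its end. It copies the remaining elements into a fully addressable 8-float buffer first, so it never relies on zero mask bits to avoid a fault.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of an 8-lane masked float load (hypothetical helper).
   Lane i is taken from src when the sign bit of mask[i] is set, else 0.0f.
   Tightened contract: src[0..7] must ALL be addressable, even for lanes
   whose mask bit is clear. */
static void maskload8_ps(const float *src, const int32_t mask[8], float dst[8]) {
    for (int i = 0; i < 8; ++i) {
        dst[i] = (mask[i] < 0) ? src[i] : 0.0f; /* mask[i] < 0 <=> sign bit set */
    }
}

/* Load the last n (n <= 8) floats of an array without touching memory past
   its end (hypothetical helper): copy them into a zero-padded local buffer,
   which is fully addressable, then mask-load from that buffer. */
static void load_tail8_ps(const float *src, size_t n, float dst[8]) {
    float padded[8] = {0};
    int32_t mask[8];
    for (size_t i = 0; i < 8; ++i) {
        if (i < n) {
            padded[i] = src[i];
        }
        mask[i] = (i < n) ? -1 : 0; /* -1 has the sign bit set */
    }
    maskload8_ps(padded, mask, dst);
}
```

In vectorized code, the same padding idea would apply before calling the real intrinsic on the padded buffer. Processing the tail with a scalar loop is another common option.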