file hqsimd.h
Macros implementing SIMD operations.
We provide a set of packaged SIMD operations that can be used to optimise common operations across multiple compilers and processor architectures, using either compiler intrinsics or inline assembly.
Short sequences of SIMD operations are packaged into macros, with a generic version of each operation defined in the initial section. Then, for each architecture on which specialisation is done, the generic macro is undefined and a specialised version of the macro is defined in its place.
◆ SIMD_16x8u_COPY_NONZERO [1/3]
#define SIMD_16x8u_COPY_NONZERO( dest, src )
Value: MACRO_START \
uint8 *_dest_ = (dest), *_src_ = (src) ; \
for ( unsigned int _i_ = 0 ; _i_ < 16 ; ++_i_ ) { \
if ( _src_[_i_] != 0 ) \
_dest_[_i_] = _src_[_i_] ; \
} \
MACRO_END
Copy 16 byte (8-bit) values from src to dest, but only where the value in src is non-zero.
Parameters:
    [out]  dest  Destination address to copy to. For best performance this should be 128-bit aligned, but it does not need to be.
    [in]   src   Source address to copy from. For best performance this should be 128-bit aligned, but it does not need to be.
◆ SIMD_16x8u_COPY_NONZERO [2/3]
#define SIMD_16x8u_COPY_NONZERO( dest, src )
Copy 16 byte (8-bit) values from src to dest, but only where the value in src is non-zero.
Parameters:
    [out]  dest  Destination address to copy to. For best performance this should be 128-bit aligned, but it does not need to be.
    [in]   src   Source address to copy from. For best performance this should be 128-bit aligned, but it does not need to be.
◆ SIMD_16x8u_COPY_NONZERO [3/3]
#define SIMD_16x8u_COPY_NONZERO( dest, src )
Value: MACRO_START \
simd_16x8u_t src128 = vld1q_u8((uint8_t const *)(src)) ; \
simd_16x8u_t dest128 = vld1q_u8((uint8_t const *)(dest)) ; \
simd_16x8u_t mask128 = vceqzq_u8(src128) ; \
\
simd_16x8u_t out128 = vbslq_u8(mask128, dest128, src128) ; \
vst1q_u8((uint8_t *)(dest), out128) ; \
MACRO_END
Copy 16 byte (8-bit) values from src to dest, but only where the value in src is non-zero.
Parameters:
    [out]  dest  Destination address to copy to. For best performance this should be 128-bit aligned, but it does not need to be.
    [in]   src   Source address to copy from. For best performance this should be 128-bit aligned, but it does not need to be.
◆ SIMD_8x16u_COPY_NONZERO [1/3]
#define SIMD_8x16u_COPY_NONZERO( dest, src )
Value: MACRO_START \
uint16 *_dest_ = (dest), *_src_ = (src) ; \
for ( unsigned int _i_ = 0 ; _i_ < 8 ; ++_i_ ) { \
if ( _src_[_i_] != 0 ) \
_dest_[_i_] = _src_[_i_] ; \
} \
MACRO_END
Copy 8 short (16-bit) values from src to dest, but only where the value in src is non-zero.
Parameters:
    [out]  dest  Destination address to copy to. For best performance this should be 128-bit aligned, but it does not need to be.
    [in]   src   Source address to copy from. For best performance this should be 128-bit aligned, but it does not need to be.
◆ SIMD_8x16u_COPY_NONZERO [2/3]
#define SIMD_8x16u_COPY_NONZERO( dest, src )
Copy 8 short (16-bit) values from src to dest, but only where the value in src is non-zero.
Parameters:
    [out]  dest  Destination address to copy to. For best performance this should be 128-bit aligned, but it does not need to be.
    [in]   src   Source address to copy from. For best performance this should be 128-bit aligned, but it does not need to be.
◆ SIMD_8x16u_COPY_NONZERO [3/3]
#define SIMD_8x16u_COPY_NONZERO( dest, src )
Value: MACRO_START \
simd_8x16u_t src128 = vld1q_u16((uint16_t const *)(src)) ; \
simd_8x16u_t dest128 = vld1q_u16((uint16_t const *)(dest)) ; \
simd_8x16u_t mask128 = vceqzq_u16(src128) ; \
\
simd_8x16u_t out128 = vbslq_u16(mask128, dest128, src128) ; \
vst1q_u16((uint16_t *)(dest), out128) ; \
MACRO_END
Copy 8 short (16-bit) values from src to dest, but only where the value in src is non-zero.
Parameters:
    [out]  dest  Destination address to copy to. For best performance this should be 128-bit aligned, but it does not need to be.
    [in]   src   Source address to copy from. For best performance this should be 128-bit aligned, but it does not need to be.
◆ simd_16x8i_t
A SIMD type that supports 16 x 8-bit signed integer operations.
◆ simd_16x8u_t
A SIMD type that supports 16 x 8-bit unsigned integer operations.
◆ simd_4x32f_t
A SIMD type that supports 4 x 32-bit floating-point operations.
◆ simd_4x32i_t
A SIMD type that supports 4 x 32-bit signed integer operations.
◆ simd_4x32u_t
A SIMD type that supports 4 x 32-bit unsigned integer operations.
◆ simd_8x16i_t
A SIMD type that supports 8 x 16-bit signed integer operations.
◆ simd_8x16u_t
A SIMD type that supports 8 x 16-bit unsigned integer operations.