Harlequin RIP SDK
Harlequin SIMD (Single Instruction Multiple Data) operations

Files

file  hqsimd.h
 Macros implementing SIMD operations.
 

Macros

#define SIMD_16x8u_COPY_NONZERO(dest, src)
 
#define SIMD_16x8u_COPY_NONZERO(dest, src)
 
#define SIMD_16x8u_COPY_NONZERO(dest, src)
 
#define SIMD_8x16u_COPY_NONZERO(dest, src)
 
#define SIMD_8x16u_COPY_NONZERO(dest, src)
 
#define SIMD_8x16u_COPY_NONZERO(dest, src)
 

Typedefs

typedef __m128i simd_16x8i_t
 
typedef __m128i simd_16x8u_t
 
typedef __m128i simd_8x16i_t
 
typedef __m128i simd_8x16u_t
 
typedef __m128i simd_4x32i_t
 
typedef __m128i simd_4x32u_t
 
typedef __m128 simd_4x32f_t
 

Detailed Description

This header provides a set of packaged SIMD operations for optimising common operations across multiple compilers and processor architectures. The operations are implemented with either compiler intrinsics or inline assembly.

Short sequences of SIMD operations are packaged into macros. A generic version of each operation is defined in the initial section of the header. Then, for each architecture that has a specialisation, the generic macro is undefined and a specialised version is defined in its place.
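The generic-then-specialised layout described above can be sketched as follows. This is only an illustration of the pattern, not code from hqsimd.h: the macro name EXAMPLE_4x32u_ADD, the helper add_four, and the choice of SSE2 as the specialised architecture are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic section: a portable fallback that is always defined first.
   EXAMPLE_4x32u_ADD is a hypothetical name, not part of hqsimd.h. */
#define EXAMPLE_4x32u_ADD(dest, src) \
  do { \
    uint32_t *_dest_ = (dest) ; \
    const uint32_t *_src_ = (src) ; \
    for ( unsigned int _i_ = 0 ; _i_ < 4 ; ++_i_ ) \
      _dest_[_i_] += _src_[_i_] ; \
  } while (0)

/* Architecture section: where SSE2 is available, drop the generic macro
   and define a specialised version under the same name. */
#if defined(__SSE2__)
#include <emmintrin.h>
#undef EXAMPLE_4x32u_ADD
#define EXAMPLE_4x32u_ADD(dest, src) \
  do { \
    __m128i _d_ = _mm_loadu_si128((const __m128i *)(dest)) ; \
    __m128i _s_ = _mm_loadu_si128((const __m128i *)(src)) ; \
    _mm_storeu_si128((__m128i *)(dest), _mm_add_epi32(_d_, _s_)) ; \
  } while (0)
#endif

/* Callers use one macro name, whichever definition was selected. */
void add_four(uint32_t *dest, const uint32_t *src)
{
  assert(dest != NULL && src != NULL) ;
  EXAMPLE_4x32u_ADD(dest, src) ;
}
```

Because both definitions share a name and a contract, calling code does not need any architecture checks of its own.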

Macro Definition Documentation

◆ SIMD_16x8u_COPY_NONZERO [1/3]

#define SIMD_16x8u_COPY_NONZERO (   dest,
  src 
)
Value:
MACRO_START \
  uint8 *_dest_ = (dest), *_src_ = (src) ; \
  for ( unsigned int _i_ = 0 ; _i_ < 16 ; ++_i_ ) { \
    if ( _src_[_i_] != 0 ) \
      _dest_[_i_] = _src_[_i_] ; \
  } \
MACRO_END

Copy 16 byte values from src to dest, skipping any element whose value in src is zero. Skipped elements leave the corresponding element of dest unchanged.

Parameters
[out] dest  Destination address to copy to. For best performance, this should be 128-bit aligned, but it does not need to be.
[in]  src   Source address to copy from. For best performance, this should be 128-bit aligned, but it does not need to be.
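As a usage sketch, the generic definition can be applied block by block to overlay one byte buffer onto another. The typedef for uint8, the definitions of MACRO_START and MACRO_END, and the helper overlay_nonzero_u8 are assumptions standing in for SDK headers that are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for definitions normally supplied by SDK headers. */
typedef uint8_t uint8 ;
#define MACRO_START do {
#define MACRO_END } while (0)

/* The generic definition, as documented above. */
#define SIMD_16x8u_COPY_NONZERO(dest, src) MACRO_START \
  uint8 *_dest_ = (dest), *_src_ = (src) ; \
  for ( unsigned int _i_ = 0 ; _i_ < 16 ; ++_i_ ) { \
    if ( _src_[_i_] != 0 ) \
      _dest_[_i_] = _src_[_i_] ; \
  } \
MACRO_END

/* Hypothetical helper: overlay the non-zero bytes of src onto dest.
   count must be a multiple of 16, because each macro call handles
   exactly 16 bytes. */
void overlay_nonzero_u8(uint8 *dest, uint8 *src, size_t count)
{
  assert(dest != NULL && src != NULL && count % 16 == 0) ;
  for ( size_t i = 0 ; i < count ; i += 16 )
    SIMD_16x8u_COPY_NONZERO(dest + i, src + i) ;
}
```

A buffer whose length is not a multiple of 16 would need a separate scalar loop for the remaining bytes.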

◆ SIMD_16x8u_COPY_NONZERO [2/3]

#define SIMD_16x8u_COPY_NONZERO (   dest,
  src 
)

Copy 16 byte values from src to dest, skipping any element whose value in src is zero. Skipped elements leave the corresponding element of dest unchanged.

Parameters
[out] dest  Destination address to copy to. For best performance, this should be 128-bit aligned, but it does not need to be.
[in]  src   Source address to copy from. For best performance, this should be 128-bit aligned, but it does not need to be.
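The body of this definition is not shown in the documentation. Since the typedef summary lists __m128i types, it may be an x86 SSE2 specialisation, but that is an assumption. The following is a minimal sketch of how the same operation could be written with SSE2 intrinsics, not the SDK's actual code; the function name copy_nonzero_u8_sketch is hypothetical, and a scalar fallback is included so the example builds on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical helper, not part of hqsimd.h. Copies 16 bytes from src to
   dest, leaving dest unchanged wherever the src byte is zero. */
void copy_nonzero_u8_sketch(uint8_t *dest, const uint8_t *src)
{
  assert(dest != NULL && src != NULL) ;
#if defined(__SSE2__)
  /* Unaligned loads, because alignment is recommended but not required. */
  __m128i src128 = _mm_loadu_si128((const __m128i *)src) ;
  __m128i dest128 = _mm_loadu_si128((const __m128i *)dest) ;
  /* 0xFF in every byte lane where src is zero, 0x00 elsewhere. */
  __m128i mask128 = _mm_cmpeq_epi8(src128, _mm_setzero_si128()) ;
  /* (mask & dest) | (~mask & src): keep dest where src was zero. */
  __m128i out128 = _mm_or_si128(_mm_and_si128(mask128, dest128),
                                _mm_andnot_si128(mask128, src128)) ;
  _mm_storeu_si128((__m128i *)dest, out128) ;
#else
  /* Portable fallback with the same result as the generic macro. */
  for ( unsigned int i = 0 ; i < 16 ; ++i ) {
    if ( src[i] != 0 )
      dest[i] = src[i] ;
  }
#endif
}
```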

◆ SIMD_16x8u_COPY_NONZERO [3/3]

#define SIMD_16x8u_COPY_NONZERO (   dest,
  src 
)
Value:
MACRO_START \
  simd_16x8u_t src128 = vld1q_u8((uint8_t const *)(src)) ; \
  simd_16x8u_t dest128 = vld1q_u8((uint8_t const *)(dest)) ; \
  simd_16x8u_t mask128 = vceqzq_u8(src128) ; \
  /* Select from first source if mask bit is 1, second source if 0. \
     i.e., select destination if source was 0, source if non-zero. */ \
  simd_16x8u_t out128 = vbslq_u8(mask128, dest128, src128) ; \
  vst1q_u8((uint8_t *)(dest), out128) ; \
MACRO_END

Copy 16 byte values from src to dest, skipping any element whose value in src is zero. Skipped elements leave the corresponding element of dest unchanged.

Parameters
[out] dest  Destination address to copy to. For best performance, this should be 128-bit aligned, but it does not need to be.
[in]  src   Source address to copy from. For best performance, this should be 128-bit aligned, but it does not need to be.
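The compare-and-select logic in the NEON definition can be followed one lane at a time with a portable scalar model. This is a sketch for illustration only: the function names are hypothetical, and the two helpers model the per-lane behaviour of vceqzq_u8 (all bits set where the lane is zero) and vbslq_u8 (bitwise select) rather than calling the intrinsics.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-lane model of vceqzq_u8: all ones if the lane is zero, else zero. */
uint8_t lane_is_zero_mask(uint8_t v)
{
  return (uint8_t)(v == 0 ? 0xFF : 0x00) ;
}

/* Per-lane model of vbslq_u8: take bits from a where mask is 1,
   and from b where mask is 0. */
uint8_t lane_bit_select(uint8_t mask, uint8_t a, uint8_t b)
{
  return (uint8_t)((mask & a) | (~mask & b)) ;
}

/* Scalar walk-through of the NEON macro over 16 byte lanes. */
void copy_nonzero_model(uint8_t *dest, const uint8_t *src)
{
  assert(dest != NULL && src != NULL) ;
  for ( unsigned int i = 0 ; i < 16 ; ++i ) {
    uint8_t mask = lane_is_zero_mask(src[i]) ;
    /* The mask is set where src is zero, so dest is kept there. */
    dest[i] = lane_bit_select(mask, dest[i], src[i]) ;
  }
}
```

The mask is built from the source rather than the destination, which is why the destination appears as the first operand of the select.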

◆ SIMD_8x16u_COPY_NONZERO [1/3]

#define SIMD_8x16u_COPY_NONZERO (   dest,
  src 
)
Value:
MACRO_START \
  uint16 *_dest_ = (dest), *_src_ = (src) ; \
  for ( unsigned int _i_ = 0 ; _i_ < 8 ; ++_i_ ) { \
    if ( _src_[_i_] != 0 ) \
      _dest_[_i_] = _src_[_i_] ; \
  } \
MACRO_END

Copy 8 16-bit values from src to dest, skipping any element whose value in src is zero. Skipped elements leave the corresponding element of dest unchanged.

Parameters
[out] dest  Destination address to copy to. For best performance, this should be 128-bit aligned, but it does not need to be.
[in]  src   Source address to copy from. For best performance, this should be 128-bit aligned, but it does not need to be.
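The alignment recommendation can be met by declaring buffers on 16-byte boundaries. The sketch below uses the C11 _Alignas specifier with the generic 16-bit definition. The uint16 typedef, the MACRO_START and MACRO_END definitions, the buffers, and the helper functions are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for definitions normally supplied by SDK headers. */
typedef uint16_t uint16 ;
#define MACRO_START do {
#define MACRO_END } while (0)

/* The generic definition, as documented above. */
#define SIMD_8x16u_COPY_NONZERO(dest, src) MACRO_START \
  uint16 *_dest_ = (dest), *_src_ = (src) ; \
  for ( unsigned int _i_ = 0 ; _i_ < 8 ; ++_i_ ) { \
    if ( _src_[_i_] != 0 ) \
      _dest_[_i_] = _src_[_i_] ; \
  } \
MACRO_END

/* Returns non-zero if p lies on a 16-byte (128-bit) boundary. */
int is_128bit_aligned(const void *p)
{
  return ((uintptr_t)p % 16u) == 0 ;
}

/* Hypothetical buffers; _Alignas(16) requests 128-bit alignment. */
_Alignas(16) uint16 example_dest[8] = { 9, 9, 9, 9, 9, 9, 9, 9 } ;
_Alignas(16) uint16 example_src[8] = { 0, 1, 0, 2, 0, 3, 0, 4 } ;

void run_example(void)
{
  /* Alignment is a performance hint only; the macro works without it. */
  assert(is_128bit_aligned(example_dest) && is_128bit_aligned(example_src)) ;
  SIMD_8x16u_COPY_NONZERO(example_dest, example_src) ;
}
```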

◆ SIMD_8x16u_COPY_NONZERO [2/3]

#define SIMD_8x16u_COPY_NONZERO (   dest,
  src 
)

Copy 8 16-bit values from src to dest, skipping any element whose value in src is zero. Skipped elements leave the corresponding element of dest unchanged.

Parameters
[out] dest  Destination address to copy to. For best performance, this should be 128-bit aligned, but it does not need to be.
[in]  src   Source address to copy from. For best performance, this should be 128-bit aligned, but it does not need to be.

◆ SIMD_8x16u_COPY_NONZERO [3/3]

#define SIMD_8x16u_COPY_NONZERO (   dest,
  src 
)
Value:
MACRO_START \
  simd_8x16u_t src128 = vld1q_u16((uint16_t const *)(src)) ; \
  simd_8x16u_t dest128 = vld1q_u16((uint16_t const *)(dest)) ; \
  simd_8x16u_t mask128 = vceqzq_u16(src128) ; \
  /* Select from first source if mask bit is 1, second source if 0. \
     i.e., select destination if source was 0, source if non-zero. */ \
  simd_8x16u_t out128 = vbslq_u16(mask128, dest128, src128) ; \
  vst1q_u16((uint16_t *)(dest), out128) ; \
MACRO_END

Copy 8 16-bit values from src to dest, skipping any element whose value in src is zero. Skipped elements leave the corresponding element of dest unchanged.

Parameters
[out] dest  Destination address to copy to. For best performance, this should be 128-bit aligned, but it does not need to be.
[in]  src   Source address to copy from. For best performance, this should be 128-bit aligned, but it does not need to be.

Typedef Documentation

◆ simd_16x8i_t

typedef int8x16_t simd_16x8i_t

A SIMD type that supports 16 x 8-bit signed integer operations.

◆ simd_16x8u_t

typedef uint8x16_t simd_16x8u_t

A SIMD type that supports 16 x 8-bit unsigned integer operations.

◆ simd_4x32f_t

typedef float32x4_t simd_4x32f_t

A SIMD type that supports 4 x 32-bit floating point operations.

◆ simd_4x32i_t

typedef int32x4_t simd_4x32i_t

A SIMD type that supports 4 x 32-bit signed integer operations.

◆ simd_4x32u_t

typedef uint32x4_t simd_4x32u_t

A SIMD type that supports 4 x 32-bit unsigned integer operations.

◆ simd_8x16i_t

typedef int16x8_t simd_8x16i_t

A SIMD type that supports 8 x 16-bit signed integer operations.

◆ simd_8x16u_t

typedef uint16x8_t simd_8x16u_t

A SIMD type that supports 8 x 16-bit unsigned integer operations.