
c++ - Signed 16-bit SSE average

Reposted. Author: 太空狗. Updated: 2023-10-29 21:26:14

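_mm_avg_epu16 provides the average of two unsigned 16-bit integers via PAVGW. Is converting to float and dividing by 2 the only reasonable (best) way to get the average of two signed 16-bit integers with SSE (more precisely, what @Mysticial described as a "signed average rounded up with the top bit flipped"), or is there another way?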


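Edit: this is the code I want to optimize. So far, all of my SSE attempts have come close but do not match exactly, usually because of saturation or overflow wrap-around: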

int16_t *a;    /* signed 16-bit inputs */
int16_t *b;
uint16_t *out; /* unsigned 16-bit output */

/* for each element i: */
out[i] = int((a[i] + b[i]) / 2.0f + 32768.5f);
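
To see what any SIMD version has to reproduce, it can help to restate the scalar expression in integer arithmetic. The sketch below assumes the float expression above is the reference; `reference`, `integer_equivalent`, and `check_equivalence` are names introduced here for illustration only. Because `a[i] + b[i]` fits in 17 bits, every float step is exact, and truncating after adding 32768.5 amounts to rounding the average up and then adding 32768.

```cpp
#include <cassert>
#include <cstdint>

// Scalar reference, copied from the expression in the question.
uint16_t reference(int16_t a, int16_t b) {
    return static_cast<uint16_t>(static_cast<int>((a + b) / 2.0f + 32768.5f));
}

// Hypothetical integer-only rewrite: bias both inputs into [0, 65535],
// then take the unsigned average rounded up.
uint16_t integer_equivalent(int16_t a, int16_t b) {
    const uint32_t ua = static_cast<uint32_t>(a + 32768);
    const uint32_t ub = static_cast<uint32_t>(b + 32768);
    return static_cast<uint16_t>((ua + ub + 1) >> 1);
}

// Compare both forms on a strided subset of all input pairs.
void check_equivalence() {
    for (int a = INT16_MIN; a <= INT16_MAX; a += 97) {
        for (int b = INT16_MIN; b <= INT16_MAX; b += 89) {
            const int16_t sa = static_cast<int16_t>(a);
            const int16_t sb = static_cast<int16_t>(b);
            assert(reference(sa, sb) == integer_equivalent(sa, sb));
        }
    }
}
```

The integer form has the same shape as PAVGW's per-lane operation, `(x + y + 1) >> 1`, applied to biased inputs.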

Attempt #1:

const __m128i outputVal = _mm_add_epi16(_mm_avg_epu16(a, b),  _mm_set1_epi16(32768));

Attempt #2:

const __m128i sum = _mm_add_epi16(a, b);
const __m128i outputVal = _mm_add_epi16(_mm_srai_epi16(sum, 1), _mm_set1_epi16(32768));
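
A minimal sketch of where the first two attempts diverge from the scalar expression, using hand-picked pairs. The helper names (`reference`, `lane0`, `attempt1`, `attempt2`, `show_counterexamples`) are hypothetical; each helper broadcasts one pair to all lanes and reads back lane 0 only. Attempt #1 lets `_mm_avg_epu16` interpret negative inputs as large unsigned values, so it breaks when one input is negative and the other is not. Attempt #2 wraps in the 16-bit `_mm_add_epi16` and rounds odd sums down instead of up.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Scalar reference from the question.
uint16_t reference(int16_t a, int16_t b) {
    return static_cast<uint16_t>(static_cast<int>((a + b) / 2.0f + 32768.5f));
}

// Read back the lowest 16-bit lane of a vector.
uint16_t lane0(__m128i v) {
    return static_cast<uint16_t>(_mm_cvtsi128_si32(v) & 0xFFFF);
}

// Attempt #1: unsigned average of the raw signed bits, then add the bias.
uint16_t attempt1(int16_t a, int16_t b) {
    const __m128i bias = _mm_set1_epi16(INT16_MIN);  // bit pattern 0x8000
    const __m128i va = _mm_set1_epi16(a);
    const __m128i vb = _mm_set1_epi16(b);
    return lane0(_mm_add_epi16(_mm_avg_epu16(va, vb), bias));
}

// Attempt #2: 16-bit sum (may wrap), arithmetic shift right, then add the bias.
uint16_t attempt2(int16_t a, int16_t b) {
    const __m128i bias = _mm_set1_epi16(INT16_MIN);
    const __m128i sum = _mm_add_epi16(_mm_set1_epi16(a), _mm_set1_epi16(b));
    return lane0(_mm_add_epi16(_mm_srai_epi16(sum, 1), bias));
}

void show_counterexamples() {
    // Mixed signs: 0xFFFF is averaged as 65535, and the bias then wraps to 0.
    assert(reference(-1, 1) == 32768);
    assert(attempt1(-1, 1) == 0);
    // Sum overflow: 40000 does not fit in a signed 16-bit lane.
    assert(reference(20000, 20000) == 52768);
    assert(attempt2(20000, 20000) == 20000);
    // Odd sum: the shift rounds down, the reference rounds up.
    assert(reference(1, 0) == 32769);
    assert(attempt2(1, 0) == 32768);
}
```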

Attempt #3:

const __m128 elt_offset = _mm_set1_ps(32768.5f);

const __m128 avg_divisor = _mm_set1_ps(2.f);

const __m128i eltSum = _mm_add_epi16(edgeRowElts, edgeInnerRowElts); /* eltSum = int((inputData[i] + inputData[i + (direction*x)]) */
const __m64 eltSumLow = _mm_movepi64_pi64(eltSum); /* eltSumLow = (__m64) (0x0ffffffff & eltSum) */
const __m64 eltSumHigh = _mm_movepi64_pi64(_mm_srli_si128(eltSum, 8)); /* eltSumHigh = (__m64) (0x0ffffffff & (eltSum >> 64)) */

/* Lower */
__m128 eltSumF = _mm_cvtpi16_ps(eltSumLow); /* eltSumF = (float) eltSum; */

__m128 eltAvg = _mm_div_ps(eltSumF, avg_divisor); /* eltAvg = eltSum / 2.0f */
__m128 eltAvgOffset = _mm_add_ps(eltAvg, elt_offset); /* eltAvgOffset = eltAvg + 32768.5f */
const __m64 outputValLow = _mm_cvtps_pi16(eltAvgOffset); /* outputVal = (short) eltAvgOffset */

/* Upper */
eltSumF = _mm_cvtpi16_ps(eltSumHigh); /* eltSumF = (float) eltSum; */

eltAvg = _mm_div_ps(eltSumF, avg_divisor); /* eltAvg = eltSum / 2.0f */
eltAvgOffset = _mm_add_ps(eltAvg, elt_offset); /* eltAvgOffset = eltAvg + 32768.5f */
const __m64 outputValHigh = _mm_cvtps_pi16(eltAvgOffset); /* outputVal = (short) eltAvgOffset */

__m128i outputVal = _mm_slli_si128(_mm_movpi64_epi64(outputValHigh), 8); /* outputVal = (outputValHigh << 64); */
outputVal = _mm_or_si128(outputVal, _mm_movpi64_epi64(outputValLow)); /* outputVal = outputVal | (outputValLow); */

Best answer

I'm not sure I fully understand all of the requirements here, but it seems that:

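a = _mm_add_epi16(a, _mm_set1_epi16(32768)); /* bias: signed -> unsigned (flips the top bit) */
b = _mm_add_epi16(b, _mm_set1_epi16(32768));
outputVal = _mm_avg_epu16(a, b);             /* unsigned average, (a + b + 1) >> 1 per lane */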

should give you everything except the rounding requirement.
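
Applied to whole arrays, the biasing idea might look like the sketch below. The function name `average_biased`, the unaligned loads and stores, the scalar tail loop, and `run_example` are assumptions added here and are not part of the answer. XOR with 0x8000 is used in place of `_mm_add_epi16`; both flip only the top bit of each 16-bit lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Hypothetical helper: out[i] = unsigned average of the biased inputs.
void average_biased(const int16_t* a, const int16_t* b, uint16_t* out, std::size_t n) {
    const __m128i bias = _mm_set1_epi16(INT16_MIN);  // bit pattern 0x8000
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        va = _mm_xor_si128(va, bias);  // signed -> biased unsigned
        vb = _mm_xor_si128(vb, bias);
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i), _mm_avg_epu16(va, vb));
    }
    for (; i < n; ++i) {
        // Scalar tail with the same arithmetic as PAVGW: (x + y + 1) >> 1.
        const uint32_t ua = static_cast<uint16_t>(a[i]) ^ 0x8000u;
        const uint32_t ub = static_cast<uint16_t>(b[i]) ^ 0x8000u;
        out[i] = static_cast<uint16_t>((ua + ub + 1) >> 1);
    }
}

// Usage example: 10 elements, so both the vector loop and the tail run.
void run_example() {
    const int16_t a[10] = {0, 1, -1, 32767, -32768, 20000, -20000, 1, 100, -3};
    const int16_t b[10] = {0, 0, 1, 32767, -32768, 20000, -20000, 2, -100, -4};
    uint16_t out[10] = {};
    average_biased(a, b, out, 10);
    for (int i = 0; i < 10; ++i) {
        // Compare against the scalar expression from the question.
        assert(out[i] == static_cast<uint16_t>(
                             static_cast<int>((a[i] + b[i]) / 2.0f + 32768.5f)));
    }
}
```

Unaligned loads are chosen so the sketch makes no assumption about how the arrays were allocated; aligned variants could be substituted if 16-byte alignment is guaranteed.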

If so, it shouldn't be hard to fix up the rounding afterwards:

round = _mm_xor_si128(a, b);                      /* low bit set where exactly one input is odd */
round = _mm_and_si128(round, _mm_set1_epi16(1));  /* keep only that bit: 1 where a + b is odd */
outputVal = _mm_add_epi16(outputVal, round);
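
Whether this correction is needed depends on the rounding rule wanted. As a hedged check, assuming the scalar expression from the question is the reference: `_mm_avg_epu16` already computes `(x + y + 1) >> 1`, so it rounds halves up. In the sketch below the biased average alone agrees with the reference on a strided sweep of inputs, while adding the `round` term overshoots by one (modulo 65536) whenever `a + b` is odd. The helper names are hypothetical and only lane 0 is inspected.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2

// Scalar reference from the question.
uint16_t reference(int16_t a, int16_t b) {
    return static_cast<uint16_t>(static_cast<int>((a + b) / 2.0f + 32768.5f));
}

// Biased unsigned average for one pair; optionally adds the rounding term.
uint16_t biased_average(int16_t a, int16_t b, bool add_round_term) {
    const __m128i bias = _mm_set1_epi16(INT16_MIN);  // bit pattern 0x8000
    const __m128i va = _mm_add_epi16(_mm_set1_epi16(a), bias);
    const __m128i vb = _mm_add_epi16(_mm_set1_epi16(b), bias);
    __m128i result = _mm_avg_epu16(va, vb);
    if (add_round_term) {
        const __m128i round = _mm_and_si128(_mm_xor_si128(va, vb), _mm_set1_epi16(1));
        result = _mm_add_epi16(result, round);
    }
    return static_cast<uint16_t>(_mm_cvtsi128_si32(result) & 0xFFFF);
}

// Count disagreements with the reference over a strided subset of all pairs.
int count_mismatches(bool add_round_term) {
    int mismatches = 0;
    for (int a = INT16_MIN; a <= INT16_MAX; a += 97) {
        for (int b = INT16_MIN; b <= INT16_MAX; b += 89) {
            const int16_t sa = static_cast<int16_t>(a);
            const int16_t sb = static_cast<int16_t>(b);
            if (biased_average(sa, sb, add_round_term) != reference(sa, sb)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}

void check_rounding() {
    assert(count_mismatches(false) == 0);  // bias + average alone agrees
    assert(count_mismatches(true) > 0);    // extra term disagrees on odd sums
}
```

If a different rule is wanted (for example, rounding halves down), the same `round` mask could be subtracted instead of added; that variant is not checked here.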

Regarding "c++ - Signed 16-bit SSE average", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/12152640/
