
CUDA fast approximate functions: what is the trade-off?

Reposted. Author: 行者123. Updated: 2023-12-02 07:24:59

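I was looking for implementations of the sigmoid function and its derivative (sigmoid prime) for a sigmoid kernel, and I stumbled upon a reply on SO that used __fmul_rz and some other CUDA function names. Out of curiosity I googled them and found that they are single-precision functions, as shown in the documentation (note: the page I found is for CUDA 4.1).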

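The documentation says these are fast approximations, so my intuition is that they give up some precision in order to compute faster. Is that right?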

Previously I had:

float x = 1.f / (1.f + exp ( -1.f * input ) );
return x * ( 1.f - x );

And now I have:

float s = __fdividef( 1.f, (1.f + __expf(-1.f*input)));
return s * (1.f - s);

Am I right to assume that the two above may have different results?

Best answer

Am I right to assume that the two above may have different results?

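Your assumption is correct. The fast math intrinsics trade accuracy, and the handling of certain special cases, for performance. It is up to the user to decide whether that is an acceptable trade-off.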
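One way to see how much the two versions from the question differ is to run both on the same inputs and compare. The following is a minimal sketch, not a benchmark: the names sigmoid_prime_accurate, sigmoid_prime_fast, compare and max_abs_diff are made up for illustration, and the input range [-20, 20] is an arbitrary assumption.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// Sigmoid derivative with the standard single-precision functions.
__device__ float sigmoid_prime_accurate(float input) {
    float s = 1.f / (1.f + expf(-input));
    return s * (1.f - s);
}

// Same computation with the fast intrinsics from the question.
__device__ float sigmoid_prime_fast(float input) {
    float s = __fdividef(1.f, 1.f + __expf(-input));
    return s * (1.f - s);
}

// Evaluates both versions for every input element.
__global__ void compare(const float* in, float* accurate, float* fast, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        accurate[i] = sigmoid_prime_accurate(in[i]);
        fast[i] = sigmoid_prime_fast(in[i]);
    }
}

// Hypothetical helper: largest |accurate - fast| over n >= 2 inputs spread
// evenly across [lo, hi]. Returns -1 if a CUDA call fails.
float max_abs_diff(float lo, float hi, int n) {
    float *in = nullptr, *accurate = nullptr, *fast = nullptr;
    size_t bytes = n * sizeof(float);
    float worst = -1.f;
    if (cudaMallocManaged(&in, bytes) == cudaSuccess &&
        cudaMallocManaged(&accurate, bytes) == cudaSuccess &&
        cudaMallocManaged(&fast, bytes) == cudaSuccess) {
        for (int i = 0; i < n; ++i) in[i] = lo + (hi - lo) * i / (n - 1);
        compare<<<(n + 255) / 256, 256>>>(in, accurate, fast, n);
        if (cudaDeviceSynchronize() == cudaSuccess) {
            worst = 0.f;
            for (int i = 0; i < n; ++i)
                worst = fmaxf(worst, fabsf(accurate[i] - fast[i]));
        }
    }
    cudaFree(in);        // cudaFree(nullptr) is a no-op
    cudaFree(accurate);
    cudaFree(fast);
    return worst;
}

int main() {
    printf("max |accurate - fast| on [-20, 20]: %g\n",
           max_abs_diff(-20.f, 20.f, 1024));
    return 0;
}
```

Compile this without -use_fast_math. With that flag, nvcc maps expf and the division to the fast intrinsics as well, so both functions would compute the same thing and the comparison would be meaningless.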

CUDA C Programming Guide, Appendix D.2. Intrinsic functions:

Among these functions are the less accurate, but faster versions of some of the functions of Standard Functions. They have the same name prefixed with __ (such as __sinf(x)). They are faster as they map to fewer native instructions. [...] In addition to reducing the accuracy of the affected functions, it may also cause some differences in special case handling.

The documentation also gives a concrete example of the difference:

[...] for 2^126 < y < 2^128, __fdividef(x,y) delivers a result of zero, whereas the / operator delivers the correct result to within the accuracy stated in Table 9. Also, for 2^126 < y < 2^128, if x is infinity, __fdividef(x,y) delivers a NaN (as a result of multiplying infinity by zero), while the / operator returns infinity.
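The special case quoted above can be probed directly. The sketch below divides by y = 2^127, which lies inside the documented range, once with x = 1 and once with x = infinity. The helper name divide_on_device is hypothetical. The operands are passed as kernel arguments so that the compiler cannot fold the division at compile time.

```cuda
#include <cmath>
#include <cstdio>
#include <cuda_runtime.h>

// out[0] = regular division, out[1] = fast intrinsic.
__global__ void divide_both_ways(float x, float y, float* out) {
    out[0] = x / y;
    out[1] = __fdividef(x, y);
}

// Hypothetical helper: runs both divisions on the device.
// Returns false if a CUDA call fails.
bool divide_on_device(float x, float y, float result[2]) {
    float* out = nullptr;
    if (cudaMallocManaged(&out, 2 * sizeof(float)) != cudaSuccess) return false;
    divide_both_ways<<<1, 1>>>(x, y, out);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    if (ok) {
        result[0] = out[0];
        result[1] = out[1];
    }
    cudaFree(out);
    return ok;
}

int main() {
    const float y = ldexpf(1.f, 127);      // 2^127, inside (2^126, 2^128)
    const float xs[2] = {1.f, INFINITY};
    for (float x : xs) {
        float r[2];
        if (!divide_on_device(x, y, r)) return 1;
        printf("x = %g: x / y = %g, __fdividef(x, y) = %g\n", x, r[0], r[1]);
    }
    return 0;
}
```

According to the quoted passage, the __fdividef column should show zero for x = 1 and NaN for x = infinity. Note that 1 / 2^127 is a denormal number, so what the / operator prints also depends on flags such as -ftz and -prec-div; the defaults are assumed here.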

For __expf(x), the maximum ULP error bound is stated as 2 + floor(abs(1.16 * x)), whereas the maximum ULP error bound of the IEEE-compliant expf is 2.
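To get a feel for that bound, it helps to plug in a few values. The small sketch below only evaluates the documented formula; it does not measure the actual error. The function name expf_fast_max_ulp and the sample inputs are chosen for illustration.

```cuda
#include <cmath>
#include <cstdio>

// Documented maximum ULP error of __expf(x): 2 + floor(abs(1.16 * x)).
__host__ __device__ inline float expf_fast_max_ulp(float x) {
    return 2.f + floorf(fabsf(1.16f * x));
}

int main() {
    const float xs[] = {0.5f, 1.f, 5.f, -10.f, 20.f};
    for (float x : xs)
        printf("x = %6.1f  ->  __expf bound: %2.0f ULP (expf bound: 2 ULP)\n",
               x, expf_fast_max_ulp(x));
    return 0;
}
```

For the sigmoid in the question, the argument of __expf is -input, so the bound grows with the magnitude of the input: 2 ULP for |input| below about 0.86, 13 ULP at |input| = 10, and 25 ULP at |input| = 20.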

Regarding "CUDA fast approximate functions: what is the trade-off?", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/33662894/
