How to get _mm256_rcp_pd in AVX2?
Question
For some reason _mm256_rcp_pd is not in AVX or AVX2. In AVX512 we got _mm256_rcp14_pd.
Is there a way to get a fast approximate reciprocal in double precision on AVX2? Are we supposed to convert to single precision and then back?
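The round trip through single precision that the question mentions can be sketched as below. This is only an illustration under stated assumptions, not something taken from the thread: the helper name rcp_via_float4 and the RCP_TARGET macro are made up here, the inputs must lie within single-precision range (other magnitudes overflow or flush to zero during the conversion), and the target attribute is specific to GCC and Clang on x86-64. _mm_rcp_ps delivers roughly 12 bits, so one Newton-Raphson step in double precision should bring that to roughly 22 to 23 bits, which is still far from full double precision.

```cpp
#include <immintrin.h>

// GCC/Clang: enable AVX2+FMA for this one function, so the file builds
// even without -mavx2 -mfma. Other compilers may need equivalent flags.
#if defined(__GNUC__)
#define RCP_TARGET __attribute__((target("avx2,fma")))
#else
#define RCP_TARGET
#endif

// Hypothetical helper (not from the original post): approximates 1/in[i]
// for four doubles via the single-precision rcpps instruction.
// Assumes every input is finite, nonzero, and within float range.
RCP_TARGET void rcp_via_float4(const double* in, double* out)
{
    __m256d y  = _mm256_loadu_pd(in);
    __m128  yf = _mm256_cvtpd_ps(y);   // 4 doubles -> 4 floats
    __m128  xf = _mm_rcp_ps(yf);       // ~12-bit reciprocal estimate
    __m256d x  = _mm256_cvtps_pd(xf);  // back to double
    // One Newton-Raphson step in double precision: x = x*(2 - x*y)
    x = _mm256_mul_pd(x, _mm256_fnmadd_pd(x, y, _mm256_set1_pd(2.0)));
    _mm256_storeu_pd(out, x);
}
```

The pointer-based interface is a deliberate choice for the sketch: it keeps __m256d values out of the function signature, so the caller does not itself have to be compiled with AVX enabled. Whether the two conversions make this faster than a plain division depends on the CPU and would need measuring.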
Answer 1
Score: 2
With some integer-cast hacking and a Newton-Raphson refinement step, you can get a somewhat accurate approximation with 3 uops. Latency is probably not great, since this mixes integer and double operations, but it should be much better than divpd.
This solution also assumes that all inputs are normalized doubles.
#include <immintrin.h>  // needs AVX2 and FMA, e.g. -mavx2 -mfma

__m256d fastinv(__m256d y)
{
    // Chosen so that powers of two give exact results
    __m256i const magic = _mm256_set1_epi64x(0x7fe0'0000'0000'0000);
    // Bit-magic: for powers of two this just inverts the exponent,
    // and values in between are linearly interpolated
    __m256d x = _mm256_castsi256_pd(_mm256_sub_epi64(magic, _mm256_castpd_si256(y)));
    // Newton-Raphson refinement: x = x*(2.0 - x*y)
    x = _mm256_mul_pd(x, _mm256_fnmadd_pd(x, y, _mm256_set1_pd(2.0)));
    return x;
}
With the constants above, the inverse is exact for powers of two, but after the refinement step the relative error reaches roughly 1.5% for inputs whose mantissa lies between about sqrt(2) and 1.5.
If you fine-tune the magic constant as well as the 2.0 constant, or add another NR step, you can increase the accuracy.
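To get a feel for how much an extra NR step buys, the bit trick can be modelled in scalar C++ and swept over one binade. The following is a rough sketch, not code from the answer: fastinv_scalar and max_rel_error are names invented for this illustration, the nr_steps parameter is an addition, and the model assumes normalized, finite, nonzero inputs just like the vector version.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Scalar model of the answer's fastinv(), with a selectable number of
// Newton-Raphson steps. Assumes y is a normalized, finite, nonzero double.
double fastinv_scalar(double y, int nr_steps = 1)
{
    const std::uint64_t magic = 0x7fe0000000000000ULL;
    std::uint64_t bits;
    std::memcpy(&bits, &y, sizeof bits);   // reinterpret the double as an integer
    bits = magic - bits;                   // same subtraction as _mm256_sub_epi64
    double x;
    std::memcpy(&x, &bits, sizeof x);      // reinterpret back as a double
    for (int i = 0; i < nr_steps; ++i)
        x = x * std::fma(-x, y, 2.0);      // x = x*(2 - x*y), i.e. fnmadd then mul
    return x;
}

// Largest relative error |x*y - 1| over evenly spaced y in [1, 2).
double max_rel_error(int nr_steps, int samples = 1 << 16)
{
    double worst = 0.0;
    for (int i = 0; i < samples; ++i) {
        double y = 1.0 + static_cast<double>(i) / samples;
        double err = std::fabs(fastinv_scalar(y, nr_steps) * y - 1.0);
        if (err > worst) worst = err;
    }
    return worst;
}
```

In this model, fastinv_scalar(1.5) returns 0.65625 against a true value of about 0.6667, a relative error of about 1.56%. Each NR step squares the relative error, so max_rel_error(2) should come out near 0.024%, at the cost of two more uops per step in the vector version.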
Godbolt link: https://godbolt.org/z/f7YhnhT96