__m128i initializers and _mm_madd_epi16: What is the result?
Question
I tried the following code:
__m128i x = { 1,2,3,4,5,6,7,8 };
__m128i y = { 10,20,30,40,50,60,70,80 };
__m128i z = _mm_madd_epi16(x, y);
The result is:
z = {6244, 201, -17692, 1006, 0,0,0,0}
But the first element should be 1*10 + 2*20 = 50.
Can you please explain the result I got?
Answer 1
Score: 3
The problem is in the initial values.
__m128i x = { 1,2,3,4,5,6,7,8 };
Those __m128i initializers don't do what you think they do.
To begin with, they don't even compile on most compilers; only MSVC accepts them. With MSVC, what happened here is equivalent to:
__m128i x = _mm_setr_epi8( 1,2,3,4,5,6,7,8, 0,0,0,0,0,0,0,0 );
That is, eight bytes followed by eight zero bytes, which isn't what you meant.
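To see where the reported numbers come from, here is a portable scalar sketch of the MSVC case (no SSE needed). It assumes x86's little-endian lane layout, so bytes 1 and 2 become the 16-bit lane 0x0201 = 513, and it models _mm_madd_epi16 as four signed 32-bit sums of adjacent 16-bit products. The helper names (madd_epi16_model, bytes_to_epi16, epi32_as_epi16) are hypothetical, not part of any intrinsics header.

```c
#include <stdint.h>

/* Hypothetical scalar model of _mm_madd_epi16 (PMADDWD): multiply eight
   pairs of signed 16-bit lanes, then add adjacent 32-bit products. */
static void madd_epi16_model(const int16_t a[8], const int16_t b[8], int32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)a[2 * i] * b[2 * i] + (int32_t)a[2 * i + 1] * b[2 * i + 1];
}

/* MSVC's { 1,2,3,4,5,6,7,8 } fills 16 bytes (last eight zero). Rebuild the
   16-bit lanes as x86 stores them: lower-addressed byte is the low half. */
static void bytes_to_epi16(const uint8_t bytes[16], int16_t lanes[8])
{
    for (int i = 0; i < 8; i++)
        lanes[i] = (int16_t)(uint16_t)(bytes[2 * i] | (bytes[2 * i + 1] << 8));
}

/* View four 32-bit results as eight 16-bit values, low half first. */
static void epi32_as_epi16(const int32_t in[4], int16_t out[8])
{
    for (int i = 0; i < 4; i++) {
        uint32_t u = (uint32_t)in[i];
        out[2 * i]     = (int16_t)(uint16_t)(u & 0xFFFFu);
        out[2 * i + 1] = (int16_t)(uint16_t)(u >> 16);
    }
}
```

With the byte-wise inputs, the first 32-bit lane is 513*5130 + 1027*10270 = 13178980 = 0x00C91864, whose 16-bit halves are 6244 and 201; the second lane, 65977060, splits into -17692 and 1006. So the output in the question appears to be the 32-bit results printed as 16-bit lanes.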
The fix is simple: use the proper set intrinsic, in this case _mm_setr_epi16 for 16-bit elements. (Or _mm_set_epi16 if you want to list the highest element first, on the left, which is the side a left shift moves bits toward.)
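A minimal sketch of that fix, assuming an x86/x86-64 target with SSE2 (emmintrin.h). The helper names madd_example and setr_matches_set are hypothetical; the intrinsics are the standard SSE2 ones.

```c
#include <emmintrin.h> /* SSE2: _mm_setr_epi16, _mm_set_epi16, _mm_madd_epi16 */
#include <stdint.h>

/* Build 16-bit vectors with the set intrinsic (first argument = lane 0),
   multiply-add, and store the four 32-bit results. */
static void madd_example(int32_t out[4])
{
    __m128i x = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8);
    __m128i y = _mm_setr_epi16(10, 20, 30, 40, 50, 60, 70, 80);
    __m128i z = _mm_madd_epi16(x, y); /* int32 lanes: x0*y0 + x1*y1, ... */
    _mm_storeu_si128((__m128i *)out, z);
}

/* _mm_set_epi16 takes the same lanes highest-first, so this is the same vector. */
static int setr_matches_set(void)
{
    __m128i a = _mm_setr_epi16(1, 2, 3, 4, 5, 6, 7, 8);
    __m128i b = _mm_set_epi16(8, 7, 6, 5, 4, 3, 2, 1);
    return _mm_movemask_epi8(_mm_cmpeq_epi16(a, b)) == 0xFFFF;
}
```

madd_example fills out with 50, 250, 610, 1130 (1*10 + 2*20, 3*30 + 4*40, and so on), and setr_matches_set returns 1.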
Remember that C initializer lists for structs / unions / arrays can have fewer elements than the aggregate holds, with the rest implicitly zeroed. So the number of explicit elements can't imply which element width you meant. The intrinsics API uses _mm_set intrinsics instead of bare initializer lists because the same type can hold different numbers of elements.
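A portable illustration of that ambiguity, using plain arrays as a stand-in for __m128i. In MSVC's headers __m128i is a union whose first member is a 16-element __int8 array, and brace-initializing a union initializes its first member, which is why the list fills bytes there. The array and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The same eight-value list fills 8 of 16 bytes, or all 8 shorts. Both
   arrays are 16 bytes; only the declared element type decides the width. */
static const int8_t  as_bytes[16] = { 1, 2, 3, 4, 5, 6, 7, 8 }; /* rest zero */
static const int16_t as_words[8]  = { 1, 2, 3, 4, 5, 6, 7, 8 };

/* Compare the raw 16-byte images of the two arrays. */
static int same_bytes(void)
{
    return memcmp(as_bytes, as_words, sizeof as_words) == 0;
}
```

same_bytes returns 0: the two 16-byte images differ even though the brace lists are identical.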
You can check the elements of a __m128i with a debugger.
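Without a debugger, you can store the vector to an array and print it. This sketch assumes SSE2, and the helper names are hypothetical. The print width matters: _mm_madd_epi16 returns four 32-bit lanes, so viewing its result as eight 16-bit lanes interleaves each sum with its high half.

```c
#include <emmintrin.h>
#include <stdint.h>
#include <stdio.h>

/* Store a vector to memory at the chosen lane width (unaligned store). */
static void lanes_epi16(__m128i v, int16_t out[8]) { _mm_storeu_si128((__m128i *)out, v); }
static void lanes_epi32(__m128i v, int32_t out[4]) { _mm_storeu_si128((__m128i *)out, v); }

/* Print the same register at both widths, lane 0 first. */
static void print_m128i(const char *name, __m128i v)
{
    int16_t w[8];
    int32_t d[4];
    lanes_epi16(v, w);
    lanes_epi32(v, d);
    printf("%s epi16:", name);
    for (int i = 0; i < 8; i++) printf(" %d", w[i]);
    printf("\n%s epi32:", name);
    for (int i = 0; i < 4; i++) printf(" %ld", (long)d[i]);
    printf("\n");
}
```

For the corrected z, the 16-bit view shows 50 0 250 0 610 0 1130 0, while the 32-bit view shows 50 250 610 1130.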