_mm_comieq_ss difference between Clang and GCC
Question
I have some SIMD code that's checking for equality between vars but I'm getting different results between GCC and clang when NaNs are involved:
    #include <immintrin.h>  // SSE intrinsics (_mm_comieq_ss, _mm_set_ss)
    #include <iostream>
    #include <limits>

    // Returns true when the lowest float lanes of a and b compare equal.
    bool equal(__m128 a, __m128 b) {
        return _mm_comieq_ss(a, b) == 1;
    }

    int main()
    {
        __m128 a, b, c;
        a = _mm_set_ss(std::numeric_limits<float>::quiet_NaN());
        b = _mm_set_ss(1.0f);
        c = _mm_set_ss(1.0f);
        std::cout << "comieq(a,b):" << equal(a,b) << std::endl;
        std::cout << "comieq(b,a):" << equal(b,a) << std::endl;
        std::cout << "comieq(b,c):" << equal(b,c) << std::endl;
        std::cout << "comieq(a,a):" << equal(a,a) << std::endl;
        return 0;
    }
Clang and GCC return different values:
gcc:
comieq(a,b):1
comieq(b,a):1
comieq(b,c):1
comieq(a,a):1
clang:
comieq(a,b):0
comieq(b,a):0
comieq(b,c):1
comieq(a,a):0
Does anyone have an idea why this is happening? I just want to check if two regs are equal or not; is there an alternative way to do this that's consistent?
godbolt: https://godbolt.org/z/ETKenE45f
Answer 1
Score: 3
The different handling of return values when comparing `NaN` values was specifically changed in Clang 3.9.0. Related Link.
Although one would expect intrinsic functions to be just that (intrinsic to the CPU and not compiler-dependent), the `comiss` instruction produces a result in multiple bits of FLAGS. Different intrinsics check different predicates to define a single boolean return value; in asm it would be up to the programmer to use a combination of instructions like `je`, `jb`, and/or `jp`, or `setcc` / `cmovcc`, to use the compare result.
What is happening here is that GCC is checking only the `ZF` (zero flag) value, whereas Clang is also (correctly) checking the `PF` ('parity' flag), which is set if the comparison is unordered, i.e. at least one of the inputs is `NaN`. (This matches the way integer FLAGS were set by P6 x87 `fcomi`, in turn matching older x87 `fcom` / `fstsw ax` / `sahf`.)
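To make the ZF-versus-PF distinction concrete, here is a minimal software sketch of the flag results that the ISA documents for `comiss`, together with the two predicates described above. The names `ComissFlags`, `model_comiss`, `comieq_zf_only` and `comieq_ordered` are hypothetical, invented for this illustration; this is not the code of either compiler, only a model of the two interpretations.

```cpp
#include <cmath>

// Hypothetical model of the three FLAGS bits written by COMISS.
struct ComissFlags {
    bool zf;  // zero flag
    bool pf;  // parity flag
    bool cf;  // carry flag
};

// Software model of "comiss a, b" on the low float lane:
//   unordered (any NaN) -> ZF=1 PF=1 CF=1
//   a > b               -> ZF=0 PF=0 CF=0
//   a < b               -> ZF=0 PF=0 CF=1
//   a == b              -> ZF=1 PF=0 CF=0
inline ComissFlags model_comiss(float a, float b) {
    if (std::isnan(a) || std::isnan(b)) return {true, true, true};
    if (a > b) return {false, false, false};
    if (a < b) return {false, false, true};
    return {true, false, false};
}

// "Equal" as a ZF-only check: unordered compares also report 1.
inline int comieq_zf_only(float a, float b) {
    return model_comiss(a, b).zf ? 1 : 0;
}

// "Equal" as ZF set and PF clear: unordered compares report 0.
inline int comieq_ordered(float a, float b) {
    const ComissFlags f = model_comiss(a, b);
    return (f.zf && !f.pf) ? 1 : 0;
}
```

With the inputs from the question (a = NaN, b = c = 1.0f), `comieq_zf_only` gives 1, 1, 1, 1 for the four calls and `comieq_ordered` gives 0, 0, 1, 0, which lines up with the GCC and Clang outputs reported there.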
I shall offer a short quotation from the discussion linked above, which may shed some light on the reasoning behind the decision made by the LLVM (clang) team:
> In Clang 3.8.0 and before, comparing two scalars of which at least one
> is a NaN would return 1. This is also the behavior that GCC, Visual
> Studio, and our current Emscripten code implements. This behavior is
> unintuitive in the sense that comparing NaNs in floats have the
> opposite tradition in IEEE-754, i.e. "nothing is equal to a NaN".
>
> Intel is the original author of these intrinsics, and it must be
> admitted that these functions have long suffered from poor
> documentation. Intel doesn't spec in detail how these intrinsics
> should work with respect to NaNs
> (https://software.intel.com/en-us/node/514308), but presumably the
> reference implementation in their own compiler was held as the ground
> truth. The behavior that GCC, VS and Clang <= 3.8 each follow likely
> comes from adhering to the original code as implemented in Intel's
> compilers, where _mm_comieq_ss is implemented to perform the COMISS
> instruction and return the resulting zero flag (ZF) register state as
> the output int value of the intrinsic function. The COMISS instruction
> itself is though well documented since it's part of the ISA, and is
> shown e.g. at
> http://x86.renejeschke.de/html/file_module_x86_id_44.html. This shows
> the origin of the unexpected NaN behavior, since the zero flag is set
> if the comparison is equal, or if the comparison result is unordered,
> i.e. at least one of the registers is a NaN.
<hr>
Following the comment from Peter Cordes, it is now clear that the (modified) clang behaviour is correct and the "poor documentation" from Intel referred to in the above citation has been corrected. The Intel documentation for `_mm_comieq_ss` now makes it clear that any `NaN` value present should yield a return value of zero:
> #### Operation<br/>
> RETURN ( a[31:0] != NaN AND b[31:0] != NaN AND a[31:0] == b[31:0] ) ? 1 : 0
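As for a check that behaves the same across compilers, one possible approach (a sketch, not something stated in the linked discussion) is to avoid the `comi` family and use either the mask-producing `_mm_cmpeq_ss` compare or a plain C++ `==` on the extracted scalar. Both follow the ordered-equality rule in the Operation pseudocode above. The helper names `equal_low_mask` and `equal_low_scalar` are hypothetical.

```cpp
#include <xmmintrin.h>  // SSE: _mm_cmpeq_ss, _mm_movemask_ps, _mm_cvtss_f32

// Hypothetical helper: ordered equality of the low float lanes via CMPEQSS.
// The low lane of the mask is all-ones only if both inputs are non-NaN
// and equal; _mm_movemask_ps collects the lane sign bits, bit 0 = low lane.
inline bool equal_low_mask(__m128 a, __m128 b) {
    const __m128 mask = _mm_cmpeq_ss(a, b);
    return (_mm_movemask_ps(mask) & 1) != 0;
}

// Hypothetical helper: extract the low lanes and let the compiler emit an
// IEEE-754 compliant scalar comparison (false whenever a NaN is involved).
inline bool equal_low_scalar(__m128 a, __m128 b) {
    return _mm_cvtss_f32(a) == _mm_cvtss_f32(b);
}
```

Both helpers only look at the lowest lane, like `_mm_comieq_ss`. Comparing all four lanes would need `_mm_cmpeq_ps` and a check that the movemask result equals `0xF`.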