June 5, 2023, 23:32:53 · go · 116 views

Trivial C program yields different result in clang/macOS/arm64 and clang/macOS/x86_64

Question

I ran into problems while porting some complex code to macOS/arm64 and ended up with the following trivial program, which behaves differently than on macOS/x86_64 (using the native osx/arm64 clang version 14.0.6 from conda-forge, and cross-compiling for x86_64):

#include <assert.h>
#include <stdio.h>
int main()
{
    double y[2] = {-0.01,0.9};
    double r;
    r = y[0]+0.03*y[1];
    printf("r = %24.26e\n",r);
    assert(r == 0.017);
}

The result on arm64 is

$ clang -arch arm64 test.c -o test; ./test
Assertion failed: (r == 0.017), function main, file test.c, line 9.
r = 1.69999999999999977517983751e-02
zsh: abort      ./test

while the result on x86_64 is

$ clang -arch x86_64 test.c -o test; ./test
r = 1.70000000000000012212453271e-02
$

The test program has also been compiled and run on an x86_64 machine; it yields the same result as above (cross-compiled on arm64 and run under Rosetta).

In fact, what matters is not that the arm64 result differs bitwise from 0.017 parsed and stored as an IEEE 754 double, but that the expression evaluates to a different value than on x86_64.

Update 1:

To check for possible differences in conventions (e.g. rounding mode), the following program has been compiled and run on both platforms:

#include <iostream>
#include <limits>
#define LOG(x) std::cout << #x " = " << x << '\n'
int main()
{
    using l = std::numeric_limits<double>;
    LOG(l::digits);
    LOG(l::round_style);
    LOG(l::epsilon());
    LOG(l::min());
    return 0;
}

It yields the same output on both platforms:

l::digits = 53
l::round_style = 1
l::epsilon() = 2.22045e-16
l::min() = 2.22507e-308

Hence the problem seems to lie elsewhere.
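numeric_limits only describes the double format and the default rounding mode; it cannot reveal whether the compiler fuses a multiply and an add into one instruction. A minimal C sketch of the closest standard checks is below (the helper names are hypothetical). FLT_EVAL_METHOD from <float.h> reports excess-precision evaluation, and FP_FAST_FMA from <math.h> is defined when fma() is about as fast as a separate multiply and add. Neither macro tells you whether contraction is actually enabled, since C offers no macro for the FP_CONTRACT state.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdio.h>

/* Returns 1 if <math.h> advertises a fast fma() on this target
   (C99 FP_FAST_FMA macro), 0 otherwise. */
int has_fast_fma(void)
{
#ifdef FP_FAST_FMA
    return 1;
#else
    return 0;
#endif
}

/* C99 FLT_EVAL_METHOD: 0 means double expressions are evaluated in
   double precision. It says nothing about fusing into an FMA. */
int eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* Hypothetical helper: call it from main() on each machine and diff
   the output. */
void report_fp_environment(void)
{
    printf("FLT_EVAL_METHOD = %d\n", eval_method());
    printf("FP_FAST_FMA defined = %d\n", has_fast_fma());
}
```

Identical output on both machines would still not rule out contraction, which is decided per compilation.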

Update 2:

In case it helps: on arm64, the result of the expression is the same as the one obtained by calling refBLAS ddot with the vectors {1,0.03} and y.

Update 3:

The toolchain seems to be the cause. Using the default toolchain of macOS 11.6.1:

mottelet@portmottelet-cr-1 ~ % clang -v
Apple clang version 13.0.0 (clang-1300.0.29.30)
Target: arm64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

gives the same results for both architectures! So the problem seems to lie in the toolchain I am actually using: version 1.5.2 of the conda package cxx-compiler (I need conda as a package manager because the application I am building has many dependencies that conda provides).

Using -v shows a bunch of compilation flags; which one could be responsible?
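One plausible suspect, not confirmed anywhere in this thread, is floating-point contraction: whether the compiler may fuse y[0]+0.03*y[1] into a single fused multiply-add. A sketch of turning it off from inside the source file is below; the function name is hypothetical. Clang honors the standard C99 pragma, while GCC ignores it (with a warning under -Wall) and needs -ffp-contract=off on the command line instead. Passing -ffp-contract=off to both toolchains is a quick way to test this hypothesis.

```c
#include <assert.h>
#include <stdio.h>

/* Ask the compiler not to fuse a*b+c into one FMA for the rest of this
   file. Honored by Clang; GCC ignores it and needs -ffp-contract=off. */
#pragma STDC FP_CONTRACT OFF

/* Same expression as the test program, evaluated as a rounded multiply
   followed by a rounded add when contraction is off. */
double expr_no_contract(const double y[2])
{
    return y[0] + 0.03 * y[1];
}
```

With contraction off, the result should match the x86_64 value 1.70000000000000012212453271e-02, i.e. the double nearest to 0.017.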

Answer 1

Score: 5

The results differ in the least significant bit due to different rounding given the compilers and architectures. You can use %a to see all of the bits in the double in hex. Then you get on arm64:

0x1.16872b020c49bp-6

and on x86_64:

0x1.16872b020c49cp-6
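To see exactly how far apart the two results are, you can look at the raw bit patterns as well as the %a output. A minimal sketch with hypothetical helper names: for finite doubles of the same sign, the difference of the bit patterns equals the number of representable-double steps between them, so adjacent doubles are 1 apart.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Reinterpret the bits of a double as a 64-bit unsigned integer
   (memcpy is the well-defined way to do this in C). */
uint64_t double_bits(double d)
{
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Number of representable-double steps between a and b (1 for
   neighbours). Sketch only: inputs must be finite with the same sign. */
uint64_t ulp_distance(double a, double b)
{
    uint64_t ua = double_bits(a), ub = double_bits(b);
    assert((ua >> 63) == (ub >> 63)); /* same sign bit */
    return ua > ub ? ua - ub : ub - ua;
}

/* Write d in C99 hexadecimal floating-point notation (%a) into buf. */
void format_hex(char *buf, size_t size, double d)
{
    snprintf(buf, size, "%a", d);
}
```

Applied to the two values above, ulp_distance returns 1, confirming they are neighbouring doubles.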

The IEEE 754 standard by itself does not guarantee exactly the same results across conforming implementations, in particular due to destination accuracy, decimal conversions, and instruction choices. Variations in the least significant bit, or more with multiple operations, can and should be expected.

In this case, the fmadd operation on the arm64 architecture is used, doing the multiply and add in a single operation. That gives a different result than the separate multiply and add XMM operations used in the x86_64 architecture.
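A hand check of the roundings (an editorial working, not from the original answer) shows why fusing changes the last bit in this particular case. Let u = 2^{-58}, the ulp of doubles in [2^{-6}, 2^{-5}), a range that contains 0.03, 0.027 and 0.017. Write fl(x) for rounding to the nearest double, D = 0x1.ba5e353f7ced9p-6 = fl(0.027) and M = 0x1.16872b020c49bp-6. The offsets below are exact:

```latex
\begin{aligned}
\mathrm{fl}(0.03) &= 0.03 - 0.32u, \quad \mathrm{fl}(0.9) = 0.9 + 6.4u, \quad \mathrm{fl}(-0.01) = -0.01 - 0.06u \\
0.027 &= D + 0.088u, \qquad 0.017 = M + 0.648u \\
P = \mathrm{fl}(0.03)\,\mathrm{fl}(0.9) &= 0.027 - 0.096u - 2.048u^2 = D - 0.008u - 2.048u^2 \\
\text{separate: } \mathrm{fl}\big(\mathrm{fl}(P) + \mathrm{fl}(-0.01)\big) &= \mathrm{fl}(D - 0.01 - 0.06u) = \mathrm{fl}(M + 0.5u) = M + u \\
\text{fused: } \mathrm{fl}\big(P + \mathrm{fl}(-0.01)\big) &= \mathrm{fl}(M + 0.492u - 2.048u^2) = M
\end{aligned}
```

With separate operations the exact sum lands precisely halfway between M and M + u, and round-half-to-even picks M + u = 0x1.16872b020c49cp-6 (M ends in an odd digit). The fused operation skips the rounding of the product, which was about 0.008u too high, so the sum falls just below the halfway point and rounds down to M.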

In the comments, Eric points out the C library function fma() to do a combined multiply-add. Indeed, if I use that call on the x86_64 architecture (as well as on arm64), I get the arm64 fmadd result.

You could potentially get different behavior on the same architecture if the compiler optimizes away the operation, as it should in the example. Then the compiler is doing the computation. The compiler could very well use separate multiply and add operations at compile time, giving a different result on arm64 than the fmadd operation when not optimized out. Also, if you're cross-compiling, the optimized-out calculation could depend on the architecture of the machine you're compiling on, as opposed to the one you're running it on.
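If you want to be sure the expression is evaluated by the target at run time rather than folded by the compiler, one common trick is to read the inputs through volatile objects. A sketch with a hypothetical function name: the compiler must perform the volatile loads at run time, so it cannot precompute the result, and whichever instructions it emits for the target (a fused fmadd or a separate multiply and add) determine the value.

```c
#include <assert.h>

/* volatile loads cannot be folded at compile time, so the arithmetic
   below is executed by the target. Whether it is fused still depends
   on the compiler's contraction setting. */
double expr_at_runtime(void)
{
    volatile double y0 = -0.01, y1 = 0.9;
    return y0 + 0.03 * y1;
}
```

The result is one of the two neighbouring doubles discussed above, depending on whether the multiply and add were fused.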

Comparison for exact equality of floating-point values is fraught with peril. Whenever you find yourself attempting that, you need to think more deeply about your intent.
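One common alternative to the assert(r == 0.017) in the question is a tolerance-based comparison. A minimal sketch, modeled on the relative-plus-absolute scheme used by Python's math.isclose. The function name and the tolerance values are assumptions you would tune to your application:

```c
#include <assert.h>
#include <stdbool.h>

/* Magnitude of x without pulling in libm. */
static double abs_double(double x)
{
    return x < 0 ? -x : x;
}

/* True if a and b agree within rel_tol relative to the larger magnitude,
   or within abs_tol absolutely (needed when comparing against zero). */
bool nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    assert(rel_tol >= 0 && abs_tol >= 0);
    double diff = abs_double(a - b);
    double scale = abs_double(a) > abs_double(b) ? abs_double(a) : abs_double(b);
    return diff <= abs_tol || diff <= rel_tol * scale;
}
```

With rel_tol = 1e-12, both the arm64 and the x86_64 results compare equal to 0.017, while genuinely different values such as 0.018 do not.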

Answer 2

Score: 3

It appears that clang behavior changed between 13.x and 14.x. When using -O, the result is computed at compile time and the target's floating point has nothing to do with it, so this is strictly a compiler issue.

Try on godbolt

The difference is easier to see in hex float output. clang 13 and earlier computes the value 0x1.16872b020c49cp-6, which is slightly greater than 0.017. clang 14 and later computes 0x1.16872b020c49bp-6, which is slightly less (different by 1 in the least significant bit).

The same discrepancy exists between the two versions whether on arm64 or x86-64.

I am not sure offhand which one is better or worse. I guess you could git bisect if you really care, and look at the rationale for the corresponding commit and see whether it seems to be correct. For comparison, gcc in all versions tested gives the "old clang" value of 0x1.16872b020c49cp-6.
