June 13, 2023 17:38:23


Which variant of the matrix 4x4 multiplication function should I use?

Question


I am studying 3D rendering with OpenGL and C, and I am writing a small math library as a learning exercise. Is it better to return the result of the matrix multiplication function with a return statement, or to write it into an output matrix through a pointer?

typedef float vec_t;

typedef struct mat4_s {
    vec_t m[4][4];
} mat4_t;

void Mat4Mult(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    out->m[0][0] = /* ... */;
    out->m[1][0] = /* ... */;
    /* ... */
}

mat4_t Mat4Mult(const mat4_t* in1, const mat4_t* in2) {
    mat4_t result;
    result.m[0][0] = /* ... */;
    result.m[1][0] = /* ... */;
    /* ... */
    return result;
}

I want to understand which option would be more correct. I think both options are correct, but I prefer to return the result of a function using a return statement. Please correct me if I'm wrong; I haven't fully mastered C.

Answer 1

Score: 1


It is very difficult to answer these questions by intuition, even if you have a mountain of experience. This is why you should try both, and profile the results. Let's compare the following naive 4x4 matrix multiplication functions:

/* Variant 1: the caller supplies the destination matrix. */
void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
}

/* Variant 2: the result is built in a local matrix and returned by value. */
mat4_t Mat4Mult_Ret(const mat4_t* in1, const mat4_t* in2) {
    mat4_t out = {0};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out.m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out.m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
    return out;
}
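
If you want to try both variants yourself, a small harness along the following lines may help. It is only a rough sketch, not the benchmark that produced the results in this answer: BenchDest, BenchRet, CheckVariantsAgree and g_sink are names made up for the illustration, and clock() only gives coarse CPU time.

```c
#include <assert.h>
#include <time.h>

typedef float vec_t;
typedef struct mat4_s { vec_t m[4][4]; } mat4_t;

/* The two variants compared in this answer. */
void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k)
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
        }
}

mat4_t Mat4Mult_Ret(const mat4_t* in1, const mat4_t* in2) {
    mat4_t out = {0};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                out.m[i][j] += in1->m[i][k] * in2->m[k][j];
    return out;
}

/* Hypothetical sink: the volatile store keeps the optimizer from
   discarding the timed loops as dead code. */
static volatile vec_t g_sink;

/* Hypothetical helper: CPU seconds for `iterations` chained _Dest calls. */
double BenchDest(int iterations) {
    const mat4_t id = {{{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}}};
    mat4_t acc = id, tmp;
    clock_t start = clock();
    for (int n = 0; n < iterations; ++n) {
        Mat4Mult_Dest(&tmp, &acc, &id);  /* separate destination: no aliasing */
        acc = tmp;
    }
    clock_t end = clock();
    g_sink = acc.m[0][0];
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: CPU seconds for `iterations` chained _Ret calls. */
double BenchRet(int iterations) {
    const mat4_t id = {{{1,0,0,0},{0,1,0,0},{0,0,1,0},{0,0,0,1}}};
    mat4_t acc = id;
    clock_t start = clock();
    for (int n = 0; n < iterations; ++n) {
        acc = Mat4Mult_Ret(&acc, &id);
    }
    clock_t end = clock();
    g_sink = acc.m[0][0];
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical sanity check: timings only mean something if both variants
   agree. Small integer values keep the float arithmetic exact. */
void CheckVariantsAgree(void) {
    const mat4_t a = {{{1,2,3,4},{5,6,7,8},{9,10,11,12},{13,14,15,16}}};
    const mat4_t b = {{{2,0,0,0},{0,2,0,0},{0,0,2,0},{0,0,0,2}}};
    mat4_t d;
    Mat4Mult_Dest(&d, &a, &b);
    const mat4_t r = Mat4Mult_Ret(&a, &b);
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            assert(d.m[i][j] == r.m[i][j]);
}
```

Build with optimizations enabled (for example -O2), call CheckVariantsAgree() once, then compare BenchDest(n) with BenchRet(n) for a large n. A dedicated benchmarking tool will give more trustworthy numbers than this loop.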

Clang 15.0 Results

GCC 12.2 Results

The results vary significantly between GCC and Clang. Judging by the assembly, this is probably because Clang inlined the _Ret version but not the _Dest version, whereas GCC inlined both, making them perform essentially the same. That is unsurprising, since the two functions perform the same calculations.

Conclusion

According to the benchmarks, returning by value is at least as fast as writing to a destination matrix. It is more inlining-friendly for some compilers, which may improve performance. However, you could likely achieve the same results by annotating your functions so that they are more likely to be inlined.
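
As an illustration of what such an annotation could look like, here is a minimal sketch. MAT4_INLINE and Mat4Square are hypothetical names chosen for the example; always_inline is a GCC/Clang extension, and plain static inline is only a hint that the compiler is free to ignore.

```c
#include <assert.h>

typedef float vec_t;
typedef struct mat4_s { vec_t m[4][4]; } mat4_t;

/* MAT4_INLINE is a made-up macro name. On GCC and Clang it requests
   inlining via the always_inline attribute; elsewhere it falls back
   to `static inline`, which is merely a hint. */
#if defined(__GNUC__)
#  define MAT4_INLINE static inline __attribute__((always_inline))
#else
#  define MAT4_INLINE static inline
#endif

MAT4_INLINE void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
}

/* Hypothetical caller: the multiplication above is expanded in place here
   (on GCC/Clang), so no call overhead remains at this call site. */
mat4_t Mat4Square(const mat4_t* in) {
    mat4_t out;
    Mat4Mult_Dest(&out, in, in);  /* out is a distinct local, so no aliasing */
    return out;
}
```

Defining the function as static inline in a header is what lets every translation unit see its body; whether forcing the inlining actually helps is something to confirm by profiling.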

It is worth noting that in Mat4Mult_Ret, return out; writes to a destination in place anyway: under the x86-64 System V ABI, an object this large (64 bytes) is returned through a hidden destination pointer that the caller passes in rdi:

Mat4Mult_Ret:
// ...
// last 4 instructions move result to destination pointer
movups  xmmword ptr [rdi + 48], xmm1
movups  xmmword ptr [rdi + 32], xmm7
movups  xmmword ptr [rdi + 16], xmm4
movups  xmmword ptr [rdi], xmm3
ret

There is one notable difference between your functions, though: mat4_t* out may alias in1 or in2, whereas a local mat4_t out cannot. Consider marking your pointers restrict to give the compiler more optimization freedom.
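
To make the aliasing point concrete, here is a small sketch. Mat4Mult_Restrict is a hypothetical name for a restrict-qualified version of the destination variant, and the debug assert is an addition of this example, not something the language requires. With restrict you promise that out does not overlap the inputs; breaking that promise is undefined behavior.

```c
#include <assert.h>

typedef float vec_t;
typedef struct mat4_s { vec_t m[4][4]; } mat4_t;

/* Unqualified version: out may legally alias in1 or in2, so the compiler
   must assume every store to out can change the inputs. */
void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k)
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
        }
}

/* Hypothetical restrict-qualified version (C99 and later). The inputs are
   only read, so in1 == in2 is still allowed; out must be a distinct object. */
void Mat4Mult_Restrict(mat4_t* restrict out,
                       const mat4_t* restrict in1,
                       const mat4_t* restrict in2) {
    assert(out != in1 && out != in2);  /* debug-time check of the promise */
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            vec_t sum = 0;
            for (int k = 0; k < 4; ++k)
                sum += in1->m[i][k] * in2->m[k][j];
            out->m[i][j] = sum;
        }
}
```

Aliasing is also a correctness trap for the unqualified version: calling Mat4Mult_Dest(&a, &a, &b) overwrites a while it is still being read, so the product comes out wrong. The return-by-value variant does not have this problem, because its result lives in a separate local object until the function returns.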
