Which variant of the matrix 4x4 multiplication function should I use?

Question

I am studying 3D rendering with OpenGL and C, and I am writing a small math library as a learning exercise. Is it better to return the result of the matrix multiplication function with a return statement, or to write it into an output matrix through a pointer?

typedef float vec_t;

typedef struct mat4_s {
    vec_t m[4][4];
} mat4_t;

void Mat4Mult(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    out->m[0][0] = /* ... */;
    out->m[1][0] = /* ... */;
    /* ... */
}

mat4_t Mat4Mult(const mat4_t* in1, const mat4_t* in2) {
    mat4_t result;
    result.m[0][0] = /* ... */;
    result.m[1][0] = /* ... */;
    /* ... */
    return result;
}

I want to understand which option is more correct. I think both are valid, but I prefer to return the result with a return statement. Please correct me if I'm wrong; I haven't fully mastered C.

Answer 1

Score: 1

It is very difficult to answer these questions by intuition, even if you have a mountain of experience. This is why you should try both, and profile the results. Let's compare the following naive 4x4 matrix multiplication functions:

void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
}

mat4_t Mat4Mult_Ret(const mat4_t* in1, const mat4_t* in2) {
    mat4_t out = {0};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out.m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out.m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
    return out;
}
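The harness behind the results below is not shown here. As a rough sketch of how one might "try both and profile", the following times each variant with the standard clock() function. The helpers identity, time_dest, and time_ret are hypothetical names introduced only for this illustration, and a simple loop like this is much cruder than a dedicated benchmarking tool.

```c
#include <time.h>

typedef float vec_t;
typedef struct mat4_s { vec_t m[4][4]; } mat4_t;

/* Destination-pointer variant, same logic as above. */
void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k)
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
        }
}

/* Return-by-value variant, same logic as above. */
mat4_t Mat4Mult_Ret(const mat4_t* in1, const mat4_t* in2) {
    mat4_t out = {0};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                out.m[i][j] += in1->m[i][k] * in2->m[k][j];
    return out;
}

/* Hypothetical helper: builds a 4x4 identity matrix. */
static mat4_t identity(void) {
    mat4_t r = {0};
    for (int i = 0; i < 4; ++i)
        r.m[i][i] = 1.0f;
    return r;
}

/* Hypothetical helper: CPU seconds spent on `iterations` calls of the
   _Dest variant. One result element is stored in *checksum so the
   work has an observable effect. */
double time_dest(long iterations, float* checksum) {
    mat4_t a = identity(), b = identity(), out = {0};
    clock_t start = clock();
    for (long n = 0; n < iterations; ++n) {
        Mat4Mult_Dest(&out, &a, &b);
        a.m[0][0] = out.m[0][0]; /* feed the result back into the next call */
    }
    clock_t end = clock();
    *checksum = out.m[0][0];
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Hypothetical helper: the same measurement for the _Ret variant. */
double time_ret(long iterations, float* checksum) {
    mat4_t a = identity(), b = identity(), out = {0};
    clock_t start = clock();
    for (long n = 0; n < iterations; ++n) {
        out = Mat4Mult_Ret(&a, &b);
        a.m[0][0] = out.m[0][0]; /* feed the result back into the next call */
    }
    clock_t end = clock();
    *checksum = out.m[0][0];
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_dest and time_ret with the same iteration count, compiled at the optimization level you actually use, gives a first impression. An optimizer may still simplify a loop like this, so it is worth checking the generated assembly before trusting the numbers.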

Clang 15.0 Results

[Benchmark chart not preserved]

GCC 12.2 Results

[Benchmark chart not preserved]

The results vary significantly between GCC and Clang. Judging by the assembly, this is probably because Clang inlined the _Ret version but did not do the same for the _Dest version. GCC inlined both functions, so they perform essentially the same. This is unsurprising, because the two functions perform the same calculations.

Conclusion

According to the benchmarks, returning by value is at least as fast as writing to a destination matrix. It is more inlining-friendly for some compilers, which may improve performance. However, you could likely achieve the same results by annotating your functions so that they are more likely to be inlined.
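One possible way to annotate a function for inlining is sketched here. The answer does not say which annotation it has in mind, so this is an assumption: static inline is standard C99, and __attribute__((always_inline)) is a GCC and Clang extension. The macro name MAT4_INLINE is made up for this example.

```c
typedef float vec_t;
typedef struct mat4_s { vec_t m[4][4]; } mat4_t;

/* Hypothetical macro: request inlining as strongly as the compiler allows.
   GCC and Clang understand always_inline; elsewhere fall back to the
   plain C99 hint. */
#if defined(__GNUC__) || defined(__clang__)
#define MAT4_INLINE static inline __attribute__((always_inline))
#else
#define MAT4_INLINE static inline
#endif

/* Destination-pointer variant, now defined in a way that makes its body
   visible and inlinable in every translation unit that includes it. */
MAT4_INLINE void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
}
```

A definition like this normally lives in a header, so that callers in other source files can see the body. Whether it changes the measured performance still has to be checked by profiling.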

It is worth noting that in Mat4Mult_Ret, return out; writes to a destination in place anyway, because large objects are returned through a hidden destination pointer in the x86_64 ABI:

Mat4Mult_Ret:
// ...
// last 4 instructions move result to destination pointer
movups  xmmword ptr [rdi + 48], xmm1
movups  xmmword ptr [rdi + 32], xmm7
movups  xmmword ptr [rdi + 16], xmm4
movups  xmmword ptr [rdi], xmm3
ret

There is one notable difference between your functions, though: mat4_t* out can alias in1 and in2, but a local mat4_t out cannot. Consider marking your pointers restrict to give the compiler more optimization freedom.
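A sketch of what the restrict suggestion could look like follows. With restrict (C99), the caller promises that out does not overlap either input; breaking that promise is undefined behavior. Even without restrict, the naive loop overwrites elements of out that it still needs to read if out is also an input, so in-place products are better routed through the return-by-value variant. Mat4MultInPlace is a hypothetical helper added only for this illustration.

```c
typedef float vec_t;
typedef struct mat4_s { vec_t m[4][4]; } mat4_t;

/* restrict tells the compiler that `out` never overlaps `in1` or `in2`,
   so it does not have to reload inputs after each store to `out`.
   Callers must not pass an input matrix as `out`. */
void Mat4Mult_Dest(mat4_t* restrict out,
                   const mat4_t* restrict in1,
                   const mat4_t* restrict in2) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
}

/* The local `out` cannot alias the inputs, so no annotation is needed. */
mat4_t Mat4Mult_Ret(const mat4_t* in1, const mat4_t* in2) {
    mat4_t out = {0};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            for (int k = 0; k < 4; ++k) {
                out.m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
    return out;
}

/* Hypothetical helper: computes a = a * b safely. The product is fully
   built in the callee's local matrix before it is assigned back to *a. */
void Mat4MultInPlace(mat4_t* a, const mat4_t* b) {
    *a = Mat4Mult_Ret(a, b);
}
```

With these definitions, Mat4Mult_Dest(&c, &a, &b) is fine as long as c is a separate matrix, while Mat4Mult_Dest(&a, &a, &b) must be avoided in favor of Mat4MultInPlace(&a, &b).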

huangapple
  • Published on June 13, 2023, 17:38:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76463555.html