Which variant of the 4x4 matrix multiplication function should I use?
Question
I am studying 3D rendering using OpenGL and C, and I'm writing a small math library as a learning exercise. Is it better to return the result of the matrix multiplication function with a `return` statement, or by modifying an output matrix via a pointer?
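```c
typedef float vec_t;
typedef struct mat4_s {
    vec_t m[4][4];
} mat4_t;

// Option 1: write the result through an output pointer.
void Mat4Mult(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    out->m[0][0] = /* ... */;
    out->m[1][0] = /* ... */;
    /* ... */
}

// Option 2: return the result by value.
// (C has no function overloading, so only one of these two
// can exist under the name Mat4Mult in a given program.)
mat4_t Mat4Mult(const mat4_t* in1, const mat4_t* in2) {
    mat4_t result;
    result.m[0][0] = /* ... */;
    result.m[1][0] = /* ... */;
    /* ... */
    return result;
}
```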
I want to understand which option is more correct. I think both options are correct, but I prefer to return the result of a function with a `return` statement. Please correct me if I'm wrong; I haven't fully mastered C.
Answer 1
Score: 1
It is very difficult to answer these questions by intuition, even if you have a mountain of experience. This is why you should try both, and profile the results. Let's compare the following naive 4x4 matrix multiplication functions:
```c
void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
}

mat4_t Mat4Mult_Ret(const mat4_t* in1, const mat4_t* in2) {
    mat4_t out = {0};
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out.m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out.m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
    return out;
}
```
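To try both variants yourself, as suggested above, here is a minimal sketch of a timing harness using standard C `clock()`. The helpers `bench_dest` and `bench_ret` are hypothetical names, not part of the original answer, and a hand-rolled loop like this is far less reliable than a dedicated benchmarking tool; treat its numbers as rough.

```c
#include <time.h>

typedef float vec_t;
typedef struct mat4_s {
    vec_t m[4][4];
} mat4_t;

void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out->m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out->m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
}

mat4_t Mat4Mult_Ret(const mat4_t* in1, const mat4_t* in2) {
    mat4_t out;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            out.m[i][j] = 0;
            for (int k = 0; k < 4; ++k) {
                out.m[i][j] += in1->m[i][k] * in2->m[k][j];
            }
        }
    }
    return out;
}

/* Hypothetical helper: performs `iters` multiplications acc = acc * step
   with the destination-pointer variant and returns elapsed CPU seconds.
   Feeding each result back into acc keeps the compiler from discarding
   the work as unused. */
double bench_dest(long iters, mat4_t* acc, const mat4_t* step) {
    clock_t start = clock();
    for (long n = 0; n < iters; ++n) {
        mat4_t tmp;
        Mat4Mult_Dest(&tmp, acc, step);
        *acc = tmp;
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Same loop, using the return-by-value variant. */
double bench_ret(long iters, mat4_t* acc, const mat4_t* step) {
    clock_t start = clock();
    for (long n = 0; n < iters; ++n) {
        *acc = Mat4Mult_Ret(acc, step);
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Compile with optimizations enabled (for example `-O2`) and pass an identity or otherwise well-conditioned `step` matrix; repeatedly multiplying by an arbitrary matrix can overflow to infinity, which would distort the timings. Because the functions sit in the same file as the loops, the compiler is free to inline them, which is exactly the effect discussed in the results.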
[Benchmark charts: Clang 15.0 results; GCC 12.2 results]
The results vary significantly between GCC and Clang. Looking at the assembly, this is probably because Clang inlined the `_Ret` version but not the `_Dest` version. GCC inlined both functions, so they perform essentially the same. This is unsurprising, because the two functions perform the same calculations.
Conclusion
According to the benchmarks, returning by value is at least as fast as writing to a destination matrix. It is more inlining-friendly for some compilers, which may improve performance. However, you could likely achieve the same results by annotating your functions so that they are more likely to be inlined.
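One way to annotate the functions, sketched here under the assumption that they live in a header included by the code that calls them: declare them `static inline`, which makes the body visible at every call site, and on GCC and Clang add `__attribute__((always_inline))` to request inlining even where the compiler's heuristics would decline. `MAT4_INLINE` is a hypothetical macro name, and whether forced inlining actually helps in your program should be confirmed by profiling.

```c
/* mat4.h (hypothetical header name) */
typedef float vec_t;
typedef struct mat4_s {
    vec_t m[4][4];
} mat4_t;

/* GCC and Clang both define __GNUC__; other compilers fall back to a
   plain inline hint. MAT4_INLINE is an illustrative name. */
#if defined(__GNUC__)
#define MAT4_INLINE static inline __attribute__((always_inline))
#else
#define MAT4_INLINE static inline
#endif

MAT4_INLINE mat4_t Mat4Mult_Ret(const mat4_t* in1, const mat4_t* in2) {
    mat4_t out;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            vec_t sum = 0; /* accumulate in a local, then store once */
            for (int k = 0; k < 4; ++k) {
                sum += in1->m[i][k] * in2->m[k][j];
            }
            out.m[i][j] = sum;
        }
    }
    return out;
}

/* The destination-pointer form can be a thin wrapper: computing into a
   temporary first also makes it safe when out aliases in1 or in2. */
MAT4_INLINE void Mat4Mult_Dest(mat4_t* out, const mat4_t* in1, const mat4_t* in2) {
    *out = Mat4Mult_Ret(in1, in2);
}
```

Writing `Mat4Mult_Dest` as a wrapper around `Mat4Mult_Ret` is one design choice among several; it keeps a single implementation and lets callers use whichever calling style reads better at each call site.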
It is worth noting that in `Mat4Mult_Ret`, `return out;` is writing to a destination in place anyway, because in the x86-64 System V ABI large objects are returned through a hidden destination pointer supplied by the caller (passed in `rdi`):
```asm
Mat4Mult_Ret:
        // ...
        // last 4 instructions move result to destination pointer
        movups  xmmword ptr [rdi + 48], xmm1
        movups  xmmword ptr [rdi + 32], xmm7
        movups  xmmword ptr [rdi + 16], xmm4
        movups  xmmword ptr [rdi], xmm3
        ret
```
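To make the ABI point concrete, the following sketch writes out by hand roughly what a compiler targeting the x86-64 System V ABI does with `Mat4Mult_Ret`: the caller reserves storage for the 64-byte result and passes its address as a hidden first argument (in `rdi`), and the callee fills that storage and returns the same address (in `rax`). `Mat4Mult_Ret_Lowered` is a hypothetical name used only for illustration; real compilers perform this transformation internally, and the details depend on the target ABI.

```c
typedef float vec_t;
typedef struct mat4_s {
    vec_t m[4][4];
} mat4_t;

/* Hand-written model of the lowered return-by-value function.
   hidden_out plays the role of the pointer the caller passes in rdi. */
mat4_t* Mat4Mult_Ret_Lowered(mat4_t* hidden_out,
                             const mat4_t* in1, const mat4_t* in2) {
    mat4_t result; /* computed in locals/registers first... */
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            vec_t sum = 0;
            for (int k = 0; k < 4; ++k) {
                sum += in1->m[i][k] * in2->m[k][j];
            }
            result.m[i][j] = sum;
        }
    }
    *hidden_out = result; /* ...then stored through the hidden pointer,
                             like the four movups at the end of the listing */
    return hidden_out;    /* the ABI hands the same address back in rax */
}
```

Under this model, a call such as `mat4_t c = Mat4Mult_Ret(&a, &b);` corresponds roughly to `mat4_t c; Mat4Mult_Ret_Lowered(&c, &a, &b);`, which is essentially the destination-pointer style written out by the compiler on your behalf.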
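There is one notable difference between your functions, though: `mat4_t* out` can alias `in1` and `in2`, but a local `mat4_t out` cannot. For example, `Mat4Mult_Dest(&a, &a, &b)` is legal C, so the compiler must assume that every store to `out->m[i][j]` may change the inputs it reads next (and in that call it does, which produces a wrong result). Consider marking your pointers `restrict` to give the compiler more optimization freedom.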
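A minimal sketch of the destination-pointer variant with `restrict`-qualified parameters (C99 or later). `restrict` is a promise made by the caller, not something the compiler checks: the matrix written through `out` must not also be accessed through `in1` or `in2` during the call, otherwise the behavior is undefined. Because `in1` and `in2` are only read, passing the same matrix for both (for example, to square a matrix) is still allowed. The `assert` is an added debug-build check of that contract, not part of the original answer.

```c
#include <assert.h>

typedef float vec_t;
typedef struct mat4_s {
    vec_t m[4][4];
} mat4_t;

/* restrict tells the compiler that stores through out cannot change
   *in1 or *in2, so it may keep inputs in registers across the stores
   instead of reloading them after every write. */
void Mat4Mult_Dest(mat4_t* restrict out,
                   const mat4_t* restrict in1,
                   const mat4_t* restrict in2) {
    /* Catches whole-matrix aliasing in debug builds; compiled out
       when NDEBUG is defined. */
    assert(out != in1 && out != in2);
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            vec_t sum = 0;
            for (int k = 0; k < 4; ++k) {
                sum += in1->m[i][k] * in2->m[k][j];
            }
            out->m[i][j] = sum;
        }
    }
}
```

With this signature a call like `Mat4Mult_Dest(&a, &a, &b)` breaks the contract, so code that needs in-place updates should multiply into a temporary and copy it back, or use the return-by-value form.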