Parallelise 2 for loops with OpenMP


Question


So I have this function that I have to parallelize with OpenMP static scheduling for n threads:

```c
void computeAccelerations(){
    int i,j;
    for(i=0;i<bodies;i++){
        accelerations[i].x = 0; accelerations[i].y = 0; accelerations[i].z = 0;
        for(j=0;j<bodies;j++){
            if(i!=j){
                //accelerations[i] = addVectors(accelerations[i],scaleVector(GravConstant*masses[j]/pow(mod(subtractVectors(positions[i],positions[j])),3),subtractVectors(positions[j],positions[i])));
                vector sij = {positions[i].x-positions[j].x,positions[i].y-positions[j].y,positions[i].z-positions[j].z};
                vector sji = {positions[j].x-positions[i].x,positions[j].y-positions[i].y,positions[j].z-positions[i].z};
                double mod = sqrt(sij.x*sij.x + sij.y*sij.y + sij.z*sij.z);
                double mod3 = mod * mod * mod;
                double s = GravConstant*masses[j]/mod3;
                vector S = {s*sji.x,s*sji.y,s*sji.z};
                accelerations[i].x+=S.x;accelerations[i].y+=S.y;accelerations[i].z+=S.z;
            }
        }
    }
}
```

I tried to do something like:
```c
void computeAccelerations_static(int num_of_threads){
    int i,j;
    #pragma omp parallel for num_threads(num_of_threads) schedule(static)
    for(i=0;i<bodies;i++){
        accelerations[i].x = 0; accelerations[i].y = 0; accelerations[i].z = 0;
        for(j=0;j<bodies;j++){
            if(i!=j){
                //accelerations[i] = addVectors(accelerations[i],scaleVector(GravConstant*masses[j]/pow(mod(subtractVectors(positions[i],positions[j])),3),subtractVectors(positions[j],positions[i])));
                vector sij = {positions[i].x-positions[j].x,positions[i].y-positions[j].y,positions[i].z-positions[j].z};
                vector sji = {positions[j].x-positions[i].x,positions[j].y-positions[i].y,positions[j].z-positions[i].z};
                double mod = sqrt(sij.x*sij.x + sij.y*sij.y + sij.z*sij.z);
                double mod3 = mod * mod * mod;
                double s = GravConstant*masses[j]/mod3;
                vector S = {s*sji.x,s*sji.y,s*sji.z};
                accelerations[i].x+=S.x;accelerations[i].y+=S.y;accelerations[i].z+=S.z;
            }
        }
    }
}
```

It comes naturally to just add the `#pragma omp parallel for num_threads(num_of_threads) schedule(static)`, but it isn't correct.

I think there is some kind of false sharing with `accelerations[i]`, but I don't know how to approach it. I appreciate any kind of help. Thank you.
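(For reference, a minimal sketch of the surrounding declarations these snippets rely on; everything below is inferred from usage in the question rather than posted by the asker, and the values are placeholders:)

```c
// Assumed surrounding declarations (hypothetical; inferred from usage):
#include <math.h>   // sqrt

typedef struct { double x, y, z; } vector;

#define bodies 256                  /* number of bodies (placeholder value) */
vector positions[bodies];           /* position of each body */
vector accelerations[bodies];       /* acceleration accumulated per body */
double masses[bodies];              /* mass of each body */
double GravConstant = 6.674e-11;    /* gravitational constant (assumed value) */
```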

  1. <details>
  2. <summary>英文:</summary>
  3. So I have this function that I have to parallelize with OpenMP static scheduling for n threads
  4. void computeAccelerations(){
  5. int i,j;
  6. for(i=0;i&lt;bodies;i++){
  7. accelerations[i].x = 0; accelerations[i].y = 0; accelerations[i].z = 0;
  8. for(j=0;j&lt;bodies;j++){
  9. if(i!=j){
  10. //accelerations[i] = addVectors(accelerations[i],scaleVector(GravConstant*masses[j]/pow(mod(subtractVectors(positions[i],positions[j])),3),subtractVectors(positions[j],positions[i])));
  11. vector sij = {positions[i].x-positions[j].x,positions[i].y-positions[j].y,positions[i].z-positions[j].z};
  12. vector sji = {positions[j].x-positions[i].x,positions[j].y-positions[i].y,positions[j].z-positions[i].z};
  13. double mod = sqrt(sij.x*sij.x + sij.y*sij.y + sij.z*sij.z);
  14. double mod3 = mod * mod * mod;
  15. double s = GravConstant*masses[j]/mod3;
  16. vector S = {s*sji.x,s*sji.y,s*sji.z};
  17. accelerations[i].x+=S.x;accelerations[i].y+=S.y;accelerations[i].z+=S.z;
  18. }
  19. }
  20. }
  21. }
  22. I tried to do something like:

void computeAccelerations_static(int num_of_threads){
int i,j;
#pragma omp parallel for num_threads(num_of_threads) schedule(static)
for(i=0;i<bodies;i++){
accelerations[i].x = 0; accelerations[i].y = 0; accelerations[i].z = 0;
for(j=0;j<bodies;j++){
if(i!=j){
//accelerations[i] = addVectors(accelerations[i],scaleVector(GravConstantmasses[j]/pow(mod(subtractVectors(positions[i],positions[j])),3),subtractVectors(positions[j],positions[i])));
vector sij = {positions[i].x-positions[j].x,positions[i].y-positions[j].y,positions[i].z-positions[j].z};
vector sji = {positions[j].x-positions[i].x,positions[j].y-positions[i].y,positions[j].z-positions[i].z};
double mod = sqrt(sij.x
sij.x + sij.ysij.y + sij.zsij.z);
double mod3 = mod * mod * mod;
double s = GravConstantmasses[j]/mod3;
vector S = {s
sji.x,ssji.y,ssji.z};
accelerations[i].x+=S.x;accelerations[i].y+=S.y;accelerations[i].z+=S.z;
}
}
}

  1. It comes naturally to just add the ```#pragma omp parallel for num_threads(num_of_threads) schedule(static)``` but it isn&#39;t correct.
  2. I think there is some kind of false sharing with the ``accelerations[i]`` but I don&#39;t know how to approach it. I appreciate any kind of help. Thank you.
  3. </details>
# Answer 1
**Score:** 2
In your loop nest, only the iterations of the outer loop are parallelized. Because `i` is the loop-control variable, each thread gets its own private copy, but as a matter of style it would be better to declare `i` in the loop-control block.

`j` is another matter. It is declared outside the parallel region and it is *not* the control variable of a parallelized loop. As a result, it is shared among the threads. Because each of the threads executing `i`-loop iterations manipulates the shared variable `j`, you have a huge problem with data races. This can be resolved (among other alternatives) by moving the declaration of `j` into the parallel region, preferably into the control block of its associated loop.
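One of those other alternatives, for illustration, is to leave the declarations where they are and mark `j` private on the directive instead; a sketch (the loop body is unchanged):

```c
int i, j;
// private(j) gives each thread its own copy of j; i is the control
// variable of the parallelized loop and is therefore already private.
#pragma omp parallel for num_threads(num_of_threads) schedule(static) private(j)
for (i = 0; i < bodies; i++) {
    // ... same body as in computeAccelerations_static above ...
}
```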
Overall, then, the code should look like this:
```c
// int i, j;
#pragma omp parallel for num_threads(num_of_threads) schedule(static)
for (int i = 0; i < bodies; i++) {
    accelerations[i].x = 0;
    accelerations[i].y = 0;
    accelerations[i].z = 0;
    for (int j = 0; j < bodies; j++) {
        if (i != j) {
            //accelerations[i] = addVectors(accelerations[i],scaleVector(GravConstant*masses[j]/pow(mod(subtractVectors(positions[i],positions[j])),3),subtractVectors(positions[j],positions[i])));
            vector sij = { positions[i].x - positions[j].x,
                           positions[i].y - positions[j].y,
                           positions[i].z - positions[j].z };
            vector sji = { positions[j].x - positions[i].x,
                           positions[j].y - positions[i].y,
                           positions[j].z - positions[i].z };
            double mod = sqrt(sij.x * sij.x + sij.y * sij.y + sij.z * sij.z);
            double mod3 = mod * mod * mod;
            double s = GravConstant * masses[j] / mod3;
            vector S = { s * sji.x, s * sji.y, s * sji.z };
            accelerations[i].x += S.x;
            accelerations[i].y += S.y;
            accelerations[i].z += S.z;
        }
    }
}
```

Note also that computing `sji` appears to be wasteful, as in mathematical terms it is just `-sij` (so `s * sji.x == -(s * sij.x)`, and likewise for `y` and `z`), and neither `sji` nor `sij` is modified. I would probably reduce the above to something more like this:

```c
#pragma omp parallel for num_threads(num_of_threads) schedule(static)
for (int i = 0; i < bodies; i++) {
    accelerations[i].x = 0;
    accelerations[i].y = 0;
    accelerations[i].z = 0;
    for (int j = 0; j < bodies; j++) {
        if (i != j) {
            vector sij = { positions[i].x - positions[j].x,
                           positions[i].y - positions[j].y,
                           positions[i].z - positions[j].z };
            double mod = sqrt(sij.x * sij.x + sij.y * sij.y + sij.z * sij.z);
            double mod3 = mod * mod * mod;
            double s = GravConstant * masses[j] / mod3;
            accelerations[i].x -= s * sij.x;
            accelerations[i].y -= s * sij.y;
            accelerations[i].z -= s * sij.z;
        }
    }
}
```
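As a quick sanity check, one can compare the parallel result against the serial reference. A minimal sketch, assuming the hypothetical declarations guessed at earlier plus both functions from the question:

```c
#include <math.h>    // fabs, fmax
#include <stdio.h>

int main(void) {
    // Arbitrary deterministic test data.
    for (int i = 0; i < bodies; i++) {
        positions[i].x = i;
        positions[i].y = 2.0 * i;
        positions[i].z = 3.0 * i;
        masses[i] = 1.0e10;
    }

    computeAccelerations();              // serial reference
    static vector ref[bodies];
    for (int i = 0; i < bodies; i++) ref[i] = accelerations[i];

    computeAccelerations_static(4);      // parallel version, 4 threads
    double maxdiff = 0.0;
    for (int i = 0; i < bodies; i++) {
        maxdiff = fmax(maxdiff, fabs(accelerations[i].x - ref[i].x));
        maxdiff = fmax(maxdiff, fabs(accelerations[i].y - ref[i].y));
        maxdiff = fmax(maxdiff, fabs(accelerations[i].z - ref[i].z));
    }
    printf("max component difference: %g\n", maxdiff);
    return 0;
}
```

Built with, for example, `gcc -fopenmp -O2 nbody.c -lm` (filename assumed), the printed difference should be zero: each `i` iteration is still computed by a single thread, in the same order as the serial code.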
