OpenMP parallelization of nested loops

# Question


I am practicing for an exam and I would like to know how to solve the following C program correctly using OpenMP.

The following program searches for the biggest element in positions i..N-1 of the vector values[] and stores the results in a second vector (maxs[]). Parallelize this program by adding the appropriate OpenMP statements. Provide two different solutions:
1. One based on the parallelization of the outer loop
2. One based on the parallelization of the inner loop

Add variables and changes to the code in order to achieve a more efficient solution (the score of the exercise will take this into account).

Provide explanations that describe how the program is parallelized and how the threads are going to collaborate in each case, assuming the application will run with T threads (T << N).

```C
int values[N], maxs[N];
int i, j, s;

for (i = 0; i < N; i++) {
    s = values[i];

    for (j = i + 1; j < N; j++)
        if (s < values[j])
            s = values[j];

    maxs[i] = s;
}
```

For the first case, the answer I would provide is the following:

```C
int values[N], maxs[N];
int s;

#pragma omp parallel for schedule(dynamic, 1)
for (int i = 0; i < N; i++) {
    s = values[i];

    for (int j = i + 1; j < N; j++)
        if (s < values[j])
            s = values[j];

    maxs[i] = s;
}
```

I am simply parallelizing the outer loop, using a dynamic schedule, so whenever a thread becomes available it picks up the next iteration. I do not use an ordered clause because the order of execution does not matter in this case. I have also not used a collapse(2) clause, because the statement specifies that only the outer loop (not the inner loop) should be parallelized.
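
For reference, a minimal sketch (using the same N, values[] and maxs[] declarations as above) of how the outer-loop variant can be written so that s is private to each thread: declaring s inside the loop body makes it thread-private, whereas a single s declared before the parallel loop, as in the attempt above, is shared by all threads and would be a data race.

```C
#pragma omp parallel for schedule(dynamic, 1)
for (int i = 0; i < N; i++) {
    int s = values[i];              /* declared inside the loop: private to each thread */

    for (int j = i + 1; j < N; j++)
        if (s < values[j])
            s = values[j];

    maxs[i] = s;                    /* each iteration writes a different element */
}
```

Dynamic (or guided) scheduling is a sensible choice here because iteration i scans N-i-1 elements, so the work shrinks as i grows and a plain static partition would leave the threads handling the last chunks almost idle.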

In the second case, my answer would be the following:

```C
int values[N], maxs[N];
int s;

for (int i = 0; i < N; i++) {
    s = values[i];

    #pragma omp parallel for schedule(dynamic, 1)
    for (int j = i + 1; j < N; j++)
        if (s < values[j])
            s = values[j];

    maxs[i] = s;
}
```
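
For comparison, a sketch (again with the same declarations) of how the inner-loop version is usually expressed with OpenMP's max reduction, available since OpenMP 3.1: each thread keeps a private running maximum over its share of the j range and the partial maxima are combined at the end of the inner loop, whereas a single shared s updated by all threads, as in the attempt above, would race.

```C
for (int i = 0; i < N; i++) {
    int s = values[i];

    /* Every thread gets a private copy of s (initialized to the reduction
       identity) and the copies are merged with max when the loop finishes. */
    #pragma omp parallel for reduction(max:s)
    for (int j = i + 1; j < N; j++)
        if (s < values[j])
            s = values[j];

    maxs[i] = s;
}
```

Note that this variant enters a parallel region on every one of the N outer iterations, which pays the fork/join and reduction overhead N times; that is the main reason the outer-loop parallelization usually scales better for this problem.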

How could I optimize the code in order to obtain the highest grade possible?

Thanks! :)


# Answer 1
**Score**: 1

Ok, this is not strictly following your instructions, but maybe it will benefit someone:

```plaintext
data:  1  1  4  3  3  2  1  4  4  5  2  3  5  1  1  3  4  1  2  1  3
maxs:  5  5  5  5  5  5  5  5  5  5  5  5  5  4  4  4  4  3  3  3  3
```

This was generated (minus the boilerplate and printing) by:

```C++
int partial_max = 0;
#pragma omp parallel for reduction(inscan, max : partial_max)
for (int i = values.size() - 1; i >= 0; i--) {
    auto v = values[i];
    partial_max = std::max(partial_max, v);   // input phase: fold values[i] into the running max
#pragma omp scan inclusive(partial_max)
    max_so_far[i] = partial_max;              // scan phase: inclusive max of values[i..end]
}
```

Note that scan is fairly new and not every compiler may support it yet. I am using gcc 12.
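
If you want to try the same idea against the exam's plain C arrays, here is a hedged adaptation of the inclusive max-scan in C (my own sketch, not part of the answer above; the helper name suffix_max is just for illustration, and it assumes a compiler with OpenMP 5.0 scan support, e.g. a recent gcc with -fopenmp):

```C
#include <limits.h>

/* maxs[i] = max(values[i..N-1]), computed as an inclusive max-scan
   while traversing the array from the last element to the first. */
void suffix_max(const int values[], int maxs[], int N)
{
    int partial_max = INT_MIN;                /* identity element of the max reduction */

#pragma omp parallel for reduction(inscan, max : partial_max)
    for (int i = N - 1; i >= 0; i--) {
        if (values[i] > partial_max)          /* input phase: fold in values[i] */
            partial_max = values[i];
#pragma omp scan inclusive(partial_max)
        maxs[i] = partial_max;                /* scan phase: max seen so far from the right */
    }
}
```

The scan replaces the O(N^2) nested-loop formulation with a single O(N) pass, so on large N it can beat either of the loop-level parallelizations, at the cost of requiring a newer OpenMP implementation.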

