OpenMP parallelization of nested loops

# Question


I am practicing for an exam and I would like to know how to solve the following C program correctly using OpenMP.

The following program searches for the biggest element in positions i..N-1 of the vector values[] and stores the results in a second vector (maxs[]). Parallelize this program by adding the appropriate OpenMP statements. Provide two different solutions:
1. One based on the parallelization of the outer loop
2. One based on the parallelization of the inner loop

Add variables and changes to the code in order to achieve a more efficient solution (the score of the exercise will take this into account).

Provide explanations that describe how the program is parallelized and how the threads are going to collaborate in each case, assuming the application will run with T threads (T << N).

```C
int values[N], maxs[N];
int i, j, s;

for (i = 0; i < N; i++) {
    s = values[i];

    for (j = i + 1; j < N; j++)
        if (s < values[j])
            s = values[j];

    maxs[i] = s;
}
```

For the first case, the answer I would provide is the following:

```C
int values[N], maxs[N];
int s;

#pragma omp parallel for schedule(dynamic, 1)
for (int i = 0; i < N; i++) {
    s = values[i];

    for (int j = i + 1; j < N; j++)
        if (s < values[j])
            s = values[j];

    maxs[i] = s;
}
```

I am simply parallelizing the outer loop, using a dynamic schedule, so whenever a thread becomes available it picks up the next iteration. I do not use an ordered clause because the order of execution does not matter in this case. I have also not used a collapse(2) clause, because the statement specifies that only the outer loop (not the inner loop) should be parallelized.
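
For reference, a minimal sketch (using the same N, values[] and maxs[] declarations as above) of how the outer-loop variant can be written so that s is private to each thread: declaring s inside the loop body makes it thread-private, whereas a single s declared before the parallel loop, as in the attempt above, is shared by all threads and would be a data race.

```C
#pragma omp parallel for schedule(dynamic, 1)
for (int i = 0; i < N; i++) {
    int s = values[i];              /* declared inside the loop: private to each thread */

    for (int j = i + 1; j < N; j++)
        if (s < values[j])
            s = values[j];

    maxs[i] = s;                    /* each iteration writes a different element */
}
```

Dynamic (or guided) scheduling is a sensible choice here because iteration i scans N-i-1 elements, so the work shrinks as i grows and a plain static partition would leave the threads handling the last chunks almost idle.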

In the second case, my answer would be the following:

```C
int values[N], maxs[N];
int s;

for (int i = 0; i < N; i++) {
    s = values[i];

    #pragma omp parallel for schedule(dynamic, 1)
    for (int j = i + 1; j < N; j++)
        if (s < values[j])
            s = values[j];

    maxs[i] = s;
}
```
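
For comparison, a sketch (again with the same declarations) of how the inner-loop version is usually expressed with OpenMP's max reduction, available since OpenMP 3.1: each thread keeps a private running maximum over its share of the j range and the partial maxima are combined at the end of the inner loop, whereas a single shared s updated by all threads, as in the attempt above, would race.

```C
for (int i = 0; i < N; i++) {
    int s = values[i];

    /* Every thread gets a private copy of s (initialized to the reduction
       identity) and the copies are merged with max when the loop finishes. */
    #pragma omp parallel for reduction(max:s)
    for (int j = i + 1; j < N; j++)
        if (s < values[j])
            s = values[j];

    maxs[i] = s;
}
```

Note that this variant enters a parallel region on every one of the N outer iterations, which pays the fork/join and reduction overhead N times; that is the main reason the outer-loop parallelization usually scales better for this problem.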

How could I optimize the code in order to obtain the highest grade possible?

Thanks! :)


# Answer 1
**Score**: 1

Ok, this is not strictly following your instructions, but maybe it will benefit someone:

```plaintext
data:  1  1  4  3  3  2  1  4  4  5  2  3  5  1  1  3  4  1  2  1  3
maxs:  5  5  5  5  5  5  5  5  5  5  5  5  5  4  4  4  4  3  3  3  3
```

This was generated (minus the boilerplate and printing) by:

```C++
int partial_max = 0;
#pragma omp parallel for reduction(inscan, max : partial_max)
for (int i = values.size() - 1; i >= 0; i--) {
    auto v = values[i];
    partial_max = std::max(partial_max, v);   // input phase: fold values[i] into the running max
#pragma omp scan inclusive(partial_max)
    max_so_far[i] = partial_max;              // scan phase: inclusive max of values[i..end]
}
```

Note that scan is fairly new and not every compiler may support it yet. I am using gcc 12.
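
If you want to try the same idea against the exam's plain C arrays, here is a hedged adaptation of the inclusive max-scan in C (my own sketch, not part of the answer above; the helper name suffix_max is just for illustration, and it assumes a compiler with OpenMP 5.0 scan support, e.g. a recent gcc with -fopenmp):

```C
#include <limits.h>

/* maxs[i] = max(values[i..N-1]), computed as an inclusive max-scan
   while traversing the array from the last element to the first. */
void suffix_max(const int values[], int maxs[], int N)
{
    int partial_max = INT_MIN;                /* identity element of the max reduction */

#pragma omp parallel for reduction(inscan, max : partial_max)
    for (int i = N - 1; i >= 0; i--) {
        if (values[i] > partial_max)          /* input phase: fold in values[i] */
            partial_max = values[i];
#pragma omp scan inclusive(partial_max)
        maxs[i] = partial_max;                /* scan phase: max seen so far from the right */
    }
}
```

The scan replaces the O(N^2) nested-loop formulation with a single O(N) pass, so on large N it can beat either of the loop-level parallelizations, at the cost of requiring a newer OpenMP implementation.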

