omp for loop for constexpr indexes

Question

Suppose I have a function that depends on a non-type template parameter of type std::size_t, which can take the values 0, ..., N-1, with N known at compile time.
An iteration over all values can be done with a std::index_sequence or with template recursion. E.g.:

```cpp
#include <cstddef>
#include <utility>

template <std::size_t I>
void f() {
    // ...
}

// Calls f<0>(), f<1>(), ..., f<N-1>() in order via a comma fold expression.
template <std::size_t... I>
void loop_f_impl(std::index_sequence<I...>) {
    (f<I>(), ...);
}

template <std::size_t N>
void loop_f() {
    loop_f_impl(std::make_index_sequence<N>{});
}

int main() {
    constexpr std::size_t N = 4;
    loop_f<N>();
}
```

How can I convert the "unrolled loop" into a standard for loop that I can parallelize with OpenMP? Something like this (which obviously does not compile):

```cpp
#pragma omp for
for (std::size_t i = 0; i < N; ++i)
    f<i>();  // error: i is not a constant expression
```

Clearly, if, say, N = 3, I could implement it with a switch:

```cpp
#pragma omp for
for (std::size_t i = 0; i < N; ++i)
    switch (i) {
    case 0:
        f<0>();
        break;
    case 1:
        f<1>();
        break;
    case 2:
        f<2>();
        break;
    }
```

However, I am interested in generic code that works for every N.

Answer 1

Score: 4

> omp for loop for constexpr indexes

You could change f to take I as an ordinary function argument instead, since i in your for loop is not constexpr and cannot be used where a constant expression is required, such as a template argument.

```cpp
void f(std::size_t I) {
    // I is now a runtime value, so f(i) can be called from the OpenMP loop
}
```

Another option, without OpenMP, is to launch all the f<I>() calls asynchronously:

```cpp
#include <future>
#include <tuple>
#include <utility>

template <std::size_t... I>
void loop_f_impl(std::index_sequence<I...>) {
    // One std::async task per index; the futures are kept in a tuple.
    std::tuple all{ std::async(std::launch::async, f<I>)... };
} // here all futures stored in the `tuple` wait until done
```

An alternative could be to use one of the standard execution policies (available since C++17) with std::for_each, directly from loop_f. Example:

```cpp
#include <algorithm>
#include <array>
#include <execution>
#include <utility>

template <std::size_t N>
void loop_f() {
    // C++20 lambda template: builds std::array{f<0>, ..., f<N-1>}
    constexpr auto funcs = []<std::size_t... Is>(std::index_sequence<Is...>) {
        return std::array{f<Is>...};
    }(std::make_index_sequence<N>{});
    // Call every function pointer in the array under a parallel policy.
    std::for_each(std::execution::par_unseq, funcs.begin(), funcs.end(),
                  [](auto func) { func(); });
}
```

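This will make use of Intel® oneAPI Threading Building Blocks or whatever your standard library implementation uses as its parallel backend (with GCC's libstdc++, for example, the backend is oneTBB and you need to link with -ltbb).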

huangapple
  • Published on July 18, 2023, 15:17:28
  • If you repost this article, please keep its link: https://go.coder-hub.com/76710335.html