Parallel threads doing the same until a barrier, after barrier the result is not always the same
Question
Several threads increment values in an array in parallel. Each thread increments its own element and never changes the others'. After each increment there is a barrier, so the threads wait for each other and continue to the next loop iteration only when all of them are done. This way, I suppose, every time all the threads reach the barrier, all array elements hold the same value. However, this is not always the case. Why not?
The code:
```cpp
#include <iostream>
#include <omp.h>
#include <vector>
#include <chrono>

int main() {
    const int num_threads = 4;
    const auto start_time = std::chrono::high_resolution_clock::now();
    std::vector<int> arr(num_threads, 0);

#pragma omp parallel num_threads(num_threads)
    {
        int thread_num = omp_get_thread_num();
        while (true) {
            // Check elapsed time
            auto now = std::chrono::high_resolution_clock::now();
            auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(now - start_time).count();
            if (elapsed >= 10) break;

            arr[thread_num]++;
            std::cout << "Thread " << thread_num << " done grinding.\n";

#pragma omp barrier // Wait for other threads

            // Read other threads' values
            for (int i = 0; i < num_threads; ++i) {
                if (i != thread_num && arr[i] != arr[thread_num]) {
                    std::cout << "Thread " << thread_num << ": My value is " << arr[thread_num]
                              << ", but thread " << i << "'s value is " << arr[i] << ".\n";
                }
            }
        }
    }

    std::cout << "Final array values:\n";
    for (int i = 0; i < num_threads; ++i) {
        std::cout << "Index " << i << ": " << arr[i] << "\n";
    }
    return 0;
}
```
The output (the end of it):
```
Thread 3 done grinding.
Thread 2 done grinding.
Thread 1: My value is 271402, but thread 2's value is 271403.
Thread 1 done grinding.
Thread 3: My value is 271402, but thread 1's value is 271403.
Thread 3: My value is 271402, but thread 2's value is 271403.
Thread 0: My value is 271402, but thread 2's value is 271403.
Thread 1: My value is 271403, but thread 0's value is 271402.
Thread 1: My value is 271403, but thread 3's value is 271402.
Thread 2: My value is 271403, but thread 0's value is 271402.
Thread 2: My value is 271403, but thread 3's value is 271402.
Final array values:
Index 0: 271402
Index 1: 271403
Index 2: 271403
Index 3: 271402
```
Answer 1
Score: 1
The code as written hangs nearly every time I run it and never finishes, because one of the threads gets stuck at the barrier waiting for others that have already left the loop. This happens because one thread checks `elapsed` at 9.99999 seconds and proceeds, while another checks it at 10 seconds and exits. (OpenMP requires every barrier to be encountered by all threads of the team or by none, so what happens after that is not something the program can rely on.) This is also why the final count can be higher on some threads than on others: they go one extra round. Midway through the run, the values are sometimes unequal for a different reason. After the barrier, some threads spend longer in the `for` loop comparing and printing values than others, and the faster threads have already wrapped around, incremented their own element again, and moved on to the next barrier while the slower ones are still reading it.
To terminate the threads cleanly and keep them in lock step, a shared flag is needed that is flushed with `#pragma omp flush` so that all threads break in the same round; another barrier is also needed after the comparison-and-print `for` loop:
```cpp
#include <iostream>
#include <omp.h>
#include <vector>
#include <chrono>
#include <sstream>

int main()
{
    const auto start_time = std::chrono::high_resolution_clock::now();
    const int num_threads = 4;
    std::vector<int> arr(num_threads, 0);
    bool breakFlag = false;  // shared stop decision, read by every thread after the first barrier
#pragma omp parallel num_threads(num_threads) default(none) shared(start_time, arr, std::cout, breakFlag)
    {
        int thread_num = omp_get_thread_num();
        while (true) {
            // Check elapsed time; any thread that sees the deadline raises the flag
            auto now = std::chrono::high_resolution_clock::now();
            auto elapsed = std::chrono::duration_cast<std::chrono::seconds>(now - start_time).count();
            if (elapsed >= 10) {
                // Several threads may set the flag in the same round; an atomic write avoids a data race
#pragma omp atomic write
                breakFlag = true;
            }
            arr[thread_num]++;
            // Build each line first so concurrent output is not interleaved mid-line
            std::stringstream ss;
            ss << "Thread " << thread_num << " done grinding.\n";
            std::cout << ss.str();
#pragma omp barrier // Wait for other threads
            // The barrier already implies a flush; the explicit flush only makes the intent visible
#pragma omp flush(breakFlag)
            // No thread writes breakFlag between the two barriers, so all threads take the same branch
            if (breakFlag) break;
            // Read other threads' values
            for (int i = 0; i < num_threads; ++i) {
                if (i != thread_num && arr[i] != arr[thread_num]) {
                    std::stringstream _ss;
                    _ss << "Thread " << thread_num << ": My value is " << arr[thread_num]
                        << ", but thread " << i << "'s value is " << arr[i] << ".\n";
                    std::cout << _ss.str();
                }
            }
#pragma omp barrier // Wait until everyone has finished reading before anyone increments again
        }
    }
    std::cout << "Final array values:\n";
    for (int i = 0; i < num_threads; ++i) {
        std::cout << "Index " << i << ": " << arr[i] << "\n";
    }
    return 0;
}
```