pthread stall on join after barriers in thread function

Question

I was looking further into pthread barriers, following the pthread tutorial at pthread Tutorial - Peter Chapin, 3.2 Barriers, pg 11, which uses two barriers in the thread function. The first, loop_barrier, suspends all threads until every one of them has reached it; the arbitrary thread for which pthread_barrier_wait() returns PTHREAD_BARRIER_SERIAL_THREAD is elected to do any serial cleanup. The second, prep_barrier, keeps all threads suspended until that serial cleanup is done.

My understanding is that this lets threads run continually while synchronizing at a given point in the processing, so that all work for a cycle is completed before the threads continue running concurrently. The example simply shows one such cycle, after which a done flag is set and the thread function returns.

All threads do suspend and wait on loop_barrier and prep_barrier, but after the thread function returns the program stalls on the first pthread_join(), which gdb explains, rather unhelpfully, as "in pthread_barrier_destroy () from /lib64/libpthread.so.0".

The tutorial provides only a framework for the thread function and main program, and I simply added the minimum needed to complete it: a struct holding the different loop limits for the for loop in each thread function, plus members for the thread index and the sum of the for loop variable values. Apparently I didn't understand the barriers quite as completely as I thought I did. The code causing the hang-on-join is:

#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <pthread.h>

#define handle_error_en(en, msg) \
  do { errno = en; perror(msg); exit(EXIT_FAILURE); } while (0)

#define handle_error(msg) \
  do { perror(msg); exit(EXIT_FAILURE); } while (0)

#define NCPU 4
#define ITER_PER_CPU  100

typedef struct {
  int index, start, end;
  unsigned sum;
} loop_data;

pthread_barrier_t loop_barrier;   /* global barriers (could pass in data) */
pthread_barrier_t prep_barrier;

void *thread_fn (void *data)
{
  int done = 0, 
      i = 0;
  loop_data *thread_data = data;
  
  do {
    for (i = thread_data->start; i < thread_data->end; i++) {
      /* each arg gets separate loop_data - do work */
      thread_data->sum += i;
    }
    
    /* suspend on barrier and do any per-cycle cleanup */
    if (pthread_barrier_wait (&loop_barrier) == PTHREAD_BARRIER_SERIAL_THREAD) {
      puts ("PTHREAD_BARRIER_SERIAL_THREAD");
      /* no actual per-cycle cleanup, just set done flag */
      done = 1;
    }
    /* suspend on barrier until per-cycle cleanup complete */
    pthread_barrier_wait (&prep_barrier);
    
    /* sum is unsigned, so print it with %u */
    printf ("thread index: %d, sum: %u\n", 
            thread_data->index, thread_data->sum);
    
  } while (!done);
  
  return data;
}

int main (void) {

  pthread_t id[NCPU];
  pthread_attr_t attr;
  loop_data arr[NCPU] = {{ .start = 0 }};
  void *res;
  int rtn = 0;
  
  /* initialize barriers and validate */
  if ((rtn = pthread_barrier_init (&loop_barrier, NULL, NCPU))) {
    handle_error_en (rtn, "pthread_barrier_init-loop_barrier");
  }
  if ((rtn = pthread_barrier_init (&prep_barrier, NULL, NCPU))) {
    handle_error_en (rtn, "pthread_barrier_init-prep_barrier");
  }
  
  /* initialize thread attributes (using defaults) and validate */
  if ((rtn = pthread_attr_init (&attr))) {
    handle_error_en (rtn, "pthread_attr_init");
  }
  
  /* set data index, start, end and create/validate each thread */ 
  for (int i = 0; i < NCPU; i++) {
    /* initialize index, start / end values */
    arr[i].index = i;
    arr[i].start = i * ITER_PER_CPU;
    arr[i].end = (i + 1) * ITER_PER_CPU;
    printf ("id: %d, start: %3d, end: %3d\n", i, arr[i].start, arr[i].end);
    /* create thread and validate */
    if ((rtn = pthread_create (&id[i], &attr, thread_fn, &arr[i]))) {
      handle_error_en (rtn, "pthread_create");
    }
  }

  /* join all threads and compare sums from threads with sums in main */
  for (int i = 0; i < NCPU; i++) {
    loop_data *data = NULL;
    /* join and validate */
    printf ("joining thread index: %d\n", i);
    if ((rtn = pthread_join (id[i], &res))) {
      fprintf (stderr, "error: thread %d\n", i);
      handle_error_en (rtn, "pthread_join");
    }
    data = res;   /* pointer to return struct provided through parameter */
    printf ("thread index: %d joined\n", data->index);
  }
  
  /* destroy barriers and validate */
  if ((rtn = pthread_barrier_destroy (&loop_barrier))) {
    handle_error_en (rtn, "pthread_barrier_destroy-loop_barrier");
  }
  if ((rtn = pthread_barrier_destroy (&prep_barrier))) {
    handle_error_en (rtn, "pthread_barrier_destroy-prep_barrier");
  }
}

Example Use/Output

$ ./bin/pthread-vtctut-04
id: 0, start:   0, end: 100
id: 1, start: 100, end: 200
id: 2, start: 200, end: 300
id: 3, start: 300, end: 400
joining thread index: 0
PTHREAD_BARRIER_SERIAL_THREAD
thread index: 2, sum: 24950
thread index: 0, sum: 4950
thread index: 1, sum: 14950
thread index: 3, sum: 34950
^C

I interrupted the program manually where it hangs, on line 94 at if ((rtn = pthread_join (id[i], &res))) {. So why, since each thread function is released by the second barrier (as indicated by the "thread index: x, sum: yyyy" output), does the code hang on pthread_join() in main()?

Answer 1

Score: 2

According to the man page for pthread_barrier_wait, after the necessary number of threads have successfully reached the barrier, the threads will pass the barrier and the barrier will be returned to the same state it had after the most recent call to pthread_barrier_init. This means that when you use the same barrier again, it will again be 4 threads that must reach the barrier.

However, only 3 threads will reach the second iteration of the loop, because the thread that gets PTHREAD_BARRIER_SERIAL_THREAD will exit the loop and terminate.

This means that 3 threads will get stuck on pthread_barrier_wait (&loop_barrier) in the second loop iteration, because the terminated thread will never reach that barrier a second time.

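For this reason, the function main will be able to join with at most 1 of the 4 threads: the one that terminated. Because main joins the threads in index order, even that join only happens if the terminated thread is the first one it waits for.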
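The reset behaviour described above can be checked with a minimal sketch. The names worker, run_barrier_cycles, NTHREADS and NCYCLES are hypothetical and not part of the question's code. Every thread waits on the same barrier NCYCLES times, and the barrier is expected to release once per cycle, handing PTHREAD_BARRIER_SERIAL_THREAD to exactly one thread each time.

```c
#define _POSIX_C_SOURCE 200809L

#include <pthread.h>
#include <stdatomic.h>

#define NTHREADS 4   /* hypothetical sizes chosen for this sketch */
#define NCYCLES  3

static pthread_barrier_t barrier;
static atomic_int serial_returns;   /* waits that returned the serial value */

static void *worker (void *arg)
{
  (void) arg;
  for (int cycle = 0; cycle < NCYCLES; cycle++) {
    /* the barrier resets to its initialized count after each release,
       so every cycle needs all NTHREADS callers again */
    if (pthread_barrier_wait (&barrier) == PTHREAD_BARRIER_SERIAL_THREAD)
      atomic_fetch_add (&serial_returns, 1);
  }
  return NULL;
}

/* returns the number of serial-thread returns seen, or -1 on setup failure
   (error handling is deliberately minimal in this sketch) */
int run_barrier_cycles (void)
{
  pthread_t id[NTHREADS];

  atomic_store (&serial_returns, 0);
  if (pthread_barrier_init (&barrier, NULL, NTHREADS))
    return -1;
  for (int i = 0; i < NTHREADS; i++)
    if (pthread_create (&id[i], NULL, worker, NULL))
      return -1;
  for (int i = 0; i < NTHREADS; i++)
    pthread_join (id[i], NULL);
  pthread_barrier_destroy (&barrier);
  return atomic_load (&serial_returns);
}
```

Calling run_barrier_cycles() from main should yield NCYCLES. If one worker returned after its first cycle, as the serial thread does in the question's code, the remaining three would block in the second pthread_barrier_wait() and the joins would never finish.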

Answer 2

Score: 2

The primary issue is that the thread that receives PTHREAD_BARRIER_SERIAL_THREAD from the first barrier wait does not effectively signal the other threads to stop. It sets the variable done, but this is a local variable of the thread function. Each execution of that function has its own done, so one thread's modification of its copy is not visible to the other threads through theirs.

Since only one of the threads stops, main will block on either the first or the second pthread_join() call, depending on which thread it happens to be that terminates. If it's the first pthread_join() call that hangs then one thread will be left terminated but unjoined. Either way, three threads will have cycled back around to the first barrier and blocked there.

Fix it by giving done static instead of automatic storage duration, so that it is shared among the threads. You can do that either by declaring it with the static keyword inside the function or by pulling it out of the function to file scope. If you want to be able to set or reset done from outside the scope of the threads running thread_fn() then the latter is the only viable alternative.
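A minimal sketch of that fix, using the file-scope variant, might look like the following. It reuses the question's thread_fn, loop_data, NCPU and ITER_PER_CPU; the driver run_fixed and its sums parameter are hypothetical additions so that the result can be inspected, and the printing and most error handling are left out.

```c
#define _POSIX_C_SOURCE 200809L

#include <pthread.h>

#define NCPU 4
#define ITER_PER_CPU 100

typedef struct {
  int index, start, end;
  unsigned sum;
} loop_data;

static pthread_barrier_t loop_barrier;
static pthread_barrier_t prep_barrier;
static int done;   /* file scope: one flag shared by every thread */

static void *thread_fn (void *data)
{
  loop_data *thread_data = data;

  do {
    for (int i = thread_data->start; i < thread_data->end; i++)
      thread_data->sum += i;

    /* exactly one thread per cycle receives the serial return value */
    if (pthread_barrier_wait (&loop_barrier) == PTHREAD_BARRIER_SERIAL_THREAD)
      done = 1;   /* written between the two barriers, by one thread only */

    /* no thread reads done until the serial thread has arrived here too */
    pthread_barrier_wait (&prep_barrier);
  } while (!done);

  return data;
}

/* hypothetical driver: returns how many threads were joined, -1 on error */
int run_fixed (unsigned sums[NCPU])
{
  pthread_t id[NCPU];
  loop_data arr[NCPU] = {{ 0 }};
  int joined = 0;

  done = 0;
  if (pthread_barrier_init (&loop_barrier, NULL, NCPU) ||
      pthread_barrier_init (&prep_barrier, NULL, NCPU))
    return -1;

  for (int i = 0; i < NCPU; i++) {
    arr[i].index = i;
    arr[i].start = i * ITER_PER_CPU;
    arr[i].end = (i + 1) * ITER_PER_CPU;
    if (pthread_create (&id[i], NULL, thread_fn, &arr[i]))
      return -1;
  }
  for (int i = 0; i < NCPU; i++) {
    if (pthread_join (id[i], NULL) == 0)
      joined++;
    sums[i] = arr[i].sum;   /* safe to read once the thread is joined */
  }
  pthread_barrier_destroy (&loop_barrier);
  pthread_barrier_destroy (&prep_barrier);
  return joined;
}
```

With the shared flag, all four threads observe done after prep_barrier in the same cycle and leave the loop together, so every pthread_join() returns and the sums match those printed in the question (4950, 14950, 24950, 34950).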

Also, the barriers are adequate synchronization primitives for ensuring that the threads reaching them all see each other's modifications to a static-duration done, but if you want the threads to see modifications made by a thread that does not reach the barrier, then you'll need a different mechanism. You could additionally protect done with a mutex, but depending on exactly what you want to do, it might be easier to just make it _Atomic.
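One way this could look is sketched below, under assumptions that go beyond the answer: a controlling thread that never waits on the barriers sets an atomic flag, and the serial thread copies that flag into the plain done between the two barriers, so all workers still decide to leave in the same cycle. The names stop_requested, worker, run_until_stopped and NWORKERS are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L

#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NWORKERS 4   /* hypothetical size chosen for this sketch */

static pthread_barrier_t loop_barrier, prep_barrier;
static atomic_bool stop_requested;   /* set by a thread outside the barriers */
static int done;                     /* written only between the two barriers */

static void *worker (void *arg)
{
  unsigned *cycles = arg;

  do {
    (*cycles)++;   /* stand-in for the real per-cycle work */

    /* the serial thread takes one snapshot of the request for everyone */
    if (pthread_barrier_wait (&loop_barrier) == PTHREAD_BARRIER_SERIAL_THREAD)
      done = atomic_load (&stop_requested);

    pthread_barrier_wait (&prep_barrier);
  } while (!done);

  return NULL;
}

/* hypothetical driver: returns how many threads were joined, -1 on error */
int run_until_stopped (unsigned cycles[NWORKERS])
{
  pthread_t id[NWORKERS];
  int joined = 0;

  done = 0;
  atomic_store (&stop_requested, false);
  if (pthread_barrier_init (&loop_barrier, NULL, NWORKERS) ||
      pthread_barrier_init (&prep_barrier, NULL, NWORKERS))
    return -1;

  for (int i = 0; i < NWORKERS; i++) {
    cycles[i] = 0;
    if (pthread_create (&id[i], NULL, worker, &cycles[i]))
      return -1;
  }

  /* this thread never reaches a barrier, so the flag must be atomic */
  atomic_store (&stop_requested, true);

  for (int i = 0; i < NWORKERS; i++)
    if (pthread_join (id[i], NULL) == 0)
      joined++;
  pthread_barrier_destroy (&loop_barrier);
  pthread_barrier_destroy (&prep_barrier);
  return joined;
}
```

The workers deliberately do not test stop_requested directly in the loop condition: they could then read different values in the same cycle, and a worker that left early would strand the others at loop_barrier, which is the same hang as in the question.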

huangapple
  • Posted on March 7, 2023 at 10:15:40
  • When reposting, please keep this article's link: https://go.coder-hub.com/75657493.html