May 29, 2023 06:45:21


How to store thread-specific data in C++ globally?

Question


I need to store some data in each thread, but since I'm going to access it in multiple places, I can't just use extern thread_local. I was wondering if there is an approach to somehow store some data in the current thread, something like this:

auto data = std::this_thread::my_data;

I know this is not a standard C++ API and this_thread does not have any field to store custom data. But I need something similar to that.

(For those wondering why I need this: I am designing a scheduler around LibTorch (PyTorch's C++ API), and in specific parts of LibTorch's API I need to know which CUDA context I have assigned to the current thread, since I have multiple threads running in parallel, each using a different CUDA context. I'm currently using cuCtxGetCurrent(), but it adds non-negligible overhead and also causes some synchronization problems when called from multiple threads.)
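A common way to get per-thread data that is reachable from anywhere, without repeating an extern thread_local declaration in every file, is to wrap a function-local thread_local in an accessor function declared in a shared header. The sketch below (not from the original post) also caches an expensive per-thread query so that it runs only once per thread. ContextId, expensive_context_lookup(), current_context() and run_workers() are hypothetical names; expensive_context_lookup() merely stands in for a call such as cuCtxGetCurrent(), and a real version would store a CUcontext instead of an int.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a CUcontext; real code would use the CUDA driver type.
using ContextId = int;

// Counts how often the expensive query actually runs, so the caching is observable.
std::atomic<int> g_lookup_count{0};

// Hypothetical stand-in for cuCtxGetCurrent(): pretend this call is slow.
// Each call returns a fresh id, so different threads get different values.
ContextId expensive_context_lookup() {
    return g_lookup_count.fetch_add(1) + 1;
}

// Accessor callable from anywhere (declare it in a shared header).
// The function-local thread_local is initialized once per thread, on the
// first call made by that thread; later calls only read the cached value.
ContextId current_context() {
    thread_local const ContextId cached = expensive_context_lookup();
    return cached;
}

// Starts num_threads workers that each call current_context() many times
// and records the value each worker saw.
std::vector<ContextId> run_workers(int num_threads, int calls_per_thread) {
    std::vector<ContextId> seen(num_threads, 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&seen, t, calls_per_thread] {
            for (int i = 0; i < calls_per_thread; ++i) {
                seen[t] = current_context();  // each thread writes only its own slot
            }
        });
    }
    for (auto& th : threads) th.join();
    return seen;
}
```

Because the cache is filled on the first call, it goes stale if the same thread later switches to another context; in that case it is safer to have the scheduler store the value explicitly whenever it assigns a context to the thread.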

UPDATE:
Since there seems to be a misunderstanding, let me explain in more detail why I can't use thread_local. As I mentioned, I'm designing a scheduler for LibTorch. Imagine this simple example:

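CUcontext ctx;
cuCtxCreate(&ctx, 0, 0);   // create a context on device 0 (takes a CUcontext*)
cuCtxSetCurrent(ctx);      // bind it to the calling thread
auto stream = at::cuda::getStreamFromPool(false, 0);  // low-priority stream on device 0
at::cuda::setCurrentCUDAStream(stream);
Tensor output = mod->forward(input);  // code deep inside forward needs ctx and stream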

Now, forward is a function implemented in the LibTorch backend and already compiled, and tons of internal PyTorch functions are called inside it. Somewhere in the backend, I want to know which CUDA context and stream I have selected, but I can't do that with thread_local. It might be possible, but it is not easy or straightforward. So, as mentioned earlier, I need something like std::this_thread::my_data that is globally accessible anywhere.
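Within a single thread, a thread_local is visible at any call depth, so one option is for the scheduler to install the context and stream in a per-thread slot right before calling forward and restore the previous value afterwards. The following is a minimal sketch under that assumption, not a LibTorch API: SchedulingInfo, this_thread_info(), ScopedSchedulingInfo, run_on_worker() and the fake_forward() call chain are hypothetical placeholders, and the integer fields stand in for a CUcontext and a CUDA stream.

```cpp
#include <cassert>
#include <thread>

// Hypothetical per-thread scheduling info; real code might hold a CUcontext
// and a CUDA stream instead of these placeholder fields.
struct SchedulingInfo {
    int context_id = -1;  // -1 means "not set on this thread"
    int stream_id = -1;
};

// One instance per thread; reachable from any code that can see this accessor.
SchedulingInfo& this_thread_info() {
    thread_local SchedulingInfo info;
    return info;
}

// RAII guard: installs new info for the current thread and restores the
// previous value when the scope ends (also when an exception propagates).
class ScopedSchedulingInfo {
public:
    explicit ScopedSchedulingInfo(SchedulingInfo next) : saved_(this_thread_info()) {
        this_thread_info() = next;
    }
    ~ScopedSchedulingInfo() { this_thread_info() = saved_; }
    ScopedSchedulingInfo(const ScopedSchedulingInfo&) = delete;
    ScopedSchedulingInfo& operator=(const ScopedSchedulingInfo&) = delete;

private:
    SchedulingInfo saved_;
};

// Stand-ins for code buried deep inside a call such as forward():
// nothing is passed down explicitly, yet the innermost call can read the info.
int innermost_kernel_launch() { return this_thread_info().context_id; }
int middle_layer() { return innermost_kernel_launch(); }
int fake_forward() { return middle_layer(); }

// What a scheduler worker might do: install the info, then run the model.
int run_on_worker(int context_id, int stream_id) {
    ScopedSchedulingInfo guard({context_id, stream_id});
    return fake_forward();
}
```

Two caveats apply. Only code compiled against the accessor can read the value, so already-compiled LibTorch internals will not see it unless they are modified, and any work LibTorch hands off to its own worker threads (for example intra-op parallelism) runs on threads where the slot is not set. For the stream half, LibTorch already appears to track a current stream per thread (the value at::cuda::setCurrentCUDAStream() sets and at::cuda::getCurrentCUDAStream() returns), so that part may not need a separate slot.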

Answer 1

Score: 1


The std::thread constructor expects a callable: thread( Function&& f, Args&&... args );. There are many ways to share a thread's my_data with other threads. A canonical approach is to pass my_data by reference through args (std::ref is needed because std::thread copies its arguments):

int my_data = 0;
std::thread thread([](int& my_data){ my_data = 1; }, std::ref(my_data));
thread.join();
std::cout << my_data << "\n";

Another approach is using a callable object:

struct my_thread {
  int my_data = 0;
  void operator()() { my_data = 1; }
};
my_thread this_thread;
std::thread thread(std::ref(this_thread));
thread.join();
std::cout << this_thread.my_data << "\n";
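Extending the callable-object approach above to several threads, the sketch below gives each thread its own Worker object and collects the results after join(). Worker and run_worker_objects() are illustrative names, and my_data = id * 10 is placeholder work. With this approach the data still has to be passed explicitly to any function that needs it.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Callable in the style of my_thread above: each worker owns its data.
struct Worker {
    int id = 0;
    int my_data = 0;
    void operator()() { my_data = id * 10; }  // placeholder "work"
};

// Runs one Worker per thread. std::ref makes std::thread invoke the original
// object instead of a copy, so the results stay visible to the caller.
std::vector<Worker> run_worker_objects(int n) {
    std::vector<Worker> workers(n);  // sized up front: references stay valid
    for (int i = 0; i < n; ++i) workers[i].id = i;
    std::vector<std::thread> threads;
    threads.reserve(n);
    for (auto& w : workers) threads.emplace_back(std::ref(w));
    // join() synchronizes with the end of each thread, so reading my_data
    // afterwards is race-free.
    for (auto& t : threads) t.join();
    return workers;
}
```

The vector is sized before any thread starts, so the references handed to std::thread stay valid; growing it while threads are running would invalidate them.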

Answer 2

Score: 0


Define the data somewhere and pass it by reference to your thread's constructor. The thread worker can then do whatever it needs to with it, and you can use the same data again and again in other places.
