C# - AsyncLocal value reused on parallel.for

Question

The Microsoft documentation about AsyncLocal states that:
> Represents ambient data that is local to a given asynchronous control flow, such as an asynchronous method.

I have a class that is used to capture some data during code execution and may be used in async code. I was trying to use an AsyncLocal to share data in async flows and it works "as expected" when using tasks.

However, it behaves strangely inside a Parallel.For loop.

Example:

var asyncValue = new AsyncLocal<int>();

Parallel.For(1, 30, _ =>
{
    asyncValue.Value = asyncValue.Value + 1;
    Console.Write($"{asyncValue.Value}, ");
});

I was expecting the output to be 1 for all executions, but it isn't.

Output example:
1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2,

Then, by adding the Task.CurrentId to the output, I can see that the values start to be different than 1 when the task ID is "reused".

$"{Task.CurrentId} - {asyncValue.Value}"

14 - 1
8 - 1
9 - 1
9 - 2
22 - 1
23 - 1
14 - 2
8 - 2
...

It looks like tasks are being reused to run more than one iteration (probably because the loop has more iterations than there are threads available), and the AsyncLocal value captured by the first execution on a task is shared with the later ones.

Is this the expected behavior?

I need to share data between tasks and their possible child tasks, but I have no control over how those tasks are created (they are created by external code using my library). I can't use AsyncLocal, because I would never know whether the data is being shared with descendant tasks or with sibling tasks.

Update 1:
Some context:

  • The class is part of a library that is used by other developers.
  • The main goal is to keep track of changes made to some objects (something like a database log).
  • The changes are organized in a tree, like the call stack, so we can see what has changed in which method, what the parent and child methods are, the order, etc.
  • What I really need is a way to share data between a task and its child tasks.
  • One option is to provide a method that external code must use to create tasks or perform a Parallel.For. This way I can share data using parameters, return values, captured variables, etc.
  • But AsyncLocal would be much cleaner if the data weren't shared between sibling tasks in Parallel.For.
  • Note that currently I do not control how the external code uses my library, so callers can use tasks, Parallel.For, etc. as they need.
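
As a rough illustration of the "provide a method" option from the list above, here is a minimal sketch in which each iteration receives its own child node through a parameter. `ChangeScope` and `TrackedParallel` are hypothetical names invented for this example, not part of any existing library, and the real tracking class would certainly carry more data.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical node in the change tree (illustration only).
public sealed class ChangeScope
{
    private readonly ConcurrentQueue<string> _changes = new ConcurrentQueue<string>();
    private readonly ConcurrentQueue<ChangeScope> _children = new ConcurrentQueue<ChangeScope>();

    public ChangeScope(string name) => Name = name;

    public string Name { get; }
    public int ChangeCount => _changes.Count;
    public int ChildCount => _children.Count;
    public IReadOnlyCollection<ChangeScope> Children => _children;

    // Records a change on this node only.
    public void Record(string change) => _changes.Enqueue(change);

    // Creates a child node and links it to this node (thread-safe).
    public ChangeScope CreateChild(string name)
    {
        var child = new ChangeScope(name);
        _children.Enqueue(child);
        return child;
    }
}

// Hypothetical wrapper that callers would use instead of calling Parallel.For directly.
public static class TrackedParallel
{
    public static void For(int fromInclusive, int toExclusive, ChangeScope parent,
                           Action<int, ChangeScope> body)
    {
        Parallel.For(fromInclusive, toExclusive, i =>
        {
            // Every iteration gets a fresh child scope, so siblings never share state,
            // no matter how Parallel.For maps iterations onto worker tasks.
            ChangeScope child = parent.CreateChild($"iteration {i}");
            body(i, child);
        });
    }
}
```

With this sketch, a call such as `TrackedParallel.For(1, 30, root, (i, scope) => scope.Record($"touched item {i}"))` attaches 29 children to `root`, each holding one change. The drawback named above remains: it only works if external code goes through the wrapper.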

Update 2:
If we declare the Parallel.For action as async, it no longer shares the AsyncLocal data, even though it still reuses the task IDs.

var asyncValue = new AsyncLocal<int>();

Parallel.For(1, 30, async _ =>
{
    asyncValue.Value = asyncValue.Value + 1;
    Console.Write($"{asyncValue.Value}, ");
});

$"{Task.CurrentId} - {asyncValue.Value}"

12 - 1
8 - 1
9 - 1
10 - 1
11 - 1
13 - 1
13 - 1
13 - 1
13 - 1
13 - 1
13 - 1
13 - 1
...
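
A likely explanation for the Update 2 result, sketched here assuming .NET Core / .NET 5+ (or .NET Framework 4.6 and later): when an async method returns to its caller, the runtime restores the caller's ExecutionContext, so AsyncLocal writes made inside the async method do not leak back out, even when the method completes synchronously. A plain synchronous delegate has no such boundary, so its writes stay visible to whatever runs next on the same flow. The class and method names below are made up for the demonstration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncLocalIsolationDemo
{
    private static readonly AsyncLocal<int> Counter = new AsyncLocal<int>();

    // Synchronous method: the write remains visible to the caller.
    private static void IncrementSync()
    {
        Counter.Value = Counter.Value + 1;
    }

    // Async method: the caller's ExecutionContext is restored when this returns,
    // so the write below is not visible to the caller.
    private static async Task IncrementAsync()
    {
        Counter.Value = Counter.Value + 1;
        await Task.CompletedTask;
    }

    public static (int afterSync, int afterAsync) Run()
    {
        Counter.Value = 0;
        IncrementSync();
        IncrementSync();
        int afterSync = Counter.Value;   // both writes accumulated

        Counter.Value = 0;
        IncrementAsync().GetAwaiter().GetResult();
        IncrementAsync().GetAwaiter().GetResult();
        int afterAsync = Counter.Value;  // writes stayed inside the async methods

        return (afterSync, afterAsync);
    }
}
```

`Run()` returns `(2, 0)`. Note that an async lambda passed to Parallel.For is converted to an async void delegate, so Parallel.For does not wait for any awaited work inside it; the isolation is a side effect rather than a reason to write the loop body that way.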

Answer 1

Score: 1

AsyncLocal is an object that allows storing data local to an async flow. This of course doesn't happen automagically; for it to work, something has to happen in the background. The same is true of ThreadLocal: the OS has to attach data to the current thread, so some ambient context of the thread has to exist. It is just a detail that the OS hides from us. The ambient context attached to a thread can be accessed in C# through ExecutionContext; read more about it in Stephen Toub's excellent article: https://devblogs.microsoft.com/pfxteam/executioncontext-vs-synchronizationcontext/

So, the problem is that the Parallel methods don't seem to capture the ExecutionContext you're currently running on. This can be fixed like this:

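var asyncValue = new AsyncLocal<int>();
// Capture the caller's context once; Capture() returns null if flow is suppressed.
var ec = ExecutionContext.Capture();

Parallel.For(1, 30, _ =>
{
    // Run each iteration on the captured context; writes made inside the
    // delegate are discarded when Run returns, so iterations stay isolated.
    // Note: on .NET Framework a captured context may be passed to Run only once;
    // there, pass ec.CreateCopy() instead.
    ExecutionContext.Run(ec, delegate
    {
        asyncValue.Value = asyncValue.Value + 1;
        Console.Write($"{asyncValue.Value}, ");
    }, null);
});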

This is pretty much what the language does for you when you use higher-level constructs like Tasks and async/await.
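
To make the Task case concrete, here is a small sketch (the class name and the values 5 and 99 are arbitrary choices for the demonstration) showing that `Task.Run` flows the caller's ExecutionContext into the child, while writes made in the child stay in the child's flow:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TaskFlowDemo
{
    private static readonly AsyncLocal<int> Ambient = new AsyncLocal<int>();

    public static (int seenInChild, int parentAfterChild) Run()
    {
        Ambient.Value = 5;

        // Task.Run captures the caller's ExecutionContext, so the child starts with 5.
        int seenInChild = Task.Run(() =>
        {
            int inherited = Ambient.Value;
            Ambient.Value = 99; // affects only the child's flow
            return inherited;
        }).GetAwaiter().GetResult();

        // The parent's value is untouched by the child's write.
        return (seenInChild, Ambient.Value);
    }
}
```

`Run()` returns `(5, 5)`: data flows from parent to child, never from child to parent or between siblings started from the same parent.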

That being said, I strongly suggest you pass the context explicitly rather than through ambient constructs like ExecutionContext, which are hard to test and maintain.

huangapple
  • Posted on April 17, 2023, 19:29:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76034683.html