Parallel class and thread context switch.

Question

I'm trying to understand how a thread context switch affects the execution of iterations of the Parallel class when using ForEach or For. I tried to push CPU usage up to 100% by running several processes, but not a single Parallel iteration changed its Thread.CurrentThread.ManagedThreadId value.

  • To push CPU usage up to 100%, I started several high-priority processes, including the example itself.

  • Code where we need to handle the thread context switch:

    Parallel.For(0, 3, (index, i) =>
    {
        var firstId = Thread.CurrentThread.ManagedThreadId;
        while (true)
        {
            var rnd = new Random(index);
            checkIds(firstId, Thread.CurrentThread.ManagedThreadId);
            var digits = new List<int>();
            for (int j = 0; j < 10000; j++)
            {
                digits.Add(rnd.Next());
                if (continueWriteLine)
                    Console.WriteLine($"ID: = {Thread.CurrentThread.ManagedThreadId}");
            }
    
            if (continueWriteLine)
                digits.ForEach(Console.WriteLine);
        }
    });
    
  • Code that tries to handle a thread switch:

    if (firstId != currentId)
    {
        continueWriteLine = false;
        Thread.Sleep(1000);
        Console.WriteLine($"{firstId} - {currentId}");
    }
    

So, I have several questions:

  1. Can a thread switch to another one during the execution of an iteration of the Parallel class, for some reason, e.g. Thread.Sleep, a lock statement, a Mutex, etc.?

  2. And if threads really are switched, how does such a switch affect the ManagedThreadId property?

  3. Is it safe to use ManagedThreadId as a unique key for a ConcurrentDictionary from which any information can be retrieved for the current operation, e.g. information about reading a file: the current line, the object to be read, the objects already read, and many other things needed during the current operation?

P.S. The reason for the solution proposed in the third question is a desire not to pass most of this data between the methods that help me read and process every new line of a file, in order to maintain the file-processing context. Maybe the solution would be to pass only one object between the parser's methods, something like FileProcessingInfo, that contains all the context data (which I mentioned in the third question), but I don't know for sure which solution would be better.

Answer 1

Score: 1


> 1. Can a thread switch to another one during the execution of an iteration of the Parallel class, for some reason, e.g. Thread.Sleep, a lock statement, a Mutex, etc.?

No. Each individual iteration of a Parallel.For/Parallel.ForEach loop runs invariably on the same thread from start to finish. This thread is completely dedicated to this iteration, and won't do any unrelated work elsewhere before this iteration completes. After this iteration completes, the thread might dedicate itself to some other iteration, or return to the ThreadPool.
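This is easy to observe directly: within a single iteration, ManagedThreadId never changes, no matter how long the body runs. A minimal console sketch (the program below is illustrative, not from the question):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SameThreadPerIterationDemo
{
    static void Main()
    {
        Parallel.For(0, 4, index =>
        {
            int idAtStart = Thread.CurrentThread.ManagedThreadId;
            for (int j = 0; j < 1_000_000; j++)
            {
                // Even under heavy CPU load and OS-level context switches,
                // this never fires: the iteration stays on one managed thread.
                if (Thread.CurrentThread.ManagedThreadId != idAtStart)
                    throw new InvalidOperationException("Thread changed mid-iteration!");
            }
            Console.WriteLine($"Iteration {index} ran entirely on thread {idAtStart}");
        });
    }
}
```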

Each iteration is not guaranteed to run non-stop on one physical CPU-core though. The operating system might perform one or many thread switches, by suspending the execution of this thread and assigning the physical CPU-core to some other thread. This phenomenon is transparent to your program. The thread itself doesn't experience any observable symptom whenever a thread-switch occurs at the operating system level. I don't know if the .NET platform itself receives any notification from the operating system whenever a thread-switch occurs. If I had to guess, I would say probably not.

It should be noted that the new asynchronous API Parallel.ForEachAsync (.NET 6) invokes an asynchronous body delegate, and asynchronous delegates that contain await statements routinely switch threads after each await. In this case it's not operating-system thread switching; it's the kind of thread switching that you are interested in, with Thread.CurrentThread.ManagedThreadId changing after the await.
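A sketch of this effect with Parallel.ForEachAsync (requires .NET 6 or later; which thread IDs you see, and whether they differ, varies per run):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class ForEachAsyncThreadSwitchDemo
{
    static async Task Main()
    {
        await Parallel.ForEachAsync(new[] { 1, 2, 3 }, async (item, ct) =>
        {
            int before = Thread.CurrentThread.ManagedThreadId;
            // The continuation after the await may resume on a different pool thread.
            await Task.Delay(100, ct);
            int after = Thread.CurrentThread.ManagedThreadId;
            Console.WriteLine($"Item {item}: before await = {before}, after await = {after}");
        });
    }
}
```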

> 3. Is it safe to use ManagedThreadId as a unique key for a ConcurrentDictionary from which any information can be retrieved for the current operation, e.g. information about reading a file: the current line, the object to be read, the objects already read, and many other things needed during the current operation?

No, because this ID is not guaranteed to be unique. After a thread terminates, its number can be reused. And you have no control over the life-cycle of the ThreadPool threads, the threads that the Parallel APIs use by default. If you want to uniquely identify a Thread, use the Thread object itself as the TKey of the dictionary (ConcurrentDictionary<Thread, FileProcessingInfo>). But you might find a ThreadLocal<FileProcessingInfo> more convenient instead.

  • "How unique is ManagedThreadID?": https://stackoverflow.com/questions/2221908/how-unique-is-managedthreadid/2221963#2221963
  • ThreadLocal&lt;T&gt; documentation: https://learn.microsoft.com/en-us/dotnet/api/system.threading.threadlocal-1
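A minimal sketch of the ThreadLocal approach, assuming a hypothetical FileProcessingInfo type holding the per-thread parsing context (the type name is taken from the question; its members are made up for illustration):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical per-thread context object (name taken from the question).
class FileProcessingInfo
{
    public int CurrentLine;
    public int ObjectsRead;
}

class ThreadLocalContextDemo
{
    static void Main()
    {
        // Each worker thread lazily gets its own FileProcessingInfo instance,
        // so no dictionary keying (and no ManagedThreadId) is needed at all.
        using var context = new ThreadLocal<FileProcessingInfo>(
            () => new FileProcessingInfo(), trackAllValues: true);

        Parallel.For(0, 100, i =>
        {
            var info = context.Value;        // this thread's private context
            info.CurrentLine = i;
            info.ObjectsRead++;
        });

        foreach (var info in context.Values) // inspect all per-thread contexts
            Console.WriteLine($"A worker thread processed {info.ObjectsRead} items");
    }
}
```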

Answer 2

Score: 1


A parallel for loop makes use of a partitioner. If one is not provided, it uses the built-in default general-purpose one. The job of the partitioner is to divide up the iteration range into chunks.

The smallest unit of granularity with which a parallel for loop can be partitioned is one iteration. In other words, a worker thread will call the parallel for loop body function with one as-of-yet-unassigned loop index in the range. When that worker thread completes the loop body for that index, it will consult the partitioner again to see which index it should execute next.

The default partitioner can work hand-in-hand with the thread pool, the scheduler, and a complex set of heuristics to decide which ranges to assign to which threads. In this way, ranges are assigned to threads dynamically, in batches, in response to the current system state (as opposed to a static partitioning, where we just divide the range into a fixed number of chunks of approximately equal size), in an attempt to achieve good performance in typical general-purpose scenarios.

Here are some basic principles of how any implementation will approach partitioning and thread pool management:

  • The goal is likely to achieve high occupancy, meaning each virtual processor in the system should be executing a runnable thread. This means we will likely want at least as many threads as there are virtual processors (this may not be so if the loop body executes very quickly and the range is very small, where the overhead of assigning tasks to a thread exceeds the benefit of just executing on the current thread).
  • As threads in the thread pool enter a blocked state (i.e. blocked waiting for IO or a synchronization event, or sleeping), new threads may be created by the thread pool if occupancy is not at 100%. Thread stacks consume valuable memory resources, though, so the heuristics may choose to spend some time waiting for blocked threads to become unblocked rather than always creating new threads; otherwise you would risk creating a thread for every iteration of the loop. In general, the pool can get help from the operating system to learn why a thread is blocked, and whether that thread will be unblocked by something predictably inevitable, like a sleep, or is blocked on something unpredictable, like a synchronization event from another thread. Sometimes creating a new thread is the only way to avoid a deadlock (consider a loop body that waits for the (i + 1)th iteration to complete).
  • If all the thread pool threads are runnable, then no new threads are created (this is high-occupancy achieved).
  • It conservatively measures how long it takes to execute each loop-body iteration and partitions some of the remainder of the range accordingly, to take advantage of the reduced overhead of assigning a range of iterations to a particular thread, which can then execute a whole batch of iterations without the cost of checking back with the partitioner to see which range it is supposed to execute next.
  • The default partitioner is by no means optimal, and cannot fundamentally be optimized, because it must be designed without knowing the pattern of loop-body behaviour. So it is a general-purpose partitioner that does pretty well for most typical use cases.
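The batching described above can also be made explicit with the Partitioner class: Partitioner.Create(from, to) produces range chunks, and each Parallel.ForEach worker processes a whole range as a batch, amortizing the per-index overhead. A small sketch (chunk sizes and thread IDs vary per run):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class RangePartitionerDemo
{
    static void Main()
    {
        // Partitioner.Create(0, 1000) yields (fromInclusive, toExclusive) tuples,
        // so each worker thread grabs a whole range instead of a single index.
        var rangePartitioner = Partitioner.Create(0, 1000);

        Parallel.ForEach(rangePartitioner, range =>
        {
            Console.WriteLine(
                $"Thread {Thread.CurrentThread.ManagedThreadId} " +
                $"handles [{range.Item1}, {range.Item2})");
            for (int i = range.Item1; i < range.Item2; i++)
            {
                // loop-body work for index i goes here
            }
        });
    }
}
```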

Published 2023-06-30 03:22:04. Original link: https://go.coder-hub.com/76584045.html