Why do the pipeline constraints of coarse-grained and fine-grained multithreading differ?

Question

In "Computer Organization and Design: The Hardware/Software Interface, Sixth Edition" (RISC-V Edition) by David A. Patterson and John L. Hennessy, Section 6.4 says this about coarse-grained multithreading:
> This change relieves the need to have thread switching be extremely fast and is much less likely to slow down the execution of an individual thread, since instructions from other threads will only be issued when a thread encounters a costly stall.
>
> Because a processor with coarse-grained multithreading issues instructions from a single thread, when a stall occurs, the pipeline must be emptied or frozen. The new thread that begins executing after the stall must fill the pipeline before instructions are able to complete.
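
To make the flush-and-refill cost in the quoted passage concrete, here is a toy Python model (not from the book) of a hypothetical in-order pipeline under coarse-grained multithreading. It assumes the pipeline starts full, that every costly stall flushes the running thread and switches to the next one, and that each instruction takes `depth` cycles from fetch to completion; `cgmt_timeline` and its parameters are illustrative names only.

```python
def cgmt_timeline(total_cycles, stall_at, depth=5, num_threads=2):
    """Toy coarse-grained multithreading model (hypothetical, not a real core).

    Returns, for each cycle, the id of the thread that completes an instruction,
    or None when no instruction completes (a pipeline bubble).
    """
    timeline = []
    thread = 0   # only one thread issues at a time
    refill = 0   # bubble cycles left while the new thread fills the pipeline
    stalls = set(stall_at)
    for cycle in range(total_cycles):
        if cycle in stalls:
            # Costly stall: flush the stalled thread's in-flight instructions and
            # start fetching from the next thread. Its first instruction needs
            # `depth` cycles to complete, so `depth - 1` cycles complete nothing.
            thread = (thread + 1) % num_threads
            refill = depth - 1
        if refill > 0:
            timeline.append(None)
            refill -= 1
        else:
            timeline.append(thread)
    return timeline


if __name__ == "__main__":
    # Thread 0 hits a costly stall in cycle 3 of a 5-stage pipeline.
    print(cgmt_timeline(10, stall_at=[3], depth=5))
```

With a 5-stage pipeline and a stall in cycle 3, this prints `[0, 0, 0, None, None, None, None, 1, 1, 1]`: four bubbles (depth minus one) before thread 1 completes its first instruction. Freezing the pipeline instead of flushing it, which the book also mentions, is not modeled.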

About fine-grained multithreading, however, the book says nothing about changing the pipeline when switching threads:
> This interleaving is often done in a round-robin fashion, skipping any threads that are stalled at that clock cycle.
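
For contrast, a fine-grained scheduler picks a (possibly different) thread every cycle. The sketch below, again a hypothetical Python model rather than anything from the book, implements round-robin selection that skips threads stalled in that cycle; `is_stalled` is an assumed stall oracle. Because consecutive pipeline slots can belong to different threads, each in-flight instruction has to carry its thread's identity; roughly speaking, that existing support is why a stalled thread's slot can simply go to another thread without a flush.

```python
def fgmt_schedule(num_threads, num_cycles, is_stalled):
    """Round-robin fine-grained issue, skipping threads stalled in a given cycle.

    is_stalled(thread_id, cycle) -> bool is a hypothetical stall oracle.
    Returns the thread id issued in each cycle, or None if every thread is stalled.
    """
    schedule = []
    last = num_threads - 1  # so the first search starts at thread 0
    for cycle in range(num_cycles):
        chosen = None
        # Try threads in round-robin order, starting after the last one issued.
        for offset in range(1, num_threads + 1):
            candidate = (last + offset) % num_threads
            if not is_stalled(candidate, cycle):
                chosen = candidate
                break
        schedule.append(chosen)
        if chosen is not None:
            last = chosen
    return schedule


def in_flight_tags(schedule, depth=5):
    """Thread tags of instructions in a depth-stage pipeline, oldest first (bubbles omitted)."""
    return [t for t in schedule[-depth:] if t is not None]


if __name__ == "__main__":
    def thread1_stalled(t, c):
        # Thread 1 is stalled (say, on a cache miss) in cycles 2-4.
        return t == 1 and 2 <= c <= 4

    sched = fgmt_schedule(3, 8, thread1_stalled)
    print(sched)
    print(in_flight_tags(sched))
```

Here the schedule is `[0, 1, 2, 0, 2, 0, 1, 2]`: thread 1 is skipped in cycle 4 and picked up again in cycle 6, and the pipeline at the end holds instructions tagged `[0, 2, 0, 1, 2]`, i.e. from all three threads at once.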

Q:
Since the book says:
> A thread includes the program counter, the register state, and the stack.
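
As a data structure, the book's definition maps to a small per-thread record. This is a hypothetical Python sketch assuming a RISC-V-like core with 32 integer registers; the stack itself lives in memory, so the core only has to hold the stack pointer, which the RISC-V ABI keeps in `x2` (`sp`). A multithreaded core keeps one such record per hardware thread in either style; the styles differ in how many of them in-flight instructions may use at the same time.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HardwareThreadContext:
    """Architectural state of one hardware thread (hypothetical RISC-V-like core)."""
    pc: int = 0
    # x0..x31; x0 is hardwired to zero on RISC-V, so writes to it are ignored here.
    regs: List[int] = field(default_factory=lambda: [0] * 32)

    def write_reg(self, index: int, value: int) -> None:
        if index != 0:
            self.regs[index] = value

    @property
    def sp(self) -> int:
        # The stack lives in memory; the core only tracks the stack pointer (x2).
        return self.regs[2]


def make_contexts(num_threads: int) -> List[HardwareThreadContext]:
    """One independent context per hardware thread."""
    return [HardwareThreadContext() for _ in range(num_threads)]
```

`field(default_factory=...)` gives every context its own register list, so writing one thread's registers never affects another's.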

and both kinds of multithreading switch threads when they encounter stalls, why does coarse-grained multithreading have to empty the pipeline (because the pipeline's instructions come from only a single thread) and then refill it, while fine-grained multithreading does not?

Answer 1

Score: 2

I think the point is that if you're going to have two sets of register state, page tables, FP exception state, etc. that can be active at once, you might as well do fine-grained multithreading.

So it wouldn't be a good tradeoff to make a coarse-grained multithreading CPU that paid most of the cost to support fine-grained multithreading. In this paragraph at least, that looks like an unstated assumption, but perhaps they discuss it elsewhere.

The benefit of doing only coarse-grained multithreading this way is that you don't need to support instructions from different contexts being in the pipeline at once. That simplifies things such as FP exception state and the rounding mode, which then don't need to be tracked per instruction.
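
A rough way to see the difference is to compare what an in-flight instruction must carry in each design. In this hypothetical Python sketch (class and function names are illustrative, and RISC-V-style rounding-mode and flag names such as `RNE` and `NX` are used only as labels), a fine-grained pipeline entry records its thread and rounding mode, and raised FP exception flags must be routed to the owning thread; a coarse-grained entry needs neither, because one core-wide FP control/status register belongs to the single live thread.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Set


@dataclass(frozen=True)
class FgmtEntry:
    """In-flight instruction in a fine-grained pipeline (hypothetical)."""
    op: str
    thread_id: int                # which hardware context owns this instruction
    rounding_mode: str            # captured per instruction, e.g. "RNE" or "RTZ"
    raises: Optional[str] = None  # FP exception flag raised, e.g. "NX"


@dataclass(frozen=True)
class CgmtEntry:
    """In-flight instruction in a coarse-grained pipeline: no thread tag and no
    per-instruction rounding mode, since only one context is live."""
    op: str
    raises: Optional[str] = None


def retire_fgmt(entries: Iterable[FgmtEntry], flags: Dict[int, Set[str]]) -> None:
    # Sticky FP flags must be accumulated into the owning thread's status.
    for e in entries:
        if e.raises:
            flags[e.thread_id].add(e.raises)


def retire_cgmt(entries: Iterable[CgmtEntry], flags: Set[str]) -> None:
    # A single core-wide FP status register is enough.
    for e in entries:
        if e.raises:
            flags.add(e.raises)
```

The extra `thread_id` and `rounding_mode` fields stand in for the per-instruction bookkeeping the answer says a coarse-grained-only design can avoid.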

Architectural state for the thread being swapped out can be saved to special storage that's only accessed by the hardware context-switching logic, instead of needing extra tag bits in a bunch of structures and a register alias table (RAT) with twice as many entries.
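
The save/restore idea can be sketched as follows. It is a hypothetical Python model, not a description of any real CPU: `CgmtCore` keeps one live PC and register file and parks the other threads' state in a private store touched only by `switch_to` (standing in for the hardware context-switch logic), while `build_fgmt_rat` shows how a rename table indexed by thread and architectural register grows with the number of threads. The mapping to physical register numbers is a placeholder.

```python
NUM_ARCH_REGS = 32


class CgmtCore:
    """Coarse-grained core model: one live context, inactive ones parked aside."""

    def __init__(self, num_threads):
        self.active_thread = 0
        self.pc = 0
        self.regs = [0] * NUM_ARCH_REGS
        # Private backing store, used only by switch_to().
        self._parked = {t: (0, [0] * NUM_ARCH_REGS) for t in range(1, num_threads)}

    def switch_to(self, new_thread):
        # Assumes the pipeline has already been drained or flushed, so no
        # in-flight instruction still refers to the old thread's registers.
        self._parked[self.active_thread] = (self.pc, self.regs)
        self.pc, self.regs = self._parked.pop(new_thread)
        self.active_thread = new_thread


def build_fgmt_rat(num_threads, num_arch_regs=NUM_ARCH_REGS):
    """Rename table keyed by (thread_id, arch_reg): every thread's registers are
    live at once, so it needs num_threads * num_arch_regs entries."""
    return {(t, r): t * num_arch_regs + r
            for t in range(num_threads) for r in range(num_arch_regs)}


if __name__ == "__main__":
    core = CgmtCore(num_threads=2)
    core.regs[5], core.pc = 42, 0x100
    core.switch_to(1)  # thread 1 starts with its own, empty state
    core.switch_to(0)  # thread 0's state comes back intact
    print(core.regs[5], hex(core.pc), len(build_fgmt_rat(2)))
```

This prints `42 0x100 64`: thread 0's state survives the round trip, and the two-thread rename table has twice the 32 entries a single-thread table would need.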

(As Dr. Bandwidth comments, fine-grained multithreading is usually only used in CPUs with out-of-order exec and register renaming.)

  • Posted by huangapple on 2023-07-20 13:15:41
  • Source: https://go.coder-hub.com/76726857.html