C++ Parallelization Without Threads?
Question
I recently viewed this answer discussing pipelining. The question asked why a loop summing two lists into two separate variables was faster than one xor-ing the same lists into a single variable. The linked answer concluded that the sums could run in parallel, while each xor had to be computed sequentially, which produced the observed effect.
I do not understand. Doesn't efficient parallelization require multiple threads? How can these additions be run in parallel on only one thread?
Additionally, if the compiler is so smart that it can magick in a whole new thread, why can't it just create two variables in the second function, execute the xor-s in parallel, and then xor the two variables back together after the loop terminates? To any human, such an optimization would be obvious. Is it harder to program such an optimization into the compiler than I realize?
Any explanation would be greatly appreciated!
Answer 1
Score: 4
A CPU core is organized as a pipeline. Each instruction passes through several stages (decode the instruction, evaluate it, do some calculations, read/write main memory, read/write registers, ...), and for any single instruction these stages must happen one after the other.
Various optimizations let this pipeline do its job more efficiently.
So in fact the CPU processes multiple instructions at the same time, but only one instruction is using a given part of the pipeline at any moment. That is how the two sums from the question can overlap on a single thread: neither addition needs the other's result.
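To make the difference concrete, here is a minimal sketch of the two kinds of loop described in the question. The function names `sum_two_lists` and `xor_two_lists` are made up for illustration and are not taken from the linked answer. Both run on one thread; the only difference is whether each operation depends on the result of the one before it.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Two independent accumulators: the add into sum_a never needs the result of
// the add into sum_b, so the hardware is free to overlap the two additions.
std::pair<std::uint64_t, std::uint64_t> sum_two_lists(
    const std::vector<std::uint32_t>& a,
    const std::vector<std::uint32_t>& b) {
    std::uint64_t sum_a = 0;
    std::uint64_t sum_b = 0;
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        sum_a += a[i];
        sum_b += b[i];
    }
    return {sum_a, sum_b};
}

// One accumulator: every xor reads the value written by the previous xor,
// so the operations form a single dependency chain.
std::uint32_t xor_two_lists(const std::vector<std::uint32_t>& a,
                            const std::vector<std::uint32_t>& b) {
    std::uint32_t acc = 0;
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    for (std::size_t i = 0; i < n; ++i) {
        acc ^= a[i];
        acc ^= b[i];
    }
    return acc;
}
```

Whether a measurable speed difference shows up depends on the CPU, the compiler, and the optimization flags, so treat this as an illustration of the dependency structure rather than a benchmark.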
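The pipeline concept also introduces hazards, such as a read-after-write dependency, where an instruction needs a value that an earlier instruction has not finished producing. There are ways to deal with this (e.g. nop instructions, or having the hardware stall until the value is ready), but the dependent instruction still has to wait. This is the situation in the xor loop from the question: every xor reads the result written by the previous one.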
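The question also asks why the xor loop cannot be split into two partial results that are combined at the end. Done by hand, that could look like the sketch below; `xor_single_chain` and `xor_two_chains` are hypothetical names chosen for this example. The rewrite is valid because integer xor is associative and commutative, so the order of combination does not change the result. Whether a particular compiler applies this transformation on its own depends on the compiler and its flags.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Baseline: a single accumulator, so each xor waits on the previous one.
std::uint32_t xor_single_chain(const std::vector<std::uint32_t>& v) {
    std::uint32_t acc = 0;
    for (std::uint32_t x : v) {
        acc ^= x;
    }
    return acc;
}

// Two accumulators: even-indexed elements go into acc0, odd-indexed into acc1.
// The two chains are independent of each other until the final combine.
std::uint32_t xor_two_chains(const std::vector<std::uint32_t>& v) {
    std::uint32_t acc0 = 0;
    std::uint32_t acc1 = 0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 ^= v[i];
        acc1 ^= v[i + 1];
    }
    if (i < n) {
        acc0 ^= v[i];  // leftover element when the length is odd
    }
    return acc0 ^ acc1;  // combine the partial results after the loop
}
```

Both functions return the same value for any input; only the shape of the dependency chain differs.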
This has nothing to do with multithreading, which is a higher-level concept. Here we are at a lower level, i.e. how the CPU executes instructions.
The link provided in the thread you referenced is a nice starting point (link)