Can we, and should we pin OS threads to CPU cores?

Question

I am programming in Go (and C).
The scenario is: there are per-thread event channels in C.
Sometimes I make cgo calls from Go to post requests to the channel of the current thread.
Completion events come back on the same channel after a few microseconds, and I make cgo calls to poll for them.

Of course, when the goroutine that submitted a request polls for the completion, it may be running on a different thread, but let's ignore this issue for now.
My question is: in pure C, we can call pthread_setaffinity_np to pin the poller thread to a CPU core to reduce latency (assuming #threads <= #cores). In Go, should we do this in the presence of goroutines?

  • If yes, how can we do it? Call runtime.LockOSThread() in several goroutines to acquire enough OS threads, and pin them on different cores by cgo calls?
  • If no, why?

P.S. I also know a little about Go's scheduler (the G-M-P model). However, it seems that Ms are not bound to CPU cores, and Ps have nothing to do with physical cores.

Answer 1

Score: 4

A call from C to Go uses the caller's thread, so if the C thread is already locked to a CPU it will remain so when it enters the Go runtime. (To demonstrate that property, you can run the program at https://play.golang.org/p/2C9nxyohA91 locally on Linux; the Go Playground doesn't support cgo.)

You can lock a goroutine to a thread using runtime.LockOSThread, and from there you can likely set CPU affinity by calling pthread_setaffinity_np using cgo.

But I wouldn't expect that to provide much, if any, improvement in latency: the transition between C and Go already adds a fair amount of latency (due to cache contention when updating metadata in the Go runtime scheduler, among other factors). If you're polling on a short enough interval that CPU affinity matters, you're probably going to churn the CPU cache either way; if you're polling on a long enough interval that it doesn't matter, the effect of CPU affinity will be dwarfed by the effect of the polling interval.

You would probably see a bigger improvement by switching from a polling-based approach to a push-based one: either have the C part of the API block until the request completes (and signal completion by returning to Go), or have the C part call back into Go to notify the original goroutine directly, for example by closing a channel. (As of Go 1.17, you can use a cgo.Handle from the runtime/cgo package to obtain a value that you can pass to C to refer to a Go-allocated value.)

huangapple
  • Posted on August 2, 2021 at 17:24:10
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68619172.html