Golang: channels vs. spawning goroutines for performance
Question
I was curious whether anyone has analyzed the performance difference between these two paradigms.
- Have a listener goroutine (maybe a few) that listens on a socket and spawns a new goroutine to process each piece of incoming data and send it along to wherever it has to go. After the send, the goroutine finishes and is destroyed. Every request creates a goroutine, which is destroyed when the request is done.
- Have a listener goroutine (maybe a few) that listens on a socket and passes the data to a channel. Many goroutines block on a channel receive and take turns pulling items off the channel and processing them. When done, a goroutine goes back to waiting on the channel for more data. A couple of master goroutines receive socket data and put it on channels, and the other goroutines wait on those channels to process it. Goroutines are never destroyed in this paradigm.
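For concreteness, here is a rough sketch of what the two paradigms might look like. It is only an illustration: the socket is replaced by an in-memory slice of messages, and `handle`, `spawnPerMessage`, and `workerPool` are hypothetical names made up for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// handle stands in for the real per-message processing (hypothetical).
func handle(msg []byte) int { return len(msg) }

// spawnPerMessage is paradigm 1: one short-lived goroutine per message.
// It returns the total number of bytes handled.
func spawnPerMessage(msgs [][]byte) int64 {
	var total int64
	var wg sync.WaitGroup
	for _, m := range msgs {
		wg.Add(1)
		go func(m []byte) {
			defer wg.Done()
			atomic.AddInt64(&total, int64(handle(m)))
		}(m)
	}
	wg.Wait()
	return total
}

// workerPool is paradigm 2: a fixed set of long-lived goroutines that
// all receive from the same channel.
func workerPool(msgs [][]byte, workers int) int64 {
	var total int64
	var wg sync.WaitGroup
	in := make(chan []byte)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for m := range in { // blocks until a message arrives or in is closed
				atomic.AddInt64(&total, int64(handle(m)))
			}
		}()
	}
	for _, m := range msgs {
		in <- m
	}
	close(in) // lets the workers exit; a real server would keep them running
	wg.Wait()
	return total
}

func main() {
	msgs := make([][]byte, 1000)
	for i := range msgs {
		msgs[i] = make([]byte, 1024) // about 1 KB per message
	}
	fmt.Println(spawnPerMessage(msgs), workerPool(msgs, 8))
}
```

Both functions do the same work, so they can be timed against each other (for example with `testing.B` benchmarks) on a realistic message mix.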
My question: for a system that receives small messages (0.5-1.5 KB each) but has lots of them coming in at once (high volume, low size), which paradigm is better for speed and throughput? Keeping a pool of goroutines around and using channels to spread the work across them? Or creating a goroutine for each request and letting it end when the request is done?
Even rudimentary ideas and conjecture are welcome.
Thanks.
Answer 1
Score: 2
Generally speaking, I find that spawning countless goroutines without reusing them is wasteful: even if goroutines are cheap, they aren't free, and they carry a scheduling cost.
Both methods have drawbacks under high load: spawning goroutines costs memory and scheduling time and may grind your program to a halt, while with channels your requests will hang until the previous ones have been processed.
My usual approach is a batch-based pipeline (inspired by the excellent "Go Concurrency Patterns: Pipelines and cancellation" blog post):
- The listener sends requests into a channel
- One aggregation goroutine puts the requests in a buffer
- N processing goroutines request chunks of the buffer whenever they are free
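The three steps above could be sketched roughly as follows. This is one possible reading, not a reference implementation: `Request`, `aggregate`, `run`, and the `chunkSize` parameter are names invented for the example, and the "processing" just counts requests.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Request is a placeholder for whatever the listener reads (hypothetical).
type Request []byte

// aggregate buffers incoming requests and hands out chunks of at most
// chunkSize to whichever worker is ready to receive from out.
func aggregate(in <-chan Request, out chan<- []Request, chunkSize int) {
	defer close(out)
	var buf []Request
	for in != nil || len(buf) > 0 {
		// Offer a chunk only when the buffer is non-empty; a nil channel
		// in a select case is never chosen.
		var send chan<- []Request
		var chunk []Request
		if len(buf) > 0 {
			n := chunkSize
			if n > len(buf) {
				n = len(buf)
			}
			chunk = buf[:n:n] // cap limited so workers cannot touch the rest
			send = out
		}
		select {
		case r, ok := <-in:
			if !ok {
				in = nil // listener is done; keep draining the buffer
				continue
			}
			buf = append(buf, r)
		case send <- chunk:
			buf = buf[len(chunk):]
		}
	}
}

// run wires listener -> aggregator -> N workers and returns how many
// requests the workers processed.
func run(reqs []Request, workers, chunkSize int) int {
	in := make(chan Request)
	chunks := make(chan []Request)
	go aggregate(in, chunks, chunkSize)

	var processed int64
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for chunk := range chunks { // a free worker asks for the next chunk
				atomic.AddInt64(&processed, int64(len(chunk)))
			}
		}()
	}
	for _, r := range reqs { // stands in for the socket listener
		in <- r
	}
	close(in)
	wg.Wait()
	return int(processed)
}

func main() {
	fmt.Println(run(make([]Request, 100), 4, 8)) // prints 100
}
```

Because the channels are unbuffered, the aggregator's slice is the only queue, which keeps the buffer size easy to inspect and to limit.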
That way you can precisely control the flow and behavior of your pipeline while keeping the advantages of multiple workers. You can easily implement an overflow mechanism by dropping incoming requests if the buffer grows past a size limit, spawn or kill workers to accommodate the load, etc.
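As a minimal sketch of the overflow idea, assuming the buffer is simply a buffered channel whose capacity is the size limit (a simplification of the aggregation buffer described above), a non-blocking send with `select`/`default` drops a request when the buffer is full. `tryEnqueue` is a hypothetical helper name.

```go
package main

import "fmt"

// tryEnqueue attempts a non-blocking send. It returns false, meaning the
// message was dropped, when the queue is already at capacity.
func tryEnqueue(queue chan<- []byte, msg []byte) bool {
	select {
	case queue <- msg:
		return true
	default:
		return false // buffer full: drop instead of blocking the listener
	}
}

func main() {
	queue := make(chan []byte, 2) // size limit of 2, no consumer running
	accepted, dropped := 0, 0
	for i := 0; i < 3; i++ {
		if tryEnqueue(queue, []byte("msg")) {
			accepted++
		} else {
			dropped++
		}
	}
	fmt.Printf("accepted=%d dropped=%d\n", accepted, dropped) // accepted=2 dropped=1
}
```

Whether dropping is acceptable depends on the protocol; the alternative is to let the send block so that backpressure reaches the sender.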