Handling lots of req/sec in Go or Node.js
Question
I'm developing a web app that needs to handle bursts of very high load:
once per minute I get a burst of requests over a few seconds (~1M-3M/sec), and then nothing for the rest of the minute.
What's my best strategy to handle as many req/sec as possible at each front server? Just send a reply and store the request in memory somehow, to be processed in the background later by the DB-writer worker?
The aim is to do as little as possible during the burst, and to write the requests to the DB as soon as possible after it.
Edit: the order of transactions is not important,
and we can lose some transactions, but 99% need to be recorded.
The latency of getting all requests into the DB can be a few seconds after the last request has been received; let's say no more than 15 seconds.
Answer 1
Score: 2
How about a channel with a buffer size equal to what the DB writer can handle in 15 seconds? When the request comes in, it is sent on the channel. If the channel is full, give some sort of "System Overloaded" error response.
Then the DB writer reads from the channel and writes to the database.
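A minimal Go sketch of this approach. The Request type, the buffer size, and the stubbed insert callback are assumptions for illustration; in real use the buffer would be sized to about 15 seconds of DB writer throughput:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Request stands in for whatever the front server receives.
type Request struct {
	ID int
}

var ErrOverloaded = errors.New("system overloaded")

// Ingest tries to enqueue a request without blocking. If the buffer is
// full, it reports overload instead of stalling the HTTP handler.
func Ingest(queue chan<- Request, r Request) error {
	select {
	case queue <- r:
		return nil
	default:
		return ErrOverloaded
	}
}

// DBWriter drains the queue; the actual database insert is stubbed out.
func DBWriter(queue <-chan Request, insert func(Request), wg *sync.WaitGroup) {
	defer wg.Done()
	for r := range queue {
		insert(r) // real code would batch these into one multi-row INSERT
	}
}

func main() {
	queue := make(chan Request, 100) // size = writer throughput * 15s in real use
	var wg sync.WaitGroup
	written := 0
	wg.Add(1)
	go DBWriter(queue, func(r Request) { written++ }, &wg)

	for i := 0; i < 50; i++ {
		if err := Ingest(queue, Request{ID: i}); err != nil {
			fmt.Println("rejected:", err)
		}
	}
	close(queue)
	wg.Wait()
	fmt.Println("written:", written)
}
```

The HTTP handler would call Ingest and map ErrOverloaded to a 503 response.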
Answer 2
Score: 2
This question is kind of vague. But I'll take a stab at it.
1) You need limits. A simple implementation will open millions of connections to the DB, which will obviously perform badly. At the very least, each connection eats megabytes of RAM on the DB. Even with connection pooling, each 'thread' could take a lot of RAM to record its (incoming) state.
If your app server has a limited number of processing threads, you can use HAProxy to "pick up the phone" and buffer the request in a queue for a few seconds until there is a free thread on your app server to handle it.
In fact, you could just use a web server like nginx to take the request and say "200 OK". Then later, a simple app reads the web log and inserts into DB. This will scale pretty well, although you probably want one thread reading the log and several threads inserting.
2) If your language has coroutines, it may be better to handle the buffering yourself. You should measure the overhead of relying on your language runtime for scheduling.
For example, if each HTTP request is 1K of headers + data, you want to parse it and throw away everything but the one or two pieces of data that you actually need (i.e. the DB ID). If you rely on your language's coroutines as an 'implicit' queue, it will hold a 1K buffer for each coroutine while it is being parsed. In some cases, it's more efficient/faster to have a finite number of workers and manage the queue explicitly. When you have a million things to do, small overheads add up quickly, and the language runtime won't always be optimized for your app.
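A sketch of the explicit-queue idea in Go. The hypothetical "id=" field stands in for the one piece of data worth keeping; everything else in the payload is discarded before anything is queued for the DB:

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// extractID is a stand-in parser: it keeps only the value the DB needs
// (here, whatever follows "id=") and discards the rest of the payload,
// so the downstream queue holds tiny strings, not 1K request buffers.
func extractID(raw string) string {
	for _, f := range strings.Fields(raw) {
		if strings.HasPrefix(f, "id=") {
			return strings.TrimPrefix(f, "id=")
		}
	}
	return ""
}

func main() {
	raw := make(chan string, 8) // incoming payloads
	ids := make(chan string, 8) // parsed output: only what the DB needs
	const workers = 4           // finite worker count instead of one goroutine per request

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for payload := range raw {
				if id := extractID(payload); id != "" {
					ids <- id
				}
			}
		}()
	}
	go func() { wg.Wait(); close(ids) }()

	for i := 0; i < 5; i++ {
		raw <- fmt.Sprintf("POST /track id=%d junk=xxxxxxxx", i)
	}
	close(raw)

	count := 0
	for range ids {
		count++
	}
	fmt.Println("parsed ids:", count)
}
```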
Also, Go will give you far better control over your memory than Node.js. (Structs are much smaller than objects: the 'overhead' for the keys of your struct is a compile-time thing in Go, but a run-time thing in Node.js.)
3) How do you know it's working? You want to be able to know exactly how you are doing. When you rely on the language co-routines, it's not easy to ask "how many threads of execution do I have and what's the oldest one?" If you make an explicit queue, those questions are much easier to ask. (Imagine a handful of workers putting stuff in the queue, and a handful of workers pulling stuff out. There is a little uncertainty around the edges, but the queue in the middle very explicitly captures your backlog. You can easily calculate things like "drain rate" and "max memory usage" which are very important to knowing how overloaded you are.)
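With a channel-backed queue in Go, those questions come down to len() and cap(); a sketch, with the capacity chosen arbitrarily for illustration:

```go
package main

import "fmt"

// queueStats reads the current backlog and maximum capacity of a
// channel-backed queue. Sampling backlog over time gives the drain
// rate; backlog/capacity gives how overloaded you are.
func queueStats(q chan int) (backlog, capacity int) {
	return len(q), cap(q)
}

func main() {
	q := make(chan int, 10)
	for i := 0; i < 4; i++ {
		q <- i
	}
	backlog, capacity := queueStats(q)
	fmt.Printf("backlog=%d capacity=%d utilization=%.0f%%\n",
		backlog, capacity, 100*float64(backlog)/float64(capacity))

	<-q // a writer draining one item
	backlog, _ = queueStats(q)
	fmt.Println("backlog after one drain:", backlog)
}
```

Note that len() on a channel is a momentary snapshot under concurrency, which is fine for monitoring but not for control flow.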
My advice: Go with Go. Long term, Go will be a much better choice. The Go runtime is a bit immature right now, but every release is getting better. Node.js is probably slightly ahead in a few areas (maturity, size of community, libraries, etc.)
Comments