Limit one container to handling only 1 request at a time in Azure Kubernetes Services
Question
How do I make sure that one container only tries to handle one request? I am running a Flask API server in my container, but it is not designed to handle multiple requests at the same time.
Right now it seems like multiple requests are put into one pod/container as I keep getting an OOMKilled status.
Note that this only happens when I send requests in quick succession, e.g. 3 requests with 3 seconds in between.
Note that I am not 100% sure that this is happening; I find it difficult to determine where the requests are going in the AKS cluster. If you have any advice on how to monitor this, I would greatly appreciate it!
I tried setting the resource request and resource limit to the same value in the deployment.yaml, like this:
```yaml
resources:
  requests:
    cpu: "100m"
    memory: "128Mi"
  limits:
    cpu: "100m"
    memory: "128Mi"
```
This is not my preferred way to solve the problem, as most of the time my program only needs 32Mi of memory and the full 128Mi is rarely needed.
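For reference, the burstable setup I would prefer instead (request roughly the 32Mi the app normally uses, keep a higher limit as headroom) would look something like this sketch:

```yaml
resources:
  requests:
    cpu: "100m"
    memory: "32Mi"     # what the app needs most of the time
  limits:
    cpu: "100m"
    memory: "128Mi"    # headroom for occasional spikes
```

With requests below limits the pod falls into the Burstable QoS class, so it can still be OOMKilled if it exceeds the 128Mi limit.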
Answer 1
Score: 0
> It is not designed to handle multiple requests at the same time
Well, then the code is not designed properly 😅 There are limits to throwing more servers at a code problem.
If I were you, here is what I would do:
- Fix the code to handle several requests. Maybe you have a memory leak.
- Increase the memory (double it, and see if it helps)
- Monitor your app with something like grafana.com to understand why memory usage is increasing.
- Increase concurrency.
- Create an HPA (Horizontal Pod Autoscaler) based on memory: when memory usage crosses a certain threshold, it will increase your pod count (see the HPA sketch after this list).
- Add a readiness probe and configure it so that if the pod doesn't answer, the load balancer won't send requests to that pod (see the probe sketch after this list).
- If you really need to process only 1 request at a time, use a queue: an API puts items in a queue as requests come in, and a worker processes them one by one.
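A minimal sketch of the memory-based HPA, assuming the Deployment is called flask-api (the name, the replica counts, and the 80% threshold are placeholders to adjust):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: flask-api-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: flask-api                # hypothetical Deployment name
  minReplicas: 1
  maxReplicas: 5                   # placeholder upper bound
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80   # example threshold, relative to the memory request
```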
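And a readiness probe sketch for the container spec, assuming the Flask app listens on port 5000 and exposes some health endpoint (the /healthz path is an assumption):

```yaml
readinessProbe:
  httpGet:
    path: /healthz           # assumed health endpoint in the Flask app
    port: 5000               # assumed containerPort
  initialDelaySeconds: 5
  periodSeconds: 10
  failureThreshold: 3
```

While the probe is failing, the pod is removed from the Service endpoints, so the load balancer stops routing traffic to it.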