MLRun, Issue with slow response times
Question
I see higher throughput and a long average response delay (requests waiting for a worker in the range of 20-50 seconds); see the output from Grafana:
I know that part of the optimization can be:
- use more workers (for each pod/replica)
- increase resources for each pod/replica
- use more pods/replicas in k8s
I tuned performance by increasing resources and pods/replicas, see:
# increase resources (for faster execution)
fn.with_requests(mem="500Mi", cpu=0.5)  # default resources
fn.with_limits(mem="2Gi", cpu=1)        # maximum resources
# increase parallel execution by adding pods/replicas
fn.spec.replicas = 2 # default replicas
fn.spec.min_replicas = 2 # min replicas
fn.spec.max_replicas = 5 # max replicas
Do you know how I can increase the number of workers, and what the expected impact on CPU/memory would be?
Answer 1

Score: 0
I got it. Each worker uses a separate worker scope. This means that each worker has its own copy of all variables, and all changes stay within the worker (a change made by worker x does not affect worker y). It is therefore useful to increase the request/limit resources, at least for memory, at the pod/replica level.
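The per-worker scope behaves much like process isolation. A minimal sketch (plain Python, not MLRun code) using the standard library's multiprocessing to illustrate the point: a variable mutated in one worker process is never seen by another.

```python
from multiprocessing import Process, Queue

# Module-level "state"; each worker process gets its own copy on start.
counter = 0

def worker(name: str, out: Queue) -> None:
    global counter
    counter += 1  # mutates only this process's private copy
    out.put((name, counter))

if __name__ == "__main__":
    results: Queue = Queue()
    procs = [Process(target=worker, args=(n, results)) for n in ("x", "y")]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # Both workers report counter == 1: worker x's change never reached worker y.
    print(sorted(results.get() for _ in range(2)))  # [('x', 1), ('y', 1)]
```

This is why memory grows with the worker count: every worker keeps its own copy of the state, so nothing is shared and nothing is saved by adding workers to the same pod.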
You can set the number of workers for the HTTP trigger via fn.with_http(workers=<n>); see the documentation for more information. I updated the code based on this tuning:
# increase workers (two workers) for each pod/replica
fn.with_http(workers=2)
# increase resources (for faster execution)
fn.with_requests(mem="1Gi", cpu=0.7)  # doubled memory and slightly more CPU, because of two workers
fn.with_limits(mem="2Gi", cpu=1)      # maximum resources (unchanged)
# increase parallel execution by adding pods/replicas
fn.spec.replicas = 2 # default replicas
fn.spec.min_replicas = 2 # min replicas
fn.spec.max_replicas = 5 # max replicas
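Because each worker holds its own copy of all variables, a rough way to size a pod is to multiply the per-worker footprint by the worker count. A back-of-the-envelope sketch (the 500 MiB per-worker figure is an assumption for illustration, not a measured value; profile your own function):

```python
def pod_memory_request_mib(workers: int, per_worker_mib: int, base_mib: int = 0) -> int:
    """Estimate the memory request for one pod/replica.

    Each worker keeps a private copy of the state, so memory grows
    roughly linearly with the number of workers; base_mib covers any
    shared runtime overhead.
    """
    return base_mib + workers * per_worker_mib

# Hypothetical per-worker footprint of 500 MiB:
print(pod_memory_request_mib(workers=1, per_worker_mib=500))  # 500  -> mem="500Mi"
print(pod_memory_request_mib(workers=2, per_worker_mib=500))  # 1000 -> mem="1Gi"
```

This matches the change above: moving from one to two workers roughly doubles the memory request (500Mi to 1Gi), while CPU typically grows sublinearly because workers often wait on I/O rather than compute.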