MLRun, Issue with slow response times


Question



I see higher throughput and a long average response delay (requests waiting for a worker in the range of 20-50 seconds); see the output from grafana:

[grafana screenshot: throughput and average response latency]

I know that part of the optimization can be:

  • use more workers (for each pod/replica)
  • increase resources for each pod/replica
  • use more pods/replicas in k8s

I tuned performance by increasing resources and pods/replicas, see:

# increase resources (for faster execution)
fn.with_requests(mem="500Mi", cpu=0.5)  # default (requested) resources
fn.with_limits(mem="2Gi", cpu=1)        # maximum resources

# parallel execution via more pods/replicas
fn.spec.replicas = 2        # default replicas
fn.spec.min_replicas = 2    # min replicas
fn.spec.max_replicas = 5    # max replicas

Do you know how I can increase the number of workers, and what the expected impact on CPU/memory is?

Answer 1

Score: 0


I got it. Each worker uses a separate worker scope. This means that each worker has its own copy of all variables, and all changes stay within that worker (a change made by worker x does not affect worker y). It follows that it is useful to increase the request/limit resources, at least for memory, at the pod/replica level.

You can set the number of workers for the HTTP trigger via **fn.with_http(workers=&lt;n&gt;)**; see the documentation for more information. I updated the code based on resource tuning:

# increase workers (two workers) for each pod/replica
fn.with_http(workers=2)

# increase resources (for faster execution)
fn.with_requests(mem="1Gi", cpu=0.7)    # memory doubled and cpu slightly raised, because of two workers
fn.with_limits(mem="2Gi", cpu=1)        # maximum resources (unchanged)

# parallel execution via more pods/replicas
fn.spec.replicas = 2        # default replicas
fn.spec.min_replicas = 2    # min replicas
fn.spec.max_replicas = 5    # max replicas
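The memory bump above (500Mi → 1Gi for two workers) follows a simple rule of thumb: runtime base plus one full state copy per worker. A hypothetical back-of-the-envelope helper (not an MLRun API; the function name and the 100 MiB base are assumptions for illustration):

```python
# Hypothetical sizing helper (not part of MLRun): estimate a pod's memory
# request when every worker keeps its own copy of the state.
def estimate_pod_memory_mib(workers: int, per_worker_mib: int,
                            base_mib: int = 100) -> int:
    """Runtime base plus one full state copy per worker."""
    return base_mib + workers * per_worker_mib

# Two workers with roughly 450 MiB of state each, plus runtime overhead:
print(estimate_pod_memory_mib(workers=2, per_worker_mib=450))  # 1000 -> request ~"1Gi"
```

CPU, by contrast, is shared across workers that are rarely all busy at once, which is why the request above grows only slightly (0.5 → 0.7) rather than doubling.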

huangapple
  • Published on 2023-07-10 20:54:39
  • Please keep this link when reposting: https://go.coder-hub.com/76653956.html