Issue with MLRun Spark service start impacting all Jupyter notebooks
Question
I reconfigured the Spark infrastructure in K8s (as part of the MLRun/Iguazio platform) and after that I got a lot of issues at the service level:

- Spark service (with the information `Failed`)
- All Jupyter notebooks (with the information `Failed dependencies`)

and also a general error/message:

Some services have not been successfully deployed. Check the services status as shown below.

I changed only the amount of RAM (1-30 GB), vCPU (1-14) and Replicas (3).

Did you encounter a similar issue, and how can this situation be avoided?
Answer 1
Score: 0
It was a human mistake; the solution was easy, and the key problem was in the Spark service configuration (I had configured extremely small vCPU values, which caused timeouts for the Spark service):

- I set vCPU in the range 1-14 but used the default unit `millicpu` (not `cpu`).
- After setting the correct unit `cpu` and restarting the Spark service, everything was fine.
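For context, Kubernetes distinguishes whole CPU units from millicores: `cpu: "14"` requests 14 full cores, while `cpu: "14m"` requests 14 millicores (0.014 of a core), which easily starves a Spark service into timeouts. A minimal sketch of a correct resource spec follows; the CPU and memory values mirror the question, but this is a generic Kubernetes `resources` block, not the actual Iguazio service configuration:

```yaml
# Generic Kubernetes resources block (illustrative; values assumed from the question).
resources:
  requests:
    cpu: "14"        # 14 full cores; "14m" would mean 14 millicores (0.014 of a core)
    memory: "30Gi"
  limits:
    cpu: "14"
    memory: "30Gi"
```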