Issue with MLRun Spark service start impacting all Jupyter notebooks
Question
I reconfigured the Spark infrastructure in K8s (as part of the MLRun/Iguazio platform) and after that I got a lot of issues at the service level:

- Spark service (with the information `Failed`)
- All Jupyter notebooks (with the information `Failed dependencies`)

and also a general error/message:

Some services have not been successfully deployed. Check the services status as shown below.

I changed only the amount of RAM (1-30 GB), vCPU (1-14) and Replicas (3).

Did you encounter a similar issue, and how can this situation be avoided?
Answer 1
Score: 0
It was a human mistake; the solution was easy, and the key problem was in the Spark service configuration (I had configured extremely small vCPU values, which caused timeouts for the Spark service):

- I set vCPU in the range 1-14 but used the default unit `millicpu` (not `cpu`).
- After setting the correct unit `cpu` and restarting the Spark service, everything was fine.
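For context, Kubernetes distinguishes whole CPU units from millicores: `cpu: "14"` requests 14 full cores, while `cpu: "14m"` requests 14 millicores (0.014 of a core), which easily starves a Spark service into timeouts. A minimal sketch of a correct resource spec follows; the CPU and memory values mirror the question, but this is a generic Kubernetes `resources` block, not the actual Iguazio service configuration:

```yaml
# Generic Kubernetes resources block (illustrative; values assumed from the question).
resources:
  requests:
    cpu: "14"        # 14 full cores; "14m" would mean 14 millicores (0.014 of a core)
    memory: "30Gi"
  limits:
    cpu: "14"
    memory: "30Gi"
```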