2023年5月25日 05:52:28go评论108阅读模式

英文:

Kedro - how to set max_workers when running pipelines with ParallelRunner?

问题

我正在使用 Kedro 版本 0.18.7 和 Python 3.9 在 WSL2 中。

我想要通过运行命令 kedro run --pipeline <pipeline_name> --runner ParallelRunner 并行运行我的管道中的节点。根据文档 ParallelRunner ，应该可以定义要使用的最大 CPU 核心数量（使用 max_workers），但我在如何使用此参数方面感到困惑。显然，我不能只是将它添加到命令中，如 --runner ParallelRunner --max_workers 4。

有人知道如何为 ParallelRunner 设置 max_workers 吗？

以前关于 max_workers 的讨论都是针对较旧版本的 Kedro（例如 github 问题）。我猜我需要在项目目录的某个地方创建一个文件，并编写相关代码，类似于 runner=ParallelRunner(max_workers=4)（cli.py？run.py？settings.py？），但除此之外我一无所知。

任何提示或指导都将不胜感激。

英文:

I'm using Kedro version 0.18.7 and python 3.9 in WSL2.

I'd like to run nodes of my pipeline in parallel by running the command kedro run --pipeline <pipeline_name> --runner ParallelRunner. According to the documentation ParallelRunner, it should be possible to define the maximum number of CPU cores to use (using max_workers), but I am struggling to find out how to use this argument. Apparently I cannot just add it to the command like --runner ParallelRunner --max_workers 4.

Does somebody know how to set max_workers for ParallelRunner?

Previous discussions on max_workers are from older versions of Kedro (for example github issue). I guess I need to create a file somewhere in the project directory and write relevant code, something like runner=ParallelRunner(max_workers=4) (cli.py? run.py? settings.py?), but other than that I am lost.

Any tips or guidance would be appreciated.

答案1

得分: 2

One way that can work is by creating a kedro session to run your pipeline.

ref: https://docs.kedro.org/en/stable/kedro.framework.session.session.KedroSession.html#kedro-framework-session-session-kedrosession

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro.runner import ParallelRunner
from pathlib import Path
bootstrap_project(Path("<project_root>"))
with KedroSession.create() as session:
    session.run(pipeline_name=<pipeline-name>, runner=ParallelRunner(max_workers=4))

英文:

One way that can work is by creating a kedro session to run your pipeline.

ref: https://docs.kedro.org/en/stable/kedro.framework.session.session.KedroSession.html#kedro-framework-session-session-kedrosession

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro.runner import ParallelRunner
from pathlib import Path
bootstrap_project(Path(&quot;&lt;project_root&gt;&quot;))
with KedroSession.create() as session:
    session.run(pipeline_name=&lt;pipeline-name&gt;, runner=ParallelRunner(max_workers=4))

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Kedro – 在使用ParallelRunner运行流水线时如何设置max_workers？

问题

答案1

为什么在爬取Google搜索结果时BeautifulSoup返回None？

What is the equivalent of .loc with multiple conditions in R?

可以使用OpenSSL将PKCS7转换为PkiPath吗？

在SQLAlchemy（2.x）模型中更新布尔属性以满足MyPy

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。