Kedro - how to set max_workers when running pipelines with ParallelRunner?
Question
I'm using Kedro version 0.18.7 and Python 3.9 in WSL2.
I'd like to run the nodes of my pipeline in parallel by running the command kedro run --pipeline <pipeline_name> --runner ParallelRunner. According to the ParallelRunner documentation, it should be possible to define the maximum number of CPU cores to use (using max_workers), but I am struggling to find out how to pass this argument. Apparently I cannot just add it to the command, as in --runner ParallelRunner --max_workers 4.
Does somebody know how to set max_workers for ParallelRunner?
Previous discussions on max_workers are from older versions of Kedro (for example, a GitHub issue). I guess I need to create a file somewhere in the project directory and write the relevant code, something like runner=ParallelRunner(max_workers=4) (cli.py? run.py? settings.py?), but other than that I am lost.
Any tips or guidance would be appreciated.
Answer 1
Score: 2
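One way that can work is to create a Kedro session and run your pipeline from a Python script.
from pathlib import Path
from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro.runner import ParallelRunner
# Replace the placeholders with your project root and pipeline name.
project_root = Path("<project_root>")
bootstrap_project(project_root)
with KedroSession.create(project_path=project_root) as session:
    # max_workers caps the number of worker processes used by ParallelRunner.
    session.run(pipeline_name="<pipeline_name>", runner=ParallelRunner(max_workers=4))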
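If you would rather choose the worker count on the command line, the session-based approach above could be wrapped in a small script. The following is only a sketch under a few assumptions: the file name run_parallel.py, the helper names parse_args and main, and the --pipeline and --max-workers options are all hypothetical (they are not part of the Kedro CLI), and the script is assumed to live in the project root.

```python
"""run_parallel.py: hypothetical helper script, assumed to sit in the Kedro project root."""
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the (hypothetical) command-line options of this helper script."""
    parser = argparse.ArgumentParser(
        description="Run a Kedro pipeline with ParallelRunner."
    )
    parser.add_argument(
        "--pipeline",
        default=None,
        help="Name of a registered pipeline; Kedro falls back to its default if omitted.",
    )
    parser.add_argument(
        "--max-workers",
        type=int,
        default=None,
        help="Upper bound on worker processes; None lets ParallelRunner decide.",
    )
    return parser.parse_args(argv)


def main(argv=None):
    """Bootstrap the project and run the chosen pipeline in parallel."""
    args = parse_args(argv)

    # Kedro is imported here so that parse_args can be used without Kedro installed.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project
    from kedro.runner import ParallelRunner

    # Assumption: this file is stored directly in the project root.
    project_root = Path(__file__).resolve().parent
    bootstrap_project(project_root)
    with KedroSession.create(project_path=project_root) as session:
        session.run(
            pipeline_name=args.pipeline,
            runner=ParallelRunner(max_workers=args.max_workers),
        )
```

Saved as run_parallel.py with a final `if __name__ == "__main__": main()` line, it could be invoked as python run_parallel.py --pipeline <pipeline_name> --max-workers 4. Keeping the call behind that guard matters for process-based parallelism on platforms that start workers by spawning a fresh interpreter.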