Kedro - how to set max_workers when running pipelines with ParallelRunner?

Question


I'm using Kedro 0.18.7 and Python 3.9 in WSL2.

I'd like to run the nodes of my pipeline in parallel with the command kedro run --pipeline <pipeline_name> --runner ParallelRunner. According to the ParallelRunner documentation, it should be possible to set the maximum number of CPU cores to use (via max_workers), but I can't work out how to pass this argument. Apparently I can't just add it to the command, as in --runner ParallelRunner --max_workers 4.

Does somebody know how to set max_workers for ParallelRunner?

Previous discussions of max_workers refer to older versions of Kedro (for example, a GitHub issue). I guess I need to create a file somewhere in the project directory and write the relevant code there, something like runner=ParallelRunner(max_workers=4) (in cli.py? run.py? settings.py?), but beyond that I'm lost.

Any tips or guidance would be appreciated.

Answer 1

Score: 2

One approach that works is to run the pipeline from a Python script through a KedroSession, which lets you pass an already configured runner instance.

Ref: https://docs.kedro.org/en/stable/kedro.framework.session.session.KedroSession.html#kedro-framework-session-session-kedrosession

from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from kedro.runner import ParallelRunner

# Replace both placeholders with your own values.
project_path = Path("<project_root>")
bootstrap_project(project_path)

with KedroSession.create(project_path=project_path) as session:
    session.run(pipeline_name="<pipeline_name>", runner=ParallelRunner(max_workers=4))
huangapple
  • Posted on May 25, 2023, 05:52:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76327634.html