Do ML pipelines for Dataflow and DataflowPythonJobOp support custom Docker containers?

Question

I use custom Docker containers to run Dataflow jobs. I want to chain them together with my TPU training job and other steps, so I'm considering running a Kubeflow pipeline on Vertex AI. Is this a sensible idea? (There seem to be many alternatives, such as Airflow.)

In particular, must I use DataflowPythonJobOp in the pipeline? It does not seem to support custom worker images. I assume I could instead have one small machine that launches the Dataflow pipeline and then stays idle (apart from writing some logs) until the Dataflow pipeline finishes?

Answer 1

Score: 1

Have you tried passing the custom container arguments through the args parameter of DataflowPythonJobOp? See https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.0.0/api/v1/dataflow.html#v1.dataflow.DataflowPythonJobOp.args
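
A minimal sketch of what that might look like, in case it helps. It only builds the list of pipeline flags; the helper name build_dataflow_args, the image URI, the bucket paths, and the project ID are all hypothetical. It assumes the component forwards args unchanged to the Beam pipeline, so that Beam's --sdk_container_image option takes effect; I have not verified this end to end.

```python
from typing import List, Optional


def build_dataflow_args(
    sdk_container_image: str,
    extra_args: Optional[List[str]] = None,
) -> List[str]:
    """Return Beam pipeline flags that request a custom SDK container image."""
    if not sdk_container_image:
        raise ValueError("sdk_container_image must be a non-empty image URI")
    # --sdk_container_image is the Apache Beam pipeline option for a custom
    # worker image; any extra flags are appended after it unchanged.
    args = [f"--sdk_container_image={sdk_container_image}"]
    args.extend(extra_args or [])
    return args


# Hypothetical image URI and extra flag, for illustration only.
dataflow_args = build_dataflow_args(
    "us-docker.pkg.dev/my-project/my-repo/beam-worker:latest",
    ["--experiments=use_runner_v2"],
)
print(dataflow_args)

# Inside a KFP pipeline definition, the list would then be handed to the
# component (not executed here; requires google-cloud-pipeline-components):
#
#   from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp
#   DataflowPythonJobOp(
#       project="my-project",                              # hypothetical
#       location="us-central1",
#       python_module_path="gs://my-bucket/pipeline.py",   # hypothetical
#       temp_location="gs://my-bucket/tmp",                # hypothetical
#       args=dataflow_args,
#   )
```

If the flags do not reach the workers this way, the fallback described in the question (a small launcher step that submits the job itself and waits) would still apply.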

Posted by huangapple on June 29, 2023 at 03:37:12