ML pipeline for Dataflow, and DataflowPythonJobOp's support for custom Docker containers?
Question

I use custom Docker containers to run Dataflow jobs. I want to chain this together with my TPU training job and so on, so I'm considering running a Kubeflow pipeline on Vertex AI. Is this a sensible idea? (There seem to be many alternatives, such as Airflow.)

In particular, must I use DataflowPythonJobOp in the pipeline? It does not seem to support custom worker images. I assume I can just have one small machine that launches the Dataflow pipeline and stays idle (besides writing some logs) until the Dataflow pipeline finishes?
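To make that second idea concrete, this is a rough, untested sketch of the kind of launcher component I have in mind; the project ID, region, bucket paths, and worker image URI are placeholders:

```python
# Sketch of the "small launcher machine" idea: a lightweight KFP component
# that submits the Beam pipeline to Dataflow and idles until it finishes.
# Project ID, region, bucket paths, and image URI are placeholders.
from kfp import dsl


@dsl.component(base_image="python:3.10", packages_to_install=["apache-beam[gcp]"])
def launch_and_wait_for_dataflow():
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-project",                # placeholder
        region="us-central1",
        temp_location="gs://my-bucket/tmp",  # placeholder
        # Custom worker image for the Dataflow workers (Runner v2).
        sdk_container_image="us-docker.pkg.dev/my-project/repo/beam-worker:latest",
        experiments=["use_runner_v2"],
    )
    with beam.Pipeline(options=options) as p:
        # Trivial stand-in for the real Beam job.
        _ = p | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * 2)
    # Leaving the `with` block calls wait_until_finish(), so this small
    # component just idles (emitting logs) until the Dataflow job completes.
```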
Answer 1

Score: 1
Have you tried passing the custom container args with https://google-cloud-pipeline-components.readthedocs.io/en/google-cloud-pipeline-components-2.0.0/api/v1/dataflow.html#v1.dataflow.DataflowPythonJobOp.args?
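For example, here is a minimal sketch of forwarding Dataflow's custom-container flags through the component's args parameter, assuming Runner v2 and a prebuilt Beam worker image. It is untested, and the project ID, GCS paths, module path, and image URI are placeholders:

```python
# Sketch only: forwards --sdk_container_image to the launched Beam job so the
# Dataflow workers run a custom image. All IDs, paths, and URIs are placeholders.
from kfp import compiler, dsl
from google_cloud_pipeline_components.v1.dataflow import DataflowPythonJobOp
from google_cloud_pipeline_components.v1.wait_gcp_resources import WaitGcpResourcesOp


@dsl.pipeline(name="dataflow-custom-container")
def pipeline():
    dataflow_op = DataflowPythonJobOp(
        project="my-project",                                  # placeholder
        location="us-central1",
        python_module_path="gs://my-bucket/src/beam_job.py",   # placeholder
        temp_location="gs://my-bucket/tmp",                    # placeholder
        requirements_file_path="gs://my-bucket/requirements.txt",
        args=[
            # Beam pipeline options forwarded to the launched job; this flag
            # is how Dataflow (Runner v2) picks up a custom worker image.
            "--sdk_container_image",
            "us-docker.pkg.dev/my-project/repo/beam-worker:latest",
            "--experiments",
            "use_runner_v2",
        ],
    )
    # Block here until the Dataflow job completes, so downstream steps
    # (e.g. TPU training) only start afterwards.
    WaitGcpResourcesOp(gcp_resources=dataflow_op.outputs["gcp_resources"])


compiler.Compiler().compile(pipeline_func=pipeline, package_path="pipeline.json")
```

Note that DataflowPythonJobOp only submits the job; chaining WaitGcpResourcesOp on its gcp_resources output is what keeps the pipeline from starting downstream steps (such as TPU training) before the Dataflow job has finished.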