Vertex AI custom job to run python-module with pre-built containers (using gcloud CLI)
Question
I am migrating a model that previously ran on GCP AI Platform to Vertex AI [1, 2].
The setup I am looking for is a Vertex AI custom job with pre-built containers (using the gcloud CLI) that runs a custom Python module containing the training code for our model.
Can someone tell me whether something is wrong with the sequence of steps below?
The Python module itself does not seem to be the cause, since the same code currently runs fine on AI Platform.
Python3 module packaging
# simplified python module structure
# ./vertex-ai-poc
# ├── __init__.py
# ├── trainer
# │ ├── __init__.py
# │ └── task.py
# └── setup.py
python3 ./[PATH]/vertex-ai-poc/setup.py sdist --formats=gztar
# -> dist generated
gsutil cp dist/trainer-0.2.tar.gz gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.2.tar.gz
# -> uploaded correctly
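A quick way to sanity-check the archive before submitting is to look inside it. The error reported below says the installer found neither setup.py nor pyproject.toml, so this Python sketch lists the members of a .tar.gz sdist and checks for one of those files at the project root. It assumes, as I understand pip's behavior, that a single leading directory shared by every member (such as trainer-0.2/) is treated as the root; the helper names and the script name check_sdist.py are hypothetical.

```python
import sys
import tarfile
from pathlib import PurePosixPath


def has_top_level_project_file(member_names: list[str]) -> bool:
    """Return True if setup.py or pyproject.toml sits at the project root.

    Roughly mirrors how pip unpacks an sdist: if every member shares one
    leading directory such as "trainer-0.2/", that directory is the root.
    """
    # pathlib drops "." components, so "./trainer-0.2/x" becomes "trainer-0.2/x".
    paths = [PurePosixPath(name) for name in member_names]
    top_dirs = {p.parts[0] for p in paths if p.parts}
    if len(top_dirs) == 1 and any(len(p.parts) > 1 for p in paths):
        root = PurePosixPath(next(iter(top_dirs)))
    else:
        root = PurePosixPath()  # no shared leading directory: use the archive root
    wanted = {root / "setup.py", root / "pyproject.toml"}
    return any(p in wanted for p in paths)


def check_sdist(archive_path: str) -> bool:
    """Open a .tar.gz sdist and inspect its member names."""
    with tarfile.open(archive_path, "r:gz") as tar:
        return has_top_level_project_file(tar.getnames())


if __name__ == "__main__":
    # Hypothetical usage: python check_sdist.py dist/trainer-0.2.tar.gz
    for archive in sys.argv[1:]:
        ok = check_sdist(archive)
        print(f"{archive}: {'OK' if ok else 'no setup.py/pyproject.toml at the root'}")
```

If setup.py only shows up deeper in the listing (for example under a copied-in path prefix), the archive was probably built from the wrong working directory.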
Submit Custom Job
gcloud ai custom-jobs create \
--region us-central1 \
--display-name=vertex-ai-poc \
--project=[PROJECT_ID] \
--python-package-uris='gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.2.tar.gz' \
--worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',python-module=trainer.task
However, I am encountering the error below.
Error Messages
> file:///user_dir/trainer-0.2.tar.gz does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.
cf. I notice that the URI starts with file:/// (three slashes), and I believe this has something to do with Docker [3].
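On the three slashes: that is what a file URI for an absolute local path normally looks like, namely the file:// scheme, an empty host (meaning the local machine), and then a path that itself starts with /. This small Python sketch builds and parses such a URI with the standard library (the helper name file_uri is hypothetical). The URI alone does not say which machine or container /user_dir lives on.

```python
from urllib.parse import quote, urlparse


def file_uri(abs_path: str) -> str:
    """Build a file URI for an absolute POSIX path (empty host = local machine)."""
    if not abs_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    # "file://" + "" (empty host) + "/user_dir/..." gives three slashes in a row.
    return "file://" + quote(abs_path)


uri = file_uri("/user_dir/trainer-0.2.tar.gz")
parts = urlparse(uri)
print(uri)                 # file:///user_dir/trainer-0.2.tar.gz
print(repr(parts.netloc))  # ''
print(parts.path)          # /user_dir/trainer-0.2.tar.gz
```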
References
Answer 1
Score: 1
I ended up fixing the problem, and I'll share what happened for anyone hitting a similar error. The problem was that I wasn't using find_packages() correctly.
First, there are three possible ways of submitting custom Vertex AI jobs:
1. auto packaging
2. without auto packaging - custom container image
3. without auto packaging - Python app
   3.1 using the local-package-path param
   3.2 using the --python-package-uris flag
I believe methods 1, 2, and 3.1 build a Docker image on the local machine and submit the built image to Vertex AI, whereas method 3.2 simply uses a pre-built container and combines the Python package with the image given by executor-image-uri on Vertex AI.
The problem was that when I ran the command below to generate the dist package, I ran it from a parent directory (../..), pointing at setup.py through ./[PATH]/. Because find_packages() searches the current working directory by default, it did not pick up the right packages, and both methods 3.1 and 3.2 failed.
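# Wrong (run from a parent directory): python3 ./[PATH]/vertex-ai-poc/setup.py sdist --formats=gztar
# Right (run from the directory that contains setup.py):
cd ./[PATH]/vertex-ai-poc
python3 ./setup.py sdist --formats=gztar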
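The same fix can be scripted. This Python sketch (hypothetical helper names; it assumes setup.py-based packaging as in this answer) runs the sdist command with the working directory set to the folder that contains setup.py, which is equivalent to running the command from that folder, so find_packages() searches the right place.

```python
import subprocess
import sys
from pathlib import Path


def sdist_invocation(setup_py: str) -> tuple[list[str], Path]:
    """Return (argv, cwd) that build a gztar sdist from setup.py's own directory.

    find_packages() searches the current working directory by default, so the
    build must start in the directory that contains setup.py.
    """
    project_dir = Path(setup_py).resolve().parent
    argv = [sys.executable, "setup.py", "sdist", "--formats=gztar"]
    return argv, project_dir


def build_sdist(setup_py: str) -> None:
    argv, project_dir = sdist_invocation(setup_py)
    # cwd= has the same effect as changing into project_dir before running.
    subprocess.run(argv, cwd=project_dir, check=True)


if __name__ == "__main__":
    # Hypothetical usage: python build_sdist.py ./[PATH]/vertex-ai-poc/setup.py
    for path in sys.argv[1:]:
        build_sdist(path)
```

With this approach the archive lands in dist/ inside the project folder, so the gsutil cp source path should point there.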
from setuptools import find_packages, setup

setup(
    name='trainer',
    version='0.1',
    packages=find_packages(),  # <-- HERE: searches the current working directory by default
    include_package_data=True,
)
With the package built correctly, both the local-package-path version and the external-URI version below work.
3.1 Without auto packaging - Python app - using the local-package-path param
gcloud ai custom-jobs create \
--region us-central1 \
--display-name=vertex-ai-poc \
--project=[PROJECT_ID] \
--worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',script=task.py,local-package-path=vertex-ai-poc/trainer
3.2 Without auto packaging - Python app - using the --python-package-uris flag
gcloud ai custom-jobs create \
--region us-central1 \
--display-name=vertex-ai-poc \
--project=[PROJECT_ID] \
--python-package-uris='gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.1.tar.gz' \
--worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',python-module=trainer.task
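The two commands differ mainly in the worker-pool-spec keys. As a sketch, these Python helpers (hypothetical names) assemble the same gcloud arguments, making explicit that --worker-pool-spec is one comma-separated list of key=value pairs. They only build the argument list; actually submitting a job would still need an installed, authenticated gcloud, for example via subprocess.run(argv, check=True).

```python
import shlex
from typing import Optional

IMAGE = "us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest"


def worker_pool_spec(**fields: str) -> str:
    """Join fields into a comma-separated key=value string.

    Keyword names use underscores (machine_type), which become the dashed
    keys gcloud expects (machine-type). Insertion order is preserved.
    """
    return ",".join(f"{key.replace('_', '-')}={value}" for key, value in fields.items())


def custom_job_argv(project: str, spec: str, package_uri: Optional[str] = None) -> list[str]:
    """Build the argv for `gcloud ai custom-jobs create` as used in this answer."""
    argv = [
        "gcloud", "ai", "custom-jobs", "create",
        "--region=us-central1",
        "--display-name=vertex-ai-poc",
        f"--project={project}",
    ]
    if package_uri is not None:  # method 3.2: sdist already uploaded to GCS
        argv.append(f"--python-package-uris={package_uri}")
    argv.append(f"--worker-pool-spec={spec}")
    return argv


# Method 3.1: local package directory plus a script inside it.
spec_31 = worker_pool_spec(
    machine_type="e2-standard-4", replica_count="1", executor_image_uri=IMAGE,
    script="task.py", local_package_path="vertex-ai-poc/trainer",
)
# Method 3.2: sdist on GCS plus the module to run.
spec_32 = worker_pool_spec(
    machine_type="e2-standard-4", replica_count="1", executor_image_uri=IMAGE,
    python_module="trainer.task",
)

if __name__ == "__main__":
    print(shlex.join(custom_job_argv("[PROJECT_ID]", spec_31)))
    print(shlex.join(custom_job_argv(
        "[PROJECT_ID]", spec_32,
        package_uri="gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.1.tar.gz",
    )))
```

Note that the package URI should name the archive actually produced: trainer-0.1.tar.gz here, matching version='0.1' in setup.py, whereas the question uploaded trainer-0.2.tar.gz.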
Results