Vertex AI custom job to run python-module with pre-built containers (using gcloud CLI)

Question

I am migrating a model that previously ran on GCP AI Platform to Vertex AI [1, 2].

The setup I am aiming for is:

  • a Vertex AI custom job with a pre-built container (submitted with the gcloud CLI)
  • that runs a custom Python module containing the training code of our model

Can someone tell me whether anything is wrong with the sequence of steps below?

The Python module itself does not seem to be the cause of the problem, since the same code currently runs fine on AI Platform.

Python3 module packaging

# simplified python module structure
# ./vertex-ai-poc
# ├── __init__.py
# ├── trainer
# │   ├── __init__.py
# │   └── task.py
# └── setup.py

python3 ./[PATH]/vertex-ai-poc/setup.py sdist --formats=gztar
# -> dist generated

gsutil cp dist/trainer-0.2.tar.gz gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.2.tar.gz
# -> uploaded correctly
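
Before uploading, it can help to confirm that the archive has the layout an installer expects, with setup.py (or pyproject.toml) directly under a single top-level directory. The following is a minimal sketch using only the standard library; the helper names and the in-memory example archives are hypothetical, and the "bad" layout is only an illustration of a nested setup.py, not a capture of the real archive.

```python
import io
import tarfile


def top_level_setup_files(archive: tarfile.TarFile) -> list[str]:
    """Return setup.py / pyproject.toml entries that sit directly under
    the archive's top-level directory, e.g. trainer-0.2/setup.py."""
    found = []
    for member in archive.getmembers():
        # Drop empty and "." components so "./a/b" and "a/b" compare equal.
        parts = [p for p in member.name.split("/") if p not in ("", ".")]
        # An sdist normally looks like <name>-<version>/setup.py,
        # so a valid entry has exactly two path components.
        if len(parts) == 2 and parts[1] in ("setup.py", "pyproject.toml"):
            found.append("/".join(parts))
    return found


def make_archive(paths: list[str]) -> tarfile.TarFile:
    """Build a small in-memory .tar.gz containing empty files at `paths`."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for path in paths:
            tar.addfile(tarfile.TarInfo(name=path))
    buffer.seek(0)
    return tarfile.open(fileobj=buffer, mode="r:gz")


# Hypothetical layouts: one well-formed, one with setup.py nested too deep.
good = make_archive(["trainer-0.2/setup.py", "trainer-0.2/trainer/task.py"])
bad = make_archive(["trainer-0.2/PATH/vertex-ai-poc/setup.py"])

print(top_level_setup_files(good))
print(top_level_setup_files(bad))
```

For a real file, the same check would be `top_level_setup_files(tarfile.open("dist/trainer-0.2.tar.gz"))`; an empty result suggests the archive was built from the wrong directory.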

Submit Custom Job

gcloud ai custom-jobs create \
    --region us-central1 \
    --display-name=vertex-ai-poc \
    --project=[PROJECT_ID] \
    --python-package-uris='gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.2.tar.gz' \
    --worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',python-module=trainer.task

However, I get the error below.

Error Messages

> file:///user_dir/trainer-0.2.tar.gz does not appear to be a Python project: neither 'setup.py' nor 'pyproject.toml' found.

Note: I notice that the path starts with file:/// (three slashes), and I believe it has something to do with Docker [3].
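
The three slashes on their own are probably not the issue. In a file URI, the scheme `file://` is followed by an optional host and then the path, so `file:///user_dir/...` is simply an empty host plus an absolute path. A minimal sketch with the standard library's `urllib.parse` shows how the URI from the error message splits:

```python
from urllib.parse import urlparse

# The URI exactly as it appears in the error message.
parsed = urlparse("file:///user_dir/trainer-0.2.tar.gz")

print(parsed.scheme)        # file
print(repr(parsed.netloc))  # '' (empty host, i.e. the local machine)
print(parsed.path)          # /user_dir/trainer-0.2.tar.gz
```

In other words, the message points at a local copy of the uploaded archive inside the container, which suggests looking at the contents of the archive rather than at the URI.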

Answer 1

Score: 1

I ended up fixing the problem. I'll share the details for anyone who hits a similar error. The problem was that I wasn't using find_packages() correctly.

First, there are three possible ways to submit a custom Vertex AI job.

  1. With auto packaging
  2. Without auto packaging - custom container image
  3. Without auto packaging - Python app
    1. using the local-package-path param
    2. using the --python-package-uris flag

(I believe) Methods 1, 2, and 3.1 build a Docker image on the local machine and submit the built image to Vertex AI. Method 3.2 simply uses a pre-built container and combines the Python packages with the image given by executor-image-uri on Vertex AI.

The problem was that when I ran the command below to generate the dist package, I ran it from two directories up (../..) and passed the path ./[PATH]/... to setup.py. By default find_packages() searches relative to the current working directory, so it did not return the right packages, which led to both methods 3.1 and 3.2 failing.

# Wrong (run from a parent directory):
#   python3 ./[PATH]/vertex-ai-poc/setup.py sdist --formats=gztar
# Right (run from the directory that contains setup.py):
python3 ./setup.py sdist --formats=gztar

# setup.py
from setuptools import find_packages, setup

setup(
    name='trainer',
    version='0.1',
    packages=find_packages(),  # <-- HERE: resolved relative to the current directory
    include_package_data=True,
)
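
To illustrate why the working directory matters, here is a minimal sketch that uses a simplified stand-in for find_packages() (not setuptools itself): it lists directories containing an `__init__.py`, starting from the current directory and descending only into directories that are themselves packages. The function name and the `projects` directory (standing in for [PATH]) are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def find_packages_sketch(where: str = ".") -> list[str]:
    """Simplified stand-in for setuptools.find_packages(): return dotted
    names of package directories reachable from `where`."""
    packages = []

    def walk(directory: Path, prefix: str) -> None:
        for child in sorted(directory.iterdir()):
            # Only directories with an __init__.py count as packages,
            # and only those are searched further.
            if child.is_dir() and (child / "__init__.py").is_file():
                name = prefix + child.name
                packages.append(name)
                walk(child, name + ".")

    walk(Path(where), "")
    return packages


with tempfile.TemporaryDirectory() as tmp:
    # Recreate a layout like the one in the question under a temp directory.
    project = Path(tmp) / "projects" / "vertex-ai-poc"
    (project / "trainer").mkdir(parents=True)
    (project / "setup.py").touch()
    (project / "trainer" / "__init__.py").touch()
    (project / "trainer" / "task.py").touch()

    original_cwd = os.getcwd()
    try:
        os.chdir(tmp)        # like running setup.py from a parent directory
        from_parent = find_packages_sketch()
        os.chdir(project)    # like running setup.py from its own directory
        from_project = find_packages_sketch()
    finally:
        os.chdir(original_cwd)

print(from_parent)   # []
print(from_project)  # ['trainer']
```

From the parent directory no package is found at all, whereas from the project directory `trainer` is picked up, which matches the behavior described above.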

With the package built correctly, both the local-package-path and the --python-package-uris variants below work.

3.1 Without auto packaging - Python app - using the local-package-path param

gcloud ai custom-jobs create \
    --region us-central1 \
    --display-name=vertex-ai-poc \
    --project=[PROJECT_ID] \
    --worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',script=task.py,local-package-path=vertex-ai-poc/trainer

3.2 Without auto packaging - Python app - using the --python-package-uris flag

gcloud ai custom-jobs create \
    --region us-central1 \
    --display-name=vertex-ai-poc \
    --project=[PROJECT_ID] \
    --python-package-uris='gs://[PROJECT_ID]/vertex-ai-poc/trainer-0.1.tar.gz' \
    --worker-pool-spec=machine-type=e2-standard-4,replica-count=1,executor-image-uri='us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest',python-module=trainer.task
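
If the job is submitted from a script, the 3.2 command can be assembled as an argument list, which avoids shell quoting issues in the long --worker-pool-spec value. This is only a sketch: `build_custom_job_command` is a hypothetical helper, `my-project` is a placeholder for [PROJECT_ID], and only the flags and values already shown above are used. The command is printed, not executed.

```python
import shlex


def build_custom_job_command(
    project_id: str,
    package_uri: str,
    python_module: str,
    region: str = "us-central1",
    display_name: str = "vertex-ai-poc",
    machine_type: str = "e2-standard-4",
    executor_image_uri: str = "us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-7:latest",
) -> list[str]:
    """Assemble the `gcloud ai custom-jobs create` call for method 3.2."""
    # The worker pool spec is a single comma-separated key=value string.
    worker_pool_spec = ",".join([
        f"machine-type={machine_type}",
        "replica-count=1",
        f"executor-image-uri={executor_image_uri}",
        f"python-module={python_module}",
    ])
    return [
        "gcloud", "ai", "custom-jobs", "create",
        f"--region={region}",
        f"--display-name={display_name}",
        f"--project={project_id}",
        f"--python-package-uris={package_uri}",
        f"--worker-pool-spec={worker_pool_spec}",
    ]


command = build_custom_job_command(
    project_id="my-project",  # hypothetical placeholder for [PROJECT_ID]
    package_uri="gs://my-project/vertex-ai-poc/trainer-0.1.tar.gz",
    python_module="trainer.task",
)
print(shlex.join(command))
```

The resulting list could be passed to `subprocess.run(command, check=True)` on a machine where gcloud is installed and authenticated.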

huangapple
  • Published on March 15, 2023, 18:01:42
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75743143.html