"RuntimeError: CustomJob resource has not been created" when creating Vertex AI CustomJob
Question
I am trying to create a Vertex AI CustomJob, similar to the example at https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.CustomJob:
import time

from google.cloud import aiplatform

worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-4",
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": "eu.gcr.io/somexistingimage",
            "command": ["python", "myscript.py", "test", "--var"],
            "args": [],
        },
    }
]

job = aiplatform.CustomJob(
    display_name="job_{}".format(round(time.time())),
    worker_pool_specs=worker_pool_specs,
    project="my-project",
    staging_bucket="gs://some-bucket",
)
When I then inspect the job, accessing practically any field (create_time, display_name, end_time, ...) raises the following error:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "..../lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 686, in display_name
    self._assert_gca_resource_is_available()
  File "..../lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 1332, in _assert_gca_resource_is_available
    raise RuntimeError(
RuntimeError: CustomJob resource has not been created.
Environment:
python 3.9.16
google-cloud-aiplatform 1.28.1
I'm logged in and application default credentials are set up correctly, since I can submit CustomContainerTrainingJobs, just not CustomJobs.
I cannot find anything about this error. How can I fix it?
Answer 1
Score: 0
OK, the solution is quite simple:
Do not read attributes of the job object (e.g. job.display_name) before the job has been run or submitted. Constructing aiplatform.CustomJob only builds a local object; the CustomJob resource that these properties read from does not exist until then.
After job.submit() or job.run(sync=True) you can inspect the job.
With job.run(sync=False) you can get the same error, because the call returns immediately and you cannot tell whether the job resource has been created yet.
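The ordering described above can be sketched as follows. This is only an illustration, not the library's own code: read_if_created and submit_and_describe are hypothetical helper names, and StubJob is a made-up stand-in that imitates the "resource has not been created" behavior so the snippet runs without GCP credentials. With the real SDK you would pass the aiplatform.CustomJob object from the question instead of the stub.

```python
from typing import Any, Dict, Optional


def read_if_created(job: Any, attribute: str) -> Optional[Any]:
    """Return job.<attribute>, or None if the backend resource is missing.

    Relies on the behavior shown in the traceback: property access raises
    RuntimeError until the CustomJob resource has been created.
    """
    try:
        return getattr(job, attribute)
    except RuntimeError:
        return None


def submit_and_describe(job: Any) -> Dict[str, Any]:
    """Submit the job first, then read properties backed by the resource."""
    job.submit()
    return {
        "display_name": job.display_name,
        "resource_name": job.resource_name,
    }


class StubJob:
    """Hypothetical stand-in for local runs; NOT part of google-cloud-aiplatform."""

    def __init__(self, display_name: str) -> None:
        self._display_name = display_name
        self._created = False  # flips to True once submit() is called

    def submit(self) -> None:
        self._created = True

    def _require_created(self) -> None:
        if not self._created:
            raise RuntimeError("CustomJob resource has not been created.")

    @property
    def display_name(self) -> str:
        self._require_created()
        return self._display_name

    @property
    def resource_name(self) -> str:
        self._require_created()
        # Placeholder value, not a real resource name.
        return "projects/my-project/locations/some-region/customJobs/0"


if __name__ == "__main__":
    job = StubJob("job_1")
    print(read_if_created(job, "display_name"))  # nothing created yet
    print(submit_and_describe(job))
```

Returning None from read_if_created is a design choice for code that may run before submission, such as logging; in a plain script it is simpler to call job.submit() or job.run(sync=True) first and read the attributes afterwards.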