Vertex AI - RuntimeError: Job failed with: code: 13 message: "Internal error encountered. Please try again"
Question
I am trying to run a Vertex AI Pipeline.
The pipeline is created successfully (PipelineJob created. Resource name: XXX), then I get PipelineState.PIPELINE_STATE_PENDING multiple times until it crashes with this error:
Traceback (most recent call last):
  File "/src/pipelines/build_model/pipeline_run.py", line 288, in <module>
    cli()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/src/pipelines/build_model/pipeline_run.py", line 284, in cli
    job.run()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 314, in run
    self._run(
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 810, in wrapper
    return method(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 351, in _run
    self._block_until_complete()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 499, in _block_until_complete
    raise RuntimeError("Job failed with:\n%s" % self._gca_resource.error)
RuntimeError: Job failed with:
code: 13
message: "Internal error encountered. Please try again"
This pipeline currently works in a dev GCP project, where it automatically gets into a RUNNING state.
I have this issue when I try to make it work in another GCP project. I have reproduced the same steps (API enabled, service account created, same rights, same location); in my code I only change the project_id and the credentials.
I have tried changing the location to check that it is not due to a lack of resources on Google's side. I also tried a really simple Hello World pipeline, and it never gets into the RUNNING state either.
I have also checked Cloud Logging but can't find anything useful.
Any ideas? Thanks
Answer 1
Score: 1
I finally found out what was missing: some IAM permissions (on Cloud Storage and BigQuery in my case).
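One way to look for this kind of gap on the Cloud Storage side is to ask the bucket which permissions the current credentials actually hold. The snippet below is only a sketch: the permission list is an assumption (this answer does not say which permissions were missing), and the function names, project_id, and bucket_name are hypothetical. It has to run with the same service account the pipeline uses, and it needs the google-cloud-storage package.

```python
from typing import Iterable, List

# Assumed permissions for reading and writing pipeline artifacts.
# The answer does not list the exact ones; adjust to your pipeline.
ASSUMED_BUCKET_PERMISSIONS = [
    "storage.objects.create",
    "storage.objects.get",
    "storage.objects.list",
]


def missing_permissions(required: Iterable[str], granted: Iterable[str]) -> List[str]:
    """Return the required permissions that are absent from `granted`, in order."""
    granted_set = set(granted)
    return [p for p in required if p not in granted_set]


def check_bucket_access(project_id: str, bucket_name: str) -> List[str]:
    """Ask Cloud Storage which assumed permissions the current credentials lack."""
    # Imported here so the pure helper above works without the library installed.
    from google.cloud import storage

    client = storage.Client(project=project_id)
    bucket = client.bucket(bucket_name)
    # test_iam_permissions returns the subset of permissions the caller holds.
    granted = bucket.test_iam_permissions(ASSUMED_BUCKET_PERMISSIONS)
    return missing_permissions(ASSUMED_BUCKET_PERMISSIONS, granted)
```

An empty list from check_bucket_access only means the assumed permissions are present on that bucket; BigQuery access would need a separate check.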
Answer 2
Score: 0
I got this error when using a GCS bucket located in a different region from the one my pipeline ran in.
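A quick way to rule this out is to compare the bucket's location with the pipeline region before submitting the job. This is a minimal sketch, not a definitive check: the function names and arguments are hypothetical, it needs the google-cloud-storage package and permission to read the bucket's metadata, and it treats multi-region buckets (for example "US") as a mismatch so that they get a manual look.

```python
def same_region(bucket_location: str, pipeline_region: str) -> bool:
    """Compare two location strings, ignoring case and surrounding whitespace."""
    return bucket_location.strip().lower() == pipeline_region.strip().lower()


def bucket_matches_pipeline_region(
    project_id: str, bucket_name: str, pipeline_region: str
) -> bool:
    """Fetch the bucket's location and compare it with the pipeline region."""
    # Imported here so the pure helper above works without the library installed.
    from google.cloud import storage

    bucket = storage.Client(project=project_id).get_bucket(bucket_name)
    # Bucket.location is typically reported in upper case, e.g. "US-CENTRAL1".
    return same_region(bucket.location, pipeline_region)
```

If the function returns False, either move the pipeline root to a bucket in the pipeline's region or run the pipeline in the bucket's region.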