2023-05-10 18:24:22

Vertex AI - RuntimeError: Job failed with: code: 13 message: "Internal error encountered. Please try again"

Question

I am trying to run a Vertex AI Pipeline.

The pipeline is created successfully (the log shows: PipelineJob created. Resource name: XXX).

Then I get PipelineState.PIPELINE_STATE_PENDING several times until it crashes with this error:

Traceback (most recent call last):
  File "/src/pipelines/build_model/pipeline_run.py", line 288, in <module>
    cli()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/src/pipelines/build_model/pipeline_run.py", line 284, in cli
    job.run()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 314, in run
    self._run(
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/base.py", line 810, in wrapper
    return method(*args, **kwargs)
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 351, in _run
    self._block_until_complete()
  File "/opt/pysetup/.venv/lib/python3.9/site-packages/google/cloud/aiplatform/pipeline_jobs.py", line 499, in _block_until_complete
    raise RuntimeError("Job failed with:\n%s" % self._gca_resource.error)
RuntimeError: Job failed with:
code: 13
message: "Internal error encountered. Please try again"

This pipeline currently works in a dev GCP project, where it automatically gets into the RUNNING state.

I have this issue when I try to make it work in another GCP project. I have reproduced the same steps (API enabled, service account created, same rights, same location); in my code I only change the project_id and the credentials.

I have tried changing the location to check that it is not due to a lack of resources on Google's side. I also tried a really simple Hello World pipeline and still can't get the pipeline into the RUNNING state.

I have also checked Cloud Logging but can't find anything useful.

Any ideas? Thanks

Answer 1

Score: 1

I finally found out what was missing: some IAM permissions (for Cloud Storage and BigQuery in my case).
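
One way to narrow down a missing Cloud Storage permission is to ask the API which permissions the caller actually holds. The sketch below is only an illustration, not the exact fix from this answer. It assumes the google-cloud-storage library is installed, that it is run with the credentials of the service account the pipeline uses, and that the permissions listed in REQUIRED_BUCKET_PERMISSIONS are the relevant ones (the answer does not say which were missing). The helper names and the bucket and project arguments are hypothetical.

```python
from typing import Iterable, List

# Assumed list: the answer does not name the permissions that were missing.
REQUIRED_BUCKET_PERMISSIONS = [
    "storage.objects.create",
    "storage.objects.get",
    "storage.objects.list",
]


def missing_permissions(required: Iterable[str], granted: Iterable[str]) -> List[str]:
    """Return the required permissions that are absent from granted, in order."""
    granted_set = set(granted)
    return [p for p in required if p not in granted_set]


def check_bucket_permissions(bucket_name: str, project_id: str) -> List[str]:
    """Ask Cloud Storage which of the assumed permissions the caller lacks."""
    # Imported lazily so the pure helper above works without the library.
    from google.cloud import storage

    client = storage.Client(project=project_id)
    bucket = client.bucket(bucket_name)
    # test_iam_permissions returns the subset of permissions the caller holds.
    granted = bucket.test_iam_permissions(REQUIRED_BUCKET_PERMISSIONS)
    return missing_permissions(REQUIRED_BUCKET_PERMISSIONS, granted)
```

Calling check_bucket_permissions with the pipeline root bucket and the new project ID returns an empty list when none of the assumed permissions is missing. BigQuery access would need a separate check, for example by running a small query as the same service account.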

Answer 2

Score: 0

I got this error when using a GCS bucket in a different region than the one my pipeline ran in.
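
A quick way to check for this mismatch is to compare the bucket's location with the location passed to the pipeline. This is a minimal sketch, assuming the google-cloud-storage library is installed and the caller is allowed to read the bucket's metadata. The function names and arguments are hypothetical, and the comparison is a plain case-insensitive string match, so a multi-region bucket (reported as, for example, "US" or "EU") is reported as a mismatch and has to be judged separately.

```python
def locations_match(bucket_location: str, pipeline_location: str) -> bool:
    """Case-insensitive comparison of a bucket location and a pipeline region."""
    return bucket_location.strip().lower() == pipeline_location.strip().lower()


def get_bucket_location(bucket_name: str, project_id: str) -> str:
    """Fetch the bucket's metadata and return its location, e.g. 'EUROPE-WEST1'."""
    # Imported lazily so locations_match can be used without the library.
    from google.cloud import storage

    client = storage.Client(project=project_id)
    # get_bucket performs an API request and loads the bucket's properties.
    return client.get_bucket(bucket_name).location
```

For example, locations_match(get_bucket_location(bucket, project), "europe-west1") should be true when the pipeline root bucket lives in the same single region as the pipeline.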
