Permission denied on resource project None on 1 thread only in tpu_cluster_resolver

Question

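I'm running BERT pretraining code on Cloud TPUs from Compute Engine.

Each time I run it, I get this error on one thread, but training continues normally.

I ran the same code on Google Colab TPUs and it worked fine.

For `tpu_cluster_resolver`, I'm passing the IP address of the TPU instance. I also tried passing the zone and project name, with the same result.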

```
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/distribute/cluster_resolver/tpu_cluster_resolver.py", line 476, in _fetch_cloud_tpu_metadata
    return request.execute()
  File "/usr/local/lib/python3.5/dist-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/googleapiclient/http.py", line 856, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 403 when requesting https://tpu.googleapis.com/v1/projects/None/locations/None/nodes/xxxxxx:8470?alt=json returned "Permission denied on resource project None.". Details: "[{'links': [{'url': 'https://console.developers.google.com/project/None/apiui/credential', 'description': 'Google developer console API key'}], '@type': 'type.googleapis.com/google.rpc.Help'}]">

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/tpu/preempted_hook.py", line 87, in run
    response = self._cluster._fetch_cloud_tpu_metadata()  # pylint: disable=protected-access
  File "/usr/local/lib/python3.5/dist-packages/tensorflow_core/python/distribute/cluster_resolver/tpu_cluster_resolver.py", line 480, in _fetch_cloud_tpu_metadata
    "constructor. Exception: %s" % (self._tpu, e))
ValueError: Could not lookup TPU metadata from name 'b'xxxxxxxx:8470''. Please doublecheck the tpu argument in the TPUClusterResolver constructor. Exception: <HttpError 403 when requesting https://tpu.googleapis.com/v1/projects/None/locations/None/nodes/xxxxxx:8470?alt=json returned "Permission denied on resource project None.". Details: "[{'links': [{'url': 'https://console.developers.google.com/project/None/apiui/credential', 'description': 'Google developer console API key'}], '@type': 'type.googleapis.com/google.rpc.Help'}]">
```



# Answer 1
**Score**: 1

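It's difficult to know without seeing the code.

Judging by the error "Permission denied on resource project None.", I suggest adding the `project` argument to [TPUClusterResolver][1], set to your Google Cloud project name, since it currently appears to be "None".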

[1]: https://www.tensorflow.org/api_docs/python/tf/distribute/cluster_resolver/TPUClusterResolver?version=stable
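
As a rough sketch of that suggestion, the snippet below passes `tpu`, `project` and `zone` explicitly to `tf.distribute.cluster_resolver.TPUClusterResolver`. The helper names (`tpu_node_path`, `resolver_kwargs`, `make_resolver`) and the values `my-project`, `us-central1-b` and `my-tpu` are placeholders made up for illustration, not taken from the question. `tpu_node_path` only mirrors the shape of the URL seen in the traceback to show where `None` ends up; it is not the library's own code.

```python
def tpu_node_path(project, zone, tpu_name):
    """Return a resource path shaped like the one in the traceback's request URL."""
    return "projects/%s/locations/%s/nodes/%s" % (project, zone, tpu_name)


def resolver_kwargs(tpu_name, project, zone):
    """Build TPUClusterResolver keyword arguments, rejecting missing values early."""
    candidates = (("tpu", tpu_name), ("project", project), ("zone", zone))
    missing = [key for key, value in candidates if not value]
    if missing:
        raise ValueError("Missing TPUClusterResolver arguments: %s" % ", ".join(missing))
    return {"tpu": tpu_name, "project": project, "zone": zone}


def make_resolver(tpu_name, project, zone):
    """Create the resolver. TensorFlow is imported lazily so the helpers above run without it."""
    import tensorflow as tf
    kwargs = resolver_kwargs(tpu_name, project, zone)
    return tf.distribute.cluster_resolver.TPUClusterResolver(**kwargs)


# With no project or zone, the path matches the failing request in the traceback.
print(tpu_node_path(None, None, "xxxxxx:8470"))
# With the (hypothetical) values filled in, the lookup targets a concrete project and zone.
print(tpu_node_path("my-project", "us-central1-b", "my-tpu"))
```

On a machine with TensorFlow installed, `make_resolver("my-tpu", "my-project", "us-central1-b")` would then be used in place of a resolver built without these arguments. The traceback also shows the `tpu` value (`xxxxxx:8470`) being used as the node name in the request, so passing the TPU's name rather than its address may be worth trying as well; that is a guess, not something the answer confirms.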




Posted on January 4, 2020, 01:17:34