Why did I encounter an "Error syncing pod" with a Dataflow pipeline?

Question


I'm running into a weird error with my Dataflow pipeline when I try to use a specific library from PyPI.

I need jsonschema in a ParDo, so in my requirements.txt file I added jsonschema==3.2.0.
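
The exact DoFn doesn't matter much; for illustration, it is something along these lines (the schema and names are placeholders, not my real code):

import json

import apache_beam as beam
from jsonschema import ValidationError, validate  # the reason jsonschema must be installed on the workers

# Placeholder schema, only to show where jsonschema is used.
SCHEMA = {"type": "object", "properties": {"id": {"type": "string"}}, "required": ["id"]}

class ValidateJson(beam.DoFn):
    def process(self, element):
        record = json.loads(element)
        try:
            validate(instance=record, schema=SCHEMA)
            yield record
        except ValidationError:
            pass  # drop records that don't match the schema
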
I launch my pipeline with the command line below:

python -m gcs_to_all \
    --runner DataflowRunner \
    --project <my-project-id> \
    --region europe-west1 \
    --temp_location gs://<my-bucket-name>/temp/ \
    --input_topic "projects/<my-project-id>/topics/<my-topic>" \
    --network=<my-network> \
    --subnetwork=<my-subnet> \
    --requirements_file=requirements.txt \
    --experiments=allow_non_updatable_job \
    --streaming  

In the terminal, everything looks fine:

INFO:root:2020-01-03T09:18:35.569Z: JOB_MESSAGE_BASIC: Worker configuration: n1-standard-4 in europe-west1-b.
INFO:root:2020-01-03T09:18:35.806Z: JOB_MESSAGE_WARNING: The network default doesn't have rules that open TCP ports 12345-12346 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: Firewall rules associated with your network don't open TCP ports 12345-12346 for Dataflow instances. If a firewall rule opens connection in these ports, ensure target tags aren't specified, or that the rule includes the tag 'dataflow'.
INFO:root:2020-01-03T09:18:48.549Z: JOB_MESSAGE_DETAILED: Workers have started successfully.

There's no error in the Logs tab on the Dataflow page, but Stackdriver shows:

message: "Error syncing pod 6515c378c6bed37a2c0eec1fcfea300c ("<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"), skipping: [failed to "StartContainer" for "sdk0" with CrashLoopBackOff: "Back-off 10s restarting failed container=sdk0 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk1" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk1 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk2" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk2 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk3" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk3 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""

I also found this error (at info level):

Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
Looking in links: /var/opt/google/staged
  Installing build dependencies: started
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
Looking in links: /var/opt/google/staged
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
  Installing build dependencies: finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python3 /usr/local/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-mdurhav9/overlay --no-warn-script-location --no-binary :none: --only-binary :none: --no-index --find-links /var/opt/google/staged -- 'setuptools>=40.6.0' wheel
       cwd: None
  Complete output (5 lines):
  Looking in links: /var/opt/google/staged
  Collecting setuptools>=40.6.0
  Collecting wheel
    ERROR: Could not find a version that satisfies the requirement wheel (from versions: none)
  ERROR: No matching distribution found for wheel

But I don't know why it fails to resolve this dependency...

Do you have any idea how I can debug this, or why I encounter this error?

Thanks!

Answer 1

Score: 8


When Dataflow workers start, they execute several steps (the corresponding pipeline options are sketched after this list):

  1. Install packages from requirements.txt
  2. Install packages specified as extra_packages
  3. Install the workflow tarball and execute the actions provided in setup.py.
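
For reference, those three steps correspond to pipeline options. A sketch of how they can be combined on the command line (the extra package tarball and setup.py paths here are just examples, not from your job):

python -m gcs_to_all \
    ... your other options ... \
    --requirements_file=requirements.txt \
    --extra_package=./dist/my_helper_lib-0.1.0.tar.gz \
    --setup_file=./setup.py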

An "Error syncing pod" with a CrashLoopBackOff message can be related to a dependency conflict. You need to verify that there are no conflicts between the libraries and versions used for the job. Please refer to the documentation on staging the required dependencies of a pipeline.
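
One way to debug this is to mimic the worker locally. If I'm not mistaken, Beam stages your requirements as source packages, and the worker then installs them offline with --no-index --find-links /var/opt/google/staged (you can see exactly that in your pip log). So in a clean virtualenv you can try something along these lines (directory name is arbitrary):

pip download -r requirements.txt --dest ./staged --no-binary :all:
pip install --no-index --find-links ./staged -r requirements.txt

If the second command fails with the same "No matching distribution found for wheel", you have reproduced the problem outside Dataflow: as your log shows, pip's isolated build of a source package also runs with --no-index, so build dependencies such as setuptools and wheel have to be present in the staged directory as well.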

Also, take a look at the preinstalled dependencies and this StackOverflow thread.

What you can try is changing the version of jsonschema and running the job again. If that doesn't help, please provide your requirements.txt file.
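
For example, with an older pin in requirements.txt (the exact version is only illustrative):

# Try an older release to see whether the failure is specific to 3.2.0
jsonschema==2.6.0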

I hope this helps.
