Why did I encounter an "Error syncing pod" with a Dataflow pipeline?

Question


I'm running into a weird error with my Dataflow pipeline when I try to use a specific library from PyPI.

I need jsonschema in a ParDo, so in my requirements.txt file I added jsonschema==3.2.0.
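
The exact DoFn doesn't matter much; for illustration, it is something along these lines (the schema and names are placeholders, not my real code):

import json

import apache_beam as beam
from jsonschema import ValidationError, validate  # the reason jsonschema must be installed on the workers

# Placeholder schema, only to show where jsonschema is used.
SCHEMA = {"type": "object", "properties": {"id": {"type": "string"}}, "required": ["id"]}

class ValidateJson(beam.DoFn):
    def process(self, element):
        record = json.loads(element)
        try:
            validate(instance=record, schema=SCHEMA)
            yield record
        except ValidationError:
            pass  # drop records that don't match the schema
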
I launch my pipeline with the command line below:

python -m gcs_to_all \
    --runner DataflowRunner \
    --project <my-project-id> \
    --region europe-west1 \
    --temp_location gs://<my-bucket-name>/temp/ \
    --input_topic "projects/<my-project-id>/topics/<my-topic>" \
    --network=<my-network> \
    --subnetwork=<my-subnet> \
    --requirements_file=requirements.txt \
    --experiments=allow_non_updatable_job \
    --streaming  

In the terminal, everything looks fine:

INFO:root:2020-01-03T09:18:35.569Z: JOB_MESSAGE_BASIC: Worker configuration: n1-standard-4 in europe-west1-b.
INFO:root:2020-01-03T09:18:35.806Z: JOB_MESSAGE_WARNING: The network default doesn't have rules that open TCP ports 12345-12346 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: Firewall rules associated with your network don't open TCP ports 12345-12346 for Dataflow instances. If a firewall rule opens connection in these ports, ensure target tags aren't specified, or that the rule includes the tag 'dataflow'.
INFO:root:2020-01-03T09:18:48.549Z: JOB_MESSAGE_DETAILED: Workers have started successfully.

There's no error in the Logs tab on the Dataflow page, but Stackdriver shows:

message: "Error syncing pod 6515c378c6bed37a2c0eec1fcfea300c ("<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"), skipping: [failed to "StartContainer" for "sdk0" with CrashLoopBackOff: "Back-off 10s restarting failed container=sdk0 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk1" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk1 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk2" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk2 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk3" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk3 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""

I also found this error (at info level):

Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
Looking in links: /var/opt/google/staged
  Installing build dependencies: started
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
Looking in links: /var/opt/google/staged
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
  Installing build dependencies: started
  Installing build dependencies: finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python3 /usr/local/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-mdurhav9/overlay --no-warn-script-location --no-binary :none: --only-binary :none: --no-index --find-links /var/opt/google/staged -- 'setuptools>=40.6.0' wheel
       cwd: None
  Complete output (5 lines):
  Looking in links: /var/opt/google/staged
  Collecting setuptools>=40.6.0
  Collecting wheel
    ERROR: Could not find a version that satisfies the requirement wheel (from versions: none)
  ERROR: No matching distribution found for wheel

But I don't know why it fails to resolve this dependency...

Do you have any idea how I can debug this, or why I encounter this error?

Thanks!

Answer 1

Score: 8


When Dataflow workers start, they execute several steps (the corresponding pipeline options are sketched after this list):

  1. Install packages from requirements.txt
  2. Install packages specified as extra_packages
  3. Install the workflow tarball and execute the actions provided in setup.py.
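
For reference, those three steps correspond to pipeline options. A sketch of how they can be combined on the command line (the extra package tarball and setup.py paths here are just examples, not from your job):

python -m gcs_to_all \
    ... your other options ... \
    --requirements_file=requirements.txt \
    --extra_package=./dist/my_helper_lib-0.1.0.tar.gz \
    --setup_file=./setup.py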

An "Error syncing pod" with a CrashLoopBackOff message can be related to a dependency conflict. You need to verify that there are no conflicts between the libraries and versions used for the job. Please refer to the documentation on staging the required dependencies of a pipeline.
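
One way to debug this is to mimic the worker locally. If I'm not mistaken, Beam stages your requirements as source packages, and the worker then installs them offline with --no-index --find-links /var/opt/google/staged (you can see exactly that in your pip log). So in a clean virtualenv you can try something along these lines (directory name is arbitrary):

pip download -r requirements.txt --dest ./staged --no-binary :all:
pip install --no-index --find-links ./staged -r requirements.txt

If the second command fails with the same "No matching distribution found for wheel", you have reproduced the problem outside Dataflow: as your log shows, pip's isolated build of a source package also runs with --no-index, so build dependencies such as setuptools and wheel have to be present in the staged directory as well.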

Also, take a look at the preinstalled dependencies and this StackOverflow thread.

What you can try is changing the version of jsonschema and running the job again. If that doesn't help, please provide your requirements.txt file.
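
For example, with an older pin in requirements.txt (the exact version is only illustrative):

# Try an older release to see whether the failure is specific to 3.2.0
jsonschema==2.6.0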

I hope this helps.
