Why did I encounter an "Error syncing pod" with Dataflow pipeline?
Question
I'm running into a weird error with my Dataflow pipeline when I want to use a specific library from PyPI.
I need jsonschema in a ParDo, so in my requirements.txt file I added jsonschema==3.2.0.
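The DoFn itself is nothing special; it's roughly like the minimal sketch below (the schema and element shape here are just placeholders, not my real code):

import apache_beam as beam

class ValidateJson(beam.DoFn):
    # Placeholder schema; the real one doesn't matter for this error.
    SCHEMA = {"type": "object", "required": ["id"]}

    def process(self, element):
        # jsonschema is imported at run time on the worker, which is why
        # it has to be installable from the staged requirements.
        import jsonschema
        jsonschema.validate(element, self.SCHEMA)
        yield element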
I launch my pipeline with the command line below:
python -m gcs_to_all \
--runner DataflowRunner \
--project <my-project-id> \
--region europe-west1 \
--temp_location gs://<my-bucket-name>/temp/ \
--input_topic "projects/<my-project-id>/topics/<my-topic>" \
--network=<my-network> \
--subnetwork=<my-subnet> \
--requirements_file=requirements.txt \
--experiments=allow_non_updatable_job \
--streaming
In the terminal, everything seems fine:
INFO:root:2020-01-03T09:18:35.569Z: JOB_MESSAGE_BASIC: Worker configuration: n1-standard-4 in europe-west1-b.
INFO:root:2020-01-03T09:18:35.806Z: JOB_MESSAGE_WARNING: The network default doesn't have rules that open TCP ports 12345-12346 for internal connection with other VMs. Only rules with a target tag 'dataflow' or empty target tags set apply. If you don't specify such a rule, any pipeline with more than one worker that shuffles data will hang. Causes: Firewall rules associated with your network don't open TCP ports 12345-12346 for Dataflow instances. If a firewall rule opens connection in these ports, ensure target tags aren't specified, or that the rule includes the tag 'dataflow'.
INFO:root:2020-01-03T09:18:48.549Z: JOB_MESSAGE_DETAILED: Workers have started successfully.
There's no error in the log tab on the Dataflow web page, but in Stackdriver:
message: "Error syncing pod 6515c378c6bed37a2c0eec1fcfea300c ("<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)"), skipping: [failed to "StartContainer" for "sdk0" with CrashLoopBackOff: "Back-off 10s restarting failed container=sdk0 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk1" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk1 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk2" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk2 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
message: ", failed to "StartContainer" for "sdk3" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=sdk3 pod=<dataflow-id>--01030117-c9pc-harness-5lkv_default(6515c378c6bed37a2c0eec1fcfea300c)""
I also found this error (in info mode):
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
Installing build dependencies: started
Looking in links: /var/opt/google/staged
Installing build dependencies: started
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
Installing build dependencies: started
Looking in links: /var/opt/google/staged
Collecting jsonschema (from -r /var/opt/google/staged/requirements.txt (line 1))
Installing build dependencies: started
Installing build dependencies: finished with status 'error'
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python3 /usr/local/lib/python3.7/site-packages/pip install --ignore-installed --no-user --prefix /tmp/pip-build-env-mdurhav9/overlay --no-warn-script-location --no-binary :none: --only-binary :none: --no-index --find-links /var/opt/google/staged -- 'setuptools>=40.6.0' wheel
cwd: None
Complete output (5 lines):
Looking in links: /var/opt/google/staged
Collecting setuptools>=40.6.0
Collecting wheel
ERROR: Could not find a version that satisfies the requirement wheel (from versions: none)
ERROR: No matching distribution found for wheel
But I don't know why it can't get this dependency...
Do you have any idea how I can debug this, or why I'm encountering this error?
Thanks!
Answer 1
Score: 8
When Dataflow workers start, they execute several steps (the sketch below shows how each one is wired up):
- Install packages from requirements.txt
- Install packages specified as extra_packages
- Install the workflow tarball and execute actions provided in setup.py
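For reference, a minimal sketch of how those three inputs are typically passed to a pipeline (the option names are standard Beam SetupOptions; the file paths are placeholders):

from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

options = PipelineOptions(streaming=True)
setup_options = options.view_as(SetupOptions)
setup_options.requirements_file = "requirements.txt"         # step 1
setup_options.extra_packages = ["./dist/my_lib-0.1.tar.gz"]  # step 2 (placeholder tarball)
setup_options.setup_file = "./setup.py"                      # step 3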
The Error syncing pod and CrashLoopBackOff messages can be related to a dependency conflict. You need to verify that there are no conflicts among the libraries and versions used for the job. Please refer to the documentation on staging the pipeline's required dependencies.
Also, take a look at the preinstalled dependencies and this StackOverflow thread.
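To narrow it down, you can try replaying what the worker does locally: Beam stages the requirements as source-only distributions, and the worker then installs them offline from the staged directory (that is what the --no-index --find-links flags in your pip log show). A rough local reproduction, assuming that staging behavior:

# Stage source distributions the way Beam's stager does
pip download -r requirements.txt --dest /tmp/staged --no-binary :all:
# Replay the worker's offline install from the staged directory
pip install --no-index --find-links /tmp/staged -r requirements.txt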
What you can try is changing the version of jsonschema and running it again. If that doesn't help, please provide your requirements.txt file.
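Since the worker's pip log shows it failing to find setuptools and wheel among the staged packages, another thing worth trying (an assumption on my side, not a confirmed fix) is listing those build-time dependencies in requirements.txt so they get staged too:

jsonschema==3.2.0
setuptools>=40.6.0
wheel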
I hope this helps.