运行 Apache Beam Go SDK 示例时,发现 ID 为 xxxxxx 的容器未在运行。

huangapple go评论95阅读模式
英文:

No container running for id xxxxxx when running apache beam go sdk examples

问题

我想在一个具有一个主节点和两个从节点(spark2.4.5版本)的Spark集群上,使用Apache Beam Go SDK提供的成绩示例,并使用Spark Runner运行。然而,我遇到了以下错误。我无法确定主要问题,因为SSH和Docker已安装并运行。

无法检索分段文件:在3次尝试中无法检索/tmp/staged:无法检索/tmp/staged/worker的块
	原因是:
	rpc错误:代码=未知描述=;无法检索/tmp/staged/worker的块
	原因是:
	rpc错误:代码=未知描述=;无法检索/tmp/staged/worker的块
	原因是:
	rpc错误:代码=未知描述= 
21/09/19 11:01:47 WARN BlockManager:由于异常org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException,放置块rdd_2_1失败:java.lang.IllegalStateException:没有运行的容器
驱动程序命令了一个关闭
apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException:java.lang.IllegalStateException:没有运行的容器的ID
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2050)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)

我使用以下命令运行作业服务端点:

docker run --net=host apache/beam_spark_job_server:latest --spark-master-url=spark://master:7077  --artifacts-dir /tmp/beam-artifact-staging

我使用以下命令运行它:

grades -runner=spark -endpoint=localhost:8099 -job_name=gradetest

对于stderr的Spark日志,我得到了以下内容:

INFO Utils: 将/tmp/spark-2cfdd145-1f88-4a9d-a878-aab4d9f3d880/executor-a62a671a-fb65-47fe-8f42-8c0da3fd31d6/spark-03eaf345-38eb-4f4f-80be-5833f222f20e/11971419701632128778344_cache复制到/opt/spark/work/app-20210920100618-0017/1/./beam-runners-spark-job-server.jar
23/09/20 10:06:22 INFO Executor: 将file:/opt/spark/work/app-20210920100619-0017/1/./beam-runners-spark-job-server.jar添加到类加载器
23/09/20 10:06:22 INFO TorrentBroadcast: 开始读取广播变量0
23/09/20 10:06:22 INFO TransportClientFactory: 成功创建到master/192.168.1.*:44365的连接(在引导中花费了1毫秒)
23/09/20 10:06:22 INFO MemoryStore: 将广播_0_piece0块存储为内存中的字节(估计大小为10.9 KB,空闲366.3 MB)
23/09/20 10:06:22 INFO TorrentBroadcast: 读取广播变量0花费了97毫秒
23/09/20 10:06:22 INFO MemoryStore: 将广播_0块存储为内存中的值(估计大小为24.9 KB,空闲366.3 MB)
java.io.FileNotFoundException: /tmp/beam-artifact-staging/e40099113cf8136935edc839aa85487c0532034c0a63f8cbadd7fccac0f98ed0/1-go-worker(没有该文件或目录)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.<init>(FileInputStream.java:138)
	at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:127)
	at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:83)
	at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:257)
	at org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService.getArtifact(ArtifactRetrievalService.java:124)

以上是翻译好的内容,请确认是否正确。

英文:

I want to run grades example proposed by apache beam go sdk using spark runner on a spark cluster with one master and two slaves(spark2.4.5 version ). However I get the following error. I do not figure the main problem because ssh and docker are installed and running.

```Failed to retrieve staged files: failed to retrieve /tmp/staged in 3 attempts: failed to retrieve chunk for /tmp/staged/worker
	caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for /tmp/staged/worker
	caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for /tmp/staged/worker
	caused by:
rpc error: code = Unknown desc = ; failed to retrieve chunk for /tmp/staged/worker
	caused by:
rpc error: code = Unknown desc = 
21/09/19 11:01:47 WARN BlockManager: Putting block rdd_2_1 failed due to exception org.apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: No container running for id 
Driver commanded a shutdown
apache.beam.vendor.guava.v26_0_jre.com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalStateException: No container running for id xxxxxxxxx
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2050)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.get(LocalCache.java:3952)
	at org.apache.beam.vendor.guava.v26_0_jre.com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
	at  ```

I run the job service endpoint using the following command :

docker run --net=host apache/beam_spark_job_server:latest --spark-master-url=spark://master:7077 --artifacts-dir /tmp/beam-artifact-staging

And I run it using the following command :
grades -runner=spark -endpoint=localhost:8099 -job_name=gradetest
for the stderr spark log, I got the following :

23/09/20 10:06:22 INFO Executor: Adding file:/opt/spark/work/app-20210920100619-0017/1/./beam-runners-spark-job-server.jar to class loader
23/09/20 10:06:22 INFO TorrentBroadcast: Started reading broadcast variable 0
23/09/20 10:06:22 INFO TransportClientFactory: Successfully created connection to master/192.168.1.*:44365 after 1 ms (0 ms spent in bootstraps)
23/09/20 10:06:22 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 10.9 KB, free 366.3 MB)
23/09/20 10:06:22 INFO TorrentBroadcast: Reading broadcast variable 0 took 97 ms
23/09/20 10:06:22 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 24.9 KB, free 366.3 MB)
java.io.FileNotFoundException: /tmp/beam-artifact-staging/e40099113cf8136935edc839aa85487c0532034c0a63f8cbadd7fccac0f98ed0/1-go-worker (No such file or directory)
	at java.io.FileInputStream.open0(Native Method)
	at java.io.FileInputStream.open(FileInputStream.java:195)
	at java.io.FileInputStream.&lt;init&gt;(FileInputStream.java:138)
	at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:127)
	at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:83)
	at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:257)
	at org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService.getArtifact(ArtifactRetrievalService.java:124)
	at ```



</details>


# 答案1
**得分**: 1

这个错误是因为Spark Job Server在一个与你的流水线构建代码分离的docker容器中运行。问题在于你的流水线构建代码(即"grades"二进制文件)将Job Server的构件文件放在了本地机器上的`/tmp/beam-artifact-staging`目录中。然而,Job Server在docker容器内搜索该目录时找不到这些文件。

不幸的是,Go SDK目前不支持运行docker化的Spark Job Server。在其他SDK中,docker化的Spark Job Server用于以uber jar的形式运行流水线(将用户流水线和所有构件打包到一个jar文件中,并将所有内容发送到带有Job Server的docker容器中)。目前,Go SDK不支持这个功能。

对你来说最简单的替代方案是直接在本地机器上作为jar文件运行Job Server。我相信它作为Maven构件被索引为[`org.apache.beam:beam-runners-spark-job-server`][1]。这样它应该可以访问构件目录。

  [1]: https://mvnrepository.com/artifact/org.apache.beam/beam-runners-spark-job-server/2.32.0

<details>
<summary>英文:</summary>

This error is happening because the Spark Job Server is running in a docker container separate from your pipeline construction code. What happens is that your pipeline construction code (i.e. the &quot;grades&quot; binary file) stages artifacts for the Job Server in `/tmp/beam-artifact-staging`, a directory on your local machine. The Job Server however is searching in that directory within the docker container, and doesn&#39;t find the files.

Unfortunately the Go SDK currently does not support running a dockerized Spark Job Server. In other SDKs the dockerized Spark Job Server is intended to be used when running a pipeline as an uber jar (basically packaging the user pipeline and all artifacts into one jar and sending everything to the docker container with the Job Server). This functionality is currently unavailable in the Go SDK.

The simplest alternative for you is to run the Job Server directly as a jar on your local machine. I believe it is indexed as a Maven artifact as [`org.apache.beam:beam-runners-spark-job-server`][1]. This way it should have access to the artifacts directory.


  [1]: https://mvnrepository.com/artifact/org.apache.beam/beam-runners-spark-job-server/2.32.0

</details>



huangapple
  • 本文由 发表于 2021年9月19日 01:52:08
  • 转载请务必保留本文链接:https://go.coder-hub.com/69237129.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定