The executor pod in Kubernetes keeps being created then removed when submitting a Spark job to K8s
Question
I submitted a Spark job through Airflow using KubernetesPodOperator, as shown in the code below. The driver pod is created, but the executor pod keeps being created and deleted over and over.
spark_submit = KubernetesPodOperator(
    task_id='test_spark_k8s_submit',
    name='test_spark_k8s_submit',
    namespace='dev-spark',
    image='docker.io/vinhlq9/bitnami-spark-3.3',
    cmds=['/opt/spark/bin/spark-submit'],
    arguments=[
        '--master', k8s_url,
        '--deploy-mode', 'cluster',
        '--name', 'spark-job',
        '--conf', 'spark.kubernetes.namespace=dev-spark',
        '--conf', 'spark.kubernetes.container.image=docker.io/vinhlq9/bitnami-spark-3.3',
        '--conf', 'spark.kubernetes.authenticate.driver.serviceAccountName=spark-user',
        '--conf', 'spark.kubernetes.authenticate.executor.serviceAccountName=spark-user',
        '--conf', 'spark.kubernetes.driverEnv.SPARK_CONF_DIR=/opt/bitnami/spark/conf',
        '--conf', 'spark.kubernetes.driverEnv.SPARK_CONFIG_MAP=spark-config',
        '--conf', 'spark.kubernetes.file.upload.path=/opt/spark',
        '--conf', 'spark.kubernetes.driver.annotation.sidecar.istio.io/inject=false',
        '--conf', 'spark.kubernetes.executor.annotation.sidecar.istio.io/inject=false',
        '--conf', 'spark.eventLog.enabled=true',  # note: original had a trailing space in the value
        '--conf', 'spark.eventLog.dir=oss://spark/spark-log/',
        '--conf', 'spark.hadoop.fs.oss.accessKeyId=' + spark_user_access_key,
        '--conf', 'spark.hadoop.fs.oss.accessKeySecret=' + spark_user_secret_key,
        '--conf', 'spark.hadoop.fs.oss.endpoint=' + spark_user_endpoint,
        '--conf', 'spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem',
        '--conf', 'spark.executor.instances=1',
        '--conf', 'spark.executor.memory=4g',
        '--conf', 'spark.executor.cores=2',
        '--conf', 'spark.driver.memory=2g',
        'oss://spark/job/test_spark_k8s_job_simple.py'
    ],
    is_delete_operator_pod=True,
    config_file='/opt/airflow/plugins/k8sconfig-spark-user.json',
    get_logs=True,
    dag=dag
)
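As an aside, a long `--conf` list like this is easy to keep readable by generating it from a dict; the helper below is a hypothetical sketch (not part of the original DAG), and it also strips stray whitespace from values, which guards against bugs like the trailing space in `spark.eventLog.enabled=true `:

```python
def build_spark_submit_args(master, app_file, confs,
                            deploy_mode="cluster", name="spark-job"):
    """Flatten a dict of Spark confs into the ['--conf', 'k=v', ...] list
    expected by spark-submit. Values are stripped so accidental trailing
    whitespace never reaches Spark."""
    args = ["--master", master, "--deploy-mode", deploy_mode, "--name", name]
    for key, value in confs.items():
        args += ["--conf", f"{key}={str(value).strip()}"]
    args.append(app_file)
    return args

# Example with a subset of the confs used above ("k8s://..." URL is hypothetical)
args = build_spark_submit_args(
    "k8s://https://example:6443",
    "oss://spark/job/test_spark_k8s_job_simple.py",
    {
        "spark.kubernetes.namespace": "dev-spark",
        "spark.eventLog.enabled": "true ",  # trailing space is stripped
        "spark.executor.instances": 1,
    },
)
```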
And the logs in the driver pod:
spark 08:40:12.26
spark 08:40:12.26 Welcome to the Bitnami spark container
spark 08:40:12.27 Subscribe to project updates by watching https://github.com/bitnami/containers
spark 08:40:12.27 Submit issues and feature requests at https://github.com/bitnami/containers/issues
spark 08:40:12.27
23/05/16 08:40:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/05/16 08:40:16 INFO SparkContext: Running Spark version 3.3.2
23/05/16 08:40:16 INFO ResourceUtils: ==============================================================
23/05/16 08:40:16 INFO ResourceUtils: No custom resources configured for spark.driver.
23/05/16 08:40:16 INFO ResourceUtils: ==============================================================
23/05/16 08:40:16 INFO SparkContext: Submitted application: spark-read-csv
23/05/16 08:40:16 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 2, script: , vendor: , memory -> name: memory, amount: 4096, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/05/16 08:40:16 INFO ResourceProfile: Limiting resource is cpus at 2 tasks per executor
23/05/16 08:40:16 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/05/16 08:40:16 INFO SecurityManager: Changing view acls to: spark,root
23/05/16 08:40:16 INFO SecurityManager: Changing modify acls to: spark,root
23/05/16 08:40:16 INFO SecurityManager: Changing view acls groups to:
23/05/16 08:40:16 INFO SecurityManager: Changing modify acls groups to:
23/05/16 08:40:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark, root); groups with view permissions: Set(); users with modify permissions: Set(spark, root); groups with modify permissions: Set()
23/05/16 08:40:16 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
23/05/16 08:40:16 INFO SparkEnv: Registering MapOutputTracker
23/05/16 08:40:16 INFO SparkEnv: Registering BlockManagerMaster
23/05/16 08:40:16 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/05/16 08:40:16 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/05/16 08:40:16 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/05/16 08:40:16 INFO DiskBlockManager: Created local directory at /var/data/spark-77a2ee41-2c8e-45c6-9df6-bb1f549d4566/blockmgr-5350fab4-8dd7-432e-80b3-fbc1924f0dea
23/05/16 08:40:16 INFO MemoryStore: MemoryStore started with capacity 912.3 MiB
23/05/16 08:40:16 INFO SparkEnv: Registering OutputCommitCoordinator
23/05/16 08:40:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/05/16 08:40:16 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
23/05/16 08:40:18 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:18 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:18 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:18 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
23/05/16 08:40:18 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
23/05/16 08:40:18 INFO NettyBlockTransferService: Server created on spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc:7079
23/05/16 08:40:18 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/05/16 08:40:18 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManagerMasterEndpoint: Registering block manager spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc:7079 with 912.3 MiB RAM, BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.baseline-dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO SingleEventLogFileWriter: Logging events to oss://spark/spark-log/spark-f6f3a41be773442dbc9a30781dffbc11.inprogress
23/05/16 08:40:21 INFO BlockManagerMaster: Removal of executor 1 requested
23/05/16 08:40:21 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove non-existent executor 1
23/05/16 08:40:21 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
23/05/16 08:40:21 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:21 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:21 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
23/05/16 08:40:24 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
And the loop that keeps repeating as the executor pod is created and removed:
23/05/16 08:40:25 INFO BlockManagerMaster: Removal of executor 2 requested
23/05/16 08:40:25 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
23/05/16 08:40:25 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove non-existent executor 2
23/05/16 08:40:27 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:27 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
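This pattern (executors repeatedly requested while `known: 0` and "Asked to remove non-existent executor" messages appear) suggests the executor pods die before ever registering with the driver. A quick way to quantify the churn from a saved driver log is a small helper like this (hypothetical, not part of Spark):

```python
def executor_churn(log_lines):
    """Count executor-request vs. executor-removal events in driver log lines.
    Many removals combined with 'known: 0' throughout means executors are
    exiting before they register with the driver."""
    requested = sum("Going to request" in line for line in log_lines)
    removed = sum("Removal of executor" in line for line in log_lines)
    return requested, removed

# Example with lines resembling the driver log above
log = [
    "08:40:18 INFO ExecutorPodsAllocator: Going to request 1 executors ... known: 0",
    "08:40:21 INFO BlockManagerMaster: Removal of executor 1 requested",
    "08:40:21 INFO ExecutorPodsAllocator: Going to request 1 executors ... known: 0",
]
requested, removed = executor_churn(log)
```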
Has anyone encountered this before? Any insight would be appreciated.
Answer 1
Score: 1
I already fixed this issue; it was caused by the Java version in the Spark image.
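One way to confirm a Java mismatch is to run `java -version` inside the driver/executor image (e.g. `docker run --rm docker.io/vinhlq9/bitnami-spark-3.3 java -version`) and compare it with what your Spark build expects. Parsing the version banner is fiddly because pre-Java-9 releases report as `1.x`; a small parser sketch, assuming the usual OpenJDK banner format:

```python
import re

def java_major_version(banner: str) -> int:
    """Parse the major version out of a `java -version` banner line,
    e.g. 'openjdk version "11.0.19"' -> 11, 'java version "1.8.0_362"' -> 8."""
    m = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if not m:
        raise ValueError(f"unrecognized banner: {banner!r}")
    major = int(m.group(1))
    # Pre-Java-9 releases report as 1.x, where x is the real major version
    if major == 1 and m.group(2):
        major = int(m.group(2))
    return major
```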