The executor pod in Kubernetes keeps being created then removed when submitting a Spark job to K8s

Question


I submitted a Spark job through Airflow with KubernetesPodOperator, as in the code below; the driver pod is created, but the executor pod keeps being created and deleted over and over.

spark_submit = KubernetesPodOperator(
    task_id='test_spark_k8s_submit',
    name='test_spark_k8s_submit',
    namespace='dev-spark',
    image='docker.io/vinhlq9/bitnami-spark-3.3',
    cmds=['/opt/spark/bin/spark-submit'],
    arguments=[
        '--master', k8s_url,
        '--deploy-mode', 'cluster',
        '--name', 'spark-job',
        '--conf', 'spark.kubernetes.namespace=dev-spark',
        '--conf', 'spark.kubernetes.container.image=docker.io/vinhlq9/bitnami-spark-3.3',
        '--conf', 'spark.kubernetes.authenticate.driver.serviceAccountName=spark-user',
        '--conf', 'spark.kubernetes.authenticate.executor.serviceAccountName=spark-user',
        '--conf', 'spark.kubernetes.driverEnv.SPARK_CONF_DIR=/opt/bitnami/spark/conf',
        '--conf', 'spark.kubernetes.driverEnv.SPARK_CONFIG_MAP=spark-config',
        '--conf', 'spark.kubernetes.file.upload.path=/opt/spark',
        '--conf', 'spark.kubernetes.driver.annotation.sidecar.istio.io/inject=false',
        '--conf', 'spark.kubernetes.executor.annotation.sidecar.istio.io/inject=false',
        '--conf', 'spark.eventLog.enabled=true',
        '--conf', 'spark.eventLog.dir=oss://spark/spark-log/',
        '--conf', 'spark.hadoop.fs.oss.accessKeyId=' + spark_user_access_key,
        '--conf', 'spark.hadoop.fs.oss.accessKeySecret=' + spark_user_secret_key,
        '--conf', 'spark.hadoop.fs.oss.endpoint=' + spark_user_endpoint,
        '--conf', 'spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem',
        '--conf', 'spark.executor.instances=1',
        '--conf', 'spark.executor.memory=4g',
        '--conf', 'spark.executor.cores=2',
        '--conf', 'spark.driver.memory=2g',
        'oss://spark/job/test_spark_k8s_job_simple.py'
    ],
    is_delete_operator_pod=True,
    config_file='/opt/airflow/plugins/k8sconfig-spark-user.json',
    get_logs=True,
    dag=dag
)

And the logs in the driver pod:

spark 08:40:12.26
spark 08:40:12.26 Welcome to the Bitnami spark container
spark 08:40:12.27 Subscribe to project updates by watching https://github.com/bitnami/containers
spark 08:40:12.27 Submit issues and feature requests at https://github.com/bitnami/containers/issues
spark 08:40:12.27
23/05/16 08:40:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/05/16 08:40:16 INFO SparkContext: Running Spark version 3.3.2
23/05/16 08:40:16 INFO ResourceUtils: ==============================================================
23/05/16 08:40:16 INFO ResourceUtils: No custom resources configured for spark.driver.
23/05/16 08:40:16 INFO ResourceUtils: ==============================================================
23/05/16 08:40:16 INFO SparkContext: Submitted application: spark-read-csv
23/05/16 08:40:16 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 2, script: , vendor: , memory -> name: memory, amount: 4096, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/05/16 08:40:16 INFO ResourceProfile: Limiting resource is cpus at 2 tasks per executor
23/05/16 08:40:16 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/05/16 08:40:16 INFO SecurityManager: Changing view acls to: spark,root
23/05/16 08:40:16 INFO SecurityManager: Changing modify acls to: spark,root
23/05/16 08:40:16 INFO SecurityManager: Changing view acls groups to:
23/05/16 08:40:16 INFO SecurityManager: Changing modify acls groups to:
23/05/16 08:40:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark, root); groups with view permissions: Set(); users with modify permissions: Set(spark, root); groups with modify permissions: Set()
23/05/16 08:40:16 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
23/05/16 08:40:16 INFO SparkEnv: Registering MapOutputTracker
23/05/16 08:40:16 INFO SparkEnv: Registering BlockManagerMaster
23/05/16 08:40:16 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/05/16 08:40:16 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/05/16 08:40:16 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/05/16 08:40:16 INFO DiskBlockManager: Created local directory at /var/data/spark-77a2ee41-2c8e-45c6-9df6-bb1f549d4566/blockmgr-5350fab4-8dd7-432e-80b3-fbc1924f0dea
23/05/16 08:40:16 INFO MemoryStore: MemoryStore started with capacity 912.3 MiB
23/05/16 08:40:16 INFO SparkEnv: Registering OutputCommitCoordinator
23/05/16 08:40:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/05/16 08:40:16 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
23/05/16 08:40:18 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:18 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:18 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:18 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
23/05/16 08:40:18 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
23/05/16 08:40:18 INFO NettyBlockTransferService: Server created on spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc:7079
23/05/16 08:40:18 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/05/16 08:40:18 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManagerMasterEndpoint: Registering block manager spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc:7079 with 912.3 MiB RAM, BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.baseline-dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO SingleEventLogFileWriter: Logging events to oss://spark/spark-log/spark-f6f3a41be773442dbc9a30781dffbc11.inprogress
23/05/16 08:40:21 INFO BlockManagerMaster: Removal of executor 1 requested
23/05/16 08:40:21 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove non-existent executor 1
23/05/16 08:40:21 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
23/05/16 08:40:21 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:21 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:21 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
23/05/16 08:40:24 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.

The loop in the executor pod:

23/05/16 08:40:25 INFO BlockManagerMaster: Removal of executor 2 requested
23/05/16 08:40:25 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
23/05/16 08:40:25 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove non-existent executor 2
23/05/16 08:40:27 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:27 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh

Has anyone encountered this before? It would be great to get some ideas about it.

Answer 1

Score: 1

I already fixed this issue; it was caused by the Java version in the Spark image.
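The answer doesn't say which Java version the image shipped, so as a hedged sketch: Spark 3.3.x supports Java 8, 11, and 17, and one way to check an image is to run `java -version` inside it and parse the result. The `check_image_java` helper below assumes a local Docker daemon and is purely illustrative.

```python
import re
import subprocess

SUPPORTED_MAJORS = {8, 11, 17}  # Java versions supported by Spark 3.3.x

def java_major_version(version_line):
    """Extract the major Java version from the first line of
    `java -version` output, e.g.
    'openjdk version "11.0.19"' -> 11, 'java version "1.8.0_372"' -> 8."""
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_line)
    if not m:
        raise ValueError(f"unrecognized version line: {version_line!r}")
    major = int(m.group(1))
    if major == 1 and m.group(2):  # legacy "1.8.x" scheme means Java 8
        major = int(m.group(2))
    return major

def check_image_java(image):
    """Run `java -version` inside the image (requires Docker) and report
    whether the JVM is one Spark 3.3 supports."""
    out = subprocess.run(
        ["docker", "run", "--rm", "--entrypoint", "java", image, "-version"],
        capture_output=True, text=True, check=True,
    ).stderr  # `java -version` prints to stderr
    major = java_major_version(out.splitlines()[0])
    return major, major in SUPPORTED_MAJORS
```

Usage would be something like `check_image_java('docker.io/vinhlq9/bitnami-spark-3.3')`; if the driver and executor images (or a custom base image) disagree on JVM versions, that mismatch is a plausible source of the executor churn described above.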

huangapple
  • Published on 2023-05-17 15:55:01
  • Original link (please keep when reposting): https://go.coder-hub.com/76269747.html