The executor pod in Kubernetes keeps being created then removed when submitting a Spark job to K8s
Question
I submitted a Spark job through Airflow using KubernetesPodOperator, as shown in the code below. The driver pod is created, but the executor pod keeps being created and deleted over and over.
spark_submit = KubernetesPodOperator(
    task_id='test_spark_k8s_submit',
    name='test_spark_k8s_submit',
    namespace='dev-spark',
    image='docker.io/vinhlq9/bitnami-spark-3.3',
    cmds=['/opt/spark/bin/spark-submit'],
    arguments=[
        '--master', k8s_url,
        '--deploy-mode', 'cluster',
        '--name', 'spark-job',
        '--conf', 'spark.kubernetes.namespace=dev-spark',
        '--conf', 'spark.kubernetes.container.image=docker.io/vinhlq9/bitnami-spark-3.3',
        '--conf', 'spark.kubernetes.authenticate.driver.serviceAccountName=spark-user',
        '--conf', 'spark.kubernetes.authenticate.executor.serviceAccountName=spark-user',
        '--conf', 'spark.kubernetes.driverEnv.SPARK_CONF_DIR=/opt/bitnami/spark/conf',
        '--conf', 'spark.kubernetes.driverEnv.SPARK_CONFIG_MAP=spark-config',
        '--conf', 'spark.kubernetes.file.upload.path=/opt/spark',
        '--conf', 'spark.kubernetes.driver.annotation.sidecar.istio.io/inject=false',
        '--conf', 'spark.kubernetes.executor.annotation.sidecar.istio.io/inject=false',
        '--conf', 'spark.eventLog.enabled=true',  # note: original had a trailing space in the value
        '--conf', 'spark.eventLog.dir=oss://spark/spark-log/',
        '--conf', 'spark.hadoop.fs.oss.accessKeyId=' + spark_user_access_key,
        '--conf', 'spark.hadoop.fs.oss.accessKeySecret=' + spark_user_secret_key,
        '--conf', 'spark.hadoop.fs.oss.endpoint=' + spark_user_endpoint,
        '--conf', 'spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem',
        '--conf', 'spark.executor.instances=1',
        '--conf', 'spark.executor.memory=4g',
        '--conf', 'spark.executor.cores=2',
        '--conf', 'spark.driver.memory=2g',
        'oss://spark/job/test_spark_k8s_job_simple.py'
    ],
    is_delete_operator_pod=True,
    config_file='/opt/airflow/plugins/k8sconfig-spark-user.json',
    get_logs=True,
    dag=dag
)
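As an aside, a long `--conf` list like this is easy to keep readable by generating it from a dict; the helper below is a hypothetical sketch (not part of the original DAG), and it also strips stray whitespace from values, which guards against bugs like the trailing space in `spark.eventLog.enabled=true `:

```python
def build_spark_submit_args(master, app_file, confs,
                            deploy_mode="cluster", name="spark-job"):
    """Flatten a dict of Spark confs into the ['--conf', 'k=v', ...] list
    expected by spark-submit. Values are stripped so accidental trailing
    whitespace never reaches Spark."""
    args = ["--master", master, "--deploy-mode", deploy_mode, "--name", name]
    for key, value in confs.items():
        args += ["--conf", f"{key}={str(value).strip()}"]
    args.append(app_file)
    return args

# Example with a subset of the confs used above ("k8s://..." URL is hypothetical)
args = build_spark_submit_args(
    "k8s://https://example:6443",
    "oss://spark/job/test_spark_k8s_job_simple.py",
    {
        "spark.kubernetes.namespace": "dev-spark",
        "spark.eventLog.enabled": "true ",  # trailing space is stripped
        "spark.executor.instances": 1,
    },
)
```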
And the logs in the driver pod:
spark 08:40:12.26
spark 08:40:12.26 Welcome to the Bitnami spark container
spark 08:40:12.27 Subscribe to project updates by watching https://github.com/bitnami/containers
spark 08:40:12.27 Submit issues and feature requests at https://github.com/bitnami/containers/issues
spark 08:40:12.27
23/05/16 08:40:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/05/16 08:40:16 INFO SparkContext: Running Spark version 3.3.2
23/05/16 08:40:16 INFO ResourceUtils: ==============================================================
23/05/16 08:40:16 INFO ResourceUtils: No custom resources configured for spark.driver.
23/05/16 08:40:16 INFO ResourceUtils: ==============================================================
23/05/16 08:40:16 INFO SparkContext: Submitted application: spark-read-csv
23/05/16 08:40:16 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 2, script: , vendor: , memory -> name: memory, amount: 4096, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/05/16 08:40:16 INFO ResourceProfile: Limiting resource is cpus at 2 tasks per executor
23/05/16 08:40:16 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/05/16 08:40:16 INFO SecurityManager: Changing view acls to: spark,root
23/05/16 08:40:16 INFO SecurityManager: Changing modify acls to: spark,root
23/05/16 08:40:16 INFO SecurityManager: Changing view acls groups to:
23/05/16 08:40:16 INFO SecurityManager: Changing modify acls groups to:
23/05/16 08:40:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark, root); groups with view permissions: Set(); users with modify permissions: Set(spark, root); groups with modify permissions: Set()
23/05/16 08:40:16 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
23/05/16 08:40:16 INFO SparkEnv: Registering MapOutputTracker
23/05/16 08:40:16 INFO SparkEnv: Registering BlockManagerMaster
23/05/16 08:40:16 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/05/16 08:40:16 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/05/16 08:40:16 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/05/16 08:40:16 INFO DiskBlockManager: Created local directory at /var/data/spark-77a2ee41-2c8e-45c6-9df6-bb1f549d4566/blockmgr-5350fab4-8dd7-432e-80b3-fbc1924f0dea
23/05/16 08:40:16 INFO MemoryStore: MemoryStore started with capacity 912.3 MiB
23/05/16 08:40:16 INFO SparkEnv: Registering OutputCommitCoordinator
23/05/16 08:40:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/05/16 08:40:16 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
23/05/16 08:40:18 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:18 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:18 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:18 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
23/05/16 08:40:18 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
23/05/16 08:40:18 INFO NettyBlockTransferService: Server created on spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc:7079
23/05/16 08:40:18 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
23/05/16 08:40:18 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManagerMasterEndpoint: Registering block manager spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc:7079 with 912.3 MiB RAM, BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.baseline-dev-spark.svc, 7079, None)
23/05/16 08:40:18 INFO SingleEventLogFileWriter: Logging events to oss://spark/spark-log/spark-f6f3a41be773442dbc9a30781dffbc11.inprogress
23/05/16 08:40:21 INFO BlockManagerMaster: Removal of executor 1 requested
23/05/16 08:40:21 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove non-existent executor 1
23/05/16 08:40:21 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
23/05/16 08:40:21 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:21 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
23/05/16 08:40:21 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
23/05/16 08:40:24 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
And the loop that keeps repeating as the executor pod is created and removed:
23/05/16 08:40:25 INFO BlockManagerMaster: Removal of executor 2 requested
23/05/16 08:40:25 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
23/05/16 08:40:25 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove non-existent executor 2
23/05/16 08:40:27 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
23/05/16 08:40:27 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
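This pattern (executors repeatedly requested while `known: 0` and "Asked to remove non-existent executor" messages appear) suggests the executor pods die before ever registering with the driver. A quick way to quantify the churn from a saved driver log is a small helper like this (hypothetical, not part of Spark):

```python
def executor_churn(log_lines):
    """Count executor-request vs. executor-removal events in driver log lines.
    Many removals combined with 'known: 0' throughout means executors are
    exiting before they register with the driver."""
    requested = sum("Going to request" in line for line in log_lines)
    removed = sum("Removal of executor" in line for line in log_lines)
    return requested, removed

# Example with lines resembling the driver log above
log = [
    "08:40:18 INFO ExecutorPodsAllocator: Going to request 1 executors ... known: 0",
    "08:40:21 INFO BlockManagerMaster: Removal of executor 1 requested",
    "08:40:21 INFO ExecutorPodsAllocator: Going to request 1 executors ... known: 0",
]
requested, removed = executor_churn(log)
```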
Has anyone encountered this before? Any insight would be appreciated.
Answer 1
Score: 1
I already fixed this issue; it was caused by the Java version in the Spark image.
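One way to confirm a Java mismatch is to run `java -version` inside the driver/executor image (e.g. `docker run --rm docker.io/vinhlq9/bitnami-spark-3.3 java -version`) and compare it with what your Spark build expects. Parsing the version banner is fiddly because pre-Java-9 releases report as `1.x`; a small parser sketch, assuming the usual OpenJDK banner format:

```python
import re

def java_major_version(banner: str) -> int:
    """Parse the major version out of a `java -version` banner line,
    e.g. 'openjdk version "11.0.19"' -> 11, 'java version "1.8.0_362"' -> 8."""
    m = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if not m:
        raise ValueError(f"unrecognized banner: {banner!r}")
    major = int(m.group(1))
    # Pre-Java-9 releases report as 1.x, where x is the real major version
    if major == 1 and m.group(2):
        major = int(m.group(2))
    return major
```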