The executor pod in Kubernetes keeps being created and removed when submitting a Spark job to K8s


Question


I submitted a Spark job through Airflow with KubernetesPodOperator, as in the code below. The driver pod is created, but the executor pod keeps being created and deleted over and over.

    spark_submit = KubernetesPodOperator(
        task_id='test_spark_k8s_submit',
        name='test_spark_k8s_submit',
        namespace='dev-spark',
        image='docker.io/vinhlq9/bitnami-spark-3.3',
        cmds=['/opt/spark/bin/spark-submit'],
        arguments=[
            '--master', k8s_url,
            '--deploy-mode', 'cluster',
            '--name', 'spark-job',
            '--conf', 'spark.kubernetes.namespace=dev-spark',
            '--conf', 'spark.kubernetes.container.image=docker.io/vinhlq9/bitnami-spark-3.3',
            '--conf', 'spark.kubernetes.authenticate.driver.serviceAccountName=spark-user',
            '--conf', 'spark.kubernetes.authenticate.executor.serviceAccountName=spark-user',
            '--conf', 'spark.kubernetes.driverEnv.SPARK_CONF_DIR=/opt/bitnami/spark/conf',
            '--conf', 'spark.kubernetes.driverEnv.SPARK_CONFIG_MAP=spark-config',
            '--conf', 'spark.kubernetes.file.upload.path=/opt/spark',
            '--conf', 'spark.kubernetes.driver.annotation.sidecar.istio.io/inject=false',
            '--conf', 'spark.kubernetes.executor.annotation.sidecar.istio.io/inject=false',
            '--conf', 'spark.eventLog.enabled=true',
            '--conf', 'spark.eventLog.dir=oss://spark/spark-log/',
            '--conf', 'spark.hadoop.fs.oss.accessKeyId=' + spark_user_access_key,
            '--conf', 'spark.hadoop.fs.oss.accessKeySecret=' + spark_user_secret_key,
            '--conf', 'spark.hadoop.fs.oss.endpoint=' + spark_user_endpoint,
            '--conf', 'spark.hadoop.fs.oss.impl=org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem',
            '--conf', 'spark.executor.instances=1',
            '--conf', 'spark.executor.memory=4g',
            '--conf', 'spark.executor.cores=2',
            '--conf', 'spark.driver.memory=2g',
            'oss://spark/job/test_spark_k8s_job_simple.py',
        ],
        is_delete_operator_pod=True,
        config_file='/opt/airflow/plugins/k8sconfig-spark-user.json',
        get_logs=True,
        dag=dag,
    )
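A side note on the argument list: each setting travels to `spark-submit` as a literal `--conf key=value` string, so a typo or stray whitespace in a value reaches Spark verbatim and can silently break that setting. Building the list from a dict makes this harder to get wrong. A minimal sketch, where `build_spark_confs` is a hypothetical helper rather than anything from Airflow or Spark:

```python
def build_spark_confs(confs):
    """Flatten a dict of Spark settings into spark-submit '--conf' arguments.

    Normalizes stray whitespace, since spark-submit passes each
    'key=value' string to Spark verbatim.
    """
    args = []
    for key, value in confs.items():
        args += ['--conf', f'{key.strip()}={str(value).strip()}']
    return args
```

For example, `build_spark_confs({'spark.eventLog.enabled': 'true '})` returns `['--conf', 'spark.eventLog.enabled=true']`, with the trailing space removed.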

And the logs in the driver pod:

    spark 08:40:12.26
    spark 08:40:12.26 Welcome to the Bitnami spark container
    spark 08:40:12.27 Subscribe to project updates by watching https://github.com/bitnami/containers
    spark 08:40:12.27 Submit issues and feature requests at https://github.com/bitnami/containers/issues
    spark 08:40:12.27
    23/05/16 08:40:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    23/05/16 08:40:16 INFO SparkContext: Running Spark version 3.3.2
    23/05/16 08:40:16 INFO ResourceUtils: ==============================================================
    23/05/16 08:40:16 INFO ResourceUtils: No custom resources configured for spark.driver.
    23/05/16 08:40:16 INFO ResourceUtils: ==============================================================
    23/05/16 08:40:16 INFO SparkContext: Submitted application: spark-read-csv
    23/05/16 08:40:16 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 2, script: , vendor: , memory -> name: memory, amount: 4096, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
    23/05/16 08:40:16 INFO ResourceProfile: Limiting resource is cpus at 2 tasks per executor
    23/05/16 08:40:16 INFO ResourceProfileManager: Added ResourceProfile id: 0
    23/05/16 08:40:16 INFO SecurityManager: Changing view acls to: spark,root
    23/05/16 08:40:16 INFO SecurityManager: Changing modify acls to: spark,root
    23/05/16 08:40:16 INFO SecurityManager: Changing view acls groups to:
    23/05/16 08:40:16 INFO SecurityManager: Changing modify acls groups to:
    23/05/16 08:40:16 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark, root); groups with view permissions: Set(); users with modify permissions: Set(spark, root); groups with modify permissions: Set()
    23/05/16 08:40:16 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
    23/05/16 08:40:16 INFO SparkEnv: Registering MapOutputTracker
    23/05/16 08:40:16 INFO SparkEnv: Registering BlockManagerMaster
    23/05/16 08:40:16 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
    23/05/16 08:40:16 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
    23/05/16 08:40:16 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
    23/05/16 08:40:16 INFO DiskBlockManager: Created local directory at /var/data/spark-77a2ee41-2c8e-45c6-9df6-bb1f549d4566/blockmgr-5350fab4-8dd7-432e-80b3-fbc1924f0dea
    23/05/16 08:40:16 INFO MemoryStore: MemoryStore started with capacity 912.3 MiB
    23/05/16 08:40:16 INFO SparkEnv: Registering OutputCommitCoordinator
    23/05/16 08:40:16 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    23/05/16 08:40:16 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
    23/05/16 08:40:18 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
    23/05/16 08:40:18 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
    23/05/16 08:40:18 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
    23/05/16 08:40:18 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
    23/05/16 08:40:18 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
    23/05/16 08:40:18 INFO NettyBlockTransferService: Server created on spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc:7079
    23/05/16 08:40:18 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
    23/05/16 08:40:18 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
    23/05/16 08:40:18 INFO BlockManagerMasterEndpoint: Registering block manager spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc:7079 with 912.3 MiB RAM, BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
    23/05/16 08:40:18 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.dev-spark.svc, 7079, None)
    23/05/16 08:40:18 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-job-84e1f08823b7833d-driver-svc.baseline-dev-spark.svc, 7079, None)
    23/05/16 08:40:18 INFO SingleEventLogFileWriter: Logging events to oss://spark/spark-log/spark-f6f3a41be773442dbc9a30781dffbc11.inprogress
    23/05/16 08:40:21 INFO BlockManagerMaster: Removal of executor 1 requested
    23/05/16 08:40:21 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove non-existent executor 1
    23/05/16 08:40:21 INFO BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
    23/05/16 08:40:21 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
    23/05/16 08:40:21 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
    23/05/16 08:40:21 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script
    23/05/16 08:40:24 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.

And the loop that repeats as the executor pod is recreated:

    23/05/16 08:40:25 INFO BlockManagerMaster: Removal of executor 2 requested
    23/05/16 08:40:25 INFO BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
    23/05/16 08:40:25 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asked to remove non-existent executor 2
    23/05/16 08:40:27 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647.
    23/05/16 08:40:27 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/opt/bitnami/spark/conf) : spark-env.sh
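The pattern in these lines — repeated "Going to request 1 executors" and "Removal of executor N requested" / "Asked to remove non-existent executor N", with no executor ever registering — suggests the executors exit before they can connect back to the driver. A small scan over the driver log can quantify the churn; this is a sketch, and `count_executor_churn` is a hypothetical helper name, not a Spark API:

```python
import re

def count_executor_churn(log_lines):
    """Tally executor allocation requests, removals, and registrations
    in a Spark-on-K8s driver log. Many requests/removals with zero
    registrations means every executor died on startup."""
    counts = {'requested': 0, 'removed': 0, 'registered': 0}
    for line in log_lines:
        if 'Going to request' in line and 'executors from Kubernetes' in line:
            counts['requested'] += 1
        elif re.search(r'Removal of executor \d+ requested', line):
            counts['removed'] += 1
        elif 'Registered executor' in line:
            counts['registered'] += 1
    return counts
```

When this reports churn with zero registrations, the next step is usually `kubectl logs` / `kubectl describe pod` on a short-lived executor pod, since the real startup error only appears there, not in the driver log.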

Has anyone encountered this before? It would be great to get some insight into what is going on.

Answer 1

Score: 1

I already fixed this issue; it was caused by the Java version in the Spark image.
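Since the root cause turned out to be the Java runtime baked into the image, one way to catch this early is to run `java -version` inside the driver and executor images before submitting and compare the results. A sketch under the assumption that Docker is available locally; the helper names here are mine, not part of any Spark tooling:

```python
import re
import subprocess

def java_major_version(version_output):
    """Parse the major version from `java -version` output, e.g.
    'openjdk version "11.0.19"' -> 11, 'java version "1.8.0_372"' -> 8."""
    m = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if not m:
        raise ValueError('unrecognized java -version output')
    major = int(m.group(1))
    if major == 1 and m.group(2):  # legacy 1.x scheme used by Java 8 and older
        major = int(m.group(2))
    return major

def image_java_version(image):
    """Run `java -version` inside a container image (needs a local Docker
    daemon). Note that `java -version` prints to stderr, not stdout."""
    result = subprocess.run(
        ['docker', 'run', '--rm', '--entrypoint', 'java', image, '-version'],
        capture_output=True, text=True, check=True)
    return java_major_version(result.stderr)
```

Comparing `image_java_version(...)` for the image used by the operator and the one set in `spark.kubernetes.container.image` would flag a mismatch before the job ever reaches the cluster.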

huangapple
  • Published on 2023-05-17 15:55:01
  • When republishing, please retain the original link: https://go.coder-hub.com/76269747.html