英文:
Executors not seem to be created or scaling up on Spark Application on AWS EMR Serverless
问题
I'm here to assist with the translation. Please let me know which parts you'd like translated.
英文:
I would appreciate your help with my problem.
I'm running a spark application on AWS EMR serverless with emr 6.11 release.
I'm using Spark 3.3.2 with java 17, with configuration: maximum recourses of 200 vCPU and 1600gb memory.
My application is structured to read records from s3 and saving into a broadcast variable,
Then have reading some other records from s3 and iterating over them with parallelize rdd
and checking them against the first records, and saving them at last.
To simplify it is something like this :
Dataset<Row> groupA = readAllRecords();
groupA = manipulateData(groupA);
Dataset<Row> groupB = readAllRecords();
groupB.parallelize( recordB -> {
checkRecord(groupA, recordB)
saveRecord(record)
});
My problem is I don't see in any stage that spark is scaling up any executors,
although I have heavy workloads.
This is my dashboard :
Server UI history while running local:
Is it possible it is all running on the driver ?
what it is i don't see ?
Please let me know if more information is needed.
I attached some of the log at start of the app.
I noticed this error :
Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.spark.storage.BlockManagerId.executorId()" because "idWithoutTopologyInfo" is null
but it is ignored and i'm not sure it's related.
> 23/07/05 11:17:44 INFO SparkContext: Running Spark version
> 3.3.2-amzn-0 23/07/05 11:17:44 INFO ResourceUtils: ============================================================== 23/07/05 11:17:44 INFO ResourceUtils: No custom resources configured
> for spark.driver. 23/07/05 11:17:44 INFO ResourceUtils:
> ============================================================== 23/07/05 11:17:44 INFO SparkContext: Submitted application: FusionApp
> 23/07/05 11:17:44 INFO ResourceProfile: Default ResourceProfile
> created, executor resources: Map(cores -> name: cores, amount: 4,
> script: , vendor: , memory -> name: memory, amount: 14336, script: ,
> vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ),
> task resources: Map(cpus -> name: cpus, amount: 1.0) 23/07/05 11:17:44
> INFO ResourceProfile: Limiting resource is cpus at 4 tasks per
> executor 23/07/05 11:17:44 INFO ResourceProfileManager: Added
> ResourceProfile id: 0 23/07/05 11:17:44 INFO SecurityManager: Changing
> view acls to: hadoop 23/07/05 11:17:44 INFO SecurityManager: Changing
> modify acls to: hadoop 23/07/05 11:17:44 INFO SecurityManager:
> Changing view acls groups to: 23/07/05 11:17:44 INFO SecurityManager:
> Changing modify acls groups to: 23/07/05 11:17:44 INFO
> SecurityManager: SecurityManager: authentication enabled; ui acls
> disabled; users with view permissions: Set(hadoop); groups with view
> permissions: Set(); users with modify permissions: Set(hadoop);
> groups with modify permissions: Set() 23/07/05 11:17:45 INFO Utils:
> Successfully started service 'sparkDriver' on port 41285. 23/07/05
> 11:17:45 INFO SparkEnv: Registering MapOutputTracker 23/07/05 11:17:45
> INFO SparkEnv: Registering BlockManagerMaster 23/07/05 11:17:45 INFO
> BlockManagerMasterEndpoint: Using
> org.apache.spark.storage.DefaultTopologyMapper for getting topology
> information 23/07/05 11:17:45 INFO BlockManagerMasterEndpoint:
> BlockManagerMasterEndpoint up 23/07/05 11:17:45 INFO SparkEnv:
> Registering BlockManagerMasterHeartbeat 23/07/05 11:17:45 INFO
> DiskBlockManager: Created local directory at
> /tmp/blockmgr-98a76adc-5862-4053-93a5-584fec796959 23/07/05 11:17:45
> INFO MemoryStore: MemoryStore started with capacity 8.2 GiB 23/07/05
> 11:17:45 INFO SparkEnv: Registering OutputCommitCoordinator 23/07/05
> 11:17:45 INFO SubResultCacheManager: Sub-result caches are disabled.
> 23/07/05 11:17:45 INFO Utils: Successfully started service 'SparkUI'
> on port 4040. 23/07/05 11:17:45 INFO SparkContext: Added JAR
> s3://jobjars/fusion/fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar
> at
> s3://jobjars/fusion/fusion-job-1.0.0-29114-SNAPSHOT-jar-with-dependencies.jar
> with timestamp 1688555864852 23/07/05 11:17:45 INFO Executor: Starting
> executor ID driver on host ip-172-31.ec2.internal 23/07/05 11:17:45
> INFO Executor: Starting executor with user classpath
> (userClassPathFirst = false):
> 'file:/usr/lib/hadoop-lzo/lib/*,file:/usr/lib/hadoop/hadoop-aws.jar,file:/usr/share/aws/aws-java-sdk/*,file:/usr/share/aws/emr/emrfs/conf/,file:/usr/share/aws/emr/emrfs/lib/*,file:/usr/share/aws/emr/emrfs/auxlib/*,file:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar,file:/usr/share/aws/emr/goodies/lib/emr-serverless-spark-goodies.jar,file:/usr/share/aws/emr/security/conf,file:/usr/share/aws/emr/security/lib/*,file:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar,file:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar,file:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar,file:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar,file:/docker/usr/lib/hadoop-lzo/lib/*,file:/docker/usr/lib/hadoop/hadoop-aws.jar,file:/docker/usr/share/aws/aws-java-sdk/*,file:/docker/usr/share/aws/emr/emrfs/conf,file:/docker/usr/share/aws/emr/emrfs/lib/*,file:/docker/usr/share/aws/emr/emrfs/auxlib/*,file:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar,file:/docker/usr/share/aws/emr/security/conf,file:/docker/usr/share/aws/emr/security/lib/*,file:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar,file:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar,file:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar,file:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar,file:/usr/share/aws/redshift/jdbc/RedshiftJDBC.jar,file:/usr/share/aws/redshift/spark-redshift/lib/*,file:/home/hadoop/conf,file:/home/hadoop/emr-serverless-spark-goodies.jar,file:/home/hadoop/emr-spark-goodies.jar,file:/home/hadoop/*,file:/home/hadoop/aws-glue-datacatalog-spark-client.jar,file:/home/hadoop/hive-openx-serde.jar,file:/home/hadoop/sagemaker-spark-sdk.jar,file:/home/hadoop/hadoop-aws.jar,file:/home/hadoop/RedshiftJDBC.jar,file:/home/hadoop/emr-s3-select-spark-connector.jar'
> 23/07/05 11:17:45 INFO Executor: Fetching
> s3://jobjars-/fusion/yyyy-fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar
> with timestamp 1688555864852 23/07/05 11:17:45 INFO
> S3NativeFileSystem: Opening
> 's3://jobjars-29114/fusion/xxx-fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar'
> for reading 23/07/05 11:17:45 INFO Utils: Fetching
> s3://jobjars-/fusion/fusion-job-1.0.0--29114-SNAPSHOT-jar-with-dependencies.jar
> to
> /tmp/spark-fd0b8589-fbd6-4b4a-b69f-98955f2f906f/userFiles-7edecef0-92cd-46c4-9ca8-bdabe9c64f86/fetchFileTemp15434886852656050499.tmp
> 23/07/05 11:17:57 INFO Executor: Told to re-register on heartbeat
> 23/07/05 11:17:57 INFO BlockManager: BlockManager null re-registering
> with master 23/07/05 11:17:57 INFO BlockManagerMaster: Registering
> BlockManager null 23/07/05 11:17:57 WARN Executor: Issue communicating
> with driver in heartbeater org.apache.spark.SparkException: Exception
> thrown in awaitResult: at
> org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:103)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:87)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:89)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:643)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.executor.Executor.reportHeartBeat(Executor.scala:1057)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.executor.Executor.$anonfun$heartbeater$1(Executor.scala:238)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> ~[scala-library-2.12.15.jar:?] at
> org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:2100)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.Heartbeater$$anon$1.run(Heartbeater.scala:46)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
> ~[?:?] at
> java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
> ~[?:?] at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
> ~[?:?] at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> ~[?:?] at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> ~[?:?] at java.lang.Thread.run(Thread.java:833) ~[?:?] Caused by:
> java.lang.NullPointerException: Cannot invoke
> "org.apache.spark.storage.BlockManagerId.executorId()" because
> "idWithoutTopologyInfo" is null at
> org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:591)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:123)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
> ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264)
> ~[?:?] 3 more 23/07/05 11:17:57 ERROR Inbox: Ignoring error
> java.lang.NullPointerException: Cannot invoke
> "org.apache.spark.storage.BlockManagerId.executorId()" because
> "idWithoutTopologyInfo" is null at
> org.apache.spark.storage.BlockManagerMasterEndpoint.org$apache$spark$storage$BlockManagerMasterEndpoint$$register(BlockManagerMasterEndpoint.scala:591)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.storage.BlockManagerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(BlockManagerMasterEndpoint.scala:123)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.netty.Inbox.$anonfun$process$1(Inbox.scala:103)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:213)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.netty.MessageLoop.org$apache$spark$rpc$netty$MessageLoop$$receiveLoop(MessageLoop.scala:75)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> org.apache.spark.rpc.netty.MessageLoop$$anon$1.run(MessageLoop.scala:41)
> ~[spark-core_2.12-3.3.2-amzn-0.jar:3.3.2-amzn-0] at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
> ~[?:?] at java.util.concurrent.FutureTask.run(FutureTask.java:264)
> ~[?:?] at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> ~[?:?] at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> ~[?:?] at java.lang.Thread.run(Thread.java:833) ~[?:?] 23/07/05
> 11:18:00 INFO Executor: Adding
> file:/tmp/spark-fd0b8589-fbd6-4b4a-b69f-98955f2f906f/userFiles-92cd-46c4-9ca8-/fusion-job-1.0.0-29114-SNAPSHOT-jar-with-dependencies.jar
> to class loader 23/07/05 11:18:00 INFO Utils: Successfully started
> service 'org.apache.spark.network.netty.NettyBlockTransferService' on
> port 35657. 23/07/05 11:18:00 INFO NettyBlockTransferService: Server
> created on [2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 23/07/05
> 11:18:00 INFO BlockManager: Using
> org.apache.spark.storage.RandomBlockReplicationPolicy for block
> replication policy 23/07/05 11:18:00 INFO BlockManagerMaster:
> Registering BlockManager BlockManagerId(driver,
> [2600:1f18:6a92:2601:18f2:3af:f80f:4dab], 35657, None) 23/07/05
> 11:18:00 INFO BlockManagerMasterEndpoint: Registering block manager
> [2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 with 8.2 GiB RAM,
> BlockManagerId(driver, [2600:1f18:6a92:2601:18f2:3af:f80f:4dab],
> 35657, None) 23/07/05 11:18:00 INFO BlockManagerMaster: Registered
> BlockManager BlockManagerId(driver,
> [2600:1f18:6a92:2601:18f2:3af:f80f:4dab], 35657, None) 23/07/05
> 11:18:00 INFO BlockManager: Initialized BlockManager:
> BlockManagerId(driver, [2600:1f18:6a92:2601:18f2:3af:f80f:4dab],
> 35657, None) 23/07/05 11:18:00 INFO SingleEventLogFileWriter: Logging
> events to file:/var/log/spark/apps/local-1688555865619.inprogress
> 23/07/05 11:18:00 INFO SharedState: Setting
> hive.metastore.warehouse.dir ('null') to the value of
> spark.sql.warehouse.dir. 23/07/05 11:18:00 INFO SharedState: Warehouse
> path is 'file:/home/hadoop/spark-warehouse'. 23/07/05 11:18:01 INFO
> MemoryStore: Block broadcast_0 stored as values in memory (estimated
> size 72.0 B, free 8.2 GiB) 23/07/05 11:18:01 INFO MemoryStore: Block
> broadcast_0_piece0 stored as bytes in memory (estimated size 124.0 B,
> free 8.2 GiB) 23/07/05 11:18:01 INFO BlockManagerInfo: Added
> broadcast_0_piece0 in memory on
> [2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 (size: 124.0 B, free:
> 8.2 GiB) 23/07/05 11:18:01 INFO SparkContext: Created broadcast 0 from broadcast at Transformer.java:76 23/07/05 11:18:01 INFO MemoryStore:
> Block broadcast_1 stored as values in memory (estimated size 168.0 B,
> free 8.2 GiB) 23/07/05 11:18:01 INFO MemoryStore: Block
> broadcast_1_piece0 stored as bytes in memory (estimated size 304.0 B,
> free 8.2 GiB) 23/07/05 11:18:01 INFO BlockManagerInfo: Added
> broadcast_1_piece0 in memory on
> [2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 (size: 304.0 B, free:
> 8.2 GiB) 23/07/05 11:18:01 INFO SparkContext: Created broadcast 1 from broadcast at Transformer.java:80 23/07/05 11:18:02 INFO SparkContext:
> Starting job: collect at Transformer.java:123 23/07/05 11:18:02 INFO
> DAGScheduler: Got job 0 (collect at Transformer.java:123) with 52
> output partitions 23/07/05 11:18:02 INFO DAGScheduler: Final stage:
> ResultStage 0 (collect at Transformer.java:123) 23/07/05 11:18:02 INFO
> DAGScheduler: Parents of final stage: List() 23/07/05 11:18:02 INFO
> DAGScheduler: Missing parents: List() 23/07/05 11:18:02 INFO
> DAGScheduler: Submitting ResultStage 0 (ParallelCollectionRDD[0] at
> parallelize at Transformer.java:122), which has no missing parents
> 23/07/05 11:18:02 INFO MemoryStore: Block broadcast_2 stored as values
> in memory (estimated size 3.2 KiB, free 8.2 GiB) 23/07/05 11:18:02
> INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory
> (estimated size 1899.0 B, free 8.2 GiB) 23/07/05 11:18:02 INFO
> BlockManagerInfo: Added broadcast_2_piece0 in memory on
> [2600:1f18:6a92:2601:18f2:3af:f80f:4dab]:35657 (size: 1899.0 B, free:
> 8.2 GiB) 23/07/05 11:18:02 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1570 23/07/05 11:18:02 INFO
> DAGScheduler: Submitting 52 missing tasks from ResultStage 0
> (ParallelCollectionRDD[0] at parallelize at Transformer.java:122)
> (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8,
> 9, 10, 11, 12, 13, 14)) 23/07/05 11:18:02 INFO TaskSchedulerImpl:
> Adding task set 0.0 with 52 tasks resource profile 0 23/07/05 11:18:02
> INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0)
> (ip-172-31.ec2.internal, executor driver, partition 0, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1)
> (ip-172-31.ec2.internal, executor driver, partition 1, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2)
> (ip-172-31.ec2.internal, executor driver, partition 2, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3)
> (ip-172-31.ec2.internal, executor driver, partition 3, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> Executor: Running task 1.0 in stage 0.0 (TID 1) 23/07/05 11:18:02 INFO
> Executor: Running task 2.0 in stage 0.0 (TID 2) 23/07/05 11:18:02 INFO
> Executor: Running task 3.0 in stage 0.0 (TID 3) 23/07/05 11:18:02 INFO
> Executor: Running task 0.0 in stage 0.0 (TID 0) 23/07/05 11:18:02 INFO
> Executor: Finished task 2.0 in stage 0.0 (TID 2). 1496 bytes result
> sent to driver 23/07/05 11:18:02 INFO Executor: Finished task 3.0 in
> stage 0.0 (TID 3). 1496 bytes result sent to driver 23/07/05 11:18:02
> INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1496 bytes
> result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting
> task 4.0 in stage 0.0 (TID 4) (ip-172-31.ec2.internal, executor
> driver, partition 4, PROCESS_LOCAL, 5097 bytes)
> taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running
> task 4.0 in stage 0.0 (TID 4) 23/07/05 11:18:02 INFO Executor:
> Finished task 1.0 in stage 0.0 (TID 1). 1496 bytes result sent to
> driver 23/07/05 11:18:02 INFO Executor: Finished task 4.0 in stage 0.0
> (TID 4). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO
> TaskSetManager: Starting task 5.0 in stage 0.0 (TID 5)
> (ip-172-31.ec2.internal, executor driver, partition 5, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> Executor: Running task 5.0 in stage 0.0 (TID 5) 23/07/05 11:18:02 INFO
> Executor: Finished task 5.0 in stage 0.0 (TID 5). 1496 bytes result
> sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task
> 6.0 in stage 0.0 (TID 6) (ip-172-31.ec2.internal, executor driver, partition 6, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map()
> 23/07/05 11:18:02 INFO Executor: Running task 6.0 in stage 0.0 (TID 6)
> 23/07/05 11:18:02 INFO TaskSetManager: Starting task 7.0 in stage 0.0
> (TID 7) (ip-172-31.ec2.internal, executor driver, partition 7,
> PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05
> 11:18:02 INFO Executor: Running task 7.0 in stage 0.0 (TID 7) 23/07/05
> 11:18:02 INFO TaskSetManager: Starting task 8.0 in stage 0.0 (TID 8)
> (ip-172-31.ec2.internal, executor driver, partition 8, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> Executor: Running task 8.0 in stage 0.0 (TID 8) 23/07/05 11:18:02 INFO
> TaskSetManager: Starting task 9.0 in stage 0.0 (TID 9)
> (ip-172-31.ec2.internal, executor driver, partition 9, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> Executor: Finished task 6.0 in stage 0.0 (TID 6). 1496 bytes result
> sent to driver 23/07/05 11:18:02 INFO Executor: Running task 9.0 in
> stage 0.0 (TID 9) 23/07/05 11:18:02 INFO TaskSetManager: Starting task
> 10.0 in stage 0.0 (TID 10) (ip-172-31.ec2.internal, executor driver, partition 10, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map()
> 23/07/05 11:18:02 INFO Executor: Running task 10.0 in stage 0.0 (TID
> 10) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 0.0 in stage
> 0.0 (TID 0) in 143 ms on ip-172-31.ec2.internal (executor driver) (1/52) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 3.0 in
> stage 0.0 (TID 3) in 122 ms on ip-172-31.ec2.internal (executor
> driver) (2/52) 23/07/05 11:18:02 INFO TaskSetManager: Finished task
> 4.0 in stage 0.0 (TID 4) in 33 ms on ip-172-31.ec2.internal (executor driver) (3/52) 23/07/05 11:18:02 INFO Executor: Finished task 9.0 in
> stage 0.0 (TID 9). 1496 bytes result sent to driver 23/07/05 11:18:02
> INFO TaskSetManager: Finished task 5.0 in stage 0.0 (TID 5) in 25 ms
> on ip-172-31.ec2.internal (executor driver) (4/52) 23/07/05 11:18:02
> INFO TaskSetManager: Finished task 2.0 in stage 0.0 (TID 2) in 124 ms
> on ip-172-31.ec2.internal (executor driver) (5/52) 23/07/05 11:18:02
> INFO Executor: Finished task 10.0 in stage 0.0 (TID 10). 1496 bytes
> result sent to driver 23/07/05 11:18:02 INFO Executor: Finished task
> 8.0 in stage 0.0 (TID 8). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1)
> in 140 ms on ip-172-31.ec2.internal (executor driver) (6/52) 23/07/05
> 11:18:02 INFO TaskSetManager: Starting task 11.0 in stage 0.0 (TID 11)
> (ip-172-31.ec2.internal, executor driver, partition 11, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> Executor: Running task 11.0 in stage 0.0 (TID 11) 23/07/05 11:18:02
> INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 1496 bytes
> result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Finished
> task 6.0 in stage 0.0 (TID 6) in 34 ms on ip-172-31.ec2.internal
> (executor driver) (7/52) 23/07/05 11:18:02 INFO Executor: Finished
> task 11.0 in stage 0.0 (TID 11). 1496 bytes result sent to driver
> 23/07/05 11:18:02 INFO TaskSetManager: Starting task 12.0 in stage 0.0
> (TID 12) (ip-172-31.ec2.internal, executor driver, partition 12,
> PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05
> 11:18:02 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9)
> in 30 ms on ip-172-31.ec2.internal (executor driver) (8/52) 23/07/05
> 11:18:02 INFO TaskSetManager: Finished task 10.0 in stage 0.0 (TID 10)
> in 29 ms on ip-172-31.ec2.internal (executor driver) (9/52) 23/07/05
> 11:18:02 INFO Executor: Running task 12.0 in stage 0.0 (TID 12)
> 23/07/05 11:18:02 INFO TaskSetManager: Starting task 13.0 in stage 0.0
> (TID 13) (ip-172-31.ec2.internal, executor driver, partition 13,
> PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map() 23/07/05
> 11:18:02 INFO Executor: Finished task 12.0 in stage 0.0 (TID 12). 1496
> bytes result sent to driver 23/07/05 11:18:02 INFO Executor: Running
> task 13.0 in stage 0.0 (TID 13) 23/07/05 11:18:02 INFO TaskSetManager:
> Finished task 8.0 in stage 0.0 (TID 8) in 39 ms on
> ip-172-31.ec2.internal (executor driver) (10/52) 23/07/05 11:18:02
> INFO TaskSetManager: Finished task 7.0 in stage 0.0 (TID 7) in 42 ms
> on ip-172-31.ec2.internal (executor driver) (11/52) 23/07/05 11:18:02
> INFO TaskSetManager: Starting task 14.0 in stage 0.0 (TID 14)
> (ip-172-31.ec2.internal, executor driver, partition 14, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> Executor: Finished task 13.0 in stage 0.0 (TID 13). 1496 bytes result
> sent to driver 23/07/05 11:18:02 INFO Executor: Running task 14.0 in
> stage 0.0 (TID 14) 23/07/05 11:18:02 INFO TaskSetManager: Starting
> task 15.0 in stage 0.0 (TID 15) (ip-172-31.ec2.internal, executor
> driver, partition 15, PROCESS_LOCAL, 5097 bytes)
> taskResourceAssignments Map() 23/07/05 11:18:02 INFO Executor: Running
> task 15.0 in stage 0.0 (TID 15) 23/07/05 11:18:02 INFO TaskSetManager:
> Finished task 11.0 in stage 0.0 (TID 11) in 21 ms on
> ip-172-31.ec2.internal (executor driver) (12/52) 23/07/05 11:18:02
> INFO Executor: Finished task 14.0 in stage 0.0 (TID 14). 1496 bytes
> result sent to driver 23/07/05 11:18:02 INFO Executor: Finished task
> 15.0 in stage 0.0 (TID 15). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task 16.0 in stage 0.0 (TID 16)
> (ip-172-31.ec2.internal, executor driver, partition 16, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> TaskSetManager: Finished task 12.0 in stage 0.0 (TID 12) in 24 ms on
> ip-172-31.ec2.internal (executor driver) (13/52) 23/07/05 11:18:02
> INFO TaskSetManager: Finished task 13.0 in stage 0.0 (TID 13) in 18 ms
> on ip-172-31.ec2.internal (executor driver) (14/52) 23/07/05 11:18:02
> INFO Executor: Running task 16.0 in stage 0.0 (TID 16) 23/07/05
> 11:18:02 INFO TaskSetManager: Starting task 17.0 in stage 0.0 (TID 17)
> (ip-172-31.ec2.internal, executor driver, partition 17, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> Executor: Finished task 16.0 in stage 0.0 (TID 16). 1496 bytes result
> sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Starting task
> 18.0 in stage 0.0 (TID 18) (ip-172-31.ec2.internal, executor driver, partition 18, PROCESS_LOCAL, 5097 bytes) taskResourceAssignments Map()
> 23/07/05 11:18:02 INFO Executor: Running task 18.0 in stage 0.0 (TID
> 18) 23/07/05 11:18:02 INFO TaskSetManager: Finished task 14.0 in stage
> 0.0 (TID 14) in 20 ms on ip-172-31.ec2.internal (executor driver) (15/52) 23/07/05 11:18:02 INFO Executor: Finished task 18.0 in stage
> 0.0 (TID 18). 1496 bytes result sent to driver 23/07/05 11:18:02 INFO TaskSetManager: Finished task 15.0 in stage 0.0 (TID 15) in 22 ms on
> ip-172-31.ec2.internal (executor driver) (16/52) 23/07/05 11:18:02
> INFO Executor: Running task 17.0 in stage 0.0 (TID 17) 23/07/05
> 11:18:02 INFO TaskSetManager: Starting task 19.0 in stage 0.0 (TID 19)
> (ip-172-31.ec2.internal, executor driver, partition 19, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> Executor: Running task 19.0 in stage 0.0 (TID 19) 23/07/05 11:18:02
> INFO TaskSetManager: Finished task 16.0 in stage 0.0 (TID 16) in 16 ms
> on ip-172-31.ec2.internal (executor driver) (17/52) 23/07/05 11:18:02
> INFO TaskSetManager: Starting task 20.0 in stage 0.0 (TID 20)
> (ip-172-31.ec2.internal, executor driver, partition 20, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> Executor: Running task 20.0 in stage 0.0 (TID 20) 23/07/05 11:18:02
> INFO TaskSetManager: Starting task 21.0 in stage 0.0 (TID 21)
> (ip-172-31.ec2.internal, executor driver, partition 21, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> Executor: Finished task 17.0 in stage 0.0 (TID 17). 1496 bytes result
> sent to driver 23/07/05 11:18:02 INFO Executor: Finished task 19.0 in
> stage 0.0 (TID 19). 1453 bytes result sent to driver 23/07/05 11:18:02
> INFO Executor: Running task 21.0 in stage 0.0 (TID 21) 23/07/05
> 11:18:02 INFO TaskSetManager: Starting task 22.0 in stage 0.0 (TID 22)
> (ip-172-31.ec2.internal, executor driver, partition 22, PROCESS_LOCAL,
> 5097 bytes) taskResourceAssignments Map() 23/07/05 11:18:02 INFO
> Executor: Running task 22.0 in stage 0.0 (TID 22)
Any insight would be precious
Thanks a lot!
I already was trying to add executors.instances and dynamicAllocation = true
also tried Dataset recordsDF.repartition(100) for example
答案1
得分: 0
对于遇到相同问题的任何人,我发现问题非常简单 - 看起来 'spark.setMaster(local[*])' 应该仅在本地运行时使用,而不是在集群上运行。
英文:
For anyone encounter the same problem, i found the problem was quite simple - it appears that 'spark.setMaster(local[*])' should only be used when you run locally and not to run on the cluster.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论