Apache Flink – Getting `NoResourceAvailableException` with local execution while using `slot_sharing_group`

huangapple go评论104阅读模式
英文:

Apache Flink - Getting `NoResourceAvailableException` with local execution while using `slot_sharing_group`

问题

我正在尝试通过PyCharm在本地执行Flink 1.17.1作业。

我的代码使用DataStream API,我正在从Kafka主题中读取数据,并使用.execute().print()将其打印到控制台。
然而,在使用slot_sharing_group时,我遇到了一些错误,如果我注释掉使用slot_sharing_group的行,则不会出现错误。

我已经检查过我有足够的taskamanger.numberOfTaskSlots(当我没有足够的任务插槽时,我期望会看到另一个错误,但我仍然验证了这个数字)。

以下是我得到的错误:

  1. py4j.protocol.Py4JJavaError: 在调用 o545.print 时发生错误。
  2. : java.lang.RuntimeException: 获取下一个结果失败
  3. at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:109)
  4. at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:80)
  5. ...
  6. Caused by: java.io.IOException: 获取作业执行结果失败
  7. at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.getAccumulatorResults(CollectResultFetcher.java:184)
  8. ...
  9. Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobExecutionException: 作业执行失败。
  10. ...
  11. Caused by: org.apache.flink.runtime.client.JobExecutionException: 作业执行失败。
  12. ...
  13. Caused by: org.apache.flink.runtime.JobException: NoRestartBackoffTimeStrategy 抑制恢复
  14. ...
  15. Caused by: org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:139)
  16. ...
  17. Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: 无法获取所需的最小资源。
  18. 进程退出,退出代码为 1

有关这个异常的来源有什么线索吗?

英文:

I'm trying to run a Flink 1.17.1 job with local execution through PyCharm.

My code is using the DataStream API and I'm reading data from a Kafka topic and printing it to the console with .execute().print().
However, I'm encountering some errors when using the slot_sharing_group, if I comment out the lines using slot_sharing_group.

I've checked that I have enough taskamanger.numberOfTaskSlots. (It would expect to see another error when I don't have enough task slots but I validated that number nonetheless).

Here's the error I'm getting:

  1. py4j.protocol.Py4JJavaError: An error occurred while calling o545.print.
  2. : java.lang.RuntimeException: Failed to fetch next result
  3. at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:109)
  4. at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.hasNext(CollectResultIterator.java:80)
  5. at org.apache.flink.table.planner.connectors.CollectDynamicSink$CloseableRowIteratorWrapper.hasNext(CollectDynamicSink.java:222)
  6. at org.apache.flink.table.utils.print.TableauStyle.print(TableauStyle.java:120)
  7. at org.apache.flink.table.api.internal.TableResultImpl.print(TableResultImpl.java:153)
  8. at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  9. at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  10. at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  11. at java.base/java.lang.reflect.Method.invoke(Method.java:566)
  12. at org.apache.flink.api.python.shaded.py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
  13. at org.apache.flink.api.python.shaded.py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
  14. at org.apache.flink.api.python.shaded.py4j.Gateway.invoke(Gateway.java:282)
  15. at org.apache.flink.api.python.shaded.py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
  16. at org.apache.flink.api.python.shaded.py4j.commands.CallCommand.execute(CallCommand.java:79)
  17. at org.apache.flink.api.python.shaded.py4j.GatewayConnection.run(GatewayConnection.java:238)
  18. at java.base/java.lang.Thread.run(Thread.java:829)
  19. Caused by: java.io.IOException: Failed to fetch job execution result
  20. at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.getAccumulatorResults(CollectResultFetcher.java:184)
  21. at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.next(CollectResultFetcher.java:121)
  22. at org.apache.flink.streaming.api.operators.collect.CollectResultIterator.nextResultFromFetcher(CollectResultIterator.java:106)
  23. ... 15 more
  24. Caused by: java.util.concurrent.ExecutionException: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
  25. at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
  26. at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2022)
  27. at org.apache.flink.streaming.api.operators.collect.CollectResultFetcher.getAccumulatorResults(CollectResultFetcher.java:182)
  28. ... 17 more
  29. Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
  30. at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:144)
  31. at org.apache.flink.runtime.minicluster.MiniClusterJobClient.lambda$getJobExecutionResult$3(MiniClusterJobClient.java:141)
  32. at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:642)
  33. at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
  34. at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
  35. at org.apache.flink.runtime.rpc.akka.AkkaInvocationHandler.lambda$invokeRpc$1(AkkaInvocationHandler.java:267)
  36. at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
  37. at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
  38. at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
  39. at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
  40. at org.apache.flink.util.concurrent.FutureUtils.doForward(FutureUtils.java:1300)
  41. at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$null$1(ClassLoadingUtils.java:93)
  42. at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:68)
  43. at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.lambda$guardCompletionWithContextClassLoader$2(ClassLoadingUtils.java:92)
  44. at java.base/java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:859)
  45. at java.base/java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:837)
  46. at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
  47. at java.base/java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:2073)
  48. at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$1.onComplete(AkkaFutureUtils.java:47)
  49. at akka.dispatch.OnComplete.internal(Future.scala:300)
  50. at akka.dispatch.OnComplete.internal(Future.scala:297)
  51. at akka.dispatch.japi$CallbackBridge.apply(Future.scala:224)
  52. at akka.dispatch.japi$CallbackBridge.apply(Future.scala:221)
  53. at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
  54. at org.apache.flink.runtime.concurrent.akka.AkkaFutureUtils$DirectExecutionContext.execute(AkkaFutureUtils.java:65)
  55. at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:72)
  56. at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1(Promise.scala:288)
  57. at scala.concurrent.impl.Promise$DefaultPromise.$anonfun$tryComplete$1$adapted(Promise.scala:288)
  58. at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:288)
  59. at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:622)
  60. at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:24)
  61. at akka.pattern.PipeToSupport$PipeableFuture$$anonfun$pipeTo$1.applyOrElse(PipeToSupport.scala:23)
  62. at scala.concurrent.Future.$anonfun$andThen$1(Future.scala:536)
  63. at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
  64. at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
  65. at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
  66. at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:63)
  67. at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:100)
  68. at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  69. at scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:85)
  70. at akka.dispatch.BatchingExecutor$BlockableBatch.run(BatchingExecutor.scala:100)
  71. at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:49)
  72. at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(ForkJoinExecutorConfigurator.scala:48)
  73. at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
  74. at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
  75. at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
  76. at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
  77. at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
  78. Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
  79. at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:139)
  80. at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:83)
  81. at org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:258)
  82. at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:249)
  83. at org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:242)
  84. at org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:748)
  85. at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:725)
  86. at org.apache.flink.runtime.scheduler.UpdateSchedulerNgOnInternalFailuresListener.notifyTaskFailure(UpdateSchedulerNgOnInternalFailuresListener.java:51)
  87. at org.apache.flink.runtime.executiongraph.DefaultExecutionGraph.notifySchedulerNgAboutInternalTaskFailure(DefaultExecutionGraph.java:1664)
  88. at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1140)
  89. at org.apache.flink.runtime.executiongraph.Execution.processFail(Execution.java:1080)
  90. at org.apache.flink.runtime.executiongraph.Execution.markFailed(Execution.java:919)
  91. at org.apache.flink.runtime.scheduler.DefaultExecutionOperations.markFailed(DefaultExecutionOperations.java:43)
  92. at org.apache.flink.runtime.scheduler.DefaultExecutionDeployer.handleTaskDeploymentFailure(DefaultExecutionDeployer.java:327)
  93. at org.apache.flink.runtime.scheduler.DefaultExecutionDeployer.lambda$assignAllResourcesAndRegisterProducedPartitions$2(DefaultExecutionDeployer.java:170)
  94. at java.base/java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:930)
  95. at java.base/java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:907)
  96. at java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:506)
  97. at java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2088)
  98. at org.apache.flink.runtime.jobmaster.slotpool.PendingRequest.failRequest(PendingRequest.java:88)
  99. at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolBridge.cancelPendingRequests(DeclarativeSlotPoolBridge.java:185)
  100. at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolBridge.failPendingRequests(DeclarativeSlotPoolBridge.java:408)
  101. at org.apache.flink.runtime.jobmaster.slotpool.DeclarativeSlotPoolBridge.notifyNotEnoughResourcesAvailable(DeclarativeSlotPoolBridge.java:396)
  102. at org.apache.flink.runtime.jobmaster.JobMaster.notifyNotEnoughResourcesAvailable(JobMaster.java:887)
  103. at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  104. at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  105. at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  106. at java.base/java.lang.reflect.Method.invoke(Method.java:566)
  107. at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$0(AkkaRpcActor.java:301)
  108. at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
  109. at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:300)
  110. at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:222)
  111. at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:84)
  112. at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:168)
  113. at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
  114. at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
  115. at scala.PartialFunction.applyOrElse(PartialFunction.scala:127)
  116. at scala.PartialFunction.applyOrElse$(PartialFunction.scala:126)
  117. at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
  118. at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:175)
  119. at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
  120. at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:176)
  121. at akka.actor.Actor.aroundReceive(Actor.scala:537)
  122. at akka.actor.Actor.aroundReceive$(Actor.scala:535)
  123. at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
  124. at akka.actor.ActorCell.receiveMessage(ActorCell.scala:579)
  125. at akka.actor.ActorCell.invoke(ActorCell.scala:547)
  126. at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
  127. at akka.dispatch.Mailbox.run(Mailbox.scala:231)
  128. at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
  129. ... 5 more
  130. Caused by: java.util.concurrent.CompletionException: java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources.
  131. at org.apache.flink.runtime.scheduler.DefaultExecutionDeployer.lambda$assignResource$4(DefaultExecutionDeployer.java:227)
  132. ... 40 more
  133. Caused by: java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources.
  134. at java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:331)
  135. at java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:346)
  136. at java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:632)
  137. ... 38 more
  138. Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources.
  139. Process finished with exit code 1

Any clue what could be the source of the exception ?

答案1

得分: 0

问题在于我在表API级别设置了配置,而应该在全局配置级别进行设置。

所以,改为这样:

  1. # 在全局级别进行配置
  2. config = Configuration()
  3. config.set_integer('parallelism.default', 1)
  4. config.set_integer('taskmanager.numberOfTaskSlots', 2)
  5. ds_env = StreamExecutionEnvironment.get_execution_environment(config)
  6. t_env = StreamTableEnvironment.create(ds_env)

然后它就起作用了。

英文:

The problem was that I was setting my configuration at the Table API level while it should have been at the global configuration level.

So instead of this:

  1. config = Configuration()
  2. ds_env = StreamExecutionEnvironment.get_execution_environment(config)
  3. t_env = StreamTableEnvironment.create(ds_env)
  4. # Configuration at the Table API scope
  5. t_env.get_config().set('parallelism.default', 1)
  6. t_env.get_config().set('taskmanager.numberOfTaskSlots', 2)

I used this:

  1. # Configuration at the global scope
  2. config = Configuration()
  3. config.set_integer('parallelism.default', 1)
  4. config.set_integer('taskmanager.numberOfTaskSlots', 2)
  5. ds_env = StreamExecutionEnvironment.get_execution_environment(config)
  6. t_env = StreamTableEnvironment.create(ds_env)

And it worked.

huangapple
  • 本文由 发表于 2023年7月18日 01:34:40
  • 转载请务必保留本文链接:https://go.coder-hub.com/76706862.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定