Spark UI reported time of execution plan doesn't match real time by a factor of 3x
Question
I ran a query in Databricks, and the notebook says it took 12 seconds. Meanwhile, when I navigated through the Spark UI to look at the SQL execution plan, it reports 12104 ms, which matches. But the stages the execution plan is composed of report that a single one of them took as long as 30 seconds.
Here is the execution plan in tabular format (the graphical version is further down). If you look at the tabular format, the very top left corner says Completed in 12104 ms, but if you read the breakdown you can see that one of the *PhotonShuffleMapStage steps (the one made up of operations (7) and (8)) reports that it took 29.6 s.
....
So how come the entire query says it took 12 seconds when a single one of its stages says it took 29.6 seconds?
Completed in 12104 ms
Node | # Tasks | Duration total (min, med, max) | # Rows | Est # Rows | Peak Mem total (min, med, max)
----------------------------------------------------------+---------+-----------------------------------------------------+-------------+------------+----------------------------------------------------------------
*PhotonShuffleMapStage | 4 | 1.6 s (0 ms, 0 ms, 448 ms (stage 56.0: task 186)) | 0 | - | 824.0 MiB (0.0 B, 0.0 B, 206.0 MiB (stage 56.0: task 187))
-*(1): PhotonScan parquet hive_metastore.default.user | - | 241 ms (0 ms, 0 ms, 77 ms (stage 56.0: task 188)) | 10,000,000 | - | 64.0 MiB (0.0 B, 0.0 B, 16.0 MiB (stage 56.0: task 187))
-*(2): PhotonShuffleExchangeSink | - | 1.6 s (0 ms, 0 ms, 448 ms (stage 56.0: task 186)) | 10,000,000 | - | 248.0 MiB (0.0 B, 0.0 B, 62.0 MiB (stage 56.0: task 187))
*PhotonShuffleMapStage | 8 | 10.0 s (0 ms, 1.4 s, 1.5 s (stage 60.0: task 201)) | 0 | - | 1626.0 MiB (0.0 B, 210.0 MiB, 212.0 MiB (stage 60.0: task 203))
-*(5): AQEShuffleRead | - | - | - | - | -
-*(6): PhotonShuffleExchangeSource | - | 136 ms (0 ms, 17 ms, 24 ms (stage 60.0: task 203)) | 10,000,000 | - | -
-*(11): AQEShuffleRead | - | - | - | - | -
-*(12): PhotonShuffleExchangeSource | - | 2.3 s (0 ms, 301 ms, 370 ms (stage 60.0: task 201)) | 320,000,000 | - | -
-*(13): PhotonShuffledHashJoin | - | 9.9 s (0 ms, 1.4 s, 1.5 s (stage 60.0: task 201)) | 329,951,357 | - | 602.0 MiB (0.0 B, 82.0 MiB, 84.0 MiB (stage 60.0: task 203))
-*(14): PhotonProject | - | 10.0 s (0 ms, 1.4 s, 1.5 s (stage 60.0: task 201)) | 329,951,357 | - | -
-*(15): PhotonAgg | - | 10.0 s (0 ms, 1.4 s, 1.5 s (stage 60.0: task 201)) | 8 | - | -
-*(16): PhotonShuffleExchangeSink | - | 10.0 s (0 ms, 1.4 s, 1.5 s (stage 60.0: task 201)) | 8 | - | 32.0 MiB (0.0 B, 4.0 MiB, 4.0 MiB (stage 60.0: task 200))
*PhotonShuffleMapStage | 8 | 29.6 s (0 ms, 0 ms, 4.8 s (stage 57.0: task 195)) | 0 | - | 7.9 GiB (0.0 B, 0.0 B, 1158.0 MiB (stage 57.0: task 194))
-*(7): PhotonScan parquet hive_metastore.default.revision | - | 2.9 s (0 ms, 0 ms, 474 ms (stage 57.0: task 193)) | 320,000,000 | - | 128.0 MiB (0.0 B, 0.0 B, 16.0 MiB (stage 57.0: task 190))
-*(8): PhotonShuffleExchangeSink | - | 29.6 s (0 ms, 0 ms, 4.8 s (stage 57.0: task 195)) | 320,000,000 | - | 6.8 GiB (0.0 B, 0.0 B, 1014.0 MiB (stage 57.0: task 194))
*PhotonResultStage | 1 | 0 ms | 0 | - | 128.0 MiB
-*(19): PhotonShuffleExchangeSource | - | 0 ms | 8 | 1 | -
-*(20): PhotonAgg | - | 0 ms | 1 | 1 | -
*WholeStageCodegen (1) | 1 | 0 ms | - | - | -
-*(22): ColumnarToRow | - | - | 1 | - | -
(40): AdaptiveSparkPlan | - | - | - | - | -
Answer 1
Score: 1
Let's have a look at the interesting line that seems to take way too long:
Node | # Tasks | Duration total (min, med, max) | # Rows | Est # Rows | Peak Mem total (min, med, max)
-*(8): PhotonShuffleExchangeSink | - | 29.6 s (0 ms, 0 ms, 4.8 s (stage 57.0: task 195)) | 320,000,000 | - | 6.8 GiB (0.0 B, 0.0 B, 1014.0 MiB (stage 57.0: task 194))
Indeed, this total duration is 29.6s. How can this be part of a bigger query that finished in 12s?
Well, the Duration total value is the total time that all tasks took: it is a simple sum over all of those per-task timings. What this number does not tell you is that those tasks ran in parallel.
Let's take a very simple example: if you have 2 tasks that each ran for 5 s in parallel, start to finish will have taken 5 s, but the total duration (across all tasks) is 10 s!
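To make that arithmetic concrete, here is a minimal Python sketch; plain threads stand in for Spark tasks and nothing in it is Spark-specific. Two 5-second tasks on two workers finish in roughly 5 s of wall-clock time, while the summed per-task durations come out to roughly 10 s, just like the Duration total column.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def task(seconds):
    # Stand-in for one Spark task: return how long this task itself ran.
    start = time.monotonic()
    time.sleep(seconds)
    return time.monotonic() - start

wall_start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:   # 2 workers = 2 tasks in parallel
    durations = list(pool.map(task, [5, 5]))      # two 5-second tasks
wall_clock = time.monotonic() - wall_start

print(f"Duration total (sum over tasks): {sum(durations):.1f} s")  # ~10 s
print(f"Start to finish (wall clock):    {wall_clock:.1f} s")      # ~5 s
```

The plan above shows the same effect at larger scale: stage 57 runs 8 tasks whose durations sum to 29.6 s, yet its slowest single task is only 4.8 s, so with those tasks running side by side the stage's contribution to wall-clock time fits comfortably inside the 12 s the notebook reports.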
So you need some information about how many CPU cores your cluster has (which dictates how many tasks can run in parallel) to get an idea of the start-to-finish time.
Have a look at the Stages tab at the top of your Spark UI and select the long-running stage (stage 57.0 here, the one that contains node (8)). Check the start and end timestamps of the tasks in there; that should clear it up.
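If you would rather compute this than eyeball the timeline, the Spark monitoring REST API exposes per-task data for a stage. The following is only a rough sketch under stated assumptions: the base URL is the default standalone Spark UI port (on Databricks the API is proxied through the workspace URL and needs authentication, so it will look different), stage id 57 and attempt 0 are taken from the plan above, and the per-task duration field is in milliseconds; check the endpoint and field names against your Spark version.

```python
import requests

# Assumption: default local Spark UI; adjust for Databricks' proxied URL.
base = "http://localhost:4040/api/v1"

app_id = requests.get(f"{base}/applications").json()[0]["id"]

# Tasks of stage 57, attempt 0 (the long-running PhotonShuffleMapStage above).
tasks = requests.get(
    f"{base}/applications/{app_id}/stages/57/0/taskList",
    params={"length": 1000},
).json()

# "Duration total" as the plan reports it: a plain sum over all tasks (ms).
duration_total_s = sum(t.get("duration", 0) for t in tasks) / 1000
print(f"{len(tasks)} tasks, summed duration ~ {duration_total_s:.1f} s")
```

The gap between that sum and the span from the earliest task launch to the latest task completion is exactly the parallelism described above.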